Protein sequences for the selected orthologs in OMA

For each ortholog group (OG) in OMA, we kept only one ortholog per species. For example, OMA contains dozens of E. coli strains, so if an OG had multiple E. coli orthologs, we picked one of them and ignored the others. We did this to avoid oversampling species with many sequenced strains and to ensure a wider phylogenetic distribution. We then kept only OGs with orthologs from 3 or more separate clades. That is, if an OG was composed of, say, orthologs from 4 different E. coli strains and 2 B. subtilis strains, we ignored it, since only 2 species were represented in total.
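
The selection described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the record layout (protein ID, species name), the function names, and the protein IDs are hypothetical, each strain is assumed to be already mapped to its species, and "one ortholog per species" is implemented here as "the first one encountered", since the text does not say how the representative was chosen. The sketch also treats each species as one clade, following the worked example.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (protein_id, species_name).
# Strain-to-species mapping is assumed to have been done beforehand.
Member = Tuple[str, str]

MIN_SPECIES = 3  # threshold stated in the text ("3 or more")


def one_per_species(members: List[Member]) -> List[Member]:
    """Keep a single ortholog per species.

    Assumption: the first ortholog seen for a species is kept; the
    source does not specify the selection rule.
    """
    chosen: Dict[str, Member] = {}
    for protein_id, species in members:
        chosen.setdefault(species, (protein_id, species))
    return list(chosen.values())


def filter_groups(
    groups: Dict[str, List[Member]], min_species: int = MIN_SPECIES
) -> Dict[str, List[Member]]:
    """Reduce each OG to one ortholog per species, then drop OGs
    that cover fewer than `min_species` distinct species."""
    kept: Dict[str, List[Member]] = {}
    for og_id, members in groups.items():
        reduced = one_per_species(members)
        if len(reduced) >= min_species:
            kept[og_id] = reduced
    return kept


# Toy data mirroring the example in the text (IDs are made up).
example_groups = {
    "OG1": [
        ("P1", "Escherichia coli"),
        ("P2", "Escherichia coli"),
        ("P3", "Escherichia coli"),
        ("P4", "Escherichia coli"),
        ("P5", "Bacillus subtilis"),
        ("P6", "Bacillus subtilis"),
    ],
    "OG2": [
        ("P7", "Escherichia coli"),
        ("P8", "Bacillus subtilis"),
        ("P9", "Homo sapiens"),
    ],
}

kept = filter_groups(example_groups)
```

With this toy input, OG1 collapses to two species and is discarded, while OG2 covers three species and is retained.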