A DNA test to predict your homosexuality? A false claim, yet one widely relayed by the English-language media

>> No, Scientists Have Not Found the ‘Gay Gene’

“Scientists find a ‘gay gene’ that can help predict your sexuality,” or “Researchers say a change in one gene shows who is gay”… Such claims originate from a study conducted by a team of researchers at the renowned University of California, Los Angeles (UCLA). At the American Society of Human Genetics 2015 conference, Tuck Ngun, co-author of the study, explained that he had found several epigenetic marks (chemical changes to DNA that do not affect the underlying sequence) said to be associated with homosexuality in men.

Yet, as The Atlantic reports, the media frenzy conceals deep divisions within the scientific community present at the conference that day. To better understand the suspicion surrounding Tuck Ngun’s work, one has to look at the details of his study. He based his analyses on 37 pairs of male twins in which one twin was gay and the other straight, and on 10 further pairs of male twins who were both gay. “He analysed 140,000 regions in the twins’ genomes and looked for marks called methylations, chemical Post-It notes that dictate where and when genes are activated.” After whittling down the number of regions studied, he built a computational model to compare these marks with the participants’ sexual orientation.

And this is where the results become a problem for many scientists:

The classification of the twins’ sexuality was correct in only 67 percent of cases, and only five methylation marks were retained. The heart of the problem also lies in how the participants were split into two groups, the first serving as the basis for building the computational model and the second reserved for the final tests. The researchers broke a fundamental rule by keeping only the model that proved most accurate on the test group, when model building and model testing should normally be carried out independently of each other. To be clear, selecting a model from so few samples should not allow such conclusions to be drawn.

“What we have, in the end,” The Atlantic points out, “is an underpowered fishing expedition that used inappropriate statistics and clung to results that may be false positives. Epigenetic marks may well be involved in sexual orientation. But this study, despite its claims, does not prove it and was not designed to prove it.”

For Professor John Greally of the Albert Einstein College of Medicine, it is tempting to jump to conclusions, but as he wrote on his blog, “we can no longer allow poor studies to be given credibility if this field is to survive. And by ‘poor,’ I mean uninterpretable.”

Tuck Ngun has since acknowledged that he lacked funding and that he hopes to improve his study with further tests. One can only hope that the analysis model will be improved as well.

Vincent Manilève

>> This week, a team from the University of California, Los Angeles claimed to have found several epigenetic marks—chemical modifications of DNA that don’t change the underlying sequence—that are associated with homosexuality in men.

Postdoc Tuck Ngun presented the results yesterday at the American Society of Human Genetics 2015 conference. Nature News were among the first to break the story based on a press release issued by the conference organisers. Others quickly followed suit. “Have They Found The Gay Gene?” said the front page of Metro, a London paper, on Friday morning.

Meanwhile, the mood at the conference has been decidedly less complimentary, with several geneticists criticizing the methods presented in the talk, the validity of the results, and the coverage in the press.

Ngun’s study was based on 37 pairs of identical male twins who were discordant—that is, one twin in each pair was gay, while the other was straight—and 10 pairs who were both gay. He analysed 140,000 regions in the genomes of the twins and looked for methylation marks—chemical Post-It notes that dictate when and where genes are activated. He whittled these down to around 6,000 regions of interest, and then built a computer model that would use data from these regions to classify people based on their sexual orientation.
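
To make the shape of that pipeline concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: the data is random noise in place of real methylation values, and the choice of logistic regression is an assumption, since the talk did not specify the classifier used.

```python
# Hypothetical sketch of the pipeline described above: whittle a huge
# methylation matrix down to a shortlist of marks, then train a
# classifier on them. The data here is random noise, not real data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_subjects, n_regions = 94, 140_000        # 47 twin pairs -> 94 men
X = rng.random((n_subjects, n_regions))    # methylation level per region
y = rng.integers(0, 2, n_subjects)         # 0 = straight, 1 = gay

# Keep the marks most associated with the label, then fit a model.
# With pure noise, any "top" marks are flukes; that is exactly the
# false-positive risk discussed below.
model = make_pipeline(
    SelectKBest(f_classif, k=5),           # shortlist five marks
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))  # optimistic by construction
```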

The best model used just five of the methylation marks, and correctly classified the twins 67 percent of the time. “To our knowledge, this is the first example of a biomarker-based predictive model for sexual orientation,” Ngun wrote in his abstract.

The problems begin with the size of the study, which is tiny. The field of epigenetics is littered with the corpses of statistically underpowered studies like these, which simply lack the numbers to produce reliable, reproducible results.
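
To put "underpowered" in perspective, a textbook power calculation shows how many subjects are needed before an effect can be detected reliably. The effect size below is an assumed value chosen purely for illustration, not a figure from the study.

```python
# Back-of-the-envelope power calculation (illustrative assumptions only).
from statsmodels.stats.power import TTestIndPower

# Subjects per group needed to detect a modest effect (Cohen's d = 0.3)
# at alpha = 0.05 with the conventional 80% power:
n_needed = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"~{n_needed:.0f} subjects per group")  # roughly 175 per group
# 37 discordant twin pairs falls far short of that for small effects.
```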

Unfortunately, the problems don’t end there. The team split their group into two: a “training set” whose data they used to build their algorithm, and a “testing set”, whose data they used to verify it. That’s standard and good practice—exactly what they should have done. But splitting the sample means that the study goes from underpowered to really underpowered.

There’s also another, larger issue. As far as could be judged from the unpublished results presented in the talk, the team used their training set to build several models for classifying their twins, and eventually chose the one with the greatest accuracy when applied to the testing set. That’s a problem because in research like this, there has to be a strict firewall between the training and testing sets; the team broke that firewall by essentially using the testing set to optimise their algorithms.
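
The distinction is easier to see in code. The sketch below uses hypothetical data and models; the point is only that choosing among candidate models by their test-set accuracy silently turns the test set into training data, whereas the choice should be made inside the training set, for instance by cross-validation.

```python
# The "firewall" between training and testing sets, with made-up data.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((94, 50))                   # hypothetical methylation features
y = rng.integers(0, 2, 94)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

candidates = [LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1, 10)]

# WRONG: the test set picks the winner, so its accuracy is inflated.
best_wrong = max(candidates,
                 key=lambda m: m.fit(X_train, y_train).score(X_test, y_test))

# RIGHT: select by cross-validation within the training set only...
best = max(candidates,
           key=lambda m: cross_val_score(m, X_train, y_train, cv=5).mean())
best.fit(X_train, y_train)
# ...then touch the test set exactly once, to report the final score.
print("honest test accuracy:", best.score(X_test, y_test))
```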

If you use this strategy, chances are you will find a positive result through random chance alone. Chances are some combination of methylation marks out of the original 6,000 will be significantly linked to sexual orientation, whether they genuinely affect sexual orientation or not. This is a well-known statistical problem that can be at least partly countered by running what’s called a correction for multiple testing. The team didn’t do that. (In an email to The Atlantic, Ngun denies that such a correction was necessary.)
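
For illustration, here is what such a correction does. The p-values below are simulated under the null, a stand-in for the study's real numbers: testing thousands of marks at a 0.05 threshold guarantees spurious hits, and the correction is what screens them back out.

```python
# Multiple testing with simulated null p-values. At alpha = 0.05,
# 6,000 truly unrelated marks still produce ~300 "significant" hits;
# a Benjamini-Hochberg correction removes nearly all of them.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
p_values = rng.uniform(0, 1, 6000)   # 6,000 marks, none truly associated

print("raw hits:", (p_values < 0.05).sum())     # ~300 by chance alone
rejected, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                           method="fdr_bh")
print("hits after correction:", rejected.sum())  # almost always 0
```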

And, “like everyone else in the history of epigenetics studies they could not resist trying to interpret the findings mechanistically,” wrote John Greally from the Albert Einstein College of Medicine in a blog post. By which he means: they gave the results an imprimatur of plausibility by noting the roles of the genes affected by the five epi-marks. One is involved in controlling immune genes that have been linked to sexual attraction. Another is involved in moving molecules along neurons. Could epi-marks on these genes influence someone’s sexual attraction? Maybe. It’s also plausible that someone’s sexual orientation influences epi-marks on these genes. Correlation, after all, does not imply causation.

So, ultimately, what we have is an underpowered fishing expedition that used inappropriate statistics and that snagged results which may be false positives. Epigenetic marks may well be involved in sexual orientation. But this study, despite its claims, does not prove that and, as designed, could not have.

In a response to Greally’s post, Ngun admitted that the study was underpowered. “The reality is that we had basically no funding,” he said. “The sample size was not what we wanted. But do I hold out for some impossible ideal or do I work with what I have? I chose the latter.” He also told Nature News that he plans to “replicate the study in a different group of twins and also determine whether the same marks are more common in gay men than in straight men in a large and diverse population.”

Great. Replication and verification are the cornerstones of science. But to replicate and verify, you need a sturdy preliminary finding upon which to build and expand—and that’s not the case here. It may seem like the noble choice to work with what you’ve got. But when what you’ve got are the makings of a fatally weak study, of the kind well known to cause problems in a field, it really is an option—perhaps the best option—to not do it at all. (The same could be said for journalists outside the conference choosing to cover the study based on a press release.)

As Greally wrote in his post: “It’s not personal about [Ngun] or his colleagues, but we can no longer allow poor epigenetics studies to be given credibility if this field is to survive. By ‘poor,’ I mean uninterpretable.”

“This is only representative of the broader literature,” he told me. “The problems in the field are systematic. We need to change how epigenomics research is performed throughout the community.”