Monday, November 23, 2020

Some data quality issues in a published paper

 This is Joseph

This paper has been getting a lot of attention, and not the best kind. The discussion has some unexpected conclusions:

While it has been shown that having female mentors increases the likelihood of female protégés staying in academia and provides them with better career outcomes, such studies often compare protégés that have a female mentor to those who do not have a mentor at all, rather than to those who have a male mentor. Our study fills this gap, and suggests that female protégés who remain in academia reap more benefits when mentored by males rather than equally-impactful females. The specific drivers underlying this empirical fact could be multifold, such as female mentors serving on more committees, thereby reducing the time they are able to invest in their protégés, or women taking on less recognized topics that their protégés emulate, but these potential drivers are out of the scope of current study. Our findings also suggest that mentors benefit more when working with male protégés rather than working with comparable female protégés, especially if the mentor is female. These conclusions are all deduced from careful comparisons between protégés who published their first mentored paper in the same discipline, in the same cohort, and at the very same institution.

There are a number of issues with this article. Some of the most interesting are in this Twitter thread, and the author of the thread has links to the data and his analysis.

One really important finding is that the approach used by the authors actually evaluates "co-authorship" and not "mentorship". From the supplement (note, not the paper itself):

We identified mentor-protégé pairs as follows: For any given scientist, we consider the first 7 years of their career to be their junior years, and the ones after that to be their senior years. Whenever a junior scientist publishes a paper with a senior scientist, we consider the former to be a protégé, and the latter to be a mentor, as long as they authored at least one paper with 20 or less co-authors and share the same discipline and US-based affiliation.
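To see how little this rule asks for, here is a rough Python sketch of it as I read the supplement. This is not the authors' code. The class and field names are hypothetical, using first publication year as the start of a career is my assumption, and so is counting "co-authors" as everyone on the paper other than the author in question.

```python
from dataclasses import dataclass

JUNIOR_YEARS = 7    # from the supplement: first 7 years are "junior"
MAX_COAUTHORS = 20  # from the supplement: paper has 20 or fewer co-authors


@dataclass(frozen=True)
class Scientist:
    name: str
    first_pub_year: int   # assumed proxy for the start of the career
    discipline: str
    us_affiliation: str


@dataclass(frozen=True)
class Paper:
    year: int
    authors: tuple        # tuple of Scientist


def is_junior(scientist, year):
    """True if `year` falls within the scientist's first 7 career years."""
    return year - scientist.first_pub_year < JUNIOR_YEARS


def mentor_pairs(papers):
    """Return the set of (mentor_name, protege_name) pairs implied by the rule."""
    pairs = set()
    for paper in papers:
        # Assumption: co-authors = all authors except the one being considered.
        if len(paper.authors) - 1 > MAX_COAUTHORS:
            continue
        for junior in paper.authors:
            for senior in paper.authors:
                if junior is senior:
                    continue
                if (is_junior(junior, paper.year)
                        and not is_junior(senior, paper.year)
                        and junior.discipline == senior.discipline
                        and junior.us_affiliation == senior.us_affiliation):
                    pairs.add((senior.name, junior.name))
    return pairs
```

Nothing in this logic looks at supervision, advising, or training. A single shared paper is enough to create a "mentor".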

This makes a big difference: it suggests that being connected to very prominent male co-authors is important, which is a different estimand from the one the paper presents in its discussion. It's almost certainly based on flawed data, but even on its own terms that is concerning.

Daniel Weeks's graphical look at the data shows cases with mentor ages greater than 200 (seriously) and numbers of mentors exceeding 90. These are not plausible values for the scientific question, and they cast doubt on the reliability of the data analysis.
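Range checks of this kind are cheap to automate. A minimal sketch, assuming each record is a dict with hypothetical keys `mentor_age` and `n_mentors`; the key names and the default thresholds are mine, chosen only for illustration, and are not taken from the authors' data:

```python
def flag_implausible(records, max_age=100, max_mentors=50):
    """Return (index, reasons) for every record that fails a range check.

    The thresholds are illustrative assumptions; a real analysis should
    justify its own bounds from the definition of each variable.
    """
    flagged = []
    for i, rec in enumerate(records):
        reasons = []
        if not 0 <= rec["mentor_age"] <= max_age:
            reasons.append(f"mentor_age={rec['mentor_age']}")
        if not 0 <= rec["n_mentors"] <= max_mentors:
            reasons.append(f"n_mentors={rec['n_mentors']}")
        if reasons:
            flagged.append((i, reasons))
    return flagged
```

Running something like this before modeling would have surfaced the values above without anyone needing to draw a plot.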

Furthermore, the time period stretches from 1897 to 2019. It is worth noting that women did not gain the right to vote until 1920 in the US (and this paper ONLY considers researchers with a US affiliation by design). Can we honestly say that there has been no important change in social and cultural practices around the granting of senior research positions since 1897?

Finally, the approach used to assign gender seems . . . unreliable. Never mind that it may miss important distinctions, such as transgender researchers; experiments with the tool itself show poor results. Consider:

The authors claim they used Genderize.io to identify the gender of the author. It appears that this app assigns a gender based on the first name. I tried it by entering the names of 55 recent co-authors, whose gender identity and preferred pronouns I know, into a google sheet and using the API tool. Genderize.io mis-gendered 12% of them. It likely will not shock you that the names that were mis-gendered are more likely to be from non-American scientists. 

At some point the sheer amount of measurement error in the data has got to be an issue.  

Now look at the first sentence of the conclusion:

Our gender-related findings suggest that current diversity policies promoting female–female mentorships, as well-intended as they may be, could hinder the careers of women who remain in academia in unexpected ways.

What an incredibly strong conclusion for data on co-authorship that includes vast periods of under-representation of senior female scientists. Does it suggest that female senior scientists are bad for the careers of their trainees? I worry that it does, based on an incorrect estimand and some data with clear problems. This sort of strong conclusion is really not ideal. Even if you want to defend the need to look at controversial positions, the discussion should have focused on the grave limitations rather than on such strong conclusions.

The article now includes the following statement:

Editor’s Note: Readers are alerted that this paper is subject to criticisms that are being considered by the editors. Those criticisms were targeted to the authors’ interpretation of their data that gender plays a role in the success of mentoring relationships between junior and senior researchers, in a way that undermines the role of female mentors and mentees. We are investigating the concerns raised and an editorial response will follow the resolution of these issues.  

But it has an Altmetric score of 7195 (likely higher by the time this post goes up) and has been accessed 302 thousand times. Perhaps we should catch these problems earlier in the process. Just graphing mentor age shows many fun problems, including ages starting at 5 years old (FIVE!) and people over 200 being immediately visible. Perhaps extra data cleaning, a better estimand (or use of the correct one), and a more relevant time period would have produced a better research result?

 
