The gender gap and the travesty of assumed linear ordering

Markus Pössel

There were quite a number of interesting points raised at the Heidelberg Laureate Forum panel debate about the Gender Gap in Science on September 27. But the statement I keep coming back to is the following one by Margo Seltzer (University of British Columbia), one of the panelists: It is a myth that you can take researchers and sort them into a linear order. Choosing between candidates is a multidimensional problem.

Once you’re aware of the myth of linear order, you’ll find examples everywhere: entities in a multidimensional space, forced, more or less arbitrarily, into a linear order. The best universities. The best cities to live in. Commonly with that ultimate weasel word, “best,” suggesting that there is some common linear direction along which the list is ordered. It’s so common, we don’t even object to this travesty anymore.

Linear order is not a given

Gender Gap Panel at 7th HLF: Marie-Francoise Roy, Jessica Carter, Fernando Seabra Chirigati, Anna Vasilchenko, Ragni Piene (standing) and Anna Wienhard (via Skype, not pictured)

Mathematicians should know better. Having a linear order is a special property of sets. For ordinary (integer, rational, real) numbers, no problem. But whenever more than one dimension is involved, things get complicated. That part is easy to see in everyday examples, whenever there is more than one criterion for evaluation. Quantifying any criterion – assigning a number, in a meaningful way – is hard enough for more complicated properties. For assigning such numbers in a way that allows us to compare different criteria, there is no unique, objective solution.
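To make the point about more than one dimension concrete, here is a minimal Python sketch. The candidate labels and scores are invented for illustration, and the comparison rule (one candidate counts as "better" only if it is at least as good on every criterion) is one common choice, the componentwise order, not the only one. Under that rule, some pairs simply cannot be compared.

```python
from itertools import combinations

# Hypothetical scores as (research, teaching); all numbers are invented.
scores = {
    "A": (9, 3),
    "B": (4, 8),
    "C": (3, 2),
}

def dominates(x, y):
    """True if x is at least as good as y on every criterion and not identical to y."""
    return all(a >= b for a, b in zip(x, y)) and x != y

def compare(x, y):
    """Compare two score tuples under the componentwise order."""
    if x == y:
        return "equal"
    if dominates(x, y):
        return "better"
    if dominates(y, x):
        return "worse"
    return "incomparable"

# Compare every pair of candidates once.
relations = {
    (p, q): compare(scores[p], scores[q])
    for p, q in combinations(sorted(scores), 2)
}

for (p, q), relation in relations.items():
    print(f"{p} vs {q}: {relation}")
```

With these made-up numbers, A and B both come out better than C, but A versus B is "incomparable": neither is ahead on both criteria, so no ordering of all three follows without a further choice.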

In practice, it is common to just add up the numbers, giving each criterion its own coefficient: a weight. But even that involves more of an arbitrary choice than merely choosing the coefficients. If we add up those numbers directly, we (tacitly or explicitly) assume that the relation is linear. But it’s quite possible that the number best-suited (if there is such a thing) for defining the generalized property we intend to describe goes with the square root, or with some power, or with a more complicated function of the numbers specifying the separate criteria.
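A small Python sketch shows how much hangs on those choices. Everything in it is an assumption made for illustration: the three candidates, their scores on two criteria, the weights, and the helper name `ranking`. The same data is ranked three times: with equal weights, with weights favoring the second criterion, and with equal weights applied to the square roots of the scores.

```python
import math

# Hypothetical candidates scored on two criteria (say, publications and teaching).
# All numbers are invented for illustration.
candidates = {
    "A": (16.0, 1.0),
    "B": (9.0, 5.0),
    "C": (4.0, 8.0),
}

def ranking(weights, transform=lambda value: value):
    """Rank candidates by a weighted sum of (optionally transformed) scores."""
    def total(scores):
        return sum(w * transform(s) for w, s in zip(weights, scores))
    return sorted(candidates, key=lambda name: total(candidates[name]), reverse=True)

print(ranking((0.5, 0.5)))             # ['A', 'B', 'C']
print(ranking((0.2, 0.8)))             # ['C', 'B', 'A']
print(ranking((0.5, 0.5), math.sqrt))  # ['B', 'A', 'C']
```

Three defensible-looking recipes, three different "best" candidates. Nothing in the data says which recipe is the right one.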

Whenever you define such a number, you should be aware of the choices involved, and of the different choices possible. Who, in such a situation, could be arrogant enough to take such an arbitrary construct, and slap on labels which, from their meaning in everyday life, bring with them associations of an absolute ordering: better than, worse than, best, second-best, third-best?

How come, for instance, that anyone still pays any attention to university rankings? Why should we even expect some combination of “Research”, “Citations”, “Teaching”, “Industry Income” and “International Outlook,” each difficult to measure on its own, to yield a meaningful number for which it makes sense to attach the label “the best,” and from which to derive a ranking? (Interestingly, this blog post here recently popped up in my Twitter timeline.)

The answer, of course, is: Many people do it that way. And some of the resulting rankings have become quite influential.

Ranking job candidates

Take a situation that is even more difficult: scientists applying for faculty positions. How do you rank the applicants, and decide whom to offer the job? I don’t know what is more error-prone: Emulating the university rankings, that is, coming up with some overall number that suggests objectivity, and going by that – or what appears to be the more common way, with individuals on the selection committee looking through candidates’ applications with an eye towards the given criteria, and making their own choices based on that.

I’ve been in hiring discussions like that, where the different weighting suddenly becomes important. I have had discussions with a colleague who favored one candidate, while I favored another, and quite naturally, the discussion shifted to the different weights given. The colleague was arguing to give more weight to one criterion (which favored their preferred candidate), while I was arguing for assigning more weight to a criterion that gave my preferred candidate the advantage. There need not have been anything sinister about all of this. I genuinely believed, and still believe, my argument was sound, and I have no reason to assume my colleague did not think the same of their argument. But it does open up the process to the influence of biases, even while all those involved consider themselves to be acting objectively, and with a view towards choosing the best (oops, there is that weasel word again) candidate.

Biases

Margo Seltzer at the Gender Gap in Science panel during the 7th HLF.

Some of the biases are conscious – there is nothing subtle about being told that girls cannot do physics anyway, or that the male students in the lecture hall are there to enrich science, while the women are there to make it more beautiful (to quote recent examples from my Twitter timeline). Other biases are unconscious. During the HLF panel debate, Seltzer gave us some homework: Complete two of the implicit bias tests at http://implicit.harvard.edu – something I can only recommend, even though the results tend to be disconcerting.

Studies of artificial situations in which professors were asked to make hiring decisions paint a mixed picture – some of the studies (such as this and this) found clear bias against women, while another one found the opposite. But all such experiments have one definite disadvantage: People may react differently when there’s nothing really at stake for them. Here, on the other hand, is a real-life study from my own field, astronomy: When NASA changed the rules for applications for observing with the Hubble Space Telescope, introducing a double-blind review that forced reviewers to consider the proposal on its direct scientific merits, instead of adding an evaluation of applicants’ previous track records and other factors to the mix, there was a flip: In the 18 previous years, proposals led by men had always had higher acceptance rates than those led by women. With the new double-blind evaluation, female-led proposals did (slightly) better than male-led proposals.

Combine biases with an artificial linear ordering, and you are likely to end up with people who are probably convinced that they have made an objective choice (“the best candidate”). But if we have a sufficient number of people making those choices who are biased against female applicants, that could well be an important part of why we have a gender gap for senior positions in science, which is typically larger than for junior positions or for student numbers. Helped by the travesty of insisting that there is a linear ordering for the multidimensional problem of evaluating applicants.

No silver bullet

Anonymization, as in the case of the Hubble proposal, will not be possible in the case of a faculty search. There’s no silver bullet that will solve the whole problem at once, although there are some tweaks. One step is becoming aware of the biases involved – and of the multi-dimensionality of the problem, and the problems with forcing multi-dimensional entities into a linear order.

Sometimes, one can change the procedure in a way that avoids some of the biases. To this end, Seltzer told the anecdote of a colleague at a major university, who is involved in hiring new faculty. That colleague had developed an interesting strategy: In the search phase, they would call a number of suitable experts and ask them to name what the experts saw as the three top candidates for the position. Almost invariably, the three names that came back would be those of white males. Then, the colleague would mention the need for diversity in the department, and ask for three names of good candidates who would make the department more diverse. The key was in the last question: The colleague would ask their respondents to rank all six names – and usually, the new order would not have the three original names in the first three places.

Sometimes, even a simple thing like that can produce a significant change in the results. Which is appalling, and should be sobering to those who insist that they are merely going by candidates’ merits to fill their positions. But it’s also hopeful that change is possible – if we think about it, act on what we see, and try to curb the influence of biases.

 

The post The gender gap and the travesty of assumed linear ordering first appeared on Heidelberg Laureate Forum.