Everyone talks about social media increasing polarization. I wonder if our ability to pre-screen our romantic partners has something to do with it as well. We can now ensure we don’t ever have to date people we don’t want to.
So the following letter is being widely reported online as if it were evidence for the importance of gun control. I’m skeptical of the results, as I detail in the next post, but even if one takes the results at face value the letter is pretty misleading and the media reporting is nigh fraudulent.
In particular, if one digs into the appendix to the letter one finds the following statement: “many of the firearm injuries observed in the commercially insured patient population may reflect non-crime-related firearm injuries.” This is unsurprising, since using health insurance data means you are only looking at patients wealthy enough to be insured and willing to report their injury as firearm-related: basically excluding anyone injured in the commission of a crime or who isn’t legally allowed to use a gun. Consistent with this, the authors also analyzed differences in crime rates and found no effect.
So even on its face this study would merely show that people who choose to use firearms are sometimes injured in that use. That might be a good reason to stay away from firearms yourself, but it is not an additional reason for regulation, as is being suggested in the media.
Moreover, if the effect is really just about safety at gun ranges, then it’s unclear whether the effect comes from lower use of such ranges during the convention or from the NRA conference encouraging greater care and best practices.
Also, I’m pretty skeptical of the underlying claim in the study. The size of the effect claimed is huge relative to the number of people who attend an NRA conference. About 40% of US households are gun owners, but only ~80,000 people attend the nationwide NRA convention: ~.025% of the US population, or ~.0625% of US gun owners. Thus, for this statistic to be true because NRA members are busy at the conference, we would have to believe NRA conference attendees are a whopping 320 times more likely to inflict a gun-related injury than the average gun owner.
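To make the arithmetic explicit, here is the back-of-the-envelope calculation in code. All inputs are the rough approximations from the paragraph above, not precise data, and the ~20% injury decline is back-solved from the 320x figure rather than taken from the letter directly.

```python
# Back-of-the-envelope check of the 320x claim. All inputs are rough
# approximations from the text above, not precise data.
us_population = 320_000_000    # ~US population
gun_owner_share = 0.40         # ~40% of households own guns (treated
                               # loosely here as a share of people)
attendees = 80_000             # ~NRA convention attendance

gun_owners = us_population * gun_owner_share
attendee_share = attendees / gun_owners
print(f"{attendees / us_population:.4%} of the population")  # ~0.025%
print(f"{attendee_share:.4%} of gun owners")                 # ~0.0625%

# If the reported injury decline (~20%, back-solved from the 320x
# figure) came entirely from attendees being busy at the convention,
# attendees' share of ordinary injuries must equal that decline:
implied_drop = 0.20
relative_risk = implied_drop / attendee_share
print(f"attendees would need to be ~{relative_risk:.0f}x as injury-prone "
      "as the average gun owner")
```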
Now if we restrict our attention to homicides this is almost surely not the case. Attending an NRA convention requires a certain level of financial wealth and political engagement, which suggests membership in a socioeconomic class less likely to commit gun violence than the average gun owner. And indeed, the study finds no effect in terms of gun-related crime. Even if we look to non-homicides, gun deaths from suicides far outweigh those from accidents, and I doubt those who go to an NRA convention are really that much more suicidally inclined.
An alternative, likely explanation is that the NRA schedules its conferences for times of the year when people are likely to be able to attend, and we are merely seeing seasonal correlations masquerading as effects of the NRA conference (a factor they don’t control for). Also, since they ran a number of subgroup analyses and don’t report the results for census tracts and other possible subgroups, the possibility of p-hacking is quite real. Looking at the graph they provide, I’m not exactly overwhelmed.
The claim gets harder to believe when one considers the fact that people who attend NRA meetings almost surely don’t give up going to firing ranges during the meeting. Indeed, I would expect (though haven’t been able to verify) that there are any number of shooting-range expeditions during the conference, which would actually mean many attendees are more likely to handle a gun during that time period.
Though, once one realizes that the data set under consideration covers only those who make insurance claims for gun-related injuries, the claim becomes slightly more plausible, but only at the cost of undermining its significance. Deaths and suicides are much less likely to produce insurance claims, and the policy implications aren’t very clear if all we are seeing is a reduction in people injured because of incorrect gun grips (see the MythBusters episode about this; such injuries can be quite serious).
Here is a bit of hard data to respond to the claims that observed performance on IQ tests by those in third world countries reflects genetic deficits. It’s a good thing too (even if this was hardly the first piece of evidence on the point), since it’s easy to imagine that the world could have turned out in a way in which (despite race not being a scientifically useful category) third world populations also suffered from genetic intelligence disadvantages. There is a decent case to be made that Ashkenazi Jews have genetic differences giving them higher average IQs. Notably, that case doesn’t merely depend on differences in performance on some tests but, like all compelling scientific arguments, weaves together an explanation of a number of different phenomena with an appealing theoretical account1. Whether or not this ultimately turns out to be true, it could have been true, and other undisputed cases of recent evolutionary pressure, like adult tolerance for lactose, make it abundantly clear that we got very lucky that there aren’t major differences in genetic predisposition to IQ across people of different descent. Seeing studies like this reassures me that we really did get lucky and it’s not just that we are laboring under a desirable fiction.
Even though our racial categories don’t correspond to any principled scientific division at the genetic level, they are classifications that correlate with one’s ancestry. Given that people still tend to choose mates relatively close to themselves genetically (whether because race is salient to them or merely through geographic proximity), it could easily have been the case that, even supposing all developmental and social effects are controlled for, some races would average much, much worse on IQ tests and other measures of intelligence than others. It wouldn’t really matter that race wasn’t the best scientific category to explain the effect if it turned out that 80% of people we classified as black had genes which cost them 20 IQ points while only 20% of Caucasians and 30% of Semites had this genetic combination. Such a fact would have amplified existing prejudices and resentments, making it much more difficult to roll back racist attitudes and laws. In such a world I doubt one of the 20% of blacks without those genes would have had much luck explaining to the white racists around them that no, no, black isn’t the appropriate scientific concept with which to analyze this effect; it’s really this other grouping they should be using, e.g., one which is purely defined via heredity and doesn’t exactly track our racial divisions but just happens to correlate with them.
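To spell out the arithmetic of this hypothetical (the carrier rates and 20-point penalty are the illustrative assumptions above, not real data), the implied group averages would look like this:

```python
# Group-average IQ deficits implied by the hypothetical above. The
# carrier rates and 20-point penalty are the post's illustrative
# assumptions, not real data.
carrier_rate = {"black": 0.80, "Caucasian": 0.20, "Semite": 0.30}
penalty = 20  # hypothetical IQ points lost by carriers

for group, rate in carrier_rate.items():
    print(f"{group}: average deficit {rate * penalty:.0f} IQ points")
# Prints 16, 4, and 6: a 12-point average gap between the first two
# groups, even though the causally relevant category is the gene, not race.
```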
One might try to argue that there is too much human genetic mixing for substantial genetic differences in IQ to have arisen. While it is true that for the most part humans haven’t partitioned themselves into non-interbreeding (or at least rarely interbreeding) sub-populations, that only holds for the most part and is itself purely a lucky accident. Australian aborigines appear to have been genetically isolated for almost 50,000 years, with that isolation only ending quite recently2. There is evidence that the San people in Africa may have split off from the rest of the human lineage at around the time modern homo sapiens first arrived on the scene and were then genetically isolated for nearly 100,000 years, until only 40,000 years ago. There is no scientific law that ensured there weren’t major genetically isolated branches of the human species, with substantially different intellectual abilities, which remained separated until the end of the middle ages. It didn’t have to be the case that America was populated by genetically modern humans3, and for less extreme cases one doesn’t even need genetic isolation at all. One can imagine a scenario in which the black death is even worse and attacks the nervous system, creating strong selective pressure in Europe for a mutation which protects against it despite its detrimental effects on IQ. I suppose one could argue that people are just too rapacious and generally willing to fuck each other for differences to have persisted during the historical era, but that is only true if all populations were subject to the same selective pressures, and one could certainly imagine a scenario in which only farmers and not hunter-gatherers (or vice versa) experience selection for the kind of mixed-blessing genes postulated to be more prevalent in the Ashkenazi.
Of course, if we learn enough about genetics and perform sufficiently high-powered studies we will probably come across some minor statistical difference in IQ between racial groups. If we assumed that humans were all otherwise genetically identical, the small IQ advantage observed in Ashkenazi Jews would be enough to ensure that sufficiently powerful studies would find some average difference. Of course we aren’t all otherwise genetically identical, and surely the beneficial and detrimental mutations won’t perfectly cancel out on average. But the fact that we haven’t already found substantial differences, and don’t even know who would come out on top if average differences are ever found, already means that we got incredibly lucky. It didn’t have to be that the HBD people were wrong; it didn’t even have to be that our racial categories didn’t track scientifically important genetic fault lines. Even though many of the HBD proponents seem desperately motivated to believe their theories (and not all for racist reasons…some just want to be contrarian), their views certainly describe a way the world could have been, and we got quite lucky that human capacities ended up sufficiently close together and that interbreeding smeared us out enough that we can’t obviously pick out the more and less capable major ethnic groups.
The Wonder of International Adoption: Adult IQ in Sweden | EconLog | Library of Economics and Liberty
In Selfish Reasons to Have More Kids, I showed that nurture effects are small within the First World. But I also freely conceded that the nurture effects of growing up outside the First World are probably large: The most important weakness…
Vox’s takeaway is,
All it takes to reduce support for housing assistance among Donald Trump supporters is exposure to an image of a black man.
Which they back up with the following description:
In a randomized survey experiment, the trio of researchers exposed respondents to images of either a white or black man. They found that when exposed to the image of a black man, white Trump supporters were less likely to back a federal mortgage aid program. Favorability toward Trump was a key measure for how strong this effect was.
If you look at the actual study, it’s chock full of warning signs. They explicitly did not find any statistically significant difference between Trump voters shown black versus white aid recipients in their degree of support for the program, the degree of anger they felt, or the blame they assigned toward those recipients. Given that this is the natural reading of Vox’s initial description, it’s already disappointing (Vox does elaborate to some extent, but not in a meaningfully informative way).
What the authors of the study actually did was ask for a degree of Trump support (along with many other questions, such as liberal/conservative identification, vote preference, and racial resentment, giving the researchers a worryingly large range of potential analyses they could have conducted). Then they regressed the conditional effect of the black/white prompt on the level of blame, support, and anger against degree of Trump support, controlling for a whole bunch of other crap (though they do claim ‘similar’ results without controls), and they use some dubious claims about this regression to justify their conclusions. This should already raise red flags about researcher degrees of freedom, especially given the pretty unimpressive R^2 values.
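For concreteness, here is a minimal sketch of the kind of interaction specification they describe. This is my reconstruction with synthetic data and invented variable names (black_prompt, trump_support, age), not the authors’ actual model, data, or controls.

```python
# Sketch of the kind of interaction regression described above: program
# support regressed on the prompt, candidate support, their interaction,
# and controls. Synthetic data; invented variable names.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "black_prompt": rng.integers(0, 2, n),   # 1 = shown a black recipient
    "trump_support": rng.uniform(0, 1, n),   # 0-1 support scale
    "age": rng.integers(18, 80, n),          # stand-in for "other controls"
})
# Synthetic outcome in which the prompt matters only via the interaction.
df["support"] = (0.5
                 - 0.3 * df.black_prompt * df.trump_support
                 + rng.normal(0, 0.2, n))

model = smf.ols("support ~ black_prompt * trump_support + age", data=df).fit()
print(model.summary().tables[1])
# The headline claim hangs on the black_prompt:trump_support interaction
# coefficient, not on a significant main effect of the prompt itself.
```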
But what should really make one skeptical is that the regression of Hillary support against the conditional effect of the black/white prompt shows a similar upward slope (visually the slope appears only slightly smaller for Hillary support than for Trump support), though at the extreme high end of Hillary support the 95% confidence interval just barely includes 0 while for Trump it just barely excludes it. Remember, as Andrew Gelman would remind us, the difference between significant and non-significant results isn’t itself significant, and indeed the study didn’t find a significant difference between how Hillary and Trump support interacted with the prompt in terms of degree of support for the program. In other words, if we take the study at face value it suggests, at only a slightly lower confidence level, that increasing support for Hillary makes one more racist.
So what should we make of this strange-seeming result? Is it really the case that Hillary support also makes one more racist, but the effect just couldn’t be captured by this survey? No, I think there is a more plausible explanation: the primary effect this study is really capturing is how willing one is to pick large numbers to describe one’s feelings. Yes, there is a real effect of showing a black person rather than a white person on support for the program (though it shows up as non-significant on its own in this study), but if you are more willing to pick large numbers on the survey this effect looks larger for you and thus correlates with degree of support for both Hillary and Trump.
To put this another way, imagine there are two kinds of people who answer the survey: Emoters and Non-emoters. Non-emoters keep all their answers away from the extremes, so the effect of the black/white prompt on them is numerically pretty small and they avoid expressing strong support for either candidate (support is a non-negative variable), while Emoters show both a large effect of the black/white prompt (because changes in their opinion result in larger numerical differences) and a greater likelihood of registering as a strong Trump or Hillary supporter.
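Here is a toy simulation of that story (all parameters invented for illustration): everyone’s underlying reaction to the prompt is the same latent shift, but Emoters use more of the response scale, so both their measured prompt effect and their expressed candidate support come out numerically larger.

```python
# Toy simulation of the Emoter/Non-emoter story. Parameters are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
scale = rng.choice([0.3, 1.0], size=n)   # response scale: Non-emoters vs Emoters
prompt = rng.integers(0, 2, size=n)      # 0 = white image, 1 = black image

latent_program = -0.5 * prompt + rng.normal(0, 1, n)   # same latent shift for all
latent_candidate = np.abs(rng.normal(0, 1, n))         # support is non-negative

expressed_program = scale * latent_program
expressed_candidate = scale * latent_candidate

# Split by expressed candidate support, mimicking a regression of the
# prompt effect against degree of support.
strong = expressed_candidate > np.median(expressed_candidate)
for mask, label in [(~strong, "low candidate support"),
                    (strong, "high candidate support")]:
    effect = (expressed_program[mask & (prompt == 1)].mean()
              - expressed_program[mask & (prompt == 0)].mean())
    print(f"{label}: measured prompt effect = {effect:+.2f}")
# The high-support group shows a larger measured prompt effect even though
# no one's underlying racial attitude varies with candidate support.
```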
This seems to me a far more plausible explanation than thinking that increasing Hillary support correlates with increasing racism, and I’m sure there are any number of other plausible alternative interpretations like this one. Yes, the study did seem to suggest some difference between Trump and Hillary voters in the slopes of the blame and anger regressions (but not support for the program), but this may reflect nothing more pernicious than the unsurprising fact that conservative voters are more willing to express high levels of blame and anger toward recipients of government aid.
However, even if you don’t accept my alternative interpretation, the whole thing is sketchy as hell. Not only do the researchers have far too many degrees of freedom for my comfort (both in the choice of regression to run and in the criteria for inclusion of subjects in the study), but the data itself was gathered via a very lossy survey process, creating the opportunity for all kinds of bias to enter the process. Moreover, the fact that all the results are about regressions is already pretty worrisome, as it is often far too easy to make strong-seeming statistical claims about regressions, a worry which is amplified by the fact that they don’t actually plot the data. I suspect that there is far more wrong with this analysis than I’m covering here, so I’m hoping someone with more serious statistical chops than I have, such as Andrew Gelman, will analyze these claims.
But even if we take the study’s claims at face value, the most you could infer (and technically not even this) is that there are somewhat more racists among strong Trump supporters than among those with low support for Trump, which is a claim so unimpressive it certainly doesn’t deserve a Vox article, much less support the description given. Indeed, I think it borders on journalistically unethical to show the graphs plotting the correlation between increasing Trump support and the prompt effect but not the ones showing similar effects for Hillary support. However, I’m willing to believe this is the result of the generally low standards for scientific literacy in journalism and the unfortunate impression that statistical significance is some magical threshold.
All it takes to reduce support for housing assistance among Trump supporters is exposure to an image of a black man. That’s the takeaway from a new study by researchers Matthew Luttig, Christopher Federico, and Howard Lavine, set to be published in Research & Politics.
Personally, I think the proposal to ‘change’ the p-value threshold for significant results from .05 to .005 is a mistake. The only sense in which this proposal has any real bite is if journals and hiring committees respond by treating research that doesn’t meet p < .005 as less important, but all that does is strengthen the incentives for exactly the kind of behavior causing all the problems.
I’d much rather have a well-designed (ideally pre-registered) trial at p < .05 than a p < .005 result that is cherry-picked via an after-the-fact choice of analysis. Rather than making the distinction between well-designed, appropriate methodology and dangerous, potentially misleading methodology more apparent, this proposal further obscures it, and it tells any scientist who was standing on principle to stop hoping their better methodology will be appreciated and do something to compete on p-value with papers published using problematic data analysis.
In particular, I think this kind of proposal doesn’t take sufficient account of the economics and incentives of researchers. Yes, p < .005 studies would be more convincing, but they also cost more (both in money and time), so by telling fledgling researchers they need p < .005 you force them to put all their eggs in one basket, making dubious data-analysis choices that much more tempting when a study fails to meet the threshold.
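To put rough numbers on that cost, here is a quick power calculation for a simple two-sample t-test at 80% power; the effect size (Cohen’s d = 0.3) is an arbitrary assumption for illustration.

```python
# Required sample size per group for a two-sample t-test at 80% power,
# comparing the .05 and .005 thresholds. Effect size d = 0.3 is an
# arbitrary illustrative assumption.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.005):
    n = analysis.solve_power(effect_size=0.3, alpha=alpha, power=0.8)
    print(f"alpha = {alpha}: n per group ~ {n:.0f}")
# Roughly 175 vs 295 per group: on the order of a 70% increase in
# subjects (and cost) for the same study design.
```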
What we need are more results-blind publication processes, in which journals decide to publish based merely on a description of the experimental process, without knowledge of what the results found. That would both help combat many of these biases and truly evaluate researchers on their ability rather than their luck. Ideally such studies would be pre-accepted before the results were actually analyzed. Of course there still needs to be a place for merely suggestive work that invites further research, but it should be regarded as such, without any particular importance assigned to its p-value.
However, as these are only my brief immediate thoughts, I’m quite open to potential counterarguments.
In a previous post I was very critical of a study claiming to show gender bias in journal publications in political science. Like too many studies of this kind, the data only supported the judgement of gender bias to the extent one was already inclined to believe gender bias was the appropriate explanation for gender disparities in the field. However, not all studies suffer from these flaws, so when I heard about a recent study in PNAS examining how an individual’s race affects how police treat them at traffic stops, and saw that it was well done, I thought I should post an example of the right way to engage in this kind of study (and of the important, unexpected information one gets when one studies bias rigorously).
What the authors of this paper did was take body camera footage from Oakland police officers in April 2014 and examine the vehicle stops they made. They had human raters (I presume college students) examine transcripts of the interactions (without knowledge of the officer’s or civilian’s race) and rate them for respectfulness, formality, friendliness, politeness, and impartiality. After determining that such ratings were repeatable (different raters tended to agree on scoring), they then trained a computational model to predict both respect and formality, which they verified against the human ratings.
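To make the setup concrete, here is a toy sketch of that kind of text-rating pipeline. This is not the paper’s actual model (they built carefully validated linguistic features); the utterances and scores below are invented purely for illustration.

```python
# Toy sketch of predicting a continuous "respect" rating from utterance
# text, in the spirit of the approach described above. Not the paper's
# actual pipeline; all utterances and scores are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

utterances = [
    "sorry to stop you sir, do you mind if I see your license?",
    "hands on the wheel, now",
    "thanks for your patience ma'am, you're free to go",
    "step out of the car",
]
respect_scores = [0.8, -0.6, 0.9, -0.4]   # hypothetical human ratings

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(utterances, respect_scores)
print(model.predict(["do you mind stepping out for me, sir?"]))
```

With that picture in mind, I’ll let the paper’s authors speak for themselves about the results: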
Controlling for these contextual factors, utterances spoken by officers to white community members score higher in Respect [β = 0.05 (0.03, 0.08)]. Officer utterances were also higher in Respect when spoken to older [β = 0.07 (0.05, 0.09)] community members and when a citation was issued [β = 0.04 (0.02, 0.06)]; Respect was lower in stops where a search was conducted [β = −0.08 (−0.11, −0.05)]. Officer race did not contribute a significant effect. Furthermore, in an additional model on 965 stops for which geographic information was available, neither the crime rate nor density of businesses in the area of the stop were significant, although a higher crime rate was indicative of increased Formality [β = 0.03 (0.01, 0.05)].
Note that the authors themselves raised the possibility that geographic region might play a confounding role (e.g., people in high-crime areas might be treated more suspiciously) and rejected it. However, one might still worry that any effect we are seeing is a result of minorities being more inclined toward criminal behavior and thus more frequently pulled over on suspicion of serious infractions, but that too is considered and rejected:
One might consider the hypothesis that officers were less respectful when pulling over community members for more severe offenses. We tested this by running another model on a subset of 869 interactions for which we obtained ratings of offense severity on a four-point Likert scale from Oakland Police Department officers, including these ratings as a covariate in addition to those mentioned above. We found that the offense severity was not predictive of officer respect levels, and did not substantially change the results described above. To consider whether this disparity persists in the most “everyday” interactions, we also reran our analyses on the subset of interactions that did not involve arrests or searches (N = 781), and found the results from our earlier models were fundamentally unchanged.
Finally, the paper’s authors are careful to acknowledge the limitations of their analysis, in particular with respect to identifying the cause of these disparities in treatment/language. On the possibility that differences in minority behavior are what cause officers to respond differently, they say:
The racial disparities in officer respect are clear and consistent, yet the causes of these disparities are less clear. It is certainly possible that some of these disparities are prompted by the language and behavior of the community members themselves, particularly as historical tensions in Oakland and preexisting beliefs about the legitimacy of the police may induce fear, anger, or stereotype threat. However, community member speech cannot be the sole cause of these disparities. Study 1 found racial disparities in police language even when annotators judged that language in the context of the community member’s utterances. We observe racial disparities in officer respect even in police utterances from the initial 5% of an interaction, suggesting that officers speak differently to community members of different races even before the driver has had the opportunity to say much at all.
I feel that this analysis considered, and fairly convincingly rejected, all the plausible confounders. Of course others might disagree and suggest some other factor, e.g., the expensiveness of the car, is responsible, but even if you are inclined to take such a line you have to admit that this study provides some pretty damn good evidence by ruling out many other plausible confounding variables.
Having said this, one should still be careful (as the authors of this paper are) in interpreting the results. In particular, we don’t have a good sense of the psychological reason for officers’ different behavior toward minorities. Is it because they judge them to be less deserving of respect? Or maybe officers expect minorities to be less respectful toward them and so begin the interaction less respectfully? Or some other explanation? If the goal is making the world a better place, and not merely assigning blame, those answers matter, and hopefully more good scientific studies will reveal them.
I’d like to close with what I take to be one of the most important reasons to do this research rigorously. While most people could probably have guessed that officers would be less respectful to minority drivers, it wasn’t at all obvious that officer race wouldn’t play a factor in respectfulness toward minorities. Nor was it obvious that we would see a difference in treatment from a broad swath of police officers, not merely from a few particularly biased officers. The reason this kind of research is important (in addition to validating minority claims about police treatment) is that we need to learn how and why minorities are treated differently if we are going to fix the problem. Without studies like this, I think many people’s natural assumption would be that hiring minority officers would address these problems. It doesn’t.
For a number of reasons I think it’s vital that we have a good empirical grip on the reasons why different genders are over- or under-represented in various disciplines and at various levels of acclaim within those disciplines. There is the obvious reason, namely that it is only through such an understanding that we can usefully discuss claims of unfairness and evaluate schemes to address those claims. If we get the reasons for under/over representation in various areas wrong, we not only risk failing to correct real instances of unfair treatment but also risk undermining the credibility of attempts to address unfair treatment more generally. This isn’t only about avoiding gender-based biases but, more broadly, about identifying ways in which anyone might face unjust hardship in pursuing their chosen career and succeeding at it1.
Also, even putting questions of fairness and discrimination to the side, there are important social and cultural reasons to care about these outcomes. For instance, the imbalance of men and women in STEM fields not only imposes personal hardships on both genders in those fields but also creates an excuse for dismissing the style of thinking developed by STEM disciplines. As such, identifying simple changes that could substantially increase female participation in STEM subjects is desirable in and of itself, and similar cultural considerations beyond mere fairness extend to other fields. However, I worry that incorrect interpretation of the empirical data could lead us to overlook such changes, especially when they don’t fit nicely into the default cultural narrative2.
The point is that I genuinely want to accurately identify the causes of gender differences in educational attainment and academic outcomes. One could be forgiven for thinking that we’ve already nailed down these causes. After all, every couple of months one sees a new study touted in the mainstream media claiming to show sexism playing a role in some educational or professional evaluation. Unfortunately, closer examination of the actual studies often reveals that they don’t support the interpretation provided, and everyone suffers from a misleading reading of the empirical data.
So, in an attempt to get a better picture of what the evidence tells us, every time I see a new study claiming to document gender bias or otherwise explain gender differentials in outcomes, I’m going to dive into the results and see if they support the claims made by the article. While I can’t claim that I’m choosing studies to examine in a representative fashion, I do hope that comparing the stated claims to what the data supports will help uncover the truth.
I ran across this claim that there is gender bias against female authors in political science in the Washington Post’s Monkey Cage blog. For once, the mainstream media deserves credit, because they accurately conveyed the claims made by the study.
The study claims to show gender bias in political science publication based on an analysis of published papers in the field. By coding the authors of published papers, the study gives us the following information about the rate of female publication.
The paper deserves credit for recognizing that this may reflect some degree of sorting by subfield, and that such sorting might falsely create the impression of bias even when none was present. However, any credit granted should be immediately revoked on account of the following argument:
However, what gendered sorting into subfields would not explain is the pattern we observe for the four “generalist” journals in our sample (AJPS, APSR, JOP and POP). These four journals—official journals either of the national association or one of its regional affiliates—are all “generalist” outlets, in that their websites indicate that they are open to submissions across all subfields. Yet, as figure 3 shows, women are underrepresented, against all three benchmarks, in three of those four “generalist” journals.
The mere fact that these are generalist journals in no way means that they are not more likely to publish some kinds of analysis rather than others. As the study goes on to observe, women are substantially underrepresented in quantitative and statistical work while overrepresented (at least compared to their representation at prestigious institutions) in qualitative work. Despite the study authors’ suggestion to the contrary, choosing, for valid intellectual (or even invalid but gender-unrelated) reasons, to value quantitative work more highly and publish it more readily doesn’t constitute gender bias in journal publication in the sense that their conclusions and ethical interpretations assume.
Ideally, the authors would have provided a more quantitative evaluation of what part of the observed effect is explained by choice of subfield and mode of analysis. However, I think it’s fair to say, based on the graph above, that women aren’t so overrepresented in qualitative publications that subfield preferences could explain everything, so let’s put the concern about subfield/analysis-type sorting to one side and return to the primary issue.
This paper also deserves praise for recognizing that simply comparing the percentage of women in the field with the percentage of prestigious publications authored by women will merely reflect the fact that past discrimination means the oldest, and most influential, segment of the discipline is disproportionately male. In other words, even assuming that all discrimination and bias magically vanished in the year 2000, one would still expect to find men being published and cited at a greater rate than women, for the simple reason that eliminating barriers to female participation skews female representation toward the less experienced parts of the discipline. By breaking down authors by their professorial rank, the study is able to minimize the extent to which this issue affects its conclusions.
Importantly, in the discussion section (and throughout the paper) the study makes it clear that it takes this result to be evidence of bias. The Monkey Cage post was quite right in understanding the paper to be alleging gender bias in publication. Yes, the study doesn’t claim to decide whether this bias results from female authors being rejected more frequently or from female authors being less likely to submit to the most prestigious journals, but in either case it assumes that the ultimate explanation is pernicious gender bias.
The paper also explores the issue of gender-based coauthorship and the relative prevalence of papers with all-male authors, mixed-gender authors, etc. These patterns are used to motivate various speculations about the fears women may face in choosing to coauthor, but the complete lack of any attempt to determine to what extent these patterns are simply the result of subfield and analysis-type preferences (e.g., quantitative and statistical analysis might lend itself more frequently to coauthorship) and the relevant percentages of women in those fields undermines any attempt to use this data to support such speculations. While I believe that female scholars do face real concerns about being insufficiently credited as coauthors, the ways such concerns could play out are so varied that I don’t think we can use this data to draw the conclusion the study authors do: that women aren’t benefiting equally from trends toward coauthorship. However, I’m going to set this issue aside.
At this point one might be inclined to think this paper should get pretty good marks. Sure, I’ve identified a few concerns that aren’t fully addressed, but surely it makes a pretty good case for the claim of gender bias in political science? Unfortunately, that’s simply a mirage created by thinking about the data in exactly one way. Notice that one could equally well use the same data and analysis to draw the conclusion: Women Hired in Political Science Despite Fewer Publications. After all, the way one gets professorial jobs is by publishing papers, and this data suggests that women at the same professional level have fewer publications than their male colleagues.
Now I think there are multiple plausible ways of resisting the conclusion that this data shows a bias in favor of women in hiring. For one, if past discrimination means that men and women at the same professional level haven’t had the same amount of time to write papers (e.g., women are more likely to have just gotten the job), then the conclusion is suspect. For another, one might point out that not all the jobs given the same professorial rank in the study are really equivalent. There are further reasons to doubt this conclusion, but each and every one of them equally well undermines any support this data provides for claims of gender bias.
Ultimately, I think it’s safe to say that while this study shows that women publish in influential journals at a rate lower than their representation in the political science profession would suggest, it does little to identify a cause. If you came into this with the prior that women are underrepresented in political science because they face bias and other obstacles, you’ll explain this effect in terms of bias and obstacles. In contrast, if you came in with the prior that women are still underrepresented in political science because of gender-related differences in ability/interest (which need not be negative; it could as well be a greater affinity for some rival career option), then the data are perfectly compatible with women gravitating toward more qualitative, less rigorous aspects of the profession and putting greater focus on teaching and other aspects of the profession that don’t result in publications.
Frankly, I don’t know enough about political science to have much of an opinion on this point one way or the other. However, I do think we can safely mark this study down as misleading, at least insofar as it is cited as further evidence of gender bias against women. Don’t get me wrong: I think that is a very plausible interpretation of the data, but in saying so I’d just be sharing the bias I came in with rather than being persuaded by evidence.