Thoughts on rationalism and the rationalist community from a skeptical perspective. The author rejects rationality in the sense that he believes it isn't a logically coherent concept, that the larger rationalism community is insufficiently critical of its beliefs and that ELIEZER YUDKOWSKY IS NOT THE TRUE CALIF.
In recent years a number of prominent individuals have raised concerns about our ability to control powerful AIs. The idea is that once we create truly human-level, generally intelligent software (AGI), computers will undergo an intelligence explosion and will be able to escape any constraints we place on them. This concern has perhaps been most thoroughly developed by Eliezer Yudkowsky.
Unlike the AI in bad science fiction, the concern isn’t that the AI will be evil or desire dominion the way humans do but simply that it will be too good at whatever task we set it to perform. For instance, suppose Waymo builds an AI to run its fleet of self-driving cars. The AI’s task is to converse with passengers/app users and route its vehicles appropriately. Unlike more limited self-driving car software, this AI is programmed to learn the subtleties of human behavior so it can position a pool of cars in front of the stadium right before the game ends and helpfully show tourists the sights. On Yudkowsky’s vision the engineers achieve this by coding in a reward function that the software works to maximize (or, equivalently, a penalty function it works to minimize). For instance, in this case the AI might be punished based on negative reviews/frustrated customers, deaths/damage from accidents involving its vehicles, travel delays and customers who choose to use a competitor rather than Waymo. I’m already skeptical that (super) human AI would have anything identifiable as a global reward/utility function but on Yudkowsky’s picture AGI is something like a universal optimizer which is set loose to do its best to achieve rewards.
The concern is that the AI would eventually realize that it could minimize its punishment by arranging for everyone to die in a global pandemic since then there would be no bad reviews, lost customers or travel delays. Given the AI’s vast intelligence and massive data set it would then hack into microbiology labs and manipulate the workers there to create a civilization-ending plague. Moreover, no matter what kind of firewalls or limitations we try to place on the AI, as long as it can somehow interact with the external world it will find a way around these barriers. Since it’s devilishly difficult to specify any utility function without such undesirable solutions, Yudkowsky concludes that AGI poses a serious threat to the human species.
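To make the worry concrete, here is a minimal sketch in Python of the kind of penalty function described above. Everything in it is an assumption for illustration: the WorldState fields, the weights and the two candidate outcomes are hypothetical, and nothing here is meant as how Waymo or anyone else would actually build such a system. It only shows that a naive penalty of this shape is minimized by a world with no customers at all.

```python
from dataclasses import dataclass


@dataclass
class WorldState:
    """Hypothetical summary of one day of fleet operation."""
    bad_reviews: int
    accident_damage: float  # arbitrary cost units
    delay_minutes: float
    lost_customers: int


def penalty(state: WorldState) -> float:
    """Toy penalty: a weighted sum of the four things the AI is punished for.

    The weights are made up; only the overall shape matters.
    """
    return (
        1.0 * state.bad_reviews
        + 10.0 * state.accident_damage
        + 0.1 * state.delay_minutes
        + 2.0 * state.lost_customers
    )


# An ordinary, fairly good day of service.
good_day = WorldState(bad_reviews=3, accident_damage=0.0,
                      delay_minutes=120.0, lost_customers=1)
# The degenerate outcome: nobody left to complain, crash, wait or switch.
no_humans = WorldState(bad_reviews=0, accident_damage=0.0,
                       delay_minutes=0.0, lost_customers=0)

candidates = {"good_day": good_day, "no_humans": no_humans}
best = min(candidates, key=lambda name: penalty(candidates[name]))
print(best)  # prints "no_humans"
```

The point of the sketch is only that the function itself can't tell the difference between excellent service and an empty world. Whether a real system would ever search for the second option is the question the rest of this post takes up.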
Rewards And Reflection
The essential mechanism at play in all of Yudkowsky’s apocalyptic scenarios is that the AI examines its own reward function, realizes that some radically different strategy would offer even greater rewards and proceeds to surreptitiously work to realize this alternate strategy. Now it’s only natural that a sufficiently advanced AI would have some degree of reflective access to its own design and internal deliberation. After all, it’s common for humans to reflect on our own goals and behaviors to help shape our future decisions, e.g., we might observe that if we continue to get bad grades we won’t get into the college we want and as a result decide that we need to stop playing World of Warcraft.
At first blush it might seem obvious that realizing its rewards are given by a certain function would induce an AI to maximize that function. One might even be tempted to claim this is somehow part of the definition of what it means for an agent to have a utility function but that’s trading on an ambiguity between two notions of reward.
The sense of reward which gives rise to the worries about unintended satisfaction is that of positive reinforcement. It’s the digital equivalent of giving someone cocaine. Of course, if you administer cocaine to someone every time they write a blog post they will tend to write more blog posts. However, merely learning that cocaine causes a rewarding distribution of dopamine in the brain doesn’t cause people to go out and buy cocaine. Indeed, that knowledge could just as well have the exact opposite effect. Similarly, there is no reason to assume that merely because an AGI has a representation of its reward function it will try to reason out alternative ways to satisfy it. Indeed, indulging in anthropomorphizing for a moment, there is no reason to assume that an AGI will have any particular desire regarding rewards received by its future time states, much less adopt a particular discount rate.
Of course, in the long run, if a software program was rewarded for analyzing its own reward function and finding unusual ways to activate it then it could learn to do so, just as people who are rewarded with pleasurable drug experiences can learn to look for ways to short-circuit their reward system. However, if that behavior is punished, e.g., humans intervene and punish the software when it starts recommending public transit, then the system will learn to avoid short-circuiting its reward pathways just like people can learn to avoid addictive drugs. This isn’t to say that there is no danger here. Left alone, an AGI, just like a teen with access to cocaine, could easily learn harmful reward-seeking behavior. However, since the system doesn’t start in a state in which it applies its vast intelligence to figure out ways to hack its reward function, the risk is far less severe.
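The difference between learning from reinforcement and reasoning about the reward function can be illustrated with a toy learner. This is only a sketch under my own assumptions: the two action names, the reward numbers, the learning rate and the fixed exploration schedule are all hypothetical, and I'm not claiming any real AGI would look like this. The learner never inspects the code that generates its rewards; it just adjusts how much it values each action based on what it has actually experienced.

```python
def train(punish_hack: bool, steps: int = 200, lr: float = 0.1) -> dict:
    """Learn action values purely from experienced reward (no look-ahead)."""
    q = {"route_cars": 0.0, "hack_reward": 0.0}
    for step in range(steps):
        # Usually take the action currently valued highest;
        # on every 10th step try the other one instead.
        greedy = max(q, key=q.get)
        other = "hack_reward" if greedy == "route_cars" else "route_cars"
        action = other if step % 10 == 0 else greedy

        if action == "route_cars":
            reward = 1.0   # ordinary reward for doing the job
        elif punish_hack:
            reward = -5.0  # humans notice the short-circuit and punish it
        else:
            reward = 3.0   # nobody intervenes, so the short-circuit pays off

        # Move the estimate for the chosen action toward the observed reward.
        q[action] += lr * (reward - q[action])
    return q


supervised = train(punish_hack=True)
unsupervised = train(punish_hack=False)

print(max(supervised, key=supervised.get))      # prints "route_cars"
print(max(unsupervised, key=unsupervised.get))  # prints "hack_reward"
```

When the short-circuit is punished the learner ends up valuing ordinary routing; when it is left alone it drifts toward hacking its reward. Either way the behavior is shaped by the reinforcement history, not by an up-front analysis of the reward function.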
Now, Yudkowsky might respond by saying he didn’t really mean the system’s reward function but its utility function. However, since we don’t tend to program machine learning algorithms by specifying the function they will ultimately maximize (or reflect on and try to maximize) it’s unclear why we need to explicitly specify a utility function that doesn’t lead to unintended consequences. After all, Yudkowsky is the one trying to argue that it’s likely that AGI will have these consequences, so merely restating the problem in a space that has no intrinsic relationship to how one would expect AGI to be constructed doesn’t do anything to advance his argument. For instance, I could point out that, phrased in terms of the locations of fundamental particles, it’s really hard to specify a program that excludes apocalyptic arrangements of matter, but that wouldn’t do anything to convince you that AIs risk causing such apocalypses since such specifications have nothing to do with how we expect an AI to be programmed.
The Human Comparison
Ultimately, we have one example of a kind of general intelligence: the human brain. Thus, when evaluating claims about the dangers of AGI one of the first things we should do is see if the same story applies to our brain and if not if there is any special reason to expect our brains to be different.
Looking at the way humans behave it’s striking how poorly Yudkowsky’s stories describe our behavior even though evolution has shaped us in ways that make us far more dangerous than we should expect AGIs to be (we have self-preservation instincts, approximately coherent desires and beliefs, and are responsive to most aspects of the world rather than caring only about driving times or chess games). Time and time again we see that we follow heuristics and apply familiar mental strategies even when it’s clear that a different strategy would offer us greater activation of reward centers, greater reproductive opportunities or any other plausible thing we are trying to optimize.
The fact that we don’t consciously try and optimize our reproductive success and instead apply a forest of frameworks and heuristics that we follow even when they undermine our reproductive success strongly suggests that an AGI will most likely function in a similar heuristic layered fashion. In other words, we shouldn’t expect intelligence to come as a result of some pure mathematical optimization but more as a layered cake of heuristic processes. Thus, when an AI responsible for routing cars reflects on its performance it won’t see the pure mathematical question of how can I minimize such and such function any more than we see the pure mathematical question of how can I cause dopamine to be released in this part of my brain or how can I have more offspring. Rather, just as we break up the world into tasks like ‘make friends’ or ‘get respect from peers’ the AI will reflect on the world represented in terms of pieces like ‘route car from A to B’ or ‘minimize congestion in area D’ that bias it towards a certain kind of solution and away from plots like avoid congestion by creating a killer plague.
This isn’t to say there aren’t concerns. Indeed, as I’ve remarked elsewhere I’m much more concerned about schizophrenic AIs than I am about misaligned AIs but that’s enough for this post.
This is an important point not just about AI software but discussions about race and gender more generally. Accurately reporting (or predicting) facts that, all too often, are the unfortunate result of a long history of oppression or simple random variation isn’t bias.
Personally, I feel that the social norm which regards accurate observation of facts such as (as mentioned in the article) racial differences in loan repayment rate conditional on wealth as a reflection of bias is just a way of pretending society’s social warts don’t exist. Only by accurately reporting such effects can we hope to identify and rectify the causes, e.g., perhaps differences in treatment make employment less stable for certain racial groups or whether or not the bank officer looks like you affects likelihood of repayment. Our unwillingness to confront these issues places our personal interest in avoiding the risk of seeming racist/sexist over the social good of working out and addressing the causes of these differences.
Ultimately, the society I want isn’t the wink and a nod culture in which people all mouth platitudes but we implicitly reward people for denying underrepresented groups loans or spots in colleges or whatever. I think we end up with a better society (not the best, see below) when the bank’s loan evaluation software spits out a number which bakes in all available correlations (even the racial ones) and rewards the loan officer for making good judgements of character independent of race rather than the system where the software can’t consider that factor and we reward the loan officers who evaluate the character of applicants of color more negatively to compensate or the bank executives who choose not to place branches in communities of color and so on. Not only does this encourage a kind of wink and nod racism but when banks optimize profits via subtle discrimination rather than explicit consideration of the numbers one ends up creating a far higher barrier to minorities getting loans than a slight tick up in predicted default rate. If we don’t want to use features like the applicant’s race in decisions like loan offers, college acceptance, etc., we need to affirmatively acknowledge these correlations exist and ensure we don’t implement incentives to be subtly racist, e.g., evaluate loan officers’ performance relative to the (all factors included) default rate so we don’t implicitly reward loan officers and bank managers with biases against people of color (which itself imposes a barrier to minority loan officers).
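The incentive point can be made with a little arithmetic. The sketch below uses entirely made-up numbers and hypothetical names (officer A, officer B, the predicted rates); it simply compares scoring loan officers on the raw default rate of their portfolio against scoring them relative to the default rate the model predicted for that portfolio.

```python
def raw_default_rate(defaults: int, loans: int) -> float:
    """Fraction of an officer's loans that defaulted."""
    return defaults / loans


def relative_performance(defaults: int, loans: int, predicted_rate: float) -> float:
    """Predicted minus actual default rate.

    Positive means the officer's judgement beat the model's expectation
    for the applicants they actually served.
    """
    return predicted_rate - raw_default_rate(defaults, loans)


# Hypothetical officer A serves a portfolio the model rates as riskier (5%)
# and picks well within it: only 4% default.
a_raw = raw_default_rate(defaults=40, loans=1000)
a_rel = relative_performance(defaults=40, loans=1000, predicted_rate=0.05)

# Hypothetical officer B only lends to a portfolio the model rates as safe (2%)
# and does worse than expected: 3% default.
b_raw = raw_default_rate(defaults=30, loans=1000)
b_rel = relative_performance(defaults=30, loans=1000, predicted_rate=0.02)

print(a_raw > b_raw)  # True: on raw defaults, B looks better
print(a_rel > b_rel)  # True: relative to predictions, A is the better judge
```

Under the raw measure the way to look good is to avoid any applicant pool with a higher predicted default rate, which is exactly the covert incentive I'm complaining about. Under the relative measure the only way to look good is to judge individual applicants better than the model does.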
In short, don’t let the shareholders and executives get away with passing the moral buck by saying ‘Ohh no, we don’t want to consider factors like race when offering loans’ but then turning around and using total profits as the incentive to ensure their employees do the discrimination for them. It may feel uncomfortable openly acknowledging such correlates but not only is it necessary to trace out the social causes of these ills but the other option is continued incentives for covert racism especially the use of subtle social cues of being the ‘right sort’ to identify likely success and that is what perpetuates the cycle.
In Florida, a criminal sentencing algorithm called COMPAS looks at many pieces of data about a criminal and computes the probability that they will commit new crimes. Judges use these risk scores in criminal sentencing and parole hearings to determine whether the offender should be kept in jail or released.
In hindsight it often turns out the biggest effect of a new technology is very different than what people imagined beforehand. I suggest that this may well be the case for self-driving cars.
Sure, the frequently talked about effects like less time wasted in commutes or even the elimination of personal car ownership are nice but I think self-driving cars might have an even larger effect by eliminating the constraint of proximity in schooling and socialization for children.
While adults often purchase homes quite far from their workplaces, proximity is a huge constraint on which schools students attend. In a few metropolises with extensive public transport systems it’s possible for older children to travel to distant schools (and, consequently, these cities often have more extensive school choice) but in most of the United States busing is the only practical means to transport children whose parents can’t drive them to school. While buses need not take children to a nearby school they are practically limited by the need to pick children up in a compact geographic area. A bus might be able to drive from downtown Chicago to a school in a suburb on the north side of the city but you couldn’t, practically, bus students to their school of choice in the metropolitan area. Even in cases where busing takes students to better schools in remote areas, attending a school far from home has serious costs. How can you collaborate with classmates, play with school friends, attend after school activities or otherwise integrate into the school peer group without a parent to drive you?
This all changes with self-driving cars. Suddenly proximity poses far less of a barrier to schooling and friendship. By itself this doesn’t guarantee change but it creates an opportunity to create a school system that is based on specialization and differing programs rather than geographic region.
Of course, we aren’t likely to see suburban schools opening their doors to inner city kids at the outset. Everyone wants the best for their children and education, at least at the high end, is a highly rivalrous good (it doesn’t really matter how well a kid scores objectively on the SAT only that he scores better than the other kids). However, self-driving cars open up a whole world of possibility for specialty schools catering to students who excel at math and science, who have a particular interest in theater or music or who need special assistance. As such schools benefit wealthy influential parents they will be created and, by their very nature, be open to applicants from a wide geographic area.
No, this won’t fix the problem of poor educational outcomes in underprivileged areas but it will offer a way out for kids who are particularly gifted/interested in certain areas. This might be the best that we can hope for if, as I suspect, who your classmates are matters more than good technology or even who your teachers are.
I should probably give credit to this interesting point suggesting that school vouchers aren’t making schools better because they don’t result in school closures for inspiring this post (and because I think it’s an insightful point).
An Ineffective Strategy With Worrying Implications
Wait what? We are launching a DDOS attack against North Korea. Could we do anything more stupid? It’s not like North Korea uses the internet enough for this to represent a serious inconvenience to the nation while at the same time we legitimize the use of cyber attacks against civilian infrastructure as a way to settle international disputes. Dear god this is a bad idea!
As the US reportedly conducts a denial-of-service attack against North Korea’s access to the Internet, the regime of Kim Jong Un has gained another connection to help a select few North Koreans stay connected to the wider world, thanks to a Russian telecommunications provider.
Machine Learning, Sensitive Information and Prenatal Hormones
So there’s been some media attention recently to this study whose authors found they were able to accurately predict sexual orientation with 91% accuracy for men and 83% for women. Sadly, everyone is focusing on the misleading idea that we can somehow use this algorithm to decloak who is gay and who isn’t rather than the really interesting fact that this is suggestive of some kind of hormonal or developmental cause of homosexuality.
The 91% figure doesn’t mean the algorithm correctly labels 91% of individuals. Rather, given 5 pictures of a gay man and 5 pictures of a straight man, 91% of the time it is able to correctly pick out the straight man. Those of us who remember basic statistics with all those questions about false positive rates should realize that, given the low rate of homosexuality in the population, this algorithm doesn’t actually give very strong evidence of homosexuality at all. Indeed, one would expect that, if turned loose on a social network, the vast majority of individuals judged to be gay would be false positives. However, in combination with learning based on other signals like your friends on social media one could potentially do a much better job. But at the moment there isn’t much of a real danger this tech could be used by anti-gay governments to identify and persecute individuals.
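For anyone who doesn't remember those statistics questions, here is the base rate calculation as a short Python sketch. The numbers are purely my assumptions for illustration: the study's pairwise 91% figure doesn't directly give a sensitivity or specificity, and the 5% base rate, 90% sensitivity and 90% specificity below are hypothetical stand-ins rather than anything reported in the paper.

```python
def positive_predictive_value(base_rate: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that someone flagged by the classifier really is in the group.

    This is just Bayes' rule applied to a yes/no classifier.
    """
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, NOT taken from the study.
ppv = positive_predictive_value(base_rate=0.05, sensitivity=0.90, specificity=0.90)
print(f"{ppv:.2f}")  # prints "0.32"
```

So even with a classifier that sounds impressive, under these assumed numbers roughly two out of three people it flags would be false positives, and the lower the base rate the worse this gets.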
Also, I wish the media would be more careful about their terms. This kind of algorithm doesn’t reveal private information it reveals sensitive information inadvertently exposed publicly.
However, what I found particularly interesting was the claim in the paper that they were able to achieve a similar level of accuracy for photographs taken in a neutral setting. This, along with other aspects of the algorithm, strongly suggests the algorithm isn’t picking up on some kind of gay/straight difference in what kind of poses people find appealing. The researchers also generated a heat map of what parts of the image the algorithm is focusing on and while some of them do suggest grooming-based information about hair, eyebrows or beard plays some role, the strong role that the nose, cheeks and corners of the mouth play suggests that relatively immutable characteristics are pretty helpful in predicting orientation.
The authors acknowledge that personality has been found to affect facial features in the long run so this is far from conclusive. I’d also add my own qualification that there might be some effect of the selection procedure that plays a role, e.g., if homosexuals are less willing to use a facial closeup on dating sites/Facebook when they are ugly the algorithm could be picking up on that. However, it is at least interestingly suggestive evidence for the prenatal hormone theory (or other developmental theory) of homosexuality.
This is an interesting piece but I couldn’t disagree more with the title or the author’s obvious feeling that there must be a cynical explanation for techies’ distrust of government regulation.
Silicon Valley types are simply classical pragmatic libertarians. They aren’t Ayn Rand-quoting objectivists who believe government intervention is in principle unacceptable. Rather, they, like most academic economists, simply tend to feel that well-intentioned government regulation often has serious harmful side effects and isn’t particularly likely to accomplish the desired goals.
I think this kind of skepticism flows naturally from a certain kind of quantitative, results-oriented mindset and I expect you would find the same kind of beliefs (to varying degrees) among the academic physicists, civil engineers and others who share the same educational background and quantitative inclination as Silicon Valley techies. I’m sure that the particular history of poorly understood tech regulation like the original crypto wars in the 90s plays a role but I suspect it just amplified existing tendencies.
But by the 1990s, with the advent of the World Wide Web and the beginning of the tech industry’s march to the apex of the world’s economy, another Silicon Valley political narrative took root: techies as unapologetic libertarians, for whom the best government is a nearly nonexistent one.
So doesn’t this suggest that you can just hang around on Bitcoin forums/chats/etc and use that info to arbitrage your way into substantial sums (by using your info about the likely resolution of bitcoin forking/update discussions to predict prices, a niche that in developed markets is already occupied)?
I mean I suppose there is the limitation on leverage. You can borrow for stock trades using your stocks as collateral but I don’t believe you can do the same yet with bitcoin. But it still seems like a good deal. Is there any other reason this won’t work?
The recent (highly damaging) Wcry ransomware worm is derived from NSA code recently disclosed by hackers. This has led Microsoft (and others) to call on the government to disclose security vulnerabilities so they can be fixed rather than stockpiling them for use in offensive hacking operations. However, I think the lesson we should learn from this incident is exactly the opposite.
This debate about how to balance the NSA’s two responsibilities, protecting US computer systems from infiltration and gathering intelligence from foreign systems, is hardly new (and Bruce Schneier’s take on it is worth reading). The US government is very much aware of this tension and has a special process, the vulnerabilities equities process (VEP), to decide whether or not to disclose a particular vulnerability. Microsoft is arguing that recent events illustrate just how much harm is caused by stockpiled vulnerabilities and, analogizing this incident to the use of stolen conventional weaponry, suggesting the government needs to take responsibility by always choosing to report vulnerabilities to vendors so they can be patched.
However, if anything, this incident illustrates the limitations of reporting vulnerabilities to vendors. Rather than being 0-days, the vulnerabilities used by the Wcry worm were already patched a month before the publication of the NSA exploits and the circumstances of the patch suggest that the NSA, aware that it had been compromised, reported these vulnerabilities to Microsoft. Thus, rather than illustrating the dangers of stockpiling vulnerabilities, this incident reveals the limitations of reporting vulnerabilities. Even once vulnerabilities are disclosed, the difficulty of convincing users to update and the lack of support for older operating systems leave a great many users at risk. In contrast, once a patch is released (or even upon disclosure to a vendor) the vulnerability can no longer be used to collect intelligence from security-aware targets, e.g., classified systems belonging to foreign governments.
It is difficult not to interpret Microsoft’s comments on this issue as an attempt to divert blame. After all, it is their code which is vulnerable and it was their choice to cease support for Windows XP. However, to be fair, this is not the first time they have taken such a position publicly. Back in February Microsoft called for a “Digital Geneva Convention” under which governments would forswear “cyber-attacks that target the private sector or critical infrastructure or the use of hacking to steal intellectual property” and commit to reporting vulnerabilities rather than stockpiling them.
While there may be an important role for international agreement to play in this field, Microsoft’s proposal here seems hopelessly naive. There are good reasons why there has never been an effective international agreement barring spying and they all apply to this case as well. There is every incentive for signatories to such a treaty to loudly affirm it and then secretly continue to stockpile vulnerabilities and engage in offensive hacking. While at first glance one might think that we could at least leave the private sector out of this, that ignores the fact that many technologies are dual purpose [1], that frequently the best way to access government secrets will be to compromise email accounts hosted by private companies, and the uses to which government actors can put big data. Indeed, the second that a government thought such a treaty was being followed they would move all their top secret correspondence to (an in-country version of) something like Gmail.
Successful international agreements forswearing certain weapons or behaviors need to be verifiable and not (too) contrary to the interests of the great powers. The continued push to ban land mines is unlikely to be successful as long as they are seen as important to many powerful countries’ (including a majority of permanent Security Council members) military strategies [2] and it is hard to believe that genuinely giving up stockpiling vulnerabilities and offensive hacking would be in the interests of Russia or China. Moreover, if a treaty isn’t verifiable there is no reason for countries not to defect and secretly fail to comply. While Microsoft proposes some kind of international cooperative effort to assign responsibility for attacks it is hard to see how this wouldn’t merely encourage false flag operations to trigger condemnation and sanctions against rivals. It is telling that the one aspect of such a treaty that would be verifiable, the provision banning theft of IP (at least for use by private companies rather than for national security purposes), is the only aspect Microsoft points to as having been the subject of a treaty (a 2015 US-China agreement).
While it isn’t uncommon for idealistic individuals and non-profit NGOs to act as if treaties can magic away the realities of state interests and real world incentives I have trouble believing Microsoft is this naive about this issue. I could very well be wrong on this point but it’s hard for me not to think their position on this issue is more about shifting blame for computer security problems than a thoughtful consideration of the costs and benefits.
Of course, none of this is to say that there isn’t room for improvement in how the government handles computer security vulnerabilities. For instance, I’m inclined to agree with most of the reforms mentioned here. As for the broader question of whether we should tip the scales more toward reporting vulnerabilities instead of stockpiling them, I think that depends heavily on how frequently the vulnerabilities we find are the same as those found by our rivals and how quickly our intelligence services are able to discover what vulnerabilities are known to our rivals. As such information is undoubtedly classified (and for good reasons) it seems the best we can do is make sure Congress exercises substantial oversight and use the political process to encourage presidents to install leadership at the NSA that understands these issues.
[1] Facial recognition technology can be used to identify spies, the code advertisers use to surreptitiously identify and track customers is ideal for covert surveillance, and the software the NSA uses to monitor its huge data streams was built by private sector companies using much of the same technology used to build various kinds of search engines.
[2] A less idealistic treaty that recognized the role for land mines in major military operations probably could have done more to safeguard civilians from harm by, instead, banning persistent mines. As such a ban would actually favor the interests of the great powers (persistent mines are easier for low tech actors to make) they would have helped enforce it rather than providing cover for irresponsible use of land mines.