Wednesday, August 20, 2014

The Journal Club #2 - Karlsen on God and the Benefits of Existence



Welcome to this, the second edition of the Philosophical Disquisitions Journal Club. The goal of the journal club is to encourage people to read, reflect upon, and debate some of the latest works in philosophy. The club focuses on work being done in the philosophy of religion (broadly defined). This month we’re looking at the following paper:




The paper tries to fuse traditional concerns about the problem of evil with recent work in population ethics. The result is an interesting, and somewhat novel, atheological argument. As is the case with every journal club, I will try to kick start the discussion by providing an overview of the paper’s main arguments, along with some questions you might like to ponder about its effectiveness.


1. The Dialectical Context
One of the key features of Karlsen’s paper is his attempt to embed his argument in the debate about the problem of evil. It may be insulting to readers’ intelligence to recapitulate the basic features of that debate here, but some scene-setting is in order. So here we go…

Traditionally, the problem of evil can be run in two different ways. The logical problem of evil (LPE) tries to argue that there is a logical (or maybe metaphysical) incompatibility between the existence of any amount of evil/suffering and the existence of God; the evidential problem of evil tries to argue that the existence of evil/suffering provides evidence against the existence of God (with the weight of that evidence going up in direct proportion to the amount of evil in the world). For better or worse, Karlsen concentrates his energies on the logical version.

In doing so, he looks at the classic rebuttal of the LPE from Alvin Plantinga. Plantinga famously pointed out that the atheist’s threshold for success in this argument is pretty high. The atheist needs to show that God could never have a morally sufficient reason for creating beings that suffer, i.e. that there is no possible world in which the existence of God is compatible with the existence of suffering. But it is doubtful that the atheist could ever successfully show this. Indeed, there is good reason to think that there is at least one possible world in which God and suffering co-exist. To prove this point, Plantinga deploys his famous “Free Will Defence” (FWD), which argues that human free will is a good conferred by God on human beings, and that this particular good may entail the existence of suffering in at least one possible world.

Karlsen takes a broad perspective on this debate, using the back-and-forth between Plantinga and defenders of the LPE as an illustration of a problem he has with other proposed theodicies. The problem is that theists participating in this debate tend to assume that (a) God is a moral agent whose behaviour can be judged according to moral standards; and (b) that God had a morally sufficient reason for creating human beings who suffer. Karlsen doubts that both assumptions can be consistently sustained. He thinks that theists are working with an impoverished understanding of what counts as a morally sufficient reason for creation and are neglecting the implications of one widely accepted moral standard.


2. The Argument: God Did Not Benefit Us through Creation
So how does Karlsen’s argument work? It starts with the claim that there is one moral principle that nearly everyone can agree on. This is the principle of benevolence. In broad terms, this principle holds that a moral agent ought never to harm another being with moral status; they ought only to benefit them. More precisely (and for the purposes of Karlsen’s argument) it entails the following:

Principle of Benevolence: requires that harm be avoided unless its avoidance implies greater harm or deprivation of benefits that outweigh the harm.

Under this formulation, it is not the case that a moral agent can never harm another being, it is just that they cannot harm another being unless that harm is outweighed by greater benefits. There are some classic analogies that help us to understand the idea. If I am a parent, I can justifiably bring my child for a painful round of vaccinations (or other medical therapy). This is because the harm inflicted in the course of those treatments is outweighed by the benefit they bestow on the child’s long-term health.

All of this seems pretty reasonable, and unless one is a fideist or Ockhamist, one would probably agree that God is also bound by the principle of benevolence. Nevertheless, this gives rise to a problem. For, according to Karlsen, it is not the case that God actually benefitted us through the act of creation. Indeed, he may have harmed us by doing so. Consequently, he breached the principle of benevolence. And surely no being worthy of the name “God” would have breached this principle? To put these thoughts into a more formal argument:


  • (1) A true God would meet any requirement entailed by the principle of benevolence (because of omnipotence, omniscience and omnibenevolence)
  • (2) Benevolence requires that harm be avoided unless its avoidance implies greater harm or deprivation of benefits that outweigh the avoided harm.
  • (3) Existent beings that suffer are harmed by existence.
  • (4) The never-existent cannot be deprived of benefits.
  • (5) Therefore, a true God would not have created beings that suffer.
  • (6) There do exist beings that suffer.
  • (7) Therefore, God does not exist.


(Note: this is my reconstruction of the argument. It differs slightly from the more cumbersome version set out by Karlsen in the article, but I think it retains the basic logic and the key premises).
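To see why the reconstruction is valid, it may help to lay the skeleton out semi-formally (this is my own gloss, not Karlsen's notation). Let $G$ stand for "a true God exists" and $S$ for "beings that suffer exist". Premises (1)-(4) are together meant to establish the conditional in (5), i.e. $G \rightarrow \neg S$; premise (6) asserts $S$; and the conclusion (7) then follows by modus tollens:

\[
G \rightarrow \neg S, \qquad S \;\;\vdash\;\; \neg G
\]

Whatever one thinks of the premises, the step from (5) and (6) to (7) is unimpeachable. The real action, as we will see, is over premise (4).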

What should we make of this argument? Obviously, there are a variety of ways for the theist to respond. They could deny (1) or (2), holding that God is not bound by the principle of benevolence or that we don’t have the right understanding of that principle. This would be pretty drastic, either amounting to a form of Ockhamism (“God determines his own moral standards”) or sceptical theism (“there are beyond-our-ken moral reasons for God doing what he does”). Both of those positions have their problems, some of which I have addressed in the past.

The more plausible theistic response would be to deny premise (4). After all, that premise relies on a controversial thesis, viz. that coming into existence cannot be a benefit to the being that comes into existence. Furthermore, that premise is crucial to the inference to the intermediate conclusion stated in (5). It is only if creation benefits the otherwise non-existent that the suffering involved in existence could be morally justified. Let’s look at what can be said about this thesis now.


3. Is Existence a Benefit?
It is very common for people to assume that existence is better than non-existence; that being alive is, all things considered, a benefit to the living. As Bertrand Russell once put it, this is the reason why children are “exhorted to be grateful to their parents”. But how credible is the idea? At this point in the paper, Karlsen reviews a number of debates and positions in population ethics. I won’t go through all of them here. I’ll just highlight some of the main conclusions.

First, and following the views of Nils Holtug, there does seem to be one obvious way in which we can deny premise (4). We can say that if we cause a person to come into existence whose life is, on balance, worth living, we are benefitting them. And vice versa. Thus, if we cause a person to come into existence and their life is nothing but unending misery and suffering, we are not benefitting them; but if their life is full of joy and satisfaction, then we are. This view assumes symmetry between the harms and benefits of existence and non-existence. In other words, it assumes that existence can be, on balance, beneficial, and that non-existence can be, on balance, a deprivation of benefits (and vice versa). Call this the symmetry thesis.

The symmetry thesis is somewhat attractive, and does seem to track how people think about the benefits of existence. Still, it has a number of problems. For starters, it would seem to run into the non-identity problem. This problem has to do with the fact that a subject needs to exist before we can make claims about what will harm and/or benefit that subject. All talk about the harms and benefits of non-existence seems odd because there is no subject in existence who can be harmed or benefitted. Holtug (and others) get around this problem by claiming that we are not talking about individual agents but rather classes of possible agents (i.e. we are not talking about different states of the same person but about different states of affairs in general). Thus, it is not that existence or non-existence is beneficial for a particular agent, but rather that it can be for agents as a general class. This may skirt the non-identity problem, but it then runs directly into another one. The general class of non-existent agents is very large indeed. It could be that we are failing to benefit all members of this general class by not bringing them into existence. Should we be concerned? Should there be a massive push to ensure that all agents with potentially beneficial lives are brought into being? This would seem odd indeed (Parfit’s “repugnant conclusion” becomes relevant here).

There are various possible refinements to the symmetry thesis, some of which are discussed by Karlsen in the article. In the end he thinks they all run into the same basic problem: they seem to lead to absurd (and repugnant) views about our obligation to procreate. For if non-existence can be harmful (either in person-affecting terms or in impersonal terms) it would seem like a moral agent would have to do something about it. This leads him to consider the case for the alternative asymmetry thesis. According to the asymmetry thesis, there is reason to think that non-existence is very different from existence. Specifically, there is reason to think that non-existence is not harmful (or benefit-depriving).

In defending this view, Karlsen appeals to Christoph Fehige’s “anti-frustrationism” thesis. This thesis starts from the presumption that what matters in terms of harm and benefit is preference satisfaction. A person is benefitted when their preferences are satisfied and they are harmed when their preferences are frustrated. At first glance, this would seem to once again entail the symmetry thesis. After all, when we look at existence, we can say that it is impersonally good if people exist and have their preferences satisfied; contrariwise, when we look at non-existence, we can say that, from an impersonal perspective, it is bad that no preferences are being satisfied.

But that is not the right way to look at it. Not according to Fehige anyway. For there is a critical difference between the existent state and the non-existent one. In the existent state, agents are in existence and have certain preferences. It is then, of course, harmful if those agents have their preferences frustrated. But that is only because they already exist and have preferences. In the non-existent state, no agents ever come into existence with preferences. Consequently, there is no preference frustration in the non-existent state. This is crucial because, according to Fehige’s account, what matters when it comes to evaluating the moral worth of different states of affairs is the amount of preference frustration going on in those states of affairs, not the amount of preference satisfaction. A world in which no agents exist and no preferences are frustrated is, on his view, on a par with a world in which lots of agents exist and their preferences are satisfied. This is the anti-frustrationist thesis.
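To make the contrast concrete, here is a minimal toy sketch of how a symmetric preference view and an anti-frustrationist view score the same worlds. This is my own illustration, not Fehige's or Karlsen's formalism; the worlds and tallies are invented purely for the example.

```python
# Toy illustration of the difference between a symmetric preference view
# and Fehige-style anti-frustrationism. Each "world" is just a tally of
# satisfied and frustrated preferences; the numbers are made up.

def symmetric_value(satisfied, frustrated):
    # Symmetric view: satisfaction counts as a good, frustration as a bad.
    return satisfied - frustrated

def antifrustrationist_value(satisfied, frustrated):
    # Anti-frustrationism: only frustrated preferences lower a world's value;
    # a satisfied preference is no better than a preference that never existed.
    return -frustrated

empty_world = (0, 0)        # no agents, no preferences
happy_world = (1_000, 0)    # many agents, all preferences satisfied
mixed_world = (1_000, 200)  # many agents, some preferences frustrated

for name, (sat, fru) in [("empty", empty_world),
                         ("happy", happy_world),
                         ("mixed", mixed_world)]:
    print(f"{name:>6}: symmetric = {symmetric_value(sat, fru):>5}, "
          f"anti-frustrationist = {antifrustrationist_value(sat, fru):>5}")
```

On the symmetric view the happy world beats the empty world (1000 vs 0), so non-existence looks like a deprivation of benefits. On the anti-frustrationist view the empty world and the happy world are on a par (both score 0), which is exactly the parity Fehige asserts.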

This then gives Karlsen what he needs to defend premise (4) of his argument. If anti-frustrationism is true, then it is also true that the non-existent are not being deprived of benefits. It would then follow that God lacked a sufficient moral reason to bring agents into existence, particularly ones that seem to suffer greatly. Consequently, we have reason to suspect that God would not bring such agents into existence and so the fact that we exist and we suffer gives reason to think that God does not exist.


4. Questions
I have to say, I like several features of Karlsen’s argument. I like how he draws from the literature on population ethics, and I like how he draws our attention to the issue of God’s moral reasons for bringing us into existence. Nevertheless, I find the paper a little bit messy, as if it were trying to do too much in too short a space. Here are a couple of questions I think might be worth pondering:


  • 1. Does Karlsen succeed in connecting his argument to the traditional debate about the problem of evil? More precisely, does he succeed in showing that most theodicies rely on the assumption that God benefitted us by bringing us into existence?


  • 2. Is anti-frustrationism a plausible moral thesis? The claim is that it follows from general considerations of welfare and resolves certain problems with other welfarist views, but does it really? Does it lead to problems of its own?


What do you think? Please feel free to comment below.

Monday, August 18, 2014

Bostrom on Superintelligence (6): Motivation Selection Methods



(Series Index)

This is the sixth part in my series on Nick Bostrom’s recent book Superintelligence: Paths, Dangers, Strategies. The series is covering those parts of the book that most interest me. This includes the sections setting out the basic argument for thinking that the creation of superintelligent AI could threaten human existence, and the proposed methods for dealing with that threat.

I’m currently working through chapter 9 of the book. In this chapter, Bostrom describes the “Control Problem”. This is the problem that human engineers and developers have when they create a superintelligence (or, rather, when they create the precursor to a superintelligence). Those engineers and developers will want the AI to behave in a manner that is consistent with (perhaps even supportive of) human flourishing. But how can they ensure that this is the case?

As noted last time, there are two general methods of doing this. The first is to adopt some form of “capability control”, i.e. limit the intelligence’s capabilities so that, whatever its motivations might be, it cannot pose a threat to human beings. We reviewed various forms of capability control in the previous post. The second is to adopt some form of “motivation selection”, i.e. ensure that the AI has benevolent (or non-threatening) motivations. We’ll look at this today.

In doing so, we’ll have to contend with four possible forms of motivation selection. They are: (i) direct specification; (ii) domesticity; (iii) indirect normativity; and (iv) augmentation. I’ll explain each and consider possible advantages and disadvantages.


1. Direct Specification
The direct specification method — as the name suggests — involves directly programming the AI with the “right” set of motivations. The quintessential example of this is Isaac Asimov’s three (or four!) laws of robotics, from his “Robot” series of books and short stories. As you may know, in these stories Asimov imagined a future in which robots are created and programmed to follow a certain set of basic moral laws. The first one being “A robot may not injure a human being or allow, through inaction, a human being to come to harm”. The second one being “A robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law”. And so on (I won’t go through all of them).

At first glance, laws of this sort seem sensible. What could go wrong if a robot was programmed to always follow Asimov’s first law? Of course, anyone who has read the books will know that lots can go wrong. Laws and rules of this sort are vague, open to interpretation. In specific contexts they could be applied in very odd ways, especially if the robot has a very logical or literalistic mind. Take the first law as an example. It says that a robot may not, through inaction, allow any human to come to harm. This implies that the robot must be at all times seeking to avoid possible ways in which humans could come to harm. But humans come to harm all the time. How can we stop it? A superintelligent robot, with a decisive advantage over human beings, might decide that the safest thing to do would be to put all humans into artificially induced comas. It wouldn’t be great for them, but it would prevent them from coming to harm.

Now, you may object and say that this is silly. We could specify the meaning of “harm” in such a way that it avoids the induced-coma outcome. And maybe we could, but as Bostrom points out, quoting Bertrand Russell, “everything is vague to a degree you do not realize till you have tried to make it precise”. In other words, adding one exception clause or one degree of specification doesn’t help to avoid other possible problems with vagueness. Anyone who has studied the development and application of human laws will be familiar with this problem. The drafters of those laws can never fully anticipate every possible future application: the same will be true for AI programmers and coders.

There is, in fact, a more robust argument to be made here. I articulated it last year in one of my posts on AI-risk. I called it the “counterexample problem”, and based it on an argument from Muehlhauser and Helm. I’ll just give the gist of it here. The idea behind the direct specification method is that programming intelligences to follow moral laws and moral rules will ensure a good outcome for human beings. But every moral law and rule that we know of is prone to defeating counterexamples, i.e. specific contextual applications of the rule that lead to highly immoral and problematic outcomes. Think of the classic counterexamples to consequentialism, which suggest that following that moral system could lead someone to kill one person in order to harvest his/her organs for five needy patients. Humans usually recoil from such outcomes because of shared intuitions or background beliefs. But how can we ensure that an AI will do the same? It may be free from our encumbrances and inhibitions: it may be inclined to kill the one to save the five. And since all moral theories are subject to the same counterexample problem, directly specifying any one of them looks like a risky way to secure good outcomes.

I should note that Richard Loosemore has recently penned a critique of this problem, but I have not yet had the time to grapple with it.


2. Domesticity
The second suggested method of motivation selection is called “domesticity”. The analogy here might be with the domestication of wild animals. Dogs and cats have been successfully domesticated and tamed from wild animals over the course of many generations; some wild animals can be successfully domesticated over the course of their lifespan (people claim this for all sorts of creatures though we can certainly doubt whether it is really true of some animals, e.g. tigers). Domestication means that the animals are trained (or bred) to lack the drive or motivation to do anything that might harm their human owners: they are happy to operate within the domestic environment and their behaviour can be controlled in that environment.

The suggestion is that something similar could be done with artificial intelligences. They could be domesticated. As Bostrom puts it, while “it seems extremely difficult to specify how one would want a superintelligence to behave in the world in general — since this would require us to account for all the trade offs in all the situations that could arise — it might be feasible to specify how a superintelligence should behave in one particular situation. We could therefore seek to motivate the system to confine itself to acting on a small scale, within a narrow context, and through a limited set of action modes.” (Bostrom 2014, p. 141)

The classic example of a domesticated superintelligence would be the so-called “oracle” device. This functions as a simple question-answering system. Its final goal is to produce correct answers to any questions it is asked. It would usually do so from within a confined environment (a “box”). This would make it domesticated, in a sense, since it would be happy to work in a constrained way within a confined environment.

But, of course, things are not so simple as that. Even giving an AI the seemingly benign goal of giving correct answers to questions could have startling implications. Anyone who has read the Hitchhiker’s Guide to the Galaxy knows that. In that story, the planet Earth is revealed to be a supercomputer created by an oracle AI in order to formulate the “ultimate question” — the meaning of life, the universe and everything — having earlier worked out the answer (“42”). The example is silly, but it highlights the problem of “resource acquisition”, which was mentioned earlier in this series: making sure one has the correct answers to questions could entail the acquisition of vast quantities of resources.

There might be ways around this, and indeed Bostrom dedicates a later chapter to addressing the possibility of an oracle AI. Nevertheless, there is a basic worry about the domestication strategy that needs to be stated: the AI’s understanding of what counts as a minimised and constrained area of impact needs to be aligned with our own. This presents a significant engineering challenge.


3. Indirect Normativity and Augmentation
The third possible method of motivation selection is indirect normativity. The idea here is that instead of directly programming ethical or moral standards into the AI, you give it some procedure for determining its own ethical and moral standards. If you get the procedure just right, the AI might turn out to be benevolent and perhaps even supportive of human interests and needs. Popular candidates for such a procedure tend to be modelled on something like the ideal observer theory in ethics. The AI is to function much like an ideal, hyper-rational human being who can “achieve that which we would have wished the AI to achieve if we had thought about the matter long and hard” (Bostrom, 2014, p. 141).

Bostrom doesn’t say a whole lot about this in chapter 9, postponing a fuller discussion to a later chapter. But as I noted in one of my previous posts on this topic, one of the main problems with this method of motivation selection is ensuring you’ve got the right norm-picking procedure. Getting it slightly wrong could have devastating implications, particularly if the machine has a decisive strategic advantage over us.

The fourth and final method of motivation selection is augmentation. This is quite different from the methods discussed thus far, all of which imagined an artificial intelligence designed from scratch. Augmentation instead imagines that we start with a system that already has the “right” motivations and amp up its intelligence from there. The obvious candidate for such a system would be a human being (or group of human beings). We could simply take their brains, with their evolved and learned motivations, and augment their capabilities until we reach a point of superintelligence. (Ignore, for now, the ethics of doing this.)

As Bostrom notes, augmentation might look pretty attractive if all other methods turn out to be too difficult to implement. Furthermore, it might end up being a “forced choice”. If augmentation is the only route to superintelligence, then augmentation is, by default, the only available method of motivation selection. Contrariwise, if the route to superintelligence is via the development of AI, augmentation is not on the cards.

In any event, as a “solution” to the control problem, augmentation leaves a lot to be desired. If the system we augment has some inherent biases or flaws, we may simply end up exaggerating those flaws through a series of augments. It might be wonderful to augment a Florence Nightingale to superintelligence, but it might be nightmarish to do the same with a Hitler. Furthermore, even if the starter-system is benevolent and non-threatening, the process of augmentation could have a corrupting effect. A super-rational, super-intelligent human being, for instance, might end up being an anti-natalist and might decide that human annihilation is the morally best outcome.

Okay, so that’s it for this post. The table below summarises the various motivation selection methods. This might be it for my series on Bostrom’s book. If I have the time, I may do two more posts on chapter 10, but that’s looking less viable every day.







Wednesday, August 13, 2014

Are we heading for technological unemployment? An Argument




We’re all familiar with the headlines by now: “Robots are going to steal our jobs”, “Automation will lead to joblessness”, and “AI will replace human labour”. It seems like more and more people are concerned about the possible impact of advanced technology on employment patterns. Last month, Lawrence Summers worried about it in the Wall Street Journal but thought maybe the government could solve the problem. Soon after, Vivek Wadhwa worried about it in the Washington Post, arguing that there was nothing the government could do. Over on the New York Times, Paul Krugman has been worrying about it for years.

But is this really something we should worry about? To answer that, we need to distinguish two related questions:

The Factual Question: Will advances in technology actually lead to technological unemployment?
The Value Question: Would long-term technological unemployment be a bad thing (for us as individuals, for society etc)?

I think the answer to the value question is a complex one. There are certainly concerns one could have about technological unemployment — particularly its tendency to exacerbate social inequality — but there are also potential boons — freedom from routine drudge work, more leisure time and so on. It would be worth pursuing these issues further. Nevertheless, in this post I want to set the value question to one side. This is because the answer to that question is going to depend on the answer to the factual question: there is no point in worrying about (or celebrating) technological unemployment if it’s never going to happen.

So what I want to do is answer the factual question. More precisely, I want to try to evaluate the arguments for and against the likelihood of technological unemployment. I’ll start by looking at an intuitively appealing, but ultimately naive, argument in favour of technological unemployment. As I’ll point out, many mainstream economists find fault with this argument because they think that one of the assumptions it rests on is false. I'll then outline five reasons for thinking that the mainstream view is wrong. This will leave us with a more robust argument for technological unemployment. I will reach no final conclusion about the merits of that argument. As with all future-oriented debates, I think there is plenty of room for doubt and disagreement. I will, however, suggest that the argument in favour of technological unemployment is a plausible one and that we should definitely think about the possible future to which it points.

My major reference point for all this will be the discussion of technological unemployment in Brynjolfsson and McAfee’s The Second Machine Age. If you are interested in a much longer, and more detailed, assessment of the relevant arguments, might I suggest Mark Walker’s recent article in the Journal of Evolution and Technology?


1. The Naive Argument and the Luddite Fallacy
To start off with, we need to get clear about the nature of technological unemployment. In its simplest sense, technological unemployment is just the replacement of human labour by machine “labour” (where the term “machine” is broadly construed and where one can doubt whether we should call what machines do “labour”). This sort of replacement happens all the time, and has happened throughout human history. In many cases, the unemployment that results is temporary: either the workers who are displaced find new forms of work, or, even if those particular workers don’t, the majority of human beings do, over the long term.

Contemporary debates about technological unemployment are not concerned with this temporary form of unemployment; instead, they are concerned with the possibility of technology leading to long-term structural unemployment. This would happen if displaced workers, and future generations of workers, cannot find new forms of employment, even over the long-term. This does not mean that there will be no human workers in the long term; just that there will be a significantly reduced number of them (in percentage terms). Thus, we might go from a world in which there is a 10% unemployment rate, to a world in which there is a 70, 80 or 90% unemployment rate. The arguments I discuss below are about this long-term form of technological unemployment.

So what are those arguments? In many everyday conversations (at least the conversations that I have) the argument in favour of technological unemployment takes an enthymematic form. That is to say, it consists of one factual/predictive premise and a conclusion. Here’s my attempt to formulate it:

(1) Advances in technology are replacing more and more forms of existing human labour.
(2) Therefore, there will be technological unemployment.

The problem with this argument is that it is formally invalid. This is the case with all enthymemes. We are not entitled to draw that conclusion from that premise alone. Still, formal invalidity will not always stop someone from accepting an argument. The argument might seem intuitively appealing because it relies on a suppressed or implied premise that people find compelling. We’ll talk about that suppressed premise in a moment, and why many economists doubt it. Before we do that though, it’s worth briefly outlining the case for premise (1).

That case rests on several different strands of evidence. The first is just a list of enumerative examples, i.e. cases in which technological advances are replacing existing forms of human labour. You could probably compile a list of such examples yourself. Obviously, many forms of manufacturing and agricultural labour have already been replaced by machines. This is why we no longer rely on humans to build cars, plough fields and milk cows (there are still humans involved in those processes, to be sure, but their numbers are massively diminished when compared with the past). Indeed, even those forms of agricultural and manufacturing labour that have remained resistant to technological displacement — e.g. fruit pickers — may soon topple. There are other examples too: machines are now replacing huge numbers of service sector jobs, from supermarket checkout workers and bank tellers, to tax consultants and lawyers; advances in robotic driving seem likely to displace truckers and taxi drivers in the not-too-distant future; doctors may soon see diagnostics outsourced to algorithms; and the list goes on and on.

In addition to these examples of displacement, there are trends in the economic data that are also suggestive of displacement. Brynjolfsson and McAfee outline some of this in chapter 9 of their book. One example is the fact that recent data suggests that in the US and elsewhere, capital’s share of national income has been going up while labour’s share has been going down. In other words, even though productivity is up overall, human workers are taking a reduced share of those productivity gains. More is going to capital, and technology is one of the main drivers of this shift (since technology is a form of capital). Another piece of evidence comes from the fact that since the 1990s recessions have, as per usual, been followed by recoveries, but these recoveries have tended not to significantly increase overall levels of employment. This means that productivity gains are not matched by employment gains. Why is this happening? Again, the suggestion is that businesses find that technology can replace some of the human labour they relied on prior to the recession. There is consequently no need to rehire workers to spur the recovery. This seems to be especially true of the post-2008 recovery.

So premise (1) looks to be solid. What about the suppressed premise? First, here’s my suggestion for what that suppressed premise looks like:

(3) Nowhere to go: If technology replaces all existing forms of human labour, and there are no other forms of work for humans to go to, then there will be technological unemployment.

This plugs the logical gap in the initial argument. But it does so at a cost. The cost is that many economists think that the “nowhere to go” claim is false. Indeed, they even have a name for it: the “Luddite fallacy”, after the Luddites, who protested against the automation of textile work during the Industrial Revolution. History seems to suggest that the Luddite concerns about unemployment were misplaced. Automation has not, in fact, led to increased long-term unemployment. Instead, human labour has found new uses. What’s more, there appear to be sound economic reasons for this, grounded in basic economic theory. The reason why machines replace humans is that they increase productivity at a reduced cost. In other words, you can get more for less if you replace a human worker with a machine. This in turn reduces the costs of economic outputs on the open market. When costs go down, demand goes up. This increase in demand should spur the need or desire for more human workers, either to complement the machines in existing industries, or to assist entrepreneurial endeavours in new markets.

So embedded in the economists’ notion of the Luddite Fallacy are two rebuttals to the suppressed premise:

(4) Theoretical Rebuttal: Economic theory suggests that the increased productivity from machine labour will reduce costs, increase demand, and expand opportunities for existing or novel forms of human labour.
(5) Evidential Rebuttal: Accumulated evidence, over the past 200 years, suggests that technological unemployment is at most a temporary problem: humans have always seemed to find other forms of work.



Are these rebuttals any good? There are five reasons for thinking they aren’t.


2. Five Reasons to Question the Luddite Fallacy
The five reasons are drawn from Brynjolfsson and McAfee’s book. I will refer to them as “problems” for the mainstream approach. The first is as follows:

(6) The Inelastic Demand Problem: The theoretical rebuttal assumes that demand for outputs will be elastic (i.e. that reductions in price will lead to increases in demand), but this may not be true. It may not be true for particular products and services, and it may not be true for entire industries. Historical evidence seems to bear out this point.

Let’s go through this in a little more detail. The elasticity of demand is a measure of how sensitive demand is to changes in price. The higher the elasticity, the higher the sensitivity; the lower the elasticity, the lower the sensitivity. If a particular good or service has a demand elasticity of one, then for every 1% reduction in price, there will be a corresponding 1% increase in demand for that good or service. Demand is inelastic when it is relatively insensitive to changes in price; in other words, when consumers tend to demand about the same amount regardless of price (in the limiting case, an elasticity of zero).
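For reference, the standard textbook definition (my addition, not Brynjolfsson and McAfee's notation) is:

\[
\varepsilon \;=\; \left|\frac{\%\Delta Q_d}{\%\Delta P}\right| \;=\; \left|\frac{\Delta Q_d / Q_d}{\Delta P / P}\right|
\]

where $Q_d$ is the quantity demanded and $P$ is the price. An elasticity of one means a 1% price cut produces a 1% rise in quantity demanded; an elasticity close to zero means the price cut barely moves demand at all.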

The claim made by proponents of the Luddite fallacy is that the demand elasticity for human labour, in the overall economy, is around one, over the long haul. But as McAfee and Brynjolfsson point out, that isn’t true in all cases. There are particular products for which there is pretty inelastic demand. They cite artificial lighting as an example: there is only so much artificial lighting that people need. Increased productivity gains in the manufacture of artificial lighting don’t result in increased demand. Similarly, there are entire industries in which the demand elasticity for labour is pretty low. Again, they cite manufacturing and agriculture as examples of this: the productivity gains from technology in these industries do not lead to increased demand for human workers in those industries.

Of course, lovers of the Luddite fallacy will respond to this by arguing that it doesn’t matter if the demand for particular goods or services, or even particular industries, is inelastic. What matters is whether human ingenuity and creativity can find new markets, i.e. new outlets for human labour. They argue that it can, and, more pointedly, that it always has. The next two arguments against the Luddite fallacy give reason to doubt this too.

(7) The Outpacing Problem: The theoretical rebuttal assumes that the rate of technological improvement will not outpace the rate at which humans can retrain, upskill or create new job opportunities. But this is dubious. It is possible that the rate of technological development will outpace these human abilities.

I think this argument speaks for itself. For what it’s worth, when JM Keynes first coined the term “technological unemployment”, it was this outpacing problem that he had in mind. If machines displace human workers in one industry (e.g. manufacturing) but there are still jobs in other industries (e.g. computer programming), then it is theoretically possible for those workers (or future generations of workers) to train themselves to find jobs in those other industries. This would solve the temporary problem of automation. But this assumes that humans will have the time to develop those skills. In the computer age, we have witnessed exponential improvements in technology. It is possible that these exponential improvements will continue, and will mean that humans cannot redeploy their labour fast enough. Thus, I could encourage my children to train to become software engineers, but by the time they developed those skills, machines might be better software engineers than most humans.

The third problem is perhaps the most significant:

(8) The Inequality Problem: The technological infrastructure we have already created means that less human labour is needed to capture certain markets (even new ones). Thus, even if people do create new markets for new products and services, it won’t translate into increased levels of employment.

This one takes a little bit of explanation. There are two key trends in contemporary economics. First is the fact that an increasing number of goods and services are being digitised (with the advent of 3D printing, this now includes physical goods). Digitisation allows for those goods and services to be replicated at near zero marginal cost (since it costs relatively little for a digital copy to be made). If I record a song, I can have it online in an instant, and millions of digital copies can be made in a matter of hours. The initial recording and production may cost me a little bit, but the marginal cost of producing more copies is virtually zero. A second key trend in contemporary economics is the existence of globalised networks for the distribution of goods and services. This is obviously true of digital goods and services, which can be distributed via the internet. But it is also true of non-digital goods, which can rely on vastly improved transport networks for near-global distribution.
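A rough illustration of why near-zero marginal costs matter is sketched below. The figures are invented purely for illustration: a one-off fixed cost for producing the song, and a tiny marginal cost for serving each extra digital copy.

```python
# Toy illustration of near-zero marginal cost for a digital good.
# FIXED_COST and MARGINAL_COST are hypothetical numbers, not real data.

FIXED_COST = 10_000.0   # one-off cost of recording and producing the song
MARGINAL_COST = 0.001   # cost of distributing one additional digital copy

def average_cost_per_copy(copies: int) -> float:
    """Total cost spread over all copies distributed."""
    return (FIXED_COST + MARGINAL_COST * copies) / copies

for copies in (1_000, 100_000, 10_000_000):
    print(f"{copies:>10,} copies -> average cost per copy: "
          f"${average_cost_per_copy(copies):.4f}")

# Roughly $10.00, $0.10 and $0.002 per copy: once the fixed cost is sunk,
# each extra copy is almost free, which is what lets a single provider
# serve a near-global market.
```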

These two trends have led to more and more “winner takes all” markets. In other words, markets in which being the second (or third or fourth…) best provider of a good or service is not enough: all the income tends to flow to one participant. Consider services like Facebook, Youtube, Google and Amazon. They dominate particular markets thanks to globalised networks and cheap marginal costs. Why go to the local bookseller when you have the best and cheapest bookstore in the world at your fingertips?

The fact that the existing infrastructure makes winner takes all markets more common has pretty devastating implications for long-term employment. If it takes less labour input to capture an entire market — even a new one — then new markets won’t translate into increased levels of employment. There are some good recent examples of this. Instagram and WhatsApp have managed to capture near-global markets for photo-sharing and free messaging, but with relatively few employees. (Note: there is some hyperbole in this, but the point still holds. Even if the best service provider doesn’t capture the entire market, there is still less opportunity for less-good providers to capture a viable share of the market. This still reduces likely employment opportunities.)

The fourth problem with the Luddite fallacy has to do with its reliance on historical data:

(9) The Historical Data Problem: Proponents of the Luddite fallacy may be making unwarranted inferences from the historical data. It may be that, historically, technological improvements were always matched by corresponding improvements in the human ability to retrain and find new markets. But that’s because we were looking at the relatively linear portion of an exponential growth curve. As we now enter a period of rapid growth, things may be different.

In essence, this is just a repeat of the point made earlier about the outpacing problem. The only difference is that this time it is specifically targeted at the use of historical data to support inferences about the future. That said, Brynjolfsson and McAfee do suggest that recent data support this argument. As mentioned earlier, since the 1990s job growth has “decoupled” from productivity: the number of jobs being created is not matching the productivity gains. This may be the first sign that we have entered the period of rapid technological advance.

The fifth and final problem is essentially just a thought experiment:

(10) The Android Problem: Suppose androids could be created. These androids could do everything humans could do, only more efficiently (no illness, no boredom, no sleep) and at a reduced cost. In such a world, every rational economic actor would replace human labour with android labour. This would lead to technological unemployment.

The reason why this thought experiment is relevant here is that there doesn’t seem to be anything unfeasible about the creation of androids: it could happen that we create such entities. If so, there is reason to think technological unemployment will happen. What’s more, this could arise even if the androids are not perfect facsimiles of human beings. It could be that there are one or two skills that the androids can’t compete with humans on. Even still, this will lead to a problem because it will mean that more and more humans will be competing for jobs that involve those one or two skills.




3. Conclusion
So there you have it: an argument for technological unemployment. At first, it was naively stated, but when defended from criticism, it looks more robust. It is indeed wrong to assume that the mere replacement of existing forms of human labour by machines will lead to technological unemployment, but if the technology driving that replacement is advancing at a rapid rate; if it is built on a technological infrastructure that allows for “winner takes all” markets; and if ultimately it could lead to the development of human-like androids, then there is indeed reason to think that technological unemployment could happen. Since this will lead to a significant restructuring of human society, we should think seriously about its implications.

At least, that’s how I see it right now. But perhaps I am wrong? There are a number of hedges in the argument — we’re predicting the future after all. Maybe technology will not outpace human ingenuity? Maybe we will always create new job opportunities? Maybe these forces will grind capitalism to a halt? What do you think?

Tuesday, August 12, 2014

An Ethical Framework for the Use of Enhancement Drugs




Debate about the merits of enhancement tends to be pretty binary. There are some — generally called bioconservatives — who are opposed to it; and others — transhumanists, libertarians and the like — who embrace it wholeheartedly. Is there any hope for an intermediate approach? One that doesn’t fall into the extremes of reactionary rejection or uncritical endorsement?

Probably. Indeed, a careful reading of many pro- and anti-enhancement writers suggests that they are not, always and everywhere, in favour of or against the use of enhancement. But to sustain the intermediate approach, we need some framework for deciding when enhancement is permissible and when it is not. In their paper, “Who Should Enhance? Conceptual and Normative Dimensions of Cognitive Enhancement”, Filippo Santoni di Sio, Philip Robichaud and Nicole Vincent try to provide one such framework. They base it on a set of tools for determining the “nature of an activity”. They argue that for certain practice-oriented activities, the use of cognitive enhancement may not be permissible, but for goal-directed activities, it probably is permissible (maybe even obligatory).

In this post I want to outline their proposed framework, and offer some (minor) critical comments along the way. On the whole, I find the approach promising, but I think there are some practical difficulties (as I will explain). I’ll divide my discussion into two parts. First, I’ll set out the conceptual toolkit proposed by the authors for analysing human activities. Second, I’ll explain the implications of this toolkit for the enhancement debate.


1. The Nature of Activities Approach
Santoni di Sio and his colleagues claim that the permissibility of cognitive enhancement depends on the nature of the activity we are interested in. To back this up, they present us with a toolkit of concepts for understanding the nature of an activity. This toolkit consists of three pairs of related concepts.

The first pair of concepts is the distinction between practice-oriented activities and goal-oriented activities. This is probably the most important pair of concepts and forms the backbone of their approach to understanding the permissibility/impermissibility of cognitive enhancement.

The idea is that every human activity has definitional limits, i.e. qualities or attributes that render it distinct from other activities. “Walking” is distinct from “running”; “trading on the stock exchange” is distinct from “performing surgery”; “engaging in academic research” is distinct from “being educationally assessed”. At a very broad level, one of the key differentiators of activities is whether the activities are externally-focused or internally-focused. That is to say, whether they are concerned with producing or bringing about a certain outcome, or engaging in a particular kind of performance.

Arguably, performing surgery is an externally-focused activity. We care about surgery to the extent that it produces a particular kind of outcome: healing or curing the patient. The precise manner in which it is performed matters less. Indeed, surgical techniques and methods are evolving all the time, usually with the explicit goal of improving patient-outcomes. Contrariwise, running the 100m sprint is, arguably, an internally-focused activity. There is an outcome that we care about, to be sure (viz. who crosses the line first), but we care about the way in which that outcome is brought about even more. You must run down the track in order to perform that activity; you cannot rollerblade or cycle.

Still, as the sprinting example suggests, there is some need for nuance here. Most human activities are hybrid in nature: they are partially externally-focused and partially internally-focused. To get around this problem, Santoni di Sio et al suggest that we can still look on activities as being predominantly externally-focused and predominantly internally-focused. The former they call goal-oriented activities; the latter they call practice-oriented activities.

The second pair of concepts employed by Santoni di Sio and his colleagues is the distinction between constitutive rules and regulative rules. I’ve covered this many times before so I’ll just offer a brief explanation here. A regulative rule is one that ascribes certain standards of performance to an independently intelligible activity. The rules of the road, for example, are regulative. They take the activity of driving — which is intelligible in itself — and tell us how to perform that activity safely. Constitutive rules are different. They actually create (or “constitute”) an activity. Apart from those rules, the activity is not in itself intelligible. A classic example is the rules of chess. Those rules actually determine what it is to perform the activity of playing chess. Without those rules, you simply have the activity of moving bits of carved wood around a checkered board. You don’t have chess.

Why is this distinction relevant to this debate? It is relevant because Santoni di Sio et al argue that goal-oriented activities are governed by regulative rules, whereas practice-oriented activities are governed by constitutive rules. A goal-oriented activity like surgery is all about the production of a particular outcome. There are no internal limits on how it can be performed. Consequently, the only standards that are applied to it are regulative in nature: they tell us how to perform surgery safely and ethically. A practice-oriented activity is different. It does have some internal limits on how it can be performed. Consequently, it must be governed (at least in part) by constitutive rules, i.e. rules that specify what the activity is to be.

The third and final pair of concepts employed by Santoni di Sio and his colleagues is the distinction between coarse-grained and fine-grained descriptions of activities. A coarse-grained description of an activity abstracts away from most of the particular details of the performance, and focuses instead on general, macroscopic features of the performance. A fine-grained description is the opposite: it doesn’t abstract from particular details. Obviously, these are not binary categories: descriptions exist along a continuum from the extremely fine-grained to the extremely coarse-grained.

Why is this important? Santoni di Sio et al think that, when it comes to practice-oriented activities, the level of description can make a normative difference. They use an example involving car racing; I’ll use an example involving golf (which I am more familiar with). Imagine how golf was played 150 years ago. It was played on poorly-kept courses, with gutta-percha (rubber-like) balls that became misshapen with use, and with wooden clubs (with wooden or steel shafts). Now think about how golf is played today. It is played (usually) on well-kept courses, with durable multi-layered, synthetic golf balls, and with titanium and graphite golf clubs. At a coarse-grained level, golf is still the same game it always was: it is about getting a ball into a hole. But at a more fine-grained level, it is a different game: the changes in technology and course preparation have seen to that. The question that needs to be asked is whether those changes somehow subvert the practice-oriented point of the game. For the most part they do not. We have, after all, tolerated most of these technological changes. But sometimes they do. This is why certain types of golf-related technologies have been banned (e.g. club faces with a high COR or distance finders in competitive play). They are thought to subvert or alter the nature of the activity.


2. The Ethical Framework
With the conceptual toolkit in place, we can now build the ethical framework. I think the easiest way to do this is to imagine it as a flow-chart. You start with a given activity (e.g. academic research, surgery, education, financial trading etc.). You then proceed through a series of nodes on the flow chart. At each node in the flow-chart you ask and answer a question. At the end of the process you reach an ethical evaluation about the use of enhancement in that particular activity.

The first question you need to ask yourself is: what is the nature of the activity in question? In other words, is it goal-oriented or practice-oriented? As noted above, this can be a complicated question as many activities are a bit of both. So how can you settle on one category? Santoni di Sio and his colleagues propose a simple test. They say:

…to realize whether a certain activity is goal-directed or practice-oriented…try to mentally eliminate either the realization of the internal or external goals of a given activity, and see which one would result in the loss of that activity’s point. Would it make sense, for instance, to go out with friends if one did not enjoy their company, or to play a certain game if one did not find the activity amusing or challenging or interesting? As the answer to both questions is negative (setting aside other goals like wishing to develop the friendship or to acquire an appreciation for the games), one may conclude that those are practice-oriented activities.
(Santoni di Sio et al 2014, p. 182)

This is an intuitively appealing test. You could probably also ask yourself whether the activity seems to be governed by regulative or constitutive rules. Nevertheless, I think there are problems lurking here. The reality is that activities might take on different characteristics for different performers. For example, it might seem natural to say that a sport like the 100m sprint or golf is practice-oriented. But is that true for all players? I suspect that for some professional athletes the external goods of the activity (the winning; the money; the sponsorship deals; the fame etc.) may swamp the internal ones. Thus, for them, the sport will seem very much like a goal-directed activity. This may, in turn, explain why some athletes adopt a “win at all costs” attitude. (Consider all the professional diving and fouling at this year’s World Cup). I suspect something similar may be true of certain students in university education. For them, education will be about outcomes (good grades and good jobs). They will care rather less about the internal goods of learning.

You can, of course, argue that these athletes and students have a distorted view of the nature of their respective activities; that their attitude subverts and undermines the point of those activities. In that case you are presupposing some normative view of those activities that is independent of the attitudes of particular performers (and, perhaps, independent of the test proposed by Santoni di Sio and his colleagues). This may be the right perspective to adopt. Still, I think the different perspectives are worth bearing in mind. This for two reasons. First, there may be borderline or fuzzy cases where we’re just not sure what the correct categorisation is. Second, the fact that different performers will view the activity differently will make any proposed enhancement policy more or less difficult to implement.

Leaving those criticisms to the side, let’s suppose that we can answer the initial question. In that case, we will have two diverging branches. Along one of those branches we will be dealing with practice-oriented activities, along the other we will be dealing with goal-directed activities. Let’s focus on practice-oriented activities for the time being.

In order to determine whether the use of enhancement should be permissible or impermissible in practice-oriented activities, we have to ask two separate questions. First, is the activity one with some high social or moral value? In other words, is it something whose integrity we would like to preserve? In the case of education (which we will assume to be practice-oriented) we would probably say “yes”: it is something of high social value whose integrity we would like to preserve. In the case of something like stamp-collecting or amateur sudoku, we would probably say “no”: there is nothing of huge social or moral value hinging on the current way in which those activities are performed.

What difference does this make? Well, if the activity is of low social or moral value, then we probably shouldn’t care all that much whether people use enhancers while performing it. If my father wants to use modafinil when solving his sudoku puzzles, then so be it. Its use should be permissible in this and similar cases (note: this is assuming there are no severe side effects). If, on the other hand, the activity is of high social value, we proceed to the next question.

That next question is: would the use of cognitive enhancement subvert the valuable point of the activity? In answering that question, we will need to look at different levels of description. If enhancement makes a difference at a coarse-grained level of description (i.e. if it seems to alter the activity at that abstract level), then it should probably be impermissible: in that case it is definitely subverting the point of the activity. If it makes a difference at a fine-grained level, then the issue is more complex. We will have to ask whether that description accurately captures the point of the activity. If it does, then once again we have reason to deem the use of enhancement impermissible.

Santoni di Sio and his colleagues think that education might be one example of an activity where this is true. They suggest that education is about doing things with a certain kind of effort (i.e. with certain obstacles in place), and that the use of enhancers may reduce that effort (or remove those obstacles). The result would be that the point of education is undermined. I have to say I find this dubious. I think we would need to be much more precise in our account of the “effort” required in education. And I think, given what we know about how existing cognitive enhancers work, it is unlikely that they really do subvert the point of education. That, however, is an argument for another time.

So much for practice-oriented activities. Let’s switch focus now and consider goal-directed activities. The ethical analysis of these activities is much more straightforward. Basically, if the activity is goal-directed, then all that matters is whether the goal is attained, not how the activity is performed (except, of course, that it should not be performed in a way that breaches other ethical standards). So, in a sense, we shouldn’t care about the use of enhancement in these activities: if it makes us more likely to attain the goal, we should be allowed to use it.

Of course, that’s not quite true. We still have to ask ourselves the question: is the goal something of high social or moral value? If it is, then not only would enhancement be permissible, it may even be obligatory. For example, the goal of surgery is something of high moral and social value. If there is anything we could do to make it more likely that the surgery is successful we should do it. In fact, if the stakes are high enough, we might even be obliged to do it. For example, pretty much everyone would say that hand-washing is obligatory prior to surgery. Why? Because it reduces the risk of infection and the complications that could arise from that. But if the use of cognitive enhancers would have a similar positive effect, why wouldn’t that be obligatory too?

Santoni di Sio and his colleagues argue that it could be, but only if the use of the enhancers is necessary, efficacious, easy and safe. In other words, only if it is proven that it does make the user more likely to reach the goal, there is no other way of achieving the same gains, and it doesn’t impose significant costs on the user (either in terms of effort or health). The “easy and safe” requirement may actually be overly-cautious: it may be that in certain cases even hard and unsafe practices are obligatory in order to achieve a particular goal. But it’s easier to make the argument for the easy and safe cases.

That’s it for the ethical framework. A flow chart depicting the framework and the conclusions it leads us to is included below.





3. Conclusion
In summary, the notion that there is a “middle ground” in the enhancement debate is appealing, but it needs to be plausibly mapped out. Santoni di Sio and his colleagues have tried to do that with their “nature of activities” approach to the ethics of enhancement. Their claim is that working out the nature of a given activity can determine whether enhancement is permissible/impermissible/obligatory.

I think the best way to sum up the results of their framework is as a series of tests (a toy sketch of the resulting decision procedure follows the list):

Impermissibility Test: If the activity is (a) practice-oriented; (b) has a valuable point; and (c) the use of enhancement would subvert that point, then the use of enhancement should be impermissible in that particular activity.
Permissibility Test: If the activity is (a) practice-oriented but lacks a valuable point; or (b) goal-directed, then the use of enhancement should be permissible in that particular activity.
Obligatoriness Test: If the activity is (a) goal-directed; (b) has a high value; and (c) the use of enhancement is a necessary, efficacious, easy and safe means of achieving that goal, then the use of enhancement should be obligatory in that particular activity.
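
Just to make the branching explicit, here is a minimal sketch of how that flow chart might be encoded as a decision procedure. The function name and the coarse yes/no inputs are my own illustrative assumptions, not part of the authors' framework.

```python
# A toy encoding of the three tests above. The boolean inputs flatten the
# framework's questions into simple flags purely for illustration.

def assess_enhancement(practice_oriented: bool,
                       high_value: bool,
                       subverts_point: bool,
                       necessary_efficacious_easy_safe: bool) -> str:
    """Return 'impermissible', 'permissible' or 'obligatory' for the use of
    enhancement in a given activity, following the tests sketched above."""
    if practice_oriented:
        # Practice-oriented branch: what matters is whether the activity has
        # a valuable point and whether enhancement would subvert that point.
        if high_value and subverts_point:
            return "impermissible"
        return "permissible"
    # Goal-directed branch: only goal-attainment matters, not performance.
    if high_value and necessary_efficacious_easy_safe:
        return "obligatory"
    return "permissible"

# Example: surgery treated as goal-directed and high value, with an enhancer
# that is necessary, efficacious, easy and safe -> obligatory.
print(assess_enhancement(practice_oriented=False, high_value=True,
                         subverts_point=False,
                         necessary_efficacious_easy_safe=True))
```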

Anyway, that’s it for this post. As a final note, I would like to point out that Santoni di Sio and his colleagues’ work on this framework is part of a research project called Enhancing Responsibility. The project is being jointly run by TU Delft in the Netherlands, and the University of Oxford. I would recommend checking out some of the other work being done as part of that project.

Monday, August 11, 2014

Finally...The Next (Two) Journal Clubs




I must start off with an apology. I started a journal club initiative back in May, with the aim of doing one per month. As you may have noticed, I have been terribly remiss in the interim, failing to do any over the past two months. This is one of my general character flaws... I have, however, been doing some thinking over that period of time about what I want the journal club to achieve and what shape I want it to take.

I have decided that I would like the journal club to focus on topics that broadly lie within the philosophy of religion. Why limit it in this way? Well, the philosophy of religion is one of my long-standing interests. In fact, it tended to dominate the early content on this blog. But my primary research and teaching interests do not lie in this area (I focus on ethics, emerging technologies and law). Consequently, I'm looking for an excuse to keep up with the latest research in the philosophy of religion. My hope is that the journal club will give me that excuse.

Of course, that's not all I'm hoping for. I know I have a lot of readers who are interested in the philosophy of religion too, and I'm hoping the journal club will give them an excuse to debate and discuss the latest research in this area as well.

I have several papers already planned for the next few months. To make up for my failure to do anything in June or July, I'm going to do two this month. They are:


  • Dagfinn Sjaastad Karlsen 'Is God Our Benefactor? An argument from suffering' (2013) 3(3) Philosophy of Life 145-167 - This is an interesting paper in that it tries to unite the problem of evil with some debates in population ethics. In essence, it tries to argue that God had no morally sufficient reason for creating us because bringing us into existence did not benefit us (in fact, it may even have harmed us). I'm not sure it is entirely successful in making this core argument, but it provides plenty of food for thought. The discussion about this will begin on the 20th of August 2014.




Both of these papers are available for free online. And in future I'll make sure that every paper that features on the journal club is available for free (I've learned my lesson from the first attempt). I do hope you can join in the conversation about these papers.

Sunday, August 10, 2014

Philosophers' Carnival ♯166



I have the great honour of hosting this month's edition of the Philosophers' Carnival. Last month, Brandon over at Siris counselled us against assuming that all entries could be neatly categorised into areas like "philosophy of mind" or "metaphysics". He's right, no doubt, but I'm too stuck in my ways to do anything but categorise. Before I get underway I want to remind everyone to send in their submissions for the next edition of the carnival. This can be done at the Philosophers' Carnival webpage. Anyway, here we go...


First, given that this is coming out on a Sunday, I thought some stuff broadly related to the philosophy of religion would be in order:



  • Helen de Cruz, on the Prosblogion, looks at a recent study claiming that children from religious backgrounds have trouble distinguishing fact from fiction. She suggests the study doesn't quite warrant that conclusion.




Once you're done sorting out your general worldview, you might like to come back down to earth (relatively speaking) with some moral philosophy. Lots of good stuff out there this month; this is just a small sample:




  • Kantians are a dour bunch, thinking that the good person cannot really want to do good. At Pea Soup, Nomy Arpaly takes issue with this. She tries to defend an account of happy good doing from the ridicule of Christine Korsgaard.


  • Finally, proving that philosophy isn't completely irrelevant to the real world, Frances Kamm and Jeff McMahan offer their thoughts on the just war principle of proportionality and how it might apply to the ongoing conflict in Gaza. 


Speaking of the relevance of philosophy to the real world, there is a well-known saying in aesthetics that "Aesthetics is for the artist as ornithology is for the birds". Over on the appropriately-named Aesthetics for Birds blog, Donald Brook looks at different ways to parse that saying, not all of which are insightfully true.

By the time you're done reading all that, you'll have warmed up your mental faculties quite nicely. So much so that you might start pondering the mysteries of consciousness and rational thought. If so, here are some suggested readings on the philosophy of mind and decision-making:


  • On The Splintered Mind, Eric Schwitzgebel evaluates Tononi's exclusion postulate, which is part of Tononi's fascinating but "strange" theory of consciousness. The theory claims that consciousness arises from information-integrating networks in the brain, and the exclusion postulate basically claims that only one such network is 'conscious' at any one time. This is important because the brain contains many information-integrating networks.




If you're still eager for more after all that, I suggest you finish off with some slightly more random posts ("random" in the sense that my categorisation skills have run out at this point; not in the sense that their content is random and meaningless, quite the contrary in fact):






Okay, so that's it for this month. Next month's carnival will be hosted at David Papineau's Sport and Philosophy Blog. Remember to keep sending in submissions to the Philosophers' Carnival.

Saturday, August 9, 2014

Bostrom on Superintelligence (5): Limiting an AI's Capabilities



(Series Index)

This is the fifth part of my series on Nick Bostrom’s book Superintelligence: Paths, Dangers, Strategies. So far in the series, we’ve covered why Bostrom thinks superintelligent AIs might pose an existential risk to human beings. We’ve done this by looking at some of his key claims about the nature of artificial intelligence (the orthogonality thesis and the instrumental convergence thesis); and at the structure of his existential risk argument.

In the remaining posts in this series, we’re going to focus on ways in which to contain that existential risk. We start by looking at chapter 9 of the book, which is entitled “The Control Problem”. In this chapter, Bostrom tries to do two things. First, he tries to explain exactly what the problem is when it comes to containing existential risk (that is the control problem). Second, he tries to catalogue and offer brief evaluations of various strategies for addressing that problem.

We’re going to cover both of these things today. First, by talking about principal-agent problems and the unique nature of the principal-agent problem that arises in the construction of a superintelligent AI. And second, by looking at one possible set of solutions to that problem: limiting the AI’s capabilities. We will continue to catalogue possible solutions to the control problem in the next post.


1. Principal-Agent Problems and the Control Problem
Principal-agent problems are a mainstay of economic and regulatory theory. They are very easy to explain. Suppose that I want to buy a house, but I don’t have time to view lots of houses and negotiate deals on the ones that best suit my needs. Consequently, I decide to hire you to do all this for me. In this scenario, I am the principal (the person who wants some task to be performed in accordance with my interests), and you are the agent (the person carrying out the tasks on my behalf).

The principal-agent problem arises because the interests of the principal and the agent are not necessarily aligned, and because the agent has access to information that the principal does not. So, for example, when I send you out to look for a house, I have no way of knowing if you actually looked at a sufficient number of houses (you could easily lie to me about the number), or whether you actually negotiated the best possible deal. You just want to get your agent’s fee; you don’t necessarily want to get a house that will suit my needs. After all, you aren’t going to live there. This gives you an incentive to do less than I would like you to do and act in ways that are counterproductive to my interests.

Principal-agent problems are common in many economic and regulatory situations. A classic example arises in the management of companies/corporations. In many publicly-owned companies, the owners of the company (the shareholders) are not the same as the people who manage the company on a day-to-day basis. The owners put their money at risk in the company; but the managers do not. There is thus a danger that the managers’ interests are not aligned with those of the shareholders, and that they might make decisions that are adverse to those interests.

There are, of course, a variety of “solutions” to this type of principal-agent problem. A classic one is to pay the managers in company stocks so that their interests become aligned with those of the shareholders. Similarly, there are a range of oversight and governance mechanisms that are supposed to hold the managers accountable for bad behaviour. We needn’t get into all that here, though — all we need is the general overview of the principal-agent problem.

Why do we need it? Because, according to Bostrom, the development and creation of superintelligent AIs gives rise to a unique and exceptionally difficult version of the principal-agent problem. To be more precise, it gives rise to two separate principal-agent problems. As he describes them (pp. 127-128):

The First Principal-Agent Problem: This involves human principals and human agents. The first project to develop a highly intelligent AI will, presumably, involve some wealthy financial backers (maybe governments) who hire a group of AI engineers. The sponsors will need to ensure that the engineers carry out the project in accordance with their interests. This is a standard principal-agent problem, which may pose difficulties, but nothing too unusual.
The Control Problem: This involves human principals and artificial agents. The engineers who create the first advanced AI will have a set of goals or interests they will wish for it to pursue. The question is whether they will be able to do this successfully once the AI exceeds a certain threshold in intelligence and competence. This is a unique problem, which gives rise to a unique set of concerns. A failure to address this problem is what could give rise to existential risk.

Bostrom’s focus is on the Control Problem, and the possible solutions to it. He identifies two general methods for addressing it. The first is to try to limit an advanced AI’s abilities (the “Capability Control Method”); the second is to try to motivate the advanced AI to pursue goals that are in our (human) interest (the “Motivation Selection Method”). Bostrom is clear that either set of methods would have to be implemented before an AI achieves superintelligence. Why? Because after that point the AI could have a decisive strategic advantage over human beings, and it may be impossible to constrain or limit it in any way.

In the remainder of this post we will look at four capability control methods: (i) boxing; (ii) incentive design; (iii) stunting; and (iv) tripwiring.


2. Boxing Methods of Control
Perhaps the simplest and most intuitively compelling method of controlling an AI is to put it in a “box”. Not a literal box of course — though that’s one form it could take — but rather a metaphorical box: a set of protocols that contains and constrains the way in which it can interact with the world. A typical AI will have three components: a sensor (or input channel); a processor; and an actuator (or output channel). The AI will receive inputs from the external world via its sensor (e.g. light rays, radio waves, chemical compounds, sound waves and so on), it will manipulate and transform those inputs using its processor, and it will then respond to the world by performing an “action” of some kind via its actuators. The action could be something simple like displaying information on a screen, or something more complex like buying and selling shares on a stock exchange. Given the right set of actuators, an AI could conceivably perform any sort of action.

It would seem logical then to say that an AI with a high degree of intelligence, access to the right information, and a wide array of actuators at its disposal, would have the power to pose a significant existential threat to human beings (whether it would be inclined to do so is a separate matter, covered in previous posts). But if we could limit the kinds of information it receives, and minimise the number of actions it can perform in the world, we might have a way to control the threat. This is what the boxing solution proposes.

Bostrom discusses two types of boxing solution in the book. The first he describes as a form of “physical” containment. This is like literal boxing. You place the AI in a physical container of some kind, and only allow it to “act” in the real world via a specific set of actuators. This means removing things like robotic arms or other devices that would allow the AI to break out of its containment area. One big problem with physical containment is the fact that an AI can manipulate devices using radio waves, which it could easily generate by shuffling electrons around inside its processors. We would, consequently, need to build a containment field that prevented radio-wave communication. A Faraday cage might do the trick.

Bostrom doesn’t have anything overly negative to say about physical containment at this stage (he discusses it in more detail in chapter 10). He thinks it is a simple solution to the problem. He just thinks it could have all sorts of subtle vulnerabilities that we may not realise (e.g. how many of you thought of the radio-wave problem before it was pointed out?), and may lull us into a false sense of security. He also notes that physical containment will reduce the functionality of the AI, which may be a drawback that some sponsors and developers are unwilling to tolerate.

The other method of boxing Bostrom refers to is “informational” containment. This restricts the flow of information into and out of the AI. The idea here is that it may be the inter-connected nature of informational flows that makes an AI dangerous. This is particularly true if the AI is connected to the internet which, with the advent of the “Internet of Things”, will soon become something akin to a central nervous system for the entire globe. If we could de-link the AI from this system, we might be able to contain it. The apotheosis of this would be an AI which can only receive information via a teletype interface, and which can only respond via the same medium with “yes” or “no” answers.
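
To make the idea of a restricted output channel concrete, here is a minimal sketch of what such an "oracle" interface might look like. The class, its methods and the stand-in model are my own illustrative assumptions; Bostrom describes the set-up only in prose.

```python
# A minimal sketch of informational containment: the AI's only actuator is
# a text channel restricted to "yes"/"no" answers. The BoxedOracle class and
# its ask() method are assumptions for illustration, not Bostrom's design.

class BoxedOracle:
    def __init__(self, model):
        self._model = model  # the (hypothetical) superintelligent reasoner

    def ask(self, question: str) -> str:
        raw_answer = self._model(question)  # arbitrary internal reasoning
        # The output channel is the box: whatever the model produces,
        # only a bare "yes" or "no" ever leaves the system.
        return "yes" if bool(raw_answer) else "no"

# A trivial stand-in model, just to show the interface in use.
oracle = BoxedOracle(model=lambda q: "risk" in q.lower())
print(oracle.ask("Does this plan carry existential risk?"))  # -> "yes"
```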

Would this work? Bostrom is sceptical. For one thing, humans would act as the gatekeepers between the AI and the outside world (e.g. by inputting queries on the teletype interface) and humans are vulnerable to manipulation. A smart AI could trick its human gatekeepers into letting it out of the box. Admittedly this might be difficult if it is limited to “yes” or “no” answers. There are further problems here though, discussed in chapter 10 of the book.

Bostrom also notes that you could combine the two methods — i.e. physical and informational containment — but argues that it is virtually impossible to have a perfectly isolated system: any interaction with a human agent would compromise the isolation.


3. Incentive Methods
The second capability control method is difficult to explain. Bostrom refers to it as the “incentive” method, but I find that slightly confusing as it seems to overlap with motivation selection methods. The idea seems to be that if you crafted the right environment for the AI — the right “incentive environment” — it wouldn’t be able to act in an existentially threatening manner. An appropriate analogy here might be the way in which you raise a child. Obviously, children have certain innate qualities that make them different from AIs, but set that to the side for the moment. The child has goals, but has to interact with an external environment in order to pursue those goals. We obviously don’t want our children to grow up into amoral psychopaths who will do anything to get what they want. So instead, we socialise and train them within an incentive environment that forces them to behave in morally and socially acceptable ways.

The suggestion is that we could do something similar with an AI. Bostrom imagines several scenarios here, including one in which the AI is forced to cooperate with other AIs and one in which it is socialised within a human environment (in the hope that it will acquire human norms and values). I’m going to ignore those and focus instead on what I think is his most interesting proposal. This involves creating an AI whose final value involves receiving “cryptographic reward tokens”. In other words, the AI can be motivated to do anything to receive these tokens. The tokens are sequences of numbers that operate as the keys to ciphers. They would be pretty meaningless to human beings, but overwhelmingly desirable to the AI. The tokens would have to be created before the AI, and would have to be stored in a secure location, to be destroyed if the AI ever makes an attempt to get them.

The argument then is that one could use this set-up to ensure that the AI always “plays nice”, i.e. goes along with what its human principals want it to do. I’ll let Bostrom explain the idea:

The AI could then be given a bounded utility function, and matters could be arranged so that the AI would get (for instance) 99% of its maximum utility from the first reward token, 99% of its remaining utility from the second reward token; and so on. By keeping the “discount rate” high, and by transmitting reward tokens to the AI at a reasonably high rate, one would dissuade the AI from attempting to carry out long-term plans that would increase the expected number of reward tokens in the distal future at the expense of some risk that its reward stream will be cut short, or that the next reward token will be cancelled and destroyed. If the AI always believes that defecting (i.e. doing anything other than its best to look after the principal’s interests) would incur at least a 2% risk of forfeiting the next reward token, whereas cooperating would reduce the risk of forfeiting the next reward token below 1%, then a utility-maximising AI would always cooperate.  
(Bostrom 2014, p. 133)
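
To see why, on these numbers, a utility-maximising AI would cooperate, here is a back-of-the-envelope sketch. The bounded utility schedule (99% of the remaining utility per token) is taken from the quoted passage; the time horizon and the treatment of the forfeiture risks as a flat per-round probability (1% for cooperating, 2% for defecting) are simplifying assumptions of mine.

```python
# A rough rendering of the arithmetic in the quoted passage.

def expected_utility(p_forfeit: float, horizon: int = 1000) -> float:
    """Expected fraction of maximum utility, where each surviving round pays
    out 99% of whatever utility remains, and each round carries an
    independent probability p_forfeit of the reward stream being cut off."""
    utility, remaining, p_alive = 0.0, 1.0, 1.0
    for _ in range(horizon):
        p_alive *= (1 - p_forfeit)        # chance the stream is still intact
        utility += p_alive * 0.99 * remaining
        remaining *= 0.01                 # only 1% of the utility is left
    return utility

print(f"cooperate (<1% risk per round):  {expected_utility(0.01):.4f}")
print(f"defect    (>=2% risk per round): {expected_utility(0.02):.4f}")
# Cooperation yields the higher expected utility, which is Bostrom's point.
```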

As I say, I think this is interesting. There is an argument and logic that one can follow. Would it work to contain existential risk? Bostrom says “not necessarily”. He gives three reasons for doubt. First, the system of reward tokens would be administered by humans and the AI may come to distrust those humans or worry about their becoming incapacitated, or the possibility of “regime change”, or the possibility of any other human error that prevents it from getting its reward. This may lead it to take precautions that threaten our existence. Second, he doubts whether we would actually be able to tell whether the outcomes produced by the AI are beneficial to us (this is an argument developed elsewhere in the book — I covered related epistemic concerns in part three). Third, he thinks there are other “esoteric considerations” that might make this idea less feasible than it initially seems. He cites as a specific example of this the uncertainty the AI might have about its own “simulation hypothesis”.

Again, we get a flavour of the rather profound epistemic implications of Bostrom’s views: if he is right, we should be plagued by doubts about what an advanced AI might do, and how it might affect us.


4. Stunting and Tripwiring
The final two methods for controlling an AI’s capabilities are stunting and tripwiring. “Stunting”, as the name implies, involves hampering or disabling the AI in some way. Examples would include making the AI run on slow hardware, reducing its memory capacity, or limiting the kind of data it can process (this would seem to overlap slightly with the informational containment strategy discussed earlier).

Bostrom argues that the use of stunting poses a dilemma. Either we stunt the AI too much and it just becomes another “dumb” piece of software; or we stunt it too little and it is capable of overcoming its disabilities. Getting the balance just right could be pretty tricky. He also discusses a “data deprivation” method of stunting. This would attempt to limit the kind of data the AI can process to a narrow domain, e.g. it could only process facts about peptide chemistry. Bostrom argues that this too would be an uncertain method of capability control. The problem is that what we think of as narrow and constrained datasets might be anything but. They could contain circumstantial or implied facts that an acute, superintelligent mind could work out. Or the data could be relevant to other fields in ways that we don’t fully understand. Or it may even be that a sufficiently intelligent AI could work out lots of things via deduction or a priori reflection.

Tripwiring is a different method of capability control. It involves building into any AI development project a set of “tripwires” which, if crossed, will lead to the project being shut down and destroyed. Bostrom identifies three types of tripwire (a toy sketch of how these checks might be combined follows the list):

Behavioural Tripwires: A network of detectors would be placed around a “boxed” AI in order to detect any attempt to breach the containment, e.g. any attempt to make radio communication.
Ability Tripwires: The AI would be regularly tested for its capabilities. If at any point new abilities are detected, or its rate of improvement in a particular skill seems high, or if it gets close to some capability threshold, it will be shut down.
Content Tripwires: The AI’s internal processing could be constantly monitored for any suspicious changes, e.g. changes in how it represents its final values.
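
Purely by way of illustration, here is a schematic sketch of how the three tripwire types might be combined into a single shut-down check. All of the function names, metrics and thresholds are assumptions of mine; Bostrom describes the tripwires only in prose.

```python
# A toy combination of the three tripwire types described above.

def behavioural_tripwire(radio_emissions_detected: bool) -> bool:
    # Any attempt to communicate past the box trips the wire.
    return radio_emissions_detected

def ability_tripwire(measured_capability: float, threshold: float) -> bool:
    # Periodic capability tests; crossing the threshold trips the wire.
    return measured_capability >= threshold

def content_tripwire(current_values_hash: str, baseline_hash: str) -> bool:
    # Any change in the AI's internal representation of its final values
    # (here crudely proxied by a hash) trips the wire.
    return current_values_hash != baseline_hash

def should_shut_down(emissions: bool, capability: float, cap_threshold: float,
                     values_hash: str, baseline_hash: str) -> bool:
    return (behavioural_tripwire(emissions)
            or ability_tripwire(capability, cap_threshold)
            or content_tripwire(values_hash, baseline_hash))

# Example: no emissions, capability below threshold, values unchanged.
print(should_shut_down(False, 0.7, 0.9, "abc123", "abc123"))  # -> False
```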

Bostrom thinks that tripwires could be useful, particularly during the development phase and if used in conjunction with other methods. But, unsurprisingly, he thinks that they too have shortcomings. An advanced AI could subvert a tripwire in an unanticipated way. He also notes that AI project developers could grow impatient if tripwires repeatedly hamper their progress, and that they might then undermine any safety advantage gained by the tripwire system.


5. Conclusion
Okay, so that brings us to the end of this post. The four capability control methods are summarised in the table below. In the next post we will look at motivation selection methods.