Key Model Insights
If we flatten the safety-performance tradeoff, so that the technology is safer at each level of performance, we might expect this change to increase total safety. However, this is often not the case.
Instead, the flatter tradeoff encourages actors to increase their performance so that in equilibrium:
- total safety remains unchanged for ratio contest success functions
- total safety falls for logistic contest success functions
You can see the first of these effects when picking the “Industry-wide safety progress” preset scenario and clicking “Show results”.
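As a rough sketch of the functional forms involved (the parameterizations below are illustrative assumptions, not the model's exact specification): a ratio contest success function awards win probability according to relative performance, a logistic one according to the performance gap, and the safety-performance tradeoff can be summarized by a steepness parameter.

```python
import math

def ratio_csf(p1: float, p2: float, decisiveness: float = 1.0) -> float:
    """Illustrative ratio (Tullock-style) contest success function:
    player 1's win probability depends on relative performance."""
    if p1 == 0 and p2 == 0:
        return 0.5
    return p1**decisiveness / (p1**decisiveness + p2**decisiveness)

def logistic_csf(p1: float, p2: float, decisiveness: float = 1.0) -> float:
    """Illustrative logistic contest success function:
    player 1's win probability depends on the gap p1 - p2."""
    return 1.0 / (1.0 + math.exp(-decisiveness * (p1 - p2)))

def safety_frontier(performance: float, steepness: float = 1.0, s_max: float = 1.0) -> float:
    """Illustrative linear safety-performance tradeoff: flattening the
    tradeoff means lowering `steepness`, so the same performance costs
    less safety."""
    return max(0.0, s_max - steepness * performance)
```

One intuition for the contrast between the two bullets above: the ratio form depends only on relative performance, while the logistic form depends on the absolute gap, so the two families can respond differently when the tradeoff flattens and both actors push performance up.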
When actors choose to implement at the performance boundary of a technology, safety concerns are no longer what restrains their performance. In this case, actors cannot increase performance when the tradeoff flattens, so the only effect of flattening is to increase safety.
The combination of the above insights leads to “Safety valleys”. At times, flattening the safety-performance tradeoff will first decrease equilibrium safety, and then, as the slope falls further, increase it beyond its initial level.
You can see an example of a safety valley by picking the “Safety valley” preset scenario and clicking “Show results”.
Consider what happens as we move along one of the curves in the “Comparative Statics in S-P Steepness (Asymmetric)” plot towards an S-P steepness of zero. We often see that total safety is lowest when the two players' S-P steepnesses are equal. As the S-P steepness falls further, so that player 1 has a lead, total safety often rises.
Intuitively, when one player can achieve the same level of performance with much higher safety than the other player, this leader faces far less pressure to trade off safety for performance. The laggard also has little incentive to trade safety for performance, since competing with the leader would cost far too much safety. However, as the laggard catches up to the leader, the incentive to compete strengthens, and so more safety is sacrificed in equilibrium.
Yet these dynamics depend on risk preferences and on the leader's technological context. For instance, when the leader is implementing at a performance boundary, the dynamics differ: the leader cannot respond to the laggard by increasing performance. So, as the laggard catches up, it takes on less risk and equilibrium safety rises. You can see an example of this dynamic by picking the “Safely closing the gap” preset scenario and clicking “Show results”.
As the stakes of the competition fall, actors choose lower-performance implementations and total safety rises.
However, reducing the stakes may not have an immediate effect on equilibrium total safety at a performance boundary. In that case, the stakes must fall far enough that actors choose to implement below the performance boundary; only then does safety increase.
An important driver of stakes is likely to be the extent to which actors believe that others will exploit them if they lose the competition. You can see an example of a change in stakes by picking the “A sudden change in stakes” preset scenario and clicking “Show results”.
As small performance leads become more decisive for victory, which the model captures as a higher decisiveness value, total safety decreases. You can see an example of this dynamic by picking the “Decisive Victory” preset scenario and clicking “Show results”.
In this case, symmetric actors have an incentive to compete harder around the equilibrium, decreasing safety. If the equilibrium is at a performance boundary, then decisiveness must fall far enough before actors will increase safety.
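To illustrate the decisiveness parameter with the assumed ratio form from the sketch above (again, an illustration rather than the model's exact specification), a small performance lead translates into a much larger win probability when decisiveness is high:

```python
# Illustrative ratio contest success function:
# P(player 1 wins) = p1**d / (p1**d + p2**d), where d is decisiveness.
p1, p2 = 1.05, 1.00   # player 1 holds a 5% performance lead
for d in (1.0, 5.0, 25.0):
    win_prob = p1**d / (p1**d + p2**d)
    print(f"decisiveness={d:>4}: P(player 1 wins) = {win_prob:.3f}")
# decisiveness= 1.0: P(player 1 wins) = 0.512
# decisiveness= 5.0: P(player 1 wins) = 0.561
# decisiveness=25.0: P(player 1 wins) = 0.772
```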
When the safety-performance tradeoff is more concave, achieving higher performance requires increasingly large safety tradeoffs. In many cases, these large tradeoffs incentivize safer implementations, increasing equilibrium safety. You can see an example of this effect when picking the “Domain-specific safety progress” preset scenario and clicking “Show results”.
This is sometimes not true with a logistic contest success function. The more concave tradeoff leads actors to choose smaller levels of performance, and under a logistic contest success function the correspondingly smaller differences in performance matter more, sometimes counteracting the above effect.
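A minimal sketch of what varying concavity could look like (the exponent parameterization is an assumption, not the model's stated form): bending the frontier so that each additional unit of performance costs more safety than the last.

```python
def concave_frontier(performance: float, concavity: float) -> float:
    """Illustrative frontier on [0, 1]: safety = 1 - performance**concavity.
    Higher `concavity` bends the curve so that the safety cost of extra
    performance grows near the top of the performance range."""
    return 1.0 - performance**concavity

# Safety lost when moving from performance 0.8 to 0.9:
for c in (1.0, 2.0, 4.0):
    lost = concave_frontier(0.8, c) - concave_frontier(0.9, c)
    print(f"concavity={c}: safety lost = {lost:.2f}")
# concavity=1.0: safety lost = 0.10
# concavity=2.0: safety lost = 0.17
# concavity=4.0: safety lost = 0.25
```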
So, one road to improving equilibrium total safety is to make technologies close to the current equilibrium performance safer, without improving the safety of much higher-performance technologies. This sort of targeted safety progress requires that responsible actors appreciate the risks of implementing at much higher performance. A related approach that does not rely on this assumption is to raise the costs of higher-performance implementations through regulation. Yet another consideration is when to invest in research on systems that are not yet possible, in anticipation of performance boundaries shifting outward.
In a multi-polar world, instead of only the leader being capable of causing a disaster, both the leader and laggard can independently cause the disaster. Even though both actors are aware of this spillover in risk, and both actors choose to be safer, the total risk of disaster is still higher than if only the leader can cause a disaster. You can see an example of this effect when picking the “Multi-polar disaster risks” preset scenario and clicking “Show results”.
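A stylized numerical illustration, assuming each actor's safety level acts as an independent probability of avoiding a disaster (the specific numbers are invented for illustration, not taken from the model):

```python
# Unipolar: only the leader can cause the disaster.
leader_safety = 0.90
risk_unipolar = 1 - leader_safety                       # 0.10

# Multi-polar: either actor can independently cause the disaster.
# Both respond to the spillover by choosing somewhat safer implementations...
leader_safety_mp, laggard_safety_mp = 0.93, 0.93
# ...but total risk is still higher, because both must now avoid a disaster.
risk_multipolar = 1 - leader_safety_mp * laggard_safety_mp
print(round(risk_unipolar, 3), round(risk_multipolar, 3))   # 0.1 0.135
```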
We can also see that, in the presence of laggard risk, the actors do not always find a pure-strategy equilibrium (a point where each actor is choosing its preferred level of safety given the other's choice). We indicate the absence of an equilibrium by showing no data for the relevant regions of the last four plots. If the equilibrium for the user's chosen scenario(s) does not exist, then the circle is also omitted.
We note that, in the absence of laggard risk, the actors can always find an equilibrium. It is only when we add laggard risk that the actors can fail to find a pure-strategy equilibrium. This is because laggard risk gives each actor an incentive to exit the race (by setting performance to 0) when the race gets too dangerous. However, once one actor leaves, the other actor reduces its performance, encouraging the first actor to reenter the race. Thus, no equilibrium in pure strategies exists. In these cases, an equilibrium in mixed strategies will exist. However, we are not convinced that the dynamics of mixed-strategy play are a useful guide to strategic behavior in these contexts, so we do not present these equilibria. Richer models, for instance those that explicitly model AI competition as a dynamic process (Stafford et al., forthcoming), can eliminate this need for mixed strategies.
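One generic way to see how this kind of cycling blocks a pure-strategy equilibrium (a sketch of best-response iteration on a discretized strategy grid, not the model's actual solver or payoff functions):

```python
def best_response_dynamics(payoff1, payoff2, grid, max_iters=100):
    """Illustrative best-response iteration. payoff1(a, b) and payoff2(a, b)
    give the two players' payoffs for the strategy profile (a, b). Returns a
    profile from which neither player wants to deviate, or None if the
    iteration cycles (no pure-strategy equilibrium found along this path)."""
    a, b = grid[0], grid[0]
    seen = {(a, b)}
    for _ in range(max_iters):
        a = max(grid, key=lambda x: payoff1(x, b))      # player 1 best-responds to b
        new_b = max(grid, key=lambda y: payoff2(a, y))  # player 2 best-responds to a
        if new_b == b:
            return (a, b)        # mutual best responses: a pure-strategy equilibrium
        b = new_b
        if (a, b) in seen:
            return None          # profile revisited: best responses cycle
        seen.add((a, b))
    return None
```

In the exit-and-reenter situation described above, such an iteration keeps revisiting the same profiles rather than settling, which is consistent with the absence of a pure-strategy equilibrium; a mixed-strategy equilibrium still exists, as noted.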