Key Model Insights
If we flatten the safety-performance tradeoff, so that the technology is safer at each level of performance, we might expect this change to increase total safety. However, this is often not the case.
Instead, the flatter tradeoff encourages actors to increase their performance so that in equilibrium:
- total safety remains unchanged for ratio contest success functions
- total safety falls for logistic contest success functions
You can see the first of these effects when picking the “Industry-wide safety progress” preset scenario and clicking “Show results”.
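As a rough sketch of the functional forms involved (the parameterizations below are illustrative assumptions, not the model's exact specification): a ratio contest success function awards win probability according to relative performance, a logistic one according to the performance gap, and the safety-performance tradeoff can be summarized by a steepness parameter.

```python
import math

def ratio_csf(p1: float, p2: float, decisiveness: float = 1.0) -> float:
    """Illustrative ratio (Tullock-style) contest success function:
    player 1's win probability depends on relative performance."""
    if p1 == 0 and p2 == 0:
        return 0.5
    return p1**decisiveness / (p1**decisiveness + p2**decisiveness)

def logistic_csf(p1: float, p2: float, decisiveness: float = 1.0) -> float:
    """Illustrative logistic contest success function:
    player 1's win probability depends on the gap p1 - p2."""
    return 1.0 / (1.0 + math.exp(-decisiveness * (p1 - p2)))

def safety_frontier(performance: float, steepness: float = 1.0, s_max: float = 1.0) -> float:
    """Illustrative linear safety-performance tradeoff: flattening the
    tradeoff means lowering `steepness`, so the same performance costs
    less safety."""
    return max(0.0, s_max - steepness * performance)
```

One intuition for the contrast between the two bullets above: the ratio form depends only on relative performance, while the logistic form depends on the absolute gap, so the two families can respond differently when the tradeoff flattens and both actors push performance up.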
When actors choose to implement at the performance boundary of a technology, safety concerns are no longer what restrains their performance. In this case, actors cannot increase performance when the tradeoff flattens, so the only effect of flattening is to increase safety.
The combination of the above insights leads to “Safety valleys”. At times, flattening the safety-performance tradeoff will first decrease equilibrium safety, and then, as the slope falls further, increase it beyond its initial level.
You can see an example of a safety valley by picking the “Safety valley” preset scenario and clicking “Show results”.
Consider what happens as we move along one of the curves in the “Comparative Statics in S-P Steepness (Asymmetric)” plot towards an S-P steepness of zero. We often see that total safety is lowest when the two players' S-P steepnesses are equal. As the S-P steepness falls further, so that player 1 has a lead, total safety often rises.
Intuitively, when one player can achieve the same level of performance with much higher safety than the other player, this leader faces far less pressure to trade off safety for performance. The laggard also has little incentive to trade safety for performance, since competing with the leader would cost far too much safety. However, as the laggard catches up to the leader, the incentive to compete strengthens, and so more safety is sacrificed in equilibrium.
Yet these dynamics depend on risk preferences and on the leader's technological context. For instance, when the leader is implementing at a performance boundary, the dynamics differ: the leader cannot respond to the laggard by increasing performance. So, as the laggard catches up, it takes on less risk and equilibrium safety rises. You can see an example of this dynamic by picking the “Safely closing the gap” preset scenario and clicking “Show results”.
As the stakes of the competition fall, actors choose lower-performance implementations and total safety rises.
However, reducing the stakes may not have an immediate effect on equilibrium total safety at a performance boundary. In that case, the stakes must fall far enough that actors choose to implement below the performance boundary; only then does safety increase.
An important driver of stakes is likely to be the extent to which actors believe that others will exploit them if they lose the competition. You can see an example of a change in stakes by picking the “A sudden change in stakes” preset scenario and clicking “Show results”.
As small performance leads become more decisive for victory, which the model captures as a higher decisiveness value, total safety decreases. You can see an example of this dynamic by picking the “Decisive Victory” preset scenario and clicking “Show results”.
In this case, symmetric actors have an incentive to compete harder around the equilibrium, decreasing safety. If the equilibrium is at a performance boundary, then decisiveness must fall far enough before actors will increase safety.
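To illustrate the decisiveness parameter with the assumed ratio form from the sketch above (again, an illustration rather than the model's exact specification), a small performance lead translates into a much larger win probability when decisiveness is high:

```python
# Illustrative ratio contest success function:
# P(player 1 wins) = p1**d / (p1**d + p2**d), where d is decisiveness.
p1, p2 = 1.05, 1.00   # player 1 holds a 5% performance lead
for d in (1.0, 5.0, 25.0):
    win_prob = p1**d / (p1**d + p2**d)
    print(f"decisiveness={d:>4}: P(player 1 wins) = {win_prob:.3f}")
# decisiveness= 1.0: P(player 1 wins) = 0.512
# decisiveness= 5.0: P(player 1 wins) = 0.561
# decisiveness=25.0: P(player 1 wins) = 0.772
```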
When the safety-performance tradeoff is more concave, achieving higher performance requires increasingly large safety tradeoffs. In many cases, these large tradeoffs incentivize safer implementations, increasing equilibrium safety. You can see an example of this effect when picking the “Domain-specific safety progress” preset scenario and clicking “Show results”.
This is sometimes not true with a logistic contest success function. The more concave tradeoff leads actors to choose smaller levels of performance, and under a logistic contest success function the correspondingly smaller differences in performance matter more, sometimes counteracting the above effect.
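A minimal sketch of what varying concavity could look like (the exponent parameterization is an assumption, not the model's stated form): bending the frontier so that each additional unit of performance costs more safety than the last.

```python
def concave_frontier(performance: float, concavity: float) -> float:
    """Illustrative frontier on [0, 1]: safety = 1 - performance**concavity.
    Higher `concavity` bends the curve so that the safety cost of extra
    performance grows near the top of the performance range."""
    return 1.0 - performance**concavity

# Safety lost when moving from performance 0.8 to 0.9:
for c in (1.0, 2.0, 4.0):
    lost = concave_frontier(0.8, c) - concave_frontier(0.9, c)
    print(f"concavity={c}: safety lost = {lost:.2f}")
# concavity=1.0: safety lost = 0.10
# concavity=2.0: safety lost = 0.17
# concavity=4.0: safety lost = 0.25
```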
So, one road to improving equilibrium total safety is to make technologies close to the current equilibrium performance safer, without improving the safety of much higher-performance technologies. This sort of targeted safety progress requires that responsible actors appreciate the risks of implementing at much higher performance. A related approach that does not rely on this assumption is to raise the costs of higher-performance implementations through regulation. Yet another consideration is when to invest in research on systems that are not yet possible, in anticipation of performance boundaries shifting outward.
In a multi-polar world, instead of only the leader being capable of causing a disaster, both the leader and laggard can independently cause the disaster. Even though both actors are aware of this spillover in risk, and both actors choose to be safer, the total risk of disaster is still higher than if only the leader can cause a disaster. You can see an example of this effect when picking the “Multi-polar disaster risks” preset scenario and clicking “Show results”.
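A stylized numerical illustration, assuming each actor's safety level acts as an independent probability of avoiding a disaster (the specific numbers are invented for illustration, not taken from the model):

```python
# Unipolar: only the leader can cause the disaster.
leader_safety = 0.90
risk_unipolar = 1 - leader_safety                       # 0.10

# Multi-polar: either actor can independently cause the disaster.
# Both respond to the spillover by choosing somewhat safer implementations...
leader_safety_mp, laggard_safety_mp = 0.93, 0.93
# ...but total risk is still higher, because both must now avoid a disaster.
risk_multipolar = 1 - leader_safety_mp * laggard_safety_mp
print(round(risk_unipolar, 3), round(risk_multipolar, 3))   # 0.1 0.135
```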
We can also see that, in the presence of laggard risk, the actors do not always find a pure-strategy equilibrium (a point where each actor is choosing its preferred level of safety given the other's choice). We indicate the absence of an equilibrium by showing no data for the relevant regions of the last four plots. If the equilibrium for the user's chosen scenario(s) does not exist, then the circle is also omitted.
We note that, in the absence of laggard risk, the actors can always find an equilibrium. It is only when we add laggard risk that the actors can fail to find a pure-strategy equilibrium. This is because laggard risk gives each actor an incentive to exit the race (by setting performance to 0) when the race gets too dangerous. However, once one actor leaves, the other actor reduces its performance, encouraging the first actor to reenter the race. Thus, no equilibrium in pure strategies exists. In these cases, an equilibrium in mixed strategies will exist. However, we are not convinced that the dynamics of mixed-strategy play are a useful guide to strategic behavior in these contexts, so we do not present these equilibria. Richer models, for instance those that explicitly model AI competition as a dynamic process (Stafford et al., forthcoming), can eliminate this need for mixed strategies.
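One generic way to see how this kind of cycling blocks a pure-strategy equilibrium (a sketch of best-response iteration on a discretized strategy grid, not the model's actual solver or payoff functions):

```python
def best_response_dynamics(payoff1, payoff2, grid, max_iters=100):
    """Illustrative best-response iteration. payoff1(a, b) and payoff2(a, b)
    give the two players' payoffs for the strategy profile (a, b). Returns a
    profile from which neither player wants to deviate, or None if the
    iteration cycles (no pure-strategy equilibrium found along this path)."""
    a, b = grid[0], grid[0]
    seen = {(a, b)}
    for _ in range(max_iters):
        a = max(grid, key=lambda x: payoff1(x, b))      # player 1 best-responds to b
        new_b = max(grid, key=lambda y: payoff2(a, y))  # player 2 best-responds to a
        if new_b == b:
            return (a, b)        # mutual best responses: a pure-strategy equilibrium
        b = new_b
        if (a, b) in seen:
            return None          # profile revisited: best responses cycle
        seen.add((a, b))
    return None
```

In the exit-and-reenter situation described above, such an iteration keeps revisiting the same profiles rather than settling, which is consistent with the absence of a pure-strategy equilibrium; a mixed-strategy equilibrium still exists, as noted.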