Thoughts on Selectorate Theory

Published

February 16, 2026

George Grosz. *Eclipse of the Sun* (1926)

Introduction

Bruce Bueno de Mesquita and Alastair Smith’s The Dictator’s Handbook (2011) is one of the more successful pop-science books in political economics. Its central thesis is that to remain in power all leaders must maintain their coalition: this requirement creates certain incentives governing leaders’ behavior. Furthermore, the ratio between the number of members a leader needs in their coalition and the total size of the set of possible supporters (the selectorate) governs the incentive structure around the leader. The authors go on to argue that different governments behave differently not because of ideology or culture, but because of the game-theoretic structures arising from their selectorate and winning coalition.

The book reflects a body of formal, game-theoretic work called selectorate theory, developed by Bueno de Mesquita, Smith, Siverson, and Morrow in The Logic of Political Survival (2003), which presents the same ideas mathematically. In this post, I follow along with the formal model from The Logic of Political Survival with some departures, and then look at some repercussions of the results with respect to hierarchies.

This post continues a line of reasoning I started (but haven’t yet continued) in my post on differential stag hunt, where I looked at how the structure of incentives shapes behavior in multi-agent systems. The selectorate model is a particularly clean example of this principle, which perhaps can shed some light on how emergent behavior arises from underlying incentive structures in other domains as well.

Epistemic status: This post got denser than expected, and underwent multiple revisions (including one large one where I refactored substantial portions of the post), with the “help” of Claude. I apologize in advance for any errors that may have survived the process.

The original model can be thought of as the 2nd-order model in a family of models parametrized by the number of channels the budget can be distributed to. Based on this, we’ll build up the model in stages. First, Olson’s collective action problem, a symmetric game with no leader and no allocation, establishes the baseline. Then, we introduce a leader with a single allocation channel (targeted transfers) and derive the minimum cost for the leader to buy loyalty. Then, in the actual selectorate model, we add a second channel (broadly distributed “public goods”), let the challenger optimize across all channels, and show that the full selectorate geometry compresses to three coefficients. Finally, we look at higher-order generalizations with more channels, as well as hierarchical composition.

Mancur Olson and Collective Action

One intellectual ancestor of the selectorate model is Mancur Olson’s The Logic of Collective Action (1965). Olson’s model is a symmetric \(n\)-player public goods game: each agent independently chooses to contribute or free-ride, and the public good is produced as a function of total contributions. The individual incentive to contribute scales as \(1/n\).

Concretely, suppose \(n\) agents each choose an effort \(e_i \in [0,1]\) at cost \(c(e_i)\). A public good of value \(v(\sum e_i)\) is produced and shared equally. Agent \(i\)’s payoff is:

\[ u_i = \frac{v\!\left(\sum_j e_j\right)}{n} - c(e_i) \]

The marginal benefit of contributing is \(v'/n\), which shrinks as \(n\) grows. When \(v'/n < c'\), the dominant strategy is \(e_i = 0\).
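A minimal numeric sketch of this dilution, assuming linear forms \(v(x) = a x\) and \(c(e) = k e\) so that \(v' = a\) and \(c' = k\). The function names and the sample values are illustrative assumptions, not part of Olson’s model.

```python
def marginal_net_benefit(n: int, v_slope: float, c_slope: float) -> float:
    """Private marginal return to effort, v'/n - c', under linear v and c (assumed)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return v_slope / n - c_slope


def best_response_effort(n: int, v_slope: float, c_slope: float) -> float:
    """With linear payoffs the best response sits at a corner of [0, 1]."""
    return 1.0 if marginal_net_benefit(n, v_slope, c_slope) > 0 else 0.0


if __name__ == "__main__":
    # Illustrative values: v' = 8, c' = 1.
    for n in (2, 5, 10, 50):
        print(n, marginal_net_benefit(n, 8.0, 1.0), best_response_effort(n, 8.0, 1.0))
```

With these sample slopes, contributing stops paying once \(n \geq 8\): the public value is unchanged, but each agent’s share of it is not.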

When the production function has a threshold (the good is provided iff \(k \geq k^*\) contributors), this is equivalent to an \(n\)-player stag hunt. There are two equilibria (enough contribute, or nobody does) with coordination increasing in difficulty as \(n\) grows. When production is linear, it collapses to an \(n\)-player Prisoner’s Dilemma, dominated by free-riding. Olson’s key result is that the first case degrades toward the second as groups grow.

The model has no control variables: no agent chooses an allocation, so there is no budget, no leader, and no asymmetry. All agents face the same payoff function. The only “decision” is a scalar effort level, and the equilibrium is pinned by the ratio \(v'/(n c')\).

The takeaway is that collective action fails at scale because no one is in charge. The benefit of contributing is diluted across all \(n\) agents, but the cost is borne individually. To produce scalable collective action, someone must control and allocate the budget.

One-Channel Allocation

Let’s now extend the model to include a leader who controls a budget and allocates it to coalition members. The leader’s survival depends on maintaining a winning coalition, which requires buying loyalty from coalition members. The question is: how much does the leader need to spend to maintain their coalition?

Setup

Suppose we have a selectorate of size \(S\). For the leader to remain in power, they require a winning coalition of size \(W_{req} \leq S\). The leader is equipped with a budget \(B\). The leader chooses a total targeted transfer \(p\) to distribute among coalition members. Assuming equal distribution, each coalition member receives \(p/W\). The remainder \(B - p\) is discretionary surplus: rents, personal consumption, or waste.

In each round of the game, the leader must nominate a coalition of size \(W_n\) and choose the transfer \(p\). The coalition members can then choose to either stay loyal to the leader or defect. If the leader attracts \(W_{loyal} \geq W_{req}\) supporters, they remain in power. If the leader fails to attract a coalition of at least size \(W_{req}\), they are replaced by a challenger1. For the static versions below, we set \(W := W_{req} = W_n\) and suppress the distinction.

Payoffs

Remains Loyal

Each coalition member receives an equal share of the targeted transfer:

\[ u_L = \frac{p}{W} \]

Defects

If a coalition member defects, their payoff depends on whether the challenger will include this particular member in the new winning coalition. The challenger needs to assemble a coalition of size \(W_r\) from the full selectorate of size \(S\). Assuming equal probability of inclusion, a given member’s probability of being selected is at most \(\frac{W_r}{S}\)2.

If we assume the challenger is adversarial, the challenger allocates the entire budget as targeted transfers: \(p' = B\). The defecting member’s expected payoff3 is:

\[ u_D = \frac{W_r}{S} \cdot \frac{B}{W_r} = \frac{B}{S} \]

The size of the challenger’s coalition turns out to be irrelevant: the probability of inclusion (\(W_r / S\)) times the targeted transfer per coalition member (\(B / W_r\)) gives \(B / S\) regardless of \(W_r\). The \(p^{\min}\) we derive below is therefore a lower bound on required spending: the minimum the leader needs under the most favorable assumptions about defection risk.

Leader

A coalition member stays loyal if \(u_L \geq u_D\). Substituting the two payoffs, the loyalty constraint becomes:

\[ \frac{p}{W} \geq \frac{B}{S} \]

Therefore:

\[ p^{\min} = B \cdot \frac{W}{S} \]

The minimum transfer is proportional to the coalition ratio \(r = W/S\). The leader’s discretionary surplus is \(B - p^{\min} = B(1 - r)\). In the one-shot version, stability is a tie at the minimum; persistence requires dynamics or additional frictions.
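As a quick check on the algebra, here is a small sketch of the one-channel result. The function names are hypothetical, and exact rational arithmetic is used only to keep the comparison clean.

```python
from fractions import Fraction


def min_transfer(budget: int, w: int, s: int) -> Fraction:
    """Minimum targeted transfer p_min = B * W / S from the loyalty constraint."""
    if not 0 < w <= s:
        raise ValueError("need 0 < W <= S")
    return Fraction(budget) * Fraction(w, s)


def rents(budget: int, w: int, s: int) -> Fraction:
    """Discretionary surplus B - p_min = B * (1 - W/S)."""
    return Fraction(budget) - min_transfer(budget, w, s)


if __name__ == "__main__":
    # Illustrative numbers: B = 100, S = 100, varying W.
    for w in (10, 30, 60, 80, 100):
        print(w, min_transfer(100, w, 100), rents(100, w, 100))
```

At the minimum, a loyal member receives \(p^{\min}/W = B/S\), exactly the defection payoff, which is the tie mentioned above.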

Consequences

Private Goods Are Cheap In Small Coalitions

When \(W\) is small relative to \(S\), the leader can buy loyalty with a small fraction of spending.

A small coalition means each member gets a large slice of the pie, and defecting to the challenger is unattractive because the probability of being included in the new coalition (\(W/S\)) is low. The leader can keep most of the budget for discretionary purposes or personal enrichment.

In equilibrium, a leader with a small coalition spends only the fraction \(r = W/S\) of the budget on loyalty; the remaining \(1 - r\) is available for discretionary use.

In Large Coalitions Private Goods Are Expensive

As \(W\) grows toward \(S\), \(p^{\min}/B = W/S\) increases toward \(1\). At \(r = 1\), the leader must spend the entire budget on targeted transfers, leaving nothing for rents.

When the coalition is large, each member’s slice of the budget is thin, but each member’s probability of being included in a challenger’s coalition (\(W/S\)) is high. Private loyalty-buying is expensive and the leader’s rents are squeezed.

Numerical Examples

| \(W/S\) | Regime | \(p^{\min}/B\) | Rents | Character |
|---|---|---|---|---|
| 0.1 | Autocracy | 10% | 90% | Loyalty is cheap; most of the budget is discretionary |
| 0.3 | Junta | 30% | 70% | Small clique, affordable |
| 0.6 | Broad coalition | 60% | 40% | Getting expensive. Small perturbations in \(W\) cause large swings |
| 0.8 | Near-democracy | 80% | 20% | Private targeting consumes most of the budget |
| 1.0 | Full inclusion | 100% | 0% | The entire budget goes to targeted transfers |

Two-Channel Allocation: The Selectorate Model

The one-channel model isolates the core selectorate geometry: the leader’s survival cost is \(p^{\min} = Br\). But real leaders have more than one channel or instrument by which to distribute the budget. In particular, they can provide public goods, which benefit everyone, not just coalition members. The selectorate model introduces a second control variable and an adversarial challenger who optimizes across channels.

Setup

Let the total population be \(N\), with \(S \leq N\) the selectorate and \(W \leq S\) the winning coalition. The leader now allocates the budget across three categories:

\[ B = p + g + R \]

where \(p\) is targeted private goods (split among \(W\)), \(g\) is public goods spending (benefiting all \(N\)), and \(R\) is rents (leader’s discretionary surplus that they retain).

Payoffs

Coalition Member

Remains Loyal

A coalition member’s payoff from loyalty is:

\[ u_L = \frac{p}{W} + v(g, N) \]

where \(v(g, N)\) is a per-capita public good function, increasing in spending \(g\) and decreasing in population \(N\). In general, the shape of \(v\) matters a great deal: concave \(v\) produces the classic BdM result where public goods provision increases with coalition size, while threshold effects and increasing returns from network goods can qualitatively change the predictions. For our purposes, however, the linear case \(v(g, N) = \beta g / N\) suffices, where \(\beta \in (0,1]\) is an efficiency parameter capturing how effectively public spending translates into individual welfare4. This gives:

\[ u_L = \frac{p}{W} + \frac{\beta g}{N} \]

Defects

If a coalition member defects, their payoff depends on the challenger’s allocation across both channels. The targeted channel works as before. The inclusion-probability argument gives \(1/S\) per dollar to a random defector, independent of the challenger’s coalition size. But the challenger can now also use the universal channel, which yields \(\beta/N\) per dollar to everyone regardless of coalition membership.

Leader

The leader maximizes rents \(R = B - p - g\) subject to the loyalty constraint \(u_L \geq u_D\). Since both payoffs are linear in \((p, g)\), and spending is constrained by \(p + g \leq B\) with \(p, g \geq 0\), this is a linear program over the spending simplex. The optimum is always at a corner, so the leader spends entirely on one channel or the other. The solution depends on which channel the adversarial challenger uses, which we consider next.

Adversarial Challenger

The adversarial challenger has the same budget \(B\) and allocates it entirely to whichever channel yields the highest defector payoff per dollar. They can either target a new coalition of size \(W_r\) with targeted transfers, or provide universal public goods. The defector’s expected payoff from the targeted channel is \(B/S\) (same as before, since the challenger can pick any \(W_r\) and the expected payoff per dollar is \(1/S\)), while the payoff from the universal channel is \(\beta B / N\). The challenger picks the channel that yields the higher payoff to the defector:

\[ u_D = B \cdot \max\!\left(\frac{1}{S},\ \frac{\beta}{N}\right) \]

Channel Comparison

The loyalty constraint \(u_L \geq u_D\) reduces to comparing per-dollar coefficients on each side:

| Channel | Loyalty coefficient | Defection coefficient |
|---|---|---|
| Targeted | \(1/W\) | \(1/S\) |
| Universal | \(\beta/N\) | \(\beta/N\) |

The incumbent has a positional advantage in the targeted channel (\(1/W > 1/S\) since \(W < S\)) and no advantage in the universal channel (both sides get \(\beta/N\)). The incumbent’s optimal instrument is the one with the larger loyalty coefficient: targeted when \(1/W > \beta/N\) (i.e., \(W < N/\beta\)), universal otherwise. Since \(W \leq N\) and \(\beta \leq 1\), the universal case arises only at the extreme corner \(W = N\), \(\beta = 1\).

The interesting question is which channel the challenger uses for the defection benchmark. When \(1/S > \beta/N\), the challenger uses targeted transfers, and the defection benchmark is \(B/S\). When \(\beta/N \geq 1/S\), the challenger uses public goods, and the defection benchmark becomes \(\beta B/N\).
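The challenger’s choice can be sketched in a few lines. The function name and the sample parameters are hypothetical; ties go to the universal channel, following the \(\beta/N \geq 1/S\) convention used here.

```python
def defection_benchmark(budget: float, s: int, n: int, beta: float):
    """Return (u_D, channel) for an adversarial challenger with budget B.

    u_D = B * max(1/S, beta/N); ties are assigned to the universal channel.
    """
    if not (0 < s <= n and 0.0 < beta <= 1.0):
        raise ValueError("need 0 < S <= N and 0 < beta <= 1")
    targeted = 1.0 / s        # expected per-dollar payoff via targeted transfers
    universal = beta / n      # per-dollar payoff via public goods
    channel = "universal" if universal >= targeted else "targeted"
    return budget * max(targeted, universal), channel


if __name__ == "__main__":
    print(defection_benchmark(100.0, s=50, n=1000, beta=0.5))   # narrow selectorate
    print(defection_benchmark(100.0, s=100, n=100, beta=1.0))   # universal selectorate
```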

The second case is sometimes called the democratic region, though the name is misleading. The condition \(\beta S \geq N\) is about selectorate breadth (\(S\) close to \(N\)) and public goods efficiency (\(\beta\) not too small), not about coalition size \(W\). A polity can have a large winning coalition and still fall outside this region if the selectorate is narrow or public goods are inefficient. In fact, with \(\beta \leq 1\) and \(S \leq N\) we always have \(\beta S \leq N\), so under these bounds the condition holds only at the boundary \(S = N\), \(\beta = 1\): a universal selectorate and fully efficient public spending. When it holds, the challenger’s best response flips from the targeted to the universal channel, changing the binding constraint on the incumbent.

Equilibrium in the Democratic Region

When \(\beta/N \geq 1/S\), the loyalty constraint becomes:

\[ \frac{p}{W} + \frac{\beta g}{N} \geq \frac{\beta B}{N} \]

The leader maximizes rents \(R = B - p - g\) subject to this constraint. Since both payoffs are linear, the solution is a corner: the incumbent uses whichever instrument has the larger loyalty coefficient per dollar.

In the first case, \(1/W > \beta/N\), and targeted spending dominates. The leader sets \(g = 0\):

\[ \frac{p}{W} \geq \frac{\beta B}{N} \implies p^{\min} = \frac{\beta W B}{N} \]

This is at least as expensive as the non-democratic benchmark \(p^{\min} = WB/S\), since \(\beta S \geq N\) in the democratic region implies \(\beta WB/N \geq WB/S\). The challenger’s access to an efficient public channel raises the defection threshold; the incumbent still responds with targeted transfers but must spend more to compensate.

In the second case, \(1/W \leq \beta/N\). Here, the coalition is large enough (\(W \geq N/\beta\)) that universal spending dominates even for the incumbent. The leader sets \(p = 0\):

\[ \frac{\beta g}{N} \geq \frac{\beta B}{N} \implies g^{\min} = B \]

The entire budget goes to public goods, leaving zero rents. Both challenger and incumbent operate through the same channel.

In the linear model, the challenger and incumbent are both optimizing linear programs over the spending simplex, so both always spend entirely on one channel. The challenger picks whichever channel maximizes defector payoff per dollar, and the incumbent picks whichever channel maximizes loyalty surplus per dollar.

The leader never provides a mix of public and private goods. At \(\beta/N = 1/S\), the challenger’s best response flips from the targeted to the universal channel, changing the binding constraint on the incumbent. The incumbent’s policy remains a corner solution throughout5.
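The corner logic of the linear two-channel model can be sketched as follows. This is an illustration under the stated assumptions (linear \(v\), adversarial challenger, \(W \leq S \leq N\), \(\beta \in (0,1]\)); `incumbent_policy` is a hypothetical name, not something from the original model.

```python
def incumbent_policy(budget: float, w: int, s: int, n: int, beta: float):
    """Return (p, g, R): the cheapest loyalty-satisfying allocation and the rents left over.

    The incumbent spends on a single channel, the one with the larger loyalty
    coefficient per dollar, just enough to match the defection benchmark u_D.
    """
    if not (0 < w <= s <= n and 0.0 < beta <= 1.0):
        raise ValueError("need 0 < W <= S <= N and 0 < beta <= 1")
    u_d = budget * max(1.0 / s, beta / n)   # challenger's best offer to a defector
    if 1.0 / w > beta / n:
        p, g = u_d * w, 0.0                 # targeted corner: p / W = u_D
    else:
        p, g = 0.0, u_d * n / beta          # universal corner: beta * g / N = u_D
    return p, g, budget - p - g


if __name__ == "__main__":
    print(incumbent_policy(100.0, w=10, s=100, n=1000, beta=0.5))
    print(incumbent_policy(100.0, w=100, s=100, n=100, beta=1.0))
```

The second call is the corner where both sides use public goods and rents fall to zero.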

Generalization to \(n\) Channels

The two-channel model has two channels: targeted (to \(W\)) and universal (to \(N\)). What happens if we extend this? In fact, we could define a channel for any subset \(T \subseteq N\), where spending on \(T\) distributes evenly across \(|T|\) members6.

What is the marginal value of a dollar spent on subset \(T\) for the defector? A defecting coalition member is included in the challenger’s new coalition with probability \(W_r/S\), and the per-member payout from spending on \(T\) is \(1/|T|\). But the defector only benefits if they are in \(T\). Under the adversarial challenger, the defector’s expected payoff per dollar on channel \(T\) is:

\[ \tilde{a}_T = \frac{|T \cap S|}{S} \cdot \frac{1}{|T|} \]

Targeting non-selectorate members wastes spending (since they cannot defect), so the only relevant channels have \(T \subseteq S\). For any such \(T\), \(|T \cap S| = |T|\), and the subset size cancels:

\[ \tilde{a}_T = \frac{|T|}{S} \cdot \frac{1}{|T|} = \frac{1}{S} \]

Whether the challenger targets 10 or 10,000 members of \(S\), the expected defector payoff per dollar is the same. All subset-targeting channels are summarized by the same defector-side coefficient.
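The cancellation is easy to confirm numerically. In this sketch the function name and the sample sizes are illustrative assumptions; `t_in_s` stands for \(|T \cap S|\).

```python
from fractions import Fraction


def defector_coefficient(t_size: int, t_in_s: int, s: int) -> Fraction:
    """Expected defector payoff per dollar on channel T: (|T ∩ S| / S) * (1 / |T|)."""
    if not (0 < t_size and 0 <= t_in_s <= min(t_size, s)):
        raise ValueError("need |T| > 0 and 0 <= |T ∩ S| <= min(|T|, S)")
    return Fraction(t_in_s, s) * Fraction(1, t_size)


if __name__ == "__main__":
    S = 100_000
    # Subsets fully inside S: the size of T cancels.
    print(defector_coefficient(10, 10, S), defector_coefficient(10_000, 10_000, S))
    # Half of T lies outside S: part of the spending is wasted.
    print(defector_coefficient(20, 10, S))
```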

On the incumbent side, the loyalty constraint binds on the worst-off coalition member. If the incumbent targets subset \(T\), members of \(W \setminus T\) receive nothing from this channel and will defect, so the incumbent must have \(T \supseteq W\). Given \(T \supseteq W\), each coalition member receives \(1/|T|\) per dollar, which is strictly decreasing in \(|T|\). The optimum is the minimum viable set \(T = W\), giving \(1/W\) per dollar. Any larger \(T\) dilutes the transfer across non-coalition members.

What’s the adversarial equilibrium? For the challenger, the best available per-dollar coefficient is \(\max_T \tilde{a}_T = 1/S\) from any targeted channel, or \(\beta/N\) from the universal channel, so they pick \(\max(1/S,\ \beta/N)\).

For the incumbent, the best per-dollar coefficient is \(\max_{T \supseteq W} 1/|T| = 1/W\) from targeted, or \(\beta/N\) from the universal channel. The incumbent chooses whichever channel buys the most loyalty per dollar.

Therefore, if spending on \(T\) splits evenly among \(|T|\), the entire channel space collapses to three numbers: \(1/W\), \(1/S\), and \(\beta/N\). The two-channel model is not an arbitrary simplification. Instead, under adversarial play and uniform inclusion, channels defined by uniform subset transfers compress to three coefficients. The \(1/S\) compression on the defector side depends on uniform inclusion (equal probability of being in the challenger’s coalition) and no commitment (the challenger cannot condition on who defected). Instruments with different transfer technologies (coercion, propaganda, targeted services with non-uniform delivery) need not compress this way. If challengers could preferentially recruit defectors, subset channels would no longer collapse.

Hierarchical Composition

What happens when each coalition member is themselves a leader with a sub-selectorate? Assume that the sub-selectorates partition the top-level selectorate, so each member of \(S_1\) belongs to one sub-leader’s domain7.

Channel Attenuation

We can think of the subleaders as “leaking” as the budget flows down the levels of the hierarchy. The top leader sends a targeted transfer, but the sub-leader must spend at least part of it to maintain their own coalition. The fraction that passes through determines how much the hierarchy costs. Call this the attenuation factor \(\kappa\).

Two-Level Model

A top leader has parameters \((W_1, S_1, B_1)\). Each of the \(W_1\) coalition members is a sub-leader with their own selectorate \((W_2, S_2, B_2)\). Assume that the sub-leader’s only budget is the transfer \(T\) received from above, and they have no independent revenue (no local taxes, fiefs, or alternative instruments). Under this assumption, \(B_2\) drops out and the sub-leader’s equilibrium is determined entirely by \(r_2 = W_2/S_2\) (if sub-leaders had independent local budgets, the recursion would acquire additive terms and the clean one-parameter Möbius recursion below would not close). The sub-leader’s loyalty constraint is:

\[ p_2^{\min} = T \cdot r_2 \]

where \(T\) is the transfer received from the top leader. The sub-leader spends \(T r_2\) to maintain their own coalition and can pass through at most \(T(1-r_2)\) as usable value to their coalition members. From the top leader’s perspective, that means a dollar sent to a coalition member is discounted by the downstream factor \(1-r_2\). The top-level loyalty constraint therefore becomes:

\[ (1-r_2)\cdot \frac{p_1}{W_1} \geq \frac{B_1}{S_1} \]

Solving gives:

\[ p_1^{\min} = B_1\cdot \frac{r_1}{1-r_2} \]

This is the two-level instance of the bottom-up effective-share recursion defined below: \(r_2^{\text{eff}}=r_2\) and \(r_1^{\text{eff}}=\frac{r_1}{1-r_2^{\text{eff}}}\). The only thing the top level needs from the sub-level is the scalar \(r_2^{\text{eff}}\), which summarizes downstream incentive consumption.

\(n\)-Level Model

The two-level result generalizes recursively. A sub-leader’s effective cost share must include all downstream hierarchy costs, not just their local share \(r_k\). Define the effective cost share bottom-up:

\[ r_n^{\text{eff}} = r_n, \qquad r_k^{\text{eff}} = \frac{r_k}{1 - r_{k+1}^{\text{eff}}} \]

Each sub-level’s effective share reduces the available surplus, inflating the cost at the level above. For two levels this gives \(r_1/(1-r_2)\) as before. For three levels:

\[ r^{\text{eff}} = \frac{r_1}{1 - \frac{r_2}{1 - r_3}} = \frac{r_1(1 - r_3)}{1 - r_2 - r_3} \]
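The bottom-up recursion is short enough to sketch directly. The function name is hypothetical, and returning infinity for an unviable sub-level is a convention chosen for this illustration.

```python
def effective_share(local_shares):
    """Top-level effective cost share from local ratios [r_1, ..., r_n].

    Applies r_k_eff = r_k / (1 - r_{k+1}_eff) from the bottom up.
    Returns float('inf') if some sub-level already consumes its whole transfer.
    """
    eff = 0.0  # nothing is consumed below the bottom level, so r_n_eff = r_n
    for r in reversed(local_shares):
        if eff >= 1.0:
            return float("inf")
        eff = r / (1.0 - eff)
    return eff


if __name__ == "__main__":
    r1, r2, r3 = 0.1, 0.2, 0.5
    print(effective_share([r1, r2]))                                   # r1 / (1 - r2)
    print(effective_share([r1, r2, r3]), r1 * (1 - r3) / (1 - r2 - r3))  # recursion vs closed form
```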

The hierarchy is viable when \(r^{\text{eff}} \leq 1\). For identical sub-levels \(r_k = r\), the viability constraint tightens with depth. An infinite hierarchy converges only when \(r \leq 1/4\) (the fixed point of \(x = r/(1-x)\) exists when \(1 - 4r \geq 0\)).
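For completeness, the \(1/4\) threshold can be worked out in a few lines. This is a sketch for identical levels \(r_k = r\), with \(x\) denoting the effective share at the fixed point:

\[ x = \frac{r}{1 - x} \iff x^2 - x + r = 0 \iff x_{\pm} = \frac{1 \pm \sqrt{1 - 4r}}{2} \]

Real roots exist only when \(1 - 4r \geq 0\). Starting from \(x_1 = r < x_-\), the iterates increase monotonically toward the smaller root \(x_-\), so for \(r \leq 1/4\) the limiting effective share is

\[ r^{\text{eff}}_{\infty} = \frac{1 - \sqrt{1 - 4r}}{2} \leq \frac{1}{2} \]

with equality at \(r = 1/4\). For \(r > 1/4\) there is no fixed point below \(1\), and the iterates keep growing until they pass \(1\).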

Deep hierarchies selectively attenuate the targeted channel. The continued-fraction structure means costs compound, and each sub-level’s effective cost shrinks the surplus available to the level above. The universal channel (public goods), by contrast, is not attenuated by the hierarchy, since public goods benefit everyone regardless of intermediary structure. This differential attenuation is why deep hierarchies push toward public goods provision.

The asymmetric attenuation is a modeling assumption. Public goods pass through the hierarchy undiminished (\(\beta g/N\) per citizen regardless of depth), while targeted transfers are consumed at each level. In practice, local public goods can be captured by intermediaries, and some targeted transfers (direct electronic payments) can bypass the hierarchy entirely. The general point is that different channels attenuate differently, and depth selects for whichever channel is least attenuated.

Interpretation

The upshot of the model is that downstream politics inflates the upstream effective cost share: \(r_1^{\text{eff}} \geq r_1\) whenever \(r_2^{\text{eff}} > 0\). Sub-leaders consume transfers to maintain their own coalitions, attenuating the targeted channel. Because autocracy is cheap (small \(r\), low attenuation), the top leader is incentivized to prefer “autocratic” sub-leaders (small \(r_2\)). This can create top-down pressure for authoritarianism at every level of the hierarchy.

Furthermore, the attenuation compounds through the continued fraction. Even if each level’s local cost \(r_k\) is small, the effective cost \(r^{\text{eff}}\) grows as downstream costs eat into the surplus at every level. Under the assumption that public goods are not attenuated by intermediaries, the system must eventually switch to the universal channel when targeted transfers become too expensive. No single level’s parameters make private loyalty-buying impossible, but composition across levels can. Whether this produces a sharp threshold depends on the relative attenuation rates across channels.

There are two ways to read the causality. Either “deep hierarchies make targeted transfers unviable, selecting for less-attenuated channels” or “societies that rely on broadly distributed goods (defense, infrastructure, trade networks) can sustain deeper hierarchies.” The model itself is static and doesn’t distinguish these.

The robust prediction is narrower: when \(r^{\text{eff}} > 1\), the targeted channel cannot fund the hierarchy (\(p_1^{\min} > B_1\)). Deep patronage hierarchies hit this bound. What replaces the targeted channel depends on the attenuation structure of available alternatives.

These preferences can reverse through a mechanism outside the static model. If sub-level public goods feed back into the top-level budget (democratic governors produce education, infrastructure, rule of law, raising local productivity and therefore \(B_1\) through taxation), then the top leader faces a tradeoff not present in the formal setup: \(p^{\min}/B\) doesn’t depend on \(B\), but the absolute discretionary surplus is \((1 - r) \cdot B_1\), so the top leader prefers democratic governors when the productivity gain to \(B_1\) outweighs the cost amplification from the hierarchy. This requires endogenizing \(B_1\) as a function of sub-level policy, which the static selectorate game does not do.
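One way to make this tradeoff concrete is the following sketch for the two-level case, where \(B_1(r_2)\) is a hypothetical budget function (not part of the static model) giving the top-level budget as a function of the sub-level coalition ratio. Using \(p_1^{\min} = B_1 r_1/(1-r_2)\), the top leader’s surplus is

\[ R_1(r_2) = B_1(r_2)\left(1 - \frac{r_1}{1 - r_2}\right) \]

Comparing a “democratic” sub-level \(r_2^{D}\) with an “autocratic” one \(r_2^{A} < r_2^{D}\), and assuming both are viable (\(r_1 < 1 - r_2\)), the top leader prefers the democratic arrangement when

\[ \frac{B_1(r_2^{D})}{B_1(r_2^{A})} > \frac{1 - \frac{r_1}{1 - r_2^{A}}}{1 - \frac{r_1}{1 - r_2^{D}}} \]

The right-hand side exceeds \(1\), so the productivity gain has to more than offset the extra hierarchy cost.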

This may help explain the logic of modern federal democracies: the central government tolerates democratic local governance because it produces a wealthier economy to tax. Empires that allowed local self-governance (Rome at its peak, the British dominion model, America) seem, in some cases, to have outperformed more centralized ones (late-stage Ottoman, Soviet).

Renormalization

We can think of the \(r^{\text{eff}}\) recursion as a type of renormalization. At each level of the hierarchy, we solve the sub-level equilibrium, extract the single scalar \(r_k^{\text{eff}}\), and discard the rest. The full specification of sub-level strategies, payoffs, and coalition dynamics is replaced by one number that summarizes everything the level above needs to know. The top leader does not need to know how many agents are at level 3, or what their loyalty margins are, or how the sub-sub-leaders allocate. All of that information is compressed into \(r_k^{\text{eff}}\).

This compression composes. The continued fraction \(r_k^{\text{eff}} = r_k/(1-r_{k+1}^{\text{eff}})\) summarizes an arbitrarily deep hierarchy into one effective cost share. Budget invariance is what makes the composition one-directional: the sub-level’s equilibrium depends only on \(r_k\), not on the transfer it receives, so each step is independent of the top-level solution.

As depth increases under fixed local parameters (e.g. identical \(r_k = r\)), the effective share \(r^{\text{eff}}\) evolves under repeated Möbius maps. When \(r > 1/4\), it eventually crosses \(r^{\text{eff}} = 1\) (the pole of the map) and the hierarchy becomes unviable. No single level’s parameters make private loyalty-buying impossible, but the accumulated flow can.

For other queries (distributional outcomes at the bottom, total public goods delivered, probability of revolt), the compression is lossy. \(r^{\text{eff}}\) tells you everything you need to know about \(p_1^{\min}\), but not about everything.

Tradeoffs on Width and Depth

A flat structure (\(n = 1\)) pays no hierarchy tax. Every additional level inflates \(r^{\text{eff}}\) through the recursion, making loyalty more expensive. So why not keep everything flat?

Flat structures face a control problem that lives outside this model. A single leader managing a large population directly is logistically impossible. Hierarchy exists to solve coordination and monitoring problems that scale with \(S\). The hierarchy tax is the price paid for this coordination capacity.

The trade-off determines an implied maximum depth for targeted-transfer regimes. The feasibility constraint is \(r^{\text{eff}} \leq 1\) (the leader cannot spend more than the budget). For identical sub-levels \(r_k = r\), the recursion \(x_{k+1} = r/(1-x_k)\) starting from \(x_1 = r\) determines the maximum viable depth: the largest \(n\) for which \(x_n \leq 1\). For \(r \leq 1/4\), the continued fraction converges and arbitrarily deep hierarchies are viable. For \(r > 1/4\), the recursion reaches or crosses the pole \(x = 1\) at finite depth.

Beyond this depth, the targeted channel cannot sustain the hierarchy and the system must either switch to a less-attenuated channel or flatten.
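A sketch of the depth bound for identical levels follows. The function name is hypothetical, and the iteration cap is a practical assumption: it stands in for “unbounded” and can misreport values of \(r\) only barely above \(1/4\), where the crossing happens very late.

```python
def max_viable_depth(r: float, cap: int = 10_000):
    """Largest n with x_n <= 1, where x_1 = r and x_{k+1} = r / (1 - x_k).

    Returns None if no violation is found within `cap` levels, which is
    treated here as "arbitrarily deep hierarchies are viable".
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    x, depth = r, 1
    while depth < cap:
        if x >= 1.0:              # budget exhausted: no further level can be added
            return depth
        nxt = r / (1.0 - x)
        if nxt > 1.0:             # the next level would need more than the budget
            return depth
        x, depth = nxt, depth + 1
    return None


if __name__ == "__main__":
    for r in (0.2, 0.25, 0.3, 0.5):
        print(r, max_viable_depth(r))
```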

Population size creates a constraint in the other direction. The total population at the bottom scales as \((rs)^{n-1} \cdot s(1-r)\), so managing a population of size \(N\) with span \(s\) requires roughly \(n \geq \log N / \log(rs)\) levels (to leading order in \(\log N\)). A city-state can stay relatively flat, but a large empire cannot. This gives two competing bounds on hierarchy depth.

The lower bound (from population) is \(n \geq \log N / \log(rs)\). Large populations require deep hierarchies.

The upper bound (from viability) is the \(n\) at which \(r^{\text{eff}} > 1\), i.e., \(p_1^{\min} > B_1\).
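The population bound can be computed exactly rather than through the logarithmic approximation. In this sketch, `w` is the per-level coalition size \(rs\) and `s` the per-level selectorate size; both names, and the uniform-levels assumption, are for illustration only.

```python
def bottom_population(w: int, s: int, n: int) -> int:
    """Citizens at the bottom of an n-level hierarchy: (rs)^(n-1) * s(1-r), with w = rs."""
    return w ** (n - 1) * (s - w)


def min_depth_for_population(population: int, w: int, s: int) -> int:
    """Smallest n whose bottom level holds at least `population` citizens."""
    if not 1 < w < s:
        raise ValueError("need 1 < W < S so that the population grows with depth")
    n = 1
    while bottom_population(w, s, n) < population:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative: s = 100, r = 0.1 (w = 10).
    for pop in (90, 10_000, 1_000_000):
        print(pop, min_depth_for_population(pop, w=10, s=100))
```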

The intersection is the feasible region. For small \(r\) (with autocratic sub-levels), the recursion converges slowly and the upper bound is generous, but the lower bound still forces depth as \(N\) grows. For large \(r\) (democratic sub-levels), \(r^{\text{eff}}\) diverges quickly and the upper bound is tight, but public goods provision sidesteps the attenuation problem entirely.

The implication is that large populations cannot sustain deep patronage hierarchies: the hierarchy tax compounds with depth, and the targeted channel eventually becomes unviable. What replaces it depends on which channels are less attenuated. If public goods pass through the hierarchy with lower attenuation than targeted transfers (our modeling assumption), then large \(N\) selects for public goods provision8.

We can classify governance structures along these two axes9.

| | Small \(r\) (private goods) | Large \(r\) (public goods) |
|---|---|---|
| Shallow (\(n \leq 2\)) | Personalist dictatorships, city-states. No hierarchy tax. Stable as long as \(S\) is manageable. | Direct democracies, Swiss cantons. Stable but scale-limited: flat structure can’t coordinate large \(S\). |
| Deep (\(n \gg 1\)) | Feudalism, tributary empires, patronage networks. Fragile: \(r^{\text{eff}}\) diverges toward the pole. | Federal democracies, imperial bureaucracies with civil service. Viable because broadly distributed goods sidestep the attenuation problem. |

Decision Count Analysis

How many decisions does a hierarchy involve? In an Olsonian public goods game (stag hunt) with \(n\) agents, the answer is simply \(n\): each agent makes one symmetric binary decision (contribute or free-ride). There is no distinguished agent, no allocation variable, and no asymmetry.

The selectorate model breaks this symmetry. There are two types of decisions. Each leader makes an allocation decision (choose \(p\)), and each coalition member makes a loyalty decision (loyal or defect).

In the feudal nesting model, suppose each level has uniform selectorate size \(S_k = s\) and coalition size \(W_k = rs\) (so the coalition ratio is \(r\) at every level):

  • Level 1: 1 leader allocates, \(rs\) coalition members each decide loyalty. Total: \(1 + rs\) decisions.
  • Level 2: \(rs\) sub-leaders each allocate, each with \(rs\) coalition members deciding loyalty. Total: \(rs + (rs)^2\) decisions.
  • Level \(k\): \((rs)^{k-1}\) allocation decisions + \((rs)^k\) loyalty decisions.

The total decision count across \(n\) levels is:

\[ \underbrace{\frac{(rs)^n - 1}{rs - 1}}_{\text{allocation}} + \underbrace{\frac{rs \cdot ((rs)^n - 1)}{rs - 1}}_{\text{loyalty}} = (1 + rs) \cdot \frac{(rs)^n - 1}{rs - 1} \]

Loyalty decisions dominate by a factor of \(rs\). For every leader choosing how to split a budget, there are \(rs\) agents deciding whether to stay or defect. The total population (citizens at the bottom who don’t lead anyone) scales as \((rs)^{n-1} \cdot s(1-r)\).
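The closed form can be checked against a direct level-by-level sum. In this sketch `w` stands for the per-level coalition size \(rs\) (assumed to be an integer greater than 1), and the function names are hypothetical.

```python
def decision_counts(w: int, n: int):
    """Return (allocation, loyalty) decision counts summed over n levels, with w = rs."""
    allocation = sum(w ** (k - 1) for k in range(1, n + 1))  # one per leader at level k
    loyalty = sum(w ** k for k in range(1, n + 1))           # one per coalition member at level k
    return allocation, loyalty


def closed_form_total(w: int, n: int) -> int:
    """Total decisions: (1 + rs) * ((rs)^n - 1) / (rs - 1)."""
    if w <= 1:
        raise ValueError("closed form requires rs > 1")
    return (1 + w) * (w ** n - 1) // (w - 1)


if __name__ == "__main__":
    alloc, loyal = decision_counts(10, 3)
    print(alloc, loyal, alloc + loyal, closed_form_total(10, 3))
```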

The attenuation result indicates that all \((1+rs) \cdot \frac{(rs)^n - 1}{rs - 1}\) micro-level decisions are compressed into a single effective parameter \(r^{\text{eff}}\) at the top, and so the top leader doesn’t need to know the internal politics of each sub-domain. They only need to know \(r^{\text{eff}}\), which summarizes everything below into a single cost share. The continued-fraction recursion replaces exponentially many individual decisions with a small number of effective parameters.

An Olsonian model with the same population would have the same number of decisions but no comparable compression. In a symmetric \(n\)-player game, you can exploit symmetry to characterize the equilibrium by a single mixed-strategy probability \(p^*\), but this is an analytical convenience for the modeler, not a structural feature of the game. Finding \(p^*\) requires solving the full system simultaneously. No agent inside the game has privileged access to the compressed description, and there is no modular decomposition, so you cannot solve “part of the game” independently and feed the result into another part.

In the selectorate model, each sub-level’s equilibrium is computed from \(r_k\) and \(r_{k+1}^{\text{eff}}\), producing \(r_k^{\text{eff}}\), which feeds into the level above. Budget invariance ensures the decomposition is one-directional: the sub-problem doesn’t depend on the top-level solution.

More broadly, \(r^{\text{eff}}\) is a lossless compression of sub-level coalition politics for the specific query “what \(p_1^{\min}\) does the top leader need?” The “source” is the full specification of sub-level strategies, payoffs, and equilibria; the “compressed representation” is a single scalar; and the distortion is zero for this query. For other queries (distributional outcomes at the bottom, total public goods delivered, probability of revolt) the compression is lossy. This is a special case of a more general question: given an \(n\)-player game, when can you compress a coalition of players into an effective agent with fewer parameters while preserving the equilibrium structure at coarser levels? The selectorate model is compressible due to linearity and budget invariance. In general, the compression will be lossy10.

The decision count also constrains which hierarchies are feasible. Deeper hierarchies require exponentially larger populations (the bottom level alone scales as \((rs)^{n-1}\)), which is consistent with deep patronage hierarchies being historically associated with empires rather than city-states.

Conclusion

The selectorate model distills the logic of political survival into a small set of parameters. When the winning coalition \(W\) is small relative to the selectorate \(S\), private loyalty-buying is cheap and the leader retains wide discretion. As \(W\) grows, the cost of targeted transfers increases (\(p^{\min} = Br\)), squeezing rents. In the linear model, this does not by itself produce public goods: the incumbent uses targeted transfers throughout unless \(W \geq N/\beta\) (an extreme corner). The classic BdM result, where public goods provision increases smoothly with \(W\), requires concavity in \(v\). What the linear model does establish cleanly is the challenger’s best-response switch at \(\beta S = N\) and the three-coefficient geometry.

The hierarchical composition result is a separate mechanism. Depth compounds the effective cost \(r^{\text{eff}}\) through the continued-fraction recursion, which can make the targeted channel unviable even when no single level does. This pushes deep hierarchies toward channels with lower attenuation (conditionally on public goods being less attenuated than targeted transfers, which is a modeling assumption, not a theorem).
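To make the compounding concrete, here is a minimal sketch that iterates the recursion \(r_k^{\text{eff}} = r_k/(1-r_{k+1}^{\text{eff}})\) under the simplifying assumption that every level has the same share \(r\). The function name, the depth cap, and the example shares are illustrative assumptions.

```python
from fractions import Fraction


def first_unviable_depth(r, max_depth=50):
    """Smallest depth at which r_eff reaches or exceeds 1, else None.

    Assumes every level has the same share r (a simplification).
    The boundary case r_eff == 1 is counted as unviable here,
    which keeps the iteration clear of the pole at 1.
    """
    x = r  # bottom level: r_n^eff = r_n
    for depth in range(1, max_depth + 1):
        if x >= 1:
            return depth
        x = r / (1 - x)  # stack one more level on top
    return None  # still viable at every depth up to max_depth


print(first_unviable_depth(Fraction(3, 10)))  # 6
print(first_unviable_depth(Fraction(1, 5)))   # None
```

A share of \(3/10\) is viable at every single level, yet six stacked levels push the effective share above 1. In this uniform case the map \(x \mapsto r/(1-x)\) has a real fixed point only when \(r \leq 1/4\) (the roots of \(x^2 - x + r = 0\)), so any share above \(1/4\) eventually becomes unviable at sufficient depth, although shares just above \(1/4\) may need more than the 50 levels this sketch checks.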

The leader’s problem is to design an incentive scheme that induces loyalty among a coalition of agents. The model provides a closed-form solution for the minimum cost, and characterizes how it changes with institutional parameters. The hierarchical composition shows how local incentive problems aggregate into global constraints.

Is this continued-fraction structure specific to linear payoffs and budget invariance, or is it generic to any model where local equilibrium constraints rescale upstream transfers? The hierarchy composition acts via \(PGL(2)\) (see appendix) on the effective cost share, with each level contributing a non-diagonal matrix \(M_k = \bigl(\begin{smallmatrix} 0 & r_k \\ -1 & 1 \end{smallmatrix}\bigr)\).

The channel-compression and hierarchical Möbius structure derived here rest on linearity, budget invariance, and symmetric inclusion. Whether similar renormalization-style recursions survive under more general transfer technologies or informational frictions remains an open question and suggests a broader research program.

Caveats

The selectorate model is elegant and generates sharp predictions. It is also, in certain popular treatments, sometimes oversold.

  1. Binary loyalty. Coalition members choose Loyal or Defect. Real political actors face a spectrum of options: partial cooperation, conditional support, hedging, signaling.

  2. No information asymmetry. Everyone observes the leader’s allocation \(p\), the challenger’s strategy, and the coalition structure. Real authoritarian politics is rife with private information: leaders don’t know who is truly loyal, coalition members don’t know the leader’s true budget, and challengers can’t credibly commit to future allocations. Models that incorporate these features (e.g., Egorov and Sonin, 2011, on dictators and their viziers) yield richer and sometimes different predictions.

  3. Linear public goods. With linear payoffs, the leader always uses one channel or the other, never a mix, and uses targeted transfers exclusively unless \(W \geq N/\beta\) (typically infeasible). The headline qualitative result, that large coalitions push leaders toward public goods, does not follow from the linear model; it requires concavity in \(v\), which produces interior solutions with \(g^* > 0\) increasing smoothly in \(W\). Concave returns can also generate public goods provision outside the democratic region, since the high marginal return at low \(g\) can justify some public spending even when \(\beta/N < 1/S\). The linear model captures the regime switch and the three-coefficient geometry but misses this interior structure.

  4. Exogenous institutions. The model takes \(W\) and \(S\) as given. But real leaders actively manipulate these parameters: expanding the selectorate (extending suffrage), shrinking the coalition (purging rivals), creating new institutional structures. Endogenizing \(W\) and \(S\) is a much harder problem11.

  5. Oversimplified mapping to real regimes. Popular presentations sometimes map \(W/S\) ratios too directly onto regime types: “democracy = large \(W\), dictatorship = small \(W\).” Reality is messier. Some democracies have effectively small winning coalitions (gerrymandered single-party states); some autocracies maintain large coalitions (Singapore’s PAP). The model provides useful intuition about incentives but should not be mistaken for a precise taxonomy of political systems.

None of this invalidates the model. The selectorate framework remains one of the most productive formal theories in comparative politics. But its predictions are best understood as comparative statics about incentives, not as iron laws of political behavior.
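Caveat 3 can be made concrete with the concave specification mentioned in the footnotes, \(v(g) = \beta g^\gamma / N\) with \(\gamma < 1\), and the first-order condition \(v'(g^*) = 1/W\). Solving gives \(g^* = (\beta\gamma W/N)^{1/(1-\gamma)}\). The sketch below evaluates that expression; it ignores the budget cap \(g \leq B\), and the function name and parameter values are arbitrary illustrations rather than calibrated numbers.

```python
def interior_public_goods(W, N, beta, gamma):
    """Solve v'(g) = 1/W for v(g) = beta * g**gamma / N, with 0 < gamma < 1.

    Ignores the budget cap g <= B (an assumption of this sketch).
    """
    return (beta * gamma * W / N) ** (1.0 / (1.0 - gamma))


def marginal_value(g, N, beta, gamma):
    """v'(g) for the same concave specification."""
    return beta * gamma * g ** (gamma - 1.0) / N


N, beta, gamma = 1000.0, 50.0, 0.5  # illustrative values only
for W in (50.0, 100.0, 200.0, 400.0):
    g_star = interior_public_goods(W, N, beta, gamma)
    # The first-order condition holds when v'(g*) matches 1/W.
    print(f"W={W:.0f}  g*={g_star:.4f}  v'(g*)={marginal_value(g_star, N, beta, gamma):.5f}  1/W={1.0 / W:.5f}")
```

With these values \(g^*\) rises smoothly as \(W\) grows, which is the interior behaviour the linear model cannot produce.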

Appendix

Additional Model Analysis

Projective Structure

The minimum transfer share \(p^{\min}/B = r = W/S\) has clean structural properties:

Budget Invariance

\(B\) cancels completely. A rich autocracy and a poor autocracy have identical equilibrium shares. This is a formal version of the “institutions, not resources” thesis. A windfall from a single source of wealth (e.g., oil) increases \(B\) without changing \(p^{\min}/B\), so the discretionary surplus \(B(1-r)\) grows proportionally. Foreign aid has the same problem if it enters as \(B\). More money flowing to a small-coalition regime likely makes governance worse, not better.

Population Scaling

\((W, S) \to (\lambda W, \lambda S)\) leaves \(p^{\min}/B\) invariant. Only the ratio between \(W\) and \(S\) matters.

Linearity and Projective Structure

In the one-decision model, \(p^{\min}/B = r\) is simply linear. The interesting structure emerges when we consider hierarchy. The recursion \(r_k^{\text{eff}} = r_k/(1 - r_{k+1}^{\text{eff}})\) is a Möbius transformation in \(r_{k+1}^{\text{eff}}\). Work in projective coordinates: identify nonzero vectors \((x,y) \sim (\lambda x,\lambda y)\) for \(\lambda \neq 0\). On the affine chart \(y \neq 0\), the coordinate is \(r = x/y\).

To see the matrix structure, represent \(r\) as the vector \((r, 1)^T\). A \(2 \times 2\) matrix \(\bigl(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\bigr)\) sends \((r,1)^T \mapsto (ar+b,\, cr+d)^T\), which corresponds in the affine chart \(cr+d \neq 0\) to the value \((ar+b)/(cr+d)\). The recursion \(r_k/(1-x)= (0 \cdot x + r_k)/(-1 \cdot x + 1)\) gives \(a = 0\), \(b = r_k\), \(c = -1\), \(d = 1\):

\[ M_k = \begin{pmatrix} 0 & r_k \\ -1 & 1 \end{pmatrix} \in PGL(2) \]

For example, \(M_k\) sends \((r_{k+1}^{\text{eff}}, 1)^T \mapsto (r_k,\, 1-r_{k+1}^{\text{eff}})^T\), representing \(r_k/(1-r_{k+1}^{\text{eff}}) = r_k^{\text{eff}}\). For \(n\) levels, the composition \(r^{\text{eff}} = f_1(f_2(\cdots f_{n-1}(r_n)))\) corresponds to a matrix product. Write the bottom parameter as the vector \((r_n, 1)^T\). Then

\[ (M_1 M_2 \cdots M_{n-1})(r_n, 1)^T = (x, y)^T \]

and the effective share is the affine coordinate \(r^{\text{eff}} = x/y\) (when \(y \neq 0\)).

These matrices are not diagonal; composition is genuinely \(PGL(2)\), not just scaling. For three levels:

\[ M_1 M_2 = \begin{pmatrix} 0 & r_1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 0 & r_2 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} -r_1 & r_1 \\ -1 & 1-r_2 \end{pmatrix} \]

Applied to \(r_3\), this gives \((-r_1 r_3 + r_1)/(-r_3 + 1 - r_2) = r_1(1-r_3)/(1-r_2-r_3)\). The off-diagonal entries are nonzero, so the composition is not a pure rescaling. The pole at \(r_{k+1}^{\text{eff}} = 1\) (where the sub-hierarchy consumes the entire transfer) is sent to the point at infinity of the projective line; it is not a fixed point of \(M_k\), whose fixed points solve \(x^2 - x + r_k = 0\). The viability boundary \(r^{\text{eff}} \leq 1\) is the condition that the effective cost share does not exceed the budget12.
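The equivalence between the matrix product and the continued-fraction recursion can be checked with exact rational arithmetic. This is a minimal sketch: the helper names and the shares \(r_1 = 1/10\), \(r_2 = 1/5\), \(r_3 = 3/10\) are illustrative assumptions.

```python
from fractions import Fraction
from functools import reduce


def level_matrix(r):
    """M_k = [[0, r_k], [-1, 1]], acting on (x, 1)^T as x -> r_k / (1 - x)."""
    return ((0, r), (-1, 1))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def r_eff_matrix(shares):
    """Apply M_1 M_2 ... M_{n-1} to (r_n, 1)^T and read off x / y."""
    *upper, bottom = shares
    identity = ((1, 0), (0, 1))
    product = reduce(matmul, (level_matrix(r) for r in upper), identity)
    x = product[0][0] * bottom + product[0][1]
    y = product[1][0] * bottom + product[1][1]
    return x / y  # affine coordinate; undefined when y == 0


def r_eff_recursion(shares):
    """Continued-fraction recursion, evaluated from the bottom level up."""
    x = shares[-1]
    for r in reversed(shares[:-1]):
        x = r / (1 - x)
    return x


shares = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]  # top to bottom
print(r_eff_matrix(shares))     # 7/50
print(r_eff_recursion(shares))  # 7/50
```

Both routes agree with the three-level formula \(r_1(1-r_3)/(1-r_2-r_3) = 0.1 \cdot 0.7 / 0.5 = 7/50\).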

With \(n\) spending channels, the loyalty constraint is a linear inequality in \(n\) spending variables, and the constraint coefficients live in \((\mathbb{RP}^n)^*\). The natural conjecture is that hierarchical composition acts via \(PGL(n+1)\) on this coefficient space. This post derives the single-channel case; the general multi-channel composition remains open.

Code

We can encode the model as a differentiable PyTorch module, making the comparative statics computable rather than just algebraic. Caveat Emptor: Claude wrote this code.

from dataclasses import dataclass

import torch
import torch.nn as nn

@dataclass
class SelectorateEquilibrium:
    p_min: torch.Tensor              # minimum targeted transfer
    rents: torch.Tensor              # B - p_min
    coalition_payoff: torch.Tensor   # p_min / W
    defection_payoff: torch.Tensor   # B / S
    inclusion_prob: torch.Tensor     # W / S
    loyalty_margin: torch.Tensor     # coalition - defection


class SelectorateModel(nn.Module):
    def __init__(self, W=10.0, S=100.0, B=100.0):
        super().__init__()
        self.W = nn.Parameter(torch.tensor(W))
        self.S = nn.Parameter(torch.tensor(S))
        self.B = nn.Parameter(torch.tensor(B))

    @property
    def r(self):
        return self.W / self.S

    def p_min(self):
        return self.B * self.r

    def tau(self):
        """Transfer multiplier 1 / (1 - r): the reciprocal of kappa, clamped to avoid division by zero."""
        r = self.r
        return 1.0 / torch.clamp(1.0 - r, min=1e-8)

    def kappa(self):
        """Attenuation factor: fraction of transfer that passes through."""
        return 1.0 - self.r

    def forward(self):
        p = self.p_min()
        rents = self.B - p
        coalition_pay = p / self.W
        defection_pay = self.B / self.S
        return SelectorateEquilibrium(
            p_min=p, rents=rents,
            coalition_payoff=coalition_pay, defection_payoff=defection_pay,
            inclusion_prob=self.r,
            loyalty_margin=coalition_pay - defection_pay,
        )

The \(p^{\min} = B \cdot W/S\) formula, wrapped so that PyTorch’s autograd can differentiate through it:

model = SelectorateModel(W=10.0, S=100.0, B=100.0)
eq = model()
eq.rents.backward()
print(f"d(rents)/dW = {model.W.grad:.4f}")  # negative: more W, less rents

Each additional coalition member decreases rents by \(B/S\) (here \(\partial(\text{rents})/\partial W = -1\)), confirming that larger coalitions squeeze the leader’s discretionary surplus.

There is no separate hierarchical model. A hierarchy is just composition of flat selectorates. Each level has its own SelectorateModel. Hierarchical composition follows the continued-fraction recursion \(r_k^{\text{eff}} = r_k/(1-r_{k+1}^{\text{eff}})\), not multiplication of per-level factors:

def r_eff_from_levels(*levels):
    """
    levels ordered top to bottom.
    returns the top-level effective share r_eff under the recursion
    r_n^eff = r_n,  r_k^eff = r_k / (1 - r_{k+1}^eff).
    """
    x = levels[-1].r
    for level in reversed(levels[:-1]):
        x = level.r / torch.clamp(1.0 - x, min=1e-8)
    return x

def p_min_composed(top, *sub_levels):
    """Top-level p_min accounting for hierarchy composition."""
    r_eff = r_eff_from_levels(top, *sub_levels)
    return top.B * r_eff

# Two-level hierarchy: autocratic sub-leaders
top = SelectorateModel(W=10.0, S=100.0, B=100.0)
sub_auto = SelectorateModel(W=10.0, S=100.0)
print(f"kappa = {sub_auto.kappa():.3f}")                     # 0.900
print(f"p_min = {p_min_composed(top, sub_auto):.3f}")         # 11.111

# Democratic sub-leaders
sub_dem = SelectorateModel(W=80.0, S=100.0)
print(f"kappa = {sub_dem.kappa():.3f}")                       # 0.200
print(f"p_min = {p_min_composed(top, sub_dem):.3f}")          # 50.000

# Gradient: how does sub-level democratization affect top cost?
p_min_composed(top, sub_auto).backward()
print(f"dp/dW2 = {sub_auto.W.grad:.4f}")

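Because everything is differentiable, we can compute how sensitive the top-level equilibrium is to sub-level institutional changes. The gradient \(\partial p^{\min}_1 / \partial W_2 = B_1 r_1 / \bigl(S_2 (1 - r_2)^2\bigr)\) tells us how much expanding the sub-level coalition costs the top leader; for the autocratic sub-level above it is about \(0.123\) per additional sub-level coalition member.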

AI Disclosure

I used Claude to help draft, revise, and edit this essay. Claude wrote the caveats section and the code. I did ideation, and also made significant edits, reviews, and revisions to the text.

Novelty

The channel attenuation framing, the compression of uniform-transfer subset channels to three coefficients (\(1/W\), \(1/S\), \(\beta/N\)), the continued-fraction recursion \(r_k^{\text{eff}} = r_k/(1-r_{k+1}^{\text{eff}})\), the \(PGL(2)\) representation of hierarchy composition via \(M_k = \bigl(\begin{smallmatrix} 0 & r_k \\ -1 & 1 \end{smallmatrix}\bigr)\), and the renormalization interpretation are, to the author’s knowledge, novel observations that do not appear in the original selectorate theory literature. The differential attenuation insight (targeted channels degrade through hierarchy while universal channels do not) is a modeling assumption that generates the depth-selects-channel-switching result. The \(PGL(n+1)\) generalization to multi-channel composition is conjectured but not derived here.

Footnotes

  1. The full model includes additional complexities such as multiple rounds, discounting, and the possibility of the challenger being a former coalition member. For now, we’ll focus on the static version to learn about the core insights.↩︎

  2. This is a key modeling assumption and an upper bound. The challenger has no particular loyalty to those who helped them seize power and can pick any \(W_r\) out of \(S\). Note that \(W_r\) doesn’t necessarily equal the incumbent’s \(W_n\); the challenger can form a coalition of a different size. Crucially, the standard BdM model does not explicitly model punishment of the defectors for failed defection. The entire cost of defection comes from the inclusion lottery. In reality, failed defectors in autocracies are purged, imprisoned, or killed, which would introduce a deposition probability and a punishment payoff into the constraint.↩︎

  3. Using \(W_r\) makes this technically the maximum payout for the defector. This is desired as it makes the game “adversarial” for the leader. As we see, the size of the challenger’s coalition ends up not affecting the expected payoff. This is a consequence of the uniformity assumption we made. If inclusion were non-uniform (e.g., the challenger preferentially recruits defectors), \(W_r\) would matter and the bound would tighten for the leader. This also creates incentives for the leader to distribute private goods equally among coalition members, to minimize the chance of a “weak link” spoiling their coalition.↩︎

  4. The linear specification produces corner solutions: both the challenger and incumbent solve linear programs over the spending simplex and pick corners. The challenger’s best-response switch at \(\beta/N = 1/S\) exists without concavity, but the leader always uses one channel or the other, never a mix. Concave \(v\) (e.g., \(v = \beta g^\gamma / N\) with \(\gamma < 1\)) produces interior solutions with \(g^* > 0\) increasing smoothly in \(W\), and operates across the full parameter range. This is the classic BdM result. The original Logic of Political Survival handles the general case.↩︎

  5. In BdM, the result is a bit more realistic due to concavity in \(v\). This produces interior solutions where the leader provides a mix of public and private goods. With concave \(v\), the optimum satisfies \(v'(g^*) = 1/W\): the marginal loyalty per dollar must equalize across channels. As \(W\) grows, \(1/W\) shrinks, so \(g^*\) increases smoothly. Concavity can also generate public goods provision outside the democratic region, since the high marginal return at low \(g\) can justify some public spending even when \(\beta/N < 1/S\). The linear model captures the regime switch but misses this interior structure. For us, the key point is that the challenger and incumbent optimize across channels, and the selectorate geometry compresses to three coefficients: \(1/W\), \(1/S\), and \(\beta/N\).↩︎

  6. We could generalize even further to allow for arbitrary inclusion probabilities, per-member payoffs, multiple types of currencies, etc., but the uniform-inclusion assumption suffices to show the compression result.↩︎

  7. This is the cleanest case, but you can imagine parallel hierarchies (overlapping sub-selectorates, matrix organizations) or cross-level externalities that break the modular structure. Corporate conglomerates and federal systems with concurrent jurisdiction are examples where the parallel case matters.↩︎

  8. The same typology applies to firms. Map \(W\) to key employees whose departure threatens the firm, \(S\) to the labor market, \(B\) to the compensation budget, \(p\) to targeted retention (bonuses, equity grants), and defection to leaving for a competitor. Startups are flat autocracies (founder and a few key people, targeted equity, “founder-mode”). Partnerships and cooperatives are flat democracies (broad profit-sharing). Conglomerates with deep management layers and patronage-heavy compensation (GE under Welch) occupy the fragile quadrant. Large tech companies with broad equity compensation occupy the viable one. The hierarchy tax predicts that middle managers consume transfers before passing them down, attenuating the targeted channel, which is why deep corporate hierarchies either move toward broad compensation or suffer talent drain at the bottom.↩︎

  9. See footnote 8.↩︎

  10. This connects to a broader research programme I’ve been thinking about on coalition formation as information compression. The general claim is that whenever maintaining cooperation is a control problem under uncertainty, viable coalitions are those that achieve the target cooperative outcome with minimal information rate, but that’s beyond the scope of this note.↩︎

  11. Bueno de Mesquita and Smith’s “Political Survival and Endogenous Institutional Change” (2009) makes some progress on this, modeling institutional change as an equilibrium outcome. But the endogenous-institutions version is considerably more complex and less clean than the baseline model.↩︎

  12. The matrix representation makes the algebraic structure explicit. Each level contributes \(M_k \in PGL(2)\); the total composition is \(\prod M_k\); the viability boundary is \(r^{\text{eff}} \leq 1\); and the pole at \(r_{k+1}^{\text{eff}} = 1\) is mapped to the point at infinity. Population scaling \((W, S) \to (\lambda W, \lambda S)\) acts trivially on \(r = W/S\), confirming that only the projective coordinate matters.↩︎