Survey of Classical Invariant Theory

Symmetry and Structure
Exposition
Published

March 8, 2026

Eadweard Muybridge. *The Horse in Motion*. (1878)

Introduction

Invariant theory is the study of how symmetries constrain the structure of mathematical objects (similar to Noether’s theorem). In this post, I will give a brief introduction to invariant theory and its applications.

Invariant theory is a vast field. I’m pulling mainly from the book Invariant Theory by Peter Olver (but with changes in order and notation), but will also thread in and ultimately work with computational approaches, such as those described in the book Computational Invariant Theory by Harm Derksen and Gregor Kemper. Throughout the post, I’ll use the running example of binary forms (homogeneous polynomials in two variables) and the action of \(GL(2)\) on them, which is the classical setting for invariant theory1.

I also make use of Claude and GPT where appropriate (although I have personally reviewed all the outputs). These are my notes, not a textbook or peer-reviewed paper. As always, be cautious of mistakes in my exposition, check them against the sources, and let me know if you find any errors.

Background

Invariant theory began with the study of polynomials and their geometric properties (those properties that do not depend on a particular choice of coordinates). For example, the multiplicity patterns of roots of a polynomial are invariant under changes of coordinates. The discriminant of a polynomial is an invariant that tells us whether the roots are distinct or not. The coefficients of a polynomial are not invariant, but they transform in a specific way under changes of coordinates. If you have a systematic way to determine the invariants of a polynomial, you can classify and understand its geometric properties without reference to a particular coordinate system.

Homogeneous Polynomials

We start by considering homogeneous polynomials (with coefficients drawn from a field \(k\) of characteristic zero), also called “forms”. A binary form is a degree-\(n\) homogeneous polynomial in two variables, defined as

\[ Q(x,y) = \sum_{i=0}^n {n \choose i} a_i x^{n-i} y^i \]

We are interested in the geometric properties of these forms, by which we mean properties that do not depend on a particular choice of coordinates. Since we don’t care about the choice of coordinates, we should consider the transformations that can change those coordinates. In this case, linear changes of variables:

\[ \bar{x} = a x + b y \] \[ \bar{y} = c x + d y \] \[ ad - bc \neq 0 \]

This should remind you of an invertible matrix transformation (with nonzero determinant), and indeed we can write this as: \[ \begin{bmatrix}\bar{x} \\ \bar{y}\end{bmatrix} = \begin{bmatrix}a & b \\ c & d\end{bmatrix} \begin{bmatrix}x \\ y\end{bmatrix} \]

So we can transform one binary form to another by:

\[ \bar{Q}(\bar{x}, \bar{y}) = Q(a x + b y, c x + d y) = \sum_{i=0}^n {n \choose i} a_i (a x + b y)^{n-i} (c x + d y)^i \]

Olver gives an explicit formula for the coefficients of the transformed form, but we won’t need it here. The point is that we can transform one form to another by applying a linear transformation to the variables.
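To make this concrete, here is a small sketch (the helper `transform` and its conventions are mine, not Olver’s explicit formula) that computes the transformed coefficients \(\bar{a}_j\) directly from the binomial expansion, using exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def transform(coeffs, a, b, c, d):
    """Coefficients [a_0, ..., a_n] of Q(x,y) = sum_i C(n,i) a_i x^(n-i) y^i
    after the substitution x -> a*x + b*y, y -> c*x + d*y."""
    n = len(coeffs) - 1
    out = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(coeffs):
        # Expand C(n,i) a_i (a x + b y)^(n-i) (c x + d y)^i by the binomial theorem.
        for r in range(n - i + 1):           # y-power contributed by (a x + b y)^(n-i)
            for s in range(i + 1):           # y-power contributed by (c x + d y)^i
                j = r + s                    # total power of y in this term
                out[j] += (Fraction(comb(n, i) * ai)
                           * comb(n - i, r) * a**(n - i - r) * b**r
                           * comb(i, s) * c**(i - s) * d**s)
    # Divide out the binomial normalization to recover the a_i convention.
    return [out[j] / comb(n, j) for j in range(n + 1)]

# Example: the linear form Q = x (coeffs [1, 0]) becomes a*x + b*y.
print(transform([1, 0], 2, 3, 5, 7))  # -> [Fraction(2, 1), Fraction(3, 1)]
```

For instance, with the matrix \(\begin{psmallmatrix}2 & 3 \\ 5 & 7\end{psmallmatrix}\) the form \(x\) maps to \(2x + 3y\), as the printed output shows.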

For each homogeneous polynomial (of fixed degree \(n\)), and given a scalar variable \(p\), we can also associate a corresponding inhomogeneous polynomial 2 in \(p\) by substituting \(x = p\) and \(y = 1\). That is:

\[ Q(x, y) = y^n \] \[ q(p) = Q(p, 1) = 1 \]

Conversely, we can associate any inhomogeneous polynomial in one variable, \(q(p)\), of degree at most \(n\), with a binary form by substituting \(p = x/y\) and multiplying by \(y^n\). That is, if we specify \(n\) in advance, we have:

\[ Q(x,y) = y^n q(x/y) = \sum_{i=0}^n {n \choose i} a_i x^{n-i} y^i \]

We now have two transformations, one on coordinates and one between homogeneous and inhomogeneous polynomials. We can combine these two transformations to get a transformation on inhomogeneous polynomials.

This leads us to view our transformation as a linear fractional transformation:

\[ \bar{p} = \frac{a p + b}{c p + d} \]

such that

\[ \bar{q}(p) = (c p + d)^n q(\bar{p}) = (c p + d)^n q\left(\frac{a p + b}{c p + d}\right) \]
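As a sanity check, we can verify that the inhomogeneous transformation rule agrees with the homogeneous substitution evaluated at \((p, 1)\). This is a quick sketch with an arbitrarily chosen cubic (the coefficients and matrix are mine), using exact arithmetic:

```python
from fractions import Fraction
from math import comb

def Q(coeffs, x, y):
    """Evaluate the binary form sum_i C(n,i) a_i x^(n-i) y^i."""
    n = len(coeffs) - 1
    return sum(comb(n, i) * ai * x**(n - i) * y**i for i, ai in enumerate(coeffs))

coeffs = [1, 2, -1, 3]        # an arbitrary cubic binary form (n = 3)
a, b, c, d = 2, 1, 1, 1       # ad - bc = 1
n = len(coeffs) - 1

for p in [Fraction(0), Fraction(1, 2), Fraction(-3)]:
    pbar = (a * p + b) / (c * p + d)              # the linear fractional transform
    lhs = (c * p + d)**n * Q(coeffs, pbar, 1)     # (cp+d)^n q(p-bar)
    rhs = Q(coeffs, a * p + b, c * p + d)         # homogeneous substitution at (p, 1)
    assert lhs == rhs
print("homogeneous and inhomogeneous pictures agree")
```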

Invariants and Covariants

We are now ready to define an invariant for a binary form. An invariant is a function of the coefficients of the form that is unchanged by the linear transformation.

Definition 1 (Invariant) An invariant of a binary form of degree \(n\) is a function \(I(a_0, a_1, \ldots, a_n)\) of the coefficients such that (up to some factor) it does not change under linear transformation.

\[ I(a_0, a_1, \ldots, a_n) = (ad - bc)^k I(\bar{a}_0, \bar{a}_1, \ldots, \bar{a}_n) \]

The power \(k\) is called the weight of the invariant. If \(k=0\), then the invariant is called an absolute invariant, since it is completely unchanged by the transformation.
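As a concrete example, recall the discriminant from the Background section. For a binary quadratic it picks up exactly the factor \((ad - bc)^2\) under a linear change of variables, making it an invariant of weight \(2\) (up to the sign convention chosen for the exponent). A minimal sketch, with helper names of my own choosing:

```python
# Binary quadratic Q = a0*x^2 + 2*a1*x*y + a2*y^2, encoded by its coefficients;
# its discriminant is disc = a0*a2 - a1^2 (the determinant of [[a0,a1],[a1,a2]]).
def disc(a0, a1, a2):
    return a0 * a2 - a1**2

def substitute(a0, a1, a2, a, b, c, d):
    """Coefficients of Q(a*x + b*y, c*x + d*y), expanded by hand."""
    b0 = a0 * a**2 + 2 * a1 * a * c + a2 * c**2           # new x^2 coefficient
    b1 = a0 * a * b + a1 * (a * d + b * c) + a2 * c * d   # new mixed coefficient
    b2 = a0 * b**2 + 2 * a1 * b * d + a2 * d**2           # new y^2 coefficient
    return b0, b1, b2

a0, a1, a2 = 3, 1, 2
a, b, c, d = 1, 2, 3, 4                                   # det = ad - bc = -2
b0, b1, b2 = substitute(a0, a1, a2, a, b, c, d)
assert disc(b0, b1, b2) == (a * d - b * c)**2 * disc(a0, a1, a2)
print("discriminant transforms with the factor (ad - bc)^2")
```

In matrix language this is just \(\det(B^T M B) = (\det B)^2 \det M\) for the symmetric matrix \(M\) of the quadratic.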

We can also define a covariant, which is a function of the coefficients and the variables that transforms in a specific way under the linear transformation.

Definition 2 (Covariant) A covariant of weight \(k\) of a binary form of degree \(n\) is a function \(J(a_0, a_1, \ldots, a_n; x, y)\) such that

\[ J(a_0, a_1, \ldots, a_n; x, y) = (ad - bc)^k J(\bar{a}_0, \bar{a}_1, \ldots, \bar{a}_n; \bar{x}, \bar{y}) \]

So an invariant is a covariant that does not depend on the variables.

Product of Covariants

Given two covariants \(J_1, J_2\) of weight \(k\) and \(l\) respectively, their product is a covariant of weight \(k + l\):

\[ J_1(\mathbf{a}; x, y) \cdot J_2(\mathbf{a}; x, y) = (ad - bc)^{k+l} \, J_1(\bar{\mathbf{a}}; \bar{x}, \bar{y}) \cdot J_2(\bar{\mathbf{a}}; \bar{x}, \bar{y}) \]

Sum of Covariants

Given two covariants \(J_1, J_2\) of the same weight \(k\), their sum is also a covariant of weight \(k\):

\[ \begin{aligned} J_1(\mathbf{a}; x, y) + J_2(\mathbf{a}; x, y) &= (ad - bc)^k \, J_1(\bar{\mathbf{a}}; \bar{x}, \bar{y}) + (ad - bc)^k \, J_2(\bar{\mathbf{a}}; \bar{x}, \bar{y}) \\ &= (ad - bc)^k \left( J_1(\bar{\mathbf{a}}; \bar{x}, \bar{y}) + J_2(\bar{\mathbf{a}}; \bar{x}, \bar{y}) \right) \end{aligned} \]

The constant \(0\) is trivially a covariant of any weight, and the constant \(1\) is a covariant of weight \(0\). Therefore, covariants of a fixed weight form a vector space, and the set of all covariants forms a ring, graded by weight. We call this the algebra of polynomial covariants in \(k[a_0, \ldots, a_n, x, y]\) (over a field \(k\) of characteristic zero). The invariants (covariants of weight \(0\) that do not depend on \(x, y\)) form a subring in \(k[a_0, \ldots, a_n]\).

Representation Theory

We are interested in how GL(2) acts on the space of binary forms. Since binary forms are already polynomials, we can think of them as vectors in a vector space. So we have a group (GL(2)) acting on a vector space (the space of binary forms).

Since this is a map from the group GL(2) to the general linear group of the vector space of binary forms, it is a representation (by definition). Therefore, we can use tools from the representation theory of GL(2) to help understand what kinds of actions GL(2) can perform on the polynomials.

(Note that this entails finding other matrices, not necessarily \(2 \times 2\), that represent the group elements of GL(2)).

Furthermore, if we can decompose this representation into irreducible subrepresentations, we will know which subspaces of the space of binary forms “map to themselves” under the action of GL(2) (or even better, are trivial). By studying those subspaces, we can find invariants and covariants.

So let’s start by reviewing some basic definitions from representation theory, and then apply them to the specific case of binary forms.

(Before we proceed, note that we may switch between \(SL(2)\) and \(GL(2)\). For representation theory (Clebsch-Gordan, Schur’s lemma), it is cleaner to work with \(SL(2)\), since it removes the determinant character and makes the symmetric powers \(V(n)=\mathrm{Sym}^n(\mathbb C^2)\) irreducible representations. For classical invariant theory, the natural transformation group on binary forms is \(GL(2)\), and invariants for \(GL(2)\) typically transform by a power of \(\det(g)\). Equivalently, \(GL(2)\)-invariants are \(SL(2)\)-invariants with additional bookkeeping for the determinant weights.)

Representations

Definition 3 (Representation) A representation of a group \(G\) on a vector space \(V\) is a homomorphism \(\rho: G \to GL(V)\).

We define for each \(g \in G\) an invertible linear map \(\rho(g): V \to V\), such that \(\rho(gh) = \rho(g)\rho(h)\) and \(\rho(e) = \text{id}\), where \(e\) is the identity element of \(G\).

We often suppress \(\rho\) and write \(g \cdot v\) for \(\rho(g)(v)\). By convention we refer to \(V\) as a “representation of \(G\)”, even though \(V\) by itself is just a vector space. Technically “a representation” means the map \(\rho\), but since \(G\) is usually fixed, the representation is often labelled by its target space.

A subrepresentation of \(V\) (with action \(\rho\)) is a subspace \(W \subseteq V\) such that for all \(g \in G\), \(w \in W\) we have \(g \cdot w \in W\) (\(W\) is closed under the action of \(G\)).

It’s worth noting that if we choose a basis for \(V\), each \(\rho(g)\) can be associated with an invertible matrix. The representation is then a map \(\rho: G \to GL(n)\), where \(n = \dim V\). A change of basis conjugates every \(\rho(g)\) by the same element of \(GL(n)\), so the matrix form of the representation is well-defined up to conjugation.

Irreducibility and Complete Reducibility

Definition 4 (Irreducible Representation) We say that a representation \(V\) is irreducible if its only subrepresentations are \(\{0\}\) and \(V\) itself.

Definition 5 (Completely Reducible Representation) We say that a representation \(V\) is completely reducible if it decomposes as a direct sum of irreducible representations:

\[ V \cong V_1 \oplus V_2 \oplus \cdots \oplus V_k \]

where each \(V_i\) is a subspace of \(V\) and \(G\) acts on each \(V_i\) by restricting the original action.

In other words, a representation \(V\) is completely reducible if every vector in \(V\) can be uniquely written as a sum of vectors from the various \(V_i\)’s.

As an example, consider the action of \(S_3\) on \(\mathbb{C}^3\), where \(S_3\) permutes coordinates.

\[ \begin{aligned} \sigma \cdot (v_1, v_2, v_3) &= (v_{\sigma^{-1}(1)}, v_{\sigma^{-1}(2)}, v_{\sigma^{-1}(3)}) \end{aligned} \]

Complete Reducibility Example

Consider the subspace \(\{(v_1, v_2, v_3) \mid v_1 + v_2 + v_3 = 0\}\) of vectors whose coordinates sum to zero.

All elements of this subspace can be written as (\(v_1, v_2, -v_1 - v_2\)) for some \(v_1, v_2 \in \mathbb{C}\).

If we apply \(\sigma\) to a vector in this subspace, we get:

\[ \sigma \cdot (v_1, v_2, v_3) = (v_{\sigma^{-1}(1)}, v_{\sigma^{-1}(2)}, v_{\sigma^{-1}(3)}) \] \[ = (v_{\sigma^{-1}(1)}, v_{\sigma^{-1}(2)}, - v_{\sigma^{-1}(1)} - v_{\sigma^{-1}(2)}) \]

which is an element of the same subspace.

Now consider the subspace \(\text{Span}\{ (1,1,1) \}\) (i.e. all vectors with equal coordinates). This is also closed under the action of \(S_3\).

Since the two subspaces intersect only at the zero vector, and their dimensions (two and one, respectively) sum to three, we have a direct sum decomposition:

\[ \mathbb{C}^3 = \text{Span}\{(1,1,1)\} \oplus \{(v_1, v_2, v_3) \mid v_1 + v_2 + v_3 = 0\} \]

So the action of \(S_3\) on \(\mathbb{C}^3\) is completely reducible because it decomposes as a direct sum of the trivial representation (where \(S_3\) fixes everything) and an irreducible 2-dimensional representation.
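We can check this decomposition numerically. The sketch below (my own, with a helper `act` implementing the permutation action above) splits a vector into its two components and verifies that every permutation preserves both subspaces:

```python
from itertools import permutations

def act(sigma, v):
    # sigma sends position i to sigma[i]; entry j of the result is v[sigma^{-1}(j)]
    out = [0] * 3
    for i in range(3):
        out[sigma[i]] = v[i]
    return tuple(out)

v = (5, -2, 4)
m = sum(v) / 3
triv = (m, m, m)                         # component in Span{(1,1,1)}
zero_sum = tuple(vi - m for vi in v)     # component with coordinates summing to zero
assert abs(sum(zero_sum)) < 1e-12

for sigma in permutations(range(3)):
    w = act(sigma, zero_sum)
    assert abs(sum(w)) < 1e-12           # the sum-zero subspace is preserved
    assert act(sigma, triv) == triv      # the diagonal line is fixed pointwise
print("C^3 = Span{(1,1,1)} (+) {sum = 0}, both S3-stable")
```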

Incomplete Reducibility Example

Not every representation is completely reducible. Consider the action of \(\mathbb{Z}\) on \(\mathbb{C}^2\) given by

\[ n \cdot (v_1, v_2) = (v_1 + n v_2, v_2) \]

The subspace \(\{(v_1, 0) \mid v_1 \in \mathbb{C}\}\) is closed under the action of \(\mathbb{Z}\), so it is a subrepresentation. However, there is no complementary subrepresentation: any invariant subspace containing a vector \((v_1, v_2)\) with \(v_2 \neq 0\) also contains \((v_1 + n v_2, v_2)\) for every \(n \in \mathbb{Z}\), and taking differences shows it contains \((v_2, 0)\), hence all of \(\mathbb{C}^2\). So this representation is not completely reducible.

Schur’s Lemma

The same group \(G\) can act on different vector spaces in different ways. Each such action is a representation. Schur’s lemma tells us what maps between two such representations can look like.

Definition 6 (Equivariant Linear Map) Consider a linear map \(\phi: V \to W\), where \(V\) and \(W\) are representations of a group \(G\). If for all \(g \in G\) and \(v \in V\), we have \(\phi(g \cdot v) = g \cdot \phi(v)\), we say that \(\phi\) is a \(G\)-equivariant linear map.

This should remind you of our earlier exploration of equivariance. The idea is that the map \(\phi\) “commutes” with the action of \(G\).

Lemma 1 (Schur’s Lemma) Let \(V\) and \(W\) be irreducible representations of \(G\) over an algebraically-closed field \(k\), and let \(\phi: V \to W\) be a \(G\)-equivariant linear map. Then:

  1. \(\phi\) is either zero or an isomorphism.
  2. If \(V = W\), then \(\phi = \lambda \cdot \text{id}\) for some scalar \(\lambda\).

Proof.

(1): If \(\phi(v) = 0\), then \(\phi(g \cdot v) = g \cdot \phi(v) = 0\), and so the kernel \(\ker \phi\) is an invariant subspace of \(V\). By irreducibility \(\ker \phi\) is either \(\{0\}\) or \(V\), as those are the only invariant subspaces of \(V\).

Similarly, the image \(\text{im}(\phi)\) is an invariant subspace of \(W\), so it is either \(\{0\}\) or \(W\). If \(\ker \phi = \{0\}\) and \(\text{im}(\phi) = W\), then \(\phi\) is an isomorphism. Otherwise \(\phi = 0\).

(2): Since \(k\) is algebraically closed, the characteristic polynomial \(\det(\phi - \lambda \cdot \text{id})\), viewed as a polynomial in \(\lambda\), has at least one root; fix such an eigenvalue \(\lambda\). Then the kernel of \(\phi - \lambda \cdot \text{id}\) is nontrivial.

Since we have \((\phi - \lambda \cdot \text{id})(g \cdot v) = \phi(g \cdot v) - \lambda (g \cdot v) = g \cdot \phi(v) - \lambda (g \cdot v) = g \cdot (\phi - \lambda \cdot \text{id})(v)\), the map \(\phi - \lambda \cdot \text{id}\) is equivariant.

Since \(\phi - \lambda \cdot \text{id}\) is equivariant and \(V\) is irreducible, by part (1) it must be zero or an isomorphism. Since the kernel is nontrivial, \(\phi - \lambda \cdot \text{id}\) is zero. So \(\phi = \lambda \cdot \text{id}\). \(\square\)

Schur’s lemma constrains maps between irreducible representations: the space of equivariant maps \(\text{Hom}_G(V, W)\) is zero if \(V \not\cong W\) and one-dimensional if \(V \cong W\). So given a completely reducible representation \(U \cong \bigoplus V_i\), the projection onto each irreducible summand is unique up to scaling.

Basically, the combination of irreducibility and the group structure reduces linear algebra to scalar algebra, which is why representation theory is so powerful for reductive groups.
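To see Schur’s lemma in action: averaging an arbitrary linear map over a group by conjugation produces an equivariant map, which must then be scalar on each isotypic piece. The sketch below (my own example, using the permutation representation of \(S_3\) on \(\mathbb{C}^3\)) averages a fixed \(3 \times 3\) matrix and checks that the result has the constrained form \(\alpha I + \beta J\), with \(J\) the all-ones matrix:

```python
from fractions import Fraction
from itertools import permutations

def perm_matrix(sigma):
    # (P v)_i = v_{sigma^{-1}(i)}, so P[i][j] = 1 exactly when sigma[j] == i.
    return [[Fraction(1) if sigma[j] == i else Fraction(0) for j in range(3)]
            for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

M = [[Fraction(x) for x in row] for row in ([1, 7, 2], [0, 3, 5], [4, 1, 6])]
R = [[Fraction(0)] * 3 for _ in range(3)]
for sigma in permutations(range(3)):
    P = perm_matrix(sigma)
    C = matmul(matmul(P, M), transpose(P))   # P M P^{-1}; P^{-1} = P^T here
    for i in range(3):
        for j in range(3):
            R[i][j] += C[i][j] / 6

# Equivariance forces R = alpha*I + beta*J: equal diagonal, equal off-diagonal.
assert R[0][0] == R[1][1] == R[2][2]
offs = [R[i][j] for i in range(3) for j in range(3) if i != j]
assert all(o == offs[0] for o in offs)
print("averaged map is alpha*I + beta*J, as Schur predicts")
```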

Symmetric Powers

Let us now apply some of these ideas to the specific case of binary forms. We want to understand how \(GL(2)\) acts on the space of binary forms of degree \(n\). We’ll use \(SL(2)\) instead of \(GL(2)\) to remove the determinant factor, which will make things simpler.

Definition 7 Denote the space of homogeneous polynomials of degree \(n\) in two variables as \(\text{Sym}^n(\mathbb{C}^2)\).

We call this the “\(n\)-th symmetric power of \(\mathbb{C}^2\)”.

If \(V = \mathbb{C}^2\) with basis \(\{e_1, e_2\}\) and coordinates \(x, y\), then \(\text{Sym}^n(\mathbb{C}^2)\) has basis \(\{x^n, x^{n-1}y, \ldots, y^n\}\) and dimension \(n + 1\).

Basically, this is just another set of notation for the space of binary forms of degree \(n\).

If \(G\) acts on \(V\) (one representation), this induces a different action of \(G\) on \(\text{Sym}^n(V)\) (a new representation, on a bigger space) by:

\[ (g \cdot f)(v) = f(g^{-1} \cdot v) \]

This is exactly the action of \(GL(2)\) on binary forms that we have been studying (with \(g^{-1}\) rather than \(g\) in the substitution, to make it a left action).

Symmetric Powers are Irreducible Representations of \(SL(2)\)

The action of \(SL(2)\) on \(\text{Sym}^n(\mathbb{C}^2)\) by coordinate substitution is an irreducible representation.

To be more specific, each \(2 \times 2\) matrix in \(SL(2)\) induces an \((n+1) \times (n+1)\) matrix on the space of degree-\(n\) binary forms, and there is no proper subspace of degree-\(n\) binary forms that all of these \((n+1) \times (n+1)\) matrices simultaneously preserve.

So we are representing elements of \(SL(2)\) (which are \(2 \times 2\) matrices) by \((n+1)\times(n+1)\) matrices acting as linear transformations on the space of binary forms.

Proof.

\(SL(2)\) acts on \(\text{Sym}^n(\mathbb{C}^2)\) by linear substitution. For \(g \in SL(2)\) and \(f(x,y) \in \text{Sym}^n(\mathbb{C}^2)\), we have:

\[ (g \cdot f)(x,y) = f\!\left(g^{-1} \begin{pmatrix} x \\ y \end{pmatrix}\right) \]

This is the action on binary forms from earlier (the \(g^{-1}\) ensures associativity \((gh) \cdot f = g \cdot (h \cdot f)\)).

So \(\text{Sym}^n(\mathbb{C}^2)\) is a representation of \(SL(2)\).

We want to show it is irreducible. To do this, we need to show that its only invariant subspaces are \(\{0\}\) and \(\text{Sym}^n(\mathbb{C}^2)\) itself.

Let \(W \subseteq \text{Sym}^n(\mathbb{C}^2)\) be some nonzero invariant subspace. We must show that \(W = \text{Sym}^n(\mathbb{C}^2)\).

Every nonzero polynomial in \(\text{Sym}^n(\mathbb{C}^2)\) factors (over \(\mathbb{C}\)) as a product of \(n\) linear forms.

  1. \(SL(2)\) acts transitively on the \(n\)-th powers of nonzero linear forms.

\(SL(2)\) acts transitively on the nonzero vectors of \(\mathbb{C}^2\). Given nonzero \(v, w \in \mathbb{C}^2\), pick \(v'\) such that \(\det[v \mid v'] = 1\) and \(w'\) such that \(\det[w \mid w'] = 1\). Then

\[ g = [w \mid w'][v \mid v']^{-1} \in SL(2) \] \[ gv = w \]

A linear form \(\ell(x,y) = \alpha x + \beta y\) is determined by its coefficient vector \((\alpha, \beta)\). Under the substitution action, the coefficient vector of a linear form transforms by \((g^{-1})^T\), and \((g^{-1})^T \in SL(2)\) whenever \(g \in SL(2)\). Since \(SL(2)\) acts transitively on nonzero coefficient vectors, it acts transitively on \(n\)-th powers of nonzero linear forms.

So, for any nonzero linear forms \(\ell, m\) there exists \(g\) with \(g \cdot \ell^n = m^n\).

  2. The \(n\)-th powers span \(\text{Sym}^n(\mathbb{C}^2)\).

Every monomial \(x^{n-k}y^k\) can be written as a linear combination of \(n\)-th powers. Expand

\[ (\alpha x + y)^n = \sum_{k=0}^n \binom{n}{k} \alpha^{n-k} x^{n-k} y^k \]

and evaluate at \(\alpha = 0, 1, 2, \ldots, n\). This gives \(n+1\) equations in the \(n+1\) unknowns \(\binom{n}{k} x^{n-k} y^k\):

\[ \begin{pmatrix} 0^n & 0^{n-1} & \cdots & 1 \\ 1^n & 1^{n-1} & \cdots & 1 \\ 2^n & 2^{n-1} & \cdots & 1 \\ \vdots & & & \vdots \\ n^n & n^{n-1} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \binom{n}{0} x^n \\ \binom{n}{1} x^{n-1}y \\ \vdots \\ \binom{n}{n} y^n \end{pmatrix} = \begin{pmatrix} y^n \\ (x+y)^n \\ (2x+y)^n \\ \vdots \\ (nx+y)^n \end{pmatrix} \]

The matrix has \((i,j)\)-entry \(i^{n-j}\) for \(i = 0, \ldots, n\) and \(j = 0, \ldots, n\). Its determinant is \(\prod_{0 \leq i < j \leq n}(j - i) \neq 0\), so the system is invertible and each monomial is a linear combination of the \(n\)-th powers on the right-hand side.
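We can carry out this spanning argument exactly. The sketch below (my own; exact rational Gauss-Jordan elimination, no libraries) solves for the coefficients expressing a monomial as a combination of cubes of linear forms, then verifies the identity by evaluation:

```python
from fractions import Fraction
from math import comb

n, k = 3, 1                  # express C(3,1)*x^2*y as a combination of (i*x + y)^3

# A[i][j] = i^(n-j), with 0^0 = 1; we need c with sum_i c_i * i^(n-j) = delta_{jk},
# i.e. we solve A^T c = e_k exactly.
A = [[Fraction(i**(n - j)) for j in range(n + 1)] for i in range(n + 1)]
M = [[A[i][j] for i in range(n + 1)] for j in range(n + 1)]   # A transposed
rhs = [Fraction(1) if j == k else Fraction(0) for j in range(n + 1)]

# Gauss-Jordan elimination with exact arithmetic (pivot exists: A is invertible).
for col in range(n + 1):
    piv = next(r for r in range(col, n + 1) if M[r][col] != 0)
    M[col], M[piv] = M[piv], M[col]
    rhs[col], rhs[piv] = rhs[piv], rhs[col]
    inv = 1 / M[col][col]
    M[col] = [m * inv for m in M[col]]
    rhs[col] *= inv
    for r in range(n + 1):
        if r != col and M[r][col] != 0:
            f = M[r][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
            rhs[r] -= f * rhs[col]
c = rhs

# Check: sum_i c_i (i*x + y)^3 equals C(3,1) x^2 y at any sample point.
for x, y in [(Fraction(2), Fraction(5)), (Fraction(-1), Fraction(3))]:
    combo = sum(ci * (i * x + y)**n for i, ci in enumerate(c))
    assert combo == comb(n, k) * x**(n - k) * y**k
print("x^2*y is a linear combination of cubes of linear forms")
```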

(Conclusion)

So, if \(W\) contains any single \(n\)-th power, then it contains all \(n\)-th powers (1), and these span the whole space (2).

So we just need to show \(W\) contains a single \(n\)-th power.

Pick a nonzero \(f \in W\) and write \(f = \sum_{k=0}^n c_k x^{n-k}y^k\). Consider the diagonal matrices \(d(s) = \begin{pmatrix} s & 0 \\ 0 & s^{-1} \end{pmatrix} \in SL(2)\), which act on elements of \(\text{Sym}^n(\mathbb{C}^2)\) by

\[ d(s) \cdot x^{n-k}y^k = s^{n-2k} x^{n-k}y^k \]

So \(d(s) \cdot f = \sum_{k=0}^n c_k \, s^{n-2k} \, x^{n-k}y^k\). The exponents \(n-2k\) are distinct for \(k = 0, \ldots, n\). Evaluating at \(n+1\) distinct values of \(s\) and solving the resulting (again invertible, Vandermonde-type) linear system extracts each monomial with \(c_k \neq 0\) individually.

Since \(W\) is closed under the action of \(d(s)\) and under linear combinations, there is some monomial \(x^{n-k}y^k \in W\).

Now apply \(g = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \in SL(2)\), which acts by \(x \mapsto x, \, y \mapsto x + y\):

\[ g \cdot x^{n-k}y^k = x^{n-k}(x+y)^k = \sum_{j=0}^k \binom{k}{j} x^{n-j} y^j \]

The \(x^n\) coefficient is \(\binom{k}{0} = 1 \neq 0\). So \(g \cdot x^{n-k}y^k \in W\) has nonzero \(x^n\) term. Apply the diagonal trick again to extract \(x^n \in W\).

Since \(x^n = (x)^n\) is an \(n\)-th power, we are done. \(\square\)

The significance is that the space of binary forms of degree \(n\) is an irreducible representation of \(SL(2)\).

Computing Invariants and Covariants

We know that invariants and covariants form a ring, but how do we compute the actual elements of this ring?

Finite Groups

Let’s start by considering a slightly more abstract problem. We have a group \(G\) acting on a vector space \(V\), and we want to find the subspace of \(V\) that is invariant under the action of \(G\). That is, we want to find the set of vectors \(v \in V\) such that for all \(g \in G\), \(g \cdot v = v\).

If we want to produce a polynomial that is invariant under a group \(G\), one idea is to average (or sum) over all possible transformations. For a finite group, we can simply sum:

\[ \mathcal{R}(f) = \frac{1}{|G|} \sum_{g \in G} g \cdot f \]

The average is invariant. For any \(h \in G\),

\[ h \cdot \mathcal{R}(f) = \frac{1}{|G|} \sum_{g \in G} (hg) \cdot f = \frac{1}{|G|} \sum_{g' \in G} g' \cdot f = \mathcal{R}(f) \]

What’s happened? The map \(g \mapsto hg\) is a bijection on \(G\) (its inverse is \(g \mapsto h^{-1}g\)), and so we are summing the same terms in a different order. Applying \(h\) to every term in the sum just permutes the terms.

What’s more, if some \(f\) is already invariant, then \(\mathcal{R}(f) = f\). This is because \(g \cdot f = f\) for all \(g \in G\), so the sum just gives us \(|G|\) copies of \(f\), which we then divide by \(|G|\) to get back \(f\).

The above operation is called the Reynolds operator, and it is a linear projection from \(V\) onto the subspace of \(G\)-invariant vectors.
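A minimal sketch of the Reynolds operator for a finite group (my own encoding: polynomials as dictionaries from exponent tuples to coefficients, with \(S_3\) permuting three variables):

```python
from fractions import Fraction
from itertools import permutations

# A polynomial in x0, x1, x2 is a dict {exponent tuple: coefficient}.
def act(sigma, poly):
    # Permuting the variables permutes the exponent tuples: new[j] = old[sigma^{-1}(j)].
    out = {}
    for exps, coef in poly.items():
        new = tuple(exps[sigma.index(j)] for j in range(3))
        out[new] = out.get(new, Fraction(0)) + coef
    return out

def reynolds(poly):
    """Average the polynomial over all of S3."""
    out = {}
    for sigma in permutations(range(3)):
        for exps, coef in act(sigma, poly).items():
            out[exps] = out.get(exps, Fraction(0)) + coef / 6
    return {e: c for e, c in out.items() if c != 0}

f = {(2, 0, 0): Fraction(1), (1, 1, 0): Fraction(3)}   # x0^2 + 3*x0*x1
Rf = reynolds(f)
# Rf is symmetric: x0^2 averages to (x0^2 + x1^2 + x2^2)/3, and 3*x0*x1
# averages to x0*x1 + x0*x2 + x1*x2.
for sigma in permutations(range(3)):
    assert act(sigma, Rf) == Rf        # the average is invariant
assert reynolds(Rf) == Rf              # and Reynolds is a projection: R(R(f)) = R(f)
print("Reynolds operator projects onto the symmetric polynomials")
```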

Conjugation and Equivariance

If \(A\) is a linear map from \(V\) to itself, then we can define an action of \(G\) on \(A\) by conjugation:

\[ g \star A := \rho(g) A \rho(g)^{-1} \]

Since \(A(\rho(g) v) = \rho(g) A(v)\) holds for all \(v \in V\) if and only if \(A \rho(g) = \rho(g) A\), the map \(A\) is \(G\)-equivariant if and only if \(g \star A = A\) for all \(g \in G\). So the space of \(G\)-equivariant maps from \(V\) to itself is exactly the space of maps that are invariant under conjugation by \(G\).

This will motivate the construction of the Reynolds operator in the proof of Maschke’s theorem, where we will average over the group to produce a \(G\)-equivariant projection.

Maschke’s Theorem for Finite Groups

Theorem 1 (Maschke’s Theorem for Finite Groups) Let \(\rho: G \to GL(V)\) be a representation of a finite group \(G\), where \(V\) is a (finite-dimensional) vector space over a field \(k\). If the characteristic \(\mathrm{char}(k)\) of the field \(k\) does not divide \(|G|\), then \(V\) is completely reducible.

Proof.

We want to show that \(V\) decomposes as a direct sum of irreducible representations. It suffices to show that for any subrepresentation \(W \subseteq V\), there is a complementary subrepresentation \(W' \subseteq V\) such that \(V = W \oplus W'\). If this is true, then we can apply the same argument to \(W\) and \(W'\) to find complementary subrepresentations, and so on, until we have decomposed \(V\) into irreducible representations.

Let \(V\) be a representation of \(G\), and let \(W \subseteq V\) be a subrepresentation (a subspace closed under the action of \(G\)).

Choose a linear projection \(P : V \to V\) onto \(W\) (that is, \(\text{im}(P) = W\) and \(Pw = w\) for all \(w \in W\)). Such a \(P\) exists by standard linear algebra, since \(V\) is finite-dimensional: extend a basis of \(W\) to a basis of \(V\) and project away the complementary coordinates.

Each \(g\) acts as a linear map \(\rho(g): V \to V\). Then we can define a new projection \(\mathcal{R}(P): V \to V\) (by averaging over the group):

\[ \mathcal{R}(P) = \frac{1}{|G|} \sum_{g \in G} (g \star P) = \frac{1}{|G|} \sum_{g \in G} \rho(g) \cdot P \cdot \rho(g)^{-1} \]

This new projection \(\mathcal{R}(P)\) is \(G\)-equivariant, since for any \(h \in G\) we have

\[ h\star \mathcal{R}(P) = \frac{1}{|G|} \sum_{g \in G} \rho(h) \rho(g) \cdot P \cdot \rho(g)^{-1}\rho(h)^{-1} \] \[ = \frac{1}{|G|} \sum_{g \in G} \rho(hg) \cdot P \cdot \rho(hg)^{-1} \] \[ = \frac{1}{|G|} \sum_{g' \in G} \rho(g') \cdot P \cdot \rho(g')^{-1} = \mathcal{R}(P) \]

(Note that if \(\mathrm{char}(k)\) divides \(|G|\), then \(|G| = 0\) in \(k\) and we cannot divide by \(|G|\), so this construction fails; that is why the condition on the characteristic is necessary.)

Since \(\mathcal{R}(P)\) is invariant under the conjugation action of \(G\), then \(\mathcal{R}(P)\) is \(G\)-equivariant.

The image of \(\mathcal{R}(P)\) is \(W\): each term \(\rho(g) P \rho(g)^{-1}\) maps \(V\) into \(W\) (since \(\text{im}(P) = W\) and \(W\) is \(G\)-stable), and for \(w \in W\) we have \(\rho(g)^{-1} w \in W\), so \(\rho(g) P \rho(g)^{-1} w = w\), hence \(\mathcal{R}(P)(w) = w\). So the kernel of \(\mathcal{R}(P)\) is a complementary subrepresentation to \(W\). By induction on the dimension of \(V\), we can decompose \(V\) into irreducible subrepresentations. \(\square\)

Essentially, we have taken our complements from ordinary linear algebra and equipped them with \(G\)-equivariance.
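Here is the averaging construction carried out concretely for \(S_3\) acting on \(\mathbb{C}^3\) (my own example, exact arithmetic): starting from a non-equivariant projection onto the diagonal line \(W = \text{Span}\{(1,1,1)\}\), averaging produces the equivariant projection \(v \mapsto \text{mean}(v)\,(1,1,1)\), whose kernel is the \(G\)-stable sum-zero complement:

```python
from fractions import Fraction
from itertools import permutations

def perm_matrix(sigma):
    return [[Fraction(1) if sigma[j] == i else Fraction(0) for j in range(3)]
            for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

# A non-equivariant projection onto W = Span{(1,1,1)}: P(v) = (v0, v0, v0).
P = [[Fraction(1), 0, 0], [Fraction(1), 0, 0], [Fraction(1), 0, 0]]

R = [[Fraction(0)] * 3 for _ in range(3)]
for sigma in permutations(range(3)):
    Ps = perm_matrix(sigma)
    C = matmul(matmul(Ps, P), transpose(Ps))   # conjugate: Ps P Ps^{-1}
    for i in range(3):
        for j in range(3):
            R[i][j] += C[i][j] / 6

# The average is the equivariant projection v -> mean(v)*(1,1,1) ...
assert R == [[Fraction(1, 3)] * 3 for _ in range(3)]
# ... and it commutes with every permutation matrix (G-equivariance).
for sigma in permutations(range(3)):
    Ps = perm_matrix(sigma)
    assert matmul(Ps, R) == matmul(R, Ps)
print("averaged projection is equivariant; its kernel is the sum-zero complement")
```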

Compact Groups

Sadly, \(GL(2)\) is not a finite group, so we can’t just sum over all transformations. However, we can still try an analogous trick. Instead of summing, we could integrate.

When does the Reynolds operator exist for continuous groups? We need a measure on the group that is invariant under translation by group elements, so that no “region” of the group is weighted more than any other. If we had such a measure \(\mu\), then we could define the Reynolds operator as

\[ \mathcal{R}(f) = \int_{G} (g \cdot f) \, d\mu(g) \]

for some measure \(\mu\) on the group \(G\). Then for any \(h \in G\):

\[ h \cdot \mathcal{R}(f) = \int_{G} (hg \cdot f) \, d\mu(g) = \int_{G} (g' \cdot f) \, d\mu(g') = \mathcal{R}(f) \]

Luckily for us, in 1933, Haar proved that every compact group has such a measure, and that it is essentially unique3.

How can we construct a Haar measure? If we assume that the group is also smooth (i.e. it is a Lie group), then we can concretely construct the Haar measure. In fact, there is an “algorithm” to do so4:

The construction is as follows:

  1. Parametrize the group: write each element as \(U(\theta_1, \ldots, \theta_n)\)
  2. Compute the Maurer-Cartan form \(\Omega = U^{-1} dU\)
  3. Expand in a basis of the Lie algebra: \(\Omega = \omega_1 T_1 + \cdots + \omega_n T_n\)
  4. The Haar measure is \(\omega_1 \wedge \cdots \wedge \omega_n\)

For \(SO(2)\), we can parametrize by \(\theta\), then compute \[ R_\theta^{-1} dR_\theta = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}d\theta \]

and the Haar measure is \(d\theta / 2\pi\) (to normalize the total measure to 1).

We’ll probably explicitly write code for this when we look at computational invariant theory, but the key point is that for compact groups, we can construct a Reynolds operator by integrating over the group with respect to the Haar measure. This allows us to compute invariants and covariants for compact groups.
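As a quick numerical preview (a crude discretization of the Haar integral, not the exact construction): averaging \(f(x, y) = x^2\) over \(SO(2)\) with respect to \(d\theta/2\pi\) should produce the rotation invariant \((x^2 + y^2)/2\).

```python
import math

def averaged_x2(x, y, steps=10_000):
    """Reynolds operator for SO(2) applied to f(x, y) = x^2, discretizing
    the normalized Haar measure d(theta)/2*pi on the circle."""
    total = 0.0
    for k in range(steps):
        t = 2 * math.pi * k / steps
        # (g . f)(x, y) = f(g^{-1} (x, y)); for rotation by t this is
        # f(x*cos t + y*sin t, -x*sin t + y*cos t).
        total += (x * math.cos(t) + y * math.sin(t))**2
    return total / steps

# The average should be the rotation-invariant polynomial (x^2 + y^2)/2.
for x, y in [(1.0, 0.0), (0.3, -2.0), (1.5, 2.5)]:
    assert abs(averaged_x2(x, y) - (x**2 + y**2) / 2) < 1e-9
print("averaging x^2 over SO(2) yields (x^2 + y^2)/2")
```

(The equally spaced Riemann sum is essentially exact here, since the integrand is a trigonometric polynomial of low degree.)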

Maschke’s Theorem for Compact Groups

Theorem 2 (Maschke’s Theorem for Compact Groups) Let \(G\) be a compact Lie group with normalized Haar measure \(\mu\) and \(V\) a finite-dimensional vector space over \(\mathbb{C}\).

Given a continuous finite-dimensional representation \(\rho : G\to GL(V)\) and a \(G\)-stable subspace \(W\subseteq V\), choose a linear projection \(P:V\to V\) with \(\mathrm{im}(P)=W\) and \(Pw = w\) for all \(w \in W\).

Define the averaged operator \[ \mathcal R(P) \;=\; \int_G \rho(g)\,P\,\rho(g)^{-1}\, d\mu(g) \]

Then \(\mathcal R(P)\) is invariant under the \(\star\)-action (by the change of variables \(g\mapsto hg\) and left-invariance of \(\mu\)). So \(\mathcal{R}(P)\) is \(G\)-equivariant.

Also \(\mathrm{im}(\mathcal R(P)) = W\), since \(\mathcal R(P)(w) = P(w) = w\) for all \(w \in W\), and \(\mathcal R(P)(v) \in W\) for all \(v \in V\).

Hence \(\ker(\mathcal R(P))\) is a complement to \(W\) that is invariant under the action of \(G\).

Reductive Groups

Unfortunately, \(GL(2)\) is also noncompact. The group is unbounded, so integrating an invariant measure over it need not give a finite result; the integral can diverge.

To see this, we can decompose a linear transformation in \(GL(2)\) as follows (using the Iwasawa decomposition):

\[ \begin{bmatrix}a & b \\ c & d\end{bmatrix} = \begin{bmatrix}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta\end{bmatrix} \begin{bmatrix}r_1 & 0 \\ 0 & r_2\end{bmatrix} \begin{bmatrix}1 & n \\ 0 & 1\end{bmatrix} \]

The diagonal entries \(r_1, r_2\) are unbounded. So while we can parametrize the group by \(\theta, r_1, r_2, n\), the integrals over \(r_1\) and \(r_2\) diverge.

However, GL(2) is reductive, which means that it has a nice representation theory that allows us to compute the invariants and covariants without needing to integrate. I cover the relevant representation theory above.

Theorem 3 (Reynolds Operator for Reductive Groups) If \(G\) is a reductive group acting on a vector space \(V\) over a field \(k\) (of characteristic zero), then there exists a Reynolds operator \(\mathcal{R}: k[V] \to k[V]^G\).

Proof. If \(G\) is reductive, then by definition every representation is completely reducible. In particular, each graded piece \(k[V]_d\) (the polynomials of degree \(d\)) decomposes as a direct sum of irreducible representations. The invariant subspace \(k[V]_d^G\) is the sum of all copies of the trivial representation in this decomposition (since that’s where \(g\cdot v = v\) for all \(g \in G\)). Complete reducibility guarantees that this summand has a complement and that the projection onto it is unique. Applied degree by degree, this projection is exactly the Reynolds operator. We will also see later (once we have the Hilbert basis theorem) that this is enough to guarantee that the invariant ring is finitely generated.

The reductive groups have been completely classified5. They include all finite groups, all compact Lie groups, and \(GL(n)\), \(SL(n)\), \(O(n)\), \(Sp(n)\) over fields of characteristic zero. I won’t go into the details here. It will suffice for our purpose simply to know that we can check the list to see whether a given group is reductive6. See below for the details on \(GL(n)\).

Definition of Reductivity

Definition 8 (Reductive) A group \(G\) is reductive if every finite-dimensional representation of \(G\) is completely reducible. That is, a group \(G\) is reductive if for every homomorphism \(\rho: G \to GL(V)\), the representation \(V\) decomposes as a direct sum of irreducible representations.

Here’s some reductive groups (over fields of characteristic zero):

  • All finite groups (the Reynolds operator \(\mathcal{R}(f) = \frac{1}{|G|}\sum g \cdot f\) projects the space onto its invariants).
  • All compact Lie groups (same argument, with integration replacing the sum).
  • The classical groups: \(GL(n)\), \(SL(n)\), \(O(n)\), \(Sp(n)\).

The additive group \(\mathbb{G}_a\) is not reductive.

What we’ve done so far is enough to show that all of the groups we claimed above are reductive, except \(SL(n)\) and \(GL(n)\) (which are not compact). However, we can show that \(GL(n)\) is reductive by showing that it contains a compact subgroup (the unitary group \(U(n)\)) such that every representation of \(GL(n)\) restricts to a representation of \(U(n)\) that is completely reducible.

Reductivity of \(GL(n)\)

Theorem 4 (Reductivity of \(GL(n)\)) \(GL(n)\) is reductive.

Proof (Based on Weyl’s Unitary Trick).

We will borrow some theorems from linear algebra to do this proof.

Any matrix \(a \in GL(n)\) can be written as \(a = up\), where \(u \in U(n)\) is unitary and \(p\) is positive-definite Hermitian. This is the polar decomposition of \(a\).

The unitary group \(U(n)\) is compact, so by Maschke’s theorem for compact groups, every representation of \(U(n)\) is completely reducible.

Since \(U(n)\) is a subgroup of \(GL(n)\), any representation of \(GL(n)\) restricts to a representation of \(U(n)\). Since the representation of \(U(n)\) is completely reducible, it decomposes as a direct sum of irreducible representations of \(U(n)\).

Next we will show:

If \(\rho: GL(n) \to GL(V)\) is a representation of \(GL(n)\), and \(T: V\to V\) is a \(U(n)\)-equivariant map, then \(T\) is also \(GL(n)\)-equivariant.

If we can do this, then by Schur’s lemma, the projection of \(V\) onto each irreducible summand of the \(U(n)\)-representation is unique up to scaling, and since these projections are also \(GL(n)\)-equivariant, they are also projections onto irreducible summands of the \(GL(n)\)-representation. So the decomposition of \(V\) into irreducible representations of \(U(n)\) is also a decomposition into irreducible representations of \(GL(n)\), and thus every representation of \(GL(n)\) is completely reducible.

We already know that every \(g \in GL(n, \mathbb{C})\) can be written as \(g = up\) for some \(u \in U(n)\) and \(p\) positive-definite Hermitian.

Since \(T\) commutes with the representation \(\rho(u)\), we just need to show that \(T\) also commutes with the representation \(\rho(p)\).

Since \(p\) is positive-definite Hermitian, it can be diagonalized by a unitary matrix. So we can write \(p = vdv^{-1}\), where \(v \in U(n)\) and \(d\) is a diagonal matrix with positive real entries on the diagonal.

Since \(T\) commutes with the representation of \(\rho(v)\), we just need to show that \(T\) also commutes with the representation \(\rho(d)\) (positive diagonal matrices).

Now we will show: if \(T\) commutes with the representation \(\rho(u)\), then it must commute with the representation \(\rho(d)\) for all positive diagonal matrices \(d\).

We can write \(d = \exp(h)\) for some diagonal matrix \(h\) with real entries on the diagonal.

Because the representation \(\rho\) is polynomial (hence analytic), the map \(t \mapsto \rho(\exp(th))\) is a differentiable one-parameter subgroup of \(GL(V)\). Since \(T\) commutes with \(\rho(\exp(ith))\) for all real \(t\) (these are diagonal unitary matrices), differentiating at \(t=0\) shows that \(T\) also commutes with the infinitesimal action of \(h\). Exponentiating again implies that \(T\) commutes with \(\rho(\exp(th))\) for all \(t\), hence with all positive diagonal matrices. By unitary conjugation it therefore commutes with \(\rho(p)\) for every positive-definite Hermitian matrix \(p\). Since every \(g \in GL(n, \mathbb{C})\) can be written as \(g = up\) with \(u \in U(n)\) and \(p\) positive-definite Hermitian, \(T\) commutes with \(\rho(g)\) for all \(g \in GL(n)\).

Thus every \(U(n)\)-equivariant map is \(GL(n)\)-equivariant. \(\square\)
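The polar decomposition at the heart of the unitary trick is easy to check numerically. Here is a minimal NumPy sketch (my own, built from the SVD rather than a library polar routine; the random matrix is a stand-in for a generic element of \(GL(2, \mathbb{C})\)):

```python
import numpy as np

# Numerical sketch of the polar decomposition a = u p used in the unitary trick:
# u unitary, p positive-definite Hermitian. Built here from the SVD a = W S Vh.
rng = np.random.default_rng(0)
a = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))  # generic, invertible

W, S, Vh = np.linalg.svd(a)
u = W @ Vh                          # unitary factor
p = Vh.conj().T @ np.diag(S) @ Vh   # positive-definite Hermitian factor

assert np.allclose(u @ p, a)                    # a = u p
assert np.allclose(u.conj().T @ u, np.eye(2))   # u is unitary
assert np.allclose(p, p.conj().T)               # p is Hermitian
assert np.all(np.linalg.eigvalsh(p) > 0)        # eigenvalues of p are positive
```

Note that \(p\) is diagonalized by the unitary \(V\), exactly as the proof uses: \(p = V \operatorname{diag}(S) V^{*}\) with positive diagonal entries.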

Tensor Products

We are interested in maps between binary forms of different degrees, as we are trying to understand binary forms under change of basis by \(GL(2)\). For example, given two binary forms \(Q_1 \in V(m)\) and \(Q_2 \in V(n)\), we might want to construct a covariant of degree \(d\) from them. We can think of this as constructing a \(GL(2)\)-equivariant map from \(V(m) \otimes V(n)\) to \(V(d)\), since any polynomial built from \(Q_1\) and \(Q_2\) lives in the tensor product \(V(m) \otimes V(n)\).

Representations are closed under both direct sums and tensor products.

If \(V\) and \(W\) are representations of \(G\), then their direct sum \(V \oplus W\) is also a representation, with \(G\) acting componentwise: \(g \cdot (v, w) = (g \cdot v, g \cdot w)\).

Tensor products are more interesting. If \(V\) and \(W\) are representations of \(G\), then their tensor product \(V \otimes W\) is also a representation of \(G\), with the action defined by \(g \cdot (v \otimes w) = (g \cdot v) \otimes (g \cdot w)\).

Lemma 2 (Tensor Product of Representations) If \(V\) and \(W\) are representations of a group \(G\), then their tensor product \(V \otimes W\) is also a representation of \(G\), with the action defined by \(g \cdot (v \otimes w) = (g \cdot v) \otimes (g \cdot w)\).

Proof. We need to check that this defines a group action: that it is compatible with composition and with the identity. For any \(g, h \in G\) and \(v \in V\), \(w \in W\):

\[ g \cdot (h \cdot (v \otimes w)) = g \cdot ((h \cdot v) \otimes (h \cdot w)) = (g \cdot (h \cdot v)) \otimes (g \cdot (h \cdot w)) = ((gh) \cdot v) \otimes ((gh) \cdot w) = (gh) \cdot (v \otimes w) \]

Also \(e \cdot (v \otimes w) = (e \cdot v) \otimes (e \cdot w) = v \otimes w\). \(\square\)
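In coordinates, the tensor-product action of a matrix \(g\) is the Kronecker product \(g \otimes g\), and the homomorphism property in the lemma is exactly the mixed-product property of Kronecker products. A quick NumPy check (random \(g\), \(h\), \(v\), \(w\) of my choosing):

```python
import numpy as np

# The action g . (v ⊗ w) = (g v) ⊗ (g w) is kron(g, g) acting on kron(v, w);
# kron(g, g) @ kron(h, h) == kron(g h, g h) is the homomorphism property.
rng = np.random.default_rng(1)
g = rng.standard_normal((2, 2))
h = rng.standard_normal((2, 2))
v = rng.standard_normal(2)
w = rng.standard_normal(2)

assert np.allclose(np.kron(g, g) @ np.kron(v, w), np.kron(g @ v, g @ w))
assert np.allclose(np.kron(g, g) @ np.kron(h, h), np.kron(g @ h, g @ h))
```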

The Clebsch-Gordan Decomposition

Unfortunately, even if \(V\) and \(W\) are irreducible, \(V \otimes W\) is not necessarily irreducible. Instead, it decomposes as a direct sum of irreducible representations. The problem of determining how \(V \otimes W\) decomposes into irreducibles is called the Clebsch-Gordan problem.

Motivation

Why do we care about this? We know binary forms of degree \(n\) live in:

\[ V(n) := \mathrm{Sym}^n(\mathbb{C}^2) \]

So given two binary forms of degree \(m\) and \(n\), then any polynomials built from \(Q_1 \in V(m)\), \(Q_2 \in V(n)\) live in the tensor product \(V(m) \otimes V(n)\).

Covariants constructed from \(Q_1\) and \(Q_2\) come from \(GL(2)\)-equivariant maps

\[ V(m) \otimes V(n) \to V(d) \]

How can we find the irreducible representations \(V(d)\) that appear in the decomposition of \(V(m) \otimes V(n)\)?

Start7 by viewing an element of

\[ V(m)\otimes V(n) \]

as a polynomial in two pairs of variables, \((x_1,y_1)\), \((x_2,y_2)\) that is homogeneous of degree \(m\) in \((x_1,y_1)\) and degree \(n\) in \((x_2,y_2)\).

So:

\[ V(m)\otimes V(n) \cong k[x_1,y_1,x_2,y_2]_{m,n} \]

(The space of bihomogeneous polynomials of bidegree \((m,n)\)).

The group \(SL(2)\) (as a stand-in for \(GL(2)\)) acts on both pairs simultaneously by linear substitution.

Diagonal Restriction

Let \(\mu\) be an \(SL(2)\)-equivariant map

\[ \mu : V(m)\otimes V(n) \to V(m+n) \]

obtained by identifying the two pairs of variables:

\[ \mu(f(x_1,y_1;x_2,y_2)) = f(x,y;x,y) \]

In other words, we restrict the polynomial to the diagonal

\[ (x_1,y_1) = (x_2,y_2) \]

The image consists exactly of homogeneous polynomials of degree \(m+n\), so

\[ \mathrm{im}(\mu) = V(m+n) \]

The Kernel

Which bihomogeneous polynomials vanish on the diagonal?

The diagonal in \((\mathbb C^2)^2\) is defined by the equation

\[ x_1y_2 - y_1x_2 = 0 \]

Denote this determinant by

\[ [12] := x_1y_2 - y_1x_2 \]

Any bihomogeneous polynomial that vanishes on the diagonal must therefore be divisible by \([12]\).

(The zero set of \([12]\) is the set of linearly dependent pairs \((v_1, v_2)\); by bihomogeneity, a polynomial vanishing on the diagonal vanishes on this whole set. Since \([12]\) is linear in each pair of variables, it is irreducible, so the ideal \(([12])\) is prime, hence radical; by the Nullstellensatz, any polynomial vanishing on its zero set lies in \(([12])\), i.e. is divisible by \([12]\).)

Thus

\[ \ker(\mu) = [12]\cdot k[x_1,y_1,x_2,y_2]_{m-1,n-1} \]

Multiplication by \([12]\) raises the degree in each pair by one, so this space is naturally isomorphic to

\[ V(m-1)\otimes V(n-1) \]

We therefore obtain an exact sequence

\[ 0 \to V(m-1)\otimes V(n-1) \to V(m)\otimes V(n) \to V(m+n) \to 0 \]

Iterating the Construction

Applying the same argument to \(V(m-1)\otimes V(n-1)\) yields

\[ 0 \to V(m-2)\otimes V(n-2) \to V(m-1)\otimes V(n-1) \to V(m+n-2) \to 0 \]

Continuing inductively produces a filtration whose successive quotients are

\[ V(m+n),\; V(m+n-2),\; V(m+n-4),\; \dots \]

until the process terminates after \(m\) steps (since we can’t have negative degree).

Clebsch–Gordan Decomposition

Theorem 5 (Clebsch–Gordan Decomposition) For \(m \le n\),

\[ V(m)\otimes V(n) \cong \bigoplus_{r=0}^{m} V(m+n-2r) \]

Equivalently,

\[ \mathrm{Sym}^m(\mathbb C^2)\otimes \mathrm{Sym}^n(\mathbb C^2) \cong \bigoplus_{r=0}^{m} \mathrm{Sym}^{m+n-2r}(\mathbb C^2) \]

Proof.

We have already shown that \(V(m+n-2r)\) appears as a quotient in the filtration of \(V(m)\otimes V(n)\) for each \(r = 0,1,\ldots,m\).

Since \(SL(2)\) is reductive, each short exact sequence in this filtration splits, so these quotients appear as \(SL(2)\)-stable direct summands and we obtain an \(SL(2)\)-equivariant inclusion

\[ \bigoplus_{r=0}^{m} V(m+n-2r) \subseteq V(m)\otimes V(n) \]

Finally, observe that

\[ \sum_{r=0}^{m} (m+n-2r+1) = (m+1)(n+1) = \dim\bigl(V(m)\otimes V(n)\bigr) \]

so the inclusion is an equality. \(\square\)
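The dimension count at the end of the proof can be spot-checked in a couple of lines of Python (the range bound 8 is an arbitrary choice of mine):

```python
from itertools import product

# Spot-check the dimension identity from the proof:
#   sum_{r=0}^{m} dim V(m+n-2r) = (m+1)(n+1),  where dim V(d) = d + 1.
for m, n in product(range(8), repeat=2):
    if m <= n:
        total = sum(m + n - 2*r + 1 for r in range(m + 1))
        assert total == (m + 1)*(n + 1)
```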

The Clebsch–Gordan decomposition tells us exactly which irreducible representations occur inside the tensor product \(V(m)\otimes V(n)\). Each representation

\[ V(m+n-2r) \]

appears once.

As a consequence, any \(SL(2)\)–equivariant linear map

\[ V(m)\otimes V(n) \to V(m+n-2r) \]

must be unique up to a scalar multiple.

As we have already seen, by Schur’s lemma the space of equivariant maps between two irreducible representations is one–dimensional when the representations are isomorphic and zero otherwise.

Thus, by the Clebsch–Gordan decomposition, for each \(r\) there exists a unique canonical equivariant projection

\[ V(m)\otimes V(n) \to V(m+n-2r) \]

(up to scaling).

We call these projections the transvectants. They are the building blocks of all \(SL(2)\)-equivariant maps between symmetric powers (and, as we will see, the building blocks of all covariants of binary forms).

Computing Transvectants

Writing the binary forms as \(Q_1 \in \text{Sym}^m(\mathbb{C}^2)\) and \(Q_2 \in \text{Sym}^n(\mathbb{C}^2)\), the projection onto \(\text{Sym}^{m+n-2r}(\mathbb{C}^2)\) is (up to normalization) the \(r\)-th transvectant:

\[ (Q_1, Q_2)^{(r)} = \sum_{k=0}^r (-1)^k \binom{r}{k} \frac{\partial^r Q_1}{\partial x^{r-k} \partial y^k} \cdot \frac{\partial^r Q_2}{\partial x^k \partial y^{r-k}} \]

To see this, note that this formula is manifestly \(SL(2)\)-equivariant8 and maps \(\text{Sym}^m \otimes \text{Sym}^n \to \text{Sym}^{m+n-2r}\) (each differentiation reduces degree by 1, and we differentiate \(r\) times in each factor).

By Schur’s lemma, any equivariant map between these spaces is unique up to scalar, so the transvectant must be the Clebsch-Gordan projection (up to normalization).

The first transvectant (\(r = 1\)) is the Jacobian:

\[ [Q_1, Q_2] := (Q_1, Q_2)^{(1)} = \frac{\partial Q_1}{\partial x} \frac{\partial Q_2}{\partial y} - \frac{\partial Q_1}{\partial y} \frac{\partial Q_2}{\partial x} \]

The second self-transvectant (\(Q_1 = Q_2 = Q\), \(r = 2\)) gives the Hessian (up to a factor of 2):

\[ (Q, Q)^{(2)} = 2\left(\frac{\partial^2 Q}{\partial x^2} \frac{\partial^2 Q}{\partial y^2} - \left(\frac{\partial^2 Q}{\partial x \partial y}\right)^2\right) \]

Since the Clebsch-Gordan decomposition is complete, the transvectants cover all \(SL(2)\)-equivariant pairings between symmetric powers.
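The transvectant formula is straightforward to implement symbolically. Here is a minimal SymPy sketch (the helper names `d` and `transvectant` and the test form \(Q = x^3 + xy^2\) are my own choices) that checks the Jacobian and Hessian special cases above:

```python
import sympy as sp

x, y = sp.symbols('x y')

def d(f, nx, ny):
    # Partial derivative d^(nx+ny) f / dx^nx dy^ny.
    for _ in range(nx):
        f = sp.diff(f, x)
    for _ in range(ny):
        f = sp.diff(f, y)
    return f

def transvectant(Q1, Q2, r):
    # The (unnormalized) r-th transvectant from the formula above.
    return sp.expand(sum(
        (-1)**k * sp.binomial(r, k) * d(Q1, r - k, k) * d(Q2, k, r - k)
        for k in range(r + 1)
    ))

# r = 1 recovers the Jacobian: here (x^2)_x (y^2)_y - (x^2)_y (y^2)_x = 4xy.
assert transvectant(x**2, y**2, 1) == 4*x*y

# r = 2 with Q1 = Q2 = Q gives twice the Hessian determinant.
Q = x**3 + x*y**2
assert transvectant(Q, Q, 2) == sp.expand(2*(d(Q, 2, 0)*d(Q, 0, 2) - d(Q, 1, 1)**2))
```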

First Fundamental Theorem of Invariants for Binary Forms

We know the covariant ring is finitely generated, but what are the generators? We need the First Fundamental Theorem for Binary Forms under \(GL(2)\). This theorem states essentially that every polynomial covariant of a system of binary forms can be written as a polynomial in the transvectants of that system. This means that if we can generate all the transvectants, then we can generate all the covariants.

Theorem 6 (First Fundamental Theorem) Let \((x_1,y_1),\dots,(x_p,y_p)\) be \(p\) copies of \(\mathbb C^2\) with the diagonal action of \(SL(2)\), and define \[ [ij] = x_i y_j - y_i x_j \]

Then the invariant ring \[ k[x_1,y_1,\dots,x_p,y_p]^{SL(2)} \]

is generated by the brackets \([ij]\).

Proof.

Let

\[ A = k[x_1,y_1,\dots,x_p,y_p] \]

with the diagonal action of \(SL(2)\) on each pair \((x_i,y_i)\). Define

\[ [ij] := x_i y_j - y_i x_j \]

(You can think of \([ij]\) as the determinant of the \(2\times 2\) matrix formed by the \(i\)-th and \(j\)-th columns of the matrix of variables).

  1. Each \([ij]\) is \(SL(2)\)-invariant.

If \(v_i=(x_i,y_i)^T\) and \(g\in SL(2)\), then

\[ [ij](gv_1,\dots,gv_p)=\det(gv_i,gv_j)=\det(g)\det(v_i,v_j)=\det(v_i,v_j)=[ij](v_1,\dots,v_p) \]

So \(k\bigl[\, [ij] \mid 1 \le i < j \le p \,\bigr]\subseteq A^{SL(2)}\).

  2. Normalize two columns using an explicit \(SL(2)\) matrix.

Fix \((1,2)\) and assume \([12]\neq 0\). Set

\[ S=\begin{pmatrix}x_1 & x_2\\ y_1 & y_2\end{pmatrix} \]

Then \(\det(S)=[12]\). Define

\[ A_{12}:= \begin{pmatrix}1/[12] & 0\\ 0 & 1\end{pmatrix}\operatorname{adj}(S) \]

Since \(\det(\operatorname{adj}(S))=\det(S)=[12]\) and \(\det\!\begin{pmatrix}1/[12] & 0\\ 0 & 1\end{pmatrix}=1/[12]\), we have \(\det(A_{12})=1\), so \(A_{12}\in SL(2)\) whenever \([12]\neq 0\).

Also \(\operatorname{adj}(S)\,S=[12]I\), so

\[ A_{12}S=\begin{pmatrix}1 & 0\\ 0 & [12]\end{pmatrix} \]

Equivalently,

\[ A_{12}v_1=e_1 \] \[ A_{12}v_2=[12]\,e_2 \]

For \(k\ge 3\), write

\[ A_{12}v_k=\begin{pmatrix}a_k\\ b_k\end{pmatrix} \]

Because \(\det(A_{12})=1\), brackets are unchanged under \(A_{12}\), so

\[ [1k]=\det(v_1,v_k)=\det(A_{12}v_1,A_{12}v_k)=\det\!\left(e_1,\begin{pmatrix}a_k\\ b_k\end{pmatrix}\right)=b_k \]

and

\[ [2k]=\det(v_2,v_k)=\det(A_{12}v_2,A_{12}v_k)=\det\!\left([12]e_2,\begin{pmatrix}a_k\\ b_k\end{pmatrix}\right)=-[12]\,a_k \]

Hence

\[ b_k=[1k] \] \[ a_k=-\frac{[2k]}{[12]} \]

So, after applying \(A_{12}\), the normalized matrix \(A_{12}M\) (where \(M\) is the \(2\times p\) matrix whose columns are \(v_1,\dots,v_p\)) is determined by the bracket data \([12]\), \([1k]\), \([2k]\).

  3. An invariant polynomial is a polynomial in the brackets.

Let \(f\in A^{SL(2)}\). For any point with \([12]\neq 0\), invariance gives

\[ f(M)=f(A_{12}M) \]

But \(A_{12}M\) has entries that are rational functions of the brackets (the only denominators are powers of \([12]\)), so on the region \([12]\neq 0\) we can write

\[ f(M)=\frac{P([ij])}{[12]^N} \]

for some polynomial \(P\) and some \(N\ge 0\).

Multiply both sides by \([12]^N\):

\[ [12]^N f(M)=P([ij]) \]

Both sides are polynomials in the coordinates \((x_r,y_r)\). Since the identity holds whenever \([12]\neq 0\), it also holds identically as a polynomial identity. In particular, the right-hand side is divisible by \([12]^N\) in \(A\), so

\[ f(M)=Q([ij]) \]

for some polynomial \(Q\) in the brackets.

So every \(SL(2)\)-invariant polynomial lies in \(k\bigl[\, [ij] \mid 1 \le i < j \le p \,\bigr]\).

Since we have both inclusions, we conclude that

\[ A^{SL(2)} = k\bigl[\, [ij] \mid 1 \le i < j \le p \,\bigr] \]

\(\square\)
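Step 1 can be verified symbolically. This SymPy sketch (variable names are my own) checks that the bracket transforms by \(\det(g)\), so it is invariant precisely when \(g \in SL(2)\):

```python
import sympy as sp

a, b, c, e = sp.symbols('a b c e')   # entries of a generic g in GL(2)
g = sp.Matrix([[a, b], [c, e]])

x1, y1, x2, y2 = sp.symbols('x1 y1 x2 y2')
v1, v2 = sp.Matrix([x1, y1]), sp.Matrix([x2, y2])

def bracket(u, v):
    # [12] = x1 y2 - y1 x2, i.e. det of the 2x2 matrix with columns u, v
    return sp.expand(u[0]*v[1] - u[1]*v[0])

# The bracket picks up exactly a factor det(g) under the diagonal action,
# so it is SL(2)-invariant (where det(g) = 1).
assert sp.expand(bracket(g*v1, g*v2) - g.det()*bracket(v1, v2)) == 0
```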

This is the symbolic version of the First Fundamental Theorem. The corresponding statement for covariants of binary forms is obtained by translating bracket expressions into iterated transvectants.

In other words, in the symbolic calculus every invariant is obtained by multiplying and combining these basic determinants. When translated back to binary forms, these determinant contractions correspond to the iterated Clebsch-Gordan projections. For example, a single bracket contraction corresponds to the first transvectant \((Q_1, Q_2)^{(1)}\) (the Jacobian), and a double contraction of a form with itself corresponds to the second self-transvectant \((Q, Q)^{(2)}\) (the Hessian).

Second Fundamental Theorem of Invariants for Binary Forms

The First Fundamental Theorem tells us what generates the covariant ring (the transvectants). The Second Fundamental Theorem tells us what relations those generators satisfy.

Theorem 7 (Second Fundamental Theorem) Let \[ \phi : k[T_{ij}\mid 1\le i<j\le p] \to k[x_1,y_1,\dots,x_p,y_p]^{SL(2)} \] be the homomorphism defined by \[ T_{ij}\mapsto [ij] \]

Then \(\ker(\phi)\) is generated by the quadratic relations \[ T_{ij}T_{kl} + T_{ik}T_{lj} + T_{il}T_{jk} = 0 \]

for all indices \(i,j,k,l\).

Proof.

Let

\[ R := k[x_1,y_1,\dots,x_p,y_p]^{SL(2)} \]

and define the surjective homomorphism (by FFT)

\[ \phi : k[T_{ij}\mid 1\le i<j\le p] \to R \] \[ T_{ij}\mapsto [ij]=x_i y_j-y_i x_j \]

Let \(I\) be the ideal generated by the quadratic polynomials

\[ Q_{ijkl}:=T_{ij}T_{kl}+T_{ik}T_{lj}+T_{il}T_{jk} \]

for all indices \(i,j,k,l\).

We need to prove \(\ker(\phi)=I\). We will do this in four steps:

  1. The quadratic identities lie in the kernel.

For all \(i,j,k,l\),

\[ [ij][kl]+[ik][lj]+[il][jk]=0 \]

as polynomials in the coordinates \((x_r,y_r)\).

Expand each bracket product:

\[ [ij][kl]=(x_i y_j-y_i x_j)(x_k y_l-y_k x_l) \] \[ = x_i y_j x_k y_l - x_i y_j y_k x_l - y_i x_j x_k y_l + y_i x_j y_k x_l \]

Similarly,

\[ [ik][lj]=(x_i y_k-y_i x_k)(x_l y_j-y_l x_j) \] \[ = x_i y_k x_l y_j - x_i y_k y_l x_j - y_i x_k x_l y_j + y_i x_k y_l x_j \]

and

\[ [il][jk]=(x_i y_l-y_i x_l)(x_j y_k-y_j x_k) \] \[ = x_i y_l x_j y_k - x_i y_l y_j x_k - y_i x_l x_j y_k + y_i x_l y_j x_k \]

Now add the three expansions. Every monomial cancels with an identical monomial of opposite sign (after commuting scalars), so the sum is identically zero.

Therefore

\[ \phi(Q_{ijkl})=0 \]

for all \(i,j,k,l\), hence

\[ I \subseteq \ker(\phi) \]

  2. Straightening procedure

Call a monomial

\[ M = T_{i_1 j_1} T_{i_2 j_2}\cdots T_{i_m j_m} \qquad (i_t<j_t) \]

standard if

\[ i_1\le i_2\le \cdots \le i_m \quad\text{and}\quad j_1\le j_2\le \cdots \le j_m \]

If \(M\) is not standard, then there exist two factors \(T_{ij}T_{kl}\) with

\[ i<k \quad\text{but}\quad j>l \]

Apply the quadratic identity to these four indices and solve for the crossed product:

\[ T_{ij}T_{kl} \equiv -\,T_{ik}T_{lj} - T_{il}T_{jk}\pmod I \]

This rewrite replaces the crossed pair \((ij),(kl)\) by a sum of terms where the second indices are less out of order.

To see termination, sort the factors by increasing \(i\)-index and count inversions in the resulting list of \(j\)-indices. Each rewrite strictly decreases this inversion count, so repeated rewriting must stop.

Hence every monomial is congruent mod \(I\) to a \(k\)-linear combination of standard monomials. In particular, standard monomials span

\[ k[T_{ij}]/I \]

  3. Standard monomials are linearly independent.

It suffices to prove linear independence in each homogeneous degree \(d\) in the variables \(T_{ij}\).

Fix such a degree \(d\), and set

\[ N_r := (d+1)^r \qquad (r=1,\dots,p) \]

Now specialize

\[ x_r = u^{N_r}, \qquad y_r = v^{N_r} \]

Then for \(i<j\),

\[ [ij] = x_i y_j - y_i x_j = u^{N_i} v^{N_j} - u^{N_j} v^{N_i} \]

Under lexicographic order with \(u \gg v\), the leading monomial of \([ij]\) is

\[ u^{N_j} v^{N_i} \]

Therefore, if

\[ M = T_{i_1 j_1}\cdots T_{i_d j_d} \]

is a standard monomial, then the leading monomial of \(\phi(M)\) is

\[ u^{N_{j_1}+\cdots+N_{j_d}} \, v^{N_{i_1}+\cdots+N_{i_d}} \]

Because \(M\) is standard, the sequences \(i_1 \le \cdots \le i_d\) and \(j_1 \le \cdots \le j_d\) are weakly increasing. Since each index can occur at most \(d\) times, the sums

\[ N_{i_1}+\cdots+N_{i_d}, \qquad N_{j_1}+\cdots+N_{j_d} \]

uniquely determine the multisets \(\{i_1,\dots,i_d\}\) and \(\{j_1,\dots,j_d\}\), because the \(N_r=(d+1)^r\) are powers of \(d+1\) and base-\((d+1)\) expansion is unique.

Thus distinct standard monomials have distinct leading monomials after this specialization. Hence their images under \(\phi\) are linearly independent.

  4. Conclusion

By (1), \(I\subseteq\ker(\phi)\).

By (2), every element of \(k[T_{ij}]/I\) is a linear combination of standard monomials.

By (3), the images of standard monomials in \(R\) are linearly independent, so the induced map

\[ \overline\phi:\; k[T_{ij}]/I \to R \]

is injective. Since \(\phi\) is surjective, \(\overline\phi\) is an isomorphism.

Thus

\[ \ker(\phi)=I \]

So every relation among the bracket generators is generated by the quadratic identities.

\(\square\)
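The quadratic (Plücker) identity from step 1 can also be checked symbolically rather than by expanding by hand; a short SymPy sketch (indices 0-based, names my own):

```python
import sympy as sp

# Symbolic check of the quadratic identity [ij][kl] + [ik][lj] + [il][jk] = 0
# on four generic points.
xs = sp.symbols('x1:5')
ys = sp.symbols('y1:5')

def br(i, j):
    # the bracket [ij], 0-indexed here
    return xs[i]*ys[j] - ys[i]*xs[j]

i, j, k, l = 0, 1, 2, 3
assert sp.expand(br(i, j)*br(k, l) + br(i, k)*br(l, j) + br(i, l)*br(j, k)) == 0
```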

Summary for Reductive Groups

For reductive groups, we can compute the invariants using the Reynolds operator, which exists by Maschke’s theorem for reductive groups. For finite groups and compact Lie groups, we can construct the Reynolds operator by summing or integrating over the group. And for noncompact reductive groups like \(GL(2)\), we can compute the invariants using representation theory to identify the equivariant projections directly (e.g. the Clebsch-Gordan decomposition for \(SL(2)\)).

Nonreductive Groups

What if the group were not reductive? Hilbert’s 14th problem asked whether the invariant ring of a linear group action on a polynomial ring is always finitely generated.

For non-reductive groups, the answer is no. In 1959, Nagata constructed an explicit counterexample (showing an action of \(\mathbb{G}_a^{13}\) on a polynomial ring that has an invariant ring requiring infinitely many generators).

So we can’t necessarily use representation theory to compute the invariants, and the Reynolds operator might not exist. In fact, the invariant ring may not even be finitely generated.

Based on a Claude-based survey, it seems there are a few options in these cases, which I’ll presumably examine in the future9.

Overall Flow

If we look back at what we did to compute the invariants, we can see that we basically followed a process:

  1. We have some polynomial ring10 \(k[V]\) over field \(k\) of characteristic zero11 and a group12 action \(G\) on \(V\). We want the invariant ring \(k[V]^G\).
  2. Is the group finite? If so, we can compute the invariants by summing over the group (Reynolds operator).
  3. Is the group infinite, but compact? If so, we can compute the invariants by integrating over the group with respect to the Haar measure (Reynolds operator). Constructing the Haar measure depends on the topology of \(G\):
    1. If \(G\) is connected (and locally Euclidean), then it is a Lie group13, and we can construct the Haar measure explicitly via the Maurer-Cartan form (as we did above for \(SO(2)\)).
    2. If \(G\) is totally disconnected (e.g. profinite groups, \(p\)-adic analytic groups like \(GL(n, \mathbb{Z}_p)\)), the Haar measure still exists but the construction is more difficult14.
  4. Is the group noncompact but reductive15? If so, a Reynolds operator exists, but the actual computation uses representation theory to identify the equivariant projections directly.
  5. Is the group non-reductive? If so, there is no general method. However, idiosyncratic methods exist for specific cases.

Our example \(GL(2)\) flows down to Step 4. The unitary trick gives us a Reynolds operator by integrating over \(U(2)\), and representation theory (Clebsch-Gordan) tells us the result is exactly the transvectant calculus.

In all cases where a Reynolds operator exists (steps 2–4), the general algorithm for finding generators is to use the Molien series to count how many invariants to expect in each degree, apply \(\mathcal{R}\) to produce candidates, use Gröbner bases to test algebraic independence, and finally stop when the Molien series terminates. Each of these steps requires theoretical justification, which we’ll look at in the next section.

This overall process is handy for computational approaches, where we encode these algorithms as code. Probably we will mostly deal with finite or compact groups, but it’s good to have the general picture in mind.
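As a tiny illustration of step 2, here is a SymPy sketch of the Reynolds operator for a finite matrix group. The example group \(C_2 = \{I, -I\}\) acting on \((x, y)\) by simultaneous sign flip is my own toy choice, not one from the text:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Reynolds operator for a finite matrix group: average f over the group.
# Toy example: C_2 = {I, -I} acting on (x, y) by simultaneous sign flip.
group = [sp.eye(2), -sp.eye(2)]

def reynolds(f, group):
    total = sp.Integer(0)
    for g in group:
        # substitute (x, y) -> g . (x, y)
        total += f.subs({x: g[0, 0]*x + g[0, 1]*y,
                         y: g[1, 0]*x + g[1, 1]*y}, simultaneous=True)
    return sp.expand(total / len(group))

assert reynolds(x**2 + x*y, group) == x**2 + x*y  # even part is invariant
assert reynolds(x**3, group) == 0                 # odd part averages to zero
```

For this group the invariant ring is generated by the quadratics \(x^2\), \(xy\), \(y^2\), which is exactly what averaging degree by degree produces.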

General Theory

In addition to being able to find invariants, we also want to understand the structure of the invariant ring. For example, can we find all of the invariants? Is a particular invariant ring finitely generated? If so, what are the generators of this ring, what are the relationships between the generators, and what does the geometry of the invariant ring look like?

Finding All Invariants

To prove that we have found all the invariants for a given transformation acting on an object, we need to show that the process terminates (i.e. it does not produce an infinite sequence of new invariants) and that it is complete (i.e. it produces all the invariants).

We’ll continue to look at the example of binary forms, but the same questions apply to any group action on any object.

Noetherian Rings

We need to first introduce the concept of a Noetherian ring16.

An ideal \(I\) of a ring \(R\) is a subset that is closed under addition and under multiplication by any element of \(R\): if \(f \in I\) and \(r \in R\), then \(f + g \in I\) for any \(g \in I\), and \(rf \in I\). An ideal is finitely generated if there exist finitely many elements \(f_1, \ldots, f_m \in I\) such that every element of \(I\) can be written as \(r_1 f_1 + \cdots + r_m f_m\) for some \(r_i \in R\).

Definition 9 (Noetherian Ring) A ring \(R\) is Noetherian if every ideal of \(R\) is finitely generated.

Fields, like \(\mathbb{R}\) or \(\mathbb{C}\), are Noetherian, since their only ideals are \((0)\) and the field itself, both finitely generated.

Hilbert’s Basis Theorem

Now we are ready to state Hilbert’s Basis Theorem, which is the key result that allows us to conclude that the invariant ring of a group action on a polynomial ring is finitely generated.

Theorem 8 (Hilbert’s Basis Theorem) If \(R\) is a Noetherian ring, then the polynomial ring \(R[x]\) is also Noetherian. Equivalently, if \(R\) is Noetherian, then every ideal of \(R[x]\) is finitely generated.

Proof.

Fix an ideal \(I \subset R[x]\).

If \(I\) is not finitely generated, then we can construct an infinite sequence of polynomials \(f_1, f_2, \ldots\) by choosing \(f_1\) of minimal degree in \(I\), and then each \(f_{n+1}\) of minimal degree in \(I \setminus (f_1, \ldots, f_n)\). By construction, the degrees are non-decreasing.

Let \(a_i\) be the corresponding leading coefficient for each \(f_i\). Since \(R\) is Noetherian, the ideal generated by these leading coefficients, \((a_1, a_2, \ldots)\), is finitely generated. Therefore, there exists some \(n\) such that, for all \(i > n\), we have \(a_i \in (a_1, \ldots, a_n)\).

In particular, we can write \(a_{n+1} = r_1 a_1 + \cdots + r_n a_n\) for some \(r_i \in R\).

The leading term of each \(f_i\) is \(a_i x^{d_i}\), where \(d_i = \deg f_i\). Since the degrees are non-decreasing, \(d_{n+1} \ge d_i\) for all \(i \le n\), so we can assemble the leading term of \(f_{n+1}\) from those of \(f_1, \ldots, f_n\):

\[ a_{n+1} x^{d_{n+1}} = r_1 a_1 x^{d_1} x^{d_{n+1} - d_1} + \cdots + r_n a_n x^{d_n} x^{d_{n+1} - d_n} \]

Now subtract the corresponding combination of the \(f_i\) from \(f_{n+1}\):

\[ g := f_{n+1} - \left( r_1 x^{d_{n+1} - d_1} f_1 + \cdots + r_n x^{d_{n+1} - d_n} f_n \right) \]

The leading terms cancel, so \(\deg g < d_{n+1}\). Moreover \(g \in I\), and \(g \notin (f_1, \ldots, f_n)\): otherwise \(f_{n+1}\), which differs from \(g\) by an element of \((f_1, \ldots, f_n)\), would also lie in \((f_1, \ldots, f_n)\).

But \(f_{n+1}\) was chosen to have minimal degree in \(I \setminus (f_1, \ldots, f_n)\), and \(g\) lies in that same set with strictly smaller degree.

This is a contradiction. Therefore, every ideal of \(R[x]\) is finitely generated, and \(R[x]\) is Noetherian. \(\square\)

Conceptually, we just did long division over and over again, and the Noetherian condition guaranteed that this process terminated after finitely many steps.

Generators

What are the relations between the generators? This question (for binary forms) is answered by the Second Fundamental Theorem, which gives a complete description of the syzygies (relations) between the generators of the invariant ring.

Syzygies

We know from the First Fundamental Theorem that transvectants generate all the covariants. But the generators are not algebraically independent. They have relations between them called syzygies.

Definition 10 (Syzygy) Given a field \(k\) of characteristic zero, and a polynomial \(F \in k[T_1, \ldots, T_s]\) in \(s\) variables, a syzygy among generators \(J_1, \ldots, J_s\) of a graded ring is a polynomial relation

\[ F(J_1, \ldots, J_s) = 0 \]

Equivalently, if \(\phi: k[T_1, \ldots, T_s] \to k[V]^G\) is the surjection sending \(T_i \mapsto J_i\), then the syzygies are the elements of \(\ker \phi\).

So a syzygy is a polynomial relation among the generators of the invariant ring. The set of all syzygies forms an ideal in the polynomial ring \(k[T_1, \ldots, T_s]\), called the syzygy ideal.

Since the syzygy ideal is itself an ideal of \(k[T_1, \ldots, T_s]\), we can continue to iterate this process. The relations among the generators of the syzygy ideal are called second-order syzygies, and so on. Does this process terminate?

Hilbert’s Syzygy Theorem

Theorem 9 (Hilbert’s Syzygy Theorem) Consider a vector space \(V\) over a field \(k\) of characteristic zero, and let \(k[V]\) be the polynomial ring on \(V\).

Let \(S = k[T_1, \ldots, T_s]\) be a polynomial ring in \(s\) variables, and let \(\phi: S \to k[V]^G\) be a surjection sending \(T_i \mapsto J_i\), where \(J_1, \ldots, J_s\) are generators of the invariant ring.

Consider the “tower of syzygies” generated recursively by \(\ker \phi\):

  • Pick generators \(R_1^{(0)}, \ldots, R_{m_0}^{(0)}\) of \(\ker \phi\).
  • Let \(K_1\) be the set of tuples \((p_1, \ldots, p_{m_0})\) such that \(p_1 R_1^{(0)} + \cdots + p_{m_0} R_{m_0}^{(0)} = 0\).
  • Pick generators \(R_1^{(1)}, \ldots, R_{m_1}^{(1)}\) of \(K_1\), and continue.

The tower of syzygies terminates: the \(n\)-th level \(K_n\) vanishes for all \(n > s\).

Proof.

Omitted. A full proof of Hilbert’s Syzygy Theorem requires too much machinery that is beyond the scope of this blog post (Gröbner bases, free resolutions of modules). See Cox–Little–O’Shea, Ideals, Varieties, and Algorithms, Chapter 10 for an “algorithmic” approach, or a homological algebra text.

The intuition is that each variable provides one independent “direction” in which cancellations can occur. After using up all \(s\) variables, there are no new directions left for higher syzygies to appear. In other words, there are only finitely many levels of syzygies, and we can find all of them in a finite amount of time.

I tried to look for a more elementary proof, but I couldn’t find one.

\(\square\)

Note that this also extends to modules over \(k[T_1, \ldots, T_s]\) (e.g. the module of covariants).

Geometry

Nullstellensatz

Theorem 10 (Hilbert’s Nullstellensatz) Let \(k\) be an algebraically closed field and \(R = k[x_1, \ldots, x_n]\).

For an ideal \(I \subseteq R\), let \(V(I)\) denote the set of common zeros of the polynomials in \(I\): \(V(I) = \{(a_1, \ldots, a_n) \in k^n \mid f(a_1, \ldots, a_n) = 0 \text{ for all } f \in I\}\).

Conversely, for a subset \(V \subseteq k^n\), let \(I(V)\) denote the ideal of all polynomials that vanish on \(V\): \(I(V) = \{f \in R \mid f(x) = 0 \text{ for all } x \in V\}\).

If \(I \subseteq R\) is an ideal and \(f \in R\) vanishes at every common zero of \(I\), then \(f^m \in I\) for some \(m \geq 1\).

Equivalently, define \(\sqrt{I} = \{f \in R \mid f^m \in I \text{ for some } m \geq 1\}\). Then \(I(V(I)) = \sqrt{I}\).

Proof.

Assume \(f\) vanishes on \(V(I)\). We want to show that \(f^m \in I\) for some \(m\).

Introduce a new variable \(t\) and consider the ideal \[ J = I + (1 - tf) \subset k[x_1, \ldots, x_n, t] \]

Suppose \((a,t)\) is a common zero of \(J\). Then every polynomial in \(I\) vanishes at \(a\), so \(a \in V(I)\). The equation \(1 - tf(a) = 0\) therefore implies \(tf(a) = 1\). But \(f(a)=0\) for every \(a \in V(I)\), which is impossible. Therefore \(J\) has no common zero.

An ideal with no common zero must contain \(1\). (To see this: if \(J\) were a proper ideal, it would be contained in some maximal ideal \(\mathfrak{m}\). Then \(k[x_1,\ldots,x_n,t]/\mathfrak{m}\) is a field that is finitely generated as a \(k\)-algebra. But any field finitely generated as an algebra over an algebraically closed field \(k\) must equal \(k\) itself (since each generator satisfies a polynomial over \(k\), and \(k\) already contains all roots). So \(\mathfrak{m} = (x_1 - a_1, \ldots, x_n - a_n, t - b)\) for some point \((a,b)\), meaning \(J\) has a common zero, contradicting what we just showed.)

So \(1 \in J\), and we can write \[ 1 = g_1 f_1 + \cdots + g_r f_r + h(1 - tf) \]

where \(f_1, \ldots, f_r \in I\) and \(g_1, \ldots, g_r, h\) are polynomials in \(k[x_1, \ldots, x_n, t]\).

Substitute \(t = 1/f\) into this equation, working in the field of rational functions (we may assume \(f\) is not the zero polynomial, since otherwise \(f \in I\) trivially). The term \(h(1 - tf)\) becomes zero, so we obtain

\[ 1 = g_1 f_1 + \cdots + g_r f_r \]

This expression may contain denominators coming from \(1/f\). Multiplying both sides by a sufficiently large power \(f^m\) clears the denominators and yields

\[ f^m = a_1 f_1 + \cdots + a_r f_r \]

for some polynomials \(a_1, \ldots, a_r \in k[x_1, \ldots, x_n]\).

Thus \(f^m \in I\). \(\square\)
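The trick in this proof doubles as an algorithm: \(f \in \sqrt{I}\) if and only if the ideal \(I + (1 - tf)\) contains \(1\), which a Gröbner basis computation detects. A minimal sketch with sympy (the ideal and the test polynomials below are illustrative choices of mine, not from any particular source):

```python
from sympy import symbols, groebner

x, y, t = symbols('x y t')

def in_radical(f, gens, xs):
    """Radical membership via the Rabinowitsch trick:
    f lies in sqrt(I) iff 1 lies in I + (1 - t*f),
    i.e. iff the reduced Groebner basis of the enlarged ideal is [1]."""
    G = groebner(list(gens) + [1 - t * f], *xs, t, order='lex')
    return G.exprs == [1]

# I = (x**2, y**2): its zero set V(I) is the single point (0, 0).
# x + y vanishes there, so it lies in the radical (indeed (x+y)**3 is in I)...
print(in_radical(x + y, [x**2, y**2], [x, y]))   # True
# ...but x + 1 equals 1 at the origin, so it is not in the radical.
print(in_radical(x + 1, [x**2, y**2], [x, y]))   # False
```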

How do we interpret this?

If we have a set of polynomials generating an ideal \(I\), then the zero set \(V(I)\) depends only on the ideal: the common zeros of the generators are exactly the common zeros of every polynomial in \(I\).

Conversely, any polynomial that vanishes on \(V(I)\) already lies in \(I\) up to taking a power. So the radical \(\sqrt{I}\) captures exactly the polynomials that vanish on the zero set.

So we can go back and forth between the algebraic relations among the generators and the geometric shape of the solution set.

For invariant theory, let \(J_1, \ldots, J_s\) be a finite generating set for the invariant ring \(k[V]^G\) and consider the map:

\[ \pi : V \to k^s, \qquad v \mapsto (J_1(v), \ldots, J_s(v)) \]

This takes each point of \(V\) to the tuple of its invariants. Since each \(J_i\) is constant under the action of \(G\), the map \(\pi\) is constant along \(G\)-orbits.
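For the running example of binary quadratics, the discriminant is such an invariant coordinate: under a determinant-1 substitution it is strictly invariant, so the whole orbit maps to one point. A quick symbolic check with sympy (the particular matrix \(\begin{psmallmatrix}2&1\\3&2\end{psmallmatrix}\) is just an arbitrary determinant-1 choice):

```python
from sympy import symbols, expand, Poly, simplify

x, y, a, b, c = symbols('x y a b c')

# Binary quadratic form q(x, y) = a x^2 + b x y + c y^2.
q = a*x**2 + b*x*y + c*y**2

# Act by the SL(2) substitution (x, y) -> (2x + y, 3x + 2y), det = 2*2 - 1*3 = 1.
q2 = expand(q.subs({x: 2*x + y, y: 3*x + 2*y}, simultaneous=True))

# Read off the transformed coefficients a', b', c'.
a2 = Poly(q2, x, y).coeff_monomial(x**2)
b2 = Poly(q2, x, y).coeff_monomial(x*y)
c2 = Poly(q2, x, y).coeff_monomial(y**2)

# The discriminant b^2 - 4ac is unchanged along the orbit.
disc_before = b**2 - 4*a*c
disc_after = b2**2 - 4*a2*c2
print(simplify(disc_after - disc_before))  # 0
```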

Now, consider

\[ \phi: k[T_1, \ldots, T_s] \to k[V]^G \]

which maps \(T_i \mapsto J_i\). The kernel of \(\phi\) is the ideal of relations among the generators.

The \(J_i\) satisfy the relations in \(\ker \phi\) by definition, and the Nullstellensatz tells us that \(\ker \phi\) is exactly the ideal of polynomials vanishing on the (Zariski closure of the) image of \(\pi\): any relation the \(J_i\) satisfy at every point is, up to radicals, a consequence of those in \(\ker \phi\).

So the \(J_i\) behave like coordinates on the image of \(\pi\), and the relations in \(\ker \phi\) determine the shape of that image. In other words, the algebraic structure of the invariant ring determines the geometry of the orbit space \(V//G\).
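The kernel of \(\phi\) can be computed by Gröbner-basis elimination: introduce variables \(T_i\), form the ideal \((T_1 - J_1, \ldots, T_s - J_s)\), and eliminate the original variables. A minimal sketch with sympy for a standard toy case (\(\mathbb{Z}/2\) acting by \((x,y) \mapsto (-x,-y)\), not the binary-forms running example), where the invariant ring is generated by \(x^2, y^2, xy\) with the single syzygy \(T_1 T_2 - T_3^2\):

```python
from sympy import symbols, groebner

x, y, T1, T2, T3 = symbols('x y T1 T2 T3')

# Z/2 acts by (x, y) -> (-x, -y); the invariants are generated by
# J1 = x**2, J2 = y**2, J3 = x*y.
# ker(phi) is the elimination ideal
# (T1 - x**2, T2 - y**2, T3 - x*y) intersected with k[T1, T2, T3].
G = groebner([T1 - x**2, T2 - y**2, T3 - x*y],
             x, y, T1, T2, T3, order='lex')

# With lex order x > y > T1 > T2 > T3, the basis elements free of x and y
# generate the ideal of relations among the invariants.
relations = [g for g in G.exprs if not g.has(x) and not g.has(y)]
print(relations)  # the single syzygy T1*T2 - T3**2 (up to sign)
```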

So, given some space, we can look at what \(G\) leaves fixed. If we use those invariants as coordinates, we can get a smaller space that captures the structure of the original space, but with the symmetries “divided out”.

Summary

The three theorems fit together nicely.

  • The Basis Theorem tells us the invariant ring is finitely generated, so the orbit space \(V//G\) embeds in a finite-dimensional affine space.
  • The Syzygy Theorem describes the relations among the generators, which determine the shape of \(V//G\).
  • The Nullstellensatz says the algebra of the invariant ring gives the geometry of the orbit space, so we can understand the geometry of \(V//G\) by understanding the algebra of \(k[V]^G\).

Conclusion

We’ve now explored the process of computing invariants and looked at the theory surrounding that process, especially for the particular case of binary forms with coefficients from fields of characteristic zero. We also now have a process we can follow where, given a group action on a ring, we first check if the group is finite, compact, or reductive, and then apply the appropriate method to compute the invariants. The footnotes also give some ideas for extending this process to other types of objects and group actions.

We have also discussed the structure of the invariant ring, including the question of finite generation, the relations between generators (syzygies), and the geometry of the invariant ring. Each of Hilbert’s three theorems answers one of these questions, and they are all fundamental to our understanding of invariant theory, as well as modern mathematics. For example, Noether’s work on finite generation led to the concept of Noetherian rings, which is fundamental to commutative algebra. Hilbert’s Syzygy Theorem led to the development of homological algebra, and the Nullstellensatz is a cornerstone of algebraic geometry17.

In the next post, I’ll look more closely at the computational aspects of invariant theory, including algorithms for computing invariants and covariants (such as the Molien series, Gröbner bases, primary/secondary decomposition, and Kemper’s algorithms).

Possibilities for applications include game theory, allometric scaling (allometric scaling laws can be viewed as syzygies of the invariant ring of the scaling group acting on biological observables), multilevel selection, and machine learning. I have several threads I’ve been developing, which include applying the geometric controls framework to games, stacking symmetries to constrain admissible Lagrangians, rederiving allometric scaling from representation theory, and looking at multilevel selection mathematically, all of which seem to require invariant theory. I don’t know exactly how yet, so the plan is to learn the core algorithms by implementing them, and see where they lead.

AI Disclosure

I used AI to brainstorm, find references, edit, organize sections (a huge pain), format LaTeX, and check proofs.

Footnotes

  1. In retrospect, this was a mistake, as the classical setting turns out to be very different (and much more complicated) than the standard examples for the computational setting, and this clutters the narrative of the post (which is trying to both give the overall process and also work through an example).↩︎

  2. Be careful here. The linear form \(Q_1(x,y) = x + 2y\) has inhomogeneous version \(Q_1(p) = p + 2\), and the quadratic form \(Q_2(x,y) = xy + 2y^2\) also seems to have \(Q_2(p) = p + 2\). But they are not the same, since \(Q_1\) is linear and \(Q_2\) is quadratic. So we need to track the overall degree of the form to ensure that these mappings between homogeneous and inhomogeneous polynomials are unique.↩︎

  3. Haar’s theorem is out of scope for this blog post. Basically, it says that every locally compact group has a measure \(\mu\) satisfying \(\mu(gS) = \mu(S)\) for all group elements \(g\) and measurable sets \(S\), unique up to a positive scalar. For finite groups this is the counting measure. For compact groups the total measure is finite, so we normalize to \(\mu(G) = 1\). Essentially the left-invariance condition means that the measure is uniform across the group, so that we can integrate without worrying about weighting some regions more than others. The proof uses Arzelà-Ascoli and Riesz representation. For noncompact locally compact groups, the construction is harder and uses Tychonoff’s theorem.↩︎

  4. See here or here. If the group is NOT smooth, then the construction is (potentially MUCH) more difficult. For example, the Haar measure on p-adic Lie groups was only explicitly constructed in 2023 (see Aniello et al).↩︎

  5. A semisimple group is a reductive group with finite center. Every reductive group is a product of a semisimple group and a torus, so the classification reduces to classifying semisimple groups. We can use the Killing form \(\kappa(X,Y) = \text{tr}(\text{ad}_X \circ \text{ad}_Y)\) on the Lie algebra to determine if a group is reductive. A group is semisimple if and only if its Killing form is nondegenerate. A group is reductive if and only if the radical of the Killing form is contained in the center. Either way, checking is a finite linear algebra computation (I expect we will see this in the next post). Furthermore, if the group is semisimple, we can identify which particular semisimple group it is by choosing a Cartan subalgebra (a maximal abelian subalgebra of semisimple elements) and checking how it acts on the rest of the Lie algebra. The eigenvalues form a root system, which is encoded by a Dynkin diagram. The complete list of connected diagrams is: \(A_n\) (\(SL(n+1)\)), \(B_n\) (\(SO(2n+1)\)), \(C_n\) (\(Sp(2n)\)), \(D_n\) (\(SO(2n)\)), and five exceptional cases (\(E_6, E_7, E_8, F_4, G_2\)). See Milne’s Reductive Groups or Humphreys’s Introduction to Lie Algebras and Representation Theory.↩︎

  6. There’s also a way to see the existence of the Reynolds operator for reductive groups using Weyl’s unitarian trick (1925). The idea is to integrate over the maximal compact subgroup \(K \subset G\) (e.g. \(U(n) \subset GL(n, \mathbb{C})\)) that is “small enough” to integrate over, but “large enough” to determine all invariants (“Zariski-dense”).↩︎

  7. In an original draft of this post I used the Omega process to define transvectants, but I found it to be nonintuitive. Therefore, I am attempting to avoid Omega process language here to make the construction more concrete and less abstract. If you squint hard enough, you can see that the construction is basically the same as the Omega process. If it seems confusing or unmotivated I apologize. I suspect that as I digest invariant theory more (and, most importantly, write implementations), the underlying intuition for how the subject snaps together will become clearer. Unfortunately, the writing of the blog post is happening in parallel with my learning of the subject, so I don’t have the benefit of hindsight to make the exposition as clear as possible.↩︎

  8. The equivariance can be verified using the transformation law for partial derivatives under linear substitution, which we computed in the Omega process section: the gradient transforms as \(\nabla \mapsto A^{-T}\nabla\), so the determinant \(\frac{\partial}{\partial x_1}\frac{\partial}{\partial y_2} - \frac{\partial}{\partial y_1}\frac{\partial}{\partial x_2}\) picks up an additional factor of \(\det(A)^{-1}\), making it equivariant.↩︎

  9. One is to abandon guarantees of completeness or termination. For example, the Derksen ideal approach can still compute rational invariants (the invariant field). Similarly, SAGBI bases can sometimes compute some of the invariants, but neither is guaranteed to terminate. Derksen-Kemper also showed that a finite collection of invariants that distinguishes all orbits of the full ring (the separating set) always exists and can be computed. On the geometric side, Berczi, Doran, and Kirwan have developed a non-reductive GIT for groups whose unipotent radical is “graded” by a 1-parameter subgroup, which has been applied to the Green-Griffiths-Lang conjecture for generic hypersurfaces (Berczi-Kirwan, 2024).↩︎

  10. What if we want invariants of something other than polynomial rings? According to Claude, the answer depends on what you’re replacing \(k[V]\) with. For smooth functions, the Schwarz-Mather theorem says the polynomial generators are also smooth generators, so we can apply everything we did here. For differential forms, the invariant forms on \(G/H\) are computed by relative Lie algebra cohomology (this is Cartan’s method and underlies Chern-Weil theory). For formal power series (which are relevant for local normal forms near equilibria), Luna’s slice theorem says you can reduce the problem to finding the stabilizer of the equilibrium point acting on the directions transverse to its orbit. This is the mathematical backbone of symmetric bifurcation theory. For rational functions (Noether’s problem) this is connected to the inverse Galois problem, open in many cases. For noncommutative algebras (e.g. matrices under conjugation), Procesi-Razmyslov says invariants are generated by traces of products. For tensors (e.g. payoff tensors in \(n\)-player games living in \(\bigotimes^n \mathbb{R}^{s_i}\)), polynomial invariant theory applies directly. For combinatorial structures (e.g. graphs, multilevel population structures), Polya enumeration and symmetric function theory applies.↩︎

  11. What if the field has characteristic \(p\)? If \(G\) is finite and \(|G|\) is coprime to \(p\), everything still works. If \(G\) is finite but \(p\) divides \(|G|\), then the averaging trick fails (since dividing by \(|G|\) is dividing by zero), so there is no Reynolds operator. We would need modular representation theory, where representations can have indecomposable summands that are not irreducible. Computing invariants requires different tools (Claude says transfer maps, Steenrod operations, and explicit constructions specific to each group). If \(G\) is reductive, the invariant ring is still finitely generated (Haboush, 1975), but complete reducibility fails, so again no Reynolds operator and no Molien series. The proofs apparently run through geometric invariant theory (Mumford’s GIT) rather than the algebraic pipeline we develop in this post. Characteristic \(p\) invariant theory arises in coding theory (weight enumerators of codes over \(\mathbb{F}_q\)), cryptography (classifying elliptic curves over \(\mathbb{F}_p\)), and algebraic geometry over finite fields.↩︎

  12. What if it’s not a group acting on the structure? The answer depends on what you’re replacing \(G\) with. Some highlights (Claude once again): For Lie algebras acting by derivations, invariants are elements killed by all derivations (i.e. Casimir invariants). For reductive Lie algebras over characteristic zero this is equivalent to the group picture, but for infinite-dimensional Lie algebras (Virasoro, Kac-Moody) or nilpotent ones, there’s no corresponding group. For Hopf algebras and quantum groups (which generalize both groups and Lie algebras), coinvariants under a coaction are the source of knot invariants like the Jones polynomial. For monoids and semigroups, the theory degrades, since no inverses means no Reynolds operator and no guaranteed finite generation.↩︎

  13. Why does continuous imply Lie? By Hilbert’s fifth problem, solved by Gleason, Montgomery, and Zippin in 1952, we know that every locally compact, locally Euclidean topological group is a Lie group. In fact, any locally compact group acting faithfully on a finite-dimensional manifold is necessarily Lie. Since we have \(G\) acting on a vector space, “continuous” and “Lie” are effectively synonymous for our purposes. Compact connected groups that are not Lie groups do exist (e.g. \(\prod_{n=1}^{\infty} SO(2)\), solenoids), but they cannot act faithfully on finite-dimensional vector spaces. I’ll ignore these pathological cases unless somehow they become important.↩︎

  14. In this case, the construction likely comes from the combinatorics of cosets of open subgroups rather than differential forms. For example, the Haar measure on \(GL(n, \mathbb{Q}_p)\) is well-understood and built from the \(p\)-adic absolute value. For more exotic \(p\)-adic groups the story is harder. Claude dug up explicit constructions for certain cases that were only worked out as recently as 2023 (see Aniello et al). The point is, the topology matters in these cases, which are out of scope for my purposes.↩︎

  15. As we saw, a group is reductive if every finite-dimensional representation decomposes as a direct sum of irreducible representations (equivalently, the unipotent radical is trivial). All finite groups, compact Lie groups, and \(GL(n)\), \(SL(n)\), \(O(n)\), \(Sp(n)\) over characteristic zero are reductive.↩︎

  16. It’s probably no surprise that Noether was thinking about invariants from the algebraic perspective in addition to her work on invariants in physics. Hopefully we will soon understand the deep connection between these perspectives.↩︎

  17. I don’t think I appreciated Hilbert’s contributions until I wrote this post. We are living in his shadow.↩︎