Linear Methods for Learning Dynamical Systems


July 4, 2025

Victor Vasarely. *Vega-Nor*. (1969)

Overview

In this series of posts, I will record some of my notes and code snippets on estimating dynamical systems from data[1].

Introduction

Let’s say we have a stream of data points:

\[ X = \{x_0, x_1, \ldots, x_n\} \]

Assume further that the data points were produced by a (discrete-time)[2] dynamical system of the form:

\[ x_{t+1} = F(x_t) \]

where \(x_t\) is the state of the system at time \(t\) and \(F\) is the “evolution operator” that relates the current state to the next one.

Given the data, how might we try to approximate the form of \(F\)?

Dynamic Mode Decomposition

In general, \(F\) might be nonlinear. If \(F\) were linear, however, the system would admit an exact matrix representation; even when it isn't, we can look for the best linear approximation

\[ x_{t+1} \approx Ax_{t} \]

where \(A\) is a matrix and the \(x_i\) are vectors.

Assume each \(x_i\) is a column vector. We can construct two matrices

\[ X = [x_0, \ldots, x_{n-1}] \]

and

\[ Y = [x_1, \ldots, x_{n}] \]

Note that \(Y\) is \(X\) shifted forward one step in time. It can be shown[3] that the best-fit operator satisfies

\[ A_{opt} = \text{argmin}_{A} || Y - AX ||_{fro} = YX^{\dagger} \]

where \(||\cdot||_{fro}\) is the Frobenius norm and \(X^{\dagger}\) is the pseudoinverse of \(X\). Furthermore, writing the SVD \(X = USV^{*}\), the best rank-\(k\) approximation of \(A_{opt}\) is given by truncating \(U\), \(S\), and \(V\) (keeping the first \(k\) columns).
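As a quick sketch (assuming NumPy, with the matrices \(X\) and \(Y\) as defined above), the best-fit operator can be computed directly. This is only practical when the state dimension is small, which is what motivates the algorithm below:

import numpy as np

# Best-fit linear operator A_opt = Y X^+ (minimizes ||Y - AX|| in the Frobenius norm)
A_opt = Y @ np.linalg.pinv(X)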

In essence, we are breaking our state data \(x_t\) down into its dominant modes. That is, we want to compute a spectral decomposition such that:

\[ x_t = \sum_{j=1}^k \phi_{j}\lambda_{j}^{t}b_j = \Phi \Lambda^{t} b \]

where \(\Phi\) contains the DMD modes (eigenvectors of \(A_{opt}\)), \(\Lambda\) contains the eigenvalues (of \(A_{opt}\)), and \(b\) contains the amplitudes. These capture the spatial structure, the growth/decay/oscillation rates, and the relative contributions of each mode, respectively. Even though the actual system may be nonlinear, this linear decomposition can be quite powerful.

To compute the dominant modes of \(A_{opt}\), we can use the “Dynamic Mode Decomposition” (DMD) algorithm.

DMD Algorithm

There are two basic algorithms to compute the DMD: an iterative Arnoldi-based method (easier to analyze theoretically) and an SVD-based method (more robust to noise in practice). Since we are going to implement this, let's look at the SVD-based algorithm:

  1. Compute the SVD of the matrix \(X\).

\[ X = USV^{*} \]

  2. Truncate \(U\), \(S\), and \(V^{*}\) to the desired dimensionality (i.e. keep the first \(k\) columns of \(U\), the first \(k\) singular values, and the first \(k\) rows of \(V^{*}\)).

  3. We know

\[ Y \approx AX \]

therefore

\[ YX^{\dagger} \approx A \]

By our SVD, we have

\[ AUSV^{*} \approx Y \]

so

\[ A \approx YVS^{-1}U^{*} \]

However, \(Y\) may have many rows (one per state dimension), so forming and diagonalizing the full \(A\) directly would be expensive.

Luckily, we can multiply by \(U^{*}\) on the left and \(U\) on the right such that

\[ U^{*}AU \approx U^{*}YVS^{-1}U^{*}U \]

and since \(U^{*}U = I\),

\[ U^{*}AU \approx U^{*}YVS^{-1} \]

Let's define \(\tilde{A} := U^{*}AU\). \(\tilde{A}\) is (up to the truncation) similar to \(A\), so it has the same eigenvalues, and its eigenvectors map to eigenvectors of \(A\) through the change of basis \(U\)[4]. Thus, we can approximate the eigenvalues and eigenvectors of \(A\) with those of

\[ \tilde{A} \approx U^{*}YVS^{-1} \]

  4. Once we have \(\tilde{A}\)'s eigenvalues and eigenvectors[5], we can retrieve the first \(k\) eigenvalues and eigenvectors of the original \(A\) by projecting back into the original space.

We know by SVD that

\[ A \approx YVS^{-1}U^{*} \]

so

\[ AU \approx YVS^{-1} \]

Let's suppose \(w\) is an eigenvector of \(\tilde{A}\) in the reduced space, so that

\[ \tilde{A}w = \lambda w \]

Then in the original space (using our \(AU\)-equation),

\[ A(Uw) = (AU)w \approx YVS^{-1}w \]

So each eigenvector \(w\) in the reduced space corresponds to a vector \(YVS^{-1}w\) in the original space. Collecting the eigenvectors of \(\tilde{A}\) as the columns of a matrix \(W\), \(YVS^{-1}W\) should span all the eigenvectors in the original space.

Let’s double check to be sure:

\[ \begin{aligned} A\!\left(YV S^{-1} W\right) &\;\approx\; (YV S^{-1} U^{*})\,(YV S^{-1})W \\[4pt] &= (YV S^{-1})(U^{*}YV S^{-1})W \\[4pt] &= (YV S^{-1})\,\tilde{A}\,W \\[4pt] &= (YV S^{-1} W)\,\Lambda . \end{aligned} \]

So \(\Phi := YVS^{-1}W\) does satisfy the eigenvector equation \(A\Phi = \Phi \Lambda\)[6].

  5. We can finally retrieve the amplitudes of each eigenvalue/eigenvector pair (i.e. how much each mode contributes to the overall system). At \(t = 0\) we know:

\[ x_0 = \Phi b \]

so we can solve for

\[ b = \Phi^{\dagger} x_0 \]

by least-squares.

Implementation

import numpy as np

def dmd(X, Y, r=None):
    # SVD of the snapshot matrix X
    U, S, Vt = np.linalg.svd(X, full_matrices=False)

    # Optional rank-r truncation
    if r is not None and r < len(S):
        U = U[:, :r]
        S = S[:r]
        Vt = Vt[:r, :]

    # Reduced operator: A~ = U* Y V S^{-1}
    Atilde = U.conj().T @ Y @ Vt.conj().T @ np.diag(1.0 / S)
    eigenvalues, eigenvectors = np.linalg.eig(Atilde)

    # Exact DMD modes: Phi = Y V S^{-1} W
    modes = Y @ Vt.conj().T @ np.diag(1.0 / S) @ eigenvectors

    # Amplitudes from the initial snapshot: b = Phi^+ x_0 (least squares)
    x0 = X[:, 0]
    amplitudes = np.linalg.lstsq(modes, x0, rcond=None)[0]

    # Reconstruct the trajectory: x_t ~ Phi diag(lambda^t) b, one column per time step
    t = np.arange(X.shape[1])
    reconstruction = modes @ (amplitudes[:, None] * eigenvalues[:, None] ** t)

    return modes, eigenvalues, amplitudes, reconstruction

This pretty much follows the algorithm exactly as described; the final reconstruction uses the spectral expansion \(x_t = \Phi \Lambda^{t} b\) from earlier.
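As a quick sanity check (a hypothetical example, not part of the experiments below), we can run dmd on data generated by a known linear system and confirm that it recovers the true eigenvalues:

# Hypothetical sanity check: a 2x2 linear system with eigenvalues 0.9 and 0.8
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
snapshots = np.zeros((2, 50))
snapshots[:, 0] = [1.0, 1.0]
for i in range(1, 50):
    snapshots[:, i] = A_true @ snapshots[:, i - 1]

modes, eigenvalues, amplitudes, reconstruction = dmd(snapshots[:, :-1], snapshots[:, 1:])
print(np.sort(eigenvalues.real))  # should be approximately [0.8, 0.9]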

Koopman Operator

Any discrete-time dynamical system \(x_{t+1} = F(x_{t})\) induces a “Koopman operator” \(K\) which advances some “observables” \(g(x_t)\) of the state forward in time. That is:

\[ Kg(x_t) := g(F(x_t)) = g(x_{t+1}) \]

with eigenfunctions

\[ K\psi(x_t) = \lambda \psi(x_t) = \psi(x_{t+1}) \]

The significance of this is that the eigenvalues and eigenfunctions of the Koopman operator characterize growth/decay/oscillation rates exactly, even for a nonlinear \(F\). We can think of the DMD procedure as assuming the observable \(g(x_t) = x_t\) and fitting a finite-dimensional approximation of \(K\) accordingly. This is convenient because it reduces the problem of finding \(K\) to a linear algebra problem.
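As a standard worked example (not specific to this post): for the scalar linear system \(x_{t+1} = \lambda x_t\), every monomial observable \(g_n(x) = x^n\) is a Koopman eigenfunction, since

\[ Kg_n(x_t) = g_n(x_{t+1}) = (\lambda x_t)^n = \lambda^{n} g_n(x_t) \]

so even this simple system carries infinitely many Koopman eigenvalues \(\lambda^n\).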

However, DMD can only capture dynamics that are linear in the state itself. In general, the Koopman operator is infinite-dimensional, and its eigenfunctions live in function space, not in the state space. Could we instead find a better set of observables \(g\) to analyze the behavior of our system?

Extended Dynamic Mode Decomposition

In extended dynamic mode decomposition, we start with a dictionary of potential observables:

\[ \psi(x) = [\psi_1(x), \ldots, \psi_m(x)] \]

We then evaluate the dictionary on the data to produce two matrices, \(\Psi_X\) and \(\Psi_Y\):

\[ \Psi_X = [\psi(x_0),\psi(x_1),\ldots,\psi(x_{n-1})], \qquad \Psi_Y = [\psi(x_1),\psi(x_2),\ldots,\psi(x_{n})] \]

Now this reduces back to the original DMD problem:

\[ K_{EDMD} = \Psi_Y\Psi_X^{\dagger} \]

With the trivial dictionary \(\psi(x) = x\) we have regular DMD.
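In code, this is again a one-line least-squares fit (a direct sketch, assuming the evaluated dictionary matrices are stored as Psi_X and Psi_Y; the SVD-based version below is more robust):

# Direct EDMD operator: K = Psi_Y Psi_X^+
K_edmd = Psi_Y @ np.linalg.pinv(Psi_X)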

How do we choose a good dictionary? We can use polynomials, Fourier modes, Gaussian kernels, etc. At this point we have a machine learning problem, so we need to validate on held-out trajectories, regularize, and so on.
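For instance, here is a minimal sketch of a Gaussian (RBF) dictionary compatible with the implementation below; the choice of centers and the width gamma are assumptions that would need tuning:

def rbf_basis(centers, gamma=1.0):
    # Each dictionary entry maps a (d, n) snapshot matrix to a (1, n) row of observable values
    return [
        lambda x, c=c: np.exp(-gamma * np.sum((x - c[:, None]) ** 2, axis=0, keepdims=True))
        for c in centers
    ]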

Implementation

Here’s a simple implementation of EDMD.

def edmd(X, Y, basis, r=None):
    # Evaluate the dictionary on the snapshots: Psi_X and Psi_Y
    obs_X = np.vstack([f(X) for f in basis])
    obs_Y = np.vstack([f(Y) for f in basis])

    # SVD of Psi_X, with optional rank-r truncation
    U, S, Vt = np.linalg.svd(obs_X, full_matrices=False)

    if r is not None and r < len(S):
        U = U[:, :r]
        S = S[:r]
        Vt = Vt[:r, :]

    S_inv_diag = np.diag(1.0 / S)

    # Reduced Koopman approximation: K = U* Psi_Y V S^{-1}
    K = U.conj().T @ obs_Y @ Vt.conj().T @ S_inv_diag
    eig_vals, eig_vecs = np.linalg.eig(K)

    # Lift the eigenvectors back to observable space
    eig_vecs_K = U @ eig_vecs

    # Evaluate each eigenfunction on the snapshots
    modes = obs_X.T @ eig_vecs_K
    return K, eig_vals, eig_vecs, modes, eig_vecs_K

Experiments

Let's try these methods on an example problem: the (two-dimensional) heat equation:

\[ \frac{\partial u}{\partial t} = \alpha\nabla^{2}u \]

Discretize in time and space, and let \(u_{i,j}^{(n)}\) be the temperature at grid point \((i,j)\) at time step \(n\):

\[ u_{i,j}^{(n)}, \qquad 0 \le i \le N-1,\; 0 \le j \le M-1 \]

Also let

\[ r = \frac{\alpha\,\Delta t}{(\Delta x)^2} \]

Now we can determine the temperature at time step \(n+1\) as a function of the nearby points at the previous step \(n\):

\[ \begin{aligned} u_{i,j}^{(n+1)} &= u_{i,j}^{(n)} + r\Bigl( u_{i+1,j}^{(n)} + u_{i-1,j}^{(n)} + u_{i,j+1}^{(n)} + u_{i,j-1}^{(n)} - 4\,u_{i,j}^{(n)} \Bigr), \\[4pt] &\qquad 1 \le i \le N-2,\;\; 1 \le j \le M-2 . \end{aligned} \]

with the boundary values held fixed: \[ u_{i,j}^{(n+1)} = u_{i,j}^{(n)}, \qquad (i,j)\in\partial\Omega \]

Let’s implement this function in code:

def heat_diffusion_step_fn(u, r=0.001):
    # One explicit finite-difference step of the 2D heat equation
    u_new = u.copy()
    u_new[1:-1,1:-1] = (
        u[1:-1,1:-1] +
        r * (u[2:,1:-1] + u[:-2,1:-1] + u[1:-1,2:] + u[1:-1,:-2] - 4 * u[1:-1,1:-1])
    )
    return u_new

def heat_diffusion(time, m=50, n=50, alpha=0.01, dt=0.1, dx=1.0):

    # Discretization
    steps = int(time / dt)

    # Stability condition (warned about but not enforced)
    r = alpha * dt / dx**2
    if r > 0.25:
        print("Warning: Scheme may be unstable (r > 0.25)")

    # Initial condition: hot spot in center
    u = np.zeros((m, n))
    u[m//2, n//2] = 100

    # Time stepping
    for _ in range(steps):
        u = heat_diffusion_step_fn(u, r=r)

    return u

Note that the code also defines an initial condition.
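For instance, a quick hypothetical call to simulate 100 time units on the default \(50 \times 50\) grid:

u_final = heat_diffusion(100.0)
print(u_final.shape)  # (50, 50)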

DMD

Now that we have a way to produce data, we can run dmd on it:

def collect_snapshots(step_fn, z0, num_steps):
    # Build the snapshot matrices X (states) and Y (one-step-ahead states)
    m = z0.size
    X = np.zeros((m, num_steps))
    Y = np.zeros((m, num_steps))

    state = z0.copy()
    for i in range(num_steps):
        X[:, i] = state.flatten()
        next_state = step_fn(state)
        Y[:, i] = next_state.flatten()
        state = next_state

    return X, Y

z0 = np.zeros((10, 10))
z0[10//2, 10//2] = 100
X, Y = collect_snapshots(heat_diffusion_step_fn, z0, 1000)
modes, eigenvalues, amplitudes, reconstruction = dmd(X, Y, r=5)
print("DMD eigenvalues:", eigenvalues)

The result we get is

DMD eigenvalues: [0.99269705 0.99490437 0.99682663 0.99972148 0.99862672]

All eigenvalues are close to, but less than, one, suggesting slow decay (as expected for diffusion).
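We can also sanity-check the rank-5 fit by measuring the relative error of the reconstructed trajectory (a quick check; the exact value will depend on the run):

# Modes and eigenvalues may be complex; take the real part for this real-valued system
err = np.linalg.norm(X - reconstruction.real) / np.linalg.norm(X)
print(f"Relative reconstruction error: {err:.3e}")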

EDMD

For EDMD, we also need to create a dictionary of observables. For example, a polynomial basis:

from itertools import combinations_with_replacement

def polynomial_basis(x_dim, degree=2):
    basis = []
    symbolic = []

    # Constant term
    basis.append(lambda x: np.ones((1, x.shape[1])))
    symbolic.append("1")

    # Monomials of each degree up to `degree`
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(x_dim), d):
            basis.append(lambda x, combo=combo: np.prod(np.array([x[i:i+1, :] for i in combo]), axis=0))
            term = "*".join([f"x_{i}" for i in combo])
            symbolic.append(term)

    return basis, symbolic

Now we run the code the same way:

X, Y = collect_snapshots(heat_diffusion_step_fn, z0, 150)
basis, symbolic = polynomial_basis(X.shape[0], degree=1)
K, eig_vals, eig_vecs, modes, eig_vecs_K = edmd(X, Y, basis, r=5) 
print("EDMD eigenvalues:", eig_vals)

The eigenvalues we receive are:

EDMD eigenvalues: [0.99962953 0.99817247 0.99613991 0.99244467 0.99410837] 
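Since a degree-1 polynomial dictionary is just the trivial dictionary plus a constant observable, these are (as expected) close to the DMD eigenvalues above.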

Conclusion

In the next post in this series, I will explore symbolic methods for fitting dynamical systems.

Footnotes

  1. As part of general exploration of model discovery and control.↩︎

  2. Typically, we also assume that the data are evenly-spaced in time.↩︎

  3. By the Eckart–Young–Mirsky theorem. See https://arxiv.org/abs/1704.02343, and the Brunton and Kutz book *Data-Driven Science and Engineering*.↩︎

  4. Similar matrices share the same eigenvalues, and their eigenvectors are related by the change of basis. We work with \(\tilde{A}\) because it is much smaller than \(A\) and thus easier to compute with.↩︎

  5. I'm glossing over this step; we can use out-of-the-box eigensolvers for it.↩︎

  6. This is the "exact" DMD method (the standard one). There's also the older "projected" DMD, which uses the relation \(\Phi = UW\). Exact DMD is generally more reliable: because we worked backwards from the eigenvectors \(W\) of the projected space, we are guaranteed that the modes in \(\Phi\) correspond one-to-one to eigenvectors \(W\). Projected DMD solves the same eigenproblem but keeps the modes in the reduced subspace. Because it discards directions orthogonal to \(U\), a projected mode can omit components of the true mode that lie outside the truncated basis, which may lead to spurious growth/decay when you reconstruct the dynamics. See https://arxiv.org/abs/1312.0041.↩︎