import numpy as np


def finite_difference(X, dt):
    n_vars, n_samples = X.shape

    # Use central differences where possible, forward/backward at boundaries
    dXdt = np.zeros((n_samples, n_vars))

    # Forward difference at first point
    dXdt[0, :] = (X[:, 1] - X[:, 0]) / dt

    # Central differences for interior points
    for i in range(1, n_samples - 1):
        dXdt[i, :] = (X[:, i + 1] - X[:, i - 1]) / (2 * dt)

    # Backward difference at last point
    dXdt[-1, :] = (X[:, -1] - X[:, -2]) / dt

    return dXdt
Introduction
In the last post I looked at DMD and EDMD, two methods for analyzing dynamical systems. Both methods look at pairs of snapshots \((x_k, x_{k+1})\) and fit a linear update rule that best predicts the next snapshot. By inspecting that linear map’s eigenvalues and eigenvectors we can learn which patterns dominate and how fast they grow or decay.
Unfortunately, both of these methods rely on a set of features you have chosen in advance. Without the right features you might miss important physics; with too many, the model becomes unwieldy. Additionally, while these spectral methods provide useful insight, their output is not always easy to interpret.
What we need is a sparse method: one that lets us test a large number of candidate features and keeps only those relevant to the dynamics, ideally recovering an interpretable set of governing equations.
Sparse Identification of Nonlinear Dynamical Systems (SINDy)
SINDy creates a library of candidate basis functions for the eigenfunctions (like EDMD), then does an L1-regularized[1] regression to determine which ones to use. There are two formulations: discrete-time and continuous-time.
Discrete Time
This setup is the same as in DMD and EDMD. We have a discrete-time dynamical system of form:
\[ x_{t+1} = F(x_t) \]
We have our data stream, which we use to create our matrix \(X\):
\[ \mathbf{X} = \begin{bmatrix} \mathbf{x}^T(t_0) \\ \mathbf{x}^T(t_1) \\ \vdots \\ \mathbf{x}^T(t_{m-1}) \end{bmatrix} = \begin{bmatrix} x_1(t_0) & x_2(t_0) & \cdots & x_n(t_0) \\ x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(t_{m-1}) & x_2(t_{m-1}) & \cdots & x_n(t_{m-1}) \end{bmatrix} \]
and its time-shifted counterpart \(Y\):
\[ \mathbf{Y} = \begin{bmatrix} \mathbf{x}^T(t_1) \\ \mathbf{x}^T(t_2) \\ \vdots \\ \mathbf{x}^T(t_m) \end{bmatrix} = \begin{bmatrix} x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ x_1(t_2) & x_2(t_2) & \cdots & x_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \end{bmatrix} \]
Now, we need some “library matrix” of relevant basis functions applied to the data (similar to EDMD). This may look something like:
\[ \mathbf{\Theta}(\mathbf{X}) := [\theta_1(X), \theta_2(X), \dots, \theta_L(X)] \]
This matrix can be quite large. For example, if the basis functions are drawn from combinations of the monomials \(x_1\), \(x_2\), and \(x_3\) (up to quadratic), we would have something like:
\[ \Theta(X) = \begin{bmatrix} | & | & | & | & | & | & & | \\ 1 & x_1 & x_2 & x_3 & x_1^2 & x_1x_2 & \cdots & x_3^2 \\ | & | & | & | & | & | & & | \end{bmatrix} \]
We are looking for a sparse matrix of coefficients \(\mathbf{\Xi}\), with one column of weights per state variable:
\[ \mathbf{\Xi} = \begin{bmatrix} | & | & & | \\ \xi_1 & \xi_2 & \cdots & \xi_n \\ | & | & & | \end{bmatrix} \]
This leads us to the SINDy equation:
\[ \mathbf{Y} = \mathbf{\Theta}(\mathbf{X}) \mathbf{\Xi} \]
Each column \(\xi_i\) holds the weights on the library functions for the \(i\)-th state variable. To find the \(\xi_i\), we solve a sparse \(L1\)-regularized regression:
\[ \mathbf{\Xi}_{opt} = \operatorname*{arg\,min}_{\mathbf{\Xi}} \|\mathbf{Y} - \mathbf{\Theta}(\mathbf{X})\mathbf{\Xi} \|_{2} + \mu\|\mathbf{\Xi}\|_{1} \]
where \(\mu\) is a parameter controlling the strength of the regularization.
To see the mapping at a single time-step, take the \(k\)-th row of \(\Theta(X)\):
\[ \boxed{\;x^{(k+1)\top} = \Theta\!\bigl(x^{(k)}\bigr)\,\Xi\;}, \quad k=0,\dots,m-1, \]
where
\[ \Theta(x^{(k)})=[\theta_1(x^{(k)}),\dots,\theta_L(x^{(k)})]\in\mathbb R^{1\times L} \]
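As a concrete sketch of the discrete-time regression (my own illustration; the discrete_sindy_lasso name and the inline quadratic library are hypothetical, and the implementation later in this post uses sequentially thresholded least squares rather than an explicit \(L1\) solver), one could solve for \(\mathbf{\Xi}\) with scikit-learn's Lasso:

import numpy as np
from sklearn.linear_model import Lasso

def discrete_sindy_lasso(X, Y, mu=0.01):
    # X, Y are (m, n) snapshot matrices as defined above: rows are time samples,
    # columns are state variables.
    m, n = X.shape

    # A simple quadratic library: [1, x_i, x_i * x_j], built column-wise from X.
    cols = [np.ones(m)] + [X[:, i] for i in range(n)]
    cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    Theta = np.column_stack(cols)

    # One L1-penalized fit per state variable; small coefficients are driven to zero.
    # alpha plays the role of mu (up to sklearn's internal scaling of the objective).
    Xi = np.column_stack([
        Lasso(alpha=mu, fit_intercept=False, max_iter=10000).fit(Theta, Y[:, k]).coef_
        for k in range(n)
    ])
    return Xi  # shape (n_library_terms, n), so Y is approximately Theta @ Xi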
Continuous Time
A more typical formulation is to assume a dynamical system of the form
\[ \dot{x} = \frac{dx}{dt} = f(x(t)) \]
where \(x\) is the state (possibly a vector) and \(t\) is time. We have moved from discrete time to continuous time: we now seek to learn the vector field \(f\), which relates the current state to its rate of change rather than to the next state. Luckily, as described in the previous post, the Koopman eigenfunctions satisfy:
\[ \frac{d}{dt}\phi(x) = K\phi(x) = \lambda \phi(x) \]
By the chain rule, we also have
\[ \frac{d}{dt}\phi(x) = \nabla \phi(x) \cdot \dot{x} = \nabla \phi(x) \cdot f(x) \]
Combining the two, we have
\[ \nabla \phi(x) \cdot f(x) = \lambda \phi(x) \]
So we can approximate the eigenfunctions via regression[2]. To do so, approximate \(f(x)\) by a sparse weighted sum of dictionary elements:
\[ f(x)\;=\;\sum_{j=1}^{L}\xi_{j}\,\theta_{j}(x)\;=\;\mathbf{\Theta}(x)\,\mathbf{\Xi} \]
This gives:
\[ \nabla\phi(x)\,\cdot\,\mathbf{\Theta}(x)\,\mathbf{\Xi} \;=\;\lambda\,\phi(x) \]
Evaluating at the sample states:
\[ \nabla\phi\!\bigl(\mathbf{x}^{(i)}\bigr)\, \cdot\,\mathbf{\Theta}\!\bigl(\mathbf{x}^{(i)}\bigr)\, \mathbf{\Xi} \;=\; \lambda\, \phi\!\bigl(\mathbf{x}^{(i)}\bigr), \quad i=1,\dots,m \]
In practice we regress on \(f\) directly: evaluating \(\dot{x} = f(x) \approx \mathbf{\Theta}(x)\mathbf{\Xi}\) at every sample and stacking the rows gives
\[ \mathbf{\dot X} \;=\; \mathbf{\Theta}(\mathbf{X})\,\mathbf{\Xi}, \quad\text{where} \; \mathbf{\dot X} = \begin{bmatrix} \dot{\mathbf x}^{(1)\!\top}\\ \dot{\mathbf x}^{(2)\!\top}\\ \vdots\\ \dot{\mathbf x}^{(m)\!\top} \end{bmatrix},\; \mathbf{\Theta}(\mathbf{X}) = \begin{bmatrix} \mathbf{\Theta}(x^{(1)})\\ \mathbf{\Theta}(x^{(2)})\\ \vdots\\ \mathbf{\Theta}(x^{(m)}) \end{bmatrix} \]
and so we can solve the continuous-time problem with the following regression
\[ \mathbf{\Xi}_{opt} \;=\; \operatorname*{arg\,min}_{\mathbf{\Xi}} \Bigl\| \mathbf{\dot X} - \mathbf{\Theta}(\mathbf{X})\,\mathbf{\Xi} \Bigr\|_{2} \;+\; \mu\,\|\mathbf{\Xi}\|_{1} \]
Conveniently, this is exactly the same form as the discrete-time regression, except that here we are approximating:
\[ \frac{dx}{dt} = f(x(t)) = \mathbf{\Theta}(x)\mathbf{\Xi} \]
Implementation
To implement SINDy, we need a function to build the library matrix and a routine to handle the sparse regression. We also need a finite-difference method to estimate the time derivatives.
Finite Differences
We will use finite differences[3] to approximate the time derivative by dividing small state increments by the timestep \(\Delta t\).
At an interior point \(t_i\) we use the second-order central stencil
\[
\dot x(t_i)\;\approx\;\frac{x(t_{i+1}) - x(t_{i-1})}{2\,\Delta t},
\]
which cancels the first-order truncation error, yielding \(\mathcal{O}(\Delta t^{2})\) accuracy.
At the boundaries we fall back to the first-order forward and backward stencils: \[ \dot x(t_0)\;\approx\;\frac{x(t_1)-x(t_0)}{\Delta t}, \qquad \dot x(t_{m-1})\;\approx\;\frac{x(t_{m-1})-x(t_{m-2})}{\Delta t}. \]
The finite_difference helper at the top of this post implements exactly this scheme: it accepts a state matrix of shape \((n_{\text{vars}}, n_{\text{samples}})\) and returns the derivative matrix with shape \((n_{\text{samples}}, n_{\text{vars}})\), so that rows correspond to time snapshots for the subsequent regression.
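As a quick sanity check (my own sketch, not part of the pipeline), NumPy's np.gradient with its default first-order edge handling applies the same central-plus-one-sided scheme, so the two should agree to within floating-point error:

dt = 0.01
t = np.arange(0, 1, dt)
X_test = np.vstack([np.sin(t), np.cos(t)])        # toy state matrix, shape (n_vars, n_samples)

dXdt_test = finite_difference(X_test, dt)         # shape (n_samples, n_vars)
dXdt_np = np.gradient(X_test, dt, axis=1, edge_order=1).T

print(np.max(np.abs(dXdt_test - dXdt_np)))        # expect ~0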
Library Matrix
Next we build the candidate feature matrix \(\Theta\) by stacking a constant term, all monomials up to poly_order, and optionally sine/cosine transforms of each state variable. The function returns that matrix along with a parallel descriptions list, so the sparse coefficients can later be mapped back to human-readable terms.
from itertools import combinations_with_replacement


def sindy_library(X, poly_order=3, include_sine=False, include_cosine=False):
    n_vars, n_samples = X.shape

    # Start with constant term
    library_functions = [np.ones(n_samples)]
    descriptions = ['1']

    # Add polynomial terms
    for order in range(1, poly_order + 1):
        for combo in combinations_with_replacement(range(n_vars), order):
            if order == 1:
                var_idx = combo[0]
                library_functions.append(X[var_idx, :])
                descriptions.append(f'x_{var_idx}')
            else:
                term = np.ones(n_samples)
                term_desc = []
                for var_idx in combo:
                    term *= X[var_idx, :]
                    term_desc.append(f'x_{var_idx}')
                library_functions.append(term)
                descriptions.append('*'.join(term_desc))

    # Add trigonometric terms if requested
    if include_sine:
        for i in range(n_vars):
            library_functions.append(np.sin(X[i, :]))
            descriptions.append(f'sin(x_{i})')

    if include_cosine:
        for i in range(n_vars):
            library_functions.append(np.cos(X[i, :]))
            descriptions.append(f'cos(x_{i})')

    Theta = np.column_stack(library_functions)

    return Theta, descriptions
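For example (a small usage sketch with made-up data), with two state variables and poly_order=2 the descriptions list makes the column ordering explicit:

X_demo = np.random.rand(2, 100)                   # 2 variables, 100 samples
Theta_demo, desc_demo = sindy_library(X_demo, poly_order=2)

print(desc_demo)          # ['1', 'x_0', 'x_1', 'x_0*x_0', 'x_0*x_1', 'x_1*x_1']
print(Theta_demo.shape)   # (100, 6)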
Sequential Threshold Least Squares
We will use Sequential Threshold Least Squares (STLS) to obtain the sparse coefficient matrix \(\Xi\); it is the simple proxy for the \(L1\)-regularized regression used in the original SINDy paper.
At each iteration \(k\):
1. Threshold: set every entry with \(|\xi_{ij}^{(k)}| < \lambda_{reg}\) to zero, forcing small terms out of the model[4].
2. Refit: for each state component \(j\), solve a new least-squares problem using only the surviving (non-zero) columns of \(\Theta\) to get updated weights.
3. Repeat until convergence, or for at most max_iter cycles.
The final \(\Xi\) contains only the terms that remain large after repeated shrink-and-refit, giving a sparse governing equation.
def sequential_threshold_least_squares(Theta, dXdt, lambda_reg=0.1, max_iter=10):
    n_states = dXdt.shape[1]

    # Initialize coefficient matrix
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]

    # Iterative thresholding
    for iteration in range(max_iter):
        # Find small coefficients to remove
        small_inds = np.abs(Xi) < lambda_reg

        # Set small coefficients to zero
        Xi[small_inds] = 0

        # Identify active (non-zero) coefficients for each state
        for i in range(n_states):
            big_inds = ~small_inds[:, i]
            if np.any(big_inds):
                # Recompute non-zero coefficients using least squares
                Xi[big_inds, i] = np.linalg.lstsq(
                    Theta[:, big_inds], dXdt[:, i], rcond=None
                )[0]
            else:
                Xi[:, i] = 0

    return Xi
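A quick way to convince yourself the shrink-and-refit loop works (again, a made-up sketch) is to recover a known sparse coefficient vector from noiseless synthetic data:

rng = np.random.default_rng(0)

Theta_demo = rng.normal(size=(200, 5))            # 5 candidate library terms
Xi_true = np.array([[0.0], [2.0], [0.0], [-1.5], [0.0]])
dXdt_demo = Theta_demo @ Xi_true                  # noiseless "derivatives"

Xi_est = sequential_threshold_least_squares(Theta_demo, dXdt_demo, lambda_reg=0.5)
print(Xi_est.ravel())                             # approximately [0, 2, 0, -1.5, 0]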
print_equations
I’ll add one bonus helper function, which will be useful later.
def print_equations(Xi, descriptions, feature_names=None):
    n_functions, n_features = Xi.shape

    if feature_names is None:
        feature_names = [f'x_{i}' for i in range(n_features)]

    for i in range(n_features):
        # Build equation string
        terms = []
        for j in range(n_functions):
            coef = Xi[j, i]
            if abs(coef) > 1e-10:  # Only include non-zero terms
                if abs(coef - 1.0) < 1e-10:
                    terms.append(f"{descriptions[j]}")
                elif abs(coef + 1.0) < 1e-10:
                    terms.append(f"-{descriptions[j]}")
                else:
                    terms.append(f"{coef:.6f}*{descriptions[j]}")

        if terms:
            equation = " + ".join(terms).replace(" + -", " - ")
            print(f"d{feature_names[i]}/dt = {equation}")
        else:
            print(f"d{feature_names[i]}/dt = 0")

    print()
Integration
Putting the pieces together:
def sindy(X, dXdt, poly_order=3, lambda_reg=0.1, include_sine=False,
          include_cosine=False, max_iter=10):
    # Build library of candidate functions
    Theta, descriptions = sindy_library(
        X,
        poly_order=poly_order,
        include_sine=include_sine,
        include_cosine=include_cosine
    )

    # Sparse regression using sequential thresholded least squares
    Xi = sequential_threshold_least_squares(Theta, dXdt, lambda_reg, max_iter)

    return Xi, descriptions
For a production version you may want to factor the library construction out of the sindy function, as sketched below.
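One way that refactor might look (a sketch; the sindy_from_library name is hypothetical): build \(\Theta\) once with sindy_library, then reuse it while sweeping the threshold.

def sindy_from_library(Theta, dXdt, descriptions, lambda_reg=0.1, max_iter=10):
    # Regression only; assumes Theta and descriptions were built separately by sindy_library.
    Xi = sequential_threshold_least_squares(Theta, dXdt, lambda_reg, max_iter)
    return Xi, descriptions

# Example: sweep the threshold without rebuilding the library each time.
# Theta, descriptions = sindy_library(X, poly_order=3)
# for lam in [0.01, 0.05, 0.1, 0.5]:
#     Xi, _ = sindy_from_library(Theta, dXdt, descriptions, lambda_reg=lam)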
Experiments
Let’s generate data from a few known systems and see whether SINDy can recover their governing equations.
Simple Harmonic Oscillator
Definition
The simple harmonic oscillator is described by the following second-order ODE:
\[ x'' + \omega^2 x = 0 \]
If we let: \[ \begin{align} x_0 &:= x \\ x_1 &:= \frac{dx}{dt} \end{align} \]
We can then derive
\[ \begin{align} \frac{dx_0}{dt} &= \frac{dx}{dt} = x_1 \\ \frac{dx_1}{dt} &= - \omega^2 x_0 \end{align} \]
So it has an alternative formulation as a set of coupled first-order ODEs.
Approximation
The simple harmonic oscillator has a well-known analytic solution (taking initial conditions \(x(0)=1\), \(\dot x(0)=0\)): \[ x = \cos(\omega t) \]
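As a quick check that the coupled first-order form really reproduces this solution (a sketch using SciPy, which the rest of the post does not require), we can integrate it from \(x(0)=1\), \(\dot x(0)=0\) and compare against \(\cos(\omega t)\):

from scipy.integrate import solve_ivp

omega = 2.0

def sho_rhs(t, state):
    x0, x1 = state
    return [x1, -omega**2 * x0]

t_eval = np.arange(0, 10, 0.01)
sol = solve_ivp(sho_rhs, (0, 10), [1.0, 0.0], t_eval=t_eval, rtol=1e-9, atol=1e-9)

print(np.max(np.abs(sol.y[0] - np.cos(omega * t_eval))))   # should be very small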
Let us generate data directly from this solution:
def generate_simple_harmonic_oscillator_data(omega=2.0, t_final=10, dt=0.01):
    t = np.arange(0, t_final, dt)

    # Analytical solution
    x = np.cos(omega * t)
    xdot = -omega * np.sin(omega * t)

    X_train = np.vstack([x, xdot])

    return X_train
This will give us data for the harmonic oscillator[5].
Now we run SINDy on this data:
dt = 0.01  # matches the default timestep in the generator above
X = generate_simple_harmonic_oscillator_data()
dXdt = finite_difference(X, dt)
Xi, desc = sindy(X, dXdt, poly_order=1, lambda_reg=0.1)
print_equations(Xi, desc, feature_names=['x_0', 'x_1'])
We discover the following equations:
dx_0/dt = 0.999925*x_1
dx_1/dt = -3.999764*x_0
Which is close to the true generating equations:
\[ \begin{align} \frac{dx_0}{dt} &= x_1 \\ \frac{dx_1}{dt} &= -4 x_0 \end{align} \]
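As an optional follow-up (my own sketch, not part of the original workflow), we can integrate the discovered model forward with a crude forward-Euler step and compare it against the training trajectory; expect qualitative rather than exact agreement, since Euler's own discretization error accumulates over time.

def simulate_discovered_model(x0, Xi, dt, n_steps, poly_order=1):
    # Integrate dx/dt = Theta(x) @ Xi with forward Euler (rough, for eyeballing only).
    traj = np.zeros((len(x0), n_steps))
    traj[:, 0] = x0
    for k in range(1, n_steps):
        Theta_k, _ = sindy_library(traj[:, k - 1].reshape(-1, 1), poly_order=poly_order)
        traj[:, k] = traj[:, k - 1] + dt * (Theta_k @ Xi).ravel()
    return traj

X_hat = simulate_discovered_model(X[:, 0], Xi, dt, n_steps=X.shape[1])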
Lorenz System
Definition
The Lorenz-63 model is a three-dimensional ODE derived from a truncated Fourier expansion of the Boussinesq equations for thermal convection. In its nondimensional form the state \(\mathbf{x}=(x,y,z)^{\!\top}\) evolves as
\[ \begin{aligned} \dot x &= \sigma \,(y - x), \\ \dot y &= x \,(\rho - z) - y, \\ \dot z &= x\,y - \beta\,z, \end{aligned} \]
Let us recover these equations from data.
Approximation
def generate_lorenz_data(initial_conditions, sigma=10, rho=28, beta=8/3, t_final=10, dt=0.01):
    def lorenz_rhs(state, sigma=sigma, rho=rho, beta=beta):
        x, y, z = state
        return np.array([
            sigma * (y - x),
            x * (rho - z) - y,
            x * y - beta * z
        ])

    # Generate training data
    t_train = np.arange(0, t_final, dt)
    n_steps = len(t_train)

    X_train = np.zeros((3, n_steps))
    X_train[:, 0] = initial_conditions

    # Integrate with classical fourth-order Runge-Kutta
    for i in range(1, n_steps):
        k1 = lorenz_rhs(X_train[:, i-1])
        k2 = lorenz_rhs(X_train[:, i-1] + dt/2 * k1)
        k3 = lorenz_rhs(X_train[:, i-1] + dt/2 * k2)
        k4 = lorenz_rhs(X_train[:, i-1] + dt * k3)
        X_train[:, i] = X_train[:, i-1] + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

    return X_train
Following the same steps:
initial_conditions = [-8, 8, 27]
X = generate_lorenz_data(initial_conditions)

dXdt = finite_difference(X, dt)
Xi, desc = sindy(X, dXdt, poly_order=2, lambda_reg=0.05)

print_equations(Xi, desc, feature_names=['x', 'y', 'z'])
We discover the following equations:
dx/dt = -9.971816*x_0 + 9.972789*x_1
dy/dt = 27.823397*x_0 - 0.970096*x_1 - 0.994849*x_0*x_2
dz/dt = -2.658203*x_2 + 0.996862*x_0*x_1
Which is once again pretty close to the actual generating equations (rewritten): \[ \begin{align} \frac{dx_0}{dt} &= -10x_0 + 10x_1 \\ \frac{dx_1}{dt} &= 28x_0 - x_1 - x_0x_2 \\ \frac{dx_2}{dt} &= -\frac{8}{3}x_2 + x_0x_1 \end{align} \]
Conclusion
We used SINDy to recover governing equations for two toy problems.
Further Reading
Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control — Brunton & Kutz's textbook.
Discovering governing equations from data by sparse identification of nonlinear dynamical systems — the original SINDy paper by Brunton, Proctor & Kutz.
Data-driven discovery of coordinates and governing equations — Champion et al.
PySINDy – Sparse Identification of Nonlinear Dynamics in Python — open-source implementation.
Footnotes
[1] Also known as “LASSO” in the ML literature. I’ve seen some sources on SINDy do an L0 regularization here instead; there are many variants of SINDy. You’ll want to follow the usual machine learning best practices (hold-out data, etc.) to fit the best model.
[2] This assumes the dynamics are both continuous and differentiable. Some systems are more easily represented in the continuous-time format than the discrete-time one. In the Brunton & Kutz book they also give an alternative method using Laurent series, and some extensions.
[3] In production, the error from finite differences can be too large and may cause SINDy to blow up. Alternatives like smoothed finite differences are available.
[4] Despite the similar notation, \(\lambda_{reg}\) is unrelated to the Koopman eigenvalues.
[5] This is toy data. In real life, you should probably whiten, non-dimensionalize, and otherwise preprocess your data.