Hamiltonian Mechanics
This post introduces Hamiltonian mechanics using the formalism of differential geometry, culminating in Noether's theorem, which describes the correspondence between certain symmetries and conservation laws.
There are three standard approaches to classical mechanics, each named after its originator:
- Newtonian mechanics - after Sir Isaac Newton,
- Lagrangian mechanics - after Joseph-Louis Lagrange,
- Hamiltonian mechanics - after Sir William Rowan Hamilton.
Newtonian mechanics is based on the concept of force, i.e., an action which causes an object's velocity to change (that is, which causes acceleration), and is expressed via Newton's laws of motion.
Lagrangian mechanics is based on the principle of stationary action (or least action) and seeks to compute a stationary point (typically a minimum) of an action functional which typically expresses the total difference between kinetic and potential energy of a system.
Hamiltonian mechanics is based on conservation (of the so-called Hamiltonian function, which typically represents total energy and thus expresses the conservation of energy).
The three approaches are equivalent insofar as they yield the same equations of motion, yet each has certain advantages in various contexts.
To compare and contrast these approaches, we will derive the one-dimensional equation of motion for a simple harmonic oscillator using each of them.
Newtonian Mechanics
We will first give an example of the formulation of the equation of motion of a simple harmonic oscillator using Newtonian mechanics. For more information, see this post.
Newton's second law of motion states that the force acting on an object is precisely the change in its momentum with respect to time. Writing \(x\) for the (one-dimensional, e.g. vertical) position of an object (treated as a point mass) as a function of time \(t \in \mathbb{R}\) and \(m\) for the (inertial) mass of the object, the momentum is thus expressed as a function of time \(p\) as
\[p = m \cdot \dot{x}\]
where the velocity \(v = \dot{x}\) is
\[\dot{x} = \frac{dx}{dt}.\]
Newton's second law is thus codified as
\begin{align*}F &= \dot{p} \\&= \frac{d}{dt}\left(m \cdot \frac{dx}{dt}\right) \\&= m \cdot \frac{d}{dt}\left(\frac{dx}{dt}\right) \\&= m \cdot \frac{d^2x}{dt^2} \\&= m \cdot \ddot{x} \\&= ma\end{align*}
where \(a = \ddot{x}\) is the acceleration of the object (i.e., the rate of change of its velocity with respect to time).
Consider the example of an object suspended from a horizontal ceiling by a spring, as in the following figure.

Hooke's law indicates that the restoring force which returns the spring to its equilibrium position is directly proportional to, and opposite in direction to, the displacement (change in vertical position) of the object from the equilibrium position, which is expressed mathematically for some positive constant of proportionality \(k \in \mathbb{R}\) (the spring constant) as
\[F_r = -k \cdot x.\]
Assuming for the sake of simplicity that this is the only force acting upon the object (and neglecting the mass of the spring itself, etc.), Newton's second law of motion thus implies that
\[F = F_r,\]
i.e., that
\[m\ddot{x} = -k x.\]
This is equivalent to the second-order homogeneous ordinary differential equation
\[m \ddot{x} + k x = 0\]
which has solutions
\[x(t) = c_1 e^{i \omega t} + c_2 e^{-i \omega t}\]
where \(\omega = \sqrt{k/m}\) is the angular frequency and \(c_1, c_2 \in \mathbb{C}\).
Using Euler's formula and trigonometric identities, this is often expressed for constants \(a,b \in \mathbb{C}\) as
\[x(t) = a \cos(\omega t) + b \sin(\omega t).\]
The constants can be computed using initial conditions, for instance, \(x(0) = d\) for some initial displacement \(d \in \mathbb{R}\) and \(\dot{x}(0) = 0\). These particular initial conditions imply that \(a = d\) and \(b = 0\), so that the equation of motion is
\[x(t) = d \cdot \cos(\omega t).\]
Once \(k\) is determined, this completely specifies the equation of motion. Thus, the motion is that of an infinitely oscillating sinusoidal wave of fixed amplitude.
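As a quick numerical sanity check, the following Python sketch evaluates the residual \(m\ddot{x} + kx\) for the closed-form solution using a central second difference. The function names and the values chosen for \(m\), \(k\), and \(d\) are illustrative assumptions, not part of the derivation above.

```python
import math

def oscillator_position(t, d, k, m):
    """Closed-form solution x(t) = d * cos(omega * t) with omega = sqrt(k / m)."""
    omega = math.sqrt(k / m)
    return d * math.cos(omega * t)

def ode_residual(t, d, k, m, h=1e-4):
    """Approximate m * x'' + k * x at time t with a central second difference."""
    x_minus = oscillator_position(t - h, d, k, m)
    x_mid = oscillator_position(t, d, k, m)
    x_plus = oscillator_position(t + h, d, k, m)
    x_ddot = (x_plus - 2.0 * x_mid + x_minus) / (h * h)
    return m * x_ddot + k * x_mid

# Illustrative values (assumptions): m = 2, k = 8, d = 0.5, so omega = 2.
m, k, d = 2.0, 8.0, 0.5
residuals = [ode_residual(0.1 * n, d, k, m) for n in range(50)]
max_residual = max(abs(r) for r in residuals)
print("largest residual of m x'' + k x:", max_residual)
```

The residual is not exactly zero because of the finite-difference approximation and floating-point rounding, but it should be small compared with the individual terms \(m\ddot{x}\) and \(kx\).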
Lagrangian Mechanics
Lagrangian mechanics is based on the stationary action principle (a.k.a. the least action principle) which indicates that the equation of motion is derived as a stationary point (critical point) of a so-called action functional. In this approach, one specifies a so-called Lagrangian function \(\mathcal{L}\) which is typically expressed as the difference between the kinetic energy \(T\) and potential energy \(V\) of a system. Often, the Lagrangian is a function of time, position, and velocity, and thus
\[\mathcal{L}(t, x(t), \dot{x}(t)) = T(t, x(t), \dot{x}(t)) - V(t, x(t), \dot{x}(t)).\]
In other words, \(\mathcal{L} : \mathbb{R}^3 \rightarrow \mathbb{R}\) is the map \((y_1, y_2, y_3) \mapsto \mathcal{L}(y_1, y_2, y_3)\) and \(\mathcal{L}(t, x(t), \dot{x}(t)) : \mathbb{R} \rightarrow \mathbb{R}\) denotes the composite map \(t \mapsto (t, x(t), \dot{x}(t)) \mapsto \mathcal{L}(t, x(t), \dot{x}(t))\).
The action functional is typically expressed as the integral of the Lagrangian over some time interval \([t_0, t_1]\), regarded as a functional of the curve \(x\):
\[\mathcal{S}[x] = \int_{t_0}^{t_1} \mathcal{L}(t, x(t), \dot{x}(t))~dt.\]
The equation of motion \(x\) is then a stationary point of the action functional, i.e.,
\[\delta \mathcal{S}_x = 0\]
where \(\delta \mathcal{S}_x\) is the total differential (i.e., the Fréchet differential) of the functional \(\mathcal{S}\) at the point \(x\) (see this post and this post for more information).
By definition, the functional derivative of the functional \(\mathcal{S}\) with respect to the function \(x\) is a function \((\delta \mathcal{S}/\delta x)\) such that, for all \(\varphi \in C^{1}(\mathbb{R})\),
\[\delta \mathcal{S}_x[\varphi] = \int_{t_0}^{t_1} \frac{\delta \mathcal{S}}{\delta x}(t) \cdot \varphi(t)~dt.\]
The fundamental theorem of the calculus of variations indicates that, under appropriate boundary conditions (i.e., when restricted to those \(\varphi\) such that \(\varphi(t_0) = 0\) and \(\varphi(t_1) = 0\)), this expression is \(0\) precisely when
\[\frac{\delta \mathcal{S}}{\delta x} = 0\]
and the Euler-Lagrange formula indicates that, for functionals of the form of \(\mathcal{S}\),
\[\frac{\delta \mathcal{S}}{\delta x}(t) = \frac{\partial \mathcal{L}}{\partial y_2}(t, x(t), \dot{x}(t)) - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial y_3}(t, x(t), \dot{x}(t)),\]
and thus we seek a solution \(x\) to the differential equation
\[\frac{\partial \mathcal{L}}{\partial y_2}(t, x(t), \dot{x}(t)) - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial y_3}(t, x(t), \dot{x}(t)) = 0.\]
Thus, the Lagrangian formulation of mechanics may be interpreted as one which derives an equation of motion which "optimizes" the total difference between kinetic and potential energy of a system, where the optimum is a critical point (which is typically a minimum, though it is important to note that critical points do not necessarily represent extrema).
Returning to the example of a simple harmonic oscillator in one dimension, the kinetic energy is
\[T(t, x(t), \dot{x}(t)) = \frac{1}{2} m \dot{x}^2(t)\]
and the potential energy is
\[V(t, x(t), \dot{x}(t)) = \frac{1}{2} k x^2(t)\]
so that the Lagrangian function is
\[\mathcal{L}(t, x(t), \dot{x}(t)) = \frac{1}{2} m \dot{x}^2(t) - \frac{1}{2} k x^2(t).\]
Thus, since \(\mathcal{L}\) is the function
\[\mathcal{L}(y_1, y_2, y_3) = \frac{1}{2} m y_3^2 - \frac{1}{2} k y_2^2\]
we compute
\[\frac{\partial \mathcal{L}}{\partial y_2}(t, x(t), \dot{x}(t)) = -kx(t)\]
and
\[\frac{\partial \mathcal{L}}{\partial y_3}(t, x(t), \dot{x}(t)) = m \dot{x}\]
so that
\[\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial y_3}(t, x(t), \dot{x}(t)) = m \ddot{x}\]
and hence the Euler-Lagrange equation is
\[-kx(t) - m \ddot{x}(t) = 0.\]
Thus, the equation of motion must satisfy the differential equation
\[m \ddot{x} + k x = 0,\]
which is precisely the same equation from the Newtonian example.
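The stationarity of the action can also be observed numerically. The sketch below discretizes the action on a uniform grid, differentiates it with respect to each interior node (endpoints held fixed, matching the boundary conditions on \(\varphi\)), and compares the exact solution with a straight line joining the same endpoints. The discretization, the helper names, and the parameter values are assumptions made for illustration.

```python
import math

def discrete_action(path, h, m, k):
    """Approximate S[x] = integral of (m/2) x'^2 - (k/2) x^2 dt on a uniform grid."""
    total = 0.0
    for n in range(len(path) - 1):
        velocity = (path[n + 1] - path[n]) / h      # forward difference for x'
        midpoint = 0.5 * (path[n + 1] + path[n])    # x at the middle of the step
        total += h * (0.5 * m * velocity ** 2 - 0.5 * k * midpoint ** 2)
    return total

def action_gradient(path, h, m, k, eps=1e-6):
    """Derivative of the discrete action in each interior node, endpoints held fixed."""
    gradient = []
    for j in range(1, len(path) - 1):
        raised = list(path)
        lowered = list(path)
        raised[j] += eps
        lowered[j] -= eps
        slope = (discrete_action(raised, h, m, k)
                 - discrete_action(lowered, h, m, k)) / (2 * eps)
        gradient.append(slope)
    return gradient

# Illustrative values (assumptions): m = 2, k = 8, d = 0.5 on the interval [0, 1].
m, k, d = 2.0, 8.0, 0.5
steps = 100
h = 1.0 / steps
omega = math.sqrt(k / m)

true_path = [d * math.cos(omega * n * h) for n in range(steps + 1)]
# A straight line with the same endpoints; it does not solve the equation of motion.
line_path = [true_path[0] + (true_path[-1] - true_path[0]) * n / steps
             for n in range(steps + 1)]

max_true = max(abs(g) for g in action_gradient(true_path, h, m, k))
max_line = max(abs(g) for g in action_gradient(line_path, h, m, k))
print("largest gradient entry, exact solution:", max_true)
print("largest gradient entry, straight line: ", max_line)
```

The gradient along the exact solution is only approximately zero because the grid introduces a discretization error, but it should be several orders of magnitude smaller than the gradient along the straight line.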
Although in this example the Newtonian formulation is much simpler, it is often not possible to express certain problems using the Newtonian formulation whereas the Lagrangian formalism provides a direct expression. For instance, see this post for an example using the Lagrangian formalism to derive the field equations of general relativity from the Einstein-Hilbert action.
Hamiltonian Mechanics
The Hamiltonian approach to classical mechanics replaces the second-order equation \(F = m \ddot{x}\) of Newtonian mechanics with a system of first order equations called Hamilton's equations defined as follows:
- \(\frac{dx}{dt}(t) = \frac{\partial \mathcal{H}}{\partial y_2}(x(t), p(t))\), and
- \(\frac{dp}{dt}(t) = -\frac{\partial \mathcal{H}}{\partial y_1}(x(t), p(t))\).
Here, \(\mathcal{H} : \mathbb{R}^2 \rightarrow \mathbb{R}\) is the Hamiltonian function. The Hamiltonian function is often defined as the total energy of the system, i.e., the sum of the kinetic energy \(T\) and potential energy \(V\) of the system
\[\mathcal{H}(x(t), p(t)) = T(x(t), p(t)) + V(x(t), p(t))\]
where \(\mathcal{H}\) is the map \((y_1, y_2) \mapsto \mathcal{H}(y_1, y_2)\) and \(x\) is the map \(t \mapsto x(t)\) representing position as usual and \(p\) is the map \(t \mapsto p(t)\) representing momentum.
If we define the kinetic energy as
\[T(x(t), p(t)) = \frac{1}{2m}(p(t))^2 = \frac{1}{2}m(\dot{x}(t))^2\]
and if we take the Hamiltonian function to be the following function
\[\mathcal{H}(x(t), p(t)) = T(x(t), p(t)) + V(x(t), p(t)) = \frac{1}{2m} p^2(t) + V(x(t), p(t)),\]
then we recover Newton's second law from Hamilton's equations, since Hamilton's equations imply the following:
- \(\dot{x}(t) = \frac{\partial \mathcal{H}}{\partial y_2}(x(t), p(t)) = \frac{p}{m}\), and
- \(\dot{p}(t) = -\frac{\partial \mathcal{H}}{\partial y_1}(x(t), p(t)) = -\frac{\partial V}{\partial y_1}(x(t), p(t))\).
The first equation is simply the definition of momentum, i.e. \(p = m\dot{x}\). If we assume that the force \(F\) is conservative meaning that \(F = -(\partial V/\partial y_1)\), then the second equation yields \(\dot{p} = F\), which is Newton's second law.
If we apply the chain rule for partial derivatives, then, for a function \(f(x(t), p(t)) : \mathbb{R} \rightarrow \mathbb{R}\), i.e. the composition \(f \circ g\) of the map \(f = (z_1, z_2) \mapsto f(z_1, z_2)\) and the map \(g = t \mapsto (x(t), p(t))\), we compute
\begin{align*}\frac{d}{dt}(f(x(t), p(t))) &= \frac{d(f \circ g)}{dt} \\&= \frac{\partial f}{\partial z_1}(x(t), p(t)) \frac{\partial x}{\partial t}(t) + \frac{\partial f}{\partial z_2}(x(t), p(t))\frac{\partial p}{\partial t}(t) \\&= \frac{\partial f}{\partial z_1}(x(t), p(t)) \frac{\partial \mathcal{H}}{\partial y_2}(x(t), p(t)) - \frac{\partial f}{\partial z_2}(x(t), p(t))\frac{\partial \mathcal{H}}{\partial y_1}(x(t), p(t)).\end{align*}
We thus define the Poisson bracket for any two smooth functions \(f,g : \mathbb{R}^2 \rightarrow \mathbb{R}\), writing \((y_1, y_2)\) for the coordinates on \(\mathbb{R}^2\), as follows:
\[\{f,g\} = \frac{\partial f}{\partial y_1} \frac{\partial g}{\partial y_2} - \frac{\partial f}{\partial y_2}\frac{\partial g}{\partial y_1}\]
so that
\[\frac{d}{dt}(f(x(t), p(t))) = \{f, \mathcal{H}\}(x(t), p(t)).\]
We say that some scalar quantity represented by a smooth function \(f : \mathbb{R}^2 \rightarrow \mathbb{R}\) is conserved if \(f(x(t), p(t))\) is independent of \(t\), i.e., if
\[\frac{d}{dt}(f(x(t), p(t))) = 0.\]
Thus, \(f\) is conserved along every trajectory if and only if \(\{f, \mathcal{H}\} = 0\). In particular, the Hamiltonian is necessarily conserved, since
\[\{\mathcal{H}, \mathcal{H}\} = \frac{\partial \mathcal{H}}{\partial y_1} \frac{\partial \mathcal{H}}{\partial y_2} - \frac{\partial \mathcal{H}}{\partial y_2}\frac{\partial \mathcal{H}}{\partial y_1} = 0.\]
Thus, if the Hamiltonian represents the total energy of the system, then Hamiltonian mechanics can be interpreted as being based on the conservation of energy.
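The Poisson bracket is easy to experiment with numerically. The following sketch approximates the partial derivatives by central differences and evaluates a few brackets for the harmonic oscillator Hamiltonian \(\frac{1}{2m}p^2 + \frac{1}{2}kx^2\). The helper names, the sample point, and the values of \(m\) and \(k\) are assumptions chosen for illustration.

```python
def partial(f, x, p, index, h=1e-5):
    """Central-difference partial derivative of f(x, p); index 0 is x, index 1 is p."""
    if index == 0:
        return (f(x + h, p) - f(x - h, p)) / (2 * h)
    return (f(x, p + h) - f(x, p - h)) / (2 * h)

def poisson_bracket(f, g):
    """Return the function {f, g} = f_x * g_p - f_p * g_x on the (x, p) plane."""
    def bracket(x, p):
        return (partial(f, x, p, 0) * partial(g, x, p, 1)
                - partial(f, x, p, 1) * partial(g, x, p, 0))
    return bracket

# Illustrative values (assumptions): m = 2, k = 8.
m, k = 2.0, 8.0

def hamiltonian(x, p):
    return p ** 2 / (2 * m) + 0.5 * k * x ** 2

def position(x, p):
    return x

def momentum(x, p):
    return p

x0, p0 = 0.3, -1.2
x_dot = poisson_bracket(position, hamiltonian)(x0, p0)      # approximates p0 / m
p_dot = poisson_bracket(momentum, hamiltonian)(x0, p0)      # approximates -k * x0
energy_rate = poisson_bracket(hamiltonian, hamiltonian)(x0, p0)
print(x_dot, p_dot, energy_rate)
```

Taking \(f\) to be the position or momentum coordinate recovers the right-hand sides of Hamilton's equations, and the bracket of the Hamiltonian with itself vanishes by antisymmetry.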
Returning to the example of a simple harmonic oscillator, we may define the Hamiltonian as
\[\mathcal{H}(x(t), p(t)) = \frac{1}{2m}(p(t))^2 + \frac{1}{2}k(x(t))^2\]
so that we obtain
\[\dot{x} = \frac{\partial \mathcal{H}}{\partial y_2}(x(t), p(t)) = \frac{p(t)}{m}\]
which means that \(p = m \dot{x}\) and
\[\dot{p} = -\frac{\partial \mathcal{H}}{\partial y_1}(x(t), p(t)) = -kx(t),\]
which, substituting \(m \dot{x}\) for \(p\), yields
\[m \ddot{x} = -kx\]
or equivalently
\[m \ddot{x} + kx = 0,\]
which is the very same differential equation we obtained from both the Newtonian and Lagrangian formulations.
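A first-order system is also convenient for numerical integration. The sketch below advances Hamilton's equations for the oscillator with the Störmer-Verlet (leapfrog) scheme, which is a choice made here for illustration and is not discussed in this post, and compares the result with the closed-form solution \(x(t) = d\cos(\omega t)\). All names and parameter values are assumptions.

```python
import math

def leapfrog(x, p, m, k, dt, steps):
    """Integrate dx/dt = p / m, dp/dt = -k x with the leapfrog scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * k * x   # half step of dp/dt = -dH/dx
        x += dt * p / m          # full step of dx/dt = dH/dp
        p -= 0.5 * dt * k * x   # second half step of dp/dt
    return x, p

def energy(x, p, m, k):
    """The Hamiltonian p^2 / (2m) + k x^2 / 2."""
    return p * p / (2 * m) + 0.5 * k * x * x

# Illustrative values (assumptions): m = 2, k = 8, initial displacement d = 0.5, at rest.
m, k, d = 2.0, 8.0, 0.5
dt, steps = 1e-3, 10000

x_end, p_end = leapfrog(d, 0.0, m, k, dt, steps)
t_end = dt * steps
exact = d * math.cos(math.sqrt(k / m) * t_end)
position_error = abs(x_end - exact)
energy_drift = abs(energy(x_end, p_end, m, k) - energy(d, 0.0, m, k))
print("position error:", position_error)
print("energy drift:  ", energy_drift)
```

Both the position error and the change in the Hamiltonian should remain small for this step size, which reflects the conservation of \(\mathcal{H}\) along trajectories.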
The reduction of the second-order equation \(F = m \ddot{x}\) to a system of first-order equations is one advantage of Hamiltonian mechanics. Another advantage is its natural geometric formulation, which will be described in the following sections.
Preliminary Motivation
Now that we have briefly demonstrated the gist of Hamiltonian mechanics, we will proceed to define Hamiltonian mechanics in terms of differential geometry. Hamiltonian mechanics is particularly amenable to such a definition.
In order to motivate the ensuing definitions, we make the following preliminary considerations.
First, we assume that we have a smooth manifold \(M\) which we think of as the phase space, i.e., the space of all states which our system can occupy.
We want to study the time evolution of this state, which we model as curves, i.e., continuous maps \(\gamma : J \rightarrow M\) from an interval \(J \subseteq \mathbb{R}\) (which represents time). Thus, the state of the system at time \(t_0 \in J\) is \(\gamma(t_0)\). We additionally assume that these curves are differentiable so that we can consider their velocity, i.e., the tangent vectors
\[\gamma'(t_0) = d\gamma\left(\frac{d}{dt}\bigg\rvert_{t_0}\right) \in T_{\gamma(t_0)}M.\]
We want to study various quantities defined on our phase space \(M\) which we model as smooth functions \(f \in C^{\infty}(M)\). To this end, we define certain vector fields \(X_f\) corresponding to these functions \(f\) which, intuitively speaking, at each state in \(M\), point in the direction in which \(f\) is constant. Constancy means that the differential of \(f\) at \(X_f\) should be \(0\), that is, \(df(X_f) = 0\). In particular, we will be interested in integral curves \(\gamma\) for the vector fields \(X_f\), i.e., curves whose velocity at each point matches \(X_f\), i.e., \(\gamma'(t) = X_f\rvert_{\gamma(t)}\) for all \(t \in J\); the function \(f\) will be constant along the integral curves of \(X_f\). Also recall that vector fields are equivalently sections of the tangent bundle or derivations of \(C^{\infty}(M)\). Thus, treating \(X_f\) as a derivation, conservation means that
\[X_f(f) = df(X_f) = 0.\]
This leads us to an analog of the notion of gradient from Riemannian geometry. Recall that, given a metric tensor field \(g\), the gradient is the unique vector field \(\textrm{grad} f\) such that, for every vector field \(X\),
\[\langle \textrm{grad} f, X \rangle_g = df(X) = X(f).\]
Thus, the gradient \(\textrm{grad} f\) represents the differential \(df\) via the canonical operation \(\langle \textrm{grad} f, \cdot \rangle_g\) induced by the metric. We seek an analogous notion for our phase space. We will likewise make use of a covariant \(2\)-tensor field \(\omega\), and we desire that
\[\omega(X_f, Y) = df(Y) = Y(f).\]
Note that \(\omega(X_f, X_f) = df(X_f) = X_f(f) = 0\), and thus, our tensor field \(\omega\) is alternating (since one of the definitions of an alternating tensor field is one which evaluates to \(0\) whenever any two of its arguments are equal). Thus, our tensor field is a \(2\)-form. We desire for the operation \(\omega(X_f, \cdot)\) to represent the differential \(df\).
More generally, we desire for every covector field to be represented as \(\omega(X, \cdot)\) for some vector field \(X\). Recall that in any Riemannian manifold, the metric tensor field induces an isomorphism between the tangent and cotangent bundles which is extremely useful. We likewise desire that the mapping \(X \mapsto \omega(X, \cdot)\) characterize an isomorphism between the tangent and cotangent bundles of the phase space. This is closely related to the definition of non-degeneracy, so we additionally require that our \(2\)-form \(\omega\) be non-degenerate.
Moreover, we desire for \(\omega\) to be invariant with respect to the flow generated by each vector field \(X_f\). This occurs precisely when the Lie derivative of \(\omega\) with respect to \(X_f\) vanishes, i.e., whenever \(\mathcal{L}_{X_f}\omega = 0\). For any vector field \(X\), by Cartan's magic formula,
\[\mathcal{L}_{X}\omega = d(\omega(X, \cdot)) + d\omega(X, \cdot).\]
Now, if \(\omega\) is closed, then \(d\omega = 0\) and thus
\[\mathcal{L}_{X}\omega = d(\omega(X, \cdot)).\]
It follows that \(\mathcal{L}_{X}\omega = 0\) if and only if \(\omega(X, \cdot)\) is closed. Since \(\omega(X_f, \cdot) = df\) and \(df\) is closed, it follows that \(\omega\) is invariant under \(X_f\). It is also possible to demonstrate the converse, namely, if \(\omega\) is invariant with respect to a vector field \(X\), then \(X = X_f\) for some smooth function \(f\) locally (i.e., at each point \(p\) there is some neighborhood of \(p\) on which \(X = X_f\)). Thus \(\omega\) is invariant with respect to a vector field \(X\) if and only if \(X = X_f\) locally for some \(f \in C^{\infty}(M)\).
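For a concrete picture of this invariance, consider the harmonic oscillator from earlier, whose phase space is the \((x, p)\) plane with \(\omega = dx \wedge dp\). In two dimensions, invariance of \(\omega\) under the flow amounts to the Jacobian determinant of the time-\(t\) flow map being equal to \(1\). The following sketch checks this numerically using the closed-form flow; the helper names and parameter values are assumptions made for illustration.

```python
import math

def oscillator_flow(x0, p0, t, m, k):
    """Exact time-t flow of H = p^2 / (2m) + k x^2 / 2 starting from (x0, p0)."""
    omega = math.sqrt(k / m)
    c, s = math.cos(omega * t), math.sin(omega * t)
    x = x0 * c + p0 / (m * omega) * s
    p = -m * omega * x0 * s + p0 * c
    return x, p

def flow_jacobian_determinant(x0, p0, t, m, k, h=1e-6):
    """Determinant of the Jacobian of the flow map, by central differences."""
    x_right = oscillator_flow(x0 + h, p0, t, m, k)
    x_left = oscillator_flow(x0 - h, p0, t, m, k)
    p_up = oscillator_flow(x0, p0 + h, t, m, k)
    p_down = oscillator_flow(x0, p0 - h, t, m, k)
    dx_dx0 = (x_right[0] - x_left[0]) / (2 * h)
    dp_dx0 = (x_right[1] - x_left[1]) / (2 * h)
    dx_dp0 = (p_up[0] - p_down[0]) / (2 * h)
    dp_dp0 = (p_up[1] - p_down[1]) / (2 * h)
    return dx_dx0 * dp_dp0 - dx_dp0 * dp_dx0

# Illustrative values (assumptions): m = 2, k = 8, several flow times.
m, k = 2.0, 8.0
determinants = [flow_jacobian_determinant(0.3, -1.2, t, m, k) for t in (0.5, 1.0, 7.3)]
print(determinants)
```

Each determinant should equal \(1\) up to rounding error, so the flow preserves area in phase space.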
We have thus discovered three important properties for \(\omega\): it must be non-degenerate, alternating, and closed. This leads to the following definition:
Definition (Symplectic Manifold). A symplectic manifold is a smooth manifold \(M\) equipped with a non-degenerate, closed \(2\)-form \(\omega\) called a symplectic form.
Hamiltonian Systems
Now we will briefly indicate the formulation of Hamiltonian mechanics using the apparatus of differential geometry.
Definition (Hamiltonian vector field). Let \((M,\omega)\) denote a symplectic manifold. For any smooth function \(f \in C^{\infty}(M)\), the Hamiltonian vector field of \(f\) is the smooth vector field denoted \(X_f\) defined by the expression
\[\omega(X_f, \cdot) = X_f \lrcorner \omega = df,\]
or, equivalently, in terms of the bundle isomorphism \(\hat{\omega} : TM \rightarrow T^*M\) induced by \(\omega\),
\[X_f = \hat{\omega}^{-1}(df),\]
which means that, for any vector field \(Y\),
\[\omega(X_f, Y) = df(Y) = Yf.\]
One very important fact about symplectic manifolds is that every symplectic manifold is locally equivalent to the standard symplectic manifold.
Definition (Standard symplectic manifold). The standard symplectic manifold consists of the space \(\mathbb{R}^{2n}\) endowed with the standard symplectic form which is defined with respect to the standard coordinates written in the form \((x^1, \dots, x^n, y^1, \dots, y^n)\) as
\[\omega = \sum_{i=1}^n dx^i \wedge dy^i.\]
One of the basic facts about symplectic vector spaces (and hence symplectic manifolds) is that they are necessarily even-dimensional.
Theorem (Darboux). Let \((M,\omega)\) denote a \(2n\)-dimensional symplectic manifold. Around any point \(p \in M\) there is a neighborhood \(U\) on which there exist smooth coordinates \(\varphi\) written in component form as \((x^1, \dots, x^n, y^1, \dots, y^n)\) centered at \(p\) (i.e., \(x^i(p) = y^i(p) = 0\)) on which \(\omega\) has the following coordinate representation:
\[\omega = \sum_{i=1}^n dx^i \wedge dy^i.\]
Equivalently, given the standard symplectic manifold \((\mathbb{R}^{2n}, \omega_0)\),
\[\omega = \varphi^*(\omega_0).\]
Such coordinates are called standard coordinates or Darboux coordinates. Let's consider the coordinate representation for Hamiltonian vector fields. Every vector field may be expressed in terms of the coordinate basis corresponding to coordinates \((x^1, \dots, x^n, y^1, \dots, y^n)\) and certain smooth coefficient functions \((a^i, b^i)\) as follows:
\[X_f = \sum_{i=1}^n\left(a^i \frac{\partial}{\partial x^i} + b^i \frac{\partial}{\partial y^i}\right).\]
Since \(X_f\) is defined in terms of \(\omega\), we then compute as follows:
\begin{align*}X_f \lrcorner \omega &= \sum_{j=1}^n\left(a^j \frac{\partial}{\partial x^j} + b^j \frac{\partial}{\partial y^j}\right) \lrcorner \sum_{i=1}^n dx^i \wedge dy^i \\&= \sum_{i=1}^n (dx^i \otimes dy^i - dy^i \otimes dx^i)\left(a^j \frac{\partial}{\partial x^j} + b^j \frac{\partial}{\partial y^j}\right) \\&= \sum_{i=1}^n (dx^i \left(a^j \frac{\partial}{\partial x^j} + b^j \frac{\partial}{\partial y^j}\right) dy^i - dy^i \left(a^j \frac{\partial}{\partial x^j} + b^j \frac{\partial}{\partial y^j}\right) dx^i) \\&= \sum_{i=1}^n (a^i dy^i - b^i dx^i).\end{align*}
Since \(X_f \lrcorner \omega = df\) and \(df\) has the following coordinate representation
\[df = \sum_{i=1}^n\left(\frac{\partial f}{\partial x^i }dx^i + \frac{\partial f}{\partial y^i} dy^i \right),\]
it follows that
- \(a^i = \frac{\partial f}{\partial y^i}\) and
- \(b^i = -\frac{\partial f}{\partial x^i}\).
Thus, \(X_f\) has the following representation in Darboux coordinates:
\[X_f = \sum_{i=1}^n\left(\frac{\partial f}{\partial y^i}\frac{\partial}{\partial x^i } - \frac{\partial f}{\partial x^i}\frac{\partial}{\partial y^i}\right).\]
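The coordinate formula can be checked against the defining relation \(\omega(X_f, Y) = df(Y)\) on the standard symplectic manifold \(\mathbb{R}^{2n}\). In the sketch below, points and vectors are lists ordered as \((x^1, \dots, x^n, y^1, \dots, y^n)\), derivatives are approximated by central differences, and the test function, the point, and the vector \(Y\) are arbitrary assumptions.

```python
import math

def gradient(f, z, h=1e-6):
    """Central-difference partial derivatives of f at the point z."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        down = list(z)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def hamiltonian_vector_field(f, z):
    """Components (a^1..a^n, b^1..b^n) with a^i = df/dy^i and b^i = -df/dx^i."""
    n = len(z) // 2
    grad = gradient(f, z)
    return grad[n:] + [-value for value in grad[:n]]

def standard_symplectic_form(u, v):
    """omega(u, v) for omega = sum_i dx^i wedge dy^i."""
    n = len(u) // 2
    return sum(u[i] * v[n + i] - u[n + i] * v[i] for i in range(n))

def sample_function(z):
    """An arbitrary smooth function on R^4, with z = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = z
    return x1 ** 2 * y2 + math.sin(y1) * x2

point = [0.4, -0.7, 1.1, 0.2]
Y = [0.3, 1.5, -0.8, 2.0]

X_f = hamiltonian_vector_field(sample_function, point)
lhs = standard_symplectic_form(X_f, Y)                               # omega(X_f, Y)
rhs = sum(g * y for g, y in zip(gradient(sample_function, point), Y))  # df(Y)
self_pairing = standard_symplectic_form(X_f, X_f)                    # omega(X_f, X_f)
print(lhs, rhs, self_pairing)
```

The first two numbers should agree, and the third should vanish, reflecting \(df(X_f) = 0\).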
A symplectic manifold with a designated smooth function is called a Hamiltonian system.
Definition (Hamiltonian system). A symplectic manifold \((M, \omega)\) together with a designated smooth function \(H \in C^{\infty}(M)\) is called a Hamiltonian system. The function \(H\) is called the Hamiltonian. The flow associated with the Hamiltonian vector field \(X_H\) is called the Hamiltonian flow. The integral curves of \(X_H\) are called trajectories or orbits.
We can now examine the coordinate expression for the trajectories within a Hamiltonian system. Since the coordinate expression for the velocity of an integral curve \(\gamma\) in terms of coordinates \((x^1, \dots, x^n,y^1, \dots, y^n)\) is
\[\gamma'(t_0) = \frac{d\gamma^i}{dt}(t_0)\frac{\partial}{\partial x^i}\bigg\rvert_{\gamma(t_0)} + \frac{d\gamma^{i + n}}{dt}(t_0)\frac{\partial}{\partial y^i}\bigg\rvert_{\gamma(t_0)},\]
if we write \(\gamma = (x^i(t), y^i(t))\), we may instead write this expression as
\[\gamma'(t_0) = \dot{x}^i(t_0)\frac{\partial}{\partial x^i}\bigg\rvert_{\gamma(t_0)} + \dot{y}^i(t_0)\frac{\partial}{\partial y^i}\bigg\rvert_{\gamma(t_0)}.\]
The coordinate expression for \(X_H\) in Darboux coordinates is
\[X_H = \sum_{i=1}^n\left(\frac{\partial H}{\partial y^i}\frac{\partial}{\partial x^i } - \frac{\partial H}{\partial x^i}\frac{\partial}{\partial y^i}\right).\]
Since \(\gamma'(t) = X_H\rvert_{\gamma(t)}\) by definition for any trajectory \(\gamma\), it follows that
\[\dot{x}^i(t) = \frac{\partial H}{\partial y^i}(x(t), y(t))\]
and
\[\dot{y}^i(t) = -\frac{\partial H}{\partial x^i}(x(t), y(t)).\]
These equations are precisely Hamilton's equations. Thus, we can interpret Hamilton's equations as a characterization of trajectories relative to standard coordinates within Hamiltonian systems.
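To see a trajectory as an integral curve of \(X_H\) for a nonlinear Hamiltonian, the following sketch integrates the flow of \(H(x, y) = \frac{1}{2}y^2 - \cos(x)\) on \(\mathbb{R}^2\) with \(\omega = dx \wedge dy\). This pendulum-type Hamiltonian, the classical fourth-order Runge-Kutta integrator, and the step size are all assumptions chosen for illustration; none of them comes from the text above.

```python
import math

def hamiltonian(x, y):
    """Assumed example: a pendulum with unit mass, length, and gravity."""
    return 0.5 * y * y - math.cos(x)

def hamiltonian_field(x, y):
    """X_H = (dH/dy, -dH/dx) in Darboux coordinates, differentiated by hand."""
    return y, -math.sin(x)

def rk4_step(x, y, dt):
    """One classical Runge-Kutta step along the vector field X_H."""
    k1x, k1y = hamiltonian_field(x, y)
    k2x, k2y = hamiltonian_field(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = hamiltonian_field(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = hamiltonian_field(x + dt * k3x, y + dt * k3y)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
    y_new = y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
    return x_new, y_new

# Start at rest with angle 1 radian and follow the trajectory for 1000 steps.
x, y = 1.0, 0.0
h_start = hamiltonian(x, y)
for _ in range(1000):
    x, y = rk4_step(x, y, 0.01)
h_drift = abs(hamiltonian(x, y) - h_start)
print("change in H along the computed trajectory:", h_drift)
```

The value of \(H\) should change only by a small numerical error, in line with the fact that \(H\) is constant along the integral curves of \(X_H\).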
Hamiltonian and Symplectic Vector Fields
Next, we will demonstrate an important theorem which relates Hamiltonian and symplectic vector fields.
Definition (Symplectic Vector Field). A smooth vector field \(X\) on a symplectic manifold \((M, \omega)\) is symplectic if \(\omega\) is invariant under the flow of \(X\).
Definition (Hamiltonian Vector Field). A smooth vector field \(X\) on a symplectic manifold \((M, \omega)\) is Hamiltonian (or globally Hamiltonian) if there exists a smooth function \(f \in C^{\infty}(M)\) such that \(X = X_f\). \(X\) is locally Hamiltonian if each point \(p \in M\) has a neighborhood \(U \subseteq M\) on which \(X\) is Hamiltonian.
Every globally Hamiltonian vector field is locally Hamiltonian with \(U=M\).
Theorem. Let \((M, \omega)\) be a symplectic manifold. A smooth vector field \(X\) on \(M\) is symplectic if and only if it is locally Hamiltonian.
Proof. By properties of the Lie derivative, a vector field is symplectic if and only if \(\mathcal{L}_X\omega = 0\). By Cartan's magic formula, and since \(\omega\) is closed and hence \(d\omega = 0\),
\[\mathcal{L}_X\omega = d(X \lrcorner \omega) + X \lrcorner d\omega = d(X \lrcorner \omega).\]
Thus, \(X\) is symplectic if and only if \(X \lrcorner \omega\) is closed. Suppose that \(X\) is locally Hamiltonian. Then, in a neighborhood \(U\) of each point there is a smooth function \(f\) such that \(X = X_f\) when restricted to \(U\), and hence \(X \lrcorner \omega = X_f \lrcorner \omega = df\), which is closed, and thus \(X\) is symplectic.
Conversely, suppose that \(X\) is symplectic. By the Poincaré lemma, around each point \(p\) there is a neighborhood \(U\) on which the closed \(1\)-form \(X \lrcorner \omega\) is exact, which means that there exists a \(0\)-form (i.e., smooth function) \(f \in C^{\infty}(U)\) such that \(X \lrcorner \omega = df\). Since \(X_f \lrcorner \omega = df\) and \(\omega\) is non-degenerate, this means that \(X = X_f\) on \(U\), and hence \(X\) is locally Hamiltonian. \(\square\)
The following theorem establishes the condition under which every locally Hamiltonian vector field is globally Hamiltonian.
Theorem. Let \((M, \omega)\) be a symplectic manifold. Every locally Hamiltonian vector field on \(M\) is globally Hamiltonian if and only if every closed \(1\)-form is also exact.
Proof. Suppose that every closed \(1\)-form is exact. Let \(X\) be a locally Hamiltonian vector field. Then, by the previous theorem, it is also symplectic, and thus (by Cartan's magic formula), \(X \lrcorner \omega\) is closed. Then, by hypothesis, \(X \lrcorner \omega\) is exact and thus there exists a smooth function \(f \in C^{\infty}(M)\) such that \(X \lrcorner \omega = df\), which means that \(X = X_f\) and thus \(X\) is globally Hamiltonian.
Conversely, suppose that every locally Hamiltonian vector field is globally Hamiltonian. Let \(\eta\) be a closed \(1\)-form, and let \(X = \hat{\omega}^{-1}\eta\) (where \(\hat{\omega}\) is the induced tangent-cotangent isomorphism). Then, by Cartan's magic formula,
\begin{align*}\mathcal{L}_X\omega &= d(X \lrcorner \omega) \\&= d(\hat{\omega}(\hat{\omega}^{-1}\eta)) \\&= d\eta \\&= 0,\end{align*}
and thus \(X\) is symplectic and therefore locally Hamiltonian by the previous theorem. Then, by hypothesis, \(X\) is globally Hamiltonian and thus there exists a smooth function \(f\) such that \(X = X_f\). Thus
\begin{align*}\eta &= \hat{\omega}(\hat{\omega}^{-1}\eta) \\&= \hat{\omega}(X) \\&= \hat{\omega}(X_f) \\&= df,\end{align*}
and \(\eta\) is therefore exact. \(\square\)
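A standard illustration of the gap between the two notions (added here as an example; it does not appear in the argument above) is the cylinder \(M = S^1 \times \mathbb{R}\) with angular coordinate \(\theta\) on the circle, coordinate \(p\) on the line, and symplectic form \(\omega = d\theta \wedge dp\). For the vector field \(X = \partial/\partial p\) we compute
\[X \lrcorner \omega = d\theta(X)\,dp - dp(X)\,d\theta = -d\theta.\]
The \(1\)-form \(d\theta\) is closed, so \(X\) is symplectic and hence locally Hamiltonian, with \(X = X_f\) for \(f = -\theta\) on any open set on which a single-valued branch of \(\theta\) is defined. However, \(d\theta\) is not exact on \(M\), since
\[\oint_{S^1 \times \{0\}} d\theta = 2\pi \neq 0,\]
whereas the integral of an exact \(1\)-form around a closed curve vanishes. Thus \(X\) is not globally Hamiltonian, which is consistent with the theorem because the cylinder carries a closed \(1\)-form that is not exact.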
The Canonical Symplectic Form
The cotangent bundle \(T^*M\) naturally carries a symplectic structure. To demonstrate this, we first define a \(1\)-form \(\tau \in \Omega^1(T^*M)\) called the tautological \(1\)-form. Each element of \(T^*M\) is a pair \((q, \varphi)\) consisting of a point \(q \in M\) and a covector \(\varphi \in T_q^*M\). Thus, at each point \((q, \varphi)\) we want to define \(\tau_{(q, \varphi)} \in T_{(q, \varphi)}^*(T^*M)\). The natural projection \(\pi : T^*M \rightarrow M\) is the map
\[\pi(q, \varphi) = q.\]
The point-wise pullback along the natural projection is a map with signature \(d\pi^*_{(q, \varphi)} : T^*_qM \rightarrow T_{(q, \varphi)}^*(T^*M)\). We can thus define \(\tau\) as
\[\tau_{(q, \varphi)} = d\pi^*_{(q, \varphi)}\varphi.\]
Thus, \(\tau_{(q, \varphi)}\) pulls \(\varphi\) back along the natural projection at the point \(q\). The action of \(\tau_{(q, \varphi)}\) on a tangent vector \(v \in T_{(q, \varphi)}(T^*M)\) is thus
\[\tau_{(q, \varphi)}(v) = \varphi\left(d\pi_{(q, \varphi)}(v)\right).\]
We can then define a symplectic form \(\omega\) on \(T^*M\) as
\[\omega = -d\tau.\]
We then consider the representation of \(\tau\) in terms of the natural coordinates for \(T^*M\). If we let \((x^i)\) denote smooth coordinates for \(M\) and write \(\varphi = \xi_i(q, \varphi) dx^i\rvert_q\), then we compute
\begin{align*}\tau_{(q, \varphi)} &= d\pi^*_{(q, \varphi)}\varphi \\&= d\pi^*_{(q, \varphi)}(\xi_i(q, \varphi) dx^i\rvert_q) \\&= \xi_i(q, \varphi) d\pi^*_{(q, \varphi)}( dx^i\rvert_q) \\&= \xi_i(q, \varphi) d(x^i \circ \pi)\rvert_{(q, \varphi)}.\end{align*}
Denoting the natural coordinate functions \((y^j)\) on \(T^*M\) as
\[(y^1, \dots, y^n, y^{n+1}, \dots, y^{2n}) = (x^1 \circ \pi, \dots, x^n \circ \pi, \xi_1, \dots, \xi_n),\]
the final equality is justified by the following calculation
\begin{align*}d\pi^*_{(q, \varphi)}(dx^i\rvert_q) &= d\pi^*_{(q, \varphi)}(dx^i\rvert_q) \left( \frac{\partial}{\partial y^j}\bigg\rvert_{(q, \varphi)} \right) dy^j\rvert_{(q, \varphi)} \\&= dx^i\rvert_q\left( d\pi_{(q, \varphi)} \left( \frac{\partial}{\partial y^j}\bigg\rvert_{(q, \varphi)} \right)\right)dy^j\rvert_{(q, \varphi)} \\&= dx^i\rvert_q\left( d\pi_{(q, \varphi)} \left( \frac{\partial}{\partial y^j}\bigg\rvert_{(q, \varphi)} \right)(x^k\rvert_q)\frac{\partial}{\partial x^k}\bigg\rvert_q\right)dy^j\rvert_{(q, \varphi)} \\&= dx^i\rvert_q\left(\frac{\partial \pi^k}{\partial y^j} \bigg\rvert_{(q, \varphi)}\frac{\partial}{\partial x^k}\bigg\rvert_q\right)dy^j\rvert_{(q, \varphi)}\\&= dx^i\left( \delta^k_j \frac{\partial}{\partial x^k} \right)dy^j\rvert_{(q, \varphi)} \\&= dx^i\left( \frac{\partial}{\partial x^j} \right)dy^j\rvert_{(q, \varphi)} \\&= \delta^i_j dy^j\rvert_{(q, \varphi)} \\&= d(x^i \circ \pi)\rvert_{(q, \varphi)}.\end{align*}
It thus follows that
\[\tau_{(q, \varphi)} = \xi_i(q, \varphi) d(x^i \circ \pi)\rvert_{(q, \varphi)}\]
and thus
\[\tau = \xi_i d(x^i \circ \pi).\]
It is customary to conflate the coordinate functions \(x^j \circ \pi\) on \(T^*M\) with the coordinate functions \(x^j\) on \(M\). In this case, one simply writes
\[\tau = \xi_i dx^i,\]
and the exterior derivative yields
\begin{align*}\omega &= -d\tau \\&= -\left(\sum_i d\xi_i \wedge dx^i\right) \\&= \sum_i dx^i \wedge d\xi_i.\end{align*}
Thus, the natural coordinates \((x^i, \xi_i)\) on \(T^*M\) are Darboux coordinates for \(\omega\), i.e., \(\omega\) takes the standard form in these coordinates.
Example
Next, we will give an example of a Hamiltonian system. The \(n\)-body problem considers a set of \(n\) particles moving throughout space. Each particle is idealized as a point mass whose position \(q_i\) is an element of \(\mathbb{R}^3\), resulting in \(n\) points
\[q_1, \dots, q_n.\]
We may consider the time evolution of the position of each particle by treating the positions as functions of time, i.e., \(q_i(t) = (q_i^1(t), q_i^2(t), q_i^3(t))\) is the position of particle \(i\) at time \(t\). The time evolution of the positions of all particles can be modeled as a single curve \(q\) in \(\mathbb{R}^{3n}\) defined as
\[q(t) = (q_1^1(t), q_1^2(t), q_1^3(t), \dots, q_n^1(t), q_n^2(t), q_n^3(t)).\]
We may also label the coordinates as follows:
\[q(t) = (q^1(t), \dots, q^{3n}(t)).\]
The set of valid positions might be constrained in some manner, resulting in a manifold \(Q \subseteq \mathbb{R}^{3n}\) called the configuration space.
We denote the inertial mass of particle \(i\) as \(m_i\). We can then arrange these masses into a \(3n \times 3n\) diagonal matrix \(M\) with diagonal \(m_1, m_1, m_1, \dots, m_n, m_n, m_n\). This matrix is positive-definite since each mass is positive. Since the matrix \(M\) is symmetric and positive-definite, it can be interpreted as a Riemannian metric. This particular metric is a constant-coefficient metric (i.e., its coefficients do not vary from point to point). The Riemannian metric \(M\) induces an isomorphism \(\widehat{M} : TQ \rightarrow T^*Q\) between the tangent and cotangent bundles. If we write the natural coordinates for \(TQ\) as
\[(q^1, \dots, q^{3n}, v^1, \dots, v^{3n})\]
and the natural coordinates for \(T^*Q\) as
\[(q^1, \dots, q^{3n}, p_1, \dots, p_{3n}),\]
then we may write the action of the metric \(M\) in coordinates as
\[M(v, w) = M_{ij}v^iw^j.\]
The isomorphism \(\widehat{M}\) then has the following coordinate representation:
\[\widehat{M}(v) = M(v, \cdot) = (M_{ij}\,dq^i \otimes dq^j)(v, \cdot) = M_{ij}v^i\,dq^j\]
or, equivalently,
\[\widehat{M}(v)(w) = M_{ij}v^iw^j.\]
We may think of the elements of \(TQ\) as velocity vectors and the velocity \(\dot{q}(t)\) of the system of particles at time \(t\) is
\[\dot{q}(t) = (\dot{q}^1(t), \dots, \dot{q}^{3n}(t)).\]
We may then interpret the covector \(\widehat{M}(\dot{q}(t))\) as the total momentum \(p\) of the system at time \(t\), since
\[p(t) = \widehat{M}(\dot{q}(t)), \qquad p_j(t) = M_{ij}\dot{q}^i(t)\]
and thus the momentum of each individual particle \(k\) at time \(t\) is given by
\[p_k(t) = (p_k^1(t), p_k^2(t), p_k^3(t)) = m_k\dot{q}_k(t) = (m_k\dot{q}_k^1(t), m_k\dot{q}_k^2(t), m_k\dot{q}_k^3(t)).\]
This explains why the metric was defined in terms of the inertial masses of the particles: the induced tangent-cotangent isomorphism then represents momentum in the manner described above (i.e., the coordinates of the induced covector represent the momentum of the system).
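The momentum map described above can be sketched in a few lines of plain Python. This is only an illustration under assumed values: the two masses and the velocity vector are made up, and the helper names `flat` and `sharp` are hypothetical labels for \(\widehat{M}\) and its inverse. Because \(M\) is diagonal, it is stored as a list of its diagonal entries.

```python
# Hypothetical two-particle system; masses and velocities are made-up values.
masses = [2.0, 3.0]  # m_1, m_2

# Diagonal of M: each mass repeated three times (one per spatial coordinate).
M_diag = [m for m in masses for _ in range(3)]


def flat(v):
    """Apply M-hat: velocity components v^i -> momentum components p_j = M_ij v^i."""
    return [m * vi for m, vi in zip(M_diag, v)]


def sharp(p):
    """Apply the inverse of M-hat: p_j -> v^i = M^{ij} p_j."""
    return [pj / m for m, pj in zip(M_diag, p)]


# Velocities of both particles, concatenated into one vector in R^6.
v = [1.0, 0.0, -1.0, 0.5, 2.0, 0.0]
p = flat(v)

print(p)         # momentum components: each block of three is m_k times the velocity of particle k
print(sharp(p))  # applying the inverse recovers the velocity
```

Since the metric has constant coefficients, the same diagonal list serves at every point of the configuration space.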
The kinetic energy \(T\) of the system is a function of momentum given by
\[T(p) = \frac{1}{2}M^{-1}(p, p) = \frac{1}{2}M^{ij}p_ip_j\]
where \(M^{ij}\) represents the elements of the inverse of the matrix \(M\).
We further suppose that there exists a smooth function \(V\) representing the potential energy of the system. We then define the total energy \(E\) of the system as
\[E(q, p) = V(q) + T(p).\]
We then define the phase space as \(P = T^*Q\). The phase space has the structure of a symplectic manifold when endowed with the canonical symplectic form on \(T^*Q\) induced by the respective tautological \(1\)-form.
Since it depends only on position and momentum, \(E\) is a smooth function on phase space, i.e., \(E \in C^{\infty}(P)\).
If we define the Hamiltonian as the total energy of the system, i.e. \(H=E\), then Hamilton's equations state that every integral curve \(\gamma(t) = (q(t), p(t))\) of the Hamiltonian vector field \(X_H\) in \(P\) satisfies
\[\dot{q}^i(t) = M^{ij}p_j(t)\]
and
\[\dot{p}_i(t) = -\frac{\partial V}{\partial q^i}(q(t)).\]
The first equation is the definition of momentum in another form and the second equation is Newton's second law under the assumption that the involved forces are conservative, i.e., that \(F = -dV\).
Thus, the fact that \(H=E\) is constant along the trajectories of its Hamiltonian flow encodes the law of conservation of energy.
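As a numerical illustration of these equations, the following sketch integrates Hamilton's equations for two particles in \(\mathbb{R}^3\) and checks that \(H = E\) stays nearly constant. Everything specific here is an assumption made for the example: the masses, the spring-like potential \(V(q) = \frac{k}{2}\lVert q_1 - q_2 \rVert^2\), the initial data, and the choice of the leapfrog (Störmer-Verlet) scheme, which is not part of the discussion above but is a common integrator for Hamiltonian systems.

```python
# Assumed example data: two particles coupled by a linear spring.
masses = [1.0, 2.0]
M_diag = [m for m in masses for _ in range(3)]
k = 1.5  # hypothetical spring constant


def V(q):
    """Potential energy V(q) = (k/2) |q_1 - q_2|^2."""
    return 0.5 * k * sum((q[i] - q[i + 3]) ** 2 for i in range(3))


def grad_V(q):
    """Partial derivatives dV/dq^i for i = 1..6."""
    d = [k * (q[i] - q[i + 3]) for i in range(3)]
    return d + [-x for x in d]


def T(p):
    """Kinetic energy T(p) = (1/2) M^{ij} p_i p_j for diagonal M."""
    return 0.5 * sum(pj * pj / m for pj, m in zip(p, M_diag))


def H(q, p):
    return V(q) + T(p)


def leapfrog(q, p, dt, steps):
    """Approximate the flow of X_H using half kick, drift, half kick."""
    for _ in range(steps):
        g = grad_V(q)
        p = [pj - 0.5 * dt * gj for pj, gj in zip(p, g)]       # dp/dt = -dV/dq
        q = [qi + dt * pj / m for qi, pj, m in zip(q, p, M_diag)]  # dq/dt = M^{-1} p
        g = grad_V(q)
        p = [pj - 0.5 * dt * gj for pj, gj in zip(p, g)]
    return q, p


q0 = [1.0, 0.0, 0.0, -1.0, 0.0, 0.0]
p0 = [0.0, 0.5, 0.0, 0.0, -0.5, 0.0]
E0 = H(q0, p0)
q1, p1 = leapfrog(q0, p0, dt=0.001, steps=20000)
E1 = H(q1, p1)

print("energy drift:", abs(E1 - E0))
```

The discrete scheme conserves energy only approximately, so the printed drift is small rather than exactly zero; the exact flow of \(X_H\) conserves \(H\) exactly.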
Poisson Brackets
There is a natural operation on pairs of smooth functions induced by the action of the symplectic form on their respective Hamiltonian vector fields called the Poisson bracket. For smooth functions \(f,g \in C^{\infty}(M)\), the Poisson bracket \(\{f, g\}\) is defined as follows:
\[\{f,g\} = \omega(X_f, X_g) = df(X_g) = X_g(f).\]
Thus, we can interpret this action as a measure of the rate of change of \(f\) along the Hamiltonian flow of \(g\).
In particular, if \(\{f, g\} = \{g, f\}\), then \(\omega(X_f, X_g) = \omega(X_g, X_f) = -\omega(X_f, X_g)\), which is satisfied only if \(\{f, g\} = 0\). Conversely, if \(\{f,g\} = 0\), then \(\omega(X_f, X_g) = 0 = -\omega(X_g, X_f)\) which means that \(\omega(X_g, X_f)=0\), and hence \(\{f,g\} = \{g,f\}\). Thus, \(f\) is constant along the Hamiltonian flow of \(g\) if and only if the respective Poisson bracket commutes.
Using the coordinate expression for \(X_g\) in Darboux coordinates, namely,
\[X_g = \sum_{i=1}^n \left(\frac{\partial g}{\partial y^i}\frac{\partial}{\partial x^i} - \frac{\partial g}{\partial x^i}\frac{\partial}{\partial y^i}\right),\]
we compute the coordinate expression for the Poisson bracket as follows:
\begin{align*}\{f,g\} &= X_g(f) \\&= \sum_{i=1}^n \left(\frac{\partial g}{\partial y^i}\frac{\partial f}{\partial x^i} - \frac{\partial g}{\partial x^i}\frac{\partial f}{\partial y^i}\right) \\&= \sum_{i=1}^n \left(\frac{\partial f}{\partial x^i}\frac{\partial g}{\partial y^i} - \frac{\partial f}{\partial y^i}\frac{\partial g}{\partial x^i}\right). \end{align*}
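The coordinate formula can be checked numerically. The sketch below approximates the partial derivatives by central differences and evaluates the bracket at a point; the function names `partial` and `poisson`, the step size, and the sample functions are all choices made for this example.

```python
def partial(f, x, y, block, i, h=1e-5):
    """Central-difference estimate of a partial derivative of f(x, y).

    block selects the coordinate family ("x" or "y"); i is the index within it.
    """
    xp, xm, yp, ym = list(x), list(x), list(y), list(y)
    if block == "x":
        xp[i] += h
        xm[i] -= h
    else:
        yp[i] += h
        ym[i] -= h
    return (f(xp, yp) - f(xm, ym)) / (2 * h)


def poisson(f, g, x, y):
    """{f, g} = sum_i (df/dx^i dg/dy^i - df/dy^i dg/dx^i) at the point (x, y)."""
    return sum(
        partial(f, x, y, "x", i) * partial(g, x, y, "y", i)
        - partial(f, x, y, "y", i) * partial(g, x, y, "x", i)
        for i in range(len(x))
    )


def f(x, y):
    return x[0] * y[1]


def g(x, y):
    return x[1] * y[0]


x, y = [1.0, 2.0], [3.0, 4.0]

# Coordinate functions: {x^1, y^1} = 1 and {x^1, y^2} = 0.
print(poisson(lambda x, y: x[0], lambda x, y: y[0], x, y))
print(poisson(lambda x, y: x[0], lambda x, y: y[1], x, y))

# By hand, {f, g} = x^2 y^2 - x^1 y^1 (1-based indices), which is 8 - 3 = 5 here.
print(poisson(f, g, x, y))
```

The printed values agree with the exact ones only up to floating-point rounding, since the derivatives are approximated.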
The Poisson bracket enjoys the following properties:
- Bilinearity: \(\{f,g\}\) is bilinear over \(\mathbb{R}\).
- Antisymmetry: \(\{f,g\} = -\{g,f\}\).
- Jacobi Identity: \(\{\{f,g\},h\} + \{\{g,h\},f\} + \{\{h,f\},g\} = 0\).
- \(X_{\{f,g\}} = -[X_f,X_g]\).
Bilinearity and antisymmetry are immediate, since \(\{f,g\} = \omega(X_f, X_g)\) and \(\omega\) satisfies the same properties.
To prove the final property, we will exploit the non-degeneracy of \(\omega\), which implies that if
\[\omega\left(X_{\{f,g\}}, Y\right) = \omega\left(-[X_f, X_g], Y\right)\]
for every vector field \(Y\), then \(X_{\{f,g\}} = -[X_f, X_g]\). This is equivalent to demonstrating the following:
\[\omega\left(X_{\{f,g\}}, Y\right) - \omega\left(-[X_f, X_g], Y\right) = 0.\]
Note that
\begin{align*}\omega\left(X_{\{f,g\}}, Y\right) &= d(\{f,g\})(Y) \\&= Y\{f,g\} \\&= YX_gf.\end{align*}
Since \(X_g\) is a Hamiltonian vector field, it is also symplectic, which means that \(\mathcal{L}_{X_g}\omega = 0\), and thus, by properties of the Lie derivative,
\begin{align*}0 &= (\mathcal{L}_{X_g}\omega)(X_f, Y) \\&= X_g(\omega(X_f, Y)) - \omega([X_g, X_f], Y) - \omega(X_f, [X_g, Y]).\end{align*}
We then compute
\[X_g(\omega(X_f, Y)) = X_gYf\]
and
\begin{align*}\omega(X_f, [X_g, Y]) &= df([X_g, Y]) \\&= [X_g, Y]f \\&= X_gYf - YX_gf \\&= X_gYf - \omega(X_{\{f,g\}}, Y).\end{align*}
Substituting, we obtain
\begin{align*}0 &= X_g(\omega(X_f, Y)) - \omega([X_g, X_f], Y) - \omega(X_f, [X_g, Y]) \\&= X_gYf - \omega([X_g, X_f], Y) - (X_gYf - \omega(X_{\{f,g\}}, Y)) \\&= - \omega([X_g, X_f], Y) + \omega(X_{\{f,g\}}, Y) \\&= \omega([X_f, X_g], Y) + \omega(X_{\{f,g\}}, Y).\end{align*}
For the Jacobi identity, note that
\begin{align*}\{\{g,h\},f\} &= -\{f,\{g,h\}\} \\&= -X_{\{g,h\}}f \\&= [X_g, X_h]f \\&= X_gX_hf - X_hX_gf \\&= X_g\{f,h\} - X_h\{f,g\} \\&= \{\{f,h\},g\} - \{\{f,g\},h\} \\&= -\{\{h,f\},g\} - \{\{f,g\},h\}.\end{align*}
Bilinearity, antisymmetry, and the Jacobi identity together indicate that the vector space \(C^{\infty}(M)\) is a Lie algebra under the Poisson bracket operation.
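For polynomial functions, antisymmetry and the Jacobi identity can be verified exactly rather than numerically. The following sketch assumes a single pair of Darboux coordinates \((x, y)\) and represents a polynomial as a dictionary mapping exponent pairs \((a, b)\) to integer coefficients of \(x^a y^b\); this representation and the helper names are choices made for the example.

```python
from collections import defaultdict


def add(f, g, sign=1):
    """Return f + sign * g, dropping zero coefficients."""
    out = defaultdict(int)
    for key, c in f.items():
        out[key] += c
    for key, c in g.items():
        out[key] += sign * c
    return {key: c for key, c in out.items() if c != 0}


def mul(f, g):
    """Return the product of two polynomials."""
    out = defaultdict(int)
    for (a, b), c in f.items():
        for (a2, b2), c2 in g.items():
            out[(a + a2, b + b2)] += c * c2
    return {key: c for key, c in out.items() if c != 0}


def d_dx(f):
    return {(a - 1, b): a * c for (a, b), c in f.items() if a > 0}


def d_dy(f):
    return {(a, b - 1): b * c for (a, b), c in f.items() if b > 0}


def bracket(f, g):
    """{f, g} = f_x g_y - f_y g_x."""
    return add(mul(d_dx(f), d_dy(g)), mul(d_dy(f), d_dx(g)), sign=-1)


f = {(2, 1): 1}               # x^2 y
g = {(1, 0): 3, (0, 2): 1}    # 3x + y^2
h = {(1, 1): 1, (0, 3): -2}   # xy - 2y^3

antisymmetry = add(bracket(f, g), bracket(g, f))
jacobi = add(
    add(bracket(bracket(f, g), h), bracket(bracket(g, h), f)),
    bracket(bracket(h, f), g),
)

print(bracket(f, g))   # coefficients of 4xy^2 - 3x^2
print(antisymmetry)    # empty dictionary, i.e. the zero polynomial
print(jacobi)          # empty dictionary, i.e. the zero polynomial
```

Because the coefficients are integers, the zero polynomial appears as an empty dictionary with no rounding error involved.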
Noether's Theorem
Given a Hamiltonian system \((M, \omega, H)\), a smooth function \(f \in C^{\infty}(M)\) is a conserved quantity of the system if \(f\) is constant along every integral curve of \(X_H\).
Recall that, by the definition of the velocity \(\gamma'\) of a curve \(\gamma : J \rightarrow M\),
\[\gamma'(t_0) = d\gamma\left(\frac{d}{dt}\bigg\rvert_{t_0}\right) \in T_{\gamma(t_0)}M.\]
The action of the velocity \(\gamma'\) on a smooth function \(f\) is thus
\begin{align*}\gamma'(t_0)f &= d\gamma\left(\frac{d}{dt}\bigg\rvert_{t_0}\right)f \\&= \frac{d}{dt}\bigg\rvert_{t_0}(f \circ \gamma) \\&= (f \circ \gamma)'(t_0).\end{align*}
Thus, the action computes the derivative of \(f\) along \(\gamma\).
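If \(\gamma\) is an integral curve of \(X_H\), then, by definition, \(\gamma'(t) = X_H\rvert_{\gamma(t)}\) for every \(t \in J\). The function \(f\) is constant along \(\gamma\) precisely when
\[X_H\rvert_{\gamma(t)}f = \gamma'(t)f = (f \circ \gamma)'(t) = 0\]
for every \(t \in J\). Since an integral curve of \(X_H\) passes through every point of \(M\), this holds along every integral curve if and only if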
\[X_Hf = 0.\]
A smooth function \(f\) is therefore a conserved quantity if and only if \(\{f, H\} = 0\), since
\begin{align*}\{f,H\} &= \omega(X_f, X_H) \\&= df(X_H) \\&= X_Hf.\end{align*}
Since \(\{H, H\} = 0\), it follows that every Hamiltonian system necessarily conserves its own Hamiltonian.
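The criterion \(\{f, H\} = 0\) can be tried on a concrete system. The sketch below assumes a single particle of unit mass in the plane with the made-up central potential \(V(q) = \frac{1}{4}\lVert q \rVert^4\), uses \((q, p)\) as Darboux coordinates, and compares the angular momentum \(L = q^1p_2 - q^2p_1\) with the function \(q^1p_1\), which is not conserved. Gradients are coded by hand, and all function names are hypothetical.

```python
def grad_H(q, p):
    """Gradients of H = |p|^2 / 2 + |q|^4 / 4 with respect to q and p."""
    r2 = q[0] ** 2 + q[1] ** 2
    return [r2 * q[0], r2 * q[1]], [p[0], p[1]]


def grad_L(q, p):
    """Gradients of the angular momentum L = q1 p2 - q2 p1."""
    return [p[1], -p[0]], [-q[1], q[0]]


def grad_f(q, p):
    """Gradients of f = q1 p1, used as a non-conserved comparison."""
    return [p[0], 0.0], [q[0], 0.0]


def bracket(grad_a, grad_b, q, p):
    """{a, b} = sum_i (da/dq^i db/dp_i - da/dp_i db/dq^i) at the point (q, p)."""
    a_q, a_p = grad_a(q, p)
    b_q, b_p = grad_b(q, p)
    return sum(a_q[i] * b_p[i] - a_p[i] * b_q[i] for i in range(2))


# Arbitrary sample points in phase space.
samples = [
    ([1.0, 0.0], [0.0, 2.0]),
    ([0.3, -1.2], [0.7, 0.4]),
    ([-2.0, 1.5], [1.1, -0.9]),
]

for q, p in samples:
    print(bracket(grad_L, grad_H, q, p), bracket(grad_f, grad_H, q, p))
```

At each sample point the first printed value vanishes up to rounding, consistent with \(L\) being conserved, while the second is generally nonzero.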
Definition (Infinitesimal symmetry). Let \((M, \omega, H)\) be a Hamiltonian system. A smooth vector field \(V\) on \(M\) is called an infinitesimal symmetry of the system if both \(\omega\) and \(H\) are invariant under the flow of \(V\).
Since invariance is equivalent to a vanishing Lie derivative and \(\mathcal{L}_VH = VH\), this means that a symplectic vector field \(V\) is an infinitesimal symmetry if and only if
\[0 = \mathcal{L}_VH = VH.\]
With this terminology in place, we can now state Noether's theorem, which establishes a correspondence between the conserved quantities of a Hamiltonian system and its infinitesimal symmetries.
Theorem (Noether). Let \((M, \omega, H)\) be a Hamiltonian system. The Hamiltonian vector field \(X_f\) of every conserved quantity \(f\) is an infinitesimal symmetry. Conversely, if every closed \(1\)-form on \(M\) is exact, then each infinitesimal symmetry is the Hamiltonian vector field of some conserved quantity which is unique up to the addition of a function that is constant on each component of \(M\).
Proof. Suppose that \(f\) is a conserved quantity. Then \(\{f, H\} = 0\), and thus \(X_fH = \{H, f\} = -\{f, H\} = 0\), so \(H\) is constant along the flow of \(X_f\). Furthermore, since every Hamiltonian vector field is a symplectic vector field, \(X_f\) is symplectic, and thus, by definition, \(\omega\) is invariant under the flow of \(X_f\). Since both \(H\) and \(\omega\) are invariant under the flow of \(X_f\), it is an infinitesimal symmetry.
Conversely, suppose that every closed \(1\)-form on \(M\) is exact. Let \(V\) be any infinitesimal symmetry. Then \(V\) is symplectic by definition, and thus globally Hamiltonian, so there exists some smooth function \(f\) such that \(V = X_f\). Since \(H\) is invariant under the flow of \(V\), \(VH = 0\), and thus \(\{H, f\} = X_fH = VH = 0\). By antisymmetry, \(\{f, H\} = -\{H, f\} = 0\), so \(f\) is a conserved quantity. If \(\bar{f}\) is another function satisfying \(X_{\bar{f}} = V = X_f\), then
\begin{align*}d(f - \bar{f}) &= (X_f - X_{\bar{f}}) \lrcorner \omega \\&= (V - V) \lrcorner \omega \\&= 0 \lrcorner \omega \\&= 0\end{align*}
which means that \(f - \bar{f}\) is constant on each component (maximal connected subset) of \(M\).
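A familiar instance of the theorem is the correspondence between rotational symmetry and angular momentum. The sketch below is an illustration under assumptions: a unit-mass particle in the plane with the made-up Hamiltonian \(H = \frac{1}{2}\lVert p \rVert^2 + \frac{1}{4}\lVert q \rVert^4\), and the angular momentum \(L = q^1p_2 - q^2p_1\). With the sign conventions used here, the flow of \(X_L\) rotates \(q\) and \(p\) simultaneously. The code checks both directions: \(H\) is invariant under that rotation, and \(L\) is constant along an approximate flow of \(X_H\) (computed with a leapfrog scheme, which is an assumed choice of integrator).

```python
import math


def H(q, p):
    return 0.5 * (p[0] ** 2 + p[1] ** 2) + 0.25 * (q[0] ** 2 + q[1] ** 2) ** 2


def L(q, p):
    return q[0] * p[1] - q[1] * p[0]


def rotate(v, s):
    """Rotate a planar vector counterclockwise by the angle s."""
    c, sn = math.cos(s), math.sin(s)
    return [c * v[0] - sn * v[1], sn * v[0] + c * v[1]]


def flow_of_X_L(q, p, s):
    """Exact flow of X_L for time s: rotate position and momentum together."""
    return rotate(q, s), rotate(p, s)


def force(q):
    """Minus the gradient of V(q) = |q|^4 / 4."""
    r2 = q[0] ** 2 + q[1] ** 2
    return [-r2 * q[0], -r2 * q[1]]


def flow_of_X_H(q, p, dt, steps):
    """Leapfrog approximation of the flow of X_H."""
    for _ in range(steps):
        F = force(q)
        p = [p[i] + 0.5 * dt * F[i] for i in range(2)]
        q = [q[i] + dt * p[i] for i in range(2)]
        F = force(q)
        p = [p[i] + 0.5 * dt * F[i] for i in range(2)]
    return q, p


q0, p0 = [1.0, 0.0], [0.0, 0.8]

# Symmetry: H is unchanged along the flow of X_L.
qs, ps = flow_of_X_L(q0, p0, 0.7)
print(abs(H(qs, ps) - H(q0, p0)))

# Conservation: L is unchanged along the (approximate) flow of X_H.
q1, p1 = flow_of_X_H(q0, p0, dt=0.001, steps=5000)
print(abs(L(q1, p1) - L(q0, p0)))
```

Both printed differences are zero up to floating-point rounding. For this particular integrator, each kick and drift step preserves \(L\) exactly in exact arithmetic because the force is central, so the second check does not depend on the step size being small.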