Functional Derivatives
This post is about the concept of functional derivatives. These derivatives find important applications in physics and engineering.
Note: in this post, we will use the term derivative to refer to some sort of limit or related concept, whereas we will use the term differential to refer to a sort of continuous linear map.
Total Differentials
The total functional differential is simply the total differential (in the sense of the Fréchet differential from functional analysis) of a map whose domain is a space of functions and whose codomain is \(\mathbb{R}\). A functional is a generic term which typically refers to a map from a certain vector space of functions to \(\mathbb{R}\). For instance, the space \(C[a,b]\) of continuous maps \(f : [a,b] \rightarrow \mathbb{R}\) might be taken as the domain, and then a functional is a map \(F : C[a,b] \rightarrow \mathbb{R}\). Thus, functionals can be conceived as higher-order functions which accept functions as input and produce numbers as output. We use square brackets (e.g., \(F[f]\)) to emphasize that a map is a functional.
Definition (Total Functional Differential). Let \(F : V \rightarrow \mathbb{R}\) be a functional defined on a normed vector space \(V\) (where \(V \subseteq [\Omega, \mathbb{R}]\) is an appropriate space of real-valued functions). The total differential of \(F\) at a point \(f \in V\) is a bounded linear functional denoted \(\delta F_f : V \rightarrow \mathbb{R}\) satisfying, for each \(\varphi \in V\),
\[F[f + \varphi] = F[f] + \delta F_f[\varphi] + \varepsilon[\varphi],\]
where
\[\varepsilon[\varphi] = F[f + \varphi] - F[f] - \delta F_f[\varphi],\]
and \(\lvert \varepsilon[\varphi] \rvert \in o\left(\lVert \varphi \rVert_V\right)\), meaning that
\[\lim_{\varphi \to 0}\frac{\lvert \varepsilon[\varphi] \rvert}{\lVert \varphi \rVert_V} = 0.\]
In detail, this means that
\[\lim_{\varphi \to 0}\frac{\lvert F[f + \varphi] - F[f] - \delta F_f[\varphi] \rvert}{\lVert \varphi \rVert_V} = 0.\]
Note that this definition depends on the choice of norm for \(V\); for \(\mathbb{R}\), we assume the standard choice of the Euclidean norm, which coincides with the absolute value.
Thus, the functional differential is just the total differential of a map from a normed vector space \(V\) of functions to \(\mathbb{R}\).
Example
Given a function, the most natural way to produce a single number from it is to integrate it in some manner. Thus, functionals are typically expressed as integrals.
Example. Consider the functional \(F : C[a,b] \rightarrow \mathbb{R}\) defined as follows for every \(f \in C[a,b]\):
\[F[f] = \int_a^b\left(f(x)\right)^2~dx.\]
Let \(\varphi \in C[a,b]\) and consider \(\delta F_f[\varphi]\). Here, we use the supremum norm \(\lVert \varphi \rVert_{\infty} = \sup_{x \in [a,b]}\lvert\varphi(x)\rvert\) on \(C[a,b]\).
Since \(F[f+\varphi]=F[f] + \delta F_f[\varphi] + \varepsilon[\varphi]\), this means that there is an approximation
\[F[f + \varphi]-F[f] \approx \delta F_f[\varphi],\]
i.e.,
\[F[f + \varphi]-F[f] = \delta F_f[\varphi] + \varepsilon[\varphi],\]
where \(\varepsilon[\varphi]\) is the error in the approximation, so we compute
\begin{align}F[f + \varphi]-F[f] &= \int_a^b \left(f(x)+\varphi(x)\right)^2 - \left(f(x)\right)^2~dx \\&= \int_a^b \left(f(x)\right)^2 + 2f(x)\varphi(x) + \left(\varphi(x)\right)^2 - \left(f(x)\right)^2~dx \\&= \int_a^b 2f(x)\varphi(x)~dx + \int_a^b \left(\varphi(x)\right)^2~dx.\end{align}
Thus, if we define
\[\delta F_f[\varphi] = \int_a^b 2f(x)\varphi(x)~dx,\]
which is indeed a continuous (and thus bounded) linear map, it follows that
\[\varepsilon[\varphi] = \int_a^b \left(\varphi(x)\right)^2~dx,\]
and \(\varepsilon[\varphi] \in o(\lVert \varphi \rVert_{\infty})\). To see this, suppose \(\varepsilon > 0\), and define \(\delta = \varepsilon / (b-a)\). If \(0 \lt \lVert \varphi \rVert_{\infty} \lt \delta\), then we compute
\begin{align}\frac{1}{\lVert \varphi \rVert_{\infty}} \cdot \left\lvert \int_a^b \left(\varphi(x)\right)^2~dx \right\rvert &\le \frac{1}{\lVert \varphi \rVert_{\infty}} \cdot \int_a^b \left \lvert\varphi(x)\right \rvert^2~dx \\&\le \frac{1}{\lVert \varphi \rVert_{\infty}} \cdot \int_a^b \lVert \varphi \rVert_{\infty}^2~dx \\&= \frac{1}{\lVert \varphi \rVert_{\infty}} \cdot (b-a) \cdot \lVert \varphi \rVert_{\infty}^2 \\&= (b-a) \cdot \lVert \varphi \rVert_{\infty} \\&\lt (b-a) \cdot \delta \\&= \varepsilon.\end{align}
Thus, it follows that
\[\lim_{\varphi \to 0}\frac{\lvert \varepsilon[\varphi] \rvert}{\lVert \varphi\rVert_{\infty}} = 0.\]
This example is a theorem that represents a particular rule for computing total differentials which is formally analogous to a corresponding rule for finite-dimensional vector spaces. For instance, the total differential of the function \(f : \mathbb{R} \rightarrow \mathbb{R}\) defined as \(f(x) = x^2\) is the map
\[h \mapsto f'(x) \cdot h = 2x \cdot h\]
where \(f'(x) = 2x\) is the classical (one-dimensional) derivative of \(f\). Thus, formally, one can compute in many cases with expressions involving functions in the same manner that one can compute with expressions involving numbers, etc.
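The limit established in the example above can also be observed numerically. The following sketch is only an illustration (the grid resolution and the choices of \(f\) and \(\varphi\) are arbitrary): it approximates the integrals with the trapezoid rule and confirms that \(\lvert \varepsilon[\varphi] \rvert / \lVert \varphi \rVert_{\infty}\) shrinks to \(0\) as \(\varphi\) shrinks to \(0\).

```python
import numpy as np

# Numerical check of the example F[f] = ∫_a^b f(x)^2 dx on [0, 1]:
# the ratio |ε[φ]| / ‖φ‖∞ should shrink to 0 as φ shrinks to 0.
a, b = 0.0, 1.0
x = np.linspace(a, b, 2001)

def trapz(y):
    # trapezoid-rule approximation of ∫_a^b y(x) dx on the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def F(f):
    return trapz(f ** 2)

f = np.sin(x)         # an arbitrary base point f ∈ C[a, b]
phi0 = np.cos(3 * x)  # an arbitrary direction, rescaled by s below

ratios = []
for s in (1e-1, 1e-2, 1e-3):
    phi = s * phi0
    dF = trapz(2 * f * phi)                        # δF_f[φ] = ∫ 2 f φ dx
    eps = F(f + phi) - F(f) - dF                   # ε[φ] = F[f+φ] - F[f] - δF_f[φ]
    ratios.append(abs(eps) / np.max(np.abs(phi)))  # |ε[φ]| / ‖φ‖∞
print(ratios)
```

Consistent with the \((b-a) \cdot \lVert \varphi \rVert_{\infty}\) bound derived above, each ratio shrinks by roughly a factor of \(10\) as \(\lVert \varphi \rVert_{\infty}\) does.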
Directional Derivatives
Directional derivatives can be defined for functionals since they are maps between normed vector spaces. Here we will state this definition specialized for functionals.
Definition (Directional Functional Derivative). Given a functional \(F : V \rightarrow \mathbb{R}\) defined on a normed vector space \(V \subseteq [\Omega, \mathbb{R}]\), the directional derivative \(D_fF[\varphi]\) of \(F\) at the point \(f\) in the direction \(\varphi\) is the following limit (if it exists):
\[D_fF[\varphi] = \lim_{t \to 0}\frac{F[f + t \cdot \varphi] - F[f]}{t}.\]
Note that this definition is equivalent to the following definition:
\[D_fF[\varphi] = \frac{d}{dt}\bigg\rvert_0F[f + t \cdot \varphi].\]
To see this, note the following:
\begin{align*}\frac{d}{dt}\bigg\rvert_0F[f + t \cdot \varphi] &= \lim_{t \to 0}\frac{F[f + (0 + t) \cdot \varphi] - F[f + 0 \cdot \varphi]}{t} \\&= \lim_{t \to 0}\frac{F[f + t \cdot \varphi] - F[f]}{t}.\end{align*}
Directional Differentials
Likewise, the directional differential can be defined for functionals.
Definition (Directional Functional Differential). Given a functional \(F : V \rightarrow \mathbb{R}\) defined on a normed vector space \(V \subseteq [\Omega,\mathbb{R}]\), the directional differential of \(F\) at the point \(f \in V\) is a bounded linear map \(D_fF : V \rightarrow \mathbb{R}\) such that \(D_fF[\varphi]\) is the directional functional derivative for all \(\varphi \in V\).
Thus, the directional differential is simply the mapping \(\varphi \mapsto D_fF[\varphi]\) with the additional constraint that this mapping must be linear and continuous (all bounded linear maps are continuous).
The directional functional differential can likewise be characterized in terms of an error function as follows.
Definition (Directional Functional Differential). Given a functional \(F : V \rightarrow \mathbb{R}\) defined on a normed vector space \(V\), the directional differential of \(F\) at the point \(f \in V\) is a bounded linear map \(D_fF : V \rightarrow \mathbb{R}\) such that, for all \(t \in \mathbb{R}\) and \(\varphi \in V\),
\[F[f + t \cdot \varphi] = F[f] + t \cdot D_fF[\varphi] + \varepsilon[\varphi],\]
where the error term
\[\varepsilon[\varphi] = F[f + t \cdot \varphi] - F[f] - t \cdot D_fF[\varphi]\]
(which depends on \(t\) as well as on \(\varphi\)) satisfies \(\varepsilon[\varphi] \in o(t)\), meaning that
\[\lim_{t \to 0}\frac{\varepsilon[\varphi]}{t} = 0.\]
Thus, this requires that
\[\lim_{t \to 0}\frac{F[f + t \cdot \varphi] - F[f] - t \cdot D_fF[\varphi]}{t} = 0,\]
which is equivalent to the previous definition (once the terms are rearranged).
It can be demonstrated that the total functional differential is likewise a bounded linear map that maps each function to its directional functional derivative (see the post about derivatives on normed vector spaces for a proof). However, the directional differential is a weaker notion than the total differential since the total differential requires uniform convergence. See the post about the Fréchet and Gâteaux derivatives for more information. The subtle difference is that the total differential is defined in terms of a single limit involving the entire map \(\delta F_f\), whereas the directional differential is defined in terms of many independent limits, each involving a separate direction \(\varphi\).
Example
Consider an example of a calculation involving a directional differential.
Example. Consider the functional \(F : C[a,b] \rightarrow \mathbb{R}\) defined as follows for every \(f \in C[a,b]\):
\[F[f] = \int_a^b\left(f(x)\right)^2~dx.\]
Let \(\varphi \in C[a,b]\) and consider \(D_fF[\varphi]\).
Since \(F[f+ t \cdot \varphi]=F[f] + t \cdot D_fF[\varphi] + \varepsilon[\varphi]\), this means that there is an approximation
\[F[f + t \cdot \varphi]-F[f] \approx t \cdot D_fF[\varphi],\]
i.e.,
\[F[f + t \cdot \varphi]-F[f] = t \cdot D_fF[\varphi] + \varepsilon[\varphi],\]
where \(\varepsilon[\varphi]\) is the error in the approximation, so we compute
\begin{align}F[f + t \cdot \varphi]-F[f] &= \int_a^b \left(f(x)+ t \cdot \varphi(x)\right)^2 - \left(f(x)\right)^2~dx \\&= \int_a^b \left(f(x)\right)^2 + 2 \cdot t \cdot f(x)\varphi(x) + \left(t \cdot \varphi(x)\right)^2 - \left(f(x)\right)^2~dx \\&= t \cdot \int_a^b 2f(x)\varphi(x)~dx + t^2 \cdot \int_a^b \left(\varphi(x)\right)^2~dx.\end{align}
Thus, if we define
\[D_fF[\varphi] = \int_a^b 2f(x)\varphi(x)~dx,\]
which is indeed a continuous (and thus bounded) linear map, it follows that
\[\varepsilon[\varphi] = t^2 \cdot \int_a^b \left(\varphi(x)\right)^2~dx,\]
and thus
\[\frac{\varepsilon[\varphi]}{t} = t \cdot \int_a^b \left(\varphi(x)\right)^2~dx\]
which vanishes as \(t\) approaches \(0\).
Thus, it follows that
\[\lim_{t \to 0}\frac{ \varepsilon[\varphi] }{t} = 0.\]
We see in this example that establishing the limit was simpler (compared to the example involving the total differential) since, for each fixed direction \(\varphi\), the limit was taken in the single real parameter \(t\).
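The linear decay of \(\varepsilon[\varphi]/t\) in \(t\) can be checked numerically. The sketch below is illustrative only (the grid and the choices of \(f\) and \(\varphi\) are arbitrary); it confirms that \(\lvert \varepsilon[\varphi] \rvert / t\) shrinks proportionally to \(t\).

```python
import numpy as np

# Numerical check of the directional example: for F[f] = ∫ f(x)^2 dx the
# error satisfies ε[φ]/t = t · ∫ φ(x)^2 dx, so |ε[φ]|/t shrinks linearly in t.
a, b = 0.0, 1.0
x = np.linspace(a, b, 2001)

def trapz(y):
    # trapezoid-rule approximation of ∫_a^b y(x) dx on the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def F(f):
    return trapz(f ** 2)

f = np.exp(x)      # arbitrary base point
phi = x * (1 - x)  # arbitrary fixed direction

DfF = trapz(2 * f * phi)  # D_f F[φ] = ∫ 2 f φ dx
quotients = []
for t in (1e-2, 1e-3, 1e-4):
    eps = F(f + t * phi) - F(f) - t * DfF  # ε[φ] at this value of t
    quotients.append(abs(eps) / t)
print(quotients)
```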
Functional Derivatives
Now, recall that, for finite-dimensional vector spaces, the partial derivative is simply the directional derivative in the direction of a basis vector. Since spaces of functionals are generally infinite-dimensional, such a definition of partial derivatives does not apply. However, it is natural to ask whether there is some analogous notion for functionals.
Recall that, for an \(n\)-dimensional vector space \(X\) with basis \((b_i)\) and coordinate functions \((x^i)\), the total differential \(df_p(v)\) and directional derivative \(D_pf(v)\) of a differentiable map \(f : X \rightarrow \mathbb{R}\) at a point \(p \in X\) are given by
\[df_p(v) = D_pf(v) = \sum_{i=1}^n \frac{\partial f}{\partial x^i}(p) \cdot v^i.\]
Then, by formal analogy, if we replace the sum with an integral over the entire domain \(\Omega\) (where \(f, \varphi : \Omega \rightarrow \mathbb{R}\) are appropriate functions), replace the finite list of components \((v^i)\) with a function \(\varphi(x)\), replace the coordinates \((x^i)\) with the function \(f\), and denote the formal analog of the partial derivative as \((\delta F/\delta f)(x)\), we obtain
\[\delta F_f[\varphi] = D_fF[\varphi] = \int_{\Omega} \frac{\delta F}{\delta f}(x) \cdot \varphi(x)~dx.\]
A functional derivative is precisely such a formal analog of the partial derivative. Intuitively, one may think of this as an "infinite linear combination" (although, literally, this is not the case): in place of the discrete \(v^i\), we get infinitely many values \(\varphi(x)\), and in place of the discrete set of coordinate functions \(x^i\), we get infinitely many parameters \(f(x)\) and hence infinitely many "partial derivatives" \((\delta F / \delta f(x))\), which is why many authors write
\[\frac{\delta F}{\delta f(x)} \cdot \varphi(x)\]
as an alternative notation (although it might obscure the fact that the functional derivative is a function). However, strictly speaking, the definition is as follows.
Definition (Functional Derivative). The functional derivative of a functional \(F : V \rightarrow \mathbb{R}\) defined on a normed vector space \(V \subseteq [\Omega, \mathbb{R}]\) of functions with respect to \(f \in V\) is a function denoted
\[\frac{\delta F}{\delta f} : \Omega \rightarrow \mathbb{R}\]
such that for all \(\varphi \in V\)
\[\delta F_f[\varphi] = \int_{\Omega} \frac{\delta F}{\delta f}(x) \cdot \varphi(x)~dx.\]
Example. Consider the functional \(F : C[a,b] \rightarrow \mathbb{R}\) defined as follows for every \(f \in C[a,b]\):
\[F[f] = \int_a^b\left(f(x)\right)^2~dx.\]
We previously determined that
\[\delta F_f[\varphi] = \int_a^b 2f(x)\varphi(x)~dx.\]
Thus, it follows that
\[\frac{\delta F}{\delta f}(x) = 2f(x).\]
Note that it is also possible to define the functional derivative in terms of the directional differential instead of the total differential, which might be desirable in contexts where the total differential does not exist. By default, the functional derivative is defined in terms of the total differential.
Properties
The functional derivative inherits various properties of the total (or directional) differential.
Linearity
If the functional derivative exists, i.e.,
\[\delta F_f[\varphi] = D_fF[\varphi] = \int_{\Omega}\frac{\delta F}{\delta f}(x) \cdot \varphi(x)~dx,\]
then, since \(D_f(\lambda \cdot F)[\varphi] = \lambda \cdot D_fF[\varphi]\), it follows that
\begin{align*}\delta (\lambda \cdot F)_f[\varphi] &= D_f(\lambda \cdot F)[\varphi] \\&= \lambda \cdot D_fF[\varphi] \\&= \lambda \cdot \int_{\Omega}\frac{\delta F}{\delta f}(x) \cdot \varphi(x)~dx \\&= \int_{\Omega}\lambda \cdot \frac{\delta F}{\delta f}(x) \cdot \varphi(x)~dx,\end{align*}
and hence
\[\frac{\delta (\lambda \cdot F)}{\delta f}(x) = \lambda \cdot \frac{\delta F}{\delta f}(x).\]
Likewise, whenever the functional derivatives exist, one can infer that
\[\frac{\delta (F + G)}{\delta f}(x)= \frac{\delta F}{\delta f}(x) + \frac{\delta G}{\delta f}(x).\]
This means that the functional derivative is linear in \(F\).
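Linearity can also be observed numerically. In the sketch below, \(F[f] = \int f^2\,dx\) has derivative \(2f(x)\) as computed earlier; the second functional \(G[f] = \int f^3\,dx\) and its derivative \(3f(x)^2\) are an extra example not derived in this post, though the computation is entirely analogous.

```python
import numpy as np

# Numerical check of linearity. Take F[f] = ∫ f² dx (derivative 2f) and
# G[f] = ∫ f³ dx (derivative 3f², by a computation analogous to the example).
# The derivative of H = 2·F + G should then be 2·2f(x) + 3f(x)².
a, b = 0.0, 1.0
x = np.linspace(a, b, 4001)

def trapz(y):
    # trapezoid-rule approximation of ∫_a^b y(x) dx on the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def H(f):
    return 2 * trapz(f ** 2) + trapz(f ** 3)

f = np.sin(np.pi * x)  # arbitrary base point
phi = np.cos(x)        # arbitrary direction

t = 1e-6
quotient = (H(f + t * phi) - H(f)) / t             # directional derivative of H
predicted = trapz((2 * 2 * f + 3 * f ** 2) * phi)  # ∫ (2·δF/δf + δG/δf)·φ dx
print(quotient, predicted)
```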
The Physicists' Functional Derivative
In physics textbooks, one might encounter the following (putative) definition of the functional derivative:
\[\frac{\delta F}{\delta f(x)} = \lim_{t \to 0}\frac{F[f + t \cdot \delta_x] - F[f]}{t}.\]
Here, \(\delta_x\) is the Dirac delta function which is putatively defined such that
\[\delta_x(x') = \begin{cases}\infty & \text{if}~x'=x \\ 0 & \text{otherwise}\end{cases}\]
and
\[\int_{\Omega}\delta_x(x')~dx' = 1\]
and
\[\int_{\Omega}\delta_x(x') \cdot f(x')~dx' = f(x).\]
However, no such function exists. Indeed, no real-valued function may take the value \(\infty\), and no genuine function satisfies these integral identities. However, it is possible to define the Dirac delta as a distribution (a certain kind of continuous linear functional).
This implicitly presents the Dirac delta "functions" (one for each \(x\)) as a sort of uncountable basis (where, under the integral sign, each delta acts like a coordinate projection, extracting the value of a function at a single point) and is analogous to the definition of the partial derivative for finite-dimensional vector spaces.
However, this does not really represent a definition; instead, it presents a formal calculus for performing formal calculations. Given the formal analogy between partial derivatives and functional derivatives (and the formal analogy between sums and integrals, etc.), this usually works (i.e., produces correct results). However, it is not guaranteed to produce correct results, so the results must be verified. The perturbation implied by the Dirac delta is isolated to a single point \(x\), and this technique might fail for more elaborate perturbations \(\varphi\) of the function \(f\).
Alternatively, one can proceed by proving a group of theorems (e.g., as we did previously in examples), where each theorem represents a "rule" for performing calculations. Then these rules can be applied to compute derivatives. This is the typical manner in which derivatives are computed in calculus, for instance.
The physicists' definition thus serves as a sort of heuristic technique for computing the functional derivatives.
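The heuristic can be simulated numerically by replacing the Dirac delta with a narrow normalized bump. The following sketch is illustrative only (the bump width, grid, and choice of \(F\) are arbitrary); it approximately recovers \(\delta F/\delta f(x_0) = 2 f(x_0)\) for the example \(F[f] = \int f^2\,dx\).

```python
import numpy as np

# Simulate the physicists' recipe: replace the Dirac delta δ_x0 by a narrow
# normalized Gaussian bump and form the difference quotient. For
# F[f] = ∫ f² dx this should approximately recover δF/δf(x0) = 2·f(x0).
a, b = 0.0, 1.0
x = np.linspace(a, b, 20001)

def trapz(y):
    # trapezoid-rule approximation of ∫_a^b y(x) dx on the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def F(f):
    return trapz(f ** 2)

f = np.sin(np.pi * x)
x0, width = 0.5, 0.01                          # bump centered at x0 = 1/2
bump = np.exp(-0.5 * ((x - x0) / width) ** 2)
bump /= trapz(bump)                            # normalize: ∫ bump dx = 1

t = 1e-6
quotient = (F(f + t * bump) - F(f)) / t
print(quotient)  # close to 2·f(1/2) = 2·sin(π/2) = 2
```

Shrinking the bump width and the step \(t\) together improves the approximation, mirroring the two limits implicit in the heuristic.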
Euler-Lagrange Formula
The Euler-Lagrange formula provides a method for computing functional derivatives subject to certain boundary conditions. This formula also yields an equation which permits calculating the extrema of functionals subject to the boundary conditions. It is thus useful for optimization problems.
We consider a wide class of functionals whose integrands are parameterized by a variable \(x\), a function value \(f(x)\), and the first derivative \(f'(x)\), which we denote as \(L(x,f(x), f'(x))\), so that the functionals under consideration all have the form
\[F[f] = \int_a^b L(x,f(x),f'(x))~dx.\]
In other words, \(x \in \mathbb{R}\), \(f, f' : \mathbb{R} \rightarrow \mathbb{R}\), \(L : \mathbb{R}^3 \rightarrow \mathbb{R}\) is the map \((y_1, y_2, y_3) \mapsto L(y_1,y_2,y_3)\), and \(L(x,f(x),f'(x))\) denotes the composite map \(x \mapsto (x,f(x),f'(x)) \mapsto L(x,f(x),f'(x))\).
The function \(L\) is typically assumed to be sufficiently differentiable (for instance, at least twice differentiable).
Our goal is to compute the functional derivative of \(F\) at \(f\) subject to the generic boundary conditions \(f(a) = A\) and \(f(b) = B\).
We seek a solution to the equation
\[\delta F_f[\varphi] = \frac{d}{dt}\bigg\lvert_0 F[f + t \cdot \varphi] = \int_a^b \frac{\delta F}{\delta f}(x) \cdot \varphi(x)~dx.\]
We restrict the class of admissible "test" functions \(\varphi\) to those such that \((f + t \cdot \varphi)(a) = f(a) = A\) and \((f + t \cdot \varphi)(b) = f(b) = B\), which implies that \(\varphi(a) = 0\) and \(\varphi(b) = 0\). Note that such restricted functions still form a vector space since \((\varphi_1 + \varphi_2)(a) = \varphi_1(a) + \varphi_2(a) = 0 + 0 = 0\) and \((c \cdot \varphi)(a) = c \cdot \varphi(a) = c \cdot 0 = 0\), etc.
We then calculate
\begin{align}\frac{d}{dt}\bigg\lvert_0 F[f + t \cdot \varphi] &= \frac{d}{dt}\bigg\lvert_0\int_a^b L(x,f(x) + t \cdot \varphi(x),f'(x) + t \cdot \varphi'(x))~dx \\&= \int_a^b \frac{d}{dt}\bigg\lvert_0 L(x,f(x) + t \cdot \varphi(x),f'(x) + t \cdot \varphi'(x))~dx \\&= \int_a^b \left[\varphi(x) \cdot \frac{\partial L}{\partial y_2}(x,f(x)+t\cdot\varphi(x),f'(x)+t\cdot\varphi'(x)) + \varphi'(x) \cdot \frac{\partial L}{\partial y_3}(x,f(x)+t\cdot\varphi(x),f'(x)+t\cdot\varphi'(x)) \right]_{t=0} ~dx\\&= \int_a^b \varphi(x) \cdot \frac{\partial L}{\partial y_2}(x,f(x),f'(x)) + \varphi'(x) \cdot\frac{\partial L}{\partial y_3}(x,f(x),f'(x)) ~dx\end{align}
Next, we can apply integration by parts to the second term of the integrand to obtain the following:
\[\int_a^b \left[\frac{\partial L}{\partial y_2}(x,f(x),f'(x)) - \frac{d}{dx}\frac{\partial L}{\partial y_3}(x,f(x),f'(x))\right] \cdot \varphi(x)~dx + \left[\varphi(x) \cdot \frac{\partial L}{\partial y_3}(x,f(x),f'(x))\right]_a^b.\]
Applying the boundary conditions \(\varphi(a) = 0\) and \(\varphi(b) = 0\), this yields
\[\frac{d}{dt}\bigg\lvert_0 F[f + t \cdot \varphi] = \int_a^b \left[\frac{\partial L}{\partial y_2}(x,f(x),f'(x)) - \frac{d}{dx}\frac{\partial L}{\partial y_3}(x,f(x),f'(x))\right] \cdot \varphi(x)~dx.\]
Thus, under the appropriate boundary conditions, the functional derivative is
\[\frac{\delta F}{\delta f}(x) = \frac{\partial L}{\partial y_2}(x,f(x),f'(x)) - \frac{d}{dx}\frac{\partial L}{\partial y_3}(x,f(x),f'(x)).\]
The expression
\[\frac{\partial L}{\partial y_2}(x,f(x),f'(x)) - \frac{d}{dx}\frac{\partial L}{\partial y_3}(x,f(x),f'(x))\]
is called the Euler-Lagrange formula.
Theorem. (The Fundamental Lemma of the Calculus of Variations) If \(f \in C[a,b]\) and if \(\int_a^b f(x)h(x)~dx=0\) for all \(h \in C[a,b]\) with \(h(a) = h(b) = 0\), then \(f(x) = 0\) for all \(x \in [a,b]\).
Proof. We will prove the contrapositive. Let \(f \in C[a,b]\), and suppose that \(f\) is non-zero somewhere in \([a,b]\). Since \(f\) is continuous, there is some interval \([c,d] \subset [a,b]\) on which \(f\) is non-zero and has constant sign. Define a function \(h\) such that \(h(x) = (x - c)(d - x)\) if \(x \in [c, d]\) and \(h(x) = 0\) otherwise. Then, \(h \in C[a,b]\) and \(h(a) = h(b) = 0\). Then
\[\int_a^b f(x)h(x)~dx = \int_c^d f(x)h(x)~dx,\]
and since \(h(x) > 0\) on the open interval \((c,d)\), it follows that the integral is positive when \(f\) is positive and negative when \(f\) is negative, so \(\int_c^d f(x)h(x)~dx \ne 0\). \(\square\)
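The construction in the proof is easy to visualize numerically. The sketch below (with an arbitrary choice of a positive \(f\) and of the interval \([c,d]\)) exhibits the strictly positive integral produced by the bump \(h\).

```python
import numpy as np

# Illustration of the proof's construction: f(x) = sin(πx) is positive on
# (0, 1); with c = 1/4, d = 3/4, the bump h(x) = (x-c)(d-x) on [c, d]
# (and 0 elsewhere) yields ∫ f·h dx > 0, so f cannot satisfy the hypothesis.
a, b = 0.0, 1.0
x = np.linspace(a, b, 2001)

def trapz(y):
    # trapezoid-rule approximation of ∫_a^b y(x) dx on the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

f = np.sin(np.pi * x)
c, d = 0.25, 0.75
h = np.where((x >= c) & (x <= d), (x - c) * (d - x), 0.0)

integral = trapz(f * h)
print(integral)  # strictly positive
```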
The Euler-Lagrange formula is often used to find the critical points of functionals. The critical points occur when
\[\frac{d}{dt}\bigg\lvert_0 F[f + t \cdot \varphi] = \int_a^b \frac{\delta F}{\delta f}(x) \cdot \varphi(x)~dx = 0.\]
Thus, by the Euler-Lagrange formula, under the boundary conditions, it follows that
\[\int_a^b \left[\frac{\partial L}{\partial y_2}(x,f(x),f'(x)) - \frac{d}{dx}\frac{\partial L}{\partial y_3}(x,f(x),f'(x))\right] \cdot \varphi(x)~dx = 0,\]
and applying the Fundamental Lemma of the Calculus of Variations, it follows that
\[\frac{\partial L}{\partial y_2}(x,f(x),f'(x)) - \frac{d}{dx}\frac{\partial L}{\partial y_3}(x,f(x),f'(x)) = 0.\]
This equation is called the Euler-Lagrange equation.
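The Euler-Lagrange formula can be spot-checked numerically. The sketch below uses the Lagrangian \(L(y_1,y_2,y_3) = y_3^2\) (the Dirichlet energy, an extra example not taken from this post), for which the formula gives \(\delta F/\delta f(x) = 0 - \frac{d}{dx}(2f'(x)) = -2f''(x)\); the choices of \(f\) and \(\varphi\) are arbitrary.

```python
import numpy as np

# Check of the Euler-Lagrange formula for L(y1, y2, y3) = y3², i.e. the
# functional F[f] = ∫ f'(x)² dx. The formula predicts δF/δf(x) = -2 f''(x),
# so ∫ (-2 f'')·φ dx should match the difference quotient (F[f+tφ] - F[f])/t
# for a direction φ with φ(a) = φ(b) = 0. We work with f' and φ' directly,
# using (f + tφ)' = f' + tφ'.
a, b = 0.0, 1.0
x = np.linspace(a, b, 4001)

def trapz(y):
    # trapezoid-rule approximation of ∫_a^b y(x) dx on the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def F_from_derivative(fp):
    return trapz(fp ** 2)  # F[f] = ∫ f'(x)² dx

# f(x) = sin(πx), with exact derivatives to avoid numerical differentiation
fp = np.pi * np.cos(np.pi * x)         # f'
fpp = -np.pi ** 2 * np.sin(np.pi * x)  # f''

phi = x * (1 - x)  # test direction with φ(0) = φ(1) = 0
phip = 1 - 2 * x   # φ'

t = 1e-6
quotient = (F_from_derivative(fp + t * phip) - F_from_derivative(fp)) / t
predicted = trapz(-2 * fpp * phi)      # ∫ δF/δf · φ dx
print(quotient, predicted)
```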
Example. We will apply the Euler-Lagrange equation to compute the shortest path between two points \(A\) and \(B\) on the Euclidean plane \(\mathbb{R}^2\). Each path is represented as a curve, that is, as a function \(f : [a,b] \rightarrow \mathbb{R}\). The curve can be conceived as the graph of \(f\), i.e. the set of points \((x, f(x)) \in \mathbb{R}^2\). The curve can thus be parameterized by the map \(r(x) = (r^1(x),r^2(x)) = (x,f(x))\). To compute the length of the curve, we use a line integral:
\[F[f] = \int_a^b \lVert r'(x) \rVert~dx.\]
Note that
\begin{align}\lVert r'(x) \rVert &= \left\lVert \left(\frac{\partial r^1}{\partial x}(x), \frac{\partial r^2}{\partial x}(x)\right) \right\rVert \\&= \left\lVert \left(1, \frac{d f}{d x}(x) \right) \right\rVert \\&= \sqrt{1 + (f'(x))^2},\end{align}
so the integral can also be written as
\[F[f] = \int_a^b L(x,f(x),f'(x))~dx = \int_a^b \sqrt{1 + (f'(x))^2}~dx,\]
where \(L\) is the map
\[L(y_1,y_2,y_3) = \sqrt{1 + y_3^2}.\]
The map \(F\) is thus a functional. Our goal is to find the function \(f\) that minimizes the curve length, i.e. the minimum of the functional \(F\).
We compute
\[\frac{\partial L}{\partial y_2} = 0,\]
and
\[\frac{\partial L}{\partial y_3} = \frac{y_3}{\sqrt{1 + y_3^2}},\]
so that
\[\frac{\partial L}{\partial y_3}(x,f(x),f'(x)) = \frac{f'(x)}{\sqrt{1 + (f'(x))^2}}.\]
Writing \(L(x,f(x),f'(x))\) as an abbreviation for the function \(x \mapsto L(x,f(x),f'(x))\), the Euler-Lagrange equation yields
\[\frac{\partial L}{\partial y_2}(x,f(x),f'(x)) - \frac{d}{dx}\frac{\partial L}{\partial y_3}(x,f(x),f'(x)) = 0 - \frac{d}{dx}\frac{f'(x)}{\sqrt{1 + (f'(x))^2}} = 0,\]
so
\[ \frac{d}{dx}\frac{f'(x)}{\sqrt{1 + (f'(x))^2}} = 0.\]
This yields a differential equation. Integrating both sides of the equation yields
\[\frac{f'(x)}{\sqrt{1 + (f'(x))^2}} = C\]
for some constant of integration \(C\).
It then follows that
\[\frac{(f'(x))^2}{1 + (f'(x))^2} = C^2,\]
and so
\[\frac{1}{C^2} = \frac{1 + (f'(x))^2}{(f'(x))^2} = \frac{1}{(f'(x))^2} + 1.\]
Then
\[\frac{1}{(f'(x))^2} = \frac{1}{C^2} - 1 = \frac{1-C^2}{C^2},\]
so
\[(f'(x))^2 = \frac{C^2}{1-C^2}\]
and
\[f'(x) = \pm \sqrt{\frac{C^2}{1-C^2}}.\]
Since \(f'(x)\) is continuous, it is therefore everywhere equal to a constant \(\alpha\) which is either the positive or negative root. Then, integrating the differential equation
\[f'(x) = \alpha,\]
we obtain
\[f(x) = \alpha \cdot x + \beta\]
for some constant of integration \(\beta\). Thus, the shortest path between two points in the Euclidean plane is a straight line.
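As a final sanity check (with illustrative choices of endpoints and perturbation), one can confirm numerically that perturbing the straight line while keeping the endpoints fixed only increases the arclength.

```python
import numpy as np

# Numerical confirmation for the shortest-path example: the straight line
# f(x) = x from (0, 0) to (1, 1) has length √2, while the perturbed path
# f(x) = x + (0.3 / 2π)·sin(2πx), which keeps the same endpoints since
# sin(2πx) vanishes at x = 0 and x = 1, is strictly longer.
a, b = 0.0, 1.0
x = np.linspace(a, b, 4001)

def trapz(y):
    # trapezoid-rule approximation of ∫_a^b y(x) dx on the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def length_from_derivative(fp):
    return trapz(np.sqrt(1 + fp ** 2))  # F[f] = ∫ √(1 + f'(x)²) dx

straight = length_from_derivative(np.ones_like(x))                   # f'(x) = 1
perturbed = length_from_derivative(1 + 0.3 * np.cos(2 * np.pi * x))  # perturbed f'
print(straight, perturbed)
```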