The Hyperreal Numbers

This post discusses the basics of non-standard analysis.

The Hyperreal Numbers
Generated by Google Gemini using the prompt "generate an artistic depiction of infinitesimals".

The notion of infinitesimals, i.e., infinitely small quantities, was applied historically by figures such as Newton and Leibniz to problems of analysis. This concept was later criticized since it originally lacked a rigorous formal foundation and was eventually replaced with the notion of limits. However, the mathematician Abraham Robinson demonstrated in his seminal book "Nonstandard Analysis" that it is possible to give infinitesimals a fully precise and rigorous foundation and to exploit them usefully. In this post we give a detailed overview of the basis of this theory.

Superstructures

First, we will define a mathematical universe within which we will operate. Recall that the standard set-theoretic universe (the cumulative hierarchy or von Neumann universe) is defined via transfinite recursion as the class of sets \(V_{\alpha}\) indexed by the class of ordinal numbers as follows:

  • \(V_0 = \emptyset\).
  • \(V_{\beta + 1} = \mathcal{P}(V_{\beta})\) for any ordinal number \(\beta\), where \(\mathcal{P}(V_{\beta})\) denotes the power set of \(V_{\beta}\).
  • \(V_{\lambda} = \bigcup_{\beta \lt \lambda}V_{\beta}\) for any limit ordinal \(\lambda\).

This represents a systematic, hierarchical construction of a universe containing all well-founded, pure sets and is often used to motivate and interpret the standard axioms of set theory (the ZFC axioms). It serves as the arena for "ordinary mathematics".

Superstructures also represent a certain kind of cumulative hierarchy. However, they differ from the von Neumann universe in the following ways:

  • They begin with a given set of atoms \(A\) (or urelements) as opposed to the empty set.
  • They are truncated, i.e., they stop the accumulation at the first infinite ordinal \(\omega\). Thus, they can be defined as \(V(A) = \bigcup_{n \in \mathbb{N}}V_n(A)\) in terms of a sequence of sets \((V_n(A))_{n \in \mathbb{N}}\) by ordinary induction instead of transfinite induction.
  • They form an ascending chain, i.e., \(V_0(A) \subsetneq V_1(A) \subsetneq V_2(A) \subsetneq \dots\).

Note: we use standard set theory (ZFC) as the metatheory. The atoms are not true urelements; conceptually, they are the "seed" for the universe, but we use ordinary set theory, so there are "sets all the way down" (hereditary sets).

Definition (Superstructure). A superstructure over a set of atoms \(A\) (such that \(\emptyset \notin A\)) is the set \(V(A)\) defined as

\[V(A) = \bigcup_{n \in \mathbb{N}}V_n(A)\]

where the sequence of sets \((V_n(A))_{n \in \mathbb{N}}\) is defined by induction as follows:

  • \(V_0(A) = A\).
  • \(V_{n+1}(A) = V_n(A) \cup \mathcal{P}(V_n(A)).\)

The condition \(\emptyset \notin A\) ensures that the empty set plays a unique role within the hierarchy as the empty set (which arises necessarily in \(V_1(A)\)) and not an ambiguous role as both an atom and the empty set.

By design, every element of \(V(A)\) has a unique, finite rank, which is the index \(n\) of the set \(V_n(A)\) containing the first occurrence of the element.

Definition (Rank). The rank of an element \(x \in V(A)\) within a superstructure \(V(A)\) over a set \(A\) is the least index \(n \in \mathbb{N}\) such that \(x \in V_n(A)\) and is denoted \(\mathrm{rank}(x)\).

We are interested in the superstructure \(V(\mathbb{R})\). This superstructure represents all of analysis. It includes the real numbers \(\mathbb{R}\), all pairs of real numbers \((a, b) \in \mathbb{R} \times \mathbb{R}\) (since we can interpret the pair \((a,b)\) as the set \(\{a, \{a, b\}\}\)), all sets of pairs of real numbers (and hence all relations and functions), and all sets of sets of real numbers (and hence all topologies), etc. Thus it is more than sufficient for "ordinary" analysis.

The Bounded Quantifier Language

Next, we want a way to formalize properties of the elements of a superstructure. We thus define a specialized first-order language with a single binary non-logical relation symbol \(\in\) used to represent set membership. We restrict all quantifiers such that they are bounded, i.e., we only permit quantifications of the form \((\forall x \in D) \varphi\) or \((\exists x \in D) \varphi\) for some term (variable or constant) \(D\) and we exclude quantification of the form \(\forall x \varphi\) or \(\exists x \varphi\). This restriction, together with the unique rank of each element of a superstructure, ensures that we can apply strong induction.

Signatures

First, we will describe signatures which define the non-logical symbols of a first-order language.

Definition (Signature). A signature \(\Sigma\) is a set whose elements are called constant symbols.

Note: typically, a signature includes function symbols and relation symbols and an arity function that indicates the arities of the symbols. However, in the context of this post, we will work exclusively with signatures that lack any function symbols and have a single implicit relation symbol \((\in)\) representing set membership. Thus, it suffices to consider the signature simply as a set of constants.

In particular, we are concerned with the signature \(\Sigma_{V(A)}\) associated to a superstructure, where there is one constant symbol per element of \(V(A)\).

Definition (Superstructure Signature). The signature \(\Sigma_{V(A)}\) associated with a superstructure \(V(A)\) over a set of atoms \(A\) is the signature that associates a unique representative constant symbol \(\overline{c}\) to every element \(c \in V(A)\), i.e.,

\[\Sigma_{V(A)} = \{ \overline{c} \mid c \in V(A)\}.\]

It doesn't matter which set represents each constant, as long as it is unique. The operation \(c \mapsto \overline{c}\) is any suitable bijection.

Languages

Next, we will define the language associated with a signature. Languages are defined inductively. We begin by specifying the terms of a language, which are analogous to noun phrases in natural language. Roughly speaking, they are the expressions about which the language makes assertions.

Variables

Every language has an associated countable set of variables \(\mathcal{V}\), often denoted \(v_0, v_1, \dots\), etc. We may choose any metavariables whatsoever (e.g. \(x\), \(y\), \(D\), etc.), but each metavariable denotes an element of the set \(\mathcal{V}\) which formally represents the stock of variables.

Definition (Term). Given a signature \(\Sigma\), a term relative to this signature is an element of the set \(\mathrm{Term(\Sigma)}\) defined inductively as follows:

  • Every variable \(v \in \mathcal{V}\) is a term.
  • Every constant symbol \(c \in \Sigma\) is a term.

Now we can define (bounded) formulas, which are the well-formed expressions of a language.

Definition (Formula). Given a signature \(\Sigma\), a (bounded) formula relative to this signature is an element of the set \(\mathrm{Form}(\Sigma)\) defined inductively as follows:

  • \(\top\) is a formula.
  • \(\bot\) is a formula.
  • \((t_1 \in t_2)\) is a formula for all terms \(t_1\) and \(t_2\).
  • \((t_1 = t_2)\) is a formula for all terms \(t_1\) and \(t_2\).
  • \((\neg \varphi)\) is a formula for every formula \(\varphi\).
  • \((\varphi \land \psi)\) is a formula for all formulas \(\varphi\) and \(\psi\).
  • \(((\exists x \in D) \varphi)\) is a formula for every variable \(x \in \mathcal{V}\), term \(D\), and formula \(\varphi\).

We also may omit parentheses when writing formulas informally when there is no ambiguity. However, the parentheses are always part of the formal expressions.

We permit customary abbreviations such as the following:

  • \((\varphi \lor \psi)\) for \((\neg ((\neg \varphi) \land (\neg \psi)))\).
  • \((\varphi \rightarrow \psi)\) for \(((\neg \varphi) \lor \psi)\).
  • \((\varphi \leftrightarrow \psi)\) for \(((\varphi \rightarrow \psi) \land (\psi \rightarrow \varphi))\).
  • \(((\forall x \in D)\varphi)\) for \((\neg ((\exists x \in D)(\neg \varphi)))\).

The formulas for a signature represent a language.

Definition (Language). A language \(\mathcal{L}_{\Sigma}\) for a signature \(\Sigma\) is the collection of formulas \(\mathrm{Form}(\Sigma)\).

In particular, we are concerned with the language \(\mathcal{L}_{V(A)}\) associated to a superstructure, i.e., the language for the signature \(\Sigma_{V(A)}\).

Certain formulas are atomic, i.e., they can't be decomposed into constituent formulas.

Definition (Atomic Formula). An atomic formula of a language for a signature \(\Sigma\) is an element of the set \(\mathrm{Atom}(\Sigma)\) defined inductively as follows:

  • \(\top\) is an atomic formula.
  • \(\bot\) is an atomic formula.
  • \((t_1 \in t_2)\) is an atomic formula for all terms \(t_1\) and \(t_2\).
  • \((t_1 = t_2)\) is an atomic formula for all terms \(t_1\) and \(t_2\).

Next, we will define the variables of a term.

Definition (Variables of a Term). The variables of a term \(t\) are the elements of the set \(\mathrm{Var}(t)\) defined recursively as follows:

  • \(\mathrm{Var}(v) = \{v\}\) for all variables \(v \in \mathcal{V}\).
  • \(\mathrm{Var}(c) = \emptyset\) for all constants \(c \in \Sigma\).

Next, we will define the set of variables of a formula.

Definition (Variables of a Formula). The variables of a formula \(\varphi\) are the elements of the set \(\mathrm{Var}(\varphi)\) defined recursively as follows:

  • \(\mathrm{Var}(\top) = \emptyset\).
  • \(\mathrm{Var}(\bot) = \emptyset\).
  • \(\mathrm{Var}((t_1 \in t_2)) = \mathrm{Var}(t_1) \cup \mathrm{Var}(t_2)\) for all terms \(t_1\) and \(t_2\).
  • \(\mathrm{Var}((t_1 = t_2)) = \mathrm{Var}(t_1) \cup \mathrm{Var}(t_2)\) for all terms \(t_1\) and \(t_2\).
  • \(\mathrm{Var}((\neg \varphi)) = \mathrm{Var}(\varphi)\) for every formula \(\varphi\).
  • \(\mathrm{Var}((\varphi \land \psi)) = \mathrm{Var}(\varphi) \cup \mathrm{Var}(\psi)\) for all formulas \(\varphi\) and \(\psi\).
  • \(\mathrm{Var}((\exists x \in D)\varphi) = \{x\} \cup \mathrm{Var}(D) \cup \mathrm{Var}(\varphi)\) for every variable \(x \in \mathcal{V}\), term \(D\), and formula \(\varphi\).

It is important to distinguish between free and bound occurrences of variables in formulas; the former are not in the scope of any quantifier whereas the latter are in the scope of a quantifier.

Definition (Free Variable). A free variable of a formula \(\varphi \in \mathrm{Form}(\Sigma)\) of a language for a signature \(\Sigma\) is an element of the set \(\mathrm{Free}(\varphi)\) defined recursively as follows:

  • If \(\varphi \in \mathrm{Atom}(\Sigma)\), then \(\mathrm{Free}(\varphi) = \mathrm{Var}(\varphi)\).
  • \(\mathrm{Free}(\neg \varphi) = \mathrm{Free}(\varphi)\) for every formula \(\varphi\).
  • \(\mathrm{Free}((\varphi \land \psi)) = \mathrm{Free}(\varphi) \cup \mathrm{Free}(\psi)\) for all formulas \(\varphi\) and \(\psi\).
  • \(\mathrm{Free}((\exists x \in D)\varphi) = \mathrm{Free}(\varphi) \setminus \{x\}\) for every variable \(x \in \mathcal{V}\), term \(D\), and formula \(\varphi\).

Definition (Bound Variable). A bound variable of a formula \(\varphi \in \mathrm{Form}(\Sigma)\) of a language for a signature \(\Sigma\) is an element of the set \(\mathrm{Bound}(\varphi)\) defined as \(\mathrm{Bound}(\varphi) = \mathrm{Var}(\varphi) \setminus \mathrm{Free}(\varphi)\).

Now we can define open and closed formulas.

Definition (Open Formula). A formula \(\varphi\) is open if \(\mathrm{Free}(\varphi) \ne \emptyset\).

Definition (Closed Formula). A formula \(\varphi\) is closed if \(\mathrm{Free}(\varphi) = \emptyset\).

Definition (Sentence). A sentence is a closed formula.

Next, we define the substitution of a term for a variable within another term.

Definition (Substitution in Terms). The substitution \(s[t/x]\) of a term \(t\) for a variable \(x\) within another term \(s\) is the result of the application of a function \((s, t, x) \mapsto s[t/x]\) defined recursively as follows:

  • \(x[t/x] = t\) for every variable \(x \in \mathcal{V}\) and term \(t\).
  • \(y[t/x] = y\) for all variables \(x, y \in \mathcal{V}\) such that \(y \ne x\) and every term \(t\).
  • \(c[t/x] = c\) for every constant symbol \(c \in \Sigma\), term \(t\), and variable \(x \in \mathcal{V}\).

Next, we define how to generate fresh variables, i.e., variables that don't occur in some expression.

Definition (Fresh Variable Function). The function \(\mathrm{fresh} : \mathcal{P}_f(\mathcal{V}) \rightarrow \mathcal{V}\) is defined as \(\mathrm{fresh}(X) = v_n\) for any finite subset \(X \subseteq \mathcal{V}\) where

\[n = \min \{n \in \mathbb{N} \mid v_n \in \mathcal{V} \setminus X\}.\]

In other words, the function \(\mathrm{fresh}\) picks the first variable that doesn't occur in some set of variables.

Now we will define substitution in formulas.

Definition (Substitution in Formulas). The substitution \(\varphi[t/x]\) of a term \(t\) for a variable \(x\) within a formula \(\varphi\) for a signature \(\Sigma\) is the result of the application of a function \((\varphi, t, x) \mapsto \varphi[t/x]\) defined via structural recursion as follows:

  • \(\top[t/x] = \top\) for every term \(t\) and variable \(x \in \mathcal{V}\).
  • \(\bot[t/x] = \bot\) for every term \(t\) and variable \(x \in \mathcal{V}\).
  • \((t_1 \in t_2)[t/x] = (t_1[t/x] \in t_2[t/x])\) for all terms \(t, t_1, t_2\) and every variable \(x \in \mathcal{V}\).
  • \((t_1 = t_2)[t/x] = (t_1[t/x] = t_2[t/x])\) for all terms \(t, t_1, t_2\) and every variable \(x \in \mathcal{V}\).
  • \((\neg\varphi)[t/x] = (\neg\varphi[t/x])\) for every formula \(\varphi\), term \(t\), and variable \(x \in \mathcal{V}\).
  • \((\varphi \land \psi)[t/x] = (\varphi[t/x] \land \psi[t/x])\) for all formulas \(\varphi, \psi\), every term \(t\), and every variable \(x \in \mathcal{V}\).
  • \(((\exists x \in D) \varphi)[t/x] = ((\exists x \in D) \varphi)\) for every variable \(x \in \mathcal{V}\), all terms \(t, D\), and every formula \(\varphi\).
  • \(((\exists y \in D) \varphi)[t/x] = ((\exists y \in D) \varphi[t/x])\) if \(y \ne x\) and \(y \notin \mathrm{Free}(t)\) for all variables \(x,y \in \mathcal{V}\), all terms \(t,D\), and every formula \(\varphi\).
  • \(((\exists y \in D) \varphi)[t/x] = ((\exists z \in D) \varphi[z/y, t/x])\) if \(y \ne x\) and \(y \in \mathrm{Free}(t)\) for all variables \(x,y \in \mathcal{V}\), all terms \(t, D\), and every formula \(\varphi\), where \(z = \mathrm{fresh}(\mathrm{Var}(\varphi) \cup \mathrm{Var}(t) \cup \{x\})\).

We may write \(\varphi[t_1/x_1, \dots, t_n/x_n]\) to represent multiple simultaneous substitutions, which may be defined recursively as

\[\varphi[t_1/x_1, \dots, t_n/x_n] = (\varphi[t_1/x_1,\dots, t_{n-1}/x_{n-1}])[t_n/x_n].\]

We will also write \(\varphi(x_1, \dots, x_n)\) to indicate a formula whose free variables are precisely \(x_1, \dots, x_n\) and \(\varphi(t_1, \dots, t_n)\) for the substitution \(\varphi[t_1/x_1, \dots, t_n/x_n]\).

Satisfaction

Definition (\(\in\)-structure). An \(\in\)-structure \(\mathcal{A} = (\lvert \mathcal{A} \rvert, (-)^{\mathcal{A}}, \in^{\mathcal{A}})\) consists of a set \(\lvert \mathcal{A} \rvert\) called the domain, a function \((-)^{\mathcal{A}} : \Sigma \rightarrow \lvert \mathcal{A} \rvert\) called the constant interpretation that assigns each constant \(c \in \Sigma\) to a value \(c^{\mathcal{A}} \in \lvert \mathcal{A} \rvert\), and a binary relation \(\in^{\mathcal{A}}\) on \(\lvert \mathcal{A} \rvert\) called the \(\in\)-interpretation.

Definition (Variable Assignment). A variable assignment for an \(\in\)-structure \(\mathcal{A}\) is a function \(s : \mathcal{V} \rightarrow \lvert \mathcal{A} \rvert\) that assigns an element of \(\lvert \mathcal{A} \rvert\) to every variable. Given a variable assignment \(s\) and an element \(a \in \lvert \mathcal{A} \rvert\), the variable assignment \(s[a/x]\) for a variable \(x \in \mathcal{V}\) is defined as

\[s[a/x](y) = \begin{cases}a & \text{if } y = x \\ s(y) & \text{otherwise.}\end{cases}\]

Definition (Value of a Term). The value \(\mathrm{val}^{\mathcal{A}}_s(t)\) of a term \(t\) within an \(\in\)-structure \(\mathcal{A} \) relative to a variable assignment \(s\) is defined by structural recursion as follows:

  • \(\mathrm{val}^{\mathcal{A}}_s(c) = c^{\mathcal{A}}\) for all constant symbols \(c\in \Sigma\) corresponding to the value \(c \in V(A)\).
  • \(\mathrm{val}^{\mathcal{A}}_s(x) = s(x)\) for all variables \(x \in \mathcal{V}\).

Definition (Satisfaction). The satisfaction predicate \(\mathcal{A},s \models \varphi\) for formulas \(\varphi\) is defined recursively for an \(\in\)-structure \(\mathcal{A} = (\lvert \mathcal{A} \rvert, \in^{\mathcal{A}})\) and variable assignment \(s\) as follows:

  • \(\mathcal{A},s \models \top\).
  • \(\mathcal{A},s \not \models \bot\).
  • \(\mathcal{A},s \models (t_1 \in t_2)\) if and only if \(\mathrm{val}^{\mathcal{A}}_s(t_1) \in^{\mathcal{A}} \mathrm{val}_s(t_2)\).
  • \(\mathcal{A},s \models (t_1 = t_2)\) if and only if \(\mathrm{val}^{\mathcal{A}}_s(t_1) = \mathrm{val}_s(t_2)\).
  • \(\mathcal{A},s \models (\neg \varphi)\) if and only if \(\mathcal{A},s \not \models \varphi\).
  • \(\mathcal{A},s \models (\varphi \land \psi)\) if and only if \(\mathcal{A},s \models \varphi\) and \(\mathcal{A},s \models \psi\).
  • \(\mathcal{A},s \models ((\exists x \in D) \varphi)\) if and only if \(\mathcal{A},s[a/x] \models \varphi\) for some \(a \in \lvert \mathcal{A} \rvert\) such that \(a \in^{\mathcal{A}} \mathrm{val}^{\mathcal{A}}_s(D)\).

If \(\mathcal{A},s \models \varphi\), we say that the \(\in\)-structure satisfies the formula \(\varphi\) under the variable assignment \(s\).

Definition (Truth). The truth predicate \(\mathcal{A}\models \varphi\) for sentences \(\varphi\) is defined for an \(\in\)-structure \(\mathcal{A}\) as follows: \(\mathcal{A} \models \varphi\) if and only if \(\mathcal{A},s \models \varphi\) for all variable substitutions \(s\). If \(\mathcal{A} \models \varphi\), we say that the sentence \(\varphi\) is true in the \(\in\)-structure \(\mathcal{A}\). Given a set of sentences \(\Gamma\), we also write \(\mathcal{A} \models \Gamma\) to indicate that \(\mathcal{A}\models \varphi\) for every \(\varphi \in \Gamma\).

Ultrafilters and Ultrapowers

Before examining the details, we consider the following preliminary intuitive motivation. Consider an infinite sequence of real numbers

\[\varepsilon = (r_1, r_2, \dots, r_n, \dots).\]

We will interpret this sequence as a single new "hyperreal" number. We think of each index \(n \in \mathbb{N}\) in this sequence as a distinct parallel "world". Each world hosts its own "copy" of the real numbers \(\mathbb{R}\). The number \(r_n\) is how world \(n\) "sees" the hyperreal number \(\varepsilon\): it is the "representative" number that represents the sequence to the respective world. We then proceed to ask questions about this sequence, such as "is the hyperreal number greater than \(0\)". Each world consults its own representative and then "casts a vote" indicating its answer. If \(r_n \gt 0\), world \(n\) votes "yes"; otherwise, world \(n\) votes "no". The set of indices of worlds that voted "yes" is a set \(X \subseteq \mathbb{N}\). If there were a finite number of worlds, we could simply tally the votes and let the majority rule the outcome. However, we have an infinite number of votes, so we need another mechanism to decide if a particular set \(X\) constitutes a majority of "yes" votes. The following conditions characterize a plausible set \(\mathcal{U}\) of majority ratifications:

  • \(\mathbb{N} \in \mathcal{U}\): if the vote is unanimously "yes", then it is certainly ratified.
  • \(\emptyset \notin \mathcal{U}\): if the vote is unanimously "no", then the motion certainly fails.
  • If \(A \in \mathcal{U}\) and \(A \subseteq B\) then \(B \in \mathcal{U}\): if \(A\) constitutes a ratifying majority and \(B\) is a superset of \(A\), then \(B\) is also a ratifying majority.
  • If \(A \in \mathcal{U}\) and \(B \in \mathcal{U}\), then \(A \cap B \in \mathcal{U}\): if two different "bills" are "ratified", then their "joint bill" must be ratified also for the system to remain consistent.
  • For every subset \(A \subseteq \mathbb{N}\), either \(A \in \mathcal{U}\) or \(A^c \in \mathcal{U}\), but not both: there can be no "gridlocks" or "indecision"; every proposal passes or fails.

A set satisfying these conditions is called an ultrafilter.

If the ultrafilter is a principal ultrafilter, i.e. if it has a least element, then we can show that \(\{n\} \in \mathcal{U}\) for some index \(n\) and thus whatever world \(n\) votes always wins. This means that the hyperreals defined via \(\mathcal{U}\) will simply be equal to world \(n\)'s copy, namely to \(\mathbb{R}\). This is uninteresting, so we require that the ultrafilter be non-principal. We can show that no finite sets belong to a non-principal ultrafilter, which is desirable: a finite vote should never win. The condition \(\emptyset \notin \mathcal{U}\) is equivalent to the condition that the ultrafilter be non-principal.

Next, we decide when two hyperreal numbers (i.e. sequences) \((x_n)\) and \((y_n)\) are "the same". We examine the indices where they agree:

\[X = \{n \in \mathbb{N} \mid x_n = y_n\}.\]

If \(X \in \mathcal{U}\), then this constitutes a vote that the two sequences are equal. This is an equivalence relation \(x \sim_{\mathcal{U}} y\) defined as follows:

\[x \sim_{\mathcal{U}} y \text{ if and only if } \{n \in \mathbb{N} \mid x_n = y_n\} \in \mathcal{U}.\] The equivalence class \([x]\) then represents a hyperreal number:

\[[x] = \{y \in \mathbb{R}^{\mathbb{N}} \mid y \sim_{\mathcal{U}} x\}.\]

We then compute a quotient \(\mathbb{R}^{\mathbb{N}} / \sim_{\mathcal{U}}\):

\[\{[x] \mid x \in \mathbb{R}^{\mathbb{N}}\}.\]

This quotient is called an ultrapower.

Let's see how this permits us to define infinitesimal numbers. Consider the harmonic sequence:

\[\varepsilon = \left(1, \frac{1}{2}, \frac{1}{3}, \dots, \frac{1}{n}, \dots\right).\]

We first put to the vote whether \(\varepsilon \gt 0\). Since \(\varepsilon_n \gt 0\) for all \(n\), the vote is unanimous. Thus \(\varepsilon \gt 0\).

We next put to the vote whether \(\varepsilon \lt r\) for all \(r \in \mathbb{R}\) such that \(r \gt 0\). Let \(r\) be an arbitrary real number such that \(r \gt 0\). By the Archimedean property of \(\mathbb{R}\), there exists a natural number \(n\) such that \(\frac{1}{n} \lt r\). Thus, at every world \(m\) where \(m \ge n\), \(\varepsilon_m = \frac{1}{m} \le \frac{1}{n} \lt r\), and thus the vote is "yes". As previously indicated, there are no finite sets in a non-principal ultrafilter. The set of "no" votes is finite and thus does not belong to the non-principal ultrafilter. This means that its complement, namely, the set of "yes" votes does belong to the ultrafilter. Thus, \(\varepsilon\) is a hyperreal number that is greater than \(0\) yet less than every positive real number; in other words, it is an infinitesimal number.

Ultrafilters

Now we will state the standard definition of an ultrafilter.

Definition (Ultrafilter). Given a set \(I\), an ultrafilter is a set \(\mathcal{U} \subseteq \mathcal{P}(I)\) satisfying the following properties:

  • It is a filter, namely:
    • It is non-empty: \(\mathcal{U} \ne \emptyset\).
    • It is upward closed: for every set \(A \in \mathcal{U}\), if \(B \in \mathcal{P}(I)\) and \(A \subseteq B\), then \(B \in \mathcal{U}\).
    • It is closed under intersections: for all sets \(A, B \in \mathcal{U}\), \(A \cap B \in \mathcal{U}\).
  • It is proper: \(\emptyset \notin \mathcal{U}\). Note that, since \(\mathcal{U}\) is upward closed, \(\emptyset \in \mathcal{U}\) if and only if \(\mathcal{U} = \mathcal{P}(I)\), so a proper filter is equivalently a filter \(\mathcal{U}\) such that \(\mathcal{U} \ne \mathcal{P}(I)\).
  • It is maximal, namely, if \(\mathcal{F} \subseteq \mathcal{P}(I)\) is a proper filter and \(\mathcal{U} \subseteq \mathcal{F}\), then \(\mathcal{U} = \mathcal{F}\).

This definition is different from the one we used previously, but it is equivalent. We will state the alternative definition next.

Definition (Ultrafilter - Alternative). Given a set \(I\), an ultrafilter is a set \(\mathcal{U} \subseteq \mathcal{P}(I)\) satisfying the following properties:

  • It is a filter, namely:
    • It is non-empty: \(\mathcal{U} \ne \emptyset\).
    • It is upward closed: for every set \(A \in \mathcal{U}\), if \(B \in \mathcal{P}(I)\) and \(A \subseteq B\), then \(B \in \mathcal{U}\).
    • It is closed under intersections: for all sets \(A, B \in \mathcal{U}\), \(A \cap B \in \mathcal{U}\).
  • It is proper: \(\emptyset \notin \mathcal{U}\). Note that, since \(\mathcal{U}\) is upward closed, \(\emptyset \in \mathcal{U}\) if and only if \(\mathcal{U} = \mathcal{P}(I)\), so a proper filter is equivalently a filter \(\mathcal{U}\) such that \(\mathcal{U} \ne \mathcal{P}(I)\).
  • It is bivalent: for every set \(A \in \mathcal{P}(I)\), either \(A \in \mathcal{U}\) or \(A^c \in \mathcal{U}\) (where \(A^c = I \setminus A\)).

Thus, ultrafilters are filters characterized either by maximality or bivalence. We will now show that these definitions are equivalent.

Theorem. A proper filter \(\mathcal{U}\) on a set \(I\) is maximal if and only if it is bivalent.

Proof. Suppose that \(\mathcal{U}\) is a bivalent proper filter. Let \(\mathcal{F}\) be any proper filter such that \(\mathcal{U} \subseteq \mathcal{F}\) and suppose that there exists a set \(A \in \mathcal{F}\) such that \(A \notin \mathcal{U}\). Then, since \(\mathcal{U}\) is bivalent, \(A^c \in \mathcal{U}\). But, since \(\mathcal{U} \subseteq \mathcal{F}\), it follows that \(A^c \in \mathcal{F}\), and since \(\mathcal{F}\) is closed under intersections, it further follows that \(A \cap A^c = \emptyset \in \mathcal{F}\), which contradicts the axiom \(\emptyset \notin \mathcal{F}\). Therefore, if \(\mathcal{U}\) is a bivalent filter, \(\mathcal{F}\) is a filter, and \(\mathcal{U} \subseteq \mathcal{F}\), then there is no element \(A\) such that \(A \in \mathcal{F} \setminus \mathcal{U}\), i.e., \(\mathcal{F} \subseteq \mathcal{U}\) and hence \(\mathcal{U} = \mathcal{F}\).

Next, suppose that \(\mathcal{U}\) is a maximal proper filter. We will show that it is also bivalent, namely, either \(A \in \mathcal{U}\) or \(A^c \in \mathcal{U}\) for every set \(A \in \mathcal{P}(I)\), or equivalently, that if \(A \notin \mathcal{U}\), then \(A^c \in \mathcal{U}\). Suppose \(A \notin \mathcal{U}\). Consider the following set:

\[\mathcal{F} = \{B \subseteq I \mid A \cap U \subseteq B \text{ for some } U \in \mathcal{U}\}.\]

The set \(\mathcal{F}\) is non-empty and upward closed by construction. If \(B_1, B_2 \in \mathcal{F}\), then there exist sets \(U_1, U_2 \in \mathcal{U}\) such that \(A \cap U_1 \subseteq B_1\) and \(A \cap U_2 \subseteq B_2\), and, since \(\mathcal{U}\) is closed under intersections, \(U_1 \cap U_2 \in \mathcal{U}\) and thus, since \(A \cap (U_1 \cap U_2) \subseteq B_1 \cap B_2\), it follows that \(B_1 \cap B_2 \in \mathcal{F}\). Thus, \(\mathcal{F}\) is a filter.

Furthermore, \(\mathcal{U} \subseteq \mathcal{F}\) since \(A \cap U \subseteq U\) for every \(U \in \mathcal{U}\) . Since \(A \cap U \subseteq A\) for every \(U \in \mathcal{U}\), \(A \in \mathcal{F}\), yet \(A \notin \mathcal{U}\) by assumption, and hence \(\mathcal{U} \ne \mathcal{F}\). Thus, \(\mathcal{U} \subsetneq \mathcal{F}\). Since \(\mathcal{U}\) is maximal with respect to proper filters, it follows that \(\mathcal{F}\) is not proper. Hence \(\emptyset \in \mathcal{F}\) and thus there exists a \(U \in \mathcal{U}\) such that \(A \cap U = \emptyset\), which implies that \(U \subseteq A^c\). Since \(\mathcal{U}\) is upward closed, \(A^c \in \mathcal{U}\), as required. \(\square\)

Existence of Ultrafilters

It does not matter which non-principal ultrafilter we use; we simply require that some non-principal ultrafilter exists for our index set \(I\). Since our metatheory is ZFC, we can invoke the ultrafilter lemma, which is a theorem of ZFC.

Theorem (Ultrafilter Lemma). Every proper filter on a set \(I\) can be extended to an ultrafilter on \(I\).

Given an index set \(I\), we can construct the Fréchet filter on \(I\), which is the proper filter consisting of all cofinite subsets of \(I\):

\[\mathcal{F} = \{A \subseteq I \mid I \setminus A \text{ is finite}\}.\]

The ultrafilter lemma then ensures that there exists an ultrafilter \(\mathcal{U}\) such that \(\mathcal{F} \subseteq \mathcal{U}\). Note that this ultrafilter must be non-principal. If \(\mathcal{U}\) is principal, then it has the form

\[\mathcal{U} = \{A \subseteq I \mid i \in A\}\]

for some \(i \in I\). Since \(\{i\}\) is a finite set, it follows that \(I \setminus \{i\} \in \mathcal{F}\) and thus \(I \setminus \{i\} \in \mathcal{U}\). But \(\{i\} \in \mathcal{U}\), so \(I \setminus \{i\} \cap \{i\} = \emptyset \in \mathcal{U}\). Since \(\mathcal{U}\) is proper by definition, this is a contradiction. Therefore, \(\mathcal{U}\) is non-principal.

Note that this makes the foundations non-constructive, since the ultrafilter lemma relies on the axiom of choice.

Ultrapowers

Ultrafilters can be used to construct ultrapowers. Ultrapowers can be used to generalize the sequences of real numbers discussed previously to sequences of elements of a superstructure.

Definition (Ultrapower). Given an ultrafilter \(\mathcal{U}\) on an index set \(I\) and a set \(X\), the ultrapower of \(X\) modulo \(\mathcal{U}\) is defined as the quotient

\[X^I / \mathcal{U},\]

where

\[X^I = \left\{ f : I \rightarrow X \right\}\]

indicates the set of all sequences of elements of \(X\) indexed by \(I\) (the exponential set), and

\[X^I / \mathcal{U} = X^I / \sim_{\mathcal{U}},\]

where \(\sim_{\mathcal{U}}\) is the equivalence relation defined for each \(f,g \in X^I\) as

\[f \sim_{\mathcal{U}} g \leftrightarrow \{i \in I \mid f(i) = g(i)\} \in \mathcal{U},\]

so that

\[X^I / \mathcal{U} = \left\{[f]_{\mathcal{U}} \mid f \in X^I\right\},\]

where the equivalence class \([f]_{\mathcal{U}}\) is defined as

\[[f]_{\mathcal{U}} = \left\{g \in X^I \mid f \sim_{\mathcal{U}} g \right\}.\]

Note that we often just write \([f]\) for an equivalence class since the ultrafilter is implicit.

The Nonstandard Universe

We can define a membership relation \((\in^*)\) on \({^*}V(A)\) as follows:

\[[f] \in^* [g] \leftrightarrow \{i \in I \mid f(i) \in g(i)\} \in \mathcal{U}.\]

We then define an embedding \(^*(-) : V(A) \rightarrow V(A)^I / \mathcal{U}\) that embeds each element \(x \in V(A)\) into \(V(A)^I / \mathcal{U}\) via the corresponding constant sequence

\[^*x = [c_x]\]

where \(c_x(i) = x\) for all \(i \in I\).

We can now define the non-standard universe.

Definition (Nonstandard Universe). The nonstandard universe is the \(\in\)-structure \((^*V(A), \overline{c} \mapsto {^*}c, \in^*)\) whose domain \(^*V(A)\) is

\[^*V(A) = \bigcup_{n \in \mathbb{N}} (V_n(A)^I / \mathcal{U}),\]

whose constant interpretation is the map \(\overline{c} \mapsto {^*}c\) for all \(\bar{c} \in \Sigma_{V(A)}\), and whose \(\in\)-interpretation is \(\in^*\) restricted to this domain.

We also define the standard universe.

Definition (Standard Universe). The standard universe is the \(\in\)-structure \((V(A), \overline{c} \mapsto c, \in)\) whose domain is the superstructure \(V(A)\), whose constant interpretation is the map \(\overline{c} \mapsto c\) for every \(\overline{c} \in \Sigma_{V(A)}\), and whose \(\in\)-interpretation is \(\in\) restricted to this domain.

The Hyperreals

We can now define the set of hyperreal numbers \({^*}\mathbb{R}\) as follows:

\[{^*}\mathbb{R} = {^*}(\mathbb{R}).\]

Since \(\mathbb{R} \in V(\mathbb{R})\), the natural embedding \({^*}\mathbb{R}\) is simply

\[[c_{\mathbb{R}}]\]

for the constant sequence \(c_{\mathbb{R}}\), i.e., the sequence

\[(\mathbb{R}, \mathbb{R}, \mathbb{R}, \dots).\]

In the non-standard universe, we use \(\in^*\) as the membership relation. Thus, \(R \in^* {^*}\mathbb{R}\) means that \(R = [r]\) for some \(r \in V(\mathbb{R})^I\) and \(r(i) \in \mathbb{R}\) for all \(i \in I\). In other words, each hyperreal number is an equivalence class of sequences of real numbers (externally speaking).

Łos's Theorem (The Transfer Principle)

Theorem (Łos - Transfer Principle). For any formula \(\varphi(x_1, \dots, x_n) \in \mathcal{L}_{V(A)}\) with free variables among \(x_1, \dots, x_n \in \mathcal{V}\), for any elements \([f_1], \dots, [f_n] \in {^*}V(A)\) and any variable assignments \(\sigma_1 : \mathcal{V} \rightarrow {^*}V(A)\) and \(\sigma_2 : \mathcal{V} \rightarrow V(A)\),

\[\left(^*V(A), \overline{c} \mapsto {^*}c, \in^* \right), \sigma_1 \left[[f_1]/x_1, \dots, [f_n]/x_n \right] \models \varphi \left(x_1, \dots, x_n \right)\] \[\Leftrightarrow \left\{i \in I \;\middle|\; \left(V(A), \overline{c} \mapsto c, \in \right), \sigma_2 \left[f_1(i)/x_1, \dots, f_n(i)/x_n \right] \models \varphi \left(x_1, \dots, x_n\right)\right\} \in \mathcal{U}.\]

In other words, a statement about a collection of non-standard elements is true in the non-standard universe if and only if it is true in a qualified majority of worlds.

Proof. We proceed by structural induction on the formula \(\varphi\). We define the following abbreviations:

\[\sigma^* = \sigma_1\left[[f_1]/x_1, \dots, [f_n]/x_n\right]\]

\[\sigma_i = \sigma_2\left[f_1(i)/x_1, \dots, f_n(i)/x_n\right].\]

1. Base Case: Atomic Formulas

  • \(\varphi\) is \(x_j = x_k\) for variables \(x_j, x_k\) and indices \(j, k \in 1,\dots,n\). Note that \begin{align*}(^*V(A), \overline{c} \mapsto {^*}c, \in^*), \sigma^* \models x_j = x_k &\Leftrightarrow [f_j] = [f_k] \\&\Leftrightarrow \{i \in I \mid f_j(i) = f_k(i) \} \in \mathcal{U} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models x_j = x_k\} \in \mathcal{U}.\end{align*}
  • \(\varphi\) is \(\overline{c_1} = \overline{c_2}\) for some constants \(\overline{c_1}, \overline{c_2}\). Note that \begin{align*}(^*V(A), \overline{c} \mapsto {^*}c, \in^*), \sigma^* \models \overline{c_1} = \overline{c_2} &\Leftrightarrow {^*}c_1 = {^*}c_2 \\&\Leftrightarrow \{i \in I \mid c_1 = c_2 \} \in \mathcal{U} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \overline{c_1} = \overline{c_2}\} \in \mathcal{U}.\end{align*}
  • \(\varphi\) is \(x_j \in x_k\) for variables \(x_j,x_k\) and indices \(j,k \in 1, \dots, n\). Note that \begin{align*}(^*V(A), \overline{c} \mapsto {^*}c, \in^*),\sigma^* \models x_j \in x_k &\Leftrightarrow [f_j] \in^* [f_k] \\&\Leftrightarrow \{i \in I \mid f_j(i) \in f_k(i)\} \in \mathcal{U} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in),\sigma_i \models x_j \in x_k\}.\end{align*}
  • \(\varphi\) is \(\overline{c_1} \in \overline{c_2}\) for constants \(\overline{c_1}, \overline{c_2}\). Note that \begin{align*}(^*V(A), \overline{c} \mapsto {^*}c, \in^*),\sigma^* \models \overline{c_1} \in \overline{c_2}&\Leftrightarrow {^*}c_1 \in^* {^*}c_2 \\&\Leftrightarrow \{i \in I \mid c_1\in c_2\} \in \mathcal{U} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in),\sigma_i \models \overline{c_1} \in \overline{c_2}\}.\end{align*}
  • Cases involving a mixture of constants and variables are similar.

2. Connectives

  • \(\varphi\) is \(\neg \psi\). Note that \begin{align*}(^*V(A), \overline{c} \mapsto {^*}c, \in^*), \sigma^* \models \neg \psi & \Leftrightarrow (^*V(A), \overline{c} \mapsto {^*}c, \in^*), \sigma^* \not \models \psi \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \psi\} \notin \mathcal{U} & \text{(Inductive Hypothesis)} \\&\Leftrightarrow I \setminus \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \psi\} \in \mathcal{U} & \text{(Bivalence)} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \not \models \psi\} \in \mathcal{U} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \neg \psi\} \in \mathcal{U}.\end{align*}
  • \(\varphi\) is \(\psi \land \theta\). Note that \begin{align*}(^*V(A), \overline{c} \mapsto {^*}c, \in^*), \sigma^* \models \psi \land \theta &\Leftrightarrow (^*V(A), \overline{c} \mapsto {^*}c, \in^*), \sigma^* \models \psi \text{ and } (^*V(A), \overline{c} \mapsto {^*}c, \in^*), \sigma^* \models \theta \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \psi \} \in \mathcal{U} \text{ and } \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \theta \} \in \mathcal{U} & \text{(Inductive Hypothesis)} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \psi \text{ and } (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \theta \} \in \mathcal{U} & \text{(Intersection Closure)} \\&\Leftrightarrow \{i \in I \mid (V(A), \overline{c} \mapsto c, \in), \sigma_i \models \psi \land \theta \} \in \mathcal{U}.\end{align*}

3. Bounded Quantifiers

\(\varphi\) is \((\exists x \in x_k)\psi\).

  • \((\Rightarrow)\). Suppose that \((^*V(A), \overline{c} \mapsto {^*}c, \in^*) \models (\exists x \in x_k) \psi\). By definition, there exists an element \([g] \in \sigma^*(x_k)\) (i.e., \([g] \in [f_k]\)) such that \((^*V(A), \overline{c} \mapsto {^*}c, \in^*),\sigma^*\left[[g]/x\right] \models \psi\). Define \[B_1 = \{i \in I \mid g(i) \in f_k(i)\}\] and \[B_2 = \{i \in I \mid (V(A), \overline{c} \mapsto c, \in),\sigma_i\left[g(i)/x\right] \models \psi\}.\] By the inductive hypothesis for \(x \in x_k\), \(B_1 \in \mathcal{U}\), and by the inductive hypothesis for \(\psi\), \(B_2 \in \mathcal{U}\). Since \(\mathcal{U}\) is closed under intersections, \(B_1 \cap B_2 \in \mathcal{U}\). Thus, since \(\mathcal{U}\) is upward closed, \begin{align*}B_1 \cap B_2 &= \{i \in I \mid g(i) \in f_k(i) \text{ and } (V(A), \overline{c} \mapsto c, \in),\sigma_i\left[g(i)/x\right] \models \psi\} \\&\subseteq \{i \in I \mid a \in f_k(i) \text{ and } (V(A),\overline{c} \mapsto c, \in),\sigma_i\left[a/x\right] \models \psi \text{ for some } a \in f_k(i)\} \\&= \{i \in I \mid (V(A), \overline{c} \mapsto c, \in),\sigma_i \models (\exists x \in x_k)\psi\} \in \mathcal{U}.\end{align*}
  • \((\Leftarrow)\). Suppose \[W = \{i \in I \mid (V(A), \overline{c} \mapsto c, \in),\sigma_i \models (\exists x \in x_k)\psi\} \in \mathcal{U}.\] We define a family of sets indexed by \(I\) for any non-empty dummy set \(X \in V(A)\) as follows: \[Y_i = \begin{cases}\{y \in f_k(i) \mid (V(A),\overline{c} \mapsto c, \in),\sigma_i[y/x] \models \psi\} & \text{if } i \in W \\ X & \text{if } i \notin W\end{cases}.\] Since \(Y_i\) is non-empty for all \(i \in I\), we apply the Axiom of Choice in the metatheory to produce a choice function \(g : I \rightarrow \cup_{i \in I}Y_i\) such that \(g(i) \in Y_i\) for all \(i \in I\). By construction, \(g(i) \in f_k(i)\) for all \(i \in W\). Since \([f_k] \in {^*}V(A)\), by construction, \([f_k] \in^* V_m(A)^I/\mathcal{U}\) for some \(m \in \mathbb{N}\). Thus, \(g(i) \in f_k(i) \in V_m(A)\) for all \(i \in W\), which implies that \(g(i) \in V_{m-1}(A)\) and \([g] \in^* V_m(A)^I/\mathcal{U} \subseteq {^*}V(A)\). Since \(W \in \mathcal{U}\), by definition, \([g] \in^* [f_k]\). Since \(g(i) \in Y_i\) for all \(i \in W\), by inductive hypothesis, \[({^*}V(A), \overline{c} \mapsto {^*}c, \in^*),\sigma^* \models \psi.\] Together with \([g] \in^* [f_k]\), this implies that \[({^*}V(A), \overline{c} \mapsto {^*}c, \in^*),\sigma^* \models (\exists x \in x_k)\psi.\]

\(\square\)

Transfer from Reals to Hyperreals

Next, we'll discuss a few applications of the transfer principle.

Recall that we defined the natural embedding of a standard set \(x \in V(A)\) as the set \(^*x\) which is the image of \(x\) under the map \(^*(-) : V(A) \rightarrow V(A)^I/\mathcal{U}\) defined as

\[^*x = [c_x]_{\mathcal{U}},\]

where \(c_x : I \rightarrow V(A)\) is the constant sequence defined as \(c(i) = x\) for all \(i \in I\).

Let \(\overline{\mathbb{R}} \in \Sigma_{V(\mathbb{R})}\) denote the constant symbol corresponding to \(\mathbb{R} \in V(\mathbb{R})\) and let \(\overline{0}\) denote the constant symbol corresponding to \(0 \in V(\mathbb{R})\). Consider the following field axiom expressing the existence of additive inverses for real numbers:

\[\varphi = \forall x \in \overline{\mathbb{R}} \exists y \in \overline{\mathbb{R}} (x + y = \bar{0}).\]

Since this is true of the real numbers, it follows that

\[(V(\mathbb{R}), \overline{c} \mapsto c, \in) \models \varphi.\]

Since \(\varphi\) is a closed formula satisfied by this \(\in\)-structure, it is automatically the case for every variable assignment \(\sigma : \mathcal{V} \rightarrow V(\mathbb{R})\) that

\[\{i \in I \mid (V(\mathbb{R}), \overline{c} \mapsto c, \in),\sigma \models \varphi\} = I \in \mathcal{U}.\]

Thus, invoking the transfer principle, we obtain

\[(^*V(\mathbb{R}), \overline{c} \mapsto {^*}c, \in^*) \models \varphi.\]

By the definition of satisfaction, this means that, for all \(x \in {^*}\mathbb{R}\), there exists a \(y \in {^*}\mathbb{R}\) such that \(x +^* y = {^*}0\). The expression \(x +^* y = {^*}0\) is shorthand for a rather long expression expressed in terms of \(\in^*\).

Often, this is written in a notational shorthand as follows:

\[\forall x \in {^*}\mathbb{R} \exists y \in {^*}\mathbb{R} (x +^* y = {^*}0).\]

However, note that this informal shorthand is not a sentence in the formal first-order language \(\mathcal{L}_{V(\mathbb{R})}\) or in any formal first-order language; it is just the usual informal shorthand employed in mathematics.

This same demonstration works for all the axioms of an ordered field. Thus, \(^*\mathbb{R}\) is an ordered field, just like \(\mathbb{R}\). This is a very powerful result, since the field \(^*\mathbb{R}\) is substantially expanded (it contains infinitesimals, etc.). We may thus manipulate the hyperreal numbers according to the same algebraic rules as the real numbers.

Transfer from Hyperreals to Reals

Next, we'll indicate an example of transfer from the hyperreals to the reals. Along the way, we will introduce some key concepts.

Infinitesimals

We can now define the infinitesimal hyperreal numbers.

Definition (Infinitesimal). An infinitesimal is an element of the set \(\mathrm{Inf}(^*\mathbb{R})\) defined as follows:

\[\mathrm{Inf}({^*}\mathbb{R}) = \{x \in {^*}\mathbb{R} \mid \lvert x \rvert \lt^* {^*}r \text{ for all } r \in \mathbb{R}^+\}.\]

Thus, infinitesimals are those hyperreal numbers whose "size" (absolute value) is strictly less than all standard positive real numbers. Note that \({^*}0 \in \mathrm{Inf}({^*}\mathbb{R})\); we are generally interested in non-zero infinitesimals.

We can define an equivalence relation on hyperreal numbers using infinitesimals.

Note: we will often suppress the \({^*}\) notation when it is clear that we are referring to an element of the nonstandard universe. We typically suppress this for relations and operations but retain it for sets and numbers.

Definition (Infinitesimal Closeness). Two hyperreal numbers \(x,y \in {^*}\mathbb{R}\) are infinitesimally close, written \(x \approx y\), if they differ by an infinitesimal, namely

\[x \approx y \text{ if and only if } \lvert x - y \rvert \in \mathrm{Inf}({^*}\mathbb{R}).\]

Next, we will show that this is an equivalence relation. It is clearly reflexive and symmetric, so we will show that it is also transitive. Suppose \(x \approx y\) and \(y \approx z\). By definition, for any real number \(r \gt 0\), since \(\frac{r}{2} \gt 0\), it follows that \(\lvert x - y \rvert \lt^* {^*}\frac{r}{2}\) and \(\lvert y - z \rvert \lt^* {^*}\frac{r}{2}\). By the triangle inequality on \(\mathbb{R}\) (which is expressible in \(\mathcal{L}_{V(\mathbb{R})}\) and therefore transfers to \({^*}\mathbb{R}\)),

\begin{align*}\lvert x - z \rvert &= \lvert (x - y) + (y - z) \rvert \\&\le \lvert x - y \rvert + \lvert y - z \rvert \\&\lt {^*}\frac{r}{2} + {^*}\frac{r}{2} \\&= {^*}r.\end{align*}

Thus, since \(r\) was an arbitrary positive real number, \(\lvert x - z \rvert \in \mathrm{Inf}({^*}\mathbb{R})\), i.e., \(x \approx z\).

Definition (Finite Hyperreal). A hyperreal number \(x\) is finite if there exists a standard real number \(r \gt 0\) such that \(\lvert x \rvert \lt {^*}r\).

Every finite hyperreal number has a unique standard part, i.e., a unique real number that is infinitesimally closest to it. Let \(x\) be any finite hyperreal. Consider the following set (which is analogous to a Dedekind cut):

\[L = \{r \in \mathbb{R} \mid {^*}r \lt x\}.\]

Since \(x\) is finite, \(L\) has a standard real upper bound and is non-empty. By the completeness of the real numbers, \(L\) has a least upper bound \(c = \sup (L) \in \mathbb{R}\). Suppose \(x \not \approx c\). Then, by definition, there exists some real number \(\varepsilon\) such that \(\lvert x - c \rvert \ge \varepsilon\). If \(x - c \ge \varepsilon\), then \(c + \varepsilon \le x\), contradicting the fact that \(c\) is an upper bound of \(L\). If \(c - x \ge \varepsilon\), then \(x \le c - \varepsilon\), contradicting the fact that \(c\) is the least upper bound of \(L\). Thus, \(x \approx c\). Since suprema are unique, \(c\) is therefore unique.

Definition (Standard Part). The standard part of a finite hyperreal number \(x\) is the unique real number \(\mathrm{st}(x)\) defined as follows:

\[\mathrm{st}(x) = \sup \{r \in \mathbb{R} \mid {^*}r \lt x\}.\]

We can also show that every finite hyperreal number is decomposable into a standard part and an infinitesimal.

Theorem. For every finite hyperreal number \(x\), \(x \approx {^*}\mathrm{st}(x)\).

Proof. By definition, \(\mathrm{st}(x) = \sup L\) where \(L = \{r \in \mathbb{R} \mid {^*}r \lt x\}\). This means that, for any real number \(\varepsilon\), there exists an \(r \in L\) such that \(\lvert \mathrm{st}(x) - r \rvert \lt \varepsilon\). By transfer, \(\lvert {^*}\mathrm{st}(x) - {^*}r \rvert \lt {^*}\varepsilon\). Since \(r \in L\), it follows that \({^*}r \lt x\), and thus \(\lvert {^*}\mathrm{st}(x) - x \rvert \lt {^*}\varepsilon\). Since \(\varepsilon\) was arbitrary, \(\lvert {^*}\mathrm{st}(x) - x \rvert \in \mathrm{Inf}({^*}\mathbb{R})\) and so \({^*}\mathrm{st}(x) \approx x\).

Theorem (Decomposition). Every finite hyperreal number \(x\) is uniquely decomposable into a sum \({^*}\mathrm{st}(x) + \delta\), where \(\delta\) is an infinitesimal.

Proof. Define \(\delta = x - {^*}\mathrm{st}(x)\). Then \(x = {^*}\mathrm{st}(x) + \delta\) and \(\delta\) is infinitesimal since \(x \approx {^*}\mathrm{st}(x)\). Now suppose that \(x = {^*}r_1 + \delta_1\) and \(x = {^*}r_2 + \delta_2\) for real numbers \(r_1, r_2\) and infinitesimals \(\delta_1, \delta_2\). Then \({^*}r_1 - {^*}r_2 = \delta_2 - \delta_1\) and thus, by the linearity of the natural embedding, \({^*}(r_1 - r_2) = \delta_2 - \delta_1\). Therefore, \({^*}(r_1 - r_2)\) is infinitesimal, which implies that \(r_1 - r_2 = 0\). This then implies that \(\delta_2 - \delta_1 = 0\). Thus, \(r_1 = r_2\) and \(\delta_1 = \delta_2\). \(\square\)

Theorem. If \(x \approx y\) then \(\mathrm{st}(x) = \mathrm{st}(y)\).

Proof. Since \({^*}\mathrm{st}(x) \approx x\) and \(x \approx y\) and \(y \approx {^*}\mathrm{st}(x)\), by transitivity, \({^*}\mathrm{st}(x) \approx {^*}\mathrm{st}(x)\). Thus, \(\lvert {^*}\mathrm{st}(x) - {^*}\mathrm{st}(y) \rvert \lt r\) for all real numbers \(r \gt 0\) which, by linearity, implies that \(\lvert {^*}(\mathrm{st}(x) - \mathrm{st}(y))\rvert \lt r\). This implies that \(\mathrm{st}(x) - \mathrm{st}(y) = 0\), since, otherwise, it would follow, for instance, that \(\mathrm{st}(x) - \mathrm{st}(y) \lt (\mathrm{st}(x) - \mathrm{st}(y)) / 2\) for instance (since \(r \gt 0\) is arbitrary). Thus \(\mathrm{st}(x) = \mathrm{st}(y)\). \(\square\)

One fact that we will use is the following: if a finite hyperreal number \(x\) lies within a compact interval \({^*}[0, 1]\), then its standard part lies in the interval \([0,1]\). To see this, suppose that \(x \in {^*}[0, 1]\). Suppose \(c = \mathrm{st}(x) \gt 1\). Then \(x \lt {^*}1 \lt {^*}c\) and \(\varepsilon = c - 1 \gt 0\). Since \(c \approx x\), \(\lvert x - {^*}c \rvert \lt {^*}\varepsilon\), and thus, since \(x \lt {^*}c\), \(x \gt {^*}c - {^*}\varepsilon = {^*}c - ({^*}c - {^*}1) = {^*}1\), contradicting the assumption that \(x \lt {^*}1\). Suppose \(c \lt 0\). Then \(x \gt {^*}0 \gt {^*}c\) and \(\varepsilon = 0 - c \gt 0\). Since \(c \approx x\), \(\lvert x - {^*}c \rvert \lt {^*}\varepsilon\), and thus, since \(x \gt {^*}c\), \(x \lt {^*}c + {^*}\varepsilon = {^*}c - {^*}c = 0\), contradicting the assumption that \(x \gt {^*}0\). Thus, if \(x \in {^*}[0,1]\), then \(\mathrm{st}(x) \in [0, 1]\). This argument immediately generalizes to any compact interval.

Next, we will describe the monad of a hyperreal number, which is, roughly speaking, the "infinitesimal halo" of the number, i.e., the set of all hyperreals infinitesimally close to it.

Definition (Monad). The monad of a standard real number \(c\) is the following set:

\[\mathrm{Monad}(c) = \{x \in {^*}\mathbb{R} \mid x \approx {^*}c\}.\]

We can now state the definition of continuity in terms of infinitesimals.

Definition (Continuity). A (standard) function \(f : I \rightarrow \mathbb{R}\) is continuous at a point \(c \in I\) if, for every hyperreal number \(x\),

\[x \approx {^*}c \Rightarrow ({^*}f)(x) \approx {^*}(f(c)).\]

Next, we will demonstrate another example of transfer from the reals to the hyperreals: we will show that if a function is continuous in the standard sense, then it is continuous in the infinitesimal sense.

Theorem. If \(f : [0, 1] \rightarrow \mathbb{R}\) is continuous (in the standard sense), then \({^*}f : {^*}[0, 1] \rightarrow {^*}\mathbb{R}\) is continuous (in the hyperreal sense).

Proof.

Let \(f : [0,1] \rightarrow \mathbb{R}\) be any (standard) continuous function and let \(c \in [0,1]\) be any point. Let \([X] \in {^*}[0,1]\) be any hyperreal number and suppose that \([X] \approx {^*}c\). The open formula in \(\mathcal{L}_{V(\mathbb{R})}\) expressing continuity (in the standard definition) at the point \(c\) is as follows:

\[\varphi(x, \varepsilon, \delta) = \lvert x - \overline{c} \rvert \lt \delta \rightarrow \lvert \overline{f}(x) - \overline{f}(\overline{c}) \rvert \lt \varepsilon.\]

The free variables are \(x, \varepsilon, \delta\). The constant symbols are \(\overline{f}, \overline{c}\). Since \(f\) is continuous, it follows that, for any variable assignment \(\sigma : \mathcal{V} \rightarrow V(A)\) and any real number \(\varepsilon_0 \gt 0\), there exists a \(\delta_0 \gt 0\) such that, for all \(i \in I\),

\[(V(\mathbb{R}), \overline{c} \mapsto c, \in),\sigma[X(i)/x, \varepsilon_0/\varepsilon, \delta_0/\delta] \models \varphi.\]

Then, by utilizing the natural embedding for each value in our variable assignment and invoking the transfer principle, we obtain (for any variable assignment \(\sigma : \mathcal{V} \rightarrow {^*}V(\mathbb{R})\))

\[({^*}V(\mathbb{R}), \overline{c} \mapsto {^*}c, \in^*),\sigma[[X]/x, {^*}\varepsilon_0/\varepsilon, {^*}\delta_0/\delta] \models \varphi.\]

By definition of satisfaction, this means that if \(\lvert [X] - {^*}c \rvert \lt {^*}\delta_0\), then \(\lvert ({^*}f)([X]) - ({^*}f)({^*}c) \rvert \lt {^*}\varepsilon_0\).

Now, since \([X] \approx {^*}c\), by definition, \(\lvert [X] - {^*}c\rvert \lt {^*}r\) for all standard real number \(r\) and thus in particular \(\lvert [X] - {^*}c\rvert \lt {^*}\delta_0\). Thus, it follows that

\[\lvert ({^*}f)([X]) - ({^*}f)({^*}c) \rvert \lt {^*}\varepsilon_0.\]

Since \(\varepsilon\) was an arbitrary positive real number, it follows that

\[({^*}f)([X]) \approx ({^*}f)({^*}c).\]

Thus, \(f\) is continuous is the hyperreal sense. \(\square\)

We are now prepared to demonstrate an example of transfer from the hyperreals to the reals.

Theorem. If \(f : [0, 1] \rightarrow \mathbb{R}\) is continuous, then it is also uniformly continuous.

Proof. By the previous theorem, since \(f\) is continuous in the standard sense, \({^*}f\) is continuous in the hyperreal sense. Let \(\varepsilon_0\) be any real number such that \(\varepsilon_0 \gt 0\). Let \(\delta\) be any infinitesimal, let \(x,y\) be any hyperreal numbers, and suppose that \(\lvert x - y \rvert \lt \delta\). Then, by definition, \(x \approx y\) and, since \({^*}f\) is continuous, \(({^*}f)(x) \approx ({^*}f)(y)\) which means that \(\lvert ({^*}f)(x) - ({^*}f)(y) \rvert \in \mathrm{Inf}({^*}\mathbb{R})\) and thus, by definition, \(\lvert ({^*}f)(x) - ({^*}f)(y) \rvert \lt {^*}\varepsilon_0\). We have thus established that

\[({^*}V(\mathbb{R}), \overline{c} \mapsto {^*}c, \in^*),\sigma[{^*}\varepsilon_0/\varepsilon] \models \varphi\]

where

\[\varphi(\varepsilon) = \exists \delta \in \overline{\mathbb{R}^+}\forall x \in \overline{\mathbb{R}} \forall y \in \overline{\mathbb{R}}(\lvert x - y \rvert \lt \delta \rightarrow \lvert \overline{f}(x) - \overline{f}(y) \rvert \lt \varepsilon).\]

Note that \(\delta\) was an arbitrary infinitesimal, not an arbitrary hyperreal, so even though there are infinitely many candidate values, the domain of quantification is still a proper subset of the hyperreals, so it is an existential quantification. The variable \(\varepsilon\) is free; we are quantifying (in the metatheory) over \(\mathbb{R}\) and applying transfer for an arbitrary value of \(\varepsilon\).

Now, by transfer, since \({^*}\varepsilon_o = [c_{\varepsilon_0}]\) (where \(c_{\varepsilon_0}(i) = \varepsilon_o\) for all \(i \in I\)), it follows that

\[\{i \in I \mid (V(\mathbb{R}), \overline{c} \mapsto c, \in),\sigma[c_{\varepsilon_0}(i)/\varepsilon] \models \varphi\} \in \mathcal{U}.\]

Now, since \(\emptyset \notin \mathcal{U}\), it follows that there exists some index \(i \in I\) such that

\[(V(\mathbb{R}), \overline{c} \mapsto c, \in),\sigma[c_{\varepsilon_0}(i)/\varepsilon] \models \varphi,\]

namely

\[(V(\mathbb{R}), \overline{c} \mapsto c, \in),\sigma[\varepsilon_0/\varepsilon] \models \varphi.\]

Since \(\varepsilon_o\) was arbitrary, this shows that

\[(V(\mathbb{R}), \overline{c} \mapsto c, \in)\models (\forall \varepsilon \in \mathbb{R}^+) \varphi.\]

Thus, \(f\) is uniformly continuous. \(\square\)

Hypernatural Numbers

We can also define the hypernatural numbers as follows:

Definition (Hypernatural Numbers). The set of hypernatural numbers \({^*}\mathbb{N}\) is the image of \(\mathbb{N}\) under the natural embedding, i.e.,

\[{^*}\mathbb{N} = {^*}(\mathbb{N}).\]

Just as with the hyperreals, this implies that (from an external perspective), hypernatural numbers are sequences of natural numbers, indexed by the index set \(I\).

We can construct infinite hypernatural numbers by using unbounded sequences. If \(I = \mathbb{N}\), then the sequence

\[(1,2,3,4,\dots)\]

is unbounded, so there is an equivalence class (hypernatural number)

\[\omega = [(1, 2, 3, 4, \dots)]\]

and thus for any \(N \in \mathbb{N}\), the set

\[\{i \in \mathbb{N} \mid \omega(i) \lt N\}\]

is finite, so it follows that

\[\{i \in \mathbb{N} \mid \omega(i) \ge N\} \in \mathcal{U}.\]

Since \(N\) was arbitrary, \(\omega\) is a hypernatural number that is greater than any standard hypernatural number (i.e., the embedding \({^*}n\) of a natural number \(n\)).

Example

To demonstrate the utility of the hypernatural numbers, we will show how they permit us to represent Riemann integrals as actual (internally) finite sums (i.e. hyperfinite sums).

Let \(f : [0, 1] \rightarrow \mathbb{R}\) be any continuous function and suppose that

\[\int_0^1 f(x) ~dx = s.\]

It follows that

\[\lim_{n \to \infty}S(n) = \int_0^1 f(x) ~dx,\]

where \(S : \mathbb{N} \rightarrow \mathbb{R}\) computes the \(n\)-th partial right Riemann sum:

\[S(n) = \sum_{i=1}^n f\left(\frac{i}{n}\right)\frac{1}{n}.\]

We can embed the function \(S\) into the nonstandard universe to produce a function \({^*}S : {^*}\mathbb{N} \rightarrow {^*}\mathbb{R}\).

We can choose any infinite hypernatural number, such as the hypernatural number \(\omega\) we defined earlier. We can evaluate our function at this number:

\[{^*}S(\omega) = \sum_{i=1}^{\omega}f\left(\frac{i}{\omega}\right)\frac{1}{\omega}.\]

This is a perfectly valid expression. Note, however, that the bound is an infinite hypernatural number, the expression \(\frac{1}{\omega}\) is an infinitesimal.

Recall that \({^*}S(\omega) = v\) is shorthand for \((\omega, v) \in^* {^*}S\), which is again shorthand, and this ultimately implies (from an external perspective) that

\[{^*}S(\omega) = \left[(S(1), S(2), S(3), \dots, S(k), \dots)\right]_{\mathcal{U}}.\]

In other words, it represents a sequence of increasingly finer Riemann sums:

\[{^*}S(\omega) = \left[\left(\sum_{i=1}^{1}f\left(\frac{i}{1}\right)\frac{1}{1}, \sum_{i=1}^{2}f\left(\frac{i}{2}\right)\frac{1}{2}, \sum_{i=1}^{3}f\left(\frac{i}{3}\right)\frac{1}{3}, \dots, \sum_{i=1}^{k}f\left(\frac{i}{k}\right)\frac{1}{k}, \dots \right)\right]_{\mathcal{U}}.\]

Now, in the standard universe, it holds for all \(\varepsilon_0 \in \mathbb{R}^+\) that

\[(V(\mathbb{R}), \overline{c} \mapsto c, \in),\sigma[\varepsilon_0/\varepsilon] \models \varphi\]

where

\[\varphi(\varepsilon) = \exists K \in \mathbb{N} \forall k \in \mathbb{N} (k \gt K \rightarrow \lvert S(k) - \overline{s}\rvert \lt \varepsilon).\]

This formula expresses the convergence of the sequence of Riemann sums. Since

\[\{i \in \mathbb{N} \mid (V(\mathbb{R}), \overline{c} \mapsto c, \in),\sigma[\varepsilon_0/\varepsilon] \models \varphi\} = \mathbb{N} \in \mathcal{U},\]

by the transfer principle, it follows that

\[({^*}V(\mathbb{R}), \overline{c} \mapsto {^*}c, \in^*),\sigma[{^*}\varepsilon_0/\varepsilon] \models \varphi).\]

By definition, this means that there is some \(K \in {^*}\mathbb{N}\) such that, for all \(k \in {^*}\mathbb{N}\), if \(k \gt K\), then \(\lvert ({^*}S)(k) - {^*}s \rvert \lt {^*}\varepsilon_o\). In particular, we may choose \(k = \omega\), which means that

\[\lvert ({^*}S)(\omega) - {^*}s \rvert \lt {^*}\varepsilon_0.\]

Since \(\varepsilon_0\) was arbitrary, this means that \(\lvert ({^*}S)(\omega) - s \rvert \in \mathrm{Inf}({^*}\mathbb{R})\) and thus

\[({^*}S)(\omega) \approx {^*}s.\]

Applying our theorem about standard parts, this implies that

\[\mathrm{st}(({^*}S)(\omega)) = \mathrm{st}({^*}s) = s,\]

i.e., that

\[\mathrm{st}\left(\sum_{i = {^*}1}^{\omega}f\left(\frac{i}{\omega}\right)\frac{1}{\omega}\right) = \int_0^1 f(x)~dx.\]

Thus, the standard part of the hyperfinite sum computes the Riemann integral exactly.

Internal vs. External Sets

Implicit in the discussion so far is the distinction between internal and external sets. An internal set is simply an element of \({^*}V(A)\). An external set is a subset of \({^*}V(A)\) which is not internal. Only internal sets obey the transfer principle.

Conclusion

The hyperreal numbers provide several powerful tools. Infinitesimals allows us to employ the intuitive concept of infinitely small quantities in a fully rigorous fashion. The hypernatural numbers, combined with infinitesimals, provide a bridge between the discrete and continuous worlds and allow us to employ discrete reasoning to continuous concepts.