Why quaternions?

Points [vectors] in the plane can be represented using complex numbers, and then can be “multiplied”, i.e. can be given the structure of an \mathbb{R}-algebra. Conversely, complex numbers can be viewed as geometric objects, and operations on them given geometric meaning, by the same construction.
Is there an analogue of this for points in 3-space? Hamilton tried to find such an analogue, and was led to the quaternions. There is a (possibly apocryphal) story of how his son asked him every morning: “Well, Papa, can you multiply triplets?” and always got the same answer: “No, I can only add and subtract them”, with a sad shake of the head. What Hamilton eventually realized was that this is indeed not quite possible … unless we embed the triplets in a 4-dimensional algebra, the quaternions. He had this realization on a morning walk and inscribed the defining relations i^2 = j^2 = k^2 = ijk = -1 on Brougham Bridge.
Quaternions give a concise way of representing rotations of three- and four-dimensional space. They have the technical advantage that the unit quaternions form the simply connected (double) cover of SO(3). For this reason, quaternions are used in computer graphics (Tomb Raider is often cited as the first mass-market computer game to have used quaternions to achieve smooth 3D rotation), control theory, signal processing, attitude control, physics, bioinformatics, and orbital mechanics. For example, it is common for spacecraft attitude-control systems to be commanded in terms of quaternions.
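As a minimal sketch of how unit quaternions act on 3-space (plain Python; the helper names and the (w, x, y, z) storage convention are our own choices): a rotation by angle \theta about a unit axis u corresponds to q = \cos(\theta/2) + \sin(\theta/2)(u_x i + u_y j + u_z k), which acts on a vector v, viewed as a pure quaternion, by v \mapsto q v \bar{q}. Since q and -q give the same rotation, this is the double cover of SO(3) in action.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z), i.e. w + xi + yj + zk."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def conj(q):
    """Quaternion conjugate; for a unit quaternion this is also its inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotation_quaternion(axis, angle):
    """Unit quaternion for a rotation by `angle` radians about the (nonzero) vector `axis`."""
    ax, ay, az = axis
    s = math.sin(angle / 2) / math.sqrt(ax * ax + ay * ay + az * az)
    return (math.cos(angle / 2), ax * s, ay * s, az * s)

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q, via v -> q v conj(q)."""
    _, x, y, z = qmul(qmul(q, (0.0, *v)), conj(q))
    return (x, y, z)

# Hamilton's relations i^2 = j^2 = k^2 = ijk = -1
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(i, i) == qmul(j, j) == qmul(k, k) == qmul(qmul(i, j), k) == (-1, 0, 0, 0)

# A quarter turn about the z-axis sends (1, 0, 0) to (0, 1, 0), up to rounding
print(rotate(rotation_quaternion((0, 0, 1), math.pi / 2), (1.0, 0.0, 0.0)))
```

Composing rotations is just multiplying quaternions, which is part of why interpolation between orientations behaves so smoothly.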
Quaternion [algebra]s have received another boost from number theory because of their relation to quadratic forms.
Nevertheless, quaternions remain relatively obscure outside of, and even within large parts of, mathematics. Today, vector analysis (which was in fact born out of the subsequent development of quaternions) is used for the mathematics and physics that could be done with quaternions. Or, to steal Simon L. Altmann’s words,
“… quaternions appear to exude an air of nineteenth century decay, as a rather unsuccessful species in the struggle-for-life of mathematical ideas. Mathematicians, admittedly, still keep a warm place in their hearts for the remarkable algebraic properties of quaternions but, alas, such enthusiasm means little to the harder-headed physical scientist.”  (1986.)


  • Wikipedia’s “History of quaternions” and “Quaternion” articles
  • (see also) John Conway and Derek Smith, On Quaternions and Octonions: Their Geometry, Arithmetic, and Symmetry

A (small) regularity zoo


[Image map: a diagram of function spaces, labeled Hölder, C^k, Schwartz, Sobolev, L^p, Hardy, Orlicz.]


Some general (imprecise) notes:

  • Function spaces in the left two columns are defined by regularity conditions
  • Those in the right three columns are defined by integrability conditions (BMO, the space of functions with bounded mean oscillation, is the dual of the Hardy space H^1)
  • Sobolev spaces W^{k,p} are defined by some mix of integrability and regularity conditions.
  • Most of the spaces above are Banach spaces; some (e.g. L^2) are Hilbert spaces.
  • Functions in any of the above spaces are (a fortiori) measurable.
  • Sobolev spaces embed into C^k / Hoelder and L^p spaces; the precise statements can get tricky.
  • This map is in no way comprehensive; it only sketches some links between some common families.

A functional analysis primer

(Again, no proofs–see e.g. Stein and Shakarchi’s Real Analysis and Functional Analysis.)

Function spaces

Functional analysis deals with spaces of functions. Typically these are infinite-dimensional vector spaces with a norm (Banach spaces) or, even better, an inner product (Hilbert spaces), where the additional structure has good analytic properties, namely that the [induced] norm is complete.

If we do not assume completeness, we get pre-Banach or pre-Hilbert spaces; these can be completed to Banach or Hilbert spaces, and the completion is unique up to isomorphism.

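The prototypical infinite-dimensional Hilbert space is the space of square-summable sequences \ell^2(\mathbb{Z}), with the inner product \langle (a_n), (b_n) \rangle = \sum_n a_n \overline{b_n}; in fact, this is universal for separable infinite-dimensional Hilbert spaces. Any such space, e.g. L^2(X,\mu) for X \subset \mathbb{R}^d with Lebesgue measure (with the inner product \langle f, g \rangle = \int_X f\overline{g} \,d\mu), is unitarily equivalent to \ell^2 via the map sending a vector to its coefficients in an orthonormal basis; for L^2 of the circle, this map is the Fourier transform (taking Fourier coefficients).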

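It is less clear what a universal Banach space should be (though by the Banach–Mazur theorem, every separable Banach space embeds isometrically into C([0,1])), but the L^p spaces (with 1 \leq p \leq \infty) are good prototypical examples.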

Many other function spaces of interest are also Banach spaces.

Dual spaces

Linear maps between Banach spaces are also called linear operators. Given a linear operator T, we define its sup norm by \|T\| = \sup_{\|f\| = 1} \|T(f)\|. Operators with finite sup norm are called bounded. A linear operator T is bounded iff it is continuous.

Given any Banach space X, we can form the dual space X^* of all bounded linear functionals on X (i.e. bounded linear maps from X to the scalar field \mathbb{R} or \mathbb{C}). This is a Banach space with the sup norm \|\ell\| = \sup_{\|f\| = 1} |\ell(f)|.

Note that a linear functional (more generally, any linear map between Banach spaces) is bounded iff it is continuous.

Some examples

The dual space of L^p(X) for 1 < p < \infty is L^q(X), where q is the dual exponent which satisfies \frac 1p + \frac 1q = 1; the same holds for p = 1 (with q = \infty) if X is \sigma-finite. Note (L^\infty)^* \supsetneq L^1: e.g. it contains the Dirac delta functionals, which are not representable by L^1-functions.
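One half of the L^p/L^q duality is Hölder's inequality |\sum f g| \leq \|f\|_p \|g\|_q, which makes every g \in L^q a bounded functional on L^p; the bound is attained at f = \mathrm{sign}(g)|g|^{q-1}. A quick numerical sanity check in the discrete setting (counting measure on 20 points, random data; the function names are ours):

```python
import random

def dual_exponent(p):
    """The exponent q with 1/p + 1/q = 1, for 1 < p < infinity."""
    return p / (p - 1)

def lp_norm(xs, p):
    """l^p norm of a finite sequence (counting measure)."""
    return sum(abs(x) ** p for x in xs) ** (1 / p)

def pairing(f, g):
    """The functional f -> sum_n f_n g_n induced by g."""
    return sum(a * b for a, b in zip(f, g))

def norming_vector(g, q):
    """f = sign(g) |g|^(q-1); for this f, Hoelder's inequality is an equality."""
    return [(1 if b > 0 else -1 if b < 0 else 0) * abs(b) ** (q - 1) for b in g]

random.seed(0)
p = 3.0
q = dual_exponent(p)  # 1.5

# Hoelder: |<f, g>| <= ||f||_p ||g||_q, so g gives a bounded functional of norm <= ||g||_q
for _ in range(1000):
    f = [random.uniform(-1, 1) for _ in range(20)]
    g = [random.uniform(-1, 1) for _ in range(20)]
    assert abs(pairing(f, g)) <= lp_norm(f, p) * lp_norm(g, q) + 1e-12

# ... and the bound is attained, so the functional has norm exactly ||g||_q
g = [random.uniform(-1, 1) for _ in range(20)]
f = norming_vector(g, q)
print(pairing(f, g), lp_norm(f, p) * lp_norm(g, q))  # equal up to rounding
```

The equality case works because (q - 1)p = q, so both sides reduce to \sum |g|^q.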

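For X a compact Hausdorff space, the dual space of the space of continuous functions C(X) (with the sup norm) is the space of all finite signed regular Borel measures on X (the Riesz–Markov theorem).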

Hilbert spaces are self-dual, by the Riesz representation theorem. More precisely, given any continuous linear functional \ell on a Hilbert space \mathcal{H}, there exists a unique g = g(\ell) \in \mathcal{H} s.t. \ell(f) = \langle f, g \rangle for all f \in \mathcal{H}, with \|\ell\| = \|g\|, and this gives us an identification \mathcal{H}^* \to \mathcal{H} defined by \ell \mapsto g(\ell) (which is conjugate-linear in the complex case).

Building linear functionals

The Hahn-Banach theorem states that, given a sublinear function p on a (real) vector space V, any linear functional defined on a linear subspace V_0 \subset V and bounded above by p on V_0 can be extended to a linear functional on the entire space V that is still bounded above by p.

This allows us to define linear functionals by specifying values on some subspace, and then extending them using the general theorem.

Some applications / consequences:

  • convex subsets can be separated (by a continuous linear functional) from points in the complement of their closures
  • the natural injection V \to V^{**} into the double dual is isometric
  • (L^\infty)^* \supsetneq L^1: the construction of the Dirac delta functionals uses the Hahn-Banach theorem

Topologies weak and strong

Note that the closed unit ball is compact in the norm topology iff the space is finite-dimensional. In the infinite-dimensional case, there is no single “good” or “canonical” topology we could put on the space. There is the norm topology, which is in some sense natural if we are given a norm to start with; however, it lacks some of the good properties it has in the finite-dimensional case, and there are in fact other topologies in which e.g. the unit ball is compact:

A sequence of points (x_n) in a Banach space \mathcal{B} is said to converge weakly to a point x \in \mathcal{B} if \ell(x_n) \to \ell(x) for every \ell \in \mathcal{B}^*. (If our Banach space is in fact a Hilbert space, then in view of the Riesz representation theorem we may re-express this condition as \langle x_n,y \rangle \to \langle x,y \rangle for every y \in \mathcal{B}.) The weak topology on \mathcal{B} is the (coarsest) topology for which sequences (or, more generally, nets) converge in the sense of weak convergence, or equivalently the coarsest topology in which bounded linear functionals remain continuous.

The closed unit ball in \mathcal{B} is weakly compact (i.e. compact in the weak topology) iff \mathcal{B} is reflexive; this is the case if e.g. \mathcal{B} is a Hilbert space, or an L^p space with 1 < p < \infty.

The weak* topology on the dual space \mathcal{B}^* is the coarsest topology such that the evaluation maps \varphi \mapsto \varphi(x) from \mathcal{B}^* to the base field remain continuous (for all x \in \mathcal{B}). It coincides with the topology of pointwise convergence of linear functionals.

The (sequential) Banach-Alaoglu theorem states that the closed unit ball of the dual space of a (separable) normed vector space is (sequentially) compact in the weak* topology (uses Tychonoff theorem, hence Choice.)

There are many other possible topologies, especially on the dual space, but that is a topic for another day.

Baire categories in Banach spaces

Baire’s two categories form a dichotomy of “size” based purely on topology, which is “in some sense a combination of [countability and density]” (quote adapted from Baire, as translated in Stein.)

First category (“meagre”) sets are countable unions of nowhere dense sets (i.e. sets whose closures have empty interior, such as discrete sets, or the Cantor set.) Complements of first category sets are generic. Anything which is not first category is second category.

Baire’s category theorem states that any (non-empty) complete metric space X is of the second category (“the continuum is of the second category”). One corollary of this is that generic sets are dense in a complete metric space (but note that there are also first category sets in [0,1] of full measure, e.g. a countable union of fat Cantor sets whose measures tend to 1).

Second category gives wriggle room

That wriggle room (usually in the form of “if we write a second category set as a countable union of closed sets, at least one of them contains an open ball”) allows us to prove a bunch of useful analytic results for Banach spaces, which mostly extend our intuition for what happens in finite-dimensional spaces to the infinite-dimensional case:

  1. The uniform boundedness principle states that any set of continuous linear functionals on a Banach space \mathcal{B} which is pointwise bounded on some second category X \subset \mathcal{B} is uniformly bounded (i.e. bounded in the sup norm.)
  2. The open mapping theorem states that surjective continuous linear maps between Banach spaces are open mappings.
  3. The closed graph theorem states that linear maps between Banach spaces whose graphs are closed are continuous.

Hilbert space structures

The inner product that comes with a Hilbert space \mathcal{H} enables us to talk about orthogonal elements. We say that a (possibly infinite) orthonormal family is an orthonormal basis for \mathcal{H} if its linear span is dense in \mathcal{H}; it need not span the whole space.

e.g. (e^{inx})_{n=-\infty}^\infty is an orthonormal basis of L^2([-\pi, \pi]) with the normalized measure \frac{dx}{2\pi}, by the theory of Fourier series.
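What being an orthonormal basis buys us is Parseval's identity \|f\|^2 = \sum_n |\langle f, e^{inx} \rangle|^2. A small numerical check for f(x) = x, with the normalized measure \frac{dx}{2\pi}: integration by parts gives c_0 = 0 and c_n = i(-1)^n/n, and \|f\|^2 = \pi^2/3. The midpoint-rule quadrature and the truncation N = 20 are our own choices for illustration.

```python
import cmath
import math

def fourier_coefficient(f, n, m=2000):
    """c_n = <f, e^{inx}> = (1/2pi) * integral over [-pi, pi] of f(x) e^{-inx} dx,
    approximated by the midpoint rule with m nodes."""
    h = 2 * math.pi / m
    total = 0j
    for k in range(m):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)

def f(x):
    return x

N = 20
coeffs = {n: fourier_coefficient(f, n) for n in range(-N, N + 1)}

# Integration by parts gives c_0 = 0 and c_n = i (-1)^n / n for n != 0
for n in range(1, N + 1):
    assert abs(coeffs[n] - 1j * (-1) ** n / n) < 1e-4

norm_squared = math.pi ** 2 / 3    # ||f||^2 = (1/2pi) * integral of x^2 over [-pi, pi]
partial = sum(abs(c) ** 2 for c in coeffs.values())
print(norm_squared - partial)      # the Parseval tail 2 * sum_{n > N} 1/n^2, about 2/N
```

As a byproduct, Parseval for this f is exactly the identity \sum_{n \geq 1} 1/n^2 = \pi^2/6.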

Moreover, whenever we have a (topologically) closed subspace \mathcal{S} \subset \mathcal{H}, there is a well-defined orthogonal projection P_{\mathcal{S}} onto \mathcal{S}, and thus (by writing f = P_{\mathcal{S}} f + (f - P_{\mathcal{S}} f)) a decomposition \mathcal{H} = \mathcal{S} \oplus \mathcal{S}^\perp into \mathcal{S} and its orthogonal complement, which behave as we would expect from the case of finite-dimensional inner product spaces.
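In finite dimensions every subspace is closed, so the projection can be sketched directly: orthonormalize a spanning set by Gram-Schmidt, set P f = \sum_j \langle f, e_j \rangle e_j, and check that f - Pf is orthogonal to the subspace. The subspace of \mathbb{R}^4 and the vector f below are arbitrary choices, and the helper names are ours.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis (as a list of vectors) for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = [float(t) for t in v]
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = dot(w, w) ** 0.5
        if norm > tol:                 # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project(f, basis):
    """Orthogonal projection P f = sum_j <f, e_j> e_j onto the span of an orthonormal basis."""
    p = [0.0] * len(f)
    for e in basis:
        c = dot(f, e)
        p = [pj + c * ej for pj, ej in zip(p, e)]
    return p

S = gram_schmidt([[1, 1, 0, 0], [1, 0, 1, 0]])   # a 2-dimensional (hence closed) subspace of R^4
f = [3.0, -1.0, 2.0, 5.0]
Pf = project(f, S)
r = [a - b for a, b in zip(f, Pf)]               # f - P f, the component in S-perp
print([dot(r, e) for e in S])                    # both entries zero, up to rounding
```

The same formula P f = \sum_j \langle f, e_j \rangle e_j works for closed subspaces of infinite-dimensional Hilbert spaces, with the sum now a norm-convergent series over an orthonormal basis of \mathcal{S}.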


The inner product structure of a Hilbert space allows us to define these fun things called adjoints, which should be familiar from linear algebra: the adjoint of a linear operator T: \mathcal{H} \to \mathcal{H} is a linear operator T^*: \mathcal{H} \to \mathcal{H} satisfying \langle Tf, g \rangle = \langle f, T^*g \rangle for every f, g \in \mathcal{H}.

The construction of this adjoint goes through the Riesz representation theorem (see above), and as stated applies to bounded operators from a Hilbert space to itself. With more care, adjoints may be defined for operators between arbitrary pairs of Hilbert spaces (or even Banach spaces). These adjoints really go between the dual spaces; in the first instance the distinction was blurred since Hilbert spaces are self-dual. Without further assumptions (e.g. for unbounded operators), adjoints may not be defined on the whole space, and are only unique when the original operator is densely defined.

Compact operators

A linear operator T: \mathcal{H} \to \mathcal{H} is compact if the image of the closed unit ball in \mathcal{H} under T is pre-compact (i.e. has [sequentially] compact closure.) Note compact operators are automatically bounded. “It turns out that dealing with compact operators provides us with the closest analogy to the usual theorems of (finite-dimensional) linear algebra.”

Some useful properties:

  • Pre- or post-composing a compact operator with a bounded operator yields a compact operator.
  • Limits of compact operators (in the sup norm) are compact.
  • Conversely, every compact operator is the limit of finite-rank operators (i.e. operators with finite-dimensional range.)
  • Compactness is preserved under taking adjoints.

Some useful examples:

  • Operators that are diagonal in some orthonormal basis, with eigenvalues satisfying |\lambda_k| \to 0
  • Hilbert-Schmidt operators

The Spectral Theorem for compact operators states that for any compact symmetric (i.e. self-adjoint) operator T: \mathcal{H} \to \mathcal{H}, the space \mathcal{H} has an orthonormal basis consisting of eigenvectors of T, and the eigenvalue of largest absolute value has absolute value \|T\|.
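In finite dimensions (every operator on \mathbb{R}^n is compact) the statement about the top eigenvalue can be seen with power iteration: expanding the starting vector in the orthonormal eigenbasis, repeated application of T amplifies the component along the eigenvalue of largest absolute value. A minimal sketch, assuming that this eigenvalue is unique (no tie between \lambda and -\lambda) and that the starting vector is not orthogonal to its eigenvector; the matrix is our own example.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def power_iteration(A, x0, steps=200):
    """Approximate the eigenvalue of largest absolute value of a real symmetric matrix A.

    Assumes that eigenvalue is unique (no tie between lambda and -lambda) and that the
    starting vector x0 is not orthogonal to its eigenvector."""
    x = list(x0)
    for _ in range(steps):
        y = matvec(A, x)
        norm = sum(t * t for t in y) ** 0.5
        x = [t / norm for t in y]
    lam = sum(a * b for a, b in zip(x, matvec(A, x)))  # Rayleigh quotient <Ax, x>, |x| = 1
    return lam, x

# Second-difference matrix: eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2)
A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
lam, v = power_iteration(A, [1.0, 0.0, 0.0])
print(lam)  # close to 2 + sqrt(2), which is also the operator norm ||A||
```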

Spectral Theorem for bounded operators

There is a more general spectral theorem for bounded self-adjoint operators: given any bounded symmetric operator A: \mathcal{H} \to \mathcal{H}, there exists a measure space X and a real-valued f \in L^\infty(X) (representing the spectrum) s.t. A is unitarily conjugate to the “multiplication by f” operator on L^2(X) given by \varphi \mapsto (x \mapsto f(x)\varphi(x)).

Alternatively this may be expressed in terms of a spectral resolution E_\lambda or projection-valued measure dE_\lambda, which allows us to write A = \int_{\sigma(A)} \lambda \, d E_\lambda.

In the case of compact operators the spectrum \sigma(A) is countable, with 0 as its only possible accumulation point, and the corresponding projection-valued measure is a countable sum of atoms (the projections onto the eigenspaces), so we recover the more specific statement above.


A measure theory primer

(No proofs here: for proofs and/or details see e.g. Stein and Shakarchi’s Real Analysis)

Step 1: Measures

A (signed) measure \mu on a given space X is a way of determining (a measure for, as it were) how large a set is, or, more precisely, a function from subsets of X to the (extended) reals. Measures are non-negative (signed measures need not be), zero on the empty set, and countably additive on disjoint sets.

It is, in general, not possible to assign a measure to every subset of an arbitrary X in a way consistent with these axioms (see: existence of non-measurable sets); hence to fully specify a measure space we need one more piece of data, the set of subsets of X which are measurable. These form a \sigma-algebra (or “tribe”): they contain X, and are closed under taking complements and countable unions (and hence, by De Morgan’s laws, also closed under countable intersections.)

The word “countable” in all of the above is important! If we replace it with “finite” we obtain the weaker notion of Jordan content; if we replace it with “arbitrary” we get limp hogwash (any set is the disjoint union of its points; if we further assume some sort of translation-invariance, i.e. all singleton sets have the same measure, this implies either any infinite set has infinite measure, or every set has zero measure.)

Important examples include

  • the counting measure
  • the Lebesgue measure on \mathbb{R}^d, which is the completion of the unique translation-invariant measure on the Borel \sigma-algebra (the smallest \sigma-algebra containing the closed cubes) with \mu([0, 1]^d) = 1
  • the Haar measure on a locally compact topological group is a common generalization of the Lebesgue measure and the counting measure, and is likewise unique up to normalization (i.e. up to a positive scalar multiple)

Construction of the Lebesgue measure

  1. Closed rectangles \prod_{n=1}^d [a_n, b_n] are assigned measure \prod_{n=1}^d |b_n - a_n|. (Note this follows from our normalization, translation-invariance, and countable additivity.)
  2. To an arbitrary set E we assign the (Lebesgue) exterior measure \mu_*(E) = \inf \sum_{j=1}^\infty |Q_j|, where the inf ranges over all countable coverings of E by closed cubes. Again—“countable” is key here.
  3. A set E is measurable if it can be approximated from outside by open sets, up to a difference set of arbitrarily small exterior measure (more precisely: for any \epsilon > 0, we have an open set \mathcal{O} \supseteq E, which depends on \epsilon, s.t. \mu_*(\mathcal{O} \setminus E) < \epsilon.)
  4. Define the measure of a measurable set to be its exterior measure.
  5. Check that the resulting measure is indeed a measure (i.e. do the book-keeping to verify that the result is countably additive on disjoint sets.)
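To see why “countable” matters in step 2: every countable set has exterior measure zero. For the rationals in [0,1], cover the j-th rational by a closed interval (a cube, for d = 1) of length \epsilon/2^{j+1}; the lengths sum to \epsilon. A finite cover could never do this, since a finite union of closed intervals containing a dense subset of [0,1] must contain all of [0,1]. A sketch with exact arithmetic (the enumeration and names are our own), building the first 50 intervals of the cover:

```python
from fractions import Fraction

def rationals_in_unit_interval():
    """Enumerate the rationals in [0, 1] without repetition: 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    seen = set()
    q = 1
    while True:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                yield r
        q += 1

def epsilon_cover(eps, how_many):
    """Closed intervals of length eps / 2^(j+1) centered at the j-th rational, for j < how_many.
    Continuing the pattern for all j covers every rational in [0, 1] with total length eps."""
    cover = []
    for j, r in zip(range(how_many), rationals_in_unit_interval()):
        length = Fraction(eps) / 2 ** (j + 1)
        cover.append((r - length / 2, r + length / 2))
    return cover

cover = epsilon_cover(Fraction(1, 1000), 50)
total = sum(b - a for a, b in cover)
print(float(total))   # just under 1/1000; the full countable cover has total length exactly 1/1000
```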

Carathéodory extension

A similar procedure can be applied more generally: given a space X on which we would like a measure, we start with some small reasonable family of subsets of X on which we can agree how to determine size / measure (in the case above, the closed cubes), and then attempt to extend this toddler measure (technically, a premeasure) to a measure on some larger \sigma-algebra of subsets. \epsilon-more precisely:

  1. The “small reasonable family” should be an algebra, i.e. non-empty, and closed under complements, finite unions and finite intersections.
  2. A premeasure assigns (non-negative extended) reals to sets in our algebra. Premeasures should be zero on the empty set and countably additive on disjoint sets.
  3. Given a premeasure \mu_0 on an algebra \mathcal{A}, we may form an exterior measure (a function \mu_* that assigns a (non-negative extended) real to any subset of X) by taking \mu_*(E) = \inf \sum_{j=1}^\infty \mu_0(E_j), where the inf ranges over all countable coverings E \subset \bigcup_j E_j by sets E_j \in \mathcal{A}.
  4. Axiomatically, exterior measures should be zero on the empty set, non-decreasing (if E_1 \subset E_2, then \mu_*(E_1) \leq \mu_*(E_2)), and countably subadditive.
  5. Now we come to a key idea of Carathéodory: whereas in the construction of the Lebesgue measure we leaned heavily on the open sets in \mathbb{R}^d, we can formulate a criterion for measurability which does not refer to any topology on X, by declaring that a set E is measurable if \mu_*(A) = \mu_*(E \cap A) + \mu_*(E^c \cap A) for every A \subset X—i.e. if E divides any part of the space up in a reasonable enough way, as seen by the exterior measure.
  6. It is then straightforward to check that the set of all such sets forms a \sigma-algebra which contains \mathcal{A}, and our exterior measure restricted to this \sigma-algebra satisfies the axioms for a measure. One also checks, using the countable additivity of \mu_0, that this measure agrees with our premeasure on \mathcal{A}.

The Carathéodory extension theorem states that, starting with any premeasure \mu_0 on any algebra of sets in X, the process above yields a measure \mu extending \mu_0.

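Moreover, if X is \sigma-finite, i.e. it is the union of countably many sets in \mathcal{A} of finite premeasure (according to \mu_0), then this extension is unique: it is the only measure on the \sigma-algebra generated by \mathcal{A} that extends \mu_0.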
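The Carathéodory criterion can be tested by brute force on a toy example (all data hypothetical): X has four points, the algebra \mathcal{A} is generated by the two “atoms” \{0, 1\} and \{2, 3\}, and the premeasure gives them masses 1 and 2. Since every set in \mathcal{A} is a union of atoms, the cheapest cover of E uses exactly the atoms meeting E, and the criterion picks out precisely the sets of \mathcal{A}: the extension cannot see inside an atom.

```python
from itertools import combinations

X = frozenset({0, 1, 2, 3})
atoms = [frozenset({0, 1}), frozenset({2, 3})]      # the algebra A consists of unions of atoms
premeasure = {atoms[0]: 1.0, atoms[1]: 2.0}

def exterior_measure(E):
    """mu_*(E) = inf of sum mu_0(E_j) over covers of E by sets in A.
    Every set in A is a union of atoms, so the optimal cover uses exactly the atoms meeting E."""
    return sum(premeasure[a] for a in atoms if a & E)

def all_subsets(S):
    items = sorted(S)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def caratheodory_measurable(E):
    """E is measurable iff mu_*(A) = mu_*(E & A) + mu_*(A - E) for every test set A."""
    return all(
        exterior_measure(A) == exterior_measure(E & A) + exterior_measure(A - E)
        for A in all_subsets(X)
    )

measurable = [set(E) for E in all_subsets(X) if caratheodory_measurable(E)]
print(measurable)   # [set(), {0, 1}, {2, 3}, {0, 1, 2, 3}]
```

For instance E = \{0\} fails the test with A = \{0, 1\}, since \mu_*(\{0\}) + \mu_*(\{1\}) = 2 \neq 1 = \mu_*(\{0, 1\}).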


How do different measures on the same space relate? Lebesgue (for the real line) and Radon-Nikodym (in the general case) tell us that the relation is, in some way, as nice and controlled as it could be.

Given any \sigma-finite positive measure \mu on a measurable space X, any \sigma-finite (signed) measure \nu on X may be decomposed into a piece \nu_a absolutely continuous w.r.t. \mu (i.e. \mu(E) = 0 implies \nu_a(E) = 0) and a piece \nu_s mutually singular with \mu, i.e. the two measures are concentrated on disjoint sets.

Moreover the first piece may be written in the form d\nu_a = f \,d\mu for some extended \mu-integrable function f. (For notions of measurable functions and their integration, see below.)
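On a finite space with point masses everything is explicit, which makes for a useful sanity check (the weights below are hypothetical): \nu_s is \nu restricted to the \mu-null set N, \nu_a is \nu restricted to the complement, and the density is f(x) = \nu(\{x\})/\mu(\{x\}) off N. The sketch verifies \nu(E) = \int_E f \,d\mu + \nu_s(E) for every subset E.

```python
from fractions import Fraction
from itertools import combinations

mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(0), 3: Fraction(1, 2)}
nu = {0: Fraction(3), 1: Fraction(1), 2: Fraction(4), 3: Fraction(-1)}   # a signed measure

def measure(weights, E):
    return sum((weights[x] for x in E), Fraction(0))

# Lebesgue decomposition: nu_s lives on the mu-null set N, nu_a on its complement
N = {x for x in mu if mu[x] == 0}
nu_s = {x: (nu[x] if x in N else Fraction(0)) for x in nu}
nu_a = {x: (Fraction(0) if x in N else nu[x]) for x in nu}

# Radon-Nikodym derivative d(nu_a)/d(mu), defined mu-a.e. (set to 0 on N)
f = {x: (nu_a[x] / mu[x] if x not in N else Fraction(0)) for x in mu}

def integral(g, weights, E):
    """Integral of g over E against the point-mass measure given by weights."""
    return sum((g[x] * weights[x] for x in E), Fraction(0))

for r in range(len(mu) + 1):
    for E in combinations(mu, r):
        assert measure(nu, E) == integral(f, mu, E) + measure(nu_s, E)
        if measure(mu, E) == 0:
            assert measure(nu_a, E) == 0          # absolute continuity
print({x: str(v) for x, v in f.items()})   # {0: '3', 1: '1/2', 2: '0', 3: '-2'}
```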

Step 2: Maps

Measuring sets is all very well … but we also want our notion of measure to play nicely with maps between spaces, and this leads us to the idea of measurable functions. These are functions where measurable sets in the target space have measurable preimages in the domain space.

Note “measurable” here is with respect to the respective \sigma-algebras. In particular, if the target is a topological space, it is assumed, unless otherwise specified, to be equipped with the Borel \sigma-algebra, i.e. the smallest \sigma-algebra containing all of the open sets (which is smaller than the family of Lebesgue-measurable sets for \mathbb{R}^d, for instance.)

Littlewood’s three principles

In short: “weird things only ever happen in a vanishingly small period of time”:

  1. Every measurable set of finite measure is nearly a finite union of intervals: given such a set E and any \epsilon > 0, there exists a finite union F of closed cubes s.t. \mu(E \Delta F) \leq \epsilon.
  2. Every measurable function is nearly continuous, i.e. if f is a finite-valued measurable function on a measurable set E of finite measure, then for any \epsilon > 0, f|_{A_\epsilon} is continuous for some closed A_\epsilon \subset E with \mu(E \setminus A_\epsilon) < \epsilon (Lusin’s theorem.)
    Note that this states the restricted function is continuous as a function on A_\epsilon; it does not say that f, as a function on E, is continuous at the points of A_\epsilon.
  3. Every convergent sequence of measurable functions is nearly uniformly convergent, i.e. if (f_n) is a sequence of measurable functions on a measurable set E of finite measure with f_n \to f a.e., then for any \epsilon > 0 the convergence is uniform on some closed A_\epsilon \subset E with \mu(E \setminus A_\epsilon) < \epsilon (Egorov’s theorem.)

The above are formulated for Lebesgue measure on \mathbb{R}^d. More generally:

  1. applies in any measure space with a measure constructed by Carathéodory extension, with “closed cube” replaced by “element of \mathcal{A}”.
  2. requires the domain and target spaces to be topological spaces for continuity to make sense, and the result further requires the target to be second-countable, and the domain to be Hausdorff and equipped with a Radon measure.
  3. requires the target to be a metric space for the idea of uniform convergence to make sense, and also requires the target to be separable; in a general measure space, “closed” is replaced by “measurable”.

Step 3: Integrals

Once we have measurable functions it doesn’t take very long before somebody starts talking about trying to integrate them. This is natural: measure theory is all about how big things are, and an integral is essentially (some global measure of) “how big a function is.” Okay, enough with this turbo vagueness already; Eli Stein does a much better job in his preface, anyway.

In the below we assume the maps are functions into the reals; more generally the target space may be \mathbb{C}, or even a separable Banach space (giving the Bochner integral), without too much change to the statements …

Construction of the Lebesgue integral

  1. The characteristic function \chi_E of a measurable set E is declared to have integral equal to the measure of the set: \int_X \chi_E \,d\mu = \mu(E).
  2. Extend the definition of the integral to simple functions, i.e. finite linear combinations of characteristic functions of measurable sets, by linearity.
  3. Extend the integral to all non-negative measurable functions, by writing any such function f as the pointwise limit of an increasing sequence of non-negative simple functions \varphi_n \nearrow f, and letting the integral of f be the limit of the integrals of \varphi_n.
  4. Extend the integral to all integrable functions (i.e. measurable f with \int_X |f| \,d\mu < \infty) by decomposing any such function f into positive and negative parts f = f_+ - f_- and setting \int_X f \,d\mu = \int_X f_+ \,d\mu - \int_X f_- \,d\mu.

The same process works, more generally, for any \sigma-finite measure space.
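Step 3 can be carried out by hand for f(x) = x^2 on [0,1] with Lebesgue measure, using the standard dyadic approximations \varphi_n = \lfloor 2^n f \rfloor / 2^n (our choice of approximating sequence). Each level set \{k/2^n \leq x^2 < (k+1)/2^n\} is an interval whose measure we know exactly, so \int \varphi_n is a finite sum; since 0 \leq f - \varphi_n < 2^{-n}, these integrals increase to \int f = 1/3.

```python
import math

def simple_integral(n):
    """Integral of phi_n = floor(2^n f) / 2^n for f(x) = x^2 on [0, 1] (Lebesgue measure).

    phi_n equals k / 2^n on the level set {k / 2^n <= x^2 < (k + 1) / 2^n}, which is the
    interval [sqrt(k / 2^n), sqrt((k + 1) / 2^n)), so its measure is known exactly.
    (Here f <= 1, so the usual truncation phi_n <= n plays no role for n >= 1.)"""
    total = 0.0
    for k in range(2 ** n):
        lo, hi = math.sqrt(k / 2 ** n), math.sqrt((k + 1) / 2 ** n)
        total += (k / 2 ** n) * (hi - lo)   # value times measure of the level set
    return total

for n in (1, 2, 4, 8, 12):
    print(n, simple_integral(n))   # increases towards the integral 1/3
```

Note that the partition happens in the range of f (the values k/2^n), not in the domain as for Riemann sums.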

Fatou and friends

Very useful for proving results with the Lebesgue integral (starting with how it’s well-defined)

  1. Fatou’s lemma: for (f_n) a sequence of non-negative measurable functions, \int \liminf_{n \to \infty} f_n \,d\mu \leq \liminf_{n \to \infty} \int f_n \,d\mu.
  2. Monotone convergence: for (f_n) a sequence of non-negative measurable functions with f_n \nearrow f, \lim_{n \to \infty} \int f_n \,d\mu = \int f \,d\mu.
  3. Dominated convergence: for (f_n) a sequence of measurable functions with f_n \to f a.e. and |f_n| \leq g for some integrable g, we have \int |f-f_n| \,d\mu \to 0 and so \int f_n \,d\mu \to \int f\,d\mu as n \to \infty.
  4. Much approximation. Wow. The simple functions, step functions, and continuous functions of compact support are all dense in the space L^1(\mathbb{R}^d) of integrable functions.
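The inequality in Fatou’s lemma can be strict, and the domination hypothesis in dominated convergence cannot be dropped. The standard example is f_n = n\chi_{(0,1/n)} on \mathbb{R}: each \int f_n = 1, but f_n(x) \to 0 for every x, so \int \liminf f_n = 0 < 1. (Here \sup_n f_n behaves like 1/x near 0 and is not integrable.) A small exact check, with helper names of our own:

```python
from fractions import Fraction

def f(n, x):
    """f_n = n times the indicator function of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """f_n is a simple function: the value n on a set of Lebesgue measure 1/n."""
    return n * Fraction(1, n)

# Every f_n has integral 1, so liminf of the integrals is 1 ...
assert all(integral_f(n) == 1 for n in range(1, 1000))

# ... but for each fixed x, f_n(x) = 0 once n >= 1/x, so liminf f_n = 0 pointwise
for x in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 997)):
    start = int(1 / x)
    assert all(f(n, x) == 0 for n in range(start, start + 100))

print("integral of liminf f_n = 0 < 1 = liminf of integrals")
```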

Fubini’s theorem

Given two measures \mu_1 and \mu_2 on X_1 and X_2 (resp.), we can form a product measure \mu = \mu_1 \times \mu_2 by defining the premeasure (not measure!) \mu(A \times B) = \mu_1(A) \mu_2(B) for all \mu_1-measurable A and \mu_2-measurable B (extended additively to finite disjoint unions of such rectangles, which form an algebra), and then extending this to a measure using Carathéodory extension.

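Fubini’s theorem then tells us that, when \mu_1 and \mu_2 are \sigma-finite and f is integrable with respect to \mu (or non-negative and measurable, by Tonelli’s theorem), integrating f against the product measure \mu is the same as integrating against each of the factor measures \mu_i in turn (in either order.)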
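The integrability hypothesis is genuinely needed. With counting measure on \mathbb{N} \times \mathbb{N}, integrals are sums, and the classic array a(m, n) = 1 if n = m, -1 if n = m + 1, and 0 otherwise has iterated sums 0 and 1 depending on the order. Every row and column has finite support, so each inner sum below is the full infinite series; the catch is that \sum_{m,n} |a(m, n)| = \infty.

```python
def a(m, n):
    """a(m, n) = 1 if n == m, -1 if n == m + 1, else 0 (m, n >= 0)."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def row_sum(m):
    # row m is supported on n in {m, m + 1}, so this finite sum is the full series over n
    return sum(a(m, n) for n in range(m + 2))

def col_sum(n):
    # column n is supported on m in {n - 1, n}, so this finite sum is the full series over m
    return sum(a(m, n) for m in range(n + 1))

M = 1000
sum_n_then_m = sum(row_sum(m) for m in range(M))   # every row sums to 0
sum_m_then_n = sum(col_sum(n) for n in range(M))   # column 0 sums to 1, the others to 0
print(sum_n_then_m, sum_m_then_n)                  # 0 1
```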

Whither the Fundamental Theorem of Calculus

The Lebesgue differentiation theorem states that, for any locally integrable f on \mathbb{R}^d, \lim_{m(B) \to 0, B \ni x} \frac{1}{m(B)} \int_B f(y) \,dy = f(x) for almost every x, where B ranges over balls containing x.

As a corollary: recalling the definition of the derivative, this says (for d = 1) that taking the indefinite Lebesgue integral F(x) = \int_a^x f(y) \,dy of any integrable function f and then differentiating recovers the original function, i.e. F'(x) = f(x) for almost every x. This is one direction of the Fundamental Theorem.
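The “almost every x” in the differentiation theorem cannot be upgraded to “every x”. For f = \chi_{[0,1]} on \mathbb{R}, averages over small intervals B = [x - r, x + r] tend to f(x) at interior and exterior points, but to 1/2 at the endpoints 0 and 1, a set of measure zero. A minimal sketch (the function name is ours):

```python
def average_over_ball(x, r):
    """(1 / m(B)) * integral over B of f, for f = indicator of [0, 1] and B = [x - r, x + r]."""
    overlap = max(0.0, min(1.0, x + r) - max(0.0, x - r))   # m(B intersect [0, 1])
    return overlap / (2 * r)

for x in (0.5, 1.0, 2.0):
    print(x, [round(average_over_ball(x, 10.0 ** -k), 6) for k in range(1, 6)])
# the averages tend to f(0.5) = 1 and f(2) = 0, but to 1/2 (not f(1) = 1) at the endpoint x = 1
```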

In the opposite direction: if F is absolutely continuous on [a, b], then F' exists a.e. and is integrable, and satisfies F(x) - F(a) = \int_a^x F'(y) \,dy for all x \in [a,b]. Absolute continuity may appear to be an additional hypothesis, but its necessity is clear when we observe that functions that arise as indefinite integrals (i.e. those of the form x \mapsto \int_a^x f(y) \,dy with f an integrable function) are absolutely continuous.
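The standard witness that absolute continuity cannot be dropped is the Cantor function F: it is continuous and non-decreasing with F' = 0 off the Cantor set (so a.e.), yet F(1) - F(0) = 1 \neq 0 = \int_0^1 F'. The failure is visible directly: after n middle-third removals, 2^n intervals of total length (2/3)^n carry the whole increase of F. A sketch in exact arithmetic (helper names are ours; `cantor` is only exact on points with a terminating ternary expansion, which covers the endpoints used here):

```python
from fractions import Fraction

def cantor(x, max_digits=64):
    """The Cantor function F at x in [0, 1], computed from the ternary digits of x.

    Exact (Fraction arithmetic) when x has a terminating ternary expansion with at most
    max_digits digits, as all the interval endpoints below do."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        if x == 0:
            break
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:          # x is in a removed middle third (or at its left end): F is constant there
            return value + scale
        value += scale * (digit // 2)   # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value

def cantor_intervals(n):
    """The 2^n closed intervals, each of length 3^-n, left after n middle-third removals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

for n in (1, 5, 10):
    pieces = cantor_intervals(n)
    total_length = sum(b - a for a, b in pieces)
    total_increase = sum(cantor(b) - cantor(a) for a, b in pieces)
    print(n, float(total_length), total_increase)   # lengths (2/3)^n -> 0, increase stays 1
```

So for any \delta > 0 there are finitely many disjoint intervals of total length < \delta over which F increases by 1, which is exactly what absolute continuity forbids.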


Ergodicity of the Geodesic Flow

The geodesic flow

Given any complete Riemannian manifold M, we may define a geodesic flow \varphi_t on the unit tangent bundle T^1M which sends a point (x, v) to the point (\varphi_t x, \varphi_t^* v), where

  • \varphi_t x is the point at distance t from x along the geodesic ray emanating from x in the direction of v, and
  • \varphi_t^* v is the parallel transport of v along the same ray

(it’s a mouthful, isn’t it? It’s really simpler than all those words make it seem.) Note that at each point, we remember not just where we are (the point x \in M), but also where we’re going (the direction vector v \in T_x M); if we were to forget this second piece of information, we would become a little unmoored: here we are … where should we go next?
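For the hyperbolic plane the flow can be written down explicitly. Under the standard identification of T^1\mathbb{H}^2 (upper half-plane model) with PSL(2, \mathbb{R}), sending g to the base point g \cdot i together with the image under g of the upward unit vector at i, the geodesic flow is right multiplication by a_t = \mathrm{diag}(e^{t/2}, e^{-t/2}). The sketch below (plain Python, our own helper names, an arbitrary g of determinant 1) checks that the base point really moves a hyperbolic distance t along a single geodesic.

```python
import math

def matmul(g, h):
    """Product of 2x2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = g
    (e, f), (p, q) = h
    return ((a * e + b * p, a * f + b * q), (c * e + d * p, c * f + d * q))

def mobius(g, z):
    """Action z -> (az + b) / (cz + d) of g in SL(2, R) on the upper half-plane."""
    (a, b), (c, d) = g
    return (a * z + b) / (c * z + d)

def hyperbolic_distance(z, w):
    """Upper half-plane distance: cosh d = 1 + |z - w|^2 / (2 Im z Im w)."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

def geodesic_flow(g, t):
    """phi_t(g) = g a_t with a_t = diag(e^{t/2}, e^{-t/2}).

    Uses the identification of T^1 H^2 with PSL(2, R) sending g to the base point g.i
    together with the image under g of the upward unit vector at i."""
    a_t = ((math.exp(t / 2), 0.0), (0.0, math.exp(-t / 2)))
    return matmul(g, a_t)

g = ((2.0, 1.0), (1.0, 1.0))   # determinant 1: an arbitrary unit tangent vector
x0 = mobius(g, 1j)             # its base point
for t in (0.5, 1.0, 3.0):
    xt = mobius(geodesic_flow(g, t), 1j)
    print(t, hyperbolic_distance(x0, xt))   # equals t, up to rounding
```

The check reduces to the vertical geodesic through i, since g is an isometry and g a_t \cdot i = g \cdot (e^t i).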


When M is a closed (i.e. compact, no boundary) hyperbolic surface, or more generally closed with strictly negative curvature, this geodesic flow is ergodic, i.e. any measurable subset of T^1M invariant under the flow has either zero measure, or full measure. Here the measure on T^1M is the Liouville measure: locally, the product of the Riemannian volume on M (in coordinate charts, a smooth density times Lebesgue measure) and the standard measure on the unit spheres T^1_x M.

Since the characteristic function of an invariant set is an invariant function, and conversely the sublevel sets \{f < c\} of an invariant function f are invariant sets, we may equivalently define ergodicity as: any measurable function invariant under the flow is a.e. constant.

(Side note: with more assumptions on the curvature we may relax the compactness assumption to a finite volume assumption)

The Hopf argument (for closed hyperbolic manifolds)

This is essentially due to the exponential divergence of geodesics in negative curvature, and the splitting of the tangent spaces T_v T^1M = E^s_v \oplus E^0_v \oplus E^u_v into stable, tangent (flowline), and unstable distributions; these give rise to three maximally transverse foliations: the stable foliation W^s, the unstable foliation W^u, and the foliation by flowlines W^0.

The flow is exponentially contracting in the forward time direction on the leaves of the stable foliation W^s, and exponentially contracting in the reverse time direction on the leaves of the unstable foliation W^u. In other words, the flow is Anosov.

We may describe these foliations explicitly in the case of constant negative curvature—if we take \gamma to be the geodesic tangent to v \in T^1M,

  • W^s(v) is (the quotient image of) the unit normal bundle, with normals pointing inwards, to the horosphere through \pi(v) \in M tangent to the forward endpoint of \gamma in \partial_\infty\mathbb{H}^n \cong \partial_\infty\widetilde{M}; “forward” here is taken with reference to how v is pointing along \gamma;
  • W^u(v) is (the quotient image of) the unit normal bundle, with normals pointing outwards, to the horosphere through \pi(v) tangent to the backward endpoint of \gamma in \partial_\infty\mathbb{H}^n \cong \partial_\infty\widetilde{M};
  • W^0(v) = \gamma.
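In dimension 2 these leaves are orbits of matrix subgroups. Under the standard identification of T^1\mathbb{H}^2 with PSL(2, \mathbb{R}), where g corresponds to the base point g \cdot i with the image of the upward unit vector at i, and the geodesic flow is right multiplication by a_t = \mathrm{diag}(e^{t/2}, e^{-t/2}), we have a_t^{-1} u_s a_t = u_{s e^{-t}} for u_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}. So g and g u_s lie on a common stable leaf (for g the identity, this is the horocycle \mathrm{Im}\, z = 1 based at the forward endpoint \infty), and the lower triangular analogue v_s gives the unstable leaf. The sketch (our own helper names, arbitrary g and s) measures how the base points of the two pairs separate under the flow; the exact distances work out to 2\,\mathrm{asinh}(s e^{\mp t}/2).

```python
import math

def matmul(g, h):
    """Product of 2x2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = g
    (e, f), (p, q) = h
    return ((a * e + b * p, a * f + b * q), (c * e + d * p, c * f + d * q))

def base_point(g):
    """Base point g.i in the upper half-plane of the unit tangent vector g."""
    (a, b), (c, d) = g
    return (a * 1j + b) / (c * 1j + d)

def hyperbolic_distance(z, w):
    """Upper half-plane distance: cosh d = 1 + |z - w|^2 / (2 Im z Im w)."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

def flow(g, t):
    """Geodesic flow phi_t(g) = g diag(e^{t/2}, e^{-t/2}) under the identification above."""
    return matmul(g, ((math.exp(t / 2), 0.0), (0.0, math.exp(-t / 2))))

g = ((2.0, 1.0), (1.0, 1.0))
s = 0.3
same_stable_leaf = matmul(g, ((1.0, s), (0.0, 1.0)))     # g u_s, u_s upper unipotent
same_unstable_leaf = matmul(g, ((1.0, 0.0), (s, 1.0)))   # g v_s, v_s lower unipotent

for t in (0.0, 2.0, 4.0, 8.0):
    d_s = hyperbolic_distance(base_point(flow(g, t)), base_point(flow(same_stable_leaf, t)))
    d_u = hyperbolic_distance(base_point(flow(g, t)), base_point(flow(same_unstable_leaf, t)))
    # d_s = 2 asinh(s e^{-t} / 2) shrinks exponentially; d_u = 2 asinh(s e^{t} / 2) grows
    print(t, d_s, d_u)
```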

Step 1

Suppose f is a \phi-invariant measurable function on T^1M; by replacing f with \max(\min(f, C), -C) if needed, WMA f is bounded. Since continuous functions are dense in L^1(T^1M), we may approximate f in L^1 by continuous functions h_\epsilon, which are bounded and uniformly continuous since T^1M is compact.

By the Birkhoff ergodic theorem, the forward and backward time averages [w.r.t. \phi] of h_\epsilon exist, and coincide, almost everywhere.

By an argument involving the \phi-invariance of f and the triangle inequality, f is well-approximated (in L^1) by the forward time averages of h_\epsilon.
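The Birkhoff step can be seen in a much simpler (discrete-time, non-hyperbolic) toy system of our own choosing: the irrational rotation x \mapsto x + \alpha \bmod 1, which is ergodic for Lebesgue measure. Forward time averages of a continuous function converge to its space average, whatever the starting point. This illustrates only the existence and constancy of time averages, not the Hopf argument itself.

```python
import math

def time_average(h, T, x0, n):
    """(1/n) * sum_{k<n} h(T^k x0), a Birkhoff forward time average."""
    total, x = 0.0, x0
    for _ in range(n):
        total += h(x)
        x = T(x)
    return total / n

alpha = math.sqrt(2) - 1                       # irrational rotation number

def T(x):
    """Rotation of the circle R/Z by alpha, ergodic for Lebesgue measure."""
    return (x + alpha) % 1.0

def h(x):
    """A continuous observable with space average 1/2."""
    return math.cos(2 * math.pi * x) ** 2

for x0 in (0.0, 0.123, 0.9):
    print(x0, time_average(h, T, x0, 20_000))  # all close to the space average 1/2
```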

Step 2

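The time averages of h_\epsilon are constant along leaves, almost everywhere: by invariance they are constant on (each of) the leaves of W^0; the forward averages are constant on stable leaves, since points on the same stable leaf approach each other exponentially in forward time and h_\epsilon is uniformly continuous; and, by the same argument in reverse time, the backward averages, hence (a.e.) the forward averages, are constant on unstable leaves.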

(See also Proposition 2.6 in Brin’s exposition)

Step 3

To conclude that time averages, and hence our original arbitrary integrable function, are constant a.e. on T^1M, we (would like to!) use Fubini’s theorem: locally near each (x_0,v_0) \in T^1M, the set of (x, v) along each of the foliation directions at which the time averages are equal to those at (x_0,v_0) has full measure, by the previous Step.

By Fubini’s theorem applied to the three foliation directions, we (would) conclude that the set of nearby (x, v) at which the time averages are equal to those at (x_0,v_0) has full measure. Hence the time averages are locally constant, and since T^1M is connected we are done.

But! (Also more generally, for K < 0)

The problem is that while our stable and unstable leaves are smooth, the foliations need not be (outside the constant curvature case, where everything is homogeneous), i.e. the leaves may not vary smoothly in their parameter space.

To justify the use of a Fubini-type argument one instead shows that these foliations are absolutely continuous.

The proof then immediately generalizes to all compact manifolds with (not necessarily constant) negative sectional curvature. For more general negatively curved manifolds, the stable and unstable foliations W^s and W^u may still be described in terms of unit normal bundles over horospheres, where horospheres are now described, more generally, as level sets of Busemann functions.

The proof of absolute continuity of the foliations proceeds as follows:

  1. Showing that the stable and unstable distributions E^s and E^u (also the “central un/stable” or “weak un/stable” distributions, i.e. E^{s0} := E^s \oplus E^0 and E^{u0} := E^u \oplus E^0) (of any C^2 Anosov flow) are Hölder continuous, i.e. given points x, y of the phase space (here T^1M), the Hausdorff distance in T(T^1M) between (the unit spheres of) the stable subspaces E^s(x) and E^s(y) is \leq A \cdot d(x,y)^\alpha.
    Roughly speaking, this is true because any complementary subspace to E^s will become exponentially close to E^s under the repeated action of the geodesic flow, by the same mechanism that makes power iteration tick; and the distance function on M is Lipschitz. Analyzing the situation more carefully, and applying a bunch of simplifying tricks such as the adjusted metric described in Brin’s section 4.3, yields the desired Hölder continuity.
  2. Using this, together with the description of horospheres as limits of sequences of spheres with radii increasing to +\infty, to establish that between any pair of transversals for the un/stable foliation, we have a homeomorphism which is C^1 with bounded Jacobians, and hence absolutely continuous.
    Very slightly less vaguely, Hölder continuity of E^{u0}, together with the power iteration argument as above, implies tangents to transversals to the stable foliation W^s become exponentially close; given regularity of the Riemannian metric, this implies the Jacobians of the iterated geodesic flow on these transversals become exponentially close. By a chain rule argument and another application of the power iteration argument, this implies that the Jacobians of the map between transversals are bounded.
    This condition on the foliations is known as transversal absolute continuity, and implies, by a general measure theoretic argument (see section 3 of Brin’s article), absolute continuity of the foliations.
  3. Note that this last step, at least as presented in Brin, appears to require the use of pinched negative curvature.
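The “power iteration” mechanism behind step 1 is easiest to see for a linear hyperbolic map; our choice here is Arnold’s cat map A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, not an example taken from Brin. Any direction off the stable line is pulled towards the unstable eigenline, and since A is symmetric (orthogonal eigenlines), the tangent of the angle shrinks by exactly the factor 1/\lambda_u^2 \approx 0.146 per step; iterating A^{-1} pulls directions towards the stable line in the same way. This illustrates only the contraction of directions, not Hölder continuity itself.

```python
import math

A = ((2.0, 1.0), (1.0, 1.0))          # symmetric, det 1: Arnold's cat map, a linear Anosov map of the torus
lam_u = (3 + math.sqrt(5)) / 2        # expanding eigenvalue; the contracting one is 1 / lam_u
E_u = (1.0, lam_u - 2)                # spans the unstable (expanding) eigenline

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def angle_between_lines(v, w):
    """Angle in [0, pi/2] between the lines spanned by v and w (atan2 stays accurate for tiny angles)."""
    cross = v[0] * w[1] - v[1] * w[0]
    dot = v[0] * w[0] + v[1] * w[1]
    return math.atan2(abs(cross), abs(dot))

v = (0.0, 1.0)                        # any direction off the stable line
angles = []
for n in range(1, 9):
    v = apply(A, v)
    norm = math.hypot(*v)
    v = (v[0] / norm, v[1] / norm)    # renormalize, exactly as in power iteration
    angles.append(angle_between_lines(v, E_u))
    print(n, angles[-1])              # tan(angle) shrinks by the factor 1 / lam_u^2 each step
```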


Eberhard Hopf, “Ergodic theory and the geodesic flow on surfaces of constant negative curvature.” Bull. Amer. Math. Soc. 77 (1971), no. 6, 863–877.

Yves Coudene, “The Hopf argument.”

Misha Brin, “Ergodicity of the Geodesic Flow.” Appendix to Werner Ballmann’s Lectures on Spaces of Nonpositive Curvature.