» Publishers, Monetize your RSS feeds with FeedShow: More infos (Show/Hide Ads)
The Legendre–Fenchel transform, or Fenchel transform, or convex conjugation, is, in its naivest form, a duality between convex functions on a vector space and convex functions on the dual space. It is of central importance in convex optimization theory and in physics it is used to switch between Hamiltonian and Lagrangian perspectives.
Suppose that $V$ is a real vector space and that $f\colon V\to [-\infty ,+\infty ]$ is a function then the Fenchel transform is the function $f^{\ast }\colon V^{#}\to [-\infty ,+\infty ]$ defined on the dual vector space $V^{#}$ by $f^{\ast }(k)\coloneqq \sup _{x\in V}\big \{ \langle k,x\rangle -f(x)\big \} .$
If you’re a regular reader then you will be unsurprised when I say that I want to show how it naturally arises from enriched category theory constructions. I’ll show that in the next post. In this post I’ll give a little introduction to the Legendre–Fenchel transform.
There is probably no best way to introduce the Legendre–Fenchel transform. The only treatment that I knew for many years was in Arnold’s book Mathematical Methods of Classical Mechanics, but I have recently come across the convex optimization literature and would highly recommend Tourchette’s The Fenchel Transform in a Nutshell — my treatment here is heavily influenced by this paper. I will talk mainly about the one-dimensional case as I think that gives a lot of the intuition.
We will start, as Legendre did, with the special case of a strictly convex differentiable function $f\colon \mathbb{R}\to \mathbb{R}$; for instance, the function $x^{2}+1/2$ pictured on the left hand side above. The derviative of $f$ is strictly increasing and so the function $f$ can be parametrized by the derivative $k =d f/d x$ instead of the parameter $x$. Indeed we can write the parameter $x$ in terms of the slope $k$, $x=x(k)$. The Legendre-Fenchel transform $f^{*}$ can then be defined to satisfy $\langle k,x \rangle = f(x) +f^{\ast }(k),$ where the angle brackets mean the pairing between a vector space and its dual. In this one-dimensional case, where $x$ and $k$ are thought of as real numbers, we just have $\langle k,x \rangle = k x$.
As $x$ is a function of $k$ we can rewrite this as $f^{\ast }(k)\coloneqq \langle k,x(k) \rangle -f(x(k)).$ So the Legendre-Fenchel transform encodes the function is a different way. By differentiating this equation you can see that the $d f^{\ast }/d k=x(k)$, thus we have interchanged the abcissa (the horizontal co-ordinate) and the slope. So if $f$ has derivative $k_{0}$ at $x_{0}$ then $f^{\ast }$ has derivative $x_{0}$ at $k_{0}$. This is illustrated in the above picture.
I believe this is what Legendre did and then that what Fenchel did was to generalize this to non-differentiable functions.
For non-differentiable functions, we can’t talk about tangent lines and derivatives, but instead can talk about supporting lines. A supporting line is one which touches the graph at at least one point and never goes above the graph. (The fact that we’re singling out lines not going above the graph means that we have convex functions in mind.)
For instance, at the point $(x_{0},f(x_{0}))$ the graph pictured below has no tangent line, but has supporting lines with gradient from $k_{1}$ to $k_{2}$. A convex function will have at least one supporting line at each point.
It transpires that the right way to generalize the transform to this non-differentiable case is to define it as follows: $f^{\ast }(k)\coloneqq \sup _{x\in \mathbb{R}}\big \{ \langle k,x\rangle -f(x)\big \} .$ In this case, if $f$ has a supporting line of slope $k_{0}$ at $x_{0}$ then $f^{\ast }$ has a supporting line of slope $x_{0}$ at $k_{0}$. In the picture above, at $x_{0}$, the function $f$ has supporting lines with slope from $k_{1}$ to $k_{2}$: correspondingly, the function $f^{\ast }$ has supporting lines with slope $x_{0}$ all the way from $k_{1}$ to $k_{2}$.
If we allow the function $f$ to be not strictly convex then the transform will not alway be finite. For example, if $f(x)\coloneqq ax+b$ then we have $f^{\ast }(a)=-b$ and $f^{\ast }(k)=+\infty$ for $k\ne a$. So we will allow functions taking values in the extended real numbers: $\overline{\mathbb{R}}\coloneqq [-\infty ,+\infty ]$.
We can use the above definition to get the transform of any function $f\colon \mathbb{R}\to \overline{\mathbb{R}}$, whether convex or not, but the resulting transform $f^{\ast }$ is always convex. (When there are infinite values involved we can also say that $f^{\ast }$ is lower semi-continuous, but I’ll absorb that into my definition of convex for functions taking infinite values.)
Everything we’ve done for one-dimensional $\mathbb{R}$ easily generalizes to any finite dimensional real vector space $V$, where we should say ‘supporting hyperplane’ instead of ‘supporting line’. From that we get a transform between sets of functions $(\text {--})^{\ast }\colon \mathrm{Fun}(V,\overline{\mathbb{R}})\to \mathrm{Fun}(V^{#},\overline{\mathbb{R}}),$ where $V^{#}$ is the vector space dual of $V$. Similarly, we have a reverse transform going the other way, which is traditionally also denoted with a star $(\text {--})^{\ast }\colon \mathrm{Fun}(V^{#},\overline{\mathbb{R}})\to \mathrm{Fun}(V,\overline{\mathbb{R}}),$ for $g\colon V^{#}\to \overline{\mathbb{R}}$ we define $g^{\ast }(x)\coloneqq \sup _{k\in V^{#}}\big \{ \langle k,x\rangle -g(k)\big \} .$
This pair of transforms have some rather nice properties, for instance, they are order reversing. We can put a partial order on any set of functions to $\overline{\mathbb{R}}$ by defining $f_{1}\ge f_{2}$ if $f_{1}(x)\ge f_{2}(x)$ for all $x$. Then $f_{1}\ge f_{2} \quad \Rightarrow \quad f_{2}^{\ast }\ge f_{1}^{\ast }.$ Also for any function $f$ we have $f^{\ast }=f^{\ast \ast \ast }$ which implies that the operator $f\mapsto f^{\ast \ast }$ is idempotent: $f^{\ast \ast }=f^{\ast \ast \ast \ast }.$ This means that $f\mapsto f^{\ast \ast }$ is a closure operation. What it actually does is take the convex envelope of $f$, which is the largest convex function less than or equal to $f$. Here’s an example.
This gives that if $f$ is already a convex function then $f^{\ast \ast }=f$. And as a consequence the Legendre–Fenchel transform and the reverse transform restrict to an order reversing bijection between convex functions on $V$ and convex functions on its dual $V^{#}$. $\mathrm{Cvx}(V,\overline{\mathbb{R}})\cong \mathrm{Cvx}(V^{#},\overline{\mathbb{R}}).$
There are many other things that can be said about the transform, such as Fenchel duality and the role it plays in optimization, but I don’t understand such things to my own satisfaction yet.
Next time I’ll explain how most of the above structure drops out of the nucleus construction in enriched category theory.
Here’s a classic theorem about finite products theories, also known as algebraic theories or Lawvere theories. You can find it in Toposes, Triples and Theories, but it must go way back to Lawvere’s thesis. In my work with Nina Otter on phylogenetic trees, we need a slight generalization of it… and if true, this generalization must already be known. So I really just want a suitable reference!
Theorem. Suppose $C$ is a category with finite products that is single-sorted: every object is a finite product of copies of a single object $x \in C$. Let $Mod(C)$ be the category of models of $C$: that is, product-preserving functors
$\phi : C \to Set$
and natural transformations between these. Let
$U : Mod(C) \to Set$
be the functor sending any model to its underlying set:
$U(\phi) = \phi(x)$
Then $U$ has a left adjoint
$F : Set \to Mod(C)$
and $Mod(C)$ is equivalent to the category of algebras of the monad
$U F : Set \to Set$
As a result, any model $\phi$ can be written as a coequalizer
$F U F U(\phi) \stackrel{\longrightarrow}{\longrightarrow} F U(\phi) \longrightarrow \phi$
where the arrows are built from the counit of the adjunction
$\epsilon : F U \to 1_{Mod(C)}$
in the obvious ways: $F U (\epsilon_{F U (\phi)})$, $\epsilon_{F U F U(\phi)}$ and $\epsilon_\phi$.
The generalization we need is quite mild, I hope. First, we need to consider multi-typed theories. Second, we need to consider models in $Top$ rather than $Set$. So, this is what we want:
Conjecture. Suppose $C$ is a category with finite products that is $n$-sorted: every object is a finite product of copies of certain objects $x_\lambda$, where $\lambda$ ranges over some index set $\Lambda$. Let $Mod(C)$ be the category of topological models of $C$: that is, product-preserving functors
$\phi : C \to Top$
and natural transformations between these. Let
$U : Mod(C) \to Top^\Lambda$
be the functor sending any model to its underlying spaces:
$U(\phi) = (\phi(x_\lambda))_{\lambda \in \Lambda}$
Then $U$ has a left adjoint
$F : Top^\Lambda \to Mod(C)$
and $Mod(C)$ is equivalent to the category of algebras of the monad
$U F : Top^\Lambda \to Top^\Lambda$
As a result, any model $\phi$ can be written as a coequalizer
$F U F U(\phi) \stackrel{\longrightarrow}{\longrightarrow} F U(\phi) \longrightarrow \phi$
where the arrows are built from the counit of the adjunction in the obvious ways.
Comments
1) There shouldn’t be anything terribly special about $Top$ here: any sufficiently nice category should work… but all I need is $Top$. What counts as ‘sufficiently nice’? If we need to replace $Top$ ‘convenient category of topological spaces’, to make it cartesian closed, that’s fine with me—just let me know!
2) I’m sure this conjecture, if true, follows from some super-duper-generalization. I don’t mind hearing about such generalizations, and I suspect some of you will be unable to resist mentioning them… so okay, go ahead, impress me…
… but remember: all I need is this puny little result!
In fact, all I really need is a puny special case of this puny result. If it works, it will be a small, cute application of algebraic theories to biology.
Guest post by Christina Vasilakopoulou
In the eighth installment of the Kan Extension Seminar, we discuss the paper “Elementary Observations on 2-Categorical Limits” by G.M. Kelly, published in 1989. Even though Kelly’s classic book Basic Concepts of Enriched Category Theory, which contains the abstract theory related to indexed (or weighted) limits for arbitrary $\mathcal{V}$-categories, was available since 1982, the existence of the present article is well-justifiable.
On the one hand, it constitutes an independent account of the fundamental case $\mathcal{V}$=$\mathbf{Cat}$, thus it motivates and exemplifies the more general framework through a more gentle, yet meaningful exposition of 2-categorical limits. The explicit construction of specific notable finite limits such as inserters, equifiers etc. promotes the comprehension of the definitions, via a hands-on description. Moreover, these finite limits and particular results concerning 2-categories rather than general enriched categories, such as the construction of the cotensor as a PIE limit, are central for the theory of 2-categories. Lastly, by introducing indexed lax and pseudo limits along with Street’s bilimits, and providing appropriate lax/ pseudo/ bicategorical completeness results, the paper serves also as an indespensable reference for the later “2-Dimensional Monad Theory” by Blackwell, Kelly and Power.
I would like to take this opportunity to thank Emily as well as all the other participants of the Kan Extension Seminar. This has been a unique experience of constant motivation and inspiration for me!
Basic Machinery
Presently, our base of enrichment is the cartesian monoidal closed category $\mathbf{Cat}$ of (small) categories, with the usual adjunction $-\times\mathcal{A}\dashv[\mathcal{A},-]$. The very definition of an indexed limit requires a good command of the basic $\mathbf{Cat}$-categorical notions, as seen for example in “Review of the Elements of 2-categories” by Kelly and Street. In particular, a 2-natural transformation $\alpha:G\Rightarrow H$ between 2-functors consists of components which not only satisfy the usual naturality condition, but also the 2-naturality one expressing compatibility with 2-cells. Moreover, a modification between 2-natural transformations $m:\alpha\Rrightarrow\beta$ has components families of 2-cells $m_A:\alpha_A\Rightarrow\beta_A:GA\to HA$ compatible with the mapped 1-cells of the domain 2-category, i.e. $m_B\cdot Gf=Hf\cdot m_A$ (where $\cdot$ is whiskering).
A 2-functor $F:\mathcal{K}\to\mathbf{Cat}$ is called representable, when there exists a 2-natural isomorphism $\alpha:\mathcal{K}(K,-)\xrightarrow{\quad\sim\quad}F.$ The components of this isomorphism are $\alpha_A:\mathcal{K}(K,A)\cong FA$ in $\mathbf{Cat}$, and the unit of the representation is the corresponding `element’ $\mathbf{1}\to FK$ via Yoneda.
For a general complete symmetric monoidal closed category $\mathcal{V}$, the usual functor category $[\mathcal{A},\mathcal{B}]$ for two $\mathcal{V}$-categories is endowed with the structure of a $\mathcal{V}$-category itself, with hom-objects ends $[\mathcal{A},\mathcal{B}](T,S)=\int_{A\in\mathcal{A}} \mathcal{B}(TA,SA)$ (which exist at least when $\mathcal{A}$ is small). In our context of $\mathcal{V}$=$\mathbf{Cat}$ it is not necessary to employ ends and coends at all, and the hom-category $[\mathcal{K},\mathcal{L}](G,H)$ of the functor 2-category is evidently the category of 2-natural transformations and modifications. However, we note that computations via (co)ends simplify and are essential for constructions and (co)completeness results for enrichment in general monoidal categories.
The definition of weighted limits for 2-categories
To briefly motivate the definition of a weighted limit, recall that an ordinary limit of a ($\mathbf{Set}$-) functor $G:\mathcal{P}\to\mathcal{C}$ is characterized by an isomorphism $\mathcal{C}(C,limG)\cong[\mathcal{P},\mathcal{C}](\Delta C, G)$ natural in $C$, where $\Delta C:\mathcal{P}\to\mathcal{C}$ is the constant functor on the object $C$. In other words, the limit is the representing object of the presheaf $[\mathcal{P},\mathcal{C}](\Delta -,G):\mathcal{C}^\op\to\mathbf{Set}.$ Since components of a natural transformation $\Delta C\Rightarrow G$ (i.e. cones) can be viewed as components of a natural $\Delta\mathbf{1}\Rightarrow\mathcal{C}(C,G-):\mathcal{C}\to\mathbf{Set}$, the above defining isomorphism can be written as $\mathcal{C}(C,\mathrm{lim}G)\cong[\mathcal{P},\mathbf{Set}](\Delta\mathbf{1},\mathcal{C}(C,G-)).$ In this form, ordinary limits can be easily seen as particular examples of conical indexed limits for $\mathcal{V}$=$\mathbf{Set}$, and we are able to generalize the concept of a limit by replacing the functor $\Delta\mathbf{1}$ by an arbitrary functor (weight) $\mathcal{C}\to\mathbf{Set}$.
We may thus think of a 2-functor $F:\mathcal{P}\to\mathbf{Cat}$ as a (small) indexing type or weight, and a 2-functor $G:\mathcal{P}\to\mathcal{K}$ as a diagram in $\mathcal{K}$ of shape $\mathcal{P}$: $\begin{matrix} & \mathbf{Cat}\quad \\ {}^{weight} \nearrow_{F} & \\ \mathcal{P} & \overset{G}\underset{diagram}{\rightarrow} & \mathcal{K}. \\ \end{matrix}$ The 2-functor $G$ gives rise to a 2-functor $\int_p [Fp,\mathcal{K}(-,Gp)]=[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(-,G)):\; \mathcal{K}^\op\longrightarrow\mathbf{Cat}$ which maps a 0-cell $A$ to the category $[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(A,G))$. A representation of this contravariant 2-functor is an object $\{F,K\}\in\mathcal{K}$ along with 2-natural isomorphism $\mathcal{K}(-,\{F,G\})\xrightarrow{\;\sim\;}[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(-,G))$ with components isomorphisms between categories $\mathcal{K}(A,\{F,G\})\cong[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(A,G-)).$ The unit of this representation is $\mathbf{1}\to[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(\{F,G\},G))$ which corresponds uniquely to a 2-natural transformation $\xi:F\Rightarrow\mathcal{K}(\{F,G\},G)$.
Via this 2-natural isomorphism, the object $\{F,G\}$ in $\mathcal{K}$ satisfies a universal property which can be expressed in two levels:
The 1-dimensional aspect of the universal property states that every natural transformation $\rho$ factorizes as $\begin{matrix} F \xrightarrow{\rho} & \mathcal{K}(A,G) \\ {}_\xi \searrow & \uparr_{\mathcal{K}(h,1)} \\ & \mathcal{K}(\{F,G\},G) \\ \end{matrix}$ for a unique 1-cell $h:A\to\{F,G\}$ in $\mathcal{K}$, where the vertical arrow is just pre-composition with $h$.
The 2-dimensional aspect of the universal property states that every modification $\theta:\rho\Rrightarrow\rho'$ factorizes as $\mathcal{K}(\alpha,1)\cdot \xi$ for a unique 2-cell $\alpha:h\Rightarrow h'$ in $\mathcal{K}$.
The fact that the 2-dimensional aspect (which asserts an isomorphism of categories) does not in general follow from the 1-dimensional aspect (which asserts a bijection between the hom-sets of the underlying categories) is a recurrent issue of the paper. In fact, things would be different if the underlying category functor $\mathcal{V}(I,-)=(\;)_0:\mathcal{V}\text{-}\mathbf{Cat}\to\mathbf{Cat}$ were conservative, in which case the 2-dimensional universal property would always imply the 1-dimensional one. Certainly though, this is not the case for $\mathcal{V}$=$\mathbf{Cat}$: the respective functor discards all the 2-cells and is not even faithful. However, if we know that a weighted limit exists, then the first level of the universal property suffices to detect it up to isomorphism.
Completeness of 2-categories
A 2-category $\mathcal{K}$ is complete when all limits $\{F,G\}$ exist. The defining 2-natural isomorphism extends the mapping $(F,G)\mapsto\{F,G\}$ into a functor of two variables (the weighted limit functor) $\{-,-\}:[\mathcal{P},\mathbf{Cat}]^{op}\times[\mathcal{P},\mathcal{K}]\longrightarrow \mathcal{K}$ as the left parametrized adjoint (actually its opposite) of the functor $\mathcal{K}(-,?):\mathcal{K}^{op}\times[\mathcal{P},\mathcal{K}]\to[\mathcal{P},\mathbf{Cat}]$ mapping an object $A$ and a functor $G$ to $\mathcal{K}(A,G-)$. A colimit in $\mathcal{K}$ is a limit in $\mathcal{K}^op$, and the weighted colimit functor is $-\ast-:[\mathcal{P}^op,\mathbf{Cat}]\times[\mathcal{P},\mathcal{K}]\longrightarrow\mathcal{K}.$ Apart from the evident duality, we observe that often colimits are harder to compute than limits. This may partially be due to the fact that $\{F,G\}$ is determined by the representable $\mathcal{K}(-,\{F,G\})$ which gives generalized elements of $\{F,G\}$, whereas the description of $\mathcal{K}(F\ast G,-)$ gives us arrows out of $F\ast G$. For example, limits in $\mathbf{Cat}$ are easy to compute via $[\mathcal{A},\{F,G\}]\cong[\mathcal{P},\mathbf{Cat}](F,[\mathcal{A},G-])\cong[\mathcal{A},[\mathcal{P},\mathbf{Cat}](F,G)]$ and in particular, taking $\mathcal{A}=\mathbf{1}$ gives us the objects of the category $\{F,G\}$ and $\mathcal{A}=\mathbf{2}$ gives us the morphisms. On the contrary, colimits in $\mathbf{Cat}$ are not straightforward (except than their property $F\ast G\cong G\ast F$).
Notice that like ordinary limits are defined, via representability, in terms of limits in $\mathbf{Set}$, we can define weighted limits in terms of limits of representables in $\mathbf{Cat}$: $\mathcal{K}(A,\{F,G\})\cong\{F,\mathcal{K}(A,G-)\},\quad\mathcal{K}(F\ast ,G,A)\cong\{F,\mathcal{K}(G-,A)\}.$ On the other hand, if the weights are representables, via Yoneda lemma we get $\{\mathcal{P}(P,-),G\}\cong GP, \qquad \mathcal{P}(-,P)\ast G\cong GP.$
The main result for general $\mathcal{V}$-completeness in Kelly’s book says that a $\mathcal{V}$-enriched category is complete if and only if it admits all conical limits (equivalently, products and equalizers) and cotensor products. Explicitly, conical limits are those with weight the constant $\mathcal{V}$-functor $\Delta I$, whereas cotensors are those where the domain enriched category $\mathcal{P}$ is the unit category $\mathbf{1}$, hence the weight and the diagram are determined by objects in $\mathcal{V}$ and $\mathcal{K}$ respectively. Once again, for $\mathcal{V}$=$\mathbf{Cat}$ an elementary description of both limits is possible.
Notice that when a 2-category admits tensor products of the form $\mathbf{2}\ast A$, then the 2-dimensional universal property follows from the 1-dimensional for every limit, because of conservativity of the functor $\mathbf{Cat}_0(\mathbf{2},-)$ and the definition of tensors. Moreover, the former also implies that the category $\mathbf{2}$ is a strong generator in $\mathbf{Cat}$, hence the existence of only the cotensor $\{\mathbf{2},B\}$ along with conical limits in a 2-category $\mathcal{K}$ is enough to deduce 2-completeness.
$\mathbf{Cat}$ itself has cotensor and tensor products, given by $\{\mathcal{A},\mathcal{B}\}=[\mathcal{A},\mathcal{B}]$ and $\mathcal{A}\ast\mathcal{B}=\mathcal{A}\times\mathcal{B}$. It is ultimately also cocomplete, all colimits being constructed from tensors and ordinary colimits in $\mathbf{Cat}_0$ (which give the conical limits in $\mathbf{Cat}$ by the existence of the cotensor $[\mathbf{2},B]$).
If we were to make use of ends and coends, the explicit construction of an arbitrary 2-(co)limit in $\mathcal{K}$ as the (co)equalizer of a pair of arrows between (co)products of (co)tensors coincides with $\{F,G\}=\int_K \{FK,GK\}, \qquad F\ast G=\int^K FK\ast GK.$ Such an approach simplifies the proofs of many useful properties of limits and colimits, such as $\{F,\{G,H\}\}\cong\{F\ast G,H\},\;\;(G\ast F)\ast H\cong F\ast(G\ast H)$ for appropriate 2-functors.
Famous finite 2-limits
The paper provides the description of some important classes of limits in 2-categories, essentially by exhibiting the unit of the defining representation for each particular case. A table which summarizes the main examples included is the following:
Let’s briefly go through the explicit construction of an inserter in a 2-category $\mathcal{K}$. The weight and diagram shape are as in the first line of the above table, and denote by $B\overset{f}\underset{g}{\rightrightarrows}C$ the image of the diagram in $\mathcal{K}$. The standard technique is to identify the form of objects and morphisms of the functor 2-category $[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(A,G-))$, and then state both aspects of the universal property.
An object is a 2-natural transformation $\alpha:F\Rightarrow\mathcal{K}(A,G-)$ with components $\alpha_\bullet:1\to\mathcal{K}(A,B)$ and $\alpha_\star:\mathbf{2}\to\mathcal{K}(A,C)$ satisfying the usual naturality condition (2-naturality follows trivially, since $\mathcal{P}$ only has the identity 2-cell). This amounts to the following data:
an 1-cell $A\xrightarrow{\alpha_\bullet}B$, i.e. the object in $\mathcal{K}(A,B)$ determined by the functor $\alpha_\bullet$;
a 2-cell ${\alpha_\star0}\overset{\alpha_\star}{\Rightarrow}{\alpha_\star1}$, i.e. the morphism in $\mathcal{K}(A,C)$ determined by the functor $\alpha_\star$;
properties, which make the 1-cells $\alpha_\star0,\alpha_\star1$ factorize as $\alpha_\star0=A\xrightarrow{\alpha_\bullet}B\xrightarrow{f}C$ and $\alpha_\star1=A\xrightarrow{\alpha_\bullet}B\xrightarrow{g}C$.
We can encode the above data by a diagram $\begin{matrix} & B & \\ {}^{\alpha_\bullet} \nearrow && {\searrow}^f \\ A\; & \Downarrow{\alpha_\star}& \quad C. \\ {}_{\alpha_\bullet} \searrow && \nearrow_g \\ & B & \\ \end{matrix}$ Now a morphism is a modification $m:\alpha\Rrightarrow\beta$ between two objects as above. This has components
$m_\bullet:\alpha_\bullet\Rightarrow\beta_\bullet$ in $\mathcal{K}(A,B)$;
$m_\star:\alpha_\star\Rightarrow\beta_\star$ given by 2-cells $m_\star^0:\alpha_\star0\Rightarrow{\beta_\star0}$ and $m_\star^1:\alpha_\star1\Rightarrow\beta_\star1$ in $\mathcal{K}(A,C)$ satisfying naturality $m^1_\star\circ\alpha_\star=\beta_\star\circ m^0_\star$.
The modification condition $m^0_\star=f\cdot m_\bullet$ and $m^1_\star=g\cdot m_\bullet$ gives the components of $m_\star$ as whiskered composites of $m_\bullet$. We can thus express such a modification as a 2-cell $m_\bullet$ satisfying $gm_\bullet\circ\alpha_\star=fm_\bullet\circ\beta_\star$ (graphically expressed by pasting $m_\bullet$ accordingly to the sides of $\alpha_\star,\beta_\star$).
This encoding simplifies the statement of the universal property for $\{F,G\}$, as the object of in $\mathcal{K}$ through which any natural transformation and modification uniquely factorize in an appropriate way (in fact, through the unit $\xi$). A very similar process can be followed for the identification of the other classes of limits. As an illustration, let’s consider some of these limits in the 2-category $\mathbf{Cat}$.
The inserter of two functors $F,G:\mathcal{B}\to\mathcal{C}$ is a category $\mathcal{A}$ with objects pairs $(B,h)$ where $B\in\mathcal{B}$ and $h:FB\to GB$ in $\mathcal{C}$. A morphism $(B,h)\to(B',h')$ is an arrow $f:B\to B'$ in $\mathcal{B}$ such that the following diagram commutes: $\begin{matrix} FB & \overset{Ff}{\longrightarrow} & FB' \\ {}_h\downarrow && \downarrow_{h'} \\ FB & \underset{Gh}{\longrightarrow} & GB'. \\ \end{matrix}$ The functor $\alpha_\bullet=P:\mathcal{A}\to\mathcal{B}$ is just the forgetful functor, and the natural transformation is given by $(\alpha_\star)_{(B,h)}=h$.
The comma-object of two functors $F,G$ is precisely the comma category. If the functors have also the same domain, their inserter is a subcategory of the comma category.
The equifier of two natural transformations $\phi^1,\phi^2:F\Rightarrow G:\mathcal{B}\to\mathcal{C}$ is the full subcategory $\mathcal{A}$ of $\mathcal{B}$ over all objects $B$ such that $\phi^1_B=\phi^2_B$ in $\mathcal{C}$.
There is a variety of constructions of new classes of limits from given ones, coming down to the existence of endo-identifiers, inverters, iso-inserters, comma-objects, iso-comma-objects, lax/ oplax/pseudo limits of arrows and the cotensors $\{\mathbf{2},K\}$, $\{\mathbf{I},K\}$ out of inserters, equifiers and binary products in the 2-category $\mathcal{K}$. Along with the substantial construction of arbitrary cotensors out of these three classes, P(roducts)I(nserters)E(quifiers) limits are established as essential tools, also relatively to categories of algebras for 2-monads. Notice that equalizers are `too tight’ to fit in certain 2-categories of importance such as $\mathbf{Lex}$.
Weaker notions of limits in 2-categories
The concept of a weighted 2-limit strongly depends on the specific structure of the 2-category $[\mathcal{P},\mathbf{Cat}]$ of 2-functors, 2-natural transformations and modifications, for the 2-categories $\mathcal{P}$ and $\mathbf{Cat}$. If we alter this structure by considering lax natural transformations or pseudonatural transformations, we obtain the definition of the lax limit $\{F,G\}_l$ and pseudo limit $\{F,G\}_p$ as the representing objects for the 2-functors $\begin{matrix} Lax[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(-,G)):\mathcal{K}^\op\to\mathbf{Cat} \\ Psd[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(-,G)):\mathcal{K}^\op\to\mathbf{Cat}. \end{matrix}$ Notice that the functor categories $Lax[\mathcal{P},\mathcal{L}]$ and $Psd[\mathcal{P},\mathcal{L}]$ are 2-categories whenever $\mathcal{L}$ is a 2-category, hence the defining isomorphisms are again between categories as before.
An important remark is that any lax or pseudo limit in $\mathcal{K}$ can be in fact expressed as a `strict’ weighted 2-limit. This is done by replacing the original weight with its image under the left adjoint of the incusion functors $[\mathcal{P},\mathbf{Cat}]\hookrightarrow Lax[\mathcal{P},\mathbf{Cat}]$, $[\mathcal{P},\mathbf{Cat}]\hookrightarrow Psd[\mathcal{P},\mathbf{Cat}]$. The opposite does not hold: for example, inserters and equifiers are neither lax not pseudo limits.
We can relax the notion of limits in 2-categories even further, and define the bilimit $\{F,G\}_b$ of 2-functors $F$ and $G$ as the representing object up to equivalence: $\mathcal{K}(A,\{F,G\}_b)\simeq Psd[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(A,G)).$ This is of course a particular case of general bilimits in bicategories, for which $\mathcal{P}$ and $\mathcal{K}$ are requested to be bicategories and $F$ and $G$ homomorphism of bicategories. The above equivalence of categories expresses a birepresentation of the homomorphism $Hom[\mathcal{P},\mathbf{Cat}](F,\mathcal{K}(-,G)):\mathcal{K}^op\to\mathbf{Cat}$.
Evidently, bilimits (firstly introduced by Ross Street) may exist even when pseudo limits do not, since they require an equivalence rather than isomorphism of hom-categories. The following two results sum up the conditions ensuring whether a 2-category has all lax, pseudo and bilimits.
A 2-category with products, inserters and equifiers has all lax and pseudo limits (whereas it may not have all strict 2-limits).
A 2-category with biproducts, biequalizers and bicotensors is bicategorically complete. Equivalently, it admits all bilimits if and only if for all 2-functors $F:\mathcal{P}\to\mathbf{Cat}$, $G:\mathcal{P}\to\mathcal{K}$ from a small ordinary category $\mathcal{P}$, the above mentioned birepresentation exists.
Street’s construction of an arbitrary bilimit requires a descent object of a 3-truncated bicosimplicial object in $\mathcal{K}$. An appropriate modification of the arguments exhibits lax and pseudo limits as PIE limits.
These weaker forms of limits in 2-categories are fundamental for the theory of 2-categories and bicategories. Many important constructions such as the Eilenberg-Moore object as well as the Grothendieck construction on a fibration, arise as lax/oplax limits. They are also crucial in 2-monad theory, for example when studying categories of (strict) algebras with non-strict (pseudo or even lax/oplax) morphisms, which are more common in nature.
A new Spanish language mathematical magazine has been launched: universo.math. Hispanophones should check out the first issue! There are some very interesting looking articles which cover areas from art through politics to research-level mathematics.
The editor-in-chief is my mathematical brother Jacob Mostovoy and he wants it to be a mix of Mathematical Intellingencer, Notices of the AMS and the New Yorker, together with less orthodox ingredients; the aim is to keep the quality high.
Besides Jacob, the contributors to the first issue that I recognise include Alberto Verjovsky, Ernesto Lupercio and Edward Witten, so universo.math seems to be off to a high quality start.
Guest post by Bruce Bartlett
The following is the greatest math talk I’ve ever watched!
- Etienne Ghys (with pictures and videos by Jos Leys), Knots and Dynamics, ICM Madrid 2006. [See below the fold for some links.]
I wasn’t actually at the ICM; I watched the online version a few years ago, and the story has haunted me ever since. Simon and I have been playing around with some of this stuff, so let me share some of my enthusiasm for it!
The story I want to tell here is how, via modular flow of lattices in the plane, certain matrices in $\SL(2,\mathbb{Z})$ give rise to knots in the 3-sphere less a trefoil knot. Despite possibly sounding quite scary, this can be easily explained in an elementary yet elegant fashion.
As promised above, here are some links related to Ghys’ ICM talk.
- Video — watch this now!
- Slides — the (ungarbled) slides for the talk
- Web version with animations — a great summary
- Article — the accompanying article with details
- More pictures — from Jos Leys’s webpage
- More animations — from Jos Leys’s webpage
- Article by Dana Lyons — a great summary with historical context and interviews
I’m going to focus on the last third of the talk — the modular flow on the space of lattices. That’s what produced the beautiful picture above (credit for this and other similar pics below goes to Jos Leys; the animation is Simon’s.)
Lattices in the plane
For us, a lattice is a discrete subgroup of $\mathbb{C}$. There are three types: the zero lattice, the degenerate lattices, and the nondegenerate lattices:
Given a lattice $L$ and an integer $n \geq 4$ we can calculate a number — the Eisenstein series of the lattice: $G_{n}(L) = \sum _{\omega \in L, \omega \neq 0} \frac{1}{\omega ^{n}}.$ We need $n \geq 3$ for this sum to converge. For, roughly speaking, we can rearrange it as a sum over $r$ of the lattice points on the boundary of a square of radius $r$. The number of lattice points on this boundary scales with $r$, so we end up computing something like $\sum _{r \geq 0} \frac{r}{r^{n}}$ and so we need $n \geq 3$ to make the sum converge.
Note that $G_{n}(L)$ = 0 for $n$ odd since every term $\omega$ is cancelled by the opposite term $-\omega$. So, the first two nontrivial Eisenstein series are $G_{4}$ and $G_{6}$. We can use them to put `Eisenstein coordinates’ on the space of lattices.
Theorem: The map $\begin{aligned} \{ \text{lattices} \} &\rightarrow \mathbb{C}^{2} \\ L & \mapsto (G_{4} (L), \, G_{6}(L)) \end{aligned}$ is a bijection.
The nicest proof is in Serre’s A Course in Arithmetic, p. 89. It is a beautiful application of the Cauchy residue theorem, using the fact that $G_{4}$ and $G_{6}$ define modular forms on the upper half plane $H$. (Usually, number theorists set up their lattices so that they have basis vectors $1$ and $\tau$ where $\tau \in H$. But I want to avoid this ‘upper half plane’ picture as far as possible, since it breaks symmetry and mystifies the geometry. The whole point of the Ghys picture is that not breaking the symmetry reveals a beautiful hidden geometry! Of course, sometimes you need the ‘upper half plane’ picture, like in the proof of the above result.)
Lemma: The degenerate lattices are the ones satisfying $20 G_{4}^{3} - 49G_{6}^{2} = 0$.
Let’s prove one direction of this lemma — that the degenerate lattices do indeed satisfy this equation. To see this, we need to perform a computation. Let’s calculate $G_{4}$ and $G_{6}$ of the lattice $\mathbb{Z} \subset \mathbb{C}$. Well, $G_{4}(\mathbb{Z}) = \sum _{n \neq 0} \frac{1}{n^{4}} = 2 \zeta (4) = 2 \frac{\pi ^{4}}{90}$ where we have cheated and looked up the answer on Wikipedia! Similarly, $G_{6}(\mathbb{Z}) = 2 \frac{\pi ^{6}}{945}$.
So we see that $20 G_{4}(\mathbb{Z})^{3} - 49 G_{6}(\mathbb{Z})^{2} = 0$. Now, every degenerate lattice is of the form $t \mathbb{Z}$ where $t \in \mathbb{C}$. Also, if we transform the lattice via $L \mapsto t L$, then $G_{4} \mapsto t^{-4} G_{4}$ and $G_{6} \mapsto t^{-6} G_{6}$. So the equation remains true for all the degenerate lattices, and we are done.
Corollary: The space of nondegenerate lattices in the plane of unit area is homeomorphic to the complement of the trefoil in $S^{3}$.
The point is that given a lattice $L$ of unit area, we can scale it $L \mapsto \lambda L$, $\lambda \in \mathbb{R}^{+}$ until $(G_{4}(L), G_{6}(L))$ lies on the 3-sphere $S^{3} = \{ (z,w) : |z|^{2} + |w|^{2} = 1\} \subset \mathbb{C}^{2}$. And the equation $20 z^{3} - 49 w^{2} = 0$ intersected with $S^{3}$ cuts out a trefoil knot… because it is “something cubed plus something squared equals zero”. And the lemma above says that the nondegenerate lattices are precisely the ones which do not satisfy this equation, i.e. they represent the complement of this trefoil.
Since we have not divided out by rotations, but only by scaling, we have arrived at a 3-dimensional picture which is very different to the 2-dimensional moduli space (upper half-plane divided by $\SL(2,\mathbb{Z})$) picture familiar to a number theorist.
The modular flow
There is an intriguing flow on the space of lattices of unit area, called the modular flow. Think of $L$ as sitting in $\mathbb{R}^{2}$, and then act on $\mathbb{R}^{2}$ via the transformation $\left ( \begin{array}{cc} e^{t} & 0 \\ 0 & e^{-t} \end{array} \right ),$ dragging the lattice $L$ along for the ride. (This isn’t just some formula we pulled out the blue — geometrically this is the ‘geodesic flow on the unit tangent bundle of the modular orbifold’.)
We are looking for periodic orbits of this flow.
“Impossible!” you say. “The points of the lattice go off to infinity!” Indeed they do… but disregard the individual points. The lattice itself can ‘click’ back into its original position:
How are we to find such periodic orbits? Start with an integer matrix $A = \left ( \begin{array}{cc} a & b \\ c & d \end{array}\right ) \in \SL(2, \mathbb{Z})$ and assume $A$ is hyperbolic, which simply means $|a + d| \geq 2$. Under these conditions, we can diagonalize $A$ over the reals, so we can find a real matrix $P$ such that $P A P^{-1} = \pm \left ( \begin{array}{cc} e^{t} & 0 \\ 0 & e^{-t} \end{array} \right )$ for some $t \in \mathbb{R}$. Now set $L \coloneqq P(\mathbb{Z}^{2})$. We claim that $L$ is a periodic orbit of period $t$. Indeed: $\begin{aligned} L_{t} &= \left ( \begin{array}{cc} e^{t} & 0 \\ 0 & e^{-t} \end{array} \right ) P (\mathbb{Z}^{2}) \\ &= \pm PA (\mathbb{Z}^{2}) \\ &= \pm P (\mathbb{Z}^{2}) \\ &= L. \end{aligned}$ We have just proved one direction of the following.
Theorem: The periodic orbits of the modular flow are in bijection with the conjugacy classes of hyperbolic elements in $\SL(2, \mathbb{Z})$.
These periodic orbits produce fascinating knots in the complement of the trefoil! In fact, they link with the trefoil (the locus of degenerate lattices) in fascinating ways. Here are two examples, starting with different matrices $A \in \SL(2, \mathbb{Z})$.
The trefoil is the fixed orange curve, while the periodic orbits are the red and green curves respectively.
Ghys proved the following two remarkable facts about these modular knots.
- The linking number of a modular knot with the trefoil of degenerate lattices equals the Rademacher function of the corresponding matrix in $\SL(2, \mathbb{Z})$ (the change in phase of the Dedekind eta function).
- The knots occuring in the modular flow are the same as those occuring in the Lorenz equations!
Who would have thought that lattices in the plane could tell the weather!!
I must say I have thought about many aspects of these closed geodesics, but it had never crossed my mind to ask which knots are produced. – Peter Sarnak
Guest post by Sean Moss
In this post I shall discuss the paper “On a Topological Topos” by Peter Johnstone. The basic problem is that algebraic topology needs a “convenient category of spaces” in which to work: the category $\mathcal{T}$ of topological spaces has few good categorical properties beyond having all small limits and colimits. Ideally we would like a subcategory, containing most spaces of interest, which is at least cartesian closed, so that there is a useful notion of function space for any pair of objects. A popular choice for a “convenient category” is the full subcategory of $\mathcal{T}$ consisting of compactly-generated spaces. Another approach is to weaken the notion of topological space, i.e. to embed $\mathcal{T}$ into a larger category, hopefully with better categorical properties.
A topos is a category with enough good properties (including cartesian closedness) that it acts like the category of sets. Thus a topos acts like a mathematical universe with ‘sets’, ‘functions’ and its own internal logic for manipulating them. It is exciting to think that if a “convenient topos of spaces” could be found, then its logical aspects could be applied to the study of its objects. The set-like nature of toposes might make it seem unlikely that this can happen. For instance, every topos is balanced, but the category of topological spaces is famously not. However, sheaves (the objects in Grothendieck toposes) originate from geometry and already behave somewhat like generalized spaces.
I shall begin by elaborating on this observation about Grothendieck toposes, and briefly examine some previous attempts at a “topological topos”. I shall explore the idea of replacing open sets with convergent sequences and see how this leads us to Johnstone’s topos. Finally I shall describe how Johnstone’s topos is a good setting for homotopy theory.
I would to thank Emily Riehl for organizing the Kan Extension Seminar and for much useful feedback. Thanks go also to the other seminar participants for being a constant source of interesting perspectives, and to Peter Johnstone, for his advice and for writing this incredible paper.
Are there toposes of spaces?
We shall need to be flexible about what we mean by “space”. For the rest of this post I shall try to use the term “topological space” in a strict technical sense (a set of points plus specified open sets), whereas “space” will be a nebulous concept. The idea is that spaces have existence regardless of having been implemented as a topological space or not, and may naturally have more (or perhaps less) structure. Topology merely forms one setting for their rigorous study. In topology we can only detect the topological properties of spaces. For example, $\mathbb{R}$ and $(0,1)$ are isomorphic as topological spaces, but they are far from being the same space: consider how different their implementations as, say, metric spaces are. Some spaces are naturally considered as having algebraic or smooth structure. The type of question one wishes to ask about a space will bear upon the type of object as which it should be implemented.
An extremely important class of toposes consists of the Grothendieck toposes, which are categories of sheaves on a site. A site is a small category together with a Grothendieck coverage (also known as a Grothendieck topology). Informally, the Grothendieck coverage tells us how some objects can be “covered” by maps coming out of other objects. In the special case where the site is a topological space, the objects are open sets and the coverage tells us that an open set is covered by the inclusions of any family of open sets whose union is all of that open set. A sheaf on a site is then a contravariant $\mathrm{Set}$-valued functor on the underlying category (a presheaf) which satisfies a “unique patching” condition with respect to each covering sieve.
In the following two senses, a Grothendieck topos always behaves like a category of spaces:
(A) One way to describe the properties of a space is to consider the maps into that space. This is the idea behind the homotopy groups, where we consider (homotopy classes of) maps from the $n$-sphere into a space. Given a small category $\mathcal{C}$, each object is determined by knowing all the arrows into it and how these arrows “restrict” along other arrows, i.e. precisely the data of the representable presheaf. A non-representable presheaf can be viewed as a generalized object of $\mathcal{C}$, which is testable by the ‘classical’ objects of $\mathcal{C}$: it is described entirely by what the maps into it from objects of $\mathcal{C}$ ought to be. If we have in mind that $\mathcal{C}$ is some category of spaces, with some sense in which some spaces are covered by families of maps out of other spaces, (i.e. we have a Grothendieck coverage), then we should be able to patch maps into these generalized spaces together. So the topos of sheaves on this site is a setting in which we may be able to implement certain spaces, if we wish to study their properties testable by objects of $\mathcal{C}$.
(B) The category of presheaves on a small category $\mathcal{C}$ is its free cocompletion. Intuitively, it is the category of objects obtained by formally gluing together objects of $\mathcal{C}$. The use of the word “gluing” is itself a spatial metaphor. CW-complexes are built out of gluing together cells - simplicial sets are instructions for carrying out this gluing. Manifolds are built from gluing together open subsets of Euclidean space. Purely formal ‘gluing’ is not quite sufficient: the Yoneda embedding of $\mathcal{C}$ into its presheaves typically does not preserve any colimits already in $\mathcal{C}$. But if $\mathcal{C}$ is a category of spaces, its objects are not neutral with respect to each other: there may be a suitable Grothendieck coverage on $\mathcal{C}$ which tells us how some objects can cover others. The topos of sheaves is then the category of objects obtained by formally gluing objects of $\mathcal{C}$ in a way that respects these coverings. This is strongly connected with the preservation of colimits by the embedding of $\mathcal{C}$ into the sheaves. Colimits in the presheaf topos are constructed pointwise; to get the sheaf colimit one applies the reflection into the category of sheaves (“sheafification”) to the presheaf colimit. The more covers imposed on $\mathcal{C}$, the more work is done by the sheafification, so the closer we end up to the original colimit.
Are there toposes in topology?
It is far from clear that we can choose a site for which the space-like behaviour of sheaves accords with the usual topological intuition. If we want to use a topological topos for homotopy theory, then ideally it should contain objects that we can recognize as the CW-complexes, and we should be able to construct them via more or less the usual colimits.
Attempt 1: The “gros topos” of Giraud
The idea is to take sheaves on the ‘site’ of topological spaces, where covers are given by families of open inclusions of subspaces whose union is the whole space. We do not automatically get a topos unless the site is small, so instead take some small, full subcategory $\mathcal{C}$ of $\mathcal{T}$, which is closed under open subspaces. The gros topos is the topos of sheaves for this site.
The Yoneda embedding exhibits $\mathcal{C}$ as a subcategory, and in fact we can ‘embed’ $\mathcal{T}$ via the functor $X \mapsto \hom_\mathcal{T}(-,X)$, this will be full and faithful on a fairly large subcategory. By (B) one may like to consider the gros topos as the category of spaces glued together from objects of $\mathcal{C}$. This turns out not to be useful, since the site does not have enough covers for colimits to agree with those in $\mathcal{T}$. Moreover the site is so large that calculations are difficult.
Attempt 2: Lawvere’s topos
We use observation (A). Motivated by the use of paths in homotopy theory, we take $M$ to be the full subcategory of $\mathcal{T}$ whose only object is the closed unit interval $I$. So $M$ is the monoid of continuous endomorphisms of $I$. Lawvere’s topos $\mathcal{L}$ is the topos of sheaves on $M$ with respect to the canonical Grothendieck coverage (the largest Grothendieck coverage on $M$ for which $\hom_M(-,I)$ is a sheaf).
Then an object $X$ of $\mathcal{L}$ is a set $X(I)$ of paths, together with, for any continuous $\gamma\colon I \to I$, a reparametrization map $X(\gamma) \colon X(I) \to X(I)$, where this assignment is functorial. The points of such a space are given by natural transformations $1 \to X$, i.e. ‘constant paths’ or paths which are fixed by every reparametrization. We can see which point a path visits at time $t$ by reparametrizing that path by the constant map $I \to I$ with value $t$. A word of caution: a given object in $\mathcal{L}$ may have distinct paths which agree on points for all time.
This site is much easier to calculate with than the gros site (once we have a handle on the canonical coverage). Again there is a functor $P \colon \mathcal{T} \to \mathcal{L}$ given by $X \mapsto \hom_\mathcal{T}(I,X)$, which is full and faithful on a fairly large subcategory (including CW-complexes). However, it is still the case that the site could do with more covers: the functor $P$ does not preserve all the colimits used to build up CW-complexes. By observation (B), an object of $\mathcal{L}$ is obtained by gluing together copies of the unit interval $I$, so it is possible to construct the circle $S^1$ out of copies of $I$, but we cannot do this in the usual way. The coequalizer of $I$ by its endpoints in $\mathcal{L}$ is not $S^1$, but a “signet-ring”: it is a circle with a ‘lump’, through which a path can cross only if waits there for non-zero time. We cannot solve this problem by adding in more covers, because the coverage is already canonical (adding in more covers will evict the representable $\hom_\mathcal{T}(-,I)$ from the topos).
The key idea in Johnstone’s topos is to replace paths with convergent sequences. Given a topological space $X$, a convergent sequence in $X$ is a function from $a \colon \mathbb{N}\cup\{\infty\} \to X$ such that whenever $U \subseteq X$ is an open set containing $a_\infty$, then there exists an $N$ such that $a_n \in U$ for all $n \gt N$. The convergent sequences are precisely the continuous maps out of $\mathbb{N}\cup\{\infty\}$ when we give it the topology that makes it the one-point compactification of the discrete space $\mathbb{N}$ - we denote this topological space by $\mathbb{N}^+$.
Convergent sequences as primitive
It is a basic theorem in general topology that, given a function $f\colon X \to Y$ between topological spaces, if it is continuous then it preserves convergent sequences. The converse is not true for general topological spaces, but it is true whenever $Y$ is a sequential space. Given a topological space $X$, A set $U \subseteq X$ is sequentially open if for any convergent sequence $(a_n)$, with $a_\infty \in U$, $(a_n)$ is eventually in $U$. (Clearly any open subset is sequentially open.) A topological space is then said to be sequential if all of its sequentially open sets are open. The sequential spaces include all first-countable spaces and in fact they can be characterized as the topological quotients of metrizable spaces, so they certainly include all CW-complexes.
The notion of convergent sequence is arguably more intuitive than that of open set. For example, each convergent sequence gives you concrete data about the nearness of some family of points to another point, whereas open sets only give you such data when the topology (or at least a neighbourhood basis) is considered as a whole. It would be compelling to define a continuous function as one that preserves convergent sequences. This motivates the study of subsequential spaces.
A subsequential space consists of a set $X$ (of points) and family of “convergent sequences”: a specified subset of the set of functions $\mathbb{N}\cup\{\infty\} \to X$, such that:
- for every point $x \in X$, the constant sequence $(x)$ converges to $x$;
- if $(x_n)$ converges to $x$, then so does every subsequence of $(x_n)$;
- if $(x_n)$ is a sequence and $x$ is a point such that every subsequence of $(x_n)$ contains a (sub)subsequence converging to $x$, then $(x_n)$ converges to $x$.
The third axiom is the general form of intertwining two or more sequences with the same limit or changing a finite initial segment of a sequence. Note that there is no ‘Hausdorff‘-style condition on the convergent sequences: a sequence may converge to more than one limit. A continuous map between subsequential spaces $X \to Y$ is a function from the points of $X$ to the points of $Y$ that preserves convergence of sequences.
The axioms above are all true of the set of convergent sequences which arise from a topology on a set. In fact, this process gives a full and faithful embedding of sequential spaces into subsequential spaces. Thus sequential spaces live inside both topological and subsequential spaces. They are coreflective in the former and reflective in the latter: given a topological space $X$, its sequentially open sets constitute a new (finer) topology; given a subsequential space $Y$, we can consider the “sequentially open” sets with respect to its convergent sequences, and then take all convergent sequences in the resulting (sequential) topological space. Observe that the sense of the adjunction in each case comes from the fact that we either throw in more open sets - so there is a natural map $(X)_\text{seq} \to X$, or throw in more convergent sequences - so there is a natural map $Y \to (Y)_\text{seq}$.
In the following I shall denote the category of sequential spaces by $\mathcal{F}$ and that of subsequential spaces by $\mathcal{F}'$.
Johnstone’s topos
Let $\Sigma$ be the full subcategory of $\mathcal{T}$ on the objects $1$ (the singleton space) and $\mathbb{N}^+$ (the one-point compactification of the discrete space of natural numbers). The arrows in this category can be described without topology as well: as functions, the maps $\mathbb{N}^+ \to \mathbb{N}^+$ are the eventually constant ones and the ones that “tend to infinity”.
Given an infinite subset $T \subseteq \mathbb{N}$, let $f_T$ denote the unique order-preserving injection $\mathbb{N}^+ \to \mathbb{N}^+$ whose image is $T \cup \{\infty\}$. One can check that there is a Grothendieck coverage $J$ on $\Sigma$ where $1$ is covered by only the maximal sieve, and where $\mathbb{N}^+$ is covered by any sieve $R$ such that:
- $R$ contains all of the points $n \colon 1 \to \mathbb{N}^+$, $n \in \mathbb{N}^+$.
- For any infinite subset $T \subseteq \mathbb{N}$ there exists an infinite subset $T' \subseteq T$ such that $f_{T'} \in R$.
The topos $\mathcal{E}$ is then defined to be $\mathrm{Sh}(\Sigma,J)$.
The objects in our topos are a slight generalization of subsequential space. If $X \in \mathcal{E}$, then $X(1)$ is its set of points, and $X(\mathbb{N}^+)$ is its set of convergent sequences. Each point $n \colon 1 \to \mathbb{N}^+$ induces a ‘projection map’ $X(n)\colon X(\mathbb{N}^+) \to X(1)$, giving you the point of the sequence at time $n$. The unique map $\mathbb{N}^+ \to 1$ induces a map $X(1) \to X(\mathbb{N}^+)$, which sends each point to a canonical choice of constant sequence. Note that there may be more than one convergent sequence with the same points, thus it may be helpful to think of $X(\mathbb{N}^+)$ as the set of proofs of convergence for sequences.
Clearly we can embed $\mathcal{F}'$, the subsequential spaces, into $\mathcal{E}$: the points are the same, and the convergence proofs are just the convergent sequences. The first two axioms are satisfied because of the equations that hold in $\Sigma$. The third axiom is encoded into the coverage. Conversely, any object $X$ of $\mathcal{E}$ for which the projection maps $X(n)\colon X(\mathbb{N}^+) \to X(1)$, $n \in \mathbb{N}^+$ are jointly injective is isomorphic to one coming from a subsequential space. There is a functor $H \colon \mathcal{T} \to \mathcal{E}$ sending $X \mapsto \hom_\mathcal{T}(-,X)$, and it is indeed sheaf-valued since it is equal to the composite of the coreflection $\mathcal{T} \to \mathcal{F}$ with the inclusions $\mathcal{F} \to \mathcal{F}' \to \mathcal{E}$. In fact, the Grothendieck coverage defining $\mathcal{E}$ is canonical, so it is the largest for which this functor is well-defined.
We can use observation (B) to think of $\mathcal{E}$ as all spaces constructed from gluing sequences together. It is just about possible that we could have motivated the construction of $\mathcal{E}$ this way: classically, any sequential space $X$ is the quotient in $\mathcal{T}$ of a metrizable space, which may be taken to be a disjoint union of copies of $\mathbb{N}^+$ - one for every convergent sequence in $X$. Compare this with the canonical representation of a presheaf as a colimit of representables (one for each of its elements).
Colimits
It turns out that $\mathcal{F}'$ is the subcategory of $\neg\neg$-separated objects in $\mathcal{E}$, hence it is a reflective subcategory. $\mathcal{F}$ is reflective in $\mathcal{F}'$, hence it is also reflective in $\mathcal{E}$. In particular, all limits in $\mathcal{F}$ are preserved by the inclusion into $\mathcal{E}$. Take some caution, however, since products do not agree with those in $\mathcal{T}$: one has to take the sequential coreflection of the topological product. This is only a minor issue; having to modify the product arises in other “convenient categories” such as compactly-generated spaces.
The colimits in $\mathcal{F}$ do agree with those in $\mathcal{T}$ because it is a coreflective subcategory. Surprisingly, the inclusion $\mathcal{F} \to \mathcal{E}$ preserves many of these colimits.
Theorem Let $X$ be a sequential space, and $\{U_\alpha \mid \alpha \in A\}$ an open cover of $X$. Then the obvious colimit diagram in $\mathcal{F}$: $\begin{matrix} U_\alpha\cap U_\beta & \rightarrow & U_\alpha & & \\ & \searrow & & \searrow & \\ U_\beta \cap U_\gamma & \rightarrow & U_\beta & \rightarrow & X \\ \vdots & \searrow & & \nearrow & \\ & & U_\gamma & & \\ & & \vdots & & \end{matrix}$ is preserved by the embedding $\mathcal{F} \to \mathcal{E}$.
Proof The recipe for this sort of theorem is: take the colimit in presheaves, show that the comparison map is monic, then show that it is $J$-dense, for then it will exhibit $X$ as the colimit upon reflecting into the topos $\mathcal{E}$. The colimit $L$ in presheaves is calculated “objectwise”, so $L$ has the same points of $X$, but only those convergent sequences which are entirely within some $U_\alpha$ (hence the comparison map $L \to X$ is monic). To sheafify, we need to add in all those sequences $x \in X(\mathbb{N}^+)$ which are locally in $L$, i.e. for which the sieve $\{f \colon ? \to \mathbb{N}^+ \mid X(f)(x) \in L(?) \}$ in $\Sigma$ is $J$-covering. For any $x \in X(\mathbb{N}^+)$, this sieve clearly contains all the points $1 \to \mathbb{N}^+$. But $x$ must also be eventually within one of the $U_\alpha$, so the second condition for the covering sieves is also satisfied. $\square$
There are several other colimit preservation results one can talk about (with similar proofs to the above). The amazing consequence of these is that the colimits used to construct CW-complexes are all preserved by the embedding $\mathcal{F} \to \mathcal{E}$. Thus classical homotopy theory embeds into $\mathcal{E}$ and we have successfully found a topos of spaces which agrees with the classical theory.
Geometric realization
Let $\Delta$ be the category of non-zero finite ordinals and order-preserving maps. Then objects of the presheaf category $[\Delta^\mathrm{op},\mathrm{Set}]$ are known as simplicial sets.
Theorem $[\Delta^\mathrm{op},\mathrm{Set}]$ is the classifying topos for intervals in $\mathrm{Set}$-toposes.
The closed unit interval $[0,1]$ is sequential and is in fact an interval (a totally ordered object with distinct top and bottom elements). Thus it corresponds to a geometric morphism $\mathcal{E} \to [\Delta^\mathrm{op},\mathrm{Set}]$ (an adjunction $(f^\star \dashv f_\star)$ with $f^\star$ left-exact).
Theorem If $S \in [\Delta^\mathrm{op},\mathrm{Set}]$ is a simplicial set, then $f^\star(S)$ is its geometric realization, considered as a sequential space and hence as an object of $\mathcal{E}$. If $X \in \mathcal{E}$ is a sequential space, then $f_\star(E)$ is its singular complex.
The usual geometric realization is not left-exact if considered to take values in $\mathcal{T}$, one must choose a “convenient subcategory” first, and then there is some work to do in proving it. Here the left-exactness just arises out of the general theory of geometric morphisms. Should we wish to do so, the above method allows us to replace $[0,1]$ with any other object that the internal logic of $\mathcal{E}$ sees as an interval to get a different realization of simplicial sets.
The above is far from a complete survey of “On a Topological Topos”, which contains several more results of interest relating to $\mathcal{E}$ and captures the elegance of using the site $\Sigma$ for calculation - I thoroughly recommend taking a look if you know some topos theory. We have seen enough though to understand that for many spaces the sequential properties align with the topological properties. Unfortunately, $\mathcal{E}$ is yet to receive the attention it deserves.
Guest post by Nils Carqueville and Daniel Murfet
(My university probably isn’t alone in encouraging mathematicians and computer scientists to embrace the idea of “big data”, or in more sober terminology, “data science”. Here, Nils Carqueville and Daniel Murfet introduce their really excellent article on big data in whole-population surveillance. —TL)
In recent years Big Data has become an increasingly relevant topic in the economic sector, for intelligence agencies, and for the sciences.
A particularly far-reaching development made possible by Big Data is that of unprecedented mass surveillance. As Alexander Beilinson, Stefan Forcey, Tom Leinster and others have pointed out, the role that mathematicians and computer scientists play here is central. With this in mind last January we wrote an essay on
stressing some of those aspects of the matter that we think deserve more attention or additional elaboration. We hope this to be a useful contribution to the necessary discussion on modern mass surveillance, and we thank Tom for his efforts in this direction, and for allowing us to post here.
EDIT: New Submission Deadline: 15 April 2014, Notification of Acceptance: 30 April 2014
A friend, John Wigglesworth, has asked me to announce the following conference. Perhaps he’ll let us know if pluralist homotopy type theorists are welcome ;).
I wonder if any progress has been made on my MO ‘multiverse’ question.
CFP: Symposium on the Foundations of Mathematics, Kurt Gödel Research Center, University of Vienna.
Set theory is taken to serve as a foundation for mathematics. But it is well-known that there are set-theoretic statements that cannot be settled by the standard axioms of set theory. The Zermelo-Fraenkel axioms, with the Axiom of Choice (ZFC), are incomplete. The primary goal of this symposium is to explore the different approaches that one can take to the phenomenon of incompleteness.
One option is to maintain the traditional “universe” view and hold that there is a single, objective, determinate domain of sets. Accordingly, there is a single correct conception of set, and mathematical statements have a determinate meaning and truth-value according to this conception. We should therefore seek new axioms of set theory to extend the ZFC axioms and minimize incompleteness. It is then crucial to determine what justifies some new axioms over others.
Alternatively, one can argue that there are multiple conceptions of set, depending on how one settles particular undecided statements. These different conceptions give rise to parallel set-theoretic universes, collectively known as the “multiverse”. What mathematical statements are true can then shift from one universe to the next. From within the multiverse view, however, one could argue that some universes are more preferable than others.
These different approaches to incompleteness have wider consequences for the concepts of meaning and truth in mathematics and beyond. The conference will address these foundational issues at the intersection of philosophy and mathematics. The primary goal of the conference is to showcase contemporary philosophical research on different approaches to the incompleteness phenomenon.
To accomplish this, the conference has the following general aims and objectives:
(1) To bring to a wider philosophical audience the different approaches that one can take to the set-theoretic foundations of mathematics. (2) To elucidate the pressing issues of meaning and truth that turn on these different approaches. (3) To address philosophical questions concerning the need for a foundation of mathematics, and whether or not set theory can provide the necessary foundation
Date and Venue: 7-8 July 2014 - Kurt Gödel Research Center, Vienna
Confirmed Speakers: Sy-David Friedman (Kurt Gödel Research Center for Mathematical Logic), Hannes Leitgeb (Munich Center for Mathematical Philosophy)
Call for Papers: We welcome submissions from scholars (in particular, young scholars, i.e. early career researchers or post-graduate students) on any area of the foundations of mathematics (broadly construed). Particularly desired are submissions that address the role of set theory in the foundations of mathematics, or the foundations of set theory (universe/multiverse dichotomy, new axioms, etc.) and related ontological and epistemological issues. Applicants should prepare an extended abstract (maximum 1500 words) for blind review, and send it to sotfom [at] gmail [dot] com. The successful applicants will be invited to give a talk at the conference and will be refunded the cost of accommodation in Vienna for two days (7-8 July).
Submission Deadline: 31 March 2014
Notification of Acceptance: 30 April 2014
Scientific Committee: Philip Welch (University of Bristol), Sy-David Friedman (Kurt Gödel Research Center), Ian Rumfitt (University of Birmigham), John Wigglesworth (London School of Economics), Claudio Ternullo (Kurt Gödel Research Center), Neil Barton (Birkbeck College), Chris Scambler (Birkbeck College), Jonathan Payne (Institute of Philosophy), Andrea Sereni (Università Vita-Salute S. Raffaele), Giorgio Venturi (Université de Paris VII, “Denis Diderot” - Scuola Normale Superiore)
Organisers: Sy-David Friedman (Kurt Gödel Research Center), John Wigglesworth (London School of Economics), Claudio Ternullo (Kurt Gödel Research Center), Neil Barton (Birkbeck College), Carolin Antos (Kurt Gödel Research Center)
Conference Website: sotfom [dot] wordpress [dot] com
Further Inquiries: please contact
Claudio Ternullo (ternulc7 [at] univie [dot] ac [dot] at)
Neil Barton (bartonna [at] gmail [dot] com)
John Wigglesworth (jmwigglesworth [at] gmail [dot] com)
Nina Otter is a master’s student in mathematics at ETH Zürich who has just gotten into the PhD program at Oxford. She and I are writing a paper on operads and the tree of life.
Anyone who knows about operads knows that they’re related to trees. But I’m hoping someone has proved some precise theorems about this relationship, so that we don’t have to.
By operad I’ll always mean a symmetric topological operad. Such a thing has an underlying ‘symmetric collection’, which in turn has an underlying ‘collection’. A collection is just a sequence of topological spaces $O_n$ for $n \ge 0$. In a symmetric collection, each space $O_n$ has an action of the symmetric group $S_n$.
I’m hoping that someone has already proved something like this:
Conjecture 1. The free operad on the terminal symmetric collection has, as its space of $n$-ary operations, the set of rooted trees having some of their leaves labelled $\{1, \dots, n\}$.
Conjecture 2. The free operad on the terminal collection has, as its space of $n$-ary operations, the set of planar rooted trees having some of their leaves labelled $\{1, \dots, n\}$.
Calling them ‘conjectures’ makes it sound like they might be hard — but they’re not supposed to be hard. If they’re right, they should be easy and in some sense well-known! But there are various slightly different concepts of ‘rooted tree’ and ‘rooted planar tree’, and we have to get the details right to make these conjectures true. For example, a graph theorist might draw a rooted planar tree like this:
while an operad theorist might draw it like this:
If the conjectures are right, we can use them to define the concepts of ‘rooted tree’ and ‘rooted planar tree’, thus side-stepping these details. And having purely operad-theoretic definitions of ‘tree’ and ‘rooted tree’ would make it a lot easier to use these concepts in operad theory. That’s what I want to do, ultimately. But proving these conjectures, and of course providing the precise definitions of ‘rooted tree’ and ‘rooted planar tree’ that make them true, would still be very nice.
And it would be even nicer if someone had already done this. So please provide references… and/or correct mistakes in the following stuff!
Rooted Trees
Definition. For any natural number $n = 0, 1, 2, \dots$, an $n$-tree is a quadruple $T=(V,E,s,t)$ where:
- $V$ is a finite set whose elements are called internal vertices;
- $E$ is a finite non-empty set whose elements are called edges;
- $s: E \to V\sqcup \{1,\dots, n\}$ and $t: E \to V \sqcup\{0\}$ are maps sending any edge to its source and target, respectively.
Given $u,v\in V \sqcup\{0\} \sqcup \{1,\dots, n\}$, we write $u \stackrel{e}{\longrightarrow} v$ if $e\in E$ is an edge such that $s(e)=u$ and $t(e)=v$.
This data is required to satisfy the following conditions:
- $s : E \to V\sqcup \{1,\dots, n\}$ is a bijection;
- there exists exactly one $e\in E$ such that $t(e)=0$;
- for any $v \in V\sqcup\{1,\dots , n\}$ there exists a directed edge path from $v$ to $0$: that is, a sequence of edges $e_0, \dots, e_n$ and vertices $v_1 , \dots , v_n$ such that $v \stackrel{e_0}{\longrightarrow} v_1 , \; v_1 \stackrel{e_1}{\longrightarrow} v_2, \; \dots, \; v_n \stackrel{e_n}{\longrightarrow} 0 .$
So the idea is that our tree has $V \sqcup\{0\} \sqcup \{1,\dots, n\}$ as its set of vertices. There could be lots of leaves, but we’ve labelled some of them by numbers $1, \dots, n$. In our pictures, the source of each edge is at its top, while the target is at bottom.
There is a root, called $0$, but also a ‘pre-root’: the unique vertex with an edge going from it to $0$. I’m not sure I like this last bit, and we might be able to eliminate this redundancy, but it’s built into the operad theorist’s picture here:
It might be a purely esthetic issue. Like everything else, it gets a bit more scary when we consider degenerate special cases.
I’m hoping there’s an operad $Tree$ whose set of $n$-ary operations, $Tree_n$, consists of isomorphism classes of $n$-trees as defined above. I’m hoping someone has already proved this. And I hope someone has characterized this operad $Tree$ in a more algebraic way, as follows.
Definition. A symmetric collection $C$ consists of topological spaces $\{C_n\}_{n \ge 0}$ together with a continuous action of the symmetric group $S_n$ on each space $C_n$. A morphism of symmetric collections $f : C \to C'$ consists of an $S_n$-equivariant continuous map $f_n : C_n \to C'_n$ for each $n \ge 0$. Symmetric collections and morphisms between them form a category ${STop}$.
(More concisely, if we denote the groupoid of sets of the $\{1, \dots, n\}$ and bijections between these as $S$, then $STop$ is the category of functors from $S$ to $Top$.)
There is a forgetful functor from operads to symmetric collections
$U : Op \to STop$
with a left adjoint
$F: STop \to Op$
assigning to each symmetric collection the operad freely generated by it.
Definition. Let $Comm$ be the terminal operad: that is, the operad, unique up to isomorphism, such that $Comm_n$ is a 1-element set for each $n \ge 0$.
The algebras of $\Comm$ are commutative topological monoids.
Conjecture 1. There is a unique isomorphism of operads
$\phi : F (U (Comm)) \stackrel{\sim}{\longrightarrow} Tree$
that for each $n \ge 0$ sends the unique $n$-ary operation in $Comm$ to the corolla with $n$ leaves: that is, the isomorphism class of $n$-trees with no internal vertices.
(When I say “the unique $n$-ary operation in $Comm$”, but treating it as an operation in $F(U(Comm))$, I’m using the fact that the unique operation in $\Comm_n$ gives an element in $U(\Comm)_n$, and thus an operation in $F(U(\Comm))_n$.)
Planar Rooted Trees
And there should be a similar result relating planar rooted trees to collections without symmetric group actions!
Definition. A planar $n$-tree is an $n$-tree in which each internal vertex $v$ is equipped with a linear order on the set of its children, i.e. the set $t^{-1}(v)$.
I’m hoping someone has constructed an operad $PTree$ whose set of $n$-ary operations, $PTree_n$, consists of isomorphism classes of planar $n$-trees. And I hope someone has characterized this operad $PTree$ as follows:
Definition. A collection $C$ consists of topological spaces $\{C_n\}_{n \ge 0}$. A morphism of collections $f :C \to C'$ consists of a continuous map $f_n : C_n \to C'_n$ for each $n \ge 0$. Collections and morphisms between them form a category $\mathbb{N}Top$.
(If we denote the category with natural numbers as objects and only identity morphisms between as $\mathbb{N}$, then $\mathbb{N}Top$ is the category of functors from $\mathbb{N}$ to $Top$.)
There is a forgetful functor
$\Upsilon : Op \to \mathbb{N}Top$
with a left adjoint
$\Phi : \mathbb{N}Top \to Op$
assigning to each collection the operad freely generated by it. This left adjoint is the composite
$\mathbb{N}Top \stackrel{G}{\longrightarrow} STop \stackrel{F}{\longrightarrow} Op$
where the first functor freely creates a symmetric collection $G(C)$ from a collection $C$ by setting $G(C)_n = S_n \times C_n$, and the second freely generates an operad from a symmetric collection, as described above.
Conjecture 2. There is a unique isomorphism of operads
$\psi : \Phi(\Upsilon (Comm)) \stackrel{\sim}{\longrightarrow} PTree$
that for each $n \ge 0$ sends the unique $n$-ary operation in $Comm$ to the corolla with $n$ leaves ordered so that $1 < \cdots < n$.
Have you seen a proof of this stuff???
Standard logic involving the truth values ‘true’ and ‘false’ can make it difficult to model some of the fuzziness we use in everyday speech. If you’d bought a bike yesterday then today it would be truthful to say “This bike is new”, but it wouldn’t be truthful so say it in 20 years’ time. However, between now and then there won’t be a specific day on which the statement “This bike is new” suddenly switches from being true to being false. How can you model this situation?
One approach to modelling this situation is with fuzzy logic where you allow your truth values to be things other than just true and false. For instance, you can take the interval $[0,1]$ as the set of truth values with $0$ representing false and $1$ representing true. So the truth degree of the statement “This bike is new” would vary, being $1$ today and decreasing to something very close to $0$ in 20 years’ time.
This post is an attempt by me to understand this fuzzy logic in the context of enriched category theory, in particular, using $[0,1]$ as a monoidal category to enrich over. We will see that categories enriched over $[0,1]$ can be interpreted as fuzzy posets or fuzzy preorders.
This was going to be a comment on Tom Avery’s Kan Extension Seminar post on Metric Spaces, Generalized Logic, and Closed Categories but grew too big!
I have been trying to understand a bit of fuzzy logic because it feeds into fuzzy concept analysis which I’m also trying to understand. As ever, I understand the category theory better than the logic. In terms of references, I’ve picked up bits and pieces from Belohlavek’s Concept lattices and order in fuzzy logic and Hájek’s Metamathematics of fuzzy logic.
Truth degrees in $[0,1]$: fuzzy logic
As mentioned above you can replace your set of truth values $\{ \mathrm{false},\mathrm{true}\}$ by the interval $[0,1]$, with $0$ representing completely false and $1$ representing completely true. Thus an element of $[0,1]$ represents a ‘degree’ of truth. The idea is that this represents the degree of truthfulness of statements like “George is old” or “The film is good”: it is not supposed to represent than the probabitlity of the truth of a statement like “I will receive an email from a student tomorrow”. We are modelling vagueness and not probabilty. Also, we are supposed to be modeling ojectivity so the truth degree of the statement “George is old” will depend on the age of George and not on anyone’s opinion of how old he seems to be.
Being category theorists, we might try to fit this into Lawvere’s framework of generalized logic, where we think of our truth values as forming a closed monoidal category. We can make $[0,1]$ it into a poset — and hence a category — by using the order $\le$, this will be our notion of entailment. So if $P$ and $Q$ are statements then $P\vdash Q$ precisely when the truth degree of $P$ is less than or equal to the truth degree of $Q$. However, I’m not entirely sure how to interpret that.
The observant amongst you will have noticed that this poset $[0,1]$ is isomorphic to the poset we use for metric spaces $([0,\infty ],\ge )$, via the maps $x\mapsto \exp (-x)$ and $a\mapsto -\ln (a)$. So an alternative interpretation of these truth values will be as ‘proximities’ — proximity approximately $0$ meaning not at all close, and proximity approximately $1$ meaning really, really close. [Switching from distances in $[0,\infty ]$ to proximities in $[0,1]$ is an important step in calculating magnitudes of metric spaces, but I won’t say any more about that.]
Next we want some notion of ‘conjunction’ $\otimes$ and ‘implication’ $\Rightarrow$. In category theory terms we want a closed monoidal structure on the category $[0,1]$. There are at least three such structures that are well studied.
The product structure: Here $a\otimes b \coloneqq a\cdot b$ and $a\Rightarrow b \coloneqq 1$ if $a\le b$ and $b/a$ otherwise. Via the exponential and logarithm maps this corresponds to the usual closed monoidal structure on $[0,\infty ]$ of addition and truncated subtraction.
The Gödel structure: Here $a\otimes b \coloneqq \min (a,b)$ and $a\Rightarrow b \coloneqq 1$ if $a\le b$ and $b$ otherwise. Via the exponential and logarithm maps this corresponds to the closed monoidal structure on $[0,\infty ]$ which gives rise to ultrametric spaces.
The Łukasiewicz structure: Here $a\otimes b \coloneqq \max (a+b-1,0)$ and $a\Rightarrow b\coloneqq \min (1-a+b,1)$. Via the exponential and logarithm maps this corresponds to the monoidal structure $x\otimes y\coloneqq -\ln (\max (e^{-x}+e^{-y}-1,0))$ and that doesn’t look at all familiar!
If the truth degrees of two statements are $P$ and $Q$ then the truth degree of both statements together should be given by the conjunction $P\otimes Q$, but I don’t know examples of real world models with these three different logics. Anybody? I would be very interested in hearing thoughts and ideas on this.
In all of the above three cases the closed monoidal category $\{ \mathrm{false},\mathrm{true}\}$ of classical truth values embeds closed monoidally.
Enriching over $[0,1]$: fuzzy preorders
We can now think about what categories enriched in $([0,1],\le )$ are, as generalizations of preorders. Such an enriched category consists of a set $C$ and to each pair of elements $c,c'\in C$ we associate a truth degree $C(c,c')\in [0,1]$, we want to think of this as the degree to which $c$ is ordered before $c'$. This is what we will call a fuzzy preorder. It will satisfy fuzzy transitivity: $C(c',c'')\otimes C(c,c')\le C(c,c'')$. How this is actually interpreted will depend on which product $\otimes$ we took.
Here’s an example of a fuzzy preorder. Suppose that Bart is aged 9, Marge is aged 39 and Homer is aged 40. Is it true that Homer is (at least) as old as Marge? Yes. Is it true that Marge is (at least) as old as Homer? Well, nearly. Is it true that Bart is (at least) as old as Marge? Well, not really at all. We can formalise this into a fuzzy preorder $\succeq$ “is at least as old as” where $\begin{aligned} (\text {Homer}\succeq \text {Marge})&=1 \\ (\text {Marge}\succeq \text {Homer})&=\text {a tiny bit less than}\, \, 1\\ (\text {Bart}\succeq \text {Marge})&=\text {much less than}\, \, 1. \end{aligned}$ Algebraically we could define $A\succeq B\coloneqq \begin{cases} 1 &\text {if}\, \, \mathrm{age}(A)\ge \mathrm{age}(B)\\ \mathrm{age}(A)/\mathrm{age}(B)&\text {if}\, \, \mathrm{age}(A)\lt \mathrm{age}(B) . \end{cases}$ This gives a fuzzy preorder on the set $(\text {Bart}, \text {Marge}, \text {Homer})$ with respect the product monoidal structure $\cdot$ on $[0,1]$, or in other words, it gives a category enriched over the monoidal category $([0,1],\cdot ,1)$.
There should be some more convincing examples, but I haven’t found any yet.
An exercise at this point is to look at the dictionary in my post on Galois correspondences and enriched categories and extend it by another column for categories enriched over $[0,1]$.
Term is nearly over, which for me means the end of the 4th year Fourier Analysis course I’ve been teaching for the last couple of years.
I was fortunate enough to take over the course from Jim Wright, a genuine expert on the subject, and I inherited a great set of notes from him. But I felt the need to make the course my own, so I’ve been writing my own notes, which I’ve just finished: notes here, plus accompanying problem sheets. They’re mostly about convergence of Fourier series, with a delicious dessert of Fourier analysis on finite abelian groups.
But what I wanted to write about here — and get your opinions on — was not Fourier analysis, but some questions of teaching. This year, I’ve been (in the jargon) “flipping the classroom”, or at least partially flipping it (which reminds me of that mysterious substance, partially inverted sugar syrup, that you sometimes see on ingredients lists). I’d like to hear about other people’s similar experiences.
The allotted class time is two 50-minute “lectures” a week, plus one “workshop” (Edinburgh lingo for tutorial or exercise class) a fortnight. Here’s the routine for lectures:
At least a week before a given lecture, I make Latexed notes for that lecture available online.
The students are committed to spending at least an hour before each lecture reading over the notes.
I spend the first half of each lecture giving an overview of that portion of the notes, on the firm assumption that the students have done the prescribed amount of reading beforehand. Thus, I can spend time talking about the big picture rather than the mechanics, and I can concentrate on parts that seem likely to cause most difficulty rather than having to go through everything.
The second half of the lecture is interactive. We do different activities each time, e.g.:
- solving exercises (individually, in small groups, or all together)
- question and answer sessions
- definitions quizzes
- working on mathematical writing
- identifying hard parts of the course and having students explain them to each other.
In a completely flipped classroom, all the time would be taken up with interactive work. The first half of each class isn’t too far from a traditional lecture, except for the data-projected notes and the prior reading by the students. That’s why I said it’s only partially flipped.
It’s the first time I’ve done things quite like this, so this year has been pretty experimental. In particular, before every lecture, I’ve had to come up with an idea for the second half of the class, and evidently some have been more successful than others. I’m also running out of ideas! It’s a good thing it’s the end of term.
Have you ever taught in a similar way? If so, what worked well, and what didn’t? Can you share some ideas for classroom activities?
I’ve just submitted a piece for the new Opinions section of the monthly LMS Newsletter: Should mathematicians cooperate with GCHQ? (Update: now available (p.34).) The LMS is the London Mathematical Society, which is the UK’s national mathematical society. My piece should appear in the April edition of the newsletter, and you can read it below.
Here’s the story. Since November, I’ve been corresponding with people at the LMS, trying to find out what connections there are between it and GCHQ. Getting the answer took nearly three months and a fair bit of pushing. In the process, I made some criticisms of the LMS’s total silence over the GCHQ/NSA scandal:
GCHQ is a major employer of mathematicians in the UK. The NSA is said to be the largest employer of mathematicians in the world. If there had been a major scandal at the heart of the largest publishing houses in the world, unfolding constantly over the last eight months, wouldn’t you expect it to feature prominently in every issue of the Society of Publishers’ newsletter?
To its credit, the LMS responded by inviting me to write an inaugural piece for a new Opinions section of the newsletter. Here it is.
I had a 500-word limit, so I omitted a lot. Here are the facts on the LMS’s links with GCHQ, as stated to me by the LMS President Terry Lyons:Should mathematicians cooperate with GCHQ?
Tom Leinster
One of the UK’s largest employers of mathematicians has been embroiled in a major international scandal for the last nine months, stands accused of law-breaking on an industrial scale, and is now the object of widespread outrage. How has the mathematical community responded? Largely by ignoring it.
GCHQ and its partners have been systematically monitoring as much of our lives as they possibly can, including our emails, phone calls, text messages, bank transactions, web browsing, Skype calls, and physical location. The goal: “collect it all”. They tap internet trunk cables, bug charities and political leaders, disrupt lawful activist groups, and conduct economic espionage, all under the banner of national security.
Perhaps most pertinently to mathematicians, the NSA (GCHQ’s major partner and partial funder) has deliberately undermined internet encryption, inserting a secret back door into a standard elliptic curve algorithm. This can be exploited by anyone sufficiently skilled and malicious — not only the NSA/GCHQ. (See Thomas Hales’s piece in February’s Notices of the AMS.) We may never know what else mathematicians have been complicit in; GCHQ’s policy is not to comment on intelligence matters, which is to say, anything it does.
Indifference to mass surveillance rests partly on misconceptions such as “it’s only metadata”. This is certainly false; for instance, GCHQ has used webcams to collect images, many sexually intimate, of millions of ordinary citizens. It is also misguided, even according to the NSA’s former legal counsel: “metadata absolutely tells you everything about somebody’s life”.
Some claim to be unbothered by the recording of their daily activities, confident that no one will examine their records. They may be right. But even if you feel that way, do you want the secret services to possess such a powerful tool for chilling dissent, activism, and even journalism? Do you trust an organization operating in secret, and subject to only “light oversight” (a GCHQ lawyer’s words), never to abuse that power?
Mathematicians seldom have to face ethical questions. But now we must decide: cooperate with GCHQ or not? It has been suggested that mathematicians today are in the same position as nuclear physicists in the 1940s. However, the physicists knew they were building a bomb, whereas mathematicians working for GCHQ may have little idea how their work will be used. Colleagues who have helped GCHQ in the past, trusting that they were contributing to legitimate national security, may justifiably feel betrayed.
At a bare minimum, we as a community should talk about it. Sasha Beilinson has proposed that working for the NSA/GCHQ should be made “socially unacceptable”. Not everyone will agree, but it reminds us that we have both individual choice and collective power. Individuals can withdraw their labour from GCHQ. Heads of department can refuse staff leave to work for GCHQ. The LMS can refuse GCHQ’s money.
At a bare minimum, let us acknowledge that the choices are ours to make. We are human beings before we are mathematicians, and if we do not like what GCHQ is doing, we do not have to cooperate.
The Society has an indirect relationship with GCHQ via a funding agreement with the Heilbronn Institute, in which the Institute will give up to £20,000 per year to the Society. This is approximately 0.7% of our total income. This is a recently made agreement and the funding will contribute directly to the LMS-CMI Research Schools, providing valuable intensive training for early career mathematicians. GCHQ is not involved in the choice of topics covered by the Research Schools.
So, GCHQ’s financial support for the LMS is small enough that declining it would not make a major financial impact.
I hope the LMS will make a public statement clarifying its relationship with GCHQ. I see no argument against transparency.
Another significant factor (which Lyons alludes to above and is already a matter of public record) is that GCHQ is a funder of the Heilbronn Institute, which is a collaboration between GCHQ and the University of Bristol. I don’t know that the LMS is involved with Heilbronn beyond what’s mentioned above, but Heilbronn does seem to provide an important channel through which (some!) British mathematicians support the secret services.
Finally, I want to make clear that although I think there are some problems with the LMS as an institution, I don’t blame the people running it, many of whom are taking time out of extremely busy schedules for the most altruistic reasons. As I wrote to one of them:
I’m genuinely in awe of the amount that you […] give to the mathematical community, both in terms of your selflessness and your energy. I don’t know how you do it. Anything critical I have to say is said with that admiration as the backdrop, and I hope I’d never say anything of the form “do more!”, because to ask that would be ridiculous.
Rules for commenting here I’ve now written several posts on this and related subjects (1, 2, 3, 4). Every time, I’ve deleted some off-topic comments — including some I’ve enjoyed and agreed with heartily. Please keep comments on-topic. In case there’s any doubt, the topic is the relationship between mathematicians and the secret services. Comments that stray too far from this will be deleted.
Guest post by Alexander Campbell
We want to develop category theory in a general 2-category, in order to both generalise and clarify our understanding of category theory. The key to this endeavour is to express the basic notions of the theory of categories in a natural 2-categorical language. In this way we are continuing a theme present in previous posts from the Kan Extension Seminar, wherein monads and adjunctions were given a 2-categorical setting, and by analogy, in our very first paper, whose purpose was to express basic notions of the theory of sets in a natural categorical language. In this post we consider a concept very central and special to category theory: the Yoneda lemma.
So what’s the Yoneda Lemma again?
The Yoneda lemma says that for any object $a$ of a category $A$, the diagram $\begin{matrix} 1 & \overset{a}{\rightarrow} & A \\ {}_{\ast} \searrow & \overset{\iota}{\Rightarrow} & \swarrow_{A(a,-)} \\ & Set \\ \end{matrix}$ is a left extension.
In this post I will give a motivation for the notion of Yoneda structure, as defined in the paper Yoneda Structures on 2-Categories of Ross Street and Bob Walters. But before we begin I would like to take this opportunity to thank Emily for inviting me to join the Kan Extension Seminar and for her support and encouragement throughout the course. This has been and continues to be a singularly valuable experience in my first year as a category theorist.
Liftings and extensions
After the various compositions available in a 2-category (nicely unified as the operation of pasting), the most basic notions of two-dimensional algebra are liftings and extensions. A diagram $\begin{matrix} & A \\ {}^h \swarrow & \overset{\alpha}{\Leftarrow} & \searrow^{g} \\ C & \overset{f}{\rightarrow} & B \\ \end{matrix}$ in a 2-category $\mathcal{K}$ is said to be a left lifting if pasting along $h$ effects a bijection of 2-cells $\frac{h \Rightarrow k}{g \Rightarrow f k}.$
There is a similar elementary definition of left extension, but we can simply say that a left extension is a left lifting in $\mathcal{K}^{op}$ (reverse 1-cells). Hence these are just dual versions of a single notion. Left liftings in $\mathcal{K}^{co}$ and $\mathcal{K}^{coop}$ are right liftings and right extensions respectively, but these will not concern us here.
An elementary result of 2-categorical algebra is the pasting lemma. This tells us that given a diagram $\begin{matrix} & & A \\ {}^h \swarrow & \overset{\beta}{\Leftarrow} & \downarrow^{g} & \overset{\alpha}{\Leftarrow} & \searrow^{f} \\ D & \underset{l}{\rightarrow} & C & \underset{k}{\rightarrow} & B \\ \end{matrix}$ in which $\alpha$ is a left lifting, $\beta$ is a left lifting if and only if the composite is a left lifting. By duality, we immediately get a corresponding result for left extensions.
The Yoneda lemma and Yoneda structures
A Yoneda structure on a 2-category consists of two pieces of data satisfying three axioms. We will see that the data is what is necessary to naturally express the Yoneda lemma in the 2-category, and that the axioms are expressions of familiar results of category theory.
In a general 2-category, following the philosophy of generalised elements, it is natural to consider arrows $X \longrightarrow A$ with arbitrary domain (not just restricted to the terminal object 1) as objects of $A$. In order to state the Yoneda lemma for such generalised objects, we must replace $\ast \colon 1 \longrightarrow \text{Set}$ with some arrow $y_X \colon X \longrightarrow P X$. Here $P X$ could be loosely thought of as “sets freely varying over $X$”; in CAT, it is the category of presheaves over $X$ and $y_X$ is the Yoneda embedding.
But now we must address the issue of size. Firstly, in CAT, the Yoneda embedding $y_X$ exists only for locally small $X$. Secondly, in the wish for a snappy statement of the Yoneda lemma, I left an important condition unmentioned: it is not true for a general category $A$ that the hom-functor $A(a,-)$ exists for every object $a$. (It would be true if $A$ were locally small, but we do not want CAT to consist only of locally small categories; for if $X$ is not small, then $P X$ is not locally small.) Hence we must restrict those elements for which we state the Yoneda lemma.
We can now specify the data of a Yoneda structure on a 2-category. The first piece of data is a class of admissible arrows in the 2-category; the only property we require of this class is that it be a right ideal, meaning that if $g$ is admissible, then so is $g f$ for every such composable $f$. We say that an object $A$ is admissible if the identity arrow $1_A \colon A \longrightarrow A$ is admissible; in the Yoneda structure on CAT, the admissible objects are the locally small categories. Hence we see that there is nothing mysterious about size; it is just part of the structure. The other piece of a Yoneda structure is the Yoneda arrows: an admissible arrow $y_A \colon A \longrightarrow P A$ for each admissible object $A$.
With this structure we can naturally state the Yoneda lemma in a 2-category. We take this as our first axiom for a Yoneda structure.
Axiom 1. For $X$, $a$ both admissible, the left extension $\begin{matrix} X & \overset{a}{\rightarrow} & A \\ {}_{y_X} \searrow & \overset{\chi^a}{\Rightarrow} & \swarrow_{A(a,1)} \\ & P X \\ \end{matrix}$ exists.
The universal property of this extension is indicated by $\frac{A(a,1) \Rightarrow f}{y_X \Rightarrow f a}.$
So we have used the Yoneda lemma as a definition of the hom-functors $A(a,1)$; the axiom asserts their existence. We can further define the hom-functors on 1-cells and 2-cells in both variables: in the first variable by the universal property of extension, and in the second variable by composition. In particular, for (suitably admissible) $f \colon A \longrightarrow B$, we can define $P f \colon P B \longrightarrow P A$ as $P f = (P B)(B(1,f),1)$, with a similar definition on 2-cells. This all just follows from Axiom 1, hence is part of any Yoneda structure.
All further use of the admissibility notion is just to ensure the existence of the data of Axiom 1. So I will make minimal explicit mention of it from here; it can be easily filled in as necessary.
Universal arrows and absolute liftings
The second axiom of a Yoneda structure is expressed in terms of liftings. I will now exhibit the presence of liftings in category theory.
The first formalisation of the notion of universal property that Mac Lane gives in Categories for the Working Mathematician is that of universal arrow (I will deal with the ‘left’ sense here). Given a functor $f \colon A \longrightarrow B$ and an object $b$ of $B$, he defines what it means for a “pair” consisting of an object $a$ of $A$ and an arrow $\theta \colon b \longrightarrow fa$ to be a universal arrow from $b$ to $f$. This definition says precisely that $\begin{matrix} & 1 \\ {}^a \swarrow & \overset{\theta}{\Leftarrow} & \searrow^{b} \\ A & \underset{f}{\rightarrow} & B \\ \end{matrix}$ is a left lifting diagram.
In a general 2-category, the restriction of our consideration to such ‘global objects’ is untenable. So how do we express the universal property of a ‘generalised arrow’ from a ‘generalised object’ $b \colon X \longrightarrow B$ to $f$? It is not enough to say that the corresponding diagram $\begin{matrix} & X \\ {}^a \swarrow & \overset{\theta}{\Leftarrow} & \searrow^{b} \\ A & \underset{f}{\rightarrow} & B \\ \end{matrix}$ is a left lifting diagram; we must consider the object $b$ at all its stages of development. Hence we define a 2-cell $\theta$ to be a universal arrow from $b$ to $f$ when for every earlier stage $Y \longrightarrow X$, the diagram $\begin{matrix} & Y \\ & \downarrow \\ & X \\ {}^a \swarrow & \overset{\theta}{\Leftarrow} & \searrow^{b} \\ A & \underset{f}{\rightarrow} & B \\ \end{matrix}$ is a left lifting. That is, we require our corresponding diagram to be an absolute left lifting (meaning that the property of being a left lifting is preserved when composed with any arrow into $X$).
Hence absolute liftings give a natural 2-categorical expression of the notion of universal arrow. This slogan can help us to interpret some basic facts of the algebra of 2-categories as they relate to category theory. For instance, the result that a 2-cell $\begin{matrix} & A \\ {}^f \swarrow & \overset{\eta}{\Leftarrow} & \searrow^{1} \\ B & \underset{g}{\rightarrow} & A \\ \end{matrix}$ is the unit of an adjunction $f \dashv g$ if and only if it is an absolute left lifting corresponds to the familiar result of basic category theory that states that a functor $g \colon B \longrightarrow A$ is a right adjoint if and only if there is a universal arrow to $g$ from every object of $A$.
Universal elements and representability
Now, observe that the 2-cell in the ordinary Yoneda lemma, in addition to being a left extension, is trivially a left lifting; for this just means that there is a bijection between arrows $a \longrightarrow b$ and elements of the set $A(a,b)$. In familiar terms, this says that $\iota$ is a universal element of the functor $A(a,-)$, and by the above discussion we can immediately incorporate this into our setting as the second axiom for a Yoneda structure.
Axiom 2. The 2-cell $\chi^a$ of Axiom 1 is an absolute left lifting.
The universal property of this absolute left lifting is indicated by $\frac{a x \Rightarrow b}{X(1,x) \Rightarrow A(a,b)}.$
The combination of these two universal properties of the 2-cells $\chi^a$ is the ignition that allows us to begin to develop category theory in our 2-category. Together they give a bijection of 2-cells $\pi$ and $\eta$ as indicated by the bijections $\frac{\frac{B(s,1) \Rightarrow C(j,t)}{y_A \Rightarrow C(j,t s)}}{j \Rightarrow t s}$ and as displayed in the diagram $\begin{matrix} & A \\ {}^s \swarrow & \overset{\chi^s}{\Leftarrow} & \searrow^{y_A} \\ B & \overset{B(s,1)}{\rightarrow} & P A \\ {}_t \searrow & \Downarrow^{\pi} & \nearrow_{C(j,1)} \\ & C \\ \end{matrix} \qquad = \qquad \begin{matrix} & & A \\ {}^s \swarrow & \overset{\eta}{\Leftarrow} & \downarrow^{j} & \overset{\chi^j}{\Leftarrow} & \searrow^{y_A} \\ B & \underset{t}{\rightarrow} & C & \underset{C(j,1)}{\rightarrow} & P A \\ \end{matrix}$ From Axiom 2 and the pasting lemma, we get that the LHS is an absolute left lifting (with base $C(j,t) = C(j,1)t$) if and only if $\eta$ is an absolute left lifting. This is exactly the realisation in our setting of the familiar correspondence between universal elements of $C(j,t)$ and universal arrows from $j$ to $t$. In particular, if $\pi$ is an isomorphism, then this condition holds, and we have a universal element of $C(j,t)$. Note that we can understand $\pi$ being an isomorphism as meaning that $C(j,t)$ is representable.
Now, recall the representability condition of ordinary category theory that says that a functor $f \colon B \longrightarrow \text{Set}$ is representable if and only if it has a universal element. We have seen that one half of this implication follows from the first two axioms of a Yoneda structure. We could take the other half as an axiom, being a basic result of category theory that we wish to incorporate into our setting.
Axiom 3*. If a 2-cell $\sigma \colon A(a,1) \Rightarrow f \colon A \longrightarrow P X$ yields an absolute left lifting diagram when pasted onto $\chi^a$, then $\sigma$ is an isomorphism.
Understood in our interpretation, this is indeed the other implication of the representability condition.
However this axiom is too strong to capture all the examples we want! While it does hold in ordinary category theory and in 2-categories of internal categories and variable (=indexed=parametrized) categories, it is does not hold in general in enriched category theory. For in enriched category theory, representability of a functor is expressed as an isomorphism in the base of enrichment, and the bijection of sets of the universal element condition is not enough to capture this.
Hence we do not take Axiom 3* as an axiom for Yoneda structures. Instead we take a couple of special cases. By Axiom 1 for $y_A$ and $gf$ in place of $a$, we have the bijections of 2-cells $\frac{P1_A \Rightarrow 1_{P A}}{y_A \Rightarrow y_A} \qquad \frac{C(g f,1) \Rightarrow P f.C(g,1)}{y_A \Rightarrow P f.C(g,1).g f}.$ Let $\iota_A$ correspond to $1_{y_A}$ in the first bijection, and $\theta_{f,g}$ correspond to the 2-cell $\begin{matrix} A & \overset{f}{\rightarrow} & B & \overset{g}{\rightarrow} & C \\ {}^{y_A} \downarrow & \overset{\chi^{B(1,f)}}{\Rightarrow} & {}^{y_B} \downarrow & \overset{\chi^g}{\Rightarrow} & \swarrow_{C(g,1)} \\ P A & \underset{P f}{\leftarrow} & P B \\ \end{matrix}$ in the second. These corresponding 2-cells are an identity and a pasting of two absolute left liftings respectively. Hence (by the pasting lemma in the second case) they are both absolute left liftings, and Axiom 3* would imply that they are isomorphisms. We take these special cases as our third and final axiom.
Axiom 3. The 2-cells $\iota_A$ and $\theta_{f,g}$ are isomorphisms.
In summary, a Yoneda structure on a 2-category consists of admissible arrows and Yoneda arrows satisfying Axioms 1, 2, and 3.
Formal category theory
Now that we have established the notion of a Yoneda structure on a 2-category, we can begin to develop category theory within such a 2-category. What is remarkable is that everything follows formally and naturally from the Yoneda structure and from the elementary two-dimesional algebra of pasting, liftings, extensions and adjunctions. In their paper, Street and Walters develop results concerning adjunctions, weighted colimits, EM objects and Kleisli objects for monads, and they introduce the notion of totality. I will now briefly survey some of these results; I refer the reader to the paper for details.
They show that the 2-categorical definition of an adjunction $f\dashv g$ in terms of its unit and counit is equivalent to the condition $B(f,1) \cong A(1,g)$, which expresses the classical hom-set definition of an adjunction. This is an isomorphism of modules, and indeed, one benefit of a Yoneda structure is that it equips our 2-category with a notion of modules (a.k.a. profunctors): modules from $A$ to $B$ are arrows $A \longrightarrow P B$.
It is well known that a satisfactory theory of colimits in enriched category theory requires the consideration of colimits weighted by presheaves. However the natural generality of the theory calls for weighting by modules (presheaves on $A$ are modules from $I$ to $A$). Given a module $j\colon M \longrightarrow P A$ and an arrow $s\colon A \longrightarrow C$ (suitably admissible), we define the colimit $\text{colim}(j,s) \colon M \longrightarrow C$ of $s$ weighted by $j$ by the formula $C(\text{colim}(j,s),1) \cong (P A)(j,C(s,1))$. From this formula we can derive many familiar results such as the fact that left adjoints preserve colimits, associativity of colimits in the weight, and the Yoneda isomorphisms. Also we can introduce the notion of pointwise extension in terms of colimits and show that they are indeed extensions. Thus all the results of the calculus of colimits, which in enriched category theory depend on a “complicated machinery” involving enriched extranatural transformations, coends, enriched functor categories etc., in fact follow easily and formally in the context of the elementary notion of a 2-category with a Yoneda structure.
Street and Walters show that a certain result from Street’s earlier paper, the characterisation of the Eilenberg-Moore category of a monad in CAT as certain sheaves on the Kleisli category of the monad, holds for any Yoneda structure. They also show that in the presence of more properties of the 2-category (particularly a bo-ff factorisation of its arrows), we can make sense of the idea of the Kleisli object as a “full subcategory” of the Eilenberg-Moore object.
This paper introduced the notion of totality. An arrow $s \colon A \longrightarrow C$ is said to be total when $A$ and $s$ are admissible and the module $C(s,1) \colon C \longrightarrow P A$ has an admissible left adjoint. It follows immediately from the developed theory that this left adjoint $z$ must be given by $z j = \text{colim}(j,s)$ and be the pointwise left extension of $s$ along the Yoneda arrow $y_A$ (note that these assertions require all the admissibility assumptions in the definition). Such adjunctions are absolutely fundamental to the use of category theory; they sometimes go by the name of nerve and realization.
An object $A$ is said to be total when the identity arrow $1_A$ is total; this is the same as saying the Yoneda arrow $y_A \colon A \longrightarrow P A$ has a left adjoint. This can be thought of as a sort of strong cocompleteness property. In particular we get a “very satisfactory” adjoint functor theorem: if $A$ is total, an arrow $f \colon A \longrightarrow B$ has a right adjoint if and only if $B(f,1)$ is admissible and $f$ preserves the colimit $\text{colim}(B(f,1),1)$.
Examples
Important motivating examples of 2-categories with Yoneda structures are 2-categories of internal categories (see the papers of Street and Weber) and enriched categories, in particular CAT is the 2-category of categories internal to SET. Given categories $\mathcal{E}$ and $\mathcal{V}$ suitably nice and approriate for internalisation and enrichment respectively, we get Yoneda structures on the 2-categories Cat$(\mathcal{E}$) and $\mathcal{V}$-Cat roughly as follows. For such a category $\mathcal{C}$, the size structure arises from an object $S$ of $\mathcal{C}$ which is in some way understood as a “full subcategory” of $\mathcal{C}$; for internal categories, this can be characterised as an internal full subcategory, or equivalently, a classifying discrete opfibration. Then the objects $P A$ arise as exponentials $S^{A^{op}}$, and the Yoneda arrows are exponential adjoints of hom-functors.
The 2-category LEX of finitely complete categories and finitely continuous functors inherits a Yoneda structure from CAT. This example is of interest because the total objects are (almost) Grothendieck toposes.
Another example, not treated in this paper but in this later paper of Street, are 2-categories of variable categories. These are of the form $\mathcal{K}$ = Hom($\mathcal{E}^{op}$,CAT), consisting of pseudofunctors, pseudonatural transformations, and modifications. I mention this example here for the interesting way in which the size structure arises. An arrow $f \colon A \longrightarrow B$ in $\mathcal{K}$ is admissible if for every $a\colon U \longrightarrow A$ and $b \colon V \longrightarrow B$ where $U$ and $V$ are representables on objects of $\mathcal{E}$, the comma object $f a/b$ is representable.
Guest post by Dimitri Zaganidis
First of all, I would like to thank Emily for organizing the Kan extension seminar. It is a pleasure to be part of it. I want also to thank my advisor Kathryn Hess and my office mate Martina Rovelli for their revisions.
In the fifth installment of the Kan Extension Seminar we read the paper “Review of the Elements of 2-categories” by G.M Kelly and Ross Street. This article was published in the Proceedings of the Sydney Category Theory Seminar, and its purpose is to “serve as a common introduction to the authors’ paper in this volume”.
The article has three main parts, the first of them being definitions in elementary terms of double categories and 2-categories, together with the notion of pasting. In a second chapter, they review adjunctions in 2-categories with a nice expression of the naturality of the bijection given by mates using double categories. The last part of the article introduces monads in 2-categories, and specializing to 2-monads towards the end.
Double categories and 2-categories
The article starts with the definition of a double category as a category object in the (not locally small) category of categories $\mathbf{CAT}$. (I think that there might be some set theoretic issues with such a category, but you can add small everywhere if you want to stay safe.)
The authors then switch to a description of such an object in terms of objects, horizontal arrows, vertical arrows, and squares, with various compositions and units. I will explain a bit how to go from one description to the other.
A category object is constituted of a category of objects, a category of morphisms, target and source functors, identity functor and a composition.
The category of objects is the category whose morphisms are “the objects” and whose morphisms are the vertical arrows. The category of morphisms is the category whose objects are the horizontal morphisms and whose morphisms are the squares, with vertical composition.
Since the functors $\mathrm{Obj}, \mathrm{Mor}: \mathbf{CAT} \longrightarrow \mathbf{SET}$ preserve pullbacks, by applying them to a double category seen as a category object, we get actual categories. Applying $\mathrm{Obj}$ to the double category, we get the category whose objects are “the objects” and whose morphisms are the horizontal arrows. Applying $\mathrm{Mor}$, we get the category whose objects are the vertical morphisms and whose morphisms are the squares, but this time with horizontal composition.
An interesting thing to notice is that the symmetry of the explicit description of a double category is much more apparent than the symmetry of its description as a category object.
One can define a $2$-category as a double category with a discrete category of objects, or as a $\mathbf{CAT}$-enriched category, exactly as one can define a simplicially enriched small category as either a category enriched over $\mathbf{sSet}$ or as a category object in $\mathbf{sSet}$ with a discrete simplicial set of objects.
The second viewpoint on 2-categories leads to definitions of 2-functors and 2-natural transformations and also to modifications, once one makes clear what enrichment a category of 2-functors inherits.
It is also worthwhile mentioning that the pasting operation makes computations easier to make, because they are more visual. The proof of proposition 2.1 of this paper is a good illustration of this.
The basic example of a 2-category is $\mathbf{CAT}$ itself, with natural transformations as 2-cells (squares).
As category theory describes set-like constructions, 2-category theory describes category-like constructions. You can usually build up categories with as objects sets with extra structure. In the same way, small V-categories, V-functors, and V-natural transformations form a 2-category.
My first motivation to learn about 2-categories was the 2-category of quasi-categories defined by Joyal and which has been studied by Emily Riehl and Dominic Verity in the article The 2-category theory of quasi-categories in particular the category-like constructions one can make with quasi-categories, such as adjunctions and limits.
Adjunctions and mates in 2-categories
It is not a surprise that 2-categories are the right framework in which to define adjunctions. To build the general definition from the usual one, you just need to replace categories by objects in a 2-category, functors by 1-cells of the 2-category, and natural transformations by its 2-cells.
Adjunctions in a 2-category $\mathcal{C}$ compose (as in $\mathbf{CAT}$), and one can form two, a priori distinct double categories of adjunctions. Both of them will have the objects of $\mathcal{C}$ as objects and the horizontal morphisms being the morphisms of $\mathcal{C}$, while their vertical morphisms are the adjunctions (going in the same direction as the right adjoint, by convention). The two double categories differ on the squares. Given adjunctions $f \dashv u$ and $f' \dashv u'$ together with 1-cells $a:A \longrightarrow A'$ (between the domains of $u$ and $u'$) and $b:B \longrightarrow B'$ (between the codomains of $u$ and $u'$), the squares of the first double category are 2-cells $b u \Rightarrow u'a$ while the squares of the second are 2-cells $f'b \Rightarrow a f$.
Now, the bijective correspondence between these kind of 2-cells given by mates induces an isomorphism of double categories. This means in particular that the horizontal (or vertical) composite of mates is equal to the mate of the corresponding composite.
This is a very beautiful way to express the naturality of the mate correspondence, and it provides a one-line proof of the fact that two 1-cells that are left adjoints to a same 1-cell are naturally isomorphic.
Monads in 2-categories
2-categories are also the right framework to define monads. A monad in a 2-category $\mathcal{C}$ and on an object $B$ is a 1-cell $t:B \longrightarrow B$ together with 2-cells $\mu: t^2 \Rightarrow t$ and $\eta: 1_B \Rightarrow t$, verifying the usual equations $\mu \circ (t\mu)= \mu \circ (\mu t)$ and $\mu \circ(t\eta) = 1_B = \mu \circ(\eta t)$. Since 2-functors preserve both horizontal and vertical compositions, for all objects $X$ of $\mathcal{C}$, $t$ induces a monad on $\mathcal{C}(X,B)$, given by post-composition $(t_{\ast},\mu_{\ast},\eta_{\ast})$. The authors call * an action of $t$ on $s:X \longrightarrow B$* a $t_\ast$ algebra structure on $s$.
The definition of monad morphism in this article is quite surprising for someone who has read the previous article by Ross Street, The formal theory of monads, which we read for the second meeting of the Kan extension seminar.
In Ross Street’s original paper, a monad morphism $(B,t,\mu, \eta) \longrightarrow (B',t',\mu', \eta')$ is a 1-cell $f: B \longrightarrow B'$ together with a $2$-cell $\phi: t'f \Rightarrow f t$ verifying certain conditions.
In this paper, morphisms of monads are defined only for monads on the same object, letting the $1$-cell part of a monad transformation of the previous article be the identity. This leads the authors to reverse the direction of the morphism, since the $2$-cell seems to go in the reverse direction of the $1$-cell!
One might think that fixing $f=1$ is needed by the result which explains that there is a bijection between monad morphisms $t \Rightarrow t'$ and actions of $t$ on $t'$ making $t'$ a “$(t,t')$-bimodule”. In fact, in the case where $f$ is not necessarily the identity, there is a bijection between 2-cells $\phi:t f \Rightarrow f t'$ such that $(f,\phi)$ is a monad functor and actions of $t$ on $ft'$ making $ft'$ a “$(t,t')$-bimodule”. A statement of the same kind can be also made for monad functor transformations (in the sense of the formal theory of monads). A 2-cell $\sigma : f \Rightarrow f'$ is a monad functor transformation $(f,\phi) \longrightarrow (f', \phi')$ if and only if $\sigma t': f t' \Rightarrow f' t'$ is a morphism of “$(t,t')$-bimodules”.
A 2-category admits the construction of algebras if for every monad $(B,t,\mu, \eta)$, the 2-functor $X \mapsto \mathcal {C}(X,B)^{(t_\ast, \mu_\ast, \eta_\ast)}$ is representable. The representing object is called the object of $t$-algebras. By Yoneda, the free-forgetful adjunction can be made internal in this case.
The terminology is justified, because in the $2$-category $\mathbf{CAT}$, it specializes to the usual notions of the category of $t$-algebras and the corresponding free-forgetful adjunction.
A monad in $\mathcal{C}$ is the same as a 2-functor $\mathbf{Mnd} \longrightarrow \mathcal{C}$, where $\mathbf{Mnd}$ is the 2-category with one object and $\Delta_+$, the algebraist’s simplicial category as monoidal hom-category (with ordinal sum). Since moreover, $\mathcal {C}(X,B)^{(t_\ast, \mu_\ast, \eta_\ast)} \cong [\mathbf{Mnd}, \CAT]( \Delta_{+\infty}, \mathcal{C}(X,-)),$ (where $\Delta_{+\infty}$ is the subcategory of maps of $\Delta$ preserving maxima, which is acted on by $\Delta_+$ via ordinal sum) one can see that the object of t-algebras can be expressed as a weighted limit.
As a consequence, it is not surprising that a 2-category admits the construction of algebras under some completeness assumptions.
Doctrines
In the last part of the article, the authors review the notion of a doctrine, which is a 2-monad in 2-$\mathbf{CAT}$, i.e., a 2-functor $D: \mathcal {C} \longrightarrow \mathcal{C}$, where $\mathcal{C}$ is a 2-category, and 2-natural transformations $m$ and $j$, which are respectively the multiplication and the unit, verifying the usual identities. The fact that it is both a monad on a 2-category and in another one can be a bit disturbing at first.
If $(D,m,j)$ is a doctrine over a 2-category $\mathcal{C}$, then its algebras will be objects $X$ of $\mathcal{C}$ together with an action $DX \longrightarrow X$, exactly as in the case of algebras over a usual monad.
Already with morphisms, we can take advantage of the fact that a 2-category $\mathcal{C}$ has 2-cells, and define $D$-morphisms to be lax in the sense that the diagram $\begin{matrix} DX & \longrightarrow & DY \\ \downarrow & & \downarrow \\ X & \longrightarrow & Y \end{matrix}$ is not supposed to be commutative, but is rather filled by a 2-cell with some coherence properties.
As one might expect, we can actually form a 2-category of such $D$-algebras by adding 2-cells, using again the $2$-cells existing in $\mathcal{C}$.
If we keep only the $D$-morphisms that are strict, we obtain the object of algebras (which should be a $2$-category) that we discussed before.
One example of a doctrine is $\Delta_+ \times - : \mathbf{CAT} \longrightarrow \mathbf{CAT}$ together with the multiplication induced by the ordinal sum, and unit given on $\mathcal {D}$ by the functor $\mathcal{D} \longrightarrow \Delta_+ \times \mathcal{D}$ that sends $d$ to $(\emptyset,d)$.
The algebras for this doctrine will be categories equipped with a monad acting on them, while the $D$-morphisms are transformations of monads, and the $D$-2-cells are exactly the monad functor transformations of Street’s article.
Here, since we have two different 2-categories of algebras (with strict $D$-morphisms or with all of them), one can wonder if monad morphisms $D \longrightarrow D'$ will induce $2$-functors $D'$-$\mathbf{Alg} \longrightarrow D$-$\mathbf{Alg}$ on the level of these $2$-categories.
This is indeed the case, and one can actually go even one step further and define monad modifications, using the fact that 2-$\mathbf{CAT}$ is in fact a 3-category! These modifications between two given monad morphisms are in fact in bijective correspondence with the 2-natural transformations between the $2$-functors induced by these monad morphisms on the level of algebras (with lax D-morphisms). Note that they are not the same as monad morphisms transformations of Street’s article.
This bijection is nice because it implies that you can compare 2-categories of algebras by only looking at the doctrines: if they are equivalent, so are the 2-categories of algebras.
The fact that this bijection does not hold when we restrict only to strict morphism was really surprising to me, but I guess this is the price to pay to use the 3-category structure.
During the last days of April, the Kan extension seminar will be reading the article “Two dimensional monad theory”, by Blackwell, Kelly and Powell. We will then have more to say about these 2-monads!
Leila Schneps is trying to raise $6,000 for what sounds like a good cause: translating a biography of Grothendieck into English:
As of this moment she’s raised $350… including $100 of her own money.
Leila Schneps writes:
Two volumes of a projected 3-volume biography of A. Grothendieck have been completed by Winfried SCHARLAU, and are available in German on Amazon.com through “Books on Demand” as
“Wer ist Alexander Grothendieck? Teil 1: Anarchy”
and
“Wer ist Alexander Grothendieck? Teil 3: Spiritualität”.
Grothendieck is aware of Scharlau’s work and even met him during the preparation of the part concerning the lives of his parents, although he subsequently ruptured relations with Scharlau as he has with everyone else.In 2013, an English translation of volume 1 by my sister, Melissa Schneps became available on Amazon as
“Who is Alexander Grothendieck? Part 1, Anarchy”
Melissa’s translation of volume 1 was funded by Jim Carlson of the Clay Institute for Mathematics.The new director of the Clay Institute has not accepted to fund translation of volume 3. The book covers Grothendieck’s life from his rupture with the mathematical establishment in 1970 to his disappearance in 1991 and contains a wealth of information from documents and interviews.
Due to multiple expressions of interest in an English translation, we have decided to use this method to raise funding for the translation of volume 3. Once the project is underway, we propose to create a web page where chapters of the volume can be made accessible to donors as they appear. Once completed, the new volume will be available on Amazon through “Books on Demand” like the others.
If you are interested in Alexander Grothendieck and his astonishing life, please help to make this happen.
In response Kevin Lin asked the obvious question: “What about volume 2?” I don’t know the answer! Do you?
One of my dreams these days is to get people to apply modern math to ecology and biology, to help us design technologies that work with nature instead of against it. I call this dream ‘green mathematics’. But this will take some time to reach, since living systems are subtle, and most mathematicians are more familiar with physics.
So, I’ve been warming up by studying the mathematics of chemistry, evolutionary game theory, electrical engineering, control theory and information theory. There are a lot of ideas in common to all these fields, but making them clear requires some category theory. I call this project ‘network theory’. I’m giving some talks about it at Oxford.
(This diagram is written in Systems Biology Graphical Notation.)
Here’s the plan:
Network Theory
Nature and the world of human technology are full of networks. People like to draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Bayesian networks, Feynman diagrams and the like. Mathematically minded people know that in principle these diagrams fit into a common framework: category theory. But we are still far from a unified theory of networks. After an overview, we will look at three portions of the jigsaw puzzle in three separate talks:
I. Electrical circuits and signal-flow graphs.
II. Stochastic Petri nets, chemical reaction networks and Feynman diagrams.
III. Bayesian networks, information and entropy.
All these talks will be in Lecture Theatre B of the Computer Science Department—you can see a map here, but the entrance is on Keble Road. Here are the times:
• Friday 21 February 2014, 2 pm: Network Theory: overview. Also available on YouTube.
• Tuesday 25 February, 3:30 pm: Network Theory I: electrical circuits and signal-flow graphs. Also available on YouTube.
• Tuesday 4 March, 3:30 pm: Network Theory II: stochastic Petri nets, chemical reaction networks and Feynman diagrams. Also available on YouTube.
• Tuesday 11 March, 3:30 pm: Network Theory III: Bayesian networks, information and entropy. Also available on YouTube.
I thank Samson Abramsky, Bob Coecke and Jamie Vicary of the Computer Science Department for inviting me, and Ulrike Tillmann and Minhyong Kim of the Mathematical Institute for helping me get set up. I also thank all the people who helped do the work I’ll be talking about, most notably Jacob Biamonte, Jason Erbele, Brendan Fong, Tobias Fritz, Tom Leinster, Tu Pham, and Franciscus Rebro.
Ulrike Tillmann has also kindly invited me to give a topology seminar:
Operads and the Tree of Life
Trees are not just combinatorial structures: they are also biological structures, both in the obvious way but also in the study of evolution. Starting from DNA samples from living species, biologists use increasingly sophisticated mathematical techniques to reconstruct the most likely “phylogenetic tree” describing how these species evolved from earlier ones. In their work on this subject, they have encountered an interesting example of an operad, which is obtained by applying a variant of the Boardmann–Vogt “W construction” to the operad for commutative monoids. The operations in this operad are labelled trees of a certain sort, and it plays a universal role in the study of stochastic processes that involve branching. It also shows up in tropical algebra. This talk is based on work in progress with Nina Otter.
I’m not sure exactly where this will take place, but probably somewhere in the Mathematical Institute, shown on this map. Here’s the time:
• Monday 24 February, 3:30 pm, Operads and the Tree of Life.
If you’re nearby, I hope you can come to some of these talks — and say hi!
(This diagram was drawn by Darwin.)
Guest post by Nick Gurski
I have been thinking about various sorts of operads with my PhD student Alex Corner, and have become interested in the following very concrete question: what are examples of operads in the category of finite groups under the cartesian product? I don’t know any really interesting examples, but maybe you do! After the break I will explain why I got interested in this question, and tell you about some examples that I do know.
Alex and I started off thinking about various sorts of things you might do with operads in $\mathbf{Cat}$, and were eventually forced into what we currently call an action operad. This is an operad $G$ whose job it is to act on the objects of other operads. The key examples to keep in mind are the terminal operad (each set is just a singleton), the symmetric operad (the $n$th set is the $n$th symmetric group), and the braid operad (the $n$th set is the $n$th braid group). The technical definition involves an operad $G$, a group structure on each set $G(n)$, a map of operads $\pi:G \rightarrow \Sigma$ to the symmetric operad which is levelwise a group homomorphism, and a final condition (when it makes sense) relating operadic composition, $\mu$, with group multiplication:
$\mu(g; f_1, \ldots, f_n) \cdot \mu(g'; f_1', \ldots, f_n') = \mu (g g'; f_{\pi(g')(1)}f_{1}', \ldots, f_{\pi(g')(n)}f_{n}').$
These ideas have cropped up before, for example in Nathalie Wahl’s thesis or this preprint of Wenbin Zhang. Once you have this definition, you can define operads which are equivariant with respect to $G$: you have an operad $P$, a group action of $G(n)$ on $P(n)$ for each $n$, and some equivariance conditions that generalize the equivariance conditions for a symmetric operad.
This isn’t the only thing you can do with an action operad, you can also think about the 2-monad on $\mathbf{Cat}$ whose algebras are strict monoidal categories where $G(n)$ acts naturally on $n$-fold tensor products. If you do this with the symmetric operad, you get symmetric strict monoidal categories (or permutative categories, if you are a topologist); if you do this with the (ribbon) braid operad, you get (ribbon) braided strict monoidal categories; and if you do this with the action operad of all terminal groups, you get back plain old strict monoidal categories. The only other “naturally-occurring” example of an action operad that I know of is the operad of $n$-fruit cactus groups, $J_{n}$. These groups come up in the representation theory of quantum groups, particularly the theory of crystals, and the monoidal structure you get out here is something Drinfeld called a coboundary category. I can give you a generators-and-relations definition of these groups, at which point I would have completely exhausted my understanding of this operad. The best reference that I know of is the paper Crystals and coboundary categories by Henriques and Kamnitzer.
What does this have to do with my question about operads of finite groups? Well, as it turns out, the structure map for an action operad $\pi:G \rightarrow \Sigma$ only has two options: it can be surjective, or it can be the zero map (i.e., everything maps to the identity permutation). Furthermore, that condition I wrote down relating group multiplication and operadic composition says that giving an action operad with $\pi$ the zero map is equivalent to giving an operad in which the operadic composition maps all preserve group multiplication. Alex and I already showed that the operadic composition of all identity elements is the identity element in the target, in other words an action operad with $\pi$ the zero map is just an operad in the category of groups using the cartesian product.
You can take kernels of maps between action operads, so in particular given any action operad $G$ you can take the kernel of $\pi:G \rightarrow \Sigma$; this gives you an operad in groups. For the examples above, you get finite groups when $G$ is the terminal operad or $G = \Sigma$ (as the terminal operad is obviously the kernel in the case of $G = \Sigma$), but for the rest of the examples you get an operad in the category of groups, but most of those groups are infinite. The groups involved are the so-called pure versions: pure braids, pure ribbon braids, and pure $n$-fruit cacti. One can then think of an action operad with surjective map $\pi$ as being an extension of the symmetric operad by an operad in the category of groups (the “pure” version), and that action operad is finite if and only if the operad in the category of groups is one containing only finite groups. We can translate this back into thinking about monoidal categories by then noting if we have some notion of strict monoidal category in which $n$th tensor powers come equipped with a natural action of a finite group $G(n)$ for all $n$, then we must be able to dig up an operad in the category of finite groups.
Now let’s talk concrete examples: what operads do I know in the category of finite groups? Well, there is obviously the terminal operad, but I can go very slightly further in that I can tell you how to construct some new ones. Here are two methods you can use to construct operads of finite groups.
- Let $A$ be a finite abelian group. Then there is an operad $\underline{A}$ where $\underline{A}(n) = A^{n}$ (the power here is a cartesian power). The operadic composition map $A^{n} \times A^{k_{1}} \times \cdots \times A^{k_{n}} \rightarrow A^{\Sigma k_{i}}$ takes the vector $(a_{1}, \ldots, a_{n})$ in the first coordinate and duplicates the $i$th coordinate $k_{i}$ times, then adds the result to the vector you get by just concatenating the $n$ vectors in the $A^{k_{i}}$. Here $A$ must be abelian as it appears as $\underline{A}(1)$, and $G(1)$ must be abelian for any action operad by an Eckmann-Hilton argument.
- Now let $G$ be any finite group, but in fact you will see that we don’t use anything interesting about the group structure here. There is an operad $G^{c2}$ with $G^{c2}(n) = G^{\binom{n}{2}}$ given the pointwise group structure. You should think of this as the set of functions from $\{ (i,j): 1 \leq i \lt j \leq n \}$ to $G$. Operad composition is a map $G^{\binom{n}{2}} \times G^{\binom{k_{1}}{2}} \times \cdots \times G^{\binom{k_{n}}{2}} \rightarrow G^{\binom{\Sigma k_{i}}{2}}.$ If we are given $1 \leq a \lt b \leq \sum k_{i}$, we must give back an element of $G$. If there is some $r$ such that $\sum_{i=1}^{r-1} k_{i} \leq a \lt b \lt \sum_{i=1}^{r} k_{i},$ then we use the function coming from $G^{c2}(r)$ evaluated on $1 \leq a +1 - \sum_{i=1}^{r-1} k_{i} \lt b+1- \sum_{i=1}^{r-1} k_{i} \leq k_{r}.$ If not, then there exist $r \lt s$ such that $a$ lies in the $r$th “interval” as we had before and $b$ lies in the $s$th interval, so then you use the function coming from $G^{c2}(n)$ evaluated on $r \lt s$.
I am reasonably content with the first of these constructions, I understand how to do the second but don’t really know where it comes from, and I have some ideas about how one could try to insert the finite group of their choice as $G(0)$ but haven’t checked the details, and know basically nothing else. Furthermore, I don’t know how these interact with each other, or how you can form extensions of $\Sigma$ with them outside of some obvious constructions.
Those are just the two straightforward approaches that I know of to construct operads in the category of finite groups. You can also try to construct these operads from operads of topological spaces or simplicial sets, but once again I don’t know of an example that produces finite things (apart from ones giving the groups above). Do you know any others?
Guest post by Tom Avery
Before getting started, I’d like to thank Emily for organizing the seminar, as well as all the other participants. It’s been a lot of fun so far! I’d also like to thank my supervisor Tom Leinster for some very helpful suggestions when writing this post.
In the fourth instalment of the Kan Extension Seminar we’re looking at Lawvere’s paper “Metric spaces, generalized logic, and closed categories”. This is the paper that introduced the surprising description of metric spaces as categories enriched over a certain monoidal category $\mathbb{R}$. A lot of people find this very striking when they first see it, and it helps to drive home the point that enriched categories are not just ordinary categories with some extra structure on the hom-sets; in fact the hom-sets don’t have to be sets at all!
Lawvere also intended the paper to serve as an accessible introduction to enriched category theory, so it begins fairly gently with some basic definitions. For the purposes of this post however, I’ll assume the reader has at least seen the definitions of symmetric monoidal closed categories, $\mathcal{V}$-categories, and $\mathcal{V}$-functors. If not, everything you need can be found on the nlab.
Lawvere begins by describing some of the philosophy behind the paper. He argues that rather than being abstract nonsense that only appears at the highest level of mathematics, category theory can be found even in the most elementary concepts. He mentions the well known examples of posets and groups as categories, and the aim of the paper is to describe another example of an elementary mathematical structure arising from categorical constructions, namely metric spaces.
We’ll also look at posets as enriched categories that are closely related to metric spaces, and this is where the “generalized logic” of the title comes in. The idea is that objects in a monoidal category $\mathcal{V}$ will be truth values for our generalized logic, morphisms in $\mathcal{V}$ will be entailments between them, certain functors correspond to logical operations, and adjointness relations between these functors correspond to rules of inference.
We will only consider categories enriched in complete and cocomplete symmetric monoidal closed categories (which I’ll just call “monoidal categories”), and we are particularly interested in two examples. The first is $\mathbb{2}$, which has two objects, $\mathrm{false}$ and $\mathrm{true}$, and three morphisms, which we write as entailments: $\mathrm{false} \vdash \mathrm{false}$, $\mathrm{false} \vdash \mathrm{true}$ and $\mathrm{true} \vdash \mathrm{true}$. The tensor product is given by conjunction, the unit is $\mathrm{true}$ and the internal hom is implication.
The hom-tensor adjunction gives a correspondence between judgements $a \wedge u \vdash v$ and $u \vdash (a \Rightarrow v)$, which is just the deduction theorem from logic. Categories enriched in $\mathbb{2}$ are simply posets (well, really preorders); we interpret the hom-object $X(x,y)$ as the truth value of the statement $x \leq y$, and the composition and identities give us
$(x \leq y) \wedge (y \leq z) \vdash x \leq z \quad \text{and} \quad \mathrm{true} \vdash x \leq x.$
The category $\mathbb{R}$ has as objects non-negative reals together with $\infty$, and has a single morphism $x \to y$ precisely when $x \geq y$. The tensor product is addition with $0$ as the unit, and this forces the internal hom to be truncated subtraction:
$[x,y] = \mathrm{max} (y-x, 0),$
but we’ll just write this as $y-x$. Thus the fact that $x+y \geq z$ if and only if $y \geq z-x$ can be thought of as somehow generalizing the deduction theorem. The composition and identities in an $\mathbb{R}$-category $X$ give us
$X(x,y) + X(y,z) \geq X(x,z) \quad \text{and} \quad 0 \geq X(x,x),$
which are familiar as axioms of metric spaces, and this is how we will refer to $\mathbb{R}$-categories. This is a weakening of the usual metric space axioms in three ways:
We don’t require that $X(x,y) = 0$ only when $x=y$. This condition amounts to requiring that isomorphic objects are equal, which is undesirable from a categorical point of view; this is also the reason for dropping antisymmetry of posets. There are naturally occurring examples that don’t satisfy the non-degeneracy condition; perhaps most notably, it makes sense to regard the distance between two measurable functions that are equal almost everywhere as being zero.
We allow $X(x,y) = \infty$. Allowing infinite distances is also quite natural, as it’s easy to imagine spaces where it’s impossible to get from one point to another. For example, this is the most sensible way to think of distances in a discrete space.
We don’t require $X(x,y) = X(y,x)$. Allowing non-symmetry is a more significant shift in what we mean by a metric space, but even so there are natural examples, for example the Hausdorff metric on subsets of a metric space. It’s best to think of the hom-values in a generalized metric space not as representing distances, but representing the least time or effort it takes to get from one point or state to another. When you look at it this way, there’s no reason to assume symmetry (e.g. going up and down a hill are not equally difficult).
There is a monoidal functor $\mathbb{2} \to \mathbb{R}$ sending $\mathrm{false} \mapsto \infty$ and $\mathrm{true} \mapsto 0$, and this induces an inclusion of the category of posets into the category of metric spaces. There is also a monoidal functor $\mathbb{2} \to \mathrm{Set}$, so metric spaces and categories can be seen as generalizing posets in different directions.
Functor categories and Yoneda
A $\mathbb{2}$-functor between posets is precisely an order preserving map, and a $\mathbb{R}$-functor between metric spaces is a Lipschitz map with Lipschitz constant 1, i.e. a function $f \colon X \to Y$ such that $Y(f x,f x') \leq X(x,x')$ for all $x$ and $x'$ in $X$. For an arbitrary monoidal category $\mathcal{V}$ and $\mathcal{V}$-categories $X$ and $Y$ the functor $\mathcal{V}$-category $[X,Y]$ has as objects the $\mathcal{V}$-functors $X \to Y$, and the hom objects are defined by the end
$[X,Y](f,g) = \int _x Y(f x,g x).$
When $\mathcal{V} = \mathrm{Set}$, this just gives the ordinary end representation of the set of natural transformations. Since $\mathbb{2}$ and $\mathbb{R}$ are both posets, ends reduce to products, which are conjunctions in $\mathbb{2}$ and suprema in $\mathbb{R}$. So for posets, the object of natural transformations between order preserving maps $f,g \colon X \to Y$ is the truth value of the statement $\forall x (f x \leq g x)$. In other words, order preserving maps are ordered by domination. For metric spaces the hom object $[X,Y](f,g)$ is given by $\sup_x Y(f x,g x)$. So the space of Lipschitz-1 maps between metric spaces is endowed with the sup metric.
The closed structure of the monoidal category $\mathcal{V}$ makes $\mathcal{V}$ itself into a $\mathcal{V}$-category. For $\mathbb{2}$ this gives the obvious interpretation of $\mathbb{2}$ as a poset. However, the metric on $\mathbb{R}$ is given by truncated subtraction, rather than the usual $|y-x|$. The self-enrichment of $\mathcal{V}$ means we can define the presheaf $\mathcal{V}$-category $[X^{\mathrm{op}}, \mathcal{V}]$, and that allows us to define the Yoneda embedding $X \to [X^{\mathrm{op}},\mathcal{V}]$, which sends an object $x$ to the representable presheaf $X(-,x)$. Just as in ordinary category theory, this embedding is full and faithful, which in the case of posets gives:
Proposition. Any poset $X$ can be embedded into the poset of downwards closed subsets of $X$ (ordered by inclusion), by sending an element $x$ to the set of all things $\leq x$.
Here we have used the correspondence between downwards closed subsets and order preserving maps $X^{\mathrm{op}} \to \mathbb{2}$. For metric spaces we have:
Proposition. Any metric space can be isometrically embedded into the space of Lipschitz-1 maps $X^{\mathrm{op}} \to \mathbb{R}$.
Adequacy and density
We say that a $\mathcal{V}$-functor $i \colon X \to Y$ is adequate if the composite
$Y \to [Y^{\mathrm{op}},\mathcal{V}] \to [X^{\mathrm{op}}, \mathcal{V}]$
of the Yoneda embedding with restriction along $i$ is full and faithful. We say that $i$ is dense if we have an isomorphism of bimodules $i_{\star} \circ i^{\star} \cong 1_X$. I haven’t yet said what bimodules are, but the important thing is that in the case of metric spaces this becomes the condition that
$Y(y_1,y_2) = inf_x [Y(y_1,i x) + Y(i x, y_2)].$
This terminology is slightly unfortunate, because what Lawvere calls an adequate functor is nowadays called a dense functor, and Lawvere’s notion of density doesn’t have a modern name as far as I’m aware. A functor is dense iff it’s image is dense in the usual metric space sense. Lawvere shows that a dense functor is adequate, at least in the case of metric spaces. A metric space is separable if it admits a dense map from the discrete space on the natural numbers, and this result give us
Proposition. Any separable metric space can be isometrically embedded in the space of sequences of reals endowed with the sup metric.
This space of sequences is not quite $l_{\infty}$ as usually defined because of the non-standard metric on $\mathbb{R}$.
The comprehension scheme
A(n ordinary) functor $p \colon E \to B$ is called a discrete opfibration if every morphism $p(e) \to b$ is the image under $p$ of a unique morphism with domain $e$. There is a correspondence between discrete opfibrations on $B$ and functors $B \to \mathrm{Set}$, and this correspondence is obtained by restricting a certain adjunction $\mathrm{Cat}/B \rightleftarrows [B, \mathrm{Set}]$ to an equivalence. This adjunction can be defined for arbitrary $\mathcal{V}$, provided that the unit of $\mathcal{V}$ is in fact terminal (this gives a way of defining projections from a tensor product to each of the factors, which is not generally possible). I won’t go into the details of how it’s defined, but I’ll describe what it gives for $\mathbb{2}$ and $\mathbb{R}$.
Given an order-preserving map $p \colon E \to B$ between posets, the left adjoint of this adjunction sends $p$ to $\phi_p \colon B \to \mathbb{2}$, defined by setting $\phi_p(b)$ to be the truth value of the statement $\exists e(b \leq p(e))$. The right adjoint sends $\phi \colon B \to \mathcal{V}$ to $\pi \colon \{B|\phi\} \to B$, where $\{B|\phi\}$ is the (upwards closed) subset of $B$ on which $\phi$ takes the value $\mathrm{true}$, and $\pi$ is the inclusion. If we take $B$ to be a discrete poset (i.e. a set) then $\phi$ is just a unary predicate on $B$, and
$\{B|\phi \} = \{b \in B | \phi (b) = \mathrm{true} \}.$
Lawvere calls the right adjoint of the adjunction the comprehension scheme, and this example shows why.
For metric spaces, the left adjoint sends a Lipschitz-1 map $p \colon E \to B$ to the function $\phi_p \colon B \to \mathbb{R}$ defined by $\phi_p (b) = \inf_e B(p (e), b)$, i.e. the distance from the image of $p$. The right adjoint sends $\phi \colon B \to \mathbb{R}$ to the inclusion of its vanishing set.
It’s interesting to note that for $\mathrm{Set}$ and $\mathbb{2}$, this adjunction restricts to an equivalence between a subcategory of $\mathcal{V} \text{-} \mathrm{Cat} / B$ (the discrete opfibrations, and inclusions of upwards closed subsets respectively) and the whole functor category $[B, \mathcal{V}]$. For $\mathbb{R}$ however, only those functions $B \to \mathbb{R}$ that give the distance from a closed subset are invariant under the adjunction, and these correspond to the inclusion of this closed set. Thus purely categorical considerations naturally give rise to the closed sets in $B$ as a distinguished class of subsets.
Bimodules and Kan extensions
For $\mathcal{V}$-categories $X$ and $Y$, a bimodule $\phi \colon X \to Y$ consists of a $\mathcal{V}$-functor $\phi \colon Y^{\mathrm{op}} \otimes X \to \mathcal{V}$ (the placement of the $\mathrm{op}$ is not completely agreed upon, but I’ll stick with Lawvere’s convention). We can think of a bimodule as a $\mathcal{V}$-category structure on the disjoint union of $X$ and $Y$, with $\phi(y,x)$ as the hom object between an object of $Y$ and an object of $X$, and with all the hom objects in the other direction being the initial object of $\mathcal{V}$. The $\mathcal{V}$-functoriality of $\phi$ means that we have a “composition” operation $Y(y',y)\otimes \phi(y,x) \to \phi(y',x)$, and dually, and that this composition is associative.
There are two special types of bimodule that are particularly important, and they are both induced by $\mathcal{V}$-functors. Given a $\mathcal{V}$-functor $f \colon X \to Y$, we define bimodules $f_{\star} \colon X \to Y$ and $f^{\star} \colon Y \to X$ by
$f_{\star} (y,x) = Y(y, f x) \quad \text{and} \quad f^{\star} (x,y) = Y(f x,y).$
A morphism between two bimodules $X \rightrightarrows Y$ is just a $\mathcal{V}$-natural transformation between the corresponding $\mathcal{V}$-functors $Y^{\mathrm{op}} \otimes X \to \mathcal{V}$. Given bimodules $\phi \colon X \to Y$ and $\psi \colon Y \to Z$, we can define the composite bimodule $\psi \circ \phi \colon X \to Z$ by the coend
$\psi \circ \phi (z,x) = \int^y \psi (z,y) \otimes \phi (y, x),$
and this composition is associative (up to isomorphism). The bimodule $1_X \colon X \to X$ given by $X(-,-) \colon X^{\mathrm{op}} \otimes X \to \mathcal{V}$ serves as an identity for this composition, and hence there is bicategory $\mathcal{V} \text{-} \mathrm{Mod}$ of $\mathcal{V}$-categories, bimodules and natural transformations.
Let $k$ be the $\mathcal{V}$-category with a single object and whose only hom object is the unit object of $\mathcal{V}$. Then a bimodule $k \to k$ is just an object of $\mathcal{V}$, and composition is just the tensor product in $\mathcal{V}$. So bimodule composition generalizes the tensor product, and it’s natural to ask whether the internal hom of $\mathcal{V}$ also generalizes. Since general bimodule composition is not commutative, there are two composition operations (on the left and the right) that could potentially have right adjoints, and in fact they both do, and these are defined by end formulae. If we restrict our attention to modules of the form $f_{\star}$ for a $\mathcal{V}$-functor $f$, the operation $\beta \mapsto \beta \circ f_{\star}$ also has a left adjoint, which is given by $\phi \mapsto \phi \circ f^{\star}$, and can also be written explicitly as a coend.
Let $f$ be a $\mathcal{V}$-functor $X \to Y$. Bimodules $Y \to k$ are just $\mathcal{V}$-functors $\psi \colon Y \to \mathcal{V}$, and the bimodule composite $\psi \circ f_{\star}$ is the bimodule $X \to k$ corresponding to the $\mathcal{V}$-functor $\psi f \colon X \to \mathcal{V}$. Thus the left and right adjoint to precomposition with $f_{\star}$ in this case give left and right Kan extensions along $f$, and the formulae for these adjoints reduce to the familiar coend and end formulae for left and right Kan extensions:
$\mathrm{Lan}_f \phi (y) = \int^x Y(f x, y) \otimes \phi (x) \quad \text{and} \quad \mathrm{Ran}_f \phi (y) = \int _x [ Y(y, f x), \phi (x) ]$
When $f$ is full and faithful, the Kan extensions really are extensions, in other words if you extend and then restrict, you get back what you started with. So specializing all this to the case $\mathcal{V} = \mathbb{R}$ gives:
Proposition. Let $X$ be a subspace of a metric space $Y$, and let $\phi$ be a Lipschitz-1 map from $X$ to $\mathbb{R}$. Then $\phi$ has both maximal and minimal Lipschitz-1 extensions to the whole of $Y$, given by
$\mathrm{Lan}_f \phi (y) = \inf_x [ \phi(x) + Y(f x,y) ] \quad \text{and} \quad \mathrm{Ran}_f \phi (y) = \sup_x [ \phi(x) - Y(y,f x)].$
A particularly interesting example is given when $X$ is the space of simple, non-negative functions on a probability space (i.e. positive linear combinations of indicator functions of measurable sets), and $f$ is the inclusion into the space of all measurable functions (with the sup metric). If you take $\phi (x)$ to be the integral of the simple function $x$, then the two Kan extensions of $\phi$ give the upper and lower integrals of a measurable function $y$. In particular, $y$ is integrable if and only if $\mathrm{Lan}_f \phi (y) = \mathrm{Ran}_f \phi (y)$.
If we think of a map from a poset to $\mathbb{2}$ as a unary relation or predicate, then the left and right Kan extensions become
$\mathrm{Lan}_f \phi (y) = \exists x (f x \leq y \wedge \phi (x) ) \quad \text{and} \quad \mathrm{Ran}_f \phi (y) = \forall x (y \leq f x \Rightarrow \phi (x)).$
If we restrict to the case when $X$ and $Y$ are just sets, so that $f$ is an arbitrary function, this gives
$\mathrm{Lan}_f \phi (y) = (\exists x \in f^{-1} (y)) \phi (x) \quad \text{and} \quad \mathrm{Ran}_f \phi (y) = (\forall x \in f^{-1}(y)) \phi (x),$
and specializing even further to the case when $f$ is a product projection $W \times Z \to Z$ we have
$\mathrm{Lan}_f \phi (z) = \exists w \phi (w,z) \quad \text{and} \quad \mathrm{Ran}_f \phi (z) = \forall w \phi (w,z).$
So existential and universal quantification are (very) special cases of Kan extensions. This leads Lawvere to use the phrase “Kan quantification” for Kan extensions.
Every $\mathcal{V}$-functor $X \to Y$ gives rise to an adjunction $f_{\star} \colon X \rightleftarrows Y \colon f^{\star}$ in the bicategory of bimodules, and it’s natural to ask under what conditions does every adjunction arise in this way. Recall that a metric space is (Cauchy) complete if every Cauchy sequence has a limit. A bit of care is needed when making this precise, because the possible non-symmetry of the metric means there are several things it could mean for a sequence to be Cauchy or convergent.
Proposition. A metric space $Y$ is Cauchy complete if and only if every adjunction of bimodules $X \rightleftarrows Y$ is induced by a Lipschitz-1 map $X \to Y$. In particular, the points of the Cauchy completion of $Y$ correspond to bimodule adjunctions $1 \rightleftarrows Y$ (where $1$ is the one point metric space).
This gives a description of Cauchy complete metric spaces that is purely categorical, so it makes sense to talk about Cauchy complete $\mathcal{V}$-categories for arbitrary $\mathcal{V}$. It turns out that an ordinary ($\mathrm{Set}$-enriched) category is Cauchy complete iff all idempotents split, and the Cauchy completion of a ring regarded as a one object category enriched in $\mathrm{Ab}$ is the category of finitely generated projective modules.
These days it’s more common to see Cauchy complete categories defined in terms of absolute colimits. I’m not too sure about the history of the concept, but I think the definition Lawvere gives came first, and Street later gave the characterisation in terms of absolute colimits in “Absolute colimits in enriched categories”.
Free $\mathcal{V}$-categories
Finally, Lawvere defines a $\mathcal{V}$-graph $(X, \gamma)$ to consist of a set $X$ of “vertices” and for every pair $(x,x')$ of vertices an object $\gamma (x,x')$ of $\mathcal{V}$ (i.e. like a $\mathcal{V}$-category but without composition or identities). A morphism of $\mathcal{V}$-graphs is a function $f \colon X \to Y$ and a family of morphisms $f_{x,x'} \colon \gamma (x,x') \to \delta (f x, f x')$. There is an evident forgetful functor from $\mathcal{V} \text{-} \mathrm{Cat}$ to the category of $\mathcal{V}$-graphs. This has a left adjoint which sends a $\mathcal{V}$-graph $(X, \gamma)$ to the $\mathcal{V}$-category $F X$ which has the vertices of $X$ as objects and hom objects defined by
$FX(x,x') = \sum_{x_1,\ldots , x_n} \gamma (x,x_1) \otimes \gamma (x_1,x_2) \otimes \ldots \otimes \gamma (x_n, x'),$
where the sum runs over all finite sequences of vertices. In the case of metric spaces, this gives the “least-cost” distance:
$inf_{x_1, \ldots , x_n} [ \gamma(x, x_1) + \gamma (x_1, x_2) + \ldots + \gamma (x_n, x') ].$
I’ll finish with a few questions for those more knowledgeable than myself:
I think the only categorical notion in the paper that I hadn’t at least heard of before is Lawvere’s notion of density (i.e. a functor $i$ such that $i_{\star} \circ i^{\star} = 1$; as noted above this is different from what people usually mean by a dense functor). Does this appear elsewhere in category theory, or is it only interesting for metric spaces?
A lot of the results about metric spaces are very similar to classical results, but are not quite the same because of the non-standard metric on $\mathbb{R}$. Is there any way of recovering the classical results without too much work?
The idea of generalized logic is intriguing, but I’m not sure how far you can go with it. Is it possible, for example, to talk about generalized models of a generalized theory?
You may recall how Tom Leinster, Tobias Fritz and I cooked up a neat category-theoretic characterization of entropy in a long conversation here on this blog. Now Tobias and I have a sequel giving a category-theoretic characterization of relative entropy. But since some people might be put off by the phrase ‘category-theoretic characterization’, it’s called:
I’ve written about this paper before, on my other blog:
- Relative Entropy (Part 1): how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers.
- Relative Entropy (Part 2): a category related to statistical inference, $\mathrm{FinStat},$ and how relative entropy defines a functor on this category.
- Relative Entropy (Part 3): statement of our main theorem, which characterizes relative entropy up to a constant multiple as the only functor $F : \mathrm{FinStat} \to [0,\infty)$ with a few nice properties.
But now the paper is actually done! Let me give a compressed version of the whole story here… with sophisticated digressions buried in some parenthetical remarks that you’re free to skip if you want.
Our gives a new characterization of the concept of relative entropy, also known as ‘relative information’, ‘information gain’ or—by people who like to use jargon to make their work seem obscure—‘Kullback-Leibler divergence’.
Here’s the basic idea. Whenever you have two probability distributions $p$ and $q$ on the same finite set $X,$ you can define the entropy of $q$ relative to $p$:
$S(q,p) = \sum_{x\in X} q_x \ln\left( \frac{q_x}{p_x} \right)$
Here we set
$q_x \ln\left( \frac{q_x}{p_x} \right)$
equal to $\infty$ when $p_x = 0,$ unless $q_x$ is also zero, in which case we set it equal to 0. Relative entropy thus takes values in $[0,\infty].$
Intuitively speaking, $S(q,p)$ measures how surprised you’d be if you thought a situation was described by a probability distribution $p$… but then someone came along and said no, it’s really $q$.
Or if ‘surprise’ sounds too subjective, it’s the expected amount of information gained when you discover the probability distribution is really $q,$ when you’d thought it was $p.$
Tobias and I wanted to use category theory to say what’s so great about relative entropy. We did it using a category $\mathrm{FinStat}$ where:
- an object $(X,q)$ consists of a finite set $X$ and a probability distribution $x \mapsto q_x$ on that set;
- a morphism $(f,s) : (X,q) \to (Y,r)$ consists of a measure-preserving function $f$ from $X$ to $Y,$ together with a probability distribution $x \mapsto s_{x y}$ on $X$ for each element $y \in Y$, with the property that $s_{x y} = 0$ unless $f(x) = y$.
If the raw math seems hard to swallow, perhaps some honey-coated words will help it go down. I think of an object of $\mathrm{FinStat}$ as a system with some finite set of states together with a probability distribution on its states. This lets me think of a morphism
$(f,s) : (X,q) \to (Y,r)$
in a nice way. First, there’s a measurement process $f : X \to Y$, a function from the set $X$ of states of some system being measured to the set $Y$ of states of some measurement apparatus. The condition that $f$ be measure-preserving says the probability that the apparatus winds up in any state $y \in Y$ is the sum of the probabilities of all states of $X$ leading to that outcome:
$\displaystyle{ r_y = \sum_{x \in f^{-1}(y)} q_x }$
Second, there’s a hypothesis $s$. This is a guess about the probability that the system being measured is in the state $x \in X$ given any measurement outcome $y \in Y.$ The guess is the number $s_{x y}$.
Now, suppose we have any morphism
$(f,s) : (X,q) \to (Y,r)$
in $\mathrm{FinStat}.$ From this we get two probability distributions on $X$. First, we have the probability distribution $p$ given by
$\displaystyle{ p_x = \sum_{y \in Y} s_{x y} r_y } \qquad \qquad \heartsuit\heartsuit\heartsuit$
This is our best guess about the the probability that the system is in any given state, given our hypothesis and the probability distribution of measurement results. Second, we have the ‘true’ probability distribution $q$.
In fact, this way of assigning relative entropies to morphisms defines a functor
$RE : \mathrm{FinStat} \to [0,\infty]$
where we use $[0,\infty]$ to denote the category with one object, the numbers $0 \le x \le \infty$ as morphisms, and addition as composition. More precisely, if
$(f,s) : (X,q) \to (Y,r)$
is any morphism in $\mathrm{FinStat},$ we define
$RE(f,s) = S(q,p)$
where $p$ is defined as in equation $\heartsuit\heartsuit\heartsuit$. This tells us how surprised we are when we learn the true probability distribution $q$, if our measurement results were distributed according to $r$ and our hypothesis was $s$.
The fact that $RE$ is a functor is nontrivial and rather interesting! It says that given any composable pair of measurement processes:
$(X,q) \stackrel{(f,s)}{\longrightarrow} (Y,r) \stackrel{(g,t)}{\longrightarrow} (Z,u)$
the relative entropy of their composite is the sum of the relative entropies of the two parts:
$RE((g,t) \circ (f,s)) = RE(g,t) + RE(f,s) .$
We prove that $RE$ is a functor. However, we go further: we characterize relative entropy by saying that up to a constant multiple, $RE$ is the unique functor from $\mathrm{FinStat}$ to $[0,\infty]$ obeying three reasonable conditions.
Lower semicontinuity
The first condition is that $RE$ is lower semicontinuous. The set $P(X)$ of probability distibutions on a finite set $X$ naturally has the topology of an $(n-1)$-simplex when $X$ has $n$ elements. The set $[0,\infty]$ has an obvious topology where it’s homeomorphic to a closed interval. However, with these topologies, the relative entropy does not define a continuous function
$\begin{array}{rcl} S : P(X) \times P(X) &\to& [0,\infty] \\ (q,p) &\mapsto & S(q,p) . \end{array}$
The problem is that
$\displaystyle{ S(q,p) = \sum_{x\in X} q_x \ln\left( \frac{q_x}{p_x} \right) }$
and $q_x \ln(q_x/p_x)$ is $\infty$ when $p_x = 0$ and $q_x > 0$ — but it’s $0$ when $p_x = q_x = 0.$
So, it turns out that $S$ is only lower semicontinuous, meaning that if $p^i , q^i$ are sequences of probability distributions on $X$ with $p^i \to p$ and $q^i \to q$ then
$S(q,p) \le \liminf_{i \to \infty} S(q^i, p^i)$
We give the set of morphisms in $\mathrm{FinStat}$ its most obvious topology, and show that with this topology, $RE$ maps morphisms to morphisms in a lower semicontinuous way.
(Lower semicontinuity may seem like an annoying property. But there’s a way to redeem it. There’s a sneaky topology on $[0,\infty]$ such that a function taking values in $[0,\infty]$ is lower semicontinuous (in the lim inf sense above) if and only if it’s continuous with respect to this sneaky topology!
Using this idea, we can make $FinStat$ and $[0,\infty]$ into topological categories — that is, categories internal to Top — in such a way that lower semicontinuity simply says
$RE : FinStat \to [0,\infty]$
is a continuous functor.
A bit confusingly, this sneaky topology on $[0,\infty]$ is called the upper topology. I’ve fallen in love with the upper topology on $[0,\infty]$. Why?
Well, $[0,\infty]$ is a very nice rig, or ‘ring without negatives’. Addition is defined in the obvious way, and multiplication is defined in the almost-obvious way, except that
$0 \cdot \infty = \infty \cdot 0 = 0$
Even this is actually obvious if you remember that it’s required by the definition of a rig. But if you try to put the ordinary closed interval topology on $[0,\infty]$, you’ll see multiplication is not continuous, because $a \cdot \infty$ is infinite when $a \gt 0$ but then it suddenly jumps down to zero when $a$ hits zero. However, multiplication is continuous if we give $[0,\infty]$ the upper topology! Then $[0,\infty]$ becomes a topological rig.)
Convex linearity
The second condition is that $RE$ is convex linear. We describe how to take convex linear combinations of morphisms in $\mathrm{FinStat},$ and then the functor $RE$ maps any convex linear combination of morphisms in $\mathrm{FinStat}$ to the corresponding convex linear combination of numbers in $[0,\infty].$
Intuitively, this means that if we take a coin with probability $P$ of landing heads up, and flip it to decide whether to perform one measurement process or another, the expected information gained is $P$ times the expected information gain of the first process plus $1-P$ times the expected information gain of the second process.
(Operadically, the point is that both $FinStat$ and $[0,\infty]$ are algebras of an operad P whose operations are convex linear combinations. The $n$-ary operations in P are just probability distributions on an $n$-element set. In other words, they’re points in the $(n-1)$-simplex.
So, saying that $RE$ is convex linear means that
$RE: FinStat \to [0,\infty]$
is a map of P-algebras. But we avoid discussing this in our paper because $FinStat$, being a category, is just a ‘weak’ P-algebra, and we decided this would be too much for our poor little readers.
For those who like fine nuances: P is a topological operad, and $FinStat$ and $[0,\infty]$ are algebras of this in the topological category TopCat. As I mentioned, $FinStat$ is a ‘weak’ P-algebra, meaning the laws for convex linear combinations hold only up to coherent natural isomorphism. $[0,\infty]$ is strict… but to get convex linear combinations like $\lambda \cdot 0 + (1 - \lambda) \infty$ to behave continuously, we have to give $[0,\infty]$ the upper topology!)
Vanishing on a subcategory
The third condition is that $RE$ vanishes on morphisms $(f,s) : (X,q) \to (Y,r)$ where the hypothesis $s$ is optimal. By this, we mean that equation $\heartsuit\heartsuit\heartsuit$ gives a probability distribution $p$ equal to the ‘true’ one, $q$.
That makes a lot of sense conceptually: we don’t gain any information upon learning the truth about a situation if we already knew the truth!
(But the subcategory of $FinStat$ where we keep all the objects but only these ‘optimal’ morphisms also has a nice category-theoretic significance. Tom Leinster called it FP in this post:
That’s because it’s the ‘free P-algebra on an internal P-algebra’, where P is the operad I mentioned. I won’t explain what this means here, because Tom did it! Suffice it to say that it’s a shockingly abstract piece of operad theory that nonetheless manages to capture the concept of entropy very neatly. But that’s plain old entropy, not relative entropy.)
The result
Here, then, is our main result:
Theorem. Any lower semicontinuous, convex-linear functor
$F : \mathrm{FinStat} \to [0,\infty]$
that vanishes on every morphism with an optimal hypothesis must equal some constant times the relative entropy. In other words, there exists some constant $c \in [0,\infty]$ such that
$F(f,s) = c RE(f,s)$
for any any morphism $(f,s) : (X,p) \to (Y,q)$ in $\mathrm{FinStat}.$
The proof
The proof is surprisingly hard. Or maybe we’re just surprisingly bad at proving things. But the interesting thing is this: the proof is swift and effective in the ‘generic’ case — the case where the support of the probability measures involved is the whole set they’re living on, and the constant $c$ is finite.
It takes some more work to handle the case where the probability measures have smaller support.
But the really hard work starts when we handle the case that, in the end, has $c = \infty$. Then the proof becomes more like analysis than what you normally expect in category theory. We slowly corner the result, blocking off all avenues of escape. Then we close in, grab its neck, and strangle it, crushing its larynx ever tighter, as it loses the will to fight back and finally expires… still twitching.
You’ve got to read the proof to understand what I mean.
Stefan Forcey just alerted me to a long and reflective piece (edit: now paywalled) in the Chronicle of Higher Education about the relationship between the NSA and American academics — mathematicians in particular.
Now many academics are trying to be heard from the outside, arguing that the NSA’s spying tactics are proving counterproductive and that university researchers have a duty to stop assisting them. […]
In the months since Edward J. Snowden fled the United States with tens of thousands of electronic documents describing NSA practices, mathematicians are realizing that they are in the same position as nuclear physicists in the middle of the last century, and business students in more recent times — suddenly needing to figure out the ethics behind what they do, said Edward Frenkel, a professor of mathematics at the University of California at Berkeley.
The Chronicle article cites not only various sources that have been mentioned here at the Café before (e.g. Beilinson’s letter and Stefan’s Notices article), but also something that hasn’t:
Their appeals were followed on January 24 by an open letter from a group of 50 researchers warning of long-term damage to society and to the nation’s technological enterprise from the NSA’s reported tactic of intentionally weakening computer-security standards so it can carry out spy operations.
“Every country, including our own, must give intelligence and law-enforcement authorities the means to pursue terrorists and criminals,” the researchers wrote, “but we can do so without fundamentally undermining the security that enables commerce, entertainment, personal communication, and other aspects of 21st-century life.”
Shunned as NSA advisers, academics question their ties to the agency, The Chronicle of Higher Education, 10 February 2014. [Article was readable for free when I first put this post up, but got put behind a paywall a few days later.]