One of my dreams these days is to get people to apply modern math to ecology and biology, to help us design technologies that work with nature instead of against it. I call this dream ‘green mathematics’. But this will take some time to reach, since living systems are subtle, and most mathematicians are more familiar with physics.
So, I’ve been warming up by studying the mathematics of chemistry, evolutionary game theory, electrical engineering, control theory and information theory. There are a lot of ideas in common to all these fields, but making them clear requires some category theory. I call this project ‘network theory’. I’m giving some talks about it at Oxford.
(This diagram is written in Systems Biology Graphical Notation.)
Here’s the plan:
Nature and the world of human technology are full of networks. People like to draw diagrams of networks: flow charts, electrical circuit diagrams, signal flow diagrams, Bayesian networks, Feynman diagrams and the like. Mathematically minded people know that in principle these diagrams fit into a common framework: category theory. But we are still far from a unified theory of networks. After an overview, we will look at three portions of the jigsaw puzzle in three separate talks:
I. Electrical circuits and signal flow diagrams.
II. Stochastic Petri nets, chemical reaction networks and Feynman diagrams.
III. Bayesian networks, information and entropy.
All these talks will be in Lecture Theatre B of the Computer Science Department—you can see a map here, but the entrance is on Keble Road. Here are the times:
• Friday 21 February 2014, 2 pm: Network Theory: Overview.
• Tuesday 25 February, 3:30 pm: Network Theory I: electrical circuits and signal flow diagrams.
• Tuesday 4 March, 3:30 pm: Network Theory II: stochastic Petri nets, chemical reaction networks and Feynman diagrams.
• Tuesday 11 March, 3:30 pm: Network Theory III: Bayesian networks, information and entropy.
I thank Samson Abramsky, Bob Coecke and Jamie Vicary of the Computer Science Department for inviting me, and Ulrike Tillmann and Minhyong Kim of the Mathematical Institute for helping me get set up. I also thank all the people who helped do the work I’ll be talking about, most notably Jacob Biamonte, Jason Erbele, Brendan Fong, Tobias Fritz, Tom Leinster, Tu Pham, and Franciscus Rebro.
Ulrike Tillmann has also kindly invited me to give a topology seminar:
Operads and the Tree of Life
Trees are not just combinatorial structures: they are also biological structures, both in the obvious way but also in the study of evolution. Starting from DNA samples from living species, biologists use increasingly sophisticated mathematical techniques to reconstruct the most likely “phylogenetic tree” describing how these species evolved from earlier ones. In their work on this subject, they have encountered an interesting example of an operad, which is obtained by applying a variant of the Boardmann–Vogt “W construction” to the operad for commutative monoids. The operations in this operad are labelled trees of a certain sort, and it plays a universal role in the study of stochastic processes that involve branching. It also shows up in tropical algebra. This talk is based on work in progress with Nina Otter.
I’m not sure exactly where this will take place, but probably somewhere in the Mathematical Institute, shown on this map. Here’s the time:
• Monday 24 February, 3:30 pm, Operads and the Tree of Life.
If you’re nearby, I hope you can come to some of these talks — and say hi!
(This diagram was drawn by Darwin.)
A friend, John Wigglesworth, has asked me to announce the following conference. Perhaps he’ll let us know if pluralist homotopy type theorists are welcome ;).
I wonder if any progress has been made on my MO ‘multiverse’ question.
CFP: Symposium on the Foundations of Mathematics, Kurt Gödel Research Center, University of Vienna.
Set theory is taken to serve as a foundation for mathematics. But it is well-known that there are set-theoretic statements that cannot be settled by the standard axioms of set theory. The Zermelo-Fraenkel axioms, with the Axiom of Choice (ZFC), are incomplete. The primary goal of this symposium is to explore the different approaches that one can take to the phenomenon of incompleteness.
One option is to maintain the traditional “universe” view and hold that there is a single, objective, determinate domain of sets. Accordingly, there is a single correct conception of set, and mathematical statements have a determinate meaning and truth-value according to this conception. We should therefore seek new axioms of set theory to extend the ZFC axioms and minimize incompleteness. It is then crucial to determine what justifies some new axioms over others.
Alternatively, one can argue that there are multiple conceptions of set, depending on how one settles particular undecided statements. These different conceptions give rise to parallel set-theoretic universes, collectively known as the “multiverse”. What mathematical statements are true can then shift from one universe to the next. From within the multiverse view, however, one could argue that some universes are more preferable than others.
These different approaches to incompleteness have wider consequences for the concepts of meaning and truth in mathematics and beyond. The conference will address these foundational issues at the intersection of philosophy and mathematics. The primary goal of the conference is to showcase contemporary philosophical research on different approaches to the incompleteness phenomenon.
To accomplish this, the conference has the following general aims and objectives:
(1) To bring to a wider philosophical audience the different approaches that one can take to the set-theoretic foundations of mathematics. (2) To elucidate the pressing issues of meaning and truth that turn on these different approaches. (3) To address philosophical questions concerning the need for a foundation of mathematics, and whether or not set theory can provide the necessary foundation
Date and Venue: 7-8 July 2014 - Kurt Gödel Research Center, Vienna
Confirmed Speakers: Sy-David Friedman (Kurt Gödel Research Center for Mathematical Logic), Hannes Leitgeb (Munich Center for Mathematical Philosophy)
Call for Papers: We welcome submissions from scholars (in particular, young scholars, i.e. early career researchers or post-graduate students) on any area of the foundations of mathematics (broadly construed). Particularly desired are submissions that address the role of set theory in the foundations of mathematics, or the foundations of set theory (universe/multiverse dichotomy, new axioms, etc.) and related ontological and epistemological issues. Applicants should prepare an extended abstract (maximum 1500 words) for blind review, and send it to sotfom [at] gmail [dot] com. The successful applicants will be invited to give a talk at the conference and will be refunded the cost of accommodation in Vienna for two days (7-8 July).
Submission Deadline: 31 March 2014
Notification of Acceptance: 30 April 2014
Scientific Committee: Philip Welch (University of Bristol), Sy-David Friedman (Kurt Gödel Research Center), Ian Rumfitt (University of Birmigham), John Wigglesworth (London School of Economics), Claudio Ternullo (Kurt Gödel Research Center), Neil Barton (Birkbeck College), Chris Scambler (Birkbeck College), Jonathan Payne (Institute of Philosophy), Andrea Sereni (Università Vita-Salute S. Raffaele), Giorgio Venturi (Université de Paris VII, “Denis Diderot” - Scuola Normale Superiore)
Organisers: Sy-David Friedman (Kurt Gödel Research Center), John Wigglesworth (London School of Economics), Claudio Ternullo (Kurt Gödel Research Center), Neil Barton (Birkbeck College), Carolin Antos (Kurt Gödel Research Center)
Conference Website: sotfom [dot] wordpress [dot] com
Further Inquiries: please contact
Claudio Ternullo (ternulc7 [at] univie [dot] ac [dot] at)
Neil Barton (bartonna [at] gmail [dot] com)
John Wigglesworth (jmwigglesworth [at] gmail [dot] com)
Now many academics are trying to be heard from the outside, arguing that the NSA’s spying tactics are proving counterproductive and that university researchers have a duty to stop assisting them. […]
In the months since Edward J. Snowden fled the United States with tens of thousands of electronic documents describing NSA practices, mathematicians are realizing that they are in the same position as nuclear physicists in the middle of the last century, and business students in more recent times — suddenly needing to figure out the ethics behind what they do, said Edward Frenkel, a professor of mathematics at the University of California at Berkeley.
Their appeals were followed on January 24 by an open letter from a group of 50 researchers warning of long-term damage to society and to the nation’s technological enterprise from the NSA’s reported tactic of intentionally weakening computer-security standards so it can carry out spy operations.
“Every country, including our own, must give intelligence and law-enforcement authorities the means to pursue terrorists and criminals,” the researchers wrote, “but we can do so without fundamentally undermining the security that enables commerce, entertainment, personal communication, and other aspects of 21st-century life.”
Shunned as NSA advisers, academics question their ties to the agency, The Chronicle of Higher Education, 10 February 2014.
This is the fourth post in a series on categorical ideas related to formal concept analysis (the other posts are linked below). I want to bring together the ideas of the previous posts and tell you what relations and Galois correspdonences have to do with profunctors and nuclei. To do that I have to tell you what happens when you think of posets as categories enriched over the category of truth values.
Here is a picture of a poset, with if you can climb up the arrows from to .
The picture can also be interpreted as representing a thin category, that is a category with at most one morphism between each pair of objects.
Most people who know what a category is know that a thin category is the same thing as a poset; with an arrow corresponding to the relation . However, despite being well-known, this fact is also not strictly true. You have to be wary of the antisymmetry axiom.
In a poset the antisymmetry axiom asserts that if and then , but this translates in a thin category to having morphisms and implying that and that is not true in general, all you can say is that is isomorphic to .
A poset-without-antisymmetry is called a preorder, and the true fact is that thin categories correspond to preorders (and posets correspond to skeletal, thin categories).
Less well-known than the above not-entirely-true fact, at least according to my unscientific straw poll of mathematicians, is that a category enriched over truth values is the same thing as a preorder. Although this seems, superficially, to be the same as the previous fact, it does have some deeper connotations due to the depths of enriched category theory. It is some of these depths that I want to start to explore here.
This follows on from three previous posts, but doesn’t require you to have read the other ones — unless you want to appreciate the punchline!
In The Nucleus of a Profunctor: Some Categorified Linear Algebra I gave a construction which starts with an enriched profunctor between two enriched categories and gives rise to an adjoint equivalence between certain presheaves and copresheaves on the two enriched categories. This is the nucleus construction.
In Formal Concept Analysis I gave a construction which starts with a relation between two sets and gives rise to a Galois correspondence between certain sets of subsets of the two sets.
In Classical Dualities and Formal Concept Analysis I gave examples of this construction popping up all over mathematics.
In this post I’ll try to convince you that the second construction is just an example of the first. I’ll do this by explaining what familiar (?!) concepts from ordinary category theory become when you work in categories enriched over truth values. This is summarized in the following dictionary.
In the rest of this post (which has got quite long) I will just explain how the translation arises. It is basic enriched category theory, so if you already understand the above dictionary you can stop reading now!
Next time I hope to explain when you apply the nucleus construction when you enrich over things more like the real numbers.
The enriching category
The category of truth values, , has two objects: and . The morphisms correspond to ‘logical entailment’ which is what working mathematicians would usually refer to as ‘implication’ but logicians reserve ‘implication’ for logical operation, we’ll see this below. Entailment is written (pronounced ‘entails’), so we have corresponds to the only non-identity morphism in . So the category can be simply pictured as follows. We can make this category of truth values into a monoidal category by using logical ‘and’ as the monoidal product; I’ll write this as .
A category enriched over , , consists of a set and for each we have the ‘hom-object’ . We should think of as being the truth of a relation . We need composition morphisms in this category, so for each triple, we need a morphism in truth values in this context it means we have an entailment or, in other words, if and then . So we have a transitive relation. We also need an identity for each , generally speaking this means we need a morphism (where is the monoidal unit) in this context it means or, in other words, . So the relation is also reflexive. Thus we have a preorder.
Conversely, a preorder gives rise to a category.
Slogan: -categories correspond to preorders.
There is a notion of an enriched functor which we can look at in this context. If we have two -categories and then a -functor consists of a function and, for each pair , a morphism in using square brackets to mean ‘the truth value of’ this is the same as or, in other words, preserves the order.
Slogan: -functors correspond to order preserving maps.
Natural transformation objects
Enriched category theory is richer if we enrich over a complete closed, symmetric monoidal category. If the category is complete then we means that we can make the collection of enriched functors between two enriched categories into an enriched category. If we have then we have a hom-object and this is given by an end In the context of enriching over , this end is just a big ‘and’, or a ‘for all’, if you prefer. Rewriting this is terms of truth values of the relations we get that this means In other words, given preorders and , there is a canonical preorder on the set of order preserving functions , and this is if and only if for all . We can say that is dominated by if .
Slogan: The -natural transformation object corresponds to the domination relation.
The closed structure
If the enriching category is closed then it can be considered as a category enriched over itself. There is a small problem here with type checking and we should be more precise. If is closed then there is an internal hom functor . We form a -category with the same objects as but with the hom-object given by the internal hom . People often use the same notation for and ; this is potentially confusing, but so you should do this with some care.
The category of truth values, , is closed and the internal hom is given by ‘logical implication’: . Here implication is understood in the logical operation sense and not the deductive sense (that’s what we are using entails for). So, using brackets purely for clarity,
Slogan: Internal hom for is implication.
We know that -categories are the same as preorders, so this gives a natural preorder on the set .
Slogan: The -category structure on corresponds to the preorder .
Presheaves and copresheaves
As we are considering a closed enriching category , we can think of as a -category and consider -functors into . This is like considering scalar valued functions on a vector space.
A presheaf on is a -functor and a copresheaf is a -functor . In the context of preorders and truth values, a presheaf is an order-reversing function and a copresheaf is an order-preserving function .
Knowing a function to is the same as knowing the preimage of , so we can identify a function with the subset . The order reversing condition translates into So is a downward closed subset of the preorder and all downward closed subsets arise in this way.
Similarly if is a copresheaf then it corresponds to an upward closed subset .
Because presheaves are -functors, we canonically have the domination relation between them. Given presheaves then which when translated to downward closed sets becomes
Slogan: The poset of presheaves on a preorder can be identified with the set of downward closed subsets equipped with the subset ordering.
Similarly, the domination relation on copresheaves corresponds to inclusion of the associated upward closed subsets. However, it is usually the opposite of the category of copresheaves that crops up, so this has the opposite relation.
Slogan: The poset of opcopresheaves on a preorder can be identified with the set of upward closed subsets equipped with the superset ordering.
Sets can be thought of as discrete posets, that is to say, where if and only if . In that case all subsets are both upward closed and downward closed. In this case, then, the set of copresheaves and the set of presheaves can both be identified with the powerset of the original set.
Slogan: The poset of presheaves on a set is the powerset with the subset ordering; the poset of opcopresheaves on a set is the powerset with the superset ordering
An enriched adjunction consists of a pair of enriched functors together with an isomorphism in , natural in and This means that when we enrich over the category of truth values we get a Truth-adjunction being a pair of order-preserving maps between posets with the condition that In other words we have the following slogan.
Slogan: A Truth-adjunction is precisely a Galois connection between preorders.
Then the adjoint equivalence induced on the fixed sets is precisely the Galois correspondence induced on the fixed set of the Galois connection.
A profunctor between -categories and is an a -functor We should remind ourselves what the definition of the tensor product of -cateogries is. The set of objects of is the set of ordered pairs and the hom-sets are given by It is perhaps worth noting that in order to define composition we will need that has some extra structure such as being symmetric or braided.
When enriching over truth values, this means that for preorders and the preorder on is given by A profunctor in the case can then be identified with a subset of which is upward closed. Writing for , we have: This can make more sense if we expand it out: Slogan: Profunctors between preorders correspond to relations extending the preorders
In the first post in this sequence I showed that a profunctor gives rise to an adjunction between presheaves and opcopresheaves, and thus to an adjoint equivalence between the fixed sets of the adjunction.
In the third post I showed how a relation between sets gives rise to a Galois connection between the powersets and thus to a Galois correspondence between the sets of closed subsets. I also gave several examples of correspondences around different areas of maths that arise in this way.
I will leave it to the interested reader to show that the Galois correspondence construction is what you get when you apply the profunctor nucleus construction to the situation of discrete -categories. It should drop out from the dictionary.
Slogan: The nucleus of a profunctor between discrete categories corresponds to the Galois correspondence between the closed subsets coming from a relation between two sets.
That’s not a very snappy slogan to end with, I grant you.
In the next post I will show how costructing the nucleus of a profunctor when you enrich over something like the real numbers instead of truth values gives rise to some dualities in other areas of maths.
Guest post by Fosco Loregian
The aim of this note is to give a short account of
Freyd, P. J., & Kelly, G. M. (1972). Categories of continuous functors, I. JPAA, 2(3), 169-191.
as part of the Kan Extension Seminar series of lectures. I warmly thank all the participants and the organizer Emily Riehl for giving me this way of escape to the woeful solitude a “baby” category theorist (like I am preparing to become) suffers here in Italy. It is an amazing and overwhelming experience, I can’t even estimate the amount of things I already learned after these three lectures. There are two other people without whom this wouldn’t have been possible: my current advisor, D. Fiorenza, who patiently helped me to polish the exposition you are about to read, and my friend Paolo, for which the words “I don’t want to learn this” are meaningless.
That said, let’s begin with the real discussion.
Freyd and Kelly’s paper was the first to raise and solve in a very elegant way some fundamental questions in elementary Category Theory, the so-called Orthogonal subcategory problem, and Continuous functor problem.
Orthogonality between arrows
Definition. An object in a category is said to be orthogonal to an arrow (we say ) if the hom-functor sends to a bijection between sets.
If has a terminal object, then if and only if the terminal arrow has the so-called (unique) right lifting property against : this means that for any choice of in the diagram there exists a unique arrow making the upper triangle commute. Obviously, there is a dual notion of left lifting property.
Statement of the problem: the CFP and the OSP.
Now, a classical issue in elementary Category Theory is the so-called “orthogonal subcategory problem”:
Orthogonal Subcategory Problem (OSP). Given a class of arrows in a category , when is the full subcategory of all objects orthogonal to a reflective subcategory (i.e., when there exists a left adjoint to the inclusion )?
There are lots of “protoypical” examples of the OSP in Algebra and Geometry: think for example to the case of sheaves of sets (on a given Grothendieck site) as a reflective subcategory among presheaves: the sheaf condition can be easily stated in terms of an orthogonality request: a presheaf on a site is a -sheaf if and only if for every covering sieve .
In fact this is not a case, since following a general tenet “(at least some) localizations are determined by an orthogonality class” (see for example the definition of -local object and the Lab page about the OSP).
Freyd and Kelly were the first to point out that a solution for the OSP turns out to solve another fundamental question, which falls under the name of “continuous functor problem”:
Continuous Functor Problem (CFP). Given a category and a class of diagrams (say ) in it, when is the category of all functors which preserve limits of all -shaped diagrams reflective in the category of functors ?
(Important) Remark. Before going on, we must spend a word on the notion of continuity: Freyd and Kelly published an erratum shortly after the paper, to correct the “stupid mistake of supposing that the limit of a constant diagram is the constant itself”; counterexamples to this statement abound, and in fact it can be easily shown that the limit of the constant functor is (whenever this copower exists) precisely , where is the set of connected components of the category .
Once this is fixed, notice that the CFP arises in an extremely elementary way: for example,
- An additive functor between abelian categories is left exact if and only if it commutes with finite limits, and
- The above sheaf condition can be easily restated in the good old familiar continuity request on coverings of objects .
This should give you evidence that the two problems are not unrelated:
Proposition. Given a class of diagrams in a small complete category , we get a family of natural transformations and a functor is -continuous if and only if it is orthogonal to each arrow in .
(This result is not in its full generality: see [FK], Prop. 1.3.1.)
Proof. The following diagram commutes, and one of the two arrows is an isomorphism if and only if the other is.
OSP CFP: Strategy of the proof.
The strategy adopted by Freyd and Kelly to solve the OSP, is to find sufficient conditions on so that Freyd’s Adjoint Functor Theorem applies to the inclusion (in particular, since it can be shown that is always complete, this boils down to find a solution set for to apply Freyd Adjoint Functor Theorem).
These conditions are of 1+3 different types:
- The presence of a proper factorization system;
- The presence of a generator;
- A (global) boundedness condition (or equivalently, on the generator in the previous point).
Notation. We denote (resp, ) the (possibly large) class of all arrows left (resp, right) orthogonal to each arrow of the class .
In the previous notation, .
Definition. A prefactorization system on a category consists of two classes of arrows such that and .
A prefactorization system on is said proper if and .
A factorization system (OFS, or simply FS) on a category corresponds to the modern notion of orthogonal factorization system: a (proper) factorization on the category is precisely a (proper) prefactorization such that each can be written as a composition with .
Examples. 0. Any category has two trivial factorization systems, namely and , where denotes the class of all isomorphisms, and the class of all arrows in ; 1. The category has a factorization system where denotes the class of surjective maps, and the class of injective maps. More generally, the category of models of any algebraic theory (monoids, (abelian) groups, …) has a proper FS , where is the class of extremal epimorphisms (which may or may not coincide with plain epimorphisms); and for abelian categories, (elementary) toposes…
Definition. If is a category with a proper factorization system , we say that a family of objects lies in if there exists a unique solving (all at once) the lifting problems (one for each ). If has sufficiently large coproducts, this condition is obviously equivalent to ask that the arrow .
Definition. A generator in a category with a proper factorization system consists of a small full subcategory such that for any the family lies in in the former sense.
Remark. Mild completeness assumptions on entail that
- A generator separates objects, i.e. if then there exists an object and an arrow such that .
- A small dense subcategory of is a generator;
- Any finitely complete category with a generator is well-powered.
For extremal FSs (in which the left/right class coincides with that of extremal epi/mono) the converse of 1,2 is also true, so as to recover the notion of generator as a “separator for objects”.
Notation. In this section admits all limits and colimits whenever needed
Definition(s). An ordered set is said to be -directed (for a regular cardinal ) if every subset of with less than elements has an upper bound in . A -directed family of subobjects of consists of a functor from a -directed set to the posetal class of subobjects of . The colimit of such a functor, denoted is called the -directed union of the family.
With these conventions, we say that an object is bounded by a regular cardinal (called the bound of ) if every arrow to a -directed union factors through one of the . The category is bounded if each is bounded by a regular cardinal (possibly depending on ).
Example. In a set of cardinality is -bounded.
- admit arbitrary colimits;
- admit a generator each of which object is -presentable.
Examples. Examples of such structures/properties on categories abound:
- Any abelian, , bicomplete and bi-well-powered category , is bounded;
- Given a regular cardinal , locally -presentable categories are -bounded, and admit a generator with respect to the proper FS : sets, small categories, presheaf toposes and Grothendieck abelian categories all fall under this example. Less obviously, the converse implications is false: exhibiting a -bounded category with a generator which is not locally -presentable requires to accept the inexistence of measurable cardinals (see [FK], Example 5.2.3).
Solution of the OSP
Theorem (OS theorem). If is complete, cocomplete, bounded and co-well-powered with a proper FS , and is a class of arrows whose elements are “almost all” in , i.e. (we call these classes quasi-small with respect to ), where
- is a set;
- is possibly large but contained in .
Then is a reflective subcategory.
Proof. [FK] performs a clever transfinite induction to generate a solution set for any object : if is the typical arrow in , we define
- (zero step) ;
- (successor step) , where
- (limit step) , where .
This is where boundedness comes into play: if is the cardinal bounding , then the induction stops at : is the desired solution set for , namely every arrow whose codomain lies in factors through some .
Solution of the CFP
The procedure we adopted to reduce the CFP to the OSP (building ) doesn’t take care of any size issue: to repair this deficiency we exploit the following
Lemma. Let be cocomplete, endowed with a proper factorization system and a generator . For any class of natural transformations in we denote Then there exists a class contained in such that .
The particular shape of is due to the procedure used in [FK] to reduce the CFP to the OSP. The operation is a copower, in the obvious sense: given , , where .
The key point of this result is that the class is small (obviously) whenever is, so we can conclude applying the OS theorem:
Theorem (CF theorem). Let be a small category, and a bicomplete, bounded, co-wellpowered category with a generator and a proper factorization . Let be a class of cylinders whose elements are almost all cones (this means that the collection of diagrams which are not cones is a set). Then the subcategory of -continuous functors is reflective in .
The rough idea behind this result is the following: can be written as , and itself can be split as a union , where the two sub-classes consist of the -arrows and the -arrows of the various . The assumptions made on and the presence of a generator on entail that is a set, so we can conclude.
The state of the art.
[FK]’s solution of the OSP can be generalized: [AHS] show that is reflective in a category with a proper FS whenever the class is quasi-presentable, namely it can be written as a union , where and is presentable (in a suitable sense).
The same paper offers a fairly deep point of view about the “weak analogue” of the OSP, which can be regarded as a generalization of the Small Object Argument (SOA) in Homotopical Algebra; if we build the class of arrows having a non-unique lifting property against each , then we can only hope in a weak reflection, where the unit of the adjunction is only weakly universal. In a setting where “things are defined up to homotopy” this can still be enough, provided that we ensure the reflection maps satisfy some additional properties. The additional property requested in the SOA is that the weak reflection maps belong to the cellular closure of , i.e. they can be obtained as a transfinite composition of pushouts of maps in .
The theory of factorization systems is deeply intertwined with the SOA, too: in [SOA] R. Garner defines an “algebraic” Small Object Argument, exploiting a description of OFS and WFS as suitable pairs over the category . In this respect I think that the best person which can give us sensible references for this is our boss, since she wrote this paper.
References and suggested reading
[LPAC] Jiří Adámek, Jiří Rosický, Locally presentable and accessible categories, LMS Lecture Notes Series 189, Cambridge University Press, (1994).
[FK] Freyd, P. J., & Kelly, G. M. (1972). Categories of continuous functors, I. JPAA, 2(3), 169-191.
[SOA] R. Garner, Understanding the small object argument, Applied Categorical Structures 17 (2009), no. 3, pages 247-285.
[PK] P. Gabriel and F. Ulmer, Lokal präsentierbare Kategorien, Springer LNM 221, 1971.
[AHS] J. Adamek, M. Hebert, L. Sousa, The Orthogonal Subcategory Problem and the Small Object Argument, Applied Categorical Structures 17, 211-246.
Last time, I described an emerging pattern in algebra and topology, exemplified by the cohomology of the space of configurations of ordered points in the plane. In language I’ll introduce below, cohomology with rational coefficients defines a uniformly representation stable sequence, the fundamental data of which is an -representation for each .
In this post, I want to tell you in more detail about the representation stable sequences introduced in a paper by Tom Church and Benson Farb. I’ll also discuss the second generation approach to representation theory, which centers of the functor category of what they call FI-modules. Jordan Ellenberg, the third author of this paper, has blogged about this also. The first post in his series can be found here. In contrast with Part I, my exposition here will be more explicitly directed at the categorically-minded reader.
Representation stable sequences
Let be a commutative ring and let denote the symmetric group on letters. Write for a collection of -representations . Equivalently, defines a functor whose domain is the disjoint union of the symmetric groups , regarded as one-object groupoids (or, if you prefer, the category of finite sets and isomorphisms). This data is sometimes called a symmetric sequence or a species. The symmetric groups come with injective group homomorphisms
where the image of consists of those permutations that fix the element . The “mapping telescope” of this sequence is the category defined by gluing to the poset along the discrete category .
Definition. A consistent sequence of representations is a functor .
Explicitly, a consistent sequence is a symmetric sequence together with -equivariant maps (natural transformations) from to the restriction of the -representation along . Here’s another way to think about this. Define a category over whose objects in the fiber over are -representations . Hom-sets in Rep are defined by
if and empty otherwise. Here is left adjoint to the restriction along the composite inclusion . It’s precisely the left Kan extension. In representation theory, this functor is called induction, which explains the notation I’m using here. Now a consistent sequence is exactly a section to the projection functor , i.e., a countable sequence of composable morphism in , with exactly one object in the fiber over each .
Definition. A consistent sequence of representations is a representation stable sequence if
(i) All but finitely many of the maps in are monomorphisms.
(ii) All but finitely many of the maps in are epimorphisms.
(iii) For each partition , the coefficients of the irreducible representation in the decomposition is independent of for sufficiently large.
By definition, a morphism in is represented by either of the two adjunct arrows in or in . The map is a monomorphism if and only if is injective and an epimorphism if and only if is surjective.
The sequence is uniformly representation stable if the stable range in condition (iii) can be chosen independently of .
For example, consider the space of configurations of labeled points in the plane. “Forgetting the last point” defines a continuous map , inducing a map that is -equivariant in the sense described above. So -th cohomology of configuration spaces defines a consistent sequence of representations. The theorem mentioned at the end of Part I now takes the following form:
Theorem. The consistent sequence of representations is uniformly representation stable with stable range .
Later, this theorem was generalized to configurations of ordered points in any connected, orientable, manifold of finite-type (meaning that the rational cohomology consists of finite dimensional vector spaces).
The second-generation account of representation stability begins with the following observation. The functor factors through a quotient of the category that turns out to be much easier to describe. The consistent sequence of configuration spaces was defined using maps that “forget the last point.” But of course, we could have forgotten any of the points or any number of points. Moreover we could have chosen new labels for the points that we do not forget. These “forgetful” maps assemble into a functor , where FI is the category of finite sets and injections. (When notationally convenient, we’ll consider to be skeletal.)
Definition. An FI-module is a functor . The category FI-Mod is the category of FI-modules and natural transformations.
A consistent sequence is an FI-module if and only if factors through the functor induced by the obvious cone under the pushout diagram defining the category . This is the case just when, for each in the image of the canonical map (), and for any with common restriction along , we have .
A consistent sequence that is not an FI-module is the sequence of regular representations
in which is the free -module module with basis .
Representable FI-modules and free FI-modules
For each , there is a free FI-module that Church, Ellenberg, and Farb denote by defined to be the composite of the functor represented by and the free -module functor . So . By the Yoneda lemma and the free-forgetful adjunction, a map of -modules corresponds to a vector .
There are more general notions of free FI-modules defined by forming the left Kan extension of an -representation or of a symmetric sequence. (We might regard an -representation as a symmetric sequence in which all of the other representations are 0.)
Write -Rep for the functor category . Note that is the maximal subgroupoid of . Left Kan extension defines the free-FI-module functor
which is left adjoint to restriction: the functor that sends an FI-module to its underlying symmetric sequence. Because the category is cocomplete, we obtain an explicit formula for as a special case of the formula for pointwise left Kan extensions:
If is a field (so that is free), this formula tells us that
Categorical properties of FI-
FI- is a category of functors taking values in a complete and cocomplete abelian category. Hence FI- is also abelian and has these limits and colimits (and kernels and cokernels), defined pointwise. The tensor product over the commutative ring also induces a number of monoidal structures that we decline to discuss here. Another monoidal structure is induced by Day convolution from the disjoint union on FI.
The abelian category structure of FI-modules will have computational implications.
Definition. An FI-module is finitely generated if there exists a finite sequence of integers and an epimorphism in FI-Mod.
The map identifies a sequence of elements that span in the sense that is the largest submodule containing each of the . We have the elementary result:
Proposition. Consider a short exact sequence of FI-modules. If is finitely generated, then so is . If both and are finitely generated, then so is .
If is a Noetherian ring containing , then FI- is Noetherian, meaning that any submodule of a finitely generated FI-module is finitely generated. This is harder to prove.
FI-modules and representation stability
The reason we care about finite generation is that in good cases it is equivalent to uniform representation stability.
Theorem. An FI-module over a field of characteristic zero is finitely generated if and only if its underlying consistent sequence is uniformly representation stable and each vector space is finite dimensional.
Moreover, it is computationally advantageous to show that a sequence of FI-modules is finitely generated. For instance, we have the following theorem, which is a special case of a more general result about character polynomials. (These are polynomials whose variables are the class functions that carry a permutation to the number of -cycles in its cycle decomposition.)
Theorem. If is a finitely generated FI-module over a field of characteristic zero, then there is a polynomial with rational coefficients so that for all sufficiently large.
In some examples, FI-modules come with even more structure. Let us return our attention to the configuration spaces but now assume that the manifold has non-empty boundary. This allows us to define maps for less than , which introduce new points in a small neighborhood of the boundary. Composition of maps of this form isn’t associative on the nose, but it is well-defined up to homotopy, which means that the cohomology for fixed has both the structure of an FI-module and of a co-FI-module (a contravariant functor ).
Church, Ellenberg, and Farb call symmetric sequences of this form FI-modules, which are -valued functors whose domain is the category of finite sets are partially-defined injections. It’s convenient to think of as the category of (isomorphism classes of) spans in . The span
specifies a subset of (the domain of the partial function) and a bijection (mediated by ) with its image, a subset of . We call the rank of this map. There are natural bijective-on-objects inclusions and whose images consists of those spans with one component the identity. In this way we see that an FI-module has both an underlying FI-module and co-FI-module structure.
The representable FI-module can be extended to an FI-module, which we also denote by , in a unique way. Recall that for all . Note that in , the span factors through in the evident way (as a map in followed by a map in ). So an immediate consequence of functoriality is that any map in of rank less than must define the zero map on the FI-module .
To define the linear map
corresponding to , it of course suffices to specify the image for each basis element . To do so, form the composite span . Either , in which case this map lies in the subcategory , and we define the image of to be the basis vector . If not, this composite has rank less than and we must define its image to be zero.
Note the FI-module is not the left Kan extension of the FI-module along : the left Kan extension would be the representable FI-module. (Recall that Kan extensions are only guaranteed to be extensions in the case where the map being extended along is full and faithful.) This confused me for a while. A big difference between and is that objects in the latter category have non-invertible endomorphisms. Indeed, each subset defines an idempotent
whose splitting is the object ; note the category is idempotent complete.
Now allowing to vary, the define a symmetric sequence of FI-modules (with acting on by precomposition). So we can define a functor from symmetric sequences to FI-modules by functor tensor product:
This functor induces a classification of FI-modules.
Theorem. defines an equivalence of categories.
If is a field, a corollary of this result and the dimension calculation above is that the dimension of any finitely generated FI-module is given by a single polynomial in for all . In fact, over a field finite generation of an FI-module is equivalent to the dimensions dim being bounded. This, together with the Noetherianness mentioned above, provides useful tools for proving that FI-modules are finitely generated.
At the Joint Mathematics Meetings in Baltimore, I saw Benson Farb deliver a joint invited address on representation stability, the above eponymous “emerging pattern.” He began this work in 2010 with Tom Church, then a PhD student at the University of Chicago. I’ve been a fan for a few years now, and Benson’s beautiful talk has inspired me to write a brief summary.
This post is divided in two parts. In Part I, I’ll tell you about the talk, which was largely accessible to anyone with an undergraduate math background. In Part II, I’ll say a bit about the technical details and write about more recent developments, joint also with Jordan Ellenberg, in which some categorical ideas enable a simplified conceptual understanding of the patterns that frequently arise in practice.
Configuration spaces and polynomials
The topological spaces we’ll consider are configuration spaces. Let be any topological space (typically a manifold) and write for the space of ordered tuples of distinct points in . The space has a natural free action by , the symmetric group on elements, that simply permutes the labels. The space of orbits, (the colimit of this action) is denoted by and is the space of unordered tuples of distinct points in .
Configuration spaces are basic mathematical objects. For example
is the space of monic, degree , square-free polynomials. Here, the tuple of unordered points corresponds to the roots of the polynomial. The square-free condition guarantees that the roots are distinct. The spaces are algebraic varieties. For example:
There are many reasons to care about the topology of these spaces. Even basic questions, such as whether is connected, can have non-trivial applications to robotics. (Aside: I hereby propose “configuration” as the collective noun for a group of robots.) Another point of interest is that a loop in the space defines a braid with -strands. Indeed, the fundamental group of endpoint-preserving homotopy classes of loops is the braid group on -strands.
It will be important in what follows for us to learn how to visualize this — the visualization is easiest if you replace with a small disk. Picture distinct points in the plane, perhaps each in a different color, but as we’re visualizing the space of unordered configurations, you don’t have to remember which point has which color. We’ll visualize a path in this space as a “movie” in which we can watch the points move around. Each “frame” is a point in the configuration space. If we stack all the frames on top of each other in a direction orthogonal to the disk, the ambient space is a cylinder. As the colored points move around, they trace strings from the top of the cylinder to the bottom. Because the points never collide, the strings never intersect. Hence, a loop defines a braid. A loop in the space of ordered configurations is called a pure braid, meaning that each strand starts and ends in the same relative position.
The fundamental problem is to understand the topology of the spaces of configurations of ordered and unordered points. A standard way to try to do this is to compute the cohomology of the space. It’s okay if you don’t know exactly what this means. The point is that to any topological space one can define a sequence of rational vector spaces for each . These algebraic invariants capture a significant portion of the topological information about the space. For fixed , our aim is to reduce infinitely many computations to finitely many by investigating how these vector spaces depend on .
Case study: first cohomology
Let us consider this problem for . Elements of are represented by homomorphisms . To define such a , we must assign an algebraic invariant (a rational number) to each loop in in a natural way; here this means that the invariant must be additive with respect to concatenation of loops, so that the map is a homomorphism.
One idea is to use the winding number. For each pair with , define by
winding number of around while performing .
Note that . Benson described a beautiful intuition for this observation that he learned from his advisor, William Thurston, who always encouraged him to project himself into the space. Imagine and are dance partners holding hands. As they turn around the dance floor, performs one clockwise rotation each time does so and conversely.
Proposition. is a linearly independent in .
Linear independence is implied by the existence of loops that produce a non-trivial winding number for some pair while fixing the remaining points. Moreover:
Theorem (Arnol’d 1968). is spanned by . Hence .
This is exactly the sort of closed form we might hope for. This leads to the next question:
Question: What is ?
As before, we seek a numerical invariant for loops of unordered configurations. One such invariant is the total winding number
total winding number of
It turns out this is the only invariant.
Corollary. total winding number.
Proof: = -fixed vectors in = -span of the total winding number.
Theorem (Arnol’d 1968, F Cohen 1972). For any , does not depend on for .
This theorem implies that the spaces satisfy homological stability. While configurations of unordered points satisfy homological stability, configurations of ordered points do not. In fact, for each
The reason that homological stability always fails for the spaces of ordered configurations has to do with representation theory. Because the symmetric group acts on , also acts on the vector space .
A fact from representation theory is that (except for ) any vector space that admits a non-trivial action must have dimension at least . (Technically this is only true if I can also exclude alternating representations. Exercise: prove that does not contain any alternating representations.) In particular
and homological stability always fails. But somehow homological stability should hold morally. For example, up to permuting indices, there’s only one . A more precise statement is that each is the same representation of . From the basis described in Arnol’d’s theorem above, it is just the representation where acts in the expected way on unordered pairs of distinct elements of .
Representation theory of the symmetric groups
Here is a quick primer on representation theory. A representation is a -vector space with a linear group action, or, equivalently, a functor from the one-object groupoid whose morphisms correspond to group elements to the category . A representation is irreducible if it has no invariant subspaces. By Maschke’s theorem, every representation decomposes into a direct sum of irreducibles. Thus, the goals of representation theory are to:
classify all the irreducible representations of a given group
understand how to decompose a given representation into irreducibles
Here, we’re interested in representations of symmetric groups, for which we have the following beautiful theorem.
Theorem (Young 1900). Irreducible representations of correspond to partitions of .
The trivial representation of corresponds to the partition .
The standard representation of (the orthogonal complement of the fixed vector in the permutation representation of on ) corresponds to the partition .
We want notation for irreducible representations that is independent of . For this, we denote the irreducible representation of corresponding to the partition
of by . So is the trivial representation, and is the standard representation.
Maschke’s theorem says that
for some indices . The main problem is to find the .
Example. for all . Here the trivial representation is the collection of fixed vectors, the span of the total winding number. The subspace is the permutation representation of some collection of basis vectors of . These basis vectors have geometric meaning: is the total winding of the loop around the point .
This example implies that first cohomology of ordered configuration spaces is representation stable: the index of the irreducible representation is independent of for sufficiently large. Tom and Benson wondered whether this phenomenon would also occur for the higher cohomology groups. They recruited David Hemmer to compute for low . Unfortunately, his calculations, aided by some computer algebra package whose name I didn’t catch, did not stabilize. But thinking this theory was too beautiful not to be true, Benson pressed him to try again, and it turned out that there was a bug in the software, relating to a failure to account for the outer automorphism of . Once this was corrected, Hemmer calculated that the second cohomology groups appear to stabilize for .
This lead to the following result:
Theorem (Church-Farb 2010). Fix . Then and does not depend on for .
Now it turns out that the stable representations are quite complicated. See their paper. But nevertheless, this stability phenomenon has led to useful computations with numerous practical applications. I’ll mention just one: to combinatorial statistics for polynomials over a finite field . More details can be found in this paper, joint with Jordan Ellenberg.
By the Grothendieck-Lefschetz theorem, the multiplicity of a given irreducible representation in corresponds to a point count in the variety weighted by the character of the representation, and this correspondence is linear. Thus each index encodes some “statistic” for monic, square-free polynomials over . Representation stability implies that these statistics are asymptotically stable.
The multiplicity of the trivial representation corresponds to the number of degree square-free polynomials, which is .
The multiplicity of the standard representation corresponds to the expected number of linear factors, which is .
Guest post by Mark Meckes
A recent paper of Mark’s proved the very substantial result that Minkowski dimension (one of the most important types of fractal dimension) can be derived from magnitude (an invariant ultimately coming from category theory). More exactly, he showed that the Minkowski dimension of a compact subset of is exactly equal to its “magnitude dimension” (Cor 7.4). Here, Mark explains not this specific result, but the overall framework that makes such theorems possible. —TL
I’d like to report on some recentish progress in understanding the magnitude of compact metric spaces. To set the stage I’ll recap some background, which will repeat a number of things which have already appeared in posts by Tom and Simon. For those who want to be reminded of more, Tom has provided a reading list.
Background, part I: finite spaces
The magnitude of a finite metric space is defined as follows. First, define the matrix . A weighting for is a vector such that for each ,
If is a weighting for , then the magnitude of is defined to be
In general, there are well-definedness issues with this definition as written. For the purposes of this post I’ll brush them aside by noting that if happens to be a positive definite matrix, and hence invertible, then possesses a unique weighting. When this is the case, we call a positive definite metric space, or PDMS. For example, it turns out that all finite subsets of (with the usual Euclidean metric) are positive definite metric spaces. PDMSs have the additional nice properties that their magnitude is always at least 1 (if they’re nonempty), and that magnitude is increasing with respect to inclusion.
Background, part II: compact spaces
Given an invariant of finite metric spaces, it’s natural to try to extend it to some class of infinite spaces, say compact spaces. There are two obvious general approaches: approximate a given compact space by finite spaces, or adapt the definition to apply to infinite spaces directly.
Let’s first consider approximation. We’d like to say that if is a given compact metric space, we can find a sequence of finite spaces which approximate in some sense and whose magnitudes converge to some limit, which we could then take as the definition of . There are well-definedness issues here, in particular whether different approximating sequences would give different values of . The usual way to handle such an issue, if you can, is by appealing to continuity: if magnitude were a continuous function of a finite metric space (with respect to an appropriate topology), then we could define magnitude on the family of compact spaces as its unique continuous extension.
Fortunately, there is a natural topology on the family of isometry classes of compact metric spaces. Unfortunately, magnitude is not a continuous function of a finite metric space. However, if we restrict to finite PDMSs, it turns out that magnitude is lower semicontinuous, or lsc for short. Like a continuous function, an lsc function has a canonical extension from a domain to its closure—not a unique lsc extension, but a minimal one. So we can naturally define the magnitude of compact PDMSs (a compact space is a PDMS if each of its finite subspaces is) as the minimal lsc extension of the magnitude of finite PDMSs.
Concretely, this means that is well-defined as the limit of , whenever is a sequence of finite PDMSs which approximate and for which is increasing. Since magnitude is increasing with respect to inclusion for finite PDMSs, the last condition can be guaranteed by taking an increasing sequence . (Notice that this minimal lsc extension of magnitude a priori takes values in . I don’t know whether every compact PDMS has finite magnitude. Tom proved that every compact subset of has finite magnitude.)
Using roughly this approach (although before the semicontinuity result was proved), Tom and Simon showed, among other things, that the magnitude of an interval of length is . So it became clear that magnitude knows something about classical geometric invariants.
The approximation approach leads to some nice concrete results, and is justified by semicontinuity. But it’s rather unsatisfactory in at least two ways. On the one hand, it would simply be nicer from an aesthetic point of view to give a direct definition of the magnitude of a compact space, which extends the original definition; it might also shed new light on the original definition. On the other hand, such a direct definition would involve some notion of weighting for a compact space, which would provide a new tool for analyzing magnitude.
Later, Simon gave a definition of magnitude for certain compact spaces which extends the original definition. Given a compact metric space , a weight measure for is a measure such that for each ,
If is a weight measure for , then we define
It turns out that this approach agrees with the approximation approach. That is, if is a compact PDMS and has a weight measure, then this definition of agrees with the earlier one. On the other hand, the domains of definition are different: not every compact PDMS has a weight measure, and many spaces which are not positive definite do possess weight measures.
Using this definition, Simon studied magnitudes of homogeneous Riemannian manifolds. Again, he found that magnitude knows about classical geometric invariants. However, the fact that many (probably most, in some sense) spaces don’t have weight measures makes the measure-based definition of magnitude unsatisfactory as well.
Weightings for compact PDMSs
More recently, I found a definition of magnitude for a compact PDMS which directly extends the original definition for a finite space. To begin with, we take inspiration from Simon’s definition and think of the weighting of a finite space as the measure (more precisely, as a signed measure, since its components need not be positive)
where is a point mass at . We next take inspiration from the finite approximation definition and say that the weighting of a compact space should be a limit of weightings of finite spaces. That is, it should be a limit of some sequence of finitely supported signed measures on . To make sense of this, we need to put some topology on the space of finitely supported signed measures. Since it’s a vector space, we’d probably like to make it a topological vector space. The nicest topological vector spaces are inner product spaces, so it makes sense to try to find an inner product on which will work nicely with magnitude.
If is a PDMS, then such an inner product is staring us in the face (although it took many months before I noticed that): if is finite, then
Define to be the Hilbert space completion of with this inner product. A weighting for will be an element of which satisfies an appropriate analogue of the original condition .
When , for some finite , we can straightforwardly define the function
for . It’s not hard to show that is a bounded continuous function, and that , so the linear map has a unique continuous linear extension . We can now define a weighting of to be a such that for each .
The map is injective, so has at most one weighting. Moreover, has a weighting if and only if there is a uniform bound on the magnitudes of its finite subsets. In other words, possesses a weighting iff the magnitude , as defined by finite approximations, is finite. (This also means that I don’t know whether every compact PDMS has a weighting, but I do know that every compact subset of does.)
Now that we know what a weighting for is, we’re ready to define the magnitude of . This is slightly less straightforward, since if is not in , then there is no meaning to . To get around this, we notice that the original definition for a finite space can be rewritten as
and the latter sum fits right into our Hilbert space framework. So, if is a compact PDMS with weighting , then we define the magnitude of to be
This fits with all the previous definitions: if is a finite PDMS, this simply repeats the original definition in a more complicated way; if is a compact PDMS, this gives the same value of magnitude as finite approximation; and if is a compact PDMS with a weight measure , then is a weighting for and .
So what does this give us that we didn’t have before? As hoped for, it’s a definition which works for a broad class of spaces, and is (to me, at least) more elegant than approximation by finite spaces. It may shed a little bit of light on the original definition of magnitude for finite spaces—it suggests that one should think of the magnitude not, as originally written, as the sum of the entries of the weighting , but as the double sum
In other words, it suggests that the matrix should be thought of, not as defining a linear map, but as defining a bilinear form. (The same vague, speculative remark applies more generally to the definition of the magnitude of an enriched category.)
What about the hope that the weighting of would turn out to be a useful tool? Before addressing that, let’s first specialize to subsets of .
Weightings and magnitude in Euclidean space
If , then the linear operator is a convolution operator, which suggests using a bunch of Fourier mumbo-jumbo. If you do that, it turns out that is a subspace of the Sobolev space (or more properly, Bessel potential space) , and the norm is, up to a constant depending on , the standard Sobolev norm on . This isn’t the most familiar, classical sort of thing—it’s a Sobolev space, but with a negative, fractional (half the time) exponent—but it is a known and well-understood thing. Its elements are distributions, not functions, but a lot of machinery exists to work with such things.
Even better, maps into the Sobolev space . This is a Sobolev space whose elements are functions. (Real, honest, continuous functions—not even almost-every-equal equivalence classes!) If is the weighting of , then is, by definition, the unique function in such that for all . It’s not hard to show that , for an explicit dimensional constant . With a bit more work, one can show that
and the infimum is uniquely achieved when .
This reformulation of magnitude in terms of a variational problem for functions turns out to be extremely useful. It opens the door to using tools from Hilbert space geometry, Fourier analysis, potential theory, and PDEs to study magnitude. But this post has gotten to be rather long, so I’ll leave it to you to read the paper (and hopefully future papers and posts) to see how!
There’s a meeting in Erlangen coming up!
• Structures on Tensor Categories and Topological Field Theories, Department Mathematik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, March 4-6, 2014, organized by Catherine Meusburger and Christoph Schweigert.
This is a three day learning event on recent developments in (extended) topological field theories. The focus is the interplay of structures on tensor categories and topological field theories, including structures related to non-semisimple categories.
The idea is not to have yet another workshop or conference but to study these topics in depth, with a schedule more like the one of an Oberwolfach workshop, with a lecture series and plenty of time for discussions. It will include a lecture series by Christopher Schommer-Pries on topological field theories and the underlying categorical structures.
- Bruce Bartlett, Oxford University
- Alain Bruguières, Université Montpellier II
- Thomas Kerler, Ohio State University
- Michael Mueger*, Radboud Universiteit Nijmegen
- Christopher Schommer-Pries, Max Planck Institute for Mathematics, Bonn
- Noah Snyder*, Columbia University
- Alexis Virelizier*, Université Lille I
* to be confirmed.
Click here to download a PDF file of the conference poster.
Guest post by Eduard Balzin
The Kan extension seminar continues, and with it we now come to the paper by Ross Street. Published in 1972, this paper is one of the first instances where the notion of a monad was made relative to an arbitrary 2-category. A lot of aspects, such as algebras over a monad, Eilenberg-Moore and Kleisli categories, the relation with adjunctions, were generalised accordingly. Other topics involve representability, duality, and the example of . I will therefore try to talk about this paper in detail, and add something of my own.
Myself, in the course of my (higher)-categorical education, I have not learned in full about monads. I do not mean definitions or main theorems, but rather the answer to the question why monads. While preparing for the seminar, I’ve managed to give a series of answers to this (my own) question, and I hope that the discussion will take me (and everyone else) even further.
Let me sincerely thank Emily Riehl for organising the seminar. I truly hope it will be a new castle on the landscape of categorical life. I am also quite grateful for the participants of the seminar, who have greatly contributed during the online discussion and in their readers discussion. It all makes me happy.
I. What follows below is my summary and exposition of Street’s paper. At times, I add words of mine. The notation is not exactly of Street, but follows him closely for reading convenience. In addition, I rearranged some of the paragraphs in my exposition.
At the time this paper has written, monads had been around for quite a while and had achieved some recognition due to their usefulness in homological algebra and homotopy theory. Indeed, this is not the first paper where a monad internal to an arbitrary (weak) 2-category appears. Apparently the first mention of such general monads was made by Jean Bénabou in 1967. Formal Theory of Monads also restricts its attention to 2-categories instead of bicategories, strict 2-adjunctions instead of coherent ones. This brings more transparency to the theory but lessens the ability to apply the results of this paper directly.
II. A monad in a 2-category is a monoid object inside for some . Thus we have ‘multiplication’ 2-cells and ‘unit’ such that if we draw associativity and unit diagrams, we witness commutativity of those. A monad morphism consists of a 1-cell together with a 2-cell . It should be required that preserves units and associativity. A monad functor transformation is a 2-cell suitably commuting with 2-cells. For each , this defines a 2-category , the construction is actually functorial in .
Maybe some readers are not willing to take this definition for granted. Then I remind them of the following situation in . For a monad , a -algebra is an object equipped with a map , commuting suitably with and . The map of monads will send an -algebra to a -algebra.
A 2-category admits construction of algebras if the inclusion 2-functor , sending to , has a right adjoint (in the strict 2-categorical sense) . ( always has a left adjoint .) The counit of 2-adjunction gives rise to a monad morphism through which each morphism of monads factors. The adjunct of defines a map that is left adjoint to and such that . This gives us a theorem which says that when exists, each monad is generated by an adjunction.
Also, for each adjunction , there is a comparison 1-morphism . is monadic when this is an equivalence (Street asks for an isomorphism, though).
III. The paper also studies representability questions. In , we know that has the form of the category of algebras for a monad mentioned above. By direct consideration, it turns out, that there is an isomorphism (of categories — things are strict in this paper!) where is the monad induced by Yoneda embedding. With this, we have two things to mention:
(a) represents a functor . It turns out that can be obtained as a weighted limit. This is observed but not proved in the paper, but was actually done afterwards.
(b) Many things about monads in can be deduced by passing to -presheaves. For instance, is an equivalence iff for each the functor is an equivalence. So is monadic iff is for each .
IV. For a given object of , we can form the comma category (we use the underlying functor) of monads over . One can ask, given a monad and , if there is a pushforward monad on which is universal (any monad morphism factors through the canonical one ). Those for which this pushforward exists are called tractable.
For instance, we can try taking to be the right Kan extension of along , and this (if exists) will give us a desired universal pushforward. When is unity, this is precisely what is called codensity monad. Observe that when has a left adjoint , we can always take , and this will satisfy the desired universal property. Thus codensity monads are monads one would get if actually had a left adjoint.
Let us restrict our attention to the category which is the same as the inverse image of the underlying functor over . The construction of algebras can be actually seen as a ‘semantics’ functor , , and it will land in the full subcategory of tractable objects. For each tractable object, forming the universal monad gives us the ‘structure’ functor , . is right adjoint to , and the counit of this adjunction is moreover an isomorphism.
V. For a 2-category , there are its duals , , and . There is a big section of the paper which studies questions involving interpretations of monads and the construction of algebras in these dual 2-categories.
In we again get monads (and so if is generated by an adjunction in , same is true in ), but the functors between them are different. Precisely, if we consider (so that the direction of 1-morphisms is the same as in ), we get that a 1-morphism here is a monad opfunctor in : a 1-cell with a 2-cell , opposite direction. The category of algebras in this case is denoted , and comes with canonical . That’s the generalisation of Kleisli category.
Opfunctors often appear if one considers adjunctions: for two monads (X,S) and (Y,T) and an adjunction , supplying the right adjoint with a 2-cell to make it into a monad functor is the same as supplying in a 2-cell to make it into a monad opfunctor.
Monads in and by contrast are comonads in . Of more interest for consideration (for reasons explained below) is the 2-category , whose objects are comonads and morphisms are comonad opfunctors (the 2-cells are going in the same direction as 2-cells for monad functors). Again, when we have the construction of algebras for , there is a universal object for each comonad , with the opfunctor .
If there is an adjunction , then we get a monad on but a comonad on . When we do the construction for a monad, the fact that gives us a comonad on . This allows to define a 2-functor , which sends to (provided that admits the construction of algebras). Dually, there is (if, again, is nice enough). A big calculation reveals that this is an adjunction: which can be thought of lifting one of the construction of algebras adjunctions (here is just a fancy name for the ‘coop’ of on ). Street hypothesises if one can use this lifting idea together with ‘2-triangle’ theorems to actually prove that if admits algebras, then so does . One could hope that a version of 2-lifting theorem would obtain the right adjoint as above from knowing the right adjoint to , and then apply the (dual of the) underlying functor after , as to show that admits algebras for monads. That would be something special, however.
The other result mentioned in this section is when has a duality involution . In that case, admits construction of algebras whenever does, and .
VI. A distributive law consists of two monads and together with a 2-cell so that makes into a monad functor with and becoming monad morphism transformations (equivalently, becomes a monad opfunctor on ). An example here is given by
, the monoid monad, and
, the abelian group monad
(the usual distribution of and gives a map ). By writing out what means, one can witness that distributive laws in are the objects of . Moreover, using the ’s coming with each distributive law, one can define a natural transformation , which together with makes into a (3-)monad on -.
In addition, one can also consider the ‘opfunctor’ monad . Taking compositions like , or again gives distributive laws (as objects), but different kinds of morphisms between them. One can ask if the resulting monads and are related by a distributive law; however one can actually check that the law is a canonical isomorphism:
VII. Essentially, the sole example of this paper is . Using the fact that, for a monad , the category of algebras is in fact what is known as Kleisli category, which certainly exists (its objects are the same as of and morphisms are maps of the form ), we conclude that and all admit the construction of algebras. A really nice result which follows is the statement that for a monad , where and is induced by the canonical functor . So we see that algebras for is a full subcategory of presheaves on Kleisli category which are representable when restricted to .
Finally, for -, it is stated that the formal theory is applicable to this 2-category once a couple of properties are satisfied. Unfortunately, not much more is said about the enriched categories case in detail.
VIII. And that was it, (some of) the formal theory of monads! I must say that the 2-categorical content of that paper is really interesting, and one could ask if there is a way to proceed in the same manner in a higher -categorical setting. Another thing is that, if one wishes to discuss monads in full, it is (as I presume) quite desirable to look beyond the formal theory for as many examples as possible. This is why I conclude my exposé with some references I know to the subjects that look interesting. Everyone is welcome to make her or his own list!
There is a follow-up of “The formal theory of monads”, called “The formal theory of monads, II”, in which the idea of being a weighted limit is taken seriously, to the extent that the authors consider, for a 2-category , its completion with respect to these limits. The 1-interior of turns out to be the same as the one of , and moreover one can use to work with monads in a very useful way using mostly the universal property of this completion. The authors also study wreaths — the objects of , and in addition the paper is quite filled with examples for the notions presented.
The result everyone knows today as the Monadicity Theorem has a lot of applications. One of them which I like is the result that, given an (elementary) topos , the ‘maps into the classifier’ functor from is monadic. Because of this, has all finite colimits. On the other hand, when I talk to some (derived) algebraic geometers, they tell me that from their experience the only place where monadicity theorem comes close to be used fully and faithfully is Tannaka duality, and even there it can be avoided as was done in some first expositions. This might be subjective; however my search of any details on the question revealed a paper named The Formal Theory of Tannaka Duality, the author of which claims that it is inspired by Ross Street’s writeup! One might want to have a look for the detail.
Monads are strongly connected with algebraic theories (the latter term can actually be used here in a general sense). What is less known, probably, is that one can use monads themselves to describe some ‘ring-like’ objects. What I mention here is the theory of generalised commutative rings (which are some special monads on ) by Durov, which appeared in his thesis, and can include such things as the field of one element as an example. One can then do a lot of algebraic geometry with this monads, up to K-theory and intersection stuff.
Monads do appear in higher categories. When working with model categories, one can work with weak monadic projections to treat Bousfield localisation (e.g. see Section 11.2 of Simpson’s book on Segal categories). Another thing is that it is often the case that one runs into a question if the category of algebras for a monad (or even a comonad) admits a model structure. For monads, one can see this, and for comonads, this and probably the recent one(!) . Finally, our dear host of this seminar has co-authored a paper which treats homotopy coherent adjunctions and monadicity theorem in quasi-categories, in a way much simpler than originally done by Jacob Lurie in his DAG II (the approach of Lurie still has some interesting flavour, coming from the treatment of algebraic structures via fibrations).
I’ve just arXived a new paper about a new invariant, The magnitude of a graph. Much of the development of this invariant has taken place at this blog, with two previous posts and crucial contributions from David Speyer and Simon Willerton. Your comments, critical or otherwise, would be very welcome.
Magnitude seems to be orthogonal to other graph invariants. You can’t derive from it such classic invariants as the Tutte polynomial or the chromatic number, nor the girth nor the clique number nor even the number of connected components. And conversely, you can’t derive the magnitude from these or any other well-known graph invariants. Apparently, it captures information of a different kind.
Its most appealing characteristic is that it behaves like cardinality. It’s multiplicative with respect to cartesian product, additive with respect to disjoint union, and obeys a restricted version of the inclusion-exclusion principle. And it also behaves a bit like the Tutte polynomial. For instance, it’s often invariant under Whitney twists — though not always, as I’ll explain.
That it behaves like cardinality is no surprise, given its origins.
Steve Schanuel taught us that the Euler characteristic of a space should be thought of as analogous to the cardinality of a set. There is a notion of the Euler characteristic of a category, closely related to other forms of Euler characteristic. This notion can be generalized to enriched categories, then specialized to categories enriched in the poset , seen as a monoidal category via addition. Any graph gives rise to an -enriched category, the objects being the vertices and the homs being distances in the graph. So, we get an invariant of graphs — and that’s what’s called magnitude.
That’s a highly condensed account, and you can find out more by following the links, but let me skip all that and tell you the end result, which is completely concrete.
The magnitude of a graph is a rational function over — the ratio of two polynomials with integer coefficients. It can also be expressed as a power series over . For example, the graph
Here is just a formal variable.
The definition goes like this. Take a graph . Let be the square matrix whose rows and columns are indexed by the vertices of , and whose entries are given by
for each pair of vertices and . Here is the distance between and in the graph (and by convention, ). It turns out that is invertible over both the field of rational functions and the ring of power series. The magnitude of is the sum of all the entries of its inverse.
I know of no direct graph-theoretic motivation for this definition. Frankly, it looks odd. The real motivation is that it’s part of a large family of related invariants, including not only cardinality and various types of Euler characteristic, but also metric dimension, maximum entropy, and, conjecturally, geometric measures like volume, surface area and perimeter. From that point of view, it would be odd if graph magnitude wasn’t important.
The paper has two main theorems. First, magnitude obeys an inclusion-exclusion principle: when and are subgraphs of some larger graph, and certain further conditions hold,
(Theorem 4.9). I was frustrated by having to impose those “further conditions”, because it meant that magnitude didn’t behave entirely like cardinality. But then I figured out that no nontrivial graph invariant behaves entirely like cardinality (Lemma 4.1).
The second main theorem was conjectured by Simon. Take a graph with two distinct vertices distinguished, and another graph with two distinct vertices distinguished. Glue them together by pairing up the distinguished vertices. There are two different ways you can do this, depending on how you do the pairing, and the two resulting graphs are said to differ by a Whitney twist.
David Speyer pointed out that (as all graph theorists know) the Tutte polynomial is invariant under Whitney twists, so if magnitude is a specialization of the Tutte polynomial, it had better be invariant too. Then David got his computer to randomly generate a couple of 18-vertex graphs differing by a Whitney twist, and lo and behold, their magnitudes were equal. That magnitude was a rational function of degree 23 whose coefficients were integers of up to 5 decimal digits, so it was highly unlikely that this happened by chance.
So, when Simon found an example showing that magnitude isn’t invariant under Whitney twists, it came as a considerable surprise. What’s going on?
Well, Simon immediately conjectured that you get invariance only when the two points of identification are joined by an edge in one of the original two graphs. That turns out to be true — and that’s the second theorem.
The apparently implausible coincidence in David’s experiment can therefore be explained as follows. He generated two 10-vertex graphs and , putting an edge between each pair of vertices with probability . He then joined them together in two ways, giving two 18-vertex graphs and differing by a Whitney twist. The second theorem applies if (or equivalently ) has an edge between the two points of identification, and the probability that this happens is . So, it was more likely than not that he’d get what he got.
If someone told you that in order to flourish rather than languish, you have to have at least 2.9013 times as many positive as negative emotions, and that they know this because of the Lorenz equations, you’d be instantly suspicious, right? It sounds like run-of-the-mill pseudoscience, but it was published in American Psychologist, official organ of the American Psychological Association and one of the very top journals in the field. Not only that, the lead author on this two-author paper, Barbara Fredrickson, is one of the editors of this journal.
Guardian Observer carried a great piece yesterday telling how it was debunked by — it gets better — a master’s student, Nick Brown. He teamed up with the legendary Alan Sokal and the psychologist Harris Friedman, and they wrote this highly enjoyable account of exactly how incompetent the original paper was. American Psychologist, to its credit, is publishing it.
Sample from the original paper:
An interesting observation that highlights the usefulness of fluid dynamics concepts to describe human interaction arises from the fact that Lorenz chose the Rayleigh number as a critical control parameter in his model. […] Low performance teams could be characterized as being stuck in a viscous atmosphere highly resistant to flow
Sample from Brown et al.’s response:
They appear to assert that the predictive use of differential equations abstracted from a domain of the natural sciences to describe human interactions can be justified on the basis of the linguistic similarity between elements of the technical vocabulary of that scientific domain and the adjectives used metaphorically by a particular observer to describe those human interactions. If true, this would have remarkable implications for the social sciences. One could describe a team’s interactions as “sparky” and confidently predict that their emotions would be subject to the same laws that govern the dielectric breakdown of air under the influence of an electric field. Alternatively, the interactions of a team of researchers whose journal articles are characterized by “smoke and mirrors” could be modeled using the physics of airborne particulate combustion residues, combined in some way with classical optics.
For the last couple years, people interested in quantum gravity have been arguing about the ‘firewall problem’. It’s a thought experiment involving black holes that claims to demonstrate an inconsistency in some widely held assumptions about how quantum mechanics and general relativity fit together. Everyone is either scratching their heads over it, struggling to find a way out… or grumbling that the problem isn’t real, and everyone else has gone crazy.
While the firewall problem has roots going back earlier, the paper that got everyone interested came out in July 2012:
- Ahmed Almheiri, Donald Marolf, Joseph Polchinski and James Sully, Black holes: complementarity or firewalls?
They called it the ‘firewall’ problem because one way out is to assume that when you fall into a black hole you hit a ‘wall of fire’ — a region of hot radiation — as you cross the event horizon. That sounds crazy: the equivalence principle says you shouldn’t feel anything special as you fall through the horizon, at least if the black hole is big enough. But they claimed this crazy-sounding solution was the most conservative way out!
In June of the following year, two extremely reputable physicists, in their attempts to avoid this conclusion, came up with an idea that sounds even crazier: every pair of entangled quantum particles is connected by a wormhole!
- Juan Maldacena and Leonard Susskind, Cool horizons for entangled black holes.
They called their proposal EPR = ER, which is a joke referring to two papers Einstein wrote in 1935. EPR stands for ‘Einstein–Podolsky–Rosen’, the authors of a famous paper on quantum entanglement, a spooky way that distant objects can be correlated. ER stands for ‘Einstein–Rosen’, the authors of a famous paper on wormholes, solutions of general relativity in which distant regions of space can be connected by a kind of tunnel, or handle.
I haven’t been thinking much about quantum gravity lately, but the recent convulsions in my old favorite subject caught my attention. In particular, I’d long been fond of the idea that when we finally understand quantum gravity, the distinction between quantum mechanics and our theory of spacetime will evaporate. There’s a lot of evidence coming from category theory, which reveals analogies between the two:
Now, quantum entanglement is a sneaky way for two distant particles to be correlated. A wormhole is a sneaky way for two distant particles to be connected. Could this be hinting at yet another analogy between quantum mechanics and our theory of spacetime? Perhaps one that deserves a category-theoretic treatment?
Last summer, when Jamie Vicary and I were both at the Centre for Quantum Technologies, we figured out something interesting about this. We wrote a paper about it, which is done now:
- John Baez and Jamie Vicary, Wormholes and entanglement.
I’d like to explain a bit of it here, because it uses 2-categories in a cute way.
First, I want to emphasize that Jamie and I are not making any claims about the firewall problem. In particular, we’re not claiming that EPR = ER is a way to solve the firewall problem.
Second, we’re not claiming that in our universe, every quantum-entangled pair of particles is connected by a wormhole.
Third, I’m not going to explain the firewall problem here. I don’t feel I have a deep understanding of it — and in fact, I wouldn’t be shocked if it gently dissolves when people think about it carefully enough. So, if you want to get a feel for this problem, but don’t feel quite up to the technical literature, I recommend starting with these blog articles:
- George Musser, When you fall into a black hole, how long have you got?
- John Preskill, Is Alice burning? The black hole firewall controversy.
- Joe Polchinski, Black Holes, complementarity, and firewalls.
I’ve listed them in roughly increasing order of how much physics you need to know to enjoy them, but they’re all good in different ways. The quantum information theorist John Preskill also wrote a nice intro to Maldacena and Susskind’s work:
- John Preskill, Entanglement = wormholes.
Wormholes and pair creation
Okay, so let me tell you what I’m actually going to tell you about. Jamie and I did some calculations that work in a simplified setup where spacetime is just 3-dimensional, and nothing about spacetime matters except its topology. This is called a 3-dimensional topological quantum field theory, or 3d TQFT for short.
3d TQFTs are good for describing quantum gravity in a fictitious world where spacetime is 3-dimensional and time is no different from space. At a more practical level, they’re also good for describing thin films of certain special substances at very low temperatures! Some people want to use these films to make topological quantum computers.
Here’s the interesting thing: in this setup, the creation of a particle-antiparticle pair can be reinterpreted as the formation of a wormhole. The particle and its antiparticle are the opposite ends of the wormhole! See, we can interpret this picture two ways:
In one way of interpreting this picture, we’re looking at a chunk of spacetime with a curved tube removed. Reading it from bottom to top, space starts out as a disk and ends up as a disk with two holes. So, a particle-antiparticle pair is being created!
In the other way of interpreting this picture, we think of that curved thing as a wormhole: a handle connecting two distant part of space. Reading from bottom to top, space starts out as a disk and ends up as a disk with a handle attached. So, a wormhole is being formed!
If this seems confusing, don’t worry: I’ll explain it more precisely later.
The idea that particle-antiparticle pairs could be wormhole ends is an old one, going back to the paper by Einstein and Rosen. It’s especially cute for electrically charged particles. When it seems the electric field is diverging, coming out of a particle, maybe it’s really just flowing through a wormhole and coming out one end… and flowing in the other end, which would thus look like a particle of the opposite charge.
The idea has been kicking around for a long time. John Wheeler was especially fond of it: he called it ‘charge without charge’. But the discovery of 3d TQFTs gave us a context where we can make this idea precise and calculate with it.
This lets us relate quantum entanglement of particle-antiparticle pairs to wormholes. Indeed, we can compute what happens when a wormhole forms in a 3d TQFT, and use that to compute the state of the resulting particle-antiparticle pair.
The particle and antiparticle act entangled! They even act like they’re ‘completely’ entangled: every piece of information about the particle at one end of the wormhole is encoded in information about its antiparticle at the other end.
This entanglement is, however, a kind of sham. The particle and its antiparticle act entangled, but they have no choice about this, since they are not really separate things. They are just two views of the same thing: the wormhole. True entanglement arises when two independent things are correlated in a certain way. It doesn’t really count as entanglement when something is correlated with itself.
The fact that the entanglement is ‘fake’ was crucial for Maldacena and Susskind, who used it to get around a key aspect of the firewall problem: the so-called ‘monogamy of entanglement’. Just as it’s against the rules to be married to two people, a quantum system can’t be entangled with two others. But ‘fake’ entanglement isn’t monogamous.
Jamie Vicary and I make the concept of fake entanglement precise in our paper, but I don’t feel like talking about that now. Instead, I want to say more about how the creation of a particle-antiparticle pair can be reinterpreted as the formation of a wormhole. This is where 2-categories come in…
Wormholes and pair creation: the 2-categorical view
In our simplified setup where spacetime is just 3-dimensional, space is just 2-dimensional, and the formation of a wormhole looks like this:
This process goes from a disk to a disk with a handle attached. But the disk itself can be seen as a process going from the empty set to a circle… and so does the disk with a handle attached. A ‘process between processes’ is a 2-morphism in a 2-category. So, the formation of a wormhole is actually a 2-morphism, like this:
Here is the empty set, is the circle, and is the disk. is the disk with a handle attached… but why am I calling it that?
Here’s why. The disk with a handle attached can be chopped into two pieces: a handle, and a disk with two holes:
The handle goes from the empty set to the disjoint union of two circles:
The disk with two holes goes like this:
So, the disk with a handle is the composite
Okay, so we’ve sliced and diced the process of wormhole formation and seen how it’s a 2-morphism in a 2-category. What about the process of creating a particle-antiparticle pair? What does that look like?
It looks like this:
We read this from bottom to top. Space starts out as a disk. As time passes, a particle-antiparticle pair forms. When they’ve formed, space is a disk with two holes. (You see, in 3d TQFTs a particle can be treated as a hole cut out of space, or ‘puncture’.)
Here’s a 2-categorical diagram for this process of pair creation:
At the start, space is a disk — and as before, we think of this disk as going from the empty set to the circle:
As time passes, space turns into a disk with two holes, namely
As this happens, the circle doesn’t change, but the empty set turns into two circles, thanks to the handle
Now here’s the punchline. Take the square diagram:
Compose the left and top arrows, and compose the bottom and right ones. The bottom and right ones compose to give a disk:
so we get a diagram we’ve already seen:
This is the diagram for wormhole formation! So, pair creation is just wormhole formation in disguise.
Finally, here are some remarks that may be too categorical, and even too 2-categorical, for most readers to enjoy.
The ‘bigon’ here is how we draw a morphism in a 2-category:
But the square here is how we draw a morphism in a double category:
There’s a standard trick for getting double categories from 2-categories, due to Ehresmann, and that’s what we’re using here. It’s fairly common for people to study TQFTs using 2-categories. Indeed, Jamie is writing a huge multi-part paper on this with Bruce Bartlett, Chris Douglas and Chris Schommer-Pries, which will be really exciting when it comes out, assuming hell hasn’t frozen over yet. It’s less common for people to study TQFTs using double categories, but Jeffrey Morton has:
- Jeffrey Morton, Extended TQFTs and quantum gravity.
- Jeffrey Morton, Double bicategories and double cospans.
The advantage of double categories is that they let us study cobordisms-between-cobordisms where the boundary of space changes with time… as in pair creation. Since Jeffrey was my grad student, I was primed to think about this issue.
Here’s another thing, which Jamie and I were reluctant to include in our paper. You can get a 3d TQFT from a modular tensor category , and then
Since a hole in space acts like a particle, and cutting out a hole in space leaves a circle as boundary, plays the role of the ‘2-Hilbert space of particle types’. has a basis of simple objects which are the various types of particles.
The 2-Hilbert space for a pair of particles is then
where is the tensor product of 2-Hilbert spaces. So when we apply our TQFT to a handle
and this functor is determined by its value on , which is
where ranges over the simple objects of , and is the dual of .
Translating back into physics: the two ends of the wormholes act like particles. These particles can be of any type whatsoever, but if one end happens to be a particle of type , the other end has to be the corresponding antiparticle, . So, you could call the space of states of a particle-antiparticle pair.
And, this space of states is a categorified version of a completely entangled state! In ordinary quantum mechanics, we could have a Hilbert space with a basis , and a completely entangled state would be something like
(up to normalization). Now we’ve got a 2-Hilbert space and
It’s the same idea, one level up! We could call this ‘2-entanglement’.
Even better, we have a kind of microcosm principle thing going on. In higher category theory, little things often naturally live in bigger things that are categorified versions of themselves. For example, monoids naturally live in monoidal categories. There are many other examples, too. This is called the ‘microcosm principle’.
In the situation here, we’ve already seen the space of states of the particle-antiparticle pair is 2-entangled. But the formation of a wormhole creates an element of this space of states: a particle-antiparticle pair in a particular state. And this state is itself entangled!
Jamie explain this in more detail in our paper, starting at equation (4). But we didn’t mention that the entanglement of the state of the particle-antiparticle pair is ‘riding’ the 2-entanglement of the space of states of this pair. You have to really like higher categories to care about an observation like this.
Guest post by Clive Newstead
William Lawvere’s Elementary Theory of the Category of Sets (ETCS) was one of the first attempts at using category theory as a foundation of mathematics and formulating set theory in category theoretic language. In this post I will outline the content of Lawvere’s paper [Lawvere 1965] and put it in a historical context; suggestions for further reading are at the end. Before I begin, I’d like to thank Emily Riehl, Steve Awodey and the other Kan Extension Seminar participants for their helpful feedback, suggestions, and reading responses to the paper; and, of course, William Lawvere for writing it!
ETCS came about at a boomtime for category theory and the foundations of mathematics. It was the early 1960s; category theory had found its feet and had started being used extensively throughout mathematics, notably in algebraic geometry with the work of Alexander Grothendieck. Around the same time, Paul Cohen developed the technique of forcing, which has since become central to modern set theory. Lawvere had recently written his Ph.D. thesis [Lawvere 1963], which was the first step in his programme of using the category of categories as a foundation. It was therefore fitting that he should want to try his hand at doing set theory inside category theory.
Lawvere’s attempt did not go unopposed, even by the likes of Samuel Eilenberg and Saunders Mac Lane:
His attempt to explain this idea to Eilenberg did not succeed; […] Sammy asked me to listen to Lawvere’s idea. I did listen, and at the end I told him “Bill, you can’t do that. Elements are absolutely essential to set theory.” [Mac Lane 1988]
Unfazed, Lawvere pressed on and in 1964 he published the ETCS paper, which was expanded and reprinted in 1965. What follows is a very brief sketch of its contents; more depth and technical details can be found in the paper itself.
Axioms of ETCS
Before listing the axioms, we should fix some set theoretic notation with category theoretic meaning: given an object in a category with a terminal object , if then we write and say is an element of . If is monic then we say is a subset of , and write . We then write if factors through .
The axioms of ETCS consist of the usual axioms for a category together with the following eight axioms:
- has an initial object , a terminal object , products and coproducts of all pairs , and equalisers and coequalisers of all pairs . (Thus has all finite limits and colimits.)
- Every pair of objects has an exponential .
- has a natural numbers object , with zero morphism and successor morphism .
- is well-pointed, i.e. given any pair of maps with , there is an element such that .
- Axiom of Choice: If and there is some , then there exists a quasi-inverse , which satisfies .
- If has no elements then is an initial object.
- If then factors through one of the coproduct inclusions.
- has an object with more than one element.
With modern technology we can simplify this list considerably: a model of ETCS is a well-pointed topos with a natural numbers object satisfying the axiom of choice.
In Lawvere’s paper, the axioms are introduced as they are needed, rather than all as one list. As such, it becomes clear which properties held by the category of sets are also held by other categories. Moreover, Lawvere comments on how the axioms interact with each other. For example, since the category of posets and order-preserving maps satisfies each of the axioms except for #5, the axiom of choice is independent of the other axioms of ETCS.
Mathematics with ETCS
Lawvere demonstrated the usefulness of ETCS as a set theory by proving six mathematical theorems.
Theorem 1 (Primitive recursion). Given a pair of morphisms there is a morphism satisfying, for each and ,
The primitive recursion theorem requires only axioms 1-3, and uniqueness of is implied by axiom 4. Thus primitive recursion can be done in categories such as the functor category for any small category , and in the arrow category .
Theorem 2 (Peano’s postulates).
- The morphism is monic
- If for some then .
- If and then .
The third of these allows us to perform mathematical induction in any model of ETCS.
Theorem 3 (Image factorization). Every morphism can be factored as an epi followed by a mono.
Theorem 4 (Unions). Let and let be one of the inclusion maps. Given any there is a subset such that, for all , if and only if for some .
Intuitively, is the union of the subsets of picked out by the indexing morphism ; that is,
Theorem 5 (Complements). Let be an object. Given any subset there is a subset , such that with and as the coproduct injections.
Theorem 6 (Quotients). Let be an object and be an equivalence relation. Let be the product projections. Then is the equaliser of and , where is the coequaliser of with .
Intuitively, is the quotient of by , then the theorem is stating that is precisely the set of pairs such that .
The paper closes with a theorem justifying that the axioms of ETCS really do characterise categories which are sufficiently ‘like’ the category of sets. Lawvere refers to this likeness as the invariant form of the category of sets.
(Meta)theorem. If is a locally small, complete model of ETCS, then is equivalent to .
The equivalence is given by
What happened next?
The appeal of ETCS was not felt by the mathematical community at the time. Indeed, according to Marquis and Reyes:
The category of sets was not taken as a foundational framework. It was not studied and explored. […] It seems that one of the most common reactions at the time was that ETCS was merely a translation in categorical terms of the standard axioms of set theory [Marquis and Reyes 2011]
Even Lawvere acknowledged the shortcomings of his work:
However, it is the author’s feeling that when one wishes to go substantially beyond what can be done in the theory presented here, a much more satisfactory foundation for practical purposes will be provided by a theory of the category of categories.
However, the impact of ETCS stems not from its (non-)use as a foundation, but from the research that it led to. It was one of the first attempts to give a categorical account of logic, and it led directly to the definition of an elementary topos, which generalised the notion of a Grothendieck topos and bridged the gap between geometry and logic via category theory. For a number of reasons it was toposes, rather than models of ETCS, which became the central object of study in categorical logic.
References and suggested reading
- Lawvere, F.W. Functorial Semantics of Algebraic Theories, Ph.D. thesis, Columbia University (1963).
- Lawvere, F.W. An elementary theory of the category of sets, Proceedings of the National Academy of Science of the USA 52, 1506-1511 (1965), reprinted as Lawvere, F.W., McLarty, C. An elementary theory of the category of sets (long version) with commentary, Reprints in Theory and Applications of Categories, No. 11 (2005), pp. 1-35.
- Mac Lane, S. Concepts and categories in perspective. In A Century of Mathematics in America, I, vol. I, pages 323-365. American Mathematical Society, Providence (1988).
- Marquis, J.-P. and Reyes, G. The History of Categorical Logic: 1963-1977. In Handbook of the history of logic, vol. 6, pp. 689-800. Elsevier (2011).
The EFF have been doing fantastic work for over 20 years, keeping the internet the kind of place you most likely want it to be: ensuring your freedom of speech, protecting your privacy, and defending the core principles of the internet against the controlling ambitions of both corporations and governments. Although they’re only a small nonprofit organization, with a comparatively minuscule budget, they’ve had a string of legal victories against huge players. They deserve your support!
But what are they doing at the Joint Mathematics Meetings? It was the idea of Thomas Hales. Hales is famous for, among other things, proving the Kepler sphere-packing conjecture. (He also wrote a very nice introduction to motivic measure, mentioned here in passing a couple of years ago.)
Hales anticipated that the NSA would be recruiting mathematicians with particular fervour this year: in order to recruit, they’ll need to overcome the outrage caused by the recent revelations of mass, population-level, surveillance. They’ll want to persuade mathematicians that what they’re doing is good for society. And the EFF will be there to tell mathematicians that there may be better channels for their talents.
In an earlier post, Bas Spitters immediately put his finger on the point where mathematics most directly touches the NSA scandal: the undermining of internet encryption. What you might not have immediately picked up from casually reading the newspapers is that among the variety of techniques used by the secret services to get round encryption, they managed to insert a back door into a cryptographic protocol based on elliptic curves.
The definitive mathematical account of this is Hales’s piece in the February 2014 issue of the Notices of the AMS. There are many other accounts from different perspectives. But rather than dump a large number of links on you, let me highlight one by the EFF.
The EFF piece quotes from an internal NSA document, which lists as one of their items of budgetary spending:
Insert vulnerabilities into commercial encryption systems, IT systems, networks …
And the EFF article makes a crucial point:
By weakening encryption, the NSA allows others to more easily break it. By installing backdoors and other vulnerabilities in systems, the NSA exposes them to other malicious hackers—whether they are foreign governments or criminals. As security expert Bruce Schneier explained, “It’s sheer folly to believe that only the NSA can exploit the vulnerabilities they create.”
In other words, even if (for some reason) you trust the NSA with everyone’s data, their undermining of internet encryption makes the world a more dangerous place.
Maybe you’re saying: what’s done is done. The outlook is gloomy, but what can we do now?
One immediate priority is to stop the situation becoming normalized. Those who wish to destroy online privacy will want to make the NSA’s actions seem like an unexceptional part of protecting national security. The NSA will, I imagine, be trying to persuade mathematicians in Baltimore this week that the whole fuss is really rather overblown; that perhaps a few checks and balances need adjusting, but fundamentally what it’s doing is good. The opposite argument needs to be made, and I imagine the EFF will be making it.
But more positive actions are possible. Mathematicians involved in cryptography can speak up! They can say “I do not want to contribute to mass surveillance”, just as physicists and engineers have refused to contribute to the building of nuclear weapons, and doctors have refused to participate in torture. We can withdraw our labour. We have that choice.
And mathematicians have a role to play in building new tools that allow genuine privacy. In the wake of the Snowden revelations, there’s been a big push to develop encrypted channels of communication that are secure against government snooping. Mathematicians can help.
It’s important to realize that as far as we know, the NSA has made no decisive mathematical cryptographic advance of which the rest of the world is ignorant. Snowden said in June:
Encryption works. Properly implemented strong crypto systems are one of the few things that you can rely on. Unfortunately, endpoint security is so terrifically weak that NSA can frequently find ways around it.
The NSA have used social, legal, physical and other means to weaken encryption, but the underlying mathematics has not been challenged. Nonetheless, mathematical expertise will be needed to build new, better, tools.
I’d love to go along to the EFF booth this week and help out. Unfortunately, I’m on the wrong side of the Atlantic. But if you’re going to be there and you want to help out, I believe they’d appreciate it. Just drop an email to Yan Zhu, who will be the official EFF representative there, or talk to her in person when you arrive.
And if you can’t be there, but reading this has made you want to help in some other way, why not join the EFF?
Postscript For more on the NSA’s weakening of internet encryption, here are some further links. I already linked to one post by the Johns Hopkins computer scientist Matthew Green; here’s a more technical companion post. Less technically, there’s a great piece by IT security legend Bruce Schneier, and of course there’s any number of articles for a general readership.
The categorical notion of an end is something that several people have requested Catster videos for and Yemon Choi was recently asking if Tom had covered it in his new-born book. Given that I’ve got ends in my head at the moment for two different reasons, I thought I’d write a post on how I think about them.
I feel that seeing an integral sign like can cause people’s eyes to glaze over, never mind them getting confused as to whether that represents an end or a coend. So I will endeavour to avoid integral signs apart from right at the end.
My experience is that coends roam more freely in the wild than ends do, but I will focus on ends in this post. One reason that people are interested in ends is that natural transformation objects in enriched category theory are expressed as ends, but I will stay away from the enriched setting here.
Having said what I won’t do, maybe I should say what I will do. I will mainly concentrate on a few examples to demonstrate ends as universal wedges.
The input data for an end is a functor of the form . The only functors that I know people take the ends of are those of the form where are functors and is some kind of hom functor, i.e. either the usual hom (so ) or an internal hom (so ).
Here are some typical examples I’ve come across.
, where are functors.
is finite-dimensional algebra over ,
is the category of finite-dimensional, complex representations of ,
is the category of complex vector spaces,
is the forgetful functor,
is the internal hom in , i.e. is the vector space of linear maps from to .
where is a finite group
is the internal hom of -representations and , so it is the vector space of linear maps with the -action on given by .
The main point of difference between 3 and 4 is that for an arbitrary algebra , the category of representations does not have an internal hom, but for a finite group it does. Other examples like 4 would involve representation categories of Hopf algebras or quantum groups.
An end is a universal wedge, so I’d better say what a wedge is. A wedge for a functor is an object with a morphism for every object . These morphisms have to satisfy a naturality condition which says that for every morphism in , the two obvious maps you can make are the same, i.e. the following diagram commutes. We can denote such a wedge by writing .
Let’s have a look a what wedges are in the examples given above.
For , a wedge is a set and a function for each . This means that for each we get a family of morphisms . The naturality condition ensures that these are the components of a natural transformation of the identity functor . So a wedge is a set with a function .
For , by a similar argument to that above, a wedge is a set with a function to the set of natural transformations.
In this case, where we have a -algebra , an end is a vector space with a morphism for every representation of , in other words a linear map . The naturality condition says that these linear ‘action maps’ must commute with all -intertwining maps.
In the case of the internal hom of representations of a finite group, a wedge consists of a representation together with a natural ‘action’ on every -representation: . These action maps are intertwiners and must commute with all other intertwiners.
If is of the form as in 1, 3 and 4 above, then we have morphisms so in some sense is acting on for every .
An end is a universal wedge. An end for consists of a set with a morphism for each , satisfying the wedge naturality conditions, such that if is another wedge for then there is unique map such that the components of factor through as .
An end is often written as an integral . Coends, which I’m not talking about here, are written with the limits at the top of the integral sign: in Sheffield we have the mnemonic
“The end of the walking stick is at the bottom.”
I have mixed feelings about this notation. I think that people can be intimidated by it, also people get the impression that it is supposed to be something to do with integration which confuses them (or maybe that’s just me!)
We can look back at our examples and identify the ends.
For , from what is written above, it should be clear that the end is the set of natural transformations of the identity . This set is sometimes called the Hochschild cohomology of the category.
For , similar to the example above, the end is the set of natural transformations from to .
In the case of the representation category of an algebra we find that the end is actually (the underlying vector space of) the algebra itself. It is pretty obvious that the tautological action on -representations gives rise to a wedge. It is less obvious that it is the universal wedge.
This is the example of the internal hom for the representations of a finite group . Whilst we have the tautological action for every representation , we can make these into intertwining maps by taking the group algebra with the adjoint action . The resulting morphisms in make the group algebra into a wedge for the internal hom . In fact it is an end for the internal hom. This is related to the fact that a group algebra is semisimple and that there is an isomorphism of algebras in the representation category where sum runs over (a set of representatives of the equivalence classes of) the irreducible representations of .
Example 3 is the prototypical example of Tannakian reconstruction. We start with the representation category and the ‘fibre functor’ , then reconstruct from there. See the nlab page on Tanaka duality for more details.
Example 4 is a kind of ‘internal reconstruction’. More generally, for a Hopf algebra the end of the internal hom is a version of inside its category of representations. I believe that this idea is due to Shahn Majid (see Chapter 9 of Foundations of Quantum Group Theory).
Ends as Algebras (or Monoids, if you prefer)
In Examples 1, 3 and 4, the end could actually be given more structure than that of being just a set, a vector space or a representation. In all three examples the end is actually an algebra (or monoid, if you prefer) in the appropriate category. In fact, in Example 4 the end has a Hopf algebra structure, but I won’t go into that here.
Suppose the functor is of the form for some functor and for either internal or external hom then using the composition of the hom, for each we have the composite and you can check that this gives a wedge . Thus by the universal nature of the end we get a canonical map In a similar vein we have an identity morphism for each these give rise to a wedge so by the universality of we get a canonical map You can verify that and make the end into an algebra object in , and the components of the end are algebra homomorphisms.
Just to finish we can have a look at how taking different ends on a similar category give different but related answers. So let’s look at Examples 1, 3 and 4 for the category of representations of a finite group.
If we use the ordinary hom then we get the centre of the group algebra.
If we use the internal hom in we get the group algebra.
If we use the internal hom then we get the group algebra with the adjoint action.
There ends my introduction to ends.
I’ve posted before about the involvement of mathematicians and computer scientists in the secret services — who, we now know, routinely record your and my daily activities. And in case you’ve somehow missed this whole scandal, you could do a lot worse than starting with this piece by Bruce Schneier in The Atlantic.
Many academics oppose mass surveillance. This probably includes many mathematicians who have helped the NSA and GCHQ in the past. If you’re one of them, you can add your name to a public declaration: Academics Against Mass Surveillance. All you need to do is this:
email us at info (at) academicsagainstsurveillance.net with your name, academic function and university in the subject line.
Given their pivotal role, I hope that many more mathematicians will want to add their names. (There may be some in the pipeline, along with my own: apparently, the organizers can’t keep up with the volume of emails coming in, citing an “overwhelming response”.)
I’m using the quiet of Christmas to finish writing a book, Basic Category Theory. It’s nothing revolutionary: just a short introduction to the subject, based on courses I’ve taught. But the process of book-writing is demanding and maddening enough that I wanted to take a moment to reflect on why that is — and why you hear authors complain so much.
Put another way, I’m taking a break from the tedium of writing a book to write about the tedium of writing a book. I hope it’s not tedious.
I want to try to articulate why writing a book is so much more painful than writing a paper. This isn’t something I’ve thought through; I’m typing this off the top of my head. But I’ll see if I can gather some reasons.
First, let me confess that I’ve been surprised by just how demanding it’s been. I’ve written one book before, and that was an extremely stressful experience. But this one is a venture of a completely different kind: it’s half the length, it’s a textbook rather than research (and therefore not nearly so personal), I already had what I thought was a nearly-final draft when I approached the publishers this time, and I’d been polishing the notes up, on and off, for the previous twelve years. What’s more, I’m older and, I hope, more able to cope with stress: just as carpenters get calloused hands that make them insensitive to small abrasions, I like to imagine that academics get calloused minds that allow them not to be bothered by small stresses and strains.
So, I went into this aware of the potential stress. I think I successfully removed just about all of it. But what I hadn’t bargained for is that when you remove all the stress, what’s left underneath is… boredom!
Let me qualify that. When you’re writing just about anything, there’s an intensely satisfying period when it’s all coming together. That’s great. But after that, towards the end — and that’s the stage I’m at now — there’s an awful lot of grind.
Let’s start with the obvious. A paper is long if it’s 50 pages; a book is short if it’s 200. But the crucial thing is that the pain does not scale linearly.
For instance, you have to check that you’ve used notation consistently throughout. The time it takes to check this for each piece of notation is proportional to the length of the book — but so too is the number of pieces of notation. So, the time needed to check consistency of notation is proportional to the square of the length.
You also want to make sure you haven’t repeated yourself. (In an earlier draft, I told/reminded the reader what the discrete topology was three separate times.) This amounts to checking that line is not too similar to line for all lines and , which again means that the time you need is proportional to the square of the length.
What’s more, a book feels different from a paper. Books tend to get more publicity, and people engage with them in a different way. My experience is that if I tell a mathematical friend that I’m writing a book, they’re pretty interested; but if I solemnly informed them that I was writing a paper, they’d look at me like I’d told them I had two legs. (I do.) So when you’re writing a book, you know that what you’re doing is likely to come under more scrutiny. This will bring out all the perfectionism in you.
In my case, this was a big effect. I’ve taught master’s-level category theory courses several times, and I had a polished set of Latexed notes for them. I thought it wouldn’t take much effort to turn them into a book. What I hadn’t realized is the extent to which I was speaking to my students — not particular individuals, but the generic Glasgow master’s student that the notes were addressed to. To adapt them for an unknown anyone-in-the-world reader, I’ve needed to examine and undo a lot of assumptions, and that’s taken a lot of work.
On top of all this, there are things you need to do for a book that you don’t need to do for a paper. One of them is indexing, which has to be up there among the most boring tasks in academic life. I actually have a book on indexing (excerpted from the Chicago Manual of Style), which I bought when I was indexing my last book, in Chicago. It’s 65 pages long — and yes, it has an index.
Anyway, I’m happy to say that I’ve very nearly finished. My deadline is 31 December. Although these deadlines seem to be almost infinitely elastic, I intend to meet this one. After that, there will be a whole lot of to-ing and fro-ing with Cambridge University Press, who are publishing it, and I hope it will be on the shelves some time in the middle of 2014.
Limits commute with limits, and colimits commute with colimits, but limits and colimits don’t usually commute with each other — with some notable exceptions. The most famous of these is that in the category of sets, finite limits commute with filtered colimits.
Various other cases of limit-colimit commutation are known. There’s an nLab page listing some. But it seems that quite an easy case has been overlooked.
It came to light earlier this week, when I was visiting Cambridge. Peter Johnstone told me that he’d found a family of new limit-colimit commutations in the category of sets, I asked whether his result could be simplified in a certain way (to involve groups only), and we both realized that it could not only be simplified, but also generalized.
Here it is. Let and be finite groups whose orders are coprime. View them as one-object categories. Then -colimits commute with -limits in the category of sets.
Now here’s the result stated more precisely: first in category-theoretic terms, then purely group-theoretically.
Let and be small categories, and let
be a functor. There’s a canonical map of sets
and the question is whether is a bijection. If the answer is yes for all , we say that limits over commute with colimits over in the category of sets. The statement is that when and are the one-object categories corresponding to finite groups with coprime orders, they do commute.
Here it is again, purely group-theoretically. To translate, we’re going to need the facts that when a group is viewed as a one-object category, a functor from that category into is the same thing as a left action of the group, the limit of such a functor is the set of fixed points of the action, and the colimit is the set of orbits.
Let and be groups, and let be a set equipped with both a left -action and a left -action in such a way that the actions commute: for all , and . Equivalently, has a left action by .
The set of -fixed points has a -action, and we can take the set of orbits. On the other hand, the set of -orbits on has an -action, and we can take the set of fixed points. There’s a canonical map of sets
It’s straightforward to show that is always injective. It’s not always surjective. But the fact is that it’s surjective (and therefore bijective) if and are finite with coprime orders. So then,
The proof is so short that I might as well include it. We have to show that is surjective. Let . Then for some , and we know that is a fixed point of . It’s enough to show that itself is a fixed point of : for then the element of represented by is mapped by to the element of represented by , which is .
So, let . We must show that . Since is a fixed point of , we know that for some . Since the - and -actions on commute, for all integers . But and are coprime, so we can choose an such that
Then and , so , as required.
Although category theorists seem to have overlooked this result, I thought it might be well-known in group theory, so I asked on MathOverflow. No one has yet provided a reference, and the view of the expert group theorist Derek Holt was that “the proof is sufficiently straightforward as not to require a reference”. But what may not have been apparent to group theorists is its categorical significance.
You might ask whether this result can be generalized. In other words, for which pairs of groups do limits over one commute with colimits over the other in ? Although I didn’t ask for it, Will Sawin provided a complete answer — a necessary and sufficient condition. Here it is:
-limits commute with -colimits in if and only if no nontrivial quotient of is isomorphic to a subquotient of .
Of course, this immediately implies the coprime-order case for finite groups. But it applies to other pairs of finite groups too (e.g. when is simple and has smaller order), as well as to some pairs where one or both groups are infinite.
Here’s a new feature of the Café, thanks to our benevolent host Jacques Distler. If you ever want to see how someone has created some mathematical expression on this blog, there’s an easy way to do it.
With Firefox, you simply double-click on the expression. Try it: or or A window should pop up showing the TeX source.
With other browsers, I’m not so sure. Try double-clicking. If that doesn’t work, then, according to Jacques’s instructions, you “bring up the MathJax context-menu for the formula, and choose Show Math As Annotation TeX”. I don’t know how one brings up this menu. Does anyone else know? (Update: right-click in Chrome, Explorer and Opera, and control-click in Safari. Thanks to those who responded.)
Once you’ve made the TeX source appear, you can cut and paste to your heart’s content. Of course, most users here are fluent in LaTeX. But like most math-oriented websites, we use a variant of TeX that’s a little different from standard LaTeX, so this should turn out to be a helpful feature.