The categorical scoop on attributed C-sets
$$
$$
The trouble with C-sets is that they cannot model data structures such as labelled graphs, which are graphs where every edge and vertex has an associated element from some set of labels. In this blog post, we will discuss an extension to C-sets which we call attributed C-sets that solves this issues. This formalism is mainly inspired by the viewpoint of the “Algebraic Databases” paper (Schultz et al. 2016), though in truth we take more after the “databases” part than we do the “algebraic” part.
The double category of profunctors
It is possible to develop attributed C-sets within the 2-category of categories, functors, and natural transformations. However, we gain much in simplicity when we move to a double categorical framework, and it is for this reason that we dive into a bit of abstraction before moving to attributed C-sets.
From a graphical perspective, a double category is a mathematical object where the basic unit of analysis is a square that looks like this:
Although this looks like a commutative diagram, it is in general not. Rather, it is a depiction of a 2-cell, and the morphisms and objects are merely the boundaries of the 2-cell. In an actual diagram, the asterisks and arrows would all be named, but we have left out names for now to emphasize the shape. These squares can be composed both vertically:
and horizontally
A double category can be very succinctly defined as a category in . This is analogous to the definition of a Lie group as a “group in the category of smooth manifolds.” However, it is a little more subtle than that for a couple reasons, so we will spell it out explicitly. A double category is defined by
- A category . We say that an object in this category is an object in (i.e., the corners of a square). A morphism in this category is a vertical morphism in (i.e., the left and right sides of a square).
- A category . An object in this category is a horizontal morphism in (i.e., the top and bottom of the square), and a morphism is a 2-cell, i.e. an entire square. This will make more sense when we introduce the next element. Vertical composition of 2-cells is given by regular composition in (so in particular, it is strictly associative and unital).
- Two functors . On an object of (i.e. a horizontal morphisms), and give the objects of (i.e. objects of ) that are the domain and codomain of . On a morphisms of (i.e. a 2-cell), and give the morphisms of (i.e. vertical morphisms) that are on the left and right of .
- The heart of a double category is horizontal composition. Intuitively, horizontal composition is a functor from the category of “pairs of composable horizontal morphisms” to . This category is the equalizer of and which both go from to . We write it as an equalizer rather than a pullback to emphasize that it is a sub-category of . In any case, we have a functor from this equalizer to , and this functor on objects is horizontal composition of horizontal morphisms, and on morphisms is horizontal composition of 2-cells.
- Finally, we have a functor from to that is a unit for , again up to unique natural isomorphism. This gives the identity for horizontal composition of both horizontal morphisms (when applied to objects of ), and 2-cells (when applied to morphisms of ).
Before we get into profunctors, let’s talk about some simple examples of double categories.
Example 1. Given any category , there is a double category where the 2-cells are commutative squares in . The category is actually a familiar category: the arrow category of . This is a strict double category, in that the natural isomorphisms for associativity and unitality of are actually the identity.
Example 2. Given any 2-category , there is a double category where the 2-cells are squares that commute up to 2-isomorphism. The reader is invited to investigate this double category further.
Example 3. Given any category with pushouts, there is a double category where the horizontal morphisms are cospans in . A 2-cell is a commutative diagram in that looks like this:
and horizontal composition is by pushout. John Baez and Kenny Courser make a very compelling argument that this construction is key to understanding open systems, and we encourage the interested reader to look at the work that they have done in this area (Baez and Courser 2019; Courser 2020).
The double category that we are interested in is called , and its generic 2-cell is illustrated in the following diagram.
, , , and are all categories. and are functors and respectively. and are functors and (also known as profunctors). And finally, is a natural transformation from to . That is, the components of have the form .
What we have said so far should be sufficient for the reader to implement the first three parts of a double category, and we encourage the reader to do so before continuing.
At this point, we have not defined composition or units in . However, we have enough material to define attributed C-sets at this point, so in order to give the reader a break from abstract nonsense, we will talk about them now.
What is an attributed C-set?
If we think about C-sets as databases, their “schemas” are finitely presented categories. Before we talk about attributed C-sets, we must talk about the construct that plays an analogous role. The “schema” for an attributed C-set is a horizontal morphism in whose source is a finitely presented category , and whose target is a finitely presented discrete category . We visualize this in the following way:
Let’s parse this diagram step-by-step. On the left in yellow, we have , and this is a standard way of picturing a finitely-presented category. On the right in blue, we have a discrete category (no arrows). Finally, connecting and , we have two green ovals. These ovals represent the sets and . We picture an element of as an arrow from to , which is consistent with the viewpoint that a profunctor is a heterogenous Hom-functor. We call an element of an attribute.
The reader is likely unfamiliar with the notation because to our knowledge, it is non-standard. A more standard way of writing it would be . Namely, it is the action of the morphism on . We write it like to suggest that we are precomposing with .1
As always, when we are doing computational category theory we must introduce presentations whenever we introduce new mathematical object. In this case, to present a “schema”, we present a category and a discrete category in a standard way, but we also must present a profunctor, and we expect that the reader is not so familiar with this.
Fortunately, it is not so difficult; to present a profunctor we simply give generating elements in each of the component sets (i.e., ), and then we write down some equalities involving precomposition (for instance, if we wanted to say that the decoration on an edge was equal to the decoration on its source, we could write ).
In Catlab, we would write down the presentation of decorated graphs like so:
@present SchDecGraph(FreeSchema) begin
V::Ob
E::Ob
src::Hom(E,V)
tgt::Hom(E,V)
X::AttrType
edec::Attr(E,X)
vdec::Attr(V,X)
end
Note that this presention has no equations. If we did the category of decorated symmetric graphs, we would end up with
@present SchDecSymmetricGraph <: SchDecGraph begin
inv::Hom(E,E)
compose(inv,inv) == id(E)
compose(inv,src) == tgt
compose(inv,tgt) == src
compose(inv,edec) == edec
end
The last equation says that the decoration on an edge should be the same as the decoration on its inverse.
It is perhaps best to understand attributed C-sets by looking at how we implement them in Catlab. Each object in becomes a table. The table for has a column of type Int
for each generating morphism with domain , and a column of some other type for each generating attribute with domain . We will later see exactly how this type is determined.
The columns corresponding to morphisms in are similar to FOREIGN KEY
columns in SQL: they point to rows in other tables. The columns corresponding to attributes are just normal columns, but they might have some constraints corresponding to the equations in the presentation.
Formally, we define an attributed C-set on the schema to be a 2-cell in that looks like this:
Let’s pick apart this definition. First of all, we have an ordinary C-set . This is the “core” of an attributed C-set, and we should understand this fairly well.
Then we have , which gives the attributes types. In this presentation, we have chosen to make the codomain of . However, in Catlab, the codomain of is really the category of Julia types. This is not so well-behaved, but that’s the difference between math and computing for you. Typically, we actually treat as part of the schema. This means that we would be working with -decorated graphs, or -decorated graphs, instead of just graphs decorated by an unspecified type.
Finally, we have , which sends an attribute to a column of the data table for , of type . The profunctor is defined in exactly such a way to make this work: is the set of length- arrays with element type , where denotes the set .2
Another way of thinking about this is that we can treat as a 2-cell from the profunctor to the profunctor . In this guise, sends to an element that represents the column in the data table for . When thought about this way, it becomes more clear how and are analogous even though they seem like they are different shapes.
Back to
Note: for the reader with categorical inclinations, this section will be very interesting. For the reader who is mainly interested in attributed C-sets, we suggest skimming this section and then moving on to the next.
We now know what an attributed C-set is, but in category theory, that’s only half the battle! We must also define the morphisms!
To do this, we need to further investigate . One good way of thinking about profunctors is that they are kind of like matrices (Schultz et al. 2016). That is, if we work with sets instead of categories, a map is a matrix. This analogy goes deeper than one might expect.
In the context of matrices, the following 5 things are all equivalent.
- A map
- A map
- A map
- A linear map .
- A linear map .
It turns out that this is directly analogous to profunctors. A profunctor has the following equivalent forms.
- A functor .
- A functor .
- A functor .
- A colimit-preserving functor .
- A colimit-preserving functor . This is for the same reasons as above.
The equivalence of 1, 2, and 3 are provided by the product-hom adjunction in . However, understanding 4 and 5 takes some work. We explain this by analogy, rather than by rigorous proof. The analogy proceeds in several steps.
- Just as the canonical basis vectors of are in canonical bijection with , the representable functors (i.e. functors of the form for ) are in canonical isomorphism with (by the map , of course!). This is important, because in the next step we will see that the representable functors form a basis of sorts for .
- Just as any element of can be written canonically as a linear combination of basis vectors, any element of can be written canonically as a colimit of representable functors.
- Therefore, just a linear map is uniquely determined by its action on a basis, which is the same thing as a map , a colimit-preserving functor is uniquely determined by its action on the representable functors, which is the same thing as a map .
To go beyond the presentation in terms of analogy, we encourage the reader to consult a standard text on category theory, such as (Riehl 2016), and look up the Yoneda lemma and the writing of a presheaf as a canonical colimit of representables.
The fourth equivalent form is the most useful form for defining the composition of and . That is, simply compose and ! This works because colimits are preserved across the composition. Additionally, it is clear what the identity is: !
However, it is also useful to have an explicit formula for composition and identity in terms of profunctors as functors . This is provided by a funny little colimit called a coend, that looks like this:
Essentially, is the coproduct modulo the equivalence relation that for any , , and ,
This gives a rather neat explanation of the identity
Namely, to make the function , we map to , and to make the function , we map to . The reader can verify that this defines a bijection.
The reader is encouraged to verify that taking the identity on and contorting it into a map gives precisely .
Now, this gives us horizontal composition of horizontal arrows. We still need horizontal composition of 2-cells. However, the seeds of this composition are already sewn in the definition of horizontal composition as a colimit. If and are 2-cells, as depicted in the diagram
then we can use the universal property of the coend to define
That is, for each we have a map , and each element of is represented by at least one for some , so we just apply to that representative, and we get a representative of something in .
With this, we have a full definition of the double category . Before we get back to attributed C-sets, we will quickly discuss one interestesting feature of this category.
The question is, what is a 2-cell with a frame that looks like this:
Given that the top and bottom are the identity for horizontal composition, one might suggestively write this as
This implies that might be a natural transformation between and . Let’s see if that holds up. Define . Suppose that in . Then by naturality of ,
However, this is precisely the naturality condition for in disguise, as , and .
Moreover, if we have a natural transformation , we can define by
Therefore, the 2-category is hiding in !
We now have all the technology we need to finish the definition of attributed C-sets.
The category of attributed C-sets on a schema
A morphism of C-sets is fairly easy to define: it is just a natural transformation between two functors . If and are attributed C-sets on the schema , one natural definition for morphisms between them is such that preserves attributes. This is most clearly seen if we introduce some notation. If and , we define to be . That is, is the element of that assigns to . The reason for this notation is that it reminds us of dot-syntax in programming languages, where swiss.cheese
returns the value of the field cheese
on the object swiss
. Viewing as a row in a database, is the value of column at row .
In any case, if is a natural transformation, we say that it preserves attributes if for all , , , and .
For the reader who was paying attention in the last section, it turns out that this condition can be very succinctly stated categorically as , or in commutative diagram form as
To verify this, one simply needs to chug through the definition of and the definition of . This defines the category of attributed C-sets over a schema as a sort of double category-theoretic analogue of a slice category.
The only thing remaining is to name this category! We have chosen to name it , which can be shortened to if and are clear from the context.
However, this is not the only possible definition for a category with attributed C-sets as objects; there are other types of morphisms that one might want to consider. For instance, we could let a morphism be a commutative diagram of the form
Here would be a natural transformation between , so would be an analogous 2-cell to . In this type of morphism, attributes are allowed to change, but they need to be related by a natural transformation. We are less likely to use this kind of morphism in Catlab for the simple reason that Julia functions don’t serialize, and Julia functions are the homs in our implementation of . However, it is an equally valid definition. A special case of this would be .
Both of these categories are instances of a more general construction which takes slice categories into the double-category realm. The properties and full definition of this more general construction have yet to be determined, but we will likely cover this in a future blog post.
Variations and extensions
A simple variation on attributed C-sets would be to replace with the category of finite sets and partial maps, or replace with the profunctor that takes to the set of partial maps from to . The first variation is actually in Catlab, because it turned out to be convenient.
However, there are also many other possiblilities. For instance, we could allow to have finite products, and force to be product-preserving, or replace with . One open problem is how to specify that some attributes or morphisms should end up being injective. It is clear that we have just scratched the surface in terms of things that fit this pattern, and we are excited to see what the future has in store for attributed things!
References
Footnotes
One way of thinking about this is by analogy to group actions on a set. If and are single-element categories with all invertible morphisms (i.e. groups), then a profunctor is precisely a set with a left action of and a right action of .↩︎
In an actual implementation, we do not store the column for all of the attributes, just the generating attributes. This saves space.↩︎