Programming with SageMath

This lesson gives an introduction to SageMath. It is …

The lesson will lead the learner along the path …

On this path, the learner will became familiar with:

basic constructions of the programming language,
ways to find necessary information in the system, and
good design practices to organise code into complex programs.

Prerequisites

The lesson is oriented on learners possessing the minimal theoretical background (at least at the undergraduate level) and willing to learn how the concepts from abstract algebra may be explored using computational tools. No previous experience of working with SageMath is required.

Learners need to understand the concepts of files and directories (including home and working directories) and know only how to start SageMath.

Overview

In this Software Carpentry lesson we will explore some of the functionalities of SageMath, exemplified by exploring the mathematical underpinnings of icosahedral virus structure.

Most viruses have an icosahedral protein capsid encapsulating and protecting the genome. The reasons for this are essentially genetic: the nucleic acid (DNA or RNA) needed to encode a protein shell consisting of a single protein would be so large that it would not actually fit inside this protein shell. Thus the protein shell has to be assembled from one or several identical building blocks. The simplest way of assembling identical building blocks into a container is by using symmetry. This means the problem of virus structure is essentially geometric, rather than genetic/biochemical etc, and given by the regular solids. Those are the Platonic solids - we will also be interested in their reflection symmetry groups, and the related concepts of root systems which generate these reflection groups from a collection of vectors (so-called roots or root vectors), which define mirror planes at specific sets of angles to each other.

To start with, we will perform several standard procedures such as defining vectors and matrices in Sage, and then perform multiplication of a vector by a matrix, which are very widely applicable.

We then take a set of vectors as a starting point. Those vectors define mirror planes via the planes that these vectors are normal to. One can thus find a set of matrices that describe reflections in those mirror planes. We then define our first function: it takes a (root) vector as its argument and returns the matrix that describes the reflection in the hyperplane that it is normal to.

These matrices can be multiplied together to generate new matrices. We thus generate a growing list of such matrices by multiplying together more and more of these simple reflections. Python is very powerful for handling lists, and skillfully manipulating lists is thus a very general technique.

The examples we will look at are such that these lists remain finite, i.e. at some point the continued multiplication of matrices stops generating something new. This is the idea of closure: the product of any two matrices in the complete list is contained somewhere in this list. We therefore want to define some new functions: checking whether a given matrix is equal to another; checking whether a given matrix is already contained in the list; and given some starting simple reflections/roots, to generate the whole list.

This list contains all the symmetry transformations of a Platonic solid, and is encapsulated in the mathematical notion of a group: for instance, the icosahedron has 120 different symmetry transformations, i.e. reflections, rotations etc, that leave the icosahedron invariant; these 120 group elements can all be generated by multiplying together three simple reflections given by 3 root vectors.

So this example gives us a set of 120 matrices. We can pick up our earlier example on how these act on vectors. If we feed one vector to these 120 matrices we in general get a set of 120 vectors. This set of vectors can be visualised as a polyhedron in 3D.

We will generate several such polyhedra. We will see that when our seed vector lies on certain symmetry axes, then the cardinality (number of points) of this polyhedron will be lower than 120, since some of the points actually coincide (are degenerate). In this way, we will generate the icosahedron (12 vertices), the dodecahedron (20 vertices) and the icosidodecahedron (30 vertices).

The 120 group elements correspond to various rotations, reflections etc of the polyhedra, that look identical afterwards. There is an identity operation, the 3x3 identity matrix, that leaves everything the same; every operation can be undone by the matrix inverse; we alluded to the fact that the set of matrices closes under matrix multiplication; technically, the final axiom of a group is also satisfied since matrix multiplication is associative. So we construct the list of inverses to our matrices. We also look at what kind of operation the 120 group elements are. E.g. the icosahedron consists of 20 triangles, i.e. each midpoint of a triangle is a centre of 3-fold symmetry: everything looks the same after rotation by 0, 120 or 240 degrees. Similarly, since five triangles meet in each vertex, such vertices are centres of 5-fold symmetry. That is, if one performs this operation five times, one gets back to the identity. We are therefore interested in the power of each of the 120 matrices that gives the identity matrix. This is called the order of each element. We also calculate the determinant (+1=rotation, -1=reflection etc) and the trace (character) of each element, as well as the sets of elements (x,y) that are pairwise conjugate to each other via $x=gyg^{-1}$ for some matrix g. The group theoretical motivation is not crucial here, it is an easy operation to perform, and illustrates a good use of lists with a nice algorithm.

The functions that check equality of matrices, or whether a matrix is already contained in a given set, is usually straightforward for polyhedral symmetries with nice integer values etc. However, the fact that 5-fold symmetry arises in the context of the icosahedral group means that the golden ratio $\tau = 1/2(1+\sqrt(5))$ makes an appearance. This slight awkwardness can be dealt with in several different ways - e.g. using the recurrence relation $\tau^2=\tau+1$, or numerically, or symbolically - which we will explore.

So a virus can build an icosahedral shell out of 20 triangular building blocks - or 60 kite-shaped building blocks with 3 assembling into an equilateral triangle. So genetically, the difficulty for viruses is to encode triangles, not how many to make. So there evolution comes up with something interesting! Rather than making each of the 20 icosahedral faces out of once such triangle, one can use more (say 4) to make larger icosahedra or smaller building blocks. This makes genetic and evolutionary sense for the virus. The number of small triangles per icosahedral face is called the (surprise!) triangulation number (or T-number for short). Not all such numbers are allowed - there is a general construction using a hexagonal lattice that constructs triangulations for T-numbers $T=h^2+hk+k^2$, where $h$ and $k$ are integers. We will also calculate some allowed T-numbers, as well as finding pairs of $h$ and $k$ that give rise to certain T-numbers such as 499. We display polytopes, such as the icosahedron, and a $T=4$ triangulation. We also construct the case of Human Papillomavirus, which has one of the ‘forbidden’ T-numbers, which required the invention of new mathematics to describe virus structure!

Actually, some of the simpler viruses have evolved a very clever way of assembling, that depends on the RNA being there. It serves as a washing line, onto which the building blocks are successively pegged, growing an icosahedron, rather than letting it to assemble on its own. This leads to the mathematical idea of a Hamltonian path or cycle, that visits a certain set of vertices of a polyhedron exacly once. The polyhedra in question are a bit too complicated, but we can look at a simple toy model!

Schedule

	Setup	Download files required for the lesson
00:00	1. First session with SageMath	…
00:30	2. Polynomials and linear algebra	…
01:00	3. Elements of programming and 3D graphics	…
01:30	4. Matrix Groups	…
02:00	5. Rotational Subgroup	…
02:30	6. Orbits	…
03:00	7. Display polytopes	…
03:30	8. IDD	…
04:00	9. Conjugacy Classes	…
04:30	10. Number and Dimensions of Irreducible Representations	…
05:00	11. More advanced Caspar-Klug orbits and pymol visualisation	…
05:30	12. Cartan matrix and Perron-Frobenius eigenvector	…
06:00	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.