The trace of a linear map is as useful as it is enigmatic. What is the trace *really*? Why should adding the diagonal elements of any representation matrix of a linear endomorphism (a map of a space to itself) give something that is invariant under change of basis *and* interacts nicely with matrix multiplication?

**Definition**

The shortest definition of the trace is, unsurprisingly, the algebraic one. Take a linear map $f: V \to V$. Then $f$ has eigenvalues $\lambda_1, \dots, \lambda_k$ corresponding to the zeros of its characteristic polynomial^{1}. If we multiply each eigenvalue $\lambda_i$ by its multiplicity $m_i$ and then add everything up we get the trace of $f$, denoted by $\operatorname{tr}(f) = \sum_i m_i \lambda_i$.

A different definition is to let $A$ be a matrix representation of $f$, defining the trace to be the sum of its diagonal entries: $\operatorname{tr}(A) = \sum_i A_{ii}$. *Clearly* the two definitions are equivalent, and hence the trace is invariant under similarity transformations. Nice. You should have shuddered at the use of *clearly*, but we can prove it with a bit of effort:

The characteristic polynomial of $f$ is a polynomial, so we can factor it and write $\chi_f(\lambda) = \det(\lambda I - A) = \prod_{i=1}^{n} (\lambda - \lambda_i)$, where the $\lambda_i$ are the eigenvalues of $f$ including multiplicity. Looking at the $\lambda^{n-1}$ term of this polynomial more closely reveals that its coefficient is equal to $-\sum_{i=1}^{n} \lambda_i$. However, the determinant can also be written as a sum over products of elements where each row and each column of an element in such a product may appear only once (and each product is weighted by $+1$ or $-1$). As $\lambda$ appears only on the diagonal of the matrix $\lambda I - A$, the only terms contributing to the $\lambda^{n-1}$ term of the characteristic polynomial must be those that contain at least $n-1$ diagonal entries, but by the “each row and each column may occur only once” rule the only contribution that matters is the term $\prod_{i=1}^{n} (\lambda - A_{ii})$, where the diagonal entries of $\lambda I - A$ are given by $\lambda - A_{ii}$. Extracting the $\lambda^{n-1}$ term of this expression proves that $\sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} A_{ii}$.
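A quick numerical sanity check of the equivalence (a NumPy sketch; the specific matrices here are made up for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Definition 1: sum of the eigenvalues (with multiplicity).
eig_sum = np.linalg.eigvals(A).sum().real

# Definition 2: sum of the diagonal entries.
diag_sum = np.trace(A)

# Similarity invariance: conjugating by an invertible P leaves the trace alone.
P = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = P @ A @ np.linalg.inv(P)

print(eig_sum, diag_sum, np.trace(B))  # all three agree (here: 5.0)
```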

**A geometric picture**

I find the algebra above un-intuitive and unsatisfying. Recall that the determinant of $f$ has a nice geometric property: it tells us how much $f$ expands the volume of objects. Luckily, there is a similar expression for the trace!

The linear map $f$, with matrix $A$, can be associated with the vector field $v(x) = Ax$. This vector field, in turn, can be associated with the ordinary differential equation $\dot{x}(t) = A x(t)$, and given $x(0) = x_0$ we write $\varphi_t(x_0) = e^{tA} x_0$ for the solution of this ODE at time $t$. We may wonder how much the volume of an object changes under applications of $\varphi_t$. Let’s write $V(t)$ for the volume of some object $\Omega_t = \varphi_t(\Omega_0)$ that “flows” with the vector field above, where $V(0) = \operatorname{vol}(\Omega_0)$. With a bit of effort, we find that $\frac{\mathrm{d}V}{\mathrm{d}t} = \operatorname{tr}(A)\, V(t)$ (the divergence of our vector field is the trace of $A$, since $\operatorname{div}(v) = \sum_i \partial_{x_i} (Ax)_i = \sum_i A_{ii}$, and the divergence measures ‘infinitesimal’ mass gain/loss), and so $V(t) = e^{t \operatorname{tr}(A)} V(0)$. Evaluating at $t = 1$, we recover the formula $\det(e^{A}) = e^{\operatorname{tr}(A)}$ – after all, $\det(e^{A})$ is the volume change of the whole map $\varphi_1 = e^{A}$.
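We can check $\det(e^{A}) = e^{\operatorname{tr}(A)}$ numerically without a matrix-exponential routine by approximating the time-$1$ flow map as a composition of many small Euler steps $(I + A/n)$ of the ODE $\dot{x} = Ax$ (a NumPy sketch; the random matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# Approximate the time-1 flow exp(A) by n Euler steps (I + A/n) of x' = A x.
n = 100_000
step = np.eye(3) + A / n
flow = np.linalg.matrix_power(step, n)

# det(flow) is the volume change of the flow map; it should be ~ exp(tr(A)).
print(np.linalg.det(flow))
print(np.exp(np.trace(A)))  # the two values agree to several digits
```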

**The cyclic property**

We can look at $\operatorname{tr}(AB)$ for two matrices $A$ and $B$. Here we have (with the ‘sum of diagonals’ property) that $\operatorname{tr}(AB) = \sum_i (AB)_{ii} = \sum_{i,j} A_{ij} B_{ji}$. This expression is symmetric in $A$ and $B$, so clearly $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and this is very nice. What this means is that we have some kind of invariant that doesn’t care about the order in which two matrices are multiplied. Unfortunately this only works for two matrices, but using the associativity of matrix multiplication means the result is still saved for cyclic permutations. Thus $\operatorname{tr}(ABC)$ is not in general the same as $\operatorname{tr}(BAC)$, but it is equal to $\operatorname{tr}(CAB)$ and $\operatorname{tr}(BCA)$.
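The cyclic property is easy to verify numerically (a NumPy sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

# tr(AB) = sum_{i,j} A[i,j] * B[j,i] is symmetric in A and B.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Cyclic permutations of a triple product all share the same trace ...
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))

# ... but a non-cyclic reordering generically does not.
print(np.trace(A @ B @ C), np.trace(B @ A @ C))
```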

Though easy to prove algebraically, this is really weird in terms of the geometric picture above – what on earth is “flow with $BA$”?

**Further reading**

I’m quite unhappy with the lack of a good answer to the last question, but this post has been sitting in my “drafts” folder for over a year now, so I will tentatively publish it, especially given that the material is so basic. There’s a lot more that can be said about the trace, especially if one goes into the world of differential geometry. I haven’t seen the cyclic property of the trace pop up much there, but possibly it is related to certain “almost-commutativity” properties of (pseudo-)differential operators. I will probably amend this post in the future.

Here is a nice mathoverflow question, which was the basis for most of this post.

- The characteristic polynomial of a linear map $f: V \to V$ is given by $\chi_f(\lambda) = \det(\lambda \operatorname{id} - f)$, where $\operatorname{id}$ is the identity on $V$. To take the determinant of such an abstract linear map, just take the determinant of any representation matrix of $f$: determinants are invariant under change of basis, so it doesn’t matter which one (as long as you use the same basis ‘on both sides’ for the representation matrix). ↩