The trace of a linear map is as useful as it is enigmatic. What is the trace, really? Why should adding the diagonal elements of any representation matrix of a linear endomorphism (a map of a space to itself) give something that is invariant under change of basis and interacts nicely with matrix multiplication?
Definition
The shortest definition of the trace is, unsurprisingly, the algebraic one. Take a linear function $f : V \to V$ on a finite-dimensional vector space $V$. Then $f$ has eigenvalues corresponding to the zeros of its characteristic polynomial [1]. If we multiply each eigenvalue by its multiplicity and then add everything up we get the trace of $f$, denoted by $\operatorname{tr}(f)$.
A different definition is to let $A$ be the matrix representation of $f$ in some basis, defining the trace to be the sum of its diagonal entries. Clearly the two definitions are equivalent, and hence the trace is invariant under similarity transformations. Nice. You should have shuddered at the use of “clearly”, but we can prove it with a bit of effort:
The characteristic polynomial of $A$ is a polynomial, so (over $\mathbb{C}$) we can factor it and write $\det(\lambda I - A) = \prod_{i=1}^{n} (\lambda - \lambda_i)$, where the $\lambda_i$ are the eigenvalues of $A$, repeated according to multiplicity. Looking at the $\lambda^{n-1}$ term of this polynomial more closely reveals that it is equal to $-\left(\sum_{i=1}^{n} \lambda_i\right) \lambda^{n-1}$. However, the determinant can also be written as a sum over products of $n$ elements where the row and column of an element in such a product may appear only once (and each product is weighted by +1 or -1). As $\lambda$ appears only on the diagonal of the matrix, the only terms contributing to the $\lambda^{n-1}$ term of the characteristic polynomial must be those that contain at least $n-1$ diagonal entries, but by the “each row and each column may occur only once” rule the only contribution that matters must be the term $\prod_{i=1}^{n} (\lambda - a_{ii})$, where the diagonal entries of $A$ are given by $a_{ii}$. Extracting the $\lambda^{n-1}$ term of this expression proves that $\sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} a_{ii}$.
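The permutation argument can be checked mechanically. The following is a minimal sketch, not a practical algorithm (it loops over all $n!$ permutations): it expands $\det(\lambda I - A)$ as a sum over permutations, with polynomials stored as coefficient lists, and compares the $\lambda^{n-1}$ coefficient with the negated sum of the diagonal. The sign convention $\det(\lambda I - A)$, the helper names and the example matrix are assumptions made for illustration.

```python
from itertools import permutations

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of lambda)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def char_poly(A):
    """Coefficients of det(lambda*I - A) via the sum-over-permutations formula."""
    n = len(A)
    total = [0] * (n + 1)
    for perm in permutations(range(n)):
        term = [sign(perm)]
        for row, col in enumerate(perm):
            # Entry (row, col) of lambda*I - A, written as a polynomial in lambda.
            entry = [-A[row][col], 1] if row == col else [-A[row][col]]
            term = poly_mul(term, entry)
        for k, c in enumerate(term):
            total[k] += c
    return total

# Arbitrary example matrix (an assumption for illustration).
A = [[2, -1, 0],
     [3, 5, 4],
     [1, 0, -3]]
n = len(A)
coeffs = char_poly(A)
trace = sum(A[i][i] for i in range(n))

# The lambda^(n-1) coefficient should equal minus the sum of the diagonal.
print(coeffs[n - 1], -trace)
```

Only the identity permutation contributes to the $\lambda^{n-1}$ coefficient here: every other permutation moves at least two indices, so its product has degree at most $n-2$ in $\lambda$.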
A geometric picture
I find the algebra above unintuitive and unsatisfying. Recall that the determinant of $A$ has a nice geometric property: it tells us how much $A$ expands the volume of objects. Luckily, there is a similar expression for the trace!
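The linear map $A$ can be associated with the vector field $x \mapsto Ax$. This vector field, in turn, can be associated with the ordinary differential equation $\dot{x} = Ax$, and given $x(0) = x_0$ we write $\phi_t(x_0) = e^{tA} x_0$ for the solution of this ODE at time $t$. We may wonder how much the volume of an object changes under applications of $\phi_t$. Let’s write $\operatorname{vol}(t)$ for the volume of some object that “flows” with the vector field above, where $\operatorname{vol}(0) = 1$. With a bit of effort, we find that $\frac{d}{dt} \operatorname{vol}(t) = \operatorname{tr}(A) \operatorname{vol}(t)$ (the divergence of our vector field is the trace of $A$, and the divergence measures ‘infinitesimal’ mass gain/loss), and so $\operatorname{vol}(t) = e^{t \operatorname{tr}(A)}$. Evaluating at $t = 1$, we recover the formula $\det(e^{A}) = e^{\operatorname{tr}(A)}$; after all, $\det(e^{A})$ is the volume change of the whole map $\phi_1 = e^{A}$.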
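As a numerical sanity check of the volume picture, here is a small sketch that compares the area scaling of the time-$t$ flow map, $\det(e^{tA})$, with $e^{t \operatorname{tr}(A)}$ for a $2 \times 2$ example. The matrix exponential is approximated by a truncated Taylor series, which is an assumption made to keep the example dependency-free (it is not how one would compute it in earnest), and the matrix and helper names are made up for illustration.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, t=1.0, terms=40):
    """Truncated Taylor series for exp(t*A); fine for small matrices with modest entries."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]  # holds (t*A)^k / k!
    for k in range(1, terms):
        power = mat_mul(power, A)
        power = [[t * x / k for x in row] for row in power]
        result = [[result[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return result

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Arbitrary example matrix (an assumption for illustration).
A = [[0.5, 2.0],
     [-1.0, 0.3]]
trace = A[0][0] + A[1][1]

for t in (0.5, 1.0, 2.0):
    volume_factor = det2(mat_exp(A, t))  # area scaling of the time-t flow map
    print(t, volume_factor, math.exp(t * trace))
```

The two printed columns should agree up to floating-point error for every $t$, not just $t = 1$, which is the content of the differential equation for the volume.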
The cyclic property
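We can look at $\operatorname{tr}(AB)$ for two $n \times n$ matrices $A$ and $B$. Here we have (with the ‘sum of diagonals property’) that $\operatorname{tr}(AB) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ji}$, which is symmetric in $A$ and $B$. So clearly $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and this is very nice. What this means is that we have some kind of invariant that doesn’t care about the order in which two matrices are multiplied. Unfortunately, free reordering only works for two matrices, but associativity of matrix multiplication rescues the result for cyclic permutations. Thus $\operatorname{tr}(ABC)$ is not in general the same as $\operatorname{tr}(ACB)$, but it is equal to $\operatorname{tr}(BCA)$ and to $\operatorname{tr}(CAB)$.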
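A concrete instance makes the difference between cyclic and arbitrary permutations visible. The sketch below uses three small integer matrices, chosen arbitrarily for illustration, and compares the traces of the various products.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))

# Arbitrary example matrices (assumptions for illustration).
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 0], [0, 2]]

tr_AB = trace(mat_mul(A, B))
tr_BA = trace(mat_mul(B, A))

tr_ABC = trace(mat_mul(mat_mul(A, B), C))
tr_BCA = trace(mat_mul(mat_mul(B, C), A))
tr_CAB = trace(mat_mul(mat_mul(C, A), B))
tr_ACB = trace(mat_mul(mat_mul(A, C), B))  # not a cyclic shift of ABC

print(tr_AB, tr_BA)            # 5 5
print(tr_ABC, tr_BCA, tr_CAB)  # 8 8 8
print(tr_ACB)                  # 7
```

The cyclic case reduces to the two-matrix case by grouping: $\operatorname{tr}((AB)C) = \operatorname{tr}(C(AB))$. The same grouping gives $\operatorname{tr}(P^{-1}AP) = \operatorname{tr}(APP^{-1}) = \operatorname{tr}(A)$, which is the invariance under similarity transformations.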
Though easy to prove algebraically, this is really weird in terms of the geometric picture above: what on earth is “flow with $AB$”?
Further reading
I’m quite unhappy with the lack of a good answer to the last question, but this post has been sitting in my “drafts” folder for over a year now so I will tentatively publish it, especially given that the material is so basic. There’s a lot more that can be said about the trace, especially if one goes into the world of differential geometry. I haven’t seen the cyclic property of the trace pop up much there, but possibly it is related to certain “almost-commutativity” properties of (pseudo)-differential operators. I will probably amend this post in the future.
Here is a nice mathoverflow question, which was the basis for most of this post.
[1] The characteristic polynomial of a linear map $f$ is given by $p(\lambda) = \det(\lambda I - f)$, where $I$ is the identity on $V$. To take the determinant of such an abstract linear map, just take the determinant of any representation matrix of $f$; determinants are invariant under change of basis, so it doesn’t matter which one (as long as you use the same basis ‘on both sides’ for the representation matrix).