May contain traces of linear algebra

The trace of a linear map is as useful as it is enigmatic. What is the trace, really? Why should adding the diagonal entries of any representation matrix of a linear endomorphism (a map from a space to itself) give something that is invariant under change of basis and interacts nicely with matrix multiplication?

Definition

The shortest definition of the trace is, unsurprisingly, the algebraic one. Take a linear map $\varphi : \mathbb{C}^n \rightarrow \mathbb{C}^n$. Then $\varphi$ has eigenvalues, which are the zeros of its characteristic polynomial (see note 1 below). If we multiply each eigenvalue by its algebraic multiplicity and add everything up, we get the trace of $\varphi$, denoted by $tr(\varphi)$.

A different definition is to let $A$ be a matrix representation of $\varphi$ and define the trace to be the sum of its diagonal entries. Clearly the two definitions are equivalent, and hence the trace is invariant under similarity transformations (the eigenvalues do not depend on the choice of basis). Nice. You should have shuddered at the use of “clearly”, but we can prove it with a bit of effort:

The characteristic polynomial of $\varphi$ is a polynomial of degree $n$ over $\mathbb{C}$, so we can factor it and write $\chi_\varphi (\lambda) = \prod_{i = 1}^n ( \lambda - \lambda_i )$, where the $\lambda_i$ are the eigenvalues of $\varphi$, repeated according to multiplicity. Expanding the product shows that the coefficient of $\lambda^{n-1}$ is $-\sum_{i=1}^n \lambda_i = -tr(\varphi)$. On the other hand, $\chi_\varphi(\lambda) = det(\lambda I - A)$, and a determinant can be written as a sum over products of $n$ entries, where each row and each column contributes exactly one entry to each product (and each product is weighted by $+1$ or $-1$). As $\lambda$ appears only on the diagonal of $\lambda I - A$, the only products contributing to the $\lambda^{n-1}$ term are those containing at least $n-1$ diagonal entries. But by the “each row and each column occurs exactly once” rule, a product with $n-1$ diagonal entries must also contain the remaining one, so the only contribution comes from $\prod_{i=1}^n (\lambda - d_i)$, where the $d_i$ are the diagonal entries of $A$. Its $\lambda^{n-1}$ coefficient is $-\sum_{i=1}^n d_i$, and comparing the two expressions proves that $tr(\varphi) = \sum_{i=1}^n d_i$.
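As a sanity check, the $2 \times 2$ case can be written out in full. The entry names $a, b, c, d$ are only illustrative and do not appear elsewhere in this post. For $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ we get

$$\chi_A(\lambda) = det \begin{pmatrix} \lambda - a & -b \\ -c & \lambda - d \end{pmatrix} = (\lambda - a)(\lambda - d) - bc = \lambda^2 - (a + d)\lambda + (ad - bc).$$

Comparing with $(\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1 \lambda_2$ gives $\lambda_1 + \lambda_2 = a + d$. Note that the off-diagonal product $bc$ only affects the constant term, exactly as the argument above predicts.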

A geometric picture

I find the algebra above unintuitive and unsatisfying. Recall that the determinant of $\varphi$ has a nice geometric meaning: it tells us by what factor $\varphi$ scales the volume of objects. Luckily, there is a similar interpretation of the trace!

The linear map $\varphi$ can be associated with the vector field $x \mapsto \varphi(x)$. This vector field, in turn, can be associated with the ordinary differential equation $x'(t) = \varphi(x(t))$, and given $x(0) = x_0$ we write $\exp(t\varphi)(x_0)$ for the solution of this ODE at time $t$. We may wonder how much the volume of an object changes under application of $\exp(t\varphi)$. Let’s write $v(t)$ for the volume of some object that “flows” with the vector field above, where $v(0) = 1$. With a bit of effort, we find that $v'(t) = tr(\varphi)v(t)$ (the divergence of our vector field is the trace of $\varphi$, and the divergence measures the ‘infinitesimal’ rate of volume gain or loss), and so $v(t) = \exp(t\cdot tr(\varphi))$. Evaluating at $t=1$, we recover the formula $det(\exp(\varphi)) = \exp(tr(\varphi))$; after all, $det(\exp(\varphi))$ is the factor by which the map $\exp(\varphi)$ scales volume.
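The identity $det(\exp(\varphi)) = \exp(tr(\varphi))$ is easy to test numerically. The following is a minimal sketch in plain Python, assuming a $2 \times 2$ real matrix and a truncated power series for the matrix exponential. The helper names (`matmul`, `mat_exp`, `det2`, `volume_factor`) and the sample matrix are made up for this illustration, and the truncation at 30 terms is an assumption that is only reasonable for matrices with small entries.

```python
import math


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(X, terms=30):
    """Approximate exp(X) by the truncated power series sum_k X^k / k!."""
    n = len(X)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds X^k / k!, starting at k = 0
    for k in range(1, terms):
        term = matmul(term, X)
        term = [[entry / k for entry in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def det2(X):
    """Determinant of a 2x2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def volume_factor(X, t):
    """Factor by which the flow exp(tX) scales area, i.e. det(exp(tX))."""
    tX = [[t * entry for entry in row] for row in X]
    return det2(mat_exp(tX))


# Hypothetical sample matrix with trace 0.5 + 0.2 = 0.7.
A = [[0.5, 1.0], [-1.0, 0.2]]
trace_A = A[0][0] + A[1][1]

lhs = volume_factor(A, 1.0)   # det(exp(A))
rhs = math.exp(trace_A)       # exp(tr(A))
print(lhs, rhs)
```

Calling `volume_factor(A, t)` for other small values of `t` and comparing against `math.exp(t * trace_A)` checks the full statement $v(t) = \exp(t \cdot tr(\varphi))$, not just the case $t = 1$.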

The cyclic property

We can look at $tr(AB)$ for two $n \times n$ matrices $A$ and $B$. By the ‘sum of diagonals’ property, $tr(AB) = \sum_{i,j=1}^n A_{i,j}B_{j,i}$. This expression is symmetric under swapping $A$ and $B$, so $tr(AB) = tr(BA)$, and this is very nice. It means we have an invariant that doesn’t care about the order in which two matrices are multiplied. Unfortunately, for three or more matrices this does not extend to arbitrary reorderings, but associativity of matrix multiplication rescues it for cyclic permutations: $tr(ABC) = tr(A(BC)) = tr((BC)A) = tr(BCA)$. Thus $tr(ABC)$ is not in general the same as $tr(BAC)$, but it is equal to $tr(BCA)$.
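A small example with integer matrices makes the distinction between cyclic and non-cyclic reorderings concrete. This is only a sketch: the three matrices are arbitrary choices for illustration, and `matmul` and `trace` are hypothetical helper names.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))


# Arbitrary sample matrices, chosen so that the non-cyclic case differs.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[1, 1], [0, 2]]

# Two factors: the order does not matter.
tr_AB = trace(matmul(A, B))
tr_BA = trace(matmul(B, A))

# Three factors: BCA is a cyclic shift of ABC, BAC is not.
tr_ABC = trace(matmul(matmul(A, B), C))
tr_BCA = trace(matmul(matmul(B, C), A))
tr_BAC = trace(matmul(matmul(B, A), C))

print(tr_AB, tr_BA)            # equal
print(tr_ABC, tr_BCA, tr_BAC)  # first two equal, third differs
```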

Though easy to prove algebraically, this is really weird in terms of the geometric picture above: what on earth is “flow with $AB$”?

1. The characteristic polynomial of a linear map $\varphi: V \rightarrow V$ is given by $\chi_\varphi(\lambda) = det( \lambda I - \varphi)$, where $I$ is the identity on $V$. To take the determinant of such an abstract linear map, just take the determinant of any representation matrix of $\varphi$. Determinants are invariant under change of basis, so it doesn’t matter which one you pick (as long as you use the same basis ‘on both sides’ for the representation matrix).