May contain traces of linear algebra

The trace of a linear map is as useful as it is enigmatic. What is the trace, really? Why should adding the diagonal elements of any representation matrix of a linear endomorphism (a map from a space to itself) give something that is invariant under change of basis and interacts nicely with matrix multiplication?


The shortest definition of the trace is, unsurprisingly, the algebraic one. Take a linear function \varphi : \mathbb{C}^n \rightarrow \mathbb{C}^n. Then \varphi has eigenvalues corresponding to the zeros of its characteristic polynomial [1]. If we multiply each eigenvalue by its (algebraic) multiplicity and then add everything up, we get the trace of \varphi, denoted by tr(\varphi).

A different definition is to let A be a matrix representation of \varphi and define the trace to be the sum of its diagonal entries. Clearly the two definitions are equivalent, and hence the trace is invariant under similarity transformations. Nice. You should have shuddered at the use of “clearly”, but we can prove it with a bit of effort:

The characteristic polynomial of \varphi is a polynomial, so we can factor it and write \chi_\varphi(\lambda) = \prod_{i=1}^n (\lambda - \lambda_i), where the \lambda_i are the eigenvalues of \varphi, repeated according to multiplicity. Expanding the product shows that the coefficient of \lambda^{n-1} is -\sum_{i=1}^n \lambda_i = -tr(\varphi). However, the determinant det(\lambda I - A) can also be written as a sum of products of n matrix entries, where each row and each column contributes exactly one entry to every product (and each product is weighted by +1 or -1). As \lambda appears only on the diagonal of the matrix, the only products contributing to the \lambda^{n-1} term must contain at least n-1 diagonal entries. But by the “each row and each column may occur only once” rule, a product with n-1 diagonal entries must also contain the remaining one, so the only contribution that matters is the term \prod_{i=1}^n (\lambda - d_i), where the d_i are the diagonal entries of A. The coefficient of \lambda^{n-1} in this expression is -\sum_{i=1}^n d_i, which proves that tr(\varphi) = \sum_{i=1}^n d_i.
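
To make this less abstract, here is a minimal numerical sketch for the 2×2 case. Everything in it is my own illustration rather than part of the argument: the helper names (eigenvalues_2x2, diagonal_sum, mat_mul, inverse_2x2) and the matrices A and S are arbitrary choices. It checks that the eigenvalues sum to the diagonal sum, and that the diagonal sum survives a change of basis S A S^{-1}.

```python
import cmath
from fractions import Fraction

def eigenvalues_2x2(m):
    """Zeros of lambda^2 - (a + d) * lambda + (a * d - b * c), via the quadratic formula."""
    (a, b), (c, d) = m
    s = a + d                # minus the coefficient of lambda
    p = a * d - b * c        # the constant term (the determinant)
    root = cmath.sqrt(s * s - 4 * p)
    return (s + root) / 2, (s - root) / 2

def diagonal_sum(m):
    return sum(m[i][i] for i in range(len(m)))

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)   # assumed to be non-zero
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [-3, 5]]   # hypothetical example with non-real eigenvalues
S = [[1, 2], [3, 5]]    # hypothetical invertible change-of-basis matrix
A_new_basis = mat_mul(mat_mul(S, A), inverse_2x2(S))

lam1, lam2 = eigenvalues_2x2(A)
print(lam1 + lam2)                 # eigenvalue sum, equal to 7 up to rounding
print(diagonal_sum(A))             # 7
print(diagonal_sum(A_new_basis))   # 7
```

The entries of A_new_basis look nothing like those of A, but the diagonal sum is unchanged.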

A geometric picture

I find the algebra above un-intuitive and unsatisfying. Recall that the determinant of \varphi has a nice geometric property as it tells us how much \varphi expands the volume of objects. Luckily, there is a similar expression for the trace!

The linear map \varphi can be associated with the vector field x \mapsto \varphi(x). This vector field, in turn, can be associated with the ordinary differential equation x'(t) = \varphi(x(t)), and given x(0) = x_0 we write \exp(t\varphi)(x_0) for the solution of this ODE at time t. We may wonder how much the volume of an object changes under application of \exp(t\varphi). Let’s write v(t) for the volume of some object that “flows” with the vector field above, where v(0) = 1. With a bit of effort, we find that v'(t) = tr(\varphi)v(t) (the divergence of our vector field is the trace of \varphi, and the divergence measures ‘infinitesimal’ mass gain/loss), and so v(t) = \exp(t\cdot tr(\varphi)). Evaluating at t=1, we recover the formula det(\exp(\varphi)) = \exp(tr(\varphi)) – after all, det(\exp(\varphi)) is the volume change of the whole map.
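
The formula det(\exp(\varphi)) = \exp(tr(\varphi)) is easy to sanity-check numerically. The sketch below is only an illustration under my own assumptions: it approximates the matrix exponential by a truncated power series (fine for a small matrix with small entries, not a robust general method), and the names mat_exp, det_2x2 and the matrix A are made up for the example.

```python
import math

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=40):
    """Truncated power series: sum of a^k / k! for k = 0, ..., terms - 1."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # a^0 / 0!
    for k in range(1, terms):
        term = mat_mul(term, a)                # now a^k / (k-1)!
        term = [[entry / k for entry in row] for row in term]   # a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[0.3, 1.0], [-0.5, 0.2]]                  # hypothetical example matrix
volume_factor = det_2x2(mat_exp(A))            # det(exp(A))
expected = math.exp(A[0][0] + A[1][1])         # exp(tr(A))
print(abs(volume_factor - expected) < 1e-9)
```

Replacing A by t·A for some other t checks v(t) = \exp(t\cdot tr(\varphi)) in the same way.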

The cyclic property

We can look at tr(AB) for two matrices A and B. Here we have (with the ‘sum of diagonals property’) that tr(AB) = \sum_{i,j=1}^n A_{i,j}B_{j,i}. This expression does not change if we swap the roles of A and B (just rename the indices), so clearly tr(AB) = tr(BA), and this is very nice. What this means is that we have some kind of invariant that doesn’t care about the order in which two matrices are multiplied. Unfortunately, arbitrary reordering only works for two matrices, but thanks to the associativity of matrix multiplication the result is still saved for cyclic permutations: tr(ABC) = tr(A(BC)) = tr((BC)A) = tr(BCA). Thus tr(ABC) is not in general the same as tr(BAC), but it is equal to tr(BCA).
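
A small concrete check of the cyclic property, as a sketch: the three integer matrices below are arbitrary choices of mine, picked so that the non-cyclic reordering really does give a different number.

```python
def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

# Hypothetical example matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 0], [0, 2]]

tr_AB = trace(mat_mul(A, B))
tr_BA = trace(mat_mul(B, A))
tr_ABC = trace(mat_mul(mat_mul(A, B), C))
tr_BCA = trace(mat_mul(mat_mul(B, C), A))
tr_CAB = trace(mat_mul(mat_mul(C, A), B))
tr_BAC = trace(mat_mul(mat_mul(B, A), C))

print(tr_AB, tr_BA)             # 5 5
print(tr_ABC, tr_BCA, tr_CAB)   # 8 8 8
print(tr_BAC)                   # 7
```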

Though easy to prove algebraically, this is really weird in terms of the geometric picture above – what on earth is “flow with AB”?

Further reading

I’m quite unhappy with the lack of a good answer to the last question, but this post has been sitting in my “drafts” folder for over a year now so I will tentatively publish it, especially given that the material is so basic. There’s a lot more that can be said about the trace, especially if one goes into the world of differential geometry. I haven’t seen the cyclic property of the trace pop up much there, but possibly it is related to certain “almost-commutativity” properties of (pseudo)-differential operators. I will probably amend this post in the future.

Here is a nice mathoverflow question, which was the basis for most of this post.

  1. The characteristic polynomial of a linear map \varphi: V \rightarrow V is given by \chi_\varphi(\lambda) = det(\lambda I - \varphi), where I is the identity on V. To take the determinant of such an abstract linear map, just take the determinant of any representation matrix of \varphi. Determinants are invariant under change of basis, so it doesn’t matter which one (as long as you use the same basis ‘on both sides’ for the representation matrix).

Knowledge as the Only Modern Value


Source, though really it’s just John 8:32

This is a commentary on Lou Keep’s piece on HyperNormalisation.

My aim is to repeat what he says, and this warrants an explanation as Lou is an excellent writer. I’ll save that for a future blog post, but the short version is that writing about it forces me to actually understand his argument and condense it into something aiming to be clear, concise, and without words like “jeremiad”.  I apologize to Lou in advance for disfiguring his piece past the point of recognition:

HyperNormalisation – Now in Technicolor

The BBC documentary of the same name is less important. What is important is this argument, which you may recognize from elsewhere: people in the modern world are being fed false facts [by the media]. This causes them to be complacent [as opposed to revolting]. Lou’s 5,000 word essay looks at this statement (from now on “statement” in bold), and uses it to make a point. This point is (more or less) that modern society ignores the is/ought problem and acts like knowing facts is sufficient for doing, when in reality different people respond to different facts in different ways depending on their values. Lou uses the word “truth”, but I’ll stick with “facts” as that is what is meant, and the word “truth” has historically been seen as something distinct.

The Obvious

Every good piece of writing tends to have arguments you already know about and agree with. For me, these were the following. Firstly, from the you’re-not-stuck-in-traffic-you-are-traffic-department:

I would say that’s a neat trick, “Look over there at that media, not we media”, but it’s not a trick. I think he actually believes it, as do other members of the media. This is terrifying

Then, from the there-is-nothing-new-under-the-sun-department:

Julius Caesar was reinterpreted as a Deity, and prayed to as such. How are we to interpret this if “lie becoming truth” is characteristic of modernity?

Both of these points are easy to understand, and there’s nothing groundbreaking about them. They are more or less consistent with the statement, and not the main point of Lou’s piece, because the main point is

The Is/Ought Problem

People making the statement assume that falsehoods cause complacency. The underlying assumption is that if people knew the facts, then they would act differently.  But knowing something (Is) doesn’t imply action (Ought):

This assumes that “truth” has some kind of power. I mean, if lies do, then truth definitely does. Use truth in exchange, enough of it will slay the demon […] Truth, a rote pile of facts and neato information, results in nothing.

A specific example:

The fact that 18% of Americans think the sun moves around the earth has no motive force behind it. What do you do with it? 82% of you will mock the dumbasses, and 18% will not get why they’re being mocked. Those are different responses, in case you weren’t aware of that, i.e. this simple truth doesn’t have any inherent action underlying it.

Or a corollary: if you’re told what action someone takes, then that doesn’t tell you what facts they know (and vice versa). Lou’s point is that in modern discourse, people making the statement don’t get this, and incorrectly assume that falsehood is the only possible reason for complacency. This makes a lot of things that previously made less sense to me make more sense.

I have good news for anyone who comes  across an “inconvenient truth” and bad news for those hoping to spread them: none of them mean anything.

What you call “truth”, i.e. a bushel of factoids, leafed together solely with the pithy twine of your self-regard, doesn’t do anything. It doesn’t make people act, it doesn’t make them think. Assuming that it does is madness, as though properly manipulating a syllogism will finally make “change” “occur”.

If “truth” dictates action, and if people don’t act how you think they would  if they had the truth, then:


Step one: Truth makes people act (how I want them to).
Step two: But the people are not acting (how I want them to).
Step three: They must not have the truth, because of […].

This is interesting, and I think is related to how people don’t realize how diverse people’s thoughts (and values) can be. Lou ties this to that other modern pathology – narcissism – and of course to nihilism:

Nihilism is the period at which our highest values become unsustainable. It doesn’t look like bombs and leather jackets. It looks like ashen-faced, Serious Men puking trivialities and staring slack-jawed when this fails to provoke anything.

I’m not sure I agree, more on that below. But there’s still the question about whether or not the manipulation part of the statement itself is true or not.

You need someone so good at lying and distorting that they can annihilate the entirety of the internet, and of public education, and of…

But if we disregard that, and assume facts really were misrepresented on a wide scale, who would be easiest to fool?

Educated people are more susceptible to manipulation by the media

The problem with elites is that they’re smarter than the average rube, and they know it, which is why they’ll never get the point. They’re smarter because they do read the journals the periodicals and the magazines. They’re “informed”. But being informed means no filter, i.e. direct from the prop machine. Which means that they are prime propaganda territory, not Joe the Plumber.

Educated people who are informed get their propaganda straight from the source – the media. Joe the Plumber gets the trickle-down version from a wider variety of sources including coworkers, friends, family, etc.

I like this argument, and it has a Chesterton-like feel to it (I suspect Lou has read Orthodoxy), but at the same time I think it is only partially true – people who think tend to be educated [citation needed],  and people who think may be less susceptible to manipulation by the media [citation needed], which would reduce the susceptibility to manipulation of the educated in an average sense. Lou ignores the question of whether education may be correlated with ability to not be manipulated, which is a shame because this is  the standard argument against what he writes.

Some Comments

In my opinion, Lou makes some very good points. But I wish he had said “facts” instead of truth, as this conflation of the two is really what his argument is about (which Lou acknowledges).

1. “Truth” here is considered as a series of facts. This is the common conception of truth, and the one we’re examining, so that’s how I’ll use the word in this essay. Heidegger BTFO until I can make my point.

If this conflation isn’t made, we can throw away the notion that this has something to do with modernity – truth in a more complicated sense has been seen as a value from ancient times (some examples [1], also the related aphorism “knowledge is useless unless it leads to wisdom”, etc.), but in pre-modern times people were happier to speak about objective values or truth in a more mystic sense, which completely changes the relationship between truth and action. Maybe the modern view of truth is closer to it being a series of facts, but I don’t think this is entirely the case – there’s always a moral connotation to “truth”, and moral connotation implies values, which Lou wants to keep separate.

Also I don’t quite get how this ties to nihilism: assuming that facts imply action, to me, assumes objective values, which is more or less the opposite of nihilism. Nihilism is not “Serious men puking trivialities and staring slack-jawed when this fails to provoke anything”, nihilism is if people say valuable things but this fails to provoke anything. The over-reliance on truth as a value shows that modern society is less nihilistic in the sense that those making the statement believe in objective values. The problem seems to be that they don’t realize people don’t have uniform values. But probably Lou uses a different definition of nihilism with which this makes more sense.

These didn’t really fit in anywhere above:

People are more consistent than we like to think, they just don’t show their work.

The easy critique of “speaking truth to power” is that power already knows the truth, they just don’t care

  1. In Christianity, there’s Jesus’ “I am the way, the truth and the life” and associated “And you shall know the truth and the truth shall set you free”. In Islam, “The Truth” is one of the names of God. Confucius: “The object of the superior man is truth.”

First signs of life

The sign (or parity) of a permutation is a group homomorphism from S_n to S_2 [^1] that appears in the definition of the determinant. Proving that the sign defines a group homomorphism is not difficult, but the (very short) standard proof [^2] is fairly unintuitive. Therefore:

This post describes a more visual proof of the fact that the sign of a permutation is a homomorphism and gives some interesting facts relating to the sign.

Permutations – a visual description

Let \pi \in S_n be a permutation. Then we can write \pi explicitly using two-line notation, for example \pi=\left(\begin{matrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 5 & 3 & 4\end{matrix}\right) is the permutation that sends 1 to 2, 2 to 1, 3 to 5 and so on.

The parity, or sign, of a permutation is defined as sgn(\pi) := (-1)^{N(\pi)}, where (-1) is the non-identity element in S_2 (it is easy to see that S_2 has two elements, one of which is the identity, denoted by 1) and N(\pi) := |\{(a,b) \in \mathbb{N}^2 : 1 \leq a < b \leq n, \pi(a) > \pi(b) \}|. Basically sgn(\pi) looks at whether the number of inversions in \pi is even or odd. A nice way of visualising permutations is by drawing which elements get sent where. In this way, the permutation \pi corresponds to the following picture:


Crossings and the sign

The number of lines that cross [1] gives the number of a < b so that \pi(a) > \pi(b), and this number is N(\pi). Whether this number is even or odd determines the sign. In this case, we see immediately that sgn(\pi) = -1. Deforming a single line can change the number of crossings, but (provided that each crossing is proper and no more than two lines cross at a point) doing so introduces/removes an even number of crossings, so the sign is well-defined.
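
For readers who prefer counting to drawing, here is a short sketch that lists the inversions of the example permutation directly from the definition of N(\pi). The function names inversions and sgn are my own, and a permutation is represented (by assumption) as the bottom row of its two-line notation.

```python
def inversions(perm):
    """All pairs (a, b) with a < b and perm(a) > perm(b), using 1-based positions."""
    n = len(perm)
    return [(a + 1, b + 1)
            for a in range(n)
            for b in range(a + 1, n)
            if perm[a] > perm[b]]

def sgn(perm):
    """(-1)^N(perm), where N counts the inversions."""
    return -1 if len(inversions(perm)) % 2 else 1

pi = (2, 1, 5, 3, 4)     # bottom row of the two-line notation above
print(inversions(pi))    # [(1, 2), (3, 4), (3, 5)]
print(sgn(pi))           # -1
```

Each listed pair corresponds to one crossing in a picture drawn with straight lines.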

The graphical representation (called picture for this post) also tells us that the sign of the identity is 1 and that inverting an element does not change the sign (just flip the picture).

Compositions of permutations can be drawn graphically:


The idea is that if we “deform” the black lines into the blue lines, we can only get rid of even numbers of crossings in the process. If you look at the picture above long enough this should be clear.

From this fact, it follows that sgn is a homomorphism. For if we draw the pictures of \sigma and \tau \in S_n over each other, we obtain a picture of \tau \circ \sigma. The total number of crossings is the number of crossings in \tau plus the number of crossings in \sigma. Calling C(\tau) the number of crossings in this picture of \tau (and similarly for C(\sigma) and C(\tau \circ \sigma)), we have that (-1)^{C(\tau \circ \sigma)} = (-1)^{C(\tau) + C(\sigma)} = (-1)^{C(\tau)}(-1)^{C(\sigma)}, which finishes the proof.

A more formal way of phrasing this is that if \{a,b\} is an inversion in \tau \circ \sigma (i.e. taking a < b, we have (\tau \circ \sigma)(a) > (\tau \circ \sigma)(b)), then either it is an inversion in \sigma or \{\sigma(a),\sigma(b)\} is an inversion in \tau, but not both. If both are inversions, or neither of them is, then \{a,b\} is not an inversion in \tau \circ \sigma. Hence the parity of the number of inversions in \tau \circ \sigma is the sum (modulo 2) of the number of inversions in \sigma and \tau.
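
Both the “exactly one of the two” rule and the homomorphism property can be checked by brute force on a small symmetric group. The sketch below does this for S_4; the names sgn, compose and pair_rule_holds are mine, and permutations are stored (by assumption) as 0-based tuples, so perm[a] is the image of a.

```python
from itertools import permutations

def sgn(perm):
    n = len(perm)
    count = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
    return -1 if count % 2 else 1

def compose(tau, sigma):
    """(tau o sigma)(a) = tau(sigma(a))."""
    return tuple(tau[sigma[a]] for a in range(len(sigma)))

def pair_rule_holds(tau, sigma):
    """{a, b} is an inversion of tau o sigma iff exactly one of the two steps inverts it."""
    n = len(sigma)
    comp = compose(tau, sigma)
    for a in range(n):
        for b in range(a + 1, n):
            inverted_by_sigma = sigma[a] > sigma[b]
            low, high = sorted((sigma[a], sigma[b]))
            inverted_by_tau = tau[low] > tau[high]
            inverted_by_comp = comp[a] > comp[b]
            if inverted_by_comp != (inverted_by_sigma != inverted_by_tau):
                return False
    return True

S4 = list(permutations(range(4)))
rule_ok = all(pair_rule_holds(t, s) for t in S4 for s in S4)
is_homomorphism = all(sgn(compose(t, s)) == sgn(t) * sgn(s) for t in S4 for s in S4)
print(rule_ok, is_homomorphism)   # True True
```

This is of course not a proof for general n, only a check that the statement has been written down correctly.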

Signs in the wild

Apart from being used in the formula for calculating determinants, the sign of a permutation is also useful in other contexts. For example, for every S_n we can define A_n :=\mathop{Ker}(sgn) as the alternating group over n elements. Because sgn is a homomorphism, it follows that A_n is a normal subgroup. For n \geq 5 it can be shown that A_n is the only nontrivial [2] normal subgroup of S_n.

Permutations also are used to define orientations of objects in differential geometry and algebraic topology. Here it is useful to say that the triangle with vertices \pi(u_1),\pi(u_2),\pi(u_3) is  sgn(\pi) times the triangle with vertices u_1,u_2,u_3, where \pi \in S_3. The vertices of both triangles are of course the same, but they are treated as different objects depending on how their vertices are ordered.

Lastly, writing the cycle (1 \dots k) in one-line notation, it is fairly clear that sgn( (1\dots k )) = (-1)^{k - 1}, by observing that one line crosses all of the other ones. Knowing that elements with the same cycle type are conjugate and that sgn is a homomorphism, this gives the following formula if \pi is composed of k disjoint cycles of lengths r_1, \dots, r_k:

sgn(\pi) = \prod_{i = 1}^k (-1)^{r_i -1}
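
As a sketch, the cycle formula can be compared against the inversion-counting definition for every element of S_5. The function names are my own, permutations are again assumed to be 0-based tuples, and fixed points count as cycles of length 1 (which contribute a factor of +1).

```python
from itertools import permutations

def sgn_by_inversions(perm):
    n = len(perm)
    count = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
    return -1 if count % 2 else 1

def cycle_lengths(perm):
    """Lengths of the disjoint cycles of perm, fixed points included."""
    seen = set()
    lengths = []
    for start in range(len(perm)):
        if start in seen:
            continue
        length = 0
        x = start
        while x not in seen:      # walk around the cycle containing start
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths

def sgn_by_cycles(perm):
    sign = 1
    for r in cycle_lengths(perm):
        sign *= (-1) ** (r - 1)
    return sign

pi = (1, 0, 4, 2, 3)              # the example permutation, shifted to 0-based
print(cycle_lengths(pi))          # [2, 3]
print(sgn_by_cycles(pi))          # -1
formulas_agree = all(sgn_by_inversions(p) == sgn_by_cycles(p)
                     for p in permutations(range(5)))
print(formulas_agree)             # True
```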

  1. We need to arrange the objects so that no more than 2 lines cross at one point. 
  2. A normal subgroup is a subgroup that is the kernel of a homomorphism. A subgroup H of a group G is trivial if H = G or |H| = 1.

Hello, and welcome to this blog

There are many ways to start a blog, and I have decided to choose the one more travelled by.

What this blog is about

I keep my personal life and blog separate.  There are few other constraints to what I am willing to blog about – anything I see as a positive contribution to the Internet is fair game. A large portion of this blog will be devoted to mathematics, where I try to give  explicit and in some way “natural” proofs of things I find interesting. Apart from mathematics, I’ll be blogging about various other issues that I think I know enough about to be able to say something of value. These areas might include philosophy (though hopefully not very much of it), politics, religion, literature and the sciences. Basically:

Come for the mathematics, stay for all the other interesting stuff.

Because I want to maintain the quality of the posts herein, I will not blog very often.

Who I am

Existentially speaking – who knows? Materially speaking –  I study mathematics in a first-world country, and as far as the internet is concerned, Matty Wacksen is my real name.  Mathematics plays a much smaller part in my life than in this blog, which is one of the reasons why I will not be posting very often.

 ‘Categorical Observations’

Feel free to take the title literally.

The categorical point of view that focuses less on the objects in a category, and more on the arrows between them will hopefully feature in most mathematical posts.

I edit [read: delete redundant parts of] posts after re-reading them. If anyone ever starts reading this blog I might start leaving links to old versions up.