Felix Breuer's Blog

OMeta in Qute - a first experiment

It has been quiet around Qute, my experimental editor project, for quite some time since I announced a rewrite of the code. Well, to be honest, the rewrite has not started yet. But I did get around to do some experimenting with the feature I am most excited about: The support for user-defined custom markup languages within Qute.

I will write more about the rationale behind this feature when it is more polished. Right now, I just want to give you a little heads-up on the latest rough-around-the-edges feature in Qute 0.4.1.

To define a custom markup language, you include an OMeta grammar right inside your document. Such a grammar might look like this.

An OMeta grammar in Qute

You can then tell Qute that any paragraph in your document is written in the custom language you just defined. Then Qute will transform that paragraph according to the rules specified in the grammar. In this example, we calculate the number 42 by way of OMeta.

Using the custom language in Qute

Qute 0.4.1 even supports some rudimentary error reporting.

An OMeta parse error in Qute

If you want to try this out for yourself, grab a copy of Qute 0.4.1 from here.

The next milestone is to make this do something interesting. For example, an implementation of Markdown in OMeta would allow the user to easily extend Markdown to suit his or her needs. This is something that I have been missing sorely while writing mathematics with Qute.

Feldenkrais’ Spontaneous Action and Laozi’s Wuwei

It has been out for a while already, but now I finally get around to announcing it here: my first venture into philosophy.

Moshé Feldenkrais’ work and Laozi’s classic, the Daode Jing, have both had a big effect on me and I have always been struck by how alike in spirit these (at first glance) very different philosophies are. In the summer of 2010, I heard a great talk by Steve Palmquist about parallels between the Daode Jing and Kant’s philosophy and this experience gave me the confidence to sit down and work out my own ideas on the relationship between Feldenkrais and Laozi.

The result is an article that has just been published by the Journal of Daoist Studies. You can

download a PDF of my article right here

or you can purchase an ebook or a printed copy of the entire journal from Lulu.

Not only beyond Journals, not only beyond Papers. Beyond Theorems.

The mathematical community is discussing how we can leave the journal publishing model behind us. However, putting journals online, making them open access or even leaving them behind us entirely will not solve the challenges the mathematical sciences face in the next couple of decades. A couple of weeks ago, Peter Krautzberger captured this point of view with the simple and beautiful slogan:

Not beyond Journals, beyond Papers.

I agree! And I would like to go one step further. We have to go beyond journals. We have to go beyond papers. And we have to go beyond that which we, as mathematicians, hold dearest:

We have to go beyond theorems!

In this post, I want to make an argument why, from my point of view. This is going to be a long post, but I will try not to rant too much.

The number of new results per year is increasing

The number of mathematical articles - and theorems! - published each year has increased dramatically in the last decades. To give you just one piece of evidence, here is a graph of the number of new research articles entered in the Zentralblatt MATH index each year. This figure is taken from Bernd Wegner’s article in the June 2011 newsletter of the European Mathematical Society.

ZMATH publication by year

This trend will continue. The human population on earth is increasing and (on average) becoming more prosperous. So, more people will pursue an academic career and more people will become research mathematicians. And, because new theorems are the currency in which mathematicians currently measure research activity, this means that the number of new (!) theorems proven each year will continue to increase dramatically - unless we change our way of thinking about research.

This increase is a loss for everyone

There are roughly 100,000 mathematical articles on the Zentralblatt index that have been published in 2010. One hundred thousand articles, of which the majority contain (supposedly) new results. I’d go so far as to say that most mathematicians cannot read and properly digest more than 100 articles per year. (For me personally, the number is certainly lower than that.)

This means that most mathematicians cannot even make use of 0.1% of the new results published each year.

This must have at least one of the following two consequences. In my opinion, it has both.

Mathematical research is becoming more and more specific.

Mathematicians have a great talent for inventing ever more particular fields and sub-fields and sub-sub-fields in which we can discover ever more new theorems. If this strategy is successful, then these theorems are truly new and they have never come up in another area before. The flip side of the coin is of course that the majority of these very particular theorems will simply not be relevant for most mathematicians (and they will be utterly irrelevant for the rest of the world). A favorite objection to this argument is that, a couple of decades down the road, these results will become relevant in other fields and that results from today will be made use of then. But how is this supposed to happen if mathematicians can only read less than 0.1% of what is published this year? It is much more likely that mathematicians who, 20 years later, arrive at an equivalent problem in another field will simply solve it again on their own. Which brings us to the other consequence.

Mathematical research is becoming more and more redundant.

Independent rediscoveries of theorems happen all the time. They may happen more or less simultaneously. Or they may happen a couple of years apart. Or they may happen in entirely different fields, in which case it may take a while for people to notice that these results are, in fact, equivalent. Whatever form these redundancies take, their number is going to increase.

One can make the argument, that this is no problem at all. Science is redundant. So if all a rediscovery achieves is to make an old result popular in a different time or a different area, then so be it! But this clear eyed view of the world goes against the grain of our current self-image as mathematicians, where only original (never-seen-before) theorems count.

Also, redundancy becomes a real problem in conjunction with increasing specialization. What is the point of doing foundational research in a narrow area of specialization today, if the people who may need this stuff twenty years down the road have no means of discovering it? The only insurance against this type of disaster is to focus on advertising our small corner of mathematics, in the hope that somebody will remember it when the time comes. But again, our mathematical community is not built on this socio-dynamic view of research.

In the end, both trends, increasing specialization and increasing redundance, have the same effect:

Mathematical research is becoming increasingly irrelevant.

Note that I am not saying that mathematics is becoming increasingly irrelevant! There will always be seminal theorems that have a far reaching impact on both mathematics and the real world. But the work of the average research mathematician is going to become increasingly irrelevant.

Unless we finally stop to believe that the only valid form of “research” in mathematics is proving new theorems.

Beyond theorems

I have tried to argue above that the business of churning out as many new theorems as possible is not in the best interest of both mathematicians and mathematics. The two causes that I see as central in this regard are the following.

  • We as a community view only new theorems as research. (Thus we create incentives to produce an unnecessarily large inflation of theorems.)
  • Our current methods of discovering theorems do not scale with the increase in theorem production.

Thus the best way out of this situation is to acknowledge other forms of mathematical activity as research, not only the proving of new theorems. In particular, activities that help in the discovery of mathematical ideas should be promoted. Here are a couple of suggestions:

Communicating mathematics. This includes all forms of communication, by means of talks, videos, games and of course writing in blog posts, web forums (such as mathoverflow), survey articles and books. This should also include all levels of writing, from expository work for students to references for experts. Finally, this should also include all audiences, from the interested public, to scientists outside mathematics, to mathematicians in other fields and finally specialists in your area. All of these forms of communicating mathematics are widely recognized in the mathematical community. But they are not recognized as research. I argue that mathematicians should get credit as researchers for communicating mathematics, because a larger part of every mathematician’s work will have to be about communication if we are to cope with the present inflation of new results. And we should make it possible that researchers specialize in these social activities, which will only be possible if we give them proper recognition.

Collaborative theorem proving. Instead of everyone trying to get their little theorem published, why do we not try to find new ways of working on the big open questions collaboratively? The polymath projects are very valuable efforts in this regard, but they are only a small first step. As someone on the sidelines of these projects, it appeared to me that in the end, only a relatively small number of people were able to actually contribute to any given polymath project. We have to find ways to make large projects like this asynchronous and distributed. That means, we have to find ways how people can contribute half-baked ideas to research projects, how people who might find them useful can discover them and how to give credit for half-baked ideas in the mathematical community. How to solve these problems is wide open, from my point of view.

The two topics of communicating mathematics and “crowd sourcing” proofs have been widely dicussed in the current debate about mathematical publishing. Now on to two points that have received less attention, as far as I know.

Implementing mathematical software. To make mathematical methods discoverable (and useful), it is of tremendous value to have them implemented in software. This applies to algorithms and methods for visualization as well as to such mundane tasks as the development of decent user interfaces. While these may be engineering tasks in many cases, we should recognize them as research, because they are crucial to the advancement of mathematics! Moreover turning all the useful algorithms available in the literature into usable software is a huge endeavour - it just won’t happen without the proper incentives.

Formalizing mathematics. In my opinion, formalizing mathematics is the single one most important thing that needs to happen if we want mathematics to scale. With “formalizing mathematics” I mean a huge process that includes all of the following.

  • The development of the languages and tools necessary to make the formalization of mathematical definitions, theorems and proofs feasible for the average mathematician.
  • This will have to go hand in hand with the development of automatic proof systems that can automatically “fill in the gaps” present in any research article.
  • The creation of an infrastructure that allows formal mathematical articles to be shared and discovered across different proof systems and formal mathematical languages.
  • And, finally, the formalization of (almost) all of mathematics.

It goes without saying that this is more than enough work for generations (plural!) of mathematicians. We still have a long way to go in all of the four areas indicated by the four bullet points above. And even though work is going on in all of these areas, this work is not as widely known as it should. Freek Wiedijk was kind enough to give me a brief tour last fall, and I plan to blog about this in the near future.

Here is why I think the project of formalizing mathematics is crucial. We, as humans, cannot cope with the sheer quantity of mathematical output that is going to be produced in the 21st century. So we have to teach computers how to discover theorems for us that are relevant for the questions we are interested in. Otherwise, the work of most mathematicians will disappear into obscurity faster than anyone can be comfortable with.

Communicating mathematics, collaborative theorem proving, implementing mathematical software and formalizing mathematics are of course all closely interrelated. Why should we not dream of composite electronic resources that are accessible and understandable by all kinds of different readers, that we work on collaboratively and that combine expository texts with detailed explanations and usable software with formalized mathematics? This dream is certainly not new and it is shared by many. How should this brave new world of mathematics look like in detail and how do we build it? The process of exploring these questions will be interesting indeed. My point here is that to get there we need to give up the belief that theorem proving is what mathematical research is all about.

The self-image of the mathematician

No matter what happens, proving mathematical theorems is going to become less relevant. If we succeed in changing they way mathematical research works, other activities will become more important. If we do not, the theorems most mathematicians prove will, generally, be redundant and overspecialized. So:

Theorems will become less relevant.

The social and psychological impact of this simple observation is huge. The self-image of the modern mathematician is built entirely on proofs and theorems. We, as mathematicians, grow up on stories of great “heros” who became “immortal” by proving all these great theorems we learn about. Naturally, we aspire to do as they did and so we seek to wrest these eternal truths from the world of pure, abstract ideas that we conquer using nothing but our “beautiful mind”.

Of course I am being ridiculous here, I exaggerate. But our identity as mathematicians is based on the stories we tell each other about other mathematicians - and those stories revolve around theorems and proofs. Often the only thing we know about a mathematician is what theorem they proved and we set up Millenium Prizes to celebrate the act of attaching a particular mathematician’s name to a particular theorem.

The association of “being a mathematician” with “proving a new theorem” is so tight that I have serious doubts about whether we as a community will accept other mathematical activities as research anytime soon. To do so would call into question what a mathematician is at a fundamental level. But this is precisely what we have to do.

Conclusion

The current model of doing mathematics is in need of reform. The fundamental issues with the current model cannot be resolved just by placing mathematical journals on a different economic and organizational basis. The core of the problem is our fixation as a community on theorems and proofs. As these notions are tied so closely to our self-image as mathematicians, finding a new model of doing mathematics is going to be a long and arduous process. But we have to do it, if we want our work to be relevant in a couple of decades.

Ehrhart f*-coefficients of polytopal complexes are non-negative integers

One week ago, my most recent math paper appeared on the arXiv and now I finally get around to blogging about it.

The \(f^*\)- and \(h^*\)-coefficents of a polynomial \(p(k)\) are simply the coefficients of \(p(k)\) with respect to certain binomial bases of the space of polynomials. If \(p\) is of degree \(d\), then they are given by \[p(k)=\sum_{i=0}^d f^*_i {k-1 \choose i} =\sum_{i=0}^d h^*_i {k+d-i \choose d}.\]

Where do these definitions come from? The motivation is Ehrhart theory. Given a set \(X\subset\mathbb{R}^d\), the Ehrhart function \(L_X\) of \(X\) is given by \[L_X(k)=\#\mathbb{Z}^d\cap k\cdot X,\] for \(1\leq k \in \mathbb{Z}\), that is, \(L_X(k)\) counts the number of integer points in the \(k\)-th dilate of \(X\).

Now, a famous theorem by Ehrhart states that if \(X\) is a polytope with integral vertices, then \(L_X\) is a polynomial. More precisely, there exists a polynomial that coincides with \(L_X\) at all positive integers and by abuse of notation, we will denote this polynomial by \(L_X\) as well.

After this preamble you may already have guessed it. The binomials in the expansion above are Ehrhart polynomials of particularly nice polytopes, namely half-open unimodular simplices. We denote by \(\Delta^d_i\) the \(d\)-dimensional standard simplex with \(i\)-open faces, i.e., \[\Delta^d_i = \{ x\in\mathbb{R}^{d+1} \;|\; x_0,\ldots,x_{i-1} > 0, x_i,\ldots x_d\geq 0 \sum_{i=0}^d x_i = 1 \} \]

and then it is not hard to check that \[L_{\Delta^d_i}(k) = {k+d-i \choose d}\] and in particular \[L_{\Delta^d_{d+1}}(k) = {k-1\choose d}.\] Thus:

  • If \(P\) is a lattice polytope with a shellable unimodular triangulation \(K\), then \(h^*_i\) counts the number of \(d\)-dimensional simplices with \(i\) open faces in a shelling of \(K\).
  • If \(P\) is a lattice polytope with a unimodular triangulation \(K\), then \(f^*_i\) counts the number of \(i\)-dimensional faces in \(K\).

In particular, if \(P\) is sufficiently nice, then both the \(f^*\)- and the \(h^*\)-coefficients are non-negative. The point of my article is to show that even if \(P\) is not nice and partitioning \(P\) into unimodular simplices does not work at all, the \(f^*\)-vector is still non-negative.

Theorem. If \(P\) is a polytopal complex, then the \(f^*\)-coefficients of \(L_P\) are non-negative integers, even if \(P\) does not have a unimodular triangulation.

This becomes relevant when \(P\) is not-convex. As long as \(P\) is a polytope, its \(h^*\)-coefficients are non-negative by a famous theorem of Stanley. However, the \(h^*\)-coefficients of polytopal complexes may well be negative. For example, in recent work, Martina Kubitzke, Aaron Dall and myself have shown that there exist coloring complexes of hypergraphs with negative \(h^*\)-coefficents.

In fact, we can take this theorem one step further and use it to characterize Ehrhart polynomials of what I call partial polytopal complexes. An integral partial polytopal complex is any set in \(\mathbb{R}^d\) that can be written as the disjoint union of relatively open polytopes with integral vertices.

Theorem. A vector is the vector of \(f^*\)-coefficients of some integral partial polytopal complex if and only if it is integral and non-negative.

Behind all of this lies a combinatorial interpretation of the \(f^*\)-coefficients in a (not necessarily unimodular) lattice simplex and a new way of partitioning the set of lattice points in a simplicial cone into discrete cones.

For more results, in particular about the rational case, and more details on all of this, please check out my article! It has the id arxiv: 1202.2652.

Using Intuitive Geometry - Lecture Summary

This fall semester, I taught the Math 890 course at San Francisco State University. It was the very first course that I taught all on my own. I am currently funded by the DFG, so I have no teaching obligation (in fact, I had to ask the DFG for permission to devote my time to something other than research). But when I was offered this opportunity, I leapt at the chance and volunteered to teach this course, even though, with two 75 minute classes each week, it was a significant amount of work. I wanted the experience, and, after all the years where I was responsible only for my own research portfolio, I was looking forward to taking on some real responsibility. Also, this was a topic course for graduate students, so that I could teach anything that I wanted. What a wonderful opportunity to experiment!

In this post, I will present the idea behind the course I gave, give an outline of the topics covered, provide all my slides and notes for the lecture as well as all the exercise problems, give a list of topics for the seminar, and review how the course, as a whole, went.

Note: The material linked below is known to contain a number of typos.

The Idea

I always wanted to try a different format of a course. With “format” I do not mean the methodology for teaching - experiments in that department are certainly called for, but I wanted to stick to the classical lecture + seminar format (because I actually enjoyed it as a student). What I wanted to experiment with was the content of my course.

Math courses, the way I experienced them at the FU Berlin, typically build a theory from the ground up, in a very detailed fashion. They emphasize rigor over intuition, and have a classical definition-theorem-proof structure. Courses of this “classic” format are certainly important, especially for teaching mathematical rigor, but here, I wanted to do things differently:

  • The course was to cover a large range of topics, with the aim of showing their interconnections,
  • intuition was to be emphasized over rigor,
  • and it was to make extensive use of visualization.

In short, I wanted to give my students a grand tour of the mathematics that I find interesting. Adding courses of this type to the curriculum is important for a number of reasons. I do not want to go into too much detail on this, so here are just a few bulletpoints, in no particular order.

  • For me personally, grand tours are simply more inspiring than develop-a-theory-from-the ground-up-lectures. I want to take a grand tour of the house, instead of spending my time laying bricks for the foundations!
  • Grand tours have more room to present applications of the covered material.
  • Pure mathematics especially are not measured only in terms the applications they have. The beauty, insight and perspective they have to offer are of direct value to us as human beings. However beauty, insight and perspective are tied primarily to the rough ideas and much less to the technical details. So let’s cover more beautiful ideas and less details!
  • Showing the connections between different fields make all of these fields more relevant to the audience!
  • First, I cannot remember a theorem unless I develop some kind of intuition for it. Second, I cannot prove similar theorems on my own, without having an intuitive idea why it should be true. Third, the intuitive idea is often all I remember off the top of my head after a couple of weeks have passed. So why not focus on teaching the intuition in the first place?
  • For me personally visualization is by far the best way to develop an intuition.

Outline

The topics that were covered in the lecture + seminar are show in the big mind map at the top of this page. As you can see, the variety of topics is very large. Each of the big headlines in the mindmap could easily make up an entire course on its own! All of these topics are interrelated. Yet, it is hard to come up with a single title for the course that includes them all. “Discrete Geometry”, certainly, is too generic and not entirely accurate. In the end, I decided to call the course:
Using Intuitive Geometry - Using Geometry Intuitively
I like this title, because it makes clear that I care more about methods and intuition, than about giving a comprehensive account of anything. For example, Game Theory is certainly no subset of Discrete Geometry, but methods from “intuitive geometry” do play a big role: The Minimax Theorem in Game Theory is both equivalent to the LP Duality Theorem and it is implied by Brouwer’s Fixed Point Theorem, which are both staples of intuitive geometry. In hindsight, it occured to me that another fitting title would have been:
Adventures with Hyperplanes
Well, maybe next time. Here are the slides and notes from the tour I took this fall through all of these diverse topics:
  • Visualizing Numbers (Gauss sum formula, lattice points in triangles, staircases, Dedekind sums)
  • Polytopes (H- and V-description, faces, permutahedron)
  • Linear Programming (Fourier-Motzkin eliminiation, Farkas lemma, LP duality, simplex algorithm, combinatorial optimization)
  • Game Theory, Social Choice (minimax theorem, Nash equilibria, spatial voting, McKelvey’s theorem)
  • Topological Combinatorics, Fair Division (Brouwer’s fixpoint theorem, Sperner lemma, ham sandwich splitting)
  • Ehrhart Theory, Lattice Points in Polyhedra (Ehrhart’s theorem, Ehrhart-Macdonald reciprocity, Stanley non-negativity, combinatorial applications)
  • Generating Functions (formal power series as lattice point sets and vice versa, enumeration of lattice points using formal power series)
  • Machine Learning (support vector machines, kernel trick, VC dimension)

Exercises

To go with these lectures, I made up these exercises, containing 2 to 5 problems each:

Seminar Talks

After 10 weeks of lectures, we had 4 weeks of student talks. Some of the talks were in-depth talks on a particular subject, while others were brief introductions to large subject areas. These are the topics I suggested:

  • The Mordell-Pommersheim Tetrahedron. (Source: Computing the Continuous Discretely. Beck, Robins. 2007. Chapter 8.)
  • Power Indices and Spatial Power Indices (Source: Power and Stability in Politics. Straffin. In: Handbook of Game Theory Volume 2. Chapter 32. 1994.)
  • LP and MIP Modelling Tricks (Source: AIMMS Optimization Modelling. Bisschop. 2011. link)
  • Interior Point Methods (Source: The integer-point revolution in optimization: History, recent developments, and lasting consequences. Wright. 2004. link)
  • The Borsuk-Ulam Theorem and Ham Sandwich Splitting (Source: Using the Borsuk-Ulam Theorem. Matousek. 2003. Chapters 2 and 3.)
  • Integer Programming Methods (Source: Theory of Linear and Integer Programming. Schrijver. 1986. Chapters 23 and 24.)
  • Order Polynomials (Source: Combinatorial Reciprocity Theorems. Beck. 2011. Chapter 6. link)
  • P-Partitions (Source: Combinatorial Reciprocity Theorems. Beck. 2011. Chapter 7. link)

How did it go?

I am very happy with how the course turned out! This “grand tour” style of a course succeeded in engaging the students. From what I heard, my students not only learned something, but also took away a new perspective on many of the subjects I covered. One student even told me that they now had a much better understanding what the thesis they had been working on is about. Was will man mehr!

Nonetheless there are of course a number of things I would like to improve the next time I teach this course - which I would really like to do! Here is a list of notes that I would like to keep in mind for the next iteration.

Exercises

The exercises were what made this “intuitive tour” type of course work. Had my students not spent the time to accquaint themselves personally with the subject matter, I guess many parts of this course might have remained too vague. The great thing is, that my students did work through virtually all of the exercise problems, even though I did not have the resources to grade the correctness of their solutions. It would have been too easy to skip corners - but they didn’t! Thus, I really have to thank them for their initiative, curiosity and dilligence - they made this course work (even more than ususal)!

In hindsight, I think this large program is better suited to a course with three 75 minute sessions per week. Two sessions for lectures and one for talking about the exercise problems. Also, it would have been very valuable to provide individual corrections for the exercises and/or solutions of the exercises for the students to review on their own.

Lecture Notes

I wrote the script for my lectures as I went along. Given that the material was so diverse and scattered over so many sources and given that it was my ambition to present some material from an (AFAIK) original perspective, this was very work intensive. Originally, I had planned to use my blog to record detailed lecture notes, but this turned out to be infeasible. (Currently, it takes me at least four hours to write up notes for one 75-minute lecture in a form suitably self-contained for a blog post - even if I have prepared the material beforehand in slide form and given the lecture in front of class. Is that typical or somewhat slow?)

The points mentioned so far (publish lecture notes, publish solutions for the exercises, provide individual corrections for the exercises) all boil down to putting in more work. And when I give this course the next time, I certainly want to prepare lecture notes and solutions beforehand and work together with a teaching assistant for grading! Here are some more thoughts, though, that take a different direction:

Seminar

Combining the lecture with a seminar was an excellent choice. For two reasons. First, it gave the students the opportunity to work on a subject in depth. This was especially important as the “intuitive tour” format (by necessity) left the students wanting more details. The seminar gives students the opportunity to pick one subject they find particularly appealing and dig into the details that were only hinted at in the lecture and in the exercises.

Second, the seminar was enjoyable not only for the students, but for me as well! During preparation, it was simply fun to get together and puzzle about a problem or figure out the best picture for illustrating a theorem. (Especially after I had spent hours on end with mainly one-way communication during the lecture.) On the whole, the talks were very good and got across all the important points. Some even went above and beyond the suggested material! Next time, I might be tempted to make the seminar a much bigger part of the course!

Proofs

What place do proofs have in this type of course? After the first two weeks of the course I become worried that my intuitive style of lecturing might be too vague for my audience. This worry was not fed by any actual observations during class, but rather by my rookie-lecturer-confidence wavering. So I decided to up the level of detail in the next few weeks and include more proofs and a greater level of formality. In retrospect, after talking with my students about it and judging from the their exercise solutions, I can say that this failed completely. This “proofy” part of the lecture was the part my students enjoyed the least and it was the part they had the least grasp of, judging by the exercises. (Of course, proper lecture notes would have helped their performance in the exercises, but I think this is not the main point here.)

I was reminded here of these great quotes by Victor Klee:

  • “A good talk contains no proofs; a great talk contains no theorems.”
  • “Mathematical proofs should only be communicated in private and to consenting adults.”

In any case, they have no place in this type of lecture! Intuitions for why a theorem should hold, rough proof sketches, all of these certainly belong in this type of lecture. But going the extra distance from an intutive proof sketch to an actual “formal” proof is not worth it - simply because any formalization is significantly harder to understand than the intuitive idea that gave rise to it. If your goal is to teach a theory, then teaching that formalization is part of what you want to communicate. If your goal is to show connections between different fields intuitively, then that formalization is only tangential to what you want to communicate. So, adding formalism just for the sake of communicating an intuitive idea is not going help - instead of clarifying the point it will obfuscate.

Of course this does not mean that proofs are irrelevant! On the contrary, if students are to learn how to use geometric intuition, then learning how to make intuitive ideas precise is crucial. However this is an active process. Students have to do it themselves when solving exercises or preparing seminar talks. My doing it at the whiteboard does not help anybody! More precisely: my doing it at the whiteboard helps only, if students have tried to do it by themselves beforehand.

I will not try to write down here all the mental notes I made for myself on how to present proof ideas if you do not want to make them 100% precise. (Maybe I will write another blog post about this.) I want to mention two things, however: First, when explaining an algorithm/constructive proof it is much more important to make the construction precise and illustrate it with an example than to prove its correctness. Second, always give a concise summary of the key point of a theorem or proof that you just explained. That way, even those members of your audience that you may have lost will have something concrete to remember.

Drawing

Of course, when talking about geometric intuition, drawing is the most important thing! Two shorts thoughts about this:

The most important aspect of many mathematical figures is the way they are drawn and not the final image. Thus it is very important to construct a figure slowly, step by step. No fancy visualization tool beats a good drawing on a whiteboard!

In this vein, I encouraged my students to draw the objects they were talking about, both in the exercises and in the seminar - and I got everybody to do it in the end! I was very happy about this, because, in my opinion, drawing is an essential skill for any mathematician!

Conclusion

It was a great experience teaching this course! I learned a lot, both in terms of hard skills and soft skills. I would love to teach this course a second time, and the next time I would like to experiment more with teaching methodology. Also, I would really like to write up this tour through some uses of intuitive geometry in book form and smooth out all the wrinkles along the way. But, maybe, the printed page (or static PDF document) is just not the right medium for this.

I think mathematics could benefit from more courses and texts of this “intuitive tour” type: overview material with a strong emphasis on intuition and visuals, accessible to advanced undergraduates but with enough depth to satisfy graduate students.

Bibliography

Finally, here is a short informal summary of the literature I used for the preparation of this course.
  • Beck, Combinatorial Reciprocity Theorems.
  • Beck, Robins, Computing the Continuous Discretely.
  • Bisschop, AIMMS Optimization Modelling.
  • Breuer, Heymann, Staircases in , Integers.
  • de Longueville, A Course in Topological Combinatorics.
  • Gaiha, Gupta, Adjacent Vertices on a Permutohedron, SIAM Journal on Applied Mathematics.
  • Grötschel, Lineare und Ganzzahlige Programmierung.
  • Matousek, Using the Borsuk-Ulam Theorem.
  • McKelvey, Intransitivities in multidimensional voting models and some implications for agenda control, Journal of Economic Theory.
  • McKelvey, General Conditions for Global Intransitivities in Formal Voting Models, Econometrica.
  • Raghavan, Zero-sum two-person games, Handbook of Game Theory, Vol 2.
  • Schölkopf, Smola, Learning with Kernels.
  • Schrijver, Theory of Linear and Integer Programming.
  • Straffin, Power and stability in politics, Handbook of Game Theory, Vol 2.
  • Wright, The interior-point revolution in optimization: History, recent developments, and lasting consequences, Bulletin of the American Mathematical Society.
  • Ziegler, Lectures on Polytopes.

Geometry, the Majority Vote and the Power of Agenda Control

These are notes for a talk I gave both in my course at SF State and at the teach-in against police violence at UC Berkeley. The talk at the teach-in found its way on IndyMedia and YouTube (Pt 1 2).

In a democracy, people are fighting all the time over control of the political agenda. Media campaigns, protests and lobbying are all examples of people trying to shape the political discourse. They seek to get “the public” to discuss a topic they care about, they seek to shift the focus of the current political discussion towards the points they think are important or they may even want to throw up a smokescreen to divert the public from a topic they consider dangerous for them. This process is vital for a democracy to work. In fact, I want to present a mathematical argument that supports the following bold claim: This process is more important for a democracy than voting! And perhaps the most amazing thing about this observation is that this is a direct consequence of the geometry of voting.

Policies as points in space

Political choices can often be represented by points in space. By this I mean that often, all the different policies that can be adopted with regard to a given political issue can be represented by different points in space.

As an example, let us consider budget decisions. A state’s budget can be represented by three numbers:

  • the state’s revenue (e.g. taxes)
  • the state’s expenditures (e.g. public spending)
  • the state’s deficit
In this way, we can view every possible budget decision as a point in three-dimensional space. However, these three values are not independent of each other. Instead they are related by

deficit = spending - taxes.

This equation defines a plane in three dimensional space. All possible budgets lie in this hyperplane. To visualize how this looks consider the following three dimensional coordinate system. The axes represent spending, taxes and deficit. The budget hyperplane lies slanted in space. Its intersections with the three coordinate hyperplanes are shown as dashed lines. For example, the horizontal dashed line is the set of budgets with spending = 0. It is a diagonal in the plane spanned by the deficit and taxes axes. Similarly, the other two dashed lines are diagonals in the deficit/spending and spending/taxes coordinate hyperplanes.

As we have seen, the set of all budgets with spending = 0 forms a line in the plane of all possible budgets. Increasing or decreasing spending by, say, 1 dollar corresponds to translating this line in the plane. Taking all translates by $k$ dollars, we obatin a family of parallel lines. And if we repeat this for the lines given by taxes = 0 and deficit = 0, we obtain a triangulation of the budget plane as shown below.

This picture shows a part of this triangular lattice. It allows us to visualize what happens if we change a given budget $b$. Say $b$ is given by a spending of 3 trillion dollars, tax revenues of 2 trillion dollars and a deficit of 1 trillion dollars and suppose we increase tax revenues by 1 dollar. This corresponds to moving from $b$ to the line given by 1 trillion and one dollar of tax revenues. However, we have to make a choice here. Do we use that one extra dollar to lower the deficit or increase spending? If we go “left” in the picture, we stay on a line where the deficit remains constant but we move from one “spending line” to the next thus increasing spending. On the other hand, if we go “right” we keep spending constant and decrease the deficit.

Now that we have a means to visualize how different budget decisions relate to each other, we can use that to visualize the current positions of various political actors in the US with regard to the federal budget.

We place the 2011 federal budget in the center of our picture at the point labeled $b$. Next we draw the three lines through $b$ along which spending, taxes and deficit are constant, respectively. Both, the democrats and the republicans want to decrease the deficit. So if you asked a democrat or republican where their ideal 2012 budget would be, then they would tell you that it lies below the deficit line through $b$. Ideally, however, a democrat would like to decrease the deficit by increasing taxes, without decreasing spending. So, qualitatively, a democrat’s ideal policy would lie in the blue region. A republican, on the other hand, would want to reduce spending to finance the decrease in deficit without raising taxes. So a republican’s ideal policy would lie in the red region. There are other people, though, e.g., Keynesian economists, who think that lowering the deficit is not a priority at the moment. They think that it is much more important to stimulate the economy by lowering taxes and increasing spending, even if that means that the deficit has to increase even further. People in this comparatively small group might have an ideal budget somewhere in the green region.

We have now seen how to think geometrically about politics. Now we are going to apply this point of view to analyze the dynamics of the majority vote.

Spatial Voting and the Power of Agenda Control

So far, we have only visualized policies as points in space. Now we come to McKelvey’s theorem, the main result I want to tell you about. To be able to state this theorem, we need to formulate a model of voting, called spatial voting.

As we discussed, every policy corresponds to a point in space. Now, spatial voting simply means the following.

  • Voters are asked to decide between two alternatives, the status quo which we call $x$ and a proposed new policy $z$. Voters can vote for either of these two alternatives.
  • The alternative with the most votes wins, i.e., we are having a majority vote.
  • Every voter has an ideal policy in mind. So the voters can be represented by a set of points in space.
  • Finally, we make a very important assumption: Every voter will vote for the alternative closest to his or her ideal policy!

Consider as an example the following picture. The black points represent the ideal policies of all voters. The blue point labeled $x$ marks the status quo, while the red point labeled $z$ marks the proposed new policy. In this example, most voters are going to vote for $z$ as $z$ is closer to their ideal than $x$. As we are running a majority vote, this means that $z$ is going to be approved and will become the new policy.

Now we can state McKelvey’s theorem.

Theorem. (McKelvey 1976) For every status quo $x$ and every other policy $z$ there exist policies $y_1,y_2,\ldots,y_k$ such that

$x$ loses against $y_1$ which loses against $y_2$ which loses against $y_3$ … which loses against $y_{k-1}$ which loses against $y_k$ which loses against $z$

in the majority vote.

What does this mean? Suppose there were someone with complete control of the agenda. This agenda controller would have absolute control over which new policies are proposed. Whenever the agenda controllers proposes a new policy, the voters would get to vote and the agenda controller would be bound by their decision according to the majority rule. This means that an agenda controller cannot decide anything. All he can control is between which alternatives the voters get to choose.

So, suppose there is an aganda controller and the current status quo is $x$. Suppose further that the agenda controller wants voters to pass a new policy $z$ that he favors (but that everybody else may hate). McKelvey’s theorem now states that there are cleverly devised policies $y_1,\ldots,y_k$ such that the agenda controller can proceed as follows. First, he proposes $y_1$. The voters get to choose between $x$ and $y_1$ and they approve $y_1$ as the new policy because the majority of voters prefers $y_1$ to $x$. Now $y_1$ is the new status quo. Next, the agenda controller proposes $y_2$ and again, the voters prefer $y_2$ over $y_1$ so they are going to adopt $y_2$ as the new status quo of their own free will. The agenda controller continues in this fashion until the voters pass $y_k$ as their new policy. Then he makes his final move and proposes $z$, the policy that he likes but that everybody else hates. However, through his previous moves, he has managed to put the public in a position where the majority actually prefers $z$ to the current status quo $y_k$. And so they are going to approve of $z$ of their own free will.

In this way, someone in control of the political agenda is in complete power, even if he is bound by the majority vote of the population. This explains why, for example, freedom of speech and the right to protest are so important parts of democracy. Protest is a way of getting a certain item on the political agenda, trying to influence the direction that current policy should take. (See here for a recent example.) Conversely, if there is a small elite who dictates the political topic of the day, then this elite will get its way, no matter if vote or not. So the morale of the story is this:

Voting alone does not give the people any power! Just by voting, you can’t control anything. You also have to have a public discussion about political topics and this is what protest is about.

At this point, it is important to stress the difference between the provocative morale that I drew from the story and the statement of the original theorem. The theorem is provably true, but applies only to a very specific abstract model of a society. In reality, no one is in absolute control of the political agenda! And the theorem tells us that this is a very good thing!

So, while the theorem does not apply to western democracies such as the USA, it can still tell us something about the real world, by way of analogy, in just the same way that a good novel can reveal a truth about the world we live in. (See the Q&A section for more about this.)

Now that we have both a formal as well as an intuitive feeling of what McKelvey’s theorem is saying, let us try to get an idea why the theorem should be true.

The Idea behind McKelvey’s Theorem

In this section, I want to give an idea how the sequence of policies $y_1,\ldots,y_k$ are constructed. If you want, this is a proof by example, but of course it is not a complete proof of the theorem. For a thorough proof I refer the interested reader to the literature cited at the end of the Q&A section.

We work with an example in 2 dimensions. There are three voters $v_1,v_2,v_3$ whose ideal policies are given by the three black points $p_1,p_2,p_3$ in the picture. We draw the lines through any pair of these three points. The status quo is given by the blue mark in the center of this triangle formed by the voters’ ideals. So, intuitively, the current status quo $x$ seems like a pretty good compromise between the voters’ preferences.

Which policies defeat the status quo $x$ in a majority vote? Remember that the three voters will vote for the alternative closest to their ideal policy. So, for each voter, we draw a circle around that voter’s ideal policy and we choose the radius of that circle such that the cirlcle passes through the status quo $x$. Let $C_1$ be the circle around voter $v_1$’s ideal policy $p_1$. $v_1$ will prefer any policy $y$ to $x$ that lies strictly inside the circle $C_1$!

Consequently, the policies that will defeat $x$ lie in the intersection of any two of the circles shown in the picture. The area corresponding to the set of policies that defeat $x$ are shaded green in the picture. As you notice, the shaded area extends outside of the triangle spanned by the three ideal policies $p_1,p_2,p_3$. Let $L_1$ denote the line through $p_1$ and $p_2$. If we reflect $x$ at $L_1$, then we will obtain a point $x’$ at exactly the same distance to $p_1$ and $p_2$ as $x$. If we now move $x’$ inwards for a little bit, the resulting point $x”$ will lie strictly in the interior of both $C_1$ and $C_2$. Thus $x”$ will defeat $x$ in a majority vote, because both voters $v_1$ and $v_2$ will like $x”$ slightly bettern than $x$. Note, however, that for $v_3$ the new policy $x”$ is much worse than the status quo $x$. Since we are working with a majority vote, however, this is not taken into account.

As it turns out, this construction always works! We summarize this observation as follows.

For any policy $x$ and any line $L$ through two of the voters’ ideal policies, the policy $x”$ obtained by first reflecting $x$ at $L$ and then moving the reflected point back towards $L$ a tiny bit, will always defeat $x$ in a majority vote. Typically, $x”$ will be “slightly better” than $x$ for two of the voters and much worse for the third voter.

This observation is the staples of the agenda controller’s strategy for manipulating the voters. Phrased provocatively, the agenda controller’s strategy is this:

Play off different groups of voters against each other! Offer a majority of voters marginal incentives to hurt a minority really bad!

So how can an agenda controller use this strategy to get voters to vote for his favorite policy? The following figure tells it all.

We start with a status quo $x$. This we reflect $x$ along the line $L_1$ through $p_1$ and $p_2$ by playing off voters $v_1$ and $v_2$ against voter $v_3$ to get to a new policy $y_1$. Next, we reflect $y_1$ along $L_3$ by playing off $p_1$ and $p_3$ against $p_2$ to get to $y_2$. Then, we reflect $y_2$ along $L_2$, by playing off $p_2$ and $p_3$ against $p_1$. We can keep going in this fashion as long as we like. Note that in each step we give two of our voters a tiny improvement (we move towards their ideal points by a tiny bit) while at the same time making the situation much worse for the third one (we move much further away from his ideal policy). In total, we make the situation worse in every step! So if we continue this process long enough, everybody will be so unhappy with the current policy ($y_k$ will be so far away from all voters’ ideal policies) that everybody will be happy to agree to the policy $z$ that the agenda controller had in mind all along.

I find this proof striking for a number of reasons. First of all, even from this small example, you get a clear idea how the proof will proceed in the general case (even though you need a couple of technical tools like median hyperplanes). Second, for three voters, the proof requires only an elementary geometric construction that (I hope) does not scare off people who do not feel that comfortable around mathematics. Finally, and this is most remarkable, this proof gives clear instructions how someone who wants to manipulate public decision making should go about it: Play off different groups of voters against each other! The fact that many political decisions are two or higher dimensional provides the necessary room to maneuver.

(In fact, I find it striking how similar the manipulative stragtegy suggested by the geometry of voting is to how many agents on the political stage actually behave. To manipulate public opinion, they try to reduce the public discussion on an issue to a one-dimensional problem. To a choice between two alternatives. They try to hide the fact that real world problems are higher-dimensional.)

Conclusion

I hope to have convinced you that there is merit in looking at politics from a geometric point of view. A natural consequence of this point of view is McKelvey’s theorem, which states that an agenda controller can get voters to agree to anything the agenda controller wants of their own free will. Fortunately, there are no perfect agenda controllers in real world democracies! But McKelvey’s theorem reminds us that we had better exercise the freedom we have to shape the political agenda. Because without active political debate, the right to vote alone does not give the public any power at all.

Q&A

There are a great many things more to say about all this. Here are a few of them.

Budgets are more complicated than this!

Of course a state’s budget is much more complex than the model I suggested above. There are many more parameters to take into account! The advantage of this model is that it captures the gist of the problem while still being low-dimensional enough so that we can draw something!

There are some assumptions missing from McKelvey’s theorem!

That is true. First of all, McKelvey’s theorem only works in dimensions two and up. If different policies all lie on a line, then McKelvey’s theorem fails. The same thing can happen in higher dimensions if the voters’ individual preferences have a particular pattern. For example, if all voters have the exact same ideal, then agenda control won’t work. The technical assumption I omitted in the above statement, is that the median hyperplanes (whatever that is) of the the voters’ ideal policies do not all intersect in a single point. However, this assumption is not critical for the following reason: By slightly perturbing the voters ideal policies we can always guarantee that this assumption is satisfied. Also, the theorem assumes that the set of policies is all of $\mathbb{R}^n$. This is not true in the budget example, as negative values for taxes or spending do not make sense (a negative deficit is fine, though). But generalizations of McKelvey’s theorem take care of this issue (see below). Finally, the theorem assumes an odd number of voters. McKelvey argues, however, that this makes no difference, as the agenda controller can decide to participate in the vote if and only if his vote is required to make the total number of votes odd.

It should also be mentioned that McKelvey’s theorem assumes that all voters use the Euclidean distance to measure which policies they prefer. However, this assumption is not essential as McKelvey showed in a subsequent paper. (See below.)

What generalizations of McKelvey’s theorem are there?

McKelvey generalized his theorem considerably in subsequent work. For example, he showed that if the set of policies is an arbitrary connected subset $X$ of $\mathbb{R}^n$ and each voter measures with his or her own, private notion of distance (which has to be continuous) then, still, the top cycle of the majority rule is going to be all of $X$ (i.e., agenda control works). He concludes, in his own words:

“Regardless of other voters’ preferences, any one voter with complete information of the other voters’ preferences, control of the agenda, and the ability to cast his own vote as he chooses can always construct majority paths to get anywhere in space.”

Is protest really more important than voting?

The provocative catchphrase that “protest is more important than voting” is meant in the following sense: A political system in which people do not shape the political agenda at all but only go to vote is not democratic. In this sense, active participation in the public political discourse is at least as important as going to vote. This is not to be construed as an excuse not to go voting, however!

Is there a mathematical proof that influencing the political agenda is more important than voting?

No. Mathematics makes statements about abstract models. McKelvey’s theorem makes a statement about a particular abstract model of a society in which the people do not influence the political agenda at all and behave in a predictable way. This statement is provably true. But “mathematics” makes no claim whatsoever about what this statement about an abstract model has to do with reality.

How realistic is the model of society assumed in McKelvey’s theorem?

First of all: I am not a political scientist, and I do not know of any empirical research on this subject. My personal opinion, however, is the following: Western democracies such as the USA do not fit the model of McKelvey’s theorem at all. In particular there are a number of important differences:

  1. In practice, all actors on the political stage constantly strive to control the agenda! So McKelvey’s theorem does not apply to practical politics, but it offers an unconventional explanation why control of the agenda is so strongly contested.
  2. The theorem assumes that the agenda controller has complete information about the preferences off all voters. This is not true in practice. However, never before was so much data about personal opinions readily available to anyone with an internet connection. So one could argue that western societies are converging on the condition required for McKelvey’s theorem to apply.
  3. Voters oppinions change over time. The theorem assumes that opinions are constant. I would guess, though, that the theorem could be extended to account for changing opinions.
  4. There are few voting processes where voters only decide to aprove or reject a single proposition. Presidential election in the US fit this description but are too closely tied to personality to admit a reasonable parameterization by points in space. In most other public elections, the voters have to decide between many alternatives, so spatial voting does not fit either. In process of passing a bill in congress fits the model accurately, however there voters are constantly trying to control the agenda. Ballot propositions also fit the model, but again, voters have direct influence on the available propositions through the initiative system.
  5. As mentioned above, the theorem makes a couple of additional technical assumptions. However, I do not think that these are essential, especially with regard to McKelvey generalizations of his theorem.
  6. There is one technical assumption that deserves special mention: continuity. I do not think that it is that much of a constraint to assume that voters preferences are continuous. Rather, I think it is a significant departure from reality to assume that the space of all policies is a continuum in the first place. For a budget, this is simply false: Two budgets cannot differ infinitessimally - they can only differ by cents.
So, while McKelvey’s theorem does not prove anything about real world democracies, I think it still provides insight on how democracies work.

There is no policy that cannot be defeated by any other policy in a majority vote, right?

Exactly! But instead of looking for a policy that cannot be defeated, we can go for the policy that is least likely to be defeated! This is the policy for which the shaded green area in circle picture above is smallest. As it turns out this so-called “strong point” of the majority game lies roughly in the center of all voters’ ideal points. More precisely, it is given by a convex combination of the voters’ preferences, where the weights of the convex combination are given by spatial power indices of the voters. This means that centrist voters have more influence on the strong point than extremist voters. (Owen and Shapley 1989)

Where can I read more on this?

I recommend the excellent article by P.D. Straffin Jr. in the Handbook of Game Theory and Richard McKelvey’s original articles on the subject.

  1. Straffin, P. D. J. (1994). Power and stability in politics. Handbook of Game Theory with Economic Applications Volume 2 (Vol. 2, pp. 1127-1151). Elsevier.
  2. McKelvey, R. D. (1979). General Conditions for Global Intransitivities in Formal Voting Models. Econometrica, 47(5), 1085-1112.
  3. McKelvey, R. D. (1976). Intransitivities in multidimensional voting models and some implications for agenda control. Journal of Economic Theory, 12, 472-482.

Teach-in - Mathematicians against Police Violence

I was shocked by the violence the police here in the Bay Area used against peaceful protesters in recent weeks. Examples include shooting grenades at people lying on the ground and using batons to savagely beat up people for no reason. Apart from being incredibly brutal, I think this kind of police behavior is completely unnecessary and simply unprofessional. It is a vital feature of a democracy that people can protest! And even if camps of protesters need to be removed, there is no reason to use violence to do it!

Last week, Nathan Ilten had the great idea of organizing a teach-in against police violence. Basically, this is a bunch of mathematicians giving lectures in public where everybody can see them. I think this is a great way of raising public awareness for a political issue while at the same time communicating interesting ideas. I have participated in a couple of these in Berlin - we had a lecture in front of the Rotes Rathaus and a lecture in the U-Bahn - and it is great that Nathan took the initiative to make this kind of event happen in Berkeley as well.

The teach-in took place yesterday from 11 to 5 at Dwinelle Plaza on the UC Berkeley campus and it was a resounding success. At the peak, the talks drew crowds of about one hundred people and we had a solid attendance throughout the day. Everything stayed peaceful and we had no trouble with the police at all. (I guess they were busy trying to stare down the people at the Occupy camp on Sproul.) And our event was covered by a local radio station.

We had a wide range of talks, some addressing mathematical aspects of politics, some focusing on mathematical subjects, some targeted at a general audience and some intended for math majors. Our speakers were Chris Manon, Andrew Critch, Charley Crissman, Eugenia Rosu, Piotr Achinger and myself. I do not intend this blog post to be a comprehensive account of the teach-in, so please forgive me when I do not summarize each of their talks. I just want to mention that I particularly enjoyed Andrew’s and Charlie’s talks, because they provided mathematical perspectives on politics, which I think is particularly effective for this type of event. (In fact, “mathematical politics” make a great subject for any kind of general mathematics event, as Svante Linusson’s brilliant talk at the last Bad Math Day demonstrated! Politics never fail to get an audience involved.)

The title of my own talk was Geometry, the Majority Vote and the Power of Agenda Control and its topic was McKelvey’s theorem. McKelvey’s theorem is a surprising result with a beautifully intuitive proof that can be made accessible to everyone. Its morale in this context is this: Protest, i.e., the struggle for the control of the political agenda, is a much more important element of democracy than voting! Ergo: the police should not beat up protesters! If you are interested to learn more about this, stay tuned: I will blog about this tomorrow.

My talk was at 4pm and I think it turned out very well. Despite the late hour I had an interested audience, who stayed involved throughout the talk. I was especially pleased to speak with some non-mathematicians afterward, who just happend to walk by and then decided to stay and listen. They even asked when the next teach-in was going to be! Was will man mehr!

Qute - 0.4 released, big changes to come

Version 0.4 of Qute was released on Monday. 0.4 is a small bugfix release that resolves only the most pressing of the open issues.

From my own experience, I can say that Qute is very usable right now, but it certainly does have a couple of rough edges. However, the current branch of Qute won’t see any further development after 0.4!

Instead, I am going to focus on a complete rewrite of Qute, which will then be released as Qute 0.5. With this rewrite, Qute is going to move away from the simple Markdown+TeX editor it currently is and toward something more adventurous. Here are a couple of features I want to go for:

  • Outlining capabilities. Think Org-Mode or even Leo.
  • Ink support. Think Xournal-style drawings and notes inside your LaTeX documents.
  • Custom translators. Think custom OMeta for each paragraph.
  • Clean architechture. Think new implementation in ClojureScript.
  • Platform independence. Think Qute in your web-browser.

Of course this rewrite is going to take a while. So the purpose of the 0.4 release is to give my users my user something to work with in the meantime. Also, the next few releases won’t be as polished as you might want, simply because I am going to focus my energies on getting all of these juicy features implemented! So please bear with me.

If you want to be notified when Qute 0.5 comes out, you can subscribe to release announcements on freecode.com (formerly freshmeat.net) or follow me on Twitter.

Lecture - Visualising Numbers (Part 2)

Sums of Squares

We have seen last time, that the sum \(\sum_{i=1}^n i\) counts the number of lattice points in triangles with slope 1. What geometric object is associated with \[\sum_{i=1}^n i^2?\]

\(\sum_{i=1}^n i^2\) counts the number of lattice points in pyramids.

figure

Using this insight it is possible to derive the formula \[\sum_{i=1}^n i^2=\frac{n(n+2)(2n+1)}{6}.\]

We will not do that right now, but instead consider pyramids "with rational slope".

Pyramids

pyramids

Notice that by reflection \(P_{a,b;c}=P_{b,a;c}\), which means that these pyramids are symmetric in the first two arguments but not in the last. Now, how many lattice points are in \(P_{a,b;c}?\) To this end, let us intersect the pyramid with a hyperplane, where the last coordinate is fixed.

pyramid slices

The top-right lattice point in each slice has coordinates \((\lfloor \frac{ka}{c} \rfloor, \lfloor \frac{kb}{c} \rfloor, k)\) and so the pyramid contains \[L(P_{a,b;c})= \sum_{k=0}^c (\left\lfloor \frac{ka}{c} \right\rfloor + 1) (\left\lfloor \frac{kb}{c} \right\rfloor + 1)\] lattice points.

In this sum, the "\(+1\)s" are kind of annoying, so can we modify the pyramids so that the lattice point count is exactly \(\sum_{k=0}^{c-1} \left\lfloor \frac{ka}{c} \right\rfloor \cdot \left\lfloor \frac{kb}{c} \right\rfloor\)?

One solution is to remove the hyperplanes given by \(x_1=0\), \(x_2=0\) and \(x-3=c\) from the pyramid. If \(\hat{P}_{a,b;c}\) is the "half-open" pyramid obtained by removing these hyperplanes from \(P_{a,b;c}\), then we have effectively removed one column and one row of lattice points from each "slice" and so, indeed, \[L(\hat{P}_{a,b;c}) = \sum_{k=0}^{c-1} \left\lfloor \frac{ka}{c} \right\rfloor \cdot \left\lfloor \frac{kb}{c} \right\rfloor.\]

Now, let’s try to do the same trick we did last time:

Can we build a simple geometric shape out of pyramids of this type in order to obtain a formula for these sums?

When last time, we built a rectangle out of triangles, this time, we are going to build a right-rectangular prism out of pyramids! (By the way, right-rectangular prims are called quader in German, which is a much shorter word, so I am going to stick to that one.) The idea is this:

quader

Let us consider the easy case first, where \(a,b,c\) are pairwise relatively prime, that is, \[\text{gcd}(a,b)=\text{gcd}(b,c)=\text{gcd}(a,c)=1.\]

How many lattice points are now on the triangles where two of the pyramids intersect?

intersection triangle

In the relatively prime case, the only lattice points on this triangle are the origin and those on the line from \((0,b,c)\) to \((a,b,c)\). But all of these lie on hyperplanes that have been removed from our half-open pyramids! So the intersection of any two half-open pyramids does not contain any lattice points!

This means that in the relatively prime case, we will not have any double counting. Putting the three half-open pyramids together, we obtain the open quader \[\hat{P}_{a,b;c} \cup \hat{P}_{a,c;b} \cup \hat{P}_{b,c;a} = (0,a) \times (0,b) \times (0,c)\] where the union is disjoint, and so we get the formula \[L(\hat{P}_{a,b;c}) + L(\hat{P}_{a,c;b}) + L(\hat{P}_{b,c;a}) = L((0,a) \times (0,b) \times (0,c))\] which yields \[ \sum_{k=0}^{c-1} \left\lfloor \frac{ka}{c} \right\rfloor \cdot \left\lfloor \frac{kb}{c} \right\rfloor + \sum_{k=0}^{b-1} \left\lfloor \frac{ka}{b} \right\rfloor \cdot \left\lfloor \frac{kc}{b} \right\rfloor + \sum_{k=0}^{a-1} \left\lfloor \frac{kb}{a} \right\rfloor \cdot \left\lfloor \frac{kc}{a} \right\rfloor = (a-1)(b-1)(c-1).\]

This is a typical example of a reciprocity theorem. This term stems from the fact that fractions in which numerator and denominator have been exchanged are related, but I would like to use the term for a much wider class of theorems:

A reciprocity theorem is a theorem that puts several complicated functions (that, individually, have no simple form) in a simple relation.

We have been dealing with number theoretic functions here, but reciprocity theorems also exist for combinatorial counting functions, as we will see in a couple of weeks. In both cases, the message is that geometry helps finding reciprocity theorems.

Dedekind sums

Before we deal with the case where \(a,b,c\) are not relatively prime, let us first take a closer look at the sums \(\sum_{k=0}^{c-1} \left\lfloor \frac{ka}{c} \right\rfloor \cdot \left\lfloor \frac{kb}{c} \right\rfloor\). As it turns out these are closely related to so called Dedekind sums, which are important players in number theory.

The Dedekind sum \(s(a,b)\) is defined by \[s(a,b)=\sum_{k=0}^{b-1} \left(\left( \frac{ka}{b} \right)\right) \left(\left( \frac{k}{b} \right)\right)\] where \(((x))\) is the sawtooth function which is \(0\) if \(x\in\mathbb{Z}\) and \(\{x\}-\frac{1}{2}\) otherwise, where \(\{x\} = x - \lfloor x \rfloor\) is the fractional part of \(x\).

Dedekind sums are called Dedekind sums because of Dedekind’s reciprocity law which states that if \(\text{gcd}(a,b)=1\) then \[s(a,b)+s(b,a) = \frac{1}{12} \left( \frac{a}{b} + \frac{b}{a} + \frac{1}{ab} \right) - \frac{1}{4}.\]

Now, the first time I encountered Dedekind sums, I was baffled, because I had no idea what on earth this function was doing. Okay, Dedekind sums are periodic in the sense that \(s(a+b,b)=s(a,b)\) and this explains why they show up in the context of discrete Fourier transformation: Fourier transformation is a tool for the analysis of periodic functions, and Dedekind are the basic building blocks for this class of functions. But this does not answer the question:

What do Dedekind sums count or measure?

First of all, we notice that if \(\text{gcd}(a,b)=1\) the case that \(((x))=0\) if \(x\in\mathbb{Z}\) is never used: all arguments passed to the sawtooth function are fractional. This allows us to rewrite \(s(a,b)\) without the sawtooth function, using only fractional parts:

\[s(a,b) = \sum_{k=0}^{b-1} \left\{\frac{ka}{b}\right\}\left\{\frac{k}{b}\right\} - \frac{b}{2} + \frac{3}{4}\]

Given that \(\{x\}\) and \(\lfloor x \rfloor\) are complementary, you will not be surprised to hear that $_{k=0}^{b-1} \{\}\{\} $ and \(\sum_{k=0}^{c-1} \left\lfloor \frac{ka}{c} \right\rfloor \cdot \left\lfloor \frac{kb}{c} \right\rfloor\) are closely related. Can we see this relation in our pyramid picture?

Let’s look at all the slices of the \(x_3=k\) hyperplanes with our pyramid. The complexity of the set of lattice points in this pyramid is tied to the fact that the top left lattice point in each slice appears "to jump around wildly". While the corners \((\frac{ka}{c},\frac{kb}{c},k)\) of the slices all lie on a line, the lattice points \((\lfloor \frac{ka}{c} \rfloor, \lfloor \frac{kb}{c} \rfloor, k)\) don’t.

jump around

This leads us to the following observation:

The sums \[s'(a,b;c) := \sum_{k=0}^{c-1} \left\{\frac{ka}{c}\right\}\left\{\frac{kb}{c}\right\}\] measure nothing but the area of the small rectangle between \((\lfloor \frac{ka}{c} \rfloor, \lfloor \frac{kb}{c} \rfloor, k)\) and \((\frac{ka}{c},\frac{kb}{c},k)\).

Before we look at a few examples, a short remark on the fractional-part function:

Decomposing a fraction \(\frac{a}{b}\) into its integral and fractional part \[\frac{a}{b} = \left\lfloor \frac{a}{b} \right\rfloor + \left\{ \frac{a}{b} \right\}\] amounts to the same thing as diving \(a\) by \(b\) with remainder: \[a = (a \text{ div } b)\cdot b +(a \text{ mod } b).\]

This gives us \[\left\lfloor\frac{a}{b} \right\rfloor = a \text{ div } b \;\;\;\text{ and }\;\;\; \left\{ \frac{a}{b} \right\} = \frac{a \text{ mod } b}{b}.\]

Of course this is just notation. But to me, it serves as a reminder that we are doing just modular arithmetic here and not dealing with "real" real numbers. Also, we get \[ s'(a,b;c) = \frac{1}{c^2} \sum_{k=0}^{c-1} (ka \text{ mod }c)\cdot(kb\text{ mod } c)\] and can read off that \(c^2\cdot s'(a,b;c)\) is integral - so it counts something!

Let us now visualize \(c^2\cdot s'(a,b;c)\) by drawing the "rectangles" \((ka \text{ mod }c)\cdot(kb\text{ mod } c)\) for \(k=0,\ldots,c-1\). We pick the example \[7^2\cdot s'(5,1;7) = 5\cdot 1 + 3 \cdot 2 + 1 \cdot 3 + 6 \cdot 4 + 4 \cdot 5 + 2 \cdot 6.\]

torus1

How does the "corner" defining the rectangle move when we pass from one column \(k\) to the next? As we are doing modular arithmetic here, we are operating on a torus here. We can see this by identifying the top and bottom, and left and right edges of this square, respectively. Moving from one corner to the next corresponds to translation by the vector \((5,1)\) on the torus. So the corners of these rectangles all lie on one line.

torus2

In the above image, each region is labled with a number. This number gives the number of rectangles this region is contained in. If we now consider all sums \(s'(a,1;7)\) for \(a=1,\ldots,6\) we get all (linear) lines \(\text{mod }7\).

lines mod 7

Note that in the relatively prime case it suffices to consider sums of the form \(s'(a,1;c)\) because \[s'(a,b;c) = s'(ab^{-1},1;c)\] where \(b^{-1}\) denotes the multiplicative inverse of \(b\) modulo \(c\). Can you see this in the picture?

Now, here are two questions for you:

Question. Does this visualization give an immediate geometric proof of Dedekind reciprocity? Why do \(s'(a,1;b)\) and \(s'(b,1;a)\) sum to something simple?

Question. We have proved a reciprocity law for the "pyramid sums" \(L(\hat{P}_{a,b;c})\). Is this reciprocity law equivalent to some classic reciprocity law for Dedekind sums and their generalizations?

For inspiration in this regard, you may want to consult Chapter 8 of Computing the Continuous Discretely by Matthias Beck and Sinai Robins.

Pyramids with Lattice Points on the Boundary

I promised to work out the reciprocity law for \(L(\hat{P}_{a,b;c})\) when \(a,b,c\) are not relatively prime. To this end we need to work out the number of lattice points in the triangle \[P_{a,b;c} \cap P_{a,c;b} =\text{conv}( \;(0,0,0), \; (0,b,c) ,\; (a,b,c) \; ).\]

We want to find a lattice transformation that maps this triangle into one of our coordinate hyperplanes, so that we can apply last lecture’s formula for counting lattice points. Let’s look at this triangle from the side, along the \(x_1\)-axis. Then it looks just like a line from \((0,0,0)\) to \((0,b,c)\). This line contains \(\text{gcd}(b,c)+1\) lattice points. The "first" lattice point after the origin has coordiantes \((0,\frac{b}{\text{gcd}(b,c)},\frac{c}{\text{gcd}(b,c)})\). We would now like to map this line onto the (in this picture) horizontal axis, such that \((0,b,c)\) is mapped to \((0,0,\text{gcd}(a,b))\), so that the \(\text{gcd}(b,c)+1\) lattice points on the line are mapped to \(\text{gcd}(b,c)+1\) adjacent lattice points on the \(x_3\)-axis. We do not want to affect the \(x_1\)-coordinate at all, so we, similarly, want to map \((a,b,c)\) to \((a,0,\text{gcd}(b,c))\).

lattice transformation

Let’s work out the matrix for this lattice transformation. We want \[\begin{pmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{pmatrix} \cdot \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a \\ 0 \\ \text{gcd}(b,c) \end{pmatrix}.\]

As the first coordinate is supposed to remain invariant, we put \[\begin{pmatrix} 1 & 0 & 0 \\ 0 & ? & ? \\ 0 & ? & ? \end{pmatrix} \cdot \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a \\ 0 \\ \text{gcd}(b,c) \end{pmatrix}.\]

How can we combine \(b\) and \(c\) to get \(\text{gcd}(b,c)\)? This is precisely what the extended Euclidean algorithm does for us. It computes integers \(x\) and \(y\) such that \(xb+yc=\text{gcd}(b,c)\) which gives us \[\begin{pmatrix} 1 & 0 & 0 \\ 0 & ? & ? \\ 0 & x & y \end{pmatrix} \cdot \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a \\ 0 \\ \text{gcd}(b,c) \end{pmatrix}.\]

Now we want to fill in the second row such that \(b\) and \(c\) cancel out. A good first idea is \[\begin{pmatrix} 1 & 0 & 0 \\ 0 & -c & b \\ 0 & x & y \end{pmatrix} \cdot \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a \\ 0 \\ \text{gcd}(b,c) \end{pmatrix}.\]

But this matrix does not have all the properties we would like. We also want it to be a lattice transform, so in particular, its determinant has to be \(\pm 1\). Currently, however, the determinant is \[-cy-bx=-\text{gcd}(b,c).\] So what we have to do is divide those two entries by \(\text{gcd}(b,c)\) to give us our final matrix:

\[\begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{c}{\text{gcd}(b,c)} & \frac{b}{\text{gcd}(b,c)} \\ 0 & x & y \end{pmatrix} \cdot \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a \\ 0 \\ \text{gcd}(b,c) \end{pmatrix}.\]

This is a lattice transform and it maps

\[\begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{c}{\text{gcd}(b,c)} & \frac{b}{\text{gcd}(b,c)} \\ 0 & x & y \end{pmatrix} \cdot \begin{pmatrix} 0 \\ \frac{b}{\text{gcd}(b,c)} \\ \frac{c}{\text{gcd}(b,c)} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\]

and

\[\begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{c}{\text{gcd}(b,c)} & \frac{b}{\text{gcd}(b,c)} \\ 0 & x & y \end{pmatrix} \cdot \begin{pmatrix} 0 \\ -y \\ x \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.\]

So the lattice point \((0,-y,x)\) corresponds to the standard unit vector \((0,1,0)\) in this bijection. Can we interpret this somehow?

Let’s look at an example. Let \(b=10\) and \(c=14\) such that \(\text{gcd}(b,c)=2\). Then \[ 3 \cdot 10 - 2 \cdot 14 = 2 \] so $x= 3 $ and \(y=-2\) and the mysterious lattice point is \((0,2,3)\).

lattice basis example

In this picture there are two things to notice:

  1. Consider all translates of the line through \((0,0)\) and \((10,14)\) that go through some lattice point. \((2,3)\) lies on the translate closest to the original.
  2. The vectors \((2,3)\) and \((5,7)\) span a parallelogram of volume 1.

These are no accidents. \((2,3)\) and \((5,7)\) form a lattice basis. And the task of finding a lattice transformation to the standard basis is equivalent to finding a lattice basis. Let us make these notions precise.

Lattice Bases

Recall that a lattice transformation \(f:\mathbb{R}^d\rightarrow\mathbb{R}^d\) is an affine isomorphism that, when restricted to the integer lattice \(\mathbb{Z}^d\), induces a bijection (i.e., a 1-to-1 and onto map) on the integer lattice \(f|_{\mathbb{Z}^d}: {\mathbb{Z}^d} \rightarrow {\mathbb{Z}^d}\). Now, a lattice basis is a set \(a_1,\ldots,a_d\in\mathbb{Z}^d\) of linearly independent integer vectors such that for every \(z\in\mathbb{Z}^d\) there exist integers \(\lambda_1,\ldots,\lambda_d\in\mathbb{Z}\) such that \[\sum_{i=1}^d \lambda_i a_i = z.\]

These notions are closely related. For linearly independent integer vectors \(a_1,\ldots,a_d\in\mathbb{Z}^d\) the following are equivalent:

  • \(a_1,\ldots,a_d\) form a lattice basis.
  • There exists a lattice transformation \(f\) such that \(f(a_i)=e_i\) for \(i=1,\ldots,d\) where \(e_i\) are the standard unit vectors.
  • The fundamental parallelepiped \(\Pi_{a_1,\ldots,a_d}\) contains exactly one lattice point \(z\in\mathbb{Z}^d\)
Here, the fundamental parallelepiped \(\Pi_{a_1,\ldots,a_d}\) spanned by \(a_1,\ldots,a_d\) is the set
\[\Pi_{a_1,\ldots,a_d}= \{ x\in\mathbb{R}^d \; | \; x = \sum_{k=1}^d \lambda_i a_i \text{ for } 0 \leq \lambda_i < 1 \}.\]
parallelepiped

The crucial ingredient to the proof of the above equivalences is the following theorem.

Theorem. Let \(a_1,\ldots,a_d\in\mathbb{Z}^d\) be linearly independent integer vectors. Let \(A=(a_1 \; \ldots \;a_d)\) be the matrix with the \(a_i\) as columns. Then \[L(\Pi_{a_1,\ldots,a_d}) = \text{Volume}(\Pi_{a_1,\ldots,a_d}) = |\det(A)|.\]

Now, as we will see, it is always the case that the lattice point count approximates the volume in a certain sense. But for fundamental parallelepipeds we have equality here! The reason is the following.

Sketch of Proof. First of all we make the approximation argument a bit more precise, by arguing that

\[\lim_{k\rightarrow\infty} \frac{1}{2^{kd}}|\frac{1}{2^k}\mathbb{Z}^d \cap \Pi_{a_1,\ldots,a_d}| = Volume(\Pi_{a_1,\ldots,a_d}).\]

Instead of counting lattice points in some set \(X\), we can also measure the volume of a set of unit cubes, that each have one of these lattice points at their center. Intuitively, this is a rough approximation of the volume of the set.

Next, we refine this approximation in the following way. We refine the lattice \(\mathbb{Z}^d\) by scaling it by \(\frac{1}{2}\). Now we play the same game with the cubes, measuring the volume of a set of cubes, each with one of the lattice points in \(X\) at its center. To make sure that the cubes don’t overlap we have to scale their sidelengths by \(\frac{1}{2}\) as well, which scales their volume by \(\frac{1}{2^d}\).

approximation

If we continue in this fashion, we our cube set in the \(k\)th step will have volume \(\frac{1}{2^{kd}}|\frac{1}{2^k}\mathbb{Z}^d \cap \Pi_{a_1,\ldots,a_d}|\). And we can see that the volume of this cube set approximates the volume of the parallelepiped, in the same way that the number of pixels, multiplied by the area of a pixel, will approximate the area of a parallelogram drawn on a (very high resolution) computer screen.

So far, this works for all polytopes, not just for the parallelepiped. But now comes the curcial property of the fundamental parallelepiped.

\[ L(k\cdot\Pi_{a_1,\ldots,a_d}) = k^d \cdot L(\Pi_{a_1,\ldots,a_d}) \]

The reason is simply that we can tile the \(k\)-th dilate of \(\Pi_{a_1,\ldots,a_d}\) by \(k^d\) integral translates of \(\Pi_{a_1,\ldots,a_d}\). Because these are integral translates, they all contain the same number of lattice points as the original fundamental parallelepiped.

tiling

Now all we have to observe is that refining the lattice is the same as dilating our parallelepiped. Then, our proof is simply this:

\[ L(\Pi_{a_1,\ldots,a_d}) =\lim_{k\rightarrow\infty} \frac{1}{2^{kd}} L(2^k\cdot \Pi_{a_1,\ldots,a_d}) = \lim_{k\rightarrow\infty} \frac{1}{2^{kd}} |\frac{1}{2^k}\mathbb{Z}^d \cap \Pi_{a_1,\ldots,a_d}| = \text{Volume}(\Pi_{a_1,\ldots,a_d}).\]

Conclusion

After this detour, let us finally put our reciprocity theorem in the non-relatively prime case together.

Using our lattice transformation, we have seen that \[L(\text{conv}( \;(0,0,0), \; (0,b,c) ,\; (a,b,c) \; )) = L(\text{conv}( \;(0,0,0), \; (0,0,\text{gcd}(b,c)) ,\; (a,0,\text{gcd}(b,c)) \; ))\]

Using our formula from the last lecture, we conclude that

\[L(\text{conv}( \;(0,0,0), \; (0,0,\text{gcd}(b,c)) ,\; (a,0,\text{gcd}(b,c)) \; )) = \frac{(a+1)\text{gcd}(b,c) + \text{gcd}(a,b,c) + 1}{2}\]

We now have to adjust for the fact that the pyramids \(\hat{P}_{a,b;c}\) are half open. In particular, the two "rectangular" sides have been removed (when intersecting half open pyramids). This gives us

\[ L(\hat{P}_{a,b;c}\cap \hat{P}_{a,c;b}) = \frac{(a+1)\text{gcd}(b,c) + \text{gcd}(a,b,c) + 1}{2} - a - \text{gcd}(b,c) - 1\]

which simplifies to

\[ L(\hat{P}_{a,b;c}\cap \hat{P}_{a,c;b}) = \frac{1}{2}\left((\text{gcd}(b,c) -2) a - \text{gcd}(b,c) + \text{gcd}(a,b,c) -1\right).\]

These observations also allow us to compute the number of lattice points on the diagonal

\[ L(\hat{P}_{a,b;c}\cap \hat{P}_{a,c;b} \cap \hat{P}_{b,c;a}) = \text{gcd}(a,b,c) -1. \]

Now we can out all of these formulas together by applying inclusion-exclusion.

\[ \begin{eqnarray} (a-1)(b-1)(c-1) & = & L(\hat{P}_{a,b;c}) + L(\hat{P}_{a,c;b}) + L(\hat{P}_{b,c;a}) \\ && - L(\hat{P}_{a,b;c}\cap \hat{P}_{a,c;b}) - L(\hat{P}_{a,b;c}\cap \hat{P}_{b,c;a}) - L(\hat{P}_{a,c;b}\cap \hat{P}_{b,c;a}) \\ && + L(\hat{P}_{a,b;c}\cap \hat{P}_{a,c;b} \cap \hat{P}_{b,c;a}) \end{eqnarray} \]

This yields

And all of this, just because we fit three pyramids together to form a quader!

While the first two lectures focused on two examples of how visualizing numbers can yield actual formulas, we are going to start with some theory in the next lecture and talk about polytopes. So you do not have to worry: the next lecture is not about \(\sum_k k^3\)! Here is another question, though:

Can you work out the classic formula for \(\sum_k k^d\) by putting \(d\) pyramids over a \(d-1\) cube together?

Lecture - Visualising Numbers (Part 1)

Gauss’ sum formula

One of mathematic’s most well-known anecdotes tells a story about Carl Friedrich Gauss when he was still a boy. At the Volksschule in Braunschweig, his math teacher gave the class the assignment of summing all integers from 1 through 100, with the intent of keeping them busy for a while. But as soon as the teacher had posed this problem, young Gauss wrote down his answer and shouted: "There it is!" To everyone’s astonishment he had written down the correct number: 5050.

What had happend was that Gauss had refused to churn numbers, the way his teacher had intended him to. Instead he had done mathematics! He had come up with the formula \[\sum_{i=1}^{n} i = \frac{n(n+1)}{2}\] and applied it to the case \(n=100\) to get his result.

This formula was known at the time - but not to Gauss. So the natural question is to ask, how Gauss had come up with this formula. Proving this formula may be a routine exercise. But where had his idea come from that something like this might hold?

Of course, I do not know what went on in Gauss’ head and I certainly agree that Gauss was one of the greatest mathematicians of all times. But what I would like to argue here is this:

It does not take a genius to come up with formula. All it takes is some intuition about numbers.

Here I would like to describe one such intuition that leads to this formula.

We visualize the number \(i\) simply as a set of \(i\) points. For convenience we place all these points in one column. Taking the sum now simply means taking the disjoint union of these sets of points. In case of the sum \(\sum_{i=1}^n i\) this yields a triangle of points.

figure

Now, we have no idea (yet) how to count point in triangles. But we have a very good idea of how to count points in rectangles: that’s just what multiplication is for! So we take two instances of this triangle and fit them together such that they form a rectangle of \(n\) by \(n+1\) points.

figure

This shows \[2\cdot \sum_{i=1}^n i = n(n+1)\] which implies Gauss’ formula.

Note: A video in which I tell just this story in German can be found here.

Lattice Points in Triangles

Crucial to the above argument was that we arranged the points in a regular pattern. In fact, we used points \(x\) in the lattice \(\mathbb{Z}^2\). Throughout these lectures a lattice point or integer point will be an element of the integer lattice \(\mathbb{Z}^n\) for some \(n\). There are other lattices, but we will only work with this one. For convenience, we will write \(L(X)= |\mathbb{Z}^n\cap X|\) for the number of lattice points in some set \(X\subset\mathbb{R}^n\).

In the above proof, we used the set of lattice points contained in the triangle
figure

The two "instances" of this triangle fit together so nicely, because the line \(x_1=x_2\) corresponding to the inequality \(x_1\geq x_2\) has slope 1. Thus, whenever we move one column to the right, the number of lattice points in the column always changes by one. What happens when we consider triangles with more general (rational) slopes?

Let \(T_{a,b}\) denote the triangle with slope \(\frac{b}{a}\) for \(a,b\in\mathbb{N}\).

figure

Here, the "steps" we take from one column to the next become irregular. Sometimes we go one lattice point up, sometimes we don’t. But before we take another look at the structure of this lattice point set, let us simply count how many lattice points there are.

Counting Lattice Points

Counting the number of lattice points in \(T_{a,b}\) is easy enough.

\[ L(T_{a,b}) = |\mathbb{Z}^2 \cap T_{a,b}| = \sum_{k=0}^a \left( \left\lfloor \frac{kb}{a} \right\rfloor + 1 \right)\]

But what we really want is a simple expression for the sum on the right hand side. Let us try a variation of the previous trick to find one!

figure We can fit two "instances" of the triangle \(T_{a,b}\) together to form the rectangle \(R=[0,a]\times[0,b]\). All lattice points in \(R\) are contained in exactly one of the two triangles, except for the lattice points on the diagonal
\[D_{a,b}=\text{conv} \; \{ \;\begin{pmatrix} 0\ 0\end{pmatrix},\; \begin{pmatrix}a\ b\end{pmatrix} \;\}\;\]

The lattice points on the diagonal \(D\) are contained in both triangles and are thus counted twice in the following sum.

So how many lattice points are on the diagonal?

Lemma. The number of lattice points on the diagonal \(D_{a,b}\) is the greatest common divisor of \(a\) and \(b\) plus one, that is, \[L(D_{a,b}) = \text{gcd}(a,b) +1.\]

Proof. We make the following observations.

  1. The lattice points on the diagonal parition the diagonal into intervals of equal length.
figure

This follows simply from the fact that the difference between to lattice points is an integer vector and translation by an integer vector maps lattice points to lattice points.

  1. Let \((p,q)\in D\cap\mathbb{Z}^2\) such that \(p>0\) is minimal. Let \(k\) be the number of intervals into which \(D\) is partitioned. Then \[kp = a \;\;\; \text{and} \;\;\; kq = b.\] So \(k\) is a common divisor of \(a\) and \(b\).
  1. Conversely, let \(k'\) be a common divisor of \(a\) and \(b\). This means, there exist \(p',q'\in \mathbb{Z}^2\) such that \[ k'p' = a \;\;\; \text{and} \;\;\; k'q' = b,\] which implies \(\begin{pmatrix} p \\ q\end{pmatrix}=\frac{1}{k'}\begin{pmatrix} a \\ b\end{pmatrix},\) which in turn means that \((p',q')\) is a lattice point on the diagonal \(D_{a,b}\).

As \(p\) was chosen minimal, we conclude that \(p' \geq p\) and thus \(k' \leq k\), which means that \(k\) is the greatest common divisor of \(a\) and \(b\). The number of lattice points on the diagonal is \(k+1\). QED.

Great! Now we have both: a visual interpretation of the \(\text{gcd}\) and the means to complete our formula!

\[\sum_{k=0}^a \left( \left\lfloor \frac{kb}{a} \right\rfloor + 1 \right) = \frac{(a+1)(b+1) + \text{gcd}(a,b) + 1}{2}\]

Note that by symmetry we also get

\[\sum_{k=0}^b \left( \left\lfloor \frac{ka}{b} \right\rfloor + 1 \right) = \frac{(a+1)(b+1) + \text{gcd}(a,b) + 1}{2}.\]

This corresponds to the fact that the lattice point sets in \(T_{a,b}\) and \(T_{b,a}\) are "the same". Let us make this notion precise.

Lattice Isomorphism

We want to answer questions such as "How many lattice points are in a set \(S\)?" From this point of view, we want to see two sets as "the same" when there is a linear automorphism from the ambient space to itself that induces a bijection on the lattice. Then, applying the automorphism does not changes the number of lattice points in any given set.

So, a lattice isomorphism is an affine linear map \(f:\mathbb{R}^n\rightarrow\mathbb{R}^n\) of the form \(f(x)=Ax+b\) that induces a bijection from \(\mathbb{Z}^n\) onto itself. We say set \(X\) is a lattice transform of a set \(Y\) if there exists a lattice isomorphism \(f\) with \(f(X)=Y\).

Lattice isomorphisms are very useful objects. We have already made implicit use of the fact that translations by an integer vector and reflections at the origin are lattice isomorphisms. And our observation that \(T_{a,b}\) and \(T_{b,a}\) have "the same" lattice point structure can be made precise by saying that

\[ x \;\;\; \longmapsto \;\;\; \begin{pmatrix} 0 & -1 \ -1 & 0 \end{pmatrix} \cdot x + \begin{pmatrix} b \ a \end{pmatrix} \]

is a lattice isomorphism.

We will need one more lattice isomorphism in a minute. And to show that it is a lattice isomorphism, we will make use of the following tool.

Lemma. The map \(x\mapsto Ax +b\) is a lattice isomorphism if and only if \(A\) and \(b\) are integral and \(\det(A) = \pm 1\).

The proof is not difficult, and we omit it for now.

Structure of \(\mathbb{Z}^2\cap T_{a,b}\)

We now know how many lattice points there are in \(T_{a,b}\). But what we would really like to understand is the structure of this set of lattice point. When we look at larger examples of these triangles than we see that this "staircase" of points appears "irregular" but far from "random". What is going on here?

One way to approach this question is by investigating the recursive structure of these lattice point sets. We can ask:

What subset of \(\mathbb{Z}^2\cap T_{a,b}\) is particularly easy to describe?

As we have seen in the example of Gauss’ sum formula, triangles with integral slopes are particularly easy to describe.

Let us suppose that \(a < b\). This means that the slope \(\frac{b}{a}\) of the triangle is "steep". In this case, we can find a subset of the lattice points which corresponds to a triangle with integral slope. Or, more precisely, \[\mathbb{Z}^2\cap T_{a,a} \subset \mathbb{Z}^2\cap T_{a,b}.\]

Let us split the triangle \(T_{a,b}\) into two parts. A lower part with integral slope that is easy to describe and an upper part that we have yet to make sense of. Both parts meet at the line \(x_1=x_2\). For convenience, we decide that this boundary should belong to the upper part and it should be removed from the lower part. The lower part then becomes a half-open triangle of the form

\[T'_{a,a} = \{x : x_1\leq a, x_2\geq 0, x_1> x_2\}.\]
figure

In this way we can partition \(T_{a,b}\) such that

the lower part \(\mathbb{Z}^2\cap T'_{a,a}\) has a simple form and precisely \(\frac{a(a+1)}{2}\) lattice points.

Good. What about the upper part \(T_{a,b}\setminus T'_{a,a}\)?

We would like to see that this upper part is again a triangle of the form we have been studying all the time, so that we apply recursion. To that end we need to transform the upper part so as to create a right angle at the vertex on the bottom-right. To achieve this we might try to shear the triangle. But is the resulting linear transformation a lattice isomorphism?

figure As we can see the matrix of the linear transformation is simply
\[A = \begin{pmatrix} 1 & 0 \ -1 & 1 \end{pmatrix}.\]

As \(\det(A)=1\) we conclude that this transformation is a lattice isomorphism and thus:

The upper part \(T_{a,b}\setminus T'_{a,a}\) is a lattice transform of \(T_{a,b-a}\).

So we can simplify "steep" triangles until we obtain a "flat" triangle with slope \(<1\). But by our observation that

\(T_{a,b}\) is a lattice transform of \(T_{b,a}\)

we can turn "flat" triangles into "steep" triangles and continue with this method of simplification! Equivalently, we can handle flat triangles by partitioning them differently and shearing "left" instead of "down".

figure

We have thus obtained a recursive description of the sets of lattice points in the triangles \(T_{a,b}\).

reduction

This allows us to decompose the set of lattice points in a large triangle \(T_{a,b}\) into lattice transforms of "simple" sets of lattice points.

figure

(Note: There are other ways of making sense of this "staircase" of lattice points. If you got curious, I would like to advertise Fred’s and my article on "Staircases in \(\mathbb{Z}^2\)".)

As a side-effect, this recursion also allows us to count the lattice points in \(T_{a,b}\) recursively.

Theorem. Let \(a,b\in\mathbb{N}\). Then:

  1. If \(a>b\) then \(L(T_{a,b})=L(T_{a-b,b}) + \frac{b(b+1)}{2}\).
  2. If \(a < b\) then \(L(T_{a,b})=L(T_{a,b-a}) + \frac{a(a+1)}{2}\).
  3. \(L(T_{a,a})=\frac{(a+1)(a+2)}{2}\).

Apart from the structural insight, what does the recursion for \(L(T_{a,b})\) buy us? With \[L(T_{a,b})=\frac{(a+1)(b+1) + \text{gcd}(a,b) + 1}{2}\] we already have a very nice closed form expression for \(L(T_{a,b})\), don’t we?

Well, not quite. \(\text{gcd}(a,b)\) is no closed form expression. To compute \(\text{gcd}(a,b)\) we need to use the Euclidean Algorithm. In fact, I would like to argue that the above recursion is essentially the same as the Euclidean Algorithm.

Visualizing the Euclidean Algorithm

I would like to close this lecture by arguing that the recursion we developed in the previous section is nothing but a visualization of the Euclidean Algorithm. A simple description of the Euclidean Algorithm (which he called "antenaresis") is this:

"[The greatest common divisor] can be found using antenaresis by repeatedly subtracting the smaller, whichever it happens to be at the time, from the larger until the smaller divides the larger." Euclid’s Elements

When we apply the above recursion to compute \(L(T_{a,b})\), we do exactly the same thing, as far as the pair \((a,b)\) is concerned! If \(a>b\) we pass to \((a-b,b)\) and if \(a < b\) we pass to \((a,b-a)\). Even the stopping criteria are essentially the same, even though they look different at first glance.

So here is a "geometric interpretation" of the Euclidean Algorithm:

  1. We want to compute the greatest common divisor of \(a\) and \(b\). As we have seen this amounts to computing the number of lattice points on the diagonal \(D_{a,b}\) of the rectangle \([0,a]\times[0,b]\).
  2. To achieve this, we apply lattice transformations. These do not change the number of lattice points on the diagonal.
  • If \(a > b\) we shear the diagonal "left" and pass to \(D_{a-b,b}\).
  • If \(a < b\) we shear the diagonal "down" and pass to \(D_{a,b-a}\).
  1. We stop as soon as the number of lattice points on the diagonal can be read of immediately. For example, if we reach a line of the form \(D_{a,a}\) or \(D_{a,ka}\) or \(D_{a,0}\).
figure

So in a way,

all the Euclidean Algorithm does is compressing a set of lattice points by means of lattice transformations!

For our two formulas for \(L(T_{a,b})\) this means that both use the same type of recursion. This of course begs the question:

Is there a faster way to compute the \(\text{gcd}\) of two integers? Is there maybe even a "closed form" expression for \(\text{gcd}\)?

As it turns out, these questions are closely related to a long standing open problem in complexity theory:

Is the problem of computing the \(\text{gcd}\) in the complexity class \(\mathbf{AC}\)?

Informally, this is asking whether there exists a family of boolean circuits with polynomial size, polylogarithmic-depth and unbounded fan-in that computes the \(\text{gcd}\). See the Complexity Zoo for more information.

Outlook

This is it for today! Next time, we will continue visualizing numbers and venture into the world of Dedekind sums.