This article should be read as a collection of ideas for reducing research debt in the sense of Chris Olah and Shan Carter's article "Research Debt".
The last true polymath is said to have been Gottfried Wilhelm Leibniz, with countless contributions to logic, philosophy, mathematics, engineering, psychology, physics, law and various other fields. Few of us can claim a similarly broad expertise; moreover, most researchers would label themselves not as mathematicians but as topologists, probabilists or computational scientists, and not as physicists but as experimental physicists or string theorists. Has mankind become more stupid? Arguably, but another factor is definitely more significant.
If Leibniz's endeavours are likened to climbing all mountains in the Austrian Alps (which is no mean feat but possible given sufficient fitness and perseverance), a modern scientist aspiring to be a true polymath faces a challenge similar to that of scaling every elevation higher than five hundred meters on Earth as well as on any known celestial object in the solar system. The large amount of knowledge that humanity has accumulated so far has made it virtually impossible to be an expert in more than one discipline.
This also affects students aspiring to become researchers themselves: The long climb to the peak of current research becomes longer and more tedious with every new result stacked on top of the mountain of science. Taken to its logical conclusion, this means that, if nothing changes, there is a natural limit to research at the point where getting up to speed with current research takes a whole lifetime. All peaks higher than that constitute impossible climbs (and impossible scientific breakthroughs).
There are two ways out of this dilemma. First, we can narrow down specializations even further, so that there is less relevant material to learn before novel research can be carried out. But this can only buy us so much time, and it seems plausible that there is a limit to how narrow a specialization can become while still attracting people to scientific research. Also, a forest of disconnected ivory towers that cannot communicate with each other generates its own problems: Techniques that are useful for several disciplines must be developed by each discipline separately, and multidisciplinary projects become even rarer.
A second way is to speed up the learning process.
Cairns, Gown and Collins
The authors argue that this method for problem solving can be applied verbatim to learning mathematics, and that a teacher of mathematics (or someone writing a textbook or a scientific article) should present her material so that the reader will (perhaps unknowingly) follow these four steps.
Translated into the language of teaching mathematics, these four steps can be reformulated.
A prime example of this is Jonathan Shewchuk's paper "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain":
The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written with neither illustrations nor intuition, and their victims can be found to this day babbling senselessly in the corners of dusty libraries. For this reason, a deep, geometric understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Nevertheless, the Conjugate Gradient Method is a composite of simple, elegant ideas that almost anyone can understand. Of course, a reader as intelligent as yourself will learn them almost effortlessly.
[...] I have taken pains to make this article easy to read. Sixty-six illustrations are provided. Dense prose is avoided. Concepts are explained in several different ways. Most equations are coupled with an intuitive interpretation.
The article keeps its promise and truly avoids any agonizing pain. It relies on geometric intuition to introduce the concept of conjugate gradients while essentially following Polya's four steps: The author first builds intuition for why naive gradient descent (steepest descent) is suboptimal and then sketches the basic idea of conjugate gradients. He proceeds by carrying out the derivation of the algorithm (still for quadratic forms) and finally reflects on its variant for nonlinear optimization.
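To make the contrast concrete, here is a minimal pure-Python sketch (our own code, not Shewchuk's) that runs steepest descent and the conjugate gradient method on a small symmetric positive-definite system. We believe A = [[3, 2], [2, 6]], b = [2, -8] matches the running example in Shewchuk's paper; its solution is x = (2, -2). Both methods minimize the quadratic form f(x) = ½ xᵀAx - bᵀx, whose negative gradient is the residual r = b - Ax. The helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def residual(A, b, x):
    # r = b - A x, which is also the negative gradient of f at x
    return [bi - axi for bi, axi in zip(b, matvec(A, x))]


def steepest_descent(A, b, x, tol=1e-10, max_iter=10_000):
    """Always step along the current residual, with an exact line search."""
    for k in range(max_iter):
        r = residual(A, b, x)
        rr = dot(r, r)
        if rr ** 0.5 < tol:
            return x, k
        alpha = rr / dot(r, matvec(A, r))  # exact minimizer of f along r
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter


def conjugate_gradient(A, b, x, tol=1e-10, max_iter=10_000):
    """Each new search direction is made A-conjugate to the previous ones,
    so in exact arithmetic an n x n system is solved in at most n steps."""
    r = residual(A, b, x)
    d = list(r)
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            return x, k
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]  # updated residual
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]  # new direction
        rr = rr_new
    return x, max_iter


if __name__ == "__main__":
    A, b = [[3.0, 2.0], [2.0, 6.0]], [2.0, -8.0]
    print("steepest descent:", steepest_descent(A, b, [-2.0, -2.0]))
    print("conjugate gradient:", conjugate_gradient(A, b, [-2.0, -2.0]))
```

Starting from (-2, -2), CG should stop after two iterations (the dimension of the system), while steepest descent zigzags through many more iterations before reaching the same tolerance.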
I believe that this four step system as presented in
In his 2012 paper
Lamport proposes to use a hierarchical proof structure. He writes, "When one reads a sentence in a prose proof, it is often unclear whether the sentence is asserting a new fact or justifying a previous assertion; and if it asserts a new fact, one has to read further to see if that fact is supposed to be obviously true or is about to be proved."
He presents a proof from Spivak's Calculus book
Lamport proposes to structure any proof by introducing sub-steps, which are themselves sub-proofs, and to iterate this subdivision until each sub-sub-proof is trivial. The figure above shows Lamport's expanded version (on the right-hand side). It clearly marks all the "ins and outs" of every step of the proof. Now a reader may struggle with step 2: Applying the Mean Value Theorem requires the function to be continuous, but the reader only knows that it is differentiable. This means we need a sub-proof of the fact that every differentiable function is continuous. At this point we can either refer to a lemma that has proven this fact earlier, or we can follow Lamport's suggestion and insert a sub-proof:
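A sketch of what such a sub-proof could look like in Lamport's numbered style, where ⟨1⟩1 denotes the first step at level 1 (the wording of the steps is ours, not Lamport's):

```latex
\textbf{Lemma.} If $f$ is differentiable at $x$, then $f$ is continuous at $x$.

$\langle 1\rangle 1.$ For $h \neq 0$: $f(x+h) - f(x) = \frac{f(x+h) - f(x)}{h} \cdot h$.\\
\quad PROOF: Multiply and divide by $h \neq 0$.

$\langle 1\rangle 2.$ $\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = f'(x)$ and $\lim_{h \to 0} h = 0$.\\
\quad PROOF: The first limit is the definition of $f'(x)$; the second is trivial.

$\langle 1\rangle 3.$ $\lim_{h \to 0} \bigl(f(x+h) - f(x)\bigr) = f'(x) \cdot 0 = 0$.\\
\quad PROOF: By $\langle 1\rangle 1$, $\langle 1\rangle 2$ and the product rule for limits.

$\langle 1\rangle 4.$ Q.E.D.\\
\quad PROOF: By $\langle 1\rangle 3$, $\lim_{h \to 0} f(x+h) = f(x)$, which is continuity of $f$ at $x$.
```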
A hierarchical proof can be implemented in at least three ways.
Here is a demonstration of how an interactive hierarchical structure could work on a webpage:
Corollary: If $f'(x)>0$ for all $x$ in an interval $I$, then $f$ is increasing on $I$.
Proof:
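The interactive demonstration does not survive in a static copy. A minimal sketch of the data structure that could sit behind such a widget, assuming each step of a proof is either justified in one line or proven by sub-steps: `render` expands sub-proofs only down to a chosen depth and marks deeper ones as collapsed with "[+]", the way a click-to-expand page would. The class `Step`, the function `render` and the wording of the steps (our paraphrase of a standard Mean Value Theorem proof of the corollary, not Lamport's text) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a hierarchical proof: a claim plus either a one-line
    justification or a list of sub-steps that prove it."""
    claim: str
    justification: str = ""
    substeps: list[Step] = field(default_factory=list)


def render(steps, depth, level=1):
    """Return the proof as text lines, labelled <level>number as in Lamport's
    style. Sub-proofs deeper than `depth` are collapsed and marked "[+]"."""
    lines = []
    indent = "  " * (level - 1)
    for i, step in enumerate(steps, start=1):
        collapsed = bool(step.substeps) and level >= depth
        lines.append(f"{indent}<{level}>{i}. {step.claim}" + (" [+]" if collapsed else ""))
        if step.substeps and not collapsed:
            lines.extend(render(step.substeps, depth, level + 1))
        elif step.justification:
            lines.append(f"{indent}  PROOF: {step.justification}")
    return lines


# Corollary: if f'(x) > 0 for all x in an interval I, then f is increasing on I.
COROLLARY = [
    Step("Let a < b be points of I. It suffices to show f(a) < f(b).",
         "Definition of 'increasing'."),
    Step("There is a c in (a, b) with f(b) - f(a) = f'(c)(b - a).", substeps=[
        Step("f is continuous on [a, b].",
             "[a, b] lies in I, f is differentiable on I, and differentiable functions are continuous."),
        Step("f is differentiable on (a, b).", "By hypothesis f'(x) exists for all x in I."),
        Step("Q.E.D.", "Mean Value Theorem."),
    ]),
    Step("f'(c)(b - a) > 0.", "f'(c) > 0 by hypothesis, and b - a > 0."),
    Step("Q.E.D.", "By <1>2 and <1>3, f(b) - f(a) > 0."),
]

if __name__ == "__main__":
    print("\n".join(render(COROLLARY, depth=1)))
    print()
    print("\n".join(render(COROLLARY, depth=2)))
```

With `depth=1` the four top-level steps are shown and step <1>2 is collapsed; `depth=2` expands it into its sub-proof, including the continuity argument discussed above.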
Mathematical intuition seems to be a contradiction in terms. Intuition is a quick and dirty answering tool shaped by our everyday experience: When we show someone a video of a person throwing a ball and stop the video a few seconds into the ball's trajectory, we will get a roughly correct answer to "Where will it land?". We have an intuition for approximately where a ball thrown by hand will land, even without doing any calculations. On the other hand, we have no intuition for the dynamics of a ball thrown at 90% of the speed of light, because such situations lie outside of what we see every day.
In my experience, mathematical intuition comes with specific mental imagery: The simplex algorithm for linear optimization problems can be posed in the language of linear inequalities as a sequence of Gaussian elimination steps. It is a lot more revealing, though, to think of the simplex algorithm geometrically as jumping from vertex to vertex of the feasible polytope, always improving the value of the objective function.
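Both views can be seen side by side in a compact tableau implementation. The sketch below is a textbook-style simplex method (Dantzig's most-negative-coefficient rule, with no anti-cycling safeguard such as Bland's rule), assuming a problem of the form maximize c·x subject to Ax ≤ b, x ≥ 0 with b ≥ 0, so that the origin is a feasible starting vertex. Each pivot is an elimination step on the tableau; recording the current solution after every pivot exposes the walk along the vertices of the feasible polytope. The function name `simplex` and the example problem are our own choices.

```python
from fractions import Fraction


def simplex(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0.
    Returns (x, value, path), where path lists the vertices visited."""
    m, n = len(A), len(c)
    # Tableau rows [A | I | b]: slack variables make the origin a basic feasible solution.
    T = [[Fraction(v) for v in A[i]] + [Fraction(int(i == j)) for j in range(m)] + [Fraction(b[i])]
         for i in range(m)]
    # Objective row: reduced costs (-c for original variables, 0 for slacks); last entry = value.
    z = [Fraction(-v) for v in c] + [Fraction(0)] * m + [Fraction(0)]
    basis = [n + i for i in range(m)]

    def vertex():
        x = [Fraction(0)] * n
        for row, var in enumerate(basis):
            if var < n:
                x[var] = T[row][-1]
        return x

    path = [vertex()]
    while True:
        col = min(range(n + m), key=lambda j: z[j])  # entering variable
        if z[col] >= 0:
            break  # no improving edge: the current vertex is optimal
        # Ratio test: how far we can move along the edge before leaving the polytope.
        ratios = [(T[i][-1] / T[i][col], i) for i in range(m) if T[i][col] > 0]
        if not ratios:
            raise ValueError("the LP is unbounded")
        _, row = min(ratios)
        # Pivot = Gaussian elimination on column `col` using row `row`.
        p = T[row][col]
        T[row] = [v / p for v in T[row]]
        for i in range(m):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [a - f * r for a, r in zip(T[i], T[row])]
        f = z[col]
        z = [a - f * r for a, r in zip(z, T[row])]
        basis[row] = col
        path.append(vertex())  # geometrically: we arrived at a neighbouring vertex
    return path[-1], z[-1], path


if __name__ == "__main__":
    x, value, path = simplex([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
    print("vertices visited:", [tuple(map(float, v)) for v in path])
    print("optimum:", [float(v) for v in x], "value:", float(value))
```

For this example the recorded path should be (0, 0), then (0, 6), then (2, 6), with optimal value 36: two pivots, two edges of the feasible polygon. Exact `Fraction` arithmetic is used so that the vertices come out as clean numbers.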
Even if there is no actual geometric picture for a specific mathematical concept, there is often a "particularly good" visual way of thinking about it. In many cases, this "good way" is both most easily expressed and most easily stored in memory when presented visually. Such a visual representation may well be known within the mathematical community, yet in some cases it is not taught effectively in courses and is not explained in most textbooks.
An example of this is conditional entropy and mutual information in information theory. Most textbooks (and Wikipedia) accompany their definitions with a visual aid in the form of a Venn diagram (see below). But there is another way, using a specific bar plot.
For reference, these are the Venn diagram and the bar plot visualization as depicted in
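The quantities in both pictures are easy to compute directly. A small sketch, assuming a made-up joint distribution of two binary variables: it computes H(X,Y), H(X), H(Y), H(X|Y), H(Y|X) and I(X;Y) from their definitions and prints a text version of the bar plot, in which the H(Y) bar starts where H(X|Y) ends. The overlap of the H(X) and H(Y) bars then has length I(X;Y), and the H(X,Y) bar splits into H(X|Y) + I(X;Y) + H(Y|X). All function names are our own.

```python
from math import log2

# A made-up joint distribution p(x, y) over X, Y in {0, 1}.
JOINT = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}


def marginal(joint, axis):
    """Sum the joint distribution over the other variable (axis 0 -> X, 1 -> Y)."""
    dist = {}
    for pair, p in joint.items():
        dist[pair[axis]] = dist.get(pair[axis], 0.0) + p
    return dist


def entropy(dist):
    """Shannon entropy in bits; applied to JOINT it gives H(X, Y)."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def cond_entropy(joint):
    """H(X | Y) = -sum p(x, y) log2 p(x | y), straight from the definition."""
    py = marginal(joint, 1)
    return -sum(p * log2(p / py[y]) for (x, y), p in joint.items() if p > 0)


def mutual_information(joint):
    """I(X; Y) = sum p(x, y) log2(p(x, y) / (p(x) p(y)))."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)


def swap(joint):
    """Exchange the roles of X and Y, so cond_entropy(swap(j)) is H(Y | X)."""
    return {(y, x): p for (x, y), p in joint.items()}


def bar(start, length, scale=20):
    """One row of the bar plot: `length` bits drawn starting at offset `start`."""
    return " " * round(start * scale) + "#" * round(length * scale)


if __name__ == "__main__":
    h_xy = entropy(JOINT)
    h_x, h_y = entropy(marginal(JOINT, 0)), entropy(marginal(JOINT, 1))
    h_x_given_y = cond_entropy(JOINT)
    print("H(X,Y)", bar(0, h_xy))
    print("H(X)  ", bar(0, h_x))
    print("H(Y)  ", bar(h_x_given_y, h_y))
    print(f"H(X|Y)={h_x_given_y:.3f}  I(X;Y)={mutual_information(JOINT):.3f}  "
          f"H(Y|X)={cond_entropy(swap(JOINT)):.3f}")
```

For this distribution H(X,Y) = 1.5 bits and H(X|Y) = 0.5 bits exactly; the remaining quantities follow from the identities the bar plot encodes.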
It is important to realize that visual explanations are not just pretty pictures but tools for understanding and building intuition which can lead to a deeper grasp of the concept they convey.
In some cases, the mathematical objects under study actually are dynamical objects. Then it is almost negligent not to show animations of their dynamics, although this is unavoidable for textbooks and articles presented on paper. There are a few examples pushing the limit of what can be sketched on paper, most notably the wonderful four-part cartoon series
As a remarkable example of what can be achieved by switching from traditional paper to dynamic websites as a medium for presenting mathematics, we consider Michael Nielsen's online book "Neural Networks and Deep Learning".
Humans have a deep-rooted passion for playing and exploring, with Friedrich Schiller going as far as to say "Man only plays when in the full meaning of the word he is a man, and he is only completely a man when he plays." Just as with many breakthroughs in physics and engineering, many mathematical discoveries only arose from someone thinking "What happens if I change this bit?". People love to tinker and fiddle, and to observe how their environment reacts.
Fun leads to passion, and passion provides the persistence and motivation needed for the hard work that any meaningful achievement requires.
Another facet of the power of playing is that it helps us acquire familiarity and intuition. Why is that? To understand this, we need to think about what playing really is: Children play in order to see how things work. When they knock over a tower of bricks, they do not mean to be destructive; they are trying to understand statics and gravity. No parent would try to teach their toddler Newton's laws of motion, because playing is an incredible shortcut to a kind of understanding which may not be exact, but is approximately correct and intuitive.
While the importance of playing for children and their development has been widely acknowledged, there seems to be a growing body of literature suggesting that adults can benefit a lot from playing, too. I am no expert in psychology, so we will not go into that. I will just state an unproven proposition: Playing helps people gain an informal, intuitive familiarity with an object or system, which helps them when they later think about it analytically. This in turn improves intuition. I think about this feedback system in the following way.
It could be argued that "traditional" ways of presenting mathematics enter this feedback cycle from below (starting with knowledge) by confronting the reader with enormous amounts of facts (definition, definition, definition, lemma, lemma, definition, theorem, ...). Only the fittest survive this and emerge with an intuition that they formed as a by-product.
Let's talk about the game "A Slower Speed of Light" by the MIT Game Lab. The objective is very simple: Pick up 100 orbs by walking around in a slightly Kafkaesque landscape. Every orb you pick up reduces your environment's speed of light by a small amount (while your own maximum speed stays unchanged), until, with the 100th orb, your own speed reaches the speed of light. Along the way you experience relativistic effects step by step: The Doppler effect changes the colors of objects, the searchlight effect brightens objects that lie ahead of you, and the whole scenery gets warped by length contraction and time dilation.
By playing "A Slower Speed of Light" you can gain familiarity and almost "real-life experience" with something that is classically beyond the reach of our senses. After some time you intuitively know in which direction the giant mushrooms in the landscape will (seem to) bend when you whizz past them at 90% of the speed of light. If you decided to calculate length contraction with pencil and paper, there is a good chance you would be able to match the outcome with your newly acquired relativistic intuition and say "Yep, this sounds about right".
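The pencil-and-paper part is short. A minimal sketch, assuming only the textbook special-relativity formulas γ = 1/√(1 - β²), L = L₀/γ and Δt = γΔτ with β = v/c. It computes what is measured and ignores the light-travel-time, aberration and Doppler effects that also shape what you actually see in the game. The function names are our own.

```python
from math import sqrt


def lorentz_factor(beta):
    """gamma = 1 / sqrt(1 - beta^2), where beta = v / c and 0 <= beta < 1."""
    if not 0 <= beta < 1:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return 1 / sqrt(1 - beta ** 2)


def contracted_length(rest_length, beta):
    """Length along the direction of relative motion: L = L0 / gamma."""
    return rest_length / lorentz_factor(beta)


def dilated_time(proper_time, beta):
    """Interval measured for a clock moving relative to the observer: gamma * tau."""
    return proper_time * lorentz_factor(beta)


if __name__ == "__main__":
    beta = 0.9
    print(f"gamma at {beta}c: {lorentz_factor(beta):.3f}")
    print(f"a 10 m mushroom measures {contracted_length(10, beta):.2f} m along the motion")
    print(f"1 s on a passing clock lasts {dilated_time(1, beta):.2f} s for you")
```

At β = 0.9, γ ≈ 2.29, so a 10 m mushroom is contracted to about 4.36 m along the direction of motion, and a moving clock ticks at less than half its rest rate.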
Or take the short browser game "District"
There is a community of people trying to teach interesting concepts like the one above in the form of "Explorable Explanations", see this website.
Also, the data visualization community is doing wonders with interactive data presentation. I strongly believe that there could be a lot more collaboration between scientific researchers and data visualization wizards, aimed at creating graphical explanations of very challenging topics (such as functional analysis or quantum field theory).
Cairns, Gown and Collins
Setting those parameters means choosing an audience: Few textbooks are read by both undergraduate students and researchers, and indeed, as Cairns, Gown and Collins point out, a given mathematical text may not even be appropriate for a single reader at different times. As the reader's mathematical abilities progress, she may want to see fewer and fewer technicalities in the proofs and the general exposition in order to absorb the essential information more quickly.
Paper-based media necessarily fix these parameters once and for all. "Starring" sections (i.e., marking them as advanced material) or relegating technical proofs to the appendix can alleviate the issue, but permanently skipping large portions of text makes for a tiresome and disorienting read.
With modern web technologies, there are at least two ways to tackle this problem. First, we can apply the idea of hierarchicalization
Textbooks and scientific articles are usually written in a linear, contiguous manner. Theorems and their proofs follow lemmata, after that we get examples, concluded by exercises left to the reader. The reader can stray from the path, jump over sections and ignore proofs which she deems too tedious to follow for the moment, but she is still forced to navigate her way through what is essentially a very long scroll of papyrus. If she needs to re-read a previous paragraph or find a definition, she needs to scroll back until she finds the passage. It is hard to navigate in one dimension because one has to remember the exact order of the items along it. There is another option, though.
We talked about people loving to play and explore, so it makes sense to try to introduce "explorative fun" whenever we teach mathematics (be it lecturing undergrads or writing a research paper).
There is a caveat to this: Given too much explorative freedom, anyone will be overwhelmed by the number of choices.
There is a sweet spot (which may differ between individuals) between the yoke of a very rigid textbook and unhelpful total freedom. I believe that an instructive teaching medium has struck the balance if it takes the reader by the hand, gently leads her towards helpful ideas, warns her of common misunderstandings, but then takes a step back and waits for her to choose the direction in which she wants to proceed: More rigor, more exercises, more details or more intuition.
Instead of just hiding and expanding sections which seem interesting to the reader, it would make for an immersive experience if she could actually "go in different directions", not only figuratively (while still being constrained to the one-dimensionality of the document), but literally: A classical document (even with dynamically expanding and collapsing sections) is still a continuous scroll (albeit variable in length).
We can imagine her navigating a tree-like structure, with a trunk carrying the main story and branches diverging to "building intuition", "generalization to metric spaces", "more examples and counterexamples", "applications to physics" etc.
Most readers will find some of those branches, but not all of them, interesting. If the material is presented in this form, the reader does not need to skip a large number of pages, losing track of what she has read and what she has leapt over. As humans are usually fairly good at navigating in two dimensions
A second reason for "writing a tree" (instead of a scroll) is that dependencies often form a tree-like structure, see below. Sometimes the reader has a specific goal in mind, e.g. a chapter, algorithm or theorem she wants to understand. If the document is written in a tree-like way, she can easily and intuitively follow the sequence of branches leading to her goal.
The chapter dependency structure of
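Such a dependency structure makes goal-directed reading almost mechanical. A minimal sketch, assuming a hypothetical chapter list for a linear algebra textbook (strictly speaking, such dependencies form a directed acyclic graph rather than a tree, since a chapter may build on several others): `reading_path` collects a goal chapter and all of its transitive prerequisites and orders them with Python's `graphlib.TopologicalSorter` (Python 3.9+), leaving out everything the reader can skip. The chapter names and the function name are our own.

```python
from graphlib import TopologicalSorter

# Hypothetical chapter dependencies of a linear algebra textbook:
# each chapter maps to the chapters it directly builds on.
PREREQS = {
    "vector spaces": [],
    "linear maps": ["vector spaces"],
    "matrices": ["linear maps"],
    "determinants": ["matrices"],
    "eigenvalues": ["determinants", "linear maps"],
    "inner products": ["vector spaces"],
    "spectral theorem": ["eigenvalues", "inner products"],
    "singular value decomposition": ["spectral theorem"],
}


def reading_path(goal, prereqs=PREREQS):
    """Return only the chapters needed to understand `goal`, ordered so that
    every chapter comes after everything it depends on."""
    needed, stack = set(), [goal]
    while stack:  # depth-first collection of the goal and its prerequisites
        chapter = stack.pop()
        if chapter not in needed:
            needed.add(chapter)
            stack.extend(prereqs.get(chapter, []))
    subgraph = {ch: prereqs.get(ch, []) for ch in needed}
    # TopologicalSorter takes a mapping node -> predecessors.
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    for chapter in reading_path("spectral theorem"):
        print(chapter)
```

`reading_path("spectral theorem")` should list seven chapters ending in the spectral theorem and omit the singular value decomposition chapter; the relative order of independent chapters (say, inner products and determinants) is not fixed, which is exactly the freedom a tree-like document would leave to the reader.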
Try to recall how many textbooks you have actually read from cover to cover (I suspect they are a minority). Most readers look for a specific "substory" of a given textbook (or article). Let's take a typical linear algebra textbook. Here are a few plausible reader types and their reasons for reading it:
At first glance, all these readers need different textbooks, because each of them has to ignore large portions of the whole text (which is very tiring). It could even be hypothesized that there is such a large (and growing) number of books on almost any topic because there are typically many different reader types. Few books manage to appeal to all of them, and it is exhausting to read a book like this:
A specific reader might read just small, scattered parts (in red) of a book.
A textbook with dynamic content, where the reader can collapse and expand sections, might help bring those red patches closer together, but a tree-like structure might be more intuitive.
When navigating a "book tree", the reader has a clearer picture of what she is reading and what she is skipping.
It would be interesting to see a scientific article or a textbook which presents its content in such a tree-like manner.
We conclude this article with a short discussion of challenges to the ideas presented.
This manuscript is inspired by articles written by Chris Olah and Shan Carter, and blog posts by Michael Nielsen (see references). I am grateful to Ludwig Schubert for encouraging comments. If you have questions or comments, please drop me a line at contact[at]pwacker[dot]com. I am indebted to the Distill Journal whose layout I was kindly allowed to use.