These characters were like twelve-bar blues … Given the basic parameters of Batman, different creators could play very different music.

Grant Morrison, Supergods: What Masked Vigilantes, Miraculous Mutants, and a Sun God from Smallville Can Teach Us About Being Human

Intro 

I have been delinquent on two counts (if not more):

  1. So far I have looked almost entirely at Marvel, and left DC on the floor, and
  2. I haven’t been keeping up my Batman reading.

I thought I should catch up on both counts, starting by checking out some top-10 lists to work out what Batman story to read next. But they don’t seem to agree. What do you do when you have a set of contradictory rankings?

It turns out there is an entire theory about that, wrapped up in the theory of elections. I’ve used those ideas to compile a meta-top-10 list (meta is used here because this is a list formed from other lists, much as a meta-analysis in statistics combines the results of multiple individual studies to provide a pooled estimate). My meta-list is given in Table 1.

There is much more in what follows that explains where this comes from – it isn’t just my opinion being expressed here! This list is synthesized from the lists of 30 other people or groups, and the full list of graphic novels being ranked includes nearly 100 titles that appeared on someone’s top-N list at some time. It’s kind of crazy that there is so much variation in people’s choices, and yet we can still find patterns.

The techniques used in compiling a meta-list are interesting because they touch on a strange paradox that lies at the conjunction of mathematics and the theory of elections. It’s usually discussed under the name Arrow’s Impossibility Theorem, after its creator.

Arrow’s Impossibility Theorem 

People care deeply that elections should be fair, but it has often been noticed that no matter how fair we try to make a system of voting, weird things sometimes happen. For instance, a winning candidate sometimes loses the “popular” vote; that is, they would have lost if a simple majority election had been held. It is tempting to think that “if only we had a different, better system, we could avoid such tragic outcomes.”

Kenneth Arrow (1921-2017) showed that this was an impossible dream. He won the Nobel Prize in Economics (with John Hicks) in 1972 for his work.

Elections have been studied mathematically for more than 700 years. The topic is named variously “social choice theory” or “election theory”. The idea (in an election) is to understand how individual preferences can be converted into an overall ranking, which is exactly what I want to do today.

Unfortunately, Arrow showed that there is no perfect system. His theorem is reminiscent of the quip:

faster, cheaper, better – pick two.

Arrow started by describing a set of four (mathematical) properties that we would want any voting/ranking system to have, each of which seems obviously desirable [1]. He then showed that you can’t have all of these properties at once in any system.

That’s a very powerful result. It says that every election system can have results that seem unfair to someone. It also means there is no unique system for calculating ranks that everyone can agree on. So I am going to try four different systems, starting with one of the most common: a Borda count.

Borda and Friends 

System 1: Borda 

A “Borda” count is a great example of Stigler’s law of eponymy, which states that

No scientific discovery is named after its original discoverer.

Jean-Charles, Chevalier de Borda (1733-1799) was a French mathematician who suggested his eponymous method as a means to elect members to the French Academy, but Ramon Llull (1232-1315), a Catalan monk, wrote about the method (and others) around 500 years earlier. Unfortunately, Borda’s name has been attached to it for a long time, and so we are stuck with it.

The idea of a Borda count is to award points for your rank in any given list. The standard method awards points corresponding to the number of other candidates you beat; see Table 2 for an example. In an election we only care about who comes first, and so the candidate with the top Borda count would win. Here, however, we use the scores to determine a new meta-ranking.

Simple Borda counts won’t work here. The simple version assumes a fixed list of candidates, and that every voter ranks them all in descending order. However, the Batman ranking lists

  • vary in length (e.g., top-5 or top-25);
  • don’t consider the same set of candidate graphic novels; and
  • are sometimes just unordered “to read” lists.

My Borda count has been adapted to these conditions (a sketch in Julia follows this list) by

  • presuming that all rankings are selecting from the larger list, with unranked candidates tied at the bottom (see Table 2); and
  • treating a “to read” list of N candidates as an N-way tie for Nth place. We could have made them equal 1st, but that would over-weight some candidates by a lot.
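To make that concrete, here is a minimal sketch in Julia. The names (borda_points, catalogue, and so on) are mine for illustration, and not necessarily those used in the repository code:

```julia
# Points from a single input ranking. `catalogue` is the full candidate list;
# `ranking` is a vector of titles, best first; `ordered=false` marks an
# unordered "to read" list.
function borda_points(catalogue, ranking; ordered=true)
    N = length(catalogue)
    points = Dict(title => 0 for title in catalogue)  # unranked => bottom, 0 points
    for (n, title) in enumerate(ranking)
        rank = ordered ? n : length(ranking)          # tie "to read" items at Nth
        points[title] = N - rank                      # you beat N - rank candidates
    end
    return points
end

# Meta-scores: sum the per-list points over all input rankings, where
# `rankings` is a collection of (ranking, ordered) pairs.
function borda_scores(catalogue, rankings)
    totals = Dict(title => 0 for title in catalogue)
    for (ranking, ordered) in rankings
        for (title, p) in borda_points(catalogue, ranking; ordered)
            totals[title] += p
        end
    end
    return totals
end
```

Summing over all 30 input rankings in this way gives each title the total that determines its place on the meta-list.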

System 2: Dowdall 

There are many modifications of the Borda count, often using an alternative scoring. The Dowdall system (reputedly used in elections in Nauru) awards \( 1/n \) points to the \( n \)th ranked candidate. The Dowdall system has an advantage over Borda for our application. In typical elections there are only a few candidates, but many voters. Here, I have a relatively large number of candidate graphic novels (nearly 100) and only 30 input rankings. In this setting the Borda count doesn’t provide very good discrimination between 1st and 2nd place on a list: it’s the difference between, say, 100 and 99 points. Of course, the points add up, but the Dowdall system places a much higher value on coming 1st compared to 2nd.

System 3: Power (weighted Dowdall) 

You might suspect (I do) that the Dowdall system places too much weight on 1st place. In an election, voters often have a very strong preference for their top choice, but my read of a lot of the top-N graphic novel rankings is that their authors are less dogmatic about the exact order. Giving such a strong weight to 1st place might be a little optimistic. That leads to the next system, my own power-weighted Dowdall system.

A “power” weighting gives \( 1/n^\alpha \) points for \( n \)th place, where \( \alpha \) is between 0 and 1. When \( \alpha = 1 \) we get the Dowdall system, but we can tune how much weight 1st place gets by choosing a smaller value. I tried a few and eventually settled on the simple choice of \( \alpha = 1/2 \).

Table 2 shows the points awarded by the different scoring systems for a ranked list of candidates.
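If you want a feel for how quickly the weights fall away under each rule, the following snippet prints the points per rank for a hypothetical 10-candidate list (it mirrors the pattern of Table 2 rather than reproducing it):

```julia
# Points per rank under each scoring rule, for a hypothetical 10-candidate list.
N = 10
borda(n)        = N - n       # number of other candidates you beat
dowdall(n)      = 1 / n       # Dowdall: 1/n points for nth place
power(n; α=1/2) = 1 / n^α     # power-weighted Dowdall; α = 1 recovers Dowdall

for n in 1:N
    println(n, "\t", borda(n), "\t", round(dowdall(n); digits=3),
            "\t", round(power(n); digits=3))
end
```

Note how sharply Dowdall decays: 1st place is worth twice as much as 2nd, whereas under Borda the difference is a single point out of nine.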

System 4: Condorcet/Copeland 

There is an almost endless variety of other election/ranking systems. I found a really nice table comparing them here. I wanted to do a few more comparisons, but I couldn’t spend the time to implement all of them, so I chose just one more. I selected a method that is both simple and a Condorcet method; such methods guarantee that if a candidate would beat every other candidate in a head-to-head majority vote, then it wins the election. Incidentally, Condorcet is another Stiglerism: Llull wrote about the idea of so-called Condorcet methods first as well.

There are many Condorcet methods. I am using Copeland’s method because all I need to do is count, for each candidate, its head-to-head wins minus its losses. That’s simple, conceptually and computationally.
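Here is a minimal sketch of that computation in Julia, again with illustrative names, and ignoring the unordered “to read” complication for clarity:

```julia
# Does `ranking` (a vector of titles, best first) prefer title `a` to `b`?
# Titles missing from a ranking are treated as tied at its bottom.
function prefers(ranking, a, b)
    ra = something(findfirst(==(a), ranking), length(ranking) + 1)
    rb = something(findfirst(==(b), ranking), length(ranking) + 1)
    return ra < rb
end

# Copeland score: head-to-head wins minus losses against every other candidate.
function copeland(catalogue, rankings)
    scores = Dict(t => 0 for t in catalogue)
    for a in catalogue, b in catalogue
        a == b && continue
        wins   = count(r -> prefers(r, a, b), rankings)
        losses = count(r -> prefers(r, b, a), rankings)
        scores[a] += sign(wins - losses)  # +1 for a win over b, -1 for a loss, 0 for a tie
    end
    return scores
end
```

The double loop over pairs of candidates is exactly the \( O(N^2) \) preference structure discussed below.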

You might ask “If you have a Condorcet method, why try anything else? Surely that is the best?” I refer you back to Arrow’s Theorem, from which we know there must be some downside. Condorcet’s problem is called the Condorcet paradox. It was discovered by the Marquis de Condorcet in the late 18th century, and independently rediscovered by Lewis Carroll (né Dodgson), the writer of Alice in Wonderland. Both noted that collective preferences can be cyclic, and hence there may be no Condorcet winner: with three voters ranking A > B > C, B > C > A, and C > A > B, each candidate loses one head-to-head matchup. I plan to say more on cyclic preferences at a later date, so hang on for that…

There are some other issues. The Copeland system, and others similar to it, calculates a preference matrix comparing each pair of candidates. For an election with \( N \) candidates, this is an \( O(N^2) \) data structure. The amount of raw information you have is \( O(R N) \) for \( R \) input rankings (or voters).

For a typical election the number of voters satisfies \( R > N \), so the amount of information is large compared to the number of estimates you need to make.

However, here, \( R \) is small (30) and \( N \) is larger (and potentially much larger – imagine we wanted the best graphic novels overall) and so we have more unknowns than data points. That worries me. There is the danger of over-fitting to limited data. I’m not saying it can’t work, but let’s be wary.

Data and Julia Code 

I don’t want to spend forever talking about data and data cleaning this time around, but a few words are needed.

First, I chose the lists I am using by the simple expedient of Googling. I looked for top-N lists, and only included those that were composed specifically for Batman over a notionally unlimited time range [2]. I excluded lists arising in comments on other lists, i.e., I only went for the list in a main article. I gave up after adding 30 lists (read: I got tired of entering data).

Listal says they started their list (in 2005) with more than 150 titles, just looking at graphic novels (collections and trade paperbacks included). I didn’t try to be so comprehensive: I only ranked those that appeared somewhere in the input rankings. That still gave me a list of nearly 100 graphic novels to work with.

The input lists and their sources are given in a set of files called “ranking_NN.csv” hosted at my GitHub (look at the comments for the sources). The code is set up to make it fairly easy to add an extra ranking, so I am very open to suggestions.

However, the original data wasn’t particularly easy to work with:

  • Names of comics are not given in standard forms.
  • Some items in the lists are not a “graphic novel” as such, but rather a series of comics. I have only included series that have a collected form, or one-shot novels.
  • Sometimes there are multiple formats for one storyline, and different lists reference different versions.

Consequently, I had to do some manual cleaning of the data to make it consistent. As noted, my final lists and their origins are given in the files.

Some lists don’t rank, they just provide a “to read” list. In those cases the ranking data is missing, but the scoring systems I’m using can cope with that.

Finally, the lists were all compiled by different organisations using different criteria for different purposes at different times. I’m sure some are trying to sell product, and some are pushing personal views. They may not have had access to the same Batman corpus. The goal of a meta-list is to overcome these issues through the aggregation of many sources.

BTW, the lists had a lot of overlaps, but many, particularly the longer ones, had personal favourites that didn’t appear elsewhere. The list of all comics considered is in comic_list.csv along with some meta-data.

I haven’t explained all of the details, but Julia code for Borda and other counts is included on the GitHub. It isn’t very interesting code (yet) so no direct discussion seems warranted. It is included so you can try this stuff yourself.

Results 

My top-10 meta-list is given above. A more complete view is given in the following figure, which shows the rating that each graphic novel received from each of the four scoring techniques (note that scores are all normalised onto the range 0-1 so that we can compare them). Here “Power” denotes the method I proposed with \( \alpha = 0.5 \).
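For reference, the normalisation is just a min-max rescaling; in Julia it might look like this (a sketch, the repository code may differ):

```julia
# Rescale a vector of scores onto [0, 1] so the four systems share an axis.
normalise(scores) = (scores .- minimum(scores)) ./ (maximum(scores) - minimum(scores))
```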

You should be able to mouse over the points to get the names, and zooming should work. You should also be able to click on legend elements to hide them, or double-click to focus on them. The lines are there to help track the performance of the top-10 listed above, and to highlight the differences between the scoring methods.


With one exception the scoring methods all chose the same top-10 (the Dowdall system includes Batman: The Cult in preference to Broken Bat [3]). However, they don’t agree on the ordering. For instance, the positions of

  • Year One and The Dark Knight Returns, and
  • The Long Halloween and The Killing Joke

are reversed by some systems.

The plot helps illustrate some of the issues that make each system good or bad. For instance:

  • The Dowdall system gives a lot of weight to coming first in some ranking, and that tends to have an extreme effect. One outcome is that the top-10 take up most of the scoring range and all the others are compressed into a narrow range at the bottom. That is fine when all you want to do is elect the best candidate, but not so good for ranking them all.

  • On the other hand, the Copeland scores don’t separate well in any part of the range. That gives me less confidence in their robustness to noise in the small (relative to a typical election) set of input rankings.

The best seems to be either the simple Borda count or my Power-weighted score. Of course, I prefer my Power-based score. It makes the closeness of the two top pairs much more obvious. Your mileage may vary, probably depending on whether you regard Year One or The Dark Knight Returns as the best ever.

Conclusion 

My final list was given above.

You probably disagree with it. Most of the input lists aren’t consistent, so I don’t expect everyone to agree on any given list.

But at the very least there is a clear top-2:

  • Year One
  • The Dark Knight Returns

and then a next top-2:

  • The Long Halloween
  • The Killing Joke

All the systems of ranking make these pairings and their relative rank pretty clear.

Then Arkham Asylum is cemented in 5th place.

After that there is a group of 5 that almost all rankings suggest should fill out the top-10, though the ordering varies:

  • Hush
  • A Death in the Family
  • Knightfall
  • The Court of Owls
  • The Black Mirror

Finally, there are the others, and these I think we can all agree are great, but they don’t get universal support from a wide set of rankings. If you want to see how they fare in detail, I have a table below that lists the top-20, and also links to the scores for the full list.

Who was the best author/illustrator/…? That’s a question for next time. For the moment I have my reading list, starting with The Long Halloween.

Acknowledgements 

Thanks go out to Jono Tuke and Giang Nguyen for editing this one.

Resources 

Julia code for Borda and other counts is included on the GitHub.

My final scoring for the complete list of graphic novels can be obtained by clicking on the table below. The table lists only the top 20, but the data file has the complete list of nearly 100.

The ranked lists used as input are in my GitHub. Note that they are CSV files, but with a comment line at the start indicated by a # (which is used to list the source of each list). I’ve found that a lot of CSV readers can easily filter out such lines, so I use them to provide a little meta-data.
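In Julia, for example, the CSV.jl package can skip those comment lines directly (the file name here is illustrative):

```julia
# Read one ranking file, skipping the leading line that begins with "#".
using CSV, DataFrames

ranking = CSV.read("ranking_01.csv", DataFrame; comment="#")
```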

My overall list of comics, with some more meta-data about each is in the file comic_list.csv. This is a work in progress, and I plan to complete and check all of the data as I need it.


Footnotes

  1. I won't go into the exact properties here. There is a vast amount written about them, and they have been generalised in many ways. Google the theorem and you can find any number of descriptions.
  2. The time ranges are notionally unlimited, but most have an upper cut-off imposed by when the list was compiled (the meta-data on each list's source includes this date). More subtle is the fact that the lists are limited by the corpus of comics available to current readers -- many older comics may be out of print or otherwise unavailable and are thus censored.
  3. In the plot and the text I will use abbreviated names for Batman graphic novels. The abbreviations should be obvious, but the full dataset has the canonical names if needed. On a related note, occasionally the rankings use an ambiguous name which could refer to more than one graphic novel or collection. In those cases I make the best determination I can.