The Infinite Library

Jorge Borges’ “The Library of Babel” talks about an incomprehensibly large Library that contains every possible book. Even though there are 25 characters in the Library’s alphabet, Borges points out that transliterations or more complex encodings could represent any language. Each book is limited to exactly 410 pages. A shorter work could still be found in the Library, since later pages are allowed to contain only spaces. A longer work could presumably exist, although critics disagree about the best way that could be done. Very long books get to be important when working with any strange language where the Library’s meager alphabet needs dozens of pages for a single foreign phrase. A collection of serialized two-megapixel photographs would run into a book each.

Steven Peck proposed in A Short Stay in Hell that two books together could be considered as one large book. A character explained, “War and Peace would be in multiple volumes.” That is not accurate, strictly speaking, because there are some books that can exist in our Universe but not in that library. Imagine a monotonous book with three million repetitions of the letter “A.” This would take more than two 410-page books all filled with “A.” However, Peck’s library only contains one copy of each permutation, so two books that are both full of “A” could never be found.

The philosopher W. V. O. Quine noticed that repetition would be necessary to resolve the problem. In “Universal Library” he said that even if a library only contained two-inch strips of seventeen characters each, all literature could be expressed. Further, even if a library only contained a single dot and a single dash, their correct combination could create any book in Morse code or a binary encoding.

The totality of truth is now reduced to a manageable compass. Getting a substantial account of anything will require extensive concatenation of our two-inch strips, and re-use of strips here and there. But we have everything to work with.

The ultimate absurdity is now staring us in the face: a universal library of two volumes, one containing a single dot and the other a dash. Persistent repetition and alternation of the two is sufficient, we well know, for spelling out any and every truth.

Borges is deliberately ambiguous about the details of the Library and whether it has an end. Of all the different formulations of the Library, the most romantic descriptions are where every possible book exists, regardless of length. Quine’s repetition is unattractive, because a library with only two overused symbols is utterly lacking in mystery. We are already familiar with our Universe, where all books are assembled by humans. Quine’s use of intelligent composition makes the Library too much like our Universe. Borges foresaw the argument and mentioned that it is “blasphemous” (“blasfema“) for librarians to write books. The Library is interesting because it is the Library itself that has prepared all possible permutations. Logically, this should extend to large works. To honestly say that any 820-page book appears in the Library, it should be produced by the Library itself, with the two volumes naturally adjacent, not needing the intervention of a librarian.

Somewhere in the Library, two adjacent volumes can tell any possible 820-page story, just as one volume tells any possible 410-page story. That means that a given volume must appear an infinite number of times in the Library, but in different contexts, as part of an infinite number of larger works. A volume at the end of a shelf could be considered adjacent to the first volume of the next shelf, and the last volume in a gallery is followed by the first volume in the next gallery. It’s not obvious which gallery should be chosen to be next. The hallway leading to the nearest gallery cannot always be chosen, because a complete path ought to visit other floors. A strategy is needed that can collapse the several dimensions of character, line, page, volume, shelf, side, gallery, circuit, and floor into a single ordered list of coordinates. The tool for this is called a pairing function. A simple example is a sequence that visits all the neighbor galleries, left then right then above then below, and then recursively revisits each of those galleries in the same order while also visiting their respective neighbors. Within books, an order of left to right and top to bottom could be assumed, or a different exhaustive path could be chosen. This is a form of breadth-first traversal, and it gives the librarian an unambiguous way of stringing together volume after volume throughout the entire Library, although it still needs to be examined whether any one traversal is the “correct” one. Also, if a librarian wants an unambiguous way to determine where one multi-volume work stops and another starts, then an encoding scheme needs to be defined, but that issue will be ignored here since encoded representations are well understood and this essay is concerned with the organization of the Library’s information rather than its interpretation.

The size of this Library is now certainly unbounded. Not only are all the 2580×40×410 possible 410-page books present, but also all the possible multi-volume works that are 2580×40×410 volumes long and longer. Every volume has a volume that follows it in an unending sequence. Such a sequence is described by the technical term “disjunctive.” It is impossible for the sequence to be periodic, frustrating the narrator’s hope for a cyclical “Order.” There are many ways to arrange letters into a disjunctive sequence, and some can produce all sorts of heterogeneous structures for librarians to discover. However, in Borges’ story, there is only a mystical lack of structure. All the letters appear with equal frequency, and the books still don’t reveal any structure when interpreted as coded messages in other alphabets. Such a sequence is called by the odd name “absolutely normal.”

There are an uncountably infinite number of possible libraries that are absolutely normal. One simple example is that there are libraries where everything is uniformly random. Another possibility is a library that has a central gallery that contains all works of one letter, followed by all works of two letters, and so on, forever. That sequence is known as the Champernowne constant. Paradoxically, a random library must also have a gallery somewhere that contains all works of one letter followed by all works of two letters and so on for as long as a librarian could imagine following. There are no end of ways that the library could be organized, but it matters very little how it is organized, because every library can mimic every other. Although all the possible libraries are distinct from each other, they each contain arbitrarily large overlaps. Note that an arbitrarily large overlap is not as complete as an infinite overlap, which is what would be required to call two libraries equivalent. An arbitrarily large overlap is just enough to make it impossible for the librarians in any finite amount of time to prove or disprove a postulated order to the library.

With that theory in place, it is possible to investigate whether some paths through the Library will work better than others. The only criterion for a path to be admissible is that it must reach every work eventually. To illustrate the concern, suppose that a path only traverses a single floor. If the Library were generated with a random sequence, then that path would work as well as any other path. But if the Library were mostly random except for a single floor filled only with the letter “A,” then that path may be unlucky and never find anything except “A.” There would be other paths where the Library would behave absolutely normal despite the floor with fixed contents, because one floor has an infinitesimal effect on the Library as a whole. For the same reason, the probability of a librarian accidentally starting on that floor would be zero. A path that covers all galleries on all floors would eventually exit such an area of fixed contents and would be guaranteed to be absolutely normal. A more damaging scenario is if it were possible that the Library has fixed contents spaced out at intervals that frustrate the breadth-first traversal, but that exhibit absolutely normal behavior in some alternative “correct” path. For example, suppose that the correct path is a breadth-first traversal that skips every other gallery, and suppose that the skipped galleries contain fixed contents. Those fixed contents would not be part of the absolutely normal sequence, but they would poison the librarian’s breadth-first traversal. This example can be dismissed, however, since the correct path ought to visit every gallery eventually. The model is that the correct path was used by the Library to generate the books, and so galleries not on the path would therefore not have books. The question concerns the order of the galleries, not which galleries are included. That means that asking whether all paths are correct is equivalent to asking whether an absolutely normal sequence can survive permutation. Clearly it is possible to find a permutation that is not absolutely normal: replace every instance of “ABC” with “ACB,” guaranteeing that “ABC” never appears. But such a permutation could not be created accidentally, because absolutely normal numbers are so much in the majority that for every incorrect path there are an infinite number of correct paths. If the Library was generated randomly, then the probability of a particular breadth-first search being incorrect is zero. It could only have been incorrect if the Library itself had singled out that traversal to not contain all possible works. If that were the case, it would comprise a message from the Library to the librarians, showing them one path that is exceptional among paths. Since the Library fastidiously does not create order from the chaos, such communication is unrealistic. Therefore, the librarian is free to choose any path that traverses the entire library. It is known that any configuration of the Library will still have paths that do not contain every possible work, but they are outnumbered in infinite proportions by paths that will succeed.

An important detail about the Champernowne constant is that there is no proof that it is absolutely normal. So it may not meet the minimum requirements to be the mystical Library. Likewise, if all books were generated from the digits of tau (or pi), then it is conjectured to be absolutely normal, but no one really knows. Absolutely normal numbers are strangely elusive: it is known that there are an uncountably infinite number of them, (even more than the number of books in the Library), and it is known that almost all numbers are absolutely normal, but it took until 2002 for mathematicians to figure out how to come up with a single concrete example! This is one of those strange areas of science where simple ideas are maddeningly difficult to prove. A consequence of this is that it would be impractical for a computer program to generate the volumes that could appear in the library. The only deterministic process that has been discovered is prohibitively expensive. It would be necessary to cheat by using a sequence that merely gives the mystical appearance of being random. In the future, the state of the art may advance so that a computer could accurately simulate the Library.

Another paradox is that there must be a region in the Library where only the letters “A” and “B” and “C” appear. A librarian born in such an area would wander for a lifetime and never get an inkling that other letters existed. Likewise, the narrator could have been born in such an anomalous region. An enormous distance away there could exist an infinite number of books with a larger alphabet, or illustrated books in colorfully-painted galleries. It is allowed for the Library to have freedom with a countably infinite number of typefaces and of bindings and of any number of other characteristics, because the possible permutations would still be countably infinite. The important point is that it would be plausible for a librarian to describe an enormous region as having different letters from the narrator’s in Borges’ story, or as having a pattern that is not absolutely normal, yet the same Library can encompass both realities. This justifies the cheating that a computer would need if it generated a library. No librarian would be able to determine whether a given computer program is doing a good job or not, because there must be somewhere in the Library that mimics the computer’s output, however artificial it looks.

All these hypotheses allow one to accurately simulate the Library in our own Universe, such as the online “Library of Babel” does. A classifying system is necessary for distinguishing books and galleries. Borges uses natural numbers when describing “circuit fifteen ninety-four” (“circuito quince noventa y cuatro“). One possible catalog would start with an arbitrary pseudorandom gallery and call the first book #1, and continue numbering all books from that point. A pseudorandom sequence is already cheating because it is periodic. Jonathan Basile proposed using truncated entries from the quasi-random Halton sequence so that the library would be searchable, which is elegant and may one day be proved to be absolutely normal. Another possible catalog would fill all the books by using the digits of tau (or pi). Yet another possible catalog would locate a book that only contains the letter “A” and call that book “A.” The book next to it only contains the letter “B” and is named book “B,” and so on. After all the one-letter works would come the book containing “AA” and named “AA.” This library is generated by the Champernowne constant. This is the simplest ordering because each book near the start is named the same as its contents. Far enough away, when book names are already 80×40×410 characters long and books start to repeat themselves, the books can still be ordered by the length of the multi-volume work, and then in alphabetical order. The names then need a prefix to distinguish different copies of each book. Somewhere in the Library there must exist such a convenient ordering of books, so it is no trick to define a coordinate system with that gallery at its center. In fact, no matter what classifying system is desired, there is a gallery somewhere that acts as an appropriate origin and that follows the pattern for an arbitrarily large distance. The Champernowne constant is just as plausible a choice as the digits of tau (or pi) or of any given random sequence. This supports the story’s assertion of symmetry, “The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible” (“La Biblioteca es una esfera cuyo centro cabal es cualquier hexágono, cuya circunferencia es inaccesible“). Even if the catalog is cheating, it is impossible for a librarian to ever catch it. Creating a catalog of the Library is therefore as easy a task as the author desires to make it. It is more of an artistic choice than a mathematical one. No matter what catalog a librarian invents, the Library obliges by providing a gallery somewhere that fulfills the catalog’s criteria.

In summary, the Library is a never-ending sequence that contains every possible work of every possible length. As long as there is one path through the Library that finds every work, any path is almost certain to succeed. Simulating the Library with a computer program is unfeasible with current technology, because not much is known about absolutely normal numbers. At present, computers must resort to cheating with sequences that are not guaranteed to include all works. As long as cheating is acceptable, no strategy is more or less plausible than the others, because every possible pattern appears somewhere in the Library.

Leave a Reply

Your email address will not be published.