Hidden numbers and letters on f71r

Voynich Manuscript

Here is Bunny’s find of possible hidden numbers and letters in the tree on f71r of the Voynich Manuscript. It is reproduced on my site with permission (by request, in fact).

Possible hidden letters and numbers in the tree in f71r.

Adjustments in the image: “changed contrast brightness, gamma, colour then made b/w. attempting to remove green from tree and clarify what left, no adjustments made to actual lines of image.”

Countries and territories by most popular Wikipedia edition


I was in a statistical geography mood, so I made this map based on Wikimedia statistics. It shows the most popular Wikipedia language edition for each country and territory that had hits in 2014 Q1. If the majority of hits from a place were for a single language, I marked the place in that language’s colour. If there was no majority language, I marked the top two in a gradient.
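
The colouring rule above can be sketched in a few lines. This is only an illustration, not the script used for the map. It assumes that each place’s hits are available as a simple dict from language edition to hit count (the function name `classify` and the sample numbers are made up), and that exactly 50% does not count as a majority.

```python
from collections import Counter

def classify(hits):
    """Return the language(s) used to colour one place.

    hits: dict mapping a language edition to its hit count (hypothetical data).
    One language is returned if it has more than half of all hits;
    otherwise the top two are returned, to be drawn as a gradient.
    """
    total = sum(hits.values())
    ranked = Counter(hits).most_common(2)  # ties keep first-seen order
    top_lang, top_hits = ranked[0]
    if top_hits * 2 > total:  # strict majority (an assumption at exactly 50%)
        return (top_lang,)
    return tuple(lang for lang, _ in ranked)

print(classify({"en": 60, "ko": 40}))
print(classify({"fr": 45, "nl": 40, "en": 15}))
```

The first call yields a single colour. The second has no majority, so it falls back to the top two languages.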

I hope you find this as interesting as I did.

Map of countries and territories by most popular Wikipedia edition (2014 Q1).

Interesting points:

  • Out of ~6000 languages in the world, only 32 (about 0.5%) are the most popular Wikipedia edition in any country or territory that tried to access it. All of these languages are from Eurasia, which says a lot about historical power structures and the digital divide.
  • Language geography corresponds well with European imperial holdings with some exceptions. Who would have guessed that Puerto Rico, Suriname and East Timor would have English as their preferred Wikipedia language? Regionalisation is also a factor.
  • English is more popular than all the other languages combined.
  • Regions with no single majority language include North Africa, the Caucasus, the Balkans and the Baltics. Other such places include Belgium (French and Dutch), Norway (English and Norwegian), Greenland (English and Danish), Israel (English and Hebrew) and South Korea (English and Korean).

Leave your thoughts in the comments section below!

Cod. Sang. 754 and the Voynich Manuscript

Voynich Manuscript

Every now and again we uncover manuscripts with possible direct or indirect links to the Voynich Manuscript. They might contain a similar glyph, a similar illustration, or perhaps a similar diagram. A good example was Cod. Sang. 839 (discovered by Thomas Sauvaget) with the same quire number style.

Cod. Sang. 754 is perhaps special in how many similarities there are.

All credit for the discovery goes to Job (from the Voynichese project); I am simply documenting it for him. I will avoid making any bold claims and will simply lay out all the similarities so you can make your own decision. I’ll also not bore you with the details of the manuscript until the end.

1. The illustration

The first thing Job noticed was the style of the illustration on page 164.

Page 164 of Cod. Sang. 754.

It should speak for itself.

(It’s the only full plant illustration in the manuscript, so don’t bother looking for others.)

Introduction to the Curve-Line System

Voynich Manuscript


This paper proposes a new pattern in the text of the Voynich Manuscript named the “Curve-Line System” (CLS). This pattern is fundamentally based on shapes of individual glyphs but also informs the structure of words. The hypotheses of the system are statistically tested by two independent people to judge their significance. It is also compared to existing word structure paradigms. The results suggest that the shapes of glyphs affect their placement in a word, the Curve-Line System is an intentional feature of the text design, and the text of the Voynich Manuscript is a highly artificial system.

Encipherment process (take 1)

Voynich Manuscript

According to my latest cipher theory, this is a general estimate of what the encipherment process could be:

  1. (Optional) Prepare your plaintext by removing some letters. Helps to save time.
  2. (Optional) Split into blocks of equal length. Helps to reduce errors.
  3. Convert each letter into a number with simple substitution.
  4. Do some mathemagics to the numbers with Pascal’s Triangle (exact details are trade secret). You now have Voynichese! If you split the plaintext into blocks, each one now corresponds to a line of ciphertext. But some may have ended up with wildly different lengths, so…
  5. (Optional) Pad out lines with filler at the beginning or end to make them equal length. Helps to make the result look nicer and harder to decipher.
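
The steps that are not secret (1, 2, 3 and 5) can be sketched as plain functions. This is a hypothetical illustration only. Step 4, the Pascal’s Triangle transformation, is deliberately not reproduced. Several choices are my own assumptions: vowels as the letters dropped in step 1, a=1…z=26 as the substitution key in step 3, and "x" as the filler in step 5.

```python
import string

def prepare(plaintext, drop="aeiou"):
    # Step 1 (optional): lowercase, keep letters only, drop a chosen set.
    # Which letters to remove is an assumption; vowels are just an example.
    return "".join(c for c in plaintext.lower() if c.isalpha() and c not in drop)

def split_blocks(text, size):
    # Step 2 (optional): blocks of equal length (the last one may be shorter).
    return [text[i:i + size] for i in range(0, len(text), size)]

def substitute(block):
    # Step 3: simple substitution with an assumed key a=1 ... z=26.
    return [string.ascii_lowercase.index(c) + 1 for c in block]

# Step 4 (the Pascal's Triangle "mathemagics") is not shown here.

def pad(lines, filler="x", at_start=False):
    # Step 5 (optional): pad ciphertext lines with filler to equal length,
    # either at the beginning or at the end of each line.
    width = max(len(line) for line in lines)
    out = []
    for line in lines:
        extra = filler * (width - len(line))
        out.append(extra + line if at_start else line + extra)
    return out

blocks = split_blocks(prepare("Hello world"), 3)
print(blocks)                     # ['hll', 'wrl', 'd']
print([substitute(b) for b in blocks])
print(pad(["ab", "abcd"]))        # ['abxx', 'abcd']
```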

Pascal’s Triangle and the Voynich Manuscript

Voynich Manuscript

I’ve been toying with the idea of using Pascal’s Triangle to make a cipher that produces statistics similar to those of the text “system” of the Voynich Manuscript. My ideas are still immature, but I’m pleased to note that so far I’ve devised something (relatively) simple with short words, binomial word lengths, strong word structure, lines as semantic units, lack of repeated sequences, and word-adjacent repetition. I haven’t had the time to really dig in, quantify any of these and compare them with the VMS text, but at first glance it appears fairly close.
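
For readers wondering how Pascal’s Triangle connects to binomial word lengths, here is a small sketch. It is not my cipher, whose details stay unpublished. Row n of the triangle holds the binomial coefficients C(n, k). Dividing that row by its sum, 2^n, gives the binomial distribution with p = 1/2. Treating the index k as a word length is purely an illustrative assumption.

```python
def pascal_row(n):
    # Row n of Pascal's Triangle, built by adding adjacent entries.
    row = [1]
    for _ in range(n):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

def binomial_pmf(n):
    # Normalising row n by its sum (2**n) gives Binomial(n, 1/2).
    row = pascal_row(n)
    total = sum(row)
    return [c / total for c in row]

print(pascal_row(4))      # [1, 4, 6, 4, 1]
print(binomial_pmf(2))    # [0.25, 0.5, 0.25]
```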


For example, here is a phrase enciphered with an early version of the cipher, written in EVA transcription (deliberately seeded so that words end in -n all the time):

chiain chiin dain choiin shoin shoiin chiiin chiin shn dain chiiin diin in.

Here is the same phrase again:

potir chiin dain shoedy shoin shoiin chols sheey toy chddy chiiin ooli aiim.

Here is the same phrase yet again:

fodar choiin shn sheey diiin diin choli shoedy chedy ty choin shels daiim.

Here is the same phrase without vowels:

shedy shoey sheyi shoiin choiin chtchar cheli shoyiiim.


Interesting things about my system (so far):

  • Word context is highly important and affects all content.
  • The same plaintext sequence is almost guaranteed to end up completely different every time it appears. This applies to individual words too. For a word of length $n$ enciphered twice, the probability that its two ciphered versions match is approximately $1/2^{18n}$, with a few caveats here and there. I wish WordPress could embed formulae easily (can someone please tell me how in the comments below?).
  • Multiple appearances of the same ciphertext sequence are almost guaranteed to be completely unrelated. This applies to individual words and similar sequences like Timm Pairs. The probability is similar to the one mentioned above. However, if they are at the very beginning or end of lines they might be a bit related. If they are labels (i.e. enciphered outside a line) they become much more similar.
  • Blank spaces in words are meaningful. What do I mean by this? All words actually store 10 letters of information, but one letter of the alphabet is an invisible glyph (we’ll call it “_”), giving the appearance of different word lengths. For example (not a real example), fodar might actually be f _ _ o d _ a _ _ r. The system allows us to unambiguously reconstruct the original ten-letter sequence with ease. This allows words to store more information than they would suggest.
  • Similar words that appear next to each other (Bad Romance sequences) are an unintended side effect. They store just as much information as any other sequence because of their context.
  • It allows for a total of 9^4=6561 unique words, though this can be adjusted with some tricks and workarounds. Stolfi counted a total of 6525 unique words in the Voynich Manuscript.
  • If this was confirmed to be the system behind the Voynich Manuscript’s text, I would still have very little idea of how to decipher it.
  • Update: You could pad some lines with filler at the beginning or end so that all lines are equal length, making the system a bit more secure. In the Voynich Manuscript itself, some see evidence of meaningless filler material at the beginning or end of some lines.
  • Update 2: It also accounts for the findings that the first two letters of each word are more predictable than the rest, and that there is some mild correlation between the end of one word and the start of the next.
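
A few of the numbers and ideas in the list above can be checked directly. This is a hedged sketch. `visible_form` only shows the easy direction, dropping the invisible glyph from a ten-slot word, because the rule for reconstructing the ten slots is not described here. `match_probability` reads the estimate as $1/2^{18n}$, which is itself an interpretation. The helper names are hypothetical.

```python
BLANK = "_"  # stand-in for the invisible glyph, as in the fodar example

def visible_form(word10):
    """Drop the invisible glyph from a ten-slot word to get the written word."""
    if len(word10) != 10:
        raise ValueError("each word is assumed to hold exactly ten slots")
    return word10.replace(BLANK, "")

def match_probability(n):
    """Approximate chance that two encipherments of an n-letter word agree,
    reading the estimate as 1 / 2**(18 * n)."""
    return 1 / 2 ** (18 * n)

UNIQUE_WORDS = 9 ** 4  # stated number of possible unique words

print(visible_form("f__od_a__r"))  # fodar
print(UNIQUE_WORDS)                # 6561
print(match_probability(1))
```

For comparison, Stolfi counted 6525 unique words in the manuscript, close to the 6561 the system allows.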

f76v/f77r Timm Pair

Voynich Manuscript

In case you didn’t know, Timm Pairs* are pairs of phrases in the Voynich Manuscript that are almost identical but are not Bad Romance sequences**. Timm himself uses them as evidence of a hoax. Nick Pelling opines that nearby Timm Pairs could be the same plaintext sequence encoded slightly differently by the same system. There’s no telling what they are, if they are anything at all.

That said, I’m not here to discuss them, just present an interesting one.

f76v begins:

polarar okor

f77r begins:

poldarairol qokol

At first glance they don’t look that similar, and strictly speaking these phrases are not close to each other. What caught my attention is that almost no other folio pairs begin this similarly, let alone two folios that (currently) face each other. As for the phrase similarity, f77r could be seen as a spruced-up version of f76v’s beginning.

f76v: p-o-l-_-a-r-a-_-r-_-_ / _-o-k-o-r

f77r: p-o-l-d-a-r-a-i-r-o-l / q-o-k-o-l

As for what this means (if anything), I don’t know. Just putting it out there. I thought that if I looked at similar instances, the transformations between them might lead to insights into the word structures. I haven’t been through them thoroughly, but here are some I have found since then:

  • f2v begins “kooiin”, f3v begins “koaiin” and f4v begins “pchooiin”. Oddly, f29v also begins “kooiin”.
  • f54r and f55r both start with “podaiin”. (Note: this word also begins paragraphs on f49r and f85r1.)
  • f68v3 begins “tchedy chepchy”. f68v2 begins “teeody shcthey”. These two folios face each other on a large fold-out section. Although these seem dissimilar, they have exactly the same curve-line pattern which interests me.
  • One of the starred paragraphs on f103r begins “polarar lshedy qotolaiin”. Another on f111v begins “polarar okshey qokain”.
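
One way to put a number on “how similar” these openings are is edit distance. This is my own choice of measure, not something the pairs were selected by. The sketch below computes the standard Levenshtein distance, which counts the minimum number of single-glyph insertions, deletions and substitutions. Two simplifications are assumptions: each EVA character counts as one glyph, and spaces count as characters.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

pairs = [("polarar okor", "poldarairol qokol"),  # f76v vs f77r
         ("kooiin", "koaiin"),                   # f2v vs f3v
         ("kooiin", "pchooiin")]                 # f2v vs f4v
for x, y in pairs:
    print(f"{x!r} -> {y!r}: {levenshtein(x, y)} edits")
```

The f76v/f77r pair comes out at 6 edits: five insertions plus the final r→l substitution.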

Feel free to share your thoughts in the comments below!

*Named after Torsten Timm but not discovered by him, serving as an example of Stigler’s law of eponymy. The term is a neologism by Pelling, but given that there’s nothing else to call them, I use it too.

**Bad Romance sequences are the common, repetitive, information-poor phrases peppered through the manuscript’s text like “qokedy qokedy qokeedy qokey” or “dain daiin okaiin”. This is my neologism since I’m not aware of any other name for these. They are named after the Lady Gaga song Bad Romance which is full of the phrase “ra ra-a-a-a roma roma-ma gaga ooh lala”. They are similar to, but perhaps not the same as, what David Jackson calls “epizeuxis”.

Review of “A Logical Consideration of the Voynich Manuscript” by David Jackson

Voynich Manuscript

Can we prove what the Voynich Manuscript is? Well, we need to take that first step sooner or later.

Because ROS (Robot Operating System) has been much less co-operative than I had envisioned, I haven’t had any time to investigate the Voynich Manuscript as much as I had promised myself and others. That said, I looked at David Jackson’s interesting essay A Logical Consideration of the Voynich Manuscript, where he evaluates all the evidence out there and comes to some conclusions.

Here are my thoughts (and in writing them, I have created more questions to consider).