& if anyone reading happens to be good at this stuff, this is from the University of Adelaide re the cipher -

https://www.eleceng.adelaide.edu.au/...acking_process
Management

Carefully study how the tasks have been logically split up below. Fill in the deliberately missing blanks. Figure out the best sequence to do them in. Figure out how to allocate the tasks between you two so that there can be two streams of complementary work going on in parallel. This will then form the guts of your Critical Design Review (CDR).

Hypothesis 1: the code is gibberish

Assume the code is in fact a meaningless string of letters. This assumes the Somerton man was normally an English speaker, but was drunk or so badly poisoned with hallucinogens that he wrote a delusional string of letters. Think of ways to test this hypothesis. Get 10 native English speakers to write a string of 50 random letters before and after a fixed number of beers. They must try to be random using only their minds, not computers or external devices. Better to choose friends who study courses where they don't teach you what randomness is; otherwise you may get the odd friend who tries to be too smart and doesn't play along. Arts students will be perfect victims. Since different drugs produce different random changes, you could also test the effect of a couple of strong espresso coffees on random letter production.

Then think of ways you can statistically compare the Somerton code to these gibberish sequences. Plot letter frequencies of the gibberish with error bars. Make counts of letter-pair frequencies. Are there letters of the alphabet people consistently missed out, and how does this compare to the code? How do the most frequent letters compare? Calculate the average information in bits per symbol, H(X). You do this by summing H(x_i) over all symbols x_i in the code. So let's say x_1 is the symbol 'R': you count up how many times 'R' appears in the code and divide it by the total number of letters to give the probability P(x_1) of there being an 'R'. Then by definition, H(x_i) = -P(x_i) log2(P(x_i)); note the minus sign, which makes each term positive. Do this for all the symbols and add up all the H's. The total is what is called the Shannon average information and has the units of 'bits'. Do this for both the gibberish and the code.
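The entropy calculation above is a few lines of Python. This is a minimal sketch; the sample string in the usage line is illustrative, not a transcription of the actual code.

```python
from collections import Counter
from math import log2

def shannon_entropy(text: str) -> float:
    """Average information in bits per symbol: H = -sum of P(x_i) * log2(P(x_i))."""
    letters = [c for c in text.upper() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * log2(k / n) for k in counts.values())

shannon_entropy("AABB")   # -> 1.0 (two equally likely symbols carry 1 bit each)
```

Uniform random English letters would give close to log2(26) ≈ 4.7 bits per symbol; real English text and short non-random strings come out lower, which is exactly the kind of difference you are looking for.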

Hypothesis 2: the code is in English, but the letters have been substituted

Make a big long list of coding techniques. Try to eliminate some based on the date they were invented, or ones that would require more computing power than was available in 1948. It is not reasonable to assume he did the code in his head: a computer of the time could have been used, he could have been hurriedly copying down a code that someone else prepared, or he may have ripped the code off someone else when they weren't looking. Then take your reduced list of coding techniques, and code up an e-book written in English. You should sub-sample chunks of 50 letters 10 times, and generate error bars. Then look at the letter frequencies before and after coding. Compare this with the Somerton code and you should be able to eliminate some techniques. Then you can repeat the above using different statistical features such as letter pairs. Also try calculating the probability of a symbol x_i appearing after a symbol x_j. This is called the transition probability. Comparing transition probabilities can give more clues.
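The transition probabilities can be estimated directly from adjacent letter pairs. A minimal sketch (the function name and the toy string are ours, not part of the brief):

```python
from collections import Counter

def transition_probs(text: str) -> dict:
    """Estimate P(next letter | current letter) from adjacent letter pairs."""
    letters = [c for c in text.upper() if c.isalpha()]
    pair_counts = Counter(zip(letters, letters[1:]))   # counts of each adjacent pair
    first_counts = Counter(letters[:-1])               # how often each letter starts a pair
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

p = transition_probs("ABABAC")
# After 'A' the next letter is B, B, C -> p[('A','B')] == 2/3, p[('A','C')] == 1/3
```

Run this on the coded e-book samples and on the Somerton code, and compare the resulting matrices; with only ~50 letters the code's matrix will be very sparse, so compare coarse features (e.g. how many distinct pairs occur) rather than individual entries.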

Vigenère cipher

Try the Kasiski test: look for repeated letter sequences in the ciphertext. The distances between repeats tend to be multiples of the key length, so the key length can be estimated from their common divisors.

Once the key length is known, you can then write out the code in rows of that length, and then do a frequency analysis on each of the columns—since each column corresponds to a single letter, then a column is the result of a separate substitution cipher. You can then reuse your software for the substitution cipher to try and decrypt. More details here
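Both steps — measuring repeat distances and splitting the code into columns — are short functions. A sketch under the assumption of a pure Vigenère cipher (function names are ours):

```python
def kasiski_distances(ciphertext: str, n: int = 3) -> list:
    """Distances between repeated n-letter sequences; the key length
    tends to divide most of these."""
    positions = {}
    for i in range(len(ciphertext) - n + 1):
        positions.setdefault(ciphertext[i:i + n], []).append(i)
    return [q - p for pos in positions.values() if len(pos) > 1
            for p, q in zip(pos, pos[1:])]

def columns(ciphertext: str, key_len: int) -> list:
    """Write the code in rows of key_len letters; column i collects every
    key_len-th letter, so each column is a separate substitution cipher."""
    return ["".join(ciphertext[i::key_len]) for i in range(key_len)]

kasiski_distances("ABCXXABC")   # -> [5]; candidate key lengths divide 5
columns("ABCDEF", 2)            # -> ['ACE', 'BDF']
```

Each string returned by columns() can then be fed straight into your substitution-cipher frequency-analysis software.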

One time pad

A one-time pad is like the Vigenère cipher except that the key is the same length as the plaintext, and ideally uniformly random (i.e. all letters occur with equal probability in the key). This then generates a ciphertext with letters of equal probability (i.e. a flat probability distribution). In practice, he may have used some piece of text he had on him as the key, in which case the key would not have had letters of equal probability. You have already found the code's letter distribution is non-uniform, so it seems likely that if it was a one-time pad, he used a piece of text as the key. Possible keys: quatrains from the Rubaiyat. You should generate all possible variants of the Fitzgerald translation using the list of differences, and try all possible quatrains (i.e. all quatrains from all versions). You can rule out those that are shorter than the ciphertext, unless you consider concatenating successive pairs of quatrains. Certain quatrains are associated with the case, and it would be interesting if the key was one of those, so try them first.
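The quatrain search described above can be sketched as follows, assuming standard running-key arithmetic with A=0 ... Z=25 (the function names and the toy strings are ours; the real candidate keys come from the Rubaiyat variants):

```python
def running_key_decrypt(ciphertext: str, key: str) -> str:
    """Subtract the key from the ciphertext letter by letter, A=0 ... Z=25."""
    return "".join(chr((ord(c) - ord(k)) % 26 + ord('A'))
                   for c, k in zip(ciphertext, key))

def try_quatrains(ciphertext: str, quatrains: list):
    """Try each quatrain as a one-time-pad key, skipping any shorter than the code."""
    ct = "".join(c for c in ciphertext.upper() if c.isalpha())
    for q in quatrains:
        key = "".join(c for c in q.upper() if c.isalpha())
        if len(key) >= len(ct):
            yield q, running_key_decrypt(ct, key)

running_key_decrypt("HELLO", "AAAAA")   # -> 'HELLO' (a key of all A's shifts by 0)
```

Eyeball the outputs, or score them automatically by English letter frequency, to spot any decryption that looks like language rather than noise.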

KJV version of the Bible

This was the version that the Gideons distributed to hotels in 1948, and we can be pretty sure the dead man was a traveller. You would need to know where to start taking the key from, however. One possible suggestion is that some of the first letters specify book, chapter and verse. The BAB on the top line could be a disguised 3,4,3 or 5,4,3 (the first B does look a little strange). That could be where to start from in the Bible, and the following letters would be the ciphertext. Alternatively you could decode the first three letters as numbers (A=1, B=2, etc.) — starting from one (probably), as the chapters and verses are numbered starting with one. Normally you use A=0, B=1 for decoding, however, to match the modulo arithmetic used. Even if none of these work, you may still be able to crack part of the code: since 'e' is the most common letter in English texts, 'e'+'e' would appear most frequently in the ciphertext (if the plaintext and key were both in English), and so on for T, O, A, I, N... However, the code is too short to use this reliably (statistically) to decrypt, so it's doubtful this approach will work.
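The two decoding conventions mentioned above (A=1 for verse numbering, A=0 for modular arithmetic) are easy to confuse, so it is worth having one helper that does both. A small sketch; the helper name is ours:

```python
def letters_to_numbers(s: str, a_equals: int = 1) -> list:
    """Decode letters as numbers. A=1 suits chapter/verse numbering, which
    starts at one; A=0 suits the modulo-26 cipher arithmetic."""
    return [ord(c) - ord('A') + a_equals for c in s.upper() if c.isalpha()]

letters_to_numbers("BAB")      # -> [2, 1, 2]
letters_to_numbers("BAB", 0)   # -> [1, 0, 1]
```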

External and internal information is useful

You can use lateral thinking: think hard about the internal information in the code and the external information of the case. How do these things help you?

Internal information: what does the presence of a 'Q' tell you? It tells you there wasn't a straight substitution with a rotary phone. The absence of a 'U' to go with the 'Q' means it is highly unlikely the code is English letters without some kind of substitution. It could be that 'Q' is an initial.

External information: could the name 'Jestin' be in the code itself? Or could the word 'enormoz' be in there? This was the word used by Cold War spies to refer to a nuke. Think of half a dozen significant words from what you know about the case. Write software to apply a sliding window across the code and assume a substitution with 'Jestin' at each point. 'Jestin' has six unique characters, so there will only be a few decrypted outputs. Only six letters will be decrypted; for the rest just print a '*'. Then visually we can easily eliminate the outputs that look like rubbish. Then do this for 'enormoz' and for half a dozen fun words that you come up with.

Could 'Jestin' or 'Tamam Shud' be the cryptographic keyword? Assume some basic coding schemes that use keywords and write software to check whether your guessed keywords work.
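The sliding-window crib check can be sketched in a few lines, assuming a simple substitution cipher (the function name and toy strings are ours). At each offset, the crib fixes part of the substitution table; offsets where the same code letter would have to map to two different plaintext letters, or two code letters to the same plaintext letter, are impossible for a substitution and are skipped:

```python
def crib_slide(code: str, crib: str):
    """Slide a crib word across the code. At each consistent offset, apply the
    partial substitution the crib implies to the whole code, printing '*' for
    letters the crib does not determine."""
    code, crib = code.upper(), crib.upper()
    for off in range(len(code) - len(crib) + 1):
        mapping = {}
        for c_code, c_plain in zip(code[off:], crib):
            if mapping.get(c_code, c_plain) != c_plain:
                break  # same code letter -> two plaintext letters: impossible
            mapping[c_code] = c_plain
        else:
            if len(set(mapping.values())) == len(mapping):  # must be one-to-one
                yield off, "".join(mapping.get(c, '*') for c in code)

list(crib_slide("XYZXYZ", "ABC"))   # first entry is (0, 'ABCABC')
```

Run it once per guessed word ('Jestin', 'enormoz', ...) over the Somerton code and scan the surviving outputs by eye for anything that looks like partial English.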