Thursday, March 24, 2011

How to Beathis! challenge - the solutions

Beatthis! cryptoanalysis challenge turned out to be pretty popular. Some of you have been asking me for additional tips, some of you shared the happiness of completing the levels, most of you probably cursed a lot ;) I'd like to thank all the participants for their time, I hope you liked this hackme. But of course my congratulations go to all of you who solved all levels:
  1. mrrr (@gynvael)
  2. hellman (@hellman1908)
  3. @internot_
  4. carstein (@m_melewski)
  5. wrrr (Kuba - jakubk at mp dot pl)
  6. dxp (@dxp2532)
You're all my personal heroes! Now it's time to reveal all the secrets - one by one. If you still want to finish the challenge by yourself, don't read up. There are spoilers ahead!

Free gifts for everyone!!111

But before that, let me present a few cryptographic tools created while preparing this challenge, together with cipher/plaintexts for all levels. It's a gift for you for trying the challenge. Download it, test it, do what you want with it. Enjoy! Now, let's begin with...

Level 1

This was the file:
V2hhdCBjb3VsZCBiZSBzaW1wbGVyIHRoYW4gc29tZSBzaW1wbGUgYmFzZTY0ICJjaXBoZXIiPyBU
aGUgY29kZQpmb3IgbmV4dCBsZXZlbCBpcyBCQUJBSkFHQUVWSUxLTklFVkVMCg==
Obviously, it's base64 encoded (the range of characters used and the ending == give this away). Notice the difference - encoded, not encrypted. There is no encryption here, it's just a simple starter. Decoding this with e.g. this online decoder gave you the passphrase to the next level.

Level 2

The ciphertext was also base64 encoded, but the decoded text looked like this:
Pvgrq Sebz uggc://ra.jvxvcrqvn.bet/jvxv/EBG13
EBG13 vf na rknzcyr bs gur rapelcgvba nytbevguz xabja nf n Pnrfne pvcure, nggevohgrq gb Whyvhf Pnrfne va gur 1fg praghel OP.[5]
.. which is simply ROT13-encoded file, given away e.g. by the fact, that only letters are being transformed, and that uggc://ra.jvxvcrqvn.bet/jvxv/EBG13 looks like an URL.One ROT13 decoder away are we're in...

Level 3

Again, base64 to start the fun, and then:
p=D@[ FD:?8 #~%\2?JE9:?8 H:== ?@E 86E J@F 2?JH96C6 3FE E@ E96 ?6IE =6G6= H:E9 E9:D (p}}pqtr#*!%~p}p{*$% <6J]
Ouch. This one doesn't look so good... But you could quickly (or hours later) see that
  • the symbols are all in lower half of ASCII table (0-127)
  • it looks like the spaces are not modified, as they appear too often
Starting the frequency analysis, you can e.g.
  • replace the most common letters in the ciphertext (you can find them e.g. with histogram.py) with the most common letters in English (e,t,a)
  • identify short words and try to match them with short words in English (like "to", "me")
  • guess another missing letters, and uncover letters one by one
    The process could look like this:


    When you get bored, or reach a dead end, you will eventually encounter the "Why 13?" hint and calculate that the replaced characters byte values are shifted by 47. Previous level was ROT13, and this one is ROT47 - his lesser known brother. Quick googling to decode it and on to the next level.

    Level 4

    This is a breakthrough level - the previous ones were quick warmups easily solvable with Google, this one requires doing some actual analysis.
    Jv{>FQL>qn{ljql>wm>{fjl{s{rg>}qssqp>m>>}qsnqp{pj>wp>sql{>}qsnr{
    etc.
    
    Let's look deeper with histogram.py:
    $ ./histogram.py 4.cipher 
    col 0:
    -------------
       '>' 3E  62 00111110 0.1767 *
       '{' 7B 123 01111011 0.1186 
       'j' 6A 106 01101010 0.0651 
       'm' 6D 109 01101101 0.0616 
       'w' 77 119 01110111 0.0605 
       'p' 70 112 01110000 0.0535 
       'l' 6C 108 01101100 0.0523 
       'q' 71 113 01110001 0.0512 
    '\x7f' 7F 127 01111111 0.0488 
       'v' 76 118 01110110 0.0314 
    ....
    $ ./histogram.py 4.cipher | wc -l
    43
    
    There's 40-something symbols, with non-uniform distribution. The output simply lists all occuring bytes in the file, sorted by frequency (last column). The file is filled with 17% of ">" - that's a lot. For comparison, here's the distribution for English text (article from Wikipedia):
    $ ./histogram.py english.txt 
    col 0:
    -------------
       ' ' 20  32 00100000 0.1614 *
       'e' 65 101 01100101 0.1085 
       'a' 61  97 01100001 0.0637 
       't' 74 116 01110100 0.0542 
       'n' 6E 110 01101110 0.0500 
       'i' 69 105 01101001 0.0498 
       'r' 72 114 01110010 0.0479 
       's' 73 115 01110011 0.0451 
       'o' 6F 111 01101111 0.0440 
       'l' 6C 108 01101100 0.0388 
       'h' 68 104 01101000 0.0351  
    
    Looking pretty similar, don't they? So, we match: > is SPACE, { is e and j is a. And we might continue with the path from previous exercises, trying to decipher each character, but there's a faster way. Let's look at the bits (red - ciphertext, purple - English text):
       '>' 3E  62 00111110 0.1767 *
       ' ' 20  32 00100000 0.1614 *
       '{' 7B 123 01111011 0.1186 
       'e' 65 101 01100101 0.1085 
    
    This highlighted bits in most common letters in text are clearly related - they are negated. Or XORed with 0b00011110 = 30. Let's test this with xor.py:
    $ ./xor.py 4.cipher 30 | head -n 1
    The XOR operator is extremely common as a component in more complex ciphers. By itself, [...]
    
    Yup, we're right. This is solving with the glove i.e. reasoning & deduction with the help of frequency analysis. It can also be solved with a hammer - XOR/ROL/ADDing all 256 combinations of a key for a meaningful plaintext.

    Level 5

    Is a ZIP file (simple to tell when you see PK at the beginning) . Unpacking the zip gives us a single file, but with a byte distribution that is 'flat':
    '\xcf' CF 207 11001111 0.0577 *
    '\x8d' 8D 141 10001101 0.0449 
    '\x9e' 9E 158 10011110 0.0385 
    '\xbb' BB 187 10111011 0.0353 
    '\xfe' FE 254 11111110 0.0321 
    '\x8a' 8A 138 10001010 0.0321 
    '\xdb' DB 219 11011011 0.0321 
    
    That's not good. The hint tells us that "One is lame" - could this mean that the key is longer than one byte? It would explain the flattened histogram. There are a few ways to find the key length, but let's just skip them for this level and let's use that last hint - "you have it all, the key is there". Yes, it's in the file itself, stuck at the end, as a DEADBEEF. So, first byte is XORed with 0xDE, 2nd - with 0xAD, and so on, looping over the key. It's an example of (sort-of) Vigenere cipher.

    Sidenote: It's a common practice for programmers to use XOR with a random number and then glue it somewhere with ciphertext. Please don't do that, it makes me cry like a baby and I don't like crying (well, unless I'm watching Titanic).

    Decrypting with xor.py is successful. The passphrase mentions "Friedman's thingy", and we're ready for the final level.

    Level 6

    Does that file resemble anything? Look at the previous level .bin file, compare. The structure seems similar, but there's QJ, not PK at the beginning. Well, not when you XOR it with 1, getting the almost-valid ZIP file in return. Almost valid, because I screwed up editing the file, but thankfully, the ZIP could be unpacked by at least a few unpackers (Linux unzip, WinRAR).

    Could be, provided you knew the password. The zip contained a single file - drowssap. As for the password, you could brute force it with anything you like, or just enter the most common one: "password" (drowssap in reverse).

    After unpacking the file, you should be pretty much clueless. There's around 100 different characters in the file - more than in usual English texts (60-70 for Wikipedia articles), so it's probably using some multibyte key (or homophonic substitution). Bruteforcing the key would take too long, to do frequency analysis and start uncovering letters one-by-one we need to know the key length. But this time it's not stored anywhere in the file.

    There must be a different way - and there is! Remember the last passphrase? "Let's try this Friedman's thingy" - what is that all about? Wikipedia article on Vigenere cipher tells us that "The Kasiski and Friedman tests can help determine the key length".

    Great! Just what we need. This time you'll have to write a program counting Index of Coincidence to determine the key length (as an additional hint IoC definition was also cited in my blog post announing Beatthis!). The example on Wikipedia IoC article is very good explanation, so I won't try to be smarter . Writing the program is a great excersie, but if you're lazy, just use mine: break.py

    $ scripts/break.py drowssap english.txt 
    Guessing key length...
    IC: 16.0056153949
    1 4.5598898258327143 (11.445725569)
    2 4.5263998965036087 (11.4792154984)
    3 4.6351954511917084 (11.3704199437)
    4 4.3566207548108 (11.6489946401)
    5 16.613403570383376 (0.607788175501)
    6 4.538307420468576 (11.4673079744)
    7 4.249386352479136 (11.7562290424)
    8 4.3937767878235636 (11.6118386071)
    9 4.8772309151256525 (11.1283844798)
    10 16.468332136110892 (0.462716741229)
    11 4.643879300674647 (11.3617360942)
    12 4.1983747246905141 (11.8072406702)
    13 4.3670083864739384 (11.6386070084)
    14 4.1592126212187015 (11.8464027737)
    15 17.909881422924901 (1.90426602804)
    16 4.058450152607838 (11.9471652423)
    17 4.3184885290148447 (11.6871268659)
    18 4.6497023339128596 (11.355913061)
    19 4.4484544695071007 (11.5571609254)
    20 15.707664884135474 (0.297950510746)
    21 4.0888144113950551 (11.9168009835)
    
    This one counts the IoC for our ciphertext, probing different lengths, also comparing this with IoC of some sample English text. You can clearly see, that every 5*x length is giving a much higher result. That's because 5 is the length of a key.

    Now you can try the frequency analysis on each column. (e.g. get byte 1, 6, 11, 16, ... and anylyze them separately, then bytes 2,7,12,17,22 etc.). Or follow Wikipedia example and try maximizing correlation (see Wikipedia for details). Again, write your own or use break.py for this:

    $ scripts/break.py drowssap english.txt 5
    Assuming key length 5
    
    Column 0:
    Best matches:[('xor', 30), ('xor', 91), ('add', 226)]
    Chosen function: xor:30
    
    Column 1:
    Best matches:[('xor', 40), ('add', 24), ('rol', 2)]
    Chosen function: xor:40
    
    Column 2:
    Best matches:[('xor', 50), ('add', 14), ('xor', 119)]
    Chosen function: xor:50
    
    Column 3:
    Best matches:[('xor', 60), ('xor', 121), ('add', 4)]
    Chosen function: xor:60
    
    Column 4:
    Best matches:[('xor', 70), ('xor', 3), ('xor', 87)]
    Chosen function: xor:70
    Guessed plaintext:
    Congratulations! You broke the unbreakable, indecipherable cipher, commonly named as Vigenere cipher.[...]
    

    It turned out that the key was 5 bytes long, with bytes (decimal) 30,40,50,60,70. Easy when you know it upfront :)

    Final notes

    Once again, thank you all for participating. I hope you had at least so much fun playing as me preparing it. Like I said before, for your participation - take a look at the scripts, try to play with them all, fork & improve them, fix their bugs - let them live on their own.

    With break.py levels 4,5,6 (after unzipping) are solvable in seconds. histogram.py helps analyzing all levels, xor.py is good for encrypting/decrypting, others are just helpers.

    Beatthis! is still online (and probably will be for weeks, until someone starts abusing it with bruteforce), so you can test the scripts or try other methods. Let me know what is your opinion on the challenge, were the levels too hard or too easy, how did you solve them, what stopped you - I really want to hear. Till next time!

    5 comments:

    Gynvael Coldwind said...

    Hey,

    Thanks for the hackme, I really enjoyed it :)

    About the "Almost Valid" ZIP file in level 6 - it was simple to make it valid. All it took was to change the "DE AD BE EF" bytes (last 4 bytes) to 00 00 00 00 ;>

    As a side note - did you know that a ZIP files don't have to start with a "PK"? I've actually learned about that not long ago. ZIP has the main header at the *end* of the file (actually, it can be offset at most 64k from the end, due to a variable length comment field). The fact that ZIPs commonly start with "PK" is almost an accident and it is not always the case (see GIFAR as a good example).

    Thanks for the IoC method btw ;) It's cool ;)

    Cheers!

    Alexey Hellman said...

    Nice crypto challenge! Had fun solving it :) Though, it was really easy. The only places to stuck a bit for me were ROT47 and drowssap. I found that it is shifted by 47, but the boundaries were not obvious. Also, googling 'drowssap' did the thing and it was done :)

    PS: to decrypt the last challenges I used similar to your break.py my tool
    https://github.com/hellman/xortool

    kkotowicz said...

    Glad you liked it :) As for the ZIP, yeah, I know it, but I only just learned that from wrrr 2 days ago. He was actually reading the ZIP format and started fixing the file manually (it took looong). My respect for the effort :)

    kkotowicz said...

    Funny thing - wrrr gave me the link to your tool in the middle of the challenge, and I saw you published it just two months ago. And then I saw you in the challenge, talk about coincidence :)

    Through a quick look though, your xortool uses a different method and only asks for most popular letter, do you have any links where this methodology is described?

    Alexey Hellman said...

    Hm, I just played with CrypTool - it has many crypto features. But it is for Windows and I decided to write analog for it's xor analysis feature for linux.
    The algorithm is easily guessable: try different length of the key to produce a plaintext that contains as much as possible equal chars. Then calculate a key to make all that chars equal to a given one.
    No magic, really :)