MaS is about computer security, malware and spam issues in general.

2011/11/07

AnonCommunicate

I was intrigued by the various Twitter feeds allegedly owned by factions of the Anonymous group. Intrigued, because it looked like the messages were encrypted. I asked the cryptographer Endre Bangerter, FH Bern/Biel, to help me out, and he referred me to one of his reverse engineering wizards, David Gullasch (https://twitter.com/#!/x0n0x), who found out that it was not what I thought it was. Here's David's analysis:


The Twitter account AnonCommunicate periodically tweets cryptic-looking messages (apparently one tweet every 15 minutes). The stream of messages repeats every 537 tweets:

AAAAAA AAAAA4 NcPhvj VqKmBO lrbGYF WFvtYc 9FeFPl XAHsv8 cp7dLG VwJMht
sz7tNa OCDebL 3XyHL9 4NrD6b xCALJv RUoSl9 jpywkA 9JJg5Y cQSHam T4ACuG

MJGojD uarAAO QmkNiP DriWbM I9grRP Wsxlkw 7hdBSz vTRVKE 1U5CAK iua01m
DhcTSm pL8r7b podCXT JomI1N B4a6fD GbmlyA Gi18vQ 6qTikd rwHQZS 20l0pU

...

EcMt5A kEka05 5azHox uRhPlE Xh5PCm 28LjtL o5bzoe AAAAAG Mt1IvW bjfNp1
d6lLyZ iyJAKM quAT8w SuxpOj iAAAAA AAAAAA AABlta 7WXyEO ism4GD 7zKKwt

j0i8Ct Xl

The most obvious observation is that the messages consist of alphanumeric characters only (a-z, A-Z, 0-9). Therefore it can't be base64, whose alphabet also contains '+' and '/'; some sort of base62 encoding is more probable. Because log2(62^6) = 35.7251 is a rather odd value, the six-character blocking does not make much sense if one assumes a binary encoding below the base62 layer.

The next observation comes from the character frequencies: the character 'A' is much more likely to be encountered than any other character. This statistical anomaly does not stem from long runs of "AAA...A"; these runs are only present in the first and last messages shown above. The many 'A's sprinkled all over the place turn out to be periodic and suggest a different blocking scheme as follows:

AAAAAAAAAAA4NcPhvjVqKmBOlrbGYFWFvtYc9FeFPlX

AHsv8cp7dLGVwJMhtsz7tNaOCDebL3XyHL94NrD6bxC
ALJvRUoSl9jpywkA9JJg5YcQSHamT4ACuGMJGojDuar
...
AEcMt5AkEka055azHoxuRhPlEXh5PCm28LjtLo5bzoe
AAAAAGMt1IvWbjfNp1d6lLyZiyJAKMquAT8wSuxpOji
AAAAAAAAAAAAABlta7WXyEOism4GD7zKKwtj0i8CtXl


Now the truncated final tweet "j0i8Ct Xl" also makes sense, because it exactly completes a 43-character block. Also, log2(62^43) = 256.03 is a much nicer value and suggests a base62 encoding of 256-bit blocks.
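The re-blocking described above can be sketched in a few lines of Python. This is only an illustration, not David's actual tooling: the helper names (payload, reblock, leading_a_fraction) are made up for this example, and only the first tweet quoted above is used as input, so the statistics are based on very little data.

```python
def payload(tweets):
    """Concatenate the tweets and drop all whitespace."""
    return "".join("".join(tweets).split())


def reblock(stream, width=43):
    """Cut the character stream into blocks of the given width."""
    return [stream[i:i + width] for i in range(0, len(stream), width)]


def leading_a_fraction(stream, width):
    """Fraction of blocks of the given width that start with 'A'."""
    starts = stream[::width]
    return starts.count("A") / len(starts)


# The first tweet as quoted above (the two lines form a single tweet).
first_tweet = (
    "AAAAAA AAAAA4 NcPhvj VqKmBO lrbGYF WFvtYc 9FeFPl XAHsv8 cp7dLG VwJMht "
    "sz7tNa OCDebL 3XyHL9 4NrD6b xCALJv RUoSl9 jpywkA 9JJg5Y cQSHam T4ACuG"
)

stream = payload([first_tweet])
for width in (6, 43):
    print(width, leading_a_fraction(stream, width))
for block in reblock(stream):
    print(block)
```

With the full stream of 537 tweets one would loop over a range of candidate widths and look for the one where the leading 'A' shows up in essentially every block.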

The third hint comes from the statistics of the second character in each block: it is always in the range A-O. Factoring in this fact, and that the first character is always 'A', we get a block entropy of log2(15*62^41) = 248.03, suggesting that one byte in an underlying binary 32-byte block must be fixed.
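The entropy figures used so far are easy to re-check. The snippet below is just a quick sanity check of the arithmetic; the variable names are chosen for this example.

```python
import math

# Six base62 characters: not a whole number of bits or bytes.
bits_six_chars = math.log2(62 ** 6)

# A full 43-character block: just over 256 bits, i.e. room for 32 bytes.
bits_full_block = math.log2(62 ** 43)

# First character fixed to 'A', second limited to the 15 values A-O,
# remaining 41 characters free: just over 248 bits, i.e. 31 bytes.
bits_constrained = math.log2(15 * 62 ** 41)

print(round(bits_six_chars, 4))
print(round(bits_full_block, 2))
print(round(bits_constrained, 2))

# 43 is also the smallest number of base62 digits that can hold 256 bits.
print(62 ** 42 < 2 ** 256 <= 62 ** 43)
```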

The next step is to find the correct base62 decoding. In the spirit of Benford's Law (integers are more likely to start with lower digits), we guess that 'A'..'Z' map to the values 0..25. Also, we guess that '0'..'9' and 'a'..'z' map to contiguous ranges. With these assumptions, the final parameters for the decoding can be found by trial and error:
'A'..'Z','0'..'9','a'..'z' map to 0..61 and the blocks are big-endian integers, which can be decoded as in the following example:

1. decoding the digits:

   A  A  A  A  A  A  A  A ...  t  Y  c  9  F  e  F  P  l  X
   0  0  0  0  0  0  0  0 ... 55 24 38 35  5 40  5 15 47 23
 


2. computing the integer value:

   0*62^42 + 0*62^41 + 0*62^40 + 0*62^39 + 0*62^38 + ...
      ... + 40*62^4 + 5*62^3 + 15*62^2 + 47*62^1 + 23*62^0
   = 1107723835656471950689271297510666484906119194003614018861
 


3. representing that as a big-endian hexadecimal integer:

   2D2D2D2D2D2D20424547494E2046494C45202D2D2D2D2D2D 


which obviously corresponds to the ASCII string:

   "------ BEGIN FILE ------" 
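The three steps above can be sketched in Python as follows. This is a minimal illustration under the assumptions stated in the text (alphabet order 'A'..'Z', '0'..'9', 'a'..'z', big-endian integers, 32-byte blocks); the function names are made up for this example, and int_to_block is only included so that the decoder can be checked against a round trip.

```python
import string

# Digit alphabet as guessed in the text: 'A'..'Z' -> 0..25,
# '0'..'9' -> 26..35, 'a'..'z' -> 36..61.
ALPHABET = string.ascii_uppercase + string.digits + string.ascii_lowercase
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def block_to_int(block):
    """Steps 1 and 2: read the block as a big-endian base62 integer."""
    n = 0
    for ch in block:
        n = n * 62 + VALUE[ch]
    return n


def decode_block(block, size=32):
    """Step 3: represent the integer as a big-endian byte string."""
    return block_to_int(block).to_bytes(size, "big")


def int_to_block(n, width=43):
    """Inverse direction, used here only to test the decoder."""
    digits = []
    for _ in range(width):
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


# Digit values of the tail of the first block, as in step 1.
print([VALUE[ch] for ch in "tYc9FeFPlX"])

# First block of the stream as quoted above.
first_block = "AAAAAAAAAAA4NcPhvjVqKmBOlrbGYFWFvtYc9FeFPlX"
print(decode_block(first_block).lstrip(b"\x00"))
```

If the block is transcribed exactly, the last line should reproduce the header string shown above. Assuming the fixed byte is the leading one, the 31 payload bytes of a block would be decode_block(block)[1:].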


Decoding 31 bytes from every 43 character block in this way (and omitting the "------ BEGIN FILE ------" and "------ END FILE ------" strings) yields:


[Ed.: I've reduced this in size and converted it to JPEG format, so this isn't the original. - Morton]

So, it's not as nefarious as it seems. At least at first sight. I'm still exploring the phenomenon, though :-)


