The Twitter account AnonCommunicate periodically tweets cryptic-looking messages (apparently one tweet every 15 minutes). The stream of messages repeats every 537 tweets:
AAAAAA AAAAA4 NcPhvj VqKmBO lrbGYF WFvtYc 9FeFPl XAHsv8 cp7dLG VwJMht
sz7tNa OCDebL 3XyHL9 4NrD6b xCALJv RUoSl9 jpywkA 9JJg5Y cQSHam T4ACuG
MJGojD uarAAO QmkNiP DriWbM I9grRP Wsxlkw 7hdBSz vTRVKE 1U5CAK iua01m
DhcTSm pL8r7b podCXT JomI1N B4a6fD GbmlyA Gi18vQ 6qTikd rwHQZS 20l0pU
...
EcMt5A kEka05 5azHox uRhPlE Xh5PCm 28LjtL o5bzoe AAAAAG Mt1IvW bjfNp1
d6lLyZ iyJAKM quAT8w SuxpOj iAAAAA AAAAAA AABlta 7WXyEO ism4GD 7zKKwt
j0i8Ct Xl
The most obvious observation is that the stream consists only of alphanumeric characters (a-z, A-Z, 0-9). It therefore can't be base64 encoded, because the base64 alphabet also contains '+' and '/', which never show up. Some sort of base62 encoding is more probable. Because $\log_2(62^6) \approx 35.725$ is a rather odd value, the six-character blocking of the tweets does not make much sense if one assumes a binary encoding below the base62 layer.
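Both observations are easy to check mechanically. The following is a minimal sketch: the helper names `is_alphanumeric_only` and `bits_for` are made up for illustration, and the sample is just the first line of tweets quoted above.

```python
import math
import string

ALNUM = set(string.ascii_letters + string.digits)


def is_alphanumeric_only(text: str) -> bool:
    """True if every non-whitespace character is in a-z, A-Z or 0-9.

    Base64 text of any length would normally also contain '+' and '/'.
    """
    return set("".join(text.split())) <= ALNUM


def bits_for(n_chars: int, base: int = 62) -> float:
    """Information content of n_chars digits in the given base, in bits."""
    return n_chars * math.log2(base)


sample = "AAAAAA AAAAA4 NcPhvj VqKmBO lrbGYF WFvtYc 9FeFPl XAHsv8 cp7dLG VwJMht"
print(is_alphanumeric_only(sample))  # True
print(round(bits_for(6), 3))  # 35.725: not a whole number of bytes per tweet
```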
The next observation comes from the character frequencies: 'A' is much more common than any other character. This statistical anomaly does not stem from long runs of "AAA...A"; such runs are only present in the first and last messages shown above. The many 'A's sprinkled all over the stream turn out to be periodic and suggest a different blocking scheme, shown here with each block cut off after its first 30 characters:
AAAAAAAAAAA4NcPhvjVqKmBOlrbGYF
AHsv8cp7dLGVwJMhtsz7tNaOCDebL3
ALJvRUoSl9jpywkA9JJg5YcQSHamT4
...
AEcMt5AkEka055azHoxuRhPlEXh5PC
AAAAAGMt1IvWbjfNp1d6lLyZiyJAKM
AAAAAAAAAAAAABlta7WXyEOism4GD7
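The reblocking itself is just concatenation and slicing. The sketch below uses hypothetical helpers `reblock` and `share_of_a_at_block_start`; the second one is only one illustrative way to test a candidate block size (how often a block starts with 'A'), not necessarily how the period was actually found. It is fed the first 20 tweets quoted above.

```python
def reblock(tweets: list[str], size: int = 43) -> list[str]:
    """Concatenate the tweets and cut the character stream into size-character blocks.

    The last block is shorter than `size` if the stream does not end on a block boundary.
    """
    stream = "".join(tweets)
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def share_of_a_at_block_start(stream: str, size: int) -> float:
    """Fraction of size-character blocks in `stream` whose first character is 'A'."""
    starts = stream[::size]
    return starts.count("A") / len(starts)


# The first 20 tweets quoted above.
tweets = """AAAAAA AAAAA4 NcPhvj VqKmBO lrbGYF WFvtYc 9FeFPl XAHsv8 cp7dLG VwJMht
sz7tNa OCDebL 3XyHL9 4NrD6b xCALJv RUoSl9 jpywkA 9JJg5Y cQSHam T4ACuG""".split()

for block in reblock(tweets):
    print(block[:30])  # the first three rows of the listing above
stream = "".join(tweets)
print(share_of_a_at_block_start(stream, 43))  # 1.0
print(share_of_a_at_block_start(stream, 42))  # 0.333... (only the first block starts with 'A')
```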
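Now the short final tweet "Xl" (following "j0i8Ct") also makes sense, because it exactly completes a 43-character block. Also, $\log_2(62^{43}) \approx 256.03$ is a much nicer value: 43 base62 digits are just enough to hold 256 bits (42 digits give only $\log_2(62^{42}) \approx 250.08$), which suggests a base62 encoding of 256-bit blocks.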
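The claim that 43 is exactly the right block length can be checked with integer arithmetic. A minimal sketch, where `min_base62_digits` is a hypothetical helper:

```python
import math


def min_base62_digits(bits: int) -> int:
    """Smallest n with 62**n >= 2**bits, i.e. enough base62 digits for any bits-bit value."""
    n = 0
    while 62 ** n < 2 ** bits:
        n += 1
    return n


for n in (42, 43):
    print(n, round(n * math.log2(62), 2))  # 42 250.08, then 43 256.03
print(min_base62_digits(256))  # 43
```

Comparing exact integer powers avoids any floating-point doubt at the boundary; the logarithms are only printed for comparison with the text.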
The third hint comes from the statistics of the second character of each block: it is always in the range A-O. Taking this into account, together with the fact that the first character is always 'A', we get a block entropy of $\log_2(15 \cdot 62^{41}) \approx 248.03$ bits, about 8 bits short of 256, suggesting that one byte of an underlying 32-byte binary block must be fixed.
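The reasoning also works in the other direction. If the top byte of a 32-byte big-endian block is zero, every block value is below $2^{248}$, and that bound alone pins down the first two base62 digits. The sketch below assumes, as the decoding below confirms, that 'A' is digit 0 and 'O' is digit 14.

```python
import math

# Assumption: the most significant byte of each 32-byte block is zero,
# so every block value lies in 0 .. 2**248 - 1.
largest = 2 ** 248 - 1

first_digit = largest // 62 ** 42           # first of the 43 base62 digits
second_digit = (largest // 62 ** 41) % 62   # second digit
print(first_digit, second_digit)  # 0 14, i.e. always 'A', then at most 'O'

# 15 choices for the second digit, 62 for each of the remaining 41 digits:
print(round(math.log2(15 * 62 ** 41), 2))  # 248.03
```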
The next step is to find the correct base62 decoding. In the spirit of Benford's Law (integers are more likely to start with lower digits), we guess that 'A'..'Z' map to the values 0..25. Also, we guess that '0'..'9' and 'a'..'z' map to contiguous ranges. With these assumptions, the final parameters for the decoding can be found by trial and error:
'A'..'Z', '0'..'9', 'a'..'z' map to 0..25, 26..35 and 36..61, respectively, and the blocks are big-endian integers, which can be decoded as in the following example:
1. decoding the digits:
A A A A A A A A ... t Y c 9 F e F P l X
0 0 0 0 0 0 0 0 ... 55 24 38 35 5 40 5 15 47 23
2. computing the integer value:
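$$0 \cdot 62^{42} + 0 \cdot 62^{41} + 0 \cdot 62^{40} + 0 \cdot 62^{39} + 0 \cdot 62^{38} + \dots$$
$$\dots + 40 \cdot 62^{4} + 5 \cdot 62^{3} + 15 \cdot 62^{2} + 47 \cdot 62^{1} + 23 \cdot 62^{0}$$
$$= 110772383565647195068927129751\dots$$
(the full value has 58 digits; only the leading 30 are shown)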
3. representing that as a big-endian hexadecimal number (leading digits only):
2D2D2D2D2D2D20424547494E204649...
which, read as ASCII bytes, is the string:
"------ BEGIN FILE ------"
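The three steps above fit in a few lines of Python. This is a minimal sketch under the assumptions stated in the text (the alphabet order found by trial and error, big-endian blocks, a zero top byte); the names `block_to_int`, `decode_block` and `encode_block` are made up for illustration. Dropping the fixed top byte is what leaves 31 payload bytes per block.

```python
import string

# Digit values from the text: 'A'..'Z' -> 0..25, '0'..'9' -> 26..35, 'a'..'z' -> 36..61.
ALPHABET = string.ascii_uppercase + string.digits + string.ascii_lowercase
DIGIT = {ch: value for value, ch in enumerate(ALPHABET)}
BLOCK_CHARS = 43


def block_to_int(block: str) -> int:
    """Read a block as a big-endian base62 number (first character most significant)."""
    value = 0
    for ch in block:
        value = value * 62 + DIGIT[ch]
    return value


def decode_block(block: str) -> bytes:
    """Turn one 43-character block into its 31 payload bytes.

    The block is treated as a 32-byte big-endian number whose top byte is expected
    to be zero (the fixed byte inferred above); that byte is checked and dropped.
    """
    if len(block) != BLOCK_CHARS:
        raise ValueError(f"expected {BLOCK_CHARS} characters, got {len(block)}")
    raw = block_to_int(block).to_bytes(32, "big")  # OverflowError if >= 2**256
    if raw[0] != 0:
        raise ValueError("top byte of block is not zero")
    return raw[1:]


def encode_block(payload: bytes) -> str:
    """Inverse of decode_block for 31-byte payloads (handy for round-trip checks)."""
    if len(payload) != 31:
        raise ValueError("expected 31 bytes")
    value = int.from_bytes(payload, "big")
    digits = []
    for _ in range(BLOCK_CHARS):
        value, d = divmod(value, 62)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))


first = "AAAAAAAAAAA4NcPhvjVqKmBOlrbGYFWFvtYc9FeFPlX"
print([DIGIT[ch] for ch in first[-10:]])  # [55, 24, 38, 35, 5, 40, 5, 15, 47, 23]
print(decode_block(first).lstrip(b"\x00"))
```

The last line prints the decoded first block, with its leading zero bytes stripped, for comparison with the hexadecimal value above.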
Decoding 31 bytes from every 43-character block in this way (and omitting the "------ BEGIN FILE ------" and "------ END FILE ------" strings) yields:
So, it's not as nefarious as it seems. At least at first sight. I'm still exploring the phenomenon, though :-)