This is a follow-up to Listening to Skype Voicemail .dat files, and a chance to get what I’ve done written down.
I got back on the “Let’s get these 70+ Skype voicemails listened to” bandwagon…
Filename relevance
I previously pondered the relevance of the filename that each voicemail was saved under. Aside from the Unix timestamp embedded as one of the 4 numerical sets, I couldn’t determine what the other 3 actually mean. I went so far as to load my old database into SkypeLogView to look at the Voicemail records and see whether any of that data could be correlated. Aside from the database “Action Time” matching the Unix timestamp, the short answer for additional correlation is a resounding no.
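To make the "4 numerical sets" concrete, here is a minimal sketch that pulls the digit runs out of a name shaped like the one used for the decoder run later in this post (V1323104856M1M1000000417D396988293). The function name numeric_fields is made up for illustration. The first group, 1323104856, converts to 5 December 2011 (UTC) when read as Unix time, so it looks like the timestamp, but treating it that way is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Collect up to `max` runs of decimal digits from `name` into `out`.
 * Letters between the runs (V, M, D in the sample name) are skipped.
 * Returns how many numbers were found.
 */
size_t numeric_fields(const char *name, unsigned long long *out, size_t max)
{
    assert(name != NULL && out != NULL);
    size_t count = 0;
    const char *p = name;

    while (*p != '\0' && count < max) {
        if (isdigit((unsigned char)*p)) {
            char *end;
            out[count++] = strtoull(p, &end, 10);
            p = end;            /* continue right after the digit run */
        } else {
            p++;                /* skip the separator letter */
        }
    }
    return count;
}
```

Calling it on the sample name with room for four values yields 1323104856, 1, 1000000417 and 396988293, which can then be lined up against whatever the database shows.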
SILK -> Opus
I was re-creating my music collection from a long, long time ago, making sure I got FLACs instead of MP3s. My curiosity led me to read up on FLAC and how awesome it is, and along the way I came across Opus.
Looking into Opus and its RFC 6716 spec, I saw that it was built from two earlier codecs: SILK and CELT.
SILK was originally on the standards track in 2010 as draft-vos-silk-02 by Skype Technologies S.A., complete with encoder/decoder source. This piqued my curiosity, but the voicemail files look nothing like a SILK stream, so I decided to dig deeper into the files…
00 20 80 12
When I looked at the raw bytes of the voicemail files, there wasn’t really a “file signature” or magic header that identified anything specific. A large majority of the files started either with four 00 bytes or with the 4-byte sequence 00 20 80 12. The files that started with four 00 bytes also had 00 20 80 12 immediately afterwards. After writing out a couple of files and making a spreadsheet to help visualize the data, and later extending the amount of data I was looking at, I saw that 00 20 80 12 marked the start of a packet of data.
In fact, the voicemail files consisted of a long run of packets, each with 20 bytes of data tucked inside:
00 20 80 12 | header (magic number)
xx xx xx xx | counter
xx xx xx xx | counter
00 00 | padding
xx xx xx xx | data
xx xx xx xx | data
xx xx xx xx | data
xx xx xx xx | data
xx xx xx xx | data
00 00 | padding (optional)
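Adding up the fields gives 4 + 4 + 4 + 2 + 20 = 34 bytes per packet, or 36 with the optional padding. One way to check a layout like this against a real file is to measure the distance between consecutive markers. The sketch below does that; the function names are made up for illustration, and it assumes the marker bytes never occur by chance inside the counters or data, which is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Marker bytes observed at the start of every packet. */
static const uint8_t PACKET_MAGIC[4] = { 0x00, 0x20, 0x80, 0x12 };

/*
 * Find the next occurrence of the marker at or after `from`.
 * Returns its offset, or `len` when no further marker exists.
 */
size_t next_magic(const uint8_t *buf, size_t len, size_t from)
{
    assert(buf != NULL);
    for (size_t i = from; i + sizeof PACKET_MAGIC <= len; i++) {
        if (memcmp(buf + i, PACKET_MAGIC, sizeof PACKET_MAGIC) == 0) {
            return i;
        }
    }
    return len;
}

/* Count how many pairs of consecutive markers are exactly `gap` bytes apart. */
size_t count_gaps(const uint8_t *buf, size_t len, size_t gap)
{
    size_t hits = 0;
    size_t prev = next_magic(buf, len, 0);

    while (prev < len) {
        size_t cur = next_magic(buf, len, prev + 1);
        if (cur < len && cur - prev == gap) {
            hits++;
        }
        prev = cur;
    }
    return hits;
}
```

If the layout holds, count_gaps with 34 and 36 should account for nearly every packet in a file, and any other spacing points at either a stray match or a packet type not covered here.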
Finding this packet structure was a real breakthrough for me. Writing a program to convert the voicemails into “streams” of data stripped out the protocol junk and gave me data to plug into a TON of programs (again). Unfortunately, all those programs yielded static and more static.
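The conversion to a “stream” can be sketched roughly like this. It is not the original program, just an illustration of the idea under the packet layout above: a 14-byte header (marker, two counters, two padding bytes) followed by 20 bytes of data. The name extract_stream and the constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    MAGIC_LEN   = 4,   /* 00 20 80 12 */
    HEADER_LEN  = 14,  /* marker + two 4-byte counters + 2 padding bytes */
    PAYLOAD_LEN = 20   /* five 4-byte data words */
};

static const uint8_t MAGIC[MAGIC_LEN] = { 0x00, 0x20, 0x80, 0x12 };

/*
 * Copy the 20-byte payload of every packet in `in` to `out`.
 * The scan resynchronises on the marker, so leading 00 bytes and the
 * optional trailing padding are skipped without special cases.
 * A truncated final packet is ignored.
 * Returns the number of bytes written to `out`.
 */
size_t extract_stream(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    size_t pos = 0, written = 0;

    while (pos + HEADER_LEN + PAYLOAD_LEN <= in_len) {
        if (memcmp(in + pos, MAGIC, MAGIC_LEN) != 0) {
            pos++;                      /* not at a packet start yet */
            continue;
        }
        if (written + PAYLOAD_LEN > out_cap) {
            break;                      /* output buffer is full */
        }
        memcpy(out + written, in + pos + HEADER_LEN, PAYLOAD_LEN);
        written += PAYLOAD_LEN;
        pos += HEADER_LEN + PAYLOAD_LEN;
    }
    return written;
}
```

Read a whole .dat file into memory, pass it in with an output buffer of at least the same size, and write the returned number of bytes to disk to get one stream file per voicemail.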
Silk Encoder/Decoder
Extracting all 140 files from draft-vos-silk-02 was a minor editing/programmatic effort. After I fixed a spaces/tabs issue in the Makefile, the build produced an encoder and a decoder for me.
The encoder needs raw PCM data as input, so I grabbed a generic wav file from the interwebs and plugged it into the encoder to get a bit file. Running the decoder on that bit file gave me back raw PCM data that I could import into Audacity, which verified that this SILK build was, in fact, working fine.
Plugging in my previously-created voicemail/stream files led to segfaults.
Looking at both the decoder source code and the bit file, I saw that the decoder reads “packets” of data and then hands them to the decoding routine. The loop that loads the bit file was especially telling:
for( i = 0; i < MAX_LBRR_DELAY; i++ ) {
    /* Read payload size */
    counter = fread( &nBytes, sizeof( SKP_int16 ), 1, bitInFile );
    /* Read payload */
    counter = fread( payloadEnd, sizeof( SKP_uint8 ), nBytes, bitInFile );
    if( (SKP_int16)counter < nBytes ) {
        break;
    }
    nBytesPerPacket[ i ] = nBytes;
    payloadEnd += nBytes;
}
In other words: read the payload size (16 bits), then read that many bytes of payload.
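A writer for that record framing might look like the sketch below. It assumes the size-then-payload records are all the decoder expects from a bit file, and it uses a fixed 20-byte record to match the packet payload found earlier. The function name write_bit_records is made up. The size goes out in host byte order because the decoder loop reads it with a plain fread() into an SKP_int16, so the file is only portable between machines with the same endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define RECORD_PAYLOAD 20  /* fixed payload size assumed for every record */

/*
 * Write `stream` to `fp` as a sequence of [int16 size][payload] records.
 * A trailing chunk shorter than RECORD_PAYLOAD bytes is dropped.
 * Returns the number of complete records written.
 */
size_t write_bit_records(FILE *fp, const uint8_t *stream, size_t len)
{
    assert(fp != NULL && stream != NULL);
    const int16_t n_bytes = RECORD_PAYLOAD;
    size_t records = 0;

    for (size_t off = 0; off + RECORD_PAYLOAD <= len; off += RECORD_PAYLOAD) {
        if (fwrite(&n_bytes, sizeof n_bytes, 1, fp) != 1 ||
            fwrite(stream + off, 1, RECORD_PAYLOAD, fp) != RECORD_PAYLOAD) {
            break;  /* stop at the first write error */
        }
        records++;
    }
    return records;
}
```

Each record takes 22 bytes on disk, so a stream of N payload bytes turns into a bit file of (N / 20) * 22 bytes, which is an easy sanity check before handing the file to the decoder.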
I modified my stream program to create the necessary bit files, with each record made up of a fixed payload size of 20 bytes followed by the voicemail data. I plugged all this into the decoder:
$ ./decoder V1323104856M1M1000000417D396988293.bit out.pcm
******************* Silk Decoder v 1.0.2 ****************
******************* Compiled for 64 bit cpu *********
Input: V1323104856M1M1000000417D396988293.bit
Output: out.pcm
Frames decoded: 1SKP_Silk_SDK_Decode returned -10
Frames decoded: 3SKP_Silk_SDK_Decode returned -10
Frames decoded: 4SKP_Silk_SDK_Decode returned -12
Frames decoded: 5SKP_Silk_SDK_Decode returned -12
Frames decoded: 6SKP_Silk_SDK_Decode returned -12
Frames decoded: 9SKP_Silk_SDK_Decode returned -12
Frames decoded: 10SKP_Silk_SDK_Decode returned -12
Frames decoded: 11SKP_Silk_SDK_Decode returned -12
Frames decoded: 12SKP_Silk_SDK_Decode returned -12
Frames decoded: 13SKP_Silk_SDK_Decode returned -12
Frames decoded: 14SKP_Silk_SDK_Decode returned -10
Frames decoded: 15SKP_Silk_SDK_Decode returned -10
Frames decoded: 16Segmentation fault (core dumped)
Blasted. I didn’t go so far as to look up what the return codes meant, or why the segfault happened on the 16th frame. I did, however, get an output file that I was able to load into Audacity, and instead of hissing I got silence.
Where to go from here…
Knowing where to find the data was a big milestone.
Knowing that it’s not really raw SILK that’s being stored is a good start, but it doesn’t rule SILK out completely.
Using Opus and writing some generic Ogg-containerization routine to shove these streams into could be a next step.
Running Skype under a disassembler to step through and see exactly what the program does would be a time-consuming process…
Maybe while this sits for another 12+ months another idea will come along, or some realization will lodge itself in my gourd and lead down a better path.
Updating code/scripts/info at https://github.com/mjheick/skype7server