Continued at Still trying to listen to Skype Voicemails…
Like thousands of other users, I religiously used Skype to communicate with many friends and coworkers back in the early 2010s. It was a great platform, with the ability to send messages fluently from computer to phone and vice versa, as well as make long, drawn-out video calls. You could purchase a telephone number from anywhere in the world to have a presence in that country (as I did), and with it you gained voicemail. It did everything perfectly except save voicemails in a reusable format.
I’m not the only person who has a need/want to listen to these types of audio. There are forums full of people with their own needs, such as hearing their fathers’ voices or those of family members who have passed. The common solutions proposed are “download VLC”, “use Microsoft Word and run a repair”, or use a “DAT player”. All of them are non-functional or stupid solutions. There are a ton of common use cases for these old files, and the technical solutions are few and far between.
Skype voicemails, once listened to, were downloaded from the Skype servers and stored in the user’s Skype profile as a dat file. Unfortunately, “dat” is a generic file extension, and no player can immediately open and play those files.
Time to dig in. Challenge Accepted!
Filenames
Personally I have 72 voicemail files, all with the same filename format:
V1286618205M2M72423D3040734713.dat
V1323104856M1M1000000417D396988293.dat
V1428288516M1M1000001198D3920752677.dat
- Starting with “V”, of course, we can infer this is a “Voicemail”.
- The next 10 numbers are a unix timestamp of when this voicemail was recorded. 1286618205 would be Sat Oct 9 05:56:45 EDT 2010, 1323104856 would be Mon Dec 5 12:07:36 EST 2011, and 1428288516 would be Sun Apr 5 22:48:36 EDT 2015.
- M <number> M is not quite obvious yet. <number> could be 1, 2, or 3
- A series of numbers, unknown
- D, most likely a separator
- A series of numbers, unknown
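The breakdown above can be sketched as a small parser. This is only an illustration of the inferred layout: the group names `m_value`, `field_a`, and `field_b` are placeholders I made up for the parts whose meaning is still unknown, and the pattern is an assumption based on the three sample filenames.

```python
import re
from datetime import datetime, timezone

# Layout inferred from the sample filenames; the field names are hypothetical.
VOICEMAIL_RE = re.compile(
    r"^V(?P<timestamp>\d{10})"   # unix timestamp of the recording
    r"M(?P<m_value>\d)M"         # unknown, observed as 1, 2, or 3
    r"(?P<field_a>\d+)"          # unknown series of numbers
    r"D(?P<field_b>\d+)"         # "D" separator, then another unknown series
    r"\.dat$"
)


def parse_voicemail_name(name):
    """Split a Skype voicemail filename into its constituent parts."""
    match = VOICEMAIL_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized voicemail filename: {name}")
    timestamp = int(match["timestamp"])
    return {
        "timestamp": timestamp,
        # Reported in UTC; convert to a local zone for EDT/EST wall-clock times.
        "recorded_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "m_value": int(match["m_value"]),
        "field_a": int(match["field_a"]),
        "field_b": int(match["field_b"]),
    }


if __name__ == "__main__":
    for sample in (
        "V1286618205M2M72423D3040734713.dat",
        "V1323104856M1M1000000417D396988293.dat",
        "V1428288516M1M1000001198D3920752677.dat",
    ):
        print(parse_voicemail_name(sample))
```

Running this over a whole folder of voicemails makes it easier to sort them by date and to look for patterns in the unknown fields.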
One thing we can derive from the filenames is that they contain nothing that would be considered cryptographic, as in no information from the filename is required to listen to the audio files. This is inferred from an article on sqliteforensictoolkit.com, where the method to listen to old Skype voicemails was basically to “have” a voicemail listed in a conversation, record “when” it was, then replace that voicemail’s file with a different voicemail file to listen to the different voicemail.
As an aside, we can use some of the data that’s present in sqliteforensictoolkit.com’s article to attempt to derive some information. The voicemail filename there is V1331159474M1M1000000486D2568798973.dat, which has a unix timestamp of Wed Mar 7 17:31:14 EST 2012, a voicemail audio length of 0:12, and a reported file size of 20k. Having a file size and the audio length can help determine whether the data is compressed or not, and possibly derive a sampling rate, channel information, and a possible sampling bit depth.
Personally I have two voicemails with timestamps close to the one in the filename from sqliteforensictoolkit.com. I added spaces around the constituent parts to see if there was any additional information that could be derived:
V 1330891396 M1M 1000000459 D 1701397653 .dat
V 1331159474 M1M 1000000486 D 2568798973 .dat
V 1331760616 M1M 1000000462 D 237787267 .dat
Containerizing the dat with Audacity
If I were storing data that only my application could use, I would keep things simple: every file would use the same audio codec and have the same format. With this thought active in my mind, I figured I could use Audacity to import the dat files as raw data and, by trial and error across the import settings, figure out whether some combination sounded familiar.
I’m going to assume some things about these voicemail files to maybe provide a better shot at this:
- Voicemails would be 1-channel/monaural audio
- They could be 8-bit audio, but most likely 16-bit.
- If the data is uncompressed, then a 20k file that lasts 12 seconds would have a sampling rate of about 1.7 kHz at 8-bit, or about 850 Hz at 16-bit, per the previous information derived. Both are far too low for intelligible speech, so the data is most likely compressed.
I imported one of these audio files at the lowest sampling rate available (8000 Hz). I looked through my list and located one that was about 20k, so that my end result should be a roughly 12-second voicemail. I tried all the listed encodings, and only one (GSM 6.10) gave me a 12-second audio file, but the output was garbled. All the other encodings produced just noise.
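The arithmetic behind that last estimate can be sketched as follows. The reported “20k” is treated here as 20 KiB (20,480 bytes), which is my assumption; the function names are only illustrative.

```python
def implied_sample_rate(file_bytes, seconds, bits_per_sample, channels=1):
    """Sampling rate in Hz if the file were raw, uncompressed PCM."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    return file_bytes / seconds / bytes_per_frame


def bitrate_kbps(file_bytes, seconds):
    """Average bitrate of the file in kilobits per second."""
    return file_bytes * 8 / seconds / 1000


FILE_BYTES = 20 * 1024   # assumption: "20k" means 20 KiB
DURATION_S = 12          # reported length of 0:12

if __name__ == "__main__":
    for bits in (8, 16):
        rate = implied_sample_rate(FILE_BYTES, DURATION_S, bits)
        print(f"{bits}-bit mono PCM would imply about {rate:.0f} Hz")
    print(f"average bitrate: {bitrate_kbps(FILE_BYTES, DURATION_S):.2f} kbit/s")
```

For comparison, uncompressed 8 kHz, 16-bit mono audio runs at 128 kbit/s, so a file this small for 12 seconds points toward a speech codec rather than raw PCM.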
Containerizing the dat with ffmpeg
Still going on my “this thing needs a container” hunch, I enlisted my local copy of ffmpeg (currently 5.1.1). I wrote a quick bash/perl wrapper to rename a dat file with an extension that ffmpeg should recognize, use one of its demuxers, and try to output a wav file that I could use. If ffmpeg happens to detect the proper format, it will be able to convert the file to some PCM format to listen to.
#!/bin/bash
# Try every ffmpeg demuxer name as a file extension and attempt a WAV conversion.
F="V1317134712M3M3452327D2199135011.dat"
# The perl filter keeps lines flagged "D" (demuxing supported) and skips the
# legend lines (starting with "=") and entries that list several names.
for EXT in $(ffmpeg -hide_banner -formats | perl -e '
my ($e);
while(<>) {
  chomp;
  if (substr($_,1,1) eq "D") {
    $e=substr($_,4,15);
    if ((index($e,",")==-1)&&(substr($e,0,1) ne "="))
    {$e=~s/\s+$//;print $e."\n";}
  }
}'); do
  echo "extension: $EXT"
  cp "$F" "$F.$EXT"
  # -y overwrites a wav left over from an earlier run instead of prompting
  ffmpeg -hide_banner -y -i "$F.$EXT" "$F.$EXT.wav"
  echo
  rm "$F.$EXT"
done
Out of 343 possible demuxers I got 25 possible hits. Out of those 25 possible hits, 13 of them had actual data to listen to. Out of those 13 none of them were listenable as an actual voicemail.
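Sorting the “hits” from the empty results can be partly automated. The sketch below is not what I ran; it is a minimal helper, assuming the converted files are ordinary PCM wav files that Python’s standard `wave` module can read, and `summarize_wav` is a name made up for this example. It only reports whether a file contains any non-silent samples and cannot tell speech from noise, so listening is still required.

```python
import sys
import wave
from array import array
from pathlib import Path


def summarize_wav(path):
    """Return frame count, duration, and peak amplitude for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        width = wav.getsampwidth()
        raw = wav.readframes(frames)

    peak = 0
    if width == 2 and raw:
        # 16-bit WAV samples are little-endian signed integers.
        samples = array("h")
        samples.frombytes(raw)
        if sys.byteorder == "big":
            samples.byteswap()
        peak = max(abs(s) for s in samples)

    return {
        "frames": frames,
        "duration_s": frames / rate if rate else 0.0,
        "peak": peak,  # stays 0 for silence and for non-16-bit files
    }


if __name__ == "__main__":
    # Matches the "<file>.dat.<ext>.wav" names produced by the wrapper script.
    for wav_path in sorted(Path(".").glob("*.dat.*.wav")):
        try:
            print(wav_path.name, summarize_wav(wav_path))
        except (wave.Error, EOFError) as exc:
            print(wav_path.name, "unreadable:", exc)
```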
I also decided to attempt to use the audio decoders instead using the same brute-force logic:
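#!/bin/bash
# Force every ffmpeg audio decoder onto the file and attempt a WAV conversion.
F="V1337183958M1M1000000534D3692593007.dat"
# The perl filter keeps lines flagged "A" (audio decoder) and skips the
# legend lines (starting with "=").
for DECODER in $(ffmpeg -hide_banner -decoders | perl -e '
my ($e);
while(<>) {
  chomp;
  if (substr($_,1,1) eq "A") {
    $e=substr($_,8,15);
    if (substr($e,0,1) ne "=")
    {$e=~s/\s+$//;print $e."\n";}
  }
}'); do
  cp "$F" "$F.$DECODER"
  echo "Decoder: $DECODER"
  ffmpeg -c:a "$DECODER" -i "$F.$DECODER" -y "$F.$DECODER.wav"
  echo
  rm "$F.$DECODER"
done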
Out of 201 audio decoders I got 5 possible hits and none of them were the expected result. However, g723_1 and g729 seemed to get me something that sounded remotely close to what would be the codec, but there was nothing listenable in that.
Specific WebRTC Codecs
There was a beautiful analysis done of Skype 1.4 that broke down how the platform works, complete with 3 audio codecs that were “seen” (iLBC, iSAC, iPCM). Two of the three codecs were created by Global IP Sound (later Global IP Solutions), which was acquired by Google, and were provided for WebRTC in 2011.
Looking up iLBC I found a LinkedIn writeup by Costas Katsavounidis that makes the process to “create” an iLBC file from data fairly easy (thanks to the RFC). All he did was place the following 9 bytes as a header to the file:
The file begins with a header that includes only a magic number to
identify that it is an iLBC file.
The magic number for iLBC file MUST correspond to the ASCII character
string:
- for 30 ms frame size mode:"#!iLBC30\n", or "0x23 0x21 0x69
0x4C 0x42 0x43 0x33 0x30 0x0A" in hexadecimal form,
- for 20 ms frame size mode:"#!iLBC20\n", or "0x23 0x21 0x69
0x4C 0x42 0x43 0x32 0x30 0x0A" in hexadecimal form.
After the header, follow the speech frames in consecutive order
Writing a small PHP script to do this wasn’t that difficult, either:
<?php
/* https://www.linkedin.com/pulse/sayhi-saved-audio-messages-ilbc-files-costas-katsavounidis/ */
$dat_file = 'V1317134712M3M3452327D2199135011';
$dat_file_contents=file_get_contents($dat_file . '.dat');
/* write header */
$fp = fopen($dat_file . '_ilbc20.ilbc', 'wb');
fwrite($fp, chr(0x23) . chr(0x21) . chr(0x69) . chr(0x4c) . chr(0x42) . chr(0x43) . chr(0x32) . chr(0x30) . chr(0x0a));
fwrite($fp, $dat_file_contents);
fclose($fp);
$fp = fopen($dat_file . '_ilbc30.ilbc', 'wb');
fwrite($fp, chr(0x23) . chr(0x21) . chr(0x69) . chr(0x4c) . chr(0x42) . chr(0x43) . chr(0x33) . chr(0x30) . chr(0x0a));
fwrite($fp, $dat_file_contents);
fclose($fp);
Still, neither the 20 ms nor the 30 ms iLBC file brought back a listenable voicemail.
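One cheap sanity check before trying a decoder is whether the file size even fits iLBC framing. RFC 3951 defines fixed frame sizes of 38 bytes for the 20 ms mode and 50 bytes for the 30 ms mode, so a headerless stream of raw frames should be an exact multiple of one of them. The sketch below assumes the dat file is nothing but raw frames, which is unverified; leftover bytes could just as well mean a Skype-specific header or a different codec. `ilbc_fit` is a name made up for this example.

```python
import os

# Encoded frame sizes from RFC 3951: frame length in ms -> bytes per frame.
ILBC_FRAME_BYTES = {20: 38, 30: 50}


def ilbc_fit(payload_bytes):
    """Report how a payload of this size would split into iLBC frames."""
    results = {}
    for frame_ms, frame_bytes in ILBC_FRAME_BYTES.items():
        frames, leftover = divmod(payload_bytes, frame_bytes)
        results[frame_ms] = {
            "frames": frames,
            "leftover_bytes": leftover,           # 0 means a clean fit
            "duration_s": frames * frame_ms / 1000,
        }
    return results


if __name__ == "__main__":
    path = "V1317134712M3M3452327D2199135011.dat"
    if os.path.exists(path):
        for mode, fit in ilbc_fit(os.path.getsize(path)).items():
            print(f"{mode} ms mode: {fit}")
```

A clean fit is necessary but not sufficient. Comparing the implied duration against the length Skype displayed for the same voicemail gives a second check.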
Other Theories
Outside of brute-forcing possible codecs, there are a couple of points to make that would be investigative pathways:
- I’ve seen posts purporting that the data is not encrypted once saved as a file. With how the waveforms appear as randomly garbled data it’s possible they are encrypted.
- It’s been said that voicemails could only be played on the local device (Arissa O), which would imply some device-level encryption.
- It’s been stated many times that the application is the only way to listen to voicemails. With the application no longer functioning, it’s possible that the authentication protocol could be deconstructed, and the best way forward may be something like a chatbot that presents an already-listened voicemail to the client.
- We could attempt to reverse-engineer a downloadable Skype 6/Skype 7 and see if there are programmable APIs/DLLs that we can borrow/steal to pipe voicemail audio through.
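The “randomly garbled” impression from the first point can at least be quantified by measuring byte entropy. The sketch below computes Shannon entropy in bits per byte; it is only a rough indicator, because well-compressed audio and encrypted data both sit close to the 8.0 maximum, so a high value cannot confirm encryption. A noticeably lower value, or a repeating structure at a fixed stride, would argue against it. The filename is one of my own samples and the function name is illustrative.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    path = "V1337183958M1M1000000534D3692593007.dat"
    if os.path.exists(path):
        with open(path, "rb") as handle:
            blob = handle.read()
        print(f"{path}: {shannon_entropy(blob):.3f} bits/byte over {len(blob)} bytes")
```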
Unfortunately this challenge will have to wait since, for me, it is still a want more than a need, but there are avenues left to explore.
Additional research will most likely happen at https://github.com/mjheick/skype7server