August 03, 2007
Yesterday I wondered what would happen if I stored the raw data from a pixmap image to a WAV file, converted that file to an MP3, then back to a WAV, then finally reconstituted it as a pixmap.
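For anyone who wants to try the first and last steps, here is a minimal sketch of wrapping raw pixel bytes in a WAV container and unwrapping them again. It uses Python's standard `wave` module, which is not necessarily what was used for this post. The 8-bit mono format and the function names `pixels_to_wav` and `wav_to_pixels` are assumptions made for illustration; only the 16 kHz sample rate comes from the post.

```python
import io
import wave

SAMPLE_RATE = 16000  # the post mentions a 16 kHz sample rate


def pixels_to_wav(pixel_bytes, sample_rate=SAMPLE_RATE):
    """Wrap raw pixel bytes in a WAV container (assumed: 8-bit, mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # assumption: one channel
        w.setsampwidth(1)       # assumption: one byte per sample
        w.setframerate(sample_rate)
        w.writeframes(pixel_bytes)
    return buf.getvalue()


def wav_to_pixels(wav_bytes):
    """Pull the raw sample bytes back out of a WAV container."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())
```

Without any MP3 step in between, the round trip is lossless: `wav_to_pixels(pixels_to_wav(data))` returns `data` unchanged. Image dimensions are not stored in the WAV, so they have to be kept separately to rebuild the pixmap.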
Here's the original photo I used (I can't remember what site I got this from, unfortunately [see update below]):
You can hear what this image sounds like (16 kHz sample rate).
It turns out the answer is that there's no noticeable difference after one conversion cycle.
To see any real difference I had to run multiple generations of MP3 compression, all at a constant 96 kbps.
(To hear the audio artifacting that comes from repeated MP3 compression, check here.)
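The generation loop could be sketched roughly like this. It assumes the `lame` command-line encoder is installed and on the PATH; the post does not say which tools were actually used, and the function names `lame_roundtrip` and `run_generations` are made up for this example.

```python
import os
import shutil
import subprocess
import tempfile


def lame_roundtrip(src_wav, dst_wav, bitrate_kbps=96):
    """Encode src_wav to MP3, then decode it back to dst_wav (assumes the lame CLI)."""
    fd, mp3_path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        # -b sets a constant bitrate in kbps; --decode turns the MP3 back into a WAV
        subprocess.run(["lame", "-b", str(bitrate_kbps), src_wav, mp3_path], check=True)
        subprocess.run(["lame", "--decode", mp3_path, dst_wav], check=True)
    finally:
        os.remove(mp3_path)


def run_generations(src_wav, dst_wav, generations=20, roundtrip=lame_roundtrip):
    """Feed the output of each encode/decode cycle into the next one."""
    workdir = tempfile.mkdtemp()
    try:
        current = src_wav
        for i in range(generations):
            nxt = os.path.join(workdir, f"gen{i + 1}.wav")
            roundtrip(current, nxt)
            current = nxt
        shutil.copyfile(current, dst_wav)
    finally:
        shutil.rmtree(workdir)
```

Something like `run_generations("photo.wav", "photo-gen20.wav")` would then produce the 20th generation. The decoded WAV may not contain exactly the same number of samples as the input, so the pixel data may need trimming or padding before it is turned back into an image.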
Here's the result after 20 generations:
Gavin pointed out that it mostly just lost contrast. If you renormalize the histogram, this is what you get:
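"Renormalizing the histogram" can mean a few different things. One simple interpretation is a linear contrast stretch that maps the darkest value to 0 and the brightest to 255. The sketch below does that over all the raw bytes at once; treating every channel together and the function name `stretch_contrast` are assumptions, not necessarily what was done for the image here.

```python
def stretch_contrast(pixel_bytes):
    """Linearly rescale byte values so they span the full 0..255 range."""
    if not pixel_bytes:
        return b""
    lo, hi = min(pixel_bytes), max(pixel_bytes)
    if hi == lo:
        # A flat image has no contrast to stretch; return it unchanged.
        return bytes(pixel_bytes)
    span = hi - lo
    # Integer arithmetic keeps the result exact and within 0..255.
    return bytes((p - lo) * 255 // span for p in pixel_bytes)
```

A washed-out image whose values only cover 100 to 150, for example, would come out covering the full 0 to 255 range.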
Now I want to see the result of compressing audio with JPEG, but I wouldn't be able to stand the disappointment if I spent a couple hours doing it myself and then discovered that again there was no interesting difference. So I guess that means it's your turn to contribute something for once.
Update: The photo is by my buddy Christopher Trott.
Update: There's more discussion on programming.reddit.com.
Posted by jjwiseman at August 03, 2007 03:55 PM
I think it would be much more interesting if you tried lower bitrates -- and tried other algorithms. 96 kbps sounds bad to the trained ear, but it really preserves quite a lot of the information. I suggest you try a much lower bitrate, like 24 kbps.
Awesome image, that photographer sure knows how to take cool photos!
Good experiment. Would you mind posting the "after" pictures as .PNG?
So how much compression did you get (in the first pass)?
The .raw file would appear to be 1 620 000 bytes and the mp3 file 1 621 476 bytes, so it's not all that surprising that there's little loss of information. Interesting that it's all in the dynamic range though; I guess now we have a nice visual proof that MP3 encoding significantly hurts dynamic range.
Incidentally, non-audio data turned into sound is generally referred to in the "scene" as 'binary noise'.
(Yes, this is a scene.)
Could you please tell us which tools were used and how? I'd like to see what happens at the lowest possible bitrate (8 kbps or so).
Certainly the wav->mp3->wav conversion is simple with the lame encoder, but I don't know the best way to produce the pixmap and add the metadata the tools need to understand it.
If you have some scripts, it would be great if you could share them.
What happens to the picture if you mix a voice over the picture audio?
It would be much more appropriate to compress a greyscale image (one band of data), or to convert two components into a stereo MP3.