Any audiologists in the house?
Hexchild
25 Apr 2006, 11:36 AM
If so, I might have use for your insights.
I'm working on an audio compression algorithm which involves a step where, once every ~1/100th of a second, there are 510 different frequency bands evenly spaced from ~43Hz to ~22007Hz, and a decision is made on which ones to store and which ones to dismiss. This decision directly affects the compressed size of the file, as well as the quality of the uncompressed audio.
I already have a somewhat decent algorithm in place, which dismisses bands where both the absolute amplitude and amplitude change are considered low enough compared to the average amplitude for all frequencies. I've managed to bring the compressed file to about 25% to 50% of the original size (for 16 bit, 44100Hz PCM .wav files) without hearing any major artifacts. This is close to MP3, but not quite as good. I suspect that with the right decision process at this step I might be able to surpass the quality of MP3.
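A minimal sketch of the decision rule described above, for one frame of band amplitudes. The function name and both threshold values are hypothetical (the post gives no actual numbers), so this only illustrates the "low level and low change, relative to the frame average" idea rather than the real implementation.

```python
def select_bands(current, previous, abs_ratio=0.1, delta_ratio=0.05):
    """Return the indices of the bands to keep for one frame.

    current, previous: per-band amplitudes for this frame and the one before.
    abs_ratio, delta_ratio: made-up thresholds, expressed as fractions of the
    frame's mean amplitude.
    """
    mean_amp = sum(abs(a) for a in current) / len(current)
    if mean_amp == 0:
        return []  # silent frame: nothing worth storing

    keep = []
    for i, (now, before) in enumerate(zip(current, previous)):
        low_level = abs(now) < abs_ratio * mean_amp
        low_change = abs(now - before) < delta_ratio * mean_amp
        # A band is dismissed only when BOTH conditions hold.
        if not (low_level and low_change):
            keep.append(i)
    return keep


frame = [1.0, 0.01, 0.02, 2.0]
last_frame = [1.0, 0.01, 0.5, 2.0]
kept = select_bands(frame, last_frame)
```

In this toy frame, band 1 is quiet and steady, so it is dismissed; band 2 is just as quiet but has changed a lot since the previous frame, so it is kept.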
What I hope to achieve with this thread is basically a formula for deciding which bands will contribute to the listening experience, and which ones can be deemed unimportant and thus subject to guesswork rather than storage.
I did find this site (http://arts.ucsc.edu/EMS/Music/tech_background/TE-03/teces_03.html), which has a bit of information on the subject, but I think I would do way better discussing this with knowledgeable people than I would reading age-old documentation.
EDIT: I have corrected the low and high frequencies above. I forgot to account for the fact that I (to put it simply) cut the frequency spectrum in half at one point.
Architectonic
25 Apr 2006, 12:22 PM
I would suggest asking over at the Hydrogen Audio (http://www.hydrogenaudio.org/forums/index.php) Forum.
Compression ratios of around 50% (WavPack claims 30-70%) can be achieved losslessly, that is, without throwing any information away. This is done with predictive algorithms, basically like the zip format, but for music.
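To illustrate what "predictive" means here, a toy sketch: predict each sample as equal to the previous one and store only the prediction error. This is the simplest possible predictor, chosen for illustration; it is not a description of how WavPack or any other specific codec works, and real lossless codecs use more elaborate predictors and then entropy-code the residuals.

```python
def encode(samples):
    """Replace each sample with its difference from the previous sample."""
    residuals = []
    previous = 0
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals


def decode(residuals):
    """Undo encode() exactly by accumulating the differences."""
    samples = []
    previous = 0
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples


pcm = [1000, 1004, 1009, 1011, 1008]
res = encode(pcm)          # mostly small numbers, cheaper to store
restored = decode(res)     # identical to pcm, so nothing was lost
```

After the first value, the residuals are much smaller than the samples themselves, which is what makes them cheaper to store, and decoding restores the input exactly.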
There are already a number of formats that are similar to what you are suggesting - they throw information away, but they don't necessarily use a psychoacoustic model to decide what to discard. Then you have the common lossy codecs like MP3, Ogg Vorbis etc. that use a psychoacoustic model, with noise shaping, to achieve their high compression rates while maintaining (almost always) transparent quality.
I assume you have read this: http://en.wikipedia.org/wiki/Audio_data_compression
last_caress
25 Apr 2006, 01:12 PM
I know that some people believe that super and subsonic frequencies contribute to the listening experience.
Post some examples.
Hexchild
30 Apr 2006, 12:10 AM
I would suggest asking over at the Hydrogen Audio (http://www.hydrogenaudio.org/forums/index.php) Forum.
I'll keep that forum in mind and might visit later. Thanks.
Compression ratios of around 50% (WavPack claims 30-70%) can be achieved losslessly, that is, without throwing any information away. This is done with predictive algorithms, basically like the zip format, but for music.
Yeah, around 40% - 60% was about as good as it seemed to get before I started experimenting with lossy compression.
There are already a number of formats that are similar to what you are suggesting - they throw information away, but they don't necessarily use a psychoacoustic model to decide what to discard. Then you have the common lossy codecs like MP3, Ogg Vorbis etc. that use a psychoacoustic model, with noise shaping, to achieve their high compression rates while maintaining (almost always) transparent quality.
Yeah, I'm aware of those, but the fun lies in developing my own algorithm from scratch, so I avoid looking at existing ones. On the other hand, I have little interest in putting together a large enough study group and gathering data on psychoacoustics; hence this thread.
I assume you have read this: http://en.wikipedia.org/wiki/Audio_data_compression
Actually I hadn't. Silly me, I keep forgetting about Wikipedia. Thanks again.
Hexchild
30 Apr 2006, 12:30 AM
I know that some people believe that super and subsonic frequencies contribute to the listening experience.
I think they probably do, but in a very subtle way, yielding a general impression rather than discernible detail. Also, with low enough frequencies you start to feel the sound rather than hear it, which, while not strictly part of the listening itself, probably makes the audible parts of a sound seem more concrete. I also suspect that relative phase is more important than it's generally made out to be.
A friend of mine claims to be able to hear oscillations from 15Hz to about 30KHz, and he prefers his PCM data at 24-bit, 44100Hz quality. He also claims to hear the difference between any MP3 and the original WAV file. Normally I can't tell the difference at all.
Post some examples.
I'm in the middle of restructuring the code, but once I'm done with that (as well as the plethora of bug fixes that are likely to follow) I might get a few files uploaded. If I do so I'll post some links here.
Snowflake
30 Apr 2006, 12:59 AM
I know that some people believe that super and subsonic frequencies contribute to the listening experience.
Post some examples.
That's ridiculous. Subsonic frequencies are felt. Supersonic frequencies can't be heard.
Digital equipment is capable of subsonics, but not supersonics. Analog equipment is capable of neither.
I bet 99% of people wouldn't notice in a double blind ABX test.
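For anyone wanting to put a number on an ABX result, here is a small sketch of the usual calculation: the probability of scoring at least that well by pure guessing, with a 50% chance per trial. The function name is just for illustration.

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of getting at least `correct` answers right out of `trials`
    when every answer is a coin flip."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Example: 12 correct identifications out of 16 trials.
p = abx_p_value(12, 16)
```

A small value means the score would be unlikely from guessing alone; a value near 1 means the listener did no better than chance.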
Snowflake
30 Apr 2006, 01:02 AM
He also claims to hear the difference between any MP3 and the original WAV file. Normally I can't tell the difference at all.
It's possible to hear a difference if you have extended high-frequency hearing, as most MP3s are passed through a high-cut filter at around 19 kHz, but only if your sound system can actually reproduce those upper frequencies.
last_caress
30 Apr 2006, 01:24 AM
That's ridiculous. Subsonic frequencies are felt. Supersonic frequencies can't be heard.
Digital equipment is capable of subsonics, but not supersonics. Analog equipment is capable of neither.
:wtf:
I bet 99% of people wouldn't notice in a double blind ABX test.
I can't find the links right now, but I think it's possible that infra and ultrasonic waves interact with one another or a perceptible frequency to create audible harmonics.
That in addition to the theory that sub and supersonic sound can affect mood.
Architectonic
30 Apr 2006, 09:35 AM
A friend of mine claims to be able to hear oscillations from 15Hz to about 30KHz, and he prefers his PCM data at 24-bit, 44100Hz quality. He also claims to hear the difference between any MP3 and the original WAV file. Normally I can't tell the difference at all.
Oooh, an 'audiophile' eh? Be sure to shatter his illusions by asking him to prove it with ABX testing. (as suggested by Snowflake..)
That in addition to the theory that sub and supersonic sound can affect mood.
This is a possibility (I'm mostly thinking of ultrasonic sound (above 20khz)) based on some studies, but there isn't much consensus yet.
However, we are talking about music that was recorded on (both analog and digital) equipment that was not designed to record sub/supersonic sound - so most of the sound above 20khz in our current recordings is just noise and hence should be discarded.
In regards to subsonic sound, I am quite interested in the lower octaves, but I haven't really found any recordings that have significant content below ~20hz. But the bandwidth required for 20hz recordings is so small that it's worth keeping anyway (it is usually kept in mp3 and most other compressed audio formats..). Any filters used on the lowest octave are there to filter out dc offset.
Secondly, when live instruments are used, if the 'infra and ultrasonic waves' created any audible (below 20khz) harmonics, these would already have been recorded and thus don't need to be accounted for.
Architectonic
30 Apr 2006, 09:45 AM
Yeah, I'm aware of those, but the fun lies in developing my own algorithm from scratch, so I avoid looking at existing ones. On the other hand, I have little interest in putting together a large enough study group and gathering data on psychoacoustics; hence this thread.
Well, as long as you aren't trying to reinvent the wheel without studying the usefulness of the current wheel. (I hope you know what I mean).
Do check out the Hydrogen Audio forum - there are some extremely knowledgeable people there, including the developers of a variety of different compression formats.
Snowflake
30 Apr 2006, 02:17 PM
But the bandwidth required for 20hz recordings is so small that it's worth keeping anyway (it is usually kept in mp3 and most other compressed audio formats..). Any filters used on the lowest octave are there to filter out dc offset.
No it's not. Most sound systems aren't capable of much below 30hz. And to feel 10hz, you'd need a serious system specifically designed for that task. In most recordings, anything below 30hz is usually cut off so that this unnecessary low-end content doesn't ruin sound equipment or make the mastering process more difficult than it needs to be.
Anyway, we're seriously off topic guys. This is about audio compression. Supersonics and subsonics are usually not considered as they generally don't fit into the audio range most people can hear even if their systems were capable of reproducing these ranges.
Architectonic
30 Apr 2006, 02:44 PM
No it's not. Most sound systems aren't capable of much below 30hz. And to feel 10hz, you'd need a serious system specifically designed for that task. In most recordings, anything below 30hz is usually cut off so that this unnecessary low-end content doesn't ruin sound equipment or make the mastering process more difficult than it needs to be.
You are correct about the recording/mastering process. Anything below 30hz is usually cut off.
But when encoding into a digital format, it's unnecessary to filter this, as it should already have been done during the recording/mastering process, and the aim of compression is usually to preserve the original signal as much as possible.
Hexchild
5 May 2006, 01:30 AM
Right. Your disagreement reminds me that the frequency spectrum available is limited to about 20KHz (I fold the frequency spectrum back into itself halfway to the sampling frequency due to aliasing (http://en.wikipedia.org/wiki/Aliasing)). I've corrected this in the OP.
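A small sketch of the arithmetic behind that limit. The FFT size of 1024 is an assumption: it is not stated anywhere in the thread, but it is consistent with the ~43Hz band spacing and ~22007Hz top band quoted in the first post. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44100
FFT_SIZE = 1024  # assumption: inferred from the ~43Hz spacing, not stated in the thread


def bin_frequency(k, sample_rate_hz=SAMPLE_RATE_HZ, fft_size=FFT_SIZE):
    """Centre frequency of transform bin k."""
    return k * sample_rate_hz / fft_size


def folded_frequency(freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Frequency at which a tone shows up after sampling.

    Anything above half the sampling rate is mirrored back below it.
    """
    nyquist = sample_rate_hz / 2
    f = freq_hz % sample_rate_hz
    return sample_rate_hz - f if f > nyquist else f


lowest_band = bin_frequency(1)      # about 43.07 Hz
highest_band = bin_frequency(511)   # about 22006.9 Hz
alias = folded_frequency(30000)     # a 30 kHz tone lands at 14100 Hz
```

Under these assumptions, a 30 kHz tone sampled at 44100Hz does not survive as 30 kHz; it folds back to 14100 Hz unless it is filtered out before sampling.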
Anyhow, this thread isn't mainly about audio compression. It's about psychoacoustics, or specifically how much, and what parts, of a sound stream you can mask/filter out before the listener starts noticing.
Thanks for the input so far. Since there is at least a vague interest in this, I'll try to remember to keep you posted on my progress.
Snowflake
5 May 2006, 02:00 AM
Please do, it is useful information.