Sorry man, I just don’t think this is true, at all.
If you are hearing YOUR OWN voice, then you may (and I stress the may) be able to notice the delay. But for anything else, there’s just no way you notice it, because you run into this constantly throughout normal life. You constantly talk to people who are 10 feet away from you. At no point do you EVER notice the ~10ms it takes for the sound to travel those 10 feet. This just isn’t a thing that happens, to anyone, ever.
No dude, you can’t. Again, this happens all the time. Hell, I routinely talk to people in my office who are 20 feet away. There’s no noticeable delay between when the sound reaches your ears and what you are seeing. And you can clearly see someone’s lips at 20 feet.
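To put actual numbers on it, here’s a quick back-of-the-envelope calculation. The only assumption is the usual textbook figure of ~343 m/s for the speed of sound in room-temperature air:

```python
# Propagation delay for sound traveling a given distance in air.
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at ~20°C
FEET_TO_METERS = 0.3048

def sound_delay_ms(distance_feet: float) -> float:
    """Milliseconds for sound to cover distance_feet of air."""
    meters = distance_feet * FEET_TO_METERS
    return meters / SPEED_OF_SOUND_M_S * 1000.0

for feet in (10, 20):
    print(f"{feet} ft -> {sound_delay_ms(feet):.1f} ms")
# 10 ft -> ~8.9 ms, 20 ft -> ~17.8 ms
```

So a hallway conversation at 20 feet already bakes in nearly 20ms of acoustic delay that nobody ever notices.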
On basically any iOS application, there is far more of a delay between you touching the screen and you getting auditory feedback, even if you are using a wired headset… just because of how the software works.
20ms is an essentially trivial amount of time.
Now, if you do some specific tests to try and establish what the human ear’s effective “framerate” is? The minimum time gap between two tones at which you perceive them as distinct tones instead of just a single one? THEN you might start to be onto something, but even there I’ve seen stuff suggesting that the average human can’t perceive a gap between tones shorter than about 30ms.
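For a sense of scale, here’s what those intervals look like in digital-audio terms. The 48kHz sample rate is just an assumption for illustration; 44.1kHz is equally common and gives similar numbers:

```python
# How many audio samples a given time gap spans at a typical sample rate.
SAMPLE_RATE_HZ = 48_000  # assumed; any common rate works

def gap_in_samples(gap_ms: float) -> int:
    """Number of samples covering gap_ms at SAMPLE_RATE_HZ."""
    return round(SAMPLE_RATE_HZ * gap_ms / 1000.0)

for ms in (20, 30, 45):
    print(f"{ms} ms -> {gap_in_samples(ms)} samples")
# 20 ms -> 960 samples, 30 ms -> 1440 samples, 45 ms -> 2160 samples
```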
Dude, this isn’t even remotely true.
This isn’t like how your brain works with stereoscopic vision. In that case, yes, inconsistent signals will create massively disconcerting effects, because your eyes are specifically built to provide fairly precise distance estimates.
Your auditory system does not function like that. It’s not possible for your auditory system to function like that. Due to the way your ears are positioned, and the way sound works, including differences in volume, you are not able to do a stereoscopic range test the way bats do. Humans just don’t have that ability, or if they do, I have never seen any scientific literature about it, ever.
Your ability to judge the distance of sound is far more tied to relative volumes and other acoustic effects that are almost entirely based upon the specific environment the sound is taking place in. The stereoscopic nature of your auditory system provides bearing, not range information.
Yeah, it says the audio can’t lead by more than 15ms, but it can trail by up to 45ms… because trailing by 45ms is largely imperceptible. And we’re talking specifically about lip-syncing audio at this point.
The tolerance is tighter on the leading edge, because you’re far more likely to notice something like someone’s voice while their lips aren’t moving at all… but once they are actually talking? Virtually no one in the normal population can detect a sync that is off by less than 45ms, because you have very little ability to perfectly match the phonemes of speech to the lip movements of an arbitrary person. Hell, the lip syncing in any videogame is going to be orders of magnitude worse in this regard than anything you’d get from a 45ms delay.
That’s why the standard is what it is. And that’s why a delay of less than 45ms largely doesn’t matter much. Hell, it used to be that television audio always had a fixed 2.5-frame delay intentionally added. That’s about 100ms.
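Quick sanity check on that figure. The frame rates below are the standard PAL and NTSC ones; the ~100ms number works out for 25fps:

```python
def frame_delay_ms(frames: float, fps: float) -> float:
    """Convert a delay measured in video frames to milliseconds."""
    return frames / fps * 1000.0

print(f"2.5 frames @ 25fps    = {frame_delay_ms(2.5, 25):.0f} ms")
print(f"2.5 frames @ 29.97fps = {frame_delay_ms(2.5, 29.97):.0f} ms")
# 100 ms at PAL rates, ~83 ms at NTSC rates
```

Either way, a deliberate delay several times larger than 20ms went out over broadcast TV for years without viewers complaining.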
Again, you’re constantly encountering audio delays in your daily life, in everything from using basically any piece of software on your phone, to simply hearing sounds. It doesn’t significantly impact you.
I’m not saying that the human ear can’t technically perceive a very small difference in audio delay… but such a thing is essentially just ignored by your brain, because you constantly encounter other natural effects that result in more significant delays anyway.