Android - what's in your pocket?

A latency of 45 ms is definitely noticeable. The limit of detection for normal people is around 10 ms, and people may report 20 ms as annoying. For musicians, 5 ms is detectable and potentially annoying.

You said you don’t understand why people use wired headphones. I was simply providing a use case. There are people who play rhythm games on their phones (e.g., me).

I’m not sure what you’re trying to argue here. The jack takes up X amount of internal space. That space can be used for more battery (or other components). It’s not like that space somehow magically disappears in your phone. Your phone could have had X amount more battery, over and above what it has now. Whether your phone has more or less battery than an iPhone isn’t the relevant issue.

Internal space and power consumption are two of the most limiting factors in phone design. When a phone manufacturer drops the 3.5mm jack, they’re rarely, if ever, doing it to save the price of the part.

Obviously not every phone manufacturer arrives at the exact same calculus, but the percentage of phones in use with the jack will only decrease; it will never increase again.

Conversely, some phones will always offer it. The audio phone jack is a standard that dates back over 140 years. It’s robust and versatile and isn’t going anywhere.

I’m arguing that removing the jack was not necessary to meet any of the listed design specs for the iPhone X.

I don’t see how your LG phone having a bigger battery than an iPhone establishes that, at all. Are you making the assumption that other than the jack and the batteries, the internals of the two phones (and the total internal volume) are the same? Where does that assumption come from? Now, if you want to point to a teardown of an iPhone that shows ample amounts of internal space left unused, that would be a different matter.

Dude, this just makes no sense at all.

Sound moves at a speed of around 1 foot per millisecond.

If 10ms was noticeable, then that would mean that sound from 10 feet away from you was somehow perceptibly different than sound from 1 foot away from you, in something other than volume. When you are sitting 10 feet away from someone, you aren’t sitting there annoyed at the “delay” created by the soundwaves traveling 10 feet to your ear, right?

At basically any concert you have ever been to, the delay you experience in the sound reaching your ear is greater than that.
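
Just to put rough numbers on the distances involved, here’s a quick back-of-the-envelope sketch (assuming sound travels about 343 m/s, roughly 1 foot per millisecond, in room-temperature air):

```python
# Rough acoustic delay at a few listening distances, assuming sound travels
# about 1,125 ft/s (~343 m/s) in room-temperature air.
SPEED_OF_SOUND_FT_PER_S = 1125.0

for distance_ft in (1, 10, 20, 50):
    delay_ms = distance_ft / SPEED_OF_SOUND_FT_PER_S * 1000
    print(f"{distance_ft:>3} ft  ->  ~{delay_ms:.1f} ms")

# 1 ft -> ~0.9 ms, 10 ft -> ~8.9 ms, 20 ft -> ~17.8 ms, 50 ft -> ~44.4 ms
```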

If you are playing the music yourself, THEN it might matter… although even then I’m extremely skeptical, just because you have folks who actually play music ON devices like an iPad, where you have a greater latency between touching the screen and the sound than that.

They are obviously not the same. But since the iPhone X lacks a feature that the V35 has, I think it’s reasonable to ask what feature the iPhone has added to compensate. The iPhone X is heavier, has a smaller screen, a smaller battery, a smaller screen/body ratio, fewer pixels, etc.

So, what feature did Apple give us that necessitated losing the 3.5 mm jack? As far as I can tell, the answer is nothing. They dropped the 3.5 mm jack because they can’t design a phone as efficiently as LG can. Which is something!

Judging by the iPhone teardowns, the FaceID sensor package (which also powers improved FaceTime functionality) is probably the biggest thing. After that, it’s probably the Taptic Engine and A12 chip, but I haven’t looked at any teardowns of your LG to see how big its haptic and processor components are.

Also, judging from external pictures, your LG is at least longer, and probably also wider, than an iPhone XS. I suspect the available internal volume of the two phones is substantially different.

Again, there’s no need for comparison: the jack takes up a fair amount of space. That’s a simple, straightforward fact. What each manufacturer who removes it chooses to replace it with may vary, but the fact that it takes up space is irrefutable. If you think LG is better at making space-efficient designs, great. Then those LG engineers could have used that space for something else, if the design dictated no jack. I don’t see why that’s even debatable.

I think you are confusing a few questions.

Can you detect a delay of 10 ms? That’s the threshold for most people, so by definition the answer is “occasionally, but not always”.

Can you detect a delay of 20 ms? For most people, the answer is “usually”.

Ok, so is a 20 ms delay annoying? Well, that depends. At 20 feet, it’s hard to see people’s lips move in real life. So probably not. At a concert, it’s very hard to correlate what you see to what you hear. So almost definitely not.

Ok, what about seeing something that is easy to correlate, like the noise of a vase crashing into the ground? Well, the brain is an amazing thing. If something looks far away, your brain expects a delay in the sound it makes, and the delay actually contributes to the perception of distance. In fact, so does the delay between sound reaching your right ear and reaching your left ear, only a few inches apart! It all registers and subconsciously contributes to sound localization.
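
To put a rough number on that ear-to-ear delay, here’s a quick sketch assuming the ears are about 0.2 m apart and sound travels ~343 m/s (the real head geometry is a bit more involved, but it gives the order of magnitude):

```python
# Upper bound on the interaural time difference (ITD): the extra travel time
# to the far ear for a sound arriving from directly to one side, assuming an
# ear-to-ear spacing of ~0.2 m and sound at ~343 m/s.
EAR_SPACING_M = 0.2        # assumed ear-to-ear distance
SPEED_OF_SOUND_M_S = 343.0

max_itd_ms = EAR_SPACING_M / SPEED_OF_SOUND_M_S * 1000
print(f"Maximum interaural delay: ~{max_itd_ms:.2f} ms")  # ~0.58 ms
```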

But problems occur when someone is trying to trick your sense of localization. Say, by putting a picture very close to you, but with a sound cue that suggests that it is far away. That’s when your brain gets annoyed. And in fact, the Society of Motion Picture and Television Engineers suggests limits for A/V synchronization ranging between 15 ms and 45 ms, depending on the medium and type of mismatch.

Yes, but that’s not the relevant issue. You don’t justify removing something simply because it takes up space. You justify it by finding something that makes better use of the space. After all, the screen also takes up space, but removing it would be stupid because nothing would make better use of the space. And in the case of the iPhone X, I think they have not justified removing the 3.5 mm jack. FaceID notwithstanding. Furthermore, the original justifications by Apple (waterproofing, larger batteries) are clearly not incompatible with a 3.5 mm jack.

Finally, maximizing width/length/screen percentage while minimizing thickness/weight is an advantage, not a disadvantage. It’s actually what distinguishes the iPhone X from the cheaper iPhone models.

Sorry man, I just don’t think this is true, at all.

If you are hearing YOUR OWN voice, then you may (and I stress the may) be able to notice the delay. But for any other operation, there’s just no way you notice anything, because you run into this constantly throughout your normal operations in life. You constantly talk to people who are 10 feet away from you. At no point do you EVER notice the 10ms delay that it takes for the sound to travel those 10 feet. This just isn’t a thing that happens, to anyone, ever.

No dude, you can’t. Again, this happens all the time. Hell, I routinely talk to people in my office who are 20 feet away. There’s no noticeable delay in how long it takes for stuff to get to your ears compared to what you are seeing. And you can clearly see someone’s lips at 20 feet.

On basically any iOS application, there is far more of a delay between you touching the screen and you getting auditory feedback, even if you are using a wired headset… just because of how the software works.

20ms is an essentially trivial amount of time.

Now, if you do some specific tests to try to establish what the human ear’s auditory “framerate” is? The minimum gap between two tones at which you perceive them as distinct tones instead of just a single one? THEN you might start to be onto something, but even there I’ve seen stuff suggesting that the average human can’t distinguish two tones separated by less than 30ms.

Dude, this isn’t even remotely true.

This isn’t like how your brain works with stereoscopic vision. In that case, yes, inconsistent signals will create massively disconcerting effects, because your eyes are specifically designed to provide fairly precise distance estimations.

Your auditory system does not function like that. It’s not possible for your auditory system to function like that. Due to the way your ears are positioned, and the way that sound works, including differences in volume, you are not able to do a stereoscopic range test the way bats do. Humans just don’t have that ability, or if they do, I have never seen any scientific literature about it, ever.

Your ability to judge the distance of sound is far more tied to relative volumes and other acoustic effects that are almost entirely based upon the specific environment the sound is taking place in. The stereoscopic nature of your auditory system provides bearing, not range information.

Yeah, it says you can’t lead by more than 15ms, but you can trail by up to 45ms… because trailing by 45ms is largely imperceptible. And we’re talking specifically about lip-syncing audio at this point.

The limit is tighter on the leading edge, because it’s far more likely that you will notice things like someone’s voice while they aren’t moving their lips at all… but once they are actually talking? Virtually none of the normal population can detect a sync that is off by less than 45ms, because you have very little ability to perfectly match the phonemes of a sound to lip movement for an arbitrary person. Hell, the lip syncing in any videogame is going to be orders of magnitude worse in this regard than anything you’re going to run into with a 45ms delay.

That’s why the standard is set where it is. And that’s why a delay of less than 45ms largely doesn’t matter much. Hell, it used to be that they intentionally added a fixed 2.5-frame delay to audio on television, always. That’s about 100ms.

Again, you’re constantly encountering audio delays in your daily life, in everything from using basically any piece of software on your phone, to simply hearing sounds. It doesn’t significantly impact you.

I’m not saying that the human ear can’t technically perceive a very small difference in audio delay… but such a thing is essentially just ignored by your brain, because you constantly encounter other natural effects that result in more significant delays anyway.

Next topic: Nobody can see more than 30 FPS!

They did that in the early digital-TV era to keep the audio and video matched up, because of frame-sync delays in the video. Without the 2.5-frame delay, the audio was ~60ms ahead of the video, which was noticeable and problematic.
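
For the frame arithmetic, here’s a quick sketch converting that 2.5-frame delay into milliseconds at the two common broadcast frame rates (which one applied depends on the TV standard, so the exact figure varies):

```python
# Convert a fixed 2.5-frame audio delay into milliseconds for the two common
# broadcast frame rates; which one applies depends on the TV standard.
FRAMES_DELAYED = 2.5

for standard, fps in (("PAL, 25 fps", 25.0), ("NTSC, 29.97 fps", 29.97)):
    delay_ms = FRAMES_DELAYED / fps * 1000
    print(f"{standard}: ~{delay_ms:.0f} ms")

# PAL, 25 fps: ~100 ms
# NTSC, 29.97 fps: ~83 ms
```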

Wow this was a long discussion about an inconsequential point. It still remains the case that wireless headphones introduce enough variable lag that you can’t play rhythm games effectively. And it still remains the case that in some cases (e.g. multiple wireless relays like Roku->phone->headphones) there is enough lag that lip syncing issues become intolerable. Why fixate on this 45ms thing?

Also, yes, you can detect a delay of 45ms. Though, as you said, you don’t use interaural time delay to determine distance to a sound; its bearing accuracy is as fine as about 1°, and it can detect differences in arrival time of around 10 microseconds. Range determination is much less precise, but part of it relies on distinguishing reflected sound from the direct arrival, which can again be done with a precision of tens of μs. I operate a laser system that pulses at 50Hz, which is 20ms per pulse. When the flashlamps fire (at 50Hz), they make a clicking sound as the spark gaps in the Marx banks arc over. I can easily detect a single misfire in the 50Hz pulse train. It’s entirely possible that it simply doesn’t cause lip-sync annoyance at that threshold, as you say, but it’s certainly detectable. That said, as I pointed out above, the lag often IS above the threshold that causes lip-sync annoyance, whatever that threshold is.
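
Just to spell out the pulse-train arithmetic (a quick sketch of the spacing involved):

```python
# Spacing of flashlamp clicks in a 50 Hz pulse train: a single missed pulse
# shows up as a doubled gap, which is what makes the misfire easy to hear.
PULSE_RATE_HZ = 50.0

period_ms = 1000.0 / PULSE_RATE_HZ
print(f"Normal gap between clicks:   {period_ms:.0f} ms")       # 20 ms
print(f"Gap when one pulse misfires: {2 * period_ms:.0f} ms")   # 40 ms
```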

You are the champion of the weeds.

Yeah, being ahead of the video is very bad, so they erred on the delay side intentionally.

There actually are quantitative studies on this stuff. This study covers what’s probably the most hard core test case where audio latency will matter more than anything, which is live monitoring while playing an instrument.

The variation in effects tends to be largely driven by the instrument, but in most cases, even in this super hard-core case, latency times of 35ms still result in an experience subjectively classified as fair to good.

In normal situations where you are merely an observer of the sound? It’s unlikely it could impact your experience much.

Ya, sorry, I just started digging into it as I read some of this stuff, and got interested in the scientific aspects.

Oh, absolutely. Your ability to detect a delay difference between each of your ears is SUPER precise, because that’s what they’ve evolved to do.

I have to admit, I found a bunch of fascinating stuff while I was researching, trying to refute you. :)

Also, I don’t necessarily mean champion of the weeds in a negative way. Rabbit holes can contain wonderlands.

With regards to distance, I wasn’t talking about stereoscopic cues. There are other cues the brain uses to judge distance of a sound, including visual estimates. In normal settings, the audio and visual cues are concordant. But if those cues are not concordant due to an artificial audio delay, then your brain can report a mismatch between the audio and visual stimuli.

“…dot clusters presented with a sound delay were judged to be more distant than dot clusters paired with equivalent sound leads. In the second experiment, we confirmed that the presence of a sound delay was sufficient to cause stimuli to appear as more distant. Additionally, we found that ecologically congruent pairing of more distant events with a sound delay resulted in an increase in the precision of distance judgments.”

“In a simultaneity judgment task we presented a large range of stimulus onset asynchronies corresponding to distances of up to 35 meters. We found an effect of distance over the simultaneity estimates, with greater distances requiring larger stimulus onset asynchronies… These findings reveal that there should be an internal mechanism to compensate for audiovisual delays, which critically depends on the depth information available.”

In other words, the fact that you don’t notice a delay when talking to someone across a room is not evidence that it is impossible to perceive that delay, because your brain compensates for perceived distance when deciding whether a sound came from a particular object.

The acceptable limit depends on the medium. For film, the limit is 22 ms.

Modern Android phones and good headphones support AptX Bluetooth, which actually has lower latency (45ms) than wired headphones (60ms).

OK, this is clearly nonsense. You’re telling me that a cable connected to a driver connected directly to the phone’s audio hardware can somehow have more latency than the exact same thing, but with an additional digital wireless protocol interposed between them? Bullshit. The fact that you could even make this claim renders anything else you say on the topic suspect.

Bluetooth is mostly tolerable and I haven’t used wired headphones for ages, but honestly? Bluetooth skips, and degrades when you hold the phone wrong, and has significant lag, and it has done so for me even with AptX, for years. It’s kind of garbage, and people put up with it because it does the job just well enough to prevent anything else from replacing it.

I still use the headphone jack occasionally, and when I do, it’s because I need it, and it’s useful not having to carry a stupid bloody dongle. I hate that I’m forced to buy Samsung to do so, but given that so many other manufacturers also think I don’t need the ability to expand my storage, I wouldn’t gain that many more options if I stopped caring about it.

And the fact is, the headphone jack is a robust and well-developed standard extending back decades. It’s still good enough for professional audio, so I find the claims that it’s obsolete laughable. As for people complaining about the wasted space: phones have been too thin to comfortably grip for several years now, so if they just added 2mm of thickness, I’m sure they could manage to fit in a poor little headphone jack, as well as more battery!

(Seriously people, I want more battery.)

I’m not questioning whether your auditory system can perceive time slices that small. When detecting bearings to sounds, it’s based on differences that are smaller.

But the fact that it’s not noticeable in tons of normal situations means that it’s not going to be disruptive in the situations we’re talking about. All of those normal cues that your brain uses are already fucked up at that point… you’re looking at pictures of tiny people on a flat screen a few feet from your face. There are already so many things conflicting with any biologically evolved mechanism that it’s clear your brain has no problem just dealing with it. In the real world, sound is routinely affected by the environment, volume disparities, distance, etc. There is real information about all of that stuff in the audio signal that reaches your brain, but there are so many possible variations of input that could produce that signal that your brain doesn’t freak out about any of it.

It’s only when YOU are producing the sound that you start to run into significant problems, because your brain is very specifically listening for audio feedback that is directly related to your own actions. When that sort of delay exists, you start to have major problems. But again, as that study above presents, even in that case a delay of 35ms isn’t going to cause a major problem. And anything you are doing other than live audio monitoring of your own performance is going to have more room for latency without affecting the experience.

It’s not the same as the kind of instantaneous, massive problem that is caused when, say, you screw up the z-draw order of things rendered in VR space. Because the difference in perception between your eyes translates to a very specific distance, and that distance correlates to very specific properties in relation to obscuring other objects or being obscured. When those two factors are in conflict, it creates major problems for your brain. (for me, it causes an almost instantaneous headache)

For an auditory effect to achieve this kind of problem, you would need to somehow delay the signals reaching each ear differently. This would result in audio hitting you that suggested a source at one location, and then potentially a visual indicator of the source being somewhere else.

Although, in practice, when you have exactly this happen (it’s fairly easy to do by misconfiguring a surround-sound system), the effect is nowhere near as disruptive as the visual issues described above. Even with significant disparities between the binaural signal and the visual source, the effect ends up being somewhat subtle. It’s easily recognized, but it doesn’t cause the violent reaction that screwed-up z-order in VR does.

Could you provide a source that explains the basis of this value? I’d be interested to see what resulted in that number.