Bleep Qualcomm right in their Qualcomm-hole

Per first post

https://www.bloomberg.com/news/articles/2017-05-12/intel-to-add-its-support-to-ftc-lawsuit-against-qualcomm

Nvidia was quite strong, albeit in tablet form which implies higher TDP. Witness the outlier perf of the Nexus 9.

As to why Intel could not get their shit together: probably their insistence on x86. Remember the x86 Android phone? That died fast.

As to why Apple seems to be the only mobile SoC vendor able to do this, we discussed the technical details mid-topic. Part of it seems to be Apple’s willingness to have huge L3 caches. Die size and process seem roughly comparable. I highly recommend the earlier posts on this; search for die size.

Are those JetStream benchmarks comparing Chrome to Chrome, or is it Safari vs Chrome? How much is hardware, and how much is Apple having a much better JavaScript engine in Safari than anything you can find on Android?

True. I was actually at NVIDIA at the time (though on the GPU side of things), and the familiar excuse for the lack of phone design wins was always that we were way ahead of Qualcomm on raw performance, but phone companies wanted low TDP, integrated modems, and low cost, and weren’t willing to compromise on any of those for perf. I can’t say they were wrong, either – battery life is bad enough on high-end phones as it is.

Yes but then we’d have to use an iPhone 5s and all the bad that goes along with that. If you’re going to shower us with meaningless numbers then at least use some that have some kind of relevance to actual users. For example, if you said, “Even an iPhone 5s has a camera that focuses 10x faster than any current gen Android!” then I might say to myself, gee maybe I SHOULD switch to iPhone and their wonderful processor.

Yeah good point, you’d be giving up a lot on camera by going back 2 generations. At least current Android devices have (mostly) great cameras. But this topic is really about overall performance as a, y’know, computer.

Integrated modems are where Qualcomm puts the screws on, if you read the anti-trust complaint article. (And there may be something to the low-cost argument if you read on.)

Nothing would make me happier than Qualcomm releasing a SoC in 2018 with double the perf of the Snapdragon 835. That would be fucking awesome. Even more awesome would be credible competition of any kind, but I’d take it, and gladly.

Let’s consider the current SoCs, and their geekbench 4 single threaded scores…

  • Apple A10
    3.3 billion transistors, 16 nm, 125 mm²
    2.34 GHz, 2 + 2 cores (presents as dual)
    L1: 64 KB, L2: 3 MB (shared), L3: 4 MB (shared)
    Geekbench 4 single-core: 3307

  • Snapdragon 835
    3 billion transistors, 10 nm, 72.3 mm²
    2.45 GHz, 4 + 4 cores
    L1: 64 KB, L2: 2 MB + 1 MB, L3: none
    Geekbench 4 single-core: 1904

Nothing in there really jumps out to me, other than

  • physical size (but, qualcomm is on a smaller process!)
  • presence of huge L3 cache on the Apple side

Both devices have “performance” and “efficiency” cores, so technically that “eight core” Android device isn’t really eight cores. Apple hides this implementation detail in hardware: the chip presents as a plain dual-core CPU, and the CPU hardware itself decides whether it should be using the performance or the efficiency cores at any given time. I guess Android makes this decision at the OS level?

It also seems Qualcomm has backtracked from custom ARM cores with the 835…

We also know that the actual CPU cores are not of Qualcomm’s own design, but more closely based upon ARM reference cores (the 820 used Qualcomm’s full-custom Kryo cores)

… whereas Apple is in full bore “we customize the hell out of these CPUs” mode and has been for a while.

Well, one of these companies is definitely incompetent, that’s for sure.

Is it a fairy tale? Is it @jsnell? Since Apple has yet to go below 16nm, and Qualcomm has been on 10nm for a while now and is going to 7nm with the 845… plus, if you add Apple’s pretty damn amazing track record on custom mobile CPU design to the mix… I think it’s a very safe bet that with A11, Apple is gonna knock it out of the park. Again.

It depends on the benchmark, for common stuff like Octane (now deprecated), it was mostly hardware. Apple does have a top-notch JS engine in Safari and they do a great job optimizing it for the modern web. However lately Chrome has (finally) gotten serious about this, such that they’ve doubled their speed on things they were weak at before, and closed the gap a fair bit.

I did some digging on the Snapdragon 845, and it looks like it will be available in March 2018, based on the generic Cortex-A75 ARM CPU design, which promises about 1.33× the performance:

As far as Geekbench 4 goes, 1,900 × 1.33 is 2,527 which is iPhone 6s territory.

Octane has been deprecated but 1.48× is a good omen, too.

Pretty sure I didn’t say anything like that. If I did, apologies. It’s not my place to tell anyone whether they should or should not desire more CPU performance.

My main point has been that you appear to be misunderstanding semiconductor economics. CPUs getting faster and cheaper is not some immutable law of nature. CPU speeds are fundamentally linked to semiconductor technologies, either directly, in the form of higher frequencies and lower power consumption, or indirectly, by making the manufacture of larger CPUs economical. And those improvements aren’t really happening anymore.

For example, if you went back in time a year, you’d see ARM projecting that the A73 would get to 2.8 GHz in a mobile phone power envelope. They were basing that on 10nm giving them a 15-20% frequency bump. Whoops, the reality turned out to be that the manufacturers don’t seem able to clock it any higher on 10nm than on 16nm.

No luck needed. If I marched into the store right now to buy a top-end Android smartphone, the phones from the top 2 Android manufacturers would not be using a Snapdragon. Samsung’s phones would be using the Exynos 8895, Huawei’s a Kirin 960. And guess what… the CPU performance of those SoCs is about the same as that of a Snapdragon 835.

Those numbers are basically irrelevant. What you’re ignoring is that both Apple and Qualcomm have a similar silicon budget for the CPU. Apple has chosen to spend it on a 2+2 configuration, Qualcomm on a 4+4 configuration. Of course the first choice leads to better single core performance and worse multicore performance.

And this particular choice of core size is not unique to Qualcomm. It’s one that’s being made by everyone except Apple when it comes to the mobile phone space. We could say that all of these people are morons. But that would be a bit arrogant. Or we could accept that they’re using actual market data to make these billion dollar decisions and all coming up with a certain answer, while we’re just idly speculating.

Note: it’s totally possible for the other manufacturers and Apple to be making different decisions, and for both of them to be right. Apple is in a unique position. They sell a higher-margin product to a captive audience, and they don’t need to stretch a single CPU design to cover a wide range of phones. Whereas the idea behind the A73, for example, was that it’d be used in a quad-core configuration for high-end phones, and in a dual-core configuration on an older process for the middle of the market.

Would you care to quantify that guess?

For example, in the other thread, you seemed to be over the moon about the A10x results. Anyone else looking at those results would have seen that they basically did nothing. No frequency bump compared to the A10, despite having a tablet power envelope to work with. The only improvements they got were on memory-bound synthetics from doubling the memory bus width (which translates to no performance gains on real-world CPU-bound tasks), and a big gain on one sub-benchmark from increasing the cache by 2.5×. But for everything else, there were basically no improvements.

If you think the single-core performance changes between A10 and A10x were “knocking it out of the park”, sure. I’ll buy that they get results like that.

Not in the US, they aren’t.

Let’s see… oh yes “The Kirin 960 is made up of four high-performance ARM Cortex A73” more generic ARM copies. Now that’s some technical innovation.

Indeed, and Apple’s 16nm process is absolutely brutalizing what Qualcomm is currently producing on 10nm. Guess what’s gonna happen when Apple shifts down to 10nm? Go ahead, guess. Want to make a wager on it?

It’s a fair point, and mirrors what eventually happened on desktops, but we’re not there yet, and won’t be for another 5 years on mobile. 7nm is coming.

I already did in the Intel vs. Apple topic. Feel free to look it up, if you’re curious.

Those numbers correlate exactly to real-world performance in Discourse. And, generally speaking, any sufficiently advanced JavaScript. I should know; I’ve spent the last four (technically five) years of my life building this, benchmarking it, and testing it on every major mobile and tablet device you can buy.

Earlier in this very topic:

I just ran WebXPRT 2015 on my 2017 iPad Pro and got 265, versus roughly 209 for the A10 in the iPhone 7. That means the A10x offers a 1.27× improvement.

Feel free to read more benchmarks. You have an odd idea of what “basically no improvements” means. Hell, Ars didn’t even run WebXPRT, but I did, and I got results completely consistent with their claims:

Between the clock speed boost and the architectural improvements, single-core performance is up by 25 percent or so.

Wumpus may be very slightly biased because JavaScript performance is limited to a single core. =) I’m not aware of a JS engine that is multi-core capable.

Regarding JS engine performance: Google’s V8 is extremely performant. I found this:

I had a fucking dream last night that I bought a phone, and then another one came out immediately that had a non qualcomm processor and was super fast, and Wumpus shamed me about my phone.

Not even joking.

That’s so weird. I had a dream that I died, was reincarnated as an Apple A11, and that @wumpus kept trying to touch me inappropriately…

There’s no inappropriate way to touch an A11, Armando.

Did he pinch zoom you? You can tell us, this is a safe space.

This is also true of most smartphone apps; for example, all UI operations are single-threaded.

The same logic applies everywhere; unless the program’s author has gone to great lengths to make every action happen in a background thread – which also implies they’ve written the code to wait for all those actions to complete, sync up and merge the results – then single threaded perf will be a very accurate estimate of overall performance.

And it is.
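For the curious, here’s roughly what those “great lengths” look like in plain Java. This is a toy sketch with made-up task names, not anything from a real app: each independent action goes onto a thread pool, the caller blocks until all of them finish, then merges the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy sketch of the "background thread, then sync up and merge" pattern.
public class BackgroundMerge {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Each independent action becomes its own background task.
        List<Callable<Integer>> tasks = new ArrayList<>();
        tasks.add(() -> expensiveStep(1));
        tasks.add(() -> expensiveStep(2));
        tasks.add(() -> expensiveStep(3));

        // Sync up: invokeAll blocks until every task has completed.
        List<Future<Integer>> done = pool.invokeAll(tasks);

        // Merge the results back together on the calling thread.
        int merged = 0;
        for (Future<Integer> f : done) {
            merged += f.get();
        }
        System.out.println("merged result = " + merged);
        pool.shutdown();
    }

    // Stand-in for some slow, CPU-bound piece of work.
    static int expensiveStep(int n) {
        return n * n;
    }
}
```

Until somebody writes all of that plumbing, the app runs one action at a time, and the single-core number is the one that matters.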

Multi-threaded programming is not just hard to apply to many problems, it’s hard period.

Algorithms for splitting rendering work between processors, as well as the thread synchronization that goes with the progressive effect rendering, is easily the most complex code in Paint.NET. It’s worth it though because this gives us a huge performance boost when rendering effects.

Rendering graphics effects is one of the cases where the work can be easily split up: you divide the screen into (x) sections and have (x) processors each work on one of those sections.
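Here’s a hypothetical Java version of that divide-into-sections idea (the array and the “filter” are stand-ins I made up, not Paint.NET’s actual code): each worker gets its own band of rows and never touches anyone else’s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: split an image into horizontal bands and filter each band on its own thread.
public class ParallelFilter {
    public static void main(String[] args) throws Exception {
        int width = 1920, height = 1080;
        float[] pixels = new float[width * height];   // stand-in for the image

        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> sections = new ArrayList<>();

        int rowsPerWorker = (height + workers - 1) / workers;
        for (int w = 0; w < workers; w++) {
            int startRow = w * rowsPerWorker;
            int endRow = Math.min(height, startRow + rowsPerWorker);
            sections.add(pool.submit(() -> {
                // Apply the "effect" to this section only.
                for (int y = startRow; y < endRow; y++) {
                    for (int x = 0; x < width; x++) {
                        pixels[y * width + x] = brighten(pixels[y * width + x]);
                    }
                }
            }));
        }

        // Wait for every section before showing the result.
        for (Future<?> f : sections) {
            f.get();
        }
        pool.shutdown();
        System.out.println("filtered " + workers + " sections in parallel");
    }

    static float brighten(float v) {
        return Math.min(1.0f, v + 0.1f);
    }
}
```

It parallelizes cleanly precisely because no section depends on any other; the only synchronization is the final wait before showing the result.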

Poked head in to see what was happening, mostly wumpus’s crusade.

In other words, nothing to see here.

Eh, this is really not true. I mean, certainly automatically breaking big tasks up into parallel ones is no trivial undertaking, but I think you are overstating the difficulties of multithreaded programming.

The very nature of all user interface operations taking place on a single event dispatch thread effectively NECESSITATES use of multiple threads, because you never want to do large operations directly on that thread. You always spawn separate threads for that kind of stuff, so that the main interface doesn’t lock up.

You generally always have multiple threads going in any Android (or Java in general) program, for all but the most trivial of applications.
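Android’s main thread and Swing’s event dispatch thread work the same way here, so as a self-contained illustration, here’s a desktop Swing sketch of the pattern (the names are made up; the SwingUtilities and ExecutorService calls are the standard JDK ones):

```java
import javax.swing.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of "never do large operations on the event dispatch thread":
// the button handler runs on the EDT, so the slow work goes to a background
// executor, and only the final UI update hops back onto the EDT.
public class ResponsiveUi {
    private static final ExecutorService background = Executors.newSingleThreadExecutor();

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("demo");
            JLabel status = new JLabel("idle");
            JButton button = new JButton("do slow thing");

            button.addActionListener(e -> {
                status.setText("working...");
                background.submit(() -> {
                    String result = slowOperation();                            // off the EDT
                    SwingUtilities.invokeLater(() -> status.setText(result));   // back on the EDT
                });
            });

            frame.add(status, java.awt.BorderLayout.NORTH);
            frame.add(button, java.awt.BorderLayout.SOUTH);
            frame.pack();
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }

    // Stand-in for whatever would otherwise freeze the interface.
    static String slowOperation() {
        try { Thread.sleep(2000); } catch (InterruptedException ignored) {}
        return "done";
    }
}
```

Skip the background submit and the whole window freezes for the two seconds that slowOperation takes.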

Well, I’ve been working in software for 30+ years, and I’m here to tell you that you may have contracted … “shit’s easy syndrome”.

Feel free to read up:

Probably the best near-term solution is built-in data structures and facilities that use multi-threading under the hood. That is good and solid and necessary, but it’s no magic bullet, not by a long shot.
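For example, the stock JDK already ships a couple of these; a quick sketch, nothing exotic:

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.LongStream;

// Two stock JDK facilities that fan work out across cores
// without the caller writing any thread code at all.
public class BuiltInParallelism {
    public static void main(String[] args) {
        // 1. Parallel streams: the common fork/join pool splits the range up.
        long sumOfSquares = LongStream.rangeClosed(1, 1_000_000)
                                      .parallel()
                                      .map(n -> n * n)
                                      .sum();
        System.out.println("sum of squares = " + sumOfSquares);

        // 2. Arrays.parallelSort: a parallel merge sort under the hood.
        int[] data = new Random(42).ints(5_000_000).toArray();
        Arrays.parallelSort(data);
        System.out.println("smallest = " + data[0]);
    }
}
```

Note that both examples are exactly the “perfectly divisible work” shape described below; that’s why they can be handed to you for free.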

Many if not most problems simply aren’t amenable to multi-threaded solutions – ideally you need the work to be perfectly divided up into sections, where each section has no dependency of any kind on the other sections. Video rendering is like this, as are graphics effects (photoshop filters), a lot of database queries, webservers where each core can service an individual user request, etcetera. The clear wins are almost always on the server versus the client, because the server is doing more work for more people, and that’s easy to break up into completely independent chunks of work.

And even when it does scale… it is difficult to scale cleanly to 4, 8, 16, 32:
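Amdahl’s law gives a feel for why. These are my back-of-the-envelope numbers, assuming a generously parallel 95% of the work, not measurements from any real program:

```java
// Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the parallel
// fraction of the work and n is the number of cores.
public class Amdahl {
    public static void main(String[] args) {
        double parallelFraction = 0.95;   // assumed, for illustration only
        for (int cores : new int[] {4, 8, 16, 32}) {
            double speedup = 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
            System.out.printf("%2d cores -> %.1fx speedup%n", cores, speedup);
        }
        // Prints roughly: 4 -> 3.5x, 8 -> 5.9x, 16 -> 9.1x, 32 -> 12.5x
    }
}
```

Even at 95% parallel, 32 cores buy you barely 12×, and real applications rarely get anywhere near 95%.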

It is funny to realize that hyperthreading is basically Intel’s way of telling programmers they suck at writing multithreaded code. And they do, because it’s really hard.

I’m just speaking from my own 15 years of experience, where I do multithreaded programming pretty routinely.

I mean, maybe I’m super genius, but I don’t think so.

Yeah, but we aren’t talking about making a parallel solution to a single problem. We aren’t parallelizing an algorithm.

We are writing a software application, where there are generally lots of things going on at the same time.

Like I said man, anyone who has done any significant application development uses threads all the time. Certainly you in your 30 years of experience have done this. You essentially HAVE to do it for applications to not have terrible performance in terms of UI sluggishness.