Intel's stagnating performance is not in your head


#41

Dolphin’s website states that it is a dual-core application, so I’m not sure if this is a good measure of single-core performance.

It does, which probably explains the 40% improvement going from Ivy Bridge to Haswell, while most benchmarks only show a 5% to 10% gain.


#42

I said 28nm just because it’s the last thing I worked on and saw a picture of. I could literally count the blobs on the picture. Well, at least I was told the speckly blobs were atoms. The exact picture I was shown won’t be on the internet, and I can’t find a decent one right now.

I know nothing about materials science, however, so the speckles could have just been noise in the image :P Threads like this seem to indicate we’re definitely into the “countable” range of atoms in our gates! If you need all of your gates to be 28 atoms wide, and one of them is accidentally made at 24, then a normal voltage is now an over-voltage and pop, there’s now a short in that transistor every time it’s used.

edit: Also, when I was at the University of Manchester 10 years ago, the lead silicon guy and the graphene guy showed us a picture, and the bit between the gate and the channel was 2 atoms wide or something absurd. And after graduation a mate of mine ended up working in the graphene team and he used to send me pictures from the electron microscope where, again, I was told that the blobs I could see were individual atoms and I was shocked at how few there were. The wiki page for 5nm ‘backs up’ this – all of the examples contain transistors with atoms countable in the tens! (It still blows my mind really, to think about things operating at that scale.) It makes sense, given that an Angstrom is 1/10th of a nanometer and that an Angstrom is the unit used to measure atoms. In a 14nm chip there really isn’t a lot in each transistor!

Multi-edit: Though doing maths like that is probably wrong. I was under the impression that 14/10/7nm weren’t actually 14/10/7nm in the same way that 90nm was 90nm – i.e. the minimum gap between the important bits, but a bunch of marketing spin designed to make them sound smaller. Still: things are tiny and definitely scraping at the edges of physics!
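For anyone who wants to redo the back-of-envelope maths above, here is a rough sketch. It assumes the node name is a literal feature width (which, as noted, it probably isn't for 14/10/7nm) and uses two textbook figures for crystalline silicon: a nearest-neighbour spacing of about 0.235nm and a lattice constant of about 0.543nm. The function name `atoms_across` is made up for illustration.

```python
# Rough estimate only: treats the marketing node name as a real width,
# which the post above already flags as dubious for modern nodes.
SI_BOND_NM = 0.235     # approx. Si-Si nearest-neighbour distance
SI_LATTICE_NM = 0.543  # approx. silicon lattice constant (one unit cell)


def atoms_across(width_nm, spacing_nm=SI_BOND_NM):
    """Return roughly how many spacings of `spacing_nm` fit in `width_nm`."""
    return round(width_nm / spacing_nm)


if __name__ == "__main__":
    for node in (28, 14, 10, 7, 5):
        print(
            f"{node:>2}nm: ~{atoms_across(node)} atom spacings, "
            f"~{atoms_across(node, SI_LATTICE_NM)} unit cells"
        )
```

Even on this generous reading, a 5nm feature is only a couple of dozen atoms wide, which matches the "countable in the tens" impression.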

Out of interest, you say many companies. I was aware of Samsung and Intel – which other fabs go this low?

I thought I knew what binning was, but I might be wrong. I mainly knew it from when I worked in the GPU industry. So I’ve watched that video, hoping to learn, but it’s exactly what I described? e.g. “this i3 is identical die layout to an i5, but it failed all of its tests so those faulty modules were disabled and it was labelled an i3”

I’m not sure what I’m missing? :)

(What I’m probably wrong about: I was under the impression that modern silicon products are more similar to each other and are more aggressively binned than in the past, e.g. 15 years ago, where different products had intentional architectural differences, and the binning sorted them into the shit and good editions. So much so that ‘design for binning’ is more intentional now than e.g. 15 years ago.)

You’re right that a 2x performance isn’t stagnation. Really, it’s the gains in single threaded performance that have stagnated, compared to overall performance. e.g.:

https://www.cpubenchmark.net/compare.php?cmp[]=2332&cmp[]=1074&cmp[]=2227

Two 3.00GHz Haswells (an i7 eXtreme edition, and an i5 Shit edition – well, it’s not a T, but none of those go up to 3.0GHz) and a Pentium 4. The i5 and i7 are basically identical in single threaded performance. The i7 is about 5x the price of the i5 and yet when running Javascript laden crap it’ll perform the same. Effective performance, however, is 2x.

Then compare both of them to the Pentium 4: effective performance is absurd. It’s orders of magnitude higher. But single-threaded? 2.5x. (Though it should probably be noted that this benchmarking tool probably deliberately does loads of parallel stuff, which would bring down the P4’s rating. Back when P4s were around the benchmarking tools wouldn’t have attempted that, as it wasn’t that common in terms of user story. But it’s still a decent metric. Also, those Haswells aren’t even really 3.00GHz, due to fancy technology. Comparing CPUs like-for-like in specific scenarios is really difficult, which is another reason that giant red “overall performance” figure is worth looking at.)

The rate of gain in single threaded performance is a crawl! Here’s an interesting article with some pretty graphs (watch out, they’re all log graphs).

Also, I’ve not read that article, so I don’t know if that’s simply running Dolphin or running a specific Dolphin single-threaded benchmark. But I don’t really believe that Dolphin is completely single thread bound. It might be heavily single-threaded, but even if it does a tiny bit of naturally parallelisable work then suddenly the core count matters and the graphs get skewed.
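The "tiny bit of parallelisable work skews the graph" point can be put into numbers with Amdahl's law. This is only a sketch: the parallel fractions below are made-up illustrations, not measurements of Dolphin, and `amdahl_speedup` is just a name picked for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: 0%, 10% and 30% of the runtime is parallelisable.
    for fraction in (0.0, 0.1, 0.3):
        for cores in (2, 4):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.0%} cores={cores}: {speedup:.2f}x")
```

With only 10% of the work parallelisable, going from one core to four already shows up as roughly an 8% gain, which is the same size as the generational improvements being argued about.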

I was going to look up dhrystone and whetstone benchmarks when replying to Wumpus, but last time I did that they were also parallelised/vectorised/threaded up the wazoo and therefore confusing and no longer a simple test of single-threadedness.


#43

Let’s see how well it correlates to JavaScript performance:

Kaby Lake 47.5k
Skylake 45.3k
Haswell 34.2k
Ivy Bridge 33.8k
Sandy Bridge 31.1k

Yep looks confirmed.

Under Octane, the big leap is definitely Skylake; total performance change from Sandy Bridge is 1.5x.
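To check those figures, here is a quick sketch that computes the generation-over-generation ratios from the Octane scores listed above (scores in thousands, copied from this post; the helper name `generational_gains` is just for illustration).

```python
# Octane scores (in thousands) as quoted in this post, oldest first.
octane = {
    "Sandy Bridge": 31.1,
    "Ivy Bridge": 33.8,
    "Haswell": 34.2,
    "Skylake": 45.3,
    "Kaby Lake": 47.5,
}


def generational_gains(scores):
    """Map each generation to its score ratio over the previous generation."""
    names = list(scores)  # dicts keep insertion order in Python 3.7+
    return {
        newer: scores[newer] / scores[older]
        for older, newer in zip(names, names[1:])
    }


gains = generational_gains(octane)
total = octane["Kaby Lake"] / octane["Sandy Bridge"]

if __name__ == "__main__":
    for name, ratio in gains.items():
        print(f"{name}: {ratio:.2f}x over previous generation")
    print(f"Sandy Bridge -> Kaby Lake: {total:.2f}x")
```

On these numbers Skylake is about 1.32x over Haswell, while every other step is under 1.1x.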

I think people have not quite fully grokked how big a deal it is to hand over control of CPU clock speed changes to the actual CPU (Speed Shift). That is new to Skylake and later.


#44

Intel, IBM, TSMC, GlobalFoundries, and Samsung are all at 10nm with IBM showcasing 7nm last year. Intel just demo’d a working system on 10nm at CES this week.

Scale is a challenge as you get down that small. But a few years ago, FinFET was invented, allowing for higher (albeit thicker) densities. Back in 2012, Intel invested $1B in ASML for EUVL tech that was supposed to help accelerate 450mm wafers + EUVL. We need EUVL to draw lines smaller + more accurately.

[quote=“Pod, post:42, topic:127826”]
I thought I knew what binning was, but I might be wrong. I mainly knew it from when I worked in the GPU industry. So I’ve watched that video, hoping to learn, but it’s exactly what I described? e.g. “this i3 is identical die layout to an i5, but it failed all of its tests so those faulty modules were disabled and it was labelled an i3”

I’m not sure what I’m missing? :)
[/quote]

Binning basically starts each chip out at top bin, and tests downgrade it to lower bins. We never cherry pick lower bins to be higher bins. Binning is a science in its own right - you have mfg yields and then hope they come out matching your customer demand. I know in the past we have fused higher end chips to be lower end chips. If the opposite is true, where you don’t have enough top-bin, you should see a significant write-down on the balance sheet at quarter end because of the inventory you have to scrap - which I don’t recall happening to Intel at least, but then I don’t follow that metric very closely.

EDIT: Yes, all i3, i5 and i7 chips are the same identical chip, but certain tests failed, downbinning from i7 to i5 or to i3.
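As a toy illustration of the "start at top bin, let failed tests downgrade" flow described above: everything in this sketch is hypothetical, including the bin thresholds, the test fields, and the names `DieResult` and `assign_bin`. It is not how any real fab's test program works, just the shape of the logic.

```python
from typing import NamedTuple, Optional


class DieResult(NamedTuple):
    """Hypothetical test results for one die."""
    good_cores: int
    stable_ghz: float


# Bins ordered from top to bottom. Thresholds are invented for illustration.
BINS = [
    ("i7", 4, 3.5),  # (name, minimum working cores, minimum stable GHz)
    ("i5", 4, 3.0),
    ("i3", 2, 3.0),
]


def assign_bin(result: DieResult) -> Optional[str]:
    """Try the top bin first and fall through to lower bins on failure."""
    for name, min_cores, min_ghz in BINS:
        if result.good_cores >= min_cores and result.stable_ghz >= min_ghz:
            return name
    return None  # fails every bin: scrap


if __name__ == "__main__":
    for die in (DieResult(4, 3.6), DieResult(4, 3.2), DieResult(3, 3.8), DieResult(1, 4.0)):
        print(die, "->", assign_bin(die) or "scrap")
```

Note that a die only ever moves down the list. Fusing a good part down to a lower bin to meet demand would be a separate decision made after this step.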


#45

@Tman, you may appreciate this. I used to work for a company in the CMP world and Intel was a customer, and I spent a decent amount of time at Intel, TSMC in Taiwan, IBM, TI, Samsung, etc. Back then I had dinner on an Intel visit with one of Intel’s top tech guys (I won’t use names but he had a big bushy white boy afro kind of hair style.) For various reasons it was just me and him. For perspective, at that time (2006 or so) 90 nm was the standard, 45 nm was what we were all trying to accomplish, and wafers were at 300mm. The secret Intel roadmap we were shown showed a path to 22nm, but the physics challenges were daunting, in terms of barrier materials and just the overall physics of things at that dimension. I don’t think many of us could even imagine going much smaller than that, and the discussion of stacked processors, etc. was interesting.

But at dinner, this top scientist at Intel, after a few glasses of wine, told me that there was some real concern (I can share this since it’s so many years later) at Intel because “People just don’t feel like they need to upgrade their CPU the way they used to. Speeds are at the point that the average PC user doesn’t feel like their CPU speed is too slow. They want a faster hard drive, but no one other than hard core gamers cares any more about CPU speed, and for gamers the move is more and more to GPU vs. CPU. So our business model is borked. We may need to shift to more emphasis on memory, but we’re way behind Samsung on that, plus that’s so many fewer layers that the challenge is not as big, and our competitive advantage is our engineering of complex features.”

That was in 2006 give or take a year.


#46

Yes, that’s why Intel aggressively moved to address their power consumption and get costs down at the low-end, to compete with Qualcomm and Samsung. That’s why you started seeing $200 laptops a couple years ago that were fast enough to actually use for web browsing. They have sold extremely well on both ChromeOS and Windows, as MS made the Win10 license free on those very cheap laptops.

At the higher-end, this led to a tradeoff between massively improved battery life and insanely thin/light form factors in modern laptops. Not faster, no. But laptops are definitely better today than 10 years ago by a huge degree.

Intel also tried to move into the mobile space and even though mobile Atoms were competitive with Qualcomm on speed and power consumption, Qualcomm’s investment into quick charging and software modems made them unbeatable. Intel seems to have largely given up on this, which I think is a huge mistake.

On the software side Android is (very slowly but surely) approaching a desktop OS-- you can buy a Chromebook that runs Android apps in windows. The experience still sucks, but it will improve over time.

Apple isn’t really trying for desktop convergence; they are pushing iOS on the tablet/mobile paradigm instead because they have MacOS for that.

And of course Windows 8 was Microsoft’s move to converge mobile and desktop usage. It was a huge bust because they didn’t listen to their customers on the desktop side and didn’t invest in subsidizing core software for mobile.


#47

Thanks for sharing! cool story.

A principal engineer once gave the analogy of imagining drawing the circuits using a pen that is 2 feet in diameter.


#48

Intel sort of dropped this bombshell yesterday that no one really noticed: the 8th generation Core CPUs, which are due out this year, will remain on 14nm. Xeons and data center chips will move toward 10nm, but consumer chips will remain at 14nm for a 4th straight year.

http://www.anandtech.com/show/11115/intel-confirms-8th-gen-core-on-14nm-data-center-first-to-new-nodes


#49

Yeah, instead of Process/Architecture/Optimization they’re doing Process/Architecture/Optimization/Optimization. Possibly even Process/Architecture/Optimization/Optimization/Optimization. Or Process/Architecture/Optimization/Optimization/Optimization/Optimization. Exciting stuff!

It sounds like 10nm might be as far as silicon can go. No more tricks-- that’s it for Si unless quantum-well finfets pan out. Their 7nm process will probably be on indium gallium arsenide or indium tin.


#50

Well at least pressure from AMD is adding cores. So Coffee Lake is 0% faster than Kaby Lake but you do get 50% more cores for the same price! (6 vs 4).

https://www.anandtech.com/show/11859/the-anandtech-coffee-lake-review-8700k-and-8400-initial-numbers/18

[image: table from the review linked above, not preserved]


#51

Yes, and that makes the Coffee Lake i3s really attractive indeed. I don’t need six or eight cores, but I can use four. I might pick up a cheap NUC or craptop as an HTPC/server replacement for my Haswell Celeron.

Or maybe not, it works fine and I’m cheap. Shrug.


#52

What is that “LLC” part of the table, above? I can’t figure it out.


#53

Last level cache. Basically L3.


#54

Lame, why didn’t they put “CACHE” or “L3” up there? LLC, who the hell ever says that?


#55

Basically just Intel. Also limited liability companies.


#56

Intel had an L4 cache which was shared between the CPU and on-die GPU around the Broadwell era. Might be a holdover from some marketing spin.


#57

They still have that for some of the extra fancy gpu on a die models but they are rare.

https://www.anandtech.com/show/10281/intel-adds-crystal-well-skylake-processors-65w-edram


#58

As you’ve mentioned in other threads, more cores aren’t usually of too much use in gaming. That seems to be changing as time goes on, though. Steve Sinclair (lead over at Digital Extremes) was showing some of their optimization work for Plains of Eidolon. They used to peg two cores pretty hard, but they’ve been able to distribute that a lot more evenly now across numerous cores. I think it was said that 6 is optimal, but will scale up to 12 to some degree.

I think as Intel continues to stall on increasing the speed of processors, we’ll see this become more common as time goes on.


#59

Modern consoles have 8 very slow CPU cores. That’s why modern games make better use of multiple cores these days.


#60

Yeah, as @stusser said, it is much easier to cheat on PC when you have 4 extremely fast cores. Hell, even 2 fast cores with 4 threads is usually fine.