Modern CPU transistor counts are enormous: AMD announced earlier this month that a full implementation of its 7nm Epyc “Rome” CPU weighs in at 32 billion transistors. To this, Cerebras Systems says: “Hold my beer.” The AI-focused company has designed what it calls a Wafer Scale Engine (WSE). The WSE is a near-square slab of silicon, approximately eight inches by nine inches, and contains roughly 1.2 trillion transistors.
I’m genuinely surprised to see a company bringing a wafer-scale product to market this quickly. The idea of wafer-scale processing has attracted some attention recently as a potential solution to performance scaling difficulties. In the study we discussed earlier this year, researchers evaluated the idea of building an enormous GPU across most or all of a 100mm wafer. They found that the technique could produce viable, high-performance processors and that it could also scale effectively to larger node sizes. The Cerebras WSE definitely qualifies as large: its total surface area is much larger than the hypothetical designs we considered earlier this year. It’s not a full-sized 300mm wafer, but it has a higher surface area than a 200mm wafer does.
The largest GPU, just for comparison, measures 815 square millimeters and packs 21.1B transistors. So the Cerebras WSE is just a bit bigger, as these things go. Some companies send out pictures of their chips held up next to a diminutive common object, like a quarter. Cerebras sent out a photo of their die next to a keyboard.
Not Pictured: PCIe x1600 slot.
As you can see, it compares fairly well.
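For a rough sense of scale, the size comparisons above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the approximate eight-by-nine-inch footprint quoted earlier is the true die size, treats wafers as perfect circles with no edge exclusion, and uses variable names that are purely illustrative.

```python
import math

MM_PER_INCH = 25.4


def wafer_area_mm2(diameter_mm: float) -> float:
    """Area of an idealized circular wafer (ignores edge exclusion and notches)."""
    return math.pi * (diameter_mm / 2) ** 2


# Figures quoted in the article. The 8 x 9 inch footprint is approximate,
# so every result below is an estimate rather than a specification.
wse_area = (8 * MM_PER_INCH) * (9 * MM_PER_INCH)  # mm^2
gpu_area = 815.0                                  # mm^2, largest GPU die
wse_transistors = 1.2e12
gpu_transistors = 21.1e9

area_ratio = wse_area / gpu_area
transistor_ratio = wse_transistors / gpu_transistors

print(f"WSE area:         {wse_area:,.0f} mm^2")
print(f"200mm wafer area: {wafer_area_mm2(200):,.0f} mm^2")
print(f"300mm wafer area: {wafer_area_mm2(300):,.0f} mm^2")
print(f"WSE vs. GPU area:        {area_ratio:.1f}x")
print(f"WSE vs. GPU transistors: {transistor_ratio:.1f}x")
```

Under those assumptions, the WSE lands between a 200mm and a 300mm wafer in area, and both the area and transistor ratios against the largest GPU come out in the neighborhood of 57x.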
The Cerebras WSE contains 400,000 sparse linear algebra cores, 18GB of total on-die memory, 9PB/sec worth of memory bandwidth across the chip, and separate fabric bandwidth of up to 100Pbit/sec. The entire chip is built on TSMC’s 16nm FinFET process. Because the chip is built from most of a single wafer, the company has implemented methods of routing around bad cores on-die and can keep its arrays connected even when a section of the wafer contains defective cores. The company says it has redundant cores implemented on-die, though it hasn’t discussed specifics yet. Details on the design are being presented at Hot Chips this week.
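Those headline totals are easier to picture as per-core averages. The sketch below simply divides the quoted figures by the quoted core count. It assumes decimal units (1GB = 10^9 bytes) and an even split of memory and bandwidth across all 400,000 cores, neither of which Cerebras has confirmed, so treat the results as illustrative averages only.

```python
# Totals quoted in the article.
CORES = 400_000
MEMORY_BYTES = 18e9              # 18GB, assuming decimal gigabytes
MEM_BANDWIDTH_BYTES_S = 9e15     # 9PB/sec
FABRIC_BITS_S = 100e15           # 100Pbit/sec

# Assumption: resources are spread evenly across every core.
mem_per_core_kb = MEMORY_BYTES / CORES / 1e3
bw_per_core_gb_s = MEM_BANDWIDTH_BYTES_S / CORES / 1e9

# The fabric figure is quoted in bits, so convert to bytes before comparing
# it with the memory bandwidth figure.
fabric_pb_s = FABRIC_BITS_S / 8 / 1e15

print(f"Memory per core:           {mem_per_core_kb:.1f} KB")
print(f"Memory bandwidth per core: {bw_per_core_gb_s:.1f} GB/sec")
print(f"Fabric bandwidth:          {fabric_pb_s:.1f} PB/sec")
```

On these assumptions each core gets about 45KB of local memory and 22.5GB/sec of memory bandwidth, and the fabric figure works out to 12.5PB/sec once converted from bits to bytes.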
The WSE — “CPU” simply doesn’t seem sufficient — is cooled using a massive cold plate sitting above the silicon, with vertically mounted water pipes used for direct cooling. Because there’s no traditional package large enough to fit the chip, Cerebras has designed its own. PCWorld describes it as “combining a PCB, the wafer, a custom connector linking the two, and the cold plate.” Details on the chip, like its raw performance and power consumption, are not yet available.
A fully functional wafer-scale processor, commercialized at scale, would be an exciting demonstration of whether this technological approach has any relevance to the wider market. While we’re never going to see consumer components sold this way, there’s been interest in using wafer-scale processing to improve performance and power consumption in a range of markets. If consumers continue to move workloads to the cloud, especially high-performance workloads like gaming, it’s not crazy to think we might one day see GPU manufacturers taking advantage of this idea, building arrays of parts that no individual could ever afford in order to power cloud gaming systems.