“Generative AI is eating the world.”
That’s how Andrew Feldman, CEO of Silicon Valley AI laptop maker Cerebras, begins his introduction to his firm’s newest achievement: An AI supercomputer able to 2 billion billion operations per second (2 exaflops). The system, known as Condor Galaxy 1, is on monitor to double in measurement inside 12 weeks. In early 2024, it is going to be joined by two extra programs of double that measurement. The Silicon Valley firm plans to maintain including Condor Galaxy installations subsequent yr till it’s working a community of 9 supercomputers able to 36 exaflops in complete.
If large-language fashions and different generative AI are consuming the world, Cerebras’s plan is to assist them digest it. And the Sunnyvale, Calif., firm is just not alone. Other makers of AI-focused computer systems are constructing large programs round both their very own specialised processors or Nvidia’s newest GPU, the H100. While it’s troublesome to decide the scale and capabilities of most of those programs, Feldman claims Condor Galaxy 1 is already among the many largest.
Condor Galaxy 1—assembled and began up in simply 10 days—is made up of 32 Cerebras CS-2 computer systems and is about to broaden to 64. The subsequent two programs, to be inbuilt Austin, Texas, and Ashville, N.C., may even home 64 CS-2s every.
The coronary heart of every CS-2 is the Waferscale Engine-2, an AI-specific processor with 2.6 trillion transistors and 850,000 AI cores constituted of a full wafer of silicon. The chip is so massive that the dimensions of reminiscence, bandwidth, compute assets, and different stuff within the new supercomputers shortly will get a bit ridiculous, as the next graphic reveals.
In case you didn’t discover these numbers overwhelming sufficient, right here’s one other: There are not less than 166 trillion transistors within the Condor Galaxy 1.Cerebras
One of Cerebras’s greatest benefits in constructing huge AI supercomputers is its capacity to scale up assets merely, says Feldman. For instance, a 40 billion–parameter community will be educated in about the identical time as a 1 billion–parameter community in case you commit 40-fold extra {hardware} assets to it. Importantly, such a scale-up doesn’t require further traces of code. Demonstrating linear scaling has traditionally been very troublesome due to the problem of dividing up huge neural networks so that they function effectively. “We scale linearly from 1 to 32 [CS-2s] with a keystroke,” he says.
The Condor Galaxy collection is owned by Abu Dhabi–primarily based G42, a holding firm with 9 AI-based companies together with G42 Cloud, one of many largest cloud-computing suppliers within the Middle East. However, Cerebras will function the supercomputers and may lease assets G42 is just not utilizing for inner work.
Demand for coaching massive neural networks has shot up, based on Feldman. The variety of firms coaching neural-network fashions with 50 billion or extra parameters went from 2 in 2021 to greater than 100 this yr, he says.
Obviously, Cerebras isn’t the one one going after companies that want to coach actually massive neural networks. Big gamers resembling Amazon, Google, Meta, and Microsoft have their very own choices. Computer clusters constructed round Nvidia GPUs dominate a lot of this enterprise, however a few of these firms have developed their very own silicon for AI, resembling Google’s TPU collection and Amazon’s Trainium. There are additionally startup rivals to Cerebras, making their very own AI accelerators and computer systems together with Habana (now a part of Intel), Graphcore, and Samba Nova.
Meta, for instance, constructed its AI Research SuperCluster utilizing greater than 6,000 Nvidia A100 GPUs. A deliberate second part would push the cluster to five exaflops. Google constructed a system containing 4,096 of its TPU v4 accelerators for a complete of 1.1 exaflops. That system ripped by the BERT pure language processor neural community, which is far smaller than at the moment’s LLMs, in simply over 10 seconds. Google additionally runs Compute Engine A3, which is constructed round Nvidia H100 GPUs and a customized infrastructure processing unit made with Intel. Cloud supplier CoreWeave, in partnership with Nvidia, examined a system of three,584 H100 GPUs that educated a benchmark representing the big language mannequin GPT-3 in simply over 10 minutes. In 2024, Graphcore plans to construct a 10-exaflop system known as the Good Computer made up of greater than 8,000 of its Bow processors.
You can entry Condor Galaxy right here.
From Your Site Articles
Related Articles Around the Web