Cerebras Releases World’s Fastest AI Inference
In Short
- Cerebras’ Wafer-Scale Engine has proven faster than Groq at AI inference.
- With the 8B model, Cerebras Inference can process up to 1,800 tokens per second; with the 70B model, it can process 450 tokens per second.
- In contrast, Groq runs the 8B and 70B models at up to 750 and 250 tokens per second, respectively.
Cerebras has finally opened access to its Wafer-Scale Engine (WSE) through Cerebras Inference, which serves the Llama 3.1 8B model at 1,800 tokens per second. Cerebras is capable ...
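
Since access is offered as a hosted service, a rough way to sanity-check these throughput figures is to time a streamed response on the client side. Below is a minimal sketch assuming Cerebras exposes an OpenAI-compatible endpoint; the base URL, model identifier, and environment variable name are assumptions for illustration, not details confirmed by the article.

```python
# Rough client-side tokens-per-second estimate against a hosted
# inference API. Endpoint, model name, and env var are assumptions.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",      # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],     # hypothetical env var
)

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain wafer-scale chips."}],
    stream=True,
)
for chunk in stream:
    # Each streamed chunk typically carries about one token of text.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1

elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.0f} tokens/s (rough client-side estimate)")
```

Note that a client-side measurement like this includes network latency, so it will understate the raw generation speed the vendor reports.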