Cerebras Releases World’s Fastest AI Inference
In Short
- Cerebras’ Wafer-Scale Engine has proven faster than Groq at AI inference.
- With the 8B model, Cerebras Inference can process up to 1,800 tokens per second; with the 70B model, it can process 450 tokens per second.
- In contrast, Groq runs the 8B and 70B models at up to 750 and 250 tokens per second, respectively.
Cerebras has finally opened access to its Wafer-Scale Engine (WSE) through Cerebras Inference, which serves the Llama 3.1 8B model at 1,800 tokens per second. Cerebras is capable ...
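
Since access is offered as a hosted service, a rough way to sanity-check these throughput figures is to time a streamed response on the client side. Below is a minimal sketch assuming Cerebras exposes an OpenAI-compatible endpoint; the base URL, model identifier, and environment variable name are assumptions for illustration, not details confirmed by the article.

```python
# Rough client-side tokens-per-second estimate against a hosted
# inference API. Endpoint, model name, and env var are assumptions.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",      # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],     # hypothetical env var
)

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain wafer-scale chips."}],
    stream=True,
)
for chunk in stream:
    # Each streamed chunk typically carries about one token of text.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1

elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.0f} tokens/s (rough client-side estimate)")
```

Note that a client-side measurement like this includes network latency, so it will understate the raw generation speed the vendor reports.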