ru24.pro
News in English
August
2024

Cerebras Released World’s Fastest AI Inference


In Short

  • Cerebras's Wafer-Scale Engine has proven faster than Groq at AI inference.
  • Cerebras Inference can process up to 1,800 tokens per second with the 8B model and 450 tokens per second with the 70B model.
  • In contrast, Groq runs the 8B and 70B models at up to 750 and 250 tokens per second, respectively.
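The relative advantage in the figures above is easy to work out. A minimal sketch, using only the throughput numbers quoted in the bullets (the model labels and dictionary layout are illustrative, not from either vendor's API):

```python
# Throughput figures quoted in the article, in tokens per second.
throughput = {
    "Llama 3.1 8B":  {"Cerebras": 1800, "Groq": 750},
    "Llama 3.1 70B": {"Cerebras": 450,  "Groq": 250},
}

# Compute Cerebras's speedup over Groq for each model size.
for model, t in throughput.items():
    speedup = t["Cerebras"] / t["Groq"]
    print(f"{model}: Cerebras is {speedup:.1f}x faster than Groq")
# → Llama 3.1 8B: Cerebras is 2.4x faster than Groq
# → Llama 3.1 70B: Cerebras is 1.8x faster than Groq
```

So the advertised lead is roughly 2.4x on the smaller model and 1.8x on the larger one.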

Cerebras has finally opened access to its Wafer-Scale Engine (WSE) for AI inference, and it can run the Llama 3.1 8B model at a rate of 1,800 tokens per second. Cerebras is capable ...