Cerebras Releases World’s Fastest AI Inference

– Cerebras’ Wafer-Scale Engine has proven faster than Groq at AI inference.

– With the Llama 3.1 8B model, Cerebras Inference processes up to 1,800 tokens per second; with the 70B model, up to 450 tokens per second.

– In contrast, Groq runs the 8B and 70B models at up to 750 and 250 tokens per second, respectively (see the quick comparison below).
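
To put the gap in perspective, here is a back-of-the-envelope comparison using only the throughput figures quoted above; the numbers are the article's claims, not independent benchmarks.

```python
# Speedup implied by the throughput figures quoted in this article (tokens/s).
CEREBRAS = {"llama3.1-8b": 1800, "llama3.1-70b": 450}
GROQ     = {"llama3.1-8b": 750,  "llama3.1-70b": 250}

for model in CEREBRAS:
    speedup = CEREBRAS[model] / GROQ[model]
    print(f"{model}: {CEREBRAS[model]} vs {GROQ[model]} tokens/s "
          f"-> {speedup:.1f}x faster on Cerebras")

# Output:
# llama3.1-8b: 1800 vs 750 tokens/s -> 2.4x faster on Cerebras
# llama3.1-70b: 450 vs 250 tokens/s -> 1.8x faster on Cerebras
```

In other words, Cerebras' claimed lead is roughly 2.4x on the 8B model and 1.8x on the 70B model.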

With Cerebras Inference, Cerebras has finally opened access to its Wafer-Scale Engine (WSE), serving both the Llama 3.1 8B model and the bigger Llama 3.1 70B variant. Prior to Cerebras, Groq held the title of fastest AI inference provider.
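
For readers who want to try the newly opened service, below is a minimal sketch of querying Cerebras Inference through an OpenAI-compatible chat endpoint. The base URL, model identifier, and environment-variable name are assumptions based on common convention, not details from this article; check the official Cerebras documentation before relying on them.

```python
# Minimal sketch: calling Cerebras Inference via an OpenAI-compatible client.
# ASSUMPTIONS (not from the article): the base URL, the model id
# "llama3.1-70b", and the CEREBRAS_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var
)

response = client.chat.completions.create(
    model="llama3.1-70b",                    # assumed model id
    messages=[
        {"role": "user",
         "content": "Explain wafer-scale chips in one sentence."},
    ],
)
print(response.choices[0].message.content)
```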
