768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU
Compiled by KHAO Editorial, aggregated from 1 source and 2 references discovered via search.
A Redditor has caused a stir by coaxing a workstation build using Optane PMem DIMMs as RAM to run a 1-trillion-parameter LLM.
Key facts
The Redditor is rather proud of the resulting ~4 tokens per second performance
The software side of the equation relied on the mixture-of-experts architecture of Kimi K2.5
While the 768GB of Optane (6x 128GB) does indeed offer far lower latency than the best NVMe SSDs, it is still two or three times slower than DRAM
Summary
Central to the headlining feat was the Redditor’s sourcing of six Optane PMem (DCPMM) sticks. The build was configured with the Optane in Memory Mode and the Samsung DDR4 as cache. The software side of the equation relied on the mixture-of-experts architecture of Kimi K2.5, which activates only a fraction of the model’s parameters for each token.
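To put the capacity figure in perspective, here is a rough back-of-the-envelope sketch of how large the weights of a 1-trillion-parameter model would be at a few precisions, compared with 6x 128GB of Optane. The bit widths are illustrative assumptions: the source does not say which quantization the Redditor used, and the estimate ignores the KV cache, runtime overhead, and the difference between marketed and usable module capacity.

```python
GB = 10**9  # decimal gigabytes, as DIMM capacities are marketed


def weights_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone (no KV cache, no overhead)."""
    return total_params * bits_per_weight / 8 / GB


def fits(total_params: float, bits_per_weight: float, capacity_gb: float) -> bool:
    """True if the weights alone would fit in the given capacity."""
    return weights_size_gb(total_params, bits_per_weight) <= capacity_gb


optane_capacity_gb = 6 * 128  # six 128GB modules, per the article
total_params = 1e12           # "1-trillion-parameter", per the article

# Hypothetical precisions, chosen only for illustration.
for bits in (16, 8, 4):
    size = weights_size_gb(total_params, bits)
    verdict = "fits" if fits(total_params, bits, optane_capacity_gb) else "does not fit"
    print(f"{bits:>2}-bit weights: {size:,.0f} GB, {verdict} in {optane_capacity_gb} GB")
```

Under these assumptions, only an aggressively quantized copy of the weights would fit in the Optane pool, which suggests why the total capacity matters more here than raw speed.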