ML engineering press


MarkTechPost · United States · MiniMax ships M3, a Chinese open-weight model claiming frontier coding at one-twentieth the attention cost

Technical writeup of M3's MiniMax Sparse Attention (MSA), which attends only to the key-value blocks it selects as relevant, cutting per-token compute to one-twentieth at 1M-token context. M3 also supports native multimodal input and computer use for agentic coding.

“MSA cuts per-token compute to one-twentieth at 1M-token context, with over 9x faster prefill and 15x faster decoding than the prior generation.”
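
To make the block-selection idea concrete, here is a minimal sketch of generic top-k block-sparse attention for a single query. It is not MiniMax's implementation: the function name `block_sparse_attention`, the mean-pooled-key block score, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration, since the summary above does not say how MSA scores or selects blocks.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend from one query to only the top_k highest-scoring KV blocks.

    Illustrative only: the block score (q . mean of the block's keys) is an
    assumed heuristic, not the selection rule used by MSA.
    """
    # Split the KV cache into contiguous blocks of token indices.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]

    def block_score(idx):
        # Cheap summary of a block: the mean of its key vectors.
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(len(q))]
        return dot(q, mean_key)

    # Rank blocks by score and keep the top_k, restored to positional order.
    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    selected = [i for b in chosen for i in blocks[b]]

    # Ordinary scaled dot-product softmax attention, restricted to selected tokens.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(len(values[0]))
    ]
    return out, selected


# Toy example: 8 tokens in 4 blocks of 2; the query aligns with block 1.
keys = [[0, 1], [0, 1], [5, 0], [5, 0], [-1, 0], [-1, 0], [0, 0], [0, 0]]
values = [[float(i)] for i in range(8)]
out, selected = block_sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
```

In this sketch the full attention step touches only `top_k * block_size` tokens per query, so its cost stays fixed as the context grows; the block-scoring pass still visits every block, which a real system would have to make cheap for the savings to hold.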
