ML engineering press
MarkTechPost · United States · MiniMax ships M3, a Chinese open-weight model claiming frontier coding at one-twentieth the attention cost
Technical writeup of MiniMax Sparse Attention (MSA), the mechanism in M3 that selects relevant key-value blocks to cut per-token compute to one-twentieth at 1M-token context. M3 also takes native multimodal input and supports computer use for agentic coding.
“MSA cuts per-token compute to one-twentieth at 1M-token context, with over 9x faster prefill and 15x faster decoding than the prior generation.”
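For readers unfamiliar with the idea, the block-selection step described in the summary can be sketched roughly as follows. This is a generic illustration of block-sparse attention for a single query, not MiniMax's implementation: the function name `block_sparse_attention`, the parameters `block_size` and `top_k`, and the mean-key scoring heuristic are all assumptions made for this example, and the writeup's actual selection rule may differ.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend one query over only the top_k highest-scoring key-value blocks.

    Hypothetical sketch: blocks are scored by q . mean(keys in block).
    Returns (output_vector, sorted list of selected block indices).
    """
    n, dim = len(keys), len(q)
    # Partition the context into contiguous blocks of positions.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(block):
        # Cheap per-block summary: the mean key (an assumed heuristic).
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(q, mean_key)

    # Keep the top_k blocks by score; all other blocks are skipped entirely.
    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:top_k])
    positions = [i for b in selected for i in blocks[b]]

    # Ordinary scaled dot-product softmax attention, restricted to kept positions.
    scale = 1.0 / math.sqrt(dim)
    logits = [dot(q, keys[i]) * scale for i in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out_dim = len(values[0])
    out = [
        sum(w * values[i][d] for w, i in zip(weights, positions)) / total
        for d in range(out_dim)
    ]
    return out, selected


# Toy context: 4 positions in 2 blocks; the query matches the first block's keys.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
query = [1.0, 0.0]
sparse_out, kept = block_sparse_attention(query, keys, values, block_size=2, top_k=1)
dense_out, _ = block_sparse_attention(query, keys, values, block_size=2, top_k=2)
```

In this sketch the exact attention step touches only `top_k * block_size` positions instead of the whole context, which is where the savings come from once the context is long. Setting `top_k` to the number of blocks recovers dense attention, so the two outputs can be compared to see what the selection discards.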