skeptical benchmark scrutiny

Tech Times · United States · MiniMax ships M3, a Chinese open-weight model claiming frontier coding at one-twentieth the attention cost

Reports independent verification of the MSA architecture on June 18 while flagging that M3's 59.0% on SWE-Bench Pro is vendor-run, that it trails Anthropic's Claude Opus 4.8 at 69.2%, and that the promised open weights had not yet shipped.

“M3's 59.0% on SWE-Bench Pro beats GPT-5.5 but trails Claude Opus 4.8's 69.2%; the scores are company-run and the weights are still withheld.”