Nvidia's new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size (venturebeat.com)
11 points by hochmartinez 7 days ago
Half the size is not a great metric when comparing a dense model against a MoE.
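The Llama-based model needs roughly an order of magnitude more compute per token than DeepSeek: a dense model runs every parameter for every token, while an MoE only activates a fraction of its parameters.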
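A rough back-of-the-envelope sketch of that comparison, in case it helps. The parameter counts are assumptions taken from the public model cards rather than from this thread (about 253B dense for Nemotron Ultra, about 671B total with about 37B active per token for DeepSeek R1), and the "2 FLOPs per active parameter per token" rule is only a common approximation for a forward pass that ignores attention cost, memory bandwidth, and batching effects.

```python
# Assumed parameter counts (from public model cards, not from this thread).
NEMOTRON_ULTRA_PARAMS = 253e9   # dense: all parameters are used for every token
DEEPSEEK_R1_TOTAL = 671e9       # MoE: total parameters stored
DEEPSEEK_R1_ACTIVE = 37e9       # MoE: parameters actually used per token


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token as 2 * active parameters."""
    return 2.0 * active_params


# "Half the size" compares stored parameters.
size_ratio = NEMOTRON_ULTRA_PARAMS / DEEPSEEK_R1_TOTAL

# Per-token compute compares active parameters instead.
compute_ratio = (
    forward_flops_per_token(NEMOTRON_ULTRA_PARAMS)
    / forward_flops_per_token(DEEPSEEK_R1_ACTIVE)
)

print(f"size ratio (dense / MoE total):     {size_ratio:.2f}")
print(f"compute ratio (dense / MoE active): {compute_ratio:.1f}x")
```

Under these assumptions the dense model is well under half the size on disk but costs several times more compute per generated token, which is why total parameter count alone is a misleading headline number.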