Havoc 7 days ago

Half the size is not a great metric when comparing a dense model against a MoE.

dyl000 6 days ago

llama has order of magnitude mode compute requirement than deepseek.