erulabs 2 hours ago

Lower power consumption on a desktop monitor is an interesting technical challenge, but I do wonder “Cui bono?” Obviously I’d want my gaming machine to consume less power, but I’m not sure I’ve ever considered monitor-on, mouse-idle power consumption when choosing between, e.g., AMD and Nvidia for my gaming machine.

Don’t get me wrong, this is very interesting, and AMD does great engineering. I’m loath to throw shade on an engineering-focused company, but… is this going to convert into even a single net-gain purchase for AMD?

I’m a relatively (to myself) large AMD shareholder (colloquially: fanboy), and damn, I’d love to see more focus on hardware matmul acceleration rather than idle monitor power draw.

  • makeitdouble 39 minutes ago

    To wager a guess: would that optimization also help push the envelope when one application needs all the power it can get while another monitor is just sitting idle?

    Another angle I'm wondering about is the longevity of the card. I'm not sure AMD would actively care in the first place, but as a user, if the card didn't have to grind as much on the idle parts and thus lasted a year or two longer, that would be pretty valuable.

  • jayd16 2 hours ago

    Rumors have been floating around about some kind of PS6 portable or next-gen Steam Deck with RDNA4, where power consumption matters.

    Longer laptop longevity would also simply be nice.

  • adgjlsfhk1 2 hours ago

    The architecture is shared between desktop and mobile. This sounds 100% like something they did to give some dual-display laptop or handheld 3 extra hours of battery life by fixing something dumb.

syntaxing 6 hours ago

More curious: does RDNA4 have native FP8 support?

  • krasin 5 hours ago

    See the RDNA4 instruction set manual [1], page 90, Table 41 (WMMA Instructions).

    They support FP8/BF8 inputs with F32 accumulation, as well as IU4 with I32 accumulation. The maximum matrix size is 16x16. For comparison, NVIDIA Blackwell GB200 supports matrices up to 256x32 for FP8 and 256x96 for NVFP4.

    This matters for overall throughput, because feeding a bigger matrix unit is actually cheaper in terms of memory bandwidth per FLOP: as the edge length n of a systolic array increases, the number of FLOPs grows as O(n^2), while the number of inputs/outputs grows only as O(n).
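    A back-of-the-envelope sketch of that scaling argument, assuming an idealized n x n output-stationary systolic array (this is a textbook model, not AMD's or NVIDIA's actual hardware): each processing element does one multiply-accumulate per cycle, while only one edge's worth of A values and one edge's worth of B values enter per cycle. The function name `systolic_array_costs` is hypothetical, just for illustration.

    ```python
    def systolic_array_costs(n: int) -> tuple[int, int]:
        """Per-cycle costs for an idealized n x n output-stationary systolic array.

        Assumption (hypothetical model): each of the n*n processing elements
        performs one multiply-accumulate (2 FLOPs) per cycle, while n values
        of A enter from the left edge and n values of B enter from the top edge.
        Outputs stay in place and are drained once per tile, so they are ignored here.
        """
        flops = 2 * n * n  # one MAC per PE per cycle -> grows as O(n^2)
        inputs = 2 * n     # one row edge plus one column edge -> grows as O(n)
        return flops, inputs


    for n in (4, 16, 64, 256):
        flops, inputs = systolic_array_costs(n)
        # FLOPs per input value works out to exactly n in this model
        print(f"n={n:4d}  FLOPs/cycle={flops:7d}  inputs/cycle={inputs:4d}  "
              f"FLOPs per input={flops / inputs:.0f}")
    ```

    In this toy model the work done per value fetched is simply n, so a 256-wide array does 16x more arithmetic per byte of operand traffic than a 16-wide one. Real designs add register reuse, tiling, and output drain costs, so treat this only as the shape of the argument, not a throughput estimate.
    
    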

    1. https://www.amd.com/content/dam/amd/en/documents/radeon-tech...

    2. https://semianalysis.com/2025/06/23/nvidia-tensor-core-evolu...