Show HN: Dia-Jax – A JAX port of the Dia text-to-speech dialogue model

github.com

2 points by jaco-bro 19 hours ago

I've created a JAX port of Dia, the 1.6B-parameter text-to-speech model that generates realistic dialogue from transcripts. This experimental port explores running Dia under JAX's functional paradigm and its potential for hardware flexibility.

Key features:
- Command-line interface for generating audio
- Support for multi-speaker dialogue with [S1]/[S2] tags
- Non-verbal sounds like (laughs), (coughs), etc.
- Plans for voice cloning capability
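
To make the transcript format concrete, here is a minimal sketch of how a tagged transcript could be split into speaker turns and non-verbal cues. It only illustrates the [S1]/[S2] and (laughs)-style markup; `parse_dialogue` and the regexes are hypothetical and not part of diajax, and the model itself takes the raw tagged text.

```python
import re

# Hypothetical helper for illustration only; not part of diajax.
# A turn starts at an [S1] or [S2] tag and runs until the next tag.
TURN = re.compile(r"\[(S[12])\]\s*([^\[]*)")
# Non-verbal cues are written in parentheses, e.g. (laughs).
CUE = re.compile(r"\(([^)]+)\)")

def parse_dialogue(transcript):
    """Split a tagged transcript into a list of speaker turns."""
    turns = []
    for speaker, text in TURN.findall(transcript):
        text = text.strip()
        turns.append({
            "speaker": speaker,
            "text": text,
            "cues": CUE.findall(text),
        })
    return turns

example = "[S1] Did the JAX port run? [S2] It did. (laughs) [S1] Nice."
turns = parse_dialogue(example)
for turn in turns:
    print(turn["speaker"], "|", turn["text"], "|", turn["cues"])
```

Each turn keeps its original text, so the cues stay inline exactly where the transcript placed them.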

Current status: functional, but memory usage still needs optimization. The PyTorch version can generate minutes of audio in under 10 GB of VRAM, while this port currently uses more. Contributions from JAX optimization experts are welcome!
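
For anyone comparing memory numbers: by default JAX preallocates a large share of GPU memory (75% per the JAX docs) on the first operation, so tools like nvidia-smi can overstate what the model actually needs. A minimal sketch for turning that off while profiling, using JAX's documented environment variables. This is a general JAX setting, not something diajax is stated to configure, and it does not reduce the model's real footprint.

```python
import os

# Must be set before `import jax` for the setting to take effect.
# Allocate GPU memory on demand instead of preallocating up front.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# Alternative: keep preallocation but cap it at a fraction of total memory.
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.50"

preallocate = os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"]
print("XLA_PYTHON_CLIENT_PREALLOCATE =", preallocate)
```

On-demand allocation can fragment memory more than preallocation does, so it is better suited to measurement than to long runs.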

GitHub: https://github.com/jaco-bro/diajax