Summarized by Dodly:
DramaBox: Resemble AI's New Voice Cloning TTS
Summary
Resemble AI has released DramaBox, a text-to-speech model with voice cloning, built on the LTX2 pipeline. It is audio-only and considerably trimmed down from the full LTX model, which makes it more efficient to run. DramaBox is available through Hugging Face and a GitHub repository, and this summary concentrates on installing it in ComfyUI. The model uses Unsloth's quantized Gemma 3 12B as its text encoder, and 24 GB of VRAM is recommended.

Early impressions point to a noticeable improvement in audio quality, naturalness, and expressiveness over previous open-source options. The model can also be integrated into video generation pipelines: in a demonstration short story featuring two characters, DramaBox generates all of the dialogue.

To install, clone the repository into your ComfyUI custom nodes folder and install its dependencies, then download the model weights and organize them in your models folder. The DramaBox node in ComfyUI clones a voice from an audio sample you provide and outputs the generated TTS audio. In testing, generation was fast and the cloned voice stayed consistent across different lines of dialogue.
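The installation steps described above can be sketched as a small Python helper that only builds the commands, so they can be reviewed before anything runs. This is a minimal sketch under stated assumptions, not the project's documented procedure: `plan_install`, `node_dir_name`, and `python_exe` are hypothetical names, the repository URL is something you supply from the project's GitHub page, and the presence of a `requirements.txt` in the repository is assumed. The layout of the model weights inside the ComfyUI `models` folder is not covered here, since the summary does not specify it.

```python
from pathlib import Path


def plan_install(
    comfyui_root: str,
    repo_url: str,
    node_dir_name: str = "DramaBox",  # hypothetical folder name for the cloned node
    python_exe: str = "python",       # use the interpreter that ComfyUI itself runs with
) -> list[list[str]]:
    """Return the install commands as argument lists, without executing them."""
    node_dir = Path(comfyui_root) / "custom_nodes" / node_dir_name
    return [
        # Step 1: clone the repository into ComfyUI's custom nodes folder.
        ["git", "clone", repo_url, str(node_dir)],
        # Step 2: install the node's dependencies.
        # Assumption: the repository ships a requirements.txt file.
        [python_exe, "-m", "pip", "install", "-r", str(node_dir / "requirements.txt")],
    ]
```

To run the plan, pass each command to `subprocess.run(cmd, check=True)` in order, so that a failed clone stops the process before the dependency install. Downloading the weights and placing them in the models folder remains a manual step in this sketch.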