rinna releases Japanese Stable Diffusion with Japanese prompts and 2-stage training
AI Impact Summary
rinna has released a Japanese-specific fine-tuning of Stable Diffusion, so that prompts written natively in Japanese generate culturally aligned imagery. The model is trained in two stages: first, the English text encoder is replaced with a Japanese text encoder; second, that encoder and the latent diffusion model are fine-tuned jointly. A Japanese tokenizer is used alongside the new encoder to avoid the tokenization issues of the CLIP tokenizer. Training used roughly 100M images with Japanese captions, including the Japanese subset of LAION-5B, filtered with japanese-cloob-vit-b-16. The model is hosted on Hugging Face and GitHub and integrates with the Diffusers ecosystem. The release broadens the multilingual reach of image generation and reduces the need to translate prompts or post-edit outputs to achieve authentic Japanese visuals.
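The caption filtering step mentioned above can be illustrated with a minimal sketch. It assumes that a model such as japanese-cloob-vit-b-16 has already produced an image embedding and a text embedding for each pair; the toy vectors, the `filter_pairs` helper, and the 0.3 threshold are hypothetical and are not taken from rinna's pipeline.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (caption, image_embedding, text_embedding); embeddings here are toy values,
# standing in for what an image-text model would produce (assumption).
Pair = Tuple[str, Vector, Vector]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], threshold: float) -> List[str]:
    """Keep captions whose image and text embeddings score at or above the threshold."""
    return [
        caption
        for caption, image_emb, text_emb in pairs
        if cosine_similarity(image_emb, text_emb) >= threshold
    ]


# Toy data: the first caption matches its image, the second does not.
pairs: List[Pair] = [
    ("富士山の写真", [1.0, 0.0], [0.9, 0.1]),
    ("猫のイラスト", [0.0, 1.0], [1.0, 0.0]),
]

# The threshold is an illustrative assumption, not a published value.
kept = filter_pairs(pairs, threshold=0.3)
print(kept)
```

In practice the embeddings would come from the filtering model itself and the threshold would be tuned against the dataset; the sketch only shows the score-then-drop pattern.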
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info