ChatGPT adds vision, hearing, and speaking capabilities
AI Impact Summary
ChatGPT now supports multimodal input/output: image understanding, audio input via speech-to-text, and responses delivered as speech. This enables new workflow options for chatbots, assistive apps, and customer-support interfaces by removing barriers to non-text prompts and voice interactions. For engineering teams, this requires updating client integrations to send images/audio in supported formats, planning for added media processing latency and costs, and tightening privacy controls around media data and transcription results.
Affected Systems
- Date
- Date not specified
- Change type
- capability
- Severity
- medium