MediumCapability

ChatGPT adds vision, hearing, and speaking capabilities

AI Impact Summary

ChatGPT now supports multimodal input/output: image understanding, audio input via speech-to-text, and responses delivered as speech. This enables new workflow options for chatbots, assistive apps, and customer-support interfaces by removing barriers to non-text prompts and voice interactions. For engineering teams, this requires updating client integrations to send images/audio in supported formats, planning for added media processing latency and costs, and tightening privacy controls around media data and transcription results.

Affected Systems

ChatGPTOpenAI API

Date: Date not specified
Change type: capability
Severity: medium

ChatGPT adds vision, hearing, and speaking capabilities

More from OpenAI

Get alerts for OpenAI