Models that can take images as input include o1, gpt-4.5-preview, gpt-4o, gpt-4o-mini, and gpt-4-turbo.
Compare different transcription models. (Your choice is saved automatically.)
Select your preferred microphone device for recording. (Your choice is saved automatically.)
Select the language of your audio for better accuracy. (Your choice is saved automatically.)
Optional text to guide the model's style or continue a previous audio segment.
Note: You are using the gpt-4o-transcribe model, which only supports JSON output. Other formats are converted on the server.
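As a rough illustration of the server-side conversion mentioned in the note above, the sketch below turns a JSON transcription body into plain text. The function name `convert_transcript`, the set of supported target formats, and the `text` field name are assumptions made for this example; the actual server code is not shown in this document and may differ.

```python
import json


def convert_transcript(json_body: str, target_format: str) -> str:
    """Convert a JSON transcription body into the requested output format.

    Hypothetical helper: assumes the JSON body carries the transcript
    in a top-level "text" field.
    """
    payload = json.loads(json_body)
    text = payload["text"]  # assumed field name

    if target_format == "json":
        # Pass the transcript through as JSON.
        return json.dumps({"text": text})
    if target_format == "text":
        # Plain text: trim surrounding whitespace, end with one newline.
        return text.strip() + "\n"
    # Formats that need timestamps cannot be derived from text alone.
    raise ValueError(f"Unsupported target format: {target_format}")


plain = convert_transcript('{"text": " Hello there. "}', "text")
```

Formats that depend on timing information are deliberately left out here, since a text-only JSON body does not contain what they need.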
Sampling temperature between 0 and 1. Higher values make output more random.
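To make the effect of temperature concrete, here is a minimal, generic sketch of temperature-scaled softmax. It illustrates the general sampling technique only and is not the transcription model's internal code; the scores in `logits` are made-up values.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # sharply peaked distribution
high = softmax_with_temperature(logits, 1.0)  # flatter distribution
```

At the lower temperature nearly all of the probability sits on the top candidate, so sampling is close to deterministic; at the higher temperature the alternatives get a larger share, so the output varies more.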
Include confidence scores for transcribed tokens (works with gpt-4o-transcribe models).
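Confidence scores of this kind are commonly reported as log probabilities. The sketch below converts them to ordinary probabilities and flags uncertain tokens. The list-of-dicts shape with `token` and `logprob` keys, the sample values, and the 0.5 threshold are assumptions for illustration; check the actual response before relying on them.

```python
import math


def token_confidences(logprobs):
    """Convert per-token log probabilities into (token, probability) pairs."""
    return [(item["token"], math.exp(item["logprob"])) for item in logprobs]


# Hypothetical data; the real response shape may differ.
sample = [
    {"token": "Hello", "logprob": -0.01},
    {"token": " world", "logprob": -1.2},
]

pairs = token_confidences(sample)
# Tokens the model was less sure about (threshold chosen arbitrarily).
low_confidence = [token for token, prob in pairs if prob < 0.5]
```

A log probability near 0 means the model was almost certain of the token, while more negative values indicate lower confidence.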
Voice used to read back transcribed text automatically.
Upload or record audio to transcribe it automatically. The results will appear here and be read aloud automatically. Note that processing may take 15-30 seconds, depending on audio length.