Google GenAI SDK (Gemini Live)
This guide shows how to integrate the Telcoflow SDK with the Google GenAI SDK for bidirectional real-time audio streaming with Gemini’s native audio model.
Overview
The integration bridges two real-time streams:
- Caller audio -> Gemini: Incoming phone audio is forwarded to a Gemini Live session
- Gemini audio -> Caller: Gemini’s voice responses are sent back to the caller via send_audio()
Interruption handling is built in: when Gemini detects the user is speaking over the model, the outgoing audio buffer is cleared instantly.
Full Example
How It Works
Stream to Gemini
The stream_to_gemini() coroutine reads audio chunks from call.audio_stream() and forwards them to the Gemini Live session using send_realtime_input(). The audio is wrapped in a types.Blob with the PCM MIME type.
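The forwarding loop described above can be sketched roughly as follows. This is a minimal sketch, not the guide's original example: it assumes call.audio_stream() is an async iterator that yields raw PCM bytes, and the pcm_blob helper and make_blob parameter are hypothetical names introduced here so the loop can be exercised without the SDK installed.

```python
PCM_MIME_TYPE = "audio/pcm;rate=24000"  # format named in the Audio Format section


def pcm_blob(chunk: bytes):
    """Wrap raw PCM bytes in a types.Blob (requires the google-genai package)."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from google.genai import types

    return types.Blob(data=chunk, mime_type=PCM_MIME_TYPE)


async def stream_to_gemini(call, session, make_blob=pcm_blob):
    """Forward each caller audio chunk to the Gemini Live session."""
    # Assumption: call.audio_stream() yields bytes until the call ends.
    async for chunk in call.audio_stream():
        await session.send_realtime_input(audio=make_blob(chunk))
```

In a real call, make_blob would be left at its default so that each chunk is wrapped in a types.Blob as the text describes.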
Receive from Gemini
The receive_from_gemini() coroutine listens for Gemini responses:
- Interruption: When content.interrupted is True, the caller has started speaking over the model. clear_send_audio_buffer() is called to immediately stop any queued audio.
- Model audio: When content.model_turn contains inline_data, the raw audio bytes are sent to the caller via send_audio().
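The two cases above could be handled along these lines. This is a sketch under assumptions: content is taken to be the server_content field of each message yielded by session.receive(), and because this guide does not say whether send_audio() and clear_send_audio_buffer() are coroutines, the hypothetical _maybe_await helper accepts either form.

```python
import inspect


async def _maybe_await(result):
    """Await the result only if the SDK method returned an awaitable."""
    if inspect.isawaitable(result):
        return await result
    return result


async def receive_from_gemini(call, session):
    """Relay Gemini audio to the caller and honor interruptions."""
    async for response in session.receive():
        content = response.server_content
        if content is None:
            continue  # message carries no model content (e.g. setup or tool traffic)
        if content.interrupted:
            # The caller spoke over the model: drop audio that is still queued.
            await _maybe_await(call.clear_send_audio_buffer())
        if content.model_turn is None:
            continue
        for part in content.model_turn.parts or []:
            if part.inline_data and part.inline_data.data:
                await _maybe_await(call.send_audio(part.inline_data.data))
```

To my understanding, session.receive() in the google-genai SDK stops iterating once a model turn completes, so a long-lived call would probably wrap this loop in an outer loop that runs until the call ends.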
Concurrency
Both coroutines run concurrently via asyncio.gather(). This allows the system to simultaneously listen to the caller and send AI responses without blocking.
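To illustrate the concurrency pattern in isolation, the sketch below replaces both coroutines with stand-ins that only record what they would do. The log list and the bridge() function are hypothetical names used for illustration; only asyncio.gather() comes from the text above.

```python
import asyncio


async def stream_to_gemini(log):
    """Stand-in for the caller -> Gemini direction."""
    for i in range(3):
        log.append(f"caller->gemini {i}")
        await asyncio.sleep(0)  # yield control, as awaiting real I/O would


async def receive_from_gemini(log):
    """Stand-in for the Gemini -> caller direction."""
    for i in range(3):
        log.append(f"gemini->caller {i}")
        await asyncio.sleep(0)


async def bridge():
    """Run both directions on one event loop and return the interleaved log."""
    log = []
    await asyncio.gather(stream_to_gemini(log), receive_from_gemini(log))
    return log
```

Running asyncio.run(bridge()) shows entries from both directions interleaved rather than one direction finishing first. Note that asyncio.gather() does not cancel the other coroutine when one of them raises, so a production bridge may want explicit cancellation when the call ends.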
Environment Variables
Audio Format
Both Telcoflow and Gemini native audio use PCM 16-bit linear, 24kHz, mono (audio/pcm;rate=24000). No transcoding is needed.
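For sizing buffers, the byte rate follows directly from the stated format: 24,000 samples per second, 2 bytes per sample, 1 channel. The helper names below are hypothetical and the functions are simple arithmetic, not part of either SDK.

```python
SAMPLE_RATE_HZ = 24_000   # 24 kHz
BYTES_PER_SAMPLE = 2      # 16-bit linear PCM
CHANNELS = 1              # mono

BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def pcm_bytes(duration_ms: int) -> int:
    """Number of bytes needed to hold duration_ms of audio in this format."""
    return BYTES_PER_SECOND * duration_ms // 1000


def chunk_duration_ms(num_bytes: int) -> float:
    """Playback duration, in milliseconds, of a chunk of num_bytes."""
    return num_bytes * 1000 / BYTES_PER_SECOND
```

For example, one second of audio occupies 48,000 bytes, and a 20 ms frame occupies 960 bytes.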
Next Steps
- Google ADK Integration - For structured multi-agent orchestration
- Audio Streaming - Buffer management and interruption handling
- Use Cases - Apply this integration to real-world scenarios
