- TTFT,
- instruction following,
- function calling, and
- natural conversation dynamics.
GPT-4o was ahead on all of these attributes by a wide enough margin that using any other LLM for voice AI mostly didn't make sense. Now there's competition!
- TTFT,
- instruction following,
- function calling, and
- natural conversation dynamics.
GPT-4o was ahead on all of these attributes by a wide enough margin that using any other LLM for voice AI mostly didn't make sense. Now there's competition!
There are several ways to store (and later, retrieve) conversation state. One of the simplest is just to define a couple of functions and use your local filesystem!
Here, @chadbailey.net shows how to do that, using Gemini 2.0 Flash.
There are several ways to store (and later, retrieve) conversation state. One of the simplest is just to define a couple of functions and use your local filesystem!
Here, @chadbailey.net shows how to do that, using Gemini 2.0 Flash.
Sean is the creator of Pion. Pion is an Open Source WebRTC implementation that is influential and very widely used (including at OpenAI, where Sean works).
Sean is the creator of Pion. Pion is an Open Source WebRTC implementation that is influential and very widely used (including at OpenAI, where Sean works).
Filipi added support in Pipecat for Google Gemini's `groundingMetadata`. This makes it easy to do things like:
- link to URLs
- log searches for observability
- use specific search result chunks for RAG
youtu.be/oL9w-3Hbag0
Filipi added support in Pipecat for Google Gemini's `groundingMetadata`. This makes it easy to do things like:
- link to URLs
- log searches for observability
- use specific search result chunks for RAG
youtu.be/oL9w-3Hbag0
Adrian built a Gemini voice + vision AI agent that writes software indirectly, collaborating with a human and with Gemini running inside Cursor. Really nice glimpse of the future (and nice example of a "multi-agent" architecture).
youtu.be/0VFZWZfU0vw
Adrian built a Gemini voice + vision AI agent that writes software indirectly, collaborating with a human and with Gemini running inside Cursor. Really nice glimpse of the future (and nice example of a "multi-agent" architecture).
youtu.be/0VFZWZfU0vw
31 entries in the Changelog, including:
🌓 Frame observers — for implementing loggers, debuggers, and pipeline tools
🌓 Heartbeat frames — pipeline traversal timing and warnings if system frames get blocked anywhere in the pipeline
github.com/pipecat-ai/p...
31 entries in the Changelog, including:
🌓 Frame observers — for implementing loggers, debuggers, and pipeline tools
🌓 Heartbeat frames — pipeline traversal timing and warnings if system frames get blocked anywhere in the pipeline
github.com/pipecat-ai/p...
Here's an iOS starter project that shows:
- how to to use the Gemini search built-in tool
- combining the built-in search with custom functions
Here's the code: github.com/pipecat-ai/p...
youtube.com/shorts/7jX7l...
Here's an iOS starter project that shows:
- how to to use the Gemini search built-in tool
- combining the built-in search with custom functions
Here's the code: github.com/pipecat-ai/p...
youtube.com/shorts/7jX7l...
u are here ⤵️
◻️◻️◻️◻️🟦◻️◻️◻️◻️
◻️◻️◻️🟦🟦🟦◻️◻️◻️
◻️◻️🟦🟦🟦🟦🟦◻️◻️
◻️🟦🟦🟦🟦🟦🟦🟦◻️
🟦🟦🟦🟦🟦🟦🟦🟦🟦
Network transport ▶️ Turn detection ▶️ Interruption handling ▶️ Natural voices ▶️ Tool use
www.youtube.com/watch?v=tAQW...
u are here ⤵️
◻️◻️◻️◻️🟦◻️◻️◻️◻️
◻️◻️◻️🟦🟦🟦◻️◻️◻️
◻️◻️🟦🟦🟦🟦🟦◻️◻️
◻️🟦🟦🟦🟦🟦🟦🟦◻️
🟦🟦🟦🟦🟦🟦🟦🟦🟦
Network transport ▶️ Turn detection ▶️ Interruption handling ▶️ Natural voices ▶️ Tool use
www.youtube.com/watch?v=tAQW...
Thank you to panelists Karan Goel, Niamh Gavin, Shrestha Basu Mallick, and Swyx.
And thank you to Chroma for hosting the meetup in their fantastic office in SF.
www.youtube.com/live/B6zTwHh...
Thank you to panelists Karan Goel, Niamh Gavin, Shrestha Basu Mallick, and Swyx.
And thank you to Chroma for hosting the meetup in their fantastic office in SF.
www.youtube.com/live/B6zTwHh...
Tech:
- Google Gemini 2.0
- Pipecat
- Deepgram
- Cartesia
Code is here: github.com/pipecat-ai/s...
youtu.be/q6v-3BQem3Y
Tech:
- Google Gemini 2.0
- Pipecat
- Deepgram
- Cartesia
Code is here: github.com/pipecat-ai/s...
youtu.be/q6v-3BQem3Y
Nice walk-through from Paul: youtu.be/nU3K8h_pkeQ
📣 set up a voice client in your iOS app
📶 specify WebSockets or WebRTC for network transport
🏓 attach a delegate to handle lifecycle events (for example "connected", "LLM ready")
Nice walk-through from Paul: youtu.be/nU3K8h_pkeQ
📣 set up a voice client in your iOS app
📶 specify WebSockets or WebRTC for network transport
🏓 attach a delegate to handle lifecycle events (for example "connected", "LLM ready")
Filipi Fuchter added an iOS example to the Pipecat "Simple Chatbot" repo. With the Pipecat iOS SDK, you can build apps that use Gemini Multimodal Live and Gemini Flash with WebRTC, WebSockets, and HTTP networking.
Filipi Fuchter added an iOS example to the Pipecat "Simple Chatbot" repo. With the Pipecat iOS SDK, you can build apps that use Gemini Multimodal Live and Gemini Flash with WebRTC, WebSockets, and HTTP networking.
open.spotify.com/episode/5Fjj...
open.spotify.com/episode/5Fjj...
lu.ma/it312noz
lu.ma/it312noz
The team at Coval does a lot of interesting work with synthetic data and evaluations for voice AI agents. I've been working with them on evals for Gemini 2.0. They wrote up some of their results so far, here:
www.coval.dev/blog/scripte...
The team at Coval does a lot of interesting work with synthetic data and evaluations for voice AI agents. I've been working with them on evals for Gemini 2.0. They wrote up some of their results so far, here:
www.coval.dev/blog/scripte...
Gemini 1.5 Flash occupies an interesting place in the capabilities matrix for voice AI. It's fast, very inexpensive, has a long context window, and has native audio input.
I've been experimenting with Gemini a lot. Here's an interesting Pipecat pipeline:
Gemini 1.5 Flash occupies an interesting place in the capabilities matrix for voice AI. It's fast, very inexpensive, has a long context window, and has native audio input.
I've been experimenting with Gemini a lot. Here's an interesting Pipecat pipeline: