Kwindla Hultman Kramer
kwindla.bsky.social
Low, low, low latency. Daily.co and Pipecat.ai
March Voice AI Meetup - Wednesday the 5th

lu.ma/ffpyl57n
February 17, 2025 at 1:58 AM
Gemini 2.0 Flash is competitive with GPT-4o on:
- time to first token (TTFT),
- instruction following,
- function calling, and
- natural conversation dynamics.

GPT-4o was ahead on all of these attributes by a wide enough margin that using any other LLM for voice AI mostly didn't make sense. Now there's competition!
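TTFT is the time between sending a request and receiving the first streamed token back, which matters a lot for voice because the agent can't start speaking until it arrives. Below is a minimal sketch of measuring it, assuming the model client returns its streaming response as an iterable of text chunks. `fake_llm_stream` is a hypothetical stand-in for a real client, not any provider's API.

```python
import time
from typing import Callable, Iterable


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> tuple[float, str]:
    """Return (seconds until the first chunk arrived, full concatenated text).

    start_stream is called inside the timed region, so the request itself is
    included whether the client sends it eagerly or lazily on the first read.
    """
    start = time.perf_counter()
    chunks = iter(start_stream())
    first = next(chunks, "")  # blocks until the first chunk (or end of stream)
    ttft = time.perf_counter() - start
    return ttft, first + "".join(chunks)


def fake_llm_stream() -> Iterable[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    time.sleep(0.05)  # simulated network + inference delay before the first token
    yield "Hello"
    yield ", world"


if __name__ == "__main__":
    ttft, text = measure_ttft(fake_llm_stream)
    print(f"TTFT: {ttft * 1000:.0f} ms, text: {text!r}")
```

Passing a function that *starts* the stream, rather than an already-open stream, keeps the measurement honest for SDKs that send the request as soon as the stream object is created.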
February 5, 2025 at 5:55 PM
Memory for voice AI agents (and composite function calling) ...

There are several ways to store (and later, retrieve) conversation state. One of the simplest is just to define a couple of functions and use your local filesystem!

Here, @chadbailey.net shows how to do that, using Gemini 2.0 Flash.
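Here is a minimal sketch of the filesystem approach. It assumes conversation state is a list of chat messages stored as one JSON file per user. The function names, the `conversation_memory` directory, and the tool-description format are hypothetical and not taken from the demo.

```python
import json
from pathlib import Path

# Hypothetical default location for stored conversation state.
MEMORY_DIR = Path("./conversation_memory")


def _path_for(user_id: str, memory_dir: Path) -> Path:
    # Only allow simple identifiers so an LLM-supplied id can't escape memory_dir.
    if not user_id.replace("-", "").replace("_", "").isalnum():
        raise ValueError(f"invalid user_id: {user_id!r}")
    return memory_dir / f"{user_id}.json"


def save_conversation(user_id: str, messages: list[dict], memory_dir: Path = MEMORY_DIR) -> Path:
    """Write one user's conversation history to <memory_dir>/<user_id>.json."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = _path_for(user_id, memory_dir)
    path.write_text(json.dumps(messages, indent=2))
    return path


def load_conversation(user_id: str, memory_dir: Path = MEMORY_DIR) -> list[dict]:
    """Return the saved history for user_id, or an empty list if there is none."""
    path = _path_for(user_id, memory_dir)
    if not path.exists():
        return []
    return json.loads(path.read_text())


# Generic JSON-schema tool descriptions an LLM could be given so it can decide
# when to call these. The LLM supplies only user_id; the app passes the messages.
_USER_ID_PARAMS = {
    "type": "object",
    "properties": {"user_id": {"type": "string"}},
    "required": ["user_id"],
}
MEMORY_TOOLS = [
    {
        "name": "save_conversation",
        "description": "Save the current conversation so it can be resumed later.",
        "parameters": _USER_ID_PARAMS,
    },
    {
        "name": "load_conversation",
        "description": "Load a previously saved conversation.",
        "parameters": _USER_ID_PARAMS,
    },
]
```

Keeping the message list on the application side means the LLM only has to produce a short identifier. The app fills in the history when it executes the function call.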
February 4, 2025 at 3:51 PM
Sean DuBois is one of my favorite people to talk to about WebRTC, audio and video, designing good libraries, and hacking in general.

Sean is the creator of Pion. Pion is an Open Source WebRTC implementation that is influential and very widely used (including at OpenAI, where Sean works).
February 3, 2025 at 8:25 PM
My favorite part of the DeepSeek-V3 Technical Report is the stuff about the all-to-all communication kernels. (Mostly in section 3.2.2, "Efficient Implementation of Cross-Node All-to-All Communication.")
January 30, 2025 at 8:46 PM
January 24, 2025 at 1:37 AM
Using Gemini search metadata in a voice AI application

Filipi added support in Pipecat for Google Gemini's `groundingMetadata`. This makes it easy to do things like:

- link to URLs
- log searches for observability
- use specific search result chunks for RAG
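Here is a minimal sketch of those three uses. It assumes the metadata arrives as the raw JSON dict returned by the Gemini API. The field names (`webSearchQueries`, `groundingChunks[].web`, `groundingSupports[].segment.text`, `groundingSupports[].groundingChunkIndices`) reflect our understanding of the API and should be checked against Google's reference. This is not Pipecat's implementation.

```python
import logging
from typing import Any

logger = logging.getLogger("search_observability")


def summarize_grounding(metadata: dict[str, Any]) -> dict[str, Any]:
    """Extract search queries, source links, and supported text segments."""
    # 1. Log the searches the model ran, for observability.
    queries = list(metadata.get("webSearchQueries", []))
    for q in queries:
        logger.info("model search query: %s", q)

    # 2. Collect source URLs so a client UI can link to them.
    sources = [
        {"title": c.get("web", {}).get("title", ""), "uri": c.get("web", {}).get("uri", "")}
        for c in metadata.get("groundingChunks", [])
    ]

    # 3. Pair each grounded text segment with the sources that support it,
    #    one way to pick specific search result chunks for a RAG step.
    supported = []
    for s in metadata.get("groundingSupports", []):
        text = s.get("segment", {}).get("text", "")
        idxs = s.get("groundingChunkIndices", [])
        uris = [sources[i]["uri"] for i in idxs if 0 <= i < len(sources)]
        supported.append({"text": text, "uris": uris})

    return {"queries": queries, "sources": sources, "supported": supported}
```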

youtu.be/oL9w-3Hbag0
Google Gemini search metadata
YouTube video by Daily
January 22, 2025 at 8:01 PM
Voice AI programming with Gemini² and Cursor

Adrian built a Gemini voice + vision AI agent that writes software indirectly, collaborating with a human and with Gemini running inside Cursor. Really nice glimpse of the future (and nice example of a "multi-agent" architecture).

youtu.be/0VFZWZfU0vw
Voice AI programming with Gemini² and Cursor
YouTube video by Daily
January 21, 2025 at 5:47 PM
January 19, 2025 at 9:54 PM
Pipecat 0.0.53 release is out today.

31 entries in the Changelog, including:

🌓 Frame observers — for implementing loggers, debuggers, and pipeline tools
🌓 Heartbeat frames — pipeline traversal timing and warnings if system frames get blocked anywhere in the pipeline
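The two ideas can be illustrated with a toy pipeline. This is a hypothetical sketch of the general pattern, not Pipecat's API. Observers are callbacks that see every frame leaving every processor. A heartbeat monitor is one such observer, and it warns when a heartbeat frame takes longer than a threshold to get through.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Frame:
    name: str
    created_at: float = field(default_factory=time.monotonic)


# Hypothetical observer signature: (processor_name, frame) -> None.
Observer = Callable[[str, Frame], None]


class Pipeline:
    """Toy linear pipeline: each processor is a function Frame -> Frame."""

    def __init__(self, processors: list[tuple[str, Callable[[Frame], Frame]]]):
        self.processors = processors
        self.observers: list[Observer] = []

    def add_observer(self, observer: Observer) -> None:
        self.observers.append(observer)

    def push(self, frame: Frame) -> Frame:
        for name, process in self.processors:
            frame = process(frame)
            # Observers see each frame as it leaves each processor,
            # without the processors needing to know they exist.
            for obs in self.observers:
                obs(name, frame)
        return frame


class HeartbeatMonitor:
    """Records a warning when a heartbeat frame is slow to traverse the pipeline."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s
        self.warnings: list[str] = []

    def __call__(self, processor: str, frame: Frame) -> None:
        if frame.name != "heartbeat":
            return
        elapsed = time.monotonic() - frame.created_at
        if elapsed > self.threshold_s:
            self.warnings.append(f"heartbeat delayed {elapsed:.3f}s at {processor}")
```

Because the monitor is just another observer, logging, debugging, and timing tools can all be attached the same way without touching the processors.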

github.com/pipecat-ai/p...
January 18, 2025 at 11:29 PM
Search is a built-in tool in the Gemini Multimodal Live API.

Here's an iOS starter project that shows:

- how to use the Gemini search built-in tool
- combining the built-in search with custom functions

Here's the code: github.com/pipecat-ai/p...
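In the Gemini API, built-in search and custom functions are both entries in the session's list of tools. Below is a sketch of that list as a Python dict. It assumes the `google_search` and `function_declarations` keys as we understand them from Google's API docs, so verify them against the current reference. The iOS starter project configures this in Swift and may differ. The `get_order_status` function is hypothetical.

```python
import json

# Hypothetical custom function; the name and parameters are illustrative only.
get_order_status = {
    "name": "get_order_status",
    "description": "Look up the status of a customer's order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Built-in search and custom function declarations side by side in one tools list.
tools = [
    {"google_search": {}},
    {"function_declarations": [get_order_status]},
]

if __name__ == "__main__":
    print(json.dumps(tools, indent=2))
```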

youtube.com/shorts/7jX7l...
Gemini Multimodal Live search tool + iOS native app
YouTube video by Daily
January 17, 2025 at 11:47 PM
Maslow's hierarchy of voice AI

u are here ⤵️
◻️◻️◻️◻️🟦◻️◻️◻️◻️
◻️◻️◻️🟦🟦🟦◻️◻️◻️
◻️◻️🟦🟦🟦🟦🟦◻️◻️
◻️🟦🟦🟦🟦🟦🟦🟦◻️
🟦🟦🟦🟦🟦🟦🟦🟦🟦

Network transport ▶️ Turn detection ▶️ Interruption handling ▶️ Natural voices ▶️ Tool use

www.youtube.com/watch?v=tAQW...
Pipecat Flows - open source Voice AI agent builder
YouTube video by Daily
January 17, 2025 at 6:46 PM
"Voice AI in 2025" panel recording from the Voice AI Meetup last night.

Thank you to panelists Karan Goel, Niamh Gavin, Shrestha Basu Mallick, and Swyx.

And thank you to Chroma for hosting the meetup in their fantastic office in SF.

www.youtube.com/live/B6zTwHh...
January 15, 2025 at 7:53 PM
Christian built a voice AI assistant to control Spotify.

Tech:
- Google Gemini 2.0
- Pipecat
- Deepgram
- Cartesia

Code is here: github.com/pipecat-ai/s...

youtu.be/q6v-3BQem3Y
Gemini + Pipecat demo — Spotify Voice AI Assistant
YouTube video by Daily
January 15, 2025 at 6:06 PM
Gemini Multimodal Live API + iOS + WebRTC

Nice walk-through from Paul: youtu.be/nU3K8h_pkeQ

📣 set up a voice client in your iOS app
📶 specify WebSockets or WebRTC for network transport
🏓 attach a delegate to handle lifecycle events (for example "connected", "LLM ready")
PipeCat iOS Client + Gemini Multimodal Live WebSocket API 🌟
YouTube video by Daily
January 14, 2025 at 5:59 PM
Sunday morning listening ... and hacking.
January 12, 2025 at 2:52 PM
Today's reminder of how early we are in the generative AI/deep learning technology transition: moved a moderately complex prompt to a different LLM and 150% of my evals broke. 150% because evals I didn't even have (but, obviously, needed) broke, too.
January 11, 2025 at 5:48 PM
They know what they’re doing over there in Cupertino (and Shenzhen).
January 10, 2025 at 11:13 PM
iOS + Gemini Multimodal Live + WebRTC

Filipi Fuchter added an iOS example to the Pipecat "Simple Chatbot" repo. With the Pipecat iOS SDK, you can build apps that use Gemini Multimodal Live and Gemini Flash with WebRTC, WebSockets, and HTTP networking.
January 10, 2025 at 7:23 PM
I had a lot of fun talking to Eric Landau about the state of Voice AI at the end of 2024, what's coming in 2025, what the pain points are today if you're scaling voice AI agents in production, and — of course — the importance of data tooling and evals.

open.spotify.com/episode/5Fjj...
Kwin Kramer | Building the Future of Real-Time AI with Daily and PipeCat: Insights on Multimodal Systems and Developer Tools
Deep Learning Leaders · Episode
January 10, 2025 at 4:56 AM
The first Voice AI Meetup of the new year is next week. If you'll be in the San Francisco area on Tuesday evening, come join the fun. It's been great to see the voice/conversational AI community grow over the past year.

lu.ma/it312noz
Voice AI Meetup — Voice AI in 2025 · Luma
Join us for the first Voice AI Meetup of the year, hosted by Daily and Chroma. As always, there will be food, great conversations, and a few demos. The main…
January 10, 2025 at 2:30 AM
Which LLM should you use for your voice agents?

The team at Coval does a lot of interesting work with synthetic data and evaluations for voice AI agents. I've been working with them on evals for Gemini 2.0. They wrote up some of their results so far, here:

www.coval.dev/blog/scripte...
Scripted Evaluation Framework for Large Language Models: A Controlled Approach to Comparative Analysis
January 10, 2025 at 12:13 AM
all you need is beam search
January 9, 2025 at 3:48 PM
The voice-to-voice AI Pareto frontier ...

Gemini 1.5 Flash occupies an interesting place in the capabilities matrix for voice AI. It's fast, very inexpensive, has a long context window, and has native audio input.

I've been experimenting with Gemini a lot. Here's an interesting Pipecat pipeline:
December 5, 2024 at 4:48 PM
Sunset. Double overhead day.
December 3, 2024 at 12:54 AM