Lightnews — Scholar-powered news

Kwindla Hultman Kramer

@kwindla.bsky.social

March Voice AI Meetup - Wednesday the 5th

lu.ma/ffpyl57n

February 17, 2025 at 1:58 AM

Kwindla Hultman Kramer

@kwindla.bsky.social

Gemini 2.0 Flash is competitive with GPT-4o on:
- TTFT,
- instruction following,
- function calling, and
- natural conversation dynamics.

GPT-4o was ahead on all of these attributes by a wide enough margin that using any other LLM for voice AI mostly didn't make sense. Now there's competition!

February 5, 2025 at 5:55 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Memory for voice AI agents (and composite function calling) ...

There are several ways to store (and later, retrieve) conversation state. One of the simplest is just to define a couple of functions and use your local filesystem!

Here, @chadbailey.net shows how to do that, using Gemini 2.0 Flash.
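
As a minimal sketch of the "couple of functions plus the local filesystem" idea (not the code from the linked demo), the functions below save a conversation as JSON, list the saved conversations, and load one back. The directory name, file naming scheme, and function names are all assumptions made for illustration. Registered as LLM tools, a model could chain `list_conversations` and then `load_conversation`, which is one way to read "composite function calling".

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical storage location; any writable directory works.
MEMORY_DIR = Path("conversation_memory")


def save_conversation(messages, memory_dir=MEMORY_DIR):
    """Write a list of message dicts to a timestamped JSON file; return its name."""
    memory_dir = Path(memory_dir)
    memory_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = memory_dir / f"{stamp}.json"
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")
    return path.name


def list_conversations(memory_dir=MEMORY_DIR):
    """Return saved conversation file names, oldest first."""
    memory_dir = Path(memory_dir)
    if not memory_dir.exists():
        return []
    return sorted(p.name for p in memory_dir.glob("*.json"))


def load_conversation(name, memory_dir=MEMORY_DIR):
    """Load one saved conversation by file name."""
    # Path(name).name drops any directory parts, so a model-supplied
    # name cannot point outside the memory directory.
    path = Path(memory_dir) / Path(name).name
    return json.loads(path.read_text(encoding="utf-8"))
```

Timestamped file names keep the listing in chronological order without a separate index.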

February 4, 2025 at 3:51 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Sean DuBois is one of my favorite people to talk to about WebRTC, audio and video, designing good libraries, and hacking in general.

Sean is the creator of Pion. Pion is an Open Source WebRTC implementation that is influential and very widely used (including at OpenAI, where Sean works).

February 3, 2025 at 8:25 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

My favorite part of the DeepSeek-V3 Technical Report is the stuff about the all-to-all communication kernels. (Mostly in section 3.2.2. "Efficient Implementation of Cross-Node All-to-All Communication.")

January 30, 2025 at 8:46 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

January 24, 2025 at 1:37 AM

Kwindla Hultman Kramer

@kwindla.bsky.social

Using Gemini search metadata in a voice AI application

Filipi added support in Pipecat for Google Gemini's `groundingMetadata`. This makes it easy to do things like:

- link to URLs
- log searches for observability
- use specific search result chunks for RAG

youtu.be/oL9w-3Hbag0
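
A rough sketch of the first two uses in the list above (linking to URLs and logging searches), written against a plain dictionary rather than through Pipecat. It assumes the metadata arrives in the camelCase JSON shape with `groundingChunks` and `webSearchQueries` keys; how Pipecat actually surfaces this data is not shown in the post, and the helper names here are hypothetical.

```python
import logging

logger = logging.getLogger("grounding")


def extract_sources(grounding_metadata):
    """Collect {title, url} pairs from the web chunks, skipping chunks without a URI."""
    sources = []
    for chunk in grounding_metadata.get("groundingChunks", []):
        web = chunk.get("web") or {}
        if web.get("uri"):
            sources.append({"title": web.get("title", ""), "url": web["uri"]})
    return sources


def extract_queries(grounding_metadata):
    """Return the search queries the model issued, if any were reported."""
    return list(grounding_metadata.get("webSearchQueries", []))


def log_searches(grounding_metadata):
    """Log each query and how many sources came back, for observability."""
    sources = extract_sources(grounding_metadata)
    for query in extract_queries(grounding_metadata):
        logger.info("search query=%r sources=%d", query, len(sources))
```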

Google Gemini search metadata

YouTube video by Daily

January 22, 2025 at 8:01 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Voice AI programming with Gemini² and Cursor

Adrian built a Gemini voice + vision AI agent that writes software indirectly, collaborating with a human and with Gemini running inside Cursor. Really nice glimpse of the future (and nice example of a "multi-agent" architecture).

youtu.be/0VFZWZfU0vw

YouTube video by Daily

January 21, 2025 at 5:47 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

January 19, 2025 at 9:54 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Pipecat 0.0.53 release is out today.

31 entries in the Changelog, including:

🌓 Frame observers — for implementing loggers, debuggers, and pipeline tools
🌓 Heartbeat frames — pipeline traversal timing and warnings if system frames get blocked anywhere in the pipeline

github.com/pipecat-ai/p...
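
To illustrate the general idea behind observers and heartbeat timing, here is a toy sketch. It is not Pipecat's API: the `Frame`, `TimingObserver`, and `run_pipeline` names are hypothetical, and the pipeline is a plain synchronous loop. The observer sees every frame after each processor, records how long the frame has been in flight, and notes a warning when a heartbeat frame is older than a threshold.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Toy frame: a name plus the monotonic time at which it was created."""
    name: str
    created_at: float = field(default_factory=time.monotonic)


class TimingObserver:
    """Records frame age at each processor and flags slow heartbeat frames."""

    def __init__(self, warn_after_secs=1.0):
        self.warn_after_secs = warn_after_secs
        self.events = []    # (processor_name, frame_name, age_in_seconds)
        self.warnings = []  # human-readable warning strings

    def on_frame(self, processor_name, frame):
        age = time.monotonic() - frame.created_at
        self.events.append((processor_name, frame.name, age))
        if frame.name == "heartbeat" and age > self.warn_after_secs:
            self.warnings.append(
                f"heartbeat took {age:.3f}s to reach {processor_name}"
            )


def run_pipeline(processors, frame, observers):
    """Pass the frame through each (name, function) pair, notifying observers."""
    for name, process in processors:
        frame = process(frame)
        for observer in observers:
            observer.on_frame(name, frame)
    return frame
```

Because the observer only watches frames and never modifies them, loggers and debuggers built this way stay out of the processing path.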

January 18, 2025 at 11:29 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Search is a built-in tool in the Gemini Multimodal Live API.

Here's an iOS starter project that shows:

- how to use the Gemini search built-in tool
- combining the built-in search with custom functions

Here's the code: github.com/pipecat-ai/p...

youtube.com/shorts/7jX7l...

Gemini Multimodal Live search tool + iOS native app

YouTube video by Daily

January 17, 2025 at 11:47 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Maslow's hierarchy of voice AI

u are here ⤵️
◻️◻️◻️◻️🟦◻️◻️◻️◻️
◻️◻️◻️🟦🟦🟦◻️◻️◻️
◻️◻️🟦🟦🟦🟦🟦◻️◻️
◻️🟦🟦🟦🟦🟦🟦🟦◻️
🟦🟦🟦🟦🟦🟦🟦🟦🟦

Network transport ▶️ Turn detection ▶️ Interruption handling ▶️ Natural voices ▶️ Tool use

www.youtube.com/watch?v=tAQW...

Pipecat Flows - open source Voice AI agent builder

YouTube video by Daily

January 17, 2025 at 6:46 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

"Voice AI in 2025" panel recording from the Voice AI Meetup last night.

Thank you to panelists Karan Goel, Niamh Gavin, Shrestha Basu Mallick, and Swyx.

And thank you to Chroma for hosting the meetup in their fantastic office in SF.

www.youtube.com/live/B6zTwHh...

January 15, 2025 at 7:53 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Christian built a voice AI assistant to control Spotify.

Tech:
- Google Gemini 2.0
- Pipecat
- Deepgram
- Cartesia

Code is here: github.com/pipecat-ai/s...

youtu.be/q6v-3BQem3Y

Gemini + Pipecat demo — Spotify Voice AI Assistant

YouTube video by Daily

January 15, 2025 at 6:06 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Gemini Multimodal Live API + iOS + WebRTC

Nice walk-through from Paul: youtu.be/nU3K8h_pkeQ

📣 set up a voice client in your iOS app
📶 specify WebSockets or WebRTC for network transport
🏓 attach a delegate to handle lifecycle events (for example "connected", "LLM ready")

PipeCat iOS Client + Gemini Multimodal Live WebSocket API 🌟

YouTube video by Daily

January 14, 2025 at 5:59 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Sunday morning listening ... and hacking.

January 12, 2025 at 2:52 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Today's reminder of how early we are in the generative AI/deep learning technology transition: moved a moderately complex prompt to a different LLM and 150% of my evals broke. 150% because evals I didn't even have (but, obviously, needed) broke, too.

January 11, 2025 at 5:48 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

They know what they’re doing over there in Cupertino (and Shenzhen).

January 10, 2025 at 11:13 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

iOS + Gemini Multimodal Live + WebRTC

Filipi Fuchter added an iOS example to the Pipecat "Simple Chatbot" repo. With the Pipecat iOS SDK, you can build apps that use Gemini Multimodal Live and Gemini Flash with WebRTC, WebSockets, and HTTP networking.

January 10, 2025 at 7:23 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

I had a lot of fun talking to Eric Landau about the state of Voice AI at the end of 2024, what's coming in 2025, what the pain points are today if you're scaling voice AI agents in production, and — of course — the importance of data tooling and evals.

open.spotify.com/episode/5Fjj...

Kwin Kramer | Building the Future of Real-Time AI with Daily and PipeCat: Insights on Multimodal Systems and Developer Tools

Deep Learning Leaders · Episode

January 10, 2025 at 4:56 AM

Kwindla Hultman Kramer

@kwindla.bsky.social

The first Voice AI Meetup of the new year is next week. If you'll be in the San Francisco area on Tuesday evening, come join the fun. It's been great to see the voice/conversational AI community grow over the past year.

lu.ma/it312noz

Voice AI Meetup — Voice AI in 2025 · Luma

Join us for the first Voice AI Meetup of the year, hosted by Daily and Chroma. As always, there will be food, great conversations, and a few demos. The main…

January 10, 2025 at 2:30 AM

Kwindla Hultman Kramer

@kwindla.bsky.social

Which LLM should you use for your voice agents?

The team at Coval does a lot of interesting work with synthetic data and evaluations for voice AI agents. I've been working with them on evals for Gemini 2.0. They wrote up some of their results so far, here:

www.coval.dev/blog/scripte...

Scripted Evaluation Framework for Large Language Models: A Controlled Approach to Comparative Analysis

Coval is a simulation & evaluation platform for voice and chat agents. Start your free trial today!

January 10, 2025 at 12:13 AM

Kwindla Hultman Kramer

@kwindla.bsky.social

all you need is beam search

January 9, 2025 at 3:48 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

The voice-to-voice AI Pareto frontier ...

Gemini 1.5 Flash occupies an interesting place in the capabilities matrix for voice AI. It's fast, very inexpensive, has a long context window, and has native audio input.

I've been experimenting with Gemini a lot. Here's an interesting Pipecat pipeline:

December 5, 2024 at 4:48 PM

Kwindla Hultman Kramer

@kwindla.bsky.social

Sunset. Double overhead day.

December 3, 2024 at 12:54 AM
