Author | Lightnews

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

2025 is dubbed the Year of Evaluation as agentic systems grow, forcing the C‑suite to treat evaluation as a core KPI. Enterprises are now budgeting for whole‑system monitoring, not just single models. Watch the talk: https://youtu.be/CQGuvf6gSrM #AIevaluation #AgenticSystems #EnterpriseAI

Thumbnail for YouTube video: 2025 is the Year of Evals! Just like 2024, and 2023, and … — John Dickerson, CEO Mozilla AI

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

v0’s demo shows that real‑user data is key to catching hallucinations: build deterministic evals, visualize failures like missed shots on a basketball court, and plug them into CI to pre‑empt regressions. https://youtu.be/L8OoYeDI_ls #LLMEvals #AIops

Thumbnail for YouTube video: Evals Are Not Unit Tests — Ido Pesok, Vercel v0

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

The speaker argues the 'Bitter Lesson' shows data‑driven scaling beats domain expertise; he proposes DSPy signatures and evals to decouple system design from models, reducing technical debt. https://youtu.be/qdmxApz3EJI #AI #ML #LLM

Thumbnail for YouTube video: On Engineering AI Systems that Endure The Bitter Lesson - Omar Khattab, DSPy & Databricks

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Jeff Huber and Jason Liu argue that retrieval is the real bottleneck in retrieval‑augmented LLMs. Fast evals on synthetic queries beat public benchmarks, and clustering conversation metadata turns usage patterns into product decisions. https://youtu.be/jryZvCuA0Uc #AIProduct #DataDriven

Thumbnail for YouTube video: How to look at your data — Jeff Huber (Chroma) + Jason Liu (567)

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Conference opening stresses AI engineering’s maturity and urges the community to co‑define standard models such as LLMOS, LLM‑SDLC, and SPADE, prioritizing human input/output ratio over terminology. https://youtu.be/IHkyFhU6JEY #AIEngineering #StandardModels #SPADE

Thumbnail for YouTube video: Designing AI-Intensive Applications - swyx

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Braintrust’s new Loop agent turns the manual eval loop into an automated, data‑driven optimization cycle using frontier LLMs like Claude 4. It integrates side‑by‑side previews and an auto‑apply toggle, speeding up AI product iteration. https://youtu.be/MC55hdWLq4o #AIEngineering #EvalAutomation

Thumbnail for YouTube video: The Future of Evals - Ankur Goyal, Braintrust

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

OpenAI execs highlight that future AGI relies on a mix of compute‑optimized and latency‑optimized GPUs, with RLHF driving reliability. The shift to domain‑specific agents means engineers will focus on model orchestration, not just code. https://youtu.be/avWhreBUYF0 #AI #AGI

Thumbnail for YouTube video: #define AI Engineer - Greg Brockman, OpenAI (ft. Jensen Huang)

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

The simplest way to keep an icon visible while truncating text is a single TextView with a compound drawable. No extra layout needed; ellipsis appears before the icon automatically. Works on all API levels. https://youtu.be/L8-5ezsoI5A #AndroidDev #TextView #UI

Thumbnail for YouTube video: The Next Unicorns: 7 Top AI startups from the HF0 Residency

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Patho.ai’s KAG approach shows how embedding a structured knowledge graph into an LLM enables multi‑step reasoning and numeric inference beyond simple retrieval. The system’s multi‑agent orchestration turns raw data into actionable business insights. https://youtu.be/9AQOvT8LnMI #KAG #KnowledgeGraphs

Thumbnail for YouTube video: Wisdom-Driven Knowledge Augmented Generation at Scale - Chin Keong Lam, Patho AI

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Cisco’s Outshift demo shows how a multi‑agent AI system, built on an OpenConfig knowledge graph, turns a ServiceNow ticket into automated testing and reporting, cutting change‑failure rates. https://youtu.be/m0dxZ-NDKHo #NetworkAutomation #OpenConfig

Thumbnail for YouTube video: Multi Agent AI and Network Knowledge Graphs for Change — Ola Mabadeje, Cisco

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Haize Labs treats prompt injection as an optimization problem, using gradient‑guided token edits and agent‑based judges to uncover jailbreaks in minutes. This turns months of manual testing into a rapid, automated pipeline. https://youtu.be/OMGPvW8TBHc #LLMsecurity #AIhazing

Thumbnail for YouTube video: Fuzzing in the GenAI Era — Leonard Tang, Haize Labs

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Generative UI and LLMs dissolve design‑engineering silos, treating AI as a co‑worker and stressing a material‑first approach. V0 prototypes reveal emergent features. https://youtu.be/CiMVKnX-CNI #AIUX #LLMDesign #GenerativeUI

Thumbnail for YouTube video: Form factors for your new AI coworkers — Craig Wattrus, Flatfile

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

BlackRock’s AI‑app framework demonstrates that domain‑specific prompt engineering and a sandbox/factory architecture can accelerate investment‑operations workflows while embedding mandatory compliance checkpoints. https://youtu.be/08mH36_NVos #AIinFinance #PromptEngineering

Thumbnail for YouTube video: How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

Current AI metrics ignore human perception. Rodriguez shows how JPEG’s perceptual tricks inspire better evaluation, urging metrics that learn from human aesthetic judgment. Watch the full talk: https://youtu.be/h5ItAJuB3Fc #AI #HumanPerception #Evaluation

Thumbnail for YouTube video: Perceptual Evaluations: Evals for Aesthetics — Diego Rodriguez, Krea.ai

lvrgd.bsky.social @lvrgd.bsky.social · Aug 25

The talk shows that an eval system is an engineered, automated artifact; when it demonstrates clear business value—like a 1‑day model rollout—it speaks for itself. https://youtu.be/a4BV0gGmXgA #LLMEvaluation #ProductEngineering