lvrgd.bsky.social
18 followers 3 following 15 posts
2025 is dubbed the Year of Evaluation as agentic systems grow, forcing the C‑suite to treat evaluation as a core KPI. Enterprises are now budgeting for whole‑system monitoring, not just single models. Watch the talk: https://youtu.be/CQGuvf6gSrM #AIevaluation #AgenticSystems #EnterpriseAI
V0’s demo shows real‑user data is key to catching hallucinations: build deterministic evals, visualize failures like shots on a basketball court, and plug the evals into CI to pre‑empt regressions. https://youtu.be/L8OoYeDI_ls #LLMEvals #AIops
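A minimal sketch of what a deterministic eval could look like, assuming a plain substring check per prompt. `EvalCase`, `run_evals`, and `fake_model` are hypothetical names for illustration and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # substring a correct answer has to include


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> List[EvalCase]:
    """Return the failing cases. The check is a plain string match, so
    the same model output always produces the same verdict."""
    return [c for c in cases if c.must_contain.lower() not in model(c.prompt).lower()]


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    canned = {"capital of France?": "The capital of France is Paris."}
    return canned.get(prompt, "I am not sure.")


CASES = [
    EvalCase("capital of France?", "paris"),
    EvalCase("capital of Japan?", "tokyo"),
]

failures = run_evals(fake_model, CASES)
pass_rate = 1 - len(failures) / len(CASES)
```

In CI, a script like this could exit non‑zero when `pass_rate` falls below a chosen threshold, so a regression blocks the merge.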
The speaker argues the 'Bitter Lesson' shows data‑driven scaling beats domain expertise; he proposes DSPy signatures and evals to decouple system design from models, reducing technical debt. https://youtu.be/qdmxApz3EJI #AI #ML #LLM
Jeff R. shows retrieval is the real bottleneck in retrieval‑augmented LLMs. Fast evals on synthetic queries beat public benchmarks, and clustering conversation metadata turns usage patterns into product decisions. https://youtu.be/jryZvCuA0Uc #AIProduct #DataDriven
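A rough sketch of a fast retrieval eval on synthetic queries, assuming each synthetic query is paired with the one document it was generated from. The toy keyword retriever, `DOCS`, and `SYNTHETIC` are made up for illustration; a real system would swap in its own retriever and data.

```python
from typing import Dict, List, Tuple


def keyword_retriever(query: str, docs: Dict[str, str]) -> List[str]:
    """Toy retriever: rank documents by how many lowercase words they share with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        docs.items(),
        key=lambda item: -len(q_words & set(item[1].lower().split())),
    )
    return [doc_id for doc_id, _ in ranked]


def recall_at_k(ranked_ids: List[str], relevant_id: str, k: int) -> float:
    """1.0 if the source document appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


# Hypothetical corpus and synthetic (query, source document) pairs.
DOCS = {
    "d1": "reset your password from the account settings page",
    "d2": "invoices are emailed on the first day of each month",
    "d3": "export data as csv from the reports tab",
}
SYNTHETIC: List[Tuple[str, str]] = [
    ("how do i reset my password", "d1"),
    ("when are invoices emailed", "d2"),
    ("can i export reports as csv", "d3"),
]

scores = [recall_at_k(keyword_retriever(q, DOCS), doc_id, k=1) for q, doc_id in SYNTHETIC]
mean_recall = sum(scores) / len(scores)
```

Because nothing here calls an LLM, the whole loop runs in milliseconds, which is what makes it practical to rerun after every retrieval change.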
Conference opening stresses AI engineering’s maturity and urges the community to co‑define standard models such as LLMOS, LLM‑SDLC, and SPADE, prioritizing human input/output ratio over terminology. https://youtu.be/IHkyFhU6JEY #AIEngineering #StandardModels #SPADE
Braintrust’s new Loop agent turns the manual eval loop into an automated, data‑driven optimization cycle using frontier LLMs like Claude 4. It integrates side‑by‑side previews and an auto‑apply toggle, speeding up AI product iteration. https://youtu.be/MC55hdWLq4o #AIEngineering #EvalAutomation
OpenAI execs highlight that future AGI relies on a mix of compute‑optimized and latency‑optimized GPUs, with RLHF driving reliability. The shift to domain‑specific agents means engineers will focus on model orchestration, not just code. https://youtu.be/avWhreBUYF0 #AI #AGI
The simplest way to keep an icon visible while truncating text is a single TextView with a compound drawable. No extra layout needed; ellipsis appears before the icon automatically. Works on all API levels. https://youtu.be/L8-5ezsoI5A #AndroidDev #TextView #UI
Patho.ai’s KAG approach shows how embedding a structured knowledge graph into an LLM enables multi‑step reasoning and numeric inference beyond simple retrieval. The system’s multi‑agent orchestration turns raw data into actionable business insights. https://youtu.be/9AQOvT8LnMI #KAG #KnowledgeGraphs
Cisco’s Outshift demo shows how a multi‑agent AI system, built on an OpenConfig knowledge graph, turns a ServiceNow ticket into automated testing and reporting, cutting change‑failure rates. https://youtu.be/m0dxZ-NDKHo #NetworkAutomation #OpenConfig
Hazing treats prompt‑injection as an optimization problem, using gradient‑guided token edits and agent‑based judges to uncover jailbreaks in minutes. This turns months of manual testing into a rapid, automated pipeline. https://youtu.be/OMGPvW8TBHc #LLMsecurity #AIhazing
Generative UI and LLMs dissolve design‑engineering silos, treating AI as a co‑worker and stressing a material‑first approach. V0 prototypes reveal emergent features. https://youtu.be/CiMVKnX-CNI #AIUX #LLMDesign #GenerativeUI
BlackRock’s AI‑app framework demonstrates that domain‑specific prompt engineering and a sandbox/factory architecture can accelerate investment‑operations workflows while embedding mandatory compliance checkpoints. https://youtu.be/08mH36_NVos #AIinFinance #PromptEngineering
Current AI metrics ignore human perception. Rodriguez shows how JPEG’s perceptual tricks inspire better evaluation, urging metrics that learn from human aesthetic judgment. Watch the full talk: https://youtu.be/h5ItAJuB3Fc #AI #HumanPerception #Evaluation
The talk shows that an eval system is an engineered, automated artifact; when it demonstrates clear business value, such as a 1‑day model rollout, it speaks for itself. https://youtu.be/a4BV0gGmXgA #LLMEvaluation #ProductEngineering