app.phoenix.arize.com
It shows how Phoenix helps teams inspect agent reasoning, tool selection, & control flow, making NAT-based agents easier to debug, evaluate, and run in production.
Taught by @mcbrayer.bsky.social
For enterprises that can't use cloud SSO, Phoenix integrates directly with your internal directory—Active Directory, OpenLDAP, or any LDAP v3 server.
Your data. Your infrastructure. Your identity provider.
AI development has two loops. Meta-evaluation lives in the inner loop.
We also walked through a live demo of this loop in practice, iteratively improving the judge and showing measurable gains at each step.
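To picture that inner loop: score the judge itself against a small set of human labels, revise the judge prompt, and re-measure. A minimal sketch (the labels and the judge_v1/judge_v2 names are placeholders, not from the session):

```python
# Minimal meta-eval loop: how often does the LLM judge agree with human labels?
human_labels = {"ex1": "correct", "ex2": "incorrect", "ex3": "correct"}  # placeholder labels

def judge_agreement(judge) -> float:
    """Fraction of human-labeled examples where the judge's label matches."""
    return sum(judge(ex_id) == label for ex_id, label in human_labels.items()) / len(human_labels)

# judge_v1, judge_v2, ... would be successive revisions of your judge prompt:
# print(judge_agreement(judge_v1), judge_agreement(judge_v2))
```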
Our commitment: privacy and security are foundational
What we're collecting: Simple, anonymous usage stats
Opt out: Set PHOENIX_TELEMETRY_ENABLED=false—no questions asked.
github.com/Arize-ai/ph...
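If you launch Phoenix from Python, one way to opt out is to set the variable before the app starts; exporting it in your shell, Dockerfile, or deployment config works the same way. A minimal sketch:

```python
import os

# Opt out of anonymous usage stats before Phoenix starts
os.environ["PHOENIX_TELEMETRY_ENABLED"] = "false"

import phoenix as px

px.launch_app()  # Phoenix runs as usual, with telemetry disabled
```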
🧵👇
It's a simple API, but it unlocks an important workflow for LLM evaluation: open coding. Here's why this matters.
Covered the basics of observability + evals, and showed via a Mastra agent how to set up tracing, run evals, & start your iteration cycle.
Check it out here 🚀 Watch the session below 👇
www.youtube.com/watch?v=qQGQ...
Learn the core workflows to:
1. Interpret agent traces
2. Define tasks & build datasets for experiments
3. Construct purposeful LLM evals
4. Iterate based on results to improve reliability.
Choose your language & dive in!⬇️
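On the Python side, those four steps map roughly onto a dataset upload, a task function, an evaluator, and run_experiment. A hedged sketch (my_agent, the example data, and the names are placeholders):

```python
import pandas as pd
import phoenix as px
from phoenix.experiments import run_experiment

client = px.Client()

# 2. Build a small dataset of inputs and reference outputs
df = pd.DataFrame(
    {"question": ["What does Phoenix trace?"], "answer": ["LLM and agent calls"]}
)
dataset = client.upload_dataset(
    dataset_name="qa-examples",         # placeholder name
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"],
)

# The task: what your agent does for each dataset example
def task(input):
    return my_agent(input["question"])  # my_agent is your own code

# 3. A simple evaluator; swap in an LLM-as-a-Judge for fuzzier criteria
def matches_reference(output, expected) -> float:
    return float(expected["answer"].lower() in str(output).lower())

# 4. Run it, then interpret the traces (step 1) and results in Phoenix
run_experiment(dataset, task, evaluators=[matches_reference], experiment_name="baseline")
```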
The new TypeScript Evals package is designed to be a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
Splits let you define named subsets of your dataset & filter your experiments to run only on those subsets.
Learn more & check out this walkthrough:
⚪️ Create a split directly in the Phoenix UI
⚪️ Run an experiment scoped to that subset
👉 Full demo + code below 👇
Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
With Mastra now integrating directly with Phoenix, you can trace your TypeScript agents with almost zero friction.
And now… you can evaluate them too: directly from TypeScript using Phoenix Evals.
✨ Create tailored Spaces
🔑 Manage user permissions
👥 Easy team collaboration
More than a feature, it’s Phoenix adapting to you.
Spin up a new Phoenix project & test it out!
@arize-phoenix.bsky.social
You can now create datasets, run experiments, and attach evaluations to experiments using the Phoenix TS/JS client.
Shoutout to @anthonypowell.me and @mikeldking.bsky.social for the work here!
Add GenAI tracing to your applications with @arize-phoenix.bsky.social in just a few lines. Works great with Span Replay so you can debug, tweak, and explore agent behavior in the prompt playground.
Check Notebook + docs below!👇
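Setup is usually just the register call; with the relevant OpenInference instrumentation installed, auto-instrumentation picks up your GenAI calls. A sketch (the project name is a placeholder):

```python
from phoenix.otel import register

# Send traces to Phoenix and auto-instrument installed OpenInference
# integrations (e.g. your GenAI SDK of choice)
tracer_provider = register(
    project_name="my-genai-app",  # placeholder project name
    auto_instrument=True,
)
# LLM calls now show up as spans you can open in Span Replay and
# iterate on in the prompt playground.
```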
I’ve been really liking some of the eval tools from Pydantic's evals package.
Wanted to see if I could combine them with Phoenix’s tracing and run Pydantic evals on traces captured in Phoenix.
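Rough shape of the experiment: pull LLM spans out of Phoenix as a dataframe, wrap them as pydantic-evals Cases, and let an LLM judge score the captured outputs. A sketch (column names follow OpenInference conventions; the rubric is just an example):

```python
import phoenix as px
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge

# Pull LLM spans captured by Phoenix into a dataframe
spans = (
    px.Client()
    .get_spans_dataframe('span_kind == "LLM"')
    .dropna(subset=["attributes.input.value", "attributes.output.value"])
)

# Replay the captured outputs so the judge scores what actually happened,
# rather than re-running the model
captured = dict(
    zip(spans["attributes.input.value"], spans["attributes.output.value"])
)

dataset = Dataset(
    cases=[Case(name=f"span-{i}", inputs=q) for i, q in enumerate(captured)],
    evaluators=[LLMJudge(rubric="The response answers the user's question.")],
)

report = dataset.evaluate_sync(lambda q: captured[q])
report.print()
```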
✔️ Trace agent decisions at every step
✔️ Offline and Online Evals using LLM as a Judge
If you're building agents, measuring them is essential.
Full vid and cookbook below
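The offline flavor can be as small as: pull the spans you care about, run an LLM-as-a-Judge classification over them, and log the labels back onto the traces. A sketch (the filter, template, and eval name are placeholders):

```python
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.trace import SpanEvaluations

# Grab agent spans already captured in Phoenix
spans = px.Client().get_spans_dataframe('span_kind == "AGENT"').rename(
    columns={
        "attributes.input.value": "input",
        "attributes.output.value": "output",
    }
)

TEMPLATE = """You are grading an agent's answer.
Question: {input}
Answer: {output}
Respond with exactly one word: correct or incorrect."""

results = llm_classify(
    dataframe=spans,
    model=OpenAIModel(model="gpt-4o"),
    template=TEMPLATE,
    rails=["correct", "incorrect"],
    provide_explanation=True,
)

# Attach the judge's labels back to the traces so they appear next to each span
px.Client().log_evaluations(
    SpanEvaluations(eval_name="correctness", dataframe=results)
)
```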
Tag a function with `@tracer.llm` to automatically capture it as an @opentelemetry.io span.
- Automatically parses input and output messages
- Comes in decorator or context manager flavors
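For reference, the two flavors look roughly like this (a sketch: the hard-coded response stands in for a real provider call, and the message formats the decorator parses are covered in the docs):

```python
from phoenix.otel import register

tracer_provider = register(project_name="my-agent")  # placeholder project name
tracer = tracer_provider.get_tracer(__name__)

# Decorator flavor: the function is captured as an LLM span, with its
# input and output messages parsed onto the span automatically
@tracer.llm
def invoke_llm(messages):
    return {"role": "assistant", "content": "hello!"}  # stand-in for a real call

# Context-manager flavor of the same thing
def invoke_llm_manual(messages):
    with tracer.start_as_current_span(
        "invoke_llm", openinference_span_kind="llm"
    ) as span:
        response = {"role": "assistant", "content": "hello!"}
        span.set_input(messages)
        span.set_output(response)
        return response
```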