We released native OpenInference instrumentation for Pipecat using OTEL, enabling end-to-end observability across real-time agent pipelines.
Get rich semantic traces with zero manual instrumentation in Arize Phoenix.
Demo Below 🎬
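A minimal way to wire this up in Python, sketched under the assumption that the Pipecat instrumentor follows the usual OpenInference pattern (the exact package and class names may differ; check the integration docs):

from phoenix.otel import register
# Assumed instrumentor import, following the standard OpenInference naming
from openinference.instrumentation.pipecat import PipecatInstrumentor

# Point the OTLP exporter at your Phoenix instance (cloud or self-hosted)
tracer_provider = register(
    project_name="pipecat-voice-agent",  # hypothetical project name
    endpoint="https://app.phoenix.arize.com/v1/traces",
)

# One call instruments the whole pipeline; spans then arrive with
# OpenInference semantic attributes, no per-step instrumentation needed
PipecatInstrumentor().instrument(tracer_provider=tracer_provider)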
Available in arize-phoenix-client 1.28.0+ (Python) and @arizeai/phoenix-client 2.0.0+ (TypeScript)
@arizeai/phoenix-cli is a command-line interface for retrieving trace data. It provides the same observability data available in the Phoenix UI through shell commands and file exports.
# Attach the model configuration to the active OpenTelemetry span as an event
# (attribute keys and values below are illustrative)
span.add_event(
    name="model.config",
    attributes={"model": "gpt-4o", "temperature": 0.7},
)
Agents don’t fail like traditional software. They can misclassify, retrieve the wrong docs, or forget context.
Learn the workflows that turn agents into systems you can understand and assess: arize.com/docs/phoeni...
For enterprises that can't use cloud SSO, Phoenix integrates directly with your internal directory—Active Directory, OpenLDAP, or any LDAP v3 server.
Your data. Your infrastructure. Your identity provider.
🧵👇
Learn the core workflows to:
1. Interpret agent traces
2. Define tasks & build datasets for experiments
3. Construct purposeful LLM evals
4. Iterate based on results to improve reliability.
Choose your language & dive in!⬇️
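For a rough Python sketch of steps 2–4 using the phoenix.experiments API (parameter names are hedged and may vary by Phoenix version; call_my_agent stands in for your own agent):

import pandas as pd
import phoenix as px
from phoenix.experiments import run_experiment

# 2. Build a small dataset of inputs and expected outputs
df = pd.DataFrame({
    "question": ["Where is my order?", "Cancel my subscription"],
    "expected_route": ["order_status", "account_management"],
})
dataset = px.Client().upload_dataset(
    dataset_name="agent-routing-examples",  # hypothetical dataset name
    dataframe=df,
    input_keys=["question"],
    output_keys=["expected_route"],
)

# The task: what the agent does for each example
def task(input):
    return call_my_agent(input["question"])  # placeholder for your agent call

# 3. A simple evaluator; swap in an LLM-as-a-Judge for more nuanced scoring
def routed_correctly(output, expected):
    return float(output == expected["expected_route"])

# 4. Run the experiment; results and traces show up in the Phoenix UI
run_experiment(dataset, task, evaluators=[routed_correctly])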
The new TypeScript Evals package is designed to be a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
Flowise is fast, visual, and low-code — but what happens under the hood?
With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows with 1 configuration step - no code required.
Together you can:
✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix
Dive into our docs & notebooks ⬇️
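As a minimal Python sketch of the Ragas side (this assumes the classic question/answer/contexts column names, which newer Ragas releases rename, plus a configured judge LLM such as an OpenAI key in the environment):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Illustrative RAG outputs; in practice, pull these from your app's traces
data = Dataset.from_dict({
    "question": ["What does Phoenix trace?"],
    "answer": ["Phoenix captures LLM, retrieval, and tool spans."],
    "contexts": [["Arize Phoenix records traces for LLM applications."]],
})

# Compute Ragas metrics, then inspect the scores alongside traces in Arize or Phoenix
scores = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(scores)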
📌Tag prompts in code and see those tags reflected in the UI
📌Tag prompt versions as development, staging, or production — or define your own
📌Add in tag descriptions for more clarity
Manage your prompt lifecycles with confidence🚀
We’ve integrated @CleanlabAI’s Trustworthy Language Model (TLM) with Phoenix to help teams improve LLM reliability and performance
🔗 Dive into the full implementation in our docs & notebook:
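A rough Python sketch of the core idea, assuming the cleanlab-tlm client and its trustworthiness-scoring method (names may differ slightly; the linked notebook has the exact calls):

from cleanlab_tlm import TLM  # assumed package and client name

tlm = TLM()

# Score an existing LLM response for trustworthiness; low scores flag
# outputs worth reviewing or re-generating before they reach users
score = tlm.get_trustworthiness_score(
    "What is the capital of Australia?",
    response="The capital of Australia is Sydney.",
)
print(score)  # a low score here would flag the incorrect answer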
📌 Persistent column selection for consistent views
🔍 Filter data directly from tables with metadata and quick metadata filters
⏳ Set custom time ranges for traces & spans
🌳 Option to filter spans by root spans
Check out the demo👇
This makes Prompt Playground ideal for side-by-side reasoning tests: OpenAI's o3 vs. Anthropic's Claude vs. DeepSeek's R1.
Plus, GPT-4.5 support keeps it up to date with the latest from OpenAI & Anthropic - test them all out in the playground! ⚡️