@srichavali.bsky.social
Reposted
Follow the full trace → eval → iterate loop end to end with @ArizePhoenix

Try it here:
arize.com/docs/phoenix...
arize.com/docs/phoenix...
arize.com/docs/phoenix...
arize.com/docs/phoenix...
Send Traces From Your App - Phoenix
February 11, 2026 at 7:32 PM
Reposted
🔹 Trace agents to capture spans across execution flow, tool calls, & LLM calls
🔹 Define an eval to score outputs and label failures
🔹 Build a dataset of failure cases so you have concrete data to test iterations
🔹 Run experiments & test prompts to compare agent versions and verify improvements (see the sketch below)
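A minimal sketch of that loop with the Phoenix Python SDK, assuming a locally running Phoenix instance and an OpenAI judge model; the project name, hallucination eval, and judge model are illustrative choices, not prescriptions from the post:

```python
# Sketch of the trace -> eval -> iterate loop with Arize Phoenix (Python SDK).
# Assumes Phoenix is running locally and OPENAI_API_KEY is set; the project
# name, judge model, and hallucination eval are illustrative choices.
import phoenix as px
from phoenix.otel import register
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)
from phoenix.trace import SpanEvaluations

# 1. Trace: auto-instrument the app so agent spans (tool calls, LLM calls)
#    are exported to Phoenix.
register(project_name="my-agent", auto_instrument=True)

# ... run the agent here; spans are captured automatically ...

# 2. Eval: pull the traced spans and score them with an LLM-as-judge.
client = px.Client()
spans_df = client.get_spans_dataframe(project_name="my-agent")
# NOTE: map span attributes to the columns the template expects
# (input / output / reference) before classifying.
eval_df = llm_classify(
    data=spans_df,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)

# 3. Iterate: attach labels back to the traces so failure cases are easy
#    to collect into a dataset and re-test after each change.
client.log_evaluations(SpanEvaluations(eval_name="Hallucination", dataframe=eval_df))
```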
February 11, 2026 at 7:32 PM
Reposted
Go from traced runs to measurable quality metrics with @ArizePhoenix TS evals 🚀

Check it out!
arize.com/docs/phoeni...
arize.com/docs/phoeni...
arize.com/docs/phoeni...
Customize Your Evaluation Template - Phoenix
February 9, 2026 at 4:30 PM
Reposted
🔺 Use real trace data from your app runs as the basis for evaluating LLM output

🔺 Build evaluators (built-in or custom) that score outputs on correctness, relevance, and other quality criteria

🔺 Run those evaluators with Phoenix’s TypeScript eval tooling to produce structured quality metrics (see the sketch after this list)
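A rough sketch of the evaluator pattern above; the post covers the TypeScript evals package, but the same flow is illustrated here with Phoenix’s Python evals API, and the relevance template, rails, and judge model are illustrative assumptions:

```python
# Sketch: a customized evaluation template scored over real trace data.
# The template text, rails, and judge model are illustrative, not built-ins.
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Rows pulled from traced runs (in practice, export spans from Phoenix).
data = pd.DataFrame(
    {
        "input": ["How do I reset my password?"],
        "output": ["Click 'Forgot password' on the sign-in page."],
    }
)

# Custom template: the judge must answer with one of the rails below.
RELEVANCE_TEMPLATE = """You are judging whether a response answers the question.
Question: {input}
Response: {output}
Answer with a single word, "relevant" or "irrelevant"."""

results = llm_classify(
    data=data,
    template=RELEVANCE_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),
    rails=["relevant", "irrelevant"],
    provide_explanation=True,
)
# Each row gets a label plus the judge's reasoning as structured columns.
print(results[["label", "explanation"]])
```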
February 9, 2026 at 4:30 PM
Reposted
📘 Step-by-step guide to using Phoenix + UQLM:
arize.com/docs/phoenix...
UQLM Confidence & Hallucination Risk - Phoenix
December 30, 2025 at 5:52 PM
Reposted
⚪️ Spot hallucination-prone responses by surfacing low-confidence outputs (see the sketch after this list)

⚪️ Flag uncertain generations for fallback, review, or guardrails

⚪️ Compare prompts & models more rigorously with uncertainty signals

⚪️ Monitor safety + reliability in production by tracking confidence drift
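A minimal sketch of that workflow with UQLM’s black-box scorer; the generator model, the semantic_negentropy scorer choice, and the 0.7 flagging threshold are illustrative assumptions:

```python
# Sketch: score response confidence with UQLM, then flag low-confidence
# generations for fallback or human review. Model, scorer, and threshold
# are illustrative assumptions, not recommendations from the post.
import asyncio

from langchain_openai import ChatOpenAI
from uqlm import BlackBoxUQ


async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=1.0)
    scorer = BlackBoxUQ(llm=llm, scorers=["semantic_negentropy"], use_best=True)

    prompts = ["What year was the Eiffel Tower completed?"]
    # Samples multiple responses per prompt and scores their consistency:
    # low agreement across samples signals hallucination risk.
    results = await scorer.generate_and_score(prompts=prompts, num_responses=5)

    df = results.to_df()
    # The confidence column is named after the scorer; rows under the
    # threshold get routed to fallback, review, or guardrails.
    flagged = df[df["semantic_negentropy"] < 0.7]
    print(flagged)


asyncio.run(main())
```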
December 30, 2025 at 5:52 PM