@ArizePhoenix
Try it here:
arize.com/docs/phoenix...
🔹 Define an eval to score outputs and label failures
🔹 Build a dataset of failure cases so you have concrete data to test iterations
🔹 Run experiments & test prompts to compare agent versions and verify improvements (rough sketch below)
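Roughly what that loop looks like with the Phoenix TypeScript client (@arizeai/phoenix-client). This is a sketch, not a drop-in script: the dataset name, the example shapes, the stand-in agent function, and the exact createDataset / runExperiment / asEvaluator signatures are assumptions to check against the docs linked in this thread.

```ts
// Sketch only: assumes PHOENIX_HOST / PHOENIX_API_KEY are set in the environment.
import { createDataset } from "@arizeai/phoenix-client/datasets";
import { asEvaluator, runExperiment } from "@arizeai/phoenix-client/experiments";

// Stand-in for the agent version under test (hypothetical helper).
const myAgent = async (question: string) => "escalate_to_billing";

// 1. Build a dataset from observed failure cases.
const dataset = await createDataset({
  name: "agent-failure-cases",
  examples: [
    {
      input: { question: "Cancel my subscription and refund last month" },
      output: { expectedAction: "escalate_to_billing" },
      metadata: { source: "production-trace" },
    },
  ],
});

// 2. Run an experiment: the task is the agent version you want to compare.
await runExperiment({
  dataset,
  task: async (example) => myAgent(example.input.question as string),
  evaluators: [
    // 3. A simple code evaluator that scores each run and labels pass/fail.
    asEvaluator({
      name: "matches-expected-action",
      kind: "CODE",
      evaluate: async ({ output, expected }) => {
        const pass = output === expected?.expectedAction;
        return {
          score: pass ? 1 : 0,
          label: pass ? "pass" : "fail",
          explanation: `agent returned: ${JSON.stringify(output)}`,
          metadata: {},
        };
      },
    }),
  ],
});
```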
Check it out!
arize.com/docs/phoeni...
🔺 Build evaluators (built-in or custom) that score outputs on correctness, relevance, and other quality criteria
🔺 Run those evaluators with Phoenix’s TypeScript eval tooling to produce structured quality metrics (rough sketch below)
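A minimal sketch of a custom LLM-judge evaluator with the phoenix-evals TypeScript package. The createClassifier call, the handlebars-style template variables, and the choices-to-score mapping follow the package's documented pattern, but treat the exact import paths, signatures, and result shape as assumptions and confirm them against the docs linked here.

```ts
// Sketch only: an LLM-as-judge correctness classifier built with
// @arizeai/phoenix-evals and the Vercel AI SDK OpenAI provider.
import { openai } from "@ai-sdk/openai";
import { createClassifier } from "@arizeai/phoenix-evals/llm";

const correctnessEvaluator = createClassifier({
  model: openai("gpt-4o-mini"),
  promptTemplate: `
You are grading an answer against a reference.

Question: {{input}}
Answer: {{output}}
Reference: {{reference}}

Respond with "correct" or "incorrect".`,
  // Map each label to a numeric score so results roll up as metrics.
  choices: { correct: 1, incorrect: 0 },
});

const result = await correctnessEvaluator({
  input: "What is the capital of France?",
  output: "Paris",
  reference: "Paris",
});

console.log(result); // expected shape (assumption): { label, score, explanation }
```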
arize.com/docs/phoenix...
⚪️ Flag uncertain generations for fallback, review, or guardrails (rough sketch below)
⚪️ Compare prompts & models more rigorously with uncertainty signals
⚪️ Monitor safety + reliability in production by tracking confidence drift
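One way to produce an uncertainty signal like this, sketched with the OpenAI Node SDK: request token logprobs, average them into a rough confidence score, and gate on it. The 0.7 threshold and the fallback handling are illustrative assumptions, not a prescribed recipe.

```ts
// Sketch: derive a simple confidence signal from token logprobs and gate on it.
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize the refund policy." }],
  logprobs: true,
});

const choice = response.choices[0];
const tokens = choice.logprobs?.content ?? [];

// Geometric-mean token probability as a rough confidence score in [0, 1].
const meanLogprob =
  tokens.reduce((sum, t) => sum + t.logprob, 0) / Math.max(tokens.length, 1);
const confidence = Math.exp(meanLogprob);

if (confidence < 0.7) {
  // Low confidence: route to a fallback model, human review, or a guardrail
  // instead of returning choice.message.content directly.
}
```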