Talk to me: hello@musabdulai.com
• Your rate limits
• Your circuit breakers
• Your rollback plan
• Your max monthly spend per tenant
Otherwise it’s not a system, it’s a stunt.
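A minimal sketch of that last cap, assuming you already record spend somewhere queryable; `get_monthly_spend_usd` and the limits table here are hypothetical stand-ins:

```python
# Per-tenant monthly spend guard (sketch). get_monthly_spend_usd is a
# placeholder for whatever store actually tracks your usage.
MONTHLY_LIMIT_USD = {"default": 200.0, "tenant-enterprise": 2000.0}

def get_monthly_spend_usd(tenant_id: str) -> float:
    return 0.0  # placeholder: query your billing/usage store here

def check_tenant_budget(tenant_id: str) -> None:
    limit = MONTHLY_LIMIT_USD.get(tenant_id, MONTHLY_LIMIT_USD["default"])
    spent = get_monthly_spend_usd(tenant_id)
    if spent >= limit:
        # Refuse the call instead of quietly eating the cost.
        raise RuntimeError(
            f"Tenant {tenant_id} hit its monthly cap (${spent:.2f} of ${limit:.2f})"
        )
```

Call `check_tenant_budget` before every model call; the exact numbers matter less than having a line you refuse to cross.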
You secure it by:
• Defining what it must never do
• Making those rules enforceable in code
• Monitoring for violations in production
• Having a way to shut it down fast
Policy → controls → telemetry → kill switch.
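Here's roughly what the last two steps can look like, a sketch with an in-process flag and a print standing in for real telemetry and a shared flag store:

```python
import time

class KillSwitch:
    """Disable the feature after repeated policy violations; check it before every call."""

    def __init__(self, violation_threshold: int = 5):
        self.enabled = True
        self.violations = 0
        self.violation_threshold = violation_threshold

    def record_violation(self, reason: str) -> None:
        self.violations += 1
        print(f"policy violation: {reason}")  # stand-in for real telemetry
        if self.violations >= self.violation_threshold:
            self.trip(f"{self.violations} violations")

    def trip(self, reason: str) -> None:
        self.enabled = False
        print(f"kill switch tripped at {time.time():.0f}: {reason}")

    def require_enabled(self) -> None:
        if not self.enabled:
            raise RuntimeError("feature disabled by kill switch")
```

The policy lives in whatever calls `record_violation`; the point is that "shut it down fast" is one function call, not a war room.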
Give them:
• Narrow scope
• Limited tools
• Explicit budgets
• Clear owners
If you can’t answer “who’s on call for this agent?” it has too much power.
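One way to write this down is as a manifest the agent can't ship without; the names and numbers below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentManifest:
    """Everything an agent is allowed to do, written down before it ships."""
    name: str
    purpose: str                    # narrow scope, one sentence
    allowed_tools: tuple[str, ...]  # explicit allowlist, not "everything"
    monthly_budget_usd: float       # explicit budget
    owner: str                      # who is on call for this agent

invoice_triage = AgentManifest(
    name="invoice-triage",
    purpose="Classify inbound invoices and route them to the right queue.",
    allowed_tools=("read_invoice", "route_to_queue"),
    monthly_budget_usd=150.0,
    owner="payments-oncall@example.com",
)
```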
Real levers:
• Fewer round trips
• Less useless context
• Smarter routing between models
• Caching stable answers
Every avoided call is 100% cheaper and 100% safer.
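Caching is the easiest of the four to start with. A sketch using an in-memory dict (a stand-in for Redis or similar) and a placeholder `call_llm`; only do this where the right answer doesn't change between calls:

```python
import hashlib

_cache: dict[str, str] = {}  # stand-in for a shared cache like Redis

def cache_key(prompt: str, model: str) -> str:
    # Normalize whitespace so trivially different prompts share a key.
    normalized = " ".join(prompt.split()).lower()
    return hashlib.sha256(f"{model}:{normalized}".encode()).hexdigest()

def answer(prompt: str, model: str, call_llm) -> str:
    """Return a cached answer when one exists; every hit is a call you never pay for."""
    key = cache_key(prompt, model)
    if key not in _cache:
        _cache[key] = call_llm(prompt, model)  # placeholder for your actual client
    return _cache[key]
```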
• What’s the acceptable error rate?
• What’s the max we’re willing to pay per request?
• What does “graceful failure” look like?
LLM systems without these constraints are vibes, not engineering.
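The second question turns into a pre-flight check. The prices and fallback message below are illustrative, and the worst-case estimate assumes the model uses its full output budget:

```python
MAX_COST_PER_REQUEST_USD = 0.05  # assumed ceiling; pick yours deliberately

def estimate_cost_usd(prompt_tokens: int, max_output_tokens: int,
                      usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    # Worst case: the model spends its entire output budget.
    return (prompt_tokens / 1000) * usd_per_1k_in + (max_output_tokens / 1000) * usd_per_1k_out

def handle(prompt_tokens: int, call_model) -> str:
    est = estimate_cost_usd(prompt_tokens, max_output_tokens=1024,
                            usd_per_1k_in=0.003, usd_per_1k_out=0.015)
    if est > MAX_COST_PER_REQUEST_USD:
        # Graceful failure: a defined, cheap answer instead of a stack trace.
        return "This request is too large to answer right now. Try narrowing it."
    return call_model()
```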
An AI agent calling tools with:
• Timeouts
• Retry limits
• Circuit breakers
• Spend guards
…is something you can show to your SRE and finance teams without apologizing.
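A sketch of what that wrapper can look like; the timeout here abandons the hung worker thread rather than killing the call, and the per-tenant spend guard from earlier would sit in front of it:

```python
import time
from concurrent.futures import ThreadPoolExecutor

class CircuitBreaker:
    """Stop calling a tool that keeps failing; try again after a cooldown."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.failures, self.max_failures = 0, max_failures
        self.cooldown_s, self.opened_at = cooldown_s, 0.0

    def allow(self) -> bool:
        if self.failures < self.max_failures:
            return True
        return (time.time() - self.opened_at) > self.cooldown_s  # half-open after cooldown

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_failures:
            self.opened_at = time.time()

def call_tool(fn, breaker: CircuitBreaker, retries: int = 2, timeout_s: float = 5.0):
    """Run a tool call with a hard timeout, bounded retries, and a circuit breaker."""
    if not breaker.allow():
        raise RuntimeError("circuit open: tool keeps failing, not calling it right now")
    pool = ThreadPoolExecutor(max_workers=retries + 1)  # fresh worker per attempt
    try:
        for _attempt in range(retries + 1):
            try:
                result = pool.submit(fn).result(timeout=timeout_s)
                breaker.record(ok=True)
                return result
            except Exception:
                breaker.record(ok=False)
        raise RuntimeError(f"tool failed after {retries + 1} attempts")
    finally:
        pool.shutdown(wait=False, cancel_futures=True)
```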
• Quality → does it help?
• Reliability → does it work today and tomorrow?
• Cost → can we afford success?
Most teams romanticize #1 and discover #2 and #3 when finance and ops show up.
It’s:
• Engineers debugging flaky agents
• Support fixing silent failures
• RevOps dealing with bad insights
Reliability is a cost-optimization strategy.
Translation:
• Unbounded scope
• Unpredictable latency
• Unknown worst-case cost
• Impossible to test
Narrow agents with clear contracts > one omnipotent chaos agent.
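A "clear contract" can be as small as a typed input and output per agent; the refund example and field names below are made up:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    reason: str

@dataclass(frozen=True)
class RefundDecision:
    approve: bool
    max_amount_usd: float
    needs_human_review: bool

def refund_agent(request: RefundRequest) -> RefundDecision:
    """One narrow job, typed in and typed out, so it can be tested with fixtures."""
    # Placeholder: in practice this prompts a model and parses the reply back
    # into RefundDecision, failing loudly if the output doesn't fit the schema.
    return RefundDecision(approve=False, max_amount_usd=0.0, needs_human_review=True)
```

Bounded input, bounded output, testable worst case. The omnipotent chaos agent has none of those.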
What you actually need:
• Can we say “turn this feature OFF now”?
• Can we cap spend per tenant?
• Can we see which prompts keep failing?
Control first, charts later.
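The third question is often one aggregation away, assuming your gateway already logs a template name and a success flag per call (the records below are invented):

```python
from collections import Counter

# Hypothetical structured log records from your LLM gateway.
logs = [
    {"prompt_template": "summarize_ticket", "ok": True},
    {"prompt_template": "summarize_ticket", "ok": False},
    {"prompt_template": "draft_reply", "ok": False},
    {"prompt_template": "draft_reply", "ok": False},
]

failures = Counter(r["prompt_template"] for r in logs if not r["ok"])
for template, count in failures.most_common(10):
    print(f"{template}: {count} failures")
```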
1. Small, cheap model for routing & quick wins
2. Medium model for most requests
3. Big model only for high-value, audited paths
You’ll save cost and reduce how often users see “smart but wrong” answers.
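A routing sketch; the tier names, intents, and the crude length check are illustrative stand-ins for whatever signal you actually trust:

```python
SMALL, MEDIUM, LARGE = "small-model", "medium-model", "large-model"

HIGH_VALUE_INTENTS = {"contract_review", "pricing_exception"}  # the audited paths

def route(prompt: str, intent: str) -> str:
    if intent in HIGH_VALUE_INTENTS:
        return LARGE    # expensive, rare, and logged end to end
    if len(prompt) < 200:
        return SMALL    # routing and quick wins
    return MEDIUM       # the default for most requests
```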
• Measure cost per successful outcome, not per token
• Cache aggressively where correctness is stable
• Use smaller models for validation and guardrails
“We shaved 40% of tokens” means nothing if quality tanked.
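The first bullet is a one-liner worth making explicit. Assuming each request record carries a cost and a success flag (a hypothetical schema):

```python
def cost_per_successful_outcome(records: list[dict]) -> float:
    """Total spend divided by the outcomes that actually helped someone."""
    total_cost = sum(r["cost_usd"] for r in records)
    successes = sum(1 for r in records if r["succeeded"])
    return float("inf") if successes == 0 else total_cost / successes

# Cutting tokens only "saved money" if this number went down too.
print(cost_per_successful_outcome([
    {"cost_usd": 0.04, "succeeded": True},
    {"cost_usd": 0.02, "succeeded": False},
    {"cost_usd": 0.05, "succeeded": True},
]))
```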
Cool. Now show me:
• How you test changes to prompts & tools
• How you roll back a bad deployment
• How you cap spend in a runaway loop
If the answer is manual heroics, you’re not there yet.
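For the runaway loop specifically, two caps go a long way: a step limit and a token budget per run. A sketch, with `step_fn` standing in for one iteration of your agent:

```python
MAX_STEPS = 8
MAX_TOKENS_PER_RUN = 50_000  # assumed ceiling; pick yours deliberately

def run_agent(step_fn) -> None:
    """step_fn() returns (done, tokens_used); either cap stops a runaway loop."""
    tokens_used = 0
    for step in range(MAX_STEPS):
        done, tokens = step_fn()
        tokens_used += tokens
        if done:
            return
        if tokens_used > MAX_TOKENS_PER_RUN:
            raise RuntimeError(f"run aborted: {tokens_used} tokens after {step + 1} steps")
    raise RuntimeError(f"run aborted: no result after {MAX_STEPS} steps")
```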
You still need:
• Timeouts & retries
• Rate limits
• Idempotency
• Cost ceilings
Treat them like unreliable juniors with prod access, not like magic.
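Idempotency is the one most teams skip. A sketch with an in-memory set standing in for a shared store like Redis, so a retried tool call can't send the same refund or email twice:

```python
import hashlib

_seen: set[str] = set()  # stand-in for a shared store in real deployments

def idempotency_key(tool: str, args: dict) -> str:
    canonical = tool + "|" + "|".join(f"{k}={args[k]}" for k in sorted(args))
    return hashlib.sha256(canonical.encode()).hexdigest()

def call_once(tool: str, args: dict, do_call) -> bool:
    """Skip the side effect if a retry already performed it."""
    key = idempotency_key(tool, args)
    if key in _seen:
        return False  # duplicate: the refund/email/ticket already happened
    do_call(tool, args)
    _seen.add(key)
    return True
```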
• No p95 latency target
• No per-query cost budget
• No clear failure modes
…you don’t have a product.
You have an expensive, occasionally helpful surprise.
They’re the ones that:
• Hit sensitive docs
• Bypass weak filters
• End up screenshotted into Slack forever
Data minimization is a cost control.
• Do we know our top 10 expensive users?
• Do we know which indexes drive 80% of cost?
• Do we know our riskiest collections?
Performance tuning without cost & risk data is vibes-based engineering.
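None of these need a data platform project; two aggregations over usage records you probably already have (the records here are invented) answer the first two:

```python
from collections import defaultdict

# Hypothetical usage records: who asked, what it cost, which index served it.
usage = [
    {"user": "alice", "cost_usd": 0.12, "index": "contracts"},
    {"user": "bob",   "cost_usd": 0.02, "index": "faq"},
    {"user": "alice", "cost_usd": 0.30, "index": "contracts"},
]

by_user: dict[str, float] = defaultdict(float)
by_index: dict[str, float] = defaultdict(float)
for r in usage:
    by_user[r["user"]] += r["cost_usd"]
    by_index[r["index"]] += r["cost_usd"]

print(sorted(by_user.items(), key=lambda kv: kv[1], reverse=True)[:10])  # top 10 expensive users
print(sorted(by_index.items(), key=lambda kv: kv[1], reverse=True))      # where the cost concentrates
```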
• A data warehouse
• A search engine
• An attack surface
• A cost center
Still treating it like a sidecar for “chat with your docs” is how you get surprise invoices and surprise incidents.
“Guardrails” are often a guilt-offload for not doing:
• Proper access control
• Per-tenant isolation
• Input/output logging
LLM wrappers won’t fix a broken security model. They just make it more expensive.
• No per-user rate limits
• Unlimited queries on expensive models
• Tool calls that hit paid APIs
Congrats, you just built a token-minter for attackers.
Security is also about protecting your wallet.
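The first item is a few dozen lines, not a platform. A per-user token-bucket sketch; the rate is an assumption, and a real deployment would keep the buckets in a shared store:

```python
import time
from collections import defaultdict

RATE_PER_MIN = 20  # assumed limit; tune per plan or tenant

class PerUserLimiter:
    """One bucket per user: refills at RATE_PER_MIN, each request spends one token."""

    def __init__(self):
        self.tokens = defaultdict(lambda: float(RATE_PER_MIN))
        self.last = defaultdict(time.monotonic)

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last[user_id]
        self.last[user_id] = now
        self.tokens[user_id] = min(
            float(RATE_PER_MIN), self.tokens[user_id] + elapsed * RATE_PER_MIN / 60
        )
        if self.tokens[user_id] >= 1:
            self.tokens[user_id] -= 1
            return True
        return False  # reject or queue; either way, nobody mints tokens on your dime
```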
• Track token spend per user/tenant
• Track which collections are most queried
• Track which prompts hit sensitive docs
Same logs help with cost optimization AND security forensics. Double win.
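All three can come from one structured record per call; the field names are a suggestion, and the `print` stands in for a real log pipeline:

```python
import json
import time

def log_llm_call(*, tenant: str, user: str, collection: str,
                 prompt_tokens: int, output_tokens: int, cost_usd: float,
                 hit_sensitive_docs: bool) -> None:
    """One record per call: finance reads cost_usd, security reads hit_sensitive_docs."""
    record = {
        "ts": time.time(),
        "tenant": tenant,
        "user": user,
        "collection": collection,
        "prompt_tokens": prompt_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        "hit_sensitive_docs": hit_sensitive_docs,
    }
    print(json.dumps(record))  # stand-in for your real log pipeline
```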
• Direct $$
• Latency
• Attack surface
Prune your retrieval:
• Fewer, higher-quality chunks
• Explicit collections
• Permission-aware filters
Spend less, answer faster, leak less.
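A pruned, permission-aware retriever in miniature; the `Chunk` shape and `acl_groups` field are placeholders for whatever your vector store actually exposes:

```python
from dataclasses import dataclass

TOP_K = 5  # fewer, higher-quality chunks

@dataclass
class Chunk:
    text: str
    collection: str
    acl_groups: frozenset[str]  # groups allowed to see this chunk
    score: float                # similarity score from your retriever

def retrieve(chunks: list[Chunk], user_groups: set[str], collection: str) -> list[Chunk]:
    """Explicit collection, permission filter before ranking, hard top-k cap."""
    allowed = [c for c in chunks
               if c.collection == collection and c.acl_groups & user_groups]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:TOP_K]
```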
• Prompt injection that triggers many tool calls
• Queries crafted to hit max tokens every time
• Abuse of “unlimited internal use” policies
Attackers don’t need your data if they can just drain your budget.
• More context → more tokens
• Less context → more hallucinations
• No security → more incidents
Most teams only tune the first two.
Mature teams treat security as a cost dimension too.
In real-life RAG:
• 20–50 retrieved chunks
• Tool calls
• Follow-up questions
Now add:
• No rate limits
• No abuse detection
• No guardrails on tools
Congrats, you’ve built a DoS and data-exfil API with pretty UX.