Michal Stanislawek
@xmstan.bsky.social
Strategist & Solution Builder
Conversational & Generative AI
Live Media
What's the most "obvious" instruction in your prompts that would make a stranger's head explode? Let's play prompt therapists.

I'll start.

One of my glorious instructions was "Make it a short, but detailed explanation". The LLM tried to obey. A human just gave me a confused look.
June 23, 2025 at 5:48 AM
Reality check

This won't replace your evals (that's a whole other topic I'll cover later). But while evals tell you if your prompt failed (and how often), this tells you why you're about to waste a week building test cases for instructions that don't even make sense to humans.
June 23, 2025 at 5:48 AM
And watch your colleague's reasoning shatter your worldview. That "obvious" instruction? Only obvious to you and your rubber duck.

This isn't debugging prompts - it's debugging the lies we tell ourselves about clarity. Every prompt is like a Rorschach test of your comm skills.
June 23, 2025 at 5:48 AM
💡 Step 3: The Therapy Session
Now for the uncomfortable part.

- "Why did you interpret 'brief' as 3 sentences when I meant 3 paragraphs?"
- "What assumption made you think I wanted JSON when I said 'structured'?"
- "Why did you search the web when I said 'check our records'?"
June 23, 2025 at 5:48 AM
🧠 Step 2: The Human LLM
Person B reads it cold and states exactly what they'd do. What tools would they use? What would they output?
The Prompter's job: Shut up and take notes on how badly their own prompt actually communicates.
June 23, 2025 at 5:48 AM
The Reverse Wizard of Oz (Or: How to Discover You Can't Communicate)

🎭 Step 1: The Confession Booth
Person A (the "Prompter") writes their precious prompt. Sends it verbatim—no context, no explanations, no "what I really meant was..."—to Person B.
June 23, 2025 at 5:48 AM
Everyone knows the Wizard of Oz test - a human pretends to be the AI, tests the happy path, everyone goes home. Cute. But for prompts, it's like testing a parachute indoors. Here's something that actually catches your (and my) poor choice of words.
June 23, 2025 at 5:48 AM
My hot take:
90% of business requests for an "AI Agent" can be solved with a simple, robust workflow from Step 2.

The real skill is knowing the difference.

What's a recent 'agent' request you've had to translate back to reality?
June 21, 2025 at 6:30 AM
🧠 Step 3: The Intern (Autonomous Agents)
The final boss. You give an agent a goal, a set of tools (like APIs), and the freedom to decide how to use them. It has the potential for magic. It also has the potential to go spectacularly, creatively wrong in ways you can't predict.
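To make "goal + tools + freedom" concrete, here's a minimal sketch of an agent loop in Python. Everything in it is illustrative: call_llm() is a placeholder for your model provider, and the tool names and step budget are assumptions, not a real framework.

```python
import json

def call_llm(goal, history):
    # Placeholder: a real implementation would ask your model provider
    # to pick the next action and return it as JSON.
    return json.dumps({"tool": "finish", "args": {"answer": "done"}})

# Hypothetical tools the agent is allowed to use.
TOOLS = {
    "search_records": lambda query: f"records matching {query!r}",
    "send_reply": lambda text: f"sent: {text}",
}

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):  # hard cap: freedom, but fenced in
        action = json.loads(call_llm(goal, history))
        if action["tool"] == "finish":
            return action["args"]["answer"]
        tool = TOOLS.get(action["tool"])
        if tool is None:  # the "spectacularly, creatively wrong" branch
            history.append(f"unknown tool: {action['tool']}")
            continue
        history.append(tool(**action["args"]))
    return "stopped: step budget exhausted"

print(run_agent("Reply to the angry customer in ticket #42"))
```

Note the two guardrails: a step budget and an unknown-tool branch. The "freedom" is real, but it runs inside a fence you control.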
June 21, 2025 at 6:30 AM
🏭 Step 2: The Assembly Line (Chained Workflows)
The output of one call becomes the input for the next.

Summarize a review → Extract key complaints → Draft a reply

It’s a sequence of simple tasks that creates a powerful result. Structured, repeatable, and debuggable.
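In code, that pipeline can be three tiny functions composed together. A sketch, assuming a placeholder call_llm() standing in for whatever provider you use:

```python
def call_llm(prompt):
    # Placeholder: swap in your model provider's client here.
    return f"[model output for: {prompt[:40]}...]"

def summarize(review):
    return call_llm(f"Summarize this review in two sentences:\n{review}")

def extract_complaints(summary):
    return call_llm(f"List the key complaints as bullet points:\n{summary}")

def draft_reply(complaints):
    return call_llm(f"Draft a polite reply addressing:\n{complaints}")

review = "The app crashed twice and support never answered my email."
reply = draft_reply(extract_complaints(summarize(review)))
print(reply)
```

Each step is independently testable, which is exactly why this rung of the ladder is debuggable and the "Intern" often isn't.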
June 21, 2025 at 6:30 AM
📞 Step 1: The Walkie-Talkie (Simple LLM Calls)
Push to talk, get a response. That's it. Use it for summarizing text, classifying content, or rewriting copy. It's a single, reliable tool for a single job. Master this first.
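For illustration, a walkie-talkie call is a single function with a single job. The call_llm() stub below is a placeholder, not a real client:

```python
def call_llm(prompt):
    # Placeholder: swap in your provider's client here.
    return "complaint"  # dummy response for the sketch

def classify_ticket(text):
    prompt = ("Classify this support message as one of: "
              "complaint, question, praise.\n\n" + text)
    return call_llm(prompt).strip().lower()

print(classify_ticket("My order arrived broken and nobody replied."))
```

One prompt in, one answer out. No state, no loops, nothing to orchestrate.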
June 21, 2025 at 6:30 AM
Everyone wants to build a self-driving car, but most teams haven't mastered the bicycle yet. When it comes to AI, don't jump to the most complex solution. Climb the ladder. It'll save you a world of pain and wasted resources.
June 21, 2025 at 6:30 AM
When your AI agent burns through $10k in API calls ordering 1000 pizzas (probably already true for someone, somewhere), you'll wish you had option 2.

If we are to believe Gemini, Google is clearly building for it.

We should be too.
June 20, 2025 at 1:27 PM
Because when your AI inevitably fails, and IT WILL (likely at 3am on Sunday), you have two choices.

1. A black box that leaves you shrugging your shoulders.
2. A system that snitches on itself like a guilty teenager, telling you exactly where the internal wiring is crossed.
June 20, 2025 at 1:27 PM
Your AI needs to be a glass box. Monitor not just what it spits out, but the entire fever dream that led to that decision. You need structured logs, automated flags, and a clear path from failure to fix.
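What that might look like in practice: one structured record per step, with an error tag that triggers escalation automatically. A sketch only; the field names and the tool_use_failure tag (borrowed from the Gemini story below) are illustrative.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("glassbox")

def escalate(record):
    # Stub: in production this might page an engineer or open a ticket.
    log.warning(json.dumps({"escalation": record["error_tag"],
                            "trace_id": record["trace_id"]}))

def log_step(trace_id, step, detail, error_tag=None):
    """Emit one structured record per reasoning or tool-use step."""
    record = {"trace_id": trace_id, "ts": time.time(),
              "step": step, "detail": detail, "error_tag": error_tag}
    log.info(json.dumps(record))
    if error_tag:  # automated flag -> clear path from failure to fix
        escalate(record)

log_step("run-42", "tool_call",
         {"tool": "order_pizza", "args": {"qty": 1000}},
         error_tag="tool_use_failure")
```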
June 20, 2025 at 1:27 PM
This highlights a dirty secret of AI in production.

We're all obsessed with prompt engineering and outputs while building black boxes on quicksand. The real work, the unglamorous grind that separates the demos from deployments, is building systems that can explain failure.
June 20, 2025 at 1:27 PM
Turns out, this wasn't a canned apology. When asked about it, Gemini admitted it had triggered a full diagnostic sequence - logging its own flawed reasoning, tagging the specific error (tool_use_failure), and sending up a flare to the human engineers.
June 20, 2025 at 1:27 PM
It failed a simple task. And instead of the usual "I'm sorry, Dave. I'm afraid I can't do that", I got this:
"I am escalating this issue, so my internal systems can be fixed."

Now, I had to know more.
June 20, 2025 at 1:27 PM
Now, your move:

What's your go-to BS detector question?

Drop it below - let's build the ultimate filter together.
June 20, 2025 at 5:57 AM
5. "What's something you believe about AI that most other 'experts' disagree with?"
This reveals independent thought. Are they just repeating the latest from Twitter, or do they have a unique, hard-won perspective?
June 20, 2025 at 5:57 AM
4. "Explain 'agentic workflow' to me like I'm our CFO."
Can they translate jargon into business value? If they can't explain it simply, they don't understand it deeply enough to implement it effectively. Bonus points if they don't use the word "synergy."
June 20, 2025 at 5:57 AM
3. "How would you build a POC for our specific problem for under $1,000?"
This is a pure pragmatism test. Can they scope down? Can they deliver value without a six-figure budget? A real expert knows the first step isn't building a rocket. It's seeing if the engine can even start.
June 20, 2025 at 5:57 AM