Martin Koch
@martinkoch.bsky.social
LLMs will eat the world.
CPO @ aqua-cloud.io
Opinions are my own.
- Employs an LLM judge to assess compliance, focusing on rule adherence rather than correctness alone.
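
A minimal sketch of what such a judge step could look like, assuming the judge is any function that takes an instruction string and returns a verdict. The names `Judge`, `is_compliant`, `non_compliance_rate`, and `fake_judge` are hypothetical and do not come from the paper; `fake_judge` is a toy stand-in so the example runs without a model.

```python
from typing import Callable, List

# Hypothetical interface: an LLM judge takes an instruction and returns text.
Judge = Callable[[str], str]


def is_compliant(output: str, rules: List[str], judge: Judge) -> bool:
    """Ask the judge whether the output follows every rule (not whether it is correct)."""
    rule_text = "\n".join(f"- {rule}" for rule in rules)
    verdict = judge(
        "Does the output follow every rule? Answer YES or NO.\n"
        f"Rules:\n{rule_text}\nOutput:\n{output}"
    )
    return verdict.strip().upper().startswith("YES")


def non_compliance_rate(outputs: List[str], rules: List[str], judge: Judge) -> float:
    """Fraction of outputs the judge marks as breaking at least one rule."""
    if not outputs:
        return 0.0
    failures = sum(not is_compliant(output, rules, judge) for output in outputs)
    return failures / len(outputs)


def fake_judge(instruction: str) -> str:
    """Toy stand-in for a real model: rejects any output containing a dollar sign."""
    output = instruction.split("Output:\n", 1)[1]
    return "NO" if "$" in output else "YES"
```

With a real model, `fake_judge` would be replaced by a call to whichever LLM client is in use; the rate is what lets two test generators be compared.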

Results
Tests generated by PromptPex produced 5.5% more non-compliant outputs than tests from baseline test generators, which suggests it is effective at exposing prompt weaknesses.

Paper: arxiv.org/abs/2503.05070v1
PromptPex: Automatic Test Generation for Language Model Prompts
Large language models (LLMs) are being used in many applications and prompts for these models are integrated into software applications as code-like artifacts. These prompts behave much like tradition...
March 13, 2025 at 4:57 PM
How It Works:

- Extracts Input Specifications (IS) and Output Rules (OR) directly from prompts using LLMs.

- Generates targeted tests based on IS and OR to validate prompt compliance.

- Creates challenging "inverse" tests from OR rules to evaluate model limits.
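
The steps above can be sketched roughly as follows, assuming the LLM is any function from an instruction string to text. This is not the PromptPex implementation: the function names, the instruction wording, and the canned `fake_llm` are all hypothetical, and only the Output Rules path is shown (Input Specifications would be extracted the same way).

```python
from typing import Callable, List

# Hypothetical interface: an LLM takes an instruction and returns text.
LLM = Callable[[str], str]


def extract_output_rules(prompt: str, llm: LLM) -> List[str]:
    """Ask the model to list the output rules a prompt imposes, one per line."""
    reply = llm(f"List the output rules in this prompt, one per line:\n{prompt}")
    lines = (line.strip().lstrip("-").strip() for line in reply.splitlines())
    return [line for line in lines if line]


def invert_rule(rule: str, llm: LLM) -> str:
    """Ask the model for the opposite of a rule, used to build an 'inverse' test."""
    return llm(f"Rewrite this rule so it states the opposite:\n{rule}").strip()


def generate_test(prompt: str, rule: str, llm: LLM) -> str:
    """Ask the model for one input that exercises the given rule."""
    return llm(
        "Write one input for the prompt below that exercises this rule.\n"
        f"Rule: {rule}\nPrompt:\n{prompt}"
    ).strip()


def fake_llm(instruction: str) -> str:
    """Canned replies so the sketch runs without a model."""
    if instruction.startswith("List the output rules"):
        return "- Answer in one sentence.\n- Never mention prices."
    if instruction.startswith("Rewrite this rule"):
        return "Inverse of: " + instruction.splitlines()[-1]
    return "test input for: " + instruction.splitlines()[1]
```

Passing an inverted rule to `generate_test` yields an input meant to tempt the model into breaking the original rule, which is the idea behind the "inverse" tests.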

🧵3/n
March 13, 2025 at 4:57 PM
Key Features:

✅ Specification Extraction: Provides insights into prompt behavior, beyond basic black-box testing

✅ Inverse Rule-Based Testing: Uncovers edge cases to enhance prompt robustness

✅ Automated Compliance Checks: Facilitates prompt portability and informed model selection

🧵2/n
March 13, 2025 at 4:57 PM
Zuck folds 🧵6/6
Follows in Elon's footsteps.
January 7, 2025 at 2:53 PM

Zuck folds 🧵5/6
January 7, 2025 at 2:53 PM
Zuck folds 🧵5/6
Will push back against European censorship regulations with the help of the US government.
January 7, 2025 at 2:53 PM
Zuck folds 🧵3/6
Will move the content review team from California to Texas.
January 7, 2025 at 2:48 PM
Zuck folds 🧵2/6

People wanted less political content in their feeds because it stressed them, so Meta toned it down.
But "it feels like we are in a new era now", so Meta will fill your feed with political content again.
January 7, 2025 at 2:48 PM