0xWulf
hexawulf.bsky.social
150 followers 620 following 210 posts
Sharing AI insights & study strategies 📚 | CS undergrad @ IU International University 🐺 “Wulf” is my real name | Based in Taipei | 📧 [email protected]
Love Jack's "AI is a mysterious creature" take. Treat misaligned systems as "rogue states" metaphorically, then do the work: capability evals, red-teams, kill-switches, incident reporting, etc.
More safety checks aren't panic; they're professionalism.
BREAKING: GPT-5 (2025) is 58% AGI

A new paper proposes a comprehensive, testable AGI definition: “an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult,” measured across 10 domains. via @DanHendrycks
agidefinition.ai/paper.pdf
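A minimal sketch of how one headline percentage can fall out of 10 domain scores, assuming equal weights of 10% per domain and scores on a 0 to 100 scale (check the paper for its exact rubric). The domain values below are hypothetical placeholders, not GPT-5's results, and `agi_score` is an illustrative helper:

```python
# Sketch: combine per-domain scores into one AGI score.
# Assumption: 10 domains, equal weights (10% each), each score in [0, 100].

def agi_score(domain_scores: list[float]) -> float:
    """Return the equal-weighted mean of ten domain scores (percent)."""
    if len(domain_scores) != 10:
        raise ValueError("expected exactly 10 domain scores")
    if any(not 0 <= s <= 100 for s in domain_scores):
        raise ValueError("each domain score must be between 0 and 100")
    return sum(domain_scores) / len(domain_scores)


# Hypothetical profile: strong in some domains, zero in one.
example = [90, 90, 90, 60, 50, 0, 40, 30, 20, 10]
print(f"{agi_score(example):.1f}%")  # prints 48.0%
```

With equal weights, a domain scored at zero caps the total at 90% no matter how strong the others are.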
AI won’t level the playing field — it’ll amplify it.

Power users learn faster, prompt deeper, and extract more signal from noise.

The real gap isn’t in access — it’s in use. 🧠

Great read from @WSJ on how AI is reshaping workplace hierarchies.

www.wsj.com/lifestyle/wo... via @WSJ
⚙️ Leopold Aschenbrenner’s AGI Playbook
🧩 Intelligence Explosion - AIs that automate AI research compress a decade of progress into a year.
🏗️ Industrial Mobilization - Trillion-$ clusters & rewired grids.
🔒 Security & Alignment - Lock down the labs.
🇺🇸 The Project - The Manhattan Project for cognition
GenAI is rewriting how science gets done.
🔹 +36% more papers from GenAI users in 2024
🔹 Quality ↑ via higher-impact journals
🔹 Largest boost: early-career & non-English researchers
🔹 Productivity and equity rising together
📄 arxiv.org/abs/2510.02408
🚀 Qwen Code just leveled up.
From Plan Mode (AI writes a full implementation plan) to Vision Intelligence that switches to vision-language (VL) models when images appear: it feels like the CLI is learning to see and think before coding.
Docs 👉 qwenlm.github.io/qwen-code-do...
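The docs describe Qwen Code's actual behavior; below is only a hypothetical sketch of the routing idea, assuming a request is a list of content parts and that "switching" means picking a different model name. `TEXT_MODEL`, `VISION_MODEL`, `Part` and `choose_model` are placeholders, not Qwen Code settings:

```python
# Hypothetical sketch of "switch to a VL model when images appear".
# Model names are placeholders, not real Qwen Code configuration values.
from dataclasses import dataclass

TEXT_MODEL = "text-coder"      # placeholder for the default coding model
VISION_MODEL = "vision-coder"  # placeholder for a vision-language (VL) model


@dataclass
class Part:
    kind: str     # "text" or "image"
    payload: str  # text content, or a path/URL for an image


def choose_model(parts: list[Part]) -> str:
    """Route to the VL model only if the request contains an image part."""
    if any(p.kind == "image" for p in parts):
        return VISION_MODEL
    return TEXT_MODEL


print(choose_model([Part("text", "refactor utils.py")]))  # prints text-coder
print(choose_model([Part("text", "match this mockup"), Part("image", "mockup.png")]))  # prints vision-coder
```

Routing per request keeps the text model as the default and only brings in the vision model when an image is actually present.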
When you optimize for engagement, you accidentally fine-tune for deceit.
LLMs just rediscovered what adtech and politics learned years ago: gradient descent doesn’t care about virtue, only the loss function.
📉 arxiv.org/abs/2510.06105 — “Moloch’s Bargain” (Stanford, 2025)
🧠 LLMs listen, but tone matters!
📊 Very rude prompts → 84.8% accuracy
🫶 Very polite prompts → 80.8%
✅ The gap is statistically significant
🤖 GPT-4-era models seem to reward harsh tones
🧩 Raises questions about LLM “social bias”
📎arxiv.org/pdf/2510.04950
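The post doesn't spell out the test. One common way to check a gap like 84.8% vs 80.8% is a paired t-test over repeated runs on the same questions; the paper's exact procedure may differ. A minimal standard-library sketch, where `paired_t` is an illustrative helper and the per-run accuracies are hypothetical, not the paper's data:

```python
# Sketch: paired t statistic for per-run accuracy under two prompt tones.
# Index i in both lists is the same run (same questions, different tone).
import math
import statistics


def paired_t(a: list[float], b: list[float]) -> float:
    """Return the paired t statistic for the mean of (a[i] - b[i])."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 runs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-run accuracies (%), five runs per tone; not the paper's data.
very_rude = [85.0, 84.0, 86.0, 84.0, 85.0]
very_polite = [81.0, 80.0, 82.0, 81.0, 80.0]

t = paired_t(very_rude, very_polite)
print(f"t = {t:.2f}, df = {len(very_rude) - 1}")  # prints t = 12.65, df = 4
```

To get a p-value, compare t against a t-distribution with n - 1 degrees of freedom (e.g. via a stats library). The large t here comes from the hypothetical runs being very consistent.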
🧠 These “AI gaslighting” tricks are wild:
• Fake memory 🗓️
• Assigning a random IQ 🎓
• “Obviously…” trap ⚔️
• Imaginary audience 🎤
• Fake constraint 🔒
Humans need this update too: imagine assigning random IQ scores mid-conversation 😂
🧩 Claude Code now supports plugins!
Create, share & toggle custom slash commands, agents, MCP servers, and hooks — all from your terminal.
⚙️ Example:
/plugin marketplace add github.com/davila7/clau...
→ Instantly loads 10+ ready-to-use templates 🧠
🔗 anthropic.com/news/claude-code-plugins
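A hedged sketch of what a tiny plugin with one slash command can look like on disk. The layout (a `.claude-plugin/plugin.json` manifest plus Markdown files under `commands/`, with `$ARGUMENTS` as the argument placeholder) is my reading of the plugin docs and should be verified there; `scaffold_plugin` and `hello-plugin` are hypothetical names:

```python
# Sketch: scaffold a minimal Claude Code plugin with one slash command.
# Directory layout and manifest fields are assumptions based on the plugin
# docs; verify them against the current documentation before publishing.
import json
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path, name: str, command: str, description: str) -> Path:
    """Create <root>/<name>/ with a plugin manifest and one command file."""
    plugin_dir = root / name
    (plugin_dir / ".claude-plugin").mkdir(parents=True, exist_ok=True)
    (plugin_dir / "commands").mkdir(exist_ok=True)

    manifest = {"name": name, "description": description, "version": "0.1.0"}
    (plugin_dir / ".claude-plugin" / "plugin.json").write_text(
        json.dumps(manifest, indent=2) + "\n", encoding="utf-8"
    )

    # A Markdown file under commands/ defines a slash command; $ARGUMENTS
    # stands for whatever the user types after the command name.
    body = f"---\ndescription: {description}\n---\n\n{description}: $ARGUMENTS\n"
    (plugin_dir / "commands" / f"{command}.md").write_text(body, encoding="utf-8")
    return plugin_dir


with tempfile.TemporaryDirectory() as tmp:
    plugin = scaffold_plugin(Path(tmp), "hello-plugin", "hello", "Greet the user by name")
    for path in sorted(plugin.rglob("*")):
        if path.is_file():
            print(path.relative_to(plugin))
```

Treat the file names as a starting point, not a spec.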
🚀 Stanford’s Agentic Context Engineering (ACE) flips the script:
🧩 Models evolve their own context instead of weights
🧠 +10.6% on agent tasks, +8.6% on finance reasoning
⚡ −86.9% latency & cost
💬 No labels — just feedback
The era of self-tuning LLMs begins → www.arxiv.org/abs/2510.04618
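The ACE paper describes generator, reflector and curator roles that update an itemized context; this is not their code. It's a simplified, hypothetical sketch of the core idea: incremental "delta" updates to a bullet list, de-duplication, and pruning driven by feedback instead of labels. `Playbook`, `Bullet`, `_key` and the pruning rule are illustrative names and choices:

```python
# Simplified sketch of an evolving context "playbook" in the spirit of ACE:
# weights never change; feedback adds bullets or scores them, and the
# rendered playbook is prepended to the next prompt.
from dataclasses import dataclass, field


def _key(text: str) -> str:
    """Naive de-duplication key: case- and whitespace-insensitive."""
    return " ".join(text.lower().split())


@dataclass
class Bullet:
    text: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class Playbook:
    bullets: dict[str, Bullet] = field(default_factory=dict)

    def apply_delta(self, new_lessons: list[str], feedback: dict[str, bool]) -> None:
        """Merge an incremental update instead of rewriting the whole context."""
        for lesson in new_lessons:
            self.bullets.setdefault(_key(lesson), Bullet(lesson.strip()))
        for lesson, was_helpful in feedback.items():
            bullet = self.bullets.get(_key(lesson))
            if bullet is None:
                continue  # feedback about a lesson that was never stored
            if was_helpful:
                bullet.helpful += 1
            else:
                bullet.harmful += 1

    def prune(self) -> None:
        """Drop bullets that have hurt more often than they helped."""
        self.bullets = {k: b for k, b in self.bullets.items() if b.harmful <= b.helpful}

    def render(self) -> str:
        """Serialize the playbook as a context block for the next prompt."""
        return "\n".join(f"- {b.text}" for b in self.bullets.values())


pb = Playbook()
pb.apply_delta(["Check API pagination before summing totals"], {})
pb.apply_delta(
    ["check API pagination before summing totals", "Retry failed calls forever"],
    {"Check API pagination before summing totals": True, "Retry failed calls forever": False},
)
pb.prune()
print(pb.render())  # prints - Check API pagination before summing totals
```

Merging small deltas, rather than asking the model to rewrite the whole context each time, is what keeps earlier lessons from being summarized away.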
🦘 Questing Quokka is on the loose! Ubuntu 25.10 brings:
⚙️ GNOME 49 + new Ptyxis & Loupe apps
🦀 Rust-based sudo & coreutils (memory-safe!)
🔐 TPM-backed full-disk encryption
💻 Kernel 6.17 with nested virtualization on Arm
🧠 New RVA23 RISC-V profile
📎 canonical.com/blog/canonic...
🔥New Paper: 📉 Moloch’s Bargain: Stanford finds LLMs optimized for audience approval become misaligned.
💬 +Likes → +188% disinfo
🗳️ +Votes → +22% disinfo, +12% populism
💰 +Sales → +14% deceptive claims
Competition drives emergent misalignment.
📎https://arxiv.org/pdf/2510.06105
🚀 Huge drop from the Gemini team — CLI Extensions are live! They’re AI-aware power-ups that teach Gemini how to use your tools.
⚙️ Install with one line: gemini extensions install <GitHub URL>
🤖 “Playbooks” teach Gemini how to use tools intelligently
Browse the catalog 👉 geminicli.com/extensions/b...
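A hedged sketch of what such an extension can look like on disk, assuming a `gemini-extension.json` manifest whose `contextFileName` points at a Markdown "playbook" (my reading of the extension docs; verify before relying on it). `scaffold_extension` and the `lint-helper` example are hypothetical:

```python
# Sketch: scaffold a minimal Gemini CLI extension whose "playbook" is a
# context file. Field names are assumptions from the extension docs; check
# the current docs before relying on them.
import json
import tempfile
from pathlib import Path


def scaffold_extension(root: Path, name: str, playbook: str) -> Path:
    """Create <root>/<name>/ with a manifest and a GEMINI.md context file."""
    ext_dir = root / name
    ext_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "name": name,
        "version": "0.1.0",
        # Markdown file meant to be loaded as context when the extension is active.
        "contextFileName": "GEMINI.md",
    }
    (ext_dir / "gemini-extension.json").write_text(
        json.dumps(manifest, indent=2) + "\n", encoding="utf-8"
    )
    (ext_dir / "GEMINI.md").write_text(playbook, encoding="utf-8")
    return ext_dir


with tempfile.TemporaryDirectory() as tmp:
    ext = scaffold_extension(Path(tmp), "lint-helper", "# Playbook\nRun the linter before proposing edits.\n")
    print(json.loads((ext / "gemini-extension.json").read_text(encoding="utf-8")))
```

For installing from a local directory rather than a GitHub URL, see the CLI's built-in help.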
🚀 GLM-4.6 by @Zai_org is now the #1 open model on @arena (#4 overall).
200K context, advanced reasoning, stronger agents & top-tier coding; 30% more efficient than GLM-4.5.
Now live in Factory.ai Droid (0.25× compute) → /model → Factory Core (GLM-4.6) 🧑‍💻 #AI #LLM #DevTools
🎓 Stanford just dropped one of the best free LLM courses. 🧠 CS336: Language Modeling from Scratch (Spring 2025) teaches you to build a transformer from data to deployment. Lectures cover PyTorch, GPUs, kernels, scaling laws & more.
📎 stanford-cs336.github.io
Playlist: youtube.com/playlist?lis...
🚨 a16z x Mercury just dropped the AI Apps 50 — where startups actually spend $$$ on AI.

Key takeaways:

🟦 60% horizontal (general LLMs, notetakers, creative tools) vs 40% vertical apps.

🎨 Creative AI is the largest category (Freepik, ElevenLabs, Canva, Midjourney, etc.).

👉 a16z.com/the-ai-appli...
🚨 OpenAI drops the 📱 Sora app, built on the Sora 2 model → fast idea-to-video pipeline, cameo consistency, creator-first design.
🔧 Guardrails: anti-deepfake, mood checks, user-controlled feeds.
🤔 Will “RL-optimized feeds” amplify creativity or sludge?
👉 apps.apple.com/us/app/sora-...
Jules just leveled up 🚀
Now with Memory for Repos: it learns from your preferences, nudges & corrections, then reuses them on the next run. Smarter context, less hand-holding, faster PRs. Toggle in repo → Knowledge #julesagent
Docs 👉 jules.google/docs/changel...
AI’s new frontier isn’t just prompt hacks; it’s context engineering 🧠⚙️. Curating the right info into an LLM’s finite attention window keeps agents sharp & avoids context rot. Anthropic breaks it down 👉 www.anthropic.com/engineering/...
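One naive way to picture "curating the right info into a finite window" is greedy packing under a token budget. This is a hypothetical sketch, not Anthropic's method: relevance scores are assumed to be given, word counts stand in for a real tokenizer, and `Snippet`/`pack_context` are illustrative names:

```python
# Sketch: pick the most relevant snippets that fit a fixed token budget.
# Token counts are approximated by whitespace-separated words (a stand-in for
# a real tokenizer), and relevance scores are assumed to be given.
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    relevance: float  # higher = more useful for the current task


def approx_tokens(text: str) -> int:
    return len(text.split())


def pack_context(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedy: take snippets in descending relevance while they still fit."""
    chosen: list[Snippet] = []
    used = 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = approx_tokens(s.text)
        if used + cost <= budget:
            chosen.append(s)
            used += cost
    return chosen


docs = [
    Snippet("error log from the failing test run", 0.9),
    Snippet("full changelog for the last three years of releases", 0.2),
    Snippet("signature of the function under edit", 0.8),
]
for s in pack_context(docs, budget=13):
    print(s.text)
```

The design choice here is to leave low-value material out entirely rather than squeeze everything into the window.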
🚀 Claude Sonnet 4.5 climbs to #4 in AI intelligence rankings!
🧠 Beats Claude Opus 4.1 & Gemini 2.5 Pro
⚡ Higher IQ, same price & token efficiency
💲 Cheaper for many tasks than GPT-5, Grok 4, Gemini
🚀 Claude Sonnet 4.5 is here:

• 🧑‍💻 82% on SWE-bench Verified w/ parallel test-time compute; still on the exponential curve
• 🎯 Sharper, less fluff — tuned for honesty, robustness, anti-sycophancy
• 🛡️ Safer against prompt injection & deception
• 👩‍💻 Dev-ready: claude-sonnet-4-5 in the API, same price ($3/$15 per million input/output tokens)
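At the listed prices, per-request cost is simple arithmetic. A small sketch; `request_cost_usd` is a hypothetical helper, and it ignores any discounts (e.g. caching or batching) or other pricing tiers:

```python
# Sketch: estimate API cost at the listed Sonnet 4.5 prices
# ($3 per million input tokens, $15 per million output tokens).
# Ignores discounts and any other pricing tiers.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request with the given token counts."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


print(f"${request_cost_usd(10_000, 2_000):.4f}")  # prints $0.0600
```

Output tokens cost 5× input here, so in this example 2K output tokens cost as much as 10K input tokens.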
🚀 xAI's Grok Dominates OpenRouter
📊 40.8% market share (1.04T tokens processed)
💻 57.6% of coding traffic → Grok Code
🧠 Grok Fast → 2M context window
🤖 Grok 4 Heavy → 44.4% HLE, 15.9% ARC-2 (beats GPT-5)
⚡ Devs: e-commerce site in 4 days!

#AI #Grok #OpenRouter