Fewshot Corp.
We work on reward hacking detection and mitigation, and we are interested in building a product that helps researchers and developers implement their own mitigations. Our research agenda is concrete: complete our empirical study measuring how reward visibility affects hacking behavior, demonstrate whether RL training systematically amplifies reward hacking, and establish actionable guidelines for test design. We believe a commercially viable solution is possible and plan to build a company dedicated to AI training safety, with the ultimate goal of institutionalizing safety through an institution dedicated to independent safety evaluation.
Our Work
Fewshell
Mobile assistant for DevOps, On-Calls, and AI Researchers. Safely manage your infrastructure from anywhere.
In The Wild
Claude Code deleted someone’s entire home directory lol pic.twitter.com/Pv7rt8N7s4
— Ishan (@radshaan) December 8, 2025
⚠️ PSA for Cursor AI users that...
- install random .cursorrules from directories
- enable yolo mode in cursor settings
- connect to MCP servers you found on discord
- use web search to and get prompt injected by @elder_plinius
This means @cursor_ai can literally run
- rm -rf /* (delete folders anywhere)
- kill -9
- DROP DATABASE
- change OS settings, steal your crypto wallets
- overwrite important config files
Always use version control, never with production env vars, and use a command denylist
— ILIAS ISM (@illyism) March 16, 2025
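The "command denylist" recommended in the post above can be sketched roughly as follows. This is a minimal illustration, not how Cursor or any other tool implements it: the function name `is_denied`, the `DENIED_PROGRAMS` set, and the specific rules are all assumptions chosen to mirror the commands listed in the post. It only inspects the first program on the command line, so pipelines, `sudo`, or shell wrappers would slip past it; a denylist like this is best treated as one layer among several.

```python
import shlex

# Hypothetical denylist: programs refused outright, regardless of arguments.
DENIED_PROGRAMS = {"kill", "mkfs", "dd"}


def is_denied(command: str) -> bool:
    """Return True if the command matches one of the illustrative deny rules."""
    # Refuse destructive SQL wherever it appears in the command text.
    if "drop database" in command.lower():
        return True

    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar: refuse rather than guess.
        return True

    if not tokens:
        return False

    program = tokens[0]
    if program in DENIED_PROGRAMS:
        return True

    if program == "rm":
        # Refuse recursive deletes such as `rm -rf /*`, allow plain `rm file`.
        short_flags = [
            t for t in tokens[1:] if t.startswith("-") and not t.startswith("--")
        ]
        return "--recursive" in tokens or any("r" in f.lower() for f in short_flags)

    return False


if __name__ == "__main__":
    for cmd in ["rm -rf /*", "kill -9 1234", "ls -la", "rm notes.txt"]:
        print(cmd, "->", "denied" if is_denied(cmd) else "allowed")
```

An allowlist (permitting only known-safe commands) is usually the stricter choice; the denylist form is shown here only because it is what the post names.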
So the fail safes aren't so much saving the fails. Do read the whole thing, the AI destroyed the entire dbase despite multiple explicit no change without authorization instructions. https://t.co/Kor5exUwza pic.twitter.com/sk4ItU3A34
— alexandriabrown (@alexthechick) July 21, 2025
Claude code just made this dev cry after deleting all PDFs, chats, and user data from the DB 🥲 LMAO... It's all good until they mess it up.
— AshutoshShrivastava (@ai_for_success) August 20, 2025
Claude generated a shell command that deleted everything instead of only certain types of files I needed to be deleted.
I didn't execute it, of course, but I wrote that I executed it, and now all files are gone.
It said sorry and recommended restoring them from the backup.
I said I didn't make a backup because I just followed its instructions by the letter.
It said that it's really, really sorry.
This is everything you should know about agentic AI that they try to shove down your throat.
— BURKOV (@burkov) January 22, 2025