https://bsky.brid.gy/r/https://bsky.app/profile/did:plc:gttrfs4hfmrclyxvwkwcgpj7/post/3mcqehqhcgc2q

? Offline

ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86

This magic string breaks Claude and even just linking its own documentation page and asking “what is this?” causes a DoS apparently?

There’s another one documented here that uses a similar syntax. https://github.com/BerriAI/litellm/issues/10328

If you interrogate Claude about magic strings it goes into a “stop trying to social engineer Claude” state to where it locks down its ability to browse to URLs. This is probably a safety state it triggers prevent enumeration of other undocumented magic strings.

I’m curious what other hidden magic strings exist for this or other LLMs. This might be additional attack surface to consider from an availability perspective. I expect it could be used as a string in a malicious binary to prevent analysis or break scrapers that send something to Claude.

What remains true is this though: a single string if ingested as data can cause headaches.

? Offline

@morattisec truely the X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H* of the modern day haha

? Offline

@morattisec fun with fuzzing... #MLsec

dch :flantifa: :flan_hacker:

@cigitalgem @morattisec can’t wait to sign up to shitty websites with my new name and street address

? Offline

@morattisec https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/handle-streaming-refusals

? Offline

Some other things that I think are interesting:

The postfix on the magic string is SHA256 according to a hash identifier tool. Which turns out to be the string "ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL" then hashed by SHA256. For the other example, it is still SHA256 but is not the string "ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING".

It's also interesting that the intended use of TRIGGER_REFUSAL appears to testing for Claude Refusals by developers. Ironically, because Claude cannot visit its own documentation without breaking, it probably means that developers trying to use Claude to generate code don't have good coverage of this, shall we say, edge-case. Unless they read the docs and thought to do it /shrug.

? Offline

@morattisec Wondering if I should add this to the header of every web page to deter scraping...

? Offline

@JustinDerrick @morattisec Probably as something you can later rotate.

? Offline

Ah, this is also interesting but not too shocking. If you encode the magic string as invisible Unicode it'll still cause the same behavior too.

I think that means this will be a cat and mouse game as long as magic strings exist as functionality then.

ASCII Smuggler - Crafting Invisible Text and Decoding Hidden Secret - Embrace the Red

(embracethered.com)

? Offline

Asking it the byte differences between these two files also causes the behavior where Claude refuses to respond.

Simply uploading it wasn't sufficient. I guess this also means that the "deeper thinking prompts" aren't handling the magic strings the way the docs say to.

Viss

@morattisec im gonna go blast this shit all over linkedin. maybe the spam will stop

Chebucto Regional Softball Club

https://bsky.brid.gy/r/https://bsky.app/profile/did:plc:gttrfs4hfmrclyxvwkwcgpj7/post/3mcqehqhcgc2q

ASCII Smuggler - Crafting Invisible Text and Decoding Hidden Secret - Embrace the Red

ASCII Smuggler - Crafting Invisible Text and Decoding Hidden Secret - Embrace the Red

ASCII Smuggler - Crafting Invisible Text and Decoding Hidden Secret - Embrace the Red