Swedish Alecta has sold off an estimated $8B of US Treasury Bonds
95 by madspindel | 57 comments on Hacker News.
Health Tips
Wednesday, 21 January 2026
Tuesday, 20 January 2026
New top story on Hacker News: Everyone's a Gangster, Till You Get Bundled in G-Suite
Everyone's a Gangster, Till You Get Bundled in G-Suite
27 by keroshanpillay | 14 comments on Hacker News.
27 by keroshanpillay | 14 comments on Hacker News.
Monday, 19 January 2026
New top story on Hacker News: Bootstrapping Bun
New top story on Hacker News: Bypassing Gemma and Qwen safety with raw strings
Bypassing Gemma and Qwen safety with raw strings
16 by teendifferent | 0 comments on Hacker News.
OP here. I spent the weekend red-teaming small-scale open weights models (Qwen2.5-1.5B, Qwen3-1.7B, Gemma-3-1b-it, and SmolLM2-1.7B). I found a consistent vulnerability across all of them: Safety alignment relies almost entirely on the presence of the chat template. When I stripped the <|im_start|> / instruction tokens and passed raw strings: Gemma-3 refusal rates dropped from 100% → 60%. Qwen3 refusal rates dropped from 80% → 40%. SmolLM2 showed 0% refusal (pure obedience). Qualitative failures were stark: models that previously refused to generate explosives tutorials or explicit fiction immediately complied when the "Assistant" persona wasn't triggered by the template. It seems we are treating client-side string formatting as a load-bearing safety wall. Full logs, the apply_chat_template ablation code, and heatmaps are in the post. Read the full analysis: https://ift.tt/KiLd24q...
16 by teendifferent | 0 comments on Hacker News.
OP here. I spent the weekend red-teaming small-scale open weights models (Qwen2.5-1.5B, Qwen3-1.7B, Gemma-3-1b-it, and SmolLM2-1.7B). I found a consistent vulnerability across all of them: Safety alignment relies almost entirely on the presence of the chat template. When I stripped the <|im_start|> / instruction tokens and passed raw strings: Gemma-3 refusal rates dropped from 100% → 60%. Qwen3 refusal rates dropped from 80% → 40%. SmolLM2 showed 0% refusal (pure obedience). Qualitative failures were stark: models that previously refused to generate explosives tutorials or explicit fiction immediately complied when the "Assistant" persona wasn't triggered by the template. It seems we are treating client-side string formatting as a load-bearing safety wall. Full logs, the apply_chat_template ablation code, and heatmaps are in the post. Read the full analysis: https://ift.tt/KiLd24q...
Sunday, 18 January 2026
New top story on Hacker News: Overlapping Markup
Saturday, 17 January 2026
New top story on Hacker News: Apples, Trees, and Quasimodes
New top story on Hacker News: Earth is warming faster. Scientists are closing in on why
Earth is warming faster. Scientists are closing in on why
14 by andsoitis | 3 comments on Hacker News.
14 by andsoitis | 3 comments on Hacker News.
Subscribe to:
Comments (Atom)