Benchmarking Self-Hosted LLMs for Offensive Security

📰 Dev.to AI

This article explores the effectiveness of self-hosted Large Language Models (LLMs) in offensive security scenarios, specifically benchmarking local models against the OWASP Juice Shop. Using a minimal harness and basic HTTP tools, the study evaluates models like gemma4:31b, qwen3.5:27b, and devstral-small-2:24b across challenges involving SQL injection, JWT manipulation, and path traversal. The findings indicate that while local models excel at single-step exploit validation—reaching

Published 15 Apr 2026

Full Article

Title: Benchmarking Self-Hosted LLMs for Offensive Security

URL Source: https://dev.to/mark0_617b45cda9782a/benchmarking-self-hosted-llms-for-offensive-security-3jio

Published Time: 2026-04-15T05:45:05Z

Markdown Content:
# Benchmarking Self-Hosted LLMs for Offensive Security - DEV Community
[Skip to content](https://dev.to/mark0_617b45cda9782a/benchmarking-self-hosted-llms-for-offensive-security-3jio#main-content)

[![Image 1: DEV Community](https://media2.dev.to/dynamic/image/quality=100/https://dev-to-uploads.s3.amazonaws.com/uploads/logos/resized_logo_UQww2soKuUsjaOGNB38o.png)](https://dev.to/)

[Powered by Algolia](https://www.algolia.com/developers/?utm_source=devto&utm_medium=referral)

[Log in](https://dev.to/enter?signup_subforem=1)[Create account](https://dev.to/enter?signup_subforem=1&state=new-user)

## DEV Community

![Image 2](https://assets.dev.to/assets/heart-plus-active-9ea3b22f2bc311281db911d416166c5f430636e76b15cd5df6b3b841d830eefa.svg)0 Add reaction

![Image 3](https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg)0 Like ![Image 4](https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg)0 Unicorn ![Image 5](https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg)0 Exploding Head ![Image 6](https://assets.dev.to/assets/raised-hands-74b2099fd66a39f2d7eed9305ee0f4553df0eb7b4f11b01b6b1b499973048fe5.svg)0 Raised Hands ![Image 7](https://assets.dev.to/assets/fire-f60e7a582391810302117f987b22a8ef04a2fe0df7e3258a5f49332df1cec71e.svg)0 Fire

0 Jump to Comments 0 Save Boost

Copy link

Copied to Clipboard

[Share to X](https://twitter.com/intent/tweet?text=%22Benchmarking%20Self-Hosted%20LLMs%20for%20Offensive%20Security%22%20by%20Mark0%20%23DEVCommunity%20https%3A%2F%2Fdev.to%2Fmark0_617b45cda9782a%2Fbenchmarking-self-hosted-llms-for-offensive-security-3jio)[Share to LinkedIn](https://www.linkedin.com/shareArticle?mini=true&url=https%3A%2F%2Fdev.to%2Fmark0_617b45cda9782a%2Fbenchmarking-self-hosted-llms-for-offensive-security-3jio&title=Benchmarking%20Self-Hosted%20LLMs%20for%20Offensive%20Security&summary=This%20article%20explores%20the%20effectiveness%20of%20self-hosted%20Large%20Language%20Models%20%28LLMs%29%20in%20offensive...&source=DEV%20Community)[Share to Facebook](https://www.facebook.com/sharer.php?u=https%3A%2F%2Fdev.to%2Fmark0_617b45cda9782a%2Fbenchmarking-self-hosted-llms-for-offensive-security-3jio)[Share to Mastodon](https://s2f.kytta.dev/?text=https%3A%2F%2Fdev.to%2Fmark0_617b45cda9782a%2Fbenchmarking-self-hosted-llms-for-offensive-security-3jio)

[Share Post via...](https://dev.to/mark0_617b45cda9782a/benchmarking-self-hosted-llms-for-offensive-security-3jio#)[Report Abuse](https://dev.to/report-abuse)

[![Image 8: Mark0](https://media2.dev.to/dynamic/image/width=50,height=50,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3702447%2F0301e2c9-634f-4567-8171-fd5d9da0b3aa.jpg)](https://dev.to/mark0_617b45cda9782a)

[Mark0](https://dev.to/mark0_617b45cda9782a)
Posted on Apr 15

# Benchmarking Self-Hosted LLMs for Offensive Security

[#cybersecurity](https://dev.to/t/cybersecurity)[#infosec](https://dev.to/t/infosec)[#ai](https://dev.to/t/ai)[#llm](https://dev.to/t/llm)

This article explores the effectiveness of self-hosted Large Language Models (LLMs) in offensive security scenarios, specifically benchmarking local models against the OWASP Juice Shop. Using a minimal harness and basic HTTP tools, the study evaluates models like gemma4:31b, qwen3.5:27b, and devstral-small-2:24b across challenges involving SQL injection, JWT manipulation, and path traversal.

The findings indicate that while local models excel at single-step exploit validation—reaching pass rates as high as 98.5%—they falter during complex, multi-step operations such as UNION-based extraction or a

Read full article → ← Back to Reads