From 30 to 60 Tokens/Second: How I Got vLLM Running on 2x RTX 3090

📰 Medium · LLM

Learn how to install and run vLLM on 2x RTX 3090 GPUs to double inference throughput from 30 to 60 tokens/second, a significant performance boost for LLM applications.

Level: advanced · Published 6 May 2026
Action Steps
  1. Install Ubuntu on a machine with 2x RTX 3090
  2. Configure the NVIDIA drivers for optimal performance
  3. Build and install vLLM using the provided instructions
  4. Run vLLM on the configured hardware to achieve 60 tokens/second
  5. Test and validate the performance of vLLM on the 2x RTX 3090 setup
  6. Optimize vLLM settings for further performance improvements
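Steps 4 and 5 boil down to timing a batch of generations and dividing total tokens by elapsed time. A minimal sketch of that measurement follows; the harness itself is generic, and the commented-out vLLM call shows how it would plug in on the 2x 3090 box (the model name is a placeholder assumption, while `tensor_parallel_size=2` is vLLM's standard way to split a model across two GPUs):

```python
import time


def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    """Throughput in tokens/second; rejects non-positive elapsed time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return token_count / elapsed_s


def benchmark(generate, prompts, count_tokens):
    """Time one generate() call over `prompts` and return (outputs, tokens/sec).

    `generate` maps a list of prompts to a list of outputs;
    `count_tokens` maps one output to its generated-token count.
    """
    start = time.perf_counter()
    outputs = generate(prompts)
    elapsed = time.perf_counter() - start
    total = sum(count_tokens(o) for o in outputs)
    return outputs, tokens_per_second(total, elapsed)


# On the actual 2x RTX 3090 machine, the real call would look roughly like
# this (model name is a placeholder, not from the article):
#
#   from vllm import LLM, SamplingParams
#   llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=2)
#   params = SamplingParams(max_tokens=256)
#   outputs, tps = benchmark(
#       lambda ps: llm.generate(ps, params),
#       ["Tell me about GPUs."] * 8,
#       lambda o: len(o.outputs[0].token_ids),
#   )

if __name__ == "__main__":
    # Stub generator so the harness runs without GPUs:
    # pretend each prompt yields 100 tokens.
    outs, tps = benchmark(
        lambda ps: [[0] * 100 for _ in ps], ["p1", "p2"], len
    )
    print(f"{tps:.1f} tokens/s")
```

Measuring over a batch of prompts rather than a single request matters: vLLM's continuous batching is where most of its throughput advantage comes from, so single-prompt numbers understate it.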
Who Needs to Know This

This guide benefits AI engineers and researchers working with large language models: it provides a step-by-step approach to optimizing vLLM performance on high-end consumer hardware.

Key Insight

💡 With the right hardware and configuration, vLLM can double throughput (30 → 60 tokens/second), making it suitable for demanding LLM applications
