Chunking Methods for RAG
📰 Medium · Python
Learn 7 chunking methods for RAG pipelines with real code and a retrieval benchmark to improve your retrieval performance
Action Steps
- Apply fixed-size chunking by slicing text every N characters using the `fixed_size_chunking` function
- Use sentence-based chunking to split text into individual sentences; the `sentence_transformers` library supplies the embedding model (e.g. `all-MiniLM-L6-v2`), not the sentence splitting itself
- Implement sliding window chunking to generate overlapping chunks of text
- Use a library such as `Docling` to convert PDFs to text while keeping tables, headings, and paragraphs intact
- Evaluate the performance of different chunking methods using a retrieval benchmark
- Experiment with other chunking methods such as graph-based or semantic chunking
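The steps above name sentence-based and sliding-window chunking without showing code. Below is a minimal sketch of both on plain strings. It assumes a simple regex sentence splitter, an illustrative choice rather than necessarily the author's (libraries such as NLTK or spaCy handle abbreviations better), and the function names and the `window`/`stride` parameters are hypothetical.

```python
import re

# Assumed, deliberately naive splitter: break on whitespace that follows . ! or ?
# It will mis-split abbreviations such as "e.g. this".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_chunking(text: str) -> list[str]:
    """Hypothetical helper: one chunk per sentence, empty pieces dropped."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def sliding_window_chunking(text: str, window: int = 1000, stride: int = 800) -> list[str]:
    """Hypothetical helper: fixed-size windows that start every `stride` characters.

    Consecutive chunks overlap by `window - stride` characters, so text near a
    boundary appears in two chunks instead of being cut in half.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break  # this window reaches the end; stop before emitting pure-overlap tails
    return chunks


print(sentence_chunking("RAG is useful. Chunking matters! Does it? Yes."))
# ['RAG is useful.', 'Chunking matters!', 'Does it?', 'Yes.']
print(sliding_window_chunking("abcdefghij", window=4, stride=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Setting `stride` equal to `window` reduces the sliding window to plain fixed-size chunking; a smaller stride buys better boundary coverage at the cost of more chunks to embed and store.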
Who Needs to Know This
Machine learning engineers and NLP specialists building RAG pipelines can use this article to optimize their retrieval performance.
Key Insight
💡 Choosing the right chunking method can significantly impact the performance of your RAG pipeline
Share This
Boost your RAG pipeline's performance with 7 chunking methods!
Full Article
Title: Chunking Methods for RAG
URL Source: https://medium.com/@jain.ajanuj/chunking-methods-for-rag-828bc3160b2e?source=rss------python-5
Published Time: 2026-04-12T19:05:02Z
# Chunking Methods for RAG. If you’re building a RAG pipeline… | by Jain Ajanuj | Apr, 2026 | Medium

# Chunking Methods for RAG
[Jain Ajanuj](https://medium.com/@jain.ajanuj?source=post_page---byline--828bc3160b2e---------------------------------------)
8 min read
If you’re building a RAG pipeline, chunking is the step that quietly decides whether your retrieval works or doesn’t. We will cover 7 methods, with real code and a retrieval benchmark at the end.
The document used throughout is _Designing Machine Learning Systems_ by Chip Huyen, parsed with [Docling](https://github.com/DS4SD/docling).
Complete code: [GitHub](https://github.com/ajanujaj/Chunking_Methods)
**The Setup**
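```python
from docling.document_converter import DocumentConverter
from sentence_transformers import SentenceTransformer
# Sentence-embedding model used later for embedding text
model = SentenceTransformer("all-MiniLM-L6-v2")
# Convert the PDF into a structured Docling document (tables, headings, paragraphs)
converter = DocumentConverter()
result = converter.convert("data/DesignMachineLearningSystem.pdf")
```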
Docling handles the PDF-to-text conversion cleanly, keeping tables, headings, and paragraphs intact. From here, each method takes `result.document` and returns a list of text chunks.
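Because every method shares one contract (document in, list of chunks out), the methods can be scored by a single harness. What follows is a minimal sketch of a hit-rate@k benchmark, not necessarily the one the article uses. It assumes each test case is a `(query, expected_answer_substring)` pair, that a hit means the substring appears in one of the top-k retrieved chunks, and that chunkers take plain text (for example from `document.export_to_text()`). The bag-of-words `bow_embed` is a stand-in so the sketch runs without downloading a model; with the setup above one would swap in vectors from `model.encode` and compute cosine similarity on those. All names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable

# A chunker maps plain text to a list of chunks (assumed contract for this sketch).
Chunker = Callable[[str], list[str]]


def bow_embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hit_rate_at_k(text: str, chunker: Chunker,
                  qa_pairs: list[tuple[str, str]], k: int = 3) -> float:
    """Fraction of queries whose top-k chunks contain the expected answer substring."""
    if not qa_pairs:
        return 0.0
    chunks = chunker(text)
    vectors = [bow_embed(c) for c in chunks]  # embed every chunk once
    hits = 0
    for query, answer in qa_pairs:
        q = bow_embed(query)
        # Rank chunk indices by similarity to the query, best first
        ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, vectors[i]), reverse=True)
        if any(answer in chunks[i] for i in ranked[:k]):
            hits += 1
    return hits / len(qa_pairs)


# Usage with two toy chunkers (both hypothetical)
text = ("Data drift changes input distributions. "
        "Feature stores serve features. Monitoring catches drift early.")
qa = [("what is data drift", "input distributions"),
      ("what do feature stores do", "serve features")]


def by_sentence(t: str) -> list[str]:
    return [s for s in t.split(". ") if s]


def fixed_10(t: str) -> list[str]:
    return [t[i:i + 10] for i in range(0, len(t), 10)]


print(hit_rate_at_k(text, by_sentence, qa, k=1))  # 1.0
print(hit_rate_at_k(text, fixed_10, qa, k=3))     # 0.0
```

The fixed-size case scores zero because both answers are longer than 10 characters and therefore never fit inside a single chunk: when an answer is split across a boundary, no embedding model can retrieve it intact.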
## Method 1: Fixed-Size Chunking
The simplest possible approach. Slice the text every N characters.
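```python
def fixed_size_chunking(document, chunk_size=1000):
    text = document.export_to_text()  # flatten the Docling document to plain text
    # Slice every chunk_size characters; the last chunk may be shorter
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks
```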
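Since `fixed_size_chunking` expects a Docling document, here is the same slicing applied to a plain string so its behavior is easy to inspect. The helper name `fixed_size_chunks` and the sample sentence are hypothetical. The output makes the method's main weakness visible: boundaries land wherever the Nth character falls, which is usually mid-word.

```python
def fixed_size_chunks(text: str, chunk_size: int = 1000) -> list[str]:
    """Hypothetical plain-string version of fixed_size_chunking."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Consecutive, non-overlapping slices; the last one may be shorter
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


sample = "Chunking quietly decides whether retrieval works."
print(fixed_size_chunks(sample, chunk_size=10))
# ['Chunking q', 'uietly dec', 'ides wheth', 'er retriev', 'al works.']
```

No text is lost or repeated (joining the chunks gives back the input), but words such as "quietly" and "retrieval" are split across chunks, so neither half carries the full word into its embedding.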
DeepCamp AI