Every RAG tutorial starts the same way.

📰 Medium · RAG

Published 21 Apr 2026

URL Source: https://sahajshukla.medium.com/every-rag-tutorial-starts-the-same-way-0579f0bf74ef?source=rss------rag-5

[Sahaj Shukla](https://sahajshukla.medium.com/?source=post_page---byline--0579f0bf74ef---------------------------------------)

17 min read

![Image 3](https://miro.medium.com/v2/resize:fit:700/1*hlwQvbS5LzipiHn6TOgmfw.png)

Load a PDF. Chunk it into 1,000-character slices. Embed with OpenAI. Drop into Chroma. Pass the top-5 chunks to GPT-4. Ship it.

That’s not RAG. That’s a demo.
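For reference, the tutorial recipe fits in a few dozen lines. This is a minimal sketch, not any particular tutorial's code: `embed` is a hypothetical bag-of-words stand-in for an embedding API such as OpenAI's, a plain Python list plus a sort stands in for Chroma, and the final model call is left out. All function names here (`chunk`, `embed`, `retrieve`, `build_prompt`) are illustrative.

```python
import math
from collections import Counter

CHUNK_SIZE = 1_000  # fixed 1,000-character slices, as in the tutorial recipe
TOP_K = 5           # "pass the top-5 chunks" to the model


def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Cut text into fixed-size character slices, ignoring any structure in it."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding API: a lowercase bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = TOP_K) -> list[str]:
    """Score every chunk against the query (the vector store's job) and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into a prompt; the actual LLM call is omitted."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"
```

End to end it is one line, `build_prompt(q, retrieve(q, chunk(pdf_text)))`, where `pdf_text` is whatever your PDF loader returns. Nothing in it knows what a table, a schema, or a row is.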

Actual enterprise data doesn’t live in PDFs. It lives in:

* A Snowflake warehouse with 2,300 tables
* A Databricks lakehouse ingesting 40 GB of CDC events per hour
* A dozen Kafka topics that the finance team swears by
* An operational Postgres database that’s been mutated 400 times since you started reading this
* And yes, somewhere in SharePoint, a folder of PDFs that nobody owns

If you build a RAG system for this world the way the tutorials tell you to, it will fail in production within a week. Not because the model isn’t smart. Because you’re chunking and embedding the wrong things.
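One concrete way "chunking the wrong things" shows up: flatten warehouse rows to text, slice at 1,000 characters, and the slices stop lining up with records. The sketch below is hypothetical (the table, its column names, and `audit_chunks` are invented for illustration, not taken from the author's system). For each chunk it checks whether the column names survive and whether the slice boundary cuts a row in half.

```python
import csv
import io

CHUNK_SIZE = 1_000


def table_to_text(header: list[str], rows: list[list[str]]) -> str:
    """Flatten a table to CSV text, the way a naive document loader might."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Fixed-size character slicing, same as the tutorial recipe."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def audit_chunks(chunks: list[str], header_line: str) -> list[dict[str, bool]]:
    """Per chunk: does it still carry the column names, and does the
    slice boundary cut its last row in two?"""
    return [
        {
            "has_header": header_line in c,
            "ends_mid_row": not c.endswith("\n"),
        }
        for c in chunks
    ]


# Hypothetical 200-row account table with fixed-width values.
HEADER = ["account_id", "region", "balance_cents"]
ROWS = [[f"ACC{i:05d}", "EMEA", f"{i * 1250:08d}"] for i in range(200)]

text = table_to_text(HEADER, ROWS)
chunks = chunk(text)
report = audit_chunks(chunks, ",".join(HEADER))
for n, r in enumerate(report):
    print(n, r)
```

With this table the text is 4,632 characters, so it yields five chunks. Only the first contains the column names, and every one of the four chunk boundaries lands partway through a row. A retrieved slice can therefore hold a balance with no column label beside it, or an account ID whose balance sits in a chunk that was never retrieved.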

I’ve spent the last year building retrieval systems for regulated financial data — schema-aware NL2SQL for