Every RAG tutorial starts the same way.
URL Source: https://sahajshukla.medium.com/every-rag-tutorial-starts-the-same-way-0579f0bf74ef?source=rss------rag-5
Published Time: 2026-04-21T19:49:12Z

# Every RAG tutorial starts the same way.
[Sahaj Shukla](https://sahajshukla.medium.com/?source=post_page---byline--0579f0bf74ef---------------------------------------)

Load a PDF. Chunk it into 1,000-character slices. Embed with OpenAI. Drop into Chroma. Pass the top-5 chunks to GPT-4. Ship it.
That’s not RAG. That’s a demo.
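For reference, the tutorial recipe boils down to something like the sketch below. It is a minimal, dependency-free illustration, not the author's code: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, a plain Python list stands in for the vector store, and the function names are made up for this example. The point is how little the pipeline knows about the data it slices.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 1000) -> list[str]:
    """Cut text into fixed-size character slices, ignoring any structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query (the list is the 'store')."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]
```

In a tutorial, the result of `top_k(question, chunk_text(pdf_text))` is pasted into the prompt and sent to the model. Nothing in this flow looks at schemas, freshness, or ownership of the underlying data.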
Actual enterprise data doesn’t live in PDFs. It lives in:
* A Snowflake warehouse with 2,300 tables
* A Databricks lakehouse ingesting 40 GB of CDC events per hour
* A dozen Kafka topics that the finance team swears by
* An operational Postgres that’s mutated 400 times since you started reading this
* And yes, somewhere in SharePoint, a folder of PDFs that nobody owns
If you build a RAG system for this world the way the tutorials tell you to, it will fail in production within a week. Not because the model isn’t smart. Because you’re chunking and embedding the wrong things.
I’ve spent the last year building retrieval systems for regulated financial data — schema-aware NL2SQL for