Building a RAG Evaluation Harness That Actually Catches Problems
📰 Dev.to · Shiva Shrestha
I shipped a RAG chatbot without measurement, then built a proper eval harness. Hit@1 went from 60% to 80%, hallucination dropped from 41% to 28%, and two metrics still fail. Here's the whole story.
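For readers unfamiliar with the headline metric, here is a minimal sketch of how Hit@1 is commonly computed: the fraction of queries whose expected document is the top retrieved result. The function name `hit_at_k`, the document ids, and the toy data are all hypothetical, and the post's harness may define the metric differently.

```python
from typing import Sequence


def hit_at_k(
    ranked_ids: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of queries whose gold document appears in the top-k results."""
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("need exactly one gold id per query")
    if not gold_ids:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k]
    )
    return hits / len(gold_ids)


# Hypothetical toy data: five queries, retriever output ranked best-first.
retrieved = [["d1", "d7"], ["d4", "d2"], ["d3", "d9"], ["d8", "d5"], ["d6", "d0"]]
gold = ["d1", "d2", "d3", "d5", "d6"]

print(hit_at_k(retrieved, gold, k=1))  # 3 of 5 top-ranked results match -> 0.6
print(hit_at_k(retrieved, gold, k=2))  # every gold id is within the top 2 -> 1.0
```

Tracking the same function at several values of `k` helps separate "the right document was never retrieved" from "it was retrieved but ranked too low".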