Your eval criteria are code. Version them like code.
📰 Medium · LLM
A judge prompt is an implementation. The criterion it encodes is a contract, and ours rotted for three months before anyone noticed. Continue reading on Medium »
A judge prompt is an implementation. The criterion it encodes is a contract, and ours rotted for three months before anyone noticed. Continue reading on Medium »