Content for building-an-evaluation-harness-for-production-ai-agents-a-12-metric-framework-from-100-deployments will be dynamically loaded here.