Calculate the agreement between human ground-truth labels and machine labels for a text LLM-judge metric, then analyze transcripts and reviewer notes to propose an improved metric prompt. Works on one metric at a time.
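
The description does not say which agreement statistic the skill computes. As a rough illustration only, the sketch below assumes categorical labels and shows two common choices: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and the sample labels are hypothetical and are not taken from the skill itself.

```python
from collections import Counter


def percent_agreement(human, machine):
    """Fraction of items where the human and machine labels match."""
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohens_kappa(human, machine):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    p_observed = percent_agreement(human, machine)
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    # Chance agreement: probability both raters pick the same label
    # if each labeled independently according to its own label frequencies.
    p_expected = sum(
        human_counts[label] * machine_counts[label] for label in human_counts
    ) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example data for one metric.
human_labels = ["pass", "pass", "fail", "fail"]
machine_labels = ["pass", "fail", "fail", "fail"]

agreement = percent_agreement(human_labels, machine_labels)
kappa = cohens_kappa(human_labels, machine_labels)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

Kappa is often preferred over raw agreement when one label dominates, because a judge that always outputs the majority label can score high raw agreement while adding no information.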