So many evals, but how long will they be used?

This week, I gave myself homework: explore the different tests that LLMs are evaluated on today. I found the answer and thought:

Great! A lookup table. I can bookmark this for the next time someone refers to an eval score.

There were many, so I asked Claude:

Lesson: "Boil it down for me" is a completely valid request for an LLM. Instead of 100+ evals, I ended up with 16, bucketed into categories.


Women in Tech Global Conference 2025