Lightweight AI Evaluation

11 December 2024

For quick and easy evaluation or comparison of AI responses in .NET applications, particularly in tests, we can leverage autoevals' excellent 'LLM-as-a-Judge' prompts with the help of Semantic Kernel.

Sample code

Note that you need to set up Semantic Kernel with chat completion first. It is also recommended to set 'Temperature' to 0, so that the judge's scores are as repeatable as possible between runs.
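For reference, a minimal sketch of that setup. It assumes the OpenAI connector from the Microsoft.SemanticKernel package. The model id and the OPENAI_API_KEY environment variable are placeholders chosen for illustration, and any other chat completion connector should work the same way. The variable names match the ones used in the sample.

```csharp
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Assumption: the API key is supplied through an environment variable.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first.");

// Register a chat completion service on the kernel.
// "gpt-4o-mini" is a placeholder model id; use whichever model you have access to.
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(modelId: "gpt-4o-mini", apiKey: apiKey);
var kernel = builder.Build();

// Temperature 0 reduces sampling randomness in the judge's answers.
var executionSettings = new OpenAIPromptExecutionSettings
{
    Temperature = 0
};
```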

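// The top-level key ("humor") names the evaluation to run;
// "output" holds the AI response to be judged.
var json =
    """
    {
        "humor" : {
            "output" : "this maybe funny"
        }
    }
    """;

// Enumerate the results asynchronously; each entry is keyed by evaluation name.
await foreach (var result in
        kernel.Run(json, executionSettings: executionSettings))
{
    Console.WriteLine($"[{result.Key}]: result: {result.Value?.Item1}, score: {result.Value?.Item2}");
}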

Source

While Microsoft.Extensions.AI.Evaluation is in the making, it currently involves a little too much 'ceremony' for simple use cases.