Kernel Memory with Azure AI Vision

30 November 2024

Extracting text from images for Retrieval-Augmented Generation (RAG) is a common task. Kernel Memory supports OCR functionality out of the box.

What is less obvious is that importing images only works once an IOcrEngine implementation has been registered with Kernel Memory.

Kernel Memory ships with only one OCR extension, for Azure AI Document Intelligence. Here is an example of integrating Azure Computer Vision (now branded Azure AI Vision) into Kernel Memory document ingestion instead.

Create an Azure Computer Vision resource.

Note that different Azure regions support different visual features, such as Caption. Only Read is required for this sample.

Implement IOcrEngine

The interface has only one method, and implementing it is relatively straightforward:

public async Task<string> ExtractTextFromImageAsync(Stream imageContent, CancellationToken cancellationToken = default)
{
    // Buffer the incoming stream so it can be sent to the service as binary data.
    var imageData = await BinaryData.FromStreamAsync(imageContent, cancellationToken);

    var result = await _imageAnalysisClient.AnalyzeAsync(
        imageData,
        VisualFeatures.Read, // Check regions for feature support
        new ImageAnalysisOptions { GenderNeutralCaption = true }, // Only affects caption features, not Read
        cancellationToken);

    var buffer = new StringBuilder();
    if (result.HasValue)
    {
        foreach (var block in result.Value.Read.Blocks)
        {
            // A blank line separates consecutive text blocks.
            buffer.AppendLine();
            foreach (var line in block.Lines)
            {
                buffer.AppendLine(line.Text);
            }
        }
    }

    return buffer.ToString();
}

Register OCR with Kernel Memory

memoryBuilder.WithCustomImageOcr<AzureImageToText>();
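
To show how the pieces might fit together, here is a minimal end-to-end sketch. It is not the author's sample: the constructor parameters, the environment variable names, the file name receipt.png, and the use of OpenAI defaults for embeddings and text generation are all assumptions made for illustration. It registers an instance through the WithCustomImageOcr(IOcrEngine) overload rather than the generic form, so that the endpoint and key can be passed in directly, and it condenses the text extraction into a single LINQ expression (without the blank line between blocks). Implicit usings are assumed to be enabled.

```csharp
using Azure;
using Azure.AI.Vision.ImageAnalysis;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.DataFormats;

// Hypothetical configuration: these environment variable names are assumptions.
var visionEndpoint = new Uri(Environment.GetEnvironmentVariable("VISION_ENDPOINT")!);
var visionKey = Environment.GetEnvironmentVariable("VISION_KEY")!;
var openAiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;

// Build an in-process memory instance with the custom OCR engine registered.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(openAiKey) // assumption: any supported LLM/embedding setup would do
    .WithCustomImageOcr(new AzureImageToText(visionEndpoint, visionKey))
    .Build<MemoryServerless>();

// Importing an image sends it through the registered IOcrEngine during ingestion.
await memory.ImportDocumentAsync("receipt.png", documentId: "receipt-001");

var answer = await memory.AskAsync("What is the total on the receipt?");
Console.WriteLine(answer.Result);

// Sketch of the OCR engine class; the constructor shape is hypothetical.
public sealed class AzureImageToText : IOcrEngine
{
    private readonly ImageAnalysisClient _imageAnalysisClient;

    public AzureImageToText(Uri endpoint, string apiKey)
    {
        _imageAnalysisClient = new ImageAnalysisClient(endpoint, new AzureKeyCredential(apiKey));
    }

    public async Task<string> ExtractTextFromImageAsync(
        Stream imageContent,
        CancellationToken cancellationToken = default)
    {
        var imageData = await BinaryData.FromStreamAsync(imageContent, cancellationToken);

        // Only the Read feature is requested, so no caption options are needed.
        var result = await _imageAnalysisClient.AnalyzeAsync(
            imageData,
            VisualFeatures.Read,
            cancellationToken: cancellationToken);

        // Flatten all blocks into one line of text per detected line.
        var lines = result.Value.Read.Blocks
            .SelectMany(block => block.Lines)
            .Select(line => line.Text);

        return string.Join(Environment.NewLine, lines);
    }
}
```

With the generic form WithCustomImageOcr<AzureImageToText>(), the class is resolved from the dependency injection container instead, so whatever its constructor needs would have to be registered there as well.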

Sample code here