Kernel Memory with Azure AI Vision
November 2024
Extracting text from images is a common task when building Retrieval-Augmented Generation (RAG) pipelines, and Kernel Memory supports OCR during document ingestion out of the box.
What is less obvious is that importing images requires an IOcrEngine implementation to be registered with Kernel Memory.
Kernel Memory only ships with an extension for Azure AI Document Intelligence. Here is an example of integrating Azure Computer Vision (Azure AI Vision) into Kernel Memory document ingestion instead.
Create an Azure Computer Vision resource
Note that different Azure regions support different visual features; Caption, for example, is only available in some regions. This sample only requires the Read feature.
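Once the resource exists, its endpoint and key are all the Image Analysis SDK needs. The sketch below builds an `ImageAnalysisClient` from those two values. The environment variable names `VISION_ENDPOINT` and `VISION_KEY` and the `VisionClientFactory` class are hypothetical; read the values from whatever configuration source your application already uses.

```csharp
using System;
using Azure;
using Azure.AI.Vision.ImageAnalysis;

public static class VisionClientFactory
{
    // Hypothetical helper: VISION_ENDPOINT and VISION_KEY are assumed names,
    // not something Kernel Memory or the Azure SDK looks for.
    public static ImageAnalysisClient FromEnvironment()
    {
        string endpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT")
            ?? throw new InvalidOperationException("VISION_ENDPOINT is not set");
        string key = Environment.GetEnvironmentVariable("VISION_KEY")
            ?? throw new InvalidOperationException("VISION_KEY is not set");

        // Endpoint is the resource URL shown in the portal,
        // e.g. https://<resource-name>.cognitiveservices.azure.com/
        // Constructing the client does not call the service.
        return new ImageAnalysisClient(new Uri(endpoint), new AzureKeyCredential(key));
    }
}
```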
Implement IOcrEngine
The interface has a single method, so the implementation is relatively straightforward:
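using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Azure.AI.Vision.ImageAnalysis;
using Microsoft.KernelMemory.DataFormats;

public class AzureImageToText : IOcrEngine
{
    private readonly ImageAnalysisClient _imageAnalysisClient;

    // Assumption: the client is supplied through constructor injection
    public AzureImageToText(ImageAnalysisClient imageAnalysisClient) => _imageAnalysisClient = imageAnalysisClient;

    public async Task<string> ExtractTextFromImageAsync(Stream imageContent, CancellationToken cancellationToken = default)
    {
        var imageData = await BinaryData.FromStreamAsync(imageContent, cancellationToken);
        var result = await _imageAnalysisClient.AnalyzeAsync(
            imageData,
            VisualFeatures.Read, // Read (OCR) is the only feature needed; check regions for feature support
            new ImageAnalysisOptions { GenderNeutralCaption = true }, // only affects captions, harmless here
            cancellationToken);

        var buffer = new StringBuilder();
        if (result.HasValue && result.Value.Read is not null)
        {
            // Blank line before each text block, then one output line per detected line
            foreach (var block in result.Value.Read.Blocks)
            {
                buffer.AppendLine();
                foreach (var line in block.Lines)
                {
                    buffer.AppendLine(line.Text);
                }
            }
        }

        return buffer.ToString();
    }
}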
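The nested loop in `ExtractTextFromImageAsync` only flattens detected text blocks into plain text, with a blank line before each block. Factoring that step into a pure function (a hypothetical refactoring; `OcrTextFormatter` and `JoinBlocks` are not part of the original class) makes the output layout testable without calling Azure:

```csharp
using System.Collections.Generic;
using System.Text;

public static class OcrTextFormatter
{
    // Hypothetical helper mirroring the loop above.
    // Each inner sequence is one detected text block; each string is one line of text.
    public static string JoinBlocks(IEnumerable<IEnumerable<string>> blocks)
    {
        var buffer = new StringBuilder();
        foreach (var block in blocks)
        {
            buffer.AppendLine(); // blank line separates text blocks
            foreach (var line in block)
            {
                buffer.AppendLine(line);
            }
        }
        return buffer.ToString();
    }
}
```

Inside the engine it could be called as `OcrTextFormatter.JoinBlocks(result.Value.Read.Blocks.Select(b => b.Lines.Select(l => l.Text)))` (with `using System.Linq;`), keeping the service call and the text layout as separate concerns.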
Register OCR with Kernel Memory
memoryBuilder.WithCustomImageOcr<AzureImageToText>();
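For context, here is a minimal end-to-end sketch of a serverless Kernel Memory instance that uses the OCR engine while importing an image. It assumes `AzureImageToText` takes an `ImageAnalysisClient` through its constructor, uses OpenAI for text generation and embeddings, and relies on hypothetical environment variable names (`VISION_ENDPOINT`, `VISION_KEY`, `OPENAI_API_KEY`), a hypothetical file `scanned-invoice.png`, and a hypothetical `KernelMemoryOcrDemo` class.

```csharp
using System;
using System.Threading.Tasks;
using Azure;
using Azure.AI.Vision.ImageAnalysis;
using Microsoft.KernelMemory;

public static class KernelMemoryOcrDemo
{
    public static async Task RunAsync()
    {
        // Hypothetical environment variable names
        var visionClient = new ImageAnalysisClient(
            new Uri(Environment.GetEnvironmentVariable("VISION_ENDPOINT")!),
            new AzureKeyCredential(Environment.GetEnvironmentVariable("VISION_KEY")!));

        var memory = new KernelMemoryBuilder()
            .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
            // Assumption: AzureImageToText accepts the client via its constructor
            .WithCustomImageOcr(new AzureImageToText(visionClient))
            .Build<MemoryServerless>();

        // During ingestion, image files are handed to the registered IOcrEngine
        await memory.ImportDocumentAsync("scanned-invoice.png", documentId: "invoice-001");

        var answer = await memory.AskAsync("What is the invoice total?");
        Console.WriteLine(answer.Result);
    }
}
```

Passing an instance to `WithCustomImageOcr` avoids registering `ImageAnalysisClient` in the service container; the generic `WithCustomImageOcr<AzureImageToText>()` form instead leaves construction of `AzureImageToText` to dependency injection, so whatever its constructor needs must be resolvable there.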