Hi there, I am Johnny

Kernel Memory default tokenizer

December 2024

Choosing the right tokenizer in AI applications is crucial because it directly impacts accuracy and efficiency. When configuring Kernel Memory, there is an optional 'tokenizer' parameter for both text generation and embedding.

If no 'tokenizer' is specified, [Kernel Memory](https://github.com/microsoft/kernel-memory) attempts to pick a default one. It does this based on the AI model name (the Deployment name in Azure OpenAI), as its TokenizerFactory shows.
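To illustrate the idea, here is a minimal Python sketch of name-based tokenizer selection. The prefix table and encoding names are illustrative assumptions, not Kernel Memory's actual source (which is C#); the point is that the lookup matches by model-name prefix, so a deployment name that starts with the model name resolves correctly.

```python
# Illustrative sketch of prefix-based tokenizer selection, mirroring the
# idea behind Kernel Memory's TokenizerFactory. The mapping below is a
# hypothetical example, not the library's real table.

MODEL_PREFIX_TO_ENCODING = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5": "cl100k_base",
    "text-embedding-3": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
}

def pick_encoding(model_or_deployment_name: str) -> str:
    """Pick a tokenizer encoding by matching the longest known model prefix."""
    name = model_or_deployment_name.lower()
    # Longest prefix wins, so "gpt-4o-bot" matches "gpt-4o" before "gpt-4".
    for prefix in sorted(MODEL_PREFIX_TO_ENCODING, key=len, reverse=True):
        if name.startswith(prefix):
            return MODEL_PREFIX_TO_ENCODING[prefix]
    return "cl100k_base"  # fallback when nothing matches
```

With this scheme, a deployment named 'gpt-4o-bot' resolves to the gpt-4o tokenizer, while an unrelated name like 'my-chat-bot' silently falls back to the default.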

My takeaways:

  • Always prefix your deployment name with the actual model name. For example, for the 'gpt-4o' model, the deployment name should start with the model name, like 'gpt-4o-bot'.

  • Leave the 'tokenizer' parameter at its default and let Kernel Memory pick one automatically.
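The naming advice above can be turned into a quick sanity check before deploying. This is a hypothetical helper (the prefix list is an assumption, not an official one) that flags deployment names the name-based lookup cannot infer a tokenizer from.

```python
# Hypothetical sanity check for Azure OpenAI deployment names, so that a
# name-based tokenizer lookup (as in Kernel Memory) can infer the model.
# The prefix list below is illustrative, not exhaustive.

KNOWN_MODEL_PREFIXES = (
    "gpt-4o",
    "gpt-4",
    "gpt-3.5",
    "text-embedding-3",
    "text-embedding-ada-002",
)

def deployment_name_is_inferable(deployment: str) -> bool:
    """Return True when the deployment name starts with a known model name."""
    return deployment.lower().startswith(KNOWN_MODEL_PREFIXES)
```

For example, 'gpt-4o-bot' passes the check, while 'my-chat-bot' fails, which means the library would fall back to a default tokenizer instead of the one matching the model.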