
ᐁ ᐋᐦ ᐊᓂᒫ

Innovation Lab

Our Innovation Lab is located in Winnipeg, Manitoba, with a satellite station in Cross Lake, Manitoba.

Applications

Imagine watching any of your favourite movies in Cree! Imagine getting live translations of your auntie's gossip! Our goal is a model that generates high-quality Cree to support tasks such as transcription and transliteration, as well as applications in speech technology, education, and cultural knowledge systems.


Tokenization

To support accurate representation of the Cree language, AI Anima is developing a tokenization approach that works across audio, Cree syllabics, and standardized Roman orthography. By learning directly from Cree speech, the system captures meaningful sound patterns, including long vowels and prosody, that are not always reflected in writing. These audio-based units are aligned with sub-word patterns in both scripts, such as stems, prefixes, and suffixes, allowing the model to handle the natural complexity of Cree as a polysynthetic language. This blended method ensures that tokenization reflects how Cree is truly spoken, heard, and written, honouring sound, structure, and cultural expression.
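As a minimal sketch of aligning the two scripts, the snippet below transliterates a few Cree syllabics into standardized Roman orthography (SRO) character by character. The mapping table covers only the handful of characters needed for the example, and a production system would learn these alignments from audio and text rather than hard-code them:

```python
# Illustrative subset of a syllabics-to-SRO mapping. Real tokenization would
# be learned jointly from speech and both scripts; this table is only a sketch.
SYLLABICS_TO_SRO = {
    "ᐁ": "ê", "ᐋ": "â", "ᐊ": "a", "ᐃ": "i", "ᐅ": "o",
    "ᓇ": "na", "ᓂ": "ni",
    "ᒪ": "ma", "ᒥ": "mi", "ᒫ": "mâ",
    "ᐦ": "h",  # word-final h
}

def syllabics_to_sro(text: str) -> str:
    """Transliterate syllabics to SRO, passing unknown characters through."""
    return "".join(SYLLABICS_TO_SRO.get(ch, ch) for ch in text)

print(syllabics_to_sro("ᐊᓂᒫ"))  # prints: animâ
```

Note that the circumflexed vowels (â, ê) carry the long-vowel distinction that, as described above, is marked in speech and syllabics but can be lost in casual Roman-orthography writing.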

Speech Processing

We work with fluent speakers and knowledge keepers to gather high-quality Cree audio recordings that reflect the richness of everyday speech and our regional dialect. These recordings form the heart of our language model, capturing sounds, rhythm, and expression that written systems alone cannot fully preserve. Using advanced speech-processing technology, we transform these recordings into secure, structured datasets that support language learning, transcription, and future research. This approach allows us to document spoken Cree while respecting cultural protocols and community ownership of data. By combining modern tools with traditional knowledge, we help ensure that spoken Cree is preserved, understood, and accessible for generations to come.
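One way to picture a "secure, structured dataset" is a record per recording that ties the audio to its transcripts and consent terms. The field names and values below are hypothetical placeholders, not the lab's actual schema:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical structure for one entry in a speech dataset; field names and
# values are placeholders for illustration, not a published schema.
@dataclass
class RecordingEntry:
    speaker_id: str            # anonymized speaker identifier
    dialect: str               # regional dialect label
    audio_path: str            # path to the source recording
    transcript_syllabics: str  # transcript in Cree syllabics
    transcript_sro: str        # transcript in standardized Roman orthography
    consent_scope: str         # cultural-protocol / data-ownership flag

entry = RecordingEntry(
    speaker_id="spk-001",
    dialect="example-dialect",
    audio_path="audio/spk-001_0001.wav",
    transcript_syllabics="ᑖᓂᓯ",
    transcript_sro="tânisi",
    consent_scope="community-research-only",
)
print(json.dumps(asdict(entry), ensure_ascii=False))
```

Keeping an explicit consent field in every record is one concrete way to carry community ownership and cultural protocols through to the data pipeline itself.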

[Image: Praat speech-analysis view]

Model Training

We take a Cree-first approach to model development, ensuring that the core of the system is grounded in Cree language rather than adapted from English. After assembling curated datasets of Cree text and speech, we tokenize the language, capturing both Cree syllabics and standardized Roman orthography, to create structured input for the model. We then train transformer-based architectures that learn statistical patterns in the language, including morphology, syntax, and oral expression.
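Learning statistical patterns works through next-token prediction: the model repeatedly sees a context and is trained to predict the token that follows. A minimal sketch of building such (context, target) pairs from one tokenized word is below; the morpheme-level segmentation of nikî-nipân ("I slept") is illustrative:

```python
def next_token_pairs(tokens, context_size=3):
    """Build (context, target) pairs for next-token prediction training."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((tuple(context), tokens[i]))
    return pairs

# Illustrative morpheme-level tokens for nikî-nipân ("I slept"):
# ni- (1st person) + kî- (past) + nipâ (sleep) + -n (ending).
tokens = ["ni", "kî-", "nipâ", "-n"]
for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```

Because Cree is polysynthetic, a single word yields several such pairs, which is exactly how morphological patterns like person and tense prefixes become learnable statistics.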

 

Our workflow begins with pre-processing and normalization, followed by supervised and/or self-supervised training. The core model is trained on Cree-only data to establish Cree-centric linguistic priors. Additional capabilities, such as bilingual comprehension, are added later through modular extension (e.g., adapter layers) without altering the Cree foundation. Throughout development, we monitor linguistic performance using metrics such as perplexity, dialect coverage, and morphological fidelity. We apply best practices including checkpointing, regularization, and controlled fine-tuning to support model stability and accuracy, while safeguarding cultural and linguistic integrity.
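Of the metrics above, perplexity has a compact definition: the exponential of the average negative log-likelihood the model assigns to held-out tokens (lower is better). A minimal sketch, with made-up per-token probabilities standing in for real model output:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities a model assigned to four held-out tokens.
probs = [0.25, 0.5, 0.125, 0.5]
print(round(perplexity(probs), 3))  # prints: 3.364
```

A model that assigned every token probability 0.5 would score a perplexity of exactly 2, so intuitively perplexity measures how many equally likely choices the model is "hesitating" between at each step.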
