Train Large Language Model from Scratch
Using monolingual Cree text, the Cree syllabics tokenizer, speech–text pairs, and Cree–English pairs together, a multimodal LLM is trained from scratch. The monolingual data builds native Cree fluency; the pairs align it bilingually and enable audio understanding.
✦ Multimodal · Trained from scratch