Akusanat:About

From Akusanat
Revision as of 20:21, 25 March 2024 by SyncBot (First, without literary links.)

Akusanat is one part of an attempt to bring Open Language technology to minority Uralic and other threatened languages. Its original content derives from bilingual and multilingual glossing dictionaries produced during the two-year project `Creation of Morphological Parsers for Minority Finno-Ugrian Languages', funded through the Kone Foundation «Language Programme» in 2013–2014; see [1](https://rueter.github.io/aku).

During the `Creation of Morphological Parsers for Minority Finno-Ugrian Languages' project, open-source dictionaries were developed for Livonian (liv), Olonets-Karelian (olo), Moksha (mdf), Hill Mari (mrj) and Nenets (yrk). In addition, glossing dictionaries were introduced for Erzya (myv), Komi-Zyrian (kpv) and Udmurt (udm). Because these assets were shared directly with what is now the [2](GiellaLT) infrastructure, other dictionary pairs were brought here as well.

The purpose of the multilingual dictionary at Akusanat (AKU + sanat `words') was to provide several downloadable formats. First, the glossing dictionaries carried notations assigning inflectional types to words of each class, which made it practical to download `LEMMA + PoS + STEM + INFLECTION' segments for use in the root .lexc files of finite-state descriptions. Second, glossing dictionaries could be downloaded as XML files for the online morphology-savvy dictionaries maintained at GiellaLT. Third, translation-pair dictionaries could be downloaded for use in open-source [3](Apertium) shallow-transfer rule-based machine translation. Fourth, the dictionary could be developed as an editing platform with LaTeX download of dictionaries, for example the Finnish-Skolt Saami dictionary at [4](Verdd). Downloadability also came to mean that each format should have a corresponding upload.
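As a rough illustration of the first and third download formats, the sketch below turns one dictionary record into a lexc-style line and an Apertium bidix entry. The `Entry` class, the field names, the continuation lexicon `N_KUDO` and the sample record are hypothetical and chosen only for illustration; the exact layout Akusanat exports is not documented here, so the `LEMMA+PoS:STEM INFLECTION ;` arrangement is an assumption. The `<e><p><l>…</l><r>…</r></p></e>` shape follows the general Apertium bilingual dictionary format.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Entry:
    """One glossing-dictionary record (hypothetical field names)."""
    lemma: str        # dictionary headword
    pos: str          # part-of-speech tag, e.g. "N"
    stem: str         # stem handed to the inflection lexicon
    inflection: str   # continuation lexicon naming the inflection type
    translation: str  # headword in the target language


def lexc_line(e: Entry) -> str:
    """Format a record as a lexc-style line (layout is an assumption)."""
    return f"{e.lemma}+{e.pos}:{e.stem} {e.inflection} ;"


def bidix_entry(e: Entry) -> str:
    """Format a record as one Apertium bidix <e> element.

    The same lower-cased PoS tag is used on both sides; real dictionaries
    may need a tag mapping and special handling of multiword lemmas.
    """
    tag = f"<s n={quoteattr(e.pos.lower())}/>"
    return (
        f"<e><p><l>{escape(e.lemma)}{tag}</l>"
        f"<r>{escape(e.translation)}{tag}</r></p></e>"
    )


# Illustrative record only, not taken from the Akusanat data.
sample = Entry(lemma="kudo", pos="N", stem="kudo",
               inflection="N_KUDO", translation="talo")
print(lexc_line(sample))
print(bidix_entry(sample))
```

Keeping one record type and several small formatters mirrors the idea that every download format is a view of the same entry, which also makes the corresponding upload a matter of parsing each format back into that record.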

The development of the Verdd dictionary editing platform has seen the introduction of translation predictions for new language pairs, which have been tested, for example, on Livonian, Olonets-Karelian, Finnish and Karelian. The prediction algorithm has been refined, with variables including the use of all translation pairs versus only approved translation pairs. This is expected to simplify the construction of new translation-pair dictionaries (bidix) for work in Apertium, and it is hoped that the work will continue through another Google Summer of Code project.
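The prediction algorithm itself is not described on this page. Purely as an illustration of the "all pairs versus only approved pairs" variable, the sketch below assumes a simple pivot approach: a source word is linked to a target word whenever both are translations of the same word in a shared third language. The `Pair` type, the `predict` function and the `approved_only` flag are hypothetical names, and the pivot technique is an assumption, not the method used in Verdd.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Pair(NamedTuple):
    """One translation pair with its review status (hypothetical)."""
    source: str
    target: str
    approved: bool


def predict(src_to_pivot: Iterable[Pair],
            pivot_to_tgt: Iterable[Pair],
            approved_only: bool = False) -> Dict[str, Set[str]]:
    """Predict source-to-target candidates through a pivot language.

    With approved_only=True, unapproved pairs on either side are ignored.
    """
    # Index the pivot-to-target pairs by pivot word.
    by_pivot: Dict[str, Set[str]] = defaultdict(set)
    for p in pivot_to_tgt:
        if p.approved or not approved_only:
            by_pivot[p.source].add(p.target)

    # Chain each source word through its pivot translations.
    predictions: Dict[str, Set[str]] = defaultdict(set)
    for p in src_to_pivot:
        if p.approved or not approved_only:
            predictions[p.source] |= by_pivot.get(p.target, set())

    # Drop source words for which nothing was predicted.
    return {word: cands for word, cands in predictions.items() if cands}


# Placeholder words only; no real dictionary data is implied.
src_pivot = [Pair("s1", "p1", True), Pair("s2", "p2", False)]
pivot_tgt = [Pair("p1", "t1", True), Pair("p2", "t2", True)]
all_pairs = predict(src_pivot, pivot_tgt)
approved = predict(src_pivot, pivot_tgt, approved_only=True)
```

Under these assumptions, using all pairs yields more candidates, while restricting to approved pairs trades coverage for candidates that rest only on reviewed data.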

Today, the multilingual dictionaries at Akusanat contribute directly to the description and facilitation of minority languages on four continents: Europe, Asia, North America and South America. This asset greatly enhances the work of the Erzya-Moksha Electric Resource And Language Diversity project [5](EMERALD).