
Akusanat is one part of an attempt to bring Open Language technology to minority Uralic and other threatened languages. Its original content derives from bi- and multilingual glossing dictionaries produced during the two-year project `Creation of Morphological Parsers for Minority Finno-Ugrian Languages', funded through the Kone Foundation «Language Programme» in 2013–2014; see [1](https://rueter.github.io/aku).

During the `Creation of Morphological Parsers for Minority Finno-Ugrian Languages' project, open-source dictionaries were developed for Livonian (liv), Olonets-Karelian (olo), Moksha (mdf), Hill Mari (mrj) and Nenets (yrk). Glossing dictionaries for Erzya (myv), Komi-Zyrian (kpv) and Udmurt (udm) were also introduced. Since these assets were shared directly with what is now known as the [2](GiellaLT) infrastructure, other dictionary pairs were brought here as well.

The purpose of the multilingual dictionary at Akusanat (AKU + sanat `words') was to provide multiple downloadables. First, the glossing dictionaries carried notations assigning inflectional types to the different word types, which made it practical to download `LEMMA + PoS + STEM + INFLECTION' segments for use in the root .lexc files of finite-state descriptions. Second, glossing dictionaries could be downloaded as XML files for the online morphology-savvy dictionaries maintained at GiellaLT. Third, translation-pair dictionaries could be downloaded for use in open-source [3](Apertium) shallow-transfer rule-based machine translation. Fourth, the dictionary could be developed as an editing platform for LaTeX download of dictionaries, for example the Finnish-Skolt Saami dictionary at [4](Verdd). Downloadability also came to mean that there should be an analogous upload for each of the formats.
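As a rough illustration of the first and third download targets, the sketch below turns one dictionary record into a lexc-style line and an Apertium-style bidix entry. It is not the Akusanat export code: the `Entry` record, the function names, the sample word and the continuation lexicon name `N_KALA` are all hypothetical, and the placement of the PoS tag on the upper side of the lexc line is an assumption (a real description may add tags in its continuation lexicons instead). The sketch also does not escape characters that are special in lexc.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical record for one glossing-dictionary entry."""
    lemma: str
    pos: str          # part-of-speech tag, e.g. "N"
    stem: str
    contlex: str      # inflection type, used as the continuation lexicon name
    translation: str


def to_lexc_line(entry: Entry) -> str:
    """Format LEMMA + PoS + STEM + INFLECTION as 'upper:lower CONTLEX ;'.

    Assumed layout; no escaping of lexc special characters is done here.
    """
    return f"{entry.lemma}+{entry.pos}:{entry.stem} {entry.contlex} ;"


def to_bidix_entry(entry: Entry) -> str:
    """Build an <e><p><l>..</l><r>..</r></p></e> element for a bidix."""
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    left = ET.SubElement(p, "l")
    left.text = entry.lemma
    ET.SubElement(left, "s", n=entry.pos.lower())
    right = ET.SubElement(p, "r")
    right.text = entry.translation
    ET.SubElement(right, "s", n=entry.pos.lower())
    return ET.tostring(e, encoding="unicode")


if __name__ == "__main__":
    # Illustrative sample only; not taken from the Akusanat data.
    sample = Entry("kala", "N", "kala", "N_KALA", "fish")
    print(to_lexc_line(sample))
    print(to_bidix_entry(sample))
```

Keeping a single record type and deriving each format from it mirrors the idea that every download format should have an analogous upload: a parser for each format would only need to produce the same record.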

The development of the Verdd dictionary editing platform has seen the introduction of translation predictions for new language pairs, which have been tested, for example, on Livonian, Olonets-Karelian, Finnish and Karelian. The prediction algorithm has been refined, with variables including the use of all translation pairs versus only approved translation pairs. This is expected to simplify the construction of new translation-pair dictionaries (bidix) for work in Apertium, and its enhancement will hopefully continue through another Google Summer of Code project.
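The effect of the "all pairs versus approved pairs only" variable can be sketched with a deliberately simple pivot method: if a word in language A is linked to a word in language B, and that word is linked to a word in language C, the A–C pair is proposed. This is a stand-in for illustration, not the prediction algorithm used in Verdd; the `Pair` type, the function name, the `approved_only` flag and the sample words are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical translation pair; each side is (language code, word)."""
    src: tuple[str, str]
    tgt: tuple[str, str]
    approved: bool


def predict_translations(pairs, src_lang, tgt_lang, approved_only=False):
    """Propose src_lang -> tgt_lang word pairs that share a pivot word."""
    # Build an undirected graph of translation links.
    neighbours = defaultdict(set)
    for pair in pairs:
        if approved_only and not pair.approved:
            continue  # ignore links nobody has approved yet
        neighbours[pair.src].add(pair.tgt)
        neighbours[pair.tgt].add(pair.src)

    predictions = set()
    for node, linked in neighbours.items():
        if node[0] != src_lang:
            continue
        for pivot in linked:
            for candidate in neighbours[pivot]:
                # Keep only new links into the target language.
                if candidate[0] == tgt_lang and candidate not in linked:
                    predictions.add((node[1], candidate[1]))
    return predictions


if __name__ == "__main__":
    # Illustrative data only.
    pairs = [
        Pair(("olo", "kala"), ("fin", "kala"), approved=True),
        Pair(("fin", "kala"), ("liv", "kalā"), approved=False),
    ]
    print(predict_translations(pairs, "olo", "liv"))
    print(predict_translations(pairs, "olo", "liv", approved_only=True))
```

With the sample data, using all pairs yields one Olonets-Karelian–Livonian candidate, while restricting the graph to approved pairs yields none. That is the trade-off the variable controls: more candidates against more reliable ones.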

Nowadays, the multilingual dictionaries at Akusanat directly contribute to the description and facilitation of minority languages on four continents: Europe, Asia, North America and South America. This asset greatly enhances the work of the Erzya-Moksha Electric Resource And Language Diversity project [5](EMERALD).

- Alnajjar, K., Hämäläinen, M., Partanen, N., & Rueter, J. (2022). Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages. In S. Moeller, A. Anastasopoulos, A. Arppe, A. Chaudhary, A. Harrigan, J. Holden, J. Lachler, A. Palmer, S. Rijhwani, & L. Schwartz (Eds.), Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages (pp. 139-148). The Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.computel-1.18

- Alnajjar, K., Rueter, J., Partanen, N., & Hämäläinen, M. (2021). Enhancing the Erzya-Moksha dictionary automatically with link prediction. Folia Uralica Debreceniensia, 28, 7-18.

- Alnajjar, K., Hämäläinen, M., Rueter, J., & Partanen, N. (2020). Ve’rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement. In M. Ptaszynski, & B. Ziolko (Eds.), Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations. International Committee on Computational Linguistics.

- Alnajjar, K., Hämäläinen, M., Partanen, N., & Rueter, J. (2019). The Open Dictionary Infrastructure for Uralic Languages. In Электронная Письменность Народов Российской Федерации: Опыт, Проблемы И Перспективы (pp. 49-51). Башкирская энциклопедия.

- Alnajjar, K., Hämäläinen, M., & Rueter, J. (2020). On Editing Dictionaries for Uralic Languages in an Online Environment. In Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages (pp. 26–30). The Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.iwclul-1.4.pdf

- Hämäläinen, M., Alnajjar, K., Rueter, J., Lehtinen, M., & Partanen, N. (2021). An Online Tool Developed for Post-Editing the New Skolt Sami Dictionary. In I. Kosem, M. Cukr, M. Jakubíček, J. Kallas, S. Krek, & C. Tiberius (Eds.), Electronic lexicography in the 21st century (eLex 2021). Proceedings of the eLex 2021 conference (pp. 653-664). Lexical Computing CZ s.r.o.

- Hämäläinen, M., & Rueter, J. (2019). An Open Online Dictionary for Endangered Uralic Languages. In I. Kosem, T. Zingano Kuhn, M. Correia, J. P. Ferreira, M. Jansen, I. Pereira, J. Kallas, M. Jakubíček, S. Krek, & C. Tiberius (Eds.), Electronic lexicography in the 21st century: Proceedings of the eLex 2019 conference (pp. 819-830). (Electronic lexicography in the 21st century). Lexical Computing CZ s.r.o.

- Hämäläinen, M., & Rueter, J. (2018). Advances in synchronized XML-MediaWiki dictionary development in the context of endangered Uralic languages. In J. Čibej, V. Gorjanc, I. Kosem, & S. Krek (Eds.), Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts: 17-21 July 2018, Ljubljana (pp. 967-978). (EURALEX Proceedings). Ljubljana University Press. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/118/211/3000-1

- Hämäläinen, M., Tarvainen, L. L., & Rueter, J. (2018). Combining Concepts and Their Translations from Structured Dictionaries of Uralic Minority Languages. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 862-867). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2018/pdf/364.pdf

- Rueter, J., & Hämäläinen, M. (2019). On XML-MediaWiki Resources, Endangered Languages and TEI Compatibility, Multilingual Dictionaries For Endangered Languages. In M. Gürlek, A. N. Çiçekler, & Y. Taşdemir (Eds.), AsiaLex 2019: Proceedings of the 13th Conference of the Asian Association for Lexicography. Asos Publisher.

- Rueter, J., & Hämäläinen, M. (2017). Synchronized Mediawiki based analyzer dictionary development. In F. M. Tyers, M. Rießler, T. A. Pirinen, & T. Trosterud (Eds.), 3rd International Workshop for Computational Linguistics of Uralic Languages (IWCLUL 2017): St. Petersburg, Russia 23–24 January 2017 (pp. 1-7). Article 2. The Association for Computational Linguistics. https://doi.org/10.18653/v1/w17-0601