Wuraola Oyewusi


Yoruba Tech March 11, 2020

Predict Yorùbá Hymn Lyrics with TensorFlow

Notebook: The notebook for this code can be found here. I didn't clear the code outputs, so anyone who wants to be sure they are following the tutorial correctly can compare their results against them.

The Dataset: It is a collection of 10 Yorùbá hymns in about 260 lines; the first 10 lines are the titles of the hymns. Yorùbá hymns are available online, but without tone marks or diacritics, so I took the time to add the tone marks. Here is a link to the dataset.

Motivation

I like hymns, especially Yorùbá ones. I like Machine Learning and Deep Learning algorithms, especially the ones that work well with sequences. I just got better at doing Natural Language Processing with TensorFlow and boom! Here's this article.

Hymns are songs and poems, and each line has an average of about 8 syllables. Next-word prediction is a popular NLP task, so we'll train a deep learning model on a Yorùbá hymns dataset and see whether it can predict coherent lyrics.

If you're wondering whether the model will learn Yorùbá words with their accents or diacritics: yes, it will.

Generated Yorùbá Hymn Lyrics

Yorùbá hymn lyrics generated by a Bidirectional LSTM in TensorFlow:

olúwa gbà gbà mí ègbè nù kúrọ̀
olùgbàlà gbóhùn mi ko ṣì gbọ́ràn
Ọlọ́run ọ̀rọ̀ rẹ̀ mo figbàgbọ́ rísun
ìṣẹ́gun ni jà re wò re pòrurù
ìyanu mi ba ti jẹ ní gbèsè
gbórí ọ̀rọ̀ rẹ̀ mo figbàgbọ́ rísun
ayọ̀ ńbọ̀ fún mi titi náà ló
ìfẹ́ rẹ̀ ju t'ìyekan lọ sógo
ìfẹ́ ọkàn kò sì ní tán wa
olúwa mi sí ńké pé o ró
ọ̀rẹ́ ayé nkọ̀ wá sílẹ̀ ní
ọ̀rẹ́ òtítọ́ ayé nkọ̀ wá sílẹ̀ ní

Implementation Steps

Loading dataset
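A minimal sketch of this step, assuming the dataset is a UTF-8 plain-text file with one lyric line per line. The filename `yoruba_hymns.txt` and the helper `load_hymns` are hypothetical placeholders, not names from the notebook. Passing `encoding="utf-8"` explicitly matters here because the tone marks are non-ASCII characters.

```python
import os
import tempfile


def load_hymns(path):
    """Hypothetical helper: read the hymn file and return its raw text."""
    # An explicit UTF-8 encoding keeps the tone marks intact on every platform.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Demo: write two stand-in lines to a temporary file, then load them back.
# "yoruba_hymns.txt" is a hypothetical name for the linked dataset file.
sample = "Olúwa gbà gbà mí\nÌfẹ́ ọkàn kò sì ní tán wa\n"
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "yoruba_hymns.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

data = load_hymns(path)
print(data.splitlines())
```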

Data preprocessing
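A sketch of a typical preprocessing pass: lowercase the text, split it into lines, and drop blank lines. The Unicode NFC normalization is my own suggestion rather than something the notebook is known to do. It helps with tone-marked text because the same letter can be stored either as one precomposed code point or as a base letter plus a combining mark, and without normalization those two spellings of a word would become two different tokens. `preprocess` is a hypothetical helper name.

```python
import unicodedata


def preprocess(text):
    """Hypothetical helper: lowercase, NFC-normalize, and return non-empty lines."""
    # NFC composes base letters and combining tone marks where a precomposed
    # code point exists, so visually identical words compare equal.
    text = unicodedata.normalize("NFC", text).lower()
    return [line.strip() for line in text.split("\n") if line.strip()]


first = "Olu\u0301wa"   # "u" followed by a combining acute accent (U+0301)
second = "Ol\u00fawa"   # precomposed "ú" (U+00FA)
raw = first + " gbà mí\n\n" + second + " gbà mí\n"

corpus = preprocess(raw)
print(corpus)
print(corpus[0] == corpus[1])  # True after normalization
```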

Tokenization
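Since the notebook code isn't reproduced here, this is a plain-Python sketch of the idea behind Keras' `Tokenizer`: every word gets an integer id ordered by frequency, starting at 1, and 0 stays free for padding. `build_word_index` is a hypothetical helper. The toy corpus shows why the tone marks matter: `mí` and `mi` end up as different tokens.

```python
from collections import Counter


def build_word_index(lines):
    """Hypothetical helper: most frequent word gets id 1; id 0 is reserved for padding."""
    counts = Counter(word for line in lines for word in line.split())
    # most_common() orders equal counts by first appearance, so ids are deterministic.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


corpus = ["olúwa gbà mí", "olúwa mi"]
word_index = build_word_index(corpus)
total_words = len(word_index) + 1  # +1 for the padding id 0

print(word_index)   # {'olúwa': 1, 'gbà': 2, 'mí': 3, 'mi': 4}
print(total_words)  # 5
```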

Creating sequences
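For next-word prediction, each line is usually turned into a set of growing prefixes (n-gram sequences), so every prefix can later be split into "context" and "next word". A minimal sketch, assuming word ids like those from the tokenization step; `texts_to_sequences` and `make_ngram_sequences` here are hypothetical plain-Python helpers, not the Keras methods.

```python
def texts_to_sequences(lines, word_index):
    """Hypothetical helper: convert each line to word ids, skipping unknown words."""
    return [[word_index[w] for w in line.split() if w in word_index] for line in lines]


def make_ngram_sequences(token_lists):
    """Hypothetical helper: for ids [a, b, c, d] produce [a, b], [a, b, c], [a, b, c, d]."""
    sequences = []
    for tokens in token_lists:
        for i in range(1, len(tokens)):
            sequences.append(tokens[: i + 1])
    return sequences


word_index = {"olúwa": 1, "gbà": 2, "mí": 3}  # toy index for this example
token_lists = texts_to_sequences(["olúwa gbà gbà mí"], word_index)
input_sequences = make_ngram_sequences(token_lists)
print(input_sequences)  # [[1, 2], [1, 2, 2], [1, 2, 2, 3]]
```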

Padding sequences
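The sequences have different lengths, so they are padded to the length of the longest one. Keras' `pad_sequences` pads on the left by default (`padding="pre"`), which keeps the last word of every sequence in the final column, exactly where the next step takes the label from. Below is a plain-Python equivalent for illustration; `pad_pre` is a hypothetical helper.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Hypothetical helper: left-pad with `value` so every sequence has length `maxlen`."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        # Too-long sequences keep their last `maxlen` ids (the words nearest the target).
        s = s[len(s) - maxlen:] if len(s) > maxlen else s
        padded.append([value] * (maxlen - len(s)) + s)
    return padded


input_sequences = [[1, 2], [1, 2, 2], [1, 2, 2, 3]]
max_sequence_len = max(len(s) for s in input_sequences)
padded = pad_pre(input_sequences, max_sequence_len)
print(max_sequence_len)  # 4
print(padded)            # [[0, 0, 1, 2], [0, 1, 2, 2], [1, 2, 2, 3]]
```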

Creating input and labels
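With pre-padded sequences, the inputs are all columns except the last, and the label is the last word id. The labels are then one-hot encoded over the vocabulary, which is what categorical cross-entropy expects; in TensorFlow, `tf.keras.utils.to_categorical` does this step. The sketch below uses hypothetical plain-Python helpers (`split_inputs_labels`, `one_hot`) on toy data.

```python
def split_inputs_labels(padded):
    """Hypothetical helper: inputs are every column but the last; the label is the last id."""
    xs = [seq[:-1] for seq in padded]
    labels = [seq[-1] for seq in padded]
    return xs, labels


def one_hot(labels, num_classes):
    """Hypothetical helper: one-hot encode integer labels."""
    return [[1 if j == label else 0 for j in range(num_classes)] for label in labels]


padded = [[0, 0, 1, 2], [0, 1, 2, 2], [1, 2, 2, 3]]
total_words = 4  # 3 words + 1 for the padding id 0 in this toy example

xs, labels = split_inputs_labels(padded)
ys = one_hot(labels, total_words)
print(xs)      # [[0, 0, 1], [0, 1, 2], [1, 2, 2]]
print(labels)  # [2, 2, 3]
print(ys)      # [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Keeping integer labels and compiling with `sparse_categorical_crossentropy` instead would skip the one-hot step entirely, which saves memory when the vocabulary is large.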

Building the model

Model architecture
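A sketch of an Embedding, Bidirectional LSTM, and softmax architecture in `tf.keras`, matching the "Bidirectional LSTM" mentioned above. The layer sizes (`embedding_dim=64`, `lstm_units=20`) and the toy vocabulary and sequence sizes are placeholders, not the notebook's values, and `build_model` is a hypothetical helper.

```python
import tensorflow as tf


def build_model(total_words, max_sequence_len, embedding_dim=64, lstm_units=20):
    """Hypothetical helper: Embedding -> Bidirectional LSTM -> softmax over the vocabulary."""
    return tf.keras.Sequential([
        # Each input is a padded sequence minus its last word (that word is the label).
        tf.keras.Input(shape=(max_sequence_len - 1,)),
        # Maps each word id to a dense vector; input_dim must cover ids 0..total_words-1.
        tf.keras.layers.Embedding(total_words, embedding_dim),
        # Reads the context in both directions and concatenates the two final states.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units)),
        # One probability per word id (including the padding id 0).
        tf.keras.layers.Dense(total_words, activation="softmax"),
    ])


model = build_model(total_words=300, max_sequence_len=10)  # placeholder sizes
model.summary()

# Sanity check: a batch of 2 all-padding inputs yields 2 probability vectors.
probs = model(tf.zeros((2, 9)))
print(probs.shape)
```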

Compiling the model
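A sketch of compiling such a model, assuming one-hot labels (hence categorical cross-entropy) and the Adam optimizer. The learning rate of 0.01 and the tiny toy model are assumptions for illustration, not values confirmed by the notebook. The final lines run one gradient step on dummy data as a smoke check.

```python
import tensorflow as tf

total_words, max_sequence_len = 5, 4  # toy sizes: 4 words + padding id, sequences of length 4

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_sequence_len - 1,)),
    tf.keras.layers.Embedding(total_words, 8),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
    tf.keras.layers.Dense(total_words, activation="softmax"),
])

model.compile(
    # One-hot targets over the vocabulary -> categorical cross-entropy.
    # With integer labels, "sparse_categorical_crossentropy" would be used instead.
    loss="categorical_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # assumed learning rate
    metrics=["accuracy"],
)

# Smoke check: one gradient step on two dummy examples with one-hot targets.
x = tf.constant([[0, 1, 2], [1, 2, 3]])
y = tf.one_hot([3, 4], depth=total_words)
metrics = model.train_on_batch(x, y, return_dict=True)
print(metrics)
```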

Training the model
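A sketch of the training call on toy inputs and one-hot labels like those built earlier. The epoch count, layer sizes, and learning rate are assumptions. `model.fit` returns a `History` object whose `history` dict holds one value per epoch for each tracked quantity, which is what a training-progress plot of loss or accuracy is drawn from.

```python
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # reproducible weights for the demo

# Toy data: three pre-padded contexts and the id of the word that follows each.
xs = np.array([[0, 0, 1], [0, 1, 2], [1, 2, 2]])
labels = np.array([2, 2, 3])
total_words = 4
ys = np.eye(total_words)[labels]  # one-hot rows

model = tf.keras.Sequential([
    tf.keras.Input(shape=(xs.shape[1],)),
    tf.keras.layers.Embedding(total_words, 8),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
    tf.keras.layers.Dense(total_words, activation="softmax"),
])
model.compile(
    loss="categorical_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # assumed learning rate
    metrics=["accuracy"],
)

epochs = 50  # assumed; a real run on ~260 lines would need its own tuning
history = model.fit(xs, ys, epochs=epochs, verbose=0)

# First and last epoch loss; plotting history.history["accuracy"] gives a progress curve.
print(history.history["loss"][0], history.history["loss"][-1])
```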

Generating lyrics
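A sketch of greedy generation: tokenize the seed text, pad it exactly as in training, predict the next word, append it, and repeat. The seed text, the number of words, and the helper `generate` are hypothetical, and the notebook's exact decoding isn't shown. Sampling from the predicted distribution (optionally with a temperature) is a common alternative to always taking the argmax. The demo trains a tiny model on one toy line so the block runs on its own.

```python
import numpy as np
import tensorflow as tf


def generate(seed_text, next_words, model, word_index, max_sequence_len):
    """Hypothetical helper: greedily append the most probable next word, one at a time."""
    index_word = {i: w for w, i in word_index.items()}
    words = seed_text.lower().split()
    for _ in range(next_words):
        ids = [word_index[w] for w in words if w in word_index]
        # Same pre-padding/truncation as training: keep the last max_sequence_len - 1 ids.
        ids = ids[-(max_sequence_len - 1):]
        ids = [0] * (max_sequence_len - 1 - len(ids)) + ids
        probs = np.array(model.predict(np.array([ids]), verbose=0)[0])
        probs[0] = 0.0  # never emit the padding id
        words.append(index_word[int(np.argmax(probs))])
    return " ".join(words)


tf.keras.utils.set_random_seed(0)

# Toy setup: one line's word index and its pre-padded n-gram inputs/labels.
word_index = {"olúwa": 1, "gbà": 2, "mí": 3}
total_words = len(word_index) + 1
max_sequence_len = 4
xs = np.array([[0, 0, 1], [0, 1, 2], [1, 2, 2]])
ys = np.eye(total_words)[[2, 2, 3]]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_sequence_len - 1,)),
    tf.keras.layers.Embedding(total_words, 8),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
    tf.keras.layers.Dense(total_words, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(xs, ys, epochs=100, verbose=0)

lyric = generate("Olúwa", 3, model, word_index, max_sequence_len)
print(lyric)
```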

Model training progress

Results and Conclusion

The model was trained on a very small dataset. Even so, the performance was good: most of the generated lines contain meaningful words in context. It could be improved by increasing the data size or trying other algorithms, and maybe I will find time to train a character-based model on the same dataset and compare the performance.

This is a great web tool for adding tone marks to your Yorùbá text: Yorùbá Intonation Tool

There are quite a number of Yorùbá hymns available here.