Automated Writing Support for Swedish Learners

This paper describes a tool developed for the lexical and grammatical analysis of Swedish text that provides automated feedback for language learners. The system looks for words and word sequences that are likely to contain errors and suggests how to correct them using different non-neural models. The feedback consists of alternative word and word-sequence suggestions and morphological features which need to be corrected. Although the system is able to provide reasonable feedback which is believed to be useful for language learners, it still needs further improvements to address drawbacks such as low precision.


Introduction
The majority of automatic error detection and correction systems focus on searching for mistakes and providing the right solutions directly, without any feedback. Providing feedback instead would be useful, especially for non-native writers, as it helps them understand their mistakes and correct the errors on their own.
In the DigiTala project (2019–2023), funded by the Academy of Finland, we are developing tools for the automatic evaluation of Finnish and Swedish spoken language proficiency of non-native speakers. This paper addresses a system built for the lexical and grammatical analysis of Swedish that gives automatic supportive feedback to language learners.
For the analysis, the current version of the system uses non-neural models only, on the assumption that they can provide sufficient accuracy while requiring less training data than deep neural networks. However, the models can later be replaced by neural ones for future experiments.
The rest of the paper is organized as follows.
Section 2 provides a brief overview of related research. Section 3 describes the system components and the error analysis. Section 4 presents an example analysis performed by the system. Section 5 concludes the paper with ideas for future work.

Related work
There are some systems which act as pedagogical tools and provide constructive feedback, such as the one developed by Morgado da Costa et al. (2020) for assisting students in their scientific English writing. The system described in that paper uses computational parsers and general NLP techniques, e.g. checking for repeated words, sentence length, word capitalization, etc.
Several grammar checkers for second language writers of Swedish have been developed in the research project CrossCheck (Bigert et al., 2004). One of them, called Granska (Domeij et al., 2000), consists of a POS tagger, a spelling checker and manually constructed rules for error detection and correction. The second one, ProbGranska (Bigert and Knutsson, 2002), is a statistical method which searches for unlikely grammatical constructions using POS tag trigram frequencies. The third one, SnålGranska (Sjöbergh and Knutsson, 2005), is a weakly supervised machine learning based system trained on a text corpus with artificially created errors.

System description
The tool developed so far relies on a language model (LM) when looking for errors in the input sentences. While ngrams (contiguous sequences of n words) which are present in the LM are assumed to be correct, unknown ngrams and out-of-vocabulary (OOV) words can possibly contain errors. If OOV words or unknown bigrams (two-word sequences) are found in the sentences, they are examined by the system in more detail and feedback is provided. For the OOV words found in the sentences, the tool proposes similar words. It also suggests the most likely part of speech and morphological features, or grammatical categories (grammatical case, person, number, etc.), to use when asking the learner to replace the OOV word with another word. If an unknown bigram is detected, the system searches for similar bigrams and asks the learner to change the part of speech and/or correct the morphological features, if needed.
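The flagging step described above can be sketched as a lookup against the LM vocabulary and its attested bigrams. The toy corpus and test sentences below are invented for illustration; the real system uses the YLE-trained LM described in the next section.

```python
# Toy sketch of the error-flagging step: words absent from the LM
# vocabulary and bigrams absent from the LM are marked for analysis.
corpus = "din mobil var avstängd . jag ringde dig igår . du var glad .".split()

vocabulary = set(corpus)
known_bigrams = {(corpus[i], corpus[i + 1]) for i in range(len(corpus) - 1)}

def flag_candidates(sentence):
    """Return (OOV words, unknown bigrams) for a whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    oov = [w for w in tokens if w not in vocabulary]
    unknown = [(a, b) for a, b in zip(tokens, tokens[1:])
               # only check bigrams whose words are both in-vocabulary;
               # bigrams containing an OOV word go to the OOV analysis instead
               if a in vocabulary and b in vocabulary
               and (a, b) not in known_bigrams]
    return oov, unknown

oov, unknown = flag_candidates("din mobilen var avstängd")
```

With this toy corpus, "mobilen" is flagged as OOV, while a sentence such as "jag ringde du igår" yields the unknown bigram ("ringde", "du") — mirroring the two error types the system reports.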

Corpora and Models
In total, six models are used in this work for different purposes: a part-of-speech (POS) tagger, a morphological features tagging module, a word-level LM, a subword-level LM, an LM trained on POS tags and a model for word segmentation.
A pretrained morphological features tagging module from the Stanza library (Qi et al., 2020) was also used in this work. The module is based on the Swedish-Talbanken treebank (https://github.com/UniversalDependencies/UD_Swedish-Talbanken). The treebank has 6,026 sentences and 96,819 tokens. It was also used for training a Conditional Random Fields (CRF) POS tagger.
The Swedish YLE corpus (http://urn.fi/urn:nbn:fi:lb-2020021103) was used for training the other models. The corpus is a collection of news articles published in Swedish by Finland's national public broadcasting company from 2012 to 2018. It consists of 6,810,509 sentences and 93,405,178 tokens. The vocabulary has 1,102,561 words. The data was converted to a lowercased plain text corpus with punctuation preserved. Keeping the punctuation in the text corpus is important because, otherwise, many words would form grammatically incorrect bigrams, e.g. the last word of one sentence and the first word of the following sentence, which are not necessarily related to each other. For the evaluation of transcribed speech, post-processing techniques for restoring the punctuation in the transcripts can be considered in order for the system to provide more accurate analysis results.
In addition to plain text, the source data of the Swedish YLE corpus contains positional attributes for each word, such as the number of the token within the sentence, lemma, POS tag, morphological analysis, dependency head number and dependency relation. The annotations were extracted separately to create a new corpus consisting of POS tag sequences, which was then used for training a trigram POS LM.
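A count-based trigram LM over POS-tag sequences, as used here, can be sketched as follows. The tag sequences are invented stand-ins for the extracted YLE annotations, and the maximum-likelihood estimate is shown without the smoothing a production model might add.

```python
# Minimal count-based trigram LM over POS-tag sequences (illustrative).
from collections import Counter

tag_corpus = [
    ["PRON", "VERB", "PRON"],
    ["PRON", "VERB", "NOUN"],
    ["DET", "NOUN", "VERB", "ADJ"],
]

trigram_counts = Counter()
context_counts = Counter()  # bigram histories that have a continuation
for tags in tag_corpus:
    padded = ["<s>", "<s>"] + tags + ["</s>"]
    for i in range(len(padded) - 2):
        trigram_counts[tuple(padded[i:i + 3])] += 1
        context_counts[tuple(padded[i:i + 2])] += 1

def p_next(t1, t2, t3):
    """P(t3 | t1, t2) by maximum likelihood; 0.0 if the history is unseen."""
    history = context_counts[(t1, t2)]
    return trigram_counts[(t1, t2, t3)] / history if history else 0.0
```

On this toy data, for instance, P(PRON | &lt;s&gt; &lt;s&gt;) = 2/3, since two of the three sentences start with a pronoun.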
Morfessor 2.0 (Smit et al., 2014) is a tool for unsupervised and semi-supervised statistical morphological segmentation. In this work, a Morfessor model was trained in an unsupervised manner. The whole YLE corpus was then passed through the model to divide it into subwords and train a subword-level trigram LM.

OOV words analysis
An OOV word found in the text is first divided into segments using the Morfessor model. For each of the five most likely segmentations, a new word is formed by removing the last segment from the OOV word. If the new word is not found in the vocabulary of the LM, it is tested for possible continuations by the subword-level trigram LM: the system searches for the most likely next segment(s) based on the previous segment or two previous segments. If such segments are found, new words are formed. The tool then checks whether these new words are found in the vocabulary.
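The candidate-generation step above can be illustrated with toy data. The hypothetical OOV word "mobilens", its segmentations, and the subword continuation table below are invented stand-ins for the output of the Morfessor model and the subword trigram LM; as a simplifying assumption, the reduced word itself is also kept when it is already in the vocabulary.

```python
# Illustrative sketch of candidate generation for an OOV word.
vocabulary = {"mobil", "mobilen", "mobiler", "mobilerna"}

# n-best segmentations of the (hypothetical) OOV word "mobilens"
segmentations = [["mobil", "ens"], ["mobilen", "s"], ["mob", "ilens"]]

# most likely next segments given the previous segment (toy subword LM)
continuations = {"mobil": ["en", "er"], "mobilen": [], "mob": ["il"]}

candidates = set()
for segs in segmentations:
    stem = "".join(segs[:-1])          # remove the last segment
    if stem in vocabulary:
        candidates.add(stem)
        continue
    for nxt in continuations.get(stem, []):
        word = stem + nxt              # extend with a likely next segment
        if word in vocabulary:
            candidates.add(word)
```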
A trigram POS LM is used to find the most likely part of speech given the two preceding POS tags. If no POS tag can follow the previous POS tags according to the POS LM, the most likely POS tag given one preceding POS tag is used instead. Morphological features are suggested using the bigram LM and the morphological features tagging module: the 20 words which are most likely to follow the word before the OOV word are collected using the LM. Then, only the words belonging to the most likely part of speech are preserved. For these words, morphological features are extracted and the most frequent value for each feature is selected.
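The feature-voting step can be sketched as follows: among likely followers of the word preceding the OOV slot, keep only those of the predicted part of speech and take the most frequent value of each morphological feature. The follower list and tags below are invented for illustration, in place of the bigram LM and the Stanza tagger output.

```python
# Sketch of morphological feature suggestion by majority vote.
from collections import Counter

# likely followers of the preceding word, with (POS, features) from a toy tagger
followers = [
    ("mobil",   "NOUN", {"Definite": "Ind", "Number": "Sing"}),
    ("bil",     "NOUN", {"Definite": "Ind", "Number": "Sing"}),
    ("telefon", "NOUN", {"Definite": "Ind", "Number": "Sing"}),
    ("gamla",   "ADJ",  {"Degree": "Pos"}),
]

predicted_pos = "NOUN"  # from the trigram POS LM

votes = {}
for _, pos, feats in followers:
    if pos != predicted_pos:
        continue                       # keep only the predicted POS
    for feat, value in feats.items():
        votes.setdefault(feat, Counter())[value] += 1

# most frequent value per feature becomes the suggestion
suggested = {feat: c.most_common(1)[0][0] for feat, c in votes.items()}
```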

Unknown bigrams analysis
To find bigrams similar to an unknown bigram, the five most likely word segmentations are collected using the Morfessor model for each word of the bigram. In each of these segmentations, only the longest segment consisting of at least three characters is preserved. If a word consists of fewer than three characters, the whole word is preserved. After that, similar words are collected by searching the LM vocabulary for words containing any of these segments. Then, combinations of these words are formed, including the combinations of the first word of the bigram with words similar to the second word of the bigram. The LM analyzes each of these new bigrams: the ones that are possible according to the LM are preserved and proposed by the system as bigrams similar to the initial unknown bigram.
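The similar-bigram search can be sketched with toy data. The vocabulary, attested bigrams, and Morfessor segmentations below are invented for illustration; the real system uses the YLE-trained LM and segmentation model.

```python
# Sketch of the similar-bigram search for the unknown bigram "din mobilen".
vocabulary = {"din", "dina", "mobil", "mobilen", "mobiler", "bil"}
known_bigrams = {("din", "mobil"), ("dina", "mobiler")}

def longest_segment(segments):
    """Longest segment of at least 3 characters, else the whole word."""
    long_enough = [s for s in segments if len(s) >= 3]
    return max(long_enough, key=len) if long_enough else "".join(segments)

def similar_words(segmentations):
    """Vocabulary words containing the longest segment of any segmentation."""
    sims = set()
    for segs in segmentations:
        seg = longest_segment(segs)
        sims |= {w for w in vocabulary if seg in w}
    return sims

# toy n-best Morfessor segmentations for each word of the bigram
first = similar_words([["din"]])
second = similar_words([["mobil", "en"], ["mobilen"]])

# keep only combinations attested by the LM; the first word itself is
# also combined with words similar to the second word
similar_bigrams = {(a, b) for a in first | {"din"} for b in second
                   if (a, b) in known_bigrams}
```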
The tool also suggests a part of speech and morphological features to use when replacing the second word of the unknown bigram. It selects the POS tag and the values of the morphological features in a similar way as for OOV words. However, the part of speech of the second word of the bigram is also taken into account. The system compares the probability of the POS tag of the word following the two preceding POS tags to the mean of the probabilities of all possible POS tags that can follow the corresponding POS tag sequence.
If the probability is above the average, the system assumes that this part of speech is likely enough and suggests using another word belonging to the same part of speech. In this case, the morphological features of the second word of the bigram are extracted and compared to the most likely morphological features. The most likely features are collected in a similar way as for OOV words. However, here all possible values are preserved instead of selecting the most common value for each feature. The most common value is selected and suggested only if the value of the feature for the word is not in the list of possible values.
If the probability is below the average, the tool suggests using the POS tag which is most likely to follow the two preceding POS tags. The most likely morphological features are then collected in the same way as for OOV words. If no POS trigrams are found in the POS LM, the system searches for POS bigrams instead.
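The above-or-below-average decision can be sketched as follows. The probability distribution is invented for illustration, in place of the trigram POS LM; the return values "keep" and "replace" are hypothetical labels for the two branches described above.

```python
# Sketch of the POS decision for the second word of an unknown bigram:
# keep the observed POS if its probability given the two preceding tags
# is above the mean over all tags that can follow that context.

# toy P(tag | PRON VERB) from a trigram POS LM
next_tag_probs = {"PRON": 0.6, "NOUN": 0.3, "ADV": 0.1}

def pos_decision(observed_tag):
    """Return ('keep', tag) or ('replace', most_likely_tag)."""
    mean_p = sum(next_tag_probs.values()) / len(next_tag_probs)
    if next_tag_probs.get(observed_tag, 0.0) > mean_p:
        return "keep", observed_tag
    return "replace", max(next_tag_probs, key=next_tag_probs.get)
```

With these toy probabilities, an observed pronoun (0.6 > 0.33) is kept and another pronoun is suggested, while an observed noun (0.3 < 0.33) triggers a suggestion to switch to the most likely tag.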

Example of System Output
In this section, we will use the following example to explore the output of the tool: 'Hi Peter! I tried to call you, but your phone was switched off.' When the example is fed to the system, the output is: "Text to evaluate: Hej Peter! Jag försökte ringa du, men din mobilen var avstängd.
2. din mobilen. Similar ngrams: din mobiltelefon, din mobil, din mobila, din mobilbutik, din mobils. You used the noun mobilen (Case: Nom, Definite: Def, Gender: Com, Number: Sing). You can also try to use some other noun instead of mobilen. It is also recommended to correct the following morphological features: Definite: Ind." In this example, there are two grammatical errors: "*ringa du" ("call you"), where the pronoun du should be used in the accusative case (dig), and "*din mobilen" ("your phone"), where the noun should be in the indefinite form (mobil). The system found both errors and marked the bigrams containing the errors as uncommon.
As can be seen from the list of similar ngrams, the system did not manage to provide the correct bigram ("ringa dig"). This happens because the morphological segmentation model treats the word "du" itself as its only morpheme, which is then used as a search query in the LM vocabulary. However, the tool correctly suggests changing the case of the pronoun from nominative to accusative. For the second incorrect bigram, the system managed to provide both the correct bigram ("din mobil") and the suggestion to change the form of the noun from definite to indefinite.

Discussion
The current implementation of the tool is able to analyze words and sentences at the grammatical and lexical level and provide reasonable feedback. In addition, the system can be applied to other languages by replacing the models. The models can also be replaced with neural ones, if needed in the future. However, the work is still in progress and further improvements are needed to overcome the existing drawbacks.
Many correct words and bigrams are not recognized by the system due to the morphological richness of the Swedish language. However, the same word in other word form(s) can be proposed as word(s) similar to the OOV word. Many compound words are also unknown to the system. Larger text corpora can help to reduce the number of unknown words and bigrams.
The tool is able to detect bigrams containing lexical errors such as a wrong word choice, but it cannot provide the most suitable word based on the context. Instead, the system tries to find bigrams which are similar to the original one. One possible solution to address this drawback is to use more advanced LMs, for example Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018). BERT can be used in the masked language modeling task, where an inappropriate word is masked and needs to be predicted by the LM. BERT uses both the left and right context of a word and is therefore believed to make more accurate word predictions than ngram LMs, which look only at the preceding context. There are several BERT models available which are pretrained on Swedish text corpora, for example KB-BERT (Malmsten et al., 2020).
Because the system focuses on providing feedback, it is difficult to evaluate how well it works. In addition, there is a lack of labeled data for the grammatical error correction task in general, and no such dataset was found for Swedish. However, one way to evaluate the effectiveness of the tool would be to compare its feedback to that provided by human annotators. Another option would be to organize a survey among Swedish learners asking how useful they find the feedback.