Historical Language Models in Cryptanalysis: Case Studies on English and German
Keywords: language models, decipherment, historical texts
Abstract: In this paper, we study the impact of language models (LMs) on the decipherment of historical homophonic substitution ciphers. In particular, we investigate whether decipherment using hill-climbing and simulated annealing can benefit from LMs generated from historical texts in general and from century-specific texts in particular. We carry out experiments on homophonic substitution ciphers with English and German as plaintext languages, taking into account both ciphertext length and the n-gram size of the LMs. We compare decipherment results obtained with historical LMs against those obtained with large LMs generated from modern texts. The results show that using historical LMs in the decipherment of homophonic substitution ciphers leads to significantly better performance on ciphertexts produced in the 17th century or earlier, and that century-specific LMs yield better results on longer and older ciphertexts.
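To make the setting concrete, the following is a minimal sketch of how a character n-gram LM can guide a simulated-annealing search over a homophonic key. It is an illustration under stated assumptions, not the authors' implementation: the function names (`train_ngram_lm`, `score`, `decode`, `anneal`), the add-one smoothing, the linear cooling schedule, and the 26-letter alphabet are all hypothetical choices made here. The LM's training text is where a historical or century-specific corpus would be swapped in.

```python
import math
import random
from collections import Counter

# Assumption: a plain 26-letter alphabet; German would need extra characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def train_ngram_lm(text, n=3):
    """Build character n-gram log-probabilities with add-one smoothing (assumed)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    vocab = len(ALPHABET) ** n
    logp = {g: math.log((c + 1) / (total + vocab)) for g, c in counts.items()}
    floor = math.log(1 / (total + vocab))  # score for unseen n-grams
    return logp, floor


def score(plaintext, lm, n=3):
    """Sum of n-gram log-probabilities; higher means more language-like."""
    logp, floor = lm
    return sum(logp.get(plaintext[i:i + n], floor)
               for i in range(len(plaintext) - n + 1))


def decode(cipher, key):
    """Apply a homophonic key: several cipher symbols may map to one letter."""
    return "".join(key[s] for s in cipher)


def anneal(cipher, lm, n=3, steps=2000, t0=1.0, seed=0):
    """Simulated annealing over symbol-to-letter assignments.

    As the temperature approaches zero, only improving moves are accepted,
    so the search behaves like hill-climbing.
    """
    rng = random.Random(seed)
    symbols = sorted(set(cipher))
    key = {s: rng.choice(ALPHABET) for s in symbols}
    current = score(decode(cipher, key), lm, n)
    best_key, best = dict(key), current
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling (assumed)
        s = rng.choice(symbols)
        old = key[s]
        key[s] = rng.choice(ALPHABET)  # propose reassigning one symbol
        cand = score(decode(cipher, key), lm, n)
        accept = cand >= current or rng.random() < math.exp((cand - current) / temp)
        if accept:
            current = cand
            if current > best:
                best_key, best = dict(key), current
        else:
            key[s] = old  # reject: restore the previous assignment
    return best_key, best
```

In this sketch the ciphertext is a list of symbol codes, for example `anneal([1, 2, 3, 4, 2, 5], lm)`, and the choice of corpus passed to `train_ngram_lm` is the only place where a modern LM and a historical LM would differ.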
Copyright (c) 2023 Béata Megyesi, Justyna Sikora, Filip Fornmark, Michelle Waldispühl, Nils Kopal, Vasily Mikhalev
This work is licensed under a Creative Commons Attribution 4.0 International License.