Encrypted Documents and Cipher Keys From the 18th and 19th Century in the Archives of Aristocratic Families in Slovakia

In this article, we present encrypted documents and cipher keys from the 18th and 19th centuries, related to the Central European aristocratic families Amade-Üchtritz, Esterházy, and Pálffy-Daun. In the first part of the article, we present an overview and analysis of the available documents from the archives with examples. We provide a short historical overview of the people related to the analyzed documents to provide a context for the research. In the second part of the article, we focus on the digital processing of these historical manuscripts. We developed new tools based on machine learning that can automate the transcription of encrypted parts of the documents, which contain only digits as the cipher text alphabet. Our digit detection and segmentation are based on YOLOv7. YOLOv7 provided good detection precision and was able to cope with problems like noisy paper background and areas where digits collided with the text from the reverse side of the paper.


Introduction
Many historical encrypted documents and cipher keys have survived in various archives all over the world. Collecting materials from a particular time period or geographic location can bring insights into the cipher design of a specific time and location. Valuable collections of historical encrypted documents and cipher keys can be found in Austria (Láng, 2020), Germany (Antal and Mírka, 2022), etc. It is thus desirable to investigate and compare various collections from a wide range of times and locations (Megyesi et al., 2022), because this is essential for understanding how ciphering evolved in the past. The encrypted documents and keys are easy to recognize but are difficult to locate in archives. This fact was already observed by Láng (2020) and other researchers in historical cryptography.
Systematic collection, annotation, and sharing of these materials among researchers are thus very important. For this reason, we draw the reader's attention to two interesting projects that focus on such collections. The DECRYPT project (Megyesi et al., 2020; Megyesi et al., 2022) contains 4360 records at the time of writing this paper. Perhaps the only disadvantage of this database is that most of the documents are not publicly available. The HCPortal project (Antal and Zajac, 2020; Antal and Zajac, 2021) contains 763 records at the time of writing. Every record is freely available to everyone. The content of this article is one of the data collections planned to be added to HCPortal in the near future.
The aim of this paper is to provide the first detailed description of encrypted documents and cipher keys from the 18th and 19th centuries, related to Central European aristocratic families that lived in the territory of modern Slovakia. The investigated collection is deposited in the Slovak National Archive in Bratislava. During the initial research phase, we found documents related to historical cryptology in the fonds of three different aristocratic families:
• Amade-Üchtritz,
• Esterházy,
• Pálffy-Daun.
In these fonds, we identified a total of 13 cardboard boxes containing 96 encrypted documents and 9 cipher keys.
The encrypted documents are from the 18th and 19th centuries. The earliest document is from 1711, and the latest is from 1834. The most commonly used encryption system of this time period is called a nomenclator. Several publications have already described this cipher system in detail, including its many variants. For more information about this encryption system, see (Antal and Mírka, 2022; Megyesi et al., 2022) or other publications in the proceedings of the HistoCrypt conference.

Analysis of Available Materials
The symbol set (cipher text alphabet) used in the investigated collection, both in encrypted documents and cipher keys, consists only of numbers (e.g., Figures 1 and 3) and markups of numbers (Figure 2). This confirms that using digits became more common/standard in the 17th/18th centuries, as stated in (Megyesi et al., 2022). A nomenclator system can be used with or without separators, depending on its design. Separators are mostly required to clearly distinguish/split the cipher text units (Antal and Mírka, 2022). In total, 37 encrypted documents contain cipher text with a separator, and 59 are without a separator character.
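The role of separators can be illustrated with a minimal sketch. It assumes a made-up set of cipher text units (`UNITS`), not taken from any key in the collection, and enumerates every way an unseparated digit string can be split into known units. With a separator the split is unique; without one, several readings may be possible.

```python
def segmentations(digits, units):
    """Return every way to split `digits` into a sequence of known units."""
    if not digits:
        return [[]]
    results = []
    for unit in units:
        if digits.startswith(unit):
            # Keep this unit and split the remainder recursively.
            for rest in segmentations(digits[len(unit):], units):
                results.append([unit] + rest)
    return results


# Hypothetical cipher text units, invented for illustration only.
UNITS = ["1", "12", "23", "3", "34", "4"]

# With a separator character (here a dot) the units are read off directly.
with_separator = "12.34".split(".")

# Without a separator, the same digits admit several readings.
without_separator = segmentations("1234", UNITS)

for reading in without_separator:
    print(reading)
```

In this toy case the unseparated string has three valid readings, so a key without separators must be designed (for example, with units of a fixed length) so that only one reading is possible.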
Some documents consist of cipher text only, while others contain the corresponding (decrypted) plain text. In 8 cases, only the cipher text is preserved. Plain text is written above/under the encrypted lines of the text in 16 documents. In 58 cases, the plain text is available in a separate document. Finally, in 14 cases we were unable to verify whether the plain text is preserved.
Encrypted documents were preserved in all three fonds; however, cipher keys were found in only two of them. An interesting fact about this collection is that the preserved encryption keys do not match the preserved cipher texts. We also investigated some (possibly related) cipher keys from Austria (Haus-, Hof- und Staatsarchiv in Vienna) and from Hungary; however, these keys also did not match our encrypted documents.
A more detailed description of the available encrypted documents and cipher keys is presented in the following subsections.

Encrypted Manuscripts
The Esterházy family archive contains most of the encrypted documents (fifty-eight), spread across four boxes. Cardbox no. 631 contains only one document related to cryptography, a four-page encrypted letter from 1744 written in German. The most promising materials are deposited in cardbox no. 634, which contains extensive communication between Count Nicolaus Esterházy and Wenzel Anton Prince of Kaunitz-Rietberg from the years 1756 and 1757. The whole communication is in German. Interestingly, they used at least two different encryption keys in the communication: one where the cipher text units are separated with a separator character (Figure 1) and one without a separator (Figure 3). Moreover, the cipher texts with a separator contain special digit markups (Figure 2). There are two additional documents sent to Count Nicolaus Esterházy in this box, one sent by Graf Colloredo and one by Empress Maria Theresa. For most of the encrypted documents in this box, the plain text is also available in separate documents. Cardbox no. 635 contains letter concepts from the years 1741 and 1744, nine of which have encrypted parts. It should be noted, however, that only small parts of these letter concepts are encrypted. These encrypted parts are mainly written below the corresponding plain text parts (Figure 4) or inserted next to the non-encrypted text parts (Figure 5). Again, the text is written in German. Cardbox no. 636 contains six encrypted documents: one fully encrypted concept and five letters. Five of the documents are written in French (Figure 6) and one in German. All of the documents are dated to 1744. One encrypted letter was sent by Empress Maria Theresa to Count Nicolaus Esterházy. In the remaining documents, we do not recognize the participants of the communication. The corresponding plain text parts are not available for these documents.
In the Amade-Üchtritz family archive, we have found thirty-seven encrypted documents. Most of the documents are from the beginning of the 19th century and are related to Emil Üchtritz in the diplomatic service of the King of Saxony and to various Saxon envoys. Cardbox no. 136 contains five encrypted messages from 1814-1834, written in German and French. Three documents were sent by Detlev Graf von Einsiedel, and two by Minckwitz, Saxon Minister for Foreign Affairs. Four documents contain the corresponding plain text parts written above/below the cipher text. In one case, only the cipher text is preserved (Figure 7).
Cardbox no. 139 contains six encrypted French messages from 1806, correspondence of Senfft von Pilsach, Saxon envoy in Paris. In all cases, the corresponding plain text is written above the cipher text (Figure 8).
Cardbox no. 140 contains twenty-one documents written in French and German, dated between 1816 and 1823. These messages are communication of Detlev Graf von Einsiedel with Schulenburg, the Saxon envoy in Vienna, and with Griesinger, the Saxon Legation Counsellor in Vienna. One message was sent by Minckwitz to Schulenburg. At least two different cipher keys were used. In all cases, the corresponding plain text is available. Cardbox no. 141 contains four encrypted documents written in German and French, sent to Fleming, the Polish-Saxon envoy in Vienna (Figure 9). These documents are from 1758, and the corresponding plain text is written above the cipher text in all cases.
Very interesting materials are located in cardbox no. 150, which contains the diary of Emil Üchtritz from October 1804 to August 1805, written mainly in German. The diary has 123 pages, with encrypted entries on 27 of them (Figure 10). Analysis and detailed description of the diary will be the subject of a future publication.
In the fonds of the Pálffy-Daun family, we

Cipher Keys
The examined fonds contain nine cipher keys. Seven cipher keys in the Pálffy-Daun family archive are (probably) related to the War of the Spanish Succession. These (Spanish and Italian) cipher keys were used by Wirich Daun to communicate with several Counts and Marquises. The structure of these keys is very similar and follows the structure of a classical nomenclator system (Figure 15): each key is drawn on a large paper sheet and consists of simple/homophonic substitution, bigram (and trigram) substitution, codes, and nulls. The individual sub-encryption systems are graphically separated in a table.
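The components of such a key can be sketched in a few lines of code. The following toy nomenclator is purely illustrative: all tables (`LETTERS`, `BIGRAMS`, `CODES`, `NULLS`) and their numbers are invented and do not reproduce any key from the archive. It assumes a dot as separator and shows homophonic letter substitution, bigram substitution, a code word, and nulls.

```python
import random

# Hypothetical toy key; every entry is invented for illustration.
LETTERS = {"a": ["11", "12"], "e": ["13", "14"], "k": ["15"],
           "n": ["16"], "r": ["17"], "t": ["18"]}    # homophonic letters
BIGRAMS = {"en": ["30"], "te": ["31"]}               # bigram substitution
CODES = {"king": ["50"]}                             # code words
NULLS = ["90", "91"]                                 # meaningless fillers

# Reverse table: cipher unit -> plain text fragment (nulls decode to nothing).
DECODE = {num: plain
          for table in (LETTERS, BIGRAMS, CODES)
          for plain, nums in table.items()
          for num in nums}
DECODE.update({num: "" for num in NULLS})


def encrypt(words, rng):
    """Encrypt a list of words; code words first, then bigrams, then letters."""
    units = []
    for word in words:
        if word in CODES:
            units.append(rng.choice(CODES[word]))
            continue
        i = 0
        while i < len(word):
            pair = word[i:i + 2]
            if pair in BIGRAMS:
                units.append(rng.choice(BIGRAMS[pair]))
                i += 2
            else:
                units.append(rng.choice(LETTERS[word[i]]))
                i += 1
        if rng.random() < 0.5:          # occasionally insert a null
            units.append(rng.choice(NULLS))
    return ".".join(units)


def decrypt(cipher_text):
    """Decrypt dot-separated cipher text; word boundaries are not recovered."""
    return "".join(DECODE[unit] for unit in cipher_text.split("."))


cipher = encrypt(["king", "enter", "tea"], random.Random(0))
print(cipher, "->", decrypt(cipher))
```

The sketch only handles letters present in the toy tables. A real key of this kind covers the whole alphabet and typically contains far more codes.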
There are two cipher keys in the Amade-Üchtritz family archive. One cipher key is for a small (German) nomenclator system, containing letter substitution and a few codes, which was used by Emil Üchtritz to encrypt parts of his diary. The second (French) cipher key was also used by Emil Üchtritz and is dated to 1831. This key consists of two small notebooks, Chiffrant for encryption and Déchiffrant for decryption (Figure 11), and contains three-digit numbers in the range 100-999.
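The two-notebook arrangement corresponds to a pair of inverse lookup tables. The sketch below is an illustration under stated assumptions: the entries in `CHIFFRANT` are invented, and we assume that, because every code has exactly three digits, unseparated cipher text can be cut into groups of three. Whether the original correspondence was written this way is not confirmed by the source.

```python
# Hypothetical entries; the real notebooks are not reproduced here.
CHIFFRANT = {"le": 214, "roi": 587, "paris": 903}      # plain text -> code

# The decryption notebook is the same table inverted (code -> plain text).
DECHIFFRANT = {code: plain for plain, code in CHIFFRANT.items()}


def encode(words):
    """Look up each word in the encryption table and concatenate the codes."""
    return "".join(str(CHIFFRANT[word]) for word in words)


def decode(cipher_text):
    """Cut the cipher text into three-digit groups and look each one up."""
    if len(cipher_text) % 3 != 0:
        raise ValueError("cipher text length must be a multiple of 3")
    groups = [int(cipher_text[i:i + 3]) for i in range(0, len(cipher_text), 3)]
    return [DECHIFFRANT[group] for group in groups]


print(encode(["le", "roi"]))
print(decode("214587903"))
```

Keeping two separately ordered tables, one sorted by plain text and one by number, makes both directions a simple lookup for a human clerk, which is presumably why the key was split into two notebooks.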

Historical Context
The encrypted documents in the Esterházy family archive date from the reign of Maria Theresa (1740-1780). At the beginning of her reign, she fought the so-called War of the Austrian Succession (1740-1748), and subsequently the so-called Seven Years' War (1756-1763). A significant role was played by diplomacy and by skilled diplomats, who managed to defend Maria Theresa's position in European politics. In this section, we give historical background on the prominent figures related to the analyzed encrypted documents.
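Count Nicolaus Esterházy (1711-1764) was an imperial and royal chamberlain, counselor, guardian of the crown, and diplomat (Great Britain, Madrid, St. Petersburg). In Madrid, he was to conclude and sign the new treaty of alliance ending the War of the Austrian Succession, but he suffered severe poisoning from infected water, from which he almost died. The Spanish-Austrian treaty of alliance was eventually signed without Esterházy's presence (Khavanova, 2019). In the autumn of 1753, he was given the position of ambassador to St. Petersburg, but because of the Seven Years' War, Esterházy's mission was extended indefinitely. His task was to ensure the greatest possible integration of Russia into the Austro-French alliance contracted in 1753. On his return to Vienna in June 1762, Esterházy was appointed to the rank of captain of the Hungarian personal guard and, at the same time, general. He died at the age of 53 in 1764 in Karlovy Vary (Khavanova, 2016; Khavanova, 2017).
Wenzel Anton Prince of Kaunitz-Rietberg (1711-1794) was an envoy to Turin, Brussels, and Paris, and in 1748 he played an important role in the Peace of Aachen, which ended the War of the Austrian Succession. From his return to Vienna (1753) until his death, he served as a court and state chancellor of the Habsburg monarchy. He initiated the Theresian reforms in the civil service and the establishment of the Council of State (1760). Kaunitz was one of Maria Theresa's closest advisers. He resigned from his post after Franz II ceded some Polish territories to Prussia in 1793 and wanted to exchange Austrian possessions in the Netherlands for Bavaria (Encyclopaedia Beliana, 2017; Schilling, 1994).
Wirich Philipp von Daun (1669-1741) was an Austrian field marshal in the Imperial Army in the War of the Spanish Succession under the command of Eugene of Savoy. He became famous for the successful defense of Turin in 1706. Daun became viceroy of Naples (1707-1708 and 1713-1719) and governor of the Austrian Netherlands (1724-1725) (Schmidt-Brentano, 2006; Kubeš et al., 2018). He was an imperial privy councilor and chamberlain. For his services on the Italian battlefields during the War of the Spanish Succession, he received the Italian noble titles of Marquis of Rivoli (1706) and Duke of Teano (1710). Italian and Spanish cryptographic materials from the Pálffy-Daun family archives are also linked to these titles.
The encrypted documents in the Amade-Üchtritz family archives relate to personalities representing the Kingdom of Saxony at the Viennese court. Most of them date from the first two decades of the 19th century, especially from the end of the Napoleonic Wars, when Saxony, as an ally of Napoleon, found itself among the defeated countries and sought through its envoys to regain lost positions, especially the revision of territorial losses.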
Detlev Graf von Einsiedel (1773-1861) was a Saxon businessman, minister, and confidant of the Saxon kings Frederick Augustus I and his successor Anton. In foreign policy, he advocated a close alliance with Austria, seeking to strengthen Saxony's sovereignty by brokering new dynastic alliances with European ruling families (Wetzel, 2014). In this he was supported by his brother-in-law Emil von Üchtritz (1783-1841), a long-time Saxon envoy to the Viennese court. He served King Frederick Augustus of Saxony, especially during the king's captivity after the Battle of Leipzig in 1813, and worked for the restoration of the Saxon kingdom. He then served as an envoy to France and, from 1830, to Vienna. His son Emil was an officer who married into the Hungarian Amade family (Wurzbach, 1883).

Digital Processing of Historical Manuscripts
Our valuable collection of historical manuscripts was converted into digital form so that it can be processed by modern computer algorithms to analyse statistical properties and even solve the documents encrypted by a nomenclator. The resolution of the original images is 4160 × 6240 pixels. Documents vary in quality and readability. Some of the documents contain encrypted and non-encrypted parts on a single page. A portion of the collection contains documents with digits formed in groups delimited by separators. Automated image processing needs to take all these factors into consideration.
Antal and Marák (2022) proposed an automated software system for historical handwritten digit detection, recognition, and transcription based on deep learning. This way, a large number of historical documents can be analyzed and solved in an automated manner. They provided a detailed description of a digit detector based on the Mask R-CNN method (He, 2017), along with technical details related to the dataset, manually created annotations, training, inference, and transcription algorithm. Their Mask R-CNN based detector works well in general; however, it has some limitations which need to be addressed.
Limitations of the digit detector proposed in (Antal and Marák, 2022):
• The Mask R-CNN detector, along with ResNet as its backbone network, requires a large amount of GPU memory, which led to unavoidable optimizations. They needed to resize their original images to a smaller resolution to be able to fit the data into memory and run training.
• Handwritten digit detection is a challenging task where small and densely grouped objects (digits) are located in a large document.
• Mask R-CNN was introduced in 2017 and is no longer considered a state-of-the-art detector. According to the benchmark (Papers with code, 2023), there are new families of modern multi-class object detectors achieving average precision approximately 25 % higher than Mask R-CNN. Moreover, modern detector architectures make detection of small objects possible without having to split the image into small blocks.
These problems motivated us to design, train, and test a new digit detector based on YOLOv7 (Wang et al., 2022), which is a faster and more accurate deep learning model than Mask R-CNN. Furthermore, YOLOv7 can easily be configured to detect small objects using its auto-anchor algorithm. Its official implementation and documentation can be found at (YOLOv7 GitHub, 2023). In essence, YOLOv7 is an object detector producing rectangular bounding boxes for object instances. However, it has a special architectural tweak which allows it to run in instance segmentation mode to produce high-precision polygonal masks for detected objects.
Handwritten digit transcription based on YOLOv7 is a procedure consisting of several steps:
1. Preparing the software environment and dependencies for YOLOv7.
3. Preparing the training, validation and testing dataset.
4. Converting digit annotations from COCO format to YOLOv7 PyTorch format.
5. Configuring the model and setting hyperparameters.
6. Training the model using transfer learning.
7. Testing the model.

Preparation Stage
All the experiments were carried out on a computer with the following specification: AMD Ryzen 9 5900HX, GeForce RTX 3080 16 GB, 32 GB DDR4 3200 MHz. In order to have a robust detector and minimize training time, we employed a transfer learning technique. We obtained a model pretrained on the COCO dataset. The pretrained model accepts images of size 1280 × 1280, so we needed to downscale our original images. Our image dataset has 18 images and was split into training (12 images), validation (3 images), and testing (3 images) sets. We collected more than twelve thousand (12,433) polygonal annotations of digits 0-9, which are stored in the well-known COCO format. More details about the annotations can be found in (Antal and Marák, 2022). Since our YOLOv7 detector performs digit detection on the entire image rather than on image blocks, we needed to transform the geometric properties of the annotations so that they correspond to images of size 1280 × 1280. The annotations were converted from COCO format to YOLOv7 PyTorch format using the Roboflow framework (Roboflow, 2023).
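The conversion itself was done with Roboflow, so the following is only a hand-written sketch of the underlying arithmetic, not the tool's implementation. It assumes that a COCO polygon is a flat list of pixel coordinates `[x1, y1, x2, y2, ...]`, that the image is resized by plain stretching (no padding), that the class index equals the digit value, and that a YOLO segmentation label is a line of the form `class x1 y1 x2 y2 ...` with coordinates normalized to the range 0-1. The function names are hypothetical.

```python
def scale_polygon(polygon, src_size, dst_size):
    """Rescale a flat polygon [x1, y1, ...] from src (w, h) to dst (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # Even positions hold x coordinates, odd positions hold y coordinates.
    return [v * (sx if i % 2 == 0 else sy) for i, v in enumerate(polygon)]


def to_yolo_line(class_id, polygon, image_size):
    """Format one polygon as a normalized YOLO segmentation label line."""
    width, height = image_size
    coords = [v / (width if i % 2 == 0 else height)
              for i, v in enumerate(polygon)]
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])


# A made-up triangular annotation of the digit 7 on a 4160 x 6240 scan.
polygon = [416.0, 624.0, 832.0, 624.0, 832.0, 1248.0]
scaled = scale_polygon(polygon, (4160, 6240), (1280, 1280))
line = to_yolo_line(7, scaled, (1280, 1280))
print(line)
```

Because the label coordinates are normalized, stretching the image leaves them unchanged. The explicit rescaling step matters only when annotations are kept in pixel units.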

Model Configuration, Hyperparameters and Training
The training took 300 epochs, and the batch size was set to 1 due to GPU memory limitations. The learning rate was set to 0.0025 using a stochastic gradient descent optimizer. We also performed slight image augmentation to enhance the variability of our dataset. Namely, we generated augmented images by modifying hue, saturation, and brightness. Geometric transformations were omitted.
Table 1 shows the values of standard performance indicators measured on the test dataset for the best model after 300 epochs of training. Masks are polygons produced by the detector which determine the area of the object. Bounding boxes are rectangular areas defining the object location. Compared to masks, bounding boxes are less accurate estimates of object location when the object has an irregular shape. The mAP@.50 and mAP@.50:.95 indicators represent mean average precision, which is a common metric for evaluating the performance of an object detector. The mAP indicator quantifies the detection accuracy measured for different IoU (Intersection over Union) values. IoU measures the degree of overlap between the predicted bounding box/mask and the ground truth box/mask. mAP is the average of AP (Average Precision) values, where AP corresponds to the area under the precision-recall curve. Precision is the number of correctly predicted positive examples divided by the total number of examples predicted as positive. Recall is the number of correctly predicted positive examples divided by the total number of positive examples that could have been predicted.
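The definitions above can be made concrete with a short sketch. It is a generic illustration of IoU for axis-aligned boxes and of precision and recall from detection counts, not the evaluation code used to produce Table 1. The box layout `(x_min, y_min, x_max, y_max)` and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Two 10 x 10 boxes overlapping on half of their width.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))

# 8 correct detections, 2 spurious detections, 2 missed digits.
print(precision_recall(8, 2, 2))
```

A detection counts as a true positive at mAP@.50 when its IoU with a ground truth object of the same class is at least 0.5. mAP@.50:.95 averages the result over IoU thresholds from 0.5 to 0.95.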

Testing
We evaluated the model inference using a confidence threshold of 50 % (i.e., only digits with at least 50 % confidence are included in the final set of detections). We experimented with standard quality images to judge the general performance. The detector turned out to have only minor problems with missed or misclassified digits.
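The thresholding step amounts to a simple filter over the detector output. The sketch below assumes a hypothetical `Detection` record with a digit label, a confidence score, and a bounding box. The actual output structure of YOLOv7 differs and is not reproduced here.

```python
from typing import NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical detection record used only for this illustration."""
    digit: int
    confidence: float
    box: Tuple[float, float, float, float]   # x_min, y_min, x_max, y_max


def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= threshold]


detections = [
    Detection(3, 0.91, (10, 10, 30, 40)),
    Detection(8, 0.42, (35, 12, 55, 41)),    # below the threshold, dropped
    Detection(1, 0.50, (60, 11, 70, 40)),    # exactly at the threshold, kept
]
kept = filter_detections(detections)
print([d.digit for d in kept])
```

Raising the threshold trades missed digits for fewer false detections, which matters on pages that mix encrypted and non-encrypted text.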
Figure 12 shows the detected bounding boxes and masks in a sample document. Since our solution currently does not perform any image preprocessing, we also investigated the detection performance on images of lower quality. YOLOv7 was able to cope with problems like noisy paper background and areas where digits collided with the text from the reverse side of the paper. There are documents with encrypted and non-encrypted parts on a single page. In such situations we need to avoid false digit detections in non-encrypted parts. As we can see in Figure 13, YOLOv7 performed quite well here, producing only a minimal number of detections in areas outside the encrypted regions.
Last but not least, we sometimes encounter digit separators (dots) placed in digit sequences. We investigated the impact of the separators on detection performance. We found that separators do not have adverse effects on detection accuracy. Sample output from this testing scenario is depicted in Figure 14.

Transcription
Once digits are detected by YOLOv7, all bounding boxes and their corresponding class labels are passed to the automated transcription algorithm presented in (Antal and Marák, 2022), which detects lines and extracts the digits. Digits are then

Future Work
In this paper we have presented our YOLOv7-based handwritten digit detector, which is able to detect and classify a large number of digits on a page with acceptable performance. Besides standard bounding boxes, it also supports polygonal masks for even higher precision. We have also eliminated the problem of the Mask R-CNN detector, where input images must be divided into small blocks, leaving digits at the block borders undetected. YOLOv7 solves this using its auto-anchor algorithm and detects digits in the entire document.
We plan to further improve our digit detection algorithm by extending the training dataset and incorporating geometric augmentations. It is crucial to train the detector on digits written in different writing styles and on images of varying quality. Currently, we are dealing with further research problems, including segmentation of the handwritten text area within the document and detection of special digit markups and glyphs. It is equally important to apply deep learning methods to segmentation and extraction of structured information from nomenclator keys.

Conclusions
During our research of historical encrypted documents in the Slovak National Archive in Bratislava, we found several encrypted documents and keys in the archives of the aristocratic families Amade-Üchtritz, Esterházy, and Pálffy-Daun. Some of the documents are still encrypted, without a known decryption or a preserved corresponding key. To facilitate the potential decryption of these documents, we focus on different tasks:
• systematic collection and examination of the materials, trying to match the documents with keys, or at least with similar documents with known decryption;
• historical study of the related time period, persons, and their relations, to provide insight into the potential location of keys and potential key words for decryption attempts;
• and finally, providing supporting automation, mainly in the transcription of documents.
We believe that only with proper computer automation will researchers be able to properly examine the large number of historical documents contained in the archives. As presented, new machine learning methods show promising results with good precision. However, the automated transcription of documents with machine learning always leads to some detection and transcription errors (FAR, FRR). Unlike in plain text documents, such errors are difficult to detect and correct, because the cipher text is seemingly a random sequence of characters. The ultimate goal of automatic decryption would require further research into decryption methods that can tolerate some amount of transcription failures.

Figure 6: Cipher text example (Slovak National Archives, fond Esterházi - čeklíska vetva, box no. 636)

Figure 11: Cipher key example (Slovak National Archives, fond Amade-Üchtritz, box no. 138)

Figure 12: Result of digit detection and segmentation

Figure 13: Digit detection performance in a document combining encrypted (at the bottom) and non-encrypted parts (at the top)