Ithaca, a machine learning model built by AI researchers at DeepMind, can guess missing words and the location and date of written language, according to a new paper. The effort could help historians decipher ancient manuscripts. “Ithaca is a deep neural network, and as such, it is incredibly capable of finding hidden patterns in vast amounts of data, ’’ historian Thea Sommerschield, co-author of the recent paper, told Lifewire in an email interview. “Such patterns could be textual (grammatical, syntactic, or linked to a repeated ‘formula’ across many texts) or contextual (certain words appearing consistently in certain genres of texts: e.g., a political decree from Classical Athens mentioning the words ‘alliance, council, assembly…’).”
Revealing the Past
Ithaca is the first deep neural network that can restore the missing text of damaged inscriptions, identify their original location, and help establish the date they were created, Sommerschield said. Ithaca is named after the Greek island in Homer’s Odyssey. The researchers found that Ithaca achieves 62% accuracy in restoring damaged texts, 71% accuracy in identifying their original location and can date texts to within 30 years of their origin dates. Ithaca’s visualization aids are intended to make it easier for researchers to interpret results. The paper’s authors wrote that historians achieved 25% accuracy when working alone to restore ancient texts. But, the historian’s performance increases to 72% when using Ithaca, surpassing the model’s performance and showing the potential for human-machine cooperation. “Ithaca offers interpretable outputs, showcasing the rising importance of cooperation between human experts and machine learning, and shows how matching human experts with deep learning architectures to tackle tasks collaboratively can surpass the individual (unaided) performance of both humans and model on the same tasks,” Sommerschield told Lifewire. For example, historians currently disagree on the date of a series of important Athenian decrees made at a time when notable figures such as Socrates and Pericles lived, Sommerschield wrote in a blog post. The decrees have long been thought to have been written before 446/445 BCE, although new evidence suggests a date of the 420s BCE. “Although it might seem like a small difference, these decrees are fundamental to our understanding of the political history of Classical Athens,” she wrote The closest work to Ithaca is a previous machine learning tool called Pythia that Sommerschield and her collaborators released in 2019. Pythia was the first ancient text restoration model to use deep neural networks. “Today, Ithaca is the first model to tackle the three central tasks in the epigrapher’s workflow holistically,” Sommerschield said in an email. “Not only does it advance the previous state-of-the-art set by Pythia, but it also uses deep learning for geographical and chronological attribution for the very first time and on an unprecedented scale.”
AI to Aid Historians
AI is useful for filling in missing data like the location and date of text because it is good at learning very complex patterns by analyzing data, Brad Quinton, the CEO of the AI company Singulos Research, told Lifewire via email. “Using machine learning techniques, AI can look through a large number of “known good” examples to find patterns between, for instance, a given text and its date and location of creation,” Quinton added. “Often, these patterns are so complex that they would not be obvious to a human expert.” Predicting missing data is a common task for machine learning-based AI. For instance, GPT-3 from OpenAI can predict missing words in a sentence or even missing sentences in a paragraph. And many AI-based image processing systems have been used to restore video and images by intelligently predicting what has been lost from the original. “Conceptually, researchers could use similar techniques to determine the date and origin of art or tools, or other historical man-made artifacts where there is an expectation of change in the underlying style and technique over time and by location of origin,” Quinton said.