Computer algorithm to decipher ancient texts

07 Sep, 2009

Researchers in Israel say they have developed a computer programme that can decipher previously unreadable ancient texts and possibly lead the way to a Google-like search engine for historical documents. The programme uses a pattern recognition algorithm similar to those law enforcement agencies have adopted to identify and compare fingerprints.
But in this case, the programme identifies letters, words and even handwriting styles, saving historians and liturgists hours of sitting and studying each manuscript.
By recognising such patterns, the computer can recreate with high accuracy portions of texts that have faded over time, or even those written over by later scribes, said Itay Bar-Yosef, one of the researchers from Ben-Gurion University of the Negev. "The more texts the programme analyses, the smarter and more accurate it gets," Bar-Yosef said.
The computer works with digital copies of the texts, assigning number values to each pixel of writing depending on how dark it is. It separates the writing from the background and then identifies individual lines, letters and words.
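The two steps described above, separating writing from background and then finding lines, can be illustrated with a minimal sketch. This is not the researchers' method, which the article does not detail. It assumes a greyscale scan stored as rows of values from 0 (black) to 255 (white), a fixed darkness threshold of 128, and hypothetical function names `binarize` and `find_text_lines`.

```python
def binarize(image, threshold=128):
    """Mark each pixel as ink (1) if darker than the threshold, else background (0)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def find_text_lines(ink, min_ink=1):
    """Return (first_row, last_row) pairs for consecutive rows that contain ink."""
    lines = []
    start = None
    for y, row in enumerate(ink):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new line of writing begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # a blank row ends the current line
            start = None
    if start is not None:                  # writing ran to the bottom edge
        lines.append((start, len(ink) - 1))
    return lines


# Toy 6x5 "scan": low values are dark ink, 255 is blank parchment.
page = [
    [255, 255, 255, 255, 255],
    [255,  30,  40, 255, 255],
    [255,  20, 255,  35, 255],
    [255, 255, 255, 255, 255],
    [255,  50,  45,  60, 255],
    [255, 255, 255, 255, 255],
]

ink = binarize(page)
lines = find_text_lines(ink)
print(lines)  # [(1, 2), (4, 4)]
```

A real manuscript would need more than this: uneven fading and stains usually call for a threshold that adapts to each region of the page, and slanted or touching lines defeat a simple row-by-row scan.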
It also analyses the handwriting and writing style, so it can "fill in the blanks" of smeared or faded characters that are otherwise indiscernible, Bar-Yosef said.
The team has focused its work on ancient Hebrew texts, but says the programme can be used with other languages as well. Its most recent paper on the work, which is still being developed, appears in the academic journal Pattern Recognition, in an issue due out in December but already available online.
A programme for all academics could be ready in two years, Bar-Yosef said. And as libraries across the world move to digitise their collections, the researchers say the programme could drive an engine that instantaneously searches any digital database of handwritten documents.
Uri Ehrlich, an expert in ancient prayer texts who works with Bar-Yosef's team of computer scientists, said that with the help of the programme, years of research could be done within a matter of minutes. "When enough texts have been digitised, it will manage to combine fragments of books that have been scattered all over the world," Ehrlich said.
