Unexpected result from an attempt to fix IA Upload failures
Alex Brollo
2017-12-22 22:14:29 UTC
While trying to fix some IA Upload failures, an unexpected result
emerged: an easy way to fix some common OCR errors in the djvu text layer.

In brief, the script xml2dsed.py
converts IA _djvu.xml files into "dsed" (Lisp-like) code, so that the text
layer can be loaded into the djvu file in a much faster and more
controllable way using djvused.exe. While the script parses the xml tree,
every word of the text layer is exposed at WORD level to the script
environment as plain text; this offers a unique opportunity to fix many
scannos without any risk of corrupting the xml or the dsed code.
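
To make the idea concrete, here is a minimal sketch of the approach; it is
not the actual xml2dsed.py. It assumes that each page is an OBJECT element
with width and height attributes, that WORD coords start with
"left,bottom,right,top" measured from the top of the page, and that a
simplified page > line > word hierarchy is good enough for djvused. The
SCANNOS table and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical scanno table; a real one would be built per book or language.
SCANNOS = {"tbe": "the", "aud": "and"}


def fix_scanno(text):
    """Return the corrected word, or the word unchanged if it is not listed."""
    return SCANNOS.get(text, text)


def quote(text):
    """Quote a string for dsed (simplified: only backslash and double quote)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def word_box(word, page_height):
    """Convert WORD coords (assumed y from top) to xmin, ymin, xmax, ymax
    with y measured from the bottom of the page."""
    x1, ya, x2, yb = (int(v) for v in word.get("coords").split(",")[:4])
    ys = (page_height - ya, page_height - yb)
    return min(x1, x2), min(ys), max(x1, x2), max(ys)


def line_to_dsed(line, page_height):
    """Build one (line ...) expression; the word text is plain text here,
    so it can be fixed safely before it is quoted."""
    words = []
    for w in line.iter("WORD"):
        text = fix_scanno((w.text or "").strip())
        if text:
            words.append((word_box(w, page_height), text))
    if not words:
        return None
    # The line box is taken as the union of its word boxes.
    xmin = min(b[0] for b, _ in words)
    ymin = min(b[1] for b, _ in words)
    xmax = max(b[2] for b, _ in words)
    ymax = max(b[3] for b, _ in words)
    body = "".join(
        "\n  (word {} {} {} {} {})".format(b[0], b[1], b[2], b[3], quote(t))
        for b, t in words
    )
    return " (line {} {} {} {}{})".format(xmin, ymin, xmax, ymax, body)


def page_to_dsed(obj):
    """Build the (page ...) expression for one OBJECT element."""
    width, height = int(obj.get("width")), int(obj.get("height"))
    lines = [line_to_dsed(ln, height) for ln in obj.iter("LINE")]
    body = "".join("\n" + ln for ln in lines if ln)
    return "(page 0 0 {} {}{})".format(width, height, body)


# Tiny made-up page, only to exercise the functions above.
SAMPLE = """<OBJECT width="1000" height="2000"><HIDDENTEXT><PAGECOLUMN>
<REGION><PARAGRAPH><LINE>
<WORD coords="100,300,180,260,300">tbe</WORD>
<WORD coords="200,300,280,260,300">cat</WORD>
</LINE></PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT></OBJECT>"""

print(page_to_dsed(ET.fromstring(SAMPLE)))
```

Because the fix is applied to the word text before it is quoted and
wrapped, a replacement can never disturb the coordinates or the
parentheses of the surrounding dsed expression.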

Here is the first djvu file
on which this has been successfully tested.

Alex Brollo

