[Wikisource-l] Unexpected result from a try to fix IA Upload failures
Alex Brollo
2017-12-22 22:14:29 UTC
While I was trying to fix some IA Upload failures, an unexpected result
emerged: an easy way to fix some common OCR errors in the djvu text
layer.

In brief, the script xml2dsed.py
<https://it.wikisource.org/wiki/Progetto:Bot/Programmi_in_Python_per_i_bot/xml2dsed.py>
converts IA _djvu.xml files into "dsed" (Lisp-like) code, so that the text
layer can be loaded into the djvu file in a much faster and more controllable
way using djvused.exe. While the XML tree is parsed, at the WORD level each
word of the text layer is exposed to the script environment as plain text; this
offers a unique opportunity to fix many scannos without any risk of breaking
the XML or the dsed code.
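
To make the idea concrete, here is a minimal Python sketch of the word-level
approach. It is not the actual xml2dsed.py: the function names, the SCANNOS
table and the SAMPLE data are hypothetical. It also assumes that WORD coords
are given as left,bottom,right,top measured from the top edge of the page, so
that the y values must be flipped for djvused, and it emits words directly
under the page expression, skipping lines and paragraphs. Handling of
non-ASCII characters is not addressed.

```python
import xml.etree.ElementTree as ET

# Hypothetical scanno table; a real one would be tuned to the book.
SCANNOS = {"tbe": "the", "aud": "and"}


def fix_word(text):
    """Return the corrected form of a single word, or the word unchanged."""
    return SCANNOS.get(text, text)


def quote(text):
    """Escape backslashes and double quotes for a dsed string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def page_to_dsed(obj):
    """Build a (page ...) expression from one OBJECT element."""
    width, height = int(obj.get("width")), int(obj.get("height"))
    words = []
    for word in obj.iter("WORD"):
        # Assumed order: left, bottom, right, top, with y measured from the top.
        coords = [int(v) for v in word.get("coords").split(",")[:4]]
        left, bottom, right, top = coords
        # The word is plain text here, so it can be fixed safely.
        text = fix_word(word.text or "")
        # djvused measures y from the bottom edge, so flip the vertical axis.
        words.append("  (word %d %d %d %d %s)"
                     % (left, height - bottom, right, height - top, quote(text)))
    return "(page 0 0 %d %d\n%s)" % (width, height, "\n".join(words))


def xml_to_dsed(xml_text):
    """Return a djvused script that sets the text layer of every page."""
    root = ET.fromstring(xml_text)
    chunks = []
    for number, obj in enumerate(root.iter("OBJECT"), start=1):
        # A line holding a single period ends the data read by set-txt.
        chunks.append("select %d\nset-txt\n%s\n.\n" % (number, page_to_dsed(obj)))
    return "".join(chunks)


# Tiny made-up sample in the shape of an IA _djvu.xml file.
SAMPLE = """<DjVuXML><BODY>
<OBJECT width="1000" height="2000"><HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH>
<LINE><WORD coords="100,300,180,260">tbe</WORD>
<WORD coords="200,300,320,260">book</WORD></LINE>
</PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT></OBJECT>
</BODY></DjVuXML>"""

if __name__ == "__main__":
    print(xml_to_dsed(SAMPLE))
```

The resulting script could then be saved to a file and applied with something
like "djvused file.djvu -f script.dsed -s" (file names are placeholders).
Because the corrections touch only the word strings, the coordinates and the
structure of the dsed code stay untouched.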

Here is the first djvu file
<https://commons.wikimedia.org/wiki/File:Trattati_del_Cinquecento_sulla_donna,_1913_%E2%80%93_BEIC_1949816.djvu>
where this has been successfully tested.

Alex Brollo
