2018-04-11 09:36:56 UTC
There's a _djvu.xml file.
The _djvu.xml file is split into pages, and each page is uploaded as is as the page text.
A jQuery script can then parse the XML and convert it into good plain text.
The same trick works for both DjVu-based and PDF-based Index pages. Another
advantage is that the mapped text is saved as the first revision of the page content, so
it can be recovered and reused without any external tool.
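
To illustrate the parsing step, here is a minimal sketch. It is written in Python rather than jQuery so that it runs on its own, and it is not the actual script. It assumes the usual DjVu hidden-text nesting (PARAGRAPH > LINE > WORD); the sample page and the function name page_to_text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-page fragment in the shape of a _djvu.xml hidden-text layer.
SAMPLE_PAGE = """
<OBJECT>
  <HIDDENTEXT>
    <PAGECOLUMN>
      <REGION>
        <PARAGRAPH>
          <LINE>
            <WORD coords="100,130,180,100">Hello</WORD>
            <WORD coords="190,130,270,100">world</WORD>
          </LINE>
          <LINE>
            <WORD coords="100,170,200,140">second</WORD>
            <WORD coords="210,170,260,140">line</WORD>
          </LINE>
        </PARAGRAPH>
        <PARAGRAPH>
          <LINE>
            <WORD coords="100,230,170,200">Next</WORD>
            <WORD coords="180,230,300,200">paragraph</WORD>
          </LINE>
        </PARAGRAPH>
      </REGION>
    </PAGECOLUMN>
  </HIDDENTEXT>
</OBJECT>
"""


def page_to_text(xml_string):
    """Convert one page of hidden-text XML into plain text.

    Words are joined with spaces, lines with newlines,
    and paragraphs with a blank line.
    """
    root = ET.fromstring(xml_string)
    paragraphs = []
    for para in root.iter("PARAGRAPH"):
        lines = []
        for line in para.iter("LINE"):
            words = [(w.text or "").strip() for w in line.iter("WORD")]
            words = [w for w in words if w]  # skip empty WORD elements
            if words:
                lines.append(" ".join(words))
        if lines:
            paragraphs.append("\n".join(lines))
    return "\n\n".join(paragraphs)


print(page_to_text(SAMPLE_PAGE))
```

A real file may carry a DOCTYPE and extra attributes on OBJECT; the sketch ignores both and only walks the PARAGRAPH, LINE and WORD elements.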
While parsing the XML, the same script can also fix some severe FineReader
mistakes caused by wrong analysis of the text layout (text wrongly split into
columns/regions), using the word coordinates.
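
One way the coordinate-based fix could look, again as a hedged Python sketch and not the real jQuery code: suppose the OCR read a two-column page as a single column, so every LINE mixes words from both columns. The sketch assumes the first value in the coords attribute is the word's left x position and that the gutter position (gutter_x) is already known; the sample page and the names read_lines and split_columns are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page where each LINE wrongly spans both columns.
TWO_COLUMN_PAGE = """
<OBJECT>
  <HIDDENTEXT>
    <PAGECOLUMN>
      <REGION>
        <PARAGRAPH>
          <LINE>
            <WORD coords="100,130,180,100">Left</WORD>
            <WORD coords="200,130,260,100">one</WORD>
            <WORD coords="600,130,690,100">Right</WORD>
            <WORD coords="700,130,760,100">one</WORD>
          </LINE>
          <LINE>
            <WORD coords="100,170,180,140">Left</WORD>
            <WORD coords="200,170,260,140">two</WORD>
            <WORD coords="600,170,690,140">Right</WORD>
            <WORD coords="700,170,760,140">two</WORD>
          </LINE>
        </PARAGRAPH>
      </REGION>
    </PAGECOLUMN>
  </HIDDENTEXT>
</OBJECT>
"""


def read_lines(xml_string):
    """Return the page as a list of lines, each a list of (left_x, text)."""
    root = ET.fromstring(xml_string)
    lines = []
    for line in root.iter("LINE"):
        words = []
        for w in line.iter("WORD"):
            text = (w.text or "").strip()
            if not text:
                continue
            # Assumption: the first coords value is the left edge of the word.
            left_x = int(w.get("coords").split(",")[0])
            words.append((left_x, text))
        if words:
            lines.append(words)
    return lines


def split_columns(lines, gutter_x):
    """Re-order text so the whole left column precedes the right column."""
    left, right = [], []
    for words in lines:
        left_words = [text for x, text in words if x < gutter_x]
        right_words = [text for x, text in words if x >= gutter_x]
        if left_words:
            left.append(" ".join(left_words))
        if right_words:
            right.append(" ".join(right_words))
    return "\n".join(left + right)


print(split_columns(read_lines(TWO_COLUMN_PAGE), gutter_x=500))
```

A fixed gutter is the simplest possible choice; a real script would more likely have to estimate it from the gaps between word boxes on each page.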