Good to know. I consulted the ABBYY website and it says one option is
an "Open license for local use on workstations", but I guess it's not a
FLOSS license, unfortunately.
By the way, what is the state of affairs regarding Indic languages?
Do we have a central page documenting existing OCR pipeline used by the
What should I say to a contributor who comes to me asking "I have this
old PD book in my personal library that I would like to digitize,
share and proofread in Wikisource, where should I start?". Do we have an
online service, for example on Tool Labs, which lets one either upload
a facsimile or simply input its URL, and which then launches the OCR,
for example backed by Tesseract?
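To make the question concrete, here is a minimal sketch of the OCR step such a service would wrap. It assumes Tesseract is installed locally and on the PATH, and that the page has already been extracted from the facsimile as an image, since Tesseract reads images rather than PDFs. The function names and the file name are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_ocr_command(image: Path, lang: str = "eng") -> List[str]:
    """Build a Tesseract command line that prints recognized text to stdout."""
    # CLI form: tesseract <image> <outputbase> -l <lang>
    # Using "stdout" as the output base sends the text to standard output.
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_page(image: Path, lang: str = "eng") -> str:
    """Run Tesseract on one page image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_ocr_command(image, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout
```

For a Hebrew page this would be called as ocr_page(Path("page-012.png"), lang="heb"), provided the matching language data is installed.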
Shouldn't we update our roadmap, or is there a more up to date
On 13/04/2018 at 08:28, Nahum Wengrov wrote:
> I use ABBYY FineReader, don't remember the exact version (probably 12
> or 11). I bought it a few years ago and it works perfectly for my
> language (Hebrew).
> On Fri, Apr 13, 2018 at 2:22 AM, mathieu stumpf guntz
> <***@culture-libre.org <mailto:***@culture-libre.org>>
> Thank you Nahum,
> Could you indicate which OCR solution you are using?
> On 26/03/2018 at 17:27, Nahum Wengrov wrote:
>> I frequently work offline on he.wikisource. I download the entire
>> pdf file from commons to my hard drive, and OCR the page I need
>> myself. One can use the OCR of wikisource and download the text
>> too, I guess, page by page. Then I proof the text in a Word
>> document, open to the lower half of my screen, with the pdf open
>> on the upper half of the screen, where I go to the page I need
>> with acrobat reader, and scroll both windows down or up as needed.
>> On Mon, Mar 26, 2018 at 11:21 AM, mathieu stumpf guntz
>> <mailto:***@culture-libre.org>> wrote:
>> On 24/03/2018 at 16:22, billinghurst wrote:
>>> Though that would defeat the purpose of online proofreading
>>> with account verification. Some of the true value of our
>>> online process is that contribution builds a level of trust
>>> and knowledge and that is reflected in both our patrolling
>>> and the allocation of autopatrolled status.
>> How would providing tools to make batch work offline
>> interfere in any way with that? Once the work is done, it can
>> be uploaded to Wikisource with whichever account the user wants.
>> Actually, to my mind, the main benefit of the online aspect
>> is the peer-to-peer production model. Also, there is no need
>> for a central node carrying accounts to take into account the
>> trust given to a particular contributor. There are digital
>> signature technologies, such as GPG for example. Having a
>> central node with a web interface just makes things easier
>> for most users; it doesn't improve the trustworthiness of the
>> environment. On the contrary, with a single point of failure,
>> we actually rely on a weaker solution in this regard.
>>> Also how would you have access to templates, and components
>>> like that from off-line?
>> Well, that just shows how inefficient these tools are for
>> continuing to contribute while offline. It's always
>> possible to install MediaWiki and download the required
>> templates, but currently this process seems way too
>> complicated, doesn't it?
>>> Also we generally cannot download the images separately as
>>> that is usually part of the later clean-up where people have
>>> the technical skills.
>> I'm afraid the term "image" misguided your answer. It seems
>> you interpreted that as picture elements from files, while I
>> was talking about the files themselves.
>>> So yes, there is the capacity to have the text and proofread
>>> the text, that actual checking the text against the image is
>>> not the sole component of proofreading, and further it would
>>> not be at all helpful for validation.
>> There is nothing magic about working directly in a browser.
>> People do download and upload all the required material
>> anyway, but on a page-by-page basis. The result is just as
>> valid when transactions are operated at the file
>> repository level.
>> Wikisource-l mailing list