Discussion:
Do we have tools for offline collaboration?
mathieu stumpf guntz
2018-03-24 14:58:32 UTC
Hello,

A person in a local Wikisource workshop asked me whether we could download
all the material of a specific work to proofread it offline, that is,
download both the pictures and the OCRed text. Additionally, I think it
would be good to provide a tool that at least shows the plain text and the
pictures side by side.

So, are you aware of anything close to such a tool? :)

Cheers
billinghurst
2018-03-24 15:22:46 UTC
Though that would defeat the purpose of online proofreading with account
verification. Part of the true value of our online process is that
contributing builds a level of trust and knowledge, and that is reflected
in both our patrolling and the allocation of autopatrolled status. Also,
how would you access templates and similar components offline?

Also, we generally cannot download the images separately, as that is
usually part of the later clean-up, done by people who have the technical
skills.

So yes, it is possible to get the text and proofread it, but actually
checking the text against the image is not the only component of
proofreading, and it would not help at all with validation.

-- billinghurst
mathieu stumpf guntz
2018-03-26 08:21:26 UTC
On 24/03/2018 at 16:22, billinghurst wrote:
> Though that would defeat the purpose of online proofreading with
> account verification. Some of the true value of our online process is
> that contribution builds a level of trust and knowledge and that is
> reflected in both our patrolling and the allocation of autopatrolled
> status.
How would providing tools for batch work offline interfere in any way
with that? Once the work is done, it can be uploaded to Wikisource with
whichever account the user wants.

Actually, to my mind, the main benefit of the online aspect is the
peer-to-peer production model. Also, there is no need for a central node
holding accounts to keep track of the trust given to a particular
contributor: there are digital signature technologies such as GPG, for
example. Having a central node with a web interface just makes things
easier for most users; it doesn't improve the trustworthiness of the
environment. On the contrary, with a single point of failure, we
actually rely on a weaker solution in this regard.

>  Also how would you have access to templates, and components like that
> from off-line?
Well, that just shows how inefficient these tools are for continuing to
contribute while offline. It's always possible to install MediaWiki and
download the required templates, but currently this process seems way
too complicated, doesn't it?

>
> Also we generally cannot download the images separately as that is
> usually part of the later clean-up where people have the technical skills.
I'm afraid the term "image" misled your answer. It seems you
interpreted it as picture elements within the files, while I was talking
about the files themselves.

> So yes, there is the capacity to have the text and proofread the text,
> that actual checking the text against the image is not the sole
> component of proofreading, and further it would not be at all helpful
> for validation.
There is nothing magic about working directly in a browser. People
download and upload all the required material anyway, just on a
page-by-page basis. The result is just as valid when the transactions
are carried out at the level of a file repository.

Cheers
Nahum Wengrov
2018-03-26 15:27:18 UTC
I frequently work offline on he.wikisource. I download the entire PDF file
from Commons to my hard drive and OCR the page I need myself. One could
also use the Wikisource OCR and download the text, I guess, page by page.
Then I proofread the text in a Word document open on the lower half of my
screen, with the PDF open in Acrobat Reader on the upper half, where I go
to the page I need, and I scroll both windows up or down as needed.
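For the "download the text page by page" option, a minimal sketch in Python, assuming the standard MediaWiki Action API (`api.php` with `prop=revisions`). The helper names, the example file name and the `Page` namespace prefix are illustrative assumptions: the namespace name is localized on some wikis, and only pages that have already been saved on Wikisource have stored text (the raw OCR layer of a page nobody has created yet is not returned by this query).

```python
# Hypothetical helpers: fetch the saved wikitext of Wikisource "Page:" pages
# for one scanned work, so the text can be proofread offline.
import json
import urllib.parse
import urllib.request


def page_titles(index_file, first, last, namespace="Page"):
    """Build titles like 'Page:Example.djvu/3' for pages first..last inclusive."""
    return [f"{namespace}:{index_file}/{n}" for n in range(first, last + 1)]


def revisions_url(site, titles):
    """Build an Action API query URL returning the current content of each title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{site}/w/api.php?" + urllib.parse.urlencode(params)


def fetch_texts(site, titles):
    """Download wikitext per title; pages that were never saved map to None."""
    request = urllib.request.Request(
        revisions_url(site, titles),
        # Wikimedia asks API clients to send a descriptive User-Agent.
        headers={"User-Agent": "offline-proofreading-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    texts = {}
    for page in data["query"]["pages"]:
        if page.get("missing"):
            texts[page["title"]] = None
        else:
            texts[page["title"]] = page["revisions"][0]["slots"]["main"]["content"]
    return texts
```

For example, `fetch_texts("fr.wikisource.org", page_titles("Example.djvu", 1, 20))` would return a dict keyed by page title. The API caps the number of titles per request (50 for ordinary accounts), so a long work would need to be fetched in batches.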
mathieu stumpf guntz
2018-04-12 23:22:36 UTC
Thank you, Nahum.

Could you tell me which OCR software you are using?
mathieu stumpf guntz
2018-04-13 06:54:49 UTC
Good to know. I checked the ABBYY website, and it says one option is
an "Open license for local use on workstations", but I guess it's not a
FLOSS license, unfortunately.

By the way, what is the state of affairs regarding Indic languages?

Do we have a central page documenting the existing OCR pipelines used by
the Wikisource community?

What should I tell a contributor who comes to me asking, "I have this
old public-domain book in my personal library that I would like to
digitize, share and proofread on Wikisource; where should I start?" Do we
have an online service, for example on Tool Labs, that lets you either
upload a facsimile or simply enter its URL, and then runs OCR on it, for
example with Tesseract?

Shouldn't we update our roadmap[1], or is there a more up-to-date
document elsewhere?

[1] https://meta.wikimedia.org/wiki/Wikisource_roadmap


On 13/04/2018 at 08:28, Nahum Wengrov wrote:
> I use ABBYY FineReader; I don't remember the exact version (probably 12
> or 11). I bought it a few years ago, and it works perfectly for my
> language (Hebrew).
Nicolas VIGNERON
2018-04-13 09:33:33 UTC
2018-04-13 8:54 GMT+02:00 mathieu stumpf guntz <
***@culture-libre.org>:

> Good to know. I consulted the website of ABBYY and it say one option is an
> "Open license for local use on workstations", but I guess it's not a FLOSS
> license, unfortunately.
>
Not at all; read more carefully: this license is only available once you
have already purchased more than 50 licenses
(https://www.abbyy.com/en-ca/finereader/licensing/), so at least €5,000 IIRC.

> By the way, what is the state of the affair regarding Indic languages?
>
I'll leave that one to people more familiar with it, but it seems to work
fine.

> Do we have a central page documenting existing OCR pipeline used by the
> wikisource community?
>
Not that I know of.
And AFAIK, each Wikisource and each Wikisourcerer has a different system
(sometimes the differences are small, sometimes big).

> What should I say to a contributor which come to me asking "I have this
> old PD book in my personnal library that I would like to digitalize, share
> and proofread in Wikisource, where should I start?". Do we have an online
> service, for example on tool labs, which enable to either upload or simply
> input url of a facsimile and that launch the OCR for example backed on
> tesseract?
>
There is BUB (https://tools.wmflabs.org/bub/), but only for certain websites.

> Shouldn't we update our roadmap[1], or is there a more up to date document
> elsewhere?
>
We should write a new document.

Regards, ~nicolas
Bodhisattwa Mandal
2018-04-13 12:19:09 UTC
On 13 April 2018 at 15:03, Nicolas VIGNERON <***@gmail.com>
wrote:


> There is BUB (https://tools.wmflabs.org/bub/), but only for certain
> websites.
>

BUB has not been working for more than a year.


--
Bodhisattwa
Yann Forget
2018-03-25 11:49:18 UTC
FYI, Zoé on the French Wikisource works offline and then copy-pastes the
proofread text back into Wikisource.
Judging by the results, she has quite a good process: fast, and with good
quality. You might want to ask her how she works:
https://fr.wikisource.org/wiki/Sp%C3%A9cial:Contributions/Zo%C3%A9

Regards,

Yann


--
Jai Jagat 2020 Grand March Coordinator
https://www.jaijagat2020.org/
+91-62 60 140 319
+91-74 34 93 33 58