Discussion:
Indic Wikisource Update November 2016
Jayanta Nath
2016-11-02 18:02:45 UTC
Hello all,

We've just published the November 2016 Indic Wikisource statistics. Since
the Google OCR script was deployed on all Indic Wikisources, they have been
growing rapidly.

Here are a few stats, with the top three in each category:

By number of articles
1. Sanskrit Wikisource (15445 pages), 0.05% backed by scan pages
2. Telugu Wikisource (11707 pages), 24.3% backed by scan pages
3. Kannada Wikisource (7864 pages), 0.99% backed by scan pages


By number of validated pages

1. Telugu Wikisource (18142 pages)
2. Tamil Wikisource (5167 pages)
3. Gujarati Wikisource (3729 pages)


By number of proofread pages

1. Telugu Wikisource (20213 pages)
2. Malayalam Wikisource (8065 pages)
3. Tamil Wikisource (7737 pages)

By percentage of pages backed by scans
1. Bengali Wikisource (25.90%)
2. Telugu Wikisource (24.30%)
3. Gujarati Wikisource (17.51%)

I want to point out in particular that there has been no visible
improvement on the Marathi and Assamese Wikisources.

The Sanskrit and Kannada Wikisources need to work on backing their
proofread texts with scan pages.

The full Indic Wikisource stats are here:
https://wikisource.org/wiki/Wikisource:Indic_Wikisource_Stats

Regards,
Jayanta Nath
Indic Wikisource Community
Andrea Zanni
2016-11-02 18:16:14 UTC
Thanks, Jayanta,
it is very important that you keep track of this progress.
Have you talked with Sam Wilson about this?

There could be many ways in which the WMF can help you
analyze this important moment of the Indic community,
and it's also very important to them (and their donors)
to understand how they have an impact.

Google OCR is a "simple thing", but we ("Western" Wikisources) learned very
late that OCR was not available for many Indic languages.
I have shown many people in the WMF the stats about Telugu Wikisource (the
peak in the chart), and it's crucial that many other people inside the WMF
are aware of that.
The Indic Wikisource community can show that there are very "cheap" things
the WMF can do to help their communities thrive. The Indic Wikisource
community thus has a big responsibility ;-)

Aubrey
_______________________________________________
Wikisource-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Sam Wilson
2016-11-03 00:46:02 UTC
Yes, I agree! :-) There're so many smallish things that I reckon can go
a long way towards making Wikisources bigger and better.

And it keeps surprising me how many people within the Wikimedia movement
aren't familiar with how Wikisource works — and are amazed when they're
shown! :-) It really does seem that we're not very good at advertising
ourselves. (Well, one doesn't like to blow one's own trumpet, does one?)

Talking of stats, what is French Wikisource doing that's so successful
at getting things proofread and validated?
https://tools.wmflabs.org/phetools/graphs/Wikisource_-_proofread_pages_per_day.png
https://tools.wmflabs.org/phetools/statistics.php?diff=30

—sam
mathieu stumpf guntz
2016-11-03 07:36:26 UTC
I guess that the "100 livres en 100 jours" (100 books in 100 days)
challenge <https://fr.wikisource.org/wiki/Wikisource:Accueil/100wikijours>
helps somewhat. The goal is to process a whole new book every day, with no
work prepared in advance. Missing the goal on a single day resets the counter.
Andrea Zanni
2016-11-03 09:12:16 UTC
Thanks, Mathieu.
What really strikes me is that this challenge is doable on fr.wikisource: on
many others it would be complete madness ;-)
Also, Polish Wikisource is doing great.

What interests me is understanding how they are building their community of
active and super-active proofreaders: are they doing something that other
Wikisources aren't?

Aubrey


Nicolas VIGNERON
2016-11-03 09:37:34 UTC
Not sure if there is a link, but when you mention fr.ws and pl.ws I
immediately think of a correlation: these two are among the few Wikisources
that use the proofreading system exclusively (or nearly so: 92% and 96% of
mainspace pages are backed by scans, respectively; see
http://tools.wmflabs.org/phetools/statistics.php).

Regards, ~nicolas
Ankry
2016-11-03 11:46:49 UTC
In pl.ws we have a policy that if a text *can* be processed using
ProofreadPage (legal aspects, scan availability) then it *has to* be
processed using this extension.

Ankry
Alex Brollo
2016-11-03 14:20:18 UTC
I sometimes contribute to fr.source, even though my French is very poor.
I really appreciate fr.source's proofreading tools: they show a deep
interest in any trick that makes editing faster, safer, and more
comfortable. This "evidence of care" is very rewarding for any
contributor.

Alex
Bodhisattwa Mandal
2016-11-03 14:33:47 UTC
It would be great if a common page were created on Meta or mul.ws to
document all the best practices, gadgets, tools, and scripts used by each
language community, especially the big ones. That would help the smaller
communities improve and attract more editors.
--
Bodhisattwa
Anika Born
2016-11-04 10:37:41 UTC
+1 to Bodhisattwa

A best-practices page would be very much appreciated!

Anika
Federico Leva (Nemo)
2016-11-12 12:32:35 UTC
Recurring aspects are listed at https://wikisource.org/wiki/WS:COORD

Nemo

Ankry
2016-11-03 11:43:29 UTC
Post by Andrea Zanni
What interests me is understanding how they are building their community of
active and super-active proofreaders: are they doing something that other
Wikisources aren't?
I don't think I'll be betraying any secret if I tell you about that.

At the moment our community is based on a few active users, and we try to
support newcomers actively. We noticed that many new users do not like
classic on-wiki communication (through talk pages or the Scriptorium), so we
also support them through other channels (email, IRC). They are always
welcome to ask.

I don't think any two of our users came to the project the same way.
Our users are often active in many fields. We have a Facebook page, and we
sometimes mention our project on various fan sites (e.g. a few years ago a
short note about plwikisource on an ebook fan site brought in over 100 new
users; a few of them are still active).
We also appreciate the occasional activities of our colleagues from the
Polish Wikipedia and Wikimedia PL (interviews, blog articles, workshops).

We notice that our community comes from different circles than the
Wikipedia community (Wikipedia requires some level of creativity, while
Wikisource favors other skills). Also, as Polish orthography and grammar
have changed significantly since the 19th century, and even since the 1920s
and 1930s, we do not look for new users among teenagers (we don't want to
disrupt their freshly acquired spelling skills); we look for users among
retirees instead :)

I also think that progress in OCR tools (thanks to Wieralee), a short
technical guide for newcomers (also thanks to Wieralee), and a lot of
automation (thanks to Zdzislaw) have made plwikisource more approachable for
new users, even those with no earlier wiki experience.
(When I first came to plws in 2010, almost all books were transcribed by
hand.)

We have also noticed that announcing various short-term goals (e.g. reaching
90% ProofreadPage-based pages in the main namespace, 300,000 pages in the
Page namespace, or 150,000 proofread pages, or preparing the full set of
Sienkiewicz's texts for the 100th anniversary of his death) makes our
community more active.

Ankry
Yann Forget
2016-11-12 11:44:19 UTC
Hi,

Yes, the 100 books in 100 days challenge helps a bit, but the growth comes
mainly from Zoé, who is proofreading all the volumes of the "Revue des Deux
Mondes", and from the partnership with the Bibliothèque et Archives
nationales du Québec (Quebec National Archives and Library), thanks to
Ernest's leadership.
See https://fr.wikisource.org/wiki/Wikisource:BAnQ and
http://www.banq.qc.ca/activites/wiki/wiki-source.html

Regards,

Yann