Commons:Batch uploading/Spread the sign


Spread the sign

This is what the films look like. This one shows the Swedish sign for apelsin (orange).


  • Describe the works to be uploaded in detail (audio files, images by …):
    • Spreadthesign has around 150 000 films of signs in 16 different languages and is continuing to make more films in new languages. They want to share their films to help raise awareness about sign language and to make better use of the material they have. The films are of high quality, but they have yet to decide which resolution and format to upload.


  • Which license tag(s) should be applied?
    • CC-BY-SA


  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
    • It would probably be a good idea to have a special template. One thing I have in mind is a field for connecting films that show the same word in different languages.

Axel Pettersson (WMSE) (talk) 10:36, 29 August 2013 (UTC)

Opinions

In a first meeting, Lokal_Profil and I had the following comments on the code:

  • Make the filename $word_$language-spreadthesign.ogv
  • Create categories with $wordclass (or what it might be called to have nouns, verbs and such in separate categories)
  • Create the categories Videos of sign language in $language and $language sign language
  • Make the description more dynamic and at least in English and the language of the film. Help with l10n might be needed here, although they have a network with partners in all languages they have films in.
That’s a cool project! Awesome :)
Just a few rapid thoughts
  • I gather from the Gist that source links all follow the same pattern − it might be worth creating {{STS link|$vid}} to generate the link (whose label could be internationalized).
  • Not sure what you need a special template for. Connecting to similar films could be done through the other versions field. Am I missing something?
  • License: File:67329.webm is tagged with CC-BY.
  • Author: do we have better metadata for that? « own work » does not really cut it. Who should be attributed here?
All for now. Hope that helps!
Jean-Fred (talk) 21:06, 29 August 2013 (UTC)
Thanks for your concerns, Jean-Fred; we also feel this is an important project.
  • We want to use one correct template to create the Commons pages for all 150 000 uploaded films.
  • Yes, we have better data. « own work » is gone, each file will be named $word_$language-spreadthesign.ogv as suggested above, and the description will be better and dynamic (language-aware) too.
Spreadthesign 10:21, 30 August 2013 (UTC)
Updated bot code.
  • New version is available here
Spreadthesign 10:21, 30 August 2013 (UTC)
Updated bot code.
  • Changes link source here
  • We will use CC-BY-SA-3.0 for the upload.
Spreadthesign 09:33, 2 September 2013 (UTC)
Updated bot code.
  • Added support for wordclass category [[category:verb]] here
Spreadthesign (talk) 10:45, 3 September 2013 (UTC)
Updated bot code.
Example: Apelsin spreadthesign.ogv
SpreadthesignBot (talk) 08:35, 4 September 2013 (UTC)
I created {{STS-cooperation}}. Please help translate it and put it in the code somewhere. /Axel Pettersson (WMSE) (talk) 10:14, 5 September 2013 (UTC)
A few things: 1) Which language codes are you planning to use? It makes sense to use the sign language code and not the spoken/written language code (so swl for Swedish Sign Language, not sv). 2) In the bot code I don't see any simple way to cancel the upload or to pause between uploads. You need a way to shut it down in case of "emergency", and in the beginning you should upload at 1-2 files per minute and increase that gradually. 3) Are the videos already encoded as OGV? If not, you might consider using WebM instead, since that gives better quality, but it's no big deal. Skalman (talk) 08:04, 6 September 2013 (UTC)
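The pacing and emergency stop asked for in point 2 could be sketched roughly as follows. This is not the bot's actual code: the function name `throttled_upload`, the `upload` callback and the `STOP` file are all assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def throttled_upload(
    files: Iterable[str],
    upload: Callable[[str], None],
    per_minute: float = 1.0,
    stop_file: Path = Path("STOP"),
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Upload files one at a time, pausing between uploads.

    `upload` is a hypothetical callback that performs one upload.
    The loop ends early as soon as `stop_file` exists, so an operator
    can halt the run by creating that file, without touching the code.
    """
    delay = 60.0 / per_minute  # seconds to wait after each upload
    done: list[str] = []
    for name in files:
        if stop_file.exists():  # emergency switch
            break
        upload(name)
        done.append(name)
        sleep(delay)
    return done
```

Starting with `per_minute=1` and raising it between runs would match the gradual ramp-up suggested above.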
Wordclass (ordklass in Swedish) is called part of speech in English, POS for short. As I understand it, filenames on Commons need to be unique. The same word can belong to different parts of speech. So if I am not wrong, I suggest that the part of speech should also be part of the filename. It should also be possible to add another optional distinguisher if two or more signs are used for the same word of the same part of speech.
Regarding the order $word-$language, isn't it better to state the $language first, like $language-$word-$pos-$optional_dist-spreadthesign.webm/ogv? Please give me your thoughts.
I don't know if this matters, but working with Wiktionary I am used to the fact that capitalization matters. Is there a reason not to name the files without a capital letter for the word, as in swl-apelsin-noun-spreadthesign.webm/ogv? ~ Dodde (talk) 17:16, 6 September 2013 (UTC)
Thanks all.
To Skalman.
  • We were planning to use the written language code (e.g. sv for Swedish, Svenska) for naming the files simply because we are able to support it; now we may add support for the sign language code.
  • Yes, we thought we could limit the number of files processed by limiting how many rows are fetched from the database in SQL. As you mentioned, we are going to start with a few records and then increase the amount.
  • All videos we are going to upload are in Flash, MP4 and/or OGV format.
Spreadthesign 13:31, 11 September 2013 (UTC)
To Dodde.
  • Today we cannot support using POS in the file names; I'll create another distinguisher.
  • I can't see any benefit in putting $language before $word or the other way around. If there is any, please tell me.
  • Capitalization matters, I guess it was just a typo or test case.
Spreadthesign (talk) 19:51, 11 September 2013 (UTC)
A matter of organizing the files, I suppose. It's easier to see which language the word belongs to if the language code is presented first. In a listing, words of the same language would be grouped together. That is all I can say.
In order to be able to insert entries for the signs, the information regarding part of speech needs to be present. Is this information present somewhere else (in some database), or is it expected that the person who runs the bot should manually decide, or sometimes guess, for each word before creating the entry? ~ Dodde (talk) 22:54, 11 September 2013 (UTC)
For pronunciation files, I believe that the language always comes first, see Category:Pronunciation. It makes sense to use the same system for sign language videos.
If you're not using the actual sign language codes, what do you intend to do with written languages that are covered by multiple sign languages, such as English (US, UK)? Are you using codes such as en-uk? I really think it'd be easier and more correct to just use the sign language code.
Regarding capitalization: The first letter of a file name is always capitalized here (which is another reason to have the lang code first). However, File:Coca-cola spreadthesign.ogv should actually have a capital C in the middle (as well as the lang code).
I feel that it'd be good if somebody who is part of the Commons community commented on this as well (I'm only familiar with the Swedish language Wiktionary). Skalman (talk) 06:26, 12 September 2013 (UTC)
Updated bot code.
  • Added support for using the sign language code when naming the files. The file names we create will now be swl-apelsin-spreadthesign.ogv (sign for orange in Swedish Sign Language) or ase-orange-spreadthesign.ogv (sign for orange in American Sign Language). The code is here.

SpreadthesignBot (talk) 12:16, 12 September 2013 (UTC)

Great work with the upload preparations. A few thoughts though.
  • Are the words always unique? I.e. the Swedish banan (banana/the track) could technically have two signs which would each end up becoming swl-banan-spreadthesign.ogv. A way around this would be to append the internal STS-id. In this case banana would become swl-banan-spreadthesign-98036.ogv. This also solves the issue with e.g. apelsin having three separate videos
    • As a follow up to the three different apelsin videos. Will the information "vanligast/lika vanlig/används i Norrland" which distinguishes the three be included somehow?
  • The license used should probably be {{cc-by-sa-3.0|[http://www.spreadthesign.com Spread the Sign]}} instead of {{self|cc-by-sa-3.0}}. What I did was to bake this into the {{STS-cooperation}} template so that this can be used as the permission parameter instead. This template also includes the Media contributed by Spread the sign-category meaning that it should also be removed from the github code.
  • The license should be complemented by an e-mail sent to permissions-commons@wikimedia.org stating that STS are the owners of the material uploaded by User:Spreadthesign and release it under cc-by-sa-3.0. Once this is properly registered the id can be added to {{STS-cooperation}}.
  • As for using a special information template. Since the information is structured (POS, language etc.) and there are so many videos it might actually help localisation to use a purpose created template. What I'm thinking is that if there is e.g. a parameter for POS then the language mapping could be done directly in the template (similar to how {{Technique}} works). Similar thing could be done with the languages etc. Other opinions on this would be welcome though.
/André Costa (WMSE) (talk) 15:12, 16 September 2013 (UTC)
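The naming scheme proposed in the first point (sign language code, word, then the internal STS-id as a tiebreaker) might look like the sketch below. The function name `sts_filename` is hypothetical and the bot's real code may build names differently; note also that Commons capitalizes the first letter of a file name on its own.

```python
def sts_filename(sign_lang: str, word: str, sts_id: int, ext: str = "ogv") -> str:
    """Build '<sign_lang>-<word>-spreadthesign-<sts_id>.<ext>'.

    Appending the STS-id keeps names unique even when one word has
    several signs (e.g. the Swedish "banan" or "apelsin").
    """
    word = " ".join(word.split())  # collapse stray whitespace in the word
    return f"{sign_lang}-{word}-spreadthesign-{sts_id}.{ext}"


# The id 98036 is the one quoted in the discussion.
print(sts_filename("swl", "banan", 98036))
```

Two videos for the same word then differ only in the trailing id, so no part-of-speech data is needed just to avoid name collisions.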
Using a template to enter POS and any other information would be very helpful when inserting videos+descriptions automatically into Wiktionary entries, as long as it's well-structured. Skalman (talk) 08:48, 17 September 2013 (UTC)
Yeah, Spreadthesign is back. Thank you all for your thoughts, advice and assistance. We are very excited to begin the upload of our material very soon.
Now for the questions:
  • All files should be unique; I added a distinguisher to each file name.
  • I'm not quite sure how the license should be if not {{self|cc-by-sa-3.0}}. Please explain!
  • {{STS-cooperation}} has been completed; the release id has been added.
  • We understand how useful it would be to use the POS, but today this is not possible due to lack of support in the db.
  • Updated bot code is available here

SpreadthesignBot (talk) 13:29, 30 September 2013 (UTC)

In the code you should change permission from {{CC-BY-SA 3.0}} to {{STS-cooperation}}. Then it will be as in Apelsin spreadthesign.ogv with complete cooperation, license and OTRS-templates.
I still think the description should include both English and the language of the film. Probably something like {{Multilingual description|en=$desc $categoryLanguage|(something that finds out language)=($desc in the language of the film $language in the own language)}} as it would be helpful to non-English communities.
The bot request is here. Please help with the approval there.
/Axel Pettersson (WMSE) (talk) 12:22, 3 October 2013 (UTC)
Please don’t use {{Multilingual description}}, please favor {{sv|...}}{{en|...}}. The behaviour of Mld is automatically triggered when there are more than N languages in the description field (don’t remember right now what N is). Jean-Fred (talk) 13:12, 3 October 2013 (UTC)
Sorry about that, I didn't know. On the other hand, if the description field only has two languages, English and the language of the film, will it be triggered then? Or maybe it doesn't matter as there are only two languages there. We still have the problem of inserting the right language code there, or is there an existing solution somewhere? /Axel Pettersson (WMSE) (talk) 13:39, 3 October 2013 (UTC)
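One way to approach the language-code question is a small lookup from sign language code to written language code, then one language template per description. The sketch below is only an illustration: the `WRITTEN_LANGUAGE` table and the function name are assumptions, and the `1=` form is used so that an equals sign in the text does not break the template.

```python
# Illustrative mapping from sign language code to the written language
# used for the local description; a real table would cover all languages.
WRITTEN_LANGUAGE = {"swl": "sv", "ase": "en", "bfi": "en"}


def description_wikitext(sign_lang: str, english_text: str, local_text: str) -> str:
    """Return the description as '{{en|1=...}}{{xx|1=...}}' wikitext."""
    local_code = WRITTEN_LANGUAGE[sign_lang]
    parts = ["{{en|1=%s}}" % english_text]
    if local_code != "en":  # avoid emitting two English descriptions
        parts.append("{{%s|1=%s}}" % (local_code, local_text))
    return "".join(parts)


print(description_wikitext(
    "swl",
    "Orange in Swedish Sign Language",
    "Apelsin på svenskt teckenspråk",
))
```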

Way to go, File:Bho-make+a+reservation-spreadthesign-9982.ogv is up and running. Some thoughts:

  • No need for + in filenames, it should be Bho-make a reservation-spreadthesign-9982.ogv
  • Format the upload as this
  • The description should state that it's British Sign Language. Something like "Book a table in a restaurant at a particular time so that you can eat a meal in British sign language."
  • Categories should be on one line each
  • Categories should (probably) be category:British English sign language and category:Videos of sign language in British English for Bho.

/Axel Pettersson (WMSE) (talk) 09:02, 21 October 2013 (UTC)
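The filename and category points above could be handled along these lines. This is a guess at the cause, not a reading of the bot code: it assumes the "+" characters come from form-style URL encoding of the word, and the helper names are hypothetical. A word that legitimately contains "+" or "%" would be altered by this approach.

```python
from urllib.parse import unquote_plus


def clean_filename(raw: str) -> str:
    """Turn form-encoded '+' back into spaces (also decodes %xx escapes)."""
    return unquote_plus(raw)


def category_lines(names: list[str]) -> str:
    """Render one [[Category:...]] link per line."""
    return "\n".join(f"[[Category:{name}]]" for name in names)


print(clean_filename("Bho-make+a+reservation-spreadthesign-9982.ogv"))
print(category_lines(["Media contributed by Spread the sign"]))
```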

A few more points:
  • The name of the language is British Sign Language - any categories should probably not include the word "English". The categories Axel suggested should probably be category:British Sign Language and category:Videos in British Sign Language (though I don't understand the difference between them - are both needed?)
  • I am confused as to which language this is. "bho" is the language code of Bhojpuri - British Sign Language has the language code "bfi".
  • The link back to www.spreadthesign.com should not be a Swedish language link. http://www.spreadthesign.com/se/9982/ -> http://www.spreadthesign.com/9982/
  • Where does the description/definition "Book a table in a restaurant at a particular time so that you can eat a meal" come from? On spreadthesign.com I only see the text "make a reservation" (+the translations to other written languages). I believe you can "make a reservation" at a hotel too, so which description is accurate?
Skalman (talk) 12:23, 21 October 2013 (UTC)
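The link rewrite in the third point can be sketched as a small helper. The function name is hypothetical, and the pattern only assumes the URL shape shown in the example above (host, two-letter locale segment, numeric id); whether every locale-free URL resolves on the site is not something this thread confirms.

```python
import re


def language_neutral_link(url: str) -> str:
    """Drop a two-letter locale segment that sits between host and video id."""
    return re.sub(r"^(https?://[^/]+)/[a-z]{2}/(\d+/?)$", r"\1/\2", url)


print(language_neutral_link("http://www.spreadthesign.com/se/9982/"))
```

URLs that do not match the pattern are returned unchanged, so the helper is safe to apply to links that are already language-neutral.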


Thanks.

  • + in a file name is a bug and it's fixed already.
  • The description will be complemented
  • Categories as well.
To Skalman
  • "bho" comes from http://archive.ethnologue.com/14/show_family.asp?subid=1, I think I got it right.
  • Nice point about the link back to spreadthesign.com; we will fix it before the next test upload.
  • Sign descriptions come from our colleagues around the world, who get the chance to help since they know better what each sign means in their own language.

SpreadthesignBot (talk) 19:02, 21 October 2013 (UTC)

@SpreadthesignBot: That's an old version of Ethnologue. Starting with Ethnologue 15 it's "bfi". See here for the current version: http://www.ethnologue.com/subgroups/deaf-sign-language. On sv-wikt we use ISO 639-3 if Wikimedia doesn't have a special code (and I believe that the new version is the same as ISO 639-3). Skalman (talk) 08:40, 22 October 2013 (UTC)
To Skalman

Thanks a lot Skalman, my mistake.

SpreadthesignBot (talk) 12:13, 22 October 2013 (UTC)

Any status update? Skalman (talk) 11:51, 15 November 2013 (UTC)

Reboot in December

I've added some new movies

New description, wordclass in the categories and some more issues solved. Please have a look and comment here. /user:SpreadthesignBot (through Axel Pettersson (WMSE) (talk) 10:15, 6 December 2013 (UTC))

No comments after waiting for a few days. Moving along with some more uploads now, but feel free to interrupt or comment as we move along. /Axel Pettersson (WMSE) (talk) 10:54, 9 December 2013 (UTC)
Hey, I just had a quick look. Looks very good, not much to say − please upload more!
One feature request just for the pleasure of asking for the impossible ;-). I see descriptions are provided in English and Swedish − good. But I see that translations are available in many more languages on the STS website; for example 81278, if inserted with a /de/, gives /de/81278/, which says “personen”. Any chance to fetch all those and add them to the file description page? :) Jean-Fred (talk) 11:48, 9 December 2013 (UTC)
My suggestions:
  • Put the word in quotes (e.g. "annat" på svenskt teckenspråk)
  • In English, language names use capital letters, so it should be Swedish Sign Language (but "svenskt teckenspråk")
  • I'm wondering about File:Swl-annat-spreadthesign-73566.ogv - "annat" in Swedish does not mean "else" in English (annat=other, annars=else). I hope such mistakes are uncommon, but it would be nice to know what the actual meaning is - should we assume that for Swedish Sign Language videos, the Swedish description is (most likely to be) correct? Is there a good place to report errors?
Nice to see some activity here! Skalman (talk) 16:12, 9 December 2013 (UTC)


How is it going? Do you need help uploading? I might be able to help. Are there other considerations? Skalman (talk) 14:22, 14 January 2014 (UTC)

Files uploaded during test period

These files should be deleted and uploaded again later with the correct name and format.

Reboot

  • Assigned to: Axel Pettersson (WMSE), Spreadthesign
  • Progress: coding
  • Bot name: SpreadthesignBot
  • Category: Media contributed by Spread the sign