Commons:Requests and votes/User:Dvortybot

From Wikimedia Commons, the free media repository



Hi, I've thrown together a script thingy to take a zip file from User:Dvortygirl, convert her .wav files to .ogg, and then upload them here, with her "standard" text, for her. She plans to send me batches of about 100 files at a time. Special:Contributions/Dvortybot shows the test run. Let me know what throttling I should add between the Python calls, and I'll happily add that back in. TIA! --Connel MacKenzie 07:47, 5 February 2007 (UTC)
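For readers curious what such a batch run might look like, here is a minimal sketch under stated assumptions; it is not the actual Dvortybot script. The names `wav_members`, `ogg_name`, and `process_batch` are hypothetical, the `convert` and `upload` callables stand in for whatever encoder and upload script are really used, and the 10-second default delay is only a placeholder, since the appropriate throttle is exactly what is being asked about above.

```python
import time
import zipfile
from pathlib import PurePosixPath


def wav_members(zip_path):
    """Return the sorted names of the .wav files inside the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".wav")
        )


def ogg_name(wav_name):
    """Map 'batch/En-us-word.wav' to 'En-us-word.ogg' (directory part dropped)."""
    return PurePosixPath(wav_name).with_suffix(".ogg").name


def process_batch(zip_path, convert, upload, delay_seconds=10.0, sleep=time.sleep):
    """Convert and upload every .wav in the archive, pausing between uploads.

    convert(wav_name, ogg_name) and upload(ogg_name) are hypothetical
    callables supplied by the caller; delay_seconds is a placeholder value.
    """
    names = wav_members(zip_path)
    uploaded = []
    for index, wav in enumerate(names):
        target = ogg_name(wav)
        convert(wav, target)
        upload(target)
        uploaded.append(target)
        if index < len(names) - 1:
            sleep(delay_seconds)  # throttle: wait before the next upload
    return uploaded
```

Passing `sleep` as a parameter keeps the throttle easy to adjust, or to replace with a stub during a dry run.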

I believe all the relevant information was already in my descriptions, but here is what I think the Information template would look like, if it would make everybody happier.
Pronunciation of the term in US English, recorded by [[User:Dvortygirl|Dvortygirl]], 
{{Information
|Description= Audio pronunciation of the term in United States English. [[Category:English pronunciation|word]]
|Source= Self
|Date= {{subst:CURRENTDAY}} {{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}
|Author= [[User:Dvortygirl|Dvortygirl]]
|Permission= {{self2|GFDL|cc-by-sa-2.5,2.0,1.0}}
|other_versions= (optional variable, can be left out)
}}
Yes, I had previously been using Audacity, and may yet do so for one-offs, such as requests or words of the day. I'm not sure that makes a whole lot of difference, but if for some reason we need to say "Source = Self (using Shtooka audio)" or some such, please suggest that. With 4000-plus words already done and probably 70,000 in the database, ready to go, I am eager to automate what parts of the process I can.
If anybody would like to divide audio pronunciations up by regions, I'm certainly open to having a different category. As far as I'm concerned, that's just template text. U.S. English varies, too, so we should think carefully about what to call that category, if we make one. That said, I think I'm generally in the vicinity of "GenAm", or General American. (As a rule, I don't do pronunciations for words that are clearly outside the American dialect. You will not, for instance, find U.S. audio for honour or bagsie.) All my files are already labeled En-us-word.ogg (for English, United States), and I have encouraged other interested Wiktionarians, at least, to follow suit with this Language-region-word naming convention. Our audio template in Wiktionary already has a field for region, too.
As a technical note, can templates nest, as I have done with the licensing, or should the GFDL/cc-by template go below with the licensing reading "see below", instead? Dvortygirl 05:23, 7 February 2007 (UTC)
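The Language-region-word naming convention described above (for example En-us-word.ogg) can be sketched in a few lines of Python. This is only an illustration: the function names `pronunciation_filename` and `parse_pronunciation_filename` are hypothetical, and the capitalisation rule (capitalised language code, lower-case region) is inferred from the single example "En-us-word.ogg" rather than from any documented policy.

```python
def pronunciation_filename(language, region, word, extension="ogg"):
    """Build a name such as 'En-us-word.ogg' from its three parts."""
    return f"{language.capitalize()}-{region.lower()}-{word}.{extension}"


def parse_pronunciation_filename(filename):
    """Split 'En-us-word.ogg' back into (language, region, word).

    Only the first two hyphens are treated as separators, so hyphenated
    words such as 'mother-in-law' survive intact.
    """
    stem, _, _extension = filename.rpartition(".")
    language, region, word = stem.split("-", 2)
    return language.lower(), region.lower(), word
```

For instance, `pronunciation_filename("en", "US", "word")` gives `"En-us-word.ogg"`, and parsing that name returns `("en", "us", "word")`.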
I've seen it done both ways, but I think we prefer the template below, in the licensing section, since that makes it marginally easier for bots to find. But that's not the same as embedding the license INTO a template rather than passing it in as a PARAMETER, as is done here... embedding into templates is something we are not so keen on (see recent VP discussions). That template looks good. One improvement would be to say a bit more than "self" in source=... some pointers to how it was done might be helpful. Since you are originating the content, you don't have to give provenance like you do for something you got somewhere else (like a PD-old pic). Looking good though. ++Lar: t/c 16:19, 7 February 2007 (UTC)
Lar, I'm sorry, but I'm quite a newbie when it comes to licensing format preferences on Commons. Can you please repeat the example here, in the format you want, for me? --Connel MacKenzie 22:39, 7 February 2007 (UTC)
Well, I'm biased, but I quite like how this one came out: Image:M29 Weasel Arctic USArmyTransMuseum.jpg ... go into edit mode and you'll see the information box filled out. Of course that's an image not a sound. Here's another one: Image:Star in the east solfege.ogg by Makemi which shows the license at bottom as well. In both cases the permissions section describes verbally but the license itself is in ==Licensing== ... hope that helps rather than confuses. I'd err on the side of too much information. ++Lar: t/c 23:43, 7 February 2007 (UTC)
OK, all better now? --Connel MacKenzie 07:09, 11 February 2007 (UTC)


  • Support --Jusjih 15:13, 5 February 2007 (UTC)
  • No objections. It may be a good idea to place the files into Category:American English pronunciation (or whatever the correct term for it is :-). --EugeneZelenko 16:04, 5 February 2007 (UTC)
    • On Commons, is it OK to put these entries in both Category:English pronunciation and AmEng? Or does it have to be one or the other? If the latter, then I would personally prefer they not be all sorted through, and moved into a less-useful subcategory. That is, she is a native speaker of American English; very few of these words are specific to American English though. And a subcategory with that name, would/could/should be limited only to "Americanisms"? I guess I'm confused, and had better let Dvorty answer, herself. --Connel MacKenzie 08:14, 6 February 2007 (UTC)
      • My point is that pronunciation categories are usually big. Since there are several kinds of pronunciation, it would be a good idea to split them by type from the start. Category:British English pronunciation exists already. --EugeneZelenko 15:54, 6 February 2007 (UTC)
        • As I said, I think I'd better let Dvortygirl answer. --Connel MacKenzie 05:14, 7 February 2007 (UTC)
  • No objections. Looks worthwhile. Cary "Bastique" Bass demandez 16:26, 5 February 2007 (UTC)
  • Could you use the {{information}} template to standardise the info about sources, how these were created, etc and give more provenance and contact/context info? Other than that your test run looks good, no objections here. ++Lar: t/c 22:47, 5 February 2007 (UTC)
    • AFAIK, she was using Audacity to record words for the last couple of years, but this week she started using moostik and suddenly needed a way to upload terms 100 times faster. How detailed is that supposed to get? Should she identify what type of microphone she's using, etc.? Or do you want to know the source of her "feeder lists"? I guess I'd better let her answer that and the other questions directly. --Connel MacKenzie 08:14, 6 February 2007 (UTC)
      • I'm not enough of a geek about sound, or what she is doing, to know what is keen to have recorded, I just know that more info is better. Please at least consider taking a shot at filling out the {{information}} template, it has a good set of fields that people often want to know the answers to. But it sounds like a great bot to have running. ++Lar: t/c 03:19, 7 February 2007 (UTC)
        • Hmmm. I guess I'd better read up on the Commons bot procedures. Does "no objection" mean I can/should proceed with more tests, or that it is good to go? --Connel MacKenzie 05:14, 7 February 2007 (UTC)
          • Bastique can answer more precisely but I think no objection means go ahead and test for now by all means, and after some period of time (a weekish after all objections are satisfied?) if no one objects (any more, if there were any), a 'crat will call consensus and set the bot flag on and let you know. Things are a bit more informal here, we don't have a Bot Approval Group like wp:en does. Hope that's correct and helpful! For the record, I have no objection. I am just suggesting that you get more verbose in your descriptions... cross references and more detail and suchlike are always goodness, but my "no objection" stands whether you do or not. ++Lar: t/c 16:09, 7 February 2007 (UTC)
            • <Breathes sigh of relief> I'm very glad to hear that! </whew> --Connel MacKenzie 22:39, 7 February 2007 (UTC)
            • As a sidenote, I would have taken a different approach (probably with the same end result) if I had known about COM:FUS. Thanks again - I guess we'll see in a week, now. --Connel MacKenzie 15:33, 11 February 2007 (UTC)