Commons:Requests for comment/Allow transferring files from other Wikimedia Wikis server side
- The following discussion is archived. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Supported with technical nits. –Be..anyone (talk) 11:00, 2 January 2015 (UTC)
An editor had requested comment from other editors for this discussion. The discussion is now closed; please do not modify it.
Occasionally, it is required or desired to move files from Wikipedia and other wikis hosted by the Wikimedia Foundation to Wikimedia Commons. The upload-by-URL feature is currently enabled for files from Flickr through the UploadWizard and for about 15 other domains through the GLAM-Wiki Toolset special page. However, it remains impossible to use the feature for files located under the *.wikimedia.org domain.
Aim
This RFC aims to evaluate whether there is community support for enabling the upload-by-url feature to allow easy copying of files from other Wikimedia wikis (such as Wikipedia).
The result of this RFC should ...
- establish whether there is strong community consensus, which could convince the Wikimedia Foundation operations team to reconsider their position
- make people aware of this feature
- not be a vote about implementing a specific tool making use of upload-by-url; such tools do not currently exist, but existing tools could benefit from upload-by-url, as they would work faster and more reliably in certain cases if updated to use upload-by-url when possible
Background
- Current procedure (temporary download)
- File is downloaded from wiki, for example Wikipedia
- File is stored temporarily by a tool
- File is uploaded to Commons
- File is deleted from tool's storage
Tools working like this are For the Common Good, Commons Helper 2, Move-to-commons assistant, and Old version filemover.
- Possible new procedure (upload-by-URL)
- Tool sends request to Wikimedia Commons API with the URL to fetch the image from
- WMF server manages the download and stores the file directly in Commons' upload directory
- Who is able to perform upload-by-URL actions?
Elected community members:
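The upload-by-URL step in the procedure above can be illustrated with a minimal Python sketch. It assumes the standard MediaWiki action API (action=upload with a url parameter) at Commons' api.php; the helper names upload_by_url_params and bytes_relayed_by_tool are hypothetical, and the sketch only builds request data rather than sending anything. At the time of this RfC, a request pointing at upload.wikimedia.org would have been refused because that domain was not allowed, which is exactly what the proposal asks to change.

```python
from typing import Dict

# Assumption: Commons' action API endpoint.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def upload_by_url_params(source_url: str, filename: str, description: str,
                         summary: str, csrf_token: str) -> Dict[str, str]:
    """Hypothetical helper: build an action=upload request asking the server
    to fetch source_url itself. Only the URL is sent; the file's bytes never
    pass through the tool. Needs the upload_by_url right and an allowed domain."""
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,   # target file name on Commons
        "url": source_url,      # the WMF server downloads the file from here
        "text": description,    # initial wikitext of the description page
        "comment": summary,     # upload summary
        "token": csrf_token,
    }


def bytes_relayed_by_tool(file_size: int, procedure: str) -> int:
    """Hypothetical helper: file bytes the tool (or user's machine) must move."""
    if procedure == "temporary-download":
        return 2 * file_size    # downloaded once, then uploaded once
    if procedure == "upload-by-url":
        return 0                # the server fetches the file directly
    raise ValueError(f"unknown procedure: {procedure}")
```

To actually perform the transfer, the dictionary would be POSTed as form data (for example with the requests library) by a logged-in account holding the upload_by_url right, with a CSRF token fetched beforehand via action=query&meta=tokens; those steps are left out of this sketch.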
Pros
- Less dependency on third-party tools for transferring files, and thus less error-prone
- Saves bandwidth, mostly for users (not so much for WMF servers, due to technical challenges)
- Allows JavaScript implementations running 100% in browsers (currently it's also possible to implement that in JS but it's awkward and slow)
- It will be easier and faster to transfer images from any Wikimedia project to Commons.
Cons
- Additional hardware/software needs to be configured/created on the WMF side (cf. bugzilla:42473)
Support
I support allowing copy uploads of files from Wikimedia Foundation sites to Wikimedia Commons
- Rillke(q?) 15:36, 29 March 2014 (UTC)
- Regardless of implementation, it is the clean way to transfer files. --McZusatz (talk) 16:55, 29 March 2014 (UTC)
- Steinsplitter (talk) 17:03, 29 March 2014 (UTC)
- Fæ (talk) 21:37, 29 March 2014 (UTC)
- PS License reviewers are not elected. --Fæ (talk) 11:02, 12 May 2014 (UTC)
- meaning that I don't have a problem with it, not that I consider this extremely important. darkweasel94 00:47, 30 March 2014 (UTC)
- Sounds great. I always wondered why as a license reviewer I can only transfer files from Flickr (using UploadWizard) and not from other sources. A first, basic step towards really being able to "Upload files from a URL (upload_by_url)" as stated in Special:ListGroupRights. FDMS 4 16:15, 8 April 2014 (UTC)
- Per McZusatz. Long overdue. I'd like to see some kind of right to transfer files from WMF wikis, granted either at a certain edit count here or to license reviewers. It takes a little bit of knowledge of how Commons works to be able to transfer files without running into a copyvio. :) --Hedwig in Washington (mail?) 18:07, 12 April 2014 (UTC)
- Of course! --Zhuyifei1999 (talk) 09:15, 14 April 2014 (UTC)
- No real downside I can see, if the devs can be persuaded to support this. --MichaelMaggs (talk) 13:14, 25 April 2014 (UTC)
- It makes sense to be able to transfer more directly from other Wikimedia projects. Green Giant supports NonFreeWiki (talk) 03:15, 1 May 2014 (UTC)
- Way long overdue...--Stemoc (talk) 11:29, 12 May 2014 (UTC)
- Yes, please! Being on a slow connection, I am always reluctant to transfer files from sister projects to Commons. Regards, --ChrisiPK (Talk|Contribs) 11:32, 12 May 2014 (UTC)
- Clockery Fairfeld who, me? 13:06, 12 May 2014 (UTC)
- Raymond 14:10, 12 May 2014 (UTC)
- Jmabel ! talk 15:30, 12 May 2014 (UTC)
- Support. Current tools create quite a mess with the descriptions, sometimes missing important details like license or author, and often flooding problem categories. Such a tool is long overdue. I hope that it will be able to preserve the edit history and map templates found on other Wikipedias to their Commons equivalents (not an easy task). --Jarekt (talk) 18:04, 12 May 2014 (UTC)
- Strong support. If this could be implemented, a lot of extra workload would be history. Great idea! --Hedwig in Washington (mail?) 23:03, 12 May 2014 (UTC)
- Support. See comment below. --Wdwd (talk) 13:46, 13 May 2014 (UTC)
- Support. Sounds like the cleaner way to do this, yes. -- Tuválkin ✉ 07:29, 16 May 2014 (UTC)
- Support. --Túrelio (talk) 08:40, 16 May 2014 (UTC)
- This is a good idea, but attribution and file history information must be preserved during transfer, either with a specialized template (like FTCG has) or by importing the revision history. Sven Manguard Wha? 17:07, 16 May 2014 (UTC)
- It's not just revision history that must be transferred. Note that edit history and upload history are stored in two separate tables, e.g. https://commons.wikimedia.org/wiki/File:Example.jpg#filehistory and https://commons.wikimedia.org/wiki/File:Example.jpg?action=history represent two distinct tables to transfer. TeleComNasSprVen (talk) 23:19, 16 May 2014 (UTC)
- Sure. But this will be the job of the tool authors. I apologize if the RfC wasn't clear about that before (it should be now), but it is not about implementing tools that transfer files, e.g. by making use of upload_by_url in future. If you would like a discussion about that (including, of course, existing tools that transfer files, as this would be closely related), please start a separate RfC. Thanks in advance. -- Rillke(q?) 20:41, 17 May 2014 (UTC)
- Support User:Armbrust (Local talk - en.Wikipedia talk) 22:15, 18 May 2014 (UTC)
Oppose
I oppose allowing copy uploads of files from Wikimedia Foundation sites to Wikimedia Commons
Note that despite consensus, we might not get this feature. The Wikimedia servers would connect to themselves and transfer the files. The server operators consider that an issue, overlooking that having to download the file over a slow internet connection and upload it over an even slower one is a far more serious issue, not to mention bandwidth costs or contributors with limited transfer volume. -- Rillke(q?) 15:36, 29 March 2014 (UTC)
- Question: I do not understand how this would save bandwidth for users, or how it would make anything easier for people with slow web connections (actually, just admins, license reviewers and GWToolset users, a small minority of users). My naive understanding is that currently the WMF Labs server downloads the file from the other project, then uploads it to Commons, so the file is never transferred from or to the user's computer. Do I have that right, or is my understanding wrong? darkweasel94 18:13, 29 March 2014 (UTC)
- Well, for contributors like you, it would be easy to request license reviewer status, stating that you want to use it for transferring files from WMF wikis. ForTheCommonGood is not running on WMF Labs; it's a .NET application that downloads to and uploads from the user's computer. Do you want to depend forever on WMF Labs for transferring files? What happens if it's down, or if the maintainer retires? IIRC, some people at WMF wanted to give the upload-by-url right to more people by changing the user group rights, but I would oppose anything in this direction. Transferred files have to be reviewed by knowledgeable and trusted community members. When I think about who is transferring files, Stefan4 and Fæ come to mind, and both have license reviewer status. Do you know anyone who transfers files and would be unlikely to get license reviewer status on request? -- Rillke(q?) 18:46, 29 March 2014 (UTC)
- OK, for people who use tools other than CommonsHelper to transfer files, this might make sense, granted. But as an entirely tangential issue, if a file is problematic, I would say it's problematic no matter whether the user uploaded it directly from another website or first downloaded it from there and then uploaded it, so I don't really get the point of the restriction to these user groups. And I don't think we've ever required license reviews for files transferred from Wikimedia projects, have we? darkweasel94 20:32, 29 March 2014 (UTC)
- Unfortunately, everyone can use CommonsHelper. I don't know how many of its uploads I have cleaned up in the past, or how often I went to other projects asking them to restore a file so I could add information that was lost during the transfer, but I remember that I did, and that it was often neither pleasant nor an efficient use of time. On the other hand, if CommonsHelper were restricted, users would download the files and simply upload them (this still happens), forgetting to mention that they are not their own work but copies from some other wiki. So why restrict upload-by-url? Because it is extremely efficient and fast. One can queue multiple uploads (using the async option), so one can easily transfer thousands of files a day. And if something like that is not properly done or planned, we end up with a mess. Granting the GWToolset right is now restricted to bureaucrats, and it has very similar capabilities. This is why we have this RfC at all, I suppose. Note that Tomasz W. Kozlowski is one of our crats. Also note that I could, in theory, buy a super-fast internet connection, create a sock and flood Commons with copies from any site I like. So why is the GWToolset right restricted? Ask odder, not me, please. -- Rillke(q?) 22:32, 29 March 2014 (UTC)
- Given that the vast majority of images on enwp seem to be < 4 MB, I doubt bandwidth is *that* much of an issue, unless you're exclusively targeting large files :o -FASTILY 22:59, 29 March 2014 (UTC)
- I don't know where you live, but where I do, they usually sell you about 0.7 MiB/s download and about 100 KiB/s upload, which is sufficient for most web applications but not if you're uploading stuff. And meeting people from Kenya, I saw they're still using dial-up connections… -- Rillke(q?) 23:24, 29 March 2014 (UTC)
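The figures quoted in the thread above can be combined into a rough estimate of how long a contributor's connection is tied up by the temporary-download procedure. This Python sketch assumes a 4 MiB file (reading "4 MB" as 4 MiB is an assumption) and the 0.7 MiB/s down / 100 KiB/s up connection mentioned above, and ignores protocol overhead and latency; relay_seconds is a hypothetical helper.

```python
MIB = 1024 * 1024  # bytes per mebibyte
KIB = 1024         # bytes per kibibyte


def relay_seconds(file_bytes: float, down_bytes_per_s: float,
                  up_bytes_per_s: float) -> float:
    """Seconds the contributor's line is busy for one temporary-download
    transfer: the file is downloaded once and then uploaded once."""
    return file_bytes / down_bytes_per_s + file_bytes / up_bytes_per_s


# Assumed figures: a 4 MiB file over a 0.7 MiB/s down, 100 KiB/s up line.
file_size = 4 * MIB
seconds = relay_seconds(file_size, down_bytes_per_s=0.7 * MIB, up_bytes_per_s=100 * KIB)
print(f"{seconds:.1f} s per file")  # 46.7 s per file
```

Most of that time (about 41 s) is the upload leg. With upload-by-url, the contributor only sends a short API request and the WMF servers fetch the file themselves, so the per-file cost on the contributor's side no longer scales with file size.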
See bugzilla:42473 for the reason why this hasn't already been done. I replied to Faidon's refusal to enable this in comment 19 on that bug. No further progress has been made since then, aside from someone else filing the same bug at bugzilla:62820. This, that and the other (talk) 05:29, 20 April 2014 (UTC)
Panoramio
What about Panoramio? We have a review bot now. -- Rillke(q?) 14:21, 8 April 2014 (UTC)
- Is there a reason that we are whitelisting instead of blacklisting, considering that only trusted users are allowed to use this feature? --McZusatz (talk) 17:20, 8 April 2014 (UTC)
- I don't know one. -- Rillke(q?) 02:26, 9 April 2014 (UTC)
- The possibility of transferring pictures from Panoramio to Commons would be a great thing. I found over 1,500 high-quality files at panoramio.com that could illustrate many Wikipedia articles. The pictures are provided under a free license, with accurate coordinates and useful titles. Instead of laboriously transferring a few files by hand whenever they are needed in articles, a batch transfer would make it easier to use these valuable pictures. --Pustekuchen2014 (talk) 22:28, 12 April 2014 (UTC)
- Filed the "whitelist thing" as bugzilla:63961. --McZusatz (talk) 19:45, 15 April 2014 (UTC)
- Panoramio upload enabled through gerrit:126384. --McZusatz (talk) 10:01, 20 April 2014 (UTC)
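The whitelist-versus-blacklist question raised above can be made concrete with a small Python sketch of a domain check for copy uploads. It is only loosely modelled on the idea behind MediaWiki's $wgCopyUploadsDomains setting (a list of allowed domains, where a leading "*." is taken here to mean any subdomain); the matching rules and function names are assumptions rather than MediaWiki's actual code, and the listed domains are illustrative.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def host_matches(host: str, pattern: str) -> bool:
    """Exact match, or '*.example.org' matching any subdomain of example.org
    (but not example.org itself) under these assumed rules."""
    host, pattern = host.lower(), pattern.lower()
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])  # pattern[1:] keeps the dot: ".example.org"
    return host == pattern


def _host(url: str) -> Optional[str]:
    """Host of an http(s) URL, or None for any other scheme."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return None
    return parts.hostname


def allowed_by_whitelist(url: str, whitelist: Iterable[str]) -> bool:
    """Whitelist: refuse everything except the listed domains."""
    host = _host(url)
    return host is not None and any(host_matches(host, p) for p in whitelist)


def allowed_by_blacklist(url: str, blacklist: Iterable[str]) -> bool:
    """Blacklist: accept everything except the listed domains."""
    host = _host(url)
    return host is not None and not any(host_matches(host, p) for p in blacklist)


WHITELIST = ["*.staticflickr.com", "*.panoramio.com"]  # illustrative entries only
BLACKLIST = ["*.example.net"]                           # hypothetical blocked source

url = "https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg"
print(allowed_by_whitelist(url, WHITELIST))                             # False
print(allowed_by_whitelist(url, WHITELIST + ["upload.wikimedia.org"]))  # True
print(allowed_by_blacklist(url, BLACKLIST))                             # True
```

With a whitelist, every new source needs a configuration change, as happened for Panoramio with gerrit:126384; with a blacklist, a trusted uploader could fetch from any host that is not explicitly blocked.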
Flickr upload tools and user rights
Hi, I very often use https://toolserver.org/~bryan/flickr/upload for Flickr uploads (until the Toolserver gets killed this year). Regarding Commons:Upload Wizard/Flickr: "The Upload Wizard can upload files directly from Flickr. This feature was deployed on Wikimedia Commons in December 2012, accessible to administrators and image-reviewers only for the testing phase." So I can't use this. To become an "Image-reviewer", I would need to request this right and be approved. I'm not a newbie, and I was given "autopatrol" rights last year. I can and do use bryan/flickr/upload (again: until the Toolserver gets killed this year), but apparently not the new Upload Wizard feature. There are only 203 Commons reviewers in total, but currently 3,484 autopatrollers. If there is some wisdom in this, I don't get it. --Atlasowa (talk) 21:36, 12 April 2014 (UTC)
- Autopatrol rights are granted at an admin's discretion, while becoming an image reviewer requires a small election. If you would like to see the upload_by_url right also attached to autopatrollers (which seems plausible to me), please start a discussion about it, preferably a new RfC. Thanks for bringing this up. -- Rillke(q?) 09:03, 13 April 2014 (UTC)
- Being an autopatroller means that you are likely not a vandal, which in my eyes has nothing special to do with the upload_by_url right. In contrast, being a license reviewer means that you are (very) familiar with copyright, which in my eyes is indeed a requirement for being able to upload files from Flickr the fast way. It is not that difficult to become a license reviewer (some users even did without any support votes), so when you feel ready, please apply for the user right. FDMS 4 18:52, 13 April 2014 (UTC)
- I think that UploadWizard feature is problematic anyway. It adds a template "license verified by UploadWizard", but there is no reliable way to see if UploadWizard was even actually used. Everybody can easily and perfectly fake a description page so that it looks as if it had been uploaded with UploadWizard and the license verified by it. I think this is a good reason for making it available only to license reviewers and admins, because such people are presumably trusted enough not to do such things. I think UploadWizard should add a simple {{Flickrreview}} tag, as Flickr2Commons does, and then I see no reason not to allow everybody to use it: it does the same thing as Flickr2Commons (I don't know how it compares in terms of features because I can use only F2C), so there's no reason not to allow it to the same people, i.e. everyone. darkweasel94 19:12, 13 April 2014 (UTC)
- Is there an easy way to make the initial upload edit summary say "User created page with UploadWizard" (without actually using it)? FDMS 4 00:43, 14 April 2014 (UTC)
- UploadWizard is hackable on the client side as far as I know (the script should move to the server side), so it can be made to always pass reviews. As for the edit summary, Special:ApiSandbox is likely the easiest way. --Zhuyifei1999 (talk) 09:29, 14 April 2014 (UTC)
- (Edit conflict) Depends on whether you consider use of a bot framework easy. With a bot framework it's trivial to upload files with any edit summary, including this one. It doesn't matter, though, whether it's easy (that depends on your skills, obviously); the problem is that it's feasible, and therefore the edit summary isn't a reliable way to see if UploadWizard was actually used. It's important to be thinking about what the most malicious person would do, not that "nobody's going to be such an asshole anyway". Therefore I think there should be an independent license check. darkweasel94 09:33, 14 April 2014 (UTC)
- No, I do not :). Still, I like the idea of this UploadWizard feature, so what about adding a review ID and logs assigned to it? Can we be 100% sure that our license review bots are not intentionally making unfree files pass LR? FDMS 4 11:48, 14 April 2014 (UTC)
- I think bot operators who operate license review bots can be sufficiently trusted; if such a thing ever does happen, we'll certainly have problems going far beyond UploadWizard. Certainly they are more trustworthy than a client-side JavaScript program that doesn't even leave reliable traces of having been used. darkweasel94 12:22, 14 April 2014 (UTC)
- (Edit conflict) If you want to verify the ones done by my bots, the scripts are open source and run on Tool Labs at:
- FlickreviewR 2: /data/project/yifeibot/o/toolserver/bryan/flickr/bots/flickreviewr.py (main script; an up-to-date copy should be at User:FlickreviewR 2/flickreviewr.py)
- Picasa Review Bot 2: /data/project/yifeibot/o/prb/Picasa-Review-Bot/Program.cs (main script, not compiled) and /data/project/yifeibot/o/prb/Picasa-Review-Bot/bin/Debug/PicasaReview.exe (running binary)
- Panoramio Review Bot: /data/project/yifeibot/panrb/panrb.py (main script; an up-to-date copy should be at User:Panoramio Review Bot/panrb.py)
- Feel free to nano them if you have an account ;) Disclaimer: I cannot guarantee that none of the very trusted root users ever edits the files. --Zhuyifei1999 (talk) 12:30, 14 April 2014 (UTC)
- I do trust Zhuyifei a lot (definitely not only because he is a bot operator and a trusted user per UserRightsLog), but if all this is about trust, I am afraid I do not understand darkweasel's concerns correctly. @Zhuyifei1999: Thank you very much for your explanation, but … where exactly? Open source is fine (in this case), but is there a version history? FDMS 4 17:56, 14 April 2014 (UTC)
- AFAIK we don't generally consider it best practice even for license reviewers to review their own files. But if you read carefully, I was talking mainly about extending the ability to use that UploadWizard feature to people who aren't license reviewers. That should be done only if it adds a simple flickrreview tag like F2C does, and it's better if it does that for everyone. darkweasel94 21:19, 14 April 2014 (UTC)
- License reviewers are not allowed to review their own uploads, but in this case it is the (currently, unfortunately, manipulable) UploadWizard reviewing Flickr files before uploading them on the user's behalf. I was referring to "I think that UploadWizard feature is problematic anyway" and "it's important to be thinking about what the most malicious person would do". However, now that I think about it, I find the current situation quite ridiculous. Currently, new users can upload any file they find on the internet and then enter something cryptic as the source. For this reason, there are thousands of files every week that have to be deleted as "No source since". In contrast, I, as a license reviewer, can only upload files from Flickr after UploadWizard has reviewed them. So why don't we give upload_by_url permission to every user and switch from a whitelist to a blacklist?
- Perfect Commons: If uploads are claimed to be in the public domain, UploadWizard would tag them with {{PDreview}}. If there is no such claim, UploadWizard would …
- only allow uploads from hard drives if files are either own work (UploadWizard would then search for identical files online, the same way TinEye and Google do; own-work uploads by new users would be tagged with {{OWreview}}), have a valid or pending OTRS ticket, or are a derivative work of a freely licensed file already available on Commons.
- Require users to provide a full URL of the file to upload and either …
- the reference URL of a site showing the file and stating it can be reused under the terms of a free license.
- the URL of a site showing the file and the reference URL of a site linked to from the site showing the file stating it can be reused under the terms of a free license.
- UploadWizard would then …
- check whether the source is blacklisted (Bing/Google Images, celeb news websites, …).
- if it is a new user (anyone without any additional user right except autopatrolled) …
- and if the source is a website such as Flickr or Panoramio, where UploadWizard can easily find out the license status, disallow the upload of ARR, NC, ND and similar files.
- and if it is not a source such as Flickr or Panoramio, search for terms such as CC BY and links to a Creative Commons license page; if none are found, these users would have to make a request on a Commons page similar to en:WP:AFC and wait for license reviewers to either perform or decline it.
- add a {{LicenseReview}} tag.
- In the end, we would have a category each for pending PD reviews (same number of files as today), license reviews (for files from external sources that have passed all UploadWizard checks; (unfortunately) a very low number) and own-work reviews (for uploads from new users that might be (obviously) derived from offline sources; quite a high number). Basically, the only copyvios outside one of these categories would be FOP cases and those with a false claim of ownership we are never going to find out about anyway (e.g. a series of holiday pictures with one image shot by a friend).
- Of course, a lot about UploadWizard would have to change, not only its susceptibility to manipulation.
- I guess these suggestions are perfectly within the scope of this RFC ;). FDMS 4 00:32, 15 April 2014 (UTC)
- Yes, that's perfect for LRing. But will the backlog be as huge as CAT:UNCAT? As for version history, I'm now setting up a gerrit repo for transparency ;) --Zhuyifei1999 (talk) 10:08, 15 April 2014 (UTC)
- UPDATE: http://tools.wmflabs.org/yifeibot/gitweb/?p=botscripts.git;a=summary --Zhuyifei1999 (talk) 12:00, 15 April 2014 (UTC)
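FDMS 4's "Perfect Commons" proposal in this thread is essentially a decision procedure, so it can be sketched in Python for discussion. Everything here is hypothetical: the boolean flags stand in for checks UploadWizard would have to perform (reverse image search, license detection on Flickr or Panoramio), the blacklisted hosts are placeholders, and the outcome names "upload", "reject" and "request" are invented; only the review tags come from the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Placeholder examples of blacklisted sources ("Bing/Google Images, celeb news websites, ...").
BLACKLISTED_HOSTS = {"images.google.com", "www.bing.com"}


@dataclass
class UploadAttempt:
    file_url: Optional[str] = None          # None means an upload from the hard drive
    reference_url: Optional[str] = None     # page stating the free license
    new_user: bool = True                   # no user rights beyond autopatrolled
    claims_public_domain: bool = False
    own_work: bool = False
    otrs_ticket: bool = False
    derivative_of_commons_file: bool = False
    machine_readable_source: bool = False   # e.g. Flickr or Panoramio
    restrictive_license: bool = False       # ARR, NC, ND, ... as reported by the source
    free_license_text_found: bool = False   # "CC BY" or a link to a CC license page


def decide(a: UploadAttempt) -> Tuple[str, List[str]]:
    """Return (outcome, review tags); outcome is 'upload', 'reject' or 'request'."""
    if a.claims_public_domain:
        return "upload", ["{{PDreview}}"]

    if a.file_url is None:  # upload from the hard drive
        if a.own_work:
            # a reverse image search would run here; only new users get the review tag
            return "upload", (["{{OWreview}}"] if a.new_user else [])
        if a.otrs_ticket or a.derivative_of_commons_file:
            return "upload", []
        return "reject", []

    if a.reference_url is None:  # a URL upload must name a license page
        return "reject", []
    hosts = {urlparse(u).hostname for u in (a.file_url, a.reference_url)}
    if hosts & BLACKLISTED_HOSTS:
        return "reject", []

    if a.new_user:
        if a.machine_readable_source and a.restrictive_license:
            return "reject", []
        if not a.machine_readable_source and not a.free_license_text_found:
            return "request", []  # AFC-like queue handled by license reviewers
    return "upload", ["{{LicenseReview}}"]
```

Writing the proposal out this way makes one design choice visible: only new users ever reach the "request" queue, while established users' URL uploads always go through, tagged with {{LicenseReview}} for a later human check.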
Security
Is there any hope of fixing the HTTPS problem soon? Are there any other security issues that could arise from allowing the direct transfer? --Hedwig in Washington (mail?) 23:11, 12 May 2014 (UTC)
- Huh, which HTTPS problem? -- Rillke(q?) 07:23, 13 May 2014 (UTC)
- No HTTPS possible; see Bugzilla bug 42473, comment 6. --Hedwig in Washington (mail?) 00:58, 15 May 2014 (UTC)
- I think it's able to fetch from HTTPS; at least one can specify HTTPS Flickr URLs, AFAIK. -- Rillke(q?) 05:27, 15 May 2014 (UTC)
- OK, I trust you guys know whatcha doing. Just wanted to clarify so we don't get rear-ended later on. --Hedwig in Washington (mail?) 22:04, 15 May 2014 (UTC)
Missing file history
The API function upload-by-url will not address the problem of the missing history. Any upload, even with upload-by-url, is technically a new upload: the original history is missing, which is a potential license violation. So the upload log from the source wiki must be added to the text field/description, either manually or by a tool (as CommonsHelper does).
In my opinion it would be much more convenient to implement an import function for files in the MediaWiki software (on the server side). This would make it possible to import all revisions from a source like Wikipedia to Commons without the need for any file transfer, and we could preserve the original history with all file revisions and all text/metadata revisions. After the file import, some cleanup would be needed, such as translating some templates to Commons template names. This can be done by a simple client-side JavaScript, a bot script, or manually in the web browser. --Wdwd (talk) 13:46, 13 May 2014 (UTC)
- Missing file history is a problem, especially when a transfer bot "loses" much important information. File transfer should be done by import, not by a new re-upload. --Jarekt (talk) 13:55, 13 May 2014 (UTC)
- There is an extension, mw:Extension:Push ("Lightweight extension to push content to other wikis"), which can push files too. I have neither tested it, nor do I know whether it would fulfil all of our needs. It has not been reviewed for security and performance by the WMF, so it could take a long time to get it enabled here. Raymond 15:20, 13 May 2014 (UTC)
- @Raymond: But it's worth a try. The authentication section does not mention CentralAuth. Any idea whether it would work out of the box for the WM wiki farm? -- Rillke(q?) 21:00, 17 May 2014 (UTC)
- @Rillke: Sorry, no idea :-( And currently no time to test the extension :-( Raymond 06:31, 19 May 2014 (UTC)
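Wdwd's point that the source wiki's upload log must be carried over into the Commons description can be sketched in Python. This assumes the upload history was fetched with the MediaWiki API (action=query&prop=imageinfo&iiprop=timestamp|user|size|comment&iilimit=max) and that "w:en" is a suitable interwiki prefix back to the source wiki; format_upload_log is a hypothetical helper, and the table layout is illustrative rather than the format of any existing Commons template.

```python
from typing import Dict, List


def format_upload_log(interwiki_prefix: str, imageinfo: List[Dict]) -> str:
    """Render a source wiki's upload history as a wikitext table.

    interwiki_prefix: prefix linking back to the source wiki, e.g. "w:en" (assumption).
    imageinfo: entries shaped like prop=imageinfo output requested with
               iiprop=timestamp|user|size|comment.
    """
    rows = ['{| class="wikitable"', "! Date/Time !! User !! Size !! Comment"]
    for info in imageinfo:
        user = info.get("user", "")
        # A bare "|" in an upload comment could break the table row, so escape it.
        comment = info.get("comment", "").replace("|", "&#124;")
        rows.append("|-")
        rows.append(
            f"| {info['timestamp']} || [[:{interwiki_prefix}:User:{user}|{user}]]"
            f" || {info['size']} bytes || {comment}"
        )
    rows.append("|}")
    return "\n".join(rows)
```

This covers only the upload (file) history; as TeleComNasSprVen notes, the description page's edit history is a separate table and would need to be fetched and recorded separately (or imported server side, as Wdwd suggests).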
Cons added by Sven Manguard
Let me address the cons here, please:
- Additional hardware/software needs to be configured/created on the WMF side (cf. bugzilla:42473)
- A proxy config must be changed. This means removing one line of text. That's presumably all. But OPS does not want this because they possibly fear being subject to mockery: "Hey look at Wikimedia; they are copying files by copying them through their proxies." Still, this solution would be cheaper than the current one, where external servers have to download from and upload to WMF servers. -- Rillke(q?) 20:17, 17 May 2014 (UTC)
- OPS wants to be able to copy the files in the storage backend directly. This may cause additional development cost. -- Rillke(q?) 20:17, 17 May 2014 (UTC)
- If file transfer is done by reupload, rather than import, attribution (and possibly other) information will be lost.
- It's the same issue as with current tools. If UploadWizard would not import attribution from Flickr properly, it would be lost. If CommonsHelper would not copy these information from Wikipedia, it would be lost ... -- Rillke(q?) 20:17, 17 May 2014 (UTC)[reply]
- Standard file transferring template-based problems (templates used on other projects but not Commons not appeaing, templates that appear differently on Commons than on other projects, etc.)
- CommonsHelper and ForTheCommonGood have the same issue. I don't understand how this points relates to activating upload-by-url for WMF projects. -- Rillke(q?) 20:17, 17 May 2014 (UTC)[reply]
To conclude, I think, the cons provided are irrelevant to the topic. -- Rillke(q?) 20:17, 17 May 2014 (UTC)[reply]
- @Rillke - I have a concern. There seems to be a lack of non-local (e.g. enwp) admin interest in deleting files transferred to Commons. For example, the category for files transferred to Commons, here & here, shows a massive backlog. If the few interested non-local admins are already overwhelmed, implementing this feature would surely bury them. Honestly, it almost seems like a waste of effort to transfer files to Commons if admin neglect means that transwikied files will remain hosted on local projects anyway. -FASTILY 07:10, 20 May 2014 (UTC)
- How would it be a waste of effort if a file becomes usable by more than 100 projects this way, when before it was usable by only one? -- Rillke(q?) 07:36, 20 May 2014 (UTC)
- Well, obviously the long-term benefits outweigh any cons, but I am wary of deliberately creating a mountainous backlog nobody wants to work on. -FASTILY 07:42, 20 May 2014 (UTC)
- Then we should ask for any tool that makes it easier to be shut down, or restricted to administrators on the "source" project ;-) -- Rillke(q?) 07:55, 20 May 2014 (UTC)
- Fastily and Rillke: Part of the reason the categories Fastily pointed out are so large is that some of those files are also in the 4,000-item Category:Wikipedia files on Wikimedia Commons for which a local copy has been requested to be kept. A number of editors are under the impression that, if a file is sent to Commons, it will be deleted without a good reason, by an admin who doesn't know what they're doing, and with no effort to let the local projects know about the deletion discussion (if it goes through that process). Some people proselytize that view on their user pages, and while it's not exactly pervasive, I still come across it often enough. While the first belief is right sometimes, and the second only rarely, the third is right most of the time. Commons is very bad at letting people know that the files they have uploaded are being considered for deletion. The URAA mass deletions, and deletions for reasons most people know nothing about (like freedom of panorama), also hurt Commons' cause. There's no good answer for this, but it is certainly a part of those backlogs.
- That, and there aren't many English Wikipedia admins who work on files. If I were an admin, I'd be happy to work on it. I'm not, though, and I have no intention of running any time soon. Sven Manguard Wha? 16:03, 21 May 2014 (UTC)
How is this being done at present?
Please excuse my ignorance, but how are files presently being transferred from other WMF projects, and by whom? I ask because I recently nominated for deletion an obvious copyvio (scanned from a published map) that had evidently been moved here from a Wikipedia site (another instance having previously been deleted from enWP), and was surprised that nobody had questioned the "own work" claim on it. OTOH I "human-reviewed" an enWP photo for transfer here quite some time ago, but it's still sitting in a holding category along with thousands more such images.—Odysseus1479 (talk) 20:47, 18 May 2014 (UTC)
- As far as I know, Commons-reviewing files on en.WP only makes sense if you don't have enough time to move them yourself or are blocked on Commons, so consider doing it yourself instead, using either a standard upload form or a tool. (There is no special user group for transfers from Wikimedia projects.) FDMS 4 21:01, 18 May 2014 (UTC)
- Thanks; how does one do that while preserving the file history? I could download and re-upload the file, and copy-paste the description, author, &c., but wouldn't that break the attribution to the original uploader and the older versions (if any)? Do the tools you mention handle this, and where can they be found?—Odysseus1479 (talk) 21:15, 18 May 2014 (UTC)
- In my opinion, no, because reusers, for example in a newspaper, wouldn't print the file history either :) . But it is Commons practice to copy the file history (not the version history) using the {{Original upload log}} and {{Original description page}} templates. You can find an example of how I've done that here (scroll down a bit). FDMS 4 21:24, 18 May 2014 (UTC)
- Tools. (However, attribution is not always correct with these tools either.) FDMS 4 21:26, 18 May 2014 (UTC)
- Thanks again; I just tried out CommonsHelper, and it does use the above templates; all I had to do was add categories. Sorry for the digression, but I'm in a much better position now to comment on the topic at hand.—Odysseus1479 (talk) 21:56, 18 May 2014 (UTC)
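The upload-log convention FDMS 4 describes can be sketched in a few lines. This is a minimal illustration, assuming the revision rows have already been fetched from the source wiki (for example via the API's prop=imageinfo with iiprop=timestamp|user|comment|size). The `FileRevision` class and `original_upload_log` function are hypothetical, and the template parameters and table layout are assumptions, not the exact output of CommonsHelper or For the Common Good.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileRevision:
    """One row of the source wiki's file history (hypothetical structure)."""
    timestamp: str  # e.g. "2008-05-01T12:00:00Z"
    width: int
    height: int
    user: str
    comment: str


def original_upload_log(lang: str, project: str, title: str,
                        revisions: List[FileRevision]) -> str:
    """Render an upload-log section for the Commons description page.

    The template parameters and the table layout are assumptions.
    """
    lines = [
        "== {{Original upload log}} ==",
        # Assumed parameters: "<lang>.<project>" and the source page title.
        "{{Original description page|" + f"{lang}.{project}|{title}" + "}}",
        '{| class="wikitable"',
        "! Date/Time !! Dimensions !! User !! Comment",
    ]
    for rev in revisions:
        # "[[:en:User:X|X]]" only resolves for Wikipedia sources; other
        # projects would need a different interwiki prefix.
        user_link = f"[[:{lang}:User:{rev.user}|{rev.user}]]"
        lines.append("|-")
        lines.append(f"| {rev.timestamp} || {rev.width} × {rev.height} || "
                     f"{user_link} || <nowiki>{rev.comment}</nowiki>")
    lines.append("|}")
    return "\n".join(lines)
```

Wrapping each upload comment in `<nowiki>` keeps stray wikitext in old comments from rendering on Commons.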
- Odysseus1479 - If you have an OS that can run it, I recommend the tool For The Common Good. It makes transfers much easier. As to your other questions: anyone can do transfers, but most people don't, because each one takes a long time and preferably requires enough knowledge of image policy and copyright law to make sure that only the things that should get transferred do get transferred. People who tag files for transfer are hoping that someone else will put in the time to do the transfers, or that a bot will come by and do it while no one is looking (the last time the idea of bot transfers was raised, it got a cool reception, because a bot would have no idea whether the files it was transferring were legitimate or copyvios). Sven Manguard Wha? 16:15, 21 May 2014 (UTC)
- @Sven Manguard - I agree, FTCG is tedious as hell. Here's my semi-automated solution to said tediousness: Commons:CommonsMover. I've been developing it for some time now, and just finished it up yesterday. I haven't widely publicized it, because I want to be certain that it'll work as promised in that eclectic zoo of transfer candidates. I am looking for beta testers, though, so if you know anyone who'd be interested, could you send them my way? Cheers, FASTILY 22:34, 21 May 2014 (UTC)
- I'm not sure FTCG is tedious, so much as transferring itself is (checking the license, moving it over, adding categories, repeat). FTCG is actually a godsend compared to the old tools, because it actually transfers all of the information over correctly, as opposed to the old Magnus tool, which... well... usually didn't. Sven Manguard Wha? 01:04, 22 May 2014 (UTC)
- How does Special:Import work with regard to files? TeleComNasSprVen (talk) 06:10, 22 May 2014 (UTC)
- It only imports the page revision history. Files have to be uploaded separately. --Zhuyifei1999 (talk) 09:46, 22 May 2014 (UTC)
Loss (sometimes total) of metadata
One of my biggest concerns is the loss of metadata. See recent examples in the section below. Those files lost all the metadata that was stored on the file description page. The only things left are the license, the original uploader and the original upload date, which do not come from the file description page. Recovery would be simpler if we preserved the file history. --Jarekt (talk) 12:51, 22 May 2014 (UTC)
- Example files which lost metadata in the transfer
- File:WIKIPEDIA ARTICLE - ABIOGENESIS (Part 01).ogg
- File:EARTH - WIKIPEDIA SPOKEN ARTICLE (Part 01).ogg
- File:WIKIPEDIA ARTICLE - ABIOGENESIS (Part 02).ogg
- File:WIKIPEDIA ARTICLE - ABIOGENESIS (Part 04).ogg
- File:WIKIPEDIA ARTICLE EARTH (Part 04).ogg
- File:EARTH - WIKIPEDIA SPOKEN ARTICLE (Part 02).ogg
- File:HISTORY OF THE EARTH - WIKIPEDIA SPOKEN ARTICLE (Part 01).ogg
- File:WIKIPEDIA ARTICLE - ABIOGENESIS (Part 03).ogg
- File:EARTH - WIKIPEDIA SPOKEN ARTICLE (Part 03).ogg
- File:EARTH - WIKIPEDIA SPOKEN ARTICLE (Part 04).ogg
- File:HISTORY OF THE EARTH - WIKIPEDIA SPOKEN ARTICLE (Part 02).ogg
- File:HISTORY OF THE EARTH - WIKIPEDIA SPOKEN ARTICLE (Part 03).ogg
- File:WIKPEDIA SPOKEN ARTICLE - Universe (Part 1).ogg
- File:WIKPEDIA SPOKEN ARTICLE - Universe (Part 4).ogg
- File:WIKPEDIA SPOKEN ARTICLE - Universe (Part 2).ogg
More files like that might be found at User:Jarekt/b, in case someone would like to help restore the metadata. In some cases, access to deleted metadata on Wikipedia might be useful. --Jarekt (talk) 13:33, 22 May 2014 (UTC)
- First of all, while they are technically "metadata", what you're talking about and what most people on Commons think of when you say metadata are entirely different things. You are complaining that information from the file description page was lost during the transfer. On Commons, metadata is generally taken to mean the information embedded in the file itself, such as EXIF data, which appears under the "Metadata" header on file description pages.
- The loss of information from the {{Information}} template (or attribution/date/description contained outside of that template) happens regularly with every file transfer tool except For the Common Good. Even with that tool, which is by far the best that we have, information doesn't always transfer over smoothly. If the source project uses a template that doesn't exist on Commons (as is the case with your spoken Wikipedia recordings), the programs can't figure out where to put it. Different tools handle this differently. CommonsHelper just unceremoniously drops the information and inserts useless information in its place. It looks like Fastily's new program drops the information too. FTCG can miss sometimes, but seems to lose information a lot less often. Sven Manguard Wha? 18:02, 22 May 2014 (UTC)
- I was talking about "metadata which was stored in the file description page", but you are right that EXIF is also part of the metadata. I agree that if a file uses some exotic infobox template unknown to Commons, then we might run into problems, but {{Spoken article entry}} (which I had never heard of before) has been around since 2008. Unfortunately, what often happens is that a file is transferred without any description other than a little bit in the Original upload log section, then the original is deleted, and now you need a Wikipedia admin to look up what the description of the file was. You can do that for a few files, but with a large number of files, as in this case, it becomes quite a hassle. I do not think this is a show stopper, but maybe the code needs to be tweaked somehow to at least recognize when things go wrong like that. But then again, that might be more of a problem with User:Fastily's script than with the proposed extension to UploadWizard. --Jarekt (talk) 18:29, 22 May 2014 (UTC)
- My tool interfaces with CommonsHelper to generate file description pages. If CommonsHelper isn't generating the correct output, then my tool won't either. Anyway, I just synced all the file description pages linked above with their enwp counterparts. -FASTILY 21:29, 22 May 2014 (UTC)
- Fastily, is there any way that your tool could rip the brains/logic out of For The Common Good instead of CommonsHelper? FTCG seems to do a better job of not losing information than CommonsHelper does. Sven Manguard Wha? 23:20, 22 May 2014 (UTC)
- FTCG is a client-side C# program and my tool is a client-side Java program. If I were to implement the logic behind FTCG, I would have to do it from scratch, and given the amount and varying degree of parsing involved, it would not be an easy job. -FASTILY 07:09, 23 May 2014 (UTC)
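Jarekt's suggestion that tools should "at least recognize when things go wrong" could look roughly like the check below: before transferring, list every template on the source description page that the tool has no rule for, and warn instead of silently dropping it. This is a minimal sketch; `unhandled_templates` and the `known` set are hypothetical, and the regular expression is a rough approximation rather than a real wikitext parser (it also picks up names of nested templates).

```python
import re
from typing import List, Set

# Captures the name in "{{Name|..." or "{{Name}}"; not anchored to the top
# level, so templates nested inside parameters are found as well.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}\n]+?)\s*(?:\||\}\})")


def normalize(name: str) -> str:
    # MediaWiki treats underscores as spaces and, on most wikis, the first
    # letter of a title as case-insensitive.
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def unhandled_templates(wikitext: str, known: Set[str]) -> List[str]:
    """Return template names the transfer rules do not cover, in first-seen order."""
    missing: List[str] = []
    for match in TEMPLATE_NAME.finditer(wikitext):
        name = normalize(match.group(1))
        if name not in known and name not in missing:
            missing.append(name)
    return missing
```

For the spoken-article files listed above, such a check would have flagged {{Spoken article entry}} before the transfer rather than after the originals were deleted.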
Wait wait
First, there is literally nothing preventing this from being done in JavaScript. The Wikimedia Labs tool just has to send the Access-Control-Allow-Origin header.
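What "just has to send the Access-Control-Allow-Origin header" means in practice: a browser only lets a script running on commons.wikimedia.org read a response from another origin if that response names Commons in this header. The sketch below assumes a tool built on Python's standard `http.server`; the allow-list and the JSON payload are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Dict, Optional

# Hypothetical allow-list: only scripts running on Commons pages may read responses.
ALLOWED_ORIGINS = {"https://commons.wikimedia.org"}


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """CORS headers for a response to a request carrying the given Origin header."""
    # Always vary on Origin so shared caches do not reuse one origin's answer for another.
    headers = {"Vary": "Origin"}
    if origin in ALLOWED_ORIGINS:
        # Echo the exact allowed origin; the browser then lets that page read the body.
        headers["Access-Control-Allow-Origin"] = origin
    return headers


class ToolHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        for key, value in cors_headers(self.headers.get("Origin")).items():
            self.send_header(key, value)
        self.end_headers()
        self.wfile.write(body)
```

It can be served with `HTTPServer(("127.0.0.1", 8000), ToolHandler).serve_forever()`. A tool that accepts POSTs with custom headers would also have to answer the browser's preflight OPTIONS request.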
Second, I have serious concerns about how the tool will handle transferring templates. I realize this was discussed above, but the answer is not simple. The tools currently keep a list of template rules that looks something like this:
- global
  - {{Information}} => [special information handler]
  - {{PD-self}} => {{PD-user-w}},1=$proj,2=$uploader
  - {{NowCommons}} => [ignore]
  - {{PermissionOTRS}} => [copy exactly][maintain location]
  - {{self}} => [copy exactly],author=[[:$proj:$code:User:$uploader|]] at [http://$code:$proj.org $code.$proj]
- English
  - {{GFDL-self}} => {{GFDL-user-en-no-disclaimers}},1=$uploader,migration=$migration
- Arabic
  - {{معلومات}} => [special information handler]
  - {{ملكية عامة - شخصي}} => {{PD-user-ar}},1=$uploader
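To make the rule list above concrete, here is a minimal sketch of how a transfer tool might apply such rules to a single template call. The `Rule` class, the `RULES` table and `transform` are hypothetical; only a subset of the listed rules is encoded (the {{self}} rule is omitted), and `$proj`/`$uploader`/`$migration` are filled in with Python's `string.Template`.

```python
import string
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Rule:
    action: str                    # "rename", "ignore", "copy" or "special"
    target: Optional[str] = None   # replacement template for "rename"
    params: Dict[str, str] = field(default_factory=dict)  # may contain $placeholders


# A few rules from the list above; "global" applies to every source wiki.
RULES: Dict[str, Dict[str, Rule]] = {
    "global": {
        "Information": Rule("special"),
        "PD-self": Rule("rename", "PD-user-w", {"1": "$proj", "2": "$uploader"}),
        "NowCommons": Rule("ignore"),
        "PermissionOTRS": Rule("copy"),
    },
    "en": {
        "GFDL-self": Rule("rename", "GFDL-user-en-no-disclaimers",
                          {"1": "$uploader", "migration": "$migration"}),
    },
    "ar": {
        "ملكية عامة - شخصي": Rule("rename", "PD-user-ar", {"1": "$uploader"}),
    },
}


def transform(lang: str, name: str, original: str,
              context: Dict[str, str]) -> Optional[str]:
    """Return the Commons wikitext for one template call, or None to drop it."""
    # Language-specific rules take precedence over global ones.
    rule = RULES.get(lang, {}).get(name) or RULES["global"].get(name)
    if rule is None:
        # Unknown template: fail loudly instead of silently losing information.
        raise KeyError(f"no transfer rule for template {name!r} on {lang}")
    if rule.action == "ignore":
        return None
    if rule.action == "copy":
        return original  # copied verbatim, parameters included
    if rule.action == "rename":
        parts = [rule.target] + [
            f"{key}={string.Template(value).substitute(context)}"
            for key, value in rule.params.items()
        ]
        return "{{" + "|".join(parts) + "}}"
    # "special" rules (e.g. {{Information}}) need a dedicated handler.
    raise NotImplementedError(f"{name} needs a special handler")
```

Even this toy version shows why the question matters: every template a source wiki uses needs an explicit rule, and "special" handlers such as the one for {{Information}} are where most of the real complexity lives.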
Additionally, we have to worry about things like transferring the author and upload date in the summary. Currently, the tools add this automatically via {{Original uploader}}, {{Original upload date}}, and {{Transferred from}}.
As you can see, implementing the logic for this is extremely complex. Is the software in any way set up to handle this level of complexity?
My guess is that it is not, and this could cause incorrect or incomplete information to be transferred. We absolutely must get this question answered before we implement such a thing. Magog the Ogre (talk) (contribs) 03:46, 13 July 2014 (UTC)
- I completely agree that the implementation has to be done properly and that there are many more rules and caveats that must be applied. I think all users who commented agree on this. But please bear in mind, first, that there is no software doing such transfers yet, and second, that involving a Labs tool means a) more load on Labs and b) an additional source of uncertainty in the pipeline, which increases the probability of downtime for a tool that does not yet exist. And downtime matters. -- Rillke(q?) 08:23, 13 July 2014 (UTC)
- I am not really sure you understand what is being proposed here. This is not about whether or not we want a certain tool with a specific logic for transferring these files. This RFC is only about enabling the software feature that would allow transferring files without downloading them to the user's machine and then uploading them from there. Transfer tools already exist, and they all work by downloading and re-uploading. There is no reason why they could not be modified to use this new feature. I agree that a tool to transfer files must be implemented in such a way that all vital information is preserved, but this is completely beyond the scope of this RFC. Regards, -- ChrisiPK (Talk|Contribs) 20:41, 15 July 2014 (UTC)
- Isn't this RfC only about transferring the files themselves? Generating suitable text for the file information page would appear to be a separate task and out of scope for this RfC. --Stefan4 (talk) 22:51, 17 August 2014 (UTC)
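As ChrisiPK and Stefan4 note, the RFC is only about the server-side fetch itself. With upload-by-URL, a tool would send Commons' api.php an `action=upload` request whose `url` parameter points at the original file, and the Commons servers would download it directly. Per the RFC text, URLs under *.wikimedia.org were not on the allowed list at the time, so such a request would have been refused. The sketch below only builds the request body. `upload_url` assumes MediaWiki's default hashed upload layout and is a hypothetical helper (querying prop=imageinfo&iiprop=url is the safer way to obtain the URL); the edit summary string is made up.

```python
import hashlib
from urllib.parse import quote, urlencode


def upload_url(project_path: str, filename: str) -> str:
    """Public URL of an original file, e.g. project_path="wikipedia/en".

    Assumes the hashed layout /<a>/<ab>/<name>, where "a" and "ab" are the
    first one and two hex digits of the MD5 of the underscored file name.
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return (f"https://upload.wikimedia.org/{project_path}/"
            f"{digest[0]}/{digest[:2]}/{quote(name)}")


def upload_by_url_params(target_name: str, source_url: str,
                         description: str, token: str) -> str:
    """Form body for a POST to Commons' api.php asking the server to fetch source_url."""
    return urlencode({
        "action": "upload",
        "format": "json",
        "filename": target_name,
        "url": source_url,      # server-side fetch; needs the upload_by_url right
        "text": description,    # initial file description page
        "comment": "Transferred from another Wikimedia wiki",  # made-up summary
        "token": token,         # CSRF token from action=query&meta=tokens
    })
```

Compared with the current tools, the file bytes would never leave Wikimedia's network; the description text (the hard part discussed above) would still have to be generated by the tool.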
- The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.