Commons talk:OpenRefine

From Wikimedia Commons, the free media repository

OpenRefine's file upload functionality should support all file formats that are natively supported by Wikimedia Commons as well

Tracked on GitHub #4276

In a conversation on the Wikimedia Commons Telegram channel, and in various other places, I have regularly heard people ask 'but will OpenRefine support batch uploading (video files) / (pdf files) / (djvu files)...' This sounds like a no-brainer, but as I've noticed that other batch upload tools sometimes don't support certain file formats, it's good to put this on the radar and to investigate why certain file formats would prove challenging. Spinster (talk) 07:06, 7 November 2021 (UTC)

Integrate thumbnail previews during the batch file upload process

Tracked on GitHub #4277

Request received in a conversation in the Wikimedia Commons Telegram channel. During a batch upload process of media files, it is extremely helpful if one can (easily) see thumbnail previews of the media files that are being uploaded.

Some existing Wikimedia Commons (batch) upload tools support this indeed (the default UploadWizard, for instance), others don't (Pattypan only shows previews of files and their infoboxes during the checking phase of the upload process, after all data has been prepared already).

OpenRefine is essentially a data-centric tool, so this may be a stretch, but it's good to have this request on the radar, as it makes a lot of sense IMO. Spinster (talk) 07:26, 7 November 2021 (UTC)

Error log while uploading

I am using OpenRefine 3.7 on Windows and I see errors in the OpenRefine console. Is there a written log where I can see the errors happening during the upload process? Raymond 08:35, 27 July 2022 (UTC)

Hi @Raymond, thank you for trying OpenRefine! I am not sure if I fully understand your question. Are you interested in seeing more specific information about errors happening during upload than the ones you are currently seeing in the console? What kind of errors would you be interested in seeing? We could certainly do a better job at making the error reporting more detailed and descriptive. Best, Spinster (talk) 15:53, 11 August 2022 (UTC)
@Spinster During the upload runs I have seen different errors in the console:
  1. http-bad-status: There was a problem during the HTTP request: 415 Unsupported Media Type
  2. verification error: File extension ".jpg" does not match the detected MIME type of the file (inode/x-empty)
  3. abusefilter-warning: abusefilter-warning-overwriting-artwork
Screenshots: photos.app.goo.gl/84rzjVweQUrUFC338 (sorry, unable to link the URL because of the spam filter)
I have not yet checked the reasons for 1. and 2.; the files are probably corrupt. The reason for 3. is known: it was my fault, because I forgot to exclude some already uploaded files.
What I am missing: a saved error log which includes the file names. Currently I have no idea which files are generating the errors from 1. and 2. Raymond 12:07, 13 August 2022 (UTC)

(edit): Comment added to https://github.com/OpenRefine/OpenRefine/issues/5166 now. Raymond 14:58, 13 August 2022 (UTC)
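
A note on error 2 above: the detected MIME type "inode/x-empty" is what file-type detection usually reports for a zero-byte file, so those files may simply be empty on disk. The following is a minimal sketch for finding such files locally before an upload. It is not part of OpenRefine; the function name `find_empty_files` and the default extensions are assumptions chosen for illustration.

```python
from pathlib import Path


def find_empty_files(folder, extensions=(".jpg", ".jpeg")):
    """Return the sorted paths of zero-byte files below `folder`.

    Hypothetical helper, not an OpenRefine feature. Only files whose
    suffix (compared case-insensitively) is in `extensions` are checked.
    """
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in extensions
        and path.stat().st_size == 0
    )


if __name__ == "__main__":
    # Example: list suspicious files in the current directory tree.
    for path in find_empty_files("."):
        print(path)
```

Any file listed by this check would be worth excluding from the OpenRefine project, or re-exporting, before starting the batch.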

Flag for already finished uploads

Maybe I missed it somewhere in the documentation: is there a flag to filter for already finished uploads in case I have to stop OpenRefine in the middle of the upload process? Raymond 08:39, 27 July 2022 (UTC)

Hi Raymond, apologies for the late reply. If an upload failed in the middle, I would expect the file names of already uploaded files to appear in blue (reconciled) in your OpenRefine project. If that is not the case: go to your OpenRefine project, select the column of file names, clear the reconciliation data, and reconcile that column against Wikimedia Commons again. The files that have already been uploaded will appear blue (reconciled); the other ones will again need to be marked as new items to be created. Next, you can filter down to each of the two sets by using the reconciliation judgment facet. I hope this makes sense! Spinster (talk) 15:52, 11 August 2022 (UTC)
@Spinster Thank you for the answer. That helps a lot and works now. The upload has been running since yesterday. One comment: "I would expect the file names of already uploaded files to appear in blue (reconciled) in your OpenRefine project." Sadly not. Bug or feature? I had to do the steps you suggested (clearing, reconciling). Do you see any chance to avoid this step in a newer version? Raymond 11:53, 13 August 2022 (UTC)
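
As a cross-check outside OpenRefine, one could also ask the MediaWiki API on Commons which file pages already exist. The sketch below is only an illustration of that idea and is not how OpenRefine's reconciliation works; the helper names and the User-Agent string are assumptions. It relies on the standard `action=query` module with `formatversion=2`, where missing pages carry a `missing` flag. Note that the API accepts a limited number of titles per request (50 for regular accounts), so longer lists would need to be sent in batches.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query_url(file_names):
    """Build an action=query URL for the given file names (without 'File:')."""
    titles = "|".join(f"File:{name}" for name in file_names)
    params = {
        "action": "query",
        "titles": titles,
        "format": "json",
        "formatversion": "2",
    }
    return f"{API_URL}?{urlencode(params)}"


def split_existing(response):
    """Split page titles from a parsed API response into (existing, missing)."""
    existing, missing = [], []
    for page in response.get("query", {}).get("pages", []):
        # Pages that do not exist, or whose title is invalid, are not uploaded.
        if page.get("missing") or page.get("invalid"):
            missing.append(page["title"])
        else:
            existing.append(page["title"])
    return existing, missing


def check_files(file_names):
    """Query Commons and report which of the files already exist.

    The User-Agent value is a placeholder; replace it with your own contact.
    """
    request = Request(
        build_query_url(file_names),
        headers={"User-Agent": "upload-check-sketch/0.1 (example contact)"},
    )
    with urlopen(request, timeout=30) as reply:
        return split_existing(json.load(reply))
```

The titles returned by the API are normalized (for example, underscores become spaces), so they may need the same normalization before being compared with the file names in the project.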

A messy project

Hi, Here is a copy of my report to the forum.

Hi, I am trying to upload (a lot of) files to Commons, so I tried OR 3.7beta2 and OR 3.8-20221220.184714 (Java included).

OR 3.8-20221220.184714 loads OK, but the “Next” button after selecting files is not accessible, whatever number of files is selected. I am stuck there.

OR 3.7beta2 with Java included doesn’t even load (Java not found).

Second issue: Firefox is used, although Chrome is my default browser. OR loads in Chrome when I copy and paste the URL http://127.0.0.1:3333/ into it.

General comment: Selecting files before a project exists is counterintuitive. The right order should be: first create a project, then select files to include. Yann (talk) 18:01, 26 December 2022 (UTC)

Pattypan comparison

I'm not sure what is meant by "You can't edit data inside Pattypan."? I think most Pattypan users rarely use the spreadsheet and only use the built-in edit features. Abbe98 (talk) 11:22, 10 January 2023 (UTC)

Userbox

I copied over the userbox from Wikidata, to make it possible to use it on Commons too. You can use {{User loves OpenRefine}} and it gives you the following:

This user loves OpenRefine.



Not sure if this should be advertised on Commons:OpenRefine? I could not find a fitting section for it.

Those userboxes are quite useful for finding a list of enthusiastic users (it will populate Category:OpenRefine user when it gets used). The userbox is also available on Meta. − Pintoch (talk) 18:41, 6 September 2023 (UTC)

Permission denied for Upload

I'm getting a MediaWiki error while editing. It shows "The action you have requested is limited to users in one of the groups: Users, Autoconfirmed users, Administrators, Confirmed users." I'm already an autoconfirmed user and don't know why I'm getting this error. Has anyone else got a similar error? ❙❚❚❙❙ GnOeee ❚❙❚❙❙ 05:13, 23 September 2023 (UTC)

FilePath on PAWS

I am trying to build an upload with OpenRefine in the PAWS environment, but unfortunately I am failing with the file path. I have uploaded the files to my PAWS directory, but both attempts fail: using the public URL https://public-paws.wmcloud.org/User:ZentralGut/CommonsUpload/Files_StAOW_Images/StAOW_257460.jpg fails because this domain is not allowlisted, and the Linux-based file path "/CommonsUpload/Files_StAOW_Images/StAOW_257460.jpg" is rejected as a file path before the upload starts. Does anyone have experience with PAWS and file uploads? Best ZentralGut (talk) 13:36, 15 October 2023 (UTC)

Found the solution myself: add "/home/paws/" before your directory/file path in your PAWS account. ZentralGut (talk) 13:44, 15 October 2023 (UTC)
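
For a whole column of paths, the fix above could be applied in bulk before importing the data. The following is a small sketch under the assumption, taken from the comment above, that the PAWS home directory is "/home/paws"; the function name `to_paws_path` is hypothetical.

```python
from posixpath import join, normpath

# Assumption from the discussion above: files live under this directory on PAWS.
PAWS_HOME = "/home/paws"


def to_paws_path(path):
    """Return `path` as an absolute path below the PAWS home directory.

    Paths that already start with the PAWS home directory are only normalized.
    """
    if path == PAWS_HOME or path.startswith(PAWS_HOME + "/"):
        return normpath(path)
    # Drop leading slashes so that join() does not discard PAWS_HOME.
    return normpath(join(PAWS_HOME, path.lstrip("/")))


if __name__ == "__main__":
    print(to_paws_path("/CommonsUpload/Files_StAOW_Images/StAOW_257460.jpg"))
```

The same transformation can of course be done directly in OpenRefine by prepending the prefix to the file path column.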

Commons extension is missing on PAWS

I don't seem to see this extension?

I tried to follow Commons:OpenRefine/Advanced tips and tricks#Adding the Wikimedia Commons reconciliation service to OpenRefine, but it says

Error contacting recon service: timeout : timeout - https://commonsreconcile.toolforge.org/en/api

How do I boot OR up on PAWS to upload files? RZuo (talk) 18:37, 28 January 2024 (UTC)

@RZuo: Thanks for reporting this! I am not sure how the reconciliation service ended up in this state. I have restarted it and it seems to be accessible again. Can you try again on your side? − Pintoch (talk) 10:11, 29 January 2024 (UTC)
@Pintoch Thanks a lot! Your Toolforge page is back up and I was able to add it as a "standard service". RZuo (talk) 10:25, 29 January 2024 (UTC)