Category talk:Photos from Fotopedia

From Wikimedia Commons, the free media repository

Intro

moved from Commons:Village pump#Fotopedia

Hi. w:Fotopedia is shutting down on August 10, 2014 (in 9 days). They host photos, some of which are under free licenses (CC). Could those photos be retrieved to Commons? --Rinaku (t · c) 14:04, 1 August 2014 (UTC)

(The interface is atrocious!) I browsed a few “stories” for photos and found some that are "©" (copyrighted), others "㏄⃝🚹⃝" (cc-by), yet others "㏄⃝🚹⃝⤿⃝" (cc-by-sa). I could not find any central repository, index, listing, or anything. A typical photo url is: http://i.images.cdn.fotopedia.com/r_TTcQJLHZ0-KuVrNYDJaqs-hd/Countries_of_the_World/America/United_States/Garden_of_the_Gods.jpg… -- Tuválkin 19:37, 1 August 2014 (UTC)
Uhm, this is a bit embarrassing, but I have to ask: How did you manage to download a picture? I tried, but I didn´t find out how to do that. --Rudolph Buch (talk) 20:14, 1 August 2014 (UTC)
Check something like «multimedia resources» under «page properties»; it should work for any browser (except maybe for MSIE, I don’t know; I stopped using it in 1997) running on a proper computer (even on a Mac, I hope, but probably not on the kind of “device” this craptastic interface was designed for). -- Tuválkin 22:39, 1 August 2014 (UTC)
Such a beautiful interface, but at the same time so frustrating because it totally doesn't give you what you want. The best way I could find free images was via https://www.google.ca/search?as_st=y&tbm=isch&hl=en&as_q=&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=fotopedia.com&safe=images&tbs=sur:fmc . Bawolff (talk) 20:10, 1 August 2014 (UTC)
Fotopedia is closing down because a large proportion of its content is from other websites e.g. many articles simply mirror existing Wikipedia articles. Of the approximately 1.5 million images, it appears that two-thirds may be from Flickr. Of the remainder, it seems that only about 22,000 images are both non-Flickr and have Commons-compatible licenses. It might be worth asking a bot operator if they could harvest these ones but it would be difficult with barely a week left and many Wikimedians getting ready to party with Jimbo and the Gang very soon. Green Giant (talk) 21:29, 1 August 2014 (UTC)
From Green Giant’s filtered google search I grabbed the 1st 100 pairs of image url + page url. It is only worth uploading them if there is a license reviewer to attest to the original CC-BY and CC-BY-SA licenses before the site goes belly up. -- Tuválkin 23:07, 1 August 2014 (UTC)
Those 100 ✓ uploaded now, added to the 98 already in Category:Photos from Fotopedia. Hardly dented those 20 thousand, sadly. A bot operation would be good. Those paltry 100, although grabbed via a browser add-on and uploaded with Vicuña, still took me a while to prepare: I probably missed the highest resolution for some pictures, a few silly filenames had to be allowed, and authorship still has to be pinpointed and categorization added. -- Tuválkin 03:09, 2 August 2014 (UTC)
Rudolph Buch the download links are on the image page rather than the image itself. As an example, look at this page, and note there are two buttons labelled "Download" and "Actions". If you click Download it will open the full size image in a separate tab and then you can save it using your right-click menu. If you click Actions there is a dropdown menu, from which selecting "Download original" will again open the image in a separate tab and then it can be saved as above. Green Giant (talk) 21:39, 1 August 2014 (UTC)
Meanwhile: 98 images from Fotopedia had already been brought over to Commons: Special:Search/Fotopedia. -- Tuválkin 23:07, 1 August 2014 (UTC) (clarified. -- Tuválkin 05:20, 2 August 2014 (UTC))
Tuválkin: I am trying to review these, but the link to Fotopedia is incomplete. Regards, Yann (talk) 04:59, 2 August 2014 (UTC)
(No, you’re trying to review not these old ones, but the new ones, mentioned a couple of lines above…) Many thanks! The mass upload was done with a generic link to the main page of Fotopedia as the source; I'm now updating it, as said. The whole list is at Category talk:Photos from Fotopedia#Recently uploaded. -- Tuválkin 05:08, 2 August 2014 (UTC)
OK, tell me when you are done updating the links, then I can help with reviewing. Also, I noticed that several files didn't have the right license (cc-by-sa instead of cc-by, or wrong number). Regards, Yann (talk) 12:36, 2 August 2014 (UTC)
I am 2/3 of the way through reviewing the images in the category. Some remarks: as I said on your talk page, please do not add the direct link; it confuses the review script, and I don't think it adds anything. It also makes an already tedious task take longer. Some of the licenses are wrong: cc-by-sa instead of cc-by, or wrong number. Some images do not have any link to Fotopedia. I also renamed some images that had meaningless names, and nominated for deletion some with a -nc or -nd license, and one out-of-scope self-portrait. Regards, Yann (talk) 11:11, 5 August 2014 (UTC)
Yesterday, on schedule, it went offline. There’s a placeholder page now, with final credits roll. -- Tuválkin 02:26, 12 August 2014 (UTC)

Filenames in Fotopedia

From what I learned so far, a “grabbable” direct link to a maximum resolution image in Fotopedia is something like

http://images.cdn.fotopedia.com/(author)-(photo)-original.jpg

The respective file information page includes useful things like its licensing, title, author name (and link) and usage; its generic url is

http://www.fotopedia.com/items/(author)-(photo)

Author pages’ generic url is

http://www.fotopedia.com/users/(author)

(Pretty simple, after all, and simpler than most such sites’ urls.) Given (photo) and (author), a competent bot could scrub this site of its 22000 compatibly licensed photos in a blink. That’s beyond my abilities, though. -- Tuválkin 13:23, 5 August 2014 (UTC)

Some filenames (so far a couple dozen among almost a thousand) follow a different naming convention:
http://images.cdn.fotopedia.com/(uuid)-original.jpg
I don’t know why this is so, nor whether this is a more or less complete system of synonymity. So far a given image is either labelled like this or (most often) as given above. -- Tuválkin 20:52, 6 August 2014 (UTC)

Both author and photo are made up of 11 characters, repeatable, case sensitive, from a-z, A-Z, 0-9, and also hyphen and underscore. (The connecting hyphen may neighbour a code hyphen on either side). -- Tuválkin 20:52, 6 August 2014 (UTC)

Some author codes are not a jumble of 11 random characters, but a human-readable/entered string, which can be shorter than 11 characters. -- Tuválkin 23:49, 6 August 2014 (UTC)
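
The URL patterns described in this section can be sketched in Python as follows. This is only an illustration, not a tested harvester: the function names are hypothetical, and it assumes that the photo code is always the last 11 characters of an item id, with everything before the connecting hyphen being the author part (which, as noted above, may be a shorter human-readable string). The (uuid) naming convention is not covered.

```python
import re

# Character set described above: a-z, A-Z, 0-9, hyphen and underscore.
CODE_CHAR = r"[A-Za-z0-9_-]"

# Assumption: the photo code is always the final 11 characters, and the
# author part is whatever precedes the connecting hyphen (it may be short).
ITEM_RE = re.compile(rf"^(?P<author>.+)-(?P<photo>{CODE_CHAR}{{11}})$")


def split_item(item_id):
    """Split an '(author)-(photo)' item id into its two parts."""
    match = ITEM_RE.match(item_id)
    if match is None:
        raise ValueError(f"not an (author)-(photo) item id: {item_id!r}")
    return match.group("author"), match.group("photo")


def item_urls(author, photo):
    """Build the three URL forms listed in this section."""
    return {
        "image": f"http://images.cdn.fotopedia.com/{author}-{photo}-original.jpg",
        "page": f"http://www.fotopedia.com/items/{author}-{photo}",
        "author_page": f"http://www.fotopedia.com/users/{author}",
    }


if __name__ == "__main__":
    author, photo = split_item("9MaaEZ8Hz0A-a2a7eY_MvkQ")
    for kind, url in item_urls(author, photo).items():
        print(kind, url)
```

Splitting from the right is what lets a code hyphen sit next to the connecting hyphen without ambiguity, but only as long as the 11-character assumption for the photo code holds.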

Recently uploaded

Here’s the list of the urls for the 100 additional images uploaded today, out of about 22 thousand still to go in one week, before this site closes down. -- Tuválkin 02:48, 2 August 2014 (UTC)

Second manual batch

I pre-selected and downloaded locally 656 more images (max res; count revised down from 683, then 669); this set doesn’t include any evident off-scope, copyvio/FoP, or duplicate images. I’ll upload them with much better starting info, also taking Yann’s remarks into account. -- Tuválkin 15:38, 5 August 2014 (UTC)

A first experimental batch of 34 was uploaded just now — some great photos here! They need license review from an admin, now (pinging User: Yann). They likewise need categorization, some cropping, etc., but that can be done also after Fotopedia.com goes belly up, so I’ll focus on uploading more for now: 635 to go. -- Tuválkin 04:36, 6 August 2014 (UTC)
There is some issue with the "Info" template. See File:Thai boys eating icecream.jpg. I think the name could be improved. I renamed this. And you should ask for reviewer right. Regards, Yann (talk) 14:01, 6 August 2014 (UTC)
I know about the issue with the "Info" template; it is caused by me cramming more stuff in than Vicuña would let me without breaking it. I did make some improvements in the two most recent batches (mainly now lowercase template arguments, to avoid automated duplication) and I do plan to fix it up upon recategorizing; the cleanup is trivial, anyway. As for reviewer right: cool, but it defeats the goal if one reviews one’s own uploads, especially when we’re pressed for time, I think. (Concerning filenames, we’re done talking.) -- Tuválkin 21:40, 6 August 2014 (UTC)
I ended up uploading 574 photos from this site; other people uploaded a few more, adding to the 98 we had from before the demise of Fotopedia was announced. Could have been (even) worse, in terms of salvaging, but read below about alternatives. -- Tuválkin 02:26, 12 August 2014 (UTC)

License washing?

Hi. Seeing that many images on Fotopedia are copied from Flickr, I have some worries about license washing. See these cases: Commons:Deletion requests/File:Sylvain Lefebvre.jpg and Commons:Deletion requests/File:Ramos Casillas Copa del Rey 2011.jpg. Regards, Yann (talk) 17:51, 5 August 2014 (UTC)

Yes, these cases need to be singled out and shot on sight (maybe also this), but this doesn’t seem to be an egregious case of serial copyvio, surely not enough to blanket the site as a questionable source. Of course all images from delinquent uploaders should be scrutinized as a priority after a copyvio is detected. (I’m glad these were not among those uploaded by me, at this time of requiem for Fotopedia…) -- Tuválkin 05:03, 6 August 2014 (UTC)
Not the whole site, but all images which have been copied from Flickr. If an image is either not available on Flickr, or its license on Flickr is not acceptable, we should question it. Regards, Yann (talk) 15:12, 6 August 2014 (UTC)
So far I didn’t find any clear indication that an image is also available on Flickr. Regardless of anything else, though, as Fotopedia is closing in 3 days and Flickr is not, anything that can be grabbed from Flickr should be left out, because time is short. -- Tuválkin 21:43, 6 August 2014 (UTC)

An alternative approach?

Given that we have a list of 22'000 URLs that we'd like to copy, but currently no bot to do so... would it be maybe faster to save those 22'000 URLs to web.archive.org? I just did so for http://www.fotopedia.com/items/9MaaEZ8Hz0A-a2a7eY_MvkQ, and now archive.org even serves the full-res original file from its own copy. Just an idea... (Note: other items already exist at archive.org, for instance http://www.fotopedia.com/items/AW6O4d24noA-pTK2E6PugrM exists at [1], and the download link also works and serves the full-res from archive.org.) Lupo 22:17, 6 August 2014 (UTC)

Looks like a fine idea, but do we «have a list of 22'000 URLs»? I hope so, but I don’t — who does? -- Tuválkin 23:43, 6 August 2014 (UTC)
Isn't that the Google search that Green Giant posted above? Lupo 05:42, 7 August 2014 (UTC)
Maybe, but I do not know how to extract what the browser gets from that http-req into a raw text list of urls. If someone knows, please post it here. -- Tuválkin 17:04, 7 August 2014 (UTC)
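
One possible way to get such a raw list, sketched in Python: save the search results page from the browser as an HTML file, pull every Fotopedia item url out of it, and optionally turn each one into a Wayback Machine save request as suggested above. This is a rough sketch under stated assumptions: the function names are hypothetical, it assumes the saved page contains the item urls either literally or percent-encoded inside redirect links, and it assumes the Wayback Machine accepts a target url appended to https://web.archive.org/save/. It only builds the list; it does not fetch anything.

```python
import re
import sys
from urllib.parse import unquote

# Item page urls as described in "Filenames in Fotopedia".
ITEM_URL_RE = re.compile(r"https?://www\.fotopedia\.com/items/[A-Za-z0-9_-]+")

# Assumption: the Wayback Machine's save endpoint takes the target url
# appended to this prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def extract_item_urls(html_text):
    """Return the distinct item urls found in a saved page, in page order."""
    # Decode percent-encoded links (e.g. inside search-engine redirects).
    text = unquote(html_text)
    seen = set()
    urls = []
    for url in ITEM_URL_RE.findall(text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def wayback_save_url(page_url):
    """Build the address that asks the Wayback Machine to archive page_url."""
    return SAVE_PREFIX + page_url


if __name__ == "__main__":
    # Usage: python extract_urls.py saved_results.html > urls.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for item_url in extract_item_urls(handle.read()):
            print(item_url)
```

A single results page only holds a fraction of the hits, so every page of the search would have to be saved and run through this in turn.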
Also: it might be worth trying to contact Fotopedia directly. Maybe they'd be interested in collaborating with WMF to salvage some of their users' content. Lupo 05:45, 7 August 2014 (UTC)
ArchiveTeam has a project for saving Fotopedia's content. --Gazebo (talk) 05:06, 8 August 2014 (UTC)
Yay! They saved it all, should be available soon. -- Tuválkin 05:54, 8 August 2014 (UTC)
It seems the images with potential to be hosted here may find another home, according to a Creative Commons blog posting of 9 August.—Odysseus1479 (talk) 06:59, 10 August 2014 (UTC)
The Creative Commons images (I don't know if all free CC or any CC) are at https://archive.org/details/2014.08.fotopedia-cc-export-collection (277 GB). The collection linked above is mostly meant for Wayback machine (WARC files).
They're mostly unfree though:
$ grep -Ec "/(publicdomain|by|by-sa)/" export-fotopedia-cc.tsv 
48373
$ grep -Evc "/(publicdomain|by|by-sa)/" export-fotopedia-cc.tsv 
258220
--Nemo 15:43, 15 August 2014 (UTC)
(ec) Excellent find. I downloaded http://archive.org/download/2014.08.fotopedia-cc-export-collection/fotopedia-tar.txt, the full list of all file-names and -sizes (and local timestamps and Unix flags). Too bad the HTML files are bundled with the JPGs: one needs to download that almost 300 GB blob to get a couple of megabytes of information and from there list the (relatively few) ones with CC-by and CC-by-SA licenses. -- Tuválkin 19:00, 15 August 2014 (UTC)
Never mind: It is trivial to extract from export-fotopedia-cc.tsv that list of the (relatively few) photos with CC-by and CC-by-SA licenses. Someone savvy with grep and gz could do it. -- Tuválkin 20:50, 15 August 2014 (UTC)
Doesn't seem to have happened yet though. :) Commons:Batch uploading/Fotopedia (thanks for creating, this discussion in Category_talk is not so easy to find). --Nemo 21:51, 19 September 2014 (UTC)
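
For reference, the extraction discussed above could look roughly like this in Python. It is a minimal sketch that mirrors the grep pattern quoted earlier in this thread (so it also keeps public-domain entries); the function name is hypothetical, and since the column layout of export-fotopedia-cc.tsv is not described here, whole lines are matched rather than a specific column.

```python
import re
import sys

# Same pattern as in the grep commands quoted above: a license path
# segment that is exactly "publicdomain", "by" or "by-sa".
FREE_LICENSE_RE = re.compile(r"/(publicdomain|by|by-sa)/")


def split_by_license(lines):
    """Split TSV lines into (free, other) according to the license pattern."""
    free, other = [], []
    for line in lines:
        if FREE_LICENSE_RE.search(line):
            free.append(line)
        else:
            other.append(line)
    return free, other


if __name__ == "__main__":
    # Usage: python filter_free.py export-fotopedia-cc.tsv > free.tsv
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        free_lines, other_lines = split_by_license(handle)
    sys.stdout.writelines(free_lines)
    print(f"{len(free_lines)} free, {len(other_lines)} other", file=sys.stderr)
```

Because the pattern requires a slash directly after "by" or "by-sa", variants such as by-nc or by-nd fall into the second list.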