Commons:Batch uploading/Minerals from Rob Lavinsky on irocks.com

From Wikimedia Commons, the free media repository

Minerals from Rob Lavinsky on irocks.com

As described in Commons:Robert_Lavinsky, Bob Lavinsky has offered not only his images on mindat.org (which have meanwhile been processed) but also all the images on his own website, irocks.com. Some discussion (how to identify the images and similar issues) has already taken place on Commons_talk:Robert_Lavinsky.

The image descriptions will mostly be the same as for the mindat.org upload, for example File:Cassiterite-31073.jpg.

Quantities

  • 53756 images on irocks.com
  • 20582 mineral images that are different from the mindat.org images

Problems

Can you give an example of what the bot can do? Perhaps you could upload the pictures without locality information but with the rest of the picture description; the locality information could then be added later. -- Ra'ike T C 11:00, 7 May 2010 (UTC)
I am just about to aggregate and match the locality information. As with the mindat images, it is much easier to do the matching automatically beforehand than manually afterwards. The same applies to the mineral names and categories. But the descriptions contain a lot of comments and sales-oriented text such as "Azurite (Chessylite) with neat old label", "Rhodonite (illustrated 3 TIMES!)" or "incredible! like glass! complete 360 !", which is hard for the bot to handle. I guess a lot of manual post-processing will be necessary in any case. --Reinhard Kraasch (talk) 12:14, 7 May 2010 (UTC)
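The automatic locality matching mentioned above could look roughly like the following minimal sketch. It assumes the locality is available as a separate comma-separated string (as in the Galena example further down) and that a lookup table from locality names to Commons categories exists; `split_locality`, `match_locality_category` and the category names in the example are hypothetical, not the bot's actual code.

```python
from typing import Optional


def split_locality(locality: str) -> list[str]:
    """Split an irocks-style locality string into its comma-separated parts.

    Commas inside parentheses (alternative mine names) are not treated as
    separators, and a trailing full stop is dropped. The parts keep the
    order of the source text, i.e. most specific first.
    """
    text = locality.strip().rstrip(".")
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def match_locality_category(parts: list[str], known: dict[str, str]) -> Optional[str]:
    """Return the category of the most specific part found in `known`, if any."""
    for part in parts:
        if part in known:
            return known[part]
    return None
```

For example, `split_locality("Blanchard Mine (Portalas-Blanchard Mine), Bingham, Hansonburg District, Socorro Co., New Mexico, USA.")` yields six parts from the mine down to "USA"; with a (hypothetical) table containing both "Socorro Co." and "USA", the county category wins because it is more specific.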
Hello Reinhard, is it really a big problem? We have a separate Category:Herkimer diamond for this special quartz, and Lavinsky's pictures from mindat always ended up there correctly (like this). I also don't know of any other mineral with a misleading name like Herkimer diamond. greetings -- Ra'ike T C 21:16, 20 May 2010 (UTC)
The mindat material was much better categorized (e.g. the varieties were really marked as varieties), so these two sources (irocks and mindat) are quite different. If there really is no other ambiguous category (I have no idea; I'm not an expert in this area), I will just move the "diamonds" to "Herkimer diamond". --Reinhard Kraasch (talk) 09:01, 21 May 2010 (UTC)
Could you scan the descriptions for the keyword "herkimer"? Only "Herkimer diamonds" are quartz; all other diamonds are real diamonds. -- Ra'ike T C 16:36, 22 May 2010 (UTC)
I already checked this (I searched for the category "quartz", which should amount to the same thing, I guess) and only modified the images that do not show "real" diamonds. --Reinhard Kraasch (talk) 18:59, 22 May 2010 (UTC)
That's a bad problem :-/ On the one hand, it's fine to have a series of different views of a mineral, but on the other hand, the description of that (irocks) series is terrible.
Could you upload the pictures from irocks and then replace their description with that of the earlier duplicate picture (the mindat one)? -- Ra'ike T C 16:36, 22 May 2010 (UTC)
I'm just looking into this; in the end I will have to fingerprint the images to identify the duplicates. I had hoped to spare myself that extra effort... --Reinhard Kraasch (talk) 16:44, 22 May 2010 (UTC)
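The keyword check suggested above can be expressed as a small filter. This is a minimal sketch, assuming each image has a plain-text description and a list of category names; the category name "Diamond" and the function name are hypothetical placeholders, not what the bot actually used.

```python
def fix_diamond_category(description: str, categories: list[str]) -> list[str]:
    """Move a quartz 'diamond' to Herkimer diamond if the description says so.

    Only descriptions containing 'herkimer' (case-insensitive) are touched,
    so genuine diamonds keep their category.
    """
    if "herkimer" not in description.lower():
        return list(categories)
    fixed = ["Herkimer diamond" if c == "Diamond" else c for c in categories]
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(fixed))
```

Keying on the description text rather than on the category alone means a real diamond is never re-categorized just because it happens to sit in the same category.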
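One common way to fingerprint images so that re-compressed or slightly brightened copies still match is an average hash ("aHash"). The sketch below is that generic technique, not necessarily what the bot does. It assumes the image has already been decoded to a grayscale pixel grid (a list of rows of 0 to 255 values); decoding would require an image library, which is not shown, and the threshold of 5 bits is an arbitrary example value.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Compute a size*size-bit average hash of a grayscale pixel grid."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image smaller than hash grid")
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))  # mean brightness of this cell
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # one bit per cell: brighter than the overall mean or not
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(a: list[list[int]], b: list[list[int]], threshold: int = 5) -> bool:
    """Treat two images as duplicates if their hashes differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

Because every bit is relative to the image's own mean, a uniform brightness change leaves the hash unchanged, while different pictures usually differ in many bits.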

Here are the duplicates within the 100 images uploaded so far:

For these images I took the description from the mindat.org page instead of the irocks.com page (and set the source reference accordingly). --Reinhard Kraasch (talk) 20:59, 22 May 2010 (UTC)

Further problems

See here --Reinhard Kraasch (talk) 20:44, 30 May 2010 (UTC)

Test Upload

Here are the first images:

If there are no objections, I will start the actual upload with the next batch of 1000 images this evening. --Reinhard Kraasch (talk) 12:18, 24 May 2010 (UTC)

Hello Reinhard, could it be that we now have more duplicates than we wanted? For example, the new File:Iron-4jg49a.jpg is a duplicate of the old File:Iron-20983.jpg.
I'm afraid we first have to check the bot uploads for duplicate pictures before it uploads the next 1000... -- Ra'ike T C 06:15, 25 May 2010 (UTC)
There is no need to do anything manually. The list is only a means to compare the image descriptions, not an instruction to delete or resolve the duplicates manually.
I get an error message when I upload an image that already exists (there are also images that other users uploaded manually before); I will evaluate these messages later. In any case, the mindat.org image should be deleted, not the irocks.com one, since then the references to the "other views" will not have to be repaired. --Reinhard Kraasch (talk) 09:04, 25 May 2010 (UTC)
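The duplicate rule described above (keep the irocks.com file, take over the mindat.org description and reference, delete the mindat.org file) can be sketched as follows. The `Upload` record and its fields are a hypothetical data model for illustration only; the URLs in the usage example are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Upload:
    title: str        # e.g. "File:Iron-4jg49a.jpg"
    source: str       # "irocks" or "mindat" (hypothetical field)
    description: str
    source_url: str


def resolve_duplicate(a: Upload, b: Upload) -> tuple[Upload, str]:
    """Keep the irocks.com file with the mindat.org description and source
    reference, and return the title of the mindat.org file to delete."""
    if {a.source, b.source} != {"irocks", "mindat"}:
        raise ValueError("expected one irocks and one mindat upload")
    irocks, mindat = (a, b) if a.source == "irocks" else (b, a)
    kept = replace(irocks, description=mindat.description, source_url=mindat.source_url)
    return kept, mindat.title
```

Keeping the irocks.com title means the "other views" galleries, which link irocks.com titles, stay valid without repair.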
*sigh* Sorry if I once again did more by hand than necessary, but somehow we never get enough information about what the bot is capable of. The list above suggested that someone had to check the pictures again and possibly merge the galleries of the different views. Hence my attempt to handle that via the separate list. If that (and the deletion of the picture) isn't necessary at all, and perhaps even disrupts the process because the bot then cannot link the galleries correctly (if I understood that correctly), then you should revert my action and explain once more exactly what the bot really can do and what follow-up work is really still needed. Otherwise we keep talking and working at cross purposes, which wastes time and nerves unnecessarily. Greetings -- Ra'ike T C 11:57, 25 May 2010 (UTC)
Well, "the bot" isn't anything fixed; I am constantly adapting the routines and catching up with the requirements. In that sense it is ultimately a matter of discussion, along the lines of: "It would be nice if the bot could also do this or that..." I do realize that we sometimes inevitably talk past each other (especially in English...). You can't tell me in full detail how the ideal mineral image can be derived from Rob's data, nor can I say everything that can be done by bot (the limit is ultimately my time: in principle anything is possible, but it has to bear some relation to the benefit and be feasible time-wise at all...). And contributions to the discussion from third parties are rather sparse... As a basic principle, though, I'd like to state: manual follow-up work is only required once the upload is finished. --Reinhard Kraasch (talk) 12:41, 25 May 2010 (UTC)
Short summary for readers who don't understand German: there is no need to start any manual procedure right now; in particular, it is not necessary to delete the duplicates manually, as the bot will do this. (And then there is still plenty of time to find out which other jobs can be done by the bot.) --Reinhard Kraasch (talk) 19:07, 25 May 2010 (UTC)
As far as the image descriptions are concerned, we have actually already reached a pretty good point. When I upload other free images from mindat, I already use this format.
One specific question: File:Beryl-Schorl-d05-15a.jpg and many others contain a gallery with a seemingly unnecessary self-reference. Are other views still to come?
I'll have to get to the bottom of that...
I have removed the self-references. --Reinhard Kraasch (talk) 15:23, 26 May 2010 (UTC)
One more thing: could Category:Fluorescent minerals be added to the pictures that show fluorescent minerals, such as File:Calcite-Pyrite-Fluorite-245540.jpg? So far I have done this by hand whenever I came across pictures of this kind, but doing it automatically would of course be better. Greetings -- Ra'ike T C 21:41, 25 May 2010 (UTC)
If I understand correctly, that is a property of the image (the image shows the fluorescence). Or can it be derived somewhere from the mineral itself or from the description? If it's the latter, the bot can do it, but it cannot classify images visually... --Reinhard Kraasch (talk) 00:53, 26 May 2010 (UTC)
It is indeed a property of the mineral or the image, but it is always reflected in the description with the keywords "fluorescence" or "fluorescent". So it should be easy to filter out. -- Ra'ike T C 11:57, 26 May 2010 (UTC)
Once all the images are uploaded, I can filter out those that have these keywords in their description. Then we can see how accurate the filter is and, depending on the effort, either automatically add the category to all filtered images and manually revert the few where it doesn't fit, or manually change only those where it does fit. --12:21, 26 May 2010 (UTC)
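The self-reference fix discussed above amounts to excluding the current file from its own "other views" gallery. A minimal sketch, assuming the bot has the list of all view titles for a specimen; the `<gallery>` tag is standard MediaWiki syntax, while the function name and the "b" view title in the example are hypothetical.

```python
def gallery_wikitext(current: str, views: list[str]) -> str:
    """Build the 'other views' gallery for one file, without linking to itself."""
    # dict.fromkeys removes repeated titles but keeps their order
    others = [t for t in dict.fromkeys(views) if t != current]
    if not others:
        return ""  # no other views: omit the gallery entirely
    return "<gallery>\n" + "\n".join(others) + "\n</gallery>"
```

Returning an empty string when only the file itself is left avoids exactly the kind of one-image self-referencing gallery that appeared on File:Beryl-Schorl-d05-15a.jpg.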

You now find a list of fluorescent minerals (or at least minerals that have the substring "fluorescen" in their description) here: Commons:Batch uploading/Minerals from Rob Lavinsky on irocks.com/Fluorescent minerals --Reinhard Kraasch (talk) 21:56, 29 May 2010 (UTC)
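The substring filter described above, followed by adding the category to the confirmed pictures, might look like this minimal sketch. It assumes the descriptions are available as a title-to-text mapping; the sample description text is invented for illustration, and the matches are only candidates, since (as noted above) they still need manual review.

```python
CATEGORY = "[[Category:Fluorescent minerals]]"


def fluorescent_candidates(descriptions: dict[str, str]) -> list[str]:
    """Titles whose description contains 'fluorescen' (fluorescent, fluorescence)."""
    return sorted(t for t, text in descriptions.items() if "fluorescen" in text.lower())


def add_category(wikitext: str, category: str = CATEGORY) -> str:
    """Append the category to a description page unless it is already there."""
    if category in wikitext:
        return wikitext
    return wikitext.rstrip("\n") + "\n" + category + "\n"
```

Matching the stem "fluorescen" catches both keywords named above with one test, and `add_category` is idempotent, so running it twice on the same page does no harm.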

Thank you Reinhard. I have checked them all and added Category:Fluorescent minerals to those pictures that actually show this optical property. greetings -- Ra'ike T C 10:33, 31 May 2010 (UTC)

Opinions

Assigned to | Progress | Bot name | Category
Reinhard Kraasch | finished | RKBot | Category:Images by Rob Lavinsky
Source
So it's easier to control the uploads for mistakes. -- Ra'ike T C 10:55, 12 May 2010 (UTC)
I mixed up slashes and backslashes in the link; this has been fixed in the meantime. Regarding the description pages: with the more common species (like Quartz: http://www.irocks.com/render.html?species=Quartz) you would have to click through hundreds of further pages, which I don't think is really practical. One could link directly to the rendered page (http://www.irocks.com/render.html?species=Quartz&page=238), but I fear this is not very stable. -- Reinhard Kraasch (talk) 12:25, 12 May 2010 (UTC)
At the moment I cannot find File:Quartz-bb79a.jpg on irocks.com, and there are too many pictures of quartz there, so here is another example that hasn't been uploaded yet:
On the page http://www.irocks.com/render.html?species=Galena&page=9 there is a picture named "MD-149540 - Linarite, Anglesite, Galena" (description: Blanchard Mine (Portalas-Blanchard Mine), Bingham, Hansonburg District, Socorro Co., New Mexico, USA. thumbnail, 2.6 x 2.1 x 2 cm.)
The pictures are http://www.irocks.com/db_pics/mdpics/MD-149540a.jpg (59.95 KB, 500 × 404 px)
and http://www.irocks.com/db_pics/mdpics/MD-149540b.jpg (49.16 KB, 500 × 401 px).
So you can see that we need both source pages, and that the size "thumbnail" on the description page can only refer to the size of the mineral specimen; the pictures themselves are big enough. -- Ra'ike T C 09:43, 13 May 2010 (UTC)
I'll just write in German now; there's no reason for us to wear ourselves out translating back and forth. The problem is that the page http://www.irocks.com/render.html?species=Galena&page=9 is rendered dynamically from the database, i.e. what ends up on page 9 right now depends on how many other minerals the "Galena" search returns. Today the description is on page 9; tomorrow Rob may have added 40 more galena images, and then it is on page 13... This page number is nothing static that one can rely on. --Reinhard Kraasch (talk) 13:20, 16 May 2010 (UTC)

In the size field there are dimension tags ("cabinet", "large cabinet", "miniature", "small cabinet", "thumbnail" and "toenail"). Should I translate these into German (and if so, with which translations? "Zehennagelgröße", i.e. "toenail size"?) or simply discard them? --Reinhard Kraasch (talk) 12:31, 12 May 2010 (UTC)

See above ;-) -- Ra'ike T C 09:43, 13 May 2010 (UTC)
I realize that; the question was whether I should translate these details (for the German description) or leave them out. So: Größe: "Daumennagel" (size: "thumbnail"), 2.6 x 2.1 x 2 cm.!? --Reinhard Kraasch (talk) 13:20, 16 May 2010 (UTC)
Oops, and no. Such a detail is, IMO, more than pointless. Just "Größe: ..." (size: ...) is entirely enough :-D Greetings -- Ra'ike T C 08:09, 17 May 2010 (UTC)
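Dropping the dimension tag and keeping only the measurements, as agreed above, could be done like this. A minimal sketch, assuming the size field looks like "thumbnail, 2.6 x 2.1 x 2 cm." as in the Galena example; the function names are hypothetical. Longer tags are checked first so that "small cabinet" is not mistaken for "cabinet".

```python
# Longer tags first, so "large cabinet" / "small cabinet" win over "cabinet".
SIZE_TAGS = ("large cabinet", "small cabinet", "cabinet", "miniature", "thumbnail", "toenail")


def strip_size_tag(size_field: str) -> str:
    """Remove a leading dimension tag and the trailing full stop."""
    text = size_field.strip()
    for tag in SIZE_TAGS:
        if text.lower().startswith(tag):
            text = text[len(tag):].lstrip(" ,")
            break
    return text.rstrip(".")


def format_size(size_field: str, label: str = "Größe") -> str:
    """Render the size line for the description, e.g. 'Größe: 2.6 x 2.1 x 2 cm'."""
    return f"{label}: {strip_size_tag(size_field)}"
```

The same function serves the English description by passing `label="Size"`.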

Details

Assigned to | Job | Status | Comments
Reinhard Kraasch | Image (and description) download from irocks.com | Done 15:00, 3 May 2010 (UTC) | All irocks.com images have been downloaded
Reinhard Kraasch | Generate image descriptions and locality info | Done 19:32, 11 May 2010 (UTC) |
Reinhard Kraasch | Test upload | Done 19:32, 11 May 2010 (UTC) | 100 images
Various | Discussion of test upload | Done 14:00, 24 May 2010 (UTC) |
Reinhard Kraasch | Fixes to upload procedure | Done 14:00, 24 May 2010 (UTC) |
Reinhard Kraasch | Actual image upload | Done 19:51, 28 May 2010 (UTC) | 20100+ images
Reinhard Kraasch | Delete image duplicates | Done 08:40, 2 June 2010 (UTC) | 1800+ images
Reinhard Kraasch | Generate missing locality categories | Not necessary |
Reinhard Kraasch | Delete duplicate locality categories | Not necessary |