Commons:Batch uploading/MH Ile-de-France
Uploading 3018 pictures of historical monuments, with their metadata, from the Base Mémoire and data.iledefrance.fr, using Special:GWToolset, to Category:Files from Base Mémoire via_data.iledefrance.fr.
Metadata is pre-processed with Python and converted to XML. The XML is fed to the GWToolset, which uses the mapping GWToolset:Metadata Mappings/Jean-Frédéric/MH IDF.json to pass each field to the ingestion template {{Ingestion-MH IDF}}. A sample record:
<record>
  <_ext>jpg</_ext>
  <lieucor></lieucor>
  <commons_title>Eglise_Saint-Quiriace_-_Choeur_-_Provins_-_Médiathèque_de_l'architecture_et_du_patrimoine_-_APMH00006801</commons_title>
  <wgs84_1>3.290922</wgs84_1>
  <wgs84_0>48.560847</wgs84_0>
  <_filename>sap01_mh006801_p.jpg</_filename>
  <filename>http://www.culture.gouv.fr/Wave/image/memoire/0010/sap01_mh006801_p.jpg</filename>
  <autoeu></autoeu>
  <merimee_id>PA00087203</merimee_id>
  <ref>APMH00006801</ref>
  <reg>Ile-de-France</reg>
  <nump>MH0006801</nump>
  <leg>Choeur</leg>
  <adresse></adresse>
  <serie></serie>
  <insee>77379</insee>
  <autp>{{Creator:Jean-Eugène Durand}}</autp>
  <copy>Ministère de la Culture (France) - Médiathèque de l'architecture et du patrimoine - diffusion RMN</copy>
  <categories>[[Category:Collegiate Saint-Quiriace of Provins]] [[Category:Jean-Eugène Durand]]</categories>
  <dpt>77</dpt>
  <scle></scle>
  <datpv></datpv>
  <typdoc>Négatif</typdoc>
  <video_p>http://data.iledefrance.fr/api/datasets/1.0/photographies-serie-monuments-historiques-1851-a-1914/images/e434ac6fcaad3030fd11eec78bdee2ff/</video_p>
  <video_v>http://www.culture.gouv.fr/Wave/image/memoire/0010/sap01_mh006801_v.jpg</video_v>
  <_url>http://www.culture.gouv.fr/Wave/image/memoire/0010/sap01_mh006801_p.jpg</_url>
  <edif>Eglise Saint-Quiriace</edif>
  <com>Provins</com>
</record>
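The Python pre-processing script itself is not shown on this page. As a rough sketch only, a record of this shape could be produced with the standard library as below; the function name `record_to_xml` and the reduced field set are illustrative assumptions, not the script actually used.

```python
import xml.etree.ElementTree as ET


def record_to_xml(fields):
    """Build a <record> element with one child element per metadata field.

    `fields` maps element names (e.g. "edif", "insee") to string values.
    Hypothetical helper: the real pre-processing code is not published here.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(record, name)
        child.text = value or ""  # empty fields stay as empty elements
    return record


# A reduced sample taken from the record shown above.
sample = {
    "ref": "APMH00006801",
    "edif": "Eglise Saint-Quiriace",
    "leg": "Choeur",
    "com": "Provins",
    "insee": "77379",
    "adresse": "",
}

# short_empty_elements=False writes <adresse></adresse> rather than <adresse />
xml_text = ET.tostring(
    record_to_xml(sample), encoding="unicode", short_empty_elements=False
)
print(xml_text)
```

Empty fields are kept as empty elements so that every record exposes the same set of element names to the GWToolset mapping.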
Alignment
Categorisation
Through the alignment some categorisation is made. Here are the numbers.
- Updated Jean-Fred (talk) 23:17, 9 July 2014 (UTC)
- Per category
- The collection has 6223 category assignments, 136 of them distinct
- The most used category is on 956 files
- The least used is on 1 file
- On average, a category is used 45.8 times (mean)
- The median is: 11.5
- The 10 most used categories are:
  - Jean-Eugène Durand - 956
  - Churches in Île-de-France - 615
  - Photographs by Eugène Atget - 518
  - Camille Enlart - 418
  - 1892 photographs - 282
  - Photographs by Séraphin-Médéric Mieusement - 280
  - Paul Robert (photograph) - 246
  - 1900 photographs - 179
  - Castles in Île-de-France - 177
  - Historical images of Notre-Dame de Paris - 165
- The 10 least used categories are:
  - 1877 photographs - 4
  - 1925 photographs - 3
  - 1902 photographs - 3
  - 1914 photographs - 3
  - 1852 photographs - 2
  - 1891 photographs - 2
  - 1906 photographs - 2
  - 1916 photographs - 2
  - 1896 photographs - 1
  - 1926 photographs - 1
- Per file
- The most categorized file has 3 categories
- The least categorized file has 0 categories
- We have 21 uncategorized files
- We have 2430 files with two categories or more, which makes 80.5%
- On average, a file has 2.1 categories (mean)
- The median is: 2.0
- Per category
- The collection has 9237 category assignments, 316 of them distinct
- The most used category is on 956 files
- The least used is on 1 file
- On average, a category is used 29.2 times (mean)
- The median is: 8.0
- The 10 most used categories are:
  - Jean-Eugène Durand - 956
  - Churches in Île-de-France - 615
  - Photographs by Eugène Atget - 518
  - Camille Enlart - 418
  - Paris - 382
  - Paris IVe arrondissement - 317
  - 1892 photographs - 282
  - Photographs by Séraphin-Médéric Mieusement - 280
  - Paul Robert (photograph) - 246
  - Paris Ier arrondissement - 204
- The 10 least used categories are:
  - Bessancourt - 1
  - Sainte-Geneviève-des-Bois (Essonne) - 1
  - Parmain - 1
  - Hardricourt - 1
  - Yerres - 1
  - Souzy-la-Briche - 1
  - Coignières - 1
  - Chatou - 1
  - Angerville (Essonne) - 1
  - Vallangoujard - 1
- Per file
- The most categorized file has 4 categories
- The least categorized file has 1 category
- We have 0 uncategorized files
- We have 2996 files with two categories or more, which makes 99.3%
- On average, a file has 3.1 categories (mean)
- The median is: 3.0
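The script that produced these figures is not included here. A minimal sketch of how such per-category and per-file numbers can be derived is given below, assuming the categorisation is available as a mapping from file name to a list of category names; the function name `category_stats` and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def category_stats(files):
    """Summarise category usage.

    `files` maps a file name to the list of categories assigned to it.
    Hypothetical helper, not the script used for the numbers above.
    """
    # How many files each category is used on.
    usage = Counter(cat for cats in files.values() for cat in cats)
    counts = list(usage.values())
    # How many categories each file carries.
    per_file = [len(cats) for cats in files.values()]
    two_or_more = sum(1 for n in per_file if n >= 2)
    return {
        "assignments": sum(counts),
        "distinct": len(usage),
        "mean_per_category": mean(counts),
        "median_per_category": median(counts),
        "most_used": usage.most_common(10),
        "uncategorized": sum(1 for n in per_file if n == 0),
        "share_two_or_more": 100 * two_or_more / len(per_file),
        "mean_per_file": mean(per_file),
        "median_per_file": median(per_file),
    }


# Toy data, for illustration only.
toy = {
    "a.jpg": ["Jean-Eugène Durand", "Churches in Île-de-France"],
    "b.jpg": ["Jean-Eugène Durand"],
    "c.jpg": [],
    "d.jpg": ["Jean-Eugène Durand", "1892 photographs"],
}
stats = category_stats(toy)
print(stats)
```

The mean per category is simply the number of assignments divided by the number of distinct categories, which matches the figures reported above (6223 / 136 ≈ 45.8 and 9237 / 316 ≈ 29.2).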
Process
- Made a test run for 67 files. Mixed up two fields, resulting in the upload of the thumbnails, not the actual files >_< . See Category:Files from Base Mémoire. Jean-Fred (talk) 22:58, 6 July 2014 (UTC)
- Support: useful and unique files; except for the thumbnails, everything seems good and the metadata retrieval is perfect. VIGNERON (talk) 21:09, 9 July 2014 (UTC)
- Nuked the 67 files (yay VFC! \o/) − easier to start again. Jean-Fred (talk) 23:17, 9 July 2014 (UTC)
- Added some more code to match the INSEE code (given in the metadata) to the Commons category, using data from Wikidata (yay awesome Wikidata Query \o/). This raises the categorisation stats even further − see above. Jean-Fred (talk) 23:17, 9 July 2014 (UTC)
- Just launched a final test run with 20 files. I shall go with the rest of the upload unless someone complains :) Jean-Fred (talk) 23:17, 9 July 2014 (UTC)
- Gogogogogo ! Symac (talk) 07:12, 10 July 2014 (UTC)
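The INSEE matching mentioned above can be sketched as follows. This is an illustration under assumptions, not the code that was run: the upload used the Wikidata Query tool available in 2014, whereas the query string below is written in SPARQL for the present-day Wikidata Query Service and is shown for reference only (it is not executed). It relies on the Wikidata properties P374 (INSEE municipality code) and P373 (Commons category). The function name `add_commune_category` and the lookup table are hypothetical.

```python
# SPARQL shown for reference only; this sketch does not send it anywhere.
INSEE_TO_CATEGORY_QUERY = """
SELECT ?insee ?category WHERE {
  ?commune wdt:P374 ?insee ;     # P374: INSEE municipality code
           wdt:P373 ?category .  # P373: Commons category
}
"""


def add_commune_category(record, insee_to_category):
    """Return the record's categories plus the commune category, if known.

    `insee_to_category` maps an INSEE code to a Commons category name,
    e.g. as built from the results of a Wikidata query. Hypothetical helper.
    """
    categories = list(record.get("categories", []))
    commune = insee_to_category.get(record.get("insee", ""))
    if commune and commune not in categories:
        categories.append(commune)
    return categories


# Hypothetical lookup result; the real table would come from Wikidata.
lookup = {"77379": "Provins"}
record = {
    "insee": "77379",
    "categories": ["Collegiate Saint-Quiriace of Provins", "Jean-Eugène Durand"],
}
print(add_commune_category(record, lookup))
```

Records whose INSEE code has no match are returned unchanged, so a gap in the lookup table never removes existing categories.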
Open questions
- At the moment the technique “Négatif” is not internationalised (i18n). Any ideas on which template to use? {{Technique}}? Jean-Fred (talk) 23:23, 9 July 2014 (UTC)
- I have arbitrarily matched:
- <edif> → Title
- <leg> → Description
- <adresse>/<com>/<reg> → Depicted place
- Is that really a good idea? Jean-Fred (talk) 23:23, 9 July 2014 (UTC)
- I've reviewed half of the files uploaded, the mapping looks good to me. --PierreSelim (talk) 06:39, 10 July 2014 (UTC)
- As PierreSelim, looking at a sample of these 20 files, it looks good to me, good job Jean-Fred. Symac (talk) 07:09, 10 July 2014 (UTC)
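For readers who prefer code to arrows, the field matching listed above can be sketched as below. The actual mapping lives in the GWToolset JSON mapping and the ingestion template; the function name `map_record`, the output key names, and the choice to join the place fields with commas are assumptions made for illustration.

```python
def map_record(record):
    """Map source fields to the three target fields discussed above.

    <edif> -> title, <leg> -> description,
    <adresse>/<com>/<reg> -> depicted place (non-empty parts, comma-joined).
    Hypothetical sketch; key names and the separator are assumptions.
    """
    place_parts = [record.get(key, "") for key in ("adresse", "com", "reg")]
    return {
        "title": record.get("edif", ""),
        "description": record.get("leg", ""),
        "depicted place": ", ".join(part for part in place_parts if part),
    }


# Values taken from the sample record on this page.
sample_record = {
    "edif": "Eglise Saint-Quiriace",
    "leg": "Choeur",
    "adresse": "",
    "com": "Provins",
    "reg": "Ile-de-France",
}
mapped = map_record(sample_record)
print(mapped)
```

Skipping empty parts avoids a leading comma when, as in the sample record, the address is blank.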
Done & todo list
Done. To do list:
- 41 files were not uploaded − this must be investigated
APMH00004658 APMH00004661 APMH00005393 APMH00005410 APMH00005420 APMH00005430 APMH00005441 APMH00005449 APMH00005452 APMH00005867 APMH00006091 APMH00006496 APMH00007233 APMH00008966 APMH00014042 APMH00014115 APMH00016263 APMH00016774 APMH00016775 APMH00016776 APMH00016777 APMH00016778 APMH00016779 APMH00016780 APMH00017547 APMH00017548 APMH00017549 APMH00017550 APMH00017551 APMH00017552 APMH00017553 APMH00017554 APMH00017555 APMH00035692 APMH00035987 APMH00037555 APMH00037561 APMH00037562 APMH00037564 APMH00037767 APMH00037929
- Files must be renamed to avoid the double extension
- Ingestion template must be substed.
Jean-Fred (talk) 17:44, 19 July 2014 (UTC)
- All files are now renamed − huge thanks to Steinsplitter who lent me his bot for the task :-) Jean-Fred (talk) 10:39, 28 July 2014 (UTC)
- All ingestion templates have now been recursively substed. Jean-Fred (talk) 12:58, 2 November 2014 (UTC)
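Two of the follow-up tasks above lend themselves to small helpers. The sketch below is illustrative only: the renaming was actually done with a bot, and the exact form of the double extension is not stated on this page, so the pattern (a repeated trailing extension such as ".jpg.jpg") is an assumption. The names `strip_double_extension` and `missing_refs` are hypothetical.

```python
import re


def strip_double_extension(title):
    """Collapse a repeated trailing extension ('.jpg.jpg' -> '.jpg').

    Assumes the double extension is an exact repetition at the end of
    the title; other shapes would need a different pattern.
    """
    return re.sub(r"(\.[A-Za-z0-9]+)\1$", r"\1", title)


def missing_refs(expected, uploaded):
    """Return the references present in the source data but not uploaded."""
    return sorted(set(expected) - set(uploaded))


print(strip_double_extension("Eglise Saint-Quiriace - APMH00006801.jpg.jpg"))
print(missing_refs(["APMH00004658", "APMH00006801"], ["APMH00006801"]))
```

Comparing the set of expected references with the set of uploaded ones is one simple way to obtain a list like the 41 identifiers above before investigating each failure.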
Assigned to | Progress | Bot name | Category |
---|---|---|---|
User:Jean-Frédéric |