Commons:Batch uploading/MH Ile-de-France

From Wikimedia Commons, the free media repository

MH Ile-de-France

Uploading 3018 pictures of historical monuments, with their metadata, from the Base Mémoire and data.iledefrance.fr to Category:Files from Base Mémoire via_data.iledefrance.fr, using Special:GWToolset.

Metadata is pre-processed with Python and converted to XML. The XML is fed to the GWToolset, which uses the mapping GWToolset:Metadata Mappings/Jean-Frédéric/MH IDF.json to pass each field to the ingestion template {{Ingestion-MH IDF}}.

Metadata sample
  <record>
    <_ext>jpg</_ext>
    <lieucor></lieucor>
    <commons_title>Eglise_Saint-Quiriace_-_Choeur_-_Provins_-_Médiathèque_de_l'architecture_et_du_patrimoine_-_APMH00006801</commons_title>
    <wgs84_1>3.290922</wgs84_1>
    <wgs84_0>48.560847</wgs84_0>
    <_filename>sap01_mh006801_p.jpg</_filename>
    <filename>http://www.culture.gouv.fr/Wave/image/memoire/0010/sap01_mh006801_p.jpg</filename>
    <autoeu></autoeu>
    <merimee_id>PA00087203</merimee_id>
    <ref>APMH00006801</ref>
    <reg>Ile-de-France</reg>
    <nump>MH0006801</nump>
    <leg>Choeur</leg>
    <adresse></adresse>
    <serie></serie>
    <insee>77379</insee>
    <autp>{{Creator:Jean-Eugène Durand}}</autp>
    <copy>Ministère de la Culture (France) - Médiathèque de l'architecture et du patrimoine - diffusion RMN</copy>
    <categories>[[Category:Collegiate Saint-Quiriace of Provins]]
[[Category:Jean-Eugène Durand]]</categories>
    <dpt>77</dpt>
    <scle></scle>
    <datpv></datpv>
    <typdoc>Négatif</typdoc>
    <video_p>http://data.iledefrance.fr/api/datasets/1.0/photographies-serie-monuments-historiques-1851-a-1914/images/e434ac6fcaad3030fd11eec78bdee2ff/</video_p>
    <video_v>http://www.culture.gouv.fr/Wave/image/memoire/0010/sap01_mh006801_v.jpg</video_v>
    <_url>http://www.culture.gouv.fr/Wave/image/memoire/0010/sap01_mh006801_p.jpg</_url>
    <edif>Eglise Saint-Quiriace</edif>
    <com>Provins</com>
  </record>
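
A record like the sample above could be produced along the following lines. This is only a minimal sketch, not the script actually used for this upload: the function names, the `records` root element and the rule for building `commons_title` (building, caption, town, institution, reference) are assumptions inferred from the sample.

```python
import xml.etree.ElementTree as ET

# Assumption: the institution name is a fixed part of every title.
INSTITUTION = "Médiathèque de l'architecture et du patrimoine"


def build_title(fields):
    """Join building, caption, town, institution and reference into a title."""
    parts = [fields.get("edif"), fields.get("leg"), fields.get("com"),
             INSTITUTION, fields.get("ref")]
    # Skip empty parts, then use underscores as in the sample title.
    return " - ".join(p for p in parts if p).replace(" ", "_")


def records_to_xml(rows):
    """Serialise a list of field dictionaries as <record> elements."""
    root = ET.Element("records")  # hypothetical root element name
    for row in rows:
        record = ET.SubElement(root, "record")
        for name, value in row.items():
            ET.SubElement(record, name).text = value or ""
        ET.SubElement(record, "commons_title").text = build_title(row)
    # short_empty_elements=False writes <lieucor></lieucor>, not <lieucor />.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


sample = {
    "ref": "APMH00006801",
    "edif": "Eglise Saint-Quiriace",
    "leg": "Choeur",
    "com": "Provins",
    "lieucor": "",
}
xml_text = records_to_xml([sample])
print(xml_text)
```

Empty source fields are kept as empty elements so that every record exposes the same set of fields to the mapping.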

Alignment

Explanations

Categorisation

The alignment assigns categories to the files. The figures below summarise the result.

Deprecated (earlier figures, superseded by the second set of statistics below)
Per category
  • The collection has 6223 category assignments, 136 of them distinct
  • The most used category is on 956 files
  • The least used category is on 1 file
  • On average, a category is used 45.8 times (mean)
  • The median is: 11.5
  • The 10 most used categories are:
      • Jean-Eugène Durand - 956
      • Churches in Île-de-France - 615
      • Photographs by Eugène Atget - 518
      • Camille Enlart - 418
      • 1892 photographs - 282
      • Photographs by Séraphin-Médéric Mieusement - 280
      • Paul Robert (photograph) - 246
      • 1900 photographs - 179
      • Castles in Île-de-France - 177
      • Historical images of Notre-Dame de Paris - 165
  • The 10 least used categories are:
      • 1877 photographs - 4
      • 1925 photographs - 3
      • 1902 photographs - 3
      • 1914 photographs - 3
      • 1852 photographs - 2
      • 1891 photographs - 2
      • 1906 photographs - 2
      • 1916 photographs - 2
      • 1896 photographs - 1
      • 1926 photographs - 1

Per file
  • The most categorized file has 3 categories
  • The least categorized file has 0 categories
  • We have 21 uncategorized files
  • We have 2430 files with two categories or more, which makes 80.5%
  • On average, a file has 2.1 categories (mean)
  • The median is: 2.0
Per category
  • The collection has 9237 category assignments, 316 of them distinct
  • The most used category is on 956 files
  • The least used category is on 1 file
  • On average, a category is used 29.2 times (mean)
  • The median is: 8.0
  • The 10 most used categories are:
      • Jean-Eugène Durand - 956
      • Churches in Île-de-France - 615
      • Photographs by Eugène Atget - 518
      • Camille Enlart - 418
      • Paris - 382
      • Paris IVe arrondissement - 317
      • 1892 photographs - 282
      • Photographs by Séraphin-Médéric Mieusement - 280
      • Paul Robert (photograph) - 246
      • Paris Ier arrondissement - 204
  • The 10 least used categories are:
      • Bessancourt - 1
      • Sainte-Geneviève-des-Bois (Essonne) - 1
      • Parmain - 1
      • Hardricourt - 1
      • Yerres - 1
      • Souzy-la-Briche - 1
      • Coignières - 1
      • Chatou - 1
      • Angerville (Essonne) - 1
      • Vallangoujard - 1

Per file
  • The most categorized file has 4 categories
  • The least categorized file has 1 category
  • We have 0 uncategorized files
  • We have 2996 files with two categories or more, which makes 99.3%
  • On average, a file has 3.1 categories (mean)
  • The median is: 3.0
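
Statistics of this kind can be recomputed from a mapping of file names to category lists. The sketch below is illustrative only: the function name, the dictionary layout and the three-file sample are assumptions, not the script that produced the figures above.

```python
from collections import Counter
from statistics import mean, median


def category_stats(file_categories):
    """Summarise category usage; file_categories maps file name -> categories."""
    # Count how many files each category is attached to.
    usage = Counter(cat for cats in file_categories.values() for cat in cats)
    counts = list(usage.values())
    per_file = [len(cats) for cats in file_categories.values()]
    multi = sum(1 for n in per_file if n >= 2)
    return {
        "assignments": sum(counts),            # total category uses
        "distinct": len(usage),                # distinct categories
        "mean_per_category": mean(counts),
        "median_per_category": median(counts),
        "most_used": usage.most_common(10),
        "uncategorized": sum(1 for n in per_file if n == 0),
        "multi_share": 100 * multi / len(per_file),  # % of files with 2+ categories
        "mean_per_file": mean(per_file),
        "median_per_file": median(per_file),
    }


# Hypothetical three-file sample, for illustration only.
files = {
    "a.jpg": ["Jean-Eugène Durand", "Provins"],
    "b.jpg": ["Jean-Eugène Durand"],
    "c.jpg": [],
}
stats = category_stats(files)
print(stats)
```

The reported means follow directly from the totals: 9237 / 316 ≈ 29.2 uses per category, 9237 / 3018 ≈ 3.1 categories per file, and 2996 / 3018 ≈ 99.3% of files with two categories or more.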

Process

Open questions

Is that really a good idea? Jean-Fred (talk) 23:23, 9 July 2014 (UTC)
I've reviewed half of the files uploaded; the mapping looks good to me. --PierreSelim (talk) 06:39, 10 July 2014 (UTC)
Like PierreSelim, I looked at a sample of these 20 files and it looks good to me. Good job, Jean-Fred. Symac (talk) 07:09, 10 July 2014 (UTC)

Done & todo list

Done. To do list:

  • 41 files were not uploaded; this must be investigated:

APMH00004658 APMH00004661 APMH00005393 APMH00005410 APMH00005420 APMH00005430 APMH00005441 APMH00005449 APMH00005452 APMH00005867 APMH00006091 APMH00006496 APMH00007233 APMH00008966 APMH00014042 APMH00014115 APMH00016263 APMH00016774 APMH00016775 APMH00016776 APMH00016777 APMH00016778 APMH00016779 APMH00016780 APMH00017547 APMH00017548 APMH00017549 APMH00017550 APMH00017551 APMH00017552 APMH00017553 APMH00017554 APMH00017555 APMH00035692 APMH00035987 APMH00037555 APMH00037561 APMH00037562 APMH00037564 APMH00037767 APMH00037929

  • Files must be renamed to avoid the double extension
  • Ingestion template must be substed.

Jean-Fred (talk) 17:44, 19 July 2014 (UTC)
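
The first two to-do items lend themselves to a small script. The sketch below is hypothetical: it assumes that every uploaded title contains its APMH reference and that the double extension is a repeated suffix such as `.jpg.jpg`. Neither assumption is confirmed on this page, and the function names are invented for illustration.

```python
import re

# Assumption: references look like "APMH" followed by eight digits.
REF_PATTERN = re.compile(r"APMH\d{8}")


def missing_refs(expected_refs, uploaded_titles):
    """Return the expected references that appear in no uploaded title."""
    uploaded = set()
    for title in uploaded_titles:
        uploaded.update(REF_PATTERN.findall(title))
    return sorted(set(expected_refs) - uploaded)


def strip_double_extension(title):
    """Collapse a repeated trailing extension, e.g. 'x.jpg.jpg' -> 'x.jpg'."""
    return re.sub(r"(\.\w+)\1$", r"\1", title, flags=re.IGNORECASE)


expected = ["APMH00004658", "APMH00006801"]
titles = ["Eglise_Saint-Quiriace_-_Choeur_-_Provins_-_APMH00006801.jpg.jpg"]

print(missing_refs(expected, titles))
print(strip_double_extension(titles[0]))
```

Titles with a single extension are returned unchanged, so the renaming step can safely be run over the whole category.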

All files are now renamed. Huge thanks to Steinsplitter who lent me his bot for the task :-) Jean-Fred (talk) 10:39, 28 July 2014 (UTC)
All ingestion templates have now been recursively substed. Jean-Fred (talk) 12:58, 2 November 2014 (UTC)
Assigned to: User:Jean-Frédéric
Progress:
Bot name:
Category: