Commons talk:Wiki Loves Monuments 2011/Monuments database

From Wikimedia Commons, the free media repository
Jump to: navigation, search

draft on two new fields (?)[edit]

monument_article[edit]

monument_article varchar(255) NOT NULL DEFAULT ''

monument_article would contain wiki article title about monument

generation:

  • if local table has field for article name that will be used
  • otherwise article is extracted from the name field -- first wikilink in the name field (in most cases this is article about the monument)

formatting: monument_article would be formatted same as page_title in http://www.mediawiki.org/wiki/Manual:Page_table -- i.e. spaces replaced with '_' and first letter converted to uppercase

registrant_url[edit]

registrant_url varchar(255) NOT NULL DEFAULT ''

registrant_url would contain registrant url of the monument, like http://www.culture.gouv.fr/public/mistral/merimee_fr?ACTION=CHERCHER&FIELD_1=REF&VALUE_1=PA00088517, http://register.muinas.ee/?menuID=monument&action=view&id=20875, etc

generation: if registrant_url is generated in local row template, then it can be also generated for database (generated from id field, etc)

--WikedKentaur (talk) 16:56, 14 February 2012 (UTC)

France[edit]

If I understand correctly, the database has the municipality ("commune") for less than half of the French monuments. You can find the full list, with commune and Mérimée ID on data.gouv.fr [1].--Zolo (talk) 13:01, 24 March 2012 (UTC)

What do you base this one? According to the statistics it's almost complete (99.48%). Did I miss something? Multichill (talk) 13:34, 24 March 2012 (UTC)
Ok, found it: SELECT COUNT(*) FROM `monuments_fr_(fr)` WHERE commune='non' LIMIT 5; gives 11579. Why did people put in "non" there? This should be updated in the lists on the French Wikipedia. Multichill (talk) 13:38, 24 March 2012 (UTC)

DB field questions[edit]

I'm working on a prototype mw:Wiki Loves Monuments mobile application and have some questions & comments on the API...

  1. JSON output sends a PHP error message before the contents
  2. 'lat' and 'lon' are sent as strings in JSON output, should probably be numbers
  3. 'name' field sometimes contains wiki markup; should we assume that's legit and strip out the links / format links on display?
    • 'address' and 'municipality' also often contain links! 'address' sometimes includes templates for geo-coordinates.
  4. 'address' and 'municipality' look like they can sometimes go together to get an address that could be passed to a Maps application to get directions, but not always. In particular I notice US entries seem to list the county for municipality, and include city (and/or state hiding under a link) in the address. Are these usable as a pair in other countries, or should we only use addresses as display and try to link to maps using lat/lon?
  5. lots of entries are missing lat/lon
  6. What does the 'lang' represent, specifically? Are fields like 'monument_article' expected to be on the wiki with that language code? Does it affect anything else? Is it possible to have the same monument listed under multiple languages, or should we assume monument entries are unique and language is just for labeling and linking?
  7. What do 'source' and 'registrant_url' represent? These are often empty or kind of unclear.
  8. There appear to be entries with bad IDs ("???" or with ''' bold markup on them), how can these get cleaned up?
  9. Is there a bounding box limit on geographic searches? Nothing shows up around San Francisco or Berlin on my initial test searches; I suspect there's nothing listed for SF yet and I'm not sure about Berlin, but I know there are lots of things elsewhere in Germany.
    • (It looks like srlat and srlon don't do what I'd think, but maybe do exact searches or something? I'm trying with the 'bbox' parameter instead, setting my own bounding box... unfortunately the toolserver API seems to have hung now and isn't accessible to do more testing just now...)
      • (Ok I got 'bbox' working -- it wants (lon1,lat1,lon2,lat2) where I was expecting (lat1,lon1,lat2,lon2). I can now confirm that there *are* entries in San Francisco, which will make testing a lot easier for us. :)
  10. There doesn't appear to be a way to get a list of available countries/regions; I can hardcode them but that seems awkward. If those can be made queriable, especially along with names, that would be fantastic.
  11. are the pages listed in 'monument_article' supposed to actually exist? On some spot checks (assuming they live on 'lang'.wikipedia.org) some exist, but some don't.

Ok that should do it for starters. :) --brion (talk) 20:50, 9 May 2012 (UTC)

Doing the suggested '%foo%' LIKE-style search on srname is SUUUUUPPPPEEEEERRRRRR slow, taking a few minutes just to return no results. It would really be nice to be able to narrow searches by name, so this needs fixing... --brion (talk) 20:57, 14 May 2012 (UTC)

Hi Brion, sorry for the late reply, was on holiday and it slipped my mind. I numbered your questions so it's easier to answer them.
  1. Ok. That should be debugged and fixed
  2. Agree, another FIXME
  3. You can assume that all varchar/string fields can contain wiki markup
  4. Different sources have different types of information. Currently we have three fields describing the location (besides lat/lon): Country, municipality, address. I'm thinking about expanding this to a more layered system. So in the US that would be country, state, county, town/city/municipality/locality.
  5. Yes, depends per country, see Commons:Wiki Loves Monuments 2011/Monuments database/Statistics
  6. Source language of the entry. Lang code is always the same as the Wikipedia language code. monument_article is always on that Wiki. It's part of the primary key. It's possible to have the same entry in multiple languages (Switzerland, Belgium do have some examples)
  7. Source is a permalink to the Wikipedia article we got the information from. This is for debugging and maybe attribution. registrant_ur is a link to the particular monument in a register. Category:Cultural heritage monuments with known IDs contains examples here at Commons. For example File:Amsterdam - Oudezijds Achterburgwal 53.jpg links to http://monumentenregister.cultureelerfgoed.nl/php/main.php?cAction=search&cOffset=1&cLimit=25&oOrder=ASC&cSubmit=1&sCompMonNr=15.
  8. That's a hard one. I should probably do that at input with a regex to only allow id's which match the regex for that particular source.
  9. It uses the standard bounding box in use by Google and OSM. See here. You might want to play around with the toolbox
  10. Yes, that's something we should add.
  11. Nope, some sources have this as a field, for others it's the first wiki field extracted. It's all defined here
As for the slowness, that's probably because we don't have an index on that (yet). Did you see the infographic at User:ErfgoedBot? All the source code is public in https://fisheye.toolserver.org/browse/erfgoed/
  1. Update program and the configuration
  2. Exact defines in p_erfgoed_p at sql.toolserver.org (or full dump of the database)
  3. Api source code. Design by Platonides based on the MediaWiki api, so the code should look familiar.
Multichill (talk) 12:53, 21 May 2012 (UTC)
  1. Is the number of images for a given monument retrievable? How about the images themselves? --Philinje (talk) 21:36, 23 May 2012 (UTC)

How to add another source do the monuments database?[edit]

Could somebody help me with adding Ukrainian lists? Unfortunately I don't have the "erfgoed" account and can't deal with mysql. --A1 (talk) 17:37, 14 July 2012 (UTC)