This is a quick collection of issues with the metadata supplied by the Bundesarchiv. Most of them result from the fact that metadata is embedded in the images using the IPTC format, which cannot express all the properties in the way we would like. Since we already get supplementary data, like people depicted, in an XML file, it would be nice to get all metadata, as detailed as possible, that way. IPTC may be used in addition to that, but not as the sole means of transferring metadata to Commons.
Concrete problems are:
- Date ranges and qualifications are not retained. The Bundesarchiv has some pages labeled with dates like "1867/1917" or "1932 ca.". These get truncated to "1867" and "1932" respectively, which is NOT good.
- The location of the image is not available separately. The Bundesarchiv stores information about the location of an image separately, but we only get it mashed into the headline -- where it may get confused with topics.
- Captions for different times and sources get mashed together in a single text. This becomes problematic especially when (part of) the caption is heavy propaganda. It would be nice to have captions from different times/sources separated in a machine-readable way, if that is possible.
- Image captions get cut off after 255 characters.
- please provide an example. I thought I fixed that bug before we even started the bulk upload. -- Duesentrieb ⇌ 10:05, 15 December 2008 (UTC)
- The XML provided by the Bundesarchiv is sometimes malformed, containing unescaped &-characters.
- Umlauts and other German special chars are broken: example. Matt (talk) 19:29, 28 February 2009 (UTC)
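
Two of the points above (date qualifications and unescaped &-characters) could be worked around on our side. The following is only a rough sketch, not what the upload bot actually does. It assumes that dates only come in the two shapes quoted above ("1867/1917" and "1932 ca."), and the names `ArchiveDate`, `parse_archive_date` and `escape_bare_ampersands` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class ArchiveDate(NamedTuple):
    """Hypothetical container that keeps the range and the "ca." qualifier."""
    start: int
    end: Optional[int]
    circa: bool


# Assumption: a four-digit year, optionally "/" plus a second year,
# optionally followed by "ca.". Other shapes are not handled.
_DATE_RE = re.compile(r"^\s*(\d{4})(?:\s*/\s*(\d{4}))?(\s+ca\.)?\s*$")


def parse_archive_date(raw: str) -> Optional[ArchiveDate]:
    """Parse a date string without dropping the range or qualifier.

    Returns None for anything that does not match the assumed shapes,
    so unknown formats can be flagged instead of silently truncated.
    """
    m = _DATE_RE.match(raw)
    if m is None:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else None
    return ArchiveDate(start, end, m.group(3) is not None)


# An "&" that does not already begin something shaped like an entity
# (&name;) or a character reference (&#228; / &#xE4;). Note that this
# only checks the shape, not whether the named entity is defined in XML.
_BARE_AMP_RE = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace bare "&" with "&amp;" before handing the text to an XML parser."""
    return _BARE_AMP_RE.sub("&amp;", xml_text)
```

With this, `parse_archive_date("1867/1917")` keeps both years and `parse_archive_date("1932 ca.")` keeps the "ca." flag. The ampersand fix is a heuristic: it leaves anything that already looks like an entity alone, so it can be run on the XML before parsing without double-escaping.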