Commons talk:Structured data
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.
Scanned page of a book: Depicts page (Q1069725) or book (Q571)?
For ex https://commons.wikimedia.org/wiki/File:Voyage_ou_il_vous_plaira_1843_(144926870).jpg I would say it's depicting a page? (if the book was closed then maybe book would be applicable?) Curious what you guys think Thibaultmol (talk) 23:32, 25 July 2022 (UTC)
- Made a cross reference to this question on Wikidata:Property talk:P180#Open questions on how to use this property on Commons talk:Structured data. -- Juergen 217.61.195.60 23:02, 22 October 2022 (UTC)
- Neither of those should be used for Depicts. Book or page is the medium being used. Consider a file that was taken with a camera. We don't use Depicts→Photo. We use Instance of→digital photograph. -Senator2029 06:12, 4 January 2023 (UTC)
Proposal to remove duplicate information from file description templates that is already stored in structured data
For several years now bots operated by Multichill (talk · contribs) and myself have been focusing on replicating information from file descriptions in structured data. Since then adoption and template support for structured data has significantly improved. The next step would be to remove the duplicate information from file descriptions and use the information in structured data instead. If the information is only removed from template parameters that are automatically recovered by the corresponding template from associated structured data, there will be no visual change to the file description page. Please share your thoughts on a corresponding proposal at: Commons:Bots/Requests/SchlurcherBot11 --Schlurcher (talk) 22:17, 1 November 2022 (UTC)
- I am in principle in favor of storing information in one place and showing it in other places, and of getting rid of duplicate information. My question: can someone who is not familiar with structured data still change (all the) information in the file description page, also when that information is stored in structured data? JopkeB (talk) 06:52, 13 January 2023 (UTC)
Distributed by
At File:Bertrand Blanchard Acosta (1895-1954) obituary in the New Bridgeport Telegram of Bridgeport, Connecticut on September 2, 1954 by the Associated Press.png for structured data, I am not allowed to use "distributed by". Should I delete it, or should "distributed by" be added as a valid Property? It would be a way to find all the articles distributed by the Associated Press or other news agencies. Or would you prefer "Associated Press" as an author/creator? --RAN (talk) 20:12, 28 November 2022 (UTC)
Creators with Wikidata items
See Commons:Bots/Work requests#Creators with Wikidata items --Nintendofan885T&Cs apply 13:06, 6 December 2022 (UTC)
- Now archived, but still needs doing. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:33, 27 January 2023 (UTC)
Interwikilinking, descriptions and pinging
Hello, someone hanging around to reach out? Since a couple of days, I have problems with all the above. Did something change? To give an ʽFile:Histoire du tissu ancien à l'exposition de l'Union centrale des arts décoratifs (1883) (14597541669).jpg|example] to show how it shows when I wish to add hard returns or wikilink to the file. Also, I cannot ping anymoreː ̺ping|Vystoskyˌ
thank you so much for your time. Lotje (talk) 09:50, 9 January 2023 (UTC)
- Didn't you accidentally activate "Input tools"? On the top of the page, try clicking the language name you use, then Input settings and Disable input tools. --Matěj Suchánek (talk) 07:30, 10 January 2023 (UTC)
- Thank you so much Matěj Suchánek, indeed, going to the top of the page I was able to de activate the input tools. I think it is okay now. Cheers. Lotje (talk) 09:11, 10 January 2023 (UTC)
located in the administrative territorial entity
When adding this property to an image taken in the US should the value be the county the photo was taken in or should it be the state it was taken in? Trade (talk) 15:03, 27 January 2023 (UTC)
- @Trade: The most local administrative entity possible. County (or, in Louisiana, parish) always wins over state. Often a city, census-designated place, etc. will be even more specific. In the weird case of New York City, where five boroughs/counties are within the city, the borough is preferred; in other weird cases like Bothell, Washington, where a city spans two counties, this would ideally be double-valued, indicating both city and county. But if none of that is known, state beats having nothing. - Jmabel ! talk 16:09, 27 January 2023 (UTC)
- @Trade: according to Commons:Structured data/Modeling/Location, it's not clear that P131 is intended to be used like that at all. Strictly speaking all files are located in the Wikimedia servers. Strakhov (talk) 19:10, 27 January 2023 (UTC)
- I assume there would have been constraints if that wasn't intended Trade (talk) 19:45, 27 January 2023 (UTC)
- @Strakhov: it's not about the location of the file, it's about the location represented in the image. - Jmabel ! talk 01:03, 28 January 2023 (UTC)
- located in the administrative territorial entity (P131) should only be used for the lowest administrative level with a self-administration (NUTS 4/5 or LAU 1/2). In most cases this will be the municipality. location (P276) can be used for more specific location without any kind of autonomy like neighborhoods. GPSLeo (talk) 07:06, 28 January 2023 (UTC)
- @Jmabel: . Of course I'm aware you are trying to "structure" the location of places depicted in photos. I'm just pointing out how current guidelines suggest P1071, P7108 and P180 should be used for this. You are certainly free to not follow them. Apart from that, I refer to Multichill's comment here. Strakhov (talk) 08:09, 28 January 2023 (UTC)
- For UK images bulk-imported from Geograph, I think Multichill has been using location of creation (P1071) to indicate the administrative entity where an image has been taken; as distinct from location of the point of view (P7108) if we can identify a specific structure the pic has been taken from (eg bridge, viaduct, etc).
- I know Multichill has also been adding the administrative entity as a depicts (P180) value. I am uncertain as to whether that is so appropriate, given that (even for an image say of a valley, never mind an image just of a brick) the image will be at most depicting only such a small fraction of the administrative entity as a whole. But that can perhaps be an open topic for discussion. Jheald (talk) 13:24, 28 January 2023 (UTC)
- I think that's a good approach (location of creation (P1071) as a property to indicate places (mostly "administrative", and the most precise) where the picture was taken, and location of the point of view (P7108) for buildings, mountains, bridges (or administrative territorial entities when used in art works depicting places, too)). On the contrary, IMHO depicts (P180) should only be used when the depicted item ...is depicted in a significant manner. Not an infinitesimal fraction of it (wrt municipalities, valleys may be OK with P180 (or they may not), but bricks, portraits of people, or doors certainly would not be OK). Strakhov (talk) 13:42, 28 January 2023 (UTC)
- @Trade: these bulk edits are just plain incorrect. All images in this search should, as Jheald pointed out, use location of creation (P1071). Maybe use a bot to fix these? I fixed the constraint.
- @Jheald: someone (forgot who) was very insistent on me adding depicts (P180) with the (broad) location. Of course you can always replace it with the more specific item like pushing it down the category tree. For example if this file would have depicts (P180) -> Haarlem (Q9920) you can replace it with depicts (P180) -> Grote Kerk (Q1545193) because Grote Kerk (Q1545193) is linked to Haarlem (Q9920) through located in the administrative territorial entity (P131). We don't want "COM:OVERDEPICTS" :-) (free after COM:OVERCAT). Multichill (talk) 13:48, 28 January 2023 (UTC)
- @Multichill: someone (forgot who) was very insistent on me adding depicts (P180) with the (broad) location: then they were insistently wrong. - Jmabel ! talk 17:08, 28 January 2023 (UTC)
- I disagree with that classification. It might be very imprecise, but it's not wrong otherwise I wouldn't have added it to several million files. Found it at Commons talk:Geograph Britain and Ireland/Reverse geocoding#Testing_in_Devon.
- File:The bridge at Poolewe - geograph.org.uk - 4342059.jpg is now the most recent upload. Doesn't that depict Gairloch (Q68815558)? Just like Category:Gairloch (civil parish) can be replaced with a more specific category, Gairloch (Q68815558) can be replaced with a more specific item. Multichill (talk) 17:20, 28 January 2023 (UTC)
- @Multichill: On the other hand, would we put depicts = human on an image of a fingernail ?
- For me I think there are a couple of things this raises: (a) we need to be able to distinguish when what is being depicted is all/most of the depicts (P180) value (so the P180 has a 'good' value) from when what is being depicted is only a fraction of the depicts (P180) value (so the P180 value could be improved). For this second case it may be useful to include a P180 value as a placeholder to indicate a slot for description there that is potentially capable of improvement (eg Gairloch (civil parish) -> Gairloch -> some part of Gairloch -> ...); but if we do this it seems to me important at least to qualify it with something like applies to part = somevalue / depicted part (P5961) = somevalue, to distinguish that usage from the case of the statement value being a 'good' P180 value.
- (b) there is the question of whether this use is redundant and/or inappropriate if the essentially same rather approximate location information is already being carried elsewhere by a location of creation (P1071) statement. Is flagging the existence of a slot to be filled a good enough reason to duplicate it with a P180 statement ?
- Also perhaps (c) Is the P180 statement necessary if we want the image to be included in the returns of an eg "Show me all images in Yorkshire" request ? Or would the presence of a 'somewhere in Yorkshire' P1071 statement be enough to capture searches for the keyword "Yorkshire" and/or get it to register in sensible ways to query for it ?
- I don't have decisive answers, just that these seem to be some things to think about for discussion. Jheald (talk) 18:04, 28 January 2023 (UTC)
- It is useful for cases like "I would like to geotag all of the photos in Yorkshire which don't have geotags". Of course it can be done also using Petscan and categories, but if we think that we should replace categories with SDC you cannot easily do this. -- Zache (talk) 18:28, 28 January 2023 (UTC)
- FWIW I don't think we should replace categories with SDC. I think each complement the other, and each can assist in populating the other, and both have a role to play (and should continue to be enhanced and invested in), both now and long-term. The question here though is: does the P180 tag help us to do something the P1071 doesn't ? Jheald (talk) 18:51, 28 January 2023 (UTC)
- Still, what are we doing if a photographer was located in one administrative unit and the object is located in another one? I have for example nice pictures of Olympic Peninsula (Washington, US) taken from Canada.--Ymblanter (talk) 13:18, 29 January 2023 (UTC)
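The "COM:OVERDEPICTS" idea from this thread (drop a broad depicts (P180) value once a more specific item is present that is linked to it through located in the administrative territorial entity (P131)) can be sketched roughly as below. This is only an illustration: the in-memory `P131_PARENT` map and the function names are hypothetical, and a real tool would have to read the P131 links from Wikidata rather than from a hard-coded dictionary. The two QIDs are the ones quoted in the thread.

```python
# Hypothetical in-memory copy of P131 links: child QID -> parent QID.
# A real tool would fetch these links from Wikidata instead.
P131_PARENT = {
    "Q1545193": "Q9920",  # Grote Kerk -> Haarlem (QIDs quoted in the thread)
}


def ancestors(qid, parent=P131_PARENT):
    """Yield every item reachable from qid by following P131 upward."""
    seen = set()
    # The 'seen' check stops the walk if the data contains a loop.
    while qid in parent and parent[qid] not in seen:
        qid = parent[qid]
        seen.add(qid)
        yield qid


def prune_overdepicts(depicts, parent=P131_PARENT):
    """Drop depicts values that are P131 ancestors of another value."""
    redundant = set()
    for qid in depicts:
        redundant.update(ancestors(qid, parent))
    return set(depicts) - redundant


# Haarlem is dropped because Grote Kerk is located in it.
pruned = prune_overdepicts({"Q9920", "Q1545193"})
# A broad value on its own is kept: there is nothing more specific.
kept = prune_overdepicts({"Q9920"})
```

Note that this sketch only removes redundancy; it does not settle whether the broad value should have been added as depicts in the first place, which is the open question in the discussion.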
Using categories as a proxy for P180 depicts values
I have been testing the idea that it would be possible to derive depicts (P180) values for files from categories using Lua. The basic idea is that a template would be added to categories, making the derived values visible on the category page. There would also be an invisible link to store the values so they can be accessed programmatically and exported from an external links database table. It would also mean that the template would be added to 100k - 500k categories at a minimum (1M is the max).
Using standard templates and Lua modules allows users to report incorrect values to the template's talk page or fix problems by creating and updating wikidata items for categories. This would allow the system to scale, as most improvements come from updating the Lua module code, which would affect multiple categories simultaneously. Corner cases could be fixed by adding wikidata items.
As a result, the suggested values could be used as expected values for machine vision. So, the machine vision problem would be: "Is there a thing defined by the suggested value in the photo?" Exact options are much less prone to false positives than trying blindly to figure out with machine vision what is in the photo.
This makes it possible to scale the addition of SDC P180 values to tens of millions of images.
- Code
- Examples
- Category:Salt Lake City 1999 Tornado
- Category:Views from Hemakuta temple hill complex
- Category:Bosses of Prieuré Saint-Arnoul de Crépy-en-Valois
- Category:Illustrations of Macroscelidea
- Category:Orthographic projections of Wenninger polyhedra (red, yellow, blue)
- Category:Sanborn Fire Insurance Map from Richmond, Wayne County, Indiana
In any case, what do you think about the idea? Is it good/bad/dead-end etc... Personally I think that populating P180 values to the photos must be mainly automatic because of the vast number of images, so the question is how to do it. --Zache (talk) 12:14, 28 January 2023 (UTC)
- Only one of the six images in Category:Salt Lake City 1999 Tornado depicts the tornado. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:20, 28 January 2023 (UTC)
- Something I have wondered for a while is whether we could helpfully use a "possible value, but confirmation required" rank of truthiness, alongside the existing "preferred", "normal", and "deprecated" -- to indicate both potential values inferred from categories, and potential values inferred from machine vision, both of which probably need manual or other additional confirmation.
- A "poor man's way" to do this given that we don't have such a rank might be to add such values with rank = "deprecated", with reason for deprecated rank (P2241) = "inferred value, confirmation needed". Then the values would be accessible by tools or queries that specifically looked for them, but not by regular tools or queries otherwise.
- Eg it's quite common to find "view from" images in categories (--> location of the point of view (P7108)) in addition to "view of" images (--> depicts (P180)).
- I do think a way to indicate on-wiki potential values that need confirmation is something we strongly need (and ideally something that ideally should have been in place before so much computer vision tagging was rolled out). Jheald (talk) 13:34, 28 January 2023 (UTC)
- Yeah, though I was thinking with this that we would have something where we can do mass changes to suggested values just by editing Lua module code or changing Wikidata values or updating categories. Not so that we need to edit files one by one and it leaves edit to their edit history. After confirmation (either by a human or by another method) it would become something which is stored in the version history. -- Zache (talk) 13:59, 28 January 2023 (UTC)
- @Zache: Not so that we need to edit files one by one and it leaves edit to their edit history -- Bots. That's what bots are for, and why we have them. Also IMO yes, we very much do want to leave an edit in their edit history. Any statement on any page should have its footprint in the audit trail, so we can always ask when the statement was added and by who. Secondly, to be useful these statements need to be in the SPARQL copy of the data. And statements only get changed or added there when an edit is made. Jheald (talk) 16:43, 28 January 2023 (UTC)
- One other thing to think about is that we similarly would also need bots to be ready to remove such statements if the categorisation of an image was later changed (ie due to a changed identification of what was being depicted). Bots to add statements are quite often made. Bots to remove them (or, at least, flag the statements as needing a re-check) maybe a bit less so. But such bots are also needed. Jheald (talk) 16:58, 28 January 2023 (UTC)
- @Jheald about first comment. In this case the information is comparable to other modules. The changes are done to separate entities (ie. to Wikidata items, to categories, to Lua module which is rendering the information) and not directly to the page where information is visible.
- About second, sure, I think that there should be documentation at the point when the statement is saved to the photo what is source and determination method. In this case it could be: Module:P180fromCategory with value https://commons.m.wikimedia.org/wiki/Category:Turku_Castle#P180=Q136893 AND Salesforce LAVIS Image Classification "Turku Castle" with 87% confidence using CLIP model and ImageNet dataset.
- About SPARQL. There are multiple things which would be nice to get inside the SPARQL endpoint (for example the whole cirrus search index at start, or categories). However, development is rather slow and personally I have started to think that if something is needed then one should do things which can be loaded to one's own SPARQL endpoint, so that one can do federated queries. As long as the Blazegraph transition has not been solved, Wikimedia's SPARQL endpoint development is dead in the water. -- Zache (talk) 18:05, 28 January 2023 (UTC)
- @Zache: My point was, that for something to be available from within SPARQL, the most straightforward way to get it there is with an actual SDC statement. And I think having things available from SPARQL is important, because for most people this is the way to query SDC that is most available, most accessible, most documented, and most similar to querying wikidata that they may already be familiar with. If the information is there as an actual SDC statement, it means that it is there where people will look for it, where people will see it, where people will query for it.
- Yes, in principle it would be nice if more information was accessible from within a SPARQL query in a more virtual way. In fact category membership is (or has been) available, both as a SERVICE and as a separate SPARQL database that can be federated (though I'm not sure if that is still up at the moment) -- though so far only for information about category membership of categories, not individual images.
- I don't actually see Blazegraph's EOL issues as such a deal-breaker here. Blazegraph has the capability to do most of what we want (and does it), both for WCQS here and with WDQS for wikidata, so I think actually we are very little limited if there is no longer any Blazegraph dev team, because we don't much seem to need one. Category information for files could be implemented either as a SERVICE or as an additional federate-able database without needing any code patches from Blazegraph. The issue is more that the Search engineering team (WMF) doesn't have any spare resources to do it (and doesn't have Commons as a priority), while the Wikidata team (WMDE) sees Commons as out of its scope (?).
- Finally, as for determination method, determination method (P459) is available and IMO absolutely should be included in the reference for a bot-added edit. IMO it's crazy that the CV-assisted edits weren't (and aren't ?) being documented in this way. determination method (P459) = "inferred from category" would IMO be 100% legitimate to add in the referencing, ideally perhaps with inferred from (P3452) = <category item> -- except that sadly most categories don't (and won't) have items; but a Wikipage-URL valued referencing property could be made instead. Jheald (talk) 19:21, 28 January 2023 (UTC)
- @Jheald The main problem is that editing everything using bots doesn't scale. Bots cannot add new values fast enough, and it doesn't scale regarding human labor effort either, as running such bots will need a substantial amount of time and effort. For example, the most active SDC edit bot SchlurcherBot does 2M edits per month. For comparison, Commons gets 1M new files per month, and there are currently 90M files. The simplest case for adding values is MIME type (P1163) where the value comes directly from the database, with only one possible well-defined value. However, at the current editing speed it will take five years to add P1163 values to the remaining 67M files, even if SchlurcherBot did not do anything else. However, P1163 is also something the SDC could read directly from the MediaWiki database without needing a bot as a middleman, if there were any developer resources to implement it. In my proposal, I proposed something which is like that. It could be done in a centralized way inside Wikimedia Commons, and it would be useful in classifying tens of millions of files in a limited timeframe. It would also allow refining the results without needing to edit a massive number of files one by one again when something is changed. So, the idea is not to get end values which can be used directly in SDC but to do an intermediate step for classifying the photos. Zache (talk) 15:56, 30 January 2023 (UTC)
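The throughput estimate in the comment above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted there (67M files still lacking MIME type (P1163), 2M bot edits per month, 1M new uploads per month). Treating every new upload as adding one file to the backlog is an assumption made here; it is one reading under which the result lands near the quoted five years. The function name is illustrative.

```python
def months_to_clear(backlog, edits_per_month, new_files_per_month=0):
    """Months needed to work off a backlog at a constant net edit rate."""
    net = edits_per_month - new_files_per_month
    if net <= 0:
        raise ValueError("backlog never shrinks at this rate")
    return backlog / net


BACKLOG = 67_000_000      # files without P1163, as quoted in the thread
BOT_RATE = 2_000_000      # SchlurcherBot edits per month, as quoted
UPLOAD_RATE = 1_000_000   # new files per month, as quoted

# If new uploads are ignored, the backlog alone takes under three years.
years_ignoring_uploads = months_to_clear(BACKLOG, BOT_RATE) / 12

# Assumption: each new upload also needs the statement, so only the
# net rate of 1M per month reduces the backlog.
years_with_uploads = months_to_clear(BACKLOG, BOT_RATE, UPLOAD_RATE) / 12

print(f"{years_ignoring_uploads:.1f} years ignoring new uploads")
print(f"{years_with_uploads:.1f} years when new uploads are counted")
```

Under that assumption the estimate is sensitive to the upload rate: if uploads matched the bot's edit rate, the backlog would never shrink, which is why the function raises an error in that case.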
These are interesting thoughts (and thanks @Pigsonthewing: for pointing me towards this discussion). I have four points to contribute:
- Linking files to Wikidata items via categories was a big reason for the multi-million-edit work that I did to match Commons categories with Wikidata items (and this is still ongoing, please help!). There have even been proposals that SDC could replace Commons categories in the future, although that still feels like it's a long time away - but perhaps this work helps us work towards that.
- @PMG: has already been contributing hundreds of thousands of depicts (P180) values via semi-automated editing. I'm in awe of their efforts here, and I think they could provide a lot of valuable input into this discussion.
- Bots edit at the rate they are allowed to do so (both due to community and server constraints). There are ways of speeding them up. If we wanted to copy all category uses (where they are matched against Wikidata items) to SDC depicts values, we could probably do that within a month. The question is whether that *should* happen.
- The bottom line is accuracy: if we're going to make changes by bot, Lua, gadgets, or otherwise, they need to be 99.9% accurate. How do we ensure that in this case?
Thanks. Mike Peel (talk) 21:18, 3 February 2023 (UTC)
- This is an answer to: The bottom line is accuracy: if we're going to make changes by bot, Lua, gadgets, or otherwise, they need to be 99.9% accurate. How do we ensure that in this case? -- If the use case is generating likely P180 property values for categories using Lua (example: Category:Salt Lake City 1999 Tornado), it doesn't need to be 99.9% accurate if we can respond to errors by fixing the Lua module code in a timely manner. Errors can also be fixed by creating a Wikidata item for the category, which would give exact values. -- About mass creating actual Wikidata items or adding P180 values directly to files: I don't think that we have a single method for 99.9% accuracy, but we can minimize errors by combining techniques. For that, we can use machine vision to confirm what is in the picture, GPT style parsing for category names or descriptions, trying to ensure the category's topic by running machine vision against multiple images in the category, and finding common denominators. However, the tasks are too complex to be done in one step, so we need to split them into smaller tasks and solve them one by one. -- Zache (talk) 16:36, 4 February 2023 (UTC)
- @Zache: From the talk of "fixing the Lua module code" in that answer, I infer that you are still thinking about what we might call 'virtual' statements -- statements that are not 'real' statements, in the sense that they could not be queried or searched, would not be in the actual wikibase, would not be accessible from SPARQL; but would be able to be displayed in some form, some how.
- My sticking point is still this: I'm still at a loss to know, what would be the use of such things? Jheald (talk) 19:14, 4 February 2023 (UTC)
- Yes, virtual statements. Access to the data would be using SQL via the Toolforge database, using the external links table. It would allow us to access the data in bulk and in near real-time. External links can also be accessed through the MediaWiki API, which Pywikibot and bots can use. It would be possible to access external links using the Mwapi service in SPARQL, but because of Mwapi performance limitations, the use cases would be very narrow. (If the idea works, the logical next steps would be to use a separate database table instead of the external links table and to add the ability to create dynamic SDC values using Lua code. However, this would be outside of this proposal's scope. This proposal focuses on using the structures we already have, as it is something that we can realistically implement.) The resulting data itself can be used to narrow down false positives from other tools. For example, if one wants to create a tool that detects P180 values using machine vision, then one can change the query "what is in the photo?" to the question "Is there an object $P180 in the photo?". However, we do not know how far we can get with automation before we have the data and have tried it. My best guess is that it will significantly reduce the false positive rate from automated machine learning tools if we can combine multiple methods. -- Zache (talk) 09:32, 5 February 2023 (UTC)
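The step described above, turning the open question "what is in the photo?" into the targeted question "is there an object $P180 in the photo?", can be sketched as a simple filter over category-derived candidates. Everything named here is hypothetical: `confirm_candidates`, the stand-in `fake_scores` table, and the 0.8 threshold are assumptions for illustration, and the scorer would in practice be a machine-vision model. The Turku Castle QID and the 87% confidence figure are the ones mentioned earlier in the thread; the second score is made up.

```python
from typing import Callable, Iterable, List


def confirm_candidates(
    candidates: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.8,  # assumed cut-off, not a value from the thread
) -> List[str]:
    """Keep only category-derived candidates that the scorer supports."""
    return [qid for qid in candidates if score(qid) >= threshold]


# Stand-in for a machine-vision model: QID -> confidence that the
# item is visible in one particular photo.
fake_scores = {
    "Q136893": 0.87,  # Turku Castle, confidence quoted in the thread
    "Q9920": 0.10,    # made-up low score for an unrelated candidate
}

confirmed = confirm_candidates(
    ["Q136893", "Q9920"],
    lambda qid: fake_scores.get(qid, 0.0),
)
```

The design point is that the model never proposes values of its own: it can only accept or reject values that the category already suggested, which is where the expected drop in false positives would come from.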
Info from PMG
Hello. I am a user who marks photos. I mostly use Depictor and Wikicrowd, and when a category has more than 20 images I speed things up using the AC/DC gadget (for fewer than 20 images I found that it is faster to mark using Depictor). I think the owners of these tools (@Husky: and Adam Shorland) could also be very helpful in this discussion. Adam Shorland published statistics showing that the ratio of "images that contain what is mentioned in the name of the category" to "images in the category" can vary from 95% (is this a photo of a firework?) to 3% (is this a photo of a covered bridge?).
I am fine with the "let's use bots" idea, but there is one "but". I am a software tester in real life and, like every good tester, I want to point out some potential problems. Most of them are connected to what exactly people put in categories, which can result in poor descriptions (en:Garbage in, garbage out). When I make my edits I intentionally avoid such areas, because they are very problematic and result in many images not being marked even if they are in the correct category.
- artists. Categories of artists who make something physical (painters, writers, architects, pipe organ makers) are terrible. Usually it is like this: there is one photo of the specific artist and, for example, 100 pictures of his/her art. Example: Category:Joseph Callinet, Category:Edmond Alexandre Roethinger, Category:Bolesław Biegas.
- companies - this is a very difficult topic, because I am not sure how to correctly mark a photo of a company. Is the main building of the company fine? Is the main product of the company (an image of a can for CocaCola)? What would you put as "is this a photo of Ford Company"? Example: Category:Manufacture d'Orgues Thomas
- Mountains/national parks. This is always difficult for me. Should I mark such file as Category:Bory Tucholskie National Park?
- events Category:Procession de la Sainte Coiffe. Is photo of one person also photo of whole procession/event?
- People like Obama. On Wikicrowd only 28.29% of images were marked "Yes, on this image there is Obama". There are many images that were made at events with Obama, but they don't show Obama.
- Monuments - there are many situations where people mix "this is a photo of this monument" and "this is a photo of a person/event that was close to the monument". Example: Category:United States Navy Memorial and this photo.
- Concerts - many times there is category of some band making concert and photo shows people listening to this band (Category:IRA (Polish heavy metal band) in 2015 and File:Band-IRA fans 0367.JPG as example).
- Bands - many times there is category of band, but you see images of individuals.
- racing drivers. Many times we can see his car, but not him. I spent a lot of time trying to solve this issue, and Depictor was the tool I used to mark only those images that show this specific driver. Example: Category:Marco Andretti, Category:Alex Garcia
- Fictional characters. I am not sure how Category:Jasmine (Disney) should work, if its fictional.
- additionally Category:Love and other feelings. Weather stuff.
You can ask "ok, PMG, then what is good for marking such images". Its: cemeteries, churches, sportsman/sportwoman, ships, paintings, actors, fountains, military people (But not military units). If you want more info I am happy to share my experience in this subject, please ask. PMG (talk) 17:20, 6 February 2023 (UTC)
- I don't have much to add to the discussion, I think PMG lined it out quite nicely here. In general I would say that a blanket 'let's add ML tags to images' is a bad idea because there is so much nuance in all the different topics. One approach that I think could work is using it more as a suggestion, and then it could also work in conjunction with a tool like Depictor. Run an algorithm on a bunch of images, get back some suggestions, and then use those together with a human reviewer to make sure the tag is actually correct.
Husky (talk to me) 12:03, 10 February 2023 (UTC)
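The suggest-then-review approach described above can be sketched roughly as follows. This is a minimal illustration, not how Depictor or any existing tool works: the `Suggestion` structure, the `review_suggestions` function, the `min_score` threshold and the `ask_human` callback are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Suggestion:
    """A machine-generated 'depicts' suggestion (hypothetical structure)."""
    file_name: str
    item_id: str   # item the algorithm proposes as the 'depicts' value
    score: float   # algorithm confidence in [0, 1]


def review_suggestions(
    suggestions: Iterable[Suggestion],
    ask_human: Callable[[Suggestion], bool],
    min_score: float = 0.5,  # assumed cut-off, to be tuned per topic
) -> List[Suggestion]:
    """Return only the suggestions that a human reviewer confirmed."""
    confirmed = []
    for suggestion in suggestions:
        if suggestion.score < min_score:
            continue  # too weak to be worth a reviewer's time
        if ask_human(suggestion):
            confirmed.append(suggestion)
    return confirmed
```

Nothing is written to a file unless the reviewer says yes, so a concert photo that only shows the audience can be rejected even when the algorithm is confident about the band.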
AI generated images
What is the correct way to show that an AI generated image was created using DALL-E? I don't believe simply P180 (Depicts) would be appropriate. Trade (talk) 18:29, 3 February 2023 (UTC)
- A very good question; you mean, this is like indicating that an image is a drawing or painting, and then, e.g. an oil painting? Ziko van Dijk (talk) 18:42, 3 February 2023 (UTC)
- The best way to indicate that an image is AI generated would be by using genre (P136) with artificial intelligence art (Q65066631) as the value. @Ziko: --Trade (talk) 21:25, 3 February 2023 (UTC)
- fabrication method (P2079)? Jheald (talk) 19:17, 4 February 2023 (UTC)
- Another option might be something like instance of (P31) = computer-generated imagery (Q6002306); but I am a bit wary of recommending this.
- A query https://w.wiki/6JKe breaking down a random set of 100,000 cases (out of 23,211,627 total https://w.wiki/6JKs) shows that instance of (P31) seemingly can be used in this way (although seemingly not yet with Q6002306 as a value). But, from Wikidata experience, there may be good reasons to prefer, where possible, expressing information as attribute = value rather than as instance of (P31) with an ever-larger number of subclasses and case-classes.
- Also the item computer-generated imagery (Q6002306) isn't so good -- at the moment it seems to be doing double duty for both (i) the general art of using computers to make works, and (ii) an individual item of computer-assisted work. This double use isn't good, and should be cleared up. Jheald (talk) 19:42, 4 February 2023 (UTC)
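To make the options in this thread easier to compare, here is a rough sketch of what each would look like as a statement. The dictionary layout follows the Wikibase JSON format for item-valued statements as far as I understand it; the helper `item_statement` is a made-up name and nothing here has been tested against the live API. The property and item IDs are the ones quoted in the comments above.

```python
def item_statement(property_id: str, item_id: str) -> dict:
    """Build one item-valued statement in Wikibase-style JSON (sketch only)."""
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a property ID: {property_id!r}")
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not an item ID: {item_id!r}")
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(item_id[1:]),
                    "id": item_id,
                },
            },
        },
    }


# Option 1: genre (P136) = artificial intelligence art (Q65066631)
genre_option = item_statement("P136", "Q65066631")

# Option 2: instance of (P31) = computer-generated imagery (Q6002306)
instance_option = item_statement("P31", "Q6002306")
```

The two options differ only in the property and value, which is why the choice is mostly a modelling question (attribute = value versus a growing set of classes) rather than a technical one.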
References
Yesterday I found this edit at one of my images: Revision of 728226300. The references were new to me, but I guess they exist now. However, I find a template that is only used internally confusing as a reference for users. It is not something to look up or check. Even apart from that, I rarely find the references helpful. In any case, references should be sources that are visible and understandable to users. --XRay 💬 06:51, 12 February 2023 (UTC)
failed-save,[object Object],[object Object],[object Object]
Constantly getting my edits ruined by this error message is getting real tiring. Trade (talk) 21:10, 16 February 2023 (UTC)
- "failed-save" is a negative response from the server that should usually be accompanied by an error message. "[object Object]" is usually an indication of a programming error. So I guess when there is something wrong server-side, the client-side handling is also broken. It would be good if you wrote down the steps to reproduce and reported the problem, so that it can be looked at. --Matěj Suchánek (talk) 17:58, 25 February 2023 (UTC)
- I believe the problem is caused when one of the pictures is protected against being edited by non-admins. Trade (talk) 22:15, 25 February 2023 (UTC)
- I tried to reproduce this on File:FISHERMAN.jpg (BTW why has this picture been protected infinitely for edit warring?), but I don’t have any edit button in the SDC tab. So either you used a non-standard tool you forgot to mention, or something else is the culprit. —Tacsipacsi (talk) 16:17, 27 February 2023 (UTC)
Some Exif time fields are valuable for detecting errors
Some Exif time fields might be worth putting into the SDC, as one can detect lagging GPS coordinates. Jidanni (talk) 05:30, 3 March 2023 (UTC)
- I usually prefer to add the GPS information from the file page to the SDC. If there is no GPS information in the file page, I would also advise to first transition it there. --Schlurcher (talk) 16:27, 3 March 2023 (UTC)
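As a rough illustration of how Exif time fields could reveal lagging GPS coordinates: the sketch below compares the capture time with the time of the GPS fix. It assumes the values have already been read from the file as plain strings and numbers (Exif `DateTimeOriginal`, `OffsetTimeOriginal`, `GPSDateStamp`, `GPSTimeStamp`); the function names and the example values are made up, and a camera without `OffsetTimeOriginal` would need the time zone supplied some other way.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def parse_offset(offset: str) -> timezone:
    """Turn an Exif offset such as '+08:00' or '-05:30' into a timezone."""
    sign = -1 if offset[0] == "-" else 1
    hours, minutes = offset[1:].split(":")
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))


def gps_lag_seconds(
    date_time_original: str,             # local time, 'YYYY:MM:DD HH:MM:SS'
    offset_time_original: str,           # e.g. '+08:00'
    gps_date_stamp: str,                 # UTC date, 'YYYY:MM:DD'
    gps_time_stamp: Tuple[float, float, float],  # UTC (hours, minutes, seconds)
) -> float:
    """Seconds between the GPS fix and the moment the photo was taken."""
    captured = datetime.strptime(
        date_time_original, "%Y:%m:%d %H:%M:%S"
    ).replace(tzinfo=parse_offset(offset_time_original))
    hours, minutes, seconds = gps_time_stamp
    gps_fix = datetime.strptime(gps_date_stamp, "%Y:%m:%d").replace(
        tzinfo=timezone.utc
    ) + timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return (captured - gps_fix).total_seconds()


# Made-up values: photo taken at 13:30 local (+08:00), last GPS fix at 05:10 UTC.
lag = gps_lag_seconds("2023:03:03 13:30:00", "+08:00", "2023:03:03", (5, 10, 0))
```

A large positive lag means the coordinates were recorded well before the shutter was pressed, so the stored location may be stale. What counts as "large" would have to be decided per use case.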