Commons talk:Category disambiguation

From Wikimedia Commons, the free media repository

Incomprehensible

I fixed a typo based on the degree to which I could make any sense of a phrase, but the phrase is still incomprehensible: "at least 90% the others combined should usually be necessary." What on earth does this mean? - Jmabel ! talk 14:45, 22 April 2017 (UTC)

And actually the rest of it isn't much clearer. - Jmabel ! talk 14:47, 22 April 2017 (UTC)

In fact, I can't even tell for sure what this is about. Is this about creating disambiguation pages, about adding parenthetical words or phrases to the end of category names, or about something else? - Jmabel ! talk 14:49, 22 April 2017 (UTC)

This is currently primarily about when topics should be primary and when they should be disambiguated; see w:WP:PRIMARYTOPIC. "at least 90% the others combined should usually be necessary." refers to how likely it is that readers searching for the term want the topic in question, compared with all the others combined. Crouch, Swale (talk) 14:55, 22 April 2017 (UTC)

I agree with Jmabel about this being hard to understand. I can't quite put my finger on why, but let's start here: the word should can be used in two ways. One has to do with responsibility or obligation, as in "I should get to work on time". The other is a sort of substitute for if, as in "Should you get hungry, help yourself to something to eat". If any of your uses of should are the latter, how about changing them to use if instead? In the process, maybe you would clarify the phrase "should there should be a higher threshold". --Auntof6 (talk) 16:31, 22 April 2017 (UTC)

Probably we'd better reason by exclusion: first locate the cases in which we do not have to disambiguate, then the ones where we have to; once those cases are excluded, we can examine the gray zone. Of course, if this gray zone encompasses 80 per cent of the categories, there's a problem, because disambiguation should be the exception, not the rule. :-) -- SERGIO (aka the Blackcat) 17:44, 22 April 2017 (UTC)

On Wikipedia, I'd agree that disambiguation should be an exception. Wikipedia wants to get people to the article they're looking for as quickly as possible, and having unqualified titles for articles about primary meanings (with hatnotes mentioning dab pages or other meanings) does that.
However, Wikipedia and Commons aren't the same kind of thing. On Commons, we have an issue that Wikipedia doesn't have: a major function here is categorizing files, and that categorizing is often done by bots and people who don't pay close enough attention to the categories they're assigning. They should definitely pay more attention to it, but bots may not be able to, and people may either not understand categorization here (even when they do pay attention), not understand that it matters, not see the details because they use tools like HotCat where you don't see the category definition, or they may just not care.
So, to me, it makes the most sense to do what we can to help categorizers, whether bot or human, get it right. IMO, one way is to qualify anything that's ambiguous, and have a dab at the base name. Some people object to this on the grounds that the AP Stylebook says some city names don't need to be qualified. Stylebooks are for prose: this kind of rule is so that there aren't a lot of extraneous words in text. That doesn't matter in Commons category names: a category name isn't in the middle of prose. --Auntof6 (talk) 19:37, 22 April 2017 (UTC)
I think places have been the most controversial on this, but that isn't the only field where there may be issues. One additional problem, for places in particular, is that large places have lots of sub-categories: do we really want to have to add ", England" to all the subcats of Category:London?--Nilfanion (talk) 20:14, 22 April 2017 (UTC)
We may not want to, but we may need to. --Auntof6 (talk) 08:46, 3 May 2017 (UTC)

As a general comment, this page should speak to Commons, using Commons terminology and Commons issues. Commons doesn't have a problem with "readers" landing at the wrong category, as almost everyone gets to a category via a direct link. It does have problems with categorisation going wrong. For example, bots and HotCat users can make mistakes where there is a "primary" subject. Never mind the quality of the writing, I think this is just too narrow in focus, as it just focuses on translating a single element of w:WP:DISAMBIGUATION (ie primary topic) over to Commons instead of the whole. With that in mind, I'd like to see any guideline address:

  1. How disambiguation should be done (ie what term is added to disambiguate, do we use parentheses or commas, and in what language)?
    1. Do we just defer to en.wp's guidance on this matter?
    2. Is non-English disambiguation acceptable? If it is, when?
  2. When should a disambiguator be added, if no confusion exists?
  3. If a category is disambiguated, when should the disambiguator apply to sub-cats? For example, Category:Norfolk, England is disambiguated. But two of its sub-cats are Category:History of Norfolk, England and Category:Sites of Special Scientific Interest in Norfolk. Are those titles OK or not, and why?
  4. How should multiple-level disambiguation be done? For example, is Category:Market Street, Wakefield, West Yorkshire fine with double disambiguation? How about Category:Follies (architecture) in Norfolk, England, with two separate single disambiguations?
  5. If one subject is vastly more important than the others that share its name, should it be put at the base name instead of having a disambiguator? For instance, is Category:Science correct when Category:Science (journal) also exists?
    1. If so, how do we assess when this exception kicks in (this is the equivalent of primary topic)? I believe the page as it stands attempts to answer this question.
    2. If so, are (disambiguation) categories ever ok? Would we have Category:Science (disambiguation)?
  6. How should disambiguation categories, like Category:Sale, be constructed?

With regard to User:Blackcat's comment about cases in which we do or do not have to disambiguate, here are some possible examples:

  1. Streets: Always include the place name, even if the street is unique, for disambiguation and clarity.
  2. Ships: Include (ship, launch year) per the existing naming convention, even if it's a unique name.
  3. Minor planets: Include the MPC number (eg Category:99942 Apophis), for clarity and disambiguation.
  4. People: Do not disambiguate by default, use the full name.
  5. Places: Do not disambiguate by default, unless a convention otherwise requires it (eg adding the US state name).

What I would say is that if we ever decide the default is "no disambiguation", a conflict can occur. I'll give one example: Category:George Washington is ambiguous. Is it right that the first US President is at the base name there? If so, why? If not, why not?--Nilfanion (talk) 20:11, 22 April 2017 (UTC)

In terms of what should be primary, I would ask the following:
  1. How likely is it that when a user searches for the term, they want the given topic? If Sleaford gets around 43 times the hits of Sleaford, Hampshire, then that might show this. However, as has been pointed out, probably not many people search on Commons, and different topics may be more relevant to Commons users than to WP users.
  2. How likely are uploaders to upload to the given category?
  3. How important is the topic compared to the others? Do all the others (or at least the main competitors) derive their name from the topic?
  4. How likely is it that someone would be surprised to find the given topic at the base name? (eg many users would probably be surprised to find the city at the base name, but almost nobody should be surprised to find the fruit.)
  5. Are primary topics acceptable among incompletely disambiguated titles (like "Newport, Wales" alongside "Newport, Pembrokeshire") outside subsidiary meanings (like "Borough of Ashford" for "Ashford, Kent" and "Cambridge University" for "Cambridge, Cambridgeshire")? And what about where the disambiguated title is widely used as a term in its own right, eg "Springfield, Virginia" alongside "Springfield, Westmoreland County, Virginia"?
And there are some other considerations:
  1. Whether the primary meaning matches the WP article, though what is primary in one language might not be in another.
  2. Whether the primary meaning is likely to be something else in other languages (eg Reading, where the English town is primary), which could cause someone to link to the wrong category because they just use the plain Commonscat template.
  3. Whether there is a conflict in other languages but not in the subject's own language or in English, eg Perm (because the Permian is sometimes called just Perm).
  4. How long the title has been stable (mainly because external sites might have linked to it).
And if a "main" category is disambiguated (eg Norfolk), when should we disambiguate its sub-categories?
  1. Apply the same test as for any main category.
  2. Apply a slightly higher test (like Category:Countryside in Norfolk, England, where there may be less risk of confusion).
  3. Disambiguate unless there is very little risk of confusion.
  4. Disambiguate even where there is no risk at all (eg Category:Sites of Special Scientific Interest in Norfolk).
The last option generally simplifies things by removing the need to evaluate each individual category, but what about churches, for example Category:All Saints church, Hanworth, where there is only one church by that name but there are other Hanworths? Crouch, Swale (talk) 15:12, 28 April 2017 (UTC)

I have no idea what most of the sentences currently on the page are supposed to mean. I can't comment on this until the text is greatly clarified... AnonMoos (talk) 09:29, 24 April 2017 (UTC)

I have tried to clarify a bit by expanding. Please remember that many new pages start out in poor shape. Crouch, Swale (talk) 15:12, 28 April 2017 (UTC)
@Crouch, Swale: I see that you updated the page, but some of it still doesn't make sense. --Auntof6 (talk) 16:25, 28 April 2017 (UTC)
@Auntof6: what would be better then? I have tried to clarify with examples like London and Perth. Crouch, Swale (talk) 16:53, 28 April 2017 (UTC)
It's mostly a grammar issue. The phrase mentioned in the first comment here hasn't been changed, and still doesn't make sense. The phrase "if the topic if overwhelmingly more important than others with that term" is another example. "When this isn't the case in Worcester or Perth" also doesn't make sense -- maybe you mean "as with the categories for Worcester and Perth"? --Auntof6 (talk) 17:41, 28 April 2017 (UTC)
@Auntof6: Is the current version better? I have also expanded it. Crouch, Swale (talk) 08:15, 3 May 2017 (UTC)
I think I understand generally what you're trying to say, but you still haven't clarified the very beginning. Besides that, when you say that a primary topic is the one that is wanted 90% of the time, I have two questions: where does the 90% figure come from, and how do we know what percentage of users want which topic?
Aside from that, I guess we need to also write an essay for the opposing viewpoint. --Auntof6 (talk) 08:46, 3 May 2017 (UTC)
I agree it still needs work. The 90% figure is based mainly on what the reader wants when they search for the term, similar to how primary topics are established on Wikipedia (page views here and on WP, Google, file count, sources, etc.). In the case of Plymouth, for example, when a reader searches for plain "Plymouth", are they searching for the English town 90% of the time, or for the car or the MA town? It also depends on what someone is likely to categorize: if someone adds "Category:Plymouth" to a page, are they likely to intend Devon or something else? What opposing views are there? Do you agree with the 90% rule? I personally think it should be 95%, but I suggested 90% to reduce the number of pages that might need to be renamed, per Nilfanion's concerns about stability. Crouch, Swale (talk) 09:20, 3 May 2017 (UTC)
The opposing view is the one I hold, that categories at the base names should all be dab cats instead of primary topics. You keep citing English Wikipedia's pages about primary topics, but I think that Wikipedia and Commons are fundamentally different in this area, as I explained above. --Auntof6 (talk) 10:14, 3 May 2017 (UTC)
The point that I am making is that primary topics should be stricter than on Wikipedia. You made the point here that not every category should be disambiguated; as such, do you think 95% is still too low? My point was that Nilfanion's view about stable titles and unnecessary changes was also considered here; otherwise I might have suggested that almost everything be disambiguated, or at least a threshold higher than 90%. Crouch, Swale (talk) 10:26, 3 May 2017 (UTC)
No number whatsoever is suitable, whether that's 50% or 99.999%, as it completely kills editorial judgment. No-one wants arguments like "this is over 95%" / "no it's not, it's 94.9%". "Having a higher bar than en.wp" is perfectly sensible; it doesn't need a numerical quantity.--Nilfanion (talk) 10:44, 3 May 2017 (UTC)
Yes, the whole point is that some need to be considered on their own merits, but having a guideline of 90% would at least give people some idea; otherwise "significantly more likely" will mean different things to different people. Saying 90% makes it clear that (at least when moving to a primary topic) it generally needs to be within that order of magnitude. Crouch, Swale (talk) 10:52, 3 May 2017 (UTC)
Just lose the number; it's asking for trouble. If you quote a number, it may easily be understood as a hard number (especially by those who don't speak English). A relative term, such as "if WP wants 75% in this specific case, we want 90%", would be OK, but you aren't going to be able to give a concrete example where that breaks (edge cases on WP never quote percentages like that).--Nilfanion (talk) 10:56, 3 May 2017 (UTC)

@Auntof6: We do have some primary topics (eg Science). Whether we should have primary topics for geographic places is a more specific question. This page should probably address the general case, not the specific one (for places). If you think we shouldn't have them at all for places, the best way to establish that might be with a CFD on the most extreme cases. Plymouth and Cleveland don't cut it: even if the concept of primary topic applies on Commons, it might not apply in those cases, as the lead isn't big enough. I'd go for Shanghai: the Chinese city of 24 million, or the unincorporated places in WV and VA. I'd probably start a CFD on that myself, but that feels too POINTy.--Nilfanion (talk) 10:44, 3 May 2017 (UTC)

Policy

Can someone point me to official policy on disambiguation? Evrik (talk) 21:25, 1 May 2018 (UTC)

@Evrik: I'm not seeing a policy, but I just undid your changes to Template:Disambig: not all disambigs are galleries. --Auntof6 (talk) 22:02, 1 May 2018 (UTC)
There is a belief that

“"Galleries are created in the same way as Articles are created in Wikipedia." This means that the "regular" namespace at Commons is reserved for gallery pages.”

Did I get that right @De728631:? I think that a brief guideline may be overdue. Evrik (talk) 16:10, 2 May 2018 (UTC)

@Evrik: That sounds right to me, but Template:Disambig is used in other namespaces, too, not just galleries, so you can't have it categorize everything under galleries. --Auntof6 (talk) 17:08, 2 May 2018 (UTC)
Then we need to fix the template, or use two different templates. Evrik (talk) 18:59, 30 May 2018 (UTC)