This is a draft of a rant I'm going to post to Commons-L soonish. Comments, criticism, and refutation welcome.

One question about categories: are they actually useful? If so, how do we know? What empirical evidence do we have that all this effort helps users?

My definition of "useful" would be: a person with no pre-existing knowledge of Commons comes to the site, and, by drilling down into categories, finds what s/he wants without backtracking or thrashing around.

My guess is that this NEVER happens, although as of February 2011 I have no data either way.

  • In 2011, people are much more comfortable with searching and, to a lesser extent, tagging.
  • Perversely, the best-categorized images are the ones that you have to drill down the farthest to find.
  • Most categories are made because they are easy, not because they are useful. Example from Wikipedia: "1977 births". I can't imagine the use case for browsing through all the people born in a specific year.
  • The world has moved on from hierarchical categories, and so should we.

Categories are useful when they map well to how the searching user would have classified things. When Yahoo was a hierarchical listing of sites, they had teams of experts constantly testing whether their hierarchy was consistent and clear. On Commons there is no similar effort to achieve consistency and clarity; it's all up to the user who uploaded the thing. So it's not at all obvious that Commons' categorization effort helps the users.

In any case, Yahoo and pretty much every other site have abandoned the hope of a hierarchical classification of everything. Very rapidly you end up with categories which have too many members to navigate, or a hierarchy that is too deep to navigate. Searching and tagging are far more efficient.

Categories seem to be semi-useful:

  • as "galleries" -- the aesthetic experience of looking at all the pictures in a group
  • as work queues -- "Images that need such-and-such"
  • for subjects with a well-known hierarchical taxonomy, like biology.

So perhaps we should create more specialized systems that do correspond to those use cases.

Otherwise, I'd advocate abolishing Categories entirely from Commons, along with all the debates about categorization bots or category flattening or any other insoluble problem that they cause. (I think it is obvious that if there is a bot that can determine the category of something from, say, the textual description, then the textual description was already superior as a means of finding that item.)

Tagging would also be helpful, since it's the one kind of distributed effort that is lightweight enough that users will actually do it. (Even so, only 2% of the images on Flickr are actually well-tagged.) It is true that tagging tends to fail for multilingual systems, but categories have similar issues anyway. Perhaps we should be putting our efforts into a tagging system that is internationalizable somehow, where if I tag something as "fr:chat" it's not the same thing as "en:chat".
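
To make the language-prefix idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `lang:word` tag format, the `parse_tag` helper, the `TagIndex` class, and the file names are hypothetical, not an existing Commons or MediaWiki feature. It only shows that keying tags on the (language, word) pair keeps "fr:chat" and "en:chat" apart.

```python
from collections import defaultdict


def parse_tag(raw):
    """Split a hypothetical 'lang:word' tag such as 'fr:chat' into ('fr', 'chat')."""
    lang, sep, word = raw.partition(":")
    if not sep or not lang or not word:
        raise ValueError(f"expected 'lang:word', got {raw!r}")
    return lang.lower(), word.lower()


class TagIndex:
    """Maps (language, word) pairs to the set of file names carrying that tag."""

    def __init__(self):
        self._files = defaultdict(set)

    def tag(self, filename, raw_tag):
        # The language is part of the key, so the same word in two
        # languages never collides.
        self._files[parse_tag(raw_tag)].add(filename)

    def find(self, raw_tag):
        # .get() avoids creating empty entries for tags nobody has used.
        return sorted(self._files.get(parse_tag(raw_tag), set()))


index = TagIndex()
index.tag("Kitten.jpg", "fr:chat")            # French: a cat
index.tag("Kitten.jpg", "en:cat")
index.tag("IRC_screenshot.png", "en:chat")    # English: a conversation

print(index.find("fr:chat"))  # ['Kitten.jpg']
print(index.find("en:chat"))  # ['IRC_screenshot.png']
```

This sketch does nothing about linking "fr:chat" to "en:cat" as the same concept; that translation step is the hard part and is left open here.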

Bottom line:

The overwhelming majority of users are going to find our content from Google or other search engines. Search engines are going to make decisions based on description text, title, and how well-linked the page is. In light of this fact we should be focusing our attention on getting good titles and descriptions, and perhaps translating them.

Hierarchical categories are a distraction; a side issue. They are an outmoded means of structuring content on the web, preserved in MediaWiki only by historical accident. They are neither inevitable nor necessary nor helpful.

They may be worth some effort for very special use cases, but not the vast amount of attention the Commons community is paying to them.

NeilK (talk) 18:42, 23 February 2011 (UTC)

