Commons:Machine readable data
On Wikimedia Commons, much of the metadata (including the license and the author) is not machine readable. There is an API module, iiprop=extmetadata, that can be used to retrieve some values, but because the information is entered as free text on the file description page itself, the results are not perfect. There are plans to move the metadata into the database, but that will not happen soon.
To compensate, and to ease the transition to more structured data in the future, Wikimedia Commons uses a set of standard templates that are designed to be machine readable in several ways through HTML elements. Some scripts already make use of them. Note that this data is available to any wiki that uses Wikimedia Commons, where it can be read from the HTML of the File: page, just like other local data.
Machine readable data
Machine readable data set by infobox templates
There are several standard infobox templates that tag different elements of the template with different tags so that the information can be parsed. Several different styles of tags are used:
- Microformat tags follow industry standards and can be parsed by already existing tools.
- <td> id attributes (identifiers) are custom markings which allow more complete tagging, but which have to be read by custom tools. Most universal infoboxes have a two-column structure: column #1 holds the name of the field and column #2 holds the value.
  - Traditionally, <td> id attributes were used to tag the name cell in the first column of a row. To get the data, you need to read the contents of the following <td> cell in the second column.
  - {{Creator}} and {{Institution}} templates have a more complicated structure, so the cells with the actual data are tagged directly with <td> id attributes (highlighted with a magenta background).
Template | Template parameter name | Description | <td> id attribute | Microformat | Comment |
---|---|---|---|---|---|
{{Information}} | description | description of the file | fileinfotpl_desc | hProduct.description | Often contains multiple languages annotated with {{Lang}}. |
{{Information}} | date | original creation date of the work | fileinfotpl_date | hCalendar vevent.dtstart | microformat added by {{Date}} template |
{{Information}} | source | source of the file | fileinfotpl_src | | Often contains entire tables. We have no good way to deal with source templates yet. Source templates often have references to catalogue IDs, but these are also not machine readable. |
{{Information}} | author | author of the file | fileinfotpl_aut | | This can be the author, creator and/or copyright holder, and the usages are mixed. Often contains the {{Creator}} template, which is described below. |
{{Information}} | permission | license/permission of the file | fileinfotpl_perm | | |
{{Information}} | other versions | other versions of the file | fileinfotpl_ver | | |
{{Artwork}} | description | description of the artwork | fileinfotpl_desc | hProduct.description | |
{{Artwork}} | date | original creation date of the artwork | fileinfotpl_date | hCalendar vevent.dtstart | microformat added by {{Date}} template |
{{Artwork}} | source | source of the file | fileinfotpl_src | | |
{{Artwork}} | artist | creator of the artwork | fileinfotpl_aut | "hProduct.fn value" | |
{{Artwork}} | author | author of the artwork | fileinfotpl_aut | "hProduct.fn value" | |
{{Artwork}} | permission | license/permission of the file and artwork | fileinfotpl_perm | | |
{{Artwork}} | other versions | other versions of the file | fileinfotpl_ver | | |
{{Artwork}} | title | title of the artwork | fileinfotpl_art_title | hProduct.fn | |
{{Artwork}} | object type | object type of the artwork | fileinfotpl_art_object_type | | |
{{Artwork}} | medium | technique and materials of the artwork | fileinfotpl_art_medium | | |
{{Artwork}} | dimensions | dimensions of the artwork | fileinfotpl_art_dimensions | | |
{{Artwork}} | gallery | institution that owns the artwork | fileinfotpl_art_gallery | | |
{{Artwork}} | location | location of the artwork within the institution | fileinfotpl_art_location | hProduct.locality | |
{{Artwork}} | accession number | accession number of the artwork | fileinfotpl_art_id | hProduct.identifier | |
{{Artwork}} | object history | object history of the artwork | fileinfotpl_art_object_history | | |
{{Artwork}} | exhibition history | exhibition history of the artwork | fileinfotpl_art_exhibition_history | | |
{{Artwork}} | credit line | credit line of the artwork | fileinfotpl_art_credit_line | | |
{{Artwork}} | inscriptions | inscriptions on the artwork | fileinfotpl_art_inscriptions | | |
{{Artwork}} | notes | notes about the artwork | fileinfotpl_art_notes | | |
{{Artwork}} | references | references related to the artwork | fileinfotpl_art_references | | |
{{Book}} | Author | author of the book | fileinfotpl_author | | |
{{Book}} | Editor | editor of the book | fileinfotpl_book_editor | | |
{{Book}} | Translator | translator of the book | fileinfotpl_book_translator | | |
{{Book}} | Illustrator | illustrator of the book | fileinfotpl_book_illustrator | | |
{{Book}} | Title | title of the book | fileinfotpl_book_title | | |
{{Book}} | Subtitle | subtitle of the book | fileinfotpl_book_subtitle | | |
{{Book}} | Series title | title of the book series | fileinfotpl_book_series-title | | |
{{Book}} | Authority file | authority control data | fileinfotpl_book_authority | | |
{{Book}} | Publisher | publisher of the book | fileinfotpl_book_publisher | | |
{{Book}} | Printer | printer of the book | fileinfotpl_book_printer | | |
{{Book}} | Year of publication | date or year of publication of the book | fileinfotpl_date | | |
{{Book}} | Place of publication | place or city of publication of the book | fileinfotpl_book_place-of-publication | | |
{{Book}} | Language | language of the book | fileinfotpl_book_language | | |
{{Book}} | Description | description of the book | fileinfotpl_desc | | |
{{Creator}} | Name | name of the creator | creator | vCard.fn | |
{{Creator}} | Alternative names | alternative names of the creator | fileinfotpl_creator_alt-name_value | vCard.nickname | |
{{Creator}} | Description | nationality and occupation(s) of the creator | fileinfotpl_creator_desc_value | vCard.note | |
{{Creator}} | Date of death | date of death of the creator | fileinfotpl_creator_deathdate_value | | |
{{Creator}} | Date of birth | date of birth of the creator | fileinfotpl_creator_birthdate_value | vCard.bday | |
{{Creator}} | Location of birth/death | location of death of the creator | fileinfotpl_creator_deathloc_value | | |
{{Creator}} | Location of birth | location of birth of the creator | fileinfotpl_creator_birthloc_value | | |
{{Creator}} | Work period | work period of the creator | fileinfotpl_creator_work-period_value | | |
{{Creator}} | Work location | work location of the creator | fileinfotpl_creator_work-location_valuev | | |
{{Creator}} | Image | portrait or photo showing the creator | fileinfotpl_creator_image | | |
{{Creator}} | Authority file | authority control related to the creator | fileinfotpl_creator_authority_value | | |
{{FileContentsByBot}} | (various) | depends; see {{FileContentsByBot}} | (various) | hproduct-by-bot | large and still growing data set; see {{FileContentsByBot}} |
{{Photograph}} | title | title of the photograph | fileinfotpl_art_title | hProduct.fn | |
{{Photograph}} | description | description of the photograph | fileinfotpl_desc | hProduct.description | |
{{Photograph}} | original description | original archival description of the photograph | fileinfotpl_desc | hProduct.description | |
{{Photograph}} | date | creation date of the original artwork | fileinfotpl_date | hCalendar vevent.dtstart | microformat added by {{Date}} template |
{{Photograph}} | medium | technique and materials of the photograph | fileinfotpl_art_medium | | |
{{Photograph}} | dimensions | dimensions of the photograph | fileinfotpl_art_dimensions | | |
{{Photograph}} | artist | creator of the photograph | fileinfotpl_aut | "hProduct.fn value" | |
{{Photograph}} | institution | institution that owns the photograph | fileinfotpl_art_gallery | | |
{{Photograph}} | location | location of the photograph within the institution | fileinfotpl_art_location | hProduct.locality | |
{{Photograph}} | source | source of the file | fileinfotpl_src | | |
{{Photograph}} | permission | license/permission of the file and artwork | fileinfotpl_perm | | |
{{Photograph}} | other versions | other versions of the file | fileinfotpl_ver | | |
{{Photograph}} | accession number | accession number of the photograph | | hProduct.identifier | |
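As a rough illustration of the traditional convention described above (the id sits on the name cell, and the data is in the <td> that follows it), here is a minimal Python sketch using only the standard library. The function name extract_info_fields and the sample HTML are hypothetical, and real file pages contain far more markup than this. The sketch does not cover the {{Creator}} and {{Institution}} case, where the id is on the value cell itself.

```python
from html.parser import HTMLParser


class InfoFieldParser(HTMLParser):
    """Collect the text of the <td> that follows each <td id="fileinfotpl_...">."""

    def __init__(self):
        super().__init__()
        self.fields = {}          # id of the name cell -> text of the value cell
        self._pending_id = None   # id seen on a name cell, waiting for its value cell
        self._current_id = None   # id whose value cell is currently being read
        self._depth = 0           # <td> nesting depth inside the value cell

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self._current_id is not None:
            # A nested table cell inside the value cell: keep reading it.
            self._depth += 1
            return
        cell_id = dict(attrs).get("id") or ""
        if cell_id.startswith("fileinfotpl_"):
            self._pending_id = cell_id
        elif self._pending_id is not None:
            # First <td> after a tagged name cell: this is the value cell.
            self._current_id = self._pending_id
            self._pending_id = None
            self._depth = 1
            self.fields[self._current_id] = ""

    def handle_endtag(self, tag):
        if tag == "td" and self._current_id is not None:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace; formatting is not data.
                text = self.fields[self._current_id]
                self.fields[self._current_id] = " ".join(text.split())
                self._current_id = None

    def handle_data(self, data):
        if self._current_id is not None:
            self.fields[self._current_id] += data


def extract_info_fields(html):
    """Return {name-cell id: value-cell text} for the given HTML string."""
    parser = InfoFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical, heavily simplified file description table.
sample = """
<table>
<tr><td id="fileinfotpl_desc">Description</td><td>A <b>red</b> barn</td></tr>
<tr><td id="fileinfotpl_aut">Author</td><td>Jane Example</td></tr>
</table>
"""
print(extract_info_fields(sample))
# {'fileinfotpl_desc': 'A red barn', 'fileinfotpl_aut': 'Jane Example'}
```

Because an id can legitimately appear only once per page, this sketch keeps a single value per id; a page that repeats an id would overwrite the earlier value.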
Alternative format for CommonsMetadata
Because the table + id based format proved very hard to add to templates which were not formatted similarly to the Commons information template, CommonsMetadata allows an alternative format, similar to license templates: the whole information template has to be enclosed in a fileinfotpl class and the tag containing the specific information needs to have a fileinfotpl_* class (same names as above, but class, not id).
Machine readable data set by license templates
Introduced in October 2010, using classes <span class="licensetpl_XXX">
licensetpl
- An element identifying a license. Wraps the entire license code and should be a SINGLE license, not a multi license.
licensetpl_short
- Short name of the license: “Public domain”, “CC BY-SA 3.0”, “CC by 2.0 fr”, etc.
licensetpl_long
- Long name of the license: “Public domain”, “Creative Commons Attribution-Share Alike 3.0”, etc.
licensetpl_attr_req
- Whether attribution is required. “true” or “false”.
licensetpl_attr
- The requested attribution: Free text.
licensetpl_link_req
- Whether a link to the license is required for this license. “true” or “false”.
licensetpl_link
- The link to the license deed. “www.creativecommons.org/licenses/by-sa/XXX/YYY”
licensetpl_nonfree
- “true” if this is a non-free license (not used on Commons, only on wikis with an EDP).
Multiple licensetpl blocks for the same work might be wrapped in a block using the class licensetpl_wrapper.
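The class names listed above can be read with a small parser. The following Python sketch is an illustration only: the function name extract_licenses and the sample markup are hypothetical, and the exact elements used by real license templates may differ. It starts a new record at each element with the licensetpl class and stores the text of every licensetpl_* element inside it, ignoring the licensetpl_wrapper container.

```python
from html.parser import HTMLParser

# Elements without an end tag; they must not affect depth counting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
PREFIX = "licensetpl_"


class LicenseParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.licenses = []   # one dict per element with class "licensetpl"
        self._field = None   # name of the field currently being read
        self._depth = 0      # element nesting depth inside that field

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "licensetpl" in classes:
            self.licenses.append({})
            return
        for name in classes:
            is_field = name.startswith(PREFIX) and name != "licensetpl_wrapper"
            if is_field and self.licenses:
                self._field = name[len(PREFIX):]
                self._depth = 1
                self.licenses[-1][self._field] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            record = self.licenses[-1]
            record[self._field] = record[self._field].strip()
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.licenses[-1][self._field] += data


def extract_licenses(html):
    """Return a list of {field: text} dicts, one per licensetpl block."""
    parser = LicenseParser()
    parser.feed(html)
    return parser.licenses


# Hypothetical markup for a dual-licensed file.
sample = """
<div class="licensetpl_wrapper">
  <table class="licensetpl"><tr><td>
    <span class="licensetpl_short">CC BY-SA 3.0</span>
    <span class="licensetpl_attr_req">true</span>
  </td></tr></table>
  <table class="licensetpl"><tr><td>
    <span class="licensetpl_short">GFDL</span>
    <span class="licensetpl_attr_req">true</span>
  </td></tr></table>
</div>
"""
for record in extract_licenses(sample):
    print(record)
```

The values are kept as the strings found in the page (for example “true” and “false”); converting them to booleans is left to the caller.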
Templates setting this information
- Templates setting licensetpl include: {{PD-Layout}}, {{Cc-by-sa-3.0-migrated}}, {{Cc-by-layout}}, {{Cc-by-sa-layout}}, {{Cc-zero}}, {{FAL}}, {{GFDL}}, {{GFDL-1.2}}, {{GPL}} and {{LGPL}}.
Machine readable data set by style formatting templates
Style formatting templates, meant to provide uniform styles to different families of non-license templates, carry machine readable data identifying these families.
Template | Purpose | Class name |
---|---|---|
{{Restriction-Layout}} | used by Restriction tags | restrictiontemplate |
{{FoP-Layout}} | used by freedom of panorama tags | foptemplate |
{{Partnership-Layout}} | used by Partnership templates | partnershiptemplate |
{{Source-Layout}} | used by generic Source templates | sourcetemplate |
{{Created with}} | used by Created with ... templates | createdwithtemplate |
Machine readable data set by non-copyright restriction templates
Templates regarding non-copyright legal restrictions carry these classes to identify specific types of restrictions.
Template(s) | Purpose | Class name |
---|---|---|
{{Trademarked}} | Trademarked images | restriction-trademarked |
{{Copydesign}} | Copyrighted designs | restriction-design |
{{Communist symbol}} | Communist symbols | restriction-communist |
{{Italy-MiBAC-disclaimer}} {{Soprintendenza}} | Italian cultural goods | restriction-ita-mibac |
{{Australian Commonwealth reserve}} | Australian reserves | restriction-aus-reserve |
{{Personality rights}} {{Romania personality rights}} | Personality rights | restriction-personality |
{{2257}} | Child Protection and Obscenity Enforcement Act warning (United States) | restriction-2257 |
{{Costume}} | Costuming | restriction-costume |
{{Fan art}} | Fan art | restriction-fan-art |
{{Currency}} | Currency | restriction-currency |
{{IHL Symbol}} | Symbols restricted by International Humanitarian Law | restriction-ihl |
{{Nazi symbol}} | Nazi and fascist symbols | restriction-nazi |
{{Insignia}} | Official insignia | restriction-insignia |
Machine readable data set by specific templates
More machine-readable data are set. Here is a non-exhaustive list:
- {{Personality rights}}
<span class="commons-template-name" style="display:none" id="commons-template-personality-rights">Personality rights</span>
- {{Credit line}}
<td id="fileinfotpl_credit" class="fileinfo-paramfield fileinfotpl_credit" style=""></td>
Machine-readable data set by location templates
{{Location}} and similar templates add machine-readable geocodes in the following format: <span class="geo">12.34;24.68</span> (latitude and longitude as floating-point numbers, separated by a semicolon). The coordinates use the WGS84 system (the same as GPS and most online maps). See Commons:Geocoding for more details.
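A geocode in this format can be picked out of the page HTML with a regular expression. The sketch below is a simplification: the function name extract_geocodes is hypothetical, and the pattern assumes the class attribute is exactly "geo" and comes first in the tag, which may not hold for every template.

```python
import re

# Matches <span class="geo">LAT;LON</span> with optional whitespace.
GEO_RE = re.compile(
    r'<span class="geo">\s*(-?\d+(?:\.\d+)?)\s*;\s*(-?\d+(?:\.\d+)?)\s*</span>'
)


def extract_geocodes(html):
    """Return a list of (latitude, longitude) floats found in the HTML."""
    coords = []
    for lat_text, lon_text in GEO_RE.findall(html):
        lat, lon = float(lat_text), float(lon_text)
        # Skip values outside the valid WGS84 ranges.
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            coords.append((lat, lon))
    return coords


print(extract_geocodes('<span class="geo">12.34;24.68</span>'))
# [(12.34, 24.68)]
```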
Usage
MediaWiki API
The iiprop=extmetadata API module returns some useful parameters such as Credit, Artist, LicenseUrl and Copyrighted, and is used by Media Viewer, for example.
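A minimal sketch of querying the iiprop=extmetadata module from Python with the standard library follows. The helper names, the file title and the User-Agent string are placeholders chosen for illustration; the response layout assumed here (query, pages, imageinfo, extmetadata, with each field holding a "value" entry) should be checked against a live response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://commons.wikimedia.org/w/api.php"
DEFAULT_KEYS = ("Credit", "Artist", "LicenseUrl", "Copyrighted")


def build_extmetadata_url(title):
    """Build an imageinfo query URL that asks for extmetadata."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_fields(response, keys=DEFAULT_KEYS):
    """Pull the requested extmetadata values out of a decoded response."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            for key in keys:
                if key in meta:
                    result[key] = meta[key].get("value")
    return result


def fetch_extmetadata(title):
    """Fetch and extract the fields for one file (performs a network call)."""
    request = Request(
        build_extmetadata_url(title),
        # Placeholder User-Agent; replace with one identifying your tool.
        headers={"User-Agent": "ExampleMetadataReader/0.1 (placeholder)"},
    )
    with urlopen(request) as resp:
        return extract_fields(json.load(resp))


# Usage (not run here because it needs network access):
# print(fetch_extmetadata("File:Example.jpg"))
```

Values such as Artist are often HTML fragments rather than plain text, so callers usually need to strip or parse the markup before display.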
Scripts that use machine readable data
- MediaWiki:Gadget-Stockphoto.js
- MediaWiki:GallerySlideshow.js
- MediaWiki:Gadget-AddInformation.js
- MediaWiki:FileContentsByBot.js
External tools
See also
- Category:Templates generating microformats
- Commons:WikiProject Microformats
- Category:Files with lack of machine-readability
- Experimental and discontinued projects: Commons:API, Commons:Commons API
Defining new machine readable data
- Do NOT use HTML ids, use classes. An id can only be used once per page and most of these fields can occur multiple times per page. Consider for instance descriptions of derivative works, which can include information about the original and the derivative.
- When possible, wrap the actual data, not some field header. Wrapping the field header is the method historically used for all our Information templates, but it is much harder to support in the long run.
- Wrap data, not the way the data is formatted.
- Expect that formatting is lost when converting to data. Visual dress up is not part of the information.
- Don't wrap multiple units of information inside one field. There is a difference between a publication date and a creation date. Both are dates, but both are different 'data fields'. Also CC BY-SA-4.0-3.0-2.5 is not a license name, those would be 3 licenses with the name CC BY-SA-##.
- Make sure that the data value has one unit, or outputs one consistent unit.
Problems
There are a few things that are currently NOT or badly recognizable. These include:
- Derivative works
- Works included in works. See also Category:FoP_templates
- Licenses of derivatives or of works included in works are a mess.
- Author vs. Copyright holder
- usernames vs 'real names'
- Catalogue IDs etc
- VRTS permissions
- Publication date vs creation date
- Donating institutions of materials
- Anything that is NOT using the above structures is not recognizable at all and will require manual cleanup at some point.
- Heirs: {{Heirs-license}}
- Multilicensed CC works, that use {{Cc-by-3.0,2.5,2.0,1.0}}, {{Cc-by-sa-2.5,2.0,1.0}}, {{Cc-by-sa-4.0,3.0,2.5,2.0,1.0}} or {{Cc-by-all}}.
- Non-licensed works: {{Copyrighted free use}}, {{Attribution}} (Problem: how to describe this grant of rights successfully?)
- Improvised File description templates like User:Tevaprapas/Information
- Templates denoting the copyright of partials of the work: {{Copyright information}}