Template talk:Book/2020

From Wikimedia Commons, the free media repository

Wikidata parameter

@Jarekt: Should the wikidata parameter receive the Wikidata ID of the book (as just a book) or the Wikidata ID for the specific edition that has been uploaded to Commons? I'm trying to figure this out for File:The Life of the Spider.djvu. Should I use Q64992886 or Q51499529? I guess another way to phrase this question is, what data from Wikidata about the book is used by this template? Kaldari (talk) 17:48, 29 August 2019 (UTC)

Kaldari, I pull a lot of fields. In addition to whatever you might pull for {{Artwork}} (as those 2 templates share most of the code) I pull publication date, publisher, printer, translator, place of publication, etc. Items with instance of (P31) = version, edition or translation (Q3331189) are usually the best option, so I would pick The Life of the Spider (Q51499529) over The Life of the Spider (Q64992886), but the second one would be fine too. --Jarekt (talk) 23:31, 29 August 2019 (UTC)
@Jarekt and Kaldari: I think it would actually be useful to have more nuance here. For what we call "books" there are three main things we typically need to refer to: the work as an abstract thing; a particular edition of that work, that may have various differences from other editions; and physical copies of a particular edition of a particular work.
For the latter, think of Shakespeare's w:First Folio: the extant copies are catalogued in exhaustive detail in multiple independent tomes (there are at least three 1000+-page independent surveys of extant folios), and the w:Folger Shakespeare Library even has a registry of distinguishing physical characteristics of each copy detailed enough that they can pinpoint which specific copy they're dealing with (there was a famous recent case where someone tried to sell a stolen First Folio). Last time I went looking I found 23 of them that had been digitised and put online.
But for most scans uploaded here we don't really care which physical copy it is (and trying to model that at WD is tedious). And for some we may not have the information to say specifically which edition it is, but we know the work and the work might even be notable enough to have a Wikipedia article (e.g. w:The Plays of William Shakespeare) which it would be good to link it to. Linking a copy or an edition to a work on WD is incorrect (it asserts an "isa" relationship that is false).
I think it would be good to reflect this nuance by something like keeping "|wikidata=" as an alias for "|wikidata-work=" with semantics "is associated with", and adding "|wikidata-edition=" and "|wikidata-copy=" whose semantics are more specifically that the file is a representation of that particular edition or copy. That would be backward compatible, if imprecise, but still allow us to express the more granular and accurate relationships. Or, put another way, it would let us query for "all copies of the First Folio" or "all editions of Hamlet" and similar.
Thoughts? --Xover (talk) 09:42, 1 July 2020 (UTC)
Xover, yes we could add all those extra parameters and possibly display them in a way that would make clear what they are. However, I think a better way would be to model it clearly on Wikidata and then pull it from there. I have not looked at how those relationships are modeled there, but if there is some way to unambiguously move between work, edition and specific copy it would be great. The Book template should be linked to the most specific one. By the way, there is also a 4th level with items like The Mysterious Stranger (Q2582895) or Diary of Anne Frank (Q6911) where multiple competing texts exist. --Jarekt (talk) 13:31, 1 July 2020 (UTC)
@Jarekt: See d:WD:WikiProject Books. It is indeed possible to traverse the hierarchy to get from more specific (copy) to most general (work), but, for obvious reasons, not vice versa. I agree we should use the most specific, but that's qualified with "the most specific that we care about in this case". For most cases we simply do not care which particular copy it is (a mass-produced book), and for many other cases we either do not care or do not know which specific edition it is. There is a need to be able to express all these relationships through parameters that establish a link to Wikidata (i.e. a QID) with semantics that express what the relationship is (because all scans are inherently copies, but most of the time it is either the edition or work that can be identified).
I'm not sure what you mean by "4th level" though. Those Wikidata items both appear to be a work (which incorrectly has a publisher or date of publication property; those belong on the edition). Without studying the issue in any depth, in my experience most variations can be modelled as an edition (including translations: they're just an edition in a different language and with an extra author called a "translator"). I could be wrong.
PS. and BTW: Structured data for Commons + a custom UI ala. Index: namespace at Wikisource (see e.g. s:Index:A School History of England (1911).djvu when editing) would be a great way to deal with metadata for books. The WS bit is provided by mw:Extension:ProofreadPage and uses a template to define the fields displayed in the custom UI (i.e. something like Template:Book on steroids ;D). --Xover (talk) 14:07, 1 July 2020 (UTC)
Xover,
  • By most specific I mean, that our file is of some unique copy of the book so it should be linked to the most specific item we have, be it work, edition or specific copy
  • The fourth level is "literary work" for which multiple manuscripts exist or which does not have a single universally agreed version. "Mysterious Stranger" by Mark Twain was never finished and we have multiple versions of the text; the published versions might be based on different drafts and can vary a lot. Same with "Diary of Anne Frank" or even the Bible, where we have a different set of books depending on your denomination and very different translations.
  • I agree that specific files on commons might store in SDC more data about specific copy which might not be relevant to more generic Wikidata item. We should use it more
  • Index namespace on Wikisource might be great but it is not usable to me as I do not think I can access information stored there. The best thing from the Commons perspective would be to copy whatever data is locked there into the common repository on Wikidata, the way we have for years been moving data locked in individual file infoboxes to Wikidata.
I am open to any specific suggestions about how to improve the {{Book}} template, but at this point most of the effort is moving away from more input fields in the template and towards better use of SDC and Wikidata. At this point I am doing too many things, and do not have time to devote to researching ways to improve the Book template, but will make time if someone wants to cooperate on designing new code and then integrating their code into the Book code. --Jarekt (talk) 17:39, 1 July 2020 (UTC)
Agree to giving only the most specific item in a parameter here. The more general items can be found via exemplar -> exemplar of (P1574) -> edition -> edition or translation of (P629) -> work. Then data can be pulled from the more general items, too. --Marsupium (talk) 18:43, 4 July 2020 (UTC)
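The traversal described above (exemplar, then edition, then work) might be sketched as follows. This is only an illustration under stated assumptions: the claims live in a plain Python dictionary rather than being fetched from Wikidata, the item IDs `Q_COPY`, `Q_EDITION` and `Q_WORK` are placeholders rather than real items, and `chain_to_work` is a hypothetical helper name, not part of any existing module.

```python
# Property IDs named in the discussion above.
EXEMPLAR_OF = "P1574"                # exemplar -> edition
EDITION_OR_TRANSLATION_OF = "P629"   # edition -> work

# Hypothetical stand-in for Wikidata claims; the item IDs are placeholders.
SAMPLE_CLAIMS = {
    "Q_COPY": {EXEMPLAR_OF: "Q_EDITION"},
    "Q_EDITION": {EDITION_OR_TRANSLATION_OF: "Q_WORK"},
    "Q_WORK": {},
}


def chain_to_work(qid, claims):
    """Return [qid, ..., most general item] by following P1574, then P629."""
    chain = [qid]
    seen = {qid}  # guards against accidental loops in the data
    current = qid
    while True:
        item = claims.get(current, {})
        parent = item.get(EXEMPLAR_OF) or item.get(EDITION_OR_TRANSLATION_OF)
        if parent is None or parent in seen:
            return chain
        chain.append(parent)
        seen.add(parent)
        current = parent


print(chain_to_work("Q_COPY", SAMPLE_CLAIMS))
```

Starting from the most specific item, the template would then be free to pull edition-level fields (publisher, date) from the second element and work-level fields (author, title) from the last one.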

Changes to template to accommodate manuscripts

Would it be possible and advisable to adapt the book template to accommodate manuscripts? Some necessary information would be: Author, Title, Language, Scribe, Illuminator, Place of creation, Date of creation, Material, Size, Binding, Object history, Exhibits, Institution, Access number, Source, Permission, Notes, References Venicescapes (talk) 16:18, 10 January 2020 (UTC)

@Venicescapes: I think manuscripts may be better treated as objects in a collection, using the {{information}} template which already has a lot of the parameters you list above. Manuscripts are not really a good match for the "book" model, and, I believe, archives and museums tend to make the same distinction for that very reason. --Xover (talk) 09:46, 1 July 2020 (UTC)

Where IA link given then linking to jp2 zip components at IA

@Jarekt: I have noticed that IA now makes each page of a work directly available at an apparently stable URL, where the page is available as a jp2 file through the "view details" listing of the .zip file. You can see an example at https://archive.org/download/whofearstospeako00cuma by following the view details link next to the jp2.zip entry.

The formulaic string to the view details is … https://archive.org/download/<ia-identifier>/<ia-identifier>_jp2.zip/
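As a minimal sketch of that formula, assuming the item follows the naming convention (the function name `jp2_zip_listing_url` is made up for illustration and is not part of {{IA}} or any existing tool):

```python
from urllib.parse import quote


def jp2_zip_listing_url(ia_identifier):
    """Build the conventional URL of the jp2 zip listing for an IA item.

    This only applies the naming pattern; it does not check that the
    zip file actually exists for the given item.
    """
    ident = quote(ia_identifier.strip(), safe="")
    return f"https://archive.org/download/{ident}/{ident}_jp2.zip/"


print(jp2_zip_listing_url("whofearstospeako00cuma"))
```

For the example item this yields https://archive.org/download/whofearstospeako00cuma/whofearstospeako00cuma_jp2.zip/.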

These jp2 files are usually the best-quality source available from which we can extract and clean up individual images. I believe it would be really helpful if we could leverage the {{IA}} data that many have used within the template to provide a direct link to the jp2 listing page, with some encouragement to use it to extract the images.

I will be looking to see what we can do on the Wikisource side, however, there it is harder as the pages are more removed from the metadata, so anything would be more manual at this stage, especially with the backlog of the broken linkage triangles between Commons/Wikidata/Wikisource.  — billinghurst sDrewth 06:52, 14 November 2019 (UTC)

billinghurst, books on Commons got very little attention over the years. They often do not have very good metadata and great many are not connected to wikidata, so they do not benefit from metadata there. If you or other wikisource experts come up with a way to get more out of {{IA}} or to improve {{Book}} I can help with coding it up or deploying, but at the moment I am kind of swamped with adding Structured data support to Module:Artwork and Module:Information. --Jarekt (talk) 13:53, 15 November 2019 (UTC)
@Jarekt and Billinghurst: (and CC tpt and Samwilson regarding ia-upload, see below) Just noting for the future: I looked into this a while back and found that while the convention described above is indeed good practice at IA, it is just a convention that is not enforced and a significant number of works there do not adhere to it. We probably (probably; I haven't checked specifically) can get there by querying their API (which might involve downloading a .xml or .json file and parsing it!) but that would in practice have to be done in JavaScript (or, I suppose, in a periodic bot run). It might be a worthwhile non-default Gadget to have for those of us who work a lot with books, but is probably not a good general / default solution.
But as I write this it occurs to me that there are several reasons it would be good to have easy access to the various files that you get from an IA link. Extracting high-res images for use in a transcription on Wikisource is one, and similarly (but not exactly the same use case) to illustrate a Wikipedia article on some subject. But the PDF and DjVu files IA generates from the scan images are also scaled down and recompressed, so we sometimes need to check the individual scan images or even the raw uncorrected scan images (raw scans are color-corrected, rotated, and cropped; and the process sometimes gives bad results). And for some works we might want to regenerate a new DjVu file at full resolution or with new OCR (non-English scripts are not always correctly detected; or a newer OCR engine will give better OCR results than whatever IA used in 2006).
Maybe it would be worthwhile to mirror all the IA data here for a given work, with some convention for where the files will live relative to the PDF or DjVu? A category with a given naming convention or something. That would make it feasible to have something croptool-ish designed specifically to extract an illustration from a book for use on Wikisource, or an easy automatic link to at least the directory of page images, even if we still can't link a specific image. If implemented by the ia-upload tool, and backfilled by a bot based on files using a valid {{IA}} template, it might be workable at scale.
It's a bigger thing than a template tweak, of course, so mostly just throwing the idea out there. --Xover (talk) 10:10, 1 July 2020 (UTC)
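For the API lookup mentioned in this thread, a rough sketch of the parsing step might look like the following. It assumes the item's metadata has already been downloaded and parsed as JSON into a dictionary with a `files` list whose entries carry a `name` key; that shape, the sample data, and the helper name `find_jp2_zip` are assumptions for illustration, and the fetching itself is left out.

```python
def find_jp2_zip(metadata):
    """Return the name of the page-image zip listed in item metadata, or None.

    Scanning the file list avoids relying on the zip being named after
    the item identifier, which is only a convention.
    """
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if name.lower().endswith("_jp2.zip"):
            return name
    return None


# Hypothetical metadata for an item whose files do not use the identifier
# as their prefix.
sample_metadata = {
    "files": [
        {"name": "scan0001.pdf"},
        {"name": "scan0001_jp2.zip"},
        {"name": "scan0001_djvu.txt"},
    ]
}

print(find_jp2_zip(sample_metadata))
```

A gadget or bot could fall back to the formulaic URL only when no such entry is found.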
@Xover: That is what Hesperian did at enWS, and part of the reason for not bringing the images here was that they needed cleanup to multiple extents. I note that Fae has already brought many images in. I would also note that IA can have multiple editions of the same work, and some have better OCR, some have better scans, others have poor image scans, so some issues will be solved, some will be complications.  — billinghurst sDrewth 16:09, 5 July 2020 (UTC)
@Billinghurst: Hesperian pulled in only the pages with illustrations, as detected by the enWS "raw image" template. My thought here was more the brute force approach: pull in the whole zip file with all the scan images from IA. It won't get us a direct link to the correct scan image, but we could get a link to the per-work category where all pages from that work live in a format that's convenient (vs. PDF/DjVu). A lot fewer clicks and more navigable than trying to find your way at IA. And since the image is now inside the Wikimedia ecosystem, it's feasible to have something similar to CropTool that streamlines extracting an illustration from a page scan and makes it available for use in our transcription of that page (without downloading it at IA; editing it in an image editor; uploading it to Commons; fiddling with the information templates here; jumping over to enWS to use it; etc.). Plus it'd work equally well for all language Wikisources, not just enWS. --Xover (talk) 17:03, 5 July 2020 (UTC)
@Xover: Are you aware of the upload efforts of other Commons contributors? I will also mention https://phabricator.wikimedia.org/T257025 here. ShakespeareFan00 (talk) 09:27, 5 July 2020 (UTC)

Contributors beyond author/editor/translator/illustrator

I keep running into cases where I need to abuse one of the author/editor/translator/illustrator fields to be able to document a contributor to a work. This is desirable for both bibliographic metadata and for documenting copyright status for works first published in a pma-based copyright jurisdiction.

The two immediate cases are books with an introduction or a foreword by a different contributor than the main author; and collections of works by multiple authors (but typically edited by a single main editor).

The former is exemplified by File:Lessingsnathanwi00lessrich.pdf. Its main content is a play authored by Gotthold Ephraim Lessing (author); the edition was edited by Ernest Bell (editor); and it contains a foreword by Edward Brooks, Jr. (author, but not the main author).

The latter is exemplified by File:Farthest North, vol. 1 (1897).djvu, which is written by Fridtjof Nansen (author) and Hjalmar Johansen (author), but contains an appendix written by Otto Sverdrup (contributing author). Volumes 3 and 4 of that series are also the "scientific report" from Nansen's polar expedition, with each chapter being a report from a different scientific specialty expedition member, and are still in copyright in Norway (where they were first published) because some of the authors are still within the pma 70 term. Or take the general case of a collection of essays or short stories: the collection has an editor, but each chapter has an independent author and is a distinct work. This would also be the case for a typical magazine or newspaper: each article has a separate author (and independent copyright). Early Sherlock Holmes stories were serialised in The Strand Magazine, for example.

enwp has w:Module:Citation/CS1 (which implements all the other main citation templates), which supports parameters like "|editor1-last=", "|editor1-first=", "|editor2-last=", etc. for granular metadata (separate first and last names, and an arbitrary number of editors). And the safety valve there is "|contributor1-last=" and a "|contribution=" parameter to describe the contribution. Since that template is for citations to one specific work (either the foreword, or a chapter, or an appendix; but never more than one of them) it only supports a single contribution parameter. But this might be one possible way to handle it here.

Support multiple author/editor/translator/illustrator fields indexed by "|author1=", "|author2=" (first/last isn't needed here if creator templates or WD QIDs are used, which I think should be preferred). And then add "|contributor1=", "|contributor2=" that are matched with "|contribution1=", "|contribution2=", which take values like "Foreword", "Introduction", "Appendix", or simply a chapter or essay title.

However it's done it should support an arbitrary number of authors/contributors/etc. since the list can easily be several tens of people, and in extreme (but not exceedingly rare!) cases it can run into the hundreds (think along the lines of a collection of short poems: one poem, and thus author, per page, for up to a thousand pages; or a collected volume of a daily newspaper for an entire year).
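To make the indexed-parameter idea concrete, here is a small sketch of how "|contributorN=" could be paired with "|contributionN=" for an arbitrary N. The parameter names come from the proposal above; the function `collect_contributors` and the dictionary-of-parameters input are hypothetical, and a real implementation would live in the template's Lua module rather than in Python.

```python
import re

CONTRIBUTOR_KEY = re.compile(r"^contributor(\d+)$")


def collect_contributors(params):
    """Pair each contributorN with its contributionN, ordered by N.

    A contributor without a matching contribution gets an empty role.
    """
    indexed = []
    for key, value in params.items():
        match = CONTRIBUTOR_KEY.match(key)
        if match and value.strip():
            n = int(match.group(1))
            role = params.get(f"contribution{n}", "").strip()
            indexed.append((n, value.strip(), role))
    # Sort numerically so that contributor10 comes after contributor2.
    return [(name, role) for _, name, role in sorted(indexed)]


# Parameters modelled on the Lessing example described above.
example = {
    "author1": "Gotthold Ephraim Lessing",
    "editor1": "Ernest Bell",
    "contributor1": "Edward Brooks, Jr.",
    "contribution1": "Foreword",
}

print(collect_contributors(example))
```

Because the index is parsed from the key rather than enumerated up to a fixed limit, the same loop works for one contributor or for several hundred.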

@Jarekt: Thoughts? --Xover (talk) 09:06, 1 July 2020 (UTC)

Xover, the {{Book}} / {{Artwork}} etc. templates were always designed to have (for the most part) one input parameter per displayed field. However, those fields did not have many restrictions on the format of the content (a major difference from various citation templates) and users were allowed to hand-craft it as much as they wanted. That philosophy is not likely to change with those old templates used on millions of pages. We can write new templates that do things differently, but I do not want to be adding any new input parameters to the template. However, if you manage to express some of those nuances in the Wikidata model, we can work on code to present such models within the current template. Module:Wikidata art has a lot of customized code to read Wikidata models (mostly for artwork) and present them within {{Artwork}}; we can do the same with the {{Book}} template. --Jarekt (talk) 13:50, 1 July 2020 (UTC)
Xover and Jarekt, there are author of foreword (P2679) and author of afterword (P2680). For more specific indications one of the existing properties can be used with object has role (P3831) or applies to part (P518) qualifiers. It would be nice to then pull all that information with {{Artwork}} and {{Book}}. --Marsupium (talk) 18:48, 4 July 2020 (UTC)
That does sound like a good idea. I can look into it, but a faster way would be if someone proposed a modification to Module:Wikidata art. --Jarekt (talk) 03:01, 5 July 2020 (UTC)
Marsupium, I added support for Wikidata's author of foreword (P2679) and author of afterword (P2680), and for title page number (P4714) SDC property. --Jarekt (talk) 03:30, 10 July 2020 (UTC)

Multi-language books

Is there any way to correctly describe multi-language books? Many 19th-century books from Russia use two languages, the official one and some European one (like that one, which has two parallel texts in Russian and French). There are also vocabularies, grammars, etc. — Preceding unsigned comment added by Artem.komisarenko (talk • contribs) 06:14, 03 October 2016 (UTC)

@Artem.komisarenko: Fixed by User:Jarekt on 9 October 2020, though marking a book this way requires familiarity with Wikidata—you must create an entry for it there, if one does not already exist, and give the property language of work or name (P407) multiple values, one for each language. File:Swahili tales.djvu, which corresponds to d:Q99526042, is an example.
(See Module talk:Artwork#Could this template be made compatible with multi-language works? for technical fix details for the template.) --truthious andersnatch 07:52, 10 October 2020 (UTC)

Link to TOC, index, etc in djvu/pdf

Once properties are chosen for these (Properties_table#djvu%2Fpdf_files), it could be interesting to present direct links to these.

Either with or without a (small) thumbnail. Jura1 (talk) 12:18, 16 July 2020 (UTC)

Compare with d:Q96643524#P2670
Instead of P2670, a specialized property could be created. That would avoid having to sanity check all P2670 values/allow applying constraints for P7668.
@Jarekt: what do you think? Jura1 (talk) 06:22, 19 July 2020 (UTC)
Jura1, I think it would be great to display a link to the TOC or index if such data was stored. I guess it would need to be stored in SDC instead of Wikidata as those would be file properties. At the moment {{Book}} is able to access and use title page number (P4714) to show an icon of the title page, but we certainly can also show other key pages. The key issue would be to identify such pages in a large number of djvu/pdf files. Maybe that is something that can be scraped from Wikisource pages? --Jarekt (talk) 18:46, 20 July 2020 (UTC)
  1. Yes, storing it here is preferable, especially as it should eventually be possible to render files with it.
  2. For the TOCs it may be possible to retrieve it from Wikisource. Wikisource index pages have a field that can link to it.
  3. There is a property proposal on some aspects of index pages.
  4. Many djvu/pdf of BHL may never have such pages however. The question is then if we should create similar index pages here (or maybe the pagination part, thus the property proposal).
  5. Actual data would still need to be filled in. This would be manual or assisted with some tool, unless a way to import it from elsewhere exists. IA usually seems to know the location of the title page (even if it can be off by 1 or 2). Maybe this is available somewhere and can be reformatted.
    Jura1 (talk) 21:55, 20 July 2020 (UTC)

Scanned periodicals

Some files in "Files from the Biodiversity Heritage Library" that use this template are several bound issues of periodicals combined.

Should these use issue (P433) or page(s) (P304), possibly with start time (P580) and end time (P582), to indicate the range(s)? Jura1 (talk) 13:17, 20 July 2020 (UTC)


Row with initial pages

To check a pdf, I think a row with thumbnails of the initial pages could be handy, e.g. for this file, it would look like:

Initial pages

Maybe even a more compact view is possible. I suppose this should go in Module:Artwork somewhere. --- Jura1 (talk) 10:14, 22 July 2020 (UTC)

That would be fine with me, but let me ping some other people lately active in this forum. @Billinghurst, Xover, and Marsupium: . --Jarekt (talk) 13:46, 22 July 2020 (UTC)
Hi, no opposition, but I'd be restrained. I first saw this on mobile, where the pages are all displayed in a column, and in desktop view it also takes a lot of space. So maybe a smaller size, not too many, or collapsed by default? --Marsupium (talk) 16:20, 22 July 2020 (UTC)
Good point. Maybe adding class="wdinfo_nomobile" is sufficient? Did that just now. --- Jura1 (talk) 18:19, 22 July 2020 (UTC)

 Comment: For many of the books that WS hosts, displaying pages from the very start of the file is not going to be that useful, as there is much emptiness at the beginnings and ends. Typically the title page is the start of the useful content, and occasionally the page before it is a frontispiece, so it could have value.

If we were going to do something, I am not convinced that it should be the default. Could it be an option that loads pages on request? Say, based on the image page: display the image page, then -1, +1, +2, +3 as the default, then give users the option to show more (+4, +5, +6, +7, +8 from the image page).  — billinghurst sDrewth 20:58, 22 July 2020 (UTC)
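The offset scheme suggested above might be sketched like this. Reading "image page" as a known title page number, and including that page itself in the default row, are my assumptions; `pages_to_show` is a hypothetical helper, and the actual row would presumably be produced in Module:Artwork.

```python
DEFAULT_OFFSETS = (-1, 0, 1, 2, 3)   # page before, the page itself, next three
MORE_OFFSETS = (4, 5, 6, 7, 8)       # shown only on request


def pages_to_show(title_page, page_count, offsets=DEFAULT_OFFSETS):
    """Return the 1-based page numbers to thumbnail around the title page.

    Offsets that fall outside the file (before page 1 or past the last
    page) are silently dropped.
    """
    pages = []
    for offset in offsets:
        page = title_page + offset
        if 1 <= page <= page_count:
            pages.append(page)
    return pages


print(pages_to_show(7, 300))                # default row
print(pages_to_show(7, 300, MORE_OFFSETS))  # "show more" row
```

Files without a known title page would need a different fallback, for example the first few pages of the file.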

  • I think for most books, we don't know which page is the title page, and thus the advantage of this row. We can already do +1/-1, but that doesn't really help in finding it. --- Jura1 (talk) 06:32, 23 July 2020 (UTC)
I guess we could skip books that are used by Wikisource, but I don't think there is currently a way of knowing what they are (at least for templates/SDC). --- Jura1 (talk) 06:48, 23 July 2020 (UTC)

Multiple languages

Multiple languages is still a problem, even with values pulled from Wikidata. It can handle multiple "Object type" values properly but here it seems to just choose the first language alphabetically, even though there are two normal-ranked values on Wikidata. --truthious andersnatch 03:39, 12 September 2020 (UTC)

Fixed 8 October 2020, see details above. --truthious andersnatch 07:52, 10 October 2020 (UTC)
Sorry I thought I left a message here about fixing it. --Jarekt (talk) 21:59, 10 October 2020 (UTC)