Commons:Library back up project

From Wikimedia Commons, the free media repository


  • 213 BC
    English: Qin Shi Huang destroyed all privately held unorthodox books by fire.
    中文:公元前213年,秦始皇用火焚毁了所有私藏的非正统书籍。
  • 206 BC
    English: Xiang Yu set fire to the government library, which held unique copies of books, sounding the death knell of ancient Chinese thought and history.
    中文:公元前206年,项羽放火烧毁了藏有书籍孤本的政府图书馆,这标志着中国古代思想和历史的死亡。
  • 1408 AD
    English: The Yongle Encyclopedia was completed, comprising 22,937 chapters in 11,095 volumes and 917,480 pages. Only one copy of the original was ever made. Most of it has been lost to history, and only about 800 chapters survive today.
    中文:《永乐大典》于1408年完成,全书22937章,11095卷,917480页。在制作了原本之后仅制作了一份副本。它们中的大多数都已在历史中消失,如今仅存约800章。
    Works related to zh:永乐大典 at Wikisource
  • 1932 AD
    English: 463,000 rare books from Han Fen Lou were burned in war.
    中文:1932年,战火焚烧了46.3万册涵芬楼善本。

English: To prevent such regrettable losses, which destroy the memory of mankind, from ever happening again, let's systematically back up all of the world's surviving public domain books to Wikimedia Commons.
中文:为了防止这种破坏人类记忆的令人遗憾的事情再次发生,让我们系统性地将世界上所有公有领域的书籍备份到维基共享资源。

list of destroyed libraries

English: We hope that, eventually, all ancient books will be uploaded to Wikimedia Commons, transcribed on Wikisource and cited in Wikipedia.
中文:我们希望,最终所有古代书籍都将上传到维基共享资源、在维基文库中转写成文本、在维基百科中受到引用。

English: Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge.
中文:想象一个世界,在这个世界上,地球上的每一个人都可以自由访问所有人类知识的总和。

English: Jimmy Wales
中文:吉米·威尔士

2,468,839 files' worth of books saved!
as of 15:45, 26 April 2024.

Total file size: 109.93 TB

Benefits

  1. One more backup means one more chance that the books survive, which benefits the preservation of our civilisations. Wikimedia Commons allows web crawlers, so interested readers can easily make yet another copy.
  2. The books become more accessible to readers around the world. Some library websites can be very slow for foreign readers.
  3. The books can be transcribed by users on Wikisource.
  4. Full-page images can be used directly as illustrations in Wikipedia. If an image needs to be cropped first, the cropped version can be categorised and found more easily later.
  5. The books can be linked as references in other Wikimedia projects.
  6. The books can be easily annotated and categorised by volunteers on this site, as it uses the MediaWiki system.

Standards

Quality

English: To preserve history, books should be saved in the highest quality available from the source website.
中文:为了保存历史,书籍应该以来源的最高清晰度保存。

Naming

The name of the book file should contain 3 parts:

  • Source abbreviation.
  • Source ID.
  • Name and volume of the book.

For example:

CADAL02079034 明史(一).djvu
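
The three-part convention can be sketched in Python as follows. This is a minimal illustration, not a tool used by the project: the function names and the regular expression are hypothetical, and it assumes (based only on the single example above) that the source abbreviation and source ID are joined without a separator, that the ID starts with a digit and contains no spaces, and that one space precedes the book name.

```python
import re
from typing import NamedTuple


class BookFileName(NamedTuple):
    source: str      # source abbreviation, e.g. "CADAL"
    source_id: str   # identifier assigned by the source
    title: str       # name and volume of the book
    extension: str   # file extension without the dot


def build_file_name(source: str, source_id: str, title: str, extension: str) -> str:
    """Join the three parts in the layout seen in the example above."""
    return f"{source}{source_id} {title}.{extension}"


# Assumed layout: letters/dots for the abbreviation, then an ID that starts
# with a digit and has no spaces, one space, the title, and the extension.
_PATTERN = re.compile(
    r"^(?P<source>[A-Za-z.]+)(?P<source_id>\d\S*) (?P<title>.+)\.(?P<extension>[A-Za-z0-9]+)$"
)


def parse_file_name(name: str) -> BookFileName:
    """Split a file name back into its parts, or raise ValueError."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not in the expected naming layout: {name!r}")
    return BookFileName(**match.groupdict())


if __name__ == "__main__":
    name = build_file_name("CADAL", "02079034", "明史(一)", "djvu")
    print(name)
    print(parse_file_name(name))
```

Sources whose identifiers are not numeric (the page notes that 3GM has no consistent numeric identifier) would not fit this pattern and would need their own handling.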

Categorisation

Books should be categorised under names in their original language. Categorisation is necessary because many books have multiple volumes or several editions. For consistency, books that currently have only a single volume and a single edition should also be categorised, as derivative files may appear in the future. The categories should contain {{Category for book}} or {{Category for book series}} so that they can be identified as such. Genre categories should be placed on the book's category page rather than on the file page. However, if a category applies only to certain volumes of a book, it can be placed on the file pages of those volumes.
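
As a rough illustration of where each category goes, the sketch below generates wikitext for a book's category page and for one of its file pages. The helper names and the category names in the usage lines are hypothetical, and the templates are written without parameters, which is an assumption; the actual templates may expect parameters.

```python
def category_page_wikitext(genre_categories, series=False):
    """Wikitext for a book's category page: marker template plus genre categories."""
    template = "{{Category for book series}}" if series else "{{Category for book}}"
    lines = [template]
    lines += [f"[[Category:{name}]]" for name in genre_categories]
    return "\n".join(lines)


def file_page_categories(book_category, volume_only_categories=()):
    """Wikitext for a file page: the book's own category, plus any
    categories that apply only to this particular volume."""
    names = [book_category, *volume_only_categories]
    return "\n".join(f"[[Category:{name}]]" for name in names)


if __name__ == "__main__":
    # Hypothetical category names, for illustration only.
    print(category_page_wikitext(["History books"]))
    print(file_page_categories("明史", ["Maps"]))
```

The genre category appears only in the first output; the file page carries the book category and, where needed, a volume-specific one.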

Help / 帮助

  • img2pdf - Convert images to PDF without re-encoding or losing quality. 将图片转换为pdf的python包。
  • use in other projects - How to insert books in Wikipedia and transcribe them on Wikisource. 如何将图书插入维基百科,在维基文库转写为文本。
  • how to download - You can, and are welcome to, download individual books or the entire collection. 您可以下载几本或所有图书,欢迎!没有限制。
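
A minimal sketch of combining page images into a PDF with img2pdf is shown below. It assumes the page images sit in one directory and that their file names sort into reading order once embedded numbers are compared numerically; the helper names `natural_key` and `images_to_pdf` are illustrative and not part of img2pdf. The only img2pdf call used is `img2pdf.convert`, which accepts a list of image file names and returns the PDF as bytes.

```python
import re
from pathlib import Path


def natural_key(name: str):
    """Sort key that compares embedded numbers numerically,
    so 'page2.jpg' comes before 'page10.jpg'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def images_to_pdf(image_dir: str, pdf_path: str) -> int:
    """Combine the images in image_dir into one PDF; return the page count."""
    import img2pdf  # third-party package: pip install img2pdf

    pages = sorted(
        (p for p in Path(image_dir).iterdir()
         if p.suffix.lower() in {".jpg", ".jpeg", ".png"}),
        key=lambda p: natural_key(p.name),
    )
    if not pages:
        raise ValueError(f"no page images found in {image_dir}")
    with open(pdf_path, "wb") as pdf_file:
        # img2pdf embeds the image data as-is instead of re-encoding it.
        pdf_file.write(img2pdf.convert([str(p) for p in pages]))
    return len(pages)
```

A call such as `images_to_pdf("scans/volume1", "volume1.pdf")` (hypothetical paths) would write the PDF and return the number of pages. It is worth checking that count against the source, since incomplete downloads are easy to miss.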

Projects (libraries)

Project Country List of files Description Source characteristics Uploading status
CADAL

(usage)

 China Ancient

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15

Republican era

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

The China Academic Digital Associative Library (CADAL), https://www.cadal.cn/ , is a book digitisation project in China. The project was initiated by Chinese and US computer scientists and began in Dec. 2000 as the Million Book Digital Library Project (MBP), later known as the Universal Digital Library (UDL). In Sept. 2002, the project was renamed the China-America Digital Academic Library (CADAL). In Aug. 2009, it was renamed the China Academic Digital Associative Library. A million books were digitised in 2001-2006 and another 1.5 million in 2007-2012. Visit project background. CADAL digitised both old and new books. Volunteers would like to upload the public domain content to Wikimedia Commons to make the books more accessible. The initial uploads (Nov. 2019 for ancient books and late 2022 for Republican era books) came from files found on a web drive, which were presumably obtained before the official website began restricting file downloads.
It is now difficult to obtain newly digitised books from the site, since users may "borrow" only 3 books for a limited period.
The initial uploads from the netdrive have been completed.
NAJDA

(usage)

 Japan 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 The National Archives of Japan https://www.digital.archives.go.jp/ preserves Japanese government documents and historical records and makes them available to the public. The site includes many digitised documents and books. Most can be uploaded directly by URL, while large files (>50 MB) often fail and need to be downloaded to a PC and then uploaded. Chinese and Japanese books from the Cabinet Library scanned before mid-2020 have been completely uploaded.
WUL

(usage)

 Japan 01 02 03 04 05 06 07 08 09 10 11 12 failed Waseda University Library https://www.wul.waseda.ac.jp/ is one of the largest libraries in Japan. Files can be uploaded directly by URL. Completed
NCPSSD

(usage)

 China NCPSSD failed National Center for Philosophy and Social Sciences Documentation http://www.ncpssd.org/about.aspx is a digital book project initiated by the Chinese Academy of Social Sciences. Most files were downloaded from a netdrive and can also be downloaded from the website. Some files are not available from the website and were skipped. In June 2023, new files added to the website were uploaded here. Completed
Harvard-Yenching

(usage)

 USA 1 2 3 failed Harvard-Yenching Library, Harvard University. https://curiosity.lib.harvard.edu/chinese-rare-books Completed (Note: JPG downloads were occasionally incomplete, and such images appear blurred in the PDF. The uploader only noticed the issue when the uploads were almost finished. Reports of affected files are welcome, and the uploader will fix them.)
NLC

(usage usage usage usage usage usage usage usage usage usage usage usage usage usage usage)

 China 刊(01 02 03 04 05 06 07 08 09 10) 金藏 古(01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16) EB 書(01 02 03 04 05 06 07 08 09 10 11 PD2022(01 02 03 04 05)) 津(01 02 03 04 05) 文(01 02 03 04 05 06 07 08 PD2022(01 02 03)) 族1 族2 National Library of China (Q732353) in Peking maintains the China Ancient Books Resource Library with hundreds of thousands of scanned books or images available via the Internet. http://read.nlc.cn/thematDataSearch/toGujiIndex Metadata and files are scraped with custom scripts. No direct link is available. The metadata awaits further processing (e.g. extracting author names one by one and categorising accordingly). In progress (stats corrupted log)
ssid

(usage)

 China Ancient and Republican era books

1800-1899 1900-1909 1910-1915 1916-1920 1921-1925 1926-1928 1929-1930 1931-1932 1933-1934 1935 1936 1937-1938 1939-1940 1941-1942 1943-1944 1945-1946 1947-1948 1949-1950 irregular date

Likely to be in PD (most are reprints of old books)

1951-1980 1961-2000 1981-1990 1991-2000 2001-2005 2005-2010 2011-2015 2016-2019

Without date information in metadata and likely in PD

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16

From an online source that uses ssid as its identifier. Old books identified by three criteria were uploaded. Completed
3GM

(usage)

 China The Three Gorges Museum (Q10872783), located in Chongqing, digitised dozens of books, as listed at http://www.3gmuseum.cn/web/ancient/toAncient.do?itemno=50&itemsonno=8280819a60d452fe0160d46016c00016 . There is no plan to back up the antique pictures on its website for now. File naming does not follow the guidelines, since there is no obvious consistent numeric identifier. Completed (books only)
IOC.UTokyo

(usage)

 Japan 1 2 3 failed Scans from the Institute for Advanced Studies on Asia, the University of Tokyo. http://shanben.ioc.u-tokyo.ac.jp/list.php Completed
National Central Library

Taiwan eBook (usage)

Rare Books & Special Collections (usage)

 Taiwan Taiwan eBook

01 02 03 04 05 06 07 failed

Rare Books & Special Collections

rare 1 rare 2 Peking UW

Scans from the Taiwan eBook database, National Central Library. https://taiwanebook.ncl.edu.tw/ Books published in or before 1950 are uploaded. Scans from the Taiwan eBook database were obtained from the website. Files from Rare Books & Special Collections were downloaded from a netdrive. Completed
NDL

(usage)

 Japan file list failed The National Diet Library is the national library of Japan and among the largest libraries in the world. It was established in 1948 for the purpose of assisting members of the National Diet of Japan (国会, Kokkai) in researching matters of public policy. The website provides IIIF download and PDF download on request. Some rare books were downloaded using the former method and others using the latter. Completed
YNUTCM

(usage)

 China 1 Yunnan University of Chinese Medicine (Q8061337) Ancient Books Digital Library Hundreds of books are publicly available for now, totalling thousands of volumes. PDFs are created by combining page images with img2pdf. Some volumes are broken or missing. Completed
Naxi literature Database

(usage)

 China list Watermarks make the text difficult to read. PDF files are available. Completed
WZLib

(usage)

 China db: 01 02; oyjy Wenzhou Library (Q67499772)[1][2] More than 20k ancient books, featuring local Wenzhou books. The library hosts two separate websites on which most books are the same. A few files are missing (1, 2). Completed, with the roughly 1/5 that are non-PD omitted
Shanghai Library

(usage)

 China ancient books genealogy books Shanghai Library (Q616272) Downloaded from Baidu Netdrive. In progress

Projects (book collections)

Project Country List of files Description Source characteristics Uploading status
The Collected Works of Mahatma Gandhi (usage)  India The complete works of Mahatma Gandhi comprise 100 volumes in 3 languages: English, Hindi and Gujarati. All volumes are in the public domain in India. They used to be available on several websites, but access is now restricted. Some volumes have been deleted, and will be undeleted when the copyright in the USA expires. Wikisource: The Collected Works of Mahatma Gandhi Completed
The Single Tax / Land&Liberty  United Kingdom Scans are available at Henry George Foundation Land&Liberty is a quarterly magazine of popular political economics. Land&Liberty was launched in June 1894 under the title The Single Tax, published as “The Organ of the Scottish Land Restoration Union”. The periodical changed its title in 1902 to Land Values and subsequently in 1919 to Land&Liberty. Earlier issues are uploaded to Commons. Later ones are uploaded to Wikisource. Wikisource: Portal:Land&Liberty In progress
✓ Done until 1905
L'Illustration  France Scans are available at Internet Archive and Hathitrust: [3] L'Illustration was a weekly French newspaper published in Paris from 1843 to 1944. Although all issues are in the public domain in France, the publisher claims copyright and restricts access to the scans. Issues published after 1936 are still under copyright in the USA. Wikisource: L’Illustration (in French) In progress
✓ Done until 1852
Young India  India Scans are available on Hathitrust Young India was a weekly paper or journal in English founded by Lala Lajpat Rai in 1916 and published by Mahatma Gandhi from 1919 to 1931. Wikisource: Young India Completed
Project Blue Book  USA Scans are available on Internet Archive. It includes more than 12,000 files. Project Blue Book was one of a series of systematic studies of unidentified flying objects (UFOs) conducted by the United States Air Force. In progress
Mercure de France  France Scans are available at Internet Archive and Hathitrust Mercure de France was originally a French gazette and literary magazine first published in the 17th century. The gazette was published from 1672 to 1724 (with an interruption in 1674–1677) under the title Mercure galant (sometimes spelled Mercure gallant; 1672–1674) and Nouveau Mercure galant (1677–1724). The title was changed to Mercure de France in 1724. The gazette was briefly suppressed (under Napoleon) from 1811 to 1815 and ceased publication in 1825. The name was revived in 1890 for both a literary review and (in 1894) a publishing house initially linked with the symbolist movement. Wikisource: Mercure de France (in French) In progress
✓ Done until 1901
Camera Notes  USA Scans are available at Hathitrust Camera Notes was a photographic journal published by the Camera Club of New York from 1897 to 1903. It was edited for most of that time by photographer Alfred Stieglitz and was considered the most significant American photography journal of its time. Completed
Camera Work  USA Scans are available at Modernist Journals Project Camera Work was an American quarterly photographic journal published by Alfred Stieglitz from 1903 to 1917. Wikisource: Camera Work Completed
Dictionnaire de la noblesse  France Scans are available on Gallica (partial), Internet Archive (partial), and Hathitrust Dictionary on nobility published from 1863 until 1876 in 19 volumes Completed
Nobiliaire universel de France  France Scans are available at Internet Archive and Hathitrust Dictionary on nobility published from 1872 to 1877 in 21 volumes Completed
Œuvres complètes de Tolstoï  France Scans were not available on the Internet. These books were scanned by the Library of Geneva. The only publication of the complete works of Tolstoy in French, issued from 1902 to 1911. None of these is available today. The whole collection should have included 43 volumes, but volumes 23, 25, 29 to 35, and 38 to 43 were never published. Wikisource: Œuvres complètes de Léon Tolstoï (in French) Completed
Warren Commission Hearings and United States House Select Committee on Assassinations  USA Scans available at Internet Archive Important documents on the assassinations of J. F. Kennedy and M. L. King (12 volumes x 2, and 26 volumes). Completed
Journal asiatique  France Scans available at Gallica and Internet Archive The Journal asiatique is a biannual peer-reviewed academic journal established in 1822 covering Asian studies. It is one of the oldest continuous French publications. Wikisource: Journal asiatique (in French) In progress
Encyclopédie  France Scans from various sources The Encyclopédie was a general encyclopedia published in France between 1751 and 1772, edited by Denis Diderot and, until 1759, co-edited by Jean le Rond d'Alembert. Wikisource: Encyclopédie, ou Dictionnaire raisonné des sciences, des arts et des métiers Completed
Revue de linguistique et de philologie comparée  France Scans available at Internet Archive French Journal of Linguistics and Comparative Philology (1867 to 1916, 48 volumes) Wikisource: Revue de linguistique et de philologie comparée (in French) Completed
Catalogue général de la librairie française  France Scans available at Internet Archive French General Catalog of French Publications, 1867, 34 volumes Wikisource: Livre:Catalogue général de la librairie française, 1867, tome 1.djvu (in French) Completed
Revue de philosophie  France Scans available at Internet Archive French Philosophy Journal founded by Émile Peillaube (1900 to 1940) In progress
La Revue blanche  France Scans available at Gallica La Revue blanche was a French art and literary magazine that ran between 1889 and 1903 (30 volumes). Wikisource: La Revue blanche (in French) Completed
Recueil général des anciennes lois françaises  France Scans available at Internet Archive General Collection of old French laws, 1821, 29 volumes Wikisource: Recueil général des anciennes lois françaises, depuis l’an 420 jusqu’à la Révolution de 1789 (in French) Completed
Correspondance littéraire  France Scans available at Internet Archive Literary, philosophical and critical correspondence, 1753 to 1882, 31 volumes Wikisource: Correspondance littéraire, philosophique et critique (in French) Completed
Revue de l'histoire des colonies françaises  France Scans available at Gallica Journal of the History of the French Colonies, 1913 to 1922, 10 volumes Wikisource: Revue de l’histoire des colonies françaises (in French) Completed
Revue de l’histoire des religions  France Scans available at Internet Archive Journal of the History of Religions, 1880 to today Wikisource: Revue de l’histoire des religions (in French) In progress
Annales du Musée Guimet  France Scans available at Internet Archive Annals of the Guimet Museum, 1889 to 1938, ? volumes Wikisource: Annales du Musée Guimet (in French) In progress
Revue philosophique de la France et de l’étranger  France Scans available at Internet Archive Philosophical Journal from France and abroad, 1876 to 1937, 123 volumes Wikisource: Revue philosophique de la France et de l’étranger (in French) Completed
Revue pédagogique  France Scans available at Hathitrust Pedagogic Journal, 1878 to 1942, 125 volumes Wikisource: Revue pédagogique (in French) In progress
✓ Done until 1927
Correspondance générale de J.-J. Rousseau  France Scans available at Internet Archive General correspondence of J.-J. Rousseau, 1924, 20 volumes. Volume 15 is missing, but available at Hathitrust Completed
Revue internationale de l'enseignement  France Scans available at Gallica: 1881-1940 International Journal on Teaching, in French, 1878-1940 In progress
✓ Done until 1880
Statutes at Large (Ruffhead)  United Kingdom Scans available at Google Books and Hathitrust Owen Ruffhead's The Statutes at Large, and also Runnington's edition, listing the Acts of the Parliaments of England and the United Kingdom. Two series of 14 volumes (one missing), PDF and DjVu format. Wikisource: s:en:The Statutes at Large (Ruffhead) Completed
Book collection of John Geraghty Mostly UK Scans available at Internet Archive 34 super rare books uploaded by a private collector Completed
Project Blue Book Mostly USA Scans available at Internet Archive, but not at the original source +10,000 documents from the US Air Force about UFO sightings In progress
Upload mostly done; needs checking.
Chinese Old Book Photocopies  China Chinese Old Book Photocopies Some published photocopies of Chinese old books. Their quality is not as good as that of direct scans of old books. However, some rare books are only available in this form. In progress
LibriVox recordings Worldwide Recordings available at Internet Archive 18,277 items as of June 2023. Recordings proposed on Commons:Media of the day In progress

Note: even for completed collections, the sources should be checked for updates, and any updates should be recorded here.