Data talk:Memory of the World Register.tab

Category

I've added the talk page to Category:Memory_of_the_World_Register since I can't add the data page itself. --Hardwigg (talk) 01:01, 31 January 2019 (UTC)

Means of Collection

This data was collected semi-automatically: I moved through each page of the register by hand and ran the following code in the browser console of Firefox 64.0.2. Now that the method has been tested and works, the collection should be made fully automatic.

/** Load a module using unpkg.com. Returns a promise. */
function loadModule(name) {
  return new Promise((res, rej) => {
    var script = document.createElement('script');
    script.type = 'text/javascript';
    script.src = 'https://unpkg.com/' + name;
    script.onload = () => res(script.src);
    script.onerror = e => rej(e);
    document.head.appendChild(script);
  });
}

/** Copies document info from the current page. Returns an array of [title, description, link] rows that can be pasted into the Data:.tab page. */
async function main() {
  await loadModule('lodash@4.17');

  // Relies on `$` returning every match (presumably the page's own jQuery), in document order.
  // Each register entry then contributes three consecutive elements: title, description, link.
  return _.chunk($(".content h4, .content .csc-textpic-text>:first-child, .content h4+div a:not(.lightbox)"), 3)
  .map(chunk => [
    chunk[0].textContent.trim(), // title
    chunk[1].textContent.trim(), // description
    chunk[2].href                // link
  ]);
}

// Copy the result and add it to the .tab page.
copy(await main());
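
The grouping step above depends on the selector returning its matches in document order, so that every three consecutive elements belong to one entry. As a rough, dependency-free illustration of what `_.chunk(..., 3)` does with that flat list (the `chunkRows` helper and the sample values are hypothetical, not part of the original script):

```javascript
// Hypothetical helper: split a flat list into consecutive groups of `size`.
// Mirrors how _.chunk is used above; a short final group is kept as-is.
function chunkRows(items, size) {
  const rows = [];
  for (let i = 0; i < items.length; i += size) {
    rows.push(items.slice(i, i + size));
  }
  return rows;
}

// Made-up placeholder values standing in for title, description, link.
const flat = ['Title A', 'Description A', 'https://example.org/a',
              'Title B', 'Description B', 'https://example.org/b'];
const rows = chunkRows(flat, 3);
```

If the flat list's length is not a multiple of three, the final group comes out short, which is a cheap sign that the selector missed or double-matched an element on that page.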

To add the "year_accepted" column, the data generated above was passed (as `d`) through the following code, and the result replaced the existing data. Two manual edits were needed to make this work: one description had the text "<//span>" appended to it, and another was missing the period at the end of its sentence.

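// Append the year that ends each description (digits followed by a final period); NaN if there is no match.
copy(d.map(row => [...row, parseInt((row[1].match(/(\d+)\.$/) || [])[1], 10)]))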
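
For reference, a slightly more defensive sketch of the same extraction, which reads the captured digits explicitly and returns `null` when a description does not end in a year followed by a period. The `extractYear` name and the sample descriptions are made up for illustration; the second and third samples imitate the two cases that needed manual edits.

```javascript
// Hypothetical helper: pull the trailing year out of a description that ends
// in digits followed by a period. Returns null when the pattern does not match.
function extractYear(description) {
  const match = description.match(/(\d+)\.$/);
  return match ? parseInt(match[1], 10) : null;
}

// Made-up descriptions, not taken from the register.
const samples = [
  'Submitted by Country X and recommended for inclusion in 2017.',
  'Submitted by Country Y and recommended for inclusion in 2015',         // missing period
  'Submitted by Country Z and recommended for inclusion in 2013.<//span>' // trailing debris
];
const years = samples.map(extractYear); // [2017, null, null]
```

Rows that come back as `null` can then be listed and fixed by hand before the column is added.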

--Hardwigg (talk) 00:52, 31 January 2019 (UTC)