English subtitles for clip: File:Wikimania 2021 Abstract Wikipedia and Wikifunctions Introduction.webm
1 00:00:00,100 --> 00:00:06,000 Hello, and welcome to Wikimania! Thank you for your interest in Abstract Wikipedia and Wikifunctions.
2 00:00:06,000 --> 00:00:12,000 Our goal is a world where everyone can share in the sum of all knowledge.
3 00:00:12,000 --> 00:00:20,000 But today the knowledge in Wikipedia is very unevenly distributed. English has millions of articles,
4 00:00:20,000 --> 00:00:30,000 but Hausa, with sixty million native speakers and thirty million second-language speakers, has about ten thousand articles.
5 00:00:30,000 --> 00:00:36,000 The fundamental issue is that Wikipedia’s cost is basically the number of topics times the number of languages -
6 00:00:36,000 --> 00:00:42,000 all articles about a topic in each language are created and maintained completely independently.
7 00:00:42,000 --> 00:00:46,000 Can we change that? Can we turn the multiplication -
8 00:00:46,000 --> 00:00:54,000 into an addition, and thus reduce the cost of Wikipedia by two orders of magnitude and more?
9 00:00:54,000 --> 00:00:58,000 Yes, we can! Wikidata demonstrates a step in this direction:
10 00:00:58,000 --> 00:01:02,000 This is the article about Marie Curie in the Urdu Wikipedia.
11 00:01:02,000 --> 00:01:07,000 These blue parts - the top part, the side part - are translated by MediaWiki. That’s taken care of.
12 00:01:07,000 --> 00:01:13,000 The red parts - the infobox, references, language links, and authority control files -
13 00:01:13,000 --> 00:01:17,000 are already no longer maintained locally, but in Wikidata.
14 00:01:17,000 --> 00:01:24,000 And a growing number of Wikipedia language editions choose to share from Wikidata’s common knowledge base,
15 00:01:24,000 --> 00:01:27,000 to work together on maintaining knowledge in one single place,
16 00:01:27,000 --> 00:01:33,000 to keep it up to date, to keep it correct, and to use it to display knowledge on their Wikipedias.
17 00:01:33,000 --> 00:01:38,000 So all we have to do is to move everything from Wikipedia to Wikidata and we are done?
18 00:01:38,000 --> 00:01:44,000 Unfortunately, no. The major problem is that Wikidata has very limited expressivity.
19 00:01:44,000 --> 00:01:49,000 It cannot do narration, and narration is fundamental for humans to learn.
20 00:01:49,000 --> 00:01:54,000 Wikidata can’t do reference by description, and it is bad at redundancy.
21 00:01:54,000 --> 00:01:57,000 To illustrate just the last point, about redundancy,
22 00:01:57,000 --> 00:02:03,000 there is one fact about Marie Curie which is mentioned in the lead of basically every language edition of her article,
23 00:02:03,000 --> 00:02:06,000 even the short Urdu article we saw:
24 00:02:06,000 --> 00:02:12,000 She is the only person to ever have won the Nobel Prize in two different scientific fields!
25 00:02:12,000 --> 00:02:15,000 This is a fact that Wikidata cannot express,
26 00:02:15,000 --> 00:02:20,000 although it is clearly important, as we can see, because all these Wikipedias had contributors
27 00:02:20,000 --> 00:02:24,000 who went through the effort of writing it down in the lead of the article.
28 00:02:24,000 --> 00:02:30,000 Abstract Wikipedia will considerably extend the expressivity that Wikidata provides,
29 00:02:30,000 --> 00:02:36,000 and then we will have functions to generate natural language text from this new abstract representation
30 00:02:36,000 --> 00:02:43,000 and allow the Wikipedias to fill their gaps with this baseline knowledge content generated from Wikidata.
31 00:02:43,000 --> 00:02:50,000 All we need to do is switch out this function, used on the same content, and we get text in a different language.
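A minimal Python sketch of switching out the rendering function while the content stays the same. The content fields, the renderer names, and the `maintenance_units` helper are assumptions made for illustration; this is not the Abstract Wikipedia content format or a Wikifunctions API.

```python
from typing import Callable, Dict, Tuple

# Language-independent content. The field names are invented for this sketch.
content = {"person": "Marie Curie", "prize_count": 2}


# One renderer per language. Real renderers would need far more grammar.
def render_en(c: dict) -> str:
    return f"{c['person']} won {c['prize_count']} Nobel Prizes."


def render_de(c: dict) -> str:
    return f"{c['person']} gewann {c['prize_count']} Nobelpreise."


# Hypothetical registry: language code -> renderer function.
RENDERERS: Dict[str, Callable[[dict], str]] = {"en": render_en, "de": render_de}


def render(c: dict, lang: str) -> str:
    """Render the same content with the renderer chosen for `lang`."""
    return RENDERERS[lang](c)


def maintenance_units(topics: int, languages: int) -> Tuple[int, int]:
    """Return (independently maintained articles, shared contents plus renderers)."""
    return topics * languages, topics + languages
```

As a rough illustration with made-up numbers, `maintenance_units(1000, 300)` gives 300,000 independently maintained articles against 1,300 shared pieces, which is where a saving of about two orders of magnitude would come from.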
32 00:02:50,000 --> 00:03:00,000 This leads to an architecture where we have only one content per topic and one set of renderers per language, thus indeed fulfilling the goal -
33 00:03:00,000 --> 00:03:06,000 of having the cost of Wikipedia be topics plus languages, not topics times languages.
34 00:03:06,000 --> 00:03:13,000 So, who is going to write all these functions that generate text in hundreds of languages?
35 00:03:13,000 --> 00:03:20,000 Well, even though there are plenty of people in our panel, that’s not enough people to create renderers for hundreds of languages.
36 00:03:20,000 --> 00:03:26,000 Instead we will create a platform, a new wiki, where a community will be able to create functions -
37 00:03:26,000 --> 00:03:28,000 Wikifunctions.
38 00:03:28,000 --> 00:03:33,000 A wiki where you can use functions to answer your questions,
39 00:03:33,000 --> 00:03:36,000 where you can create more functions to answer more types of questions.
40 00:03:36,000 --> 00:03:40,000 We will be the first Wikimedia project to launch since 2012.
41 00:03:40,000 --> 00:03:48,000 Wikifunctions aims to be fully multilingual, like Wikidata, in terms of both natural and programming languages.
42 00:03:48,000 --> 00:03:53,000 And we won’t be focusing just on natural language functions, but we will do all kinds of functions.
43 00:03:53,000 --> 00:03:55,000 So, what is a function?
44 00:03:55,000 --> 00:04:03,000 A function takes some input, and then we have some algorithm, some recipe, that allows us to calculate the output of the function.
45 00:04:03,000 --> 00:04:06,000 That’s the technical definition. But more importantly:
46 00:04:06,000 --> 00:04:09,000 Functions are a type of knowledge.
47 00:04:09,000 --> 00:04:15,000 Functions are a type of knowledge, and therefore it’s our job to allow everyone to share in this knowledge.
48 00:04:15,000 --> 00:04:19,000 But currently, we are not that good at letting anyone share in this kind of knowledge.
49 00:04:19,000 --> 00:04:23,000 The big tech companies graciously allow you to access some functions:
50 00:04:23,000 --> 00:04:30,000 Ask Siri how many teaspoons are in two tablespoons, and it will answer. That’s a function, running right there on the phone for you.
51 00:04:30,000 --> 00:04:37,000 Or go to Bing and ask, “When was Wikipedia founded?” and it will not check the Web index, even though Bing is a search engine,
52 00:04:37,000 --> 00:04:42,000 but it will run a function against its Knowledge Graph, Satori, which is like our Wikidata.
53 00:04:42,000 --> 00:04:46,000 Take that date, go to DuckDuckGo, and ask how many days have passed since then.
54 00:04:46,000 --> 00:04:51,000 Again, the result is not a search result, but a function evaluation.
55 00:04:51,000 --> 00:04:58,000 Ask Google about the volume of a pyramid, and it will give you this beautiful handcrafted experience,
56 00:04:58,000 --> 00:05:01,000 and the answer will be calculated by a function.
57 00:05:01,000 --> 00:05:09,000 But if you move out of these select few handcrafted experiences that the tech companies have chosen to deploy - you are out of luck.
58 00:05:09,000 --> 00:05:16,000 Even though we have enough computing power in our hands, most of us cannot use this computing power to calculate the functions we need.
59 00:05:16,000 --> 00:05:19,000 Functions are a type of knowledge.
60 00:05:19,000 --> 00:05:22,000 Knowledge is power.
61 00:05:22,000 --> 00:05:25,000 And functions really are a superpower.
62 00:05:25,000 --> 00:05:32,000 Because functions are not just knowledge, no, they can with confidence answer questions no one has ever asked before.
63 00:05:32,000 --> 00:05:38,000 Functions create knowledge. And what’s more of a superpower than creating knowledge?
64 00:05:38,000 --> 00:05:44,000 We want to democratize this superpower. We want to give it to everyone with access to the Web.
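Two of the everyday functions mentioned above, the teaspoon conversion and the days-between-dates calculation, can be sketched in a few lines of Python. The function names are hypothetical, and the conversion assumes the common convention of three teaspoons per tablespoon.

```python
from datetime import date

# Assumption: 1 tablespoon = 3 teaspoons (the usual kitchen convention).
TEASPOONS_PER_TABLESPOON = 3


def tablespoons_to_teaspoons(tablespoons: float) -> float:
    """Convert a number of tablespoons to teaspoons."""
    return tablespoons * TEASPOONS_PER_TABLESPOON


def days_between(start: date, end: date) -> int:
    """Return the number of whole days from `start` to `end`."""
    # Subtracting two dates yields a timedelta; .days is its day count.
    return (end - start).days
```

For example, `days_between(date(2021, 1, 1), date.today())` answers a question that probably nobody has typed in exactly that form before, which is the point made about functions creating knowledge.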
65 00:05:44,000 --> 00:05:48,000 For this we will introduce Wikifunctions as a new wiki project,
66 00:05:48,000 --> 00:05:53,000 where a community can create functions, and where people can use functions to answer their questions.
67 00:05:53,000 --> 00:05:59,000 (These next few slides show some old mockup designs that just give a rough idea of how it might work.)
68 00:05:59,000 --> 00:06:07,000 This wiki will have functions such as converting tablespoons to teaspoons, like Siri did earlier,
69 00:06:07,000 --> 00:06:12,000 calculating how many days passed between two dates, like in DuckDuckGo,
70 00:06:12,000 --> 00:06:19,000 taking a geoshape uploaded to Commons and calculating how many square miles that shape covers,
71 00:06:19,000 --> 00:06:23,000 or taking a string and reversing it.
72 00:06:23,000 --> 00:06:31,000 Here’s a short demo of what the "reverse" function looks like in our current prototype.
73 00:06:31,000 --> 00:06:37,000 We see that the function takes a string, here's the argument that takes a string, and returns a string,
74 00:06:37,000 --> 00:06:46,000 and it has one implementation in JavaScript, which looks like this. It is stored in another page, and is being transcluded.
75 00:06:46,000 --> 00:06:57,000 This is the reverse function. And if we want to run it, we can go to the "evaluate function call" special page,
76 00:06:57,000 --> 00:07:09,000 and call the reverse function on a value like "Wikipedia". And the result is here: "aidepikiW".
77 00:07:09,000 --> 00:07:14,000 The interface shown here is still very early and very rough.
78 00:07:14,000 --> 00:07:22,000 We are working right now on a full design of our interface, and here is a very first sneak peek.
79 00:07:22,000 --> 00:07:24,000 But back to the older mockups.
80 00:07:24,000 --> 00:07:29,000 Here’s a simple function, for multiplying positive numbers.
81 00:07:29,000 --> 00:07:37,000 As I said earlier, we support several programming languages.
So far we have implemented JavaScript and Python; we want to support many more.
82 00:07:37,000 --> 00:07:43,000 Here’s how multiplication is implemented in JavaScript, using the native multiplication.
83 00:07:43,000 --> 00:07:46,000 Here in Python, basically the same.
84 00:07:46,000 --> 00:07:52,000 This implementation, though, is different. Here we implement multiplication using something called composition.
85 00:07:52,000 --> 00:08:01,000 We take functions that already exist in Wikifunctions and compose them together in order to get more powerful functions implemented.
86 00:08:01,000 --> 00:08:09,000 Now the really interesting part here is that these functions used in the composition are functions in Wikifunctions itself.
87 00:08:09,000 --> 00:08:15,000 And just like Items in Wikidata have their QID, each Function also has its own page, its own ZID.
88 00:08:15,000 --> 00:08:24,000 The functions down here, add, zero, and so on, are ZIDs with labels in different languages.
89 00:08:24,000 --> 00:08:28,000 And that allows us to switch languages.
90 00:08:28,000 --> 00:08:32,000 And we can see the same composition in German,
91 00:08:32,000 --> 00:08:34,000 or in Bengali.
92 00:08:34,000 --> 00:08:39,000 And not just read the implementation in Bengali, but also write it in Bengali.
93 00:08:39,000 --> 00:08:44,000 We are aiming to allow people to create implementations without having to know basic English,
94 00:08:44,000 --> 00:08:49,000 which is a major blocker for many people in the world to create functions, as research has repeatedly shown -
95 00:08:49,000 --> 00:08:54,000 including research by frequent Wikimania attendee Benjamin Mako Hill.
96 00:08:54,000 --> 00:09:00,000 For many, many people this will be the first time that they will be able to write implementations in their native language.
97 00:09:00,000 --> 00:09:08,000 For hundreds of millions of people we will unlock that superpower of using, reading, and writing functions.
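The "reverse" function and the two styles of multiplication described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`is_zero`, `add`, `decrement`) and the recursive definition are invented here and are not the actual composition shown on the slides or stored in Wikifunctions.

```python
def reverse(text: str) -> str:
    """Return the characters of `text` in reverse order."""
    return text[::-1]


def multiply_native(a: int, b: int) -> int:
    """Multiplication using the language's built-in operator."""
    return a * b


# Small building-block functions, standing in for functions that
# would already exist on the wiki (names are hypothetical).
def is_zero(n: int) -> bool:
    return n == 0


def add(a: int, b: int) -> int:
    return a + b


def decrement(n: int) -> int:
    return n - 1


def multiply_composed(a: int, b: int) -> int:
    """Multiply non-negative integers using only the building blocks above.

    a * 0 = 0, and a * b = a + a * (b - 1) otherwise.
    """
    if is_zero(b):
        return 0
    return add(a, multiply_composed(a, decrement(b)))
```

Calling `reverse("Wikipedia")` gives `"aidepikiW"`, matching the demo. The composed version is far slower than the native one and only works for small non-negative `b`, but it is built entirely from other named functions, which is what makes it readable and writable under labels in any language.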
98 00:09:08,000 --> 00:09:13,000 For Wikifunctions, we will have a new community, with people from the existing projects,
99 00:09:13,000 --> 00:09:17,000 and mixing in new people who have never contributed to our projects before.
100 00:09:17,000 --> 00:09:21,000 We want to make functions accessible in many different ways,
101 00:09:21,000 --> 00:09:25,000 through many different gateways, wherever you need them, whenever you need them:
102 00:09:25,000 --> 00:09:32,000 on the Web, on your phone, through an assistant, within a spreadsheet - and in the Wikimedia projects.
103 00:09:32,000 --> 00:09:39,000 And we want to generate multilingual content from expressive abstract representations in 300+ languages,
104 00:09:39,000 --> 00:09:44,000 which is why we need to make it available to as many people as possible.
105 00:09:44,000 --> 00:09:50,000 To aim at people who don’t speak English. At people who are not already coders.
106 00:09:50,000 --> 00:09:56,000 We call this aiming high, aiming wide, instead of narrowing in on just English-speaking coders.
107 00:09:56,000 --> 00:10:01,000 We have a lot of discussion about this, because it is very ambitious and difficult.
108 00:10:01,000 --> 00:10:05,000 We are doing user research to understand how difficult it is.
109 00:10:05,000 --> 00:10:07,000 Just one surprising example:
110 00:10:07,000 --> 00:10:13,000 most people think that coding correlates with STEM (Science, Technology, Engineering, Mathematics) abilities, with mathematical abilities,
111 00:10:13,000 --> 00:10:20,000 and many people who don’t think of themselves as being strong in STEM dismiss participating in coding before even taking a closer look.
112 00:10:20,000 --> 00:10:25,000 Even though research published last year in Nature shows no correlation between STEM and programming,
113 00:10:25,000 --> 00:10:31,000 but a clear and strong correlation between language aptitude and learning programming languages.
114 00:10:31,000 --> 00:10:34,000 Language aptitude! Exactly!
115 00:10:34,000 --> 00:10:39,000 So the very same people who are contributing to Wikipedia and to Wiktionary
116 00:10:39,000 --> 00:10:45,000 are exactly the people that would have the capabilities and the motivation to contribute.
117 00:10:45,000 --> 00:10:48,000 And we need their contributions to cover 300 languages.
118 00:10:48,000 --> 00:10:53,000 We need to solve the challenge of how to frame the project, and reach them,
119 00:10:53,000 --> 00:10:59,000 because in our user research so far they are the ones who say “oh, I don’t need to understand this” and go away.
120 00:10:59,000 --> 00:11:04,000 That’s why it is so important to get the framing right, to make it inviting.
121 00:11:04,000 --> 00:11:11,000 Here are two early milestones we want to pass next year: Paradigms for inflections, and Abstract Descriptions.
122 00:11:11,000 --> 00:11:16,000 The first is to create functions that can generate regular inflections for all kinds of words and languages.
123 00:11:16,000 --> 00:11:21,000 These can then be used for the lexicographic content in Wikidata,
124 00:11:21,000 --> 00:11:24,000 or for the inflection tables in the Wiktionaries.
125 00:11:24,000 --> 00:11:31,000 But instead of having the inflections implemented in 160 or so Wiktionaries independently (in templates, modules, or manually),
126 00:11:31,000 --> 00:11:39,000 we implement it in Wikifunctions and make it available for all Wiktionaries, and also for Wikidata and for Abstract Wikipedia!
127 00:11:39,000 --> 00:11:41,000 Abstract Descriptions.
128 00:11:41,000 --> 00:11:45,000 In Wikidata, every Item has a description in every language.
129 00:11:45,000 --> 00:11:51,000 Before we can generate whole articles for Abstract Wikipedia, we will be able to generate noun phrases.
130 00:11:51,000 --> 00:11:59,000 And descriptions? They are usually noun phrases. So we aim to make Abstract Descriptions for Wikidata another early milestone.
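To give a feel for the inflection paradigms milestone described above, here is a small Python sketch of a function that generates the regular plural of an English noun. The function names and the three spelling rules are assumptions chosen for illustration; real paradigms would cover many more forms, exceptions, and languages.

```python
VOWELS = "aeiou"


def regular_plural_en(noun: str) -> str:
    """Form the regular English plural of `noun` (irregular nouns not handled)."""
    # Sibilant endings take "-es": box -> boxes, dish -> dishes.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + "y" becomes "-ies": city -> cities (but day -> days).
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Default rule: append "-s".
    return noun + "s"


def inflection_table(noun: str) -> dict:
    """Return a tiny inflection table, as a Wiktionary entry might display."""
    return {"singular": noun, "plural": regular_plural_en(noun)}
```

A function like this, written once and shared, could fill the plural column of an inflection table wherever it is needed, rather than being reimplemented separately in each Wiktionary.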
131 00:11:59,000 --> 00:12:04,000 This year we are focusing on developing Wikifunctions.
132 00:12:04,000 --> 00:12:12,000 In 2022 we will start adding a focus on Abstract Wikipedia, working on the early milestones, Paradigms and Abstract Descriptions.
133 00:12:12,000 --> 00:12:18,000 And in 2023 we would finally get to the place where we can create Abstract Content,
134 00:12:18,000 --> 00:12:23,000 and where we will be able to read and contribute to that content across many different languages,
135 00:12:23,000 --> 00:12:28,000 where we have a truly multilingual Wikipedia, getting so much closer to a world
136 00:12:28,000 --> 00:12:32,000 where everyone can share in the sum of all knowledge.
137 00:12:32,000 --> 00:12:35,000 Thank you for your attention.