English subtitles for clip: File:Wikimania 2021 Abstract Wikipedia and Wikifunctions Introduction.webm

1
00:00:00,100 --> 00:00:06,000
Hello, and welcome to Wikimania! Thank you for your interest in Abstract Wikipedia and Wikifunctions.

2
00:00:06,000 --> 00:00:12,000
Our goal is a world where everyone can share in the sum of all knowledge.

3
00:00:12,000 --> 00:00:20,000
But today the knowledge in Wikipedia is very unevenly distributed. English has millions of articles,

4
00:00:20,000 --> 00:00:30,000
but Hausa, with sixty million native speakers and thirty million second-language speakers, has about ten thousand articles.

5
00:00:30,000 --> 00:00:36,000
The fundamental issue is that Wikipedia’s cost is basically the number of topics times the number of languages:

6
00:00:36,000 --> 00:00:42,000
all articles about a topic in each language are created and maintained completely independently.

7
00:00:42,000 --> 00:00:46,000
Can we change that? Can we turn the multiplication -

8
00:00:46,000 --> 00:00:54,000
into an addition, and thus reduce the cost of Wikipedia by two orders of magnitude or more?

9
00:00:54,000 --> 00:00:58,000
Yes, we can! Wikidata demonstrates a step in this direction:

10
00:00:58,000 --> 00:01:02,000
This is the article about Marie Curie in the Urdu Wikipedia.

11
00:01:02,000 --> 00:01:07,000
These blue parts - the top part, the side part - are translated by MediaWiki. That’s taken care of.

12
00:01:07,000 --> 00:01:13,000
The red parts - the infobox, references, language links, and authority control files -

13
00:01:13,000 --> 00:01:17,000
are already no longer maintained locally, but in Wikidata.

14
00:01:17,000 --> 00:01:24,000
And a growing number of Wikipedia language editions choose to share from Wikidata’s common knowledge base, 

15
00:01:24,000 --> 00:01:27,000
to work together on maintaining knowledge in one single place, 

16
00:01:27,000 --> 00:01:33,000
to keep it up to date, to keep it correct, and to use it to display knowledge on their Wikipedias.

17
00:01:33,000 --> 00:01:38,000
So all we have to do is to move everything from Wikipedia to Wikidata and done?

18
00:01:38,000 --> 00:01:44,000
Unfortunately, no. The major problem is that Wikidata has very limited expressivity.

19
00:01:44,000 --> 00:01:49,000
It cannot do narration, and narration is fundamental for humans to learn.

20
00:01:49,000 --> 00:01:54,000
Wikidata can’t do reference by description, and it is bad at redundancy.

21
00:01:54,000 --> 00:01:57,000
To illustrate just the last point, about redundancy,

22
00:01:57,000 --> 00:02:03,000
there is one fact about Marie Curie which is mentioned in the lead of basically every language edition of her article, 

23
00:02:03,000 --> 00:02:06,000
even the short Urdu article we saw:

24
00:02:06,000 --> 00:02:12,000
She is the only person to ever have won the Nobel Prize in two different scientific fields!

25
00:02:12,000 --> 00:02:15,000
This is a fact that Wikidata cannot express, 

26
00:02:15,000 --> 00:02:20,000
although it is clearly important, as we can see, because all these Wikipedias had contributors

27
00:02:20,000 --> 00:02:24,000
who went through the effort of writing it down in the lead of the article.

28
00:02:24,000 --> 00:02:30,000
Abstract Wikipedia will extend the expressivity Wikidata provides considerably, 

29
00:02:30,000 --> 00:02:36,000
and then we will have functions to generate natural language text from this new abstract representation 

30
00:02:36,000 --> 00:02:43,000
and allow the Wikipedias to fill their gaps with this baseline knowledge content generated from Wikidata.

31
00:02:43,000 --> 00:02:50,000
All we need to do is switch out this function, used on the same content, and we get text in a different language.
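As a rough illustration of switching out the renderer while keeping the content fixed, here is a minimal Python sketch. The content layout and the names content, render_en, and render_de are hypothetical, made up for this example; they are not the actual Abstract Wikipedia representation.

```python
# Hypothetical, language-independent content for one fact.
# This dict layout is an assumption for illustration only.
content = {"subject": "Marie Curie", "role": "physicist"}

# Tiny per-language vocabularies (illustrative stand-ins for real lexical data).
ROLE_EN = {"physicist": "physicist"}
ROLE_DE = {"physicist": "Physikerin"}


def render_en(c):
    """Render the abstract content as an English sentence."""
    return f"{c['subject']} was a {ROLE_EN[c['role']]}."


def render_de(c):
    """Render the same abstract content as a German sentence."""
    return f"{c['subject']} war eine {ROLE_DE[c['role']]}."


# Same content, different renderer, different language.
for render in (render_en, render_de):
    print(render(content))
```

Only the renderer changes between the two calls; the content object stays the same.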

32
00:02:50,000 --> 00:03:00,000
This leads to an architecture where we have only one piece of content per topic and one set of renderers per language, thus indeed fulfilling the goal -

33
00:03:00,000 --> 00:03:06,000
of having the cost of Wikipedia being topics plus languages, not topics times languages.
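To make the "plus, not times" argument concrete, here is a small worked calculation. The topic count is an illustrative assumption, not a figure from the talk; the language count follows the "300+ languages" mentioned later.

```python
# Illustrative numbers only: the topic count is an assumption.
topics = 1_000_000
languages = 300

# Today: every topic is written and maintained separately in every language.
cost_times = topics * languages

# Proposed: one piece of content per topic, plus one set of renderers per language.
cost_plus = topics + languages

ratio = cost_times / cost_plus
print(cost_times, cost_plus, round(ratio))
```

Under these assumed numbers the multiplicative cost is roughly 300 times the additive one, which is in line with the "two orders of magnitude" estimate.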

34
00:03:06,000 --> 00:03:13,000
So, who is going to write all these functions that generate text in hundreds of languages?

35
00:03:13,000 --> 00:03:20,000
Well, even though there are plenty of people in our panel, that’s not enough people to create renderers for hundreds of languages.

36
00:03:20,000 --> 00:03:26,000
Instead we will create a platform, a new wiki, where a community will be able to create functions -

37
00:03:26,000 --> 00:03:28,000
Wikifunctions

38
00:03:28,000 --> 00:03:33,000
A wiki where you can use functions to answer your questions,

39
00:03:33,000 --> 00:03:36,000
and where you can create more functions to answer more types of questions.

40
00:03:36,000 --> 00:03:40,000
We will be the first Wikimedia project to launch since 2012.

41
00:03:40,000 --> 00:03:48,000
Wikifunctions aims to be fully multilingual, like Wikidata, both in terms of natural as well as programming languages.

42
00:03:48,000 --> 00:03:53,000
And we won’t be focusing just on natural language functions, but we will do all kinds of functions.

43
00:03:53,000 --> 00:03:55,000
So, what is a function?

44
00:03:55,000 --> 00:04:03,000
A function takes some input and then we have some algorithm, some recipe, that allows us to calculate the output of the function. 

45
00:04:03,000 --> 00:04:06,000
That’s the technical definition. But more importantly:

46
00:04:06,000 --> 00:04:09,000
Functions are a type of knowledge

47
00:04:09,000 --> 00:04:15,000
Functions are a type of knowledge, and therefore it’s our job to allow everyone to share in this knowledge.

48
00:04:15,000 --> 00:04:19,000
But currently, we are not that good at letting anyone share in this kind of knowledge.

49
00:04:19,000 --> 00:04:23,000
The big tech companies graciously allow you to access some functions:

50
00:04:23,000 --> 00:04:30,000
Ask Siri how many teaspoons are in two tablespoons, and it will answer. That’s a function, running right there on the phone for you.

51
00:04:30,000 --> 00:04:37,000
Or go to Bing and ask, “When was Wikipedia founded?” and it will not check the Web index, even though Bing is a search engine, 

52
00:04:37,000 --> 00:04:42,000
but it will run a function against its Knowledge Graph, Satori, which is like our Wikidata.

53
00:04:42,000 --> 00:04:46,000
Take that date, go to DuckDuckGo, and ask how many days have passed since then.

54
00:04:46,000 --> 00:04:51,000
Again, the result is not a search result, but a function evaluation.

55
00:04:51,000 --> 00:04:58,000
Ask Google about the volume of a pyramid, and it will give you this beautiful handcrafted experience,

56
00:04:58,000 --> 00:05:01,000
and the answer will be calculated by a function.

57
00:05:01,000 --> 00:05:09,000
But if you move out of these select few handcrafted experiences that the tech companies have chosen to deploy, you are out of luck.

58
00:05:09,000 --> 00:05:16,000
Even though we have enough computing power in our hands, most of us cannot use this computing power to calculate the functions we need.

59
00:05:16,000 --> 00:05:19,000
Functions are a type of knowledge

60
00:05:19,000 --> 00:05:22,000
Knowledge is power

61
00:05:22,000 --> 00:05:25,000
And functions really are a superpower

62
00:05:25,000 --> 00:05:32,000
Because functions are not just knowledge, no, they can answer with confidence questions no one has ever asked before.

63
00:05:32,000 --> 00:05:38,000
Functions create knowledge. And what’s more of a superpower than creating knowledge?

64
00:05:38,000 --> 00:05:44,000
We want to democratize this superpower. We want to give it to everyone with access to the Web.

65
00:05:44,000 --> 00:05:48,000
For this we will introduce Wikifunctions as a new wiki project, 

66
00:05:48,000 --> 00:05:53,000
where a community can create functions, and where people can use functions to answer their questions. 

67
00:05:53,000 --> 00:05:59,000
(These next few slides show some old mockup designs that just give a rough idea of how it might work.)

68
00:05:59,000 --> 00:06:07,000
This wiki will have functions such as converting tablespoons to teaspoons, like Siri did earlier

69
00:06:07,000 --> 00:06:12,000
Calculating how many days passed between two dates, like in DuckDuckGo

70
00:06:12,000 --> 00:06:19,000
Taking a geoshape uploaded to Commons and calculating how many square miles that shape covers.

71
00:06:19,000 --> 00:06:23,000
Or taking a string and reversing it
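Two of the functions listed above can be sketched in a few lines of Python. The function names are hypothetical and chosen for this example; they are not taken from Wikifunctions, Siri, or DuckDuckGo.

```python
from datetime import date

TEASPOONS_PER_TABLESPOON = 3  # standard kitchen conversion


def tablespoons_to_teaspoons(tablespoons):
    """Convert a number of tablespoons to teaspoons."""
    return tablespoons * TEASPOONS_PER_TABLESPOON


def days_between(start, end):
    """Return the number of days from start to end (negative if end is earlier)."""
    return (end - start).days


print(tablespoons_to_teaspoons(2))                          # 6
print(days_between(date(2021, 8, 13), date(2021, 8, 17)))   # 4
```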

72
00:06:23,000 --> 00:06:31,000
Here’s a short demo of what the "reverse" function looks like in our current prototype.

73
00:06:31,000 --> 00:06:37,000
We see that the function takes a string, here's the argument that takes a string, and returns a string,

74
00:06:37,000 --> 00:06:46,000
and it has one implementation in JavaScript, that looks like this. It is stored in another page, and is being transcluded. 

75
00:06:46,000 --> 00:06:57,000
This is the reverse function. And if we want to run it, we can go to the "evaluate function call" special page, 

76
00:06:57,000 --> 00:07:09,000
and call the reverse function on a value like "Wikipedia". And the result is here: "aidepikiW".
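The JavaScript implementation shown in the demo is not reproduced in these subtitles. As a stand-in, here is a minimal Python sketch of a function with the same signature (string in, string out); it is an assumption about the behaviour, not the prototype's code.

```python
def reverse(text):
    """Return the characters of text in reverse order."""
    return text[::-1]


print(reverse("Wikipedia"))  # aidepikiW
```

Note that this reverses code points one by one, so text containing combining characters may not reverse the way a reader would expect.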

77
00:07:09,000 --> 00:07:14,000
The interface shown here is still very early and very rough

78
00:07:14,000 --> 00:07:22,000
We are working right now on a full design of our interface, and here is a very first sneak peek.

79
00:07:22,000 --> 00:07:24,000
But back to the older mockups.

80
00:07:24,000 --> 00:07:29,000
Here’s a simple function, for multiplying positive numbers.

81
00:07:29,000 --> 00:07:37,000
As I said earlier, we support several programming languages. So far we have implemented JavaScript and Python; we want to support many more.

82
00:07:37,000 --> 00:07:43,000
Here’s how multiplication is implemented in JavaScript, using the native multiplication.

83
00:07:43,000 --> 00:07:46,000
Here in Python, basically the same

84
00:07:46,000 --> 00:07:52,000
This implementation, though, is different. Here we implement multiplication using something called composition.

85
00:07:52,000 --> 00:08:01,000
We take functions that already exist in Wikifunctions and compose them together in order to get more powerful functions implemented.
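A minimal Python sketch of the idea of composition: multiplication is built only from simpler functions rather than from the native operator. The talk names "add" and "zero"; is_zero and decrement are assumed helpers added for this example, and the actual composition in the mockup may differ.

```python
# Building blocks, standing in for functions that already exist.
def zero():
    return 0


def is_zero(n):
    return n == 0


def add(a, b):
    return a + b


def decrement(n):
    return n - 1


def multiply(a, b):
    """Multiply two non-negative integers by composing the functions above.

    Uses a * 0 = 0 and a * b = a + a * (b - 1).
    """
    if is_zero(b):
        return zero()
    return add(a, multiply(a, decrement(b)))


print(multiply(6, 7))  # 42
```

Because every step is a call to a named function, each name could be displayed with a label in whatever language the reader prefers.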

86
00:08:01,000 --> 00:08:09,000
Now the really interesting part here is that the functions used in the composition are themselves functions in Wikifunctions.

87
00:08:09,000 --> 00:08:15,000
And just like Items in Wikidata have their QID, each Function also has its own page, its own ZID

88
00:08:15,000 --> 00:08:24,000
The functions down here, add, zero, and so on, are ZIDs with labels in different languages.

89
00:08:24,000 --> 00:08:28,000
And that allows us to switch languages

90
00:08:28,000 --> 00:08:32,000
And we can see the same composition in German

91
00:08:32,000 --> 00:08:34,000
Or in Bengali

92
00:08:34,000 --> 00:08:39,000
And not just read the implementation in Bengali, but also write it in Bengali

93
00:08:39,000 --> 00:08:44,000
We are aiming to allow people to create implementations without having to know basic English,

94
00:08:44,000 --> 00:08:49,000
which is a major blocker keeping many people in the world from creating functions, as research has repeatedly shown -

95
00:08:49,000 --> 00:08:54,000
including research by frequent Wikimania attendee Benjamin Mako Hill.

96
00:08:54,000 --> 00:09:00,000
For many, many people this will be the first time that they will be able to write implementations in their native language.

97
00:09:00,000 --> 00:09:08,000
For hundreds of millions of people we will unlock that superpower of using, reading, and writing functions.

98
00:09:08,000 --> 00:09:13,000
For Wikifunctions, we will have a new community, with people from the existing projects, 

99
00:09:13,000 --> 00:09:17,000
and mixing in new people who have never contributed to our projects before.

100
00:09:17,000 --> 00:09:21,000
We want to make functions accessible in many different ways,   

101
00:09:21,000 --> 00:09:25,000
through many different gateways, wherever you need them, whenever you need them

102
00:09:25,000 --> 00:09:32,000
on the Web, on your phone, through an assistant, within a spreadsheet - and in the Wikimedia projects.

103
00:09:32,000 --> 00:09:39,000
And we want to generate multilingual content from expressive abstract representations in 300+ languages

104
00:09:39,000 --> 00:09:44,000
which is why we need to make it available to as many people as possible.

105
00:09:44,000 --> 00:09:50,000
To aim at people who don’t speak English. At people who are not already coders.

106
00:09:50,000 --> 00:09:56,000
We call this aiming high, aiming wide, instead of narrowing in on just English-speaking coders.

107
00:09:56,000 --> 00:10:01,000
We have a lot of discussion about this, because it is very ambitious, it is difficult. 

108
00:10:01,000 --> 00:10:05,000
We are doing user research to understand how difficult it is.

109
00:10:05,000 --> 00:10:07,000
Just one surprising example:

110
00:10:07,000 --> 00:10:13,000
Most people think that coding correlates with STEM (Science, Technology, Engineering, Mathematics) abilities, with mathematical abilities,

111
00:10:13,000 --> 00:10:20,000
and many people who don’t think of themselves as being strong in STEM dismiss participating in coding before even taking a closer look. 

112
00:10:20,000 --> 00:10:25,000
Even though research published last year in Nature shows no correlation between STEM and programming, 

113
00:10:25,000 --> 00:10:31,000
but a clear and strong correlation between language aptitude and learning programming languages.

114
00:10:31,000 --> 00:10:34,000
Language aptitude! Exactly!

115
00:10:34,000 --> 00:10:39,000
So the very same people who are contributing to Wikipedia and to Wiktionary 

116
00:10:39,000 --> 00:10:45,000
are exactly the people that would have the capabilities and the motivation to contribute. 

117
00:10:45,000 --> 00:10:48,000
And we need their contributions to cover 300 languages. 

118
00:10:48,000 --> 00:10:53,000
We need to solve the challenge of how to frame the project, and reach them, 

119
00:10:53,000 --> 00:10:59,000
because in our user research so far they are the ones who say “oh, I don’t need to understand this” and go away. 

120
00:10:59,000 --> 00:11:04,000
That’s why it is so important to get the framing right, to make it inviting.

121
00:11:04,000 --> 00:11:11,000
Here are two early milestones we want to pass next year: Paradigms for inflections, and Abstract Descriptions

122
00:11:11,000 --> 00:11:16,000
The first is to create functions that can generate regular inflections for all kinds of words and languages.

123
00:11:16,000 --> 00:11:21,000
These can then be used for the lexicographic content in Wikidata

124
00:11:21,000 --> 00:11:24,000
or for the inflection tables in the Wiktionaries.

125
00:11:24,000 --> 00:11:31,000
But instead of having the inflections implemented in 160 or so Wiktionaries independently (in templates, modules, or manually), 

126
00:11:31,000 --> 00:11:39,000
we implement it in Wikifunctions and make it available for all Wiktionaries, and also for Wikidata and for Abstract Wikipedia!
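As a hedged illustration of what a "regular inflection" function could look like, here is a Python sketch for regular English noun plurals. The name regular_plural and the rule set are assumptions for this example; it covers only the common regular patterns and ignores irregular nouns.

```python
def regular_plural(noun):
    """Return the regular English plural of a lowercase noun."""
    # Sibilant endings take "es": bus -> buses, box -> boxes, church -> churches.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + y becomes "ies": city -> cities (but day -> days).
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    # Default: just add "s".
    return noun + "s"


for word in ("cat", "box", "city", "day"):
    print(word, "->", regular_plural(word))
```

A single shared function like this could feed inflection tables in many places instead of being reimplemented in each one.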

127
00:11:39,000 --> 00:11:41,000
Abstract Descriptions

128
00:11:41,000 --> 00:11:45,000
In Wikidata, every Item has a description in every language.

129
00:11:45,000 --> 00:11:51,000
Before we can generate whole articles for Abstract Wikipedia, we will be able to generate noun phrases.

130
00:11:51,000 --> 00:11:59,000
And descriptions? They are usually noun phrases. So we aim to make Abstract Descriptions for Wikidata another early milestone.

131
00:11:59,000 --> 00:12:04,000
This year we are focusing on developing Wikifunctions.

132
00:12:04,000 --> 00:12:12,000
In 2022 we will start adding a focus on Abstract Wikipedia, working on the early milestones, Paradigms and Abstract Descriptions.

133
00:12:12,000 --> 00:12:18,000
And in 2023 we would finally get to the place where we can create Abstract Content

134
00:12:18,000 --> 00:12:23,000
and where we will be able to read and contribute to that content across many different languages, 

135
00:12:23,000 --> 00:12:28,000
where we have a truly multilingual Wikipedia, getting so much closer to a world,

136
00:12:28,000 --> 00:12:32,000
where everyone can share in the sum of all knowledge.

137
00:12:32,000 --> 00:12:35,000
Thank you for your attention.