File talk:Graph comparing article edits on Wikivoyage and Wikitravel.svg


Source code

The following R code was used to create this graph, and the corresponding graph for recent changes. It's available under the GPL version 2 or later, as well as CC-BY-SA version 3 and the GFDL.

# Create graphs of WV/WT recent changes data from January to March 2013

library(reshape2)
library(ggplot2)

# Read in data
wvwt <- read.csv("compare.csv")

# convert to long format
wvwt2 <- melt(wvwt, id.vars="date0")
wvwt2$site <- toupper(substr(wvwt2$variable, 1, 2))
wvwt2$type <- ifelse(substr(wvwt2$variable, 3, 4) == "rc", "Recent changes", "Article edits")
wvwt2$date <- as.Date(strptime(wvwt2$date0, format="%Y%m%d"))

# plot # of article edits
svg("Graph comparing article edits on Wikivoyage and Wikitravel.svg", width = 6.5, height = 4.5)
# Daily counts as points, plus a loess trend line per site.
# print() makes sure the plot is drawn even when the script is run via source().
# Note: in ggplot2 >= 3.5.0 the numeric legend.position is deprecated in favour of
# legend.position = "inside" together with legend.position.inside = c(.82, .4).
print(
ggplot(wvwt2[wvwt2$type=="Article edits",], aes(x = date, y = value, colour = site)) +
geom_point() +
geom_smooth(method = "loess", span = 0.15, se = FALSE) +
scale_x_date(name="Date") +
scale_y_continuous(name="Number of article edits on each day") +
theme(legend.position = c(.82, .4))
)
dev.off()

# plot # of recent changes
svg("Graph comparing recent changes on Wikivoyage and Wikitravel.svg", width = 6.5, height = 4.5)
# Daily counts as points, plus a loess trend line per site.
# print() makes sure the plot is drawn even when the script is run via source().
# Note: in ggplot2 >= 3.5.0 the numeric legend.position is deprecated in favour of
# legend.position = "inside" together with legend.position.inside = c(.82, .4).
print(
ggplot(wvwt2[wvwt2$type=="Recent changes",], aes(x = date, y = value, colour = site)) +
geom_point() +
geom_smooth(method = "loess", span = 0.15, se = FALSE) +
scale_x_date(name="Date") +
scale_y_continuous(name="Number of recent changes on each day") +
theme(legend.position = c(.82, .4))
)
dev.off()

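The data is read from a file called compare.csv. It has one row per day: the date (YYYYMMDD), followed by the number of recent changes (rc) and article edits (ae) for Wikivoyage (wv) and Wikitravel (wt). It has this format: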

date0,wvrc,wtrc,wvae,wtae
20130115,2658,746,2251,512
20130116,9134,921,8272,563
20130117,7142,726,6407,507
...
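
To make the column naming concrete, here is a minimal sketch of the wide-to-long step applied to the three sample rows above. It uses base R's stack() as a stand-in for reshape2's melt(), purely so that it runs without extra packages; the names sample_csv, wide and long are illustrative and not part of the original script.

```r
# Sample rows copied from the format shown above
sample_csv <- "date0,wvrc,wtrc,wvae,wtae
20130115,2658,746,2251,512
20130116,9134,921,8272,563
20130117,7142,726,6407,507"

wide <- read.csv(text = sample_csv)

# Base-R stand-in for reshape2::melt(): stack the four count columns
long <- stack(wide[, c("wvrc", "wtrc", "wvae", "wtae")])
names(long) <- c("value", "variable")
long$date0 <- rep(wide$date0, times = 4)  # columns are stacked one after another

# Decode the column names the same way the plotting script does
long$site <- toupper(substr(long$variable, 1, 2))
long$type <- ifelse(substr(long$variable, 3, 4) == "rc",
                    "Recent changes", "Article edits")
long$date <- as.Date(as.character(long$date0), format = "%Y%m%d")

# Rows that would feed the article edits graph
print(long[long$type == "Article edits", c("date", "site", "value")])
```

Each wide row becomes four long rows, one per site and count type, which is the shape ggplot2 needs to colour the points by site.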

So far this file has been created by manually editing the output from Nicolas1981's compare.sh script (January version). This process could easily be improved, e.g. by using the latest version of the script, and/or by changing the script to produce such a CSV file and running it directly from R as needed. But perhaps it'd be more useful to port the script into R, and extend it to allow more complex analyses (e.g. total net bytes added in constructive edits, top contributors by edits or by bytes added, most edited articles, etc.).
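
As a rough sketch of the "run it directly from R" idea: if the script were changed to print CSV to standard output (an assumption, it does not do this at present), R could capture and parse that output in one step. The helper name read_counts is hypothetical, and `cat` on a temporary file stands in for the modified script here, so this example needs a Unix-like system.

```r
# Hypothetical helper: run a command that prints CSV to stdout and parse it
read_counts <- function(command, args = character()) {
  out <- system2(command, args, stdout = TRUE)  # one element per output line
  read.csv(text = paste(out, collapse = "\n"))
}

# Stand-in for the (assumed) CSV-emitting script: `cat` on a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("date0,wvrc,wtrc,wvae,wtae",
             "20130115,2658,746,2251,512"), tmp)
counts <- read_counts("cat", shQuote(tmp))
unlink(tmp)

print(counts)
# With a modified script this might become: read_counts("sh", "compare.sh")
```

The result has the same columns as compare.csv, so it could be passed to the melt() step in place of the read.csv("compare.csv") call.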

--Avenue (talk) 13:35, 23 March 2013 (UTC)