
Global interest in Bitcoin over Time

Bitcoin: Visualizing (1) Wikipedia Pageviews, (2) Mentions in the New York Times, (3) Google Searches and (4) Exchange Rates from 2015 to early 2018. Simple time-series plots (created from Web-API data) show that these four are strongly correlated, except when they aren't.

In this post I’ll display global interest in Bitcoin and Ethereum over time, as measured by pageviews of their Wikipedia articles, mentions in the New York Times, and Google searches.

The end of 2017 saw the “Bitcoin Bubble”, a period of one month in which 1 Bitcoin traded for up to $20,000 on most exchanges. I am not invested in Bitcoin, and never have been to any substantial degree, but cryptocurrencies and blockchain technology still interest me.

Preprocessing

Preprocessing entails:

  • Loading the necessary R packages
  • Fetching the data

At the time of writing, the data can be fetched without registering: apart from the New York Times API, which requires an API key, none of the Web APIs used below need an API key or any authentication.

Load R packages

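library(pageviews) # access Wikipedia
library(Quandl)    # provider of financial data
library(lubridate) # date helpers: ymd(), now()
library(tidyverse) # dplyr, ggplot2 and friends
library(gtrendsR)  # access Google trends API
# ggplot defaults
theme_set(theme_bw()) 
# replace the default discrete colour scale with the ColorBrewer "Set1" palette
scale_colour_discrete <- function(...) scale_color_brewer(..., palette = "Set1")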

Define a wrapper function to access a Wikipedia API

The function contains quite a few hardwired parameters; the pageviews package does the actual fetching. The function could be made much more flexible by adding more arguments, but that is not necessary for this blog post.

The function only works for terms for which an article with the same title exists in both the German and the English Wikipedia.

Luckily “Bitcoin” is such a term.

# Get the pageviews for "Search term"
# the Wikimedia Pageviews API only has data from July 2015 onward
startdate <- ymd("20150101")
enddate <- now()

wikipedia_de_vs_en <- function(term = "Bitcoin") {
  # call English wikipedia        
  pageviews <- article_pageviews(
    project = "en.wikipedia",
    article = term,
    user_type = "user",
    start = startdate,
    end = enddate
  )
  # call German wikipedia
  pageviews2 <- article_pageviews(
    project = "de.wikipedia",
    article = term,
    user_type = "user",
    start = startdate,
    end = enddate
  )

  # combine to data frame
  pageviews.df <- bind_rows(pageviews, pageviews2) %>%
    mutate(language = case_when(
      language == "en" ~ "English Wikipedia",
      language == "de" ~ "German Wikipedia"
    ))

  #plot it with 2 panels
  ggplot(pageviews.df, aes(date, views, color = language)) +
    geom_line() +
    xlab("") + ylab("Views per Day") +
    theme(legend.position = "none") +
    facet_wrap(~ language, nrow = 2, scales = "free_y") +
    ggtitle(
      sprintf("Daily Pageviews of '%s' in the English and German Wikipedias", term),
      subtitle = "Human Users only"
    ) 
}

Data Visualization

(To follow along or verify the data, use the interactive front end at https://tools.wmflabs.org, which does the same.)

Get some data for the Wikipedia article on Bitcoin:

Get some data for the article on Ethereum, another popular cryptocurrency (at the time of writing).

These plots reveal several things:

  • Unsurprisingly, interest (measured in pageviews) in the English version of the Wikipedia articles is consistently higher than in the German version, about 5 times as high.
  • Pageviews peaked several times even before the 2017 craze started. (I haven’t looked up what caused these peaks. Ransomware outbreaks, perhaps?)
  • Interest in Ethereum seems to be disproportionately high in Germany: English pageviews were “only” about 3 times as high as German pageviews (usually it’s 5 times as high).
  • Strong peaks in Ethereum-article pageviews often occurred independently of Bitcoin-article pageviews, in both languages.
  • (TBC)
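The 5:1 and 3:1 ratios above were read off the plots. To put a number on them, one could take the median of the daily English/German ratio. This is a minimal base-R sketch, assuming a data frame shaped like the raw `bind_rows()` result inside `wikipedia_de_vs_en()` before the language labels are recoded, i.e. columns `date`, `language` (`"en"`/`"de"`) and `views`. The helper `en_de_ratio()` and the toy numbers are hypothetical, for illustration only.

```r
# Hypothetical helper (not part of the pageviews package).
# Input: data frame with columns date, language ("en"/"de") and views,
# as returned by article_pageviews() for the two projects combined.
en_de_ratio <- function(pv) {
  en <- pv[pv$language == "en", c("date", "views")]
  de <- pv[pv$language == "de", c("date", "views")]
  # align both series on date; merge() keeps only dates present in both
  both <- merge(en, de, by = "date", suffixes = c(".en", ".de"))
  both <- both[both$views.de > 0, ]   # skip days with zero German views
  median(both$views.en / both$views.de)
}

# toy input with made-up numbers, only to show the expected shape
toy <- data.frame(
  date     = rep(as.Date("2017-12-01") + 0:2, times = 2),
  language = rep(c("en", "de"), each = 3),
  views    = c(5000, 6000, 10000, 1000, 1200, 2000)
)
en_de_ratio(toy)   # 5
```

The median is used rather than the mean so that a single spike in one language does not dominate the ratio.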

Mentions in the New York Times

It is also instructive to check when the media turned their attention to Bitcoin, and how often they did so over time. The New York Times has a Search API that can be queried for free with an API key. This shell script queries the NYT Search API and saves the results to a CSV file:

#!/bin/bash
# bash is needed: plain /bin/sh (e.g. dash) does not expand {2009..2018}
# following this example:
# https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/book/01.Rmd
apikey=c62ffea...........................


parallel --gnu -j1 --delay 8 --results bitcoin \
 "curl -sL  'http://api.nytimes.com/svc/search/v2/articlesearch.json?q=bitcoin&begin_date={1}0101&end_date={1}1231&page={2}&api-key=$apikey'" ::: {2009..2018} ::: {0..99}

# create csv file with command line tools: (cat, jq and json2csv)
# `[]?` skips responses without .response.docs (e.g. error or rate-limit replies)
cat bitcoin/1/*/2/*/stdout | \
jq -c '.response.docs[]? | {date: .pub_date, type: .document_type, title: .headline.main }' \
| json2csv -p -k date,type,title > bitcoin-nytimes.csv 
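GNU parallel’s `::: {2009..2018} ::: {0..99}` takes the Cartesian product of the two argument lists, so the script issues one request per year and result page, with the last list varying fastest. As a minimal sketch, the same request grid can be built in R; `nyt_search_urls()` is a hypothetical helper and `YOUR_API_KEY` a placeholder. It only constructs the URLs and fetches nothing.

```r
# Hypothetical helper: one Search API URL per (year, page) combination,
# in the same order as `parallel ... ::: years ::: pages` (pages vary fastest).
nyt_search_urls <- function(query, years, pages = 0:99, apikey = "YOUR_API_KEY") {
  grid <- expand.grid(page = pages, year = years)  # first column varies fastest
  sprintf(
    paste0("http://api.nytimes.com/svc/search/v2/articlesearch.json",
           "?q=%s&begin_date=%d0101&end_date=%d1231&page=%d&api-key=%s"),
    query, grid$year, grid$year, grid$page, apikey
  )
}

urls <- nyt_search_urls("bitcoin", 2009:2018)
length(urls)   # 1000 requests: 10 years x 100 pages
urls[1]        # year 2009, page 0
```

With `-j1 --delay 8`, these 1000 requests run one at a time, at least 8 seconds apart, so a full run takes roughly 8000 seconds, a bit over two hours.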

Result - raw data

# date,type,title
# 2011-11-29T00:40:30Z,blogpost,"Today's Scuttlebot: Drones for Everyone, and Bitcoin's Decline"
# 2011-09-07T00:20:26Z,blogpost,Golden Cyberfetters
# 2011-07-04T00:00:00Z,article,Speed Bumps on the Road to Virtual Cash
# 2011-05-30T00:00:00Z,article,Some Faint Praise for Mr. Ballmer
# ...

How often did the word ‘Bitcoin’ appear in articles of the New York Times over the years?
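The counting step is not shown. A minimal base-R sketch, assuming the `date,type,title` layout and ISO 8601 timestamps of `bitcoin-nytimes.csv` shown above, takes the first four characters of each `date` as the year and tabulates them. Note that this counts search hits (articles and blog posts alike), not individual occurrences of the word. The helper name `count_per_year()` is hypothetical; the demo uses the four sample rows from above.

```r
# Hypothetical helper: number of NYT search hits per publication year.
count_per_year <- function(csv_file) {
  nyt  <- read.csv(csv_file, stringsAsFactors = FALSE)
  year <- as.integer(substr(nyt$date, 1, 4))  # "2011-11-29T00:40:30Z" -> 2011
  table(year)
}

# demo: write the sample rows shown above to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "date,type,title",
  "2011-11-29T00:40:30Z,blogpost,\"Today's Scuttlebot: Drones for Everyone, and Bitcoin's Decline\"",
  "2011-09-07T00:20:26Z,blogpost,Golden Cyberfetters",
  "2011-07-04T00:00:00Z,article,Speed Bumps on the Road to Virtual Cash",
  "2011-05-30T00:00:00Z,article,Some Faint Praise for Mr. Ballmer"
), tmp)
count_per_year(tmp)  # all four sample rows fall in 2011
```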

The term ‘Bitcoin’ hardly appeared at all until 2012. The New York Times also provides a platform for bloggers and experts such as Paul Krugman, who discussed Bitcoin in 2014.

Another obligatory comparison

Last but not least: the exchange rate for Bitcoin, here given in euros. Data are from Bitcoin.de, aggregated by Quandl.

library(Quandl)
btc <- Quandl(code = "BCHARTS/BTCDEEUR", type = "xts")
autoplot(btc$Close["2015-01-01/"]) +
        ylab("Bitcoin/EUR") + xlab("Year") +
        ggtitle("Bitcoin Exchange Rate 2015-2018",
                subtitle="'Close Price' in EUR, according to the Bitcoin.de trading platform. Data from Quandl.com.")
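The claim at the top, that these series are strongly correlated except when they aren’t, can also be checked numerically. One option, sketched here as an assumption rather than what produced the plots, is a Spearman rank correlation over the dates both series share; ranks are insensitive to the very different scales of pageviews and euro prices. `daily_cor()` is a hypothetical helper expecting two data frames with columns `date` and `value`.

```r
# Hypothetical helper: Spearman correlation of two daily series,
# each a data frame with columns `date` (class Date) and `value`.
daily_cor <- function(a, b) {
  ab <- merge(a, b, by = "date", suffixes = c(".a", ".b"))  # inner join on date
  cor(ab$value.a, ab$value.b, method = "spearman", use = "complete.obs")
}

# toy example with made-up numbers; the shared dates are 2018-01-02 to 2018-01-05
a <- data.frame(date = as.Date("2018-01-01") + 0:4, value = c(1, 2, 3, 4, 5))
b <- data.frame(date = as.Date("2018-01-02") + 0:4, value = c(10, 20, 30, 40, 50))
daily_cor(a, b)
```

To apply it to the real data, both inputs need a common `Date` column first, e.g. `as.Date(pageviews$date)` for the Wikipedia data and `as.Date(index(btc))` for the xts object from Quandl.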

Much is said about the ‘volatility’ of Bitcoin, but I have rarely seen it visualized, so here is another plot showing the daily change in the ‘closing price’ (do these trading platforms really close?) of 1 Bitcoin, in euros:

autoplot(diff(btc$Close["2015-01-01/"])) +
        ylab("Daily Price Change, in EUR/day ") + xlab("Year") +
        ggtitle("Volatility of Bitcoin 'Close Price' 2015-2018",
                subtitle="in EUR, according to the Bitcoin.de trading platform. Data from Quandl.com.")

I’ve never seen any financial time series like that.
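Daily differences in euros grow with the price level, so the late-2017 spikes partly reflect the higher price rather than larger relative moves. A common alternative, sketched below as an assumption (not the method used for the plot above), is the standard deviation of daily log returns over a trailing window. `rolling_volatility()` and its 30-day default window are illustrative choices.

```r
# Sketch: trailing-window volatility of daily log returns.
# close: numeric vector of daily closing prices (oldest first).
rolling_volatility <- function(close, window = 30) {
  r <- diff(log(close))                      # daily log returns, length n - 1
  vapply(seq_along(r), function(i) {
    if (i < window) return(NA_real_)         # not enough history yet
    sd(r[(i - window + 1):i])                # sd over the last `window` returns
  }, numeric(1))
}

# made-up price path growing exactly 1% per day: all log returns are equal,
# so the rolling volatility is zero up to floating-point error
prices <- 100 * 1.01^(0:40)
vol <- rolling_volatility(prices, window = 30)
```

On the xts series itself, `zoo::rollapply(diff(log(btc$Close)), width = 30, FUN = sd, align = "right")` should give a comparable series.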

Conclusion
What I learned from doing this:
  • Collecting API data
  • jq usage during preprocessing (code not shown)
  • Many small R tricks (primarily ggplot2 and RMarkdown stuff)