Animate your Strava activities in R using rStrava and gganimate

Getting started with rStrava

rStrava is an R package that allows you to access data from Strava using the Strava API. Some of the functions in rStrava scrape data from the public Strava website, but to access your own data you will need a Strava profile and an authentication token. Details on obtaining your unique token can be found on the rStrava homepage. In addition to this token, we use rgbif::elevation() to look up the elevation of the points along each route. This requires a Google API key, which can be created here.

Got a Strava authentication token? Got a Google API key? We are ready to create some animations! To create our animations we use gganimate, which requires ImageMagick to be installed.

Loading packages and defining tokens

First, load the packages used in the script and define our Strava and Google credentials.

# load packages ####
library(rStrava) # devtools::install_github('fawda123/rStrava')
library(gganimate) # devtools::install_github('dgrtwo/gganimate')
library(dplyr)
library(tidyr)
library(purrr)
library(sp)
library(ggmap)
library(raster)

# initial setup ####
# Strava key
app_name <- 'xxxx'
app_client_id <- 'xxxxx'
app_secret <- '"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"'

# create strava token
my_token <- httr::config(token = strava_oauth(app_name, app_client_id, app_secret))

# Google elevation API key
GoogleAPI <- 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

A browser window should open at this point saying "Authentication complete. Please close this page and return to R." This means everything is going well!

Download your data

We can then download our personal activity data using rStrava::get_activity_list(). This function needs your Strava token and your Strava athlete id. For example, my Strava id is 2140248.

# download strava data
my_acts <- get_activity_list(my_token, id = 2140248)

length(my_acts)
## [1] 785

This returns a large list of all your previous activities. Mine has 785 previous entries. If you want to explore your list, you can use View(my_acts) in RStudio which opens the Data Viewer window.

Compile your data into a “tidy” dataframe

rStrava has a function that compiles the information stored in the output of get_activity_list() into a “tidy” dataframe, with one row for each activity. compile_activities() finds all the columns across all activities and returns NA when a column is not present in a given activity. This means that if heart rate was not recorded for every one of your Strava activities, the function will still work!

# compile activities into a tidy dataframe
my_acts <- compile_activities(my_acts)

# have a look at the dataframe
dplyr::glimpse(my_acts)
## Observations: 785
## Variables: 56
## $ achievement_count      <dbl> 4, 70, 0, 0, 1, 60, 1, 1, 1, 5, 25, 1, ...
## $ athlete_count          <dbl> 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...
## $ athlete.id             <chr> "2140248", "2140248", "2140248", "21402...
## $ athlete.resource_state <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ average_heartrate      <chr> "151.2", "131.6", "105.6", "125", "126....
## $ average_speed          <dbl> 13.6008, 25.0308, 27.4536, 25.4844, 26....
## $ comment_count          <dbl> 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, ...
## $ commute                <chr> "FALSE", "FALSE", "FALSE", "FALSE", "FA...
## $ distance               <dbl> 5.0064, 112.8620, 2.6385, 20.9050, 21.9...
## $ elapsed_time           <dbl> 1325, 18600, 346, 2953, 3020, 17859, 33...
## $ elev_high              <dbl> 86.0, 231.1, 120.0, 199.0, 199.0, 189.2...
## $ elev_low               <dbl> 63.9, 0.3, 71.6, 66.6, 69.3, 3.9, 71.6,...
## $ end_latlng1            <dbl> 51.92, 50.16, 50.16, 50.17, 50.16, 50.1...
## $ end_latlng2            <dbl> -2.05, -5.12, -5.12, -5.13, -5.12, -5.1...
## $ external_id            <chr> "garmin_push_2703271543", "garmin_push_...
## $ flagged                <chr> "FALSE", "FALSE", "FALSE", "FALSE", "FA...
## $ from_accepted_tag      <chr> "FALSE", "FALSE", "FALSE", "FALSE", "FA...
## $ gear_id                <chr> "g2482208", "b751035", "b751035", "b751...
## $ has_heartrate          <chr> "TRUE", "TRUE", "TRUE", "TRUE", "TRUE",...
## $ has_kudoed             <chr> "FALSE", "FALSE", "FALSE", "FALSE", "FA...
## $ id                     <dbl> 1575586892, 1569537013, 1560414332, 156...
## $ kudos_count            <dbl> 3, 13, 0, 5, 4, 11, 0, 2, 2, 2, 6, 0, 1...
## $ location_country       <chr> "United Kingdom", "United Kingdom", "Un...
## $ manual                 <chr> "FALSE", "FALSE", "FALSE", "FALSE", "FA...
## $ map.id                 <chr> "a1575586892", "a1569537013", "a1560414...
## $ map.resource_state     <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ map.summary_polyline   <chr> "cq{{H~ooKqBxJvMvInDrWM~u@l@j@eD`MtIiAr...
## $ max_heartrate          <chr> "172", "160", "133", "155", "149", "160...
## $ max_speed              <dbl> 19.80, 59.76, 56.88, 56.16, 72.00, 70.2...
## $ moving_time            <dbl> 1325, 16231, 346, 2953, 3020, 15492, 33...
## $ name                   <chr> "Lunch Run", "Morning Ride", "Evening R...
## $ photo_count            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ pr_count               <chr> "0", "28", "0", "0", "0", "20", "0", "1...
## $ private                <chr> "FALSE", "FALSE", "FALSE", "FALSE", "FA...
## $ resource_state         <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ start_date             <chr> "2018-05-16T11:26:28Z", "2018-05-13T08:...
## $ start_date_local       <chr> "2018-05-16T12:26:28Z", "2018-05-13T09:...
## $ start_latitude         <dbl> 51.92, 50.16, 50.17, 50.16, 50.16, 50.1...
## $ start_latlng1          <dbl> 51.92, 50.16, 50.17, 50.16, 50.16, 50.1...
## $ start_latlng2          <dbl> -2.05, -5.12, -5.13, -5.12, -5.12, -5.1...
## $ start_longitude        <dbl> -2.05, -5.12, -5.13, -5.12, -5.12, -5.1...
## $ suffer_score           <chr> "38", "183", "1", "21", "23", "156", "2...
## $ timezone               <chr> "(GMT+00:00) Europe/London", "(GMT+00:0...
## $ total_elevation_gain   <dbl> 37.3, 1499.0, 25.4, 293.2, 288.6, 1273....
## $ total_photo_count      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ trainer                <chr> "FALSE", "FALSE", "FALSE", "FALSE", "FA...
## $ type                   <chr> "Run", "Ride", "Ride", "Ride", "Ride", ...
## $ upload_id              <chr> "1694137366", "1687878323", "1678043395...
## $ utc_offset             <chr> "3600", "3600", "3600", "3600", "3600",...
## $ average_cadence        <chr> NA, "75.3", "73.4", "68.4", "70.3", "72...
## $ average_watts          <dbl> NA, 151.8, 121.2, 167.1, 163.9, 152.0, ...
## $ device_watts           <chr> NA, "FALSE", "FALSE", "FALSE", "FALSE",...
## $ kilojoules             <dbl> NA, 2464.6, 41.9, 493.6, 495.1, 2355.2,...
## $ location_city          <chr> NA, "Mabe Burnthouse", "Mabe Burnthouse...
## $ location_state         <chr> NA, "England", "England", "England", "E...
## $ workout_type           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
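To see why a missing field is harmless, here is a minimal base R sketch of the idea: take the union of column names across activities and fill the gaps with NA. The toy activities and all the names below are made up for illustration, and this is not how compile_activities() is implemented internally.

```r
# two made-up activities; only the second one recorded power
acts <- list(
  list(name = "Lunch Run", distance = 5.0),
  list(name = "Morning Ride", distance = 112.9, average_watts = 151.8)
)

# every column that appears in at least one activity
all_cols <- unique(unlist(lapply(acts, names)))

# build one row per activity, using NA where a column is absent
rows <- lapply(acts, function(a) {
  vals <- lapply(all_cols, function(col) if (is.null(a[[col]])) NA else a[[col]])
  as.data.frame(setNames(vals, all_cols), stringsAsFactors = FALSE)
})

toy_acts <- do.call(rbind, rows)
toy_acts
```

The run ends up with NA in average_watts instead of causing an error, which is the behaviour described above.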

That is a lot of columns, so I keep only the ones I am interested in for this post and do some data transformations to get the date into a usable format.

# columns to keep
desired_columns <- c('distance', 'elapsed_time', 'moving_time', 'start_date', 'start_date_local', 'type', 'map.summary_polyline', 'location_city', 'upload_id', 'start_latitude', 'start_longitude')

# keep only desired columns
my_acts <- dplyr::select(my_acts, match(desired_columns, names(my_acts)))

# transformations ####
my_acts <- mutate(my_acts,
                  activity_no = seq(1,n(), 1),
                  elapsed_time = elapsed_time/60/60,
                  moving_time = moving_time/60/60, 
                  date = gsub("T.*$", '', start_date) %>%
                    as.POSIXct(., format = '%Y-%m-%d'),
                  EUdate = format(date, '%d/%m/%Y'),
                  month = format(date, "%m"),
                  day = format(date, "%d"),
                  year = format(date, "%Y")) %>%
  mutate_at(., c('month', 'day'), as.numeric)
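The date handling can be tried on its own with a couple of timestamps. This is a small sketch on made-up values in the same ISO 8601 format that Strava returns; it uses as.Date() rather than the as.POSIXct() call above, which is a simplification on my part and not the code used in the post.

```r
# made-up timestamps in the same format as start_date
start_date <- c("2018-05-16T11:26:28Z", "2017-12-03T09:15:00Z")

# drop everything from the "T" onwards, then parse the date part
date <- as.Date(sub("T.*$", "", start_date))

date_parts <- data.frame(
  date   = date,
  EUdate = format(date, "%d/%m/%Y"),
  year   = format(date, "%Y"),
  month  = as.numeric(format(date, "%m")),
  day    = as.numeric(format(date, "%d")),
  stringsAsFactors = FALSE
)
date_parts
```

As in the main script, year stays as a character column, which is handy later when it is mapped to a discrete colour scale.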

Get latitude and longitude for each activity

Each activity has a bunch of data associated with it. For mapping, I am interested in map.summary_polyline, a Google polyline that encodes multiple latitude and longitude points as a single string. We can get the latitude and longitude for each of the activities by using get_latlon(), which decodes the polylines, together with dplyr and purrr to iterate over every activity in the dataframe.

# get lat lon and distance of every ride ####
lat_lon <- my_acts %>%
  filter(!is.na(map.summary_polyline)) %>%
  nest(., -activity_no) %>%
  mutate(coords = map(data, get_latlon),
         distance = map(coords, ~get_dists(.x$lon, .x$lat))) %>%
  unnest(., data) %>%
  unnest(., coords, distance)
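If you are curious what decoding a polyline involves, the sketch below implements Google's encoded polyline format in base R. get_latlon() does this for you, so this is purely illustrative; decode_polyline() is a hypothetical helper name, and the test string is the example from Google's documentation of the format rather than one of my rides.

```r
# decode a Google encoded polyline into a data frame of lat/lon
decode_polyline <- function(encoded, precision = 5) {
  # each character holds 5 data bits; values >= 32 mean "more chunks follow"
  chunks <- utf8ToInt(encoded) - 63
  deltas <- numeric(0)
  value <- 0
  shift <- 0
  for (chunk in chunks) {
    value <- value + (chunk %% 32) * 2^shift
    shift <- shift + 5
    if (chunk < 32) {
      # last chunk of a number: the lowest bit marks a negative value
      delta <- if (value %% 2 == 1) -(value %/% 2) - 1 else value %/% 2
      deltas <- c(deltas, delta)
      value <- 0
      shift <- 0
    }
  }
  # numbers alternate lat, lon and are stored as differences from the previous point
  odd <- seq(1, length(deltas), by = 2)
  data.frame(lat = cumsum(deltas[odd]) / 10^precision,
             lon = cumsum(deltas[odd + 1]) / 10^precision)
}

pts <- decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@")
pts
```

Because points are stored as differences from the previous point, long routes compress well, but the number of points you get back depends on how the route was simplified.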

Having got the latitude and longitude for every ride, we can now get the elevation of each point and then calculate the gradient between points.

# get elevation and calculate gradient between points
lat_lon <- mutate(lat_lon, ele = rgbif::elevation(latitude = lat, longitude = lon, key = GoogleAPI)$elevation)

lat_lon <- group_by(lat_lon, activity_no) %>%
  mutate(., ele_diff = c(0, diff(ele)),
         dist_diff = c(0, diff(distance)),
         grad = c(0, (ele_diff[2:n()]/10)/dist_diff[2:n()])) %>%
  ungroup() %>%
  dplyr::select(., -c(ele_diff, dist_diff))
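The division by 10 in the gradient calculation is a unit conversion: elevation is in metres and distance is in kilometres, so (metres / 10) / kilometres gives a percentage. A toy example with made-up numbers shows the arithmetic:

```r
# made-up profile: cumulative distance in km, elevation in m
toy <- data.frame(distance = c(0, 0.5, 1.0, 1.5),
                  ele      = c(100, 110, 105, 105))

ele_diff  <- c(0, diff(toy$ele))       # metres climbed since the previous point
dist_diff <- c(0, diff(toy$distance))  # kilometres travelled since the previous point

# first point has no previous point, so its gradient is set to 0
toy$grad <- c(0, (ele_diff[-1] / 10) / dist_diff[-1])
toy
```

Climbing 10 m over 0.5 km comes out as a 2% gradient, and dropping 5 m over the next 0.5 km as -1%. Two consecutive points with identical coordinates would give a zero dist_diff and a non-finite gradient, so that is worth checking for in real data.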

This now gives us a data frame of all the activities I have done with the latitude, longitude, cumulative distance, elevation and gradient. It would now be super easy to create elevation profiles, but I will save that for another post.
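For reference, a cumulative distance column like the one above can be approximated with the haversine formula. The sketch below assumes that get_dists() returns cumulative great-circle distance in kilometres; cumulative_km() is a hypothetical stand-in written for illustration and has not been checked against the rStrava source.

```r
# cumulative great-circle distance (km) along a path of lon/lat points
cumulative_km <- function(lon, lat, radius_km = 6371) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlon <- diff(lon)
  dlat <- diff(lat)
  # haversine formula for each consecutive pair of points
  a <- sin(dlat / 2)^2 +
    cos(head(lat, -1)) * cos(tail(lat, -1)) * sin(dlon / 2)^2
  step <- 2 * radius_km * asin(pmin(1, sqrt(a)))
  c(0, cumsum(step))
}

# three points one degree of longitude apart on the equator
d <- cumulative_km(lon = c(0, 1, 2), lat = c(0, 0, 0))
d
```

One degree of longitude at the equator is roughly 111 km, which is a quick sanity check on the output.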

Create a gif of a single ride

We now have almost all the components to create a gif of a single ride.

lat_lon_single <- filter(lat_lon, activity_no == 2)
nrow(lat_lon_single)
## [1] 143

However, Google polylines do not give a consistent number of latitude and longitude points. This means that there may not always be enough points to make smooth transitions between frames in a gif. I have tried using tweenr to achieve this, but an alternative is to use geospatial packages such as sp and raster to interpolate a desired number of points from the current ones.

# reorder columns so lat lon are first
lat_lon_single <- dplyr::select(lat_lon_single, lat, lon, everything())

# make new data with Duffy's method
interp <- raster::spLines(as.matrix(lat_lon_single[,1:2])) %>%
  sp::spsample(., n = 250, type = 'regular') %>%
  data.frame() %>%
  mutate(., distance = get_dists(lon, lat),
         ele = rgbif::elevation(latitude = .$lat, longitude = .$lon, key = GoogleAPI)$elevation,
         ele_diff = c(0, diff(ele)),
         dist_diff = c(0, diff(distance)),
         grad = c(0, (ele_diff[2:n()]/10)/dist_diff[2:n()]),
         n = row_number())
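The idea behind the resampling step can also be sketched without any spatial packages: measure how far along the path each point is, then interpolate new points at evenly spaced distances. This is an alternative illustration and not the sp/raster method used above; resample_path() is a hypothetical helper, and it treats longitude and latitude as planar coordinates, which is only a rough approximation over the scale of a single ride.

```r
# resample a path to n points evenly spaced along its length
resample_path <- function(x, y, n) {
  # distance along the path at each original point (planar approximation)
  along <- c(0, cumsum(sqrt(diff(x)^2 + diff(y)^2)))
  target <- seq(0, max(along), length.out = n)
  data.frame(x = approx(along, x, xout = target)$y,
             y = approx(along, y, xout = target)$y)
}

# unevenly spaced points on a straight line become evenly spaced
even <- resample_path(x = c(0, 1, 3), y = c(0, 0, 0), n = 4)
even
```

The more points you ask for, the smaller the step between frames and the smoother the animation, at the cost of more elevation lookups.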

We can now put the gif together using ggmap and ggplot2. We turn a normal ggplot2 plot into a gif by adding the frame aesthetic and cumulative = TRUE, so that each frame keeps the path drawn so far.

# make bbox
bbox <- ggmap::make_bbox(lon, lat, data = interp, f = 1.3)

# download map
map <- get_map(location = bbox, source = 'google', maptype = 'terrain')

single_ride <- ggmap(map, darken = 0.15) +
  geom_path(aes(x = lon, y = lat,  col = grad, group = 1, frame = n, cumulative = TRUE), data = interp, size = 2, alpha = 1) +
  scale_color_distiller('Gradient (%)', palette = 'Spectral') +
  labs(title = 'Ride to St Just and back') +
  coord_cartesian() +
  ggforce::theme_no_axes(theme_bw(base_size = 16))

# animate plot
animation::ani.options(interval = 1/20)
gganimate::gganimate(single_ride, title_frame = FALSE, 'where_you_want_to_save_it.gif', ani.width = 800, ani.height = 700)

The output of gganimate() can be seen below.
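The frame aesthetic and gganimate::gganimate() belong to the original version of the package. If you have the rewritten gganimate installed, something along the following lines should give a similar progressively drawn path. This is a hedged sketch on a made-up toy path without the ggmap background; I am assuming transition_reveal() is the closest equivalent of frame plus cumulative = TRUE, and I have not reproduced the ride above with it.

```r
# made-up path with a frame index, standing in for the interp data frame
toy_path <- data.frame(
  lon = seq(-5.7, -5.0, length.out = 50),
  lat = 50.1 + 0.1 * sin(seq(0, pi, length.out = 50)),
  n   = 1:50
)

# only build the animation if the packages are available
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gganimate", quietly = TRUE)) {
  toy_ride <- ggplot2::ggplot(toy_path, ggplot2::aes(x = lon, y = lat)) +
    ggplot2::geom_path() +
    ggplot2::labs(title = "Toy ride") +
    # reveal the path point by point along n
    gganimate::transition_reveal(n)

  # rendering is left commented out because it needs a renderer such as gifski
  # anim <- gganimate::animate(toy_ride, nframes = 100, fps = 20,
  #                            width = 800, height = 700)
  # gganimate::anim_save("where_you_want_to_save_it.gif", animation = anim)
}
```

In this API the frame rate and output size are passed to animate() rather than set through animation::ani.options().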

Create a gif of ALL the rides

We can also make a gif of multiple activities. I will filter my activities to keep only bike rides that are longer than 20 km and that took place in Cornwall in the UK (where I live), so that we don’t include rides from the family home near Manchester.

# get a bbox for Cornwall
bbox <- ggmap::make_bbox(lat_lon_single$lon, lat_lon_single$lat, f = 1.2)

# add column for frame and total distance per ride
lat_lon <- group_by(lat_lon, activity_no) %>%
  mutate(n = 1:n(),
         tot_dist = max(distance)) %>%
  ungroup()

# filter lat_lon for when points are within this
lat_lon <- filter(lat_lon, between(start_longitude, bbox[1], bbox[3]) & between(start_latitude, bbox[2], bbox[4]) & type == 'Ride' & tot_dist > 20)

# add column for frame
lat_lon <- group_by(lat_lon, activity_no) %>%
  mutate(n = 1:n()) %>%
  ungroup()

# make bbox again
bbox <- ggmap::make_bbox(lon, lat, data = lat_lon, f = 0.1)

# download map
map <- get_map(location = bbox, source = 'google', maptype = 'terrain')

all_the_rides <- ggmap(map, darken = 0.15) +
  geom_path(aes(x = lon, y = lat,  col = year, group = activity_no, frame = n, cumulative = TRUE), data = lat_lon, size = 1.25, alpha = 0.5) +
  labs(title = 'All the rides') +
  coord_cartesian() +
  ggforce::theme_no_axes(theme_bw(base_size = 16)) +
  guides(colour = guide_legend(override.aes = list(alpha=1)))

# animate plot
animation::ani.options(interval = 1/20)
gganimate::gganimate(all_the_rides, title_frame = FALSE, 'where_you_want_to_save_it.gif', ani.width = 800, ani.height = 700)

The output of this call to gganimate() can be seen below.
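The Cornwall filter used above works by padding a bounding box around a known ride and keeping activities that start inside it. The sketch below shows that logic in base R on made-up coordinates; pad_bbox() and in_bbox() are hypothetical helpers, and pad_bbox() reflects my understanding of what ggmap::make_bbox() does with its f argument (extend each side by a fraction f of the range) rather than its actual source.

```r
# bounding box around some points, extended by a fraction f of the range on each side
pad_bbox <- function(lon, lat, f = 0.05) {
  xr <- range(lon)
  yr <- range(lat)
  c(left   = xr[1] - f * diff(xr),
    bottom = yr[1] - f * diff(yr),
    right  = xr[2] + f * diff(xr),
    top    = yr[2] + f * diff(yr))
}

# TRUE for points that fall inside the box
in_bbox <- function(lon, lat, bbox) {
  lon >= bbox[["left"]] & lon <= bbox[["right"]] &
    lat >= bbox[["bottom"]] & lat <= bbox[["top"]]
}

# made-up extent of a single ride in west Cornwall
toy_bbox <- pad_bbox(lon = c(-5.7, -5.0), lat = c(50.1, 50.3), f = 1.2)

# one start point in Cornwall, one near Manchester
keep <- in_bbox(lon = c(-5.12, -2.2), lat = c(50.16, 53.5), bbox = toy_bbox)
keep
```

A larger f widens the net around the reference ride, so it is the knob to turn if rides you expect to see are being dropped.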

And there we have it: a relatively simple way to animate your Strava activities in R. I personally find that saving the output as .mp4 rather than .gif gives smaller and higher quality files when uploading them to Instagram, but these options are easy to change. Take back your own data and get plotting!

There are loads of other functions and uses for the rStrava package. I hope to blog more about them soon. In addition, the amazing Thomas Lin Pedersen is starting to rewrite gganimate. You can follow the exciting progress here, and I look forward to trying it out.
