Setup

This script relies on three GeoPackage outputs from the previous script. It binds them into one long data frame covering all three vendors, splits it by vendor and permanent, unique bike ID, and defines a trip as a movement of more than 50 meters between consecutive timestamps, measured in the projected EPSG:3857 coordinates. The threshold is an attempt to adjust for GPS variability.

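# Packages used throughout this script
library(sf)          # st_read(), st_write(), st_transform(), st_distance()
library(dplyr)       # %>%, filter(), mutate(), group_split(), lag(), lead()
library(data.table)  # rbindlist()
library(parallel)    # detectCores(), makeCluster(), parLapply()

link_data <- st_read("../../results/link_050122_050722.gpkg")
lime_data <- st_read("../../results/lime_050122_050722.gpkg")
spin_data <- st_read("../../results/spin_050122_050722.gpkg")
scooters_raw <- st_as_sf(rbindlist(list(link_data, lime_data, spin_data), fill = TRUE)) %>%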
  st_transform(crs = 3857)
#st_write(scooters_raw, dsn = "../../results/scooters_raw_050122_050722.gpkg", append = FALSE)
#scooters_raw <- st_read("../../results/scooters_raw_050122_050722.gpkg")
scooters_split <- scooters_raw %>%
  distinct() %>%
  filter(is_disabled == 0) %>%
  group_by(vendor, bike_id) %>%
  group_split()

The anonymous function passed to parLapply() below defines what constitutes a trip. It is slow. Parallel processing with the parallel package and parLapply() speeds things up, but it still takes 15+ minutes to run over the 300,000 rows I collected for the week of morning commutes.

numcores <- detectCores()
cl <- makeCluster(numcores)
clusterEvalQ(cl, {
  library(sf)
  library(dplyr)
  library(units)
})
scooters_trip <- parLapply(cl, scooters_split, function(df){
  df %>% mutate(dist_prev = units::drop_units(st_distance(geom, lag(geom), by_element = TRUE)),
                dist_next = units::drop_units(st_distance(geom, lead(geom), by_element = TRUE)),
                time_id = row_number(), #this is what allows us to order points for QGIS analysis
                movement_id = paste(bike_id, "_", row_number(), sep = ""), #perhaps redundant, but easy solution for moving between R and QGIS
                trip = case_when(
                  dist_prev > 50 | dist_next > 50 ~ 1, #define trip based on distance column
                  TRUE ~ 0))
})
stopCluster(cl)
trip_long <- st_as_sf(rbindlist(scooters_trip)) %>%
  filter(trip == 1) #filter by only trip points
trip_split <- trip_long %>% #split again by trips
  group_by(bike_id) %>%
  group_split()
trip_split_id <- lapply(trip_split, function(df){
  df %>% mutate(time_id = row_number())
})
trip_id_long <- st_as_sf(rbindlist(trip_split_id)) #bind into trips
st_write(trip_id_long, "../../results/trip_id_long.gpkg", append = FALSE)
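
The trip rule above can be checked by hand on a toy track. This is a minimal sketch in base R, not the code used for the analysis: the coordinates are made-up values assumed to be in projected meters, and plain Euclidean distance stands in for st_distance(). The names track, step, threshold_m, and moved are illustrative only.

```r
# Toy track for one scooter: made-up projected coordinates in meters.
track <- data.frame(
  x = c(0, 3, 5, 120, 260, 262, 261),
  y = c(0, 4, 2,  10,  15,  14,  16)
)

# Euclidean distance between consecutive observations.
step <- sqrt(diff(track$x)^2 + diff(track$y)^2)

# Distance to the previous and next point; NA marks the missing
# neighbor at the start and end of the track, as lag()/lead() would.
track$dist_prev <- c(NA, step)
track$dist_next <- c(step, NA)

# A point is a trip point if either neighbor is more than 50 m away.
# An NA result is treated as "no trip", like case_when()'s TRUE ~ 0 fallback.
threshold_m <- 50
moved <- track$dist_prev > threshold_m | track$dist_next > threshold_m
track$trip <- as.integer(!is.na(moved) & moved)

print(track)
```

The first three points jitter within a few meters of each other, so only the third one, which is followed by a jump of more than 100 m, is flagged, together with the two points after it. Small GPS wobble at either end of the movement is left out.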

Writing results to an appropriate file format after analysis is very important with large, static datasets like these: when one code block takes 15 minutes to run, it is better to read the result from an existing file than to rerun the code every time it’s needed.
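
One way to make that habit automatic is a small compute-or-read helper. The sketch below is an assumption of mine rather than part of the original script: cached() and slow_step() are hypothetical names, and it uses base R's saveRDS()/readRDS() on a temporary file so it runs anywhere. For the spatial results in this script, st_write() and st_read() on a .gpkg path would play the same roles.

```r
# Hypothetical helper: return the cached result if the file exists,
# otherwise run the computation, save it, and return it.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Stand-in for the slow step; the counter records how often it really runs.
runs <- 0
slow_step <- function() {
  runs <<- runs + 1
  data.frame(bike_id = c("a", "b"), trip = c(1, 0))
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached(cache_file, slow_step)  # computes and writes the file
second <- cached(cache_file, slow_step)  # reads the file, skips the computation
```

Deleting the cache file forces a fresh run, which matters if the upstream inputs ever change.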