Description

The Health-Oriented Transportation (HOT) model is a collection of statistcial distributions and exposure-response functions used to quantify the expected health benefits of population-level increases in active transportation. HOT is publicly available for use as an R package and web application.

Details

  • The statistcal distributions include: baseline physical activity, baseline travel activity, and scebnario travel activity

    • All distributions are in terms of MET-hrs./week

    • The exposure-rerspons functions include all-cause mortality, cardiovascular disease, and diabetes

  • A comparitive risk assessment is used to estimate the population attributable fraction

  • Using travel survey data, baseline estimates for participation, frequency, and intensity of active transportation are computed and used to parameterize the distribution of travel activity as a mixture of log-normal and a point-mass at zero.

  • The baseline distribiution of physical activity is modeled using results from Arem et al.

Using HOT

Installation

Install the HOT R package using install_gitlab() from the devtools package.

library("devtools")
install_gitlab("GHI-UW/HOT")
library("HOT")

Once installed, load the package and view the help page.

help(HOT)

Travel Survey Data

The TravelSurvey object is designed to effectively store all relevant data from a given survey of a populations travel activity. This formulated TravelSurvey (TS) object is intended for analysis using the HOT model toolkit. The TS object is structured as 4 slots, each a data frame: household, person, trip, location.

We use the 2017 US NHTS data for an example. The data can be found here: https://nhts.ornl.gov/assets/2016/download/Csv.zip.

We save these data files to a folder in the working directory named data.

Household Table

First we create a data frame with the household data. This data frame requires three variables houseID, location and year. location is a factor variable indicating the location of the house. In this case we use HHSTATE in this example to stratify the data by state. The variable year is character class and identifies the year of the study. Identifier variables like houseID and later subjectID are character class.

# Load raw household data
hhpub.raw <- read.csv(file = "./data/hhpub.csv", stringsAsFactors = FALSE)

# Modify to fit TS format
hhpub <- within(hhpub.raw,{
  houseID = as.character(HOUSEID)
  location = factor(as.character(HHSTATE))
  year = "2017"
})

# Filter on urban households to reduce filesize, select relevant variables
hhpub <- hhpub %>% filter(URBAN == 1 ) %>% select(houseID, location, year)

Person Table

Second is the person (or individual) table. It contains four variables; houseID, subjectID, age and sex. The variables age and sex are factors with levels “child”, “adult”, “senior” and “M” and “F”, respectively.

# Load raw person data
perpub.raw <- read.csv(file = "./data/perpub.csv", stringsAsFactors = FALSE)

# Modify to fit TS format
perpub <- within(perpub.raw,{
  houseID = as.character(HOUSEID)
  subjectID = as.character(PERSONID)
  sex = factor(ifelse(R_SEX=="1", "M", ifelse(R_SEX=="2", "F", NA)), levels = c("M", "F"))
  age = factor(ifelse(!is.na(R_AGE), ifelse(R_AGE <= 18, "child", ifelse(R_AGE <= 65, "adult", "senior")), NA), levels = c("child","adult","senior"))
})

# Filter on adults in urban households to reduce filesize, select relevant variables
perpub <- perpub %>% filter(URBAN == 1 & age == "adult") %>% select(houseID, subjectID, sex, age)

Trip Table

Third we have the trip data. The duration variable is numeric and mode is a factor with levels “walk”, “cycle”, “other”.

# Load raw trip data
trippub.raw <- read.csv(file = "./data/trippub.csv", stringsAsFactors = FALSE)

# Modify to fit TS format
trippub <- within(trippub.raw,{
  houseID = as.character(HOUSEID)
  subjectID = as.character(PERSONID)
  duration = ifelse(as.numeric(TRVLCMIN) < 0, as.numeric(NA), as.numeric(TRVLCMIN))
  mode = factor(ifelse(TRPTRANS %in% c(-9,-8,-7,97), NA, ifelse(TRPTRANS == 1, "walk", ifelse(TRPTRANS == 2, "cycle", "other"))), levels = c("walk", "cycle", "other"))
})

# Filter on urban households to reduce filesize, select relevant variables
trippub <- trippub %>% filter(URBAN == 1 ) %>% select(houseID, subjectID, duration, mode)

Create TravelSurvey Object

NHTS <- createTravelSurvey(person = perpub, trip = trippub, house = hhpub, frequency = 5/7)
saveRDS(NHTS, file = "./data/NHTS.ts.rds")

Complete! You have successfully formed a TravelSurvey object. A handy function to validate your in-progress object is the command validObject(). As one might expect, it tests the validity of an object per its class definition – returning TRUE or FALSE.

Location Table (Optional)

Finally we have the location table, locpub, which contains participation estimates by location.

The location table has two variables, the location variable, a factor with the same levels as the house data frames, and participation, an estimate for the proportion of the population that are active travelers (the same value is used for all age and sex).

NHTS <- createTravelSurvey(person = perpub, trip = trippub, house = hhpub, location = locpub)

Summarize Travel Survey

summary(NHTS)
## Travel Survey data summary: 
## 
## Number of households:  64377 
## 
## Number of persons:  109890 
## 
## Sample population by sex: 
##     sex      n
## 1     M  51087
## 2     F  58727
## 3  <NA>     76
## 4 total 109890
## 
## Sample population by age: 
##     age      n
## 1 adult 109890
## 2 total 109890
## 
## Number of sub-locations:  51 
## 
## Average participation:  0.2588591 
## 
## Average time of trip by mode: 
##    mode        n
## 1  walk 16.93180
## 2 cycle 22.14480
## 3 other 21.56446
## 4  <NA> 41.59399

View Travel Survey Tables

head(NHTS@person)
##    houseID subjectID sex   age
## 1 30000007         3   F adult
## 2 30000012         1   F adult
## 3 30000041         1   M adult
## 4 30000041         2   F adult
## 5 30000082         1   F adult
## 6 30000082         2   M adult
head(NHTS@house)
##    houseID location year
## 1 30000007       NC 2017
## 2 30000012       NY 2017
## 3 30000019       MD 2017
## 4 30000029       WI 2017
## 5 30000039       PA 2017
## 6 30000041       CA 2017
head(NHTS@trip)
##    houseID subjectID duration  mode
## 1 30000007         1       15 other
## 2 30000007         1       20 other
## 3 30000007         2      120 other
## 4 30000007         2      150 other
## 5 30000007         3       15 other
## 6 30000007         3       15 other
head(NHTS@location)
## # A tibble: 6 x 2
##   location participation
##   <fct>            <dbl>
## 1 AK               0.270
## 2 AL               0.139
## 3 AR               0.156
## 4 AZ               0.246
## 5 CA               0.307
## 6 CO               0.381

HOT Functions

Intensity

getIntensity(NHTS) %>% head
## # A tibble: 6 x 3
##   location  mean    sd
##   <fct>    <dbl> <dbl>
## 1 AK       15.3  20.2 
## 2 AL       12.3  13.1 
## 3 AR        5.91  5.69
## 4 AZ       11.9  17.6 
## 5 CA       12.0  15.4 
## 6 CO       11.9  15.5

Prevalence

getPrevalence(NHTS) %>% head
## # A tibble: 6 x 2
##   location     p1
##   <fct>     <dbl>
## 1 AK       0.193 
## 2 AL       0.0990
## 3 AR       0.111 
## 4 AZ       0.176 
## 5 CA       0.219 
## 6 CO       0.272

Participation

getParticipation(NHTS) %>% head
## # A tibble: 6 x 2
##   location participation
##   <fct>            <dbl>
## 1 AK               0.270
## 2 AL               0.139
## 3 AR               0.156
## 4 AZ               0.246
## 5 CA               0.307
## 6 CO               0.381

Means

getMeans(NHTS) %>% head
## # A tibble: 6 x 4
##   location  walk cycle other
##   <fct>    <dbl> <dbl> <dbl>
## 1 AK        29.5 15.9   59  
## 2 AL        49    0     51.6
## 3 AR        23.6  0     65.6
## 4 AZ        30.1  8.76  56.7
## 5 CA        34.8  6.70  68.6
## 6 CO        26.0 10.8   71.2

Comparative Risk Assessment

getCRA(NHTS) %>% head
##   location        PAF
## 1       AK 0.08419022
## 2       AL 0.09863220
## 3       AR 0.06889356
## 4       AZ 0.07544604
## 5       CA 0.07407728
## 6       CO 0.06681366

For a list of all HOT functions, type “HOT::” into your R Studio console.

Web Application

To utilize the HOT model in its web app, visit https://hotshinytool.shinyapps.io/HOT-App/.

Within the online appliation, users may upload their own datasets. To perform this step, the app requires two .rds files: the TravelSurvey object and its corresponding GIS file, formatted as a dataframe. For exporting the TS to a proper .rds file, see ‘Compiling final TS object’ section above. See the ‘Visualize’ section below for information on forming the GIS dataframe.

Plot Maps

Looking ahead, we provide a brief introduction to visualizing HOT model results using real-world location maps, which require corresponding GIS shapefile data.

This section is much more complex than above, so feel free to approach as comfortable.

In order to visualize location-based results, we need GIS data corresponding to our formed TravelSurvey object. Specically, the location values themselves need to match.

First, find GIS data and convert it to a usable structure for plotting with result values: dataframe.

# Load USA GIS data
usaGIS <- raster::getData("GADM", path = "./data/", country = "USA", level = 1)

# convert to usable dataframe for merging with TS analysis and mapping
usaGIS@data$id <- rownames(usaGIS@data)
usaGIS.points <- fortify(usaGIS, region = "id")
usaGIS.df <- plyr::join(usaGIS.points, usaGIS@data, by = "id")

Next, use comparison between the TravelSurvey object's location levels and the columns in the obtained GIS dataframe.

The goal here is to modify the GIS dataframe in order to include a column of location values that exactly matches the TravelSurvey object’s location slot’s location column (class factor with identical levels).

# find existing column similar to NHTS location values -- VARNAME_1 seems to do the trick, given some string manipulation
head(levels(NHTS@location$location))
head(usaGIS.df)

# convert VARNAME_1 to match NHTS locations
usaGIS.df<- usaGIS.df %>%
  mutate( location = factor(sapply(strsplit(usaGIS.df$VARNAME_1,"\\|"), "[" , 1)) ) %>%
  select(long, lat, order, hole, piece, id, group, location)

# clean up specific differences using factor levels and ifelse()
levels(usaGIS.df$location) <- ifelse(levels(usaGIS.df$location) == "Commonwealth of Kentucky", "KY",
  ifelse(levels(usaGIS.df$location) == "Commonwealth of Massachusetts", "MA",
  ifelse(levels(usaGIS.df$location) == "Commonwealth of Pennsylvania", "PA",
  ifelse(levels(usaGIS.df$location) == "State of Rhode Island and Providence Plantations", "RI",
  levels(usaGIS.df$location)))))

# note that factor levels are still not identical, due to their ordering
identical(levels(NHTS@location$location), levels(usaGIS.df$location))

# sort factor levels in order to match NHTS
usaGIS.df$location <- factor(usaGIS.df$location, levels = sort(levels(usaGIS.df$location)))

# perform check once more
identical(levels(NHTS@location$location), levels(usaGIS.df$location))

From there, we store the final formatted GIS dataframe as a local .rds file.

saveRDS(usaGIS.df, file = "./data/USAGIS.rds")

As a quick example, a user may use the nifty HOT function plotMap to quickly visualize a given HOT metric onto a map. See help text for details.

We filter our dataframe down to the mainalnd lower 48 states, so as to simplify the resultant map (previously included Alaska, Hawaii, and various Pacific islands).

We suppress the R output here so as to reduce the size of this markdown document.

usaGIS_filtered.df <- usaGIS.df %>% dplyr::filter( !(location %in% c("AK", "HI")) )

plotMap(NHTS, usaGIS_filtered.df, metric = "duration", mode = "cycle")