Data Preview and Metrics Summary

Panels: Data Preview; Activity Metrics Overview; Sleep Metrics Overview

Travel VS at Home

Panels: Activity Level when Travel; Steps When Travel; Calorie Burned When Travel; Heart Rate When Travel

Workdays VS Weekends 1

Panels: Average Activity metrics by Day; Average Sleep Time by Day; Average Floors by Day

Workdays VS Weekends 2

Panels: Correlation between Steps, Distance and Calories Burned; Correlation between Heart Rate and Calories Burned

Write up

Project Topics

The fitness wearable market has undoubtedly grown over the past few years, with the emergence of many cutting-edge fitness trackers. As a result, people rely on these wearables to track their daily activities, quantify their lives, and ultimately live healthier.

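The purpose of this project is to use the data collected by the fitness tracker to take a data-driven approach to the following questions using data visualization:

* A summary of the main activity metrics, such as daily steps, distance and calories burned, and their correlations
* A summary of the main sleep metrics and how they trend over time
* The impact of travelling on all metrics
* The impact of weekends on all metrics
* The correlation between heart rate and calories burned, and the impact of weekends on the coefficients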

Data Acquisition and Manipulation

I have been using a Fitbit Charge 2 for about four months, so the raw data is easy to obtain from Fitbit online. The time frame I use, as shown in the data table, runs from 2016-12-01 to 2017-02-12. To get intraday data for the period when I was travelling, I used an R library called fitbitScraper to scrape data from Fitbit.com. The result is intraday data at 15-minute intervals for the selected time frame, with the necessary metrics.
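The per-day download pattern can be sketched as follows. This is a minimal illustration, not the dashboard's scraping code: `fetch_day()` is a hypothetical stand-in that fabricates 15-minute step counts so the example runs without Fitbit credentials, and the date window is the travel period named in the write-up.

```r
# Hypothetical stand-in for a per-day intraday download:
# returns 96 rows, one per 15-minute interval (step counts are made up)
fetch_day <- function(day) {
  start <- as.POSIXct(paste(day, "00:00:00"), tz = "UTC")
  data.frame(
    time  = seq(start, by = "15 min", length.out = 96),
    steps = rep(c(0, 25, 110, 40), times = 24)
  )
}

# Travel window used in this project
dates <- seq(as.Date("2016-12-24"), as.Date("2017-01-04"), by = "day")

# Download each day separately, then stack the daily frames into one
df_list  <- lapply(dates, function(d) fetch_day(as.character(d)))
intraday <- do.call(rbind, df_list)

print(dim(intraday))
```

Fetching one day at a time and binding afterwards keeps each request small and makes it easy to retry a single failed day.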

The data manipulation is straightforward. I formatted the dates so that R can recognize them as dates. I also added two factor columns, one indicating whether the date falls on a weekend and one giving the day of the week, for further analysis.
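A minimal sketch of this preparation step in base R. The toy data frame and its values are hypothetical stand-ins for the Fitbit export, and the numeric weekday lookup is one possible way to build the two columns, chosen here because it does not depend on the locale's day names.

```r
# Hypothetical toy data standing in for the Fitbit export (values are made up)
toy <- data.frame(
  Date  = c("2016-12-01", "2016-12-02", "2016-12-03", "2016-12-04", "2016-12-05"),
  Steps = c(8200, 10400, 12900, 15100, 7600),
  stringsAsFactors = FALSE
)

# Parse the text dates so R treats them as Date objects
toy$Date <- as.Date(toy$Date, format = "%Y-%m-%d")

# POSIXlt stores the day of week as 0 (Sunday) to 6 (Saturday)
wday <- as.POSIXlt(toy$Date)$wday
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Factor column 1: day of week, with levels ordered Monday to Sunday
toy$Weekday <- factor(day_names[wday + 1],
                      levels = c(day_names[2:7], day_names[1]))

# Factor column 2: weekend or weekday
toy$Weekend <- factor(ifelse(wday %in% c(0, 6), "Weekend", "Weekday"),
                      levels = c("Weekday", "Weekend"))

print(toy)
```

Setting the factor levels explicitly keeps later plots in calendar order rather than alphabetical order.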

Visualization Methodology

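Packages used:

* Data import & manipulation: readr, lubridate, dplyr, fitbitScraper, chron, xts
* Dashboard: flexdashboard
* Static Visualization: ggplot2, ggthemes
* Interactive Visualization: plotly, DT, dygraphs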

To start with the methodology, I first provide an overview of the dataset using the DT library. Next, I wanted an overview of the main activity metrics (daily steps, distance and calories burned), so I chose a scatter plot with steps on the x axis, distance on the y axis, and a color scale for calories burned. To see how the sleep metrics relate to each other, I used plotly to plot a stacked line graph showing minutes in bed, minutes asleep and minutes awake for each day.

Next, I wanted to see whether travelling affects my active minutes, since I was always moving around during travel. To get more data points, I used the fitbitScraper package to get intraday data at 15-minute intervals from 2016-12-24 to 2017-01-04, when I was travelling. I drew a trend line for the minutes in each zone (sedentary, lightly active, fairly active and very active) and highlighted the period when I was travelling. I also plotted the main metrics (steps taken, calories burned and heart rate) for the travel period.
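One way to put a number on the travel effect, sketched in base R. The daily values below are hypothetical; only the travel window (2016-12-24 to 2017-01-04) comes from the project. The dashboard itself shows this comparison visually by shading the travel period on a time series.

```r
# Hypothetical daily minutes per activity zone (made-up values)
daily <- data.frame(
  Date = as.Date(c("2016-12-20", "2016-12-21", "2016-12-26",
                   "2016-12-27", "2017-01-03", "2017-01-10")),
  MinutesLightlyActive = c(180, 200, 320, 300, 340, 190),
  MinutesVeryActive    = c(30, 20, 10, 20, 0, 40)
)

# Flag the days that fall inside the travel window
travel_start <- as.Date("2016-12-24")
travel_end   <- as.Date("2017-01-04")
daily$Period <- ifelse(daily$Date >= travel_start & daily$Date <= travel_end,
                       "Travel", "Home")

# Mean minutes per zone for travel days versus days at home
zone_means <- aggregate(cbind(MinutesLightlyActive, MinutesVeryActive) ~ Period,
                        data = daily, FUN = mean)
print(zone_means)
```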

After that, I looked at how my metrics vary by day of the week, so I made four bar plots showing the average of each metric for a given day of the week. I also color-scaled them by value.
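The averaging behind such bar plots can be sketched in base R. The two weeks of step counts below are hypothetical, and `aggregate()` is used here for brevity rather than as a reproduction of the dashboard code.

```r
# Hypothetical two weeks of daily steps (made-up values); 2016-12-01 is a Thursday
daily <- data.frame(
  Date  = seq(as.Date("2016-12-01"), by = "day", length.out = 14),
  Steps = c(8000, 9000, 12000, 15000, 7000, 7500, 8500,
            8400, 9600, 11000, 16000, 7200, 7700, 8100)
)

# Day of week as a factor ordered Monday to Sunday
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
wday <- as.POSIXlt(daily$Date)$wday                 # 0 = Sunday ... 6 = Saturday
daily$Weekday <- factor(day_levels[ifelse(wday == 0, 7, wday)],
                        levels = day_levels)

# Average steps for each day of the week, returned in factor-level order
avg_steps <- aggregate(Steps ~ Weekday, data = daily, FUN = mean)
print(avg_steps)
```

Because `Weekday` is a factor with explicit levels, the result comes back in Monday-to-Sunday order and can be passed to a bar chart without re-sorting.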

Finally, I looked for correlations between some of the metrics. To study the correlations between steps, distance and calories burned, I used a 3D scatter plot, since there are three variables. I also separated weekends from weekdays to see how the positions of the points differ. To study the correlation between heart rate and calories burned, I took a random sample from the dataset that I scraped online, drew a scatter plot, and applied the geom_smooth function to see the correlation.
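A numeric companion to these plots, sketched in base R on hypothetical data. It swaps the smoother drawn by geom_smooth for explicit `lm()` fits so that a straight-line and a quadratic relationship can be compared by R-squared; all values below are made up for illustration.

```r
# Hypothetical daily metrics (made-up values)
daily <- data.frame(
  Steps    = c(4000, 6500, 8000, 10000, 12500, 15000),
  Distance = c(1.8, 3.0, 3.6, 4.6, 5.7, 6.9),
  Calories = c(2100, 2350, 2500, 2750, 2900, 3300)
)

# Pairwise Pearson correlations between the three metrics
cor_matrix <- cor(daily)
print(round(cor_matrix, 3))

# Hypothetical intraday readings where calories grow faster than linearly with bpm
intraday <- data.frame(bpm = c(60, 70, 80, 90, 100, 110, 120, 130))
noise <- c(0.05, -0.03, 0.02, -0.04, 0.03, -0.02, 0.04, -0.05)
intraday$caloriesBurned <- 0.5 + 0.002 * (intraday$bpm - 60)^2 + noise

# Compare a straight-line fit with a quadratic fit
fit_linear    <- lm(caloriesBurned ~ bpm, data = intraday)
fit_quadratic <- lm(caloriesBurned ~ poly(bpm, 2), data = intraday)
r2 <- c(linear    = summary(fit_linear)$r.squared,
        quadratic = summary(fit_quadratic)$r.squared)
print(round(r2, 3))
```

A clearly higher R-squared for the quadratic fit is one simple signal that the relationship is curved rather than linear.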

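Results and Conclusion

* There is a strong linear correlation between steps and distance travelled over a given period, while calories burned depend on activity intensity.
* Sleep time fluctuated between 400 and 600 minutes, and time asleep averaged about 6 to 8 hours.
* Travel affects lightly active minutes more than minutes in other zones, mainly because walking is the main activity while travelling and it falls in the lightly active zone. Workouts were also less frequent during travel.
* Sundays tend to see higher step counts and distances, while sleep time is least affected by day of the week. Average floors climbed tend to be higher on Thursdays.
* There is a strong linear correlation between steps, distance and calories burned.
* The correlation between heart rate and calories burned seems to be polynomial, and weekday versus weekend does not have a strong impact on the coefficients.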

---
title: "Miao You-ANLY512-50-2016-IV-Final Project"
output: 
  flexdashboard::flex_dashboard:
    vertical_layout: fill
    source_code: embed
---
Data Preview and Metrics Summary {data-orientation=columns}
======================

```{r setup, include=FALSE}
## Attach necessary libraries
library(flexdashboard)
library(lubridate)
library(plyr)
library(dplyr)
library(fitbitScraper)
library(plotly)
library(DT)
library(chron)
library(dygraphs)
library(xts)
library(reshape2)
library(gridExtra)
library(cowplot)
library(readr)
library(scales)
library(ggthemes)
library(rbokeh)
##Load downloaded data
#Directory1
#MyFitbit <- read_csv("U:/R Projects/Fitbit_Export.csv")
#Directory2
MyFitbit <- read_csv("~/Personal/HU/ANLY512-51 Data Visualization/Final Project/Fitbit_Export.csv")
##Hide pw
fileName <- "/Users/Leo/Personal/HU/ANLY512-51 Data Visualization/Final Project/Fitbit PW.txt"
#fileName<-"U:/R Projects/Fitbit PW.txt"
con <- file(fileName,open="r")
fbpw <- readLines(con)
close(con)
##Scrape intraday data online
cookie = login("youmiao2510@gmail.com", fbpw, rememberMe = TRUE)

##Prepare Data
## Label each date as weekend or weekday
MyFitbit$Weekend <- ifelse(chron::is.weekend(MyFitbit$Date), "Weekend", "Weekday")
## Day of week as a factor ordered Monday to Sunday (assumes English day names from %A)
MyFitbit$Weekday <- factor(format(MyFitbit$Date, "%A"),
                           levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                      "Friday", "Saturday", "Sunday"))



#MyFitbit$SleepHour<-round(MyFitbit$TimeinBed/60,2)
#MyFitbit$AwakeHour<-round(MyFitbit$MinutesAwake/60,2)
#MyFitbit$MinutesAsleep<-round(MyFitbit$MinutesAsleep/60,2)
SleepData<-as.xts(MyFitbit$MinutesAsleep,MyFitbit$Date)
MinSed<-as.xts(MyFitbit$MinutesSedentary,MyFitbit$Date)
MinLgt<-as.xts(MyFitbit$MinutesLightlyActive,MyFitbit$Date)
MinFar<-as.xts(MyFitbit$MinutesFairlyActive,MyFitbit$Date)
MinVry<-as.xts(MyFitbit$MinutesVeryActive,MyFitbit$Date)
attach(MyFitbit)
```

Column {data-width=650}
-----------------------------------------------------------------------

### Data Preview

```{r}
DT::datatable(MyFitbit, options = list(pageLength = 10), fillContainer = TRUE)

```

Column {data-width=350}
-----------------------------------------------------------------------

### Activity Metrics Overview

```{r}
## Steps vs. distance, colored by calories burned
plot_ly(data = MyFitbit, x = ~Steps, y = ~Distance, color = ~`Calories Burned`,
        type = "scatter", mode = "markers", marker = list(size = 15),
        hoverinfo = 'text',
        text = ~paste('Steps: ', Steps,
                      '<br>Distance: ', Distance,
                      '<br>Calories Burned: ', `Calories Burned`,
                      '<br>Date: ', Date)) %>%
  layout(xaxis = list(title = "Steps/K"),
         yaxis = list(title = "Distance/Mile"),
         title = "Activities Metrics")
```

### Sleep Metrics Overview

```{r}
## Time in bed and minutes awake bound a shaded band; minutes asleep is drawn as a line
plot_ly(MyFitbit, x = ~Date, y = ~TimeinBed, type = 'scatter', mode = 'lines',
        line = list(color = 'transparent'),
        showlegend = FALSE, name = 'Min in Bed') %>%
  add_trace(y = ~MinutesAwake, type = 'scatter', mode = 'lines',
            fill = 'tonexty', fillcolor = 'rgba(0,100,80,0.2)',
            line = list(color = 'transparent'),
            showlegend = FALSE, name = 'Min Awake') %>%
  add_trace(x = ~Date, y = ~MinutesAsleep, type = 'scatter', mode = 'lines',
            line = list(color = 'rgb(0,100,80)'),
            name = 'Min Asleep') %>%
  layout(title = "Sleep Time Analysis",
         paper_bgcolor = 'rgb(255,255,255)', plot_bgcolor = 'rgb(229,229,229)',
         xaxis = list(title = "Months",
                      gridcolor = 'rgb(255,255,255)',
                      showgrid = TRUE,
                      showline = FALSE,
                      showticklabels = TRUE,
                      tickcolor = 'rgb(127,127,127)',
                      ticks = 'outside',
                      zeroline = FALSE),
         yaxis = list(title = "Minutes",
                      gridcolor = 'rgb(255,255,255)',
                      showgrid = TRUE,
                      showline = FALSE,
                      showticklabels = TRUE,
                      tickcolor = 'rgb(127,127,127)',
                      ticks = 'outside',
                      zeroline = FALSE))
```

Travel VS at Home {data-orientation=rows}
===================

Row {data-height=350}
-----------------------------------------------------------------------

### Activity Level when Travel

```{r}
## Combine the four activity-zone series into one xts object
MinActivity <- cbind(MinSed, MinLgt, MinFar, MinVry)
names(MinActivity)[1] <- "Sed Min"
names(MinActivity)[2] <- "Lgt Act Min"
names(MinActivity)[3] <- "Far Act Min"
names(MinActivity)[4] <- "Very Act Min"
## One color per series; the shaded region marks the travel period
dygraph(MinActivity, main = "Activity Min in Different Zones") %>%
  dyOptions(fillGraph = TRUE, fillAlpha = 0.4,
            colors = RColorBrewer::brewer.pal(4, "Set2")) %>%
  dyRangeSelector(height = 40) %>%
  dyHighlight(highlightCircleSize = 5) %>%
  dyShading(from = "2016-12-24", to = "2017-01-04") %>%
  dyLegend(width = 400)
```

Row {data-height=650}
-----------------------------------------------------------------------

### Steps When Travel

```{r}
## Fetch intraday steps for each travel day and stack the daily frames
dates <- seq(as.Date("2016-12-24"), as.Date("2017-01-04"), by = "day")
df_list <- lapply(dates, function(x)
  get_intraday_data(cookie = cookie, what = "steps", date = as.character(x)))
df.Steps <- do.call(rbind, df_list)
ggplot(df.Steps) +
  geom_bar(aes(x = time, y = steps, fill = steps), stat = "identity") +
  theme(axis.ticks.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(colour = "gray", size = .1),
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 8),
        axis.text.x = element_text(angle = 30, vjust = 0.5, size = 8),
        axis.text.y = element_text(size = 8),
        axis.title = element_text(size = 10)) +
  scale_x_datetime(breaks = date_breaks("1 day")) +
  xlab("Date") +
  ylab("Steps")
```

### Calorie Burned When Travel

```{r}
## Fetch intraday calories burned for each travel day and stack the daily frames
dates <- seq(as.Date("2016-12-24"), as.Date("2017-01-04"), by = "day")
df_list <- lapply(dates, function(x)
  get_intraday_data(cookie = cookie, what = "calories-burned", date = as.character(x)))
df.Calories <- do.call(rbind, df_list)
colnames(df.Calories)[2] <- "Calories Burned"
ggplot(df.Calories) +
  geom_bar(aes(x = time, y = `Calories Burned`, fill = `Calories Burned`), stat = "identity") +
  theme(axis.ticks.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(colour = "gray", size = .1),
        legend.position = "right",
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 8),
        axis.text.x = element_text(angle = 30, vjust = 0.5, size = 8),
        axis.text.y = element_text(size = 8),
        axis.title = element_text(size = 10)) +
  scale_x_datetime(breaks = date_breaks("1 day")) +
  xlab("Date") +
  ylab("Calories Burned") +
  scale_fill_gradient(low = "orange", high = "green", guide = "colourbar")
```

### Heart Rate When Travel

```{r}
dates <-
  seq(as.Date("2016-12-24"), as.Date("2017-01-04"), by = "day")
## Fetch intraday heart rate for each travel day and drop empty readings
df_list <- lapply(dates, function(x)
  get_intraday_data(cookie = cookie, what = "heart-rate", date = as.character(x)))
df.HR <- do.call(rbind, df_list)
df.HR <- subset(df.HR, bpm > 0)
ggplot(data = df.HR, aes(x = time, y = bpm)) +
  geom_line(aes(color = bpm)) +
  theme(axis.ticks.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(colour = "gray", size = .1),
        legend.position = "right",
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 8),
        axis.text.x = element_text(angle = 30, vjust = 0.5, size = 8),
        axis.text.y = element_text(size = 8),
        axis.title = element_text(size = 10)) +
  scale_x_datetime(breaks = date_breaks("1 day")) +
  xlab("Date") +
  ylab("Heart Rate (bpm)") +
  scale_color_gradient(low = "green", high = "red", guide = "colourbar")
```

Workdays VS Weekends 1 {data-orientation=columns}
==================

Column {data-width=700}
-----------------------------

### Average Activity metrics by Day

```{r}
## Average of a metric by day of week, rounded to 2 decimals and
## returned in Monday-to-Sunday order (assumes English day names)
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
avg_by_day <- function(x) {
  avgs <- tapply(x, as.character(MyFitbit$Weekday), mean)
  round(as.numeric(avgs[day_levels]), 2)
}

## One row per day of week, one column per averaged metric
avgs.all <- data.frame(
  day               = factor(day_levels, levels = day_levels),
  Steps             = avg_by_day(MyFitbit$Steps),
  `Calories Burned` = avg_by_day(MyFitbit$`Calories Burned`),
  Distance          = avg_by_day(MyFitbit$Distance),
  `Time in Bed`     = avg_by_day(MyFitbit$TimeinBed),
  `Minutes Asleep`  = avg_by_day(MyFitbit$MinutesAsleep),
  `Minutes Awake`   = avg_by_day(MyFitbit$MinutesAwake),
  Floor             = avg_by_day(MyFitbit$Floors),
  check.names = FALSE
)

## Bar chart with a dropdown that switches the visible metric
plot_ly(avgs.all, x = ~day) %>%
  add_bars(y = ~Steps, name = "Steps") %>%
  add_bars(y = ~`Calories Burned`, name = "Calories Burned", visible = FALSE) %>%
  add_bars(y = ~Distance, name = "Distance", visible = FALSE) %>%
  add_bars(y = ~`Time in Bed`, name = "Time in Bed", visible = FALSE) %>%
  layout(
    title = "",
    updatemenus = list(
      list(
        buttons = list(
          list(method = "restyle",
               args = list("visible", list(TRUE, FALSE, FALSE, FALSE)),
               label = "Steps"),
          list(method = "restyle",
               args = list("visible", list(FALSE, TRUE, FALSE, FALSE)),
               label = "Calories Burned"),
          list(method = "restyle",
               args = list("visible", list(FALSE, FALSE, TRUE, FALSE)),
               label = "Distance"),
          list(method = "restyle",
               args = list("visible", list(FALSE, FALSE, FALSE, TRUE)),
               label = "Time in Bed")
        ))
    )
  )
```

Column {data-width=500}
-----------------------------

### Average Sleep Time by Day

```{r}
plot_ly(avgs.all, x = ~day, y = ~`Time in Bed`, type = 'bar', name = 'Time in Bed',
        marker = list(color = 'rgb(49,130,189)')) %>%
  add_trace(y = ~`Minutes Asleep`, name = 'Minutes Asleep',
            marker = list(color = 'rgb(204,204,204)')) %>%
  add_trace(y = ~`Minutes Awake`, name = 'Minutes Awake',
            marker = list(color = 'rgb(127,127,127)')) %>%
  layout(xaxis = list(title = "Weekday"),
         yaxis = list(title = "Minutes"),
         margin = list(b = 100),
         barmode = 'group',
         title = "Average Sleep time by Weekday")
```

### Average Floors by Day

```{r}
## Distribution of floors climbed for each day of the week
p <- ggplot(data = MyFitbit, aes(x = Weekday, y = Floors)) +
  geom_boxplot() +
  theme(axis.ticks.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(colour = "gray", size = .1),
        legend.position = "right",
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 8),
        axis.text.x = element_text(size = 8),
        axis.text.y = element_text(size = 8),
        axis.title = element_text(size = 10)) +
  labs(x = "")
ggplotly(p)
```

Workdays VS Weekends 2 {data-orientation=columns}
==================

Column {data-width=500}
-----------------------------------------------------------------------

### Correlation between Steps, Distance and Calories Burned

```{r}
## 3D scatter of the three activity metrics, colored by weekend/weekday
plot_ly(MyFitbit, x = ~Steps, y = ~Distance, z = ~`Calories Burned`,
        color = ~Weekend, colors = c('#BF382A', '#0C4B8E'),
        hoverinfo = 'text',
        text = ~paste('Steps: ', Steps,
                      '<br>Distance: ', Distance,
                      '<br>Calories Burned: ', `Calories Burned`,
                      '<br>Date: ', Date),
        mode = "markers", marker = list(size = 10)) %>%
  add_markers() %>%
  layout(title = "Correlation between steps, distance and calories burned",
         scene = list(xaxis = list(title = 'Steps'),
                      yaxis = list(title = 'Distance'),
                      zaxis = list(title = 'Calories Burned')))
```

Column {data-width=500}
-----------------------------------------------------------------------

### Correlation between Heart Rate and Calories Burned

```{r}
## Fetch intraday heart-rate data for the whole study period
dates.all <- seq(as.Date("2016-12-01"), as.Date("2017-02-12"), by = "day")
df_list.all <- lapply(dates.all, function(x)
  get_intraday_data(cookie = cookie, what = "heart-rate", date = as.character(x)))
df.HR.1 <- do.call(rbind, df_list.all)
df.HR.1$weekday <- format(df.HR.1$time, "%A")
## Derive the weekend flag from the weekday label above (assumes English day names)
df.HR.1$Weekend <- ifelse(df.HR.1$weekday %in% c("Saturday", "Sunday"),
                          "Weekend", "Weekday")
df.HR.1 <- dplyr::filter(df.HR.1, bpm > 0)
## Random sample of at most 5000 readings
df.HR.2 <- df.HR.1[sample(nrow(df.HR.1), min(5000, nrow(df.HR.1))), ]
p <- ggplot(data = df.HR.2, aes(bpm, caloriesBurned, color = Weekend)) +
  geom_point() +
  geom_smooth(span = 0.8) +
  xlab("bpm") +
  ylab("Calories Burned") +
  ggtitle("Correlation between bpm and Calories Burned") +
  theme(axis.ticks.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(colour = "gray", size = .1),
        legend.position = "right",
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 8),
        axis.text.x = element_text(size = 8),
        axis.text.y = element_text(size = 8),
        axis.title = element_text(size = 10))
ggplotly(p)
```

Write up
======================

**Project Topics**

The fitness wearable market has undoubtedly grown over the past few years, with the emergence of many cutting-edge fitness trackers. As a result, people rely on these wearables to track their daily activities, quantify their lives, and ultimately live healthier.

The purpose of this project is to use the data collected by the fitness tracker to take a data-driven approach to the following questions using data visualization:

* A summary of the main activity metrics, such as daily steps, distance and calories burned, and their correlations
* A summary of the main sleep metrics and how they trend over time
* The impact of travelling on all metrics
* The impact of weekends on all metrics
* The correlation between heart rate and calories burned, and the impact of weekends on the coefficients

**Data Acquisition and Manipulation**

I have been using a Fitbit Charge 2 for about four months, so the raw data is easy to obtain from Fitbit online. The time frame I use, as shown in the data table, runs from 2016-12-01 to 2017-02-12. To get intraday data for the period when I was travelling, I used an R library called fitbitScraper to scrape data from Fitbit.com. The result is intraday data at 15-minute intervals for the selected time frame, with the necessary metrics.

The data manipulation is straightforward. I formatted the dates so that R can recognize them as dates. I also added two factor columns, one indicating whether the date falls on a weekend and one giving the day of the week, for further analysis.

**Visualization Methodology**

Packages that I've used

* Data import & manipulation: readr, lubridate, dplyr, fitbitScraper, chron, xts
* Dashboard: flexdashboard
* Static Visualization: ggplot2, ggthemes
* Interactive Visualization: plotly, DT, dygraphs

To start with the methodology, I first provide an overview of the dataset using the DT library. Next, I wanted an overview of the main activity metrics (daily steps, distance and calories burned), so I chose a scatter plot with steps on the x axis, distance on the y axis, and a color scale for calories burned.
To see how the sleep metrics relate to each other, I used plotly to plot a stacked line graph showing minutes in bed, minutes asleep and minutes awake for each day.

Next, I wanted to see whether travelling affects my active minutes, since I was always moving around during travel. To get more data points, I used the fitbitScraper package to get intraday data at 15-minute intervals from 2016-12-24 to 2017-01-04, when I was travelling. I drew a trend line for the minutes in each zone (sedentary, lightly active, fairly active and very active) and highlighted the period when I was travelling. I also plotted the main metrics (steps taken, calories burned and heart rate) for the travel period.

After that, I looked at how my metrics vary by day of the week, so I made four bar plots showing the average of each metric for a given day of the week. I also color-scaled them by value.

Finally, I looked for correlations between some of the metrics. To study the correlations between steps, distance and calories burned, I used a 3D scatter plot, since there are three variables. I also separated weekends from weekdays to see how the positions of the points differ. To study the correlation between heart rate and calories burned, I took a random sample from the dataset that I scraped online, drew a scatter plot, and applied the geom_smooth function to see the correlation.

**Results and Conclusion**

* There is a strong linear correlation between steps and distance travelled over a given period, while calories burned depend on activity intensity.
* Sleep time fluctuated between 400 and 600 minutes, and time asleep averaged about 6 to 8 hours.
* Travel affects lightly active minutes more than minutes in other zones, mainly because walking is the main activity while travelling and it falls in the lightly active zone. Workouts were also less frequent during travel.
* Sundays tend to see higher step counts and distances, while sleep time is least affected by day of the week. Average floors climbed tend to be higher on Thursdays.
* There is a strong linear correlation between steps, distance and calories burned.
* The correlation between heart rate and calories burned seems to be polynomial, and weekday versus weekend does not have a strong impact on the coefficients.