University of Missouri Columbia Methods in Sport Analytics Statistics Questions
Lab 4
Your Name Here
Date Here
library(tidyverse)
## -- Attaching packages --------------------------------
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts -----------------------------------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
##     date, intersect, setdiff, union
Question 1
You need to load the two data sets, GPS_Month_deidentified.csv and Wellness_Month_deidentified.csv, and do a little preprocessing. The column names of both data sets contain spaces and forward slashes that we don't want in our column names. Additionally, each data set has a column denoting the date on which the observation occurs, but these columns are not recognized as dates. We want to fix these two issues. Below, you need to fill in the code where asked.
gps = read_csv() #### FILL IN
well = read_csv() #### FILL IN

well = %>% #### FILL IN
  rename_all( ~str_replace_all() ) %>% #### FILL IN
  rename_all( ~str_replace_all() ) %>% #### FILL IN
  mutate(Timestamp = mdy(str_split(, " ", simplify = T)[,1])) #### FILL IN, note the mdy function makes ...

gps = %>% #### FILL IN
  rename_all( ~str_replace_all() ) %>% #### FILL IN
  rename_all( ~str_replace_all() ) %>% #### FILL IN
  mutate(Date = mdy()) #### FILL IN
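
To illustrate the kind of preprocessing this question describes, here is a minimal sketch on a made-up tibble. The column names `Time Stamp` and `Load/Min`, the values, and the choice of replacement strings ("_" and "_per_") are all assumptions for illustration; they are not taken from the lab's CSV files, and this is not the lab's answer.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical toy data: names and values are invented for illustration.
toy = tibble(
  `Time Stamp` = c("9/1/2020 8:15", "9/2/2020 7:50"),
  `Load/Min` = c(5.2, 6.1)
)

toy_clean = toy %>%
  # replace every space in the column names with an underscore
  rename_all(~str_replace_all(.x, " ", "_")) %>%
  # replace every forward slash in the column names (assumed replacement text)
  rename_all(~str_replace_all(.x, "/", "_per_")) %>%
  # keep the text before the first space, then parse it as month/day/year
  mutate(Time_Stamp = mdy(str_split(Time_Stamp, " ", simplify = TRUE)[, 1]))

names(toy_clean)
class(toy_clean$Time_Stamp)
```

Here `str_split(..., simplify = TRUE)` returns a character matrix, so `[, 1]` picks out the date portion of each entry before `mdy()` converts it to a `Date`.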
Answer
Question 2
We want to look at the wellness data and determine how the averages of all of the wellness metrics are changing over time by position. As a general outline,
1. Select the appropriate columns (or remove the unneeded column(s), whichever is easier),
2. Group by time and position,
3. Summarize the variables,
4. Pivot to long format,
5. Plot the data.
However, there are two obvious ways to visualize the data. The first possibility is to color each line by the wellness metric and create a different plot for each position. The second is to color each line by the position and create a different plot for each wellness metric. You will need to create both of these plots, and then give a short explanation, no more than 1 paragraph, as to why one plot is better at visualizing the data than the other (or perhaps you need both to properly visualize the data).
well_plt = %>% #### FILL IN
  select() %>% #### FILL IN
  group_by() %>% #### FILL IN
  summarise_at() %>% #### FILL IN
  ungroup() %>%
  pivot_longer(, values_to = "Rating", names_to = "Metric") #### FILL IN

ggplot(well_plt, aes(x = , y = , color = )) + #### FILL IN
  geom_line(size = 1) +
  scale_x_date() + #### FILL IN - you can find this in a previous lab!
  facet_wrap() + #### FILL IN
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10),
        axis.text.y = element_text(size = 10),
        legend.position = "bottom",
        legend.title = element_blank())

ggplot(well_plt, aes(x = , y = , color = )) + #### FILL IN
  geom_line(size = 1) +
  scale_x_date() + #### FILL IN - you can find this in a previous lab!
  facet_wrap() + #### FILL IN
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10),
        axis.text.y = element_text(size = 10),
        legend.position = "bottom",
        legend.title = element_blank())
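
The group, summarize, pivot, and plot outline for this question can be sketched on a small invented data set. Everything in `toy_well` (the `Sleep` and `Soreness` columns, the positions, the ratings) is hypothetical and only mirrors the general shape the question describes; the real wellness columns may differ.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy wellness data: two dates, two positions, two players each.
toy_well = tibble(
  Date = rep(as.Date(c("2020-09-01", "2020-09-02")), each = 4),
  Position = rep(c("Guard", "Guard", "Forward", "Forward"), times = 2),
  Sleep = c(3, 5, 4, 4, 2, 4, 5, 3),
  Soreness = c(2, 2, 3, 5, 4, 4, 1, 3)
)

toy_long = toy_well %>%
  group_by(Date, Position) %>%                       # one group per date and position
  summarise_at(vars(Sleep, Soreness), mean) %>%      # average each metric within a group
  ungroup() %>%
  pivot_longer(c(Sleep, Soreness),                   # one row per date/position/metric
               names_to = "Metric", values_to = "Rating")

# Layout A: one panel per position, lines colored by metric
p_by_position = ggplot(toy_long, aes(x = Date, y = Rating, color = Metric)) +
  geom_line() +
  facet_wrap(~Position)

# Layout B: one panel per metric, lines colored by position
p_by_metric = ggplot(toy_long, aes(x = Date, y = Rating, color = Position)) +
  geom_line() +
  facet_wrap(~Metric)
```

The same long-format table feeds both layouts; only the `color` aesthetic and the faceting variable swap roles.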
Answer
Question 3
For this question, we want to plot PlayerLoad_per_Min., intensity, IMALeft, and IMARIght by position by date from the GPS data, adding a vertical line that indicates if a game was played on that day. However, if you look at the GPS data set, you will notice that PlayerLoad_per_Min. has entries #DIV/0! that correspond to no training that day, so we will need to handle these! As a general outline, we will
- Select the 4 metrics along with Date, Position, and Type,
- Filter out the rows corresponding to the unwanted PlayerLoad_per_Min. entries,
- Change Type to an indicator column where 1 means game day and 0 means not game day,
- Change PlayerLoad_per_Min. to numeric (it was originally a character column because of the #DIV/0! entries),
- Group by date and position,
- Summarize the variables of interest,
- Change to long format,
- Plot the data.
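
As a rough sketch of the cleaning steps in this outline, the example below runs them on an invented tibble with a single metric. Only the names `PlayerLoad_per_Min.`, `Date`, `Position`, and `Type`, the value "Games", and the "#DIV/0!" marker come from the lab text; the rows are made up. Grouping by `Type` in addition to date and position is an assumption made here so that the game-day indicator survives the summary step; averaging `Type` alongside the metrics would be another option.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy GPS data: all values are invented for illustration.
toy_gps = tibble(
  Date = as.Date(c("2020-09-01", "2020-09-01", "2020-09-02", "2020-09-02")),
  Position = c("Guard", "Guard", "Guard", "Guard"),
  Type = c("Practice", "Practice", "Games", "Games"),
  PlayerLoad_per_Min. = c("4.0", "#DIV/0!", "6.0", "8.0")
)

toy_gps_long = toy_gps %>%
  # drop the spreadsheet error entries before converting to numeric
  filter(PlayerLoad_per_Min. != "#DIV/0!") %>%
  # 1 = game day, 0 = not a game day
  mutate(Type = ifelse(Type == "Games", 1, 0)) %>%
  mutate(PlayerLoad_per_Min. = as.numeric(PlayerLoad_per_Min.)) %>%
  # assumption: Type is kept as a grouping column so it is not lost
  group_by(Date, Position, Type) %>%
  summarise_at(vars(PlayerLoad_per_Min.), mean) %>%
  ungroup() %>%
  pivot_longer(PlayerLoad_per_Min., names_to = "Metric", values_to = "Value")
```

Filtering before `as.numeric()` avoids the coercion warning and the `NA` values that "#DIV/0!" would otherwise produce.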
As with the previous problem, there are two ways to visualize the data. One is coloring the lines by position and creating individual plots for the metrics, and the other is coloring the lines by metrics and creating individual plots for the positions. You will need to create both of these plots, and then give a short explanation, no more than 1 paragraph, as to why one plot is better at visualizing the data than the other (or perhaps you need both to properly visualize the data). Additionally, why might this type of figure be useful for coaches and trainers?
gps_plt = %>% #### FILL IN
  select() %>% #### FILL IN
  filter() %>% #### FILL IN
  mutate(Type = ifelse(Type == "Games", 1, 0)) %>%
  mutate() %>% #### FILL IN
  group_by() %>% #### FILL IN
  summarise_at() %>% #### FILL IN
  ungroup() %>%
  pivot_longer(, names_to = "Metric", values_to = "Value") #### FILL IN

ggplot() + #### FILL IN
  geom_line(size = 1) +
  geom_vline(aes(xintercept = Date), data = gps_plt %>% filter(Type == 1), alpha = 0.75) +
  scale_x_date() + #### FILL IN - you can find this in a previous lab!
  facet_wrap(, scales = "free_y") + #### FILL IN
  xlab("") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10),
        axis.text.y = element_text(size = 10),
        legend.position = "bottom",
        legend.title = element_blank())

ggplot() + #### FILL IN
  geom_line(size = 1) +
  geom_vline(aes(xintercept = Date), data = gps_plt %>% filter(Type == 1), alpha = 0.75) +
  scale_x_date() + #### FILL IN - you can find this in a previous lab!
  facet_wrap(, scales = "free_y") + #### FILL IN
  xlab("") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10),
        axis.text.y = element_text(size = 10),
        legend.position = "bottom",
        legend.title = element_blank())
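
To show how the game-day lines combine with the faceted line plot, here is a small sketch on invented long-format data. The table `toy_plt`, its values, and the `scale_x_date()` arguments are assumptions for illustration; the lab's own axis settings come from a previous lab and are not reproduced here.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format data in the shape the outline produces.
toy_plt = tibble(
  Date = rep(as.Date("2020-09-01") + 0:3, times = 2),
  Position = rep(c("Guard", "Forward"), each = 4),
  Type = rep(c(0, 0, 1, 0), times = 2),   # the third day is a game day
  Metric = "PlayerLoad_per_Min.",
  Value = c(4, 5, 8, 3, 5, 6, 9, 4)
)

# One row per game day, so each vertical line is drawn once
game_days = toy_plt %>% filter(Type == 1) %>% distinct(Date)

p = ggplot(toy_plt, aes(x = Date, y = Value, color = Position)) +
  geom_line() +
  # geom_vline does not inherit the plot aesthetics, so game_days only needs Date
  geom_vline(aes(xintercept = Date), data = game_days, alpha = 0.75) +
  scale_x_date(date_breaks = "1 day", date_labels = "%b %d") +  # assumed settings
  facet_wrap(~Metric, scales = "free_y") +
  xlab("")
```

Because `game_days` has no `Metric` column, the vertical line is repeated in every facet, which is the behavior wanted when a game day applies to all metrics.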
Answer