University of Missouri Columbia Exploring Confidence Intervals Questions

Question Description

Lab 5

Your Name Here
Date Here

library(tidyverse)

## -- Attaching packages --------------------------------------------
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ------------------------------------------------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(lubridate)

##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
##     date, intersect, setdiff, union

Question 1

You just need to load in the data and do some pre-processing. Notice that there are 25 different data files, so we will need to do something a little different here. NOTE: you NEED to make sure the lab_5_data folder is in the same directory as your R Markdown (or R) script. We create a vector called files that stores the file path for each file, and then use the lapply() function to read in every element of the vector. As a general outline, we need to

Load in all the data files (this has been done for you),
Rename all column names to remove spaces and backslashes,
Change the Date, Start_Time, and End_Time columns to be time referenced (think back to a previous lab),
Change the Position_Name column to reduce the number of positions to be Midfielder, Striker, Goal Keeper, Defender, and Wing (from previous lab).
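The renaming, time-parsing, and position-collapsing steps in this outline can be tried on a small made-up table first. This is only a sketch under assumptions: the column names, the month/day/year date format, the HH:MM:SS time format, and the position labels below are all hypothetical, so check them against the real files before reusing any of it.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical toy data; the real column names and formats may differ.
toy <- tibble(
  a = c("10/01/2020", "10/02/2020"),
  b = c("15:30:00", "16:00:00"),
  c = c("17:00:00", "17:45:00"),
  d = c("Center Midfielder", "Left Wing"),
  e = c(1.2, 0.9)
)
names(toy) <- c("Date", "Start Time", "End Time", "Position Name", "Work\\Rest")

clean <- toy %>%
  # Replace spaces in every column name with underscores
  rename_all(~ str_replace_all(., " ", "_")) %>%
  # Replace literal backslashes in every column name with underscores
  rename_all(~ str_replace_all(., fixed("\\"), "_")) %>%
  # Assumed formats: month/day/year dates and hour:minute:second times
  mutate(Date = mdy(Date),
         Start_Time = hms(Start_Time),
         End_Time = hms(End_Time)) %>%
  # Collapse detailed position labels into broader groups (assumed labels)
  mutate(Position_Name = case_when(
    str_detect(Position_Name, "Midfield") ~ "Midfielder",
    str_detect(Position_Name, "Wing") ~ "Wing",
    TRUE ~ Position_Name
  ))

names(clean)
```

Here hms() stores the times as lubridate periods; if a full date-time is preferred, pasting the date and time together before parsing is another option.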

files = list.files('lab_5_data', pattern = "*.csv", full.names = TRUE)

dat = lapply(files, read_csv, col_types = cols()) %>%
  bind_rows() %>%
  rename_all() %>% #### FILL IN
  rename_all() %>% #### FILL IN
  mutate(Date = , #### FILL IN
         Start_Time = , #### FILL IN
         End_Time = ) %>% #### FILL IN
  mutate(Position_Name = ) #### FILL IN
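To see what the list.files() and lapply() step does on its own, here is a minimal sketch that writes two tiny CSV files to a temporary folder and stacks them. The folder name, file names, and columns are made up for illustration and are not part of the lab data.

```r
library(readr)
library(dplyr)

# Hypothetical folder and files, created only for this illustration.
demo_dir <- file.path(tempdir(), "demo_lab_data")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(player = "A", load = 1.5), file.path(demo_dir, "game_1.csv"))
write_csv(tibble(player = "B", load = 2.5), file.path(demo_dir, "game_2.csv"))

# One file path per CSV in the folder
demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames, then stack them row-wise
demo_dat <- lapply(demo_files, read_csv, col_types = cols()) %>%
  bind_rows()

nrow(demo_dat)
```

bind_rows() matches columns by name, so the stacking works as long as the files share the same column names.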

Answer

Question 2

Here we are going to start exploring confidence intervals (sort of). Looking at the four metrics Player_Load_Per_Minute, Meterage_Per_Minute, Maximum_Velocity, and Total_Distance, we want to plot the 95% confidence interval by position over time. Note that in the R chunk statement, I have included a couple of extra arguments; please do not delete these, as they are only there to size your final plot. For the data construction step, our general outline is

1. Select the appropriate columns,
2. Filter out NAs
3. Pivot from wide to long format
4. Find the mean, lower CI, and upper CI grouping by position, date, and metric.
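The four data construction steps can be sketched on simulated data as follows. The values are randomly generated and only two of the four metrics are used, so this is an illustration of the pattern rather than the lab solution. Note that the bounds here are the empirical 2.5% and 97.5% quantiles of the observed values, which is the "sort of" confidence interval this lab works with.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Simulated toy data with one missing value
toy <- tibble(
  Date = rep(as.Date(c("2020-10-01", "2020-10-08")), each = 6),
  Position_Name = rep(c("Defender", "Striker"), times = 6),
  Maximum_Velocity = runif(12, 5, 9),
  Total_Distance = c(runif(11, 3000, 9000), NA)
)

toy_ci <- toy %>%
  # 1. Select the appropriate columns
  select(Date, Position_Name, Maximum_Velocity, Total_Distance) %>%
  # 2. Filter out rows containing NAs
  filter(complete.cases(.)) %>%
  # 3. Pivot from wide to long: one row per (observation, metric)
  pivot_longer(c(Maximum_Velocity, Total_Distance),
               names_to = "Metric", values_to = "Values") %>%
  # 4. Summarise by position, date, and metric
  group_by(Position_Name, Date, Metric) %>%
  summarise(mean = mean(Values),
            lower = quantile(Values, probs = 0.025, names = FALSE),
            upper = quantile(Values, probs = 0.975, names = FALSE),
            .groups = "drop")

toy_ci
```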

Then, to plot this, our general outline is

Choose the x variable,
Choose the y variable,
Choose if you want to color and/or fill the lines/bounds by a variable (probably should do this),
Make a line,
Use geom_ribbon() to create the CI, where you pass in what the lower bound should be and what the upper bound should be,
Make the plot pretty (e.g., proper labels, perhaps a legend is not needed, etc.).
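As a rough illustration of the plotting outline, the sketch below draws a line with a shaded band from a small hand-typed summary table. The numbers are invented purely so the code runs, and the label choices are one possibility among many.

```r
library(ggplot2)

# Made-up summary values, for illustration only
toy_ci <- data.frame(
  Date = rep(as.Date(c("2020-10-01", "2020-10-08", "2020-10-15")), times = 2),
  Position_Name = rep(c("Defender", "Striker"), each = 3),
  Metric = "Maximum_Velocity",
  mean = c(7.0, 7.2, 6.9, 8.1, 8.0, 8.3),
  lower = c(6.5, 6.8, 6.2, 7.6, 7.4, 7.9),
  upper = c(7.6, 7.7, 7.5, 8.7, 8.5, 8.8)
)

p <- ggplot(toy_ci, aes(x = Date, y = mean,
                        color = Position_Name, fill = Position_Name)) +
  geom_line() +                                               # the mean over time
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) + # the shaded band
  facet_wrap(~ Metric, scales = "free_y") +                   # one panel per metric
  scale_x_date(date_labels = "%b %d") +
  labs(x = "Date", y = "Value", color = "Position", fill = "Position")

p
```

Mapping both color and fill to the same variable keeps each line and its band in matching colors and merges them into a single legend.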

Answer

dat_ci = %>% #### FILL IN
  select(Date, Position_Name, Period_Name, ) %>% #### FILL IN
  filter(complete.cases(.)) %>% # This removes any NAs that are in the data
  filter(Period_Name != 'Session') %>% # This is discussed on Q3
  pivot_longer(, names_to = 'Metric', values_to = 'Values') %>% #### FILL IN
  group_by() %>% #### FILL IN
  summarise_at(vars(Values), list(mean = mean, # This creates 3 new summary variables: the mean, lower CI, and upper CI
                                  lower = ~ quantile(., probs = 0.025),
                                  upper = ~ quantile(., probs = 0.975))) %>%
  ungroup()

ggplot(, aes(x = , y = , color = , fill = )) + #### FILL IN
  geom_line() + # creates a line plot
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) + # here are your confidence bounds
  facet_wrap(~ , scales = 'free_y', nrow = 5) + #### FILL IN
  scale_x_date() #### FILL IN

Question 3

Part a)

For this question, instead of filling in/writing your own code, you will be analyzing what I did. The “Question 3” code chunk below has four different comments, each associated with a number. You need to answer the comment that is associated with the same numbered bullet point.

Why do we filter out all rows where the Period_Name is Session?
What do these four lines of code do?
What do these four lines of code do?
What is the effect of having the pivot_longer statement before the summarize_at statement? What would happen if they were switched?

Answer

1.
2.
3.
4.

Part b)

Below are four figures, Figures 1, 2, 3, and 4, that are created using the constructed data from part a. NOTE: you will need to load in the data from Question 1 for the figures to show up. For the four figures below, answer the following:

Is it better to represent the data as percent of time spent in each band, or would it have been better to not transform the data and plot the raw values (i.e., the values contained in the original data)? Explain.
Are the four figures comparable? Explain.
Are the figures meaningful, and if so, what conclusions can you draw from them?
Is it better to have all of the y-axes on the same scale (within figures and/or across figures), or should the y-axis be specific to each subplot?
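When thinking about the y-axis question, it may help to see the two options side by side. The sketch below uses made-up numbers on very different scales and builds the same faceted plot twice, once with a shared y-axis and once with a y-axis per panel. It does not use the lab data or reproduce Figures 1 to 4.

```r
library(ggplot2)

# Made-up values: one panel with small numbers, one with large numbers
toy <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1, 2, 3, 100, 250, 300),
  panel = rep(c("small values", "large values"), each = 3)
)

# Shared y-axis: panels are directly comparable, but small values get flattened
p_fixed <- ggplot(toy, aes(x, y)) +
  geom_line() +
  facet_wrap(~ panel, scales = "fixed")

# Per-panel y-axis: each trend is visible, but heights are not comparable
p_free <- ggplot(toy, aes(x, y)) +
  geom_line() +
  facet_wrap(~ panel, scales = "free_y")

p_fixed
p_free
```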


