MAT 243 Southern New Hampshire University NBA Basketball Team Worksheet
MAT 243 Project One Summary Report
[Full Name]
[SNHU Email]
Southern New Hampshire University
Note: Replace the bracketed text on page one (the cover page) with your personal information.
1. Introduction: Problem Statement
Discuss the statement of the problem in terms of the statistical analyses that are being performed. In
your response, you should address the following questions:
● What is the problem you are going to solve?
● What data set are you using?
● What statistical methods will you be using to do the analysis for this project?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
2. Introduction: Your Team and the Assigned Team
In this project, you picked a team and you were assigned a team to do comparative analysis.
See Steps 1 and 2 in the Python script to address the following items:
● What team did you pick and what years were picked to do the analysis?
● What team and range of years were you assigned for the comparative study? (Hint: This is called the assigned team in the Python script.) Present this information in a formatted table as shown below.
Table 1. Information on the Teams
 | 1. Yours | 2. Assigned
Name of Team | Team (e.g. Knicks) | Team (e.g. Bulls)
Assigned Years | XXXX-YYYY (e.g. 2013 – 2015) | XXXX-YYYY (e.g. 2013 – 2015)
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
3. Data Visualization: Points Scored by Your Team
In the Python script, you created a visualization for the distribution of points scored by your team.
See Step 3 in the Python script to address the following items in a paragraph response:
● In general, how is data visualization used to study data distributions and trends?
● In this activity, you were asked to pick one of the two plots that best describes the data distribution of the variable for your team. Include a screenshot of this plot in your report.
● Why did you pick this plot? Explain.
● What can you say about the distribution of the variable by visually inspecting this plot? What does this signify?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
4. Data Visualization: Points Scored by the Assigned Team
In the Python script, you created a visualization for the distribution of points scored by the assigned
team.
See Step 4 in the Python script to address the following items in a paragraph response:
● In this activity, you were asked to pick one of the two plots that best describes the data distribution of the variable for the assigned team. Include this plot in your report.
● Why did you pick this plot? Explain.
● What can you say about the distribution of the variable by visually inspecting this plot? What does this signify?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
5. Data Visualization: Comparing the Two Teams
In the Python script, you created a visualization for the difference in the distributions of points scored by
your team and the assigned team.
See Step 5 in the Python script to address the following items in a paragraph response:
● In general, how is data visualization used to compare two different data distributions?
● In this activity, you were asked to pick one of the two plots that best compares the data distributions of your team with the assigned team. Include a screenshot of this plot in your report.
● Why did you pick this plot? Explain.
● How do the two distributions compare to each other?
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
6. Descriptive Statistics: Points Scored By Your Team in Home Games
In the Python script, you calculated descriptive statistics on the points scored by your team in games
played at the home venue. These included the mean, median, variance, and standard deviation of the
points scored by your team in home games.
See Step 6 in the Python script to address the following items:
● Summarize all statistics in a formatted table as shown below. Use one row for each statistic. You will need to add rows to the table in order to include all of your statistics.
Table 2. Descriptive Statistics for Points Scored by Your Team in Home Games
Statistic Name | Value
Statistic (for example, Mean) | X.XX
*Round off to 2 decimal places.
● In general, how are the measures of central tendency and variability used to analyze a data distribution?
● Interpret each statistic in detail and explain what it represents in this scenario.
● Use the mean and the median to describe the distribution of points scored by your team in home games.
○ Describe the skew: Is it left, right, or bell-shaped?
○ Explain which measure of central tendency is best to use to represent the center of the distribution based on its skew.
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
7. Descriptive Statistics: Points Scored By Your Team in Away Games
In the Python script, you calculated descriptive statistics on the points scored by your team in games
played at the opponent’s venue (away). These included the mean, median, variance, and standard deviation
of the points scored by your team in away games.
See Step 7 in the Python script to address the following items:
● Summarize all statistics in a formatted table as shown below. Use one row for each statistic. You will need to add rows to the table in order to include all of your statistics.
Table 3. Descriptive Statistics for Points Scored by Your Team in Away Games
Statistic Name | Value
Statistic (for example, Mean) | X.XX
*Round off to 2 decimal places.
● Interpret each statistic in detail and explain what it represents in this scenario.
● Use the mean and the median to describe the distribution of points scored by your team in away games.
a. Describe the skew: Is it left, right, or bell-shaped?
b. Explain which measure of central tendency is best to use to represent the center of the distribution based on its skew.
● Is your team performing better in games played at home than those played away? Use the mean and the standard deviation to answer this question. What can be deduced by comparing the standard deviation of points scored in home games and points scored in away games?
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
8. Confidence Intervals for the Average Relative Skill of All Teams in Your Team’s Years
In the Python script, you calculated a 95% confidence interval for the average relative skill of all teams in
the league during the years of your team. Additionally, you calculated the probability that a given team
in the league has a relative skill level less than that of the team that you picked.
See Step 8 in the Python script to address the following items:
● Report the confidence interval in a formatted table as shown below.
Table 4. Confidence Interval for Average Relative Skill of Teams in Your Team’s Years
Confidence Level (%) | Confidence Interval
XX% (for example, 95%) | (X.XX, X.XX)
*Round off to 2 decimal places.
● Describe how confidence intervals are generally used in estimating the measures of central tendency for a population.
● Provide a detailed interpretation of the confidence interval in terms of the average relative skill of teams in the range of years that you picked.
● How would your interval be different if you had used a different confidence level?
● What is the probability that a given team in the league has a relative skill level less than that of the team that you picked? Is it unusual that a team has a skill level less than your team?
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
9. Confidence Intervals for the Average Relative Skill of All Teams in the Assigned Team’s Years
In the Python script, you calculated a 95% confidence interval for the average relative skill of all teams in
the league during the years of the assigned team. Additionally, you calculated the probability that a
given team in the league has a relative skill level less than that of the assigned team.
See Step 9 in the Python script to address the following items:
● Report the confidence interval in a formatted table as shown below.
Table 5. Confidence Interval for Average Relative Skill of Teams in Assigned Team’s Years
Confidence Level (%) | Confidence Interval
XX% (for example, 95%) | (X.XX, X.XX)
*Round off to 2 decimal places.
● Provide a detailed interpretation of the confidence interval in terms of the average relative skill of teams in the assigned team’s range of years.
● Discuss how your interval would be different if you had used a different confidence level.
● How does this confidence interval compare with the previous one? What does this signify in terms of the average relative skill of teams in the range of years that you picked versus the average relative skill of teams in the assigned team’s range of years?
Answer the questions in a paragraph response. Remove all questions and this note (but not the
table) before submitting! Do not include Python code in your report.
10. Conclusion
Describe the results of your statistical analyses clearly, using proper descriptions of statistical terms and
concepts.
● What is the practical importance of the analyses that were performed?
● Describe what these results mean for the scenario.
Answer the questions in a paragraph response. Remove all questions and this note before
submitting! Do not include Python code in your report.
11. Citations
You were not required to use external resources for this report. If you did not use any resources, you
should remove this entire section. However, if you did use any resources to help you with your
interpretation, you must cite them. Use proper APA format for citations.
Insert references here in the following format:
Author’s Last Name, First Initial. Middle Initial. (Year of Publication). Title of book: Subtitle of book,
edition. Place of Publication: Publisher.
Project One: Data Visualization, Descriptive Statistics, Confidence Intervals
This notebook contains the step-by-step directions for Project One. It is very important to run through
the steps in order. Some steps depend on the outputs of earlier steps. Once you have completed the
steps in this notebook, be sure to write your summary report.
You are a data analyst for a basketball team and have access to a large set of historical data that
you can use to analyze performance patterns. The coach of the team and your management have
requested that you use descriptive statistics and data visualization techniques to study distributions
of key performance metrics that are included in the data set. These data-driven analytics will help
make key decisions to improve the performance of the team. You will use the Python programming
language to perform the statistical analyses and then prepare a report of your findings to present to
the team’s management. Since the managers are not data analysts, you will need to interpret your
findings and describe their practical implications.
There are four important variables in the data set that you will study in Project One.
Variable | What does it represent?
pts | Points scored by the team in a game
elo_n | A measure of the relative skill level of the team in the league
year_id | Year when the team played the games
fran_id | Name of the NBA team
The ELO rating, represented by the variable elo_n, is used as a measure of the relative skill of a
team. This measure is inferred based on the final score of a game, the game location, and the
outcome of the game relative to the probability of that outcome. The higher the number, the higher
the relative skill of a team.
In addition to studying data on your own team, your management has assigned you a second team
so that you can compare its performance with your own team’s.
Team | What does it represent?
Your Team | This is the team that has hired you as an analyst. This is the team that you will pick below. See Step 2.
Assigned Team | This is the team that the management has assigned to you to compare against your team. See Step 1.
Reminder: It may be beneficial to review the summary report template for Project One prior to
starting this Python script. That will give you an idea of the questions you will need to answer with
the outputs of this script.
Step 1: Data Preparation & the Assigned Team
This step uploads the data set from a CSV file. It also selects the assigned team for this analysis. Do
not make any changes to the code block below.
1. The assigned team is the Chicago Bulls from the years 1996-1998
Click the block of code below and hit the Run button above.
In [1]:
import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import display, HTML

nba_orig_df = pd.read_csv('nbaallelo.csv')
nba_orig_df = nba_orig_df[(nba_orig_df['lg_id']=='NBA') & (nba_orig_df['is_playoffs']==0)]
columns_to_keep = ['game_id','year_id','fran_id','pts','opp_pts','elo_n','opp_elo_n', 'game_location', 'game_result']
nba_orig_df = nba_orig_df[columns_to_keep]

# The dataframe for the assigned team is called assigned_team_df.
# The assigned team is the Chicago Bulls from 1996-1998.
assigned_years_league_df = nba_orig_df[(nba_orig_df['year_id'].between(1996, 1998))]
assigned_team_df = assigned_years_league_df[(assigned_years_league_df['fran_id']=='Bulls')]
assigned_team_df = assigned_team_df.reset_index(drop=True)

display(HTML(assigned_team_df.head().to_html()))
print("printed only the first five observations…")
print("Number of rows in the data set =", len(assigned_team_df))
[Output: table of the first five rows of assigned_team_df, with columns game_id, year_id, fran_id, pts, opp_pts, elo_n, opp_elo_n, game_location, game_result]
printed only the first five observations…
Number of rows in the data set = 246
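The filtering in Step 1 relies on boolean masks and on `Series.between`, which includes both endpoints by default. The following is a minimal sketch on a small made-up table; the rows and the names `toy_df`, `in_range`, and `bulls` are invented for illustration and are not taken from nbaallelo.csv.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "year_id": [1995, 1996, 1997, 1998, 1999, 1997],
    "fran_id": ["Bulls", "Bulls", "Knicks", "Bulls", "Bulls", "Bulls"],
    "pts": [99, 105, 98, 110, 91, 102],
})

# between() keeps both endpoints, so rows from 1996 and 1998 are included.
in_range = toy_df[toy_df["year_id"].between(1996, 1998)]

# A second mask narrows the league-wide rows down to a single franchise.
bulls = in_range[in_range["fran_id"] == "Bulls"].reset_index(drop=True)

print("Rows in year range:", len(in_range))
print("Rows for one team in that range:", len(bulls))
```

The same two-stage pattern appears in the notebook: first a league-wide dataframe for the range of years, then a team dataframe derived from it.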
Step 2: Pick Your Team
In this step, you will pick your team. The range of years that you will study for your team is 2013-2015. Make the following edits to the code block below:
1. Replace ??TEAM?? with your choice of team from one of the following team names.
*Bucks, Bulls, Cavaliers, Celtics, Clippers, Grizzlies, Hawks, Heat, Jazz, Kings, Knicks,
Lakers, Magic, Mavericks, Nets, Nuggets, Pacers, Pelicans, Pistons, Raptors, Rockets,
Sixers, Spurs, Suns, Thunder, Timberwolves, Trailblazers, Warriors, Wizards*
Remember to enter the team name within single quotes. For example, if you picked the
Suns, then ??TEAM?? should be replaced with ‘Suns’.
After you are done with your edits, click the block of code below and hit the Run button above.
In [2]:
# Range of years: 2013-2015 (Note: The line below selects ALL teams within the three-year period 2013-2015. This is not your team's dataframe.)
your_years_leagues_df = nba_orig_df[(nba_orig_df['year_id'].between(2013, 2015))]

# The dataframe for your team is called your_team_df.
# ---- TODO: make your edits here ----
your_team_df = your_years_leagues_df[(your_years_leagues_df['fran_id']=='Lakers')]
your_team_df = your_team_df.reset_index(drop=True)

display(HTML(your_team_df.head().to_html()))
print("printed only the first five observations…")
print("Number of rows in the data set =", len(your_team_df))
[Output: table of the first five rows of your_team_df, with columns game_id, year_id, fran_id, pts, opp_pts, elo_n, opp_elo_n, game_location, game_result]
printed only the first five observations…
Number of rows in the data set = 246
Step 3: Data Visualization: Points Scored by Your Team
The coach has requested that you provide a visual that shows the distribution of points scored by
your team in the years 2013-2015. The code below provides two possible options. Pick ONE of
these two plots to include in your summary report. Choose the plot that you think provides the best
visual for the distribution of points scored by your team. In your summary report, you must explain
why you think your visual is the best choice.
Click the block of code below and hit the Run button above.
NOTE: If the plots are not created, click the code section and hit the Run button again.
In [3]:
import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(your_team_df['pts'], bins=20)
plt.title('Histogram of points scored by Your Team in 2013 to 2015', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()
print("")

# Scatterplot
plt.title('Scatterplot of points scored by Your Team in 2013 to 2015', fontsize=18)
# x and y are passed by keyword; recent seaborn versions no longer accept them positionally.
sns.regplot(x=your_team_df['year_id'], y=your_team_df['pts'], ci=None)
plt.show()
Step 4: Data Visualization: Points Scored by the Assigned Team
The coach has also requested that you provide a visual that shows a distribution of points scored by
the Bulls from years 1996-1998. The code below provides two possible options. Pick ONE of these
two plots to include in your summary report. Choose the plot that you think provides the best visual
for the distribution of points scored by the assigned team. In your summary report, you will explain why you
think your visual is the best choice.
Click the block of code below and hit the Run button above.
NOTE: If the plots are not created, click the code section and hit the Run button again.
In [4]:
import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], bins=20)
plt.title('Histogram of points scored by the Bulls in 1996 to 1998', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()

# Scatterplot
plt.title('Scatterplot of points scored by the Bulls in 1996 to 1998', fontsize=18)
# x and y are passed by keyword; recent seaborn versions no longer accept them positionally.
sns.regplot(x=assigned_team_df['year_id'], y=assigned_team_df['pts'], ci=None)
plt.show()
Step 5: Data Visualization: Comparing the Two Teams
Now the coach wants you to prepare one plot that provides a visual of the differences in the
distribution of points scored by the assigned team and your team. The code below provides two
possible visuals. Choose the plot that allows for the best comparison of the data distributions.
Click the block of code below and hit the Run button above.
NOTE: If the plots are not created, click the code section and hit the Run button again.
In [5]:
import seaborn as sns

# Side-by-side boxplots
both_teams_df = pd.concat((assigned_team_df, your_team_df))
plt.title('Boxplot to compare points distribution', fontsize=18)
sns.boxplot(x='fran_id', y='pts', data=both_teams_df)
plt.show()
print("")

# Histograms
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], 20, alpha=0.5, label='Assigned Team')
plt.hist(your_team_df['pts'], 20, alpha=0.5, label='Your Team')
plt.title('Histogram to compare points distribution', fontsize=18)
plt.xlabel('Points')
plt.legend(loc='upper right')
plt.show()
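When reading the side-by-side boxplots, it can help to see the numbers a box summarizes: the box spans the first to third quartile, with a line at the median. The sketch below computes those quartiles for two synthetic samples. The helper `five_number_summary` and the arrays `team_a` and `team_b` are hypothetical names, and the data are randomly generated rather than taken from the project data set.

```python
import numpy as np

def five_number_summary(values):
    """Return the minimum, quartiles, and maximum of a sequence as a dict."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        "min": float(np.min(values)),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "max": float(np.max(values)),
    }

# Synthetic points data for two hypothetical teams (not real NBA scores).
rng = np.random.default_rng(0)
team_a = rng.normal(loc=102, scale=11, size=246)
team_b = rng.normal(loc=100, scale=12, size=246)

summary_a = five_number_summary(team_a)
summary_b = five_number_summary(team_b)

# The interquartile range (Q3 - Q1) is the height of each box.
print("Team A median:", round(summary_a["median"], 2),
      "IQR:", round(summary_a["q3"] - summary_a["q1"], 2))
print("Team B median:", round(summary_b["median"], 2),
      "IQR:", round(summary_b["q3"] - summary_b["q1"], 2))
```

Comparing the medians shows which distribution sits higher, and comparing the interquartile ranges shows which one is more spread out.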
Step 6: Descriptive Statistics: Relative Skill of Your Team
The management of your team wants you to run descriptive statistics on the relative skill of your
team from 2013-2015. In this project, you will use the variable ‘elo_n’ to represent the relative skill
of the teams. Calculate descriptive statistics including the mean, median, variance, and standard
deviation for the relative skill of your team. Make the following edits to the code block below:
1. Replace ??MEAN_FUNCTION?? with the name of the Python function that calculates the
mean.
2. Replace ??MEDIAN_FUNCTION?? with the name of the Python function that calculates the
median.
3. Replace ??VAR_FUNCTION?? with the name of the Python function that calculates the
variance.
4. Replace ??STD_FUNCTION?? with the name of the Python function that calculates the
standard deviation.
After you are done with your edits, click the block of code below and hit the Run button above.
In [7]:
print("Your Team's Relative Skill in 2013 to 2015")
print("-------------------------------------------------------")

# ---- TODO: make your edits here ----
mean = your_team_df['elo_n'].mean()
median = your_team_df['elo_n'].median()
variance = your_team_df['elo_n'].var()
stdeviation = your_team_df['elo_n'].std()

print('Mean =', round(mean,2))
print('Median =', round(median,2))
print('Variance =', round(variance,2))
print('Standard Deviation =', round(stdeviation,2))
Your Team's Relative Skill in 2013 to 2015
-------------------------------------------------------
Mean = 1440.49
Median = 1412.34
Variance = 6337.75
Standard Deviation = 79.61
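Two relationships are useful when interpreting output like this: the standard deviation is the square root of the variance (79.61 squared is about 6337.75), and a mean above the median (1440.49 versus 1412.34) suggests a right-skewed distribution. The sketch below illustrates both on a small made-up series; the values in `ratings` are invented and are not from the data set.

```python
import pandas as pd

# Hypothetical ratings, invented for illustration; one high value pulls the mean up.
ratings = pd.Series([1400.0, 1410.0, 1415.0, 1420.0, 1600.0])

mean = ratings.mean()
median = ratings.median()
variance = ratings.var()   # sample variance (pandas uses ddof=1 by default)
stdev = ratings.std()      # sample standard deviation (also ddof=1)

print("Mean =", round(mean, 2))
print("Median =", round(median, 2))
print("Variance =", round(variance, 2))
print("Standard Deviation =", round(stdev, 2))

# The standard deviation squared recovers the variance.
print("stdev squared matches variance:", abs(stdev ** 2 - variance) < 1e-6)

# A mean noticeably above the median points to a right skew.
print("Mean above median (suggests right skew):", mean > median)
```

When the mean and median differ this way, the median is usually the more representative measure of center, because it is less affected by the extreme values.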
Step 7: Descriptive Statistics: Relative Skill of the Assigned Team
The management also wants you to run descriptive statistics for the relative skill of the Bulls from
1996-1998. Calculate descriptive statistics including the mean, median, variance, and standard
deviation for the relative skill of the assigned team.
You are to write this code block yourself.
Use Step 6 to help you write this code block. Here is some information that will help you write this
code block.
1. The dataframe for the assigned team is called assigned_team_df.
2. The variable ‘elo_n’ represents the relative skill of the teams.
3. Your statistics should be rounded to two decimal places.
Write your code in the code block section below. After you are done, click this block of code and hit
the Run button above. Reach out to your instructor if you need more help with this step.
In [9]:
# Write your code in this code block.
print("Assigned Team's Relative Skill in 1996 to 1998")
print("-------------------------------------------------------")

# ---- TODO: make your edits here ----
mean = assigned_team_df['elo_n'].mean()
median = assigned_team_df['elo_n'].median()
variance = assigned_team_df['elo_n'].var()
stdeviation = assigned_team_df['elo_n'].std()

print('Mean =', round(mean,2))
print('Median =', round(median,2))
print('Variance =', round(variance,2))
print('Standard Deviation =', round(stdeviation,2))
Assigned Team's Relative Skill in 1996 to 1998
-------------------------------------------------------
Mean = 1739.8
Median = 1751.23
Variance = 2651.55
Standard Deviation = 51.49
Step 8: Confidence Intervals for the Average Relative Skill of All Teams in Your Team’s Years
The management wants you to calculate a 95% confidence interval for the average relative skill of
all teams in 2013-2015. To construct a confidence interval, you will need the mean and standard
error of the relative skill level in these years. The code block below calculates the mean and the
standard deviation. Your edits will calculate the standard error and the confidence interval. Make the
following edits to the code block below:
1. Replace ??SD_VARIABLE?? with the variable name representing the standard deviation
of relative skill of all teams from your years. (Hint: the standard deviation variable is in the
code block below)
2. Replace ??CL?? with the confidence level of the confidence interval.
3. Replace ??MEAN_VARIABLE?? with the variable name representing the mean relative
skill of all teams from your years. (Hint: the mean variable is in the code block below)
4. Replace ??SE_VARIABLE?? with the variable name representing the standard
error. (Hint: the standard error variable is in the code block below)
The management also wants you to calculate the probability that a team in the league has a relative
skill level less than that of the team that you picked. Assuming that the relative skill of teams is
Normally distributed, Python methods for a Normal distribution can be used to answer this question.
The code block below uses two of these Python methods. Your task is to identify the correct Python
method and report the probability.
After you are done with your edits, click the block of code below and hit the Run button above.
In [10]:
print("Confidence Interval for Average Relative Skill in the years 2013 to 2015")
print("-----------------------------------------------------------------------------------------------------------")

# Mean relative skill of all teams from the years 2013-2015
mean = your_years_leagues_df['elo_n'].mean()

# Standard deviation of the relative skill of all teams from the years 2013-2015
stdev = your_years_leagues_df['elo_n'].std()
n = len(your_years_leagues_df)

# Confidence interval
# ---- TODO: make your edits here ----
stderr = stdev/(n ** 0.5)
conf_int_95 = st.norm.interval(0.95, mean, stderr)

print("95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 =", conf_int_95)
print("95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (", round(conf_int_95[0], 2),",", round(conf_int_95[1], 2),")")

print("\n")
print("Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015")
print("--------------------------------------------------------------------------------------------------------------------------------------------------------")

mean_elo_your_team = your_team_df['elo_n'].mean()

choice1 = st.norm.sf(mean_elo_your_team, mean, stdev)
choice2 = st.norm.cdf(mean_elo_your_team, mean, stdev)

# Pick the correct answer.
print("Which of the two choices is correct?")
print("Choice 1 =", round(choice1,4))
print("Choice 2 =", round(choice2,4))
Confidence Interval for Average Relative Skill in the years 2013 to 2015
-----------------------------------------------------------------------------------------------------------
95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (1502.0236894390478, 1507.1824625533618)
95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = ( 1502.02 , 1507.18 )

Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015
--------------------------------------------------------------------------------------------------------------------------------------------------------
Which of the two choices is correct?
Choice 1 = 0.7147
Choice 2 = 0.2853
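To see what the interval calculation does, the sketch below rebuilds a 95% interval by hand as the mean plus or minus a critical value times the standard error, checks it against `st.norm.interval`, and then shows how the width changes with the confidence level. The data are synthetic (drawn from a Normal distribution with an assumed mean of 1500 and standard deviation of 100), and the names `sample`, `manual`, `library`, and `widths` are illustrative only.

```python
import numpy as np
import scipy.stats as st

# Synthetic ratings standing in for a league-wide elo_n column (not real data).
rng = np.random.default_rng(1)
sample = rng.normal(loc=1500, scale=100, size=1000)

mean = sample.mean()
stdev = sample.std(ddof=1)              # sample standard deviation
stderr = stdev / np.sqrt(len(sample))   # standard error of the mean

# By hand: the 97.5th percentile of the standard Normal leaves 2.5% in each tail.
z = st.norm.ppf(0.975)
manual = (mean - z * stderr, mean + z * stderr)

# The library call gives the same interval.
library = st.norm.interval(0.95, loc=mean, scale=stderr)
print("Manual :", manual)
print("Library:", library)

# A higher confidence level produces a wider interval.
widths = {}
for level in (0.90, 0.95, 0.99):
    low, high = st.norm.interval(level, loc=mean, scale=stderr)
    widths[level] = high - low
    print(f"{int(level * 100)}% interval width = {widths[level]:.2f}")
```

The standard error shrinks as the sample size grows, which is why an interval built from thousands of games can be only a few rating points wide.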
Step 9: Confidence Intervals for the Average Relative Skill of All Teams in the Assigned Team’s Years
The management also wants you to calculate a 95% confidence interval for the average relative
skill of all teams in the years 1996-1998. Calculate this confidence interval.
You are to write this code block yourself.
Use Step 8 to help you write this code block. Here is some information that will help you write this
code block. Reach out to your instructor if you need help.
1. The dataframe for the years 1996-1998 is called assigned_years_league_df
2. The variable ‘elo_n’ represents the relative skill of teams.
3. Start by calculating the mean and the standard deviation of relative skill (ELO) in years
1996-1998.
4. Calculate n that represents the sample size.
5. Calculate the standard error which is equal to the standard deviation of Relative Skill
(ELO) divided by the square root of the sample size n.
6. Assuming that the population standard deviation is known, use Python methods for the
Normal distribution to calculate the confidence interval.
7. Your statistics should be rounded to two decimal places.
The management also wants you to calculate the probability that a team had a relative skill level less
than the Bulls in years 1996-1998. Assuming that the relative skill of teams is Normally distributed,
calculate this probability.
You are to write this code block yourself.
Use Step 8 to help you write this code block. Here is some information that will help you write this
code block.
1. Calculate the mean relative skill of the Bulls. Note that the dataframe for the Bulls is called
assigned_team_df. The variable ‘elo_n’ represents the relative skill.
2. Use Python methods for a Normal distribution to calculate this probability.
3. The probability value should be rounded to four decimal places.
Write your code in the code block section below. After you are done, click this block of code and hit
the Run button above. Reach out to your instructor if you need more help with this step.
In [11]:
# Write your code in this code block section
print("Confidence Interval for Average Relative Skill in the years 1996 to 1998")
print("-----------------------------------------------------------------------------------------------------------")

# Mean relative skill of all teams from the years 1996-1998
mean = assigned_years_league_df['elo_n'].mean()

# Standard deviation of the relative skill of all teams from the years 1996-1998
stdev = assigned_years_league_df['elo_n'].std()
n = len(assigned_years_league_df)

# Confidence interval
# ---- TODO: make your edits here ----
stderr = stdev/(n ** 0.5)
conf_int_95 = st.norm.interval(0.95, mean, stderr)

print("95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 1996 to 1998 =", conf_int_95)
print("95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = (", round(conf_int_95[0], 2),",", round(conf_int_95[1], 2),")")

print("\n")
print("Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of the assigned team in the years 1996 to 1998")
print("--------------------------------------------------------------------------------------------------------------------------------------------------------")

mean_elo_assigned_team = assigned_team_df['elo_n'].mean()

choice1 = st.norm.sf(mean_elo_assigned_team, mean, stdev)
choice2 = st.norm.cdf(mean_elo_assigned_team, mean, stdev)

# Pick the correct answer.
print("Which of the two choices is correct?")
print("Choice 1 =", round(choice1,4))
print("Choice 2 =", round(choice2,4))
Confidence Interval for Average Relative Skill in the years 1996 to 1998
-----------------------------------------------------------------------------------------------------------
95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = (1487.6565859527095, 1493.6465501840999)
95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = ( 1487.66 , 1493.65 )

Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of the assigned team in the years 1996 to 1998
--------------------------------------------------------------------------------------------------------------------------------------------------------
Which of the two choices is correct?
Choice 1 = 0.0268
Choice 2 = 0.9732
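The two choices printed above are complements of each other: `st.norm.cdf` returns the probability of a value at or below the given point, and `st.norm.sf` (the survival function) returns the probability of a value above it. The sketch below shows this with assumed league parameters; the numbers `league_mean`, `league_sd`, and `team_rating` are hypothetical and chosen only to make the result easy to check.

```python
import scipy.stats as st

# Hypothetical league parameters and team rating (not taken from the data set).
league_mean, league_sd = 1500.0, 100.0
team_rating = 1600.0   # one standard deviation above the league mean

# cdf: probability that a team's rating is less than team_rating.
p_below = st.norm.cdf(team_rating, league_mean, league_sd)

# sf: probability that a team's rating is greater than team_rating (1 - cdf).
p_above = st.norm.sf(team_rating, league_mean, league_sd)

print("P(rating < team_rating) =", round(p_below, 4))
print("P(rating > team_rating) =", round(p_above, 4))
print("Sum =", round(p_below + p_above, 4))
```

A quick sanity check follows from this: a team rated above the league mean must have a "less than" probability above 0.5, and a team rated below the league mean must have one below 0.5.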
End of Project One
Download the HTML output and submit it with your summary report for Project One. The HTML
output can be downloaded by clicking File, then Download as, then HTML. Do not include the
Python code within your summary report.