# Quantitative Methods for Business Probability Distribution Regression Questions

Measures of central tendency and spread

Measures of central tendency:

Mean: Average of all observations (sum of all observations divided by the number of observations)

Mode: the most frequent observation

Median: the “middle” observation. 50% of all observations have a value below the median and 50% have a value above it. With an even number of observations, the median is the average of the middle two observations.

When the median differs from the mean, the distribution is skewed (skewness).

Measures of spread:

Range: the difference between the largest and the smallest observation

Variance: typical squared distance from the mean: $s^2 = \frac{1}{n}\sum (x - \bar{x})^2$

Standard deviation: square root of the variance: $s = \sqrt{s^2}$

The interquartile range: the difference between the third quartile and the first quartile.
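As a quick illustration of the measures above, here is a minimal sketch using Python's standard `statistics` module. The data set is hypothetical (it does not come from the notes), the variance divides by n, and quartile conventions differ between textbooks, so the interquartile range shown is only one common convention.

```python
import statistics

# Hypothetical observations, chosen only for illustration
observations = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(observations)        # sum divided by number of observations
median = statistics.median(observations)    # middle observation
mode = statistics.mode(observations)        # most frequent observation
value_range = max(observations) - min(observations)

# Population versions: divide by n, matching the 1/n convention used in these notes
variance = statistics.pvariance(observations)
std_dev = statistics.pstdev(observations)

# Quartiles with the module's default ("exclusive") method
q1, q2, q3 = statistics.quantiles(observations, n=4)
iqr = q3 - q1

print(mean, median, mode, value_range, variance, std_dev, iqr)
```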
Probability theory
1. If all outcomes are equally likely (as with a fair die, where each side is equally likely), we compute the probability of an event by the formula:
Number of outcomes in which the event occurs / Number of possible outcomes
Typical question: Suppose you toss two fair (i.e., unbiased) six-faced dice. What is the probability of getting the number 6 on both dice?
Solution: There are 6*6 = 36 possible outcomes: 6 for the first die times 6 for the second die. How many outcomes are there in which the event (getting 6 on both) occurs? Only one: the outcome in which we get 6 on the first die and 6 on the second die. Using the formula, the probability is 1/36.
Typical question: A card is drawn randomly from a deck of 52 cards. A) What is the probability that it is red? B)
What is the probability that it has the value 10? C) What is the probability that it is a red card with the value 10?
Solution: A) There are 52 cards. Half of the cards are red: the probability that it is red is thus 26/52 = 1/2. B) There are four cards with the value 10: the probability that a card has the value 10 is thus 4/52 = 1/13. C) There are only two red cards with the value 10, so the probability is 2/52 = 1/26.
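The counting formula above can be checked by listing every equally likely outcome and counting those in which the event occurs. The sketch below does this for the two questions; the suit and value labels are my own naming, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: all 36 equally likely (first, second) outcomes
dice_outcomes = list(product(range(1, 7), repeat=2))
p_both_six = Fraction(sum(1 for o in dice_outcomes if o == (6, 6)), len(dice_outcomes))

# A standard 52-card deck as (value, suit) pairs; labels are illustrative
values = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}
deck = list(product(values, suits))

def probability(event):
    """Number of cards for which event is true / number of cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_red = probability(lambda card: card[1] in red_suits)
p_ten = probability(lambda card: card[0] == "10")
p_red_ten = probability(lambda card: card[0] == "10" and card[1] in red_suits)

print(p_both_six, p_red, p_ten, p_red_ten)
```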
2. What is Probability of A and B?
For example, what is the probability of getting the number 4 on the first die and the number 5 on the second die?
If events are independent (they don’t influence each other; for example, the outcome of one die is not influenced by the outcome of the other die) then
P(A and B) = P(A)*P(B)
Typical question: There are 5 green books and 8 red books. Eric picks one book at random and then replaces it. Anna picks another book at random and then replaces it. A) What is the probability that they both pick green books? B) What is the probability that Eric picks a green book and Anna picks a red book?
Solution: A) The probability of picking a green book is 5/(5+8) = 5/13. The probability that Eric and Anna both pick a green book is P(Green)*P(Green) = (5/13)*(5/13) = 0.1479. B) The probability of picking a red book is 8/(5+8) = 8/13. The probability that Eric picks a green book and Anna picks a red book is (5/13)*(8/13) = 0.2367.
3. What is Probability of A or B?
It is P(A or B) = P(A) + P(B) - P(A and B). We subtract P(A and B) because outcomes in which both A and B occur would otherwise be counted twice.
Example: what is the probability of getting a ‘King’ OR a ‘red card’?
• It is Pr(red) + Pr(King) - Pr(red AND King)
• = (26/52)+(4/52)-(2/52) = 28/52
Typical question: 22% of all firms in an industry have bought insurance. 15% have bought insurance and have a
risk consultant. 30% have a risk consultant. What proportion has insurance or a risk consultant?
Solution: P(I or R) = P(I)+P(R)-P(I and R) = 0.22+0.30- 0.15 = 0.52 – 0.15 = 0.37
Typical question: Suppose we flip a coin and roll an unbiased six-faced die. What is the probability of getting a head on the coin OR the number 6 on the die?
Solution: Use the formula: P(head) = 0.5, P(6) = 1/6. The coin and the die are independent, so P(head and 6) = 0.5*(1/6). So, we get P(head or 6) = 0.5 + (1/6) - 0.5*(1/6) = (6/12)+(2/12)-(1/12) = 7/12 = 0.58333.
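The three "or" examples above can be reproduced with exact fractions. This is only a small sketch; the helper name `p_or` is hypothetical.

```python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# King or red card
p_king_or_red = p_or(Fraction(4, 52), Fraction(26, 52), Fraction(2, 52))

# Insurance or risk consultant
p_insurance_or_consultant = p_or(Fraction(22, 100), Fraction(30, 100), Fraction(15, 100))

# Head or six: independent events, so P(A and B) = P(A) * P(B)
p_head, p_six = Fraction(1, 2), Fraction(1, 6)
p_head_or_six = p_or(p_head, p_six, p_head * p_six)

print(p_king_or_red, p_insurance_or_consultant, p_head_or_six)
```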
4. Conditional Probability
The probability of A given that we know that B happened is written P(A | B).
Definition: P(A | B) = P(A and B) / P(B)
Because P(A and B) = P(B | A)P(A), this can also be written as Bayes' rule: P(A | B) = P(B | A)P(A) / P(B)
Typical question: At the Warwick Foods factory, cookies are tested for crispness. 80% of the crispy cookies pass the test but 10% of the not crispy cookies also pass the test. 80% of all cookies are crispy. What is the probability that a cookie is crispy given that it passed the test?
Solution: P(crispy | pass the test) = P(pass the test and crispy) / P(pass the test).
How to compute P(pass the test and crispy): 80% of the cookies are crispy. Out of these, 80% pass the test. Thus, 0.8*0.8 = 0.64 (64%) of the cookies are crispy and pass the test.
How to compute P(pass the test): 64% of cookies are crispy and pass the test. Some not crispy cookies also pass the test. What proportion are not crispy and pass the test? 20% are not crispy and 10% of these pass the test: 0.2*0.1 = 0.02. Overall, 0.8*0.8+0.2*0.1 = 0.64+0.02 = 0.66 pass the test.
Overall: P(crispy | pass the test) = 0.64 / 0.66 = 0.969697
Typical question: It is determined that 25 of every 100 marketing students and 600 of every 1,000 finance
students wear glasses. In a room there are 400 finance students and 100 marketing students. What is the
conditional probability that a student in this room without glasses is a finance student?
i.e., calculate Prob(Finance student | No glasses).
Solution: P(Finance student | No glasses) = P(Finance student and No glasses ) / P(No glasses).
How to compute P(Finance student and No glasses ). 400/500 = 80% of the students in the room are finance
students. 60% of these have glasses and 40% do not have glasses. Thus, P(Finance student and No glasses ) =
0.8*0.4 = 0.32.
How to compute P(No glasses): 32% are finance students and don’t have glasses. 20% are marketing students in
the room and 75% of them do not have glasses, i.e., P(marketing student and No glasses ) = 0.2*0.75 = 0.15. Thus,
P(No glasses) = 0.8*0.4+0.2*0.75 = 0.47.
Overall: P(Finance student | No glasses) = 0.8*0.4 / (0.8*0.4+0.2*0.75) = 0.32 / 0.47 = 0.681
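Both conditional-probability solutions above follow the same pattern: compute the joint probability, compute the total probability of the condition, and divide. A minimal sketch of that pattern follows; the function and argument names are my own.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) = P(A and B) / P(B), with P(B) summed over A and not-A."""
    joint = prior_a * p_b_given_a                          # P(A and B)
    evidence = joint + (1 - prior_a) * p_b_given_not_a     # P(B)
    return joint / evidence

# Cookies: A = crispy, B = passes the test
p_crispy_given_pass = posterior(0.8, 0.8, 0.1)

# Students: A = finance student, B = no glasses
p_finance_given_no_glasses = posterior(0.8, 0.4, 0.75)

print(round(p_crispy_given_pass, 4), round(p_finance_given_no_glasses, 4))
```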
Binomial Distribution
Suppose that on each trial either a success or a failure occurs. If the
probability of a success is p, what is the probability that we get exactly k
successes in n trials?
We calculate this using the Binomial distribution.
We can use the formula or the table.
The table lists the probability that we get r or more successes in n trials, when the probability of a success is p.
For example, the probability that we get 4 or more successes in 8 trials when p = 0.35 is 0.2935.
Typical question: The probability of a success is 0.35. In 9 trials, what is the probability of getting between 3 and 6 successes (including both 3 and 6)?
Solution: We need the probability of 3, 4, 5 or 6 successes. We first look up P(3 or more), then P(7 or more), and subtract:
Prob(between 3 and 6) = P(3 or more) - P(7 or more).
From the table, P(3 or more successes) = 0.6627 and P(7 or more successes) = 0.0112.
Prob(between 3 and 6) = P(3 or more) - P(7 or more) = 0.6627 - 0.0112 = 0.6515
The probability of getting exactly k successes in n trials, when the probability of a success is p, is

$$\binom{n}{k} p^k (1-p)^{n-k}, \qquad \text{where } \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

Here n! = n*(n-1)*(n-2)*...*2*1. For example, 5! = 5*4*3*2*1. Note that 0! is defined as 1.
Typical question: The probability of a success is 0.35. In 9 trials, what is the probability of getting exactly 4 successes?
Solution: We use the formula with n = 9, k = 4 and p = 0.35:

$$\binom{n}{k} p^k (1-p)^{n-k} = \frac{9!}{4!\,(9-4)!}\, 0.35^4 (1-0.35)^{9-4} = \frac{9!}{4!\,5!}\, 0.35^4 (0.65)^5$$

$$= \frac{9 \cdot 8 \cdot 7 \cdot 6}{4 \cdot 3 \cdot 2 \cdot 1}\, 0.35^4 (0.65)^5 = \frac{3024}{24}\, 0.35^4 (0.65)^5 = 126 \cdot 0.35^4 (0.65)^5 = 0.219386$$
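The binomial formula and the "r or more" table values can be cross-checked with a few lines of code. The sketch below uses `math.comb` from the standard library; the helper names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_at_least(r, n, p):
    """Probability of r or more successes in n trials (what the table lists)."""
    return sum(binom_pmf(k, n, p) for k in range(r, n + 1))

exactly_4 = binom_pmf(4, 9, 0.35)                 # about 0.2194
four_or_more_of_8 = binom_at_least(4, 8, 0.35)    # about 0.2936
between_3_and_6 = binom_at_least(3, 9, 0.35) - binom_at_least(7, 9, 0.35)  # about 0.6515

print(round(exactly_4, 6), round(four_or_more_of_8, 4), round(between_3_and_6, 4))
```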
The Normal Distribution
The normal distribution is very useful because averages tend to be normally distributed. (Figure: distribution of the average restaurant review on Yelp; horizontal axis: Average, from 3.2 to 3.8; vertical axis: Probability.)
If X is normally distributed, then the probability that X is larger or smaller than some number c only depends on the mean and the standard deviation.
Typical question: X is a normally distributed variable with average = 50 and standard deviation = 10. What is the
probability that x is above 55?
Solution: We standardize: (55-mean)/stdev = (55-50)/10 = 5/10 = 0.5.
We look up P(z>0.5) in the table.
We get P(z>0.5)=0.3085
Typical question: X is a normally distributed variable with average = 50 and standard deviation = 10. What is the
probability that x is below 45?
Solution:
We standardize: (45-mean)/stdev = (45-50)/10 = -5/10 = -0.5.
Because of symmetry, P(z < -0.5) = P(z > 0.5) = 0.3085.
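The two table look-ups above can be reproduced with `statistics.NormalDist` from the Python standard library. This is only a sketch of the same standardize-and-look-up steps, not a replacement for the table used in the course.

```python
from statistics import NormalDist

x = NormalDist(mu=50, sigma=10)    # X with average 50 and standard deviation 10
standard_normal = NormalDist()     # mean 0, standard deviation 1

# Standardize, then look up the upper tail, as in the worked solution
z = (55 - 50) / 10
p_above_55 = 1 - standard_normal.cdf(z)

# The same probability directly from X, and the symmetric lower tail
p_above_55_direct = 1 - x.cdf(55)
p_below_45 = x.cdf(45)

print(round(p_above_55, 4), round(p_below_45, 4))
```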
Normal Distribution Table (table not reproduced here)
Confidence interval
Idea behind: Suppose you take a sample of 50 restaurants and calculate the mean review score in the sample. The sample mean is, say, 3.67. You know that the population mean is probably not exactly 3.67. How much could it differ? A confidence interval is one way of quantifying this. You say: intervals computed this way will contain the population mean 95% of the time, so the population mean plausibly lies between the lower limit and the upper limit. This is a 95% confidence interval.
How to calculate it for a Sample mean:
The upper limit = sample mean + z*STEM
The lower limit = sample mean – z*STEM
Here z is the critical value:
z = 2 for an approximate 95% confidence interval, z = 1.96 for an exact 95% confidence interval
z = 3 for an approximate 99% confidence interval, z = 2.58 for an exact 99% confidence interval
STEM = standard deviation / sqrt(n), where n is sample size.
How to calculate it for a Proportion:
q = observed proportion in the sample.
The upper limit = q+ z*STEP
The lower limit = q- z*STEP
Here z is the critical value
STEP = $\sqrt{q(1-q)/n}$, where n is the sample size.
Typical question: An auditor of a small business has sampled 100 accounts. The sample mean is £435, and the
sample standard deviation is £86. Find an approximate 95% confidence interval for the average amount of all
accounts.
Solution: Sample size n = 100, sample mean m = 435, sample standard deviation s = 86, the confidence level is
approximate 95%, and the critical value z = 2.
STEM = s/sqrt(n) = 86 / sqrt(100) = 8.6.
A rough 95% confidence interval for the average amount of all accounts is
[mean-z*STEM, mean+z*STEM] = [435 – 2*8.6, 435 + 2*8.6] = [417.8, 452.2].
Typical question: A random sample of 100 preschool children in Coventry revealed that only 80 had been
vaccinated. Provide an approximate 95% confidence interval for the proportion vaccinated in Coventry.
Solution: Sample size n = 100, sample proportion q = 80/100 = 0.8, the confidence level is approximate 95%, and the critical value z = 2. We calculate STEP:

$$\text{STEP} = \sqrt{\frac{0.8 \times 0.2}{100}} = \sqrt{\frac{0.16}{100}} = \frac{0.4}{10} = 0.04$$

Upper limit = 0.8+2*0.04 = 0.88. Lower limit = 0.8-2*0.04 = 0.72.
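The two confidence-interval recipes (for a mean and for a proportion) can be written as small functions. The sketch below reproduces the auditor and vaccination examples; the function names are my own.

```python
from math import sqrt

def mean_confidence_interval(sample_mean, stdev, n, z):
    """[mean - z*STEM, mean + z*STEM] with STEM = stdev / sqrt(n)."""
    stem = stdev / sqrt(n)
    return sample_mean - z * stem, sample_mean + z * stem

def proportion_confidence_interval(q, n, z):
    """[q - z*STEP, q + z*STEP] with STEP = sqrt(q*(1-q)/n)."""
    step = sqrt(q * (1 - q) / n)
    return q - z * step, q + z * step

# Auditor example: n = 100, mean 435, standard deviation 86, approximate 95% (z = 2)
accounts_low, accounts_high = mean_confidence_interval(435, 86, 100, z=2)

# Vaccination example: 80 of 100 children vaccinated, approximate 95% (z = 2)
vacc_low, vacc_high = proportion_confidence_interval(80 / 100, 100, z=2)

print(round(accounts_low, 1), round(accounts_high, 1), round(vacc_low, 2), round(vacc_high, 2))
```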
Typical question: A University administrator wants to survey students about their evaluation of a new program.
The administrator believes that evaluations have a standard deviation of 1.2. How many students do they need to
sample to ensure that the width of an exact 99% confidence interval (critical value 2.58) is at most 0.4? Your
answer should be an integer. Use at least 4 decimals in all your computations.
Solution: We use the formula $n = \left(\frac{2 \sigma z}{w}\right)^2$. σ is the standard deviation = 1.2. z is the critical value = 2.58. w is the width of the confidence interval = 0.4. The formula follows from setting the width 2*z*σ/sqrt(n) equal to w and solving for n. We insert the numbers into the formula:

$$n = \left(\frac{2 \times 1.2 \times 2.58}{0.4}\right)^2 = 15.48^2 = 239.63$$

The answer should be an integer. Is the answer 239 or 240? It is 240: with a sample size below 239.63 (such as 239), the confidence interval would be wider than 0.4, so we round up.
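The sample-size calculation, including the rounding-up step, can be sketched as follows. The function names are hypothetical; the second function simply re-computes the interval width to confirm why 239 is not enough.

```python
from math import ceil, sqrt

def required_sample_size(sigma, z, width):
    """Smallest integer n for which the interval width 2*z*sigma/sqrt(n) is at most `width`."""
    return ceil((2 * sigma * z / width) ** 2)

def interval_width(sigma, z, n):
    """Width of the confidence interval: upper limit minus lower limit."""
    return 2 * z * sigma / sqrt(n)

n_needed = required_sample_size(sigma=1.2, z=2.58, width=0.4)

print(n_needed, round(interval_width(1.2, 2.58, 239), 4), round(interval_width(1.2, 2.58, 240), 4))
```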
Hypothesis Testing
General idea: We assess whether a “null hypothesis” can explain the data by calculating:
What is the probability that we would observe the data we do observe if the null hypothesis was true?
Specifically, we compute the probability of observing a result at least as different from the null hypothesis as the result we actually observe.
For example: we observe an increase of 10 on average in sample. We ask: can such a change occur by chance even
if there is no change in the population? We calculate: what is the probability of observing a change of 10 or more
if there has not been a change in the population (i.e., the null hypothesis is true)?
How do we calculate these probabilities? We use the normal approximation: we rely on the fact that averages
(and proportions) tend to be normally distributed.
How to compute a hypothesis test for a mean:
We compute: t = (m – Null Hypothesis)/STEM, where m is the sample mean.
STEM = standard deviation / sqrt(n)
We compare t to a critical value, z. The critical value depends on the significance level.
5% significance level = critical value 2 (or exact 1.96)
1% significance level = critical value 3 (or exact 2.58)
Reject the null hypothesis if t < -z or if t > z.
How to compute a hypothesis test for a proportion:
We compute: t = (q - Null Hypothesis proportion)/STEP, where q is the sample proportion.
STEP = $\sqrt{p(1-p)/n}$ (observe: p = the null hypothesis proportion; we use this one, not q, when we calculate STEP)
We compare t to a critical value, z, chosen in the same way as for a mean.
5% significance level = critical value 2 (or exact 1.96)
1% significance level = critical value 3 (or exact 2.58)
Reject the null hypothesis if t < -z or if t > z.
Typical question: Robin suspects that a coin is biased. She throws it 25 times and gets heads 15 times. Using this data, test the null hypothesis that the coin is unbiased, i.e., test the null hypothesis that the probability of getting a head is 0.5, using a significance level of 5%. What is the value of the test statistic (the “t-value” or “t-statistic”)?
Solution: Here

$$\text{STEP} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.5 \times (1-0.5)}{25}} = \sqrt{\frac{0.25}{25}} = 0.1$$

t = (q - Null Hypothesis proportion)/STEP, where q is the sample proportion (15/25 = 0.6)
= (0.6-0.5)/0.1 = 1.
Because t = 1 < 1.96, the null hypothesis is NOT rejected.

Hypothesis Testing
Typical question: A study of a new blood pressure medicine, with 81 patients, found that the average reduction in blood pressure was 10 with a standard deviation of 30. Test the null hypothesis that the medicine did not change blood pressure using a significance level of 5%. What is the value of the test statistic (the “t-value”)? Round your answer to one decimal place. Use at least 4 decimals in all your computations. Can you reject the null hypothesis at a 5% significance level?
Solution: STEM = stdev/sqrt(n) = 30/sqrt(81) = 30/9 = 3.333.
Test statistic = (m - null hypothesis)/STEM = (10-0)/STEM = 10/3.333 = 3.
This test statistic should be compared to the critical value, which is 1.96 (exact) or 2 (approximate). Because 3 > 1.96, we can reject the null hypothesis at a 5% significance level.
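The two test statistics above differ only in the standard error that is used. A minimal sketch that reproduces the coin and blood-pressure examples follows; the function names are my own.

```python
from math import sqrt

def mean_test_statistic(sample_mean, null_mean, stdev, n):
    """t = (m - null hypothesis) / STEM, with STEM = stdev / sqrt(n)."""
    return (sample_mean - null_mean) / (stdev / sqrt(n))

def proportion_test_statistic(q, null_proportion, n):
    """t = (q - p) / STEP, with STEP computed from the null proportion p."""
    step = sqrt(null_proportion * (1 - null_proportion) / n)
    return (q - null_proportion) / step

def reject_null(t, critical_value=1.96):
    """Two-sided rule: reject if t < -z or t > z."""
    return abs(t) > critical_value

t_coin = proportion_test_statistic(15 / 25, 0.5, 25)   # about 1
t_medicine = mean_test_statistic(10, 0, 30, 81)        # about 3

print(round(t_coin, 1), reject_null(t_coin), round(t_medicine, 1), reject_null(t_medicine))
```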
How to compute a hypothesis test for a difference between two means, if the null hypothesis is that the difference is zero:
We compute: $t = \frac{m_1 - m_2}{\text{STEDM}}$, where $\text{STEDM} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.
Here m1 and m2 are the two sample means, s1 and s2 the two sample standard deviations, and n1 and n2 the two sample sizes.
We compare t to a critical value, z. The critical value depends on the significance level.
5% significance level = critical value 2 (or exact 1.96)
1% significance level = critical value 3 (or exact 2.58)
Reject the null hypothesis if t < -z or if t > z.
Typical question: There are 64 traders in firm A. On average their profit was 5.6 with a standard deviation of 11.4. There are 78 traders in firm B. On average their profit was 4.3 with a standard deviation of 8.9. Test the hypothesis that the average profits of the two firms are equal, using a significance level of 5%. Can you reject the null hypothesis of equal means at a 5% significance level?
Solution:

$$\text{STEDM} = \sqrt{\frac{11.4^2}{64} + \frac{8.9^2}{78}} = \sqrt{2.0306 + 1.0155} = 1.7453$$

$$\text{Test statistic} = \frac{m_1 - m_2}{\text{STEDM}} = \frac{5.6 - 4.3}{1.7453} = 0.7449$$

This test statistic should be compared to the critical value, which is 1.96 (exact) or 2 (approximate). Because 0.7449 < 1.96, we CANNOT reject the null hypothesis at a 5% significance level.

Anova: idea behind the test
A. What we test
• ANOVA tests the null hypothesis H0: μ1 = μ2 = … = μK. That is, “the group means are all equal”.
• The alternative hypothesis is H1: μi ≠ μj for some i, j, or, “the group means are not all equal”.
B. Idea behind test
(Figure: picture of three groups of data.) The groups are unlikely to have the same means if there is large variation between the means and small variation within the groups.
C. We calculate within, between and total variability

$$\text{SSTot} = \sum (x - \bar{x})^2$$

$$\text{SSBet} = n_A(\bar{x}_A - \bar{x})^2 + n_B(\bar{x}_B - \bar{x})^2 + n_C(\bar{x}_C - \bar{x})^2$$

$$\text{SSWith} = \sum (x - \bar{x}_A)^2 + \sum (x - \bar{x}_B)^2 + \sum (x - \bar{x}_C)^2$$

$$\text{SSTot} = \text{SSBet} + \text{SSWith}$$

D. The test is based on the F ratio:
F = MSBet / MSWith, where
MSBet = SSBet / (k − 1)
MSWith = SSWith / (n − k)
If F > critical value, the null hypothesis is rejected.
Anova: how to compute it
1. Calculate sums and sums of squares:
Sum of all observations (Total) = T
Sum of all squared observations = S2
Sum of the observations in group j = Tj, where group j has nj observations
2. Calculate:

$$\text{SSBet} = \sum_j \frac{T_j^2}{n_j} - \frac{T^2}{n}$$

$$\text{SSTot} = S2 - \frac{T^2}{n}$$

$$\text{SSWith} = \text{SSTot} - \text{SSBet}$$

3. Calculate the F ratio:
F = MSBet / MSWith, where
MSBet = SSBet / (k − 1)
MSWith = SSWith / (n − k)
4. Test the null hypothesis of no differences in means, at significance level a.
If F > critical value, the null hypothesis is rejected.
The critical value is CRsignif(dfbet, dfwith) = CRsignif(k − 1, n − k).
Typical question: Given the data below, test the null hypothesis that the averages are equal for all
groups, using an Anova and a significance level of 0.05. What is the F-ratio? Keep at least 4 decimals in
all computations and round the final answer to 1 decimal place.
| Group 1 | Group 2 | Group 3 | Group 4 |
|---|---|---|---|
| 5 | 3 | 9 | 8 |
| 8 | 10 | 9 | 12 |

Solution:

| | Group 1 Obs | Group 1 Obs^2 | Group 2 Obs | Group 2 Obs^2 | Group 3 Obs | Group 3 Obs^2 | Group 4 Obs | Group 4 Obs^2 |
|---|---|---|---|---|---|---|---|---|
| | 5 | 25 | 3 | 9 | 9 | 81 | 8 | 64 |
| | 8 | 64 | 10 | 100 | 9 | 81 | 12 | 144 |
| SUMS | 13 | 89 | 13 | 109 | 18 | 162 | 20 | 208 |

From the table above, we can calculate:
• Total = 13+13+18+20 = 64; Sum of Squares (SS) = 89 + 109 + 162 + 208 = 568
• SS(between) = (13*13/2) + (13*13/2) + (18*18/2) + (20*20/2) - (64*64/8) = 19
• SS(total) = 568 - (64*64/8) = 56
• SS(within) = SS(total) - SS(between) = 56 - 19 = 37
• MS(between) = 19/(4-1) = 6.333
• MS(within) = 37/(8-4) = 9.25
• F = MS(between) / MS(within) = 6.333 / 9.25 = 0.6847
Typical question: Can the null hypothesis be rejected at 5% significance?
F distribution, 0.05 significance level
k-1 = 4-1 = 3
n-k = 8-4 = 4
Critical value is 6.59
Because F ratio (0.7) is below 6.59, the null
hypothesis cannot be rejected.
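The ANOVA computation above is mechanical enough to script. The sketch below follows the same steps (sums, sums of squares, SS between, SS total, SS within, F ratio) using the four groups from the question; the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA, computed from sums and sums of squares."""
    n = sum(len(g) for g in groups)                    # total number of observations
    k = len(groups)                                    # number of groups
    total = sum(sum(g) for g in groups)                # T
    sum_of_squares = sum(x * x for g in groups for x in g)   # S2
    correction = total**2 / n                          # T^2 / n

    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_total = sum_of_squares - correction
    ss_within = ss_total - ss_between

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

groups = [[5, 8], [3, 10], [9, 9], [8, 12]]
f_ratio = one_way_anova_f(groups)

print(round(f_ratio, 4))   # compare with the critical value 6.59 for (3, 4) degrees of freedom
```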
Two Factor Anova: idea behind the test
Now 2 factors are varied (previously we considered one factor). For example, we may have data on performance for firms using two different hiring approaches and two different quality management approaches:

| Quality management | Hiring approach: less educated | Hiring approach: more educated |
|---|---|---|
| Intuitive | 8, 7, 9 | 6, 5, 7 |
| Statistical | 1, 3, 2 | 11, 14, 12 |

The cell means and the means for each factor level are:

| Quality management | Less educated | More educated | Mean |
|---|---|---|---|
| Intuitive | x̄I,LE = 8 | x̄I,ME = 6 | x̄I = 7 |
| Statistical | x̄S,LE = 2 | x̄S,ME = 12.3 | x̄S = 7.17 |
| Mean | x̄LE = 5 | x̄ME = 9.17 | |

Given this data, we can examine the effect of each factor (Is it better to hire more educated workers? Does a statistical quality management approach lead to better results?)
We can also examine the interaction effect: Does the benefit of a statistical approach to quality management vary with how educated the workforce is?
No interaction effect: the effect of factor 1 is independent of factor 2. Total effect = Factor 1 effect + Factor 2 effect.
Interaction effect: the effect of factor 1 depends on factor 2.
(Figures: plots comparing the Intuitive and Statistical approaches, illustrating cases with and without an interaction effect.)
Generally: test main effects and interaction effects by computing between and within variability.

Main effect of hiring approach:

$$\text{SSH} = n_{LE}(\bar{x}_{LE} - \bar{x})^2 + n_{ME}(\bar{x}_{ME} - \bar{x})^2$$

MSH = SSH / (kH − 1), F = MSH / MSWith

Main effect of quality approach:

$$\text{SSQ} = n_{I}(\bar{x}_{I} - \bar{x})^2 + n_{S}(\bar{x}_{S} - \bar{x})^2$$

MSQ = SSQ / (kQ − 1), F = MSQ / MSWith

Interaction effect:

$$\text{SSH*Q} = n_{I,LE}(\bar{x}_{I,LE} - \bar{x})^2 + n_{I,ME}(\bar{x}_{I,ME} - \bar{x})^2 + n_{S,LE}(\bar{x}_{S,LE} - \bar{x})^2 + n_{S,ME}(\bar{x}_{S,ME} - \bar{x})^2 - \text{SSH} - \text{SSQ}$$

MSH*Q = SSH*Q / ((kH − 1)(kQ − 1)), F = MSH*Q / MSWith
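Two Factor Anova: how to compute it
1. Calculate sums and sums of squares (T = sum of all observations, S2 = sum of all squared observations, THj and TQj = sums for each level of factor H and factor Q, THiQj = sum for each combination of levels).
2. Calculate the sums of squares:

$$\text{SSTot} = S2 - \frac{T^2}{n}$$

$$\text{SSH} = \sum_{j \in H} \frac{T_{Hj} \cdot T_{Hj}}{n_{Hj}} - \frac{T^2}{n}, \qquad \text{SSQ} = \sum_{j \in Q} \frac{T_{Qj} \cdot T_{Qj}}{n_{Qj}} - \frac{T^2}{n}$$

$$\text{SSH*Q} = \sum_{i,j} \frac{T_{HiQj} \cdot T_{HiQj}}{n_{ij}} - \frac{T^2}{n} - \text{SSH} - \text{SSQ}$$

SSH and SSQ are deducted because the interaction is the variability that is not due to the main effects.

$$\text{SSWith} = \text{SSTot} - \text{SSH} - \text{SSQ} - \text{SSH*Q}$$

3. Calculate the F ratios:
FH = MSH / MSWith, where MSH = SSH / (kH − 1)
FQ = MSQ / MSWith, where MSQ = SSQ / (kQ − 1)
FH*Q = MSH*Q / MSWith, where MSH*Q = SSH*Q / ((kH − 1)(kQ − 1))
In all three ratios, MSWith = SSWith / (n − kH kQ).
4. Test hypotheses
If Fi > critical value, the null hypothesis is rejected.
Critical values:
CRH,signif(kH − 1, n − kH kQ)
CRQ,signif(kQ − 1, n − kH kQ)
CRHQ,signif((kH − 1)(kQ − 1), n − kH kQ)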
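To make the two-factor recipe concrete, the sketch below applies it to the hiring-approach and quality-management data from the "idea behind the test" slide (three observations per cell). The function name and dictionary layout are my own choices, and the resulting F ratios are not stated in the notes, so treat them as a worked illustration rather than an official answer.

```python
def two_factor_anova(cells):
    """cells maps (hiring level, quality level) to a list of observations."""
    all_obs = [x for obs in cells.values() for x in obs]
    n = len(all_obs)
    correction = sum(all_obs) ** 2 / n                    # T^2 / n
    ss_tot = sum(x * x for x in all_obs) - correction     # S2 - T^2 / n

    def ss_between(level_of):
        """Sum over groups of (group total)^2 / (group size), minus T^2 / n."""
        groups = {}
        for key, obs in cells.items():
            groups.setdefault(level_of(key), []).extend(obs)
        ss = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
        return ss, len(groups)

    ss_h, k_h = ss_between(lambda key: key[0])     # main effect of hiring approach
    ss_q, k_q = ss_between(lambda key: key[1])     # main effect of quality approach
    ss_cells, _ = ss_between(lambda key: key)      # every combination of levels
    ss_hq = ss_cells - ss_h - ss_q                 # deduct the main effects
    ss_with = ss_tot - ss_h - ss_q - ss_hq
    ms_with = ss_with / (n - k_h * k_q)

    return {
        "F_H": (ss_h / (k_h - 1)) / ms_with,
        "F_Q": (ss_q / (k_q - 1)) / ms_with,
        "F_HQ": (ss_hq / ((k_h - 1) * (k_q - 1))) / ms_with,
    }

cells = {
    ("less educated", "intuitive"): [8, 7, 9],
    ("more educated", "intuitive"): [6, 5, 7],
    ("less educated", "statistical"): [1, 3, 2],
    ("more educated", "statistical"): [11, 14, 12],
}
f_ratios = two_factor_anova(cells)

print({name: round(value, 4) for name, value in f_ratios.items()})
```

Each F ratio would then be compared with the critical value for (1, 8) degrees of freedom, since kH = kQ = 2 and n = 12 here.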
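Two Factor Anova: typical question
Solution: When computing the interaction sum of squares, deduct SS_a and SS_b. With ka = 2, kb = 2 and n = 16, the degrees of freedom for the critical values are:
Factor a: 1 = ka-1 = 2-1, 12 = n-ka*kb = 16-2*2
Factor b: 1 = kb-1 = 2-1, 12 = n-ka*kb = 16-2*2
Interaction: 1 = (ka-1)(kb-1) = (2-1)(2-1), 12 = n-ka*kb = 16-2*2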
Mann-Whitney test
Idea behind the test: If the null hypothesis that the two samples
come from the same distribution is true, the rank scores (ranging
from 1 = the lowest to 20 = the highest) should be evenly
distributed among the two samples.
It would be unlikely that all the low values are in sample 1 if the
null hypothesis is true.
For example: suppose we have 2 modules with 30 students in
each (thus 60 students overall). Suppose the height of the
students are drawn from the same distribution. How likely is it
that all the shortest 30 students end up in module 1? Not very
likely.
Note that if all the shortest 30 students end up in module 1, we
know that all the tallest 30 students end up in module 2. Thus,
we only need to look at one of the two samples to tell if the
distribution of the rank scores is uneven.
How to compute it:
1. Rank all scores (in both lists) from the lowest (rank = 1) to the highest (rank = N, the total number of observations).
2. Sum the ranks for each list. The rank sums are $\sum_{i=1}^{n_1} R_{i,1}$ and $\sum_{i=1}^{n_2} R_{i,2}$.
3. Calculate a U-value for each sample using the following formulae:

$$U_1 = \sum_{i=1}^{n_1} R_{i,1} - \frac{n_1(n_1+1)}{2}$$

$$U_2 = \sum_{i=1}^{n_2} R_{i,2} - \frac{n_2(n_2+1)}{2}$$

Here n1 is the number of observations in list 1 and n2 is the number of observations in list 2.
4. Compare the smaller U-value with the critical value in a table for the Mann-Whitney test. The calculated value must be equal to or smaller than the table value for significance.
Typical question: Using the data below, test the hypothesis that there is no difference
between the two samples using a Mann-Whitney test and a significance level of 0.05. What
is the value of the test statistic (the lowest u value)? Can you reject the null-hypothesis?
Sample 1: 44 87 82 9 86 25 72 81
Sample 2: 73 13 49 12 57 15
Solution: There are 8+6 = 14 observations in total.
Rank all observations from 1 (lowest) to 14 (highest).

| Sample 1 | Rank | Sample 2 | Rank |
|---|---|---|---|
| 44 | 6 | 73 | 10 |
| 87 | 14 | 13 | 3 |
| 82 | 12 | 49 | 7 |
| 9 | 1 | 12 | 2 |
| 86 | 13 | 57 | 8 |
| 25 | 5 | 15 | 4 |
| 72 | 9 | | |
| 81 | 11 | | |
| Sum | 71 | Sum | 34 |

The sum of ranks for sample 1 is: 71.
The U value for sample 1 is: 71 - 8*9/2 = 71 - 36 = 35.
The sum of ranks for sample 2 is: 34.
The U value for sample 2 is: 34 - 6*7/2 = 34 - 21 = 13.
The smallest U value is: 13.
The critical value is: 8.
The smallest U value (13) is not smaller than or equal to 8. Thus, we do not reject the null hypothesis.
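The ranking and U-value steps can be sketched in a few lines. The version below assumes there are no tied values (true for this data set; ties would need average ranks, which this sketch does not handle), and the function name is hypothetical.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) from rank sums; assumes no tied observations."""
    combined = sorted(sample1 + sample2)
    rank = {value: position + 1 for position, value in enumerate(combined)}  # rank 1 = lowest

    rank_sum1 = sum(rank[v] for v in sample1)
    rank_sum2 = sum(rank[v] for v in sample2)
    n1, n2 = len(sample1), len(sample2)

    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = rank_sum2 - n2 * (n2 + 1) / 2
    return u1, u2

sample1 = [44, 87, 82, 9, 86, 25, 72, 81]
sample2 = [73, 13, 49, 12, 57, 15]
u1, u2 = mann_whitney_u(sample1, sample2)
test_statistic = min(u1, u2)

critical_value = 8   # table value quoted in the solution above
print(u1, u2, test_statistic, test_statistic <= critical_value)
```

A useful check is that U1 + U2 always equals n1*n2 (here 35 + 13 = 48 = 8*6).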
Chi-square test
Critical values: df = (number rows -1)*(number columns-1)
Idea behind test: we want to test if two dimensions are independent (not associated). Null
hypothesis: not associated.
For example, is the proportion of students who pass the test independent of whether the
students come from Sweden?
| | From Sweden | Not from Sweden |
|---|---|---|
| Pass exam | 20 | 45 |
| Do not pass exam | 10 | 15 |

There are 20+10+45+15 = 90 students.
30/90 are from Sweden.
65/90 pass the exam.
If passing the exam was independent of being from Sweden, the proportion of students who pass the exam AND are from Sweden would be:
P(pass exam)*P(Sweden) = (65/90)*(30/90) = 13/54 ≈ 0.2407.
The number of students, among the 90, who passed the exam and were from Sweden would thus be (65/90)*(30/90)*90 = 21 2/3. This is different from what we observe (which is 20).
In this test we compare the expected numbers, in each “box”, with the observed.
Expected in the box “Sweden and pass exam” = (65/90)*(30/90)*90
This can be written as (65*30/90), i.e., row total * column total / overall total.
To see if the observed differ from the expected we compute (observed-expected)^2 and divide by expected.
We do this for every “box” and sum these numbers.
We reject if the sum is large, because in this case the observed differ a lot from the expected (where expected = what we would expect if the null hypothesis of no association is true).
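The expected counts and the chi-square sum for the Sweden example can be computed as follows. This is a sketch of the steps described above; the variable names are my own, and the resulting statistic is not stated in the notes.

```python
# Observed counts: rows = pass / do not pass, columns = from Sweden / not from Sweden
observed = [[20, 45],
            [10, 15]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# Expected count in each box under independence: row total * column total / n
expected = [[r * c / n for c in column_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over every box
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(expected[0][0], 3), round(chi_square, 4), degrees_of_freedom)
```

The statistic would then be compared with the chi-square critical value for the computed degrees of freedom (3.84 at the 5% level for 1 degree of freedom).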
Covariance
Definition:

$$\text{cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$$

Simpler formula:

$$\text{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$

Positive association: Cov > 0. Negative association: Cov < 0. No association: Cov = 0.
(Figures: scatter plots of Y against X illustrating positive, negative and no association.)

Correlation
Definition:

$$r_{xy} = \frac{\text{cov}(x, y)}{s_x s_y}$$

where sx is the standard deviation of x and sy is the standard deviation of y. The correlation coefficient is between -1 and +1.
Simpler formulas:

$$s_x^2 = \frac{\sum x^2}{n} - \bar{x}\,\bar{x}, \qquad s_y^2 = \frac{\sum y^2}{n} - \bar{y}\,\bar{y}, \qquad \text{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$

Positive association: Correlation > 0. Negative association: Correlation < 0. No association: Correlation = 0.
Perfect linear association and positive slope implies correlation coefficient = +1.
Typical question: The data shows the performance (y) and amount of practice (x) for 4 individuals. Based on this data, compute the covariance.
Typical question: The data shows the performance (y) and amount of practice (x) for 4 individuals. Based on this data, compute the correlation coefficient.

Regression
Problem: finding the best fitting line
Typical question: The following table shows profitability (Y) and investment in online advertising (X) for five firms. What is the value of the slope (b) of the regression line, Y = a+bX? What is the value of the intercept?

| Firm | Advertising | Profitability |
|---|---|---|
| 1 | 3.0 | 7.0 |
| 2 | 11.0 | 9.0 |
| 3 | 8.0 | 9.0 |
| 4 | 4.0 | 3.0 |
| 5 | 6.0 | 8.0 |

Solution: The regression line is

$$\hat{y} = \hat{a} + \hat{b}x, \qquad \hat{b} = \frac{SS_{xy}}{SS_x}, \qquad \hat{a} = \bar{y} - \hat{b}\bar{x}$$

where

$$SS_{xy} = \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{N} x_i y_i - \frac{\left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right)}{N}$$

$$SS_x = \sum_{i=1}^{N} (x_i - \bar{x})^2 = \sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} x_i\right)}{N}$$

Computations can be done as follows:

| obs | X | X*X | Y | X*Y | Y*Y |
|---|---|---|---|---|---|
| 1 | 3 | 9 | 7 | 21 | 49 |
| 2 | 11 | 121 | 9 | 99 | 81 |
| 3 | 8 | 64 | 9 | 72 | 81 |
| 4 | 4 | 16 | 3 | 12 | 9 |
| 5 | 6 | 36 | 8 | 48 | 64 |
| Sum | 32.00 | 246.00 | 36.00 | 252.00 | 284.00 |
| Mean | 6.40 | | 7.20 | | |

R-square coefficient
How well does the line fit the data? It fits well if the sum of the squared distances is small.
Squared distance between observation i and the line:

$$e_i^2 = (y_i - \hat{y}_i)^2$$

The line fits well if the sum of the squared distances is small:

$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

Typical question: Given the data below compute the R-square coefficient.

| obs | x | y |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 3 | 6 |
| 3 | 4 | 2 |
| 4 | 7 | 8 |

Solution: Computations can be done as follows:

| obs | x | x*x | y | x*y | y*y |
|---|---|---|---|---|---|
| 1 | 2 | 4 | 3 | 6 | 9 |
| 2 | 3 | 9 | 6 | 18 | 36 |
| 3 | 4 | 16 | 2 | 8 | 4 |
| 4 | 7 | 49 | 8 | 56 | 64 |
| Sum | 16.000 | 78.000 | 19.000 | 88.000 | 113.000 |
| Mean | 4.000 | | 4.750 | | |

First, we need to compute the slope (b) and the intercept (a). The formula for the slope is b = SSxy/SSx. To calculate b we compute SSxy = ∑xy - (∑x)(∑y)/N = 88 - (16*19/4) = 88 - 76 = 12, and SSx = ∑x^2 - (∑x)(∑x)/N = 78 - (16*16/4) = 78 - 64 = 14. Thus, the slope is b = 12/14 ≈ 0.857.
The intercept is: a = ȳ - b*x̄ = 4.75 - 0.857*4 ≈ 1.321.
Second, we compute the predicted y values (ŷi = a + b*xi) and the squared distances between each observation and the line.
The R-square coefficient is R^2 = 1 - ∑ei^2/SSy, so we need to compute ∑ei^2 and SSy.
∑ei^2 = e1^2 + e2^2 + e3^2 + e4^2 = 0.0016 + 4.44 + 7.563 + 0.46 = 12.464.
Also, SSy = ∑y^2 - (∑y)(∑y)/N = 113 - (19*19/4) = 113 - 90.25 = 22.75.
Thus R^2 = 1 - 12.464/22.75 ≈ 0.452.
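The regression and R-square computations can be sketched with the same sums-of-squares formulas. The function names below are hypothetical, and the slope and intercept for the advertising data are my own calculation from the sums in the table (they are not stated in the notes), so treat them as an illustration.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) using b = SSxy / SSx and a = mean(y) - b * mean(x)."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) * sum(xs) / n
    b = ss_xy / ss_x
    a = sum(ys) / n - b * sum(xs) / n
    return a, b

def r_squared(xs, ys):
    """R^2 = 1 - (sum of squared distances to the line) / SSy."""
    a, b = fit_line(xs, ys)
    n = len(ys)
    sum_squared_errors = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_y = sum(y * y for y in ys) - sum(ys) * sum(ys) / n
    return 1 - sum_squared_errors / ss_y

# R-square example: x = 2, 3, 4, 7 and y = 3, 6, 2, 8
a, b = fit_line([2, 3, 4, 7], [3, 6, 2, 8])
r2 = r_squared([2, 3, 4, 7], [3, 6, 2, 8])

# Advertising (X) and profitability (Y) for the five firms
a_adv, b_adv = fit_line([3, 11, 8, 4, 6], [7, 9, 9, 3, 8])

print(round(b, 3), round(a, 3), round(r2, 3), round(b_adv, 3), round(a_adv, 3))
```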
