# DAV Public School Statistics Worksheet & Report Paper

Student Name:___________________Table 2 Rubric

Checklist for “Pass”:

Table dimensions are comparable to the one in Ben’s lecture notes, showing between 4 and 6 different specifications (1 estimate per column) and enabling the reader to compare the $\hat{\beta}$s of interest by reading across one row.

Standard errors are shown on separate rows and in parentheses for each reported coefficient.

Estimates or variables are scaled to prevent padding zeros and scientific notation. Y and X variables

are in functional form noted in topic approval, e.g., logarithms, z scores.

Number of digits reported on estimates is either 3 or 4. If you think your variables present an

exception to this rule, please discuss it with Ben before making your Table.

The lower portion of the table has a row that enables the reader to differentiate the estimates

according to what else is included in the model, like the example below.

o If the Table is incomplete or poorly formatted, you earn a “low pass” of ½ or “fail” of 0 points.

Table has a Title.

Labels, including rows and cell contents, like those on the template below are tailored to the

individual student’s topic.

Row labels are informative and enable the reader to interpret the $\hat{\beta}$ estimates (“what is a 1 unit change?”).

o Row labels that don’t clearly communicate the units and variable definition(s) earn a low pass.

Table has a caption that explains the cells, e.g., what controls are included in “All”?

The overarching objective is to make the Table self-contained. I should be able to look at your Table

without looking up your topic and know what is being regressed on what and how to interpret the

estimates. Anything that isn’t obvious about the sample or the units from looking at the body of the

Table should go in a caption.
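The scaling and digit rules in the checklist can be checked mechanically before a Table is typed up. The following Python sketch is only an illustration and not part of the course materials (the project itself uses Stata). The helper name `sig_round` is hypothetical, and the example takes a coefficient and standard error measured in dollars and rescales them to millions of dollars so that they fit in 4 significant digits without padding zeros or scientific notation.

```python
from decimal import Decimal

def sig_round(value: float, digits: int = 4) -> str:
    """Format `value` with `digits` significant figures, never in scientific notation."""
    if value == 0:
        return "0"
    d = Decimal(repr(value))
    # adjusted() is the power of ten of the leading digit, e.g. -3 for 0.0070269.
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return format(d.quantize(quantum), "f")

# Rescale from dollars to millions of dollars before rounding (illustrative values).
coef_dollars, se_dollars = 17100000, 5725319
coef = sig_round(coef_dollars / 1e6, 4)
se = sig_round(se_dollars / 1e6, 4)
print(f"{coef} ({se})  [millions of dollars]")
```

Rescaling changes only the units in the row label, not the statistical content, so the label has to say which units the reader is looking at.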

Table 2: Label this one “Table 2: Regression Estimates” in your Write-Up

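| | a | b | c | d |
|---|---|---|---|---|
| Coefficient estimate | $\hat{\beta}_{1\mid\text{simple OLS}}$ | $\hat{\beta}_{1\mid\text{spec. } b}$ | $\hat{\beta}_{1\mid c}$ | $\hat{\beta}_{1\mid d}$ |
| | (s.e.) | (s.e.) | (s.e.) | (s.e.) |
| Controls | None | Age | Age and state | All |
| Sample Size | | | | |
| Adjusted $R^2$ | | | | |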

Econ 360 Table 2

Brief: I’m writing about the effects of the use of vehicular transport (measured in time spent commuting

to work) and the extent of its effect on a population’s physical fitness and, more specifically, their levels

of obesity (BMI>30).

| | a | c | d | e | f |
|---|---|---|---|---|---|
| Coefficient estimate | .0070269 | -.0030338 | -.0022855 | -.0028577 | .0132067 |
| | (.0385116) | (.0406203) | (.0385945) | (.0384676) | (.0410755) |
| Controls | None | Percent Access to Exercise | Percent Excessive Drinking, Percent Access to Exercise | Percent High School Graduates, Percent Excessive Drinking, Percent Access to Exercise | Household Income, Percent High School Graduates, Percent Excessive Drinking, Percent Access to Exercise (All) |
| Adjusted R2 | -0.0107 | -0.0149 | 0.0838 | 0.0900 | 0.0923 |

Y=PercentObese=f(log(CommuteTime), PercentAccesstoExercise, PercentExcessiveDrinking, PercentHighschoolGraduates, log(HouseholdIncome))

(PercentObese)=β0 + β1log(CommuteTime) + β2(PercentAccesstoExercise) + β3(PercentExcessiveDrinking) + β4(PercentHighschoolGraduates) + β5log(HouseholdIncome)

Where, in the given county,

Y=PercentObese= Percentage of adults that report a BMI of 30 or more

X1=log(CommuteTime)= Log of the mean travel time to work (minutes)

Control Variables:

X2=PercentAccesstoExercise= Percentage of population with adequate access to locations for physical

activity*

X3=PercentExcessiveDrinking= Percentage of adults reporting binge or heavy drinking**

X4=PercentHighschoolGraduates= Percentage of ninth-grade cohort that graduates in four years

X5=log(HouseholdIncome)= Log of the Median Household Income (US Dollars)
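Two of the adjusted R2 values in this example are negative, which can look like a mistake to a reader. A short Python sketch (an illustration only; the function name is hypothetical and the sample size below is made up, since the example does not report one) shows why the standard adjustment $1 - (1 - R^2)\frac{n-1}{n-k-1}$ falls below zero when the regressors explain almost none of the variation.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a regression with n observations and k slope coefficients."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical numbers: a tiny R^2 with one regressor and 100 observations.
weak_fit = adjusted_r2(0.001, n=100, k=1)
print(f"adjusted R^2 = {weak_fit:.4f}")
```

A negative value therefore signals a very weak fit rather than an arithmetic error.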

Table 2: Union Rates Effects on Income Inequality

a

Coefficient

estimate

-0.0031

Standard error

0.000742

b

Year

1990

2000

Year

1990

2000

c

-0.0064

0.0632

Year

1990

2000

0.00029

0.07834

0.0111

0.0119

Year

1990

2000

0.01192

0.01546

Observations

152

152

152

Controls

Simple

Year Indicators

Year and

Unemployment

Adjusted R2

0.0935

0.2920

0.2983

GDP and Happiness

| | a | b | c | d |
|---|---|---|---|---|
| Coefficient estimate (B1) | 0.19 | 0.2 | 0.19 | 0.19 |
| | (.049) | (.049) | (.045) | (.040) |
| Controls | None | health | health & cash | all |
| Adjusted R^2 | 0.245 | 0.2545 | 0.3799 | 0.505 |

Included in “all” are the controls health (how healthy the person feels), coded as 1 (healthy) or 0 (not healthy); cash (whether they have gone without a cash income in the past 12 months), coded as 1 (yes) or 0 (no); and technology (how developed they see their country), coded as 1 (developed) or 0 (not developed). As you can see, the adjusted R^2 increases as the new controls are added to the regression. Education is also included in the simple OLS as B1.

“Table 2: Regression Estimates”

$$\ln(\text{juv}) = \hat{\beta}_0 + \hat{\beta}_1 \ln(\text{teach}) + \hat{\beta}_2 \ln(\text{house}) + \hat{\beta}_3 \ln(\text{officers}) + \hat{\beta}_4(\text{abuse}) + \hat{\beta}_5(\text{unemp}) + \mu$$

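| | A | B | C | D |
|---|---|---|---|---|
| “ln(teach)” ($\hat{\beta}_1$) | .1574 | -.4911 | -.4877 | -.4270 |
| | (.4133) | (.4353) | (.4286) | (.4437) |
| % Change on Juvenile Crime | – | – | – | – |
| Controls | None | officers | officers, house | All |
| Observations | 58 | 58 | 58 | 58 |
| Adjusted $R^2$ | 0.00 | .1248 | .1515 | .1273 |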

Table 2 summarizes the regression of juvenile crime rate on the ratio of average K-12 teacher salaries and per capita income. Column A describes the simple linear regression with no controls. Column B describes the regression that controls for “officers”. Column C controls for both “officers” and “house”. Column D controls for all variables stated in the model. The standard errors are provided immediately below the coefficient estimates. The implied % change on juvenile crime rate is vacant because the variable of interest was found to be insignificant at $\alpha = .1$ for all combinations of control variables. A key for each variable is provided below:

juv = juvenile crimes per 100,000

teach = (average K-12 teacher salary) / (county per capita income)

house = median household income

officers = (# of juveniles) / (# of police officers)

abuse = # of children per 1000 reported as abused

unemp = unemployment rate

QB

WR

TE

RB

OL

LB

DL

DB

Position Group Effect On Point Differential

a

b

c

34.782

20.326

23.069

(19.054)

(18.960)

(19.639)

35.610

70.645

(33.836)

(36.961)

11.652

43.504

(26.733)

(28.258)

17.701

6.243

(30.440)

(30.812)

134.818

79.187

(44.824)

(47.674)

84.109

107.903

(36.460)

(37.427)

27.085

0.499

(41.169)

(42.631)

64.346

94.554

(44.797)

(46.036)

point diff(t-1)

# Teams

# Seasons

n

0.371

(0.082)

32

5

160

Adj R^2

Controls

d

58.365

(33.170)

58.887

(42.761)

11.284

(39.454)

24.278

(36.057)

65.629

(57.186)

60.782

(53.166)

38.120

(54.668)

169.004

(57.934)

0.015

None

32

5

160

0.090

32

4

128

0.219

32

4

128

0.092

Other HG

HG

HG

proportions proportions proportions

Lag diff

1st Differenced No

No

No

Yes

position group (per team, per year)=homegrown/mercenary

HG=Homegrown: % of players on a team, drafted by that team

point diff=point differential

Table 2: Regression Estimates

Effect on log average tournament winnings

| | a | b | c | d |
|---|---|---|---|---|
| Coefficient estimate for Saved Break Points | 0.00438 | -0.00169 | 0.00462 | -0.00112 |
| | (0.00168) | (0.00149) | (0.00175) | (-0.00154) |
| Controls | None | Rank | Age | Age and Rank |
| Adjusted R^2 | 0.0439 | 0.443 | 0.0951 | 0.464 |

This table shows the coefficient estimates for a player’s total number of

saved breakpoints in a year when regressed on log of the player’s

average tournament winnings in dollars for the following year. The

different control variables are ATP year end ranking, expressed as a

non-linear indicator variable with 4 groups, and the player’s age,

expressed as a non-linear indicator variable with 3 groups.

Table 2

Response: totalcredit

| Variables | annualincome | employment | creditaccounts | bankruptcies | debttoincome |
|---|---|---|---|---|---|
| Coef. | 0.15 | 371.84 | 1642.28 | 513279.66 | 25505.64 |
| S.E. | 0.005 | 86.2 | 37.91 | 816.288 | 3644.61 |

totalcredit is the total amount of available credit measured in USD, annualincome is total

annual income measured in USD, employment is the amount of years of employment,

creditaccounts is the total number of credit accounts the person has, bankruptcies is the

total number of times a person has filed for bankruptcy, and debttoincome is the person’s

debt as a fraction of their income.

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Coef. | 10795.96 | 1554.92 | 230.72 | 25505.64 |
| S.E. | 2526.513 | 2292.93 | 2254.82 | 3644.61 |
| Controls | None | creditaccounts | creditaccounts, bankruptcies | All |
| Adj. R^2 | 0.0017 | 0.1838 | 0.2114 | 0.2841 |

This table shows the effect the other variables have on the debttoincome variable. In

column 4 “All” refers to all other variables listed above.

Table 2: Regression Estimates

Effect on Inflation Rate

| | A | B | C | D |
|---|---|---|---|---|
| Coefficient estimate | -0.0168 | -0.0175 | 0.0011 | 0.0028 |
| | (0.007) | (0.007) | (0.005) | (0.004) |
| Controls | None | Investment Ratio | Investment Ratio & Literacy Rate | All |
| Adjusted R2 | 0.0280 | 0.0292 | 0.5421 | 0.6986 |
| Year | 2014 | 2014 | 2014 | 2014 |
| Observations | 165 | 165 | 165 | 165 |

This table contains the information of estimating the effect on “Inflation Rate” based on

different control variables. All of the data are from 2014, and there are total 165

observations. “All” includes “Investment Ratio”, “Literacy Rate” and “Life Expectancy”.

The Adj. R2 for column D is 0.6986, bigger than the value in column C, which is

considered as a better model. We can expect that “Life Expectancy” has a strong, positive

relationship with the dependent variable, ln(GDPpc).

Effect on Log (Average annual working hours Female)

a

b

c

d

Coefficient estimate

0.0238655

0.0176315

0.0180924

0.018539

(standard error)

(0.0060768)

(0.0121407)

(0.0120469)

(0.0121657)

t

3.93

1.45

1.50

1.52

Controls

None

Only regress on Average

Marriage age of Male

Average Marriage age of

Male

Average Marriage age of

Male and Fertility Rate

All

Average Marriage age of

Male and Fertility Rate and

Education Level

observations

55

55

55

55

Adjusted $R^2$

0.2108

0.3719

0.3820

0.3730

Estimates of this model used national data of 55 countries. Standard errors are cluster robust around each panel (Annual Working HrsMarriage Age of Female).

Column (a) shows the effect of Average marriage age of female on Annual working hours. Column (d) shows the regression when controlling the effects of Average

Marriage age of Male, Fertility Rate and Education Level on our independent variables. The emphasis in these tables is on the source of Annual working hours and

Marriage age of females. This supports the hypothesis that, one year older for the marriage age of female will lead to 1.8539% change on annual working hours

supplied per female, controlling all other factors.
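The percentage interpretation in this caption relies on an approximation: with a logged dependent variable and a regressor in levels, a coefficient $\hat{\beta}_1$ implies roughly a $100\hat{\beta}_1$ percent change in y per one-unit change in x. As a hedged illustration in Python (not part of the student's work; the function name is hypothetical), the exact implied change is $100(e^{\hat{\beta}_1} - 1)$, which stays close to the approximation only when the coefficient is small.

```python
import math

def pct_change_log_level(beta: float) -> float:
    """Exact percent change in y for a one-unit change in x when y is in logs."""
    return 100.0 * math.expm1(beta)

beta_d = 0.018539  # coefficient reported for column (d) in the table above
approx = 100.0 * beta_d
exact = pct_change_log_level(beta_d)
print(f"approximate: {approx:.4f}%  exact: {exact:.4f}%")
```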

1

Estimates of model using government data to determine the effects of the amount of poverty on

gun-violence. Data collected on county level in the United States, except for state-wide gun laws

(i.e. permits). The various control variables represented: lawpc is the number of full-time

employed law enforcement officials per county, lawpcsquared is “lawpc” squared, permit is an

indicator variable of whether or not the state the county is located in requires some form of

permit for purchase of a handgun, and lastly, lpop, which is the natural log of the population

density in each county. “All” includes all four mentioned controls. The coefficient estimate

remains very significant as additional controls were added, and improved the overall accounting

of variation.

ECON 360: Econometrics. Ben Van Kammen: Purdue University.

Table 2: Regression Estimates

Effect on Number of Fast Food Restaurants

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Median Household Income (β1) | .3491 | .3435 | .3538 | .3466 |
| | (.0358) | (.0354) | (.0358) | (.0354) |
| Implied % Change | 34.9% | 34.3% | 35.3% | 34.6% |
| Housing Density (β2) | .0228 | .0050 | | |
| | (.0083) | (.0053) | | |
| Implied % Change | 2.28% | 0.5% | | |
| Grocery Stores Per Capita (β3) | -.0276 | | -.0066 | |
| | (.0099) | | (.0064) | |
| Implied % Change | -2.76% | | -.66% | |
| Constant | -11.25 | -11.19 | -11.27 | -11.21 |
| | (.3832) | (.3797) | (.3829) | (.3789) |
| Observations | 2957 | 2957 | 2957 | 2957 |
| Controls | All | Housing Density | Grocery Stores | None |
| Adjusted R2 | .0333 | .0304 | .0316 | .0309 |

This model uses cross-sectional county based data from the 2013 Census. All variables are reported as natural logs.

Control ‘All’ refers to housing density and grocery stores per capita being controlled for in the regression. The emphasis

in this table is on the effect of median household income on the number of fast food restaurants. The results do not

support the hypothesis that median household income is positively related to the number of fast food restaurants.

Intercept

s.e

Yrs_SinceFinals

s.e

Model1

Model2

Model3

Model4

Model5

18243.96

1654.129

1772.591

1832.736

2282.452

235.53

749.2729

779.8518

806.3994

849.5323

-25.6603

6.356

Attend

s.e

-8.901658

3.127091

-9.003631

3.139585

-9.07386

3.15765

-8.520247

3.159465

176.0905

7.862874

173.0709

9.528582

173.9054

9.936636

170.3422

10.12902

322.2021

571.4479

361.7113

587.4731

-124.9011

658.2046

-2.16E-06

7.04E-06

-2.89E-06

7.02E-06

Win

s.e

Sal_Spent

s.e

Allstar

s.e

Adj. R^2

n

181.7705

113.1937

0.0931

0.793

0.7921

0.7908

0.793

150

150

150

150

150

| Coefficient estimate | β1 simple OLS (s.e.) | β1 spec. b. (s.e.) | β1 c. (s.e.) | β1 d (s.e.) |
|---|---|---|---|---|
| β1 (gross) | 17100000 (5725319) | 11800000 (4113776) | 21600000 (5616624) | 12900000 (4405705) |
| Controls | None | Budget | Film Genre | All |
| Adjusted R2 | 0.0524 | 0.5166 | 0.1742 | 0.513 |

Gross is the total revenue in U.S. dollars a film garnered at the box office. Budget is the film’s budget in U.S.

dollars. Film Genre is the genre of each film included in the data, and includes: Adventure, Action, Comedy,

Documentary, Drama, Horror, Romantic Comedy, Thriller/Suspense, and Western.

Econ 360 project data table 2

| variable name | variable label |
|---|---|
| month | The total number of months of enrollment for each student upon graduation. |
| gpa | Accumulated GPA during college. |
| pocket | The amount paid from the student’s own pocket. |
| work | The total amount of money earned from work and study during college. |
| scholarship | The total amount of scholarships or grants received during college. |
| tuition | Tuition and fees for a student who takes the same number of credits. |
| familyloan | Total amount of loans received from family members. |
| timeofloans | The number of times the student received a loan from relatives or friends. |
| major | The students who have double major. Major = 1: one major. |
| credits | The number of credits needed to graduate. |

Effect on longitude of graduation.(months)

1 Only

2Add

3Only

4All other

5 With

6 Full

GPA

Scholarshi

Scholarshi

means of

Major

model

p

p

support

and

Credits

GPA(B1)

-.064808

-.0816965

NA

NA

NA

-.1153693

1

(.1258316

(.12627)

(.127266

)

T=-0.91

2)

Scholarship(B

NA

2)

-.0000862

-.0000849

NA

NA

-.0000765

( .000035

(.0000354

(.0000399

5)

)

)

T=-1.92

Tuition(B5)

NA

NA

NA

.0002067

NA

( .000268

.0000716

( .000267)

7)

Familyloan(B6

NA

NA

NA

)

-.0005742

NA

( .000982)

-.0003948

( .000527

9)

Timesofloan(B

NA

NA

NA

7)

Major(B8)

NA

NA

NA

-1.163051

NA

-1.233999

(1.603061

(1.567671

)

)

NA

4.50453

1: one major,

4.839546

(1.720534

0:two major

(1.66365)

)

T=2.62

Credits(B9)

NA

NA

NA

NA

-.001331

-.0030398

8

(.0157617

(.015401

)

7)

Pocket(B2)

Work(B3)

NA

NA

NA

NA

NA

NA

-.0005057

NA

-.000168

(.0009916

(.0009788

)

)

-.0003322

NA

.0000406

( .000599

(.0006528

9)

)

Test of

t=-0.51

T=-2.4

None of

Major:

Two

significant

Pr=

Pr=0.017

the X has

T=2.91

variables

t value

Pr=

are

greater

0.004

significant.

0.611

than 1.

Major and

scholarshi

p.

R^2

0.0014

0.0319

0.0297

0.0131

0.0433

0.0772

Adjusted R^2

-0.0039

0.0216

0.0246

-0.0137

0.0331

0.0310

Effect on Sleep Hours

| Model | a | b | c | d |
|---|---|---|---|---|
| Coefficient estimate | -0.322 | -0.322 | -0.1667947 | -0.1727768 |
| Standard error | 0.0180234 | 0.0163025 | 0.0366734 | 0.0361425 |
| t | -17.87 | -19.75 | -4.55 | -4.78 |
| Control | None | Gender | Gender and working hours | All (Gender, working hours, and year) |
| Observations | 72 | 72 | 72 | 72 |
| Adjusted R^2 | 0.8176 | 0.8507 | 0.8845 | 0.8887 |

*Model

a. $\text{Sleep Hours} = \beta_0 + \beta_1(\text{Education})$

The sleep-hour data are averages across people. Because the proportions of each gender differ, the data are not gender-weighted, which might cause bias.

b. $\text{Sleep Hours} = \beta_0 + \beta_1(\text{Education}) + \beta_2(\text{Gender})$

Gender is treated as a binary variable, so the coefficient on Education doesn’t change.

c. $\text{Sleep Hours} = \beta_0 + \beta_1(\text{Education}) + \beta_2(\text{Gender}) + \beta_3(\text{Working Hours})$

As in Model a, the working-hours data are averages across people, which may cause bias.

d. $\text{Sleep Hours} = \beta_0 + \beta_1(\text{Education}) + \beta_2(\text{Gender}) + \beta_3(\text{Working Hours}) + \beta_4(\text{Year})$

Because the data were collected in different years, the proportions of each gender might change over time, which can cause bias.

* Definition

Sleep Hours: The average hours of sleep per day for people over 25 with the same education level and the same gender in the US in a given year, measured in hours to two decimals.

Education: Educational attainment of people in the US. There are four categories: “less than a high school diploma”, “high school graduate but no college”, “some college or associate degree”, and “bachelor degree and higher”. The values 1 to 4 represent education level from low to high (e.g., 1 represents less than a high school diploma).

Gender: Male or Female. 0 represents male and 1 represents female.

Working Hours: Average hours per day spent on work and work-related activities by people over 25 with the same education level and the same gender in the US in a given year, measured in hours to two decimals.

Year: The year in which the data were collected. The years I chose are 2006 to 2014.

Table 2: Regression Estimates:

Effect on the Ranking Level for the University

| | A | B | C | D | E |
|---|---|---|---|---|---|
| Coefficient estimate | -0.0131 | -0.0176 | -0.0191 | -0.0199 | -0.0241 |
| | (0.0024) | (0.0027) | (0.0029) | (0.0029) | (0.0034) |
| Controls | None | Tuition | Tuition & Mid-Salary | Tuition, MidSalary & Location | All |
| Adjusted $R^2$ | 0.2190 | 0.2833 | 0.2862 | 0.3251 | 0.3490 |
| Year | 2014 | 2014 | 2014 | 2014 | 2014 |
| Observations | 108 | 108 | 108 | 108 | 108 |

This table contains the results of estimating the effect of the amount of international applications to a university. All 108 observations are from 2014. “All” means that the variables (tuition, mid-salary, location and SAT requirement) are all controlled for in Model E. Comparing the adjusted $R^2$s, Model E is the better model. In addition, based on the coefficient estimates, the university’s ranking level has a negative effect on the number of international applications to the university.

Section 1: Summary

People argue that employees with higher education, whether the person is male or female,

the ethnicity of the worker have a huge effect on wage rates. The non-binary dependent variable (y)

in this economic scenario is wage; (y= a + log(x), where x= 1,2,3 …). This is because the wage is

affected by other independent variables, including education and experience. Thus, it depicts the

total monthly earnings of an employee in preferred currencies and is in quantitative form.

Thus;

y=f (x1 + x2 + x3 +…)

That is;

y = wage = f (education, age, sex, ethnicity).

The “x” that will be the focus for the task’s causal analysis will be education. This

independent variable has a vast effect on the dependent variable y in the regression expression. It is

the paper’s causal analysis since it exhibits a fascinating relationship. Notably, education,

experience, sex and ethnicity show interesting outcomes whenever correlated with the non-binary

dependent variable. Indeed, a change in one of the independent variables will significantly affect the

results for the dependent variable. This relationship vastly interests the student because all the

independent variables contribute to the outcome of the dependent variable. Students and individuals

looking for job opportunities must understand and embrace the relationship since it will affect how

much they earn every week. Those with higher education and vast experience in the job market will

earn more than those with low education and less work experience. Besides, the unit of observation

for gathering relevant information would be in the job market, particularly in the workplace. The

row in the data set comprises the observations of the relevant data for analysis.

Section 2: FAQs

1. What is the causal relationship of interest?

The above regression equation has a causal relationship of interest, considering the effects of

the independent variables on the dependent variable. Ideally, a change in an independent variable

will cause a change in the dependent variable. For instance, an employee with high education, vast

work experience, being male and being white will earn a high weekly salary. This implies that high

education, extensive work experience, being male and being white cause employees to earn a lot of

money. Such a causal relationship makes it possible for people willing to seek job opportunities to

get an education and gain work experience to receive well-paying jobs. Individuals without these

qualifications will receive low salaries at the end of each week. High education, vast work

experience, age, sex, and ethnicity are attributable to increased productivity, profitability, and

competitiveness in the employment sector. Thus, employees with these fundamentals are preferred

most in all organizations.

2. What would be the ideal controlled experiment to test #1?

A randomized controlled experiment is the ideal technique for testing the causal relationship

of interest. Indeed, after randomly selecting a given variable (education), it will be possible to hold

the other variables constant and determine its relationship with the dependent variable. The

controlled experiment will show how one variable (independent variable) in a data set has a direct

impact on another variable (dependent variable).

Fall 2022 – Page 1

Economics 360 Data Analysis Project

For this project, students will apply the methods from class to a real set of data. Below are the

milestones at which students are expected to have tangible progress towards completion.

Critical Due Dates:

September 25, 2022: Summary of topic and first 2 FAQs due.

October 30, 2022: Due date to present data set (video) and “working” regression model.

November 20, 2022: Formatted Table 2 due.

December 9, 2022: Final project due.

1. Pose a question. What interests you? Your data set and hypotheses do not have to have

obvious Economics overtones, so if you want to study sports or entertainment, that’s okay. Just

make sure you can find data on the topic of interest. For example:

• Your friend says that “for clothing brands, being featured prominently in popular movies has

a huge effect on the sales of the brand.” What is the causal relationship of interest? Sales

revenue is increased (caused) by the brand’s visibility in movies, ceteris paribus. You

should be able to find data on the sales of various clothing brands and the timing and

popularity of movies in which they were featured. If you find significantly higher sales for

brands after the movies are released, you can go back to your friend and say, “Aha! You

don’t know diddily about the fashion industry, and I’ve got the data to prove it!”

Think of some claim that has been made in one of your other classes or by a friend/coworker/family member that you want to test with data. Then find a sample that contains

observations you can use to test the claim. A good question is: a) specific, b) capable of being

answered empirically, and c) interesting (non-obvious, non-trivial, original).

By Sunday September 25, students must have an approved 1 page summary of their topic and

responses to the first 2 FAQs (from Angrist and Pischke, first day of class). The summary must

include:

• A non-binary dependent variable (y), 1

• A line like the next one, (as exhaustively as possible) listing variables that “explain” variation in y:

$y \equiv \text{wage} = f(\text{education}, \text{experience}, \text{innate ability}, \text{job attributes}, \ldots)$,

• The “x” that will be the focus of the paper’s causal analysis, and a compelling explanation

why this relationship interests the student,

• The unit of observation, e.g., individuals, countries, football teams. In the data set, what will

the rows consist of? 2

This is scored pass/fail and counts toward the final score (see last page) on the project. Do not wait until the last day to submit your proposal unless you are very confident everything is in order. We can go through as many rounds of revision as needed before the due date, so if you want to make sure your idea is both feasible and suitable for this assignment, consult the instructor ahead of time. There will be no revisions for credit allowed after the due date. This includes proposals submitted on time and rejected ex post.

1 It is strongly preferred that you have a ratio level dependent variable like wage, price, population, etc., because regression is better suited to analyzing these.

2 The 50 United States are a poor choice for the sample. They are small in number and much more heterogeneous within than across. The instructor will not approve proposals to use States as the unit of observation.

ECON 360: Econometrics. Ben Van Kammen: Purdue University.

2. Data collection. Go find data! Data are all around you, waiting to be organized and analyzed.

All one has to do is observe the phenomenon of interest and systematically record observations.

Where can you go to observe the “x” and “y” variables in the causal relationship of interest?

End Goal:

• Data consist of observations (rows) and variables (columns) and have a rectangular

“spreadsheet” layout. A data set must observe multiple variables for multiple (n) elements.

• I’m not asking you to formulate your own survey or anything like that; if you’re really

ambitious, you can certainly do it, but there are plenty of suitable sample data sets already

collected that you can use (see Data and Writing Resources in D2L).

• You need enough information to make meaningful statistical inferences, i.e., large enough

sample size and variation in your variables. E.g., it would be hard to infer much about a

small Indiana town that enacts a zoning regulation, based on a comparison with 5

neighboring towns that didn’t ($n = 6$ and $x = 1$ for only 1 observation!).

Where should you look?

• Research librarians, Profs. Zoe Mayhook and Bert Chapman, have built the “Costco” of

economic data (http://guides.lib.purdue.edu/Econ360) for our class. For most topics, you

will be able to find a source of data using one of the tools on this page.

• Don’t worry if you have to go to multiple sources for different variables, e.g., the

unemployment rate across counties from bls.gov, and the murder rate from the FBI. Consult

Ben and/or his lab instructions for how to match them to one another in Stata. It requires a

little patience, but is relatively painless and makes your data set much more powerful.

• If you have difficulty deciding on a set of data or finding a set that you can use to test your

hypotheses, please consult me, and I will help get you going.
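Matching variables from two sources comes down to joining rows on a shared identifier. The sketch below is a language-neutral illustration in Python rather than the Stata workflow from the lab instructions. The county codes, variable names, and values are all made up, and `merge_on_key` is a hypothetical helper.

```python
def merge_on_key(left, right, key):
    """Join two lists of row dictionaries on `key`; also report left keys with no match."""
    right_by_key = {row[key]: row for row in right}
    merged, unmatched = [], []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            merged.append({**row, **match})
    return merged, unmatched

# Made-up rows standing in for two downloads keyed by a county code.
unemployment = [
    {"county": "18157", "unemp_rate": 3.1},
    {"county": "18097", "unemp_rate": 4.0},
    {"county": "18003", "unemp_rate": 3.6},
]
crime = [
    {"county": "18157", "murders_per_100k": 2.5},
    {"county": "18097", "murders_per_100k": 15.2},
]

rows, missing = merge_on_key(unemployment, crime, "county")
print(len(rows), "matched; unmatched keys:", missing)
```

Checking the unmatched keys before running any regression shows how many observations the match dropped and whether the identifiers were coded consistently in both sources.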

Students will submit a video presenting the following, due October 30:

• “Working” regression specification,

• Data set in Stata format, and

• Codebook, e.g., Word document, explaining variable definitions.

“Present” means a 5-10 minute demonstration in which you open the data set, explain what

variables and observations you have, and answer a couple practical questions that will help make

the rest of the project easier. This is scored “pass” (full), “low pass” (½), or “fail” (0) and counts

toward the grade on the project (see last page). Note: an approved project topic is a

prerequisite for this video, even for students that do not meet the topic approval due date.

The University has its own version of YouTube called “Kaltura,” that students should use to

record their voice and capture their screen to make this video. Specifically the feature is called

Kaltura Capture (see the Data Set Assignment in Brightspace for a brief tutorial in how to use

it). After recording your video, Capture will upload it to the University’s server and create a

URL. You should submit this URL to earn credit for this milestone in the Project.


3. Econometric Analysis. Students will document all of the following in a word-processed

report and submit it on the last day of class. All tables and figures should be “self-contained” by

including a caption and intuitive labels for the rows, columns and axes.

3a. Give a sense of how your variables are distributed. Your write-up should include a

professional and easily understandable table of the descriptive statistics on your variables. This

means sample size, sample mean, a measure of variability such as standard deviation, and

skewness. For categorical or binary variables, make it clear how you have made them

quantitative and that the means represent proportions, e.g., the proportion that is male, lives in

Tippecanoe county, or the proportion of the songs on your streaming history that is a particular

genre. Ask yourself, “Do all the descriptive statistics seem plausible? If they do not, what are

some explanations for their bias?”

In the write-up:

• Label it “Table 1: Descriptive Statistics.”

• Carefully explain the units (weekly income? monthly? annual?) in the row labels and the

unit of observation (county? state? occupation-state?) in the caption.

• Are there missing observations or outliers for any variables? If so offer an explanation.

• Does the size of your sample present any concerns about the normality of the sampling

distribution? Speculate about whether the dependent variable’s distribution (skewness,

outliers) presents any problems for the Central Limit Theorem.

o Would taking logs help? 3 For clarity present the descriptive statistics in levels, even if

you take logs when you do the regression.
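The statistics requested for Table 1 can be sketched in a few lines. This Python example is only an illustration under stated assumptions (the course uses Stata; `describe` is a hypothetical helper, the data are made up, and skewness is computed as the third central moment divided by the 1.5 power of the second). It also shows how logs can pull in a right-skewed variable and how the mean of a 0/1 indicator is a proportion.

```python
import math
import statistics

def describe(values):
    """Return n, mean, standard deviation, and skewness, skipping missing values (None)."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (n - 1 in the denominator)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return {"n": n, "mean": mean, "sd": sd, "skewness": m3 / m2 ** 1.5}

# Made-up right-skewed variable in levels, with one missing observation.
income = [1.0, 10.0, 100.0, 1000.0, None]
levels = describe(income)
logs = describe([math.log(v) for v in income if v is not None])
print(f"levels: n={levels['n']} skewness={levels['skewness']:.2f}")
print(f"logs:   n={logs['n']} skewness={logs['skewness']:.2f}")

# For a 0/1 indicator the mean is the proportion of ones.
male = [1, 0, 0, 1, 1]
print(f"proportion male: {describe(male)['mean']:.2f}")
```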

3b. Use Stata to estimate a simple linear regression for the relationship between the (hypothesized) causally related variables:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$

Use Stata to produce a scatterplot showing the mapping of x to y, and include the estimated

regression line on the plot to summarize their co-movement. 4

In the write-up:

• State the null and alternative hypotheses in terms of parameters ($\beta$s) that will test the relationship of interest.

• Discuss the sign (+/-) on $\hat{\beta}_1$. Does it confirm your original prediction?

• Discuss the default and robust standard errors of $\hat{\beta}_1$ and how statistically different from the null (usually but not always zero) hypothesized value the estimate is. In practical terms, is there a “wide” confidence interval around the point estimate?

• In terms a non-economist could understand, interpret the coefficient estimate: “. . . a one unit

change in . . . is associated with a . . . .” Is this a practically large effect?

• Discuss how well the linear trend “fits” the data. What is the coefficient of determination ($R^2$)?
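The quantities discussed in these bullets all come from a handful of sums. As a minimal sketch in Python (an illustration, not the required Stata output; the function name and the data are made up), the slope is $S_{xy}/S_{xx}$, its default standard error is $\sqrt{\hat{\sigma}^2 / S_{xx}}$ with $\hat{\sigma}^2 = SSR/(n-2)$, and $R^2 = 1 - SSR/SST$. The standard error here is the default (homoskedastic) one, not the robust version.

```python
import math

def simple_ols(x, y):
    """Simple linear regression of y on x with the default standard error of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total sum of squares
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t": b1 / se_b1, "r2": 1 - ssr / sst}

# Made-up data for illustration only.
fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: round(value, 4) for name, value in fit.items()})
```

The t statistic returned is for the null hypothesis that the slope is zero; a different null value would be subtracted from `b1` before dividing by the standard error.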

3

I recommend, before proceeding to write up your results from 3 b-e, that students get their functional specifications

(especially of y) right: logs or levels, scaling by 1000 or 1/1000.

4 This page on the Stata website will help with the syntax for making the scatterplot:

https://www.stata.com/support/faqs/graphics/gph/stata-graphs/. If you have a binary x variable, your scatterplot

will just look like “goal posts”; talk to Ben about substituting a table of t test results for the difference in means.

ECON 360: Econometrics. Ben Van Kammen: Purdue University.

Fall 2022 – Page 4

3c. Robustness part I. Build up your regression specification with explanatory variables that

either: i) shrink the error variance and improve precision of the estimates, or ii) control for

omitted factors in the error term (in the simple OLS specification). Create a table like the one in

Ben’s lecture notes, showing between 4 and 6 different specifications (1 estimate per column)

and enabling the reader to compare the $\hat{\beta}$s of interest by reading across one row.

The lower portion of the table should have a row that enables the reader to differentiate the

estimates according to what else is included in the model, like the example below. By

November 20, a polished (full points) version of “Table 2” is due in the drop box on

Brightspace. This counts toward your grade on the project (see last page). An incomplete or

poorly formatted Table will earn a “low pass” (½) or “fail” (zero points) on this part, which

signals that the student needs to revise it before the final due date.

Table 2: Label this one “Table 2: Regression Estimates” in your Write-Up

| | a | b | c | d |
|---|---|---|---|---|
| Your main x variable with units | [Simple OLS $\hat{\beta}_1$] | [Your $\hat{\beta}_1$, spec. b] | [Your $\hat{\beta}_1$, spec. c] | [Your $\hat{\beta}_1$, spec. d] |
| | (s.e.) | (s.e.) | (s.e.) | (s.e.) |
| Controls | None | [Important control var.] | [More control vars.] | All |
| Adjusted $R^2$ | | | | |

The body of the table should be as self-explanatory as possible, but any information that cannot

go in the row labels, etc., should be explained in a caption, e.g., what is included in “All”?

In the write-up:

• Devote at least 1 paragraph (each) to discussing variables in the error term that could create

omitted variable bias. State specifically what in the error term (think education and omitted

ability) is related to x (and why, theoretically, you should worry about this) and whether it

would bias $\hat{\beta}$ upward or downward. Do this for 2 different potential sources of bias.

o This might seem challenging if you haven’t taken a lot of other Econ. theory classes, but

consult your instructor or TA about your ideas.

• Discuss the set of estimates. How does $\hat{\beta}$ change with the addition of controls? Is this

consistent with controlling for omitted variables and reducing bias (see above)?

• Comment on what’s going on with $\bar{R}^2$ and standard errors as you add controls.

• Assess your level of satisfaction with how the multiple regression tackles omitted variable

bias.

o It’s okay if you are critical. Often the omitted factors are very difficult to observe and

control for in cross sectional samples.
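
When reasoning about the direction of omitted variable bias, the standard textbook result for a single omitted variable is a useful guide. The sketch below uses hypothetical notation ($z$ for the omitted factor, $\tilde{\beta}_1$ for the slope from the regression that leaves $z$ out, $\tilde{\delta}_1$ for the sample slope from regressing $z$ on $x$); none of these symbols come from the assignment itself.

```latex
\begin{align*}
\text{True model:} \quad & y = \beta_0 + \beta_1 x + \beta_2 z + u \\
\text{Regression of } z \text{ on } x\text{:} \quad & \tilde{z} = \tilde{\delta}_0 + \tilde{\delta}_1 x \\
\text{Estimate omitting } z\text{:} \quad & E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1 \\
\text{Bias:} \quad & E(\tilde{\beta}_1) - \beta_1 = \beta_2 \tilde{\delta}_1
\end{align*}
```

The bias is upward when $\beta_2$ and $\tilde{\delta}_1$ have the same sign and downward when their signs differ. In the education and ability example, ability plausibly raises earnings ($\beta_2 > 0$) and is positively related to education ($\tilde{\delta}_1 > 0$), so omitting it would bias the education coefficient upward.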


3d. Robustness part II. Extend your causal hypothesis to groups within the sample. For

example: “stricter parental ratings will have a negative effect on video game sales. But it will

have a bigger negative effect on ‘first person shooter’ style video games.” Report in a table the

results of a specification that involves interacting the x variable of interest with 1 or more other

regressors. Report the marginal effect of x for each group separately and a standard error for it.

Label this one “Table 3: Interaction Estimates”.

In the write-up:

• Explain why you think this interaction is a relevant test of the robustness of your hypothesis.

“Why should 1st person shooters be more adversely affected by ratings guidelines?” “Oh

yeah, because they tend to be more violent than other genres of games.”

• Does the group with the biggest (absolute value) effect match your hypothesis?

• Are the marginal effects statistically different between/among multiple groups? State a null

hypothesis, test it to verify this, and report the results.
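
With one binary group indicator $D$ interacted with $x$, the model is $y = \beta_0 + \beta_1 x + \beta_2 D + \beta_3 (x \cdot D) + u$. The marginal effect of $x$ is $\beta_1$ for the group with $D = 0$ and $\beta_1 + \beta_3$ for the group with $D = 1$, and the standard error of the sum needs the covariance between the two estimates. The Python sketch below shows that arithmetic; the function name and every number in it are hypothetical, and in practice the estimates, variances, and covariance would come from your Stata output.

```python
import math

def group_marginal_effect(b_x, b_inter, var_x, var_inter, cov_x_inter):
    """Marginal effect of x for the D = 1 group and its standard error.

    effect = b_x + b_inter
    Var(effect) = Var(b_x) + Var(b_inter) + 2 * Cov(b_x, b_inter)
    """
    effect = b_x + b_inter
    se = math.sqrt(var_x + var_inter + 2 * cov_x_inter)
    return effect, se

# Hypothetical estimates, for illustration only.
b_x, b_inter = -0.40, -0.25           # coefficients on x and on x*D
var_x, var_inter = 0.01, 0.04         # their estimated variances
cov_x_inter = -0.005                  # their estimated covariance

effect_d1, se_d1 = group_marginal_effect(b_x, b_inter, var_x, var_inter, cov_x_inter)
# The marginal effects differ between groups by b_inter, so the null
# hypothesis of equal effects is H0: beta_3 = 0.
t_difference = b_inter / math.sqrt(var_inter)

print(f"effect for D=0: {b_x:.3f} (s.e. {math.sqrt(var_x):.3f})")
print(f"effect for D=1: {effect_d1:.3f} (s.e. {se_d1:.3f})")
print(f"t statistic for H0: beta_3 = 0: {t_difference:.3f}")
```

Ignoring the covariance term is a common mistake; here it would overstate the standard error for the $D = 1$ group.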

3e. Diagnostics. Run the B-P and White tests for heteroskedasticity and report the results. They

don’t necessarily have to be on a table, because the code will be in your do file. If warranted, report

robust standard errors on the table in part (c) and explain this in the caption.
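
To see what a Breusch-Pagan style test is doing, the Python sketch below computes the LM ($n \cdot R^2$) form for a simple regression: regress $y$ on $x$, regress the squared residuals on $x$, and compare $n \cdot R^2$ from the second regression to a chi-squared distribution with 1 degree of freedom. This is an illustration under stated assumptions, not a description of Stata's internals; Stata's reported statistic may differ depending on the version of the test it computes. The data and function names are hypothetical.

```python
import math

def ols_fit(x, y):
    """Return (intercept, slope, R^2) from a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # If y has no variation, define R^2 as 0; also guard against rounding below 0.
    r2 = 0.0 if sst == 0 else max(0.0, 1 - ssr / sst)
    return intercept, slope, r2

def breusch_pagan_lm(x, y):
    """LM (n * R^2) statistic and chi-squared(1) p-value for one regressor."""
    b0, b1, _ = ols_fit(x, y)
    resid_sq = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    _, _, r2_aux = ols_fit(x, resid_sq)   # auxiliary regression of e^2 on x
    lm = len(x) * r2_aux
    # For 1 degree of freedom, P(chi2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value

# Hypothetical data whose error spread grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.5, -1.0, 1.5, -2.0, 2.5, -3.0, 3.5, -4.0]
y = [2 + xi + ei for xi, ei in zip(x, noise)]

lm_stat, p_val = breusch_pagan_lm(x, y)
print(f"LM statistic: {lm_stat:.3f}, p-value: {p_val:.3f}")
```

A small p-value is evidence against homoskedasticity, which is the situation in which robust standard errors are warranted.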

Run the RESET to detect functional form misspecification. Your most saturated specification in

part (3c) should include polynomial and interaction terms that, if omitted, would significantly

reduce $R^2$. Your do file and your summary of the results should include F statistics to confirm

the joint significance of these regressors.

Produce and include in the write-up the leverage-residuals plot from the full-sample specification

with the highest adjusted R squared. Are there any outliers or influential observations that

concern you? If so, your tables in parts (b-d) should probably exclude the observation(s) and

contain a note in the caption explaining your treatment of outliers. If you decide that the

observation(s) should be in the sample, explain your reasoning in the caption.

4. Overall instructions for the write-up. Organize your written summary as follows.

• Roughly 1 page containing: a statement of the causal relationship of interest, answers to the

first 2 FAQs, and a summary of the (observational) data source you use to answer FAQ #3.

• Roughly 1 page containing: the regression model specification in equation form and a

written explanation of the variables you will use in your analysis and the units, e.g.,

individuals or countries, that are observed. This is where you state hypotheses about

parameters you will test, too.

• The descriptive stats table and supporting text. Depending on the size, about 1 page.

• A figure containing the 2-D scatterplot and simple OLS line.

• The multiple regression tables (simple OLS as 1st column) and supporting text, statistics, and

diagnostics.

• A brief summary of your results. Have you accurately measured the causal relationship of

interest? Again it’s okay if you’re skeptical.


o What kind of “natural experiment” 5 would you seek out if you could spend another

semester (doesn’t that sound fun?) studying this and improving your methods?

As a minimum for a good grade, the caliber of written communication will befit a college

graduate. A paper that is incomprehensible (because of poor sentence structure, grammar,

words used out of context, subject-verb disagreement, etc.) will earn you no points. I will not (nor

will any reader) waste time trying to decipher poorly written paragraphs. I have to read over 50

papers from the class, and I reserve the right to award a failing grade to any paper that is too

hard to read for grammatical or mechanical reasons.

• If you are concerned about your writing ability, visit the writing center. 6 Get a friend,

sibling, or co-worker to read your paper and proofread it. Run spellcheck (!) and search your

paper for incorrect homonyms (spellcheck won’t find these). Do whatever it takes to avoid

handing in a poorly written paper.

• Cite any sources, including data, in the text (Author year), and include a works cited page.

• Use active voice.

• Avoid the following phrases: “I think”, “I believe”, “I feel.” You’re writing the thing; you

wouldn’t be writing it if you didn’t think it.

• Double space your text.

• Do all the other good things you learned in English composition classes.

Remember it’s your job to communicate your thoughts to the reader—not the reader’s job

to divine what you are trying to say.

On (or before!) December 10, students will turn in the following, by uploading 3 files to the

Semester Project folder on Brightspace.

1. The 7-8 page (including tables and figures) write-up of the project. It has its own folder on

D2L, which checks for plagiarism; upload it in Word (.doc or .docx) or .pdf format.

2. The (cleaned, .dta format) data set you used to produce the results.

3. The Stata do file containing the commands, in the order they appear in your write-up, that you

used to produce the regression estimates, test hypotheses, and run other tests. I should be able to

open the data set in Stata and run your do file from start to finish without any errors and reproduce your results.

#s 2 and 3 go in the same folder, which allows multiple files per student.

5 An event that is exogenous to the individuals and induces randomness in the x variable of interest. E.g., some

people live in states that pass laws banning electronic cigarettes (“e-cigs”); this alters their calculus of whether to use

e-cigs, tobacco cigarettes, or none at all, in a way that has nothing to do with their individual preferences. So some

people who would likely continue using e-cigs are induced to stop and can be compared to people in other states that

are left to their preferences.

6 http://owl.english.purdue.edu/writinglab/servicesoverview


Project Grading Rubric

The instructor evaluates students’ papers on the following criteria. Each criterion will

receive a “pass,” “low pass,” or “fail” (0 points) score (see next page).

1. Introduction (Pass=5; Low Pass=3):

a. Describes a novel and interesting empirical question

b. Adequately addresses the first 2 “FAQs” in empirical analysis

c. Clearly explains the data source and unit of observation

2. Description of methods (Pass=5; Low Pass=3):

a. Includes a regression model with the exhaustive list of controls

b. Clearly explains the variables and units in the model

c. Clearly states hypotheses that will be tested statistically

3. Tables and figures (Pass=5; Low Pass=3):

a. All assigned parts are present

b. Are well-labeled, well-formatted, easy-to-read, e.g., no log variables on T1

c. Are self-contained with informative captions

4. Empirical methods/results are (Pass=5; Low Pass=3):

a. Correct and applied consistently with in-class examples, e.g., using log forms

of variables

b. Supported by appropriate testing

c. Accompanied by Stata code, enabling the reader to reproduce the findings

5. Conclusion(s) drawn (Pass=5; Low Pass=3):

a. Explained clearly and concisely in text form

b. Are consistent with the quantitative results and principles of statistical

inference studied in class

c. Include the practical significance of the results, e.g., elasticity of y with respect

to x, when using a log-log model

6. Data set (Pass=5; Low Pass=3):

a. Unit of observation, set of variables match those specified in approved topic

proposal and requested by the instructor.

b. Has value added, e.g., intuitive variable names and/or labels, redundant

variables dropped, nonnumeric characters (like %) removed

c. Is cited and enables the reader to locate its original source(s)

7. Written communication (only “pass” or “fail”):

a. Is coherently organized (as described in the instructions)

b. Transitions smoothly from each idea to the next

c. Contains minimal proofreading/formatting/grammatical errors

d. Data and empirical results/methods are described in comprehensible language


| Item(s) | Score |
|---|---|
| 1[7 “Pass”] × 1&2 | /10 |
| 3-6 | /20 |
| 1 page proposal (approved by 9/25/22) | /2 |
| Data set presentation (by 10/30/22) | /4 |
| Table 2 (by 11/20/22) | /4 |
| Intermediate steps (subtotal) | /10 |
| Overall Score | /40 |

Overall Score = 1[Item 7 “Pass”](Score, Items 1&2) + (Score, Items 3 − 6)

+ (Points on intermediate steps)
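
The formula above means that a “fail” on Item 7 (written communication) zeroes out the points for Items 1 and 2 while leaving the rest intact. The Python sketch below works through that arithmetic; the function name, argument names, and the example scores are hypothetical.

```python
def overall_score(criteria_points, item7_pass, intermediate_points):
    """Overall score out of 40.

    criteria_points: dict mapping rubric items 1-6 to points (5, 3, or 0).
    item7_pass: True if written communication (Item 7) earns a "pass".
    intermediate_points: points for proposal, data set presentation, Table 2.
    """
    gate = 1 if item7_pass else 0                      # the indicator 1[Item 7 "Pass"]
    items_1_2 = criteria_points[1] + criteria_points[2]
    items_3_6 = sum(criteria_points[i] for i in range(3, 7))
    return gate * items_1_2 + items_3_6 + sum(intermediate_points)

# A paper that passes everything earns the maximum of 10 + 20 + 10.
all_pass = {i: 5 for i in range(1, 7)}
best = overall_score(all_pass, True, [2, 4, 4])

# The same paper with a "fail" on Item 7 loses the 10 points for Items 1 and 2.
unreadable = overall_score(all_pass, False, [2, 4, 4])

print(best, unreadable)
```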
