# Economics Statistics Stata Problem Set

Name:

Email ID: @psu.edu

Worked with these other students:

ECON306 Problem Set 5

INSTRUCTIONS: Solve the following questions to the best of your ability. Ask me if you do not know how to solve any of these questions before the due date. I will work with you if you are having trouble solving these.

To receive full credit for this assignment, the problem set needs to be submitted to Canvas in a single PDF document containing 1) your Stata log file converted to PDF and 2) any written explanations and answers. All of these components need to be attached together in that order. Late submissions will NOT be accepted. DO NOT email! No assignments will be accepted via email.

ECON 306 Problem Set 5, Fall 2022


## 1. Probability Replication (Boston Mortgages)

In this section, we are going to replicate the results presented in Chapter 11 of the Stock and Watson textbook. In Stata, you are used to performing an OLS regression with the command regress. You can perform a probit estimation in exactly the same way, except that you replace regress with probit. If you want to estimate a logit model, the command is logit.

You will use the data set hmda_PS5.dta for this question. You can view the data description in the file hmda.pdf, which you can download from Canvas in the Problem Set 5 assignment.
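The shared syntax of the three estimation commands can be illustrated on the auto data set that ships with Stata. This is only a sketch of the command pattern: the binary outcome foreign and the regressor mpg are stand-ins, not variables from the HMDA file, and the vce(robust) option is an assumption about how you want the standard errors computed.

```stata
* Load the example data set that ships with Stata (not the HMDA data)
sysuse auto, clear

* Linear probability model: OLS with a 0/1 dependent variable
regress foreign mpg, vce(robust)

* Probit: same syntax, different command name
probit foreign mpg, vce(robust)

* Logit: same syntax again
logit foreign mpg, vce(robust)
```

In each case the dependent variable comes first, followed by the regressors; only the command name changes.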

a) Replicate the results of the linear probability model, equation 11.1.

b) Replicate the results of the probit models, equations 11.7 and 11.8.

c) Replicate the results of the logit model, equation 11.10.

d) The LTV variable is the loan-to-value ratio: the fraction of the value of the home that is being borrowed for the mortgage. It is much less risky for banks to make a loan when the LTV is low. Starting from the probit model in equation 11.8, add variables for coapplicant, LTV, and years of schooling.

e) You will probably find that years of schooling is not statistically significant in the previous regression. That result is incorrect. Examine your data and decide how to perform another regression using the same variables in which you find the legitimate result that years of schooling is statistically significant.


## 2. Differences in Differences (Test Scores)

The data in FFT102sp15.dta come from my ECON102 class in Spring 2015. After the first midterm, I sent an email to students who failed the exam (got message=1 for these students). The email expressed my desire to help students do better and included links to resources for the course, encouragement to visit regular office hours, and a link to schedule a 1-on-1 meeting with me. I would like you to help me investigate whether this message of encouragement made a difference in these students’ grades on the midterm and final exam.

a) Calculate the difference in differences using a 4-number summary (average midterm 1 score and average midterm 2 score for students who did vs. did not get the message).

b) Find the appropriate variable in the data set and rename it “treat”.

c) Generate a variable called “post” to use in your DID regression.

d) Create the interaction term and call it “treat_post”.

e) Perform the basic DID regression like we did in class (see Mastering Metrics equation 5.3 if you don’t remember what to include).

f) Interpret the coefficients on treat, post, and treat_post.

g) Do you think it is fair to compare the students who failed the first test to those who did not? Why or why not?

h) Add the control variable dropped to your DID regression. Interpret the coefficient on dropped and comment on whether you think that leaving it out would cause omitted variable bias in the coefficients on any of the other variables. NOTE: for students who dropped the class, we only see their second midterm score to compare to the first midterm score. For students who did not drop, we can compare using their second midterm and final exam scores in the post period.
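The logic of parts a) through e) can be sketched on a small made-up data set. The eight observations below are invented for illustration and have nothing to do with FFT102sp15.dta; the variable name score is also hypothetical. Because the regression is saturated (one coefficient per cell mean), the coefficient on the interaction term equals the difference in differences computed from the four averages.

```stata
clear
* Hypothetical data: two treated and two untreated students, before and after
input id treat post score
1 1 0 50
1 1 1 65
2 1 0 54
2 1 1 67
3 0 0 80
3 0 1 82
4 0 0 84
4 0 1 88
end

* Four-number summary: mean score in each treat-by-post cell
tabulate treat post, summarize(score) means

* DID by hand: (66 - 52) - (85 - 82) = 11
display (66 - 52) - (85 - 82)

* Same number from the regression: the coefficient on treat_post is 11
generate treat_post = treat * post
regress score treat post treat_post
```

In this toy example the treated group improves by 14 points and the untreated group by 3, so the difference in differences is 11.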

Note: I actually transformed the original data I had for the class in order to make this problem easier for you. If you’re curious and want to know what it used to look like, you can type the following command into Stata:

reshape wide testpct, i(id) j(testnum)

If you want to get back, you can type reshape long or just close the data set without saving.
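To see what reshape does before trying it on the class data, here is a sketch on a tiny invented data set that borrows the variable names id, testnum, and testpct from the command above. The four rows are made up.

```stata
clear
* Hypothetical long-format data: one row per student per test
input id testnum testpct
1 1 55
1 2 70
2 1 88
2 2 91
end

* Long to wide: one row per student, one column per test
reshape wide testpct, i(id) j(testnum)
list

* Wide back to long: Stata remembers the previous reshape settings
reshape long
list
```

After reshape wide there is one row per student with columns testpct1 and testpct2; reshape long restores one row per student per test.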


## 3. Panel Data (Beer Tax)

Download the description and data set fatalityPS5.dta for this problem. We will replicate the results from Table 10.1 in Stock & Watson, columns 1-3.

a) Replicate the results of column (1) by performing a simple regression with the variables beertax and fatalityrate, treating the data as cross-sectional.

b) Let’s perform a simple panel data regression without any fixed effects. To do this, we first need to let Stata know that we are working with panel data, which is what the xtset command does: it tells Stata which variables correspond to the i and t in panel data. For this data set, state is the cross-sectional unit and year is the time period. Therefore, type xtset state year to let Stata understand how your panel data are organized. To take advantage of panel data regression, you have to use the command xtreg in Stata instead of regress.

c) Modify your previous result by asking Stata to calculate heteroskedasticity-and-autocorrelation robust standard errors by typing , vce(robust) at the end of your regression command.

d) Replicate the results of column (2) by adding state fixed effects. To do this, end your regression command with , fe vce(cluster state) in place of the options from part c). The fe tells Stata to use entity fixed effects. The vce(cluster state) tells Stata to cluster your standard errors by state (the cluster variable does not have to be the same as your cross-sectional ID; for example, if you had data on housing prices over time, you might want to cluster standard errors by neighborhood).

e) Replicate the results of column (3) by adding time fixed effects: add dummy variables for the years 1982 – 1987 to your previous specification. Comment on whether these time dummies are significant.

f) Comment on what you notice about the differences as you move one step at a time from part a) through part e) of this question.
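The sequence of commands described in parts a) through e) might look like the sketch below. It assumes that fatalityPS5.dta is in your working directory and that the variables are named fatalityrate, beertax, state, and year as in the question text; adjust the names if your file differs. Using i.year and testparm is one possible way to add and jointly test the year dummies, not necessarily the approach used in class.

```stata
use fatalityPS5.dta, clear

* a) Pooled cross-sectional regression
regress fatalityrate beertax

* b) Declare the panel structure: state is i, year is t
xtset state year

* With no model option, xtreg fits the random-effects estimator
xtreg fatalityrate beertax

* c) Robust standard errors
xtreg fatalityrate beertax, vce(robust)

* d) State (entity) fixed effects, standard errors clustered by state
xtreg fatalityrate beertax, fe vce(cluster state)

* e) Add year dummies; i.year omits one base year automatically
xtreg fatalityrate beertax i.year, fe vce(cluster state)

* Joint test that all year dummies are zero
testparm i.year
```

Note that each xtreg command carries a single comma followed by all of its options, so the options in part d) replace those in part c) rather than being appended after them.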

Stock/Watson – Introduction to Econometrics 4th Edition

THE STATE TRAFFIC FATALITY DATA SET

The data are for the “lower 48” U.S. states (excluding Alaska and Hawaii), annually for 1982 through 1988. The traffic fatality rate is the number of traffic deaths in a given state in a given year, per 10,000 people living in that state in that year. Traffic fatality data were obtained from the U.S. Department of Transportation Fatal Accident Reporting System. The beer tax is the tax on a case of beer, which is an available measure of state alcohol taxes more generally. The drinking age variables in Table 10.1 are binary variables indicating whether the legal drinking age is 18, 19, or 20. The two binary punishment variables in Table 10.1 describe the state’s minimum sentencing requirements for an initial drunk driving conviction: “Mandatory jail?” equals one if the state requires jail time and equals zero otherwise, and “Mandatory community service?” equals one if the state requires community service and equals zero otherwise. Total vehicle miles traveled annually by state was obtained from the Department of Transportation. Personal income was obtained from the U.S. Bureau of Economic Analysis, and the unemployment rate was obtained from the U.S. Bureau of Labor Statistics.

These data were graciously provided to us by Professor Christopher J. Ruhm of the Department of Economics at the University of North Carolina.

Data Series:

| Series | Description |
|---|---|
| state | State ID (FIPS) Code |
| year | Year |
| spircons | Spirits Consumption |
| unrate | Unemployment Rate |
| perinc | Per Capita Personal Income |
| emppop | Employment/Population Ratio |
| beertax | Tax on Case of Beer |
| sobapt | % Southern Baptist |
| mormon | % Mormon |
| mlda | Minimum Legal Drinking Age |
| dry | % Residing in Dry Counties |
| yngdrv | % of Drivers Aged 15-24 |
| vmiles | Ave. Mile per Driver |
| breath | Prelim. Breath Test Law |
| jaild | Mandatory Jail Sentence |
| comserd | Mandatory Community Service |
| allmort | # of Vehicle Fatalities (#VF) |
| mrall | Vehicle Fatality Rate (VFR) |
| allnite | # of Night-time VF (#NVF) |
| mralln | Night-time VFR (NVFR) |
| allsvn | # of Single VF (#SVN) |
| a1517 | #VF, 15-17 year olds |
| mra1517 | VFR, 15-17 year olds |
| a1517n | #NVF, 15-17 year olds |
| mra1517n | NVFR, 15-17 year olds |
| a1820 | #VF, 18-20 year olds |
| a1820n | #NVF, 18-20 year olds |
| mra1820 | VFR, 18-20 year olds |
| mra1820n | NVFR, 18-20 year olds |
| a2124 | #VF, 21-24 year olds |
| mra2124 | VFR, 21-24 year olds |
| a2124n | #NVF, 21-24 year olds |
| mra2124n | NVFR, 21-24 year olds |
| aidall | # of alcohol-involved VF |
| mraidall | Alcohol-Involved VFR |
| pop | Population |
| pop1517 | Population, 15-17 year olds |
| pop1820 | Population, 18-20 year olds |
| pop2124 | Population, 21-24 year olds |
| miles | Total vehicle miles (millions) |
| unus | U.S. unemployment rate |
| epopus | U.S. Emp/Pop Ratio |
| gspch | GSP Rate of Change |

©2018 Pearson Education, Inc.


THE BOSTON HMDA DATA SET

The Boston HMDA data set was collected by researchers at the Federal Reserve Bank of Boston. The data set combines information from mortgage applications and a follow-up survey of the banks and other lending institutions that received these mortgage applications. The data pertain to mortgage applications made in 1990 in the greater Boston metropolitan area. The full data set has 2925 observations, consisting of all mortgage applications by blacks and Hispanics plus a random sample of mortgage applications by whites.

To narrow the scope of the analysis in this chapter, we use a subset of the data for single-family residences only (thereby excluding data on multi-family homes) and for black and white applicants only (thereby excluding data on applicants from other minority groups). This leaves 2380 observations. Definitions of the variables used in this chapter are given in Table 11.1.

These data were graciously provided to us by Geoffrey Tootell of the Research Department of the Federal Reserve Bank of Boston. More information about this data set, along with the conclusions reached by the Federal Reserve Bank of Boston researchers, is available in the article by Alicia H. Munnell, Geoffrey M.B. Tootell, Lynne E. Browne, and James McEneaney, “Mortgage Lending in Boston: Interpreting HMDA Data,” American Economic Review, 1996, pp. 25 – 53.

Two datasets have been included on the website. HMDA_AER is the full HMDA data set used in the Munnell, Tootell, Browne and McEneaney paper. HMDA_SW contains the 2380 observations that are used in the analysis in Chapter 11. See the replication files for Chapter 11 for the variable definitions used in the chapter.


The description of the data set given below was supplied by the Federal Reserve Bank of Boston.

(list rev. 8/1/01)

Federal Reserve Bank of Boston

Research Department

General Research Data Set

FOLLOW-UP TO 1990 HOME MORTGAGE DISCLOSURE ACT (HMDA) REPORTS

LOAN/APPLICATION REGISTER (LAR)

DETAILED LIST OF VARIABLES

(Abbreviated as Question Number on HMDA Surveys)

I. (SEQ) – sequence number, unique identifier for observations

II. Original HMDA data

A. Loan Information

1. (S3) Type of Loan. Codes:
   - 1 – Conventional
3. (S4) Purpose of Loan. Codes:
   - 1 – Home purchase
   - 2 – Home improvement
   - 3 – Refinancing
   - 4 – Multifamily
4. (S5) Occupancy. Codes:
   - 1 – Owner-occupied
   - 2 – Not owner-occupied
   - 3 – Not applicable
5. (S6) Loan amount (in thousands)
6. (S7) Type of action taken. Codes:
   - 1 – Loan originated
   - 2 – Application approved but not accepted by applicant
   - 3 – Application denied
   - 4 – Application withdrawn
   - 5 – File closed for incompleteness
   - 6 – Loan purchased by institution

B. Property Location

1. (S9) MSA (Boston Metropolitan Statistical Area) number where property located
2. (S11) County where property located. Codes:
   - 1 – Suffolk
   - 0 – Other

C. Applicant Information

1. (S13) Applicant race. Codes:
   - 1 – American Indian or Alaskan Native
   - 2 – Asian or Pacific Islander
   - 3 – Black
   - 4 – Hispanic
   - 5 – White
   - 6 – Other
   - 7 – Information not provided by applicant in mail or telephone application
   - 8 – Not applicable
2. (S14) Co-applicant race (same codes as preceding variable)
3. (S15) Applicant sex. Codes:
   - 1 – Male
   - 2 – Female
   - 3 – Information not provided by applicant in mail or telephone application
   - 4 – Not applicable
4. (S16) Co-applicant sex (same codes as preceding variable)
5. (S17) Applicant income (in thousands)

D. Other Loan Information

1. (S18) Type of purchaser of loan. Codes:
   - 0 – Loan was not sold in calendar year covered by register
   - 1 – FNMA
   - 2 – GNMA
   - 3 – FHLMC
   - 4 – FMHA
   - 5 – Commercial bank
   - 6 – Savings bank or savings association
   - 7 – Life insurance company
   - 8 – Affiliate institution
   - 9 – Other type of purchaser
2. (S19A) Original HMDA report, reasons for denial. Codes:
   - 1 – Debt-to-income ratio
   - 2 – Employment history
   - 3 – Credit history
   - 4 – Collateral
   - 5 – Insufficient cash
   - 6 – Unverifiable information
   - 7 – Credit application incomplete
   - 8 – Mortgage insurance denied
   - 9 – Other

III. Follow-up Survey Data

1. (S19B, S19C, S19D) Additions or corrections to reasons for denial from Boston survey data (same codes as preceding variable)
2. (S20) Number of units in property purchased
3. (S23A) Marital status of applicant. Codes:
   - M – Married
   - U – Unmarried (includes single, divorced and widowed)
   - S – Separated
4. (S24A) Number of dependents claimed by applicant
5. (S25A) Years employed in applicable line of work
6. (S26A) Years employed on applicable job
7. (S27A) Self-employed applicant. Codes:
   - 0 – Not self-employed
   - 1 – Self-employed
8. (S30A) Base employment monthly income of applicant (in dollars)
9. (S30C) Base employment monthly income of coapplicant (in dollars)
10. (S31A) Total monthly income of applicant (in dollars)
11. (S31C) Total monthly income of coapplicant (in dollars)
12. (S32) Proposed monthly housing expense (in dollars)
13. (S33) Purchase price (in thousands)
14. (S34) Other financing (in thousands)
15. (S35) Liquid assets (in thousands; applicant and coapplicant data were summed if separate statements were completed)
16. (S39) Number of commercial credit reports in loan file
17. (S40) Applicants’ credit history meets loan policy guidelines for approval. Codes:
   - 0 – No
   - 1 – Yes
18. (S41) Number of separate consumer credit lines on credit reports
19. (S42) Credit history – mortgage payments. Codes:
   - 1 – No late mortgage payments
   - 2 – No mortgage payment history
   - 3 – One or two late mortgage payments
   - 4 – More than two late mortgage payments
20. (S43) Credit history – consumer payments. Codes:
   - 1 – No “slow pay” or delinquent accounts, but sufficient references for determination
   - 2 – One or two “slow pay” account(s) (each with one or two payments 30 days past due)
   - 3 – More than two “slow pay” accounts (each with one or two payments 30 days past due); or one or two chronic “slow pay” account(s) (with three or more payments 30 days past due in any 12-month period)
   - 4 – Insufficient credit history or references for determination
   - 5 – Delinquent credit history (containing account(s) with a history of payments 60 days past due)
   - 6 – Serious delinquencies (containing account(s) with a history of payments 90 days past due)

21. (S44) Credit history – public records:
   - 0 – Information not considered
   - 0 – No public record defaults
   - 1 – Bankruptcy
   - 1 – Bankruptcy and charge offs
   - 1 – One or two charge-off(s), public record(s), or collection action(s), totaling less than $300
   - 1 – Charge-off(s), public record(s), or collection action(s) totaling more than $300
22. (S45) Debt-to-income ratio (the banks’ calculation of housing expense/income)
23. (S46) Debt-to-income ratio (the banks’ calculation of total obligations/income)
24. (S47) Fixed or adjustable rate loan (F or A). Codes:
   - 1 – Adjustable
   - 2 – Fixed
   - 3 – Not Available
25. (S48) Term of loan (months)
26. (S49) Special loan application program
27. (S50) Appraised value (in thousands)
28. (S51) Type of property purchased. Codes:
   - 1 – Condominium
   - 2 – Single family
   - 3 – 2 to 4 families
29. (S52) Private mortgage insurance (PMI) sought? Codes:
   - 0 – No or information not available
   - 1 – Yes
30. (S53) Private mortgage insurance (PMI) denied? Codes:
   - 0 – PMI approved, did not apply, or information not available
   - 1 – PMI sought and denied
31. (S54) Was a gift or grant as part of down payment? Codes:
   - 0 – No or information not available
   - 1 – Yes
32. (S55) Was there a co-signer for the application? Codes:
   - 0 – No or information not available
   - 1 – Yes
33. (S56) Unverifiable information. Codes:
   - 0 – Not applicable (all verifiable)
   - 1 – Some information unverifiable
34. (S57) Number of times application was reviewed by underwriter

III. Variables Added for Analysis, taken from the Census Survey

1. (netw) Net worth (Total assets – Total liabilities)***
2. (uria) Probability of unemployment by industry
3. (rtdum) Minority population share in tract. Codes:
   - 0 – if ≤ 0.30
   - 1 – if > 0.30
4. (bd) Boarded-up value of tract. Codes:
   - 0 – if ≤ MSA median
   - 1 – if > MSA median
5. (mi) Median tract income. Codes:
   - 0 – if ≤ MSA median
   - 1 – if > MSA median
6. (old) Applicant age. Codes:
   - 0 – if ≤ MSA median
   - 1 – if > MSA median
   - 2 – missing
7. (vr) Tract vacancy. Codes:
   - 0 – if ≤ MSA median
   - 1 – if > MSA median
8. (school) Years of education
9. (chvalc) Change in median value of property in a given tract, 1980-1990

IV. Dummy variables created from HMDA data

1. (dnotown) Owner occupied property:
   - 0 – Owner occupied
   - 1 – Not owner occupied, or information not available
2. (dprop) Type of property:
   - 0 – Condominium or single family
   - 1 – 2-4 families

Notes:

1. 999,999.4 is used in the database to signify missing observations in numerical columns.

2. NA is used to signify missing observations in character columns.
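Because Stata treats 999,999.4 as an ordinary number unless told otherwise, a sentinel value like this one has to be recoded to Stata's missing value (.) before the variable is used in a regression. The sketch below shows one way to do that on an invented four-row data set; the variable name school is borrowed from the list above, but the values are made up.

```stata
clear
* Hypothetical data: the third row uses the missing-value sentinel
input id school
1 12
2 16
3 999999.4
4 14
end

* Before recoding, the sentinel inflates the mean
summarize school

* Recode the sentinel to Stata's missing value
replace school = . if school > 999999 & !missing(school)

* After recoding, only the three valid observations are used
summarize school
```

The comparison uses a range rather than == because 999999.4 cannot be stored exactly in Stata's default float type, so an exact equality test against the literal would not match.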

*** Applicant and coapplicant combined

