# Statistics Question

PART ONE

Students should complete this Research Application Activity after other work in this Module is finished.

*Students should open and read the attached document, write their responses as requested in the attached document, and submit the essay and SPSS file as an attachment to this link.*

*USE THE FOLLOWING ATTACHED DOCUMENTS: “Data Cleaning Application Essay PSYC 3430.docx” AND “Data Cleaning Application Essay Raw Data.xls”*

PART TWO

*Each discussion forum allows students to identify the most challenging or confusing concept from the current module to discuss with others in a helpful, mutually supportive manner.*

*Students are expected to identify the concept from this week’s chapter that they find the most challenging and write a brief explanation of 1) why they find the concept challenging or confusing and 2) the concept in the **student’s own words** after researching the concept in the textbook and online. Students must **cite all online sources and include a working hyperlink** to each source to receive credit for the post. At least one online source must be used for all muddiest point explanations. Posts that quote or plagiarize from the textbook and/or online sources will **not** receive credit.*

*USE THE FOLLOWING DOCUMENTS FOR THIS ASSIGNMENT*

Chapter 2

Frequency Distributions

PowerPoint Lecture Slides

Essentials of Statistics for the Behavioral Sciences

Tenth Edition

by Frederick J Gravetter, Larry B. Wallnau, and Lori-Ann B. Forzano

Learning Outcomes

1. Understand how frequency distributions are used
2. Organize data into a frequency distribution table…
3. … and into a grouped frequency distribution table
4. Know how to interpret frequency distributions
5. Organize data into frequency distribution graphs
6. Know how to interpret and understand graphs

Tools You Will Need

• Proportions (Appendix A)
  – Fractions
  – Decimals
  – Percentages
• Scales of measurement (Chapter 1)
  – Nominal, ordinal, interval, and ratio
  – Continuous and discrete variables (Chapter 1)
• Real limits (Chapter 1)

2-1 Frequency Distributions and Frequency Distribution Tables

• A frequency distribution is
  – An organized tabulation
  – Showing the number of individuals located in each category on the scale of measurement
• Can be either a table or a graph
• Always shows
  – The set of categories that make up the original measurement scale
  – A record of the frequency, or number, of individuals in each category

Frequency Distribution Tables

• Structure of a frequency distribution table
  – Categories in a column (often ordered from highest to lowest)
  – Frequency count (f) next to category (X values)
• Σf = N
• To compute ΣX (sum of the scores) from a table
  – Convert table back to original scores or
  – Compute ΣfX
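The two routes to ΣX can be checked against each other with a short sketch. This is only an illustration, not part of the slides: the function names are made up, and the X and f values are borrowed from the Learning Check 1 table in these slides.

```python
# Frequency distribution table as (X, f) pairs, highest X first.
# Values borrowed from the Learning Check 1 table; function names are illustrative.
freq_table = [(5, 2), (4, 4), (3, 1), (2, 0), (1, 3)]


def total_n(table):
    """N = Σf: add up the frequency column."""
    return sum(f for _, f in table)


def sum_of_scores(table):
    """ΣX computed as ΣfX: multiply each score by its frequency, then add."""
    return sum(x * f for x, f in table)


def expand_scores(table):
    """Convert the table back to the original list of individual scores."""
    return [x for x, f in table for _ in range(f)]


n = total_n(freq_table)
sum_x = sum_of_scores(freq_table)
scores = expand_scores(freq_table)

# Both routes give the same ΣX, and the expanded list has N entries.
print(n, sum_x, sum(scores), len(scores))  # 10 32 32 10
```

Note that a score with f = 0 contributes nothing to either Σf or ΣfX, and it disappears when the table is expanded back into individual scores.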

Proportions and Percentages

Proportions

• Measures the fraction of the total group that is associated with each score
• $\text{proportion} = p = \dfrac{f}{N}$
• Called relative frequencies because they describe the frequency (f) in relation to the total number (N)

Percentages

• Expresses relative frequency out of 100
• $\text{percentage} = p(100) = \dfrac{f}{N}(100)$
• Can be included as a separate column in a frequency distribution table

Example 2.4 Frequency, Proportion, and Percentage

| X | f | p = f/N | percent = p(100) |
|---|---|---------|------------------|
| 5 | 1 | 1/10 = .10 | 10% |
| 4 | 2 | 2/10 = .20 | 20% |
| 3 | 3 | 3/10 = .30 | 30% |
| 2 | 3 | 3/10 = .30 | 30% |
| 1 | 1 | 1/10 = .10 | 10% |
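The proportion and percentage columns of Example 2.4 can be reproduced with a few lines of Python. This is a minimal sketch: the X and f values come from the example, while the function name `relative_frequencies` is an assumption made for illustration.

```python
from fractions import Fraction

# X and f columns from Example 2.4.
example_table = [(5, 1), (4, 2), (3, 3), (2, 3), (1, 1)]


def relative_frequencies(table):
    """Return rows of (X, f, p, percent) where p = f/N and percent = p(100)."""
    n = sum(f for _, f in table)  # N = Σf
    rows = []
    for x, f in table:
        p = Fraction(f, n)  # exact fraction avoids rounding surprises
        rows.append((x, f, float(p), float(p * 100)))
    return rows


rows = relative_frequencies(example_table)
for x, f, p, pct in rows:
    print(f"X={x}  f={f}  p={p:.2f}  percent={pct:.0f}%")
```

Because every proportion shares the denominator N, the proportions add up to 1 and the percentages add up to 100, which is a quick check that no category was missed.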

Learning Check 1 (1 of 2)

• Use the frequency distribution table to determine how many subjects were in the study.

A. 10
B. 15
C. 33
D. Impossible to determine

| X | f |
|---|---|
| 5 | 2 |
| 4 | 4 |
| 3 | 1 |
| 2 | 0 |
| 1 | 3 |

Learning Check 1 – Answer (1 of 2)

• Use the frequency distribution table to determine how many subjects were in the study.

A. 10 (correct: N = Σf = 2 + 4 + 1 + 0 + 3 = 10)
B. 15
C. 33
D. Impossible to determine

| X | f |
|---|---|
| 5 | 2 |
| 4 | 4 |
| 3 | 1 |
| 2 | 0 |
| 1 | 3 |

Learning Check 1 (2 of 2)

• For the frequency distribution shown, is each of these statements True or False?
• T/F
  – More than 50% of the individuals scored above 3.
• T/F
  – The proportion of scores in the lowest category was p = 3.

| X | f |
|---|---|
| 5 | 2 |
| 4 | 4 |
| 3 | 1 |
| 2 | 0 |
| 1 | 3 |

Learning Check 1 – Answer (2 of 2)

• For the frequency distribution shown, is each of these statements True or False?
• True
  – Six out of ten individuals scored above 3 = 60% = more than half
• False
  – A proportion is a fractional part; 3 out of 10 scores = 3/10 = .3

| X | f |
|---|---|
| 5 | 2 |
| 4 | 4 |
| 3 | 1 |
| 2 | 0 |
| 1 | 3 |

2-2 Grouped Frequency Distribution Tables

• If the number of categories is very large, they are combined (grouped into intervals) to make the table easier to understand
• However, some information is lost when categories are grouped
  – Individual scores cannot be retrieved
  – The wider the grouping interval, the more information is lost

Guidelines for Constructing Grouped Frequency Distributions

• Guidelines
  – Ten or fewer class intervals is typical (but use good judgment for the specific situation)
  – The width of each interval should be a relatively simple number (e.g., 2, 5, 10, or 20)
  – The bottom score in each class interval should be a multiple of the width
  – All intervals should be the same width
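The guidelines above can be sketched in code. The example below is a rough illustration that assumes non-negative whole-number scores; the function name and the exam scores are invented for this sketch and do not come from the textbook.

```python
from collections import Counter


def grouped_frequency_table(scores, width):
    """Return (low, high, f) rows, highest interval first.

    Each interval's bottom score is a multiple of `width`, and every
    interval has the same width. Assumes non-negative integer scores.
    """
    # Map every score to the bottom of its interval, then count.
    counts = Counter((score // width) * width for score in scores)
    lowest = (min(scores) // width) * width
    highest = (max(scores) // width) * width
    rows = []
    for low in range(highest, lowest - 1, -width):
        # Intervals with no scores are still listed, with f = 0.
        rows.append((low, low + width - 1, counts.get(low, 0)))
    return rows


# Hypothetical whole-number exam scores, invented for illustration.
scores = [53, 58, 61, 67, 68, 70, 72, 74, 77, 79, 81, 84, 85, 89, 93, 97]
table = grouped_frequency_table(scores, width=10)
for low, high, f in table:
    print(f"{low}-{high}: {f}")
```

With a width of 10 these 16 scores fall into five intervals (50-59 through 90-99), which stays within the "ten or fewer" guideline. A narrower width such as 2 would produce far more rows and defeat the purpose of grouping.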

Real Limits and Frequency Distributions (1 of 3)

• Constructing either frequency distributions or grouped frequency distributions for discrete variables is uncomplicated
  – Individuals with the same recorded score had precisely the same measurements
  – The score is an exact score

Real Limits and Frequency Distributions (2 of 3)

• Constructing frequency distributions for continuous variables requires understanding that a score actually represents an interval
  – A given “score” actually could have been any value within the score’s real limits
  – The recorded value was rounded off to the middle value between the score’s real limits
  – Individuals with the same recorded score probably differed slightly in their actual measurements (the measurements are simply located in the same interval)

Real Limits and Frequency Distributions (3 of 3)

• Constructing grouped frequency distributions for continuous variables also requires understanding that a score actually represents an interval
• Consequently, grouping several scores actually requires grouping several intervals
• Apparent limits of the (grouped) class interval are always one unit smaller than the real limits of the (grouped) class interval. (Why?)
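One way to work through the "(Why?)" is to compute both sets of limits for a single interval. The sketch below assumes scores are recorded to the nearest whole unit; the function names and the 10–19 interval are chosen only for illustration.

```python
def real_limits(apparent_low, apparent_high, unit=1):
    """Real limits lie half a measurement unit beyond the apparent limits."""
    half = unit / 2
    return apparent_low - half, apparent_high + half


def interval_width(apparent_low, apparent_high, unit=1):
    """Width of a class interval, measured between its real limits."""
    lower, upper = real_limits(apparent_low, apparent_high, unit)
    return upper - lower


lower, upper = real_limits(10, 19)
width = interval_width(10, 19)
apparent_span = 19 - 10

print(lower, upper, width)  # 9.5 19.5 10.0
print(apparent_span)        # 9, one unit less than the real width
```

Each end of the interval gains half a unit when moving from apparent to real limits, so the distance between the apparent limits comes out one full unit short of the true interval width.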

Learning Check 2 (1 of 2)

• A grouped frequency distribution table has categories 0–9, 10–19, 20–29, and 30–39. What is the width of the interval 20–29?

A. 9 points
B. 9.5 points
C. 10 points
D. 10.5 points

Learning Check 2 – Answer (1 of 2)

• A grouped frequency distribution table has categories 0–9, 10–19, 20–29, and 30–39. What is the width of the interval 20–29?

A. 9 points
B. 9.5 points
C. 10 points (29.5 – 19.5 = 10)
D. 10.5 points

Learning Check 2 (2 of 2)

• Decide if each of the following statements is True or False.
• T/F
  – You can determine how many individuals had each score from a frequency distribution table.
• T/F
  – You can determine how many individuals had each score from a grouped frequency distribution.

Learning Check 2 – Answer (2 of 2)

• True
  – The original scores can be recreated from the frequency distribution table
• False
  – Only the number of individuals in the class interval is available once the scores are grouped

2-3 Frequency Distribution Graphs

• Pictures of the data organized in tables
  – All have two axes
  – X-axis (abscissa) typically has categories of the measurement scale increasing from left to right
  – Y-axis (ordinate) typically has frequencies with values increasing from bottom to top
• General principles
  – Both axes should have a value of zero (0) where they intersect
  – Height should be about ⅔ to ¾ of the length

Data Graphing Questions

• Level of measurement? (nominal, ordinal, interval, or ratio)
• Discrete or continuous data?
• Describing samples or populations?

The answers to these questions determine which is the appropriate graph

Graphs for Interval or Ratio Data (1 of 4)

• Require numerical scores (measured on an interval or ratio scale)
• Represent all scores on X-axis from minimum through maximum observed values
• Include all scores with frequency of zero
• Draw bars above each score (interval)
  – The height of the bar corresponds to the frequency for that category
  – For continuous variables, the width of the bar extends to the real limits of the category
  – For discrete variables, each bar extends exactly half the distance to the adjacent category on each side

Figure 2.3 Frequency Distribution Histogram

Graphs for Interval or Ratio Data (2 of 4)

• Grouped data: data grouped into class intervals
• Draw bars above each (grouped) class interval
  – Bar width is set by the real limits of the class interval
  – Consequence? Apparent limits are extended out one-half score unit at each end of the interval

Figure 2.4 Frequency Distribution Histogram for Grouped Data

Graphs for Interval or Ratio Data (3 of 4)

• A standard histogram can be made into an informal histogram (“block” histogram)
• Create a bar of the correct height by drawing a stack of blocks
• Each block represents one individual
• Therefore, block histograms show the frequency count in each bar
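The idea of stacking one block per individual can be mimicked in plain text. This is a rough sketch rather than a plotting recipe: the function name, the `#` block character, and the quiz frequencies are all assumptions made for illustration, and the layout only lines up for single-digit scores.

```python
def block_histogram(table, block="#"):
    """Render (X, f) pairs as text, stacking one block per individual.

    `table` lists scores in increasing X order, including any with f = 0.
    """
    tallest = max(f for _, f in table)
    lines = []
    # Build the picture from the top row down to the row just above the axis.
    for level in range(tallest, 0, -1):
        row = " ".join(block if f >= level else " " for _, f in table)
        lines.append(row.rstrip())
    # The last line is the X-axis with the score labels.
    lines.append(" ".join(str(x) for x, _ in table))
    return "\n".join(lines)


# Hypothetical quiz scores 1..5 with their frequencies.
table = [(1, 1), (2, 3), (3, 3), (4, 2), (5, 1)]
chart = block_histogram(table)
print(chart)
```

Counting the blocks in any column gives that score's frequency directly, and counting every block in the chart gives N.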

Figure 2.5 Frequency Distribution Block Histogram

Graphs for Interval or Ratio Data (4 of 4)

• Constructing a polygon
  – List all numeric scores on the X-axis
  – Include those with a frequency of f = 0
• Draw a dot above the center of each interval
  – Height of dot corresponds to frequency
  – Connect the dots with a continuous line
  – Close the polygon with lines to the Y = 0 point
• Can also be used with grouped frequency distribution data

Figure 2.6 Frequency Distribution Polygon

Figure 2.7 Frequency Distribution Polygon for Grouped Data

Graphs for Nominal or Ordinal Data

• For non-numerical scores (nominal and ordinal data), use a bar graph
  – Similar to a histogram
  – Spaces between adjacent bars indicate discrete categories
    • Without a particular order (nominal)
    • Nonmeasurable width (ordinal)

Figure 2.8 Bar Graph

Box 2.1, Figure 2.11 The Use and Misuse of Graphs

Graphs for Population Distributions

• When a population is small, scores for each member are used to construct a frequency distribution graph such as a histogram or bar graph
• When a population is large, obtaining scores for each member is not possible
  – Graphs based on relative frequencies are used
  – Graphs use smooth curves to indicate exact scores were not used
• Normal distribution
  – Symmetric with greatest frequency in the middle
  – Common data structure for many variables

Figure 2.9 Bar Graph of Relative Frequencies

Figure 2.10 The Population Distribution of IQ scores

The Shape of a Frequency Distribution

• Researchers describe a distribution’s shape in words rather than drawing it
• Symmetrical distribution: each side is a mirror image of the other
• Skewed distribution: scores pile up on one side and taper off in a tail on the other
  – Tail on the right (high scores) = positive skew
  – Tail on the left (low scores) = negative skew

Figure 2.12 Shapes for Frequency Distributions

Learning Check 3 (1 of 2)

• What is the shape of this distribution?

A. symmetrical
B. negatively skewed
C. positively skewed
D. discrete

Learning Check 3 – Answer (1 of 2)

• What is the shape of this distribution?

A. symmetrical
B. negatively skewed
C. positively skewed
D. discrete

Learning Check 3 (2 of 2)

• Decide if each of the following statements is True or False.
• T/F
  – It would be correct to use a histogram to graph parental marital status data (single, married, divorced…) from a treatment center for children.
• T/F
  – It would be correct to use a histogram to graph the time children spent playing with other children from data collected in a children’s treatment center.

Learning Check 3 – Answer (2 of 2)

• False
  – Marital status is a nominal variable; a bar graph is required
• True
  – Time is measured continuously and is an interval variable

2-4 Stem and Leaf Displays

• A simple alternative to a grouped frequency distribution table or graph
• Each score is separated into two parts: a stem and a leaf
  – The first digit (or digits) is called the stem
  – The last digit is called the leaf
• Example: X = 85 would be separated into a stem of 8 and a leaf of 5
• Every individual score can be identified
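Splitting each score into a stem and a leaf is easy to express in code. The sketch below assumes whole-number scores; the function name and the list of scores are invented for illustration.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Split each whole-number score into a stem (leading digits) and a
    leaf (last digit); return {stem: [leaves]} in order of appearance."""
    display = defaultdict(list)
    for score in scores:
        stem, leaf = divmod(score, 10)  # e.g. 85 -> stem 8, leaf 5
        display[stem].append(leaf)
    return dict(display)


# Hypothetical scores, invented for illustration.
scores = [85, 62, 78, 81, 93, 74, 66, 88]
display = stem_and_leaf(scores)

# Print the highest stem first, one row per stem.
for stem in sorted(display, reverse=True):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))

# Every individual score can be rebuilt from its stem and leaf.
rebuilt = [stem * 10 + leaf for stem, leaves in display.items() for leaf in leaves]
```

Unlike a grouped frequency distribution table, the display keeps every leaf, so no information about individual scores is lost.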

Learning Check 4 (1 of 2)

• For the scores shown in the stem and leaf display, what is the lowest score in the distribution?

A. 7
B. 15
C. 50
D. 51

| Stem | Leaves |
|------|--------|
| 9 | 374 |
| 8 | 945 |
| 7 | 7042 |
| 6 | 68 |
| 5 | 14 |

Learning Check 4 – Answer (1 of 2)

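• For the scores shown in the stem and leaf display, what is the lowest score in the distribution?

A. 7
B. 15
C. 50
D. 51 (correct: the lowest stem is 5 and its smallest leaf is 1)

| Stem | Leaves |
|------|--------|
| 9 | 374 |
| 8 | 945 |
| 7 | 7042 |
| 6 | 68 |
| 5 | 14 |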

Learning Check 4 (2 of 2)

• Decide if each of the following statements is True or False.
• T/F
  – Any frequency distribution is suitable for a stem and leaf display.
• T/F
  – A score of 54 is displayed as 5 (stem) and 4 (leaf) in a stem and leaf display.

Learning Check 4 – Answer (2 of 2)

• False
  – A stem and leaf display is a simple alternative for a grouped frequency distribution
• True
  – The first digit (5) is the stem and the last digit (4) is the leaf

Clear Your Doubts, Ask Questions

Application Essay

Module 2 – Cleaning Data Prior to Statistical Analysis

Gathering data from research participants allows psychologists and other professionals to test how effective their techniques are. Without data gathering and systematic analysis of that data, we could not be sure that new methods of helping people are any more effective than traditional ways of doing things or doing nothing at all. So, data collection and data analysis – via the statistical procedures you will learn in this course – are critical to our body of knowledge as we engage in our careers in or related to psychology.

Data collection can be a messy process. Sometimes research participants drop out of a study, misunderstand the directions for participation, or provide inaccurate information, either unintentionally (e.g., when they are asked to provide information they don’t know) or intentionally (e.g., when they try to guess the researcher’s prediction and prove it wrong). There are many ways that data could be difficult to interpret, so it is vital that the data is cleaned before analysis.

Data cleaning involves making decisions about the data that you can explain and defend to others. It requires us to carefully examine all the information that a research participant provides and verify that it makes sense. Though the information might include extreme values (e.g., a participant could be 99 years old, a participant could be 7 feet tall), the information should not include values that are impossible and were likely entered in error (e.g., a participant could not be 215 years old, a participant could not be 22 feet tall).

Also, the data should not include out-of-range values, which are scores that are not within the range of scores that were measured on a particular scale. For example, a scale assessing self-esteem may ask participants to rate how much they value themselves on a scale from 1 (not at all) to 7 (very much). A score of 6 would indicate that the participant valued themselves a lot, which could be a sign of high self-esteem; however, a score of 9 would not be possible because the measured values range only from 1 to 7. A value of 9 would be a data entry error because it is higher than the highest possible value on the scale.

In addition to checking the values that research participants provide one variable at a time (e.g., making sure the ages are plausible, then verifying there are no out-of-range values on self-esteem), a researcher also needs to verify that the responses make sense across the multiple variables that are measured for each participant. For example, a 65-year-old individual (age variable) should not claim to be currently enrolled in middle school (grade level variable), an individual who reports never having served in the military (military service variable) should not report that they served two tours in combat with the Army (combat exposure variable), and so on. Data that is logically impossible when considered across variables should be carefully scrutinized because its inclusion could alter the results of the research and render it meaningless.

To reduce the likelihood that inaccurate data may affect the statistical analysis of a research project, the researcher must critically consider the data each research participant provides and write rules regarding when data will be excluded from the analyses for the project. These rules must be applied in exactly the same manner to every research participant’s data in the data set. The researcher must be careful NOT to selectively apply these data cleaning rules to just those cases that might refute their hypotheses. The rules must be applied to every case regardless of whether that case is consistent or inconsistent with what the researcher hopes the data will reveal. Selective application of rules is a breach of research ethics.

Unfortunately, the data sets provided by textbook publishers to practice statistical analyses are already cleaned so that the prepackaged data makes sense and is logically plausible. Data collected in the real world is rarely so neat and tidy.

To provide exposure to critically evaluating research data and writing cleaning rules that will be applied to all cases in a data set prior to data analysis, students should review the hypothetical data set in the Excel file associated with this assignment. An explanation of each variable follows:

ID – This is a number assigned to each participant based on when they completed the study. #1 is the first person to complete it, #2 is the second person to complete it, and so on.

Age – This is the participant’s age in years. Because minors were not approved by the IRB for this research, all research participants must be able to legally provide their consent to participate, which requires being at least 18 years old in Texas.

Gender – This is the participant’s self-reported gender identification. Numerical codes are used for the data with 0 = cisgender female and 1 = cisgender male.

Military – This is the participant’s military affiliation. Numerical codes are used for the data with 0 = no affiliation, 1 = active duty service member, 2 = spouse of active duty service member, 3 = child of active duty service member.

Satisfaction – This is the participant’s average rating on the satisfaction with life scale. Values range from 1 (not satisfied) to 7 (very satisfied).

Intention – This value is the participant’s rating on intention to enroll in college in the upcoming fall semester. Values range from 1 (not at all likely) to 10 (extremely likely).

Marital – This value is the participant’s marital status. Numerical codes are used for the data with 1 = never married, 2 = married, 3 = divorced/widowed.

Children – This value represents the number of children under the age of 18 who live in the home with the research participant.

To complete the application assignment, students should examine each variable one at a time to identify problematic data points and write rules that would allow them to remove that data point (and others with the same problem) from future analyses. In addition, students should review the variables measured across each participant’s responses to verify that the responses are possible/logical. Students should identify and write rules to exclude any cases (and others like them) that are logically inconsistent or impossible. Then, students should practice the SPSS lessons in this module by importing the Excel file into SPSS.
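The requirement that every rule be applied identically to every case can be made concrete with a small sketch. Everything here is an illustration under stated assumptions: the single-variable ranges and codes come from the variable descriptions above, the cross-variable rule is just one possible example, the function names are invented, and the four cases are hypothetical rather than the assignment's data.

```python
# Each rule is (description, predicate). A predicate returns True when a case
# violates the rule. Ranges and codes follow the variable descriptions; the
# last (cross-variable) rule is an assumed example, not an official answer.
RULES = [
    ("Age must be at least 18", lambda c: c["Age"] < 18),
    ("Gender must be coded 0 or 1", lambda c: c["Gender"] not in (0, 1)),
    ("Military must be coded 0-3", lambda c: c["Military"] not in (0, 1, 2, 3)),
    ("Satisfaction must be 1-7", lambda c: not 1 <= c["Satisfaction"] <= 7),
    ("Intention must be 1-10", lambda c: not 1 <= c["Intention"] <= 10),
    ("Marital must be coded 1-3", lambda c: c["Marital"] not in (1, 2, 3)),
    ("Children cannot be negative", lambda c: c["Children"] < 0),
    ("A never-married participant cannot be a military spouse",
     lambda c: c["Military"] == 2 and c["Marital"] == 1),
]


def violations(case):
    """List every rule the case breaks (empty list means the case is kept)."""
    return [description for description, broken in RULES if broken(case)]


def clean(cases):
    """Apply every rule to every case; return kept cases and reasons for exclusion."""
    kept, excluded = [], {}
    for case in cases:
        problems = violations(case)
        if problems:
            excluded[case["ID"]] = problems
        else:
            kept.append(case)
    return kept, excluded


# Hypothetical cases, NOT the assignment's data.
cases = [
    {"ID": 101, "Age": 30, "Gender": 0, "Military": 0,
     "Satisfaction": 6, "Intention": 8, "Marital": 2, "Children": 1},
    {"ID": 102, "Age": 16, "Gender": 1, "Military": 0,
     "Satisfaction": 4, "Intention": 5, "Marital": 1, "Children": 0},
    {"ID": 103, "Age": 41, "Gender": 0, "Military": 0,
     "Satisfaction": 9, "Intention": 2, "Marital": 3, "Children": 2},
    {"ID": 104, "Age": 25, "Gender": 1, "Military": 2,
     "Satisfaction": 5, "Intention": 7, "Marital": 1, "Children": 0},
]

kept, excluded = clean(cases)
for case_id, reasons in excluded.items():
    print(case_id, reasons)
```

Keeping the rules in one list and looping over every case means no case can be exempted by hand, and the recorded reasons give the researcher something concrete to explain and defend.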

Students should respond to each item below in complete sentences.

1. For participant #1, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

2. For participant #2, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

3. For participant #3, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

4. For participant #4, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

5. For participant #5, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

6. For participant #6, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

7. For participant #7, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

8. For participant #8, should this individual be included in the data set for analysis? If so, why? If not, why not? Explain the information you considered for your decision. If the data should be excluded, state the data cleaning rule that could be applied to every research participant’s data in the data set. (1 point)

9. Import the Excel file into SPSS. To document this portion of the assignment, 1) students may take a screenshot of the data in the Data View in SPSS and upload the screenshot to the assignment link, or 2) students may save the data file in SPSS and upload the data file to the assignment link. (7 points)

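| ID | Age | Gender | Military | Satisfaction | Intention | Marital | Children |
|----|-----|--------|----------|--------------|-----------|---------|----------|
| 1 | 17 | 0 | 3 | 5 | 5 | 1 | 0 |
| 2 | 22 | 0 | 0 | 5 | 0 | 1 | 0 |
| 3 | 35 | 1 | 2 | 6 | 6 | 1 | 2 |
| 4 | 75 | 0 | 1 | 10 | 6 | 3 | 0 |
| 5 | 43 | 0 | 0 | 7 | 6 | 2 | 2 |
| 6 | 27 | 0 | 3 | 7 | 10 | 2 | 2 |
| 7 | 19 | 0 | 0 | 7 | 6 | 2 | 1 |
| 8 | 49 | 1 | 0 | 1 | 3 | 2 | 7 |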
