# University of Virginia Variance in Perceived Stress Scores Questions

Assignment Introduction

The data set for this assignment, survey.sav, was designed to explore the factors that affect respondents’ psychological adjustment and well-being. You will explore the impact of the respondents’ perceptions of control on their levels of perceived stress.

There are two different measures of control:

• Mastery Scale – degree to which people feel they have control over the events in their lives
• Perceived Control of Internal States Scale (PCOISS) – degree to which people feel they have control over their internal states, emotions, thoughts, and physical reactions

In this assignment, you are interested in exploring how well the Mastery Scale and the PCOISS are able to predict scores on a measure of perceived stress.

Data Set

survey.sav

The data set for this assignment is an SPSS file (survey.sav). JASP should have no trouble opening this file. Remember to load JASP, then use the JASP menus to open the file from the location where you saved it to your computer.

Instructions

This course is built to prepare you to write a Chapter Four, and this assignment will work in that direction on a very small scale. Follow the instructions below:

1. Start with an intro paragraph (2-3 sentences) about what the study is and describe the data set.

2. You will then address the research questions one at a time. Give an introductory statement or two about which test was chosen and why. Then, discuss the finding(s).

3. For this assignment, it is not necessary to write all the hypotheses.

4. So you will have the following headings:

a. Introduction

b. Research Question 1

c. Research Question 2

d. Summary

e. Appendix A

5. Finally, you will have “Appendix A”, on a new, separate page, with your output table(s).

Directions for responding to a Research Question

i. An introductory sentence about which test(s) will be used and what indicated that it was the correct test.
ii. A stated null and alternative hypothesis.
iii. The output from JASP (you will copy and paste the output table in Appendix A).
iv. A conclusion in paragraph form that includes the decision you made, APA-formatted reporting of your statistics, and a final statement about your findings as they relate to the research question (2-3 sentences is fine).

Research Questions

1. How well do the two measures of control (mastery, PCOISS) predict perceived stress? This is addressed by finding out how much of the variance in perceived stress scores is explained by these two scales.

2. Which is the better predictor of perceived stress: control of external events (Mastery Scale) or control of internal states (PCOISS)?
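Both research questions are commonly answered with a standard multiple regression (in JASP, Regression > Linear Regression): R² speaks to question 1 and the standardized coefficients (β) speak to question 2. As a minimal sketch of what those two numbers mean, assuming exactly two predictors, the Python below derives R² and both β values from the three pairwise correlations. The function names and the small data lists are hypothetical placeholders, not the survey.sav variables, and the sketch does not reproduce the significance tests that JASP reports.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def two_predictor_regression(x1, x2, y):
    """Return (R squared, beta1, beta2) for y regressed on x1 and x2.

    The standardized coefficients are computed from the pairwise correlations,
    which is algebraically the same as least squares on z-scored variables.
    """
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    denom = 1 - r_12 ** 2  # zero only if the two predictors are perfectly collinear
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2  # proportion of variance explained
    return r_squared, beta1, beta2


# Hypothetical scores for eight respondents (not taken from survey.sav)
mastery = [18, 22, 25, 20, 28, 15, 24, 19]
pcoiss = [50, 58, 66, 55, 70, 45, 60, 52]
stress = [30, 25, 18, 27, 14, 35, 20, 29]

r2, beta_mastery, beta_pcoiss = two_predictor_regression(mastery, pcoiss, stress)
print(f"R squared = {r2:.3f}")
print(f"beta (mastery) = {beta_mastery:.3f}, beta (PCOISS) = {beta_pcoiss:.3f}")
```

The predictor with the larger absolute β makes the stronger unique contribution to explaining perceived stress once the other predictor is held constant, which is the usual way question 2 is read from the JASP coefficients table.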

5th Edition JASP v0.16.1 2022

Copyright © 2020 by Mark A Goss-Sampson.

Licenced as CC BY 4.0

All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the author except for research, education or private study.

CONTENTS

PREFACE
USING THE JASP ENVIRONMENT
DATA HANDLING IN JASP
JASP ANALYSIS MENU
DESCRIPTIVE STATISTICS
SPLITTING DATA FILES
DESCRIPTIVE DATA VISUALISATION
BASIC PLOTS
CUSTOMISABLE PLOTS
EDITING PLOTS
EXPLORING DATA INTEGRITY
DATA TRANSFORMATION
EFFECT SIZE
ONE SAMPLE T-TEST
BINOMIAL TEST
MULTINOMIAL TEST
CHI-SQUARE ‘GOODNESS-OF-FIT’ TEST
MULTINOMIAL AND χ² ‘GOODNESS-OF-FIT’ TEST
COMPARING TWO INDEPENDENT GROUPS
INDEPENDENT T-TEST
MANN-WHITNEY U TEST
COMPARING TWO RELATED GROUPS
PAIRED SAMPLES T-TEST
WILCOXON’S SIGNED RANK TEST
CORRELATION ANALYSIS
REGRESSION
SIMPLE REGRESSION
MULTIPLE REGRESSION
LOGISTIC REGRESSION
COMPARING MORE THAN TWO INDEPENDENT GROUPS
ANOVA
KRUSKAL-WALLIS – NON-PARAMETRIC ANOVA
COMPARING MORE THAN TWO RELATED GROUPS
RMANOVA
FRIEDMAN’S REPEATED MEASURES ANOVA
COMPARING INDEPENDENT GROUPS AND THE EFFECTS OF COVARIATES
ANCOVA
TWO-WAY INDEPENDENT ANOVA
TWO-WAY REPEATED MEASURES ANOVA
MIXED FACTOR ANOVA
CHI-SQUARE TEST FOR ASSOCIATION
META-ANALYSIS IN JASP
EXPERIMENTAL DESIGN AND DATA LAYOUT IN EXCEL FOR JASP IMPORT
Independent t-test
Paired samples t-test
Correlation
Logistic Regression
One-way Independent ANOVA
One-way repeated measures ANOVA
Two-way Independent ANOVA
Two-way Repeated measures ANOVA
Two-way Mixed Factor ANOVA
Chi-squared – Contingency tables
SOME CONCEPTS IN FREQUENTIST STATISTICS
WHICH TEST SHOULD I USE?
Comparing one sample to a known or hypothesized population mean
Testing relationships between two or more variables
Predicting outcomes
Testing for differences between two independent groups
Testing for differences between two related groups
Testing for differences between three or more independent groups
Testing for differences between three or more related groups
Test for interactions between 2 or more independent variables

PREFACE

JASP stands for Jeffreys’s Amazing Statistics Program in recognition of the pioneer of Bayesian inference, Sir Harold Jeffreys. This is a free, multi-platform, open-source statistics package, developed and continually updated by a group of researchers at the University of Amsterdam. They aimed to develop a free, open-source programme that includes both standard and more advanced statistical techniques, with a major emphasis on providing a simple, intuitive user interface.

In contrast to many statistical packages, JASP provides a simple drag-and-drop interface, easy-access menus and intuitive analysis with real-time computation and display of all results. All tables and graphs are presented in APA format and can be copied directly and/or saved independently. Tables can also be exported from JASP in LaTeX format.

JASP can be downloaded free from the website https://jasp-stats.org/ and is available for Windows, Mac OS X and Linux. You can also download a pre-installed Windows version that will run directly from a USB or external hard drive without the need to install it locally. The WIX installer for Windows enables you to choose a path for the installation of JASP; however, this may be blocked in some institutions by local administrative rights.

The programme also includes a data library with an initial collection of over 50 datasets from Andy Field’s book, Discovering Statistics Using IBM SPSS Statistics¹, and Introduction to the Practice of Statistics² by Moore, McCabe and Craig.

Keep an eye on the JASP site, since there are regular updates as well as helpful videos and blog posts!

This book is a collection of standalone handouts covering the most common standard (frequentist) statistical analyses used by students studying Human Sciences. Datasets used in this document are available for download from https://osf.io/bx6uv/

I would also like to thank Per Palmgren from the Karolinska Institutet in Sweden for his helpful comments, suggestions and proofreading of this guide.

Dr Mark Goss-Sampson

Centre for Exercise Activity and Rehabilitation

University of Greenwich

2022

¹ Field, A. (2017). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE Publications.

² Moore, D., McCabe, G., & Craig, B. (2011). Introduction to the Practice of Statistics (7th ed.). W. H. Freeman.

1|Page

JASP 0.16.1 – Dr Mark Goss-Sampson

USING THE JASP ENVIRONMENT

Open JASP.

The main menu can be accessed by clicking on the top-left icon.

Open:

JASP has its own .jasp format but can open a variety of different dataset formats such as:

• .csv (comma-separated values), which can be saved in Excel
• .txt (plain text), which can also be saved in Excel
• .tsv (tab-separated values), which can also be saved in Excel
• .sav (IBM SPSS data file)
• .ods (Open Document spreadsheet)
• .dta (Stata data file)
• .por (SPSS ASCII file)
• .sas7bdat / .sas7bcat (SAS data files)
• .xpt (SAS transport file)

You can open recent files, browse your computer files, access the Open Science Framework (OSF) or open the wide range of examples that are packaged with the Data Library in JASP.


Save/Save as:

Using these options, the data file, any annotations and the analysis can be saved in the .jasp format.

Export:

Results can be exported as either an HTML file or a PDF.

Data can be exported to a .csv, .tsv or .txt file.

Sync data:

Used to synchronize with any updates in the current data file (Ctrl-Y can also be used).

Close:

As it states, it closes the current file but not JASP.

Preferences:

There are four sections that users can use to tweak JASP to suit their needs.


In the Data Preferences section users can:

• Synchronize/update the data automatically when the data file is saved (default)
• Set the default spreadsheet editor (e.g., Excel, SPSS)
• Change the threshold so that JASP more readily distinguishes between nominal and scale data
• Add a custom missing value code

In the Results Preferences section users can:

• Set JASP to return exact p values, i.e. P = 0.00087 rather than P < .001

Export Results.

The Add notes menu provides many options to change text font, colour, size, etc.

You can change the size of all the tables and graphs using Ctrl+ (increase), Ctrl- (decrease) and Ctrl= (back to default size). Graphs can also be resized by dragging the bottom-right corner of the graph.

As previously mentioned, all tables and figures are APA standard and can just be copied into any other document. All images can be copied/saved with either a white or a transparent background; this can be selected in Preferences > Advanced as described earlier.

There are many further resources on using JASP on the website https://jasp-stats.org/


DATA HANDLING IN JASP

For this section, open England injuries.csv.

All files must have a header label in the first row. Once loaded, the dataset appears in the window.

For large datasets, there is a hand icon that allows easy scrolling through the data.

On import, JASP makes a best guess at assigning data to the different variable types:

• Nominal
• Ordinal
• Continuous

If JASP has incorrectly identified the data type, just click on the appropriate variable data icon in the column title to change it to the correct format.

If you have coded the data, you can click on the variable name to open up the following window, in which you can label each code. These labels now replace the codes in the spreadsheet view. If you save this as a .jasp file, these codes, as well as all analyses and notes, will be saved automatically. This makes the data analysis fully reproducible.


In this window, you can also carry out simple filtering of data; for example, if you untick the Wales label, it will not be used in subsequent analyses.

Clicking this icon in the spreadsheet window opens up a much more comprehensive set of data filtering options.

This option will not be covered in this document. For detailed information on using more complex filters, refer to the following link: https://jasp-stats.org/2018/06/27/how-to-filter-your-datain-jasp/


By default, JASP plots data in the Value order (i.e. 1-4). The order can be changed by highlighting the label and moving it up or down using the appropriate arrows. The available controls are Move up, Move down, Reverse order and Close.

If you need to edit the data in the spreadsheet, just double-click on a cell and the data should open up in the original spreadsheet, e.g. Excel. Once you have edited your data and saved the original spreadsheet, JASP will automatically update to reflect the changes that were made, provided that you have not changed the file name.


JASP ANALYSIS MENU

The main analysis options can be accessed from the main toolbar. Currently, JASP offers the following frequentist (parametric and non-parametric standard statistics) and alternative Bayesian tests:

Descriptives

• Descriptive stats

T-Tests

• Independent

• Paired

• One sample

ANOVA

• Independent

• Repeated measures

• ANCOVA

• MANOVA *

Mixed Models*

• Linear Mixed Models

• Generalised linear mixed models

Regression

• Correlation

• Linear regression

• Logistic regression

Frequencies

• Binomial test

• Multinomial test

• Contingency tables

• Log-linear regression*

Factor

• Principal Component Analysis (PCA)*

• Exploratory Factor Analysis (EFA)*

• Confirmatory Factor Analysis (CFA)*

* Not covered in this guide

By clicking on the + icon on the top-right menu bar, you can also access advanced options that allow the addition of optional modules. Once ticked, they will be added to the main analysis ribbon. These include:

• Audit
• Network analysis
• BAIN
• Prophet
• Circular Statistics
• Reliability analysis
• Distributions
• SEM
• Cochrane Meta-Analyses
• Summary statistics
• Equivalence tests
• Visual modelling
• JAGS
• Learning Bayes
• Machine learning
• R (beta)
• Meta-analysis (included in this guide)

See the JASP website for more information on these advanced modules.


Once you have selected your required analysis, all the possible statistical options appear in the left window and the output in the right window.

JASP provides the ability to rename and ‘stack’ the results output, thereby organising multiple analyses.

The individual analyses can be renamed using the pen icon or deleted using the red cross.

Clicking on the analysis in this list will then take you to the appropriate part of the results output window. They can also be rearranged by dragging and dropping each of the analyses.

The green + icon produces a copy of the chosen analysis.

The blue information icon provides detailed information on each of the statistical procedures used and includes a search option.


DESCRIPTIVE STATISTICS

Presenting all the raw data makes it very difficult for a reader to visualise the data or draw any inference from them. Descriptive statistics and related plots are a succinct way of describing and summarising data but do not test any hypotheses. There are various types of statistics that are used to describe data:

• Measures of central tendency
• Measures of dispersion
• Percentile values
• Measures of distribution
• Descriptive plots

To explore these measures, load Descriptive data.csv into JASP. Go to Descriptives > Descriptive statistics and move the Variable data to the Variables box on the right.

You also have options to change and add tables in this section:

• Split analyses by a categorical variable (e.g., group)
• Transpose the main descriptive table (switch columns and rows)
• Add frequency tables, which are important for categorical data
• Add stem-and-leaf tables, which show all numeric observations from smallest to largest. The observations are split into a “stem”, the first digit(s), and a “leaf”, the subsequent digit.
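The stem-and-leaf idea can be sketched in a few lines of Python, assuming non-negative whole numbers where the stem is the tens digit and the leaf is the units digit. The function name and the data are illustrative only, and JASP may choose a different stem width for other data. Stems with no observations are simply omitted here, whereas a full display would normally list them with an empty leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers into stems (tens) and leaves (units)."""
    table = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 37 -> stem 3, leaf 7
        table[stem].append(leaf)
    return dict(table)


for stem, leaves in stem_and_leaf([12, 15, 21, 23, 23, 37]).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# 1 | 2 5
# 2 | 1 3 3
# 3 | 7
```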


The Statistics menu can now be opened to see the various options available.

CENTRAL TENDENCY

This can be defined as the tendency for variable values to cluster around a central value. The three ways of describing this central value are the mean, median or mode. If the whole population is considered, the terms population mean/median/mode are used. If a sample/subset of the population is being analysed, the terms sample mean/median/mode are used. The measures of central tendency move toward a constant value when the sample size is sufficient to be representative of the population.

The mean, M or x̅ (17.71), is equal to the sum of all the values divided by the number of values in the dataset, i.e. the average of the values. It is used for describing continuous data. It provides a simple statistical model of the centre of the distribution of the values and is a theoretical estimate of the ‘typical value’. However, it can be influenced heavily by ‘extreme’ scores.


The median, Mdn (17.9), is the middle value in a dataset that has been ordered from the smallest to the largest value and is the normal measure used for ordinal or non-parametric continuous data. It is less sensitive to outliers and skewed data than the mean.

The mode (20.0) is the most frequent value in the dataset and is usually the highest bar in a distribution histogram.
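A quick way to see how the mean, median and mode react to an extreme score is Python's standard statistics module. The five values below are made up for illustration: the single large value pulls the mean upwards but leaves the median and mode unchanged.

```python
import statistics

scores = [1, 2, 2, 3, 10]  # hypothetical data with one extreme score

print(statistics.mean(scores))    # 3.6, pulled upwards by the 10
print(statistics.median(scores))  # 2, the middle of the ordered values
print(statistics.mode(scores))    # 2, the most frequent value
```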

DISPERSION

The standard error of the mean, SE (0.244), is a measure of how far the sample mean of the data is expected to be from the true population mean. As the sample grows larger, the SE decreases relative to S, and the true mean of the population is known with greater precision.

Standard deviation, S or SD (6.935), is used to quantify the amount of dispersion of data values around the mean. A low standard deviation indicates that the values are close to the mean, while a high standard deviation indicates that the values are dispersed over a wider range.

The coefficient of variation (0.392) is the standard deviation divided by the mean and provides the relative dispersion of the data, in contrast to the standard deviation, which gives the absolute dispersion.

MAD (4.7), the median absolute deviation, is a robust measure of the spread of data. It is relatively unaffected by data that is not normally distributed. Reporting median ± MAD for data that is not normally distributed is equivalent to reporting mean ± SD for normally distributed data.

MAD Robust (6.968) is the median absolute deviation of the data points, adjusted by a factor for asymptotically normal consistency.

IQR (9.175), the interquartile range, is similar to the MAD but is less robust (see Boxplots).

Variance (48.1) is another estimate of how far the data is spread from the mean. It is the square of the standard deviation.
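The dispersion measures above can be reproduced with Python's statistics module. This is a sketch under two assumptions: the SD and variance use the sample (n - 1) formula, and "MAD Robust" is the raw MAD multiplied by the usual normal-consistency constant 1.4826. The guide's own figures are consistent with both: 6.935 / 17.71 ≈ 0.392, 6.935² ≈ 48.1 and 4.7 × 1.4826 ≈ 6.968. The function name and the example data are hypothetical.

```python
import math
import statistics


def dispersion_summary(values):
    """Common dispersion measures for a list of numbers (hypothetical helper)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return {
        "sd": sd,
        "variance": statistics.variance(values),  # equals sd ** 2
        "se": sd / math.sqrt(n),                   # standard error of the mean
        "cv": sd / mean,                           # relative dispersion
        "mad": mad,
        "mad_robust": 1.4826 * mad,                # scaled so it estimates the SD for normal data
    }


summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```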

Confidence intervals (CIs), although not shown in the general Descriptive statistics output, are used in many other statistical tests. When sampling from a population to get an estimate of the mean, a confidence interval is a range of values within which you are n% confident the true mean is included. A 95% CI is, therefore, a range of values that one can be 95% confident contains the true mean of the population. This is not the same as a range that contains 95% of ALL the values.


For example, in a normal distribution, 95% of the data are expected to be within ± 1.96 SD of the mean and 99% within ± 2.576 SD.

95% CI = M ± 1.96 * the standard error of the mean.

Based on the data so far, M = 17.71, SE = 0.24, this will be 17.71 ± (1.96 * 0.24) or 17.71 ± 0.47.

Therefore, the 95% CI for this dataset is 17.24 – 18.18, and we can be 95% confident that the true mean lies within this range.
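The worked example above can be checked with a short Python function. It is a sketch of the normal-approximation interval exactly as written (mean ± 1.96 × SE); for small samples, a critical value from the t distribution would normally replace 1.96. The function name is illustrative.

```python
def ci95(mean, se, z=1.96):
    """Normal-approximation confidence interval: mean +/- z * SE."""
    margin = z * se
    return mean - margin, mean + margin


low, high = ci95(17.71, 0.24)  # M and SE from the Descriptives output above
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 17.24 to 18.18
```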

QUARTILES

In the Statistics options make sure that everything is unticked apart from Quartiles.

Quartiles split a dataset into 4 equal quarters, normally based on the rank ordering of the values. For example, take this dataset of 21 ordered values:

1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 8, 9, 10, 10, 10

The median value that splits the data by 50% = 50th percentile = 5

The median value of the lower half = 25th percentile = 3

The median value of the upper half = 75th percentile = 8

From this, the interquartile range (IQR) can be calculated; this is the difference between the 75th and 25th percentiles, i.e. 8 – 3 = 5. These values are used to construct the descriptive boxplots later. The IQR can also be shown by ticking this option in the Dispersion menu.
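The "median of each half" procedure can be sketched in Python as follows; the function name is hypothetical. Software packages often use an interpolating quantile rule instead, which can give slightly different quartiles on other data, but for these 21 values both approaches give 3, 5 and 8.

```python
import statistics


def quartiles_by_halves(values):
    """Q1, median and Q3 using the 'median of each half' method.

    For an odd number of values, the overall median is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


values = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 8, 9, 10, 10, 10]
q1, q2, q3 = quartiles_by_halves(values)
print(q1, q2, q3, q3 - q1)  # 3.0 5 8.0 5.0
```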


DISTRIBUTION

Skewness describes the shift of the distribution away from a normal distribution. Negative skewness shows that the mode moves to the right, resulting in a dominant left tail. Positive skewness shows that the mode moves to the left, resulting in a dominant right tail.

(Figure: negative skewness and positive skewness)

Kurtosis describes how heavy or light the tails are. Positive kurtosis results in an increase in the “pointiness” of the distribution with heavy (longer) tails, while negative kurtosis exhibits a much more uniform or flatter distribution with light (shorter) tails.

(Figure: distributions with + kurtosis, normal and – kurtosis)

In the Statistics options, make sure that everything is unticked apart from Skewness, Kurtosis and Shapiro-Wilk test.


We can use the Descriptives output to calculate skewness and kurtosis. For a normal data distribution, both values should be close to zero. The Shapiro-Wilk test is used to assess if the data is significantly different from a normal distribution (see Exploring Data Integrity for more details).

SPLITTING DATA FILES

If there is a grouping variable (categorical or ordinal), descriptive statistics and plots can be produced for each group. Using Descriptive data.csv with the variable data in the Variables box, now add Group to the Split box.


DESCRIPTIVE DATA VISUALISATION

JASP produces a comprehensive range of descriptive and analysis-specific plots. The analysis-specific plots will be explained in their relevant chapters.

BASIC PLOTS

Firstly, to look at examples of the basic plots, open Descriptive data.csv with the variable data in the Variables box, go to Plots and tick Distribution plots, Display density, Interval plots, Q-Q plots, and Dot plots.

The Distribution plot is based on splitting the data into frequency bins; this is then overlaid with the distribution curve. As mentioned before, the highest bar is the mode (the most frequent value of the dataset). In this case, the curve looks approximately symmetrical, suggesting that the data is approximately normally distributed. The second distribution plot is from another dataset, which shows that the data is positively skewed.


The dot plot displays the distribution where each dot represents a value. If a value occurs more than once, the dots are placed one above the other so that the height of the column of dots represents the frequency for that value.

The interval plot shows a 95% confidence interval for the mean of each variable.

The Q-Q plot (quantile-quantile plot) can be used to visually assess if a set of data comes from a normal distribution. Q-Q plots take the sample data, sort it in ascending order, and then plot it against quantiles (percentiles) calculated from a theoretical distribution. If the data is normally distributed, the points will fall on or close to the 45-degree reference line. If the data is not normally distributed, the points will deviate from the reference line.
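The pairing behind a Q-Q plot can be sketched with Python's statistics.NormalDist. Two assumptions are made here: the (i - 0.5) / n plotting positions are one common convention, and the observations are standardised to z-scores so that normally distributed data should lie near the 45-degree line. JASP's exact choices may differ, and the function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each standardised, sorted observation with a standard-normal quantile.

    Returns (theoretical quantile, observed z-score) tuples, ready to plot
    with the theoretical quantiles on the x-axis.
    """
    data = sorted(values)
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()  # mean 0, SD 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), (x - m) / s)
        for i, x in enumerate(data, start=1)
    ]


for theoretical, observed in qq_points([4.1, 5.0, 5.2, 5.9, 6.3, 7.4]):  # hypothetical data
    print(f"{theoretical:+.2f}  {observed:+.2f}")
```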


Depending on the data sets, basic correlation graphs and pie charts for non-scale data can also be produced.

CUSTOMISABLE PLOTS

There are a variety of options depending on your datasets.


The boxplots visualise several statistics described above in one plot:

• Median value
• 25 and 75% quartiles
• Interquartile range (IQR), i.e., 75% – 25% quartile values
• Maximum and minimum values plotted with outliers excluded
• Outliers are shown if requested

(Figure: annotated boxplot showing an outlier, the maximum value, top 25%, 75% quartile, median value, IQR, 25% quartile, bottom 25% and minimum value)

Go back to the statistics options. In Descriptive plots, tick both Boxplot and Violin Element, and look at how the plot has changed. Next, tick Boxplot, Violin and Jitter Elements. The Violin plot has taken the smoothed distribution curve from the Distribution plot, rotated it 90° and superimposed it on the boxplot. The jitter plot has further added all the data points.

(Figure: Boxplot + Violin plot; Boxplot + Violin + Jitter plot)


If your data is split by group, for example, the boxplots for each group will be shown on the same graph, and the colours of each will be different if the Colour palette is ticked. Five colour palettes are available.

(Figure: ggplot2 palette and Viridis palette examples)

Scatter Plots

JASP can produce scatterplots of various types and can include smooth or linear regression lines. There are also options to add distributions to these, either in the form of density plots or histograms.


Tile Heatmap

These plots provide an alternative way of visualising data. For example, the Titanic survival dataset can be used to look at the relationship between the class of passage and survival.


EDITING PLOTS

Clicking on the drop-down menu provides access to a range of options, including Edit Image. Selecting this option provides some customisation for each graph.

This will open the plot in a new window, which allows some modifications of each axis in terms of axis title and range.

Any changes are then updated in the results window. The new plot can be saved as an image or can be reset to default values.

Do not forget that group labels can be changed in the spreadsheet editor.


EXPLORING DATA INTEGRITY

Sample data is used to estimate parameters of the population, whereby a parameter is a measurable characteristic of a population, such as a mean, standard deviation, standard error or confidence interval.

What is the difference between a statistic and a parameter? If you randomly polled a selection of students about the quality of their student bar and found that 75% of them were happy with it, that is a sample statistic, since only a sample of the population was asked. You calculated what the population was likely to do based on the sample. If you asked all the students in the university and 90% were happy, you have a parameter, since you asked the whole university population.

Bias can be defined as the tendency of a measurement to over- or underestimate the value of a population parameter. There are many types of bias that can appear in research design and data collection, including:

• Participant selection bias – some being more likely to be selected for study than others
• Participant exclusion bias – due to the systematic exclusion of certain individuals from the study
• Analytical bias – due to the way that the results are evaluated

However, statistical bias can affect a) parameter estimates, b) standard errors and confidence intervals or c) test statistics and p values. So how can we check for bias?

IS YOUR DATA CORRECT?

Outliers are data points that lie abnormally far outside all other data points. Outliers can be due to a variety of things, such as errors in data input or analytical errors at the point of data collection. Boxplots are an easy way to visualise such data points, where outliers lie above the upper limit (75% quartile + 1.5 × IQR) or below the lower limit (25% quartile – 1.5 × IQR).

Boxplots show:

• Median value
• 25 & 75% quartiles
• IQR – interquartile range
• Max & min values plotted with outliers excluded
• Outliers shown if requested
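The 1.5 × IQR rule can be sketched in Python. Quartiles are taken here with statistics.quantiles(..., method="inclusive"), which interpolates between ranked values; this is assumed to be close to how JASP draws its boxplots, but small datasets can give slightly different quartiles under other conventions. The function name and the data are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (lower limit, upper limit, outliers) using Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]


low, high, flagged = tukey_outliers([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])
print(low, high, flagged)  # 9.375 16.375 [102]
```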


Load Exploring Data.csv into JASP. Under Descriptives > Descriptive Statistics, add Variable 1 to the Variables box. In Plots, tick Boxplots, Label Outliers and Boxplot Element.

The resulting boxplot on the left looks very compressed, and an obvious outlier is labelled as being in row 38 of the dataset. This can be traced back to a data input error in which 91.7 was input instead of 917. The graph on the right shows the boxplot for the ‘clean’ data.


How you deal with an outlier depends on the cause. Most parametric tests are highly sensitive to outliers, while non-parametric tests are generally not.

Correct it? – Check the original data to make sure that it isn’t an input error; if it is, correct it and rerun the analysis.

Keep it? – Even in datasets of normally distributed data, outliers may be expected for large sample sizes and should not automatically be discarded if that is the case.

Delete it? – This is a controversial practice in small datasets where a normal distribution cannot be assumed. Outliers resulting from an instrument reading error may be excluded, but they should be verified first.

Replace it? – Also known as winsorizing. This technique replaces the outlier values with the relevant maximum and/or minimum values found after excluding the outlier.

Whatever method you use must be justified in your statistical methodology and subsequent analysis.
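The "Replace it?" option can be sketched as follows, assuming the lower and upper limits come from the boxplot rule (here they are simply passed in). Values above the upper limit become the largest remaining value and values below the lower limit become the smallest remaining value. Percentile-based winsorizing (for example, at the 5th and 95th percentiles) is a common alternative. The function name and numbers are hypothetical.

```python
def replace_outliers(values, lower, upper):
    """Replace values outside [lower, upper] with the max/min of the
    values that remain once the outliers are excluded."""
    inliers = [v for v in values if lower <= v <= upper]
    lo, hi = min(inliers), max(inliers)
    return [hi if v > upper else lo if v < lower else v for v in values]


print(replace_outliers([10, 12, 12, 13, 12, 11, 14, 13, 15, 102], 9.375, 16.375))
# [10, 12, 12, 13, 12, 11, 14, 13, 15, 15]
```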

WE MAKE MANY ASSUMPTIONS ABOUT OUR DATA

When using parametric tests, we make a series of assumptions about our data, and bias will occur if these assumptions are violated, in particular:

• Normality
• Homogeneity of variance or homoscedasticity

Many statistical tests are an omnibus of tests, some of which will check these assumptions.

TESTING THE ASSUMPTION OF NORMALITY

Normality does not necessarily mean that the data is normally distributed per se, but whether or not the dataset can be well modelled by a normal distribution. Normality can be explored in a variety of ways:

• Numerically
• Visually / graphically
• Statistically

Numerically we can use the Descriptives output to calculate skewness and kurtosis. For a normal data

distribution, both values should be close to zero. To determine the significance of skewness or kurtosis

we calculate their z-scores by dividing them by their associated standard errors:

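$$Z_{\text{skewness}} = \frac{\text{skewness}}{SE_{\text{skewness}}}$$

$$Z_{\text{kurtosis}} = \frac{\text{kurtosis}}{SE_{\text{kurtosis}}}$$

Z-score significance:

• p < 0.05 if |Z| > 1.96
• p < 0.01 if |Z| > 2.58
• p < 0.001 if |Z| > 3.29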
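The z-score calculation can be sketched in Python as follows. The standard-error formulas are the conventional ones for sample skewness and excess kurtosis (as used by SPSS); it is an assumption that JASP's reported standard errors match them. The skewness value 1.2 and n = 20 in the usage lines are hypothetical, not taken from the guide's output.

```python
import math


def se_skewness(n):
    """Conventional standard error of sample skewness (n > 2)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Conventional standard error of sample excess kurtosis (n > 3)."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def z_score(statistic, standard_error):
    """Skewness or kurtosis divided by its standard error."""
    return statistic / standard_error


def significance_band(z):
    """Map |z| onto the two-tailed thresholds 1.96, 2.58 and 3.29."""
    a = abs(z)
    if a > 3.29:
        return "p < 0.001"
    if a > 2.58:
        return "p < 0.01"
    if a > 1.96:
        return "p < 0.05"
    return "p >= 0.05"


# Hypothetical example: skewness of 1.2 in a sample of 20.
z = z_score(1.2, se_skewness(20))
print(round(z, 2), significance_band(z))  # 2.34 p < 0.05
```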


Using Exploring data.csv, go to Descriptives > Descriptive Statistics and move Variable 3 to the Variables box. In the Statistics drop-down menu, select Mean, Std Deviation, Skewness and Kurtosis, as shown below with the corresponding output table.

Neither skewness nor kurtosis is close to 0. The positive skewness suggests that the data are concentrated more on the left (see graphs later), while the negative kurtosis suggests a flat distribution. When calculating their z-scores, it can be seen that the data are significantly skewed p

Independent Samples t-test move Variable 1 to the Variables box and Group to the Grouping variable, and tick Assumption Checks > Equality of variances.

In this case, there is no significant difference in variance between the two groups, F(1) = 0.218, p = .643.
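The equality-of-variances check for the independent t-test in JASP is Levene's test. A minimal Python sketch of the Levene statistic is shown below, assuming absolute deviations from each group mean (Levene's original form; some software centres on the median instead, and this sketch does not confirm which variant JASP uses). The group values are hypothetical. Turning F into a p-value needs the F distribution with the returned degrees of freedom (for example `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
from statistics import fmean


def levene_f(*groups):
    """Levene's statistic from absolute deviations about each group mean.

    Returns (F, df_between, df_within).
    """
    means = [fmean(g) for g in groups]
    z = [[abs(y - m) for y in g] for g, m in zip(groups, means)]
    k = len(z)
    n = sum(len(g) for g in z)
    z_bar = [fmean(g) for g in z]
    grand = fmean([v for g in z for v in g])
    # One-way ANOVA on the absolute deviations.
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, z_bar))
    within = sum((v - m) ** 2 for g, m in zip(z, z_bar) for v in g)
    f = (n - k) / (k - 1) * between / within
    return f, k - 1, n - k


f, df1, df2 = levene_f([1, 2, 3], [2, 4, 6])  # hypothetical groups
print(round(f, 3), df1, df2)  # 0.8 1 4
```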

The assumption of homoscedasticity (equal variance) is important in linear regression models, as is linearity. It assumes that the variance of the data around the regression line is the same for all predictor data points. Heteroscedasticity (the violation of homoscedasticity) is present when the variance differs across the values of an independent variable. This can be visually assessed in linear regression by plotting the residuals against the predicted values.

If homoscedasticity and linearity are not violated, there should be no relationship between what the model predicts and its errors, as shown in the graph on the left. Any sort of funnelling (middle graph) suggests that homoscedasticity has been violated, and any curve (right graph) suggests that the linearity assumption has not been met.
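As a rough numerical companion to the residuals-versus-predicted plot, the Python sketch below fits a simple least-squares line, pairs each predicted value with its residual, and compares the residual variance in the upper half of the predicted values with the lower half. A ratio near 1 is consistent with an even band; a large ratio points to funnelling. This half-split ratio is an illustrative heuristic in the spirit of the Goldfeld–Quandt idea, not a formal test and not something JASP reports; all function names and data are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals_vs_predicted(x, y):
    """Return (predicted, residual) pairs, i.e. the points of the diagnostic plot."""
    a, b = fit_line(x, y)
    predicted = [a + b * xi for xi in x]
    return [(p, yi - p) for p, yi in zip(predicted, y)]


def spread_ratio(pairs):
    """Residual variance in the upper half of predicted values over the lower half."""
    ordered = sorted(pairs)  # sort by predicted value
    half = len(ordered) // 2
    low = [r for _, r in ordered[:half]]
    high = [r for _, r in ordered[-half:]]
    return statistics.variance(high) / statistics.variance(low)


x = list(range(1, 9))
noise = [1, -1, -1, 1, 1, -1, -1, 1]
even = [2 * xi + 1 + e for xi, e in zip(x, noise)]         # constant spread
funnel = [2 * xi + 1 + e * xi for xi, e in zip(x, noise)]  # spread grows with x
print(spread_ratio(residuals_vs_predicted(x, even)))
print(spread_ratio(residuals_vs_predicted(x, funnel)))
```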


DATA TRANSFORMATION

JASP provides the ability to compute new variables or transform data. In some cases, it may be useful to compute the differences between repeated measures or, for example, to apply a log transform to make a dataset more normally distributed.

When a dataset is opened there will be a plus sign (+) at the end of the columns.

Clicking on the + opens up a small dialogue window where you can:

• Enter the name of a new variable or the transformed variable
• Select whether you enter the R code directly or use the commands built into JASP
• Select what data type is required

Once you have named the new variable and chosen the other options – click create.


If you choose the manual option rather than the R code, this opens all the built-in create and

transform options. Although not obvious, you can scroll the left and right-hand options to see

more variables or more operators respectively.

For example, suppose we want to create a column of data showing the difference between variable 2 and variable 3. Once you have entered the column name in the Create Computed Column dialogue window, its name will appear in the spreadsheet window. The mathematical operation now needs to be defined. In this case, drag variable 2 into the equation box, drag the ‘minus’ sign down and then drag in variable 3.
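What the computed column holds can be sketched outside JASP in Python: a row-by-row subtraction of one column from another. The function name `difference_column` is hypothetical, and representing missing cells as `None` (which stays missing in the result, as NA does in R arithmetic) is an assumption made for the example.

```python
def difference_column(first, second):
    """Row-wise first - second; a row missing either value stays missing (None)."""
    if len(first) != len(second):
        raise ValueError("columns must have the same number of rows")
    return [None if a is None or b is None else a - b for a, b in zip(first, second)]


# Hypothetical values standing in for variable 2 and variable 3.
variable_2 = [5, 7, None]
variable_3 = [2, 9, 4]
print(difference_column(variable_2, variable_3))  # [3, -2, None]
```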


If you have made a mistake (e.g. used the wrong variable or operator), remove it by dragging the item into the dustbin in the bottom right corner.

When you are happy with the equation/operation, click compute column and the data will be

entered.

If you decide that you do not want to keep the derived data, you can remove the column by

clicking the other dustbin icon next to the R.

Another example is to do a log transformation of the data. In the following case variable 1 has

been transformed by scrolling the operators on the left and selecting the log10(y) option.

Replace the “y” with the variable that you want to transform and then click Compute column.

When finished, click the X to close the dialogue.


The Export function will also export any new data variables that have been created.

The two graphs below show the untransformed and the log10-transformed data. The skewed data has been transformed into a profile with a more normal distribution.

[Figure panels: Untransformed; Log10 transformed]
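The effect of the log10 transformation on skewness can be sketched in Python. The skewness function uses the adjusted Fisher–Pearson coefficient (G1); it is an assumption that this matches the skewness JASP reports. The data are hypothetical and strongly right-skewed. Note that log10 is only defined for positive values, so zeros or negatives would need handling (for example an added constant) before transforming.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1); needs n > 2 and non-constant data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


raw = [1, 10, 100, 1000]  # hypothetical right-skewed data
logged = [math.log10(v) for v in raw]  # every value must be > 0

print(skewness(raw))     # clearly positive
print(skewness(logged))  # essentially zero: the logged values are evenly spaced
```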


EFFECT SIZE

When performing a hypothesis test on data, we determine the relevant statistic (r, t, F, etc.) and p-value to decide whether to accept or reject the null hypothesis. A small p-value (< 0.05) leads us to reject the null hypothesis, whereas a p-value > 0.05 only means that there is not enough evidence to reject it. A lower p-value is sometimes incorrectly interpreted as meaning there is a stronger relationship or difference between variables. So what is needed is not just null hypothesis testing but also a method of determining precisely how large the effects seen in the data are.

An effect size is a statistical measure used to determine the strength of the relationship or difference

between variables. Unlike a p-value, effect sizes can be used to quantitatively compare the results

of different studies.

For example, comparing heights between 11- and 12-year-old children may show that the 12-year-olds are significantly taller, but it is difficult to see a difference visually (i.e. a small effect size). However, a significant difference in height between 11- and 16-year-old children is obvious to see (a large effect size).

The effect size is usually measured in three ways:

• the standardized mean difference
• correlation coefficient
• odds ratio

When looking at differences between groups, most techniques are primarily based on the difference between the means divided by the average standard deviation. The values derived can then be used to describe the magnitude of the differences. The effect sizes calculated in JASP for t-tests and ANOVA are shown below:


When analysing bivariate or multivariate relationships the effect sizes are the correlation

coefficients:

When analysing categorical relationships via contingency tables (i.e. the chi-square test), Phi is only used for 2×2 tables, while Cramer’s V can be used for any table size.

For a 2 × 2 contingency table, we can also define the odds ratio measure of effect size.
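The group-difference and contingency-table effect sizes described in this section can be sketched in Python. Cohen's d here divides the mean difference by the pooled standard deviation; phi and the odds ratio assume a 2 × 2 table laid out as `[[a, b], [c, d]]`; Cramer's V uses the uncorrected Pearson chi-square (no Yates correction). These are the textbook formulas, not a reproduction of JASP's internals, and all names and counts are hypothetical.

```python
import math
from statistics import fmean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (fmean(group_a) - fmean(group_b)) / math.sqrt(pooled_var)


def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def phi(table):
    """Signed phi coefficient for a 2x2 table."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cramers_v(table):
    """Cramer's V from the uncorrected Pearson chi-square of an r x c table."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


counts = [[10, 20], [30, 40]]  # hypothetical 2x2 counts
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
print(odds_ratio(counts), phi(counts), cramers_v(counts))
```

For a 2 × 2 table, Cramer's V equals the absolute value of phi, which is why phi is the usual choice there and V is needed only for larger tables.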


ONE SAMPLE T-TEST

Research is normally carried out in sample populations, but how closely does the sample reflect the whole population? The parametric one-sample t-test determines whether the sample mean is statistically different from a known or hypothesized population mean.

The null hypothesis (Ho) tested is that the sample mean is equal to the population mean.
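The test statistic behind this null hypothesis is t = (sample mean − test value) / (s / √n) with n − 1 degrees of freedom. A minimal Python sketch follows; the function name and the sample values are hypothetical, and the p-value (which needs the t distribution) is not computed here.

```python
import math
from statistics import fmean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (fmean(sample) - mu0) / standard_error, n - 1


# Hypothetical heights (cm) tested against a test value of 178 cm.
t, df = one_sample_t([175, 181, 177, 179, 176], 178)
print(round(t, 3), df)
```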

ASSUMPTIONS

Four assumptions are required for a one-sample t-test to provide a valid result:

• The test variable should be measured on a continuous scale.
• The test variable data should be independent, i.e. there is no relationship between any of the data points.
• The data should be approximately normally distributed.
• There should be no significant outliers.

RUNNING THE ONE SAMPLE T-TEST

Open one sample t-test.csv. This contains two columns of data representing the height (cm) and body mass (kg) of a sample population of males used in a study. In 2017, the average adult male in the UK population was 178 cm tall and had a body mass of 83.4 kg.

Go to T-Tests > One-Sample t-test and, in the first instance, add height to the analysis box on the right. Then tick the following options and add 178 as the test value:


UNDERSTANDING THE OUTPUT

The output should contain three tables and two graphs.

The assumption check of normality (Shapiro-Wilk) is not significant, suggesting that the heights are normally distributed; therefore, this assumption is not violated. If this check were significant, the analysis should be repeated using the non-parametric equivalent, Wilcoxon’s signed-rank test, tested against the population median height.
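The core of Wilcoxon's signed-rank test can be sketched in Python: subtract the hypothesized median, drop zero differences, rank the absolute differences (averaging tied ranks), and sum the ranks of the positive and negative differences. This sketch returns only the rank sums; the p-value step is omitted, and JASP's reported statistic may follow its own convention. The function name and data are hypothetical.

```python
def signed_rank_statistic(sample, median0):
    """Return (W_plus, W_minus, n_used) for the Wilcoxon signed-rank test."""
    diffs = [x - median0 for x in sample if x != median0]  # zero differences dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


print(signed_rank_statistic([5, 7, 9, 12], 8))  # (5.5, 4.5, 4)
```

As a check, W+ and W− always add up to n(n + 1)/2 for the n non-zero differences.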

This table shows that there is no significant difference between the sample mean and the test value (p = .706).

The descriptive data shows that the mean height of the sample population was 177.6 cm compared

to the average 178 cm UK male.


The two plots show essentially the same data but in different ways. The standard Descriptive plot is an Interval plot showing the sample mean (black bullet) and the 95% confidence interval (whiskers) relative to the test value (dashed line).

The Raincloud Plot shows the data as individual data points, boxplot, and the distribution plot. This

can be shown as either a vertical or horizontal display.


Repeat the procedure by replacing height with mass and changing the test value to 83.4.

The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the masses are

normally distributed.

This table shows that there is a significant difference between the sample mean (72.9 kg) and the population body mass (83.4 kg) p
