University of Virginia Variance in Perceived Stress Scores Questions
Assignment
Introduction
The data set for this assignment, survey.sav, was designed to explore the factors that affect respondents’
psychological adjustment and well-being. You will explore the impact of the respondents’ perceptions
of control on their levels of perceived stress.
There are two different measures of control:
• Mastery Scale – degree to which people feel they have control over the events in their lives
• Perceived Control of Internal States Scale (PCOISS) – degree to which people feel they have
control over their internal states, emotions, thoughts, and physical reactions
In this assignment, you are interested in exploring how well the Mastery Scale and the PCOISS are able
to predict scores on a measure of perceived stress.
Data Set
survey.sav
The data set for this assignment is an SPSS file (survey.sav). JASP should have no trouble opening this
file. Remember to load JASP, then use the JASP menus to open the file from the location where you
saved it to your computer.
Instructions
This course is built to prepare you to write a Chapter Four, and this assignment will work in that
direction on a very small scale. Follow the instructions below.
1. Start with an intro paragraph (2-3 sentences) about what the study is and describe the data set.
2. You will then address the research questions one at a time. Give an introductory statement or
two about which test was chosen and why. Then, discuss the finding(s).
3. For this assignment, it is not necessary to write all the hypotheses.
4. So you will have the following headings:
a. Introduction
b. Research Question 1
c. Research Question 2
d. Summary
e. Appendix A
5. Finally, you will have on a new, separate page “Appendix A” with your output table(s).
Directions for responding to a Research Question
i An introductory sentence about which test(s) will be used and what indicated that it was
the correct test.
ii A stated null and alternative hypothesis.
6. The output from JASP (You will copy and paste the output table in Appendix A)
iii A conclusion in paragraph form that includes the decision you made, APA formatted
reporting of your statistics, and a final statement about your findings as it relates to the
research question. (2-3 sentences is fine)
Research Questions
1. How well do the two measures of control (mastery, PCOISS) predict perceived stress? This is
addressed by finding out how much of the variance in perceived stress scores is explained by
these two scales.
2. Which is the best predictor of perceived stress: control of external events (Mastery Scale) or
control of internal states (PCOISS)?
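As a rough illustration of what the two research questions ask for, the sketch below computes the proportion of variance explained (R squared) and the standardised coefficients for a two-predictor regression, using only correlations. The scores are synthetic and generated purely for illustration; they are not the survey.sav data, and the function and variable names are hypothetical. The assignment itself should be completed in JASP.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_predictor_regression(y, x1, x2):
    """Return (R squared, beta1, beta2), where the betas are standardised."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return r_squared, beta1, beta2

# Synthetic scores (assumption: invented values, NOT taken from survey.sav)
random.seed(1)
mastery = [random.gauss(22, 4) for _ in range(200)]
pcoiss = [0.5 * m + random.gauss(50, 10) for m in mastery]
stress = [60 - 0.6 * m - 0.2 * p + random.gauss(0, 5)
          for m, p in zip(mastery, pcoiss)]

r2, beta_mastery, beta_pcoiss = two_predictor_regression(stress, mastery, pcoiss)
print(f"R squared = {r2:.3f}")
print(f"standardised beta (mastery) = {beta_mastery:.3f}")
print(f"standardised beta (PCOISS)  = {beta_pcoiss:.3f}")
```

R squared addresses Research Question 1 (how much variance is explained), while comparing the absolute sizes of the standardised coefficients is one common way to approach Research Question 2.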
5th Edition JASP v0.16.1 2022
Copyright © 2020 by Mark A Goss-Sampson.
Licenced as CC BY 4.0
All rights reserved. This book or any portion thereof may not be reproduced or used in any manner
whatsoever without the express written permission of the author except for research, education or
private study.
CONTENTS
PREFACE
USING THE JASP ENVIRONMENT
DATA HANDLING IN JASP
JASP ANALYSIS MENU
DESCRIPTIVE STATISTICS
  SPLITTING DATA FILES
DESCRIPTIVE DATA VISUALISATION
  BASIC PLOTS
  CUSTOMISABLE PLOTS
  EDITING PLOTS
EXPLORING DATA INTEGRITY
DATA TRANSFORMATION
EFFECT SIZE
ONE SAMPLE T-TEST
BINOMIAL TEST
MULTINOMIAL TEST
  CHI-SQUARE ‘GOODNESS-OF-FIT’ TEST
  MULTINOMIAL AND χ² ‘GOODNESS-OF-FIT’ TEST
COMPARING TWO INDEPENDENT GROUPS
  INDEPENDENT T-TEST
  MANN-WHITNEY U TEST
COMPARING TWO RELATED GROUPS
  PAIRED SAMPLES T-TEST
  WILCOXON’S SIGNED RANK TEST
CORRELATION ANALYSIS
REGRESSION
  SIMPLE REGRESSION
  MULTIPLE REGRESSION
LOGISTIC REGRESSION
COMPARING MORE THAN TWO INDEPENDENT GROUPS
  ANOVA
  KRUSKAL-WALLIS – NON-PARAMETRIC ANOVA
COMPARING MORE THAN TWO RELATED GROUPS
  RMANOVA
  FRIEDMAN’S REPEATED MEASURES ANOVA
COMPARING INDEPENDENT GROUPS AND THE EFFECTS OF COVARIATES
  ANCOVA
TWO-WAY INDEPENDENT ANOVA
TWO-WAY REPEATED MEASURES ANOVA
MIXED FACTOR ANOVA
CHI-SQUARE TEST FOR ASSOCIATION
META-ANALYSIS IN JASP
EXPERIMENTAL DESIGN AND DATA LAYOUT IN EXCEL FOR JASP IMPORT
  Independent t-test
  Paired samples t-test
  Correlation
  Logistic Regression
  One-way Independent ANOVA
  One-way repeated measures ANOVA
  Two-way Independent ANOVA
  Two-way Repeated measures ANOVA
  Two-way Mixed Factor ANOVA
  Chi-squared – Contingency tables
SOME CONCEPTS IN FREQUENTIST STATISTICS
WHICH TEST SHOULD I USE?
  Comparing one sample to a known or hypothesized population mean
  Testing relationships between two or more variables
  Predicting outcomes
  Testing for differences between two independent groups
  Testing for differences between two related groups
  Testing for differences between three or more independent groups
  Testing for differences between three or more related groups
  Test for interactions between 2 or more independent variables
PREFACE
JASP stands for Jeffreys’s Amazing Statistics Program in recognition of the pioneer of Bayesian
inference Sir Harold Jeffreys. This is a free multi-platform open-source statistics package, developed
and continually updated by a group of researchers at the University of Amsterdam. They aimed to
develop a free, open-source programme that includes both standard and more advanced statistical
techniques with a major emphasis on providing a simple intuitive user interface.
In contrast to many statistical packages, JASP provides a simple drag and drop interface, easy access
menus, intuitive analysis with real-time computation and display of all results. All tables and graphs
are presented in APA format and can be copied directly and/or saved independently. Tables can also
be exported from JASP in LaTeX format.
JASP can be downloaded free from the website https://jasp-stats.org/ and is available for Windows,
Mac OS X and Linux. You can also download a pre-installed Windows version that will run directly from
a USB or external hard drive without the need to install it locally. The WIX installer for Windows
enables you to choose a path for the installation of JASP – however, this may be blocked in some
institutions by local Administrative rights.
The programme also includes a data library with an initial collection of over 50 datasets from Andy
Field’s book, Discovering Statistics Using IBM SPSS Statistics¹, and Introduction to the Practice of
Statistics² by Moore, McCabe and Craig.
Keep an eye on the JASP site since there are regular updates as well as helpful videos and blog posts!!
This book is a collection of standalone handouts covering the most common standard (frequentist)
statistical analyses used by students studying Human Sciences. Datasets used in this document are
available for download from https://osf.io/bx6uv/
I would also like to thank Per Palmgren from the Karolinska Institutet in Sweden for his helpful
comments, suggestions and proofreading of this guide.
Dr Mark Goss-Sampson
Centre for Exercise Activity and Rehabilitation
University of Greenwich
2022
1 A Field. (2017) Discovering Statistics Using IBM SPSS Statistics (5th Ed.) SAGE Publications.
2 D Moore, G McCabe, B Craig. (2011) Introduction to the Practice of Statistics (7th Ed.) W H Freeman.
JASP 0.16.1 – Dr Mark Goss-Sampson
USING THE JASP ENVIRONMENT
Open JASP.
The main menu can be accessed by clicking on the top-left icon.
Open:
JASP has its own .jasp format but can open a variety of
different dataset formats such as:
• .csv (comma separated values) can be saved in Excel
• .txt (plain text) also can be saved in Excel
• .tsv (tab-separated values) also can be saved in Excel
• .sav (IBM SPSS data file)
• .ods (Open Document spreadsheet)
• .dta (Stata data file)
• .por (SPSS ASCII file)
• .sas7bdat / .sas7bcat (SAS data files)
• .xpt (SAS transport file)
You can open recent files, browse your computer files, access the Open Science Framework (OSF) or
open the wide range of examples that are packaged with the Data Library in JASP.
Save/Save as:
Using these options the data file, any annotations and the analysis
can be saved in the .jasp format
Export:
Results can be exported to either an HTML file or as a PDF
Data can be exported to either a .csv, .tsv or .txt file
Sync data:
Used to synchronize with any updates in the current data file (also
can use Ctrl-Y)
Close:
As it states – it closes the current file but not JASP
Preferences:
There are four sections that users can use to tweak JASP to suit their needs
In the Data Preferences section users can:
• Synchronize/update the data automatically when the data file is saved (default)
• Set the default spreadsheet editor (e.g. Excel, SPSS etc.)
• Change the threshold so that JASP more readily distinguishes between nominal and scale data
• Add a custom missing value code
In the Results Preferences section users can:
• Set JASP to return exact p values i.e. P = 0.00087 rather than P < .001
[…] Export Results.
The Add notes menu provides many options to change text font, colour size etc.
You can change the size of all the tables and graphs using Ctrl+ (increase), Ctrl- (decrease) and Ctrl= (back
to default size). Graphs can also be resized by dragging the bottom right corner of the graph.
As previously mentioned, all tables and figures are APA standard and can just be copied into any other
document. All images can be copied/saved with either a white or transparent background. This
can be selected in Preferences > Advanced as described earlier.
There are many further resources on using JASP on the website https://jasp-stats.org/
DATA HANDLING IN JASP
For this section open England injuries.csv
All files must have a header label in the first row. Once loaded, the dataset appears in the window:
For large datasets, there is a hand icon that allows easy scrolling through the data.
On import JASP makes a best guess at assigning data to the different variable types:
Nominal
Ordinal
Continuous
If JASP has incorrectly identified the data type just click on the appropriate variable data icon in the
column title to change it to the correct format.
If you have coded the data you can click on the variable name to open up the following window in
which you can label each code. These labels now replace the codes in the spreadsheet view. If you
save this as a .jasp file these codes, as well as all analyses and notes, will be saved automatically. This
makes the data analysis fully reproducible.
In this window, you can also carry out simple filtering of data, for example, if you untick the Wales
label it will not be used in subsequent analyses.
Clicking this icon in the spreadsheet window opens up a much more comprehensive set of data
filtering options:
Using this option will not be covered in this document. For detailed information on using more
complex filters refer to the following link: https://jasp-stats.org/2018/06/27/how-to-filter-your-data-in-jasp/
By default, JASP plots data in the Value order (i.e. 1-4). The order can be changed by highlighting the
label and moving it up or down using the appropriate arrows:
Move up
Move down
Reverse order
Close
If you need to edit the data in the spreadsheet just double click on a cell and the data should open up
in the original spreadsheet i.e. Excel. Once you have edited your data and saved the original
spreadsheet JASP will automatically update to reflect the changes that were made, provided that you
have not changed the file name.
JASP ANALYSIS MENU
The main analysis options can be accessed from the main toolbar. Currently, JASP offers the following
frequentist (parametric and non-parametric standard statistics) and alternative Bayesian tests:
Descriptives
• Descriptive stats
T-Tests
• Independent
• Paired
• One sample
ANOVA
• Independent
• Repeated measures
• ANCOVA
• MANOVA *
Mixed Models*
• Linear Mixed Models
• Generalised linear mixed models
Regression
• Correlation
• Linear regression
• Logistic regression
Frequencies
• Binomial test
• Multinomial test
• Contingency tables
• Log-linear regression*
Factor
• Principal Component Analysis (PCA)*
• Exploratory Factor Analysis (EFA)*
• Confirmatory Factor Analysis (CFA)*
* Not covered in this guide
By clicking on the + icon on the top-right menu bar you can also access advanced options that allow
the addition of optional modules. Once ticked they will be added to the main analysis ribbon. These
include:
Audit
Network analysis
BAIN
Prophet
Circular Statistics
Reliability analysis
Distributions
SEM
Cochrane Meta-Analyses
Summary statistics
Equivalence tests
Visual modelling
JAGS
Learning Bayes
Machine learning
R (beta)
Meta-analysis (included in this guide)
See the JASP website for more information on these advanced modules
Once you have selected your required analysis all the possible statistical options appear in the left
window and output in the right window.
JASP provides the ability to rename and ‘stack’ the results output thereby organising multiple
analyses.
The individual analyses can be renamed using the pen icon or deleted using the red cross.
Clicking on the analysis in this list will then take you to the appropriate part of the results output
window. They can also be rearranged by dragging and dropping each of the analyses.
The green + icon produces a copy of the chosen analysis.
The blue information icon provides detailed information on each of the statistical procedures used
and includes a search option.
DESCRIPTIVE STATISTICS
Presentation of all the raw data is very difficult for a reader to visualise or draw any inference on.
Descriptive statistics and related plots are a succinct way of describing and summarising data but do
not test any hypotheses. There are various types of statistics that are used to describe data:
• Measures of central tendency
• Measures of dispersion
• Percentile values
• Measures of distribution
• Descriptive plots
To explore these measures, load Descriptive data.csv into JASP. Go to Descriptives > Descriptive
statistics and move the Variable data to the Variables box on the right.
You also have options to change and add tables in this section:
• Split analyses by a categorical variable (e.g., group)
• Transpose the main descriptive table (switch columns and rows)
• Add frequency tables – important for categorical data
• Add stem and leaf tables, which show all numeric observations from small to large. The
observations are split into a “stem”, the first digit(s), and a “leaf”, the subsequent digit.
The Statistics menu can now be opened to see the various options available.
CENTRAL TENDENCY.
This can be defined as the tendency for variable values to cluster around a central value. The three
ways of describing this central value are mean, median or mode. If the whole population is considered
the term population mean / median/mode is used. If a sample/subset of the population is being
analysed the term sample mean/ median/mode is used. The measures of central tendency move
toward a constant value when the sample size is sufficient to be representative of the population.
The mean, M or x̅ (17.71) is equal to the sum of all the values divided by the number of values in the
dataset i.e. the average of the values. It is used for describing continuous data. It provides a simple
statistical model of the centre of distribution of the values and is a theoretical estimate of the ‘typical
value’. However, it can be influenced heavily by ‘extreme’ scores.
The median, Mdn (17.9) is the middle value in a dataset that has been ordered from the smallest to
largest value and is the normal measure used for ordinal or non-parametric continuous data. It is less
sensitive to outliers and skewed data than the mean.
The mode (20.0) is the most frequent value in the dataset and is usually the highest bar in a distribution
histogram.
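The three measures can be compared on a small example. The sketch below uses Python's standard statistics module; the scores are invented for illustration (they are not the Descriptive data.csv values) and include one extreme value to show how it pulls the mean but not the median or mode.

```python
import statistics

# Made-up scores with one extreme value (41); not taken from any JASP dataset
scores = [12, 15, 15, 18, 20, 20, 20, 23, 41]

mean = statistics.mean(scores)      # sum of the values / number of values
median = statistics.median(scores)  # middle value of the ordered data
mode = statistics.mode(scores)      # most frequent value

# The same mean with the extreme score removed, for comparison
mean_without_extreme = statistics.mean(scores[:-1])

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"mean without the extreme score = {mean_without_extreme:.2f}")
```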
DISPERSION
The standard error of the mean, SE (0.244) is a measure of how far the sample mean of the data is
expected to be from the true population mean. As the size of the sample data grows larger the SE
decreases compared to S and the true mean of the population is known with greater specificity.
Standard deviation, S or SD (6.935) is used to quantify the amount of dispersion of data values around
the mean. A low standard deviation indicates that the values are close to the mean, while a high
standard deviation indicates that the values are dispersed over a wider range.
The coefficient of variation (0.392) provides the relative dispersion of the data, in contrast to the
standard deviation, which gives the absolute dispersion.
MAD (4.7), the median absolute deviation, is a robust measure of the spread of data. It is relatively
unaffected by data that is not normally distributed. Reporting median +/- MAD for data that is not
normally distributed is equivalent to mean +/- SD for normally distributed data.
MAD Robust: (6.968) median absolute deviation of the data points, adjusted by a factor for
asymptotically normal consistency.
IQR (9.175) Interquartile Range is similar to the MAD but is less robust (see Boxplots).
Variance (48.1) is another estimate of how far the data is spread from the mean. It is also the square
of the standard deviation.
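The dispersion measures above are all simple functions of the data. The following is a minimal sketch in Python, assuming the sample (n - 1) form of the standard deviation and the commonly used consistency factor of 1.4826 for the robust MAD; that factor is an assumption here, although it does reproduce the reported 6.968 from the MAD of 4.7. The function name and example values are hypothetical.

```python
import math
import statistics

def dispersion_summary(values):
    """Return common dispersion measures for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (divides by n - 1)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return {
        "variance": sd ** 2,                 # square of the SD
        "sd": sd,
        "se": sd / math.sqrt(n),             # standard error of the mean
        "cv": sd / mean,                     # coefficient of variation
        "mad": mad,
        "mad_robust": 1.4826 * mad,          # assumed consistency factor
    }

# Made-up values for illustration only
summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```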
Confidence intervals (CI), although not shown in the general Descriptive statistics output, are
used in many other statistical tests. When sampling from a population to get an estimate of the mean,
confidence intervals are a range of values within which you are n% confident the true mean is
included. A 95% CI is, therefore, a range of values that one can be 95% certain contains the true mean
of the population. This is not the same as a range that contains 95% of ALL the values.
For example, in a normal distribution, 95% of the data are expected to be within ± 1.96 SD of the mean
and 99% within ± 2.576 SD.
95% CI = M ± 1.96 * the standard error of the mean.
Based on the data so far, M = 17.71, SE = 0.24, this will be 17.71 ± (1.96 * 0.24) or 17.71 ± 0.47.
Therefore the 95% CI for this dataset is 17.24 – 18.18 and suggests that the true mean is likely to be
within this range 95% of the time.
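The arithmetic above can be checked with a few lines of Python. This is only a sketch of the normal-approximation formula given in the text (M ± 1.96 * SE); the function name is hypothetical.

```python
def ci95(mean, se, z=1.96):
    """95% confidence interval for a mean using the normal approximation."""
    half_width = z * se
    return mean - half_width, mean + half_width

lower, upper = ci95(17.71, 0.24)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # prints "95% CI: 17.24 to 18.18"
```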
QUARTILES
In the Statistics options make sure that everything is unticked apart from Quartiles.
Quartiles are where datasets are split into 4 equal quarters, normally based on rank ordering of
median values. For example, in this dataset:
1 1 2 2 3 3 4 4 4 4 5 5 5 6 7 8 8 9 10 10 10
(the 25% point falls at 3, the 50% point at 5 and the 75% point at 8)
The median value that splits data by 50% = 50th percentile = 5
The median value of left side = 25th percentile = 3
The median value of right side = 75th percentile = 8
From this the interquartile range (IQR) can be calculated; this is the difference between the 75th
and 25th percentiles, i.e. 5. These values are used to construct the descriptive boxplots later. The IQR
can also be shown by ticking this option in the Dispersion menu.
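The median-of-halves approach described above can be reproduced on the example dataset. The sketch below is one way to implement it in Python; statistical packages (JASP included) may use a different interpolation rule, so their quartiles can differ slightly on other data. The function name is hypothetical.

```python
import statistics

data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 8, 9, 10, 10, 10]

def quartiles_by_halves(values):
    """Quartiles as the medians of the lower half, whole set and upper half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd number of values the middle value belongs to neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print(f"25th = {q1}, 50th = {q2}, 75th = {q3}, IQR = {iqr}")
```

For this dataset the function returns 3, 5 and 8, giving an IQR of 5, in agreement with the values in the text.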
DISTRIBUTION
Skewness describes the shift of the distribution away from a normal distribution. Negative skewness
shows that the mode moves to the right resulting in a dominant left tail. Positive skewness shows
that the mode moves to the left resulting in a dominant right tail.
[Figure: distributions with negative skewness and positive skewness]
Kurtosis describes how heavy or light the tails are. Positive kurtosis results in an increase in the
“pointiness” of the distribution with heavy (longer) tails while negative kurtosis exhibits a much more
uniform or flatter distribution with light (shorter) tails.
[Figure: distributions with positive kurtosis, normal kurtosis and negative kurtosis]
In the Statistics options make sure that everything is unticked apart from skewness, kurtosis and
Shapiro-Wilk test.
We can use the Descriptives output to calculate skewness and kurtosis. For a normal data distribution,
both values should be close to zero. The Shapiro-Wilk test is used to assess if the data is significantly
different from a normal distribution. (see – Exploring data integrity in JASP for more details).
SPLITTING DATA FILES
If there is a grouping variable (categorical or ordinal) descriptive statistics and plots can be produced
for each group. Using Descriptive data.csv with the variable data in the Variables box now add Group
to the Split box.
DESCRIPTIVE DATA VISUALISATION
JASP produces a comprehensive range of descriptive and analysis specific plots. The analysis specific
plots will be explained in their relevant chapters
BASIC PLOTS
Firstly, to look at examples of the basic plots, open Descriptive data.csv with the variable data in the
Variables box, go to Plots and tick Distribution plots, Display density, Interval plots, Q-Q plots, and dot
plots.
The Distribution plot is based on splitting the data into frequency bins; this is then overlaid with the
distribution curve. As mentioned before, the highest bar is the mode (most frequent value of the
dataset). In this case, the curve looks approximately symmetrical, suggesting that the data is
approximately normally distributed. The second distribution plot is from another dataset which shows
that the data is positively skewed.
The dot plot displays the distribution where each dot represents a value. If a value occurs more than
once, the dots are placed one above the other so that the height of the column of dots represents the
frequency for that value.
The interval plot shows a 95% confidence interval for the mean of each variable.
The Q-Q plot (quantile-quantile plot) can be used to visually assess if a set of data comes from a normal
distribution. Q-Q plots take the sample data, sort it in ascending order, and then plot them against
quantiles (percentiles) calculated from a theoretical distribution. If the data is normally distributed,
the points will fall on or close to the 45-degree reference line. If the data is not normally distributed,
the points will deviate from the reference line.
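The construction of a Q-Q plot can be sketched in a few lines. The example below pairs each sorted observation with the matching quantile of a normal distribution fitted to the sample. The plotting position (i - 0.5) / n is an assumption (several conventions exist, and JASP's exact choice is not stated here), and the function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(values)                    # sort the sample in ascending order
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for i, observed in enumerate(ordered, start=1):
        p = (i - 0.5) / n                       # assumed plotting position
        points.append((fitted.inv_cdf(p), observed))
    return points

# Made-up sample for illustration
for theoretical, observed in qq_points([4, 5, 6, 7, 8]):
    print(f"theoretical = {theoretical:.2f}, observed = {observed}")
```

If the sample is close to normal, the two values in each pair are similar, which is why the points lie near the 45-degree reference line.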
Depending on the data sets, basic correlation graphs and pie charts for non-scale data can also be
produced.
CUSTOMISABLE PLOTS
There are a variety of options depending on your datasets.
The boxplots visualise several statistics described above in one plot:
• Median value
• 25 and 75% quartiles
• Interquartile range (IQR) i.e., 75% – 25% quartile values
• Maximum and minimum values plotted with outliers excluded
• Outliers are shown if requested
[Figure: annotated boxplot labelled with outlier, maximum value, top 25%, 75% quartile, median value, IQR, 25% quartile, bottom 25% and minimum value]
Go back to the statistics options; in Descriptive plots tick both Boxplot and Violin Element and look at how
the plot has changed. Next tick Boxplot, Violin and Jitter Elements. The Violin plot has taken the
smoothed distribution curve from the Distribution plot, rotated it 90° and superimposed it on the
boxplot. The jitter plot has further added all the data points.
[Figure: Boxplot + Violin plot and Boxplot + Violin + Jitter plot]
If your data is split by group, for example, the boxplots for each group will be shown on the same
graph, the colours of each will be different if the Colour palette is ticked. 5 colour palettes are
available.
[Figure: boxplots using the ggplot2 palette and the viridis palette]
Scatter Plots
JASP can produce scatterplots of various types and can include smooth or linear regression lines.
There are also options to add distributions to these either in the form of density plots or histograms.
Tile Heatmap
These plots provide an alternative way of visualising data. For example, the Titanic survival
dataset can be used to look at the relationship between the class of passage and survival.
EDITING PLOTS
Clicking on the drop-down menu provides access to a range of options including Edit Image.
Selecting this option provides some customisation for each graph.
This will open the plot in a new window which allows some modifications of each axis in terms of
axis title and range.
Any changes are then updated in the results window. The new plot can be saved as an image or can
be reset to default values.
Do not forget that group labels can be changed in the spreadsheet editor.
EXPLORING DATA INTEGRITY
Sample data is used to estimate parameters of the population whereby a parameter is a measurable
characteristic of a population, such as a mean, standard deviation, standard error or confidence
intervals etc.
What is the difference between a statistic and a parameter? Suppose you randomly polled a selection of
students about the quality of their student bar and found that 75% of them were happy with it. That
is a sample statistic since only a sample of the population were asked. You calculated what the
population was likely to do based on the sample. If you asked all the students in the university and
90% were happy you have a parameter since you asked the whole university population.
Bias can be defined as the tendency of a measurement to over or underestimate the value of a
population parameter. There are many types of bias that can appear in research design and data
collection including:
• Participant selection bias – some being more likely to be selected for study than others
• Participant exclusion bias – due to the systematic exclusion of certain individuals from the
study
• Analytical bias – due to the way that the results are evaluated
However statistical bias can affect a) parameter estimates, b) standard errors and confidence intervals
or c) test statistics and p values. So how can we check for bias?
IS YOUR DATA CORRECT?
Outliers are data points that are abnormally outside all other data points. Outliers can be due to a
variety of things such as errors in data input or analytical errors at the point of data collection. Boxplots
are an easy way to visualise such data points, where outliers lie above the 75% quartile + 1.5 * IQR
or below the 25% quartile – 1.5 * IQR.
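The 1.5 * IQR rule can be sketched as follows. This is an illustration only: the data are made up, the function name is hypothetical, and the quartiles come from Python's statistics.quantiles with its default method, which may differ slightly from the quartiles JASP reports.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying beyond 1.5 * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25%, 50%, 75% cut points
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Made-up data with one suspicious value
readings = [12, 14, 15, 15, 16, 17, 18, 60]
print(iqr_outliers(readings))
```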
Boxplots show:
• Median value
• 25 & 75% quartiles
• IQR – Inter quartile range
• Max & min values plotted with outliers excluded
• Outliers shown if requested
Load Exploring Data.csv into JASP. Under Descriptives > Descriptive Statistics, add Variable 1 to the
Variables box. In Plots tick the following Boxplots, Label Outliers, and BoxPlot Element.
The resulting Boxplot on the left looks very compressed and an obvious outlier is labelled as being in
row 38 of the dataset. This can be traced back to a data input error in which 91.7 was input instead of
917. The graph on the right shows the BoxPlot for the ‘clean’ data.
How you deal with an outlier depends on the cause. Most parametric tests are highly sensitive to
outliers while non-parametric tests are generally not.
Correct it? – Check the original data to make sure that it isn’t an input error, if it is, correct it, and
rerun the analysis.
Keep it? – Even in datasets of normally distributed data, outliers may be expected for large sample
sizes and should not automatically be discarded if that is the case.
Delete it? – This is a controversial practice in small datasets where a normal distribution cannot be
assumed. Outliers resulting from an instrument reading error may be excluded but they should be
verified first.
Replace it? – Also known as winsorizing. This technique replaces the outlier values with the relevant
maximum and/or minimum values found after excluding the outlier.
Whatever method you use must be justified in your statistical methodology and subsequent analysis.
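As an illustration of the "replace it" option, the sketch below follows the description given above: values outside the chosen limits are replaced by the largest or smallest value that remains once the outliers are excluded. The limits are passed in by the caller (how they are chosen is an assumption left open here), and the function name and data are hypothetical.

```python
def winsorize(values, low_limit, high_limit):
    """Replace values outside the limits with the nearest remaining extreme."""
    kept = [v for v in values if low_limit <= v <= high_limit]
    smallest, largest = min(kept), max(kept)   # extremes after excluding outliers
    return [min(max(v, smallest), largest) for v in values]

# Made-up data; 60 lies above the upper limit and becomes 18
readings = [12, 14, 15, 15, 16, 17, 18, 60]
print(winsorize(readings, low_limit=9, high_limit=23))
```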
WE MAKE MANY ASSUMPTIONS ABOUT OUR DATA.
When using parametric tests, we make a series of assumptions about our data and bias will occur if
these assumptions are violated, in particular:
• Normality
• Homogeneity of variance or homoscedasticity
Many statistical tests are an omnibus of tests of which some will check these assumptions.
TESTING THE ASSUMPTION OF NORMALITY
Normality does not mean necessarily that the data is normally distributed per se but it is whether or
not the dataset can be well modelled by a normal distribution. Normality can be explored in a variety
of ways:
• Numerically
• Visually / graphically
• Statistically
Numerically we can use the Descriptives output to calculate skewness and kurtosis. For a normal data
distribution, both values should be close to zero. To determine the significance of skewness or kurtosis
we calculate their z-scores by dividing them by their associated standard errors:
Skewness Z = skewness / skewness standard error
Kurtosis Z = kurtosis / kurtosis standard error
Z score significance: p < 0.05 if z > 1.96, p < 0.01 if z > 2.58, p < 0.001 if z > 3.29
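The z-score check can be written as a small helper: divide the statistic by its standard error and compare the absolute result with the critical values 1.96, 2.58 and 3.29. The sketch below is in Python with a hypothetical function name; the skewness and standard error passed in are made-up numbers, not output from Exploring data.csv.

```python
def z_significance(statistic, standard_error):
    """Return the z-score and the significance band it falls into."""
    z = statistic / standard_error
    size = abs(z)                      # the sign only shows the direction
    if size > 3.29:
        label = "p < .001"
    elif size > 2.58:
        label = "p < .01"
    elif size > 1.96:
        label = "p < .05"
    else:
        label = "not significant"
    return z, label

# Made-up values: skewness 0.84 with a standard error of 0.34
z, label = z_significance(0.84, 0.34)
print(f"z = {z:.2f} ({label})")
```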
Using Exploring data.csv, go to Descriptives>Descriptive Statistics move Variable 3 to the Variables
box, in the Statistics drop-down menu select Mean, Std Deviation, Skewness and Kurtosis as shown
below with the corresponding output table.
Neither skewness nor kurtosis is close to 0. The positive skewness suggests that data is distributed
more on the left (see graphs later) while the negative kurtosis suggests a flat distribution. When
calculating their z scores it can be seen that the data is significantly skewed p [...]
[...] Independent Samples t-test, move Variable 1 to the Variables
box and Group to the Grouping variable, and tick Assumption Checks > Equality of variances.
In this case, there is no significant difference in variance between the two groups, F(1) = 0.218, p = .643.
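This excerpt does not name the test behind the Equality of variances option. Assuming it is Levene's test, the F statistic can be sketched as a one-way ANOVA carried out on the absolute deviations of each value from its group mean. The function name and data below are hypothetical, and the p-value (which needs the F distribution) is left out of this sketch. Some software centres on the group median instead of the mean, which would give slightly different values.

```python
from statistics import mean


def levene_statistic(*groups):
    """Levene's W (mean-centred): ANOVA F computed on absolute deviations."""
    # Absolute deviation of every observation from its own group mean
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    group_means = [mean(z) for z in deviations]
    n_total = sum(len(z) for z in deviations)
    grand_mean = sum(sum(z) for z in deviations) / n_total
    k = len(groups)

    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    # Degrees of freedom: k - 1 (between) and n_total - k (within)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: same spread, different means -> W is 0
same_spread = levene_statistic([1.0, 2.0, 3.0, 4.0, 5.0],
                               [11.0, 12.0, 13.0, 14.0, 15.0])
# Hypothetical groups: same mean, different spread -> W is larger
different_spread = levene_statistic([4.0, 5.0, 6.0], [1.0, 5.0, 9.0])
print(same_spread, round(different_spread, 3))
```

A difference in means alone does not affect the statistic; only a difference in spread does.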
The assumption of homoscedasticity (equal variance) is important in linear regression models as is
linearity. It assumes that the variance of the data around the regression line is the same for all
predictor data points. Heteroscedasticity (the violation of homoscedasticity) is present when the
variance differs across the values of an independent variable. This can be visually assessed in linear
regression by plotting the residuals against the predicted values.
If homoscedasticity and linearity are not violated there should be no relationship between what the
model predicts and its errors as shown in the graph on the left. Any sort of funnelling (middle graph)
suggests that homoscedasticity has been violated and any curve (right graph) suggests that linearity
assumptions have not been met.
DATA TRANSFORMATION
JASP provides the ability to compute new variables or transform data. In some cases, it may
be useful to compute the differences between repeated measures or, to make a dataset more
normally distributed, you can apply a log transform for example.
When a dataset is opened there will be a plus sign (+) at the end of the columns.
Clicking on the + opens up a small dialogue window where you can:
• Enter the name of a new variable or the transformed variable
• Select whether you enter the R code directly or use the commands built into JASP
• Select what data type is required
Once you have named the new variable and chosen the other options – click create.
If you choose the manual option rather than the R code, this opens all the built-in create and
transform options. Although not obvious, you can scroll the left and right-hand options to see
more variables or more operators respectively.
For example, we want to create a column of data showing the difference between variable 2
and variable 3. Once you have entered the column name in the Create Computed Column
dialogue window, its name will appear in the spreadsheet window. The mathematical
operation now needs to be defined. In this case drag variable 2 into the equation box, drag
the ‘minus’ sign down and then drag in variable 3.
If you have made a mistake, i.e. used the wrong variable or operator, remove it by dragging
the item into the dustbin in the bottom right corner.
When you are happy with the equation/operation, click compute column and the data will be
entered.
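Outside JASP, the same computed column amounts to a row-by-row subtraction. The sketch below uses Python with hypothetical values for the two variables; it is not the code JASP generates.

```python
# Hypothetical values standing in for variable 2 and variable 3
variable_2 = [12, 15, 9, 20]
variable_3 = [10, 15, 11, 14]

# New column: variable 2 minus variable 3, one value per row
difference = [a - b for a, b in zip(variable_2, variable_3)]
print(difference)  # [2, 0, -2, 6]
```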
If you decide that you do not want to keep the derived data, you can remove the column by
clicking the other dustbin icon next to the R.
Another example is to do a log transformation of the data. In the following case variable 1 has
been transformed by scrolling the operators on the left and selecting the log10(y) option.
Replace the “y” with the variable that you want to transform and then click Compute column.
When finished, click the X to close the dialogue.
The Export function will also export any new data variables that have been created.
The two graphs below show the untransformed and the log10 transformed data. The skewed
data has been transformed into a profile with a more normal distribution.
[Figure: Untransformed and Log10 transformed data]
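The effect of a log10 transform on skewed data can also be checked numerically. The Python sketch below uses hypothetical right-skewed values and a simple, uncorrected skewness formula (m3 / m2^1.5), so its numbers will not match the bias-corrected skewness that statistics packages usually report. The transform requires strictly positive values.

```python
import math


def skewness(values):
    """Uncorrected sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data (all values must be > 0 for log10)
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log10(v) for v in raw]

print(f"skewness before transform: {skewness(raw):.2f}")
print(f"skewness after transform:  {skewness(logged):.2f}")
```

The transformed values are still somewhat skewed in this example, but markedly less so than the raw values.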
EFFECT SIZE
When performing a hypothesis test on data we determine the relevant statistic (r, t, F etc) and p-value
to decide whether or not to reject the null hypothesis. A small p-value (<0.05) provides evidence
against the null hypothesis, whereas a large p-value (>0.05) only means that there is
not enough evidence to reject the null hypothesis. A lower p-value is sometimes incorrectly
interpreted as meaning there is a stronger relationship or difference between variables. So what is
needed is not just null hypothesis testing but also a method of determining precisely how large the
effects seen in the data are.
An effect size is a statistical measure used to determine the strength of the relationship or difference
between variables. Unlike a p-value, effect sizes can be used to quantitatively compare the results
of different studies.
For example, comparing heights between 11 and 12-year-old children may show that the 12-year-olds
are significantly taller but it is difficult to visually see a difference, i.e. a small effect size. However,
a significant difference in height between 11 and 16-year-old children is obvious to see (large effect
size).
The effect size is usually measured in three ways:
• the standardized mean difference
• correlation coefficient
• odds ratio
When looking at differences between groups most techniques are primarily based on the differences
between the means divided by the average standard deviations. The values derived can then be used
to describe the magnitude of the differences. The effect sizes calculated in JASP for t-tests and ANOVA
are shown below:
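Separately from the JASP table, the standardized mean difference itself can be sketched in Python. The example assumes the measure meant is Cohen's d with a pooled standard deviation, which is one common choice; the function name and the data are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference between means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_variance = ((n_a - 1) * variance(group_a)
                       + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical scores for two groups
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(f"Cohen's d = {d:.2f}")  # Cohen's d = 0.50
```

Here the means differ by 1 and the pooled standard deviation is 2, so the groups are half a standard deviation apart.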
When analysing bivariate or multivariate relationships the effect sizes are the correlation
coefficients:
When analysing categorical relationships via contingency tables, i.e. the chi-square test, Phi is only
used for 2×2 tables while Cramer’s V can be used for any table size.
For a 2 × 2 contingency table, we can also define the odds ratio measure of effect size.
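These three measures can be computed directly from the cell counts. The Python sketch below uses a hypothetical 2 × 2 table and the uncorrected chi-square statistic (no continuity correction); the function names are chosen for the example only. For a 2 × 2 table Cramer's V has the same magnitude as Phi.

```python
import math


def chi_square(table):
    """Pearson chi-square for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell is row total * column total / n
    return sum((obs - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for obs, c in zip(row, col_totals))


def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (n * (min(rows, columns) - 1)))."""
    n = sum(sum(row) for row in table)
    smaller_dimension = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * smaller_dimension))


def odds_ratio(table):
    """Odds ratio (a * d) / (b * c) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical 2 x 2 table of counts
counts = [[30, 10],
          [20, 40]]
print(f"chi-square = {chi_square(counts):.2f}")
print(f"Phi / Cramer's V = {cramers_v(counts):.3f}")
print(f"odds ratio = {odds_ratio(counts):.1f}")
```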
ONE SAMPLE T-TEST
Research is normally carried out in sample populations, but how closely does the sample reflect the
whole population? The parametric one-sample t-test determines whether the sample mean is
statistically different from a known or hypothesized population mean.
The null hypothesis (H0) tested is that the sample mean is equal to the population mean.
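The test statistic behind this comparison is t = (sample mean - population mean) / (standard deviation / √n), with n - 1 degrees of freedom. A minimal Python sketch follows; the function name and the sample values are hypothetical, and the p-value (which needs the t distribution) is not calculated here.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, population_mean):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev uses n - 1
    t = (mean(sample) - population_mean) / standard_error
    return t, n - 1


# Hypothetical sample tested against a hypothesized population mean of 2
t, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], population_mean=2.0)
print(f"t({df}) = {t:.3f}")  # t(4) = 1.414
```

A t value close to zero indicates that the sample mean sits close to the test value relative to its standard error.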
ASSUMPTIONS
Four assumptions are required for a one-sample t-test to provide a valid result:
• The test variable should be measured on a continuous scale.
• The test variable data should be independent i.e. no relationship between any of the data
points.
• The data should be approximately normally distributed.
• There should be no significant outliers.
RUNNING THE ONE SAMPLE T-TEST
Open one sample t-test.csv. This contains two columns of data representing the height (cm) and body
masses (kg) of a sample population of males used in a study. In 2017 the average adult male in the UK
population was 178 cm tall and had a body mass of 83.4 kg.
Go to T-Tests > One-Sample t-test and in the first instance add height to the analysis box on the right.
Then tick the following options above and add 178 as the test value:
UNDERSTANDING THE OUTPUT
The output should contain three tables and two graphs.
The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the heights are
normally distributed, therefore this assumption is not violated. If this showed a significant difference
the analysis should be repeated using the non-parametric equivalent, Wilcoxon’s signed-rank test
tested against the population median height.
This table shows that there are no significant differences between the means, p = .706.
The descriptive data shows that the mean height of the sample population was 177.6 cm compared
to the average 178 cm UK male.
The two plots show essentially the same data but in different ways. The standard Descriptive plot is
an Interval plot showing the sample mean (black bullet), the 95% confidence interval (whiskers),
relative to the test value (dashed line).
The Raincloud Plot shows the data as individual data points, boxplot, and the distribution plot. This
can be shown as either a vertical or horizontal display.
Repeat the procedure by replacing height with mass and changing the test value to 83.4.
The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the masses are
normally distributed.
This table shows that there is a significant difference between the mean sample (72.9 kg) and
population body mass (83.4 kg) p