UV Statistics a Study on Interaction Effect of Gender Project

For the culminating activity, you will build upon the previous weekly assignments. You will use a self-selected, pre-approved data set to complete this assignment. After reviewing the data, complete the following:

Write at least 1 well-articulated RQ that addresses the data set – incorporating a null and alternate hypothesis.

Conduct a full analysis based on your RQ using JASP.

Upon completion of the analysis, write a mini-Chapter Four (Research Findings). As noted in the Doctoral Research Handbook, this will include:

  • Introduction
  • Participants and Research Setting
  • Analyses of Research Questions (one at a time)
  • Supplementary Findings (if any)
  • Summary

Resource materials:

  • Guided Notes (attached below)
  • MANOVA Introduction I & II (attached below)
  • Learning Statistics with JASP, Chapter 12, pp. 293-326 (attached below)
  • Statistical Analysis in JASP: A Students-Guide, pp. 91-102 (attached below)
  • American Psychological Association (2019). Chapter 7 Tables and Figures. In the American Psychological Association APA Manual (7th Ed.), pp. 195-250. American Psychological Association

    5th Edition JASP v0.16.1 2022
    Copyright © 2020 by Mark A Goss-Sampson.
    Licenced as CC BY 4.0
    All rights reserved. This book or any portion thereof may not be reproduced or used in any manner
    whatsoever without the express written permission of the author except for research, education or
    private study.
    CONTENTS
    PREFACE
    USING THE JASP ENVIRONMENT
    DATA HANDLING IN JASP
    JASP ANALYSIS MENU
    DESCRIPTIVE STATISTICS
    SPLITTING DATA FILES
    DESCRIPTIVE DATA VISUALISATION
    BASIC PLOTS
    CUSTOMISABLE PLOTS
    EDITING PLOTS
    EXPLORING DATA INTEGRITY
    DATA TRANSFORMATION
    EFFECT SIZE
    ONE SAMPLE T-TEST
    BINOMIAL TEST
    MULTINOMIAL TEST
    CHI-SQUARE ‘GOODNESS-OF-FIT’ TEST
    MULTINOMIAL AND χ² ‘GOODNESS-OF-FIT’ TEST
    COMPARING TWO INDEPENDENT GROUPS
    INDEPENDENT T-TEST
    MANN-WHITNEY U TEST
    COMPARING TWO RELATED GROUPS
    PAIRED SAMPLES T-TEST
    WILCOXON’S SIGNED RANK TEST
    CORRELATION ANALYSIS
    REGRESSION
    SIMPLE REGRESSION
    MULTIPLE REGRESSION
    LOGISTIC REGRESSION
    COMPARING MORE THAN TWO INDEPENDENT GROUPS
    ANOVA
    KRUSKAL-WALLIS – NON-PARAMETRIC ANOVA
    COMPARING MORE THAN TWO RELATED GROUPS
    RMANOVA
    FRIEDMAN’S REPEATED MEASURES ANOVA
    COMPARING INDEPENDENT GROUPS AND THE EFFECTS OF COVARIATES
    ANCOVA
    TWO-WAY INDEPENDENT ANOVA
    TWO-WAY REPEATED MEASURES ANOVA
    MIXED FACTOR ANOVA
    CHI-SQUARE TEST FOR ASSOCIATION
    META-ANALYSIS IN JASP
    EXPERIMENTAL DESIGN AND DATA LAYOUT IN EXCEL FOR JASP IMPORT
    Independent t-test
    Paired samples t-test
    Correlation
    Logistic Regression
    One-way Independent ANOVA
    One-way repeated measures ANOVA
    Two-way Independent ANOVA
    Two-way Repeated measures ANOVA
    Two-way Mixed Factor ANOVA
    Chi-squared – Contingency tables
    SOME CONCEPTS IN FREQUENTIST STATISTICS
    WHICH TEST SHOULD I USE?
    Comparing one sample to a known or hypothesized population mean
    Testing relationships between two or more variables
    Predicting outcomes
    Testing for differences between two independent groups
    Testing for differences between two related groups
    Testing for differences between three or more independent groups
    Testing for differences between three or more related groups
    Test for interactions between 2 or more independent variables
    PREFACE
    JASP stands for Jeffreys’s Amazing Statistics Program in recognition of the pioneer of Bayesian
    inference Sir Harold Jeffreys. This is a free multi-platform open-source statistics package, developed
    and continually updated by a group of researchers at the University of Amsterdam. They aimed to
    develop a free, open-source programme that includes both standard and more advanced statistical
    techniques with a major emphasis on providing a simple intuitive user interface.
    In contrast to many statistical packages, JASP provides a simple drag and drop interface, easy access
    menus, intuitive analysis with real-time computation and display of all results. All tables and graphs
    are presented in APA format and can be copied directly and/or saved independently. Tables can also
    be exported from JASP in LaTeX format.
    JASP can be downloaded free from the website https://jasp-stats.org/ and is available for Windows,
    Mac OS X and Linux. You can also download a pre-installed Windows version that will run directly from
    a USB or external hard drive without the need to install it locally. The WIX installer for Windows
    enables you to choose a path for the installation of JASP – however, this may be blocked in some
    institutions by local Administrative rights.
    The programme also includes a data library with an initial collection of over 50 datasets from Andy
    Field’s book, Discovering Statistics Using IBM SPSS Statistics [1], and Introduction to the Practice of
    Statistics [2] by Moore, McCabe and Craig.
    Keep an eye on the JASP site since there are regular updates as well as helpful videos and blog posts!!
    This book is a collection of standalone handouts covering the most common standard (frequentist)
    statistical analyses used by students studying Human Sciences. Datasets used in this document are
    available for download from https://osf.io/bx6uv/
    I would also like to thank Per Palmgren from the Karolinska Institutet in Sweden for his helpful
    comments, suggestions and proofreading of this guide.
    Dr Mark Goss-Sampson
    Centre for Exercise Activity and Rehabilitation
    University of Greenwich
    2022
    [1] A Field. (2017) Discovering Statistics Using IBM SPSS Statistics (5th Ed.) SAGE Publications.
    [2] D Moore, G McCabe, B Craig. (2011) Introduction to the Practice of Statistics (7th Ed.) W H Freeman.
    1|Page
    JASP 0.16.1 – Dr Mark Goss-Sampson
    USING THE JASP ENVIRONMENT
    Open JASP.
    The main menu can be accessed by clicking on the top-left icon.
    Open:
    JASP has its own .jasp format but can open a variety of different dataset formats such as:
    • .csv (comma-separated values), which can be saved in Excel
    • .txt (plain text), which can also be saved in Excel
    • .tsv (tab-separated values), which can also be saved in Excel
    • .sav (IBM SPSS data file)
    • .ods (Open Document spreadsheet)
    • .dta (Stata data file)
    • .por (SPSS ASCII file)
    • .sas7bdat / .sas7bcat (SAS data files)
    • .xpt (SAS transport file)
    You can open recent files, browse your computer files, access the Open Science Framework (OSF) or
    open the wide range of examples that are packaged with the Data Library in JASP.
    Save/Save as:
    Using these options, the data file, any annotations and the analysis
    can be saved in the .jasp format.
    Export:
    Results can be exported to either an HTML file or a PDF.
    Data can be exported to a .csv, .tsv or .txt file.
    Sync data:
    Used to synchronize with any updates in the current data file (Ctrl-Y
    can also be used).
    Close:
    As it states, this closes the current file but not JASP.
    Preferences:
    There are four sections that users can use to tweak JASP to suit their needs.
    In the Data Preferences section users can:
    • Synchronize/update the data automatically when the data file is saved (default)
    • Set the default spreadsheet editor (e.g. Excel, SPSS)
    • Change the threshold so that JASP more readily distinguishes between nominal and scale data
    • Add a custom missing value code
    In the Results Preferences section users can:
    • Set JASP to return exact p values, i.e. P=0.00087 rather than P<.001
    […] Export Results.
    The Add notes menu provides many options to change text font, colour, size etc.
    You can change the size of all the tables and graphs using ctrl+ (increase), ctrl- (decrease) and ctrl= (back
    to default size). Graphs can also be resized by dragging the bottom right corner of the graph.
    As previously mentioned, all tables and figures are APA standard and can just be copied into any other
    document. All images can be copied/saved with either a white or transparent background; this
    can be selected in Preferences > Advanced as described earlier.
    There are many further resources on using JASP on the website https://jasp-stats.org/
    DATA HANDLING IN JASP
    For this section open England injuries.csv
    All files must have a header label in the first row. Once loaded, the dataset appears in the window:
    For large datasets, there is a hand icon that allows easy scrolling through the data.
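Because every file needs a header label in the first row, it can be worth checking a data file before importing it. The sketch below is one way to do that with Python's standard csv module. The column names and values are hypothetical (they are not the contents of England injuries.csv), and has_header is an illustrative helper, not part of JASP.

```python
import csv
import io

# Hypothetical column labels and rows; not the contents of England injuries.csv.
HEADER = ["Country", "Injuries"]
ROWS = [["England", 12], ["Wales", 7], ["Scotland", 9]]


def to_csv_text(header, rows):
    """Return CSV text whose first row holds the column labels."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)   # first row: header labels
    writer.writerows(rows)    # remaining rows: one observation per line
    return buffer.getvalue()


def has_header(csv_text):
    """Rough check: every cell in the first row is non-empty and non-numeric."""
    first_row = next(csv.reader(io.StringIO(csv_text)))

    def is_number(cell):
        try:
            float(cell)
            return True
        except ValueError:
            return False

    return all(cell.strip() and not is_number(cell) for cell in first_row)


csv_text = to_csv_text(HEADER, ROWS)
print(csv_text)
print("Header row present:", has_header(csv_text))
```

The check is deliberately crude: a file whose first row contains only numbers is flagged as having no header, which is the most common way a header goes missing.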
    On import JASP makes a best guess at assigning data to the different variable types:
    Nominal
    Ordinal
    Continuous
    If JASP has incorrectly identified the data type just click on the appropriate variable data icon in the
    column title to change it to the correct format.
    If you have coded the data you can click on the variable name to open up the following window in
    which you can label each code. These labels now replace the codes in the spreadsheet view. If you
    save this as a .jasp file these codes, as well as all analyses and notes, will be saved automatically. This
    makes the data analysis fully reproducible.
    In this window, you can also carry out simple filtering of data. For example, if you untick the Wales
    label, it will not be used in subsequent analyses.
    Clicking this icon in the spreadsheet window opens up a much more comprehensive set of data
    filtering options:
    Using this option will not be covered in this document. For detailed information on using more
    complex filters refer to the following link: https://jasp-stats.org/2018/06/27/how-to-filter-your-data-in-jasp/
    By default, JASP plots data in the Value order (i.e. 1-4). The order can be changed by highlighting the
    label and moving it up or down using the appropriate arrows:
    Move up
    Move down
    Reverse order
    Close
    If you need to edit the data in the spreadsheet just double click on a cell and the data should open up
    in the original spreadsheet, e.g. Excel. Once you have edited your data and saved the original
    spreadsheet JASP will automatically update to reflect the changes that were made, provided that you
    have not changed the file name.
    JASP ANALYSIS MENU
    The main analysis options can be accessed from the main toolbar. Currently, JASP offers the following
    frequentist (parametric and non-parametric standard statistics) and alternative Bayesian tests:
    Descriptives
    • Descriptive stats
    T-Tests
    • Independent
    • Paired
    • One sample
    ANOVA
    • Independent
    • Repeated measures
    • ANCOVA
    • MANOVA *
    Mixed Models*
    • Linear Mixed Models
    • Generalised linear mixed models
    Regression
    • Correlation
    • Linear regression
    • Logistic regression
    Frequencies
    • Binomial test
    • Multinomial test
    • Contingency tables
    • Log-linear regression*
    Factor
    • Principal Component Analysis (PCA)*
    • Exploratory Factor Analysis (EFA)*
    • Confirmatory Factor Analysis (CFA)*
    * Not covered in this guide
    By clicking on the + icon on the top-right menu bar you can also access advanced options that allow
    the addition of optional modules. Once ticked they will be added to the main analysis ribbon. These
    include:
    Audit
    Network analysis
    BAIN
    Prophet
    Circular Statistics
    Reliability analysis
    Distributions
    SEM
    Cochrane Meta-Analyses
    Summary statistics
    Equivalence tests
    Visual modelling
    JAGS
    Learning Bayes
    Machine learning
    R (beta)
    Meta-analysis (included in this guide)
    See the JASP website for more information on these advanced modules
    Once you have selected your required analysis all the possible statistical options appear in the left
    window and output in the right window.
    JASP provides the ability to rename and ‘stack’ the results output thereby organising multiple
    analyses.
    The individual analyses can be renamed using the pen icon or deleted using the red cross.
    Clicking on the analysis in this list will then take you to the appropriate part of the results output
    window. They can also be rearranged by dragging and dropping each of the analyses.
    The green + icon produces a copy of the chosen analysis.
    The blue information icon provides detailed information on each of the statistical procedures used
    and includes a search option.
    DESCRIPTIVE STATISTICS
    Raw data presented in full is very difficult for a reader to visualise or draw inferences from.
    Descriptive statistics and related plots are a succinct way of describing and summarising data but do
    not test any hypotheses. There are various types of statistics that are used to describe data:
    • Measures of central tendency
    • Measures of dispersion
    • Percentile values
    • Measures of distribution
    • Descriptive plots
    To explore these measures, load Descriptive data.csv into JASP. Go to Descriptives > Descriptive
    statistics and move the Variable data to the Variables box on the right.
    You also have options to change and add tables in this section:
    • Split analyses by a categorical variable (i.e., group)
    • Transpose the main descriptive table (switch columns and rows)
    • Add frequency tables (important for categorical data)
    • Add stem and leaf tables, which show all numeric observations from small to large. The
      observations are split into a “stem”, the first digit(s), and a “leaf”, the subsequent digit.
    The Statistics menu can now be opened to see the various options available.
    CENTRAL TENDENCY.
    This can be defined as the tendency for variable values to cluster around a central value. The three
    ways of describing this central value are mean, median or mode. If the whole population is considered
    the term population mean/median/mode is used. If a sample/subset of the population is being
    analysed the term sample mean/median/mode is used. The measures of central tendency move
    toward a constant value when the sample size is sufficient to be representative of the population.
    The mean, M or x̅ (17.71) is equal to the sum of all the values divided by the number of values in the
    dataset i.e. the average of the values. It is used for describing continuous data. It provides a simple
    statistical model of the centre of distribution of the values and is a theoretical estimate of the ‘typical
    value’. However, it can be influenced heavily by ‘extreme’ scores.
    The median, Mdn (17.9) is the middle value in a dataset that has been ordered from the smallest to
    largest value and is the normal measure used for ordinal or non-parametric continuous data. It is less
    sensitive to outliers and skewed data than the mean.
    The mode (20.0) is the most frequent value in the dataset and is usually the highest bar in a distribution
    histogram.
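The three measures can be compared on a small example. This is a minimal sketch using Python's standard statistics module. The scores are hypothetical (they are not the Descriptive data.csv values) and include one deliberately extreme score to show how it pulls the mean but not the median.

```python
import statistics

# Hypothetical sample; 41 is a deliberately extreme score.
scores = [12, 15, 15, 18, 20, 20, 20, 41]

mean = statistics.mean(scores)      # sum of the values / number of values
median = statistics.median(scores)  # middle of the ordered values
mode = statistics.mode(scores)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")

# Dropping the extreme score moves the mean far more than the median.
trimmed = scores[:-1]
print(f"without 41: mean = {statistics.mean(trimmed):.2f}, "
      f"median = {statistics.median(trimmed)}")
```

With the extreme score included the mean is 20.125 and the median 19.0; without it the mean falls to about 17.14 while the median only moves to 18.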
    DISPERSION
    The standard error of the mean, SE (0.244) is a measure of how far the sample mean of the data is
    expected to be from the true population mean. As the size of the sample data grows larger the SE
    decreases relative to the SD and the true mean of the population is estimated with greater precision.
    Standard deviation, S or SD (6.935) is used to quantify the amount of dispersion of data values around
    the mean. A low standard deviation indicates that the values are close to the mean, while a high
    standard deviation indicates that the values are dispersed over a wider range.
    The coefficient of variation (0.392) provides the relative dispersion of the data, in contrast to the
    standard deviation, which gives the absolute dispersion.
    MAD (4.7), the median absolute deviation, is a robust measure of the spread of data. It is relatively
    unaffected by data that is not normally distributed. Reporting median +/- MAD for data that is not
    normally distributed is equivalent to mean +/- SD for normally distributed data.
    MAD Robust: (6.968) median absolute deviation of the data points, adjusted by a factor for
    asymptotically normal consistency.
    IQR (9.175) Interquartile Range is similar to the MAD but is less robust (see Boxplots).
    Variance (48.1) is another estimate of how far the data is spread from the mean. It is also the square
    of the standard deviation.
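The relationships between these measures can be sketched in a few lines of Python. The function name and the sample are hypothetical. The factor 1.4826 used for the robust MAD is an assumption: it is the usual constant for normal consistency, and it agrees with the values quoted above, but this guide does not state which constant JASP uses.

```python
import math
import statistics


def dispersion_summary(values):
    """Return the dispersion measures described above for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    return {
        "variance": statistics.variance(values),  # equals sd squared
        "sd": sd,
        "se": sd / math.sqrt(n),             # standard error of the mean
        "cv": sd / mean,                     # coefficient of variation
        "mad": mad,                          # median absolute deviation
        "mad_robust": 1.4826 * mad,          # assumed normal-consistency factor
    }


# Hypothetical sample
summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(f"{name:>10}: {value:.3f}")
```

The same relationships hold for the figures reported above: 6.935 / 17.71 ≈ 0.392 (coefficient of variation), 6.935 squared ≈ 48.1 (variance) and 1.4826 × 4.7 ≈ 6.968 (MAD Robust).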
    Confidence intervals (CI), although not shown in the general Descriptive statistics output, are
    used in many other statistical tests. When sampling from a population to get an estimate of the mean,
    confidence intervals are a range of values within which you are n% confident the true mean is
    included. A 95% CI is, therefore, a range of values that one can be 95% certain contains the true mean
    of the population. This is not the same as a range that contains 95% of ALL the values.
    For example, in a normal distribution, 95% of the data are expected to be within ± 1.96 SD of the mean
    and 99% within ± 2.576 SD.
    95% CI = M ± 1.96 * the standard error of the mean.
    Based on the data so far, M = 17.71, SE = 0.24, this will be 17.71 ± (1.96 * 0.24) or 17.71 ± 0.47.
    Therefore the 95% CI for this dataset is 17.24 – 18.18. If the sampling were repeated many times,
    about 95% of the intervals calculated in this way would be expected to contain the true mean.
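The arithmetic above can be reproduced as follows. This is a sketch based on the normal approximation, using NormalDist from Python's standard library to obtain the multiplier (about 1.96 for 95%) rather than typing it in. The function name is hypothetical.

```python
from statistics import NormalDist


def mean_confidence_interval(mean, se, level=0.95):
    """Normal-approximation confidence interval: mean ± z * SE."""
    # Two-sided critical value, about 1.96 for 95% and 2.576 for 99%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * se
    return mean - margin, mean + margin


low, high = mean_confidence_interval(17.71, 0.24)
print(f"95% CI: {low:.2f} to {high:.2f}")  # prints: 95% CI: 17.24 to 18.18
```

Passing level=0.99 gives the wider interval based on ± 2.576 standard errors.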
    QUARTILES
    In the Statistics options make sure that everything is unticked apart from Quartiles.
    Quartiles are where datasets are split into 4 equal quarters, normally based on rank ordering of
    median values. For example, in this dataset
    1 1 2 2 3 3 4 4 4 4 5 5 5 6 7 8 8 9 10 10 10
    (cut points: 25% between the two 3s; 50% at the 11th value, 5; 75% between the two 8s)
    The median value that splits data by 50% = 50th percentile = 5
    The median value of left side = 25th percentile = 3
    The median value of right side = 75th percentile = 8
    From this the interquartile range (IQR) can be calculated; this is the difference between the 75th
    and 25th percentiles, i.e. 8 – 3 = 5. These values are used to construct the descriptive boxplots later. The IQR
    can also be shown by ticking this option in the Dispersion menu.
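The "median of each half" rule used in this example can be sketched in Python. The function name is hypothetical, and statistical packages (JASP included) may use a different quartile definition that gives slightly different values on other datasets.

```python
import statistics

data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5,
        5, 5, 6, 7, 8, 8, 9, 10, 10, 10]


def quartiles_by_halves(values):
    """Q2 is the median; Q1 and Q3 are the medians of the lower and upper halves.

    For an odd number of values the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print(f"25th = {q1}, 50th = {q2}, 75th = {q3}, IQR = {iqr}")
```

For the 21 values above this returns 3, 5 and 8, giving an IQR of 5.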
    DISTRIBUTION
    Skewness describes the shift of the distribution away from a normal distribution. Negative skewness
    shows that the mode moves to the right resulting in a dominant left tail. Positive skewness shows
    that the mode moves to the left resulting in a dominant right tail.
    [Figure: example distributions showing negative skewness and positive skewness]
    Kurtosis describes how heavy or light the tails are. Positive kurtosis results in an increase in the
    “pointiness” of the distribution with heavy (longer) tails while negative kurtosis exhibits a much more
    uniform or flatter distribution with light (shorter) tails.
    [Figure: example distributions showing positive kurtosis, a normal distribution and negative kurtosis]
    In the Statistics options make sure that everything is unticked apart from skewness, kurtosis and
    Shapiro-Wilk test.
    We can use the Descriptives output to calculate skewness and kurtosis. For a normal data distribution,
    both values should be close to zero. The Shapiro-Wilk test is used to assess if the data is significantly
    different from a normal distribution. (see – Exploring data integrity in JASP for more details).
    SPLITTING DATA FILES
    If there is a grouping variable (categorical or ordinal) descriptive statistics and plots can be produced
    for each group. Using Descriptive data.csv with the variable data in the Variables box now add Group
    to the Split box.
    DESCRIPTIVE DATA VISUALISATION
    JASP produces a comprehensive range of descriptive and analysis specific plots. The analysis specific
    plots will be explained in their relevant chapters
    BASIC PLOTS
    Firstly, to look at examples of the basic plots, open Descriptive data.csv with the variable data in the
    Variables box, go to Plots and tick Distribution plots, Display density, Interval plots, Q-Q plots, and dot
    plots.
    The Distribution plot is based on splitting the data into frequency bins; this is then overlaid with the
    distribution curve. As mentioned before, the highest bar is the mode (the most frequent value of the
    dataset). In this case, the curve looks approximately symmetrical suggesting that the data is
    approximately normally distributed. The second distribution plot is from another dataset which shows
    that the data is positively skewed.
    The dot plot displays the distribution where each dot represents a value. If a value occurs more than
    once, the dots are placed one above the other so that the height of the column of dots represents the
    frequency for that value.
    The interval plot shows a 95% confidence interval for the mean of each variable.
    The Q-Q plot (quantile-quantile plot) can be used to visually assess if a set of data comes from a normal
    distribution. Q-Q plots take the sample data, sort it in ascending order, and then plot them against
    quantiles (percentiles) calculated from a theoretical distribution. If the data is normally distributed,
    the points will fall on or close to the 45-degree reference line. If the data is not normally distributed,
    the points will deviate from the reference line.
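The construction of a Q-Q plot can be sketched as below. The plotting position (i - 0.5) / n and the choice of a reference normal distribution with the sample's own mean and SD are assumptions made for this illustration, and JASP may compute its quantiles differently. The sample values and function name are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each ordered observation with its theoretical normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    # Reference normal distribution with the sample's own mean and SD.
    reference = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for i, observed in enumerate(ordered, start=1):
        probability = (i - 0.5) / n        # assumed plotting position
        points.append((reference.inv_cdf(probability), observed))
    return points


# Hypothetical sample
sample = [14.2, 15.1, 16.8, 17.0, 17.9, 18.3, 19.5, 20.0, 21.4, 23.6]
for theoretical, observed in qq_points(sample):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Plotting the second column against the first gives the Q-Q plot; pairs with similar values in both columns lie close to the 45-degree line.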
    Depending on the data sets, basic correlation graphs and pie charts for non-scale data can also be
    produced.
    CUSTOMISABLE PLOTS
    There are a variety of options depending on your datasets.
    The boxplots visualise several statistics described above in one plot:
    • Median value
    • 25 and 75% quartiles
    • Interquartile range (IQR) i.e., 75% – 25% quartile values
    • Maximum and minimum values plotted with outliers excluded
    • Outliers are shown if requested
    [Boxplot diagram, labelled from top to bottom: Outlier, Maximum value, Top 25%, 75% quartile, Median value, IQR, 25% quartile, Bottom 25%, Minimum value]
    Go back to the statistics options and, in Descriptive plots, tick both Boxplot and Violin Element; look at how
    the plot has changed. Next tick Boxplot, Violin and Jitter Elements. The Violin plot has taken the
    smoothed distribution curve from the Distribution plot, rotated it 90° and superimposed it on the
    boxplot. The jitter plot has further added all the data points.
    [Figures: Boxplot + Violin plot; Boxplot + Violin + Jitter plot]
    If your data is split by group, the boxplots for each group will be shown on the same
    graph; the colours of each will be different if the Colour palette is ticked. Five colour palettes are
    available.
    [Figures: ggplot2 palette; Viridis palette]
    Scatter Plots
    JASP can produce scatterplots of various types and can include smooth or linear regression lines.
    There are also options to add distributions to these either in the form of density plots or histograms.
    Tile Heatmap
    These plots provide an alternative way of visualising data, for example, using the Titanic survival
    dataset to look at the relationship between the class of passage and survival.
    EDITING PLOTS
    Clicking on the drop-down menu provides access to a range of options including Edit Image.
    Selecting this option provides some customisation for each graph.
    This will open the plot in a new window which allows some modifications of each axis in terms of
    axis title and range.
    Any changes are then updated in the results window. The new plot can be saved as an image or can
    be reset to default values.
    Do not forget that group labels can be changed in the spreadsheet editor.
    EXPLORING DATA INTEGRITY
    Sample data is used to estimate parameters of the population whereby a parameter is a measurable
    characteristic of a population, such as a mean, standard deviation, standard error or confidence
    intervals etc.
    What is the difference between a statistic and a parameter? Suppose you randomly polled a selection of
    students about the quality of their student bar and found that 75% of them were happy with it. That
    is a sample statistic since only a sample of the population was asked. You calculated what the
    population was likely to do based on the sample. If you asked all the students in the university and
    90% were happy, you have a parameter since you asked the whole university population.
    Bias can be defined as the tendency of a measurement to over or underestimate the value of a
    population parameter. There are many types of bias that can appear in research design and data
    collection including:
    • Participant selection bias – some being more likely to be selected for study than others
    • Participant exclusion bias – due to the systematic exclusion of certain individuals from the study
    • Analytical bias – due to the way that the results are evaluated
    However statistical bias can affect a) parameter estimates, b) standard errors and confidence intervals
    or c) test statistics and p values. So how can we check for bias?
    IS YOUR DATA CORRECT?
    Outliers are data points that lie abnormally far from all other data points. Outliers can be due to a
    variety of things such as errors in data input or analytical errors at the point of data collection. Boxplots
    are an easy way to visualise such data points, where outliers lie above the 75% quartile + 1.5 * IQR
    or below the 25% quartile – 1.5 * IQR.
    Boxplots show:
    • Median value
    • 25 & 75% quartiles
    • IQR – Inter quartile range
    • Max & min values plotted with outliers excluded
    • Outliers shown if requested
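The 1.5 * IQR rule for flagging outliers can be sketched as follows. The data and function names are hypothetical. The quartiles come from Python's statistics.quantiles with its default method, which is an assumption: JASP may calculate quartiles slightly differently, so borderline points could be classified differently.

```python
import statistics


def outlier_limits(values, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying outside the limits."""
    lower, upper = outlier_limits(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical data with one suspicious entry
data = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 30]
print(outlier_limits(data))   # (0.5, 12.5)
print(find_outliers(data))    # [30]
```

Here the quartiles are 5 and 8, so the IQR is 3 and anything below 0.5 or above 12.5 is flagged.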
    Load Exploring Data.csv into JASP. Under Descriptives > Descriptive Statistics, add Variable 1 to the
    Variables box. In Plots tick the following Boxplots, Label Outliers, and BoxPlot Element.
    The resulting Boxplot on the left looks very compressed and an obvious outlier is labelled as being in
    row 38 of the dataset. This can be traced back to a data input error in which 91.7 was input instead of
    917. The graph on the right shows the BoxPlot for the ‘clean’ data.
    How you deal with an outlier depends on the cause. Most parametric tests are highly sensitive to
    outliers while non-parametric tests are generally not.
    Correct it? – Check the original data to make sure that it isn’t an input error, if it is, correct it, and
    rerun the analysis.
    Keep it? – Even in datasets of normally distributed data, outliers may be expected for large sample
    sizes and should not automatically be discarded if that is the case.
    Delete it? – This is a controversial practice in small datasets where a normal distribution cannot be
    assumed. Outliers resulting from an instrument reading error may be excluded but they should be
    verified first.
    Replace it? – Also known as winsorizing. This technique replaces the outlier values with the relevant
    maximum and/or minimum values found after excluding the outlier.
    Whatever method you use must be justified in your statistical methodology and subsequent analysis.
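As an illustration of the "replace it" option, the sketch below winsorizes a hypothetical dataset: values beyond the 1.5 * IQR limits are replaced by the largest or smallest value that remains once those outliers are excluded. The function name, the data and the use of statistics.quantiles with its default method are assumptions made for this example, not a description of what JASP does.

```python
import statistics


def winsorize(values, k=1.5):
    """Replace values beyond the k*IQR limits with the nearest retained value."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]   # outliers excluded
    low_cap, high_cap = min(kept), max(kept)
    return [min(max(v, low_cap), high_cap) for v in values]


data = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 30]
print(winsorize(data))  # [4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
```

The outlying 30 is replaced by 9, the maximum of the remaining values; data without outliers is returned unchanged.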
    WE MAKE MANY ASSUMPTIONS ABOUT OUR DATA.
    When using parametric tests, we make a series of assumptions about our data and bias will occur if
    these assumptions are violated, in particular:
    • Normality
    • Homogeneity of variance or homoscedasticity
    Many statistical tests are an omnibus of tests of which some will check these assumptions.
    TESTING THE ASSUMPTION OF NORMALITY
    Normality does not necessarily mean that the data is normally distributed per se but whether or
    not the dataset can be well modelled by a normal distribution. Normality can be explored in a variety
    of ways:
    • Numerically
    • Visually / graphically
    • Statistically
    Numerically we can use the Descriptives output to calculate skewness and kurtosis. For a normal data
    distribution, both values should be close to zero. To determine the significance of skewness or kurtosis
    we calculate their z-scores by dividing them by their associated standard errors:
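    $$Z_{\text{skewness}} = \frac{\text{skewness}}{\text{standard error of skewness}}$$
    $$Z_{\text{kurtosis}} = \frac{\text{kurtosis}}{\text{standard error of kurtosis}}$$
    Z score significance: p < 0.05 if |Z| > 1.96; p < 0.01 if |Z| > 2.58; p < 0.001 if |Z| > 3.29.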
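The z-score check for skewness and kurtosis can be sketched as follows. The thresholds 1.96, 2.58 and 3.29 are the ones listed in the text; the function names and the input values are hypothetical, standing in for numbers read off a Descriptives table.

```python
def z_score(statistic, standard_error):
    """Divide a skewness or kurtosis value by its standard error."""
    return statistic / standard_error


def significance_level(z):
    """Map |z| onto the conventional two-tailed significance bands."""
    magnitude = abs(z)
    if magnitude > 3.29:
        return "p < .001"
    if magnitude > 2.58:
        return "p < .01"
    if magnitude > 1.96:
        return "p < .05"
    return "not significant"


# Hypothetical values, not taken from Exploring data.csv.
skew_z = z_score(0.84, 0.24)
kurt_z = z_score(-0.50, 0.47)
print(f"skewness z = {skew_z:.2f} ({significance_level(skew_z)})")
print(f"kurtosis z = {kurt_z:.2f} ({significance_level(kurt_z)})")
```

The sign of z is ignored when judging significance, so a strongly negative kurtosis is flagged in the same way as a strongly positive one.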
    Using Exploring data.csv, go to Descriptives > Descriptive Statistics, move Variable 3 to the Variables
    box, and in the Statistics drop-down menu select Mean, Std Deviation, Skewness and Kurtosis as shown
    below with the corresponding output table.
    Neither skewness nor kurtosis is close to 0. The positive skewness suggests that data is distributed
    more on the left (see graphs later) while the negative kurtosis suggests a flat distribution. When
    calculating their z scores it can be seen that the data is significantly skewed p […]
    […] Independent Samples t-test, move Variable 1 to the Variables
    box and Group to the Grouping variable, and tick Assumption Checks > Equality of variances.
    In this case, there is no significant difference in variance between the two groups, F(1) = 0.218, p = .643.
    The assumption of homoscedasticity (equal variance) is important in linear regression models, as is
    linearity. It assumes that the variance of the data around the regression line is the same for all
    predictor data points. Heteroscedasticity (the violation of homoscedasticity) is present when the
    variance differs across the values of an independent variable. This can be visually assessed in linear
    regression by plotting the residuals against the predicted values.
    If homoscedasticity and linearity are not violated there should be no relationship between what the
    model predicts and its errors as shown in the graph on the left. Any sort of funnelling (middle graph)
    suggests that homoscedasticity has been violated and any curve (right graph) suggests that linearity
    assumptions have not been met.
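The numbers behind such a plot can be produced with a short script. This is a minimal sketch assuming a simple one-predictor least-squares fit; the function names and data are hypothetical, and the plotting step itself is left out.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals_vs_predicted(x, y):
    """Return the predicted values and the residuals (observed - predicted)."""
    slope, intercept = fit_line(x, y)
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return predicted, residuals


# Hypothetical data that are close to linear.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
predicted, residuals = residuals_vs_predicted(x, y)
for p, r in zip(predicted, residuals):
    print(f"predicted = {p:6.2f}   residual = {r:+.2f}")
```

Plotting `residuals` against `predicted` gives the diagnostic described above: a patternless band is the desired outcome, a funnel shape points to heteroscedasticity, and a curve points to non-linearity.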
    DATA TRANSFORMATION
    JASP provides the ability to compute new variables or transform data. In some cases, it may
    be useful to compute the differences between repeated measures or, to make a dataset more
    normally distributed, you can apply a log transform for example.
    When a dataset is opened there will be a plus sign (+) at the end of the columns.
    Clicking on the + opens up a small dialogue window where you can:
    • Enter the name of a new variable or the transformed variable
    • Select whether you enter the R code directly or use the commands built into JASP
    • Select what data type is required
    Once you have named the new variable and chosen the other options – click create.
    If you choose the manual option rather than the R code, this opens all the built-in create and
    transform options. Although not obvious, you can scroll the left and right-hand options to see
    more variables or more operators respectively.
    For example, we want to create a column of data showing the difference between variable 2
    and variable 3. Once you have entered the column name in the Create Computed Column
    dialogue window, its name will appear in the spreadsheet window. The mathematical
    operation now needs to be defined. In this case drag variable 2 into the equation box, drag
    the ‘minus’ sign down and then drag in variable 3.
    If you have made a mistake, i.e. used the wrong variable or operator, remove it by dragging
    the item into the dustbin in the bottom right corner.
    When you are happy with the equation/operation, click compute column and the data will be
    entered.
    If you decide that you do not want to keep the derived data, you can remove the column by
    clicking the other dustbin icon next to the R.
    Another example is to do a log transformation of the data. In the following case variable 1 has
    been transformed by scrolling the operators on the left and selecting the log10(y) option.
    Replace the “y” with the variable that you want to transform and then click Compute column.
    When finished, click the X to close the dialogue.
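For readers who want to check the two computed columns outside JASP, the same operations can be sketched in Python. The column names mirror the examples above, but the values and the row layout are hypothetical.

```python
import math

# Hypothetical rows; the keys stand in for the spreadsheet columns.
rows = [
    {"variable_1": 10.0, "variable_2": 8.0, "variable_3": 5.5},
    {"variable_1": 100.0, "variable_2": 7.0, "variable_3": 7.5},
    {"variable_1": 1000.0, "variable_2": 9.0, "variable_3": 6.0},
]

for row in rows:
    # Difference between two measures (variable 2 minus variable 3).
    row["difference"] = row["variable_2"] - row["variable_3"]
    # log10 transform; only defined for values greater than zero.
    row["log10_variable_1"] = math.log10(row["variable_1"])

for row in rows:
    print(row["difference"], row["log10_variable_1"])
```

Because the logarithm is undefined for zero and negative values, a variable containing such values would need a different transformation or an offset, which would have to be justified separately.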
    The Export function will also export any new data variables that have been created.
    The two graphs below show the untransformed and the log10 transformed data. The skewed
    data has been transformed into a profile with a more normal distribution.
    [Figure panels: Untransformed; Log10 transformed]
    EFFECT SIZE
    When performing a hypothesis test on data we determine the relevant statistic (r, t, F etc.) and p-value
    to decide whether to accept or reject the null hypothesis. A small p-value (< 0.05) provides evidence
    against the null hypothesis, whereas a large p-value (> 0.05) only means that there is
    not enough evidence to reject the null hypothesis. A lower p-value is sometimes incorrectly
    interpreted as meaning there is a stronger relationship or difference between variables. So what is
    needed is not just null hypothesis testing but also a method of determining precisely how large the
    effects seen in the data are.
    An effect size is a statistical measure used to determine the strength of the relationship or difference
    between variables. Unlike a p-value, effect sizes can be used to quantitatively compare the results
    of different studies.
    For example, comparing heights between 11 and 12-year-old children may show that the 12-year-olds are significantly taller but it is difficult to visually see a difference i.e. small effect size. However,
    a significant difference in height between 11 and 16-year-old children is obvious to see (large effect
    size).
    The effect size is usually measured in three ways:
    • the standardized mean difference
    • correlation coefficient
    • odds ratio
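The first of these, the standardized mean difference, can be illustrated with Cohen's d. This is a sketch under an explicit assumption: the two standard deviations are combined as a pooled standard deviation, which is one common convention and not necessarily the exact formula JASP applies in every test. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference between two means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical scores for two independent groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in standard deviation units, values from studies that used different measurement scales can be compared directly.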
    When looking at differences between groups most techniques are primarily based on the differences
    between the means divided by the average standard deviations. The values derived can then be used
    to describe the magnitude of the differences. The effect sizes calculated in JASP for t-tests and ANOVA
    are shown below:
    When analysing bivariate or multivariate relationships the effect sizes are the correlation
    coefficients:
    When analysing categorical relationships via contingency tables, i.e. the chi-square test, Phi is only
    used for 2×2 tables while Cramér’s V can be used for any table size.
    For a 2 × 2 contingency table, we can also define the odds ratio measure of effect size.
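For a 2 × 2 table, both Phi and the odds ratio follow directly from the four cell counts. The sketch below assumes the cells are laid out as a, b in the first row and c, d in the second; the function names and counts are hypothetical.

```python
from math import sqrt


def odds_ratio(a, b, c, d):
    """Odds of the outcome in row 1 divided by the odds in row 2."""
    return (a * d) / (b * c)


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: rows are groups, columns are outcome yes/no.
a, b, c, d = 20, 10, 10, 20
print(f"odds ratio = {odds_ratio(a, b, c, d):.2f}")
print(f"phi = {phi_coefficient(a, b, c, d):.3f}")
```

An odds ratio of 1 (or a Phi of 0) indicates no association. Note that the odds ratio is undefined when cell b or c is zero, so sparse tables need separate handling.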
    ONE SAMPLE T-TEST
    Research is normally carried out in sample populations, but how closely does the sample reflect the
    whole population? The parametric one-sample t-test determines whether the sample mean is
    statistically different from a known or hypothesized population mean.
    The null hypothesis (H0) tested is that the sample mean is equal to the population mean.
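The statistic behind this test can be sketched as the distance between the sample mean and the test value, measured in standard errors. The function name and data below are hypothetical, and the p-value (which needs the t distribution) is deliberately left to JASP.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, population_mean):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample SD divided by sqrt(n)
    t = (mean(sample) - population_mean) / standard_error
    return t, n - 1


# Hypothetical sample tested against a hypothesized mean of 2.
t, df = one_sample_t([1, 2, 3, 4, 5], 2)
print(f"t({df}) = {t:.3f}")
```

A t value near zero means the sample mean sits close to the test value relative to its sampling variability, which is consistent with the null hypothesis.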
    ASSUMPTIONS
    Four assumptions are required for a one-sample t-test to provide a valid result:
    • The test variable should be measured on a continuous scale.
    • The test variable data should be independent i.e. no relationship between any of the data
      points.
    • The data should be approximately normally distributed.
    • There should be no significant outliers.
    RUNNING THE ONE SAMPLE T-TEST
    Open one sample t-test.csv, which contains two columns of data representing the heights (cm) and body
    masses (kg) of a sample population of males used in a study. In 2017 the average adult male in the UK
    population was 178 cm tall and had a body mass of 83.4 kg.
    Go to T-Tests > One-Sample t-test and in the first instance add height to the analysis box on the right.
    Then tick the following options and add 178 as the test value:
    UNDERSTANDING THE OUTPUT
    The output should contain three tables and two graphs.
    The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the heights are
    normally distributed, therefore this assumption is not violated. If this showed a significant difference
    the analysis should be repeated using the non-parametric equivalent, Wilcoxon’s signed-rank test
    tested against the population median height.
    This table shows that there is no significant difference between the means (p = .706).
    The descriptive data shows that the mean height of the sample population was 177.6 cm compared
    to the average 178 cm UK male.
    The two plots show essentially the same data but in different ways. The standard Descriptive plot is
    an Interval plot showing the sample mean (black bullet), the 95% confidence interval (whiskers),
    relative to the test value (dashed line).
    The Raincloud Plot shows the data as individual data points, boxplot, and the distribution plot. This
    can be shown as either a vertical or horizontal display.
    Repeat the procedure by replacing height with mass and changing the test value to 83.4.
    The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the masses are
    normally distributed.
    This table shows that there is a significant difference between the sample mean (72.9 kg) and
    the population body mass (83.4 kg) p
