Chapter Reflection
Part 1:
Reflection: You will complete a reflection assignment for Chapters 4 and 5. For each chapter, include a one- to two-paragraph reflection on the material presented, followed by a one-paragraph connection to your work in the schools. Suggested format:
Chapter 4:
One- to two-paragraph summary or outline of the chapter.
One-paragraph connection.
Chapter 5:
One- to two-paragraph summary or outline of the chapter.
One-paragraph connection.
Part 2:
Assignment Directions:
For this portion of the assignment, I would like you to look at the classroom makeup and unit identification of the key assessment. Please identify the unit you will focus on for the task. If you do not have students with the exceptionalities listed, plan for the students described in the document as if they were in your class.
Assessing Learners with Special
Needs: An Applied Approach
Eighth Edition
Chapter 4
Reliability & Validity
Reliability & Validity
Reliability and validity aid in determining test accuracy and dependability.
• Reliability—the dependability or consistency of an
instrument across time or items.
• Validity—the degree to which an instrument measures
what it was designed to measure.
Instruments should have both properties; an instrument with only one of them is not a strong instrument.
Correlation (r)
Correlation—the degree of relationship between two variables.
• Two administrations of the same test
• Administration of equivalent forms
Correlation coefficient ranges: +1.00 to −1.00
• Perfect positive correlation = +1.00
• Perfect negative correlation = −1.00
• No correlation = 0
• Coefficients closer to ±1.00 represent stronger relationships.
– The greater the degree of the relationship, the more reliable the instrument.
– The sign (+ or −) does not indicate strength, only direction.
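To make this concrete, here is a minimal Python sketch (not from the text; all scores are invented) that computes Pearson's r for two administrations of the same test:

    # Minimal sketch: Pearson's r between two administrations of the same test.
    def pearson_r(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    first_admin = [85, 90, 78, 92, 88, 70, 95]    # scores, administration 1
    second_admin = [83, 91, 75, 94, 90, 72, 93]   # same students, administration 2
    print(round(pearson_r(first_admin, second_admin), 2))  # near +1.00 = consistent

A coefficient near +1.00 here would indicate that students kept roughly the same relative standing across the two administrations.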
Scattergram
• Scattergrams provide a graphic representation of a data
set and show a correlation.
• The more closely the dots on a scattergram approximate a
straight line, the nearer to perfect the correlation.
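For illustration only (matplotlib is an assumed plotting library here, not part of the text), the two administrations from the previous sketch could be plotted as a scattergram:

    # Sketch: scattergram of two administrations (assumes matplotlib is installed).
    import matplotlib.pyplot as plt

    first_admin = [85, 90, 78, 92, 88, 70, 95]
    second_admin = [83, 91, 75, 94, 90, 72, 93]
    plt.scatter(first_admin, second_admin)
    plt.xlabel("First administration")
    plt.ylabel("Second administration")
    plt.title("Dots near a straight line indicate a near-perfect correlation")
    plt.show()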
Types of Correlation
Positive correlation
• Variables with a positive relationship move in the same direction.
• Scores on both variables increase simultaneously.
Negative correlation
• High scores on one variable are associated with low scores on the other variable.
No correlation
• Data from the two variables are not associated and have no relationship.
• There is no linear direction on a scattergram.
Methods of Measuring Reliability
Pearson’s r
• Pearson’s Product Moment correlation
• Used with interval or ratio data
Internal Consistency
• The consistency with which the items on an instrument measure a skill, trait, or domain.
– Test-retest
– Equivalent forms
– Split-half
– Kuder-Richardson formulas
Test-Retest Reliability
Test-retest reliability—the consistency of scores when the same instrument is administered twice; it assumes the trait being measured is stable over time.
• If the trait remains constant, re-administration of the instrument will result in scores similar to the first score.
– It is important to conduct the retest shortly after the first test to control for influencing variables.
• Difficulties:
– Too soon: Students may remember test items (practice
effect) and score higher the second time.
– Too far: Greater influence of time variables (e.g.,
learning, maturation, etc.)
Equivalent (Alternate) Forms
Reliability
Equivalent forms reliability
• Two forms of the same instrument are used.
• Items are matched for difficulty.
Advantage: Two tests of the same difficulty level that can be
administered within a short time frame without the influence
of practice effects.
Internal Consistency Measures
Split-Half Reliability
• Takes all available items on a test and divides the items in
half.
• Establishes reliability of half the test with the other half.
• Does not establish reliability of the entire test—reliability
increases with the number of items.
Kuder-Richardson 20
• Used to check consistency across items of an instrument
with right or wrong answers.
Coefficient Alpha
• Used to check consistency across items of an instrument
where credit varies across responses.
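As a sketch of how KR-20 can be computed for right/wrong (1/0) items (the formula is the standard one; the response matrix below is invented):

    # Sketch: Kuder-Richardson 20 for dichotomously scored items.
    def kr20(item_matrix):
        k = len(item_matrix[0])                         # number of items
        totals = [sum(student) for student in item_matrix]
        n = len(totals)
        mean = sum(totals) / n
        var_total = sum((t - mean) ** 2 for t in totals) / n
        pq = 0.0
        for i in range(k):
            p = sum(student[i] for student in item_matrix) / n  # proportion correct
            pq += p * (1 - p)
        return (k / (k - 1)) * (1 - pq / var_total)

    responses = [            # one row per student, invented data
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
    ]
    print(round(kr20(responses), 2))  # 0.81 for this invented data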
Interrater Reliability
Interrater reliability
• The consistency of a test across examiners.
• One person administers a test, a second person rescores
the test.
• The scores are then correlated to determine how much
variability exists between the scores.
• Very important for subjective-scoring tests.
Which Type of Reliability is Best?
Three reliability types
• Consistency over time
• Consistency of items on a test
• Consistency of scorers
Optimal r scores
• .60 is adequate
• .80 is very good (preferred)
Which one is chosen depends upon the purpose of the
assessment.
Reliability coefficient is a group statistic and can be
influenced by the make-up of the group. It is important to
review the manual to determine the make-up of the group.
Standard Error of Measurement
Basic assumption of assessment: ERROR EXISTS
Variables that affect scores exist for a variety of reasons:
• Poor testing environment
• Errors in the test
• Student variables (e.g., hungry, tired)
This variance is called error and is estimated by the standard error of measurement.
• Instruments with small standard error of measurement are
preferred.
A single test may not accurately reflect a student’s true
score.
Calculating Standard Error of
Measurement
• To estimate the amount of error present in an obtained
score
SEM = SD × √(1 − r)
SEM = Standard Error of Measurement
SD = Standard Deviation
r = Reliability coefficient
SEM is based on normal distribution theory.
Confidence Interval
• The range of scores around an obtained score, found by adding the SEM to, and subtracting it from, the obtained score.
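A minimal worked sketch, with the SD and reliability coefficient standing in for values a real test manual would supply:

    # Sketch: SEM and a confidence interval around an obtained score.
    sd = 15          # standard deviation of the test (from the manual)
    r = 0.89         # reliability coefficient (from the manual)
    obtained = 104   # a student's obtained score (invented)

    sem = sd * (1 - r) ** 0.5                    # SEM = SD * sqrt(1 - r)
    low, high = obtained - sem, obtained + sem   # +/- 1 SEM, roughly 68% confidence
    print(f"SEM = {sem:.2f}; likely range {low:.1f} to {high:.1f}")  # SEM = 4.97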
Application of SEM
The range of error and the range of a student’s score may
vary substantially, which may change the interpretation of the
score for placement purposes.
SEM varies by age, grade and subtest.
When SEM is applied to scores, discrepancies may not be
significant.
Estimated True Scores
A method of calculating the amount of error correlated with
the distance of the score from the mean of the group.
• The further a score is from the mean, the greater chance
for error.
• A true score is always assumed to be nearer to the mean
than the obtained score.
• Estimated true scores can be used to establish a range of
scores.
Estimated True Score = M + r(X − M)
M = mean of group
r = reliability coefficient
X = obtained score
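A quick worked sketch with invented values:

    # Sketch: estimated true score, regressed toward the group mean.
    mean = 100      # group mean
    r = 0.85        # reliability coefficient
    obtained = 120  # obtained score

    estimated_true = mean + r * (obtained - mean)
    print(estimated_true)  # 117.0, nearer to the mean than the obtained 120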
Test Validity
Does the test actually measure what it is supposed to
measure?
Criterion-related validity: Comparing scores with other criteria
known to be indicators of the same trait or skill
• Concurrent Validity: Two tests are given within a very
short timeframe (often the same day). If scores are similar,
the tests are said to be measuring the same trait.
• Predictive Validity: Measures how well an instrument can
predict performance on some other variable.
Content Validity
Ensuring that the items in a test are representative of content
purported to be measured.
• PROBLEM: Teachers often generalize and assume the test
covers more than it does (e.g., the WRAT-3 reading subtest
only measures word recognition—not phonemic awareness,
phonics, vocabulary, reading comprehension, etc.).
Some of the variables of content validity may influence the manner
in which results are obtained and can contribute to bias in testing.
• Presentation Format: The method by which items are presented to the student.
• Response Mode: The method by which the examinee answers items.
Construct Validity
Construct validity concerns whether a test measures its intended construct: a psychological trait, personality trait, psychological concept, attribute, or theoretical characteristic.
The construct must be clearly defined, although constructs are often abstract concepts.
Types of studies that can establish construct validity
• Developmental changes
• Correlations with other tests
• Factor analysis
• Internal consistency
• Convergent and discriminant validation
• Experimental interventions
Validity of Test and Validity of Use
Tests may be used inappropriately even though they are
valid instruments.
Results obtained may be used in an invalid manner.
Tests may be biased and/or discriminate against different
groups.
Item bias occurs when an item is answered incorrectly a disproportionate number of times by one group compared to another.
Predictive validity may predict accurately for one group and
not another.
Copyright
This work is protected by United States copyright laws and is
provided solely for the use of instructors in teaching their courses
and assessing student learning. Dissemination or sale of any part
of this work (including on the World Wide Web) will destroy the
integrity of the work and is not permitted. The work and materials
from it should never be made available to students except by
instructors using the accompanying text in their classes. All
recipients of this work are expected to abide by these restrictions
and to honor the intended pedagogical purposes and the needs of
other instructors who rely on these materials.
Assessing Learners with Special
Needs: An Applied Approach
Eighth Edition
Chapter 5
An Introduction to Norm-Referenced Assessment
How Norm-Referenced Tests Are
Constructed
• When it is decided that an assessment will be created for
a specific domain…
– An item pool is created
– Items are arranged in sequence according to difficulty
– A developmental version is field tested with a small
sample
– Professionals critique the assessment
– Revisions are made
– The instrument is field-tested with a larger sample
What is a Norm-referenced Test?
Allows teachers to compare the performance of one student
with the average performance of other students who are of
the same age or grade.
• The norm group (i.e., sample) is a group of diverse students (e.g., in language, disability, and culture)
– The norm group sets the average performance for the
assessment
– The norm group should be representative of the
students who will be later assessed
– Students who are later assessed should be similar in
background, age or grade.
Interpolating Data
• Test developers obtain average expected scores for each
month by interpolating data.
– Existing data are divided into smaller units to establish developmental scores.
– Scores are written using a decimal point (e.g., 8.3).
Scores might also be divided by age group so that each month of chronological age is represented.
– A student's age is expressed in years, months, and days.
– Scores are written using a dash (e.g., 6−4).
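As a sketch of the idea (all numbers invented; real values come from the norming data), interpolating a monthly expected score between two normed grade levels might look like this:

    # Sketch: linear interpolation between two normed points.
    score_at_3_0 = 20.0   # average raw score of the grade-3.0 norm group
    score_at_4_0 = 29.0   # average raw score of the grade-4.0 norm group

    fraction = 0.4        # four months into grade 3 (grade 3.4)
    expected = score_at_3_0 + fraction * (score_at_4_0 - score_at_3_0)
    print(expected)       # 23.6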
Steps in Test Administration
Be sure to read the test manual.
Follow the directions established by the test developer.
Practice administering the test.
Establish a positive testing environment:
1. Get familiar with the student before testing begins.
2. Engage in friendly conversation before testing begins.
3. Explain why the testing is being completed.
4. Provide a brief introduction to the test.
5. Begin testing in a calm manner.
Calculating Chronological Age
• Chronological age is how old a student is in years, months, and days (in that order). It must be calculated correctly for test results to be interpreted accurately.
• It is calculated by writing the test date first and then subtracting the student's date of birth.
• Each column in the calculation uses a different base:
– 12 months
– 30 days
                Year −1   Month +12   Day +30
  Test Date       2003        4          2
  Birth Date     −1994      −10         −8
  Age                8        5         24
– Chronological age (rounded) is 8−6
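The same borrowing logic can be written as a short Python sketch. The dates are the example above; the round-up rule for 15 or more days is an assumption inferred from the rounded result shown:

    # Sketch: chronological age with base-12 (months) and base-30 (days) borrowing.
    def chronological_age(test_ymd, birth_ymd):
        (ty, tm, td), (by, bm, bd) = test_ymd, birth_ymd
        if td < bd:            # borrow 30 days from the month column
            td += 30
            tm -= 1
        if tm < bm:            # borrow 12 months from the year column
            tm += 12
            ty -= 1
        return ty - by, tm - bm, td - bd

    years, months, days = chronological_age((2003, 4, 2), (1994, 10, 8))
    if days >= 15:             # assumed rounding rule: 15+ days round the month up
        months += 1
    print(f"{years}-{months}")  # 8-6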
Calculating Raw Score
• The first score obtained during testing is the raw score.
– The number of items a student answers correctly.
Basals and Ceilings (1 of 2)
• The starting and stopping points of a test must be determined so that
unnecessary items are not administered.
– The starting points are meant to represent a level at which a
student could answer.
This information is provided in the manual, in the protocol or on the test
itself.
• Each student must establish a basal.
– A basal is the level at which the student could correctly answer all
easier items.
– Typically, a manual will state that a student must get X number of
items in a row correct in order to establish a basal.
– Once a basal is established, the testing may proceed.
– If the student does not establish a basal, the test is probably too
hard. An alternative should be given.
Basals and Ceilings (2 of 2)
• Starting points can be given as age or grade.
– Student is 6-4; start with item 10
– Student is 8.3; start with item 50
NOTE: When calculating the raw score, all items that appear before the established basal are counted as correct (see the sketch at the end of this slide).
It is better to start the test asking easier questions to reduce
frustration levels of students who may be below typical levels.
• Ceilings are thought to represent the level at which more
difficult questions would not be passed.
– Typically a manual will state that a student must get X
number of items in a row incorrect in order to establish a
ceiling.
– Once a student “hits” the ceiling, the testing stops.
– NOTE: Basals and ceilings may not be the same number. In fact, the basal and ceiling may vary with each section of an assessment!
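A minimal sketch of the raw-score rule noted above (item numbers and responses invented):

    # Sketch: raw score when all items below the basal are credited as correct.
    def raw_score(basal_item, responses):
        credited = basal_item - 1   # items below the basal, all counted correct
        return credited + sum(responses)

    # Basal established at item 10; 1/0 scoring for items 10-19, then ceiling hit.
    administered = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
    print(raw_score(10, administered))  # 9 credited + 6 correct = 15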
Using Information on Protocols
The protocol is the form used during the test administration
and for scoring.
Student answers are often scored as a series of 1s and 0s
(correct and incorrect) in the protocol.
Be sure to read the manual regarding what subtests to
administer and basal and ceiling information.
Getting the Best Results
• Students tend to respond more and perform better in testing situations with examiners who are familiar with them.
– Students should not meet you for the first time on testing day!
• It may also be helpful for the student to visit the testing site to become familiar with the environment.
• Classroom observations and visits may aid the examiner in determining which tests to administer.
– Do not over-test the student.
• Make the student feel at ease.
• Convey the importance of the testing without making the student feel anxious.
• Reinforce the student's attempts and efforts, not correct responses.
• Young students may enjoy a tangible reinforcer upon the completion of the testing session.
– Not recommended during the assessment.
• Follow all directions in the manual.
Reducing Bias
1. Do sensory or communicative impairments make portions of
the test inaccessible?
2. Do sensory or communicative impairments limit students from
responding to questions?
3. Do test materials or method of responding limit students from
responding?
4. Do background experiences limit the student’s ability to
respond?
5. Does the content of classroom instruction limit students from
responding?
6. Is the examiner familiar to the student?
7. Are instructions explained in a familiar fashion?
8. Is the recording technique required of the student on the test
familiar?
Obtaining Derived Scores
• Raw scores are used to locate other derived scores.
• There are advantages and disadvantages to using
different types of derived scores.
– It is important to understand the numerical scales that the scores represent.
Types of Derived Scores
Standard Score: Average score is 100.
T-score: Average score is 50 with a standard deviation of 10.
z-score: Expresses a student's standing in standard deviation units.
Stanine Score: Divides the distribution into 9 segments with an average of 5 and a standard deviation of 2.
Scaled Score: Divides the distribution into 19 segments with a standard deviation of 3.
Percentile Rank: How a student performed in relation to students in the norm sample of the same age or grade.
Age Equivalent: The age of the students in the norm sample who, on average, got the same score.
Grade Equivalent: The grade of the students in the norm sample who, on average, got the same score.
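To show how the scales relate, here is a sketch converting one obtained score across several of them. It assumes a standard-score scale with mean 100 and SD 15 and a scaled-score mean of 10 (common conventions, but check the test manual for the actual scales):

    # Sketch: one obtained score expressed on several derived-score scales.
    mean, sd = 100, 15   # assumed standard-score scale
    obtained = 85

    z = (obtained - mean) / sd                   # standing in SD units
    t_score = 50 + 10 * z                        # mean 50, SD 10
    stanine = max(1, min(9, round(5 + 2 * z)))   # mean 5, SD 2, clipped to 1-9
    scaled = 10 + 3 * z                          # assumed mean 10, SD 3
    print(z, t_score, stanine, scaled)           # -1.0 40.0 3 7.0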
Group Testing
• Schools often administer group standardized tests of
achievement.
• These are often referred to as high stakes tests.
• Considerations in testing:
– Tests should be logical and serve the purpose for
which they were intended.
– No test has the capability of answering all achievement
questions.
National Center on Educational
Outcomes
Core principles:
1. All students are included in ways that hold schools
accountable for their learning.
2. Assessments allow all students to show their knowledge
and skills on the same challenging content.
3. High-quality decision making determines how students
participate.
4. Public reporting includes the assessment results of all
students.
5. Accountability determinations are affected in the same way
by all students.
6. Continuous improvement, monitoring, and training ensure
the quality of the overall system (p. v).
High Stakes Considerations (1 of 2)
• IDEA requires students with disabilities to be included in assessments.
– Used as a measure of accountability.
– Student progress is measured to determine if programs are
effective.
• These students are afforded accommodations.
– Changes in format, response mode, setting, timing or scheduling.
– May not alter what the test is measuring.
– Accommodations should prevent the test from measuring a student's disability rather than the intended skill or content.
• How assessment requirements are met is determined during the IEP and 504 processes.
– Decisions regarding assessments should focus on the standard
that students are expected to master.
High Stakes Considerations (2 of 2)
• Students who cannot take the assessment must be provided with an alternative assessment.
– There is a 1% cap on the number of students who may take the alternative assessments.
– Permitted assessments include portfolios, performance-based and authentic assessments, and observations.
NOTE: Students who are English learners (EL) may require accommodations so that the test measures their content knowledge and not their English skills.
Issues & Research in High-Stakes
Testing (1 of 2)
• Concerns about the inconsistency of definitions, federal law
requirements, variability among states and districts, differences in
standards of expectations for students with disabilities, lack of
participation of students with disabilities in test development and
standardization of instruments, and lack of consistency.
• Concerns about conceptual understanding of the purpose and nature of the assessment.
• Mandatory statewide assessments have damaged the American education system for all students, and alternate assessments may not be the best way to measure academic progress.
• Some teachers reported that high-stakes assessment helped teachers
target and individualize instruction, and that their students who
disliked reading or had difficulties with academics felt more in control
of their own learning.
Issues & Research in High-Stakes
Testing (2 of 2)
• Performance task-based reading alternate tests can be scaled
to statewide assessments, although determining their validity
and reliability may be difficult.
• Development of alternate assessments is difficult and states
require more time to develop appropriate measures.
• Practices on some campuses might result in specific groups of
students being encouraged not to attend school on the days of
the assessment so that campus data might be more favorable.
• Test items were not comparable across assessments when the
modified assessments used for children with various disabilities
were analyzed. Moreover, the items varied by disability
category.
Universal Design of Assessments
There has been a growing interest in making the design of all assessments more fair and user-friendly for all learners, rather than trying to fit a test to a student's needs.
• Principle One: Equitable Use: The design is useful and marketable to people with diverse abilities.
• Principle Two: Flexibility in Use: The design accommodates a wide range of individual preferences and abilities.
• Principle Three: Simple and Intuitive Use: Use of the design is easy to understand, regardless of the user's experience, knowledge, language skills, or current concentration level.
• Principle Four: Perceptible Information: The design communicates necessary information effectively to the user, regardless of ambient conditions or the user's sensory abilities.
• Principle Five: Tolerance for Error: The design minimizes hazards and the adverse consequences of accidental or unintended actions.
• Principle Six: Low Physical Effort: The design can be used efficiently and comfortably and with a minimum of fatigue.
• Principle Seven: Size and Space for Approach and Use: Appropriate size and space is provided for approach, reach, manipulation, and use regardless of user's body size, posture, or mobility.