Concordia College Ratio and Regression Questions

R Companion for
Sampling
Design and Analysis
Third Edition
Yan Lu and Sharon L. Lohr
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2022 Yan Lu and Sharon L. Lohr
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please
write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are
not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for
identification and explanation without intent to infringe. SAS® and all other SAS Institute Inc. product or service
names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA
registration.
Library of Congress Cataloging‑in‑Publication Data
Names: Lu, Yan, author. | Lohr, Sharon L., 1960- author.
Title: R companion for sampling : design and analysis / Yan Lu and Sharon
L. Lohr.
Description: First edition. | Boca Raton : CRC Press, 2022. | Includes
bibliographical references and index. | Summary: “The R Companion for
Sampling: Design and Analysis, designed to be read alongside Sampling:
Design and Analysis, Third Edition by Sharon L. Lohr (SDA; 2022, CRC
Press), shows how to use functions in base R and contributed packages to
perform calculations for the examples in SDA. No prior experience with R
is needed. Chapter 1 tells you how to obtain R and RStudio, introduces
basic features of the R statistical software environment, and helps you
get started with analyzing data. Each subsequent chapter provides
step-by-step guidance for working through the data examples in the
corresponding chapter of SDA, with code, output, and interpretation.
Tips and warnings help you develop good programming practices and avoid
common survey data analysis errors. R features and functions are
introduced as they are needed so you can see how each type of sample is
selected and analyzed. Each chapter builds on the knowledge developed
earlier for simpler designs; after finishing the book, you will know how
to use R to select and analyze almost any type of probability sample”-Provided by publisher.
Identifiers: LCCN 2021039318 (print) | LCCN 2021039319 (ebook) | ISBN
9781032135946 (paperback) | ISBN 9781032132150 (hardback) | ISBN
9781003228196 (ebook)
Subjects: LCSH: R (Computer program language) | Sampling (Statistics)
Classification: LCC QA276.45.R3 L8 2022 (print) | LCC QA276.45.R3 (ebook)
| DDC 519.5/202855133–dc23
LC record available at https://lccn.loc.gov/2021039318
LC ebook record available at https://lccn.loc.gov/2021039319
ISBN: 978-1-032-13215-0 (hbk)
ISBN: 978-1-032-13594-6 (pbk)
ISBN: 978-1-003-22819-6 (ebk)
DOI: 10.1201/9781003228196
Access the Support Material: https://www.routledge.com/9781032135946
To Guoyi and Lynn, and to Doug
Contents
Preface
1 Getting Started
1.1 Obtaining the Software
1.2 Installing R Packages
1.3 R Basics
1.4 Reading Data into R
1.5 Saving Output
1.6 Integrating R Output into LaTeX Documents
1.7 Missing Data
1.8 Summary, Tips, and Warnings
2 Simple Random Sampling
2.1 Selecting a Simple Random Sample
2.2 Computing Statistics from a Simple Random Sample
2.3 Additional Code for Exercises
2.4 Summary, Tips, and Warnings
3 Stratified Sampling
3.1 Allocation Methods
3.2 Selecting a Stratified Random Sample
3.3 Computing Statistics from a Stratified Random Sample
3.4 Estimating Proportions from a Stratified Random Sample
3.5 Additional Code for Exercises
3.6 Summary, Tips, and Warnings
4 Ratio and Regression Estimation
4.1 Ratio Estimation
4.2 Regression Estimation
4.3 Domain Estimation
4.4 Poststratification
4.5 Ratio Estimation with Stratified Sampling
4.6 Model-Based Ratio and Regression Estimation
4.7 Summary, Tips, and Warnings
5 Cluster Sampling with Equal Probabilities
5.1 Estimates from One-Stage Cluster Samples
5.2 Estimates from Multi-Stage Cluster Samples
5.3 Model-Based Design and Analysis for Cluster Samples
5.4 Additional Code for Exercises
5.5 Summary, Tips, and Warnings
6 Sampling with Unequal Probabilities
6.1 Selecting a Sample with Unequal Probabilities
6.1.1 Sampling with Replacement
6.1.2 Sampling without Replacement
6.2 Selecting a Two-Stage Cluster Sample
6.3 Computing Estimates from an Unequal-Probability Sample
6.3.1 Estimates from with-Replacement Samples
6.3.2 Estimates from without-Replacement Samples
6.4 Summary, Tips, and Warnings
7 Complex Surveys
7.1 Selecting a Stratified Two-Stage Sample
7.2 Estimating Quantiles
7.3 Computing Estimates from Stratified Multistage Samples
7.4 Univariate Plots from Complex Surveys
7.5 Scatterplots from Complex Surveys
7.6 Additional Code for Exercises
7.7 Summary, Tips, and Warnings
8 Nonresponse
8.1 How R Functions Treat Missing Data
8.2 Poststratification and Raking
8.3 Imputation
8.4 Summary, Tips, and Warnings
9 Variance Estimation in Complex Surveys
9.1 Replicate Samples and Random Groups
9.2 Constructing Replicate Weights
9.2.1 Balanced Repeated Replication
9.2.2 Jackknife
9.2.3 Bootstrap
9.2.4 Replicate Weights and Nonresponse Adjustments
9.3 Using Replicate Weights from a Survey Data File
9.4 Summary, Tips, and Warnings
10 Categorical Data Analysis in Complex Surveys
10.1 Contingency Tables and Odds Ratios
10.2 Chi-Square Tests
10.3 Loglinear Models
10.4 Summary, Tips, and Warnings
11 Regression with Complex Survey Data
11.1 Straight Line Regression with a Simple Random Sample
11.2 Linear Regression for Complex Survey Data
11.3 Using Regression to Compare Domain Means
11.4 Logistic Regression
11.5 Additional Resources and Code
11.6 Summary, Tips, and Warnings
12 Additional Topics for Survey Data Analysis
12.1 Two-Phase Sampling
12.2 Estimating the Size of a Population
12.2.1 Ratio Estimation of Population Size
12.2.2 Loglinear Models with Multiple Lists
12.3 Small Area Estimation
12.4 Summary
A Data Set Descriptions
Bibliography
Index
Preface
R Companion for Sampling: Design and Analysis, Third Edition shows how to use the
R statistical software environment to perform the calculations in the textbook Sampling:
Design and Analysis, Third Edition (SDA) by Sharon L. Lohr. It is intended to be read
in conjunction with SDA and is not a standalone text. The parallel book by Lohr (2022)
shows how to perform the computations for the examples using SAS® software, and could
be read together with this book and SDA to learn how to perform the analyses in each
software package.
All code and data sets can be downloaded from any of the following websites:
https://math.unm.edu/~luyan/rbook.html
https://www.sharonlohr.com
https://www.routledge.com/9781032135946
The first two websites also contain additional programs, not discussed in this book, that
you can adapt for some of the SDA exercises. The data sets used in this book have also
been saved in R format in the contributed R package SDAResources (Lu and Lohr, 2021).
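As a minimal sketch of how the packaged data sets might be accessed, the code below assumes that SDAResources can be installed from CRAN and that it contains a data set named agsrs (the R-format counterpart of the agsrs.csv file). The install step is commented out, and the code checks for the package before using it.

```r
# Install once (uncomment to run); assumes the package is available on CRAN
# install.packages("SDAResources")

# Check whether the package is installed before trying to use it
have_pkg <- requireNamespace("SDAResources", quietly = TRUE)

if (have_pkg) {
  # Load one data set from the package into the workspace;
  # 'agsrs' is assumed to be the R-format copy of agsrs.csv
  data("agsrs", package = "SDAResources")
  str(agsrs)   # list the variable names and types
} else {
  message("Package SDAResources is not installed.")
}
```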
In this book, we give step-by-step guidance for using functions from base R and contributed
packages to select samples and analyze the data sets discussed in Chapters 1–13 of SDA.
The software, however, can do much more than analyze the examples presented in this book.
You can find information on advanced capabilities for the survey and sampling contributed
packages in the documentation for those packages by Lumley (2020) and Tillé and Matei
(2021); the books and articles by Lumley (2004, 2010) and Tillé and Matei (2010) provide
additional information about the packages. Goga (2018) gives an overview of using R for
survey sampling.
For easy reference, the index at the back of the book gives page numbers for the examples
in SDA. To locate the code and output for Example 2.5, for example, look up the subentry
“Example 02.05” under “Examples in SDA” in the index. The book also gives code and suggestions for some of the exercises in SDA, and these are listed under index entry “Exercises
in SDA.”
Each chapter ends with a summary section containing tips and warnings for the analyses
discussed in that chapter. These provide ways of avoiding common survey data analysis
errors and checking whether you did the analysis correctly.
Although prior experience with R is helpful, it is not needed to read this book. Chapter 1
tells how to obtain the software and do basic operations in R. It also lists resources for
learning more about programming in R and tells how to obtain help.
This book makes use of functions that exist in base R and contributed packages, and does
not discuss how to write R functions. One of R’s most valuable features, however, is the
capacity for writing functions to carry out new tasks. Advanced R users may want to write
their own functions to select samples or analyze data from a complex survey. When teaching
survey sampling to students who have R programming experience, we have sometimes asked
them to write their own functions to carry out various sampling tasks. This helps solidify
their knowledge of the material and allows them to do computations not available in existing
functions. For example, we have asked students to write R functions to perform allocation
for and analyze data from a stratified random sample, select a with-replacement unequal-probability sample using Lahiri’s method, compute the Sen–Yates–Grundy estimate of the
variance, simulate the sampling distribution of a statistic, and find empirical estimates of
the coverage probability of a confidence interval for a biased estimator.
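To illustrate the kind of function a student might write for one of these tasks, here is a rough sketch of Lahiri's method in base R. The function name lahiri_sample, its arguments, and the size measures are hypothetical and are not taken from the book's code. The sketch assumes that the size measures are positive whole numbers.

```r
# Lahiri's method: with-replacement sample with probability proportional to size.
# 'sizes' holds positive whole-number size measures; 'n' is the number of draws.
lahiri_sample <- function(sizes, n) {
  stopifnot(n >= 1, length(sizes) >= 1,
            all(sizes > 0), all(sizes == round(sizes)))
  N <- length(sizes)
  max_size <- max(sizes)
  selected <- integer(n)
  for (draw in seq_len(n)) {
    repeat {
      i <- sample.int(N, 1)          # candidate unit, chosen with equal probability
      k <- sample.int(max_size, 1)   # random integer between 1 and max(sizes)
      if (k <= sizes[i]) break       # accept unit i with probability sizes[i] / max_size
    }
    selected[draw] <- i
  }
  selected
}

set.seed(20220101)
sizes <- c(10, 20, 30, 40)               # made-up size measures
samp <- lahiri_sample(sizes, n = 2000)
round(table(samp) / length(samp), 2)     # compare with sizes / sum(sizes)
```

Because each candidate is accepted with probability proportional to its size, the selection frequencies over many draws should be close to sizes / sum(sizes). The function never needs the cumulative sums of the sizes.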
All code, data sets, and output in this book are provided for educational purposes only
and without warranty. Base R does not contain functions for survey data, and this book
relies heavily on contributed packages that have been developed. These packages are in
widespread use and have been quality-checked by their authors and other users. We have
verified that the calculations from the R functions used for the examples in this book agree
with calculations by the formulas and with calculations performed in other survey software
packages.
Other R packages may not be checked as carefully, however. Although R contributed packages undergo some consistency and functionality tests when they are submitted (see Wickham, 2015, for a description of checks that are performed), no central authority reviews
the packages to make sure that the functions do what they claim or that the algorithms
perform computations accurately. Most R contributed packages are not peer-reviewed, and
you should be aware that some may contain errors.
The code and output in this book were developed using version 4.0.4 of R for Windows (R
Core Team, 2021) and the versions of the packages listed in their respective bibliography
entries, and all code in the book works with those versions. But R is a dynamic language,
and the R Core Team and authors of contributed packages can change or remove functions at
any time. Although most authors who revise a package try to avoid changes that will affect
previously written code, functions in R are not guaranteed to be backward compatible—it
is possible that R code you write today may not work the same way with future versions
of the software. If backward compatibility is important to you—for example, if you will be
using the same code to produce estimates each year for an annual survey—you may want
to perform or check your computations in a package that is backward compatible, such as
SAS software. If a function changes in a subsequent version of an R package, you can either:
• Read the documentation and change your code so that it works with the modified
function, or
• Download and use the older version of the package. You can find previous versions on
the package’s web page under the heading “Old sources.”
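One way to notice when a package has changed is to record the versions your code was written for and compare them with what is installed. The sketch below uses the base R function packageVersion(). The helper check_pkg_version is hypothetical, and the version number in the commented-out install line is only illustrative.

```r
# Report the version of R in use
R.version.string

# Compare an installed package's version with the one a script was written for.
# 'check_pkg_version' is a hypothetical helper, not part of any package.
check_pkg_version <- function(pkg, expected) {
  installed <- packageVersion(pkg)
  same <- installed == expected
  if (!same) {
    warning(paste0(pkg, ": installed version ", as.character(installed),
                   " differs from expected version ", expected))
  }
  invisible(same)
}

# 'stats' ships with R, so this comparison is TRUE by construction
ok <- check_pkg_version("stats", as.character(packageVersion("stats")))

# One option for installing an archived version is install_version() from the
# contributed 'remotes' package (uncomment to run; version is illustrative):
# remotes::install_version("survey", version = "4.0")
```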
Acknowledgments. Many thanks to John Kimmel, our editor at CRC Press, for encouraging
us to write this book, and to the CRC Press production team for all their support and
help. We are grateful to Yves Tillé and Thomas Lumley for answering questions about the
sampling and survey packages. Students in Yan Lu’s sampling class at the University of
New Mexico provided helpful suggestions for clarifying the material. We also want to thank
Lynn Zhang for helping with the preparation of the SDAResources package.
1 Getting Started
The R statistical software environment is a powerful and flexible platform for performing
statistical analyses. The basic package contains thousands of functions for computing statistics, and user-contributed packages for this open-source software provide thousands more.
Advanced users can write their own functions to implement new methods for statistical
analyses.
Best of all, the base R package and all user-contributed packages are available free of charge
to anyone with an internet connection.
This chapter tells you how to obtain R software and contributed packages and introduces
you to some basic R functions. It also shows you how to read data sets into R and save
output and graphics produced while you are using the package.
Conventions used in this book. This book is intended to be read in conjunction with
Sampling: Design and Analysis, Third Edition by Sharon L. Lohr, henceforth referred to as
SDA. Many of the examples in this book refer to figures, tables, examples, or exercises in
SDA. To avoid confusion, we refer to figures in SDA as “Figure x.x in SDA.” We refer to
figures in this book as “Figure x.x” with no qualifier.
The names of external data files and programs, such as agsrs.csv and ch02.R, are in
typewriter font, as are the names of R packages and code we type. Variable names,
function names, and internal R data set names are in italic type.
Much of this book consists of R commands and output, set in light shaded boxes such as
the following:
# This is a comment
# Enter data values into vector ‘myvec’
myvec
