# UG AcceptReject Algorithm Problems

Assignment 1, Problem 1

Problem 2

Use R!

Advisors: Robert Gentleman · Kurt Hornik · Giovanni Parmigiani

For other titles published in this series, go to
http://www.springer.com/series/6991

Christian P. Robert · George Casella

Introducing Monte Carlo Methods with R


Christian P. Robert
Université Paris Dauphine
UMR CNRS 7534, CEREMADE
place du Maréchal de Lattre de Tassigny
75775 Paris cedex 16, France
xian@ceremade.dauphine.fr

George Casella
Department of Statistics
University of Florida
103 Griffin-Floyd Hall
Gainesville FL 32611-8545, USA
casella@stat.ufl.edu

Series Editors

Robert Gentleman
Department of Bioinformatics and Computational Biology
Genentech, South San Francisco, CA 94080, USA

Kurt Hornik
Department of Statistik and Mathematik
Wirtschaftsuniversität Wien, Augasse 2-6
A-1090 Wien, Austria

Giovanni Parmigiani
Department of Biostatistics and Computational Biology
Dana-Farber Cancer Institute
44 Binney Street, Boston, MA 02115, USA

ISBN 978-1-4419-1575-7

DOI 10.1007/978-1-4419-1576-4

e-ISBN 978-1-4419-1576-4

Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2009941076

© Springer Science+Business Media, LLC 2010



To our parents, who taught us much in many ways.


“What if you haven’t the data?”

“Then we shall proceed directly to the brandy and cigars.”

Lyndsay Faye

The Case of Colonel Warburton's Madness

Preface

“After that, it was down to attitude.”

Ian Rankin

Black & Blue

The purpose of this book is to provide a self-contained entry into Monte Carlo computational techniques. First and foremost, it must not be confused with a programming addendum to our earlier book Monte Carlo Statistical Methods, whose second edition came out in 2004. The current book has a different purpose, namely to make a general audience familiar with the programming aspects of Monte Carlo methodology through practical implementation. Not only have we introduced R at the core of this book, but the emphasis and contents have changed drastically from Monte Carlo Statistical Methods, even though the overall vision remains the same. Theoretical foundations are intentionally avoided in the current book.

Indeed, the emphasis on practice is a major feature of this book in that its primary audience consists of graduate students in statistics, biostatistics, engineering, etc., who need to learn how to use simulation methods as a tool to analyze their experiments and/or datasets. The book should appeal to scientists in all fields, given the versatility of these Monte Carlo tools. It can also be used for a more classical statistics audience when aimed at teaching a quick entry into modern computational methods based on R, at the end of an undergraduate program for example, even though this may prove challenging for some students.

The choice of the programming language R, as opposed to faster alternatives such as Matlab or C and more structured constructs such as BUGS, is due to its pedagogical simplicity and its versatility. Readers can easily conduct experiments in their own programming language by translating the examples provided in this book. (Obviously, the book can also supplement other textbooks on Bayesian modeling at the graduate level, including our books Bayesian Choice (Robert, 2001) and Monte Carlo Statistical Methods (Robert and Casella, 2004).) This book can also be viewed as a companion to, rather than a competitor of, Jim Albert's Use R! book Bayesian Computation with R (Albert, 2009). Indeed, taken as a pair, these two books can provide a fairly thorough introduction to Monte Carlo methods and Bayesian modeling.

We stress that, at a production level (that is, when using advanced Monte Carlo techniques or analyzing large datasets), R cannot be recommended as the default language, but the expertise gained from this book should make the switch to another language seamless.

Contrary to usual practice, many exercises are interspersed within the chapters rather than postponed until the end of each chapter. There are two reasons for this stylistic choice. First, the results or developments contained in those exercises are often relevant for upcoming points in the chapter. Second, they signal to the student (or to any reader) that some pondering over the previous pages may be useful before moving to the following topic and so may act as self-checking gateways. Additional exercises are found at the end of each chapter, with abridged solutions of the odd-numbered exercises provided on our Webpages as well as Springer's.

Thanks

We are immensely grateful to colleagues and friends for their help with this book, in particular to the following people: Ed George for his comments on the general purpose of the book and some exercises in particular; Jim Hobert and Fernando Quintana for helpful discussions on the Monte Carlo EM; Alessandra Iacobucci for signaling in due time a fatal blunder; Jean-Michel Marin for letting us recycle the first chapter of Bayesian Core (Marin and Robert, 2007) into our introduction to R and for numerous pieces of R advice, as well as pedagogical suggestions; Antonietta Mira for pointing out mistakes during a session of an MCMC conference in Warwick; François Perron for inviting CPR to Montréal and thus providing him with a time window to complete Chapter 8 (only shortened by an ice-climbing afternoon in Québec!), and also François Perron and Clémentine Trimont for testing the whole book from the perspectives of a professor and a student, respectively; Martyn Plummer for answering queries about coda; Jeff Rosenthal for very helpful exchanges on amcmc; Dimitris Rizopoulos for providing Exercise 7.21; and Phil Spector from Berkeley for making available his detailed and helpful notes and slides on R and C, now partly published as Spector (2009). Comments from both reviewers were especially useful in finalizing the book. We are also grateful to John Kimmel of Springer for his advice and efficiency, as well as for creating the visionary Use R! series and supporting the development of the R language that way. From a distance, we also wish to thank Professors Gentleman and Ihaka for creating the R language and for doing it in open-source, as well as the entire R community for contributing endlessly to its development.

Sceaux and Gainesville

October 18, 2009

Christian P. Robert and George Casella

Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

List of Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

1 Basic R Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Getting started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 R objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 The vector class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 The matrix, array, and factor classes . . . . . . . . . . . . . . . . . . 9
1.3.3 The list and data.frame classes . . . . . . . . . . . . . . . . . . . . . 12
1.4 Probability distributions in R . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Basic and not-so-basic statistics . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Graphical facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.7 Writing new R functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.8 Input and output in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.9 Administration of R objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.10 The mcsm package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.11 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2 Random Variable Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.1.1 Uniform simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.1.2 The inverse transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2 General transformation methods . . . . . . . . . . . . . . . . . . . . . . . . 46
2.2.1 A normal generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.2 Discrete distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.3 Mixture representations . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3 Accept–reject methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51


2.4 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3 Monte Carlo Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2 Classical Monte Carlo integration . . . . . . . . . . . . . . . . . . . . . . . 65
3.3 Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3.1 An arbitrary change of reference measure . . . . . . . . . . . . . . 69
3.3.2 Sampling importance resampling . . . . . . . . . . . . . . . . . . . . 75
3.3.3 Selection of the importance function . . . . . . . . . . . . . . . . . . 78
3.4 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4 Controlling and Accelerating Convergence . . . . . . . . . . . . . . . . . 89

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4.2 Monitoring variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4.3 Asymptotic variance of importance sampling estimators . . . . . . 92

4.4 Effective sample size and perplexity . . . . . . . . . . . . . . . . . . . . . . 98

4.5 Simultaneous monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

4.6 Rao–Blackwellization and deconditioning . . . . . . . . . . . . . . . . . . . 107

4.7 Acceleration methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

4.7.1 Correlated simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

4.7.2 Antithetic variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

4.7.3 Control variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

4.8 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

5 Monte Carlo Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

5.2 Numerical optimization methods . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.3 Stochastic search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

5.3.1 A basic solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

5.3.2 Stochastic gradient methods . . . . . . . . . . . . . . . . . . . . . . . . 136

5.3.3 Simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

5.4 Stochastic approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

5.4.1 Optimizing Monte Carlo approximations . . . . . . . . . . . . . 146

5.4.2 Missing-data models and demarginalization . . . . . . . . . . . 150

5.4.3 The EM algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

5.4.4 Monte Carlo EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

5.5 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

6 Metropolis–Hastings Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 167

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

6.2 A peek at Markov chain theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

6.3 Basic Metropolis–Hastings algorithms . . . . . . . . . . . . . . . . . . . . . . 170

6.3.1 A generic Markov chain Monte Carlo algorithm . . . . . . . 171

6.3.2 The independent Metropolis–Hastings algorithm . . . . . . 175

6.4 A selection of candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182


6.4.1 Random walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

6.4.2 Alternative candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

6.5 Acceptance rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

6.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7 Gibbs Samplers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

7.2 The two-stage Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

7.3 The multistage Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

7.4 Missing data and latent variables . . . . . . . . . . . . . . . . . . . . . . . . . . 209

7.5 Hierarchical structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

7.6 Other considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

7.6.1 Reparameterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

7.6.2 Rao–Blackwellization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

7.6.3 Metropolis within Gibbs and hybrid strategies . . . . . . . . 230

7.6.4 Improper priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

7.7 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

8 Monitoring and Adaptation for MCMC Algorithms . . . . . . . . . . 237

8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

8.2 Monitoring what and why . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

8.2.1 Convergence to the stationary distribution . . . . . . . . . . . . 238

8.2.2 Convergence of averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

8.2.3 Approximating iid sampling . . . . . . . . . . . . . . . . . . . . . . . . 240

8.2.4 The coda package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

8.3 Monitoring convergence to stationarity . . . . . . . . . . . . . . . . . . . . . 242

8.3.1 Graphical diagnoses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

8.3.2 Nonparametric tests of stationarity . . . . . . . . . . . . . . . . . . 243

8.3.3 Spectral analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

8.4 Monitoring convergence of averages . . . . . . . . . . . . . . . . . . . . . . . . 250

8.4.1 Graphical diagnoses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

8.4.2 Within and between variances . . . . . . . . . . . . . . . . . . . . . . 253

8.4.3 Effective sample size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

8.4.4 Fixed-width batch means . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

8.5 Adaptive MCMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

8.5.1 Cautions about adaptation . . . . . . . . . . . . . . . . . . . . . . . . . 258

8.5.2 The amcmc package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

8.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

Index of R Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

Index of Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

List of Figures

1.1 Illustrations of the processing of vectors in R . . . . . . . . . . . . . . . 8
1.2 Illustrations of the processing of matrices in R . . . . . . . . . . . . . . 11
1.3 Illustrations of the factor class . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Chosen features of the list class . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Definition of a data frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Loess and natural splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 Autocorrelation and partial autocorrelation plots . . . . . . . . . . . . 23
1.8 Simple illustration of bootstrap . . . . . . . . . . . . . . . . . . . . . . . . 24
1.9 Bootstrap linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.10 Spline approximation of monthly deaths . . . . . . . . . . . . . . . . . . 29
1.11 Cumsum illustration for an AR(1) . . . . . . . . . . . . . . . . . . . . . . 30
1.12 Range of Brownian motions with confidence band . . . . . . . . . . . 32
1.13 Some artificial loops in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.1 Representation of a uniform random sample . . . . . . . . . . . . . . . 43
2.2 Representations of an exponential random sample . . . . . . . . . . . 45
2.3 Representation of a binomial random sample . . . . . . . . . . . . . . . 51
2.4 Generation of beta random variables . . . . . . . . . . . . . . . . . . . . 54

3.1 Evaluation of integrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Comparison of integrate and area . . . . . . . . . . . . . . . . . . . . . . 64
3.3 One-dimensional Monte Carlo integration . . . . . . . . . . . . . . . . . 67
3.4 Importance sampling approximation of a normal tail . . . . . . . . . 72
3.5 Representation of the posterior π(α, β|x) . . . . . . . . . . . . . . . . . 74
3.6 Analysis of a sample from π(α, β|x) . . . . . . . . . . . . . . . . . . . . . 78
3.7 Infinite variance importance sampler . . . . . . . . . . . . . . . . . . . . 80
3.8 Convergence of two estimators of the integral (3.9) . . . . . . . . . . 84
3.9 Posterior of the regression parameter (β1, β2) . . . . . . . . . . . . . . 86

4.1 Confidence bands for a simple example . . . . . . . . . . . . . . . . . . . 92
4.2 Range and confidence for the Cauchy-normal problem (1) . . . . . . 97


4.3 Range and confidence for the Cauchy-Normal problem (2) . . . . . 98
4.4 ESS and perplexity for the Cauchy-Normal problem . . . . . . . . . . 101
4.5 ESS and perplexity for the Cauchy-Normal problem . . . . . . . . . . 104
4.6 Brownian confidence band for the Cauchy-Normal problem . . . . . 106

4.7 Convergence of estimators of E{exp(−X^2)} . . . . . . . . . . . . . . . 109

4.8 Approximate risks of truncated James–Stein estimators . . . . . . . 114

4.9 Impact of dyadic average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

4.10 Impact of control variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4.11 Impact of control variates in logistic regression . . . . . . . . . . . . . . . 121

5.1 Sequences of MLEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

5.2 Newton–Raphson sequences for a mixture likelihood . . . . . . . . . . 129

5.3 Simple Monte Carlo maximization . . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.4 Two Cauchy likelihood maximizations . . . . . . . . . . . . . . . . . . . . . . 133

5.5 A Cauchy likelihood approximation . . . . . . . . . . . . . . . . . . . . . . . . . 134

5.6 Representation of the function of Example 5.6 . . . . . . . . . . . . . . . 136

5.7 Stochastic gradient paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

5.8 Simulated annealing paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

5.9 Simulated annealing sequence with two modes . . . . . . . . . . . . . . . 146

5.10 Simulated annealing sequence for four schedules . . . . . . . . . . . . . . 147

5.11 Monte Carlo approximations of a probit marginal . . . . . . . . . . . . 149

5.12 EM sequences for a normal censored likelihood . . . . . . . . . . . . . . . 155

5.13 Multiple-mode EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

5.14 MCEM on logit model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

6.1 Metropolis output from a beta target . . . . . . . . . . . . . . . . . . . . . . . 173

6.2 Metropolis simulations from a beta target . . . . . . . . . . . . . . . . . . . 174

6.3 Output of a gamma accept–reject algorithm . . . . . . . . . . . . . . . . . 177

6.4 Metropolis–Hastings schemes for a Cauchy target . . . . . . . . . . . . . 179

6.5 Cumulative coverage for a Cauchy target . . . . . . . . . . . . . . . . . . . . 180

6.6 Fitting the braking data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

6.7 Random walk proposals with different scales . . . . . . . . . . . . . . . 184

6.8 Scale impact on mixture exploration . . . . . . . . . . . . . . . . . . . . . . . . 185

6.9 Langevin samples for probit posterior . . . . . . . . . . . . . . . . . . . . . . . 187

6.10 Langevin samples for mixture posterior . . . . . . . . . . . . . . . . . . . . . 189

6.11 Cumulative mean plots with different scales . . . . . . . . . . . . . . . 193

7.1 Histograms from the Gibbs sampler of Example 7.2 . . . . . . . . . . 203
7.2 Histograms from the Gibbs sampler of Example 7.3 . . . . . . . . . . 205
7.3 Histograms from the Gibbs sampler of Example 7.5 . . . . . . . . . . 208
7.4 Histograms of the posterior distributions from Example 7.6 . . . . . 211
7.5 Histograms from the Gibbs sampler of Example 7.7 . . . . . . . . . . 212
7.6 Histograms from the Gibbs sampler of Example 7.8 . . . . . . . . . . 214
7.7 Gibbs output for mixture posterior . . . . . . . . . . . . . . . . . . . . . . 217
7.8 Simple slice sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219


7.9 Logistic slice sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

7.10 Histograms from the pump failure data of Example 7.12 . . . . . . . 223

7.11 Autocorrelations from a Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . 225

7.12 Autocovariance plots for the Gibbs sampler of model (7.7) . . . . . 226

7.13 Histograms of λ in Example 7.16 . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

7.14 Histograms from the Gibbs sampler of (7.12) . . . . . . . . . . . . . . . . . 233

8.1 Raw coda output for the random effect logit model . . . . . . . . . . 244
8.2 Empirical cdfs for the random effect logit parameters . . . . . . . . . 244
8.3 Plot of successive Kolmogorov–Smirnov statistics . . . . . . . . . . . . 246
8.4 Comparison of two MCMC scales for the noisy AR model . . . . . . 251
8.5 Multiple MCMC runs for the noisy AR model . . . . . . . . . . . . . . 252
8.6 Gelman and Rubin's evaluation for the noisy AR model . . . . . . . 254
8.7 Gelman and Rubin's evaluation for the pump failure model . . . . . 255
8.8 Fixed-width batch sampling variance estimation for the pump failure model . . . 259
8.9 Degenerating MCMC adaptation for the pump failure model . . . . 261
8.10 Nonconverging non-parametric MCMC adaptation for the noisy AR model . . . 262
8.11 Mode-recovering non-parametric MCMC adaptation for the noisy AR model . . . 263

List of Examples

1.1 Bootstrapping simple linear regression . . . . . . . . . . . . . . . . . . . 25

2.1 Exponential variable generation . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2 Transformations of exponentials . . . . . . . . . . . . . . . . . . . . . . . . 46
2.3 Normal variable generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5 Poisson random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6 Negative binomial random variables as mixtures . . . . . . . . . . . . 50
2.7 Accept–reject for beta variables . . . . . . . . . . . . . . . . . . . . . . . . 53
2.8 Continuation of Example 2.7 . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.1 Precision of integrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2 integrate versus area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3 Monte Carlo convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4 Precision of a normal cdf approximation . . . . . . . . . . . . . . . . . . 67
3.5 Tail probability approximation . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.6 Beta posterior importance approximation . . . . . . . . . . . . . . . . . 71
3.7 Continuation of Example 3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.8 Importance sampling with infinite variance . . . . . . . . . . . . . . . . 79
3.9 Selection of the importance sampling function . . . . . . . . . . . . . . 82
3.10 Probit posterior importance sampling approximation . . . . . . . . . 83

4.1 Monitoring with the CLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2 Cauchy prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3 Continuation of Example 4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.4 Continuation of Example 4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.5 Continuation of Example 4.4 . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.6 Student's t expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.7 James–Stein estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.8 Continuation of Example 4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.9 Cauchy posterior with antithetic variables . . . . . . . . . . . . . . . . . 115


4.10 Continuation of Example 4.9 . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.11 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.1 Maximizing a Cauchy likelihood . . . . . . . . . . . . . . . . . . . . . . . 127
5.2 Mixture model likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.3 A first Monte Carlo maximization . . . . . . . . . . . . . . . . . . . . . . 131
5.4 Continuation of Example 5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.5 Continuation of Example 5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.6 Minimization of a complex function . . . . . . . . . . . . . . . . . . . . . 135
5.7 Continuation of Example 5.6 . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.8 Continuation of Example 5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.9 Simulated annealing for a normal mixture . . . . . . . . . . . . . . . . . 144
5.10 Continuation of Example 5.6 . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.11 Bayesian analysis of a simple probit model . . . . . . . . . . . . . . . . 147
5.12 Missing-data mixture model . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.13 Censored-data likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.14 Continuation of Example 5.13 . . . . . . . . . . . . . . . . . . . . . . . . 153
5.15 EM for a normal mixture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.16 Missing-data multinomial model . . . . . . . . . . . . . . . . . . . . . . . 158
5.17 Random effect logit model . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

6.1 Metropolis–Hastings algorithm for beta variables . . . . . . . . . . . . 172
6.2 Cauchys from normals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.3 Metropolis–Hastings for regression . . . . . . . . . . . . . . . . . . . . . . 179
6.4 Normals from uniforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.5 Metropolis–Hastings for mixtures . . . . . . . . . . . . . . . . . . . . . . . 183
6.6 Probit regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.7 Continuation of Example 6.5 . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.8 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.9 Acceptance rates: normals from double exponentials . . . . . . . . . 192
6.10 Continuation of Example 6.4 . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.1 Normal bivariate Gibbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.2 Generating beta-binomial random variables . . . . . . . . . . . . . . . . 202
7.3 Fitting a one-way hierarchical model . . . . . . . . . . . . . . . . . . . . 202
7.4 Normal multivariate Gibbs . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
7.5 Extension of Example 7.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
7.6 Censored-data Gibbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
7.7 Grouped multinomial data . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
7.8 More grouped multinomial data . . . . . . . . . . . . . . . . . . . . . . . . 213
7.9 Gibbs for normal mixtures . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
7.10 A first slice sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7.11 Logistic regression with the slice sampler . . . . . . . . . . . . . . . . . 219
7.12 A Poisson hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.13 Correlation in a bivariate normal . . . . . . . . . . . . . . . . . . . . . . 224


7.14 Continuation of Example 7.5 . . . . . . . . . . . . . . . . . . . . . . . . . 225
7.15 Normal bivariate Gibbs revisited . . . . . . . . . . . . . . . . . . . . . . 227
7.16 Poisson counts with missing data . . . . . . . . . . . . . . . . . . . . . . 228
7.17 Metropolis within Gibbs illustration . . . . . . . . . . . . . . . . . . . . 230
7.18 Conditional exponential distributions: nonconvergence . . . . . . . . 232
7.19 Improper random effects posterior . . . . . . . . . . . . . . . . . . . . . 233

8.1 Random effect logit model . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8.2 Poisson hierarchical model . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
8.3 Metropolis–Hastings random walk on AR(1) model . . . . . . . . . . 249
8.4 Continuation of Example 8.3 . . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.5 Continuation of Example 8.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 254
8.6 Continuation of Example 8.5 . . . . . . . . . . . . . . . . . . . . . . . . . . 258
8.7 Another approach to Example 8.2 . . . . . . . . . . . . . . . . . . . . . . 259
8.8 Continuation of Example 8.4 . . . . . . . . . . . . . . . . . . . . . . . . . . 261
8.9 Adaptive MCMC for anova . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
8.10 Continuation of Example 8.7 . . . . . . . . . . . . . . . . . . . . . . . . . 265
8.11 Continuation of Example 8.8 . . . . . . . . . . . . . . . . . . . . . . . . . 266

1 Basic R Programming

“You’re missing the big picture,” he told her. “A good album should

be more than the sum of its parts.”

Ian Rankin

Exit Music

Reader's guide

The Reader's guide is a section that will start each chapter by providing comments on its contents. It also usually contains indications of the purpose of the chapter and its links with other chapters.

This first chapter is where we introduce the programming language R, which we use to implement and illustrate our algorithms. We discuss here input and output, data structures, and basic programming commands for this language. It is thus a crucial chapter for those new to R, but it will unavoidably feel unsatisfactory because the coverage of those notions will most likely be too sketchy for most readers. For those already familiar with R, or simply previously exposed to another introduction to R, this chapter will undoubtedly feel mostly familiar and act as a refresher, maybe prodding them to delve deeper into the R language using a reference book. The similarity with the introductory chapter of Bayesian Core is not coincidental, as we used the same skeleton as in Marin and Robert (2007).


1.1 Introduction

This chapter only attempts to introduce R to newcomers in a few pages and, as such, it should not be considered as a proper introduction to R. Entire volumes, such as the monumental R Book by Crawley (2007), the introduction by Dalgaard (2002), and the focused R data manipulation by Spector (2009), are dedicated to the practice of this language, and therefore additional efforts (besides reading this chapter) will be required from the reader to sufficiently master the language.1 However, before discouraging anyone, let us comfort you with the fact that:

a. The syntax of R is simple and logical enough to quickly allow for a basic understanding of simple R programs, as should become obvious in a few paragraphs.

b. The best, and in a sense the only, way to learn R is through trial-and-error on simple and then more complex examples. Reading the book with a computer available nearby is therefore the best way of implementing this recommendation.

In particular, the embedded help commands help() and help.search() are very good starting points to gather information about a specific function or a general issue, even though more detailed manuals are available both locally and on-line. Note that help.start() opens a Web browser linked to the local manual pages.

One may first wonder why we support using R as the programming interface for this introduction to Monte Carlo methods, since there exist other languages, most (all?) of them faster than R, like Matlab, and even free, like C or Python. We obviously have no partisan or commercial involvement in this language. Rather, besides the ease of presentation, our main reason for this choice is that the language combines a sufficiently high power (for an interpreted language) with a very clear syntax both for statistical computation and graphics. R is a flexible language that is object-oriented and thus allows the manipulation of complex data structures in a condensed and efficient manner. Its graphical abilities are also remarkable, with possible interfacing with a text processor such as LaTeX with the package Sweave. R offers the additional advantages of being a free and open-source system under the GNU General Public Licence principle, with constant upgrades and improvements from the statistics community,2 as well as numerous (free) Web-based tutorials and user's manuals,3 and running on all platforms, including both Apple's Mac and Microsoft Windows (and, obviously, under the Linux and Unix operating systems). R provides a powerful interface that can integrate programs written in other languages such as C, C++, Fortran, Perl, Python, and Java. Not only can you keep programming in your usual language, but integrating programs written by others in an alien language then becomes (mostly) seamless, as seen in Chapter 8 with the package amcmc. At last, it is increasingly common to see people who develop new methodology simultaneously producing an R package in support of their approach and to back up introductory statistics courses with illustrations in R, as shown by the expanding Use R! Springer series in which this book is published.

1 If you decide to skip this chapter, be sure to at least print the handy R Reference Card available at http://cran.r-project.org/doc/contrib/Short-refcard.pdf that summarizes, in four pages, the major commands of R.

2 There is even an R newsletter, R-News, which is available on cran.r-project.org/doc/Rnews.

3 This means it is unlikely that you will need to acquire an R programming book in order to get a proper understanding of the R language, even though it may prove useful at a later stage. See http://cran.r-project.org/manuals.html for manuals available on-line.

One choice we have not addressed above is "why R and not BUGS?" BUGS (which stands for Bayesian inference using Gibbs sampling) is a Bayesian analysis software developed since the early 1990s, mostly by researchers from the Medical Research Council (MRC) at Cambridge University. The most common version is WinBugs, working under Windows, but there also exists an open-source version called OpenBugs. So, to return to the initial question, we are not addressing the possible links and advantages of BUGS simply because the purpose is different. Our goal is to give a deep but intuitive understanding of simulation tools for statistical inference, not (directly) help in the construction of Bayesian models and the Bayesian estimation of their parameters. While access to Monte Carlo specifications is possible in BUGS, most computing operations are handled by the software itself, with the possible outcome that the user does not bother about this side of the problem and instead concentrates on Bayesian modeling. Thus, while R can be easily linked with BUGS and simulation can be done via BUGS, we think that a lower-level language such as R is more effective in bringing you in touch with simulation imperatives. Note that this perspective would not hold so tightly for a book on computational statistics, such as Albert (2009).

1.2 Getting started

The R language is straightforward to install: It can be downloaded (obviously free) from one of the numerous CRAN (Comprehensive R Archive Network) mirror Websites around the world.4 (Note that it is resident in most current Linux kernels.)

4 The main CRAN Website is http://cran.r-project.org/.

At this stage, we do not cover installation of the R package and thus assume that (a) R is installed on the machine you want to work with and (b) that you have managed to launch it (in most cases, you simply have to click on the proper icon). You should then obtain a terminal window whose first lines resemble the following, most likely with a more recent version:

R version 2.5.1 (2007-06-27)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Neither this austere beginning nor the prospect of using a line editor should put you off, though, as there are many other ways of inputting and outputting commands and data, as we shall soon see! The final line above with the symbol > is not a typo but rather means that the R software is waiting for a command from the user (i.e., you). This character > at the beginning of each line in the executable window is called the prompt and precedes the line command, which is terminated by pressing the RETURN key. At this early stage, all commands will be passed as line commands, and you should thus spot commands thanks to this symbol.

Make sure to remember that exiting R can be done by typing q() after the prompt, as in

> q()
Save workspace image? [y/n/c]: n

where the options proposed in the line right after the prompt have to do with the storage of the command history and the produced objects, something you can ignore at this stage. Commands and programs that need to be stopped during their execution, for instance because they take too long or too much memory to complete or because they involve a programming mistake such as an infinite loop, can be stopped by the Control-C double-key action without exiting the R session.


Exercise 1.1 To start with a limited challenge, using the final lines above the prompt, you could embark on an on-line visit of the main features of R by typing demo() after the prompt (make sure to test demo(image) and demo(graphics) to get an idea of the great graphical abilities of R). More statistical demos are also available, as listed by demo().

For memory and efficiency reasons, R does not install all the available functions and programs when launched but only the basic packages that it requires to run properly. Those basic packages are base, stats, graphics, nlme, and lattice. Additional packages can be loaded via the library command, as in

> library(combinat) # combinatorics utilities
> library(datasets) # The R Datasets Package

and the entire list of available packages is provided by library(). (The symbol # in the prompt lines above indicates a comment: All characters following # until the end of the command line are ignored. Comments are recommended to improve the readability of your programs.) There exist hundreds of packages available on the Web.5 Installing a new package such as the package mcsm that is associated with this book is done by calling

> install.packages("mcsm")

which downloads the file from a CRAN mirror and installs it. (The related function download.packages only fetches the package file, for a later manual installation.) For a given package, the install command obviously needs to be executed only once, while the library call is required each time R is launched (as the corresponding package is not kept as part of the .RData file, whose role is explained below in Section 1.9). Thus, it is good practice to include calls to required libraries within your R programs in order to avoid error messages when launching them.
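As a minimal sketch of this practice (the comments are ours), a program relying on mcsm could thus start with

> install.packages("mcsm")  # one-time download and install from CRAN
> library(mcsm)             # required at the start of each session using the package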

1.3 R objects

As with many advanced programming languages, R distinguishes between several types of objects. Those types include scalar, vector, matrix, time series, data frames, functions, or graphics. An R object is mostly characterized by a mode that describes its contents and a class that describes its structure. The R function str applied to any R object, including R functions, will show its structure. The different modes are

– null (empty object),
– logical (TRUE or FALSE),
– numeric (such as 3, 0.14159, or 2+sqrt(3)),
– complex (such as 3-2i or complex(1,4,-2)), and
– character (such as "Blue", "binomial", "male", or "y=a+bx"),

and the main classes are vector, matrix, array, factor, time-series, data.frame, and list. Heterogeneous objects such as those of the list class can include elements with various modes. Manual entries about those classes can be obtained via the help commands help(data.frame) or ?matrix for instance.

5 Packages that have been validated and tested by the R core team are listed at http://cran.r-project.org/src/contrib/PACKAGES.html.

R can operate on most of those types as a regular function would operate on a scalar, which is a feature that should be exploited as much as possible for compact and efficient programming. The fact that R is interpreted rather than compiled involves many subtle differences, but a major issue is that all variables in the system are evaluated and stored at every step of R programs. This means that loops in R are enormously time-consuming and should be avoided at all costs! Therefore, using the shortcuts offered by R in the manipulation of vectors, matrices, and other structures is a must.

1.3.1 The vector class

As indicated logically by its name, the vector object corresponds to a mathematical vector of elements of the same type, such as (TRUE,TRUE,FALSE,TRUE) or (1,2,3,5,7,11). Creating small vectors can be done using the R command c() as in

> a=c(2,6,-4,9,18)

This fundamental function combines or concatenates terms together. For instance,

> d=c(a,b)

concatenates the two vectors a and b into a new vector d. Note that decimal numbers should be encoded with a dot, character strings in quotes " ", and logical values with the character strings TRUE and FALSE or with their respective abbreviations T and F. Missing values are encoded with the character string NA. The option recursive=TRUE in c() allows breaking down a list into its individual components.
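For instance (a hypothetical session, not taken from the original examples),

> a=c(2,6,-4,9,18)                        # numeric vector of length 5
> b=c(a,NA)                               # concatenation, with a missing value appended
> c(list(u=1:2,v="blue"),recursive=TRUE)  # flattens the list into a (character) vector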

Being able to use abbreviations in R is quite handy, but this may lead to confusion! In particular, the use of T instead of TRUE is only valid if T is not defined otherwise in the current R session. Since T is a standard symbol in Monte Carlo simulation, often denoting the number of iterations, this may create unsuspected problems. For instance, using

> sort(weit,dec=T)
Error in sort(weit/sum(weit), dec = T) :
  'decreasing' must be a length-1 logical vector.
  Did you intend to set 'partial'?

resulted in an error message because T was already defined in the R program as 10^3.

In Figure 1.1, we give a few illustrations of the use of vectors in R. The character + indicates that the console is waiting for a supplementary instruction, which is useful when typing long expressions. The assignment operator is =, not to be confused with ==, which is the Boolean operator for equality. An older assignment operator is <-.

> a=c(5,5.6,1,4,-5)     build the numeric vector a of dimension 5
                        with elements 5, 5.6, 1, 4, -5
> b=a[2:4]              build the numeric vector b of dimension 3
                        with elements 5.6, 1, 4
> d=a[c(1,3,5)]         build the numeric vector d of dimension 3
                        with elements 5, 1, -5
> 2*a                   multiply each element of a by 2
                        and display the result
> b%%3                  provides each element of b modulo 3
> d%/%2.4               computes the integer division of each element of d by 2.4
> e=3/d                 build the numeric vector e of dimension 3
                        with elements 3/5, 3, -3/5
> log(d*e)              multiply the vectors d and e term by term
                        and transform each term into its natural logarithm
> sum(d)                calculate the sum of d
> length(d)             display the length of d
> t(d)                  transpose d, the result is a row vector
> t(d)%*%e              scalar product between the row vector t(d) and
                        the column vector e with identical length
> t(d)*e                elementwise product between two vectors
                        with identical lengths
> g=c(sqrt(2),log(10))  build the numeric vector g of dimension 2
                        with elements sqrt(2), log(10)
> e[d==5]               build the subvector of e that contains the
                        components e[i] such that d[i]=5
> a[-3]                 create the subvector of a that contains
                        all components of a but the third.6
> is.vector(d)          display the logical expression TRUE if
                        a vector and FALSE else

Fig. 1.1. Illustrations of the processing of vectors in R.

6 Positive and negative indices cannot be used simultaneously.

Note the convenient use of Boolean expressions to extract subvectors from a vector without having to resort to a component-by-component test (and hence a loop). The quantity d==5 is itself a vector of Booleans, while the number of components satisfying the constraint can be computed by sum(d==5).

The ability to apply scalar functions to vectors as a whole is also a major advantage of R. For instance,

> e=lgamma(e^2)

returns the vector with components log Γ(e_i^2). When a function depends on a parameter or an option, that quantity is simply supplied along with the vector argument. Functions that are specially designed for vectors include, for instance, sample, permn, order, sort, and rank, which all have to do with manipulating the order in which the components of the vector occur. (Note that permn is part of the combinat library, as indicated when typing help.search("permn"), which returns a permn(combinat) among its matching entries.)

Exercise 1.3 Test the help command on the functions seq, sample, and order. (Hint: Start with help(help).)

Exercise 1.4 Explain the difference between the functions order and rank. For the function rep, explain the difference between the options times, length.out, and each.

Besides their numeric and logical indexes, the components of a vector can also be identified by names. For a given vector x, names(x) is a vector of characters of the same length as x. This additional attribute is most useful when dealing with real data, where the components have a meaning such as "unemployed" or "democrat". Those names can also be erased by the command

> names(x)=NULL
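A short illustration (the labels and values are made up):

> x=c(employed=88,unemployed=12)  # a named numeric vector
> names(x)                        # returns "employed" "unemployed"
> x["unemployed"]                 # components can also be indexed by name
> names(x)=NULL                   # erases the names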

The : operator found in Figure 1.1 is a very useful device that defines a consecutive sequence, but it is also fragile in that reverse sequences do not always produce what is expected.7 For one thing, 1:n-1 is interpreted as (1:n)-1 rather than 1:(n-1). For another, while 3:1 returns the vector c(3,2,1), the command 1:0 returns c(1,0), which may or may not be okay depending on the circumstances. For instance, a[1:0] will only return a[1], and this may not be the limiting case the programmer had in mind. Note also that a[0] does not produce an error message but a vector with length zero.

7 This difficulty was pointed out by Radford Neal.
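These pitfalls are easily checked at the prompt (a small sketch; seq_len is our suggested safeguard, not part of the original discussion):

> n=5
> 1:n-1         # read as (1:n)-1, i.e. 0 1 2 3 4
> 1:(n-1)       # 1 2 3 4, most likely what was intended
> n=1
> 1:(n-1)       # returns c(1,0), not an empty sequence
> seq_len(n-1)  # returns integer(0), a safer limiting case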

Exercise 1.5 Show that the alternative seq(1,n-1,by=1) does not suffer from the same drawbacks as 1:(n-1). Find a modification of by=1 that handles the case where n ≤ 1.

1.3.2 The matrix, array, and factor classes

The matrix class provides the R representation of matrices. A typical entry is, for instance,

> x=matrix(vec,nrow=n,ncol=p)

which creates an n × p matrix whose elements are those of the vector vec, assuming this vector is of dimension np. An important feature of this entry is that, in a somewhat unusual way, the components of vec are stored by column, which means that x[1,1] is equal to vec[1], x[2,1] is equal to vec[2], and so on, except if the option byrow=T is used in matrix. (Because of this choice of storage convention, working on R matrices columnwise is faster than working rowwise.) Note also that, if vec is of dimension n × p, it is not necessary to specify both the nrow=n and ncol=p options in matrix. One of those two parameters is sufficient to define the matrix. On the other hand, if vec is not of dimension n × p, matrix(vec,nrow=n,ncol=p) will create an n × p matrix with the components of vec repeated the appropriate number of times. For instance,

> matrix(1:4,ncol=3)
     [,1] [,2] [,3]
[1,]    1    3    1
[2,]    2    4    2
Warning message:
data length [4] is not a submultiple or multiple of the number
of columns [3] in matrix in: matrix(1:4, ncol = 3)

produces a 2 × 3 matrix along with a warning message that something may be missing in the call to matrix. Note again that 1, 2, 3, 4 are entered consecutively when following the column (or lexicographic) order. Names can be given to the rows and columns of a matrix using the rownames and colnames functions.

In some situations, it is useful to remember that an R matrix can also be used as a vector. If x is an n × p matrix, x[i+n*(j-1)] is equal to x[i,j], i.e., x can also be manipulated as a vector made of the columns of vec piled on top of one another. For instance, x[x>5] is a vector, while x[x>5]=0 modifies the right entries in the matrix x. Conversely, vectors can be turned into p × 1 matrices by the command as.matrix. Note that x[1,] produces the first row of x as a vector rather than as a p × 1 matrix.
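For instance (a hypothetical check of this vector view of a matrix),

> x=matrix(1:8,nrow=2)  # 2 x 4 matrix, filled by column
> x[2,3]                # returns 6
> x[2+2*(3-1)]          # same entry through the vector index i+n*(j-1)
> x[x>5]                # vector of the entries larger than 5
> x[x>5]=0              # sets those entries to 0 inside the matrix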

R allows for a wide range of manipulations on matrices, both termwise and in the classical matrix algebra perspective. For instance, the standard matrix product is denoted by %*%, while * represents the term-by-term product. (Note that taking the product a%*%b when the number of columns of a differs from the number of rows of b produces an error message.) Figure 1.2 gives a few examples of matrix-related commands. The apply function is particularly easy to use for functions operating on matrices by row or column.

The function diag can be used to extract the vector of the diagonal elements of a matrix, as in diag(a), or to create a diagonal matrix with a given diagonal, as in diag(1:10). Since matrix algebra is central to good programming in R, as matrix programming allows for the elimination of time-consuming loops, it is important to be familiar with matrix manipulation. For instance, the function crossprod replaces the product t(x)%*%y on either vectors or matrices by crossprod(x,y) more efficiently:

> system.time(crossprod(1:10^6,1:10^6))
   user  system elapsed
  0.016   0.048   0.066
> system.time(t(1:10^6)%*%(1:10^6))
   user  system elapsed
  0.084   0.036   0.121

> x1=matrix(1:20,nrow=5)          build the numeric matrix x1 of dimension
                                  5 × 4 with first row 1, 6, 11, 16
> x2=matrix(1:20,nrow=5,byrow=T)  build the numeric matrix x2 of dimension
                                  5 × 4 with first row 1, 2, 3, 4
> a=x1+x2                         matrix summation of x1 and x2
> x3=t(x2)                        transpose the matrix x2
> b=x3%*%x2                       matrix product between x3 and x2,
                                  with a check of the dimension compatibility
> c=x1*x2                         term-by-term product between x1 and x2
> dim(x1)                         display the dimensions of x1
> b[,2]                           select the second column of b
> b[c(3,4),]                      select the third and fourth rows of b
> b[-2,]                          delete the second row of b
> rbind(x1,x2)                    vertical merging of x1 and x2
> cbind(x1,x2)                    horizontal merging of x1 and x2
> apply(x1,1,sum)                 calculate the sum of each row of x1
> as.matrix(1:10)                 turn the vector 1:10 into a 10 × 1 matrix

Fig. 1.2. Illustrations of the processing of matrices in R.

(You can also check the symmetric function tcrossprod.)

Eigenanalysis of square matrices is also included in the base package. For

instance, chol(m) returns the upper triangular factor of the Choleski decomposition of m; that is, the matrix R such that RT R is equal to m. Similarly,

eigen(m) returns a list (see Section 1.3.3) that contains the eigenvalues of

m (some of which can be complex numbers) as well as the corresponding

eigenvectors (some of which are complex if there are complex eigenvalues).

Related functions are svd and qr, which provide the singular values and the

QR decomposition of their argument, respectively. Note that the inverse M−1

of a matrix M can be found either by solve(M) (recommended) or ginv(M),

which requires downloading the library MASS and also produces generalized

inverses (which may be a mixed blessing since the fact that a matrix is not invertible is not signaled by ginv). Special versions of solve are backsolve and forwardsolve, which are restricted to upper and lower triangular systems, respectively. Note also the alternative of using chol2inv, which returns the inverse of a matrix m when provided with the Choleski decomposition chol(m).
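As a minimal sketch of these decompositions (the 2 × 2 matrix m below is ours, chosen to be symmetric positive definite):

> m=matrix(c(2,1,1,2),nrow=2)  # a symmetric positive definite matrix
> R=chol(m)                    # upper triangular Choleski factor
> t(R)%*%R                     # recovers m
> solve(m)                     # the inverse of m
> chol2inv(R)                  # the same inverse, from the Choleski factor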

Structures with more than two indices are represented by arrays and can

also be processed by R commands, for instance x=array(1:50,c(2,5,5)),

which gives a three-entry table of 50 terms. Once again, they can also be

interpreted as vectors.
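For instance, under the same column-major convention (the array x below is our own illustration):

> x=array(1:50,c(2,5,5))  # a 2 x 5 x 5 array
> x[1,3,2]                # entry (1,3,2), here equal to 15
> x[15]                   # the same entry read off the underlying vector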


The apply function used in Figure 1.2 is a very powerful device that operates on arrays and, in particular, matrices. Since it can return arrays, it

bypasses calls to multiple loops and makes for (sometimes) quicker and (always) cleaner programs. It should not be considered as a panacea, however,

as apply hides calls to loops inside a single command. For instance, a comparison of apply(A, 1, mean) with rowMeans(A) shows the second version

is about 200 times faster. Using linear algebra whenever possible is therefore

a more eﬃcient solution. Spector (2009, Section 8.7) gives a detailed analysis

of the limitations of apply and the advantages of vectorization in R.
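This difference is easy to check on a simulated matrix (the dimensions below are our own choice):

> A=matrix(rnorm(10^6),nrow=10^3)
> system.time(apply(A,1,mean))  # a loop hidden inside apply
> system.time(rowMeans(A))      # vectorized version, much faster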

A factor is a vector of characters or integers used to specify a discrete

classiﬁcation of the components of other vectors with the same length. Its

main diﬀerence from a standard vector is that it comes with a level attribute

used to specify the possible values of the factor. This structure is therefore

appropriate to represent qualitative variables. R provides both ordered and unordered factors, whose major appeal lies within model formulas, as illustrated

in Figure 1.3. Note the subtle diﬀerence between apply and tapply.

> state=c("tas","tas","sa","sa","wa")  create a vector with five values
> statef=factor(state)                 distinguish entries by group
> levels(statef)                       give the groups
> incomes=c(60,59,40,42,23)            create a vector of incomes
> tapply(incomes,statef,mean)          average the incomes for each group
> statef=factor(state,                 define a new level with one more
+   levels=c("tas","sa","wa","yo"))    group than observed
> table(statef)                        return statistics for all levels

Fig. 1.3. Illustrations of the factor class.

1.3.3 The list and data.frame classes

A list in R is a rather loose object made of a collection of other arbitrary

objects known as its components.8 For instance, a list can be derived from n

existing objects using the function list:

a=list(name_1=object_1,…,name_n=object_n)

This command creates a list with n arguments using object_1,…,object_n

for the components, each being associated with the argument’s name, name_i.

For instance, a$name_1 will be equal to object_1. (It can also be represented

as a[[1]], but this is less practical, as it requires some bookkeeping of the

order of the objects contained in the list.) Lists are very useful in preserving information about the values of variables used within R functions in the sense that all relevant values can be put within a list that is the output of the corresponding function (see Section 1.7 for details about the construction of functions in R). Most standard functions in R, for instance eigen in Figure 1.4, return a list as their output. Note the use of the abbreviations vec and val in the last line of Figure 1.4. Such abbreviations are acceptable as long as they do not induce confusion. (Using res$v would not work!)

8 Lists can contain lists as elements.

> li=list(num=1:5,y="color",a=T)          create a list with three arguments
> a=matrix(c(6,2,0,2,6,0,0,0,36),nrow=3)  create a (3,3) matrix
> res=eigen(a,symmetric=T)                diagonalize a and
> names(res)                              produce a list with two
                                          arguments: vectors and values
> res$vectors                             vectors arguments of res
> diag(res$values)                        create the diagonal matrix
                                          of eigenvalues
> res$vec%*%diag(res$val)%*%t(res$vec)    recover a

Fig. 1.4. Chosen features of the list class.

The local version of apply is lapply, which computes a function for each argument of the list:

> x = list(a = 1:10, beta = exp(-3:3),

+ logic = c(TRUE,FALSE,FALSE,TRUE))

> lapply(x,mean) #compute the empirical means

$a

[1] 5.5

$beta

[1] 4.535125

$logic

[1] 0.5

provided each argument is of a mode that is compatible with the function

argument (i.e., is numeric in this case). A "user-friendly" version of lapply is

sapply, as in

> sapply(x,mean)
       a     beta    logic
5.500000 4.535125 0.500000

The last class we brieﬂy mention here is the data frame. A data frame

is a list whose elements are possibly made of diﬀering modes and attributes

but have the same length, as in the example provided in Figure 1.5. A data

frame can be displayed in matrix form, and its rows and columns can be

extracted using matrix indexing conventions. A list whose components satisfy

the restrictions imposed on a data frame can be coerced into a data frame


using the function as.data.frame. The main purpose of this object is to

import data from an external ﬁle by using the read.table function.

> v1=sample(1:12,30,rep=T)           simulate 30 independent uniform
                                     random variables on {1, 2, ..., 12}
> v2=sample(LETTERS[1:10],30,rep=T)  simulate 30 independent uniform
                                     random variables on {a, b, ..., j}
> v3=runif(30)                       simulate 30 independent uniform
                                     random variables on [0, 1]
> v4=rnorm(30)                       simulate 30 independent realizations
                                     from a standard normal distribution
> xx=data.frame(v1,v2,v3,v4)         create a data frame

Fig. 1.5. Definition of a data frame.
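As a minimal sketch of this import mechanism (reusing the data frame xx of Figure 1.5, with a hypothetical file name xx.txt of our own choosing):

> write.table(xx,file="xx.txt")  # save xx in a text file
> yy=read.table("xx.txt")        # read it back as a data frame
> summary(yy$v3)                 # columns are recovered by name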

1.4 Probability distributions in R

R is primarily a statistical language. It is therefore well-equipped with probability distributions. As described in Table 1.1, all standard distributions are

available, with a clever programming shortcut: A “core” name, such as norm,

is associated with each distribution, and the four basic associated functions,

namely the cdf, the pdf, the quantile function, and the simulation procedure,

are deﬁned by appending the preﬁxes d, p, q, and r to the core name, such

as dnorm, pnorm, qnorm, and rnorm. Obviously, each function requires additional entries, as in pnorm(1.96) or rnorm(10,mean=3,sd=3). Recall that

pnorm and qnorm are inverses of one another.
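For instance (the numerical values below are our own picks):

> pnorm(qnorm(0.975))  # inversion property: returns 0.975
> dnorm(0)             # standard normal density at 0: 0.3989423
> rnorm(3,mean=3,sd=3) # three draws from a N(3,9) distribution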

Exercise 1.6 Study the properties of the R function lm using simulated data as in

> x=rnorm(20)
> y=3*x+5+rnorm(20,sd=0.3)
> reslm=lm(y~x)
> summary(reslm)

The simulation aspects related to the normal distribution (and to these other

standard distributions) will be discussed in detail in Chapter 2.

1.5 Basic and not-so-basic statistics

R is designed by statisticians, as the logical continuation of the former S-plus

language, and, as such, it oﬀers a very wide range of statistical packages that

cover the entire spectrum of statistics. The battery of these (classical) statistical tools, ranging from descriptive statistics to non-parametric density estimation and generalized linear models, cannot be provided in this book, but we refer you to, for instance, Dalgaard (2002) or Venables and Ripley (1999) for a detailed introduction.

Table 1.1. Standard distributions with R core name.

Distribution     Core     Parameters       Default Values
Beta             beta     shape1, shape2
Binomial         binom    size, prob
Cauchy           cauchy   location, scale  0, 1
Chi-square       chisq    df
Exponential      exp      1/mean           1
F                f        df1, df2
Gamma            gamma    shape, 1/scale   NA, 1
Geometric        geom     prob
Hypergeometric   hyper    m, n, k
Log-normal       lnorm    mean, sd         0, 1
Logistic         logis    location, scale  0, 1
Normal           norm     mean, sd         0, 1
Poisson          pois     lambda
Student          t        df
Uniform          unif     min, max         0, 1
Weibull          weibull  shape

At the most basic level, descriptive statistics can be obtained for any object

with numerical entries. For instance, mean (possibly trimmed), var, sd, median,

quantile, and summary produce standard estimates for the samples on which

they are called. Note that, due to the choice of an unbiased version of this

estimator that involves dividing the sum of squares by n−1 instead of dividing

by n, the variance estimator of a single observation is NA rather than 0.

When applied to a matrix x, the output of var(x) differs from the output of sd(x)^2:

> b=matrix(1:9,ncol=3)
> var(b)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
> sd(b)^2
[1] 1 1 1

because the former returns an estimate of the covariance between the columns

of x, while the latter produces an estimate of the variances of the columns.

Note that the deﬁnition of b only speciﬁes the number of columns, 3 in this


case, and thus assumes that the length of the vector is a multiple of 3. (If

it is not, R produces a warning that the data length is not a submultiple or

multiple of the number of columns.)

Classical hypothesis tests, such as the equality of two means or the equality of two variances, can be conducted using standard functions. Typing

help.search("test") will produce a list of tests that most likely contains

more tests than you have previously heard of. For example, checking that the

mean of a normal sample with unknown variance is zero can be conducted

using the t test (Casella and Berger, 2001) as

> x=rnorm(25) #produces a N(0,1) sample of size 25

> t.test(x)

One Sample t-test

data: x

t = -0.8168, df = 24, p-value = 0.4220

alternative hypothesis: true mean is not equal to 0

95 percent confidence interval:

-0.4915103 0.2127705

sample estimates:

mean of x

-0.1393699

whose outcome can be interpreted as providing a p-value of 0.4220 (i.e., a fairly

large probability of observing a larger empirical average x̄ than the one just

observed, −0.139) and hence as concluding that the data do not contradict

the null hypothesis.

As pointed out previously, all but the most basic R functions return lists

as their output (or value). For instance, when running t.test above, the

output involves nine arguments:

> out=t.test(x)

> names(out)

[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"

which can be handled separately, as for instance in as.numeric(out$est)^2.
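For instance, still with the output out of t.test above:

> out$p.value               # the p-value alone
> as.numeric(out$conf.int)  # the confidence bounds as a plain vector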

Similarly, the presence of correlation between two variables can be tested

by cor.test, as in the example

> attach(faithful) #resident dataset

> cor.test(faithful[,1],faithful[,2])


Pearson’s product-moment correlation

data: faithful[, 1] and faithful[, 2]

t = 34.089, df = 270, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8756964 0.9210652
sample estimates:
cor
0.9008112
which concludes that the two variables of the faithful dataset, eruptions and waiting (the eruption times and the waiting times of the Old Faithful geyser in Yellowstone National Park), are most certainly correlated.
Non-parametric tests such as the one-sample and two-sample Kolmogorov–Smirnov goodness-of-fit tests (ks.test), the Shapiro–Wilk normality test (shapiro.test), the Kruskal–Wallis homogeneity test (kruskal.test), and the Wilcoxon rank tests (wilcox.test) are available. For instance, testing for normality on the faithful
dataset leads to
> ks.test(jitter(faithful[,1]),pnorm)

One-sample Kolmogorov-Smirnov test

data: jitter(faithful[, 1])

D = 0.9486, p-value < 2.2e-16
alternative hypothesis: two-sided
> shapiro.test(faithful[,2])

Shapiro-Wilk normality test

data: faithful[, 2]

W = 0.9221, p-value = 1.016e-10

> wilcox.test(faithful[,1])

Wilcoxon signed rank test with continuity correction

data: faithful[, 1]

V = 37128, p-value < 2.2e-16
alternative hypothesis: true location is not equal to 0
In the ﬁrst command line above, the function jitter is used to perturb each
entry in the dataset in order to remove the ties within it. Otherwise, the
p-value cannot be computed:
Warning message:
cannot compute correct p-values with ties in:
ks.test(faithful[, 1], pnorm)
This function is also quite handy when plotting datasets with ties.
Most R functions require arguments, and most of them have default values
for at least some arguments. For instance, the Wilcoxon test wilcox.test has
mu=0 as its default location to be tested. Those default values are indicated
on the help page of the functions.
Non-parametric kernel density estimates can similarly be constructed via
the function density and are quite amenable to calibration, from the choice of
the kernel to the choice of the bandwidth (see Venables and Ripley, 1999).
In our case, they will be handy as sample-based proposals when designing
MCMC algorithms in Chapters 6 and 8. Spline modeling also is available via
the functions spline and splinefun. Non-parametric regression can be performed
via the loess function or using natural splines.
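As a minimal illustration on the faithful dataset (the bandwidth value below is our own pick):

> hist(faithful[,2],freq=FALSE)      # histogram on the density scale
> lines(density(faithful[,2]))       # kernel estimate, default bandwidth
> lines(density(faithful[,2],bw=1))  # a narrower, hence bumpier, estimate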
For instance, the data constructed as
> Nit = c(0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,6,6,6)

> AOB =c(4.26,4.15,4.68,6.08,5.87,6.92,6.87,6.25,

+ 6.84,6.34,6.56,6.52,7.39,7.38,7.74,7.76,8.14,7.22)

reports on the relationship between nitrogen level in soil (coded 0,1,2,3,4,6)

and abundance of a bacteria called AOB, reproduced on the left-hand side of

Figure 1.6. The loess and natural spline ﬁts are obtained via the R code

> AOBm=tapply(AOB,Nit,mean)           #means of AOB
> Nitm=tapply(Nit,Nit,mean)           #means of Nit
> plot(Nit,AOB,xlim=c(0,6),ylim=c(min(AOB),max(AOB)),pch=19)
> fitAOB=lm(AOBm~ns(Nitm,df=2))       #natural spline
> xmin=min(Nit);xmax=max(Nit)
> lines(seq(xmin,xmax,.5),            #fit to means
+   predict(fitAOB,data.frame(Nitm=seq(xmin,xmax,.5))))
> fitAOB2=loess(AOBm~Nitm,span=1.25)  #loess
> lines(seq(xmin,xmax,.5),            #fit to means
+   predict(fitAOB2,data.frame(Nitm=seq(xmin,xmax,.5))))

where the function ns requires the splines library. The loess fit will vary with the choice of span, as the natural spline fit will vary with the choice of df in ns.

Fig. 1.6. Scatterplot of bacteria abundance (AOB) versus nitrogen levels (left panel). The right panel shows both the natural spline fit (dark) with df=2 and the loess fit (light) with span=1.25.

Covariates can be used as well for more advanced statistical modeling. Indeed, linear and generalized linear (regression) models are similarly well-developed in R. The syntax is slightly unusual, though, since a standard linear regression is launched as follows:

> x=seq(-3,3,le=5)   # equidispersed regressor
> y=2+4*x+rnorm(5)   # simulated variable
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
      1.820        4.238

> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3        4        5
 0.25219 -0.07421  0.07080 -0.92773  0.67895

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.8203     0.3050   5.967  0.00942 **
x             4.2381     0.1438  29.472 8.58e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6821 on 3 degrees of freedom
Multiple R-Squared: 0.9966,     Adjusted R-squared: 0.9954
F-statistic: 868.6 on 1 and 3 DF, p-value: 8.58e-05

The core idea is to introduce the model formula y~x as the argument to the function. This model means that y is regressed on x. If no intercept is involved, the model is modified as y~x-1. Interactions in the regression can be specified via the colon (:) symbol, following the syntax of McCullagh and Nelder (1989) for generalized linear models.
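As a minimal sketch of these variants (the additional covariate z is ours, added for the illustration):

> lm(y~x-1)      # regression through the origin
> z=rnorm(5)     # an additional covariate
> lm(y~x+z+x:z)  # main effects plus the x:z interaction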

The function lm produces a list, and the estimates of the regression coefficients can be recovered as lm(y~x)$coeff. Surprisingly, the estimated standard error (0.6821 above) is not an argument of this list and needs to be computed by

> out=lm(y~x)
> sqrt(sum(out$res^2)/out$df)
[1] 0.6821

rather than via var(out$res), which uses the “wrong” number of degrees

of freedom. Note that the package arm (Gelman and Hill, 2006) provides a

cleaner output than summary via its display function.

An analysis of variance can be done by recycling output from lm, as in

this analysis on the impact of food type on chicken weights:

> summary(lm(weight ~ feed, data = chickwts))

Call:
lm(formula = weight ~ feed, data = chickwts)

Residuals:
     Min       1Q   Median       3Q      Max
-123.909  -34.413    1.571   38.170  103.091

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    323.583     15.834  20.436  < 2e-16 ***
feedhorsebean -163.383     23.485  -6.957 2.07e-09 ***
feedlinseed   -104.833     22.393  -4.682 1.49e-05 ***
feedmeatmeal   -46.674     22.896  -2.039 0.045567 *
feedsoybean    -77.155     21.578  -3.576 0.000665 ***
feedsunflower    5.333     22.393   0.238 0.812495
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 54.85 on 65 degrees of freedom
Multiple R-Squared: 0.5417,     Adjusted R-squared: 0.5064
F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
> anova(lm(weight ~ feed, data = chickwts))

Analysis of Variance Table

Response: weight
          Df Sum Sq Mean Sq F value    Pr(>F)
feed       5 231129   46226  15.365 5.936e-10 ***
Residuals 65 195556    3009
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

where the ﬁrst command produces the regression coeﬃcients for each type

of food, while the second command evaluates the relevance of the regression

model (and concludes positively). When using factor variables, more specific analyses can be conducted by splitting the degrees of freedom in aov using the option split.

Generalized linear models can be equally well-estimated thanks to the

polymorphic function glm. For instance, ﬁtting a binomial generalized linear

model to the probability of suﬀering from diabetes for a woman within the

Pima Indian population is done by

> glm(formula = type ~ bmi + age, family = "binomial",
+     data = Pima.tr)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7935  -0.8368  -0.5033   1.0211   2.2531

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.49870    1.17459  -5.533 3.15e-08 ***
bmi          0.10519    0.02956   3.558 0.000373 ***
age          0.07104    0.01538   4.620 3.84e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 256.41 on 199 degrees of freedom
Residual deviance: 215.93 on 197 degrees of freedom
AIC: 221.93

Number of Fisher Scoring iterations: 4

concluding with the signiﬁcance both of the body mass index bmi and the age.

Other generalized linear models can be deﬁned by using a diﬀerent family

value whose range is provided by the function family. Note also that link

functions diﬀerent from the intrinsic (and default) link functions (McCullagh

and Nelder, 1989) can be speciﬁed, as well as a scale factor, as in

> glm(y ~ x, family=quasi(var="mu^2", link="log"))

where the model corresponds to a quasi-likelihood with link equal to the log

function.

Unidimensional and multidimensional time series (x_t)_t can be handled directly by the arima function, following a Box–Jenkins-like analysis,

> arima(diff(EuStockMarkets[,1]),order=c(0,0,5))

Call:
arima(x = diff(EuStockMarkets[, 1]), order = c(0, 0, 5))

Coefficients:
         ma1      ma2      ma3      ma4      ma5  intercept
      0.0054  -0.0130  -0.0110  -0.0041  -0.0486     2.0692
s.e.  0.0234   0.0233   0.0221   0.0236   0.0235     0.6990

sigma^2 estimated as 1053:  log likelihood = -9106.23,  aic = 18226.45

while more advanced models can be fitted thanks to the function StructTS.

Simpler time-series functions can also be used, such as

> acf(ldeaths, plot=F) #monthly deaths from bronchitis,

#emphysema and asthma in the UK, 1974-1979

Autocorrelations of series 'ldeaths', by lag

0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833

1.000 0.755 0.397 0.019 -0.356 -0.609 -0.681 -0.608

0.6667 0.7500 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500

-0.378 -0.013 0.383 0.650 0.723 0.638 0.372 0.009

1.3333 1.4167 1.5000

-0.294 -0.497 -0.586

which is less straightforward to analyze than its graphical alternatives

> acf(ldeaths)
> acf(ldeaths,type="partial")

represented in Figure 1.7 for both the standard and the partial autocorrelations. The standard autocorrelation exhibits quite clearly the seasonal pattern of the deaths.

Fig. 1.7. Autocorrelation and partial autocorrelation plots for the ldeaths series, representing monthly deaths from bronchitis, emphysema, and asthma in the UK over the period 1974-1979. The dashed horizontal lines correspond to the significance boundaries for the non-zero terms.

Lastly, we mention the bootstrap. This procedure has many uses (see Efron

and Tibshirani, 1993), but here we will mainly illustrate its use as a means of

attaching standard errors.

For readers unfamiliar with the notion of the bootstrap, we brieﬂy recall

here that this statistical method is based upon the notion that the empirical

distribution of a sample X1 , . . . , Xn converges in n to the true distribution.

(The empirical distribution is a discrete distribution that puts a probability

1/n on every point Xi of the sample and 0 everywhere else.) The bootstrap

procedure then uses the empirical distribution as a substitute for the true

distribution to construct variance estimates and confidence intervals. Given that the empirical distribution has a finite but large support made of n^n points, Monte Carlo approximations are almost always necessary to evaluate those quantities, as illustrated in the next example.

Fig. 1.8. Histogram of 2500 bootstrap means of a sample of size 8 generated from a gamma G(4, 1) distribution, along with the normal approximation.

For example, if we have a data vector y, we can create a bootstrap sample

y ∗ using the code

> ystar=sample(y,replace=T)

Figure 1.8 shows a histogram of 2500 bootstrap means mean(ystar) based on

the sample

y = c(4.313, 4.513, 5.489, 4.265, 3.641, 5.106, 8.006, 5.087),

along with the normal approximation based on the original sample (i.e., using

the empirical mean and empirical variance of y). The sample was in fact drawn

from a gamma G(4, 1) distribution, and we can see on this graph that the

bootstrap approximation does capture some of the skewness in the distribution

of ȳ (indeed, the sample size n = 8 is rather small for the Central Limit

Theorem to apply). The estimated standard deviation of the sample mean is 0.4699, while the standard deviation of the 2500 bootstrap means is 0.4368, showing agreement.
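A minimal sketch of how such a figure can be produced (the object names are ours):

> y=c(4.313,4.513,5.489,4.265,3.641,5.106,8.006,5.087)
> boot=rep(0,2500)
> for (b in 1:2500) boot[b]=mean(sample(y,replace=T))
> hist(boot,freq=FALSE)                           # bootstrap means
> curve(dnorm(x,mean(y),sd(y)/sqrt(8)),add=TRUE)  # normal approximation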

One diﬃculty in implementing the bootstrap technique is that it is not

always clear which quantity should be bootstrapped. A treatment of this important topic is outside the scope of this book, and we caution the reader to

verify the proper bootstrapping technique in any particular application.

As an example, we apply the bootstrap technique to the simple linear

regression example seen previously in this chapter.

Example 1.1. Recall the linear regression

> x=seq(-3,3,le=5)   # equidispersed regressor
> y=2+4*x+rnorm(5)   # simulated dependent variable
> lm(y~x)

This corresponds to the regression model

Yij = α + βxi + εij ,

where α and β are the unknown intercept and slope, and the εij are the iid normal

errors. We ﬁt the model with least squares and get α̂ = 1.820 and β̂ = 4.238.9

The residuals from the least squares ﬁt are given by

ε̂ij = yij − α̂ − β̂xi ,

and these are the random variables making up the sample that we bootstrap. That is, we create bootstrap samples by resampling the ε̂ij's, producing a new sample (ε̂∗ij)ij by sampling with replacement from the ε̂ij's. The bootstrap data are then y∗ij = yij + ε̂∗ij. This can be implemented with the R code

> fit=lm(y~x)                #fit the linear model
> Rdata=fit$residuals        #get the residuals
> nBoot=2000                 #number of bootstrap samples
> B=array(0,dim=c(nBoot,2))  #bootstrap array
> for(i in 1:nBoot){         #bootstrap loop
+   ystar=y+sample(Rdata,replace=T)
+   Bfit=lm(ystar~x)
+   B[i,]=Bfit$coefficients
+ }

The results of this bootstrap inference are summarized in Figure 1.9, where

we provide the histograms of the 2000 bootstrap replicates of both regression

coeﬃcients. We can also derive from those replicates conﬁdence intervals on

both coeﬃcients (by taking 2.5th and 97.5th percentiles, for example), as well as

conﬁdence intervals on predicted values (i.e., for new values of x). For instance,

based on the bootstrap sample, the derived 90% conﬁdence intervals are (in our

case) (2.350, 3.416) for the intercept α and (4.099, 4.592) for the slope β.

9 When running the same experiment, you will obviously get different numerical values due to the use of a different random seed (see Section 2.1.1).
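A minimal sketch of how such percentile intervals can be extracted from the bootstrap array B of the loop above:

> apply(B,2,quantile,c(.05,.95))  # 90% confidence intervals for both coefficients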


Fig. 1.9. Histogram of 2000 bootstrap intercepts (left) and slopes (right) for the linear regression of Example 1.1. The least squares estimate of the intercept is 2.900, and the slope estimate is 4.35.

Exercise 1.7 For the data associated with Figure 1.8:

a. Bootstrap the data and obtain a similar figure based on 1000 bootstrap replications. If the inference is about the 95% quantile of the distribution of ȳ, q.95(ȳ), give a bootstrap estimate of this quantity, q̂.95(ȳ).

b. Construct a bootstrap experiment that provides a 95% confidence interval on q̂.95(ȳ). (Hint: You have to use two levels of bootstrapping to achieve this goal.)

Exercise 1.8 For a dataset simulated as in Example 1.1, compare the bootstrap conﬁdence intervals on both coeﬃcients to the usual ones based on the

t-distribution. Comment on the diﬀerences.

1.6 Graphical facilities

Another clear advantage of using the R language is that it allows a very rich

range of graphical possibilities. Functions such as plot and image can be

customized to a large extent, as described in Venables and Ripley (1999) or


Murrell (2005) (the latter being entirely dedicated to the R graphic abilities).

Even though the default output of plot as for instance in

> plot(faithful)

is not the most enticing, plot is incredibly flexible: To see the number of parameters involved, you can type par(), which delivers the default values of all those parameters.

The wealth of graphical possibilities oﬀered by R should be taken advantage of cautiously! That is, good design avoids clutter, small fonts,

unreadable scale, etc. The recommendations found in Tufte (1990, 2001)

are thus worth following to avoid horrid outputs like those often found

in some periodicals! In addition, graphs produced by R usually tend to

look nicer on the current device than when printed or included in a slide

presentation. Colors may change, font sizes may turn awkward, separate

curves may end up overlapping, and so on. In the early stages of working

with R, and even later, you should thus check that the diﬀerent outputs

corresponding to the same graph are satisfactory before closing your R

session and losing hours of work!!!

Before covering the most standard graphic commands, we start by describing the notion of device that is at the core of those graphic commands. Each

graphical operation sends its outcome to a device, which can be a graphical

window (like the one that automatically appears when calling a graphical command for the ﬁrst time as in the example above) or a ﬁle where the graphical

outcome is stored for printing or other uses. Under Unix and Linux OS, launching a new graphical window can be done via X11(), with many possibilities

for customization (such as size, positions, color, etc.). Once a graphical window is created, it is given a device number and can be managed by functions

that start with dev., such as dev.list, dev.set, and others. An important

command is dev.off, which closes the current graphical window. When the

device is a ﬁle, it is created by a function that is named after its driver. There

are therefore a postscript, a pdf, a jpeg, and a png function. The complete

list is given by capabilities(). When printing to a ﬁle, as in the following

example,

> jpeg(file="faith.jpg")
> par(mfrow=c(1,2),mar=c(4,2,2,1))
> hist(faithful[,1],nclass=21,col="grey",main="",
+   xlab=names(faithful)[1])
> hist(faithful[,2],nclass=21,col="wheat",main="",
+   xlab=names(faithful)[2])
> dev.off()

closing the sequence with dev.off() is recommended since it completes the file, which is then saved. If the command jpeg(file="faith.jpg") is repeated, the earlier version of the jpeg file is erased.


Using a command-line interface for controlling graphics may seem antiquated, but this is the consequence of the R object-oriented philosophy. In

addition, current graphs can be saved to a postscript ﬁle using the dev.copy

and dev.print functions. Note that R-produced graphs tend to be large objects, in part because the graphs are not pictures of the current state but

instead preserve every action ever taken. For this reason, long series should

be thinned down to a few thousand points, images should work with a few

hundred pixels, contours should be preferred to images, and jpeg preferred to

pdf.

One of the most frustrating features of R is that the graphical device

is not refreshed while a program is executed in the main window. This

implies that, if you switch from one terminal to another or if the screen

saver starts, the whole or parts of the graph currently on the graphical

device will not be visible until the completion of the program. Conversely,

refreshing very large graphs will delay the activation of the prompt >.

As already stressed above, plot is a highly versatile tool that can be used

to represent functional curves and two-dimensional datasets. Colors (chosen

by colors() or colours() out of 650 hues), widths, and types can be calibrated at will and LaTeX-like formulas can be included within the graphs

using expression; see plotmath(grDevices) for a detailed list of the mathematical symbols. Text and legends can be included at a speciﬁc point with

locator (see also identify) and legend. An example of (relatively simple)

output is

> plot(as.vector(time(mdeaths)),as.vector(mdeaths),cex=.6,
+   pch=19,xlab="",ylab="Monthly deaths from bronchitis")
> lines(spline(mdeaths),lwd=2,col="chocolate",lty=3)
> ar=arima(mdeaths,order=c(1,0,0))$coef
> lines(as.vector(time(mdeaths))[-1],ar[2]+ar[1]*
+   (mdeaths[-length(mdeaths)]-ar[2]),col="grey",lwd=2,lty=2)
> title("Splines versus AR(1) predictor")
> ari=arima(mdeaths,order=c(1,0,0),seasonal=list(order=c(1,
+   0,0),period=12))$coef
> lines(as.vector(time(mdeaths))[-(1:13)],ari[3]+ari[1]*
+   (mdeaths[-c(1:12,72)]-ari[3])+ari[2]*(mdeaths[-(60:72)]-
+   ari[3]),lwd=2,col="steelblue",lty=2)
> title("\n\nand SAR(1,12) predictor")
> legend(1974,2800,legend=c("spline","AR(1)","SAR(1,12)"),
+   col=c("chocolate","grey","steelblue"),
+   lty=c(3,2,2),lwd=rep(2,3),cex=.5)

represented in Figure 1.10, which compares spline fitting to an AR(1) predictor and to an SAR(1,12) predictor. Note that the seasonal model is doing worse.

Fig. 1.10. Monthly deaths from bronchitis in the UK over the period 1974-1980 and fits by a spline approximation and two AR predictors.

Another example illustrates the use of the command cumsum, which is

particularly handy when checking Monte Carlo convergence, as discussed in

the remark box of page 66.

> x=rnorm(1)
> for (t in 2:10^3)
+   x=c(x,.9*x[t-1]+rnorm(1))
> plot(x,type="l",xlab="time",ylab="x",lwd=2,lty=2,
+   col="steelblue",ylim=range(cumsum(x)))
> lines(cumsum(x),lwd=2,col="orange3")


This four-line program generates a simple AR(1) sequence and plots the original sequence (x_t) along with the cumulated sum sequence Σ_{i=1}^t x_i. Note that, due to the high correlation factor (0.9), the cumulated sum behaves much more like a random walk.

Fig. 1.11. Simulated AR(1) sequence (dotted) along with its corresponding cumulated sum.

Useful graphical functions include hist, for constructing and optionally

plotting histograms of datasets; points, for adding points on an existing

graph; lines, for linking points together on an existing graph, as in the above

example; polygon, for ﬁlling the area between two sets of points; barplot, for

creating barplots; and boxplot, for creating boxplots. The two-dimensional

representations offered by image and contour are quite handy when providing likelihood or posterior surfaces, as in Figures 3.5 and 5.2. An instance of

using polygon is provided by

> par(mar=c(2,2,2,2))
> x=matrix(0,ncol=100,nrow=10^4)
> for (t in 2:10^4)
+   x[t,]=x[t-1,]+rnorm(100)*10^(-2)
> plot(seq(0,1,le=10^4),x[,1],ty="n",
+   ylim=range(x),xlab="",ylab="")
> polygon(c(1:10^4,10^4:1)/10^4,c(apply(x,1,max),
+   rev(apply(x,1,min))),col="gold",bor=F)
> polygon(c(1:10^4,10^4:1)/10^4,c(apply(x,1,quantile,.95),
+   rev(apply(x,1,quantile,.05))),col="brown",bor=F)

which approximates the range of 100 Brownian motions, as well as a 90%

conﬁdence band, represented in Figure 1.12 (see Kendall et al., 2007, and

Section 4.5).

The command points is used to add one or several points on a twodimensional plot. It suﬀers from a drawback, however, in that the entry

is by default a time series. Therefore, calling points(x) when x is a

two-dimensional vector will plot both points (1, x1 ) and (2, x2 ) rather

than the single point (x1 , x2 ). The result will be as expected if x is a

two-column matrix, resulting in the points (xi1 , xi2 ) being plotted.

These comments are only here to provide an introduction to the capacities

of R. Speciﬁc references such as Murrell (2005) need to be consulted to get a

complete picture of those capacities!

1.7 Writing new R functions

One of the strengths of R is that new functions and libraries can be created

by anyone and then added to Web depositories to continuously enrich the

language. These new functions are not distinguishable from the core functions

of R, such as median or var, because those are also written in R. This means

their code can be accessed and potentially modiﬁed, although it is safer to

deﬁne new functions. (A few functions are written in C, however, for eﬃciency.)

Learning how to write functions designed for one’s own problems is paramount

for their resolution, even though the huge collection of available R functions

may often contain a function already written for that purpose.

Exercise 1.9 Among the R functions you have met so far, check which ones are

written in R by simply typing their name without parentheses, as in mean or var.


Fig. 1.12. Range of 100 simulated Brownian motions (lighter hue) and 90% conﬁdence band (darker hue).

A function is deﬁned in R by an assignment of the form

name=function(arg1[=expr1],arg2[=expr2],…) {

expression

…

expression

value

}

where expression denotes an R command that uses some of the arguments

arg1, arg2, … to calculate a value, value, that is the outcome of the

function. The braces indicate the beginning and the end of the function and

the brackets some possible default values for the arguments. Note that producing a value at the end of a function is essential because anything done

1.7 Writing new R functions

33

within a function is local and temporary, and is therefore lost once the function has been exited unless saved in value (hence, again, the appeal of list).

For instance, the following function, named sqrnt, implements a version of

Newton’s method for calculating the square root of y:

sqrnt=function(y){

x=y/2

while (abs(x*x-y) > 1e-10) x=(x+y/x)/2

x

}
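For instance (the test values are our own):

> sqrnt(2)     # 1.414214
> sqrnt(10^6)  # 1000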

When designing a new R function, it is more convenient to use an external

text editor and to store the function under development in an external ﬁle,

say myfunction.R, which can be executed in R as source("myfunction.R").

Note also that some external commands can be launched within an R function

via the very handy command system. This is, for instance, the easiest (if not

the most eﬃcient) way to incorporate programs written in other languages

(e.g., Fortran, C, Matlab) within R programs.

The fact that R is an interpreted language obviously helps in debugging

programs. Indeed, whenever a program is interrupted or aborted, the variables

that have been previously deﬁned keep their latest value and can thus be

used to assess the cause of the error. This is not always suﬃcient though,

in particular because variables deﬁned within functions are not stored, and a

useful tool is the pause command browser, which temporarily stops a program

to allow the programmer to check the values of all variables at a given point.

Further debugging commands are debug and trace.

Using an external text editor and an external program ﬁle is important

for two reasons. First, using the line editor of R is ineﬃcient and close to

impossible for complex problems, even though functions can be internally

edited by the Unix vi editor as myfun=vi(myfun). Cut-and-paste is just

much easier. Second, this forces you to save your functions rather than

relying on .RData and .Rhistory, which are not 100% secure.

The expressions used in a function rely on a syntax that is quite similar to

those of other programming languages, with conditional statements such as

if (expres1) expres2 else expres3

where expres1 is a logical value, and loops such as

for (name in expres1) expres2

and

while (expres4) expres2

34

1 Basic R Programming

where expres1 is a collection of values, as illustrated in Figure 1.13, and

expres4 is a Boolean expression. In particular, Boolean operators can be

used within those expressions, including == for testing equality, != for testing

inequality, & for the logical and, | for the logical or, and ! for the logical

contradiction.

> bool=T;i=0                           separate commands by semicolons
> while(bool==T) {i=i+1; bool=(i<10)}  stop at i = 10
> s=0;x=rnorm(10000)
> system.time(for (i in 1:length(x)){
+   s=s+x[i]})[3]                      output sum(x) and
                                       provide computing time
> system.time(t(rep(1,10000))%*%x)[3]  compare with vector product
> system.time(sum(x))[3]               compare with sum efficiency

Fig. 1.13. Some artificial loops in R.

Exercise 1.10 Explain the diﬀerence between the logical operators & and &&,

|, ||, and xor.

The operator if (and the associated operator ifelse) are some of the rare occurrences where R does not apply to vectors. For vector-valued tests, logical vectors like (abs(x)>1.96) can be used as indices of the output vector, as in the allocation commands

> y[(abs(x)<1.96)]=x[(abs(x)<1.96)]
> y[(abs(x)>1.96)]=x[(x>1.96)]
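A vectorized alternative relies on ifelse; a minimal sketch (the truncation rule below is our own example):

> x=rnorm(10)
> y=ifelse(abs(x)>1.96,sign(x)*1.96,x)  # truncate x to [-1.96,1.96]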

Since R is an interpreted language, avoiding loops by vectorial programming is generally a good idea, but this may render programs much harder to

read. It is therefore extremely useful to include comments within the programs

by using the symbol #.

As noted previously, R is fundamentally slower than other languages.

Checking both the speed of a program and the reasons for its poor speed

can be done using the system.time command or the more advanced proﬁling

commands Rprof and Rprofmem described in the manual. There are, however, ways of speeding up the execution of your programs. First, using faster

functions (for example, those already programmed in C; see below) obviously

brings improvement. Second, preallocating memory as in x=double(10^5)

also increases speed. Third (and this is getting way beyond the scope of this

introduction!), it is possible to recompile parts of the R package with libraries

that are designed for your machine. An example is the Blas (basic linear algebra subprogram), which can be optimized using the free library Atlas (and

lead to improvements by factors from two to ﬁve). Details can be found in the

1.8 Input and output in R

35

R administration manual. Fourth, and this again requires some programming

expertise, you can take advantage of multiple processors, using for instance

netWorkSpace (NWS), Rpmi, or snow, developed by Luke Tierney.

While we cannot delve much here into the issue of interfacing R with other

languages, we do nonetheless stress that this is an important feature of R that

you should investigate, simply because there are problems R is just too slow to

handle! Using some routines written in C or Fortran then becomes paramount,

without losing the main advantages of R altogether. The easiest way to connect R with external subroutines such as the C executable mycprog.o is to

design the corresponding C program to take its input from a ﬁle such as

mycinput and write its output in another file such as mycoutput. In this case,

calling

> system(“mycprog.o”)

within the R program will be suﬃcient. Obviously, this is a rudimentary type

of interfacing and it suﬀers from two drawbacks, the ﬁrst one being that

repeated access to ﬁles is time-consuming as well and the second one being that

the C program cannot call R functions this way. A more advanced approach is

based on the function .C, which can call C functions with arguments, and the

C subroutine call_R, as described for instance in Crawley (2007). The main

diﬃculty with these more advanced techniques is to ensure the compatibility

between data types. Section 8.5.2 provides an illustration of a C program being

called by an R program in an eﬃcient manner.

1.8 Input and output in R

Large data objects need to be read as values from ex…
