Time series analysis paper — analyze completely
For Problem 2, you are to evaluate the given analysis and interpretation for clarity, completeness,
sufficiency, accuracy, and consistency. Indicate what you think is good, not good, and what you
would do differently. Note: points will be deducted for comments on format. The critique must
be about predictive analytics, not layout. Do not copy the report into your exam, rather use the
2
For this problem, you are to evaluate the analysis and interpretation for clarity, completeness,
sufficiency, accuracy, and consistency. Indicate what you think is good, not good, and what you
would do differently. Keep your assessments by the numbering system to assure your criticisms
coincide with the respective material.
I do not want comments on format. Whether a figure is not in a convenient place is of no
interest. Grammar and spelling are of no interest. I will mark down for non-essential criticism.
Focus on the analysis and the interpretation of the results.
2.1 Introduction
A continuing question on daily return surgeries time series indices is whether any one series is
interchangeable with another; i.e., does one surgery index time series have the same daily counts
as some other particular surgery index time series within specified statistical error? Statistical
differences in surgery index time series include, e.g., networks of multiple observers or counting
methodology. The Debrecen index is compared to the Surgery Tracking And Recognition Algorithm
(STARA) index, and with the Addendum of Authenticated and Verified Surgery Observations
(AAVSO) index.
A pairwise comparison of index counts is confounded by the possible autocorrelation of each
series, and hence a traditional regression-type comparison is inappropriate as the autocorrelation
violates the regression independence assumption. In addition, two aspects of a time series must
be examined for a comparison; when the count occurred and the count magnitude. The analytical
methodologies include autocorrelation and cross-correlation from statistical time series analysis to
determine when a count occurred, and the nonparametric Wilcoxon signed rank test to compare
magnitudes of the count. The time series analyses are used to determine pairwise day-by-day
alignment. Once the paired series are time-aligned, the count magnitudes are compared using the
Wilcoxon signed rank test as the counts data do not follow a normal (Gaussian) distribution.
Section 2.2 describes the three returning surgeries time series data sets; Section 2.3 discusses
the time series statistical analyses of each of the three data sets; Section 2.4 presents the data set
comparisons, or statistical time series cross-correlation analysis, including a brief explanation of why
regression is inappropriate; Section 2.5 presents the count magnitude comparisons; and Section 2.6 gives
the conclusions.
2.2 Data Sets
This section describes the daily returning surgeries time series of the AAVSO, Debrecen, and
STARA data sets. The descriptions indicate some of the characteristics that must be accounted for
prior to a statistical comparison. The AAVSO series is described first, followed by the Debrecen
series, and ending with the STARA series.
2.2.1 AAVSO Data
The AAVSO’s program of data-gathering and analysis of surgeries has been active since its inception
in 1944. AAVSO raw data are submitted monthly as sets of date- and time-stamped values. The
pre-scrubbed AAVSO data contain 34,435 returning surgery counts that span from May 1, 2010
through July 12, 2013. The left panel of Figure 1 shows that these data are truncated on the left
at zero counts and are skewed to the right. The histogram suggests these count data follow a Poisson
distribution.
2.2.2 Debrecen Data
The pre-scrubbed Debrecen data contain 41,866 daily returning surgery counts that span from
December 4, 1981 through January 5, 2011. As with the AAVSO data, the middle panel of Figure
1 shows that these data are truncated on the left at zero counts and are skewed to the right. The histogram
suggests these data follow a Poisson distribution.
Figure 1: AAVSO, Debrecen, and STARA index counts histograms of the pre-scrubbed data. The
green dashed curves are best-fit exponential distributions, and the black solid curves are best-fit
gamma distributions.
2.2.3 STARA Data
The STARA data contain 1,152 daily returning surgery counts that span from May 1, 2010
through July 12, 2013. The right panel of Figure 1 shows that these data are truncated on the
left at zero counts, and are skewed to the right. This suggests these counts data follow a Poisson
distribution.
The Poisson distributions of each of these data sets affect the accuracy of the paired count
magnitude comparisons, as will be seen below.
2.3 Autocorrelation Analysis
A time series is a stochastic process where the index set is of countable time increments; i.e., a
time series is a set of observations, $x_t$, each recorded at a specified time $t$. To allow for the possibly
unpredictable nature of future observations we may suppose that each observation is a realization
of a random variable $X_t$. The time series $\{x_t, t \in T_0\}$ is a realization of the family of random
variables $\{X_t, t \in T\}$, where $T \geq T_0$; i.e., the realization $x_t$ is a subset of all possible values of $X_t$.
The following time series process analyses assess whether the count pairings are index set (time)
aligned. This alignment is necessary before paired counts magnitude comparisons can be made.
The time series autocorrelation analysis is preceded by a descriptive analysis of the data sets.
2.3.1 Descriptive Analysis
The AAVSO and Debrecen series have days with multiple observations which we summarize by the
count median. Further, the time span of each data set must be matched. The common span is
found to be from May 1, 2010 through January 5, 2011. The time series that result from using the
daily median and matched spans are displayed in Figures 2 and 3. Figure 2 depicts the three series
in a stacked, matched-span plot. The AAVSO data are in the upper panel, the Debrecen data are
in the middle panel, and the lower panel has the STARA data. These plots show ambiguously
matched count magnitudes.
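The two preparation steps described above (one median count per day, then restriction to the common date span) can be sketched as follows. This is a minimal illustration only: the function names and the toy observations are hypothetical and are not taken from the report's data or code.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_median(observations):
    """Collapse (date, count) observations to one median count per day."""
    by_day = defaultdict(list)
    for day, count in observations:
        by_day[day].append(count)
    return {day: median(counts) for day, counts in by_day.items()}


def common_span(*series):
    """Return the (start, end) dates shared by every series, or None."""
    start = max(min(s) for s in series)  # latest first day
    end = min(max(s) for s in series)    # earliest last day
    return (start, end) if start <= end else None


def restrict(series, span):
    """Keep only the days that fall inside the span, in date order."""
    start, end = span
    return {d: v for d, v in sorted(series.items()) if start <= d <= end}


# Hypothetical toy observations, not the report's data.
aavso_raw = [(date(2010, 5, 1), 12), (date(2010, 5, 1), 18),
             (date(2010, 5, 1), 15), (date(2010, 5, 2), 20)]
debrecen_raw = [(date(2010, 4, 30), 9), (date(2010, 5, 1), 14),
                (date(2010, 5, 1), 16)]

aavso = daily_median(aavso_raw)
debrecen = daily_median(debrecen_raw)
span = common_span(aavso, debrecen)
aavso_matched = restrict(aavso, span)
```

The median is used here because the text names it as the daily summary; any other summary (mean, maximum) would change the matched series and therefore the later comparisons.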
Figure 2: The AAVSO (top panel), Debrecen (middle panel), and STARA (bottom panel)
matched-span time series plot. The data are daily.
Figure 3 has the three matched-span series superimposed over each other. The AAVSO series
is the solid black curve, the Debrecen series is the dashed red curve, and the STARA series is the
dotted green curve. As with the stacked plot, this plot also shows no obvious coincidence in count
magnitude.
Figure 3: The three matched-span series superimposed. The AAVSO series is the solid black
curve, the Debrecen series is the dashed red curve, and the STARA series is the dotted green curve.
The data are daily.
Fortunately, statistical time series analysis is able to remove much of the apparent ambiguity.
Time series analysis will help determine if the counts are time-aligned. Once this outcome is
available, a magnitude comparison is possible.
2.3.2 Autocorrelation Models
Before the counts time series magnitudes can be compared, the individual time series must be
examined for autocorrelation, as autocorrelation inflates the series variability. A critical property of
any time series is stationarity, which is required to assess the autocorrelations and cross-correlations.
Stationarity is the property of a time series in which, over a specified time span, the mean and
variance of the series are constant. This is the time series analysis equivalent of the mean zero,
constant variance assumption required for such statistical methods as the t-test, analysis of
variance, and regression. If a time series follows a Gaussian distribution, then it can be shown that
the time series is stationary.
We saw above that the three counts time series do not follow a normal distribution, and hence
stationarity may not be assumed. A commonly used transformation to obtain a stationary time
series is differencing. A first difference transformation is
$$\nabla X_t = X_t - X_{t-1} = X_t - B X_t = (1 - B) X_t, \tag{1}$$
where $\nabla X_t$ is the $t$th first difference operation between the $t$th and the $(t-1)$st values of the random
variable $X$, and $B$ is the back shift operator such that $B X_t = X_{t-1}$. The differencing operator
may be extended to second ($\nabla^{(2)}$), third ($\nabla^{(3)}$), etc., differences, as can the back shift operator $B$,
but higher order differencing is not needed for the return surgeries time series. The first difference
transformation results in stationarity for each of the three series.
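The first difference in Equation (1) can be illustrated with a few lines of Python. This is a minimal sketch on made-up counts; the function name and the sample values are assumptions for illustration only.

```python
def difference(series, order=1):
    """Apply the (1 - B) operator `order` times to a sequence of counts."""
    out = list(series)
    for _ in range(order):
        # Each pass replaces X_t with X_t - X_{t-1}, shortening the series by one.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


counts = [3, 5, 4, 8, 8]              # hypothetical daily counts
first = difference(counts)            # [2, -1, 4, 0]
second = difference(counts, order=2)  # [-3, 5, -4]
```

Each application of the operator costs one observation, so a series of n daily counts yields n - 1 first differences.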
With stationarity established, we can examine each series for autocorrelation. The sample
autocorrelation function (ACF) and the sample partial autocorrelation function (PACF), and their
associated plots, are used to identify if and what types of autocorrelation exist. The sample ACF
measures time series white noise autocorrelation as a moving average order. The sample PACF
measures time series autocorrelation as the order of autoregression.
In Figures 4 and 5, the panels on the diagonal depict the first-differenced (lag 1) series sample
ACF and sample PACF respectively. The off-diagonal panels are unadjusted cross-correlations
between paired series, and are here ignored pending further time series analysis. In each figure, the
row one column one panel is the AAVSO series, the second row second column panel is the Debrecen
series, and the third row third column is the STARA series, each after taking first differences. We
are interested in the plot lag values of each panel that extend above or below the horizontal blue
dashed 95% confidence interval (CI) lines. Each series has 211 days of return surgery counts, and
at the 95% CI, this suggests that there are 0.05 × 211 ≈ 11 expected CI marginal overreaches. We
therefore are interested in those lag patterns that strongly extend outside the CI band.
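As a rough illustration of what the diagonal panels plot, the sketch below computes a sample ACF and the usual large-sample 95% band of plus or minus 1.96 divided by the square root of n, which holds under a white noise assumption. The function names and the toy series are hypothetical; the last lines simply reproduce the arithmetic quoted in the text for 211 days.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band for white noise."""
    return z / math.sqrt(n)


acf = sample_acf([1, 2, 3, 4, 5], max_lag=2)  # toy series

n_days = 211
band = white_noise_band(n_days)       # roughly 0.135
expected_overreaches = 0.05 * n_days  # 10.55, quoted as about 11 in the text
```

Lags whose sample autocorrelation falls well outside the band are the ones the text treats as candidates for a moving average or autoregressive term.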
Figure 4 is the sample ACF of the three series. The zeroth lag (t = t) is ignored in each sample
ACF plot. The AAVSO plot suggests a lag 1 (preceding day) moving average model should be
examined. The Debrecen plot indicates that both a lag 1 and a lag 3 moving average model may
be appropriate. The STARA plot, like the AAVSO plot, suggests a lag 1 moving average model
should be tested.
Figure 4: The sample ACFs of the AAVSO, Debrecen, and STARA time series.
Figure 5 is the sample PACF of the three series. In each panel on the diagonal of the plot, there
are no systematic overreaches of the CIs, i.e., the overreaches appear random, which suggests no
autoregressive behavior in these three series.
Figure 5: The sample PACFs of the AAVSO, Debrecen, and STARA time series.
The sample ACF and sample PACF suggest the types of time series models for each surgery
count source. The models take the form of Autoregressive Integrated Moving Average (ARIMA)
models. The models are denoted as ARIMA(p,d,q), where AR refers to the autoregressive component,
I refers to the integrated component which determines the order of differencing to establish
stationarity, MA refers to the moving average component, and p, d, and q are the non-negative
integers indicating the orders of autoregression, integration, and moving averaging, respectively.
The ARIMA analysis of the AAVSO series gives an ARIMA(0, 1, 1) model, the Debrecen model is
ARIMA(0, 1, 3), and the STARA model is ARIMA(1, 1, 3).
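For reference, the ARIMA(p,d,q) notation can be written out with the back shift operator $B$. The form below is the standard textbook one, with the moving average signs chosen to agree with the report's own equations; the coefficient symbols $\phi_i$ and $\theta_j$ and the white noise term $z_t$ are notation assumed here for illustration.

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d X_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, z_t$$

Under this convention the AAVSO model ARIMA(0, 1, 1) reduces to

$$(1 - B) X_t = (1 - \theta_1 B)\, z_t ,$$

that is, the first-differenced counts follow a lag 1 moving average.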
Goodness-of-fit indicators for the ARIMA models are cumulative periodograms of the model
standardized residuals, and time series plots of the standardized residuals. The behavior of the
model residuals is particularly important for the cross-correlation analysis below. Figures ??
and ?? are the diagnostics for the AAVSO ARIMA(0, 1, 1) model. Figure ?? is the cumulative
periodogram. The blue dashed diagonal lines define a 95% CI band that, if the black cumulative
periodogram curve lies within, suggests the model is adequate. Containment of the curve within the
CI band suggests it follows a normal distribution, which is an indicator of model adequacy. Figure
?? has three diagnostic plots. The top panel is the standardized residuals time series plot which
indicates an adequate model when no more than 11 residuals exceed the plus or minus 3 standard
deviation levels. The middle panel is the sample ACF of the residuals which suggests the ARIMA
model is adequate as all the lags lie within the horizontal red dashed 95% CI levels. The bottom
panel is the Ljung-Box p-value plot in which no p-values fall below the threshold line indicating an
adequate model. Hence, the ARIMA(0, 1, 1) may be considered a reasonable model of the AAVSO
series.
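The Ljung-Box p-values in the bottom diagnostic panel come from a portmanteau statistic on the residual autocorrelations. In its standard form, with $n$ the number of residuals, $h$ the number of lags tested, and $\hat{\rho}_k$ the lag-$k$ sample autocorrelation of the residuals (symbols assumed here, not defined in the report),

$$Q(h) = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^{\,2}}{n-k} .$$

Under the hypothesis that the residuals are white noise, $Q(h)$ is approximately chi-square distributed; for residuals from a fitted ARMA(p, q) model the degrees of freedom are usually taken as $h - p - q$. Small p-values indicate remaining autocorrelation and hence an inadequate model.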
Figure 6: AAVSO series ARIMA model diagnostic plots.
Figures ?? and ?? are the diagnostics for the Debrecen ARIMA(0, 1, 3) model. Figure ?? is
the cumulative periodogram which suggests it follows a normal distribution. Figure ?? has the
time-based diagnostic plots. The standardized residuals time series plot has only 2 of the possible
11 values that lie outside ±3 standard deviations. The sample ACF of the residuals suggests the
ARIMA model has all the lags within the 95% CI band. The Ljung-Box p-value plot has no
p-values below the horizontal red threshold line. Hence, the ARIMA(0, 1, 3) may be considered a
reasonable model of the Debrecen series.
Figure ?? and ?? are the diagnostics for the STARA ARIMA(1, 1, 3) model. Figure ?? is the
cumulative periodogram which suggests the periodogram is normally distributed. Figure ?? has
the three time series diagnostic plots. The standardized residuals time series plot has no residuals
outside the plus or minus 3 standard deviation levels. The sample ACF of the residuals has all the
lags within the horizontal red dashed 95% CI levels. The Ljung-Box p-value plot has no p-values
below the horizontal red threshold line. Hence, the ARIMA(1, 1, 3) may be considered a reasonable
model of the STARA series.
The autocorrelation of each of the three return surgery data sets has been identified and
described. The residuals analyses show that the residuals of each time series are reduced to white
noise, and thus the residuals are independent between any series pair. This is an important property
for the series comparisons. We may now make pairwise comparisons of the data sets.
Figure 7: Debrecen series ARIMA model diagnostic plots.
2.4 Cross-Correlation Analysis
The panel of scatter plots of the count sources in Figure 9 shows the paired series associations. The
second row, column one panel shows that the Debrecen versus AAVSO data have a clear nonlinear
relationship with the smaller counts having the greater nonlinearity, and the large counts have the
greater variability. A similar nonlinear relationship exists between the Debrecen and STARA series,
which is depicted in the second row, column three panel. However, the STARA versus AAVSO data
exhibit a more nearly linear relationship, though the variability of the larger counts is greater. This
relationship is shown in the panel in the third row of the first column. Some of these characteristics
have been addressed by constructing ARIMA models for each series, and it is with these models
that the cross-correlations, i.e., the time-based alignment, may be developed.
With autocorrelated data it is difficult to assess the dependence or comparison between any
two time series. It is therefore necessary to disentangle the linear association between any two
series from their respective autocorrelations. Another property that must be satisfied is that the
two series must be stationary and independent of each other. While the data may be stationary,
they must still be transformed to white noise to assure independence. The transformation may be
accomplished by using the residuals from the respective series ARIMA models. We saw from the
ARIMA model diagnostics that the residuals from the series ARIMA models are white noise, thus
implying that the residuals of the ARIMA models are independent. For example, it was shown
that the AAVSO data are adequately modeled by an ARIMA(0, 1, 1) with no intercept term, so,
for $x_t$ representing the AAVSO counts,
$$\bar{x}_t = z_t - \theta z_{t-1} = (1 - \theta B)\, z_t, \tag{2}$$
where $\bar{x}_t$ is the white noise model return surgery count at time $t$, $z_t$ is the white noise value at
time $t$, and $\theta$ is the white noise parameter that is estimated from the ARIMA model analysis.
The ARIMA model residuals $\bar{x}_t$, $t = 0, \pm 1, \pm 2, \cdots$, are white noise and this process is known as
prewhitening.
Figure 8: STARA series ARIMA model diagnostic plots.
We now compare the two series using the cross-correlation function (CCF) by prewhitening one
series with its ARIMA model. The other series then is filtered through this same ARIMA model.
Stationarity is assured by incorporating the first difference in the ARIMA filter. As prewhitening is a
linear operation, any linear relationship between the two series will be preserved after prewhitening.
For example, to compare the AAVSO data with the Debrecen data, first prewhiten the AAVSO
data using its ARIMA model. Then filter the Debrecen data with the AAVSO ARIMA model.
Finally, use the CCF to look for lags between the two series.
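The three steps just listed can be sketched in plain Python for the ARIMA(0, 1, 1) case. This is an illustration under stated assumptions, not the report's code: the value of theta is assumed rather than estimated, the pre-sample innovation is set to zero, and both toy series are simulated (the second is the first delayed by one day) so that the expected CCF peak is known in advance.

```python
import math
import random


def arima011_filter(x, theta):
    """Apply the ARIMA(0,1,1) whitening filter (1 - B) / (1 - theta*B) to x."""
    z_prev = 0.0  # assumption: pre-sample innovation is zero
    out = []
    for prev, curr in zip(x, x[1:]):
        z = (curr - prev) + theta * z_prev  # z_t = (x_t - x_{t-1}) + theta * z_{t-1}
        out.append(z)
        z_prev = z
    return out


def sample_ccf(a, b, max_lag):
    """Sample cross-correlation of a[t + k] with b[t] for k in -max_lag..max_lag."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    scale = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        ts = range(max(0, -k), min(n, n - k))  # keep t and t + k in bounds
        ccf[k] = sum(da[t + k] * db[t] for t in ts) / scale
    return ccf


# Simulated toy data: an ARIMA(0,1,1) series and a copy delayed by one day.
theta = 0.3  # assumed value; in practice it is estimated from the data
rng = random.Random(42)
shocks = [rng.randint(-3, 3) for _ in range(200)]
series_a = [50.0]
e_prev = 0
for e in shocks:
    series_a.append(series_a[-1] + e - theta * e_prev)
    e_prev = e
series_b = [series_a[0]] + series_a[:-1]

prewhitened = arima011_filter(series_a, theta)  # step 1: prewhiten the first series
filtered = arima011_filter(series_b, theta)     # step 2: same filter on the second
ccf = sample_ccf(filtered, prewhitened, max_lag=5)  # step 3: look for lags
peak_lag = max(ccf, key=lambda k: abs(ccf[k]))
```

In this toy setup the peak falls at lag 1 because the second series trails the first by one day; a peak at lag 0 is what the text interprets as time alignment.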
Often a regression model is used to measure the relationship of one counts series to another. The
fallacy of this method arises from the violation of two assumptions of regression: (i) the response
must follow a normal distribution, and (ii) the two series must be independent. The first assumption
was shown above to be violated as the counts follow a Poisson distribution. The second assumption
is violated as demonstrated by the autocorrelation identified in the ARIMA model analyses, which
is an indication of non-independence.
Figure 9: Scatter plots of the return surgeries count sources show the paired series associations.
Figure 10 is the sample CCF between the ARIMA(0, 1, 1) filtered Debrecen counts and the
ARIMA(0, 1, 1) prewhitened AAVSO counts. It is clear from the plot that the only significant
cross-correlation is at lag zero, which suggests that the two series are nearly aligned in time.
Figure 11 is the sample CCF between the ARIMA(0, 1, 1) filtered STARA counts and the
ARIMA(0, 1, 1) prewhitened AAVSO counts. The plot shows that the AAVSO series and the
STARA series are balanced at lag 0. This balance
suggests that the two series are aligned in time.
Figure 12 is the sample CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and the
ARIMA(1, 1, 3) prewhitened STARA counts. The Debrecen series and the STARA series are
balanced at lag 0. This balance suggests that the two series are aligned in time.
The cross-correlation analysis gives the pairwise time alignments to compare the magnitude of
the counts for each series. The cross-correlation between the AAVSO and Debrecen series has zero
lag and hence they are aligned. The same result holds for the cross-correlation between the AAVSO
and STARA data, i.e., they are aligned. Similarly, the cross-correlation between the STARA and
Debrecen data shows they are aligned.
2.5 Magnitude Comparison
With the appropriate shifts for each return surgery counts series if needed, the counts magnitude
comparison is tested with the Wilcoxon signed ranks test. This test is used over the t-test as the
counts data do not follow a normal distribution, which is an assumption required for the t-test. The
$n^*$ time-ordered data pairs $(x_{1,1}, x_{2,1}), (x_{1,2}, x_{2,2}), \cdots, (x_{1,n^*}, x_{2,n^*})$ are formed, and the absolute values of
the differences are found, where
$$D_i = x_{1,i} - x_{2,i}, \quad i = 1, \ldots, n^*. \tag{3}$$
Simplistically, all differences with the value 0 are eliminated so the number of remaining differences is $n \leq n^*$.
The $n$ absolute differences $|D_i|$ are ordered from lowest to highest, and then are ranked 1 to $n$. The $i$th rank
$R_i$ is designated as a positive rank if $D_i > 0$, or as a negative rank if $D_i < 0$. The
test statistic is the sum of the positive signed ranks:
$$T^* = \sum R_i, \quad \forall R_i \ni D_i > 0, \quad i = 1, \ldots, n. \tag{4}$$
The test statistic $T^*$ is compared to the quantiles of a distribution whose shape varies depending
on conditions.
Figure 10: The sample CCF between the ARIMA(0, 1, 1) filtered Debrecen counts and the
ARIMA(0, 1, 1) prewhitened AAVSO count residuals.
Figure 11: The sample CCF between the ARIMA(0, 1, 1) filtered STARA counts and the
ARIMA(0, 1, 1) prewhitened AAVSO count residuals.
Figure 12: The sample CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and the
ARIMA(1, 1, 3) prewhitened STARA count residuals.
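The construction of the signed rank statistic described above can be sketched as follows. The function name and the paired counts are hypothetical. One detail is an assumption not stated in the text: tied absolute differences are given the average of the ranks they span, which is the common convention.

```python
def signed_rank_statistic(x1, x2):
    """Return (T, n): the sum of positive signed ranks and the number of nonzero differences."""
    # Paired differences D_i, with zero differences dropped.
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    n = len(diffs)
    # Indices of the differences ordered by absolute value.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1 (assumed tie rule)
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return t_plus, n


# Hypothetical time-aligned daily counts from two sources.
source_1 = [10, 12, 9, 15, 7, 11]
source_2 = [8, 12, 10, 11, 7, 14]
t_star, n_used = signed_rank_statistic(source_1, source_2)
```

For these toy pairs the nonzero differences are 2, -1, 4, and -3, so the positive ranks are 2 and 4. A statistic near n(n + 1)/4, half the total rank sum, is what would be expected if neither source tended to report larger counts.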
Table 2 lists the surgery counts time series pairs and their respective Wilcoxon signed rank
test statistics. The last column in the table indicates if the count magnitudes may be considered
statistically equal. Only the STARA and Debrecen time series have statistically identical daily
counts.
Table 2: Wilcoxon rank sum test with continuity correction counts magnitude comparison.
X        Y          n     W         P(>W)       X = Y
AAVSO    Debrecen   211   35368.5   < 2.2e-16   no
AAVSO    STARA      211   34903     < 2.2e-16   no
STARA    Debrecen   210   22286.5   0.8468      yes
2.6 Conclusions
Three time series of daily returning surgeries counts were compared for interchangeability; i.e.,
does one return surgery time series have the same daily counts as some other particular time series
within specified statistical error? Each series had peculiarities, e.g., networks of multiple observers
or counting methodology, for which some adjustments were made in the time series and magnitude
analyses.
The Debrecen time series was compared to the STARA time series, and with the AAVSO time
series. Also, the STARA and AAVSO series were compared. These daily time series were shown to
be autocorrelated which was accounted for before the series were compared.
Each time series was made stationary by taking the first difference. The autocorrelation function
and the partial autocorrelation function were used to identify the order and type of autocorrelation
for each of the series. The analysis of the AAVSO series gave the ARIMA(0, 1, 1) model,
the Debrecen series analysis gave the ARIMA(0, 1, 3) model, and the STARA analysis gave the
ARIMA(1, 1, 3) model.
The cross-correlation function (CCF) between the ARIMA(0, 1, 1) filtered Debrecen counts
and the ARIMA(0, 1, 1) prewhitened AAVSO counts showed the count changes occurred on the
same days. It was clear from the plot that there was no lagging, which suggested that the two
series were time-aligned. The CCF between the ARIMA(0, 1, 1) filtered STARA counts and the
ARIMA(0, 1, 1) prewhitened AAVSO counts showed that the count series were time-aligned. The
CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and the ARIMA(1, 1, 3) prewhitened
STARA counts suggested that the Debrecen series and the STARA data are time aligned.
After the appropriate series shifts were made, the magnitude of the series counts was compared.
Table 2 gives the details of the counts magnitude comparisons, and the table shows that only the
STARA and Debrecen series are interchangeable.
We showed that returning surgeries time series counts comparisons are best made after a statistical
time series analysis is performed. We also showed that, as the counts do not follow a normal
distribution, the appropriate magnitude comparison statistical method is the Wilcoxon signed ranks
test provided the series pairings first are time-aligned. The results showed that only the Debrecen
series and the STARA series are interchangeable.
3
Set up the three-dimensional (3D) VAR(2) where the third variable does not Granger-cause the
first variable. The Bonus.R script may help.
4 Bonus, “Best Model”, 5 points
Give criteria for aiding in the choice of a “best” time series model when two or more such models
are available. What is, arguably, the most important criterion?
- Time Series Model Construction, 20 points
- Bonus, “Best Model”, 5 points
Fossil Fuels Company Stocks
Blackhole Detection from Suspected Gravity Lensing
Return Surgeries, 15 points
Introduction
Data Sets
AAVSO Data
Debrecen Data
STARA Data
Autocorrelation Analysis
Descriptive Analysis
Autocorrelation Models
Cross-Correlation Analysis
Magnitude Comparison
Conclusions
Bonus, 3D VAR(2) Model, 5 points