Nonlinear Regression — part 2

September ice in the Arctic (Arctic9.dta) provides a real-world example. Previously we saw that sea ice area declined over 1979-2011, the period of close satellite observation. Diagnostic graphs reveal that a linear model fits poorly, however, because the decline has been faster than linear (Figures 7.9, 7.11). An alternative quadratic model better describes the trend in observations through 2011 (Figure 7.12). If the quadratic model is projected a few years ahead, however, its path becomes physically impossible — crashing rapidly to zero and continuing down to negative area. Physical-process models tend to project a more gradual decline, flattening out as area approaches zero (e.g., Wang and Overland 2009). A simple analogue for such physical models might be an asymmetrical S-curve such as the Gompertz, rather than the precipitous fall of a quadratic.

The following commands fit a 3-parameter Gompertz model for Arctic sea ice. For minor variety we focus on extent (at least 15% ice) rather than the area (100% ice) used in earlier examples. The two variables have similar behavior, as seen earlier in Figure 3.14.

. use C:\data\Arctlc9.dta, clear

. nl gom3: extent year, nolog

The Gompertz model fits well, and all three parameters are significant. The first parameter, b1 = 7.58, gives the model’s asymptotic starting point, 7.58 million km2. The third parameter, b3 = 2.17.5, gives the inflection point where this curve shifts from concave upwards (steepening rate of decline) to convex upwards (slowing rate of decline): during the year 2017. The second parameter, b2 = -.0996, controls change in the rate of decline. To visualize this model, Figure 8.11 graphs the predicted values connected by a median spline (smooth curve).

The Gompertz curve in Figure 8.11 does not look much different from a quadratic curve (not shown), and fits only slightly better. It does lack the quadratic’s unrealistic slight increase in the earliest years. Substantial differences appear, however, if we extrapolate this Gompertz curve beyond the range of the data.

As a purely speculative, if-then exercise, we could extend our analysis to the year 2030. The actual data end with 2011, so we start by adding 19 more observations containing no ice data, but only new year values from 2012 to 2030. The first command sets the number of observations at 52 (was 33). The second calculates year values equal to the previous year plus one, for each observation that does not yet have a year. Finally we re-estimate the Gompertz model with the new years, obtaining exactly the same result as before.

Although the ice data and the model are unchanged, our extended dataset contains additional years. This makes it possible to obtain predicted values for all years from 1979 to 2030.

. predict gomext2

Unlike predicted values, residuals are defined only for the years with data on extent.

The following commands use the predicted extent values, gomext2, together with the standard deviation of the residuals, r(sd) (results defined by the summarize command above) to define lower and upper confidence limits of ±2sd around the predicted values. We then draw a more elaborate graph showing the Gompertz curve extrapolated out to 2030, with emphasis on its prediction for the year 2012.

. gen gomlo = gomext2 -2*r(sd)

. gen gomhi = gomext2 +2*r(sd)

. label variable gomext2 “nl gom3: extent year”

. label variable gomlo “Gompertz extent – 2sd”

. label variable gomhi “Gompertz extent + 2sd”

. graph twoway rarea gomlo gomhi year if year < 2012, color(gs13)

|| mspline gomext2 year if year < 2012, bands(60) lwidth(thick) lcolor(maroon)

|| mspline gomext2 year if year >= 2012, bands(60) lwidth(medthick) lcolor(maroon) lpattern(dash)

|| connect extent year, lwidth(medthick) msymbol(O) lcolor(navy) mcolor(navy)

|| scatter gomext2 year if year == 2012, msymbol(S) mcolor(maroon)

|| if year>1978, xlabel(1980(5)2030, grid)

yline(0, lcolor(black)) yline(1, lcolor(gs11) lwidth(thick))

xtitle(“”) legend(order(4 2 1) label(2 “Gompertz curve”)

label(4 “NSIDC extent”) label(1 “=char(177)’ 2sd”) position(7) ring(0) col(1))

text(4.5 2019 “2012 Gompertz” “prediction”

“4.3    (3.4’=char(150)’5.1)”, color(maroon))

The light gray (gs(13))confidence bands in Figure 8.12 are drawn first, as a twoway rarea (range area) plot for the years before 2012. Next a thick maroon median spline curve (mspline) tracing the Gompertz predicted values before 2012 is overlaid on top of the gray confidence bands. A medium-thick dashed curve traces predicted values for years 2012 and later. Observed extent values are overlaid next as a connected-line plot (connect). The fifth and final overlay consists of a scatterplot with only one point, the predicted value for 2012. Text specifies the numerical value and confidence limits for this prediction. Text color is maroon to match the predicted-values curve it describes. A horizontal gray line at 1 million km2 (yline(1)) marks a low September level that could be considered virtually ice free.

It must be emphasized that extrapolating curves in this manner is not a reliable way to predict the future. This particular model draws on no physical understanding, such as what might be causing the ice to decline. It depends entirely on our choice of statistical model, and how that fits past data. As a statistical exercise, however, we can take these naive predictions one more step. The curve in Figure 8.12 drops below 1 million km2 line in 2025, offering a point prediction for a virtually ice-free Arctic Ocean. But even if the Gompertz model proved to be true in some sense, we should expect actual behavior to vary around this curve as it has in the past.

Suppose that variation around the curve in future years followed a normal distribution with the same standard deviation observed around the curve in the past. We could simulate such behavior by adding noise to the smooth curve, drawing random values from a normal distribution with standard deviation equal to that of past residuals. For example, the following commands create a new set of predicted values (gomext3) equal to the old predictions (gomext2) plus random normal noise with the residuals’ standard deviation (r(sd) after summarize res). If the new prediction suggests a physically impossible negative extent, then it is just set to 0.

. quietly summ res

. gen gomext3 = gomext2 + r(sd)*rnormal()

. replace gomext3 = 0 if gomext3 < 0

. replace gomext3 = extent if year == 2011

The following commands would graph the results, in a manner similar to Figure 8.11.

. graph twoway rarea gomlo gomhi year if year < 2012, color(gs13)

|| mspline gomext2 year if year < 2012, bands(60) lwidth(thick) lcolor(maroon)

|| mspline gomext2 year if year >= 2012, bands(60)

lwidth(medthick) lcolor(maroon) lpattern(dash)

|| connect extent year, lwidth(medthick) msymbol(O) lcolor(navy) mcolor(navy)

|| line gomext3 year if year >= 2011, lwidth(medthick) lcolor(midblue)

|| , xlabel(1980(10)2030, grid)

yline(0, lcolor(black)) yline(1, lcolor(gs11) lwidth(thick))

ylabel(0(1)8) ytitle(“”) xtitle(“”) legend(order(4 2 1 5) label(2 “Gompertz curve”)

label(4 “NSIDC extent”) label(1 “=char(177)’ 2sd”)

label(5 “simulated”) position(7) ring(0) col(1) rowgap(*.3))

Each time we run this set of commands we will get different random noise values and hence a different-looking graph. Figure 8.13 combines four such graphs into one image, which helps to emphasize the unpredictable year-to-year variations. In any given year ice extent might go up or down, while the curve just depicts average behavior that no realization is expected to match.

Sometimes graphs published on the Internet take on a life of their own and travel far beyond the author’s knowledge or intent. For such purposes it can be worthwhile to embed your name, data source or other identifying information within the graphic itself, as done in Figure 8.13 using note and caption options. The following command draws this composite image by combining four previously-saved graphs, which had been named Gompertzextentl and so forth.

. graph combine Gom.pertz_extent1.gph Gompertz_extent2.gph

Gompertz_extent3.gph Gompertz_extent4.gph ,

title(“Gompertz extent simulation: some possible future paths”, size(medlarge))

caption(“data through 2011: NSIDC”)

note(“graph:   L Hamilton 6/21/2012”) imargin(small) col(2)

l1title(“September sea ice extent, million km’=char(178)'”)

Source: Hamilton Lawrence C. (2012), Statistics with STATA: Version 12, Cengage Learning; 8th edition.

Leave a Reply

Your email address will not be published. Required fields are marked *