Reshaping Data in Stata

A different sort of restructuring is possible through the reshape command. This command switches datasets between two basic configurations, termed wide and long. Earlier in this chapter we built a dataset with the Multivariate ENSO Index (MEI0.dta). The data are in wide format: years define the rows, but each month is a separate column. Thus, mei1 represents the MEI value for January, mei2 is February, and so on.

. use C:\data\MEI0.dta, clear

. describe

 

We can reshape these wide-format data into a time series in long format. The reshape long command names a new variable to be created, mei, from the stub shared by mei1 through mei12. Each row of the new long-format dataset will have an observation identifier, i(year), and a sub-observation identifier, j(month).
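The reshape long step described above can be tried without MEI0.dta. The sketch below builds a small hypothetical wide dataset with the same layout (year plus mei1 through mei12) and applies the reshape. The values are made up and simply encode year and month (for example 195003 for March 1950), so it is easy to confirm that each value lands in the right row.

```stata
* Hypothetical stand-in for MEI0.dta: 3 years in wide format,
* one row per year and one column per month (mei1-mei12).
clear
set obs 3
generate year = 1949 + _n                 // 1950, 1951, 1952
forvalues m = 1/12 {
    generate mei`m' = 100*year + `m'      // made-up values encoding year/month
}

* Wide -> long: "mei" is the stub; its numeric suffix becomes month.
reshape long mei, i(year) j(month)

* Result: 3 years x 12 months = 36 rows, unique on year and month.
sort year month
isid year month
list year month mei in 1/4
```

Applied to the real data, the same reshape long line would follow the use and describe commands above; saving the result under the name mei1.dta, which the merge discussion refers to, is an assumption about the file layout.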

Now we have the Multivariate ENSO Index time series in year/month form, similar to the year/month time series of global surface temperatures (global2.dta) we built earlier. With both datasets sorted by year and month, we can merge the two into a common file.

The temperature data in global2.dta cover each month from January 1880 through December 2011, whereas mei1.dta covers only January 1950 through December 2011. Consequently, the 70 × 12 = 840 months from 1880 through 1949 exist only in the master data and are not matched (_merge == 1); the remaining 62 × 12 = 744 months exist in both datasets and are matched one to one (_merge == 3).
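To see where the 840 unmatched and 744 matched months come from, the sketch below merges two hypothetical datasets with the same year/month coverage as global2.dta (1880-2011) and mei1.dta (1950-2011). Only the coverage is mimicked; the temp and mei values are placeholders. With the real files, the corresponding commands would presumably be use C:\data\global2.dta, clear followed by merge 1:1 year month using C:\data\mei1.dta (paths assumed to follow the C:\data convention used above).

```stata
* Hypothetical "using" data: one row per month, 1950-2011 (like mei1.dta).
clear
set obs 62
generate year = 1949 + _n                 // 1950-2011
expand 12                                 // 12 copies of each year
bysort year: generate month = _n          // months 1-12 within each year
generate mei = 0                          // placeholder values
tempfile meitoy
save `meitoy'

* Hypothetical "master" data: one row per month, 1880-2011 (like global2.dta).
clear
set obs 132
generate year = 1879 + _n                 // 1880-2011
expand 12
bysort year: generate month = _n
generate temp = 0                         // placeholder values

* One-to-one match on year and month; _merge records the outcome:
* 1 = master only, 2 = using only, 3 = matched.
merge 1:1 year month using `meitoy'
tabulate _merge
count if _merge == 1                      // 1880-1949: 70 x 12 months
count if _merge == 3                      // 1950-2011: 62 x 12 months
```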

After saving the new merged data as global3.dta, we can draw a time plot of both temp and mei over the years 1950-2011. Because these two variables have different scales, mei is assigned to the right-hand axis, denoted yaxis(2). The graph command overlays two line plots, one for temp and one for mei, the latter drawn with a dashed line. It also specifies a legend with two rows instead of the default, which here would be two columns. A first look at the graph suggests that global temperature and the ENSO index often vary together from year to year, but ENSO lacks the decadal upward trend of temperature. Chapter 12 applies time series modeling for a more rigorous analysis of this point.
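The graph command described above does not appear in this excerpt, so the following is one possible way to write it, not necessarily the author's exact command. Because global3.dta is not available here, the block first generates a hypothetical stand-in with random placeholder values. It assumes a monthly date variable named edate; the name comes from the drop edate command later in this section, but its exact coding in global3.dta is an assumption. With the real file, replace the data-generation lines with use C:\data\global3.dta, clear; the if year >= 1950 qualifiers then restrict the plot to the years covered by both series.

```stata
* Hypothetical stand-in for global3.dta, 1950-2011: monthly rows with an
* assumed monthly date (edate), temp, and mei. Values are random placeholders.
clear
set obs 744
generate edate = ym(1950, 1) + _n - 1     // monthly date, Jan 1950 onward
format edate %tm
generate year  = year(dofm(edate))
generate month = month(dofm(edate))
set seed 2012
generate temp = 0.01*(year - 1950) + rnormal(0, 0.1)   // made-up trend + noise
generate mei  = rnormal(0, 1)                          // made-up index

* Overlay two line plots: temp on the left axis, mei on the right-hand
* axis, yaxis(2), drawn dashed; the legend is stacked in two rows.
graph twoway (line temp edate if year >= 1950) ///
    (line mei edate if year >= 1950, yaxis(2) lpattern(dash)), ///
    legend(rows(2))
```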

. sort year month

. drop _merge

. compress

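reshape works equally well in reverse, switching data from long to wide format. We could convert the year/month time series of temperature and MEI into a wide dataset in which each row is a year and each column a variable/month combination (temp1 through temp12, mei1 through mei12) with the following commands (output not shown). The date variable edate must be dropped first, because its value changes from month to month within each year and it is not among the variables being reshaped.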

. drop edate

. reshape wide mei temp, i(year) j(month)
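The long-to-wide conversion above can be checked on a small hypothetical long dataset with the same variables (year, month, an assumed monthly edate, temp, and mei). The placeholder values make the result easy to verify: temp equals the month number and mei its negative.

```stata
* Hypothetical long-format stand-in for global3.dta: 2 years x 12 months.
clear
set obs 24
generate edate = ym(2010, 1) + _n - 1     // Jan 2010 - Dec 2011
format edate %tm
generate year  = year(dofm(edate))
generate month = month(dofm(edate))
generate temp  = month                    // placeholder values
generate mei   = -month                   // placeholder values

* edate differs from month to month within a year and is not a stub,
* so it is dropped before going wide.
drop edate

* Long -> wide: one row per year; columns mei1-mei12 and temp1-temp12.
reshape wide mei temp, i(year) j(month)
describe, short
```

Running reshape long mei temp, i(year) j(month) afterward would restore the year/month layout, apart from the dropped edate.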

Source: Hamilton, Lawrence C. (2012). Statistics with Stata: Version 12. Cengage Learning, 8th edition.
