Long after a dataset has been created, we might discover that for some purposes it has the wrong organization. Fortunately, several commands facilitate drastic restructuring of datasets. The simplest of these, collapse, aggregates data into means, medians or other statistics for groups defined by one or more variables. For illustration, we return to the data on monthly global temperatures from January 1880 to December 2011 (global2.dta), graphed earlier in Figure 2.1.
With collapse, we could build a simplified dataset containing mean temperature anomalies for 132 years instead of 1,584 separate months.
. use C:\data\global2.dta, clear
. collapse (mean) temp, by(year)
. label variable temp "NCDC annual mean temp anomaly, deg C"
. save C:\data\global_yearly.dta, replace
. describe
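collapse accepts several statistics in one call, so the same yearly aggregation could keep more than the mean. The sketch below makes three assumptions: global2.dta sits in C:\data (the folder used by save above), it contains the monthly variables year and temp, and the new names meantemp, medtemp, mintemp, maxtemp and nmonths are hypothetical choices for illustration.

```stata
// Sketch: several yearly statistics from the monthly data in one collapse.
// Assumption: C:\data\global2.dta holds monthly variables year and temp.
use C:\data\global2.dta, clear

// newvar=varname gives each statistic its own (hypothetical) variable name,
// so the mean, median, extremes and count of temp can coexist
collapse (mean) meantemp=temp (median) medtemp=temp ///
         (min) mintemp=temp (max) maxtemp=temp        ///
         (count) nmonths=temp, by(year)

// One observation per year should remain
describe
list in 1/5
```

Because (count) counts nonmissing values, nmonths makes it easy to spot any year with incomplete monthly data.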
Our new annual dataset might be visualized with a spike plot, in which vertical spikes show how far each year’s temperature anomaly lies above or below the 1901-2000 mean.
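A spike plot of this kind could be drawn with twoway spike. This is one possible command, not necessarily the one behind the book's figure. The axis titles are illustrative, and the sketch assumes the yearly file saved above.

```stata
// Load the yearly anomalies saved by the collapse example
use C:\data\global_yearly.dta, clear

// Spikes rise from base(0); since temp is an anomaly from the 1901-2000
// mean, each spike shows the distance above or below that mean
graph twoway spike temp year, base(0) ///
    ytitle("Temperature anomaly, deg C") xtitle("Year")
```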
A wider range of statistics can be collected using the flexible statsby command, which works as a prefix for other analyses. For example, we can return to global2.dta, generate a new variable called decade (1880 for years 1880-1889, 1890 for 1890-1899, and so forth), and then create a new dataset consisting of the statistics that summarize reports for temperature, by decade.
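The steps just described might look like this. The source does not show its decade command, so the formula 10*floor(year/10) is an assumption and only one way to compute it. With no expression list, statsby should collect every scalar that summarize leaves in r(), including N, mean, Var and max. The sketch assumes the same C:\data\global2.dta file with variables year and temp.

```stata
// Assumption: C:\data\global2.dta holds monthly variables year and temp
use C:\data\global2.dta, clear

// decade = first year of the decade: 1880 for 1880-1889, 1890 for 1890-1899, ...
generate decade = 10*floor(year/10)

// Run summarize once per decade and replace the data with the results.
// clear permits replacing the data in memory, which generate has modified.
statsby, by(decade) clear: summarize temp

// One observation per decade, one variable per r() scalar
describe
list decade N mean max
```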
The new dataset contains the number of observations, mean, variance, maximum and other summarize statistics for each decade. Figure 2.3 graphs the maximum monthly temperature anomaly (max) for each decade, setting aside the “2010” decade, which has only two years of data.
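Assuming the decade-level dataset created by statsby is in memory, a graph in the spirit of Figure 2.3 might be produced as follows. The connected-line style and axis titles are assumptions, because the source does not show the graph command.

```stata
// Assumes the statsby results (variables decade, mean and max) are in memory.
// if decade < 2010 sets aside the incomplete decade (2010-2011 only).
graph twoway connected max decade if decade < 2010, ///
    ytitle("Maximum monthly anomaly, deg C") xtitle("Decade")
```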
statsby can also make datasets of results from regression models or other analyses. Type help statsby or consult the Data Management Reference Manual for more information and examples. Selecting
Statistics > Other > Collect statistics for a command across a by list
from the menus brings up the dialog box for this command. Another useful aggregation command, contract, creates a dataset that resembles a frequency table: one observation for each combination of the specified variables, plus a variable holding its frequency (see help contract).
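The sketch below illustrates both extensions, under the same assumptions about C:\data\global2.dta. The first part uses statsby with an e-class command, regress, where the _b expression collects the fitted coefficients for each decade. The within-decade linear trend of temp on year is chosen only for illustration. The second part uses contract to count the months of data in each decade.

```stata
// Part 1: a dataset of regression coefficients, one row per decade
use C:\data\global2.dta, clear
generate decade = 10*floor(year/10)

// _b collects every coefficient from regress (slope on year and constant)
statsby _b, by(decade) clear: regress temp year
list

// Part 2: a frequency-table dataset built with contract.
// Reload, because statsby replaced the monthly data in memory.
use C:\data\global2.dta, clear
generate decade = 10*floor(year/10)

// contract keeps one observation per observed value of decade,
// plus a frequency variable _freq (here, months of data per decade)
contract decade
list
```

With 1,584 monthly observations from January 1880 to December 2011, _freq should equal 120 for each complete decade and 24 for 2010.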
Source: Hamilton, Lawrence C. (2012). Statistics with STATA: Version 12, 8th edition. Cengage Learning.