Using Explicit Subscripts with Variables in Stata

When Stata has data in memory, it also defines certain system variables that describe those data. For example, _N represents the total number of observations. _n represents the observation number: _n = 1 for the first observation, _n = 2 for the second, and so on to the last observation (_n = _N). If we issue a command such as the following, it creates a new variable, caseID, equal to the number of each observation as presently sorted:

. generate caseID = _n

Sorting the data another way will change each observation’s value of _n, but its caseID value will remain unchanged. Thus, if we do sort the data another way, we can later return to the earlier order by typing

. sort caseID

Creating and saving unique case identification numbers that store the order of observations at an early stage of dataset development can facilitate later data management.
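
As a minimal sketch of this round trip, the commands below use the auto dataset that ships with Stata rather than the temperature data. The dataset and the choice of price as the alternative sort key are assumptions made only for illustration; the identifier variable is called caseID here.

. * illustration only: auto.dta and the sort key price are stand-ins, not from the text

. sysuse auto, clear

. generate caseID = _n

. sort price

. sort caseID

After the final sort, caseID equals _n again in every observation, so the original order is restored.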

We can use explicit subscripts with variable names to specify particular observation numbers. For example, the fourth observation in our global temperature dataset global2.dta is April 1880, with a temperature anomaly (temp) of -.0912 °C.

. display temp[4]

-.0912

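Similarly, temp[5] is the temperature anomaly for May 1880, -.151 °C. The value displays as -.15099999, which is what we would expect if temp is stored with single (float) precision: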

. display temp[5]

-.15099999
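
Subscripts can also be expressions built from the system variables. The following sketch uses a small made-up variable y (an assumption for illustration, not part of global2.dta) to show that y[_N] picks out the last observation, and that a subscript outside the range 1 to _N yields a missing value, which Stata displays as a period.

. * illustration only: y is a hypothetical variable holding 10, 20, 30, 40, 50

. clear

. set obs 5

. generate y = 10*_n

. display y[_N]

50

. display y[_N+1]

.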

Explicit subscripting and the _n system variable have particular relevance when our data form a series. In this temperature example, either temp or, equivalently, temp[_n] denotes the value of the _nth observation. temp[_n-1] denotes the previous temperature, and temp[_n+1] denotes the next. Thus, we might define a new variable diftemp, which is equal to the change in temp since the previous month:

. generate diftemp = temp - temp[_n-1]
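
One detail worth noting: the first observation has no predecessor, so temp[_n-1] is missing there and diftemp is missing as well. A minimal sketch with a made-up series x (hypothetical, used only so the arithmetic is easy to check) shows the pattern:

. * illustration only: x is a hypothetical series 1, 4, 9, 16

. clear

. set obs 4

. generate x = _n^2

. generate difx = x - x[_n-1]

. list x difx

Here difx is missing in the first observation and equals 3, 5, and 7 in observations 2 through 4. generate also reports how many missing values it created.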

Chapter 12 on time series analysis returns to this topic.

Source: Hamilton, Lawrence C. (2012), Statistics with Stata: Version 12, 8th edition, Cengage Learning.
