Creating New Categorical and Ordinal Variables in Stata

A previous section illustrated how to construct a categorical variable called type to distinguish among territories, provinces and nation in our Canadian dataset. You can create categorical or ordinal variables in many other ways. This section gives a few examples.

Suppose we want to re-express type as a set of dichotomies or dummy variables, each coded 0 or 1. tabulate will create dummy variables automatically if we add the generate option. In the following example, this results in a set of variables called type1, type2 and type3, each representing one of the three categories of type:
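A minimal sketch of the tabulate call described above. Canada2.dta is not reproduced here, so the few values of type entered below are invented purely for illustration; only the tabulate command with its generate option reflects the text.

```stata
* Toy data: these values are made up for illustration and are
* NOT taken from the Canadian dataset.
clear
input byte type
1
1
2
2
2
3
end

* Create one 0/1 dummy per category of type, using "type" as the
* stub for the new names: type1, type2 and type3.
tabulate type, generate(type)

* Each observation has exactly one dummy equal to 1.
list type type1 type2 type3
```

Because the stub passed to generate() is type, Stata appends 1, 2, 3 to it in the order of the sorted category values.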

Re-expressing categorical information as a set of dummy variables involves no loss of information; in this example, type1 through type3 together tell us exactly as much as type itself does. Occasionally, however, analysts choose to re-express a measurement variable in categorical or ordinal form, even though this does result in a substantial loss of information. For example, unemp in Canada2.dta gives a measure of the unemployment rate. Excluding Canada itself from the data, we see that unemp ranges from 7% to 19.6%, with a mean of 12.26%:

. summarize unemp if type != 3

Two commands create a dummy variable named unemp2 with values of 0 when unemployment is below average (12.26), 1 when unemployment is equal to or above average, and missing when unemp is missing. In reading the second command, recall that Stata’s sorting and relational operators treat missing values as very large numbers. Without the !missing(unemp) condition, observations with missing unemp would therefore satisfy unemp >= 12.26 and be coded 1 instead of remaining missing.

. generate unemp2 = 0 if unemp < 12.26

(7 missing values generated)

. replace unemp2 = 1 if unemp >= 12.26 & !missing(unemp)

(5 real changes made)
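The same dummy can also be built in a single command, because a relational expression in Stata evaluates to 1 when true and 0 when false. The sketch below compares the two approaches on a few invented unemp values (not the Canadian data); the name unemp2b is a hypothetical label used only for this comparison.

```stata
* Toy data: invented values, including one missing, for illustration only.
clear
input unemp
7
10.5
13.1
19.6
.
end

* Two-step version, as in the text.
generate unemp2 = 0 if unemp < 12.26
replace unemp2 = 1 if unemp >= 12.26 & !missing(unemp)

* One-step version: the expression is 1 if true and 0 if false, and the
* if qualifier leaves unemp2b missing wherever unemp is missing.
generate unemp2b = (unemp >= 12.26) if !missing(unemp)

* The two versions agree, including on the missing observation.
assert unemp2 == unemp2b
list unemp unemp2 unemp2b
```

The one-step form is shorter, but the if !missing(unemp) qualifier is still needed for the same reason as in the two-step version.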

We might want to group the values of a measurement variable, thereby creating an ordered-category or ordinal variable. The autocode function (see Using Functions) provides automatic grouping of measurement variables. To create a new ordinal variable unemp3, which groups values of unemp into three equal-width groups over the interval from 5 to 20, type

. generate unemp3 = autocode(unemp,3,5,20)

(2 missing values generated)
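To make the grouping concrete: autocode(unemp,3,5,20) splits the interval from 5 to 20 into three groups of width 5 and assigns each observation the upper bound of its group (10, 15 or 20). The sketch below applies it to a few invented values chosen to fall inside each group, plus one missing value; these are not the Canadian data.

```stata
* Toy data: invented values, one per group plus a missing value.
clear
input unemp
7
12.5
19.6
.
end

* Three equal-width groups over 5 to 20; each value is coded with the
* upper bound of the group that contains it.
generate unemp3 = autocode(unemp, 3, 5, 20)

* 7 falls in the first group, 12.5 in the second, 19.6 in the third;
* the missing value stays missing.
list unemp unemp3
```

The codes 10, 15 and 20 are ordered, so unemp3 can be treated as an ordinal variable, although the original variation within each group is lost.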

A list of the data shows how the new dummy (unemp2) and ordinal (unemp3) variables correspond to values of the original measurement variable unemp.

Source: Hamilton, Lawrence C. (2012), Statistics with STATA: Version 12, Cengage Learning; 8th edition.
