Converting Between Numeric and String Formats in Stata

Dataset Canada2.dta contains one string variable, place. It also has a labeled categorical variable, type. Both seem to have nonnumeric values.

Beneath the labels, however, type remains a numeric variable, indicated by a blue font in the Data Editor or Browser. Clicking on a cell shows the underlying number, or we can list the values with the nolabel option.

String and labeled numeric variables behave differently when analyzed. Most statistical operations and algebraic relations are not defined for string variables, so we might want to have both string and labeled-numeric versions of the same information in our data. The encode command generates a labeled-numeric variable from a string variable. The number 1 is given to the alphabetically first value of the string variable, 2 to the second, and so on. The following example creates a labeled numeric variable named placenum from the string variable place:

. encode place, gen(placenum)
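As a minimal sketch of the alphabetical numbering, assume a small made-up dataset. The three place names below are illustrative and are not taken from Canada2.dta. The commands are written without the dot prompt so they can be pasted into a do-file:

```stata
* Hypothetical toy data, not the contents of Canada2.dta
clear
input str20 place
"Yukon"
"Alberta"
"Quebec"
end

* encode numbers the distinct strings in alphabetical order
* and attaches them as value labels to the new variable
encode place, generate(placenum)

* With labels shown, placenum looks like place
list place placenum

* With nolabel, the underlying integers appear
list place placenum, nolabel

* Alberta sorts first, so it should receive code 1
assert placenum == 1 if place == "Alberta"
```

Here Alberta, Quebec, and Yukon should be coded 1, 2, and 3, regardless of the order in which they were entered.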

An opposite conversion is possible, too: The decode command generates a string variable using the values of a labeled numeric variable. Here we create string variable typestr from numeric variable type:

. decode type, gen(typestr)
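The reverse direction can be sketched the same way. The label name typelbl and the three category labels below are assumptions made for illustration; the actual labels in Canada2.dta may differ:

```stata
* Hypothetical labeled numeric variable standing in for type
clear
input byte type
1
2
3
end
label define typelbl 1 "Nation" 2 "Province" 3 "Territory"
label values type typelbl

* decode copies each value's label text into a new string variable
decode type, generate(typestr)

* type shows as numbers under nolabel; typestr is unaffected
list type typestr, nolabel

* describe reports a numeric storage type for type and str# for typestr
describe type typestr
```

Because typestr holds the label text itself, the nolabel option has no effect on how it is listed.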

When listed, the new numeric variable placenum and the new string variable typestr look similar to the originals:

But with the nolabel option, the differences become visible. Stata views placenum and type basically as numbers.

. list place placenum type typestr, nolabel

Most statistical analyses, such as finding means and standard deviations, work only with numeric variables. For calculation purposes, their labels do not matter.

Occasionally we encounter a string variable where the values are all or mostly numbers. To convert these string values into their numeric counterparts, use the real function. For example, in the artificial dataset below, the variable siblings is a string variable, although it only has one value, “4 or more,” that could not be represented just as well by a number.

. generate sibnum = real(siblings)

The new variable sibnum is numeric, with a missing value where siblings had “4 or more.”

. list
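A self-contained sketch of this conversion follows. The five values typed in below are an assumed stand-in for the artificial dataset, which is not reproduced here:

```stata
* Hypothetical stand-in for the artificial siblings data
clear
input str10 siblings
"0"
"1"
"2"
"3"
"4 or more"
end

* real() returns the number a string represents,
* or missing (.) when the string is not a valid number
generate sibnum = real(siblings)

list

* The one non-numeric string becomes a missing value
assert missing(sibnum) if siblings == "4 or more"
```

The four purely numeric strings convert to 0 through 3, and only the “4 or more” row ends up missing.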

The destring command provides a more flexible method for converting string variables to numeric. In the example above, we could have accomplished the same thing by typing

. destring siblings, generate(sibnum) force

See help destring for information about syntax and options.
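The destring version can be tried on the same kind of toy data. Again, the values below are assumed for illustration. Without the force option, destring should decline to convert a variable that contains nonnumeric characters; with force, those values are set to missing:

```stata
* Hypothetical stand-in for the artificial siblings data
clear
input str10 siblings
"0"
"1"
"2"
"3"
"4 or more"
end

* force converts what it can and sets nonnumeric strings to missing
destring siblings, generate(sibnum) force

* Count the values lost to the forced conversion (expected: 1)
count if missing(sibnum) & siblings != ""

list
```

Counting the forced-missing cases after the conversion is a cheap check that no more information was discarded than intended.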

Source: Hamilton, Lawrence C. (2012), Statistics with Stata: Version 12, 8th edition, Cengage Learning.
