Declare Survey Data in Stata

Since 2001, the Granite State Poll at the University of New Hampshire has conducted statewide telephone surveys several times each year. Each survey contacts a new sample of about 500 people, asking a variety of opinion questions along with respondent background characteristics. The poll’s political findings attain national importance every four years during New Hampshire’s presidential primary campaigns. Dataset Granite_2011_6.dta contains questions from a Granite State Poll of 516 people, conducted in June 2011.

As with any survey, the sampling design and response patterns may result in sample data that differ from the target population. For example, Census data tell us that less than 52% of the state’s adult population is female, but women make up almost 55% of this sample.

To compensate for minor bias in sampling, survey researchers routinely calculate probability weights. Some weights are calculated from comparisons between sample and population characteristics, such as gender in the example above. Others are based on features of the sampling design. For the Granite State Poll, researchers define a variable named censuswt that combines adjustments for household size, number of telephone lines, gender, and region within the state. censuswt values in the June 2011 poll have a mean of 1 but range from .16 (observations given low weights to compensate for over-representation) to 2.19 (high weights to compensate for under-representation).
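As a rough illustration of how one such adjustment works, take the gender figures above rounded to 52% (population) and 55% (sample). The rounding is an assumption made only for this sketch. A simple ratio adjustment divides the population share by the sample share. The actual censuswt combines several adjustments, so this is not how it was computed. The summarize command then reports the mean and range of the weights stored in the dataset, assuming the file is in the current working directory.

```stata
* Illustrative ratio weights from rounded shares (assumed values:
* 52% of the population and 55% of the sample are women)
* Weight for women, below 1 because they are over-represented
display 0.52/0.55
* Weight for men, above 1 because they are under-represented
display 0.48/0.45

* Check the mean, minimum and maximum of the actual weights
use Granite_2011_6.dta, clear
summarize censuswt
```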

A svyset command declares the survey structure of the data, with probability weights given by censuswt. Saving the data then saves this information, although survey weights will be used in any particular analysis only if we specifically ask for them. Otherwise the data are unchanged.
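A minimal sketch of this step might look as follows. It assumes Granite_2011_6.dta is in the current working directory, and note that the save command overwrites the existing file.

```stata
* Load the June 2011 poll and declare censuswt as the probability weight
use Granite_2011_6.dta, clear
svyset [pweight = censuswt]

* Save so that the survey declaration is stored with the data
save Granite_2011_6.dta, replace
```

Typing svyset by itself afterwards displays the current survey settings.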

Once data have been svyset, commands with the prefix svy: will perform calculations using the survey weight information. After weighting, the gender balance is closer to what we expect in the population.
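The comparison could be sketched as below. The variable name sex is an assumption, since the name of the gender variable is not given here. Substitute whatever the dataset actually uses.

```stata
* Unweighted percentages of men and women in the sample
* (sex is an assumed variable name)
tabulate sex

* Weighted percentages, using the weights declared by svyset
svy: tabulate sex, percent
```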

Many Stata commands, from simple tables to statistical models, permit the svy: prefix. For example, we could perform a weighted logistic regression (Chapter 9) of personal opinions about climate change, variable warmop2, on respondent education and political party through a command such as

. svy: logit warmop2 educ party

Type help survey for a list of analytical possibilities. The svyset command also can declare more information than just the probability weights used in the example above. svyset options allow for complex designs including stratified and multistage cluster sampling, finite population corrections, alternative methods of variance estimation, and poststratification. Type help svyset to see the syntax and a complete list of options. The Survey Data Reference Manual gives more examples and technical details.
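For a sense of what a more complex declaration looks like, here is a sketch for a hypothetical stratified cluster sample. None of these variables exist in the Granite State Poll data: psu_id, sampwt, stratum_id and fpc_var are placeholder names for the cluster identifier, probability weight, stratum identifier and finite population correction variable.

```stata
* Hypothetical design: clusters sampled within strata, with weights
* and a finite population correction (all variable names assumed)
svyset psu_id [pweight = sampwt], strata(stratum_id) fpc(fpc_var)

* Summarize the declared strata and sampling units
svydescribe
```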

Source: Hamilton, Lawrence C. (2012), Statistics with STATA: Version 12, Cengage Learning; 8th edition.
