Creating a New Stata Dataset by Copy and Paste

When the original data source is electronic, such as a web page, text file, spreadsheet or word processor document, we can bring these data into Stata by copy and paste. For example, the National Climatic Data Center (NCDC) produces estimates of global temperature anomalies (deviations from the 1901-2000 mean, in degrees Celsius) for every month back to January 1880. The NCDC index is one of several based on a global network of data from weather stations and sea surface measurements. NCDC updates the global index monthly (through December 2012 as this is written) and publishes the results online. The first five months are listed below, with columns giving the year, the month, and the temperature anomaly. The first value, -0.0623, indicates that January 1880 was globally about 0.06 °C cooler than the average for January in the 20th century.

1880 1 -0.0623
1880 2 -0.1929
1880 3 -0.1966
1880 4 -0.0912
1880 5 -0.1510
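For only a handful of rows like these, typing the values can be quicker than copy and paste. As a minimal sketch (an alternative, not the method this section describes), Stata's input command can enter the five rows shown above. The variable names year, month and temp match the labels used later in this section; the storage types are our own assumption.

```stata
* Sketch: enter the first five NCDC rows by typing them.
* Storage types (int, byte, double) are an assumption, not from the source.
clear
input int year byte month double temp
1880 1 -0.0623
1880 2 -0.1929
1880 3 -0.1966
1880 4 -0.0912
1880 5 -0.1510
end

* Show the result to confirm one row per month
list
```

For longer series, such as the full monthly record back to 1880, copy and paste or reading a file is far more practical.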

Depending on how the raw data (including missing values) are organized, simply copying the whole set of numbers and pasting them into the Data Editor may not work. An intermediate step, expressing the raw data as comma-separated values, often helps. An easy way to do this is to copy all the numbers and paste them into Stata's Do-File Editor, a simple text editor with many uses. Then use the Do-File Editor's Edit > Find > Replace function to Replace All occurrences of double spaces with single spaces. Repeat this until no double spaces remain, only single spaces. As a last step, Replace All single spaces with commas. We have now used the Do-File Editor to convert the data into comma-separated values, a very common data format. In the Do-File Editor, we can also add a first row containing comma-separated variable names. The beginning of the result then looks like this:

year,month,temp
1880,1,-0.0623
1880,2,-0.1929
1880,3,-0.1966
1880,4,-0.0912
1880,5,-0.1510

We can now choose Edit > Select All, copy the text from the Do-File Editor, and paste it into an empty Data Editor using Paste Special with the Comma delimiter and Treat first row as variable names options.
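The Find/Replace steps in the Do-File Editor amount to collapsing every run of blanks into a single blank and then turning each remaining blank into a comma. As a rough sketch of that logic, assuming Stata 14 or later (where the string functions strtrim() and stritrim() exist; older versions call them trim() and itrim()), the same conversion can be applied to one line held in a local macro. The sample line with irregular spacing is made up for illustration.

```stata
* Sketch: mimic the Do-File Editor Find/Replace steps on one line of text.
* The sample line below, with irregular spacing, is hypothetical.
local line "  1880    1   -0.0623 "

* strtrim() drops leading and trailing blanks; stritrim() collapses runs
* of internal blanks to one blank (the repeated "double space" step).
local single = stritrim(strtrim("`line'"))

* Replace every remaining single blank with a comma (the final step).
local csv = subinstr("`single'", " ", ",", .)

display "`csv'"
```

Trimming the ends first matters: a leading or trailing blank would otherwise become a stray comma, which Stata would read as an extra, empty column.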

Comma-separated values (.csv) files can also be written by virtually any spreadsheet program, or by Stata itself, making this a conveniently portable data format. To read a .csv file directly into Stata, use the insheet command:

. insheet using C:\data\global.csv, comma clear
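The insheet command reflects Stata 12, the version this material was written for. In Stata 13 and later, insheet is superseded by import delimited. The self-contained sketch below first writes a tiny CSV file containing the first rows shown earlier and then reads it back, taking variable names from the first row. The file name global_demo.csv is hypothetical, chosen so the sketch runs in the current working directory.

```stata
* Sketch for Stata 13 or later; global_demo.csv is a hypothetical file name.
* Write a small comma-separated file, header row first.
file open fh using "global_demo.csv", write text replace
file write fh "year,month,temp" _n
file write fh "1880,1,-0.0623" _n
file write fh "1880,2,-0.1929" _n
file write fh "1880,3,-0.1966" _n
file write fh "1880,4,-0.0912" _n
file write fh "1880,5,-0.1510" _n
file close fh

* Read it back; varnames(1) takes variable names from the first row.
import delimited using "global_demo.csv", varnames(1) clear
describe
list
```

By default import delimited stores decimal numbers as float; adding the asdouble option keeps full double precision if that matters for later calculations.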

Once data are in memory, we can label the data and variables, then save the results as a Stata system file.

. label data "Global climate"
. label variable year "Year"
. label variable month "Month"
. label variable temp "NCDC global temp anomaly vs 1901-2000, C"
. save C:\data\global1.dta
. describe
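Rerunning these commands in a do-file will stop at save once global1.dta already exists, because save refuses to overwrite a file unless the replace option is given. A minimal sketch of the whole labeling step follows. It builds the five example rows with input, saves to a temporary file instead of C:\data (an assumption so the sketch runs on any machine), then reloads the data and reads the stored labels back with extended macro functions.

```stata
* Sketch: label, save with replace, reload, and check the stored labels.
clear
input int year byte month double temp
1880 1 -0.0623
1880 2 -0.1929
1880 3 -0.1966
1880 4 -0.0912
1880 5 -0.1510
end

label data "Global climate"
label variable year "Year"
label variable month "Month"
label variable temp "NCDC global temp anomaly vs 1901-2000, C"

* A temporary file stands in for C:\data\global1.dta.
tempfile global1
save "`global1'", replace

* Reload and read back the data label and one variable label.
use "`global1'", clear
local dlab : data label
local tlab : variable label temp
display "`dlab'"
display "`tlab'"
```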

Source: Hamilton, Lawrence C. (2012), Statistics with STATA: Version 12, 8th edition, Cengage Learning.
