Writing Programs for Data Management in Stata

Data management on larger projects often involves repetitive or error-prone tasks that are best handled by writing specialized Stata programs. Advanced programming can become very technical, but we can begin by writing simple programs that consist of nothing more than a sequence of Stata commands, typed and saved as a text file. Text files can be created using your favorite word processor or text editor, which should offer several kinds of text files among its options under File > Save As. One convenient way to create such text files is through Stata’s Do-file Editor, which is brought up by clicking Window > Do-file Editor or the Do-file Editor icon on the toolbar. Alternatively, bring up the Do-file Editor by typing the command doedit, or doedit filename if filename exists. Commands in the Review window can be highlighted and sent directly to the Do-file Editor (right-click to get this menu choice). Commands can also be copied and pasted into the Do-file Editor from other sources such as log files or the Results window.

Across several sections of this chapter we began building a global climate dataset, starting with temperature, then reshaping and merging the Multivariate El Nino/Southern Oscillation Index (MEI), and finally graphing temperature and MEI together as Figure 2.4. The commands that executed each of these steps could be assembled into a single do-file, as follows. Note the use of /// to continue the long graph twoway command onto more than one line. At its end, this do-file saves Figure 2.4 in Stata graph (.gph) and enhanced Windows metafile (.emf) file formats.

insheet using C:\data\global.csv, comma clear

label data "Global climate"

label variable year "Year"

label variable month "Month"

label variable temp "NCDC global temp anomaly vs 1901-2000, C"

generate edate = mdy(month, 15, year)

label variable edate "elapsed date"

format edate %tdmCY

sort year month

order year month edate

save C:\data\global2.dta, replace

use C:\data\MEI0.dta, clear

reshape long mei, i(year) j(month)

sort year month

label variable mei "Multivariate ENSO Index"

save C:\data\mei1.dta, replace

use C:\data\global2.dta, clear

merge 1:1 year month using c:\data\mei1.dta

sort year month

drop _merge


save c:\data\global3.dta, replace

graph twoway line temp edate ///
    || line mei edate, yaxis(2) lpattern(dash) ///
    || if year>1949, legend(row(2))

graph save Graph "C:\graphs\fig02_04.gph", replace

graph export "C:\graphs\fig02_04.emf", as(emf) replace

This file could be written by highlighting commands in the Review window, then right-clicking and choosing Send to Do-file Editor. Save the do-file with a new name, such as global.do. Once the do-file is created, we can run it by selecting File > Do and opening global.do from the menus, or just by typing a command such as

. do global

Such batch-mode programs are usually saved with a .do extension. More elaborate programs (defined by either do-files or automatic ado-files) can be stored in memory, and can call other programs in turn, creating new Stata commands and opening a world of possibilities for the adventurous.
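As a minimal sketch of a program stored in memory, the do-file fragment below defines a small command with program define and then calls it like any other Stata command. The name meansd is hypothetical, chosen only for illustration, and the example uses the auto dataset that ships with Stata rather than the climate data.

```stata
* Hypothetical program "meansd": not part of Stata or of the climate do-file
capture program drop meansd      // remove any earlier definition from memory
program define meansd
    version 12
    syntax varname               // accept exactly one existing variable
    quietly summarize `varlist'
    display "Mean of `varlist' = " r(mean) ", SD = " r(sd)
end

sysuse auto, clear               // example dataset shipped with Stata
meansd mpg                       // call the new command
```

Once the do-file has run, meansd stays in memory for the rest of the session, so it can be typed in the Command window or called from other do-files.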

Stata ordinarily interprets the end of a command line as the end of that command. This is reasonable onscreen, where the line can be arbitrarily long, but does not work as well when we are typing commands in a text file. Three forward slashes (///) at the end of a physical line tell Stata that the command is continued on the next physical line. The command executes only after reaching a line that does not end with ///.
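A small self-contained sketch of line continuation follows. The dataset and its values are made up for illustration and are not part of the climate example. The summarize command spans three physical lines but runs as one command, because the first two lines end with ///.

```stata
* Build a tiny illustrative dataset (values are made up for this sketch)
clear
set obs 5
generate x = _n
generate y = 2*x

* One logical command spread over three physical lines
summarize x y ///
    if x > 1, ///
    detail
```

The continuation marker is meant for do-files; a command typed in the Command window is simply entered on one line.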

Another way to handle long lines in do-files is to use a #delimit ; command, which sets a semicolon as the end-of-command delimiter. In the example below we make a semicolon the delimiter, type a long command that does not end until a semicolon appears, and then finally reset the delimiter to its usual value, a carriage return (cr).

#delimit ;

graph twoway line temp edate

|| line mei edate, yaxis(2) lpattern(dash)

|| if year>1949, legend(row(2)) ;

#delimit cr

Stata normally pauses each time the Results window becomes full of information, and waits to proceed until we press the space bar or any other key (or click the corresponding icon). Instead of pausing, we can ask Stata to continue scrolling until the output is complete. Typed in the Command window or as part of a program, the command

. set more off

calls for continuous scrolling. This is convenient if our program produces much screen output that we don’t want to see, or if it is writing to a log file that we will examine later. Typing

. set more on

returns to the usual mode of waiting for keyboard input before scrolling.
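Within a do-file, the two commands can bracket the commands whose output should scroll without interruption. The fragment below is only a sketch, using the auto dataset that ships with Stata as a stand-in for real project data.

```stata
set more off          // let output scroll without pausing
sysuse auto, clear    // example dataset shipped with Stata
describe
summarize
set more on           // restore the usual pausing behavior
```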

Source: Hamilton Lawrence C. (2012), Statistics with STATA: Version 12, Cengage Learning; 8th edition.
