Data Analysis Using STATA: Use and Interpretation

Stata is a general-purpose statistical software package created in 1985 by StataCorp. The name Stata is a syllabic abbreviation of the words statistics and data. The FAQ for the official forum of Stata insists that the correct English pronunciation of Stata “must remain a mystery”; any of “Stay-ta”, “Sta-ta” or “Stah-ta” (rhymes of the three pronunciations of ‘data’) are considered acceptable. More recent updates indicate that Stata employees pronounce it /ˈsteɪtə/.

Most of its users work in research, especially in the fields of economics, sociology, political science, biomedicine, and epidemiology. According to StataCorp (2016), Stata is “a complete, integrated statistical software package that provides everything you need for data analysis, data management, and graphics”. Basically, Stata is software that allows you to store and manage data (large and small data sets), undertake statistical analysis on your data, and create some really nice graphs.

Stata allows you to write code or use menus to perform your analysis. Stata has two primary menu tabs: Graphics and Statistics. Within “Statistics” there are twenty-one sub-tabs and numerous tabs within those tabs. Within “Graphics” there are twenty-one tabs as well (do you think the people at Stata like to play blackjack?). I admit, this can be a bit daunting and time-consuming if you are trying to find a specific function. But wait, there’s more. In the command box you can type “help” followed by what you are looking for, so I don’t waste time hunting through the menus. After you run a command through the menus, its code shows up in the “Review” box. I can then copy and paste the command into a “do-file”. A do-file is a text document that lets you submit more than one command to Stata at once.
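As a rough illustration of the workflow just described, the sketch below first looks up documentation and then runs a few commands together, as they might appear in a small do-file. The choice of commands and of the bundled auto example dataset is an assumption made purely for illustration.

```stata
* Look up documentation from the Command window
help regress
* Keyword search across help files and other resources
search logistic

* A minimal do-file: several commands submitted at once
* Load the example dataset shipped with Stata
sysuse auto, clear
* List the variables and their storage types
describe
* Basic descriptive statistics for two variables
summarize price mpg
```

Saving these lines in a file such as example.do (a hypothetical name) and running `do example.do` should execute them in order.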

Stata allows you to have more than one do-file open at a time. This is a big plus because it makes it easy to copy and paste from other project do-files into the current do-file. Using do-files is significantly quicker than using the menus if you have created template do-files, especially for creating graphs, where there are so many options. It takes less than a minute to copy from a template and paste the commands into your current project. Stata is extremely efficient at running repetitive analyses when you incorporate macros and loops in a do-file. This sounds like it may be difficult, but it’s not.
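To give a sense of what macros and loops look like in practice, here is a minimal sketch. The variable list and the use of the bundled auto dataset are assumptions for illustration; run the lines together from a do-file, because local macros disappear once the do-file (or the selected block) finishes.

```stata
sysuse auto, clear

* Store a list of variables in a local macro
local outcomes price mpg weight

* Repeat the same analysis for each variable in the list
foreach var of local outcomes {
    display "Summary of `var' by origin"
    tabstat `var', by(foreign) statistics(mean sd n)
}

* forvalues loops over a numeric range
forvalues i = 1/3 {
    display "Iteration `i'"
}
```

Adding a variable to the `outcomes` macro is enough to extend the analysis, which is the main reason loops pay off for repetitive work.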

How well is Stata supported by StataCorp? On average, Stata sends out updated files every two months with new features and/or fixes to reported glitches. The reference guide for Stata 13 is 281 pages filled with examples and links to the data sets used in the examples. The professional community also provides incredible support. Stata allows third-party commands (also known as modules) to be imported into the software. The website http://ideas.repec.org/s/boc/bocode.html is a warehouse for hundreds of third-party commands that have been tested before being made public. Running a search for “logistic” returned 128 results. The bottom line is that Stata will run every analysis that the other major statistical packages can, if not more. It is a very efficiently organized program to learn to use. Third-party professionals are continuously offering new functions. Stata adds new features without charging a “new” version fee. As an added bonus, it is reasonably priced and has no add-on charges.
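The updating and third-party module workflow can be sketched with a few commands. This is only an illustration: estout is used here as an example of a package name and is not mentioned in the text above, and all of these commands need an internet connection.

```stata
* Check whether official updates are available
update query

* Read the description of a community-contributed package (example name)
ssc describe estout

* Install it from the SSC archive
ssc install estout

* List the community-contributed packages currently installed
ado
```

After installation, the new command has its own help file, so `help estout` should work like help for any built-in command.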

There are four major builds of each version of Stata:

  • Stata/MP for multiprocessor computers (including dual-core and multicore processors)
  • Stata/SE for large databases
  • Stata/IC, which is the standard version
  • Numerics by Stata, which supports any of the data sizes listed above in an embedded environment

Small Stata, which was the smaller student version for educational purchase only, is no longer available. Stata is commonly used among health researchers, particularly those working with very large data sets, because it is powerful software that allows you to do almost anything you like with your data.

It’s important to note that Stata is not the only statistical software – there are many others that you may come across if you pursue a career that requires you to work with data. Some of the other common statistical packages include SPSS and SAS (yes, they all start with ‘s’!).  The focus for this session, however, is on Stata.


How to Install and Activate Stata on a Windows Computer

In this post, we will show you how to install and activate Stata, software that allows you to store and manage data, undertake statistical analysis on your data, and create some really nice graphs. Most of its users work in research, especially in the fields of economics, sociology, political science, biomedicine and epidemiology.

Step 1: Download Stata

1. Go to http://download.stata.com/download. The following login box should appear in your browser. Enter your username and password (as provided in the email from Survey Design and Analysis), as well as the other information requested, then click ‘Log in’:

2. Select the ‘Windows’ option for your operating system:

3. Then click on ‘Download SetupStata_xx.exe’:

4. The Setup file should now be downloading, and the following should appear in the bottom left corner of your browser:

NOTE: Please make sure that you have a good internet connection to be able to download the package.

Step 2: Install Stata

1. Once the file is downloaded, click on the SetupStata17.exe file. You will see the following dialogue box appear. Please wait until the process is finished:

2. Once the above is finished, the installation will begin. The following should appear on your screen:
Click on ‘Next’, then read and accept the Licence Agreement, then click on ‘Next’:
3. Enter the user name and organisation as required. NOTE: if it is a single-user licence, you will need to click on ‘Only for me’. Then click ‘Next’:
4. Select your Stata edition. Please make sure that you select the right edition as shown on your licence file:
5. Proceed with the listed ‘Destination Folder’ or browse to choose an alternative folder, then click on ‘Next’:
6. Select the first option to set the documents folder as your Default Working Directory, then click on ‘Next’:
7. Click on ‘Install’ to begin the installation of Stata:
Please wait while the software installs. The following dialogue box should appear while installing:

Step 3: Activate Stata

1. Type stata in the Windows search bar, then click on Stata to open it.

2. You will then be asked to initialise your licence. Enter the details in the following dialogue box (please copy-paste your Serial number, Code and Authorization to avoid typing errors):
To find the details required, open the PDF licence file, and you will see serial number, code and authorization.

3. Ensure the following box is checked to register for access to Technical Support. Then click ‘Finish’. The licence is now activated.
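Once Stata opens, a quick way to confirm that the installation and licence look right is to type a few commands in the Command window. This is a minimal sketch rather than an official verification procedure, and it should work the same way on Windows and Mac.

```stata
* Show the installed version, edition, and licence information
about

* Display a few system values stored by Stata
display "Stata version: " c(stata_version)
display "Flavor: " c(flavor)
display "Operating system: " c(os)
```

If the serial number or edition shown by `about` does not match your licence file, the licence details were probably entered for a different edition.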

 

How to Install and Activate Stata on a Mac Computer

Step 1: Download Stata

1. Go to http://download.stata.com/download. The following login box should appear in your browser. Enter your username and password (as provided in the email from Survey Design and Analysis), as well as the other information requested, and click ‘Log in’:

2. Select the ‘Mac’ option for your operating system:
3. Then click on ‘Stata17.dmg’:
4. The dmg file should now be downloading, and the following should appear in the top right corner of your browser (if you are using Safari):
NOTE: Please make sure that you have a good internet connection to be able to download the package.

Step 2: Install Stata

1. Once the file is downloaded, you can run the Stata17.dmg file from your browser by clicking on the download arrow in the top right corner of the browser and then double-clicking the file. Alternatively, you can navigate to your Downloads folder, where the Stata17.dmg file should be located (as shown below):

2. Run the Stata17.dmg file, which should open the following folder:
3. Double-click on the “Install Stata” icon in the Stata17.dmg folder (above). This should open the installer and show the screen below. Click on ‘Continue’ to proceed.
4. Click on ‘Continue’ again on the licence agreement screen below:
5. Click on ‘Agree’ to accept the licence agreement:
6. Select your Stata edition. Please make sure that you select the right edition as shown on your licence file, then click ‘Continue’:

7. Confirm where you would like Stata to install. The default is the Macintosh HD, however if you wish to change this to another location click on ‘Change Install Location …’, and once you are happy with the install location click ‘Install’:

8. Please wait while the software installs. The following screen should appear while installing:
9. Once installation is complete you should see the screen below. Click ‘Close’ to finish:
10. Your Mac will then ask if you want to move the Stata17.dmg file to the trash. If you would like to save the Stata17.dmg file to a USB or somewhere for easy re-installation later click ‘Keep’. Otherwise if you do not wish to keep it select ‘Move to Bin’:

Step 3: Activate Stata

1. To find Stata, go to your Applications folder and find a folder called Stata. Your new Stata program is located in this folder. Double-click it to run. Otherwise, the Stata program should have been added to your Launchpad screen, so if you use this to start programs you should be able to find it there too. Run the Stata program.

2. You will then be asked to initialise your licence. Enter the details in the following dialogue box (please copy-paste your Serial number, Code and Authorization to avoid typing errors):

To find the details required, open the PDF licence file, and you will see serial number, code and authorization. Once you have entered the information you should be able to click ‘Next’. If the next button is not lit up for you to click, it means you have not entered either your serial number, code or authorization correctly. For best results please copy-paste from the PDF licence file you were emailed.

3. Ensure the following box is checked to register for access to Technical Support. Then click ‘Finish’. The licence is now activated.

Quick Overview of Stata User Interface

Stata’s capabilities include data management, statistical analysis, graphics, simulations, regression, and custom programming. It also has a system to disseminate user-written programs that lets it grow continuously. This article introduces the core of Stata’s interface: its main windows, its toolbar, its menus, and its dialogs.

1. The windows

The five main windows are the History, Results, Command, Variables, and Properties windows. Except for the Results window, each window has its name in its title bar. These five windows are typically in use the whole time Stata is open. There are other, more specialized windows such as the Viewer, Data Editor, Variables Manager, Do-file Editor, Graph, and Graph Editor windows – these are discussed later in this manual.

The Command window

Commands are submitted to Stata from the Command window. The Command window supports basic text editing, copying and pasting, a command history, function-key mapping, filename completion, and variable-name completion.

From the Command window, pressing:

Page Up steps backward through the command history.

Page Down steps forward through the command history.

Tab autocompletes a partially typed variable name when possible or presents a list of similar names if there could be more than one completion. Further typing will narrow the list. As soon as the name is complete, the full name will be inserted. If the name starts with a double quote, Tab attempts to autocomplete a filename in the same manner.

The command history allows you to recall a previously submitted command, edit it if you wish, and then resubmit it. Commands submitted by Stata’s dialogs are also included in the command history, so you can recall and submit a command without having to open the dialog again.

The Results window

The Results window contains all the commands and their textual results you have entered during the Stata session.

Although you can scroll through the Results window to look at work you have done, it is much simpler to search within the Results window by using the find bar. By default, the find bar is hidden. You can expose it by selecting Edit > Find….

You can clear out the Results window at any time by right-clicking in the Results window and selecting Clear results from the contextual menu. This action is not undoable.

The Variables window

The Variables window shows the list of variables in the dataset, along with the properties of the variables. By default, it shows all the variables and their variable labels. You can change what properties get displayed by right-clicking on the header of any column of the Variables window.

Click once on a variable in the Variables window to select it. Multiple variables can be selected in the usual fashion, either by Ctrl-clicking on nonadjacent variables or by clicking on a variable and Shift-clicking on a second variable to select all intervening variables.

Double-clicking on a variable in the Variables window puts the selected variable at the insertion point in the Command window.

The leftmost column of the Variables window is called the one-click paste column. You can also send variables to the Command window by hovering the mouse over the one-click paste column of the Variables window and clicking on the arrow that appears. The one-click paste column can be shown or hidden in the same fashion as the other columns in the Variables window.

The Variables window supports filtering and changing the display order of the variables. Text entered in the Filter variables here field will filter the variables appearing in the Variables window. The filter is applied to all visible columns and shows all variables that match the criteria in at least one column. By default, the filter will ignore case and show any variables for which at least one column contains any of the words in the filter. Clicking on the wrench on the left will allow you to change this behavior as well as add or remove additional columns containing information about the variables.

You can change the display order of the variables in the Variables window by clicking on any column header. The first click sorts in ascending order, the second click sorts in descending order, and the third click puts the variables back in dataset order. Thus, clicking on the Name column header will make the Variables window display the variables in alphabetical order. Sorting in the Variables window is live, so if you change a property of a variable when the Variables window is sorted by that property, it will automatically move the variable to its proper location. Reordering the display order of the variables in the Variables window does not affect the order of the variables in the dataset itself.

Right-clicking on a variable in the Variables window displays a menu from which you can select:

  • Keep only selected variables to keep just the selected variables in the dataset in memory. You will be asked for confirmation. This affects only the dataset in memory, not the dataset as saved on your disk.
  • Drop selected variables to drop, or eliminate, the selected variables from the dataset in memory. You will be asked for confirmation. Just as above, this affects only the dataset in memory, not the dataset as saved on your disk.
  • Copy varlist to copy the selected variable names to the clipboard.
  • Select all to select all variables in the dataset that satisfy the filter conditions. If no filter has been specified, all variables will be selected.
  • Send varlist to Command window to send all selected variables to the Command window.
  • Font… to bring up a Font dialog, allowing you to change the font used to display the Variables window contents.

Items from the contextual menu issue standard Stata commands, so working by right-clicking is just like working directly in the Command window.
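For example, the keep and drop menu items correspond to commands like the ones below. This is a sketch using the bundled auto dataset; the variable names and the file name auto_subset are assumptions chosen for illustration.

```stata
sysuse auto, clear

* Equivalent of "Keep only selected variables"
keep make price mpg foreign

* Equivalent of "Drop selected variables"
drop mpg

* Both commands change only the data in memory.
* Nothing is written to disk until you save explicitly.
save auto_subset, replace
```

Saving under a new name, as here, leaves the original dataset file untouched.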

If you would like to hide the Variables window, click on its close button.

To reveal a hidden Variables window, select Window > Variables.

The Properties window

The Properties window displays variable and dataset properties. If a single variable is selected in the Variables window, its properties are displayed. If there are multiple variables selected in the Variables window, the Properties window will display properties that are common across all selected variables.

Clicking the lock icon in the Properties window title bar toggles the ability to alter properties of the selected variables. By default, changes are not allowed. Once the properties are unlocked, you can make any changes to variable or dataset properties you like. Each change you make will create a command that appears in the Results and Command windows, as well as in any command log, so the changes are reproducible. Using the Properties window is one of the simplest ways of managing notes, changing variable and value labels, and changing display formats.
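The commands generated by edits in the Properties window are ordinary Stata commands, so they can also be typed directly. The sketch below shows the kind of commands involved; the labels, the note text, and the generated variable expensive are hypothetical examples, not taken from the text.

```stata
sysuse auto, clear

* Change a variable label
label variable mpg "Fuel economy (miles per gallon)"

* Change a display format: show price with thousands separators
format price %9.0fc

* Attach a note to a variable, then list its notes
notes mpg: Check units before reporting
notes list mpg

* Define a value label and attach it to a new indicator variable
label define yesno 0 "No" 1 "Yes"
generate byte expensive = price > 10000
label values expensive yesno
```

Because these are plain commands, copying them into a do-file makes the same changes reproducible on a fresh copy of the data.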

Clicking the arrow buttons next to the lock icon will select the previous or next variable shown in the Variables window, and that selection will be reflected in the Properties window. If you would like to hide the Properties window, click on its close box. If you would like to reveal a hidden Properties window, select Window > Properties.

You should also investigate the Variables Manager, because it extends these capabilities and provides a good interface for managing variables.

The History window

The History window shows the history of commands that have been entered, with unsuccessful commands and their error codes in red, by default.

The toolbar has two tools for manipulating the contents of the History window. Clicking on the Filter button in the History window title bar toggles the visibility of these tools. Text entered in the Filter commands here field will filter the commands appearing in the History window. By default, the filter ignores case and finds any commands containing any of the words in the filter. Clicking on the wrench on the left allows you to change this behavior. You can hide the commands that produced an error by clicking on the Filter errors button.

To enter a command from the History window, you can:

  • Click once on a past command to copy it to the Command window, replacing the contents of the Command window.
  • Double-click on a past command to resubmit it. Executing the command adds the command to the bottom of the History window.

Right-clicking on the History window displays a menu from which you can select various actions:

  • Cut removes the selected commands from the History window and places them on the Clipboard.
  • Copy copies the selected commands to the Clipboard.
  • Delete removes the selected commands from the History window.
  • Select all selects all the commands in the History window, including those before and after the commands currently displayed.
  • Clear all clears out all the commands from the History window, including those before and after the commands currently displayed.
  • Do selected submits all the selected commands and adds them to the bottom of the command history. Stata will attempt to run all the selected commands, even those containing errors, and will not stop even if a command causes an error.
  • Send selected to Do-file Editor places all the selected commands into a new Do-file Editor window.
  • Save all… brings up a Save review contents dialog, which allows you to save all the commands in the History window, including those before and after the commands currently displayed, in a do-file. (See [GSW] 13 Using the Do-file Editor—automating Stata for more information on do-files.)
  • Save selected… brings up a Save review contents dialog, which allows you to save the selected commands in the History window in a do-file.
  • Font… brings up a Font dialog, allowing you to change the font used to display the History window contents.

2. The toolbar

This is the toolbar:

The toolbar contains buttons that provide quick access to Stata’s more commonly used features. If you forget what a button does, hold the mouse pointer over the button for a moment, and a tooltip will appear with a description of that button.

Buttons that include both an icon and an arrow display a menu if you click on the arrow. Here is an overview of the toolbar buttons and their functions:

Open opens a Stata dataset. Click on the button to open a dataset with the Open dialog.

Save saves the Stata dataset currently in memory to disk.

Print displays a list of windows. Select a window name to print its contents.

Log begins a new log or closes, suspends, or resumes the current log.

Viewer opens the Viewer or brings a Viewer to the front of all other windows. Click on the button to open a new Viewer. Click on the arrow to select a Viewer to bring to the front.

Graph brings a Graph window to the front of all other windows. Click on the button to bring the Graph window to the front. Click on the arrow to select a Graph window to bring to the front.

Do-file Editor opens the Do-file Editor or brings a Do-file Editor to the front of all other windows. Click on the button to open a new Do-file Editor. Click on the arrow to select a Do-file Editor to bring to the front.

Data Editor (Edit) opens the Data Editor or brings the Data Editor to the front of the other Stata windows.

Data Editor (Browse) opens the Data Editor in browse mode.

Variables Manager opens the Variables Manager.

Show more results tells Stata to continue when it has paused in the middle of long output. Click on the arrow to choose whether to run the command to completion.

Break stops the current task in Stata.
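Most of these toolbar buttons have command equivalents that can be typed in the Command window. The sketch below pairs some of them up; the file names session1.smcl and mydata are hypothetical, and the last four commands open windows, so they are meant for interactive use.

```stata
sysuse auto, clear

* Log: begin a new log, suspend it, resume it, and close it
log using session1.smcl, replace
summarize price
log off
log on
log close

* Save and Open: write the dataset in memory to disk, then reopen it
save mydata, replace
use mydata, clear

* Data Editor (Browse) and Data Editor (Edit)
browse
edit

* Do-file Editor and Variables Manager
doedit
varmanage
```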

3. Menus and dialogs

There are two ways by which you can tell Stata what you would like it to do: you can use menus and dialogs, or you can use the Command window. We will discuss the menus and dialogs here.

Stata’s Data, Graphics, and Statistics menus provide point-and-click access to almost every command in Stata. As you will learn, Stata is fully programmable, and Stata users can even create their own dialogs and menus. The User menu provides a place for programmers to add their own menu items. Initially, it contains only some empty submenus. As an example, suppose you wish to perform a Poisson regression. You could type Stata’s poisson command, or you could select Statistics > Count outcomes > Poisson regression, which would display this dialog:

This dialog provides access to all the functionality of Stata’s poisson command. Because the dependent and independent variables must be numeric, you will find that the combo box will display only numeric variables for choosing. The poisson command has many options that can be accessed by clicking on the multiple tabs across the top of the dialog. The first time you use the dialog for a command, it is a good idea to look at the contents of each tab so that you will know all the dialog’s capabilities.

The dialogs for many commands have the by/if/in and Weights tabs. These provide access to Stata’s commands and qualifiers for controlling the estimation sample and dealing with weighted data.

The dialogs for most estimation commands have the Maximization tab for setting the maximization options. For example, you can specify the maximum number of iterations for the optimizer.
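Typed as commands, the dialog choices described above might look like the following sketch. It uses the bundled auto dataset and treats rep78 as a count outcome purely for illustration; that modelling choice is an assumption and not a recommendation.

```stata
sysuse auto, clear

* Basic Poisson regression: dependent variable first, then covariates
poisson rep78 mpg weight

* by/if/in tab: restrict the estimation sample with an if qualifier
poisson rep78 mpg weight if foreign == 0

* Maximization tab: cap the number of optimizer iterations
poisson rep78 mpg weight, iterate(50)

* Reporting option: show incidence-rate ratios instead of coefficients
poisson rep78 mpg weight, irr
```

Options always follow a comma at the end of the command, which is the same pattern the dialog produces when it builds the command for you.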

Most dialogs in Stata provide the same six buttons you see at the bottom of the poisson dialog above.

OK issues a Stata command based on how you have filled out the fields in the dialog and then closes the dialog.

Cancel closes the dialog without doing anything—just as clicking on the dialog’s close button does.

Submit issues a command just like OK but leaves the dialog on the screen so that you can make changes and issue another command. This feature is handy when, for example, you are learning a new command or putting together a complicated graph.

Help provides access to Stata’s help system. Clicking on this button will typically take you to the help file for the Stata command associated with the dialog. Clicking on it here would take you to the poisson help file. The help file will have tabs above groups of options to show which dialog tab contains which options.

Reset resets the dialog to its default state. Each time you open a dialog, it will remember how you last filled it out. If you wish to reset its fields to their default values at any time, simply click on this button.

Copy command to Clipboard behaves much like the Submit button, but rather than issuing a command, it copies the command to the Clipboard. The command can then be pasted elsewhere (such as in the Do-file Editor).

The command issued by a dialog is submitted just as if you had typed it by hand. You can see the command in the Results window and in the History window after it executes. Looking carefully at the full command will help you learn Stata’s command syntax.

In addition to being able to access the dialogs for Stata commands through Stata’s menus, you can also invoke them by using two other methods. You may know the name of a Stata command for which you want to see a dialog, but you might not remember how to navigate to that command in the menu system. Simply type db commandname to launch the dialog for commandname:

. db poisson

You will also find access to the dialog for a command in that command’s help file.

As you read this manual, we will present examples of Stata commands. You may type those examples as presented, but you should also experiment with submitting those commands by using their dialogs. Use the db command described above to quickly launch the dialog for any command that you see in this manual.

Using the Stata Viewer

The Viewer’s purpose

The Viewer is a versatile tool in Stata. It will be the first place you can turn for help within Stata, but it is far more than just a help system. You can also use the Viewer to add, delete, and manage third-party extensions to Stata that are known as community-contributed features; to view and print Stata logs from both your current and your previous Stata sessions; to view and print any other Stata-formatted (SMCL) or plain-text file; and even to launch your web browser to follow hyperlinks.

This chapter focuses on the general use of the Viewer, its buttons, and a brief summary of the commands that the Viewer understands. There is more information about using the Viewer to find help in [GSW] 4 Getting help and for installing community-contributed features in [GSW] 19 Updating and extending Stata—Internet functionality.

To open a new Viewer window (or open a new tab in an existing Viewer), you can either click on the Viewer button or select Window > Viewer > New Viewer.

Viewer buttons

The toolbar of the Viewer has multiple buttons, a command box, and a search box.

The Find bar is used to find text within the current Viewer. To reveal the Find bar at the bottom of the window, click on the Find text in page button:

The Find bar has its own buttons, fields, and checkboxes.

Viewer’s function

The Viewer is similar to a web browser. It has links (shown in blue text by default) that you can click on to see related help topics and to install and manage third-party software. When you move the mouse pointer over a link, the status bar at the bottom of the Viewer shows the action associated with that link. If the action of a link is help logistic, clicking on that link will show the help file for the logistic command in the Viewer. Middle-clicking on a link in a Viewer window (Ctrl+clicking if you do not have a three-button mouse) will open the link in a new tab in the Viewer window. Shift+clicking will open the link in a new Viewer window.

You can open a new Viewer by selecting Window > Viewer > New Viewer or by clicking on the Viewer button on the toolbar of the main window. Entering a help command from the Command window will also open a new Viewer.

To bring a Viewer to the front of all other Viewers, select Window > Viewer and choose a Viewer from the list there. Selecting Close all Viewers closes all open Viewer windows and tabs.

Viewing local text files, including SMCL files

In addition to viewing built-in Stata help files, you can use the Viewer to view Stata Markup and Control Language (SMCL) files such as those typically produced when logging your work (see [GSW] 16 Saving and printing results by using logs) as well as plain-text files. To open a file and view its contents, simply select File > Open…, and you will be presented with a dialog.

You may either type in the name of the file that you wish to view and click on OK, or you may click on the Browse… button to open a standard file dialog that allows you to navigate to the file.

If you currently have a log file open, you may view the log file in the Viewer. This method has one advantage over scrolling back in the Results window: what you view stays fixed even as output is added to the Results window. If you wish to view a current log file, select File > Log > View…, and the usual dialog will appear but with the path and filename of the current log already in the field. Simply click on OK, and the log will appear in the Viewer. See [GSW] 16 Saving and printing results by using logs for more details.

Viewing remote files over the Internet

If you want to look at a remote file over the Internet, the process is similar to viewing a local file, only instead of using the Browse… button, you type the URL of the file that you want to see, such as https://www.stata.com/man/readme.smcl. You should use the Viewer only to view text or SMCL files. If you enter the URL of, say, an arbitrary webpage, you will see the HTML source of the page instead of the usual browser rendering.
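The same steps can be carried out from the Command window with the view command. The lines below are a minimal sketch, assuming a working directory you can write to and, for the last line, an Internet connection; the log name viewer_demo is a made-up placeholder.

```stata
* Create a small SMCL log so that there is a local file to look at.
* "viewer_demo" is a hypothetical filename chosen for this sketch.
log using viewer_demo, replace smcl
sysuse auto, clear
summarize price mpg
log close

* Open the local SMCL log in the Viewer
view viewer_demo.smcl

* Open a remote SMCL file by giving its URL (requires Internet access)
view https://www.stata.com/man/readme.smcl
```

Because the Viewer shows a fixed copy of the file, the log stays readable there while new output continues to scroll through the Results window.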

Navigating within the Viewer

In addition to using the scrollbar to navigate the Viewer window, you also can use the up and down arrow keys and Page Up and Page Down keys to do the same. Pressing the up or down arrow key scrolls the window a line at a time. Pressing the Page Up or Page Down key scrolls the window a screen at a time.

Printing

To print the contents of the Viewer, right-click on the window and select Print…. You may also select File > Print > Viewer name or click on the Print toolbar button.

Tabs in the Viewer

A Viewer window can have multiple tabs. You may view different files or different views of the same file in different tabs. Clicking on the Open new tab button, +, will open a new tab in the current Viewer window. You can change the order of the tabs within a Viewer by dragging the tabs along the tab bar within the window. If you drag a tab and drop it within the body of the same Viewer window, a menu will appear. If you select New Horizontal Tab Group, the Viewer window will be split horizontally. This is useful for having two views of the same file at different places in the file. Similarly, if you select New Vertical Tab Group, the Viewer window will be split vertically. This is useful for comparing two similar files side by side.

Once you have created a horizontal or vertical tab group, if you drop a tab inside a group, you will get the choice of creating a similar new group or moving the tab to the selected group.

Right-clicking on the Viewer window

Right-clicking on the Viewer window displays a contextual menu that offers these options:

  • Select all to select all text in the Viewer.
  • Preferences… to edit the preferences for the Viewer window.
  • Font… to change the font for the window.
  • Print… to print the contents of the Viewer window.

In other contexts, there could be more items displayed in the contextual menu.

Searching for help in the Viewer

The search box in the Viewer can be used to search documentation. Click on the magnifying glass; choose Search all, Search documentation and FAQs, or Search net resources; and then type a word or phrase in the search box and press Enter. For more extensive information about using the Viewer for help, see [GSW] 4 Getting help.

Commands in the Viewer

Everything that can be done in the Viewer by clicking on links and buttons can also be done by typing commands in the command box at the top of the window or on the Stata command line. Some tasks that can be performed in the Viewer are

  • obtaining help (see [GSW] 4 Getting help):

Type help contents to view the contents of Stata’s help system.

Type help commandname to view the help file for a Stata command.

Type search keyword to search documentation, FAQs, and net resources for a topic.

  • searching (see [GSW] 4 Getting help):

Type search keyword to search documentation, FAQs, and net resources for a topic. Type search keyword, local to search only documentation and FAQs for a topic.

Type search keyword, net to search only net resources for a topic.

  • finding and installing community-contributed commands (see [GSW] 4 Getting help and [GSW] 19 Updating and extending Stata—Internet functionality):

Type net from https://www.stata.com/ to find and install Stata Journal, Stata Technical Bulletin, and community-contributed commands from the Internet.

Type ado to review community-contributed packages you have installed.

Type ado uninstall to uninstall community-contributed packages you have installed on your computer.

  • viewing files in the Viewer:

Type view filename.smcl to view SMCL files.

Type view filename.txt to view text files.

Type view filename.log to view text log files.

  • viewing files in the Results window:

Type type filename.smcl in the Command window to view SMCL files in the Results window.

Type type filename.txt in the Command window to view text files in the Results window.

Type type filename.log in the Command window to view text log files in the Results window.

  • launching your browser to view an HTML file:

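Type browse URL in the Viewer’s command box to launch your browser. From the Command window, type view browse URL instead, because browse on its own opens the Data Editor.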
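Several of the tasks listed above can be tried from the Command window in a few lines. This is only a sketch: auto_notes.log is a hypothetical text log created for the demonstration, and the last line needs a web browser and an Internet connection.

```stata
* Contents of the help system, then the help file for one command
help contents
help logistic

* List the community-contributed packages installed on this computer
ado

* Create a short plain-text log (hypothetical filename) ...
log using auto_notes.log, replace text
sysuse auto, clear
describe, short
log close

* ... and show it in the Results window rather than in the Viewer
type auto_notes.log

* Launch the web browser on a URL from the Command window
view browse "https://www.stata.com"
```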

Using the Viewer from the Command window

Typing help commandname in the Command window will bring up a new Viewer showing the requested help.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Getting help in Stata

1. System help

Stata’s help system provides a wealth of information to help you learn and use Stata. To find out which Stata command will perform the statistical or data management task you would like to do, you should generally follow these steps:

  1. Select Help > Search…, choose Search all, and enter the topic or keywords. This search will open a new Viewer window containing information about Stata commands, references to articles in the Stata Journal, links to Frequently Asked Questions (FAQs) on Stata’s website, links to videos on Stata’s YouTube channel, links to selected external websites, and links to community-contributed features.
  2. Read through the results. If you find a useful command, click on the link to the appropriate command name to open its help file.
  3. Read the help file for the command you chose.
  4. If you want more in-depth help, click on the link from the name of the command to the PDF documentation, read it, then come back to Stata.
  5. If the first help file you went to is not what you wanted, either click on the Also see menu and choose a link to related help files or click on the Back button to go back to the previous document and go from there to other help files.
  6. With the help file open, click on the Command window and enter the command, or click on the Dialog button and choose a link to open a dialog for the command.
  7. If, at any time, you want to begin again with a new search, enter the new search terms in the search box of the Viewer window.
  8. If you select Search documentation and FAQs, Stata searches its keyword database for official Stata commands, Stata Journal articles and software, FAQs, and videos. If you select Search net resources, Stata searches for community-contributed commands, whether they are from the Stata Journal or elsewhere.

Let’s illustrate the help system with an example. You will get the most benefit from the example if you work along at your computer.

Suppose that we have been given a dataset about antique cars and that we need to know what it contains. Though we still have a vague notion of having seen something like this while working through the example session in [GSW] 1 Introducing Stata—sample session, we do not remember the proper command.

Start by typing sysuse auto, clear in the Command window to bring the dataset into memory.

Follow the above approach:

  • Select Help > Search….
  • Check that the Search all radio button is selected.
  • Type dataset contents into the search box and click on OK or press Enter.

Stata will now search for “dataset contents” among the Stata commands, the reference manuals, the Stata Journal, the FAQs on Stata’s website, and community-contributed features.

Upon seeing the results of the search, we see two commands that look promising: codebook and describe. Because we are interested in the contents of the dataset, we decide to check out the codebook command. The [D] means that we could look up the codebook command in the Data Management Reference Manual. The codebook link in (help codebook) means that there is a system help file for the codebook command. This is what we are interested in right now.

  • Click on the codebook link. Links can take you to a variety of resources, such as help for Stata commands, dialogs, and even webpages. Here the link goes to the help file for the codebook command.

What is displayed is typical for help for a Stata command. Help files for Stata commands contain, from top to bottom, these features:

  • The quick access toolbar with three buttons:
    • The Dialog button shows links to any dialogs associated with the command.
    • The Also see button shows links to related PDF documentation and help files.
    • The Jump to button shows links to other sections within the current help file.
  • The second line of a help file shows a View complete PDF manual entry link. Clicking on the link will open the complete documentation for the command (in this case, codebook) in your PDF viewer.
  • The command’s syntax, that is, rules for constructing a command that Stata will correctly interpret. The square brackets here indicate that all the arguments to codebook are optional but that if we wanted to specify them, we could use a varlist, an if qualifier, or an in qualifier, along with some options. (Options vary greatly from command to command.) The options are listed directly under the command and are explained in some detail later in the help file. You will learn more about command syntax later.
  • A description of the command. Because “codebook” is the name for big binders containing a hard copy describing each of the elements of a dataset, the description for the codebook command is justifiably terse.
  • The options that can be used with this command. These are explained in much greater detail than in the listing of the possible options after the syntax. Here, for example, we can see that the mv option can look to see if there is a pattern in the missing values, something important for data cleaning and imputation.
  • Examples of command usage. The codebook examples are real examples that step through using the command on a dataset either shipped with Stata or loadable within Stata from the Internet.
  • The information the command stores in the returned results. These results are used primarily by programmers.

For now, either click on Jump to and choose Examples from the drop-down menu or scroll down to the examples. It is worth going through the examples as given in the help file.
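To get a feel for what the two candidate commands report, it may help to run them on the auto dataset loaded earlier. The following is a short sketch rather than a copy of the help file’s own examples; the choice of variables is ours.

```stata
sysuse auto, clear

* Overview of the dataset: variable names, storage types, and labels
describe

* Detailed contents of a few selected variables
codebook make price mpg

* Ask codebook to look for patterns in missing values
* (mvalues is the full name of the mv option mentioned above)
codebook rep78, mvalues
```

describe gives a compact overview, while codebook goes variable by variable, so the two commands complement each other when you first meet a dataset.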

2. Searching help

Search is designed to help you find information about statistics, graphics, data management, and programming features in Stata, either as part of the official release or as community-contributed features. When entering topics for the search, use appropriate terms from statistics, etc. For example, you could enter Mann-Whitney. Multiple topic words are allowed, for example, regression residuals.

When you are using Search, use proper English and proper statistical terminology. If you already know the name of the Stata command and want to go directly to its help file, select Help > Stata command… and type the command name. You can also type the command name in the Search field at the top of the Viewer and press Enter.

Help distinguishes between topics and Stata commands because some names of Stata commands are also general topic names. For example, logistic is a Stata command. If you choose Stata command… and type logistic, you will go right to the help file for the command. But if you choose Search… and type logistic, you will get search results listing the many Stata commands that relate to logistic regression.

Remember that you can search for help from within a Viewer window by typing a command in the command box of the Viewer or by clicking the magnifying glass button to the right of the search box, selecting the scope of your search, typing the search criteria in the search box, and pressing Enter.

3. Help and search commands

As you might expect, the help system is accessible from the Command window. This feature is especially convenient when you need help on a particular Stata command. Here is a short listing of the various commands you can use:

  • Typing help commandname is equivalent to selecting Help > Stata command… and typing commandname. The help file for the command appears in a new Viewer window.
  • Typing search topic in the Command window produces the same output as selecting Help > Search…, choosing Search all, and typing topic. The output appears in a new Viewer window.
  • Typing search topic, local in the Command window produces the same output as selecting Help > Search…, choosing Search documentation and FAQs, and typing topic. The output appears in the Results window instead of a Viewer.
  • Typing search topic, net in the Command window produces the same output as selecting Help > Search…, choosing Search net resources, and typing topic. The output appears in the Results window instead of a Viewer.
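Applied to the “dataset contents” example from earlier, the four variants look like this. It is a sketch for trying the commands side by side; the net search needs an Internet connection.

```stata
* Go straight to the help file of a command whose name you know
help codebook

* Search everything: documentation, FAQs, and net resources
search dataset contents

* Search only documentation and FAQs
search dataset contents, local

* Search only net resources (community-contributed material)
search dataset contents, net
```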

See [U] 4 Stata’s help and search facilities and [U] 4.8 search: All the details in the User’s Guide for more information about these command-language versions of the help system. The search command, in particular, has a few capabilities (such as author searches) that we have not demonstrated here.

4. The Stata reference manuals and User’s Guide

All the Stata reference manuals come as PDF files and are included with the software. The manuals themselves have many cross-references in the form of clickable links, so you can easily read the documentation in a nonlinear way.

Many of the links in the help files point to the PDF manuals that came with Stata. It is worth clicking on these links to read the extensive information found in the manuals. The Stata help system, though extensive, contains only a fraction of the information found in the manuals.

Most Stata reference manuals are arranged alphabetically. Each Getting Started with Stata manual has its own index. A combined index for all other manuals can be found in the Stata Index. This combined index is a good place to start when you are looking for information about a command.

Entries have names like collapse, egen, and summarize, which are generally themselves Stata commands.

Notations such as [R] ci, [R] regress, and [R] ttest in the Search results and help files are references to the Base Reference Manual. You may also see things like [P] PyStata integration, which is a reference to the Programming Reference Manual, and [U] 12 Data, which is a reference to the User’s Guide. For a complete list of manuals and their shorthand notations, see Cross-referencing the documentation, which immediately follows the table of contents in this manual.

For advice on how to use the reference manuals, see [GSW] 18 Learning more about Stata, or see [U] 1.2 The User’s Guide and the Reference manuals.

5. Stata videos

The Stata YouTube channel is an excellent resource for learning about Stata. The brief videos demonstrate many topics using Stata’s graphical user interface. They cover basic topics, such as data management, graphics, summary statistics, and hypothesis testing, and advanced topics, such as multilevel models and structural equation models.

There are also several playlists that provide a series of videos about a topic in sequence. For example, the “Power and sample size calculations” playlist includes videos about how to calculate power, sample size, and effect size for two independent proportions and for paired samples. The “Survival analysis” playlist takes you through the process of setting your data up for survival analysis, conducting basic descriptive analysis of survival data, graphing survival data, and calculating survivor functions and life tables. The “Time series” playlist takes you through the process of setting your data up for time-series analysis, creating time-series graphs, using time-series operators in estimation, and fitting ARMA and ARIMA models. There is even a “Back-to-school video” playlist for students who are using Stata for the first time or want a refresher after summer break.

See https://www.stata.com/links/video-tutorials/ for an up-to-date list of videos organized by topic. The playlists can be accessed directly at https://www.youtube.com/user/statacorp/.

6. The Stata Journal

When searching in Stata, you will often see links to the Stata Journal.

The Stata Journal is a printed and electronic journal, published quarterly, containing articles about statistics, data analysis, teaching methods, and effective use of Stata’s language. The Journal publishes peer-reviewed papers together with shorter notes and comments, regular columns, tips, book reviews, and other material of interest to researchers applying statistics in a variety of disciplines. The Journal is a publication for all Stata users, both novice and experienced, with different levels of expertise in statistics, research design, data management, graphics, reporting of results, and Stata, in particular. See https://www.stata-journal.com for more information.

Associated with each issue of the Stata Journal are the programs and datasets described therein. These programs and datasets are made available for download and installation over the Internet, not only to subscribers but also to all Stata users. See [R] net and [R] sj for more information.

Because the Stata Journal has had several articles about measures of inequality, if you select Help > Search…, choose Search documentation and FAQs, type inequality, and scroll down a bit, you will see some of these references.

In a search entry that begins with SJ-18-3 and st0539, SJ-18-3 refers to volume 18, number 3 of the Stata Journal. st0539 refers to the package name; st indicates that this package is in the “statistics” category of the Stata Journal. Listed next is the title of the software package and the authors. The community-contributed commands found within this package are listed next in parentheses, followed by the reference details of the article. Clicking on an SJ link, such as SJ 18(3):692–715, will open a browser and take you to the Stata Journal website, where you can view the abstract of the article and purchase the article. The search listing concludes with a brief description of the community-contributed package.

The Stata Journal website allows all articles older than three years to be downloaded for free. See Downloading community-contributed commands in [GSW] 19 Updating and extending Stata—Internet functionality for more details on how to install community-contributed software. Also see [R] ssc for information on a convenient interface to resources available from the Statistical Software Components (SSC) Archive.

We recommend that all users subscribe to the Stata Journal. See [U] 3.4 The Stata Journal for more information.

Links to other sites where you can freely download programs and datasets for Stata can be found on the Stata website; see https://www.stata.com/links/.
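The Stata Journal and SSC resources described in this section can also be reached from the Command window. The lines below are a sketch that assumes an Internet connection; ice is used only as an example of a package name held at SSC.

```stata
* Search documentation, FAQs, and Stata Journal entries for a topic
search inequality, local

* List the software associated with Stata Journal volume 18, number 3
net sj 18-3

* Read the description of a package at the SSC Archive before installing it
ssc describe ice
```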

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Finding and Installing New Commands or User-Written Programs in Stata

Stata has a system for disseminating user-written programs, which lets the software grow continuously. A tremendous number of user-written programs are available and, once installed, they act just like official Stata commands. This article shows you how to find, install, and update new commands or user-written programs in Stata.

Finding User-Written Programs

If you know the name of the program you want to use, you can go directly to Installing User-Written Programs. However, it’s much more common to know what you want to do without knowing what program (if any) can do it. This is a job for Stata’s findit command.

For example, suppose you want to diagnose collinearity among some variables and you suspect that a command called collin can do it. Type:

findit collin

The result is a tremendous amount of information. You can click on each package to view a very brief description, which helps you select a suitable package (here, for example, package st0004_2).

Installing User-Written Programs

Once you have identified the package, you can install it by clicking on its “click here to install” link, for example, for package st0004_2 above, which provides the collin command.

Now, the collin command is ready to use.

To use it, type collin followed by the variables whose collinearity you want to diagnose.

If you know the name of the package you want to install, for example the package ice, you can install it by typing:

ssc install ice

The package ice, with its commands, is now ready to use.

Updating User-Written Programs

While few user-written programs are updated as frequently as ice, it’s still important to get the latest versions of any user-written programs you install. The easiest way to update all of your user-written programs at once is to type:

adoupdate, update

All of your user-written programs are now at their latest versions.
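Put together, the find, install, and update steps might look like the following do-file fragment. It is a sketch that assumes an Internet connection and uses ice only because it is the package named above.

```stata
* Search for commands related to collinearity diagnostics
findit collin

* Install a package by name from the SSC Archive
* (replace reinstalls it if an older copy is already present)
ssc install ice, replace

* Confirm that Stata can now find the installed command
which ice

* List the community-contributed packages installed on this computer
ado dir

* Check all installed packages and update those that are out of date
adoupdate, update
```

Recent Stata releases also document the last step as ado update, update; typing help ado update on your installation will show whether that form is available.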

Opening and saving Stata datasets

Opening and saving datasets in Stata works much as it does in other computer applications, with a few differences. First, it is possible to save and open files from within Stata’s Command window. Second, Stata allows just one dataset to be active at any one time: although multiple datasets can be held in memory at once, only one of them is active. Keeping this in mind explains why Stata is so careful when opening new datasets. This post outlines the ways to open and save datasets.

A Stata dataset can be opened in a variety of ways, most of which are probably familiar to you from other applications:

  • Double-click on a Stata data file, which is a file whose extension is .dta. Note: The file extension may not be visible, depending on what options you have set in your operating system.
  • Select File > Open… or click on the Open button and navigate to the file.
  • Select File > Open data subset…, navigate to the file, specify the observation range, and select variables from the dataset.
  • Select File > Recent files > filename. 
  • Type use filename in the Command window. Stata will look for filename in the current working directory. If the file is located elsewhere, you will need to give its path. Be aware that if there is a space anywhere in the path or filename, you will need to put the filename inside quotation marks.
  • Type sysuse filename in the Command window. Stata will look for filename in a series of directories called the adopath. Typically, this is for finding example datasets installed when you installed Stata, but it can also be used for easy access to your own datasets. For more information on the adopath, see [P] sysdir.
  • Type webuse filename in the Command window. The webuse command is used to access datasets used in the Stata manuals; for example, webuse lbw loads the lbw dataset used in the documentation of the logistic command. For more information, see [D] webuse.
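The command-line routes in the list above can be tried as follows. This is a sketch: the filename "my auto copy.dta" is a made-up example chosen because it contains spaces, and webuse needs an Internet connection.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Save a copy in the current working directory; the name contains
* spaces, so it must be enclosed in quotation marks
save "my auto copy.dta", replace

* Load a dataset used in the Stata manuals (requires Internet access)
webuse lbw, clear

* Open the saved copy again by name
use "my auto copy.dta", clear

* Open only a subset: three variables and the first 10 observations
use make price mpg in 1/10 using "my auto copy.dta", clear
```

The last command is the typed counterpart of File > Open data subset…; it keeps memory use down when only part of a large file is needed.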

Opening a dataset in the current frame will replace the dataset, if any, that is currently in memory for that frame. Datasets in other frames are unaffected. If there have been changes to the data in the dataset in the current frame, Stata will refuse to discard the dataset unless you force it to do so. If you open the file with any method other than the Command window, you will be prompted. If you use the Command window and the current data have changed, you will get the following error message:

. sysuse auto

no; dataset in memory has changed since last saved

r(4);

These behaviors protect you from mistakenly losing data.
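The protection can be seen in action with a deliberately unsaved change. In this sketch, capture is used so that the refused command does not stop a do-file; the edit to mpg is arbitrary.

```stata
sysuse auto, clear

* Change one value so that the data in memory differ from the file
replace mpg = 30 in 1

* Stata refuses to load over unsaved changes; capture records the
* return code in _rc instead of halting with an error
capture sysuse auto
display _rc

* To discard the changes on purpose, say so with the clear option
sysuse auto, clear
```

The value displayed for _rc corresponds to the r(4) shown in the error message above.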

To save an unnamed dataset (or an old dataset under a new name):

  1. select File > Save as…; or
  2. type save filename in the Command window.

To save a dataset for use with Stata 13,

  1. select File > Save as…, and select Stata 13 Data (*.dta) from the Save as type list; or
  2. type saveold filename in the Command window.

To save a dataset that has been changed (overwriting the original data file),

  1. select File > Save;
  2. click on the Save button; or
  3. type save, replace in the Command window.

Once you overwrite a dataset, there is no way to recover your original dataset. With important datasets, you may want to either keep a backup copy of your original filename.dta or save your changes to a dataset under a new name. This is no different from working with a word-processing document, except that recovering from an inadvertent save of a dataset is nearly impossible.

Important note: Changes you have made to a dataset are not permanent until you save them. You work with a copy of the dataset in memory, not with the data file itself. This should not be surprising, because it is the way that you work with almost all applications on your computer.

If you do not want to save your dataset, you can clear the dataset in memory and open a new dataset by typing use filename, clear.
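The saving commands above fit together as in the sketch below. The filenames auto_modified and auto_modified_old are hypothetical, and the version() option of saveold is assumed to be available (Stata 14 or later).

```stata
sysuse auto, clear
generate price_k = price / 1000

* Save under a new name; replace allows the command to be rerun
save auto_modified, replace

* Make a further change, then overwrite the file just saved
replace price_k = round(price_k, 0.1)
save, replace

* Save a copy that older Stata releases can read
saveold auto_modified_old, version(13) replace

* Discard whatever is in memory and reload the saved file
use auto_modified, clear
```

Saving under a new name first means the original auto dataset is never overwritten, which is the backup habit recommended above.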

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Using the Data Editor in Stata

1. The Data Editor

The Data Editor gives a spreadsheet-like view of data that are currently in memory. You can use it to enter new data, edit existing data, search through the dataset, and edit attributes of the data in the dataset, such as variable names, labels, and display formats, as well as value labels.

In addition to the view of the data, there are two windows for manipulating variables and their properties: the Variables window and the Properties window. These are similar to the same-named windows in the main Stata window.

Any action you take in the Data Editor results in a command being issued to Stata as though you had typed it into the Command window. This means that you can keep good records and learn commands by using the Data Editor.

The Data Editor can be kept open while you work in Stata, giving you a live view of your dataset as you work. To protect your data from inadvertent changes, the Data Editor has two modes: edit mode for active editing and browse mode for viewing. In browse mode, editing within the Data Editor window is disabled. We highly recommend that you use the Data Editor in browse mode and switch to edit mode only when you want to make changes.
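Both modes can also be started from the Command window, optionally restricted to some variables or observations. A small sketch using the auto dataset:

```stata
sysuse auto, clear

* Browse mode: look at a subset with no risk of changing it
browse make price mpg if foreign == 1

* Edit mode: the same kind of view with editing enabled
edit make price mpg in 1/5
```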

You can print your data from the Data Editor by selecting Print from the File menu.

We will be entering and editing data in this chapter, as well as manipulating the variables by using the Variables and Properties windows, so start the Data Editor in edit mode by clicking on the Data Editor (Edit) button.

2. Buttons on the Data Editor

The toolbar for the Data Editor has some standard buttons and some buttons we have not yet seen:

Edit mode: Changes the Data Editor to edit mode.

Browse mode: Changes the Data Editor to browse mode for safely looking at data.

Open: Opens a Stata dataset. Stata will warn you if your current dataset has unsaved changes.

Save: Saves the dataset visible in the Data Editor.

Print: Prints the dataset visible in the Data Editor.

Copy: Copies the current selection to the Clipboard.

Paste: Pastes the contents of the Clipboard. You may paste only if one cell is selected; this cell will become the upper-left corner of the pasted contents. Warning: This action will paste over existing data.

Find: Opens the Find bar for searching in the Data Editor.

Filter observations: Filters the observations visible in the Data Editor. This button is useful for looking at a subset of the current dataset.

You can move about in the Data Editor by using the typical methods:

  • To move to the right, use the Tab key or the right arrow key.
  • To move to the left, use Shift+Tab or the left arrow key.
  • To move down, use Enter or the down arrow key.
  • To move up, use Shift+Enter or the up arrow key.

You can also click within a cell to select it.

Right-clicking within the Data Editor brings up a contextual menu that allows you to manipulate the data and what you are viewing. From this menu you can do many common tasks:

  • Copy to copy data to the Clipboard.
  • Paste to paste data from the Clipboard.
  • Paste special… to paste data from the Clipboard with finer control of delimiters, giving a preview of what will be pasted.
  • Select all to select all the data displayed in the Data Editor. This could be different from the data in the dataset if the data are filtered or some variables are hidden.
  • Data to open a submenu containing
    • Insert variable… to bring up a dialog for creating a new variable at the current cursor position.
    • Add variable… to bring up a dialog for creating a new variable at the beginning or end of a dataset.
    • Replace contents of variable… to bring up a dialog for replacing the values of the selected variable.
    • Insert observations… to bring up a dialog for inserting new empty observations at the current cursor position.
    • Add observations… to bring up a dialog for adding new empty observations to the end of the dataset.
    • Sort data… to sort the dataset by the selected variable.
    • Value labels to access a submenu for managing and displaying value labels.
    • Keep only selected data to keep only the selected data in the dataset. All remaining data will be dropped (removed) from the dataset. As always, this affects only the data in memory. It will not affect any data on disk.
    • Drop selected data to drop the selected data. This is only possible if the selection consists of either entire variables (columns) or observations (rows).
    • Convert variables from string to numeric… for converting string variables to numeric variables, which is useful when the string variables contain characters for formatting numbers instead of just numbers.
    • Convert variables from numeric to string… for converting numeric variables to strings.
    • Encode string variable to labeled numeric… for encoding a string-valued categorical variable to a numeric variable while still displaying the categories in tables and graphs.
    • Decode labeled numeric variable to string… for turning an encoded variable back into a string variable.
  • Reset selected column widths to reset the selected columns to the default width.
  • Hide selected variables to hide the selected variables.
  • Show only selected variables to hide all but the selected variables.
  • Show entire dataset to turn off all filters and unhide all variables.
  • Preferences… to set the preferences for the Data Editor.
  • Font… to change the font of the Data Editor.
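
Several of the Data submenu items above have command counterparts that can be typed in the Command window or kept in a do-file. The sketch below uses the auto.dta example dataset that ships with Stata. The new variable names (price_str, price_num, make_code, foreign_str) are made up for illustration, and the pairing of each menu item with a command is an assumption rather than something stated in the text.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Numeric to string, and back to numeric
tostring price, generate(price_str)
destring price_str, generate(price_num)

* Encode a string variable as a labeled numeric variable
encode make, generate(make_code)

* Decode a labeled numeric variable back into a string
decode foreign, generate(foreign_str)

* Sort by a variable, then keep only some of the variables
sort mpg
keep make price price_num make_code foreign foreign_str mpg
```

As with the menu items, these commands change only the data in memory, not any file on disk.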

3. Data entry

Entering data into the Data Editor is similar to entering data into a spreadsheet. One major difference is that the Data Editor has the concept of observations, which makes the data entry smart. We will illustrate this with an example. It will be useful for you to follow the example at your computer. To work along, you will need to start with an empty dataset, so save your dataset if necessary, and then type clear in the Command window.

Note: As a check to see if your data have changed, type describe, short (or d,s for short). Stata will tell you if your data have changed.

Suppose that we have the following data, and we want to enter them into Stata:

We do not know MPG for the third car or the make of the sixth.

Start by opening the Data Editor in edit mode. You can do this either by clicking on the Data Editor (Edit) button or by typing edit in the Command window. You should be greeted by a Data Editor with no data displayed. (If you see data, type clear in the Command window.) Stata shows the active cell by highlighting it and displaying varname[obsnum] next to the input box in the Cursor Location box. We will see below that we can navigate within a dataset by using this cell reference. The Data Editor starts, by default, in the first row of the first column. Because there are no data, there are no variable names, and so Stata shows var1[1] as the active cell.

We can enter these data either by working across the rows (observation by observation) or by working down the columns (variable by variable). To enter the data observation by observation, press Tab after entering each value until you have reached the end of the first row. In our case, we would type VW Rabbit, press Tab, type 4697, press Tab, and continue entering data to complete the first observation.

After you are finished with the first observation, select the second cell in the first column, either by clicking within it or by navigating to it. At this point, your screen should look like this:

We can now enter the data for the second observation in the same fashion as the first—with one nice difference: after we enter the last value in the row, pressing the Tab key will bring us to the first cell in the third row. This is possible because the number of variables is known after the first observation has been entered, so Stata knows when it has all the data for an observation.

We can enter the rest of the data by pressing the Tab key between entries, simply skipping over missing values by tabbing through them.

If we had wanted to enter the data variable by variable, we could have done that by pressing Enter between each make of car until all seven observations were entered, skipping past the missing entry by pressing Enter twice. Once the first variable was entered, we would select the first cell in the second column and enter the price data. We would continue this until we were finished.
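
For comparison, a small table can also be typed in from a do-file with the input command. This is only a sketch: the first row uses the make and price quoted above, while the remaining rows are made-up placeholders that show how a missing number (.) and a missing string ("") are entered.

```stata
clear

* str18 declares make as a string variable; price is numeric by default
input str18 make price
"VW Rabbit"     4697
"Placeholder A" .
""              5000
end

list
```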

4. Notes on data entry

There are several things to note about data entry and the feedback you get from the Data Editor as you enter data:

  • Stata does not allow blank columns or rows in the middle of your dataset.

Whenever you enter new variables or observations, always begin in the first empty column or row. If you skip columns or rows, Stata will fill in the intervening columns or rows with missing values.

  • Strings and value labels are color coded.

To help distinguish between the different types of variables in the Data Editor, string values, value labels (see [GSW] 9 Labeling data), and all other values are displayed in different colors. You can change the colors for strings and value labels by right-clicking on the Data Editor window and selecting Preferences….

  • A period (.) represents Stata’s system missing numeric value.

Stata has a system missing value, ‘.’, and extended missing values ‘.a’ through ‘.z’. By default, Stata uses its system missing value.

  • The Tab key is smart.

As we saw above, after the first observation has been entered, Stata knows how many variables you have. So at the end of the second observation (and all subsequent observations), Tab will automatically take you back to the first column.

  • The Cursor Location box both shows location and is used for navigation.

The Cursor Location box gives the location of the current cell. If you see, for example, var3[4], this means that the current cell is the fourth observation of the variable named var3. You can navigate to a particular cell by typing the variable name and the observation in the Cursor Location box. If you wanted the second observation of var1 to be the active cell, typing var1 2 in the Cursor Location box and pressing Enter would take you there.

  • Double quotes around text are unnecessary in string variables.

Once Stata knows that a variable is a string variable (it holds text), there is no need to put quotes around the values, even if the values look like a number. Thus, if you wanted to enter ZIP codes as text, you would enter the first ZIP code with quotes (“02173”), but the rest would not need any quotes.

  • The arrow keys are context sensitive.

If you select a cell and type new data, using an arrow key will accept the change and move to a new active cell. If you double-click on a cell, you can edit within the cell contents. In this case, the right- and left-arrow keys move within the cell’s data.

  • You can throw away changes to a cell.

If, while you are entering data in a cell, you decide you would like to cancel the changes, press the Esc key or click outside the cell.
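
The note above on missing values can be checked from the Command window. The following is a minimal sketch on a four-observation toy dataset. The variable name score is hypothetical.

```stata
clear
set obs 4

* Values 1, 2, 3, 4
generate score = _n

* Observation 3 gets the system missing value, observation 4 an extended one
replace score = .  in 3
replace score = .a in 4

* Both kinds of missing value are counted by missing()
count if missing(score)

* Missing values sort after all nonmissing numbers, and . sorts before .a
sort score
list
```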

5. Renaming and formatting variables

The data have now been entered into Stata, but the variable names leave something to be desired: they have the default names var1, var2, …, var5. We would like to rename the variables so that they match the column titles from our dataset. We would also like to give the variables descriptions and change their formatting.

We will step through changing the name, label, and format of the price variable. We will then add a note to the variable. Start by clicking on the var2 variable in the Variables window. The few properties associated with var2 are now visible and editable in the Properties window. We may now systematically change the properties of var2 to our choosing:

  1. Double-click on var2 in the Name field to select the old variable name, and type price to overwrite the name.
  2. Click under the new price name in the Label field.
  3. Enter a worthwhile label, such as Price in dollars.
  4. Click on %9.0g in the Format field.
  5. Click on the ellipsis (…) button that appears. The Create format dialog opens.
  6. You can see here that there are many possible formats, most of which are related to time. We want commas in our numbers, so check the Use commas in numeric output checkbox. When you are done, click on the OK button.
  7. Click in the Notes field.
  8. Click on the ellipsis (…) button that appears. A dialog called Notes for price opens.
  9. Click on the Add button and type a clever note.
  10. When you are done typing, click on the Submit button, and then click on the Close button. This note is now attached to the price variable.
  11. Click on the disclosure control to see the note you just typed in the Properties window.

To edit the properties of another variable, click on the variable in the Variables window. We can name the first variable make; the third, mpg; the fourth, weight; and the fifth, gear_ratio. Just before you rename var5 to gear_ratio, your screen should look like this:

You need to know some rules for variable names:

  • Stata is case sensitive.

Make, make, and MAKE are all different names to Stata. If you had named your variables Make, Price, MPG, etc., then you would have to type them correctly capitalized in the future. Using all lowercase letters is easier.

  • A variable name must be 1-32 characters long.
  • The characters can be letters (A-Z, a-z), digits (0-9), underscores (_), or Unicode characters that are not symbols.
  • Spaces or other characters are not allowed.
  • The first character of a variable name must be a letter, an underscore, or a Unicode character. Although you can use an underscore to begin a variable name, it is highly discouraged. Such names are used for temporary variable names in Stata. You would like your data to be permanent, so using a temporary name could lead to great frustration.

For more information about variable names and value labels, see [GSW] 9 Labeling data; for display formats, see [U] 12.5 Formats: Controlling how data are displayed.
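
The same renaming, labeling, formatting, and note-taking can be done with commands. The sketch below builds a tiny stand-in dataset so that it runs on its own. The values of var2 and the note text are placeholders, not the data from the example.

```stata
clear
set obs 3

* Stand-in for the second column entered in the Data Editor
generate var2 = 1000 * _n

* Rename, label, and format the variable (the c suffix adds commas)
rename var2 price
label variable price "Price in dollars"
format price %9.0gc

* Attach a note to the variable and display it
notes price: placeholder note
describe price
notes list price
```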

6. Copying and pasting data

You can copy and paste data by using the Data Editor. This is often a simple way to bring data into Stata from any other applications such as spreadsheets or databases.

  1. Select the data that you wish to copy by using one of these means:
    • Click once on a variable name or column heading to select an entire column.
    • Click once on an observation number or row heading to select the entire row.
    • Click and drag the mouse to select a range of cells.
  2. Copy the data to the Clipboard by right-clicking within the selected range, and select Copy.
  3. Paste the data from the Clipboard by right-clicking on the top left cell of the area to which you wish to paste, and select Paste.

We will illustrate copying and pasting an observation by making a copy of the first observation and pasting it at the end of the dataset.

Start by clicking on the observation number of the first observation. Doing so highlights all the data in the row. Right-click on the same location (there is no need to move the mouse), and select Copy:

Click on the first cell in the eighth row, right-click while you are still in that cell, and choose Paste from the resulting menu. You can see that the observation was successfully duplicated.
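
A similar duplication can be scripted. The sketch below is a command-line analogue, not what the Data Editor itself records: it uses expand on auto.dta (74 observations) to append one copy of the first observation to the end of the dataset.

```stata
sysuse auto, clear

* Create one extra copy of observation 1; the copy is added at the end
expand 2 in 1

* Compare the original row with the new last row
list make price mpg in 1
list make price mpg in 75
```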

7. Notes on copying and pasting

  • The above example illustrated copying and pasting within the Data Editor. You can use roughly the same technique to copy and paste between other applications and Stata and between Stata and other applications. The easiest way to see if copying and pasting works properly with another application is to try it. The one requirement for things to work well is that the external application must copy tables in some delimited form, as do spreadsheet applications, many database applications, and some word processors. Using Edit > Paste special… gives some added flexibility to the formats you can paste into the Data Editor. If a simple paste does not give you what you expected, you should try Edit > Paste special…. For more information on file-based methods for importing data into Stata, see [GSW] 8 Importing data.
  • If you are copying and pasting data with value labels, you have a choice. You can copy variables with value labels as text, using the value labels as the actual values, or you can copy said variables as their underlying encoded numbers. Copying with the value labels is the default. If you would like the other choice, right-click in the Data Editor and select Data > Value labels > Hide all value labels, or select Tools > Value labels > Hide all value labels.

8. Changing data

As its name suggests, the Data Editor can be used to edit your dataset. As we have seen already, it can be used to edit the data themselves as well as the description and display options for the variables.

Here is an example for making some changes to the automobile dataset, which illustrates both methods for using the Data Editor and its documentation trail. We will also keep snapshots of the dataset as we are working so that we can revert to previous versions of the dataset in case we make a mistake.

We would like to investigate the dataset, work with value labels, delete the trunk variable, and make a new variable showing gas consumption per 100 miles. These tasks will illustrate the basics of working in the Data Editor.

Start by typing sysuse auto into the Command window. If you worked the previous example, you will get an error telling you that the dataset in memory has changed since it was last saved. This is good: Stata is keeping you from inadvertently throwing away the unsaved changes to your current dataset as it loads auto.dta. If you would like to save the dataset you have been working on, select File > Save and save the dataset in an appropriate location. Otherwise, type clear in the Command window, press Enter to clear out the data, and then load auto.dta.

Once auto.dta is loaded, start the Data Editor.

  1. We remember that our grandfather had a Toronado, which looked sleek but seemed to require a lot of fill-ups. We would like to see if this car is in the dataset. To find it, we select Edit > Find…, type Toronado, and press Enter. We see that this make of car got 16 miles per gallon.
  2. We would like to see which cars have the lowest and highest gas mileages. To do this, right-click on the column heading of the mpg variable. Select Data > Sort data… from the contextual menu. A dialog will pop up asking how you want to sort, defaulting to sorting in ascending order. Click on OK. (Stata worries about sort order because sort order can affect reproducibility when using resampling techniques. This is a good thing.) You will see that the data have now been sorted by mpg in ascending order. The lowest-mileage cars are at the top of the screen; by scrolling to the bottom of the dataset, you can find the highest-mileage cars. You also could have sorted by selecting Tools > Sort data… once the mpg variable was selected.
  3. We would like to investigate repair records and hence sort by the rep78 variable. (Do this now.) We see that the Starfire and Firebird both had poor repair records, but we would like to see the cars with good repair records. We could scroll to the bottom of the dataset, but it will be faster to use the Cursor Location box: type rep78 74 and press Enter to make rep78[74] the active cell. We notice that the last five entries for rep78 appear as dots. The dots mean that these values are missing. A few items of note:
    • As we can see from the result of the sort, Stata views missing values as being larger than all numeric nonmissing values. In technical terms, this means that rep78 >= . is equivalent to missing(rep78).
    • What we do not see here is that Stata has multiple missing-value indicators: . is Stata’s default or system missing-value indicator, and .a, .b, …, .z are Stata’s extended missing values. Extended missing values are useful for indicating the reason why a value is unknown.
    • The different missing values sort among themselves: . < .a < .b < … < .z. See [U] 12.2.1 Missing values for full details.
  4. We would like to make the repair records readable. Click on rep78 in the Variables window.
  5. Click on the Value label field in the Properties window, and then click on the ellipsis (…) button that appears. This opens the Manage value labels dialog. We need to define a new value label for the repair records.
    a. Click on the Create label button. You will see the Create label dialog.
    b. Type a name for the label, say, repairs, in the Label name field.
    c. Press the Tab key or click within the Value field.
    d. Type 1 for the value, press the Tab key, and type atrocious for the label.
    e. Press the Enter key to create the pairing.
    f. Repeat steps d and e to make all the pairings: 2 with “bad”, 3 with “OK”, 4 with “good”, and 5 with “stupendous”.
    g. Click on the OK button to finish creating the value label.
    h. Click on the disclosure control to show the label. You should see this:

If you have something else, you can edit the label by clicking on the Edit label button.

    i. Click on the Close button to close the Manage value labels dialog.

Now that the label has been created, attach it to the rep78 variable by clicking on the down arrow in the Value label field and selecting the repairs label. You can now see the labels displayed in place of the values.

  6. Suppose that we found the original source of the data in a time capsule, so we could replace some of the missing values for rep78. We could type the values into cells. We can also assign the values by right-clicking within a cell with a missing value and choosing a value from Data > Value labels > Assign value from value label “repairs”. This strategy can be useful when a value label has many possible values.
  7. We would now like to delete the trunk variable. We can do this by right-clicking on the trunk variable name at the top of the column and selecting the Data > Drop selected data menu item. Because this can lead to data loss, the Data Editor asks whether we would like to drop the selected variable. Click on the Yes button.
  8. To finish up, we would like to create a variable containing the gallons of gasoline per 100 miles driven for each of the cars.
    a. Right-click within any cell, and choose the Data > Add variable… menu item to bring up the generate dialog.
    b. Type gp100m in the Variable name field.
    c. Being sure that the Specify a value or an expression radio button is selected, type 100/mpg in its field. We could have clicked on the … button to open the Expression Builder dialog, but this formula was simple enough to type. (You might want to explore the Expression Builder right now to see what it can do.)
    d. Be sure that the Add at the end of dataset item is chosen from the Position of new variable drop-down menu.
    e. Click on OK. You can scroll to the right to see the newly created variable.
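
The editing session above can be approximated with a handful of commands. This is a hand-written sketch of plausible equivalents, not a transcript of what the Data Editor records. The Toronado lookup uses strpos() as a stand-in for Edit > Find….

```stata
sysuse auto, clear

* Step 1: look for the Toronado
list make mpg if strpos(make, "Toronado")

* Steps 2 and 3: sort by mileage, then by repair record
sort mpg
sort rep78

* Missing values sort last; count them
count if missing(rep78)

* Steps 4 and 5: define the value label and attach it to rep78
label define repairs 1 "atrocious" 2 "bad" 3 "OK" 4 "good" 5 "stupendous"
label values rep78 repairs

* Step 7: drop the trunk variable
drop trunk

* Step 8: gallons of gasoline per 100 miles
generate gp100m = 100/mpg
```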

Throughout this data-editing session, we have been using the Data Editor to manipulate the data. If you look in the Results window, you will see the commands and their output. You can also see all the commands generated by the Data Editor in the History window. If you wanted to save the editing commands to use again later, you could do the following steps:

  1. Click in the History window on the last command that came from the Data Editor.
  2. Scroll up until you find the sort mpg command you ran immediately after opening the Data Editor, and Shift-click on it.
  3. Right-click on one of the highlighted commands.
  4. Select Send selected to Do-file Editor.

This procedure will save all the commands you highlighted into the Do-file Editor. You could then save them as a do-file, which you could run again later. We will talk more about the Do-file Editor in [GSW] 13 Using the Do-file Editor—automating Stata. You can find help about do-files in [U] 16 Do-files.

If you want to save this dataset, save it under a new name by using File > Save as… in the main Stata window to prevent overwriting the original dataset.

9. Working with snapshots

The Data Editor allows you to save to disk snapshots of whatever dataset you are working on. These are temporary copies of the dataset—they will be deleted when you exit Stata, so they need to be treated as temporary. Still, there are many uses for snapshots, such as

  • saving a temporary copy of the data in memory so that another dataset can be opened and viewed;
  • saving stages of work, which can be recovered in case you do something disastrous; and
  • saving pieces of datasets while doing analyses.

We will keep using auto.dta from above; if you are starting here, you can start fresh by typing sysuse auto in the Command window to open the dataset. (If you get a warning about data in memory being lost, either use clear or save your data. See [GSW] 5 Opening and saving Stata datasets for more information.) If we open the Data Editor and click on the Snapshots tab beneath the Variables window, we see the following window. If you are starting afresh, you will see numbers rather than labels for rep78.

To begin with, only one button is active in the Snapshots toolbar. Click on the active button, the Add button. It brings up a dialog asking for a label, or name, for the snapshot. Give it an inventive name, such as Start, and press Enter. You can see that a snapshot is now listed in the Snapshots window, and all the buttons in the toolbar are now active. The following buttons appear in the Snapshots window:

Add: Save a new snapshot with a timestamp and label.

Remove: Erase a snapshot. This action deletes the temporary snapshot file but does not affect the data in memory.

Change label: Edit the label of the selected (highlighted) snapshot.

Restore: Replace the data in memory with the data from the selected snapshot. You will get a dialog asking you to confirm your action.

You should now try manipulating the dataset by using the tools we have seen. Once you have done that, create another snapshot, calling it Changed. Open the Snapshots window and restore the Start snapshot by either double-clicking it or clicking first on it and then on the Restore button to see where you started. You can then go back to where you were working by restoring your Changed snapshot.

Snapshots continue to be available either until they are deleted or until you exit Stata. You can thus use snapshots of one dataset while working on another. You will find your own uses for snapshots—just take care to save the datasets you want for future use because the snapshots are temporary.
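
Snapshots can also be managed with the snapshot command. The sketch below mirrors the Start/Changed exercise. Dropping trunk is just a placeholder for whatever edits you make between the two snapshots, and the local macro names are arbitrary.

```stata
sysuse auto, clear

* Save the starting point and remember its snapshot number
snapshot save, label("Start")
local start = r(snapshot)

* Make some change, then save a second snapshot
drop trunk
snapshot save, label("Changed")
local changed = r(snapshot)

snapshot list

* Go back to the start: trunk is present again
snapshot restore `start'
confirm variable trunk

* Return to the changed data, then erase both snapshots
snapshot restore `changed'
snapshot erase `start' `changed'
```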

10. Dates and the Data Editor

The Data Editor has two special tools for working with dates in Stata. To see these in action, we will need to open another dataset. Either save your dataset or clear it out, and then type sysuse sp500 in the Command window. Look in the Data Editor to see what you have.

You can see a date variable that has January 2, 2001, as its first day, though it is being displayed in Stata’s default format for dates.

We will start with formatting:

  1. Select the date variable in the Variables window to the right of the data table.
  2. In the Properties window, select the Format row and click on the ellipsis button that appears.
  3. The Create format dialog tells us three pieces of information about the date format:
    • These are daily dates. As you can see, Stata understands other types of dates that are often used in financial data.
    • Looking at the bottom of the dialog, you can see that Stata’s default date format is %td. This means that the variable contains time values that are to be interpreted as daily dates.
    • This default format is displayed as, for example, 07apr2021.
  4. There are many premade date formats in the Samples pane at the top right of the Create format dialog. Click on April 7, 2021. You can see how the format would be specified at the bottom of the dialog.
  5. Click on OK to close the Create format dialog. You can see that the dates are now displayed differently.

This is a very simple way to change date formats. For complete information on dates and date formats, see [D] Datetime.
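
The same change can be made with the format command. In this sketch the format string %tdMonth_dd,_CCYY is my own spelling of a "Month day, year" display and may differ from the string the dialog shows.

```stata
sysuse sp500, clear

* Default daily-date display, for example 02jan2001
list date in 1/3

* Display the same values as, for example, January 2, 2001
format date %tdMonth_dd,_CCYY
list date in 1/3
```

Only the display changes. The stored values are still the same numeric daily dates.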

We will now change some of the dates to illustrate how this can be done simply, regardless of the format in which the dates are displayed. If you look in the upper-right corner of the Data Editor, you will see the Time/Date input mask field, which shows DMY. This field affects how dates are entered when editing data.

By default, the input mask is set to DMY. This means dates can be entered in many different fashions, as long as the order of the date components is day, month, year. Try the following:

  1. Click in the first observation of date so that the Cursor Location shows date[1].
  2. Type 18jan2021 and press the Enter key. Stata understands the DMY input mask and knows enough to enter the new date in the selected cell.
  3. Enter 30042021 and press Enter. Stata still understands the input mask, even though there are no separators.
  4. Click within the Time/Date input mask field, and choose MDY from the drop-down menu.
  5. Click on any observation in the date column.
  6. Type March 15, 2021 and press Enter. Stata will still understand.

Working in this fashion is the fastest way to edit dates by hand. If you look in the Results window, you will see why.
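
Each edit of this kind amounts to replacing a stored numeric date. A hand-written equivalent, offered as an assumption about the general form rather than a copy of the Results window, could use td() or date() so that you do not have to work out the underlying number yourself:

```stata
sysuse sp500, clear

* A date literal via the td() pseudofunction
replace date = td(18jan2021) in 1

* Parse a string using an explicit MDY mask
replace date = date("March 15, 2021", "MDY") in 2

list date in 1/2
```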

We are now finished with this dataset, so type clear and press Enter.

11. Data Editor advice

As you could see above, a small mistake in the Data Editor could cause large problems in your dataset. You really must take care in how you edit your data.

  • People who care about data integrity know that editors are dangerous—it is easy to accidentally make changes. Never use the Data Editor in edit mode when you just want to look at your data. Use the Data Editor in browse mode (or use the browse command).
  • If you must edit your data, protect yourself by limiting the dataset’s exposure. For example, if you need to change rep78 only if it is missing, find a way to look at just the missing values for rep78 and any other variables needed to make the change. This will make it impossible for you to change (damage) variables or observations other than those you view. We will explore this aspect shortly.
  • Even with these caveats, Stata’s Data Editor is safer than most because it records commands in the Results window. Use this feature to log your output and make a permanent record of the changes. Then you can verify that the changes you made are the changes you wanted to make. See [GSW] 16 Saving and printing results by using logs for information on creating log files.
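
A minimal sketch of the logging advice above. The log file name data_edits is a placeholder, and the count command stands in for whatever editing commands you would actually run.

```stata
* Close any log that is already open, then start a plain-text log
capture log close
log using data_edits, text replace

sysuse auto, clear
count if missing(rep78)

* Stop logging; the record is kept in data_edits.log
log close
```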

12. Filtering and hiding

We would now like to investigate restricting our view of the data we see in the editor. This feature is useful for the reasons mentioned above, and as we will see, it helps if we would like to browse through the data of a large dataset. In any case, we would like to focus on some data, not all the data, whether we focus on some of the variables, some of the observations, or even just some observations within some variables. We would also like to change the order of the variables. We will show you how this is done by using both the graphical interface and commands.

Open the automobile dataset by typing sysuse auto. If you get an error message, type clear and try again. Once you have done that, open the Data Editor.

Suppose that we would like to edit only those observations for which rep78 is missing. We will need to look at the make of the car so that we know which observations we are working with, but we do not need to see any other variables. We will work as though we had a very large dataset to work with.

  1. Before we get started, try experimenting with the Variables window.
    • Drag variables up and down the list. Doing so changes the order of the variables’ columns in the Data Editor. It does not change their order in the dataset itself.
    • Uncheck some of the checkboxes in the first column to hide some of the variables.
    • Type a search criterion in the Filter variables here field. Just like in the Variables window in the main Stata window, the default is to ignore case and find any variables or variable labels containing any of the words in the filter. Clicking on the wrench on the left will allow you to change this behavior as well as to add or remove additional columns containing information about the variables. The filtering of variables in the list affects what is displayed in the Variables window; it does not affect what variables’ data are displayed. When you are done, delete your filter text.
  2. Right-click on any variable in the Variables window, and select Select all from the contextual menu.
  3. Click on any checkbox to deselect all the variables.
  4. Click on the make variable to select it, and deselect all the other variables.
  5. Click on the checkbox for make.
  6. Click on the checkbox for rep78.

If you look in the Command window, you can see that no commands have been issued, because hiding the variables does not affect the dataset—it affects only what shows in the Data Editor.

We now have protected ourselves by using only those variables that we need. We should now reduce our view to only those observations for which rep78 is missing. This is simple.

  1. Click on the Filter observations button in the Data Editor’s toolbar.
  2. Enter missing(rep78) in the Filter by expression field.
  3. Click on the Apply filter button.
  4. If you are curious, click on the ellipsis button. It opens up an Expression Builder dialog. This lists the wide variety of functions available in Stata. See the Stata Functions Reference Manual.

Now we are focused on the part of the dataset in which we would like to work, and we cannot destroy or mistakenly alter other data by stray keystrokes in the Data Editor window.

It is worth learning how to hide variables and filter observations in the Data Editor from the Command window. This can be quite convenient if you are going to restrict your view, as we did above. To work from the Command window, we must use the edit command together with a varlist (variable list) along with if and in qualifiers in the Command window. By using a varlist, we restrict the variables we look at, whereas the if and in qualifiers restrict the observations we see. ([GSW] 10 Listing data and basic command syntax contains many examples of using a command with a variable list and if and in.) Suppose we want to correct the missing values for rep78. The minimum amount of data we need to expose are make and rep78. To see this minimal amount of information and hence to minimize our exposure to making mistakes, we enter the commands
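
The commands themselves did not survive in this copy. A likely form, reconstructed from the description above rather than quoted from the source, is:

```stata
sysuse auto, clear

* Open the Data Editor showing only make and rep78,
* and only the observations where rep78 is missing
edit make rep78 if missing(rep78)
```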

and we would see the following window:

Once again, we are safe and sound.

Keep this lesson in mind if you edit your data. It is a lesson well learned.

13. Browse mode

The purpose of using the Data Editor in browse mode is to look at data without altering them by stray keystrokes. You can start the Data Editor in browse mode by clicking on the Data Editor (Browse) button or by typing browse in the Command window. When you work in browse mode, all contextual menu items that would let you alter the data, the labels, or any of the display formats for the variables are disabled. You may view a variable’s properties with the Properties menu item, but you may not make any changes. You still can filter observations and hide variables to get a restricted view because these actions do not change the dataset.

Note: Because you can still use Stata menus not related to the Data Editor and because you can still type commands in the Command window, it is possible to change the data even if the Data Editor is in browse mode. In fact, this means you can watch how your commands affect the dataset. You are merely restricted from using the Data Editor itself to change the data.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Using the Variables Manager in Stata

1. The Variables Manager

This chapter discusses Stata’s Variables Manager. To get started, open the automobile dataset by typing sysuse auto, clear in the Command window. You open the Variables Manager by selecting Data > Variables Manager or clicking on the Variables Manager button.

The Variables Manager is a tool for managing properties of variables both individually and in groups. It can be used to create variable and value labels, rename variables, change display formats, and manage notes. It has the ability to filter and group variables as well as to create variable lists. Users will find these features useful for managing large datasets.

Any action you take in the Variables Manager results in a command being issued to Stata as though you had typed it in the Command window. This means that you can keep good records and learn commands by using the Variables Manager.

2. The Variable pane

The left pane of the Variables Manager is called the Variable pane, though it has no explicit title on the screen. It shows the list of variables in the dataset. This list can be manipulated in a variety of ways.

  • The variables can be filtered by entering text into the filter box in the upper-left corner. This can be a good way to zoom in on similarly named or labeled variables.
  • The list can be sorted by clicking on the column title.
    • If you click on a column title, it will sort in ascending order.
    • A second click on the same column title will change to sorting in descending order.
    • Clicking on the hash mark (#) restores the sort order to the original variable order.

The sort order affects only how the data appear in the Variables Manager window; the dataset itself stays the same.

  • The order of the columns can be changed by dragging the column titles. To restore the original column headings, right-click on the column titles and select Restore column defaults.
  • The variables can be grouped by values in one or more columns. This is done by dragging the column titles into the grouping bar. The grouping can be canceled by dragging the column titles back into the column titles row. For example, auto.dta can be grouped by variable type.

3. Right-clicking on the Variable pane

Right-clicking on the Variable pane displays a menu from which you can do many common tasks:

  • Edit variable properties to change the focus to the Variable Properties pane. This will expose the Variable Properties pane if it has been automatically hidden.
  • Keep only selected variables to keep only the selected variables in the dataset and to drop all the others.
  • Drop selected variables to drop all the selected variables from the dataset.
  • Manage notes for selected variable… to open a window that allows adding and deleting notes for a single variable. This is disabled if multiple variables are selected.
  • Manage notes for dataset… to open a window that allows adding and deleting notes for the dataset as a whole.
  • Copy varlist to copy the names of the selected variables to the Clipboard.
  • Select all to select all visible variables. If a variable has become hidden because of the filter, it will not be selected.
  • Send varlist to Command window to insert the names of the selected variables in the Command window. Combined with grouping and sorting, this can be a useful way to create variable lists in large datasets.
  • Print… to print the Variable pane. You can change the widths of the printed columns by changing the widths of the columns in the Variables Manager.

4. The Variable Properties pane

The Variable Properties pane can be used to manipulate the properties of variables selected in the Variable pane. With one variable selected, you can manipulate all properties of the variable. With many variables selected, you can change their formats or types as well as assign value labels all at once. These fields work in the same fashion as those shown in Renaming and formatting variables in [GSW] 6 Using the Data Editor. We can also manage the notes Stata allows you to attach to variables and the dataset—we will show an example below.

The Variable Properties pane is, in actuality, a docking window, like those discussed in Auto Hide and pinning in [GSW] 2 The Stata user interface. You can see this because of the pushpin in its upper-right corner. If you click on the pushpin, the window will automatically hide when not in use. You can also dock the window in another part of the Variables Manager window by dragging it by its title bar to one of the docking guides.

5. Managing notes

Stata allows you to attach notes to both variables and the dataset as a whole. These are simple text notes that you can use to document whatever you like—the source of the dataset, data collection quirks associated with a variable, what you need to investigate about a variable, or anything else.

Start by selecting a variable in the Variable pane. We will work with the price variable. Click on the Manage… button next to the Notes field to open the Notes Manager dialog.

We will add a few notes:

  1. Click on the Add button to add a note.
  2. Type TS – started working. TS with a trailing space inserts a timestamp in the note.
  3. Add two more notes. We added two notes about prices:

It is worth experimenting with adding, deleting, and editing notes. Notes can be an invaluable memory aid when working on projects that last a long time. Anytime you manipulate notes in the Notes Manager, you create Stata commands.
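The commands that the Notes Manager issues can also be typed directly. The following is a minimal sketch using the notes command on auto.dta; the note texts are hypothetical and chosen only for illustration.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Attach notes to the price variable.
* "TS" in capitals followed by a space is replaced by a timestamp.
notes price: TS started working
notes price: check whether prices include dealer fees

* Attach a note to the dataset as a whole
notes _dta: working copy for the labeling exercise

* List the notes attached to price, then all notes
notes price
notes

* Delete the second note attached to price
notes drop price in 2
```

Keeping these lines in a do-file records when and why each note was added.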

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Importing data in Stata

1. Copying and pasting

One of the easiest ways to get data into Stata is often overlooked: you can copy data from most applications that understand the concept of a table and then paste the data into the Data Editor. This approach works for all spreadsheet applications, many database applications, some word-processing applications, and even some webpages. Just copy the full range of data, paste it into the Data Editor, and everything will probably work well. You can even copy a text file that has the pieces of data separated by commas and then paste it into the Data Editor.

Suppose that your friend has a small dataset about some very old cars.

VW Rabbit,4697,25,1930,3.78

Olds 98,8814,21,4060,2.41

Chev. Monza,3667,,2750,2.73

,4099,22,2930,3.58

Datsun 510,5079,24,2280,3.54

Buick Regal,5189,20,3280,2.93

Datsun 810,8129,,2750,3.55

You would like to put these data into Stata. Doing so is easier than you think:

  1. Clear out your current dataset by typing clear.
  2. Copy the above data.
  3. Open the Data Editor in edit mode.
  4. Select Edit > Paste special….
  5. Stata sees that the column delimiters are commas and shows how the data would look.
  6. Click on the OK button.

You can see that Stata has imported the data nicely.

Later in this chapter, we would like to bring these data into Stata without copying and pasting, so we would like to save them as a text file. Go back to the main Stata window, and click on the Do-file Editor button to open a new Do-file Editor window. Paste the data in the Do-file Editor, then click on the Save button. Navigate to your working directory, and save the file as a few cars.csv. If you do not know what your working directory is, look in the status bar at the bottom of the main Stata window.

Be careful if you are copying data from a spreadsheet because spreadsheets can contain special formatting that ruins its rectangular form. Be sure that your spreadsheet does not contain blank rows, blank columns, repeated headers, or merged cells because these can cause trouble. As long as your spreadsheet looks like a table, you will be fine.

2. Commands for importing data

Copying and pasting is a great way to bring data into Stata, but if you need a clear audit trail for your data, you will need another way to bring data into Stata. The rest of this chapter will explain how to do this. You will also learn methods that lend themselves better to repetitive tasks and methods for importing data from a wide variety of sources.

Stata has various commands for importing data. The three main commands for reading non-Stata datasets stored as plain text are

  • import delimited, which is made for reading text files created by spreadsheet or database programs or, more generally, for reading text files with clearly defined column delimiters such as commas, tabs, semicolons, or spaces;
  • infile, which is made for reading simple data that are separated by spaces or rigidly formatted data aligned in columns; and
  • infix, which is made for data aligned in columns but possibly split across rows.

Stata has other commands that can read other types of files and can even get data from external databases without the need for an interim file:

  • The import excel command can read Microsoft Excel files directly, either as an .xls or as an .xlsx file.
  • The import sas command can read native SAS files, so data can be transferred from SAS to Stata in this fashion.
  • The import spss command can read IBM SPSS Statistics files.
  • The import sasxport5 command can read SAS V5 Transport files. The import sasxport8 command can read SAS V8 Transport files.
  • The odbc command can be used to pull data directly from any data sources for which you have ODBC (Open Database Connectivity) drivers.
  • The jdbc command allows you to load data from a database, execute SQL statements on a database, and insert data into a database using JDBC (Java Database Connectivity) drivers.

Stata can import more formats; see [D] import for the full list.

Each command expects the file that it is reading to be in a specific format. This chapter will explain some of those formats and give some examples. For the full description, consult the Data Management Reference Manual.

3. The import delimited command

The import delimited command was developed to read in text files that were created by spreadsheet or database programs because these are common formats for sharing datasets on the Internet. All spreadsheet programs and most database applications have an option to save the dataset as a text file with the columns delimited with either tab characters or commas. Some of these programs also save the column titles (variable names, in Stata) in the text file.

To read in such a file, you have only to type import delimited filename, where filename is the name of the text file. The import delimited command will figure out what the delimiter character is (tab or comma) and what type of data is in each column. As always, if filename contains spaces, put double quotes around the filename, and include the path if filename is not in the current working directory.

By default, the import delimited command understands files that use the tab or comma as the column delimiter automatically. If you have a file that uses another character as the delimiter, use import delimited’s delimiters() option.
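As a hedged illustration of the delimiters() option, the sketch below reads a semicolon-delimited file. The filename mydata.txt is hypothetical; substitute the path to your own file.

```stata
* mydata.txt is a hypothetical semicolon-delimited text file
* in the current working directory
import delimited using "mydata.txt", delimiters(";") clear

* Check what was read
describe
```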

Earlier in this chapter, you saved a file called a few cars.csv in Copying and pasting. These data correspond to the make, price, MPG, weight, and gear ratio of a few very old cars. The variable names are not in the file (so import delimited will assign its own names), and the fields are separated by commas. Clear out any existing data, then use import delimited to read the data in this file. Because there are spaces in the filename, it must be enclosed in double quotes.
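The step just described can be sketched as follows, assuming a few cars.csv was saved in the current working directory as described earlier.

```stata
* Clear out any existing data
clear

* Read the comma-delimited file; the quotes are needed
* because the filename contains spaces
import delimited "a few cars.csv"
```

Because the file has no header row, Stata assigns its own variable names.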

You can look at the data in the Data Editor, and it will look just like the earlier result from copying and pasting. We will now list the data so that we can see them in the manual. The separator(0) option suppresses the horizontal separator line that is drawn after every fifth observation by default.
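A minimal sketch of that listing, again assuming a few cars.csv is in the working directory:

```stata
* Re-read the file so the block can be run on its own
import delimited "a few cars.csv", clear

* List every observation without the separator line
* that is otherwise drawn after every fifth observation
list, separator(0)
```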

If you want to specify better variable names, you can include the desired names in the command. When you specify variable names, you must also use the using keyword before the filename.
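For instance, the variable names given earlier in this chapter (make, price, mpg, weight, and gear_ratio) could be supplied as shown in this sketch, which assumes the same a few cars.csv file:

```stata
* Supply variable names before the using keyword
import delimited make price mpg weight gear_ratio ///
    using "a few cars.csv", clear

* List the data with the new names
list, separator(0)
```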

As a side note about displaying data, Stata listed gear_ratio as gear_r~o in the output from list. gear_r~o is a unique abbreviation for the variable gear_ratio. Stata displays the abbreviated variable name when variable names are longer than eight characters.

To prevent Stata from abbreviating gear_ratio, you could specify the abbreviate(10) option:
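A sketch of the listing with the full variable name displayed, under the same assumption about a few cars.csv:

```stata
* Read the data with explicit variable names
import delimited make price mpg weight gear_ratio ///
    using "a few cars.csv", clear

* Allow up to 10 characters before a name is abbreviated,
* so gear_ratio (10 characters) is shown in full
list, separator(0) abbreviate(10)
```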

For more information on the ~ abbreviation and on list, see [GSW] 10 Listing data and basic command syntax.

We will use this dataset again in the next chapter, so we would like to save it. Type save afewcars, and press Enter in the Command window to save the dataset.

For this simple example, you could have copied the contents of the file and pasted it into the Data Editor by using Paste special… and choosing comma as the delimiter.

For text files that have no nice delimiters or for which observations could be spread out across many lines, Stata has two more commands: infile and infix. See [D] import for more information about how to read in such files.

4. Importing files from other software

Stata has some more specialized methods for reading data that were created by other applications and stored in their proprietary formats.

The import excel command is made for reading files created by Microsoft Excel. See [D] import excel for full details.

The import spss command is made for reading files created by IBM SPSS Statistics. See [D] import spss for full details.

The import sas command is made for reading files created by SAS. See [D] import sas for full details.

The import sasxport5 and import sasxport8 commands can read SAS V5 and SAS V8 Transport files. See [D] import sasxport5 and [D] import sasxport8 for full details.

If you have software that supports ODBC, you can read data by using the odbc command without the need to create interim files. See [D] odbc for full details.

The jdbc command allows you to connect to, load data from, insert data into, and execute queries on a database using JDBC. See [D] jdbc for full details.

Here is a brief summary of the choices:

  • If you have a Microsoft Excel .xls or .xlsx file, use import excel.
  • If you have an IBM SPSS Statistics .sav file, use import spss.
  • If you have a SAS .sas7bdat file created on a Windows machine, use import sas.
  • If you have a file exported from a spreadsheet or database application to a tab-delimited or CSV file, use import delimited.
  • If you have a fixed-format file, either use infile with a dictionary or use infix.
  • If you have a database accessible with ODBC, use odbc.
  • If you have a database accessible with JDBC, use jdbc.
  • If you have a SAS V5 Transport file, use import sasxport5.
  • If you have a SAS V8 Transport file, use import sasxport8.
  • If you have economic data from the Federal Reserve Economic Data (FRED) repository, use import fred.
  • If you subscribe to any Haver Analytics databases, use import haver.
  • If you have a dBASE file, use import dbase.
  • If you have a table, you could try copying it and pasting it into the Data Editor.

Finally, you can purchase a third-party transfer program that will convert the other software’s data file format to Stata’s data file format.
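As one illustration of these choices, here is a minimal sketch of reading an Excel workbook. The filename cars.xlsx and the sheet name Sheet1 are hypothetical, and the sketch assumes the first row of the sheet holds the variable names.

```stata
* cars.xlsx and Sheet1 are hypothetical names
import excel using "cars.xlsx", sheet("Sheet1") firstrow clear

* Confirm the variable names and storage types that were read
describe
```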

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Labeling data in Stata

1. Making data readable

This chapter discusses, in brief, labeling of the dataset, variables, and values. Such labeling is critical to careful use of data. Labeling variables with descriptive names clarifies their meanings. Labeling values of numerical categorical variables ensures that the real-world meanings of the encodings are not forgotten. These points are crucial when sharing data with others, including your future self. Labels are also used in the output of most Stata commands, so proper labeling of the dataset will produce much more readable results. We will work through an example of properly labeling a dataset, its variables, and the values of one encoded variable.

2. The dataset structure: The describe command

At the end of The import delimited command in [GSW] 8 Importing data, we saved a dataset called afewcars.dta. We will put this dataset into a shape that a colleague would understand. Let’s see what it contains.
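A minimal sketch of loading and listing the dataset, assuming afewcars.dta was saved in the current working directory:

```stata
* Load the dataset saved at the end of the import chapter
use afewcars, clear

* Show every observation without separator lines
list, separator(0)
```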

The data allow us to make some guesses at the values in the dataset, but, for example, we do not know the units in which the price or weight is measured, and the term “mpg” could be confusing for people outside the United States. Perhaps we can learn something from the description of the dataset. Stata has the aptly named describe command for this purpose (as we saw in [GSW] 1 Introducing Stata—sample session).
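The describe step can be sketched as follows, under the same assumption that afewcars.dta is in the working directory:

```stata
use afewcars, clear

* Report variable names, storage types, display formats, and labels
describe
```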

Though there is precious little information that could help us as a researcher, we can glean some information here about how Stata thinks of the data from the first three columns of the output.

  1. The Variable name is the name we use to tell Stata about a variable.
  2. The Storage type (otherwise known as the data type) is the way in which Stata stores the data in a variable. There are six different storage types, each having its own memory requirement:
    • For integers:
      • byte for integers between -127 and 100 (using 1 byte of memory per observation)
      • int for integers between -32,767 and 32,740 (using 2 bytes of memory per observation)
      • long for integers between -2,147,483,647 and 2,147,483,620 (using 4 bytes of memory per observation)
    • For real numbers:
      • float for real numbers with 8.5 digits of precision (using 4 bytes of memory per observation)
      • double for real numbers with 16.5 digits of precision (using 8 bytes of memory per observation)
    • For strings (text) between 1 and 2,045 bytes (using 1 byte of memory per character for ASCII and up to 4 bytes of memory per Unicode character):
      • str1 for 1-byte-long strings
      • str2 for 2-byte-long strings
      • str3 for 3-byte-long strings
      • …
      • str2045 for 2,045-byte-long strings
    • Stata also has a strL storage type for strings of arbitrary length up to 2,000,000,000 bytes. strLs can also hold binary data, often referred to as BLOBs, or binary large objects, in databases. We will not illustrate these here.

Storage types affect both the precision of computations and the size of datasets. A quick guide to storage types is available at help data types or in [D] Data types.

  3. The Display format controls how the variable is displayed; see [U] 12.5 Formats: Controlling how data are displayed. By default, Stata sets it to something reasonable given the storage type.
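To see storage types and display formats in practice, the sketch below uses auto.dta rather than afewcars.dta so that it runs on any installation. The variable names heavy and weight_kg and the conversion factor are illustrative assumptions.

```stata
sysuse auto, clear

* Inspect the storage type and display format of a few variables
describe make price mpg gear_ratio

* Request a 1-byte type explicitly for a 0/1 indicator
generate byte heavy = weight > 3000 if !missing(weight)

* Without a type, generate uses the default storage type (float)
generate weight_kg = weight * 0.4536

describe heavy weight_kg

* Let Stata demote variables to smaller types where no information is lost
compress
```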

We would like to make this dataset into something containing all the information we need.

To see what a well-labeled dataset looks like, we can look at a dataset stored at the Stata Press repository. We need not load the data (and disturb what we are doing); we do not even need a copy of the dataset on our machine. (You will learn more about Stata’s Internet capabilities in [GSW] 19 Updating and extending Stata—Internet functionality.) All we need to do is direct describe to look at the proper file by using the command describe using filename.
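A sketch of that command is shown below. The URL assumes the Stata 17 copy of auto.dta on the Stata Press site and requires an Internet connection; adjust the release number to match your version.

```stata
* Describe a dataset stored on the web without loading it.
* The r17 path is an assumption about which release you are using.
describe using https://www.stata-press.com/data/r17/auto.dta
```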

This output is much more informative. There are three locations where labels are attached that help explain what the dataset contains:

  1. In the first line, 1978 automobile data is the data label. It gives information about the contents of the dataset. Data can be labeled by selecting Data > Data utilities > Label utilities > Label dataset, by using the label data command, or by editing the Label field in the Data portion of the Properties window. When doing this in the main window, be sure that the Properties window is unlocked.
  2. There is a variable label attached to each variable. Variable labels are how we would refer to the variable in normal, everyday conversation. Here they also contain information about the units of the variables. Variables can be labeled by selecting the variable in the Variables window and editing the Label field in the Properties window. You can also change a variable label by using the Variables Manager or by using the label variable command.
  3. The foreign variable has an attached value label. Value labels allow numeric variables such as foreign to have words associated with numeric codes. The describe output tells you that the numeric variable foreign has value label origin associated with it. Although not revealed by describe, the variable foreign takes on the values 0 and 1, and the value label origin associates 0 with Domestic and 1 with Foreign. If you browse the data (see [GSW] 6 Using the Data Editor), foreign appears to contain the values “Domestic” and “Foreign”. The values in a variable are labeled in two stages. The value label must first be defined. This can be done in the Data Editor, in the Variables Manager, by selecting Data > Data utilities > Label utilities > Manage value labels, or by typing the label define command. After the labels have been defined, they must be attached to the proper variables, either by selecting Data > Data utilities > Label utilities > Assign value label to variables or by using the label values command.

Note: It is not necessary for the value label to have a name different from that of the variable. You could just as easily have used a value label named foreign.

3. Labeling datasets and variables

We will now load the afewcars.dta dataset and give it proper labels. We will do this with the Command window to illustrate that it is simple to do in this fashion. Earlier in Renaming and formatting variables in [GSW] 6 Using the Data Editor, we used the Data Editor to achieve a similar purpose. If you use the Data Editor for the material here, you will end up with the same commands in your log; we would like to illustrate a way to work directly with commands.
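The labeling commands might look like the following sketch. It assumes afewcars.dta is in the working directory with the variable names used earlier; the label texts and the units they mention are illustrative assumptions, not facts about the data.

```stata
use afewcars, clear

* Label the dataset as a whole
label data "A few 1978 cars"

* Label each variable; the texts and units below are assumptions
label variable make "Make and model"
label variable price "Price (USD)"
label variable mpg "Mileage (miles per gallon)"
label variable weight "Weight (lbs.)"
label variable gear_ratio "Gear ratio"

* Confirm that the labels were applied
describe

* Save so the labels are not lost
save afewcars, replace
```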

4. Labeling values of variables

We will now add a new indicator variable to the dataset that is 0 if the car was made in the United States and 1 if it was made in another country. Open the Data Editor and use your previously gained knowledge to add a foreign variable whose values match what is shown in this listing:

You can create this new variable in the Data Editor if you would like to work along. (See [GSW] 6 Using the Data Editor for help with the Data Editor.) Though the definitions of the categories “0” and “1” are clear in this context, it still would be worthwhile to give the values explicit labels because it will make output clear to people who are not so familiar with antique automobiles. This is done with a value label.

We saw an example of creating and attaching a value label by using the point-and-click interface available in the Data Editor in Changing data in [GSW] 6 Using the Data Editor. Here we will do it directly from the Command window.
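A minimal sketch of these steps follows. The generate line stands in for entering foreign in the Data Editor, and its 0/1 coding (VW and Datsun as foreign) is an assumption made here for illustration.

```stata
use afewcars, clear

* Stand-in for the Data Editor step; the coding is an assumption
capture drop foreign
generate byte foreign = inlist(make, "VW Rabbit", "Datsun 510", "Datsun 810")

* Stage 1: define a value label named origin
label define origin 0 "Domestic" 1 "Foreign"

* Stage 2: attach the value label to the variable
label values foreign origin

* The listing shows the labels; nolabel shows the underlying codes
list make foreign, separator(0)
list make foreign, separator(0) nolabel

* Save the cleaned-up data under a new filename
save afewcarslab, replace
```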

From this example, we can see that a value label is defined via

label define labelname # "contents" # "contents" …

It can then be attached to a variable via

label values variablename labelname

Once again, we need to save the dataset to be sure that we do not mistakenly lose the labels later. We saved this under a new filename because we have cleaned it up, and we would like to use it in the next chapter.

If you had wanted to define the value labels by using a point-and-click interface, you could do this with the Properties window in either the Main window or the Data Editor or by using the Variables Manager. See [GSW] 7 Using the Variables Manager for more information.

There is more to value labels than what was covered here; see [U] 12.6.3 Value labels for a complete treatment.

You may also add notes to your data and your variables. This feature was previously discussed in Renaming and formatting variables in [GSW] 6 Using the Data Editor and Managing notes in [GSW] 7 Using the Variables Manager. You can learn more about notes by typing help notes, or you can get the full story in [D] notes.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Listing data and basic command syntax in Stata

1. Command syntax

This chapter gives a basic lesson on Stata’s command syntax while showing how to control the appearance of a data list.

As we have seen throughout this manual, you have a choice between using menus and dialogs and using the Command window. Although many find the menus more natural and the Command window baffling at first, with some practice, working in the Command window is often much faster than using menus and dialogs. It becomes a faster way of working because of the clean and regular syntax of Stata commands. We will cover enough to get you started; help language has more information and examples, and [U] 11 Language syntax has all the details.

The syntax for the list command can be seen by typing help list:

list [varlist] [if] [in] [, options]

Here is how to read this syntax:

  • Anything inside square brackets is optional. For the list command,
    1. varlist is optional. A varlist is a list of variable names.
    2. if is optional. The if qualifier restricts the command to run only on those observations for which the qualifier is true. We saw examples of this in [GSW] 6 Using the Data Editor.
    3. in is optional. The in qualifier restricts the command to run on particular observation numbers.
    4. , and options are optional. options are separated from the rest of the command by a comma.
  • Optional pieces do not preclude one another unless explicitly stated. For the list command, it is possible to use a varlist with if and in.
  • If a part of a word is underlined, the underlined part is the minimum abbreviation. Any abbreviation at least this long is acceptable.
    1. The l in list is underlined, so l, li, and lis are all equivalent to list.
  • Anything not inside square brackets is required. For the list command, only the command itself is required.

Keeping these rules in mind, let’s investigate how list behaves when called with different arguments. We will be using the dataset afewcarslab.dta from the end of the previous chapter.

2. List with a variable list

Variable lists (or varlists) can be specified in a variety of ways, all designed to save typing and encourage good variable names.

  • The varlist is optional for list. This means that if no variables are specified, it is equivalent to specifying all variables. Another way to think of it is that the default behavior of the command is to run on all variables unless restricted by a varlist.
  • You can list a subset of variables explicitly, as in list make mpg price.
  • There are also many shorthand notations:
    • m* means all variables starting with m.
    • price-weight means all variables from price through weight in the dataset order.
    • ma?e means all variables starting with ma, followed by any character, and ending in e.
  • You can list a variable by using an abbreviation unique to that variable, as in list gear_r~o. If the abbreviation is not unique, Stata returns an error message.
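The varlist notations above can be tried with the sketch below. It uses auto.dta instead of afewcarslab.dta so that it runs anywhere, and it restricts each listing to the first five observations with the in qualifier, which is covered later in this chapter.

```stata
sysuse auto, clear

* An explicit list of variables
list make mpg price in 1/5

* All variables whose names start with m
list m* in 1/5

* All variables from price through weight in dataset order
list price-weight in 1/5

* Names starting with ma, then any one character, then e
list ma?e in 1/5

* A unique abbreviation of gear_ratio
list gear_r~o in 1/5
```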

3. List with if

The if qualifier uses a logical expression to determine which observations to use. If the expression is true, the observation is used in the command; otherwise, it is skipped. The operators whose results are either true or false are

<     less than

<=    less than or equal

==    equal

>     greater than

>=    greater than or equal

!=    not equal

&     and

|     or

!     not (logical negation)

()    parentheses for grouping to specify the order of evaluation

In the logical expressions, & is evaluated before | (similar to multiplication before addition in arithmetic). You can use this in your expressions, but it is often better to use parentheses to ensure that the expressions are evaluated in the proper order. See [U] 13.2 Operators for complete details.
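A few listings with the if qualifier can be sketched as follows. The sketch uses auto.dta, whose rep78 variable contains missing values, as a stand-in for afewcarslab.dta; the cutoffs are arbitrary.

```stata
sysuse auto, clear

* A single relational test
list make mpg if mpg > 25

* & is evaluated before |; the parentheses make the intent explicit
list make mpg price if (mpg > 25 & price < 5000) | price > 13000

* Missing numeric values count as larger than any number,
* so this listing also includes cars whose rep78 is missing
list make rep78 if rep78 >= 5

* Exclude the missing values explicitly
list make rep78 if rep78 >= 5 & !missing(rep78)
```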

In the listings above, we see more examples of Stata treating missing numerical values as large values, as well as the care that should be taken when the if qualifier is applied to a variable with missing values. See [GSW] 6 Using the Data Editor.

4. List with if, common mistakes

Here is a series of listings with common errors and their corrections. See if you can find the errors before reading the correct entry.

The error arises because “equal” is expressed by ==, not by =. Corrected, it becomes
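A sketch of the mistake and its correction, using auto.dta as a stand-in dataset (the faulty command appears only as a comment so that the block runs):

```stata
sysuse auto, clear

* Incorrect, a single = is not a test for equality:
*     list make mpg if mpg=21

* Correct
list make mpg if mpg == 21
```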

Other common errors with logic:

Joint tests are specified with &, not with the word and or multiple ifs. The if qualifier should be if mpg==21 & weight>4000, not if mpg==21 if weight>4000. Here is its correction:
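The corrected joint test can be sketched as follows, again with auto.dta as a stand-in dataset:

```stata
sysuse auto, clear

* Incorrect forms, shown as comments:
*     list make mpg weight if mpg==21 and weight>4000
*     list make mpg weight if mpg==21 if weight>4000

* Correct: join the two conditions with &
list make mpg weight if mpg == 21 & weight > 4000
```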

A problem with string variables:

Strings must be in double quotes, as in make=="Datsun 510". Without the quotes, Stata thinks that Datsun is a variable that it cannot find. Here is the correction:
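A sketch of the quoted comparison, using auto.dta as a stand-in dataset:

```stata
sysuse auto, clear

* Incorrect, shown as a comment:
*     list if make==Datsun 510

* Correct: the string value is enclosed in double quotes
list make price mpg if make == "Datsun 510"
```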

Confusing value labels with strings:

Value labels look like strings, but the underlying variable is numeric. Variable foreign takes on values 0 and 1 but has the value label that attaches 0 to “Domestic” and 1 to “Foreign” (see [GSW] 9 Labeling data). To see the underlying numeric values of variables with labeled values, use the label list command (see [D] label), or investigate the variable with codebook varname. We can correct the error here by looking for observations where foreign==0.

There is a second construction that also allows the use of the value label directly.
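Both corrections can be sketched with auto.dta, where foreign carries the value label origin just as in the dataset used here:

```stata
sysuse auto, clear

* See which numeric codes the value label origin maps to text
label list origin

* Correction 1: compare against the underlying numeric code
list make foreign if foreign == 0 in 1/10

* Correction 2: look up the code through the value label
list make foreign if foreign == "Domestic":origin in 1/10
```

The second form keeps the command readable without having to remember which code stands for which category.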

5. List with in

The in qualifier uses a numlist to give a range of observations that should be listed. numlists have the form of one number or first/last. Positive numbers count from the beginning of the dataset. Negative numbers count from the end of the dataset. Here are some examples:
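The following sketch shows the forms of the in qualifier on auto.dta, used here as a stand-in dataset:

```stata
sysuse auto, clear

* The first observation
list make in 1

* The last observation
list make in -1

* Observations 2 through 4
list make in 2/4

* The third-from-last through the second-from-last observation
list make in -3/-2
```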

6. Controlling the list output

The fine control over list output is exercised by specifying one or more options. You can use sepby() to separate observations by variable. abbreviate() specifies the minimum number of characters to abbreviate a variable name in the output. divider draws a vertical line between the variables in the list.

The separator() option draws a horizontal line at specified intervals. When not specified, it defaults to a value of 5.
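These options can be combined, as in the sketch below. It uses auto.dta as a stand-in dataset, and the observation ranges are arbitrary.

```stata
sysuse auto, clear

* Draw a separator line whenever foreign changes, show variable
* names up to 10 characters long, and divide the columns
list make price foreign in 50/56, sepby(foreign) abbreviate(10) divider

* Draw a horizontal line after every third observation
list make mpg in 1/9, separator(3)
```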

7. Break

If you want to interrupt a Stata command, click on the Break button.

It is always safe to click on the Break button. After you click on Break, the state of the system is the same as if you had never issued the original command.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Creating new variables in Stata

1. Generate and replace

This chapter shows the basics of creating and modifying variables in Stata. We saw how to work with the Data Editor in [GSW] 6 Using the Data Editor—this chapter shows how we would do this from the Command window. The two primary commands used for this are

  • generate for creating new variables. It has a minimum abbreviation of g.
  • replace for replacing the values of an existing variable. It may not be abbreviated because it alters existing data and hence can be considered dangerous.

The most basic form for creating new variables is generate newvar = exp, where exp is any kind of expression. Of course, both generate and replace can be used with if and in qualifiers. An expression is a formula made up of constants, existing variables, operators, and functions. Some examples of expressions (using variables from auto.dta) would be 2 + price, weight^2, or sqrt(gear_ratio).
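The three example expressions can be turned into new variables as sketched below; the new variable names are arbitrary choices made for illustration.

```stata
sysuse auto, clear

* A constant plus an existing variable
generate price2 = 2 + price

* The power operator
generate weightsq = weight^2

* A built-in function
generate sqrtgear = sqrt(gear_ratio)

list price price2 weight weightsq gear_ratio sqrtgear in 1/3
```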

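The operators defined in Stata fall into three groups: arithmetic operators (+ addition, - subtraction, * multiplication, / division, ^ power, and - negation, with + also joining strings), logical operators (& and, | or, and ! or ~ not), and relational operators (> greater than, < less than, >= greater than or equal, <= less than or equal, == equal, and != or ~= not equal).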

Stata has many mathematical, statistical, string, date, time-series, and programming functions. See help functions for the basics, and see the Stata Functions Reference Manual for a complete list and full details of all the built-in functions.

You can use menus and dialogs to create new variables and modify existing variables by selecting menu items from the Data > Create or change data menu. This feature can be handy for finding functions quickly. However, we will use the Command window for the examples in this chapter because we would like to illustrate simple usage and some pitfalls.

Stata has some utility commands for creating new variables:

  • The egen command is useful for working across groups of variables or within groups of observations. See [D] egen for more information.
  • The encode command turns categorical string variables into encoded numeric variables, while its counterpart decode reverses this operation. See [D] encode for more information.
  • The destring command turns string variables that should be numeric, such as numbers with currency symbols, into numbers. To go from numbers to strings, the tostring command is useful. See [D] destring for more information.
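A brief sketch of these utility commands on auto.dta; all new variable names are illustrative assumptions.

```stata
sysuse auto, clear

* egen: mean price within each group of foreign
egen meanprice = mean(price), by(foreign)

* decode: turn the labeled numeric variable into a string variable
decode foreign, generate(origin_str)

* encode: turn the string variable back into a labeled numeric variable
encode origin_str, generate(origin_num)

* tostring and destring: convert a number to a string and back
tostring mpg, generate(mpg_str)
destring mpg_str, generate(mpg_num)

describe meanprice origin_str origin_num mpg_str mpg_num
```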

We will focus our efforts on generate and replace.

2. Generate

There are some details you should know about the generate command:

  • The basic form of the generate command is generate newvar = exp, where newvar is a new variable name and exp is any valid expression. You will get an error message if you try to generate a variable that already exists.
  • An algebraic calculation using a missing value yields a missing value, as does division by zero, the square root of a negative number, or any other computation which is impossible.
  • If missing values are generated, the number of missing values in newvar is always reported. If Stata says nothing about missing values, then no missing values were generated.
  • You can use generate to set the storage type of the new variable as it is generated. You might want to create an indicator (0/1) variable as a byte, for example, because it saves 3 bytes per observation over using the default storage type of float.

Below are some examples of creating new variables from the afewcarslab dataset, which we created in Labeling values of variables in [GSW] 9 Labeling data. (To work along, start by opening the automobile dataset with sysuse auto. We are using a smaller dataset to make shorter listings.) The last example shows a way to generate an indicator variable for cars weighing more than 3,000 pounds. Logical expressions in Stata result in 1 for “true” and 0 for “false”. The if qualifier is used to ensure that the computations are done only for observations where weight is not missing.
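The listings are not reproduced here, so the following is only a sketch of commands of this kind, run on the full auto dataset rather than afewcarslab. The variable names gp100m, rep78_sq, and heavy are illustrative. Stata treats a missing value as larger than any number, which is why the indicator needs the if qualifier:

```stata
sysuse auto, clear

* A simple algebraic expression
generate gp100m = 100 / mpg

* rep78 has missing values in auto, so Stata reports how many
* missing values were generated
generate rep78_sq = rep78^2

* Indicator stored as a byte: 1 if weight > 3000, 0 otherwise,
* and missing where weight itself is missing
generate byte heavy = weight > 3000 if !missing(weight)

* Generating a variable that already exists is an error;
* capture noisily shows the message without stopping a do-file
capture noisily generate heavy = 1
```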

3. Replace

Whereas generate is used to create new variables, replace is the command used for existing variables. Stata uses two different commands to prevent you from accidentally modifying your data. The replace command cannot be abbreviated. Stata generally requires you to spell out completely any command that can alter your existing data.

Suppose that you want to create a new variable, predprice, which will be the predicted price of the cars in the following year. You estimate that domestic cars will increase in price by 5% and foreign cars, by 10%.

One way to create the variable would be to first use generate to compute the predicted domestic car prices. Then use replace to change the missing values for the foreign cars to their proper values.
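A minimal sketch of this two-step approach, assuming the auto dataset, in which foreign is 0 for domestic cars and 1 for foreign cars:

```stata
sysuse auto, clear

* Step 1: predicted price for domestic cars; foreign cars are left missing
generate predprice = 1.05 * price if foreign == 0

* Step 2: fill in the missing values for the foreign cars
replace predprice = 1.10 * price if foreign == 1
```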

Of course, because foreign is an indicator variable, we could generate the predicted variable with one command:
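One way to write that single command, as a sketch assuming the auto dataset. The name predprice2 is used only so that it does not collide with an existing predprice:

```stata
sysuse auto, clear

* foreign is 0 or 1, so the multiplier is 1.05 for domestic cars
* and 1.05 + 0.05 = 1.10 for foreign cars
generate predprice2 = (1.05 + 0.05 * foreign) * price
```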

4. generate with string variables

Stata is smart. When you generate a variable and the expression evaluates to a string, Stata creates a string variable with a storage type as long as necessary, and no longer than that. The variable where is a str1 in the following example:
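The listing itself is not reproduced here. A sketch of what such an example might look like, assuming the auto dataset and one-character codes "D" and "F" for domestic and foreign cars:

```stata
sysuse auto, clear

* The expression is a one-character string, so where is stored as str1
generate where = "D" if foreign == 0
replace where = "F" if foreign == 1

* describe shows the storage type Stata chose
describe where
```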

Stata has some useful tools for working with string variables. Here we split the make variable into make and model and then create a variable that has the model together with where the model was manufactured:

There are a few things to note about how these commands work:

  1. ustrpos(s1, s2) produces an integer equal to the position of the first character in the string s1 at which the string s2 is found, or 0 if it is not found. In this example, ustrpos(make, " ") finds the position of the first space in each observation of make.
  2. usubstr(s, start, len) produces a string of length len characters, beginning at character start of string s. If len = ., the result is the string from character start to the end of string s.
  3. Putting 1 and 2 together: usubstr(s, ustrpos(s, " ") + 1, .) will always give the string with its first word removed. Because make contains both the make and the model of each car, and the manufacturer’s name never contains a space in this dataset, we have found each car’s model.
  4. The operator “+”, when applied to string variables, will concatenate the strings (that is, join them together). The expression "this" + "that" results in the string "thisthat". When the variable modelwhere was generated, a space (" ") was added between the two strings.
  5. The missing value for a string is nothing special—it is simply the empty string "". Thus the value of modelwhere for the car with no make or model is " D" (note the leading space).
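The points above can be sketched as follows, assuming the auto dataset and a where variable built from foreign as "D" or "F". The variable names model and modelwhere follow the text:

```stata
sysuse auto, clear

* One-character origin code
generate where = "D" if foreign == 0
replace where = "F" if foreign == 1

* Everything after the first space in make
generate model = usubstr(make, ustrpos(make, " ") + 1, .)

* Concatenate model, a space, and the origin code
generate modelwhere = model + " " + where

list make model modelwhere in 1/5
```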

If your strings might contain Unicode characters, use the Unicode versions of the string functions, as shown above. See [U] 12.4.2 Handling Unicode strings.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Deleting variables and observations in Stata

1. Clear, drop, and keep

In this chapter, we will present the tools for paring observations and variables from a dataset. We saw how to do this using the Data Editor in [GSW] 6 Using the Data Editor; this chapter presents the methods for doing so from the Command window.

There are three main commands for removing data and other Stata objects, such as value labels, from memory: clear, drop, and keep. Remember that they affect only what is in memory. None of these commands alter anything that has been saved to disk.

2. Clear and drop _all

Suppose that you are working on an analysis or a simulation and that you need to clear out Stata’s memory so that you can impute different values or simulate a new dataset. You are not interested in saving any of the changes you have made to the dataset in memory—you would just like to have an empty dataset. What you do depends on how much you want to clear out: at any time, you can have not only data but also metadata such as value labels, stored results from previous commands, and stored matrices. The clear command will let you carefully clear out data or other objects; we are interested only in simple usage here. For more information, see help clear and [D] clear.

If you type the command clear into the Command window, it will remove all variables and value labels. In basic usage, this is typically enough. It has the nice property that it does not remove any stored results, so you can load a new dataset and predict values by using stored estimation results from a model fit on a previous dataset. See help postest and [U] 20 Estimation and postestimation commands for more information.

If you want to be sure that everything is cleared out, use the command clear all. This command will clear Stata’s memory of data and all auxiliary objects so that you can start with a clean slate. The first time you use clear all while you have a graph or dialog open, you may be surprised when that graph or dialog closes; this is necessary so that Stata can free all memory being used.

If you want to get rid of just the data and nothing else, you can use the command drop _all.
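A sketch of the three variants, assuming the auto dataset. The variable name mpg_hat is illustrative:

```stata
sysuse auto, clear
regress mpg weight

* clear removes the data and value labels but keeps stored results
clear

* Reload the data and predict from the model fit before clearing
sysuse auto
predict mpg_hat

* drop _all removes just the data
drop _all

* clear all removes data and all auxiliary objects, including stored results
clear all
```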

3. Drop

The drop command is used to remove variables or observations from the dataset in memory.

  • If you want to drop variables, use drop varlist.
  • If you want to drop observations, use drop with an if or an in qualifier or both.

We will use the afewcarslab dataset to illustrate drop:
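The afewcarslab listing is not reproduced here. As a stand-in, this sketch shows the same forms of drop on the auto dataset:

```stata
sysuse auto, clear

* Drop observations by position
drop in 1/3

* Drop observations by condition
drop if mpg > 21

* Drop a single variable
drop gear_ratio

* Drop every variable whose name starts with m (make and mpg in auto)
drop m*

describe
```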

These changes are only to the data in memory. If you want to make the changes permanent, you need to save the dataset.

4. Keep

keep tells Stata to drop all variables except those specified explicitly or through the use of an if or in expression. Just like drop, keep can be used with varlist or with qualifiers but not with both at once. We use a clear command at the start of this example so that we can reload the afewcarslab dataset:
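A comparable sketch for keep, again using the auto dataset in place of afewcarslab:

```stata
sysuse auto, clear

* Keep observations by position
keep in 4/7

* Keep observations by condition
keep if mpg <= 21

* Keep only the variables whose names start with m
keep m*

list
```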

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Using the Do-file Editor—automating Stata

1. The Do-file Editor

Stata comes with an integrated text editor called the Do-file Editor, which can be used for many tasks. It gets its name from the term do-file, which is a file containing a list of commands for Stata to run (called a batch file or a script in other settings). See [U] 16 Do-files for more information. Although the Do-file Editor has advanced features that can help in writing such files, it can also be used to build up a series of commands that can then be submitted to Stata all at once. This feature can be handy when writing a loop to process multiple variables in a similar fashion or when doing complex, repetitive tasks interactively.

To get the most from this chapter, you should work through it at your computer. Start by opening the Do-file Editor, either by clicking on the Do-file Editor button or by typing doedit in the Command window and pressing Enter.

2. The Do-file Editor toolbar

The Do-file Editor has 15 buttons. Many of the buttons share a similar purpose with their look-alikes in the main Stata toolbar.

If you ever forget what a button does, hover the mouse pointer over a button, and a tooltip will appear.

New: Open a new do-file in a new tab in the Do-file Editor.

Open: Open a do-file from disk in a new tab in the Do-file Editor.

Save: Save the current file to disk.

Print: Print the contents of the Do-file Editor.

Find: Open the Find dialog for finding text.

Cut: Cut the selected text and put it in the Clipboard.

Copy: Copy the selected text to the Clipboard.

Paste: Paste the text from the Clipboard into the current document.

Undo: Undo the last change.

Redo: Undo the last undo.

Toggle bookmark: Turn on or off the bookmark on the current line. Bookmarks are a way to move quickly within the do-file. They are quite useful in long do-files or when debugging.

Previous bookmark: Go to the previous bookmark (if any).

Next bookmark: Go to the next bookmark (if any).

Show file in Viewer: Show the contents of the do-file in a Viewer window. This is worthwhile when editing files that contain SMCL tags, such as log files or help files.

Execute (do): Run the commands in the do-file, showing all commands and their output. If text is highlighted, the button becomes the Execute selection (do) button and will run only the selected lines, showing all output. We will refer to this as the Do button.

3. Using the Do-file Editor

Suppose that we would like to analyze fuel usage for 1978 automobiles in a manner similar to what we did in [GSW] 1 Introducing Stata—sample session. We know that we will be issuing many commands to Stata during our analysis and that we want to be able to reproduce our work later without having to type each command again.

We can do this easily in Stata: simply save a text file containing the commands. When that is done, we can tell Stata to run the file and execute each command in sequence. Such a file is known as a Stata do-file; see [U] 16 Do-files.

To analyze fuel usage of 1978 automobiles, we would like to create a new variable containing gallons per mile. We would like to see how that variable changes in relation to vehicle weight for both domestic and imported cars. Performing a regression with our new variable would be a good first step.

To get started, click on the Do-file Editor button to open the Do-file Editor. After the Do-file Editor opens, type the commands below into the Do-file Editor. Purposely misspell the name of the foreign variable on the fifth line. (We are intentionally making some common mistakes and then pointing you to the solutions. This will save you time later.)

* an example do-file
sysuse auto
generate gp100m = 100/mpg
label var gp100m "Gallons per 100 miles"
regress gp100m weight foreing

Here is what your Do-file Editor should look like now:

You will notice that the color of the text changes as you type. The different colors are examples of the Do-file Editor’s syntax highlighting. The colors and text properties of the syntax elements can be changed by selecting Edit > Preferences… from the Do-file Editor menu bar and then clicking on the Colors tab in the resulting window.

Syntax highlighting extends beyond highlighting Stata commands. You can switch the syntax highlighting from Stata by going to the Language menu and choosing the language you would like. The Language menu includes a selection for Markdown because Stata can process Markdown to create dynamic documents. See [RPT] dyndoc for more information. This menu also contains selections for Python and Java because Stata has both Python integration and Java integration. See [P] PyStata integration and [P] Java integration for more information. Stata will default to the proper language based on the extension of the file you are editing, but if the file has not been saved yet, you will need to tell it what language to choose.

Also note that if you pause briefly as you type, the Do-file Editor will allow autocompletion of words that are already in the do-file. Once the suggestions appear, more typing will narrow down the possibilities. You can navigate the suggestions using the up- and down-arrow keys or keep typing to narrow them to a single word. Once you have the word you like, pressing Enter will place the word in your do-file.

Click on the Do button to execute the commands. Stata executes the commands in sequence, and the results appear in the Results window:

The do “C:\ …” command is how Stata executes the commands in the Do-file Editor. Stata saves the commands from a do-file with unsaved changes to a temporary file and issues the do command to execute them. Everything worked as planned until Stata saw the misspelled variable. The first three commands were executed, but an error was produced on the fourth. Stata does not know of a variable named foreing. We need to go back to the Do-file Editor and change the misspelled variable name to foreign in the last line:


Click on the Do button again. Alas, Stata now fails on the first line—it will not overwrite the dataset in memory that we changed.

We now have a choice for what we should do:

  • We can put a clear command in our do-file as the very first command. This automatically clears out Stata’s memory before the do-file tries to load auto.dta. This is convenient but dangerous because it defeats Stata’s protection against throwing away changes without warning.
  • We can type a clear command in the Command window to manually clear the dataset and then process the do-file again. This process can be aggravating when building a complicated do-file.

Here is some advice: Automatically clear Stata’s memory while debugging the do-file. Once the do-file is in its final form, decide the context in which it will be used. If it will be used in a highly automated environment (such as when certifying), the do-file should still automatically clear Stata’s memory. If it will be used rarely, do not clear Stata’s memory. This decision will save much heartache. We will add a clear option to the sysuse command to automatically clear the dataset in Stata’s memory before the do-file runs:
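With the clear option added and the variable name corrected, the do-file would read roughly as follows:

```stata
* an example do-file
sysuse auto, clear
generate gp100m = 100/mpg
label var gp100m "Gallons per 100 miles"
regress gp100m weight foreign
```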

The do-file now runs well, as clicking on the Do button shows:

You might want to select File > Save as… to save this do-file from the Do-file Editor. Later, you could select File > Open to open it and then add more commands as you move forward with your analysis. By saving the commands of your analysis in a do-file as you go, you do not have to worry about retyping them with each new Stata session. Think hard about removing the clear option from the first command.

After you have saved your do-file, you can execute the commands it contains by typing do filename, where the filename is the name of your do-file.

4. The File menu

The File menu of the Do-file Editor includes standard features found in most text editors. You may choose any of these menu items: create a New > Do-file, Open an existing file, Save the current file, save the current file under a new name with Save as…, or Print the current file. There are also buttons on the Do-file Editor’s toolbar that correspond to these features.

Finally, you can create a New > Project… to keep track of collections of files used in a project. These can be do-files, data files, graph files, or any other files you like. For more information on the Project Manager, see [P] Project Manager.

5. The Edit menu

The Edit menu of the Do-file Editor includes the standard Undo, Redo, Cut, Copy, Paste, Delete, and Find capabilities. There are also buttons on the Do-file Editor’s toolbar for easy access to these capabilities. There are several other Edit menu features that you might find useful:

  • You can select Insert file… to insert the contents of another file at the current cursor position in the Do-file Editor.
  • You can select the current line with Select line.
  • You can delete the current line with Delete line.
  • Find > Go to line… will allow you to jump to a specific line number. The line numbers are displayed at the left and the lower-right of the Do-file Editor window.
  • Advanced leads to a submenu with some programmer’s friends:
    • Shift right indents the selection by one tab.
    • Shift left unindents the selection by one tab.
    • Re-indent indents the selection according to its nesting within blocks and programs.
    • Toggle comment toggles //-style comments at the start of the selected lines.
    • Add block comment puts a /* before and a */ after the selected region, commenting it out.
    • Remove block comment undoes the above.
    • Make selection uppercase converts the selection to all capital letters.
    • Make selection lowercase converts the selection to all lowercase letters.
    • Complete word attempts to complete the current word based on words that are already in the do-file. If there are multiple possibilities, all will be shown. You can either pick the completion you would like or keep typing to narrow the choices.
    • Convert to UTF-8… converts the current file to UTF-8
    • Convert line endings to macOS/Unix format (\n) converts the line endings for the current file to macOS/Unix format.
    • Convert line endings to Windows format (\r\n) converts the line endings for the current file to Windows format.
    • Convert tabs to spaces replaces any tab characters with spaces, leaving the spacing as it currently appears.
    • Convert leading spaces to tabs converts any spaces at the start of lines to tab characters. The number of spaces per tab is determined by a preference setting.
    • Convert all spaces to tabs converts spaces to tab characters wherever possible. The number of spaces per tab is determined by a preference setting.

Matching and balancing of parentheses ( ), braces { }, and brackets [ ] are also available from the Edit menu. When you select Edit > Find > Match brace, the Do-file Editor looks at the character immediately to the left and right of the cursor. If either is one of the characters that the editor can match, the editor will find the matching character and place the cursor immediately in front of it. If there is no match, the cursor will not move.

When you select Edit > Find > Balance braces, the Do-file Editor looks to the left and right of the current cursor position or selection and creates a selection that includes the narrowest level of matching braces. If you select Balance braces again, the editor will expand the selection to include the next level of matching braces. If there is no match, the cursor will not move. Balancing braces is useful for working with blocks of code defined by loops or if commands. See [P] foreach, [P] forvalues, [P] while, and [P] if for more information.

Balance braces is easier to explain with an example. Type {now {is the} time} in the Do-file Editor. Place the cursor between the words is and the. Select Edit > Find > Balance braces. The Do-file Editor will select {is the}. If you select Balance braces again, the Do-file Editor will select {now {is the} time}.
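The same idea applies to real code. In this small sketch, which assumes the auto dataset, placing the cursor inside the inner block and selecting Balance braces repeatedly should first select the if block and then the enclosing foreach block:

```stata
sysuse auto, clear

* Outer block: loop over three variables
foreach v of varlist mpg weight price {
    * Inner block: skip price
    if "`v'" != "price" {
        summarize `v'
    }
}
```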

Text in Stata strings can include Unicode characters and is encoded as UTF-8 (see [U] 12.4.2 Handling Unicode strings). However, you may have do-files, ado-files, or other text files that you used with Stata 13 or earlier, and those files contain characters other than plain ASCII such as accented characters, Chinese, Japanese, or Korean (CJK) characters, Cyrillic characters, and the like. If you open a file that is not encoded in UTF-8, Stata prompts you to specify the encoding for the file so that it can convert the file to UTF-8. If you cancel the conversion or choose the wrong encoding, you can try the conversion again later using Convert to UTF-8. The conversion to UTF-8 can be undone by using Edit > Undo and is not permanent until you save the do-file. For Stata datasets with characters not encoded in UTF-8 or for bulk conversion of multiple Stata files, you should use the unicode translate command.

Editing tip: You can click on the left margin near a line number to select the entire line and the end-of-line characters. Doing so makes it easy to delete lines or cut lines and paste them elsewhere. You can click and drag within the line-number column to select a range of complete lines.

6. The View menu

The View menu of the Do-file Editor allows you to zoom in and out or display special characters such as tabs and line endings.

7. The Tools menu

You have already learned about the Do button. Selecting Tools > Execute (do) is equivalent to clicking on the Execute (do) button.

Selecting Tools > Execute (do) from top will send all the commands from the first line to the current line to the Command window. This method is a quick way to run a part of a do-file.

Selecting Tools > Execute (do) to bottom will send all the commands from the current line through the end of the contents of the Do-file Editor to the Command window. This method is a quick way to run a part of a do-file.

Selecting Tools > Execute quietly (run) is equivalent to Tools > Execute (do), but the commands will be executed quietly; that is, no output will be displayed in the Results window.

Selecting Tools > Execute (include) is similar to clicking on the Execute (do) button with one major difference: local macros defined in the current session can be expanded in the commands being executed.

Do is equivalent to Stata’s do command, whereas Execute (include) is equivalent to Stata’s include command. See [U] 16 Do-files for a complete discussion.
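The difference can be sketched with a local macro. This example assumes a hypothetical file named show_macro.do in the current directory that contains the single display line shown in the comment:

```stata
* Typed interactively in the Command window:
local outcome mpg

* Assumed contents of the hypothetical file show_macro.do (a single line):
*     display "outcome is: `outcome'"

* do runs the file with its own local macros, so outcome is empty there
do show_macro.do

* include runs the file in the caller's context, so outcome expands to mpg
include show_macro.do
```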

You can also preview files in the Viewer by selecting Tools > Show file in Viewer or by clicking on the Show file in Viewer button. This feature is useful when working with files that use Stata’s SMCL tags, such as when writing help files or editing log files.

8. Saving interactive commands from Stata as a do-file

While working interactively with Stata, you might decide that you would like to rerun the last several commands that you typed interactively. From the History window, you can send highlighted commands or even the entire contents to the Do-file Editor. You can also save commands as a do-file and open that file in the Do-file Editor. You can copy a command from a dialog (rather than submit it) and paste it into the Do-file Editor. See [GSW] 6 Using the Data Editor for details. Also see [R] log for information on the cmdlog command, which allows you to log all commands that you type in Stata to a do-file.
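A minimal sketch of command logging with cmdlog. The filename mycommands.do is hypothetical:

```stata
* Start recording every command typed from here on
cmdlog using mycommands.do, replace

sysuse auto, clear
summarize mpg weight

* Stop recording; mycommands.do now holds the commands typed above
cmdlog close
```

The resulting file can then be opened in the Do-file Editor with doedit mycommands.do.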

9. Navigating your do-file

When working with long files, bookmarks allow you to easily navigate through your do-file. By placing a bookmark before important sections in your do-file, you can return to those sections more easily later. You can add a bookmark line by using Edit > Toggle bookmark, clicking the Toggle bookmark button on the toolbar, or by manually typing a line beginning with a special comment, **#. All other text on the rest of the line is treated as the title of the bookmark. You cannot have ado code on the same line as the bookmark comment, or the bookmark comment will be ignored.
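A short sketch of a do-file with manually typed bookmark lines. The bookmark titles are illustrative:

```stata
**# Load data
sysuse auto, clear

**# Create variables
generate gp100m = 100/mpg
label var gp100m "Gallons per 100 miles"

**# Fit model
regress gp100m weight foreign
```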

You can move between bookmarks using the options in the Edit menu, the equivalent buttons on the Do-file Editor toolbar, or the Navigation Control. The Navigation Control of the Do-file Editor allows you to move between bookmarks as well as programs that you have defined in your do-file. When you select a program or a bookmark from the Navigation Control, you will jump directly to the position of that program or bookmark in your do-file.

Bookmarks can also be removed with the Toggle bookmark option in the Edit menu or toolbar as well as simply deleting the line. You can also add and remove bookmarks by clicking in the bookmark margin next to the line you want to add the bookmark before. When you add a bookmark line, the bookmark icon will be added in the bookmark margin to make it more visibly obvious for you while scrolling.

10. Projects

For advanced users managing many files as part of a project, Stata has a Project Manager that uses the Do-file Editor. For more information on the Project Manager, see [P] Project Manager.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Graphing data in Stata

1. Working with graphs

Stata has a rich system for graphical representation of data. The main command for creating graphs is unsurprisingly named graph. Behind this plain name is a wealth of tools. In this chapter, we will make one simple graph to point out the basics of the Graph window. See the [G] Stata Graphics Reference Manual for more information about all aspects of working with graphs.

2. A simple graph example

In the sample session of [GSW] 1 Introducing Stata—sample session, we made a scatterplot, added a fitted regression line, and made a grid of scatterplots to allow comparisons across groups. Here, using the automobile dataset, we make a simple box plot that shows the displacements of the cars’ engines and how they compare across repair records within the place of manufacture of the cars. Start by loading the dataset by typing sysuse auto in the Command window and pressing Enter.

We select Graphics > Box plot, choose or type displacement in the Variables field on the Main tab, click on the Categories tab, check the Group 1 checkbox and enter rep78 for the first grouping variable, and check the Group 2 checkbox and enter foreign for the second grouping variable. Finally, we click on the Submit button so that we could easily make changes to the graph if need be. After we look at the graph, we realize that we forgot the title. We close the Graph window, click on the Titles tab of the graph box dialog, type the title Displacement across repairs within origin, and click on the Submit button again.
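The same graph can also be drawn from the Command window. A sketch, assuming the dialog choices described above correspond to the over() and title() options of graph box:

```stata
sysuse auto, clear

* Box plot of displacement by repair record within place of manufacture
graph box displacement, over(rep78) over(foreign) ///
    title("Displacement across repairs within origin")
```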

The Graph window comes up, showing us our nicely titled graph:

Graph window

When the Graph window comes up, it shows our graph in a window with a toolbar. The first four buttons are familiar to us from other Stata windows: Open, Save, Print, and Copy. The next two buttons are new:

Rename graph: This button allows the graph to be renamed. Why would you do this? If you would like to have multiple graphs open at once, the graphs need to be named. So you can click on the Rename graph button to give a graph a name. This graph will then remain open when you create your next graph.

Graph Editor: Stata has a Graph Editor that allows you to manipulate and edit your graph. This feature will be introduced in the next chapter.

The inactive buttons to the right of the Graph Editor button are used by the Graph Editor, so their meanings will become clear in the next chapter.

We decide that we like this graph and would like to save it. We can save it either by clicking on the Save button and choosing a name and a location or by right-clicking on the Graph window itself and selecting Save as….

3. Saving and printing graphs

You can save a graph once it is displayed by right-clicking on its window and selecting Save as…. You can print a graph by right-clicking on its window and selecting Print…. You can also use the File menu to save or print a graph. We recommend that you always right-click on a graph to save or print it to ensure that the correct graph is selected.
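Graphs can also be saved from the Command window or a do-file. A minimal sketch, with the hypothetical filename displacement_box:

```stata
sysuse auto, clear
graph box displacement, over(rep78) over(foreign)

* Save in Stata's own graph format (displacement_box.gph)
graph save displacement_box, replace

* Export to an image file for use outside Stata
graph export displacement_box.png, replace
```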

4. Right-clicking on the Graph window

Right-clicking on the Graph window displays a menu from which you can select the following:

  • Save as… to save the graph to disk.
  • Copy to copy the graph to the Clipboard.
  • Start Graph Editor to start the Graph Editor.
  • Preferences… to edit the preferences for graphs.
  • Print… to print the graph.

5. The Graph button

The Graph button is located on the main window’s toolbar. The button has two parts, an icon and an arrow. Clicking on the icon brings the topmost Graph window to the front of all other windows. Clicking on the arrow displays a menu of open graphs. Selecting a graph from the menu brings that graph to the front of all other windows. If you close the Graph window, you can reopen it only by reissuing a Stata command that draws a new graph.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Editing graphs in Stata

1. The Graph Editor

With Stata’s Graph Editor, you can change almost anything on your graph; you can add text, lines, arrows, and markers wherever you like.

We will first make a graph to edit and will then point out the tools in the Graph Editor. Start by opening the automobile dataset: sysuse auto. Here is the command that we will use to make the graph:

Start the Editor by right-clicking on your graph and selecting Start Graph Editor. Click once on the title of the graph. Here is a picture of the Graph Editor with its elements labeled.

Select any of the tools along the left of the Graph Editor window to edit the graph. The Pointer (Select tool) is selected by default.

You can change the properties of objects or drag them to new locations by using the Pointer. As you select objects with the Pointer, a Contextual Toolbar will appear just above the graph. In the above example, the title of the graph is selected, so the Contextual Toolbar has controls that are relevant for editing titles. You can use any of the controls on the Contextual Toolbar to immediately change the most important properties of the selected object. Right-click on an object to access more properties and operations. Hold the Shift key when dragging objects to constrain the movement to horizontal or vertical directions.

Add text, lines, or markers (with optional labels) to your graph by using the three Add… tools. Lines can be changed to arrows by using the Contextual Toolbar. If you do not like the default properties, simply change their settings in the Contextual Toolbar before adding the text, line, or marker. The new settings will then be applied to all added objects, even in future Stata sessions.

Do not be afraid to try things. If you do not like a result, change it back by using the same tool or by clicking on the Undo button in the Standard Toolbar for the Graph Editor (below the main menu). Edit > Undo in the main menu does the same thing.

Remember to reselect the Pointer tool when you want to drag objects or change their properties.

You can move objects on the graph and have the rest of the objects adjust their position to accommodate the move with the Grid edit tool. With this tool, you are repositioning objects in the underlying grid that holds the objects in the graph. Some graphs, for example, by graphs, are composed of nested grids. You can reposition objects only within the grid that contains them; they cannot be moved to other grids.

You can also select objects in the Object Browser along the right of the graph. This window shows a hierarchical listing of the objects in the graph. Clicking or right-clicking on an object in the Object Browser is the same as clicking or right-clicking on the object in the graph.

The Graph Editor has the ability to record your actions and play them back on later graphs. When you click on the Start recording button, every editing action you take, including undos and redos, is recorded. If you would like to do some editing that is not recorded, you can click on the Pause recording button. You can click on the Pause recording button again to resume recording. When you are done with your recording, click on the Start recording button again. You will be prompted to save your recording. Any recording you save is available from the Play recording button and may be applied to future graphs. You can even play a recording in any Stata graph command by using the play option. See Graph Recorder in [G-1] Graph Editor for more information.
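A sketch of the play option in a graph command. It assumes that a recording has already been saved from the Graph Editor under the hypothetical name my_edits; without such a recording, the command would not apply any edits:

```stata
sysuse auto, clear

* Draw the graph and then replay the saved Graph Editor recording on it
* (my_edits is a hypothetical recording name)
graph box displacement, over(foreign) play(my_edits)
```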

Stop the editor by selecting File > Stop Graph Editor from the main menu or by clicking on the Graph Editor button. When you stop the Graph Editor, you will be prompted to save your graph if you have made any changes. If you do not save your graph, your changes will not be lost, but you will risk losing them if you create a new graph in the same Graph window. You must stop the Editor if you would like to work on other tasks in Stata.

Here are a few of the things that you can do with the Editor:

  • Add annotations using lines, arrows, and text.
  • Add or remove grid lines or reference lines.
  • Add or modify titles, captions, and notes.
  • Change scatterplots to line plots, connected plots, areas, bars, spikes, or drop lines—and, of course, vice versa.
  • Change the size, color, margin, and other properties of your graph’s titles (or any other text on the graph).
  • Move your legend to another side of the graph, or even place it in the plot region.
  • Change the aspect ratio of your graph.
  • Stack the bars on a bar graph or turn them into percentages.
  • Rotate or change the angle of axis labels.
  • Add custom ticks and labels to the axes.
  • Change the rule for the number and spacing of ticks and labels on an axis.
  • Emphasize a point on the graph, whether marker, bar, spike, or other plot, by making it a custom color, size, or symbol.
  • Change the text or properties of a marker label.

Because you can edit every property of every object on the graph, you can change almost anything about your graph. To learn more, see [G-1] Graph Editor or type help graph editor.
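
Several of the edits listed above can also be requested when the graph is drawn, which is convenient in a do-file. The sketch below is one possible combination of standard twoway options, not a required recipe; the dataset, titles, and values are illustrative assumptions.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Scatterplot with a fitted line, customized through graph options
twoway (scatter mpg weight) (lfit mpg weight), ///
    title("Mileage versus weight")             /// add a title
    note("Data: auto.dta")                     /// add a note
    xlabel(, angle(45))                        /// rotate the x-axis labels
    yline(20)                                  /// add a horizontal reference line
    legend(position(6) rows(1))                /// place the legend below the plot
    aspectratio(0.75)                          //  change the aspect ratio
```

Options set this way are reproduced every time the do-file runs, whereas edits made in the Graph Editor apply only to the graph being edited unless they are recorded.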

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Saving and printing results by using logs in Stata

1. Using logs in Stata

When you work on an analysis, it is worthwhile to behave like a bench scientist and keep a lab notebook of your actions so that your work can be easily replicated. Everyone has a feeling of complete omniscience while working intensely—this feeling is wonderful but fleeting. The next day, the exact small details needed for perfect duplication have become obscure. Stata has a lab notebook at hand: the log file.

A log file is simply a record of your Results window. It records all commands and all textual output as it happens. Thus it keeps your lab notebook for you as you work. Because it writes the file to disk while it writes the Results window, it also protects you from disastrous failures, be they power failures or computer crashes. We recommend that you start a log file whenever you begin any serious work in Stata.

2. Logging output

All the output that appears in the Results window can be captured in a log file. Stata can save the file in one of two different formats. By default, Stata will save the file in its Stata Markup and Control Language (SMCL) format, which preserves all the formatting and links from the Results window. You can open these results in the Viewer, and they will behave as though they were in the Results window. If you would rather have plain-text files without any formatting, you can save the file as a plain log file. We recommend using the SMCL format because SMCL files can be translated into a variety of formats readable by applications other than Stata with the File > Log > Translate… menu (see [R] translate).

To start a log file, click on the Log button. This will open a standard file dialog that allows you to specify a directory and filename for your log. If you do not specify a file extension, the extension .smcl will be added to the filename. If you specify a file that already exists, you will be asked whether you want to append the new log to the file or overwrite the file with the new log.
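
The same can be done from the Command window or a do-file with the log command. This is a minimal sketch; the filename analysis_session is a hypothetical choice, and the replace option overwrites any existing file of that name.

```stata
* Start a log in the default SMCL format; ".smcl" is added automatically
log using "analysis_session", replace

* Everything shown in the Results window from here on is recorded
sysuse auto, clear
summarize mpg weight

* Close the log when finished
log close

* Start a plain-text log instead by adding the text option
log using "analysis_session.log", text replace
summarize price
log close
```

Use append in place of replace if you would rather add the new session to the end of an existing log.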

3. Working with logs

Log files are best viewed using Stata’s Viewer. Select File > Log > View…. If there is a log file open (as shown by the status bar), it will be the default log file to view; otherwise, you need to either type the name of the log file into the dialog or click on the Browse… button to find the file with a standard file dialog.
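
If you prefer typing commands, the following sketch shows one way to check the log status and open a log in the Viewer. The filename is a hypothetical one and must refer to a log that exists on disk.

```stata
* Report whether a log is currently open and where it is being written
log

* Open a saved log in the Viewer (hypothetical filename)
view "analysis_session.smcl"
```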

Once you are in the Viewer window, everything behaves as expected: you can copy text and paste between the Viewer and anything else that uses text, such as word processors or text editors. You can even paste into the Command window or the Do-file Editor, but you should take care to copy only commands, not their output. It is okay to copy the prompt (“. ”) at the start of the echoed command because Stata is smart enough to ignore it in the Command window. When working with a word processor, what you paste will be unformatted text; it will look best if you use a fixed-width font, like Courier, to display it.

Viewing your current log file is a good way to keep a reminder of something you have already done or a view of a previous result. The Viewer window takes a snapshot of your log file and hence will not scroll as you keep working in Stata. If you need to see more recent results in the Viewer, click on the Reload page button.

For more detailed information about logs, see [U] 15 Saving and printing output—log files and [R] log. For more information about the Viewer, see [GSW] 3 Using the Viewer.

4. Printing logs

To print a standard SMCL log file, you need to have the log file open in a Viewer window. Once the log file is in the Viewer, you can click on the Print button, right-click on the Viewer window and select Print…, or select File > Print…. A Print dialog will appear. After you click on Print, a Print settings dialog will appear.

  • You can fill in none of, any of, or all the text items offered in the dialog, such as Header and Name. You can also check or uncheck the options Print line numbers, Print header, and Print logo. These settings are saved and will appear again in the Print settings dialog (in this and in future Stata sessions).
  • You can set the font, margins, and color scheme that the printer will use by clicking on .. in the Print settings dialog to open the Printer preferences dialog. Monochrome is for black-and-white printing, Color is for default color printing, and Custom 1 and Custom 2 are for customized color printing. You can set the font by clicking on the Font… button. The resulting Font dialog will list only the fixed-width “typewriter” fonts (for example, Courier) available for your printer.

You could also use the translate command to generate a PostScript or PDF version of the log file. See [R] translate for more information.
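
A minimal sketch of the translate command follows. It assumes a SMCL log named analysis_session.smcl (a hypothetical filename) already exists in the current working directory; translate chooses the conversion from the two file extensions.

```stata
* Convert a SMCL log to PDF
translate "analysis_session.smcl" "analysis_session.pdf", replace

* Convert the same log to PostScript
translate "analysis_session.smcl" "analysis_session.ps", replace

* Convert the same log to plain text
translate "analysis_session.smcl" "analysis_session.log", replace
```

The replace option overwrites the output file if it already exists; leave it off to be warned instead.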

If your log file is a plain-text file (.log instead of .smcl), you can open it in a text editor such as Notepad, in the Do-file Editor, or in your favorite word processor. You can then edit the log file (add headings, comments, etc.), format it, and print it. If you bring the log file into a word processor, it will be displayed and printed with the word processor's default font. The log file will not be easily readable when printed in a proportionally spaced font (for example, Times New Roman or Georgia). It will look much better printed in a fixed-width font (for example, Courier New).

You may wish to associate the .log extension with a text editor (such as Notepad or WordPad) in Windows. You can then edit and print the logs from those Windows applications if you like.

5. Rerunning commands as do-files

Stata also can log just the commands from a session without recording the output. This feature is a convenient way to make a do-file interactively. Such a file is called a cmdlog file by Stata. You can start a cmdlog file by typing

    cmdlog using filename.do

and you can close the cmdlog file by typing

    cmdlog close

A cmdlog of a session contains only the commands and comments you typed, without any output, and hence can be used as a do-file.
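
As an illustration (not the session from the original manual), the commands below could be typed one at a time in the Command window. The filename session_commands.do and the analysis commands are hypothetical.

```stata
* Start recording typed commands only; replace overwrites an existing file
cmdlog using "session_commands.do", replace

* Interactive work: these commands are written to the cmdlog as you type them
sysuse auto, clear
summarize mpg weight
regress mpg weight

* Stop recording
cmdlog close

* Later, rerun the recorded commands as a do-file
do "session_commands.do"
```

Because a cmdlog records what is typed at the keyboard, it is meant for interactive sessions; commands executed from inside a do-file are not copied into it individually.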

If you start working and then wish you had started a cmdlog file, you can save yourself heartache by saving the contents of the History window. The History window stores the last 5,000 commands you have typed. Simply right-click on the History window and select Save all… from the menu. This will work best if you first filter out all the commands that resulted in errors as was shown in The History window in [GSW] 2 The Stata user interface. If you would like to move the commands directly to the Do-file Editor, select Select all followed by Send selected to Do-file Editor. You may find this method a more convenient way to create a text file containing only the commands that you typed during your session.

See [GSW] 13 Using the Do-file Editor—automating Stata, [U] 16 Do-files, and [U] 15 Saving and printing output—log files for more information.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.

Setting font and window preferences in Stata

1. Changing and saving fonts and sizes and positions of your windows

You may find that you would like to change the fonts and display style of Stata’s windows, depending on your monitor resolution and personal preferences. At the same time, there could be requirements for font usage, say, when you submit graphs to journals. Stata accommodates both of these by allowing sets of preferences for how windows are displayed.

We will first cover what can be changed in each window and then talk about what you can manage with your preferences.

2. Graph window

The preferences for the Graph window can be changed by right-clicking on the Graph window and choosing Preferences… from the contextual menu. The settings can then be set for how graphs are displayed in Stata. The settings that should be used when printing can be set under the Printer tab. The behavior of the Clipboard is controlled under the Clipboard tab.

The Graph preferences allow different schemes that control the look of graphs. These schemes provide a quick way to optimize graphs for printing or to display on a screen. There are even schemes defined for The Economist and the Stata Journal so that you can get the details for these publications right without much fuss. Changing the scheme does not change the current graph—it applies the settings to future graphs.
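
Schemes can also be selected by command. The sketch below is meant to be run as a single do-file (the local macro holding the original scheme does not persist between separately typed commands); the dataset and variables are illustrative.

```stata
* Remember the scheme currently in effect so it can be restored later
local old_scheme "`c(scheme)'"

sysuse auto, clear

* Use The Economist scheme for one graph only
scatter mpg weight, scheme(economist)

* Make the Stata Journal scheme the default for later graphs in this session
set scheme sj
scatter mpg weight

* Restore the scheme that was in effect at the start
set scheme `old_scheme'
```

As with the preferences dialog, set scheme affects graphs drawn afterward and leaves an existing graph unchanged.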

3. All other windows

You can change the display font and font size for most types of windows in Stata.

If fonts and font sizes for a window can be changed, they can be changed by right-clicking on the window and selecting Font… from the contextual menu. Doing so will bring up the Font dialog, from which you can pick the font and size of your choice. The font lists for each of the Results, Viewer, Data Editor, and Do-file Editor windows are restricted to fixed-width fonts only. This restriction ensures that output and numbers line up properly and are readable. The other windows can have any font that you would like without any adverse consequences.

4. Changing color schemes

In addition to changing the fonts themselves, you can also change the background and foreground colors of text being displayed. In the Do-file Editor, you can choose the colors for syntax highlighting, allowing, say, Stata commands to be displayed in a different color from arbitrary text. You can control the overall color scheme by selecting a scheme in the General tab of the General preferences.

The Results and Viewer windows have color schemes that control the display of input, text, results, errors, links, and highlighted text. Each has its color scheme set in the same fashion: you can right-click on the window and select or design your own color scheme. The default setting for both the Results window and the Viewer is the built-in Standard scheme, which uses a white background and dark text. There are other built-in schemes as well as slots for custom schemes. The settings for the Viewer affect all Viewer windows at once. Choosing an overall scheme from the General tab will reset all custom settings to the settings determined by that scheme.

5. Managing multiple sets of preferences

Stata’s preferences are automatically saved when you exit Stata, and they are reloaded when Stata is launched. However, sometimes you may wish to rearrange Stata’s windows temporarily and then revert to your preferred arrangement. You can do this by saving your preferences to a named preference set and loading it later. Any changes you make to Stata’s preferences after loading a preference set do not affect the set; the set remains untouched unless you specifically overwrite it.

To manage preferences, open the Edit > Preferences menu, and do any of the following:

  • Select a preference set from the Load preference set menu to load it. Several different preference sets come installed with Stata, some meant for small screens, others meant for giving presentations involving Stata. They are worth a look.
  • Select Save preference set > New preference set… to save the current preferences to a set. Enter a name for the set, and click on OK.
  • Select an existing set from the Save preference set menu to overwrite it with the current preferences.
  • Select a preference set from the Delete preference set menu to delete it. Click on OK to verify that you wish to delete the set.
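
Preference sets can also be handled from the Command window in Stata for Windows with the window manage prefs command. This is a sketch only: the set name presentation_layout is hypothetical, and you should confirm the syntax for your Stata version with help window manage.

```stata
* Save the current window layout and settings under a hypothetical name
window manage prefs save "presentation_layout"

* Load that preference set again later
window manage prefs load "presentation_layout"

* Return to the factory-default window preferences
window manage prefs default
```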

6. Closing and opening windows

You can close all windows but the Results and Command windows. If you want to open a closed window, open the Window menu and select the desired window.

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.