Deleting variables and observations in Stata

1. Clear, drop, and keep

In this chapter, we will present the tools for paring observations and variables from a dataset. We saw how to do this using the Data Editor in [GSW] 6 Using the Data Editor; this chapter presents the methods for doing so from the Command window.

There are three main commands for removing data and other Stata objects, such as value labels, from memory: clear, drop, and keep. Remember that they affect only what is in memory. None of these commands alters anything that has been saved to disk.

2. Clear and drop _all

Suppose that you are working on an analysis or a simulation and that you need to clear out Stata’s memory so that you can impute different values or simulate a new dataset. You are not interested in saving any of the changes you have made to the dataset in memory—you would just like to have an empty dataset. What you do depends on how much you want to clear out: at any time, you can have not only data but also metadata such as value labels, stored results from previous commands, and stored matrices. The clear command will let you carefully clear out data or other objects; we are interested only in simple usage here. For more information, see help clear and [D] clear.

If you type the command clear into the Command window, it will remove all variables and value labels. In basic usage, this is typically enough. It has the nice property that it does not remove any stored results, so you can load a new dataset and predict values by using stored estimation results from a model fit on a previous dataset. See help postest and [U] 20 Estimation and postestimation commands for more information.
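A minimal sketch of this behavior follows. It assumes the auto dataset that ships with Stata (loaded with sysuse) stands in for both the previous and the new dataset; mpg_hat is an illustrative variable name.

```stata
* Fit a model on the dataset currently in memory
sysuse auto, clear
regress mpg weight

* clear removes the variables and value labels, but not the stored estimates
clear
ereturn list

* Load a dataset (auto again, as a stand-in for new data) and predict from
* the stored estimates; the new data must contain the model's variables
sysuse auto
predict mpg_hat
summarize mpg mpg_hat
```

Here ereturn list is included only to confirm that the estimation results are still available after clear.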

If you want to be sure that everything is cleared out, use the command clear all. This command will clear Stata’s memory of data and all auxiliary objects so that you can start with a clean slate. The first time you use clear all while you have a graph or dialog open, you may be surprised when that graph or dialog closes; this is necessary so that Stata can free all memory being used.

If you want to get rid of just the data and nothing else, you can use the command drop _all.
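The three levels of clearing can be compared side by side. This is a sketch, assuming the auto dataset shipped with Stata, which defines one value label; label list and ereturn list are used only to inspect what remains in memory.

```stata
* drop _all: removes the variables; value labels should remain defined
sysuse auto, clear
drop _all
describe, short
label list

* clear: removes the variables and the value labels
sysuse auto, clear
clear
label list

* clear all: removes data and all auxiliary objects, including stored results
* (this also closes any open graphs and dialogs)
sysuse auto, clear
regress mpg weight
clear all
ereturn list
```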

3. Drop

The drop command is used to remove variables or observations from the dataset in memory.

  • If you want to drop variables, use drop varlist.
  • If you want to drop observations, use drop with an if or an in qualifier or both.

We will use the afewcarslab dataset to illustrate drop:

These changes are only to the data in memory. If you want to make the changes permanent, you need to save the dataset.
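Both forms of drop can be sketched as follows. The example assumes the auto dataset that ships with Stata as a stand-in for afewcarslab, so the variable names and the file name auto_trimmed are illustrative rather than taken from the original listing.

```stata
sysuse auto, clear

* Drop variables: drop varlist
drop headroom trunk

* Drop observations: drop with an in qualifier, an if qualifier, or both
drop in 1/3
drop if mpg > 21
drop if missing(rep78) in 1/20

* Check what is left in memory
describe, short

* Nothing on disk has changed so far; save to make the result permanent
* (auto_trimmed is a hypothetical file name)
save auto_trimmed, replace
```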

4. Keep

keep is the mirror image of drop: it tells Stata to drop all variables except those named in a varlist, or all observations except those selected by an if or in qualifier. Just like drop, keep can be used with a varlist or with qualifiers but not with both at once. We use a clear command at the start of this example so that we can reload the afewcarslab dataset:
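A minimal sketch of keep, assuming the auto dataset that ships with Stata as a stand-in for afewcarslab (the variable names below belong to auto, not to the original listing):

```stata
* Start from an empty dataset, then load the data
clear
sysuse auto

* Keep observations: everything outside rows 4 through 7 is dropped
keep in 4/7
list make price mpg

* Reload, then keep only the observations that satisfy a condition
clear
sysuse auto
keep if mpg <= 21

* Keep variables: every variable not named here is dropped
keep make price mpg
describe, short
```

Keeping observations and keeping variables take two separate commands because a single keep accepts either a varlist or qualifiers, not both.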

Source: STATA (2021), Getting Started with Stata for Windows, Stata Press Publication.
