Data Analysis Using STATA: Use and Interpretation

Stata is a general-purpose statistical software package created in 1985 by StataCorp. The name Stata is a syllabic abbreviation of the words statistics and data. The FAQ for the official forum of Stata insists that the correct English pronunciation of Stata “must remain a mystery”; any of “Stay-ta”, “Sta-ta” or “Stah-ta” (rhyming with the three pronunciations of ‘data’) is considered acceptable. More recent updates indicate that Stata employees pronounce it /ˈsteɪtə/.

Most of its users work in research, especially in the fields of economics, sociology, political science, biomedicine, and epidemiology. According to StataCorp (2016), Stata is “a complete, integrated statistical software package that provides everything you need for data analysis, data management, and graphics”. In short, Stata is software that lets you store and manage data sets (large and small), run statistical analyses on your data, and create some really nice graphs.

Stata allows you to write code or use menus to perform your analysis. Stata has two primary menu tabs: Graphics and Statistics. Within “Statistics” there are twenty-one sub-tabs, with numerous further tabs nested inside them. Within “Graphics” there are twenty-one tabs as well (do you think the people at Stata like to play blackjack?). I admit, this can be a bit daunting and time-consuming if you are trying to find a specific function. But wait, there’s more. In the command box you can type “help” followed by what you are looking for, so I don’t waste time hunting through the menus. After you run a command through the menus, its code shows up in the “Review” box. I can then copy and paste the command into a “do-file”. A do-file is a text document that lets you submit more than one command to Stata at once.
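As a rough illustration of the workflow just described, here is a minimal sketch of looking something up with “help” and then collecting commands in a do-file. The file name first_analysis.do and the choice of variables are assumptions made for the example; the auto data set is one of the example data sets that ships with Stata.

```stata
* Typed in the command box: look up documentation
help regress
search logistic

* Contents of a hypothetical do-file, e.g. first_analysis.do
* Load an example data set that ships with Stata
sysuse auto, clear

* Describe the data, then run a simple regression
summarize price mpg weight
regress price mpg weight
```

Once the file is saved, typing “do first_analysis.do” in the command box should run every line in order, which is the “more than one command at once” behaviour mentioned above.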

Stata allows you to have more than one do-file open at a time. This is a big plus because it makes it easy to copy and paste from other projects’ do-files into the current one. Using do-files is significantly quicker than using the menus if you have created template do-files, especially for graphs, which have a great many options. It takes less than a minute to copy commands from a template and paste them into your current project. Stata is also extremely efficient at running repetitive analyses when you use macros and loops in a do-file. This sounds like it may be difficult, but it’s not.
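To give a sense of what macros and loops look like in a do-file, here is a small sketch under stated assumptions: the macro name “outcomes” and the variables chosen are purely illustrative, and the auto example data set stands in for your own data.

```stata
* Load an example data set that ships with Stata
sysuse auto, clear

* Store a list of variable names in a local macro (the name is illustrative)
local outcomes price mpg weight

* Run the same analysis once for each variable in the list
foreach var of local outcomes {
    display "Outcome: `var'"
    summarize `var'
    regress `var' foreign
}
```

Local macros only last for the run in which they are defined, so these lines should be run together as one do-file rather than pasted into the command box one at a time.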

How well is Stata supported by StataCorp? On average, Stata sends out updated files every two months with new features and/or fixes for reported glitches. The reference guide for Stata 13 is 281 pages filled with examples and links to the data sets used in them. The professional community also provides incredible support. Stata allows commands written by third parties (also known as modules) to be imported into the software. The website http://ideas.repec.org/s/boc/bocode.html is a warehouse for hundreds of third-party commands, which have been tested before being made public. Running a search there for “logistic” returned 128 results. The bottom line is that Stata will run every analysis that the other major statistical packages can, if not more. It is a very efficiently organized program, which helps when you are learning to use it. Third-party professionals are continuously offering new functions, and Stata adds new features without charging a “new version” fee. As an added bonus, Stata is reasonably priced and has no add-on charges.
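For readers who want to try this from inside Stata, the sketch below shows one plausible way to check for official updates and to pull in a third-party module. It assumes an internet connection, and it assumes that the address above is the SSC archive that Stata’s “ssc” command reads from. The package estout is only an example; any module from the archive could be substituted.

```stata
* Check whether official updates are available from StataCorp
update query

* Search official and third-party sources for a keyword
search logistic, all

* Read the description of a third-party package, then install it
ssc describe estout
ssc install estout
```

After installation, the new command behaves like a built-in one, including its own help file (here, “help estout”).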

There are four major builds of each version of Stata:

  • Stata/MP for multiprocessor computers (including dual-core and multicore processors)
  • Stata/SE for large databases
  • Stata/IC, which is the standard version
  • Numerics by Stata, which supports any of the data sizes listed above in an embedded environment

Small Stata, which was the smaller, student version for educational purchase only, is no longer available. Stata is commonly used among health researchers, particularly those working with very large data sets, because it is powerful software that lets you do almost anything you like with your data.

Stata is not the only statistical software package – there are many others that you may come across if you pursue a career that requires you to work with data. Other common statistical packages include SPSS and SAS (yes, they all start with ‘s’!). The focus for this session, however, is on Stata.