Introduction to Stata

Stata is a general-purpose statistical software package created in 1985 by StataCorp. Most of its users work in research, especially in the fields of economics, sociology, political science, biomedicine, and epidemiology. According to StataCorp (2016), Stata is “a complete, integrated statistical software package that provides everything you need for data analysis, data management, and graphics”. In short, Stata is software that lets you store and manage data sets, large and small, run statistical analyses on your data, and create some really nice graphs.

Stata’s capabilities include data management, statistical analysis, graphics, simulations, regression, and custom programming. It also has a system to disseminate user-written programs that lets it grow continuously.

The name Stata is a syllabic abbreviation of the words statistics and data. The FAQ for the official Stata forum insists that the correct English pronunciation of Stata “must remain a mystery”; any of “Stay-ta”, “Sta-ta” or “Stah-ta” (rhyming with the three pronunciations of ‘data’) is considered acceptable. More recent updates indicate that Stata employees pronounce it /ˈsteɪtə/.

There are four major builds of each version of Stata:

  • Stata/MP for multiprocessor computers (including dual-core and multicore processors)
  • Stata/SE for large databases
  • Stata/IC, which is the standard version
  • Numerics by Stata, which supports any of the data sizes listed above in an embedded environment

Small Stata, which was the smaller, student version for educational purchase only, is no longer available.

Stata is commonly used by health researchers, particularly those working with very large data sets, because it is powerful software that lets you do almost anything you like with your data.

It’s important to note that Stata is not the only statistical software package – there are many others that you may come across if you pursue a career that requires you to work with data. Other common statistical packages include SPSS and SAS (yes, they all start with ‘S’!). The focus of this session, however, is on Stata.