Software for Content Analysis

This appendix describes computer software that may be useful for content analysis. The list of programs here is by no means complete, and it is purely descriptive, not a GAO endorsement of any program. The descriptions focus on features of the software that are necessary or optional for use in content analysis; they do not cover other features that are not relevant to content analysis.

The content analyst must carry out several of the following six functions:

Edit: generate and edit recorded information, including the creation of ASCII files.

Code: mark recording units and attach category codes.

Search: identify specific words, phrases, and categories.

Count: count the number of specific words, phrases, or categories in each recording unit.

Retrieve: retrieve specific words, phrases, or categories.

Export: create a computer file for analysis by statistical packages.

Accordingly, the software in table II.1 is described in this appendix primarily in terms of these functions. The table is organized so that the software with the greatest number of features is at the top and the software with the fewest is at the bottom.

1. askSam

askSam was designed not for content analysis but as a general-purpose database manager that can handle structured and unstructured qualitative and quantitative data. This description of its features is based on askSam version 2.0a for Windows. askSam has been used in several GAO projects that involved the analysis of large amounts of textual information, including (1) transcripts of focus group discussions; (2) structured interviews consisting of 100 questions asked of 200 persons, several of the questions being open-ended; (3) a COBOL database transformed into an askSam database consisting of thousands of records, each including one open-ended free-text field; and (4) an automated version of the GAO open recommendations report.

Text to be coded can be prepared on a word processor, converted to an ASCII file, and then imported into askSam. However, askSam can also import information directly from a variety of formats, such as dBase and WordPerfect (5.x and 6.0). The program’s built-in word processor is relatively flexible and can be used to enter data.

Text passages can be coded from within askSam’s word processor by text-editing. That is, while the text is displayed on the screen, a code is typed in at the beginning of the passage and a single character is placed at the end of the passage. A form of automatic coding is also available; a selected character that appears in the raw text, a colon for example, can serve as a code, or field character. The text that follows that code, on the same line, can be analyzed as a coded passage.
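The two coding styles just described can be illustrated with a short Python sketch. It is not askSam’s own syntax or file format: the bracketed code (`[COST]`), the `|` end character, and the helper names `inline_passages` and `field_passages` are hypothetical. The first function extracts passages that begin with a code and stop at the end character; the second treats a single word in front of a field character (a colon by default) as the code and the rest of that line as the coded passage.

```python
import re

# Hypothetical markup, not askSam's: "[CODE]" opens a passage and a
# single end character ("|") closes it.
INLINE = re.compile(r"\[(\w+)\](.*?)\|", re.DOTALL)


def inline_passages(text):
    """Return (code, passage) pairs for text marked up as [CODE] ... |."""
    return [(m.group(1), m.group(2).strip()) for m in INLINE.finditer(text)]


def field_passages(text, field_char=":"):
    """Field-character coding: on each line, a single word followed by the
    field character is the code, and the rest of that line is the passage."""
    field = re.compile(r"^\s*(\w+)" + re.escape(field_char) + r"\s*(.*)$")
    pairs = []
    for line in text.splitlines():
        m = field.match(line)
        if m:
            pairs.append((m.group(1), m.group(2).strip()))
    return pairs


doc = """Respondent: 17
Concern: cost of child care
[COST] We simply cannot afford the fees.| Other remarks follow.
"""
print(inline_passages(doc))  # [('COST', 'We simply cannot afford the fees.')]
print(field_passages(doc))   # [('Respondent', '17'), ('Concern', 'cost of child care')]
```

Requiring a single word before the field character is a design choice of the sketch; it keeps ordinary sentences that happen to contain a colon from being read as coded fields.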

The program has strong search capabilities for words (including codes) and phrases. Words and phrases can be counted, thus providing the basis for content analysis. The full texts for all instances of a code can also be retrieved and displayed on the screen or printed. There is no simple way to export the results of code counts to statistical programs for further analysis.
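Counting words and phrases and retrieving every passage under a code can be sketched in Python as follows. The sketch assumes coded passages are held as (code, text) pairs; `count_terms` and `retrieve` are hypothetical helper names, and the matching rules (case-insensitive, whole words only) are choices made for the example, not a description of askSam’s search engine.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count case-insensitive, whole-word occurrences of each word or phrase."""
    counts = Counter()
    for term in terms:
        # Words of a phrase may be separated by any run of whitespace.
        pattern = r"\b" + r"\s+".join(map(re.escape, term.split())) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def retrieve(passages, code):
    """Return the full text of every passage tagged with `code`."""
    return [text for c, text in passages if c == code]


text = ("Child care costs are high. The cost of child care keeps rising; "
        "child care is scarce.")
print(count_terms(text, ["child care", "cost"]))
# Counter({'child care': 3, 'cost': 1}); "costs" is not counted as "cost"

passages = [("COST", "We cannot afford it."),
            ("ACCESS", "No centers nearby."),
            ("COST", "Fees doubled.")]
print(retrieve(passages, "COST"))  # ['We cannot afford it.', 'Fees doubled.']
```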

askSam’s great versatility makes it harder to learn and somewhat more awkward to use than some of the more specialized programs, such as AQUAD and Textbase Alpha.

2. Textbase Alpha

Textbase Alpha was developed for the qualitative analysis of data from interviews. Although not designed for content analysis, it has some numeric analysis features, and it can produce an output file that SPSS can use directly for categorical data analysis.

Text to be coded is prepared on a word processor and converted to an ASCII file. A separate data file is created for each document. Supplementary data, such as identifiers and demographic variables, may be added at this time.

In coding, the analyst moves the cursor to mark the beginning and end of a recording unit and then keys the code so that it appears in a special data entry box at the bottom of the screen. The program also includes a prestructured coding feature in which the paragraph format of the text (prepared in the word processor) leads to a form of automatic coding. This may be especially useful for handling the responses to interviews whose paragraph-like structure corresponds to a series of questions.
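Prestructured coding of this kind can be approximated in a few lines: if each answer in an interview transcript is typed as its own paragraph, in question order, the paragraph’s position alone supplies the code. The Python sketch below assumes blank lines separate paragraphs and uses a hypothetical `Q1`, `Q2`, ... labeling scheme; it is not Textbase Alpha’s actual procedure.

```python
import re


def prestructured_codes(document, prefix="Q"):
    """Assign Q1, Q2, ... to successive paragraphs separated by blank lines."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    return {f"{prefix}{i}": p for i, p in enumerate(paragraphs, start=1)}


interview = """I have lived here ten years.

The fees are too high.

More evening hours would help.
"""
for code, answer in prestructured_codes(interview).items():
    print(code, answer)
# Q1 I have lived here ten years.
# Q2 The fees are too high.
# Q3 More evening hours would help.
```

Because position is the only signal, a skipped question still needs its own paragraph (even one reading “no answer”); otherwise every later answer shifts to the wrong code.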

Textbase Alpha has flexible procedures for text retrieval by code. A search may be made across all documents or only selected ones (for example, only Hispanic respondents if ethnicity has been added as a demographic variable). The results of searching text passages are saved in an ASCII file, which can be viewed on screen or imported into a word processor for editing.
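Retrieval restricted to selected documents can be sketched as a filter on each document’s supplementary variables, followed by a plain-text dump of the hits. The `Document` class, the `ethnicity` variable, the helper names, and the tab-separated output layout are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    demographics: dict  # e.g. {"ethnicity": "Hispanic"}
    passages: list      # (code, text) pairs


def retrieve_by_code(docs, code, where=None):
    """Return (doc_id, text) for every passage tagged `code`, searching only
    documents whose demographics match every key/value pair in `where`."""
    where = where or {}
    hits = []
    for d in docs:
        if all(d.demographics.get(k) == v for k, v in where.items()):
            hits.extend((d.doc_id, t) for c, t in d.passages if c == code)
    return hits


def save_ascii(hits, path):
    """Write one tab-separated line per hit; non-ASCII characters become '?'."""
    with open(path, "w", encoding="ascii", errors="replace") as f:
        for doc_id, text in hits:
            f.write(f"{doc_id}\t{text}\n")


docs = [
    Document("R01", {"ethnicity": "Hispanic"},
             [("COST", "Fees are too high."), ("ACCESS", "No bus route.")]),
    Document("R02", {"ethnicity": "White"}, [("COST", "Costs doubled.")]),
    Document("R03", {"ethnicity": "Hispanic"}, [("COST", "Cannot afford it.")]),
]
print(retrieve_by_code(docs, "COST", {"ethnicity": "Hispanic"}))
# [('R01', 'Fees are too high.'), ('R03', 'Cannot afford it.')]
```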

The frequency of some or all codes can be counted, with the results also stored in an ASCII file. The program will also count all or selected words in the textual material, and the count can be made for all or selected documents.

The program can construct an SPSS file in which each document corresponds to an SPSS case. Demographic variables and codes become SPSS variables.
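The case-by-variable layout described here (one row per document, one column per demographic variable and per code) can be sketched as a CSV export, a format that SPSS and most other statistical packages can import. This is an illustration under assumptions, not the file Textbase Alpha writes: the input structure and the function name are hypothetical, and each code column holds the number of times that code was applied in the document.

```python
import csv
import io
from collections import Counter


def case_by_variable_csv(docs):
    """docs: list of (doc_id, demographics dict, list of codes applied).
    Returns CSV text with one row (case) per document."""
    demo_vars = sorted({k for _, demo, _ in docs for k in demo})
    codes = sorted({c for _, _, applied in docs for c in applied})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["doc_id", *demo_vars, *codes])
    for doc_id, demo, applied in docs:
        freq = Counter(applied)
        writer.writerow([doc_id,
                         *(demo.get(v, "") for v in demo_vars),  # blank if missing
                         *(freq[c] for c in codes)])              # 0 if never applied
    return out.getvalue()


docs = [("R01", {"age": 34, "sex": "F"}, ["COST", "COST", "ACCESS"]),
        ("R02", {"age": 51, "sex": "M"}, ["QUALITY"])]
print(case_by_variable_csv(docs), end="")
# doc_id,age,sex,ACCESS,COST,QUALITY
# R01,34,F,1,2,0
# R02,51,M,0,0,1
```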

3. AQUAD

Like Textbase Alpha, AQUAD was developed primarily for the analysis of qualitative data in circumstances in which there is no intent to transform the results to numbers. However, AQUAD has several features that make it useful for content analysis.

Textual material is prepared on a word processor and converted to ASCII files for processing by AQUAD. Each document constitutes one file. For example, if 10 interviews were conducted, 10 ASCII files would be prepared.

Coding in AQUAD can be performed with the textual material displayed on the screen as on a word processor. The cursor is moved to the line where the passage to be coded begins, and the code is entered. The code carries three kinds of information: the line where the segment begins, the line where it ends, and the category label. If the analyst prefers to mark the codes on hard copy first, AQUAD provides a shortcut by which they can be entered into the database.
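A code that records only a start line, an end line, and a category label is enough both to count codes and to recover each passage in its entirety, as this Python sketch shows. The `Code` tuple and the helper names are hypothetical, and line numbers are assumed to be 1-based and inclusive; AQUAD’s own storage format may differ.

```python
from collections import Counter
from typing import NamedTuple


class Code(NamedTuple):
    start: int   # first line of the segment (1-based, inclusive)
    end: int     # last line of the segment (inclusive)
    label: str   # category label


def segment(lines, code):
    """Return the full text of one coded segment."""
    return "\n".join(lines[code.start - 1:code.end])


def retrieve(lines, codes, label):
    """Return every segment coded with `label`, in document order."""
    return [segment(lines, c) for c in sorted(codes) if c.label == label]


def code_frequencies(codes):
    """Count how often each category label was applied."""
    return Counter(c.label for c in codes)


interview = ["I: How do you get to the center?",
             "R: By bus, but it takes an hour.",
             "R: The fees went up again.",
             "R: We may have to quit."]
codes = [Code(3, 4, "COST"), Code(2, 2, "ACCESS")]
print(retrieve(interview, codes, "COST"))
# ['R: The fees went up again.\nR: We may have to quit.']
print(code_frequencies(codes)["COST"])  # 1
```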

Even though it was not designed as a content analysis program, AQUAD can be used to count code frequencies and to retrieve the coded passages in their entirety.

4. TEXTPACK PC

TEXTPACK PC was designed for analyzing open-ended survey questions, but over the years it has been extended to a variety of applications, such as content analysis and literary and linguistic analysis.

In Version V, Release 4.0, for MS-DOS, the text to be coded is prepared on a word processor, which also produces an ASCII file that the program can read. All documents are included in a single file. TEXTPACK PC converts that file into files in TEXTPACK format for use in the actual analysis. The program has minimal text-editing capability; editing is best done with a word processor.

In coding, the analyst specifies a code “dictionary” of words, sequences of words, and word roots (that is, the beginnings of words). The dictionary is created as an input file for TEXTPACK PC, and the coding is automatic in that the computer looks for and counts matches between the “words” in the dictionary and character sequences in the text file. Unlike in Textbase Alpha and AQUAD, the recording units that are counted are limited to words, phrases, or word roots in the text. TEXTPACK PC also performs a simple word frequency count (that is, without counting sequences or word roots) without the need to create a code dictionary.
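Dictionary-based coding of this kind can be sketched in Python. The dictionary layout is an assumption for illustration, not TEXTPACK PC’s input-file format: each entry is a word, a multiword sequence, or a word root written with a trailing `*`, and maps to a category. The sketch counts every match, so a text matched by two overlapping entries is counted under both; real dictionary programs may resolve such overlaps differently.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def word_matches(pattern, token):
    # Hypothetical convention: a trailing "*" marks a word root, which
    # matches any word beginning with those characters.
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def code_text(text, dictionary):
    """Count, per category, the dictionary matches found in `text`."""
    tokens = tokenize(text)
    counts = Counter()
    for entry, category in dictionary.items():
        words = entry.lower().split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if all(word_matches(w, t) for w, t in zip(words, tokens[i:i + n])):
                counts[category] += 1
    return counts


def word_frequencies(text):
    """Plain word count; no dictionary needed."""
    return Counter(tokenize(text))


dictionary = {"fee*": "COST", "afford": "COST",
              "child care": "CARE", "bus": "ACCESS"}
text = "The fees for child care are high, and we cannot afford the bus fare."
print(code_text(text, dictionary))
# Counter({'COST': 2, 'CARE': 1, 'ACCESS': 1})
print(word_frequencies(text)["the"])  # 2
```

Word roots are convenient but blunt: `fee*` would also match “feedback”, which is one reason dictionaries are usually checked against a word frequency list before use.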

The text retrieval feature identifies and displays words in context. A dictionary file is used to specify the “words” to be searched. Results are displayed in standard KWIC format with identifying information so that each occurrence can be traced back to its location in the text.
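A keyword-in-context (KWIC) listing can be sketched as follows: each hit is shown with a few words of context on either side and tagged with its document identifier and word position, so that it can be traced back to the text. The function names, the `width` parameter, and the output layout are illustrative assumptions rather than TEXTPACK PC’s exact format; punctuation is stripped only when comparing a word with the keyword.

```python
import re


def kwic(docs, keyword, width=3):
    """docs maps document IDs to text. Returns one row per occurrence of
    `keyword`: (doc_id, word_number, left_context, word, right_context)."""
    rows = []
    for doc_id, text in docs.items():
        words = text.split()
        for i, w in enumerate(words):
            if re.sub(r"\W", "", w).lower() == keyword.lower():
                rows.append((doc_id, i + 1,
                             " ".join(words[max(0, i - width):i]),
                             w,
                             " ".join(words[i + 1:i + 1 + width])))
    return rows


def format_kwic(rows, pad=20):
    # Right-align the left context so the keywords line up in one column.
    return "\n".join(f"{d}:{n:<3} {left:>{pad}}  {w}  {right}"
                     for d, n, left, w, right in rows)


docs = {"R01": "The fees are high and the fees keep rising.",
        "R02": "No fees here."}
print(format_kwic(kwic(docs, "fees", width=2)))
```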

A frequency count of codes, produced as described above, can be saved to a file in a form that SPSS and SAS can read.

5. Micro-OCP

Micro-OCP is the microcomputer implementation of a mainframe concordance program known as OCP, or Oxford Concordance Program. A concordance is an alphabetical list of words showing the context of each occurrence of each word. Micro-OCP makes word lists with frequency counts, indexes, and concordances from texts in a variety of languages and alphabets.

Although designed especially for literary analysis in which individual words are the recording units, the program can be used to perform content analysis by using a somewhat limited form of coding.

As with most other programs, the textual material would ordinarily be generated by a word processing program and converted to ASCII format for importation to Micro-OCP. To perform a content analysis, the analyst also requires a “command” file, which can be developed with a word processor or Micro-OCP. The command file is, in effect, a set of instructions that tells Micro-OCP what it is to do with the textual material.

Text passages can be coded with a word processor by inserting code characters at the beginning of a passage, but there is no way to mark the end of a passage. It is therefore possible to count the occurrence of codes, but the ability to retrieve a coded passage is limited, except when words are the recording units.

Different kinds of text passages can be marked (Micro-OCP calls the markings “references”) for later use in the analysis. For example, when the textual material is composed of answers to a series of interview questions, all responses to question 1 could be marked “Q1,” those to question 2 “Q2,” and so on. By appropriate use of Micro-OCP commands, a given content analysis could then be limited to responses to question 1, for example.

Micro-OCP searches for words and brings back the results in one of three basic forms: a word list, an index, or a concordance. Typical content analysis applications are producing (1) a word list of codes, along with the frequencies of the codes, (2) a concordance of selected words as a preliminary to other forms of analysis, (3) a concordance of codes as a crude way to retrieve partial text passages, and (4) an index of selected words or codes to provide the basis for a second-stage “look-up” of words or codes in the text. Used in these ways, Micro-OCP can provide a rudimentary form of content analysis.
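The three output forms, together with the use of references to limit an analysis to, say, responses to question 1, can be sketched in Python. The input is assumed to be a list of (reference, text) pairs; the function names and the `only` parameter are hypothetical and do not correspond to Micro-OCP commands.

```python
import re
from collections import Counter, defaultdict


def selected(records, only=None):
    """Yield (reference, words) for records whose reference is in `only`
    (all records if `only` is None)."""
    for ref, text in records:
        if only is None or ref in only:
            yield ref, re.findall(r"\w+", text.lower())


def word_list(records, only=None):
    """Alphabetical word list with frequencies."""
    counts = Counter(w for _, ws in selected(records, only) for w in ws)
    return sorted(counts.items())


def index(records, only=None):
    """Map each word to the (reference, word number) of every occurrence."""
    idx = defaultdict(list)
    for ref, ws in selected(records, only):
        for n, w in enumerate(ws, start=1):
            idx[w].append((ref, n))
    return dict(sorted(idx.items()))


def concordance(records, only=None, width=2):
    """Every occurrence, sorted alphabetically, with surrounding words."""
    rows = []
    for ref, ws in selected(records, only):
        for i, w in enumerate(ws):
            rows.append((w, ref, " ".join(ws[max(0, i - width):i + width + 1])))
    return sorted(rows)


records = [("Q1", "Fees are high. Fees rose."), ("Q2", "The bus is slow.")]
print(word_list(records, only={"Q1"}))
# [('are', 1), ('fees', 2), ('high', 1), ('rose', 1)]
print(index(records)["fees"])  # [('Q1', 1), ('Q1', 4)]
print(concordance(records, only={"Q2"}, width=1)[0])  # ('bus', 'Q2', 'the bus is')
```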

6. WordCruncher

WordCruncher indexes text files and retrieves and manipulates data from them for viewing or analysis. WordCruncher is primarily designed to display the text associated with words or word combinations (that is, the context). It also provides a count of the number of instances of each word and a way of creating a free-standing thesaurus, facilitating the development of categories for a content analysis.

Before analysts use WordCruncher for content analysis, they generate the text material and code it in a word processor. (Under some circumstances, WordCruncher generates second- and third-level codes automatically.) The codes consist of two parts: a reference symbol and a reference label (such as “question10”), which identify the location of words in the text.

Once the text has been coded, WordCruncher is used to produce an index—a list of words along with their frequencies. Then, when the analyst highlights a word and presses the enter key, the program finds each instance of the word and displays its context.

7. WordPerfect

A word processing program, such as WordPerfect, is indispensable for carrying out a content analysis. It can be used to create a textual database for later use with other programs, to edit an existing database, to attach the codes necessary for content analysis, and to convert from a word processor format to ASCII format. Virtually all word processors can perform these tasks, and their editing capabilities are usually far superior to the primitive editing features found in most specialized content analysis programs.

Some word processors have powerful search features that are useful during the early stages of content analysis. WordPerfect has QuickFinder, which searches for words and phrases within files and across files. The analyst can then scroll through the text to find the words and phrases that QuickFinder has highlighted. Used in this way, the program can be helpful in defining variables and categories and in deciding what material to code.

QuickFinder File Indexer is an enhanced search utility included in WordPerfect 5.2 and later versions. An index of all words in a file or files is created and saved as a basis for all searches. Using the index greatly increases the speed of the search.

QuickFinder allows the analyst to specify quite complex word patterns through the use of search modifiers. Thus, the analyst can search for files containing

  • every one of a set of words (Boolean AND);
  • any one of a set of words (Boolean OR);
  • one word but not another;
  • particular word forms (using “?” and “*” as wild-card characters);
  • phrases (words next to each other);
  • two words within n words of each other; and
  • two words in the same line, sentence, paragraph, page, or section (between two hard page breaks).
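The search modifiers listed above can be approximated in Python to show what each one tests. This sketch is not QuickFinder’s query syntax. The function names are hypothetical; wildcards follow Python’s `fnmatch` rules (`?` matches one character and `*` any run of characters, and `fnmatch` also treats `[...]` as a character set); “within n words” is taken to mean word positions differing by at most n; and only the same-paragraph test is shown, with paragraphs assumed to be separated by blank lines (lines or sentences would differ only in the split pattern).

```python
import fnmatch
import re


def words(text):
    return re.findall(r"\w+", text.lower())


def positions(text, pattern):
    """Word positions matching `pattern` (case-insensitive, with wildcards)."""
    return [i for i, w in enumerate(words(text))
            if fnmatch.fnmatchcase(w, pattern.lower())]


def all_of(text, patterns):            # Boolean AND
    return all(positions(text, p) for p in patterns)


def any_of(text, patterns):            # Boolean OR
    return any(positions(text, p) for p in patterns)


def but_not(text, wanted, unwanted):   # one word but not another
    return bool(positions(text, wanted)) and not positions(text, unwanted)


def has_phrase(text, phrase):          # words next to each other
    target, ws = words(phrase), words(text)
    n = len(target)
    return any(ws[i:i + n] == target for i in range(len(ws) - n + 1))


def within(text, a, b, n):             # within n words of each other
    return any(abs(i - j) <= n
               for i in positions(text, a) for j in positions(text, b))


def same_paragraph(text, a, b):        # both words in one paragraph
    return any(all_of(p, [a, b]) for p in re.split(r"\n\s*\n", text))


files = {"r01.txt": "Child care fees are too high.\n\nThe bus takes an hour.",
         "r02.txt": "Care quality is good; no complaints about cost."}
print([name for name, t in files.items() if all_of(t, ["care", "fee*"])])
# ['r01.txt']
```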

Source: GAO (2013), Content Analysis: A Methodology for Structuring and Analyzing Written Material: PEMD-10.3.1, BiblioGov.
