Once evaluators have defined the variables and selected the textual material, their next major task is to define the recording units. A recording unit is the portion of text to which evaluators apply a category label. For example, the objective of the Stars and Stripes evaluation was to draw conclusions about whether news stories had been subject to news management or censorship, so the news story was the focus of analysis. Therefore, the news story became the recording unit; each news story was categorized by topic, and each news story about the U.S. military was categorized by image. In general, six recording units are commonly used: word, word sense, sentence, paragraph, theme, and whole text (Weber, 1990).
1. Word
When words are the recording unit, evaluators categorize each individual word. This recording unit is well-defined because we know the physical boundaries of a word. When all words have been placed in categories, a content analysis becomes simply a word count. Although word counts would probably find limited application in GAO, knowing the frequency of key words may be useful. Most content analysis software and some other specialized forms of software can automatically count individual words.
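A key-word frequency count of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_key_words`, the sample sentence, and the rule that a word is a run of letters (optionally with internal apostrophes) are all hypothetical choices, not part of the GAO methodology.

```python
import re
from collections import Counter

def count_key_words(text, key_words):
    """Count how often each key word appears in text, ignoring case."""
    # Assumption: a word is a run of letters, optionally with internal apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(words)
    # A Counter returns 0 for key words that never occur.
    return {kw: counts[kw.lower()] for kw in key_words}

# Hypothetical sample text, for illustration only.
sample = "The military reviewed the story. Censorship of the story was denied."
result = count_key_words(sample, ["story", "censorship", "military"])
print(result)  # {'story': 2, 'censorship': 1, 'military': 1}
```

Because the word is the recording unit here, no human judgment is involved in finding unit boundaries; the only decisions are the tokenizing rule and the list of key words.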
2. Word Sense
“Word sense” is a variation on words as units. Some computer programs can automatically distinguish between the multiple meanings of a word and can identify phrases that constitute semantic units the way words constitute semantic units. The word senses can then be counted just as if they were words. Applications in GAO are probably limited.
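One part of this idea, counting a phrase as a single semantic unit, can be sketched as follows. The sketch assumes the evaluator supplies the phrase list by hand; it does not attempt to distinguish between multiple meanings of a word, which requires specialized software. The function name `count_units` and the sample text are hypothetical.

```python
import re
from collections import Counter

def count_units(text, phrases):
    """Count words, treating each listed phrase as one unit."""
    lowered = text.lower()
    # Try longer phrases first so a long phrase wins over a shorter one inside it.
    ordered = sorted((p.lower() for p in phrases), key=len, reverse=True)
    # Fall back to single words (runs of letters) when no phrase matches.
    alternatives = [re.escape(p) for p in ordered] + [r"[a-z]+"]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")
    return Counter(pattern.findall(lowered))

# Hypothetical sample text and phrase list, for illustration only.
text = "The armed forces issued a press release. The press criticized the release."
counts = count_units(text, ["armed forces", "press release"])
print(counts["press release"], counts["press"], counts["release"])  # 1 1 1
```

Here "press release" is counted once as a unit, separately from the later stand-alone uses of "press" and "release".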
3. Sentence
Sentences may occasionally be useful recording units, especially in structured material such as written responses to an open-ended questionnaire item. Although the physical boundaries of sentences are well-defined, using them as units implies human coding, because computer programs cannot automatically classify sentences as they do words and word senses.
4. Paragraph
A paragraph is a structured unit above the sentence, so it can be a recording unit. Sometimes, however, a paragraph embraces too many ideas for consistent assignment of the text segment to a single category. This leads to the problem of unreliable coding (discussed in chapter 4).
5. Theme
Theme is a useful recording unit, if somewhat ambiguous. Holsti describes a theme as “a single assertion about some subject” (1969, p. 116). The boundary of a theme delineates a single idea; we are not restricted to the individual boundaries of sentences and paragraphs. Theme is probably better suited than sentences to coding open-ended questionnaires because a theme can include the several sentences that are commonly a response to such questions. The evaluator who defines theme as a recording unit should include guidance regarding whether, at one extreme, sentence fragments can be coded or, at the other, paragraphs or multiple paragraphs. However, even with such guidance, coders necessarily use their judgment in determining the boundaries of particular theme units, so their coding may be less reliable than coding of units with physical boundaries.
6. Whole Text
Whole text is a recording unit larger than a paragraph but still with clearly defined physical boundaries. For example, in the Stars and Stripes assessment, a whole news story was the recording unit. A news story has physical and other attributes that coders can ordinarily use to distinguish it easily from editorials or syndicated columns. In the extreme, an entire document may be a recording unit. Whole-text coding can be unreliable when the text is long, because a long text, like a long paragraph, may embrace too many ideas for consistent assignment to a single category.
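The recording units with physical boundaries (word, sentence, paragraph, and whole text) can be separated mechanically, as the following sketch suggests. It rests on simplifying assumptions that real material may violate: paragraphs are separated by blank lines, and a sentence ends at a period, exclamation point, or question mark followed by white space (so abbreviations such as "U.S. military" would be split incorrectly). The function name `split_units` and the sample text are hypothetical. Theme units are omitted because their boundaries depend on coder judgment.

```python
import re

def split_units(text, unit):
    """Split text into recording units that have physical boundaries."""
    if unit == "whole_text":
        return [text.strip()]
    if unit == "paragraph":
        # Assumption: paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if unit == "sentence":
        # Naive rule: a sentence ends at ., !, or ? followed by white space.
        flat = " ".join(text.split())
        return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]
    if unit == "word":
        return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    raise ValueError(f"unknown unit: {unit}")

# Hypothetical sample text, for illustration only.
doc = ("The base opened today. Officials declined comment.\n\n"
       "Reporters were denied access!")
for unit in ("whole_text", "paragraph", "sentence", "word"):
    print(unit, len(split_units(doc, unit)))
# whole_text 1, paragraph 2, sentence 3, word 11
```

The same text yields 1, 2, 3, or 11 units depending on the choice, which is why the recording unit has to be fixed before coding begins.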
Source: GAO (2013), Content Analysis: A Methodology for Structuring and Analyzing Written Material: PEMD-10.3.1, BiblioGov.