Defining the Variables in Content Analysis

The assignment’s evaluation questions lead directly to the relevant variables. In the Stars and Stripes example, we asked, “To what extent does the content of news stories in Stars and Stripes indicate news management or censorship?” In practice, however, defining a variable may be separated into two parts: conceptualizing the variable and specifying its categories.

1. Conceptualizing and Categorizing

“Conceptualizing a variable” means identifying subjects, things, or events that vary and that will help us answer the question. In the Stars and Stripes example, the two variables “news story topic” and “image of the military” were defined. News story topic varied across the stories that appeared in Stars and Stripes, and distorted coverage of topics by the paper might indicate news management or censorship. Image of the military could also conceivably vary across the paper’s stories, and imbalance in the image of the U.S. military might be another indicator of news management or censorship.

“Specifying the categories” distinguishes one subject, thing, or event from others by placing each of them into one of a limited number of categories. Thus, to completely define a variable for content analysis, we need to specify its categories. A variable may be either nominal or ordinal, and its categories must be mutually exclusive and exhaustive. Nominal variables have no intrinsic order. For example, gender can be treated as a nominal variable with two categories (male and female), but there is nothing about either category that warrants ranking one ahead of the other. Ordinal variables do have an intrinsic order. For example, attitude is often divided into categories such as greatly dislike, moderately dislike, indifferent to, moderately like, and greatly like. These categories can be ranked from top to bottom or bottom to top.
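The difference between the two kinds of variable can be illustrated with a minimal Python sketch. The names GENDER, ATTITUDE, and rank are hypothetical and chosen only for illustration; the categories themselves come from the examples above.

```python
# Nominal variable: the categories have no intrinsic order, so an
# unordered set is enough to represent them.
GENDER = frozenset({"male", "female"})

# Ordinal variable: the categories are stored in rank order, lowest first.
ATTITUDE = (
    "greatly dislike",
    "moderately dislike",
    "indifferent to",
    "moderately like",
    "greatly like",
)


def rank(category: str) -> int:
    """Return the position of an attitude category on the ordinal scale."""
    return ATTITUDE.index(category)


# Ordinal categories can be compared by rank; nominal categories can only
# be tested for equality or membership.
print(rank("moderately like") > rank("indifferent to"))  # True
print("female" in GENDER)  # True
```

Reversing the ATTITUDE tuple would rank the categories from top to bottom instead; either direction is a valid ordering as long as it is applied consistently.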

Categories must be mutually exclusive and exhaustive. If they overlap, then information may be erroneously classified. Likewise, if the categories do not cover all possible classes of information, then information may be misclassified or not recorded at all.

News story topic in the Stars and Stripes example was a nominal variable that had five categories: acquired immunodeficiency syndrome (AIDS), Iran-contra, strategic issues (such as Intermediate Nuclear Forces and the Strategic Defense Initiative), the 1988 presidential campaign, and other. Each news story could thus be conceptually labeled as fitting into one of these categories. The first four categories corresponded to politically sensitive topics, so they seemed relevant to the evaluation question. The fifth category, “other,” ensured that all stories would be labeled.

Military image was also a nominal variable, but it had four categories: negative, neutral, positive, and mixed. Each news story about the U.S. military was placed into one of these categories. If the variable had had only the three categories negative, neutral, and positive, it would have been ordinal rather than nominal. The category “mixed” was included because without it some stories could not have been classified. This fourth category helped ensure that the categories were mutually exclusive and exhaustive.
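One way to see what “mutually exclusive and exhaustive” demands is to check a set of coded stories mechanically. The following is a minimal sketch, not the evaluators’ actual procedure: the function check_codes, the story identifiers, and the codings are all invented for illustration, and only the category names come from the example.

```python
# Categories of the "image of the military" variable in the example.
MILITARY_IMAGE = ("negative", "neutral", "positive", "mixed")


def check_codes(codes, categories):
    """Return the stories whose coding breaks exclusivity or exhaustiveness.

    `codes` maps a story identifier to the list of categories a coder
    judged applicable. A valid coding assigns exactly one known category.
    """
    problems = {}
    for story, assigned in codes.items():
        unknown = [c for c in assigned if c not in categories]
        if unknown:
            problems[story] = f"unknown category: {unknown}"
        elif len(assigned) == 0:
            problems[story] = "no category fits (not exhaustive)"
        elif len(assigned) > 1:
            problems[story] = "several categories fit (not exclusive)"
    return problems


# Invented codings, for illustration only.
image_codes = {
    "story-1": ["positive"],
    "story-2": ["mixed"],
    "story-3": ["negative", "positive"],  # belongs in "mixed" instead
    "story-4": [],                        # the coder found no fitting category
}
print(check_codes(image_codes, MILITARY_IMAGE))
```

In this sketch, a story that shows the military both favorably and unfavorably is flagged unless it is coded “mixed,” which is the role the fourth category plays in the example.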

2. Determining the Number of Categories

What dictates the number of categories for a variable? Some variables seem to have an intrinsic set of categories. For example, a week can have seven categories (the seven days) or two (weekdays and weekend). For news story topic, the list of possible categories is virtually endless, so the evaluator must use judgment and be guided by the evaluation question.

In the Stars and Stripes assignment, the evaluators needed evidence to show the extent, if any, of news management or censorship. Studying all possible categories of news story was not feasible, so they chose only those for which they could determine some editorial manipulation.

The practical limit to the number of categories that can be handled is important. Both the coding process and the analytical tools available may suggest upper limits. And, certainly, the interpretation of results can become very complicated when categories are numerous. Generally speaking, the categories assigned to each variable should not exceed seven in the final steps of the analysis but may include more in the coding process because later, after the results of coding are known, evaluators can combine some categories. They may not, however, expand them.
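Combining categories after coding can be sketched in a few lines of Python. This is an illustration under stated assumptions: the COMBINE mapping, the function combine_categories, and the list of coded stories are hypothetical, and the idea of coding Intermediate Nuclear Forces and the Strategic Defense Initiative separately before merging them into “strategic issues” is an assumption made for the example.

```python
from collections import Counter

MAX_FINAL_CATEGORIES = 7  # rule of thumb stated in the text

# Hypothetical mapping from fine-grained coding categories to the
# combined categories used in the final analysis.
COMBINE = {
    "Intermediate Nuclear Forces": "strategic issues",
    "Strategic Defense Initiative": "strategic issues",
    "Iran-contra": "Iran-contra",
    "other": "other",
}


def combine_categories(codes, mapping):
    """Map each fine-grained code to its combined category.

    Raises KeyError if a code has no entry in the mapping, so no story
    is silently dropped.
    """
    return [mapping[code] for code in codes]


# Invented coding results, for illustration only.
fine_codes = [
    "Intermediate Nuclear Forces",
    "Strategic Defense Initiative",
    "Iran-contra",
    "other",
    "Strategic Defense Initiative",
]

final_counts = Counter(combine_categories(fine_codes, COMBINE))
assert len(final_counts) <= MAX_FINAL_CATEGORIES
print(final_counts)
```

The mapping runs in one direction only. Once three stories are recorded simply as “strategic issues,” nothing in the data says which of them concerned which topic, which is why categories can be combined after coding but not expanded.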

Some ordered variables have a natural middle or neutral point. For those that do, selecting an odd number of categories allows coders to record a middle ground. For example, for observations about attitude, the five categories greatly dislike, moderately dislike, indifferent to, moderately like, and greatly like are better than the four categories greatly dislike, moderately dislike, moderately like, and greatly like. This is because the latter scale unrealistically forces all attitudes into either negative or positive categories.
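The effect of an odd versus an even number of categories can be shown with a short Python sketch. The function name neutral_point is hypothetical; the two scales are the ones from the attitude example.

```python
def neutral_point(scale):
    """Return the middle category of an ordered scale.

    Returns None when the scale has an even number of categories and
    therefore no single middle category.
    """
    if len(scale) % 2 == 0:
        return None
    return scale[len(scale) // 2]


five_point = (
    "greatly dislike",
    "moderately dislike",
    "indifferent to",
    "moderately like",
    "greatly like",
)
four_point = (
    "greatly dislike",
    "moderately dislike",
    "moderately like",
    "greatly like",
)

print(neutral_point(five_point))  # indifferent to
print(neutral_point(four_point))  # None
```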

Source: GAO (2013), Content Analysis: A Methodology for Structuring and Analyzing Written Material: PEMD-10.3.1, BiblioGov.
