Data Coding, Entry, and Checking in SPSS – Problem 2.1: Check the Completed Questionnaires

Now examine the incomplete, unclear, or double answers in the questionnaires. Stop and do this now, before proceeding. What issues did you see? The researcher needs to make rules about how to handle these problems and note them on the questionnaires or on a master “coding instructions” sheet so that the same rules are used for all cases.

We have identified at least 11 responses on 6 of the 12 questionnaires that need to be clarified. Can you find them all? How would you resolve them? Write on Figs. 2.1 and 2.2 how you would handle each issue that you see.

Make Rules About How to Handle These Problems

For each type of incomplete, blank, unclear, or double answer, you need to make a rule for what to do. As much as possible, you should make these rules before data collection, but there may well be some unanticipated issues. It is important that you apply the rules consistently for all similar problems so as not to bias your results.

Interpretation of Problem 2.1 and Fig. 2.4

Now we will discuss each of the issues and how we decided to handle them. Of course, some reasonable choices could have been different from ours. We think that the data for Participants 1-6 are quite clear and ready to enter with the help of Fig. 2.3. However, the questionnaires for Participants 7-12 pose a number of minor and more serious problems for the person entering the data. We discuss these problems next and have written our decisions in numbered callout boxes on Fig. 2.4, which shows the surveys and responses for Participants 7-12.

  1. For Participant 7, the GPA appears to be written as 250. It seems reasonable to assume that he meant to include a decimal after the 2, and so we would enter 2.50. We could instead have said that this was an invalid response and coded it as missing. However, missing data create problems in later data analysis, especially for complex statistics. Thus, we want to use as much of the data provided as is reasonable. The important thing here is that you must treat all other similar problems the same way.
  2. For Subject 8, two colleges were checked. We could have developed a new legitimate response value (4 = other). Because this fictitious university requires that students be identified with one and only one of its three colleges, we have developed two missing value codes (as we did for ethnic group and religion in the HSB data set). Thus, for this variable only, we used 98 for multiple checked colleges or other written-in responses that did not fit clearly into one of the colleges (e.g., business engineering or history and business). We treated such responses as missing because they seemed to be invalid and/or because we would not have had enough of any given response to form a reasonable size group for analysis. We used 99 as the code for cases where nothing was checked or written on the form. Having two codes enabled us to distinguish between these two types of missing data, if we ever wanted to later. Other researchers (e.g., Newton & Rudestam, 1999) recommend using 8 and 9 in this case, but we think that it is best to use a code that is very different from the “valid” codes so that they stand out visually in the Data View and will lead to noticeable differences in the Descriptives if you forget to code them as missing values.
  3. Also, Subject 8 wrote 2.2 for his GPA. It seems reasonable to enter 2.20 as the GPA. Actually, in this case, if we enter 2.2, the program will treat it as 2.20 because we will tell it to use two decimal places for this variable.
  4. We decided to enter 3.00 for Participant 9’s GPA. Of course, the actual GPA could be higher or, more likely, lower, but 3.00 seems to be the best choice given the information provided by the student (i.e., “about 3 pt”).
  5. Participant 10 only answered the first two questions, so there were lots of missing data. It appears that he or she decided not to complete the questionnaire. We made a rule that if three out of the first five items were blank or invalid, we would throw out that whole questionnaire as invalid. In your research report, you should state how many questionnaires were thrown out and for what reason(s). Usually you would not enter any data from that questionnaire, so you would only have 11 subjects or cases to enter. To show you how you would code someone’s college if they left it blank, we did not delete this subject at this time.
  6. For Subject 11, there are several problems. First, she circled both 3 and 4 for the first item; a reasonable decision is to enter the average or midpoint, 3.50.
  7. Participant 11 has written in “biology” for college. Although there is no biology college at this university, it seems reasonable to enter 1 = arts and sciences in this case and in other cases (e.g., history = 1, marketing = 2, civil = 3) where the actual college is clear. See the discussion of Issue 2 for how to handle unclear examples.
  8. Participant 11 also entered 9.67 for the GPA, which is an invalid response because this university has a 4-point grading system (4.00 is the maximum possible GPA). To show you one method of checking the entered data for errors, we will go ahead and enter 9.67. If you examine the completed questionnaires carefully, you should be able to spot errors like this in the data and enter a blank for missing/invalid data.
  9. Enter 1 for reading and homework for Participant 11 (even though they were circled rather than checked). Also enter 0 for extra credit (not checked) as you would for all the boxes left unchecked by other participants (except Subject 10, who, as stated in number 5 above, did not complete the questionnaire). Even though this person circled the boxes rather than putting X’s or checks in them, her intent is clear.
  10. As in Point 6, we decided to enter 2.5 for Participant 12’s X between 2 and 3.
  11. Participant 12 also left GPA blank so, using the general (system) missing value code, we left it blank.
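
Several of the rules above are mechanical enough to write down as code, which can help a data-entry person apply them the same way to every case. The following is only a minimal Python sketch of Rules 2, 5, 6, 7, and 10, not part of the SPSS procedure. The function names, the write-in lookup table, and the reading of "three out of the first five items" as "three or more" are our assumptions for illustration.

```python
from typing import List, Optional

# Missing-value codes for the college variable (Rule 2).
MULTIPLE_OR_UNCLEAR = 98  # two colleges checked, or a write-in that fits no single college
BLANK = 99                # nothing checked or written

# Hypothetical lookup for write-ins whose college is clear (Rule 7):
# 1 = arts and sciences, 2 = business, 3 = engineering.
WRITE_IN_TO_COLLEGE = {"biology": 1, "history": 1, "marketing": 2, "civil": 3}


def code_college(checked: List[int], write_in: str = "") -> int:
    """Return the college code to enter for one questionnaire."""
    if len(checked) == 1:
        return checked[0]
    if len(checked) > 1:
        return MULTIPLE_OR_UNCLEAR
    text = write_in.strip().lower()
    if not text:
        return BLANK
    # Unrecognized write-ins are treated as unclear rather than guessed.
    return WRITE_IN_TO_COLLEGE.get(text, MULTIPLE_OR_UNCLEAR)


def code_rating(marked: List[int]) -> Optional[float]:
    """Return the midpoint of the marked values (Rules 6 and 10), or None if blank."""
    if not marked:
        return None
    return sum(marked) / len(marked)


def should_discard(first_five: List[Optional[float]]) -> bool:
    """Rule 5, read here as: discard when three or more of the first five items are blank."""
    blanks = sum(1 for value in first_five if value is None)
    return blanks >= 3


print(code_college([1, 2]))          # Subject 8: two colleges checked
print(code_college([], "biology"))   # Participant 11: clear write-in
print(code_rating([3, 4]))           # Participant 11: circled both 3 and 4
print(should_discard([4, 2, None, None, None]))  # a case like Participant 10
```

Keeping each rule in one small function means a change of policy (for example, coding double answers as missing instead of using the midpoint) is made in one place and then applies to all cases.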

Fig. 2.4. Completed survey with callout boxes showing how we handled problem responses.

Clean up Completed Questionnaires

Now that you have made your rules and decided how to handle each problem, you need to make these rules clear to whoever will enter the data. As mentioned earlier, we put our decisions in callout boxes on Fig. 2.4; a common procedure would be to write your decisions on the questionnaires, perhaps in a different color.
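
Even with clear rules, an invalid value such as the 9.67 GPA from Participant 11 can slip into the entered data. As an illustration only, here is a small Python sketch of a range check over entered values. The record layout, the function name, and the lower bound of 0.00 are assumptions; the 4.00 maximum and the GPA values come from the discussion above.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]


def find_out_of_range(records: List[Record], field: str,
                      low: float, high: float) -> List[Tuple[float, float]]:
    """Return (id, value) pairs whose value lies outside [low, high].

    Blank (None) values are skipped because they are already treated as missing.
    """
    flagged = []
    for record in records:
        value = record[field]
        if value is not None and not (low <= value <= high):
            flagged.append((record["id"], value))
    return flagged


# GPA values as entered for some of the participants discussed above.
entered = [
    {"id": 7, "gpa": 2.50},
    {"id": 8, "gpa": 2.20},
    {"id": 9, "gpa": 3.00},
    {"id": 11, "gpa": 9.67},   # invalid on a 4-point scale
    {"id": 12, "gpa": None},   # left blank
]

flagged = find_out_of_range(entered, "gpa", 0.0, 4.0)
print(flagged)  # [(11, 9.67)]
```

Each flagged case should be checked against the paper questionnaire before deciding whether to correct the value or leave it blank.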

Source: Morgan George A, Leech Nancy L., Gloeckner Gene W., Barrett Karen C. (2012), IBM SPSS for Introductory Statistics: Use and Interpretation, Routledge; 5th edition; download Datasets and Materials.
