Research Reliability and Validity of the Measuring Instrument

1. Definition and Overview

In the social sciences, measurement can be defined as the process that enables us to establish a relationship between abstract concepts and empirical indicators (Carmines and Zeller, 1990). Through measurement, we try to establish a link between one or several observable indicators (a cross in a questionnaire, a sentence in a meeting or a document, an observed behavior, etc.) and an abstract concept, which is not directly observable, nor directly measurable, and which we aim to study. We try to determine the degree to which a group of indicators represents a given theoretical concept. One of the main preoccupations of researchers is to verify that the data they plan to collect in the field relates as closely as possible to the reality they hope to study. However, numerous occasions for error are likely to arise, complicating every method of measuring the phenomenon or subject being observed. These errors can include: actors giving false information; tired observers transcribing their observations badly; changes in the attitudes of respondents between two surveys; or errors in the process of transforming qualitative data into quantitative data. It is therefore essential to ensure that empirical indicators (or field data) are comparable to the measurements employed. This will provide the best possible representation of the phenomenon being investigated. The researcher must, then, consider – for each measurement used – the question of its reliability and its validity. This in turn makes it necessary to address the process by which this measurement has been obtained or arrived at: researchers must demonstrate that the instrument or the instruments used enable them to obtain reliable and valid measurements.

To be reliable, a measuring instrument must allow different observers to measure the same subject with the same instrument and arrive at the same results, or permit an observer to use the same instrument to arrive at similar measures of the same subject at different times. To be valid, the instrument must on the one hand measure what it is expected to measure, and on the other hand give exact measures of the studied object.

The validity, just as much as the reliability, of a measuring instrument is expressed in degrees (more or less valid, more or less reliable) and not in absolute terms (valid or not valid, reliable or not reliable). Researchers can assess the validity or reliability of an instrument in comparison with other instruments.

2. Assessing the Reliability of a Measuring Instrument

In assessing reliability, the researcher is assessing whether measuring the same object or the same phenomenon with the same measuring instrument will give results that are as similar as possible. Correlations between duplicated or reproduced measurements of the same object or phenomenon, using the same instrument, need to be calculated. This duplication can be carried out either by the same observer at different times, or by different observers simultaneously.

2.1. Measuring instruments used in quantitative research

To assess the reliability and validity of quantitative measuring instruments, researchers most often refer to the true value model. This consists of breaking the result of a measurement down into different elements: the true value (theoretically the perfect value) and error terms (random error and non-random error).

The measure obtained = true value + random error + non-random error

  • ‘Random error’ occurs when the phenomenon measured by an instrument is subject to vagaries, such as circumstances, the mood of the people being questioned, or fatigue on the part of the interviewer. It is important, however, to note that the very process of measuring introduces random error. The distinction between different indicators used should not be made according to whether or not they induce random error, but rather according to the degree of random error. Generally, random error is inversely related to the degree of reliability of the measuring instrument: the greater the reliability, the smaller the random error.
  • ‘Non-random error’ (also called ‘bias’) refers to a measuring instrument producing a systematic biasing effect on the measured phenomenon. A thermometer that measures 5 degrees more than the real temperature is producing a non-random error. The central problem of the validity of the measuring instrument is bound up with this non-random error. In general terms, the more valid the instrument, the smaller the non-random error. (A numerical sketch of the decomposition follows this list.)
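
To make the decomposition concrete, here is a minimal numerical sketch in Python (not from the source) that simulates the true value model for the thermometer example above: a fixed 5-degree bias (non-random error) plus normally distributed noise (random error). The true value, bias and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000            # number of repeated measurements (illustrative)
true_value = 20.0   # the (theoretically perfect) true value
bias = 5.0          # non-random error: the thermometer reads 5 degrees high
noise_sd = 2.0      # spread of the random error

# measure obtained = true value + random error + non-random error
random_error = rng.normal(0.0, noise_sd, size=n)
measures = true_value + random_error + bias

# The mean is off by the bias (a validity problem); the scatter around the
# mean comes from random error (a reliability problem).
print(f"mean measure: {measures.mean():.2f}")  # close to 25, not 20
print(f"spread (sd):  {measures.std():.2f}")   # close to 2
```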

Later in this chapter (section 3.1) we will discuss techniques used to improve measurement validity.

In the following discussion we will focus essentially on measuring scales (used in questionnaires), as these constitute the main group of tools used with quantitative approaches. As the reliability of a measurement is linked to the risk that it will introduce random error, we present below four methods used to estimate this reliability.

‘Test-retest’ This method consists in carrying out the same test (for example, posing the same question) on the same individuals at different times. We can then calculate a correlation coefficient between the results obtained in two successive tests. However, these measurements can be unstable for reasons independent of the instrument itself. The individuals questioned might themselves have changed; to limit this possibility, there should not be too much delay between the two tests. The fact of having been given the test previously may also sensitize subjects to the question, and predispose them to respond differently in the second test – subjects may have given the problem some thought, and perhaps modified their earlier opinions. It has also been observed, conversely, that if the time lapse between two measurements is too short, actors often remember their first response and repeat it despite apparent changes.
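
The test-retest coefficient is simply the correlation between the two waves of scores. A minimal sketch in Python, using hypothetical scores for eight respondents (the data and sample size are illustrative assumptions, not from the source):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical answers of the same 8 respondents to the same question,
# collected at two different times (illustrative data only).
test = np.array([4, 3, 5, 2, 4, 5, 1, 3])
retest = np.array([4, 2, 5, 3, 4, 4, 2, 3])

r, _ = pearsonr(test, retest)
print(f"test-retest reliability: r = {r:.2f}")  # closer to 1 = more stable
```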

Alternative forms This method also involves administering two tests to the same individuals, the difference being that in this case the second test is not identical to the first; an alternative questionnaire is used to measure the same object or phenomenon, with the questions formulated differently. Although this method limits the effect that memory can have on test results (a source of error with the test-retest method), it is sometimes difficult in practice to design two alternative tests.

‘Split-halves’ This method consists of giving the same questionnaire, at the same time, to the actors, but dividing its items into two halves. Each half must represent the phenomenon the researcher seeks to measure, and contain a sufficient number of items to be significant. A correlation coefficient of the responses obtained in each half is then calculated (one of the most commonly used of such coefficients is that of Spearman-Brown (Brown, 1910; Spearman, 1910)). The difficulty of this method lies in the division of questionnaire items – the number of ways of dividing these items increases greatly as the number of items contained in a scale increases. This problem of dividing the items is a limitation of this method, as the coefficients obtained will vary in line with the division method used. One solution to this problem consists in numbering the items and dividing the odd from the even.
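
Here is a minimal Python sketch of that odd-even division, with the Spearman-Brown correction estimating full-scale reliability from the half-half correlation. The item matrix is hypothetical illustrative data:

```python
import numpy as np

# Hypothetical scores: rows = 5 respondents, columns = 6 scale items.
items = np.array([
    [4, 5, 4, 3, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1],
])

# Odd-even division of the items, as suggested in the text.
half_a = items[:, 0::2].sum(axis=1)  # items 1, 3, 5
half_b = items[:, 1::2].sum(axis=1)  # items 2, 4, 6

r = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction: reliability of the full-length scale
# estimated from the correlation between its two halves.
reliability = 2 * r / (1 + r)
print(f"half-half r = {r:.2f}, split-half reliability = {reliability:.2f}")
```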

Internal consistency Methods have been developed to estimate reliability coefficients that measure the internal cohesion of a scale without necessitating any dividing or duplicating of items. Of these coefficients the best known and most often used is Cronbach’s Alpha (Cronbach, 1951).

Cronbach’s Alpha is a coefficient that measures the internal coherence of a scale that has been constructed from a group of items. The number of items initially contained in the scale is reduced according to the value of the coefficient alpha, so as to increase the reliability of the construct’s measurement. The value of alpha varies between 0 and 1. The closer it is to 1, the stronger the internal cohesion of the scale (that is, its reliability). Values of 0.7 or above are generally accepted. However, studies (Cortina, 1993; Kopalle and Lehmann, 1997; Peterson, 1994) have shown that interpretation of the alpha coefficient is more delicate than it may seem. The number of items, the degree of correlation between the items and the number of dimensions of the concept being studied (a concept may be unidimensional or multidimensional) all have an impact on the value of alpha. If the number of items contained in the scale is high, it is possible to have an alpha of an acceptable level in spite of a weak correlation between the items or the presence of a multidimensional concept (Cortina, 1993). It is therefore necessary to make sure before interpreting an alpha value that the concept under study is indeed unidimensional. To do this a preliminary factor analysis should be carried out. The alpha coefficient can be interpreted as a true indicator of the reliability of a scale only when the concept is unidimensional, or when relatively few items are used (six items, for example).
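
Cronbach’s alpha has a simple closed form – alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the total score), for a scale of k items – so it is easy to compute directly. A minimal Python sketch, reusing the hypothetical item matrix from the split-halves example above:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

items = np.array([
    [4, 5, 4, 3, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1],
])
print(f"alpha = {cronbach_alpha(items):.2f}")  # 0.7 or above: generally accepted
```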

Of the methods introduced above, the alternative forms method and the Cronbach Alpha method are most often used to determine the degree of reliability of a measuring scale, owing to the limitations of the test-retest and the split-halves methods (Carmines and Zeller, 1990).

2.2. Measuring instruments used in qualitative research

While we face the problem of reliability just as much for qualitative instruments as for quantitative instruments, it is posed in different terms. As Miles and Huberman (1984a: 46) point out:

continuously revising instruments puts qualitative research at odds with survey research, where instrument stability (for example test-retest reliability) is required to assure reliable measurement. This means that in qualitative research issues of instrument validity and reliability ride largely on the skills of the researcher. Essentially a person – more or less fallibly – is observing, interviewing and recording while modifying the observation, interviewing and recording devices from one field trip to the next.

In qualitative research, then, instrument validity and reliability depend largely on the skills of the researcher – a person, more or less fallible, who observes, questions and records, while at the same time modifying his or her observation, interview and recording tools ‘from one field trip to the next’ (Miles and Huberman, 1984a: 46). Thus, reliability is assessed partly by comparing the results of different researchers, when there are several of them, and partly through the work of coding raw data obtained through interviews, documents or observation (see Chapter 16). Different coders are asked to analyze data using a collection of predetermined categories and in accordance with a coding protocol. Inter-coder reliability is then assessed through the rate of agreement between the different coders on the definition of the units to code and their categorization. This reliability can also be calculated from the results obtained by a single coder who has coded the same data at two different periods, or by two coders working on the same data simultaneously.
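
A minimal Python sketch of that agreement calculation follows. The codings are hypothetical; the simple rate of agreement is the measure named in the text, while Cohen’s kappa is a common chance-corrected alternative (an addition here, not mentioned in the source):

```python
# Two coders assign each of ten coding units to one of three
# hypothetical categories (illustrative data only).
coder_1 = ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"]
coder_2 = ["A", "B", "A", "C", "A", "A", "B", "C", "B", "A"]

n = len(coder_1)

# Rate of agreement: share of units both coders categorized identically.
agreement = sum(c1 == c2 for c1, c2 in zip(coder_1, coder_2)) / n
print(f"rate of agreement: {agreement:.2f}")

# Cohen's kappa corrects that rate for agreement expected by chance,
# based on each coder's marginal category frequencies.
categories = set(coder_1) | set(coder_2)
p_chance = sum(
    (coder_1.count(c) / n) * (coder_2.count(c) / n) for c in categories
)
kappa = (agreement - p_chance) / (1 - p_chance)
print(f"Cohen's kappa: {kappa:.2f}")
```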

Reliability of observations Studies based on observation are often criticized for not providing enough elements to enable their reliability to be assessed. To respond to this criticism, researchers are advised to accurately describe the note-taking procedures and the observation contexts that coders should follow (Kirk and Miller, 1986), so as to ensure that different observers are assessing the same phenomenon in the same way, noting the phenomena observed according to the same norms. We can then assess the reliability of the observations by comparing how the different observers qualified and classified the observed phenomena.

To obtain the greatest similarity in results between the different observers, it is recommended that researchers use trained and experienced observers and set out a coding protocol that is as clear as possible. In particular, the protocol will have to establish exactly which elements of the analysis are to be recorded, and clearly define the selected categories.

Reliability of documentary sources Researchers have no control over the way in which documents have been established. Researchers select the documents that interest them, then interpret and compare their material. Reliability depends essentially on the work of categorizing written data in order to analyze the text (see Chapter 16), and different coders interpreting the same document should obtain the same results. An assessment of reliability is then essentially a question of determining the degree of inter-coder reliability (see the discussion on assessing inter-coder reliability above).

Reliability of interviews Unstructured interviews are generally transcribed and analyzed in the same way as documents; the question of reliability comes back then to determining inter-coder reliability.

In the case of more directive interviews, interview reliability can be enhanced by ensuring that all the interviewees understand the questions in the same way, and that the replies can be coded unambiguously. For this reason it is necessary to pre-test questionnaires, to train interviewers and to verify inter-coder reliability for any open questions.

3. Assessing the Validity of a Measuring Instrument

3.1. Measuring instruments used in quantitative research

We recall that validity is expressed by the degree to which a particular tool measures what it is supposed to measure rather than a different phenomenon. An instrument must also be valid in relation to the objective for which it has been used. Thus, while reliability depends on empirical data, the notion of validity is in essence much more theoretical, and gives rise to the question: ‘Valid for what purpose?’

We have seen in Section 2.1 above that the validity of a measuring instrument is tied to the degree of non-random error that it contains (or any bias introduced by using the tool or by the act of measuring). Improving the validity of a measuring instrument then consists of reducing as far as possible the level of non-random error connected to the application of that instrument.

One type of validity by which we can assess a measuring instrument is content validity: this means validating the application of a tool on the basis of a consensus within the research community as to its appropriateness. It is also useful to assess whether the tool used permits different dimensions of the phenomenon under study to be measured. In the case of quantitative instruments (particularly measuring scales), the notion of instrument validity is very close to the notion of construct validity: both assess whether the indicators used (by way of the measurement scale) are a good representation of the phenomenon (see Sections 1 and 2.1 above).

3.2. Measuring instruments used in qualitative research

In discussing qualitative research, Miles and Huberman (1984a: 230) assert that ‘the problem is that there are no canons, decision rules, algorithms, or even any agreed upon heuristics in qualitative research, to indicate whether findings are valid’. However, the accumulation of experimental material in qualitative research has led more and more researchers to put forward methodologies which can improve the validity of qualitative tools such as observation, interviews or documentary sources.

Improving the validity of interviews The question of the validity of interviews used in a qualitative process poses a problem, as it is difficult to assess whether this instrument measures exactly what it is supposed to measure. The fact that the questions posed concern the problem being studied is not enough to assess the validity of the interview. While it is possible to assess whether an interview is a good instrument for comprehending facts, such an assessment becomes more tenuous when we are dealing with opinions for which there is no external criterion of validity. Certain precautions exist that are designed to reduce errors or possible biases, but the subject of the validity of interviews remains debatable, raising the question of whether researchers should give priority to the accuracy of their measurements or to the richness of the knowledge obtained (Dyer and Wilkins, 1991).

Improving the validity of document analyses When an analysis can describe the contents of a document and remain true to the reality of the facts being studied, we can consider that this analysis is valid. Its validity will be that much stronger if the researcher has taken care to clearly define how categories and quantification indices are to be constructed, and to describe the categorization process in detail. We must note, however, that it is easier to show the validity of a quantitative content analysis, which aims for a more limited goal of describing obvious content, than to show the validity of a qualitative content analysis, which can have the more ambitious goals of prediction, explanation and analysis of the latent content (see Chapter 16). Validity can also be verified by comparing the results obtained through content analysis with those obtained through different techniques (interviews, measurements of attitudes, observation of behavior).

Improving the validity of observation techniques External criteria do not always exist with which to verify whether the observations really measure what they are supposed to measure. Different observation techniques are possible (see Silverman, 1993), and validity depends more on the methodological system employed than on the instrument itself (Miles and Huberman, 1984a).

Source: Thietart, Raymond-Alain et al. (2001), Doing Management Research: A Comprehensive Guide, SAGE Publications Ltd, 1st edition.
