The Sample Selection Process

Sample selection can follow a number of different procedures, many of which are related to one of two generic approaches: the traditional approach, typical of probability sampling, and the iterative approach, such as that applied in grounded theory (Glaser and Strauss, 1967). The sample selection process also includes certain procedures that are carried out after the data have been collected.

1. Two Standard Approaches

The traditional approach (see Figure 8.3) is typical of probability sampling, but it is also frequently encountered in the quota method. It starts by defining the target population, to which the results will be generalized through statistical inference. The population is then operationalized, so that there are clear criteria for determining which elements are included in or excluded from the study population. The next step is to select a sampling method, after which the sample size can be determined. If a random sampling method is used, it will be necessary to choose or create a sampling frame in order to carry out random selection. The researcher then selects the sample elements and collects the required data. The elements for which all of the required data could in fact be collected constitute the study’s ‘usable’ sample. The last stage of the procedure is a study of potential biases and, if necessary, adjustment of the sample.

Because the stages of this process (sampling method, sample size and element-selection techniques) are interrelated, the outcome of one stage can lead to a reconsideration of earlier choices (Henry, 1990). For example, if the required sample size makes data collection too costly, the population can sometimes be narrowed to make the sample more homogeneous; the significance level required for internal validity is then easier to achieve. If it turns out to be difficult to establish or find a sampling frame, another sampling method might be chosen. Consequently, the choices involved in selecting a sample often follow a non-linear process (Henry, 1990).

An iterative approach follows a radically different process. Unlike in the classic approach, the scope of generalization of the results is not defined at the outset, but rather at the end of the process. Another important difference between the two procedures lies in the progressive constitution of the sample through successive iterations. Each element of the sample is selected by judgement, and the data are collected and analyzed before the next element is selected. Over the course of successive selections, Glaser and Strauss (1967) suggest first studying similar units, to enable the emergence of a substantive theory, before enlarging the collection to include units with different characteristics. The process is complete when theoretical saturation is reached. Unlike in the classic procedure, the size and composition of the sample are not predetermined; quite the opposite, they emerge from the iterative process of successive selection of elements. These choices are guided by both the data collected and the theory being elaborated. The scope of generalization of the results is thus constructed progressively over the course of the procedure and is defined only at its conclusion.

Role of the pre-test in the sample selection process

In practice, research often involves a phase of pre-testing. This pre-testing does not specifically concern sampling, but it does supply useful information that can contribute to a better definition of the required size and composition of the final sample. In quantitative studies, the pre-test sample can, in particular, provide an initial estimate of variance and help to identify criteria for segmenting a stratified sample. In qualitative research, the pilot case helps to determine the composition and number of cases required. These factors depend on literal and theoretical replication conditions and on the magnitude of the observed differences (Yin, 1990).
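
To illustrate how a pre-test can feed into sample-size planning, the sketch below uses the standard deviation observed on a pilot sample to compute the sample size needed to estimate a mean within a chosen margin of error at roughly 95 per cent confidence. The pilot values, the margin of error and the function name are illustrative assumptions, not figures from the text.

```python
import math
from statistics import stdev

# Hypothetical scores collected on a small pilot (pre-test) sample.
pilot_scores = [3.2, 4.1, 3.8, 2.9, 4.5, 3.6, 4.0, 3.3, 3.9, 4.2]

def required_sample_size(pilot_values, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean to within +/- margin_of_error
    at roughly 95% confidence, using the pilot standard deviation as the
    initial variance estimate supplied by the pre-test."""
    s = stdev(pilot_values)
    return math.ceil((z * s / margin_of_error) ** 2)

print(required_sample_size(pilot_scores, margin_of_error=0.25))
```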

2. Specific Approaches

2.1. Progressive sample selection

The traditional approach is to determine sample size before data collection. However, another possible approach is to collect and process data until the desired degree of precision or level of significance is reached. This involves successive rounds of data collection (Thompson, 1992). According to Adlfinger (1981), this procedure allows us to arrive at a sample half the size it would have been had we determined it in advance. Establishing a minimal size in advance does generally lead to oversized samples as, to be on the safe side, researchers often work from the most pessimistic estimates.
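
As a minimal sketch of this round-by-round logic, the code below keeps enlarging the sample until the 95 per cent confidence interval for the mean is narrower than a chosen target. The simulated data-collection function, the target half-width and the cap on rounds are all hypothetical placeholders.

```python
import random
from statistics import stdev

def collect_round(batch_size=20):
    """Stand-in for one round of real data collection (simulated here)."""
    return [random.gauss(100, 15) for _ in range(batch_size)]

def progressive_sample(target_half_width=2.0, z=1.96, max_rounds=100):
    """Enlarge the sample round by round until the 95% confidence interval
    for the mean is narrower than the target, or the budget of rounds runs out."""
    sample, half_width = [], float("inf")
    for _ in range(max_rounds):
        sample.extend(collect_round())
        half_width = z * stdev(sample) / len(sample) ** 0.5
        if half_width <= target_half_width:
            break
    return sample, half_width

sample, achieved_precision = progressive_sample()
print(len(sample), round(achieved_precision, 2))
```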

Unfortunately though, researchers are not always in a position to employ this procedure, which can reduce data-collection costs considerably. A study attempting to analyze the impact of a non-reproducible event (such as a merger between two companies) on a variable (for example, executive motivation) illustrates this. Such a study requires data to be collected both before and after the event – it is not possible to increase the number of elements in the sample progressively. The researcher is then obliged to follow the classic procedure – and to determine sample size in advance.

Even if it were possible to constitute the sample progressively, it would still be worthwhile to estimate its size in advance. Without a prior estimate, the researcher runs the risk of not being able to enlarge the sample (for example, for budgetary concerns). The sample might well turn out to be too small to reach the desired significance level.

Determining sample size beforehand enables researchers to evaluate the feasibility of their objectives. This is one way we can avoid wasting time and effort on unsatisfactory research, and it encourages us to consider other research designs that might lead to more significant results.

2.2. Ex post selection

For laboratory experiments, matched samples are constructed by selecting elements, assigning them randomly to the different treatment conditions and collecting data on them. Yet not all phenomena lend themselves to constituting matched samples before data collection. Sample structure can be hard to control, particularly when studying phenomena in real settings, when the phenomena are difficult to access or to identify, or when the population under study is not well known. In these situations, it is sometimes possible to constitute matched samples after data collection – ex post – in order to perform a test. To do this, a control group is selected from the target population following the rules of random selection, in such a way that the structure of the control group reproduces that of the observed group.
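
The sketch below illustrates one way such an ex post control group might be assembled: for each category of a matching variable, the same number of control elements as in the observed group is drawn at random from the candidate population. The record structure and the matching variable (`size`) are hypothetical.

```python
import random
from collections import Counter

def ex_post_control_group(observed, candidates, key, seed=0):
    """Draw, at random, a control group whose composition on `key`
    reproduces the composition of the observed group."""
    rng = random.Random(seed)
    needed = Counter(key(element) for element in observed)
    pools = {}
    for candidate in candidates:
        pools.setdefault(key(candidate), []).append(candidate)
    control = []
    for category, count in needed.items():
        control.extend(rng.sample(pools[category], count))
    return control

# Hypothetical usage: match control firms to observed firms on a size category.
observed_firms = [{"id": 1, "size": "small"}, {"id": 2, "size": "small"}, {"id": 3, "size": "large"}]
candidate_firms = [{"id": i, "size": size}
                   for i, size in enumerate(["small"] * 10 + ["large"] * 10, start=100)]
control_group = ex_post_control_group(observed_firms, candidate_firms, key=lambda firm: firm["size"])
```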

3. Post-sampling

3.1. Control and adjustment procedures

It is often possible ex post to correct non-sampling biases such as non-response and response errors. But researchers should bear in mind that adjusting data is a fall-back solution, and it is always preferable to aim to avoid bias.

Non-response

Non-responses can cause representativeness biases in the sample. To detect this type of bias, the researcher can compare the structure of the respondent sample to that of the population, focusing on variables that might affect the phenomenon being studied. If these structures differ noticeably, representativeness bias is probable and should be corrected.
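
One common way to compare the two structures is a goodness-of-fit test, sketched below under the assumption that SciPy is available; the sector breakdown and the 5 per cent threshold are hypothetical.

```python
from scipy.stats import chisquare  # assumes SciPy is available

# Hypothetical structures: respondents by sector versus the target population.
respondent_counts = {"industry": 62, "services": 25, "retail": 13}
population_shares = {"industry": 0.45, "services": 0.35, "retail": 0.20}

n = sum(respondent_counts.values())
observed = [respondent_counts[sector] for sector in respondent_counts]
expected = [population_shares[sector] * n for sector in respondent_counts]

statistic, p_value = chisquare(observed, f_exp=expected)
if p_value < 0.05:
    print("Respondent structure differs from the population: representativeness bias is probable.")
```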

There are three ways of correcting non-response biases. The first is to survey a subsample of randomly selected non-respondents; the researcher must then make every effort to obtain a response from all of the elements in this subsample (Lessler and Kalsbeek, 1992). The second is to perform an ex post stratification, in which responses from sample elements are weighted to reconstruct the structure of the population (Lessler and Kalsbeek, 1992). This method can also be employed in two situations other than non-response bias: when stratification has not been carried out in advance because of technical difficulties (for example, if no sampling frame, or only an insufficiently precise one, was available), or when a new stratification variable is discovered belatedly, during the data analysis phase. In any case, ex post stratification can increase the precision of estimations. The third procedure is to replace non-respondents with new respondents presenting identical characteristics, and to compensate by assigning the new respondents a weighting factor (Levy and Lemeshow, 1991). This method can also be applied to adjust for missing responses when replies are incomplete.
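
Continuing the hypothetical sector breakdown above, the sketch below shows how ex post stratification weights for the second procedure might be computed so that the weighted sample reproduces the population structure.

```python
# Ex post stratification weights: each respondent is weighted so that the
# weighted sample reproduces the structure of the population on the
# stratification variable. Strata names and proportions are illustrative.
respondent_counts = {"industry": 62, "services": 25, "retail": 13}
population_shares = {"industry": 0.45, "services": 0.35, "retail": 0.20}

n = sum(respondent_counts.values())
weights = {
    sector: population_shares[sector] / (count / n)
    for sector, count in respondent_counts.items()
}
# Respondents from over-represented strata get weights below 1,
# those from under-represented strata weights above 1.
print(weights)
```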

If data for certain identifiable subsets of the sample is still missing at the end of this process, the population should be redefined – or, at the very least, this weakness in the study should be indicated.

Response error

Response errors can be checked by cross-surveying a subset of respondents (Levy and Lemeshow, 1991). This may identify certain types of error, such as errors derived from the interviewer or from respondents misunderstanding the question (although this method is futile if respondents willfully supply erroneous information, which is extremely difficult to detect or correct).

3.2. Small samples

Despite taking all precautions, a sample sometimes turns out to have been too small to obtain the precision or the significance level desired. In this case, the best solution is to carry out a second round of data collection that will enlarge the sample. But this is not always an option (for example, when secondary data is used, when the sampling frame has been entirely exploited, or when the data depend on a particular context which has changed).

When the sample size cannot be increased, the researcher can compensate by generating several samples from the original one. The two main ways of doing this are known as the ‘jackknife’ and the ‘bootstrap’ (Mooney and Duval, 1993). These methods enable researchers to establish their results more firmly than they could through standard techniques alone.

The jackknife

The jackknife creates new samples by systematically removing one element from the initial sample. For a sample of size n, the jackknife gives n samples of size n – 1. Statistical processing is then carried out on each of these samples, and the results are compared with those obtained from the initial sample. The more the results converge, the greater the confidence with which we can regard the outcome.
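
A minimal sketch of the jackknife, assuming the statistic of interest is the mean (any other statistic could be passed in); the data values are hypothetical.

```python
from statistics import mean

def jackknife_estimates(data, statistic=mean):
    """Recompute a statistic on the n leave-one-out samples of size n - 1."""
    return [statistic(data[:i] + data[i + 1:]) for i in range(len(data))]

# Hypothetical data: the closer the leave-one-out estimates cluster around
# the full-sample value, the more confidence the result deserves.
values = [3.2, 4.1, 3.8, 2.9, 4.5, 3.6, 4.0, 3.3]
print(mean(values), jackknife_estimates(values))
```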

The bootstrap

The bootstrap works on a relatively similar principle, but the samples are constituted differently. They are obtained by random sampling with replacement from the initial sample, and they contain the same number of elements (n). The number of samples drawn from the initial sample can be very high when using the bootstrap, as it does not depend on the size of the initial sample.
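
A matching sketch of the bootstrap, again with hypothetical data and the mean as the statistic; here 1,000 resamples of size n are drawn with replacement.

```python
import random
from statistics import mean

def bootstrap_estimates(data, statistic=mean, n_samples=1000, seed=0):
    """Recompute a statistic on n_samples resamples drawn with replacement,
    each containing the same number of elements as the original sample."""
    rng = random.Random(seed)
    return [statistic(rng.choices(data, k=len(data))) for _ in range(n_samples)]

estimates = sorted(bootstrap_estimates([3.2, 4.1, 3.8, 2.9, 4.5, 3.6, 4.0, 3.3]))
# The spread of the bootstrap estimates gives an empirical 95% interval.
print(estimates[25], estimates[975])
```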

Both the jackknife and the bootstrap can be applied to basic statistical measurements, such as variance and mean, and to more complex methods, such as LISREL or PLS.

Source: Thietart, Raymond-Alain et al. (2001), Doing Management Research: A Comprehensive Guide, SAGE Publications Ltd, 1st edition.
