Sampling Rare Populations

Sampling Rare Populations (DOI: https://doi.org/10.2307/2981886) was written by Graham Kalton and Dallas W. Anderson in 1986. It was published in the Journal of the Royal Statistical Society (series A) (vol. 149, no. 1).

The authors discuss several sampling methods for economically surveying a rare population.

  1. Screening to identify the rare population from a wider population.
    • The mode of the full interview must be considered when executing a screening plan. For example, if the full interview will be face-to-face, then the screener sample should be geographically clustered, even though an unclustered sample could be more representative.
    • There is a risk that nonresponse to the screener, in whatever mode it is administered, is related to membership in the rare population.
  2. Disproportionate sampling involves selecting strata with a higher prevalence of the rare population at a higher rate.

  3. Multiplicity or network sampling involves selecting households and then collecting information about both the interviewee and others affiliated with the interviewee. One example is interviewing one household member but collecting information about all household members.
    • Most interesting survey efforts include questions that one person cannot easily or accurately answer on behalf of another. In that case, the interview effectively becomes a screener that identifies affiliated individuals who are likely to be in the rare population.
    • "use of informants in multiplicity sampling is frequently likely to increase significantly the level of response error"
    • "An ethical question with multiplicity sampling is whether it is appropriate to collect the survey data from an informant who is not even a member of the linked person's household."
  4. Multiple frame methods: see below

  5. Snowballing or reputational sampling involves a dynamic frame built through administration of the survey. Contact information for members of the rare population is requested during the interview.
  6. Sequential sampling involves selecting an initial sample based on power analysis and the estimated prevalence of a rare population. Administering this sample yields some number of interviews and an updated estimate of prevalence. The second sample can then be sized from the updated estimate, so it only needs to cover the remaining interviews.
  7. "Multipurpose" (omnibus) surveys where resources are pooled across multiple survey efforts.
  8. Secondary analysis of available survey data.
    • The biggest problem is inconsistency in measurements and classifications.
  9. Batch testing, as in testing for water contamination. Testing household water samples individually would be prohibitively expensive, so instead samples are aggregated.
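
The cost saving from batch testing (item 9) can be sketched with a little arithmetic. The notes do not specify a pooling scheme, so this example assumes a simple two-stage design: each batch is tested once, and only the members of a positive batch are retested individually. It also assumes samples are independent with a common prevalence. The function names and the `max_batch_size` cap are hypothetical, chosen for illustration.

```python
def expected_tests_per_unit(prevalence: float, batch_size: int) -> float:
    """Expected tests per unit under the assumed two-stage pooling scheme."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size == 1:
        # No pooling: every unit is tested exactly once.
        return 1.0
    # One pooled test shared by the batch, plus individual retests
    # of every member whenever the pooled test is positive.
    prob_batch_positive = 1.0 - (1.0 - prevalence) ** batch_size
    return 1.0 / batch_size + prob_batch_positive


def best_batch_size(prevalence: float, max_batch_size: int = 50) -> int:
    """Batch size (up to an assumed cap) that minimizes expected tests per unit."""
    return min(
        range(1, max_batch_size + 1),
        key=lambda k: expected_tests_per_unit(prevalence, k),
    )


if __name__ == "__main__":
    for prevalence in (0.01, 0.05, 0.20):
        k = best_batch_size(prevalence)
        cost = expected_tests_per_unit(prevalence, k)
        print(f"prevalence={prevalence:.2f} batch_size={k} tests_per_unit={cost:.3f}")
```

Under these assumptions the saving shrinks as prevalence rises, which is why pooling suits rare traits in particular.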

Multiple frame methods

Multiple frame methods make use of partial frames of the rare population alongside a frame that covers the entire general population. The authors give the example of "the sample of retail stores described in Hansen, Hurwitz, and Madow (1953, pp. 516-558). Within selected primary sampling units, all retail stores on a combined list were included in the sample and an area sample was taken to give representation to stores not on the list."

The challenge is that individuals can appear on both frames. There are two approaches:

Elimination of overlaps: create a single representative frame that also identifies the rare population with high probability. This approach relies on linking records across the frames.

Compensation for overlaps: adjust weights to reflect the probability of selection. Given two sampling frames (A and B), divide the population into three disjoint domains: A (units on frame A only), B (units on frame B only), and AB (units on both frames, i.e., the overlap). If the domain sizes ($N_A$, $N_B$, and $N_{AB}$) are known, then a population total Y can be estimated as:

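$$\hat{Y}_1 = N_A \bar{y}_A + N_B \bar{y}_B + N_{AB} \left( p \, \bar{y}_{AB}^{A} + q \, \bar{y}_{AB}^{B} \right)$$

where $p + q = 1$, $\bar{y}_A$ and $\bar{y}_B$ are the sample means in domains A and B, $\bar{y}_{AB}^{A}$ is the sample mean in domain AB computed from the frame A sample, and $\bar{y}_{AB}^{B}$ is the sample mean in domain AB computed from the frame B sample.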
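
As a minimal sketch of the known-domain-size estimator, the function below combines the three domain means into an estimated total. The function name, argument names, and the default `p = 0.5` are assumptions made for illustration; the notes do not say how p should be chosen. Each argument holding observations is a plain list of sampled y values for that domain.

```python
from statistics import fmean
from typing import Sequence


def dual_frame_total_known_sizes(
    y_a: Sequence[float],          # sampled values in domain A (frame A only)
    y_b: Sequence[float],          # sampled values in domain B (frame B only)
    y_ab_from_a: Sequence[float],  # overlap values observed via the frame A sample
    y_ab_from_b: Sequence[float],  # overlap values observed via the frame B sample
    size_a: float,                 # known domain size N_A
    size_b: float,                 # known domain size N_B
    size_ab: float,                # known domain size N_AB
    p: float = 0.5,                # weight on the frame A overlap mean (assumed default)
) -> float:
    """Estimate a population total when the three domain sizes are known."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    # The overlap mean is a weighted average of the two frame-specific means.
    overlap_mean = p * fmean(y_ab_from_a) + q * fmean(y_ab_from_b)
    return size_a * fmean(y_a) + size_b * fmean(y_b) + size_ab * overlap_mean


if __name__ == "__main__":
    estimate = dual_frame_total_known_sizes(
        y_a=[1, 3], y_b=[4, 6], y_ab_from_a=[10], y_ab_from_b=[20],
        size_a=100, size_b=50, size_ab=10,
    )
    print(estimate)
```

Setting `p = 1` uses only the frame A sample for the overlap, and `p = 0` uses only the frame B sample; intermediate values blend the two.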

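If these sizes are not known, then weight the sample totals by the inverses of the sampling fractions (notated as $F_A$ and $F_B$). Y can be estimated as:

$$\hat{Y}_2 = F_A \left( y_A + p \, y_{AB}^{A} \right) + F_B \left( y_B + q \, y_{AB}^{B} \right)$$

where $p + q = 1$, $y_A$ and $y_B$ are the sample totals in domains A and B, and $y_{AB}^{A}$ and $y_{AB}^{B}$ are the sample totals in domain AB from the frame A and frame B samples respectively. The weights p and q keep the overlap from being counted twice.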
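
The unknown-domain-size case can be sketched the same way. This example assumes the Hartley-style form of the estimator, in which the inputs are sample totals (not means) and the overlap totals are split with weights p and q = 1 - p so that the overlap is not double counted. That reading, the function and argument names, and the default `p = 0.5` are all assumptions made for illustration.

```python
def dual_frame_total_unknown_sizes(
    total_a: float,          # sample total in domain A (frame A only)
    total_b: float,          # sample total in domain B (frame B only)
    total_ab_from_a: float,  # overlap sample total observed via the frame A sample
    total_ab_from_b: float,  # overlap sample total observed via the frame B sample
    sampling_fraction_a: float,
    sampling_fraction_b: float,
    p: float = 0.5,          # weight on the frame A overlap total (assumed default)
) -> float:
    """Estimate a population total from sample totals and sampling fractions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    for fraction in (sampling_fraction_a, sampling_fraction_b):
        if not 0.0 < fraction <= 1.0:
            raise ValueError("sampling fractions must be in (0, 1]")
    q = 1.0 - p
    # F_A and F_B are the inverses of the sampling fractions.
    inverse_fraction_a = 1.0 / sampling_fraction_a
    inverse_fraction_b = 1.0 / sampling_fraction_b
    return (
        inverse_fraction_a * (total_a + p * total_ab_from_a)
        + inverse_fraction_b * (total_b + q * total_ab_from_b)
    )


if __name__ == "__main__":
    estimate = dual_frame_total_unknown_sizes(
        total_a=20, total_b=30, total_ab_from_a=10, total_ab_from_b=40,
        sampling_fraction_a=0.1, sampling_fraction_b=0.5,
    )
    print(estimate)
```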

"For further details, the reader is referred to the papers by Hartley (1962, 1974), Cochran (1964), Fuller and Burmeister (1972), and the sizeable recent research on the use of dual frame estimation techniques to augment telephone surveys by face-to-face interviews (Lund, 1968; Cassady et al., 1981; Groves and Lepkowski, 1982; Lepkowski and Groves, 1984)."


CategoryRicottone CategoryReadingNotes

SamplingRarePopulations (last edited 2025-12-16 04:21:37 by DominicRicottone)