Sequential Sample Selection Methods
Sequential Sample Selection Methods was written by James R. Chromy in 1979. It was part of the proceedings of the American Statistical Association Section on Survey Research Methods.
The author describes a sequential probability proportional to size (PPS) sampling algorithm that can be efficiently programmed.
There are N sampling units. Each unit, indexed by i, has a size measure S(i).
Let n(i) be the number of 'sample hits' for unit i. Naturally, Σ_i n(i) is equal to the sample size, n. In probability non-replacement (PNR) sampling, n(i) is equal to 1 for n units and 0 for all others. In probability replacement (PR) sampling, n(i) can take on higher values (in theory up to n).
For a PPS design, E[n(i)] = n S(i) / Σ_i S(i), and hence Σ_i E[n(i)] = n. Henceforth, let Σ_i S(i) be denoted as S(+).
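As a quick check of the expectation formula, here is a small Python sketch. The function name expected_hits and the size measures are made up for illustration, and exact fractions are used only to keep the arithmetic easy to verify by hand.

```python
from fractions import Fraction

def expected_hits(sizes, n):
    """Return E[n(i)] = n * S(i) / S(+) for each unit, as exact fractions.

    sizes must be integers here (an assumption of this sketch).
    """
    total = sum(sizes)  # S(+)
    return [Fraction(n * s, total) for s in sizes]

sizes = [10, 20, 30, 40]          # made-up size measures S(i)
hits = expected_hits(sizes, n=2)
print([str(h) for h in hits])     # ['1/5', '2/5', '3/5', '4/5']
print(sum(hits))                  # 2, the sample size
```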
A computer algorithm can therefore determine values of n(i) with these expectations by visiting the units sequentially, rather than operating on the entire set at once. The author introduces this as probability minimum replacement (PMR). First, draw a uniform random value on [0, 1) for each unit. Then let I(i) and F(i) be the integer and fractional parts, respectively, of

    Σ_{j≤i} n S(j) / S(+)
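This represents the expected number of sample hits for the subset of units up to and including unit i. Write T(i) = Σ_{j≤i} n(j) for the running number of hits, and take I(0) = F(0) = T(0) = 0. It follows that

    Prob{T(i) = I(i) + 1} = F(i)
    Prob{T(i) = I(i)}     = 1 − F(i)

If the uniform random value is less than the conditional probability given by

    [F(i) − F(i−1)] / [1 − F(i−1)]   when T(i−1) = I(i−1)     and F(i) > F(i−1)
    0                                 when T(i−1) = I(i−1)     and F(i) ≤ F(i−1)
    1                                 when T(i−1) = I(i−1) + 1 and F(i) > F(i−1)
    F(i) / F(i−1)                     when T(i−1) = I(i−1) + 1 and F(i) ≤ F(i−1)

then n(i) is characterized by

    n(i) = I(i) + 1 − T(i−1)

otherwise by

    n(i) = I(i) − T(i−1)

Clearly then T(i) is always either I(i) or I(i) + 1, and T(N) = n because I(N) = n and F(N) = 0.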
The paper then goes on to the math for variance estimation, which goes over my head.
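Going back to the selection step, a minimal Python sketch of the sequential pass might look like the following. This is my reading of the procedure, not code from the paper. The name chromy_pmr is made up, the list is taken in the order given with no random start, and I(0) = F(0) = 0 is assumed. The conditional probability of ending unit i one hit above I(i) is chosen so that the running number of hits is I(i) + 1 with probability F(i) and I(i) otherwise. Exact fractions are my own choice, to avoid round-off when a cumulative total lands on an integer.

```python
from fractions import Fraction
import math
import random

def chromy_pmr(sizes, n, rng=None):
    """Sketch of sequential PMR selection; returns hit counts n(i) in list order.

    sizes : size measures S(i), with a positive total (assumption)
    n     : total sample size
    rng   : optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    sizes = [Fraction(s) for s in sizes]
    total = sum(sizes)                        # S(+)
    hits = []
    cum = Fraction(0)                         # expected hits for units 1..i
    int_prev, frac_prev = 0, Fraction(0)      # I(i-1), F(i-1)
    taken = 0                                 # hits assigned to units 1..i-1
    for s in sizes:
        cum += n * s / total
        int_cur = math.floor(cum)             # I(i)
        frac_cur = cum - int_cur              # F(i)
        at_floor = (taken == int_prev)        # running count sits at I(i-1)?
        # Conditional probability that the running count ends at I(i) + 1.
        if frac_cur > frac_prev:
            p = (frac_cur - frac_prev) / (1 - frac_prev) if at_floor else Fraction(1)
        else:
            # frac_prev cannot be 0 when the count sits at I(i-1) + 1
            p = Fraction(0) if at_floor else frac_cur / frac_prev
        target = int_cur + 1 if rng.random() < p else int_cur
        hits.append(target - taken)           # n(i)
        taken = target
        int_prev, frac_prev = int_cur, frac_cur
    return hits

# Made-up example: four units, sample size 2.
print(chromy_pmr([10, 20, 30, 40], 2, random.Random(1)))
```

Because the units are visited in a fixed order, the result depends on how the list is sorted, so the same sizes in a different order can give different joint selections.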
