Differences between revisions 1 and 6 (spanning 5 versions)
Revision 1 as of 2020-10-22 18:21:22
Size: 1725
Comment:
Revision 6 as of 2020-10-22 20:34:38
Size: 4516
Comment:
Survey Sampling


Sample Frames

Common frames used for survey sampling are:

  • Census Bureau surveys are excellent sources of regional, hierarchical data (i.e. states > counties > tracts)

  • U.S. Postal Service Delivery Sequence files
  • Random digit dialing (RDD)


Sample Type

Probability sampling

All members of a population have a non-zero chance to be contacted in a survey instrument. Traditional statistics rely on this assumption.

Examples:

  • Administrative surveys
  • Surveys with random recruitment (as by random digit dialing)

Non-probability sampling

Some members of a population are certain to be contacted, while others are certain not to be contacted.

Examples:

  • Panel surveys
  • River surveys (i.e. surveys with open recruitment, as by banner ads)


Survey Allocation

Allocation is the distribution of sample size across domains.

Designing Domains

The key considerations are:

  • are some splits more important than others?
    • if studying military recruitment, then sex/gender is a strong split
  • what splits will be used for reporting?
  • what is the expected response rate?
    • if too few responses are expected from a domain, then splits should be reconsidered
  • what is the desired margin of error?

Stratified Allocation

Stratification is the process of dividing the population into discrete strata, and then sampling from each stratum.

  • Equal allocation is taking the same number from each stratum.

  • Proportional allocation is taking a number proportional to the size of that stratum.

  • Neyman allocation is an optimization of a key measure's margin of error against cost. It assumes a fixed cost per contact.

  • Optimal allocation is the same optimization when the cost per contact is instead assumed to vary by stratum.

Note that, if the key measure is equally varied in every stratum, proportional allocation is essentially the same as a Neyman allocation.
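
The allocations above can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name allocate, its arguments, and the example figures are all hypothetical, and the sketch assumes the textbook rule that a stratum's share is proportional to N_h * S_h / sqrt(c_h), where N_h is the stratum size, S_h the assumed standard deviation of the key measure, and c_h the assumed cost per contact. Results are left unrounded.

```python
from math import sqrt


def allocate(total_n, sizes, sds=None, costs=None):
    """Split total_n across strata in proportion to N_h * S_h / sqrt(c_h).

    sizes: population count per stratum (N_h)
    sds:   assumed standard deviation of the key measure per stratum (S_h)
    costs: assumed cost per contact per stratum (c_h)
    Omitted sds or costs are treated as equal across strata.
    """
    if sds is None:
        sds = [1.0] * len(sizes)
    if costs is None:
        costs = [1.0] * len(sizes)
    weights = [n_h * s_h / sqrt(c_h) for n_h, s_h, c_h in zip(sizes, sds, costs)]
    total = sum(weights)
    return [total_n * w / total for w in weights]


# Hypothetical strata: population sizes, standard deviations, costs per contact.
sizes = [5000, 3000, 2000]
sds = [10.0, 20.0, 40.0]
costs = [1.0, 4.0, 9.0]

equal = [1000 / len(sizes)] * len(sizes)      # same number from each stratum
proportional = allocate(1000, sizes)          # sizes only
neyman = allocate(1000, sizes, sds)           # sizes and variability
optimal = allocate(1000, sizes, sds, costs)   # sizes, variability, and cost
```

Passing the same standard deviation for every stratum reproduces the proportional allocation, which is the point of the note above.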


Sampling Methods

Simple Random Sampling (SRS) is essentially sorting randomly and taking the first N cases.

Stratified Random Sampling (STSRS) is the above process applied separately within each stratum of a stratified sample, using proportional allocation.
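
A minimal Python sketch of the two definitions above. The function names and the stratum_of callback are hypothetical, and the frame is assumed to be a plain list. Rounding each stratum's share can make the total differ slightly from n.

```python
import random


def simple_random_sample(frame, n, rng):
    """Sort the frame randomly and take the first n cases."""
    shuffled = list(frame)
    rng.shuffle(shuffled)
    return shuffled[:n]


def stratified_sample(frame, stratum_of, n, rng):
    """Draw an SRS within each stratum, sized proportionally to the stratum."""
    strata = {}
    for case in frame:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for cases in strata.values():
        # Proportional allocation; rounding may shift the total slightly.
        n_h = round(n * len(cases) / len(frame))
        sample.extend(simple_random_sample(cases, n_h, rng))
    return sample


# Hypothetical frame of 100 cases, 60 in stratum "a" and 40 in stratum "b".
frame = list(range(100))
rng = random.Random(2020)
srs = simple_random_sample(frame, 10, rng)
stsrs = stratified_sample(frame, lambda x: "a" if x < 60 else "b", 10, rng)
```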

Systematic sampling is any form of sampling that takes every Nth case from a list. The key is then how the list is ordered.
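
As a sketch, assuming the list is already in the intended order and that the starting point is chosen at random within the first interval (an assumption; the definition above only requires taking every Nth case). The function name is hypothetical.

```python
import random


def systematic_sample(ordered_frame, step, rng):
    """Take every step-th case, starting from a random offset below step."""
    start = rng.randrange(step)
    return ordered_frame[start::step]


# Hypothetical frame of 20 cases, taking every 5th.
frame = list(range(20))
sample = systematic_sample(frame, 5, random.Random(0))
```

Sorting the list by a meaningful variable before sampling spreads the sample evenly across that variable, which is why the ordering matters.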

Probability Proportionate to Size (PPS) ensures that the chance of being contacted increases with the magnitude of some measure. For example, in a study of utility customers, the largest consumers of that utility should almost always be contacted.
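
A sketch of how contact probabilities might be computed under PPS, assuming a fixed sample size n and positive size measures. Units large enough that their probability would exceed 1 are treated as certain contacts, and the remaining sample is spread over the rest. The function name and the usage figures are hypothetical, and the sketch stops at the probabilities rather than drawing the sample.

```python
def pps_inclusion_probabilities(sizes, n):
    """Inclusion probabilities proportional to size, capped at 1."""
    probs = [0.0] * len(sizes)
    remaining = set(range(len(sizes)))
    n_left = n
    while True:
        total = sum(sizes[i] for i in remaining)
        # Units whose proportional share would reach or exceed 1.
        over = [i for i in remaining if n_left * sizes[i] >= total]
        if not over:
            break
        for i in over:
            probs[i] = 1.0
            remaining.discard(i)
        n_left -= len(over)
    for i in remaining:
        probs[i] = n_left * sizes[i] / total
    return probs


# Hypothetical utility usage for four customers, sampling two of them.
usage = [1000, 50, 30, 20]
probs = pps_inclusion_probabilities(usage, 2)
```

In this example the largest customer is a certain contact and the other three share the one remaining slot in proportion to their usage.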

Multi-Stage Methods

Randomly select primary sampling units (PSUs), such as census tracts, then randomly select the actual targets (e.g. households) as secondary sampling units (SSUs).

Cluster sampling is a two-stage method where all members of the sampled PSUs are contacted.

This is common for face-to-face interviews because of the extraordinary cost of that mode.
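
A sketch of two-stage selection, with cluster sampling as the special case where every member of a sampled PSU is taken. The function name, the dictionary layout, and the tract and household labels are hypothetical, and both stages are assumed to use simple random sampling.

```python
import random


def two_stage_sample(psus, n_psu, n_ssu, rng):
    """psus maps a PSU id to its list of SSUs.

    Pass n_ssu=None to contact every member of each sampled PSU,
    which is cluster sampling.
    """
    chosen = rng.sample(sorted(psus), n_psu)
    sample = {}
    for psu in chosen:
        members = psus[psu]
        if n_ssu is None or n_ssu >= len(members):
            sample[psu] = list(members)
        else:
            sample[psu] = rng.sample(members, n_ssu)
    return sample


# Hypothetical frame: 5 tracts with 10 households each.
tracts = {f"tract{i}": [f"hh{i}-{j}" for j in range(10)] for i in range(5)}
rng = random.Random(1)
two_stage = two_stage_sample(tracts, 2, 3, rng)
cluster = two_stage_sample(tracts, 2, None, rng)
```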

Multi-Phase Methods

Sample for a screener, then re-sample based on the information collected in the screener. In most cases, all respondents in the target group are re-contacted, while only a random sample of the others is re-contacted.
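
A sketch of the second phase, assuming the screener responses are already collected. The function name, the in_target callback, the other_rate parameter, and the veteran flag in the example are all hypothetical.

```python
import random


def second_phase_sample(screener_responses, in_target, other_rate, rng):
    """Re-contact every target respondent and a random share of the others."""
    targets = [r for r in screener_responses if in_target(r)]
    others = [r for r in screener_responses if not in_target(r)]
    n_others = round(other_rate * len(others))
    return targets + rng.sample(others, n_others)


# Hypothetical screener: 100 responses, 10 of them in the target group.
responses = [{"id": i, "veteran": i % 10 == 0} for i in range(100)]
recontacts = second_phase_sample(
    responses, lambda r: r["veteran"], 0.2, random.Random(0)
)
```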


Variance Estimation

Sampling variance is how estimates would vary if samples were repeatedly drawn from the population. Sample design can affect the variance; stratified samples have differing variances per stratum.

Of course, because the population descriptives are unknown, survey variance must be estimated.

Exact methods are mathematically convenient but impractical.

Finite population correction (FPC) encapsulates the fact that as the sampling rate increases, sampling variance decreases. (If the entire population is sampled, there is no sampling variance.) The correction is generally negligible, and so ignored, if the sampling rate is below 5%.
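
As a worked sketch, assuming simple random sampling without replacement and the common form of the correction in which the variance of a mean is multiplied by (1 - n/N). The function name and the figures are hypothetical.

```python
from math import sqrt


def srs_standard_error(sd, n, population_size=None):
    """Standard error of a mean under SRS, optionally with the FPC."""
    se = sd / sqrt(n)
    if population_size is not None:
        # The FPC multiplies the variance by (1 - n/N), so the SE by its root.
        se *= sqrt(1 - n / population_size)
    return se


uncorrected = srs_standard_error(10.0, 100)           # no correction
at_5_percent = srs_standard_error(10.0, 100, 2000)    # sampling rate of 5%
full_census = srs_standard_error(10.0, 100, 100)      # sampling rate of 100%
```

At a 5% sampling rate the correction shrinks the standard error by only about 2.5%, and it shrinks it less at lower rates.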

Taylor series linearization makes use of weights and sample design features (e.g. strata, finite population correction) to estimate variance.

Replication or replicate weights makes use of several hierarchical weights.

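When using Stata, consider using the subpop or over options instead of if for filtration. Dropping cases with if also drops them from the design information, which can distort the variance estimates.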

Singleton Strata

It isn't possible to estimate variance within a stratum that contains only a single sampling unit.

When using Stata, consider using the singleunit(centered) option.



SurveySamples (last edited 2021-04-30 17:01:41 by DominicRicottone)