Algorithm-Numerical-Sample
view release on metacpan or search on metacpan
Crucial to see that the "sample" algorithm is correct is the fact that
when we sample "n" elements from a set of size "N" that the "t + 1"st
element is choosen with probability "(n - m)/(N - t)", when already "m"
elements have been choosen. We can immediately see that we will never
pick too many elements (as the probability is 0 as soon as "n == m"),
nor too few, as the probability will be 1 if we have "k" elements to
choose from the remaining "k" elements, for some "k". For the proof that
the sampling is unbiased, we refer to [3]. (Section 3.4.2, Exercise 3).
Algorithm B.
It is easy to see that the second algorithm returns the correct number
of elements. For a sample of size "n", the first "n" elements go into
the reservoir, and after that, the reservoir never grows or shrinks in
size; elements only get replaced. A detailed proof of the fairness of
the algorithm appears in [3]. (Section 3.4.2, Exercise 7).
LITERATURE
Both algorithms are discussed by Knuth [3] (Section 3.4.2). The first
algoritm, *Selection sampling technique*, was discovered by Fan, Muller
and Rezucha [1], and independently by Jones [2]. The second algorithm,
*Reservoir sampling*, is due to Waterman.
REFERENCES
[1] C. T. Fan, M. E. Muller and I. Rezucha, *J. Amer. Stat. Assoc.* 57
(1962), pp 387 - 402.
[2] T. G. Jones, *CACM* 5 (1962), pp 343.
[3] D. E. Knuth: *The Art of Computer Programming*, Volume 2, Third
edition. Reading: Addison-Wesley, 1997. ISBN: 0-201-89684-2.
lib/Algorithm/Numerical/Sample.pm view on Meta::CPAN
C<(n - m)/(N - t)>, when already C<m> elements have been
choosen. We can immediately see that we will never pick too
many elements (as the probability is 0 as soon as C<n == m>),
nor too few, as the probability will be 1 if we have C<k>
elements to choose from the remaining C<k> elements, for some
C<k>. For the proof that the sampling is unbiased, we refer to [3].
(Section 3.4.2, Exercise 3).
=head2 Algorithm B.
It is easy to see that the second algorithm returns the correct
number of elements. For a sample of size C<n>, the first C<n>
elements go into the reservoir, and after that, the reservoir
never grows or shrinks in size; elements only get replaced.
A detailed proof of the fairness of the algorithm appears in [3].
(Section 3.4.2, Exercise 7).
=head1 LITERATURE
Both algorithms are discussed by Knuth [3] (Section 3.4.2).
The first algoritm, I<Selection sampling technique>, was
discovered by Fan, Muller and Rezucha [1], and independently
by Jones [2]. The second algorithm, I<Reservoir sampling>,
is due to Waterman.
=head1 REFERENCES
=over
=item [1]
C. T. Fan, M. E. Muller and I. Rezucha, I<J. Amer. Stat. Assoc.>
( run in 0.793 second using v1.01-cache-2.11-cpan-39bf76dae61 )