sec results from the CPAN

Algorithm-Numerical-Sample

view release on metacpan or search on metacpan

    Crucial to see that the "sample" algorithm is correct is the fact that
    when we sample "n" elements from a set of size "N" that the "t + 1"st
    element is choosen with probability "(n - m)/(N - t)", when already "m"
    elements have been choosen. We can immediately see that we will never
    pick too many elements (as the probability is 0 as soon as "n == m"),
    nor too few, as the probability will be 1 if we have "k" elements to
    choose from the remaining "k" elements, for some "k". For the proof that
    the sampling is unbiased, we refer to [3]. (Section 3.4.2, Exercise 3).

  Algorithm B.
    It is easy to see that the second algorithm returns the correct number
    of elements. For a sample of size "n", the first "n" elements go into
    the reservoir, and after that, the reservoir never grows or shrinks in
    size; elements only get replaced. A detailed proof of the fairness of
    the algorithm appears in [3]. (Section 3.4.2, Exercise 7).

LITERATURE
    Both algorithms are discussed by Knuth [3] (Section 3.4.2). The first
    algoritm, *Selection sampling technique*, was discovered by Fan, Muller
    and Rezucha [1], and independently by Jones [2]. The second algorithm,
    *Reservoir sampling*, is due to Waterman.

REFERENCES
    [1] C. T. Fan, M. E. Muller and I. Rezucha, *J. Amer. Stat. Assoc.* 57
        (1962), pp 387 - 402.

    [2] T. G. Jones, *CACM* 5 (1962), pp 343.

    [3] D. E. Knuth: *The Art of Computer Programming*, Volume 2, Third
        edition. Reading: Addison-Wesley, 1997. ISBN: 0-201-89684-2.

lib/Algorithm/Numerical/Sample.pm view on Meta::CPAN

C<(n - m)/(N - t)>, when already C<m> elements have been
choosen. We can immediately see that we will never pick too
many elements (as the probability is 0 as soon as C<n == m>),
nor too few, as the probability will be 1 if we have C<k>
elements to choose from the remaining C<k> elements, for some
C<k>. For the proof that the sampling is unbiased, we refer to [3].
(Section 3.4.2, Exercise 3).

=head2 Algorithm B.

It is easy to see that the second algorithm returns the correct
number of elements. For a sample of size C<n>, the first C<n>
elements go into the reservoir, and after that, the reservoir
never grows or shrinks in size; elements only get replaced.
A detailed proof of the fairness of the algorithm appears in [3].
(Section 3.4.2, Exercise 7).

=head1 LITERATURE

Both algorithms are discussed by Knuth [3] (Section 3.4.2).
The first algoritm, I<Selection sampling technique>, was
discovered by Fan, Muller and Rezucha [1], and independently
by Jones [2]. The second algorithm, I<Reservoir sampling>,
is due to Waterman.


=head1 REFERENCES

=over

=item [1]

C. T. Fan, M. E. Muller and I. Rezucha, I<J. Amer. Stat. Assoc.>

( run in 0.894 second using v1.01-cache-2.11-cpan-6aa56a78535 )