      my $m = $machine{$hn};
      ($proc[$_], $pid[$_]) = $m->open("./pi $_ $N $np |");
      $readset->add($proc[$_]);
      my $address = 0+$proc[$_];
      $id{$address} = $_;
  }
During the last stage the master node simply waits on the L<IO::Select>
object, listening on each of the channels. As soon as a result is received,
it is added to the running total for C<$pi>:
  my @ready;
  my $count = 0;
  do {
    # Refill the list of readable handles only when it is empty
    push @ready, $readset->can_read unless @ready;
    my $handle = shift @ready;

    # Recover the logical process number from the handle address
    my $me = $id{0+$handle};

    my ($partial);
    my $numBytesRead = sysread($handle, $partial, 1024);
    chomp($partial);

    $pi += $partial;
    print "Process $me: machine = $machine[$me % $nummachines] partial = $partial pi = $pi\n";

    # Stop watching this channel once it is exhausted
    $readset->remove($handle) if eof($handle);
  } until (++$count == $np);

  my $elapsed = tv_interval ($t0);
  print "Pi = $pi. N = $N Time = $elapsed\n";
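The same gather pattern can be tried without any remote machine. The following is only a minimal local sketch, not the code of C<gridpipes.pl>: the helper name C<gather_partials> is hypothetical, the workers are plain local Perl one-liners started with a pipe C<open> rather than with C<< $m->open >>, and end of file is detected through a zero-length C<sysread> instead of C<eof>.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: start one local worker per value, each of which
# prints its value, and add up whatever the workers send back.
sub gather_partials {
    my @values = @_;

    my $readset = IO::Select->new;
    my %buffer;    # text received so far, keyed by handle address

    for my $v (@values) {
        # Local stand-in for a remote worker: a child perl printing one number
        open(my $fh, '-|', $^X, '-e', 'print "$ARGV[0]\n"', $v)
            or die "cannot start worker: $!";
        $readset->add($fh);
        $buffer{0 + $fh} = '';
    }

    my $total = 0;
    while ($readset->count) {
        for my $fh ($readset->can_read) {
            my $n = sysread($fh, my $chunk, 1024);
            die "sysread failed: $!" unless defined $n;

            if ($n == 0) {
                # Zero bytes means end of file: this worker has finished
                $readset->remove($fh);
                my $text = delete $buffer{0 + $fh};
                close $fh;
                chomp $text;
                $total += $text;
            }
            else {
                $buffer{0 + $fh} .= $chunk;
            }
        }
    }
    return $total;
}

print gather_partials(1.5, 0.25, 2), "\n";
```

Accumulating the text per handle and adding it only at end of file avoids relying on each result arriving in a single read.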
=head1 PERFORMANCE: COMPUTATIONAL RESULTS
Let us first time the I<pure C> program on each of the nodes involved
(nereida, beowulf and orion). To get an idea of how things behave
for a sufficiently large computation, we set C<$N> to C<1 000 000 000> intervals:
pp2@nereida:~/LGRID_Machine/examples$ time ssh nereida 'pi/pi 0 1000000000 1'
3.141593
real 0m32.534s
user 0m0.036s
sys 0m0.008s
pp2@nereida:~/LGRID_Machine/examples$ time ssh beowulf 'pi/pi 0 1000000000 1'
3.141593
real 0m27.020s
user 0m0.036s
sys 0m0.008s
casiano@beowulf:~$ time ssh orion 'pi/pi 0 1000000000 1'
3.141593
real 0m29.120s
user 0m0.028s
sys 0m0.003s
As you can see, there is some heterogeneity here. Machine C<nereida> (my desktop)
is slower than the other two. C<beowulf> is the fastest.
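The three arguments in C<pi/pi 0 1000000000 1> are the process number, the number of intervals and the number of processes, in the same order as in the C<"./pi $_ $N $np |"> command of the master script. As a rough illustration of what one such worker might compute, here is a Perl sketch. It assumes a midpoint rule for the integral of C<4/(1+x*x)> over C<[0,1]> with the intervals dealt out cyclically among the processes; the function name C<partial_pi> is hypothetical and a much smaller C<$N> is used so that it finishes quickly.

```perl
use strict;
use warnings;

# Hypothetical worker: process $id of $np handles intervals
# $id, $id + $np, $id + 2*$np, ... out of $N (assumed cyclic distribution).
sub partial_pi {
    my ($id, $N, $np) = @_;

    my $sum = 0;
    for (my $i = $id; $i < $N; $i += $np) {
        my $x = ($i + 0.5) / $N;       # midpoint of interval $i
        $sum += 4 / (1 + $x * $x);
    }
    return $sum / $N;
}

my ($N, $np) = (100_000, 2);

# Sequentially add the partial results that the workers would return
my $pi = 0;
$pi += partial_pi($_, $N, $np) for 0 .. $np - 1;

printf "%.6f\n", $pi;
```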
Now let us run the parallel Perl program on C<nereida> using only the C<beowulf>
node. The time spent is roughly comparable to the I<pure C> time. That is nice:
the overhead introduced by the coordination tasks is small, about two seconds
of wall-clock time (compare it with the C<beowulf> entry above):
pp2@nereida:~/LGRID_Machine/examples$ time gridpipes.pl 1 1000000000
Process 0: machine = beowulf partial = 3.141593 pi = 3.141593
Pi = 3.141593. N = 1000000000 Time = 27.058693
real 0m28.917s
user 0m0.584s
sys 0m0.192s
Now comes the true test: will it be faster using two nodes? And by how much?
pp2@nereida:~/LGRID_Machine/examples$ time gridpipes.pl 2 1000000000
Process 0: machine = beowulf partial = 1.570796 pi = 1.570796
Process 1: machine = orion partial = 1.570796 pi = 3.141592
Pi = 3.141592. N = 1000000000 Time = 15.094719
real 0m17.684s
user 0m0.904s
sys 0m0.260s
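The machine reported for each process in the output above comes from the expression C<$machine[$me % $nummachines]> in the master script, that is, processes are dealt out to machines in round-robin fashion. A small sketch of that mapping follows; the function name C<machine_for> is hypothetical and the order of the machine list is an assumption inferred from the program output.

```perl
use strict;
use warnings;

# Assumed order, inferred from the output of the runs
my @machine = qw{beowulf orion nereida};

# Hypothetical helper: machine that runs logical process $id
sub machine_for {
    my ($id, @names) = @_;
    return $names[$id % @names];
}

# With more processes than machines the assignment wraps around
for my $id (0 .. 4) {
    printf "Process %d: machine = %s\n", $id, machine_for($id, @machine);
}
```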
We can see that the sequential pure C version took 32 seconds on my desktop (C<nereida>).
By using two machines to which I have SSH access, I have reduced that time to roughly 18 seconds.
This is a factor of C<32/18 = 1.8> times faster. The factor is even better if I
don't count the set-up time: C<32/15 = 2.1>. The total time decreases further
if I use the three machines:
pp2@nereida:~/LGRID_Machine/examples$ time gridpipes.pl 3 1000000000
Process 0: machine = beowulf partial = 1.047198 pi = 1.047198
Process 1: machine = orion partial = 1.047198 pi = 2.094396
Process 2: machine = nereida partial = 1.047198 pi = 3.141594
Pi = 3.141594. N = 1000000000 Time = 10.971036
real 0m13.700s
user 0m0.952s
sys 0m0.240s
which gives a speed-up factor of C<32/13.7 = 2.3> or, not counting
the set-up time, C<32/11.0 = 2.9>.
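The arithmetic behind these factors can be reproduced with a few lines of Perl. This is only a sketch using the timings reported above; the function name C<speedup> is hypothetical, and treating the difference between the C<real> time and the C<Time> printed by the script as set-up time is an assumption. Because it uses the unrounded timings, its figures differ slightly from the rounded ones quoted in the text.

```perl
use strict;
use warnings;

# Hypothetical helper: how many times faster the parallel run is
sub speedup {
    my ($baseline, $parallel) = @_;
    return $baseline / $parallel;
}

my $baseline = 32.534;    # "real" time of the sequential C run on nereida

# Timings of the two-node and three-node runs of gridpipes.pl
my %run = (
    2 => { real => 17.684, elapsed => 15.094719 },
    3 => { real => 13.700, elapsed => 10.971036 },
);

for my $nodes (sort { $a <=> $b } keys %run) {
    my $r = $run{$nodes};
    printf "%d nodes: speed-up %.2f (wall clock), %.2f (without set-up), set-up %.1f s\n",
        $nodes,
        speedup($baseline, $r->{real}),
        speedup($baseline, $r->{elapsed}),
        $r->{real} - $r->{elapsed};
}
```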
What happens if you have a multiprocessor machine? The results depend
heavily on the underlying architecture. My machine C<nereida> is a dual Xeon:
nereida:/tmp/graphviz-2.20.2# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 5
cpu MHz : 2658.041
cache size : 512 KB
physical id : 0
.......................................
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 5
cpu MHz : 2658.041
cache size : 512 KB
physical id : 0
...................................
I then changed the C<Makefile> to include the C<-O3> option and edited the
line defining the set of machines in C<gridpipes.pl>
(addresses in the C<127.0.0.x> range all map to localhost):
my @machine = qw{127.0.0.1 127.0.0.2 127.0.0.3 127.0.0.4};
We have the following results:
pp2@nereida:~/LGRID_Machine/examples$ time gridpipes.pl 1 1000000000
Process 0: machine = 127.0.0.1 partial = 3.141593 pi = 3.141593
Pi = 3.141593. N = 1000000000 Time = 32.968117
real 0m33.858s
user 0m0.336s
sys 0m0.128s