MCell & NPACI -- Running in Parallel
With support from NPACI, centered at the San Diego Supercomputer Center,
two means of parallelizing MCell are being pursued:
- Port MCell to the Cray T3E and IBM SP massively parallel supercomputer
  architectures.
- Develop an MCell/NetSolve client/server version of MCell so that
  heterogeneous clusters of hundreds of NetSolve machines may be used to
  run multiple MCell parameter space explorations in parallel.
The port of MCell to the Cray T3E and IBM SP uses the MPI and PVM3
message-passing protocols via the KeLP library. When completed, it will
allow a single MCell job to split both the heavy computational load of its
diffusion algorithm and its large memory requirements evenly across any
number of processors on the machine. In addition, using the MPI and PVM3
standards will make this parallel version of MCell highly portable to other
parallel architectures, including heterogeneous clusters of machines.
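As a rough illustration of what splitting work "evenly across any number of processors" involves, the sketch below divides a one-dimensional range of work items (for example, spatial subvolumes) into contiguous blocks whose sizes differ by at most one. This is not MCell's or KeLP's actual decomposition: the function name `block_partition` and the one-dimensional layout are assumptions made for illustration, and no message passing is shown.

```python
def block_partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) blocks.

    Block sizes differ by at most one, so compute and memory load are
    spread as evenly as a contiguous split allows. Hypothetical helper,
    not part of MCell or KeLP.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_items, n_workers)
    blocks = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Each worker (rank) would own the half-open range [lo, hi).
    for rank, (lo, hi) in enumerate(block_partition(10, 4)):
        print(f"rank {rank}: items {lo}..{hi - 1}")
```

In a message-passing program, each process would call such a function with its own rank to find the block it owns, then exchange boundary data with its neighbors.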
NetSolve is an alternative to MPI and PVM3 that makes it simple to turn a
loosely associated collection of machines into a fault-tolerant
client/server compute cluster. We have begun working very closely with the
authors of NetSolve (Jack Dongarra & Henri Casanova) to create an
MCell/NetSolve client/server application. This application will allow
submission of perhaps thousands of MCell jobs to the cluster for automatic,
fault-tolerant, concurrent execution. During our initial tests, jobs were
run concurrently on 40 machines. Repeatedly transmitting large input files
(e.g. 3D reconstruction information) to the distributed compute cluster is
very inefficient. Future versions of NetSolve will include a distributed
file caching mechanism to solve this problem. The MCell/NetSolve
client/server application will also be more closely integrated with the
metacomputing tools AppLeS and NWS, being developed by Francine Berman and
Rich Wolski.
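The idea of submitting many independent jobs for automatic, fault-tolerant, concurrent execution can be sketched with Python's standard library alone. The example below is not the NetSolve API and does not run MCell: `run_sweep` and the `run_job` callable are hypothetical names, a local thread pool stands in for the cluster, and "fault tolerance" here means only that a failed job is resubmitted a limited number of times.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_sweep(run_job, param_sets, max_workers=4, max_attempts=3):
    """Run run_job(params) for every entry of param_sets concurrently.

    Jobs that raise an exception are resubmitted until they succeed or
    have been tried max_attempts times. Returns (results, failures),
    both keyed by the job's index in param_sets. Hypothetical helper,
    not part of NetSolve or MCell.
    """
    results = {}
    failures = {}
    attempts = {i: 0 for i in range(len(param_sets))}
    pending = list(attempts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Submit every job that still needs to run in this round.
            futures = {}
            for i in pending:
                attempts[i] += 1
                futures[pool.submit(run_job, param_sets[i])] = i
            pending = []
            for fut in as_completed(futures):
                i = futures[fut]
                try:
                    results[i] = fut.result()
                except Exception as exc:
                    if attempts[i] < max_attempts:
                        pending.append(i)  # retry in the next round
                    else:
                        failures[i] = exc  # give up on this job
    return results, failures


if __name__ == "__main__":
    # Stand-in for a simulation run: squares its single parameter.
    done, failed = run_sweep(lambda p: p * p, list(range(8)))
    print(len(done), "jobs finished,", len(failed), "failed")
```

Because each parameter set is an independent job, this pattern needs no communication between workers, which is what makes it a natural fit for loosely coupled, heterogeneous machines.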