A Beginner's Guide to PVM (Parallel Virtual Machine)

Introduction

PVM (Parallel Virtual Machine) is a software environment for heterogeneous distributed computing. It allows a user to create and access a parallel computing system made from a collection of distributed processors, and treat the resulting system as a single virtual machine (hence the name, parallel virtual machine).

The hardware in a user's virtual machine may consist of single-processor workstations, vector machines, parallel supercomputers, or any combination of these. The individual elements may all be of a single type (homogeneous), all different (heterogeneous), or any mixture, as long as all the machines used are connected through one or more networks. These networks may be as small as a LAN connecting machines in the same room, as large as the Internet connecting machines across the world, or any combination. This ability to bring together diverse resources under central control allows the PVM user to divide a problem into subtasks and assign each one to be executed on the processor architecture that is best suited for that subtask.

PVM is based on the message-passing model of parallel programming. Messages are passed between tasks over the connecting networks.

A user's tasks are able to initiate and terminate other tasks, send and receive data, and synchronize with one another using a library of message-passing routines. Tasks are dynamic (i.e. they can be started or killed during the execution of a program), and even the configuration of the virtual machine (i.e. the set of machines that are part of your PVM) can be changed dynamically.
 
 

Installing PVM

One of the reasons for PVM's popularity is its ease of use, which extends to its installation. No special privileges are required to install PVM; anyone with a valid login on the host machines can install it.

This section describes how to install PVM on machines that you wish to run PVM applications on. If PVM has been previously installed on the machines, you may skip the first part of this section. Check with your system administrator to see if and where PVM has been installed.

The latest version of PVM is always available from Netlib at the URL http://www.netlib.org/pvm3/ in a file named pvm3.x.x.tar.gz. There are pointers to more PVM information in this directory. As of May 2000, the most recent version of PVM was pvm3.4.

The following set of commands download, unpack, build and install a release of PVM. Start in the directory just above PVM_ROOT, that is, just above where you wish PVM to be installed, for example $HOME or /usr/local.

% ftp ftp.netlib.org
Name: anonymous
Password: you@host.domain (put your email address here)
ftp> bin
ftp> cd pvm3
ftp> ls pvm3*.tar.z.uu
ftp> get pvm3.4........
ftp> quit
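
The archive must then be unpacked and built. A minimal sketch of the remaining steps, assuming the downloaded file is the gzipped tar archive pvm3.x.x.tar.gz mentioned above (substitute the name of the release you actually fetched):

% gunzip pvm3.x.x.tar.gz
% tar xvf pvm3.x.x.tar
% cd pvm3
% export PVM_ROOT=$PWD
% make

The top-level make invokes PVM's aimk script and builds the daemon, libraries and console for your architecture.
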
The environment variable PVM_ROOT tells PVM where the system is installed so that PVM can find various files. It must be set in order for PVM to run. PVM also needs to know what architecture it is running on (e.g., SUN4), so that it can run the appropriate executables. To do this it checks the environment variable PVM_ARCH. A script is provided with PVM which figures out the correct value for PVM_ARCH.
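
For example, PVM_ARCH can be set from this script directly (a sketch assuming PVM_ROOT is already set; the script normally lives in $PVM_ROOT/lib):

export PVM_ARCH=`$PVM_ROOT/lib/pvmgetarch`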

The following examples assume that PVM is installed in $HOME/pvm3. If it is installed in another location, for example, /usr/local/pvm3 please make the appropriate changes.
 

Configuration

The machines lpc1, lpc2, lpc4 and lpc5, as well as medusa, can be used as hosts for the PVM platform. The user accounts are pardb1 -- pardb7.

On each machine you should set the following environment variables. If you normally use bash, sh or ksh, add these exports to the end of your .bashrc:
export PVM_ROOT=/usr/lib/pvm3
export PVM_DPATH=$PVM_ROOT/lib/pvmd
export MANPATH=$MANPATH:$PVM_ROOT/man
export PATH=$PATH:$PVM_ROOT/lib:$PVM_ROOT/include

To use the PVM facilities, rsh between the different nodes of the parallel virtual machine must be possible. To enable this, create a file named .rhosts in your $HOME directory and write in it:

lpc1.itec.uni-klu.ac.at harald
lpc2.itec.uni-klu.ac.at harald

if you are user harald and the nodes of the parallel machine are lpc1, lpc2, and so on.
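
A quick way to verify the setup (an illustrative check, not something PVM itself requires you to run) is to execute a harmless command on a remote node and confirm that no password prompt appears:

% rsh lpc2.itec.uni-klu.ac.at date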
 

The PVM distribution comes with several example programs which are an excellent way of learning PVM. You can find these examples in $PVM_ROOT/examples. Use these examples and their associated Makefiles as templates for your own programs.
 
 

Writing PVM Applications

PVM applications can be written in C, Fortran 77, Java and other languages. To program in C using PVM, the user adds PVM function calls to the code. The compiled code is then linked with libraries which handle the PVM calls.

PVM provides a very flexible environment for message passing. It supports MIMD (Multiple Instruction stream over Multiple Data stream) style parallel computation, though most programs are written in the SPMD (Single Program over Multiple Data stream) style.

A Simple Example

There are a few elements common to all PVM programs. Every PVM program needs to include the PVM header file (pvm3.h in C) at the beginning. The first PVM routine called is usually pvm_mytid() in C, which enrolls the process in PVM. The routine returns a number which is the task id (tid) of the calling process.

NOTE: Almost all PVM calls in C are functions whose names start with pvm_ (e.g., mytid = pvm_mytid()), and they return a value, usually indicating the success of the call or carrying other useful information.
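
A negative return value indicates an error. A defensive version of the enroll step might look like the following fragment (a sketch meant to sit inside main(); pvm_perror() prints PVM's error text for the most recent failing call):

mytid = pvm_mytid();
if (mytid < 0) {
   pvm_perror("pvm_mytid");   /* print PVM's error text for the failed call */
   return 1;                  /* give up if we could not enroll */
}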

First Example :

The following example codes provide a very basic outline for a PVM program. A process enrolls in PVM, it does some work, possibly including message passing, it exits from PVM, and exits from the main program. This code is a complete program which includes some PVM calls and provides a simple first example to show the syntax of the C and Fortran versions. A more detailed example will be discussed later.

C Code myprog.c

#include "pvm3.h"  
#define NTASKS 5

main()
{
  int mytid, info;    

  /* enroll in PVM */
  mytid = pvm_mytid();
  
  /* Possibly do some work here ... */
  printf("Hello from task %d", mytid);
  
  /* exit from PVM */
  info = pvm_exit();

  exit();
}
Compiling in C: Compile the code using the C compiler, letting it know the location of the include files, and linking with the PVM libraries :
% cc -o myprog myprog.c -I$PVM_ROOT/include -L$PVM_ROOT/lib/LINUX -lpvm3 -lnsl
Running the executable: Start the parallel virtual machine PVM, quit the console, and run one copy of the executable. Then re-enter PVM and use the PVM console to run multiple (3) copies of the executable. Finally halt the virtual machine, so that all PVM daemon processes are stopped.
%(asteroid) pvm
pvm> conf
1 host, 1 data format
                    HOST     DTID     ARCH   SPEED
                asteroid    40000     LINUX  1000
pvm> quit
% myprog
Hello from task 262146
% pvm
pvm> spawn -3 -> myprog
[1]
3 successful
t40004
t40005
t40006
[1:t40005] Hello from task 262149
[1:t40005] EOF
[1:t40006] Hello from task 262150
[1:t40006] EOF
[1:t40004] Hello from task 262148
[1:t40004] EOF
[1] finished
pvm> halt
You have now seen how a simple PVM program (one without any explicit message passing) can be written and compiled. You have also seen how the program is run from the command line and from the PVM console, and how multiple copies of the program are spawned from the PVM console. These things will now be explained in more detail and with more examples.
 
 

Detailed Examples

There are two main programming paradigms used in message-passing programs: master/worker and hostless. In the master/worker paradigm, one task (the master) is designated to be the coordinator, usually handling the spawning (creation) of the other tasks and the input/output. Under the hostless model, multiple tasks are spawned from the command line or from the console, and each starts working on its portion of the problem. Input and output requirements are usually handled by the individual tasks themselves, but it may be beneficial to have some results collected by a single task, especially if those results need to be broadcast back out to all the tasks.

Each PVM task receives a unique task identifier (tid) from the PVM daemon. This tid is assigned when the process enrolls in PVM (on its first call to a PVM routine) and is used by the other tasks to address messages to it.

The following subsection gives an example program in C. For this example, we shall assume that the user has built their own set of PVM executables on the LINUX machine cetus1a and that the program sources reside in the user's $HOME directory.

The examples given here illustrate a master/worker code that sums up the values of an integer array. The master task spawns off five worker tasks, sends each of the workers a portion of the array to be summed up, receives the partial totals from each of the workers, and adds those up for the final total. The worker tasks receive an array of integers, add up all the values in the array, and send the total back to the master process.

C Example


Figure 1: C version of master example.

#include "pvm3.h"
#include <stdio.h>
#define SIZE 1000
#define NPROCS 5

int main(void)
{
   int mytid, task_ids[NPROCS];
   int a[SIZE], results[NPROCS], sum = 0;
   int i, msgtype, num_data = SIZE/NPROCS;

   /* enroll in PVM */
   mytid = pvm_mytid();

   for (i = 0; i < SIZE; i++)
      a[i] = i % 25;

   /* spawn worker tasks */
   pvm_spawn("worker", (char **)0, PvmTaskDefault, "", NPROCS, task_ids);

   /* send data to worker tasks */
   for (i = 0; i < NPROCS; i++) {
      pvm_initsend(PvmDataDefault);
      pvm_pkint(&num_data, 1, 1);
      pvm_pkint(&a[num_data*i], num_data, 1);
      pvm_send(task_ids[i], 4);
   }

   /* wait and gather results */
   msgtype = 7;
   for (i = 0; i < NPROCS; i++) {
      pvm_recv(task_ids[i], msgtype);
      pvm_upkint(&results[i], 1, 1);
      sum += results[i];
   }

   printf("The sum is %d \n",sum);

   pvm_exit();
   return 0;
}


Figure 2: C version of worker example.

#include "pvm3.h"
#include <stdio.h>

int main(void)
{
   int mytid;
   int i, sum, *a;
   int num_data, master;

   /* enroll in PVM */
   mytid = pvm_mytid();

   /* receive portion of array to be summed */
   pvm_recv(-1, -1);
   pvm_upkint(&num_data, 1, 1);
   a = (int *) malloc(num_data*sizeof(int));
   pvm_upkint(a, num_data, 1);

   sum = 0;
   for(i = 0; i < num_data; i++)
      sum += a[i];

   /* send computed sum back to master */
   master = pvm_parent();
   pvm_initsend(PvmDataRaw);
   pvm_pkint(&sum, 1, 1);
   pvm_send(master, 7);

   pvm_exit();
   return 0;
}
Figure 1 gives the C code for the master program which is contained in the file master.c. Figure 2 shows the C code for the worker portion of the PVM example application stored in the file worker.c. Let's examine each program to see how the PVM calls are used.

The first line of both programs includes the PVM header file. This file defines PVM symbolic names and functions.

The master program then initializes the data that is to be summed up. Next, the worker processes are spawned.

int numt = pvm_spawn(char *task, char **argv, int flag, char *where,
int ntask, int *tids)
The task parameter is a string containing the name of the executable file to be run (``spawned''). Any arguments that must be sent to this program are in an array pointed to by argv. If no arguments are required by the task, then the argv parameter can be NULL. The flag parameter determines the specific machine or type of architecture the spawned task is to be run on; possible values include PvmTaskDefault (PVM chooses where to spawn), PvmTaskHost (the where parameter names a particular host) and PvmTaskArch (the where parameter names an architecture type). The ntask parameter specifies the number of copies of the task to be spawned. The tids parameter is a pointer to an integer array that, on return, will contain the task ids of all spawned tasks. The function returns in numt the number of tasks that were successfully created. If some tasks could not be started, the last (ntask - numt) positions of tids will contain error codes for the unsuccessful tasks.

For error codes, see man pvm_spawn.
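
A minimal sketch of checking the spawn result (a fragment written against the master example above, not part of the original code):

   int numt = pvm_spawn("worker", (char **)0, PvmTaskDefault, "", NPROCS, task_ids);
   if (numt < NPROCS) {
      /* the last NPROCS - numt entries of task_ids hold negative error codes */
      printf("only %d of %d workers started\n", numt, NPROCS);
      pvm_exit();
      return 1;
   }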
 

In our example code five (NPROCS) copies of the executable file ``worker'' will be spawned by the master program. No arguments are to be sent and we are allowing PVM to choose which machines will be used to execute the worker code. The task ids will be placed in the array task_ids.

To send a message from one task to another, a send buffer is created to hold the data. The function pvm_initsend() creates and clears a buffer and returns a buffer identifier.

int bufid = pvm_initsend (int encoding)
Generally, pvm_initsend() must be called each time a new message is to be sent, in order to clear the default send buffer. The encoding parameter can be PvmDataDefault, PvmDataRaw or PvmDataInPlace. The PvmDataDefault option uses XDR (eXternal Data Representation) encoding for the message data, which allows machines with different data representations to exchange information. The PvmDataRaw option does no encoding of the message data. The PvmDataInPlace option leaves the data in place in memory instead of copying it into the send buffer; this saves time but requires that the programmer does not alter the data until the send is completed.

The send buffer needs to be packed with the data to be sent (think of packing a suitcase). The functions to pack data into the active send buffer are pvm_pkXXXX(), where XXXX indicates the type of data being packed. The data types supported by PVM (and their XXXX function designations) are byte (byte), complex (cplx), double complex (dcplx), double (double), float (float), integer (int), long (long) and short (short). The example code packs integers and uses the function pvm_pkint().

int info = pvm_pkint (int *np, int nitem, int stride)
The np parameter is a pointer to the data to be packed. The nitem parameter is the total number of items to be packed. The stride parameter is the stride (step size, i.e. the number of items to skip) to use when packing. There is also a function to pack a NULL-terminated string (str) that requires only a single parameter, a pointer to the string.

A single message can be packed with any number of different data types and there is no limit on the complexity of the message. However, you should ensure that the received message is unpacked in the same order it was originally packed.
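
As an illustration (a sketch with made-up variable names such as dest_tid), a message carrying an integer count, a string label and an array of doubles would be packed on one side and unpacked in exactly the same order on the other:

   /* sender side */
   int n = 3;
   double vals[3] = {1.0, 2.0, 3.0};
   pvm_initsend(PvmDataDefault);
   pvm_pkint(&n, 1, 1);
   pvm_pkstr("partial results");       /* NULL-terminated string */
   pvm_pkdouble(vals, 3, 1);
   pvm_send(dest_tid, 9);

   /* receiver side: unpack in the same order */
   char label[64];
   pvm_recv(-1, 9);
   pvm_upkint(&n, 1, 1);
   pvm_upkstr(label);
   pvm_upkdouble(vals, 3, 1);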

The function to send a message is pvm_send().

int info = pvm_send (int tid, int msgtag)
This function attaches an integer label of msgtag and immediately sends the contents of the send buffer to the task with the task id of tid. The msgtag is an arbitrary integer that can be used to distinguish between different messages that a task could send out.

In the example code, for each worker, the master program clears the send buffer for each new message and packs this buffer with two things: 1) the number of array elements that follow in the message and 2) the array portion to be summed. Since each consecutive item from the array a is to be sent, starting with the num_data*i position, the stride for the packing function is 1. The task_ids array that was returned from the pvm_spawn() call is used to address each different task that will receive a portion of the array. The arbitrarily chosen value `4' is the msgtag used to label the messages.

After the array portions have been distributed to the worker tasks, the workers will process the data and send back some results.

The master program must receive (pvm_recv()) a partial sum from each of the worker processes.

int bufid = pvm_recv (int tid, int msgtag)
This will receive a message from task tid with label msgtag and place it into the receive buffer with id bufid. If no message is waiting from the given task with the expected label, the function waits until a message from the proper task and with the correct label arrives. A value of `-1' for either parameter will match any task id and/or label. In the example code, the master program is expecting a label value of `7' on messages from the worker tasks. The loop forces the messages to be received from the workers in order.
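
If the order of arrival does not matter, the partial sums could instead be accepted from any worker as they come in. A sketch (not part of the original example) using the wildcard tid together with pvm_bufinfo(), which reports the size, tag and sender of the message just received:

   int bytes, tag, sender, partial;
   for (i = 0; i < NPROCS; i++) {
      int bufid = pvm_recv(-1, 7);              /* any worker, message tag 7 */
      pvm_bufinfo(bufid, &bytes, &tag, &sender);
      pvm_upkint(&partial, 1, 1);
      printf("got %d from task t%x\n", partial, sender);
      sum += partial;
   }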

Once a message has been received the data within must be unpacked. The unpacking functions are pvm_upkXXXX() where the XXXX corresponds to the type of data that is to be unpacked. The same XXXX extensions used in the pvm_pkXXXX() packing functions are valid for pvm_upkXXXX(). For example, since the master program is receiving an integer from its worker processes, it calls the integer unpacking function

int info = pvm_upkint (int *np, int nitem, int stride)
The np parameter is a pointer to where the data is to be stored. The nitem parameter specifies the number of items to be unpacked, and the stride parameter specifies the stride to be used when unpacking the data. Our example code unpacks each partial result received into a different element of the results array and adds it to the running sum.

After the sum is computed and printed the master task informs the PVM daemon that it is exiting from the virtual machine.

int info = pvm_exit()
As for the worker program, after enrolling in the virtual machine, the worker tasks wait to receive their portion of the array to be summed. Using the `-1' wild cards in the pvm_recv() call indicates that the task does not care which task the message was sent from nor what message label was used. This is because the worker has not yet found out the tid of the master. Since the size of the array being sent may not be known ahead of time, after unpacking the number of data items from the message, the worker code allocates enough memory to hold the rest of the data contained in the message. The array fragment is summed up and the total is sent back to the parent.

The task id of the parent task that spawned the current task is returned by the pvm_parent() function.

int parent_id = pvm_parent()
Note that since the master program is expecting a msgtag of `7' from the worker tasks, this value must be used in the pvm_send() call.

Helpful Hint: If the spawned tasks will need to communicate amongst themselves, the parent task needs to send the array of task ids returned by pvm_spawn() to each of them, as shown in the sketch below.
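
One way to do this (a sketch, not part of the example code; it assumes the worker also knows NPROCS, e.g. from a shared header) is to pack the task_ids array in the master and multicast it to all workers with pvm_mcast():

   /* in the master, after pvm_spawn() has filled task_ids */
   pvm_initsend(PvmDataDefault);
   pvm_pkint(task_ids, NPROCS, 1);         /* the tids of all workers */
   pvm_mcast(task_ids, NPROCS, 1);         /* deliver to every worker, message tag 1 */

   /* in each worker */
   int peers[NPROCS];
   pvm_recv(pvm_parent(), 1);
   pvm_upkint(peers, NPROCS, 1);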

The compilation process for our example codes would be
> cc -o master master.c -I$PVM_ROOT/include -L$PVM_ROOT/lib/LINUX -lpvm3 -lnsl

> cc -o worker worker.c -I$PVM_ROOT/include -L$PVM_ROOT/lib/LINUX -lpvm3 -lnsl

Complete details for all PVM library functions can be found in the PVM 3 User's Guide and Reference Manual, available online at http://www.netlib.org/pvm3/book/node1.html. It also includes a troubleshooting section.
 

Running PVM Applications

Start the PVM console program.
cetus1a> pvm
pvm>
The prompt for the PVM console is pvm>. From this prompt you can add or delete computers from your virtual machine, list the current virtual machine configuration and check the status of executing tasks. The help or ? command accesses the interactive help facility.

The conf command lists the current virtual machine configuration. This list includes the host name, PVM daemon task id, architecture type, and relative speed rating. The first time, it shows only the local host:

pvm> conf
1 host, 1 data format
                    HOST     DTID     ARCH  SPEED
                 cetus1a    40000     SUN4   1000

To add new hosts, use the add command:
pvm> add cetus1b
....

pvm> conf
6 hosts, 1 data format
                    HOST     DTID     ARCH  SPEED
                 cetus1a    40000     SUN4   1000
                 cetus1b    80000     SUN4   1000
                 cetus1c    c0000     SUN4   1000
                 cetus1d   100000     SUN4   1000
                 cetus1e   140000     SUN4   1000
                 cetus1f   180000     SUN4   1000
If you wish to dynamically create or alter your virtual machine, you do not need to create a hostfile (a file listing the hosts to add at start-up; a sketch is shown after the transcript below). Simply use the add and delete commands to add and remove computers from your virtual machine.
pvm> add cetus2a
1 successful
                    HOST     DTID
                 cetus2a   1c0000
pvm> delete cetus1b

1 successful
                    HOST  STATUS
                 cetus1b  deleted
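
For reference, a hostfile is just a plain text file that names the hosts to be added when PVM starts, one per line (the names here reuse hosts from this example):

cetus1b
cetus1c
cetus1d

If this file is saved as, say, hostfile, starting the console with pvm hostfile adds the listed hosts automatically.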

Your PVM application can be started on any of the hosts in the virtual machine. To do this from the initial host, we can either put the PVM console into the background with a control-Z followed by bg or issue the quit command. The quit command only exits the console program; all the daemons and PVM tasks are left running. You may quit and restart the PVM console any number of times without disturbing any of the PVM daemons or hosts in the configuration. Of course, if you have multiple windows on the initial host, you can leave the console program running in one window while you execute your programs from another.

pvm> quit

pvmd still running.
cetus1a>
As with any other UNIX executable, you need only type the name of the executable at your system prompt to start your PVM application.
cetus1a> master
The sum is 12000
While the PVM daemon is running, you may edit and recompile your PVM programs as you require. There is no need to keep halting and restarting the daemon and console programs to make alterations in your code.

When you have finished, you need to halt PVM. Bring the PVM console to the foreground with the fg command or restart it if you had quit earlier and issue the halt command.

cetus1a> pvm
pvmd already running.
pvm> halt
cetus1a>
This will terminate all PVM daemons within the virtual machine configuration that included the host issuing the halt command.

Where to get more Information

There is a lot more to know about PVM. One of the most commonly used sets of routines has to do with groups of tasks, and those have not been covered here. Many other routines having to do with manipulating the virtual machine and sending and receiving messages have not been covered.

The PVM Home Page is at the URL: http://www.epm.ornl.gov/pvm/pvm_home.html. A lot of useful information can be found there, including pointers to other introductions, and information about the latest developments in PVM.

The PVM 3 User's Guide and Reference Manual is the most complete and readily available source of information on PVM. This document can be obtained from netlib@ornl.gov by sending the message send ug.ps from pvm3. It is also available via the URL: http://www.netlib.org/pvm3/ug.ps.

There is a book on PVM available from MIT Press called PVM: Parallel Virtual Machine A Users' Guide and Tutorial for Networked Parallel Computing. An online HTML version of the book can be browsed at URL: http://www.netlib.org/pvm3/book/pvm-book.html.

A convenient reference card for PVM calls is available from the URL: http://www.netlib.org/pvm3/refcard.ps; it summarizes the C and Fortran interfaces on two sheets.

A UseNet newsgroup, comp.parallel.pvm, exists. The forum is intended for users to exchange ideas, tricks, successes and problems.