README

Contention Communication Model (CCMOD)


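Contents:

Introduction
List of Files
Model Configuration and Evaluation
Error Handling
Workload Definition
Network Configuration
Output Trace Processing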


Introduction

CCMOD is a C++ model library that predicts communication delays in packet-switched networks, taking link contention and background loads into account. It is designed to help software developers and system administrators carry out performance prediction and capacity planning studies. The main features of CCMOD are:

  • CCMOD represents the network characteristics in enough detail to capture the performance effects that are likely to interest a software engineer or system administrator, such as link contention, message routing, bottleneck identification, and background loads. CCMOD does not represent the physical and protocol layers of the network.
  • The model evaluation algorithm introduces abstractions that allow the user to define the system without understanding or modifying the evaluation process and the internal CCMOD calculations.
  • The code has been tested with the MS Visual C++ 7.0 compiler.



List of Files

readme.htm                  This file
actmsg.h, actmsg.cpp        Active message class
ccmod.h, ccmod.cpp          Error handling and global definitions
evaleng.h, evaleng.cpp      Model evaluation engine
model.cpp                   Model configuration and evaluation
model.mak                   Makefile for MS Visual C++ compiler
msg.h, msg.cpp              Static message class
myrinet.h, myrinet.cpp      netsys derived class for Myrinet switch
netsys.h                    Abstract netsys class definition
otrace.h                    Abstract otrace class definition
otrdeb.h, otrdeb.cpp        otrace derived class for printing output traces to stdout
proc.h, proc.cpp            Processor class
sample1.trace               Sample ASCII workload trace file obtained from an FFT (~130 traces)
sample2.trace               Sample ASCII workload trace file obtained from Sweep3D (~96000 traces)
workload.h                  Abstract workload class definition
wrkascii.h, wrkascii.cpp    workload derived class for reading ASCII workload traces



Model Configuration and Evaluation

The CCMOD C++ library is designed to hide the internal model details from the user. The evaluation engine interfaces to the user configuration through three abstract classes: workload (workload.h), netsys (netsys.h), and otrace (otrace.h).

The user must provide a class derived from each of these three. A typical scenario using the CCMOD library, assuming that you have created the three customization classes, is shown below:

 
//
// Communication Contention Model
//
// model.cpp - Process command lines and configure & evaluate model
//
// Copyright (c) 2000 Microsoft Research Ltd. All rights reserved.
//

#include <iostream>
#include <fstream>
#include <iomanip>
#include <stdlib.h>
#include <assert.h>

#include "ccmod.h"

#include "wrkascii.h"
#include "myrinet.h"
#include "otrdeb.h"

using namespace std;

int main(int argc,char** argv)
{
    const int Nproc = 16;

    if( argc != 3 ) {
        cerr << "Usage: model.exe <workload traces> <output file>" << endl;
        exit(1);
    }

    // Set up trace source if present
    ifstream fsrc(argv[1]);
    if( !fsrc ) {
        cerr << "Error opening trace file\n";
        return 1;
    }

    // Set up otrace file
    ofstream otrg(argv[2]);
    if( !otrg ) {
        cerr << "Error opening output trace file\n";
        return 1;
    }

    // Set up & evaluate model
    try {

        // Set up

        // Workload
        wrkascii w(Nproc,1,&fsrc);

        // System model
        myrinet  myrswitch(Nproc);

        // Output trace facility
        // Use DBNONE to turn off debug mode
        otrdeb  odb(otrdeb::DBALL,&otrg);


        // Set up evaluation engine
        evaleng  e(&w,&myrswitch,&odb);

        // Evaluate
        cout << e.Go() << endl;
    }

    catch( ccmod_error er ) {
        cerr << er.what() << endl;
        return(1);
    }

    return(0);
}

Go(), a member function of the evaleng class, returns the predicted execution time of the whole workload. The detailed timings and operation of the model are passed to the otrace class during the evaluation. The time units are arbitrary; they are determined by the user-defined classes.

 



Error Handling

If an error occurs during the execution of a user-defined function, raise the exception ccmod_error. For example:

if( error ) {
    throw ccmod_error(__LINE__,"class name","Something is wrong...");
}

 


Workload Definition

The workload class feeds the evaluation engine with workload traces. The user-defined class derives from the workload class defined in workload.h:

class workload { // Abstract workload class
public:

	enum trace { // Trace type
		TRACE_IDLE, // Processor is idle
		TRACE_SCOM, // Synchronous communication
		TRACE_END, // End of traces

		// Trace extension that might be used by 
		// derived classes
		TRACE_EXT1=11,TRACE_EXT2=12,TRACE_EXT3=13,
		TRACE_EXT4=14,TRACE_EXT5=15
	};
	// Returns current trace type
	virtual trace GetTraceType(int procid) = 0;
	// Return communication trace data configuration
	virtual void GetTraceData(int procid,
			int &src,int &trg,long &len) = 0;
	// Returns processing trace configuration
	virtual void GetTraceData(int procid, long &time) = 0;
	// Prepares next trace
	virtual trace FetchNextTrace(int procid) = 0;
	// Prints workload debugging information
	virtual void print(std::ostream&) = 0;

};

The workload class either reads detailed workload traces or generates them, depending on the problem requirements. One way to implement the workload class is to store the traces in per-processor (or per-computer, server, or client) queues. Processor queues are probably not required when the workload is generated on the fly or the trace file is read during evaluation.

The types of trace used within the evaluation engine are defined in the trace enum. The extension trace types can be used as flags within the derived workload class. The trace types are defined as:

 

TRACE_IDLE
This is the idle event. It denotes that the processor does not generate any communication requests for a specified time interval.
TRACE_SCOM
This is the synchronous communication trace. For two processors to communicate, both must have the same SCOM event at the top of their workload queues. The trace is specified by the source processor id, the target processor id, and the length of the message in bytes. If one of the two processors is busy with other events, the ready processor blocks.
TRACE_END
Denotes that the traces for a processor have finished.

The derived class must define the following support virtual functions:

virtual trace GetTraceType(int procid)

Returns the current trace type of the procid processor (in a queue-based implementation, the trace at the top of the queue).

void GetTraceData(int procid,int &src,int &trg,long &len)

Provides configuration information of a communication trace. procid is the processor that generates the event, and src, trg, and len the message event configuration parameters.

void GetTraceData(int procid, long &time)

Same as above for an idle event. The time parameter is the duration for which the processor remains idle.

trace FetchNextTrace(int procid)

Prepares the next trace for a specific processor. The next call of GetTraceType() will return the type of the new trace.

Note: 

The evaluation engine assumes that there is no need to call FetchNextTrace() before the first request for traces (i.e. the first trace is ready).
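
The queue-based approach described above can be sketched as follows. This is a minimal illustration, not part of CCMOD: the class name queue_workload and the helpers AddIdle and AddScom are hypothetical, the abstract class is reproduced from the listing above so that the sketch compiles on its own, and it is assumed that FetchNextTrace() returns the type of the newly prepared trace.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <ostream>
#include <vector>

// Abstract interface, reproduced from workload.h so the sketch is self-contained.
class workload {
public:
    enum trace { TRACE_IDLE, TRACE_SCOM, TRACE_END,
                 TRACE_EXT1=11, TRACE_EXT2=12, TRACE_EXT3=13,
                 TRACE_EXT4=14, TRACE_EXT5=15 };
    virtual trace GetTraceType(int procid) = 0;
    virtual void GetTraceData(int procid, int &src, int &trg, long &len) = 0;
    virtual void GetTraceData(int procid, long &time) = 0;
    virtual trace FetchNextTrace(int procid) = 0;
    virtual void print(std::ostream&) = 0;
};

// Hypothetical derived class: one in-memory queue of traces per processor.
class queue_workload : public workload {
    struct item { trace type; int src; int trg; long len; long time; };
    std::vector<std::deque<item> > q;   // q[procid] holds the pending traces
public:
    explicit queue_workload(int nproc) : q(nproc) {}

    // Illustrative helpers for filling the queues before evaluation.
    void AddIdle(int procid, long time) {
        item it = { TRACE_IDLE, 0, 0, 0, time };
        q[procid].push_back(it);
    }
    // A synchronous message must appear in both the source and target queues.
    void AddScom(int src, int trg, long len) {
        item it = { TRACE_SCOM, src, trg, len, 0 };
        q[src].push_back(it);
        q[trg].push_back(it);
    }

    // An empty queue means the processor has no traces left.
    trace GetTraceType(int procid) {
        return q[procid].empty() ? TRACE_END : q[procid].front().type;
    }
    void GetTraceData(int procid, int &src, int &trg, long &len) {
        assert(GetTraceType(procid) == TRACE_SCOM);
        const item &it = q[procid].front();
        src = it.src; trg = it.trg; len = it.len;
    }
    void GetTraceData(int procid, long &time) {
        assert(GetTraceType(procid) == TRACE_IDLE);
        time = q[procid].front().time;
    }
    // Drops the current trace; the first trace is ready without this call.
    trace FetchNextTrace(int procid) {
        if (!q[procid].empty()) q[procid].pop_front();
        return GetTraceType(procid);
    }
    void print(std::ostream &os) {
        for (std::size_t p = 0; p < q.size(); ++p)
            os << "proc " << p << ": " << q[p].size() << " pending traces\n";
    }
};
```

With this sketch, calling AddIdle(0, 50) followed by AddScom(0, 1, 4096) leaves processor 0 with an idle trace and then a communication trace, while processor 1 starts directly on the communication trace and would block until processor 0 reaches it.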


 


Network Configuration

The netsys class provides all the system-related configuration information to the evaluation engine. It is essential that the class member functions are defined correctly, since the accuracy of the predictions depends on them.

The netsys abstract class is defined as:

class netsys {
public:
	// Message routing
	virtual void Routing(int,int,unsigned char(*)[2]) = 0;
	// Specifies background load
	virtual float (*GetBgrLoad(long))[2] = 0;

	// Communication cost
	virtual long Tcom(int,float) = 0;
	// Inverse communication cost
	virtual float T2Pack(long,int,float,float,long) = 0;

	// Return number of links
	virtual int GetLinkNo(void) = 0;
	// Return number of processors in the system
	virtual int GetNproc(void) = 0;
	// Return size of packet in bytes
	virtual int PacketSize(void) = 0;
	// Return name of system
	virtual const char* Name(void) = 0;
};

The netsys member functions can be organized into the following groups:

Message routing and link load

CCMOD uses the notion of a routing array (RA) to represent the network topology and the routing of messages. An RA records the communication channels (links) that a message travels through to reach its destination (target processor). It is a two-dimensional array; the columns correspond to the links of the system and the rows correspond to the message direction. Each array element is either 1 (when the message traverses a link in the specified direction) or 0.

void Routing(int srcp,int trgp,unsigned char(*ra)[2])

This function fills the routing array (ra) given the source and target processors (srcp and trgp). The array is filled by emulating the routing algorithm of the system.
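
As an illustration of filling a routing array, the sketch below assumes a hypothetical star topology in which every processor is attached to one central switch by its own full-duplex link. The function name StarRouting, the constant kNproc, the direction convention, and the choice that a message to oneself uses no link are all assumptions made for the example. Following the parameter type unsigned char(*)[2], the array is indexed here as ra[link][direction].

```cpp
#include <cassert>
#include <cstring>

// Hypothetical topology: kNproc processors, each connected to a central
// switch by its own link. Link i belongs to processor i.
const int kNproc = 4;   // illustrative system size, equal to the link count

// Assumed direction convention for the second index of the routing array.
enum { DIR_UP = 0,      // processor -> switch
       DIR_DOWN = 1 };  // switch -> processor

// Fills ra with 1 for every (link, direction) pair the message traverses.
void StarRouting(int srcp, int trgp, unsigned char (*ra)[2])
{
    assert(srcp >= 0 && srcp < kNproc);
    assert(trgp >= 0 && trgp < kNproc);

    // Start from an empty routing array: no link is used.
    std::memset(ra, 0, sizeof(ra[0]) * kNproc);

    // Assumption: a message to the same processor never enters the network.
    if (srcp == trgp) return;

    ra[srcp][DIR_UP] = 1;     // leaves the source over its own link
    ra[trgp][DIR_DOWN] = 1;   // arrives at the target over the target's link
}
```

For a message from processor 0 to processor 3, only ra[0][DIR_UP] and ra[3][DIR_DOWN] are set; two messages contend only when they share one of these entries.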

float (*GetBgrLoad(long clock))[2]

CCMOD can take the performance implications of background loads into account. GetBgrLoad returns a two-dimensional array with the same layout as an RA. Its elements, however, are floats in the range 0 to 1 that give the fraction of each link direction taken up by background load: 0 means that there is no background load, and 1 means that the link is used exclusively by background load. The function argument clock is the current simulated clock value. The function may either return a constant background array or vary it over time according to a statistical or stochastic process.
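
A time-varying background load could look like the sketch below. The function name SteppedBgrLoad, the link count, the period, and the load level are hypothetical values chosen for illustration; the sketch simply applies the same load to every link direction during the first half of each period and none during the second half.

```cpp
#include <cassert>

const int  kLinks  = 4;      // illustrative number of links
const long kPeriod = 1000;   // illustrative period, in model time units

// Returns a pointer to an array of kLinks rows, each holding the background
// load fraction (0 to 1) for the two directions of one link.
float (*SteppedBgrLoad(long clock))[2]
{
    // The array must outlive the call, so it is kept in static storage.
    static float load[kLinks][2];
    assert(clock >= 0);

    // 25% background load in the first half of each period, none afterwards.
    const float level = (clock % kPeriod) < kPeriod / 2 ? 0.25f : 0.0f;
    for (int l = 0; l < kLinks; ++l) {
        load[l][0] = level;
        load[l][1] = level;
    }
    return load;
}
```

Because the array is static, a later call overwrites the values returned by an earlier one; a caller that needs both should copy the first result.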

Communication Cost

The communication delay depends on the physical characteristics of the network, the protocol, the contention factor, and the length of the message. In some cases the number of hops between processors or network components (e.g. switches) also influences the communication cost. The communication delay of a single message traveling through a quiet network (without any link contention) can be determined by creating a regression model of the communication cost versus the message length and the number of hops. Measurements for a range of point-to-point communication scenarios are obtained by benchmarking and are then used to determine the regression parameters.

long Tcom(int hops,float packets)

Returns the communication delay of a message traveling through a quiet network, given the number of hops and the number of packets. The number of packets can be fractional, which assumes that the system supports variable-size packets, as the majority of modern networks do. If the network supports only fixed-size packets, round the number of packets up to the next integer and then perform the calculation.
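
A regression-based Tcom might take the shape below. This is only a sketch: the linear form and the three coefficients are hypothetical placeholders rather than measured values, and the function names LinearTcom and LinearTcomFixed are invented for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical regression coefficients, in arbitrary model time units.
// Real values would come from fitting benchmark measurements.
const double kStartup   = 20.0;  // fixed cost per message
const double kPerHop    = 2.0;   // additional cost per hop
const double kPerPacket = 10.0;  // additional cost per packet

// Quiet-network delay for a network with variable-size packets:
// delay = startup + per-hop * hops + per-packet * packets
long LinearTcom(int hops, float packets)
{
    assert(hops >= 0 && packets >= 0.0f);
    return std::lround(kStartup + kPerHop * hops + kPerPacket * packets);
}

// Variant for fixed-size packets: a partial packet costs as much as a
// full one, so the packet count is rounded up before the calculation.
long LinearTcomFixed(int hops, float packets)
{
    return LinearTcom(hops, std::ceil(packets));
}
```

With these placeholder coefficients, a 2.5-packet message over 2 hops costs 20 + 4 + 25 = 49 time units with variable-size packets and 20 + 4 + 30 = 54 with fixed-size packets.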

The initial calculation of a communication delay is based on the assumption that the traffic of the system will remain the same throughout the duration of the message. However, the communication traffic changes each time a new message enters the system or an existing message completes. Consequently, the status of the network might change many times during the lifetime of a message. CCMOD tracks the traffic state transitions using a number of modeling techniques. The period during which the network remains steady is called the Event Horizon (EH).

If the EH is shorter than the communication delay, the part of the message that has been transmitted during the EH has to be determined. A new communication delay is then calculated for the remaining message size and the new traffic. For this purpose, a further member function is required that returns the number of packets consumed for a given EH and number of hops (an inverse of Tcom):

float T2Pack(long eh,int hops,float pt,float tr,long tc)

Returns the number of packets consumed, given the duration of the event horizon (eh), the number of message hops (hops), the total number of packets in the message (pt), the packets yet to be consumed (tr), and the time spent on the message communication so far (tc).

Care should be taken not to charge the initial start-up costs more than once for messages whose communication spans more than one EH. This can be achieved by using two separate Tcom() models, one for the initialization of the message and another for subsequent traffic states.
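
The sketch below shows one possible inverse for a linear cost model of the form startup + per-hop * hops + per-packet * packets. The coefficients and the name LinearT2Pack are hypothetical. Instead of the two separate models suggested above, this variant uses the time already spent (tc) to work out how much of the start-up cost is still outstanding, which has the same effect of charging it only once.

```cpp
#include <cassert>

// Hypothetical regression coefficients, in arbitrary model time units.
const double kStartup   = 20.0;  // fixed cost per message
const double kPerHop    = 2.0;   // additional cost per hop
const double kPerPacket = 10.0;  // additional cost per packet

// Number of packets consumed during an event horizon of length eh.
float LinearT2Pack(long eh, int hops, float pt, float tr, long tc)
{
    assert(eh >= 0 && hops >= 0 && tc >= 0);
    assert(tr >= 0.0f && tr <= pt);   // cannot have more left than the total

    // Start-up cost not yet paid for in earlier traffic states.
    double startup_left = kStartup + kPerHop * hops - tc;
    if (startup_left < 0.0) startup_left = 0.0;

    // Time in this horizon that is actually available for moving packets.
    double usable = eh - startup_left;
    if (usable < 0.0) usable = 0.0;

    // Convert time to packets, never exceeding what is left to send.
    double consumed = usable / kPerPacket;
    if (consumed > tr) consumed = tr;
    return static_cast<float>(consumed);
}
```

As a check against the quiet-network cost of 49 time units for a 2.5-packet message over 2 hops: a first horizon of 34 units consumes 1 packet (24 units of start-up, 10 units of transfer), and a second horizon of 15 units with tc = 34 consumes the remaining 1.5 packets, for a total of 49 units.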

Status & Information

int GetLinkNo(void)
int GetNproc(void)
int PacketSize(void)
const char* Name(void)

These member functions provide information about the configuration of the system.



Output Trace Processing

The otrace class processes the detailed performance and network status information that the model generates during the evaluation.

The otrace abstract class is defined as:

class otrace {
public:
	// Signal from evaleng -> otrace
	enum otrace_sig {
		CONFIG,
		GO_INIT,
		GO_STARTEVENT,
		GO_END,
		CREATEEVENT,
		NEWSCOM,
		NEWPROC,
		SYSCONT,
		MSGCONT,
		COMMCOST,
		EVENTHORIZON,
		UPPR_PROC,
		UPPR_SCOM,
		UPPR_PCNS,
		UPPR_END
	};
	// Receive an output trace signal from evaleng
	virtual void RecvSignal(otrace_sig ...)=0; 
};
The class defines only one member function, RecvSignal, which receives signals from the evaluation engine. Each signal is sent at a specific moment during the evaluation and is followed by a number of arguments that provide additional information. The type and number of arguments depend on the signal type.

A simple example implementation of RecvSignal is shown below:

#include <stdarg.h>
// Central signal distribution function
void otrdeb::RecvSignal(otrace_sig s ...)
{
	va_list marker;
	va_start(marker,s);

	switch(s) {
	case GO_STARTEVENT:
		cerr << "[EVL] Event: " 
		<< va_arg(marker,long) << endl;
		break;
	case EVENTHORIZON:
		cerr << "[EHR] Event horizon (time): "
		<< va_arg(marker,long) << endl;
		break;
	default:
		// All other signals are ignored
		break;
	}

	// Release the argument list before returning
	va_end(marker);
}

The sample function processes two types of signal and ignores everything else. In the first case statement (GO_STARTEVENT) the current event number is printed to stderr. The second case (EVENTHORIZON) prints the duration of the current traffic state.
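
Signals can also be accumulated rather than printed. The sketch below is a hypothetical otrace derivation (the class name eh_stats and its members are invented for the example) that gathers statistics about event horizon durations. The otrace class is reproduced from the listing above so that the sketch compiles on its own, and the argument types follow the signal table: long for EVENTHORIZON and int for GO_END.

```cpp
#include <cassert>
#include <cstdarg>

// Abstract interface, reproduced from otrace.h so the sketch is self-contained.
class otrace {
public:
    enum otrace_sig {
        CONFIG, GO_INIT, GO_STARTEVENT, GO_END, CREATEEVENT,
        NEWSCOM, NEWPROC, SYSCONT, MSGCONT, COMMCOST, EVENTHORIZON,
        UPPR_PROC, UPPR_SCOM, UPPR_PCNS, UPPR_END
    };
    virtual void RecvSignal(otrace_sig, ...) = 0;
};

// Hypothetical derived class: collects event horizon statistics.
class eh_stats : public otrace {
public:
    long count;     // number of traffic states seen
    long total;     // sum of all event horizon durations
    long longest;   // longest single event horizon
    int  events;    // number of events reported at the end of evaluation

    eh_stats() : count(0), total(0), longest(0), events(0) {}

    void RecvSignal(otrace_sig s, ...) {
        va_list marker;
        va_start(marker, s);
        switch (s) {
        case EVENTHORIZON: {
            // One long argument: duration of the current traffic state.
            const long evhr = va_arg(marker, long);
            ++count;
            total += evhr;
            if (evhr > longest) longest = evhr;
            break;
        }
        case GO_END:
            // One int argument: number of events processed.
            events = va_arg(marker, int);
            break;
        default:
            // Arguments of all other signals are left unread.
            break;
        }
        va_end(marker);
    }

    // Mean duration of a traffic state, or 0 if none were reported.
    double MeanHorizon() const {
        return count ? static_cast<double>(total) / count : 0.0;
    }
};
```

Since the arguments are untyped at the call site, each case must read exactly the types that the engine passes for that signal; reading a long where an int was passed is only safe on platforms where the two have the same size.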

The following table describes each signal, the point at which it is generated, and the arguments that accompany it.

CONFIG Before evaluation process starts
  • int nproc - number of processors
  • int nlink - number of links
  • workload* w - user defined workload class
  • proc* p - array of processor status class
GO_INIT Before first event
  • char* sysname - name of system
GO_STARTEVENT Start of event
  • int event - current event number
GO_END End of evaluation
  • int event - number of events processed
CREATEEVENT After the processing of traces and creation of new events
  • int newev - number of new events created at current cycle
  • int oldev - number of events carried forward from previous cycle
NEWSCOM After processing of new communication trace
  • int procid - Source processor id
NEWPROC After processing of new idle trace
  • int procid - Processor id
  • ulong time - Duration of idle period
SYSCONT After system contention factors have been determined
  • float (*scont)[2] - The effective bandwidth of each link direction taking into account message traffic and background load
MSGCONT After calculation of message contention factor
  • int procid - Source processor id
  • float msgc - The effective bandwidth of the link that is the bottleneck for a message
COMMCOST After calculation of message communication cost taking into account contention and background load
  • int procid - Source processor id
  • int npack - Number of packets in message
  • int hops - Number of message hops
  • ulong cost - Communication cost
EVENTHORIZON After calculation of duration of current traffic state
  • long evhr - Duration of event horizon
UPPR_PROC After processor status has been updated for a traffic state cycle while it is in the idle state
  • int procid - Processor id
UPPR_SCOM After processor status has been updated for a traffic state cycle while it is in the communication state
  • int srcp - Source processor id
  • int trgp - Target processor id
UPPR_PCNS After calculation of packets consumed during traffic state cycle
  • int srcp - Source processor id
  • int trgp - Target processor id
  • float npacks - Number of packets consumed
UPPR_END After completion of traffic state cycle
  • ulong t - The max value of all processor clocks

 


 


 

Stathis Papaefstathiou (efp@microsoft.com)
Copyright (c) 2000 Microsoft Research Ltd. All rights reserved.