A Lightweight C++ Interface to MPI
Simone Pellegrini, Radu Prodan, Thomas Fahringer
Institute of Computer Science, University of Innsbruck
Technikerstr. 21A, 6020 Innsbruck, Austria
Abstract
The Message Passing Interface (MPI) provides bindings for the three programming languages commonly used in High Performance Computing (HPC): C, C++ and Fortran. Unfortunately, MPI supports only the lowest common denominator of the three languages, providing a level of abstraction far lower than that of typical C++ libraries. Lately, after the decision of the MPI committee to deprecate and remove the C++ bindings from the MPI standard, programmers are forced either to use the C API or to rely on third-party libraries.
In this paper we present a lightweight, header-only C++ interface to MPI which uses object-oriented and generic programming concepts to improve its integration into the C++ programming language. We compare our wrapper with a related approach called Boost.MPI, showing how MPP facilitates the interaction with C++ objects. Performance-wise, MPP outperforms Boost.MPI by reducing the interface overhead by a factor of eight. Additionally, MPP's handling of user-defined data types allows transferring STL containers (e.g. std::list) up to 20 times faster than Boost.MPI, which relies on software serialization, for small linked lists.
1 Introduction
MPI is the de facto standard for writing parallel programs for distributed memory systems. As its focus is on High Performance Computing (HPC), MPI offers an Application Programming Interface (API) for C, C++ and Fortran, the most widely used languages in HPC. Unfortunately, since the definition of the first standard in 1994 [3], MPI has not kept pace with the evolution of the underlying languages, such as object-oriented programming in Fortran 2000 and templates in C++. Nowadays, this problem is mostly perceived in C++ which, unlike Fortran and C, provides much higher-level abstractions that are not reflected in the design of the MPI interface [6]. MPI is so poorly integrated into the C++ environment that many programmers prefer to use the C interface even in C++ programs. Furthermore, to map common C++ constructs onto MPI, programmers are forced to weaken the language's type safety. As a consequence, errors that could easily be detected by the compiler are no longer captured, leading to runtime failures. These issues led the MPI committee to the decision of deprecating the C++ bindings in version 2.2 of the MPI standard. However, because of the growing interest in and use of C++ in HPC, several third-party wrappers to MPI have been proposed [11], the most important being Boost.MPI [8] and OOMPI [9].
Figure 1 shows a simple MPI program sending two floating point values from process rank 0 to rank 1. A problem of this code snippet is that the programmer is forced to unnecessarily declare a temporary variable val to store the values being sent by MPI_Send (line 4). Although the C99 standard [1] introduced compound literals to avoid such unnecessary memory allocations (line 2), they are not widely used because of the decreased code readability. Because the compiler is not aware of the semantics of MPI_Send, which guarantees that val's value is not modified, no memory optimizations can be performed. A second problem is that the signature of all MPI routines requires the programmer to provide the size and the type (i.e. one MPI_FLOAT) of the data being sent, which is error-prone and can be avoided in C++ by inferring them at compile-time.

1 if ( rank == 0 ) {
2   MPI_Send((const int[1]) { 2 }, 1, MPI_INT, 1, 1,
3     MPI_COMM_WORLD);
4   std::array<float,2> val = {3.14f, 2.95f};
5   MPI_Send(&val.front(), val.size(), MPI_FLOAT, 1, 0,
6     MPI_COMM_WORLD);
7 } else if (rank == 1) {
8   int n;
9   MPI_Recv(&n, 1, MPI_INT, 0, 1, MPI_COMM_WORLD,
10    MPI_STATUS_IGNORE);
11  std::vector<float> values(n);
12  MPI_Recv(&values.front(), n, MPI_FLOAT, 0, 0,
13    MPI_COMM_WORLD, MPI_STATUS_IGNORE);
14 }
Figure 1. Simple MPI program using the C bindings.

Boost.MPI [8] tries to simplify the MPI interface by deducing several of those parameters at compile-time through C++ template techniques. For example, the size of the data being sent and its associated MPI_Datatype are strictly related to the type of the object being sent and are, therefore, deducible at compile-time from the C++ type system. The send and recv routines in Boost.MPI require only three parameters, as shown in Figure 2 (lines 2, 3, 6, and 8): the source/destination rank, the message tag, and the message content. This not only simplifies the usage of the routines, but also improves their safety. Although Boost.MPI is a considerable improvement over the standard MPI C++ bindings, it is not widely accepted within the MPI community for two main reasons: (i) the dependency on the Boost C++ library and the accompanying licensing issues; (ii) the use of a serialization library [10] to handle the transmission of user-defined data types (i.e. the merging of objects with a sparse memory representation into a contiguous data chunk), which negatively impacts performance.
1 if ( world.rank() == 0 ) {
2   world.send( 1, 1, 2 );
3   world.send( 1, 0, std::array<float,2>({3.14f, 2.95f}) );
4 } else if (world.rank() == 1) {
5   int n;
6   world.recv(0, 1, n);
7   std::vector<float> values(n);
8   world.recv(0, 0, values);
9 }
Figure 2. Boost.MPI version of the program from Figure 1.
An object-oriented approach to improving the C++ MPI interface is OOMPI [9], which specifies send and receive operations in a more user-friendly way by overloading the C++ insertion << and extraction >> operators. In OOMPI, a Port towards a process rank is obtained by using the array subscript operator [] on a communicator object (see line 2 in Figure 3). A further advantage is the convenience of combining these operators in one C++ statement when inserting or extracting data to/from the same stream. A drawback of OOMPI is the poor integration of arrays and user data types in general. For example, sending an array instance requires the programmer to explicitly instantiate an object of class OOMPI_Array_message, which requires the size and type of the data to be specified manually, as in the current MPI specification (line 4). The support for generic user data types requires the objects being sent to inherit from the OOMPI_User_type interface. This is a rather severe limitation, as it does not allow any legacy class (e.g. the STL containers) to be supported directly.
1 if ( OOMPI_COMM_WORLD.rank() == 0 ) {
2   OOMPI_COMM_WORLD[1] << 2;
3   std::array<float,2> val = {3.14f, 2.95f};
4   OOMPI_COMM_WORLD[1] <<
5     OOMPI_Array_message(&val.front(), val.size());
6 } else if (OOMPI_COMM_WORLD.rank() == 1) {
7   int n;
8   OOMPI_COMM_WORLD[0] >> n;
9   std::vector<float> values(n);
10  OOMPI_COMM_WORLD[0] >>
11    OOMPI_Array_message(&values.front(), n);
12 }
Figure 3. OOMPI version of the program from Figure 1.
In this paper, we combine some of the concepts presented in Boost.MPI and OOMPI and propose an advanced lightweight MPI C++ interface called MPP that aims at transparently integrating the message passing paradigm into the C++ programming language without sacrificing performance. Our approach focuses on point-to-point communication and on the integration of user data types which, unlike Boost.MPI, relies entirely on native MPI_Datatypes for better performance. Our interface also utilizes advanced concepts from other parallel programming languages, such as future objects [5], which simplify the use of MPI's asynchronous routines.
Overall, MPP is designed with a specific focus on performance. As we target HPC systems, we understand how critical performance is and have spent significant effort on reducing the interface overhead. We compare the performance of MPP with Boost.MPI and show that, for a simple ping-pong application, MPP achieves a four times higher throughput (in terms of messages per second). Compared to the pure C bindings, MPP increases latency by only 9%. As far as the handling of user data types is concerned, MPP is able to reduce the transfer time of a linked list (i.e. std::list<T> from the C++ STL) by up to a factor of 20 compared to Boost.MPI. To determine the real benefit of using MPP for real applications, we rewrote the computational kernel of QUAD_MPI [7] to use Boost.MPI and MPP. The obtained results show a performance improvement of around 12% compared to Boost.MPI.
The rest of the paper is organized as follows. In Section 2 we introduce MPP as a lightweight C++ wrapper to MPI using small code snippets. In Section 3 we compare our library against Boost.MPI and a plain MPI implementation using two micro-benchmark codes and an application code called QUAD_MPI. Section 4 concludes the paper.
2 MPP: C++ Interface to MPI
We use object-oriented programming concepts and C++ templates to design a lightweight wrapper for MPI routines that simplifies the way in which MPI programs are written. Similar to Boost.MPI, we achieve this goal by reducing the amount of information required by MPI routines and by inferring as much as possible at compile-time. By reducing the amount of code written by the user, we expect fewer programming errors. Furthermore, by making type checking stricter, the most common programming mistakes can be captured at compile-time. In this work, we focus on point-to-point operations, as the specialised semantics of collective operations has no counterpart in the C++ STL. We also present a generic mechanism for handling C++ user data types which allows for the easy transfer of C++ objects to any existing MPI routine (including collective operations).
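To illustrate the kind of mistake that stricter typing rules out, consider the following contrast (a minimal sketch; the MPP call uses the stream syntax introduced in Section 2.1):

double x = 3.14;
// C bindings: the MPI type tag is decoupled from the buffer type; this
// compiles cleanly but transfers garbage, failing (if at all) only at runtime.
MPI_Send(&x, 1, MPI_FLOAT, /*dest*/ 1, /*tag*/ 0, MPI_COMM_WORLD);
// MPP: element type and count are inferred from x itself at compile-time,
// so no mismatching type tag can be supplied.
mpi::comm::world(1) << x;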
2.1 Point-to-Point Communication
While Boost.MPI maintains in its API design the style of the traditional send/receive MPI routines, our approach is closer to OOMPI, aiming at a better C++ integration by defining these basic operations in terms of streams. A stream is an abstraction that represents a device on which input and output operations are performed. Sending or receiving a message through an MPI channel can therefore be seen as a stream operation. We introduce an mpi::endpoint class which has the semantics of a bidirectional stream from which data can be read (received) or written (sent) using the << and >> operators. The concept of endpoints is similar to the Port abstraction of OOMPI; however, because our mechanism is based on generic programming, user-defined data types can be handled transparently. In contrast, OOMPI is based on inheritance, which forces the programmer to instantiate an OOMPI_Message class containing the data type and size required by the underlying MPI routines [11] (see line 4 in Figure 3).
Because an MPI send/receive operation offers more capabilities than C++ streams (e.g. message tags, non-blocking semantics), endpoints cannot be directly modelled using an "is-a" relationship. Fortunately, the STL's utilities (e.g. algorithms) are mostly based on templates, and endpoints can be passed to any generic function which relies on the << or >> stream operations. Figure 4 shows an example that uses an endpoint as argument to a generic read_from function. An endpoint is generated from a communicator using the () operator, to which the process rank is passed (line 3). The mpi::comm class is a simple wrapper for an MPI communicator with the capability of creating endpoints and retrieving the current process rank and the communicator size. The mpi::comm::world object is an instance of the comm class which wraps the MPI_COMM_WORLD communicator.
1 namespace mpi {
2   struct comm {
3     mpi::endpoint operator()(int) const;
4   };
5 } // end mpi namespace
6
7 template <class InStream, class T>
8 void read_from(InStream& in, T& val) {
9   in >> val;
10 }
11
12 int val[2];
13 // reads the first element of the val array from std::cin
14 read_from(std::cin, val[0]);
15
16 // receives 2nd element of val array from rank 1
17 read_from(mpi::comm::world(1), val[1]);
Figure 4. Example of usage of endpoints in a generic function.
Figure 5 shows how the program in Figure 1 can be rewritten with MPP. First of all, the objects are either sent or received using stream operations, which allows for more compact code compared to the C MPI bindings (half the size) or to Boost.MPI. Secondly, objects are automatically wrapped by a generic mpi::msg<T> object, which does not need to be specified by the user (as opposed to OOMPI). Adding this level of indirection allows MPP to handle both primitive and user data types in a way transparent to the user. R-values (i.e. values with no address, such as constants) are handled like any regular L-value (e.g. a variable) using C++ constant references via the msg class, which avoids unnecessary memory allocations. The interface also allows message tags to be specified by manually constructing the message wrapper (see line 3 of Figure 5).
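A hedged sketch of the wrapper idea described above (the actual MPP definition may differ):

// msg<T> couples a constant reference to the payload with a message tag;
// holding a const reference lets r-values be sent without any extra memory
// allocation, provided the wrapper is consumed within the same statement.
template <class T>
struct msg {
    const T& value;  // payload (possibly a temporary)
    int tag;         // MPI message tag, 0 by default
    explicit msg(const T& v, int t = 0) : value(v), tag(t) {}
};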
1 using namespace mpi;
2 if ( comm::world.rank() == 0 ) {
3   comm::world(1) << msg(2, 1);
4   comm::world(1) << std::array<float,2>({3.14f, 2.95f});
5 } else if (comm::world.rank() == 1) {
6   int n;
7   comm::world(0) >> msg(n, 1);
8   std::vector<float> values(n);
9   comm::world(0) >> values;
10 }
Figure 5. MPP version of the program from Figure 1.

MPP also supports non-blocking semantics for the send and receive operations through the overloaded < and > operators. Unlike blocking sends/receives, asynchronous operations return a future object [5] of class mpi::request<T> which can be polled to test whether the pending operation has completed. An example of non-blocking operations in MPP is shown in Figure 6. For non-blocking receives, the method T& get() waits for the underlying operation to complete (line 2) and, upon completion, returns a reference to the received value. The mpi::request<T> class also provides a void wait() and a bool test() method implementing the semantics of MPI_Wait and MPI_Test, respectively. The example also shows MPP's support for receive operations that listen for messages coming from an unknown process, using the mpi::any constant rank when creating an endpoint (line 3).
1 float real;
2 mpi::request<float>&& req =
3 mpi::comm::world(mpi::any) > real;
4 // ... do something else ...
5 use( req.get() );
Figure 6. Non-blocking MPP endpoints.
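A request obtained this way can also be polled instead of blocked on, e.g. (a sketch; do_other_work() and use() are hypothetical placeholders):

mpi::request<float>&& req = mpi::comm::world(mpi::any) > real;
while ( !req.test() )   // bool test(): MPI_Test semantics
    do_other_work();    // overlap communication with computation
use( req.get() );       // operation completed: get() returns immediately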
Errors, which every MPI routine reports as an error code, are handled in MPP via C++ exceptions. Any call to an MPP routine can potentially throw an exception derived from mpi::exception. The method get_error_code() of this class allows the retrieval of the native error code.
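A call site might therefore look as follows (a minimal sketch; only mpi::exception and get_error_code() are taken from the interface described above, the error handling itself is illustrative):

try {
    mpi::comm::world(1) << 42;
} catch (const mpi::exception& e) {
    // recover the native MPI error code
    std::cerr << "MPI call failed, error code: " << e.get_error_code() << std::endl;
}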
2.2 User Data Types
OOMPI is one of the first APIs trying to introduce support for user data types, through inheritance from an OOMPI_User_type class. Unfortunately, this mechanism is relatively weak because, by relying on inheritance, it does not allow the handling of class instances provided by third-party libraries (e.g. STL containers). Another attempt is the use of serialization in Boost.MPI which, although elegant, introduces a high runtime overhead. The objective of MPP is to reach the same level of integration with user data types as Boost.MPI without the performance loss, which we achieve by relying on the existing MPI support for user data types, i.e. MPI_Datatype. The definition of an MPI_Datatype is rather cumbersome and therefore not commonly used. Indeed, defining an MPI_Datatype requires the programmer to specify several pieces of information related to its memory layout, which often leads to programming errors that are very difficult to debug. However, because operations on data types are mapped to DMA transfers by the MPI library, the use of an MPI_Datatype outperforms any other technique based on software serialization.
The integration of user data types is achieved by using a design pattern called type traits [4]. An example is illustrated in Figure 7 for the C++ STL std::vector<T> class. We let the user specialize a class which statically provides the compiler with the three pieces of information required to map a user data type to MPI_Datatypes:
1. the memory address at which the data type instance begins;
2. the type of each element;
3. the number of elements.

1 template <class T>
2 struct mpi_type_traits<std::vector<T>> {
3   static inline const T*
4   get_addr( const std::vector<T>& vec ) {
5     return mpi_type_traits<T>::get_addr( vec.front() );
6   }
7   static inline size_t
8   get_size( const std::vector<T>& vec ) {
9     return vec.size();
10  }
11  static inline MPI_Datatype
12  get_type( const std::vector<T>& ) {
13    return mpi_type_traits<T>::get_type( T() );
14  }
15 };
16 ...
17 typedef mpi_type_traits<vector<int>> vect_traits;
18 vector<int> v = { 2, 3, 5, 7, 11, 13, 17, 19 };
19 MPI_Ssend( vect_traits::get_addr(v),
20   vect_traits::get_size(v),
21   vect_traits::get_type(v), ... );
Figure 7. Example of using mpi_type_traits to handle STL vectors.

Because a C++ vector is contiguously allocated in memory, the starting address of its first element has to be recursively computed to handle generic regular nested types (e.g. vector<array<float,10>>, lines 3-6). The length is the number of elements present in the vector (line 9) and the type is the data type of a vector element (lines 11-14). Because our mechanism is not based on inheritance (as in OOMPI), it is open for integration and use with third-party class libraries. Lines 17-21 show how the introduced type traits can be used with the MPI C bindings. This method can also be used for collective operations or for one of the several flavors of MPI_Send for which an
appropriate operator cannot be defined. MPP also provides several type traits for the STL containers vector, array, and list.
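For illustration, a specialization for std::array in the spirit of the vector traits of Figure 7 might look as follows (a sketch; MPP's actual implementation may differ):

template <class T, size_t N>
struct mpi_type_traits<std::array<T,N>> {
    // the payload starts at the (recursively computed) address of the first element
    static inline const T* get_addr( const std::array<T,N>& a ) {
        return mpi_type_traits<T>::get_addr( a.front() );
    }
    // the element count is fixed by the type itself
    static inline size_t get_size( const std::array<T,N>& ) { return N; }
    // the element type is delegated to the inner traits
    static inline MPI_Datatype get_type( const std::array<T,N>& ) {
        return mpi_type_traits<T>::get_type( T() );
    }
};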
3 Performance Evaluation
In this section we compare the performance of MPP against Boost.MPI and the standard C bindings of MPI. We used Open MPI version 1.4.2 to execute the experiments. We did not consider OOMPI for the performance evaluation since its development stopped several years ago. We first compared the MPI bindings using micro-benchmarks and then using a real MPI application called QUAD_MPI, a C++ program that approximates an integral based on a quadrature rule [7].
3.1 Micro Benchmarks
The purpose of the first experiment is to measure the latency overhead introduced by MPP over the standard C interface to MPI, compared to Boost.MPI. We implemented a simple ping-pong application which we executed on a shared memory machine with a single AMD Phenom II X2 555 dual-core processor (3.5 GHz, 1 MB of L2 cache, and 6 MB of L3 cache). This way, any data transmission overhead is minimized and the focus is solely on the interface overhead. Figure 8(a) displays the number of ping-pong operations per second for varying message sizes. MPP has approximately 9% higher latency for small messages compared to the native MPI routines. This overhead is due to the creation of a temporary status object corresponding to the MPI_Status returned by the MPI receive routine, containing the message source, size, tag, and error (if any). Compared to Boost.MPI, MPP nevertheless shows a consistent performance improvement of around 75% for small message sizes. Because both implementations use plain vectors to store the exchanged message, no serialization is involved that could explain the overhead difference. We believe that the main reason for this overhead is that Boost.MPI is implemented as a library, so every call to an MPI routine pays the overhead of an additional function call. We avoided this problem in MPP by designing a purely header-based implementation, which allows all MPP routines to be inlined by the compiler, thus eliminating this overhead. The graph also illustrates that, as expected, the overhead decreases for larger messages as the communication time becomes predominant.

Figure 8. MPP performance evaluation results: (a) number of ping/pong operations per second; (b) comparison of Boost.MPI and MPP for the STL linked list (std::list<T>).
In the second experiment, we compared MPP with Boost.MPI for the support of user-defined data types. We used a std::list<double> of varying size exchanged between two processes in a loop repeated one thousand times. We executed the experiment on an IBM blade cluster with quad-core Intel Xeon X5570 processors interconnected through an InfiniBand network. We allocated the two MPI processes on different blades in order to simulate a realistic use case. Figure 8(b) shows the time necessary to perform this micro-benchmark for different list sizes and the

1 double my_a, my_b;
2 my_total = 0.0;
3 if ( rank == 0 ) {
4   for ( unsigned q = 1; q < p; ++q ) {
5     my_a = ( ( p - q ) * a + ( q - 1 ) * b ) / ( p - 1 );
6     MPI_Send ( &my_a, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD );
7
8     my_b = ( ( p - q - 1 ) * a + ( q ) * b ) / ( p - 1 );
9     MPI_Send ( &my_b, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD );
10  }
11 } else {
12   MPI_Recv ( &my_a, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status );
13   MPI_Recv ( &my_b, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status );
14
15   for ( unsigned i = 1; i <= my_n; ++i ) {
16     x = ((my_n - i) * my_a + (i - 1) * my_b) / (my_n - 1);
17     my_total = my_total + f ( x );
18   }
19   my_total = (my_b - my_a) * my_total / (double) my_n;
20 }
Figure 9. Computational kernel of QUAD_MPI.
speedup achieved by MPP over Boost.MPI. For small lists of 100 elements the speedup is approximately 20; however, the performance gap closes as the list size increases. The reason lies in MPP's std::list implementation, which uses MPI_Type_struct and therefore requires enumerating all the memory addresses that compose the object being sent. To create an MPI_Datatype for a linked list, three arrays have to be provided:
• the displacement of each list element relative to the starting address;
• the size of each element;
• the data type of each element (i.e. O(3·N) of memory overhead in total; see the sketch below).
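As an illustration, the following sketch builds such a datatype for a std::list<double> with MPI_Type_create_struct (the MPI-2 successor of the MPI_Type_struct routine named above); it is a simplified reconstruction, not MPP's actual code:

std::list<double> l = ...;
int n = static_cast<int>(l.size());
std::vector<int>          blocklens(n, 1);       // one element per block
std::vector<MPI_Aint>     displs(n);             // displacement of each element
std::vector<MPI_Datatype> types(n, MPI_DOUBLE);  // type of each element
int i = 0;
for (const double& v : l)
    MPI_Get_address(const_cast<double*>(&v), &displs[i++]);
for (int k = n - 1; k >= 0; --k)
    displs[k] -= displs[0];                      // make displacements relative
MPI_Datatype dt;
MPI_Type_create_struct(n, blocklens.data(), displs.data(), types.data(), &dt);
MPI_Type_commit(&dt);
// the three O(N) arrays above are exactly the O(3·N) overhead noted in the list
MPI_Send(const_cast<double*>(&l.front()), 1, dt, /*dest*/ 1, /*tag*/ 0, MPI_COMM_WORLD);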
We observe in Figure 8(b) that building such a data type becomes more expensive as the list size increases, so that for large linked lists of over 50,000 elements software serialization outperforms the MPI data typing mechanism. A future optimization could improve the support for large data structures by integrating into MPP a mechanism that switches from MPI_Datatype to serialization beyond a critical size, as sketched below.
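A sketch of such a switching mechanism (the threshold value, send_with_datatype() and serialize_send() are all hypothetical):

template <class T>
void send_list( const std::list<T>& l, int dest, int tag ) {
    static const size_t critical_size = 50000;  // crossover suggested by Figure 8(b)
    if ( l.size() < critical_size )
        send_with_datatype( l, dest, tag );     // MPI_Datatype path
    else
        serialize_send( l, dest, tag );         // software serialization path
}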
3.2 QUAD_MPI Application Code
The micro-benchmarks highlighted the low latency of the MPP bindings; however, this does not say much about the benefits of using MPP in real application codes.
1 my_total = 0.0;
2 if ( rank == 0 ) {
3   for ( unsigned q = 1; q < p; ++q ) {
4     world.send(q, 1, (( p - q ) * a + ( q - 1 ) * b) / ( p - 1 ));
5     world.send(q, 2, (( p - q - 1 ) * a + ( q ) * b) / ( p - 1 ));
6   }
7 } else {
8   double my_a, my_b;
9   world.recv(0, 1, my_a);
10  world.recv(0, 2, my_b);
11
12  for ( unsigned i = 1; i <= my_n; ++i ) {
13    x = ((my_n - i) * my_a + (i - 1) * my_b) / (my_n - 1);
14    my_total = my_total + f ( x );
15  }
16  my_total = (my_b - my_a) * my_total / (double) my_n;
17 }
Figure 10. Computational kernel of QUAD_MPI rewritten using Boost.MPI.
For this purpose we took a simple MPI application kernel called QUAD_MPI and rewrote it using Boost.MPI and MPP. QUAD_MPI is a C program which approximates an integral using a quadrature rule [7] and can be efficiently parallelized using MPI. From the original code [7], we extracted the computational kernel depicted in Figure 9. The process with rank 0 assigns to every other process a sub-interval of [A, B], and these bounds are then communicated using message passing routines. The number of communication statements in the code is limited, i.e. 2·(P−1), where P is the number of processes. Therefore, this code represents a good balance between communication and computation, making it a good choice for determining the benefits of the MPP bindings.
This QUAD_MPI kernel can be easily rewritten using Boost.MPI and MPP, as shown respectively in Figures 10 and 11. In both cases, we removed the necessity of assigning the value being sent to the my_a and my_b variables, because both Boost.MPI and MPP support sending R-values that are computed and directly sent to the destination (lines 4 and 5). The code at the receiver side is similar, the only difference being that we can now restrict the scope of the my_a and my_b variables to the else body only (lines 9 and 10), which allows faster machine code generation as the compiler can utilize the CPU registers more efficiently. Additionally, MPP allows for a further reduction of the code, as shown in Figure 11, since the two sends (line 4) and the two receives (line 9) can be combined into single statements. MPP also relieves the programmer from the burden of specifying a message tag by using the tag 0 by default. With MPP we are able to shrink the input code by 30% (in terms of the number of characters), which reduces the chances of programming errors and increases overall productivity.

1 my_total = 0.0;
2 if ( rank == 0 ) {
3   for ( unsigned q = 1; q < p; ++q ) {
4     comm::world(q) << ((p - q) * a + (q - 1) * b) / (p - 1)
5                    << ((p - q - 1) * a + q * b) / (p - 1);
6   }
7 } else {
8   double my_a, my_b;
9   comm::world(0) >> my_a >> my_b;
10
11  for ( unsigned i = 1; i <= my_n; ++i ) {
12    x = ((my_n - i) * my_a + (i - 1) * my_b) / (my_n - 1);
13    my_total = my_total + f ( x );
14  }
15  my_total = (my_b - my_a) * my_total / (double) my_n;
16 }
Figure 11. Computational kernel of QUAD_MPI rewritten using MPP.
We ran the three versions of the QUAD_MPI kernel on a machine with 16 cores (a dual-socket Intel Xeon CPU) and used shared memory to minimize communication costs and highlight the library overhead. We compiled the input programs with optimizations enabled (i.e. the -O3 flag), repeated each experiment 10 times, and report the average and standard deviation of the execution time (see Figure 12).

Figure 12. QUAD_MPI performance comparison.

Because of the removal of the superfluous assignment operations to the my_a and my_b variables, the MPP version performs slightly faster than the original code. It is worth noting that, although the same optimization has been applied to the Boost.MPI version, the large overhead of Boost.MPI cancels any benefit, making the resulting code the slowest of all three. Compared to Boost.MPI, the MPP version shows a performance improvement of around 12%.
4 Conclusions
In this paper we presented MPP, an advanced C++ interface to MPI. We combined some of the ideas of OOMPI and Boost.MPI into a lightweight, header-only interface smoothly integrated with the C++ environment. We introduced a transparent mechanism for dealing with user data types which, for small objects, is up to 20 times faster than Boost.MPI due to the use of MPI_Datatypes instead of software serialization. We showed that programs written using MPP are more compact than with the MPI C bindings and that the overhead introduced by the object-oriented design
is negligible. Furthermore, MPP can avoid common programming errors in two ways:
1. through its interface design that uses future objects to
avoid reading the buffer of an asynchronous receive
before data has been written;
2. by automatically inferring most of the input arguments
required by MPI routines.
The MPP interface is freely available at [2].
In the future we intend to extend the interface to support easier use of other complex MPI features such as dynamic process management, operations on communicators and groups, and the creation of process topologies.
5 Acknowledgments
This research has been partially funded by the Austrian Research Promotion Agency (FFG) under grant P7030-025-011 and by the Tiroler Zukunftsstiftung under the Translational Research Grant "Parallel Computing with Java for Manycore Computers".
References
[1] C99 standard. www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
[2] MPI C++ Interface. https://github.com/motonacciu/mpp
[3] The MPI-1 Specification. http://www.mpi-forum.org/docs/docs.html
[4] A. Alexandrescu. Traits: The else-if-then of types. C++ Report, pages 22-25, 2000. http://erdani.com/publications/traits.html
[5] H. C. Baker, Jr. and C. Hewitt. The incremental garbage collection of processes. In Proceedings of the 1977 Symposium on Artificial Intelligence and Programming Languages, pages 55-59, New York, NY, USA, 1977. ACM.
[6] J. M. Squyres, B. Saphir, and A. Lumsdaine. The design and evolution of the MPI-2 C++ interface. In Proceedings of the 1997 International Conference on Scientific Computing in Object-Oriented Parallel Computing, Lecture Notes in Computer Science. Springer-Verlag, 1997.
[7] J. Burkardt. QUAD_MPI. http://people.sc.fsu.edu/~jburkardt/c_src/quad_mpi/quad_mpi.html
[8] P. Kambadur, D. Gregor, A. Lumsdaine, and A. Dharurkar. Modernizing the C++ interface to MPI. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, pages 266-274. Springer Berlin/Heidelberg, 2006.
[9] B. C. McCandless, J. M. Squyres, and A. Lumsdaine. Object-Oriented MPI (OOMPI): A class library for the Message Passing Interface. In Proceedings of the Second MPI Developers Conference, pages 87-, Washington, DC, USA, 1996. IEEE Computer Society.
[10] R. Ramsey. Boost serialization library. www.boost.org/doc/libs/release/libs/serialization/
[11] A. Skjellum, D. G. Wooley, A. Lumsdaine, Z. Lu, M. Wolf, J. M. Squyres, B. McCandless, and P. V. Bangalore. Object-oriented analysis and design of the Message Passing Interface, 1998.

More Related Content

Similar to A Lightweight C++ Header-Only MPI Interface for Improved Performance (MPP

Unit 1 of c++ part 1 basic introduction
Unit 1 of c++ part 1 basic introductionUnit 1 of c++ part 1 basic introduction
Unit 1 of c++ part 1 basic introductionAKR Education
Ā 
Advanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCAdvanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCIJSRD
Ā 
cs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdfcs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdfssuserada6a9
Ā 
C notes by m v b reddy(gitam)imp notes all units notes 5 unit order
C notes by m v b  reddy(gitam)imp  notes  all units notes  5 unit orderC notes by m v b  reddy(gitam)imp  notes  all units notes  5 unit order
C notes by m v b reddy(gitam)imp notes all units notes 5 unit orderMalikireddy Bramhananda Reddy
Ā 
Csc1100 lecture01 ch01 pt2-paradigm (1)
Csc1100 lecture01 ch01 pt2-paradigm (1)Csc1100 lecture01 ch01 pt2-paradigm (1)
Csc1100 lecture01 ch01 pt2-paradigm (1)IIUM
Ā 
Csc1100 lecture01 ch01 pt2-paradigm
Csc1100 lecture01 ch01 pt2-paradigmCsc1100 lecture01 ch01 pt2-paradigm
Csc1100 lecture01 ch01 pt2-paradigmIIUM
Ā 
Principal of objected oriented programming
Principal of objected oriented programming Principal of objected oriented programming
Principal of objected oriented programming Rokonuzzaman Rony
Ā 
Binary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programmingBinary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programmingnong_dan
Ā 
Sc13 comex poster
Sc13 comex posterSc13 comex poster
Sc13 comex posterhjjvandam
Ā 
Programming in c++
Programming in c++Programming in c++
Programming in c++MalarMohana
Ā 
Programming in c++
Programming in c++Programming in c++
Programming in c++sujathavvv
Ā 
Message passing Programing and MPI.
Message passing Programing and MPI.Message passing Programing and MPI.
Message passing Programing and MPI.Munawar Hussain
Ā 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interfaceMohit Raghuvanshi
Ā 
Intro to MPI
Intro to MPIIntro to MPI
Intro to MPIjbp4444
Ā 
A SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHM
A SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHMA SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHM
A SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHMVLSICS Design
Ā 
Overview of c++
Overview of c++Overview of c++
Overview of c++geeeeeet
Ā 

Similar to A Lightweight C++ Header-Only MPI Interface for Improved Performance (MPP (20)

Unit 1 of c++ part 1 basic introduction
Unit 1 of c++ part 1 basic introductionUnit 1 of c++ part 1 basic introduction
Unit 1 of c++ part 1 basic introduction
Ā 
Advanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPCAdvanced Scalable Decomposition Method with MPICH Environment for HPC
Advanced Scalable Decomposition Method with MPICH Environment for HPC
Ā 
cs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdfcs556-2nd-tutorial.pdf
cs556-2nd-tutorial.pdf
Ā 
C notes by m v b reddy(gitam)imp notes all units notes 5 unit order
C notes by m v b  reddy(gitam)imp  notes  all units notes  5 unit orderC notes by m v b  reddy(gitam)imp  notes  all units notes  5 unit order
C notes by m v b reddy(gitam)imp notes all units notes 5 unit order
Ā 
C AND DATASTRUCTURES PREPARED BY M V B REDDY
C AND DATASTRUCTURES PREPARED BY M V B REDDYC AND DATASTRUCTURES PREPARED BY M V B REDDY
C AND DATASTRUCTURES PREPARED BY M V B REDDY
Ā 
Introduction to MPI
Introduction to MPIIntroduction to MPI
Introduction to MPI
Ā 
Csc1100 lecture01 ch01 pt2-paradigm (1)
Csc1100 lecture01 ch01 pt2-paradigm (1)Csc1100 lecture01 ch01 pt2-paradigm (1)
Csc1100 lecture01 ch01 pt2-paradigm (1)
Ā 
Csc1100 lecture01 ch01 pt2-paradigm
Csc1100 lecture01 ch01 pt2-paradigmCsc1100 lecture01 ch01 pt2-paradigm
Csc1100 lecture01 ch01 pt2-paradigm
Ā 
Principal of objected oriented programming
Principal of objected oriented programming Principal of objected oriented programming
Principal of objected oriented programming
Ā 
Binary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programmingBinary code obfuscation through c++ template meta programming
Binary code obfuscation through c++ template meta programming
Ā 
My ppt hpc u4
My ppt hpc u4My ppt hpc u4
My ppt hpc u4
Ā 
Part 1
Part 1Part 1
Part 1
Ā 
Sc13 comex poster
Sc13 comex posterSc13 comex poster
Sc13 comex poster
Ā 
Programming in c++
Programming in c++Programming in c++
Programming in c++
Ā 
Programming in c++
Programming in c++Programming in c++
Programming in c++
Ā 
Message passing Programing and MPI.
Message passing Programing and MPI.Message passing Programing and MPI.
Message passing Programing and MPI.
Ā 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interface
Ā 
Intro to MPI
Intro to MPIIntro to MPI
Intro to MPI
Ā 
A SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHM
A SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHMA SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHM
A SYSTEMC/SIMULINK CO-SIMULATION ENVIRONMENT OF THE JPEG ALGORITHM
Ā 
Overview of c++
Overview of c++Overview of c++
Overview of c++
Ā 

More from Brittany Brown

Minimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper DigitMinimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper DigitBrittany Brown
Ā 
Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.Brittany Brown
Ā 
Writing Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On PaperWriting Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On PaperBrittany Brown
Ā 
Best Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay ExamplesBest Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay ExamplesBrittany Brown
Ā 
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay QuotesBest QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay QuotesBrittany Brown
Ā 
Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.Brittany Brown
Ā 
Case Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And SciencCase Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And SciencBrittany Brown
Ā 
Best Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArtBest Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArtBrittany Brown
Ā 
My Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In EnglishMy Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In EnglishBrittany Brown
Ā 
My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.Brittany Brown
Ā 
Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.Brittany Brown
Ā 
How To Write A Paper For College Besttoppaperessay
How To Write A Paper For College BesttoppaperessayHow To Write A Paper For College Besttoppaperessay
How To Write A Paper For College BesttoppaperessayBrittany Brown
Ā 
Proposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.ComProposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.ComBrittany Brown
Ā 
Critical Analysis Essay Examples For Students
Critical Analysis Essay Examples For StudentsCritical Analysis Essay Examples For Students
Critical Analysis Essay Examples For StudentsBrittany Brown
Ā 
Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.Brittany Brown
Ā 
Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.Brittany Brown
Ā 
Sample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change FinalSample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change FinalBrittany Brown
Ā 
Access To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is DefAccess To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is DefBrittany Brown
Ā 
Groundhog Day Writing Paper Teaching Resources
Groundhog Day Writing Paper  Teaching ResourcesGroundhog Day Writing Paper  Teaching Resources
Groundhog Day Writing Paper Teaching ResourcesBrittany Brown
Ā 
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.Brittany Brown
Ā 

More from Brittany Brown (20)

Minimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper DigitMinimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper Digit
Ā 
Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.
Ā 
Writing Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On PaperWriting Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On Paper
Ā 
Best Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay ExamplesBest Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay Examples
Ā 
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay QuotesBest QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Ā 
Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.
Ā 
Case Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And SciencCase Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And Scienc
Ā 
Best Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArtBest Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArt
Ā 
My Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In EnglishMy Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In English
Ā 
My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.
Ā 
Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.
Ā 
How To Write A Paper For College Besttoppaperessay
How To Write A Paper For College BesttoppaperessayHow To Write A Paper For College Besttoppaperessay
How To Write A Paper For College Besttoppaperessay
Ā 
Proposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.ComProposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.Com
Ā 
Critical Analysis Essay Examples For Students
Critical Analysis Essay Examples For StudentsCritical Analysis Essay Examples For Students
Critical Analysis Essay Examples For Students
Ā 
Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.
Ā 
Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.
Ā 
Sample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change FinalSample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change Final
Ā 
Access To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is DefAccess To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is Def
Ā 
Groundhog Day Writing Paper Teaching Resources
Groundhog Day Writing Paper  Teaching ResourcesGroundhog Day Writing Paper  Teaching Resources
Groundhog Day Writing Paper Teaching Resources
Ā 
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Ā 

Recently uploaded

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
Ā 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
Ā 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
Ā 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
Ā 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
Ā 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
Ā 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
Ā 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
Ā 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
Ā 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
Ā 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
Ā 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
Ā 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
Ā 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
Ā 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
Ā 

Recently uploaded (20)

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
Ā 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
Ā 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Ā 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Ā 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Ā 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
Ā 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
Ā 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
Ā 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
Ā 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
Ā 
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Ā 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
Ā 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Ā 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
Ā 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
Ā 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
Ā 
CĆ³digo Creativo y Arte de Software | Unidad 1
CĆ³digo Creativo y Arte de Software | Unidad 1CĆ³digo Creativo y Arte de Software | Unidad 1
CĆ³digo Creativo y Arte de Software | Unidad 1
Ā 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
Ā 

A Lightweight C++ Header-Only MPI Interface for Improved Performance (MPP

  • 1. A Lightweight C++ Interface to MPI Simone Pellegrini, Radu Prodan, Thomas Fahringer Institute of Computer Science, University of Innsbruck Technikerstr. 21A, 6020 Innsbruck, Austria Abstract The Message Passing Interface (MPI) provides bindings for the three programming languages commonly used in High Performance Computing (HPC): C, C++ and Fortran. Unfortunately, MPI supports only the lowest common de- nominator of the three languages, providing a level of ab- straction far lower than typical C++ libraries. Lately, af- ter the decision of the MPI committee to deprecate and re- move the C++ bindings from the MPI standard, program- mers are forced to use either the C API or rely on third-party libraries. In this paper we present a lightweight, header-only C++ interface to MPI which uses object oriented and generic programming concepts to improve its integration into the C++ programming language. We compare our wrapper with a related approach called Boost.MPI showing how MPP facilitates the interaction with C++ objects. Perfor- mance wise, MPP outperforms Boost.MPI by reducing the interface overhead by a factor of eight. Additionally, MPPā€™s handling of user-deļ¬ned data types allows transferring of STL containers (e.g. std::list) up to 20 times faster than Boost.MPI for small linked lists by relying on software serialization. 1 Introduction MPI is the defacto standard for writing parallel programs for distributed memory systems. As its focus is on High Performance Computing (HPC), MPI offers an Applica- tion Programming Interface (API) for C, C++ and Fortran, the most widely used languages for HPC. Unfortunately, since the deļ¬nition of the ļ¬rst standard in 1994 [3], MPI did not keep the pace with the evolution of the underlying languages, such as object-oriented programming in Fortran 2000 and templates in C++. Nowadays, this problem is mostly perceived in C++ which, unlike Fortran and C, pro- vides much higher-level abstractions which are not reļ¬‚ected in the design of the MPI interface [6]. MPI is so poorly in- tegrated into the C++ environment that many programmers prefer to use, even in C++ programs, the C interface. Fur- thermore, to map common C++ constructs onto MPI, pro- grammers are forced to weaken the language type safety. As a consequence, errors that could be easily detected by the compiler are no longer captured leading to runtime failures. These issues led the MPI committee to the decision of dep- recating C++ bindings in the version 2.2 of the MPI stan- dard. However, because of the growing interest and use of C++ in HPC, several third-party wrappers to MPI have been proposed [11], the most important being Boost.MPI [8] and OOMPI [9]. Figure 1 shows a simple MPI program sending two ļ¬‚oat- ing point values from process rank 0 to rank 1. A problem of this code snippet is that the programmer is forced to un- necessarily declare a temporary variable val to store the values being sent by MPI Send (line 4). Although the C99 standard [1] introduced compound literals to avoid such un- necessary memory allocations (line 2), they are not widely used because of the decreased code readability. Because the compiler is not aware of the semantics of MPI Send which guarantees that the valā€™s value is not modiļ¬ed, no memory optimizations can be performed. A second problem is that the signature of all MPI routines requires the programmer to provide the size and the type (i.e. 
Boost.MPI [8] tries to simplify the MPI interface by deducing several of those parameters at compile-time through C++ template techniques. For example, the size of the data sent and its associated MPI_Datatype are strictly related to the type of the object being sent and are therefore deducible at compile-time from the C++ type system. The send and recv routines in Boost.MPI require only three parameters, as shown in Figure 2 (lines 2, 3, 6, and 8): the source/destination rank, the message tag, and the message content. This not only simplifies the usage of the routines, but also improves their safety. Although Boost.MPI is a considerable improvement over the standard MPI C++ bindings, it is not widely accepted within the MPI community for two main reasons: (i) the dependency on the Boost C++ library and the accompanying licensing issues; (ii) the use of a serialization library [10] to handle the transmission of user-defined data types (i.e. the merging of objects with a sparse memory representation into a contiguous data chunk), which negatively impacts performance.
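The core deduction technique can be sketched as follows; mpi_datatype_of and typed_send are illustrative placeholder names used for this sketch, not part of Boost.MPI's actual API:

    #include <mpi.h>
    #include <vector>

    // Map a C++ element type to its MPI datatype via template specialization.
    template <class T> struct mpi_datatype_of;  // left undefined for unsupported types
    template <> struct mpi_datatype_of<int>    { static MPI_Datatype get() { return MPI_INT; } };
    template <> struct mpi_datatype_of<float>  { static MPI_Datatype get() { return MPI_FLOAT; } };
    template <> struct mpi_datatype_of<double> { static MPI_Datatype get() { return MPI_DOUBLE; } };

    // A send whose count and datatype are inferred from the argument's C++ type;
    // the caller supplies only rank, tag, and content.
    template <class T>
    void typed_send(const std::vector<T>& v, int dest, int tag, MPI_Comm comm) {
        MPI_Send(const_cast<T*>(v.data()),    // older MPI versions take a non-const buffer
                 static_cast<int>(v.size()),  // element count deduced from the container
                 mpi_datatype_of<T>::get(),   // MPI datatype deduced from the element type
                 dest, tag, comm);
    }

With this scheme, passing a container of an unsupported element type produces a compile-time error instead of a runtime type mismatch.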
 1 if ( rank == 0 ) {
 2   MPI_Send((const int[1]){ 2 }, 1, MPI_INT, 1, 1,
 3            MPI_COMM_WORLD);
 4   std::array<float,2> val = {3.14f, 2.95f};
 5   MPI_Send(&val.front(), val.size(), MPI_FLOAT, 1, 0,
 6            MPI_COMM_WORLD);
 7 } else if (rank == 1) {
 8   int n;
 9   MPI_Recv(&n, 1, MPI_INT, 0, 1, MPI_COMM_WORLD,
10            MPI_STATUS_IGNORE);
11   std::vector<float> values(n);
12   MPI_Recv(&values.front(), n, MPI_FLOAT, 0, 0,
13            MPI_COMM_WORLD, MPI_STATUS_IGNORE);
14 }

Figure 1. Simple MPI program using C bindings.

 1 if ( world.rank() == 0 ) {
 2   world.send( 1, 1, 2 );
 3   world.send( 1, 0, std::array<float,2>({3.14f, 2.95f}) );
 4 } else if (world.rank() == 1) {
 5   int n;
 6   world.recv(0, 1, n);
 7   std::vector<float> values(n);
 8   world.recv(0, 0, values);
 9 }

Figure 2. Boost.MPI version of the program from Figure 1.

An object-oriented approach to improving the C++ MPI interface is OOMPI [9], which specifies send and receive operations in a more user-friendly way by overloading the C++ insertion (<<) and extraction (>>) operators. In OOMPI, a Port towards a process rank is obtained by using the array subscript operator [] on a communicator object (see line 2 in Figure 3). A further advantage is the convenience of combining these operators in one C++ instruction when inserting or extracting data to/from the same stream. A drawback of OOMPI is the poor integration of arrays and user data types in general. For example, sending an array instance requires the programmer to explicitly instantiate an object of class OOMPI_Array_message, for which the size and type of the data must be specified manually, as in the standard MPI interface (line 4). The support for generic user data types requires the objects being sent to inherit from the OOMPI_User_type interface. This is a rather severe limitation as it does not allow any legacy class (e.g. the STL's containers) to be directly supported.

 1 if ( OOMPI_COMM_WORLD.rank() == 0 ) {
 2   OOMPI_COMM_WORLD[1] << 2;
 3   std::array<float,2> val = {3.14f, 2.95f};
 4   OOMPI_COMM_WORLD[1] <<
 5     OOMPI_Array_message(&val.front(), val.size());
 6 } else if (OOMPI_COMM_WORLD.rank() == 1) {
 7   int n;
 8   OOMPI_COMM_WORLD[0] >> n;
 9   std::vector<float> values(n);
10   OOMPI_COMM_WORLD[0] >>
11     OOMPI_Array_message(values, 2);
12 }

Figure 3. OOMPI version of the program from Figure 1.

In this paper, we combine some of the concepts presented in Boost.MPI and OOMPI and propose an advanced lightweight MPI C++ interface called MPP that aims at transparently integrating the message passing paradigm into the C++ programming language without sacrificing performance. Our approach focuses on point-to-point communications and on the integration of user data types which, unlike Boost.MPI, relies entirely on native MPI_Datatypes for better performance. Our interface also utilizes advanced concepts from other parallel programming languages, such as future objects [5], which simplify the use of MPI asynchronous routines.

Overall, MPP is designed with a specific focus on performance. As we target HPC systems, we understand how critical performance is and spent significant effort to reduce the interface overhead. We compare the performance of MPP with Boost.MPI and show that, for a simple ping-pong application, MPP achieves a four times higher throughput (in terms of messages per second).
Compared to the pure C bindings, MPP increases latency by only 9%. As far as the handling of user data types is concerned, MPP is able to reduce the transfer time of a linked list (i.e. std::list<T> from the C++ STL) by up to a factor of 20 compared to Boost.MPI. To determine the real benefit of using MPP for real applications, we rewrote the computational kernel of QUAD_MPI [7] to use Boost.MPI and MPP. The obtained results show a performance improvement of around 12% compared to Boost.MPI.

The rest of the paper is organized as follows. In Section 2 we introduce MPP as a lightweight C++ wrapper to MPI using small code snippets. In Section 3 we compare our library against Boost.MPI and a plain MPI implementation using two micro-benchmark codes and an application code called QUAD_MPI. Section 4 concludes the paper.
2 MPP: C++ Interface to MPI

We use object-oriented programming concepts and C++ templates to design a lightweight wrapper for MPI routines that simplifies the way in which MPI programs are written. Similar to Boost.MPI, we achieve this goal by reducing the amount of information required by MPI routines and by inferring as much as possible at compile-time. By reducing the amount of code written by the user, we expect fewer programming errors. Furthermore, by making type checking stricter, the most common programming mistakes can be caught at compile-time. In this work we focus on point-to-point operations, as the specialized semantics of collective operations has no counterpart in the C++ STL. We also present a generic mechanism for handling C++ user data types which allows for easy transfer of C++ objects through any existing MPI routine (including collective operations).

2.1 Point-to-Point Communication

While Boost.MPI maintains in its API design the style of the traditional send/receive MPI routines, our approach is closer to OOMPI, aiming at a better C++ integration by defining these basic operations using streams. A stream is an abstraction that represents a device on which input and output operations are performed. Sending or receiving a message through an MPI channel can therefore be seen as a stream operation. We introduce an mpi::endpoint class which has the semantics of a bidirectional stream from which data can be read (received) or written (sent) using the << and >> operators. The concept of endpoints is similar to the Port abstraction of OOMPI; however, because our mechanism is based on generic programming, user-defined data types can be handled transparently. In contrast, OOMPI is based on inheritance, which forces the programmer to instantiate an OOMPI_Message class containing the data type and size required by the underlying MPI routines [11] (see line 4 in Figure 3).

Because an MPI send/receive operation offers more capabilities than C++ streams (e.g. message tags, non-blocking semantics), endpoints cannot be directly modelled using an "is-a" relationship. Fortunately, the STL's utilities (e.g. algorithms) are mostly based on templates, and endpoints can be passed to any generic function which relies on the << or >> stream operations. Figure 4 shows an example that uses an endpoint as argument to a generic read_from function. An endpoint is generated from a communicator using the () operator to which the process rank is passed (line 3). The mpi::comm class is a simple wrapper for an MPI communicator with the capability of creating endpoints and retrieving the current process rank and the communicator size. mpi::world refers to an instance of the comm class which wraps the MPI_COMM_WORLD communicator.

 1 namespace mpi {
 2   struct comm {
 3     mpi::endpoint operator()(int) const;
 4   };
 5 } // end mpi namespace
 6
 7 template <class InStream, class T>
 8 void read_from(InStream& in, T& val) {
 9   in >> val;
10 }
11
12 int val[2];
13 // reads the first element of the val array from std::cin
14 read_from(std::cin, val[0]);
15
16 // receives 2nd element of val array from rank 1
17 read_from(mpi::comm::world(1), val[1]);

Figure 4. Example of usage of endpoints in a generic function.
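Based on the semantics described above, the public surface of an endpoint can be pictured roughly as follows; this is a sketch for illustration, not MPP's exact declaration:

    #include <mpi.h>

    namespace mpi {
      // A bidirectional stream bound to a peer rank within a communicator.
      class endpoint {
        int      m_rank;   // peer process rank
        MPI_Comm m_comm;   // underlying MPI communicator
      public:
        endpoint(int rank, MPI_Comm comm) : m_rank(rank), m_comm(comm) {}

        // blocking send and receive, mirroring C++ stream syntax
        template <class T> endpoint& operator<<(const T& val);
        template <class T> endpoint& operator>>(T& val);
      };
    }

Returning endpoint& from both operators lets several sends or receives towards the same rank be chained in a single statement, exactly like chained stream insertions.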
Figure 5 shows how the program of Figure 1 can be rewritten with MPP. First of all, objects are either sent or received using stream operations, which allows for more compact code (half the size) compared to the C MPI bindings or to Boost.MPI. Secondly, objects are automatically wrapped by a generic mpi::msg<T> object, which does not need to be spelled out by the user (as opposed to OOMPI). Adding this level of indirection allows MPP to handle both primitive and user data types transparently to the user. R-values (i.e. values with no address, such as constants) are handled like any regular L-value (e.g. variables) using C++ constant references via the msg class, which avoids unnecessary memory allocations. The interface also allows the programmer to specify a message tag by constructing the message wrapper manually (see line 4 in Figure 5).

 1 using namespace mpi;
 2 if ( comm::world.rank() == 0 ) {
 3   comm::world(1) << std::array<float,2>({3.14f, 2.95f});
 4   comm::world(1) << msg(2, 1);
 5 } else if (comm::world.rank() == 1) {
 6   int n;
 7   comm::world(0) >> msg(n, 1);
 8   std::vector<float> values(n);
 9   comm::world(0) >> values;
10 }

Figure 5. MPP version of the program from Figure 1.
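A send-side message wrapper of this kind might look roughly like the following sketch; the member names are illustrative, and the receive case (e.g. msg(n, 1) on line 7 of Figure 5) would hold a mutable reference instead:

    // Minimal sketch of a message wrapper: it carries a constant reference
    // to the user value together with a tag, so no copy of the data is made.
    template <class T>
    class msg {
        const T& m_val;  // binds to both l-values and r-values; for a blocking
                         // send the reference stays valid for the whole expression
        int      m_tag;  // message tag, 0 by default
    public:
        explicit msg(const T& val, int tag = 0) : m_val(val), m_tag(tag) {}
        const T& value() const { return m_val; }
        int      tag()   const { return m_tag; }
    };
    // with C++17 class template argument deduction, msg(2, 1) deduces T = int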
MPP also supports non-blocking semantics for the send and receive operations through the overloaded < and > operators. Unlike blocking send/receive operations, asynchronous operations return a future object [5] of class mpi::request<T>, which can be polled to test whether the pending operation has completed. An example of non-blocking operations in MPP is shown in Figure 6. For non-blocking receives, the method T& get() waits for the underlying operation to complete (line 5) and, upon completion, returns a reference to the received value. The mpi::request<T> class also provides a void wait() and a bool test() method implementing the semantics of MPI_Wait and MPI_Test, respectively. The example also shows MPP's support for receive operations which listen for messages coming from an unknown process, by using the mpi::any constant rank when creating an endpoint (line 3).

 1 float real;
 2 mpi::request<float>&& req =
 3   mpi::comm::world(mpi::any) > real;
 4 // ... do something else ...
 5 use( req.get() );

Figure 6. Non-blocking MPP endpoints.

Errors, which MPI returns from every routine as an error code, are handled in MPP via C++ exceptions. Any call to an MPP routine can potentially throw an exception derived from mpi::exception. The method get_error_code() of this class allows the retrieval of the native error code.
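A failed operation might thus be handled as follows; this is a sketch in which only mpi::exception and get_error_code() come from the interface described above, while the rest is standard MPI:

    #include <iostream>
    #include <mpi.h>

    void send_with_diagnostics() {
        try {
            mpi::comm::world(1) << 42;              // any MPP call may throw
        } catch (const mpi::exception& e) {
            int code = e.get_error_code();          // native MPI error code
            char text[MPI_MAX_ERROR_STRING];
            int  len = 0;
            MPI_Error_string(code, text, &len);     // translate code to readable text
            std::cerr << "MPI error: " << text << std::endl;
        }
    }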
2.2 User Data Types

OOMPI was one of the first APIs to introduce support for user data types, through inheritance from an OOMPI_User_type class. Unfortunately, this mechanism is relatively weak: by relying on inheritance, it cannot handle class instances provided by third-party libraries (e.g. STL containers). Another attempt is the use of serialization in Boost.MPI which, although elegant, introduces a high runtime overhead. The objective of MPP is to reach the same level of integration with user data types as Boost.MPI without the performance loss, which we achieve by relying on the existing MPI support for user data types, i.e. MPI_Datatype. The definition of an MPI_Datatype is rather cumbersome and therefore not commonly used: it requires the programmer to specify several pieces of information related to the memory layout of the type, which often leads to programming errors that are very difficult to debug. However, because operations on data types are mapped to DMA transfers by the MPI library, the use of an MPI_Datatype outperforms techniques based on software serialization.

The integration of user data types is achieved using the type traits design pattern [4]. An example is illustrated in Figure 7 for the C++ STL's std::vector<T> class. We let the user specialize a class which statically provides the compiler with the three pieces of information required to map a user data type to an MPI_Datatype:

1. the memory address at which the data type instance begins;
2. the type of each element;
3. the number of elements.

 1 template <class T>
 2 struct mpi_type_traits<std::vector<T>> {
 3   static inline const T*
 4   get_addr( const std::vector<T>& vec ) {
 5     return mpi_type_traits<T>::get_addr( vec.front() );
 6   }
 7   static inline const size_t
 8   get_size( const std::vector<T>& vec ) {
 9     return vec.size();
10   }
11   static inline MPI_Datatype
12   get_type( const std::vector<T>& ) {
13     return mpi_type_traits<T>::get_type( T() );
14   }
15 };
16 ...
17 typedef mpi_type_traits<vector<int>> vect_traits;
18 vector<int> v = { 2, 3, 5, 7, 11, 13, 17, 19 };
19 MPI_Ssend( vect_traits::get_addr(v),
20            vect_traits::get_size(v),
21            vect_traits::get_type(v), ... );

Figure 7. Example of using mpi_type_traits to handle STL vectors.

Because a C++ vector is allocated contiguously in memory, the starting address of its first element is computed recursively in order to handle generic, regularly nested types (e.g. vector<array<float,10>>; lines 3-6). The length is the number of elements present in the vector (line 9) and the type is the data type of a vector element (lines 11-14). Because our mechanism is not based on inheritance (as in OOMPI), it is open for integration and use with third-party class libraries. Lines 17-21 show how the introduced type traits can be used with the MPI C bindings. This method can also be used for collective operations, or for one of the several flavors of MPI_Send for which an appropriate stream operator cannot be defined. MPP already provides type traits for several of the STL containers, such as vector, array and list.
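Following the same trait interface, support for a type we do not control can be added without touching the type itself; point3d below is an invented example, exposed to MPI as three contiguous doubles:

    // A third-party type we cannot modify (hypothetical example).
    struct point3d { double x, y, z; };

    // Specialization exposing point3d to MPI as three contiguous doubles
    // (this assumes the compiler inserts no padding between the members).
    template <>
    struct mpi_type_traits<point3d> {
        static inline const double* get_addr( const point3d& p ) { return &p.x; }
        static inline size_t        get_size( const point3d& )   { return 3; }
        static inline MPI_Datatype  get_type( const point3d& )   { return MPI_DOUBLE; }
    };

With this specialization in scope, a point3d can be passed to MPP endpoints, or handed to the MPI C bindings through the traits, exactly like the vector of Figure 7.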
3 Performance Evaluation

In this section we compare the performance of MPP against Boost.MPI and the standard C bindings of MPI. We used Open MPI version 1.4.2 to execute the experiments. We did not consider OOMPI for the performance evaluation since its development stopped several years ago. We first compared the MPI bindings using micro-benchmarks, and then using a real MPI application called QUAD_MPI, which approximates an integral based on a quadrature rule [7].

3.1 Micro-Benchmarks

The purpose of the first experiment is to measure the latency overhead introduced by MPP over the standard C interface to MPI, compared to Boost.MPI. We implemented a simple ping-pong application which we executed on a shared memory machine with a single AMD Phenom II X2 555 processor (dual-core, 3.5 GHz, 1MB of L2 cache, and 6MB of L3 cache). This way, any data transmission overhead is minimized and the focus is solely on the interface overhead.

Figure 8. MPP performance evaluation results: (a) number of ping-pong operations per second; (b) comparison of Boost.MPI and MPP for the STL linked list (std::list<T>).

Figure 8(a) displays the number of ping-pong operations per second for varying message sizes. MPP has approximately 9% higher latency for small messages compared to the native MPI routines. This overhead is due to the creation of a temporary status object, corresponding to the MPI_Status returned by the MPI receive routine, which contains the message source, size, tag, and error (if any). Compared to Boost.MPI, MPP nevertheless shows a consistent performance improvement of around 75% for small message sizes. Because both implementations use plain vectors to store the exchanged message, no serialization is involved that could explain the difference. We believe that the main reason is that Boost.MPI is implemented as a library, so every call to an MPI routine pays the overhead of an additional function call. We avoided this problem in MPP through a purely header-based implementation, which allows all MPP routines to be inlined by the compiler, thus eliminating this overhead. The graph also illustrates that, as expected, the overhead decreases for larger messages as the communication time becomes predominant.

In the second experiment, we compared MPP with Boost.MPI with respect to the support for user-defined data types. We used a std::list<double> of varying size, exchanged between two processes in a loop repeated one thousand times. We executed the experiment on an IBM blade cluster with quad-core Intel Xeon X5570 processors interconnected through an InfiniBand network, allocating the two MPI processes on different blades in order to simulate a realistic use case. Figure 8(b) shows the time necessary to perform this micro-benchmark for different list sizes, together with the speedup achieved by MPP over Boost.MPI. For small lists of 100 elements the speedup is approximately 20; however, the performance gap closes as the list size increases. The reason lies in MPP's std::list implementation, which uses MPI_Type_struct and therefore requires enumerating all memory addresses that compose the object being sent. To create an MPI_Datatype for a linked list, three arrays have to be provided, resulting in O(3·N) memory overhead:

• the displacement of each list element relative to the starting address;
• the size of each element;
• the data type of each element.

We observe in Figure 8(b) that building such a data type becomes more expensive as the list size increases, so that for large linked lists of over 50,000 elements software serialization outperforms the MPI data typing mechanism. A future optimization could improve the support for large data structures by integrating into MPP a mechanism that switches from MPI_Datatype to serialization beyond a critical size.
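The construction just described might be assembled as follows; the sketch uses the standard MPI_Type_create_struct and MPI_Get_address calls with absolute displacements, and make_list_type is a hypothetical helper for illustration, not MPP's internal code:

    #include <list>
    #include <vector>
    #include <mpi.h>

    // Build an MPI datatype describing every node of a std::list<double>.
    // The three O(N) arrays below are exactly the overhead discussed above.
    MPI_Datatype make_list_type(const std::list<double>& l) {
        const int n = static_cast<int>(l.size());
        std::vector<int>          lengths(n, 1);         // one double per node
        std::vector<MPI_Datatype> types(n, MPI_DOUBLE);  // element type of each node
        std::vector<MPI_Aint>     displs;                // absolute address of each node
        displs.reserve(n);
        for (const double& elem : l) {
            MPI_Aint addr;
            MPI_Get_address(const_cast<double*>(&elem), &addr);
            displs.push_back(addr);
        }
        MPI_Datatype dt;
        MPI_Type_create_struct(n, lengths.data(), displs.data(), types.data(), &dt);
        MPI_Type_commit(&dt);
        return dt;  // absolute displacements: send as MPI_Send(MPI_BOTTOM, 1, dt, ...)
    }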
3.2 QUAD_MPI Application Code

The micro-benchmarks highlighted the low latency of the MPP bindings; however, this does not reveal much about the benefits of using MPP in real application codes. For this purpose we took a simple MPI application kernel called QUAD_MPI and rewrote it using Boost.MPI and MPP. QUAD_MPI is a C program which approximates an integral using a quadrature rule [7] and can be efficiently parallelized using MPI. From the original code [7] we extracted the computational kernel depicted in Figure 9. The process with rank 0 assigns to every other process a sub-interval of [A, B], and these bounds are then communicated using message passing routines. The number of communication statements in the code is limited, i.e. 2·(P−1), where P is the number of processes. The code therefore represents a good balance between communication and computation, making it a good choice to determine the benefits of the MPP bindings. This QUAD_MPI kernel can be easily rewritten using Boost.MPI and MPP, as shown in Figures 10 and 11, respectively.

 1 double my_a, my_b;
 2 my_total = 0.0;
 3 if ( rank == 0 ) {
 4   for ( unsigned q = 1; q < p; ++q ) {
 5     my_a = ( ( p - q ) * a + ( q - 1 ) * b ) / ( p - 1 );
 6     MPI_Send ( &my_a, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD );
 7
 8     my_b = ( ( p - q - 1 ) * a + ( q ) * b ) / ( p - 1 );
 9     MPI_Send ( &my_b, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD );
10   }
11 } else {
12   MPI_Recv ( &my_a, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status );
13   MPI_Recv ( &my_b, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status );
14
15   for ( unsigned i = 1; i <= my_n; ++i ) {
16     x = ((my_n - i) * my_a + (i - 1) * my_b) / (my_n - 1);
17     my_total = my_total + f ( x );
18   }
19   my_total = (my_b - my_a) * my_total / (double) my_n;
20 }

Figure 9. Computational kernel of QUAD_MPI.

 1 my_total = 0.0;
 2 if ( rank == 0 ) {
 3   for ( unsigned q = 1; q < p; ++q ) {
 4     world.send(q, 1, (( p - q ) * a + ( q - 1 ) * b) / ( p - 1 ));
 5     world.send(q, 2, (( p - q - 1 ) * a + ( q ) * b) / ( p - 1 ));
 6   }
 7 } else {
 8   double my_a, my_b;
 9   world.recv(0, 1, my_a);
10   world.recv(0, 2, my_b);
11
12   for ( unsigned i = 1; i <= my_n; ++i ) {
13     x = ((my_n - i) * my_a + (i - 1) * my_b) / (my_n - 1);
14     my_total = my_total + f ( x );
15   }
16   my_total = (my_b - my_a) * my_total / (double) my_n;
17 }

Figure 10. Computational kernel of QUAD_MPI rewritten using Boost.MPI.
In both cases, we removed the need to assign the values being sent to the my_a and my_b variables, because both Boost.MPI and MPP support sending R-values that are computed and sent directly to the destination (lines 4 and 5). The code at the receiver side is similar, the only difference being that the scope of the my_a and my_b variables can now be restricted to the else body (lines 9 and 10), which allows the compiler to generate faster machine code by utilizing the CPU registers more efficiently. Additionally, MPP allows for a further reduction of the code, as shown in Figure 11, since the two sends (line 4) and the two receives (line 9) can be combined into a single statement. MPP also relieves the programmer from the burden of specifying a message tag by using the tag 0 by default. With MPP we were able to shrink the input code by 30% (in terms of number of characters), which reduces the chances of programming errors and increases overall productivity.
 1 my_total = 0.0;
 2 if ( rank == 0 ) {
 3   for ( unsigned q = 1; q < p; ++q ) {
 4     comm::world(q) << ((p - q) * a + (q - 1) * b) / (p - 1)
 5                    << ((p - q - 1) * a + q * b) / (p - 1);
 6   }
 7 } else {
 8   double my_a, my_b;
 9   comm::world(0) >> my_a >> my_b;
10
11   for ( unsigned i = 1; i <= my_n; ++i ) {
12     x = ((my_n - i) * my_a + (i - 1) * my_b) / (my_n - 1);
13     my_total = my_total + f ( x );
14   }
15   my_total = (my_b - my_a) * my_total / (double) my_n;
16 }

Figure 11. Computational kernel of QUAD_MPI rewritten using MPP.

We ran the three versions of the QUAD_MPI kernel on a machine with 16 cores (a dual-socket Intel Xeon CPU) and used shared memory to minimize communication costs and highlight the library overhead. We compiled the input programs with optimizations enabled (i.e. the -O3 flag), repeated each experiment 10 times, and report the average execution time and its standard deviation (see Figure 12).

Figure 12. QUAD_MPI performance comparison.

Because of the removal of the superfluous assignments to the my_a and my_b variables, the MPP version performs slightly faster than the original code. It is worth noticing that, although the same optimization was applied to the Boost.MPI version, the large overhead of Boost.MPI cancels any benefit, making the resulting code the slowest of the three. Compared to Boost.MPI, the MPP version shows a performance improvement of around 12%.

4 Conclusions

In this paper we presented MPP, an advanced C++ interface to MPI. We combined some of the ideas of OOMPI and Boost.MPI into a lightweight, header-only interface smoothly integrated with the C++ environment. We introduced a transparent mechanism for dealing with user data types which, for small objects, is up to 20 times faster than Boost.MPI thanks to the use of MPI_Datatypes instead of software serialization. We showed that programs written using MPP are more compact than their MPI C counterparts and that the overhead introduced by the object-oriented design is negligible. Furthermore, MPP can avoid common programming errors in two ways:

1. through its interface design, which uses future objects to prevent reading the buffer of an asynchronous receive before the data has been written;
2. by automatically inferring most of the input arguments required by MPI routines.

The MPP interface is freely available at [2]. In the future we intend to extend the interface to support easier use of other complex MPI features such as dynamic process management, operations on communicators and groups, and the creation of process topologies.

5 Acknowledgments

This research has been partially funded by the Austrian Research Promotion Agency (FFG) under grant P7030-025-011 and by the Tiroler Zukunftsstiftung under the Translational Research Grant "Parallel Computing with Java for Manycore Computers".

References

[1] C99 standard. www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
[2] MPI C++ Interface. https://github.com/motonacciu/mpp
[3] The MPI-1 Specification. http://www.mpi-forum.org/docs/docs.html
[4] A. Alexandrescu. Traits: The else-if-then of types. C++ Report, pages 22-25, 2000. http://erdani.com/publications/traits.html
[5] H. C. Baker, Jr. and C. Hewitt. The incremental garbage collection of processes. In Proceedings of the 1977 Symposium on Artificial Intelligence and Programming Languages, pages 55-59, New York, NY, USA, 1977. ACM.
[6] J. M. Squyres, B. Saphir, and A. Lumsdaine. The design and evolution of the MPI-2 C++ interface. In Proceedings of the 1997 International Conference on Scientific Computing in Object-Oriented Parallel Environments, Lecture Notes in Computer Science. Springer-Verlag, 1997.
[7] J. Burkardt. QUAD_MPI. http://people.sc.fsu.edu/~jburkardt/c_src/quad_mpi/quad_mpi.html
[8] P. Kambadur, D. Gregor, A. Lumsdaine, and A. Dharurkar. Modernizing the C++ interface to MPI. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, pages 266-274. Springer Berlin / Heidelberg, 2006.
[9] B. C. McCandless, J. M. Squyres, and A. Lumsdaine. Object-Oriented MPI (OOMPI): A class library for the Message Passing Interface. In Proceedings of the Second MPI Developers Conference, pages 87-, Washington, DC, USA, 1996. IEEE Computer Society.
[10] R. Ramsey. Boost serialization library. www.boost.org/doc/libs/release/libs/serialization/
[11] A. Skjellum, D. G. Wooley, A. Lumsdaine, Z. Lu, M. Wolf, J. M. Squyres, B. McCandless, and P. V. Bangalore. Object-oriented analysis and design of the Message Passing Interface, 1998.