Parallel Framework Project (PFP) Wiki
Background
Parallel computing is going mainstream. Many parallel programming models exist today and more are being proposed. They range from the older OpenMP, MPI, UPC/CAF and Cilk, to the newer, Fortress, X10, Chapel, TBB, RapidMind, MapReduce, Hadoop, and Dryad, just to name a few. Each addresses one or several challenges in parallel programming. There clearly is no one-size-fits-all solution, and there are plenty of areas where Sun Tools can innovate and provide leadership in this critical and exciting era.
Three engineers Liang Chen, Deepankar Bairagi and Yuan Lin (by the order of joining the project) at Sun Developer Tools group spent their spare time in brainstorming and founding the project in summer 2007. Oleg Mazurov joined in early 2009 and made big enhancements to the project. Kuldip Oberoi joined in Spring 2009 to initiate the project community effort.
PFP is originally code named MCFX standing for MultiCore FX. Therefore you will find MCFX is used throughout the documents and the program code. PFP (MCFX) project was presented by Liang Chen at Stanford University PPL retreat in November 2008 and UC Berkeley OSQ conference in May 2009. Liang will officially present the project at International SuperComputing Conference at Germany in June 2009. The technical paper can found at Springer Journal web site: http://www.springerlink.com/content/9004q3172421j810/
The PFP Kernel is now available for download at the Downloads link. The Software Development Kit which includes example programs and documentation will be available at the Source Code repository in near future.
What Is PFP
Explore an easy-to-use parallel programming framework for non-trivial parallel applications
- Library based approach without learning new language
- Delivering programmability as well as performance
- Focusing on a few popular design patterns
- Encouraging the re-usable objects to help build a parallel programming community
- Including those trivial ones, of course!
- Currently implemented in C++ and can be extended to Java and other languages in the future
Main Goals
The PFP project aims to improve the productivity of parallel programming by taking a data-centric approach and to provide a high level parallel execution model.
- Ease the pain of developing parallel applications for several classes of problems (graph traversal and cyclic computation).
- Simplify the programming of non-trivial parallelism.
- Closely match the domain problems by taking an object oriented and data object centric approach.
- Encourage re-usability with a layered architecture.
The MCFX framework consists of a C++ class library with a runtime support. The MCFX C++ class definitions provide the API for programmers to write applications using the MCFX parallel programming model and the underlying runtime library implements the various features supported by the MCFX framework.
Current Status
- PFP project will be officially presented at International SuperComputing 09 Conference at Hamburg Germany in June.
- PFP Project community will be opened to the public in June.
- In Collaboration with Stanford University EE Department for Parallel Workload Research.
PFP Overview
_
- 1 Parallel Framework Project (PFP) Wiki
- 1.1 Background
- 1.2 What Is PFP
- 1.3 Main Goals
- 1.4 Current Status
- 1.5 PFP Overview
- 1.5.1 1. Introduction
- 1.5.2 2. Basic Concept
- 1.5.3 3. Constructs
- 1.5.3.1 A summary of important programming constructs in MCFX
- 1.5.3.2 Regular Object
- 1.5.3.3 Executable Object
- 1.5.3.4 Conditional Executable Object
- 1.5.3.5 Executable Container
- 1.5.4 4. The Services Of Executable Container
- 1.5.4.1 Object Dependency Service
- 1.5.4.2 Reduction Service
- 1.5.4.3 Introspection Service
- 1.5.4.4 Deletion Service
- 1.5.4.5 Execution Service
- 1.5.5 5. Programming Model
- 1.5.5.1 Map-Reduce
- 1.5.5.2 Graphical Traversal
- 1.5.5.3 Cyclic Simulation
- 1.5.6 6. Example
_
1. Introduction
With the industry moving towards multi-core, multi-threaded processors, parallel programming becomes more and more important. However it is always quite difficult for a software developer to design and to write parallel programs. Most programmers are not trained to think and to design their programs in a concurrent manner. Furthermore, parallel programming poses new challenges that are not present in sequential programming. For example, race condition is a notorious problem in parallel programming and hard to avoid completely.
There are several software packages and programming API in Java and native programming languages to help programmers develop parallel code today. The existing parallel tool offerings range from the !OpenMP API in C/C++/Fortran to the Concurrency utility package in Java to Intel Thread Building Blocks in C++. They are designed to help programmers implement their parallel programs more productively. However all the existing design approaches remain at a relatively low design abstraction level. For example, Intel TBB provides a set of parallel operations based on concurrent tasks. One critical issue of the existing approaches is that they force programmers to think in terms of parallel tasks and operations. Furthermore, the existing approaches provide a too low level parallel mechanism. The current parallel design tools all require the programmers to rework their program to decompose and partition the main algorithm into multiple concurrent tasks. The low level design abstraction with task centric approach works in most cases, but it has several severe drawbacks. The first drawback is low productivity due to design abstraction at very low level, it is difficult for programmers to design parallel programs with current tools. Another drawback is that it takes a different mind-set and a different implementation scheme to solve every problem with the task centric approach. The third drawback is the programming inefficiency due to the focus on operations instead of data. A program can be viewed as a process of operations on the main data structure. Many software developers feel more comfortable to design their programs starting from the data rather than from the tasks or operations.
Here we propose a concurrent object oriented programming framework, which will be called MCFX in this document. This parallel programming framework is comprised of a high design abstraction level programming model and a concurrent execution environment. The invented programming model will allow the programmers to design their program from their application object perspective. The application developer can use an executable object (EO) to model a concurrent entity in the real world, such as a device in EDA or a molecule in a life science. An executable object can also model a graph node in a graphical traversal or a computing state in the search for the best move in a game or for an optimal path to reach the destination. When the executable objects are created and placed in a executable container (EC), the executable objects can be activated to become alive, i.e. the actions (processes) of the executable objects will be executed concurrently. The action of an executable object can generate more new executable objects and add them to the executable container. Therefore the application programmers can focus on designing their application objects and the action methods of the objects.
There are several major advantages to design parallel program based on MCFX. First of all, high design abstraction level allows the programmers to focus on solving the problem of their application instead of spending effort in designing parallel operations. Secondly, the proposed programming model is expandable. The application executable objects designed for the framework are re-usable components and they can be easily extended by other applications. Even the executable containers can be extended and re-used by other application. Thirdly, the MCFX programming model and the concurrent execution environment are loosely coupled with each other other via a well defined API. Therefore the same programming model can run on any parallel runtime environment supporting MCFX containers.
The MCFX framework consists of a C++ class library with a runtime support. The MCFX C++ class definitions provide the API for programmers to write applications using the MCFX parallel programming model and the underlying runtime library implements the various features supported by the MCFX framework. Please read the MCFX User guide for more details.
2. Basic Concept
MCFX framework is based on an innovative software concept of introducing the configurable and reusable parallel object scheduler(s) at the user application level. The parallel scheduler is called an executable container in MCFX . An MCFX application program can be designed as the composition of many application objects which can be executable or non-executable. A regular object is designed as an object class derived from MCFX abstract base class object_t. A regular object is non-executable and can be placed in a normal data container only. When an application object is associated with a designated action, it becomes executable and is called executable object. An executable object become alive (actionable) when it is placed in an activated executable container. An executable object can model a real world concurrent thing such as an electronic device or a molecule. An executable objects can also model an abstract entity such as a tree node or a state node in a tree traversal.
The MCFX executable container uses an internal concurrent execution engine to execute its contained objects in parallel. The concurrent execution engine consists of multiple execution units, each of which is equipped with its own local container. The concurrent execution engine is completely transparent to the application developers. There are the various types of MCFX executable containers such as queue, priority queue, list and cyclic pool. A programmer may create multiple executable containers of different types in an application. A parallel executable container can provide several important services to help the application developers in the design of their executable objects. Because an application can have the multiple executable containers, an executable data object can be moved from one executable container to another if needed.
3. Constructs
A summary of important programming constructs in MCFX
Regular Object
A C++ or Java object derived from MCFX abstract class object_t. A regular object is non-executable and behaves exactly the same as a regular C++ or Java object. MCFX also provides a set of efficient atomic accessing methods to the regular objects. A regular object can be stored in a regular container only.
Executable Object
A C++ or Java object derived from MCFX abstract class ex_object_t. An executable object is an application object associated with at least one MCFX designated action method which can be executed by the hosting parallel execution container. An executable data object can be placed in either a regular container or an executable container.
Conditional Executable Object
A C++ or Java object derived from MCFX abstract class ex_cond_object_t. When an executable object has its internal states or its actions depends on the external factors, the executable object should be associated a set of conditional actions.
Executable Container
Equipped with the various helpful services to provide a concurrent execution environment and perform the actions of the executable objects in the container in parallel. Only the executable objects can be put in the executable containers.
4. The Services Of Executable Container
DPF executable container delivers the following services to help the application programmers to design their executable objects:
Object Dependency Service
A conditional executable object contains pairs of condition and action. The conditional action will be executed only when the waited condition happens. When an object has conditional actions, the application should specify what conditions are partial and what conditions are full. When an object's partial condition is met, the corresponding action is executed and the object remains in the container to wait for the other conditions. When an object's full condition is met, the corresponding action is executed and the object is done and no longer waits for any other conditions.
Reduction Service
Provide a set of reduction methods allowing the application to achieve desired result under parallel executions. The generic reduction service can be customized to produce maximum, minimum, summation or any other target values. An application can implement the Map-Reduce design pattern easily by creating many executable objects performing the parallel operations in an executable container and generate the result through the reduction service.
Introspection Service
Provide a self-examining method with a user specified comparison function to search for equivalent objects within the executable container. When a new executable object is added to the executable container, the introspection service can prevent a duplicated or inferior object being added to the container. This service is often used in a graphical traversal or searching type application.
Deletion Service
Provide a method to remove the existing executable objects from the executable container when a superior new executable object is added to the container. This service is exactly opposite to the introspection service. When a better executable object arrives, a waiting-to-be-executed inferior object in the container can be removed for a better efficiency.
Execution Service
Container.start() - asynchronously starts to execute a container immediately. (? "asynchronously" and "immediately"?)
Container.start(condition) - asynchronously starts to execute a container when the condition is met.
Container.wait() - a blocking method that does not return until the container has fully stopped.
Container.stop_condition(goal) - a non-blocking method that specifies a condition goal for the container to stop.
Container.barrier(state) - when the container reaches the barrier state, it just pauses and waits for the next stage execution. The difference between the barrier and the stopped goal: when the executable data container achieve the goal condition, all internal execution units are shut down and the resources freed.
5. Programming Model
An object oriented application is designed as a composition of the various objects. The computing intensive portion should be comprised of the executable object and the rest can be designed with the regular objects. MCFX supports the various application design patterns such as Map-Reduce, graphical traversal, and cyclic simulation. A MCFX object can model a real world concurrent object as well as an abstract programming object.
When an executable object models a real-world entity, the application may choose a cyclic pool container to execute the real world entities. An executable object will remain alive forever in a cyclic pool container. Furthermore the cyclic pool container will make sure all the executable objects enter the Done state before it signals a cycle end.
When an executable object models an abstract entity such as a tree node in a graphical traversal program, it tends to use the other container types such as queue or stack. An executable object will become dead and removed from the container after its unconditional action or full conditional action is executed and enter the Done state. Although MCFX tries to executes all executable objects in the container concurrently, some objects will be executed earlier than the other due to the available parallel execution units allocated according to the hardware resources. Therefore queue type container is better for the breadth first processing order and stack type container is better for the depth first processing order.
MCFX programming model can support several design patterns, here are a three design patterns we specifically design for.
Map-Reduce
A computation task can be modeled as an executable task object in MCFX. A large number of executable task objects can be placed in an executable container to get executed in parallel. Furthermore new executable objects may be created by actions of older projects.The application can apply reduction service to filter and summarize the desired result.
Graphical Traversal
A graphic node can be modeled as an executable node object in MCFX. The action of a graphic node is to process the node data, create executable object nodes of its child and add the new objects to the executable container. If the action of a node object depends on the other nodes. The node object should be derived from the conditional executable object class and a set of conditional actions need to be specified. The post-order tree traversal is one example of graphical traversal programs which require object dependency.
Cyclic Simulation
A real world concurrent object such as an electronic gate can be modeled as an executable gate object. The process of a gate can be a simple functional operation when its inputs are ready or a more complex function with multiple outputs with each output depending on its own input set. The gate object can have internal states and behave as a finite state machine. When all the gate objects are placed in a cyclic pool, the gate objects will be executed when the waited input objects are available. The cyclic pool will signal a cycle end when all the gate objects are fully executed exactly once.
6. Example
A program example to illustrate the MCFX programming model
Graphical tree traversal to search the target nodes
Here is a a graphical tree traversal program to find all the target nodes in a large unbalanced tree. This program traverses the tree from the root node to examine and calculate the node value of each node. When a calculated node value matches the target value, the node will be collected in the target node list.
Below are the main programming steps to design the application:
- Class node_object is designed as a derived class from ex_object_t class. The tree structure is defined with a single root node with the child nodes and each child node is a sub-tree itself. The tree can be very unbalanced and therefore it is inefficient for the graphical traversal program to create a thread to process a sub-tree as in a traditional multi-threaded programming.
- Design the MCFX designated virtual function process() of node_object class to calculate the node value and decide if the node should be collected through the reduction service. The process method should also create the node objects for the children nodes and add the new node objects to the container. A comparison function can be provided for the introspection service to make the parallel searching run more efficiently.
- Create an instance of ec_queue_t executable container as the main parallel execution container if the searching order is bread-first. Otherwise the instance of ec_stack_t executable container can be created for depth-first searching order.
- Create an instance of node_object for the root node and add it to the executable container.
- Start the executable container and get the result when the container becomes empty.
Sample code of the example program
// Application class for node object
class node_object : public ex_object_t {
public:
tree_node* tn_ptr;
node_object(tree_node* tn) { tn_ptr = tn; }
virtual process(executable_container_t* ec) {
if ( isQualified(tn_ptr->value) ) ec->post( this );
for (int k = 0; k < tn_ptr->child_count; k++) {
node_object* new_node = new node_object( tn_ptr->child_array[k]);
ec->push ( new_node );
}
}
// Main program
int main(int argc, char** argv) {
node_object* start_node = new node_object ( get_tree_root() );
ec_queue_t* searching_queue = new ec_queue_t;
searching_queue->push( start_node );
searching_queue->start();
searching_queue->reduce( collect_target_node() )
}





