diff --git a/libs/mpi/doc/collective.qbk b/libs/mpi/doc/collective.qbk
new file mode 100644
index 0000000000..101c7f5c70
--- /dev/null
+++ b/libs/mpi/doc/collective.qbk
@@ -0,0 +1,367 @@
+[section:collectives Collective operations]
+
+[link mpi.tutorial.point_to_point Point-to-point operations] are the
+core message passing primitives in Boost.MPI. However, many
+message-passing applications also require higher-level communication
+algorithms that combine or summarize the data stored on many different
+processes. These algorithms support many common tasks such as
+"broadcast this value to all processes", "compute the sum of the
+values on all processors" or "find the global minimum."
+
+[section:broadcast Broadcast]
+The [funcref boost::mpi::broadcast `broadcast`] algorithm is
+by far the simplest collective operation. It broadcasts a value from a
+single process to all other processes within a [classref
+boost::mpi::communicator communicator]. For instance, the
+following program broadcasts "Hello, World!" from process 0 to every
+other process. (`hello_world_broadcast.cpp`)
+
+  #include <boost/mpi.hpp>
+  #include <iostream>
+  #include <string>
+  #include <boost/serialization/string.hpp>
+  namespace mpi = boost::mpi;
+
+  int main()
+  {
+    mpi::environment env;
+    mpi::communicator world;
+
+    std::string value;
+    if (world.rank() == 0) {
+      value = "Hello, World!";
+    }
+
+    broadcast(world, value, 0);
+
+    std::cout << "Process #" << world.rank() << " says " << value
+              << std::endl;
+    return 0;
+  }
+
+Running this program with seven processes will produce a result such
+as:
+
+[pre
+Process #0 says Hello, World!
+Process #2 says Hello, World!
+Process #1 says Hello, World!
+Process #4 says Hello, World!
+Process #3 says Hello, World!
+Process #5 says Hello, World!
+Process #6 says Hello, World!
+]
+[endsect:broadcast]
+
+[section:gather Gather]
+The [funcref boost::mpi::gather `gather`] collective gathers
+the values produced by every process in a communicator into a vector
+of values on the "root" process (specified by an argument to
+`gather`). The /i/th element in the vector will correspond to the
+value gathered from the /i/th process. For instance, in the following
+program each process computes its own random number. All of these
+random numbers are gathered at process 0 (the "root" in this case),
+which prints out the values that correspond to each process.
+(`random_gather.cpp`)
+
+  #include <boost/mpi.hpp>
+  #include <iostream>
+  #include <vector>
+  #include <cstdlib>
+  #include <ctime>
+  namespace mpi = boost::mpi;
+
+  int main()
+  {
+    mpi::environment env;
+    mpi::communicator world;
+
+    std::srand(std::time(0) + world.rank());
+    int my_number = std::rand();
+    if (world.rank() == 0) {
+      std::vector<int> all_numbers;
+      gather(world, my_number, all_numbers, 0);
+      for (int proc = 0; proc < world.size(); ++proc)
+        std::cout << "Process #" << proc << " thought of "
+                  << all_numbers[proc] << std::endl;
+    } else {
+      gather(world, my_number, 0);
+    }
+
+    return 0;
+  }
+
+Executing this program with seven processes will result in output such
+as the following. Although the random values will change from one run
+to the next, the order of the processes in the output will remain the
+same because only process 0 writes to `std::cout`.
+
+[pre
+Process #0 thought of 332199874
+Process #1 thought of 20145617
+Process #2 thought of 1862420122
+Process #3 thought of 480422940
+Process #4 thought of 1253380219
+Process #5 thought of 949458815
+Process #6 thought of 650073868
+]
+
+The `gather` operation collects values from every process into a
+vector at one process. If instead the values from every process need
+to be collected into identical vectors on every process, use the
+[funcref boost::mpi::all_gather `all_gather`] algorithm,
+which is semantically equivalent to calling `gather` followed by a
+`broadcast` of the resulting vector.
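+
+For instance, a minimal sketch (reusing the `my_number` value from the
+gather example above) in which every process ends up with the complete
+vector of contributions might look like this:
+
+  std::vector<int> all_numbers;
+  all_gather(world, my_number, all_numbers);
+  // Every process now holds the same vector: all_numbers[i] is the
+  // value contributed by process i.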
+
+[endsect:gather]
+
+[section:scatter Scatter]
+The [funcref boost::mpi::scatter `scatter`] collective distributes
+the values of a vector held by the "root" process among all of the
+processes in the communicator, one value per process.
+The /i/th element of the vector will correspond to the
+value received by the /i/th process. For instance, in the following
+program, the root process produces a vector of random numbers and sends
+one value to each process, which then prints it. (`random_scatter.cpp`)
+
+  #include <boost/mpi.hpp>
+  #include <boost/mpi/collectives.hpp>
+  #include <iostream>
+  #include <cstdlib>
+  #include <ctime>
+  #include <vector>
+  #include <algorithm>
+
+  namespace mpi = boost::mpi;
+
+  int main(int argc, char* argv[])
+  {
+    mpi::environment env(argc, argv);
+    mpi::communicator world;
+
+    std::srand(std::time(0) + world.rank());
+    std::vector<int> all;
+    int mine = -1;
+    if (world.rank() == 0) {
+      all.resize(world.size());
+      std::generate(all.begin(), all.end(), std::rand);
+    }
+    mpi::scatter(world, all, mine, 0);
+    for (int r = 0; r < world.size(); ++r) {
+      world.barrier();
+      if (r == world.rank()) {
+        std::cout << "Rank " << r << " got " << mine << '\n';
+      }
+    }
+    return 0;
+  }
+
+Executing this program with seven processes will result in output such
+as the following. Although the random values will change from one run
+to the next, the order of the processes in the output will remain the
+same because of the barrier.
+
+[pre
+Rank 0 got 1409381269
+Rank 1 got 17045268
+Rank 2 got 440120016
+Rank 3 got 936998224
+Rank 4 got 1827129182
+Rank 5 got 1951746047
+Rank 6 got 2117359639
+]
+
+[endsect:scatter]
+
+[section:reduce Reduce]
+
+The [funcref boost::mpi::reduce `reduce`] collective
+summarizes the values from each process into a single value at the
+user-specified "root" process. The Boost.MPI `reduce` operation is
+similar in spirit to the STL _accumulate_ operation, because it takes
+a sequence of values (one per process) and combines them via a
+function object. For instance, we can randomly generate values in each
+process and then compute the minimum value over all processes via a
+call to [funcref boost::mpi::reduce `reduce`]
+(`random_min.cpp`):
+
+  #include <boost/mpi.hpp>
+  #include <iostream>
+  #include <cstdlib>
+  #include <ctime>
+  namespace mpi = boost::mpi;
+
+  int main()
+  {
+    mpi::environment env;
+    mpi::communicator world;
+
+    std::srand(std::time(0) + world.rank());
+    int my_number = std::rand();
+
+    if (world.rank() == 0) {
+      int minimum;
+      reduce(world, my_number, minimum, mpi::minimum<int>(), 0);
+      std::cout << "The minimum value is " << minimum << std::endl;
+    } else {
+      reduce(world, my_number, mpi::minimum<int>(), 0);
+    }
+
+    return 0;
+  }
+
+The use of `mpi::minimum<int>` indicates that the minimum value
+should be computed. `mpi::minimum<int>` is a binary function object
+that compares its two parameters via `<` and returns the smaller
+value. Any associative binary function or function object will
+work, provided it is stateless. For instance, to concatenate strings
+with `reduce`, one could use the function object
+`std::plus<std::string>` (`string_cat.cpp`):
+
+  #include <boost/mpi.hpp>
+  #include <iostream>
+  #include <string>
+  #include <functional>
+  #include <boost/serialization/string.hpp>
+  namespace mpi = boost::mpi;
+
+  int main()
+  {
+    mpi::environment env;
+    mpi::communicator world;
+
+    std::string names[10] = { "zero ", "one ", "two ", "three ",
+                              "four ", "five ", "six ", "seven ",
+                              "eight ", "nine " };
+
+    std::string result;
+    reduce(world,
+           world.rank() < 10 ? names[world.rank()]
+                             : std::string("many "),
+           result, std::plus<std::string>(), 0);
+
+    if (world.rank() == 0)
+      std::cout << "The result is " << result << std::endl;
+
+    return 0;
+  }
+
+In this example, we compute a string for each process and then perform
+a reduction that concatenates all of the strings together into one
+long string. Executing this program with seven processes yields the
+following output:
+
+[pre
+The result is zero one two three four five six
+]
+
+[h4 Binary operations for reduce]
+Any kind of binary function object can be used with `reduce`. There
+are many such function objects in the C++ standard `<functional>`
+header and in the Boost.MPI header
+`<boost/mpi/operations.hpp>`, or you can create your own
+function object. Function objects used with `reduce` must be
+associative, i.e. `f(x, f(y, z))` must be equivalent to `f(f(x, y),
+z)`. If they are also commutative (i.e., `f(x, y) == f(y, x)`),
+Boost.MPI can use a more efficient implementation of `reduce`. To
+state that a function object is commutative, you will need to
+specialize the class [classref boost::mpi::is_commutative
+`is_commutative`]. For instance, we could modify the previous example
+by telling Boost.MPI that string concatenation is commutative:
+
+  namespace boost { namespace mpi {
+
+    template<>
+    struct is_commutative<std::plus<std::string>, std::string>
+      : mpl::true_ { };
+
+  } } // end namespace boost::mpi
+
+If this code is added prior to `main()`, Boost.MPI will assume that
+string concatenation is commutative and employ a different parallel
+algorithm for the `reduce` operation. Using this algorithm, the
+program outputs the following when run with seven processes:
+
+[pre
+The result is zero one four five six two three
+]
+
+Note how the numbers in the resulting string are in a different order:
+this is a direct result of Boost.MPI reordering operations. The result
+in this case differed from the non-commutative result because string
+concatenation is not commutative: `f("x", "y")` is not the same as
+`f("y", "x")`, because argument order matters. For truly commutative
+operations (e.g., integer addition), the more efficient commutative
+algorithm will produce the same result as the non-commutative
+algorithm. Boost.MPI also performs direct mappings from function
+objects in `<functional>` to `MPI_Op` values predefined by MPI (e.g.,
+`MPI_SUM`, `MPI_MAX`); if you have your own function objects that can
+take advantage of this mapping, see the class template [classref
+boost::mpi::is_mpi_op `is_mpi_op`].
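+
+As a minimal, illustrative sketch (using a hypothetical user-defined
+`int_product` function object and following the same specialization
+pattern as `is_commutative` above), such a mapping might look like this:
+
+  // Hypothetical function object that multiplies two ints.
+  struct int_product {
+    int operator()(int x, int y) const { return x * y; }
+  };
+
+  namespace boost { namespace mpi {
+
+    // Tell Boost.MPI that reducing ints with int_product maps directly
+    // onto the predefined MPI_PROD operation.
+    template<>
+    struct is_mpi_op<int_product, int>
+      : mpl::true_
+    {
+      static MPI_Op op() { return MPI_PROD; }
+    };
+
+  } } // end namespace boost::mpi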
+
+[warning Due to underlying MPI limitations, the operation must be stateless.]
+
+[h4 All process variant]
+
+Like [link mpi.tutorial.collectives.gather `gather`], `reduce` has an "all"
+variant called [funcref boost::mpi::all_reduce `all_reduce`]
+that performs the reduction operation and broadcasts the result to all
+processes. This variant is useful, for instance, in establishing
+global minimum or maximum values.
+
+The following code (`global_min.cpp`) shows a broadcasting version of
+the `random_min.cpp` example:
+
+  #include <boost/mpi.hpp>
+  #include <iostream>
+  #include <cstdlib>
+  namespace mpi = boost::mpi;
+
+  int main(int argc, char* argv[])
+  {
+    mpi::environment env(argc, argv);
+    mpi::communicator world;
+
+    std::srand(world.rank());
+    int my_number = std::rand();
+    int minimum;
+
+    mpi::all_reduce(world, my_number, minimum, mpi::minimum<int>());
+
+    if (world.rank() == 0) {
+      std::cout << "The minimum value is " << minimum << std::endl;
+    }
+
+    return 0;
+  }
+
+In the example above we provide both input and output values, requiring
+twice as much space, which can be a problem depending on the size
+of the transmitted data.
+If there is no need to preserve the input value, the output value
+can be omitted. In that case the input value will be overwritten with
+the output value, and Boost.MPI is able, in some situations, to implement
+the operation with a more space-efficient solution (using the `MPI_IN_PLACE`
+flag of the MPI C mapping), as in the following example (`in_place_global_min.cpp`):
+
+  #include <boost/mpi.hpp>
+  #include <iostream>
+  #include <cstdlib>
+  namespace mpi = boost::mpi;
+
+  int main(int argc, char* argv[])
+  {
+    mpi::environment env(argc, argv);
+    mpi::communicator world;
+
+    std::srand(world.rank());
+    int my_number = std::rand();
+
+    mpi::all_reduce(world, my_number, mpi::minimum<int>());
+
+    if (world.rank() == 0) {
+      std::cout << "The minimum value is " << my_number << std::endl;
+    }
+
+    return 0;
+  }
+
+
+[endsect:reduce]
+
+[endsect:collectives]