Parallel Constructs Documentation
These functions define the primitive operations used by the physics modules to exchange data in parallel. When using these functions, the physics modules should not have to make any MPI calls directly; all MPI communication should be encapsulated within the provided functions.
TODO: crossref to physics module documentation
The Types and Basic API section describes the SharedFaceData
datatype and the basic functions that operate on it. The Parallel Data Exchange section describes the functions used by the physics modules to start and finish parallel communication.
Types and Basic API
Utils.SharedFaceData
— Type.This type holds all the data necessary to perform MPI communication with a given peer process that shares mesh edges (2D) or faces (3D) with the current process.
Fields:
peernum: the MPI rank of the peer process
peeridx: the index of this peer in mesh.peer_parts
myrank: MPI rank of the current process
comm: MPI communicator used to define the above
q_send: the send buffer, a 3D array of n x m x d. While these dimensions
are arbitrary, there are two commonly used cases. If
opts["parallel_type"] == face, then m is mesh.numNodesPerFace and
d is the number of faces shared with peernum.
If opts["parallel_type"] == element, then
m = mesh.numNodesPerElement and d is the number of elements that
share faces with peernum.
q_recv: the receive buffer. Similar to q_send, except the size needs to
        be the number of entities on the *remote* process.
send_waited: has someone called MPI.Wait() on send_req yet? Some MPI
implementations complain if Wait() is called on a Request
more than once, so use this field to avoid doing so.
send_req: the MPI.Request object for the Send/Isend/whatever other type of
Send
send_status: the MPI.Status object returned by calling Wait() on send_req
recv_waited: like send_waited, but for the receive
recv_req: like send_req, but for the receive
recv_status: like send_status, but for the receive
bndries_local: Vector of Boundaries describing the faces from the local
side of the interface
bndries_remote: Vector of Boundaries describing the faces from the remote
side (see the documentation for PdePumiInterface before
using this field)
interfaces: Vector of Interfaces describing the faces from both sides (see
the documentation for PdePumiInterface, particularly the
mesh.shared_interfaces field, before using this field)
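For orientation, the field list above implies a layout along the following lines. This is a schematic sketch only: the concrete field types, and the Boundary/Interface element types, are assumptions based on the descriptions above, not the actual definition.

    mutable struct SharedFaceData{T}
      peernum::Int                      # MPI rank of the peer process
      peeridx::Int                      # index of the peer in mesh.peer_parts
      myrank::Int                       # MPI rank of the current process
      comm::MPI.Comm                    # communicator the ranks are defined on
      q_send::Array{T, 3}               # send buffer, n x m x d
      q_recv::Array{T, 3}               # receive buffer, sized for the remote entities
      send_waited::Bool                 # has MPI.Wait() been called on send_req?
      send_req::MPI.Request
      send_status::MPI.Status
      recv_waited::Bool                 # has MPI.Wait() been called on recv_req?
      recv_req::MPI.Request
      recv_status::MPI.Status
      bndries_local::Vector{Boundary}   # local side of the shared faces
      bndries_remote::Vector{Boundary}  # remote side of the shared faces
      interfaces::Vector{Interface}     # both sides of the shared faces
    end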
Utils.SharedFaceData
— Method.Outer constructor for SharedFaceData.
Inputs:
mesh: a mesh object
peeridx: the index of a peer in mesh.peer_parts
q_send: the send buffer
q_recv: the receive buffer
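A minimal construction sketch, assuming q_send and q_recv have already been allocated with the dimensions described in the field list above:

    i = 1                                          # index of the peer in mesh.peer_parts
    data = SharedFaceData(mesh, i, q_send, q_recv)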
Utils.getSharedFaceData
— Method.This function returns a vector of SharedFaceData objects, one for each peer process the current process shares mesh edges (2D) or faces (3D) with. This function is intended to be used by the AbstractSolutionData constructors, although it can be used to create additional vectors of SharedFaceData objects.
If opts["parallel_data"] == "face", then the send and receive buffers are numDofPerNode x numNodesPerFace x number of shared faces.
If opts["parallel_data"] == "element", the send and receive buffers are numDofPerNode x numNodesPerElement x number of elements that share the faces. Note that the number of elements that share the faces can be different for the send and receive buffers.
Inputs:
Tsol: element type of the arrays
mesh: an AbstractMesh object
sbp: an SBP operator
opts: the options dictionary
Outputs:
data_vec: Vector{SharedFaceData}. data_vec[i] corresponds to
mesh.peer_parts[i]
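A usage sketch, as it might appear in an AbstractSolutionData constructor (the assignment to eqn.shared_data mirrors the field used by startSolutionExchange below):

    # one SharedFaceData per entry of mesh.peer_parts
    eqn.shared_data = getSharedFaceData(Tsol, mesh, sbp, opts)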
Base.copy!
— Method.In-place copy for SharedFaceData. This copies the buffers, but does not retain the state of the Request and Status fields. Instead they are initialized the same as the constructor.
This function may only be called after receiving is complete, otherwise an exception is thrown.
Base.copy
— Method.Copy function for SharedFaceData. Note that this does not retain the send_req/send_status (and similarly for the receive) state of the original object. Instead, they are initialized the same as the constructor.
This function may only be called after receiving is complete, otherwise an exception is thrown.
Utils.assertReceivesConsistent
— Method.Like assertSendsConsistent, but for the receives
Utils.assertReceivesWaited
— Method.This function verifies all the receives have been waited on for the supplied SharedFaceData objects
Utils.assertSendsConsistent
— Method.Verify either all or none of the sends have been waited on. Throw an exception otherwise.
Inputs:
shared_data: Vector of SharedFaceData objects
Output:
val: number of sends that have been waited on
Utils.waitAllReceives
— Method.This function is like MPI.Waitall, operating on the recvs of a vector of SharedFaceData objects
Utils.waitAllSends
— Method.This function is like MPI.Waitall, operating on the sends of a vector of SharedFaceData objects
Utils.waitAnyReceive
— Method.Like MPI.WaitAny, but operates on the receives of a vector of SharedFaceData. Only the index of the Request that was waited on is returned; the Status and recv_waited fields of the SharedFaceData are updated internally
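Taken together, these helpers support patterns like the following sketch, where shared_data is assumed to hold in-flight sends and receives and process() is a hypothetical consumer of the received data:

    assertSendsConsistent(shared_data)   # all-or-none check on the sends
    waitAllSends(shared_data)            # block until every send buffer is reusable
    idx = waitAnyReceive(shared_data)    # block until some peer's data has arrived
    process(shared_data[idx])            # hypothetical function that reads q_recv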
Parallel Data Exchange
The functions in this section are used to start sending data in parallel and finish receiving it. All functions operate on a Vector of SharedFaceData objects that define what data to send to which peer processes. See Parallel Overview for a high-level overview of how the code is parallelized.
Sending the data to the other processes is straightforward. Receiving it (efficiently) is not. In particular, finishExchangeData waits to receive data from one peer process, calls a user-supplied callback function to do calculations involving the received data, and then waits for the next receive to finish. This is significantly more efficient than waiting for all receives to finish and then doing computations on all the data.
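Conceptually, the receive side behaves like the following loop. This is an illustrative sketch of the idea, not the actual implementation of finishExchangeData:

    for i = 1:length(shared_data)
      idx = waitAnyReceive(shared_data)                  # wait for whichever peer finishes first
      calc_func(mesh, sbp, eqn, opts, shared_data[idx])  # do calculations on that peer's data immediately
    end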
This section describes the API the physics modules use to do parallel communication. The Internals section describes the helper functions used in the implementation.
Utils.startSolutionExchange
— Function.This function is a thin wrapper around exchangeData(). It is used for the common case of sending and receiving the solution variables to other processes. It uses eqn.shared_data to do the parallel communication. eqn.shared_data must be passed into the corresponding finishExchangeData call.
Inputs:
mesh: an AbstractMesh
sbp: an SBP operator
eqn: an AbstractSolutionData
opts: options dictionary
Keyword arguments:
tag: MPI tag to use for communication, defaults to TAG_DEFAULT
wait: wait for sends and receives to finish before exiting
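For example (the wait=true form is a debugging aid only, as noted above):

    startSolutionExchange(mesh, sbp, eqn, opts)              # non-blocking, default tag
    startSolutionExchange(mesh, sbp, eqn, opts; wait=true)   # debugging: block until all communication finishes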
Utils.exchangeData
— Function.This function posts the MPI sends and receives for a vector of SharedFaceData. It works for both opts["parallel_data"] == "face" and opts["parallel_data"] == "element"; the only difference between the two cases is the populate_buffer() function.
The previous receives using these SharedFaceData objects should have completed by the time this function is called. An exception is thrown if this is not the case.
The previous sends are likely to have completed by the time this function is called, but they are waited on if not. This function might not perform well if the previous sends have not completed. #TODO: fix this using WaitAny
Inputs:
mesh: an AbstractMesh
sbp: an SBPOperator
eqn: an AbstractSolutionData
opts: the options dictionary
populate_buffer: function with the signature populate_buffer(mesh, sbp, eqn, opts, data::SharedFaceData) that populates data.q_send
Inputs/Outputs:
shared_data: vector of SharedFaceData objects representing the parallel communication to be done
Keyword Arguments:
tag: MPI tag to use for this communication, defaults to TAG_DEFAULT. This tag is typically used by the communication of the solution variables to other processes, so other users of this function should provide their own tag.
wait: wait for the sends and receives to finish before returning. This is a debugging option only; it will kill parallel performance.
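A sketch of calling exchangeData with a custom buffer-filling function and a non-default tag. MY_TAG and populateMyBuffer are hypothetical names, and the exact argument order is an assumption based on the lists above:

    const MY_TAG = 42   # hypothetical tag, distinct from TAG_DEFAULT

    function populateMyBuffer(mesh, sbp, eqn, opts, data::SharedFaceData)
      # copy whatever values should be sent to this peer into data.q_send
      fill!(data.q_send, 0.0)   # placeholder computation
      return nothing
    end

    exchangeData(mesh, sbp, eqn, opts, shared_data, populateMyBuffer; tag=MY_TAG)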
Utils.finishExchangeData
— Function.This is the counterpart of exchangeData. This function finishes the receives started in exchangeData.
This function (efficiently) waits for a receive to finish and calls a function to do calculations on that data. If opts["parallel_data"] == "face", it also permutes the data in the receive buffers to agree with the ordering of elementL. For opts["parallel_data"] == "element", users should call SummationByParts.interiorFaceInterpolate to interpolate the data to the face while ensuring proper permutation.
Inputs:
mesh: an AbstractMesh
sbp: an SBPOperator
eqn: an AbstractSolutionData
opts: the options dictionary
calc_func: function that does calculations for a set of shared faces described by a single SharedFaceData. It must have the signature calc_func(mesh, sbp, eqn, opts, data::SharedFaceData)
Inputs/Outputs:
shared_data: vector of SharedFaceData, one for each peer process that needs to be communicated with. By the time calc_func is called, the SharedFaceData passed to it has its q_recv field populated. See note above about data permutation.
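A sketch of a callback and the corresponding call. calcSharedFaceIntegrals is a hypothetical name, and the argument order is an assumption based on the lists above:

    function calcSharedFaceIntegrals(mesh, sbp, eqn, opts, data::SharedFaceData)
      # data.q_recv now holds the peer's data; data.interfaces (or
      # data.bndries_local) describes which faces it belongs to
      # ... compute the shared face contributions for this peer here ...
      return nothing
    end

    finishExchangeData(mesh, sbp, eqn, opts, eqn.shared_data, calcSharedFaceIntegrals)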
Utils.TAG_DEFAULT
— Constant.Default MPI tag used for sending and receiving solution variables.
Internals
These helper functions are used by the functions in Parallel Data Exchange.
Utils.verifyReceiveCommunication
— Function.Utils.verifyCommunication
This function checks the data provided by the Status object to verify a communication completed successfully. The sender's rank and the number of elements are checked against the expected sender and the buffer size.
Inputs:
data: a SharedFaceData
Utils.getSendDataFace
— Function.This function populates the send buffer from eqn.q for opts["parallel_data"] == "face"
Inputs:
mesh: a mesh
sbp: an SBP operator
eqn: an AbstractSolutionData
opts: options dictionary
Inputs/Outputs:
data: a SharedFaceData. data.q_send will be overwritten
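Conceptually, the face case packs the buffer along the following lines. This is a sketch only: getFaceNodeIndex is a hypothetical helper standing in for the mesh's actual face-node indexing, and the Boundary field access and loop structure are assumptions.

    for (j, bndry) in enumerate(data.bndries_local)   # each shared face on the local side
      for k = 1:mesh.numNodesPerFace
        v = getFaceNodeIndex(mesh, bndry, k)          # hypothetical: volume node index of face node k
        data.q_send[:, k, j] = eqn.q[:, v, bndry.element]
      end
    end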
Utils.getSendDataElement
— Function.This function populates the send buffer from eqn.q for opts["parallel_data"] == "element"
Inputs:
mesh: a mesh
sbp: an SBP operator
eqn: an AbstractSolutionData
opts: options dictionary
Inputs/Outputs:
data: a SharedFaceData. data.q_send will be overwritten
Utils.@mpi_master
— Macro.Utils.mpi_master
This macro introduces an if statement that causes the expression to be executed only if the variable myrank is equal to zero. myrank must exist in the scope of the caller
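For example (the myrank binding shown here is one way to satisfy the requirement; any in-scope variable named myrank works):

    myrank = MPI.Comm_rank(MPI.COMM_WORLD)
    @mpi_master println("starting solution exchange")   # executes only on rank 0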
Utils.@time_all
— Macro.Utils.time_all
This macro returns the value produced by the expression as well as the execution time, the GC time, and the amount of memory allocated
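For example (the exact ordering of the returned values is inferred from the description above and should be treated as an assumption; calcResidual is a hypothetical stand-in for any expression):

    val, elapsed_time, gc_time, bytes_allocated = @time_all calcResidual(mesh, sbp, eqn, opts)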