max planck institut

informatik

informatik

**MPI-I-2000-1-003**. May** **2000, 56 pages. | Status:** **available - back from printing | Next --> Entry | Previous <-- Entry

Abstract in LaTeX format:

In this paper, we present a randomized, online, space-efficient

algorithm for the general class of programs with synchronization

variables (such programs are produced by parallel programming

languages, like, e.g., Cool, ID, Sisal, Mul-T, OLDEN and Jade).

The algorithm achieves good locality and low scheduling overheads

for this general class of computations, by combining work-stealing

and depth-first scheduling.

More specifically, given a computation with work $T_1$,

depth $T_\infty$ and $\sigma$ synchronizations that its

execution requires space $S_1$ on a single-processor

computer, our algorithm achieves expected space

complexity at most $S_1 + O(PT_\infty \log (PT_\infty))$

and runs in an expected number of

$O(T_1/P + \sigma \log (PT_\infty)/P + T_\infty \log (PT_\infty))$

timesteps on a shared-memory, parallel machine with $P$ processors.

Moreover, for any $\varepsilon > 0$, the space complexity of our

algorithm is at most $S_1 + O(P(T_\infty + \ln (1/\varepsilon))

\log (P(T_\infty + \ln(P(T_\infty + \ln (1/\varepsilon))/\varepsilon))))$

with probability at least $1-\varepsilon$. Thus, even for values

of $\varepsilon$ as small as $e^{-T_\infty}$, the space complexity

of our algorithm is at most $S_1 + O(PT_\infty \log(PT_\infty))$,

with probability at least $1-e^{-T_\infty}$. The algorithm achieves

good locality and low scheduling overheads by automatically

increasing the granularity of the work scheduled on each

processor.

Our results combine and extend previous algorithms and

analysis techniques (published by Blelloch et. al [6]

and by Narlikar [26]). Our algorithm not only exhibits the

same good space complexity for the general class of programs

with synchronization variables as its deterministic analog

presented in [6], but it also achieves good locality and

low scheduling overhead as the algorithm presented in [26],

which however performs well only for the more restricted class

of nested parallel computations.

Acknowledgement:** **

References to related material:

To download this research report, please select the type of document that fits best your needs. | Attachement Size(s): |
---|---|

787 KBytes | |

Please note: If you don't have a viewer for PostScript on your platform, try to install GhostScript and GhostView |