a given number of processors as quickly as possible when the
processing times of the tasks are variable and not known in advance.
The tasks are assigned to the processors in chunks consisting
of several tasks at a time, and the difficulty lies in finding
the optimal tradeoff between the processors' load balance, which
is favoured by having small chunks, and the total scheduling overhead,
which will be the lower the fewer chunks there are.
Our studies are motivated by a practical problem from
high-performance computing, namely parallel-loop scheduling,
for which a large variety of heuristics have been proposed
in the past, but hardly any rigorous analysis has been presented to date.