Rounding errors for expressions with floor/ceil funcitons

106 Views Asked by At

In parallel computations I frequently use the following distribution of data to processors: having a sequence of $n$ elements $x_1,\ldots,x_n$ and $P$ processors $p_1,\ldots,p_P$, processor $p_k$ handles the elements $x_{\mathit{first}(k)},\ldots,x_{\mathit{last}(k)}$, where

$$ \mathit{first}(k)=\left\lceil (k-1)\frac{n}{P}+\frac{1}{2} \right\rceil, \quad \mathit{last}(k)=\left\lceil k\frac{n}{P}-\frac{1}{2} \right\rceil. $$

This provide a balanced distribution such that each processor handles the same amortized number of elements. The element $x_i$ is then located in a local array of processor $p_{\mathit{proc}(i)}$ at position $\mathit{pos}(i)$, where

$$ \mathit{proc}(i)=\left\lfloor \left(i-\frac{1}{2}\right) \frac{P}{n} \right\rfloor + 1, \quad \mathit{pos}(i)=i-\mathit{first}(\mathit{proc}(i)). $$

My problem is that I am not sure whether the values of $\mathit{proc}$ and $\mathit{pos}$ functions calculated by floor/ceil functions within a computer program and thus containing rounding errors may not differ from their actual values? And also, I wonder how to provide an analysis of this problem. Is there any mathematical approach for such kind of analyses/proves?

1

There are 1 best solutions below

4
On BEST ANSWER

Basically what you are trying to do is separate $(x_n)$ into chunks of size $n/P$ (your assignment seems faulty as-is actually, the chunks will have length $P/n \to 0$ as $n\to\infty$. I suggest doing a high-precision computation of $$l := \lfloor n/P \rfloor$$ using a non floating-point algorithm of your choice and doing the same for $$r := n\ \mathrm{mod}\ P$$ Now you have two integers with $n = P\cdot l + r$ so the load can be determined by

globalIdx = 0;
for (i = 0; i < P; i++)
    l[i] = i < r ? l : l + 1;
    pStart[i] = globalIdx;
    pEnd[i] = globalIdx + l[i] - 1;
    globalIdx += l[i];
end

If $P$ also grows, we can run the algorithm in parallel on each processor in $\mathcal O(1)$. Only $l$ and $r$ need to be precomputed (feasible in $\mathcal O(\log n)$):

l[i] = i < r ? l : l + 1; // Determine my load
pStart[i] = i * l;
if (i > r) pStart[i] += i - r;
pEnd[i] = pStart[i] + l[i] - 1;