- Calculate the matrix coefficients (a, b, c) based on scene CoC for each row of the image.
- Solve the tridiagonal system for each row.
- Calculate the matrix coefficients (a, b, c) based on scene CoC for each column of the image.
- Solve the tridiagonal system for each column.
Calculating the coefficients
From the previous post one know that:
a = -s_{i-1}
b = 1 + s_{i-1} + s_i
c = -s_i
The index is used because the CoC changes between pixels.
And with some math explained here one know that:
s_i = CoC_i^2
Solving the system
Solving the system is the more complicated step. As you probably noticed, solving it in CPU with the TDMA algorithm is easy but this algorithm is serial, thus we have to find a more appropriated algorithm.
The algorithm I choose was the Cyclic Reduction. This algorithm, given a tridiagonal system, reduces the original system to a new one with half the variables. Therefore one can recursively reduce it until one reaches a system with fewer variables that can be solved fast. After that one can perform a back substitution in the system of n variables using the result from system n/2 until one reaches the final solution.
Supose one have the following system:
Supose one have the following system:
a_1x_1 + b_1x_2 + c_1x_3 = y_1
a_2x_2 + b_2x_3 + c_2x_4 = y_2
a_3x_3 + b_3x_4 + c_3x_5 = y_3
let:
k_1 = - \frac{a_2}{b_1}
k_2 = - \frac{c_2}{b_3}
Then multiply eq 1 by k1 , eq 3 by k2 and sum the three eqs:
(a_1k_1)x_1 + (c_1k_1 + b_2 + a_3k_2)x_3 + (c_3k_2)x_5 = y_1k_1 + y_2 + y_3 k_2
So now one have a reduced system also tridiagonal and with only odd indices. One can recursively apply this technique until reach a system with only one or two unknowns.
Thus the new coefficients are:
a_1^{'} = a_1 k_1
b_{1}^{'} = c_{1} k_{1} + b_{2} + a_{3} k_{2}
c_{1}^{'} = c_{3} k_{2}
y_{1}^{'} = y_{1} k_{1} + y_{2} + y_{3} k_{2}
Then after reducing and solving the odd indices one can perform the back substitution solving the even indices:
a_1x_1 + b_1x_2 + c_1x_3 = y_1
x_2 = \frac{(y_1 - a_1x_1 - c_1x_3)}{b_1}
With this algorithm one can reduce in parallel all odd indices and solve in parallel all even indices.
Notice that one can also reduce both odd and even indices at the same time, getting two systems with half unknowns, with this method in each step one doubles the number of systems and reduces the unknowns by half, this is called PCR (Parallel Cyclic Reduction).
So this algorithm is pretty simple but its implementation using fragment shaders was painful.
As we are solving a system with CR, any deviation in the values can cause a huge difference in the result because the error propagates in each step. Thus the use of texture coordinates to access the data lead to some problems:
- A small offset in the texcoord causes interpolation between neighborhood pixels leading to wrong values.
- We must treat access to out of bounds texture values.
- In back substitution stage we must 'process' only even pixels and copy odd ones.
Notice that the video quality isn't very good, it looks much better live.