Monday, August 19, 2013

Implementing the Diffusion DoF

As explained previously, the diffusion DoF algorithm can be separated into the following steps:

  • Calculate the matrix coefficients (a, b, c) based on scene CoC for each row of the image.
  • Solve the tridiagonal system for each row.
  • Calculate the matrix coefficients (a, b, c) based on scene CoC for each column of the image.
  • Solve the tridiagonal system for each column.

Calculating the coefficients

From the previous post we know that:

a = -s_{i-1}
b = 1 + s_{i-1} + s_i
c = -s_i

The index is used because the CoC changes between pixels.
With some math, explained here, we know that:

s_i = CoC_i^2
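As a concrete sketch of this coefficient step, here is how one scanline's (a, b, c) could be computed on the CPU. The zero-diffusion clamp at the image borders is my assumption, and the function name is illustrative:

```python
def build_coefficients(coc):
    """Build the tridiagonal coefficients (a, b, c) for one scanline,
    with s_i = CoC_i^2 as above. The s terms that would reference a
    pixel outside the row are clamped to zero (no diffusion across the
    border), so each row of the matrix sums to 1."""
    n = len(coc)
    s = [v * v for v in coc]
    a, b, c = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        s_prev = s[i - 1] if i > 0 else 0.0   # a_i = -s_{i-1}
        s_cur = s[i] if i < n - 1 else 0.0    # c_i = -s_i
        a[i] = -s_prev
        b[i] = 1.0 + s_prev + s_cur
        c[i] = -s_cur
    return a, b, c
```

In the real effect this runs per pixel in a shader; the CPU version is just a reference for the math.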

Solving the system

Solving the system is the most complicated step. As you probably noticed, solving it on the CPU with the TDMA algorithm is easy, but that algorithm is serial, so we have to find a more appropriate one.
The algorithm I chose was Cyclic Reduction. Given a tridiagonal system, this algorithm reduces the original system to a new one with half the variables. Therefore we can reduce it recursively until we reach a system with few enough variables to be solved fast. After that we can perform a back substitution in the system of n variables using the result from the system of n/2 variables, until we reach the final solution.

Suppose we have the following system:

a_1x_1 + b_1x_2 + c_1x_3 = y_1
a_2x_2 + b_2x_3 + c_2x_4 = y_2
a_3x_3 + b_3x_4 + c_3x_5 = y_3

Defining:

k_1 = - \frac{a_2}{b_1}
k_2 = - \frac{c_2}{b_3}

Then multiply eq. 1 by k_1, eq. 3 by k_2, and sum the three equations:

(a_1k_1)x_1 + (c_1k_1 + b_2 + a_3k_2)x_3 + (c_3k_2)x_5 = y_1k_1 + y_2 + y_3 k_2

So now we have a reduced system, which is also tridiagonal and involves only the odd indices. We can apply this technique recursively until we reach a system with only one or two unknowns.
The new coefficients are:

a_1^{'} = a_1 k_1
b_{1}^{'} = c_{1} k_{1} + b_{2} + a_{3} k_{2}
c_{1}^{'} = c_{3} k_{2}
y_{1}^{'} = y_{1} k_{1} + y_{2} + y_{3} k_{2}

Then, after reducing and solving for the odd indices, we can perform the back substitution to solve for the even indices:

a_1x_1 + b_1x_2 + c_1x_3 = y_1
x_2 = \frac{(y_1 - a_1x_1 - c_1x_3)}{b_1}

With this algorithm we can reduce all odd indices in parallel and solve all even indices in parallel.
Notice that we can also reduce both odd and even indices at the same time, getting two systems with half the unknowns. With this method each step doubles the number of systems and halves the number of unknowns; this is called PCR (Parallel Cyclic Reduction).
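As a sketch of the PCR idea, here is a CPU reference (not the shader code): at each step every equation i is combined with equations i-d and i+d, and the stride d doubles. Treating out-of-range neighbors as identity rows is my assumption for handling the borders:

```python
def pcr_solve(a, b, c, y):
    """Parallel Cyclic Reduction sketch. Each step eliminates the
    off-diagonal unknowns of every equation using its neighbors at
    distance d; out-of-range neighbors are treated as identity rows
    (a = c = 0, b = 1, y = 0)."""
    n = len(b)
    a, b, c, y = list(a), list(b), list(c), list(y)
    d = 1
    while d < n:
        na, nb, nc, ny = a[:], b[:], c[:], y[:]
        for i in range(n):
            lo, hi = i - d, i + d
            # neighbor rows (identity if out of range)
            bl = b[lo] if lo >= 0 else 1.0
            al = a[lo] if lo >= 0 else 0.0
            cl = c[lo] if lo >= 0 else 0.0
            yl = y[lo] if lo >= 0 else 0.0
            bh = b[hi] if hi < n else 1.0
            ah = a[hi] if hi < n else 0.0
            ch = c[hi] if hi < n else 0.0
            yh = y[hi] if hi < n else 0.0
            k1 = -a[i] / bl
            k2 = -c[i] / bh
            na[i] = k1 * al
            nb[i] = b[i] + k1 * cl + k2 * ah
            nc[i] = k2 * ch
            ny[i] = y[i] + k1 * yl + k2 * yh
        a, b, c, y = na, nb, nc, ny
        d *= 2
    # after ceil(log2 n) steps every equation is decoupled
    return [y[i] / b[i] for i in range(n)]
```

On the GPU each equation maps to a pixel and each step to a render pass, since all equations of a step are independent.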

So the algorithm itself is pretty simple, but its implementation using fragment shaders was painful.
As we are solving a system with CR, any deviation in the values can cause a huge difference in the result, because the error propagates at each step. Thus using texture coordinates to access the data led to some problems:
  • A small offset in the texcoord causes interpolation between neighboring pixels, leading to wrong values.
  • We must handle accesses to out-of-bounds texture values.
  • In the back-substitution stage we must process only the even pixels and simply copy the odd ones.
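For the first of those problems, the usual fix is to sample exactly at texel centers with a half-texel offset; a tiny sketch (the helper name is mine), which also clamps the index to deal with out-of-bounds accesses:

```python
def texel_center(i, size):
    # Texture coordinate of the center of texel i in a texture of the
    # given size; sampling exactly here avoids any blending with
    # neighboring texels. The index is clamped for out-of-bounds reads.
    i = max(0, min(size - 1, i))
    return (i + 0.5) / size
```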
On top of that, when writing code one mistake or another always slips through, and you can't easily debug shaders, or even notice a small deviation in pixel colors (in this case, in the problem variables). Summing all that up, it took a whole week and a few days to get everything working properly. But in the end it was rewarding to see the result.

Notice that the video quality isn't very good; it looks much better live.

Friday, August 9, 2013

DoF techniques

This week I started to research DoF (Depth of Field) techniques. Besides the classical Gaussian blur, I found two interesting techniques: the hexagonal DoF with bokeh from Frostbite 2 and the Diffusion DoF used in Metro 2033.
For the first, I was able to implement the hexagonal blur, but I haven't yet figured out how to compose the final image from the blurred image and the per-pixel CoC (Circle of Confusion). Simply interpolating between the original image and the blurred one based on the CoC didn't look good, and there is also bleeding between in-focus and out-of-focus pixel colors. Here are some images (notice that I'm faking HDR to make bright spots more perceptible):

Diffusion DoF

On the other hand, the Diffusion DoF elegantly uses the heat diffusion equation to implement the DoF effect. This method treats the image as a medium, with each pixel's color being its heat. Taking the pixel's CoC as its heat conductivity, we can simulate heat diffusion "spreading" the color among the pixels. The heat equation is:

\frac{\partial u}{\partial t} = \beta \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)

Using the Alternating Direction Implicit (ADI) method we alternate directions (x, y), solving two one-dimensional problems at each time step. This 2D ADI method is unconditionally stable, which means we can use any time step we want.
Simplifying the notation, we have:

u_j^k = u_j^{k+1} - \frac{\beta \Delta t}{\Delta x^2}\left( u_{j+1}^{k+1} - 2u_j^{k+1} + u_{j-1}^{k+1}\right)
u_j^k = \left(1+2s\right)u_j^{k+1} - s\left(u_{j+1}^{k+1} + u_{j-1}^{k+1}\right)
where s = \frac{\beta \Delta t}{\Delta x^2}, and u_{j}^{k} is the color at position j at time k.
As the new value of u at time k+1 depends on its adjacent values, we need to solve a system where:
\left( \begin{array}{cccccccc} b & c & & & & & & 0 \\ a & b & c & & & & & \\ & & & & & & & \\ & & & (...) & & & & \\ & & & & & & & \\ & & & & a & b & c & \\ & & & & & a & b & c\\ 0 & & & & & & a & b\\ \end{array} \right) \times \left( \begin{array}{c} u_0^{k+1} \\ u_1^{k+1} \\ \\ (...) \\ \\ \\ u_{n-1}^{k+1} \\ u_n^{k+1} \\ \end{array} \right) = \left( \begin{array}{c} u_0^k \\ u_1^{k} \\ \\ (...) \\ \\ \\ u_{n-1}^{k} \\ u_n^{k} \\ \end{array} \right)
a = -s
b = 1 + 2s
c = -s

For this kind of matrix (tridiagonal) we have fast methods to solve the system; in my CPU implementation I used TDMA (also known as the Thomas algorithm).
Therefore we first calculate the diffusion in the x direction and then in the y direction. Notice that for each row/column we need to solve one system.
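A minimal CPU reference of TDMA, as a sketch; note that both loops carry a dependency on the previous element, which is exactly why this algorithm doesn't map well to fragment shaders:

```python
def thomas_solve(a, b, c, y):
    """Solve a tridiagonal system in O(n): forward elimination
    followed by back substitution. a[0] and c[n-1] are unused."""
    n = len(b)
    cp, yp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    yp[0] = y[0] / b[0]
    for i in range(1, n):                 # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        yp[i] = (y[i] - a[i] * yp[i - 1]) / m
    x = [0.0] * n
    x[n - 1] = yp[n - 1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = yp[i] - cp[i] * x[i + 1]
    return x
```

For the diffusion DoF, this would be called once per row in the x pass and once per column in the y pass.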

Before implementing it in CrystalSpace, I first implemented it on the CPU to have a reference; here are the results:

Original image

CoC image

First step (diffusion in x direction)

Second step (diffusion in y direction)

You can download the code here.

Saturday, July 27, 2013

Finalizing the HBAO

This week I finished fully implementing the HBAO effect; the effect XML file follows:
    <layer name="hbao" shader="/shader/postproc/HBAO/HBAO.xml" downsample="0">
        <parameter name="angle bias" type="float">0.52359877</parameter>
        <parameter name="sqr radius" type="float">0.01</parameter>
        <parameter name="inv radius" type="float">10</parameter>
        <parameter name="radius" type="float">0.1</parameter>
        <parameter name="num steps" type="float">25</parameter>
        <parameter name="num directions" type="float">32</parameter>
        <parameter name="contrast" type="float">2.3</parameter>
        <parameter name="attenuation" type="float">1.0</parameter>

        <input source="/data/posteffects/CosSinJitter.bmp" texname="tex csj"/>
        <output name="out" format="argb8" />
    </layer>

    <layer name="blurX" shader="/shader/postproc/HBAO/BilateralBlurX.xml">
        <parameter name="blur radius" type="float">7</parameter>
        <parameter name="blur falloff" type="float">0.010204081632</parameter> <!-- 1 / (2*radius*radius) -->
        <parameter name="sharpness" type="float">100000</parameter>

        <input layer="hbao.out" texname="tex diffuse"/>
        <output name="out" format="argb8" />
    </layer>

    <layer name="blurY" shader="/shader/postproc/HBAO/BilateralBlurY.xml">
        <parameter name="blur radius" type="float">7</parameter>
        <parameter name="blur falloff" type="float">0.010204081632</parameter> <!-- 1 / (2*radius*radius) -->
        <parameter name="sharpness" type="float">100000</parameter>

        <input layer="blurX.out" texname="tex diffuse"/>
        <output name="out" format="argb8" />
    </layer>

    <layer name="combine" shader="/shader/postproc/HBAO/combine.xml">
        <input layer="*screen" texname="tex diffuse"/>
        <input layer="blurY.out" texname="tex ao"/>
    </layer>
So, it contains 4 layers:

  • The HBAO - the main layer. As the original effect uses a (DX10) buffer with precomputed random (cosine, sine, jitter) values, I baked them into a texture. Among the parameters we have the radius (within which the texture will be sampled), the number of directions, and how many steps will be calculated in each direction.
  • Blur X and Y - the bilateral blur, that is, a Gaussian blur weighted by the depth difference between samples; the sharpness parameter controls how much that difference affects the weight.
  • Combine - this layer just combines the AO with the color. It could be done directly in the BlurY pass, but I'm keeping it for now to make debugging/testing easier.
Now I will work on properly hooking the depth buffer so that the effects can use it. I think there is some bug related to FBOs on Intel GPUs: when I try to run the deferred RM on one, the RM binds a D24S8 buffer and then ShadowPSSM tries to bind a D32 buffer, causing an invalid-attachment error; but if I change the first to D32 it works fine. Maybe this bug was causing the invalid attachment when I tried to hook the depth buffer with a D24S8 format.
Another strange thing is that this same Intel GPU supports DX10, but in the HBAO shader, if I use "for" loops it says I can't use "for", and if I remove the "for" (unrolling it) it says I exceeded the maximum number of temporary registers! Yet the NVIDIA demo, with almost the same shader, runs fine.

Sunday, July 21, 2013

Implementing HBAO

After improving the error and warning notifications I started to implement HBAO, based on the NVIDIA HBAO demo. I think this ambient occlusion achieves better visual quality than the standard SSAO, and it can also be implemented without having to store the normals, thus being usable beyond the deferred RM.
Here is a small video showing the effect:

As there are various effects that use the depth buffer, I now have to find an elegant way to hook the depth buffer and provide it to the effects without affecting the RM's drawing. I have also thought about providing some kind of depth-buffer usage capability to post-effects.

Friday, July 12, 2013

Post-Effect XML Parser

This week I made some enhancements to the PostEffectLayersParser. An effect can be described by the following structure:

    <layer name="name" shader="/shader.xml" downsample="0" mipmap="false" maxmipmap="-1">
        <input layer="" texname="tex diffuse" texcoord="texture coordinate 0" pixelsize="pixel size" />
        <input source="/tex.png" texname="a texture" />

        <output name="out1" format="argb16_f" reusable="true" />
        <output ... />
        <parameter name="param_name" type="vector2">1,1</parameter>
        <parameter (...)>(...)</parameter>
    </layer>
Based on that structure we have:
An effect can have multiple layers, and each layer must specify at least its name and the shader used. Optionally, its downsample, mipmap and maxmipmap values can also be specified.
Each layer can have one or more inputs and one or more outputs.
In the input attributes we have:

  • layer - specifies which layer's output will be used as input. If omitted, the first output of the previous layer is used (if this is the first layer, the input texture is used instead). The format is <layername>.<outputname>; if <outputname> is omitted, the first output of <layername> is used.
  • texname - shader variable name used to access this texture; if omitted, the default is "tex diffuse".
  • texcoord - shader variable name used to access the texture coordinate; if omitted, the default is "texture coordinate 0".
  • pixelsize - shader variable name used to access the pixel size of this texture; if omitted, no shader variable is provided to the shader.
  • source - specifies the path to the texture used as input.

Notice that layer and source are mutually exclusive; if both attributes are provided, layer takes precedence over source. If no input tag is provided, one with all default values is used.

In the output tag can be specified:

  • name - the output name, used to access this output as an input for other layers; if omitted, this output will not be accessible unless it is the first output.
  • format - the output texture format; if omitted, the default is "argb8".
  • reusable - if false, prevents the reuse of this output; if omitted, the default is "true".
  • mipmap - enables mipmap generation, overriding the value specified in the layer tag; optional.
  • maxmipmap - maximum mipmap level to generate, overriding the value specified in the layer tag; optional.
If the output tag is omitted, an output with all default values is used.

The parameter tag is used to provide default shader variable values for the post effect.

Sunday, July 7, 2013

Refactoring the PostEffect code

After porting the old postprocessing code I started to refactor it.
Starting with the PostEffectSupport class, I first removed some functions that no longer make sense. As PostEffectSupport contains an array of post effects, not layers, the functions ClearLayers, AddLayers* and ClearIntermediates were removed. I added the function SetPostEffectEnable so that postprocessing can be enabled/disabled at any time.

In the iPostEffect interface I removed the function ClearIntermediates, because all texture management was moved to PostEffectManager. I also added the functions LoadFromFile and Construct, which, respectively, load the effect from an external XML file and set up the effect (which includes allocating textures, linking layers and setting up shader variables).

In this process I created some helper classes: DependencySolver and PriorityShaderVariableContext.
The first, given a graph of layers with edges linking one layer's output to another layer's input, calculates the minimum number of textures needed to draw the effect and assigns each texture to its respective layer.
The second stacks up shader variable contexts, each with a priority, so that when pushing the variables to the svStack the highest priority dominates over the lower ones. This way we can cleanly have one svContext for default variables and one (or several) for user-defined variables.

For the PostEffectManager I added 3 new functions:

  • RequestTexture
  • SetupView
  • RemoveUnusedTextures

Which respectively:

  • Asks the manager to create a texture, or reuse a previously allocated one, based on the allocation info and the given number (identifier).
  • Sets up the view; if the resolution has changed, it releases all textures so that PostEffectSupport forces the effects to reacquire them.
  • We keep an array of allocated textures (csRef); when a texture's reference count drops to 1, it is no longer used by any post effect and we can safely remove it. This function isn't implemented yet because I'm having some trouble with the CS iterators; I think using std would be much easier.

And finally, the HDR code is BROKEN!
I had already noticed that the HDR no longer worked, but now, given all these changes, the code is really broken and I will have to devote some time to fixing it. This is both good and bad: it consumes time, but I think in the end the HDR code will be simpler, more versatile and, of course, will work properly.

Wednesday, June 26, 2013

Crystal Space Postprocessing Requirements & Design

The postprocessing requirements are:
  • a posteffect can have multiple passes (layers)
  • each pass can use multiple inputs
  • a pass's inputs can be other layers' outputs or custom textures
  • each pass can have multiple outputs (render targets)
  • output textures can have custom format, downsample and mipmap values
  • the number of output textures should be minimal
  • output textures can be reused unless explicitly specified otherwise
  • a posteffect can have default shader parameters
  • it should be data-driven, that is, fully set up using the posteffect XML files


As part of these requirements was already inherited from the previous code, I'm making small changes to the codebase to fit the rest.

Texture usage optimization
I started working on optimizing texture usage by sharing common textures between layers. The old method (using ping-pong textures) would not work properly; imagine the case where we have 3 layers:

Layer 2 uses the output from layer 1, and layer 3 uses both outputs (1 and 2).

In such a case we clearly realize that we can't reuse the output of layer 1 until layer 3 finishes its work; the same holds for layer 2.

From this example we conclude that:
- None of the inputs of layer i can be reused until layer i is drawn
- The outputs of layer i can't be shared among themselves (obviously)

For that problem I wrote a simple algorithm that resolves the needed texture usage (pseudocode):

ResolveDependency( layers )
    avail_tex : list
    used_tex : list

    // the outputs of the last layer are always kept
    mark each output of the last layer as used

    for i = last layer to first layer
        // move textures whose assigned layer number > i
        // from used_tex to avail_tex
        for each input of layer i
            // find a texture that matches the properties of
            // the layer output referenced by this input
            tex = avail_tex.find(format, mipmap, downsample)

            if tex = not_found then
                tex = CreateNewEntry(format, mipmap, downsample)

            assign tex to layer i
Of course, this is only a shallow overview of the implementation details.
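To make the idea more concrete, here is a simplified model of the same greedy reuse (one output per layer; the data layout and names are mine, not the actual CrystalSpace code): a layer's texture is freed once its last reader has been drawn, and is reused whenever the allocation properties match:

```python
def assign_textures(layers):
    """layers[i] = {"inputs": [j, ...], "props": ...}, where each j is
    the index of an earlier layer whose output layer i reads, and
    "props" stands for (format, mipmap, downsample). Returns the number
    of textures allocated and the texture id assigned to each layer."""
    n = len(layers)
    # a layer's output must live until its last reader is drawn
    last_read = list(range(n))
    for i, layer in enumerate(layers):
        for j in layer["inputs"]:
            last_read[j] = max(last_read[j], i)
    free = []                 # (props, tex_id) pairs available for reuse
    assignment = [None] * n
    num_tex = 0
    for i, layer in enumerate(layers):
        # reuse a free texture with matching properties, else allocate
        match = next((k for k, (p, _) in enumerate(free)
                      if p == layer["props"]), None)
        if match is not None:
            assignment[i] = free.pop(match)[1]
        else:
            assignment[i] = num_tex
            num_tex += 1
        # release every texture whose last reader is this layer
        for j in range(i + 1):
            if last_read[j] == i:
                free.append((layers[j]["props"], assignment[j]))
    return num_tex, assignment
```

On the 3-layer example above, all three outputs need distinct textures, exactly as argued; a hypothetical fourth layer reading only layer 3's output could then reuse the first one.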

I also made some minor changes to some structs related to the layer options. I will list the structs first and then explain them.

struct TextureAllocationInfo
{
    bool mipmap;
    int maxMipmap;
    int downsample;
    bool reusable;
    csString format;
    bool operator==(const TextureAllocationInfo& other);
};
As the name suggests, this struct defines the properties of a layer's outputs; it will also be used as an input parameter by a texture cache that will be implemented to share textures (RTs) between posteffects.

struct PostEffectLayerOptions
{
    TextureAllocationInfo info;
    csRef<iTextureHandle> renderTarget;
    csRect targetRect;
    csString name;
    bool operator==(const PostEffectLayerOptions& other);
};
This struct defines the full output options: the output's name and, if desired, a manual render target.

enum LayerInputType { AUTO, STATIC, MANUAL };
struct PostEffectLayerInputMap
{
    LayerInputType type;
    csRef<iTextureHandle> inputTexture;
    csString sourceName;
    csString svTextureName;
    csString svTexcoordName;
    csString svPixelSizeName;
    csRect sourceRect;
    PostEffectLayerInputMap () : type (AUTO),
        svTextureName ("tex diffuse"),
        svTexcoordName ("texture coordinate 0") {}
};
The input map had small but important changes. First, it now has a type that defines whether it references a layer output (AUTO), a texture resource (STATIC) or is manually set up (MANUAL). Depending on the type, the sourceName variable can be "layername.outputname", so that we can link layer inputs/outputs, or a path like "/data/lookuptex.png", for the AUTO and STATIC types respectively. For MANUAL textures, sourceName is unused.

Last but not least:
struct LayerDesc
{
    csArray<PostEffectLayerInputMap> inputs;
    csArray<PostEffectLayerOptions> outputs;
    csRef<iShader> layerShader;
    csString name;

    LayerDesc () {}
    LayerDesc (iShader* shader);
    LayerDesc (iShader* shader, const char* layerName);
    LayerDesc (iShader* shader, PostEffectLayerInputMap& inp);
    LayerDesc (iShader* shader, PostEffectLayerInputMap& inp,
               PostEffectLayerOptions& opt);
    LayerDesc (iShader* shader,
               csArray<PostEffectLayerInputMap>& inp,
               PostEffectLayerOptions& opt);
    LayerDesc (iShader* shader, PostEffectLayerOptions& opt);
};
The layer descriptor contains all the parameters needed to create and set up a layer and, of course, a lot of constructors to easily create one.

The next steps will be to create a texture cache from which all posteffects can request textures, to link layer inputs to outputs, and to create a loader for STATIC-type inputs.