Table of Contents

Recap

In the last post, we looked a variety of different kernels that show up in image processing. One of the concepts mentioned in passing was the idea of separating a kernel into two vectors. This was particularly clear when constructing the Gaussian blur kernel — in fact, the blur kernel was derived by multiplying vectors.

Matrices that can be decomposed into the product of a column and row vector are referred to as separable. As we will see in the following section, the property can be exploited to improve convolution performance.

Separable Kernels

An mxn kernel is said to be separable if there exists a pair of vectors with dimensions mx1 and 1xn such that the product of the vectors is equal to original kernel matrix. Formally, where K is a kernel matrix, u is a column vector and v is a row vector:

$$K = u * v$$

This is equivalent to convolving u with v; for this discussion the x symbol will be used to indicate convolution, and * will be used for standard scalar/matrix multiplication. This gives us the following:

$$K = u \times v$$

The primary reason this is interesting is that the convolution operator is associative. Given an input image, output image and separable kernel K:

$$output = input \times K$$ $$\implies output = input \times (u \times v)$$ $$\implies output = (input \times u) \times v$$

Rather than convolving an image with an mxn kernel, we can first apply an mx1 kernel followed by an 1xn kernel. It is important to note that the convolutions in the expressions refer to the whole image, not just one pixel — the entire image is convolved with the column vector u, after which the resulting output image is convolved with vector v. This means that two distinct passes are performed over the image, rather than one.

From an algorithm analysis standpoint, the cost of performing the convolution is reduced from O(m * n) to O(m + n) per output pixel, assuming the cost of performing a convolution pass is relatively cheap.

OpenGL Implementation

Implementing separable convolution kernels in OpenGL or DirectX requires the use of render to texture features. The output of the first convolution, applied in the vertical direction, is written to an intermediate texture rather than the screen. A horizontal pass is then applied with the first output texture as the source image.

The implementation details are heavily dependent on the particular graphics API being used; the example below is derived from the WebGL convolution tool. It isn’t the exact same code — the tool allows the user to capture the current output and redirect it back in as input, so the buffer structure is a bit more complex.

The following code lists an example of setting up the temporary buffer for the convolution procedure:

var convTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, convTexture);

gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, /*WIDTH*/, /*HEIGHT*/,
              0, gl.RGBA, gl.UNSIGNED_BYTE, null);

gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

gl.bindTexture(gl.TEXTURE_2D, null);

... 

var convBuffer = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, convBuffer);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                        gl.TEXTURE_2D, convTexture, 0);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);

convTexture will hold the intermediate output produced during the vertical convolution. The render loop should be structured such that the vertical pass is performed with convBuffer bound:

gl.bindFramebuffer(gl.FRAMEBUFFER, convBuffer);
/* Do vert conv */
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
/* DO horiz conv, with convTexture as an input */

The glsl coded looks basically the same as the one listed in Part 2. The only difference is that two fragment shaders are needed — one for the vertical pass and one of the horizontal pass. Each shader should take a floating point array as input as before, but the array represents a vector not a full matrix. An example vertical shader for a 5x1 vector (origin, scale, etc omitted):

uniform sampler2D uSampler;
 
uniform float uKernel[5];
uniform vec2 uTextureSize;

varying vec2 vTexCoord;
  
void main(void)
{
    vec4 sum = vec4(0.0);
               
    float stepSize = 1.0/(uTextureSize.y);
                    
    sum += texture2D(uSampler, vec2(vTexCoord.x, vTexCoord.y - stepSize * 2.0)) * uKernel[0];
    sum += texture2D(uSampler, vec2(vTexCoord.x, vTexCoord.y - stepSize)) * uKernel[1];
    sum += texture2D(uSampler, vec2(vTexCoord.x, vTexCoord.y)) * uKernel[2];
    sum += texture2D(uSampler, vec2(vTexCoord.x, vTexCoord.y + stepSize)) * uKernel[3];
    sum += texture2D(uSampler, vec2(vTexCoord.x, vTexCoord.y + stepSize * 2.0)) * uKernel[4];
                                         
    sum.a = 1.0;
    gl_FragColor = sum;
}

The horizontal shader can be implemented in the same way, however stepSize will be computed from uTextureSize.x and applied to the x components instead of the y components. Alternatively a single shader with a uniform int flag for switching between horizontal and vertical vectors could be used.

In modern OpenGL writing performing multiple passes as described above is fairly cheap. If a kernel is separable its usually best to make use of that fact, unless the kernel is tiny. Not much is gained by separating a 3x3 kernel into two passes, but I think it’s fair game to split a 5x5 kernel into two passes. Of course, that’s something that should be tested with benchmarks by the developer!

Wrap Up

Not all kernels can be separated — many of the matrices discussed so far, such as edge detection and sharpening filters, cannot be represented with a vector product. The next post will explore some of the techniques that can be used to produce vector approximations for a matrix.