Wednesday, February 20, 2013

This is not Motion Blur

This motion blur approximation gives me such bad motion sickness that I have to disable it. The problem is that these games often only offer an option to disable every kind of motion blur (radial blur, ping pong motion blur, artistic blur), while all I want is to disable per-pixel, velocity-based motion blur.

The origin

Velocity-based motion blur probably became famous after Crysis implemented it. They weren't the first, though: this kind of technology was already showcased by the ill-fated Offset engine. And Offset wasn't the first either.

How it works

The standard way of doing it is described in GPU Gems 3, using textures that contain velocity vectors. It is also explained in the D3D10 book "Programming Vertex, Geometry, and Pixel Shaders", and there's a demo on MJP's blog.
Basically, a texture stores how far, and in which direction, each pixel moved since the previous frame, and the shader blends that pixel with all the other pixels along the line traced by this vector.
I actually have nothing against this approximation technique. Where I differ, though, is in the blending formula.
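To make the typical approach concrete, here is a minimal CPU-side sketch in Python of a gather-style velocity blur on a single row of pixels. It is not any engine's actual shader: the 1D layout, nearest-pixel sampling, edge clamping and the name `velocity_blur_average` are all simplifying assumptions. Each output pixel just averages `n` taps taken along its velocity vector:

```python
def velocity_blur_average(row, velocity, n):
    """Typical gather-style motion blur on one row of pixels.

    row      -- list of pixel values (e.g. luminance)
    velocity -- per-pixel velocity in pixels (how far it moved since last frame)
    n        -- number of taps taken along the velocity vector
    """
    out = []
    last = len(row) - 1
    for x in range(len(row)):
        total = 0.0
        for i in range(n):
            # Step i/n of the way along this pixel's velocity vector.
            pos = x + velocity[x] * i / n
            sample_x = min(max(int(round(pos)), 0), last)  # clamp to the row
            total += row[sample_x]
        out.append(total / n)  # plain average of all taps
    return out


row = [0, 0, 0, 100, 0, 0, 0, 0]
print(velocity_blur_average(row, [4] * len(row), 4))
# [25.0, 25.0, 25.0, 25.0, 0.0, 0.0, 0.0, 0.0]
```

The single bright pixel is smeared into a 4-pixel streak whose brightness has been divided by `n`; that division is the part of the blending formula questioned below.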

A look at a real camera's blur

How cameras work

Motion blur in cameras is mainly controlled by the ISO speed & shutter speed (and, to a lesser extent, the aperture).

The shutter speed

The name is misleading because it should be called “shutter time”. The shutter speed controls how long the camera sensor is exposed to incoming light. Low shutter speeds can be as short as thousandths of a second, while high values can last more than half a minute.
Low shutter speeds produce very crisp images, while high shutter speeds produce very blurry pictures. Extremely long exposure times are often used to produce “light paintings” or otherworldly, surreal effects. There is a whole “long exposure photography” sub-culture that can absorb your attention for a few hours.
Low shutter speeds, on the other hand, are very useful for action scenes or for capturing fast-moving objects.

High shutter speed time. Source: ForestWander Nature Photography, under CC SA license.

The ISO speed

The ISO controls how sensitive the camera is to incoming light. A low ISO is very useful in very bright scenes, while higher ISOs are needed in low light conditions to get good shots. However, high ISOs may result in noisier images (those coloured dots in digital cameras) or images with noticeable film grain (analog cameras).

In old analog cameras, the ISO rating reflects how sensitive the film's layer of silver salts is. Faster (higher ISO) films use larger silver-salt crystals, which need fewer photons to react and leave a “print” (called a latent image), but they also produce coarser grain. Slower (lower ISO) films need more light. It's like trying to sculpt with light beams: the less sensitive the material, the more beams you need to sculpt it.
In digital cameras, the ISO speed is controlled by how much the sensor's signal is amplified (its gain). An engineer specialized in photography could probably give a more technically detailed answer, but this should suffice.

Picture with noticeable film grain (high ISO). View the original here.

Picture with noticeable noise (high ISO 3200, 6.25ms shutter speed, digital camera)
Same picture with almost no visible noise (low ISO 100, 200ms shutter speed, digital camera)
As you may have realized by now, ISO & shutter speed are tied together. A short shutter time lets less light through, so a high ISO is required. They are so tightly linked that this is why high ISOs are called "fast".
Long shutter times, on the other hand, let plenty of light reach the sensor/film. Unless you use a low ISO, the picture can quickly overexpose. This is why low ISOs are called "slow".

The aperture

I won't go into much detail on this one. Basically, the aperture controls how wide the lens opens to let light through. It is measured in f-stops. An f-stop of f/1 means the lens is wide open, while an aperture of f/22 means it is very tightly closed (the area of the opening is much smaller).
The aperture controls the depth of field of the camera. Close shots need a wide-open lens (an aperture close to f/1), while far shots require a tightly closed lens (the aperture as small as possible, like f/22).

Needless to say, a wide-open lens lets more light through, which means less film grain and less need for long shutter times (so less motion blur).
A closed lens, instead, has a very deep depth of field but needs more time to gather enough light, producing blurrier pictures, or noisier ones if you choose a higher ISO and a shorter shutter time.

In terms of rendering, we could say the aperture is controlled indirectly by the value of the far plane, and directly by your depth of field shader effect, if you have one.

Aperture f/13
Aperture f/14
Both pictures were taken with the same ISO 400 & 5 ms shutter speed. Notice how the noise (film grain for analog cameras) in the background is more noticeable with the smaller aperture.

These three elements, shutter speed, ISO speed and aperture, are called the "exposure triangle".
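The triangle can be summarized with the usual photographic rule of thumb: exposure is roughly proportional to ISO × exposure time ÷ N², where N is the f-number (the light gathered scales with the area of the opening). The Python sketch below applies it to the shots above. It ignores film reciprocity failure and sensor specifics, and since the aperture of the ISO comparison shots isn't stated, it assumes both used the same one (f/8 is only a placeholder, and it cancels out):

```python
import math


def relative_exposure(iso, shutter_ms, f_number):
    """Relative exposure: proportional to ISO and shutter time, and to the
    aperture area, which scales as 1 / f_number**2."""
    return iso * shutter_ms / f_number ** 2


def stops(a, b):
    """How many photographic stops (factors of two) exposure a is above b."""
    return math.log2(a / b)


# ISO 3200 @ 6.25 ms vs ISO 100 @ 200 ms, same (assumed) aperture:
print(stops(relative_exposure(3200, 6.25, 8), relative_exposure(100, 200, 8)))  # 0.0

# f/14 vs f/13 at ISO 400 and 5 ms: f/14 gathers about 14% less light.
print(stops(relative_exposure(400, 5, 14), relative_exposure(400, 5, 13)))  # about -0.21
```

Under the same assumption, the rotating-camera pair further down (ISO 3200 at 10 ms vs ISO 100 at 333 ms) lands within about 0.06 stops of each other: the photographer traded grain for blur while keeping the exposure constant.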

What we've learned so far

  • Film grain and motion blur are inversely related.
    • Film grain is more frequent when using high ISOs.
    • Film grain is hence also more likely in low light conditions.
    • But motion blur appears when using low ISOs!
    • At least one of the games listed at the start is using too much of both effects at the same time!!
  • Close-focused DoF produces less motion blur and less film grain.
    • A wider aperture increases the amount of light going through, allowing us to reduce both the shutter time and the ISO without their usual disadvantages.
    • This is very important if you have a cutscene with DoF. Don't make a close-up shot with lots of blur and film grain.
  • Far-focused DoF produces more motion blur and/or more film grain.
    • A smaller aperture decreases the amount of light going through, forcing us to either increase the exposure time (longer shutter time -> more motion blur), increase the ISO (more film grain), or both.
Having all effects active can be an artistic touch, but if your goal is photorealistic rendering, keep these in mind.
Furthermore, we're living in an era where the trend seems to be blinding the player with bloom, slow HDR eye adaptation, gigantic lens flares, and almighty god rays (OK, I'm guilty of that one).
So, since DoF is rarely used, a good hint (given the same aperture) is to lower your motion blur whenever film grain kicks in. You don't have to throw everything at the player to show off how awesome your postprocessing shaders are and how ree3al!! it looks. It's tempting, I know. But please fight it. Turning on every effect you know doesn't make it more realistic.

VERY FAST rotating camera. High ISO 3200, 10ms shutter. Notice a lot of noise and some blur

Camera rotating at the same speed, low ISO 100, 333ms shutter. Notice the lack of noise/grain, and extreme motion blurriness.

The motion blur

OK, enough with the rant. Back to the original topic: the point of all this is that motion blur is caused by light breaking its way through the lens to the sensor and adding itself to what's already there.
Note the word "add". I didn't say "average". I said ADD.
Motion blur is additive, not an average. But this is relative, because using a lower ISO is roughly the mathematical equivalent of dividing all the images that are going to be added by a constant factor, in which case it starts to look more like an average.

Let's look at some real camera shots, purposely shaken to enhance the motion blur effect. Notice the light streaks are blurrier than the rest of the image (darker spots) and leave a larger trail.
Most of them were taken at ISO 400.

Now let's try to mimic that effect. I will present a different blending formula that tries to emulate that behavior. Note: this is an empirical method based on observation; I haven't based the formula on any mathematical or physically accurate model.

Below is the typical motion blur postprocessing formula:

  • RGB is the final output.
  • HDR() is the hdr tonemapping operation
  • x is the current pixel location (whether in texels or pixels)
  • v is the velocity vector
  • n is the number of sample steps
  • pixel[] is the pixel being addressed.
  • f() is the motion blur operation
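Written with the symbols above, and assuming the common setup where the blur reads a buffer that has already been tone mapped (the order that remark 1 below changes), the typical formula looks roughly like this:

```latex
$$
\mathrm{RGB} = f(x, v) = \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{HDR}\!\left(\mathrm{pixel}\!\left[x + \frac{i}{n}\, v\right]\right)
$$
```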

 While this is a more correct (IMHO) motion blur formula:

  • RGB is the final output.
  • HDR() is the hdr tonemapping operation
  • x is the current pixel location (whether in texels or pixels)
  • v is the velocity vector
  • n is the number of sample steps
  • p[] is the pixel being addressed.
  • C is a constant factor in the range (0; 1]
  • D is a constant factor in the range (0; 1], typically 1-C
  • f() is the motion blur operation
Note: I used the X and * signs interchangeably for multiplication. That was an inconsistency typo. There are no cross products involved.
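Written out the same way, a reconstruction consistent with these definitions and with the Cg sketch at the end of this post (C scales every tap, D scales the step length along v) would be:

```latex
$$
\mathrm{RGB} = \mathrm{HDR}\big(f(x, v)\big), \qquad
f(x, v) = \sum_{i=0}^{n-1} C \cdot p\!\left[x + \frac{i\,D}{n}\, v\right]
$$
```

With C = 1/n and D = 1 this reduces to a plain average, but one taken in linear space, before tone mapping.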

Some notable differences & remarks:

1. HDR is done after the motion blur. This ensures that the motion blur result stays in the correct range when C != 1/n. It also matches a real-life camera more closely.

2. C is a constant (arbitrary) factor that simulates the ISO speed. When C = 1/n, it's a simple average.

3. D is a constant (arbitrary) factor that simulates the shutter speed. Normally it should be inversely related to C, so probably either D = 1 - C or D = 1 / (C + 1).

4. C is inside the loop. Although putting it outside could be considered a performance optimization, keep in mind you're adding HDR values. If you're using real-life values, the sky can easily have large numbers like "5000". With an R16FG16FB16F render target, even a surprisingly modest step count can easily overflow the result to infinity.
You'll most likely blur all samples in one pass and accumulate them as 32-bit floats, so you can safely put C outside. But if you're on DX 9.0 hardware, be careful not to use half; otherwise some hardware (G60, G70, Intel cards) will perform the arithmetic in F16 precision (overflow), while other hardware (G80+, Radeon HD 2000+) will always perform the arithmetic in F24 or F32 precision (no apparent overflow).

5. For reasons similar to those in 4, you may want to clamp the result before outputting the float to the render target. Chances are you're going to do another pass to calculate the average luminance in order to perform the HDR tone mapping, and you probably don't want that to overflow to infinity.
Because not all 16-bit float GPUs are IEEE compliant, clamp to a sane high value, like 64992 or 32000 (it should be a multiple of 32).

6. Clamping to avoid overflow is not a bad thing per se. If you look at the three pictures earlier, one of them has a highly saturated light source that the camera can't handle unless I use a lower ISO. In a similar fashion, just use a lower C value to allow HDR to gracefully handle the saturation.
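Remarks 4 and 5 can be illustrated with a small Python sketch that emulates 16-bit float arithmetic by round-tripping every intermediate sum through IEEE half precision (the standard `struct` format character `'e'`). This only approximates what a GPU does, since rounding behaviour and non-IEEE hardware vary, and the sky value of 5000 with 16 taps is hypothetical:

```python
import math
import struct

SAFE_CLAMP = 64992.0  # the "sane high value" from remark 5, a multiple of 32


def to_half(x):
    """Round x to IEEE 754 half precision; values too large become +/-inf."""
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def blur_taps(taps, c, half_math=False, c_inside=True):
    """Sum the taps of the proposed formula, optionally emulating F16 arithmetic."""
    rnd = to_half if half_math else float  # float = full (double) precision stand-in for F32
    total = 0.0
    for p in taps:
        total = rnd(total + (c * p if c_inside else p))
    return total if c_inside else rnd(total * c)


def store_f16(value):
    """Clamp before writing to a 16-bit float render target (remark 5)."""
    return to_half(min(value, SAFE_CLAMP))


sky = [5000.0] * 16  # hypothetical bright sky, 16 taps, C = 1/16
print(blur_taps(sky, 1 / 16, half_math=False, c_inside=False))  # 5000.0 in full precision
print(blur_taps(sky, 1 / 16, half_math=True, c_inside=False))   # inf: the F16 sum overflowed
print(blur_taps(sky, 1 / 16, half_math=True, c_inside=True))    # close to 5000
print(store_f16(math.inf))                                      # 64992.0 instead of inf
```

The F16 sum with C outside the loop overflows to infinity, while multiplying each tap by C keeps it close to 5000 (with some F16 rounding error). The clamp turns an overflowed value into a large finite one instead of letting infinity reach the luminance pass.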

Nothing new

This isn't new, nor is it rocket science. In fact, Crytek talked about it at SIGGRAPH 2011 ("Secrets of CryEngine 3 Graphics Technology"): Crysis 2's motion blur is done in linear space before tone mapping, using 24 taps on PC and 9 taps on consoles, and the talk says “Bright streaks are kept and propagated”, citing Debevec 1998.
Obviously, I wasn't the first one to notice the order of operations was wrong.

The reason it works is quite obvious. In an extremely bright scene with some extremely dark objects, blurring two bright pixels gives 5000 + 5000 = 10000, while blurring one dark pixel with one bright one gives 5000 + 10 = 5010, which is almost the same pixel it was before.
Blurring the two dark pixels gives 10 + 10 = 20, which is 100% different: very blurry. However, if the scene is very bright, tone mapping will favour all pixels with very high luminance because of the average luminance.
After tone mapping, the bright pixels (10000) will map to the value "1", the not-as-bright pixels (5010 & 5000) will both map to roughly the same value (~0.5), and the dark pixels (10 & 20) will probably map to the same low value due to rounding, so you won't notice the difference (around 0.00392 to 0.00784, i.e. 1/255 to 2/255, if you're using an RGBA8888 framebuffer).
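The effect of the order of operations can be checked numerically. This Python sketch uses the Reinhard operator L / (1 + L) with an arbitrary exposure scale of 1/1000 as a stand-in for HDR(); the post doesn't prescribe a particular tone mapper, so both are assumptions, and the numbers won't match the ones above exactly:

```python
def reinhard(luminance, exposure=1 / 1000):
    """Stand-in for HDR(): scale by an exposure factor, then compress to [0, 1)."""
    l = luminance * exposure
    return l / (1.0 + l)


def blur_after_tonemap(samples):
    # Typical order: tone map every tap, then average.
    return sum(reinhard(s) for s in samples) / len(samples)


def blur_before_tonemap(samples):
    # Proposed order with C = 1/n, D = 1: average in linear space, then tone map.
    return reinhard(sum(samples) / len(samples))


bright_and_dark = [5000.0, 10.0]  # one bright tap, one dark tap (values from the text)
print(blur_after_tonemap(bright_and_dark))   # about 0.42
print(blur_before_tonemap(bright_and_dark))  # about 0.71
```

Blurring in linear space lets the bright tap dominate the result, which is the "bright streaks are kept and propagated" behaviour; tone mapping each tap first pulls the streak roughly halfway towards the dark pixel.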

Where's the sample code? Results?

Unfortunately, I'm too busy with too many projects right now to test this formula. I'm usually not keen on releasing something without proof (besides, proof convinces people quickly!). A test application would require me to implement vector-based motion blur, DoF & HDR. I've done these several times in the past, but for different clients under NDAs, so I would have to make another one from scratch. And due to my motion sickness with this particular effect, I have never been much interested in vector-based motion blur to begin with.
I only got into this when I saw that every game I've been shown lately ships with this horrible effect, so I felt I needed to write something about it. Five years ago it could be turned off because it was too expensive for most GPUs, so the option was there. But today consumer cards appear to handle this effect with no problem, so fewer games allow it to be turned off.
As much as I admire Wolfgang Engel (who wrote the D3D10 book, and also worked on Alan Wake) and Emil Persson (who worked on Just Cause 2), I cannot stand the horrible motion blur effect going on in those games.

What Crytek is doing is probably not very different from this approach, and they do have pictures and a working implementation that may be worth watching.

I know someone who's recently implemented his own version of velocity-based motion blur, so I may be able to convince him to try this approach instead and share the results.

If the math formula jargon scared you away, here's a Cg implementation to help you understand it. Warning: untested code.

//Remember to do the HDR tone mapping AFTER you've done the motion blur. rtTex is assumed to be sampled in linear space.
//Do *not* compile this code with a flag that turns all floats into half, unless you move the multiplication by isoFactor inside the loop.
float4 motionBlur( float2 uv, uniform sampler2D rtTex, uniform sampler2D rtVelocities, int numSteps, float isoFactor, float shutterFactor )
{
    float4 finalColour = 0.0f;
    float2 v = tex2D( rtVelocities, uv ).xy;   // how far this pixel moved since the previous frame
    float stepSize = shutterFactor / numSteps; // shutterFactor plays the role of D
    for( int i = 0; i < numSteps; ++i )
    {
        // Sample the colour buffer along the velocity vector and ADD it (no division by numSteps).
        finalColour += tex2D( rtTex, uv );
        uv += v * stepSize;
    }

    //Apply the ISO factor (C), then clamp the result to prevent overflow when stored in 16-bit float render targets.
    return min( finalColour * isoFactor, 64992.0f );
}

Saturday, February 16, 2013

Premultiplied Alpha: Online Interactive demo

So, how does it work? Just move the sliders. Premultiplied alpha blending is on the left, regular alpha blending on the right. It requires HTML5, so you should use a good browser like Firefox 18, Chrome or Opera 12.


Not showing up? Try downloading the offline version.

What is premultiplied alpha?

There are a lot of resources about it, but they're all too technical and not artist-friendly. Still a good read, namely:
Long story short: it's a different way of doing transparency, which looks better, removes those ugly black edges that sometimes appear in alpha-blended images, naturally behaves more like transparency does in the real world, compresses better when using DXT1, and is also very useful for neat tricks in particle effects that save performance. Also, some people (including myself) believe premultiplied alpha is the right way of performing alpha blending.
Does it have disadvantages? Yes: it's harder to understand at first, and some textures may undergo quality degradation when not using DXT compression. But seriously, those are very small prices to pay for huge advantages. It's a win-win.

Coincidentally, NVIDIA published an interesting article at the time of writing. However, it puts the emphasis on filtering artifacts, while I'll be focusing on another point.

Why it behaves more “naturally”, like real life

To answer that question, we need to compare how both methods work:

How regular alpha blending works:

Regular alpha blending is easier to understand than premultiplied alpha. In regular alpha blending, you take a value called “alpha” and mix two colours. An alpha of 0% means the background is fully shown and the foreground is fully transparent. An alpha of 50% means half background and half foreground, mixed together. An alpha of 100% means the foreground is fully opaque and the background is occluded. Easy peasy.
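For programmers, the same idea as a formula: result = alpha × foreground + (1 - alpha) × background. This Python sketch works on plain RGB tuples in the 0..1 range; in OpenGL terms it corresponds to `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)`:

```python
def blend_regular(fg, alpha, bg):
    """Regular ("straight") alpha: lerp between the background and foreground colours."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


print(blend_regular((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0)))  # (0.5, 0.0, 0.5): half red, half blue
print(blend_regular((1.0, 0.0, 0.0), 0.0, (0.0, 0.0, 1.0)))  # (0.0, 0.0, 1.0): foreground invisible
```

A single number decides both how much of the foreground you see and how much of the background gets through.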

How premultiplied alpha blending works

Premul. alpha just doesn't work like that. “Transparency” is no longer controlled by one value. It's controlled by all four (RGBA).
This is because Alpha only controls “how opaque” the surface is. But not being opaque does not mean it's transparent!

Take a moment, breathe in, breathe out. Relax. Now think about it again: “not being opaque does not mean it's transparent”.

At this moment you may be thinking “Whaaaa??”. OK, let me give you an example. Think of light shafts going through a window. You can see the light. The light isn't opaque, and yet you're seeing it, so it can't be transparent either. This is the same thing. It happens because the light beam isn't blocking the incoming light from its background (it isn't opaque), but it also contributes its own light (hence it's not transparent).
Well, technically it is blocking some light, but just a little, and besides, premul. alpha can mimic this too.

Premultiplied alpha works with the concept that when alpha = 0, the surface is no longer opaque, but it still contributes light to the scene. In Photoshop terms, when alpha = 0, it's the same as having the layer blending mode set to “add”.
In order to be fully transparent, alpha has to be 0, but also the RGB colour has to be black, so that it doesn't contribute any light either.
The bigger the alpha, the more it blocks incoming light from behind, however it never stops contributing/adding light unless you turn the RGB down.
Get it?
No? Ok let's see some pictures:

In photorealistic rendering, the best results are achieved with HDR, because the whole “I block half of the background and contribute my own light” effect can be appreciated more easily.
Example: a premultiplied alpha window in a house is contributing some light. When viewed from outdoors on a bright day, the window will appear opaque, because the window's own light is higher than the light from inside passing through it.
However, when looking from indoors, the window looks almost transparent, because the light coming from outside is higher than the window's own light. Coincidentally, this matches real-world behavior, where you can't see what's on the other side of a window if the brightness on your side is much higher than the brightness on the other side.
Regular alpha blending can't match this by a long shot. You will always get 50% of the window's own colour + 50% of what's behind it (or whatever the alpha ratio is), and the result will be completely unnatural and unrealistic. In the worst cases it will blow up your automatic exposure adjustment, because averaging completely distorts the actual luminance the object should have.
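The premultiplied version is just as short: the source colour is added as-is, and only (1 - alpha) of the background gets through, which is `glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)` in OpenGL terms. The window values in this Python sketch are made-up HDR numbers chosen to mirror the example above, not measurements:

```python
def premultiply(rgb, alpha):
    """Convert a straight-alpha colour to premultiplied form: RGB is scaled by alpha."""
    return tuple(c * alpha for c in rgb), alpha


def blend_premultiplied(src_rgb, src_alpha, dst_rgb):
    """Add the source's own light, then let (1 - alpha) of the background through."""
    return tuple(s + (1.0 - src_alpha) * d for s, d in zip(src_rgb, dst_rgb))


print(premultiply((1.0, 0.5, 0.0), 0.5))  # ((0.5, 0.25, 0.0), 0.5)

bg = (0.2, 0.2, 0.2)
print(blend_premultiplied((0.0, 0.0, 0.0), 0.0, bg))  # transparent: background unchanged
print(blend_premultiplied((0.5, 0.3, 0.0), 0.0, bg))  # alpha 0 with colour: purely additive
print(blend_premultiplied((0.5, 0.3, 0.0), 1.0, bg))  # opaque: background fully blocked

# Hypothetical HDR window: its own light (reflections, dirt) vs what's behind it.
window_rgb, window_alpha = (0.5, 0.5, 0.5), 0.2
dark_room, bright_day = (0.05, 0.05, 0.05), (20.0, 20.0, 20.0)
print(blend_premultiplied(window_rgb, window_alpha, dark_room))   # ~0.54: mostly the window itself
print(blend_premultiplied(window_rgb, window_alpha, bright_day))  # ~16.5: mostly the outdoors
```

The same window, with the same RGBA, reads as "opaque" or "transparent" purely depending on how bright the other side is.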

What's inside the darker room? Who knows. All I see is a window (try to ignore the reflection; sorry, I couldn't take a better screenshot).

What's outside in the bright outdoor? That's crystal clear!

This picture was taken with my camera. I admit I should've cleaned the window, but it makes a perfect example of how to use premultiplied alpha effectively:

When the specular component is very high (it depends on the view angle and the sun's position), the dirt spots will appear bright, making the window look very dirty (while the clean spots will remain clean). However, when the specular component is very low, all spots, whether clean or not, will look almost transparent, giving the illusion of a clean window. Which is exactly how real life works.

So that's about it (artist friendly explanation). I'll let you keep playing with the interactive online demo above.
Experiment with how the alpha slider affects the “opaqueness” without affecting its transparency, and compare the result with regular alpha blending.

When is regular alpha blending useful?

Regular alpha blending is still useful. Like I said, it's easier to imagine and visualize. It's usually great for GUI & icon design. Also, fading in & out becomes harder when you have to tweak all the RGBA sliders at the same time instead of just fading one. For these types of jobs, regular alpha blending "just works".

Other neat tricks

This has been repeated to death. So I will be brief.
Being able to switch between being opaque and additive blending is a very neat property. The most common example is fire. Fire is by nature fully additive (alpha = 0) while the smoke it emits is almost completely opaque (alpha close to 1).
All it takes is to fade the alpha value towards 1, and we get fire effects that darken until they become smoke, for free (no need for a second draw call, render state change, or another texture). This is purely a performance optimization. It's also a pretty handy ability, since in most particle effects middleware it's easier to tweak & animate the alpha value than to add a second spawner/emitter or a texture change.
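Here's the fire-to-smoke trick as a Python sketch. The particle colours, the 0.9 final alpha and the `particle_colour` helper are hypothetical; the point is that one premultiplied blend function (one blend state) covers both the additive and the occluding end of the particle's life:

```python
def lerp(a, b, t):
    """Linear interpolation written so that t = 0 gives a and t = 1 gives b exactly."""
    return a * (1.0 - t) + b * t


def particle_colour(age):
    """Hypothetical particle, premultiplied: age 0 = additive fire, age 1 = dark smoke."""
    fire_rgb, smoke_rgb = (1.0, 0.5, 0.1), (0.05, 0.05, 0.05)
    rgb = tuple(lerp(f, s, age) for f, s in zip(fire_rgb, smoke_rgb))
    alpha = lerp(0.0, 0.9, age)  # only colour and alpha change, never the blend state
    return rgb, alpha


def blend_premultiplied(src_rgb, src_alpha, dst_rgb):
    return tuple(s + (1.0 - src_alpha) * d for s, d in zip(src_rgb, dst_rgb))


sky = (0.4, 0.6, 0.9)
for age in (0.0, 0.5, 1.0):
    rgb, alpha = particle_colour(age)
    print(age, blend_premultiplied(rgb, alpha, sky))
```

At age 0 the particle brightens the sky behind it (pure addition); at age 1 it mostly hides it, and everything in between is just a different RGBA value fed to the same blend.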
Another neat property is that premul. alpha doesn't produce black halos surrounding the edges. The reason behind this is very technical, so I won't explain why, you just need to know what it does, not why it works.
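For the programmers who do want the reason, here's a minimal Python sketch of the classic case: a bilinear filter sampling halfway between an opaque red texel and a fully transparent one, drawn over white. It assumes the transparent texel's RGB is black, which is common in straight-alpha textures and mandatory in premultiplied ones:

```python
def bilinear_midpoint(texel_a, texel_b):
    """What a bilinear filter returns halfway between two RGBA texels."""
    return tuple((a + b) / 2 for a, b in zip(texel_a, texel_b))


white_bg = (1.0, 1.0, 1.0)
opaque_red = (1.0, 0.0, 0.0, 1.0)   # identical in straight and premultiplied form (alpha = 1)
transparent = (0.0, 0.0, 0.0, 0.0)  # black RGB, zero alpha

r, g, b, a = bilinear_midpoint(opaque_red, transparent)
straight = tuple(a * c + (1 - a) * d for c, d in zip((r, g, b), white_bg))
premul = tuple(c + (1 - a) * d for c, d in zip((r, g, b), white_bg))
print(straight)  # (0.75, 0.5, 0.5): the edge is darkened -> dark halo
print(premul)    # (1.0, 0.5, 0.5): 50% red over white, as expected
```

In straight alpha, the transparent texel's black RGB leaks into the filtered colour and then gets multiplied by alpha again, darkening the edge; in premultiplied form, black RGB with zero alpha is exactly what "contributes nothing" means, so filtering is harmless.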

Quick! Spread the word, and hopefully some day everyone will be using premultiplied alpha. And bug your technical artist if the game engine doesn't use it. Are you already a programmer? Then read the goddamn links at the beginning. Don't be lazy!

Wednesday, February 6, 2013

Polycount vs Vertex count

I get asked this question too often by junior artists, so I should address it. There is already some discussion about it out there, but I want to make it very clear. This article has mainly been written for artists.

How should I measure the cost of my models: by polycount or triangle count??

Short answer: VERTEX COUNT!!! The poly or triangle count says very little about the cost of your model.

Long answer: for GPUs, all that matters is how many vertices need to be processed, and how much of the screen the triangles will be covering.
It's true that 4 vertices may form 2 triangles, or they may form up to 4 triangles (i.e. overlapping each other), and the latter will cause the GPU to process the same 4 vertices more times. However, modern GPUs have caches large enough to store the results of recently processed vertices, making this problem largely irrelevant. Any decent exporter (or even the modeling application) will arrange the vertices in a cache-friendly way. It's not something the artist has to worry about.
If the exporter isn't rearranging the vertices, introduce your tools programmer to the wonderful AMD Tootle.
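To see why vertex order matters, here's a Python sketch that counts vertex shader runs for an index buffer, modelling the post-transform cache as a simple FIFO. Real GPUs differ in cache size and replacement policy, so treat it as a rough model; the number of runs divided by the number of triangles is the usual "average cache miss ratio" (ACMR) metric that vertex cache optimizers try to reduce:

```python
from collections import deque


def vertex_shader_runs(indices, cache_size):
    """Count vertex shader invocations for an index buffer, modelling the
    post-transform cache as a FIFO holding the last `cache_size` indices."""
    cache = deque(maxlen=cache_size)
    runs = 0
    for index in indices:
        if index in cache:
            continue         # cache hit: reuse the already-transformed vertex
        runs += 1            # cache miss: the vertex shader runs again
        cache.append(index)  # the oldest entry falls out when the FIFO is full
    return runs


quad = [0, 1, 2, 2, 1, 3]  # 4 vertices, 2 triangles
print(vertex_shader_runs(quad, 0))   # 6: no cache, every corner is shaded
print(vertex_shader_runs(quad, 16))  # 4: each vertex is shaded once
```

With a cache, shared vertices are almost free, which is why the real cost driver is the number of unique vertices rather than triangles.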

As for n-gons, they're just for Maya/Max/Blender. They ease modeling and texturing for the artist, but when the model is exported to a game engine, everything is converted to triangles. Everything, even quads.
That's because all the GPU can process are triangles. GPUs that can natively render quads haven't been popular since the Sega Saturn.

What if I told you a cube can be made with 36 vertices, 24 vertices, or 8?

The vertex count that Max/Maya/Blender show you is not the vertex count that really matters, although sometimes it can be a rough estimate once multiplied by a factor (that factor will depend on each model).
For example, Nathan's model from Distant Souls is 12,959 vertices in Blender (22,876 tris for those not yet used to vertex counts), but once exported it's 14,003 vertices.

So, why is that? Because every time there is a discontinuity between two triangles, the vertex must be duplicated. Normals are stored per vertex, not per face. Strong differences between the normals of two faces mean the same vertex has two different normals, which means two duplicated vertices: same position, different normal.
Let's take a look at the following cube. It's 8 vertices and 6 faces (obviously!):
So, you would assume that's 8 vertices, right? Guess again. If exported correctly, it should be 24. If badly exported, it will be 36.
Let's see the bad case, how it could be 36 vertices:


The exporter sees we're using flat shading/hard edges, so it just takes the easy route and makes one vertex for each triangle corner. Quick 'n dirty. The cube has 6 faces, 2 triangles per face. That means 12 triangles. Because there are 3 vertices per triangle, 12 x 3 = 36:
 6 faces x 2 triangles per face x 3 vertices per triangle = 36
But let's get a little smarter, we can reuse a few vertices:

As seen in the picture, each triangle shares 2 vertices (with the same normal) with the other triangle on its face, so we don't need to duplicate them. That leaves us with 4 vertices per face. Neighbouring faces don't share the same normal, so we'll need to duplicate the vertices between them. What we get is:
6 faces x 4 vertices per face = 24
And finally, smooth shading. Due to the way the normals are placed, it gives a soft look, since it's trying to mimic the lighting of a round ball:

Here, each vertex has the same normal in every triangle that uses it, so there's no need to duplicate anything. From a technical standpoint, it's the best-case scenario. Not only are there fewer vertices, but they're also shared, which allows the GPU to better utilize the post-transform vertex cache (a cache that reuses the output of the vertex shader for a different triangle, so that the shader doesn't have to process it again).
From an artistic standpoint, whether this is good will depend on the look you're trying to achieve.
Anyway, the result is 8 vertices total.
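The three counts can be reproduced with a short Python sketch of what an exporter does: it emits one (position, normal) pair per triangle corner, then welds corners whose attributes match exactly. The cube construction and the `weld` helper are illustrative assumptions, not any particular exporter's code:

```python
def cube_faces():
    """Return (face_normal, corners) for the 6 faces of a cube spanning -1..1.
    Corners go around each face, so (c0, c1, c2) and (c0, c2, c3) split it into
    two triangles (front-face winding is not checked here)."""
    faces = []
    for axis in range(3):
        for sign in (-1, 1):
            u, v = [a for a in range(3) if a != axis]
            corners = []
            for cu, cv in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
                p = [0, 0, 0]
                p[axis], p[u], p[v] = sign, cu, cv
                corners.append(tuple(p))
            normal = tuple(sign if a == axis else 0 for a in range(3))
            faces.append((normal, corners))
    return faces


def triangle_corners(shading):
    """One (position, normal) pair per triangle corner: 6 faces x 2 triangles x 3."""
    for face_normal, c in cube_faces():
        for tri in ((c[0], c[1], c[2]), (c[0], c[2], c[3])):
            for pos in tri:
                if shading == "flat":
                    yield pos, face_normal  # hard edges: the face's own normal
                else:
                    yield pos, pos          # smooth: points away from the centre (unnormalized)


def weld(corners):
    """Build vertex + index buffers, merging corners whose attributes match exactly."""
    vertices, indices, lookup = [], [], {}
    for attrs in corners:
        if attrs not in lookup:
            lookup[attrs] = len(vertices)
            vertices.append(attrs)
        indices.append(lookup[attrs])
    return vertices, indices


print(len(list(triangle_corners("flat"))))       # 36: no welding at all
print(len(weld(triangle_corners("flat"))[0]))    # 24: welded, hard edges
print(len(weld(triangle_corners("smooth"))[0]))  # 8: welded, smooth shading
```

Adding a UV coordinate to each corner's attribute tuple is how UV seams create extra duplicates: two corners at the same position only merge if every attribute matches.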
In a real-world scenario, practice tells us even this smoothly shaded cube won't be 8 vertices, but probably 12-16. This is because unwrapping the UVs will generate a discontinuity at some point, unless you use an extremely distorted UV mapping:
When exporting this cube using smooth shading with this UV layout, it contains 14 vertices. Still better than 24.
Green: Duplicated vertices.
Yellow: The original vertices that presented a discontinuity.

Causes for vertex duplication / discontinuities?

There are many factors that can cause a discontinuity. For example:
  • Using flat shading. Use smooth shading instead when possible.
    • Called “hard edges” in Maya (use smooth edges)
    • Using multiple, faceted “smoothing groups” in 3DS Max
    • Called “Flat shading” in Blender
  • Too many UV islands. If the vertex has U=0.5 for one face and U=0.7 for another face, the vertex can't have two U values at the same time, so it has to be duplicated.
    • Note that multiple UV sets are fine as long as they stay continuous (which gets harder to keep in harmony the more sets you use).
  • Using different Material Ids. Try to batch everything into one material.
Polycount already covers this in excellent detail. There is no need to keep repeating what is said there.

Common misconception: Why is the GPU so dumb?

A common misconception that confuses many junior artists is: why is the GPU so dumb? In the example given above (let's forget about UVs), it's clearly smarter to just store 8 vertices with 6 face normals. Why does the GPU need 24 vertices? Why store per-vertex normals? The first reaction I get is “That's insane”.
The answer is simply because that's how GPUs work. GPUs are all about raw power. In simple terms, the main difference between a CPU & GPU is that one tries to be smart, while the other tries to brute force everything.
Think of an analogy: suppose the Hulk wants to enter a house. The smart thing to do is knock on the door, wait until someone opens it, walk in, then close it again. But he's the Hulk. It's clearly easier for him to just smash the whole wall. You wouldn't expect the Hulk not to do that. Even something as simple as opening a door or knocking would be hard for him. The same goes for GPUs.
3DS Max, Maya & Blender all internally store 8 vertices and 6 face normals. This is because it's storage efficient. Imagine those packages saving files 3 to 6 times their current size. Also, from an artist's point of view, it is much easier to just work with face normals, so the internal representation matches the artist's normal workflow.

But here's a little secret: when they have to display the model in the viewport, they have to convert it on the fly. That perfect 8-vertex cube with 6 face normals gets converted to 24 vertices. That's why you may have noticed that modifying one single vertex in an object with 10 million vertices is so painfully slow, even on high-end machines. It's also another big reason why modeling packages such as Maya/Max/Blender can't match the framerate of a game engine: they either convert the whole model on the fly, or sacrifice real-time performance for incremental updates so that editing doesn't become so slow.

This is going to change in the future, right?

Well, it's been like this for more than 20 years, so don't hold your breath. You can try teaching the Hulk to knock on doors. He will keep preferring to break the wall.
There has been progress in alternatives to rasterization for real-time applications, mainly using compute shaders, and sometimes they include good ol' raytracing. But nothing groundbreaking so far (limited use cases right now), so an 8-vertex cube with hard edges may become feasible in ?? years.

It's worth mentioning that there is a way to use hard edges with just 8 vertices in Direct3D & OpenGL (though UVs may still force some duplication), which is simply called “flat shading” (of course!). This feature has been there since the dawn of time.
The reason your tools programmer never wants you to know about it is:
  • It's all or nothing: either everything is flat shaded, or everything is smooth shaded. You can't mix.
  • The order in which vertices are submitted to the GPU becomes very important to achieve correct lighting; otherwise the wrong normal will be used. And it's a major PITA for the exporter to guarantee the correct order. Sometimes there is not enough information, and it becomes practically (but not theoretically) impossible to export the right order unless we resort to duplicating a few vertices, which is what we were trying to avoid. It's easy for standard primitives (box, sphere, pyramid), but analyzing the triangle indexing of complex surfaces to get the right order is challenging.
If accurate lighting can be sacrificed, or the programmer who wrote the exporter is a genius, and you don't need smooth triangles in your mesh, then flat shading is a feasible alternative. Note that UVs and other causes of vertex discontinuity will still force you to duplicate vertices.
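For the curious tools programmer, here's a Python sketch of the easy case: an 8-vertex cube drawn with flat shading. It relies on the provoking-vertex rule: Direct3D takes flat-shaded attributes from the first vertex of each triangle, while OpenGL defaults to the last one (changeable with `glProvokingVertex` since OpenGL 3.2). The sketch assumes the first-vertex convention, ignores winding order and UVs, and the corner numbering is hypothetical. A small backtracking search gives every face its own corner to carry its normal, then each face's two triangles are rotated to start at that corner:

```python
FACES = [(axis, side) for axis in range(3) for side in (0, 1)]


def face_corners(axis, side):
    """Corner indices going around one face (bit k of an index = coordinate on axis k)."""
    b, c = [a for a in range(3) if a != axis]
    ring = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [(side << axis) | (u << b) | (v << c) for u, v in ring]


def assign_provoking(index=0, used=frozenset()):
    """Backtracking search: give each face a distinct corner to carry its normal."""
    if index == len(FACES):
        return {}
    face = FACES[index]
    for corner in face_corners(*face):
        if corner not in used:
            rest = assign_provoking(index + 1, used | {corner})
            if rest is not None:
                rest[face] = corner
                return rest
    return None


def build_flat_cube():
    """8 vertices + 36 indices; each triangle's FIRST index carries its face normal."""
    owner = assign_provoking()
    normals = [(0, 0, 0)] * 8  # corners that provoke no face keep a dummy normal
    indices = []
    for (axis, side), corner in owner.items():
        n = [0, 0, 0]
        n[axis] = 1 if side else -1
        normals[corner] = tuple(n)
        ring = face_corners(axis, side)
        k = ring.index(corner)
        ring = ring[k:] + ring[:k]  # rotate so the provoking corner comes first
        indices += [ring[0], ring[1], ring[2], ring[0], ring[2], ring[3]]
    return normals, indices


normals, indices = build_flat_cube()
print(len(normals), len(indices))  # 8 36
```

For a cube a valid assignment always exists (6 faces, 8 corners, each face touching 4 of them); for arbitrary meshes the same search can fail or become expensive, which is the exporter headache described above.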