Friday, March 28, 2008

SCALE III, part 4

The SCALE API is as simple as it gets. It consists of just two functions: ParseFromFiles() and ParseFromMemory(). The ParseFromFiles() function takes an array of filenames, loads the files into memory, and passes them to ParseFromMemory().

ParseFromMemory() itself takes an array of strings, with each string representing a translation unit. Inside ParseFromMemory(), the SCALE engine is initialised and parsing occurs.
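
The post doesn't show the actual signatures, so here's a rough sketch of what the two entry points might look like; the parameter and return types are my assumptions, not the real API:

#include <string>
#include <vector>

class ParsedData;  // the parser's output, described under "Parsing" below

// Loads each named file into memory and hands the contents to ParseFromMemory().
ParsedData ParseFromFiles(const std::vector<std::string>& filenames);

// Each string is one translation unit; tokenisation and parsing happen in here.
ParsedData ParseFromMemory(const std::vector<std::string>& translationUnits);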

The SCALE engine has two distinct stages: tokenisation and parsing.


Tokenisation

Tokenisation of a translation unit is the process of breaking the entire translation unit up into separate entities known as "lexemes". A lexeme is the smallest "unit" that SCALE understands - lexemes are analogous to words in English.

In English, lexemes (eg. "run", "play", "discombobulate") are delimited by whitespace. That is, words are defined to be separated by whitespace - any run of characters that isn't broken up by whitespace is considered a single word.

I'm sure that I'm about to be mob lynched by all the linguists out there, so I think I'll stop with the analogies to real languages. In SCALE, lexemes are delimited by a set of characters which were described in part 2 of this series. It's the tokeniser's job to break the entire translation unit apart into lexemes that then go into the parsing stage. For example, let's say we have the following translation unit:


Type ObjectName
{
    ~Frobnicate
    {
        Foo = 5;
    }
}


The tokeniser would break up that code into the following list of tokens:

Lexeme          Index   Row, Column
"Type"          0       1, 1
"ObjectName"    5       1, 6
"{"             17      2, 1
"~Frobnicate"   24      3, 5
"{"             41      4, 5
"Foo"           52      5, 9
"="             56      5, 13
"5"             58      5, 15
";"             59      5, 16
"}"             66      6, 5
"}"             69      7, 1


As you can see, whitespace and comments are ignored - only the useful information is extracted. This list of tokens is then passed to the parser.
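
For illustration, each entry in that list might be carried around in a little structure like this (a hypothetical layout; only the fields shown in the table above are actually documented):

#include <cstddef>
#include <string>
#include <vector>

struct Token
{
    std::string lexeme;  // the raw text, eg. "~Frobnicate"
    std::size_t index;   // character offset into the translation unit
    std::size_t row;     // 1-based line number
    std::size_t column;  // 1-based column number
};

typedef std::vector<Token> TokenList;  // what gets handed to the parser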


Parsing

The job of the parser is to make sense of the raw tokens given to it by the tokeniser. At its core, the SCALE parser's job is to check syntax and convert the information into useful data structures. The parser is responsible for type checking, syntax and formatting, and conversion of data. It simply runs through each token in the list sequentially and checks that it is what's expected. If an unexpected token is found (eg. a missing opening brace), the parser errors out. When parsing object types, object names, modules, variable names and variable values, the parser checks the validity of each token and enforces things like naming rules and types.

The output of the parser is a large associative data structure (a std::map<>, specifically) called ParsedData. The ParsedData structure contains all the types which were parsed by the parser. ParsedData is a std::map whose keys are type names and whose mapped values are data structures called TypeData.

The TypeData structure contains all the parsed objects that are of the specified type. It, too, is a std::map associative container. The keys are object names, and the mapped values are structures called ObjectData.

The ObjectData structure contains all the modules contained in a particular object. Like the other data structures, this is another associative container (std::map). The keys are module names, and the mapped values are arrays of "ModuleData" structures (remember, it's legal to have multiple modules of the same name in an object).

The ModuleData structure finally contains a map of all the variables in that module. The map's keys are variable names, and the values are VariableData structures.

Finally, the VariableData structure contains the data for a variable, including the type.
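
Written out as C++ typedefs, the nesting above looks roughly like this (a sketch; the exact definitions in the engine may differ in detail):

#include <map>
#include <string>
#include <vector>

struct VariableData { /* the variable's value, plus its detected type */ };

// ModuleData: variable name -> variable data.
typedef std::map<std::string, VariableData> ModuleData;

// ObjectData: module name -> every module with that name (duplicates are legal).
typedef std::map<std::string, std::vector<ModuleData> > ObjectData;

// TypeData: object name -> object data.
typedef std::map<std::string, ObjectData> TypeData;

// ParsedData: type name -> type data.
typedef std::map<std::string, TypeData> ParsedData;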

If you didn't quite follow that, here's a diagram that may help:




Performance

SCALE III, as the name implies, is the third time I've attempted to implement SCALE. The first iteration was, well, terrible. It was slow and incredibly fragile with overly strict and verbose syntax. It was abandoned quite quickly, and SCALE II was born. SCALE II had the exact same syntax and naming/typing rules as SCALE III, but its implementation was very different. SCALE II was an attempt at single-pass lexical analysis. There was no tokenisation or parsing or compiling phase; it was all done in-place. And it was a bloated, slow, complex piece of code. I had believed that I could save some performance by making it single-pass - but it was a lot more difficult than I would have imagined. That's why SCALE III is done in two passes, like a lot of other lexers out there.

SCALE II relied heavily on simple arrays to store data - and searching arrays was pretty slow as it was an O(n) operation. For namespace collisions and type checking, this resulted in a terrible O(n^2) running time. SCALE III now relies on sorted associative containers: std::map is usually implemented as a red-black tree, which guarantees O(log n) time complexity for searching, inserting and removing. Thanks to this, large portions of the engine were optimised from O(n^2) running time to O(n log n), which is a very significant improvement.

SCALE III also supports multithreaded processing of translation units (well, SCALE II did as well, but it wasn't quite as robust). Since translation units can be processed separately, they are dispatched to a thread pool where they are processed in parallel. Since name resolution and checking across translation unit boundaries requires that all the information be available, it's delayed until all translation units have been processed. Then, the thread pool is destructed and namespace resolution occurs on a single thread. That means that some of the processing is required to be serial, but the majority of processing (tokenisation, most of the parser) can be run in parallel for a significant performance increase over single-threaded processing.
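
Here's a rough sketch of that parallel/serial split, using std::async purely for brevity (the engine has its own thread pool, and every name below is illustrative):

#include <future>
#include <string>
#include <vector>

struct UnitResult { /* tokens / parsed data for one translation unit */ };

// Per-unit work: tokenise and parse a single translation unit (stub).
UnitResult ProcessUnit(const std::string& source) { (void)source; return UnitResult(); }

// Serial phase: cross-unit name resolution needs every unit's results (stub).
void ResolveNames(std::vector<UnitResult>& results) { (void)results; }

std::vector<UnitResult> ProcessAll(const std::vector<std::string>& units)
{
    // Dispatch each translation unit as its own task; they run in parallel.
    std::vector<std::future<UnitResult>> futures;
    for (const std::string& unit : units)
        futures.push_back(std::async(std::launch::async, ProcessUnit, unit));

    // Gather every result before the serial phase begins.
    std::vector<UnitResult> results;
    for (std::future<UnitResult>& f : futures)
        results.push_back(f.get());

    ResolveNames(results);
    return results;
}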

Here are some benchmark results on my processor (Core 2 Duo E6600):
2x 7.5mb files (15mb total), MT OFF: 6505ms
2x 7.5mb files (15mb total), MT ON: 3679ms


With multithreading on, the total running time is reduced by 43%. That means that with multithreading on, SCALE is over 76% faster! Of course, I only have a dual core to test it on. The gains would be even larger on a quad-core or octo-core machine. This multithreaded system scales up to the number of translation units available, so benefits would likely be seen up to 16 cores and beyond, given that there's likely to be a large number of translation units.

Of course, it certainly could be faster. I haven't even run SCALE through a profiler yet, since there's practically no need to optimise. In any sane, non-pathological real-world case, we'd be looking at minuscule amounts of data. At most, maybe 1mb of text data spread out across dozens of files. Since the loading is faster on lots of small files as opposed to a couple of large files, SCALE can parse an entire game's worth of data almost instantly.

Monday, March 24, 2008

SCALE III, part 3

The Type System

Variables in SCALE do not have types, unlike objects (eg. an object "SpriteMaterial" having a type of "Material"). Or not explicit types, anyway. The user does not specify the type of a variable. Nor does the application, for that matter. A variable can have any type, and this type is determined at runtime by the SCALE parser. There are 12 fundamental types that SCALE recognises:

- Integer
- Boolean
- Float
- Reference
- Enumeration
- String
- IntegerArray
- BooleanArray
- FloatArray
- ReferenceArray
- EnumerationArray
- StringArray


When a variable's value is parsed, the type is dynamically determined based on a set of rules. Right now, the rules are a bit arbitrary and will need to be formalised and set in stone sometime in the future. But for now, this is the general algorithm (a rough code sketch follows the notes below):
- If the value is surrounded by braces ('{' and '}'), it is an array. For each element, use this algorithm to determine the type. [1]
- Else, if the value is surrounded by quotes ('"'), it is of type String.
- Else, if the value is preceded by an ampersand ('&'), it is of type Reference.
- Else, if the value is "true" or "false" (sans quotes), it is of type Boolean.
- Else, if the value is a numeric value: [2]
  - If the value has a decimal point ('.'), it is a Float.
  - Else, it is an Integer.
- Else, it is of type Enumeration.

[1] Arrays must be homogeneous; that is, the types of all elements must be identical. Arrays of arrays are not supported.
[2] Determination of whether a value is "numeric" is currently a bit arbitrary. Needs to be finalised in the future.
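
As a rough sketch, the scalar rules above might look something like this in code (illustrative only; it ignores arrays and, as note [2] says, the "numeric" test is deliberately naive):

#include <cctype>
#include <cstddef>
#include <string>

enum ValueType { Integer, Boolean, Float, Reference, Enumeration, String };

// 'value' is the raw token text for a (non-array) variable value.
ValueType DetectType(const std::string& value)
{
    if (value.size() >= 2 && value[0] == '"' && value[value.size() - 1] == '"')
        return String;
    if (!value.empty() && value[0] == '&')
        return Reference;
    if (value == "true" || value == "false")
        return Boolean;

    // Naive "numeric" test: optional leading sign, digits, optional decimal point.
    bool hasDigit = false, hasDecimalPoint = false, numeric = !value.empty();
    for (std::size_t i = 0; i < value.size(); ++i)
    {
        const char c = value[i];
        if (std::isdigit(static_cast<unsigned char>(c)))
            hasDigit = true;
        else if (c == '.')
            hasDecimalPoint = true;
        else if (!(i == 0 && (c == '-' || c == '+')))
            numeric = false;
    }
    if (numeric && hasDigit)
        return hasDecimalPoint ? Float : Integer;

    return Enumeration;
}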


Once SCALE determines the type, it converts it into a binary format and gives it to the application. It is the application's job to determine what to do with it, and what to do if the type isn't what was expected.


Naming

In SCALE, each object must have a unique name. That is, it is illegal for two objects to share a name.

Each type has its own namespace, which means that the unique name requirement doesn't carry between two different types. For example, it's legal to have an object of type "Foo" named "Fred" and, at the same time, an object of type "Bar" also named "Fred". This is legal:

Bar Fred
{
}

Foo Fred
{
}


There's a gotcha, though. You'd expect that each translation unit would get its own namespace, but it doesn't work that way. All translation units share the same global namespace, meaning that separating two identically typed, identically named objects into two different translation units (or files) won't work.

This unique name rule doesn't apply to modules in an object - it is legal to have two modules of the same name (and even containing the same data, if you wanted). This is legal:

Type ObjectName
{
    ~Frobnicate
    {
        Foo = 5;
    }
    ~Frobnicate
    {
        Foo = 6;
    }
}


The rule rears its ugly head again when dealing with variables. Every module, in every object, gets its own namespace. What that means is that in any particular module, you're not allowed to have two variables with the same name. However, you are obviously allowed to have two variables with the same value. For example, the following is NOT legal, and will cause an error during parse time:

Type Wilma
{
    ~Fred
    {
        Foo = 5;
        Foo = 6;
    }
}


That's about it as far as naming and type rules go - it's relatively simple and easy to remember.

Next time, I'll talk about the usage of the SCALE API from the perspective of the application, and how it's going to be used with SolSector.

Sunday, March 23, 2008

SCALE III, part 2

So what the heck does it even do? What does it look like?

SCALE defines a syntax format and provides a parser for that syntax. That's basically all that SCALE is. As I mentioned in my previous post, SCALE was inspired by and is based upon the .ini configuration files of Command and Conquer: Generals. A key feature of C&C:G's .ini system was its devilish simplicity. The syntax was simple and easy to understand - they were, after all, just configuration files (hence the .ini extension).

So here's what the syntax of SCALE looks like:
Material SpriteMaterial
{
    ~Shader
    {
        Filename = "Sprite.fx";
        TechniqueName = "Sprite2D";
    }

    ~ShaderPass
    {
        PassName = "Pass0";
        RenderTarget = &BackBuffer;
    }

    ~ShaderConstants
    {
        WorldMatrix = "WorldMatrix";
        OrthoProjectionMatrix = "OrthoProjectionMatrix";
    }

    ~Texture
    {
        Filename = "laying_rock7_largeDXT5.dds";
        ShaderBinding = "Texture";
    }
}


As you can see, the syntax is really simple. It's broken up into a simple hierarchy:

- Translation Units
  - Objects
    - Modules
      - Variables


At the top, you have translation units. They're analogous to C/C++ translation or compilation units, in that they are effectively separate "modules". Each translation unit can be parsed without needing to look at any other translation units.

Each translation unit is typically represented by a file, but no limitation is imposed by the parser. Inside each translation unit is any arbitrary number of objects. Each object can be of any arbitrary type. Each object can also hold an arbitrary number of modules. And each module can hold an arbitrary number of arbitrarily typed variables with arbitrary values.

Confused? Let's take another look at the syntax we saw above. That code is actually an entire object - named "SpriteMaterial". The type of this object is a "Material". This works the same way as declaring variables in C/C++: declare a type, then the variable name. In the body of the object, you are allowed any number of modules. A module is signified by the tilde prefix ('~'). For example, there are 4 modules in the above object: "Shader", "ShaderPass", "ShaderConstants", and "Texture". SCALE allows any number of modules, and they can have any name. SCALE also allows duplicate named modules - for example, it is legal to declare two modules named "Texture". Inside each module is a set of variables. SCALE also allows any number of variables, and they can be named however the application wishes.

Two types of comments are supported in SCALE: C-style comments and C++ style comments. Comments are merely skipped by the parser. They act just like they do in C++:

Foo Wilma
{
    ~Bar
    {
        Baz = 12.0;
        //Baz = 3.14;
    }
    /*~Fred
    {
    }*/
}


Like in many languages, whitespace is mostly irrelevant in SCALE and has no syntactic meaning. Tokens in SCALE are delimited by a few things:
- Comments
- Braces
- Newlines
- Whitespace
- Semicolon
- Equals sign

That means that you can separate two identifiers with any of the above constructs. For example, this is entirely legal:

Mesh/*comment*/MyMesh{
~File


{
Name="fromage";
LazyLoad = /* hi, ma */ false;}
~Options{WeldVertices=true;Epsilon=0.01;}~Foo{Something=42;}
}

It's just unreadable, that's all.

So that's the syntax of SCALE. Very easy to understand and remember. In my next post, I'll go into the variable typing system, and describe the naming system.

SCALE III

"SCALE"?

SCALE is a wonderful little backronym that a friend and I came up with back in 2006. It stands for "Scalable Content Abstraction Layer Engine". It's an overly fancy name for a helper library designed for usage in data-driven games.


History

Before I began dabbling in hobbyist game development, I liked to make mods for games. Back in 2003, I joined the Halogen team, which aspired to create a Halo total conversion mod for Command and Conquer: Generals. Following the disappearance of their existing coder at the time, I quickly assumed the role of lead coder for the project - and remained sole coder until the project ended in late 2006.


Halogen was actually reasonably close to completion. By the middle of 2006, most of the UNSC faction was complete, and the Covenant units and structures were well on their way. We had plans for an open beta in mid-late 2006. By that time, Halogen had gained a reasonably large amount of popularity. We were well known in the C&C modding community, and we also made a small name for ourselves in the Halo community. Our updates were often mentioned on the front page of halo.bungie.org.

In September 2006, we effectively received a cease and desist letter. It wasn't actually sent by Microsoft, though. SketchFactor, a community manager at Bungie, elected to inform us of the bad news before Microsoft's lawyers got involved. And so, after 3 years of working on Halogen, we officially shut down the Halogen project. If you google "halogen mod", you may still be able to find some online articles with more information.




Of course, it was no secret why we were shut down. A few weeks after the notice, at X06, Microsoft's new Halo RTS "Halo Wars" was announced. Since Command and Conquer: Generals was an EA game, Halo was Microsoft's intellectual property, and Halogen was going to be direct competition, it's obvious why we were shut down.

But I never forgot the experience I gained through Halogen. Command and Conquer: Generals was an extremely moddable game - practically all game assets were exposed through a simple, easy to use plain text system. All in-game entity definitions were changeable through regular .ini files - making modding a breeze. This extreme moddability is what kept Command and Conquer: Generals alive for so long. Through my experience on Halogen, I learned the value of highly data-driven game design.

That's what SCALE is. It's not just a format, it's an engine for data-driven assets, based loosely on the .ini modding system used in Command and Conquer: Generals. Effectively, C&C:Generals was my inspiration for designing this system, and the great time I had modding it provides the motivation for SCALE's implementation.

Now that my overly lengthy history lesson is over, I'll begin talking about SCALE itself in more detail in a later post sometime.

Saturday, March 22, 2008

Macros

Once again I am boggled by the sheer evilness of macros. They're so innocent, like a cute little bunny. But the moment you turn your back you immediately find it gnawing on the back of your skull in an attempt to forcefully extract all of the remaining slivers of sanity which you hold so dear.

After implementing exceptions in SolSector Engine, I turned to improving some other error-related code: asserts. Asserts are handy little macros which assert that a certain condition must be true. Otherwise the program breaks. And it breaks loudly.

In C/C++, you use it like this:
assert(x != 0);

This means that the variable x should never be equal to 0, and if it is the program will break and an "Assertion Failed" dialog box will pop up (on MSVC, that is). Asserts were intended to be debugging tools, so they are only active in debugging mode. That is, they're only active if NDEBUG is not defined. Otherwise, they're removed by the preprocessor.

I see a little problem with this. All too often, I find that a particular problem happens when compiling in release mode, but for some reason doesn't happen in debug mode. In debug mode, code is compiled with no optimisations, and with full debugging information on. And it often feels like I've got some freaky quantum errors going on... the errors magically disappear when I try to observe and debug them.

Between release and debug mode, things like RTTI, exceptions, strict floating point precision, and optimisations can all differ. That's a lot of variables to test - it means that if an error occurs in release but not debug mode, any of the differences between the two modes could be causing the problem. For example, the lack of RTTI may be making a dynamic_cast fail somewhere in release mode but not in debug mode. The lack of floating point exceptions might be screwing with math calculations somewhere. And the list goes on.

Turning asserts (aka. a vital debugging tool) off in release mode only exacerbates the problem. An assert might catch something that would be causing the error in release mode - but because the default library assert is tied to debug mode, I needed to write my own so that I could toggle it on/off at will without having to rely on NDEBUG. Having a custom assert also means that I don't have to deal with that ugly default MSVC-generated "assertion failed" dialog box.
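
To make the goal concrete, this is the naive shape of such a macro, toggled by a project-specific switch instead of NDEBUG (a sketch only; the article linked below covers the subtleties that a naive version like this glosses over):

#include <cstdio>
#include <cstdlib>

#define SSE_ASSERTS_ENABLED 1  // hypothetical switch, independent of NDEBUG

#if SSE_ASSERTS_ENABLED
    #define SSE_ASSERT(expr) \
        do { \
            if (!(expr)) { \
                std::fprintf(stderr, "Assertion failed: %s (%s:%d)\n", \
                             #expr, __FILE__, __LINE__); \
                std::abort(); \
            } \
        } while (0)
#else
    #define SSE_ASSERT(expr) ((void)0)
#endif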

Searching for resources, I came across this article. Read it. Now. If you didn't think macros were evil before, you should after reading that. That's not to say that macros shouldn't be used (indeed, assert is a perfectly valid use for a macro), but it highlights just how difficult macros can be to get right. The average programmer should be wary of writing macros because, more often than not, you'll get them wrong.

I'm not even going to attempt writing my own assert macro now, and instead will use the one in the aforementioned article (it was released under the MIT license). And I can see that some of my existing macros need to be fixed. The mind-boggling evilness of macros strikes again.

Thursday, March 20, 2008

Exceptions vs. Error Codes

When I first started SolSector engine, I was faced with a bit of a dilemma. I had to choose how to perform error reporting and recovery. I had two choices: error codes or C++ exceptions.

Error codes, like the HRESULT returns that DirectX uses, have been tried and tested. They work well, and they're fast. However, they're a pain to manage. For every function that can possibly return an error, you need to check the return code. With hundreds of functions, this can be a major pain. Often, an error occurs deep in a function stack, and that error needs to be propagated through all the functions until it gets to a higher function that can actually do something about it. However, error codes are fast and really simple to use.

Exceptions, on the other hand, seem to be hated by a lot of people. I'd heard of the horrid execution speed of exceptions, and the apparent difficulty of writing exception-safe code. For instance:

void foo(char*, char*);

void bar()
{
    foo(new char[5], new char[5]);
}

What's wrong with this code? Well, either of those new expressions can throw an exception. Depending on how the compiler is feeling today (the order in which it evaluates the arguments), this could cause a memory leak: if the second allocation throws after the first has already succeeded, foo() will never execute and the pointer to the first allocation will be lost forever. I'm sure you can think of many more similarly nasty cases.

Initially, I chose to use error codes because of all the negative criticism regarding exceptions. And error codes worked well. I wrote a bunch of helpful macros to facilitate automatic error code returns and logging. But, as the project grew, error codes became more and more of a problem. Because error codes are only integers, they cannot store additional semantic information. For example, if a mesh loading function failed to load a file, I'd like to be able to know what the filename was. To alleviate this, I wrote a logging system so that whenever an error was returned from a function, it would automatically log the error (along with any additional helpful error/debugging information) in a global logger class. The returned error code would then be an index to the logged error message.

As you can imagine, this was very ugly. It seemed hackish and didn't feel like a clean solution (trying to hide all this ugly logging business behind macros, etc). It was at this point that I began to seriously consider exceptions.

I asked myself, why did I choose to avoid exceptions in the first place? I had a few main reasons:
- Performance. I feared that exceptions would prove to be unacceptably slow.
- Maintainability. I had heard that exceptions were notoriously difficult to work with and that it was hard to write exception-safe code.
- Experience. Back when I began SolSector Engine (2006-ish) I hadn't had a large amount of experience working with exceptions yet. But I was already quite comfortable with error codes. Not having much experience with exceptions exacerbated my fears that I would not be able to write proper exception-safe code.

I reviewed these reasons, and did a lot of research on the topic. As it turns out, my performance fears were unfounded. There were two things I was initially concerned about: the intrinsic cost of having exceptions on, and the cost of actually throwing an exception.

The intrinsic cost of exceptions is no higher than that of regular error codes. The general gist of it is that in any decent compiler, compiling with exceptions on just means that a function pointer (representing the exception handler) is pushed onto/popped off the stack when a function is entered/exited. Of course, there's probably more involved in the process, but calling a function with exceptions on is no more expensive than calling it and returning an error code. Another critical point is this: the standard library uses exceptions. If I want to catch out-of-memory errors and the like, I'd need exceptions to be on, anyway.

Of course, actually throwing an exception is quite slow. But let's stop and think for a moment. When are exceptions thrown? Exceptions are only thrown when an error happens. And when an error occurs, performance is the least of my worries. A great saying is, "throw exceptions in exceptional circumstances". If you're using exceptions for regular flow control, then you've got bigger problems than mere performance issues. This is also a great example of premature optimisation (which, as we all know, is the root of all evil). When an error occurs, it is far more important to have a useful error reporting system, rather than jumping through hoops and sacrificing code usability/maintainability (like I did with the logging system for error codes) to achieve better performance.

Maintainability of exceptions isn't that bad. Sure, writing exception-safe code is hard (very hard). But, it's not as bad as I would have imagined. Ever since I started using C++, I realised the value of RAII. For those of you who don't know, RAII stands for "Resource Acquisition Is Initialisation" and is pretty self-explanatory. Its meaning extends to deallocation - when an object is destroyed, it should clean up its own mess. In C++, this is made possible by the use of destructors: when an object is deleted or goes out of scope, its destructor is called. My extensive use of RAII means that writing exception-safe code is an order of magnitude easier - when I throw an exception, I know that all the objects I allocated on the stack will be cleaned up. And since I almost never deal with raw pointers (learn to love smart pointers!), I don't even have to worry about the memory I allocated in a function. With the usage of good, idiomatic C++, the maintenance costs of exceptions are lowered.
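
To tie this back to the earlier example, here's what the leak-prone bar() looks like once smart pointers own the allocations (using std::make_unique purely for illustration - it's C++14, long after this engine, but any owning wrapper gives the same guarantee):

#include <memory>

// Hypothetical consumer, defined only so the sketch is self-contained.
void foo(std::unique_ptr<char[]>, std::unique_ptr<char[]>) {}

void bar()
{
    // Each allocation is owned by a smart pointer as soon as it succeeds, so if
    // the second allocation (or foo itself) throws, the first buffer is freed
    // automatically during stack unwinding - no leak, unlike the raw-pointer version.
    foo(std::make_unique<char[]>(5), std::make_unique<char[]>(5));
}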

Recently, I've been using the .NET framework a lot more. In particular, I was exposed to C# while I was participating in the Google Highly Open Participation Contest, where I worked on improving the performance of the Mono C# Compiler. The compiler itself was written in C#, and I was able to increase the performance of the compiler by 5-10%. That's where I picked up C# - I learned it on the spot as I read through real code, learning as I went along. Exposure to such a "modern" language showed me the value of exceptions - error handling in .NET was far, far easier and more elegant than in SolSector Engine, for example. I could just set up a try/catch block around something, and if anything went wrong I could deal with it appropriately. In .NET, no more did I have to deal with massive blocks of if statements and error code checks.

So I finally made the jump to exceptions for SolSector Engine. Thanks to some handy macros and the power of regex, it only took me a few days to do (although testing will take much, much longer). It's still very early, but they seem to be working well so far and I haven't encountered any major problems. The error system is now a lot more robust and easy to use, I think. I got rid of the logging contraptions I used for error codes, and stored information inside exceptions themselves - much more convenient. The new code integrates better into the STL, too. Rather than setting up massive try/catch blocks and attempting to convert standard exceptions into error codes, I can now just not bother and allow the exceptions to pass through. It's the same with constructors - exceptions are the best way to signal a failed constructor.

I wrote my exceptions to be similar to those found in .NET - as they were simple and familiar. It's still early, and lots of testing needs to be done, but for now it seems that I made the right choice. From now on, it'll be exceptions all the way.

Monday, March 17, 2008

Implementing HDR in Clouds

It seems as if High Dynamic Range lighting (HDR) is becoming all the rage for gaming graphics, nowadays. I’ve seen it used in a multitude of games thus far. I remember my first HDR experience. It was back in 2004, when FarCry had just released its 1.3 patch enabling HDR on Geforce 6 series cards. Since I was between upgrades, I was using a Radeon X300 at the time. So I went to a friend’s house to try it out on his 6600GT. And I was stunned at the oversaturated, glaring bloom effects.

I’d wanted to implement my own over-the-top HDR ever since.


What is HDR?
So for the uninitiated, what is HDR? In the real world, we humans experience a vast range of contrast. We can see very bright objects, such as the sun, as well as very dark objects such as a moonlit scene. In computers and modern graphics, however, we're limited to a very, very small range. The brightest colour that can be expressed in a 32-bit ARGB texture is (255, 255, 255, 255), or pure white. And it isn't very bright. So we use a technique called HDR to alleviate this. When rendering the scene, we render it to a format with a very high dynamic range, for example a 64-bit texture. Most commonly, we use a floating point format in the form A16R16G16B16F, which means a 16-bit float for each component. Each component can now store a value of up to roughly 65,000... much better than the 255 we were limited to with 32-bit ARGB.

That means that in our scene, very very bright objects will have colours greater than 255, while dark objects will be able to have precise values near 0 (thanks to floating-point). But now we've got a problem. Regular monitors and displays can't display such bright values: the hardware itself is still limited to 8-bit component RGB, with its annoying 255 maximum. So an operation called tonemapping is performed. Tonemapping is the process of converting the HDR image into a Low Dynamic Range image (LDR), so that the monitor can actually display it. This means shifting and squeezing the brightness of a scene so that it best fits within the display limits of the monitor. This is why, in HDR-rendered scenes, the brightness changes as you look at bright/dark objects.

After tonemapping, you've got an LDR image which you can now display to the monitor. That's really all that HDR needs. But if you've ever used a camera, you may notice some things are missing from our HDR scene. In a camera (and in the human eye), we see that bright things tend to get lens flares, as well as blooming. So often you'll see these things implemented along with HDR (and, indeed, seem to have become synonymous with the term). Bloom, starring and lens flares are all post-processing effects, and they simulate the effect of light passing through a lens and can greatly increase the image quality of a scene. More information can be found in the resources I'll list later on.

HDR in Clouds

Clouds doesn’t use any dynamic lighting. All shading done on the clouds is static – cheap shader approximations because I didn’t want to implement anything overly complicated. Nevertheless, I made the decision to implement HDR because it provided an excellent real-world test of SolSector Engine’s shader system.

I set out by reading up on the relevant material. As HDR had gained popularity for use in games over the past few years, there was no shortage of information. The first thing I took a look at was the famous implementation by Masaki Kawase, in the form of his demo dubbed “Rthdribl” (Real-Time High Dynamic Range Image Based Lighting). It’s available here. I used his demo as a base for what HDR should look like - and set out to write my own shaders in HLSL from scratch.

The full HDR pipeline is pretty complex. I drew up an initial diagram for the pipeline: (warning: large)


But, this was unoptimised and took up a horrific amount of memory:

After some reorganisation, I was able to significantly reduce the number of buffers to just a few, but it really convoluted the pipeline: (warning, very large)




Problems

One of the main problems was simply getting it to look right. Initially, I used the Reinhard operator to tonemap the HDR scene, but I found that it was ill-suited for this purpose. It would overdarken bright scenes, and overbrighten dark scenes. In other words, the dynamic range was a little too high. After many, many hours of tweaking, I resorted to simply removing the Reinhard operator in favour of a simple linear one. The Reinhard operator (as well as the Reinhard improved operator) would simply reduce contrast over the entire scene to fit the HDR space into LDR - a byproduct of its hyperbolic curve, even if it did guarantee that the HDR scene mapped fully into LDR. It may have been theoretically ideal, but it just didn't look good. The linear operator, in addition to a simple luminance clamp, is certainly not "technically" correct. But it doesn't matter; it looks nice and the tonemapping effect is now quite subtle (as it should be).
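
For reference, the two curves being compared boil down to something like this (scalar luminance only; the real shaders work on RGB in HLSL, and these names are illustrative):

#include <algorithm>

// Reinhard: compresses the whole HDR range into [0, 1) with a hyperbolic curve.
float TonemapReinhard(float luminance)
{
    return luminance / (1.0f + luminance);
}

// Simple linear operator with a luminance clamp: scale by exposure, then clip.
float TonemapLinearClamped(float luminance, float exposure)
{
    return std::min(luminance * exposure, 1.0f);
}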

HDR is just one of those things that needs a lot of tweaking to get right. Especially the tonemapping, and various parameters for bloom/glare/etc.



For anyone looking to implement HDR for themselves, here are a few of the things that helped me along the way:

  • Masaki Kawase's GDC 2004 slides, available here and here. Go through them both!
  • For those interested in some theory, be sure to check out Erik Reinhard's whitepaper here.
  • The DirectX SDK samples. They provide full source code and explanations, and a heck of a lot of good information. See: HDR Lighting (Direct3D 9), HDRLighting Sample (with executable and source code available in the SDK), HDRDemo Sample. The HDRLighting Sample is particularly useful. It implements the entire HDR pipeline and the HLSL shader code provided was a great reference point.

Sunday, March 16, 2008

Multithreading in Clouds

The Clouds demo posed an interesting problem for me. It required large amounts of CPU power, while at the same time I was finding it difficult to optimise the code any further. The cloud generation and render setup was hitting the CPU hard - and only 50% of my dual core CPU was being utilised.

While I had dabbled in multithreading before, I'd never applied it to any real-life applications. Clouds was in a unique situation: it was a CPU-intensive application, but had absolutely no requirements regarding latency. So I was free to design the most ridiculously input-latency-heavy system I could think of.

I immediately split the application into 2 threads - simulation and render. Simulation would generate perlin noise, do the day/night cycles, and everything relating to the world itself. The render thread would, well, render. The render thread and the simulation thread are completely decoupled - they can both run at any speed and neither would affect the other.

I chose a buffered approach to communicate between the two threads. The simulation thread would write information to a buffer, and the render thread would then take that information and use it to draw stuff to the screen. I started off with 3 buffers - one for simulation and two for render. Since the simulation thread was locked to 10hz and the render thread ran as fast as it could, the render thread needed to "fill in the blanks" somehow between each 100ms long timestep. So the render thread uses its two buffers to interpolate between, to produce smooth looking results while the simulation thread writes to the third buffer. Here's what it looks like:






Here's the problem: this leaves the system open to data starvation. If the simulation thread finishes its simulation before the render thread is done, then the simulation thread has to sit there and wait until the render thread is done with one of its two buffers. However, if the render thread finishes before the simulation thread is done, then the render thread has to sit there and wait for the simulation thread to finish with its current timestep before it can continue interpolating.

So I added a fourth buffer. The addition of the fourth buffer meant that input latency had grown to a ridiculously massive 400-500ms. But that didn't matter, since I didn't have any input that could affect the simulation! The fourth buffer sits in between the simulation thread and render thread, and it donates itself to whoever needs it most. If the simulation thread is done before the render thread, the fourth buffer goes to the simulation thread so it can begin writing the next timestep's data. If the render thread is done before the simulation thread, the fourth buffer goes to the render thread so it can continue interpolating and rendering stuff. So here's the final flow:



Because of this buffered approach and the fact that the two threads were completely decoupled, synchronisation overhead is basically nonexistent. However, this system would not be suitable for usage in a real game, since it introduces impossibly large amounts of input latency to the game.
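
For completeness, the "fill in the blanks" interpolation on the render thread boils down to a linear blend between the two most recent snapshots, something along these lines (hypothetical names; the real demo blends whole cloud/lighting buffers rather than a single value):

struct Snapshot
{
    double simTime;     // simulation time this snapshot represents
    float  cloudValue;  // stand-in for the per-sprite data being blended
};

// Blend between the two most recent snapshots for the current render time.
float InterpolatedValue(const Snapshot& older, const Snapshot& newer, double renderTime)
{
    double span = newer.simTime - older.simTime;               // ~100ms at 10hz
    double t = (span > 0.0) ? (renderTime - older.simTime) / span : 1.0;
    if (t < 0.0) t = 0.0;                                      // clamp rather than extrapolate
    if (t > 1.0) t = 1.0;
    return older.cloudValue + static_cast<float>(t) * (newer.cloudValue - older.cloudValue);
}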

That said, thanks to the helpful members of GameDev.Net, I now have some ideas for how to implement this system in a real game.

Saturday, March 15, 2008

Clouds Demo

Well, the Clouds demo is pretty much done. It was written to test specific systems in the SolSector Engine: the resource management systems, the mesh/material/texture interaction systems, and the usage of advanced shaders.

Here's a youtube video:



The music is "Primavera" by Ludovico Einaudi. The sprites for the clouds were borrowed from this article at Gamasutra.

The demo features:
- DirectX10 3D renderer
- Dynamic, perlin-noise generated volumetric clouds
- High Dynamic Range lighting
- Dynamic day/night cycle
- Multithreaded, multi-core optimised design

It runs at a blazing 20fps on my Core 2 Duo E6600 and Geforce 8800GTS machine. It's completely CPU bound, because of the massive number of sprites I'm dealing with.

Cloud Generation
The clouds are generated through simple perlin noise, via libnoise. Since realism wasn't exactly a key goal here, I just needed a quick and dirty way to generate nice patterns. Remembering the Render>Clouds filter in Photoshop, I turned to perlin noise. Perlin noise is a type of coherent noise defined by Ken Perlin. It's seen a lot of use all over the place, especially in dynamically-generated content such as these clouds.

Originally, I had planned to use a more complex simulation to generate more realistic clouds. Initially, I took a look at the works presented by Dobashi et al. in their whitepaper. They utilised a simplified fluid dynamics simulation model, and during preproduction of the demo I attempted a quick implementation of their simulation. Needless to say, I failed. I didn't have the necessary physics background to understand what they were doing, and the simulation ran too slowly to be of any use anyway. It was at this point that I turned to simpler approaches. I found a very informative article on Gamasutra regarding the cloud rendering system used in Microsoft Flight Simulator 2004, authored by Niniane Wang of Microsoft. There was also a plethora of information available on GameDev.Net, but most of it wasn't all that relevant.

I chose perlin noise because it was simple. However, it was also quite slow. The clouds are essentially just a 128x128 grid of perlin noise - which means a grand total of 16,384 perlin noise samples to regenerate per tick. And it needed to be done on the CPU. I was able to offset this cost by implementing a multi-threaded system, which I'll talk about later. All you need to know right now is that, essentially, the generation of the perlin noise comes "for free" if you have a dual-core processor or better.

The perlin noise generated was on a 2D plane, however. To achieve real, proper volume, I'd need to generate unique perlin noise in all 3 dimensions. This would have greatly increased the number of perlin noise samples required per frame (roughly by an order of magnitude). So I faked it: I wrote a quick and dirty software bright-pass filter and applied it to the generated 2D perlin noise. This eliminated the darker regions of the noise and, when the layers were stacked above each other, gave a roughly pyramidal shape for each cloud. To scroll and billow the clouds, it was as simple as adding an offset to the inputs to the perlin noise generator.
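
The bright-pass trick boils down to something like this (illustrative only; the real demo works on libnoise output and its own data layout, and the threshold is a made-up parameter):

#include <cstddef>
#include <vector>

// Cull everything below the threshold so only the bright "cores" of the noise
// survive; stacking the layers then gives each cloud a roughly pyramidal shape.
std::vector<float> BrightPass(const std::vector<float>& noise, float threshold)
{
    std::vector<float> result(noise.size(), 0.0f);
    for (std::size_t i = 0; i < noise.size(); ++i)
    {
        if (noise[i] > threshold)
        {
            // Re-normalise the surviving range back to [0, 1] for use as opacity
            // (assumes the input noise has already been mapped to [0, 1]).
            result[i] = (noise[i] - threshold) / (1.0f - threshold);
        }
    }
    return result;
}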

To render the clouds themselves, a simple 128x128x10 sprite grid was placed, and each sprite was made to be more/less transparent, depending on the value of the perlin noise corresponding to each sprite.


I'll post more details about the workings of the Clouds demo later on. I'll also post a higher-res version of the demo video, as well as the actual executable if I can find somewhere to host it.

Sunday, March 9, 2008

Hello World

So this is the blogosphere. Not as glamorous as I had imagined.

A little about me: I'm a hobbyist game developer, and I like to write games in my spare time. I'm fairly well-versed in C++, and I know a little C# on the side. I'm currently working on an ambitious project named "SolSector Engine". I plan for it to be a full-scale, DX10 3D game engine written in C++.

What's with the name? Well, "SolSector" is actually the name for a future game that I and a friend plan to start someday. And the game needed an engine. So SolSector Engine was born. I usually shorten SolSector Engine to SSE, not to be confused with Intel's Streaming SIMD Extensions. If I mention SSE, I really do mean SolSector Engine and not Intel's SIMD instruction set.

So what is SolSector Engine capable of currently? The engine is still in its early phases of development. However, it's capable of some pretty neat stuff. It features a complete DirectX10 renderer, and systems for the automatic management of resources, sound, input, and windowing. Because it uses Direct3D10, it is Vista-exclusive.

Right now, I'm writing a small tech demo to test out the various features and subsystems of SolSector Engine. It's called "Clouds", and should be ready in the coming weeks. It won't be particularly pretty or anything, since it is designed to stress-test the engine, not to produce nice graphics (although that's a bonus).

I'll be making more posts about the Clouds demo and SSE in general later on.