Friday, March 5, 2010

Save Games and All That

I have long postponed a task to improve Detour serialization. My use of the word may be a bit wonky, but I mean the business storing and restoring the game state. In case of Detour this also affects how you would initialize the navmesh too.

In pretty much all game projects I have participated in save games have been pretty much retrofitted into the game. It usually results a lot of stupid zombie coding and super long debugging sessions.

When I started thinking about Detour, I have tried to make decision which eventually will make the serialization implementation a bit simpler. That goes into how data is kept internally and what kind of data the API returns. For example there is no pointers, but handles instead.


Handles

Handles are simple abstraction which allows to keep track references to your data. The difference to a pointer is that handles can be invalidated, and you don't need to worry about someone having a reference to your data. The next time the data is accessed, the access will just fail, in which case the user should also invalidate the handle.

Usually in pointer based systems you use some sort of callback system, which tells the pointer owner that something has changed and you should remove the reference. Or it can result things like smart pointers or reference counting.

As the navigation data is solely owned by the navmesh, ref counting and smart pointers would not be a good choice. I also wanted the system to load the navmesh data as one contiguous chunk, which makes stuff like the usual ref counting not really a good choice.

Another good thing about the handles are that they can be stored in the disk as simple integers. These two facts: 1) simple handling of invalid references and 2) simple storing of returned data, make handles good choice when you need to pass handles to internal data through your API.


Detour Handles

Each detour handle contains three values: salt, tile ID, and polygon ID. Tile ID is simply an index to internal Detour tile array, polygon ID is index to a polygon within that time and salt is special ingredient which tells which version of tile we are using. The tile ref is constructed so that 0 (zero) value an be used as null handle.

Detour has fixed pool of tiles it reuses. Removed tiles are returned back to the pool and new tiles are allocated from the pool. Each of those tiles are identified by and index to the tile.

Every time a tile is removed the salt of that tile is incremented. This operation makes sure that if a tile changes, all handles which are pointing to the old data can be detected.

This also implies that if we save a dtPolyRef (the handle Detour uses) to disk when a save game is stored, we need to also restore the state of the Detour mesh when we restore a save game so that the handles are valid after save game is restored.


Saving Game State

The least information that needs to be stored on disk is the tile index and salt of each tile that we have in the current state of the navmesh. This allows the handles to survive the serialization. Additionally we may want to store the per polygon flags and areas too, as the API allows to change that too.

That is still piece of cake. But this is where things can get a bit complicated.

Pretty much every single game I've seen out there uses different way to load assets and even more so how they store and restore save games.

When save game is restored, some games load and initialize a full "section" of the game and then load a delta state on top of that. Some may first store the structure of the level and then load the assets that are needed.

While they both may sound like almost the same, the difference in load performance can be huge. The first, full section loading, allows us to place all the assets in such order that they are fast to load from disk, while the second method may be simpler if the world is allowed to change a lot but may and will result more fragmented loading, in which case seeking may kill your loading performance.

Streaming changes the game a bit too. I have never worked on a game which would have excessively used streaming. So I'm reaching out to you to have a bit better understanding of save game with streaming.

How do you handle asset loading and how do you restore the game state from a save-game?


Current Sketch

My current idea of how the API for serializing Detour navmesh looks something like this.

dtNavMeshSerializer nser;
nser.storeDelta(navmesh);
fwrite(nser.getData(), nser.getDataSize(), fp);

The serializer will create a contiguous clump of data which can be directly stored into disk. The process to restore the state is as follows:

dtNavMeshSerializer ser;
if (!ser.init(data,dataSize))
return false;
for (int i = 0; i < ser.getNavMeshResourceCount(); ++i)
{
const dtTileRes& tres = ser.getNavMeshResource(i);
navmesh->restoreTileState(tres.tileId, tres.stateData, tres.stateDataSize);
}

This assumes that the structure of the navmesh has not changed in between. If the structure may change, you will need to load the tile data before restoring.

dtNavMeshSerializer ser;
if (!ser.init(data,dataSize))
return false;

dtNavMesh* navmesh = new dtNavMesh;

if (!navmesh->init(ser.getInitParams())
return false;

for (int i = 0; i < ser.getNavMeshResourceCount(); ++i)
{
const dtTileRes& tres = ser.getNavMeshResource(i);
int dataSize = 0;
unsigned char* data = 0;
// Load data from disk.
if (!m_assetMgr.getNavmeshTileData(tres.resourceId, &data, &dataSize))
return false;
navmesh->addTile(data, dataSize, false, tres.tileId, tres.tileSalt);
// Restore state
navmesh->restoreTileState(tres.tileId, tres.stateData, tres.stateDataSize);
}

Note how the addTile looks different than currently. The tiles don't actually make any sense in random order, so in practice there is no need to pass the tile x and y coordinates when tile is added. This data can be stored in the tile header and acquired from there. Also not that tile ID and tile salt are also restored when tile is loaded to make sure the handles remain valid. That m_assetMgr is your custom asset manager, which will actually handle loading the navmesh data. This example also assumes that the asset manager handles allocating and releasing the data.

In case you use Recast to dynamically create the tiles you can use serializer to store the whole navmesh data too.

dtNavMeshSerializer nser;
nser.storeAll(navmesh);
fwrite(nser.getData(), nser.getDataSize(), fp);

Restoring the state would look something like this:

dtNavMeshSerializer ser;
if (!ser.init(data,dataSize))
return false;

dtNavMesh* navmesh = new dtNavMesh;

if (!navmesh->init(ser.getInitParams())
return false;

for (int i = 0; i < ser.getNavMeshResourceCount(); ++i)
{
const dtTileRes& tres = nser.getNavMeshResource(i);
navmesh->addTile(tres.tileData, tres.tileDataSize, true, tres.tileId, tres.tileSalt);
}

This code also restores the tile ID and tile Salt so that handles remain valid, but it does not store state, since the tile data already contains the state data too.


Conclusion

My ideas towards implementing the serialization are getting together. I'm requesting comments. Each game has unique requirements for storing save games and loading assets. There is no way I can please all, but I can try to make it little less painful for as many of you as possible.

Please, let me know if I overlooked some scenario with my sketch. And also please share how do you handle asset loading and how do you restore the game state from a save-game? If you don't want to share it here in my blog, feel free to mail it to me too!

25 comments:

  1. Maybe I'm being naive here, but why would you need to store the state of your navigation for saving/loading?

    Couldn't you just as well just recalculate your path / look up your position in the navmesh -after- loading a game?

    I suppose you might end up with some determinism problems though.
    Am I missing something?

    Disclaimer: I haven't actually used Detour (yet)

    ReplyDelete
  2. That is one viable option too.

    The polygon IDs can be used for other things too, like marking on which polygon a cover point lies, etc.

    You may end up with slow save game loading if you recalculate everything on load.

    ReplyDelete
  3. I guess my requirements aren't much more than save the whole mesh at generation time and load the whole mesh at level load time in game.

    One thing that would be nice though, since you are considering an API performing file access, is to provide hooks for the application to use its own file handling routines. For example:

    typedef bool (*FileOpenCallback) (const char* filename, bool readonly, void** handle, void** userdata);
    typedef bool (*FileCloseCallback) (void* handle, void* userdata);
    typedef bool (*FileReadCallback) (void* buffer, size_t sizebytes, size_t count, void* handle, void* userdata);
    typedef bool (*FileWriteCallback) (const void* buffer, size_t sizebytes, size_t count, void* handle, void* userdata);

    struct SerializationConfig
    {
    FileOpenCallback file_open;
    FileCloseCallback file_close;
    FileReadCallback file_read;
    FileWriteCallback file_write;
    };

    // construct serializer using default file access callbacks
    dtNavMeshSerializer();

    // construct serializer using custom file access callbacks
    dtNavMeshSerializer(const SerializationConfig* config)

    It would also be useful to have something similar for memory allocations, with hints specifying the lifetime of the allocations:

    // callbacks for custom memory handling - allocates arrays of unsigned char
    typedef unsigned char* (*MemAllocCallback) (size_t size, int alloc_hint);
    typedef void (*MemFreeCallback) (unsigned char* ptr);

    struct AllocConfig
    {
    MemAllocCallback mem_alloc;
    MemFreeCallback mem_free;
    };

    ReplyDelete
  4. For me its unusual then subsystem store its state. Anim subsystem doesn't store all active bone pos \ orient on save even bone could be affected by external forces such as IK \ physics.
    I think its a user deal what to store.

    But single tile loading is good as is for streaming or precalculated destruction or environment morphing (I know that i could recalculate tile in realtime ). But in this case all paths following trou this tile must be invalidated \ recalculated too.

    ReplyDelete
  5. Is it possible to use 64 bit handles? Using 32bit from which 10 are reserved for salt gives us 22 bits for polygons & tiles, which is basically artificially limiting the amount of geometry you can possibly have. I'm actually in the situation where I'm exceeding the bounds given by the handle size.

    ReplyDelete
  6. You can try to typedef "dtTileRef" to a and unsigned 64bit int and try. I'm not 100% in what kind of things you may run into.

    Remember, the tile count is the number of active tiles. You can have 1000 tiles generated and stored on disk, and only use 100 at the time.

    ReplyDelete
  7. I've done that , fixed the problems on reference calculations, but i've got strange crashes on functions wich uses bvtree, is there something to modify that i may have forgot to?

    ReplyDelete
  8. I will need more context than that. Call stack, on which number, and are some variables obviously bad (i.e. tile pointer, indices).

    ReplyDelete
  9. http://pastebin.com/8AqgLZey this is the stacktrace of the crash

    ReplyDelete
  10. patsebin deleted it , it's http://pastebin.ca/1962125

    ReplyDelete
  11. i think i've got it, on dtLink ref's size changed from 4 bytes to 8 bytes , so i need to update all navmesh tiles files

    ReplyDelete
  12. Tiziano, that sounds like it could cause it. Let me know if that was it.

    ReplyDelete
  13. This comment has been removed by the author.

    ReplyDelete
  14. rebuilding the mega-navmesh will take some hours, it is the whole world of warcraft map
    and i'm 100% sure that it causes it,
    dtLink will get bigger and it is before bvTree on the file , so bvtree and everything after will get corrupted

    ReplyDelete
  15. i'm trying with a small piece of map, it doesn't find any polygon when searching start/end polygons, but polygons are loaded on the navmesh
    maybe it's better if you contact me somewhere on irc

    ReplyDelete
  16. I don't do irc or im. I have some time tomorrow to check the issue.

    ReplyDelete
  17. ok , you can do that patch on your repo, it's useful having support for more tiles

    i'm trying to find the problem myself anyway , it seems to not be related to the tree

    ReplyDelete
  18. polys[n++] = base | (dtPolyRef)node->i;
    if (!m_nav->isValidPolyRef(base | (dtPolyRef)node->i))
    printf("Invalid reference to polygon on bvtree @ queryPolygonsintile\n");
    that is the problem , base | (dtPolyRef)node->i does not give any valid reference

    ReplyDelete
  19. Take a look at change R235, with those changes I was able to change poly ref to "unsigned long long" (gcc 64 bit unsigned int) and everything worked fine.

    http://code.google.com/p/recastnavigation/source/detail?r=235

    Also take a look at the note at the top of the dtNavMesh.h

    ReplyDelete
  20. inline dtPolyRef encodePolyId(unsigned int salt, unsigned int it, unsigned int ip) const
    {
    return (salt << (m_polyBits+m_tileBits)) | (it << m_polyBits) | ip;
    }

    won't work , you have to cast every operand to 64 bit integer

    ReplyDelete
  21. Good catch, I think my test data was not as crazy as yours :) I added few more casts which should help some cases (R240).

    Other than that, does it work?

    ReplyDelete
  22. it works , it was not working cause i forgot some casts on nodes stuff,http://pastebin.ca/1963143 these are ref calculation functions with all needed casts
    in 2-3 hours it'll be tested on very large scale with tens of navmeshes with > 500 tiles each

    ReplyDelete
  23. it seems to be stable and work correctly.

    ReplyDelete
  24. on your repo you have still wrong casts , 1<<33 for example leads to overflow cause integer constants by default are 32 bit integers

    ReplyDelete
  25. Tiziano, I added casts to the 1's too. I will keep the API 32 bit (salt, it, and ip). When you are using 64bit polyrefs I think the only meaning configuration of bits are: 16bits for poly, 16 bits for salt and 32bits for tiles. There is hard limit of 65536 polygons, so 16 bits is fine for that, I cannot think any practical case where you could have 4G of tiles. If you had, you probably have lots of other problems too (such as floating point precision). 16bits for salt is great.

    ReplyDelete