JSON vs Binary Serialization
In this article I will discuss what binary serialization is and how it differs from the more typical JSON or XML serialization.
Table of Contents
- Introduction
- What is binary serialization?
- Why do I need binary serialization?
- How to do binary serialization?
- Conclusion
Introduction
In the modern world a lot of programmers have settled on one type of serialization, and that is the JSON format. The name stands for JavaScript Object Notation. It is a format that uses few symbols and little syntax but is still human readable. In essence it is still a text format.
In the older days people used XML to serialize data (and it is still used today, for example in the Microsoft Office formats, and in closely related markup such as HTML). XML has a lot more syntax than JSON but allows more metadata to be attached to each field in the form of tag attributes.
In general the idea of data serialization is to transform typical programming data objects or structs into a format (often text) that can be read later by the same or another application. The reading application might even be written in another language. This is why data serialization formats are usually simple: you can convert them into any language’s internal types.
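As a minimal sketch of that round trip, the Python snippet below turns an in-memory object into JSON text and reads it back. The `player` dictionary and its field names are made up for illustration.

```python
import json

# Hypothetical in-memory object; the field names are illustrative only.
player = {"name": "Ada", "level": 7, "alive": True}

# Serialize: the object becomes plain text that any language can parse.
text = json.dumps(player)

# Deserialize: the reader rebuilds its own native types from the text.
restored = json.loads(text)
```

The reader here happens to be Python as well, but any language with a JSON parser could consume the same text.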
What is binary serialization?
Binary serialization is the process of saving data in its raw form. To explain this best, consider the number “12345678”: a text format would save it as eight characters. An ASCII character is 1 byte (8 bits) in size. In comparison, signed integers up to 2,147,483,647 can be stored in only 4 bytes (32 bits). You can then add one more byte to indicate the type of these 4 bytes, which results in 5 B of data in total. Compared to the 8 B of text, this is a small difference.
But this example leaves out that a JSON file contains a lot more than the value itself. You need to include braces and a name for the field, as well as quotes and colons. This all adds up to many more bytes.
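The byte counts for this number can be checked with a short Python sketch using the standard `struct` module. The one-byte type tag value (`0x01`) is an assumption made for illustration and not part of any real format.

```python
import struct

number = 12345678

# Text form: one ASCII byte per digit.
as_text = str(number).encode("ascii")

# Binary form: a signed 32-bit little-endian integer ("<i").
as_binary = struct.pack("<i", number)

# Hypothetical one-byte type tag (0x01 = int32 here) followed by the value.
TAG_INT32 = 0x01
tagged = struct.pack("<Bi", TAG_INT32, number)

print(len(as_text), len(as_binary), len(tagged))  # 8 4 5
```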
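To make the overhead concrete, here is a small sketch that compares a one-field JSON document with the raw value. The field name `score` is made up, and the binary side assumes that writer and reader already agree on the field's type.

```python
import json
import struct

# Hypothetical record with a single named field.
record = {"score": 12345678}

# Compact JSON (no spaces after separators): {"score":12345678}
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Binary: both sides already know the field and its type, so only the value is stored.
as_binary = struct.pack("<i", record["score"])

print(len(as_json), len(as_binary))  # 18 4
```

The gap grows with every additional field, because each one repeats its name, quotes and punctuation in the text form.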
Why do I need binary serialization?
Binary serialization matters wherever you work with a lot of data, for example in video games and network communication. In video games you can have save files or level data that need to be read from files. The smaller the data, and the closer it is to the in-memory format that your language uses, the faster the level or save will load.
For network programming you want to send as little data as possible. This helps with lag and lost packets (since packets are generally smaller). It also helps reduce the cost of hosting, since most hosting services charge you for bandwidth. On mobile devices it also saves energy.
Some Charts
If you’re still not convinced, let me add some charts that show off the comparison between raw, FlatBuffers and JSON:
[Chart: comparison between raw, FlatBuffers and JSON]
And one for serialization:
[Chart: serialization comparison]
How to do binary serialization?
You can always go on and reinvent the wheel. The C++ language supports reading and writing raw bytes through its input and output streams. You will stumble upon a problem, though: how do you handle changes in the serialized format? This is where libraries like Protobuf come into play.
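To show what handling changes by hand looks like, here is a minimal sketch of a hand-rolled format with a version byte. The layout, the field names and the default value are all assumptions made up for this example. Schema-based libraries take care of this kind of bookkeeping for you.

```python
import struct

# Hypothetical save-file layout: a 1-byte version, then the fields of that version.
# Version 1: level (uint16). Version 2 adds: health (uint16).

def save_v2(level: int, health: int) -> bytes:
    return struct.pack("<BHH", 2, level, health)

def load(data: bytes) -> dict:
    version = data[0]
    if version == 1:
        (level,) = struct.unpack_from("<H", data, 1)
        return {"level": level, "health": 100}  # default for the missing field
    if version == 2:
        level, health = struct.unpack_from("<HH", data, 1)
        return {"level": level, "health": health}
    raise ValueError(f"unsupported version {version}")

old_save = struct.pack("<BH", 1, 3)  # as written by an older build
new_save = save_v2(3, 80)
```

Every new version adds another branch to `load`, which is exactly the maintenance burden that grows as a format evolves.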
Google has developed two great solutions to the serialization problem, and there are other formats as well:
- Protobuf – a library and a schema compiler that is heavily used with gRPC, Google’s other major technology for network communication. Protobuf works well for network messages, but the runtime library that parses the binary data is one more dependency that you must include in your project. This is not ideal if you only want to serialize data to local files and do not plan to use it with gRPC.
- FlatBuffers – another library developed by Google. FlatBuffers lays the buffer out so that it can be used directly once it is loaded into memory, so there is almost no parsing time. It still supports gRPC code generation and some other features such as reflection. The most convenient feature, though, is the ability to define structs and map them to your own types while the generated code stays efficient. The library is also minimal and has no other dependencies.
- Other formats – there are numerous other formats like BSON (a binary encoding of JSON-like documents), CBOR, the CDR encoding used by CORBA, etc. These formats are out there but, in my opinion, are not as user friendly, popular or fast as FlatBuffers or Protobuf.
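The idea of using a buffer directly once it is loaded, without a parsing step, can be sketched in a few lines. The fixed layout below is hypothetical and is NOT the FlatBuffers wire format. It only illustrates reading one field at a known offset without decoding the rest.

```python
import struct

# Hypothetical fixed layout: three little-endian float32 values (x, y, z).
# This is NOT the FlatBuffers wire format, only an illustration of the idea.
POSITION = struct.Struct("<fff")

# Pretend this buffer came straight from a file or a network packet.
buffer = b"".join(
    POSITION.pack(float(i), float(i) * 2, float(i) * 3) for i in range(1000)
)

def read_y(buf: bytes, index: int) -> float:
    # Jump to the record and read only the field we need; nothing else is decoded.
    offset = index * POSITION.size + 4  # skip x (4 bytes)
    return struct.unpack_from("<f", buf, offset)[0]

print(read_y(buffer, 10))  # 20.0
```

A text format would have to be parsed from the start before the same value could be found.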
Conclusion
So in summary, if you do serialization for game development and you have your own engine, you might consider using FlatBuffers. This library will allow you to save any state, level or data in a format that is simply read into memory, after which you can start working with it. It can be more efficient to use that kind of format for almost anything that you load into your game.
It is also very useful for network communication, where both the client and the server agree on the same format and it saves on packet size.