I would have liked to talk about Concepts Lite, but I’m still working on contiguous integration support for the examples I want to share. Maybe in a couple of weeks.
Today I will talk about a simple data structure, what I call “bitchunks”.
std::bitset
The C++ Standard Library ships with a template called std::bitset, which represents fixed-size sequences of bits. It works great for bit twiddling, giving a clean and sane “container like” interface instead of clever C macros:
It has however some caveats:
It’s statically sized: You cannot declare a bitset which size is known at runtime. There are some alternatives like boost::dynamic_bitset that solve this issue.
Gives access to individual bits only: The bitset[bit] syntax is nice, but there are situations where you want to manipulate more than one bit at once, like getting the first 52 bits off a 64 bit number.
The bitchunk template I show here is designed to solve the second issue.
bitchunk
While both host a value supposed to be manipulated at bit level, the goals of bitchunk are different from std::bitset:
The main goal is to, given a value of a basic type (Like int, float, etc), being able to extract specificchunksof bits from the value, manipulate that chunks; manipulating the value itself.
bitchunks should be “chunkable” too, in a way transparent to the user. That is, if we got the first three bits from a number, it should be possible to get the last two bits from that chunk (which are the bits 2-1 from the original number) maintaining bit ownership on the original number:
Bit manipulation tools
First I will write a set of simple C-like functions to do bit twiddling. This way we encapsulate all the bit manipulation code in a separated utility.hpp header simple to read and debug.
Raw data manipulation
Since the bitchunk template is supposed to work with any integral/floating-point type, I decided to declare an alias to the types I will use for bit twiddling:
The first declares the type we will work with when doing byte manipulation. The second is the type used for accessing bits by index.
Floating point values cannot be reinterpreted directly as integrals, so we should handle floats and integrals in different ways. Instead of having ugly casts across the code, I wrote a template that handles different cases through specialization, then a simple get/set interface on top of that:
Why passing by pointer by default? Just legacy, these functions were written for bitchunk implementation, and in most of the cases bitchunks operates on view mode, holding a pointer to the value instead of the value itself. See the bitchunk implementation bellow.
Bit manipulation functions
Just simple one-liners I hope my compiler will inline:
I’m not very proficient with bits, I’m sure there are better hacks than the functions above.
Finally, these functions use two simple constexpr recipes I usually have as part of the utility.hpp file of my projects:
Bitchunk implementation
So far we have covered the most boring part of the post. At least was the most boring for me to write…
Now let’s dive into bitchunk: Remember the purpose of this data type: To give access to chunks of bits from a value, so we can read some specific bits of a value, manipulate them, etc. All with a syntax simple for the user.
Also one of the key features I wanted when started with this is the ability of creating nested chunks, to be able to ask for a sub-chunk of another chunk.
For this reasons, I wrote bitchunk covering this two sides in two different specializations: In simple terms, the first specialization holds a full value and looks at its bits, and the second is a view to a specific range of bits from a value of type T.
Why not having bitchunk as a view class only? Mostly because it was designed with storage in mind. Also since the range of the view is defined at runtime, it’s not as efficient as it can (should?) be for the cases you want to look at the whole value (The range [0,sizeof_bits<T>)).
As you can see, it’s just a wrapper around a T value. The point of this specialization is to give storage for the T and the accessors the bitchunk needs to work (The range of bits it operates, accessors to the value, etc).
The second specialization has the same interface, but behaves as a view to a T instead of storage:
Covered the implementation details into different base classes, bitchunk looks like this:
Do you remember the sub-chunking ability I insisted above? The trick is that the view specialization of bitchunk_base is a view to any T, another bitchunks included.
Note how the constructor taking a bitchunk initializes the range: The range the user specifies is the range on that value, regardless its an integer value or a view. This was done to return bitchunks transparently. Imagine you have a network protocol specified as: Packets are stored as 32 bit integers, where the first (most significant bits) height bits are a header, and the other 24 data. From the header, its first two bits are an status code, the other six the destination address of the packet.
The code that implements packet processing could look like this:
The rest of the bitchunk code covers accessing to bitchunks:
and manipulating the value held/viewed by the bitchunk:
A real use case for bitchunks: Tagged pointers
I have the habit of implementing a weird technique just for fun. This time were tagged pointers. A month ago I was talking with a friend about how she could learn good C++, and as I usually do, I encouraged her to implement a string class. That way you learn about RAII, the rule of three/five, to hate raw memory management, and in the long term embrace the rule of zero by using std::string and not writing a custom destructor or assignment operator anymore. “Except for special (i.e. freak and funny) purposes” I said her.
“But you should note that the string implementation you know from the university, the simple dynamic array, is never used in real code because of performance considerations”. That’s the problem. Every CS grad knows how to write an string or a stack, but no one knows how real world data structures work. Teachers tell much about Big O, but nothing about hardware. In my experience, most of them still think that CPUs work like in the eighties, this year I have even had a two hours discussion with my algorithms teacher because of some timings I did for an assignment: “This should be wrong, if you do the math this says your CPU is executing 1.5 instructions per cycle on average. It’s impossible to execute more than one instruction per cycle.”.
So I started to work on simple examples of the most common tricks for string implementation, such as SSO, copy on write, etc. Then an idea came to my mind: What about tagged pointers?
The idea is to store string data on the pointer itself if the string is short enough, else allocate dynamic memory through that pointer. Both string length and operating mode (Short or wide string mode) are encoded into the tagged pointer data:
Note I’m relying on x86_64 48 bit addressing, not alignment. In that case, the unused address bits are the lower ones.
With the bitchunk template above, writing a simple wrapper for a tagged pointer was a matter of 3 minutes smashing the keyboard:
So far so good. The tagged_string class implements a simple string using a tagged pointer for storage, following the technique above:
The constructor takes an string literal, checks its length, and stores it applying the operating mode that fits to its length. The methods short_string_bit_(), string_length_(), etc are just sugar to the specific chunks from the tagged pointer:
The class also relies in another private methods in charge of accessing the string in the two different operating modes:
The only hard part was giving an interface to the characters. Since the access to the string is completely different in the two operating modes, I had to write a mechanism to hide all that complexity. I decided to write iterators that hold references to the string they traverse, then accessing to the [OP MODE]_get_char_(std::size_t i) methods above:
With the ugly iterators above the string class is completed with the classic range accessors, operator[], etc:
Summary
Everyday I saw people asking how to get good C++ skills, programming skills in general. I think the best practice is to focuse on a topic you find interesting and try to implement things from that field as if it was a true library/software component. Don’t get stuck in examples, keep diving in the technique, make the code truly multiplatform, check edge cases, etc.
This could be quite challenging. In this case, C++, to make Standard C++ code to work in the same way across all compilers is a pain: A source of debugging hours, headaches and lessons to learn (Most from MSVC :p).
Bitchunks were just an example of this, an idea that looks simple at the beginning, but takes a lot of hours to take it working everywhere the way you want. Here’s the repository for the case you want to look at the sources: Manu343726/strings.