Introduction
A binary search cube is a linear data structure that contains a sorted two dimensional dynamic array of nodes which each point to a sorted array of key/value pairs. This creates a three dimensional structure with subsequently three axes. The first two axes function as a lookup table and do not contain any keys. As the axes are of roughly similar size the structure resembles a cube.
The cube can shrink and grow as elements are added. In an ideal situation the axes are of identical size, but it doesn't significantly degrade performance if they are not perfectly matched.
To find a key a binary search is performed on the x axis to find a matching y axis, a second binary search is performed on the y axis to find a matching z axis and a final binary search is performed to find a matching key on the z axis. Each binary search is performed in O(log (cbrt n)) time and the three searches combined are the equivalent of an O(log n) operation.
The term cbrt stands for cube root.
Insertion
Inserting or removing a node is an O(cbrt n) operation. If a node falls between two Z axes the search algorithm can be optimized to pick the lower order axis so the node is appended to the end, which is a faster operation. As a consequence a node will never be inserted on the 0 index with the exception of the floor axis.
The special case of a floor insertion can be checked at the start of the binary search, subsequent binary searches on the X, Y and Z axes can be optimized to deference the 0 index check.
Finding an index
Finding an index is an O(cbrt n) operation. In order to be able to find an index the volume of the X axes and each Y axis must be remembered. The average search time can be split in half by starting the index search at the end of the X axis if the index is greater than half the volume. With this optimization finding the first or last index is an O(1) operation.
The index search is optimal when the X and Y axes are of identical size. Finding the index on the Z axis is an O(1) operation.
Finding a key
The most efficient search approach appears to be a binary search using deferred detection of equality. Notably the detection of equality can be skipped for the X and Y axis.
In order to increase the access speed each axis can remember the floor key, which is all that is needed to perform a binary search using deferred detection of equality on the lower axes. Alternatively a cube of floor keys can be maintained, which results in better cache utilization.
For each axis there is a different optimal binary search. While each axis is of similar size the X axis is most likely to be available in the CPU cache. A boundless binary search works well for the X axis. A boundless quaternary binary search (see the
binary search section) is a good pick for the Z axis. I invented these binary searches and they can be twice as fast as the bound-based binary search used in most applications.
Balancing a binary search cube
Balancing a binary search cube is fairly simple. When adding a node to the Z axis and it exceeds twice the size of the X axis the Z axis is split. If this results in the underlying Y axis exceeding twice the size of the X axis the Y axis is split. These are O(cbrt n) operations.
It can be detected whether the cube is imbalanced by comparing the volume to the optimal volume. Shrinking an Y axis is a fast operation that is required infrequently at regular intervals.
Using a binary search cube as a priority queue
A binary search cube can be optimized to check the last index first, allowing O(1) push and pop operations.
Alternatively the last index of each axis can be checked first when performing a binary search. This allows O(m) push and pull operations, where m is the number of dimensions of the cube.
Memory usage
A binary search cube has less memory overhead than a binary search tree. A cube with 1,000,000 elements will have 100 X nodes and 10,000 Y nodes. A binary search tree of the same size requires 2,000,000 leaf node pointers, with some implementations requiring an additional 1,000,000 root node pointers.
CPU usage
While inserting and removing elements is an O(cbrt n) operation only half the nodes on an axis need to be moved on average. As moving memory within an array is extremely fast, insert operations on a binary search cube should outperform a balanced binary tree until the cube grows extremely large. A cube with 1 million nodes will on average require 50 nodes to be moved for an insertion, while a balanced binary tree has to move 20 nodes using more complex operations.
Binary Search Hyper Cube
The toll of insert and resize operations on large cubes can be reduced by using a hyper cube. The maximum length of an axis is set at 16 (or 32) and whenever the maximum volume is exhausted an extra dimension is added. The hyper cube starts out as a 1 dimensional array, turning into a 2 dimensional binary search square when the 17th element is added. When the 257th element is added the binary search square is split into a binary search cube with 3 dimensions, though this split can happen at the 129th element depending on the distribution. When the 65537th element is added the cube turns into a 4 dimensional tesseract. And so on.
Using static axis sizes will eliminate the need to resize Z axes as the cube grows and the average insertion (moving 8 elements) will become faster than O(log n) once a size of 256 elements is reached.
Looking up an index is an O(log n) operation, which would be the main advantage, and possibly the only advantage as there would be several drawbacks.
Cubesort
The average sorting speed is O(n log n) for any type of data. Cubesort is stable.
In order insertions and searches within a binary search cube give a significant cache advantage. Insertions to the end are very fast memory operations, while insertions to the front are very fast search operations.
A binary search cube can be much more rapidly converted to a 1 dimensional array (and back) than any other data structure. This gives an additional speed advantage when sorting arrays compared to tree based sorts.
Cubesort is fast for mostly sorted data. In the case of random data it exhausts the CPU cache at about 100K elements. It is well-suited as an online sort.
External Cubesort
The Z axis can be stored externally. If a binary search penteract (5 dimensions) is used with a volume of 10 billion nodes the V, W, X, and Y axes on a 32 bit system have a memory requirement of 800 MB. The disk space requirement for the Z axis will be 200 GB, existing of 1 billion files, each containing 2 KB of data (assuming there's a 4 byte key, and a 16 byte value). Only one file would need to be accessed for each key lookup.
If the filename of each file exists of the floor key it's possible to rebuild the cube using scandir() without having to read any files.
One Z axis can be kept in memory for each Y axis to function as a cache which would increase the memory usage to 3.2 GB.
It's also possible to have no memory requirements at all and use 5 scandir() calls to find a file.
Parallel Cubesort
As sorting partially sorted data with cubesort is substantially faster it's possible to divide the data in blocks of 100,000 elements and sort these in parallel, while the main process sorts the pre-sorted blocks one at a time.
Mergesort is poorly suited for this approach, especially when trying to merge 100K elements into a larger 10M elements array.
Fixed axis size
The lower axes can be given a fixed maximum size. The Z axes can be given a size of 32, the Y axes a size of 64, the X axes a size of 128, with the W axis of variable size. This type of search tesseract (4 dimensions) doesn't require resizing as it grows, has reasonable performance for both smaller and larger sizes, and allows fast insertions.
Resizing the search tesseract as it shrinks is optional and can be handled by keeping the average Z axis length above 8.
Fixed key size
If the key size is fixed (typically when the key is an integer) it's possible to implement a cube as a trie. A binary search is used to find a key fragment, with the exception that a key fragment can be found in O(1) time when it falls within the section of the axis that is in order.
When using this approach resizing is never necessary. Subsequently ideal distribution isn't guaranteed.
Iterating a binary search cube
Iterating is similar to iterating a multi-dimensional array. The main problem is how to handle changes to the cube during iteration. As a solution a global state can be introduced that is checked whenever an element is added or removed. Tree structures have the same problem but are much more robust.
Advantages
- Low memory overhead.
- Iterating a search cube is faster than a tree, making it an ideal data structure for programming languages.
- O(1) insertion of sorted data.
- Innate support for O(root n) index searches.
- Balancing is fast and infrequent, at regular intervals.
Disadvantages
- High memory overhead for small data sets.
- Complex data arrangement to increase cache utilization.
Gregorius van den Hoven, June 2014