Extends the the Java™ Collections Framework by providing type-specific maps, sets, lists and priority queues with a small memory footprint and fast access and insertion; it also includes a fast I/O API for binary and text files. It is free software distributed under the GNU Lesser General Public License.
The classes of this package specialize the most useful {@link java.util.HashSet}, {@link java.util.HashMap}, {@link java.util.LinkedHashSet}, {@link java.util.LinkedHashMap}, {@link java.util.TreeSet}, {@link java.util.TreeMap}, {@link java.util.IdentityHashMap}, {@link java.util.ArrayList} and {@link java.util.Stack} classes to versions that accept a specific kind of key or value (e.g., {@linkplain it.unimi.dsi.fastutil.ints.IntSet integers}). Besides, there are also several types of {@linkplain it.unimi.dsi.fastutil.PriorityQueue priority queues} and a large collection of static objects and methods (such as {@linkplain it.unimi.dsi.fastutil.objects.ObjectSets#EMPTY_SET immutable empty containers}, {@linkplain it.unimi.dsi.fastutil.ints.IntComparators#OPPOSITE_COMPARATOR comparators implementing the opposite of the natural order}, {@linkplain it.unimi.dsi.fastutil.ints.IntIterators#wrap(int[]) iterators obtained by wrapping an array} and so on.
To understand what's going on at a glance, the best thing is to look at the examples provided. If you already used the Collections Framework, everything should look rather natural. If, in particular, you use an IDE such as Eclipse, which can suggest you the method names, all you need to know is the right name for the class you need.
fastutil
5The release 5 of fastutil
needs Java 5, and
marks a revolution in the way the
classes are organized. Besides full support for generics, which can help
with object-based classes (but is rather irrelevant in type-specific classes),
fastutil
is now strongly based on what is probably the less hyped
and most important feature of Java 5: covariant return-type overriding.
A method x()
returning an object of type T
can now
be overriden by a method returning an object of type U
, where
U
is a subclass of (or implements) T
.
Covariant return-type overriding has always been supported by the JVM,
but was inaccessible at the syntax level. In Java 5 this irritant limitation
has been lifted, opening a new world of clarity and orthogonality in organizing
collection classes. In particular, interface strengthening makes it
possible to use all features of fastutil
classes without type
casting: for instance, {@link it.unimi.dsi.fastutil.ints.IntSet#iterator()}
is strengthened, w.r.t. {@link java.util.Set#iterator()}, so that it
returns an {@link it.unimi.dsi.fastutil.ints.IntIterator} instead of a
simple {@link java.util.Iterator}. Although type casting was previously used to simulate this
feature, it made using the hundreds of classes in fastutil
difficult and error prone, as every type cast had to be checked against the documentation.
Another advantage of covariant return-type overriding is that implementations can declare to return more specific (usually, more powerful) types than those specified by their interface. For instance, {@link it.unimi.dsi.fastutil.ints.IntAVLTreeSet#iterator()} returns actually an {@link it.unimi.dsi.fastutil.ints.IntListIterator IntListIterator}, whereas {@link it.unimi.dsi.fastutil.ints.IntSortedSet#iterator()} is specified as returning just a {@link it.unimi.dsi.fastutil.ints.IntListIterator IntBidirectionalIterator}.
Unfortunately, the Java™ Collections Framework
has obvious legacy problems, so the {@linkplain java.util.Map#keySet() key set}
of a {@link java.util.SortedMap} is not a {@link java.util.SortedSet}:
fastutil
5 makes a break with the past, giving the best possible
interface and implementation return types. This entails a number of source
incompatibilities, which we feel are worth the effort.
All data structures in fastutil
implement their standard
counterpart interface (e.g., {@link java.util.Map} for maps). Thus, they
can be just plugged into existing code, using the standard access methods
(of course, any attempt to use the wrong type for keys or values will
produce a {@link java.lang.ClassCastException}). However, they also provide
(whenever possible) many polymorphic versions of the most used methods that
avoid boxing/unboxing (and that before Java 5 caused
the tedious “type juggling” that was well known to Java
programmers). In doing so, they implement more stringent interfaces that
extend and strengthen the standard ones (e.g., {@link
it.unimi.dsi.fastutil.ints.Int2IntSortedMap} or {@link
it.unimi.dsi.fastutil.ints.IntListIterator}).
Of course, the main point of type-specific data structures is that the absence of wrappers around primitive types can increas speed and reduce space occupancy by several times. The presence of generics in Java does not change this fact, since there is no genericity for primitive types: if you are interested in performance, I suggest to turn on warnings for automatic boxing/unboxing of primitive types, and use type-specific methods instead.
The implementation techniques used in fastutil
are quite
different than those of {@link java.util}: for instance, open-addressing
hash tables, threaded AVL trees, threaded red-black trees and exclusive-or
lists. An effort has also been made to provide powerful derived objects and
to expose them overriding covariantly return types:
for instance, the {@linkplain it.unimi.dsi.fastutil.objects.Object2IntSortedMap#keySet() keys of sorted maps
are sorted} and iterators on sorted containers are always {@linkplain
it.unimi.dsi.fastutil.BidirectionalIterator bidirectional}.
More generally, the rationale behing fastutil
is that
you should never need to code explicitly natural
transformations. You do to not need to define an anonymous class to
iterate over an array of integers—just {@linkplain
it.unimi.dsi.fastutil.ints.IntIterators#wrap(int[]) wrap it}. You do not
need to write a loop to put the characters returned by an iterator into a
set—just {@linkplain
it.unimi.dsi.fastutil.chars.CharOpenHashSet#CharOpenHashSet(CharIterator)
use the right constructor}. And so on.
for collections, and
for maps.
By "type" here I mean a capitalized primitive type, {@link
java.lang.Object} or Reference
. In the latter case, we
are treating objects, but their equality is established by reference
equality (that is, without invoking equals()
), similarly
to {@link java.util.IdentityHashMap}. Of course, reference-based
classes are significantly faster.
Thus, an {@link it.unimi.dsi.fastutil.ints.IntOpenHashSet} stores integers efficiently and implements {@link it.unimi.dsi.fastutil.ints.IntSet}, whereas a {@link it.unimi.dsi.fastutil.longs.Long2IntAVLTreeMap} does the same for maps from longs to integers (but the map will be sorted, tree based, and balanced using the AVL criterion), implementing {@link it.unimi.dsi.fastutil.longs.Long2IntMap}. If you need additional flexibility in choosing your {@linkplain it.unimi.dsi.fastutil.Hash.Strategy hash strategy}, you can put, say, arrays of integers in a {@link it.unimi.dsi.fastutil.objects.ObjectOpenCustomHashSet}, maybe using the ready-made {@linkplain it.unimi.dsi.fastutil.ints.IntArrays#HASH_STRATEGY hash strategy for arrays}. A {@link it.unimi.dsi.fastutil.longs.LongLinkedOpenHashSet} stores longs in a hash table, but provides a predictable iteration order (the insertion order) and access to first/last elements of the order. A {@link it.unimi.dsi.fastutil.objects.Reference2ReferenceOpenHashMap} is similar to an {@link java.util.IdentityHashMap}. You can manage a priority queue of characters in a heap using a {@link it.unimi.dsi.fastutil.chars.CharHeapPriorityQueue}, which implements {@link it.unimi.dsi.fastutil.chars.CharPriorityQueue}. {@link it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList Front-coded lists} are highly specialized immutable data structures that store compactly a large number of arrays: if you don't know them you probably don't need them.
For a number of data structures that were not available in the
Java™ Collections Framework
when fastutil
was created, an object-based version is
contained {@link it.unimi.dsi.fastutil}, and in that case the prefix
Object
is not used (see, e.g., {@link it.unimi.dsi.fastutil.PriorityQueue}).
Since there are eight primitive types in Java, and we support
reference-based containers, we get 1323 (!) classes (some nonsensical
classes, such as Boolean2BooleanAVLTreeMap
, are not
generated). Many classes are generated just to mimic the hierarchy of
{@link java.util} so to redistribute common code in a similar way. There
are also several abstract classes that ease significantly the creation of
new type-specific classes by providing automatically generic methods based
on the type-specific ones.
The huge number of classes required a suitable division in subpackages
(more than anything else, to avoid crashing browsers with a preposterous
package summary). Each subpackage is characterized by the type of elements
or keys: thus, for instance, {@link it.unimi.dsi.fastutil.ints.IntSet}
belongs to {@link it.unimi.dsi.fastutil.ints} (the plural is required, as
int
is a keyword and cannot be used in a package name), as
well as {@link it.unimi.dsi.fastutil.ints.Int2ReferenceRBTreeMap}. Note
that all classes for non-primitive elements and keys are gathered in {@link
it.unimi.dsi.fastutil.objects}. Finally, a number of non-type-specific
classes have been gathered in {@link it.unimi.dsi.fastutil}.
The following table summarizes the available interfaces and implementations. To get more information, you can look at a specific implementation in {@link it.unimi.dsi.fastutil} or, for instance, {@link it.unimi.dsi.fastutil.ints}.
Interfaces | Abstract Implementations | Implementations |
---|---|---|
Collection | AbstractCollection | |
Set | AbstractSet | OpenHashSet, OpenCustomHashSet |
SortedSet | AbstractSortedSet | RBTreeSet, AVLTreeSet, LinkedOpenHashSet |
Map | AbstractMap | OpenHashMap, OpenCustomHashMap |
SortedMap | AbstractSortedMap | RBTreeMap, AVLTreeMap, LinkedOpenHashMap |
List | AbstractList | ArrayList, ArrayFrontCodedList |
PriorityQueue† | AbstractPriorityQueue† | HeapPriorityQueue, ArrayPriorityQueue |
IndirectPriorityQueue† | AbstractIndirectPriorityQueue† | HeapSemiIndirectPriorityQueue, HeapIndirectPriorityQueue, ArrayIndirectPriorityQueue |
IndirectDoublePriorityQueue† | AbstractIndirectDoublePriorityQueue† | HeapSesquiIndirectDoublePriorityQueue, HeapIndirectDoublePriorityQueue, ArrayIndirectDoublePriorityQueue |
Stack† | AbstractStack† | ArrayList |
Iterator | AbstractIterator | |
Comparator | AbstractComparator | |
BidirectionalIterator† | AbstractBidirectionalIterator | |
ListIterator | AbstractListIterator |
†: this class has also a non-type-specific implementation in {@link it.unimi.dsi.fastutil}.
Note that abstract implementations are named by prefixing the interface name with Abstract. Thus, if you want to define a type-specific structure holding a set of integers without the hassle of defining object-based methods, you should inherit from {@link it.unimi.dsi.fastutil.ints.AbstractIntSet}.
The following table summarizes static containers, which usually give rise both to a type-specific and to a generic class:
Static Containers |
---|
Collections |
Sets |
SortedSets |
Maps† |
SortedMaps |
Lists |
Arrays† |
Heaps |
SemiIndirectHeaps |
IndirectHeaps |
PriorityQueues† |
IndirectPriorityQueues† |
IndirectDoublePriorityQueues† |
Iterators |
Comparators |
Hash‡ |
HashCommon‡ |
†: this class has also a non-type-specific implementation in {@link it.unimi.dsi.fastutil}.
‡: this class has only a non-type-specific implementation in {@link it.unimi.dsi.fastutil}.
The static containers provide also special-purpose implementations for
all kinds of {@linkplain it.unimi.dsi.fastutil.objects.ObjectSets#EMPTY_SET empty
structures} (including {@linkplain
it.unimi.dsi.fastutil.objects.ObjectArrays#EMPTY_ARRAY arrays}) and
{@linkplain it.unimi.dsi.fastutil.ints.Int2IntMaps#singleton(int,int) singletons}.
Note: before fastutil
5, all empty set and
iterator structures had a single implementation. The introduction of generics
has made necessary an {@linkplain it.unimi.dsi.fastutil.ints.IntSets#EMPTY_SET empty set of integers},
an {@linkplain it.unimi.dsi.fastutil.bytes.ByteSets#EMPTY_SET empty set of bytes}, etc.
All classes are not synchronized. If multiple threads access one of these classes concurrently, and at least one of the threads modifies it, it must be synchronized externally. Iterators will behave unpredictably in the presence of concurrent modifications. Reads, however, can be carried out concurrently.
Reference-based classes violate the {@link java.util.Map}
contract. They intentionally compare objects by reference, and do
not use the equals()
method. They should be used only
when reference-based equality is desired (for instance, if all
objects involved are canonized, as it happens with interned strings).
Linked classes do not implement wholly the {@link java.util.SortedMap} interface. They provide methods to get the first and last element in iteration order, but any submap or subset method will cause an {@link java.lang.UnsupportedOperationException} (this may change in future versions).
Substructures in sorted classes allow the creation of
arbitrary substructures. In {@link java.util}, instead, you
can only create contained sub-substructures (BTW, why?). For instance,
(new TreeSet()).tailSet(1).tailSet(0)
will throw an exception, but {@link
it.unimi.dsi.fastutil.ints.IntRBTreeSet (new
IntRBTreeSet()).tailSet(1).tailSet(0)} won't.
Immutability is syntactically based (as opposed to
semantically based). All methods that are known not to be
causing modifications to the structure at compile time will not throw
exceptions (e.g., {@link it.unimi.dsi.fastutil.objects.ObjectSets#EMPTY_SET
EMPTY_SET.clear()}). All other methods will cause an {@link
java.lang.UnsupportedOperationException}. Note that (as of Java 5)
the situation in {@link java.util} is definitely different, and
inconsistent: for instance, in singletons add()
always
throws an exception, whereas remove()
does it only if the
singleton would be modified. This behaviour agrees with the interface documentation,
but it is nonetheless confusing.
The new interfaces add some very natural methods and strengthen many of
the old ones. Moreover, whenever possible, the object returned is type-specific,
or implements a more powerful interface. Before fastutil
5, the
impossibility of overriding covariantly return types made these features
accessible only by means of type casting, but fortunately this is no longer true.
More in detail:
fastutil
type you
would expect (e.g., the keys of an {@link
it.unimi.dsi.fastutil.ints.Int2LongSortedMap} are an {@link
it.unimi.dsi.fastutil.ints.IntSortedSet} and the values are a {@link
it.unimi.dsi.fastutil.longs.LongCollection}).
fastutil
type you would expect, too.
iterator()
are type-specific.
fastutil
return
type-specific {@linkplain
it.unimi.dsi.fastutil.BidirectionalIterator bidirectional
iterators}. This means that you can move back and forth among
entries, keys or values.
iterator(from)
which creates a type-specific {@link
it.unimi.dsi.fastutil.BidirectionalIterator} starting from a given
element of the domain (not necessarily in the set). See, for instance,
{@link it.unimi.dsi.fastutil.ints.IntSortedSet#iterator(int)}. The method is
implemented by all type-specific sorted sets and subsets.
new ObjectOpenHashSet( new String[] { "foo", "bar" } )
or just "unroll" the integers returned by an iterator into a list with
new IntArrayList( iterator )
There are a few quirks, however, that you should be aware of:
null
to denote the absence of a certain
pair. Rather, they return a {@linkplain
it.unimi.dsi.fastutil.ints.Int2LongMap#defaultReturnValue(long) default
return value}, which is set to 0 cast to the
return type (false
for booleans) at creation, but
can be changed using the defaultReturnValue()
method (see, e.g., {@link
it.unimi.dsi.fastutil.ints.Int2IntMap}). Note that changing the
default return value does not change anything about the data
structure; it is just a way to return a reasonably meaningful
result—it can be changed at any time. For uniformity reasons,
even maps returning objects can use
defaultReturnValue()
(of course, in this case the
default return value is initialized to null
). A
submap or subset has an independent default return value (which
however is initialized to the default return value of the
originator).rem()
on variables that are collections, but not
sets—for instance, {@linkplain
it.unimi.dsi.fastutil.ints.IntList type-specific lists}.
fastutil
provides a number of static methods and
singletons, much like {@link java.util.Collections}. To avoid creating
classes with hundreds of methods, there are separate containers for
sets, lists, maps and so on. Generic containers are placed in {@link
it.unimi.dsi.fastutil}, whereas type-specific containers are in the
appropriate package. You should look at the documentation of the
static classes contained in {@link it.unimi.dsi.fastutil}, and in
type-specific static classes such as {@link
it.unimi.dsi.fastutil.chars.CharSets}, {@link
it.unimi.dsi.fastutil.floats.Float2ByteSortedMaps}, {@link
it.unimi.dsi.fastutil.longs.LongArrays}, {@link
it.unimi.dsi.fastutil.floats.FloatHeaps} and {@link
it.unimi.dsi.fastutil.doubles.DoublePriorityQueues}. Presently, you can easily
obtain {@linkplain it.unimi.dsi.fastutil.objects.ObjectSets#EMPTY_SET empty collections},
{@linkplain it.unimi.dsi.fastutil.longs.Long2IntMaps#EMPTY_MAP empty
type-specific collections}, {@linkplain
it.unimi.dsi.fastutil.ints.IntLists#singleton(int) singletons},
{@linkplain
it.unimi.dsi.fastutil.objects.Object2ReferenceSortedMaps#synchronize(Object2ReferenceSortedMap)
synchronized versions} of any type-specific container and
unmodifiable versions of {@linkplain
it.unimi.dsi.fastutil.objects.ObjectLists#unmodifiable(ObjectList)
containers} and {@linkplain
it.unimi.dsi.fastutil.ints.IntIterators#unmodifiable(IntBidirectionalIterator) iterators} (of course,
unmodifiable containers always return unmodifiable iterators).
On a completely different side, the {@linkplain it.unimi.dsi.fastutil.ints.IntArrays type-specific static container classes for arrays} provide several useful methods that allow to treat an array much like an array-based list, hiding completely the growth logic. In many cases, using this methods and an array is even simpler then using a full-blown {@linkplain it.unimi.dsi.fastutil.doubles.DoubleArrayList type-specific array-based list} because elements access is syntactically much simpler. The version for objects uses reflection to return arrays of the same type of the argument.
For the same reason, fastutil
provides a full
implementation of methods that manipulate arrays as type-specific
{@linkplain it.unimi.dsi.fastutil.ints.IntHeaps heaps}, {@linkplain
it.unimi.dsi.fastutil.ints.IntSemiIndirectHeaps semi-indirect heaps} and
{@linkplain it.unimi.dsi.fastutil.ints.IntIndirectHeaps indirect heaps}.
Finally, fastutil
includes an {@linkplain
it.unimi.dsi.fastutil.io I/O package} that provides {@linkplain
it.unimi.dsi.fastutil.io.FastBufferedInputStream fast, unsynchronized
buffered input streams}, {@linkplain
it.unimi.dsi.fastutil.io.FastBufferedOutputStream fast, unsynchronized
buffered output streams}, and a wealth of static methods to store and
retrieve data in {@linkplain it.unimi.dsi.fastutil.io.TextIO textual} and
{@linkplain it.unimi.dsi.fastutil.io.BinIO binary} form.
fastutil
provides type-specific iterators and
comparators. The interface of a fastutil
iterator is
slightly more powerful than that of a {@link java.util} iterator, as
it contains a {@link it.unimi.dsi.fastutil.objects.ObjectIterator#skip(int)
skip()} method that allows to skip over a list of elements (an
{@linkplain
it.unimi.dsi.fastutil.objects.ObjectBidirectionalIterator#back(int) analogous
method} is provided for bidirectional iterators). For objects (even
those managed by reference), the extended interface is named {@link
it.unimi.dsi.fastutil.objects.ObjectIterator}; it is the return type, for
instance, of {@link
it.unimi.dsi.fastutil.objects.ObjectCollection#iterator()}.
fastutil
provides also classes and methods that makes it
easy to create type-specific iterators and comparators. There are abstract versions of
each (type-specific) iterator and comparator that implement in the
obvious way some of the methods (see, e.g., {@link
it.unimi.dsi.fastutil.ints.AbstractIntIterator} or {@link
it.unimi.dsi.fastutil.ints.AbstractIntComparator}).
A plethora of useful static methods is also provided by various type-specific static containers (e.g., {@link it.unimi.dsi.fastutil.ints.IntIterators}) and {@link it.unimi.dsi.fastutil.ints.IntComparators}: among other things, you can {@linkplain it.unimi.dsi.fastutil.ints.IntIterators#wrap(int[]) wrap arrays} and {@linkplain it.unimi.dsi.fastutil.ints.IntIterators#asIntIterator(java.util.Iterator) standard iterators} in type-specific iterators, {@linkplain it.unimi.dsi.fastutil.ints.IntIterators#fromTo(int,int) generate them} giving an interval of elements to be returned, {@linkplain it.unimi.dsi.fastutil.objects.ObjectIterators#concat(ObjectIterator[]) concatenate them} or {@linkplain it.unimi.dsi.fastutil.objects.ObjectIterators#pour(Iterator,ObjectCollection) pour them} into a set.
fastutil
offers three types of queues: direct
queues, indirect queues and double indirect
queues. A direct queue offers type-specific method to {@linkplain
it.unimi.dsi.fastutil.longs.LongPriorityQueue#enqueue(long) enqueue} and
{@linkplain it.unimi.dsi.fastutil.longs.LongPriorityQueue#dequeueLong()
dequeue} elements. An indirect queue needs a reference array,
specified at construction time: {@linkplain
it.unimi.dsi.fastutil.IndirectPriorityQueue#enqueue(int) enqueue} and
{@linkplain it.unimi.dsi.fastutil.IndirectPriorityQueue#dequeue()
dequeue} operations refer to indices in the reference array. The advantage
is that it may be possible to {@linkplain
it.unimi.dsi.fastutil.IndirectPriorityQueue#changed(int) notify the change}
of any element of the reference array, or even to {@linkplain
it.unimi.dsi.fastutil.IndirectPriorityQueue#remove(int) remove an arbitrary
element}. Finally, {@linkplain it.unimi.dsi.fastutil.IndirectDoublePriorityQueue double indirect queues} maintain at the same time two orders
(very commonly, opposite orders).
Queues have two implementations: a trivial array-based implementation, and a heap-based implementation. In particular, heap-based indirect queues may be {@linkplain it.unimi.dsi.fastutil.objects.ObjectHeapIndirectPriorityQueue fully indirect} or just {@linkplain it.unimi.dsi.fastutil.objects.ObjectHeapSemiIndirectPriorityQueue semi-indirect}: in the latter case, there is no need for an explicit indirection array (which saves one integer per queue entry), but not all operations will be avalable. Correspondingly, double priority queues have a {@linkplain it.unimi.dsi.fastutil.ints.IntHeapSesquiIndirectDoublePriorityQueue sesqui} (i.e., one-and-a-half) indirect implementation and a {@linkplain it.unimi.dsi.fastutil.ints.IntHeapIndirectDoublePriorityQueue fully indirect implementation}.
Sometimes, the behaviour of the built-in equality and hashing methods is
not what you want. In particular, this happens if you store in a hash-based
collection arrays, and you would like to compare them by equality. For this kind of applications,
fastutil
provides {@linkplain it.unimi.dsi.fastutil.Hash.Strategy custom hash strategies},
which define new equality and hashing methods to be used inside the collection. There are even
{@linkplain it.unimi.dsi.fastutil.ints.IntArrays#HASH_STRATEGY ready-made strategies} for arrays. Note, however,
that fastutil
containers do not cache hash codes, so custom hash strategies must be efficient.
fastutil
provides a wide range of abstract classes, to
help in implementing its interfaces. They take care, for instance, of
providing wrappers for non-type-specific method calls, so that you have to
write just the (usually simpler) type-specific version.
The main reason behind fastutil
is performance, both in
time and in space. The relevant methods of type-specific hash maps and sets
are something like 2 to 10 times faster than those of the standard
classes. Note that performance of hash-based classes on object keys is
usually worse (from a few percent to doubled time) than that of
{@link java.util}, because fastutil
classes do not cache hash
codes (albeit it will not be that bad if keys cache internally hash codes,
as in the case of {@link java.lang.String}). Of course, you can try to get
more speed from hash tables using a small load factors: to this purpose,
alternative load factors are proposed in {@link it.unimi.dsi.fastutil.Hash#FAST_LOAD_FACTOR}
and {@link it.unimi.dsi.fastutil.Hash#VERY_FAST_LOAD_FACTOR}.
For tree-based classes you have two choices: AVL and red-black trees. The essential difference is that AVL trees are more balanced (their height is at most 1.44 log n), whereas red-black trees have faster deletions (but their height is at most 2 log n). So on small trees red-black trees could be faster, but on very large sets AVL trees will shine. In general, AVL trees have slightly slower updates but faster searches; however, on very large collections the smaller height may lead in fact to faster updates, too.
fastutil
reduces enormously the creation and collection of
objects. First of all, if you use the polymorphic methods and iterators no
wrapper objects have to be created. Moreover, since fastutil
uses open-addressing hashing techniques, creation and garbage collection of
hash-table entries are avoided (but tables have to be rehashed whenever
they are filled beyond the load factor). The major reduction of the number
of objects around has a definite (but very difficult to measure) impact on
the whole application (as garbage collection runs proportionally to the
number of alive objects).
Whenever possible, fastutil
tries to gain some speed by
checking for faster interfaces: for instance, the various set-theoretic
methods addAll()
, retainAll()
, ecc. check whether
their arguments are type-specific and use faster iterators and accessors
accordingly.
Since deletions in hash tables are handled simply by tagging, they are very fast per se, but they tend to slow down subsequent accesses (with respect to a table with no deleted entries). In highly dynamical situations, where entries are continuously created and deleted, unsuccessful searches may take linear time (as all entries must be probed).
A partial solution to this problem (which has no known complete
solution if you use open addressing with double
hashing—cfr. Knuth's section on hashing in the third volume of
The Art of Computer Programming) is to call the
rehash()
method, which will try to rebuild the table
remapping all keys. There are also trim()
methods that
will reduce the table size if possible.
In other words, if your application requires inextricably interleaved
insertions, deletions and queries open-addressing hash-table
implementations (and in particular fastutil
classes) are not
the right choice.
Note, however, that fastutil
implements a special
optimization, usually not found elsewhere, that speeds up probes for
recently deleted entries. More details can be found in the documentation of
the {@link it.unimi.dsi.fastutil.Hash} interface.
The case of linked tables is even more problematic: the deletion of an item requires a linear probe of the links until the item is found, and thus it has potentially linear cost (however, this is not true if the deletion is performed by means of an iterator, or if you delete the last element).
The absence of wrappers makes data structures in fastutil
much smaller: even in the case of objects, however, data structures in
fastutil
try to be space-efficient.
To avoid memory waste, (unlinked) hash tables in
fastutil
keep no additional information about elements
(such as a list of keys). In particular, this means that enumerations
are always linear in the size of the table (rather than in the number
of keys). Usually, this would imply slower iterators. Nonetheless, the
iterator code includes a single, tight loop; moreover, it is possible
to avoid the creation of wrappers. These two facts make in practice
fastutil
iterators faster than {@link
java.util}'s.
The memory footprint for a table with n keys is exactly the memory required for the related types times n, plus a overhead of n bytes to store the state of each entry. The absence of wrappers around primitive types can reduce space occupancy by several times (this applies even more to serialized data, e.g., when you save such a data structure in a file). These figures can greatly vary with your virtual machine, JVM versions, CPU etc.
More precisely, when you ask for a map that will hold n elements with load factor 0 < f ≤ 1, p entries are allocated, where p is first prime in {@link it.unimi.dsi.fastutil.Hash#PRIMES} larger than n / f. Primes in {@link it.unimi.dsi.fastutil.Hash#PRIMES} are roughly multiplicatively spaced by 21/16, so you lose on average about 2% with respect to n / f.
When the table is filled up beyond the load factor, it is rehashed to a larger size. The growth is controlled by the growth factor, which can be set at any time. By default, the table size is doubled (for more information, see {@link it.unimi.dsi.fastutil.ints.IntOpenHashSet}), but you can trade speed for memory occupancy by {@linkplain it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap#growthFactor(int) setting a slower growth rate}.
In the case of linked hash tables, there is an additional vector of p integers that is used to store link information. Each element records the next and previous element indexes exclusive-or'd together. As a result, linked tables provide bidirectional iterators without having to store two pointers per entry (however, iterators starting from a given element require a linear probe to be initialized, unless the element is the last one).
Since hash codes are not cached, equality on objects is checked first by checking equality of their {@link java.lang.Object#hashCode() hashCode()}, and then using {@link java.lang.Object#equals(Object) equals()}. This turns out to increase slightly the performance, as many classes (including {@link java.lang.String}) cache their hash codes; in any case, the speed cannot reach that of {@link java.util}'s hash classes, which cache hash codes.
The balanced trees implementation is also very parsimonious.
fastutil
is based on the excellent (and unbelievably well
documented) code contained in Ben Pfaff's GNU libavl, which describes in
detail how to handle balanced trees with threads. Thus, the
overhead per entry is two pointers and one integer, which compares well to
three pointers plus one boolean of the standard tree maps. The trick is
that we use the integer bit by bit, so we consume two bits to store thread
information, plus one or two bits to handle balancing. As a result, we get
bidirectional iterators in constant space and amortized constant time
without having to store references to parent nodes.
It should be mentioned that all tree-based classes have a fixed overhead for some arrays that are used as stacks to simulate recursion; in particular, we need 48 booleans for AVL trees and 64 pointers plus 64 booleans for red-black trees.
Suppose you want to store a sorted map from longs to integers. The first step is to define a variable of the right interface, and assign it a new tree map (say, of the AVL type):
Long2IntSortedMap m = new Long2IntAVLTreeMap();
Now we can easily modify and access its content:
m.put( 1, 5 ); m.put( 2, 6 ); m.put( 3, 7 ); m.put( 1000000000L, 10 ); m.get( 1 ); // This method call will return 5 m.get( 4 ); // This method call will return 0
We can also try to change the default return value:
m.defaultReturnValue( -1 ); m.get( 4 ); // This method call will return -1
We can obtain a type-specific iterator on the key set:
LongBidirectionalIterator i = m.keySet().iterator(); // Now we sum all keys long s = 0; while( i.hasNext() ) s += i.nextLong();
We now generate a head map, and iterate bidirectionally over it starting from a given point:
// This map contains only keys smaller than 4 Long2IntSortedMap m1 = m.headMap( 4 ); // This iterator is positioned between 2 and 3 LongBidirectionalIterator t = m1.keySet().iterator( 2 ); t.previous(); // This method call will return 2 (t.next() would return 3)
Should we need to access the map concurrently, we can wrap it:
// This map can be safely accessed by many threads Long2IntSortedMap m2 = Long2IntSortedMaps.synchronize( m1 );
Linked maps are very flexible data structures which can be used to implement, for instance, queues whose content can be probed efficiently:
// This map remembers insertion order (note that we are using the array-based constructor) IntSortedSet s = new IntLinkedOpenHashSet( new int[] { 4, 3, 2, 1 } ); s.firstInt(); // This method call will return 4 s.lastInt(); // This method call will return 1 s.contains(5); // This method will return false IntBidirectionalIterator i = s.iterator( s.lastInt() ); // We could even cast it to a list iterator i.previous(); // This method call will return 1 i.previous(); // This method call will return 2 s.remove(s.lastInt()); // This will remove the last element in constant time
Now, we play with iterators. It is easy to create iterators over intervals or over arrays, and combine them:
IntIterator i = IntIterators.fromTo( 0, 10 ); // This iterator will return 0, 1, ..., 9 int[] a = new int[] { 5, 1, 9 }; IntIterator j = IntIterators.wrap( a ); // This iterator will return 5, 1, 9. IntIterator k = IntIterators.concat( new IntIterator[] { i , j } ); // This iterator will return 0, 1, ..., 9, 5, 1, 9
It is easy to build sets and maps on the fly using the array-based constructors:
IntSet s = new IntOpenHashSet( new int[] { 1, 2, 3 } ); // This set will contain 1, 2, and 3 Char2IntMap m = new Char2IntRBTreeMap( new char[] { '@', '-' }, new int[] { 0, 1 } ); // This map will map '@' to 0 and '-' to 1
Whenever you have some data structure, it is easy to serialize it in an efficient (buffered) way, or to dump their content in textual form:
BinIO.storeObject( s, "foo" ); // This method call will save s in the file named "foo" TextIO.storeInts( s.intIterator(), "foo.txt" ); // This method call will save the content of s in ASCII i = TextIO.asIntIterator( "foo.txt" ); // This iterator will parse the file and return the integers therein