Skip to content

4. Hashmaps

Think of a map or a dictionary as a collection of pairs with each pair having a key and a value.

hashmaps 1

Abstract Data Type: Map

Data:

  • A collection of pairs (k, v) of objects k and v, where k is the search key and v is the corresponding value.
  • The number of pairs in the collection.
Pseudocode UML Description
add(key, value) +add(key: K, value: V): void Adds the pair (k, v) to the dictionary.
remove(key) +remove(key: K): V Removes from the dictionary the entry that corresponds to a given search key.
get(key) +get(key: K): V Retrieves from the dictionary the value that corresponds to a given search key.
contains(key) +contains(key: K): boolean Sees whether any entry in the dictionary has a given search key.
getAllEntries() +getAllEntries: List<SimpleEntry<K, V>> Creates an ArrayList of all of the entries.
isEmpty() +isEmpty(): boolean Sees whether the dictionary is empty.
size() +size(): integer Gets the size of the dictionary.
clear() +clear(): void Removes all entries from the dictionary.

Interface of the ADT Map

import java.util.Iterator;

/** An interface for a dictionary with distinct search keys.
Search keys and associated values are not null. */
public interface DictionaryInterface<K, V> {
    /** Adds a new entry to this dictionary. If the given search key already exists in the dictionary, replaces the corresponding value.
    @param key An object search key of the new entry.
    @param value An object associated with the search key.
    @return Either null if the new entry was added to the dictionary or the value that was associated with key if that value was replaced. */
    public V add(K key, V value);

    /** Removes a specific entry from this dictionary.
    @param key An object search key of the entry to be removed.
    @return Either the value that was associated with the search key or null if no such object exists. */
    public V remove(K key);

    /** Retrieves from this dictionary the value associated with a given search key.
    @param key An object search key of the entry to be retrieved.
    @return Either the value that is associated with the search key or null if no such object exists. */
    public V get(K key);

    /** Sees whether a specific entry is in this dictionary.
    @param key An object search key of the desired entry.
    @return True if key is associated with an entry in the dictionary. */
    public boolean contains(K key);

    /** Creates an ArrayList of all of the entries.
    @return An ArrayList of all entries. */
    public List<SimpleEntry<K, V>> getAllEntries();

    /** Sees whether this dictionary is empty.
    @return True if the dictionary is empty. */
    public boolean isEmpty();

    /** Gets the size of this dictionary.
    @return The number of entries (key-value pairs) currently in the dictionary. */
    public int size();

    /** Removes all entries from this dictionary. */
    public void clear();
}

Hashing

When implementing a map, we will use an array with integer indexes for keys. The issue is essentially converting the key value into an integer index to store the value. We do this through hashing the key. Now if the total number of all possible keys is small, we can just use direct addressing to store key k in position $k$ of the array. However, this does not work with large possible keys, so we can hash the key into an integer with a hash function. Thankfully, every java Object has a hashCode() function that we can implement.

public int hashCode() {
    return 1;
}

So how do we define a good hashCode()? Which ever one that minimizes the total number of collisions. Sadly, there is not perfect hash function, so we have to deal with these collisisons somehow. One way is making a really good hashCode(). For example, with a string $S$ of length $N$ we can sum the ascii codes of each character: $\sum_{i=0}^{N}{rank(S[i])}$. This will still cause collisions, but it's better than using just the length for example. Using a polynomial hashing scheme we can achieve better results with some value $A\neq1$: $\sum_{i=0}^{N-1}{S[i]\cdot A^{N-1-i}}$. With this approach, we can get some really big numbers, so we must use compression mapping and the simplest way of doing it is: $|hashCode| \% N$. A better method is the multiply add and divide method: $|A\cdot hashCode+B| \% N$ with the best results when $A$, $B$, and $N$ are primes.

So does the polynomial hashing scheme and multiply add and divide methods solve our issues? No, collisions will still occur since they're unavoidable. What more can we do!!! Well, probing to search for the next open spot is one possible way. With linear probing you check if the current hashCode spot is open, if not, add one to the hashCode and continue till you get an open spot. This approach is fine, however, issues arise with deletion, tendency to form clusters, inefficiency with search, etc. So, let's try quadratic probing, it solves some of our issues, but not all of them, it will lessen the issues. Same idea as linear probing, but with a quadratic element to it (index $j$, length $N$, and interator $i$): $j=(hashCode + i^2) \% N$. We can also double hash to solve these issues with a secondary hash function stepHash(k): $j=|hashCode+i\cdot stepHash| \% N$. So far, we have only covered open addression, but what about separate chaining? Instead of going to the next available slot, we can create a chain of nodes to store the data at the correct hashCode. This is great, because we solve the issues of deletion, clustering, and all of that, but at the cost of space.

Implementation

This is an implementation from one of the recitation assignments.

import java.util.AbstractMap.SimpleEntry;
import java.util.*;

public class HashMap<K, V> implements DictionaryInterface<K, V> {
    // NOTES:
    // Using prime numbers lessen the number of collisions that will happen.
    // If the load factor > maxLoadFactor, then the hashmap will lose some efficiency. Load factors tell us when to resize our hashmap. 
    private static final int TABLE_SIZE[] = { 5, 11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421, 12853, 25717, 51437, 102877, 205759, 411527, 823117, 1646237, 3292489, 6584983, 13169977, 26339969, 52679969, 105359939, 210719881, 421439783, 842879579, 1685759167 };
    private static final int DEFAULT_CAPACITY = 11;
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    private LinkedList<SimpleEntry<K, V>>[] table;
    private int size;
    private final float maxLoadFactor;

    @SuppressWarnings("unchecked")
    private LinkedList<SimpleEntry<K, V>>[] createTable(int capacity) {
        for (int primeCapacity : TABLE_SIZE) {
            if (primeCapacity >= capacity) {
                capacity = primeCapacity;
                break;
            }
        }
        LinkedList<SimpleEntry<K, V>> storage[] = new LinkedList[capacity];

        this.size = 0;
        return storage;
    }

    public HashMap(int capacity, float maxLoadFactor) {
        this.table = createTable(capacity);
        this.maxLoadFactor = maxLoadFactor;
    }

    public HashMap() {
        this(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    public V get(K key) {
        if (key == null) {
            throw new NullPointerException();
        }

        int hash = key.hashCode() & Integer.MAX_VALUE;
        int index = hash % this.table.length;

        if (this.table[index] != null) {
            LinkedList<SimpleEntry<K, V>> list = this.table[index];
            Iterator<SimpleEntry<K, V>> iterator = list.iterator();

            while (iterator.hasNext()) {
                SimpleEntry<K, V> entry = iterator.next();

                if (entry.getKey().equals(key)) {
                    return entry.getValue();
                }
            }
        }

        return null;
    }

    public V add(K key, V value) {

        if (key == null || value == null) {
            throw new NullPointerException();
        }

        int hash = key.hashCode() & Integer.MAX_VALUE;
        int index = hash % this.table.length;

        if (this.table[index] == null) {
            this.table[index] = new LinkedList<SimpleEntry<K, V>>();
        }

        Iterator<SimpleEntry<K, V>> iterator = this.table[index].iterator();

        while (iterator.hasNext()) {
            SimpleEntry<K, V> entry = iterator.next();

            if (entry.getKey().equals(key)) {
                V oldValue = entry.getValue();

                entry.setValue(value);

                return oldValue;
            }
        }

        this.table[index].add(new SimpleEntry<>(key, value));
        this.size++;

        if ((float) this.size / this.table.length >= this.maxLoadFactor) {
            resizeRehash();
        }

        return null;
    }

    public V remove(K key) {
        int hash = key.hashCode() & Integer.MAX_VALUE;
        int index = hash % this.table.length;

        if (this.table[index] == null)
            return null;

        Iterator<SimpleEntry<K, V>> iter = this.table[index].iterator();
        while (iter.hasNext()) {
            SimpleEntry<K, V> entry = iter.next();
            if (!entry.getKey().equals(key))
                continue;
            iter.remove();
            this.size--;
            return entry.getValue();
        }
        return null;
    }

    public boolean contains(K key) {
        if (key == null) {
            throw new NullPointerException();
        }

        int hash = key.hashCode() & Integer.MAX_VALUE;
        int index = hash % this.table.length;

        if (this.table[index] != null) {
            Iterator<SimpleEntry<K, V>> iterator = this.table[index].iterator();

            while (iterator.hasNext()) {
                SimpleEntry<K, V> entry = iterator.next();
                if (entry.getKey().equals(key)) {
                    return true;
                }
            }
        }

        return false;
    }

    public List<SimpleEntry<K, V>> getAllEntries() {
        List<SimpleEntry<K, V>> list = new ArrayList<SimpleEntry<K, V>>();

        for (int i = 0; i < this.table.length; i++) {
            if (this.table[i] != null) {
                list.addAll(this.table[i]);
            }
        }

        return list;
    }

    private void resizeRehash() {
        LinkedList<SimpleEntry<K, V>>[] oldTable = this.table;
        this.table = createTable(oldTable.length * 2 + 1);

        for (int i = 0; i < oldTable.length; i++) {
            if (oldTable[i] != null) {
                Iterator<SimpleEntry<K, V>> iterator = oldTable[i].iterator();

                while (iterator.hasNext()) {
                    SimpleEntry<K, V> entry = iterator.next();

                    this.add(entry.getKey(), entry.getValue());
                }
            }
        }
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public int size() {
        return size;
    }

    public void clear() {
        this.table = createTable(this.table.length);
    }
}

Next Page →