Grokking Algorithms #5: Hash Tables

This post covers Chapter 5 of Grokking Algorithms, which introduces hash tables.

Hash Table

A hash table is a data structure that provides quick access to key-value pairs. It does this by using a hash function to determine where in memory a particular value is stored.

Since each key in a hash table must be unique, a key can predictably be translated into a memory offset.

Hash Function

This predictability is called determinism. A function that is deterministic always produces the same output when called with a particular input1.

In the case of a hash function, the function will produce the same memory offset for a particular key. This allows us to know the memory address of a given key’s associated value without storing that mapping in memory, which allows for constant-time retrieval. That’s fast!

Grocery Prices

We can define a map of grocery prices in Clojure as follows:

(def book
  {"apple"   0.67
   "milk"    1.49
   "avocado" 1.49})

This is fine if we know up front all the groceries that we plan to sell and their prices, but what if we want to add a new type of product to our inventory at a later time?

Immutability

One of Clojure’s strengths comes from its immutable data types, which cannot be changed once set. By default, all value types in Clojure are immutable.

This results in code that is easier to reason about, as unintended side effects are minimized. Additionally, concurrency issues are avoided and code can more easily be parallelized.

In order to allow changes to our inventory, we’ll use an atom.

Atoms

Atoms are one of Clojure’s mutable data types, allowing a value to be defined and subsequently updated.

(def book (atom {}))

(swap! book assoc "apple" 0.67)
(swap! book assoc "milk" 1.49)
(swap! book assoc "avocado" 1.49)

We start by defining book as an atom with a starting value of an empty map.

Let’s say we find out that we’re going to start selling apples at a price of $0.67 each. We can add that mapping to our atom by calling the swap! function. Here, we’re saying, update the atom book by calling assoc on it with the arguments "apple" and 0.67. We do the same for milk and avocado, both priced at $1.49.

To print out the value, we need to dereference the book atom to get its current value. We do this by prefixing book with @. If we don’t dereference the atom, we would get a reference to it rather than its current value.

; Print the reference to book
(println book)
=> #object[clojure.lang.Atom 0x378b038e {:status :ready, :val {apple 0.67, milk 1.49, avocado 1.49}}]

; Print the current value of book
(println @book)
=> {apple 0.67, milk 1.49, avocado 1.49}

To retrieve the price of a specific grocery, you would run the following:

(get @book "apple")

=> 0.67

Voting Booth

To build a voting system that only allows each person to vote once, we’ll use an atom again.

(def voted (atom {}))

(defn check-voter
  [name]
  (if (get @voted name)
    (println "kick them out!")
    (do
      (swap! voted assoc name true)
      (println "let them vote!"))))

We start by defining the variable voted, which is initialized to an atom with a starting value of {}.

The check-voter function will look for an entry in the current value of voted by the key name. If a value of true is found, we print out "kick them out!". Otherwise, we update voted with a new entry for the key name with a value of true and print out "let them vote!".

Let’s try it out!

(check-voter "tom")
=> let them vote!

(check-voter "mike")
=> let them vote!

(check-voter "mike")
=> kick them out!

Key Takeaways

  • Hash tables provide constant-time retrieval of stored values.
  • Atoms in Clojure provide a thread-safe mutable data type.

Footnotes

  1. Determinism is one of the benefits of pure functions, which makes them easy to test, since the output depends only on the inputs and not on any external factors (database records, the current time, etc.). ↩︎

Leave a comment