Grokking Algorithms #1: Binary Search

I’m awful at algorithms.

There, I said it. I could blame my college algorithms professor or the fact that I’ve never had to use fancy algorithms at work. But anytime I’m on the market for a new job, I’m confronted with the inevitable HackerRank or LeetCode challenge and I stumble, remembering my biggest weakness as a software engineer.

Now that I’m happy with my current job, I figure it’s a good time to brush up on my technical skills and tackle this knowledge gap of mine.

In this series, I’ll work through the book Grokking Algorithms by Aditya Bhargava. Each post will cover a chapter from the book and I’ll implement the presented algorithms in Clojure. I don’t intend to fully teach the concepts myself–the author does a much better job of that than I could–so it’s best that you follow along in the book.

If you already have experience with Clojure or another Lisp and want to improve your algorithms knowledge, this series is perfect for you. But even if you’re new to Clojure, this series can introduce you to the language and serve as a cookbook.

This post covers Chapter 1, which explores binary search.

Finding a Needle in a Haystack

At my first job, I worked at a company that provided merchant services to small businesses. One of the main projects I worked on allowed merchants to sign up for our services. The application was essentially a form containing a large number of fields. Once submitted, the logic on the server would validate the fields, massage the data, and then save to a database.

Occasionally, we would get a submission that would get past the validation logic and cause an exception. The error message didn’t make it clear which field was the problem, so I would take the JSON map of all of the fields and submit the form using Postman in our test environment.

Once I reproduced the error, I would submit the form again with only the first half of the fields. If the error came up, it meant the bad field was somewhere in the first half of the fields. If not, I would submit again with the second half. I would repeat this process until I found the single offending field.

I didn’t realize it at the time, but I was performing a binary search.

Binary Search

Binary search allows you to efficiently search an array of values by cutting the array in half over and over again. This only works with indexed collections1 containing sorted values.

We’ll start by defining a function that takes a sorted array of values, items, and a value to search for, item.

(defn binary-search
  [items item]
  
  )

Let’s also add a loop to our function. loop works well here because it doesn’t require us to know how many iterations we’ll need to make ahead of time. This means we can exit the loop as soon as we find the right value.

(defn binary-search
  [items item]
  (loop [low  0
         high (dec (count items))]
    )
  )

Here, we defined a loop with two arguments:

  • low with an initial value of 0
  • high with an initial value of (dec (count items)).

(count items) gives us the number of items in our items collection, while dec (short for decrement) decreases the value by one2. So the initial values of low and high are the indices of the first and last items in the items collection, respectively.

We need to add a couple of local variables to help us determine if we’ve found item or if we need to loop again. Let’s use let for this.

(defn binary-search
  [items item]
  (loop [low  0
         high (dec (count items))]
    (let [mid   (quot (+ low high) 2)
          guess (get items mid)]
      )
  ))

Here, we created a local scope with let and defined two variables scoped within the let form: mid and guess. mid is assigned the value (low + high) / 2, the midpoint between the lower and upper indices. guess is assigned the value in items at index mid.

Now we can check whether we’ve found our value or if we need to keep looking.

(defn binary-search
  [items item]
  (loop [low  0
         high (dec (count items))]
    (let [mid   (quot (+ low high) 2)
          guess (get items mid)]
      (cond
        (> low high)   :not-found                   ;not found
        (= guess item) mid                          ;found
        (> guess item) (recur low       (dec mid))  ;look in first half
        (< guess item) (recur (inc mid) high)))))   ;look in second half

We added a cond, which lets us branch on a series of conditions, returning the value corresponding to the first condition that resolves to a truthy3 value. Our cond has four scenarios:

  • low and high have crossed: return :not-found.
  • We found item: return its index, mid.
  • We guessed too high: recur with the new high value set to mid – 1.
  • We guessed too low: recur with the new low value set to mid + 1.

Brute Force

Now for comparison, let’s write another implementation using a simple, brute force approach.

(defn brute-force-search
  [items item]
  (loop [idx     0
         numbers items]
    (cond
      (= item (first numbers)) idx
      (empty? (rest numbers))  :not-found
      :else                    (recur (inc idx) (rest numbers)))))

We defined a brute force solution that loops through a sequence of numbers starting with the first element and exits a soon as the item is found.

It does this by keeping track of the current index with idx and the remaining items in items, which we’re calling numbers. On each iteration, it checks if the first item in numbers equals idx. If so, it returns the current index. If not, it will iterate again with idx incremented and the first item removed from numbers. If numbers contains no more items, we return :not-found.

Performance

Let’s compare the performance of our two solutions by searching for the number 78,900 in a sequence from 0 to 99,999. We’ll wrap our function call in a call to the time macro, which will evaluate the body and print the execution time.

(time
  (brute-force-search (range 100000) 78900))
"Elapsed time: 38.736485 msecs"
=> 78900

It took our brute force solution close to 39ms to find the number 78,900 in a list from 0 to 99,999.

Let’s see how our binary search algorithm does.

(let [items (vec (range 100000))]
  (time
    (binary-search items 78900)))
"Elapsed time: 0.092191 msecs"
=> 78900

As you can see, there’s a huge difference in the elapsed time of the brute force approach (~39ms) compared to the binary search approach (0.09ms). For larger collections, the performance difference would be even greater.

Conclusion

We implemented two functions that accomplish the same goal–finding an item in a sequential collection–but in different ways. Each approach took nearly the same amount of code.

Our brute force function is simpler to think about, but its performance is not great. On the other hand, our binary search implementation is more complex, but its performance if far superior.

Keep in mind that many algorithms depend on a specific data structure in order to work. In the case of binary search, that data structure is an indexed, sorted collection. Without it, the algorithm is useless. This is why algorithms are generally discussed in conjunction with the data structures they use.

Footnotes

  1. An indexed collection, like an array or vector, allows you to access any of its elements in constant time. ↩︎
  2. Array indices in most programming languages are zero-indexed, meaning they start from 0 and end with one less than the length of the array. So if you have an array of length 10, the indices go from 0 to 9. ↩︎
  3. A truthy value is one that is treated the same as true. In Clojure, false and nil are considered falsey, while everything else is considered truthy. ↩︎

Leave a comment