Finally Mastering Big O Notation

by Gordon Zhu, November 2022


Background
The root cause
A better way
A proxy for time
Time examples
A proxy for space
Space examples
Using big O notation

Background

It is extremely common for programmers of all levels to have a weak understanding of performance analysis with big O notation.

Beginners often refuse to engage altogether, believing the topic to be impractical or too difficult. Experienced programmers fare no better, as they are extremely susceptible to serious and persistent misconceptions.

This hurts their ability to solve problems that deal with large inputs and makes it impossible to effectively use data structures and algorithms. This creates pain in the workplace and in technical interviews.

The root cause

To reliably analyze performance, one must assume the form of a mathematical argument, where each assertion logically flows from previous steps. When approached in this way, the analysis can be done methodically and understood with logical certainty.

In other words, every performance analysis is, at its essence, a mathematical proof. If the assumptions and the logic behind the argument are correct, the analysis must be valid. If there is a flawed assumption or logical error, the analysis must be invalid.

The problem is that teaching people how to think logically and produce valid arguments is hard, and therefore a dicey approach, especially if your livelihood depends on high pass rates. In an effort to make the topic more approachable, well-meaning educators turn to a familiar tactic. Simply transform a difficult skill into a memorization and classification exercise. Suddenly it becomes a lot less like math and a lot more like identifying mammals at the zoo.

I was interviewing a candidate who had recently graduated from a well-known university with a master's in computer science. His first task was to find the largest integer in an array. He was not making any progress so I had to give him hints, but that was not the worst part. I asked him for the time and space complexity, and his answer made no sense at all.

He did that thing where he looked up and to the right, like he was trying to recall something. He gave another answer. It was also complete nonsense. I looked at him again. He blurted out something else.

How did we get here? In his algorithms and data structures class, the professor had given him a table of common algorithms and their performance figures. He was just blurting out random figures from that table, hoping he'd guess right.

Along with the table, students are given a series of 80% heuristics. I say 80% because they often work in the common cases but rarely in the tricky ones. The most popular ones are also the most damaging.

The issue is not that the hazy heuristics only work 80% of the time; it's that students have no clue about the underlying reasoning, and are therefore doomed to misapply them forever, often in comical ways.

Interviewers, knowing the weaknesses of the "table plus hazy heuristics approach", easily pin candidates down by focusing on the 20% cases. At top companies, experienced interviewers use this to efficiently filter out weak candidates.

A better way

The solution is to think about performance from the ground up, using logical reasoning rather than rough pattern-matching. As alluded to earlier, the approach should mirror the process of developing logical arguments or proofs.

To ease the difficulty, the approach should be methodical and deterministic. Students should be able to follow a series of simple, repeatable steps. If they apply the same method, they should get exactly the same answer. Since each step is precisely laid out, if there are logical errors, students should be able to identify the exact point(s) where they went wrong and why.

Though more difficult to learn initially, this method removes doubt from an otherwise uncertain process and helps students think about performance in a way that will scale up to problems of any difficulty.

A proxy for time

We could run a function with progressively larger inputs and record how long each one takes, but this would be noisy because of differences between machines. To remove the effect of our individual machines, we instead measure how much work each function does. The theory is, all else equal, more work should take longer.
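
As a quick illustration of that noise, here's a small sketch (my own, not part of the methodology) that times the same work a few times in a row. Even on a single machine, the numbers typically come back slightly different on every run, which is exactly why we count work instead.

function doWork() {
  let sum = 0;
  for (let i = 0; i < 10000000; i++) {
    sum = sum + i;
  }
  return sum;
}

// Time the exact same function three times. The durations will typically
// vary from run to run, and would vary even more across different machines.
for (let run = 1; run <= 3; run++) {
  let start = Date.now();
  doWork();
  console.log('run ' + run + ': ' + (Date.now() - start) + 'ms');
}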

Before we go further, I must introduce two definitions. I made them up, so don't be alarmed that you have never heard of them.

We can measure the amount of work by counting the total number of instructions. A single instruction is one step that contains no other instructions, and it counts as 1 unit of work. A compound instruction (such as a loop) contains other instructions; when we see one, we must determine how many single instructions it contains.

Time examples

All examples are in JavaScript. I've tried to use only essential language features so that the code can easily be understood even if you've never seen JavaScript.

Example 1:

let sum = 1 + 2;                  // 1
let name = 'Watch and Code';      // 1
let myObject = {};                // 1
myObject.type = 'Mac';            // 1

Each line above is a single instruction.
Altogether, we have 4 instructions.

Example 2:

for (let i = 0; i < 10; i++) {      // 10 *
  0 + 0;                              // 1
  0 + 0;                              // 1
}

We have a compound instruction because of the for-loop.
The for-loop will run 10 times.
The body has (1 + 1) = 2 instructions.
This gives us 10 * 2 = 20 instructions.

In the remaining examples, figure out how many instructions there would be if we were to call the function.

Example 3:

function example3() {
  let age = 0;      // 1
  age = age + 1;    // 1
  age = age + 1;    // 1
  return age;       // 1
}

We see that the function has 4 single instructions.
Therefore, we'd have 4 instructions in total.

Example 4:

// Assume n is a positive integer
function example4(n) {
  let age = 0;                    // 1

  for (let i = 0; i < n; i++) {   // n *
    age = age + 1;                  // 1
  }

  return age;                     // 1
}

The first line is 1 single instruction.

Next, we see that the loop will run n times.
The body has 1 single instruction.
This gives us n * (1) instructions.

The last line is 1 single instruction.

So in total, we have:
= 1 + n * (1) + 1
= n + 2
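
If you want to convince yourself of a count like this, one option is to instrument the function with a counter. Here's a sketch (my own addition, with a hypothetical name) that mirrors example4 and reports how many instructions ran:

function example4Counted(n) {
  let count = 0;

  let age = 0;                    // the first single instruction
  count = count + 1;

  for (let i = 0; i < n; i++) {
    age = age + 1;                // 1 instruction per iteration
    count = count + 1;
  }

  count = count + 1;              // count the final return statement
  return count;
}

console.log(example4Counted(5));  // 7, which matches n + 2 for n = 5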

Example 5:

function example5(array, targetElement) {
  for (let element of array) {        // array.length *
    if (element === targetElement) {    // 1
      return true;                    // 1 or 0
    }
  }

  return false;                       // 1 or 0
}

In this case, it *appears* that we will run the
loop array.length times. The body includes an
if-statement, which is a single instruction.
So far, this gives us array.length * (1), which
we can simplify to array.length.

When analyzing the return statements, we must be careful
to count only 1 of them since only one of them will run
once. That gives us a new total of (array.length + 1)
instructions.

However, this doesn't give us the full picture.
(array.length + 1) would be the count if the targetElement is
at the very end of the array or not there at all. But
what if targetElement is the first element in the array?
Then we'd have (1 + 1 = 2) instructions.

So what we are really trying to say is that we have a
best case of 2 instructions, a worst case of
(array.length + 1), and of course everything in between.

Typically, in our analyses, to avoid wishful thinking,
we will concern ourselves with the worst case (unless
otherwise stated). So our final count will be
array.length + 1.
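
To see the best and worst cases side by side, here's a counted version (my own sketch) of example5. Passing a targetElement that sits at the front of the array gives the best case; passing one that isn't in the array at all gives the worst case.

function example5Counted(array, targetElement) {
  let count = 0;

  for (let element of array) {
    count = count + 1;              // the if-statement
    if (element === targetElement) {
      return count + 1;             // plus 1 for this return
    }
  }

  return count + 1;                 // plus 1 for this return
}

console.log(example5Counted([5, 6, 7, 8], 5));  // 2 (best case)
console.log(example5Counted([5, 6, 7, 8], 9));  // 5 (worst case: array.length + 1)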

Example 6:

// Assume n is a positive integer.
function example6(n) {
  let sum = 0;                      // 1

  for (let i = 0; i < 33; i++) {    // 33 *
    sum = sum + i;                    // 1
    for (let j = 0; j < 12; j++) {    // 12 *
      sum = sum - 1;                    // 1
    }
  }

  for (let k = 0; k < 17; k++) {    // 17 *
    sum = sum + 1;                    // 1
  }

  return sum;                       // 1
}

The first line is 1 instruction.

The i-loop will run 33 times. Inside it,
there is 1 instruction and a j-loop that
runs 1 instruction 12 times.

So in total, the i-loop contains:
= 33 * (1 + 12 * (1))
= 33 * (1 + 12)
= 33 * (13)
= 429

The k-loop is more straightforward. It runs
1 instruction 17 times. So, 17 * (1) = 17.

The last line is 1 instruction.

Adding everything together, we have:
= 1 + 429 + 17 + 1
= 448

Example 7:

// Assume n is a positive integer.
function example7(n) {
  for (let i = 0; i < n; i++) {      // n *
    for (let j = 0; j < 7; j++) {      // 7 *
      for (let k = 0; k < 3; k++) {      // 3 *
        2 + 2;                             // 1
      }
    }
  }
}

Hopefully, you're starting to get the hang of
things now. It's easiest to focus on
the comments on the right side of each line.

Note how I always comment in the same way. Pay
particular attention to the *'s and indentation.

If we look at the comments, we have:
= n * (7 * (3 * (1)))
= n * (7 * 3)
= n * 21

Note how the parentheses exactly match the
nesting of the loops in the code.

Example 8:

// Assume n is a positive integer.
function example8(n) {
  for (let i = 0; i < n; i++) {      // 1 *
    for (let j = 0; j < 7; j++) {      // 1 *
      for (let k = 0; k < 3; k++) {      // 1 *
        let myValue = 0;                   // 1
        myValue++;                         // 1
        myValue++;                         // 1
        return myValue;                    // 1
      }
    }
  }
}

Here, we need to be careful and make sure we
understand how the code works. We know that
when we see a return, we will exit the function.

So the first time we get to `return myValue`,
we exit. Therefore, even though the loops seem like
they will iterate n, 7, and 3 times respectively,
in reality, each loop will actually iterate once.

So we have:
= 1 * (1 * (1 * (1 + 1 + 1 + 1)))
= 1 * (1 * (1 * (4)))
= 4

Example 9:

// Assume numbers is an array of numbers.
function example9(numbers) {
  let sum = 0;                      // 1

  for (let numberA of numbers) {    // numbers.length *
    sum += numberA;                   // 1

    for (let numberB of numbers) {    // numbers.length *
      sum += numberB;                   // 1
    }
  }

  return sum;                       // 1
}

The first line is 1 instruction.

In the middle we have:
= numbers.length * (1 + numbers.length * (1))
= numbers.length * (1 + numbers.length)
= numbers.length + numbers.length^2

The last line is 1 instruction.

If we add everything together, we get:
= numbers.length + numbers.length^2 + 2

Example 10:

function example10(array) {
  let halfway = array.length / 2;     // 1

  for (let i = 0; i < halfway; i++) { // ((length + 1) / 2) *
    array[i] = 0;                       // 1
  }
}

Assume length = array.length in the comments above
and discussion below.

Length will be even or odd. If odd, the loop will run
(length + 1) / 2 times. If even, the loop will run
(length / 2) times. We'll assume it's odd since we
want the worst case.

That gives us:
= 1 + ((length + 1) / 2) * (1)
= 1 + ((length + 1) / 2)
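
The odd-length case is easy to second-guess, so here's a quick check (my own sketch) that simply counts the loop iterations for an 11-element array:

function countHalfwayIterations(array) {
  let halfway = array.length / 2;   // 5.5 for an 11-element array
  let count = 0;

  for (let i = 0; i < halfway; i++) {
    count = count + 1;
  }

  return count;
}

console.log(countHalfwayIterations([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])); // 6, which is (11 + 1) / 2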

Example 11:

function example11(n) {
  let x = n;               // 1

  while (x > 1) {          // wtf?
    x = x / 2;               // 1
  }
}

In total, we have:
= 1 + wtf * (1)

The tricky part is to think about the wtf,
which represents how many times the loop
will run.

In the worst case, n will be a large positive
number. We'll have to keep dividing it by 2
until we get a value less than or equal to 1.

In other words, we could say that wtf is:
"How many times can n be divided by 2 
until it's less than or equal to 1?"

Luckily, there is shorthand notation from
math that expresses this question: log2(n).

Now that we know this much easier notation,
we can express the number of instructions
more succinctly without wtfs:

= 1 + wtf * (1)
= 1 + log2(n) * (1)
= 1 + log2(n)
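
If log2(n) feels abstract, here's a small sketch (my own) that counts the halvings directly and compares the count against Math.log2:

function countHalvings(n) {
  let x = n;
  let count = 0;

  while (x > 1) {
    x = x / 2;
    count = count + 1;
  }

  return count;
}

console.log(countHalvings(8));         // 3
console.log(Math.log2(8));             // 3
console.log(countHalvings(1000000));   // 20
console.log(Math.log2(1000000));       // about 19.93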

Looking back

In every example, we were able to find a mathematical expression that exactly represented the amount of work being done, based on the definitions for single and compound instructions. Aside from a few prerequisites (basic math and attention to detail), every person who follows the methodology should get precisely the same answer, every time.

Pedagogically, forcing students to count exactly the number of instructions is important. With this process, if the student is off by even 1, that serves as a strong warning that there is a crucial misconception that must be resolved. This allows misconceptions to easily surface for both students and instructors.

Moreover, because this process requires more detail than most methods, the incorrect answers themselves can point to the severity and nature of the misconception.

A proxy for space

Now that we have a way to measure time, let's turn our attention to space, or memory. The process here is very similar. We'll introduce a few rules and simple definitions, and then we'll apply them to actual programs. The result will be a mathematical expression that represents the amount of space a program uses.

Instead of measuring literal bits, we'll use an imaginary "memory unit". We do this because there are differences between how computers use memory. One computer might use 8, 16, 32, or 64 bits to represent the same thing. Our imaginary memory units allow us to ignore differences between machines.

Definitions

Each simple value (a number, a boolean, or a memory address) takes up 1 unit of memory, and an array takes up 1 unit of memory for each of its elements. We must also consider the effect that functions have on memory.

A function call itself takes up 1 unit of memory. This is simple enough, but since a function may call any number of functions, we must also consider how many functions are running at any given time. Consider the following:

function f2() {
  return;
}

function f1() {
  f2();
}

f1();

We call f1, which then calls f2. When f2 is running, so is f1. That means the computer must store data for both function calls simultaneously. Therefore peak memory usage for f1 is 2 units of memory.

The rest of the program uses 1 unit of memory for each function definition. So while f1 uses 2 units of memory, the entire program with function definitions uses 4 units of memory.

We must also consider local variables within functions. Consider the following:

function g2() {
  let four = 4;
  let five = 5;
  let six = 6;
  let seven = 7;
}

function g1() {
  let one = 1;
  let two = 2;
  let three = 3;
  g2();
}

g1();

Like the previous example, we will have at most two functions running simultaneously. So this gives us 2 units of memory. Then, we must also add the 7 local variables from g1 and g2 (3 from g1 and 4 from g2). This gives us a total peak memory usage of 2 + 7 = 9 for g1.

To compute the memory for the entire program, we must add 2 units of memory for the two function definitions. So total memory for the entire program is 9 + 2 = 11.

Functions and responsibility

When we talk about the memory used by a particular function, we only consider additional memory that would be used if you were to call the function. This is the perspective that's most commonly assumed when computer scientists talk about the performance of a function. Note that this perspective ignores the space required for the function definition itself. We are only concerned with the incremental memory required to call the function.

Any memory that is created outside of the function is not considered. For example, if you called function1 and passed in an array, function1 would not be responsible for any of the memory used by the array itself, since the array was created outside of the function. function1 would however be responsible for storing a copy of the memory address that points to the array. Assuming function1 doesn’t do anything else, the total memory responsibility is just 2 (1 for the array's memory address and 1 for the function call itself).

In another example, let's say you pass an array, testArray2, that has 50 elements into function2. At this point, function2 is responsible for storing a copy of the memory address that points to the existing array. Next, function2 adds 100 elements to testArray2. Even though testArray2 has 150 elements, function2 is only responsible for the 100 elements that were added by function2. So the total responsibility is 1 + 100 + 1 = 102 (1 for the array memory address, 100 for the new elements, and 1 for the function call itself).
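
To make that concrete, here's a sketch of the hypothetical function2 described above (the names are mine, just for illustration):

function function2(testArray2) {
  // function2's responsibility:
  //   1 for the call itself
  //   1 for testArray2 (a copy of the memory address, not the array)
  //   100 for the elements it adds below
  //   total: 102 units
  for (let i = 0; i < 100; i++) {
    testArray2[testArray2.length] = 0;
  }
}

let existingArray = [];
for (let i = 0; i < 50; i++) {
  existingArray[i] = i;   // these 50 elements are NOT function2's responsibility
}

function2(existingArray);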

Space examples

Example 1:

let x = 100;
x = x + 100;
x = x * 2;

Here we have just 1 unit of memory allotted for `x`,
which stores a number.

Example 2:

function increment20Times(number) {
  let result = number;

  for (let i = 0; i < 20; i++) {
    result = result + 1;
  }

  return result;
}

increment20Times uses 4 units of memory.

1 for the call.
1 to store the parameter `number`.
1 to store `result`.
1 for i.

Example 3:

function add1(inputArray) {
  inputArray[inputArray.length] = 1;
}

add1 uses 3 units of memory.

1 for calling add1.
1 for the local value inputArray, which is a memory
address to an array that already exists.
1 for the additional memory used to store the
extra element that was added to inputArray.

Example 4:

function copyArray(original) {
  const copy = [];

  for (let i = 0; i < original.length; i++) {
    copy[i] = original[i];
  }

  return copy;
}

copyArray uses
(4 + original.length) units of memory.

1 for the function call
1 for original (a memory address)
1 for copy (a memory address)
1 for i
original.length, for each element in the copied array

Using big O notation

This is the easiest, most trivial part. There are just two insanely simple rules that are nearly impossible to get wrong. As usual though, I want to explain the why behind everything.

So far, we've come up with mathematical expressions that described the amount of work a function does (our proxy for time) and the amount of memory it uses (our proxy for space).

If you are trying to push the performance of your programs to the absolute maximum, thinking in this much detail might be warranted. You might even want more detail. For example, whereas the smallest unit we counted was a single line of operations, if performance were critical enough, you might want to go down to the individual operations within each line. Whereas we counted imaginary "memory units", you could count bits if you desired.

However, most often, we are concerned with very broad categories of performance. In order to create these broad distinctions, we must look at what happens as the input grows arbitrarily large, towards infinity. This perspective (focusing on the input as it gets huge) will allow us to simplify our analysis greatly and put almost all common programs into just a handful of distinct buckets.

Take 2n^2 + 34n + 2 as an example. You can assume that n is something like the length of an array.

As n gets larger, the ratio of the last term (2) to the entire function approaches 0. The same is true of the middle term (34n). The opposite happens with the first term, 2n^2. As n grows, the ratio of 2n^2 to the entire function approaches 1.

What we’re observing is that the largest term will always dominate the others. And the others will be so insignificant that their share of the function will approach 0. This is the thinking behind dropping everything but the biggest term. This simplification yields: 2n^2 + 34n + 2 => 2n^2.
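
You can watch this happen numerically. Here's a quick sketch (my own) that computes the share of 2n^2 + 34n + 2 that comes from the 2n^2 term:

function shareOfBiggestTerm(n) {
  return (2 * n * n) / (2 * n * n + 34 * n + 2);
}

console.log(shareOfBiggestTerm(10));       // about 0.37
console.log(shareOfBiggestTerm(1000));     // about 0.983
console.log(shareOfBiggestTerm(1000000));  // about 0.999983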

For the same reason that we can drop smaller terms, we can also drop coefficients. Consider the case where you have 2n^2 + 1000000000n. 2n^2 will still dominate 1000000000n once n gets large enough. So if we are really analyzing performance broadly, we do not care about coefficients, and we throw them away.
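
Again, a quick numeric check (my own sketch): despite the enormous coefficient, 2n^2 eventually wins, with the crossover at n = 500,000,000.

function biggestTermWins(n) {
  return 2 * n * n > 1000000000 * n;
}

console.log(biggestTermWins(1000));        // false
console.log(biggestTermWins(500000000));   // false (the two terms are equal here)
console.log(biggestTermWins(500000001));   // true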

What happens in the case where we must compare two functions that have the form c * n^2, where c is some constant value? If we have 10 * n^2 and 1 * n^2, neither will dominate the other; their ratio will always be 10:1 for all n. While it is true that one is always 10 times faster, remember that our goal is to distill the performance of all algorithms into a handful of distinct buckets. If we incorporated c into our analysis, we'd have an infinite number of buckets, since c can be anything. Ignoring c allows us to lump all functions with the form c * n^2 into a single bucket rather than infinitely many.

So if we start from: 2n^2 + 1000000000n

= 2n^2 + 1000000000n // the starting expression
= 2n^2               // drop the lower terms
= n^2                // drop the coefficient (final result)

The notation for this method of performance analysis is called Big O notation, expressed as O(insert_simplified_expression).

So for the previous example, if we expressed the result using Big O notation, we would have O(n^2). In a sentence you would say, “this function runs in O of n-squared time”.

Using these steps, we can easily go back to the previous expressions and transform them using big O notation. Big O notation, it turns out, is the easiest part. Anyone can do it. The hard part is actually coming up with the initial expressions.
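
For instance, going back to a few of the expressions from the time examples (using my counts from above) and applying the two rules:

n + 2                                   => O(n)
array.length + 1                        => O(array.length), usually written as O(n)
448                                     => O(1), since no term grows with the input
numbers.length + numbers.length^2 + 2   => O(numbers.length^2), usually written as O(n^2)
1 + log2(n)                             => O(log2(n)), usually written as O(log n)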