Finally Mastering Big O Notation

by Gordon Zhu, November 2022


Background
The root cause
A better way
A proxy for time
Time examples
A proxy for space
Space examples
Using big O notation

Background

It is extremely common for programmers of all levels to have a weak understanding of performance analysis with big O notation.

Beginners often refuse to engage altogether, believing the topic to be impractical or too difficult. Experienced programmers fare no better, as they are extremely susceptible to serious and persistent misconceptions.

This hurts their ability to solve problems that deal with large inputs and makes it impossible to effectively use data structures and algorithms. This creates pain in the workplace and in technical interviews.

The root cause

To reliably analyze performance, one must assume the form of a mathematical argument, where each assertion logically flows from previous steps. When approached in this way, the analysis can be done methodically and understood with logical certainty.

In other words, every performance analysis is, at its essence, a mathematical proof. If the assumptions and the logic behind the argument are correct, the analysis must be valid. If there is a flawed assumption or logical error, the analysis must be invalid.

The problem is that teaching people how to think logically and produce valid arguments is hard, and therefore a dicey approach, especially if your livelihood depends on high pass rates. In an effort to make the topic more approachable, well-meaning educators turn to a familiar tactic. Simply transform a difficult skill into a memorization and classification exercise. Suddenly it becomes a lot less like math and a lot more like identifying mammals at the zoo.

I was interviewing a candidate who had recently graduated from a well-known university with a master's in computer science. His first task was to find the largest integer in an array. He was not making any progress so I had to give him hints, but that was not the worst part. I asked him for the time and space complexity, and his answer made no sense at all.

He did that thing where he looked up and to the right, like he was trying to recall something. He gave another answer. It was also complete nonsense. I looked at him again. He blurted out something else.

How did we get here? In his algorithms and data structures class, the professor had given him a table of common algorithms and their performance figures. He was just blurting out random figures from that table, hoping he'd guess right.

Along with the table, students are given a series of 80% heuristics. I say 80% because they often work in the common cases but rarely in the tricky ones. The most popular ones are also the most damaging.

The issue is not that the hazy heuristics only work 80% of the time; it's that students have no clue about the underlying reasoning, and are therefore doomed to misapply them forever, often in comical ways.

Interviewers, knowing the weaknesses of the "table plus hazy heuristics approach", easily pin candidates down by focusing on the 20% cases. At top companies, experienced interviewers use this to efficiently filter out weak candidates.

A better way

The solution is to think about performance from the ground up, using logical reasoning rather than rough pattern-matching. As alluded to earlier, the approach should mirror the process of developing logical arguments or proofs.

To ease the difficulty, the approach should be methodical and deterministic. Students should be able to follow a series of simple, repeatable steps. If they apply the same method, they should get exactly the same answer. Since each step is precisely laid out, if there are logical errors, students should be able to identify the exact point(s) where they went wrong and why.

Though more difficult to learn initially, this method removes doubt from an otherwise uncertain process and helps students think about performance in a way that will scale up to problems of any difficulty.

A proxy for time

We could run a function with progressively larger inputs and record how long each one takes, but this would be noisy because of differences between machines. To remove the effect of our individual machines, we instead measure how much work each function does. The theory is, all else equal, more work should take longer.
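
As a quick illustration of that noise, here's a small sketch (my own, not part of the methodology) that times the same work a few times in a row. Even on a single machine, the numbers typically come back slightly different on every run, which is exactly why we count work instead.

function doWork() {
  let sum = 0;
  for (let i = 0; i < 10000000; i++) {
    sum = sum + i;
  }
  return sum;
}

// Time the exact same function three times. The durations will typically
// vary from run to run, and would vary even more across different machines.
for (let run = 1; run <= 3; run++) {
  let start = Date.now();
  doWork();
  console.log('run ' + run + ': ' + (Date.now() - start) + 'ms');
}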

Before we go further, I must introduce two definitions. I made them up, so don't be alarmed that you have never heard of them.

We can measure the amount of work by counting the total number of instructions. A single instruction is one step that contains no other instructions, and it counts as 1 unit of work. A compound instruction (such as a loop) contains other instructions; when we see one, we must determine how many single instructions it contains.

Time examples

All examples are in JavaScript. I've tried to use only essential language features so that the code can easily be understood even if you've never seen JavaScript.

Example 1:

let sum = 1 + 2;                  // 1
let name = 'Watch and Code';      // 1
let myObject = {};                // 1
myObject.type = 'Mac';            // 1

Each line above is a single instruction.
Altogether, we have 4 instructions.

Example 2:

for (let i = 0; i < 10; i++) {      // 10 *
  0 + 0;                              // 1
  0 + 0;                              // 1
}

We have a compound instruction because of the for-loop.
The for-loop will run 10 times.
The body has (1 + 1) = 2 instructions.
This gives us 10 * 2 = 20 instructions.

In the remaining examples, figure out how many instructions there would be if we were to call the function.

Example 3:

function example3() {
  let age = 0;      // 1
  age = age + 1;    // 1
  age = age + 1;    // 1
  return age;       // 1
}

We see that the function has 4 single instructions.
Therefore, we'd have 4 instructions in total.

Example 4:

// Assume n is a positive integer
function example4(n) {
  let age = 0;                    // 1

  for (let i = 0; i < n; i++) {   // n *
    age = age + 1;                  // 1
  }

  return age;                     // 1
}

The first line is 1 single instruction.

Next, we see that the loop will run n times.
The body has 1 single instruction.
This gives us n * (1) instructions.

The last line is 1 single instruction.

So in total, we have:
= 1 + n * (1) + 1
= n + 2
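
If you want to convince yourself of a count like this, one option is to instrument the function with a counter. Here's a sketch (my own addition, with a hypothetical name) that mirrors example4 and reports how many instructions ran:

function example4Counted(n) {
  let count = 0;

  let age = 0;                    // the first single instruction
  count = count + 1;

  for (let i = 0; i < n; i++) {
    age = age + 1;                // 1 instruction per iteration
    count = count + 1;
  }

  count = count + 1;              // count the final return statement
  return count;
}

console.log(example4Counted(5));  // 7, which matches n + 2 for n = 5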

Example 5:

function example5(array, targetElement) {
  for (let element of array) {        // array.length *
    if (element === targetElement) {    // 1
      return true;                    // 1 or 0
    }
  }

  return false;                       // 1 or 0
}

In this case, it *appears* that we will run the
loop array.length times. The body includes an
if-statement, which is a single instruction.
So far, this gives us array.length * (1), which
we can simplify to array.length.

When analyzing the return statements, we must be careful
to count only 1 of them since only one of them will run
once. That gives us a new total of (array.length + 1)
instructions.

However, this doesn't give us the full picture.
(array.length + 1) would be the count if the targetElement is
at the very end of the array or not there at all. But
what if targetElement is the first element in the array?
Then we'd have (1 + 1 = 2) instructions.

So what we are really trying to say is that we have a
best case of 2 instructions, a worst case of
(array.length + 1), and of course everything in between.

Typically, in our analyses, to avoid wishful thinking,
we will concern ourselves with the worst case (unless
otherwise stated). So our final count will be
array.length + 1.
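
To see the best and worst cases side by side, here's a counted version (my own sketch) of example5. Passing a targetElement that sits at the front of the array gives the best case; passing one that isn't in the array at all gives the worst case.

function example5Counted(array, targetElement) {
  let count = 0;

  for (let element of array) {
    count = count + 1;              // the if-statement
    if (element === targetElement) {
      return count + 1;             // plus 1 for this return
    }
  }

  return count + 1;                 // plus 1 for this return
}

console.log(example5Counted([5, 6, 7, 8], 5));  // 2 (best case)
console.log(example5Counted([5, 6, 7, 8], 9));  // 5 (worst case: array.length + 1)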

Example 6:

// Assume n is a positive integer.
function example6(n) {
  let sum = 0;                      // 1

  for (let i = 0; i < 33; i++) {    // 33 *
    sum = sum + i;                    // 1
    for (let j = 0; j < 12; j++) {    // 12 *
      sum = sum - 1;                    // 1
    }
  }

  for (let k = 0; k < 17; k++) {    // 17 *
    sum = sum + 1;                    // 1
  }

  return sum;                       // 1
}

The first line is 1 instruction.

The i-loop will run 33 times. Inside it,
there is 1 instruction and a j-loop that
runs 1 instruction 12 times.

So in total, the i-loop contains:
= 33 * (1 + 12 * (1))
= 33 * (1 + 12)
= 33 * (13)
= 429

The k-loop is more straightforward. It runs
1 instruction 17 times. So, 17 * (1) = 17.

The last line is 1 instruction.

Adding everything together, we have:
= 1 + 429 + 17 + 1
= 448

Example 7:

// Assume n is a positive integer.
function example7(n) {
  for (let i = 0; i < n; i++) {      // n *
    for (let j = 0; j < 7; j++) {      // 7 *
      for (let k = 0; k < 3; k++) {      // 3 *
        2 + 2;                             // 1
      }
    }
  }
}

Hopefully, you're starting to get the hang of
things now. It's easiest to focus on
the comments on the right side of each line.

Note how I always comment in the same way. Pay
particular attention to the *'s and indentation.

If we look at the comments, we have:
= n * (7 * (3 * (1)))
= n * (7 * 3)
= n * 21

Note how the parentheses exactly match the
nesting of the loops in the code.

Example 8:

// Assume n is a positive integer.
function example8(n) {
  for (let i = 0; i < n; i++) {      // 1 *
    for (let j = 0; j < 7; j++) {      // 1 *
      for (let k = 0; k < 3; k++) {      // 1 *
        let myValue = 0;                   // 1
        myValue++;                         // 1
        myValue++;                         // 1
        return myValue;                    // 1
      }
    }
  }
}

Here, we need to be careful and make sure we
understand how the code works. We know that
when we see a return, we will exit the function.

So the first time we get to `return myValue`,
we exit. Therefore, even though the loops seem like
they will iterate n, 7, and 3 times respectively,
in reality, each loop will actually iterate once.

So we have:
= 1 * (1 * (1 * (1 + 1 + 1 + 1)))
= 1 * (1 * (1 * (4)))
= 4

Example 9:

// Assume numbers is an array of numbers.
function example9(numbers) {
  let sum = 0;                      // 1

  for (let numberA of numbers) {    // numbers.length *
    sum += numberA;                   // 1

    for (let numberB of numbers) {    // numbers.length *
      sum += numberB;                   // 1
    }
  }

  return sum;                       // 1
}

The first line is 1 instruction.

In the middle we have:
= numbers.length * (1 + numbers.length * (1))
= numbers.length * (1 + numbers.length)
= numbers.length + numbers.length^2

The last line is 1 instruction.

If we add everything together, we get:
= numbers.length + numbers.length^2 + 2

Example 10:

function example10(array) {
  let halfway = array.length / 2;     // 1

  for (let i = 0; i < halfway; i++) { // ((length + 1) / 2) *
    array[i] = 0;                       // 1
  }
}

Assume length = array.length in the comments above
and discussion below.

Length will be even or odd. If odd, the loop will run
(length + 1) / 2 times. If even, the loop will run
(length / 2) times. We'll assume it's odd since we
want the worst case.

That gives us:
= 1 + ((length + 1) / 2) * (1)
= 1 + ((length + 1) / 2)
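
The odd-length case is easy to second-guess, so here's a quick check (my own sketch) that simply counts the loop iterations for an 11-element array:

function countHalfwayIterations(array) {
  let halfway = array.length / 2;   // 5.5 for an 11-element array
  let count = 0;

  for (let i = 0; i < halfway; i++) {
    count = count + 1;
  }

  return count;
}

console.log(countHalfwayIterations([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])); // 6, which is (11 + 1) / 2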

Example 11:

function example11(n) {
  let x = n;               // 1

  while (x > 1) {          // wtf?
    x = x / 2;               // 1
  }
}

In total, we have:
= 1 + wtf * (1)

The tricky part is to think about the wtf,
which represents how many times the loop
will run.

In the worst case, n will be a large positive
number. We'll have to keep dividing it by 2
until we get a value less than or equal to 1.

In other words, we could say that wtf is:
"How many times can n be divided by 2 
until it's less than or equal to 1?"

Luckily, there is shorthand notation from
math that expresses this question: log2(n).

Now that we know this much easier notation,
we can express the number of instructions
more succinctly without wtfs:

= 1 + wtf * (1)
= 1 + log2(n) * (1)
= 1 + log2(n)
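
If log2(n) feels abstract, here's a small sketch (my own) that counts the halvings directly and compares the count against Math.log2:

function countHalvings(n) {
  let x = n;
  let count = 0;

  while (x > 1) {
    x = x / 2;
    count = count + 1;
  }

  return count;
}

console.log(countHalvings(8));         // 3
console.log(Math.log2(8));             // 3
console.log(countHalvings(1000000));   // 20
console.log(Math.log2(1000000));       // about 19.93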

Looking back

In every example, we were able to find a mathematical expression that exactly represented the amount of work being done, based on the definitions for single and compound instructions. Aside from a few prerequisites (basic math and attention to detail), every person who follows the methodology should get precisely the same answer, every time.

Pedagogically, forcing students to count exactly the number of instructions is important. With this process, if the student is off by even 1, that serves as a strong warning that there is a crucial misconception that must be resolved. This allows misconceptions to easily surface for both students and instructors.

Moreover, because this process requires more detail than most methods, the incorrect answers themselves can point to the severity and nature of the misconception.

A proxy for space

Now that we have a way to measure time, let's turn our attention to space, or memory. The process here is very similar. We'll introduce a few rules and simple definitions, and then we'll apply them to actual programs. The result will be a mathematical expression that represents the amount of space a program uses.

Instead of measuring literal bits, we'll use an imaginary "memory unit". We do this because there are differences between how computers use memory. One computer might use 8, 16, 32, or 64 bits to represent the same thing. Our imaginary memory units allow us to ignore differences between machines.

Definitions

Each simple value (a number, a boolean, or a memory address) takes up 1 unit of memory, and an array takes up 1 unit of memory for each of its elements. We must also consider the effect that functions have on memory.

A function call itself takes up 1 unit of memory. This is simple enough, but since a function may call any number of functions, we must also consider how many functions are running at any given time. Consider the following:

function f2() {
  return;
}

function f1() {
  f2();
}

f1();

We call f1, which then calls f2. When f2 is running, so is f1. That means the computer must store data for both function calls simultaneously. Therefore peak memory usage for f1 is 2 units of memory.

The rest of the program uses 1 unit of memory for each function definition. So while f1 uses 2 units of memory, the entire program with function definitions uses 4 units of memory.

We must also consider local variables within functions. Consider the following:

function g2() {
  let four = 4;
  let five = 5;
  let six = 6;
  let seven = 7;
}

function g1() {
  let one = 1;
  let two = 2;
  let three = 3;
  g2();
}

g1();

Like the previous example, we will have at most two functions running simultaneously. So this gives us 2 units of memory. Then, we must also add the 7 local variables from g1 and g2 (3 from g1 and 4 from g2). This gives us a total peak memory usage of 2 + 7 = 9 for g1.

To compute the memory for the entire program, we must add 2 units of memory for the two function definitions. So total memory for the entire program is 9 + 2 = 11.

Functions and responsibility

When we talk about the memory used by a particular function, we only consider additional memory that would be used if you were to call the function. This is the perspective that's most commonly assumed when computer scientists talk about the performance of a function. Note that this perspective ignores the space required for the function definition itself. We are only concerned with the incremental memory required to call the function.

Any memory that is created outside of the function is not considered. For example, if you called function1 and passed in an array, function1 would not be responsible for any of the memory used by the array itself, since the array was created outside of the function. function1 would however be responsible for storing a copy of the memory address that points to the array. Assuming function1 doesn’t do anything else, the total memory responsibility is just 2 (1 for the array's memory address and 1 for the function call itself).

In another example, let's say you pass an array, testArray2, that has 50 elements into function2. At this point, function2 is responsible for storing a copy of the memory address that points to the existing array. Next, function2 adds 100 elements to testArray2. Even though testArray2 has 150 elements, function2 is only responsible for the 100 elements that were added by function2. So the total responsibility is 1 + 100 + 1 = 102 (1 for the array memory address, 100 for the new elements, and 1 for the function call itself).
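
To make that concrete, here's a sketch of the hypothetical function2 described above (the names are mine, just for illustration):

function function2(testArray2) {
  // function2's responsibility:
  //   1 for the call itself
  //   1 for testArray2 (a copy of the memory address, not the array)
  //   100 for the elements it adds below
  //   total: 102 units
  for (let i = 0; i < 100; i++) {
    testArray2[testArray2.length] = 0;
  }
}

let existingArray = [];
for (let i = 0; i < 50; i++) {
  existingArray[i] = i;   // these 50 elements are NOT function2's responsibility
}

function2(existingArray);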

Space examples

Example 1:

let x = 100;
x = x + 100;
x = x * 2;

Here we have just 1 unit of memory allotted for `x`,
which stores a number.

Example 2:

function increment20Times(number) {
  let result = number;

  for (let i = 0; i < 20; i++) {
    result = result + 1;
  }

  return result;
}

increment20Times uses 4 units of memory.

1 for the call.
1 to store the parameter `number`.
1 to store `result`.
1 for i.

Example 3:

function add1(inputArray) {
  inputArray[inputArray.length] = 1;
}

add1 uses 3 units of memory.

1 for calling add1.
1 for the local value inputArray, which is a memory
address to an array that already exists.
1 for the additional memory used to store the
extra element that was added to inputArray.

Example 4:

function copyArray(original) {
  const copy = [];

  for (let i = 0; i < original.length; i++) {
    copy[i] = original[i];
  }

  return copy;
}

copyArray uses
(4 + original.length) units of memory.

1 for the function call
1 for original (a memory address)
1 for copy (a memory address)
1 for i
original.length, for each element in the copied array

Using big O notation

This is the easiest, most trivial part. There are just two insanely simple rules that are nearly impossible to get wrong. As usual though, I want to explain the why behind everything.

So far, we've come up with mathematical expressions that described the amount of work a function does (our proxy for time) and the amount of memory it uses (our proxy for space).

If you are trying to push the performance of your programs to the absolute maximum, thinking in this much detail might be warranted. You might even want more detail. For example, whereas the smallest unit we counted was a single line of operations, if performance were critical enough, you might want to go down to the individual operations within each line. Whereas we counted imaginary "memory units", you could count bits if you desired.

However, most often, we are concerned with very broad categories of performance. In order to create these broad distinctions, we must look at what happens as the input grows arbitrarily large, towards infinity. This perspective (focusing on the input as it gets huge) will allow us to simplify our analysis greatly and put almost all common programs into just a handful of distinct buckets.

Take 2n^2 + 34n + 2 as an example. You can assume that n is something like the length of an array.

As n gets larger, the ratio of the last term (2) to the entire function approaches 0. The same is true of the middle term (34n). The opposite happens with the first term, 2n^2. As n grows, the ratio of 2n^2 to the entire function approaches 1.

What we’re observing is that the largest term will always dominate the others. And the others will be so insignificant that their share of the function will approach 0. This is the thinking behind dropping everything but the biggest term. This simplification yields: 2n^2 + 34n + 2 => 2n^2.
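
You can watch this happen numerically. Here's a quick sketch (my own) that computes the share of 2n^2 + 34n + 2 that comes from the 2n^2 term:

function shareOfBiggestTerm(n) {
  return (2 * n * n) / (2 * n * n + 34 * n + 2);
}

console.log(shareOfBiggestTerm(10));       // about 0.37
console.log(shareOfBiggestTerm(1000));     // about 0.983
console.log(shareOfBiggestTerm(1000000));  // about 0.999983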

For the same reason that we can drop smaller terms, we can also drop coefficients. Consider the case where you have 2n^2 + 1000000000n. 2n^2 will still dominate 1000000000n once n gets large enough. So if we are really analyzing performance broadly, we do not care about coefficients, and we throw them away.
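
Again, a quick numeric check (my own sketch): despite the enormous coefficient, 2n^2 eventually wins, with the crossover at n = 500,000,000.

function biggestTermWins(n) {
  return 2 * n * n > 1000000000 * n;
}

console.log(biggestTermWins(1000));        // false
console.log(biggestTermWins(500000000));   // false (the two terms are equal here)
console.log(biggestTermWins(500000001));   // true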

What happens in the case where we must compare two functions that have the form c * n^2, where c is some constant value? If we have 10 * n^2 and 1 * n^2, neither will dominate the other; their ratio will always be 10:1 for all n. While it is true that one is always 10 times faster, remember that our goal is to distill the performance of all algorithms into a handful of distinct buckets. If we incorporated c into our analysis, we'd have an infinite number of buckets, since c can be anything. Ignoring c allows us to lump all functions with the form c * n^2 into a single bucket rather than infinitely many.

So if we start from: 2n^2 + 1000000000n

= 2n^2 + 1000000000n // the starting expression
= 2n^2               // drop the lower terms
= n^2                // drop the coefficient (final result)

The notation for this method of performance analysis is called Big O notation, expressed as O(insert_simplified_expression).

So for the previous example, if we expressed the result using Big O notation, we would have O(n^2). In a sentence you would say, “this function runs in O of n-squared time”.

Using these steps, we can easily go back to the previous expressions and transform them using big O notation. Big O notation, it turns out, is the easiest part. Anyone can do it. The hard part is actually coming up with the initial expressions.
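
For instance, going back to a few of the expressions from the time examples (using my counts from above) and applying the two rules:

n + 2                                   => O(n)
array.length + 1                        => O(array.length), usually written as O(n)
448                                     => O(1), since no term grows with the input
numbers.length + numbers.length^2 + 2   => O(numbers.length^2), usually written as O(n^2)
1 + log2(n)                             => O(log2(n)), usually written as O(log n)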