Since I wrote about garbage collection last week, I’ve been thinking about the subtleties of memory. I, like many, non-traditional programmers, have learned about computer science from the top down (maybe middle to top to bottom if we include learning the fundamentals of OOP). I made my first rails project before I even heard the terms “multi-threading” or “call stack,” let alone understood them. As a result, I learned how to build things without having to learn why they are built that way.
I wanted to change that.
And, wouldn’t you know it, I immediately came across a distinction in two very common data structures that I found really striking. The data structures: arrays and linked lists. Learning the differences in the way they are handled by the computer can be a simple and yet profound way to start thinking about performance, complexity, and memory consumption.
First, though, let’s define a linked list. If my experience can be generalized, beginners from non-traditional backgrounds are likely to encounter arrays fairly early in their development, while linked lists come later with more formal discussion of DS&A (data structures and algorithms). Linked lists are foundational data structures that form the basis for such integral processes as the call stack itself. While the idea of “data structures” can be daunting, linked lists are actually really simple, conceptually speaking.
A linked list is formed by a series of nodes. Each node contains data (the thing we want to store) and a link to another node. In fact, there are two primary types of linked lists: singly-linked and doubly linked. Singly-linked lists have — you guessed it — a single link, one linking to the next node in the series. Doubly linked lists have two links, one to the next in the series, and one to the previous.
What About Arrays
Arrays, by contrast, have indices. That is, each item in an array has its own sub-address, like apartments in an apartment building. You don’t have to knock at each door to ask where the next door is. Assuming you know the apartment you’re looking for, you can just walk right up. That’s the power of arrays.
Something interesting to consider: arrays have a set length that must be declared before their creation. Most higher-level languages are very kindly and judiciously handling this step for us in the background, but it happens. This is related to the fact that arrays are stored sequentially in memory. Q.E.D., the place in memory where we store our array must consequently be large enough to store the entire array without any breaks or gaps in between.
This is where it gets interesting from a performance perspective. See, when we create an array, the computer is reserving a space of a specified size/length. This might be similar to getting a table at a restaurant (a pre-COVID analogy, I guess; for posterity, restaurants were these really big dining rooms where people used to meet to socialize and buy food that was still warm and not soggy from the takeout container). You might call ahead to get a table for six. If you have no reservation, you might be seated immediately but you’ll still need to specify how many people will eventually arrive.
Now, imagine you have a table for four and more friends arrive. They can’t all fit at the current table. What will the restaurant do to handle this? If they have room, they’ll probably move everyone to a new, larger table. Well, arrays work much the same way. When the size of an array exceeds the reserved memory, the computer finds a new, contiguous section of memory where the objects in the array can be comfortably seated together.
The above features of linked lists give us enough information to consider how each will perform, relative to the other, in a given situation, or when performing a given operation. So, let’s go ahead and look at a few:
Warning: To evaluate performance and complexity, I’m going to be discussing the Big O of certain operations. If you have no idea what that means, might I suggest a great blog by…
I’ve already strongly alluded to the issue of access above. Linked lists are like scavenger hunts; they require multiple steps to access anything except the beginning or end. Arrays are like apartment buildings, each item has an address we can access directly. So, the big O for each would be:
Array: O(1) or constant time
Linked List: O(n) or linear time
Insertion / Deletion
Arrays may win the battle for access but there’s a reason linked lists are still around. This goes back to our friends at dinner problem. If someone new arrives, there’s a chance we’ll have to relocate everything.
Arrays may spoil us as developers because it’s simple to tell a computer to shift and unshift items. Under the hood, though, a lot has to happen. When shifting (deleting from the beginning of an array) or unshifting (inserting at the beginning of an array), the computer has to relabel every subsequent item with the corrected index. The splice method makes inserting an element in the middle of the array feel like a breeze, but the same problem applies. Each subsequent item must be given an updated index.
Linked lists, on the other hand, excel at insertion and deletion, especially at the beginning of the list. To add or remove an item to the beginning of a linked list, we only need to modify the link in either the first or second node. To shift, we remove the link to the first node and relabel the second node as the beginning (or head). To unshift we label our new node the head and add a link to the previous first node. And done. No relabeling every item. If we have a doubly-linked list, the same is true for the end of the list. However, singly-linked lists require us to traverse the entire list to get to the end.
When inserting to the middle of a linked list, we still have to do some work to access the right node, but once we’re there, insertion and deletion are simple matters of fixing two links (maybe four for doubly-linked lists), so this particular contest is a wash.
- Insertion/Removal at Beginning of Array: O(n) or linear time.
- Insertion/Removal at End of Array: O(1) or constant time, with the caveat that, occasionally, inserting at the end of an array requires the whole array to be copied to a new location in memory costing us O(n).
- Insertion/Removal at Beginning of Linked List: O(1) or constant time.
- Insertion/Removal at End of Linked List: O(n) or linear time.
When considering what will perform best, it’s also important to think about space complexity, i.e., how much memory will be required. Linked lists require extra memory to store the titular links. However, arrays suffer from the friends-at-dinner-problem: they can fill up and require resizing/relocating. And of course, what happens if there aren’t any larger tables, i.e., what if there isn’t enough contiguous memory available for storing the entire updated array? With linked lists, there only needs to be enough space for the given node. It can be placed anywhere in memory and referenced by a link in the preceding node.
There are many trade-offs between arrays and linked lists that require critical evaluation. If your list will be large and constantly growing, the extra memory for a link might be well worth the cost on a large scale. If fast random access is important, or your list is relatively small and unchanging, arrays will save time and space. Linked lists use only as much memory as they need for the current list. Arrays reserve more memory than they need, so we might be wasting space. Regardless, thinking about the two is a great starting point for thinking about complexity in our own programs.