What does it take to build a successful open source library? I imagined that most packages had a complex structure, with layers and abstractions that only a select few could understand. But I wanted to explore actual projects, not read about best practices in isolation.
You know what they say about writing: to be a good writer, you first need to be a reader. I decided to apply the same advice to programming and go through a few open source projects. I opened npm’s list of most downloaded packages and started with Lodash’s source code, since it was at the top with 20-40 million weekly downloads.
Lodash’s Simplicity
Lodash is a popular toolbox library, a collection of utility functions. It’s been around for many years and, even though some of its most important functionality has since been added to the language itself, it still sees a lot of use. It saves implementation time and helps us avoid duplication; in a large project we can often find several different implementations of the same algorithm.
Despite its popularity, Lodash has a surprisingly simple structure. The maintainers decided that readability, maintainability and correctness matter most, and they’ve ensured all three through simplicity. We can learn a lot from complex applications, but we can learn even more from simple ones.
The Structure
I start exploring a new project by getting a high-level overview of its structure, patterns and naming conventions. Lodash’s structure is as simple as they come: most of the files sit directly in the root, and the rest live in two subdirectories.
├── .internal
| ├── Hash.js
| ├── Stack.js
| ├── baseClone.js
| ├── ...
├── tests
| ├── gt.test.js
| ├── isEmpty.test.js
| ├── flatten-methods.test.js
| ├── ...
├── gt.js
├── isEmpty.js
├── flatten.js
├── debounce.js
├── ... +200
All the code is split into three logical parts: exported functions, internal functions and tests. There aren’t any layers or complicated abstractions, which makes it easy to read. But how simple can you get before the details start causing problems?
The root folder contains 200+ files and .internal around 100. That can still be daunting to look at.
Here we see the first benefit of simplicity: each file is self-contained. It holds a single function, so you can look at it in isolation without being familiar with the rest of the codebase. This layout also means each function is exported separately. As a user of the library, you can pull in only the two or three functions you need without dragging everything else along.
Design Decision: Flat Structure
I can’t say I fully agree with the decision to put everything in the root. The files could’ve been put in a src or lib directory so they don’t get mixed up with the configuration files. However, I can understand the authors’ reasoning.
The overarching principle in Lodash is simplicity, and it is reflected in the whole project structure. It may seem a bit messy, but it’s easy to navigate and find what you need. This structure works because you never need to understand the whole project at once.
The Implementation
Let’s take a look at the way the functions themselves are structured and the design decisions made there.
The authors have managed to find a good balance between duplication and composition. Each file contains a single function, which imports others, both exported and internal, when needed. Complex logic is abstracted into separate functions so it can be reused and composed, but this is done sparingly.
Each function is preceded by a long comment block that describes what it does, gives examples of its usage and lists related functions. The examples in particular help put everything together. Such descriptive comments save developers a lot of time reasoning about the code.
Let’s look at some of the simplest functions, the comparisons gt, gte, lt and lte. Here is gt with its full comment block:
/**
* Checks if `value` is greater than `other`.
*
* @since 3.9.0
* @category Lang
* @param {*} value The value to compare.
* @param {*} other The other value to compare.
* @returns {boolean} Returns `true` if `value` is greater than `other`,
* else `false`.
* @see gte, lt, lte
* @example
*
* gt(3, 1)
* // => true
*
* gt(3, 3)
* // => false
*
* gt(1, 3)
* // => false
*/
function gt(value, other) {
  if (!(typeof value === 'string' && typeof other === 'string')) {
    value = +value
    other = +other
  }
  return value > other
}
The actual functions are around 5 lines of code. If you look at the other comparison functions, you will see that they duplicate this logic; there isn’t a base function behind all of them. The authors decided that, for something this simple, duplication would be easier to manage than an extra layer. I fully agree with this decision.
Let’s look at a slightly more complicated function that checks whether a value is empty. I’ll omit the comments for brevity.
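To make the duplication concrete, here is a sketch of gt next to an lt written in the same style. I’m assuming lt mirrors gt with only the operator changed, as the paragraph above describes; this is an illustration, not a copy of the library file. It also shows what the string check in gt is for: two strings are compared as strings, anything else is coerced to numbers first.

```javascript
// gt as shown above: strings are compared as strings,
// every other combination is coerced to numbers with unary +.
function gt(value, other) {
  if (!(typeof value === 'string' && typeof other === 'string')) {
    value = +value
    other = +other
  }
  return value > other
}

// Sketch of a sibling function in the same style: the coercion step
// is repeated rather than shared through a base helper.
function lt(value, other) {
  if (!(typeof value === 'string' && typeof other === 'string')) {
    value = +value
    other = +other
  }
  return value < other
}

console.log(gt('10', '9')) // false: both strings, compared character by character
console.log(gt(10, '9'))   // true: '9' is coerced to the number 9
console.log(lt(1, 3))      // true
```

Repeating four lines in each function keeps every file readable on its own, which is the trade-off the authors chose.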
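import getTag from './.internal/getTag.js'
import isArguments from './isArguments.js'
import isArrayLike from './isArrayLike.js'
import isBuffer from './isBuffer.js'
import isPrototype from './.internal/isPrototype.js'
import isTypedArray from './isTypedArray.js'

const hasOwnProperty = Object.prototype.hasOwnProperty

function isEmpty(value) {
  if (value == null) {
    return true
  }
  if (
    isArrayLike(value) &&
    (Array.isArray(value) ||
      typeof value === 'string' ||
      typeof value.splice === 'function' ||
      isBuffer(value) ||
      isTypedArray(value) ||
      isArguments(value))
  ) {
    return !value.length
  }
  const tag = getTag(value)
  if (tag == '[object Map]' || tag == '[object Set]') {
    return !value.size
  }
  if (isPrototype(value)) {
    return !Object.keys(value).length
  }
  for (const key in value) {
    if (hasOwnProperty.call(value, key)) {
      return false
    }
  }
  return true
}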
The logic is made of a few conditional statements written in an imperative manner. They decided not to extract helper functions even for those. It still uses a few internal and exported functions for the more complex bits, though. It’s a good example of how the other utilities can be composed to make more complicated functionality.
The lack of inline comments stood out to me. The code itself is clear, but the conditional that checks whether the value is an array-like collection consists of 7 different checks. A single comment describing it would’ve made it easier to read.
Those functions are fairly straightforward. Let’s look at a more complex one, flatMap:
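To see which kind of value each branch handles, here is a heavily simplified sketch that uses only language built-ins. The name isEmptySketch is hypothetical, and the sketch deliberately skips Buffers, typed arrays, arguments objects and the prototype-object case, so it is not a drop-in replacement for Lodash’s function.

```javascript
// Hypothetical, simplified isEmpty that follows the same order of checks
// as the Lodash version but relies only on built-ins.
function isEmptySketch(value) {
  // null and undefined are always empty
  if (value == null) {
    return true
  }
  // Length-based values: arrays and strings
  if (Array.isArray(value) || typeof value === 'string') {
    return value.length === 0
  }
  // Maps and Sets expose size instead of length
  if (value instanceof Map || value instanceof Set) {
    return value.size === 0
  }
  // Everything else is empty when it has no own enumerable keys
  for (const key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key)) {
      return false
    }
  }
  return true
}

console.log(isEmptySketch([]))            // true
console.log(isEmptySketch(new Set([1])))  // false
console.log(isEmptySketch({ a: 1 }))      // false
console.log(isEmptySketch(42))            // true: a number has no own keys
```

The last case matches Lodash’s documented behavior for numbers: they fall through to the key loop and count as empty.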
function flatMap(collection, iteratee) {
  return baseFlatten(map(collection, iteratee), 1)
}
While simple logic is often duplicated, complex functionality is abstracted away into internal functions prefixed with base. Several flattening functions use baseFlatten to achieve different results. flatMap, for example, maps over the collection and then flattens the result one level deep. By fixing some of the parameters and combining baseFlatten with other functions, the authors provide us with simpler functions to use.
This is an example of a well-made abstraction that hides details and complexity. It can be used in multiple places without having to handle the different cases explicitly. It takes the collection to flatten and the depth, and by varying those parameters we get different behaviors. For example, flatMapDeep passes INFINITY for the depth, so the collection gets flattened until it becomes a one-dimensional array.
They’re exporting multiple functions with sensible defaults instead of exporting the base flattening function and expecting the user to configure it. I support this choice.
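The pattern of one configurable base function wrapped by several small public ones can be sketched as follows. baseFlattenSketch is a hypothetical stand-in for baseFlatten that models only the two parameters discussed here, the collection and the depth, and it treats only plain arrays as flattenable. Array.prototype.map stands in for Lodash’s map.

```javascript
// Hypothetical depth-limited flatten in the spirit of baseFlatten.
function baseFlattenSketch(array, depth, result = []) {
  for (const value of array) {
    if (depth > 0 && Array.isArray(value)) {
      // Descend one level, appending into the same result array
      baseFlattenSketch(value, depth - 1, result)
    } else {
      result.push(value)
    }
  }
  return result
}

const INFINITY = 1 / 0

// Public wrappers that only fix the depth (and map first where needed)
function flatten(array) {
  return baseFlattenSketch(array, 1)
}
function flattenDeep(array) {
  return baseFlattenSketch(array, INFINITY)
}
function flattenDepth(array, depth = 1) {
  return baseFlattenSketch(array, depth)
}
function flatMap(array, iteratee) {
  return baseFlattenSketch(array.map(iteratee), 1)
}
function flatMapDeep(array, iteratee) {
  return baseFlattenSketch(array.map(iteratee), INFINITY)
}

const nested = [1, [2, [3, [4]]]]
console.log(flatten(nested))         // [ 1, 2, [ 3, [ 4 ] ] ]
console.log(flattenDepth(nested, 2)) // [ 1, 2, 3, [ 4 ] ]
console.log(flattenDeep(nested))     // [ 1, 2, 3, 4 ]
console.log(flatMap([1, 2], (n) => [n, n])) // [ 1, 1, 2, 2 ]
```

Since INFINITY - 1 is still INFINITY, the deep variants never run out of depth. The built-in Array.prototype.flat takes a similar depth argument.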
Even the debounce function, which is longer and more complex, can be understood since everything is kept together. One decision I don’t like there is the many nested functions with side effects. They make it hard to keep track of the logical flow and of which function modifies what.
Design Decision: Simple Structure
What can we learn from Lodash and its simple structure? When we’re building an application or a library, it’s important to take its domain and specifics into account. Lodash is a toolbox, and it’s designed like one: each function is a separate tool that can be understood and used on its own.
In applications where everything has to be used as a whole, layers and structure make the code easier to understand. Here we have separate, isolated functions that share some logic between them, and Lodash doesn’t have a central configuration or initialization step.