Reading Code - Express

January 11, 2021 9 minute read

I think that one of the best ways to learn about building software is to explore other code bases. Nothing beats seeing how a successful project is structured and how it works internally.

I often dig through open source repositories to learn from their architecture and design decisions. And there are some excellent projects out there. Some of them have been a great source of inspiration for me.

I decided to publish my case studies with the hope that they inspire others as well.

This article is about Express - the most popular web framework for Node. It’s small, minimalistic and unopinionated - very different from Rails, Django and Laravel.

We’ll see how Express works under the hood, how the code is laid out and what we can learn from it.

Project Structure

Express is structured like most Node libraries - the configuration files, documentation and guides are in the root. The actual project is split into four folders - lib, test, examples and benchmarks.

The heart of the project lies in the lib folder. That’s a common practice for Node applications - putting the logic in a src or lib directory.

This is how the lib folder is structured:

├── router
|   ├── index.js
|   ├── layer.js
|   ├── route.js
├── application.js
├── express.js
├── request.js
├── response.js
├── utils.js
├── view.js

There are a few main objects and structures that Express relies on - the request, response, view, application and router.

For most objects there is a single JS file that holds all its logic. The only exception is the router because it’s actually made of three separate entities.

To make it obvious that those are logically connected they’re put in a separate router folder. This is a good general practice - things that work together should be put together. It makes it easier to reason about the project.

There are two more files - express.js and utils.js.

The first one puts everything else together and bootstraps the entire application. That one could’ve been called index.js instead but I suppose they didn’t do that because it still contains some logic. It’s not just an entry point that exports the other entities.

The utils file contains a bunch of helper functions. They all have high level comments to describe what they do. This is a good decision because some of them are specific to this kind of project. A good description and an example help if you’re not that familiar with the domain.

Many of the files run to hundreds of lines. They could definitely have been split into multiple smaller ones. But the pattern here is one entity per file, so even if the files are lengthy, they stay consistent.

The Bootstrapping

The package exports a single function that you call to create a new Express instance.

import express from 'express'

const app = express()

The express.js file we mentioned earlier contains a factory function called createApplication. It gets called when you create a new Express instance and returns the application object, which contains all the functionality. Factory functions are a great pattern for creating complex objects or ones that require configuration.

/**
 * Create an express application.
 *
 * @return {Function}
 * @api public
 */

function createApplication() {
  var app = function (req, res, next) {
    app.handle(req, res, next)
  }

  // ...
  // Merges app with the exported value from application.js
  mixin(app, proto, false)
  // ...
  app.init()

  return app
}

I really like that pattern because it hides complexity and unnecessary details. It’s commonly used in languages like Go as well.
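To see the pattern in isolation, here is a minimal, self-contained sketch of a factory that returns a callable object. The proto object and its init, set and handle methods are hypothetical stand-ins for what application.js exports, and Object.assign stands in for the mixin helper, so treat it as an illustration rather than Express's actual code.

```javascript
// Hypothetical stand-in for the object exported by application.js.
const proto = {
  init() {
    // Every instance gets its own settings object.
    this.settings = {}
  },
  set(name, value) {
    this.settings[name] = value
    return this
  },
  handle(req, res) {
    // Placeholder: the real method hands the request to the router.
    res.handled = true
  },
}

function createApplication() {
  // The app is itself a function, so it can be used as a request listener.
  const app = function (req, res, next) {
    app.handle(req, res, next)
  }

  // Copy the shared methods onto this instance (Express uses a mixin helper).
  Object.assign(app, proto)
  app.init()

  return app
}

const first = createApplication()
const second = createApplication()
first.set('env', 'test')

console.log(typeof first) // function
console.log(first.settings.env, second.settings.env) // test undefined
```

The caller never sees the mixing or the initialisation. It only receives a ready-to-use object, and each call produces an independent instance.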

The Application

When we open application.js we see a lengthy file. It contains the implementation of the main Express functions - route, use, all. There are high level comments above each function giving a description, but you will find plenty of inline comments as well. This suggests that the code alone is not enough to explain why certain decisions were made.

Besides the public methods that we use, there are private ones that configure the application. The factory function calls the init method, which sets up the default config for the application.

/**
 * Initialize the server.
 *
 *   - setup default configuration
 *   - setup default middleware
 *   - setup route reflection methods
 *
 * @private
 */

app.init = function init() {
  // ...
  this.defaultConfiguration()
}

Something to note is the lazyrouter method, which you will see called in many places. The name immediately draws your attention to the fact that it’s not a normal instantiation. When we take a look at that method we see that the router is created lazily and only once per application, much like a singleton. Normally, I’d expect this to be done as part of the configuration but thankfully, there’s a descriptive comment above the lazyrouter function which tells us why that isn’t possible.

This function is called in many places to make sure the router property is set.

/**
 * lazily adds the base router if it has not yet been added.
 *
 * We cannot add the base router in the defaultConfiguration because
 * it reads app settings which might be set after that has run.
 *
 * @private
 */
app.lazyrouter = function lazyrouter() {
  if (!this._router) {
    // ...
    this._router = new Router()
    // ...
  }
}
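The reason for the laziness is easier to see in a small sketch. The Router class below is a hypothetical stand-in that only records its options, and this lazyrouter returns the router for convenience, which the real one does not do. The point is only that a setting changed after the app exists is still picked up.

```javascript
// Hypothetical stand-in for the real Router; it only records its options.
class Router {
  constructor(options) {
    this.caseSensitive = options.caseSensitive
  }
}

const app = {
  settings: {},
  set(name, value) {
    this.settings[name] = value
    return this
  },
  lazyrouter() {
    // Build the router on first use, reading the settings as they are now.
    if (!this._router) {
      this._router = new Router({
        caseSensitive: Boolean(this.settings['case sensitive routing']),
      })
    }
    return this._router
  },
}

// The setting is changed after the app object has been created...
app.set('case sensitive routing', true)

// ...and the router still sees it, because it is only built at this point.
const router = app.lazyrouter()
console.log(router.caseSensitive) // true
console.log(app.lazyrouter() === router) // true
```

Had the router been created eagerly during configuration, it would have captured the default value instead.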

In a sense, Express is just a wrapper around its router - it’s the most important building block. There are many references to the router and some of the functions like get, post, use and handle (that one is internal) just proxy to it.

Something that you won’t find at first are the definitions for the HTTP method handlers. In Express you can use methods like app.get or app.post but there aren’t separate definitions for them.

That’s because they are defined dynamically. A loop goes through all HTTP methods and attaches the functions to the application object.

/**
 * Delegate `.VERB(...)` calls to `router.VERB(...)`.
 */
methods.forEach(function (method) {
  app[method] = function (path) {
    // ...
    this.lazyrouter()

    var route = this._router.route(path)
    route[method].apply(route, slice.call(arguments, 1))
    return this
  }
})

The logic to handle them is identical, so the authors have decided to avoid verbosity, duplication and extra abstractions. They’ve taken full advantage of the dynamic nature of JavaScript.

Personally, I would’ve probably gone the route of abstraction and defined the methods explicitly for clarity’s sake. But since their files are already quite lengthy I can see why they decided to go that route.

Of course, they could’ve added just one function for all verbs and taken the HTTP method as a parameter. But this decision gives users the ability to write more descriptive code. The complexity is handled by the library instead.
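Here is a stripped-down sketch of the same idea that runs on its own. The short hard-coded methods list and the routes array are assumptions made for illustration. They stand in for the full list of HTTP verbs and for the router.

```javascript
// Assumption: a short hard-coded list stands in for the full set of HTTP verbs.
const methods = ['get', 'post', 'put', 'delete']

// Hypothetical app object that simply records what was registered.
const app = { routes: [] }

methods.forEach(function (method) {
  // One shared implementation, attached under a different name each time.
  app[method] = function (path, handler) {
    this.routes.push({ method: method.toUpperCase(), path, handler })
    return this // returning the app is what makes chaining possible
  }
})

app.get('/books', () => 'list').post('/books', () => 'create')

console.log(app.routes.map((r) => `${r.method} ${r.path}`))
// [ 'GET /books', 'POST /books' ]
```

Supporting a new verb means adding one string to the list instead of writing another function.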

Design Decision: Chaining Methods

You will notice that the public methods like .get(), .post() or .use() always return this - the Express instance. This is done so they can be chained. It leads to cleaner route configuration and saves us some writing.

This technique is called cascading and is common when we have to do a series of operations in a particular order. The map and filter array functions can be chained in a similar way to build a descriptive flow of data, although they return a new array rather than the object they were called on.

In Express, with chaining you can define a route and chain handlers for the different HTTP methods:

app.route('/book').get(getHandler).post(postHandler).put(putHandler)

The Router

This is the heart of the whole project. The workings of the router enable everything that Express does. And the way that it’s built deeply fascinated me.

It consists of three parts - Router, Route and Layer.

The Router is the object that holds methods like use and route but also the base handle method that runs for each request. The application object delegates to it.

We create a new instance of a Route whenever we call a method like app.get() or app.post() - it holds the path and the HTTP method that we want to handle.

But the most important object that is fundamental to Express’s software design is the Layer. The layers enable Express to find and execute the correct middleware and handlers for each path.

Let’s start with middleware. Whenever we create one - global or on the route level - we create a new Layer. This object holds a path like /books and the function that is to run.

When this Layer is created it is added to a stack (which is just an array). Then when the Router runs the logic for a certain route, it will go through this stack in the order in which the Layers were added and execute those that match. This is why the order in which we attach them is important.

When you add a global middleware, the path for the Layer will be / so it will match each time. The Layer mechanism enables us to run an arbitrary number of middleware.

router.use = function use(fn) {
  // ...
  var layer = new Layer(
    path,
    {
      sensitive: this.caseSensitive,
      strict: false,
      end: false,
    },
    fn
  )

  layer.route = undefined

  this.stack.push(layer)
  // ...
}

When you add a handler for a route, we create a Layer as well but it works differently. The layer’s path is set to the path we’re adding the handler for, but our handler is not assigned to that Layer. Instead, the Layer’s handler is set to the dispatch method of the route object that we just created.

router.route = function route(path) {
  var route = new Route(path)

  var layer = new Layer(
    path,
    {
      sensitive: this.caseSensitive,
      strict: this.strict,
      end: true,
    },
    route.dispatch.bind(route)
  )

  layer.route = route
  this.stack.push(layer)
  return route
}

This layer is responsible only for matching the path. After that we still need to match the actual method, so the route object holds a stack of layers itself.

So, first we go through the router’s layers to find one that matches the path. Then we go through the layers of that route and run the correct ones for it, middleware and handler, in the order in which they were added.
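The two levels of matching can be sketched in a few dozen lines. This is a heavily simplified model and not the real implementation: paths are compared as plain strings instead of compiled patterns, the stack is walked in a synchronous loop rather than through next, and names such as add and log are made up for the example.

```javascript
// Simplified Layer: a path, a function to run, and a matching mode.
class Layer {
  constructor(path, handle, { end }) {
    this.path = path
    this.handle = handle
    this.end = end // true: match the whole path, false: match a prefix
  }
  match(path) {
    if (this.end) return path === this.path
    if (this.path === '/') return true
    return path === this.path || path.startsWith(this.path + '/')
  }
}

class Route {
  constructor(path) {
    this.path = path
    this.stack = [] // inner layers, one per handler
  }
  add(method, fn) {
    const layer = new Layer('/', fn, { end: false })
    layer.method = method
    this.stack.push(layer)
    return this
  }
  get(fn) { return this.add('GET', fn) }
  post(fn) { return this.add('POST', fn) }
  dispatch(req, res) {
    // Second level: pick the inner layers registered for this method.
    for (const layer of this.stack) {
      if (layer.method === req.method) layer.handle(req, res)
    }
  }
}

class Router {
  constructor() {
    this.stack = []
  }
  use(path, fn) {
    this.stack.push(new Layer(path, fn, { end: false }))
    return this
  }
  route(path) {
    const route = new Route(path)
    // The outer layer runs the route's dispatch, not the user's handler.
    const layer = new Layer(path, route.dispatch.bind(route), { end: true })
    layer.route = route
    this.stack.push(layer)
    return route
  }
  handle(req, res) {
    // First level: run every outer layer whose path matches, in order.
    for (const layer of this.stack) {
      if (layer.match(req.path)) layer.handle(req, res)
    }
  }
}

const log = []
const router = new Router()
router.use('/', () => log.push('middleware'))
router.route('/books')
  .get(() => log.push('GET /books'))
  .post(() => log.push('POST /books'))

router.handle({ method: 'GET', path: '/books' }, {})
console.log(log) // [ 'middleware', 'GET /books' ]
```

The middleware and the route sit in the same stack as the same kind of object. The only difference is what their handle function points to.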

Design Decision: The Router

The Router is exported separately so you can use it independently of the main Express object. It contains only the routing and middleware methods. It allows you to create a mini-application, to define routes for just a part of your app.

This is a good decision because it allows better modularity in applications. Each module can declare its own routes and export them to be used by the main application.

const router = express.Router()

router.get('/', handler).post('/', handler)

Adding a group of routes defined this way is simple because everything in Express is a Layer.

app.use('/books', router)

Design Decision: Layers

The beauty in the way Express is built is that a single type of object is underpinning the entire project. Instead of creating separate entities for the middleware, the handlers and writing more logic, they’ve handled it with a single structure.

To be honest, I wouldn’t have gone in that direction. Maybe I would’ve created separate entities. But this can be limiting to the design of your software. On a high level, Express works like an onion. You have a route, a handler and a bunch of layers in between.

Thinking in a more abstract way opens up new implementation paths. In this case, everything is a layer.

The layers are a good abstraction because they don’t leak any implementation details. As a user of Express you have no idea how the routes and middleware work underneath.

Bonus: How Middleware Works

We may have an unknown number of middleware added before the actual handler. So to make sure they’re executed in order, Express passes them a function, usually named next.

When a middleware calls next it signals that it’s done with its work and the next one can go. But how does that work?

The next function is created inside the handle method on the Router, which is called on each request. It’s defined inside handle because it needs to close over the state of the current request, such as the position in the stack, to work correctly.

The job of the next function is to find the next Layer in the stack that needs to execute. This function gets passed from Layer to Layer until the chain is complete or something throws an error.
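A minimal sketch of that mechanism, assuming a plain array of functions instead of Layers, looks like this. It leaves out path matching and error-handling middleware, so an error goes straight to the final done callback, which is a simplification of what Express does.

```javascript
function handle(stack, req, res, done) {
  // Per-request position in the stack, captured by the next() closure.
  let idx = 0

  function next(err) {
    if (err) return done(err)

    const fn = stack[idx++]
    if (!fn) return done() // the end of the chain was reached

    try {
      // Each function receives next so it can pass control along.
      fn(req, res, next)
    } catch (thrown) {
      next(thrown)
    }
  }

  next()
}

const calls = []
const stack = [
  (req, res, next) => { calls.push('first'); next() },
  (req, res, next) => { calls.push('second'); next() },
]

handle(stack, {}, {}, (err) => calls.push(err ? 'error' : 'done'))
console.log(calls) // [ 'first', 'second', 'done' ]
```

Because idx lives inside handle, every request gets its own counter and concurrent requests cannot interfere with each other.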

Summary

There’s a lot to learn from Express’s structure and design decisions.

  • A simple and flat structure is always easier to navigate.
  • Group related entities together - like they’ve done for the Router.
  • Take advantage of the capabilities your language gives you to reduce complexity - like they’ve done for the HTTP methods.
  • Most importantly: always make at least two designs. The initial line of thinking would lead you to an implementation that contains many different entities and requires a lot of imperative logic. A more abstract way of thinking brought the idea of a Layer - an object that can power the whole Router instead.
