Thom Wright

Correlation IDs in NodeJS


Much has already been written about the need for correlation IDs in microservice architectures. If this is a new concept for you, I encourage you to read Building Microservices by Sam Newman. Or if you want a quick intro, try this blog post.

There are three ways I know of to pass a correlation ID around a NodeJS application:

  1. continuation-local storage
  2. async hooks
  3. function arguments

Continuation-local storage is what New Relic used for their instrumentation library, before apparently abandoning it for performance reasons. It’s pretty magic, and relies on “extensively monkeypatching the core platform” (quote).

Async hooks are experimental at the time of writing (NodeJS v10.7.0). They’re also pretty magic! Despite the magic, I think they might soon become the recommended way of solving these problems, but for now I’m going to focus on the ‘manual’ approach: simple function arguments.

To start with, let’s pretend we’ve written a simple application which handles HTTP requests. It does the following:

  • takes a number and an operation from the request
  • looks up another number in the database
  • performs the operation on the two numbers
  • updates the database
  • returns the result

Here’s what the code might look like.

function add(a, b) {
  console.log(`Adding ${a} and ${b}`)
  return a + b
}

function subtract(a, b) {
  console.log(`Subtracting ${b} from ${a}`)
  return a - b
}

function createDbAccess() {
  let databaseValue = 5
  return {
    async get() {
      console.log(`Getting db val: ${databaseValue}`)
      return databaseValue
    },
    async set(n) {
      console.log(`Setting db val: ${n}`)
      databaseValue = n
    }
  }
}

const db = createDbAccess()

async function modifyValue(n, op) {
  console.log("Starting")

  const dbVal = await db.get()

  const newVal = op === "add"
    ? add(n, dbVal)
    : subtract(n, dbVal)

  await db.set(newVal)

  console.log("Finished")
  return newVal
}

async function httpHandler(req, res) {
  const x = await modifyValue(
    req.params.number,
    req.params.operation
  )
  res.send(x)
}

Some of our functions write some logs. We want these logs to include correlation IDs, so let’s try doing that.

Explicit function arguments

Our first attempt at refactoring this is to take the correlation ID from the request object and pass it around to any functions which need it.

function add(correlationID, a, b) {
  console.log(`Adding ${a} and ${b}`, correlationID)
  return a + b
}

function subtract(correlationID, a, b) {
  console.log(`Subtracting ${b} from ${a}`, correlationID)
  return a - b
}

function createDbAccess() {
  let databaseValue = 5
  return {
    async get(correlationID) {
      console.log(`Getting db val: ${databaseValue}`, correlationID)
      return databaseValue
    },
    async set(correlationID, n) {
      console.log(`Setting db val: ${n}`, correlationID)
      databaseValue = n
    }
  }
}
const db = createDbAccess()

async function modifyValue(correlationID, n, op) {
  console.log("Starting", correlationID)

  const dbVal = await db.get(correlationID)

  const newVal = op === "add"
    ? add(correlationID, n, dbVal)
    : subtract(correlationID, n, dbVal)

  await db.set(correlationID, newVal)

  console.log("Finished", correlationID)
  return newVal
}

async function httpHandler(req, res) {
  // NodeJS lowercases incoming header names
  const correlationID = req.headers["x-correlation-id"]
  const x = await modifyValue(
    correlationID,
    req.params.number,
    req.params.operation
  )
  res.send(x)
}

We can think of our functions as being called in a tree. In general I guess it’s a directed graph, but a tree will do fine here.

  • httpHandler
    • modifyValue
      • db.get
      • add

This tree is tiny, but most real-world applications will have significantly larger function call trees. Manually passing a value all the way from the root (httpHandler) to the leaf nodes can get pretty cumbersome.

‘Constructor’ dependency injection

Here, we organise our functions into components/modules/classes (or whatever you want). Instead of passing the correlation ID to each function, we pass it to the function which creates the component. Any function in that module then has access to it.

function createCalculator(correlationID) {
  return {
    add(a, b) {
      console.log(`Adding ${a} and ${b}`, correlationID)
      return a + b
    },

    subtract(a, b) {
      console.log(`Subtracting ${b} from ${a}`, correlationID)
      return a - b
    },
  }
}

function createDbAccess(correlationID) {
  let databaseValue = 5
  return {
    async get() {
      console.log(`Getting db val: ${databaseValue}`, correlationID)
      return databaseValue
    },
    async set(n) {
      console.log(`Setting db val: ${n}`, correlationID)
      databaseValue = n
    }
  }
}

function createBusinessLogic(correlationID, db, calculator) {
  return {
    async modifyValue(n, op) {
      console.log("Starting", correlationID)

      const dbVal = await db.get()

      const newVal = op === "add"
        ? calculator.add(n, dbVal)
        : calculator.subtract(n, dbVal)

      await db.set(newVal)

      console.log("Finished", correlationID)
      return newVal
    }
  }
}

async function httpHandler(req, res) {
  const correlationID = req.headers["x-correlation-id"]

  // wire up all dependencies
  const db = createDbAccess(correlationID)
  const calculator = createCalculator(correlationID)
  const logic = createBusinessLogic(correlationID, db, calculator)

  const x = await logic.modifyValue(req.params.number, req.params.operation)
  res.send(x)
}

We can think of our components and their dependencies as a directed acyclic graph (though again a simple tree will do in this case):

  • HTTP handler
    • high-level business logic
      • calculator
      • database access

One thing I want to point out here is the ‘lifetime’ of our components. Before, everything lived for the lifetime of the application. For example, our data access component (const db = createDbAccess()) was created at application start, and used for every request.

After this refactoring, the data access component is created in the request handler, and it lives only for the length of the request.

It’s important to note that if we were connecting to a real database, we’d want the connection (or connection pool) to have an ‘application lifetime’, because creating connections is expensive. We’d want to do something like this:

// in scope for the whole lifetime of the application
const dbConnection = createDbConnection()

async function httpHandler(req, res) {
  const correlationID = req.headers["x-correlation-id"]
  // in scope only for a single request
  const db = createDbAccess(dbConnection, correlationID)
  // ...
}

Here, our database connection lives for the lifetime of the application. Since our data access component requires per-request data (our correlation ID), it lives only as long as the request.

Components should only have dependencies on other components with the same lifetime, or a longer lifetime. A component which needs to live for the lifetime of the application shouldn’t depend on something which should only live for the lifetime of a request.
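Putting those two lifetimes together, a minimal runnable sketch might look like this. Note that createDbConnection here is a stand-in for a real driver’s connect call; it just wraps an in-memory value like the examples above:

```javascript
// application lifetime: created once, shared by every request.
// createDbConnection is a stand-in for a real driver's connect call
function createDbConnection() {
  let databaseValue = 5
  return {
    async query() { return databaseValue },
    async update(n) { databaseValue = n },
  }
}

// request lifetime: cheap to create, closes over per-request data
function createDbAccess(dbConnection, correlationID) {
  return {
    async get() {
      console.log("Getting db val", correlationID)
      return dbConnection.query()
    },
    async set(n) {
      console.log(`Setting db val: ${n}`, correlationID)
      return dbConnection.update(n)
    },
  }
}
```

Each request gets a fresh, short-lived access component, while the expensive connection lives on and is shared between them.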

Separating concerns

Compare these two implementations of the business logic:

// manually passing to each function
async function modifyValue(correlationID, n, op) {
  console.log("Starting", correlationID)

  const dbVal = await db.get(correlationID)

  const newVal = op === "add"
    ? add(correlationID, n, dbVal)
    : subtract(correlationID, n, dbVal)

  await db.set(correlationID, newVal)

  console.log("Finished", correlationID)
  return newVal
}
// using component dependency injection
async function modifyValue(n, op) {
  console.log("Starting", correlationID)

  const dbVal = await db.get()

  const newVal = op === "add"
    ? calculator.add(n, dbVal)
    : calculator.subtract(n, dbVal)

  await db.set(newVal)

  console.log("Finished", correlationID)
  return newVal
}

The second example has significantly less noise. We’ve managed to remove almost all mentions of correlation IDs. This means we can write most of our application code without having to worry about passing correlation IDs around. Nice!

If we wanted to take this a step further, we could create a logger component using the correlation ID, and pass that in along with the other dependencies. Let’s do that now.

function createCalculator(logger) {
  return {
    add(a, b) {
      logger.log(`Adding ${a} and ${b}`)
      return a + b
    },

    subtract(a, b) {
      logger.log(`Subtracting ${b} from ${a}`)
      return a - b
    },
  }
}

function createDbAccess(logger) {
  let databaseValue = 5
  return {
    async get() {
      logger.log(`Getting db val: ${databaseValue}`)
      return databaseValue
    },
    async set(n) {
      logger.log(`Setting db val: ${n}`)
      databaseValue = n
    }
  }
}

function createBusinessLogic(logger, db, calculator) {
  return {
    async modifyValue(n, op) {
      logger.log("Starting")

      const dbVal = await db.get()

      const newVal = op === "add"
        ? calculator.add(n, dbVal)
        : calculator.subtract(n, dbVal)

      await db.set(newVal)

      logger.log("Finished")
      return newVal
    }
  }
}

function createLogger(correlationID) {
  return {
    log(s) {
      console.log(s, correlationID)
    }
  }
}

async function httpHandler(req, res) {
  const correlationID = req.headers["x-correlation-id"]

  // wire up all dependencies
  const logger = createLogger(correlationID)
  const db = createDbAccess(logger)
  const calculator = createCalculator(logger)
  const logic = createBusinessLogic(logger, db, calculator)

  const x = await logic.modifyValue(req.params.number, req.params.operation)
  res.send(x)
}

There! Apart from the wiring in the HTTP handler, the only thing that mentions correlationID is the logger — the only place that really needs to know about it.

Removing the correlation ID from our core logic might not seem that important, but as your application size increases and you have more request-scoped variables (e.g. tracing spans or deadlines) this can become increasingly unmanageable.
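One way to keep this manageable — a pattern I’m suggesting here, not something from the code above — is to bundle request-scoped values into a single context object, so each component factory only ever takes one extra parameter. The field names (deadline, etc.) are illustrative:

```javascript
// bundle request-scoped values into a single context object, so component
// factories take one parameter no matter how many values accumulate
function createRequestContext(req) {
  return {
    correlationID: req.headers["x-correlation-id"],
    deadline: Date.now() + 30000, // e.g. a 30 second request deadline
  }
}

function createLogger(ctx) {
  return {
    log(s) {
      console.log(s, ctx.correlationID)
    },
  }
}
```

Adding a new request-scoped value then means adding a field to the context, rather than changing every factory signature.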

Our component hierarchy now looks like:

  • HTTP handler
    • high-level business logic
      • logger
      • calculator
        • logger
      • database access
        • logger

To wire these up, we start at the leaf nodes (here that’s logger) and work our way to the root. The bigger your app, the more complicated your wiring. You might want to make this reusable, or even look into dependency injection systems. Personally, I think there are benefits to the explicit wiring shown here. Saying that, at Candide we use a library I wrote called di-hard which automatically wires our components together. It’s not strictly necessary, but saves some boilerplate.

We also use a service chassis called the shell. This gives developers easy access to a logger and HTTP client which already make use of the correlation ID.

This post is long enough already, so I’m going to stop here. The next steps would be to think about how to propagate this correlation ID to another service through e.g. an HTTP request. This is as easy as creating a new httpClient component which takes a correlation ID, and wiring it in wherever it’s needed.
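As a rough sketch of that component (request here is a stand-in for whatever HTTP library you use; it just needs to accept an options object):

```javascript
// request-scoped HTTP client component. `request` is a stand-in for
// whatever HTTP library you use (injected, so it can be swapped in tests)
function createHttpClient(request, correlationID) {
  return {
    get(url) {
      // forward the correlation ID to the downstream service
      return request(url, {
        headers: { "X-Correlation-Id": correlationID },
      })
    },
  }
}
```

Like the logger, it’s created once per request in the HTTP handler, and everything downstream gets outgoing correlation headers for free.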

Once an application is architected this way, adding any other context propagation is much more straightforward. Moving to full distributed tracing is a relatively easy step.

I would strongly recommend using correlation IDs from the beginning if possible. Refactoring them into an existing application is a not a fun job!

Multi-Environment Setups in Snap CI


I’ve been a big fan of Travis for a while now. It runs the builds for most of my open source projects. However, recently I’ve been finding it a bit sluggish, and something fishy seems to have happened to my automated NPM deployments. So, I figured it was time to give some other CI services a go.

One I’m trying at the moment is Snap CI.

One thing that was really easy to do in Travis is running your tests in a number of different environments, using a build matrix. For example, if I wanted to run my JS tests on several versions of NodeJS, I could put the following in my .travis.yml:

language: node_js
node_js:
  - "0.12"
  - "0.10"
  - "iojs"

Simples.

Not so in Snap.

This is how I’ve done it, YMMV.

The basic idea is to have one stage in your pipeline per environment. For example, below I have one stage for node v0.12, and one for io.js v2.3.2.

Snap Pipeline

It’s important to note that NODEJS VERSION is set to None. This setting applies to all stages in the pipeline, and we don’t want that.

Since Snap exposes nvm (Node Version Manager), we can install whichever version we like in each stage, like so:

nvm install 0.12 2>/dev/null
nvm use 0.12

We can do this for each stage, but at some point we might want to put this into a script and version control it along with our code. Too much code in CI tools can be considered a smell, and we probably want to avoid this.

Now, a little script like this would do the trick:

#!/bin/sh
# [repo]/scripts/install

nvm install $1 2>/dev/null
nvm use $1
npm install

But, if you call the script using:

./scripts/install 0.12

you’ll get an error: ./scripts/install: line 4: nvm: command not found.

The solution is to run the script in a login shell, which sources the profile where the nvm function is defined:

bash -l ./scripts/install 0.12

Thanks to Akshay Karle from ThoughtWorks for helping me out with this. Shell scripting is not my forte!

Handling Events with React-Mainloop


I recently created a React.js component wrapper around this main loop library. You can find it here: react-mainloop. It can be used to control a React component using a game loop. It uses an update() function to generate new props, and takes control of when rendering occurs. It’s especially useful for animating games, or other interactive canvas-based apps.

Since then I’ve been working on finding a good way to handle events using this system. This is what I’ve come up with so far.

Before we go any further, it might be worth reading a bit about game loops.

First Attempt

My first implementation simply responded to browser events by handling them immediately, and updating component state. This triggered React rendering, and made things very jerky when responding to mousemove events.

What we really want is to decouple event handling from event listeners. The game loop should be the only thing in control of updating state and re-rendering, so events should be handled in the update() function.

Another good reason to decouple event handling from event listeners is separation of concerns. The React Components listening for events should have enough data to render, and nothing more. This means that they might not know enough about the state of the app to properly respond to events that happen on them. update(), by necessity, knows the entire state of the app, so it’s the perfect candidate to decide how to respond to events.

Implementation

Here is an outline of my current implementation for event handling using the Event Queue pattern. All code below should be treated as pseudo-code.

The React Components create Events, in response to browser events. There are different event types for different things, for example: BackgroundMouseDown, or EnemyClick. These Events are useful because they contain more information than the native browser event. For example, EnemyClick could contain an enemyID property to identify which enemy was clicked.

class Enemy extends React.Component {

  constructor(props) {
    super(props);
    this.onClick = this.onClick.bind(this);
  }

  onClick(event) {
    this.props.pushEvent({
      event,
      type: 'EnemyClick',
      enemyID: this.props.id
    });
  }

  render() {
    return (
      <EnemySprite
        onClick={this.onClick}
      />
    );
  }
}

These Events are added to a queue, to be processed every time update() is called.

// event queue
let events = [];

const getUpdateFor = (componentRef) => {
  // current game state
  let gameState = {
    enemies: []
  };
  const update = (delta) => {
    // handle all events since last update
    events.forEach((event) => {
      switch (event.type) {
        case 'EnemyClick':
          damageEnemy(event.enemyID, gameState); // updates gameState
          break;
        default:
      }
    });
    events = []; // reset events
    return gameState;
  };
  return update;
};

The pushEvent prop is passed down from the top level, like so:

class Game extends React.Component {

  render() {
    const animate = new Animator();
    const AnimatedCanvas = animate(GameCanvas, getUpdateFor);
    return (
      <AnimatedCanvas
        pushEvent={(event) => { events.push(event); }}
        gameState={gameState}
      />
    );
  }
}

Optional Extras

These extras make use of some well-known design patterns.

Injectable Event Handlers

In some cases, processing these events involves deciding what action to take in response to each event type. The action to perform might change depending on what state, or mode, the game is in. We could use an EventProcessor for this. It could be supplied with a mapping from event type to event handler. Here’s a possible implementation:

const normalHandler = function(event, gameState) {
  switch (event.type) {
    case `EnemyClick`: damageEnemy(event.enemyID, gameState);
      break;
    default:
  }
};

const superModeHandler = function(event, gameState) {
  switch (event.type) {
    case `EnemyClick`: killEnemy(event.enemyID, gameState);
      break;
    default:
  }
};

const EventProcessor = function(initialHandler) {
  let handler = initialHandler;

  this.process = (events, gameState) => {
    events.forEach((event) => { handler(event, gameState); } );
  };
  this.setHandler = (newHandler) => { handler = newHandler; };
};

new EventProcessor(normalHandler).process(events, gameState);

Undo/Redo with an Executor

This can easily be done with the Command Pattern. Event handlers could create a Command object with execute() and undo() methods. These commands are sent to an Executor, which stores previous commands in a stack. Again, an example implementation:

const Executor = function() {
  const undoStack = [];
  const redoStack = [];

  this.execute = (command) => {
    command.do();
    undoStack.push(command);
  };
  this.executeAll = (commands) => {
    commands.forEach((command) => {
      this.execute(command);
    });
  };
  this.undo = () => {
    const command = undoStack.pop();
    if (command) {
      command.undo();
      redoStack.push(command);
    }
  };
  this.redo = () => {
    const command = redoStack.pop();
    if (command) {
      command.do();
      undoStack.push(command);
    }
  };
};

const Command = function(doFunc, undoFunc) {
  this.do = doFunc;
  this.undo = undoFunc;
};

new Executor().execute(new Command(damageEnemy, giveHealth));

Further Work

I’d like a better way of handling game state. Immutability would be great. An alternative to explicitly passing the state object around would be preferable.

It would also be interesting to try out a Flux-type system. I’m not sure how far this breaks down when we no longer update state or render in response to events.

I’m considering adding support for event handling into react-mainloop, or maybe in a separate library. I’ll probably update this post if/when I improve on these ideas.

Feedback is always appreciated!

Running Mocha in __tests__ directories


I don’t know about you, but I quite like the Jest convention of putting tests in __tests__ directories. It keeps the tests local to the modules they’re testing, and visible in the src directory, rather than hidden away in test. I know, it’s the little things.

Anyway, here’s how to achieve that with Mocha, my test runner of choice. Just stick the following in your package.json scripts:

"mocha": "find ./src -wholename \"./*__tests__/*\" | xargs mocha -R spec"

Inspired by this Gist.

EDIT

Alternatively, this is much simpler and seems to work:

"mocha": "mocha 'src/**/__tests__/*' -R spec"

Beautiful APIs in CoffeeScript


Let’s say we want to make a maths library in CoffeeScript (e.g. a Matrix library). We could easily write an API for addition that looks like:

nine = four.plus five

But what if we want to do this:

nine = four plus five

I know it’s only removing a ., but I think it looks a bit nicer. Let’s see how to do it.

First thing to note is that this relies on some of CoffeeScript’s syntactic sugar. With brackets, the code is:

nine = four(plus(five))

Not exactly pretty, but it allows us to more clearly see what’s going on.

We can see that our numbers need to be functions that take in whatever the result of plus(five) is. Let’s create one of these numbers like so:

makeNumber = (number) ->
  (op) ->
    # Do addition

four = makeNumber 4
console.log four # [Function]

(Ideally we’d be writing tests for this stuff before implementing it. Instead I’m using console.log. Let’s call this, uh… Log-Driven Development (LDD))

Whatever the result of plus(five) is, it’s going to need the other number (four) to do the addition. Let’s implement that now.

makeNumber = (number) ->
  (op) ->
    op number

Now that’s done, let’s have a go at implementing the plus function.

plus = (number) ->
  (otherNumber) ->
    number + otherNumber

Only, this won’t work. Why not? Well, have a look at the types of number and otherNumber. number is something we created with the makeNumber function. A ‘wrapped number’ if you will. otherNumber is just a normal number.

How do we add these? We need to ‘unwrap’ number. Let’s do this by calling the wrapper with no argument e.g. four(). This can be implemented like so:

makeNumber = (number) ->
  (op) ->
    op?(number) or number

four = makeNumber 4
console.log four() # 4

And refactor our plus function:

plus = (number) ->
  (otherNumber) ->
    number() + otherNumber

Something’s not quite right about this though. Feels a bit asymmetrical. What if number and otherNumber were both wrapped? Let’s try it.

makeNumber = (number) ->
  wrapper = (op) ->
    op?(wrapper) or number

plus = (number) ->
  (otherNumber) ->
    number() + otherNumber()

Note how we pass the wrapper into op in makeNumber.

OK, looking good! Let’s put it together and give it a go:

makeNumber = (number) ->
  wrapper = (op) ->
    op?(wrapper) or number

four = makeNumber 4
five = makeNumber 5

plus = (number) ->
  (otherNumber) ->
    number() + otherNumber()

nine = four plus five
console.log nine() # 9

Last thing to do is to make plus return a wrapped number, to keep everything in our nicely wrapped format:

plus = (number) ->
  (otherNumber) ->
    makeNumber number() + otherNumber()

We can easily extend this to other operations, such as multiplication:

makeNumber = (number) ->
  wrapper = (op) ->
    op?(wrapper) or number

four = makeNumber 4
five = makeNumber 5

plus = (number) ->
  (otherNumber) ->
    makeNumber number() + otherNumber()

times = (number) ->
  (otherNumber) ->
    makeNumber number() * otherNumber()

nine = four plus five
twenty = four times five
console.log nine() # 9
console.log twenty() # 20

We can even extend it to something like vector addition:

makeNumber = (number) ->
  wrapper = (op) ->
    op?(wrapper) or number

fours = makeNumber [4, 4]
fives = makeNumber [5, 5]

vPlus = (vector) ->
  (otherVector) ->
    r = []
    for v, i in vector()
      r[i] = v + otherVector()[i]
    makeNumber r

nines = fours vPlus fives
console.log nines() # [9, 9]

And there we have it. A beautifully readable (IMHO) API in CoffeeScript.

Now, why would anyone go to all this effort just to remove the .? I’ll be honest, I did it simply because it looks nice.

To my surprise though, it also creates a very flexible API that works for many operands (e.g. numbers, vectors) and allows pluggable user-supplied operations (addition, multiplication…).

My only wish is for some kind of type system to check for silly things like using vPlus with ordinary numbers at compile time. Oh well. Maybe I should use TypeScript.

Welcome


Welcome to my brand new website, courtesy of GitHub Pages and Poole.

With any luck, I’ll have something worth seeing here sometime soon. Don’t hold your breath.