Playing with a native C extension for Node.js

I’ve always felt like a good way to fully understand a language is to play with the native interface it exposes. I’ve already done a post playing with a native C Ruby extension, and now that I’m using Node.js more I felt like I should dig into it. Here is how I ended up playing with a native C extension for Node.js. I did not have a real goal while playing around, so I did the exact same thing I did for Ruby. As always, the code is hosted on my GitHub repository!

Why a native extension

Node.js is a great tool for event-based concurrency. Having a garbage collector and a single-threaded execution model makes it easy to get into and fast. But like any technology, it is not perfect.

For example, Node.js is not the best if you need to do some CPU-bound computations. Even using threads won’t help since only one Node.js thread will run at a time. There is also the case of wanting to interface with existing native libraries or with the OS.

For all these reasons, and many others, a native extension could be useful.

Needed setup

There are multiple tooling frameworks that can be used to build a native extension for Node.js. In the end, most of them wrap N-API, the Node.js native API. I decided to use node-gyp.

The initial repository setup was quite trivial:

  1. Create (and fill) the binding.gyp file
  2. Run npm init

The binding.gyp file is used to configure how node-gyp will build the native part of the extension. In my case, I went with something simple:
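Roughly like this (the nci target name must match the nci.node bundle loaded later; the source file name is an assumption):

    {
      "targets": [
        {
          # "nci" produces nci.node; the source file name is an assumption.
          "target_name": "nci",
          "sources": [ "nci.c" ]
        }
      ]
    }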

Once you have the file in your directory, npm init will recognize it and do the rest of the setup.

Loading the native interface can be tricky; luckily, there is a package for that! The library is called node-bindings. Using it, you can load your library with one simple line: require('bindings')('nci.node').

This will find your native bundle (in my case nci.node) and just load it. No need to think about release vs debug, about platforms, or anything else.

Module management

Exposing your module is done through the NAPI_MODULE macro. The macro takes a function as its second argument; this function will be called to initialize the module and needs to return everything that the module exports. For example, the following registers a new module with the method Init as its initializer: NAPI_MODULE(NODE_GYP_MODULE_NAME, Init).

In my case, I add the newly created class definition to the existing exports and return that. I also take this time to store a few global variables that I’ll need to reuse in various calls.

Creating the class definition is simple enough: register the class by giving it a name and a constructor callback and then export it. For example, I went with the following:
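In sketch form, with NativeCounter and Constructor as illustrative names and error handling omitted:

    #include <node_api.h>

    static napi_value Constructor(napi_env env, napi_callback_info info);

    static napi_value Init(napi_env env, napi_value exports) {
      // Register the class with a name and a constructor callback.
      napi_value cls;
      napi_define_class(env, "NativeCounter", NAPI_AUTO_LENGTH,
                        Constructor, NULL, 0, NULL, &cls);

      // Add the class definition to the existing exports and return them.
      napi_set_named_property(env, exports, "NativeCounter", cls);
      return exports;
    }

    NAPI_MODULE(NODE_GYP_MODULE_NAME, Init)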

Class/instance management

There are two Node.js syntaxes that can get your constructor called:

  1. new MyClass()
  2. MyClass() (a plain call, without new)

I decided to support only the first version, simply because it is a bit easier to do so. In order to distinguish the two versions, you’ll need to check the target of the call. Basically, if the target is set, you are in the new MyClass() version. Doing so is pretty easy:
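(A sketch of that check; napi_get_new_target leaves the target NULL when the constructor was called without new.)

    static napi_value Constructor(napi_env env, napi_callback_info info) {
      napi_value target;
      napi_get_new_target(env, info, &target);
      if (target == NULL) {
        // Plain-call version: not supported here.
        napi_throw_error(env, NULL, "Constructor must be called with new");
        return NULL;
      }
      // The target is set: this is the new MyClass() version.
      // Argument validation and wrapping follow (next snippet).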

Once in the constructor, I get and validate the first argument, expecting an int. With that done, I simply create the native instance and wrap it like so:
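(Sketch continuing the constructor; nci_t, nci_create, and nci_free are hypothetical stand-ins for the real native type and its functions.)

      // Continuing inside the constructor: fetch `this` and the arguments.
      size_t argc = 1;
      napi_value argv[1], js_this;
      napi_get_cb_info(env, info, &argc, argv, &js_this, NULL);

      // Read the expected int argument (validation omitted here).
      int32_t initial;
      napi_get_value_int32(env, argv[0], &initial);

      // Create the native instance and wrap it into the JavaScript
      // object, registering Finalize as the cleanup callback.
      nci_t* native = nci_create(initial);
      napi_wrap(env, js_this, native, Finalize, NULL, NULL);

The cleanup callback itself is an ordinary function:

    // Called when the JavaScript object is garbage collected.
    static void Finalize(napi_env env, void* data, void* hint) {
      nci_free((nci_t*)data);
    }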

Wrapping the instance is also the moment where you define a cleanup callback. That callback will be called when the object is garbage collected.

Once the object is wrapped, you’ll need to register all the functions on the instance. In order to do so, call napi_create_function and napi_set_named_property for each function, to create it and add it to the object.
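(Still a sketch; increment and Increment are illustrative names.)

      // Still inside the constructor: create a function, then attach
      // it to the instance under a name.
      napi_value fn;
      napi_create_function(env, "increment", NAPI_AUTO_LENGTH,
                           Increment, NULL, &fn);
      napi_set_named_property(env, js_this, "increment", fn);

      // Repeat those two calls for each method, then return `this`.
      return js_this;
    }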

Accessing the native object

In the function callbacks, you’ll need to access your native object. To do so, get the call information and then unwrap the target object (this) like so:
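(Sketch of one method callback, reusing the illustrative names from above; nci_increment is hypothetical.)

    static napi_value Increment(napi_env env, napi_callback_info info) {
      // Get the call information to recover `this`.
      napi_value js_this;
      napi_get_cb_info(env, info, NULL, NULL, &js_this, NULL);

      // Unwrap the native object that was stored with napi_wrap.
      nci_t* native;
      napi_unwrap(env, js_this, (void**)&native);

      // Use the native object and hand the result back to JavaScript.
      napi_value result;
      napi_create_int32(env, nci_increment(native), &result);
      return result;
    }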

Loading the module

If you are using node-bindings like me, loading the library is done simply through require('bindings')('nci.node'). It is a good idea to wrap that call in your own module rather than requiring your consumers to do it themselves.
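For example, the package entry point can be as small as this (NativeCounter being the illustrative class from earlier):

    // index.js: hide the bindings lookup from consumers of the package.
    const { NativeCounter } = require('bindings')('nci.node');

    module.exports = { NativeCounter };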

Making AWS Lambda deploy faster using layers

I’ve been playing with AWS Lambdas for a little while, at the same time as learning Node.js. Let’s say that I’ve uploaded quite a few new versions of my test Lambdas. Even with a small code base and minimal dependencies, I always feel like deploying is slow. Here is how I ended up making AWS Lambda deploy faster using layers. I decided to update the scripts I was using for my previous post, so everything can be found in this repository. More precisely, update_dependencies.sh shows an example of how to deploy a new version of a layer and update your lambda.

The problem

I timed a few parts of my deploy script and found that the zipping process was taking quite a bit of time:
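(The measured output isn’t reproduced here; the timing boils down to prefixing the slow steps with time, along these lines, with illustrative names.)

    # Illustrative: time the packaging and upload steps of the deploy script.
    time zip -r -q lambda.zip index.js node_modules/
    time aws lambda update-function-code \
        --function-name my-test-lambda \
        --zip-file fileb://lambda.zip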

That is a whole 15 seconds to deploy a change to the lambda. You might think that this is millions of files, images, and such, but it is merely 1900 files, mostly the Alexa SDK. Okay, some of that might be my fault; I’m using a Raspberry Pi to develop my things.

Fixing the issue

I already knew about Lambda layers since I’d used HashiCorp’s Vault AWS Lambda extension in a previous job. I was not sure whether that concept could be used to store libraries. Turns out it can!

Using layers might even allow you to edit your code directly in the lambda’s web interface if the remaining code is small enough.

The first thing that I did was to manually create the layer once through the web page. This will give you an ARN and make things easier later. The zip file that I uploaded simply contained a random file of mine.

Uploading the layer version

Once you have the ARN, you’ll be able to use the AWS CLI to upload the new version of the layer. In order to do so, you’ll need the aws lambda publish-layer-version command. In my case, I invoke it like so:
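(Layer name, description, and runtime are placeholders for my actual values.)

    aws lambda publish-layer-version \
        --layer-name my-dependencies \
        --description "Dependencies for my test lambda" \
        --zip-file fileb://layer.zip \
        --compatible-runtimes nodejs12.x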

The one ambiguous part is the creation of the zip file. In my case, the only things included in the zip file are libraries, and the documentation explains the directory structure that needs to be used. Basically, for a Node.js package that is not tied to a specific runtime version, you’ll need to have all your dependencies in a nodejs/node_modules directory inside the zip file. There are ways to make this specific to a given runtime; the documentation explains these.

I decided to keep building the dependency tree simple: create a temporary directory with a sub-directory named nodejs, and run npm install in there to get the dependencies. Once this is done, zip everything starting at the root of the temporary directory.
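In script form, that flow looks roughly like this (paths are illustrative):

    # Build the layer zip using the nodejs/node_modules layout.
    mkdir -p /tmp/layer/nodejs
    cp package.json /tmp/layer/nodejs/
    (cd /tmp/layer/nodejs && npm install --production)

    # Zip everything starting at the root of the temporary directory.
    (cd /tmp/layer && zip -r -q layer.zip .)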

For the next step, I will need the version number of the new layer. Luckily, the return value of the command used to update the layer gives you that information. The response is JSON, so I decided to use jq to extract the version. Using that great tool, extracting the version is as simple as piping the result of the update command into jq .Version.
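Captured in a shell variable, that looks like this (the layer name is again a placeholder):

    LAYER_VERSION=$(aws lambda publish-layer-version \
        --layer-name my-dependencies \
        --zip-file fileb://layer.zip \
        | jq .Version)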

Updating the function to use the new version

Do note that updating the function at this point is probably a bad idea for production code. Any instance of the lambda will use the new dependencies and might simply break.

Since I’m only playing with all this and do not mind if my lambdas break, I decided to update the functions directly in the script that uploads a new layer version. It can be done using update-function-configuration:
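(The function name and the account part of the ARN are placeholders; $LAYER_VERSION comes from the jq extraction above.)

    aws lambda update-function-configuration \
        --function-name my-test-lambda \
        --layers "arn:aws:lambda:us-east-1:123456789012:layer:my-dependencies:$LAYER_VERSION"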

Storing data in Alexa-triggered Lambda

My latest project is to be able to somewhat control my Raspberry Pi with my Alexa devices. While playing around, I ended up wanting to store data associated with the Amazon account. I decided to explore storing data with two retention policies: data kept for the session and data kept forever. Here is how I ended up storing data in Alexa-triggered Lambda for those two scenarios. As always, the source code used for this experiment can be found in my GitHub repository.

Identifying the user

I saw two identifiers that were interesting: the userId and the personId. According to the documentation, the userId is the identifier associated with the account of the user. On the other hand, the personId represents the human that executed the command. Therefore, multiple personIds can be associated with the same userId.
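With the Alexa SDK, both identifiers can be read from the request envelope inside a handler; a quick sketch:

    // Inside an intent handler: the account-level identifier.
    const userId = handlerInput.requestEnvelope.session.user.userId;

    // The person-level identifier, only present when Alexa recognized
    // a specific person.
    const person = handlerInput.requestEnvelope.context.System.person;
    const personId = person ? person.personId : undefined;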

Alexa Skill setup

In order to be able to test both storage policies, I needed a few different intents. I decided to keep it simple and went with the following:

  • Add: adds one to a session counter and returns the new value
  • Current: returns the value of the counter
  • Forget: resets the counter
  • Persist: stores the value of the counter in a persistent database
  • Restore: restores the value of the counter from the persistent database

Lambda environment setup

As with all my other projects, one of my main goals here was to learn a little something. Therefore I decided to try a few new technologies this time around. My previous test with Alexa was done using Python and handled HTTP calls directly.

In order to shake things up, I decided to go with Node.js and the Alexa SDK. This meant a new language and using the official SDK instead of raw HTTP queries.

For the persistent storage, I wanted something that was easy to use, simple, and, most importantly, free. I decided to go with DynamoDB.

Handling session storage

The session storage is available through the JSON payload received and returned by your lambda. Using the Alexa SDK makes managing these attributes easy through the attribute manager.

In order to read a value from the session, you can use the attribute manager:
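(A sketch; the counter attribute name is mine.)

    // Inside an intent handler: fetch the session attributes and read
    // the counter, defaulting to zero.
    const attributes = handlerInput.attributesManager.getSessionAttributes();
    const counter = attributes.counter || 0;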

You can write to the session through the same object:
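(Continuing the same sketch.)

    // Update the counter and store the attributes back in the session.
    attributes.counter = counter + 1;
    handlerInput.attributesManager.setSessionAttributes(attributes);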

Getting a new Alexa session

While testing this skill, I ended up having to reset my session a few times. The simplest way to do so is to say: Alexa, exit. This will give you a new sessionId and reset all stored session attributes.

Handling persistent storage

Reading and writing to DynamoDB is almost as easy as writing to the session storage.

In order to write to the database, you need to build the parameters and pass them to the put function of an instance of AWS.DynamoDB.DocumentClient. The following is an example of that flow:
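(A sketch assuming a table named alexa-counter keyed on userId, running inside an async handler.)

    const AWS = require('aws-sdk');
    const client = new AWS.DynamoDB.DocumentClient();

    // Build the parameters and hand them to put.
    const params = {
      TableName: 'alexa-counter',
      Item: {
        userId: userId,
        counter: counter,
      },
    };
    await client.put(params).promise();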

You can then read the value following the same pattern: build the params and pass those to the get function of the instance.
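(Same assumptions as above.)

    // Build the params, pass them to get, then pull the stored value out.
    const result = await client.get({
      TableName: 'alexa-counter',
      Key: { userId: userId },
    }).promise();
    const stored = result.Item ? result.Item.counter : 0;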