Using PostgreSQL and Node

I’m using node-postgres, node-sql, and node-db-migrate to build a time tracking API, currently named tempus, and it’s a wonderful experience.

Most people new to Node will see enterprise stacks like mean.io and only be exposed to the NoSQL paradigm for their persistent storage options. This is not the only option! Yes, relational databases are still a thing, and many companies use them, even trendy startups like Instagram!

I want to show the steps I have taken to better use PostgreSQL in my time tracking API, tempus.

Migrations

The first step is a reliable migration system. I started out with node-db-migrate for the tempus-api and had a bit of trouble because it was missing some important functionality. For example, the JS API lacks foreign key support. I considered forking and making the appropriate changes, but node-db-migrate is designed for multiple SQL management systems. I needed to get closer to PostgreSQL anyway and spend less time writing JavaScript abstractions.

I had another issue with node-db-migrate where column names are inconsistently wrapped in quotes. In PostgreSQL, unquoted column names are lowercased, while quoted column names are interpreted literally. Early in my design, I wanted to provide a camel-cased JSON API, and without fine-grained control of column names I was bound to slip up. I could easily stick a mapping between my query results and the model level, but why bother when I can push my key names down to the database level?
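
To make the quoting rule concrete, here is a minimal sketch; the table and column names are made up for illustration and are not part of tempus.

# Illustration of identifier folding: unquoted names are lowercased,
# quoted names are stored exactly as written.
exports.up = (db, callback) ->
  db.runSql """
  create table quoting_demo(
    firstName character varying,  -- becomes the column "firstname"
    "lastName" character varying  -- stays "lastName"
  )
  """, callback
  # select "lastName" from quoting_demo works;
  # select "firstName" from quoting_demo fails, that column does not exist.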

Examples

Now I am only using two node-db-migrate calls, dropTable and runSql. Below are some of the migrations for the tempus-api.

users Table

dbm = require("db-migrate")
type = dbm.dataType
table = "users"

exports.up = (db, callback) ->
  sql = """
    create table #{table}(
      "id" uuid PRIMARY KEY
        DEFAULT uuid_generate_v4(),
      "firstName" character varying NOT NULL,
      "lastName" character varying NOT NULL,
      "email" character varying NOT NULL UNIQUE,
      "hash" character varying NOT NULL
    )
  """
  db.runSql sql, callback

exports.down = (db, callback) ->
  if process.env.DROP_TABLES
    db.dropTable table, callback
  else
    callback()

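One assumption worth calling out: uuid_generate_v4() comes from the uuid-ossp extension, which has to be enabled once per database. A sketch of an earlier migration that could take care of that (this migration is not from the tempus-api):

# Hypothetical one-off migration enabling the uuid-ossp extension,
# which provides the uuid_generate_v4() default used above.
dbm = require("db-migrate")

exports.up = (db, callback) ->
  db.runSql 'CREATE EXTENSION IF NOT EXISTS "uuid-ossp"', callback

exports.down = (db, callback) ->
  # leave the extension in place on rollback
  callback()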

users_projects Table

dbm = require("db-migrate")
async = require 'async'
_ = require 'underscore'
type = dbm.dataType
table = "users_projects"

exports.up = (db, callback) ->
  t = """
    create table #{table}(
      "userId" uuid NOT NULL,
      "projectId" uuid NOT NULL
    )
  """
  r = [
    """
    add constraint #{table}_pkey PRIMARY KEY ("userId", "projectId")
    """,
    """
    add constraint "user_id_f_key" FOREIGN KEY ("userId") REFERENCES users("id")
    """,
    """
    add constraint "project_id_f_key" FOREIGN KEY ("projectId") REFERENCES projects("id")
    """
  ]
  r = _.map r, (s) -> "alter table #{table} #{s}"
  async.eachSeries _.flatten([t, r]),
    (q, c) -> db.runSql(q, c)
  , callback

exports.down = (db, callback) ->
  if process.env.DROP_TABLES
    db.dropTable table, callback
  else
    callback()


time_entries Table

dbm = require("db-migrate")
async = require 'async'
_ = require 'underscore'
type = dbm.dataType
table = "time_entries"

exports.up = (db, callback) ->
  t = """
    create table #{table}(
      "id" uuid PRIMARY KEY
        DEFAULT uuid_generate_v4(),
      "userId" uuid NOT NULL,
      "projectId" uuid NOT NULL,
      "message" character varying,
      "start" timestamptz,
      "end" timestamptz,
      "duration" real
    )
  """
  r = [
    """
    add constraint "user_id_f_key" FOREIGN KEY ("userId") REFERENCES users("id")
    """,
    """
    add constraint "project_id_f_key" FOREIGN KEY ("projectId") REFERENCES projects("id")
    """
  ]
  r = _.map r, (s) -> "alter table #{table} #{s}"
  async.eachSeries _.flatten([t, r]),
    (q, c) -> db.runSql(q, c)
  , callback

exports.down = (db, callback) ->
  if process.env.DROP_TABLES
    db.dropTable table, callback
  else
    callback()


Writing SQL isn’t that hard, and it forces thought about data types, relations, potential indexes, and constraints. Be close to the metal when it comes to data because, damn, what is more important than your data?
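
On the index point, the foreign key columns on time_entries are obvious candidates once queries start filtering by user or project. A follow-up migration for that could be as small as the sketch below; the index name is made up and this migration is not part of the tempus-api.

# Hypothetical follow-up migration adding an index on a foreign key
# column that application queries filter on.
dbm = require("db-migrate")
table = "time_entries"

exports.up = (db, callback) ->
  db.runSql """
  create index "time_entries_userId_idx" on #{table}("userId")
  """, callback

exports.down = (db, callback) ->
  db.runSql """
  drop index if exists "time_entries_userId_idx"
  """, callback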

While I don’t use much of the JS API of node-db-migrate, I enjoy the general up/down/create CLI and the ability to point to different environments with ease.

Improvements

For node-db-migrate, I would like to customize the base template used by the create command. I need to control all of the contents of the migration file as well as the file extension. For me that is CoffeeScript, but if the custom template code is written correctly, the sky is the limit.

PostgreSQL Client

node-postgres is actively maintained by @brianc and is always getting love, improvements, and fixes. It is probably the most well-known Postgres client, but as with many node modules, it serves (generally) a single purpose. node-postgres’s job is to provide a connected client object back to module consumers. It facilitates the life cycle of these clients but does not do much more. Sure, it handles prepared statements and some basic SQL formatting, but it is not an ORM. You will need to build upon the sturdy foundation that is node-postgres.

When using pg, it is recommended to use the built-in client pool. Here is the lowest-level integration with the connection pooling in the tempus API. This is very basic and forces the user of this module to handle any connection errors. In the future I could handle errors here, alter pool settings, etc.

pg = require 'pg'
{dev} = require "#{__dirname}/../database"
devConfig = "postgres://#{dev.user}:#{dev.password}@localhost/#{dev.database}"
constring = process.env.HEROKU_POSTGRESQL_NAVY_URL or process.env.TRAVIS_DB_URL or devConfig
{EventEmitter} = require 'events'

Database = (callback) ->
  pg.connect constring, callback

class Transaction extends EventEmitter
  constructor: ()->
    Database (err, client, done) =>
      if err
        @emit 'error', err
      else
        @client = client
        @done = done
        @begin()

  begin: ->
    @client.query 'BEGIN', (err)=>
      if err
        @emit 'error', err
      else
        @emit 'begin'

  # Runs a statement inside the transaction; on error the transaction is
  # rolled back and cb is never called (listen for the 'rollback' event).
  query: (statement, cb)->
    statement = if typeof statement.toQuery is 'function' then statement.toQuery() else statement
    @client.query statement, (err, rows) =>
      if err
        @rollback()
      else
        cb rows

  rollback: () ->
    @client.query 'ROLLBACK', (err) =>
      @done err
      @emit 'rollback', err

  commit: (obj)->
    @client.query 'COMMIT', (err) =>
      @done()
      @emit 'commit', obj

module.exports = Database
module.exports.Transaction = Transaction

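On the “alter pool settings” point: in the classic node-postgres API the built-in pool is configured through pg.defaults before the first pg.connect call, and errors on idle pooled clients are emitted on the pg object itself. A sketch, assuming that API surface is available in the pg version in use; the numbers are placeholders, not tuned values.

# Sketch: tweak the built-in pool before any client is checked out.
pg = require 'pg'
pg.defaults.poolSize = 10            # max clients in the pool (placeholder)
pg.defaults.poolIdleTimeout = 30000  # ms an idle client is kept around (placeholder)

# Central place to at least log errors raised by idle pooled clients.
pg.on 'error', (err) ->
  console.error 'idle client error', err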

In the above gist, you will also see a Transaction event emitter. Transactions are part of SQL, and this abstraction is available from other packages such as node-any-db-transaction. I needed a simple way to start and stop transactions. The idea of rolling back data manipulation operations in application code is a fearsome task. Luckily SQL has our back on this one, and we can easily reverse any set of queries if necessary.

Here is a way to use this Transaction object.

deleteOne: (key, callback)->
  {time_entry} = TimeEntriesController
  deleteProjectTimeEntries = time_entry.delete().where(time_entry.projectId.equals(key))
  deleteProject = @project.delete().where(@project.id.equals(key))
  deleteUsersProjectsRows = @usersprojects.delete().where(@usersprojects.projectId.equals key)
  t = @transaction()
  start = ->
    async.eachSeries [deleteProjectTimeEntries, deleteUsersProjectsRows, deleteProject],
      (s, cb)->
        # Transaction#query only calls back on success; failures trigger
        # an internal rollback and the 'rollback' event below.
        t.query s, -> cb()
    , ->
      t.commit()
  t.on 'begin', start
  t.on 'error', callback
  t.on 'commit', ->
    callback()
  t.on 'rollback', ->
    callback new Error "Could not delete project with id #{key}"


This snippet includes usage of the node-sql module, which generates SQL strings from definitions outside this function’s scope; just pretend they are SQL strings. Each of the SQL strings is executed in series, and if an error is thrown by PostgreSQL, the Transaction object will internally roll back and emit a rollback event.

Improvements

For future transaction support, I am taking a deeper dive into node-any-db-transaction because it is more mature and battle-tested in node-any-db.

Application SQL

In migration SQL, I am not concerned at all with user input, and thus there is no need for SQL injection protection; only authorized people will run migrations against the staging and production databases. In the application SQL, SQL injection attacks must be mitigated.
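
The difference in practice: never interpolate user input into the query text; hand it to the driver as a bound parameter instead. A tiny sketch, where client is a pg client checked out as in the pooling module above and key is user-supplied:

# Unsafe: user input spliced straight into the SQL text.
client.query "select * from projects where \"id\" = '#{key}'", callback

# Safe: the value travels separately as a bound parameter ($1).
client.query 'select * from projects where "id" = $1', [key], callback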

node-sql builds prepared statements.

Table objects are defined with an object containing a name and an array of columns.

time_entries = sql.define
  name: 'time_entries'
  columns: ['message', 'projectId', 'userId', 'start', 'end', 'duration', 'id']


For selecting the time entries of a particular project, the statement looks like this:

getForProject: (projectId, callback)->
  statement = @time_entry.select @time_entry.star()
    .from @time_entry
    .where @time_entry.projectId.equals projectId


The above statement object is passed to a deeper layer where I eventually call .toQuery, yielding a prepared statement, which combats SQL injection attacks. Prepared statements can also help speed up frequently used queries if used correctly.
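
Under the hood, toQuery() returns an object with a text string containing $1-style placeholders and a values array, which pairs directly with node-postgres. A rough sketch of that hand-off; client and callback are illustrative names, not code from the tempus-api.

# Sketch of what the deeper layer does with a node-sql statement.
{text, values} = statement.toQuery()
# text is something like:
#   SELECT "time_entries".* FROM "time_entries" WHERE ("time_entries"."projectId" = $1)
# values is [projectId]
client.query text, values, (err, result) ->
  callback err, result?.rows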

Improvements

As the data model becomes more complex, it will be nice to build partially generated node-sql objects that can be shared throughout my controller-level code. Not everything is supported in node-sql; some PostgreSQL-specific features, such as window functions, have not been implemented. I haven’t used these yet, but I hope that node-sql is easily extensible. Since node-sql is designed for multiple relational databases, it would be wise to use node-sql as a base and have a module, perhaps node-psql, that inherits from it and builds PostgreSQL-specific functionality.
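
Until something like that exists, a Postgres-only feature can still ride through the same plumbing as a hand-written prepared statement. For example, a hypothetical running total of durations for a project could use a window function, which node-sql cannot generate, and still hand pg a plain text/values pair.

# Hypothetical query using a window function, expressed as a
# hand-written prepared statement instead of a node-sql object.
runningTotals = (projectId) ->
  text = """
    select "id", "start", "duration",
           sum("duration") over (order by "start") as "runningTotal"
    from time_entries
    where "projectId" = $1
  """
  {text, values: [projectId]}

Because the returned object has text and values properties, it can be passed to client.query (or the Transaction query method above) just like the output of toQuery().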

Query Language

Using a relational database adds complexity that some higher-level application programmers are not used to, but the benefits of SQL greatly outweigh the overhead of managing your data model. At some point the data model has to be defined, so start thinking about the data from the beginning.

In my experience working with NoSQL stores, one ends up adding a lot of application code to query the data, building and updating indexes by hand, and forgetting to include ways to migrate data when the model changes. Frankly, I want to push as much of this responsibility as possible back to the database so I don’t have to write it myself. The less code I write, the better.
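
A concrete example: totalling tracked time per project is a one-line GROUP BY in SQL, where a typical document store would need a map/reduce job or application-side accumulation. Sketched against the time_entries table defined earlier:

# Sketch: let PostgreSQL do the aggregation instead of application code.
totalsPerProject = """
  select "projectId", sum("duration") as "totalDuration"
  from time_entries
  group by "projectId"
"""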

General Thoughts on Scaling

Relational algebra, the math relational databases are built on, is a beautiful math that scales infinitely. I find it disturbing that the big data NoSQL solutions do not provide a standard query language for asking questions about the data they store. There is no reason to reinvent the wheel for big data. Relational algebra scales infinitely; big data solutions should use the math.

While the traditional storage engines used by relational databases may not scale as well as petabyte-scale big data solutions, relational algebra principles and concepts will.

Repositories Referenced

node-db-migrate
node-postgres
node-sql
node-any-db-transaction
