I’m using node-postgres, node-sql, and node-db-migrate to build a time tracking API, currently named tempus and it’s a wonderful experience.
Most people new to node will see enterprise stacks like mean.io and only be exposed to the NoSQL paradigm for their persistent storage options. This is not the only option! Yes, relational databases are still a thing and many companies use them. Even trendy startups like instagram!
I want to show the steps I have taken to better use PostgreSQL in my time tracking API, tempus.
Migrations
The first step is a reliable migration system. I started out with node-db-migrate
for the tempus-api and had a bit of trouble because it was missing some important functionality. For example, through the JS API, there is lack of foreign key support. I considered forking, and making the appropriate changes, but node-db-migrate
is designed for multiple SQL management systems. I needed to get closer to PostgreSQL anyways and spend less time writing JavaScript abstractions.
I had another issue with node-db-migrate
where column names are inconsistently wrapped in quotes. In PostgreSQL, unquoted column names will lowercased, while quoted column names will be interpreted literally. Early in my design, I wanted to provide a camel-cased JSON API, and without fine grained control of column names, I was bound to slip up. I could easily stick a mapping in between my query results and the model level, but why bother when I can push my key names to the database level?
Examples
Now I am only using two node-db-migrate
calls, dropTable
and runSql
. Below are some of the migrations for the tempus-api.
users Table
dbm = require("db-migrate") | |
type = dbm.dataType | |
table = "users" | |
exports.up = (db, callback) -> | |
sql = """ | |
create table #{table}( | |
"id" uuid PRIMARY KEY | |
DEFAULT uuid_generate_v4(), | |
"firstName" character varying NOT NULL, | |
"lastName" character varying NOT NULL, | |
"email" character varying NOT NULL UNIQUE, | |
"hash" character varying NOT NULL | |
) | |
""" | |
db.runSql sql, callback | |
exports.down = (db, callback) -> | |
if process.env.DROP_TABLES | |
db.dropTable table, callback | |
else | |
callback() |
view rawusers-migration.coffee hosted with ❤ by GitHub
users_projects Table
dbm = require("db-migrate") | |
async = require ‘async’ | |
_ = require ‘underscore’ | |
type = dbm.dataType | |
table = "users_projects" | |
exports.up = (db, callback) -> | |
t = """ | |
create table #{table}( | |
"userId" uuid NOT NULL, | |
"projectId" uuid NOT NULL | |
) | |
""" | |
r = [ | |
""" | |
add constraint #{table}_pkey PRIMARY KEY ("userId", "projectId") | |
""", | |
""" | |
add constraint "user_id_f_key" FOREIGN KEY ("userId") REFERENCES users("id") | |
""", | |
""" | |
add constraint "project_id_f_key" FOREIGN KEY ("projectId") REFERENCES projects("id") | |
""" | |
] | |
r = _.map r, (s) -> "alter table #{table} #{s}" | |
async.eachSeries _.flatten([t, r]), | |
(q, c) -> db.runSql(q, c) | |
, callback | |
exports.down = (db, callback) -> | |
if process.env.DROP_TABLES | |
db.dropTable table, callback | |
else | |
callback() |
view rawusers-projects-migration.coffee hosted with ❤ by GitHub
time_entries Table
dbm = require("db-migrate") | |
async = require ‘async’ | |
_ = require ‘underscore’ | |
type = dbm.dataType | |
table = "time_entries" | |
exports.up = (db, callback) -> | |
t = """ | |
create table #{table}( | |
"id" uuid PRIMARY KEY | |
DEFAULT uuid_generate_v4(), | |
"userId" uuid NOT NULL, | |
"projectId" uuid NOT NULL, | |
"message" character varying, | |
"start" timestamptz, | |
"end" timestamptz, | |
"duration" real | |
) | |
""" | |
r = [ | |
""" | |
add constraint "user_id_f_key" FOREIGN KEY ("userId") REFERENCES users("id") | |
""", | |
""" | |
add constraint "project_id_f_key" FOREIGN KEY ("projectId") REFERENCES projects("id") | |
""" | |
] | |
r = _.map r, (s) -> "alter table #{table} #{s}" | |
async.eachSeries _.flatten([t, r]), | |
(q, c) -> db.runSql(q, c) | |
, callback | |
exports.down = (db, callback) -> | |
if process.env.DROP_TABLES | |
db.dropTable table, callback | |
else | |
callback() |
view rawtime-entries-table-migration.coffee hosted with ❤ by GitHub
Writing SQL isn’t that hard and forces thought surrounding the data types, relations, potential indexes, and constraints. Be close to the metal when it comes to data because damn, what is more important than your data?
While I don’t use much of the JS API of node-db-migrate
, I enjoy the general up
, down
, create
CLI and the ability to point to different environments with ease.
Improvements
For node-db-migrate
, I need to customize the base template during the create
command. I need to control all of the contents of the migration file and the file extension. For me it will be CoffeeScript, but if the custom template code is written correctly, the sky is the limit.
PostgreSQL Client
node-postgres
is actively maintained by @brianc and is always getting love, improvements, and fixes. It is probably the most well know postgres-client but as with many node modules, it serves (generally) a single purpose. node-postgres
‘s job provides a connected client object back to module consumers. It facilitates the life cycle of these clients but does not do much more. Sure it handles prepared statements, and some basic SQL formatting, but it is not a ORM. You will need to build upon the sturdy foundation that is node-postgres
.
When using pg, it is recommended to use the built in client pool. Here is the lowest level integration with using the connection pooling in the tempus api. This is very basic and forces the user of this module to handle any connection errors. In the future I could handle errors here, alter pool settings etc.
pg = require ‘pg’ | |
{dev} = require "#{__dirname}/../database" | |
devConfig = "postgres://#{dev.user}:#{dev.password}@localhost/#{dev.database}" | |
constring = process.env.HEROKU_POSTGRESQL_NAVY_URL or process.env.TRAVIS_DB_URL or devConfig | |
{EventEmitter} = require ‘events’ | |
Database = (callback) -> | |
pg.connect constring, callback | |
class Transaction extends EventEmitter | |
constructor: ()-> | |
Database (err, client, done) => | |
if err | |
@emit ‘error’, err | |
else | |
@client = client | |
@done = done | |
@begin() | |
begin: -> | |
@client.query ‘BEGIN’, (err)=> | |
if err | |
@emit ‘error’, err | |
else | |
@emit ‘begin’ | |
query: (statement, cb)-> | |
statement = if typeof statement.toQuery is ‘function’ then statement.toQuery() else statement | |
@client.query statement, (err, rows) => | |
if err | |
@rollback() | |
else | |
cb rows | |
rollback: () -> | |
@client.query ‘ROLLBACK’, (err) => | |
@done err | |
@emit ‘rollback’, err | |
commit: (obj)-> | |
@client.query ‘COMMIT’, (err) => | |
@done() | |
@emit ‘commit’, obj | |
module.exports = Database | |
module.exports.Transaction = Transaction |
view rawtempus-db-index.coffee hosted with ❤ by GitHub
In the above gist, you will also see a Transaction Event Emitter. Transactions are part of SQL and this abstraction is available from other packages such as node-any-db-transaction. I needed a simple way to start and stop transactions. The idea of rolling back data manipulation operations in application code is a fierce some task. Luckily SQL has our back on this one, and we can easily reverse any set of queries if necessary.
Here is a way to use this Transaction object.
deleteOne: (key, callback)-> | |
{time_entry} = TimeEntriesController | |
deleteProjectTimeEntries = time_entry.delete().where(time_entry.projectId.equals(key)) | |
deleteProject = @project.delete().where(@project.id.equals(key)) | |
deleteUsersProjectsRows = @usersprojects.delete().where(@usersprojects.projectId.equals key) | |
t = @transaction() | |
start = -> | |
async.eachSeries [deleteProjectTimeEntries, deleteUsersProjectsRows, deleteProject], | |
(s, cb)-> | |
t.query s, (err)-> | |
cb(err) | |
, -> | |
t.commit() | |
t.on ‘begin’, start | |
t.on ‘error’, callback | |
t.on ‘commit’, -> | |
callback() | |
t.on ‘rollback’, -> | |
callback new Error "Could not delete project with id #{key}" |
view rawtransaction-ex.coffee hosted with ❤ by GitHub
This snippet includes usage of the node-sql
module that generates SQL strings from definitions outside this function scope. Just pretend they are SQL strings. For each of the SQL strings, execute in serial. If an error is thrown by PostgreSQL, the Transaction Object will internally emit a rollback.
Improvements
For future transaction support, I am taking a deeper dive into node-any-db-transaction
because it’s more mature and battle tested in node-any-db.
Application SQL
In migration SQL, I am not concerned at all with user input and thus no need for SQL injection projection. Only authorized people will run migrations against the staging DB and production DB. In the application SQL, SQL injection attacks must be mitigated.
node-sql
builds prepared statements.
Table objects are defined with an object containing a name
and an array of columns
.
time_entries = sql.define | |
name: ‘time_entries’ | |
columns: [‘message’, ‘projectId’, ‘userId’, ‘start’, ‘end’, ‘duration’, ‘id’] |
view rawtime-entries-sql.coffee hosted with ❤ by GitHub
For selecting time entries for a particular project, the statements should look like this:
getForProject: (projectId, callback)-> | |
statement = @time_entry.select @time_entry.star() | |
.from @time_entry | |
.where @time_entry.projectId.equals projectId |
view rawtime-entries-for-project.coffee hosted with ❤ by GitHub
The above statement object is passed to a deeper layer where I eventually call .toQuery
yielding a prepared statement which combats SQL injection attacks. Prepared statements can also help speed up frequently used queries if used correctly.
Improvements
As the data model becomes more complex, it will be nice to build partially generated node-sql objects that can be shared throughout my controller level code. Not everything is supported in node-sql
. Some of the specific features of PostgreSQL have not be implemented in node-sql
such as window functions. I haven’t used these yet, but I hope that node-sql
is easy extensible. Since node-sql
is designed for multiple relational databases, it is wise to use node-sql
as a base and use a module, perhaps node-psql
, that inherits and builds PostgreSQL specific functionality.
Query Language
Using a relational database adds additional complexity some higher level application programmers are not used to. The benefits of SQL greatly outweigh the overhead of managing your data model. At some point the data model has to be defined. Start thinking about the data from the beginning.
In my experience with working with NoSQL stores, one ends up adding a lot of application code to query the data in a NoSQL store, build and update indexes, and forgetting to include ways to migrate data when the model changes. Frankly, I want to push so much of this responsibility back to the Database so I don’t have to write it my self. The less code I write the better.
General Thoughts on Scaling
Relational algebra, the math relational databases are built on, is a beautiful math that scales infinitely. I find it disturbing that all of the big data NoSQL solutions do not provide a Standard Query Language for asking questions about the data stored. There is no reason to reinvent the wheel for big data. Relational algebra scales infinitely. Big data solutions should use the math.
While traditional storage engines used by relational databases may not scale as well as big data Peta byte solutions, relational algebra principles and concepts will.
Leave a Reply