The Product Side

5 columns every database table should have

So, you’re designing a business app that uses a relational database. Users will log in to your app and create or edit data. Please consider adding the following columns to each one of your DB tables.

1. id

Create a separate column as the primary key (PK) to your database table — the “id” column. This column should have nothing to do with your data. It typically is either an integer or a GUID.

Although this may sound trivial, I’ve seen way too many tables having a PK that is tied to the data (i.e. it consists of one or more columns).

A) Why shouldn’t the PK be tied to the data? Even if it seems that the underlying data won’t change (for example the PK might be the person’s national ID number), chances are that in the long run it will change. Your data row (the tuple) is one thing and the data it contains is another. A person may get a new national ID or passport, but the person remains the same. A user may change her email, but she still is the same user. There is no need to tie the tuple identifier to the data it contains.

It gets even better when you need to update the data. Say that the user changes her email and you need to reflect this change on the database. If the PK is an integer then all you need to do is issue a single update command “update table Users set email=’newEmail@company.com’ where id='123' ”. If on the other hand the email was the PK then not only do you need to change the ‘Users’ table, but all the other tables that have a reference to this particular user via her old email.

B) Why should the PK be a single column? It makes it much easier to do all sort of CRUD or join operations on the database.

C) Integer or GUID? Which is best? An integer (auto increment, or nextVal from sequence) is easier to read and this helps when you debug or write your own SQL statements by hand. This is by far the biggest benefit of using an INT. The main reason to go for GUID would be if you need to merge records from different database instances or split a huge database across many servers. Also if you implement database replication then GUID is useful.

Technical differences between integers or GUIDs (such as size, search speed, index size & speed, sorting difficulty, table space used etc.) don’t make much difference for small databases and all modern RDBMSs can handle the extra load of a GUID. However if your database starts growing significantly then the GUID will add an overhead, which you should estimate before you start creating the database.

2. userIns & dateIns

At some point in time you will need to discover why a record exists. Who placed this order, who issued this invoice, who created this delivery address and so on.

This is why it is recommended to make sure that every record holds the name of the application user who created it and the timestamp of its creation. By “application user” I refer to the logged in user who created the record, say “Dimitris” and not some generic user, such as “web”, or the DB user. For records that were generated because of a batch job running, you can put the name of the batch program as “userIns”.

Both columns are only populated once; at the time the record was created.

3. userupd & dateupd

Same as point 2, please keep track of who modified each record and hold the name of the modifier, together with the modification timestamp.

Both columns are populated every time the record gets updated, including the initial creation.

If there is business need you may wish to keep the full modification history of each row (either in the same database table, or to a separate one — recommended).

Last point would be how to handle deletions. Keep the row in place and mark it as deleted or actually delete the database row? This mostly depends on the business need and we can review it on a separate post.

Happy to hear your thoughts!