Documentation Index

Database Design

Post Database

We designed the database code to be "Data Ignorant". All the code does is read/write/delete records. The code is just an API and knows nothing about the content of the records. Just like the fopen/fread/fwrite/fclose API. It is the "higher up" code that knows, or cares, about the content.

The database design is as simple as possible:

    +----+------+
    | id | body |
    +----+------+

Which is:

    CREATE TABLE data (id INT AUTO_INCREMENT UNIQUE, body MEDIUMTEXT)

See the file POSTFORMAT for how the body text is formatted (it is several headers separated by newlines, with the post text after two newlines). Like this:

    +----+---------------------------------------------------------------+
    | id | title:First Post\ndate:Jan-01, 2013\n\nWelcome...\nThis is... |
    +----+---------------------------------------------------------------+

which is exactly how a post is stored in the database. This data gets turned in to a record as:

    'id' => ID,
    'title' => 'First Post',
    'date' => 'Jan-01, 2013',
    'body' => 'Welcome...\nThis is...'

The code that turns a post into a record turns each header into an array member with the name/value set to each text:text\n found, with the \n\n as the marker for body; then it adds in the ID.

Although this design makes for really simple database, it does not allow for direct queries of a post's header data — that was not a requirement of the design.

Querying and managing post header data is done in a separate module that knows nothing about the database design.

Since the database cannot be queried on the post header data directly, as each record must be read in full, the headers extracted and then queried. With a small blog, with no more than several thousands of medium sized posts this is not really an issue because the overall code base is very small. However, with tens of thousands or more and with much larger post data sizes, the performance hit will be seen.

We do not plan to change the database design. However, we do speculate about this from time to time (more below).

The database has a separate table for each "section" (what is otherwise known as a "category").

Comments Database

The user comments database is just like the posts database with the addition on a column of entryid, which contains the id of the post that the comment is for:

    +----+---------+-----------------------------------------------------+
    | id | entryid | from:Joe\ntitle:Snow\ndate:Jan-01, 2013\n\n<p>It's..|
    +----+---------+-----------------------------------------------------+

This data gets turned into a record as:

    'id' => ID,
    'from' => 'Joe',
    'title' => 'Snow',
    'date' => 'Jan-01, 2013',
    'body' => '<p>It\'s..'

Currently comments are not editable and there is no comment header data beyond from, title and date. body is formatted for displaying before the data is stored. id is the MySQL record ID and not entryid, which gets lost — that may change.

Visitors Database

The Visitors database (see file VISITORS) is similar to the POST DATABASE but with name instead of id.

    CREATE TABLE visitor (name VARCHAR(32) UNIQUE, body MEDIUMTEXT)
    +------+-------------------------------------------------------------+
    | name | from:Joe\ncode:$5$nHr7d\n\nip:127.0.0.1\ntime:1375296768... |
    +------+-------------------------------------------------------------+

This data gets turned into a record as:

    'from' => Joe',
    'code' => '$5$nHr7d',
    'body' => 'ip:127.0.0.1\ntime:1375296768'

The Visitors database is editable and is in the process of expansion. Headers and body data will be changing through the next several versions as we are adding features as we think of them. Note that body currently is like header data. We do not do anything with that data... yet.

Performance

Adding of two columns for the post title and date (while leaving room for a variable number of "sub-headers"), as all posts require a date and title header, might be an improvement.

The code was designed to be a single (Admin) user website and post records do not have a "name" header, and no other header fields beyond title and date lends itself to queries.

Complex queries such as date ranges and sorting are probably better turned over to the database server — certainly so for millions of rows or distributed databases — but this ain't that. PHP does what we want to do just fine.

To us, the current design has the direct advantage that if we want to add, remove or otherwise change the post header data there are not hundreds of SQL strings strewn throughout the code that would have to be changed.

Notes
  1. But with a configuration setting a few User Agent strings are added.
  2. This is unlike posts, which are formatted for display each time they are read out of the database. This is because comments are not editable as posts are. (Admin can edit posts but it edits them in their formatted state.)