line-level code deduplication

When you’re learning about object-oriented code, “code deduplication” is often given as one of the reasons for modularising your code: to increase code re-use, and to cut the number of times the same blocks of code appear inside any given system.

The following phrases are some of those that all developers should be thinking:

  • “If I want to re-use this block of code, maybe I should put it in a method…”
  • “If I want to re-use these methods, maybe I should put them in a separate class…”
  • “If I want to re-use this set of classes, maybe I should put them into a bundle…”

These thoughts all focus on code re-use for the purposes of maintainability, and de-duplication of effort. They focus on code-deduplication at Method, Class, and Module levels, respectively.

What these statements miss is that there is one more important reason for deduplication: code performance.


Code performance is overlooked in so much of the code I see that it is often the number one reason I end up spending time refactoring.

Consider the following code. It suffers from code duplication at the line-level that reduces readability, performance, and maintainability.

switch ($filters->getDateFilter()->getDateType()) {
    case 'started':
        $searchQueryBuilder->applyFilter(
            new StartDateRangeFilter($filters->getDateFilter()->getFrom(), $filters->getDateFilter()->getTo())
        );
        break;
    case 'ended':
        $searchQueryBuilder->applyFilter(
            new EndDateRangeFilter($filters->getDateFilter()->getFrom(), $filters->getDateFilter()->getTo())
        );
        break;
    case 'created':
        $searchQueryBuilder->applyFilter(
            new CreatedDateRangeFilter($filters->getDateFilter()->getFrom(), $filters->getDateFilter()->getTo())
        );
        break;
    case 'active':
        $searchQueryBuilder->applyFilter(
            new ActiveDateRangeFilter($filters->getDateFilter()->getFrom(), $filters->getDateFilter()->getTo())
        );
        break;
    default:
        throw new Exception("Invalid date_type, valid types are 'started', 'ended', 'created' and 'active'");
}

Repeated calls in this statement:

  • `$filters->getDateFilter()` – occurs 9 times
  • `$searchQueryBuilder->applyFilter()` – occurs 4 times
  • `$filters->getDateFilter()->getFrom()` – occurs 4 times
  • `$filters->getDateFilter()->getTo()` – occurs 4 times

Several methods in the code above are called repeatedly, even though we know that the result will be the same every time.
(If the result may change between calls, this should be pointed out in a comment, but only badly written code would prefix a mutator method with ‘get’.)

$dateFilter = $filters->getDateFilter();
$from = $dateFilter->getFrom();
$to = $dateFilter->getTo();

switch ($dateFilter->getDateType()) {
    case 'started':
        $filter = new StartDateRangeFilter($from, $to);
        break;
    case 'ended':
        $filter = new EndDateRangeFilter($from, $to);
        break;
    case 'created':
        $filter = new CreatedDateRangeFilter($from, $to);
        break;
    case 'active':
        $filter = new ActiveDateRangeFilter($from, $to);
        break;
    default:
        throw new Exception("Invalid dateType, valid types are 'started', 'ended', 'created' and 'active'");
}
$searchQueryBuilder->applyFilter($filter);

In the code above, each set of repeated calls has been replaced with a single method call, using a local variable to store the result where appropriate, so it can be referenced without repeating the call. Code that does not need to be inside the `switch` statement has been moved after it.
This code could be shrunk further by storing the class names as strings, but in this instance that is unnecessary: it makes code tracing and maintenance harder, reduces readability, and possibly also affects performance to a small extent.

Here is an example of how you could shrink the code further:

$dateFilter = $filters->getDateFilter();
$dateType = $dateFilter->getDateType();

$types = [
    'started' => 'StartDateRangeFilter',
    'ended' => 'EndDateRangeFilter',
    'created' => 'CreatedDateRangeFilter',
    'active' => 'ActiveDateRangeFilter',
];
if (!array_key_exists($dateType, $types)) {
    throw new Exception("Invalid date_type, valid types are '" . implode("', '", array_keys($types)) . "'");
}

$searchQueryBuilder->applyFilter(new $types[$dateType]($dateFilter->getFrom(), $dateFilter->getTo()));

This shrunk code does have the advantage of being the most flexible: even the Exception message is dynamic, so adding a new type now involves adding only one line of code.
From the perspective of code tracing and maintainability, you could add a PHPDoc `@var` tag to tell any watching IDE that the resulting object could be an instance of any one of the four classes listed above:

/** @var StartDateRangeFilter|EndDateRangeFilter|CreatedDateRangeFilter|ActiveDateRangeFilter $f */
$f = new $types[$dateType]($dateFilter->getFrom(), $dateFilter->getTo());
$searchQueryBuilder->applyFilter($f);

But this just starts to get silly, and we are again repeating the class names from the list for the purposes of IDE integration and code documentation.


Some developers would prefer to use an inline formatted layout for the switch statement as below:

$dateFilter = $filters->getDateFilter();
$from = $dateFilter->getFrom();
$to = $dateFilter->getTo();

switch ($dateFilter->getDateType()) {
    case 'started': $filter = new StartDateRangeFilter($from, $to);   break;
    case 'ended':   $filter = new EndDateRangeFilter($from, $to);     break;
    case 'created': $filter = new CreatedDateRangeFilter($from, $to); break;
    case 'active':  $filter = new ActiveDateRangeFilter($from, $to);  break;
    default:
        throw new Exception("Invalid dateType, valid types are 'started', 'ended', 'created' and 'active'");
}
$searchQueryBuilder->applyFilter($filter);

Recently I have been avoiding this code pattern, as it goes against the PSR-2 Coding Style Guide laid out by PHP-FIG. This is one of the standards I follow, and my IDE is configured to auto-format code, which prevents me from using this pattern. This is one of the (very few) things I dislike about the PSR-2 guide.
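For readers on PHP 8.0 or later, a `match` expression gives a similarly compact, one-line-per-case layout. The sketch below is only an illustration: the stub filter classes and the `buildDateRangeFilter()` helper are hypothetical stand-ins so that the example runs on its own, and it is not how the original application was written.

```php
<?php
// Stand-in filter classes so the sketch is self-contained (hypothetical).
abstract class DateRangeFilter
{
    public $from;
    public $to;

    public function __construct($from, $to)
    {
        $this->from = $from;
        $this->to = $to;
    }
}

class StartDateRangeFilter extends DateRangeFilter
{
}

class EndDateRangeFilter extends DateRangeFilter
{
}

class CreatedDateRangeFilter extends DateRangeFilter
{
}

class ActiveDateRangeFilter extends DateRangeFilter
{
}

// Hypothetical helper: one arm per date type, no break statements needed.
// Requires PHP 8.0+ (match expressions and throw-as-expression).
function buildDateRangeFilter(string $dateType, $from, $to): DateRangeFilter
{
    return match ($dateType) {
        'started' => new StartDateRangeFilter($from, $to),
        'ended' => new EndDateRangeFilter($from, $to),
        'created' => new CreatedDateRangeFilter($from, $to),
        'active' => new ActiveDateRangeFilter($from, $to),
        default => throw new Exception(
            "Invalid dateType, valid types are 'started', 'ended', 'created' and 'active'"
        ),
    };
}

$filter = buildDateRangeFilter('ended', '2020-01-01', '2020-01-31');
echo get_class($filter), PHP_EOL; // EndDateRangeFilter
```

As with the switch version, the from and to values are read once and passed in, so no getter is called more than once.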

it’s the little things

When in a rush to deploy new applications, people rarely consider the impact of working on additional little features. I’m not talking about fixing bugs or highlighting browser incompatibilities, I’m talking about little additional extras that normal people don’t always notice, but technically savvy people will.

It’s the little things in an application that make the difference between being quite good, and being brilliant.


In an interactive web application, consider adding context menus to interface components. Being able to right-click on something and perform an action that would otherwise require a trip to a new page is a little thing, but it is a useful, time-saving, bandwidth-reducing addition to your application.


If you have paginated results limited to a specific number per page, say 20, why not check first whether the full list is only marginally over that number before sending back only 20 items?
If there are 20 items on page 1 and there are 2 pages, the user may be disappointed when they click the ‘next page’ button and only 1 more result comes back.
If there are only marginally more results than the pagination is limited to, just show them all and hide the pagination features.
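A minimal sketch of that check, assuming the full result set is already in an array. The `paginate()` function, its return keys and the `$tolerance` parameter are made up for illustration; in a real application the same comparison would more likely be made against a row count from the database.

```php
<?php
/**
 * Return the items for one page. If the whole list is only slightly longer
 * than a page, return everything and tell the caller to hide the pager.
 * $tolerance (hypothetical) is how many extra items we are willing to show.
 */
function paginate(array $items, int $page, int $perPage = 20, int $tolerance = 3): array
{
    $total = count($items);

    // Only marginally over the limit: show the lot on a single page.
    if ($total <= $perPage + $tolerance) {
        return ['items' => $items, 'pages' => 1, 'showPager' => false];
    }

    $pages = (int) ceil($total / $perPage);
    $page = max(1, min($page, $pages)); // clamp out-of-range page numbers

    return [
        'items' => array_slice($items, ($page - 1) * $perPage, $perPage),
        'pages' => $pages,
        'showPager' => true,
    ];
}

$result = paginate(range(1, 21), 1);
echo count($result['items']), PHP_EOL; // 21
```

This only covers the single-page case described above; a short final page in a longer list is left as it is.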


Make use of HTML5 form features – use the new types!
Instead of using `<input type="text">` for a phone number, you should be using `<input type="tel">`.
On mobile devices, this restricts the available keys on the displayed keyboard to only those used in phone numbers, which nudges the user towards valid input (the `tel` type does not check the value by itself), and also makes the user feel like they might actually be using a modern website.
There are more in-depth validation options you can use here; there is a great article on the subject: Making Forms Fabulous with HTML5.


Make sure any part of your site that submits personal information, like a name, an email address, and especially login details, always submits this information using HTTPS.
Actually this isn’t a little thing, it is quite a big thing, but I mention it here anyway because it’s good to remind developers as often as possible that they really should be doing this.


Make sure your site has a `favicon` 🙂 Remember that any user that opens more than a few tabs may only be able to see the favicons and not the page names in their tabs list. A picture paints a thousand words. Adding this small image creates a strong bond between the colours and shapes of the image, and your website. This helps users identify which website is yours in their tabs list, and hopefully in their list of bookmarks too.


If you have several simple pages on your site that users visit regularly, consider combining them into a single page. Even if this means adding another page, it might just cut out some unnecessary traffic.
Maybe show a portion of each page instead, so users can see part of the pages at a glance, and can still click through to them to see more detail.


Use hover/popup/help text. If you have an abbreviation on a page, and your user doesn’t know what it means, they might feel a little frustrated that you are expecting them to know already. Don’t force them to Google it – use the `<abbr>` tag. When it isn’t an abbreviation but a link displayed only as an icon, make sure you use the `title` attribute of the image, along with `alt` text for screen readers – otherwise your users may not even know what the button does!
In fact this is a generally recommended accessibility rule anyway – those with impaired vision may rely on other, less common mechanisms to interpret your website, and these devices can’t be expected to describe what your icons look like if you haven’t bothered to give them any description.


Happy tinkering 🙂

computers don’t read comments

This is a draft specification for a self-defining database schema using a row-oriented database, such as MySQL. I reserve the right to update this document at any time 🙂

This post is basically a brain dump to try to nail down the naming convention which should be used in databases to make the names easily parseable and readable in a processing environment, with as little configuration, and as little operational limitation as possible. I plan to update this document as I use this specification, as I fully expect it to evolve over time.

table/entity naming convention

  • Table names must exactly match the entity class names in our application. All entity names are suffixed with ‘Entity’, as are our table names.
  • Table names should be shorter than 32 chars for readability, and cannot exceed the 64-char limit MySQL places on table names, including the length of `Entity` – so bear this in mind when naming your entities.
  • Table names must begin with a capital letter, not a number, and contain only the characters `[A-Za-z0-9]`.
  • Special characters and spaces are prohibited.
abcdefghijklmnopqrstuvwxyzabcdef
#                              ^ 32 chars (quite long)         v 64 chars (silly)
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl

Example valid table names:

`FooEntity`, `Bar123Entity`, `ALongerNameEntity`, `ANameWithAnAcronymABCDEntity`
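The rules above are simple enough to check mechanically. The following is a rough sketch only; the function names `isValidTableName()` and `isReadableTableName()` are invented for this example and are not part of the specification.

```php
<?php
/**
 * Hard rules: starts with a capital letter, contains only [A-Za-z0-9],
 * ends with the 'Entity' suffix, and is at most 64 chars long in total.
 */
function isValidTableName(string $name): bool
{
    return strlen($name) <= 64
        && preg_match('/^[A-Z][A-Za-z0-9]*Entity$/D', $name) === 1;
}

/**
 * Soft rule: valid, and shorter than 32 chars for readability.
 */
function isReadableTableName(string $name): bool
{
    return isValidTableName($name) && strlen($name) < 32;
}

var_dump(isValidTableName('Bar123Entity')); // bool(true)
var_dump(isValidTableName('foo_entity'));   // bool(false)
```

Note that the bare suffix `Entity` on its own is rejected by this pattern, on the assumption that an entity needs a name in front of the suffix.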

We use a common suffix on our entity names and tables for the following reasons:

  • to easily map our tables to entities
  • to easily map our entities to tables
  • to prevent using reserved names in PHP
  • to prevent using reserved names in MySQL
  • to easily identify when we are working with an Entity in our application

field/property naming convention

Field names must contain all the description required to properly submit or read the data from the database, or to determine the proper relationship with another table.

`<prefix><CamelCaseName>`

prefix:
`e` = enum (always stored as an int in the database, never use database `enum` type)
`s` = string (varchar, tinytext, text, mediumtext, longtext, etc.)
`i` = int (tinyint, smallint, mediumint, int, bigint, etc.)
`f` = float (float, double, single, etc.)
`b` = bool (`tinyint(1)` = `1` or `0`)
`o` = object (stored as an int in the database, contains ID of the foreign key)
`r` = raw binary (usually a blob)
`d` = datetime (should be used for date, datetime, time, and timestamp types)
`x` = internal, reserved, used for mapping

Prefixes are compulsory. The length and format of a value are never checked: it is assumed the database will accept any length output by the system, and the application should fail gracefully when a limit is exceeded.
Column names are case-sensitive – the CamelCaseName is just a name, and has no processed meaning.

One exception to the prefix rule – the `id` field is always present in every table, and is used to uniquely reference each row. This is also recorded as the `id` field in the entity class.
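To show how little configuration a processing layer would need, here is a sketch of splitting a field name into its type and its name. The `parseFieldName()` function and the `FIELD_PREFIXES` constant are hypothetical, and the sketch assumes the part after the prefix always starts with a capital letter, as in the examples in this post.

```php
<?php
// Prefix letters from the list above, mapped to a type label.
const FIELD_PREFIXES = [
    'e' => 'enum',
    's' => 'string',
    'i' => 'int',
    'f' => 'float',
    'b' => 'bool',
    'o' => 'object',
    'r' => 'raw',
    'd' => 'datetime',
    'x' => 'internal',
];

/**
 * Split a field name such as 'sFirstName' into its type and bare name.
 * The 'id' field is the one exception that carries no prefix.
 */
function parseFieldName(string $field): array
{
    if ($field === 'id') {
        return ['type' => 'int', 'name' => 'id'];
    }

    $matched = preg_match('/^([a-z])([A-Z][A-Za-z0-9]*)$/D', $field, $m) === 1;
    if (!$matched || !array_key_exists($m[1], FIELD_PREFIXES)) {
        throw new InvalidArgumentException("Invalid field name '$field'");
    }

    return ['type' => FIELD_PREFIXES[$m[1]], 'name' => $m[2]];
}

$parsed = parseFieldName('dCreated');
echo $parsed['type'], ' ', $parsed['name'], PHP_EOL; // datetime Created
```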

If the entity is referred to by other entities and you need the reverse relationship to be populated automatically into an array, then the property needs to be an `array` within the entity definition and an `int` within the table definition, while still following the `o` (object) naming convention above.

<?php
class OrderEntity
{
    // Maps the $oPerson property to the entity it refers to
    const XDBMAP_OPERSON = 'PersonEntity';

    /** @var PersonEntity */
    public $oPerson;
}
class PersonEntity
{
    const XDBMAP_OHOUSEHOLD = 'HouseholdEntity';
    const XDBMAP_OMANAGER = 'PersonEntity';
    // Square brackets mark a reverse relationship, populated as an array
    const XDBMAP_OSTAFF = '[PersonEntity.oManager]';

    /** @var HouseholdEntity */
    public $oHousehold;

    /** @var PersonEntity */
    public $oManager;

    /** @var PersonEntity[] */
    public $oStaff = [];
}
class HouseholdEntity
{
    const XDBMAP_OOCCUPANTS = '[PersonEntity.oHousehold]';

    /** @var PersonEntity[] */
    public $oOccupants = [];
}
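How a mapper reads these constants is not defined in this draft, so the following is only one possible reading: look up the `XDBMAP_` constant named after the property, and treat a value wrapped in square brackets as the reverse side of a relationship. The `resolveMapping()` function and the shape of its return value are assumptions made for this sketch.

```php
<?php
class PersonEntity
{
    const XDBMAP_OMANAGER = 'PersonEntity';
    const XDBMAP_OSTAFF = '[PersonEntity.oManager]';

    /** @var PersonEntity */
    public $oManager;

    /** @var PersonEntity[] */
    public $oStaff = [];
}

/**
 * Hypothetical helper: describe the relationship behind an `o` property
 * by reading the matching XDBMAP_ constant from the entity class.
 */
function resolveMapping(string $entityClass, string $property): array
{
    $target = constant($entityClass . '::XDBMAP_' . strtoupper($property));

    // '[Entity.field]' means: collect every Entity whose field points at us.
    if (preg_match('/^\[(\w+)\.(\w+)\]$/D', $target, $m) === 1) {
        return ['kind' => 'reverse', 'entity' => $m[1], 'field' => $m[2]];
    }

    // Otherwise the property holds the ID of a single related entity.
    return ['kind' => 'direct', 'entity' => $target, 'field' => $property];
}

echo resolveMapping('PersonEntity', 'oStaff')['kind'], PHP_EOL;   // reverse
echo resolveMapping('PersonEntity', 'oManager')['kind'], PHP_EOL; // direct
```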

enums

ENUMS within the database are prohibited. They unnecessarily complicate matters, especially when it comes to modifying the enum values.

Indexed lookups, referential integrity, etc. may be used – the application has no awareness of them.

Fields whose values are defined by CONST values held in the Entity class definition should use the `e` prefix shown above, and should not be linked to a foreign table. The principle is that a list should only live in the database when adding to or removing from it takes effect in the application without a code change. This also stands up to database normalisation rules – the data should not be duplicated, i.e. it should not be stored as a CONST value AND be defined in the database.

<?php
class FooEntity
{
    // Machine values, stored in the eServiceType column
    const
        ESERVICETYPE_FOO = 1,
        ESERVICETYPE_BAR = 2,
        ESERVICETYPE_FAR = 3;

    // Display values, keyed by machine value (never stored)
    const
        ESERVICETYPE__1 = 'I am Foo',
        ESERVICETYPE__2 = 'We are Bar',
        ESERVICETYPE__3 = 'You are Far';

    /** @var int */
    public $id;

    /** @var int */
    public $eServiceType = self::ESERVICETYPE_FOO;
}

The example above describes a field in the database being represented as an integer – there is no functionality to be gained from having the field name stored in a separate table in the database, if modifying any of the 3 values above would have no effect without a code change.
Note the naming convention here – the const names above exactly match the variable name, in uppercase only. The underscore separates the name from the machine value, and the const is defined in a block to indicate to the developer that these options apply to the same field.
Note the additional const names – these names are for display purposes only, and describe the machine value stored in the field. So the internal value of `FooEntity::ESERVICETYPE_FOO` is 1, and the human-readable version of that is ‘I am Foo’, used for display purposes only, and never stored – this value might be subject to change, based on your application.
If more complex enums are required, such as the storage of string-value enums, then a separate table and object class should be used instead. The above configuration is intended to suit most use cases, but will not suit all.

Consts with a double-underscore separating the name from the value are used for display purposes, e.g. `ESERVICETYPE__3` is a const field that contains the display value for `ESERVICETYPE_FAR`, which has a const value of 3. This enables easy lookup of the display value given the stored value. Now you are starting to see the importance of this field always being numeric.

Furthermore, the const display fields are not compulsory, but their existence will not be checked prior to their usage – so if you do use the fields for display purposes, make sure you have the const values properly configured. EAFP – It is easier to ask for forgiveness than permission.
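As a rough sketch of that lookup, the display constant's name can be built from the property name and the stored value, then read with PHP's `constant()` function. The `displayValue()` helper is hypothetical. In keeping with EAFP, it does not check that the constant exists first; on PHP 8 a missing display constant makes `constant()` throw an `Error`.

```php
<?php
class FooEntity
{
    const
        ESERVICETYPE_FOO = 1,
        ESERVICETYPE_BAR = 2,
        ESERVICETYPE_FAR = 3;

    const
        ESERVICETYPE__1 = 'I am Foo',
        ESERVICETYPE__2 = 'We are Bar',
        ESERVICETYPE__3 = 'You are Far';

    /** @var int */
    public $eServiceType = self::ESERVICETYPE_FOO;
}

/**
 * Hypothetical helper: turn the stored machine value of an enum property
 * into its display value, e.g. eServiceType = 3 -> ESERVICETYPE__3.
 */
function displayValue($entity, string $property): string
{
    $constName = get_class($entity) . '::' . strtoupper($property) . '__' . $entity->$property;

    return constant($constName);
}

$foo = new FooEntity();
$foo->eServiceType = FooEntity::ESERVICETYPE_FAR;
echo displayValue($foo, 'eServiceType'), PHP_EOL; // You are Far
```

Because the machine value is an integer, it can be appended straight onto the constant name, which is why string-valued enums do not fit this scheme.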

nulls

Whether any field in the table can be stored as `null` is entirely down to you – the application will not care, and will fail with a handled error if an attempt is made to store a null value in a non-null field.

relationships

If the field type is an object (prefixed with `o`), then the field name should be suffixed with the name of the database table to which it refers, separated with an underscore.

default values

Default values should be specified in the entity class as defaults for the class properties; in the example above, `self::ESERVICETYPE_FOO` is the default value for `$eServiceType`.
If default values are required that cannot be expressed as property defaults, like `new DateTime()`, then these should be set in the entity `__construct()` method.
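A short sketch of both kinds of default side by side. The `$dCreated` field is invented for this example; it is not one of the fields defined earlier in this post.

```php
<?php
class FooEntity
{
    const ESERVICETYPE_FOO = 1;

    /** @var int */
    public $eServiceType = self::ESERVICETYPE_FOO; // constant default: set inline

    /** @var DateTime */
    public $dCreated; // hypothetical field, cannot be given an inline default

    public function __construct()
    {
        // Property defaults must be constant expressions, so objects are created here.
        $this->dCreated = new DateTime();
    }
}

$foo = new FooEntity();
echo $foo->eServiceType, PHP_EOL; // 1
```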