code smell – objects vs arrays

“Code Smell” is a term that I have heard used in the past, and regularly use myself to describe code that may work fine, but just doesn’t seem as good or clean as it could be. Maybe there’s some repeated code, or unnecessary over-use of strings. Basically the code smells bad.

Use objects instead of associative arrays for working with collections of values.

Here’s why…

In Javascript, it is now commonplace to see a single object passed to a method as a collection of parameters.
This is for code flexibility, so you can call the method with as many or as few values in the object as you like, and you can build the list of values to be passed to a method over a larger area of code. It also means you can simply forward the parameters object to another method, perhaps modifying some of the parameters between calls.

So the community has accepted this is convenient way to pass parameters around – I’m not saying this should be done everywhere, in fact this should only be done when it smells right to do so… Currently one commonly used option is to use associative arrays to accomplish this. Here are some unexpected drawbacks:

  • `$arr[‘foo’]` can’t be described using `/** @var */` (or PHPStorm doesn’t support it).
  • `$arr[‘SomeVariebleName’]` won’t show the typo when you meant `$arr[‘SomeVariableName’]`
  • `$arr[‘bar’]` might be not set, and you’d have to use `isset()` or similar function to determine if it has been set.
  • `$arr[‘foo’]` is slower than `$arr->foo`. Not by a lot, but it is.
  • finding all the places in your code that `$arr[‘foo’]` is used when your array keeps getting passed around, is hard work.
  • you have no dedicated place to put functions for interacting with this array of parameters.

To fix some of these issues you might convert your array to an object using a cast:

$arr = ['foo'=>'a', 'bar'=>'b']; 
$object = json_decode(json_encode($arr), FALSE);
//or 
$object = (object) $arr;

However, you still have no dedicated space to describe the object – `$arr->foo` still has no built-in PHPDoc, no default value, and if you are likely to use the same object in multiple classes, you end up reproducing blocks of code, which also smells bad.

Here are some solutions…

class HelloWorldParams
{
    /** @var int */
    public $foo = 1;

    /** @var string */
    public $bar = 'foo';

    /** @var string */
    public $ran = 'bar';
}

$f = new HelloWorldParams();
$f->foo = 1;
$f->bar = 'Hello';
$f->ran = 'World!';

So here we’ve completely replaced using the array with using a custom object, in the same way that we would use a STRUCT in C. No methods, just a class that contains a bunch of properties. Using this object makes reading the code easier, tracing code easier, and refactoring code easier. And all it takes is a tiny little class that can be re-used elsewhere in your project.

What about when we have an array and we need to convert it?

Good question. We can add a constructor method to our class to make this easier:

class HelloWorldParams
{
    /** @var int */
    public $foo = 1;
    
    /** @var string */
    public $bar = 'foo';
    
    /** @var string */
    public $ran = 'bar';
    
    /**
     * HelloWorldParams constructor.
     *
     * @param mixed[] $arr
     */
    public function __construct(array $arr=[])
    {
        foreach($arr as $k => $v) {
            $this->$k = $v
        }
    }
}

$arr = [
    'foo' => 1,
    'bar' => 'Hello',
    'ran' => 'World!'
];
$f = new HelloWorldParams($arr);

This technique demonstrates replacing an associative array with an object, to improve performance, maintainability, readability and to reduce code duplication. That is all.

Some folks might say

you should always have setters and getters to access object properties

but these people are wrong. KISS!!

There is a time and a place for getters and setters (or accessors and mutators if you’re in academia) and this is not it. You can add them if you like, but IMHO you’re just adding unnecessary code. Remember that you’ve already improved your code with this simple step above, there’s no need to go overboard.

This is a simple class, and if you want to add complex functionality like input validation (which is one of the main reasons for using setters), then you’re better off using a validation class that validates the input data, and a hydrator class that populates the object, which is beyond the scope of this blog entry.


It is possible to reduce the code a little using PHPDoc instead of explicit properties – I advise against this as it reduces code clarity, but I’ll describe what I mean here anyway, so I can show you a slightly more unusual PHPDoc use-case:

/**
 * @property int $foo
 * @property string $bar
 * @property string $ran
 */
class HelloWorldParams{}
$f = new HelloWorldParams();

So this happens in PHPStorm:

PHPStorm recognises @property values

So the only practical difference between this and the previous code above, is that we can’t use default values for the properties unless we add them as real properties of the class. Also, parsing annotations in this class won’t work as effectively as it would for the previous example, because we are describing properties within the class docblock, instead of using dedicated docblocks for each property.

(A docblock is a `/** … */` , not to be confused with a multi-line comment `/* … */`.)


All the examples above allow you to add any property to the class, so by default it doesn’t prevent you from setting a property to the object that it doesn’t expect to have.
PHPStorm will warn you that you’re using an unexpected field, but it will still let you set the property:

PHPStorm Field not found
PHPStorm field declared dynamically

So here’s one way to prevent custom dynamic fields being set in your object, that still retains most of the above functionality:

class HelloWorldParams
{
    public $foo;
    public $bar;
    public $ran;
    /**
     * @param string $name
     * @param mixed $value
     */
    public function __set($name, $value)
    {
        if(!property_exists($this, $name)) {
            throw new InvalidArgumentException(
                "Property '$name' not present for object " . __CLASS__
            );
        }
    }
}
$f = new HelloWorldParams();
$f->abcd = "bar";

We’re using the `__set()` magic method to throw an exception if we try to set a value to a property that doesn’t exist. Simple.

primary path code style

This is a quick breakdown of a simple technique I use when laying out code within methods.

Basically it refers back to a common rule used when drawing logic flow diagrams, that the primary path in a logic flow diagram will follow the central vertical line from top to bottom.

This indicates that any non-standard route will follow a logic path that deviates from the central vertical line, and perhaps rejoins it further on.

Working on this idea, code methods and logic flow diagrams should be written in a similar fashion – if there are fewer lines in a logic flow diagram, then it is a simpler diagram. A line in a flow diagram basically translates to a code block in code, so the fewer nested blocks there are, the fewer lines in the diagram there would be.

If two flows in a diagram share roughly equal likelihood, then this rule has less significance, as mutually exclusive nested blocks must exist within the code to represent this. Consider whether the method could be split out into separate methods if this is the case.

As personal preference, I try not to nest code more than 3 levels deeper than the initial depth of the method. (This will start at 2 levels deep for a class method, one deep for a function.)

Here’s an example of some code:

public function doSomething()
{
    $a = funcA();
    $b = funcB();
    if(null !== $a) {
        funcC();
        funcD();
        //more code goes here
    } else {
        throw new Exception("a should not be null");
    }
}

This code demonstrates a problem I see regularly, and it follows a pattern that breaks the primary path code style. In this example the primary path only executes `funcA()` and `funcB()` and then deviates into two parallel paths, instead of suggesting which path is most likely. Normally having this separation would be fine, because it is entirely possible that the two paths (the if/else blocks) are equally likely – but in this case the `else` case only exists to throw an `Exception` if the `if` statement is not met.

This breaks our rule, as an exception should only be thrown in cases of exceptional circumstances, hence the name. The only caveat to this is if the `doSomething()` method serves the primary purpose of throwing an exception in the first place, which in this case it does not.

This code can be simplified following the primary path code style as follows:

public function doSomething()
{
    $a = funcA();
    $b = funcB();
    if(null === $a) {
        throw new Exception("a should not be null");
    }
    funcC();
    funcD();
    //more code goes here
}

Here we can see the primary path now says that `funcC()` and `funcD()` are part of the primary path, and has a greater likelihood if being executed than throwing the Exception does.
You should also note that this version of the code takes up less space on disk, as there is reduced indenting, and the `} else {` line is no longer required.

To a further extreme, I have seen code that looks like this, but with very many more lines:

public function doSomething()
{
    $a = funcA();
    $b = funcB();
    if(null !== $a) {
        funcC();
        $d = funcD();
        if(null !== $d) {
            $e = funcE();
            $f = funcF();
            if(null !== $f) {
                //further horrible indented code
            } else {
                throw new FIsNullException("f is null, this should never happpen");
            }
        } else {
            throw new DIsNullException("d should not be null");
        }
        //more code goes here
    } else {
        throw new AIsNullException("a should not be null");
    }
}

You can see how this pattern-ignoring coding has lead to much unnecessary indenting, and generally more difficult to read code. Here it is simplified:

public function doSomething()
{
    $a = funcA();
    $b = funcB();
    if(null === $a) {
        throw new AIsNullException("a should not be null");
    }
    funcC();
    $d = funcD();
    if(null === $d) {
        throw new DIsNullException("d should not be null");
    }
    $e = funcE();
    $f = funcF();
    if(null === $f) {
        throw new FIsNullException("f is null, this should never happpen");
    }
    //further primary path code
    //more code goes here
}

This code has **exactly** the same functionality, but now follows the pattern, and you can easily see the code that follows the primary path, and you’ve saved some disk space too.


If you think about how `if` statements are actually processed and executed by the CPU, `else` statements don’t really exist in assembly in the same way they do in abstracted languages like PHP – in assembly, an `if` statement is akin to a `JMP` to jump to a different instruction block depending on the outcome of a comparison statement, (==0 `JZ`, >=n `JGE`, <n `JL`, etc.)[1]. If the comparison doesn’t match, the jump doesn’t happen, and the execution point moves to the next instruction. It is therefore more processor efficient (albeit slightly) to have fewer ‘if’ statements that match the statement, as this means fewer jumps, and fewer instruction blocks.
Whether this performance gain translates upwards into highly abstract languages such as PHP is not likely to be noticed, or even detectable, but it’s always good to try to consider the instructions you are sending to the CPU once in a while, and ask yourself, is there a more logical way to walk through this code?

the importance of being logged

This is a helpful reminder to everyone (me included), to always add logging to the system you’re working on.

If a code block is exiting abnormally, or results are not as expected, this obviously needs to be handled in the code, but is often the result of an external problem, and so should be recorded to a log file. In most circumstances this only involves adding one line of code. All logging facilities use different levels of logging output to differentiate the severity of the message, and to disable the more verbose levels in production systems, to save processing, disk space and I/O bandwidth.

Throwing a generic Access Denied Exception can be backed up with a more useful logging message, for analysis:

} catch (Exception $e) {
    $this->container->get('logger')->error(
        "User does not have access to mid $mid"
    );
}

Exceptions should only be thrown under exceptional circumstances, hence  the name. This situation is worthy of being logged.
Instead of this:

} catch (Exception $e) {}

We should be doing this:

} catch (Exception $e) {
    $this->container->get('logger')->error(
        'Exception caught when fetching access list: " . $e->getMessage()
    );
}

Logging is a very useful development aid too, but consider not removing all your debugging output when your code is all working and polished.

I’ve used logging numerous times in the past to diagnose when a system is experiencing Fatal errors, beit from Syntax to SegFaults – using logging on a line-by-line basis, outputting simply the __CLASS__ name and __LINE__ number, and flooding files with these statements is sometimes the only way to track down the fault.

So if the message you’re logging contains no static parts, remember that it will be harder to find in the code later on! So include something like the current class name, or use the special built-in magic constants that PHP has to offer:

} catch (Exception $e) {
    $this->container->get('logger')->error(
        __METHOD__ . '#' . __LINE__ . ' : ' . $e->getMessage()
    );
}