Perl Coding Standards v1

Coding Standards provided by Design Develop Test Ltd - a Singl.eView and Perl Consultancy company based in the UK

Introduction

Welcome to the generic coding standards document for Singl.eView Developers. Normally coding standards documents are dull and boring. I will try to make this document interesting, informative and more importantly fun.

Why do we need a coding standard?

Good question. Some companies don’t have one or use the standard document that comes with Singl.eView and the majority of the time it gets ignored anyway*.

Let's get to the point though; as developers we tend to code in our own style just like we have different laughs or voices. It is yet another thing that separates us from ‘normal’ people.

There is nothing wrong with being unique. However, when we group together in a Project Team, creating and maintaining lots of code, this combination of individual styles starts to work against us: the code becomes messy and disjointed even if it is syntactically correct.

The compiler does not care, so why should we? Well, I am sure we have all been in a situation when we have had to look into a function or a script and tried to read it but found the code was ‘all over the place’. We have had to read through lines of code to try and reverse engineer what it does. Only then, after all that time would we know if we need to change the code or go and decipher another function.

Let me put it another way. “The best applications are coded properly”. The definition of ‘properly’ in this context does not just mean that the code performs its task correctly and has great error handling; but it is also “maintainable code”.

* If you work for a company that follows its coding standard then I salute you.

Maintainable Code...

Maintainable Code is code that has been formatted and commented for the benefit of the human reader. Remember the compiler does not care but as biological entities and especially as developers, we love pattern and order. It is easy to apply this to code, which makes it easier for the maintenance developer to understand, navigate and update code.

To achieve the nirvana of maintainable code though, we need to all be clear what the requirements are for ‘Maintainable Code’ and this is what the Coding Standard tries to achieve (we got there eventually).

Production Ready Code…

Another term we will use in this document is ‘Production Ready’ code. This does not mean code that has been tested and the quality of the code has been given the all clear to go into Production. In fact it does not relate at all to the level of bugs lurking within the code.

The definition of ‘Production Ready’ is that every effort has been made to write efficient code. If you are a junior developer reading this then you may not know what we mean or how to achieve this. Thankfully, in this document we attempt to teach you some tips and techniques that make any future performance tuning task as difficult as possible.

Understanding what is efficient code and getting into the habit of writing it from the start means that the structure of your code could follow a very different path to what you would have originally written. In the event that your script or function does get passed to a Performance expert for tuning, then giving them code that has already had the simple things done will make their life more difficult (and rewarding).

General Formatting

Indentation

Indentation is an excellent way to show the depth of code, especially when you have lots of nested code blocks. Being able to scroll across many lines of code and clearly identify the individual code blocks makes it easier to see where a process enters and exits, and this speeds up debugging.

Therefore all code within a code block must be indented by one level (4 spaces, see Tabs vs. Spaces) on top of its parent indentation.

Here is an example of good and bad indentation defined by this document:

Example of bad formatting


# Check that the Name and Age have been identified
if( (defined($zSomeName)) && (defined($zSomeAge)) ) {
    # Are they under 18?
    if($zSomeAge < 18){
    print("I am sorry $zSomeName, you are not old enough to apply.\n");
    } elsif($zSomeName eq $ADMIN_NAME) {
    print("Welcome back Dave, would you like to play a game of chess?\n");
    } else {
print("Welcome $zSomeName\n");}
} 
                        

Example of good formatting


# Check that the Name and Age have been identified
if( (defined($zSomeName)) && (defined($zSomeAge)) )
{
    # Are they under 18?
    if($zSomeAge < 18)
    {
        # Oops, they are not 18 yet..
        print("I am sorry $zSomeName, you are not old enough to apply.\n");
    }
    elsif($zSomeName eq $ADMIN_NAME)
    {
        # Admin is back, give them a 2001 style greeting
        print("Welcome back Dave, would you like to play a game of chess?\n");
    }
    else
    {
        # Welcome the user
        print("Welcome $zSomeName\n");
    }
}
                        

Comments & Descriptions

Comments and Descriptions in general are one of the most useful ways to get up to speed with something. Alas, they are always the one thing that gets the least amount of time invested by the developer. I have seen an amazing number of entities with a description value the same as the Entity name. What use is a description if it is not telling you anything? It would be like me writing the title above and not adding this text.

Populating Description Fields and leaving Comments in code does not take that long, and the information for the descriptions can be sourced from the Detailed Design document (DLD). Investing that little extra development effort in good comments and descriptions will save 10 times that amount of effort in maintenance.

There is a saying that most of you will know:

“A picture paints a thousand words”

The second part of that is:

“A thousand words paint a million pictures”

This saying is stating that if you have an Entity or Diagram in your DLD, then this should provide the majority of the information needed. But if you have only words then this could provide the reader with a million interpretations.

What am I saying? Which one is correct? I know that I have chucked Designs and Diagrams into the mix, but this is an effort to get DLDs to a higher standard as well.

Entities on their own would provide the description, but this is an interpreted description which takes time and input to build. If you provide a good description or comments in your code, then the two complement each other and provide the maintenance developer with a quick and correct interpretation of your code (or intention if there is a defect in the code).

Code Blocks

Curly Brackets are always an interesting and heated subject matter in software development. The majority of coding books will show curly brackets starting on the end of the previous line like so:


# Check that the Name and Age have been identified
if( (defined($zSomeName)) && (defined($zSomeAge)) ) {
    # Are they under 18?
    if($zSomeAge < 18) {
        # Oops, they are not 18 yet..
        print("I am sorry $zSomeName, you are not old enough to apply.\n");
    }
    elsif($zSomeName eq $ADMIN_NAME) {
        # Admin is back, give them a 2001 style greeting
        print("Welcome back Dave, would you like to play a game of chess?\n");
    }
    else {
        # Welcome the user
        print("Welcome $zSomeName\n");
    }
}
                        

However this could be an attempt to save paper. The code that we produce as developers needs to be easy to read and we must see a pattern. Therefore, curly brackets should be put onto a new line to clearly define the start and end of a code block:


# Check that the Name and Age have been identified
if( (defined($zSomeName)) && (defined($zSomeAge)) )
{
    # Are they under 18?
    if($zSomeAge < 18)
    {
        # Oops, they are not 18 yet..
        print("I am sorry $zSomeName, you are not old enough to apply.\n");
    }
    elsif($zSomeName eq $ADMIN_NAME)
    {
        # Admin is back, give them a 2001 style greeting
        print("Welcome back Dave, would you like to play a game of chess?\n");
    }
    else
    {
        # Welcome the user
        print("Welcome $zSomeName\n");
    }
}
                        

Tabs vs. Spaces

Both give you an indentation, so does it matter? Well, actually it can, especially when they are mixed together. The width of a tab indent is displayed differently depending on the editor you are using and how it is configured.

Code in VIM with the Print Statement indented with a tab and the other lines with 4 spaces.

[Image: a tab appears wider in VIM than in some other editors such as Eclipse]

Same code in Eclipse, the tab is interpreted as 4 spaces and as such, you cannot see the difference.

[Image: 4 spaces and a tab look the same in Eclipse]

To ensure that the code looks clean in any editor, only spaces should be used and a single indent is comprised of 4 spaces. The following diagram shows the setting that should be changed in Eclipse:

Lee Clifford

Naming Conventions

All variables should be well named and descriptive of the contents that are assigned to them.

Global Constants

Global Constants (Scalar, Array and Hash) should all be in capital letters. To make the naming clearer, words should be split by an underscore.

Where possible, Entity Ids such as Reference Types should be named after the Entity, with the Entity Type included in the name (for example, the RT in $CUST_STATUS_RT for a Reference Type).


Example of good naming:

my $BUSINESS_TYPE = 12345;

my $CUST_STATUS_RT = 54321;


Example of bad naming:

my $BUSINESSTYPE = 12345;

my $referencetype = 54321;

Global Variables

Global Variables (Scalar, Array and Hash) should be prefixed with a lowercase z, with each word within the variable name starting with a capital letter.


Example of good naming:

my $zTaskId;

my $zEffectiveDate;


Example of bad naming:

my $task_id;

my $EffectiveDate;

Local Variables

Local Variables (Scalar, Array and Hash) should be prefixed with a lowercase l, with each word within the variable name starting with a capital letter.

However when defining a local variable, ask yourself the following questions:

Do I need to assign this value to a variable or can I reference it?

Does the variable need to store the value or can I store a reference to this value?


Example of good naming:

my $lCounter;

my $lCustFetchSQL;


Example of bad naming:

my $counter;

my $SQL;

Sub Routines

Where relevant, a Sub Routine/Method name should be given one of the following prefixes to aid in the identification of its function:


Prefix     Description/Purpose
get        To fetch and return one or more data values
set        To write (internally or externally) a group of data items
process    To process a set of data values
request    To request input via STDIN
validate   To validate a group of data items (typically returning a Boolean representing validity)

Sub Routines/Methods should be given meaningful names, not contain underscores and be written in Camel Case. Sub Routines should not be called using the ‘&’ unless absolutely required (when assigning to EXPORT for instance) and should always be called with parentheses even if nothing is being passed.


Example of good naming:

sub processCustomerList

sub validateArguments


Example of bad naming:

sub customer_list

sub process_command_line
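
As a minimal sketch of the calling rules above: a sub routine is called by name, with parentheses, and without the ‘&’ prefix. The body of validateArguments and the argument values are hypothetical, made up only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical validation sub routine; the rule it checks is an
# assumption made for this example only.
sub validateArguments
{
    my @lArguments = @_;

    # Valid when at least one argument was supplied
    return (scalar(@lArguments) > 0) ? 1 : 0;
}

# Called by name, with parentheses, even when nothing is passed
my $lWithArgs    = validateArguments('-u', 'some_user');
my $lWithoutArgs = validateArguments();

print("With arguments: $lWithArgs, without arguments: $lWithoutArgs\n");
```

One reason to avoid the ‘&’ form is that a call written as &validateArguments; (no parentheses) silently passes the caller's current @_ to the sub routine, which is rarely what the maintenance developer expects.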

Working with the Database

Connection

If you are connecting to the main database then use the core sub-routine ataiDbOpen(). Pass in $opt_u as an argument and call zdie when handling an error:

$zdb = ataiDbOpen($opt_u) || zdie($MSG_PASSTHRU, $errstr);

Once the connection is established, it is recommended to turn off auto-commit. This will force you as the developer to call $zdb->commit() when you have updated or inserted data and are ready to commit the changes:

$zdb->{AutoCommit} = 0;

Here is a complete example of a manual commit.

$zdb->commit() || zdie($MSG_PASSTHRU, $zdb->errstr);

Ensure that disconnect() is called when your program exits, either planned or in error.

$zdb->disconnect();

Fetching Data

SQL Text should be assigned and stored in a variable before passing to prepare(). That way, the SQL can be printed within debug mode.

If you are planning on executing a query multiple times, then the SQL query should be prepared and stored in a globally accessible variable or, if local to a sub-routine, the prepared SQL should be stored within a local variable outside of the loop. Put more simply, an SQL Statement should only call prepare() once.

If there are values that need to be entered into the SQL and the value is either not known at the time of preparing or varies on each execution, then use bind_param() and not a '?' within the SQL.

Adding bind values into a prepared SQL Statement should be done before the execute() method is called.

There are various methods for fetching the data from the cursor. Use the one you feel provides the most efficient solution. For information on how to extract data from the Database, please refer to the DBI documentation in our External Links menu.

The use of finish() is only required if you are not fetching all the data from Oracle. Most of the time, the rule is if you are not fetching the data in a while loop, then call finish().
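
The "prepare once, bind before execute" rule can be sketched as below. So that the sketch runs without a database, $zdb here is a hypothetical stand-in object (StubDbh) that only counts calls to prepare(); it is not DBI and not ataiDbOpen(). The table and column names, and the use of a named placeholder (:customer_id), are also assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a database handle and statement handle.
# They mimic only the method names used below and do no real work.
package StubStatement;
sub new        { my ($lClass, $lSql) = @_; return bless({ sql => $lSql, bound => {} }, $lClass); }
sub bind_param { my ($lSelf, $lName, $lValue) = @_; $lSelf->{bound}->{$lName} = $lValue; return 1; }
sub execute    { return 1; }

package StubDbh;
sub new     { my $lClass = shift; return bless({ prepareCalls => 0 }, $lClass); }
sub prepare { my ($lSelf, $lSql) = @_; $lSelf->{prepareCalls}++; return StubStatement->new($lSql); }

package main;

my $zdb = StubDbh->new();

# SQL text is stored in a variable first so it can be printed in debug mode
my $lCustFetchSQL = <<'END_SQL';
    SELECT customer_name
      FROM customer
     WHERE customer_id = :customer_id
END_SQL

# Prepare once, outside of the loop
my $lCursor = $zdb->prepare($lCustFetchSQL);

foreach my $lCustomerId (101, 102, 103)
{
    # Bind the value before execute() is called
    $lCursor->bind_param(':customer_id', $lCustomerId);
    $lCursor->execute();
}

print("prepare() was called $zdb->{prepareCalls} time(s)\n");
```

With a real connection the shape of the loop stays the same; only the handle returned by the connection call replaces the stub.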

Updating Data

If you have to update or insert data via SQL in a Perl Script, then use do() to perform the change. Only use prepare(), bind_param(), execute() when needing to perform high-volume changes.

It is preferred that changes to the database are made via TRE to ensure that no data integrity issues are introduced.


# Define the SQL for the Insert
my $lInsertSQL = <<END_SQL;
    INSERT INTO reference_code
               (REFERENCE_TYPE_ID,
                REFERENCE_CODE,
                LAST_MODIFIED,
                CODE_LABEL,
                ABBREVIATION,
                DESCRIPTION,
                VALID_IND_CODE)
        VALUES ($lReferenceTypeCode,
                $lReferenceCode,
                SYSDATE,
                '$lCodeLabel',
                '$lAbbreviation',
                'Created by $MODULE_NAME',
                1)
END_SQL

# Execute the SQL
# Note: AutoCommit is off, need to ensure we commit at the end
$zdb->do($lInsertSQL) || zdie($MSG_PASSTHRU, $zdb->errstr());

...

# Commit
$zdb->commit() || zdie($MSG_PASSTHRU, $zdb->errstr());
                        

Conditions and Assignments

String Assignments

All Strings should be assigned to variables using single quotes. If a string contains a variable or a line feed then this can be done with double quotes. The reason for using single quotes where possible is to improve performance in the assignment.


Example of good string assignments:

my $lMessage = 'Script has completed' . "\n";

my $lMessage = "$MODULE_NAME has completed\n";

my $lMessage = 'Some Message Here';

my $lMessage = "Some Message Here\n";


Example of bad string assignments:

my $lMessage = "Some Message Here";


If you are assigning a large, multi-line string, then the double arrow (<<) here-document syntax must be used. You can treat this like double quotes, so special characters need to be escaped and variables are interpolated on assignment:

Example of good string assignments:


my $lSQL = <<EOL;
        SELECT COUNT(*)
          FROM charge
         WHERE tariff_id  = 12345 
           AND service_id = 54321
EOL
                        

Example of bad string assignments:

my $lSQL = "SELECT COUNT(*) FROM charge WHERE tariff_id = 12345 AND service_id = 54321";
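
To illustrate how the double arrow behaves like quotes, here is a small sketch. The variable $lTariffId is introduced only for this example. Perl also accepts a single-quoted terminator, which switches interpolation off in the same way single quotes do for ordinary strings.

```perl
use strict;
use warnings;

my $lTariffId = 12345;

# Unquoted terminator: behaves like double quotes, variables are interpolated
my $lInterpolatedSQL = <<EOL;
    SELECT COUNT(*)
      FROM charge
     WHERE tariff_id = $lTariffId
EOL

# Single-quoted terminator: the text is taken literally
my $lLiteralSQL = <<'EOL';
    SELECT COUNT(*)
      FROM charge
     WHERE tariff_id = $lTariffId
EOL

print($lInterpolatedSQL);
print($lLiteralSQL);
```

The first string contains the value 12345, while the second still contains the text $lTariffId.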

Boolean Conditions

Each individual Boolean condition should be wrapped in parentheses ‘()’. For an if/elsif with multiple Boolean conditions, each one should be wrapped in parentheses and all of them collectively should be wrapped in a single pair of parentheses. One or more spaces should be placed between the parentheses so the eye can clearly see them.

Example:


# Comment on what this condition is doing
if( ($lStatus        == $ACTIVE)                 && 
    ($lProductTypeId == $ALLOWANCE_PRODUCT_TYPE)  )
{
    # The answer is 42...
    ...
}
elsif($lStatus == $SUSPENDED)
{
    # Comment on why this else if condition is needed
    ...
}
else
{
    # Comment what we are doing in the else condition
    ...
}

                        

In-Line If Conditions

If you have an If condition that has a single statement in it, then you should use an In-line If condition:

Before:


# Is this the Admin User?
if($lName eq $ADMIN)
{
    # Assign Admin Access for Admin User
    $lAdminAccess = 1;
}
                        

After:


# Assign Admin Access for Admin User
$lAdminAccess = 1 if($lName eq $ADMIN);
                        

In-Line If/Else Conditions

Like the in-line if statement, if you have an If/Else code block both assigning different values to the same variable, then you can use an in-line if/else statement like this:

Before:


# Is this the Admin User?
if($lName eq $ADMIN)
{
    # Assign Admin Access for Admin User
    $lAdminAccess = 1;
}
else
{
    # Ensure Admin Access is off
    $lAdminAccess = 0;
}
                        

After:


# Assign Admin Access if the user is Admin,
# otherwise ensure it is deactivated
$lAdminAccess = ($lName eq $ADMIN) ? 1 : 0;
                        

Numbers

Arithmetic Calculations

Arithmetic calculations follow an order of precedence when being interpreted by the compiler. Here is a simple quiz: calculate the value below.

5 + 5 x 5 = ?

What answer did you get? If you got 50, then I am afraid you are wrong. If you got the correct answer (30), then how do you know that this is what the original developer intended? The expression as written evaluates to 30, but what if the intended answer was 50?

Each individual calculation should be wrapped in parentheses. The compiler then evaluates from the deepest parentheses outwards. You are required to do this to override the order of precedence, but it should actually be done all the time to ensure that the maintenance developer correctly interprets the calculation.

Following the order of precedence to get 30

(5 + (5 * 5) )

Overriding the order of precedence to get 50

( (5 + 5) * 5)
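
The quiz can be checked directly in Perl. This short sketch (variable names are chosen for the example only) evaluates the bare expression and both parenthesised forms:

```perl
use strict;
use warnings;

# No parentheses: multiplication is evaluated before addition
my $lDefaultOrder = 5 + 5 * 5;

# Parentheses that make the default order explicit
my $lExplicitOrder = (5 + (5 * 5));

# Parentheses that override the default order
my $lOverriddenOrder = ((5 + 5) * 5);

print("Default: $lDefaultOrder, explicit: $lExplicitOrder, overridden: $lOverriddenOrder\n");
```

The first two values are both 30 and the third is 50; only the parenthesised forms tell the reader which one was intended.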

Passing Numbers to TRE

In some cases (mainly when calling an EPM function via TRE) you may get errors when your Scalar contains a number, but the TRE connection sees it as a string. This is a pain and unfortunately the ataiTRE module does not handle this for you. So to ensure your scalar is seen as a number, please multiply it by 1.

$lNumber = ($lNumber * 1);
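
The effect of the multiplication can be observed without TRE. In the sketch below the core JSON::PP module is used purely as a stand-in for a consumer that distinguishes strings from numbers; how ataiTRE inspects a scalar is not shown here and is assumed to be similar.

```perl
use strict;
use warnings;
use JSON::PP;

my $lEncoder = JSON::PP->new();

# A value read from a file or the command line typically arrives as a string
my $lNumber = '42';
my $lBefore = $lEncoder->encode([$lNumber]);

# Multiply by 1 so that the scalar holds a numeric value
$lNumber = ($lNumber * 1);
my $lAfter = $lEncoder->encode([$lNumber]);

print("Before: $lBefore\n");
print("After:  $lAfter\n");
```

Before the multiplication the value is encoded with quotes (as a string); afterwards it is encoded without quotes (as a number).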

Objects, References and Modules

Object Orientated Programming

Perl is both a Procedural and OO language and, although you may get stick from other development teams, it is a powerful tool. You get the strengths and weaknesses of each at the same time.

Even if you are new to developing Perl in Singl.eView, you will have used OOP when calling the database. The prepare() is not actually a sub-routine, it is a method. $zdb contains a reference to a DBI Object that is connected to the Oracle Instance and all this is handled within ataiUtils.pm which you will have in your script.

If you are writing an OOP Module then please refer to your sub-routines using the following terminology:

Class Method - A sub-routine that is exposed in the export statement and is called as a standard sub-routine. Typically these methods are called to create (bless) the object into existence; however, it is feasible that you may want to expose a procedural function within the same module.

Method - A sub-routine that takes in the object reference as the first argument (called $lSelf). This sub-routine is then called using the arrows within the script:

$lSomeObject->someMethod();

Private Method - A sub-routine that works like a method, but is not possible to call as a method. You may want to use Private Methods to contain reusable code within the object but you don't want it to be exposed or called by a script. To ensure that the object reference is clearly labelled, it should be stored in a variable called $lThis.
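
The three kinds of sub-routine can be sketched in one small module. TaskCounter and all of its sub-routine names are hypothetical, and the package is written inline so the sketch runs as a single file; in a real module the Class Method would be exposed through the export statement rather than called with the package name. Storing the Private Method in a lexical code reference is one possible way to stop it being called as a method, not the only one.

```perl
use strict;
use warnings;

package TaskCounter;

# Private Method: held in a lexical code reference, so it cannot be
# called with the arrow syntax on the object. The object reference
# is received as $lThis.
my $lAddToCount = sub
{
    my $lThis   = shift;
    my $lAmount = shift;

    $lThis->{count} = ($lThis->{count} + $lAmount);
    return $lThis->{count};
};

# Class Method: called as a standard sub-routine to create (bless) the object
sub createTaskCounter
{
    my $lSelf = { count => 0 };
    return bless($lSelf, 'TaskCounter');
}

# Method: takes the object reference as its first argument ($lSelf)
sub increment
{
    my $lSelf = shift;
    return $lAddToCount->($lSelf, 1);
}

# Method: returns the current count
sub getCount
{
    my $lSelf = shift;
    return $lSelf->{count};
}

package main;

my $lTaskCounter = TaskCounter::createTaskCounter();
$lTaskCounter->increment();
$lTaskCounter->increment();

print('Count is ' . $lTaskCounter->getCount() . "\n");
```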

Using References

When you are processing large data items such as input files or datasets and you need to work on this data in multiple locations within your script, then this data should ideally be passed by reference. Data will be stored and modified in a single location rather than being copied from variable to variable improving memory efficiency.

References should be linked using the -> arrow in Perl to clearly show the de-referencing:


$lSomeHash{$lSomeKey}->{$lSomeOtherKey} = $lSomeValue;

...

if($lSomeHash{$lSomeKey}->{$lSomeOtherKey} eq $SPECIAL)
{
    ...
}
                        

Before:


doStuff(@lLotsOfData);

...

sub doStuff
{
    my @lData = @_;
    
    # Loop over the dataset and process the customer records
    foreach(@lData)
    {
        # Process Request
        processData($_);
        
        ...
    }
} 
                        

After:


doStuff(\@lLotsOfData);

...

sub doStuff
{
    my $lLargeArrayRef = shift;
    
    # Dereference the Array Reference and loop through it
    foreach(@{ $lLargeArrayRef })
    {
        # Process Request
        processData($_);
        
        ...
    }
} 
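
As a small runnable sketch of "stored and modified in a single location": when an array is passed by reference, changes made inside the sub-routine are visible to the caller because no copy is taken. The sub-routine name and the charge values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sub-routine that doubles every charge in place
sub processCharges
{
    my $lChargesRef = shift;

    # Each $lCharge is an alias to an element of the caller's array
    foreach my $lCharge (@{ $lChargesRef })
    {
        $lCharge = ($lCharge * 2);
    }

    return;
}

my @lCharges = (1.5, 2, 4);

# Pass a reference, not a copy of the data
processCharges(\@lCharges);

print("Charges are now: @lCharges\n");
```

After the call the caller's array holds 3, 4 and 8, even though the sub-routine returned nothing.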
                        

Homebrew Modules

Modules are a great place to store functionality that you will use within many scripts.

Modules don't just need to contain functionality though; they can also store Global Constants, which can be grouped so you only import the specific sets relevant to your script. For instance, you could have a group of Global Constants that contains Date Format strings like the following:


my $STD_DATETIME_FORMAT = 'dd-mm-yyyy hh24:mi:ss';
my $STD_DATE_FORMAT     = 'dd-mm-yyyy';
my $STD_TIME_FORMAT     = 'hh24:mi:ss';
                        

You could also add some validation sub routines to this group because, if we are processing Dates as strings, it is most likely that we will want to validate them at some point.
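
One way to build such a group is with the standard Exporter module and its export tags. The sketch below is an assumption about how this could be laid out, not a description of an existing module: DateFormats, the :dates tag and validateDateFormat are hypothetical names, and the module is written inline (hence the BEGIN blocks) so the example runs as a single file. Note that the constants are declared with our rather than my, because Exporter can only share package variables.

```perl
use strict;
use warnings;

# Hypothetical module, shown inline so the sketch runs as one file.
# Normally this would live in its own file, e.g. DateFormats.pm.
{
    package DateFormats;

    use Exporter 'import';

    our ($STD_DATETIME_FORMAT, $STD_DATE_FORMAT, $STD_TIME_FORMAT);
    our (@EXPORT_OK, %EXPORT_TAGS);

    # BEGIN is only needed because the module is inline in this file
    BEGIN
    {
        $STD_DATETIME_FORMAT = 'dd-mm-yyyy hh24:mi:ss';
        $STD_DATE_FORMAT     = 'dd-mm-yyyy';
        $STD_TIME_FORMAT     = 'hh24:mi:ss';

        # One named group holding the constants and their validation sub routine
        %EXPORT_TAGS = ('dates' => [qw($STD_DATETIME_FORMAT
                                       $STD_DATE_FORMAT
                                       $STD_TIME_FORMAT
                                       validateDateFormat)]);
        @EXPORT_OK   = @{ $EXPORT_TAGS{'dates'} };
    }

    # Returns 1 when the format string is one of the standard formats
    sub validateDateFormat
    {
        my $lFormat = shift;

        return ( ($lFormat eq $STD_DATETIME_FORMAT) ||
                 ($lFormat eq $STD_DATE_FORMAT)     ||
                 ($lFormat eq $STD_TIME_FORMAT) ) ? 1 : 0;
    }
}

package main;

# Equivalent to "use DateFormats qw(:dates);" for a module in its own file
BEGIN { DateFormats->import(':dates'); }

print("Date format: $STD_DATE_FORMAT\n");
print('Known format? ' . validateDateFormat('hh24:mi:ss') . "\n");
```

A script that only needs the date group imports :dates and nothing else; further groups can be added to %EXPORT_TAGS as the module grows.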

The use of modules is encouraged and where possible make sure that they are generic and reusable. There are of course exceptions to this rule where putting code into a Module may aid in a more maintainable script.

A Code Review should assess whether reusable code has, where time allows, been broken out into a module.

Outside of this coding standard, it is encouraged (if you have not done so already) to build and maintain a module for functions that are useful but not very collective (e.g. ataiUtilities.pm).

If you are creating a reusable module then ensure that the functionality you are producing is also generic and not too specific to the project that you are delivering the module under. Also ensure that the module is fully unit tested; you don't want your module to fail the first time it is used by a new script.

Skeleton Sub Routine of Validation Function


sub validateDateString
{
    my $lDateString = shift;
    my $lFormat     = shift;
    my $lReturn     = 0;
    
    if($lFormat eq $STD_DATETIME_FORMAT)
    {
        # Validate the string contains a date and time
        ...
    }
    elsif($lFormat eq $STD_DATE_FORMAT)
    {
        # Validate the string contains a date only
        ...
    }
    elsif($lFormat eq $STD_TIME_FORMAT)
    {
        # Validate the string contains a time only
        ...
    }
    else
    {
        # Date Format not known - record error and return 0
        ...
    }
    
    ...
}
                        

CPAN Modules

One of the great things about Perl is CPAN. There are a lot of modules out there that can save you development time if someone has already published a CPAN module that does what you need to achieve.

If you want/need to deploy a CPAN Module then first check with your ES and UNIX Team and try to get this added to the environment as soon as possible to avoid any issues later in the Development process.

Arrays and Hashes

Array Index Value

The value used to determine an array value should not contain any variable modification.

Example of what to do:


my $lName = $lArray[0];
my $lName = $lArray[$NAME_POS];
my $lName = $lArray[$i];
my $lName = $lArray[($i - 1)];
                        

Example of what not to do:


my $lName = $lArray[$i++];
                        

Array Index Cards - Performance

This is basically a set of comments that show what is in an Array (or Hash) to aid in development and also on-going maintenance of the script.

You might just call this a comment or by another name which is fine, at DDT we call it an Index Card.

If you have an Array Index Card in place, then assignment to a ‘well named variable’ can be skipped in order to maximise performance.

Example:

An array being iterated through, printing the name in the first position of the nested array.


# Loop over the 2 Dimensional array, first dimension contains the rows,
# the second contains the values within the record
foreach(@lSomeArray)
{
    # Assign the Array Ref to local variable
    my $lRecord = $_;
    
    # Extract the Name
    my $lName = $lRecord->[0];
    
    print("Processing the Customer $lName\n");
}
                        

Example:

The use of an Array Index Card and how to do the print statement without the need to create variables.


# Loop over the 2 Dimensional array, first dimension contains the rows,
# the second contains the values within the record
foreach(@lSomeArray)
{
    # POS     VALUE
    #   0     Name
    #   1     Age
    #   2     Gender
    #   3     Date of Birth (dd-mm-yyyy)
    
    print("Processing the Customer $_->[0]\n");
}
                        

Creating a Hash with values

When creating a Hash and providing values into the Hash, then this should be done at the same time and it should be neat and easy to read. A Hash Index card should also be provided even on a complex Hash:

Example:


# Define a Global Hash to decode Days of Week
# into the number (1-7)
# {DAY} => Number
my %WEEK_DAY_NUMBER = ('Monday'    => 1,
                       'Tuesday'   => 2,
                       'Wednesday' => 3,
                       'Thursday'  => 4,
                       'Friday'    => 5,
                       'Saturday'  => 6,
                       'Sunday'    => 7);
                        

The same process should be used for more complex Hashes. Here is an example of a Hash being defined that will take in data from an SQL result set and load it into a complex hash so as to aggregate the data:

Example:


{
    ...
    
    my %lAggregatedData;
    
    ...
    
    # Loop through the result set and process and aggregate the data
    while( (my @lDataset) = $lCursor->fetchrow_array())
    {
        # 0 = Event Type
        # 1 = Event Sub Type
        # 2 = Charge
        
        # The data will map into the hash as follows:
        # {Event Type} => {Event Sub Type} => Sum of all 'Charge' values
        
        if(! exists($lAggregatedData{$lDataset[0]}->{$lDataset[1]}))
        {
            # Ensure the Key is defined within the hash and set the value to 0
            # Note: doing this in traditional IF condition because an inline
            #       would be quite long.
            $lAggregatedData{$lDataset[0]}->{$lDataset[1]} = 0;
        }
        
        # Sum up the Charge
        $lAggregatedData{$lDataset[0]}->{$lDataset[1]} += $lDataset[2];
        
        ...
        
    }
    
    ...
    
}
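
For readers who want to try the aggregation without a database, here is a self-contained sketch of the same idea. The rows in @lRows are made-up data standing in for the SQL result set, and the event type names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for the SQL result set
# POS     VALUE
#   0     Event Type
#   1     Event Sub Type
#   2     Charge
my @lRows = (['VOICE', 'LOCAL',    1.50],
             ['VOICE', 'LOCAL',    2.25],
             ['VOICE', 'NATIONAL', 4.00],
             ['DATA',  'ROAMING',  0.75]);

# {Event Type} => {Event Sub Type} => Sum of all 'Charge' values
my %lAggregatedData;

foreach my $lRow (@lRows)
{
    if(! exists($lAggregatedData{$lRow->[0]}->{$lRow->[1]}))
    {
        # Ensure the Key is defined within the hash and set the value to 0
        $lAggregatedData{$lRow->[0]}->{$lRow->[1]} = 0;
    }

    # Sum up the Charge
    $lAggregatedData{$lRow->[0]}->{$lRow->[1]} += $lRow->[2];
}

# Print the totals in a stable order
foreach my $lEventType (sort(keys(%lAggregatedData)))
{
    foreach my $lSubType (sort(keys(%{ $lAggregatedData{$lEventType} })))
    {
        print("$lEventType / $lSubType = $lAggregatedData{$lEventType}->{$lSubType}\n");
    }
}
```

The two VOICE / LOCAL rows are summed into a single entry, while the other combinations each keep their own total.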