Tuesday, April 19, 2016

The importance of cohesion

"Come on, come on, let's stick together" - Bryan Ferry

There's more to PL/SQL programs than packages, but most of our code will live in packages. The PL/SQL Reference offers the following benefits of organising our code into packages:

Modularity - we encapsulate logically related components into an easy to understand structure.

Easier Application Design - we can start with the interface in the package specification and code the implementation later.

Hidden Implementation Details - the package body is private so we can prevent application users having direct access to certain functionality.

Added Functionality - we can share the state of Package public variables and cursors for the life of a session.

Better Performance - Oracle Database loads the whole package into memory the first time you invoke a package subprogram, which makes subsequent invocations of any other subprogram quicker. Also packages prevent cascading dependencies and unnecessary recompilation.

Grants - we can grant permission on a single package instead of a whole bunch of objects.

However, we can only realise these benefits if the packaged components belong together: in other words, if our package is cohesive.  

The ever reliable Wikipedia defines cohesion like this: "the degree to which the elements of a module belong together"; in other words how it's a measure of the strength of the relationship between components. It's common to think of cohesion as a binary state - either a package is cohesive or it isn't - but actually it's a spectrum. (Perhaps computer science should use  "cohesiveness" which is more expressi but cohesion it is.)

Cohesion

Cohesion owes its origin as a Comp Sci term to Stevens, Myers, and Constantine.  Back in the Seventies they used the terms "module" and "processing elements", but we're discussing PL/SQL so let's use Package and Procedure instead. They defined seven levels of cohesion, with each level being better - more usefully cohesive - than its predecessor.

Coincidental

The package comprises an arbitrary selection of procedures and functions which are not related in any way. This obviously seems like a daft thing to do, but most packages with "Utility" in their name fall into this category.

Logical

The package contains procedures which all belong to the same logical class of functions. For instance, we might have a package to collect all the procedures which act as endpoints for REST Data Services.

Temporal

The package consists of procedures which are executed at the same system event. So we might have a package of procedures executed when a user logs on - authentication, auditing, session initialisation - and similar package for tidying up when the user logs off. Other than the triggering event the packaged functions are unrelated to each other.

Procedural

The package consists of procedures which are executed as part of the same business event. For instance, in an auction application there are a set of actions to follow whenever a bid is made: compare to asking price, evaluate against existing maximum bid, update lot's status, update bidder's history, send an email to the bidder, send an email to the user who's been outbid, etc.

Communicational

The package contains procedures which share common inputs or outputs. For example a payroll package may have procedures to calculate base salary, overtime, sick pay, commission, bonuses and produce the overall remuneration for an employee.

Sequential

The package comprises procedures which are executed as a chain, so that the output of one procedure becomes the input for another procedure. A classic example of this is an ETL package with procedures for loading data into a staging area, validating and transforming the data, and then loading records into the target table(s).

Functional

The package comprises procedures which are focused on a single task. Not only are all the procedures strongly related to each other but they are fitted to user roles too. So procedures for power users are in a separate package from procedures for normal users. The Oracle built-in packages for Advanced Queuing are a good model of Functional cohesion.

How cohesive is cohesive enough?

The grades of cohesion, with Coincidental as the worst and Functional as the best, are guidelines. Not every package needs to have Functional cohesion. In a software architecture we will have modules at different levels. The higher modules will tend to be composed of calls to lower level modules. The low level modules are the concrete implementations and they should aspire to Sequential or Functional cohesion.

The higher level modules can be organised to other levels. For instance we might want to build packages around user roles - Sales, Production, HR, IT - because Procedural cohesion makes it easier for the UI teams to develop screens, especially if they need to skin them for various different technologies (desktop, web, mobile). Likewise we wouldn't want to have Temporally cohesive packages with concrete code for managing user logon or logoff. But there is a value in organising a package which bundles up all the low level calls into a single abstract call for use in schema level AFTER LOGON triggers.    

Cohesion is not an easily evaluated condition. We need cohesion with a purpose, a reason to stick those procedures together. It's not enough to say "this package is cohesive". We must take into consideration how cohesive the package needs to be: how will it be used? what is its relationships with the other packages?

Applying design principles such as Single Responsibility, Common Reuse, Common Closure and Interface Segregation can help us to build cohesive packages. Getting the balance right requires an understanding of the purpose of the package and its place within the overall software architecture.  

Part of the Designing PL/SQL Programs series

Sunday, April 03, 2016

Working with the Interface Segregation Principle

Obviously Interface Segregation is crucial for implementing restricted access. For any given set of data there are three broad categories of access:

  • reporting 
  • manipulation 
  • administration and governance 

So we need to define at least one interface - packages - for each category in order that we can grant the appropriate access to different groups of users: read-only users, regular users, power users.

But there's more to Interface Segregation. This example is based on a procedure posted on a programming forum. Its purpose is to maintain medical records relating to a patient's drug treatments. The procedure has some business logic (which I've redacted) but its overall structure is defined by the split between the Verification task and the De-verification task, and flow is controlled by the value of the p_verify_mode parameter.
 
procedure rx_verification
     (p_drh_id in number,
       p_patient_name in varchar2,
       p_verify_mode in varchar2)
as
    new_rxh_id number;
    rxh_count number;
    rxl_count number;
    drh_rec drug_admin_history%rowtype;
begin
    select * into drh_rec ....;
    select count(*) into rxh_count ....;

    if p_verify_mode = 'VERIFY' then

        update drug_admin_history ....;
        if drh_rec.pp_id <> 0 then
            update patient_prescription ....;
        end if;
        if rxh_count = 0 then
            insert into prescription_header ....;
        else
            select rxh_id into new_rxh_id ....;
        end if;
        insert into prescription_line ....;
        if drh_rec.threshhold > 0
            insert into prescription_line ....;
        end if;

    elsif p_verify_mode = 'DEVERIFY' then

        update drug_admin_history ....;
        if drh_rec.pp_id <> 0 then
            update patient_prescription ....;
        end if;
        select rxl_rxh_id into new_rxh_id ....;
        delete prescription_line ....;
        delete prescription_header ....;

    end if;
end;

Does this procedure have a Single Responsibility?  Hmmm. It conforms to Common Reuse - users who can verify can also de-verify. It doesn't break Common Closure, because both tasks work with the same tables. But there is a nagging doubt. It appears to be doing two things: Verification and De-verification.

So, how does this does this procedure work as an interface? There is a definite problem when it comes to calling the procedure: how do I as a developer know what value to pass to p_verify_mode?

  rx_management.rx_verification
     (p_drh_id => 1234,
       p_patient_name => 'John Yaya',
       p_verify_mode => ???);

The only way to know is to inspect the source code of the procedure. That breaks the Information Hiding principle, and it might not be viable (if the procedure is owned by a different schema). Clearly the interface could benefit from a redesign. One approach would be to declare constants for the acceptable values; while we're at it, why not define a PL/SQL subtype for verification mode and tweak the procedure's signature to make it clear that's what's expected:        


create or replace package rx_management is
 
  subtype verification_mode_subt is varchar2(10);
  c_verify constant verification_mode_subt := 'VERIFY'; 
  c_deverify constant verification_mode_subt := 'DEVERIFY'; 
 
  procedure rx_verification
     (p_drh_id in number,
       p_patient_name in varchar2,
       p_verify_mode in verification_mode_subt);

end rx_management;

Nevertheless it is still possible for a caller program to pass a wrong value:


  rx_management.rx_verification
     (p_drh_id => 1234,
       p_patient_name => 'John Yaya',
       p_verify_mode => 'Verify');

What happens then? Literally nothing. The value drops through the control structure without satisfying any condition. It's an unsatisfactory outcome. We could change the implementation of rx_verification() to validate the parameter value and raise and exception. Or we could add an ELSE branch and raise an exception. But those are runtime exceptions. It would be better to mistake-proof the interface so that it is not possible to pass an invalid value in the first place.

Which leads us to to a Segregated Interface :

create or replace package rx_management is
 
  procedure rx_verification
     (p_drh_id in number,
       p_patient_name in varchar2);
 
  procedure rx_deverification
     (p_drh_id in number);
     
end rx_management;

Suddenly it becomes clear that the original procedure was poorly named (I call rx_verification() to issue an RX de-verification?!)  We have two procedures but their usage is now straightforward and the signatures are cleaner (the p_patient_name is only used in the Verification branch so there's no need to pass it when issuing a De-verification).

Summary

Interface Segregation creates simpler and safer controls but more of them. This is a general effect of the Information Hiding principle. It is a trade-off. We need to be sensible. Also, this is not a proscription against flags. There will always be times when we need to pass instructions to called procedures to modify their behaviour. In those cases it is important that the interface includes a definition of acceptable values.

Part of the Designing PL/SQL Programs series

Three more principles

Here are some more principles which can help us design better programs. These principles aren't part of an organized theory, and they're aren't particularly related to any programming paradigm. But each is part of the canon, and each is about the relationship between a program's interface and its implementation.

The Principle Of Least Astonishment

Also known as the Principle of Least Surprise, the rule is simple: programs should do what we expect them to do. This is more than simply honouring the contract of the interface. It means complying with accepted conventions of our programming. In PL/SQL programming there is a convention that functions are read-only, or at least do not change database state. Another such convention is that low-level routines do not execute COMMIT statements; transaction management is the prerogative of the program at the top of the call stack, which may be interacting directly with a user or may be an autonomous batch process.

Perhaps the most common flouting of the Principle Of Least Astonishment is this:

   exception
      when others then
      null; 

It is reasonable to expect that a program will hurl an exception if something as gone awry. Unfortunately, we are not as astonished as we should be when we find a procedure with an exception handle which swallows any and every exception.

Information Hiding Principle 

Another venerable principle, this one was expounded by David Parnas in 1972. It requires that a calling program should not need to know anything about the implementation of a called program. The definition of the interface should be sufficient. It is the cornerstone of black-box programming. The virtue of Information Hiding is that knowledge of internal details inevitably leads to coupling between the called and calling routines: when we change the called program we need to change the caller too. We honour this principle any time we call a procedure in a package owned by another schema, because the EXECUTE privilege grants visibility of the package specification (the interface) but not the body (the implementation).

The Law Of Leaky Abstractions

Joel Spolsky coined this one: "All non-trivial abstractions, to some degree, are leaky." No matter how hard we try, some details of the implementation of a called program will be exposed to the calling programming, and will need to be acknowledged. Let's consider this interface again:

    function get_employee_recs
        ( p_deptno in number ) 
        return emp_refcursor;

We know it returns a result set of employee records. But in what order? Sorting by EMPNO would be pretty useless, given that it is a surrogate key (and hence without meaning). Other candidates - HIREDATE, SAL - will be helpful for some cases and irrelevant for others. One approach is to always return an unsorted set and leave it to the caller to sort the results; but it is usually more efficient to sort records in a query rather than a collection. Another approach would be to write several functions - get_employee_recs_sorted_hiredate(), get_employee_recs_sorted_sal() - but that leads to a bloated interface which is hard to understand. Tricky.

Conclusion

Principles are guidelines. There are tensions between them. Good design is a matter of trade-offs. We cannot blindly follow Information Hiding and ignore the Leaky Abstractions. We need to exercise our professional judgement (which is a good thing).

Part of the Designing PL/SQL Programs series

It's all about the interface

When we talk about program design we're mainly talking about interface design. The interface is the part of our program that the users interact with. Normally discussion of UI focuses on GUI or UX, that is, the interface with the end user of our application.

But developers are users too.

Another developer writing a program which calls a routine in my program is a user of my code (and, I must remember, six months after I last touched the program, I am that other developer). A well-designed interface is frictionless: it can be slotted into a calling program without too much effort. A poor interface breaks the flow: it takes time and thought to figure it out. In the worst case we have to scramble around in the documentation or the source code.

Formally, an interface is the mechanism which allows the environment (the user or agent) to interact with the system (the program). What the system actually does is the implementation: the interface provides access to the implementation without the environment needing to understand the details. In PL/SQL programs the implementation will usually contain a hefty chunk of SQL. The interface mediates access to data.

An interface is a contract. It specifies what the caller must do and what the called program will do in return. Take this example:

function get_employee_recs
     ( p_deptno in number )
     return emp_refcursor;

The contract says, if the calling program passes a valid DEPTNO the function will return records for all the employees in that department, as a strongly-typed ref cursor. Unfortunately the contract doesn't say what will happen if the calling program passes an invalid DEPTNO. Does the function return an empty set or throw an exception? The short answer is we can't tell. We must rely on convention or the document, which is an unfortunate gap in the PL/SQL language; the Java keyword throws is quite neat in this respect.

The interface is here to help

The interface presents an implementation of business logic. The interface is a curated interpretation, and doesn't enable unfettered access. Rather, a well-designed interface helps a developer use the business logic in a sensible fashion. Dan Lockton calls this Design With Intent: Good design expresses how a product should be used. It doesn't have to be complicated. We can use simple control mechanisms which to help other developers use our code properly.

Restriction of access

Simply, the interface restricts access to certain functions or denies it altogether. Only certain users are allowed to view salaries, and even fewer to modify them. The interface to Employee records should separate salary functions from more widely-available functions. Access restriction can be implemented in a hard fashion, using architectural constructs (views, packages, schemas) or in a soft fashion (using VPD or Data Vault). The hard approach benefits from clarity, the soft approach offers flexibility.

Forcing functions

If certain things must be done in a specific order then the interface should only offer a method which enforces the correct order. For instance, if we need to insert records into a parent table and a child table in the same transaction (perhaps a super-type/sub-type implementation of a foreign key arc) a helpful interface will only expose a procedure which inserts both records in the correct order.

Mistake-proofing

A well-design interface prevents its users from making obvious mistakes. The signature of a procedure should be clear and unambiguous. Naming is important. If a parameter presents a table attribute the parameter name should echo the column name: p_empno is better than p_id. Default values for parameters should lead developers to sensible and safe choices. If several parameters have default values they must play nicely together: accepting all the defaults should not generate an error condition.

Abstraction

Abstraction is just another word for interface. It allows us to focus on the details of our own code without need to understand the concrete details of the other code we depend upon. That's why good interfaces are the key to managing large codebases.

Part of the Designing PL/SQL Programs series