Wednesday, March 16, 2016

Designing PL/SQL Programs: Series home page

Designing PL/SQL Programs is a succession of articles which I am publishing in a nonlinear fashion. Eventually it will evolve into a coherent series. In the meantime this page serves as a map and navigation aid. I will add articles to it as and when I publish them.

Introduction

Designing PL/SQL Programs
It's all about the interface

Principles and Patterns

Introducing the SOLID principles
Introducing the RCCASS principles
Three more principles
The Single Responsibility Principle
The Dependency Inversion Principle: a practical example
Working with the Interface Segregation Principle

Software Architecture

The importance of cohesion
Utilities - the Coincidental Cohesion anti-pattern
Avoiding Coincidental Cohesion

Interface design

Data Access Layer versus Table APIs
The use and misuse of %TYPE and %ROWTYPE attributes in PL/SQL APIs

Tools and Techniques

The Dependency Inversion Principle: a practical example

These design principles may seem rather academic, so let's look at a real-life demonstration of how applying the Dependency Inversion Principle led to an improved software design.

Here is a simplified version of an ETL framework which uses SQL Types in a similar fashion to the approach described in an earlier blog post of mine. The loading process is defined using an abstract non-instantiable Type like this:
create or replace type load_t force as object
    ( txn_date date
      , tgt_name varchar2(30)
      , member function load return number
      , final member function get_tgt return varchar2
      )
not final not instantiable;
/

create or replace type body load_t as
    member function load return number
    is
    begin
        return 0;
    end load;
    final member function get_tgt return varchar2
    is
    begin
        return self.tgt_name;
    end get_tgt;
end;
/


The concrete behaviour for each target table in the ABC feed is defined by sub-types like this:
create or replace type load_tgt1_t under load_t
    ( overriding member function load return number
        , constructor function load_tgt1_t
            (self in out nocopy load_tgt1_t
             , txn_date date)
           return self as result
      )
;
/
create or replace type body load_tgt1_t as
    overriding member function load return number
    is
    begin
        insert into tgt1 (col1, col2)
        select to_number(col_a), col_b
        from stg_abc stg
        where stg.txn_date = self.txn_date;
        return sql%rowcount;
    end load;
    constructor function load_tgt1_t
            (self in out nocopy load_tgt1_t
             , txn_date date)
           return self as result
    is
    begin
        self.txn_date := txn_date;
        self.tgt_name := 'TGT1';
        return;
    end load_tgt1_t;
end;
/
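The load_abc_from_stg() procedure further down also instantiates a LOAD_TGT2_T sub-type, which isn't shown. Here is a minimal sketch of what it might look like, following the same pattern as LOAD_TGT1_T; the TGT2 table and its single column are assumptions for illustration only.

```sql
-- Hypothetical second target for the ABC feed.
-- TGT2 and its column COL1 are assumed for illustration.
create or replace type load_tgt2_t under load_t
    ( overriding member function load return number
      , constructor function load_tgt2_t
            (self in out nocopy load_tgt2_t
             , txn_date date)
          return self as result
    )
;
/
create or replace type body load_tgt2_t as
    overriding member function load return number
    is
    begin
        -- Only this target-specific mapping differs between sub-types
        insert into tgt2 (col1)
        select distinct col_b
        from stg_abc stg
        where stg.txn_date = self.txn_date;
        return sql%rowcount;
    end load;
    constructor function load_tgt2_t
            (self in out nocopy load_tgt2_t
             , txn_date date)
          return self as result
    is
    begin
        self.txn_date := txn_date;
        self.tgt_name := 'TGT2';
        return;
    end load_tgt2_t;
end;
/
```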
This approach is neat because ETL is a fairly generic process: the mappings and behaviour for a particular target table are specific but the shape of the loading process is the same for any and all target tables. So we can build a generic PL/SQL procedure to handle them. This simplistic example does some logging, loops through a set of generic objects and, through the magic of polymorphism, calls a generic method which executes specific code for each target table:
    procedure load  
     (p_txn_date in date
        , p_load_set in sys_refcursor)
    is
        type loadset_r is record (
            tgtset load_t
            );
        lrecs loadset_r;
        load_count number;
    begin
        logger.logm('LOAD START::txn_date='||to_char(p_txn_date,'YYYY-MM-DD'));
        loop
            fetch p_load_set into lrecs;
            exit when p_load_set%notfound;
            logger.logm(lrecs.tgtset.get_tgt()||' start');
            load_count := lrecs.tgtset.load();
            logger.logm(lrecs.tgtset.get_tgt()||' loaded='||to_char(load_count));
        end loop;
        logger.logm('LOAD FINISH');
    end load;

So far, so abstract. The catch is the procedure which instantiates the objects:
    procedure load_abc_from_stg  
         (p_txn_date in date)
    is
        rc sys_refcursor;
    begin
        open rc for
            select load_tgt1_t(p_txn_date) from dual union all
            select load_tgt2_t(p_txn_date) from dual;
       load(p_txn_date, rc);
    end load_abc_from_stg;

On casual inspection it doesn't seem problematic but the call to the load() procedure gives the game away. Both procedures are in the same package:
create or replace package loader as
    procedure load 
     (p_txn_date in date
        , p_load_set in sys_refcursor);
    procedure load_abc_from_stg
         (p_txn_date in date);
end loader;
/

So the package mixes generic and concrete functionality. What makes this a problem? After all, it's all ETL, so doesn't the package follow the Single Responsibility Principle? Well, up to a point. But if we want to add a new table to the ABC feed we need to update the LOADER package. Likewise if we want to add a new feed, DEF, we need to update the LOADER package. So it breaks the Stable Abstractions Principle. It also creates dependency problems, because the abstract load() process has dependencies on higher-level modules. We can't deploy the LOADER package without deploying the objects for all the feeds.

Applying the Dependency Inversion Principle

The solution is to extract the load_abc_from_stg() procedure into a concrete package of its own. To make this work we need to improve the interface between the load() procedure and the programs which call it. Both sides of the interface should depend on a shared abstraction.

The LOADER package is now properly generic:
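create or replace package loader
    -- AUTHID belongs in the package header: it cannot be declared
    -- on an individual procedure in a package specification
    authid current_user
as
    type loadset_r is record (
            tgtset load_t
            );
    type loadset_rc is ref cursor return loadset_r;
    procedure load 
        (p_txn_date in date
          , p_load_set in loadset_rc);
end loader;
/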
The loadset_r record type has moved into the package specification, where it defines the return type of the strongly-typed ref cursor loadset_rc. The load() procedure now takes that strongly-typed ref cursor as its parameter.
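The revised LOADER package body isn't shown here. A minimal sketch, assuming it is the original load() procedure with its local record type removed in favour of the types now declared in the specification, and with the cursor closed once it has been exhausted (LOGGER.LOGM is the logging routine used in the original example):

```sql
create or replace package body loader as
    procedure load
        (p_txn_date in date
          , p_load_set in loadset_rc)
    is
        lrecs loadset_r;        -- record type now comes from the specification
        load_count number;
    begin
        logger.logm('LOAD START::txn_date='||to_char(p_txn_date,'YYYY-MM-DD'));
        loop
            fetch p_load_set into lrecs;
            exit when p_load_set%notfound;
            -- Polymorphic call: each sub-type of LOAD_T supplies its own load()
            logger.logm(lrecs.tgtset.get_tgt()||' start');
            load_count := lrecs.tgtset.load();
            logger.logm(lrecs.tgtset.get_tgt()||' loaded='||to_char(load_count));
        end loop;
        -- An IN cursor variable can be fetched from and closed by the callee
        close p_load_set;
        logger.logm('LOAD FINISH');
    end load;
end loader;
/
```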

Similarly the LOADER_ABC package is wholly concrete:
create or replace package loader_abc as
    procedure load_from_stg
            (p_txn_date in date);
end loader_abc;
/

create or replace package body loader_abc as
    procedure load_from_stg
            (p_txn_date in date)
    is
        rc loader.loadset_rc;
    begin
        open rc for
            select load_tgt1_t(p_txn_date) from dual union all
            select load_tgt2_t(p_txn_date) from dual;
       loader.load(p_txn_date, rc);
    end load_from_stg;
end loader_abc;
/
Both package bodies now depend on abstractions: the strongly-typed ref cursor in the LOADER specification and the LOAD_T SQL Type. These should change much less frequently than the tables in the feed or even the loading process itself. This is the Dependency Inversion Principle in action.

Splitting generic and concrete functionality into separate packages produces a more stable application. Users of a feed package are shielded from changes in other feeds. The LOADER package relies on strongly-typed abstractions. Consequently we can code a new feed package which calls loader.load() without peeking into that procedure's implementation to see what it expects.
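To illustrate, here is a sketch of the DEF feed mentioned earlier. The sub-types LOAD_DEF_TGT3_T and LOAD_DEF_TGT4_T are hypothetical, assumed to be built the same way as LOAD_TGT1_T. Nothing in LOADER or LOADER_ABC has to change to accommodate the new feed:

```sql
create or replace package loader_def as
    procedure load_from_stg
            (p_txn_date in date);
end loader_def;
/

create or replace package body loader_def as
    procedure load_from_stg
            (p_txn_date in date)
    is
        rc loader.loadset_rc;   -- depends only on the abstraction in LOADER
    begin
        -- Hypothetical concrete sub-types of LOAD_T for the DEF feed
        open rc for
            select load_def_tgt3_t(p_txn_date) from dual union all
            select load_def_tgt4_t(p_txn_date) from dual;
        loader.load(p_txn_date, rc);
    end load_from_stg;
end loader_def;
/
```

It would be invoked just like the ABC feed, for example loader_def.load_from_stg(date '2016-03-16').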

Part of the Designing PL/SQL Programs series

Tuesday, March 15, 2016

A new law of office life

I posted my Three Laws of Office Life a long while back. Subsequent experience has revealed another one: Every office kitchen which has a sign reminding people to do their washing-up has a concomitant large pile of unwashed crockery and dirty cutlery.

People wash their own mug and cereal bowl, but are less rigorous with the crockery from the kitchen cupboard. This phenomenon will be familiar to anybody who has shared a house during their student days or later.

Don't think that installing a dishwasher will change anything: it merely transfers the problem. Someone who won't wash up a mug is even less likely to unload a dishwasher. There is only one workable solution, and that is to have no office kitchen at all. (Although this creates a new problem, as vending machine coffee is universally vile and the tea unspeakable.)

So the Pile of Washing Up constitutes an ineluctable law, but it is the fourth law and we all know that the canon only admits sets of three laws. One must go. Since I first formulated these laws, cost-cutting in the enterprise has more-or-less abolished the practice of providing biscuits at meetings. Hence the old Second Law no longer holds, which creates a neat vacancy.

Here are the revised Laws of Office Life:

First Law: For every situation there is an equal and apposite Dilbert cartoon.

Second Law: Every office kitchen which has a sign reminding people to do their washing-up has a concomitant large pile of unwashed crockery and dirty cutlery.

Third Law: The bloke with the most annoying laugh is the one who finds everything funny.

Introducing the RCCASS design principles

Robert C. Martin actually defined eleven principles for OOP. The first five, the SOLID principles, relate to individual classes. The other six, the RCCASS principles, deal with the design of packages (in the C++ or Java sense, i.e. libraries). They are far less well known than the first five. There are two reasons for this:

  • Unlike "SOLID", "RCCASS" is awkward to say and doesn't form a neat mnemonic. 
  • Programmers are far less interested in software architecture. 

Software architecture tends to be an alien concept in PL/SQL. Usually a codebase of packages simply accretes over the years, like a coral reef. Perhaps the RCCASS principles can help change that.

The RCCASS Principles

Reuse Release Equivalency Principle 

The Reuse Release Equivalency Principle states that the unit of release matches the unit of reuse, that is, the parts of the program unit which other programs consume. Basically the unit of release defines the scope of regression testing for consuming applications. It's an ill-mannered release which forces projects to undertake unnecessary regression testing. Cohesive program units allow consumers to do regression testing only for functionality they actually use. It's less of a problem for PL/SQL because (unlike C++ libraries or Java jars) the unit of release can be very fine-grained: individual packages or stored procedures.

Common Reuse Principle 

The Common Reuse Principle supports the definition of cohesive program units. Functions which are likely to be used together belong together. For instance, procedures which maintain the Employees table should be co-located in one package (or a group of related packages). They will share sub-routines, constants and exceptions. Packaging related procedures together makes the package easier to write and easier for calling programs to use.
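A sketch of such a package specification. EMP_API and its procedures are hypothetical names; the point is simply that procedures which are used together live together and share one constant and one exception:

```sql
create or replace package emp_api as
    -- Shared by every procedure in the package
    c_default_dept constant number := 10;
    invalid_salary exception;
    pragma exception_init(invalid_salary, -20100);

    procedure hire
        (p_name      in varchar2
         , p_salary  in number
         , p_dept_id in number default c_default_dept);

    procedure change_salary
        (p_emp_id       in number
         , p_new_salary in number);

    procedure dismiss
        (p_emp_id in number);
end emp_api;
/
```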

Common Closure Principle

The Common Closure Principle also supports the definition of cohesive program units. Functions which share a dependency belong together, because they have a common axis of change. Common Closure helps to minimise the number of program units affected by a change. For instance, programs which use the Employees table may need to change if the structure of the table changes. All the changes must be released together: table, PL/SQL, types, etc.
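One way to scope such a release is to ask the data dictionary what depends on the table. A sketch, assuming an EMPLOYEES table in the current schema; USER_DEPENDENCIES records only direct, static dependencies, so objects reached through views or dynamic SQL won't appear:

```sql
-- Stored objects in this schema which reference EMPLOYEES directly
select d.type
       , d.name
from   user_dependencies d
where  d.referenced_name  = 'EMPLOYEES'
and    d.referenced_type  = 'TABLE'
and    d.referenced_owner = user
order by d.type, d.name;
```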

Acyclic Dependencies Principle 

Avoid cyclic dependencies between program units: if package A depends on package B then B must not have a dependency on A. Cyclic dependencies make an application hard to use and harder to deploy. The dependency graph shows the order in which objects must be built. Designing a dependency graph upfront is futile, but we can keep to rough guidelines. Higher-level packages implementing business rules tend to depend on generic routines, which in turn tend to depend on low-level utilities. There should be no application logic in those lower-level routines. If SALES requires a special logging implementation then that should be handled in the SALES subsystem, not in the standard logging package.
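Oracle will happily compile package bodies which reference each other's specifications, so a cycle doesn't stop the build; we have to go looking for it. A sketch of a data dictionary query which reports direct two-package cycles in the current schema (longer cycles would need a hierarchical query):

```sql
-- Pairs of packages whose bodies reference each other's specifications
select d1.name             as package_a
       , d1.referenced_name as package_b
from   user_dependencies d1
join   user_dependencies d2
       on  d2.name            = d1.referenced_name
       and d2.referenced_name = d1.name
where  d1.type             = 'PACKAGE BODY'
and    d1.referenced_type  = 'PACKAGE'
and    d1.referenced_owner = user
and    d2.type             = 'PACKAGE BODY'
and    d2.referenced_type  = 'PACKAGE'
and    d2.referenced_owner = user
-- excludes each body's dependency on its own spec, and reports each pair once
and    d1.name < d1.referenced_name
order by 1, 2;
```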

Stable Dependencies Principle 

Any change to the implementation of a program unit which is widely used will trigger regression testing for all the programs which call it. At the most extreme, a change to a logging routine could affect all the other programs in our application. As with the Open/Closed Principle, we need to fix bugs. But new features should be introduced by extension, not modification. And refactoring of low-level dependencies must not be done on a whim.

Stable Abstractions Principle

Abstractions are dependencies, especially when we're talking about PL/SQL. So this Principle is quite similar to the Stable Dependencies Principle. The key difference is that this one relates to the definition of interfaces rather than implementation. A change to the signature of a logging routine could require code changes to all the other programs in the application. Obviously this is even more inconvenient than enforced regression testing. Avoid changing the signature of a public procedure or the projection of a public view. Again, extension rather than modification is the preferred approach.
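A minimal sketch of extension rather than modification, using a hypothetical APP_LOG package whose first version declared only logm(p_msg in varchar2). Adding a trailing parameter with a default value extends the interface without breaking existing calls; the DBMS_OUTPUT body is a placeholder implementation:

```sql
create or replace package app_log as
    procedure logm
        (p_msg     in varchar2
         , p_level in varchar2 default 'INFO');   -- new, optional parameter
end app_log;
/
create or replace package body app_log as
    procedure logm
        (p_msg     in varchar2
         , p_level in varchar2 default 'INFO')    -- default must match the spec
    is
    begin
        -- Placeholder: write to the session output buffer
        dbms_output.put_line(p_level||': '||p_msg);
    end logm;
end app_log;
/
```

Existing calls such as app_log.logm('LOAD START') compile unchanged. Recompiling the specification does still invalidate dependent units which reference logm (Oracle recompiles them on next use), so the Stable Dependencies caveat about regression testing still applies.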

Applicability of RCCASS principles in PL/SQL 

The focus of these principles is the stability of a shared codebase, and minimising the impact of change on the consumers of our code. This is vital in large projects, where communication between teams is often convoluted. It is even more important for open source or proprietary libraries.

We can apply the Common Reuse Principle and the Common Closure Principle to define the scope of the Reuse Release Equivalency Principle, and hence define the boundaries of a sub-system (whisper it, schema). Likewise we can apply the Stable Dependencies Principle and the Stable Abstractions Principle to enforce the Acyclic Dependencies Principle and build stable PL/SQL libraries. So the RCCASS principles offer some very useful pointers towards a stable PL/SQL software architecture.

Part of the Designing PL/SQL Programs series

Monday, March 14, 2016

Introducing the SOLID design principles

PL/SQL programming standards tend to focus on layout (case of keywords, indentation, etc), naming conventions, and implementation details (such as use of cursors).  These are all important things, but they don't address questions of design. How easy is it to use the written code?  How easy is it to test? How easy will it be to maintain? Is it robust? Is it secure?

Simply put, there are no agreed design principles for PL/SQL. So it's hard to define what makes a well-designed PL/SQL program.

The SOLID principles

It's different for object-oriented programming. OOP has more design principles and paradigms and patterns than you can shake a stick at. Perhaps the most well-known are the SOLID principles, which were first mooted by Robert C. Martin, AKA Uncle Bob, back in 1995 (although it was Michael Feathers who coined the acronym).

Although Martin put these principles together for Object-Oriented code, they draw on a broader spectrum of programming practice. So they are transferable, or at least translatable, to other forms of modular programming. For instance, PL/SQL.

Single Responsibility Principle

This is the foundation stone of modular programming: a program unit should do only one thing. Modules which do only one thing are easier to understand, easier to test and generally more versatile. Higher level procedures can be composed of lower level ones. Sometimes it can be hard to define what "one thing" means in a given context, but some of the other principles provide clarity. Martin's formulation is that there should be just one axis of change: there's just one set of requirements which, if modified or added to, would lead to a change in the package.

Open/closed Principle

The slightly obscure name conceals a straightforward proposal. It means program units are closed to modification but open to extension. If we need to add new functionality to a package, we create a new procedure rather than modifying an existing one. (Bertrand Meyer, the father of Design By Contract programming, originally proposed it; in OO programming this principle is implemented through inheritance or polymorphism.) Clearly we must fix bugs in existing code. Also it doesn't rule out refactoring: we can tune the implementation provided we don't change the behaviour. This principle mainly applies to published program units, ones referenced by other programs in Production. Also the principle can be looser when the code is being used within the same project, because we can negotiate changes with our colleagues.
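A minimal sketch of the idea, using a hypothetical PRICING package: NET_PRICE is already published, so a new discounting requirement arrives as a new function alongside it rather than as a change to it.

```sql
create or replace package pricing as
    -- Published and used in Production: left untouched
    function net_price (p_gross in number, p_tax_rate in number) return number;
    -- Added later: extension rather than modification
    function discounted_price
        (p_gross in number, p_tax_rate in number, p_discount_pct in number)
        return number;
end pricing;
/
create or replace package body pricing as
    function net_price (p_gross in number, p_tax_rate in number) return number
    is
    begin
        return round(p_gross / (1 + p_tax_rate), 2);
    end net_price;

    function discounted_price
        (p_gross in number, p_tax_rate in number, p_discount_pct in number)
        return number
    is
    begin
        -- Builds on the existing function instead of altering its behaviour
        return round(net_price(p_gross, p_tax_rate) * (1 - p_discount_pct / 100), 2);
    end discounted_price;
end pricing;
/
```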

Liskov Substitution Principle

This is a real Computer Science-y one, good for dropping in code reviews. Named for Barbara Liskov, it defines rules for behavioural sub-typing. If a procedure has a parameter defined as a base type it must be able to take an instance of any sub-type without changing the behaviour of the program. So a procedure which uses IS OF to test the type of a passed parameter and do something different is violating Liskov Substitution. Obviously we don't make much use of Inheritance in PL/SQL programming, so this Principle is less relevant than in other programming paradigms.
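A sketch of the difference, reusing the LOAD_T and LOAD_TGT1_T types from the ETL example in the Dependency Inversion post (the procedure names are hypothetical). The first procedure violates Liskov Substitution by branching on the concrete sub-type; the second relies only on the contract of LOAD_T:

```sql
-- Violates LSP: behaviour depends on which sub-type was passed
create or replace procedure run_load_bad (p_load in load_t)
is
    n number;
begin
    if p_load is of (load_tgt1_t) then
        n := p_load.load();
    else
        n := 0;   -- every other sub-type is silently skipped
    end if;
    dbms_output.put_line(p_load.get_tgt()||': '||n);
end run_load_bad;
/
-- Respects LSP: any sub-type of LOAD_T can be substituted
create or replace procedure run_load_good (p_load in load_t)
is
    n number;
begin
    n := p_load.load();   -- polymorphic dispatch to the sub-type's load()
    dbms_output.put_line(p_load.get_tgt()||': '||n);
end run_load_good;
/
```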

Interface Segregation Principle

This principle is about designing fine-grained interfaces. It is an extension of the Single Responsibility Principle. Instead of building one huge package which contains all the functions relating to a domain, build several smaller, more cohesive packages. For example Oracle's Advanced Queuing subsystem comprises five packages, to manage different aspects of AQ. Users who write to or read from queues have DBMS_AQ; users who manage queues and subscribers have DBMS_AQADM.
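The split shows up in code: creating a queue is a DBMS_AQADM job, while putting a message on it only needs DBMS_AQ. A sketch, assuming a user with execute privileges on both packages; the queue and queue table names are hypothetical, and a RAW payload keeps the example free of user-defined types:

```sql
-- Administration (DBMS_AQADM): typically run once, by a privileged user
begin
    dbms_aqadm.create_queue_table(
        queue_table        => 'demo_qt',
        queue_payload_type => 'RAW');
    dbms_aqadm.create_queue(
        queue_name  => 'demo_q',
        queue_table => 'demo_qt');
    dbms_aqadm.start_queue(queue_name => 'demo_q');
end;
/

-- Runtime (DBMS_AQ): all that producers and consumers need day to day
declare
    l_enq_opts  dbms_aq.enqueue_options_t;
    l_msg_props dbms_aq.message_properties_t;
    l_msgid     raw(16);
begin
    dbms_aq.enqueue(
        queue_name         => 'demo_q',
        enqueue_options    => l_enq_opts,
        message_properties => l_msg_props,
        payload            => utl_raw.cast_to_raw('hello'),
        msgid              => l_msgid);
    commit;
end;
/
```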

Dependency Inversion Principle

Interactions between programs should be through abstract interfaces rather than concrete ones. Abstraction means the implementation of one side of the interface can change without changing the other side. PL/SQL doesn't support Abstract objects in the way that, say, Java does. To a certain extent Package Specifications provide a layer of abstraction, but there can only be one concrete implementation: the package body. Using Types to pass data between Procedures is an interesting idea, which we can use to decouple data providers and data consumers in a useful fashion.

Applicability of SOLID principles in PL/SQL

So it seems like we can apply SOLID practices to PL/SQL.  True, some Principles fit better than others. But we have something which we might use to distinguish good design from bad when it comes to PL/SQL interfaces.

The SOLID principles apply mainly to individual modules. Is there something similar we can use for designing module groups? Why, yes there is. I'm glad you asked.

Part of the Designing PL/SQL Programs series

Tuesday, March 01, 2016

Designing PL/SQL Programs

When I started out, in COBOL, structured programming was king. COBOL programs tended to be lengthy and convoluted. Plus GOTO statements. We needed program design to keep things under control.

So I noticed the absence of design methodologies when I moved into Oracle. At first it didn't seem to be a problem. SQL was declarative and self-describing, and apparently didn't need designing. Forms was a 4GL and provided its own structure. And PL/SQL? Well, that was just glue, and the programs were so simple.

Then one day I was debugging several hundred lines of PL/SQL somebody had written, and struggling to figure out what was going on. So I drew a flow chart of the IF branches and WHILE loops. Obvious really, but if the original author had done that they would have realised that the program had an ELSE branch which could never be chosen; more than one hundred lines of code which would never execute.

Let me sleep()


Good design is hard to define: in fact, good design is often unobtrusive. It's bad design we notice, because it generates friction and hinders our progress. By way of illustration, here is a poor design choice in Oracle's PL/SQL library: DBMS_LOCK.SLEEP().

SLEEP() is a simple program, which suspends processing for a parameterized number of seconds. This is not something we want to do often, but it is useful in testing. The problem is its home in the DBMS_LOCK package, because that package is not granted to public by default.

DBMS_LOCK is a utility package for building our own locking mechanisms. There's not much need for this any more. Oracle's default locking model is pretty good. There is SELECT ... FOR UPDATE for pessimistic locking, which has been even more powerful since the SKIP LOCKED syntax was permitted in 11g. We have Advanced Queuing, Job Scheduling, oh my. It's hard to find a use case for user-defined locks which isn't re-inventing the wheel, and easy to see how we might end up implementing something less robust than the built-in locks. So DBAs tend not to grant execute on DBMS_LOCK without being asked, and then often not without a fight.
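For instance, a simple work queue can be processed by several sessions at once with SKIP LOCKED: each session locks whichever pending rows no other session holds, with no user-defined locks required. A sketch, assuming a hypothetical JOB_QUEUE table with JOB_ID and STATUS columns:

```sql
declare
    -- Rows already locked by another session are skipped, not waited for
    cursor c_jobs is
        select job_id
        from   job_queue
        where  status = 'PENDING'
        for update skip locked;
begin
    for r in c_jobs loop
        -- Real job processing for r.job_id would go here
        update job_queue
        set    status = 'DONE'
        where  current of c_jobs;
    end loop;
    commit;
end;
/
```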

But as developers we need access to a sleep routine. So DBAs have to grant execute on DBMS_LOCK, and then that gives away too much access. It would be better if SLEEP() was easily accessible in some less controversial place.

Why is this an example of bad design? Because user-defined locks need a sleep routine but SLEEP() has other uses besides lock implementations. Putting SLEEP() in DBMS_LOCK means it's harder to use it.
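One common workaround is for a DBA to wrap the routine in a definer's rights procedure and grant execute on that instead, so developers get SLEEP() without the rest of DBMS_LOCK. A sketch; PAUSE_FOR is a hypothetical name, and the owning schema needs EXECUTE on DBMS_LOCK granted directly rather than through a role, because definer's rights code does not see role-based grants:

```sql
create or replace procedure pause_for (p_seconds in number)
    authid definer
is
begin
    dbms_lock.sleep(p_seconds);
end pause_for;
/
-- Expose only the sleep capability; a developer role may be preferable to PUBLIC
grant execute on pause_for to public;
```

Later releases addressed the underlying problem: from Oracle 18c, DBMS_SESSION.SLEEP provides the same functionality in a package which is available to PUBLIC.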

Riding the Hobby Horse


Occasionally in a recruitment interview I have asked the candidate how they would go about designing a PL/SQL program. Mostly the question is met with bemusement. PL/SQL design is not A Thing. Yet many of us work on huge PL/SQL code-bases. How do they turn out without a design methodology? Badly:
  • Do you have one schema crammed with hundreds of PL/SQL program units, perhaps named with a prefix to identify sub-systems?
  • Do you have a package called UTILS?
  • Do you query USER_PROCEDURES or USER_DEPENDENCIES (or even USER_SOURCE) to find a piece of code which implements some piece of functionality?
  • Do you have the same functionality implemented in several places?
  • Does a "simple change" cascade into changes across multiple program units and a regression testing nightmare?
All these are symptoms of poor design. But there are ways to avoid this situation.

Designing PL/SQL Programs series