Wednesday, March 16, 2016

The Dependency Inversion Principle: a practical example

These design principles may seem rather academic, so let's look at a real life demonstration of how applying Dependency Inversion Principle lead to an improved software design.

Here is a simplified version of an ETL framework which uses SQL Types in a similar fashion to the approach described in my blog post here. The loading process is defined using an abstract non-instantiable Type like this:
create or replace type load_t force as object
    ( txn_date date
      , tgt_name varchar2(30)
      , member function load return number
      , final member function get_tgt return varchar2
      )
not final not instantiable;
/

create or replace type body load_t as
    member function load return number
    is
    begin
        return 0;
    end load;
    final member function get_tgt return varchar2
    is
    begin
        return self.tgt_name;
    end get_tgt;
end;
/


The concrete behaviour for each target table in the ABC feed is defined by sub-types like this:
create or replace type load_tgt1_t under load_t
    ( overriding member function load return number
        , constructor function load_tgt1_t
            (self in out nocopy load_tgt1_t
             , txn_date date)
           return self as result
      )
;
/
create or replace type body load_tgt1_t as
    overriding member function load return number
    is
    begin
        insert into tgt1 (col1, col2)
        select to_number(col_a), col_b
        from stg_abc stg
        where stg.txn_date = self.txn_date;
        return sql%rowcount;
    end load;
    constructor function load_tgt1_t
            (self in out nocopy load_tgt1_t
             , txn_date date)
           return self as result
    is
    begin
        self.txn_date := txn_date;
        self.tgt_name := 'TGT1';
        return;
    end load_tgt1_t;
end;
/
This approach is neat because ETL is a fairly generic process: the mappings and behaviour for a particular target table are specific but the shape of the loading process is the same for any and all target tables. So we can build a generic PL/SQL procedure to handle them. This simplistic example does some logging, loops through a set of generic objects and, through the magic of polymorphism, calls a generic method which executes specific code for each target table:
    procedure load  
     (p_txn_date in date
        , p_load_set in sys_refcursor)
    is
        type loadset_r is record (
            tgtset load_t
            );
        lrecs loadset_r;
        load_count number;
    begin
        logger.logm('LOAD START::txn_date='||to_char(p_txn_date,'YYYY-MM-DD'));
        loop
            fetch p_load_set into lrecs;
            exit when p_load_set%notfound;
            logger.logm(lrecs.tgtset.get_tgt()||' start');
            load_count := lrecs.tgtset.load();
            logger.logm(lrecs.tgtset.get_tgt()||' loaded='||to_char(load_count));
        end loop;
        logger.logm('LOAD FINISH');
    end load;

So far, so abstract. The catch is the procedure which instantiates the objects:
    procedure load_abc_from_stg  
         (p_txn_date in date)
    is
        rc sys_refcursor;
    begin
        open rc for
            select load_tgt1_t(p_txn_date) from dual union all
            select load_tgt2_t(p_txn_date) from dual;
       load(p_txn_date, rc);
    end load_abc_from_stg;

On casual inspection it doesn't seem problematic but the call to the load() procedure gives the game away. Both procedures are in the same package:
create or replace package loader as
    procedure load 
     (p_txn_date in date
        , p_load_set in sys_refcursor);
    procedure load_abc_from_stg
         (p_txn_date in date);
end loader;
/

So the package mixes generic and concrete functionality. What makes this a problem? After all, it's all ETL so doesn't the package follow the Single Responsibility Principle? Well, up to a point. But if we want to add a new table to the ABC feed we need to update the LOADER package. Likewise if we want to add a new feed, DEF, we need to update the LOADER package. So it breaks the Stable Abstractions principle. It also creates dependency problems, because the abstract load() process has dependencies on higher level modules. We can't deploy the LOADER package without deploying objects for all the feeds.

Applying the Dependency Inversion Principle.

The solution is to extract the load_abc() procedure into a concrete package of its own. To make this work we need to improve the interface between the load() procedure and programs which call it. Both sides of the interface should depend on a shared abstraction.

The LOADER package is now properly generic:
create or replace package loader as
    type loadset_r is record (
            tgtset load_t
            );
    type loadset_rc is ref cursor return loadset_r;
    procedure load 
        (p_txn_date in date
          , p_load_set in loadset_rc)
         authid current_user
               ;
end loader;
/
The loadset_r type has moved into the package specification, and defines a strongly-typed ref cursor. The load() procedure uses the strongly-typed ref cursor.

Similarly the LOAD_ABC package is wholly concrete:
create or replace package loader_abc as
    procedure load_from_stg
            (p_txn_date in date);
end loader_abc;
/

create or replace package body loader_abc as
    procedure load_from_stg
            (p_txn_date in date)
    is
        rc loader.loadset_rc;
    begin
        open rc for
            select load_tgt1_t(p_txn_date) from dual union all
            select load_tgt2_t(p_txn_date) from dual;
       loader.load(p_txn_date, rc);
    end load_from_stg;
end loader_abc;
/
Both package bodies now depend on abstractions: the strongly-typed ref cursor in the LOADER specification and the LOADER_T SQL Type. These should change much less frequently than the tables in the feed or even the loading process itself. This is the Dependency Inversion Principle in action.

Separating generic and concrete functionality into separate packages produces a more stable application. Users of a feed package are shielded from changes in other feeds. The LOADER package relies on strongly-typed abstractions. Consequently we can code a new feed package which can call loader.load() without peeking into that procedure's implementation to see what it's expecting.

Part of the Designing PL/SQL Programs series

No comments: