Wednesday, December 31, 2014

UKOUG Annual Conference (Tech 2014 Edition)

The conference

This year the UKOUG's tour of Britain's post-industrial heritage brought the conference to Liverpool. The Arena & Convention Centre is based in Liverpool docklands, formerly the source of the city's wealth and now a touristic playground of museums, souvenir shops and bars. Still at least the Pumphouse functions as a decent pub, which is one more decent pub than London Docklands can boast. The weather was not so much cool in the 'Pool as flipping freezing, with the wind coming off the Mersey like a chainsaw that had been kept in a meat locker for a month. Plus rain. And hail. Which is great: nothing we Brits like more than moaning about the weather.

After last year's experiment with discrete conferences, Apps 2014 was co-located with Tech 2014; each was still a separate conference with their own exclusive agendas (and tickets) but with shared interests (Exhibition Hall, social events). Essentially DDD's Bounded Context pattern. I'll be interested to know how many delegates purchased the Rover ticket which allowed them to cross the borders. The conferences were colour-coded, with the Apps team in Blue and the Tech team in Red; I thought this was an, er, interesting decision in a footballing city like Liverpool. Fortunately the enforced separation of each team's supporters kept violent confrontation to a minimum.

The sessions

This is not all of the sessions I attended, just the ones I want to comment on.

There's no place like ORACLE_HOME

I started my conference by chairing Niall Litchfield's session on Monday morning. Niall experienced every presenter's nightmare: switch on the laptop, nada, nothing, completely dead. Fortunately it turned out to be the fuse in the charger's plug, and a marvellous tech support chap was able to find a spare kettle cable. Niall coped well with the stress and delivered a wide-ranging and interesting introduction of some of the database features available to developers. It's always nice to here a DBA say difficult is the task of developers these days. I'd like to hear more acknowledge it, and more importantly being helpful rather than becoming part of the developer's burden :)

The least an Oracle DBA needs to know about Linux

Turns out "the least" is still an awful lot. Martin Nash started with installing a distro and creating a file system, and moves on from there. As a developer I find I'm rarely allowed OS access to the database server these days; I suspect many enterprise DBAs also spend most of their time in OEM rather than the a shell prompt. But Linux falls into that category of things which when you need to know them you need to know them in the worst possible way. So Martin has given me a long list of commands with which to familiarize myself.

Why solid SQL still delivers the best performance

Robyn Sands began her session with the shocking statement that the best database performance requires good application design. Hardware improvements won't safe us from the consequences of our shonky code. From her experience in Oracle's Real World Performance team, the top three causes of database slowness are:
  • People not using the database the way it was designed to be used
  • Sub-optimal architecture or code
  • Sub-optimal algorithm (my new favourite synonym for "bug")

The bulk of her session was devoted to some demos, racing different approaches to DML:
  • Row-by-row processing
  • Array (bulk) processing
  • Manual parallelism i.e. concurrency
  • Set-based processing i.e. pure SQL
There were a series of races, starting with a simple copying of data from one table to another and culminating in a complex transformation exercise. If you have attended any Oracle performance session in the last twenty years you'll probably know the outcome already but it was interesting to see how much faster pure SQL was compared to the other approaches. in fact the gap between the set-based approach and the row-based approach widened with each increase in complexity of the task. What probably surprised many people (including me) was how badly manual parallelism fared: concurrent threads have a high impact on system resource usage, because of things like index contention.

Enterprise Data Warehouse Architecture for Big Data

Dai Clegg was at Oracle for a long time and has since worked for a couple of startups which used some of the new-fangled Big Data/NoSQL products. This mix of experience has given him a breadth of insight which is not common in the Big Data discussion.

His first message is one of simple economics: these new technologies solve the problem of linear scale-out at a price-point below that of Oracle. Massively parallel programs using cheap or free open source software on commodity hardware. Commodity hardware is more failure prone than enterprise tin (and having lots of the blighters actually reduces the MTTF) but these distributed frameworks are designed to handle node failures; besides, commodity hardware has gotten a lot more reliable over the years. So, it's not that we couldn't implement most Big Data applications using relational databases, it's just cheaper not to.

Dai's other main point addressed the panoply of products in the Big Data ecosystem. Even in just the official Hadoop stack there are lots of products with similar or overlapping capabilities: do we need Kafka or Flume or both? There is no one Big Data technology which is cheaper and better for all use cases. Therefore it is crucial to understand the requirements of the application before starting on the architecture. Different applications will demand different permutations from the available options. Properly defined use cases (which don't to be heavyweight - Dai hymned the praises of the Agile-style "user story") will indicate which kinds of products are required. Organizations are going to have to cope with heterogeneous environments. Let's hope they save enough on the licensing fees to pay for the application wranglers.

How to write better PL/SQL

After last year's fiasco with shonky screen rendering and failed demos I went extremely low tech: I could have my presentation from the PDF on a thumb-drive. Fortunately that wasn't necessary. My session was part of the Beginners' Track: I'm not sure how many people in the audience were actual beginners; I hope the grizzled veterans got something out of it.

One member of the audience turned out to be a university lecturer; he was distressed by my advice to use pure SQL rather than PL/SQL whenever possible. Apparently his students keep doing this and he has to tell them to use PL/SQL features instead. I'm quite heartened to hear that college students are familiar with the importance of set-based programming. I'm even chuffed to have my prejudice confirmed that it is university lecturers who are teach people to write what is bad code in the real world. I bet he tells them to use triggers as well :)

Oracle Database 12c New Indexing Features

I really enjoy Richard Foote's presenting style: it is breezily Aussie in tone, chatty and with the occasional mild cuss word. If anybody can make indexes entertaining it is Richard (and he did).

His key point is that indexes are not going away. Advances in caching and fast storage will not remove the need for indexed reads, and the proof is Oracle's commitment to adding further capabilities. In fact, there are so many new indexing features that Ricahrd's presentation was (for me) largely a list of things I need to go away and read about. Some of these features are quite arcane: an invisible index? on an invisible column? Hmmmm. I'm not sure I understand when I might want to implement partial indexing on a partitioned table. What I'm certain about is that most DBAs these days are responsible for so many databases that they don't have the time to acquire the requisite understanding of individual applications and their data; so it seems to me unlikely that they will be able to decide which partitions need indexing. This is an optimization for the consultants.

Make your data models sing

It was one of the questions in the Q&A section of Susan Duncan's talk which struck me. The questioner talked about their "legacy" data warehouse. How old did that make me feel? I can remember when Data Warehouses were new and shiny and going to solve very enterprises data problems.

The question itself dealt with foreign keys: as is a common practice the data warehouse had no defined foreign keys. Over the years it had sprawled across several hundred tables, without the documentation keeping up. Is it possible, the petitioner asked, to reverse engineer the data model with foreign keys in the database? Of course the short answer is No. While it might be possible to infer relationships from common column names, there isn't any tool we were aware of which could do this. Another reminder that disabled foreign keys are better than no keys at all.

Getting started with JSON in the Database

Marco Gralike has a new title: he is no longer Mr XMLDB he is now Mr Unstructured Data in the DB. Or at least his bailiwick has been extended to cover JSON. JSON (JavaScript Object Notation) is a lightweight data transfer mechanism: basically it's XML without the tags. All the cool kids like JSON because it's the basis of RESTful web interfaces. Now we can store JSON in the database (which probably means all the cool kids will wander off to find something else now that fusty old Oracle can do it).
The biggest surprise for me is that Oracle haven't introduced a JSON data type (apparently there were so many issues around the XMLType nobody had the appetite for another round). So that means we store JSON in VARCHAR2, CLOB, BLOB or RAW. But like XML there are operators which allow us to include JSON documents in our SQL. The JSON dot notation works pretty much like XPath, and we can use it to build function-based indexes on the stored documents. However, we can't (yet) update just part of a JSON doc: it is wholesale replacement only.

Error handling is cute: by default invalid JSON syntax in a query produces null in result set rather than an exception. Apparently that's how the cool kids like it. For those of us that prefer our exceptions hurled rather than swallowed there is an option to override this behaviour.

SQL is the best development language for Big Data

This was Tom Kyte giving the obverse presentation to Dai Clegg: Oracle can do all this Big Data stuff, and has been doing it for some time. He started with two historical observations:
  • XML data stores were going to kill off relational databases. Which didn't happen.
  • Before relational databases and SQL there was NoSQL, literally no SQL. Instead there were things like PL/1, which was a key-value data store.
Tom had a list of features in Oracle which support Big Data applications. They were:
  • Analytic functions which have enabled ordered array semantics in SQL since the last century.
  • SQL Developer's support for Oracle Data Mining.
  • The MODEL clause (for those brave enough to use it).
  • Advanced pattern matching with the MATCH RECOGNIZE clause in 12c
  • External tables with their support for extracting data from flat files, including from HDFS (with the right connectors)
  • Support for JSON documents (see above).
He could also have discussed document storage with XMLType and Oracle Text, Enterprise R, In-Memory columnar storage, and so on. We can even do Map/Reduce in PL/SQL if we feel so inclined. All of these are valid assertions; the problem is (pace Dai Clegg) simply one of licensing. Too many of the Big Data features are chargeable extras on top of Enterprise Edition licenses. Big Data technology is suited to a massively parallel world where all processors are multi-core and Oracle's licensing policy isn't.

Five hints for efficient SQL

This was an almost philosophical talk from Jonathan Lewis, in which he explained how he uses certain hints to fix poorly performing queries. The optimizer takes a left-deep approach, which can lead to a bad choice of transformation, bad estimates (but check your stats as well!) and bad join orders. His strategic solution is to shape the query with hints so that Oracle's execution plan meets our understanding of the data. <

So his top five hints are:
  • (NO_)MERGE
  • (NO_)PUSH_PRED
  • (NO_)UNNEST
  • (NO_)PUSH_SUBQ
  • DRIVING_SITE
Jonathan calls these strategic hints, because advise the optimizer how to join tables or how to transform a sub-query. They don't hard-code paths in the way that say the INDEX hint does.

Halfway through the presentation Jonathan's laptop slid off the lectern and slammed onto the stage floor. End of presentation? Luckily not. Apparently his laptop is made of the same stuff they use for black box flight recorders, because after a few anxious minutes it rebooted successfully and he was able to continue with his talk. I was struck by how unflustered he was by the situation (even though he didn't have a backup due to last minute tweaking of the slides). A lovely demonstration of grace under pressure.