Monday, January 11, 2016

Goodbye Spaceboy

"Sometimes I feel
The need to move on
So I pack a bag
And move on"

Can't believe Bowie has taken that final train.

David Bowie's music has been part of my life pretty much since I started listening to pop music seriously. Lodger was the first Bowie album I listened to all the way through. It's probably his most under-appreciated album. It's funny to think that back then in 1979 Bowie was dismissed as past it, a boring old fart who should be swept aside by the vital surge of post-punk bands. Because those bands were raised on Ziggy, they were taught to dance by the Thin White Duke and they learnt that moodiness from listening to Low in darkened bedrooms too many times.

Even if you don't listen to Bowie, probably your favourite bands did. If they style their hair or wear make up, they listened to Bowie. If they play synths they listened to Bowie. If they make dance music for awkward white boys at indie discos they listened to Bowie. If they lurk in shadows smoking cigarettes in their videos they listened to Bowie. That's a large part of his legacy.

The other thing about Bowie is that his back catalogue has something for pretty much everybody. People who loved Ziggy Stardust might loath the plastic soul phase. Hardly anybody gets Hunky Dory; but for some fans it's their favourite album. My favourite is the first side of "Heroes" and the second side of Low, but that whole stretch from Young Americans to Lodger is a seam of sustained musical invention unparallelled by any other pop act. (Judicious picking of collaborators is an art in itself.)

Of course, there was a long fallow period. Tin Machine weren't as bad as we thought at the time, but the drum'n'bass was too 'Dad dancing at a wedding reception' for comfort. So it was a relief when he finally started producing decent albums again. Heathen has some lovely moments. The Next Day was something of a return to form (although a bit too long to be a classic). Then there's Blackstar.

It's almost as though Bowie hung on just long enough that Blackstar would be reviewed as his latest album, rather than his last one. The four and five star reviews earned through merit rather than the mawkishness which would have accompanied a posthumous release. And it really is pretty good. When I first heard the title track it sounded like Bowie was taking a cue from Scott Walker's latter period: edgy, experimental and deliberately designed not to be fan pleaser. But, unlike Walker, Bowie can't do wilfully unlistenable. Even in the midst of all that drone and skronk there are tunes. He can't help himself, his pop sensibility is too strong. Which is why I've already listened to Blackstar more times than I've listened to Bish Bosch.

So, farewell David Bowie. We're all going to miss you terribly. "May God's love be with you."

Labels: ,

Thursday, December 31, 2015

Death and taxes - and Oracle 11gR2?

Oracle Premier Support for 11gR2 Database expired this time last. However, Oracle announced they would waive the fees for Extended Support for 2015. This was supposed to provide 11gR2 customers an additional twelve months to migrate to 12c. So, twelve months on, how many of those laggards are still on 11gR2. My entirely unscientific guess is, most of them. Why else would Oracle announce the extension of the Extended Support fees waiver until May 2017?

But 11gR2's continued longevity should not be a surprise.

For a start, it is a really good product. It is fully-featured and extremely robust. It offers pretty much everything an organization might want from a database. Basically it's the Windows XP of RDBMS.

The marketing of 12c has compounded this. It has focused on the "big ticket" features of 12c: Cloud, Multi-tenancy and In-Memory Database. Which is fair enough, except that these are all chargeable extras. So to get any actual benefits from upgrading to 12c requires laying out additional license fees, which is not a popular message these days.

And then there's Big Data. The hype has swept up lots of organizations who are now convinced they should be replacing their databases with Hadoop. They have heard the siren singing of free software and vendor-independence. In reality, most enterprises' core business rests on structured data for which they need an RDBMS, and their use cases for Big Data are marginal. But right now, it seems easier to make a business case for the shiny new toys than spending more on the existing estate.

So how can Oracle shift organizations onto 12c? They need to offer compelling positive reasons, not just the fear of loss of Support. My suggestion would be to make a couple of the Options part of the core product. For instance, freeing Partitioning and In-Memory Database would make Oracle 12c database a much more interesting proposition for many organizations.

Wednesday, December 31, 2014

UKOUG Annual Conference (Tech 2014 Edition)

The conference

This year the UKOUG's tour of Britain's post-industrial heritage brought the conference to Liverpool. The Arena & Convention Centre is based in Liverpool docklands, formerly the source of the city's wealth and now a touristic playground of museums, souvenir shops and bars. Still at least the Pumphouse functions as a decent pub, which is one more decent pub than London Docklands can boast. The weather was not so much cool in the 'Pool as flipping freezing, with the wind coming off the Mersey like a chainsaw that had been kept in a meat locker for a month. Plus rain. And hail. Which is great: nothing we Brits like more than moaning about the weather.

After last year's experiment with discrete conferences, Apps 2014 was co-located with Tech 2014; each was still a separate conference with their own exclusive agendas (and tickets) but with shared interests (Exhibition Hall, social events). Essentially DDD's Bounded Context pattern. I'll be interested to know how many delegates purchased the Rover ticket which allowed them to cross the borders. The conferences were colour-coded, with the Apps team in Blue and the Tech team in Red; I thought this was an, er, interesting decision in a footballing city like Liverpool. Fortunately the enforced separation of each team's supporters kept violent confrontation to a minimum.

The sessions

This is not all of the sessions I attended, just the ones I want to comment on.

There's no place like ORACLE_HOME

I started my conference by chairing Niall Litchfield's session on Monday morning. Niall experienced every presenter's nightmare: switch on the laptop, nada, nothing, completely dead. Fortunately it turned out to be the fuse in the charger's plug, and a marvellous tech support chap was able to find a spare kettle cable. Niall coped well with the stress and delivered a wide-ranging and interesting introduction of some of the database features available to developers. It's always nice to here a DBA say difficult is the task of developers these days. I'd like to hear more acknowledge it, and more importantly being helpful rather than becoming part of the developer's burden :)

The least an Oracle DBA needs to know about Linux

Turns out "the least" is still an awful lot. Martin Nash started with installing a distro and creating a file system, and moves on from there. As a developer I find I'm rarely allowed OS access to the database server these days; I suspect many enterprise DBAs also spend most of their time in OEM rather than the a shell prompt. But Linux falls into that category of things which when you need to know them you need to know them in the worst possible way. So Martin has given me a long list of commands with which to familiarize myself.

Why solid SQL still delivers the best performance

Robyn Sands began her session with the shocking statement that the best database performance requires good application design. Hardware improvements won't safe us from the consequences of our shonky code. From her experience in Oracle's Real World Performance team, the top three causes of database slowness are:
  • People not using the database the way it was designed to be used
  • Sub-optimal architecture or code
  • Sub-optimal algorithm (my new favourite synonym for "bug")

The bulk of her session was devoted to some demos, racing different approaches to DML:
  • Row-by-row processing
  • Array (bulk) processing
  • Manual parallelism i.e. concurrency
  • Set-based processing i.e. pure SQL
There were a series of races, starting with a simple copying of data from one table to another and culminating in a complex transformation exercise. If you have attended any Oracle performance session in the last twenty years you'll probably know the outcome already but it was interesting to see how much faster pure SQL was compared to the other approaches. in fact the gap between the set-based approach and the row-based approach widened with each increase in complexity of the task. What probably surprised many people (including me) was how badly manual parallelism fared: concurrent threads have a high impact on system resource usage, because of things like index contention.

Enterprise Data Warehouse Architecture for Big Data

Dai Clegg was at Oracle for a long time and has since worked for a couple of startups which used some of the new-fangled Big Data/NoSQL products. This mix of experience has given him a breadth of insight which is not common in the Big Data discussion.

His first message is one of simple economics: these new technologies solve the problem of linear scale-out at a price-point below that of Oracle. Massively parallel programs using cheap or free open source software on commodity hardware. Commodity hardware is more failure prone than enterprise tin (and having lots of the blighters actually reduces the MTTF) but these distributed frameworks are designed to handle node failures; besides, commodity hardware has gotten a lot more reliable over the years. So, it's not that we couldn't implement most Big Data applications using relational databases, it's just cheaper not to.

Dai's other main point addressed the panoply of products in the Big Data ecosystem. Even in just the official Hadoop stack there are lots of products with similar or overlapping capabilities: do we need Kafka or Flume or both? There is no one Big Data technology which is cheaper and better for all use cases. Therefore it is crucial to understand the requirements of the application before starting on the architecture. Different applications will demand different permutations from the available options. Properly defined use cases (which don't to be heavyweight - Dai hymned the praises of the Agile-style "user story") will indicate which kinds of products are required. Organizations are going to have to cope with heterogeneous environments. Let's hope they save enough on the licensing fees to pay for the application wranglers.

How to write better PL/SQL

After last year's fiasco with shonky screen rendering and failed demos I went extremely low tech: I could have my presentation from the PDF on a thumb-drive. Fortunately that wasn't necessary. My session was part of the Beginners' Track: I'm not sure how many people in the audience were actual beginners; I hope the grizzled veterans got something out of it.

One member of the audience turned out to be a university lecturer; he was distressed by my advice to use pure SQL rather than PL/SQL whenever possible. Apparently his students keep doing this and he has to tell them to use PL/SQL features instead. I'm quite heartened to hear that college students are familiar with the importance of set-based programming. I'm even chuffed to have my prejudice confirmed that it is university lecturers who are teach people to write what is bad code in the real world. I bet he tells them to use triggers as well :)

Oracle Database 12c New Indexing Features

I really enjoy Richard Foote's presenting style: it is breezily Aussie in tone, chatty and with the occasional mild cuss word. If anybody can make indexes entertaining it is Richard (and he did).

His key point is that indexes are not going away. Advances in caching and fast storage will not remove the need for indexed reads, and the proof is Oracle's commitment to adding further capabilities. In fact, there are so many new indexing features that Ricahrd's presentation was (for me) largely a list of things I need to go away and read about. Some of these features are quite arcane: an invisible index? on an invisible column? Hmmmm. I'm not sure I understand when I might want to implement partial indexing on a partitioned table. What I'm certain about is that most DBAs these days are responsible for so many databases that they don't have the time to acquire the requisite understanding of individual applications and their data; so it seems to me unlikely that they will be able to decide which partitions need indexing. This is an optimization for the consultants.

Make your data models sing

It was one of the questions in the Q&A section of Susan Duncan's talk which struck me. The questioner talked about their "legacy" data warehouse. How old did that make me feel? I can remember when Data Warehouses were new and shiny and going to solve very enterprises data problems.

The question itself dealt with foreign keys: as is a common practice the data warehouse had no defined foreign keys. Over the years it had sprawled across several hundred tables, without the documentation keeping up. Is it possible, the petitioner asked, to reverse engineer the data model with foreign keys in the database? Of course the short answer is No. While it might be possible to infer relationships from common column names, there isn't any tool we were aware of which could do this. Another reminder that disabled foreign keys are better than no keys at all.

Getting started with JSON in the Database

Marco Gralike has a new title: he is no longer Mr XMLDB he is now Mr Unstructured Data in the DB. Or at least his bailiwick has been extended to cover JSON. JSON (JavaScript Object Notation) is a lightweight data transfer mechanism: basically it's XML without the tags. All the cool kids like JSON because it's the basis of RESTful web interfaces. Now we can store JSON in the database (which probably means all the cool kids will wander off to find something else now that fusty old Oracle can do it).
The biggest surprise for me is that Oracle haven't introduced a JSON data type (apparently there were so many issues around the XMLType nobody had the appetite for another round). So that means we store JSON in VARCHAR2, CLOB, BLOB or RAW. But like XML there are operators which allow us to include JSON documents in our SQL. The JSON dot notation works pretty much like XPath, and we can use it to build function-based indexes on the stored documents. However, we can't (yet) update just part of a JSON doc: it is wholesale replacement only.

Error handling is cute: by default invalid JSON syntax in a query produces null in result set rather than an exception. Apparently that's how the cool kids like it. For those of us that prefer our exceptions hurled rather than swallowed there is an option to override this behaviour.

SQL is the best development language for Big Data

This was Tom Kyte giving the obverse presentation to Dai Clegg: Oracle can do all this Big Data stuff, and has been doing it for some time. He started with two historical observations:
  • XML data stores were going to kill off relational databases. Which didn't happen.
  • Before relational databases and SQL there was NoSQL, literally no SQL. Instead there were things like PL/1, which was a key-value data store.
Tom had a list of features in Oracle which support Big Data applications. They were:
  • Analytic functions which have enabled ordered array semantics in SQL since the last century.
  • SQL Developer's support for Oracle Data Mining.
  • The MODEL clause (for those brave enough to use it).
  • Advanced pattern matching with the MATCH RECOGNIZE clause in 12c
  • External tables with their support for extracting data from flat files, including from HDFS (with the right connectors)
  • Support for JSON documents (see above).
He could also have discussed document storage with XMLType and Oracle Text, Enterprise R, In-Memory columnar storage, and so on. We can even do Map/Reduce in PL/SQL if we feel so inclined. All of these are valid assertions; the problem is (pace Dai Clegg) simply one of licensing. Too many of the Big Data features are chargeable extras on top of Enterprise Edition licenses. Big Data technology is suited to a massively parallel world where all processors are multi-core and Oracle's licensing policy isn't.

Five hints for efficient SQL

This was an almost philosophical talk from Jonathan Lewis, in which he explained how he uses certain hints to fix poorly performing queries. The optimizer takes a left-deep approach, which can lead to a bad choice of transformation, bad estimates (but check your stats as well!) and bad join orders. His strategic solution is to shape the query with hints so that Oracle's execution plan meets our understanding of the data. <

So his top five hints are:
  • (NO_)MERGE
Jonathan calls these strategic hints, because advise the optimizer how to join tables or how to transform a sub-query. They don't hard-code paths in the way that say the INDEX hint does.

Halfway through the presentation Jonathan's laptop slid off the lectern and slammed onto the stage floor. End of presentation? Luckily not. Apparently his laptop is made of the same stuff they use for black box flight recorders, because after a few anxious minutes it rebooted successfully and he was able to continue with his talk. I was struck by how unflustered he was by the situation (even though he didn't have a backup due to last minute tweaking of the slides). A lovely demonstration of grace under pressure.

Labels: , ,

Thursday, October 10, 2013

T-Shirt slogans

One of the Cloudera chaps at the Oracle Big Data meetup had a T-shirt with this cool slogan:
Data is the new bacon
Even as a vegetarian I can appreciate the humour. However I think it has a corollary, which would also make a good T-shirt:
Metadata is the new Kevin Bacon
Because metadata is the thing which connects us all.

Labels: , ,

Oracle Big Data Meetup - 09-OCT-2013

The Oracle guys running the Big Data 4 the Enterprise Meetup are always apologetic about marketing. The novelty is quite amusing. They do this because most Big Data Meetups are full of brash young people from small start-ups who use cool open source software. They choose cool open source software partly because they're self-styled hackers who like being able to play with their software any way they choose. But mainly it is because the budgetary constraints of being a start-up mean they have to choose between a Clerkenwell office and Aeron chairs, or enterprise software licenses, and that's no choice at all.

But an Oracle Big Data meetup has a different constituency. We come from an enterprise background, we've all been using Oracle software for a long time and we know what to expect from an Oracle event. We're prepared to tolerate a certain amount of Oracle marketing because we want to hear the Oracle take on things, and we come prepared with our shields up. Apart from anything else, the Meetup sponsor is always cut some slack, in exchange for the beer'n'pizza.

Besides the Oracle Big Data Appliance is quite at easy sell, certainly compared to the rest of the engineered systems. The Exa stack largely comprises machines which replace existing servers whereas Big Data is a new requirement. Most Oracle shops probably don't have a pool of Linux/Java/Network hackers on hand to cobble together a parallel cluster of machines and configure them to run Hadoop. A pre-configured Exadoop appliance with Oracle's imprimatur is just what those organisations need. The thing is, it seems a bit cheeky to charge a six figure sum for a box with a bunch of free software on it. No matter how good box is. Particularly when it can be so hard to make the business case for a Big Data initiative.

Stephen Sheldon's presentation on Big Data Analytics As A Service addressed exactly this point. He works for Detica. They have stood up an Oracle BDA instance which they rent out for a couple of months to organisations who want to try a Big Data initiative. Detica provide a pool of data scientists and geeks to help out with the processing and analytics. At the end of the exercise the customer has a proven case showing whether Big Data can give them sufficient valuable insights into their business. This strikes me as a highly neat idea, one which other companies will wish they had thought of first.

Ian Sharp (one of the apologetic Oracle guys) presented on Oracle's Advanced Analytics. The big idea here is R embedded in the database. This gives data scientists access to orders of magnitude more data than they're used to having on their desktop R instances. Quants working in FS organisations will most likely have an accident when they realise just how great an idea this is. Unfortunately, Oracle R Enterprise is part of the Advanced Analytics option, so probably only the big FS companies will go for it. But the Oracle R distro is still quite neat, and free.

Mark Sampson from Cloudera rounded off the evening with a talk on a new offering, Cloudera Search. This basically provides a mechanism for building a Google / Amazon style search facility over a Hadoop cluster. The magic here is that Apache Solr is integrated into the Hadoop architecture instead of as a separate cluster, plus a UI building tool. I spent five years on a project which basically did this with an Oracle RDBMS, hand-rolled ETL and XML generators and lots of Java code plumbing an external search engine into the front-end. It was a great system, loved by its users and well worth the effort at the time. But I expect we could do the whole thing again in a couple of months with this tool set. Which is good news for the next wave of developers.

Some people regard attending technical meetups a bit odd. I mean, giving up your free time to listen to a bunch of presentations on work matters? But if you find this stuff interesting you can't help yourself. And if you work with Oracle tech and are interested in data then this meetup is definitely worth a couple of hours of your free time.

Labels: , ,

Monday, July 15, 2013

PL/SQL Coding Standards, revisited

Formatting is the least important aspect of Coding Standards. Unfortunately, most sets of standards expend an inordinate number of pages on the topic. Because:

  1. The standards are old, or the person who wrote them is.
  2. Code formatting is an easy thing to codify and formalise.

Perhaps the source of most wasted energy is formatting keywords. Back in mediaeval times, when the only editors in use were vi or Notepad, or perhaps PFE, this was a pressing issue. But modern editors support syntax highlighting: now that we can have keywords in a different colour there is much less need to distinguish them with a different case.

Personally I prefer everything in lower case; I save about 23 seconds a day from not having to use the [shift] key. But other people have different preferences, and for the sake of the team it is better to have all the source code in a consistent format. But the way to achieve this is with automation not a Word document. SQL Developer, PLSQL Developer and TOAD all have code formatters (or beautifiers, yuck) , as do other tools. Let's put the rules into the machine and move on.

What should the rules be? Well, everybody has an opinion, but here are my definitive PL/SQL Coding Standards, with an addendum of formatting guidance.

APC's Damn Fine PL/SQL Coding Standards

  1. Your code must implement the requirements correctly and completely.
  2. Your code must have a suite of unit and integration tests (preferably automated) to prove it implements the requirements correctly and completely.
  3. Your code must implement the requirements as efficiently and performantly as possible.

APC's PL/SQL Code Formatting Guidelines

  1. Case. ALL CAPS is Teh Suck! Anything else is fine.
  2. Indentation. Align consistently. Spaces not tabs. Four spaces is the Goldilocks indent.
  3. Short statements. One statement per line.
  4. Long statements Use line breaks, don't make me scroll.
  5. Naming conventions. Use prefixes to distinguish local variables, global variables and parameters from each other and from database objects.
  6. Comments. A comment is an apology.
If you prefer something less minimal, William Robertson's PL/SQL Coding Standards remains the most complete and best annotated set on the web. Okay, so he does specify "3 spaces for each nesting level" (why? computing is all about powers of 2) but nobody's perfect.

Labels: ,

Sunday, July 14, 2013

The personal is technical

On Friday evening I attended an IT Job Fair at the Amerigo Vespucci in Canary Wharf. Let me say straight away that hanging out with a random bunch of techies and recruiters would not be my first choice for a Friday evening. But, hey! I'm looking for my next role, and right now the market is too tough to turn down opportunities to find it. Besides I was interested to see whether the Meetup template would translate into a recruitment fair.

On the day the translation was a mixed success. Unlike most Meetups, which can work with any number of attendees, a job fair requires a goodly number of recruiters, and recruiters will only turn up if they think there will enough candidates to make it worth their while. This first event didn't achieve that critical mass, and I would be quite surprised if I get an opportunity from it. Nevertheless I will try to attend the next event, whenever that may be, because pop-up job fairs in bars are a great idea. Not for the reason you're thinking (I drank cola all evening), but because it was enjoyable. I talked with some interesting people, and got a couple of email addresses as well.

But more than that I was impressed with the concept. The informal social setting is good for understanding what a person is really like, specifically what they might be like to work with. This has to be worthwhile. The CV is a dead letter: it lists skills and accomplishments but doesn't animate or demonstrate them. The formality of the technical interview makes it hard to judge somebody as a person. Anyway it's generally aimed at establishing how much of their CV is true. The social element is often missed entirely. Big mistake. Software is like Soylent Green, it's made of people.

Personality matters: the toughest problems most projects face are political (organizational, personal) rather than technical (except for system integration - that's always going to be the biggest pain in the neck). A modern development project is a complex set of relationships. There are external relationships, with users, with senior managers, with other stakeholders (Security, Enterprise Architecture, etc) any of which can jeopardize the success of the project if handled badly. But the internal relationships - between Project Manager and staff, between developers and testers, or developers and DBAs - are just as fraught with difficulty.

The personal is technical because the team dynamic is a crucial indicator of the likely success of the project. You don't just need technical competence, you need individuals who communicate well and share a common vision; people who are (dread phrase) team players. That's why Project Managers generally like to work with people they already know, because they already know they can work with them.

For a new hire, nobody knows the answer to the burning question, "Can I stand to be in this person's company eight hours a day, five days a week, for the duration of the project?" Hence the value of chatting about work and other things in a bar on a hot summer's eve over a glass of something with ice. I'm not sure how well the model would works for recruitment agents, but I think it would suit both hirers and hirees. It's not a technique that scales, but if people made better hiring decisions perhaps that wouldn't matter?

Labels: , , ,

Thursday, July 11, 2013

UKOUG Analytics Event: a semi-structured analysis

Yesterday's UKOUG Analytics event was a mixture of presentations about OBIEE with sessions on the frontiers of data analysis. I'm not going to cover everything, just dipping into a few things which struck me during the day

During the day somebody described dashboards as "Fisher Price activity centres for managers". Well, Neil Sellers showed a mobile BI app called RoamBI which is exactly that. Swipe that table, pinch that graph, twirl that pie chart! (No really, how have we survived so long with pie charts which can't be rotated?) The thing is so slick, it'll keep the boss amused for hours. Neil's theme on the importance of data visualization to convey a message or tell a story was picked up by Claudio Bastia and Nicola Sandol.   Their presentation included a demo of IConsulting's Location Intelligence extension for OBIEE. The tool not only does impressive things with the display of geographic data, it also allows users to interact with the maps to refine queries and drill down into the data. This is visualization which definitely goes beyond the gimmick: it's an extremely powerful way of communicating complex data sets.

A couple of presentations quoted the statistic that 90% of our data was created in the last two years. This is a figure which has been bandied about but I've never seen a citation which explains who calculated it and what method they used (although it's supposed to have originated at IBM). It probably comes from the same place as most other statistics (and project estimates). What is the "data" the figure measures? I'm sure in some areas of human endeavour (bioinformatics, say, or CERN) the amount of data they produce has gone metastatic. And obviously digital cameras, especially on phones, are now ubiquitous, so video and photographs account for a lot of the data growth. But are selfies, instagrammed burgers and cute kittens really data? Same with other content: how much of this data explosion is mirroring, retweets, quoting, spam and AdSense farms? Not to mention the smut. Anyway, that 90% was first cited in 2012; it's now 2013 and somebody needs to invent derive a new figure.

The day rounded off with a panel and a user presentation. Toby Price opened the Q&A by asking Oracle's Nick Whitehead, how does Hadoop fit into an Oracle estate? It's a good question. After all, Oracle has been able to handle unstructured data, i.e. text, since the introduction of ConText in 8.0 (albeit as a chargeable extra in those days). And there's nothing special about MapReduce: PL/SQL can do that. So what's the deal with Hadoop? Here's the impertinent answer to this pertinent question: Hadoop allows us to run massively parallel jobs without paying Oracle's per processor licenses. Let's face it, not even Tony Stark could afford to run a one-thousand core database.

The closing session was a presentation from James Wyper & Dirk Shelley about upgrading the BI architecture at John Lewis Partnership. They described it as a war story, but actually it was a report from the front lines, because the implementation is not yet finished. James and Dirk covered the products - which ones worked as advertised, which ones gave them grief (integration was a particular source of grief). They also discussed their approach to the project, relating what they did well and what they would do differently with the advantage of hindsight. This sort of session is the best part of any user group: real users sharing their experiences with the community. We need more of them.

Labels: , , ,