Monday, January 28, 2008

Is this string a number? Really?

Jared Still ponders this question in a recent blog post. He runs some benchmarks against the various approaches and comes to the conclusion that using TRANSLATE() is the fastest approach. Which is fine, as far as it goes. It's a solution which works for Jared's situation but is not universally applicable.

Note that I have slightly complicated Jared's approach to allow for fake_number values of varying length:

SQL> select * from detect_numeric
2 order by 1
3 /
FAKE_NUMBER
--------------------
000000
000001
000002
000010
000011
000012
000020
000021
000022

9 rows selected.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789','||||||||||')
5 order by 1
6 /
FAKE_N
------
000000
000001
000002
000010
000011
000012
000020
000021
000022

9 rows selected.
SQL>
So far so good. Let's add another record....

SQL> insert into detect_numeric values ('123.45')
2 /

1 row created.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789','||||||||||')
5 order by 1
6 /
FAKE_N
------
000000
000001
000002
000010
000011
000012
000020
000021
000022

9 rows selected.

SQL>
Wha'pen? Isn't 123.45 is a number? Well, no, not in this context. The TRANSLATE() call is only counting digits. Hmmm, obviously we need to allow for decimal points.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789.','|||||||||||')
5 order by 1
6 /
FAKE_N
------
000000
000001
000002
000010
000011
000012
000020
000021
000022
123.45

10 rows selected.

SQL>
Problem solved? Not quite. There's more to being numeric than just comprising digits and decimal points. They have to be in the right quantities and right places. Let's add an IP address to the mix....
 
SQL> insert into detect_numeric values ('127.0.0.1')
2 /

1 row created.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789.','|||||||||||')
5 order by 1
6 /
FAKE_NUMBER
--------------------
000000
000001
000002
000010
000011
000012
000020
000021
000022
123.45
127.0.0.1

11 rows selected.

SQL>
And that's why we might need a function like IS_NUMERIC(), which wraps a TO_NUMBER call:

SQL> create or replace function is_numeric
2 (p_str in varchar2, p_fmt_msk in varchar2 := null)
3 return varchar2
4 as
5 return_value varchar2(5);
6 n number;
7 begin
8 begin
9 if p_fmt_msk is null then
10 n := to_number(p_str);
11 else
12 n := to_number(p_str, p_fmt_msk);
13 end if;
14 return_value := 'TRUE';
15 exception
16 when others then
17 return_value := 'FALSE';
18 end;
19 return return_value;
20 end;
21 /

Function created.

SQL> column is_numeric format a10
SQL> select fake_number, is_numeric(fake_number) is_numeric
2 from detect_numeric
3 order by 2,1
4 /
FAKE_NUMBER IS_NUMERIC
-------------------- ----------
127.0.0.1 FALSE
000000 TRUE
000001 TRUE
000002 TRUE
000010 TRUE
000011 TRUE
000012 TRUE
000020 TRUE
000021 TRUE
000022 TRUE
123.45 TRUE

11 rows selected.

SQL>

All of which underlines the importance of understanding the data with which we are working. If we just need to assert that a string consists solely of digits then a simple TRANSLATE() will suffice and will be very efficient. But if we need to assert something more precise - that the string contains a valid number - then we may require a slower but more reliable approach.

Incidentally, anyone who is interested in seeing how to use 10g's regex functionality to winnow numeric strings from non-numeric strings should read this OTN Forum post from CD.

Friday, January 25, 2008

How about a date, baby?

Laurent Schneider has posted another interesting insight into the limits of Oracle. I always thought that the highest date we could have in Oracle was 31-DEC-9999. Well that's certainly the highest date we can fit into the standard NLS date format. But the date buffer will actually go up to Friday 1st January 15402 A.D. Read Laurent's post to see how he does it.

Incidentally, you will probably need to tweak your NLS settings to see the results:

SQL> alter session set nls_date_format='dd-mon-yyyy';

Session altered.

SQL> select round(date '9999-01-01','CC') from dual;
ERROR:
ORA-01801: date format is too long for internal buffer



no rows selected

SQL> alter session set nls_date_format='FMDay ddth Month YYYY B.C.';

Session altered.

SQL> select round(date '9999-01-01','CC') from dual;
ROUND(DATE'9999-01-01','CC')
--------------------------------
Monday 1st January 10001 AD

SQL>

Incidentally, the Julian dates break down at the outer limits...

SQL> select to_char(round(date '9999-01-01','CC'), 'J') from dual;
TO_CHAR
-------
0000000


SQL> select to_char(trunc(date '-4712-1-1','CC'), 'J') from dual;
TO_CHAR
-------
0000000

SQL>

... and without resorting to RAW twiddling, 1st January 4800 BC is the furthest back our time machine will go....

SQL> select trunc(date '-4712-1-1','CC')-1 from dual;
select trunc(date '-4712-1-1','CC')-1 from dual
*
ERROR at line 1:
ORA-01841: (full) year must be between -4713 and +9999, and not be 0

SQL>

Thursday, January 24, 2008

Neologism corner: commando

Somebody who chooses to use a command line tool like sqlplus.exe over a GUI.

I coined this unthinkingly in a Forum thread about SQL editors which had descended into another "real programmers use vim" yawn fest. However, Brian Tkatch picked me up on it, so I had to retrofit a justification.

I chose the word because a commando is a member of an elite unit of toughnuts, which would seem to fit the self-image of soi disant real programmers. It was only afterwards that I noticed the resonance with "command line" itself. And it also brings fresh life to the phrase "going commando".

Update


The ever-reliable web-comic xkcd proffers a neat encapsulation of how these conversations tend to go.

Friday, January 18, 2008

One of those days

My main task for today was to pick up some zipped files from a server on the client site and bring it back to the office to load onto our development server. Simple enough task.

Except that the network connection was a bit brittle. WinSCP kept failing at 99% complete. The same files were available on the QA server. Alas that box was down: Support hadn't noticed until I mentioned it. So it was back to the first server. I found that if I copied a single zip at a time I could at least keep track of the failures and re-copy when necessary.

Eventually I had all the files I needed. It was then a matter of burning them to CD. Inevitably the desktop I was using didn't have a CD burner but fortunately there was another a PC in the office which had a burner and could see my networked TrueCrypt folder (it was potentially sensitive stuff).

Back at my office I discovered that my access to the shared development network was locked. This happens from time to time because, well, just because. It mattered today because our two sysadmins had already left for the weekend (to Wales and France respectively) so there was no chance of me getting my account unlocked before Monday. So one of my co-workers had to transfer the files to the network. I couldn't actually do anything with the data but at least I would be able to erase the CDs.

While the transfer was happening I went to get a coffee. The coffee machine was out of coffee. Grrrr.

I recount these woes not because they are necessarily typical of my working day (some days I really get lots done) but simply to illustrate a larger point. There's a recent article on the Artima site discussing the impact of languages and frameworks on programmer productivity. The sad fact is only a relatively small part of a developer's day is actually spent coding. There are meeting to attend, cranky networks to wrestle and tea bags to be dunked because the coffee machine's on the blink.

Friday, January 11, 2008

All about me

I have been tagged by Tim Hall who was tagged by Chris Muir who was tagged by Dan Norris who was tagged by Jake Kuramoto. This gives me a Jake Number of 4. Not very flattering. [It turns out I have a Jake Number of 2 - see Laurent Schneider's comment. APC]

Of course, even by the solipsistic standards of regular blogging, this blog-tag tsunami is very narcissistic. Still, as Oscar Wilde once observed, the only thing worse than being blog-tagged is not being blog-tagged. And it annoys Howard Rogers, so here goes.
  1. My nickname Arfur C comes from my schooldays. It derives from the science fiction writer Arthur C Clarke, who presented a TV series on unexplained phenomena. I railed against the nickname for a couple of weeks but without success and now I like it.
  2. I am not Andy C. I think the real Andy C is the one who writes the Norman Brightside blog. There's also Andy Campbell who use to write the "Oracle Stuff I should Have Known" blog, but recently appears to have withdrawn from the blogospshere.
  3. I can't program in C, which I think means I'm not a real techie..
  4. I'm a consultant with an annual season ticket. In fact, out of the nigh on thirteen years in my career with LogicaCMG I have had only three weeks of working away from home.
  5. One of my heroes is the late Quentin Crisp, the closest thing to a Taoist sage Britain has produced.
  6. I use too many cuss words. It's not due to a paucity of vocabulary but just because I enjoy swearing.
  7. I still define myself by the music I like. I probably should have grown out of that by now. Of course I am less tightly wound. I don't think that listening to the Fall makes me cooler than somebody who listens to Billy Joel ... er, hang on a mo, actually I do think that.
  8. Whilst I read lots - blogs, bits of manuals, newspapers, magazines - I haven't read a whole actual book in months.

There can't be many bloggers left to tag. I am pretty sure that none of the following have been "it" yet...

Gints Plivna
Neil Jarvis
Adrian Billington
Duncan Mills
Frits Hoogland
Roel Hartman