• Shortcuts : 'n' next unread feed - 'p' previous unread feed • Styles : 1 2

» Publishers, Monetize your RSS feeds with FeedShow:  More infos  (Show/Hide Ads)


Date: Sunday, 22 Jul 2012 21:04
The Oracle Database software is only "supported" on Red Hat Enterprise Linux and Oracle Enterprise Linux I believe. But it can be made to work on many other Linux distributions one way or another, as a lot of the stuff is the same or equally available on other Linux distributions. While this is not supported or recommended in any way by Oracle, it can be useful if you have a different Linux distribution installed on a test system for one reason or another and cannot change it. Here I will present the recipe for how I got Oracle 11g Release 2 installed on Arch Linux. Remember that none of this should be used in any kind of production like environment as it is unsupported.

Arch Linux

Arch Linux is a Linux distribution based on a rolling release model. This primarily means that there is only ever one version of Arch, being the "latest version", and whenever new releases of software packages are made by the developer they are then added to the Arch repositories after a small delay. This means that you can always get the latest version of any available software package on Arch, and that there is never a big upgrade or reinstall hassle with Arch every 6 or 12 months. Instead you regularly do a system upgrade which pulls down the latest versions of all installed packages from the repository, so that your system is always "up to date".

Another thing about Arch is that it is highly customisable to whatever combination of Linux software applications you want. It does this by only installing a "bare bones" set of Linux packages that gives you a text only console with a "login" prompt (classic old school UNIX). From this you then install all the extra packages you want, to arrive at your customised Linux, just as you want it. No bloat from other applications you never wanted in the first place. This means installing and configuring Arch is more than just booting from a Live CD and clicking on "Install". But Arch has an excellent Wiki including an "Installation Guide" that explains everything.

Arch has its own package manager (pacman) that handles all of the downloading, resolving dependencies and installation of the packages. So if a package is available you can just install it and everything it needs through one simple command.

I'm assuming for this article that you already have Arch Linux installed, and know enough about how to install it. And that you have installed Arch plus X Windows plus a full Desktop Environment (e.g. Gnome or Xfce) plus a Display Manager to handle the initial user login (e.g. GDM). If you don't use a full blown Desktop Environment then you will need the equivalent combination of Window Manager, Application Menu, etc., which is also possible in Arch.

Extra packages

Oracle 11g requires "gcc" which is in the "base-devel" package along with other development tools, such as "make". Install this, plus some other needed packages:
pacman -S base-devel elfutils libaio unixodbc sysstat pdksh icu gdb
pacman -S libstdc++5
The last one - "libstdc++5" - is needed because the make files provided with Oracle make explicit reference to this. Since Oracle 11g was originally compiled on the current at that time version of Red Hat Linux the standard C library has moved on to version 6, but Oracle still wants something called version 5 when installing.

Symbolic links

Oracle expects some commands to exist in the /bin directory when they are now in the /usr/bin directory as standard. You can work around this by creating symbolic links from one location to the other for these files:
ln -s /usr/bin/basename /bin/basename
ln -s /usr/bin/tr /bin/tr
ln -s /usr/lib/libgcc_s.so.1 /lib/libgcc_s.so.1
Other symbolic links are mentioned by other people for other Linux distributions, but for Arch these are the only ones I needed to make - other links already existed for some other files. Also bash is the default shell on Arch, which is the same as on Red Hat.

Normal Oracle Pre-Installation

  • Configure the Linux kernel and other settings as stated in the Oracle installation instructions. This includes:
    • Settings in /etc/sysctl.conf
    • Settings in /etc/security/limits.conf
    • Adding a line to /etc/pam.d/login if not already present
  • Create the groups needed (oinstall and dba), and the "oracle" user account.
  • Create the directories where the software will be installed e.g. /apps/oracle and /apps/oraInventory
  • I create a ".profile" file for the "oracle" user and set ORACLE_BASE, ORACLE_HOME based on the installation directories. Also set ORACLE_SID, and add ORACLE_HOME/bin to PATH.
 

Install Oracle 11g

Get the Oracle 11g software distribution - downloadable from oracle.com - and unzip the files to another directory. Run the installer and make your choices from the screens it presents - I normally go for "Install Software Only" and "Single Instance".

You will get the following issues when running the Installer:
  • All prerequisite checks will fail. This is because Oracle uses "rpm" to find out what packages are installed, and Arch does not use "rpm", so Oracle thinks everything is missing.
    • Providing you have installed the packages listed earlier, simply click on the "Ignore All" check box on the top right of the screen, then click "Next".
    • Click "Yes" on the pop-up dialog box that warns you about ignoring prerequisites. Then click "Install" as normal on the final screen
  • You will get a failure message during installation about "target agent nmhs", from the make file "sysman/lib/ins_emagent.mk".
    • This only affects the Enterprise Manager Agent, which is generally not used, and so can be ignored.
    • It is due to a change in how "gcc" scans libraries to find referenced symbols in a program.
    • If you really want this Agent, a workaround is possible by editing the makefile - just Google it to find a solution.
    • Assuming you don't want this Agent, just click "Continue" to ignore it can carry on.
Eventually you will get the normal pop-up dialog box about running the "root.sh" scripts as root, and then the installation is complete.

From here you can go on and create a database using the Database Configuration Assistant (dbca) or the "CREATE DATABASE" SQL statement, whichever you are more familiar with.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Tuesday, 17 Jul 2012 21:33
I really like the Python programming language for general purpose programming and quickly putting together little utilities. There are similarities to Perl in terms of being an interpreted language with powerful data processing capabilities, but there the direct comparisons end. Python is a more modern language than Perl, and has a relatively clean design (nothing is ever perfect though, and Python itself has evolved over the years). The main reasons I prefer Python over Perl are:
  • Very clean syntax - no need for obscure characters in front of variable names and so on
  • Very readable code - it is generally clear what a piece of code is doing
  • Minimal syntax dressing overhead - Python avoids the need for begin / end block statement markers
  • True support for functions
    • Perl just pushes all the arguments into a single list that you must disassemble
  • Strongly typed - Python supports multiple data types and checks at run time that operations are valid
  • Rich set of types - number, string, list, dictionaries, plus others e.g. set, tuple
  • Supports classic "function" based programming - just like 'C' does
    • Very simple to write code to directly do what you want, and modularise common code into functions
  • Also supports full "object oriented programming" - full support for Classes as first level objects
    • Perl does not truly do classes as first level objects
I'm not trying to say that Python is better than Perl in all cases, or other programming languages. I'm just saying that Python is a very good and usable programming language, and that I prefer it over Perl now. With Perl I find it gets very confusing when you try and use any level of complexity and the syntax is not obvious or consistent, whereas with Python I don't get any of this because it is such a cleanly designed language. With Python I can easily build a small application by just putting together the necessary source code while using functions for modularity (what I call "just build and run"), avoiding the overheads of defining classes with fully object oriented programming. However, if I have a more complex set of requirements then I can do a full blown object based solution using classes within Python (too complex to just build, so some initial design is needed, probably with test cases too).

Python is very useful for a variety of different scenarios:
  • building a small application or utility by just writing the code - no "compile and link" steps as it is interpreted
  • building something iteratively (top down design & development) as your code is always runnable (being interpreted)
  • prototyping object based code quicker and easier than compiled languages such as Java
  • ability to extend Python with your own libraries - it is written in C and can call your own compiled libraries
  • quickly prototyping real Java code that uses standard Java Classes in the JVM via Jython 
    • Jython is a Python interpreter written in Java
    • this also uses far fewer lines of source code than would be needed in Java, because of Python's fundamentally different design
Again, some of these capabilities are not unique to Python, but the combination of them all together make it a winner for me.

Python Overview

Python should be straightforward enough to understand on reading it. The only major difference to other languages is that Python does not have begin / end statement block markers and instead relies on statement indenting to determine which statements are part of the same block. You will see that statements with embedded statement blocks in them have a colon (':') at the end of the first line, and the following lines are indented. Initially it takes some getting used to, but it does end up with less typing on your part and a consistent code layout.

Although Python supports what I call direct coding (write some statements and then just run them), it is really fully object oriented behind the scenes. Thus you see many standard functions actually returning objects, against which you invoke one of their methods (see Database Example below). The syntax of "object.method (arguments)" occurs frequently, rather than the non-object way of "function (object, other-arguments)".

Python Development

You could edit and run Python programs from the command line ("python filename.py" - the ".py" extension is just a common convention), or you could use an Integrated Development Environment, such as Eclipse with the PyDev plugin. The nice thing about this is that it does syntax checking for you as you type, so you can easily spot various errors before you run your code. And you can run your Python application from within the development environment, so you don't have to leave the editor each time you want to test something.

Python Database Access

Like Perl, Python defines a standard API for database access. The intention is that developers can write code based on this standard API, and a variety of drivers can be provided for access to different databases. By changing the driver used you can connect to different databases, but the bulk of your code remains the same.

There are a bunch of drivers out there for Oracle, but the most common one seems to be cx_Oracle.

Python Database Examples

Using Python is relatively straightforward. I won't describe the database API in detail, as it is obvious when you use it. In summary, you call a "connect" library function with connection details and get back a connection handle (really an object). Using this you can then create a new statement handle, execute it, and fetch back any data or other results. One neat thing is that Python has a "fetchall" method that fetches back all data rows for a SELECT into a single list (actually a list of rows, each row being a list of columns). This then lets you traverse the data using Python logic, with no more fetch calls. I assume that there are efficiencies with the "fetchall" call over how the data is transferred back over the network, but I have not done any tests about this. It will of course end up using more memory within your Python program to hold all the results fetched.

Example

Connect to an Oracle database and execute a SELECT from a table, printing out the data fetched.
from cx_Oracle import connect

conn = connect ("scott", "tiger", "orcl")
# Or: conn = connect('scott/tiger@orcl')
curs = conn.cursor()
curs.execute("SELECT name, value FROM atable")
rows = curs.fetchall()
for i in range(len(rows)):
print "Row", i, "name", rows[i][0], "value", rows[i][1]
Notes:
  • len is a built in function that returns the number of entries in a list
  • range is a built in function that produces a list of numbers from 0 to one less than the supplied value
  • List members can be accessed using array like notation
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Thursday, 12 Jul 2012 20:36
I was trying to set up Data Guard between two servers and I kept getting connection errors from RMAN on the primary to the secondary:
RMAN>  connect auxiliary sys/syspassword@dgtest_stdby

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-04006: error from auxiliary database: ORA-01017: invalid username/password; logon denied
I tried to validate everything and it looked okay ("tnsping" worked), but I could not connect through SQL*Net at all, whether from RMAN or SQL*Plus. Local direct connections worked fine though. Eventually I managed to work out what was wrong.  So in case anyone else runs into the same problem ...

What I had originally was:
  • A newly created database (using CREATE DATABASE the old fashioned way)
  • An Oracle password file created using orapwd with the SYS password
  • A listner.ora on each server defining a static service name for the database on that server
  • A tnsnames.ora on each server defining the mappings of connection names to servers and services
The problem was that I had never explicitly set a password for the SYS user in the newly created database. I could still connect locally, which I assume is because I was in the "dba" group in the operating system. But I could not connect remotely no matter what password I used. So the solution was:
  • ALTER USER SYS IDENTIFIED BY NEW_PASSWORD
  • This automatically updates the $ORACLE_HOME/dbs/ora$ORACLE_SID password file on the server
  • Copy (ftp) this file over to the second, standby server
Now I could connect over SQL*Net to either database.

I would say that the error message is not particularly helpful, as it seems that all manner of different error conditions can give rise to the same error message. And there is almost no way of investigating this, other than trial and error - which I was getting sick of after 2 hours of getting nowhere. I knew that the underlying SQL*Net worked, as tnsping worked, and I knew that the Listener configurations were correct. Eventually I simply decided to change the SYS password to be something completely different and lo and behold I could now connect to the primary database remotely. It then took another few minutes before I realised that I needed to copy the Oracle password file to the standby server again, because it had changed.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Monday, 02 Jul 2012 20:22
When executing SQL involving ANSI syntax OUTER JOIN's Oracle first converts this to its own native syntax before executing the SQL, and sometimes this conversion is not totally transparent thereby affecting and restricting the execution plan produced.

This started when I had a long running query, and it happened to involve outer joins between most of the tables (but not all of them). From the execution plan it was clear that the main problem was a full table scan on a very large table - out of a total cost of 35,000, 28,000 was from this full table scan alone. So replacing the full table scan by a more efficient index access, if possible, could lead to a significant improvement.

After some analysis I created an index on all of the referenced columns in the table. This meant that all necessary data for the query could be obtained from the index, with no need to visit the data rows in the table at all. As the data row was quite wide (180 bytes), the index was much narrower and much smaller in size. Also, because the index stores data in sorted order by the leading columns, it meant that the rows the query wanted would all be stored together within the index structure, further reducing the number of disk I/Os needed to retrieve it all. But, the Oracle Optimizer would not use this index at all. It just seemed to ignore it, and always do a full table scan.

At this point I tried another of my techniques for understanding how a query gets executed - remove tables from the query one at a time, until you get a more "sensible" execution plan, and then build it back up again to the final query by adding a table at a time, monitoring the execution plan after each change.

What I noticed was that when I got down to 3 tables from the original 7, the Optimizer would now choose to use the new index (doing an Index Fast Full Scan, at a lower cost of 4,000). But when I added back in a fourth table, it went back to doing a Full Table Scan at a cost of 28,000.

I did a 10053 trace on the execution of each query (3 table and 4 table queries) and noted that in the SINGLE TABLE ACCESS PATH section of the 3 table query it was costing an Index Fast Full Scan (IndexFFS), whereas in the 4 table query it was not costing this access path, and only had the Full Table Scan costed.

At this point while I could see what was going on (in one case the Optimizer knows about the Index Fast Full Scan as an option, but in the other case it doesn't) I could not explain why it was happening, nor what to do about it. So I turned to the Oracle Forums and raised a question about Why Optimizer ignoring Index Fast Full Scan? While waiting for replies I also continued to tinker with the queries in the hope of gaining some extra understanding of what was going on.

The cause of the problem was identified by Dom Brooks as being the fact that Oracle internally rewrites ANSI syntax OUTER JOIN's to its own native syntax (using "(+)") before then optimizing and executing the query, and that this can result in some complex SQL that cannot be optimised as you might otherwise expect. Also see his earlier reply which gives more background information and links to other explanations about this.

When it rewrites the ANSI syntax JOIN's to Oracle native syntax, it wraps each join up in its own in-line view, to ensure that it is equivalent to the ANSI meaning. It then tries to merge these inline views back into the main query. For a single outer join, and other simple cases, this view merging happens smoothly. But in other cases it does not, and the Optimizer is left with some complex SQL with the joins being performed within the inline views. And this restricts the options available to it when building the final execution plan.

There is more information on this at the Oracle Optimizer Blog - Outer Joins in Oracle
"In Oracle, ANSI left and right outerjoins are internally expressed in terms of left outerjoined lateral views. In many cases, a left outerjoined lateral view can be merged and the ANSI left (or right) outerjoin can be expressed entirely in terms of Oracle native left outerjoin operator. (A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause.)"
So for a simple single outer join such as the following:
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2 ON (T1.x = T2.x);
it first becomes this on initial conversion to a lateral inline view:
SELECT T1.d, LV.c
FROM T1, LATERAL (SELECT T2.C FROM T2 WHERE T1.x = T2.x)(+) LV;
which then becomes this when the view is merged:
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) ;
However, if the joins are more complex, or there are other issues e.g. NULL values allowed within indexed columns, then it may not be possible to collapse the lateral views back into a single level query. In which case the options then open to the Optimizer are less than they would be otherwise, as it must treat the unmerged views on their own.

So if you have a query involving OUTER JOIN's that is not getting as good an execution plan as you think it could get, it may be due to you having used ANSI join syntax. And when Oracle rewrites this to its native syntax it is not able to merge back together the inline views for one reason or another. In my case it seems to have been due to the fact that almost all columns allowed NULL values to be stored in them (except the primary key columns), and NULL values are not stored within indexes on disk. As a result, Oracle was not able to utilise this extra index when more tables were added to the query that were also outer joined.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Wednesday, 20 Jun 2012 19:54
Previously I showed some of the issues with allowing NULL values in columns and how indexes might be ignored by the Optimizer. All of the examples I gave involved single column indexes only, and showed that NULL values are not stored in such an index. As others have pointed out in comments, this is not necessarily true for multi-column indexes. If at least one of the columns in the index does not allow NULL values then an index entry is stored with the values of all of the indexed columns. This means that a NULL value can be stored in a multi-column index. Lets test that through an example.

Previously we saw that if we had an index on the ref_id_n column in the transaction table, which allowed NULL values, then the Optimizer would not use the index for an "IS NULL" constraint:
SQL_ID  cwm2cmgn8q09n, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3578 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| TRANSACTION | 1 | 4 | 3578 (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("REF_ID_N" IS NULL)
The index on ref_id_n is being ignored, as the NULL values are not stored in that index. However, if we create another index, with ref_id_n as the first column followed by another column that does not allow NULL values, then the index will contain entries where ref_id_n is NULL. With such an index, the Optimizer can now use it for an "IS NULL" constraint:
SQL> create index ix2_transaction on transaction (ref_id_n, ref_id_nn) ;

Index created.

SQL> select count (*) from transaction where ref_id_n is null ;

COUNT(*)
----------
100

SQL> @xplastexec

SQL_ID 8tubzdty7vdnv, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 176942238

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX RANGE SCAN| IX2_TRANSACTION | 1 | 4 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("REF_ID_N" IS NULL)
So multi-column indexes can store NULL values in them, providing at least one column does not allow NULL values. The benefit of such an index can be significant - in this test case the cost came down from 3,578 to 2, simply because the number of NULL rows were so few. But an index on only the ref_id_n column itself is of no use for this query, and is ignored by the Optimizer.

Potentially you can also gain some benefit from an index where the ref_id_n column is not the leading index, as the index may be smaller in size than the table, and the Optimizer may chose an Index Full Scan rather than a Full Table Scan. And that is the case with the test data set I have been using:
SQL> drop index ix2_transaction ;

Index dropped.

SQL> create index ix2_transaction on transaction (ref_id_nn, ref_id_n) ;

Index created.

SQL> select count (*) from transaction where ref_id_n is null;

COUNT(*)
----------
100

SQL> @xplastexec

SQL_ID 2j64r2n1nq4xm, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 1095380460

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 726 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX FAST FULL SCAN| IX2_TRANSACTION | 1 | 4 | 726 (1)| 00:00:09 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("REF_ID_N" IS NULL)

The Index Fast Full Scan is costed at 726, compared to the 3,578 of the Full Table Scan. Being a "count (*)" only, no other data columns are needed from the table itself.

So if you are allowing NULL values in columns within your database design, and you want to find those rows that have a NULL value stored, then you cannot use an index on just that column alone. You will need a multi-column index, and include another column that does not allow NULL values. However, it may be that the only column that does not allow NULL values in your database design is the primary key column(s) itself, if you simply allow NULLs for every column by default.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Wednesday, 13 Jun 2012 20:57
It is quite clear that Oracle does not store NULL values within indexes (B-Tree indexes specifically, being the most common type used within Oracle). This has been the case for many years - I remember reading about this in the version 7 days over 15 years ago. My understanding was that this was an optimization on Oracle's part, as they believed that NULL values would not be used in most queries, and that storing NULL values in an index might skew the index somehow if there were a great many number of NULL values in rows. There are several consequences of this non-storage of NULL values in an index, as I mentioned in a previous post on NULL values. But it seems that some people are still unaware of this issue, believing that NULL's are stored within indexes, and that as a result all indexes are equal. Not having any hard evidence to refer people to, I thought I would invent some simple tests to show that NULL's are not stored in indexes, and the various side effects of this. Also, assumptions can be dangerous things when taken too far, and the memory plays tricks over what I once read in the past. So providing some real test cases would either verify any assumptions I had made, or show up them as being wrong.

Tests Overview

I'll create two simple tables - one as a master table of codes (reference with 10,000 rows), and another larger table that refers to the master table as a foreign key (transaction with 1,000,000 rows). This means that there are 100 rows per Reference Identifier value in Transaction. The Transaction table will have two such foreign key columns - one without NULL's (ref_id_nn), and one with NULL's (ref_id_n).

Indexes will be created on these columns, and some example queries run using either reference column to show whether the indexes are being used or not. We can also look at the statistics on the indexes themselves to tell us something about whether NULL values are stored in them or not. I provide all the SQL to create and populate these two tables at the end of this post - I assume most people are interested in the results first.

Index Statistics

Immediately we can see that NULL values are not stored in an index by looking at the statistics for the indexes on the Transaction table.

col index_name heading 'Index'     format a20
col lb heading 'Leaf|Blocks' format 999,999
col dk heading 'Distinct|Keys' format 999,999,999
col cf heading 'Clustering|Factor' format 999,999,999
col nr heading 'Num Rows' format 999,999,999
--
select i.index_name,
i.leaf_blocks lb,
i.num_rows nr,
i.distinct_keys dk,
i.clustering_factor cf
from user_ind_statistics i
where i.table_name = 'TRANSACTION'
/
The results of this query are:
                         Leaf                  Distinct   Clustering
Index Blocks Num Rows Keys Factor
-------------------- -------- ------------ ------------ ------------
PK_TRANSACTION 1,875 1,000,000 1,000,000 13,147
IX1_TRANSACTION_NN 2,090 1,000,000 10,000 1,000,000
IX1_TRANSACTION_N 2,090 999,900 9,999 999,900

Look at the "Number of Rows" values and the "Number of Distinct Keys" values for the two indexes on the two foreign key columns. The index on the column allowing NULL's has one less distinct key value - because the NULL value has not been stored in the index. Whereas the index on the column without any NULL's has the full 10,000 distinct key values in it. This is also reflected in the number of rows covered by the index - 100 less for the index with NULL values than for the index without NULL values. So already we have evidence that NULL values are not stored within a B-Tree index i.e. they are ignored by the index itself.

Queries

In my previous blog post I made two claims about indexes on columns allowing NULL values not being used:
  • Such an index cannot be used to satisfy an "IS NULL" query constraint
  • The index also cannot be used to satisfy a "column != value" query constraint
Lets test these claims. In principle we want to run the same query against each of the two foreign key columns, and see whether the corresponding index is used. However, this may not be possible for one reason or another, as I'll explain in each case.

"IS NULL" Constraint

The query is:
select count (*) from transaction where ref_id_n is null ;
When executed the query has the following execution plan (as from dbms_xplan.display_cursor immediately after executing the query):
SQL_ID  cwm2cmgn8q09n, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3578 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| TRANSACTION | 1 | 4 | 3578 (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("REF_ID_N" IS NULL)
So the index was ignored and a full table scan was done, even though there were only 100 rows to retrieve.

Unfortunately we cannot run this query using the other column, because it does not allow NULL values. So the Optimizer will know that this query cannot return any rows at all. (It actually did an Index Fast Full Scan when I ran it, and returned a count of 0 rows). Instead we can change the query to compare to a real value, rather than NULL, and see that it uses the index for the execution now, even though it is still 100 rows again from the 1,000,000 in the table.
SQL_ID  7burxn278qj8b, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n = 5

Plan hash value: 1807025728

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX RANGE SCAN| IX1_TRANSACTION_N | 101 | 404 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("REF_ID_N"=5)
This shows that an index on a column allowing NULL values cannot be used when executing a query with an "IS NULL" constraint on that column.

"!=" Constraint

I've since realised that the Optimizer would be more likely to choose a Full Table Scan whether the column allowed NULL or not for a "not equals" constraint, simply because of the number of rows being returned. If 100 rows match a value, then 999,900 do not match that value. And a Full Table Scan will be the optimal access method regardless of whether the column allows NULL values or not.

However, if the query only did a "count (*)" and did not retrieve any data columns from the table, then potentially an index could be used - with the Index Fast Full Scan access method. Consider the following performed on each of the two foreign key columns:
select count (*) from transaction where ref_id_nn != 5 ;
When I execute this I get 999,900 for the non-NULL column, and 999,800 for the NULL allowed column. Whoops! Something is not right somewhere. Again, we are back to the point that NULL is not a "normal value" within the value space of the data type of the column. And a NULL value is treated differently by Oracle rather than just another value. So although NULL is never equal to another real value, it is also never not equal to another real value. Even if I force a full table scan via the "full (transaction)" hint, I still get a count of 999,800 for the "ref_id_n != 5" constraint. (The execution plan from dbms_xplan.display_cursor showed that "TABLE ACCESS FULL" was used on TRANSACTION). [This very point about "!=" was also mentioned in a comment by Narenda on my previous post, and I had independently stumbled upon this as a result of this series of tests].

So even though we have not been able to show one of my claims about NULL values and indexes, we have instead stumbled upon another issue with NULL values. A NULL value is never equal to a real value, and also never not equal to a real value. I am sure that I have seen Jonathan Lewis discuss this in a presentation when talking about rewriting SQL statements into their equivalents. If a column allows NULL values then you must be careful when rewriting equality constraints into equivalent inequality constraints.

Further testing threw up another anomaly. Rather than test "!=" I thought I would test "not in", but contrived so that there is only one value in the "not in" list. For the NULL allowed column the Optimizer chose a full table scan:
SQL> select count (*) from transaction where ref_id_n not in (select 5 from dual) ;

COUNT(*)
----------
999800

SQL> @xplastexec

SQL_ID 0f3nh5h11yu6p, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n not in (select 5 from dual)

Plan hash value: 297680891

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 23425 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL| TRANSACTION | 998K| 3900K| 3578 (1)| 00:00:43 |
|* 4 | FILTER | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( IS NULL)
4 - filter(LNNVL(:B1<>5))

While for the non-NULL column it used the index:
SQL> select count (*) from transaction where ref_id_nn not in (select 5 from dual) ;

COUNT(*)
----------
999900

SQL> @xplastexec

SQL_ID fw2svuam6vzb1, child number 0
-------------------------------------
select count (*) from transaction where ref_id_nn not in (select 5 from dual)

Plan hash value: 1550919231

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 20422 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | FILTER | | | | | |
| 3 | INDEX FAST FULL SCAN| IX1_TRANSACTION_NN | 998K| 3900K| 575 (2)| 00:00:07 |
|* 4 | FILTER | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( IS NULL)
4 - filter(LNNVL(:B1<>5))
So I think that this example does show that there are times when the Optimizer will not use an index on a column allowing NULL values, but would otherwise use an index if NULL's were not allowed.

NULL as "Anything"

Another pattern I have seen is where a NULL value is used to mean "any value", rather than "no value". So in a system that involves processing transactions according to type, and each type is processed by a different handler, then a NULL value means that the transaction can be processed by any available handler. This leads to a query similar to the following:
select ... from transaction where (ref_id_n = 5 or ref_id_n is NULL) ...
The execution plan for this cannot use the index because NULL values are not stored in it. Hence you get an execution plan of
SQL_ID  4tsn8vbt3s3gh, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n = 5 or ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3580 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| TRANSACTION | 101 | 404 | 3580 (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("REF_ID_N"=5 OR "REF_ID_N" IS NULL))
However, if the database design was different, and the value '0' used as a special case for "any handler", with a corresponding row inserted into the "reference" table, then the query could be changed to the following and Oracle would use the index on the column:
SQL_ID  d2v9fgscm8nrq, child number 0
-------------------------------------
select count (*) from transaction where ref_id_nn = 5 or ref_id_nn = 0

Plan hash value: 1215059433

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | INLIST ITERATOR | | | | | |
|* 3 | INDEX RANGE SCAN| IX1_TRANSACTION_NN | 201 | 804 | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(("REF_ID_NN"=0 OR "REF_ID_NN"=5))
Again, the presence of NULL values results in sub-optimal query execution, and a different design can provide a better solution, allowing the Optimizer to use an index for much faster access (cost of 4 versus 3580).

Conclusion

For now I think I have written enough in one post about NULL's and indexes. My main point was to provide some real evidence that NULL values are not stored in B-Tree indexes in Oracle, and that if NULL values are allowed in columns it can affect whether the Optimizer will use an index or not. And I think I have done that. I'm sure that people experienced with Oracle already know this, but I just wanted to provide some proof points for anybody who doubted it for any reason. As ever, the benefit of some real tests is that you get to either verify any assumptions you have made or show them up to be wrong and correct them. In this case I have been able to correct some weak assumptions I had, and have learnt some more about how the Oracle Optimizer works, specifically when handling NULL values.

Table Creation SQL

Here is the SQL to create the tables, populate them with data and index them. Note the two foreign key columns in "transaction", and that one value is mapped to NULL when generating the data.
--
-- Create the tables necessary for the NULL values investigation
-- Create tables, load in generated data, index, gather statistics
--
prompt Creating tables
--
-- Reference table - a lookup of master values
--
create table reference (
ref_id number (*,0) not null,
group_id number (*,0) not null,
description varchar2 (512) not null,
constraint pk_reference primary key (ref_id)
) ;
--
-- Transaction table
--
create table transaction (
trans_id number (*,0) not null,
ref_id_nn number (*,0) not null,
ref_id_n number (*,0) ,
location_id number (*,0) not null,
padding varchar2 (512),
constraint pk_transaction primary key (trans_id)
) ;
--
alter table transaction add constraint fk_ref_nn
foreign key (ref_id_nn) references reference (ref_id) ;
--
alter table transaction add constraint fk_ref_n
foreign key (ref_id_n) references reference (ref_id) ;
--
prompt Loading data
--
-- SPECIAL DUMMY ROW!! Keep in sync with any newly added columns!*!
insert into reference values (0, 0, 'Unknown - Dummy entry for referential integrity purposes') ;
--
insert into reference
with generator as
(select rownum r
from (select rownum r from dual connect by rownum <= 1000) a,
(select rownum r from dual connect by rownum <= 1000) b,
(select rownum r from dual connect by rownum <= 1000) c
where rownum <= 1000000
)
select r,
mod (r, 1000),
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' || to_char (r, '00000000')
from generator g
where g.r <= 10000
/
commit ;
--
insert into transaction
with generator as
(select rownum r
from (select rownum r from dual connect by rownum <= 1000) a,
(select rownum r from dual connect by rownum <= 1000) b,
(select rownum r from dual connect by rownum <= 1000) c
where rownum <= 1000000
)
select r,
mod (r, 10000),
decode (mod (r, 10000), 0, null, mod (r, 10000) + 1),
mod (r, 1999),
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789' || to_char (r, '000000000')
from generator g
where g.r <= 1000000
/
commit ;
--
prompt Creating indexes
--
create index ix1_transaction_nn on transaction (ref_id_nn) ;
create index ix1_transaction_n on transaction (ref_id_n) ;
--
prompt Gathering Statistics
--
exec dbms_stats.gather_schema_stats ('JB')
--
So there are 10,000 rows in Reference, and 1,000,000 rows in Transaction. This means that there are 100 rows per Reference Identifier value in Transaction.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Sunday, 10 Jun 2012 17:59
On the one hand I understand the added value of the concept of a "NULL" value - meaning that no value has been stored in a column in a row in a database table. On the other hand I have always felt that allowing NULL values in a column leads to more trouble of one form or another down the line. And recent experience with another database design has again reinforced my concerns. For another description about some of the problems with NULL values and what they do or do not mean, you can also read Much Ado About Nothing?, which is quite informative.

If I had to summarise my position on allowing NULL values, then I think of them similar to how I think of a loaded gun - not directly harmful itself, but a there is a reasonable chance of it going wrong and resulting in serious damages. If you need to use them, and you have a clear reason why, then use them. But otherwise avoid them as much as possible.

Allowing NULL values in a column can be bad for all kind of direct and indirect reasons.
  • Oracle does not store NULL values in normal B-Tree indexes, which limits when the indexes can be used on columns that allow NULL values
    • Such an index cannot be used to satisfy an "IS NULL" query constraint
    • The index also cannot be used to satisfy a "column != value" query constraint
  • Foreign keys that allow NULL are "optional", but again have index and optimizer side effects.
  • Outer joins may become necessary, which again limit the choices available to the Optimizer.
  • Many programming languages have no concept of NULL, being a value outside of the normal value range by definition.
    • Either NULL values require special programming, or require some kind of shadow boolean variable to indicate if it is really NULL (no real value in the main variable).
  • Many programmers struggle with the difference between NULL and empty or blank, which can all mean different things within a programming language e.g. a zero length string, or the number zero.
  • NULL cannot be using in normal equality and range tests. As it is outside the normal value range it cannot be greater than or less than a real value. And it cannot be equal to any other value in a column, including another NULL. You must use the special constraint "IS NULL" to test for a NULL value, and never "= NULL".
  • Calculations involving a NULL value evaluate to NULL i.e. unknown. 
So NULLs are definitely useful, but each case should be considered carefully before using them.  Generally it is easier to not allow NULL values in columns, as all operations are easier.  Allowing NULL values requires some thought to make sure that it ends up working as you hoped it would.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Thursday, 31 May 2012 21:19
Just what is Performance? I don't mean a formal definition of Performance - through it would probably involve things like measurements and elapsed time and resources used and units of work completed. And I don't mean Performance Tuning - which is a reactive action to a perceived problem. I'm on about how you design Performance into a new application or database. As a proactive action when designing and building a new database based application. What do you do differently compared to if you did not care about Performance and simply ignored it?

I believe a "Performance Based Design" has to think about how the application will be used. One of my personal sayings about performance is that computer systems and applications do absolutely nothing until end users submit some work for them to do. The application sits there waiting for another request to be submitted. So the performance of the application totally depends on the work the application is trying to do, which in turn depends on what the end users are doing. I don't believe Performance can exist in isolation from an Application, much like Data without a Program to process it is not much use to anyone.

Which means that designing with Performance in mind is really about clearly documenting how the application will be used, so that those use cases can feed into the final design, and be used to review the application once implemented. From a database point of view, this not only means finding out all the data items and the relationships between them - fairly standard database design type stuff - but also establishing which data items will be accessed more frequently than others.

Generally I agree with the argument about "premature performance optimization", where someone tries to immediately optimize part of the first design for best possible performance, without knowing whether it is a critical part of the application or not. A good design will be good enough (Agile principles too I believe), while being deliberately designed to allow changes in the future. Performance bottlenecks can be redesigned and rewritten when they are identified as real bottlenecks, and not before.

Keeping your options open within your design is important in being able to achieve good performance. Inflexible designs do not leave any room for improvement, without major modifications and rewrites.

Instrumenting the application is another very important thing to factor in. Building in performance measurement to your application will set you up to be able to identify any bottlenecks, and to turn nebulous "slowness" as perceived by end users into real numbers - both before and after any performance improvements you have made. It can also provide an audit trail of performance over time and can be used to show whether response time is indeed increasing, and whether the workload of transactions submitted has also increased.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Tuesday, 20 Sep 2011 21:24
On the one hand I don't want to boast, but on the other hand there are a couple of things I am proud of at the moment and thought I would take the opportunity to share. If only to record them for posterity.

First, I am presenting at the UK Oracle User Group UNIX SIG in a few days time (Thu 22 Sept) on Queueing Theory. I thought I would volunteer for something in the spirit of giving back, and was pleasantly surprised when they accepted my submission. Nothing revolutionary in the presentation itself, but its nice to know that I have something useful to offer others.

Second, I answered a question posed on Richard Foote's blog the other day and was again pleasantly surprised when he said I got the answer right. "Spot on" was Richard's reply. Which was a nice surprise. I expected to be close but I also expected to have missed something out somewhere. That's the danger when you quickly reply to a question posted on the web, which becomes obvious when someone else points your mistake later. But this time I got it right.

Back to normal posts later,
John

Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Wednesday, 20 Jul 2011 13:44
Craig Shallhamer recently did a number of posts about reporting CPU usage from the data Oracle was gathering, and in this he made a distinction between CPU Cores and CPU Threads. His focus was on the correlation between the values Oracle was reporting and those reported by other operating system tools - were they the same and could you directly use the values reported by Oracle? (His conclusion was that they were the same). What he didn't go into was the difference between CPU Cores and CPU Threads, and what the "real" CPU capacity of each type is. I think this is an important issue for anyone dealing with performance on modern computer systems, as the two types of "multi-CPU" are not the same, and they exhibit radically different scalability behaviour. And what compounds this is that most operating systems do not distinguish between the two types of CPU, and so will misreport (or lie if you want to) how much CPU capacity you have and how much is being used.

The purpose of this post is to try and bring out this distinction between the two types of "multi-CPU" in a single processor chip, and show that multi-threaded CPUs do not behave as expected, leading to unexpected and negative scalability results under increased workloads. The wrong time to discover that the reported "spare, unused CPU capacity" in your system is wrong is when you experience an increase in your transaction volume, and everything just runs slower rather maintaining the same response times.

Some definitions:
  • A CPU Core or Execution Core is a physical piece of silicon in the CPU chip that actually executes instructions on data values.
  • A CPU Thread is a virtualized thing that in reality sits on top of an underlying CPU Core.
The key difference is that because of the virtualization, multiple CPU Threads will share a single CPU core. Thus I might have 2 CPU Threads but only 1 CPU Core. If I have 2 CPU Cores then they can execute 2 instructions simultaneously, independently and in parallel. I have 2 times the capacity of a single CPU core. If I have 2 CPU Threads sharing a single CPU Core, then it can still only execute 1 instruction at a time. This is true regardless of the number of Threads being virtualized on the same CPU core - there is only one CPU core and it can only execute one instruction at a time. The virtualization layer may let it switch between the different Threads very often, so that they all have some of their instructions executed, but it is always only executing one instruction at a time.

If I have 2 CPU Cores I can get twice the work done in a period of time. But if I have 2 CPU Threads sharing the same physical CPU Core then I can get no more work done in a period of time than if there were only 1 CPU Thread. And I will show this in a moment.

Most operating systems though don't distinguish between these two types of multi-CPU flavours. They report CPU Cores and CPU Threads as if they each had an equal capacity to do work. Which means that most operating systems are over reporting the available CPU capacity if the underlying CPUs use CPU Threads on shared CPU Cores.

Tests
A solid example. I have a laptop with an Intel Core i3 M350 CPU in it. This has 2 physical CPU Cores, and on each of these it has 2 CPU Threads. This gives a total of 4 exposed CPUs, but on only 2 CPU cores. The laptop is running Linux (XUbuntu) and I have installed SAR via the sysstat package. /proc/cpuinfo shows 4 CPUs (processors 0 to 3), and clearly this is used by SAR in calculating CPU capacity and CPU utilisation.

I have a simple CPU test using the factor utility, which prints out the prime factors of any number. Any CPU only test will do, and factor is a CPU bound utility. Feed it a list of very big numbers and it will use up some CPU testing prime factors. By using the same numbers I can repeat this any time I want to. How will this behave when I run more instances of this test at the same time? The system says that it can see 4 CPUs. But I know that the Core i3 has only 2 physical CPU Cores to execute instructions on.

The only fly in the ointment is that because this laptop is running X Windows and I am typing in my results to it there is a small constant CPU overhead of around 5%. So the system is not 100% idle when I run these tests. I mention this to be as honest and open as I can be about the tests I did.

Running factor once against a list of large numbers (see elsewhere) and using "time" to measure its elapsed time, takes 7.779 seconds and produces the following SAR output:
12:49:44        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:49:46 all 0.74 0.00 0.74 0.00 0.00 98.51
12:49:48 all 1.74 0.00 0.62 0.00 0.00 97.64
12:49:50 all 10.88 0.00 1.64 0.00 0.00 87.49
12:49:52 all 27.66 0.00 2.78 0.00 0.00 69.57
12:49:54 all 27.64 0.00 2.67 0.48 0.00 69.21
12:49:56 all 27.79 0.00 2.55 0.73 0.00 68.93
12:49:58 all 15.12 0.00 2.03 0.00 0.00 82.85
12:50:00 all 3.29 0.00 3.06 0.94 0.00 92.71
As expected, because Linux believes that there are 4 CPUs in the system and I have run one CPU bound task it reports that it is using 25% of the CPU capacity. "time" reported that 7.440 seconds of user CPU time was used. System CPU time is always negligible in these tests.

When I run 2 of these factor tests at the same time they take 8.034 seconds - about the same elapsed time allowing for other factors affecting this measurement.
12:52:16        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:52:18 all 2.09 0.00 1.72 0.00 0.00 96.19
12:52:20 all 0.61 0.00 0.49 0.00 0.00 98.90
12:52:22 all 25.73 0.00 3.88 0.85 0.00 69.54
12:52:24 all 52.32 0.00 3.58 0.72 0.00 43.38
12:52:26 all 51.49 0.00 3.93 0.00 0.00 44.58
12:52:28 all 51.57 0.00 3.13 0.96 0.00 44.34
12:52:30 all 22.18 0.00 1.72 0.00 0.00 76.10
12:52:32 all 1.25 0.00 0.25 1.50 0.00 97.01
12:52:34 all 1.10 0.00 0.74 0.00 0.00 98.16
As expected, SAR is reporting 50% CPU utilization - double our 25% before - and still around 45% CPU capacity unused (idle).

Note also that "time" reports that 15.465 seconds of user CPU time was used by these 2 factor tests. This ratio of 2:1 between CPU used and elapsed time shows that the 2 factors were indeed running simultaneously in parallel.

Next I run 4 of the factor tests together. How long will they take - still 8 seconds? And what will SAR report?

In fact they took 13.555 seconds, with time reporting 47.991 user CPU seconds. SAR reported 100% utilisation:

12:56:40 CPU %user %nice %system %iowait %steal %idle
12:56:42 all 0.70 0.00 0.58 0.00 0.00 98.72
12:56:44 all 0.12 0.00 0.74 1.74 0.00 97.39
12:56:46 all 22.10 0.00 1.85 0.12 0.00 75.93
12:56:48 all 96.62 0.00 3.38 0.00 0.00 0.00
12:56:50 all 96.88 0.00 3.12 0.00 0.00 0.00
12:56:52 all 96.62 0.00 3.38 0.00 0.00 0.00
12:56:54 all 97.00 0.00 3.00 0.00 0.00 0.00
12:56:56 all 96.00 0.00 4.00 0.00 0.00 0.00
12:56:58 all 96.88 0.00 3.12 0.00 0.00 0.00
12:57:00 all 26.37 0.00 1.74 0.87 0.00 71.02
12:57:02 all 0.25 0.00 0.62 0.99 0.00 98.14
12:57:04 all 0.62 0.00 0.86 0.00 0.00 98.52

Now, if there were 4 real CPU cores I would not expect the elapsed time to increase from about 8 to 13.5 seconds (over 50% increase). Furthermore, something is definitely wrong with the reported CPU usage - 47.991 seconds (nearly 48) for 4 compared to 15.465 (15.5) for 2 factors. I'll come back to this in another post (the operating system is actually measuring allocation of CPU time, not real usage of CPU cycles). The fact that one factor used 7.440 seconds of CPU means that four factors should not use more than about 30 seconds (7.5 * 4). So 48 seconds is way off the mark.

For completeness 3 factors together take 10.946 seconds, with 29.738 CPU seconds, and SAR said about 75% utilisation (75% for factor + 5% for other background processes):

13:00:24 CPU %user %nice %system %iowait %steal %idle
13:00:26 all 1.24 0.00 0.99 0.12 0.00 97.65
13:00:28 all 1.70 0.00 0.73 0.85 0.00 96.72
13:00:30 all 21.98 0.00 2.56 0.00 0.00 75.47
13:00:32 all 79.15 0.00 4.49 0.62 0.00 15.73
13:00:34 all 79.72 0.00 3.50 0.00 0.00 16.77
13:00:36 all 79.75 0.00 3.88 0.00 0.00 16.38
13:00:38 all 74.63 0.00 2.74 0.00 0.00 22.64
13:00:40 all 52.11 0.00 3.49 1.32 0.00 43.08
13:00:42 all 12.07 0.00 1.52 0.00 0.00 86.40
13:00:44 all 2.31 0.00 0.85 1.33 0.00 95.51
13:00:46 all 1.43 0.00 0.95 0.00 0.00 97.62
From this I draw the following conclusions:

First, as I described, CPU Threads are virtual and the real measure of available CPU capacity is CPU cores. That is the "real CPU capacity available", regardless of the number of threads on top of the physical CPU cores.

This is shown by the increase in elapsed time when running more than 2 factors together. If there really were 50% CPU capacity unused and idle then there should be no significant increase in elapsed time for 3 or 4 factors together compared to only 1 or 2.

Interestingly 8 factors took 26.790 seconds and "time" reported 1 minute 37.194 user CPU seconds (97.194 seconds). Both almost perfectly double the 4 factors test results, because the CPUs are fully saturated.

Second, the operating system and hence SAR is lying when it says that there is 50% CPU capacity unused and available when I ran 2 factors together. A "CPU Thread" is unfortunately not a real CPU, but is rather sharing a real CPU with other "CPU Threads".

Third, this is also important to Oracle based systems. A "perfect" or "ideal" Oracle system is both well tuned and running a well designed and written application. It will minimise disk I/Os because they are the slowest operation, and it will scale well as both workload and system capacity are increased. This means that in reality such an Oracle system is CPU bound - transactions are limited by how fast the CPUs are, and not by how fast the disks are. Adding more transactions by increasing the workload will require more CPU capacity to maintain current response times.

Using CPUs that expose multiple CPU Threads on shared CPU execution cores will result in poor scalability under increasing workloads, and a significant increase in response times when the used CPU capacity reported equals the number of physical CPU cores. This is exactly what happened in my tests - up to 50% reported CPU utilisation the elapsed times of factor remained constant, but beyond 50% reported CPU utilisation the elapsed times increased in proportion to the number of concurrent factor's running.

Personally I would either avoid all such multi-threaded CPUs, or would switch off all the extra Threads leaving only one CPU Thread per CPU Core. That way I know that the operating system is reporting "real" CPU capacity and utilisation, and that there is no hidden knee in the response time as I increase the CPU usage under increasing workloads.



Source Code
The factor test is just running factor with input redirected from a list of very large prime numbers:
factor < factors 
I can run this and time how long it takes. I run multiple factors by putting the same command into a file and running that. Each factor runs at the same time (& runs it as a separate, child process).

#!/bin/sh
#
factor < factors &
factor < factors &
#
wait
The list of factors is the same set of large prime numbers repeated 10 times in the file:
999999999989
99999999999973
99999999999971
99999999999959
99999999999931
99999999999929
99999999999923
99999999999853
99999999999829
99999999999821
99999999999797
99999999999791
99999999999701
99999999999673
99999999999503
99999999999481
99999999999469
99999999999467
99999999999463
99999999999457

Author: "John Brady (noreply@blogger.com)" Tags: "performance, multi-threaded CPU, scalabi..."
Send by mail Print  Save  Delicious 
Date: Tuesday, 28 Jun 2011 21:23
One of the themes I'll keep coming back to is how things are often not how you have assumed they are, and how there are always opportunities to learn something new about Oracle. I am quite familiar with histograms in Oracle from reading the manuals and white papers, but have always left it to Oracle to decide what columns histograms were needed. This is mainly due to the databases I have worked on not being large or complicated enough to need further analysis so far. But knowing the day would come when I would have to do something I thought I would review what I knew, and in doing so learnt a few new things - or rather realised that some of my assumptions had been incorrect.

Rather than re-read the manual I decided to go to a better source of knowledge rather than just raw information - Jonathan Lewis's Cost Based Oracle Fundamentals. I already knew that Oracle had 2 types of histogram - Frequency and Height Balanced. And I knew that Frequency stored a row count per individual data value, while a Height Balanced stores a row count per range of values (strictly the row count is the same per range, and it is the data range size that varies). And I knew that the Optimizer in Oracle used this extra row count data in the histogram to calculate better estimates for row counts that would match constraints in queries. In reading the chapter on Histograms and working through the examples and doing my own tests I realised that some of my assumptions had been wrong about how Oracle uses these histograms.

Assumption 1 - Oracle uses all the entries in a histogram to calculate its row count estimate

I knew this to be true for a Frequency Histogram, and had assumed something similar for a Height Balanced one. But I was wrong. In a height balanced histogram although Oracle may have up to 254 buckets, it only uses the ones that contain the same end point values to identify "popular values" - those that occur more frequently than the width of a single bucket, and so span more than one bucket in size. For all other values it ignores the histogram completely and uses the column level Density statistic in the same manner as if there were no histogram.

In other words there is a threshold, being the number of rows covered by a single bucket. For data values that occur more often than this and are also recorded in two or more buckets in a height balanced histogram, the Optimizer uses the number of buckets to calculate the estimated row count, along with the number of rows per bucket. For all other values Oracle assumes a uniform distribution and uses the column level Density statistic. This will not be the same as one over the Number of Distinct Values (NDV), but is calculated differently to remove the effect of the rows for those popular values i.e. the Density is lowered to the average of the "unpopular" values.

Assumption 2 - A Frequency Histogram has an entry for all data values occurring in the table

This is true if either the table is relatively small, or you force Oracle to read all the data rows when gathering statistics. If you leave Oracle to use its default sampling method and you have a large table then some values may not be sampled, and they will be missing from the Frequency Histogram produced.

What Oracle does is to calculate and store a Density value for the column that is half that of the least frequent occurring value in the histogram. Values that appear in query constraints that do not appear in the Frequency Histogram are therefore assumed to occur at half the row count of the value with the smallest row count in the histogram. So again, the Optimizer may end up not using a histogram that exists and instead use the Density statistic of the column when executing a particular query.

None of this is new, and I was able to double check this via various other sources. It was just another thing to add to the list of assumptions I've made in the past that turn out not to be true. Even with histograms in place there is a reasonable chance that the Optimizer will actually not be using the histogram itself but instead the Density statistic value of the column. And also changing the value of the Density for the column can have an impact on queries, even when there is a histogram on that column.

Author: "John Brady (noreply@blogger.com)" Tags: "histograms, density, optimizer"
Send by mail Print  Save  Delicious 
Date: Thursday, 16 Jun 2011 20:59
Following on from my previous post about some SQL that needed to be tuned, I thought I'd summarise some additional important lessons I've learnt about using hints. I already knew quite a lot about hints in Oracle from reading different sources, and to avoid them as much as possible. In spite of this I have still learnt a few new things about using hints in practise. I know that others have said most of this before, but it bears repeating because it really does affect whether any hints do work or not.
  1. Name all your query blocks using the qb_name hint. This clarifies things both in terms of how Oracle itself reports back your execution plan, and in terms of how you specify your hints.
  2. Qualify all object references with query block names e.g. t1@mainq. Again, this is how Oracle reports object references, and it is precise in terms of where the hint should be applied.
  3. Check your hints are being used with the 'outline' option to dbms_xplan.display_cursor. If the hints being listed are the same ones that you used, then all is well. If not, then it is likely that some of your hints are actually being ignored.
  4. Test the hints individually. This follows on from the previous point about proving the hint is recognised and honoured by Oracle. It is possible that a similar hint is being automatically produced by the Optimizer as a result of another hint you are using. In my case it looks like a USE_NL hint was being ignored, but a Nested Loop was produced anyway because of an INDEX hint.
  5. Include all other relevant hints, such as LEADING and UNNEST. Previously I would have assumed that these would be produced by the Optimzer automatically but Jonathan Lewis includes them in his hint sets so they must be relevant.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Thursday, 09 Jun 2011 21:24
[I'm back. I haven't posted for a long while due to a combination of being busy and somewhat losing focus on the blogging side, but I hope to post more frequently in the future. I was going to start with a post on histograms in Oracle but ...]

I work for a software house, whose heavyweight application runs on top of Oracle. Earlier this week I was dealing with a customer who had poor performance on just one SQL query after having updated their statistics. I spent a day analysing the SQL and the data - what was it trying to do and why - and another day trying different solutions to make it use a different execution plan and run faster for them. I then sent the results to the support people dealing with this customer so that the solution could be implemented. Imagine my surprise then when I see today that Jonathan Lewis has a blog post about the same SQL statement, and makes reference to a post on one of the Oracle Technical Forums.

I'm surprised that a customer would post SQL from a third party application on the web like that, and that they would somehow expect a better answer to come back than from the software vendor themselves. I was even more surprised because they were asking for a set of hints to change the execution plan, and I had just come up with such a set of hints just 2 days earlier. Okay, maybe this had not been forwarded to the customer yet, because our Support department was doing further testing on it first. So it is highly likely that the customer had not seen my solution yet. But I'm still amazed that a customer would be willing to put into production some recommendation they had got off a web forum, and on the basis that they could not work out a better solution themselves - the poster did not know how to do hints properly.

In terms of the problem itself, the cause of the poor performance is actually the extreme skew within the data in this table. On average most of the date values occur thousands of times, but the particular value used in the query only occurs about ten times in the table. Hence another execution plan is faster for this particular value.

Which brings me back to histograms, which I was planning on doing a completely different post on anyway. There is a histogram on the constrained column, but it is a height balanced histogram because there are over 254 distinct values, and so there are no statistics on row counts of individual values other than the most popular values. The value used in the query is unpopular, and the average number of rows per value across all the unpopular values is in the tens of thousands. Hence the choice of an execution plan using a HASH JOIN, as that scales in a different way to higher row counts than a NESTED LOOP join does.

Maybe Oracle should introduce a new type of histogram? One that records both the most popular values, and the least popular values, and then stores a single average for all the values in between (the density). That would have helped here. It certainly seems the case that Oracle does not handle heavily skewed data sets well, though of course you should never expect a "one size fits all" solution to be perfect for all possible scenarios. What the Optimizer does is try and produce an execution plan that is the best for most of the possible input values. Which is what it did here, based on the statistics available for the average number of rows per date value via the density statistic.

Another viable solution is to make the Optimizer believe that the date constraint will match far fewer rows and so choose the NESTED LOOP join method itself. This is the approach put forward by Wolfgang Breitling in his paper on Tuning by Cardinality Feedback, in which he suggests changing the Density statistic stored for the column. And indeed reducing the density value by a factor of 100 has this effect. The upside of this approach is that it avoids the need to change the SQL or use hints, which do not always work as intended.

Author: "John Brady (noreply@blogger.com)" Tags: "performance"
Send by mail Print  Save  Delicious 
Date: Monday, 07 Mar 2011 20:20
I really hate, loath and detest "Agile Development". At least I hate how they use it where I work. For the record I actually agree with most of the principles and points of Agile Development. I'm pretty sure I have said so before. But what they do in the name of Agile Development where I work is really terrible.

The list of "crimes" conducted by the development teams against databases and development in general by their "use" of Agile methods continues to get longer and longer. Lately I have had to deal with a number of such crimes and their consequences.
  • One group went ahead with development against nothing more than a strawman design, that in turn was for only verbally stated requirements. No one could agree in a meeting I attended what the actual requirements were. The number of times someone said "I assumed he meant ..." was incredible.
  • Another group designed and implemented their own tables, but now have no time to have their database design reviewed before the release deadline. It "must" be delivered as it is, because it is too late to change anything.
  • Another group simply did not bother to document anything. When asked at how they arrived at their final solution and the analysis they went through, they could not explain or justify anything.
  • All of the groups have used the phrase "we left that for another sprint" for some major, critical piece of functionality. They are simply ignoring anything that is too difficult.
  • Iteration speed is more important than getting it right or meeting all the requirements. The fact that the speed of iteration is introducing more errors that need to be corrected by future sprints never gets flagged by anyone. I am sure the quality of what is being delivered is going down, but everyone else is happy that we now have some more buttons to click on the user facing screens and forms.
  • Design and code reviews are simply ignored. The only measurement seems to be the volume of exposed and visible functionality delivered. Whether it works or not is never really considered. By that time the developers have moved onto the next sprint, and the next piece of functionality to be delivered. The push is for additional functionality added to the application software at all costs.
  • Many application level requirements are simply ignored in early development sprints, just to deliver some usable functionality as early as possible. The fact that the initial design and development cannot be extended in the future to meet the ignored requirements is itself simply ignored.
I really want to like Agile, but the things done in the name of Agile are truly horrible. The attitude of simply ignoring critical dependencies and then expecting the rest of the world to reorganise itself around their particular problem and solve it for them is amazing. And if you don't put everything down immediately and jump , you are accussed of being the blocker who is holding up development and stopping delivery of the wonderful software they have all developed. The fact that it is seriously crippled and will never work properly is somehow never their problem. And the most common answer to my any question I ask about the development they did is "we never had time to do that (so we didn't)".

I now term Agile the "ignore it today, it will be someone else's problem tomorrow" development methodology. I really do think that many of the developer's believe in a world of magic fairies or pixies that will sprinkle magic Agile dust everywhere in the middle of the whole development, and all of their ignored issues will just disappear and go away. Unfortunately, everyone else is living and dealing with the real world. Have you ever tried living in a house built and delivered one room at a time, all with separately laid foundations, and installed plumbing, electrics, windows and doors? It is a mess, and it gets worse over time as more and more rooms are added on, because there was never an overall design, and no one ever bothered to think about future requirements until they had to deliver them.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Tuesday, 25 May 2010 21:08
A generally good rule of thumb is that date columns should appear at the end of a multi-column index and not at the start or the middle. Why? Because generally date columns are used in range comparisons in queries a lot more often than equality comparisons. The effect of putting such a date column in the middle of an index is to distribute values of the following columns across the different dates within the index.

What this means is that if you have an index on 3 columns - A, D, Z say - where D is a date and A and Z are not (either numeric or a character string) and you execute a query constraining on all 3 columns but with a range constraint on D, the database server will have to read and check a great many index entries

select ... from T where A = 123 and D > to_date ('20090101', 'YYYYMMDD') and Z = 456 ;

An index is a tree like structure organized by the columns in it. So at the highest levels we branch out for different values of A. Then lower down, for each value of A it branches out for each value of D within that particular value of A. And underneath that we have each value of Z that occurs for each value of D. The value 456 of Z may only occur a few times in the whole table, but it could occur on any value date of D, and so could be in different parts of the index.

This means that when executing the query, the database server will traverse the index, find the branch for A with the value 123 and then all the values for D underneath it. Due to the date range constraint, it will have to check many different values of D to see which ones also have Z with a value of 456.

If the index was on A, Z, D instead, then fewer index entries would need to be checked when executing the query. First it would traverse the index to 123 for A. Then it would traverse down the next level to 456 for Z. Then it would traverse down the next level to D and all values greater than the specified date (1/1/2009). At this point all such entries in the index match the 3 constraints, and so can be retrieved.

This alternate index leads to far fewer index entries being read and checked when executing this particular query.

I am not saying that dates should always go at the end of indexes. If Z was not constrained in the query, only A and D, then the second index would not be much use, and the first index would be much better. There may also be cases where dates are used with equality constraints rather than ranges. But the general rule is still a good one. Put date columns at the end of an index, unless you know the queries being executed would benefit from that date column appearing earlier in the index. As ever the best indexes are those that are most useful to the queries you execute and cover the referenced columns.

An example of this I have just come across resulted in a query execution time coming down from 10,000 ms (10 seconds) by an order of magnitude to 20 ms - a major improvement I think you'll agree. To me this just shows the greater "efficiency" of the index - far fewer index blocks need to be visited to satisfy the query, and those fewer blocks are more likely to be cached in memory too. Hence the orders of magnitude reduction in the elapsed time of the query.

Furthermore this query is executed in a loop some 200+ times within the application I was looking at. So the net effect is not just 10 seconds once down to less than 1 second once, but really 2,000 seconds or 33 minutes down to less than 33 seconds across the overall job each day it is run. A worthwhile improvement from having an alternate index with the date column at the end of it.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Monday, 29 Mar 2010 19:52
On the face of it I think DTrace is a phenomenal piece of technology from Sun. That on the fly you can turn on instrumentation on all kernel events and system calls, and write your own scripts to count calls, sum elapsed time, or look at which files had the most I/Os to them, and much more. All implemented with no external change to Solaris itself - all existing commands and utilities run the same - and minimal impact on performance when not enabled. Truly amazing.

For the record I should confess that I was a Sun employee when DTrace was initially announced. However I was an Oracle Consultant and had nothing to do with Solaris kernel development or DTrace. My viewpoint was and remains that of a user of Solaris, and I did not gain any inside knowledge about DTrace one way or another.

I did see the demonstrations that are commonly repeated, where they start off a rogue process and then use DTrace scripts to see what is going on, and which process is causing all the extra work on the system. It looks so simple and so direct - a few iterations and you have identified the process in question and got some of the details of what it is doing.

I have been waiting since then to use DTrace in anger on some real life investigations. It has taken some time for Solaris 10 to be fully rolled out by most organizations. And that opportunity just occurred the other day. A chance to use DTrace to see what an application was doing, and in which functions it was spending most of its time.

A quick peruse of the Web finds the DTrace Toolkit, in which I find something called dappprof.d which does something similar to what I want - elapsed time per function call. I write my own D script for what I want, using similar probes and timestamp calculations, and am ready to look inside this application. Problem solved? No!

Unfortunately, DTrace does not work! Or at least that is what seems to be happening when I try and use it. Which is why I am writing this blog post. Sorry to be so negative, but either it does what it is supposed to do or it does not. And unfortunately it simply does not do what it is supposed to. And I've just spent 2 days banging my head against a brick wall trying to get DTrace to do what it should be doing. The good news is that there is an "answer" to the problem I had, which I'll get to later on. But this could all have been avoided if some of the "details" of DTrace had been done properly by Sun - like useful and helpful error messages, and configuration options that made sense.

First, unlike other software products which tell you WHY something went wrong so that you can fix it, DTrace simply says "I'm not going to do that" and stops. Shades of HAL from 2001: A Space Odyssey spring to mind. Generally whenever something goes wrong with Oracle or a programming language compiler I get a specific message - "syntax error at line xxx", "missing keyword expected", "out of memory". With DTrace you get a message that amounts to "I did not like that so I am stopping" one way or another.

My first error message was:
dtrace: processing aborted: Abort due to systemic unresponsiveness

I managed to find a blog entry by Jon Haslam of Sun, that says that there is nothing you can do about this, other than disabling certain protections within DTrace. And generally speaking those protections are there for a reason, so it is not recommended to disable them.

Okay, maybe I was sampling too often, or matched too many probes, or something. As I say, DTrace never tells you WHAT caused the failure. So I cut down my script and instead of 20,000 probes I now match less than 100. Just 33 in fact. Does it run properly? No! Still the same problem:
dtrace: script './gpsmintrace.d' matched 33 probes
dtrace: processing aborted: Abort due to systemic unresponsiveness

The script only has 4 probes as far as I am concerned, even though DTrace ends up matching more than this:
pid$target:a.out::entry
pid$target:a.out::return
profile:::tick-5sec
dtrace:::END

So the entry and exit points of functions in the program file itself (a.out), every 5 seconds, and finally at the end of the script when it stops. How can these be causing "systemic unresponsiveness?".

I try different combinations - a different process, take out the tick-5sec probe, name the functions to match so only "11 probes" matched - and always the same result:
dtrace: processing aborted: Abort due to systemic unresponsiveness

At this point the only conclusion I can come to is that DTrace is useless. What else can I conclude? Every time I run my minimal D script it sits there with no output messages and then aborts. Why do I not see the output from the 5 second ticks? Why does it never work?

As you can imagine I am just about ready to give up on DTrace - forever. I have been through the trouble of designing and writing an initial D script to help investigate my application behaviour, and have requested the appropriate privileges from the system administrators - dtrace_user and dtrace_proc - but have achieved nothing from 2 days of continuous trying. After 2 days I have got zero information out of DTrace, useful or not.

I'm ready to throw in the towel and give up on DTrace as another piece of great but fundamentally flawed Sun technology when I have one more thought. Why are the 5 second tick probes never firing? Why do I never see any output from them? What if the tick probes never fired? Wouldn't my running totals in the aggregations get very large? Could this be causing some kind of overflow and DTrace to simply give up?

A quick Google on the terms "dtrace tick probe privilege" turns up an entry at http://www.mail-archive.com/dtrace-discuss@opensolaris.org/msg02621.html with the same symptoms - the tick probe never fires. And the solution is confirmed in the reply http://www.mail-archive.com/dtrace-discuss@opensolaris.org/msg02622.html - you need the "dtrace_kernel" privilege for the tick probes to fire and output anything. So I make another request to the system administrators for this privilege, and voila! My D script now works as it should do.

So it seems that to do anything useful with DTrace you will need the maximum privilege level of dtrace_kernel, regardless of what the manual tells you differently. Yes, to "use" the tick probe of the profile provider in a D script you need the "dtrace_user" privilege. However, such a defined tick probe will never fire unless you have the "dtrace_kernel" privilege. The "dtrace_user" privilege gives you the "right" to use a tick probe, but not the "access" to that probe at run time. You need the "dtrace_kernel" privilege too for the run time "access" to the tick probe.

I've a feeling that this is extremely poorly designed and documented by Sun. It is not documented in the standard DTrace manuals - I know I looked very closely at them on how to make the tick probe work. So it must be a common problem that people run into time and time again on production systems when trying to work with the least necessary privileges. Yet my Google search only found one explicit reference to this, on a forum posting by someone with the same problem. All the other hits on "dtrace tick probe privilege" were standard descriptions of using probes in DTrace. Nothing from Sun themselves on their own web sites.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Wednesday, 10 Mar 2010 19:28
I've been playing with an installation of OpenSolaris on my Intel x86 PC within a VirtualBox virtual machine. (VirtualBox is good, but I'll say more about it separately). The good news is that I managed to install OpenSolaris (2009.06) easily enough and get logged into it for normal use. The bad news is that OpenSolaris is not Solaris, which introduces a whole bunch of incompatability issues. And given that I have done this in order to install and test Oracle, it creates a whole series of problems that would not occur if it was really Solaris.

What is OpenSolaris?
The simplistic answer is that OpenSolaris is an open sourced subset of Solaris i.e. the subset that Sun owned the rights to and could open source. But that is not the real truth. The real truth is that OpenSolaris is the open sourced next generation version of Solaris i.e. Solaris 11. And indeed the version number of OpenSolaris is 11 (or 5.11 technically speaking). And Sun have changed a lot of administration side things between Solaris 10 and 11. And they are still changing things, as OpenSolaris is a work in progress by Sun. A lot can change even between releases of OpenSolaris. So remember that OpenSolaris is not Solaris. There is no mention yet of any level of compatability between Solaris 10 and OpenSolaris.

Clearly OpenSolaris shares the same kernel core as Solaris 10. But outside of that, in terms of the various applications that can be installed, things change. A lot. Solaris has always had its own "packages" installed via the "pkgadd" command. However it lacked a GUI (at least I was not aware of any before now), and did not handle dependencies between packages (A uses things from B, so B should be installed first). You also had to obtain the package distribution file yourself somehow - download or CD media - before it could be installed.

The major Linux distributions have all addressed these issues, so that installing most application software is just a case of firing up the GUI utility, clicking on the application name and then the "Install" button. OpenSolaris has now tried to do the same thing, via its Image Packaging System (IPS), which seems to share the same concepts as Synaptic in Ubuntu. There are repositories of packages, and the packages include lists of dependencies on other packages. Now you can install a package via a few simple clicks, and the package distribution is downloaded from the repository.

Nice. Except ... That there is only one repository, being Sun's own repository of its packages included as part of Solaris. So there is no third party software to choose from. And Sun continues to name all its packages with a very short name and a prefix of "SUNW". Would you know that "SUNWarc" were the "Lint Libraries"? Or that the "Solaris Bundled tools" were in package "SUNWsprot", but that "SUNWbtool" was in fact the "CCS tools bundled with SunOS". Not clear or straightforward at least.

Which leads to the second problem. Even if you know that there is a package containing a program or utility you need, finding out what the name of the package is can be quite hard work. There is not a lot of useful information on OpenSolaris yet that you can find easily via Google. And Sun has provided very little documentation.

Okay, I'll just use the existing Solaris 10 packages? No! OpenSolaris is not Solaris, and even though you could try to install a Solaris 10 package, and it might work, it is not supported and the results are not guaranteed. This is because OpenSolaris is really Solaris 11, and things can change in the kernel and interface libraries. So you should only use OpenSolaris packages, and not Solaris 10 ones.

Are there any other repositories? Of any shape or form? Yes. There is one at Blastwave. Can I use its packages? Yes, but they are very old. Clearly someone did port various Open Source to earlier releases of OpenSolaris, and these have been gathered together at Blastwave. But when I went looking for some things, I could only find 3 year old versions from 2007. In the end I downloaded the source code to the current version of the utility I needed and compiled it all myself.

Oracle 10gR2 Issues
First there is the issue of required packages, and I've already discussed the new IPS on OpenSolaris. For Oracle 10gR2, as per the Oracle 10gR2 on Solaris installation notes, you will need to install:
  • SUNWarc SUNWbtool SUNWhea SUNWlibm SUNWlibms SUNWsprot SUNWtoo
but not:
  • SUNWi1of SUNWi1cs SUNWi15cs SUNWxwfnt
This is because these are closed source font libraries, and Sun cannot open source them or include them in OpenSolaris under its license terms. But these will not cause an issue, as other fonts will be automatically used instead.

You also need to install SUNWmfrun, which is the Motif Runtime libraries. Yes, the Oracle 10gR2 graphical utilities use Motif behind the scenes. And Motif is no longer installed as standard on OpenSolaris.

Then can we install Oracle 10gR2? No, because the Oracle installer sees the operating system version as 11 (or rather 5.11 as given by "uname -r") and stops because it is not certified against it. Which is not surprising, as you can only certify against officially released operating systems, and OpenSolaris is still a work in progress. But we will assume that OpenSolaris (11) is fully compatible with Solaris 10, and use the "-IgnoreSysPrereqs" to make Oracle ignore this mismatch in the operating system version.

Then the installer will fail with missing Java libraries e.g.
Exception java.lang.UnsatisfiedLinkError:
/tmp/OraInstall<...>/jre/1.4.2/lib/i386/motif21/libmawt.so:
ld.so.1: java: fatal: libdps.so.5: open failed: No such file or directory occurred..

This is because the Oracle installer is using Java 1.4.2 which is shipped as part of the Oracle 10gR2 distribution itself, but OpenSolaris does not have the right bits for backwards compatability with this, hence the missing library message. Again we can use an installer command line option to tell it to use the current Java Runtime Environment on OpenSolaris (1.6), which will not be missing any bits it needs:
  • ./runInstaller -IgnoreSysPrereqs -jreLoc /usr/java/jre
And voila, an Oracle installer window eventually appears and we can proceed with the installation.

Well, again no. There were various errors about certain files not existing. It seems that the installer cannot create symbolic links between files. Using an Oracle installation on another system I was able to work out what these files should be linked to, and manually did this at the end of the installation.

And finally, Yes, I was able to start Oracle and create a database.
Author: "John Brady (noreply@blogger.com)" Tags: "install, 10g, opensolaris, virtualbox, s..."
Send by mail Print  Save  Delicious 
Date: Tuesday, 02 Mar 2010 19:34
Recently on a Solaris system I got the following disk statistics from “sar –d” (non-busy disks have been removed for clarity):
device  %busy   avque   r+w/s  blks/s  avwait  avserv
sd3 82 0.9 134 879 0.0 6.6
sd4 90 0.9 133 879 0.0 6.9
sd5 73 17.1 777 12366 0.0 22.0

You can draw conflicting conclusions from this data:
  • On the one hand disk “sd5” seems to be performing slower than disk “sd4”, at 22.0 milliseconds per disk I/O request from Solaris versus only 6.9 milliseconds for “sd4”.
  • But on the other hand “sd5” is clearly doing more work than “sd4” – 777 I/Os per second versus 133 I/Os per seconds (6 times more), and 12,366 blocks per second versus 879 (14 times more). Does this make “sd5” actually faster than “sd4” overall?
So which conclusion is right? Which of the two disks is actually faster than the other for the individual disk I/Os themselves, ignoring any time spent queuing before being executed? Clearly “sd5” will have longer queues than “sd4” because it is processing a far greater number of I/Os per second. This is confirmed by the average queue length value reported by “sar” – only 0.9 for “sd4” (less than 1 I/O at a time), while it is 17.1 for “sd5”.

I actually think that “sd5” is faster, given how many more disk I/Os it is doing per second, and the size of its average queue length. How can I prove this one way or the other? Well our old friend “Queuing Theory” can help, with its set of formulae describing how such things work.

A key point to realise is that modern disks have internal queues, and will accept more than one request from the operating system at a time. From the operating system’s perspective it can send a new I/O request to a disk before all the previous ones have finished. From the disk’s perspective it has an internal queue in front of the real disk, and the real disk can still only do one I/O at a time. We can see that this is the case in Solaris because the average queue length is 17.1 for “sd5”. Also the average wait time is 0.0 for all disks, because there is no waiting or queuing within Solaris, which is what this measures. Solaris was always able to immediately issue a new I/O request to the disk, and never exceeded any limit on concurrent requests to the disks.

So although “sd5” looks slow at 22.0 milliseconds service time, this is the full service time measured by Solaris, which includes any queuing time within the disk device itself. And with 17.1 concurrent requests on average, this could be quite a large queue, meaning that the 22.0 milliseconds reported by Solaris could include a significant amount of time waiting within the disk before the I/O was actually performed and the data returned.

Queuing Theory can help us “look inside the disk device” and see how big its queue is on average, and what the “real service time” of an I/O is within the disk when it performs it.

What do we know about the disks behaviour?
  • Average completed requests per second are 777 for sd5 and 133 for sd4.
  • External service times are 22.0 ms for sd5 and 6.9 ms for sd4
  • Average requests in the disk device are 17.1 for sd5, and 0.9 for sd4
Even from this we should be able to see that 22.0 ms does not make sense for individual disk access times on sd5, because it managed to do 777 of them per second. Assuming that the disk was 100% busy during a one second interval, if it did 777 I/Os then each must have taken less than 1 / 777 = 0.001287 = 1.287 milliseconds. Which further confirms that the 22.0 ms reported by Solaris is mainly queuing time within the disk device itself.

We would like to know the actual service time within the disk, separate from the queue time within the disk. We can use a formula from Queuing Theory for this:
  • S = R / (1 + N)
For this we need to know the response time from outside the disk i.e. from Solaris, and the number of overlapping concurrent requests on average (queue length) submitted to the disk. We have both of these from sar:
  • sd5: S = 22.0 / (1 + 17.1) = 22.0 / 18.1 = 1.215 milliseconds
  • sd4: S = 6.9 / (1 + 0.9) = 6.9 / 1.9 = 3.632 milliseconds
There we have it – disk sd5 has a far lower true service time than that of sd4. sd5 is actually almost 3 times faster than sd4 at performing each individual disk access! It is just the large queue of outstanding requests that causes the total disk access time as measured from Solaris to be so high at 22.0. We can now see that on average each disk I/O to sd5 spends (22.0 – 1.215) or 20.785 milliseconds waiting within the internal disk device queue before it is then executed, which then takes only 1.215 milliseconds.

In terms of the utilisation of each disk, the Queuing Theory formula is U = X * S, so:
  • sd5: U = 777 * 0.001215 = 0.944 = 94.4%
  • sd4: U = 133 * 0.003632 = 0.483 = 48.3%
This indicates that disk “sd5” is operating at a high utilisation level, and any increases in utilisation will lead to exponential increases in response time (service time as measured by Solaris). Disk “sd4” however is at just less than 50% utilisation, which correlates with the average queue length being just under 1 (0.9).

In this scenario I would suggest trying to move some of the I/O workload off “sd5” and onto some other disks somehow. Any reduction in the workload on “sd5” would dramatically reduce the number of concurrent requests (average queue length) and so dramatically reduce the service time as measured by Solaris. In other words, “sd5” is a hot and busy disk.
Author: "John Brady (noreply@blogger.com)" Tags: "queuing theory"
Send by mail Print  Save  Delicious 
Date: Wednesday, 16 Dec 2009 10:57
Sometimes you want to delete or update rows in one table based on the existence of matching rows in another table. Examples include building up a list of records to be deleted in another table, or new data values being first loaded into a staging table before updating the main data set. I call this a Correlated Delete or Update, because the rows affected in the main table must correlate with matching rows in the other table. There are different ways we can write such correlated actions in Oracle. Unfortunately one way results in a very poor execution plan and incredibly bad performance – several hours estimated in one example – while the other way might take only a second or two. So it can be very useful to know which is which, and why.

This scenario is easiest seen with a Delete i.e. Delete from tableA where matching rows exist in tableB. We might naturally think of this as being a kind of join between tableA and tableB, assuming tableB has a foreign key to the primary key of tableA. It turns out that Sybase has implemented an extension to its DELETE and UPDATE statements that lets us use join syntax to specify this kind of correlated action, with an additional ‘from’ clause. In Sybase our delete would be:
delete tableA from tableA, tableB where tableA.pkey = tableB.fkey

Unfortunately this is an extension to the ANSI SQL syntax, and Oracle does not have an equivalent syntax. So in Oracle we can only refer to one table in the main table, and need to use sub-queries to refer to the other tables. One way I came across the other day to do this is:
delete from tableA
where exists (select 1 from tableB where tableA.pkey = tableB.fkey)

On the face of it this is correct - we only delete rows in tableA that have a match in tableB. Unfortunately it suffers from terrible performance. In the case I came across I saw that Oracle would take 3 hours to scan tableA (table names changed from their original ones):
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 33 | 1075K (2)| 03:35:07 |
| 1 | DELETE | A | | | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | A | 320M| 9G| 1072K (2)| 03:34:30 |
|* 4 | TABLE ACCESS BY INDEX ROWID| B | 1 | 82 | 0 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | X1_B | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

This is because the sub-query is correlated – it refers to the outer table (tableA) and so must be executed for each row of tableA from the outer, main query. This results in a full table scan of tableA, and then a join to tableB for the correlated sub-query. If tableA is large and tableB is small with a list of rows to delete, then the performance is very bad indeed.

The solution is to rewrite the sub-query so that it is not correlated, and we can do that using IN i.e. where a row in tableA has a matching row IN tableB. The delete then becomes:
delete from tableA where (tableA.pkey) IN (select tableB.fkey from tableB)

The meaning of this is exactly the same as the other delete, but now the sub-query is not correlated. Oracle can now choose an execution plan that scans tableB to produce a set of rows, and join to tableA using an index. This executes much faster, as you would expect.

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 113 | 6 (17)| 00:00:01 |
| 1 | DELETE | A | | | | |
|* 2 | FILTER | | | | | |
| 3 | NESTED LOOPS | | 1 | 113 | 6 (17)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 80 | 2 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | B | 1 | 80 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | X1_A | 1 | 33 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

As you can see the estimated cost has reduced from over 1 million to just 6, and the estimated elapsed time from 3 and a half hours to just one second! A major improvement. Obviously this improvement is down to the fact that tableB is at least 1,000 times smaller than tableA.

Either way, I believe that using IN is a better way of phrasing the sub-query, because it is clearer to Oracle which way round the relationship is (A depends on B, not vice versa), and gives the optimizer more flexibility of how to execute the SQL statement. If tableB were big enough with respect to tableA, then there is no reason why the optimizer could not go back to the original execution plan – scanning A and joining to B.

Where it really makes a difference is where you have other conditions on tableB within the sub-query – not just the foreign key join to tableA. When using EXISTS, Oracle will ignore extra indexes on tableB and treat the correlated sub-query in the same way – check tableB for each row in tableA. When using IN, Oracle can take advantage of extra indexes on tableB if they help with other conditions on columns of tableB in its sub-query. Thus it can use an index for the initial access to data in tableB, and then use the primary key index into tableA. This results in efficient execution plans for the DELETE.

The problem also occurs with Updates, but is compounded by needing to refer to the same table twice within the update statement. Again, the Sybase syntax is much simpler and more straightforward:

update tableA
set tableA.column1 = tableB.column1, tableA.column2 = tableB.column2
from tableA, tableB
where tableA.pkey = tableB.fkey

In Oracle we need to use a sub-query for tableB in the WHERE clause in the same way as for the Delete statement, and we also need a sub-query for tableB in the SET clause so that it is updated to the values in the matching row in tableB. This is important – without both sub-queries we would either update the wrong rows (WHERE clause problem) or update them to the wrong values (SET clause problem). The Oracle equivalent, using the same rewrite as before is:

update tableA
set (column1, column2) = (select tableB.column1, tableB.column2
from tableB where tableA.pkey = tableB.fkey)
where pkey in (select fkey from tableB)

As already mentioned this particular update will perform well when executed. Typically Oracle will scan tableB if it is much smaller for the sub-query, then join to tableA on its key columns assuming there is an index on them, and then join back to tableB to get the values to update the columns to.
Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Date: Monday, 19 Oct 2009 21:11
Previously I have looked at the challenges involved in being able to support a changing database design in an application development environment, which were:
  1. Support change at any time
  2. Ensure database designed and modelled properly
  3. Record each individual change, to enable metamorphosis of older databases
  4. Allow multiple versions or branches of the Database Design to exist, and be changed individually or collectively
Although I started off on this problem of database structure change from the context of the need for it when using Agile development methods, the actual problem is not unique or specific to Agile. It is a generic problem to the development of any application that involves the use of a relational database to store its data in.

What I now want to do is list the key features needed in a solution to this problem which, if all assembled together, would deliver a viable solution to this problem.
  • With Application Source Code we only care about the final state it ends up in. But with a Database Design we care about each individual change, and what gets changed. We need to record each of these Databse Changes individually, so that they can be applied to other instances of that database, as well as recording the final state of the database design itself in the Database Model.
  • Changes are physically made to a Database Instance by executing appropriate SQL statements. This is the only way to make a change to an existing database instance.
  • Each "Database Change" should be an atomic change to a database - it either completes successfully or not at all. Changes cannot be partially implemented on a database instance.
  • The Database Changes must be executed in the correct sequence on each database instance, for repeatability and due to any dependencies between the changes.
  • Changes to the Database Design should be formally Requested, and these requests recorded somewhere to provide an audit trail - Who, When, What Changed, Where in the Database.
  • Changes must be reviewed and approved by the Data Modeller, to ensure a good, scalable design. The Data Modeller is responsible for updating the Database Model - the record of the Database Design - and producing the SQL statement that implements that change. Some of this could be done automatically using appropriate tools, or manually. All that matters is that it does happen.
  • The Database Designer or Data Modelling role now becomes a part time role during the whole lifecycle of the application's development, instead of being a full time role only in the first phase of its development. It is now likely that all development cycles will involve the Database Designer, who becomes a central and critical member of the development team. The role itself, however, could be fulfilled by someone who has other roles too within the development cycle, or shared amongst a number of people, because it is not a full time role. This could be someone who is also a developer or a database administrator.
  • Each Database Change is given its own unique Identifier, which must be globally unique over all the other Database Changes. This identifies the particular Database Change, regardless of which Database Instances it is applied to, or which versions of the Application it appears in.
  • Changes to a particular Database Instance must be recorded within the Database Instance itself. This is the only way to avoid applying the same changes twice to a Database Instance. This implies some form of Change History table within each Database Instance.
  • This Change History table needs to record the Change Identifer for those changes applied, and the date and time when it was done. Other data could also be recorded, but these two are the minimum required.
  • The set of changes should be stored in a single file, often termed the Change Log. Each entry in this will include the Change Identifier and the SQL statement that implements that change. All necessary changes can be easily located and sequenced this way.
  • This Change Log needs to be well structured, so that a program can read and decode the Changes in it and execute the SQL statements for changes not yet applied to a given Database Instance. This is probably best done using XML.
  • There must be separate instances of the Database Model / Database Design and the Change Log file in each version / branch of the Application source code. This allows changes to be made independently to each branch of the Application.
  • In reality SQL statements are often specific to one database product, such as Oracle, Sybase, SQL*Server or MySQL. The Change Log will need to record the separate SQL statements for each supported database product for each Database Change. Thus each Change will have an Identifier, and a SQL statement per database product, clearly labelled with the database product name.
  • The whole solution is brought together in a program that is run during an Application Upgrade. It opens the Change Log XML file and reads in each Change. If that Change has not yet been applied to this particular Database Instance, then the corresponding SQL statement is executed and a record inserted into the Change History metadata table.
This is my interpretation of how to solve the problem of a changing database design during application development. But it turns out of course that others have been through exactly the same process before me and come up with exactly the same answers. And in fact, once I had thoroughly understood the nature of the problem, I could immediately see that these other people too had understood it and had arrived at pretty much the same conclusions I would. So although I did work through the problem from first principles myself, simply because that is the best way for me to fully understand a topic and appreciate whether any "solution" is right or not, I have no doubt borrowed from some of the material I have read in the way that I have described this solution here.

I still find some of the descriptions I found on the Web around Database Design, Changes and Agile Development unclear around this particular problem of implementing changes to a database design during application development. Quite a lot discuss the need to change the database design, and how to model different types of changes (refactoring is often mentioned). But almost none discuss how such changes get implemented in real deployed databases, and how you maintain a database over the long term with a series of changes to the application that uses it. None of them covered enough of it at once to leave me feeling that I had a real solution I could go out and apply. Hence my need to work through the problem from first principles for myself, to get the full picture as it were.

Again, one article that did help to clarify the nature of the problem a lot for me was Rethinking Agility in Databases: Evolution from Hexagon Software. This brought home the message that Databases must be treated differently to Applications in how you change them.

And the Change Log file of SQL statements was explicitly mentioned by Peter Schuh and Pramod Sadalage, who have put this into practice themselves on large projects. See Agility and the Database by Peter Schuh for instance.

So although I have not invented anything new here that has not already been described by other people elsewhere, in one form or another, I have tried to bring together all of the necessary ingredients for a solution in a coherent manner. As I said, this was my way of working through the database design problem to arrive at a solution that I fully understood and felt able to go out and apply in the real world.

I may revisit this again and try and outline a working solution to this i.e. what would need to be delivered to implement the kind of solution I have described here. Anybody reading this and want to know more?

Author: "John Brady (noreply@blogger.com)"
Send by mail Print  Save  Delicious 
Next page
» You can also retrieve older items : Read
» © All content and copyrights belong to their respective authors.«
» © FeedShow - Online RSS Feeds Reader