Date: Wednesday, 26 Mar 2014 18:39
The background to this - I was "given" a free license to Red Gate Source Control for Oracle some months ago as part of some marketing and awareness activity Red Gate were doing. I've been busy with other things, so I've only now got around to trying to understand what the product does and see if it can be of any benefit to me. Hence this review.

Before I start my review I want to make two things clear. The first thing is about how you treat Databases in a software development world. Databases are fundamentally different to Application Software. Whereas Application Software is normally rebuilt each time by compiling it from its source code, Databases are never rebuilt - they are always modified in place. A customer's data cannot be thrown away and replaced by a new empty database. Changes to the database must happen in place, in an incremental way. To put it another way, Application Software evolves between each release, with each release being a next generation that replaces the one before it. Databases though metamorphose between releases, with changes being applied directly to the database so that it changes in place and existing data is retained.

This means that a Database must be maintained through a series of "change scripts" that make incremental changes to the database, and these change scripts must be run in a controlled sequence when upgrading the database from one version of its associated Application Software to another version. I would expect any "Database Source Code Control" tool to help in the production of these "database change scripts", and to help manage their check in to the source code tree.
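For example, a minimal "change script" (the table and column names here are hypothetical) that would be checked in and run in sequence as part of an upgrade might look like:

-- 042_add_customer_email.sql - an incremental change applied in place, existing data retained
alter table customers add (email varchar2(320));
update customers set email = 'unknown' where email is null;
commit;
alter table customers modify (email not null);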

The second point is that I am only interested in source code control tools that truly track "changes" to source code files. I want to use something that knows what changed in a file, and can use this to do things like merges between different branches in a sensible way. I don't want something that just tracks the final contents of each file, and can only reverse engineer what changed between two versions by working out the differences after the event. I've used Mercurial and know that it works the way I want, and I'm aware that Git takes a similar approach to tracking changes to files. These are the main tools that truly track changes (there may be a few others), which means that most other source code control tools do not. Tracking changes to files is a very powerful way of doing source code control, and it enables a whole set of advanced features around propagating changes between versions and merging changes together.

Red Gate Source Control for Oracle

Before I tried to use the product I thought I would read up on it to understand exactly what it thought it was capable of, and how I would need to configure it to get out of it what I wanted. The first disappointment was the slightly weak product information on their web site, which was of the generic "we can do it" type without being specific about what "it" actually was. But then I guess that is marketing for you.

What I did get from the product web page was that it only works with the source code control tools Subversion (SVN) or Team Foundation Server (TFS) from Microsoft. This is the first major shortcoming for this product. I'm not going to stop using Mercurial, which I find to be really good at what it does.

After that I did manage to find documentation for the product online, via the Support link on the Red Gate web site, which was good enough at describing how to use it. It seems that it will generate a set of SQL script files containing the DDL SQL statements to create the database you are using i.e. it reverse engineers from your current database the SQL that would be needed to recreate it. And having generated these SQL files it will then check them into your source code tree for you. When you change the structure of your database in any way, you can make it do a refresh and it will re-generate these SQL files, and re-upload them to your source code control tool. This means that it will update those SQL files that have changed in some way since the last source code update.

This was pretty much confirmed in a review by Lewis Cunningham, which described using Red Gate Source Control for Oracle and the series of SQL files it generated.

Which brings me to the second major shortcoming of the product - as Lewis notes "It generated create scripts rather than alters". Remember the point I made at the very start: deployed production databases must metamorphose in place, keeping the existing data inside them. This means that databases must be maintained through a series of "change scripts" that alter existing database structures to update them. We do not want just another set of "create" scripts that will create a new empty database. This is confirmed in a comment in Lewis's post by a Red Gate employee:-
I am one of the developers ... You were wondering ... how you might get the alter scripts. ... You can checkout a version of the database from SVN and Schema Compare can compare it to a live database and generate the alters you need to deploy the new version.
He is saying that Red Gate Source Code Control for Oracle will not do the "alters" for you, but Red Gate have another product called Schema Compare that will produce the "alters" for you.

Which leads on to the third shortcoming with this tool. How did my local test database get "changed" in the first place, so that Red Gate Source Control for Oracle could detect this change? If the Red Gate tool did not produce the SQL that I ran to change my local database, then it must have come from somewhere else. Most likely it came from another tool, or from SQL statements I wrote myself manually and ran against the database.

Now, if I've already got the SQL statements that "alter" the database structure, and I am sensible enough to put these into a file before running them against my local test database, then I already have the necessary "change script" that updates my database in place. And I could just check in this "change script" directly to my source code tree, and not bother with Red Gate Source Control for Oracle at all.

Summary

To summarise what Red Gate Source Control for Oracle does:
  • Red Gate Source Control for Oracle only works with SVN and TFS
  • It will not produce "alter" SQL scripts to run against a database - only "create" SQL scripts
  • You must create these "alter" SQL scripts yourself, manually or with another tool, to change your local database
  • It only detects changes to a database after you have already run your own "alter" SQL to change it
As a result of its limitations I didn't get around to using it because it would not work with my preferred source code control tool (Mercurial), and it would not deliver what I really needed anyway (database "change scripts").

The "holy grail" of database development tools for me would be one that helped produce the "alter" SQL statements in the first place, as a series of "change scripts". I could then test these "change scripts" locally in my test database, and if happy with the results check them into the source code tree. Unfortunately this tool from Red Gate does not do this at all, and does something completely different - only generating a complete set of "create" SQL scripts all the time, and only after you have already changed the structure of your local test database somehow.

Furthermore, I'm not sure what this tool really delivers beyond running "DBMS_METADATA.GET_DDL" (or the equivalent of it) in a loop to get the SQL DDL statements for each object in the database. I've already got my own set of scripts that do just this. The only thing Red Gate seems to add is a nice GUI in front of it all, and some integration with two source code control tools.
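For comparison, here is a minimal sketch of that kind of loop from SQL*Plus (the schema name is hypothetical, and the object type list is simplified):

set long 100000 longchunksize 100000 pagesize 0 linesize 200
spool schema_ddl.sql
select dbms_metadata.get_ddl (object_type, object_name, owner)
from dba_objects
where owner = 'MYSCHEMA'
and object_type in ('SEQUENCE', 'TABLE', 'INDEX', 'VIEW')
order by object_type, object_name
/
spool off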
Author: "John Brady (noreply@blogger.com)"
Date: Tuesday, 11 Mar 2014 18:55
There are a number of techniques you can use to speed up an Oracle Import, some of which I'll describe here. This is not an attempt at a comprehensive list, just some of the main techniques I have used that can really speed up some parts of an import. I've seen a standard import come down from 2.5 hours to about 0.75 hours using these techniques.

The first thing to realise is that an exported dump file is a "logical" dump of the database - it contains the SQL statements necessary to recreate all the database structures, and the data records to be loaded into each table. The import works by executing these SQL statements against the target database, thus recreating the contents of the original database that was exported. We can leverage this to our advantage in various ways.

The objects and their corresponding SQL are in a certain order in the exported dump file, and they are imported in a strict sequence by the "impdp" utility. This order is roughly the following for the main objects (it is not a complete or absolute list, as I have simplified some of the details a bit):
  • Sequences
  • Tables - Creation, but no data
  • Table Data
  • Indexes
  • Constraints - Check & Referential
  • Packages, Functions, Procedures
  • Views
  • Triggers
  • Functional and Bitmap Indexes
  • Materialized Views
When you run a full import, each of these sets of objects is done in turn, one after the other. This means that the import is not really parallelised. There is a PARALLEL option, but this only really affects the "Table Data" section, and allows multiple slave readers of the table data to run INSERT statements i.e. different tables can be loaded at the same time, but that is all. The PARALLEL option does not affect the other sections, such as index builds, which is a shame.


It's all about Disk Bandwidth

Before I get into specific techniques, I want to point out that an Oracle Import is fundamentally a disk I/O bound task. The techniques I'll be describing will be using parallelism of various forms to get more work done at the same time on the system, to reduce the total elapsed time. And this in turn will produce more disk I/O operations running at the same time. So you will ultimately be limited by the system's ability to get data off and on to the disks, rather than being limited by the amount of CPU or Memory it has in it.

Consider how each of the object types in turn is created within the database during the import:
  • The initial table data load is mainly disk I/O - read all the data records from the dump file and INSERT
  • Index creation is mainly disk I/O - read all the data in the table, sort it for the index, write the index to disk
  • Constraints are mainly disk I/O - read all the data in one table and verify value or existence in another table
  • Materialized Views involve executing the SELECT to populate the MV on disk
Yes, some CPU will be used (certainly for data load and index creation), but the CPUs must each be fed with data from disk to keep them busy. So CPU utilisation can only go up by increasing the disk bandwidth used.

In my experience it is the index creation and the referential integrity constraints that take the longest elapsed time during imports, due to their need to always read all the data records in a table.


Divide And Conquer

The main technique I use to speed up an import is to import each object type separately via its own "impdp" command, using the "INCLUDE" option to only do those types of objects. While this does not directly speed up the import itself, it sets the foundation for being able to do so. Now that we have a series of individual import commands to import the whole database, we can sub-divide each of these further.

So instead of loading data into all the tables in one command, which does them sequentially one after the other, we can run separate import commands at the same time for each of the largest tables, and run another import command for all the other tables (using the EXCLUDE option to omit the largest tables). Now we have a high level of parallelism within the table data load, which we can fine tune based on the relative sizes of our tables and the disk I/O bandwidth available on the system.
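As an illustrative sketch (the directory, dump file and table names are all hypothetical), the table data load could be split up something like this, with each command run concurrently e.g. from its own terminal or as a background job:

impdp system DIRECTORY=dp_dir DUMPFILE=expfull.dmp CONTENT=DATA_ONLY TABLES=BIG_TABLE1 LOGFILE=imp_big1.log
impdp system DIRECTORY=dp_dir DUMPFILE=expfull.dmp CONTENT=DATA_ONLY TABLES=BIG_TABLE2 LOGFILE=imp_big2.log
impdp system DIRECTORY=dp_dir DUMPFILE=expfull.dmp CONTENT=DATA_ONLY EXCLUDE=TABLE:"IN ('BIG_TABLE1', 'BIG_TABLE2')" LOGFILE=imp_rest.log

Note that the quoting in the EXCLUDE clause usually needs escaping on the command line, so it is often easier to put these options in a parameter file (PARFILE).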


Parallelizing Indexes

We could use a similar technique on indexes of doing different tables at the same time, but this can have other issues to deal with e.g. an increase in the amount of temporary tablespace storage used. Also you will probably be limited by the time taken for all of the indexes on the largest table in the database, which would be built serially one after the other.

Instead what I do is use the "SQLFILE" option of impdp with the "INCLUDE=INDEX" option to get the "CREATE INDEX" statements into a single SQL file. Then I edit this file and add a "PARALLEL" clause to the end of each "CREATE INDEX". In fact, import actually puts a "PARALLEL 1" on the end of each "CREATE INDEX" already (I'm using 11.2.0.3.0), so I only need to do a global replace of this by the degree of parallelism I want. You probably want a value similar to the number of disks you have, but you can experiment to get the best value.
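A minimal sketch of this (the directory, dump file name and degree of parallelism are examples only, and the exact "PARALLEL 1" text may vary by version):

impdp system DIRECTORY=dp_dir DUMPFILE=expfull.dmp SQLFILE=create_indexes.sql INCLUDE=INDEX
sed -e 's/PARALLEL 1/PARALLEL 8/' create_indexes.sql > create_indexes_p8.sql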

Once edited I can then create the indexes by running this SQL script from SQL*Plus, after all the tables have had their data loaded in. As each index is being created using parallelism, each will be maximising the resources on the system, and we can just let it build one index at a time. There may not be any benefit from trying to split the indexes into two separate SQL files and running them both at the same time.

Note that you can create and edit this index SQL script as soon as you have the dump file you want to import. You can do this while the table data is being loaded, so that the index SQL script is ready before the tables have finished loading. This way there is no need for a gap between the end of the table data being loaded and the index creation starting.


Skipping Constraints

As mentioned, constraints involve reading all of the data in a table to validate that it is correct against the specified constraint. If we are importing a complete database from another system, and we know for certain that all of the data satisfies all of the constraints in that database, then we can basically skip validating the constraints completely. The speed up from this is quite significant, as we completely avoid enormous amounts of disk I/O for every table in the database, but can still end up with all the constraints in place.

As with indexes, I use the SQLFILE option to produce a file containing the SQL for the constraints. I then edit this file and add "NOVALIDATE" to the end of each constraint statement. This means that Oracle will create the constraint, which is what we want, but it will not validate the data i.e. it will assume that the data is valid and not explicitly check it, which is also what we want. In fact each constraint SQL statement ends in the word "ENABLE", so we can do a global replace of this by "ENABLE NOVALIDATE", which is easy.
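Again a minimal sketch (names hypothetical; note that check constraints and referential constraints are separate Data Pump object types):

impdp system DIRECTORY=dp_dir DUMPFILE=expfull.dmp SQLFILE=create_constraints.sql INCLUDE=CONSTRAINT INCLUDE=REF_CONSTRAINT
sed -e 's/ENABLE/ENABLE NOVALIDATE/' create_constraints.sql > create_constraints_nv.sql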

Again, as with indexes, we can produce and edit this SQL script as soon as we have the dump file, and then run it at the appropriate point in the steps being used to import the database.


Export

I've discussed the import side of things so far, as that is where you often want the speed up. But you can apply similar principles to the export of the database, using divide and conquer to run multiple exports at the same time. Again this will increase the disk I/O operations occurring at the same time, but will reduce the total elapsed time of the export. Some of the techniques include the following (a sketch of the commands follows the list):
  • Do an export of just the metadata i.e. just the SQL to create the objects, using "CONTENT=METADATA_ONLY"
  • Export the data from the biggest tables individually to their own dump files, using the "TABLES" option to list each table
  • Export the data from all the other tables to a dump file, using the "EXCLUDE=TABLE:" option
    • Beware that the "EXCLUDE" option has a slightly weird syntax where you need to embed quotes around the list of table names
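A hedged sketch of those three exports (the schema and table names are hypothetical), which could all be run at the same time:

expdp system DIRECTORY=dp_dir SCHEMAS=MYAPP CONTENT=METADATA_ONLY DUMPFILE=meta.dmp LOGFILE=exp_meta.log
expdp system DIRECTORY=dp_dir TABLES=MYAPP.BIG_TABLE1 DUMPFILE=big_table1.dmp LOGFILE=exp_big1.log
expdp system DIRECTORY=dp_dir SCHEMAS=MYAPP CONTENT=DATA_ONLY EXCLUDE=TABLE:"IN ('BIG_TABLE1', 'BIG_TABLE2')" DUMPFILE=rest.dmp LOGFILE=exp_rest.log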


Full Database Migration

If you are using export and import to migrate a production database from one system to another, then you can combine all of these techniques together to minimise the total elapsed time before the second database is up and available.
  • Once the metadata export has finished, transfer it to the target system over the network
    • Then create the empty tables, and produce and edit the SQL scripts for indexes and constraints
  • As each table data export finishes, start its transfer over the network to the target system
    • And as each data dump file arrives on the target system start its load
  • Once data load has completed, start creating the indexes, followed by the other steps
This overlaps execution of the export, transfer of the dump files over the network, and import on all of the systems at the same time.
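As a rough sketch of how such a pipeline might be scripted for the per-table data (host names, directories and table names are placeholders, and authentication is omitted for brevity):

for t in BIG_TABLE1 BIG_TABLE2
do
  (
    expdp system DIRECTORY=dp_dir TABLES=MYAPP.$t DUMPFILE=${t}.dmp LOGFILE=exp_${t}.log
    scp /u01/dp_dir/${t}.dmp target:/u01/dp_dir/
    ssh target "impdp system DIRECTORY=dp_dir DUMPFILE=${t}.dmp CONTENT=DATA_ONLY LOGFILE=imp_${t}.log"
  ) &
done
wait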
Author: "John Brady (noreply@blogger.com)"
Date: Monday, 17 Feb 2014 19:52
To keep a longer story as short as possible: I needed to package up the Python run time environment to ship along with a Python based application I had written. In this case the target platform was Windows, though the solution will also work for Linux or any other platform (however most Linux distributions will already have Python on them). I needed to ship Python itself with my application to guarantee that the application would be able to run (had everything it needed), and to avoid the complications of requiring the customer to download and install Python themselves (potential issues over version compatibility).

Through a number of blog posts by other people about different "packaging" techniques (see References later) I came up with the following solution that works. This is not the only method of packaging a Python application, and indeed it is quite surprising how many different techniques there are. But this worked for me, and was what I wanted i.e. including Python itself with my Python based application. One of the neat benefits of this for me is that the whole Python run time I need is only 7.5 MB in size, and the main ZIP file of the run time environment is only 2.5 MB in size, which shows how compressible it all is.

Packaging Python with my application

First I create a directory (folder for Windows people) for my application, and put all my application's Python files in there.

Then I create a sub-directory in this to put Python itself into e.g. Python33_Win.

Into this I put the following files:
_socket.pyd
cx_Oracle.pyd
msvcr100.dll
pyexpat.pyd
python.exe
python33.dll
python33.zip
LICENSE.txt
Note that "cx_Oracle.pyd" is needed because my application makes a connection to an Oracle database to do its work. Also "msvcr100.dll" is technically a Microsoft DLL that is needed by programs written in C, which the Python interpreter is. Microsoft allows this DLL to be copied for the purpose of running such C based programs.

The "python33.zip" is something I created, and into which I put the rest of the necessary Python run time files. There are quite a lot of these, all taken from the directory where you installed Python on your own system:
  • All the ".py" files in the top level Python folder
  • The following folders including any sub-folders:-
    • collections
    • concurrent
    • ctypes
    • curses
    • dbm
    • distutils
    • email
    • encodings
    • html
    • http
    • importlib
    • logging
    • pydoc_data
    • site-packages
    • unittest
    • urllib
    • venv
    • wsgiref
    • xml
Then I wrote a wrapper script to run my application via the Python run time included. In this case it is a Windows batch script, and it exists in the folder above my application source code. My application needs two command line arguments provided e.g. user name and password.
MyAppName\Python33_Win\python MyAppName\myappname.py %1 %2
That's it, and it works.

How does it work?

The Python interpreter i.e. "python.exe" clearly has built-in functionality to dynamically load the various libraries it needs at run time. An example of these are the "*.pyd" files explicitly included in the Python directory. However, it also has the functionality to open up a ZIP file and look inside that for the libraries it needs. Thus we can take most of the Python run time environment files and put them into a PYTHON33.ZIP file, and Python will look in there to find the files it needs. Which makes packaging up Python pretty simple.

The exceptions to this are the "python33.dll" and Microsoft C DLL, and a few PYD files (which are Python dynamic libraries, a bit like Windows DLLs). These will not be found in the ZIP file, as they seem to be needed before Python gets around to opening such a ZIP file.

Further Notes

  • These files and directories are what I needed to make my application work. The particular set needed can be different for each Python based application, depending on what modules you import into your program at run time.
  • I tried using the "modulefinder" module to report on which files were actually being referenced at run time when my application was being run. This helped reduce the total number of files I needed to include in the ZIP file (a minimal sketch of this check follows this list).
  • The ZIP file is named "python33.zip" because I am using Python version 3.3, and its DLL is named "python33.dll" i.e. it looks for a ZIP file with a matching name of its version.
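Here is a minimal sketch of that modulefinder check (the script name is hypothetical):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('myappname.py')

# list each module the application actually pulls in, with its source file
for name, module in sorted(finder.modules.items()):
    print(name, module.__file__ or '(built-in)')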

References

The main articles I found that helped me and pointed out this way of packaging up Python were:
  • Distributing a Python Embedding Program which states that everything can be put into a ZIP file, except for some special files, and gives an example where those files are listed.
  • How to Distribute Commercial Python Applications which describes the different packaging options available for Python based applications, and concludes that "For complex scenarios, my recommendation is to go with bundling CPython yourself and steer clear of the freezers".
    • It does not however tell you how to "bundle CPython yourself" - it just states that it is possible to do it.
Author: "John Brady (noreply@blogger.com)"
Date: Thursday, 23 Jan 2014 20:12
Some time ago I dealt with a system that was experiencing high "log file sync" times, as well as other RAC related waits ("gc buffer busy" and "gc cr block busy"). Initially I assumed that the "log file sync" waits and the RAC Global Cache waits were independent of each other, but it turned out that they were interlinked. This post is mainly to make other people aware that on a RAC system there is a relationship between high inter-node block sharing traffic (Global Cache) and "log file sync", and that the log file sync events are a direct side effect of the high inter-node block sharing that is occurring. I'll also provide some evidence to support this in terms of AWR reports from the system.

The system was a 2 node RAC cluster running 10.2.0.4.0. An AWR report from an hour of peak day time activity showed:
Elapsed:   59.58 (mins)
DB Time:  469.97 (mins)
So a database busy time of nearly 470 minutes in an elapsed period of nearly 60 minutes, meaning an average of almost 8 active sessions during that period. This is high for a system with only 4 CPUs. Theoretically there would only be 240 minutes of CPU capacity available on a 4 CPU node in a 60 minute period.

The top 5 wait events were:
Event                      Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                             8,450                             30.0
log file sync             77,234    4,930            64               17.5  Commit
gc buffer busy            45,532    4,902           108               17.4  Cluster
db file sequential read  269,971    1,587             6                5.6  User I/O
gc cr block busy          45,388    1,401            31                5.0  Cluster
8,450 seconds is about 140 minutes. This is just over half of the 240 minutes of CPU capacity available on the 4 CPU node, indicating that waiting is excessive i.e. the total Database Time is made up of more waiting than actively executing SQL statements. And we can see that the other 4 top wait events add to over 12,800 seconds, which is far more than the CPU time - again more waiting than doing real work.

As I said, I initially focussed on trying to tune the "log file sync" events, but this led nowhere. However a blog post I read when Googling this combination (sorry, I cannot seem to find the link now) stated that in fact the log file sync is a consequence of the "gc buffer busy" event, and that there are other clues in the RAC specific section of the AWR report. And indeed on looking in the "RAC Statistics" under "Global Cache and Enqueue Services - Workload Characteristics" I saw the following:
Avg global enqueue get time (ms):                        0.6
Avg global cache cr block receive time (ms):            11.2
Avg global cache current block receive time (ms):        7.2
Avg global cache cr block build time (ms):               0.0
Avg global cache cr block send time (ms):                0.2
Global cache log flushes for cr blocks served %:        10.8
Avg global cache cr block flush time (ms):              72.4
Avg global cache current block pin time (ms):            0.1
Avg global cache current block send time (ms):           0.2
Global cache log flushes for current blocks served %:    0.5
Avg global cache current block flush time (ms):        227.5
The standout statistic is the last one - 227.5 milliseconds on average to flush a "global cache current block". This is way too high. I would expect an inter-node network message to be of the order of a few milliseconds, and anything involving disk writes to be of the order of tens of milliseconds. So hundreds of milliseconds is way too high.

Further digging around revealed that in some cases a Global Cache request for a block from another node may cause that node to flush its work so far to its redo log before the block is copied across to the other node i.e. a "gc buffer" request from one node may cause a "log file sync" to occur on the other node. And I found this confirmed in this Pythian Blog post on Tuning "log file sync" Event Waits:
In a RAC cluster, this problem is magnified, since both CR- and CUR-mode buffers need the log flush to complete before these blocks can be transferred to another instance’s cache.
At this point I switched focus onto the causes of those Global Cache requests and the high inter-node traffic.

In the "SQL Statistics" section of the AWR report is "SQL Ordered by Cluster Wait Time", and I could see a set of SQL statements all acting on the same table and with very high Cluster Wait Times as a percentage of elapsed execution time (CWT%). The percentage cluster wait time was around 85% for all of these related SQL statements i.e. it was spending almost 6 times longer waiting than doing the actual work of the SQL statement.

All of these SQL statements acted on one table that was being both heavily inserted into, and selected from. This is kind of the classic "hot table" or "hot block" problem seen in some OLTP applications on RAC databases. This was further confirmed by the "Segments Statistics" section of the AWR report, in "Segments by Global Cache Buffer Busy". The only objects mentioned are the one table and some of the indexes on it.

In this case there were 2 suggestions I could make:
  1. Make the primary key unique index a Reverse Key index, so that new records would not be inserted into the same last index leaf block.
    • The primary key on this table was a generated sequential number, so it will always be increasing in value, and inserted at the end of a normal index.
  2. Create a new multi-column index on the table that would better satisfy most of the queries being run.
    • The current execution plan used a less selective index and had to also retrieve a number of data blocks from the table to further filter the data according to the constraints in the "WHERE" clause.
These 2 changes both reduced the number of data blocks being accessed in the table for each query execution, and removed the specific contention on the last block of the primary key index. The net effect was to reduce the total number of blocks being continually shared between the nodes in the Global Cache for these SQL statements.
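As a hedged illustration of those two changes (the table, column and index names here are hypothetical, not the real ones from that system):

alter index orders_pk rebuild reverse;

create index orders_cust_status_ix on orders (customer_id, status_code, order_date);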

Suffice it to say that performance improved significantly and the users were happy after these changes. The "Avg global cache current block flush time" came down to under 5 milliseconds, which is a significant reduction. The original high wait events all but disappeared, so that disk I/O became the top wait event, and the total Database Time went down significantly because it was no longer waiting so much and so was executing more efficiently.

The key lesson for me was that on a RAC database high inter-node global cache traffic can give rise to redo log buffer flushes and consequently high "log file sync" wait times. And this is primarily due to a high volume of blocks being continually shared between the nodes in the cluster. Rather than trying to directly address the log file sync event, you instead need to address the global cache buffer sharing and reduce the total number of blocks being shared somehow. In my case I was able to create a better index that enabled the execution to access fewer data blocks in total, reducing the global cache traffic volume and contention for hot blocks.
Author: "John Brady (noreply@blogger.com)"
Date: Tuesday, 14 Jan 2014 19:17
I needed to "play" with Oracle VM and learn how to use it for some upcoming work involving virtual machines. Oracle VM (not VirtualBox) is virtualization software that installs directly onto a system i.e. it installs onto "bare metal" and does not need a host operating system. The problem I had was that I had no "spare" unused system on which I could install Oracle VM. I really wanted to be able to install it within a host environment as a kind of test sandpit. Which lead me to thinking about VirtualBox - Oracle's other virtualization software. VirtualBox runs inside a host operating system and creates virtual machines in which you can then install other operating systems i.e. it provides virtual machines that each look like an x86 system. Which means that in principle I could use VirtualBox to create a virtual machine, in which I would them install Oracle VM.

And indeed this does work, but there are a couple of important gotchas to be aware of:
  • I was using Oracle VM 3.2 which also needs a separate system for Oracle VM Manager to run on
    • So I created 2 virtual machines in VirtualBox
  • The Oracle VM virtual machine should be quite large e.g. 8 GB of memory
    • As you will then want to create virtual machines within this
  • The Oracle VM virtual machine will need additional disk storage which can be used for its virtual machines
    • I created the VirtualBox VM with a 5 GB disk for installing Oracle VM itself onto
    • And a second 100 GB virtual disk to use for Oracle VM virtual machines later on
  • Oracle VM and Oracle VM Manager must be on the same network as each other for communication
    • I added a second network adaptor to each virtual machine on an Internal Network (vmnet)
  • This second network adaptor must have Promiscuous Mode enabled i.e set to "Allow All"
    • The default of "Deny" will not allow Oracle VM to communicate successfully with Oracle VM Manager
  • VirtualBox does not emulate the Intel VT-x virtualization instructions in its virtual machines
    • So within Oracle VM you can only do Paravirtualized Machines (PVM)
    • So I used Oracle Enterprise Linux as it has suitable paravirtualized drivers in it for use in a PVM
    • You cannot create a Hardware Virtualized Machine (HVM) as this requires the Intel VT-x in the CPU used
    • This is a limitation of VirtualBox, and nothing to do with the real CPU in your physical system on which your host operating system is running
  • If you want a virtual machine running under Oracle VM to be able to network to the outside world you need to configure something else in the network somewhere. There are different options for doing this, depending on what you want and how much change you can do at your network level.
    • You could use the Bridged Adaptor type of network adaptor in VirtualBox, but this means that each virtual machine is visible on your public network and its IP address and hostname must not clash with any other existing system.
    • If you only want internet access for web browsing and downloading operating system updates then you can use the Oracle VM Manager system as an intermediate. It may seem complicated but it worked for me. There are 2 parts:
      • Configure and run some DNS (Domain Name System) software on the Oracle VM Manager system, to let it act as a DNS server to the Oracle VM virtual machines. I used "Dnsmasq" which is simple and straightforward, and can act as a bridge out to cascade out to another DNS server. On the Oracle VM virtual machine configure it to use your Oracle VM Manager system as its DNS server.
      • Configure the Oracle VM Manager to do NAT (Network Address Translation), which involves a few steps to configure "iptables" to do packet forwarding. Then configure the Oracle VM virtual machine to use the Oracle VM Manager as a network gateway (using "route"). A minimal sketch of this follows the list.
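A minimal sketch of the NAT side on the Oracle VM Manager system (the interface names are assumptions - eth0 the public interface, eth1 the internal vmnet network):

# allow the kernel to forward packets between interfaces
echo 1 > /proc/sys/net/ipv4/ip_forward
# masquerade traffic from the internal network out of the public interface
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT

On the Oracle VM virtual machine the default gateway is then set to the Manager's internal address, e.g. "route add default gw 192.168.56.1" (address also an assumption).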

The most important point is probably the one about the network adaptor setting in VirtualBox. This caused me no end of problems, as I could successfully install both Oracle VM and Oracle VM Manager on the VirtualBox virtual machines, and I could get Oracle VM Manager to discover and see the Oracle VM system. Which meant that the network between the two systems was definitely working. But whenever I tried to create a new virtual machine in Oracle VM and boot it, it always failed at the Oracle Enterprise Linux installation stage after entering a URL to install from. There is something special done at that point of the installation procedure that requires the network adaptors to be in promiscuous mode. After some digging I came across this "Promiscuous Mode" setting and tried its alternative value, and it all worked then.
Author: "John Brady (noreply@blogger.com)"
Date: Tuesday, 14 Jan 2014 19:14
So far I've shown the skeleton of a query that lists out some key activity data values from all the data captured by AWR in its regular snapshots. So far we've only extracted a few time based data values:
  • Duration - real clock time that occurred between the 2 AWR snapshots being reported on
  • Database Time - time spent by the database being active
  • SQL Execution Time - time spent by the database executing SQL statements
AWR captures a lot of other data in its snapshots and we can add more of these to our query to drill down further into what happened between each pair of snapshots. These other data sets include:
  • Wait event data - individual wait events and wait time, and wait classes
  • System statistics - as normally seen in V$SYSSTAT as cumulative counters
  • SQL execution statistics for individual SQL statements
As before, I'll be adding these as separate sub-queries in the WITH clause so that they produce one row of data values per snapshot, and then joining to these named sub-queries in the main query and outputting the data values we want. This keeps the main query simpler, and pushes issues about grouping and summarising into the sub-query. It also lets us refer to the same sub-query multiple times if needed - which is the case for some of the data captured by AWR e.g. System Statistics are captured as sets of Name / Value pairs per snapshot.

For now let's add on wait event data telling us how long the system waited on different types of events. As we already have two different times for database activity (DB Time & SQL Execution Time), we would most like to know the total wait time within that database activity. This would let us calculate the actual time doing real SQL execution work, and see wait time as a percentage of total database time. The basic query for this extracts the wait times in microseconds from DBA_HIST_SYSTEM_EVENT, and would be:
, syswaits as 
(select sysevent.snap_id
, sysevent.dbid
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time
from dba_hist_system_event sysevent, dba_hist_system_event psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
-- Ignore Idle wait events
and sysevent.wait_class != 'Idle'
group by sysevent.snap_id
, sysevent.dbid
) -- syswaits
Note that I exclude "Idle" waits, which are waits unrelated to the execution of a SQL statement.

The next logical thing would be to break down the total wait time somehow into various major components, to show which type of waits occurred more often than the others. There are 3 major ways to break down wait events:
  • Foreground versus background waits i.e. waits experienced directly by sessions for connected users, and waits experienced by permanent background server sessions
  • By Wait Class - these group individual events into common classes such as Concurrency, User I/O, Commit, Network, Cluster.
  • By individual wait event - unfortunately there are around 100 of these, and outputting these as individual columns in the query would be too cumbersome.
So the obvious thing to do is to split out foreground or background wait times (the other one can be calculated using the total wait time), and by wait class as there are only about 10 or so of these. Background waits are actually obtained from a different AWR snapshot table - DBA_HIST_BG_EVENT_SUMMARY. So the query for this would be:
, sysbgwait as 
-- One row per System Wait Event with change in value between snapshots
(select sysevent.snap_id
, sysevent.dbid
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
from dba_hist_bg_event_summary sysevent, dba_hist_bg_event_summary psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
and sysevent.wait_class != 'Idle'
group by sysevent.snap_id, sysevent.dbid
) -- sysbgwait
This gives the total background wait time in each snapshot for each database instance. The main query can then join to this as it does to the other sub-queries.

For the wait classes we now have two main choices for how to write such sub-queries:
  • Have separate sub-queries for each class
  • Collapse the sub-queries into just one sub-query using the "decode" function to sum values over different criteria
The trouble with the first approach is that each sub-query will probably be executed separately, meaning multiple reads of the same snapshot wait event data to produce multiple data sets, and then each sub-query data set will be joined together by Snapshot ID and Database ID. The second approach should be more efficient when executed - there is only one read of the AWR wait event data, and it is processed as it is read into one result set without any further joins between different wait event data subsets (the only joins are to the other AWR data sets in the main query).

The classes can be broken out using the Oracle "DECODE" function as an "IF" function to test the class name:
, syswaits as 
(select sysevent.snap_id
, sysevent.dbid
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time
, sum (decode (sysevent.wait_class, 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time
, sum (decode (sysevent.wait_class, 'Cluster', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) cluster_time
, sum (decode (sysevent.wait_class, 'Concurrency', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) concurrency_time
, sum (decode (sysevent.wait_class, 'Network', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) network_time
, sum (decode (sysevent.wait_class, 'System I/O', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) system_io_time
, sum (decode (sysevent.wait_class, 'User I/O', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) user_io_time
from dba_hist_system_event sysevent, dba_hist_system_event psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
-- Ignore Idle wait events
and sysevent.wait_class != 'Idle'
group by sysevent.snap_id
, sysevent.dbid
) -- syswaits
The principle is that if the wait class name matches the target class name then the corresponding wait time is used in the sum, otherwise zero is used meaning that particular wait time is not included in the sum for that wait class.

Adding this to our existing query gives the following - note the divides by one million to convert microseconds to seconds:
with
snaps as
...
, systimes as
...
, sysbgwait as
...
, syswaits as
...
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
, extract (second from snaps.snap_interval)
+ (extract (minute from snaps.snap_interval)
+ (extract (hour from snaps.snap_interval)
+ (extract (day from snaps.snap_interval) * 24)
) * 60
) * 60 snap_duration
, dbtime.value / 1000000 db_time
, sqlexectime.value / 1000000 sql_exec_time
, syswaits.all_wait_time / 1000000 all_wait_time
, sysbgwait.time_waited_micro / 1000000 bg_wait_time
, syswaits.commit_time / 1000000 commit_time
, syswaits.cluster_time / 1000000 cluster_time
, syswaits.concurrency_time / 1000000 concurrency_time
, syswaits.network_time / 1000000 network_time
, syswaits.system_io_time / 1000000 system_io_time
, syswaits.user_io_time / 1000000 user_io_time
from snaps
join (select * from systimes where stat_name = 'DB time') dbtime
on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
join (select * from systimes where stat_name = 'sql execute elapsed time') sqlexectime
on snaps.snap_id = sqlexectime.snap_id and snaps.dbid = sqlexectime.dbid
join syswaits
on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid
join sysbgwait
on snaps.snap_id = sysbgwait.snap_id and snaps.dbid = sysbgwait.dbid
order by snaps.end_snap_time
/
And again, it would be good to define some nice column formats when running this in SQL*Plus:
col snap_time         format a15         heading 'SNAP|TIME'
col snap_duration format 999,999 heading 'SNAP|DURATION'
col db_time format 999,990 heading 'DB|TIME'
col sql_exec_time format 999,990 heading 'SQL EXEC|TIME'

col all_wait_time format 999,990 heading 'ALL WAIT|TIME'
col bg_wait_time format 999,990 heading 'BG WAIT|TIME'
col commit_time format 999,990 heading 'COMMIT|WAIT TM'
col cluster_time format 999,990 heading 'CLUSTER|WAIT TM'
col concurrency_time format 999,990 heading 'CONCURNCY|WAIT TM'
col network_time format 999,990 heading 'NETWORK|WAIT TM'
col system_io_time format 999,990 heading 'SYS IO|WAIT TM'
col user_io_time format 999,990 heading 'USER IO|WAIT TM'
When run the output is something like this:
SNAP                SNAP       DB SQL EXEC ALL WAIT  BG WAIT   COMMIT  CLUSTER CONCURNCY  NETWORK   SYS IO  USER IO
TIME            DURATION     TIME     TIME     TIME     TIME  WAIT TM  WAIT TM   WAIT TM  WAIT TM  WAIT TM  WAIT TM
--------------- -------- -------- -------- -------- -------- -------- -------- --------- -------- -------- --------
11/09/13 06:00     3,615    1,594    1,292      894      154      104       61         6      482       52      114
11/09/13 07:00     3,591    1,638    1,352      907      190       71       38         8      491       98      119
11/09/13 08:00     3,589    2,149    1,869    1,324      188       64       70         8      503       86      518
11/09/13 09:00     3,640    5,516    5,062    3,344      223       83      382        24      582       94    2,081
11/09/13 10:00     3,558   11,193   10,150    6,625      326      264    1,040       146      670      161    4,007
As before, this particular database is relatively inactive before 9am, then becomes more active. Now we can see that the wait time is significant e.g. about 60% of Database Time is actually Wait Time in the hour up to 10am. And we can see that the largest component of this Wait Time is User I/O. We can also see that the next highest Wait Time component is Cluster wait time.

What next? We could either try and drill down into individual wait events, or maybe look at the SQL statements being executed and the waits they experience. More in future blog posts.
Author: "John Brady (noreply@blogger.com)"
Date: Thursday, 21 Nov 2013 18:02
I've previously covered a number of queries that can be combined together to pull out key statistics from the Oracle AWR data for each snapshot it makes, which you could use to analyse the performance of a system. One example would be to use this to look at how the workload changes during the day, and to identify the period of peak activity. Another use would be to see if there were any periods experiencing significantly higher waits than at other times.

I'll now go through how you can use some of the AWR data to do an overall performance analysis of a system. But before we do that, we need to review some basics on performance tuning, just to clarify some things and avoid potential confusion.

Performance Tuning Basics

I'm of the same school of tuning as people like Cary Millsap and Jeff Holt and others - generally speaking you should be tuning specific performance problems as experienced by individual Oracle sessions, and not tuning the system as a whole. This is because any "improvements" you make to the system that do not benefit the slow sessions are completely wasted and pointless. For instance, the system may be experiencing slow disk I/Os, but it might be that the reported slow interactive response time by some users is due to very high CPU usage resulting from a poor execution plan for the particular SQL statement that session is executing. Making the disk I/Os happen quicker might "improve" things for the system as a whole, but it will have no impact at all on the CPU bound SQL statement that is the cause of slow response times on an end user's screen.

That said, there are times where system level tuning is appropriate or when you can only do system level tuning, and that is where AWR can help as it gathers system wide activity data. For session level tuning you need to look elsewhere - such as ASH (Active Session History) or using SQL Trace.

There are some big assumptions being made when we do system level tuning:
  • All sessions are executing the same or similar SQL statements
    • So a system wide "average" view across all sessions is valid
  • Any "poor performance" on the system impacts all sessions equally
    • So even one rogue SQL statement can slow down all other SQL statements being executed from other sessions
  • The system is under a high enough load that performance problems are due to interactions between the SQL statements
    • It is assumed that poorly performing individual SQL statements would have been identified earlier under initial testing
Providing these assumptions or similar are true, then you can do system level tuning.

System Level Tuning Measurements

A computer system exists to do work, and its performance is really about how long it takes to do that work and which resources it uses while doing it. Performance is an issue when a task takes too long to complete.

We need to measure both sides of this – the input work requests, and the output usage of the system resources that results. We can only do performance tuning properly when we know both of these, as this lets us see the impact of any changes we make - does resource usage go down or not for the same input workload?

Measuring the input workload is best done at business or application level transactions. And this is best achieved by instrumenting the application software itself to record both the type of transaction it is performing and the elapsed time it takes when submitted. However, very few applications if any are instrumented this way.

This leaves a database level measurement as the only viable and consistently available way of recording the input workload, and in my view the best available measurement is the number of SQL statements executed per second. While this is very imperfect as a measure of the input workload, it is the nearest we are going to get to it, and all input application transactions result in SQL statements being executed. So it does correlate with input workload.

For a measurement of workload on the system - the output resulting from the execution of the SQL statements - the best measurement is probably Average Active Sessions. AAS is simply the Database Time ("DB Time" from System Time Model) divided by elapsed time for the period over which Database Time was measured. The Database Time value is a measurement of the amount of elapsed time sessions were active executing SQL statements, and includes both active execution time (running on a CPU) and waiting time (for disk I/O or any other wait event). AAS indicates how many sessions were active at the same time executing SQL statements i.e. the level of concurrency on your system.

AAS correlates with the number of CPUs on your system - if all SQL statements executed with no waiting at all they would only use up CPU resource and your transaction rate would be limited by the number of CPUs in the system. If AAS is less than the number of CPUs in your system then you are not reaching capacity. If AAS is close to or more than the number of CPUs then you have an overloaded system and waiting of some form or another is occurring.

To make real use of the AAS value we also want to know the percentage of time spent waiting during each period i.e. total wait time as a percentage of database active time (DB time again). The Percentage Wait (which I label Wait%) indicates the efficiency of the SQL statement execution - if this waiting was eliminated then that wait time would be removed from the elapsed time of each SQL statement executed. This is only relevant when AAS is high enough. When the level of concurrency is very low you can get various anomalies, and there will always be some waiting somewhere. Again remember the assumptions I stated earlier - the system must be under a high enough load to experience performance problems due to the combined workload itself.
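As a quick worked example of these two calculations (the numbers are purely illustrative): 5,516 seconds of Database Time over a 3,640 second snapshot interval gives an AAS of 5,516 / 3,640, which is about 1.5, i.e. on average about one and a half sessions were actively executing SQL at any instant; if 3,344 of those 5,516 seconds were wait time, then Wait% is 3,344 / 5,516, or about 60%.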

That's it - 3 simple measurements should be enough to tell you whether your system is overloaded or not, and whether it is performing efficiently. These can be easily extracted from AWR (see later in this post), and can be graphed in your favourite spreadsheet tool to spot trends and anomalies.

Using these measurements

If you have a system wide bottleneck / capacity limit then checking on the total system resource usage during a period of time will hopefully identify any such bottlenecks. As stated, if AAS is high enough (I would say more than half of the number of CPUs), and the Wait% is also high (say 50% or more), then you have poor performance and it can be improved.

At this point I would use the expanded version of the AWR query that I have built up over the previous posts to get all of the data out of AWR for each snapshot, and again into a spreadsheet or something else. Within this AWR data is a breakdown of the Oracle Wait Time by Wait Class, which tells you which "type" of wait was consuming most of the wait time. This can either primarily be Disk (I/O), Network, Cluster, Commit (Redo), or Concurrency (locking).

Knowing the wait class you can then query AWR for SQL statement execution statistics, sum them over the wait class identified earlier, and sort by that summed wait class time. I showed how to do this in my previous post, summing the number of disk reads per SQL statement executed in each snapshot. This way you easily get to see the top SQL by that wait type, and see how much time they spent waiting relative to the total wait time.

Equally you could just do all SQL statements by their total wait time, to see those that experienced the worst waits whatever type of wait they were. Wait time for a SQL statement would be calculated as its Elapsed time minus the CPU time.
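A minimal sketch of that last calculation against AWR, using the DELTA columns of DBA_HIST_SQLSTAT (the 100 second threshold is just an example to cut down the output):

select sqlstat.sql_id
, sum (sqlstat.elapsed_time_delta - sqlstat.cpu_time_delta) / 1000000 wait_secs
, sum (sqlstat.executions_delta) sql_execs
from dba_hist_sqlstat sqlstat
group by sqlstat.sql_id
having sum (sqlstat.elapsed_time_delta - sqlstat.cpu_time_delta) / 1000000 > 100
order by wait_secs desc
/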

AWR System Performance Query

Here is a query that gives you just the 3 key measurements mentioned before for a system from the AWR data. It reports all the measurements for yesterday - modify the date range constraint to report on different periods.
set feedback off
set verify off
set linesize 1000
set trimout on
set trimspool on
--
col snap_time format a15 heading 'SNAP TIME'
col user_calls_sec format 9,990 heading 'UCALL/S'
col aas format 90.0 heading 'AAS'
col wait_pct format 990.0 heading 'WAIT%'
--
with
snaps as
(select snap_id
, dbid
, end_snap_time
, snap_interval
, extract (second from snap_interval)
+ (extract (minute from snap_interval)
+ (extract (hour from snap_interval)
+ (extract (day from snap_interval) * 24)
) * 60
) * 60 snap_duration
from (select csnaps.snap_id
, csnaps.dbid
, min (csnaps.end_interval_time) end_snap_time
, min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
from dba_hist_snapshot csnaps
group by csnaps.snap_id, csnaps.dbid
)
) -- snaps
, systimes as
-- One row per Database Time Model with change in value between snapshots
(select systime.snap_id
, systime.dbid
, systime.stat_name
, sum (systime.value - psystime.value) value
from dba_hist_sys_time_model systime, dba_hist_sys_time_model psystime
where systime.snap_id = psystime.snap_id + 1
and systime.dbid = psystime.dbid
and systime.instance_number = psystime.instance_number
and systime.stat_id = psystime.stat_id
-- Assume if stat_id the same so is the stat_name
group by systime.snap_id, systime.dbid, systime.stat_name
) -- systimes
, sysstats as
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
, sysstat.dbid
, sysstat.stat_name
, sum (sysstat.value - psysstat.value) value
from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
where sysstat.snap_id = psysstat.snap_id + 1
and sysstat.dbid = psysstat.dbid
and sysstat.instance_number = psysstat.instance_number
and sysstat.stat_id = psysstat.stat_id
-- Assume if stat_id the same so is the stat_name
group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats
, syswaits as
-- One row for total wait time, plus break down into major wait classes, and events
(select sysevent.snap_id
, sysevent.dbid
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
, sum (sysevent.total_waits - psysevent.total_waits) wait_count
from dba_hist_system_event sysevent, dba_hist_system_event psysevent
where sysevent.snap_id = psysevent.snap_id + 1
and sysevent.dbid = psysevent.dbid
and sysevent.instance_number = psysevent.instance_number
and sysevent.event_id = psysevent.event_id
and sysevent.wait_class != 'Idle' -- Ignore Idle wait events
group by sysevent.snap_id
, sysevent.dbid
) -- syswaits
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
, (user_calls_st.value / snaps.snap_duration) user_calls_sec
, (dbtime.value / 1000000) / snaps.snap_duration aas
, (100 * syswaits.time_waited_micro / dbtime.value) wait_pct
from snaps
join (select * from systimes where stat_name = 'DB time') dbtime
on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
join syswaits
on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid
join (select * from sysstats where stat_name = 'user calls') user_calls_st
on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
where snaps.end_snap_time between
(trunc (sysdate) - 1) and (trunc (sysdate))
order by snaps.end_snap_time
/
--
set feedback on
set lines 80
Author: "John Brady (noreply@blogger.com)"
Date: Wednesday, 06 Nov 2013 20:35
This is the last of the main set of posts on this topic. As with the last post, I'll try and keep this brief and post the SQL involved.

Another data set we can look at is on SQL statements i.e. statistics collected by AWR on individual SQL statements, not overall totals. AWR snapshots these per SQL statement statistics from V$SQLSTAT into DBA_HIST_SQLSTAT.

As before we could use the technique of subtracting the values from the previous snapshot from the values of the current snapshot to get the change in value between the snapshots. Unfortunately this runs into issues as not all SQL statements may be present in both snapshots, and SQL statements can be flushed out of the library cache in the SGA and then added back in again later so their statistics have been reset in-between.

Luckily Oracle has solved this problem for us and provided a set of DELTA columns for the most important statistics giving us just what we want. Also this avoids the need to join back to the previous snapshot to calculate the change between the snapshots - the DELTA columns are already the change in value from the previous snapshot.

The main query then to extract useful statistics per SQL statement per database per snapshot is:
select sqlstat.snap_id
, sqlstat.dbid
, sqlstat.sql_id
, sum (sqlstat.elapsed_time_delta) sql_time
, sum (sqlstat.executions_delta) sql_execs
, sum (sqlstat.disk_reads_delta) sql_reads
from dba_hist_sqlstat sqlstat
group by sqlstat.snap_id
, sqlstat.dbid
, sqlstat.sql_id
We cannot use this query directly in our current main AWR summary query as it returns multiple data rows per snapshot - one per SQL statement captured by that snapshot. The obvious thing to do is to sum the values over all of the SQL statements executed in that snapshot to produce one set of values per snapshot. Unfortunately this doesn't really give us anything useful - we already have data values from AWR for these statistics across the whole system (SQL execution time, number of SQL statements executed, and number of disk reads), and when added up across all the SQL statements it doesn't help us see if there are any anomalies within the SQL statements themselves.

A better use for this query is to run it separately within another query that instead groups by SQL_ID and sums over multiple snapshots e.g. all snapshots in one day. This is a useful way of seeing which particular SQL statements put a greater load on the system than others. "Load" could be number of executions, CPU used, disk reads, or some other measurement.

For example, one system I was looking at recently was doing a lot of disk reads and I could see that there were a number of table scans occurring. So I wanted to identify the SQL statements causing these table scans i.e. the SQL statements with the highest disk reads. By summarising over a whole day I could ensure that I was looking at the worst offenders that were executed multiple times during the day, and not a bad query executed only once.

The following query reports SQL statements captured by AWR yesterday ("sysdate - 1" truncated to midnight), sorted by total number of disk reads. The "having sum (sqlstats.sql_reads) > 100000" clause is a filter so that not all SQL statements are listed, only those whose total disk reads over the day are significant - you can increase or decrease this threshold based on how active your system is. You may want to start higher, at a million, and then reduce it by a factor of 10 until you get enough SQL statements listed.
with 
snaps as
(select csnaps.snap_id
, csnaps.dbid
, min (csnaps.end_interval_time) end_snap_time
, min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
from dba_hist_snapshot csnaps
group by csnaps.snap_id, csnaps.dbid
) -- snaps
, sqlstats as
(
-- the per SQL statement statistics query given above
select sqlstat.snap_id
, sqlstat.dbid
, sqlstat.sql_id
, sum (sqlstat.elapsed_time_delta) sql_time
, sum (sqlstat.executions_delta) sql_execs
, sum (sqlstat.disk_reads_delta) sql_reads
from dba_hist_sqlstat sqlstat
group by sqlstat.snap_id
, sqlstat.dbid
, sqlstat.sql_id
)
select sqlstats.sql_id
, sum (sqlstats.sql_reads) sql_reads
, sum (sqlstats.sql_execs) sql_execs
, sum (sqlstats.sql_time) / 1000000 sql_time
from snaps
join sqlstats
on snaps.snap_id = sqlstats.snap_id and snaps.dbid = sqlstats.dbid
where snaps.end_snap_time between
(trunc (sysdate) - 1) and (trunc (sysdate))
group by sqlstats.sql_id
having sum (sqlstats.sql_reads) > 100000
order by sql_reads desc
/
Note that the time values are in microseconds and so must be divided by one million to output them as seconds.
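As with the other queries in this series, you can pre-define column formats in SQL*Plus so the output lines up nicely - for instance:
col sql_id    format a13
col sql_reads format 999,999,990
col sql_execs format 999,990
col sql_time  format 999,990.99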

A further refinement to this query is to restrict it to the main working hours, say 8am to 6pm:
   and extract (hour from snaps.end_snap_time) between 8 and 17 -- 8:00 to 17:59
This avoids issues with any overnight batch jobs you may be running or the Oracle default job to update stale statistics on database objects (which does do a lot of disk reads).

When I used this I was able to identify four SQL statements responsible for about 25% of the disk reads on the system, each having a relatively low execution count - under 100 executions each for millions of disk reads in total. Each was clearly doing full table scans, which I validated by checking the execution plans of each - there is an AWR report supplied with Oracle that reports this information for a SQL statement across a number of snapshots (awrsqrpt.sql).
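A quicker alternative to running the full awrsqrpt.sql report is to pull the captured execution plans for one statement straight out of the AWR with DBMS_XPLAN - a minimal sketch, where the substitution variable supplies the SQL_ID you are interested in:
select * from table (dbms_xplan.display_awr ('&sql_id'))
/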

Now that I knew the top contributors to disk reads on the system I was able to investigate each SQL statement individually and work out changes to improve their execution. This was a mixture of extra indexes, SQL rewrites and other database changes.
Author: "John Brady (noreply@blogger.com)"
Date: Monday, 28 Oct 2013 20:22
So far our AWR report query shows overall duration between two consecutive snapshots, database time (i.e. doing work), SQL execution time (may be lower), total wait time, wait time by class, and some common specific wait events. We know how busy the database was and how much time it spent waiting and on what. Now I want to add in some more data captured by the AWR on system activity. This series is dragging on a bit, so I'm going to try to keep this one shorter and just get it posted out.

AWR snapshots V$SYSSTAT into DBA_HIST_SYSSTAT. This contains individual system statistics with cumulative running totals for each. These are stored as Name / Value pairs in each table. Most statistics are just counts of event occurrences e.g. "physical reads", "physical writes", but some have been added that are time values. This lets you calculate average time per event when you have the corresponding event count as well. Beware that many of the "time" values are in odd units, such as hundredths of seconds (1/100 second), and will need to be adjusted if you want to report in standard units of time.

Statistics recorded include the following, which I've tried to group together into related sets. The details don't really matter on these, as the point of these AWR Summary Reports is to pull out as much as we can at once from a database for later analysis. Some of these statistics might only be relevant in specific scenarios - but by pulling them out at the same time as others, we already have them when they are needed.
  • SQL statement execution - how many? (Elapsed time is in the Database Time Model data set)
    • "execute count" - SQL statements executed, both user and recursive
    • "user calls" - number of user calls such as login, parse, fetch, or execute
    • "recursive calls" - SQL calls made internally by Oracle on behalf of user SQL
  • Transactions - units of work i.e. groups of related SQL statements
    • "user commits" - commit count
    • "user rollbacks" - rollback count i.e. abandoned transactions
  • Average redo write time - how long to write to the redo log?
    • "redo write time" - total write time, in hundredths of a second i.e. tens of milliseconds
    • "redo writes" - number of redo writes that occurred
    • Also "redo blocks written" - divide by "redo writes" to get average blocks per write
    • Redo writes can be broken down into those due to a "commit" and others (log buffer full enough to warrant a write)
    • Commit driven writes are the "redo synch" statistics - average commit write time is "redo synch time" / "redo synch writes"
  • Disk I/O - how many disk operations? (Elapsed time is in the Wait Events data set)
    • "physical reads" - physical reads
    • "physical writes" - physical writes
  • Logical reads - "consistent gets" + "db block gets"
    • Note that "session logical reads" is defined as the sum of "db block gets" and "consistent gets", but might also include logical reads from process private memory.
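If you want to find other statistics worth extracting, you can list the statistic names that the AWR has actually captured - a simple query to explore what is available:
select distinct stat_name
from dba_hist_sysstat
order by stat_name
/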
Because the statistics are stored as Name / Value pairs, each statistic is in a separate row rather than a separate column in the same row as other statistics. We can use a sub-query that retrieves each change in statistic value for each AWR snapshot, and then use this within the main query in an in-line view for each statistic with an appropriate alias name.
, sysstats as 
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
, sysstat.dbid
, sysstat.stat_name
, sum (sysstat.value - psysstat.value) value
from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
where sysstat.snap_id = psysstat.snap_id + 1
and sysstat.dbid = psysstat.dbid
and sysstat.instance_number = psysstat.instance_number
and sysstat.stat_id = psysstat.stat_id
-- Assume that if the stat_id is the same then so is the stat_name
group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats

select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
...
, execs.value executions
, user_calls_st.value user_calls
, user_commits_st.value user_commits
, redo_write_st.value redo_writes
, redo_time_st.value / 100 redo_write_time
, phys_reads.value physical_reads
, phys_writes.value physical_writes
, table_scans_st.value table_scans
from
...
join (select * from sysstats where stat_name = 'execute count') execs
on snaps.snap_id = execs.snap_id and snaps.dbid = execs.dbid
join (select * from sysstats where stat_name = 'user calls') user_calls_st
on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
join (select * from sysstats where stat_name = 'user commits') user_commits_st
on snaps.snap_id = user_commits_st.snap_id and snaps.dbid = user_commits_st.dbid
join (select * from sysstats where stat_name = 'redo writes') redo_write_st
on snaps.snap_id = redo_write_st.snap_id and snaps.dbid = redo_write_st.dbid
join (select * from sysstats where stat_name = 'redo write time') redo_time_st
on snaps.snap_id = redo_time_st.snap_id and snaps.dbid = redo_time_st.dbid
join (select * from sysstats where stat_name = 'physical reads') phys_reads
on snaps.snap_id = phys_reads.snap_id and snaps.dbid = phys_reads.dbid
join (select * from sysstats where stat_name = 'physical writes') phys_writes
on snaps.snap_id = phys_writes.snap_id and snaps.dbid = phys_writes.dbid
join (select * from sysstats where stat_name = 'table scans (long tables)') table_scans_st
on snaps.snap_id = table_scans_st.snap_id and snaps.dbid = table_scans_st.dbid
...
This means that we can now correlate overall system performance (Database Time and Wait Time) against other activities on the system and drill down further. Using the template given above you can extract any System Statistic you are interested in, and it is easy to add other ones in the future when you need them.
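For example, one derived statistic I find useful is the average redo write time in milliseconds. Since "redo write time" is in hundredths of a second, a sketch of the extra select list item, assuming the aliases used above, would be:
, (redo_time_st.value * 10) / nullif (redo_write_st.value, 0) avg_redo_write_ms
The nullif avoids a divide by zero error for a snapshot with no redo writes.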
Author: "John Brady (noreply@blogger.com)"
Date: Tuesday, 15 Oct 2013 20:52
Given the previous sub-query for total wait times, including class level waits, from the AWR snapshot data, we are going to extend that particular sub-query to add some extra wait event data items. This will give us a further breakdown of what the major wait events were, and how much time was spent waiting versus doing real work. The upside is that we can pull out some other wait event times from the AWR snapshots in the same query, for little extra cost or increase in complexity. The downside is that the number of data items being selected for output increases a lot, and this may start to look like a "mother of all database monitoring queries". However, the query itself is not more complex, and the intention is to get as much data as we can from each AWR snapshot in one hit, to allow us to analyse it later outside Oracle (typically in a spreadsheet).

But before we do that I want to refactor that sub-query to make some of the later additions to it easier and simpler. Remember that we had just one sub-query, with each wait time calculated by subtracting the previous snapshot's wait time from the current snapshot's wait time. I want to refactor this by pushing this subtraction down into a nested sub-query, so that the rest of the wait sub-query becomes simpler.

So before we had elements like:
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time
, sum (decode (sysevent.wait_class, 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time
The refactor changes this to:
, syswaits as 
-- One row for total wait time, plus breakdown into major wait classes and events
(select syswaitevents.snap_id
, syswaitevents.dbid
, sum (syswaitevents.time_waited_micro) all_wait_time
, sum (decode (syswaitevents.wait_class, 'Commit', syswaitevents.time_waited_micro, 0)) commit_time
...
from
(select sysevent.snap_id
, sysevent.dbid
, sysevent.wait_class
, sysevent.event_name
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
, sum (sysevent.total_waits - psysevent.total_waits) wait_count
from dba_hist_system_event sysevent, dba_hist_system_event psysevent
where sysevent.snap_id = psysevent.snap_id + 1
...
group by sysevent.snap_id, sysevent.dbid, sysevent.wait_class, sysevent.event_name
) syswaitevents
group by syswaitevents.snap_id
, syswaitevents.dbid
) -- syswaits
The key points are:
  • The inner sub-query groups by wait class and individual event name, as well as by snapshot and database id, and sums the wait time to this granularity.
  • It also sums the total occurrences of each wait event, for further use in the outer query.
  • The outer sub-query can now just refer to "syswaitevents.time_waited_micro" as the elapsed wait time, as the subtraction between the current and previous snapshot values is done in the inner sub-query.
  • The outer sub-query still does the "decode" of the class name to extract individual sums for each class desired.
Previously I said that there were too many individual events to report them all - around 100 individual wait events. However, experienced DBAs find that they tend to run across the same common set of wait events that occur most often on "slow" systems. So there is a strong enough argument for extracting and reporting the wait times of those individual wait events that you consider to be the "most common" ones.

The benefit is that if any one of these particular wait event times is very high, then we already have extracted it and reported it for analysis and can see it in the output. If all of these particular wait event times are low, then there was almost no extra cost involved in retrieving them at the same time as the other wait event times (there is no extra disk I/O, only some extra CPU to process the wait time data). It avoids the need to run another query afterwards to drill down into the wait event data, and if the extra wait data is irrelevant then we can just ignore it. And the mechanism is the same one as for the wait classes, using "decode" against the event name.

The top wait events that I am going to include are listed below. It's not a large list as I want to focus on the query, not the specific wait events. You can disagree with some of these, or want to add some others of your own - you can do this just by modifying the query I give below and changing the event names used in the "decode" functions. (A query for finding candidate events on your own system follows the list.)
  • db file sequential read
  • db file scattered read
  • direct path read
  • log file sync
  • log file parallel write
  • gc buffer busy
  • gc cr block busy
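To choose your own candidates, it helps to see which non-idle events have accumulated the most wait time on your system since instance startup. A simple sketch against V$SYSTEM_EVENT:
select *
from (select event, wait_class, total_waits, time_waited_micro
      from v$system_event
      where wait_class != 'Idle'
      order by time_waited_micro desc)
where rownum <= 20
/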
As well as the total wait time for each event, we would also want to know the number of waits that occurred, so we can calculate the average time per wait. This is important because it might not be the number of waits that is the problem but the average time per wait. The inner sub-query already calculates the change in "total_waits" between snapshots as "wait_count", so we can achieve a per event count by summing this "wait_count" when the wait event name matches, and zero (0) otherwise. This keeps it uniform with the wait time per event.

The only changes to the refactored sub-query given earlier are the addition of these extra data items to the "select" output list:
, sum (decode (syswaitevents.event_name, 'db file sequential read', syswaitevents.time_waited_micro, 0)) db_file_sqrd_time
, sum (decode (syswaitevents.event_name, 'db file sequential read', syswaitevents.wait_count, 0)) db_file_sqrd_count
, sum (decode (syswaitevents.event_name, 'db file scattered read', syswaitevents.time_waited_micro, 0)) db_file_sctrd_time
, sum (decode (syswaitevents.event_name, 'db file scattered read', syswaitevents.wait_count, 0)) db_file_sctrd_count
, sum (decode (syswaitevents.event_name, 'direct path read', syswaitevents.time_waited_micro, 0)) direct_path_read_time
, sum (decode (syswaitevents.event_name, 'direct path read', syswaitevents.wait_count, 0)) direct_path_read_count
, sum (decode (syswaitevents.event_name, 'log file sync', syswaitevents.time_waited_micro, 0)) log_file_sync_time
, sum (decode (syswaitevents.event_name, 'log file sync', syswaitevents.wait_count, 0)) log_file_sync_count
, sum (decode (syswaitevents.event_name, 'log file parallel write', syswaitevents.time_waited_micro, 0)) log_file_prl_wr_time
, sum (decode (syswaitevents.event_name, 'log file parallel write', syswaitevents.wait_count, 0)) log_file_prl_wr_count
, sum (decode (syswaitevents.event_name, 'gc buffer busy', syswaitevents.time_waited_micro, 0)) gc_buff_busy_time
, sum (decode (syswaitevents.event_name, 'gc buffer busy', syswaitevents.wait_count, 0)) gc_buff_busy_count
, sum (decode (syswaitevents.event_name, 'gc cr block busy', syswaitevents.time_waited_micro, 0)) gc_cr_blk_busy_time
, sum (decode (syswaitevents.event_name, 'gc cr block busy', syswaitevents.wait_count, 0)) gc_cr_blk_busy_count
Then we need to add these extra columns to the output of the main "select", remembering to divide times in microseconds by one million:
, syswaits.db_file_sqrd_time / 1000000 db_file_sqrd_time
, syswaits.db_file_sctrd_time / 1000000 db_file_sctrd_time
, syswaits.direct_path_read_time / 1000000 direct_path_read_time
, syswaits.log_file_sync_time / 1000000 log_file_sync_time
, syswaits.log_file_prl_wr_time / 1000000 log_file_prl_wr_time
, syswaits.gc_buff_busy_time / 1000000 gc_buff_busy_time
, syswaits.gc_cr_blk_busy_time / 1000000 gc_cr_blk_busy_time
, syswaits.db_file_sqrd_count db_file_sqrd_count
, syswaits.db_file_sctrd_count db_file_sctrd_count
, syswaits.direct_path_read_count direct_path_read_count
, syswaits.log_file_sync_count log_file_sync_count
, syswaits.log_file_prl_wr_count log_file_prl_wr_count
, syswaits.gc_buff_busy_count gc_buff_busy_count
, syswaits.gc_cr_blk_busy_count gc_cr_blk_busy_count
Note that I have chosen to keep the times together and the counts together, rather than alternate between them for each event in turn. This particular ordering makes it easier to copy an "average wait time" formula from one cell to another in a spreadsheet, as each referenced column shifts by one column in each case.

With these columns added, the output from my AWR summary report becomes:
SNAP                SNAP       DB SQL EXEC ALL WAIT  BG WAIT   COMMIT  CLUSTER CONCRNCY  NETWORK   SYS IO  USER IO  DBF SEQ DF SCTRD DRCT PTH  LOG FILE LG FL PR  GC BUFF GC CRBLK    DBF SEQ   DF SCTRD   DRCT PTH   LOG FILE   LG FL PR    GC BUFF   GC CRBLK
TIME            DURATION     TIME     TIME     TIME     TIME  WAIT TM  WAIT TM  WAIT TM  WAIT TM  WAIT TM  WAIT TM  RD TIME  RD TIME  RD TIME SYNC TIME  WR TIME BSY TIME BSY TIME     RD CNT     RD CNT     RD CNT   SYNC CNT     WR CNT    BSY CNT    BSY CNT
--------------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- --------- -------- -------- -------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
11/09/13 06:00     3,615    1,594    1,292      894      154      104       61        6      482       52      114       85        3       24       104       22        0       18     20,096        880     31,531     45,162     47,067         18      1,390
11/09/13 07:00     3,591    1,638    1,352      907      190       71       38        8      491       98      119       53        9       53        71       49        0        2     13,934      2,584     51,890     47,190     49,327         30        232
11/09/13 08:00     3,589    2,149    1,869    1,324      188       64       70        8      503       86      518      388       14       98        64       41        0        2    199,512      5,722     74,116     49,184     51,518         28        849
11/09/13 09:00     3,640    5,516    5,062    3,344      223       83      382       24      582       94    2,081    1,681       38      218        83       50      146        3    635,825     10,174    147,400     80,669     85,350     65,098      2,136
11/09/13 10:00     3,558   11,193   10,150    6,625      326      264    1,040      146      670      161    4,007    2,980      227      710       264       87      106       34    744,366     90,168    415,035    129,443    136,572    116,076      8,471
11/09/13 11:00     3,630   16,981   15,578   10,260      486      371    1,294      116      865      255    7,116    5,804      368      495       371      144       14       18  1,346,338    162,856    274,090    191,730    201,510        834      4,778
11/09/13 12:00     3,609   15,954   14,349    9,516      497      508    1,456      143      876      231    5,887    4,979      185      579       508      125       70       68  1,230,115     92,290    317,980    186,878    197,001      3,207      8,931
11/09/13 13:00     3,602   10,800    9,651    6,013      383      278      774       81      747      181    3,665    3,191       87      323       278      102       18       21    954,932     32,045    200,258    173,877    184,167        661      9,443
11/09/13 14:00     3,562    8,532    7,562    4,359      319      203      488       55      698      158    2,566    2,364       25      149       203       92       22       12    553,064      6,607    133,380    159,407    169,369        388      6,391
11/09/13 15:00     3,601   11,486   10,210    6,721      402      374      899      111      751      189    4,102    3,485      157      304       374      109       42       50    814,112     79,733    214,011    172,358    181,824      1,944      7,554
As I said at the beginning this is a lot more columns of data than before. But the whole point is that we can now cut and paste this into a spreadsheet and do further analysis of the data to identify high values of one form or another. Now we not only have a break down of wait time by wait class, we also have the times and counts for the most common wait events.

Previously we saw that from 9am to 10am the major wait time was User I/O at 4,007 seconds out of 6,625 seconds for all waits. Now we can see that the one wait event of "db file sequential read" accounted for almost 3,000 seconds (2,980) out of the 4,000 for User I/O. We can also see that the next longest wait is probably "direct path read" at 710 seconds. So we now know which wait event contributed the most wait time, and how many times it occurred. And this lets us calculate the average "db file sequential read" for that period as being 0.004 seconds i.e. 4 milliseconds. Depending on your particular situation you may view this as being a high average time, or normal and acceptable. If the average wait time is acceptable then you would look at the number of such reads and try and find which SQL statements are causing them and tune those.
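If you would rather have Oracle calculate such averages for you, they can be derived in the main select list too. A sketch for the average "db file sequential read" time in milliseconds, assuming the syswaits aliases used above (nullif avoids dividing by zero):
, (syswaits.db_file_sqrd_time / 1000) / nullif (syswaits.db_file_sqrd_count, 0) db_file_sqrd_avg_ms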
Author: "John Brady (noreply@blogger.com)"
Date: Sunday, 08 Sep 2013 18:05
Following on from my previous post on doing summary reports on the data captured in the AWR, here is some example output from using that query, and a useful statistic you can derive from the output data it produces.

The previous query just gave us 3 time values:
  • Duration - real clock time that elapsed between the 2 AWR snapshots being reported on
  • Database Time - time spent by the database being active
  • SQL Execution Time - time spent by the database executing SQL statements
If you use the following WHERE clause within the SQL I gave before, you get the data values that have been captured so far today:
where snaps.end_snap_time > trunc (sysdate)
And with suitable column formatting
col snap_duration format 999,990.9
col db_time format 999,990.99
col sql_exec_time format 999,990.99
you get output like the following:
SNAP_TIME      SNAP_DURATION     DB_TIME SQL_EXEC_TIME
-------------- ------------- ----------- -------------
15/08/13 00:00       3,602.0    1,637.70      1,394.76
15/08/13 01:00       3,605.1    1,378.69      1,175.09
15/08/13 02:00       3,586.6    2,219.12      2,024.38
15/08/13 03:00       3,619.0    1,436.43      1,265.08
15/08/13 04:00       3,626.7    1,190.99      1,021.95
15/08/13 05:00       3,552.6    1,216.76      1,051.03
15/08/13 06:00       3,602.9    1,287.73      1,104.61
15/08/13 07:00       3,594.1    1,595.69      1,392.66
15/08/13 08:00       3,606.4    2,351.33      2,132.49
15/08/13 09:00       3,609.6    3,449.07      3,132.93
15/08/13 10:00       3,622.1    9,128.43      8,266.52
15/08/13 11:00       3,598.8   12,355.47     10,913.63
And as mentioned you can cut and paste this into a spreadsheet and derive other statistics from these measurements.

One obvious measurement is Average Active Sessions, which is a measure of how many sessions were active at the same time. An "active session" will be doing work and will be measured as an increase in "Database Time". So dividing "Database Time" by the elapsed time between the consecutive snapshots will give you the Average Active Sessions.
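This can also be calculated directly in the SQL itself. A sketch of an extra select list item for the AWR summary query in the previous post, assuming its snaps and systimes sub-queries (the interval conversion is repeated because a column alias cannot be reused within the same select list):
, (dbtime.value / 1000000)
  / (extract (second from snaps.snap_interval)
     + 60 * (extract (minute from snaps.snap_interval)
     + 60 * (extract (hour from snaps.snap_interval)
     + 24 * extract (day from snaps.snap_interval)))) aas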

We can see that between 8 and 9am the AAS was just under 1 (0.96), while between 9 and 10am it had risen to 2.52, and between 10 and 11am it was 3.43. Depending on the number of CPUs in your system this may or may not indicate an overloaded system. This particular system has 8 CPUs, so it can easily handle under 4 concurrently active sessions and is not overloaded.

Next, I'll add on some more SQL sub-queries to the main query to pull out other data from the AWR, telling us about other things that happened on the system.
Author: "John Brady (noreply@blogger.com)"
Date: Wednesday, 14 Aug 2013 20:28
The addition of the Automatic Workload Repository (AWR) in Oracle 10g was a good thing, as its regular hourly snapshots collect and store a number of database wide activity statistics which allow historical reporting later on. However, the Oracle supplied standard AWR report (awrrpt) only provides detailed information on the activity that happened in one period of time between 2 snapshots. There is no summary report of activity across all of the snapshots in the AWR, which means that to use the standard AWR report you first have to know which period of time is the important one you should be looking at. So how do you identify the "most active" time period?

The lack of any standard reports or analysis that works on the whole of this data set and summarises it seems to be something of a missed opportunity to me. And something that is vital when analysing historical activity and trying to identify hot spots. So I decided to write my own reports summarising activity between each snapshot over a number of snapshots i.e. over a number of days.

For each snapshot the report lists the change in value of the main activity statistics since the previous snapshot, all on one output line per snapshot, ordered by snapshot time. I can then use this output to determine the most active period (e.g. high SQL executions, high DB time) or the highest wait events (wait time) or something else. Generally I do this by loading the output into Excel (fixed width format data fields) and either glancing down some critical columns of data, or graphing them and looking for spikes. And within Excel I can also enter formulae and derive other statistics e.g. Average Active Sessions from DB Time and Elapsed Time.

Your mileage will vary with my report - you may not want some of the statistics I report, you may want some other ones, or you may want them formatted differently. But you can always edit the SQL and tweak it to your satisfaction. The main benefit is that you get all of the activity history captured in the AWR summarised into key measurements per snapshot. And this can be used to easily identify the busiest period, and also examine bottlenecks that give rise to high wait events.

I won't give all the raw SQL, as it is very long with multiple joins between many AWR tables. What I will do is build up the components of the SQL in stages, so you can build your own customised AWR query. I did create this particular SQL query myself, rather than simply copying someone else's, but I have seen the same concept used by others - for instance Karl Arao. It is interesting to see both how other people have pulled out this data (the SQL they use), and which statistics they deem most important and why.

Core Query - List of Snapshots

The core query is one that provides the list of snapshots, and the elapsed time of each one. I use the "WITH" sub-query clause of the "SELECT" query to keep each sub-query separate and make the final query simpler and easier to read.
with
snaps as
(select csnaps.snap_id
, csnaps.dbid
, min (csnaps.end_interval_time) end_snap_time
, min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
from dba_hist_snapshot csnaps
group by csnaps.snap_id, csnaps.dbid
) -- snaps
This provides a list of snapshot identifiers, each of which is a foreign key that appears in all other AWR tables and identifies one snapshot of activity data. It also provides some other key data items, which can be important when summarising the data:
  • Database Identifier (dbid). This should be the same for all snapshots, but may not be under some circumstances. It is important to use in joins to other tables to ensure you are only getting activity data for the same database.
  • When the snapshot period finished (end_interval_time)
  • Snapshot duration (end time minus start time)
On a RAC system you will get separate snapshots of data for each instance, which means that there can be more than one end and start time per snapshot (one per instance). To get just one value for the start and end times I use the "min" aggregate function.

Database Time Model

This sub-query provides the set of Database Time Model data statistics per snapshot e.g. DB Time, sql execute elapsed time.
, systimes as 
-- One row per Database Time Model with change in value between snapshots
(select systime.snap_id
, systime.dbid
, systime.stat_name
, sum (systime.value - psystime.value) value
from dba_hist_sys_time_model systime, dba_hist_sys_time_model psystime
where systime.snap_id = psystime.snap_id + 1
and systime.dbid = psystime.dbid
and systime.instance_number = psystime.instance_number
and systime.stat_id = psystime.stat_id
-- Assume that if the stat_id is the same then so is the stat_name
group by systime.snap_id, systime.dbid, systime.stat_name
) -- systimes
As before I group by the snapshot identifier, database identifier, and also statistic name. As the statistics values are cumulative, I am interested in the change in the value between two consecutive snapshots. So I join the Sys Time Model table to itself on the snapshot identifier being one more than the previous snapshot, and subtract the two data values (value column in this particular case) to get the change in value between the two snapshots. In case there is more than one data value per snapshot, which there will be on a RAC database, I also join by the instance number and add up the values together (sum aggregate function).

AWR Summary #1

Combining the two sub-queries together with suitable join conditions gives the following SQL to summarise database activity times between all snapshots:
with
snaps as
...
, systimes as
...
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
, extract (second from snaps.snap_interval)
+ (extract (minute from snaps.snap_interval)
+ (extract (hour from snaps.snap_interval)
+ (extract (day from snaps.snap_interval) * 24)
) * 60
) * 60 snap_duration
, dbtime.value / 1000000 db_time
, sqlexectime.value / 1000000 sql_exec_time
from snaps
join (select * from systimes where stat_name = 'DB time') dbtime
on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
join (select * from systimes where stat_name = 'sql execute elapsed time') sqlexectime
on snaps.snap_id = sqlexectime.snap_id and snaps.dbid = sqlexectime.dbid
order by snaps.end_snap_time
/
Notes:
  • The "snap_interval" is an INTERVAL data type, so to convert to just seconds I need to extract each time field component, multiply it up and add on to the other time components.
  • The times in the Sys Time Model are in microseconds, so I divide by one million to get the value in seconds.
  • The Sys Time Model data is stored as name / value pairs, so I need to query the same data set twice restricting to one named statistic each time. I do this in an inline view within the query.
  • The joins between all data sets are on the Snapshot Identifier and the Database Identifier.
  • From these data items you can derive other useful numbers e.g. Average Active Sessions is DB Time divided by Snapshot Duration.
This reports everything in the AWR, but you can restrict it to certain days by adding in a suitable WHERE clause. Here is one I use in SQL*Plus that refers to 2 numbers passed in as arguments when you invoke the script.
 where snaps.end_snap_time between 
(trunc (sysdate) - &1) and (trunc (sysdate) - &1 + &2)
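For example, passing in 7 and 2 as the two arguments would report on the 2 days starting 7 days ago.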
You can also format the columns nicely in SQL*Plus by pre-defining column formats:
col snap_duration format 999,990.9
col db_time format 999,990.99
col sql_exec_time format 999,990.99
In a future post I'll show some other sub-queries against the AWR tables that can be added in to pull out SQL statement execution statistics, and wait events, and other statistics.
Author: "John Brady (noreply@blogger.com)"
Date: Sunday, 21 Jul 2013 14:54
On the one hand source code control can be a nightmare involving complicated tools and formal processes, but on the other hand the ability to track the history of changes to a file can be really useful. With complicated SQL queries and stored procedures, seeing what changed between each major version can be useful when debugging a change in behaviour or when application performance degrades. What is needed is something that is straightforward enough to use (not overcomplicated), does not get in the way of what you really want to do, helps you easily see what changed and when, and provides the right amount of functionality (nothing essential missing). If it fails on any of these then it is probably not worth using.

Having stumbled across Mercurial about 4 years ago when I needed something for a project I was working on, over time I have found it will do almost everything I need, and I've not looked back since. I use it for pretty much anything - most obviously Oracle SQL scripts, database build scripts, stored procedures, and other programming languages (I like Python which I've mentioned before). But it will work on any kind of text file.

Why Mercurial? What are the main things I like about it?
  • Command line interface and various GUI options
    • Command line access makes direct use easy if you prefer that
    • GUI options include a web browser interface, TortoiseHg GUI front end, and integration with other tools e.g. Eclipse
  • Truly tracks changes to files
    • It really records what changed between your saves (commits), not just a copy of the new file
    • This makes more advanced things really easy and almost trivial e.g. taking a change and applying it as a patch to another version
  • Changes saved as "Change Sets" - groups of changes to files are saved together at the same time
    • This is logically what you want - all your changes saved together as one set
  • Changes saved when you say - as frequently or infrequently as you want
  • You always have a full copy of the source code tree locally
    • None of this stuff about needing to check out in advance files you will need to change
  • Detects which files have changed or been added
    • Makes it really easy when saving changes - helps you avoid "forgetting" an important file
    • Also makes it easy to see if anyone else has changed any other files
  • Its repository (history of changes) sits alongside your source code in a separate directory at the top level
    • Its metadata is kept separate from your source code files
      • There are no extra metadata files or sub-directories mixed in with your source code files
  • Many advanced features too
    • Branching within repositories - different sets of changes from a common point in history
      • Useful for maintaining released versions, and for bug fixes
    • Linked repositories - parent / child relationship between repositories
      • Another way of doing branches and separating released versions from in-development ones
      • Change Sets made to one repository can be pushed across to another repository e.g. bug fixes
    • Fully distributed - multiple copies of a repository can exist and be reconciled with each other
      • Developers can work on their own copy of the source code, then push changes back into group level repositories
    • Flexibility over repository hierarchies
      • Can be flat (master and one per developer) or very hierarchical (master, testing, project, sub-project, developer)
I'm probably making it sound more complicated than it is. I use it all the time in a very straightforward way because it is so easy, and it lets me see my change history if I ever need to. From my perspective here is what I do on any discrete project I work on.
  • If there is any existing source code for the project get a copy of it locally
  • Initialise the Mercurial repository ("hg init") and save the baseline
    • hg add (to start tracking the existing files), then hg commit -m 'Baseline'
  • Edit any files I need to edit, test, and repeat until I'm satisfied
  • Save my changes locally in Mercurial
    • hg commit -m 'Description of what I did'
  • If changes need to be propagated to another source code control tool, work out what files changed
    • TortoiseHg gives you a straightforward view of what files were changed in each commit
    • Or use the command line to see what changed in the last commit "hg log -v"
    • Or any previous commit (revision number) "hg log -v -r REV#"
  • If using another source code control tool, synchronise with it when needed
    • Update my local copy of files from the other source code control tool with their latest version
    • Mercurial will detect what files have changed, if any
    • Save the changes in Mercurial with an appropriate message
      • hg commit -m 'Synced with master for latest changes by others'
  • If the intended changes are tentative or need extensive testing, I first take a full copy of the source code tree (Mercurial repository)
    • hg clone existing-repo repo-copy-name
    • This keeps my new changes separate from the original ones, and avoids issues over needing to back out rejected changes later on
    • After testing if the changes are accepted I "push" them back to the parent repository it was cloned from
      • Otherwise I just delete the copy of the source code and abandon those changes
    • All changes made since the repository was cloned are pushed back to the parent
      • So the number of local saves I did during editing and testing does not matter - all my changes are propagated back
  • Having multiple clones lets me work on different projects at the same time, without mixing up their changes in the same source code tree
As I said before, this lets me easily keep a local repository tracking all changes made to source code files, so I can see who changed what and when. It also enables me to keep multiple copies of a master source code tree and test out different changes independently of each other, and only push back changes when thoroughly tested and accepted by everyone else.

Your mileage will vary, but I find Mercurial easy enough to use, and it provides the right level of tracking of source code changes that I want.
Author: "John Brady (noreply@blogger.com)"
Date: Saturday, 06 Jul 2013 16:09
I find that during my technical work on Oracle and Performance Tuning there are a few key pieces of software that I keep coming back to again and again. They are not directly related to databases, and are really more like technical tools that help me do my work better. Each does a unique thing, and does it so well that they have become standard tools I use all of the time one way or another.

So my top technical utilities that I could not do without, in no particular order, are:
  • TiddlyWiki - a great personal Wiki for keeping together all the bits of useful information I come across
  • Mercurial - source code control, great for tracking the history of all my SQL scripts
  • VirtualBox - virtualised environments for testing multiple Linux and Oracle versions
  • Trello - an online list tool, great for do lists and following mini projects
I'm not trying to say that each of these is the best possible for each thing. I'm saying that they are so easy and straightforward to use that I use them again and again. And their feature set is great in each case - just what I want, no more, no less. Each of these tools has advantages that makes it more useful than other options out there or just not using anything at all.

TiddlyWiki is a self contained Wiki in a single HTML file, with a built in editing capability. So you can carry around your own Wiki on a memory stick as a single HTML file, and keep adding to it any time you want to. As ever it uses a bit of meta-syntax to embed formatting and hyperlinks. But that is it - no complicated set up or configuration, or other dependencies. It's portable too - I use Firefox as my browser, on Windows or Linux or anything else.

Mercurial tracks changes to text files i.e. source code versioning. Great for tracking changes to all of my numerous SQL scripts over time. Again, no complicated set up, though you do need to install it first (either build from scratch or as an installable package if available for your operating system). It keeps a record of what you changed in each file whenever you do a commit (an update to the history in its repository). It is dead easy to see the history of changes to any given file (often difficult for some tools) - specifically what changed and when. I've used it for all kinds of "source code files" on projects and been able to debug bad changes by working out when certain changes were made and by whom. You might never need to go "back in time" to see old versions of scripts, but when you do need to it is invaluable. And for just a little disk overhead of keeping the change history.

I'm sure many of you know about VirtualBox for creating virtual machines. I can easily create a new virtual machine, install what variant of Linux I want, and do any testing I want - all in isolation from other things. Really useful for creating sandpit test environments, and throwing together specific combinations of software without needing a dedicated system.

Trello is my latest discovery from about 2 years ago. I've always wanted a "good enough" do list tool, but nothing too complicated or over the top - not full blown project management. I want to add new items easily, add new lists, move things between lists, order them as I want to within a list, and mix lists and items together nesting them in each other. Trello lets me do this. Although web based and requiring registration, it has a great GUI with drag and drop of items - within lists and between lists. Again, they have made it really easy to use. Click for a new item, type it in, and save. Editing is just a case of clicking twice (once to open, then edit). Trello kind of has 4 levels of hierarchy within it, so you can work out which combination of levels works best for you. There are boards (separate sets of lists), lists within a board, items within a list, and items can also have a checklist within them which is useful. So you either have one board with multiple lists and items, or multiple boards (one per project?) with lists of sub-projects and task level items. Or mix it another way. I like the checklists because they only appear when you open the item (keeping the main display less cluttered), and they work well for task lists as you tick them off as complete.
Author: "John Brady (noreply@blogger.com)"
Date: Sunday, 22 Jul 2012 21:04
The Oracle Database software is only "supported" on Red Hat Enterprise Linux and Oracle Enterprise Linux I believe. But it can be made to work on many other Linux distributions one way or another, as a lot of the stuff is the same or equally available on other Linux distributions. While this is not supported or recommended in any way by Oracle, it can be useful if you have a different Linux distribution installed on a test system for one reason or another and cannot change it. Here I will present the recipe for how I got Oracle 11g Release 2 installed on Arch Linux. Remember that none of this should be used in any kind of production like environment as it is unsupported.

Arch Linux

Arch Linux is a Linux distribution based on a rolling release model. This primarily means that there is only ever one version of Arch, being the "latest version", and whenever new releases of software packages are made by the developer they are then added to the Arch repositories after a small delay. This means that you can always get the latest version of any available software package on Arch, and that there is never a big upgrade or reinstall hassle with Arch every 6 or 12 months. Instead you regularly do a system upgrade which pulls down the latest versions of all installed packages from the repository, so that your system is always "up to date".

Another thing about Arch is that it is highly customisable to whatever combination of Linux software applications you want. It does this by only installing a "bare bones" set of Linux packages that gives you a text only console with a "login" prompt (classic old school UNIX). From this you then install all the extra packages you want, to arrive at your customised Linux, just as you want it. No bloat from other applications you never wanted in the first place. This means installing and configuring Arch is more than just booting from a Live CD and clicking on "Install". But Arch has an excellent Wiki including an "Installation Guide" that explains everything.

Arch has its own package manager (pacman) that handles all of the downloading, resolving dependencies and installation of the packages. So if a package is available you can just install it and everything it needs through one simple command.

I'm assuming for this article that you already have Arch Linux installed, and know enough about how to install it. And that you have installed Arch plus X Windows plus a full Desktop Environment (e.g. Gnome or Xfce) plus a Display Manager to handle the initial user login (e.g. GDM). If you don't use a full blown Desktop Environment then you will need the equivalent combination of Window Manager, Application Menu, etc., which is also possible in Arch.

Extra packages

Oracle 11g requires "gcc" which is in the "base-devel" package along with other development tools, such as "make". Install this, plus some other needed packages:
pacman -S base-devel elfutils libaio unixodbc sysstat pdksh icu gdb
pacman -S libstdc++5
The last one - "libstdc++5" - is needed because the make files provided with Oracle make explicit reference to this. Since Oracle 11g was originally compiled on the current at that time version of Red Hat Linux the standard C library has moved on to version 6, but Oracle still wants something called version 5 when installing.

Symbolic links

Oracle expects some commands to exist in the /bin directory when they are now in the /usr/bin directory as standard. You can work around this by creating symbolic links from one location to the other for these files:
ln -s /usr/bin/basename /bin/basename
ln -s /usr/bin/tr /bin/tr
ln -s /usr/lib/libgcc_s.so.1 /lib/libgcc_s.so.1
Other symbolic links are mentioned by other people for other Linux distributions, but for Arch these are the only ones I needed to make - other links already existed for some other files. Also bash is the default shell on Arch, which is the same as on Red Hat.

Normal Oracle Pre-Installation

  • Configure the Linux kernel and other settings as stated in the Oracle installation instructions. This includes:
    • Settings in /etc/sysctl.conf
    • Settings in /etc/security/limits.conf
    • Adding a line to /etc/pam.d/login if not already present
  • Create the groups needed (oinstall and dba), and the "oracle" user account.
  • Create the directories where the software will be installed e.g. /apps/oracle and /apps/oraInventory
  • I create a ".profile" file for the "oracle" user and set ORACLE_BASE, ORACLE_HOME based on the installation directories. Also set ORACLE_SID, and add ORACLE_HOME/bin to PATH.
 

Install Oracle 11g

Get the Oracle 11g software distribution - downloadable from oracle.com - and unzip the files to another directory. Run the installer and make your choices from the screens it presents - I normally go for "Install Software Only" and "Single Instance".

You will get the following issues when running the Installer:
  • All prerequisite checks will fail. This is because Oracle uses "rpm" to find out what packages are installed, and Arch does not use "rpm", so Oracle thinks everything is missing.
    • Providing you have installed the packages listed earlier, simply click on the "Ignore All" check box on the top right of the screen, then click "Next".
    • Click "Yes" on the pop-up dialog box that warns you about ignoring prerequisites. Then click "Install" as normal on the final screen
  • You will get a failure message during installation about "target agent nmhs", from the make file "sysman/lib/ins_emagent.mk".
    • This only affects the Enterprise Manager Agent, which is generally not used, and so can be ignored.
    • It is due to a change in how "gcc" scans libraries to find referenced symbols in a program.
    • If you really want this Agent, a workaround is possible by editing the makefile - just Google it to find a solution.
    • Assuming you don't want this Agent, just click "Continue" to ignore it and carry on.
Eventually you will get the normal pop-up dialog box about running the "root.sh" scripts as root, and then the installation is complete.

From here you can go on and create a database using the Database Configuration Assistant (dbca) or the "CREATE DATABASE" SQL statement, whichever you are more familiar with.
Author: "John Brady (noreply@blogger.com)"
Date: Tuesday, 17 Jul 2012 21:33
I really like the Python programming language for general purpose programming and quickly putting together little utilities. There are similarities to Perl in terms of being an interpreted language with powerful data processing capabilities, but there the direct comparisons end. Python is a more modern language than Perl, and has a relatively clean design (nothing is ever perfect though, and Python itself has evolved over the years). The main reasons I prefer Python over Perl are:
  • Very clean syntax - no need for obscure characters in front of variable names and so on
  • Very readable code - it is generally clear what a piece of code is doing
  • Minimal syntax dressing overhead - Python avoids the need for begin / end block statement markers
  • True support for functions
    • Perl just pushes all the arguments into a single list that you must disassemble
  • Strongly typed - Python supports multiple data types and checks at run time that operations are valid
  • Rich set of types - number, string, list, dictionaries, plus others e.g. set, tuple
  • Supports classic "function" based programming - just like 'C' does
    • Very simple to write code to directly do what you want, and modularise common code into functions
  • Also supports full "object oriented programming" - full support for Classes as first level objects
    • Perl does not truly do classes as first level objects
I'm not trying to say that Python is better than Perl in all cases, or other programming languages. I'm just saying that Python is a very good and usable programming language, and that I prefer it over Perl now. With Perl I find it gets very confusing when you try and use any level of complexity and the syntax is not obvious or consistent, whereas with Python I don't get any of this because it is such a cleanly designed language. With Python I can easily build a small application by just putting together the necessary source code while using functions for modularity (what I call "just build and run"), avoiding the overheads of defining classes with fully object oriented programming. However, if I have a more complex set of requirements then I can do a full blown object based solution using classes within Python (too complex to just build, so some initial design is needed, probably with test cases too).

Python is very useful for a variety of different scenarios:
  • building a small application or utility by just writing the code - no "compile and link" steps as it is interpreted
  • building something iteratively (top down design & development) as your code is always runnable (being interpreted)
  • prototyping object based code quicker and easier than compiled languages such as Java
  • ability to extend Python with your own libraries - it is written in C and can call your own compiled libraries
  • quickly prototyping real Java code that uses standard Java Classes in the JVM via Jython 
    • Jython is a Python interpreter written in Java
    • this also uses far fewer lines of source code than would be needed in Java, because of Python's fundamentally different design
Again, some of these capabilities are not unique to Python, but the combination of them all together make it a winner for me.

Python Overview

Python should be straightforward enough to understand on reading it. The only major difference to other languages is that Python does not have begin / end statement block markers and instead relies on statement indenting to determine which statements are part of the same block. You will see that statements with embedded statement blocks in them have a colon (':') at the end of the first line, and the following lines are indented. Initially it takes some getting used to, but it does end up with less typing on your part and a consistent code layout.

Although Python supports what I call direct coding (write some statements and then just run them), it is really fully object oriented behind the scenes. Thus you see many standard functions actually returning objects, against which you invoke one of their methods (see Database Example below). The syntax of "object.method (arguments)" occurs frequently, rather than the non-object way of "function (object, other-arguments)".

Python Development

You could edit and run Python programs from the command line ("python filename.py" - the ".py" extension is just a common convention), or you could use an Integrated Development Environment, such as Eclipse with the PyDev plugin. The nice thing about this is that it does syntax checking for you as you type, so you can easily spot various errors before you run your code. And you can run your Python application from within the development environment, so you don't have to leave the editor each time you want to test something.

Python Database Access

Like Perl, Python defines a standard API for database access. The intention is that developers can write code based on this standard API, and a variety of drivers can be provided for access to different databases. By changing the driver used you can connect to different databases, but the bulk of your code remains the same.

There are a bunch of drivers out there for Oracle, but the most common one seems to be cx_Oracle.

Python Database Examples

Using Python is relatively straightforward. I won't describe the database API in detail, as it is obvious when you use it. In summary, you call a "connect" library function with connection details and get back a connection handle (really an object). Using this you can then create a new statement handle, execute it, and fetch back any data or other results. One neat thing is that Python has a "fetchall" method that fetches back all data rows for a SELECT into a single list (actually a list of rows, each row being a list of columns). This then lets you traverse the data using Python logic, with no more fetch calls. I assume that there are efficiencies with the "fetchall" call over how the data is transferred back over the network, but I have not done any tests about this. It will of course end up using more memory within your Python program to hold all the results fetched.

Example

Connect to an Oracle database and execute a SELECT from a table, printing out the data fetched.
from cx_Oracle import connect

conn = connect ("scott", "tiger", "orcl")
# Or: conn = connect('scott/tiger@orcl')
curs = conn.cursor()
curs.execute("SELECT name, value FROM atable")
rows = curs.fetchall()
for i in range(len(rows)):
    print "Row", i, "name", rows[i][0], "value", rows[i][1]
Notes:
  • len is a built in function that returns the number of entries in a list
  • range is a built in function that produces a list of numbers from 0 to one less than the supplied value
  • List members can be accessed using array like notation
Author: "John Brady (noreply@blogger.com)"
Date: Thursday, 12 Jul 2012 20:36
I was trying to set up Data Guard between two servers and I kept getting connection errors from RMAN on the primary to the secondary:
RMAN>  connect auxiliary sys/syspassword@dgtest_stdby

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-04006: error from auxiliary database: ORA-01017: invalid username/password; logon denied
I tried to validate everything and it looked okay ("tnsping" worked), but I could not connect through SQL*Net at all, whether from RMAN or SQL*Plus. Local direct connections worked fine though. Eventually I managed to work out what was wrong.  So in case anyone else runs into the same problem ...

What I had originally was:
  • A newly created database (using CREATE DATABASE the old fashioned way)
  • An Oracle password file created using orapwd with the SYS password
  • A listener.ora on each server defining a static service name for the database on that server
  • A tnsnames.ora on each server defining the mappings of connection names to servers and services
The problem was that I had never explicitly set a password for the SYS user in the newly created database. I could still connect locally, which I assume is because I was in the "dba" group in the operating system. But I could not connect remotely no matter what password I used. So the solution was:
  • ALTER USER SYS IDENTIFIED BY NEW_PASSWORD
  • This automatically updates the $ORACLE_HOME/dbs/orapw$ORACLE_SID password file on the server
  • Copy (ftp) this file over to the second, standby server
Now I could connect over SQL*Net to either database.
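Put together, the fix amounted to something like this (a sketch - the new password and the standby host name are placeholders, and the password file location can vary by platform):
SQL> alter user sys identified by new_password ;

$ scp $ORACLE_HOME/dbs/orapw$ORACLE_SID standby_host:$ORACLE_HOME/dbs/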

I would say that the error message is not particularly helpful, as it seems that all manner of different error conditions can give rise to the same error message. And there is almost no way of investigating this, other than trial and error - which I was getting sick of after 2 hours of getting nowhere. I knew that the underlying SQL*Net worked, as tnsping worked, and I knew that the Listener configurations were correct. Eventually I simply decided to change the SYS password to be something completely different and lo and behold I could now connect to the primary database remotely. It then took another few minutes before I realised that I needed to copy the Oracle password file to the standby server again, because it had changed.
Author: "John Brady (noreply@blogger.com)"
Date: Monday, 02 Jul 2012 20:22
When executing SQL involving ANSI-syntax OUTER JOINs, Oracle first converts the statement to its own native syntax before executing it, and sometimes this conversion is not totally transparent, restricting the execution plan that can be produced.

This started when I had a long running query, and it happened to involve outer joins between most of the tables (but not all of them). From the execution plan it was clear that the main problem was a full table scan on a very large table - out of a total cost of 35,000, 28,000 was from this full table scan alone. So replacing the full table scan by a more efficient index access, if possible, could lead to a significant improvement.

After some analysis I created an index on all of the referenced columns in the table. This meant that all necessary data for the query could be obtained from the index, with no need to visit the data rows in the table at all. As the data row was quite wide (180 bytes), the index was much narrower and much smaller in size. Also, because the index stores data in sorted order by the leading columns, it meant that the rows the query wanted would all be stored together within the index structure, further reducing the number of disk I/Os needed to retrieve it all. But, the Oracle Optimizer would not use this index at all. It just seemed to ignore it, and always do a full table scan.

At this point I tried another of my techniques for understanding how a query gets executed - remove tables from the query one at a time, until you get a more "sensible" execution plan, and then build it back up again to the final query by adding a table at a time, monitoring the execution plan after each change.

What I noticed was that when I got down to 3 tables from the original 7, the Optimizer would now choose to use the new index (doing an Index Fast Full Scan, at a lower cost of 4,000). But when I added back in a fourth table, it went back to doing a Full Table Scan at a cost of 28,000.

I did a 10053 trace on the execution of each query (3 table and 4 table queries) and noted that in the SINGLE TABLE ACCESS PATH section of the 3 table query it was costing an Index Fast Full Scan (IndexFFS), whereas in the 4 table query it was not costing this access path, and only had the Full Table Scan costed.
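For reference, a 10053 optimizer trace can be enabled for a session along the following lines - the query must be hard parsed while the event is set for the trace to be written, and the trace file appears in the session's trace directory:
alter session set events '10053 trace name context forever, level 1' ;
-- run (hard parse) the query under investigation here
alter session set events '10053 trace name context off' ;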

At this point, while I could see what was going on (in one case the Optimizer knows about the Index Fast Full Scan as an option, but in the other case it doesn't), I could not explain why it was happening, nor what to do about it. So I turned to the Oracle Forums and raised a question: "Why Optimizer ignoring Index Fast Full Scan?". While waiting for replies I also continued to tinker with the queries in the hope of gaining some extra understanding of what was going on.

The cause of the problem was identified by Dom Brooks as being the fact that Oracle internally rewrites ANSI-syntax OUTER JOINs to its own native syntax (using "(+)") before optimizing and executing the query, and that this can result in complex SQL that cannot be optimised as you might otherwise expect. Also see his earlier reply, which gives more background information and links to other explanations of this.

When it rewrites the ANSI-syntax JOINs to Oracle native syntax, it wraps each join up in its own in-line view, to ensure that it is equivalent to the ANSI meaning. It then tries to merge these inline views back into the main query. For a single outer join, and other simple cases, this view merging happens smoothly. But in other cases it does not, and the Optimizer is left with some complex SQL with the joins being performed within the inline views. This restricts the options available to it when building the final execution plan.

There is more information on this at the Oracle Optimizer Blog, in "Outer Joins in Oracle":
"In Oracle, ANSI left and right outerjoins are internally expressed in terms of left outerjoined lateral views. In many cases, a left outerjoined lateral view can be merged and the ANSI left (or right) outerjoin can be expressed entirely in terms of Oracle native left outerjoin operator. (A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause.)"
So for a simple single outer join such as the following:
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2 ON (T1.x = T2.x);
it first becomes this on initial conversion to a lateral inline view:
SELECT T1.d, LV.c
FROM T1, LATERAL (SELECT T2.C FROM T2 WHERE T1.x = T2.x)(+) LV;
which then becomes this when the view is merged:
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) ;
However, if the joins are more complex, or there are other issues (e.g. NULL values allowed within indexed columns), then it may not be possible to collapse the lateral views back into a single-level query. In that case the options open to the Optimizer are fewer than they would otherwise be, as it must treat the unmerged views on their own.

So if you have a query involving OUTER JOINs that is not getting as good an execution plan as you think it could, it may be because you used ANSI join syntax, and when Oracle rewrites this to its native syntax it is not able to merge the inline views back together for one reason or another. In my case it seems to have been because almost all columns allowed NULL values (except the primary key columns), and NULL values are not stored within indexes on disk. As a result, Oracle was not able to use the extra index when more tables, also outer joined, were added to the query.
Author: "John Brady (noreply@blogger.com)"
Date: Wednesday, 20 Jun 2012 19:54
Previously I showed some of the issues with allowing NULL values in columns and how indexes might be ignored by the Optimizer. All of the examples I gave involved single column indexes only, and showed that NULL values are not stored in such an index. As others have pointed out in comments, this is not necessarily true for multi-column indexes. If at least one of the columns in the index does not allow NULL values then an index entry is stored with the values of all of the indexed columns. This means that a NULL value can be stored in a multi-column index. Let's test that through an example.

Previously we saw that if we had an index on the ref_id_n column in the transaction table, which allowed NULL values, then the Optimizer would not use the index for an "IS NULL" constraint:
SQL_ID  cwm2cmgn8q09n, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |  3578 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION |     1 |     4 |  3578   (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("REF_ID_N" IS NULL)
The index on ref_id_n is being ignored, as the NULL values are not stored in that index. However, if we create another index, with ref_id_n as the first column followed by another column that does not allow NULL values, then the index will contain entries where ref_id_n is NULL. With such an index, the Optimizer can now use it for an "IS NULL" constraint:
SQL> create index ix2_transaction on transaction (ref_id_n, ref_id_nn) ;

Index created.

SQL> select count (*) from transaction where ref_id_n is null ;

COUNT(*)
----------
100

SQL> @xplastexec

SQL_ID 8tubzdty7vdnv, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 176942238

-------------------------------------------------------------------------------------
| Id  | Operation         | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                 |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE   |                 |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| IX2_TRANSACTION |     1 |     4 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("REF_ID_N" IS NULL)
So multi-column indexes can store NULL values in them, provided at least one column does not allow NULL values. The benefit of such an index can be significant - in this test case the cost came down from 3,578 to 2, simply because there were so few NULL rows. But an index on only the ref_id_n column itself is of no use for this query, and is ignored by the Optimizer.

Potentially you can also gain some benefit from an index where the ref_id_n column is not the leading column, as the index may be smaller in size than the table, and the Optimizer may choose an Index Fast Full Scan rather than a Full Table Scan. And that is the case with the test data set I have been using:
SQL> drop index ix2_transaction ;

Index dropped.

SQL> create index ix2_transaction on transaction (ref_id_nn, ref_id_n) ;

Index created.

SQL> select count (*) from transaction where ref_id_n is null;

COUNT(*)
----------
100

SQL> @xplastexec

SQL_ID 2j64r2n1nq4xm, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 1095380460

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 |       |       |   726 (100)|          |
|   1 |  SORT AGGREGATE       |                 |     1 |     4 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IX2_TRANSACTION |     1 |     4 |   726   (1)| 00:00:09 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("REF_ID_N" IS NULL)

The Index Fast Full Scan is costed at 726, compared to the 3,578 of the Full Table Scan. Being a "count (*)" only, no other data columns are needed from the table itself.

So if you are allowing NULL values in columns within your database design, and you want to find those rows that have a NULL value stored, then you cannot use an index on just that column alone. You will need a multi-column index that includes another column that does not allow NULL values. However, it may be that the only columns that do not allow NULL values in your database design are the primary key column(s), if you simply allow NULLs for every other column by default.
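A related trick, which I have not re-run these tests with (so treat it as a sketch), is to index the NULL-able column together with a literal constant, so that every row has an index entry without depending on any other NOT NULL column:
create index ix3_transaction on transaction (ref_id_n, 0) ;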
Author: "John Brady (noreply@blogger.com)"
Date: Wednesday, 13 Jun 2012 20:57
It is quite clear that Oracle does not store NULL values within indexes (B-Tree indexes specifically, being the most common type used within Oracle). This has been the case for many years - I remember reading about this in the version 7 days, over 15 years ago. My understanding was that this was an optimization on Oracle's part, as they believed that NULL values would not be used in most queries, and that storing NULL values in an index might skew the index somehow if a great many rows held NULL values. There are several consequences of this non-storage of NULL values in an index, as I mentioned in a previous post on NULL values. But it seems that some people are still unaware of this issue, believing that NULLs are stored within indexes, and that as a result all indexes are equal. Not having any hard evidence to refer people to, I thought I would invent some simple tests to show that NULLs are not stored in indexes, and the various side effects of this. Also, assumptions can be dangerous things when taken too far, and memory plays tricks over what I once read in the past. So providing some real test cases would either verify any assumptions I had made, or show them up as being wrong.

Tests Overview

I'll create two simple tables - one as a master table of codes (reference with 10,000 rows), and another larger table that refers to the master table as a foreign key (transaction with 1,000,000 rows). This means that there are 100 rows per Reference Identifier value in Transaction. The Transaction table will have two such foreign key columns - one without NULL's (ref_id_nn), and one with NULL's (ref_id_n).

Indexes will be created on these columns, and some example queries run using either reference column to show whether the indexes are being used or not. We can also look at the statistics on the indexes themselves to tell us something about whether NULL values are stored in them or not. I provide all the SQL to create and populate these two tables at the end of this post - I assume most people are interested in the results first.

Index Statistics

Immediately we can see that NULL values are not stored in an index by looking at the statistics for the indexes on the Transaction table.

col index_name heading 'Index'              format a20
col lb         heading 'Leaf|Blocks'        format 999,999
col dk         heading 'Distinct|Keys'      format 999,999,999
col cf         heading 'Clustering|Factor'  format 999,999,999
col nr         heading 'Num Rows'           format 999,999,999
--
select i.index_name,
       i.leaf_blocks       lb,
       i.num_rows          nr,
       i.distinct_keys     dk,
       i.clustering_factor cf
  from user_ind_statistics i
 where i.table_name = 'TRANSACTION'
/
The results of this query are:
                         Leaf                  Distinct   Clustering
Index                  Blocks     Num Rows         Keys       Factor
-------------------- -------- ------------ ------------ ------------
PK_TRANSACTION          1,875    1,000,000    1,000,000       13,147
IX1_TRANSACTION_NN      2,090    1,000,000       10,000    1,000,000
IX1_TRANSACTION_N       2,090      999,900        9,999      999,900

Look at the "Num Rows" and "Distinct Keys" values for the two indexes on the two foreign key columns. The index on the column allowing NULLs has one less distinct key value - because the NULL value has not been stored in the index. Whereas the index on the column without any NULLs has the full 10,000 distinct key values in it. This is also reflected in the number of rows covered by the index - 100 fewer for the index with NULL values than for the index without NULL values. So already we have evidence that NULL values are not stored within a B-Tree index i.e. they are ignored by the index itself.

Queries

In my previous blog post I made two claims about indexes on columns allowing NULL values not being used:
  • Such an index cannot be used to satisfy an "IS NULL" query constraint
  • The index also cannot be used to satisfy a "column != value" query constraint
Let's test these claims. In principle we want to run the same query against each of the two foreign key columns, and see whether the corresponding index is used. However, this may not always be possible for one reason or another, as I'll explain in each case.

"IS NULL" Constraint

The query is:
select count (*) from transaction where ref_id_n is null ;
When executed the query has the following execution plan (as from dbms_xplan.display_cursor immediately after executing the query):
SQL_ID  cwm2cmgn8q09n, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |  3578 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION |     1 |     4 |  3578   (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("REF_ID_N" IS NULL)
So the index was ignored and a full table scan was done, even though there were only 100 rows to retrieve.

Unfortunately we cannot run this query using the other column, because it does not allow NULL values, so the Optimizer knows that this query cannot return any rows at all. (It actually did an Index Fast Full Scan when I ran it, and returned a count of 0 rows.) Instead we can change the query to compare to a real value rather than NULL, and see that it now uses the index, even though it again retrieves only about 100 of the 1,000,000 rows in the table.
SQL_ID  7burxn278qj8b, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n = 5

Plan hash value: 1807025728

---------------------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                   |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE   |                   |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| IX1_TRANSACTION_N |   101 |   404 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("REF_ID_N"=5)
This shows that an index on a column allowing NULL values cannot be used when executing a query with an "IS NULL" constraint on that column.

"!=" Constraint

I've since realised that, for a "not equals" constraint, the Optimizer would be more likely to choose a Full Table Scan whether the column allowed NULLs or not, simply because of the number of rows being returned. If 100 rows match a value, then 999,900 do not match that value. And a Full Table Scan will be the optimal access method regardless of whether the column allows NULL values or not.

However, if the query only did a "count (*)" and did not retrieve any data columns from the table, then potentially an index could be used - with the Index Fast Full Scan access method. Consider the following performed on each of the two foreign key columns:
select count (*) from transaction where ref_id_nn != 5 ;
When I execute this I get 999,900 for the non-NULL column, and 999,800 for the NULL-allowed column. Whoops! Something is not right somewhere. Again, we are back to the point that NULL is not a "normal value" within the value space of the data type of the column, and a NULL value is treated differently by Oracle from just another value. So although NULL is never equal to another real value, it is also never not equal to another real value. Even if I force a full table scan via the "full (transaction)" hint, I still get a count of 999,800 for the "ref_id_n != 5" constraint. (The execution plan from dbms_xplan.display_cursor showed that "TABLE ACCESS FULL" was used on TRANSACTION.) [This very point about "!=" was also mentioned in a comment by Narenda on my previous post, and I had independently stumbled upon it as a result of this series of tests.]
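To actually count all 999,900 rows that do not hold the value 5 in the NULL-allowed column, the NULL rows have to be included explicitly - something like this (a straightforward application of the point above, using the same test tables):
select count (*) from transaction where ref_id_n != 5 or ref_id_n is null ;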

So even though we have not been able to demonstrate one of my claims about NULL values and indexes, we have instead stumbled upon another issue with NULL values. A NULL value is never equal to a real value, and also never not equal to a real value. I am sure I have seen Jonathan Lewis discuss this in a presentation when talking about rewriting SQL statements into their equivalents. If a column allows NULL values then you must be careful when rewriting equality constraints into equivalent inequality constraints.

Further testing threw up another anomaly. Rather than test "!=" I thought I would test "not in", but contrived so that there is only one value in the "not in" list. For the NULL allowed column the Optimizer chose a full table scan:
SQL> select count (*) from transaction where ref_id_n not in (select 5 from dual) ;

COUNT(*)
----------
999800

SQL> @xplastexec

SQL_ID 0f3nh5h11yu6p, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n not in (select 5 from dual)

Plan hash value: 297680891

------------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       | 23425 (100)|          |
|   1 |  SORT AGGREGATE     |             |     1 |     4 |            |          |
|*  2 |   FILTER            |             |       |       |            |          |
|   3 |    TABLE ACCESS FULL| TRANSACTION |   998K|  3900K|  3578   (1)| 00:00:43 |
|*  4 |    FILTER           |             |       |       |            |          |
|   5 |     FAST DUAL       |             |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( IS NULL)
   4 - filter(LNNVL(:B1<>5))

While for the non-NULL column it used the index:
SQL> select count (*) from transaction where ref_id_nn not in (select 5 from dual) ;

COUNT(*)
----------
999900

SQL> @xplastexec

SQL_ID fw2svuam6vzb1, child number 0
-------------------------------------
select count (*) from transaction where ref_id_nn not in (select 5 from dual)

Plan hash value: 1550919231

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |       |       | 20422 (100)|          |
|   1 |  SORT AGGREGATE        |                    |     1 |     4 |            |          |
|*  2 |   FILTER               |                    |       |       |            |          |
|   3 |    INDEX FAST FULL SCAN| IX1_TRANSACTION_NN |   998K|  3900K|   575   (2)| 00:00:07 |
|*  4 |    FILTER              |                    |       |       |            |          |
|   5 |     FAST DUAL          |                    |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( IS NULL)
   4 - filter(LNNVL(:B1<>5))
So I think that this example does show that there are times when the Optimizer will not use an index on a column allowing NULL values, but would use an index if NULLs were not allowed.

NULL as "Anything"

Another pattern I have seen is where a NULL value is used to mean "any value", rather than "no value". So in a system that processes transactions according to type, with each type processed by a different handler, a NULL value means that the transaction can be processed by any available handler. This leads to a query similar to the following:
select ... from transaction where (ref_id_n = 5 or ref_id_n is NULL) ...
The execution plan for this cannot use the index, because NULL values are not stored in it. Hence you get an execution plan like this:
SQL_ID  4tsn8vbt3s3gh, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n = 5 or ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |  3580 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION |   101 |   404 |  3580   (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("REF_ID_N"=5 OR "REF_ID_N" IS NULL))
However, if the database design were different, with the value 0 used as a special case for "any handler" and a corresponding row inserted into the "reference" table, then the query could be changed to the following, and Oracle would use the index on the column:
SQL_ID  d2v9fgscm8nrq, child number 0
-------------------------------------
select count (*) from transaction where ref_id_nn = 5 or ref_id_nn = 0

Plan hash value: 1215059433

------------------------------------------------------------------------------------------
| Id  | Operation          | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                    |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE    |                    |     1 |     4 |            |          |
|   2 |   INLIST ITERATOR  |                    |       |       |            |          |
|*  3 |    INDEX RANGE SCAN| IX1_TRANSACTION_NN |   201 |   804 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(("REF_ID_NN"=0 OR "REF_ID_NN"=5))
Again, the presence of NULL values results in sub-optimal query execution, and a different design can provide a better solution, allowing the Optimizer to use an index for much faster access (cost of 4 versus 3,580).
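A sketch of migrating an existing table to that design (hypothetical - I did not run this as part of these tests; the dummy row with ref_id 0 in "reference" is created by the table creation script below):
update transaction set ref_id_n = 0 where ref_id_n is null ;
commit ;
alter table transaction modify (ref_id_n not null) ;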

Conclusion

For now I think I have written enough in one post about NULLs and indexes. My main point was to provide some real evidence that NULL values are not stored in B-Tree indexes in Oracle, and that allowing NULL values in columns can affect whether the Optimizer will use an index or not. And I think I have done that. I'm sure that people experienced with Oracle already know this, but I wanted to provide some proof points for anybody who doubted it for any reason. As ever, the benefit of some real tests is that you get to either verify the assumptions you have made, or show them up to be wrong and correct them. In this case I have been able to correct some weak assumptions I had, and have learnt some more about how the Oracle Optimizer works, specifically when handling NULL values.

Table Creation SQL

Here is the SQL to create the tables, populate them with data and index them. Note the two foreign key columns in "transaction", and that one value is mapped to NULL when generating the data.
--
-- Create the tables necessary for the NULL values investigation
-- Create tables, load in generated data, index, gather statistics
--
prompt Creating tables
--
-- Reference table - a lookup of master values
--
create table reference (
    ref_id          number (*,0)    not null,
    group_id        number (*,0)    not null,
    description     varchar2 (512)  not null,
    constraint pk_reference primary key (ref_id)
) ;
--
-- Transaction table
--
create table transaction (
    trans_id        number (*,0)    not null,
    ref_id_nn       number (*,0)    not null,
    ref_id_n        number (*,0),
    location_id     number (*,0)    not null,
    padding         varchar2 (512),
    constraint pk_transaction primary key (trans_id)
) ;
--
alter table transaction add constraint fk_ref_nn
    foreign key (ref_id_nn) references reference (ref_id) ;
--
alter table transaction add constraint fk_ref_n
    foreign key (ref_id_n) references reference (ref_id) ;
--
prompt Loading data
--
-- SPECIAL DUMMY ROW!! Keep in sync with any newly added columns!*!
insert into reference values (0, 0, 'Unknown - Dummy entry for referential integrity purposes') ;
--
insert into reference
with generator as (
    select rownum r
      from (select rownum r from dual connect by rownum <= 1000) a,
           (select rownum r from dual connect by rownum <= 1000) b,
           (select rownum r from dual connect by rownum <= 1000) c
     where rownum <= 1000000
)
select r,
       mod (r, 1000),
       'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' || to_char (r, '00000000')
  from generator g
 where g.r <= 10000
/
commit ;
--
insert into transaction
with generator as (
    select rownum r
      from (select rownum r from dual connect by rownum <= 1000) a,
           (select rownum r from dual connect by rownum <= 1000) b,
           (select rownum r from dual connect by rownum <= 1000) c
     where rownum <= 1000000
)
select r,
       mod (r, 10000),
       decode (mod (r, 10000), 0, null, mod (r, 10000) + 1),
       mod (r, 1999),
       'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789' || to_char (r, '000000000')
  from generator g
 where g.r <= 1000000
/
commit ;
--
prompt Creating indexes
--
create index ix1_transaction_nn on transaction (ref_id_nn) ;
create index ix1_transaction_n on transaction (ref_id_n) ;
--
prompt Gathering Statistics
--
exec dbms_stats.gather_schema_stats ('JB')
--
So there are 10,000 rows in Reference, and 1,000,000 rows in Transaction. This means that there are 100 rows per Reference Identifier value in Transaction.
Author: "John Brady (noreply@blogger.com)"