GoldenGate Auto CDR and Off The Shelf Applications

Recently, I have been involved in completing a Proof of Concept for adding GoldenGate to an Off the Shelf (OTS) Application to facilitate keeping their DR site in sync.  This post isn’t to discuss the pros or cons of using GoldenGate for this purpose, but rather to share a few items about Auto Conflict Detection and Resolution that will prevent many OTS Applications from using this GoldenGate feature. I have also changed all references to the REAL application to the SAMPLE HR schema to protect the innocent.

This feature was first available with Oracle Database 12.2 and GoldenGate 12.3.  All in all, it is a great feature if you are looking for a database-managed conflict detection and resolution methodology. You can read all about it here:
Automatic Conflict Detection and Resolution

One thing I will mention is that while the feature is useful, it does produce a ton of clutter in the schema you choose to deploy it in.  As someone who really likes to keep clutter to a minimum, I will point out the new objects that appear in your schema when the default options are used.  Using the simplest form of Auto CDR and the HR.EMPLOYEES table as an example:

  • A new hidden column, CDRTS$ROW (TIMESTAMP(6)), is added to the table
  • A new table, DT$_EMPLOYEES, known as a tombstone table, is created to keep track of deleted rows; you can confirm both objects with the dictionary queries below
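
A quick way to confirm both objects is to query the data dictionary. This is just a sketch based on the HR.EMPLOYEES example above; adjust the owner and table names for your own schema:

-- the CDRTS$ROW column shows up as a hidden column on the table
select column_name, data_type, hidden_column
from dba_tab_cols
where owner = 'HR'
  and table_name = 'EMPLOYEES'
  and hidden_column = 'YES';

-- the tombstone table, if created, follows the DT$_ naming pattern
select table_name
from dba_tables
where owner = 'HR'
  and table_name like 'DT$_%';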

Now, as of this writing, several bugs have been logged with regard to the tombstone table and its functionality, so you can choose not to use it by setting the TOMBSTONE_DELETES option to FALSE when running DBMS_GOLDENGATE_ADM.ADD_AUTO_CDR for any given table.
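
For reference, here is a sketch of what that call might look like for the HR.EMPLOYEES example. The tombstone_deletes parameter is documented for DBMS_GOLDENGATE_ADM.ADD_AUTO_CDR, but verify the procedure signature against your database version before relying on it:

BEGIN
  -- register HR.EMPLOYEES for Auto CDR without creating the DT$_ tombstone table
  DBMS_GOLDENGATE_ADM.ADD_AUTO_CDR(
    schema_name       => 'HR',
    table_name        => 'EMPLOYEES',
    tombstone_deletes => FALSE);
END;
/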

It is these invisible and hidden objects which cause issues with OTS products that utilize SQL frameworks such as Hibernate.  It wasn’t until we restarted the application that we found it simply would not come up any longer.  In the beginning we got the usual Java error message as long as your arm, and we couldn’t tell what was really wrong until we enabled SQL tracing for the processes at startup.  It was then that TKPROF showed us exactly where the problem query was:

select name,intcol#,segcol#,type#,length,nvl(precision#,0),decode(type#,2,
  nvl(scale,-127/*MAXSB1MINAL*/),178,scale,179,scale,180,scale,181,scale,182,
  scale,183,scale,231,scale,0),null$,fixedstorage,nvl(deflength,0),default$,
  rowid,col#,property, nvl(charsetid,0),nvl(charsetform,0),spare1,spare2,
  nvl(spare3,0), nvl(evaledition#,1),nvl(unusablebefore#,0),
  nvl(unusablebeginning#,0), case when (type# in (1,8,9,96,112)) then
  nvl(collid, 16382) else 0 end case, nvl(collintcol#,0), nvl(acdrrescol#, 0),
  nvl(spare7, 0)
from
col$ where obj#=:1 order by intcol#;

It is this query which the framework uses to build its object-relational mapping. Guess what: it does not filter out INVISIBLE columns, as evidenced by the following error message (table and column names changed as appropriate):

ORA-01400: cannot insert NULL into ("HR"."EMPLOYEES"."CDRTS$ROW")

This column is used for conflict detection and is maintained by Oracle. Since the application does not know what to do with it, it errors out. A possible remedy is to propose that the framework author modify the query which builds the dictionary to include the additional where clause “and col# != 0”, since columns marked invisible have a col# of 0:

select name,intcol#,segcol#,type#,length,nvl(precision#,0),decode(type#,2,
  nvl(scale,-127/*MAXSB1MINAL*/),178,scale,179,scale,180,scale,181,scale,182,
  scale,183,scale,231,scale,0),null$,fixedstorage,nvl(deflength,0),default$,
  rowid,col#,property, nvl(charsetid,0),nvl(charsetform,0),spare1,spare2,
  nvl(spare3,0), nvl(evaledition#,1),nvl(unusablebefore#,0),
  nvl(unusablebeginning#,0), case when (type# in (1,8,9,96,112)) then
  nvl(collid, 16382) else 0 end case, nvl(collintcol#,0), nvl(acdrrescol#, 0),
  nvl(spare7, 0)
from
col$ 
where obj#=:1 and col# != 0 
order by intcol#;

Once your particular SQL framework is able to properly handle invisible columns, you should be able to use this feature. Unfortunately, in my case, it will be another release or two before the framework we are using can handle it.


Be a Problem Solver Not a DBA #2

As a continuation of this series (first post here), I would like to discuss one of the processes that I use in an attempt to solve problems efficiently.  Recently, I was in the middle of a situation where a client resource came in pounding the table, insisting that something needed to be disabled and that it had to be done right now!  I didn’t dispute that there was a problem, but there was very little evidence of who was being impacted, what was being impacted, and how much the problem was impacting these resources.  Part of being a good IT resource is not only identifying problems and how to fix them, but also asking what the alternatives are, how the problem is affecting the system, and what else might happen if we fix it.

Perhaps you have noticed, but from time to time I will make references to aviation and aviation-related topics.  While it is one of my hobbies, aviation has also taught me to be a better problem solver.  This is because when you are up there and something happens, you need to have the tools to figure things out on your own, as more often than not there isn’t anyone else to help.  So this post will focus on a method of problem solving that I learned throughout my flight training, one that I rely on heavily today: the DECIDE model.

Aeronautical Decision Making (ADM) is a cornerstone of both being a good aviator and being able to solve problems.  So for the context of technology, let’s just think of it as Decision Making.  If you ever care to read it, a full chapter on this is publicly available via the FAA website:

Effective Aeronautical Decision-Making

DECIDE

  • Define

Define the problem.  What is going on?  When did it start?  What is the impact?  Is there really a problem?  Is what needs to be happening still happening?

  • Establish the Criteria

Establish the criteria for evaluating the identified problem.  What do you need to achieve in this decision-making process?  What about the current situation do we need to make sure continues to happen?  What additional problems or side effects should we try to avoid inducing?

  • Consider Alternatives

What are all the possible choices which allow us to fulfill our previously defined criteria?  Do we do nothing?  Defer for a period of time?  Which alternative requires the fewest assumptions?

  • Identify the BEST Alternative

Select the best alternative based upon experience, intuition and experimentation.  The important thing to remember here is to rely on all available resources to identify the alternative.  While it’s important to be fast, do not sacrifice considering all corners of an alternative as part of identifying the BEST one.  It’s OK to have an interim alternative before implementing the final alternative.

  • Do

Develop and Implement a plan of action.  When will we implement?  Is there a short-term and long-term implementation plan?  What people and other resources are needed?

  • Evaluate

Evaluate and monitor the solution.  Did the implementation go as expected?  What could go wrong?  How will it be handled?  What could be done next time to improve?


Hopefully implementing this into your routine and becoming methodical about solving problems will make you more efficient and more certain about the solutions you are proposing.  I know it works for me.

Additional References:
Problem Solving – Free Management Books

Series: Be a Problem Solver Not a DBA #1

I’m going to try from time to time to publish some scenarios I have been in throughout my career where being a DBA doesn’t mean being a DBA, it means being a problem solver.

In my opinion being a good problem solver requires following several basic tenets, one of which is:

“When presented with competing hypotheses to solve a problem, one should select the solution with the fewest assumptions.” – William of Ockham

Having done many migrations throughout my career, I have learned that performing database migrations is much like a “Reality TV” script: everything starts out with a plan, the plan is executed, and usually, with days to go, a big risk emerges that jeopardizes the project, only for everything to work out in the end with a successful migration. A recent migration was no different; however, this time it was a perfect example of how to be a Problem Solver, not a DBA.

The purpose of this post is not to fully explain problem-solving methods; it is more to discuss going outside the comfort zone as a DBA and looking at items that you may not normally look at. In this case, I know enough about Java and Java coding to be dangerous, but I knew that the other resources looking at this weren’t going to solve the problem (per the vendor, there was only one person on the planet who could solve this and they were on a bus in Italy), so I had to take things into my own hands.

A little background:

This particular migration was an upgrade from 11.2.0.4 to 12.2.0.1 on a SuperCluster. About a week or so out, I saw a very high spike in memory utilization to the point where the system was out of memory. Upon investigation, we found out that their scheduling agent was utilizing a newer version of Java and in turn, using 4x more heap space than the previous version of Java.

Digging further, I found that the process was not using either the -Xms or -Xmx flags when invoked, so what changed between Java versions to cause the increased utilization?

Since we did not own that portion of the application, the issue was transferred to the responsible party to troubleshoot. After several days of no movement, I decided to put my “Problem Solving” hat on.

Using the tenets of problem solving, follow a logical path:
After lots of searching, I checked the defaults that Java uses for the minimum and maximum heap sizes. There was a big change between the old and new versions. For example, the old version used:

java -XX:+PrintFlagsFinal -version | grep HeapSize
    uintx ErgoHeapSizeLimit               = 0               {product}
    uintx InitialHeapSize                := 126556928       {product}
    uintx LargePageHeapSizeThreshold      = 134217728       {product}
    uintx MaxHeapSize                    := 2025848832      {product}
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

While the new version used:

java -XX:+PrintFlagsFinal -version | grep HeapSize
    uintx ErgoHeapSizeLimit               = 0              {product}
    uintx HeapSizePerGCThread             = 87241520       {product}
    uintx InitialHeapSize                := 2147483648     {product}
    uintx LargePageHeapSizeThreshold      = 134217728      {product}
    uintx MaxHeapSize                    := 32210157568    {product}
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)


Ultimately, the solution was to add the -Xms and -Xmx flags to the program invoking the Java process so that it does not rely on the environment defaults. In addition, this doesn’t waste infrastructure resources and also reduces the time to start and stop the Java process by assigning only the memory that you need.
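
As a hypothetical example (the jar name and heap sizes below are made up for illustration, not taken from the actual application), the idea is to pin the heap explicitly rather than inheriting the Java defaults:

# explicitly set the initial (-Xms) and maximum (-Xmx) heap sizes for the process
java -Xms512m -Xmx2g -jar scheduling-agent.jar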

And as part of any problem-solving exercise, focus from the bottom of the stack up, especially when multiple changes are in play.  In this case, the path with the fewest assumptions involved the changed Java version, so that’s where I focused my effort.

Oracle 12 Non-CDB to PDB Migration Methods

Continuing with my effort to finish additional blog posts which have been on my list for a while, here is a review of the methods for migrating a Non-Container Database (Non-CDB) into a Container Database (CDB) as a Pluggable Database (PDB).

I wish this were easier and Oracle gave you a complete “in-place” method to do this, but the only way to get a Non-CDB (pre-12c-style database) into a CDB is to create a new CDB and migrate the Non-CDB into it as a PDB.

I will make the assumption that the creation of the new CDB is complete. So let’s look at some methods of creating / migrating the PDB and their pros and cons:

Whichever method you choose, you need to start by creating the XML file which describes the new PDB:

On the Non-CDB:

	SHUTDOWN IMMEDIATE;
	startup mount exclusive;
	alter database open read only;

Create the PDB Describe XML File:

     BEGIN
       DBMS_PDB.DESCRIBE(
          pdb_descr_file => '/export/home/oracle/nonCDBToPDB.xml');
     END;
     /
     shutdown immediate;

Verify the XML File on the new CDB:

     SET SERVEROUTPUT ON
     DECLARE
         hold_var boolean;
     begin
          hold_var := DBMS_PDB.CHECK_PLUG_COMPATIBILITY(pdb_descr_file=>'/export/home/oracle/nonCDBToPDB.xml');
     if hold_var then
          dbms_output.put_line('YES');
     else
          dbms_output.put_line('NO');
     end if;
     end;
     /

Check for errors:

   set lines 300
   col cause for a25
   col message for a150
   set pagesize 9999
   select name,cause,action,type,message,status from PDB_PLUG_IN_VIOLATIONS;

Before you can proceed, all errors contained in PDB_PLUG_IN_VIOLATIONS must be resolved. Next are two of the three methods to migrate a Non-CDB to a PDB. I will leave the “COPY” method out of this post as it is not feasible to move any of the databases I deal with on a day-to-day basis using ‘COPY’.

Create the PDB using “NOCOPY”. While “NOCOPY” is the fastest, it could be the most problematic long term because this option leaves all datafiles where they came from, and since the new CDB is likely to be on the same host, the naming differences could be confusing at some point. Nonetheless, for demonstration, the command is quite easy:

     CREATE PLUGGABLE DATABASE devpdb USING '/export/home/oracle/nonCDBToPDB.xml' NOCOPY TEMPFILE REUSE;

Based on my testing, the method I liked the most was the ‘MOVE’ option. To some, this may seem invasive, but for my environments this was the best option because the datafiles are also renamed to the correct OMF path names based on the new CDB. While this method wasn’t as fast as NOCOPY, in my 18TB environment with 1200 datafiles, this command finished in just over 30 minutes. Pretty acceptable in my book:

     CREATE PLUGGABLE DATABASE devpdb USING '/export/home/oracle/nonCDBToPDB.xml' MOVE TEMPFILE REUSE;

Finishing off the migration to the new PDB is the same regardless of the migration method:

Verify the PDB:

	select name,guid, open_mode from v$pdbs;
	col pdb_name for a15
	select pdb_name, status from dba_pdbs;

Clean up the PDB. This by far was one of the longest operations of the whole conversion:

	alter session set container=devpdb;
	@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql

Check for errors:

	set lines 300
	col cause for a25
	col message for a50
	col action for a100
	set pagesize 9999
	select name,cause,action,type,message,status from PDB_PLUG_IN_VIOLATIONS;

In my case, the only remaining violations that I had were some orphan database services that were stuck in the metadata. To clean this up you can execute:

	alter session set container=[pdb name];
	
	select SERVICE_ID,NAME,NETWORK_NAME,PDB FROM CDB_SERVICES ORDER BY PDB,SERVICE_ID;
	select SERVICE_ID,NAME,NETWORK_NAME FROM DBA_SERVICES ORDER BY SERVICE_ID;
	
	exec dbms_service.delete_service('[service name]')

Ensure that all pluggables are open and open on restart:

	alter pluggable database all open;
	alter pluggable database all save state;

As with anything in our business, adequate testing goes a long way and these were my observations in my environment. If your experience varies, I sure would like to hear about it.

Updating Solaris DNS via SVCCFG

It’s been a while since I have made a blog post. Time has certainly been very precious. Believe me, I have no shortage of topics to cover either.

Recently, while doing some of my routine checks, I discovered that one of the SuperClusters I support had the DNS of all Global and Local Domains pointing to a DNS server over 2000 miles away. How that happened isn’t entirely important, and I know we can all agree this isn’t a great idea, especially when there is a DNS server in the local data center.

So let’s look at the best way to fix this. Solaris 11 makes it pretty easy using the “svccfg” command, right?

svccfg -s network/dns/client listprop config
svccfg -s network/dns/client setprop config/nameserver = net_address: "([primary dns] [backup dns])"
svcadm refresh dns/client
svccfg -s network/dns/client listprop config

Well there you have it. Is it all set now and forever? Actually, no. Upon further investigation, I found that the “/etc/resolv.conf” file was not updated with the new changes. Why was that? Further research yielded that there was actually one more command to execute, and it wasn’t located anywhere near the first set of commands in the documentation:

nscfg import svc:/network/dns/client:default

After I added that simple command to the set that had been executed first, the “/etc/resolv.conf” file reflected the new values.

Hope this helps.

Adjusting Available CPU Threads in SuperCluster Local Zones Online

Lately, I have been working on the Oracle SuperCluster platform. After having worked with Linux for many years, it was quite refreshing to get back to an OS that so many of us have worked on. As part of our local zone layout, we have a requirement to allocate a different number of M7 CPU threads to each zone. Upon researching the best way to do this, I found varying information, so I thought that I would go ahead and blog about the way that worked best for this situation.

In this case, CPU thread control was set up using resource pools. Solaris Resource Pools are described here:

Oracle Solaris Resource Pools

By default, the resource pool configuration does not restrict CPU access or control scheduling. By modifying the resource pools and allocating specific threads to specific zones, you control how many CPU threads each local zone can use.

Here’s how:

First, let’s display the pool layout. Since we only need to look at thread allocation (the command actually outputs a ton of data), I will limit the output to only what is relevant.

Find the pool configurations you want to affect. The pset (processor set) pertains directly to CPU threads, so that is what we will look for:

#  poolcfg -dc info | egrep 'pset |pset.size|pset.min|pset.max'

        pset pset_[host name]_id_25289
                uint    pset.min 32
                uint    pset.max 32
                uint    pset.size 32
        pset pset_[host name]_id_25223
                uint    pset.min 64
                uint    pset.max 64
                uint    pset.size 64
        pset pset_[host name]_id_25287
                uint    pset.min 64
                uint    pset.max 64
                uint    pset.size 64
        pset pset_[host name]_id_25224
                uint    pset.min 32
                uint    pset.max 32
                uint    pset.size 32
        pset pset_default
                uint    pset.min 1
                uint    pset.max 65536
                uint    pset.size 64

In this case we can see that out of the 256 CPU threads available to this Global Domain, 32 have been allocated to the first local zone, 64 each to the next two, and 32 to the last, leaving 64 in the default pool and available to the global domain.

If you would like to see the file which also details the complete rules of the resource pool, you can look here:

/etc/pooladm.conf

To start with any modifications, it is best to ensure that the latest configuration is saved. To do so you can run this command from the global domain:

# pooladm -s

Once this has been done, you can proceed with the reallocation. In this example, I will modify one pool by taking CPU threads from the default pool. Using “-d” operates directly on the kernel state, so use it with caution. On a running system, I would reallocate in small chunks; that gives the operating system time to adapt to the different CPU configuration. In this example we will add 8 threads to a local zone which already has 32 threads:

# poolcfg -dc 'modify pset pset_[host name]_id_25289 ( uint pset.min = 40 ; uint pset.max = 40)'

At this point the change has been applied directly to the kernel state, but it has not yet been saved to the configuration file (/etc/pooladm.conf). To save the configuration and then commit it to the system:

# pooladm -s

# pooladm -c

Once this change is done, we can inspect the configuration by running the same command shown above. Notice the changes below:

#  poolcfg -dc info | egrep 'pset |pset.size|pset.min|pset.max'

        pset pset_[host name]_id_25289
                uint    pset.min 40
                uint    pset.max 40
                uint    pset.size 40
        pset pset_[host name]_id_25223
                uint    pset.min 64
                uint    pset.max 64
                uint    pset.size 64
        pset pset_[host name]_id_25287
                uint    pset.min 64
                uint    pset.max 64
                uint    pset.size 64
        pset pset_[host name]_id_25224
                uint    pset.min 32
                uint    pset.max 32
                uint    pset.size 32
        pset pset_default
                uint    pset.min 1
                uint    pset.max 65536
                uint    pset.size 56

If you need to transfer CPU threads from one pset to another, you can do so by executing the following command:

poolcfg -dc 'transfer 8 from pset pset_default to pset_[host name]_id_25289'

Or if you want to assign a specific CPU Thread:

poolcfg -dc 'transfer to pset pset_[host name]_id_25289 ( cpu 5)'

The rest of the steps remain the same. In the next post I will show you how to verify the additional CPU in each local zone.

Improper Use of the Oracle ‘Rownum’ Pseudocolumn

The other day I found myself needing to explain to some developers why their use-case of the Oracle ‘rownum’ pseudocolumn was yielding a result in one database instance, but a completely different result in another.

In this situation, the correct result is the ‘maximum’ value of the column; however, this query was also occasionally returning the exact ‘minimum’ value of the column. How could this happen? The answer lies in using the ‘rownum’ pseudocolumn correctly. Of course there are other (probably better) ways to write this query without the use of ‘rownum’, but I’m not here to debate that right now….

** Note the tables in the query have been changed to protect the innocent.

select column_a from (select column_a,rownum rowid0 from schema.table order by column_a desc ) aa where aa.rowid0 =1;

Oracle documentation states that the result you get depends on how Oracle accesses the rows in the query. Your results can vary depending on a number of factors (e.g., the order in which you inserted the data into the table, or whether there is an index on the table and how that index is used). For further information, you can see the documentation here:

https://docs.oracle.com/cd/B28359_01/server.111/b28286/pseudocolumns009.htm#SQLRF00255

For further explanation, let’s explore the explain plans used in each system:

Correct Result:


--------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                               |       |       |     1 (100)|          |
|*  1 |  VIEW                        |                               |  1257 | 32682 |     1   (0)| 00:00:01 |
|   2 |   COUNT                      |                               |       |       |            |          |
|   3 |    INDEX FULL SCAN DESCENDING| SCHEMA_TABLE_PK               |  1257 |  6285 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AA"."ROWID0"=1)


22 rows selected.

Incorrect Result:


---------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                               |       |       |     4 (100)|          |
|*  1 |  VIEW                   |                               |  1257 | 32682 |     4  (25)| 00:00:01 |
|   2 |   SORT ORDER BY         |                               |  1257 |  6285 |     4  (25)| 00:00:01 |
|   3 |    COUNT                |                               |       |       |            |          |
|   4 |     INDEX FAST FULL SCAN| SCHEMA_TABLE_PK               |  1257 |  6285 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("AA"."ROWID0"=1)


24 rows selected.

As you can see, the major difference here is that the two systems did not choose the same access path to return the data. In one system the plan utilized an ‘INDEX FULL SCAN DESCENDING’ access path, while the other utilized an ‘INDEX FAST FULL SCAN’ access path.

Is this really that different? Turns out it is.

Ask TOM explained the reason why very concisely:
(Ask TOM “Difference between Full Index Scans and Fast Full Index Scans”)

They state that:

“An index fast full scan reads the ENTIRE index, unsorted, as it exists on disk. It is basically using the index as a “skinny” version of the table. The query in question would only be accessing attributes in the index (we are not using the index as a way to get to the table, we are using the index INSTEAD of the table) We use multiblock IO and read all of the leaf, branch and the root block. We ignore the branch and root blocks and just process the (unordered) data on the leaf blocks.

An index full scan is when we read the index a block at a time – from start to finish. We’ll read the root block, navigate down the left hand side of the index (or right if we are doing a descending full scan) and then when we hit the leaf block – we’ll read across the entire bottom of the index – a block at a time – in sorted order. We use single block IO, not multiblock IO for this operation.”

Well there you have it, and this is why the result is different. How can we keep this from occurring in the future? The answer is to utilize the ‘rownum’ pseudocolumn correctly. Remember, rownum is not a real column, so in order to get the right results it needs to be applied after the data is in the sorted order that you want. To do that, make sure you write the query so that ‘rownum’ is applied after the sort. Using the same query above, let’s rewrite it in such a way that it will achieve the desired results:

select column_a from (select column_a,rownum from (select column_a from schema.table order by column_a desc)) where rownum = 1;

See the steps now?

  1. Retrieve data in sorted order
  2. Apply the ‘rownum’ pseudocolumn
  3. Filter for the desired value in the list

If you must use the ‘rownum’ pseudocolumn, writing your query in this manner will ensure that you always get the same result.

Enjoy!