Local Listener vs. Remote Listener vs. Listener Networks

Introduction:

Often, when it comes to the database, you may see separate networks configured for the following types of traffic:

  • Backups
  • Management
  • Client

Recently, one of the configurations that I was a part of took it a step further than that and had a few additional networks configured:

  • Data Guard
  • Private Non-Routed Network

One additional requirement was that a scan listener be present for each of these networks. I wasn’t given the opportunity to set this up either, so we had to trust that the other party had set all of the correct parameters.  No big deal, right?

 

The Problem:

Once all of the networks were configured and scan listeners were in place for each network, connectivity on each network was very erratic.  Clients would connect at times and at other times they would not.

It wasn’t until we used a packet analyzer (Wireshark) that we really saw what was going on.  Upon investigation, a colleague found that occasionally the scan listener would return the wrong VIP to the connecting client.  The good news was that it was the SAME wrong VIP each time.  But why was it doing this?  The culprit ended up being incorrect or missing entries in the following parameters:

  • REMOTE_LISTENER
  • LOCAL_LISTENER
  • LISTENER_NETWORKS

The Oracle documentation on this was not a ton of help either.

The Solution:

When we checked the database, we found that an entry for each local listener was present in the LOCAL_LISTENER parameter, every scan listener was present in the REMOTE_LISTENER parameter, and the LISTENER_NETWORKS parameter was blank.  As it turns out, LOCAL_LISTENER and REMOTE_LISTENER should contain entries for the listeners on the first network ONLY.

Incorrect Parameters:

local_listener='(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=client-vip.example.com)(PORT=1534))(ADDRESS=(PROTOCOL=TCP)(HOST=vlan-vip.example.com)(PORT=1534)))'
remote_listener='client-scan:1534','vlan-scan:1534'
listener_networks=''

The LISTENER_NETWORKS parameter is responsible for registration of listeners for ALL other networks.

Correct Parameters:

local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=client-vip.example.com)(PORT=1534))'
remote_listener='client-scan:1534'
listener_networks='((NAME=netVLAN)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=vlan-vip.example.com)(PORT=1534))))(REMOTE_LISTENER=vlan-scan:1534))'
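
If you need to apply these values yourself, a minimal sketch using ALTER SYSTEM looks like the following. The host names, ports and network name come from the example above; the instance name ORCL1 is just a placeholder, and on RAC the LOCAL_LISTENER is normally set per instance with that node's VIP, so adjust the SID clause for each node:

/* Set per instance with that node's VIP (ORCL1 is a placeholder instance name) */
ALTER SYSTEM SET local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=client-vip.example.com)(PORT=1534))' SCOPE=BOTH SID='ORCL1';
/* These two can be set the same across all instances */
ALTER SYSTEM SET remote_listener='client-scan:1534' SCOPE=BOTH SID='*';
ALTER SYSTEM SET listener_networks='((NAME=netVLAN)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=vlan-vip.example.com)(PORT=1534))))(REMOTE_LISTENER=vlan-scan:1534))' SCOPE=BOTH SID='*';
/* Force the database to re-register with the listeners */
ALTER SYSTEM REGISTER;

From the OS, running lsnrctl services against each local and scan listener should then show the database services registered on the proper network only.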

Once these changes were made, the intermittent connection issues were gone and Wireshark confirmed that the listeners were returning the correct VIP for the network being requested.


Temporarily Convert FTP to SCP Using cURL

Whenever you re-host or modernize a system, whether from one platform to another or from one host to another, you will usually have a laundry list of things to do, one of which is migrating those pesky one-off scripts and jobs.  When you are talking about jobs on a database server, more often than not they live in the cron.  Consider yourself lucky if they are actually running via dbms_scheduler, as that alleviates some of the workload; the schedule will be migrated during database instantiation.

Part of my normal routine when I work on these types of jobs is to first catalog the ‘who’, ‘what’, ‘where’ and ‘why’ of each script.  It is important to do this because maybe you can document a script that wasn’t well documented before, or even find a script that can be retired altogether.  Once these questions have been answered, I then decide what MUST change and what SHOULD change.  One of the items that typically falls under the MUST category is migration of FTP to SCP (or equivalent).  This item is usually a MUST either because of security concerns or because traditional FTP is no longer installed as part of the base build in many data centers.

The most important part of setting up SCP is getting the correct ssh keys in place so that SCP can work in a ‘passwordless’ fashion.  Preferably, service accounts are created so that the ssh key of the oracle user is not spread across too many non-database hosts, and in many cases that task must be completed by someone else.  Most of the time this is nice and can alleviate some of your workload, but what about the times when you thought the other party did what they needed to do but didn’t?  I don’t know about you, but it’s always little stuff like this that will bite you at the worst time.  So what do you do when you are in the 11th hour of a conversion, realize the ssh keys are not present, but have to get the job functioning?   Use cURL!

Code excerpt of typical FTP usage inside a shell script:

#!/bin/sh
HOST='ftp.somehost.com'
USER='username'
PASSWD='password'
FILE='somefile.txt'
DEST_PATH='/tmp'

ftp -n ${HOST} <<END_SCRIPT
quote USER ${USER}
quote PASS ${PASSWD}
cd ${DEST_PATH}
put ${FILE}
quit
END_SCRIPT
exit 0

 

Modified code using cURL:

#!/bin/sh
HOST='somehost.com'
USER='username'
PASSWD='password'
FILE='somefile.txt'
DEST_PATH='/tmp'

# Upload the local file to the destination path over SCP; credentials come from the variables above
curl -T ${FILE} -u ${USER}:${PASSWD} scp://${HOST}${DEST_PATH}/${FILE}

As you can see, this doesn’t alleviate the security concern of having clear text passwords, but if you ever find yourself in a bind and cannot get FTP or other packages installed that could help, this is a great temporary workaround to get the job done. Remember, this should be TEMPORARY, so make sure to come back and fix it!
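
Once the ssh keys do land, you don’t have to abandon the cURL approach in order to drop the clear text password. Here is a minimal sketch, assuming your curl build supports scp:// and that the key pair lives in the oracle user’s ~/.ssh directory (adjust the paths to your environment):

#!/bin/sh
HOST='somehost.com'
USER='username'
FILE='somefile.txt'
DEST_PATH='/tmp'

# An empty password after the colon plus --key/--pubkey tells curl to authenticate with the ssh key pair
curl -T ${FILE} -u ${USER}: --key ${HOME}/.ssh/id_rsa --pubkey ${HOME}/.ssh/id_rsa.pub scp://${HOST}${DEST_PATH}/${FILE}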

Enjoy!

GaOUG Oracle Tech Day 2016

On March 3, 2016, I’ll be at AMA Executive Conference Centers for GaOUG’s Tech Day 2016. This Oracle user conference promises to be packed with the best content from the best the industry has to offer.  If you will be in the Atlanta area and use Oracle products, consider attending the event!

Specifically, I will be presenting:

Presentation Name:
Application High Availability and Upgrades Using Oracle GoldenGate

Abstract:
This presentation will discuss the techniques and methods used to deploy a High Availability Active / Active configuration using Oracle GoldenGate.  Specifically, discussion will cover how to deploy GoldenGate in a standard configuration, how to utilize the built-in Conflict Detection and Resolution (CDR) functionality, and the available methodologies for deploying application or database architecture changes within this configuration.  Focus will also be given to how DBA resources must involve other IT resources to achieve a successful deployment.

Click here for more information or to register for GaOUG’s Tech Day.

We appreciate your efforts and look forward to seeing you on March 3. If you have any questions, please feel free to contact me anytime.

Data Pump May Cause Exponential Growth in USER_HISTORY$ Records

Data Pump is a common method for moving data from one schema to another.  Oftentimes, the easiest way to do this is a full schema export, followed by a full schema import using the appropriate ‘remap’ options.  Usually during a job like this the user will already exist, and it was this situation that uncovered an unexpected behavior in Data Pump.  In our case, the weekly data copy job had run for 37 weeks when, all of a sudden, it started to take noticeably longer and longer, to the point where it would run for days and never finish.

Upon investigation, we found that the data pump job was initially hanging on the import Data Pump during this step:

Processing object type SCHEMA_EXPORT/PASSWORD_HISTORY

Because we didn’t really know what was going on and we needed the job to finish, we excluded “PASSWORD_HISTORY” from the import Data Pump.  Then, just a few short weeks later, the job was exhibiting the same behavior on the export.  Clearly something more was going on.  After taking some time to analyze ASH reports, it was clear that the process was getting hung up while processing the USER_HISTORY$ table.

What is the USER_HISTORY$ table?

As it turns out, this table stores the actual password history for each user in the database.

SQL> desc user_history$
 Name                    Null?    Type
 ----------------------- -------- ----------------
 USER#                   NOT NULL NUMBER
 PASSWORD                         VARCHAR2(30)
 PASSWORD_DATE                    DATE

This table is never purged, so each time a user’s password is changed, a row is written to it.  In our case, a single password change for this user resulted in millions of rows over the 44 weeks that the job had been running, and Data Pump was exacerbating the issue.  With each export / import, and because we were alternating schemas, one row became two, two became three, three became five and so on, until week 37, when the table had almost 15 million rows.
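
A rough illustration of how quickly this compounds (assuming, for simplicity, that each schema starts with a single password history row and the copy alternates direction each week):

Start :  schema A = 1 row, schema B = 1 row
Week 1:  A -> B   B = 1 + 1 = 2
Week 2:  B -> A   A = 1 + 2 = 3
Week 3:  A -> B   B = 2 + 3 = 5
Week 4:  B -> A   A = 3 + 5 = 8
...

Each week’s count is the sum of the previous two, so the table grows by roughly 60% per week; compounded over three dozen weeks, that easily reaches into the millions of rows, which is exactly where we ended up.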

Demonstration:

To demonstrate the issue, we will use very simple parameter files and export / import the same schema:
expdp-HR2.par:

USERID='/ as sysdba'
DIRECTORY=EXP_DIR
DUMPFILE=EXP_DIR:hr_exp.dat
LOGFILE=EXP_DIR:hr_exp.log
SCHEMAS=HR2
CONTENT=ALL
COMPRESSION=ALL
REUSE_DUMPFILES=Y
JOB_NAME=EXPHR_JOB

impdp-HR2.par:

USERID='/ as sysdba'
DIRECTORY=EXP_DIR
DUMPFILE=hr_exp.dat
LOGFILE=hr_imp.log
TABLE_EXISTS_ACTION=REPLACE

We should initially check the user_history$ table to determine the baseline for a particular user:

SQL> select name, count(*)
2 from user_history$, user$
3 where user_history$.user# = user$.user#
4 and name like 'HR%'
5 group by name order by name;

NAME  COUNT(*)
----- ---------------
HR    2
HR2   6

We will then run an export and import using the parameter files above and then re-run the query:

SQL> select name,count(*)
 2 from user_history$ a, user$ b
 3 where a.user#=b.user#
 4 and name like 'HR%'
 5 group by name order by 1;

NAME  COUNT(*)
----- ---------------
HR    2
HR2   12

And sure enough, you can see the rows for password history have doubled just from running the import one time!

Solution:

Luckily, we now have a few low-impact solutions available: Oracle has made a patch available (16811897) for versions 11.2.0.3 and above, you can upgrade to 12.1.0.2, or you can exclude ‘PASSWORD_HISTORY’ from the export / import process.
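
If you go the exclusion route, the change to the parameter file is a single line. Here is a minimal sketch based on the expdp-HR2.par file above (the same EXCLUDE entry can be used on the import side instead if the dump file already contains the history):

USERID='/ as sysdba'
DIRECTORY=EXP_DIR
DUMPFILE=EXP_DIR:hr_exp.dat
LOGFILE=EXP_DIR:hr_exp.log
SCHEMAS=HR2
CONTENT=ALL
COMPRESSION=ALL
REUSE_DUMPFILES=Y
JOB_NAME=EXPHR_JOB
EXCLUDE=PASSWORD_HISTORY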

Of course if you are running into this issue, make sure you test thoroughly as results may vary!

Beware the Orphan Streams Capture Effect on Archivelogs

Recently, I ran into an issue where Oracle Recovery Manager (RMAN) would not delete old database archivelogs.  I have had this happen before when there was a lag condition with either Data Guard or a GoldenGate extract process.  Upon further investigation, I found that RMAN was issuing the following error, which looked just like the other times I had encountered this issue:

RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+RECOC1/……

This error repeated many times, representing a ton of space being consumed in the DB_RECOVERY_FILE_DEST.  To begin diagnosing the problem (this also happens to be a Data Guard environment), I went through the normal motions of verifying that logs were being shipped to and applied on the physical standby environments:

/* On Primary */
SELECT * FROM v$dataguard_stats;

/* To determine which logs are not shipped and applied yet */
SELECT *
FROM v$archived_log
WHERE DEST_ID <> 1
AND APPLIED = 'NO'
ORDER BY completion_time;

/* Run on both source and standby to compare the last logs that were applied */
SELECT THREAD#, MAX(SEQUENCE#) AS "LAST_APPLIED_LOG"
FROM V$LOG_HISTORY
GROUP BY THREAD#;

/* On Primary */
SELECT *
FROM
(SELECT *
FROM DBA_HIST_SYSMETRIC_HISTORY
WHERE metric_name='Redo Generated Per Sec'
ORDER BY snap_id DESC
)
WHERE rownum <= 10;

/* On Primary */
SELECT current_scn FROM v$database;

/* On Standby */
SELECT current_scn FROM v$database;

/* On Primary - To determine the actual time difference */
SELECT SCN_TO_TIMESTAMP(<SCN of PRIMARY>), SCN_TO_TIMESTAMP(<SCN of STANDBY>) FROM dual;

After looking at Data Guard performance, it was clear that Data Guard itself was not preventing RMAN from removing the archivelogs.  In addition to Data Guard being present, this system is in the process of being migrated from an older system via GoldenGate.  I have seen GoldenGate cause this issue before when an extract was registered with “log retention”, but this is the target system, so that isn’t possible, RIGHT?  Just to be sure, I ran a query to see if there was anything causing the database to think a GoldenGate object was registered with it:

/* On Primary - Where archivelogs are being held */
SELECT CAPTURE_NAME, START_TIME, STATUS FROM dba_capture;


CAPTURE_NAME
---------------------
OGG2$_EECCP2_12C415F04

START_TIME
-------------------------------------------
02-NOV-15 06.47.41.000000000 AM

STATUS
----------
DISABLED

Hey, wait a minute, we do not have a GoldenGate extract or any other registered GoldenGate objects running on this database, so why is this entry here?  I then ran the same query on my source server, and there it was: the same object on the source.

So how did this happen?

It all points back to the initial Data Pump that we used to instantiate the database before turning on GoldenGate.  We used a ‘FULL’ Data Pump export, which was taken AFTER the extract was started on the source.  Because of this, the export also contained the capture objects necessary to register the extract with ‘LOGRETENTION’.

Solution….

To remedy this situation, we need to completely remove this ‘orphaned’ capture object from the database.  To do this, we use the dbms_streams_adm package.  Leave all of the package defaults in place so that an error is raised should you try to delete the wrong queue:


exec DBMS_STREAMS_ADM.REMOVE_QUEUE('<CAPTURE_NAME>')

You can now re-execute the query against dba_capture, and the queue you just deleted should no longer be there:


/* On Primary - Where archivelogs are being held */
SELECT CAPTURE_NAME, START_TIME, STATUS FROM dba_capture;

no rows selected

From this point on, your archivelogs should not be required for any “standby or upstream capture process”, and RMAN should now delete your backed-up archivelogs, freeing space in your DB_RECOVERY_FILE_DEST!
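
If you need the space back right away rather than waiting on the next backup cycle, something along these lines can be run from RMAN. This is only a sketch; it assumes the archivelogs have already been backed up to tape (SBT), so swap in DISK or drop that clause entirely to match your own backup strategy and retention policy:

RMAN> CROSSCHECK ARCHIVELOG ALL;
RMAN> DELETE NOPROMPT ARCHIVELOG ALL BACKED UP 1 TIMES TO DEVICE TYPE SBT;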

Understand Integrated Replicat Performance using the GGSCI STATS Command

With GoldenGate 12c, Integrated Replicat has become the preferred option for applying transactions (Classic and Coordinated are the others). In order to get the most efficiency out of an Integrated Replicat, it is important to understand what types of operations cannot be applied by the integrated apply server. Operations that cannot be applied in “integrated” mode are applied by GoldenGate in “direct” mode instead.

So what is the difference between Integrated and Direct?

“Integrated” applies transactions via Logical Change Records (LCRs), whereas “direct” applies transactions as SQL statements via OCI. “Direct” also requires that transactions be applied serially, whereas “integrated” transactions can be applied in a parallel, coordinated fashion.

There are some limitations to applying changes via an LCR.  The following items can only be applied in “direct” mode:

  • DDL operations
  • Sequence operations
  • SQLEXEC parameter within a TABLE or MAP parameter
  • EVENTACTIONS processing
  • UDTs (note: if the extract uses USENATIVEOBJSUPPORT to capture the UDT, then Integrated Replicat will apply it with the inbound server; otherwise it will be handled by the Replicat directly)

Understanding the types of operations going on inside your database should be one of the first steps in setting up a GoldenGate environment and heavy use of “direct” transactions within an Integrated Replicat will likely reduce the performance of the replicat.
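
As a quick first pass at sizing this up, a dictionary query like the one below can show how much sequence activity you might be signing up to replicate. This is a minimal sketch; the HR owner is just a placeholder for the schemas you actually plan to include:

/* Count the sequences in the schemas you intend to replicate */
SELECT sequence_owner, COUNT(*) AS sequence_count
FROM dba_sequences
WHERE sequence_owner IN ('HR')
GROUP BY sequence_owner
ORDER BY sequence_owner;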

Integrated Replicat Statistics Explained:

  • Total transactions – Total transactions processed by replicat
  • Redirected – Number of transactions for which replicat is redirected to classic mode
  • DDL operations – Number of DDL operations processed in direct mode
  • Stored procedures – Number of Stored procedures in direct mode
  • Datatype functionality – Number of user defined data types processed in direct mode
  • Event actions – Number of event actions processed in direct mode
  • Direct transactions ratio – Percentage of transactions that the Integrated Replicat has to convert to direct mode in order to apply

The direct transactions ratio is computed as either:

Direct transactions ratio = (Redirected/Total transactions)*100
(or)
Direct transactions ratio = ((DDL operations + Stored procedures + Datatype functionality + Event actions) / Total transactions)*100

The lower the direct transactions ratio, the better the performance. This is one thing that should be considered when changing to Integrated Replicat.  If the ratio is high, it may be better to use Classic Replicat to improve performance.

Example of Integrated Replicat Statistics:

 
GGSCI (orcl12c-rac1.localdomain) 4> stats R_TST_R1 totalsonly *.*

Sending STATS request to REPLICAT R_TST_R1 ...

Start of Statistics at 2015-09-14 19:34:26.
 Integrated Replicat Statistics:

Total transactions 11829288.00
 Redirected 0.00
 DDL operations 0.00
 Stored procedures 0.00
 Datatype functionality 295977.00
 Event actions 0.00
 Direct transactions ratio 2.50%
.........


 GGSCI (orcl12c-rac1.localdomain) 2> stats R_TST_R2 totalsonly *.*

Sending STATS request to REPLICAT R_TST_R2 ...

Start of Statistics at 2015-09-14 19:11:48.
 Integrated Replicat Statistics:

Total transactions 2781917.00
 Redirected 0.00
 DDL operations 0.00
 Stored procedures 0.00
 Datatype functionality 733578.00
 Event actions 0.00
 Direct transactions ratio 26.37%
.........
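
Plugging the numbers from these two replicats back into the formula shows where the ratios come from:

R_TST_R1: (295977 / 11829288) * 100 ≈ 2.50%
R_TST_R2: (733578 / 2781917) * 100 ≈ 26.37%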

As you can see in the second example, the “Direct transactions ratio” is much higher. Because I know this system, I know that this higher ratio is entirely attributable to the replication of sequence objects. Using this as an example, a few options to lower the direct transactions ratio and improve performance might include discontinuing sequence replication or splitting sequence replication out into its own ‘classic’ mode replicat. Of course, if current performance is adequate, you could do nothing. I would probably consider changing to a classic replicat when the “Direct transactions ratio” approached 50%.

No matter the situation, it is important to understand the statistics which GoldenGate is recording on your behalf. It may lend good insight into what is going on in your environment.

Oracle 11g Data Pump EXCLUDE Parameter Caveat When the FULL=Y Parameter is Used

During the course of my work, I migrate many databases from one host to another.  Occasionally, DBAs must create backup tables when diagnosing an issue or implementing a change, and more often than not these objects never get dropped.  Usually this is no problem, except when you need to move a database from one host to another and do not want to move those old backup objects.  In my latest migration, I found over 300GB of backup tables, which would certainly lengthen the time needed to complete the Data Pump.  After obtaining approval that these tables did not need to be migrated, I set out to determine how to exclude them from the Data Pump.  At first, I thought I had it all figured out: data pump everything and exclude only the objects I did not want to bring over.  At a high level, the steps looked like this:

  1. Create a work table in the database which contained the list of excluded tables.
  2. Load the reference data which will be queried by data pump for the exclude list.
  3. Create the data pump parameter file
  4. Execute the data pump

Exclusion table DDL (this simplifies the parameter file):

CREATE TABLE GGATE.EXCLUDE_TABLE (
TABLE_NAME VARCHAR2(30),
OBSOLETE_FLG VARCHAR2(1)
)
TABLESPACE USERS;

INSERT INTO GGATE.EXCLUDE_TABLE VALUES ('TEMP_900008090_08122014', 'Y');
COMMIT;

Data pump parameter file:

USERID='/ as sysdba'
DUMPFILE=EXP_ASM_DIR:METADATA_EXP.DAT
LOGFILE=EXP_LOG_DIR:METADATA_EXP.LOG
FULL=Y
CONTENT=METADATA_ONLY
REUSE_DUMPFILES=Y
JOB_NAME=EXP_METADATA_JOB
EXCLUDE=STATISTICS
EXCLUDE=TABLE:"IN (SELECT TABLE_NAME FROM GGATE.EXCLUDE_TABLE WHERE OBSOLETE_FLG='Y')"
CLUSTER=N

You will have all objects except for the few tables which you excluded.  Right?  Well, almost.  You have everything you asked for, except that the data pump left out ALL of the following paths:

DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
DATABASE_EXPORT/SCHEMA/TABLE/INDEX/FUNCTIONAL_INDEX/INDEX
DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/CONSTRAINT
DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/REF_CONSTRAINT
DATABASE_EXPORT/SCHEMA/TABLE/TRIGGER

After many hours of research, I found this is a bug documented in Oracle Doc 1491557.1 that affects nearly all versions of 11g.  As a workaround, you can do one of the following:

  1. Do another export of metadata without exclusions to get these object paths, then import them one by one for each schema (see the sketch after this list)
  2. Do a FULL=Y with no exclusions and add the EXCLUDE parameter on the impdp parameter file like this:
    USERID='/ as sysdba'
    DUMPFILE=IMP_ASM_DIR:METADATA_EXP.DAT
    LOGFILE=IMP_LOG_DIR:METADATA_IMP.LOG
    CONTENT=METADATA_ONLY
    JOB_NAME=IMP_METADATA_JOB
    CLUSTER=N
    EXCLUDE=TABLE:"IN (SELECT TABLE_NAME FROM GGATE.EXCLUDE_TABLE)"
    EXCLUDE=STATISTICS
    
  3. Apply patch 14095143 and the EXCLUDE will work properly
  4. Upgrade to 11.2.0.4
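
For the first workaround, the follow-up import of just the missing object paths might look like this minimal sketch (the dump file name, schema name and job name are placeholders; the INCLUDE entries correspond to the object paths that were dropped):

USERID='/ as sysdba'
DUMPFILE=IMP_ASM_DIR:METADATA_FULL_EXP.DAT
LOGFILE=IMP_LOG_DIR:METADATA_OBJ_IMP.LOG
CONTENT=METADATA_ONLY
SCHEMAS=HR
INCLUDE=INDEX
INCLUDE=CONSTRAINT
INCLUDE=REF_CONSTRAINT
INCLUDE=TRIGGER
JOB_NAME=IMP_OBJ_JOB
CLUSTER=N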

Have fun!

To see exactly where the data pump paths were left out, feel free to see a better representation here:
datapump export compare