
Extending GoldenGate Change Data Capture With Eventactions

In the previous post, I discussed a very simple GoldenGate setup for implementing Change Data Capture.  Occasionally, depending on the requirements and the data volume, it may be worthwhile to suspend replication while the application processes data or performs some other work.  One way to do this is with event actions.  GoldenGate event actions are a simple way of telling GoldenGate to do something when a certain data condition is encountered.  In this example, I am going to data-drive my event actions by using a control table with 2 rows:

EVENT_DESC        EVENT_TMSTP
SUSPEND REPLICAT  13-DEC-16 11.31.00.881031 AM
RESUME REPLICAT   13-DEC-16 11.48.51.065772 PM
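For reference, the control table itself can be as simple as the following sketch (the table and column names come from the example; the datatypes are my assumption):

CREATE TABLE MYSCHEMA.EVENTS (
  EVENT_DESC  VARCHAR2(30),   -- 'SUSPEND REPLICAT' or 'RESUME REPLICAT'
  EVENT_TMSTP TIMESTAMP(6)    -- updated by the application to fire the event
);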

When the application updates one of these timestamps in the source system, the downstream replicat uses that change to either suspend or resume itself.

The only way this event action can be completely data driven is to have 2 replicats: one that processes all of the change data, and another that processes the ‘resume’ command once it is issued.  The second replicat is needed because the first one cannot process any ‘resume’ commands or apply any data on its own while it is SUSPENDED!

In the replicat that will process the suspend, the parameter file may look like the following:

--Standard entries in a replicat parameter file

MAP MYSCHEMA.EVENTS, TARGET MYSCHEMA.EVENTS, &
COLMAP (USEDEFAULTS), &
FILTER (@STREQ (EVENT_DESC, 'SUSPEND REPLICAT' )), &
EVENTACTIONS (IGNORE RECORD, LOG INFO, REPORT, SUSPEND, CP BOTH);
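Once the SUSPEND action fires, the process can be checked, and if necessary resumed manually, from GGSCI (a sketch; RCDC1 is the replicat name used in the SHELL example below):

GGSCI> INFO REPLICAT RCDC1
GGSCI> SEND REPLICAT RCDC1, RESUME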

In the replicat that will process the resume, the parameters may look like:

--Standard entries in a replicat parameter file

ALLOWDUPTARGETMAP

MAP MYSCHEMA.EVENTS, TARGET MYSCHEMA.EVENTS, &
COLMAP (USEDEFAULTS), &
HANDLECOLLISIONS;

MAP MYSCHEMA.EVENTS, TARGET MYSCHEMA.EVENTS, &
FILTER (@STREQ (EVENT_DESC, 'RESUME REPLICAT' )), &
EVENTACTIONS (IGNORE RECORD, LOG, REPORT, CP AFTER,&
SHELL ('./dirshell/resume_replicat.sh $1', VAR $1 = 'RCDC1'));
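The resume_replicat.sh script is not shown in this post; a minimal sketch might look like the following, where the replicat group name arrives as $1 and GG_HOME is an assumed variable pointing at the GoldenGate installation:

#!/bin/sh
# Sketch only: resume the suspended replicat whose group name is passed as $1
cd "${GG_HOME:?set GG_HOME to the GoldenGate home}" || exit 1
./ggsci << EOF
SEND REPLICAT $1, RESUME
EOF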

If you need a more robust mechanism whereby some other condition must be checked before issuing the suspend, you can extend the suspend action further by executing a SQL statement within the replicat, as shown below.  In this case, we need to make sure that a suspend is only processed if a ‘resume’ record was issued before it.  This safeguards against accidental suspends being processed, since it is very possible that the CDC replicat is processing data well behind the replicat that processes the ‘resume’ event.

--Standard entries in a replicat parameter file

MAP MYSCHEMA.EVENTS, TARGET MYSCHEMA.EVENTS, &
COLMAP (USEDEFAULTS), &
FILTER (@STREQ (EVENT_DESC, 'SUSPEND REPLICAT' )), &
SQLEXEC (ID LOOKUP_RESUME_DATE, &
QUERY 'SELECT EVENT_DESC, EVENT_TMSTP, COUNT(*) SUSPEND_OK FROM MYSCHEMA.EVENTS &
WHERE EVENT_DESC = :p_resume_event_desc AND EVENT_TMSTP < :p_suspend_tmstp &
AND NOT EXISTS ( &
SELECT 1 FROM MYSCHEMA.EVENTS &
WHERE EVENT_DESC = :p_suspend_event_desc AND EVENT_TMSTP > :p_suspend_tmstp) &
GROUP BY EVENT_DESC, EVENT_TMSTP', &
PARAMS (p_resume_event_desc = 'RESUME REPLICAT', p_suspend_event_desc = 'SUSPEND REPLICAT', p_suspend_tmstp = EVENT_TMSTP), &
TRACE ALL, &
BEFOREFILTER), &
FILTER (@STREQ (LOOKUP_RESUME_DATE.SUSPEND_OK, 1 )), &
EVENTACTIONS (IGNORE, LOG, REPORT, SUSPEND, CP BOTH);
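On the source side, everything is driven by a simple update to the control table.  For example, to request a suspend:

UPDATE MYSCHEMA.EVENTS
SET EVENT_TMSTP = SYSTIMESTAMP
WHERE EVENT_DESC = 'SUSPEND REPLICAT';
COMMIT;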

As you can see, event actions are very powerful and can be extended in a variety of ways; this is just one example.  If you choose to implement this, make sure you also account for the event action in any monitoring scripts you have, because GoldenGate will show lag while the replicat is suspended.  Have fun!


Implement GoldenGate Change Data Capture

Recently, I was asked to implement a solution to fulfill data audit requirements by capturing the complete lifecycle of a row of data.  As it turns out, Oracle GoldenGate is a perfect tool for this style of Change Data Capture.

Setup of Change Data Capture

Setup is very similar to other GoldenGate installations.  You need to make sure that you capture both the before and after images in the Extract (see the capture-side sketch below); the remaining parameters required to do this live in the replicat parameter file.  Most of the time I implement this via a MACRO, but for the sake of simplicity, I will show it as a basic parameter file:

MAP SOE.STRESSTESTTABLE, TARGET SOE.STRESSTESTTABLE_CDC, INSERTALLRECORDS, &
COLMAP ( &
USEDEFAULTS, &
COMMIT_TMSTP = @GETENV('GGHEADER', 'COMMITTIMESTAMP'), &
RECORD_SCN_NBR = @GETENV('TRANSACTION', 'CSN'), &
ORIG_DML_TYPE_NM = @GETENV('GGHEADER', 'OPTYPE'), &
BFR_AFT_IND_CD = @GETENV ('GGHEADER', 'BEFOREAFTERINDICATOR'), &
TARGET_INSERT_TMSTP = @DATE ('YYYY-MM-DD HH:MI:SS.FFFFFF','JTS',@GETENV ('JULIANTIMESTAMP')), &
DML_TYPE_CD = @CASE (@GETENV ('GGHEADER', 'OPTYPE'), &
                'INSERT', 'I', &
                'UPDATE', 'U',&
                'SQL COMPUPDATE', 'U', &
                'PK UPDATE', 'U', &
                'DELETE', 'D', &
                'TRUNCATE', 'T', &
                'ENSCRIBE COMPUPDATE','U', &
                @COLSTAT(NULL)));
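Note that for the BFR_AFT_IND_CD column to be meaningful, the extract must write before images to the trail.  The extract configuration is not shown in this post, but a minimal capture-side sketch might include:

--Standard entries in an extract parameter file
GETUPDATEBEFORES
NOCOMPRESSDELETES
TABLE SOE.STRESSTESTTABLE;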

The important thing to notice about the replicat parameter file is the INSERTALLRECORDS keyword.  This takes every DML operation and converts it to an insert on the target.  Additional data is also required to correctly interpret each row; this is accomplished by adding the following columns to every table on which you wish to capture changes (a DDL sketch follows the list):

  • COMMIT_TMSTP – timestamp at which the source row was committed
  • RECORD_SCN_NBR – corresponding SCN of when the source row was committed
  • ORIG_DML_TYPE_NM – GoldenGate operation type
  • BFR_AFT_IND_CD – indicates whether the row is the before image or the after image
  • TARGET_INSERT_TMSTP – timestamp when the row was inserted into the target
  • DML_TYPE_CD – translates the GoldenGate operation type to a simple code for insert, update, delete or truncate operations
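As a sketch, the audit columns might be added to the CDC target table like this (the datatypes are my assumptions, not from the original implementation):

ALTER TABLE SOE.STRESSTESTTABLE_CDC ADD (
  COMMIT_TMSTP        TIMESTAMP(6),
  RECORD_SCN_NBR      NUMBER,
  ORIG_DML_TYPE_NM    VARCHAR2(30),
  BFR_AFT_IND_CD      VARCHAR2(6),
  TARGET_INSERT_TMSTP TIMESTAMP(6),
  DML_TYPE_CD         CHAR(1)
);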

Change Data Capture in Action

Once the GoldenGate processes are started, change data capture records the changes as normal processing occurs.  Using the SOE.STRESSTESTTABLE table provided by Swingbench, we can see Change Data Capture in action:

COLUMN                DATA ROW 1              DATA ROW 2              DATA ROW 3
ID                    17346                   17346                   17346
AINT                  4730828                 4730829                 4730829
AFLOAT                300.60                  301.00                  301.00
ASMALLVARCHAR         HsnzUb                  HzdhHY                  HzdhHY
COMMIT_TMSTP          16-DEC-16 09.18.22.00   16-DEC-16 09.18.38.00   16-DEC-16 09.20.12.00
RECORD_SCN_NBR        11405081                11404603                11479311
ORIG_DML_TYPE_NM      INSERT                  SQL COMPUPDATE          DELETE
BFR_AFT_IND_CD        AFTER                   AFTER                   BEFORE
TARGET_INSERT_TMSTP   16-DEC-16 09.22.33.27   16-DEC-16 09.29.20.84   16-DEC-16 09.29.41.23
DML_TYPE_CD           I                       U                       D

Building Eventactions on top of Change Data Capture

This was a deliberately simple example of how to implement Change Data Capture.  In the next blog post, I will show how to use event actions to suspend the replicat using a data driven model.  This gives a quiesced window in which a separate application can process the changes and perform table truncates, versus a model where no suspend takes place and the application must perform transactional deletes.  When the application is finished processing, it can then ‘un-suspend’ the replicat using the same data driven model and continue processing.

Beware the Orphan Streams Capture Effect on Archivelogs

Recently, I ran into an issue where Oracle Recovery Manager (RMAN) would not delete old database archivelogs.  I have had this happen before when there was a lag condition with either Data Guard or a GoldenGate extract process.  Upon further investigation, I found that RMAN was issuing the following error, just like the other times I have encountered this issue:

RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+RECOC1/……

This error repeated many times, representing a ton of space being consumed in the DB_RECOVERY_FILE_DEST.  To begin diagnosing the problem (this also happens to be a Data Guard environment), I went through the normal motions of ensuring proper application of logs to the physical standby environments:

/* On Primary */
SELECT * FROM v$dataguard_stats;

/* To determine which logs are not shipped and applied yet */
SELECT *
FROM v$archived_log
WHERE DEST_ID <> 1
AND APPLIED = 'NO'
ORDER BY completion_time;

/* Run on both source and standby to compare the last logs that were applied */
SELECT THREAD#, MAX(SEQUENCE#) AS "LAST_APPLIED_LOG"
FROM V$LOG_HISTORY
GROUP BY THREAD#;

/* On Primary */
SELECT *
FROM
(SELECT *
FROM DBA_HIST_SYSMETRIC_HISTORY
WHERE metric_name = 'Redo Generated Per Sec'
ORDER BY snap_id DESC
)
WHERE rownum <= 10;

/* On Primary */
SELECT current_scn FROM v$database;

/* On Standby */
SELECT current_scn FROM v$database;

/* On Primary - To determine actual time difference */
SELECT SCN_TO_TIMESTAMP([SCN of PRIMARY]), SCN_TO_TIMESTAMP([SCN of SECONDARY]) FROM dual;

After looking at Data Guard performance, it was clear that Data Guard itself was not preventing RMAN from removing the archivelogs.  In addition to Data Guard being present, this system is in the process of being migrated from an older system via GoldenGate.  I have seen GoldenGate cause this issue before when an extract was registered with “log retention”, but this is the target system, so this isn’t possible, RIGHT?  Just to be sure, I ran a query to see if anything was causing the database to think a GoldenGate capture object was registered with it:

/* On Primary - Where archivelogs are being held */
SELECT CAPTURE_NAME, START_TIME, STATUS FROM dba_capture;


CAPTURE_NAME
---------------------
OGG2$_EECCP2_12C415F04

START_TIME
-------------------------------------------
02-NOV-15 06.47.41.000000000 AM

STATUS
----------
DISABLED

Hey, wait a minute, we do not have a GoldenGate extract or any other registered GoldenGate objects running on this database, so why is this entry here?  I then ran the same query on my source server and there it is: the same object on the source.

So how did this happen?

It all points back to the initial Data Pump export that we used to instantiate the database before turning on GoldenGate.  We used a ‘FULL’ Data Pump export which was taken AFTER the extract was started on the source.  Because of this, the export also contained the capture objects used to register the extract with ‘LOGRETENTION’.

Solution….

To remedy this situation, we need to completely remove this ‘orphaned’ capture object from the database using the dbms_streams_adm package.  Utilize all of the package defaults so that an error is raised should you try to delete the incorrect queue:


exec DBMS_STREAMS_ADM.REMOVE_QUEUE([CAPTURE_NAME])

Now re-execute the query against dba_capture, and the queue just deleted should no longer be there:


/* On Primary - Where archivelogs are being held */
SELECT CAPTURE_NAME, START_TIME, STATUS FROM dba_capture;

no rows selected
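With the orphaned capture object removed, you can trigger the cleanup immediately rather than waiting for the next backup cycle (a typical command; adjust the clause to match your deletion policy):

RMAN> DELETE ARCHIVELOG ALL BACKED UP 1 TIMES TO DEVICE TYPE SBT;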

From this point on, your archivelogs should no longer be required for any “standby or upstream capture process”, and RMAN should delete your backed-up archivelogs, freeing space in your DB_RECOVERY_FILE_DEST!

Understand Integrated Replicat Performance using the GGSCI STATS Command

With GoldenGate 12c, Integrated Replicat has become the preferred option for applying transactions (Classic and Coordinated are the others).  To get the most efficiency out of an Integrated Replicat, it is important to understand which types of operations cannot be applied by the integrated apply server.  Operations that cannot be applied in “integrated” mode are applied in “direct” mode instead.

So what is the difference between Integrated and Direct?

“Integrated” mode applies transactions via Logical Change Records (LCRs), while “direct” mode applies transactions as SQL statements via OCI.  “Direct” also requires that transactions be applied serially, whereas “integrated” transactions can be applied in a parallel, coordinated fashion.

There are some limitations to applying changes via an LCR.  The following items can only be applied in “direct” mode:

  • DDL operations
  • Sequence operations
  • SQLEXEC parameter within a TABLE or MAP parameter
  • EVENTACTIONS processing
  • User-defined types (UDTs).  Note: if the extract uses USENATIVEOBJSUPPORT to capture the UDT, then Integrated Replicat will apply it with the inbound server; otherwise it will be handled by Replicat directly.

Understanding the types of operations going on inside your database should be one of the first steps in setting up a GoldenGate environment, and heavy use of “direct” transactions within an Integrated Replicat will likely reduce its performance.

Integrated Replicat Statistics Explained:

  • Total transactions – Total transactions processed by replicat
  • Redirected – Number of transactions for which replicat is redirected to classic mode
  • DDL operations – Number of DDL operations processed in direct mode
  • Stored procedures – Number of Stored procedures in direct mode
  • Datatype functionality – Number of user defined data types processed in direct mode
  • Event actions – Number of event actions processed in direct mode
  • Direct transactions ratio – Percentage of transactions that the Integrated Replicat converted to direct mode in order to apply them

The direct transactions ratio is computed as either:

Direct transactions ratio = (Redirected/Total transactions)*100
(or)
Direct transactions ratio = ((DDL operations + Stored procedures + Datatype functionality + Event actions) / Total transactions)*100
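For example, in the first stats report below, the only direct-mode operations are the 295977 “Datatype functionality” rows, so the ratio works out to (295977 / 11829288) * 100, or roughly 2.50%, exactly as reported.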

The lower the direct transactions ratio, the better the performance.  This is one thing that should be considered when changing to Integrated Replicat: if the ratio is high, it may be better to use Classic Replicat to improve performance.

Example of Integrated Replicat Statistics:

 
GGSCI (orcl12c-rac1.localdomain) 4> stats R_TST_R1 totalsonly *.*

Sending STATS request to REPLICAT R_TST_R1 ...

Start of Statistics at 2015-09-14 19:34:26.
 Integrated Replicat Statistics:

Total transactions 11829288.00
 Redirected 0.00
 DDL operations 0.00
 Stored procedures 0.00
 Datatype functionality 295977.00
 Event actions 0.00
 Direct transactions ratio 2.50%
.........


GGSCI (orcl12c-rac1.localdomain) 2> stats R_TST_R2 totalsonly *.*

Sending STATS request to REPLICAT R_TST_R2 ...

Start of Statistics at 2015-09-14 19:11:48.
 Integrated Replicat Statistics:

Total transactions 2781917.00
 Redirected 0.00
 DDL operations 0.00
 Stored procedures 0.00
 Datatype functionality 733578.00
 Event actions 0.00
 Direct transactions ratio 26.37%
.........

As you can see, the “Direct transactions ratio” is much higher in the second example.  Because I know this system, I know that this higher ratio is entirely attributable to the replication of sequence objects.  Using this as an example, a few options to lower the direct transactions ratio and improve performance might include discontinuing sequence replication or splitting sequence replication into its own ‘classic’ mode replicat.  Of course, if current performance is adequate, you could do nothing.  I would probably consider changing to classic replicat when the “Direct transactions ratio” approached 50%.

No matter the situation, it is important to understand the statistics which GoldenGate is recording on your behalf. It may lend good insight into what is going on in your environment.

Set Up and Use of ACFS for GoldenGate

Recently, I found myself in a situation where I needed a Linux mount point with sufficient space for GoldenGate binaries and trail files.  I have used the Oracle Database File System (DBFS) option in the past, although I was never a big fan of it: it creates additional database objects and, in my opinion, unnecessary database I/O as well as additional redo and RMAN activity.  Based on this, I decided to explore Oracle ASM Cluster File System (ACFS) for this use case.  At first glance, it was much faster to set up and is available on all nodes by default, which also allows GoldenGate to fail over to other nodes.  In addition, ACFS does not require the database to be up, so the file system can also be used for other purposes.  If you are using this mount solely for GoldenGate, make sure you follow the best practices document, which is updated periodically (Oracle GoldenGate Best Practice: NFS Mount options for use with GoldenGate (Doc ID 1232303.1)).

***  Refer to the following steps at your own risk and always test for your use case prior to using in a production setting.

Requirements:

  • Root user access
  • Sufficient ASM Space
  • Separate ASM Diskgroup (Optional)
  • Latest Oracle Grid Infrastructure and Database Patchset

Configuration:

Verify that ACFS/ADVM modules are present in memory (on each node):

 $ lsmod | grep oracle

If the modules are not present, the command will return something similar to:
oracleasm              53591  1

If the modules are present, the command will return something similar to:
oracleacfs 3308260 0
oracleadvm 508030 0
oracleoks 506741 2 oracleacfs,oracleadvm
oracleasm 53591 1

If the modules are not present or you would like to ensure that the latest version is loaded, run the following before proceeding (as the root user):

 
$ . oraenv

ORACLE_SID = [CDBRAC1] ? +ASM

The Oracle base remains unchanged with value /u01/app/oracle

# $GRID_HOME/bin/acfsroot install

Reboot the node if the modules were already present and you are reloading them.

Start and enable the ACFS modules on each node:

On each node and as the root user:

# $GRID_HOME/bin/acfsload start
ACFS-9391: Checking for existing ADVM/ACFS installation.
ACFS-9392: Validating ADVM/ACFS installation files for operating system.
ACFS-9393: Verifying ASM Administrator setup.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
ACFS-9156: Detecting control device '/dev/ofsctl'.
ACFS-9322: completed

If running Grid Infrastructure, enable the driver modules in clusterware (only on one node as the root user):

# $GRID_HOME/bin/acfsroot enable
ACFS-9376: Adding ADVM/ACFS drivers resource succeeded.
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'orcl-rac1'
CRS-2676: Start of 'ora.drivers.acfs' on 'orcl-rac1' succeeded
ACFS-9380: Starting ADVM/ACFS drivers resource succeeded.
ACFS-9368: Adding ACFS registry resource succeeded.
CRS-2672: Attempting to start 'ora.registry.acfs' on 'orcl-rac2'
CRS-2672: Attempting to start 'ora.registry.acfs' on 'orcl-rac1'
CRS-2676: Start of 'ora.registry.acfs' on 'orcl-rac2' succeeded
CRS-2676: Start of 'ora.registry.acfs' on 'orcl-rac1' succeeded
ACFS-9372: Starting ACFS registry resource succeeded.

Once installation is complete, and the mount is registered with clusterware, these modules will be loaded automatically.

If you like, you can double-check the driver state using the acfsdriverstate utility:
usage: acfsdriverstate [-orahome <home_path>] {installed | loaded | version | supported} [-s]
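For example, to confirm the drivers are loaded (output will look something like the following; exact messages vary by version):

$ $GRID_HOME/bin/acfsdriverstate loaded
ACFS-9203: true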

As oracle user, create an ASM volume for ACFS (run only on one node):

Source in the grid environment.

$ . oraenv
ORACLE_SID = [CDBRAC1] ? +ASM
The Oracle base remains unchanged with value /u01/app/oracle

Create the volume using the volcreate command.
You can use an existing disk group or create a separate one to house ACFS.

$ asmcmd
ASMCMD> volcreate -G DATA -s 1G ACFSVOL1
ASMCMD> volinfo --all
Diskgroup Name: DATA

Volume Name: ACFSVOL1
Volume Device: /dev/asm/acfsvol1-370
State: ENABLED
Size (MB): 1024
Resize Unit (MB): 64
Redundancy: UNPROT
Stripe Columns: 8
Stripe Width (K): 1024
Usage:
Mountpath:

As oracle user, create the filesystem on the volume which was just created:

$ /sbin/mkfs -t acfs /dev/asm/acfsvol1-370

mkfs.acfs: version = 12.1.0.2.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/acfsvol1-370
mkfs.acfs: volume size = 1073741824 ( 1.00 GB )
mkfs.acfs: Format complete.

As root, create an empty directory which will house the file system:

# mkdir -p /acfsmounts/acfsvol1
# chown root:oinstall /acfsmounts
# chmod 770 /acfsmounts
# chown -R oracle:oinstall /acfsmounts/acfsvol1
# chmod 775 /acfsmounts/acfsvol1

As root, setup the file system to be auto mounted by clusterware:

In a RAC 11g environment, register the file system using acfsutil (srvctl may be supported but was not tested; the “-u” option allows the oracle user to administer the mount):
# . /usr/local/bin/oraenv
ORACLE_SID = [CDBRAC1] ? +ASM
The Oracle base remains unchanged with value /u01/app/oracle
# /sbin/acfsutil registry -a /dev/asm/acfsvol1-370 /acfsmounts/acfsvol1 -t "ACFS General Purpose Mount" -u oracle
In a RAC 12c Grid Infrastructure environment, register it with clusterware using the following commands (here the “-user” option allows the oracle user to administer the mount):
# . /usr/local/bin/oraenv
ORACLE_SID = [CDBRAC1] ? +ASM
The Oracle base remains unchanged with value /u01/app/oracle
# srvctl add volume -volume ACFSVOL1 -diskgroup DATA -device /dev/asm/acfsvol1-370
# srvctl add filesystem -device /dev/asm/acfsvol1-370 -path /acfsmounts/acfsvol1 -diskgroup DATA -user oracle -fstype ACFS -description "ACFS General Purpose Mount"

At this point the mount should be ready for read/write and will be automatically mounted by clusterware.
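To verify, you can check the resource state and the mounted file system (the srvctl syntax matches the 12c commands above):

# srvctl status filesystem -device /dev/asm/acfsvol1-370
$ df -h /acfsmounts/acfsvol1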

Administration of the ACFS mount:

If you need to resize the mount once created, use acfsutil; since you granted control to the oracle user, this command can also be executed by the oracle user:

$ acfsutil size 25G /acfsmounts/acfsvol1

To manually mount or unmount the file system through clusterware:

$ srvctl start filesystem -device /dev/asm/acfsvol1-370
$ srvctl stop filesystem -device /dev/asm/acfsvol1-370