Why is My App Table Scanning in PostgreSQL but not Oracle?

My team and I have been working on a lot of migrations off of Oracle and onto Google CloudSQL for Postgres / AlloyDB lately. One of the common things I have been seeing in my performance tuning and migration work is that Oracle handles certain application datatypes differently than PostgreSQL does. My most recent client used the “float” datatype extensively in their Java code, and unfortunately the PostgreSQL JDBC driver does not convert that datatype the way the Oracle JDBC driver does. The resulting datatype mismatch shows up as a full table scan in PostgreSQL, whereas Oracle was using an index.

*** Note: This client did not use an ORM within their code. While I still need to test it, I am hopeful that this same issue will not manifest itself when using an ORM like SQLAlchemy (Python) or Hibernate (Java).

While Oracle usually stores numbers in its NUMBER datatype, you have many options within PostgreSQL to do the same thing:

  • numeric (x)
  • numeric (x,y)
  • numeric
  • smallint
  • bigint
  • int

Each serves a purpose and should be chosen with careful analysis. That said, don’t forget about the code! To demonstrate why, I modified a simple Java test harness I usually use to test connectivity and check SSL encryption during migrations, so that it shows what happens if the datatypes are not looked at carefully.

As you can see in the output below, the Oracle JDBC driver converts each of the Java bind types (int, long, float, double) to NUMBER, and the same query plan, an index range scan with no table scan, is produced in every case. The output is below and the Oracle code is at the bottom of the post:

java -classpath /Users/shaneborden/Documents/java/ojdbc11.jar:/Users/shaneborden/Documents/java OracleJdbcTest
Password: *******
=====  Database info =====
   DatabaseProductName: Oracle
   DatabaseProductVersion: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0
   DatabaseMajorVersion: 19
   DatabaseMinorVersion: 0
=====  Driver info =====
   DriverName: Oracle JDBC driver
   DriverVersion: 21.8.0.0.0
   DriverMajorVersion: 21
   DriverMinorVersion: 8
=====  JDBC/DB attributes =====
   Supports getGeneratedKeys(): true
===== Database info =====


===== Query Plan - Cast Int to Numeric =====
   Plan hash value: 423740054

   -----------------------------------------------------------------------------------------------------------
   | Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
   -----------------------------------------------------------------------------------------------------------
   |   0 | SELECT STATEMENT                    |                     |     2 |    28 |     5   (0)| 00:00:01 |
   |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATATYPE_TEST       |     2 |    28 |     5   (0)| 00:00:01 |
   |*  2 |   INDEX RANGE SCAN                  | DATATYPE_NUMBER_VAL |     2 |       |     3   (0)| 00:00:01 |
   -----------------------------------------------------------------------------------------------------------

   Predicate Information (identified by operation id):
   ---------------------------------------------------

      2 - access("NUMBER_VAL"=:1)


===== Query Plan - Cast Long to Numeric =====
   Plan hash value: 423740054

   -----------------------------------------------------------------------------------------------------------
   | Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
   -----------------------------------------------------------------------------------------------------------
   |   0 | SELECT STATEMENT                    |                     |     2 |    28 |     5   (0)| 00:00:01 |
   |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATATYPE_TEST       |     2 |    28 |     5   (0)| 00:00:01 |
   |*  2 |   INDEX RANGE SCAN                  | DATATYPE_NUMBER_VAL |     2 |       |     3   (0)| 00:00:01 |
   -----------------------------------------------------------------------------------------------------------

   Predicate Information (identified by operation id):
   ---------------------------------------------------

      2 - access("NUMBER_VAL"=:1)


===== Query Plan - Cast Float to Numeric =====
   Plan hash value: 423740054

   -----------------------------------------------------------------------------------------------------------
   | Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
   -----------------------------------------------------------------------------------------------------------
   |   0 | SELECT STATEMENT                    |                     |     2 |    28 |     5   (0)| 00:00:01 |
   |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATATYPE_TEST       |     2 |    28 |     5   (0)| 00:00:01 |
   |*  2 |   INDEX RANGE SCAN                  | DATATYPE_NUMBER_VAL |     2 |       |     3   (0)| 00:00:01 |
   -----------------------------------------------------------------------------------------------------------

   Predicate Information (identified by operation id):
   ---------------------------------------------------

      2 - access("NUMBER_VAL"=:1)


===== Query Plan - Cast Double to Numeric =====
   Plan hash value: 423740054

   -----------------------------------------------------------------------------------------------------------
   | Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
   -----------------------------------------------------------------------------------------------------------
   |   0 | SELECT STATEMENT                    |                     |     2 |    28 |     5   (0)| 00:00:01 |
   |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATATYPE_TEST       |     2 |    28 |     5   (0)| 00:00:01 |
   |*  2 |   INDEX RANGE SCAN                  | DATATYPE_NUMBER_VAL |     2 |       |     3   (0)| 00:00:01 |
   -----------------------------------------------------------------------------------------------------------

   Predicate Information (identified by operation id):
   ---------------------------------------------------

      2 - access("NUMBER_VAL"=:1)


=========================
Command successfully executed

In PostgreSQL, however, the same statement gets an index scan when converting “int” and “long” to “numeric”, but casting the “float” and “double” datatypes to numeric results in a table scan. Similar behavior is seen with the other PostgreSQL column types such as “smallint”, “bigint” and “int”. The output is shown below and the PostgreSQL code is located at the bottom of the post:

java -classpath /Users/shaneborden/Documents/java/postgresql-42.5.0.jar:/Users/shaneborden/Documents/java PostgresJdbcTest
Password: *******
=====  Database info =====
   DatabaseProductName: PostgreSQL
   DatabaseProductVersion: 14.4
   DatabaseMajorVersion: 14
   DatabaseMinorVersion: 4
=====  Driver info =====
   DriverName: PostgreSQL JDBC Driver
   DriverVersion: 42.5.0
   DriverMajorVersion: 42
   DriverMinorVersion: 5
=====  JDBC/DB attributes =====
   Supports getGeneratedKeys(): true
===== Database info =====
   Current Date from Postgres : 2022-12-27 16:25:41.493047-05
   Client connected pid from Postgres : 17727
   Postgres DB Unique Name from Postgres : mytpchdb
   Client connected hostname from Postgres : null
   Client connected application_name from Postgres : PostgreSQL JDBC Driver


===== Query Plan - Cast Int to Numeric =====
   Index Scan using datatype_test_numeric_decimal on datatype_test  (cost=0.43..6.47 rows=2 width=28) (actual time=0.057..0.060 rows=2 loops=1)
     Index Cond: (numeric_val = '10001'::numeric)
   Planning Time: 0.389 ms
   Execution Time: 0.085 ms


===== Query Plan - Cast Long to Numeric =====
   Index Scan using datatype_test_numeric_decimal on datatype_test  (cost=0.43..6.47 rows=2 width=28) (actual time=0.011..0.013 rows=2 loops=1)
     Index Cond: (numeric_val = '10001'::numeric)
   Planning Time: 0.126 ms
   Execution Time: 0.027 ms


===== Query Plan - Cast Float to Numeric =====
   Seq Scan on datatype_test  (cost=0.00..233334.01 rows=50000 width=28) (actual time=1050.733..2622.224 rows=2 loops=1)
     Filter: ((numeric_val)::double precision = '10001'::real)
     Rows Removed by Filter: 9999999
   Planning Time: 0.094 ms
   Execution Time: 2622.273 ms


===== Query Plan - Cast Double to Numeric =====
   Seq Scan on datatype_test  (cost=0.00..233334.01 rows=50000 width=28) (actual time=1055.081..2629.634 rows=2 loops=1)
     Filter: ((numeric_val)::double precision = '10001'::double precision)
     Rows Removed by Filter: 9999999
   Planning Time: 0.096 ms
   Execution Time: 2629.660 ms

As you can see, it’s very important to ensure that the datatypes within your code are fully compatible with the destination RDBMS. The “double” and “float” types within the Java code cause a table scan! While Oracle has become very forgiving of this over the years, PostgreSQL just isn’t there yet, so make sure you adjust your code accordingly!
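
One straightforward way to guard against this, if you can modify the SQL text, is to cast the bind parameter explicitly rather than letting the driver’s inferred type win. Below is a minimal sketch against the test table used later in this post (the “?::numeric” shorthand and the CAST syntax are both standard PostgreSQL):

select * from datatype_test where numeric_val = CAST(? AS numeric);
-- or, using the PostgreSQL shorthand:
select * from datatype_test where numeric_val = ?::numeric;

With the parameter cast to numeric, the indexed column is no longer rewritten as double precision, so float and double binds get the same index scan that int and long binds do. The other option is to bind a java.math.BigDecimal via PreparedStatement.setBigDecimal() on the Java side instead of a float or double.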

Code Samples:

Jar / Java Requirements:

  • openjdk 11
  • postgresql-42.5.0.jar
  • ojdbc11.jar

Compile:

To compile the code:
Oracle: javac OracleJdbcTest.java
Postgres: javac PostgresJdbcTest.java

Oracle Java Test Harness:

Note: To execute the code follow the instructions in the comment block at the top of the code. The instructions to create the sample table objects are also contained within the same comment block.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;
import java.sql.*;

/*
 *
 * Simple Java Program to connect Oracle database by using Oracle JDBC thin driver
 * Make sure you have Oracle JDBC thin driver in your classpath before running this program
 * @author
 javac OracleJdbcTest.java
 java -classpath /Users/shaneborden/Documents/java/ojdbc11.jar:/Users/shaneborden/Documents/java OracleJdbcTest
 Setup:
 CREATE TABLE tc.datatype_test (
		number_decimal_val number(12,2), 
		number_val number(12),
    random_val number(4))
  TABLESPACE USERS;
 CREATE SEQUENCE tc.datatype_test_seq
  START WITH     1
  INCREMENT BY   1
  NOCACHE
  NOCYCLE;
 BEGIN
  FOR i in 1 .. 1000000
 LOOP
 INSERT INTO tc.datatype_test VALUES ( 
		tc.datatype_test_seq.nextval,
		floor(dbms_random.value(1, 1000000)),
    floor(dbms_random.value(1, 1000)));
 END LOOP;
 END;
 /

CREATE INDEX tc.datatype_number_decimal_val on tc.datatype_test(number_decimal_val);
CREATE INDEX tc.datatype_number_val on tc.datatype_test(number_val);
*/

public class OracleJdbcTest
{
  public static void main(String args[]) throws SQLException, ClassNotFoundException
  {
    try
    {
      java.io.Console console = System.console();
      Boolean dataTypeCheck = true;
      String sourceDatatType = "Numeric";
      String inputPassword = new String(console.readPassword("Password: "));
      Integer intQueryParam = 10001;
      Long longQueryParam = 10001L;
      Float floatQueryParam = 10001f;
      Double doubleQueryParam = 10001.0;

      /**Set URL of Oracle database server*/
      String url = "jdbc:oracle:thin:@//192.168.1.105:1521/vboxncdb.localdomain.com";
      String xPlanSql = "select * from table(dbms_xplan.display)";
       
      /** properties for creating connection to Oracle database */
      Properties props = new Properties();
      props.setProperty("user", "datatypeTestUser");
      props.setProperty("password",  inputPassword);
       
      /** creating connection to Oracle database using JDBC*/
       
      Connection conn = DriverManager.getConnection(url,props);
      DatabaseMetaData dbmd = conn.getMetaData();
       
      System.out.println("=====  Database info =====");
      System.out.println("   DatabaseProductName: " + dbmd.getDatabaseProductName() );
      System.out.println("   DatabaseProductVersion: " + dbmd.getDatabaseProductVersion() );
      System.out.println("   DatabaseMajorVersion: " + dbmd.getDatabaseMajorVersion() );
      System.out.println("   DatabaseMinorVersion: " + dbmd.getDatabaseMinorVersion() );
      System.out.println("=====  Driver info =====");
      System.out.println("   DriverName: " + dbmd.getDriverName() );
      System.out.println("   DriverVersion: " + dbmd.getDriverVersion() );
      System.out.println("   DriverMajorVersion: " + dbmd.getDriverMajorVersion() );
      System.out.println("   DriverMinorVersion: " + dbmd.getDriverMinorVersion() );
      System.out.println("=====  JDBC/DB attributes =====");
      if (dbmd.supportsGetGeneratedKeys() )
        System.out.println("   Supports getGeneratedKeys(): true");
      else
        System.out.println("   Supports getGeneratedKeys(): false");
      System.out.println("===== Database info =====");
       
      String sql = "with session_data as (";
            sql = sql + "select sysdate as current_day,SYS_CONTEXT ('USERENV', 'DB_UNIQUE_NAME') as db_name,SYS_CONTEXT ('USERENV', 'SERVICE_NAME') as service_name, ";
            sql = sql + "SYS_CONTEXT ('USERENV', 'HOST') as host, SYS_CONTEXT ('USERENV', 'IP_ADDRESS') as ip_address,  SYS_CONTEXT('USERENV','SID') sid from dual) ";
            sql = sql + "select sd.current_day, sd.db_name, sd.service_name, sd.host, sd.ip_address, ";
            sql = sql + "sd.sid, nvl(sci.network_service_banner, 'Traffic Not Encrypted') network_service_banner ";
            sql = sql + "from session_data sd ";
            sql = sql + "left join v$session_connect_info sci on (sd.sid = sci.sid) ";
            sql = sql + "where sci.network_service_banner like '%Crypto-checksumming service adapter%'";
       
      /** creating PreparedStatement object to execute query*/
      PreparedStatement preStatement = conn.prepareStatement(sql);
       
      ResultSet result = preStatement.executeQuery();
       
      while(result.next())
      {
        System.out.println("Current Date from Oracle : " +         result.getString("current_day"));
        System.out.println("Oracle DB Unique Name from Oracle : " +         result.getString("db_name"));
        System.out.println("Oracle Connected Listener Service Name from Oracle : " +         result.getString("service_name"));
        System.out.println("Client connected hostname from Oracle : " +         result.getString("host"));
        System.out.println("Client connected ip_address from Oracle : " +         result.getString("ip_address"));
        System.out.println("Client connected encryption info from Oracle : " +         result.getString("network_service_banner"));
      }


      if (dataTypeCheck) {
        if (sourceDatatType.equals("Numeric122")) {
          sql = "EXPLAIN PLAN FOR ";
          sql = sql + "select * from tc.datatype_test where number_decimal_val = ?";
        } else if (sourceDatatType.equals("Numeric")) {
          sql = "EXPLAIN PLAN FOR ";
          sql = sql + "select * from tc.datatype_test where number_val = ?";
        }
      }

        System.out.println("");
        System.out.println("");
        System.out.println("===== Query Plan - Cast Int to "+ sourceDatatType +" =====");

        /** creating PreparedStatement object to execute query*/
        preStatement = conn.prepareStatement(sql);

        preStatement.setInt(1, intQueryParam);
          
        result = preStatement.executeQuery();

        PreparedStatement xPlanStatement = conn.prepareStatement(xPlanSql);
        ResultSet xPlanResult = xPlanStatement.executeQuery();

        while(xPlanResult.next())
        {
            System.out.println("   " + xPlanResult.getString(1));
        }

        System.out.println("");
        System.out.println("");
        System.out.println("===== Query Plan - Cast Long to "+ sourceDatatType +" =====");

        preStatement.setLong(1, longQueryParam);
        
        result = preStatement.executeQuery();

        xPlanStatement = conn.prepareStatement(xPlanSql);
        xPlanResult = xPlanStatement.executeQuery();

        while(xPlanResult.next())
        {
            System.out.println("   " + xPlanResult.getString(1));
        }

        System.out.println("");
        System.out.println("");
        System.out.println("===== Query Plan - Cast Float to "+ sourceDatatType +" =====");

        preStatement.setFloat(1, floatQueryParam);
        
        result = preStatement.executeQuery();

        xPlanStatement = conn.prepareStatement(xPlanSql);
        xPlanResult = xPlanStatement.executeQuery();

        while(xPlanResult.next())
        {
            System.out.println("   " + xPlanResult.getString(1));
        }

        System.out.println("");
        System.out.println("");
        System.out.println("===== Query Plan - Cast Double to "+ sourceDatatType +" =====");

        preStatement.setDouble(1, doubleQueryParam);
        
        result = preStatement.executeQuery();

        xPlanStatement = conn.prepareStatement(xPlanSql);
        xPlanResult = xPlanStatement.executeQuery();

        while(xPlanResult.next())
        {
            System.out.println("   " + xPlanResult.getString(1));
        }

        System.out.println("");
        System.out.println("");

      conn.close();
      System.out.println("=========================");
      System.out.println("Command successfully executed");

    }

    catch(SQLException exp) {
	   System.out.println("Exception: " + exp.getMessage());
	   System.out.println("SQL State: " + exp.getSQLState());
	   System.out.println("Vendor Error: " + exp.getErrorCode());
    }
    
  }
}

PostgreSQL Java Test Harness:

Note: To execute the code follow the instructions in the comment block at the top of the code. The instructions to create the sample table objects are also contained within the same comment block.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;
import java.sql.*;

/*
 *
 * Simple Java Program to connect Postgres database by using Postgres JDBC thin driver
 * Make sure you have Postgres JDBC thin driver in your classpath before running this program
 * @author
 java -classpath /Users/shaneborden/Documents/java/postgresql-42.5.0.jar:/Users/shaneborden/Documents/java PostgresJdbcTest
 Setup:
 CREATE TABLE datatype_test (
		int_val int, 
		bigint_val bigint, 
		numeric_val numeric(12),
    numeric_decimal_val numeric(12,2), 
		smallint_val smallint);
INSERT INTO datatype_test VALUES ( 
		generate_series(0,10000000), 
		generate_series(0,10000000), 
		floor(random()*10000000),
    random()*10000000, 
		floor(random()* (32765-1 + 1) + 1) );

SET SESSION max_parallel_maintenance_workers TO 4;
SET SESSION maintenance_work_mem TO '2GB';

CREATE INDEX datatype_test_int on datatype_test(int_val);
CREATE INDEX datatype_test_bigint on datatype_test(bigint_val);
CREATE INDEX datatype_test_numeric on datatype_test(numeric_val);
CREATE INDEX datatype_test_numeric_decimal on datatype_test(numeric_val);
CREATE INDEX datatype_test_smallint on datatype_test(smallint_val);
*/

public class PostgresJdbcTest
{
  public static void main(String args[]) throws SQLException, ClassNotFoundException
  {
    try
    {
       java.io.Console console = System.console();
       Boolean dataTypeCheck = true;
       String sourceDatatType = "Numeric";
       String inputPassword = new String(console.readPassword("Password: "));
       Integer intQueryParam = 10001;
       Long longQueryParam = 10001L;
       Float floatQueryParam = 10001f;
       Double doubleQueryParam = 10001.0;

       /**Set URL of Postgres database server*/
        String url = "jdbc:postgresql://localhost:6000/mytpchdb";
       
       /** properties for creating connection to Postgres database */
       Properties props = new Properties();
       props.setProperty("user", "postgres");
       props.setProperty("password",  inputPassword);
       
       /** creating connection to Postgres database using JDBC*/
       
       Connection conn = DriverManager.getConnection(url,props);
       DatabaseMetaData dbmd = conn.getMetaData();
       
       System.out.println("=====  Database info =====");
       System.out.println("   DatabaseProductName: " + dbmd.getDatabaseProductName() );
       System.out.println("   DatabaseProductVersion: " + dbmd.getDatabaseProductVersion() );
       System.out.println("   DatabaseMajorVersion: " + dbmd.getDatabaseMajorVersion() );
       System.out.println("   DatabaseMinorVersion: " + dbmd.getDatabaseMinorVersion() );
       System.out.println("=====  Driver info =====");
       System.out.println("   DriverName: " + dbmd.getDriverName() );
       System.out.println("   DriverVersion: " + dbmd.getDriverVersion() );
       System.out.println("   DriverMajorVersion: " + dbmd.getDriverMajorVersion() );
       System.out.println("   DriverMinorVersion: " + dbmd.getDriverMinorVersion() );
       System.out.println("=====  JDBC/DB attributes =====");
       if (dbmd.supportsGetGeneratedKeys() )
         System.out.println("   Supports getGeneratedKeys(): true");
       else
         System.out.println("   Supports getGeneratedKeys(): false");
       System.out.println("===== Database info =====");
       
       String sql = "select now() as current_day,current_database() as db_name, ";
              sql = sql + "client_hostname as host, application_name, pid from pg_stat_activity ";
              sql = sql + " where pid = pg_backend_pid() ";
       
       /** creating PreparedStatement object to execute query*/
       PreparedStatement preStatement = conn.prepareStatement(sql);
       
       ResultSet result = preStatement.executeQuery();
       
       while(result.next())
       {
           System.out.println("   Current Date from Postgres : " +         result.getString("current_day"));
           System.out.println("   Client connected pid from Postgres : " +         result.getString("pid"));
           System.out.println("   Postgres DB Unique Name from Postgres : " +         result.getString("db_name"));
           System.out.println("   Client connected hostname from Postgres : " +         result.getString("host"));
           System.out.println("   Client connected application_name from Postgres : " +         result.getString("application_name"));
       }

       if (dataTypeCheck) {
          if (sourceDatatType.equals("Int")) {
            sql = "EXPLAIN (ANALYZE, COSTS) ";
            sql = sql + "select * from datatype_test where int_val = ?";
          } else if (sourceDatatType.equals("Bigint")) {
            sql = "EXPLAIN (ANALYZE, COSTS) ";
            sql = sql + "select * from datatype_test where bigint_val = ?";
          } else if (sourceDatatType.equals("Numeric")) {
            sql = "EXPLAIN (ANALYZE, COSTS) ";
            sql = sql + "select * from datatype_test where numeric_val = ?";
          } else if (sourceDatatType.equals("Numeric122")) {
            sql = "EXPLAIN (ANALYZE, COSTS) ";
            sql = sql + "select * from datatype_test where numeric_decimal_val = ?";
          } else if (sourceDatatType.equals("Smallint")) {
            sql = "EXPLAIN (ANALYZE, COSTS) ";
            sql = sql + "select * from datatype_test where smallint_val = ?";
          }
       }

          Statement stmt = conn.createStatement();
          stmt.execute("SET max_parallel_workers_per_gather = 0");

          System.out.println("");
          System.out.println("");
          System.out.println("===== Query Plan - Cast Int to "+ sourceDatatType +" =====");

          /** creating PreparedStatement object to execute query*/
          preStatement = conn.prepareStatement(sql);

          preStatement.setInt(1, intQueryParam);
          
          result = preStatement.executeQuery();
          
          while(result.next())
          {
              System.out.println("   " + result.getString(1));
          }

          System.out.println("");
          System.out.println("");
          System.out.println("===== Query Plan - Cast Long to "+ sourceDatatType +" =====");

          preStatement.setLong(1, longQueryParam);
          
          result = preStatement.executeQuery();
          
          while(result.next())
          {
            System.out.println("   " + result.getString(1));
          }

          System.out.println("");
          System.out.println("");
          System.out.println("===== Query Plan - Cast Float to "+ sourceDatatType +" =====");

          preStatement.setFloat(1, floatQueryParam);
          
          result = preStatement.executeQuery();
          
          while(result.next())
          {
            System.out.println("   " + result.getString(1));
          }

          System.out.println("");
          System.out.println("");
          System.out.println("===== Query Plan - Cast Double to "+ sourceDatatType +" =====");

          preStatement.setDouble(1, doubleQueryParam);
          
          result = preStatement.executeQuery();
          
          while(result.next())
          {
            System.out.println("   " + result.getString(1));
          }

          System.out.println("");
          System.out.println("");

      conn.close();
      System.out.println("=========================");
      System.out.println("Command successfully executed");

    }

    catch(SQLException exp) {
	   System.out.println("Exception: " + exp.getMessage());
	   System.out.println("SQL State: " + exp.getSQLState());
	   System.out.println("Vendor Error: " + exp.getErrorCode());
    }
    
  }
}

Tuning the PostgreSQL “random_page_cost” Parameter

In my previous post “Three Configuration Parameters for PostgreSQL That Are Worth Further Investigation!“, I introduced three PostgreSQL parameters that I feel are the most “neglected” and present the most opportunity for performance tuning in a PostgreSQL instance. I’ve previously posted parts 1 and 2 which cover “work_mem” and “effective_io_concurrency“, so in the final part of this series, I would like to demonstrate tuning the “random_page_cost” parameter.

Because PostgreSQL can be installed on many different types of systems, the default for this parameter assumes a system that is likely the least performant: low CPU and a disk subsystem that is less than ideal. This setting can also be overridden at the tablespace level, however that may become a management nightmare, so I would recommend against it. A good explanation of the parameter exists here, and for most CloudSQL instances it should likely be set lower than the default, because random page reads are expected to be less expensive on the types of I/O subsystems present in today’s cloud environments.
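
For reference, the tablespace-level override I just advised against looks like the following (a sketch only; “fast_ssd” is a hypothetical tablespace name):

ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);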

For those of you that come from an Oracle background, this parameter is very much like the “OPTIMIZER_INDEX_COST_ADJ” parameter that we used to manipulate in older Oracle versions. To refresh your memory on that parameter, you can see the 19c explanation here.

As a simple example of how the query plan can change for a simple SQL statement, I will first show the query plan with the default setting of 4. While it is using an index, the access path could be better:

set max_parallel_workers_per_gather = 0;
set session random_page_cost to 4;

explain (analyze, verbose, costs, settings, buffers, wal, timing, summary, format text)
SELECT c.c_name, c.c_acctbal, sum(o.o_totalprice)
FROM orders o
JOIN customer c ON (c.c_custkey = o.o_custkey)
WHERE c.c_custkey = 30003 and o.o_orderstatus = 'O'
GROUP BY c.c_name, c.c_acctbal;

                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=76.28..76.37 rows=1 width=57) (actual time=0.034..0.035 rows=0 loops=1)
   Output: c.c_name, c.c_acctbal, sum(o.o_totalprice)
   Group Key: c.c_name, c.c_acctbal
   Buffers: shared hit=7
   ->  Sort  (cost=76.28..76.30 rows=8 width=33) (actual time=0.033..0.034 rows=0 loops=1)
         Output: c.c_name, c.c_acctbal, o.o_totalprice
         Sort Key: c.c_name, c.c_acctbal
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=7
         ->  Nested Loop  (cost=4.97..76.16 rows=8 width=33) (actual time=0.027..0.028 rows=0 loops=1)
               Output: c.c_name, c.c_acctbal, o.o_totalprice
               Buffers: shared hit=7
               ->  Index Scan using customer_pk on public.customer c  (cost=0.42..8.44 rows=1 width=31) (actual time=0.014..0.015 rows=1 loops=1)
                     Output: c.c_custkey, c.c_mktsegment, c.c_nationkey, c.c_name, c.c_address, c.c_phone, c.c_acctbal, c.c_comment
                     Index Cond: (c.c_custkey = '30003'::numeric)
                     Buffers: shared hit=4
               ->  Bitmap Heap Scan on public.orders o  (cost=4.55..67.64 rows=8 width=14) (actual time=0.009..0.009 rows=0 loops=1)
                     Output: o.o_orderdate, o.o_orderkey, o.o_custkey, o.o_orderpriority, o.o_shippriority, o.o_clerk, o.o_orderstatus, o.o_totalprice, o.o_comment
                     Recheck Cond: (o.o_custkey = '30003'::numeric)
                     Filter: (o.o_orderstatus = 'O'::bpchar)
                     Buffers: shared hit=3
                     ->  Bitmap Index Scan on order_customer_fkidx  (cost=0.00..4.55 rows=16 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                           Index Cond: (o.o_custkey = '30003'::numeric)
                           Buffers: shared hit=3
 Settings: effective_cache_size = '3053008kB', effective_io_concurrency = '10', max_parallel_workers_per_gather = '0', work_mem = '512MB'
 Query Identifier: 7272380376793434809
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.234 ms
 Execution Time: 0.076 ms

And now with a change to a setting of 2, we get a different access path:

set max_parallel_workers_per_gather = 0;
set session random_page_cost to 2;

explain (analyze, verbose, costs, settings, buffers, wal, timing, summary, format text)
SELECT c.c_name, c.c_acctbal, sum(o.o_totalprice)
FROM orders o
JOIN customer c ON (c.c_custkey = o.o_custkey)
WHERE c.c_custkey = 30003 and o.o_orderstatus = 'O'
GROUP BY c.c_name, c.c_acctbal;

                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=39.38..39.48 rows=1 width=57) (actual time=0.027..0.028 rows=0 loops=1)
   Output: c.c_name, c.c_acctbal, sum(o.o_totalprice)
   Group Key: c.c_name, c.c_acctbal
   Buffers: shared hit=7
   ->  Sort  (cost=39.38..39.40 rows=8 width=33) (actual time=0.026..0.027 rows=0 loops=1)
         Output: c.c_name, c.c_acctbal, o.o_totalprice
         Sort Key: c.c_name, c.c_acctbal
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=7
         ->  Nested Loop  (cost=0.85..39.26 rows=8 width=33) (actual time=0.021..0.022 rows=0 loops=1)
               Output: c.c_name, c.c_acctbal, o.o_totalprice
               Buffers: shared hit=7
               ->  Index Scan using customer_pk on public.customer c  (cost=0.42..4.44 rows=1 width=31) (actual time=0.012..0.012 rows=1 loops=1)
                     Output: c.c_custkey, c.c_mktsegment, c.c_nationkey, c.c_name, c.c_address, c.c_phone, c.c_acctbal, c.c_comment
                     Index Cond: (c.c_custkey = '30003'::numeric)
                     Buffers: shared hit=4
               ->  Index Scan using order_customer_fkidx on public.orders o  (cost=0.43..34.75 rows=8 width=14) (actual time=0.008..0.008 rows=0 loops=1)
                     Output: o.o_orderdate, o.o_orderkey, o.o_custkey, o.o_orderpriority, o.o_shippriority, o.o_clerk, o.o_orderstatus, o.o_totalprice, o.o_comment
                     Index Cond: (o.o_custkey = '30003'::numeric)
                     Filter: (o.o_orderstatus = 'O'::bpchar)
                     Buffers: shared hit=3
 Settings: effective_cache_size = '3053008kB', effective_io_concurrency = '10', max_parallel_workers_per_gather = '0', random_page_cost = '2', work_mem = '512MB'
 Query Identifier: 7272380376793434809
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.199 ms
 Execution Time: 0.064 ms

Now, for this simple example the execution time isn’t vastly different, because an index is being used in both cases. In cases where the parameter adjustment allows an index to be chosen over a sequential scan, you will really see the benefit.

Ultimately, there are other parameters that may benefit from adjustment, such as the “cpu_*” parameters, but those require much more testing and experimentation than “random_page_cost”, especially if your system runs on SSDs, as most Google CloudSQL for Postgres instances do, or on Google AlloyDB, where the I/O subsystem is built specifically for the implementation. If you use either of these platforms, I would strongly consider lowering this parameter from the default of 4 to at least 2, and perhaps even 1.1, depending on the shape you have chosen and the I/O limits served by that shape.
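
To make the change stick beyond a single session, the parameter can be applied at the database level or instance wide. A minimal sketch is below (“mytpchdb” is simply the test database used elsewhere on this blog; on managed platforms such as CloudSQL the instance-wide value is typically set through database flags rather than ALTER SYSTEM):

-- per database, takes effect for new sessions:
ALTER DATABASE mytpchdb SET random_page_cost = 2;

-- instance wide on a self-managed install, then reload the configuration:
ALTER SYSTEM SET random_page_cost = 2;
SELECT pg_reload_conf();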

Enjoy!

Tuning the PostgreSQL “effective_io_concurrency” Parameter

In my previous post “Three Configuration Parameters for PostgreSQL That Are Worth Further Investigation!“, I introduced three PostgreSQL parameters that I feel are the most “neglected” and present the most opportunity for performance tuning in a PostgreSQL instance. I covered “work_mem” in an earlier post, so in Part 2 of this series I would like to demonstrate tuning the “effective_io_concurrency” parameter. While this parameter has been discussed in other blogs, I will attempt to make this discussion relevant to how it might affect a CloudSQL instance.

The parameter “effective_io_concurrency” reflects the number of simultaneous requests that can be handled efficiently by the disk subsystem. One thing to keep in mind is that this parameter currently only affects bitmap heap scans where the data is not already present in shared buffers. In general, if you are using spinning HDD devices, it should be set to reflect the number of drives that participate in the RAID stripe. Where SSDs are used, you can set this value much higher, although you must take into account any quality-of-service I/O limits, which are usually present in a cloud implementation. A full explanation of the parameter can be found here.

To demonstrate how this parameter can affect queries, I set up a small Google CloudSQL for Postgres instance (2 vCPU x 8GB memory), loaded up some tables, and then executed a query that forced a bitmap heap scan, changing the “effective_io_concurrency” parameter between each test. In addition, the instance was bounced before each test to ensure that the shared buffers were cleared.
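
Each test iteration was driven roughly like this (a simplified sketch; the session-level SET is what shows up in the “Settings:” line of the plan output below):

-- value under test: 1, 2, 4, 8, 10, 20
SET effective_io_concurrency = 8;
SHOW effective_io_concurrency;
-- ...then run the EXPLAIN (ANALYZE, ...) query from the Execution section below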

Setup:

CREATE TABLE public.effective_io_concurrency_test (
		id int PRIMARY KEY, 
		value numeric,
		product_id int,
		effective_date timestamp(3)
		);
INSERT INTO public.effective_io_concurrency_test VALUES ( 
		generate_series(0,100000000), 
		random()*1000,
		random()*100,
		current_timestamp(3));

CREATE INDEX prod_id_idx ON public.effective_io_concurrency_test (product_id);
VACUUM ANALYZE public.effective_io_concurrency_test;

Execution:

The resulting query plans did not show any variation in execution path or cost, but the timings did vary across the tests.

EXPLAIN (analyze, verbose, costs, settings, buffers, wal, timing, summary, format text)
SELECT * FROM public.effective_io_concurrency_test
WHERE id BETWEEN 10000 AND 100000
OR product_id BETWEEN 100 AND 200;

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.effective_io_concurrency_test  (cost=7035.15..522009.95 rows=547023 width=27) (actual time=293.542..33257.631 rows=588784 loops=1)
   Output: id, value, product_id, effective_date
   Recheck Cond: (((effective_io_concurrency_test.id >= 10000) AND (effective_io_concurrency_test.id <= 100000)) OR ((effective_io_concurrency_test.product_id >= 100) AND (effective_io_concurrency_test.product_id <= 200)))
   ->  BitmapOr  (cost=7035.15..7035.15 rows=547450 width=0) (actual time=156.951..156.954 rows=0 loops=1)
         Buffers: shared hit=6 read=668
         I/O Timings: read=24.501
         ->  Bitmap Index Scan on effective_io_concurrency_test_pkey  (cost=0.00..1459.74 rows=94117 width=0) (actual time=14.908..14.908 rows=90001 loops=1)
               Index Cond: ((effective_io_concurrency_test.id >= 10000) AND (effective_io_concurrency_test.id <= 100000))
         ->  Bitmap Index Scan on prod_id_idx  (cost=0.00..5301.90 rows=453333 width=0) (actual time=142.040..142.040 rows=499255 loops=1)
               Index Cond: ((effective_io_concurrency_test.product_id >= 100) AND (effective_io_concurrency_test.product_id <= 200))
               Buffers: shared hit=3 read=421
               I/O Timings: read=14.154
 Settings: effective_cache_size = '3259448kB', effective_io_concurrency = '8', random_page_cost = '2', work_mem = '512MB'
 Query Identifier: -8974663893066369302
 Planning:
   Buffers: shared hit=103 read=17
   I/O Timings: read=26.880
 Planning Time: 28.350 ms
 Execution Time: 33322.389 ms

Summary:

effective_io_concurrency               Query Time
 1  /* CloudSQL default */             194708.918 ms
 2  /* equal to the number of CPUs */  107953.205 ms
 4  /* 2x number of CPUs */             58161.010 ms
 8  /* 4x number of CPUs */             33322.389 ms
10  /* 5x number of CPUs */             30118.593 ms
20  /* 10x number of CPUs */            28758.106 ms

As you can see, there are diminishing returns as the parameter is increased, but why? Upon looking at the Google Cloud Console “System Insights” page, the reason was clear.

**** One thing to note is that the CPU utilization spike is a result of the shutdown and restart of the instance between each test. The utilization following the spike represents the utilization observed during the test itself.

The Conclusion:

While CPU utilization didn’t hit any limit, the IOPS limit for this CloudSQL shape did. You can add IOPS by changing the shape, but the point here is that the optimal setting always depends on your workload and instance shape. In this case, and for this CloudSQL shape, you might actually want to choose a setting of “4”, which is 2x the number of CPUs and doesn’t quite max out the guaranteed IOPS. That setting doesn’t give you the fastest query time, but it does leave resources for other queries to execute at the same time.

As always, be sure to test any changes in your own system and balance accordingly, because your “mileage may vary” depending on your individual situation. That being said, in almost no case is the default setting acceptable unless you are running on HDDs or on an OS that lacks the “posix_fadvise” function (such as macOS or Solaris).

Enjoy!

Tuning the PostgreSQL “work_mem” Parameter

In my previous post “Three Configuration Parameters for PostgreSQL That Are Worth Further Investigation!“, I introduced three PostgreSQL parameters that I feel are the most “neglected” and present the most opportunity for performance tuning in a PostgreSQL instance. In the first of what will become a three part series, I would like to demonstrate tuning the “work_mem” parameter.

The “work_mem” parameter controls the memory available to database operations such as:

  • sorts
  • bitmap heap scans
  • hash joins
  • materialized common table expressions (WITH statements)

To get started, let’s create a test table with 100M rows of data:

CREATE TABLE public.work_mem_test (
		id int PRIMARY KEY, 
		value numeric,
		product_id int,
		effective_date timestamp(3)
		);
INSERT INTO public.work_mem_test VALUES ( 
		generate_series(0,100000000), 
		random()*1000,
		random()*100,
		current_timestamp(3));

CREATE INDEX prod_value_idx ON public.work_mem_test (product_id);
VACUUM ANALYZE public.work_mem_test;

We will then run EXPLAIN ANALYZE with the “COSTS, BUFFERS, VERBOSE” options so that we can fully see what is going on with the query. For demonstration purposes, I have set “work_mem” to the lowest possible setting of 64kB. In addition, so that we don’t get the variability of parallel processing, I have set “max_parallel_workers_per_gather” to zero to disable parallel query. Most systems will likely see better gains than this test case, as this was a very small 2 vCPU / 8GB Google CloudSQL for PostgreSQL instance:

set session work_mem to '64kB';
set max_parallel_workers_per_gather = 0;
set effective_io_concurrency = 20;
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT * FROM public.work_mem_test
WHERE id BETWEEN 10000 AND 100000
OR product_id BETWEEN 100 AND 200
ORDER BY value NULLS FIRST;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2640424.97..2641959.63 rows=613866 width=27) (actual time=16593.228..16778.132 rows=589379 loops=1)
   Output: id, value, product_id, effective_date
   Sort Key: work_mem_test.value NULLS FIRST
   Sort Method: external merge  Disk: 24248kB
   Buffers: shared hit=6 read=363702, temp read=15340 written=16486
   I/O Timings: read=4120.104
   ->  Bitmap Heap Scan on public.work_mem_test  (cost=7805.12..2539439.23 rows=613866 width=27) (actual time=82.454..15413.822 rows=589379 loops=1)
         Output: id, value, product_id, effective_date
         Recheck Cond: (((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000)) OR ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200)))
         Rows Removed by Index Recheck: 48506004
         Heap Blocks: exact=2058 lossy=360975
         Buffers: shared hit=6 read=363702
         I/O Timings: read=4120.104
         ->  BitmapOr  (cost=7805.12..7805.12 rows=614306 width=0) (actual time=80.340..80.342 rows=0 loops=1)
               Buffers: shared hit=6 read=669
               I/O Timings: read=17.683
               ->  Bitmap Index Scan on work_mem_test_pkey  (cost=0.00..1280.95 rows=82638 width=0) (actual time=12.680..12.680 rows=90001 loops=1)
                     Index Cond: ((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000))
                     Buffers: shared hit=3 read=247
                     I/O Timings: read=7.831
               ->  Bitmap Index Scan on prod_value_idx  (cost=0.00..6217.24 rows=531667 width=0) (actual time=67.657..67.657 rows=499851 loops=1)
                     Index Cond: ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200))
                     Buffers: shared hit=3 read=422
                     I/O Timings: read=9.852
 Query Identifier: 731698457235411789
 Planning:
   Buffers: shared hit=39 read=2
   I/O Timings: read=2.588
 Planning Time: 3.136 ms
 Execution Time: 16811.970 ms
(30 rows)

Time: 16903.784 ms (00:16.904)

EXPLAIN ANALYZE, with the BUFFERS option, gives us the following data points:

Rows Removed by Index Recheck: 48506004
Heap Blocks: exact=2058 lossy=360975


Execution Time: 16811.970 ms

This essentially means that 64kB of work_mem can hold 2058 exact heap blocks in the bitmap structure. Everything that falls outside of that bitmap becomes lossy blocks, meaning they don’t point to an exact tuple but rather to a block containing many tuples. The recheck condition then re-checks each such block for the tuples the query is looking for.

The following formula is a starting point; it may or may not give you the exact setting needed, depending on various factors. Since we used the lowest possible work_mem, the new setting becomes a multiple of that:

new_mem_in_mbytes = 
 ((exact heap blocks + lossy heap blocks) / exact heap blocks) * work_mem_in_bytes / 1048576
= ceil(round(((2058 + 360975) / 2058) * 65536 / 1048576,1))
= 11MB

Note: In most cases I have found that this formula works well on the first pass. However, as you will see in the subsequent tests, this estimated work_mem setting wasn’t quite the actual amount needed, which is likely due to a mis-estimate by the planner.

Reducing the Lossy Block Access

So for the next test I will increase the “work_mem” to 11MB and re-execute the test.

set session work_mem to '11MB';
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT * FROM public.work_mem_test
WHERE id BETWEEN 10000 AND 100000
OR product_id BETWEEN 100 AND 200
ORDER BY value NULLS FIRST;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2160340.77..2161875.43 rows=613866 width=27) (actual time=11382.002..11574.572 rows=589379 loops=1)
   Output: id, value, product_id, effective_date
   Sort Key: work_mem_test.value NULLS FIRST
   Sort Method: external merge  Disk: 24232kB
   Buffers: shared hit=23329 read=340379, temp read=3029 written=3034
   I/O Timings: read=3618.302
   ->  Bitmap Heap Scan on public.work_mem_test  (cost=7805.12..2090832.53 rows=613866 width=27) (actual time=185.251..10923.764 rows=589379 loops=1)
         Output: id, value, product_id, effective_date
         Recheck Cond: (((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000)) OR ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200)))
         Heap Blocks: exact=164090 lossy=198943
         ->  BitmapOr  (cost=7805.12..7805.12 rows=614306 width=0) (actual time=132.954..132.957 rows=0 loops=1)
               Buffers: shared hit=675
               ->  Bitmap Index Scan on work_mem_test_pkey  (cost=0.00..1280.95 rows=82638 width=0) (actual time=4.449..4.450 rows=90001 loops=1)
                     Index Cond: ((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000))
               ->  Bitmap Index Scan on prod_value_idx  (cost=0.00..6217.24 rows=531667 width=0) (actual time=128.503..128.503 rows=499851 loops=1)
                     Index Cond: ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200))
                     Buffers: shared hit=425
 Query Identifier: 731698457235411789
 Planning:
   Buffers: shared hit=30
 Planning Time: 0.179 ms
 Execution Time: 11611.071 ms
(26 rows)

Time: 11695.952 ms (00:11.696)

With 11MB we got more exact heap blocks, but still not enough memory to avoid lossy blocks entirely. Applying the formula again, based on the execution plan of this query….

new_mem_in_mbytes = 
 ((exact heap blocks + lossy heap blocks) / exact heap blocks) * work_mem_in_bytes / 1048576
= ceil(round(((164090 + 198943) / 164090) * 11534336 / 1048576,1))
= 24MB


Let’s increase just a little bit more to 24MB as the latest iteration of the formula has suggested.

set session work_mem to '24MB';
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT * FROM public.work_mem_test
WHERE id BETWEEN 10000 AND 100000
OR product_id BETWEEN 100 AND 200
ORDER BY value NULLS FIRST;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1709385.33..1710919.99 rows=613866 width=27) (actual time=3651.589..3791.250 rows=589379 loops=1)
   Output: id, value, product_id, effective_date
   Sort Key: work_mem_test.value NULLS FIRST
   Sort Method: external merge  Disk: 24232kB
   Buffers: shared hit=23329 read=340379, temp read=3029 written=3031
   I/O Timings: read=1493.162
   ->  Bitmap Heap Scan on public.work_mem_test  (cost=7805.12..1639877.09 rows=613866 width=27) (actual time=348.261..3201.421 rows=589379 loops=1)
         Output: id, value, product_id, effective_date
         Recheck Cond: (((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000)) OR ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200)))
         ->  BitmapOr  (cost=7805.12..7805.12 rows=614306 width=0) (actual time=188.309..188.311 rows=0 loops=1)
               Buffers: shared hit=675
               ->  Bitmap Index Scan on work_mem_test_pkey  (cost=0.00..1280.95 rows=82638 width=0) (actual time=5.090..5.091 rows=90001 loops=1)
                     Index Cond: ((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000))
               ->  Bitmap Index Scan on prod_value_idx  (cost=0.00..6217.24 rows=531667 width=0) (actual time=183.215..183.215 rows=499851 loops=1)
                     Index Cond: ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200))
                     Buffers: shared hit=425
 Query Identifier: 731698457235411789
 Planning:
   Buffers: shared hit=30
 Planning Time: 0.242 ms
 Execution Time: 3828.215 ms
(25 rows)

Time: 3912.670 ms (00:03.913)

No more lossy block scans! Time has also reduced quite significantly from the first execution.

Handling the Sort Method

Now, we need to pay attention to the explain plan line:

 "Bitmap Heap Scan on public.work_mem_test….rows=613866 width=27" 
 "Sort Method: external merge Disk: 24232kB."

Some of the sort is done in memory and some is spilled to disk. To fit the entire rowset in memory, we multiply the input rows (613866) by the row width (27 bytes), which comes to roughly 16MB. In addition, the planner spilled another 24MB to disk, so let’s add that to “work_mem” as well.

16MB for the rowset + the 24MB being spilled = 40MB more "work_mem"

So with the current “work_mem” of 24MB plus the additional 40MB computed to eliminate the sort spill (rounded up), the total needed is 64MB.
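
As a quick sanity check of that arithmetic, PostgreSQL itself can do the math (purely illustrative):

SELECT pg_size_pretty((613866 * 27)::bigint)              AS sort_input_size,    -- rows x width, roughly 16MB
       pg_size_pretty(((24 + 16 + 24) * 1048576)::bigint) AS suggested_work_mem; -- current 24MB + rowset + 24MB spill = 64MB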

Let’s run one more test:

set session work_mem to '64MB';
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT * FROM public.work_mem_test
WHERE id BETWEEN 10000 AND 100000
OR product_id BETWEEN 100 AND 200
ORDER BY value NULLS FIRST;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=613087.41..614622.07 rows=613866 width=27) (actual time=3186.918..3309.896 rows=589379 loops=1)
   Output: id, value, product_id, effective_date
   Sort Key: work_mem_test.value NULLS FIRST
   Sort Method: quicksort  Memory: 61169kB
   Buffers: shared hit=6 read=363702
   I/O Timings: read=1306.892
   ->  Bitmap Heap Scan on public.work_mem_test  (cost=7805.12..554071.67 rows=613866 width=27) (actual time=245.348..2908.344 rows=589379 loops=1)
         Output: id, value, product_id, effective_date
         Recheck Cond: (((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000)) OR ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200)))
         ->  BitmapOr  (cost=7805.12..7805.12 rows=614306 width=0) (actual time=115.051..115.053 rows=0 loops=1)
               Buffers: shared hit=6 read=669
               I/O Timings: read=3.561
               ->  Bitmap Index Scan on work_mem_test_pkey  (cost=0.00..1280.95 rows=82638 width=0) (actual time=6.160..6.161 rows=90001 loops=1)
                     Index Cond: ((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000))
               ->  Bitmap Index Scan on prod_value_idx  (cost=0.00..6217.24 rows=531667 width=0) (actual time=108.889..108.889 rows=499851 loops=1)
                     Index Cond: ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200))
                     Buffers: shared hit=3 read=422
                     I/O Timings: read=2.231
 Query Identifier: 731698457235411789
 Planning:
   Buffers: shared hit=30
 Planning Time: 0.180 ms
 Execution Time: 3347.271 ms
(28 rows)

Time: 3431.188 ms (00:03.431)

With that adjustment, we have significantly increased the efficiency and performance of the query. Just by tuning “work_mem”, we have shaved approximately 13.5 seconds off the processing time since the first execution!

What about a top-N Heapsort??

Now if we want to demonstrate a top-N Heapsort, we can change the query just a little bit more:

set session work_mem to '64MB';
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT * FROM public.work_mem_test
WHERE id BETWEEN 10000 AND 100000
OR product_id BETWEEN 100 AND 200
ORDER BY value NULLS FIRST LIMIT 10;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=567337.09..567337.12 rows=10 width=27) (actual time=3021.185..3021.190 rows=10 loops=1)
   Output: id, value, product_id, effective_date
   Buffers: shared hit=6 read=363702
   I/O Timings: read=1313.044
   ->  Sort  (cost=567337.09..568871.76 rows=613866 width=27) (actual time=3021.183..3021.186 rows=10 loops=1)
         Output: id, value, product_id, effective_date
         Sort Key: work_mem_test.value NULLS FIRST
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=6 read=363702
         I/O Timings: read=1313.044
         ->  Bitmap Heap Scan on public.work_mem_test  (cost=7805.12..554071.67 rows=613866 width=27) (actual time=235.909..2911.978 rows=589379 loops=1)
               Output: id, value, product_id, effective_date
               Recheck Cond: (((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000)) OR ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200)))
               ->  BitmapOr  (cost=7805.12..7805.12 rows=614306 width=0) (actual time=108.429..108.431 rows=0 loops=1)
                     Buffers: shared hit=6 read=669
                     I/O Timings: read=3.114
                     ->  Bitmap Index Scan on work_mem_test_pkey  (cost=0.00..1280.95 rows=82638 width=0) (actual time=5.582..5.582 rows=90001 loops=1)
                           Index Cond: ((work_mem_test.id >= 10000) AND (work_mem_test.id <= 100000))
                     ->  Bitmap Index Scan on prod_value_idx  (cost=0.00..6217.24 rows=531667 width=0) (actual time=102.845..102.845 rows=499851 loops=1)
                           Index Cond: ((work_mem_test.product_id >= 100) AND (work_mem_test.product_id <= 200))
                           Buffers: shared hit=3 read=422
                           I/O Timings: read=2.037
 Query Identifier: 4969544646514690020
 Planning:
   Buffers: shared hit=30
 Planning Time: 0.177 ms
 Execution Time: 3023.304 ms
(32 rows)

Time: 3107.421 ms (00:03.107)

Because we are only returning the top N rows, a different sort method can be used, so the memory used is much lower. In addition, the execution time is reduced further.

As you can see, with a little tuning of the “work_mem” parameter, a lot of performance can be gained in the system. In this example we increased “work_mem” a fairly small amount, from 64kB to 64MB. In my mind you never want to increase “work_mem” to a setting where, if every worker were using its full allocation at the same time, you could overrun the free memory on the system. Also, remember that there is some overhead to maintaining that memory, so it’s really important to find a good balance for your workload. Keep in mind that you can set this parameter at the instance level, per session (or per transaction) alongside the query, or at the user level via ALTER ROLE.
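
For completeness, here is a sketch of those three scopes (the role name below is just a placeholder; on CloudSQL the instance-wide value is set through the work_mem database flag rather than ALTER SYSTEM):

-- instance wide on a self-managed install, then reload the configuration:
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();

-- per role:
ALTER ROLE reporting_user SET work_mem = '64MB';

-- per session (or use SET LOCAL inside a transaction block to scope it to one transaction):
SET work_mem = '64MB';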

Enjoy!

Three Configuration Parameters for PostgreSQL That Are Worth Further Investigation!

In my new role at Google, not only am I still working with lots of Oracle and replication tools, but I am also expanding more into moving Oracle systems to Google Cloud, onto either CloudSQL for PostgreSQL or AlloyDB for PostgreSQL. After you have been looking at these systems for a little while, there seem to be a few things worth tweaking from their out-of-the-box values. It is my goal to discuss some of those things now and in future blog posts.

Let me start off by saying that managed PostgreSQL products such as Google’s CloudSQL for PostgreSQL and AlloyDB for PostgreSQL (in Preview as of this post) are designed to be low maintenance and to fit many different types of workloads. That being said, there are a few configuration parameters that you should really look at tuning, as the defaults (as of PostgreSQL version 14) in most cases are just not set to the most efficient value if your workload is anything more than a VERY light one.

work_mem

Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files; the default value is four megabytes (4MB). People coming from the Oracle world will equate this setting with the PGA, but keep in mind that “work_mem” is a per-operation limit on private memory with no instance-wide aggregate target (unlike Oracle’s PGA_AGGREGATE_TARGET), so you must take care not to over-configure this setting in PostgreSQL.

A full description of the parameter can be found here.

random_page_cost

Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page; the default is 4.0. In reality this default suits a system where disk performance is a concern (i.e., a system with HDDs rather than SSDs), as it assumes random disk access is 40x slower than sequential access, with most random reads expected to be served from cache. Essentially, if you want your system to prefer index and cached reads, lower this number from the default, but to no lower than the setting for seq_page_cost. For normal CloudSQL for PostgreSQL deployments that use SSD, I like to set this to 2. In deployments that utilize AlloyDB for PostgreSQL, an even lower setting of 1.1 can be used due to the efficient Colossus storage implementation.

For those that have been around Oracle for a while, this parameter behaves much like the “optimizer_index_cost_adj” parameter.

A full description of the parameter can be found here.
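A hedged sketch of how you might apply this. The database and tablespace names are hypothetical, and on CloudSQL/AlloyDB the server-wide value is normally changed via a database flag rather than ALTER SYSTEM:

-- Server-wide (self-managed instance)
ALTER SYSTEM SET random_page_cost = 2.0;
SELECT pg_reload_conf();

-- Scoped to one database, or even to a tablespace backed by faster storage
ALTER DATABASE appdb SET random_page_cost = 2.0;
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);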

effective_io_concurrency

Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. Raising this value increases the number of I/O operations that any individual PostgreSQL session attempts to initiate in parallel. The default is 1, and at this point the setting only affects bitmap heap scans. That being said, a bitmap heap scan, while efficient, by nature has to look at the index as well as the corresponding heap blocks; if that data has to be read from disk and your storage can handle the parallelism (as SSD storage can), you should increase this to a more meaningful value. I will do a separate blog post to show the effects of this, but in general, increasing this number beyond roughly half the number of available CPUs yields increasingly diminishing returns.

A full description of the parameter can be found here.
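Because the setting currently matters for bitmap heap scans, one low-risk way to experiment is at the session level against the work_mem_test table from earlier. Whether the planner actually chooses a bitmap heap scan depends on your data and statistics, so treat this as a sketch:

SET effective_io_concurrency = 32;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM work_mem_test
WHERE product_id BETWEEN 100 AND 200;  -- the earlier plans drove a bitmap scan via prod_value_idx
RESET effective_io_concurrency;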

In closing, just like Oracle and other RDBMSs, there are numerous configuration parameters, all of which can affect the workload. However, the above three parameters are the ones I most often find have opportunities for optimization, especially on more modern platforms.

In future posts I will detail how each one of these can change a workload.

Enjoy!

Indexes Can Check-Out Anytime, But Can They Leave?

When I think about indexes in any DBMS, I immediately think about the lyrics in the Eagles song “Hotel California”.

“You can check-out any time you like, but you can never leave…..”

Almost every system that has been around for a while is over-indexed. Every index costs something to maintain, some more than others. Also, systems and code change. There could be an index that was once needed, but over time the code shifts to use another one that’s more efficient instead. Guess what though, you are still maintaining that old index. That costs CPU and I/O, which translates into TIME.

Thankfully, in Oracle 12.2 they introduced a feature called INDEX USAGE TRACKING. In older versions this was called INDEX MONITORING, and it was much harder and more invasive to use. In the newer versions it just works, and you can read more about both options here:

https://oracle-base.com/articles/12c/index-usage-tracking-12cr2

By default, index usage is captured in the views “DBA_INDEX_USAGE” and “V$INDEX_USAGE_INFO”. The “V$” view is flushed to the “DBA” view every 15 minutes, and for most purposes using the “DBA” view is just fine. The thing to remember is that these views only tell you which indexes ARE being used, not which ones are NOT. A quick look at the “used” side is sketched first below; to get the ones that are not used, we just need to expand the query a bit, as in the second query:
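A hedged sketch of that first look (column names taken from the 19c data dictionary; adjust the owner filter to your own schemas):

SELECT
    iu.owner,
    iu.name AS index_name,
    iu.total_access_count,
    iu.last_used
FROM
    dba_index_usage iu
WHERE
    iu.owner IN (SELECT username FROM dba_users WHERE oracle_maintained = 'N')
ORDER BY
    iu.total_access_count DESC;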

SELECT
    di.owner,
    di.index_name,
    di.index_type,
    di.table_name
FROM
    dba_indexes di
WHERE
             di.owner in (select username from dba_users where oracle_maintained = 'N')
    AND di.uniqueness = 'NONUNIQUE'
    AND NOT EXISTS (
        SELECT
            1
        FROM
            dba_index_usage iu
        WHERE
                iu.owner = di.owner
            AND iu.name = di.index_name
    )
GROUP BY
    di.owner,
    di.index_name,
    di.index_type,
    di.table_name
ORDER BY
    di.owner,
    di.table_name,
    di.index_name;

Now that we have the indexes that are not used, which ones have the biggest impact on database operations? We just expand the query again to leverage the “DBA_TAB_MODIFICATIONS” table. This will allow us to see how many operations occur on each table that the index sits on:

SELECT
    di.owner,
    di.index_name,
    di.index_type,
    di.table_name,
    SUM(tm.inserts + tm.updates + tm.deletes) operations
FROM
    dba_indexes di,
    dba_tab_modifications tm
WHERE
             di.owner in (select username from dba_users where oracle_maintained = 'N')
    AND di.owner = tm.table_owner
    AND di.table_name = tm.table_name
    AND di.uniqueness = 'NONUNIQUE'
    AND NOT EXISTS (
        SELECT
            1
        FROM
            dba_index_usage iu
        WHERE
                iu.owner = di.owner
            AND iu.name = di.index_name
    )
GROUP BY
    di.owner,
    di.index_name,
    di.index_type,
    di.table_name
ORDER BY
    SUM(tm.inserts + tm.updates + tm.deletes) ASC NULLS FIRST,
    di.owner,
    di.table_name,
    di.index_name;

Now, I have used an equi-join against DBA_TAB_MODIFICATIONS, so this only shows unused indexes on tables for which the system has recorded operations. If you want to show all unused indexes regardless of whether any operations were recorded, just change the join against DBA_TAB_MODIFICATIONS to an outer join, as sketched below.
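A minimal sketch of that variant, rewritten with ANSI join syntax so the outer join is explicit (otherwise it is the same query as above):

SELECT
    di.owner,
    di.index_name,
    di.index_type,
    di.table_name,
    SUM(tm.inserts + tm.updates + tm.deletes) operations
FROM
    dba_indexes di
    LEFT OUTER JOIN dba_tab_modifications tm
        ON  di.owner = tm.table_owner
        AND di.table_name = tm.table_name
WHERE
             di.owner IN (SELECT username FROM dba_users WHERE oracle_maintained = 'N')
    AND di.uniqueness = 'NONUNIQUE'
    AND NOT EXISTS (
        SELECT
            1
        FROM
            dba_index_usage iu
        WHERE
                iu.owner = di.owner
            AND iu.name = di.index_name
    )
GROUP BY
    di.owner,
    di.index_name,
    di.index_type,
    di.table_name
ORDER BY
    SUM(tm.inserts + tm.updates + tm.deletes) ASC NULLS FIRST,
    di.owner,
    di.table_name,
    di.index_name;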

In my case, there were some very large indexes that were unused sitting on tables that had LOTS of operations (names changed to protect the innocent):

OWNER   INDEX NAME   INDEX TYPE              TABLE NAME   OPERATIONS
HR      FK_1_IDX     NORMAL                  EMPLOYEES    295931545
HR1     EMP_TOT      FUNCTION-BASED NORMAL   EMP_HOURS    1374288673
HR2     EMP_TOT      FUNCTION-BASED NORMAL   EMP_HOURS    1425114200
HR3     EMP_TOT      FUNCTION-BASED NORMAL   EMP_HOURS    2487284020

Adding “OPERATIONS” to the query now gives us some sense of the “BENEFIT” of dropping the index. Based on the results, you can present your case for dropping it. People always love to have the “BENEFIT” quantified, and this is the closest you can get without performing a session trace, which can be invasive and makes it fairly difficult to capture the exact statement you are looking for.

Hopefully this will give you the additional information you need to advocate for dropping that index that is no longer being used.

SQLPlus is Expensive!


I was recently tasked to work on a server that had severe performance issues. The first thing I noticed was that the server was, at times, running at a load average of 3-4x the number of cores on the host, and about 1.5x at steady state! Further investigation yielded that there was a large quantity of SQLPlus invocations coming from a set of ETL scripts. Digging even more, I found that the script performed ETL across more than 100 schemas, 75% of which were processed via parallel background processes with a random “sleep” built in.

This is where I noticed something interesting. Instead of coding the script to invoke SQLPlus once, run a bunch of statements and exit, the coder constructed the script to invoke SQLPlus, run a statement, exit, invoke SQLPlus, run a statement, and on and on and on.

My first thought was that this has to be EXPENSIVE right?

When a SQLPlus login is made, a lot of things happen. Simply stated, the OS has to start a database process, and the database has to do a bunch of things to get your session ready (password authentication, memory allocation, updating some tables with login details, setting up permissions, and so on). Managing processes is one of the most expensive things an operating system can do.

In terms of the database itself, no less than 19 individual statements are executed in order to prepare your session. Digging into those statements, probably one of the most expensive is:

sql_id:  9zg9qd9bm4spu
update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date(NULL), :2) where user#=:1


It can be so expensive that Oracle has actually logged a bug (Bug 33121934  Library cache lock / load lock / mutex x during connection storm due to update user$) where you can discontinue updating the last login time for every login and instead do it every X number of seconds. The explain plan for this statement is actually quite lengthy, and there is a lot more work going on than you might think!
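If you want to see how much that one recursive statement is contributing on your own system, a quick hedged look in V$SQL using the SQL_ID above (numbers will obviously vary):

SELECT sql_id,
       executions,
       ROUND(elapsed_time / 1000000, 1) AS elapsed_secs,
       ROUND(elapsed_time / NULLIF(executions, 0)) AS avg_elapsed_us
FROM   v$sql
WHERE  sql_id = '9zg9qd9bm4spu';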

So based on all of that, I headed back to my scripts only to discover that for every invocation of the script (the script is invoked for each of the 114 schemas), there were a minimum of 11 SQLPlus invocations and a maximum of 27, depending on the schema. So for every minute that this script was executed there was a total of almost 1,400 SQLPlus invocations!!! Definitely a big contributor to the very high load average on the server.
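One hedged way to sanity-check a login storm like this from the database side is to sample the logon statistics a minute or so apart and diff the values:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('logons cumulative', 'logons current');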

As with anything in IT, change is tough and there’s hardly ever much appetite to change something that “works”. That is where it is important to always quantify the benefits. So to do this I wrote a quick script to show the difference in CPU utilization between a script that makes many SQLPlus invocations and one that makes a single invocation and then calls many SQL scripts (the consolidated pattern is sketched below).
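The consolidated pattern is nothing fancy. A hedged sketch of a driver script (all file names here are hypothetical) that keeps one SQLPlus session alive while running each stage:

-- driver.sql: called once per schema, e.g.  sqlplus -s etl_user@db @driver.sql
WHENEVER SQLERROR EXIT SQL.SQLCODE
SET FEEDBACK OFF

@extract_stage.sql
@transform_stage.sql
@load_stage.sql

EXIT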

The resulting stats are as follows. The original script in my test system made one SQLPlus invocation for each SQL file that was called; it executed SQLPlus 61 times and yielded the following system usage:

real     0m0.876s
user     0m0.480s
sys      0m0.463s

The modified script that reduced the number of calls from 61 to 14 (77%) yielded the following system usage:

real     0m0.188s
user     0m0.117s
sys      0m0.085s

*** NOTE: The test system I was working on had no data to manipulate for ETL, so most of the work was the SQLPlus call itself. Which is what I wanted to demonstrate anyway!

As you can see, reducing the number of SQLPlus calls by 77% reduced system usage by:

  • real: 79%
  • user: 75%
  • sys: 82%

So with a small change to group sql statements into one SQLPlus call, we are able to drastically improve system stability and reduce resource utilization. A WIN in my book! Once this gets through the test cycles, I will follow up with a post on the improvement seen in production.

Why Aren’t My Exadata Database Queries Performing Smart Scans?


Recently, I was performing prerequisite checks on an Exadata in preparation for the normal quarterly full stack patch and in doing so came across a section of the Exacheck with the heading “Database should not be in DST upgrade state“. Upon investigation I found that to be true:

select name, value$ from sys.props$ where name like '%DST%';

NAME			       VALUE$
------------------------------ ------------------------------
DST_UPGRADE_STATE	       UPGRADE
DST_PRIMARY_TT_VERSION	       32
DST_SECONDARY_TT_VERSION       18

Based on this, I took it as an action item to correct during the patching cycle and continued the normal checks, plus a quick look through recent AWRs. It was then that I found something interesting in the “Exadata” section of the report:

As you can see, nothing is being offloaded, and in the “Passthru Reasons” section “timezone” is the culprit. I then went to look at a query or two and, sure enough, that confirms it (one way to make the same check from SQL is sketched below):
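The screenshots are not reproduced here, but a hedged sketch of the check against V$SQL, substituting your own SQL_ID:

SELECT sql_id,
       child_number,
       physical_read_bytes / 1024 / 1024            AS phys_read_mb,
       io_cell_offload_eligible_bytes / 1024 / 1024 AS offload_eligible_mb,
       io_interconnect_bytes / 1024 / 1024          AS interconnect_mb
FROM   v$sql
WHERE  sql_id = '&sql_id'
ORDER  BY child_number;
-- A large full scan showing offload_eligible_mb = 0 is a strong hint that
-- smart scan is being bypassed (the "passthru" case from the AWR section).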


So, sure enough, a stuck timezone file upgrade can defeat the entire reason you bought an Exadata! The date it quit working also coincided with an upgrade to 19c. Another quick sanity check of MOS revealed the following note:

Exadata: Database Performance Degrades when Database is in Timezone Upgrade Mode (Doc ID 1583297.1)

So now, how to fix it? The note says to try the following commands, but unfortunately that did not work:

alter session set events '30090 trace name context forever, level 32';
exec dbms_dst.unload_secondary;
BEGIN dbms_dst.unload_secondary; END;

*
ERROR at line 1:
ORA-56938: no secondary time zone data file being loaded by on-demand or a datapump job
ORA-06512: at "SYS.DBMS_DST", line 1969
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "SYS.DBMS_DST", line 1906
ORA-06512: at line 1

So at this point we could either try the timezone upgrade again, or try to force the timezone upgrade window to close. The following MOS note discusses the timezone upgrade process:

Updating the RDBMS DST version in 12c Release 1 (12.1.0.1 ) and above using DBMS_DST (Doc ID 1509653.1)

Due to time constraints, I decided to try forcing the upgrade window to close (as all other indicators showed that the upgrade was successful on all other objects), and we immediately found the reason the upgrade failed in the first place: an invalid and UNUSED version of APEX!!!

alter session set "_with_subquery"=materialize;
alter session set "_simple_view_merging"=TRUE;
set serveroutput on
VAR numfail number
BEGIN
DBMS_DST.UPGRADE_DATABASE(:numfail,
parallel => TRUE,
log_errors => TRUE,
log_errors_table => 'SYS.DST$ERROR_TABLE',
log_triggers_table => 'SYS.DST$TRIGGER_TABLE',
error_on_overlap_time => FALSE,
error_on_nonexisting_time => FALSE);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :numfail);
END;
/
VAR fail number
BEGIN
DBMS_DST.END_UPGRADE(:fail);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :fail);
END;
/
SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;
Table list: "APEX_040200"."WWV_FLOW_FEEDBACK"
BEGIN
*
ERROR at line 1:
ORA-04098: trigger 'APEX_040200.WWV_FLOW_FEEDBACK_T1' is invalid and failed re-validation
ORA-06512: at "SYS.DBMS_DST", line 1131
ORA-06512: at "SYS.DBMS_DST", line 611
ORA-06512: at "SYS.DBMS_DST", line 209
ORA-06512: at "SYS.DBMS_DST", line 858
ORA-06512: at "SYS.DBMS_DST", line 596
ORA-06512: at "SYS.DBMS_DST", line 1123
ORA-06512: at line 2

After following the normal process to remove those old versions of APEX (I am a believer in removing it period if you aren’t using it), I was then able to close the upgrade window for timezone with no errors:

select name, value$ from sys.props$ where name like '%DST%';

NAME			       VALUE$
------------------------------ ------------------------------
DST_UPGRADE_STATE	       UPGRADE
DST_PRIMARY_TT_VERSION	       32
DST_SECONDARY_TT_VERSION       18

alter session set "_with_subquery"=materialize;

Session altered.

alter session set "_simple_view_merging"=TRUE;

Session altered.

set serveroutput on
VAR numfail number
BEGIN
DBMS_DST.UPGRADE_DATABASE(:numfail,
parallel => TRUE,
log_errors => TRUE,
log_errors_table => 'SYS.DST$ERROR_TABLE',
log_triggers_table => 'SYS.DST$TRIGGER_TABLE',
error_on_overlap_time => FALSE,
error_on_nonexisting_time => FALSE);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :numfail);
END;
/
Table list: "APEX_200200"."WWV_FLOW_WORKSHEET_NOTIFY"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_WEB_SRC_MODULES"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_ACTIVITY_LOG2$"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_DEBUG_MESSAGES"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_DEBUG_MESSAGES2"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_WEBSOURCE_SYNC_LOG1$"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_WEBSOURCE_SYNC_LOG2$"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_AUTOMATION_LOG1$"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_AUTOMATION_LOG2$"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_AUTOMATION_MSG_LOG1$"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_AUTOMATION_MSG_LOG2$"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_FEEDBACK"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_FEEDBACK_FOLLOWUP"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_PKG_APP_INSTALL_LOG"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_ACTIVITY_LOG1$"
Number of failures: 0
Table list: "APEX_200200"."APEX$ARCHIVE_HEADER"
Number of failures: 0
Table list: "APEX_200200"."APEX$ARCHIVE_CONTENTS"
Number of failures: 0
Table list: "APEX_200200"."APEX$ARCHIVE_HISTORY"
Number of failures: 0
Table list: "APEX_200200"."APEX$ARCHIVE_LOG"
Number of failures: 0
Table list: "APEX_200200"."APEX$ARCHIVE_PREF"
Number of failures: 0
Table list: "APEX_200200"."WWV_FLOW_AUTOMATIONS"
Number of failures: 0
Failures:0

PL/SQL procedure successfully completed.

VAR fail number
BEGIN
DBMS_DST.END_UPGRADE(:fail);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :fail);
END;
/
An upgrade window has been successfully ended.
Failures:0

PL/SQL procedure successfully completed.

select name, value$ from sys.props$ where name like '%DST%';

NAME			       VALUE$
------------------------------ ------------------------------
DST_UPGRADE_STATE	       NONE
DST_PRIMARY_TT_VERSION	       32
DST_SECONDARY_TT_VERSION       0

Now after changing this, you can now see that the “Passthru Reasons” section is no longer populated and all cell offloads are occurring again:

Confirming via looking at AWR also shows the query now getting proper offload:

So the moral of the story is that when performing an upgrade, take extra care to check for invalid objects along the way, and don’t proceed until they are properly resolved! Also, consider removing features of the database that you are not using. It lessens the chance of outliers causing problems, and your upgrade will actually complete faster!

Enjoy!

Using Python Along with the Oracle GoldenGate Big Data Adapter Integrated Heartbeat

In my previous blog post, https://stborden.wordpress.com/2021/08/27/enabling-and-use-of-the-oracle-goldengate-big-data-adapter-integrated-heartbeat/, I showed how to enable the heartbeat for the GoldenGate Big Data Adapter. In this post, we will extend this facility using python to read this data and provide lag information for the replication path over different time periods.

Python is a perfect language to iterate over the json generated by the integrated heartbeat. For ease, I have posted the script in the code block below, but it is also available on my GitHub site:

https://github.com/shane-borden/goldengate-scripts/blob/main/ogg_big_data_heartbeat_report.py


#!/bin/python3
# Python program to read ogg heartbeat history json file

import json
import time
import datetime
import os
import glob
import sys, getopt

def main(argv):
  # Initialize Variables
  vLagJsonDir = ''
  try:
    opts, args = getopt.getopt(argv,"hj:",["jsondir="])
    if len(opts) == 0:
      print('Script Usage: ogg_big_data_heartbeat_report.py -j <json_dir>')
      sys.exit(1)
  except getopt.error as err:
    print('Script Usage: ogg_big_data_heartbeat_report.py -j <json_dir>')
    sys.exit(2)
  for opt, arg in opts:
    if opt == '-h':
      print('Script Usage: ogg_big_data_heartbeat_report.py -j <json_dir>')
      sys.exit()
    #elif opt in ("-j", "--jsondir"):
    elif opt == '-j':
      vLagJsonDir = arg
    elif opt == '--jsondir':
      vLagJsonDir = arg

  vTotLag = 0
  vTotJsonRecords = 0
  vTotLag_1hour = 0
  vTotJsonRecords_1hour = 0
  vTotLag_4hour = 0
  vTotJsonRecords_4hour = 0
  vTotLag_8hour = 0
  vTotJsonRecords_8hour = 0
  vTotLag_24hour = 0
  vTotJsonRecords_24hour = 0
  now = time.mktime(datetime.datetime.now().timetuple())
  if vLagJsonDir == "":
    vLagJsonDir = "/u01/app/oracle/product/oggBd/19.1/gg_1/dirtmp/"
    print('JSON Dir defaulted to: ' + str(vLagJsonDir))
  else:
    print('JSON Dir is: ' + str(vLagJsonDir))
  lag_records = []
  heartbeat_timestamp_records = []
  replication_path_records = []

  # Opening JSON file
  for filename in glob.glob(vLagJsonDir + '/*-hb-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].json'):
    #print(os.path.join(vLagJsonDir + "/", filename))
    f = open(os.path.join(vLagJsonDir + "/", filename))

    # returns JSON object as
    # a dictionary
    data = json.load(f)

    # Iterating through the json
    # list
    for i in data['records']:
      vIncomingTs = time.mktime(datetime.datetime.strptime(i['incomingHeartbeatTs'][:-3],"%Y-%m-%d %H:%M:%S.%f").timetuple())
      vOutgoingTs = time.mktime(datetime.datetime.strptime(i['outgoingReplicatTs'][:-3],"%Y-%m-%d %H:%M:%S.%f").timetuple())
      vIncomingHeartbeatTs = datetime.datetime.strptime(i['incomingHeartbeatTs'][:-3],"%Y-%m-%d %H:%M:%S.%f").strftime('%Y-%m-%d %H:%M')
      heartbeat_timestamp_records.append(vIncomingHeartbeatTs)
      #print(str(now - vOutgoingTs))
      if (now - vOutgoingTs) <= 3600:
        vTotLag_1hour = vTotLag_1hour + (vOutgoingTs - vIncomingTs)
        vTotJsonRecords_1hour = (vTotJsonRecords_1hour + 1)
        #print(i['incomingExtract'] + " => " + i['incomingRoutingPath'] + " => " + i['incomingReplicat'] + " | " + vIncomingHeartbeatTs + " | " + str(vOutgoingTs - vIncomingTs))
        replication_path_records.append(i['incomingExtract'] + " => " + i['incomingRoutingPath'] + " => " + i['incomingReplicat'])
      elif (now - vOutgoingTs) <= 14400:
        vTotLag_4hour = vTotLag_4hour + (vOutgoingTs - vIncomingTs)
        vTotJsonRecords_4hour = (vTotJsonRecords_4hour + 1)
      elif (now - vOutgoingTs) <= 28800:
        vTotLag_8hour = vTotLag_8hour + (vOutgoingTs - vIncomingTs)
        vTotJsonRecords_8hour = (vTotJsonRecords_8hour + 1)
      elif (now - vOutgoingTs) <= 86400:
        vTotLag_24hour = vTotLag_24hour + (vOutgoingTs - vIncomingTs)
        vTotJsonRecords_24hour = (vTotJsonRecords_24hour + 1)
      # Every record also counts toward the whole-dataset average
      vTotLag = vTotLag + (vOutgoingTs - vIncomingTs)
      vTotJsonRecords = (vTotJsonRecords + 1)
    f.close()

  # Span of the dataset in days, taken from the oldest and newest heartbeats seen
  vOldestTs = time.mktime(datetime.datetime.strptime(min(heartbeat_timestamp_records), '%Y-%m-%d %H:%M').timetuple())
  vNewestTs = time.mktime(datetime.datetime.strptime(max(heartbeat_timestamp_records), '%Y-%m-%d %H:%M').timetuple())
  vTimeDiff = round((vNewestTs - vOldestTs) / 86400, 1)

  print('')
  print('Replication Paths:')
  for vPath in set(replication_path_records):
    print(vPath)
  print('')
  print('Combined Lag Data for Replication Paths:')
  print('')
  if vTotJsonRecords_1hour > 0:
    print("Average Lag over the past hour: " + str(vTotLag_1hour // vTotJsonRecords_1hour) + " seconds")
  if vTotJsonRecords_4hour > 0:
    print("Average Lag over the past 4 hours: " + str(vTotLag_4hour // vTotJsonRecords_4hour) + " seconds")
  if vTotJsonRecords_8hour > 0:
    print("Average Lag over the past 8 hours: " + str(vTotLag_8hour // vTotJsonRecords_8hour) + " seconds")
  if vTotJsonRecords_24hour > 0:
    print("Average Lag over the past 24 hours: " + str(vTotLag_24hour // vTotJsonRecords_24hour) + " seconds")
  print("Average Lag over the dataset (" + str(vTimeDiff) + " Days): " + str(vTotLag // vTotJsonRecords) + " seconds")

if __name__ == "__main__":
  if len(sys.argv) < 2:
    print('Script Usage: ogg_big_data_heartbeat_report.py -j <json_dir>')
    sys.exit(2)
  else:
    main(sys.argv[1:])

Now, to execute the script, it’s quite easy. Just tell the script where the JSON files are and it will aggregate the results over the past hour, 4 hours, 8 hours, 24 hours, and the entire dataset:

oracle@orcl-vbox2:/home/oracle $ ogg_big_data_heartbeat_report.py -j /u01/app/oracle/product/oggBd/19.1/gg_1/dirtmp
JSON Dir is: /u01/app/oracle/product/oggBd/19.1/gg_1/dirtmp

Replication Paths:
E_HR_CL => P_HR_CL => RKFJSCON

Combined Lag Data for Replication Paths:

Average Lag over the past hour: 6.0 seconds
Average Lag over the past 4 hours: 6.0 seconds
Average Lag over the past 8 hours: 6.0 seconds
Average Lag over the past 24 hours: 6.0 seconds
Average Lag over the dataset (18.0 Days): 443446.0 seconds

I’ve only had my VM up for the past day or so, so the data for the entire dataset is skewed as there was a several day period where the VM was down.

Now you can extend this even further if you like: execute it on a scheduled basis from OEM so that historical data can be kept, or from any other scheduling tool you might have. Enjoy!

Enabling and Use of the Oracle GoldenGate Big Data Adapter Integrated Heartbeat

I have always found the Integrated Heartbeat for GoldenGate very useful for quickly identifying replication trends and potential slowness in the replication path. Luckily, as of version 19c this functionality has been extended to the Big Data Adapter. My goal with this blog post is to show how easy it is to enable the heartbeat and to demonstrate a Python script that utilizes the data the heartbeat produces.

First, enable the heartbeat on the source just as you normally would. I won’t cover how to do that in this post, but for GoldenGate 19c, the instructions can be found here:

https://docs.oracle.com/en/middleware/goldengate/core/19.1/gclir/add-heartbeattable.html#GUID-126E30A2-DC7A-4C93-93EC-0EB8BA7C13CB

Enabling the HeartBeat for the Big Data Adapter

Luckily, enabling the heartbeat for the Big Data Adapter is just as easy as in the normal GoldenGate CLI. The official reference is here:

https://docs.oracle.com/en/middleware/goldengate/big-data/19.1/gbdin/automatic-heartbeat-bigdata.html#GUID-56DF810A-A417-45DB-A3ED-2D902B800778

For expedience, I will show the basic usage of the command here. After logging into GGSCI, enabling the target heartbeat is as easy as:

Oracle GoldenGate for Big Data
Version 19.1.0.0.8 (Build 002)

Oracle GoldenGate Command Interpreter
Version 19.1.0.0.210228 OGGCORE_19.1.0.0.0OGGBP_PLATFORMS_210212.1752.2
Linux, x64, 64bit (optimized), Generic  on Mar 18 2021 07:13:36
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2021, Oracle and/or its affiliates. All rights reserved.



GGSCI (orcl-vbox2.localdomain.com) 1> ADD HEARTBEATTABLE
Heartbeat table is now enabled.
HEARTBEAT configuration file dirprm/heartbeat.properties
heartbeat.enabled=true
heartbeat.frequency=60
heartbeat.retention_time=30
heartbeat.purge.frequency=1
heartbeat.db.name=BigData

Now that the heartbeat has been enabled, where does the data go? The heartbeat information (by default) lands under your GOLDENGATE_HOME/dirtmp directory in the form of JSON files:

RKFJSCON-hb.json – Data for Last Heartbeat

The last heartbeat recorded looks like this:

{
  "localDatabase" : "BigData",
  "heartbeatTimestamp" : "2021-08-27 13:17:14.149808000",
  "remoteDatabase" : "VBOXNCDB",
  "incomingExtract" : "E_HR_CL",
  "incomingRoutingPath" : "P_HR_CL",
  "incomingReplicat" : "RKFJSCON",
  "incomingHeartbeatTs" : "2021-08-27 13:17:14.149808000",
  "incomingExtractTs" : "2021-08-27 13:17:16.822345000",
  "incomingRoutingTs" : "2021-08-27 13:17:17.275030000",
  "incomingReplicatTs" : "2021-08-27 13:17:20.411229000",
  "outgoingReplicat" : "RKFJSCON",
  "outgoingReplicatTs" : "2021-08-27 13:17:20.412000000"
}


Every time a new heartbeat arrives, the previous heartbeat is appended to a per-day historical file, which is retained up to the retention period, after which the data is deleted:

{
  "records" : [ {
    "localDatabase" : "BigData",
    "heartbeatTimestamp" : "2021-08-27 00:00:14.110627000",
    "remoteDatabase" : "VBOXNCDB",
    "incomingExtract" : "E_HR_CL",
    "incomingRoutingPath" : "P_HR_CL",
    "incomingReplicat" : "RKFJSCON",
    "incomingHeartbeatTs" : "2021-08-27 00:00:14.110627000",
    "incomingExtractTs" : "2021-08-27 00:00:16.213547000",
    "incomingRoutingTs" : "2021-08-27 00:00:17.371348000",
    "incomingReplicatTs" : "2021-08-27 00:00:20.147079000",
    "outgoingReplicat" : "RKFJSCON",
    "outgoingReplicatTs" : "2021-08-27 00:00:20.170000000"
  }, {
    More json records
  }, {
    "localDatabase" : "BigData",
    "heartbeatTimestamp" : "2021-08-27 13:21:14.178962000",
    "remoteDatabase" : "VBOXNCDB",
    "incomingExtract" : "E_HR_CL",
    "incomingRoutingPath" : "P_HR_CL",
    "incomingReplicat" : "RKFJSCON",
    "incomingHeartbeatTs" : "2021-08-27 13:21:14.178962000",
    "incomingExtractTs" : "2021-08-27 13:21:16.602223000",
    "incomingRoutingTs" : "2021-08-27 13:21:17.675216000",
    "incomingReplicatTs" : "2021-08-27 13:21:21.481484000",
    "outgoingReplicat" : "RKFJSCON",
    "outgoingReplicatTs" : "2021-08-27 13:21:21.481000000"
  } ]
}


So, as you can see, the heartbeat data is now in JSON format, ready for you to build alerts, OEM extensions, or whatever else you see fit. Check out my next blog post on how to report on this data using a simple Python script!