
How to Lower Latency and Speed Up Data Delivery


Sponsored Content

By Mark Van de Wiel, CTO of HVR, an IOUG partner | Edited by Jim Czuprynski 

Every now and again I come across the question: How can we lower latency and speed up data delivery? Irrespective of the target database, the answer the person responsible for the data integration strategy often wants to hear is to employ parallel processing. However, parallelizing is the best answer only after better options have been exhausted, and organizations don’t always explore those options first.

Also, most environments leverage log-based Change Data Capture (CDC) to capture transactions at the source database, and because log-based capture performs large sequential block reads from the transaction log files, capture is rarely the bottleneck; posting those transactions at the target database is usually the slower side. Therefore, it makes sense to focus our attention on the integration side of the equation.

How Replication Works

First, let’s understand how replication typically works:

  • The source database processes many transactions simultaneously.
  • Committed changes to data are assigned some sort of unique commit sequence number.
  • Most log-based CDC technologies pick up the transactions as they are committed to replay them on the target in the order they were committed on the source.
  • In most cases, the default approach uses a single integration process on the target that applies the transactions in commit order, preserving data integrity on the target database in line with the source system (and preserving the order of changes within each transaction, so constraints can still be enforced on the target). A minimal sketch of this single-process apply loop follows the list.
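
The following is a minimal sketch of that default single-process apply loop, written in Python against a generic DB-API connection. It is not HVR’s actual implementation; read_committed_transactions() and the transaction/change objects are hypothetical stand-ins for the CDC pipeline.

```python
def apply_in_commit_order(target_conn, read_committed_transactions):
    """Apply captured transactions to the target one at a time,
    ordered by commit sequence number, to preserve data integrity."""
    for txn in sorted(read_committed_transactions(), key=lambda t: t.commit_seq):
        cur = target_conn.cursor()
        try:
            # Replay inserts/updates/deletes in their original order within the transaction.
            for change in txn.changes:
                cur.execute(change.sql, change.bind_values)
            target_conn.commit()  # one commit per source transaction
        finally:
            cur.close()
```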

Given that there is usually a multi-user workload on the source system, it is not a big surprise that, depending on the load on the source database (the number and size of transactions, the tables and data types involved, etc.), the target system may not always be able to keep up with the demand.

The question to always ask before parallelizing the data integration – which splits transactions across different processes and generally breaks transactional integrity (whether that is an issue depends on the application) – is: Why is my replication slow? Or: Is there any way I can speed up the replication?

Why is my Replication Slow?

There can be many reasons why the replication is slow at the target database:

  • Are statistics up to date so a cost-based optimizer can make informed decisions?
  • Single-row updates and deletes typically benefit from an index on the primary key column(s). Perhaps these indexes are missing; conversely, on some database technologies, existing indexes may have become fragmented, slowing down indexed searches, in which case they may need to be rebuilt.
  • Replication technologies (including HVR) typically implement optimizations that combine multiple source database transactions into fewer transactions on the target database. Perhaps the target database does not handle large transactions well, in which case the DBA may need to tune the data delivery technology to commit more frequently.
  • The target database may be a columnar database. Columnar databases typically perform (much!) slower for single-row updates and deletes than row-based databases, so simply replaying the source workload transaction by transaction may fundamentally not be the best approach. For example, with HVR we recommend applying all outstanding changes – no matter how few – using micro-batches: a (fast) bulk load into staging tables followed by set-based operations against the target tables. This approach is dramatically more efficient under heavy transaction workloads, and it regulates itself to keep up with the load on the source. A hedged sketch of the micro-batch approach follows this list.
  • The target database may be an MPP (Massively Parallel Processing) database, and the distributed nature of the system may not lend itself to comparable performance for OLTP operations; commits can be significantly slower on a distributed system than on a single-instance database. Again, in such a scenario the use of micro-batches has proven to be a good alternative approach.
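
Below is a hedged sketch of the micro-batch idea for a columnar or MPP target: bulk-load all outstanding changes into a staging table, then apply them with set-based statements instead of row by row. The table and column names (ORDERS, ORDERS_STAGE, ORDER_ID, OP_TYPE) and the bulk_load_csv() helper are illustrative assumptions, not HVR syntax.

```python
def apply_micro_batch(conn, bulk_load_csv, changes_csv_path):
    """Apply one micro-batch of outstanding changes set-based via a staging table."""
    # Fast bulk load of the pending changes (assumed to hold the latest image per key).
    bulk_load_csv("ORDERS_STAGE", changes_csv_path)

    cur = conn.cursor()
    try:
        # Remove every changed or deleted row from the target in one statement.
        cur.execute("""
            DELETE FROM ORDERS
             WHERE ORDER_ID IN (SELECT ORDER_ID FROM ORDERS_STAGE)
        """)
        # Re-insert the latest image of every row that still exists on the source.
        cur.execute("""
            INSERT INTO ORDERS (ORDER_ID, CUSTOMER_ID, ORDER_TOTAL, LAST_CHANGE_TS)
            SELECT ORDER_ID, CUSTOMER_ID, ORDER_TOTAL, LAST_CHANGE_TS
              FROM ORDERS_STAGE
             WHERE OP_TYPE <> 'DELETE'
        """)
        conn.commit()
        # Clear the staging table for the next micro-batch.
        cur.execute("TRUNCATE TABLE ORDERS_STAGE")
    finally:
        cur.close()
```

The two set-based statements replace what could be thousands of individual row operations, which is why this pattern tends to keep up with heavy change volumes on columnar and MPP targets.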

Problem Resolution: Gather Information

Rather than guessing what the problem is, the first step in all cases is to gather sufficiently detailed metrics and related diagnostic information:

  • If you are licensed for the Oracle Database Diagnostic Pack, it provides an excellent method to capture information through AWR reports. These reports tell you how the instance spent its time during a time window (ideally a short window during which you ran a relevant workload): the top wait events, the top SQL statements, and so forth. Looking through these reports may point you to potential problems. Likewise, SQL Server has wait statistics and other metrics that are worth analyzing. You can also typically find in the database’s data dictionary the sizes of database objects, which indexes are fragmented, and so forth (a sample data dictionary query follows this list).
  • Implement changes you think make sense based on an analysis of the workload; for example, you may consider adding or rebuilding indexes as needed, implementing partitioning if it makes sense (and you are licensed for it), or even changing the distribution of tables in an MPP system.
  • Finally, only if you run out of alternative approaches is it time to consider parallelizing data delivery at the target database. You can leverage HVR or other replication technologies to accomplish this.
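
As an illustration of the data dictionary step above, the following Python snippet (using the python-oracledb driver) lists the largest segments in a replicated Oracle schema. The connection details and schema name are placeholders, and the views and columns will differ on other database platforms.

```python
import getpass
import oracledb

# Placeholder connection details; adjust for your environment.
conn = oracledb.connect(user="repl_admin",
                        password=getpass.getpass("Password: "),
                        dsn="target_db")
with conn:
    with conn.cursor() as cur:
        # Largest segments owned by the replicated schema (requires access to DBA_* views).
        cur.execute("""
            SELECT segment_name, segment_type, ROUND(bytes / 1024 / 1024) AS mb
              FROM dba_segments
             WHERE owner = :owner
             ORDER BY bytes DESC
             FETCH FIRST 20 ROWS ONLY
        """, owner="TARGET_SCHEMA")
        for name, seg_type, mb in cur:
            print(f"{seg_type:<18} {name:<30} {mb:>8} MB")
```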
