Establish an automated monitoring mechanism for Amazon DMS database migration tasks

Posted by samsolomonraj on Wed, 22 Dec 2021 05:36:01 +0100

Amazon Database Migration Service (Amazon DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. Amazon DMS is typically used to migrate data into the Amazon cloud from on-premises instances or from a combination of cloud and on-premises instances.

When you migrate data with Amazon DMS, it is essential to monitor the status of ongoing replication tasks. You can do this with the task's control tables and with Amazon CloudWatch. You can monitor task progress, resource usage, and network connectivity through the Amazon Management Console, the Amazon command line interface (Amazon CLI), or the Amazon DMS API.

A migration typically uses multiple tasks. These tasks are independent of each other and can run concurrently, and the number of replication tasks varies with the scenario. With a large number of concurrent replication tasks, manually monitoring the progress of each one is tedious and error-prone.

In this article, we provide you with an automation solution using Amazon CloudFormation templates. This solution consists of the following steps:

  1. Create a CloudWatch alarm for the replication task.
  2. Create Amazon DMS event subscriptions.
  3. Configure Amazon Simple Notification Service (Amazon SNS) to notify you of errors found in the task's CloudWatch logs.
  4. Create an Amazon Lambda function that sends repeated SNS notifications for CloudWatch alarms that stay in the same state.

Prerequisites

Before you begin, you must have the following resources:

  1. Amazon DMS source and target endpoints
  2. An Amazon DMS replication instance
  3. An Amazon DMS replication task with logging enabled (for detailed instructions, see Creating tasks for ongoing replication using Amazon DMS)
  4. An SNS topic

Once these prerequisites are met, you can start setting up automated monitoring of your replication tasks.

CloudWatch alarms for Amazon DMS replication tasks

The preferred way to monitor task status is to create CloudWatch alarms for your replication tasks, because an alarm is triggered as soon as a task metric crosses the threshold you define.

We recommend that you set alarms for the following DMS metrics:

  1. CDCLatencySource
  2. CDCLatencyTarget
  3. CDCChangesDiskSource
  4. CDCChangesDiskTarget

For more details on Amazon DMS metrics, see Amazon Database Migration Service metrics.
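
As a rough illustration of what such alarms contain, the sketch below builds the arguments for the CloudWatch `put_metric_alarm` call for the four metrics. The thresholds, the alarm naming scheme, and the helper names `build_alarm_params` and `create_alarms` are assumptions made for this example, not values taken from the CloudFormation template used later in this article. The dimension values should be checked against what CloudWatch shows for your own task.

```python
# Hypothetical defaults: seconds for the latency metrics, rows for the disk metrics.
DEFAULT_THRESHOLDS = {
    "CDCLatencySource": 300,
    "CDCLatencyTarget": 300,
    "CDCChangesDiskSource": 10000,
    "CDCChangesDiskTarget": 10000,
}


def build_alarm_params(task_id, instance_id, metric_name, threshold, topic_arn):
    """Return keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": "{0}-{1}".format(task_id, metric_name),
        "AlarmDescription": "{0} above {1} for task {2}".format(
            metric_name, threshold, task_id),
        "Namespace": "AWS/DMS",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
            {"Name": "ReplicationTaskIdentifier", "Value": task_id},
        ],
        "Statistic": "Average",
        "Period": 300,            # evaluate five-minute averages (assumed)
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": [topic_arn],
    }


def create_alarms(task_id, instance_id, topic_arn, thresholds=None):
    """Create one alarm per metric. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    for metric_name, threshold in (thresholds or DEFAULT_THRESHOLDS).items():
        cloudwatch.put_metric_alarm(
            **build_alarm_params(task_id, instance_id, metric_name,
                                 threshold, topic_arn))
```

Keeping the parameter builder separate from the API call makes it possible to review the alarm definitions before anything is created in the account.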

CDCLatencySource

CDCLatencySource is the gap (in seconds) between the last event captured from the source endpoint and the current system timestamp of the Amazon DMS replication instance. Amazon DMS sets this value to zero if no changes are captured from the source endpoint because of the task's scope.

During replication, Amazon DMS reads changes from the source database transaction log.

Depending on the source database engine, the source transaction log may contain uncommitted data. Amazon DMS reads the incoming changes from the transaction log but forwards only committed changes to the target, which results in source latency.

CDCLatencyTarget

CDCLatencyTarget is the gap (in seconds) between the timestamp of the first event waiting to be committed on the target and the current system timestamp of the Amazon DMS replication instance. Target latency arises when there are transactions that the target has not yet processed. If all transactions are processed in time, the target latency equals the source latency. The target latency should never be lower than the source latency.

The target latency is higher than the source latency because it represents the total latency of a record, from the time it is inserted in the source database until the corresponding row is committed on the target.
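
One way to read the two definitions together is to treat the difference between the metrics as the share of the lag added on the target side. The helper below is only a sketch of that arithmetic; the name `split_latency` and the interpretation of the difference as "apply lag" are assumptions for illustration.

```python
def split_latency(source_latency, target_latency):
    """Return (capture_lag, apply_lag) in seconds.

    capture_lag is CDCLatencySource as reported.
    apply_lag is the extra time spent before the commit on the target,
    clamped at zero because the target latency should not be lower
    than the source latency.
    """
    apply_lag = max(target_latency - source_latency, 0.0)
    return source_latency, apply_lag
```

For example, with a source latency of 12 seconds and a target latency of 45 seconds, `split_latency(12, 45)` attributes 33 seconds to the target side.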

CDCChangesDiskSource

CDCChangesDiskSource is the total number of rows accumulated on disk that are waiting to be committed from the source.

All rows counted in CDCChangesDiskSource were once in memory, but were evicted to disk because they reached the maximum time they are allowed to reside in memory. The goal is to understand how the engine works internally and to keep CDCChangesDiskSource as low as possible through task settings. For example, you can experiment with the values of MemoryLimitTotal and MemoryKeepTime. For more details, see Debugging Your Amazon DMS Migrations: What to Do When Things Go Wrong (Part 2).
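
A minimal sketch of how these two settings could be changed with boto3 follows. It assumes that both options live in the `ChangeProcessingTuning` section of the task settings, that the task is stopped before it is modified, and that a partial settings document is merged with the existing settings; verify each of these against your own task before relying on it. The helper names and any values you pass in are illustrative.

```python
import json


def build_memory_tuning(memory_limit_total_mb, memory_keep_time_s):
    """Return a task-settings JSON fragment with the two memory options."""
    settings = {
        "ChangeProcessingTuning": {
            # Memory that transactions may use before spilling to disk (MB).
            "MemoryLimitTotal": memory_limit_total_mb,
            # Time a transaction may stay in memory before spilling (seconds).
            "MemoryKeepTime": memory_keep_time_s,
        }
    }
    return json.dumps(settings)


def apply_memory_tuning(task_arn, memory_limit_total_mb, memory_keep_time_s):
    """Apply the settings to a stopped task. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    dms = boto3.client("dms")
    return dms.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=build_memory_tuning(
            memory_limit_total_mb, memory_keep_time_s),
    )
```

Raising either value trades replication instance memory for fewer rows written to disk, so watch the instance's memory metrics after a change.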

CDCChangesDiskTarget

CDCChangesDiskTarget is the total number of rows accumulated on disk that are waiting to be committed to the target.

Ideally these rows are processed in memory. If the value of CDCChangesDiskTarget keeps increasing, it can point to one of two problems: the memory on the replication instance is overused, or the target database instance cannot keep up with the changes sent by Amazon DMS.

Create CloudWatch alarms

The following CloudFormation stack creates CloudWatch alarms for your Amazon DMS tasks:
https://console.aws.amazon.co...

Provide the following information to the CloudFormation stack:

  1. Stack name
  2. Amazon DMS task identifier
  3. Amazon DMS replication instance name
  4. SNS topic ARN

All other settings can be left as default.

Create Amazon DMS event subscriptions

When a specific event occurs in a replication task or replication instance (for example, when an instance is created or deleted), you can receive the corresponding notification through the created Amazon DMS event subscription.

For replication tasks, create subscriptions for the following events:

  1. Configuration change
  2. Creation
  3. Deletion
  4. Failure
  5. State change

For replication instances, create subscriptions for the following events:

  1. Configuration change
  2. Creation
  3. Deletion
  4. Failover
  5. Failure
  6. Low storage
  7. Maintenance

After you create an event subscription, Amazon DMS sends event notifications to the addresses you provide. You may want to create several subscriptions, for example one that receives all event notifications and another that covers only critical events for Amazon DMS resources in the production environment.

You can turn off notifications without deleting a subscription by setting the Enabled option to No in the Amazon DMS console, or by setting the Enabled parameter to false through the Amazon DMS API. For more details, see Working with events and notifications in Amazon Database Migration Service.
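
For readers who prefer the API over the console, the following sketch shows how such subscriptions might be created and disabled with boto3. The category strings are assumptions; confirm them with the `describe_event_categories` call of the DMS client before use. The helper names are hypothetical.

```python
# Assumed category strings; verify with dms.describe_event_categories().
TASK_EVENT_CATEGORIES = [
    "configuration change", "creation", "deletion", "failure", "state change",
]
INSTANCE_EVENT_CATEGORIES = [
    "configuration change", "creation", "deletion", "failover",
    "failure", "low storage", "maintenance",
]


def build_subscription_params(name, topic_arn, source_type, source_ids,
                              enabled=True):
    """Return keyword arguments for the DMS create_event_subscription call."""
    if source_type == "replication-task":
        categories = TASK_EVENT_CATEGORIES
    elif source_type == "replication-instance":
        categories = INSTANCE_EVENT_CATEGORIES
    else:
        raise ValueError("unsupported source type: {0}".format(source_type))
    return {
        "SubscriptionName": name,
        "SnsTopicArn": topic_arn,
        "SourceType": source_type,
        "EventCategories": list(categories),
        "SourceIds": list(source_ids),
        "Enabled": enabled,
    }


def create_subscription(params):
    """Create the subscription. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    return boto3.client("dms").create_event_subscription(**params)


def disable_subscription(name):
    """Stop notifications while keeping the subscription itself."""
    import boto3

    return boto3.client("dms").modify_event_subscription(
        SubscriptionName=name, Enabled=False)
```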

The following CloudFormation stack will create event subscriptions for your Amazon DMS tasks:
https://console.aws.amazon.co...

The following information is required for the stack:

  1. Stack name
  2. Amazon DMS task name
  3. SNS topic ARN

All other settings can be left as default.

Create SNS notifications for error messages in CloudWatch logs

Amazon DMS can publish detailed task information to CloudWatch Logs. You can use this information to monitor the health of a task as it runs and to diagnose any problems that occur.

By default, logs are stored in the log group dms-tasks-<replication instance name>, in the log stream dms-task-<task ID>. For more details, see Logging task settings.

To be notified of error messages in CloudWatch Logs, create a subscription filter on the log group. The filter forwards matching log events to a Lambda function, which publishes them to SNS. The function is implemented by the following Python script:

import base64
import json
import os
import zlib

import boto3


def logstream_handler(event, context):
    """Forward log events matched by a subscription filter to an SNS topic."""
    # CloudWatch Logs delivers the payload base64-encoded and gzip-compressed.
    bstream_data = event["awslogs"]["data"]
    decoded_data = json.loads(
        zlib.decompress(base64.b64decode(bstream_data), 16 + zlib.MAX_WBITS))
    client = boto3.client('sns')
    subscription_filters = decoded_data.get("subscriptionFilters")
    subject = ""
    if subscription_filters:
        # SNS rejects subjects longer than 100 characters, so truncate.
        subject = "Log Filter Alert : {0}".format(subscription_filters[0])[:100]
    # Message layout: log group and stream first, then one line per log event.
    msg = "logGroup : {0}\nlogStream : {1}".format(
        decoded_data.get("logGroup"),
        decoded_data.get("logStream"))
    msg = "{0}\n\nMessages: \n".format(msg)
    for m in decoded_data.get("logEvents", []):
        msg = "{0}\n{1}".format(msg, m.get("message"))
    # The topic ARN is passed in through the function's environment.
    args = {
        "TargetArn": os.environ.get("topicARN"),
        "Message": msg,
    }
    if subject:
        args["Subject"] = subject
    client.publish(**args)
    return {
        "statusCode": 200,
        "body": json.dumps('Sent Message.')
    }
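
To see what the handler above receives, the payload format can be simulated locally. The sketch below wraps a sample document the way CloudWatch Logs delivers it (JSON, gzip-compressed, then base64-encoded) and decodes it again. The sample values and the helper names are made up for illustration.

```python
import base64
import gzip
import json


def encode_logs_event(payload):
    """Wrap a dict the way CloudWatch Logs delivers it to Lambda."""
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(compressed).decode("ascii")}}


def decode_logs_event(event):
    """Reverse of encode_logs_event; mirrors the handler's decoding step."""
    raw = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(raw))


# Made-up sample with the fields the handler reads.
sample = {
    "logGroup": "example-log-group",
    "logStream": "example-log-stream",
    "subscriptionFilters": ["example-filter"],
    "logEvents": [{"id": "1", "timestamp": 0, "message": "example error line"}],
}

# The round trip returns the original document unchanged.
assert decode_logs_event(encode_logs_event(sample)) == sample
```

An event built with `encode_logs_event` can be passed to the handler in a local test, provided the SNS client is replaced with a stub.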

The following CloudFormation stack creates the environment for sending SNS notifications for error log messages:
https://console.aws.amazon.co...

Provide the following information for the stack:

  1. Stack name
  2. Log group name
  3. SNS topic ARN
  4. Filter pattern

All other settings can be left as default.
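
The subscription filter itself could also be created with boto3, roughly as sketched below. The default pattern `"]E:"` is an assumption about how error lines are marked in the task log; check a few lines of your own log before using it. The helper names and the filter name are hypothetical, and the sketch does not grant CloudWatch Logs permission to invoke the Lambda function, which is also required.

```python
def build_filter_params(log_group, filter_name, pattern, lambda_arn):
    """Return keyword arguments for the CloudWatch Logs put_subscription_filter call."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "destinationArn": lambda_arn,
    }


def create_error_filter(log_group, lambda_arn, pattern='"]E:"'):
    """Attach the filter to the log group. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    logs = boto3.client("logs")
    return logs.put_subscription_filter(
        **build_filter_params(log_group, "dms-error-filter", pattern, lambda_arn))
```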

Create a Lambda function to send repeated SNS notifications for CloudWatch alarms

A CloudWatch alarm notifies you only when its state changes. If an alarm stays active for a long time, you may miss the one notification that was sent. To receive repeated notifications, we provide a Lambda function that checks the current state of the alarms and sends a notification for every alarm that is still in the specified state.

The following Python script sends the SNS notifications and is invoked by a CloudWatch Events rule:

import json
import boto3
import os

cloudwatch = boto3.client('cloudwatch')
sns = boto3.client('sns')

subject_str = '{}: "{}" in {}'
message_str = """You are receiving this email because your Amazon CloudWatch Alarm "{}" in the {} region has entered the {} state, because "{}".

Alarm Details :
    - Name: {}
    - Description: {}
    - Reason for State Change: {}


Monitored Metric:
    - MetricNamespace: {}
    - MetricName: {}
    - Dimensions: {}
    - Period: {}
    - Statistic: {}
    - Unit: {}
    - TreatMissingData: {}
"""

def send_alarm(topic, subject, message):
    """ Sends SNS Notification to given topic """
    response = sns.publish(
                    TopicArn=topic,
                    Message=message,
                    Subject=subject
                )
    print("Alarm Sent Subject : {}".format(subject))
    return

def main_handler(event, context):
    """
        Describes the given alarms in the current region and checks their state.
        If an alarm's state matches the alarmState passed in the event, sends a notification.

        Parameters
        ----------
        alarmNames - ['string']
        alarmState - string (Alarm/OK/INSUFFICIENT_DATA)
    """
    alarm_names = event['alarmNames']
    alarm_state = event.get('alarmState', 'Alarm').lower()
    region = os.environ["AWS_REGION"]
    response = cloudwatch.describe_alarms(
        AlarmNames=alarm_names
        )
    metric_alarms = response["MetricAlarms"]
    if len(metric_alarms) == 0:
        return {
            'statusCode': 200,
            'body': json.dumps('No Alarms Configured')
        }
    for alarm in metric_alarms:
        if alarm["StateValue"].lower() != alarm_state:
            continue
        # Pick the notification targets configured for the requested state.
        actions_key = {
            'alarm': 'AlarmActions',
            'ok': 'OKActions',
            'insufficient_data': 'InsufficientDataActions',
        }.get(alarm_state)
        topics = alarm[actions_key] if actions_key else []
        if len(topics) == 0:
            print('No Topics Configured for state %s to %s' %(alarm_state, alarm['AlarmName']))
            continue
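        # SNS rejects subjects longer than 100 characters, so truncate.
        subject = subject_str.format(alarm["StateValue"], alarm['AlarmName'], region)[:100]
        # The description and the metric fields can be absent from the
        # describe_alarms response, so fall back to a placeholder.
        missing = 'not specified'
        message = message_str.format(alarm['AlarmName'], region,
                                    alarm['StateValue'], alarm['StateReason'],
                                    alarm['AlarmName'], alarm.get('AlarmDescription', missing),
                                    alarm['StateReason'], alarm.get('Namespace', missing),
                                    alarm.get('MetricName', missing),
                                    str(["{}={}".format(d['Name'], d['Value']) for d in alarm.get("Dimensions", [])]),
                                    alarm.get('Period', missing), alarm.get('Statistic', missing),
                                    alarm.get('Unit', missing), alarm.get('TreatMissingData', missing))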
        for topic in topics:
            send_alarm(topic, subject, message)
    return {
        'statusCode': 200,
        'body': json.dumps('Success')
    }

The following CloudFormation stack creates the environment for sending repeated alarm notifications:
https://console.aws.amazon.co...

Provide the stack with the following information:

  1. Stack name
  2. Amazon DMS task name
  3. SNS topic ARN

All other settings can be left as default.
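
For orientation, the scheduling part could be set up with boto3 along the following lines. The event shape matches what `main_handler` reads (`alarmNames` and `alarmState`). The one-hour rate, the target ID, and the helper names are assumptions for this sketch, and the Lambda function additionally needs a permission that allows CloudWatch Events to invoke it.

```python
import json


def build_rule_input(alarm_names, alarm_state="Alarm"):
    """Return the JSON event that main_handler expects."""
    return json.dumps({"alarmNames": list(alarm_names),
                       "alarmState": alarm_state})


def schedule_alarm_check(rule_name, lambda_arn, alarm_names,
                         rate="rate(1 hour)"):
    """Create a scheduled rule that invokes the Lambda function.

    Needs boto3 and AWS credentials.
    """
    import boto3  # imported here so the builder above works without boto3

    events = boto3.client("events")
    events.put_rule(Name=rule_name, ScheduleExpression=rate, State="ENABLED")
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "alarm-check",  # hypothetical target ID
            "Arn": lambda_arn,
            "Input": build_rule_input(alarm_names),
        }],
    )
```

The schedule rate determines how often a reminder is sent while an alarm stays in the same state, so choose it to match how quickly you need to react.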

Summary

In this article, we showed how to use CloudWatch, Amazon DMS event subscriptions, Amazon SNS, and Amazon Lambda to automate monitoring and alerting for Amazon DMS replication tasks.

With this solution, you can track the status of replication tasks without using the console. You are notified of each change event, and you also receive an alert when an error occurs.

We hope this article helps you understand how to monitor an Amazon DMS database migration.

Author of this article


Venkata Naveen Koppula
Amazon cloud technology
Professional services assistant consultant

He focuses on projects such as Amazon DMS, SCT and Aurora PostgreSQL, and is committed to providing customers with the best experience.


Vijaya Diddi
Amazon cloud technology
Professional services assistant consultant

She focuses on Amazon DMS, SCT, Amazon Config, and SSM Documents. She enjoys building automation tools in Python that make migrations easier to carry out.
