Achieving idempotency in the AWS serverless space

Recently I experienced something you usually only read about in resources on cloud and distributed systems: everything fails, and you need to plan for it. This doesn’t just mean infrastructure failure but also software failure. If you have ever gone through the documentation of AWS messaging services like SNS and SQS, I am pretty sure you’ve read that they ‘promise’ at-least-once message delivery and don’t guarantee exactly-once. Basically, this means your job as a developer is to guarantee that safety yourself, because even AWS resources can fail.

It all comes back to “everything fails and you need to plan for it”: I recently encountered an issue where idempotency was the key to solving it. Let me first try to define idempotency from my own understanding and learning: a functionality is idempotent when invoking it multiple times with the same input produces the same result as invoking it once, with no additional side effects. In my case that property was missing: a message holding the same values resulted in multiple documents.
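To make the definition concrete, here is a minimal sketch (deliberately not tied to AWS, with made-up names) contrasting a non-idempotent append with an idempotent keyed write:

```typescript
// Non-idempotent: each call appends, so a replayed call changes the result.
const appendLog: string[] = [];
function recordAppend(msg: string): void {
  appendLog.push(msg);
}

// Idempotent: a keyed write overwrites, so a replayed call has no extra effect.
const store = new Map<string, string>();
function recordPut(key: string, msg: string): void {
  store.set(key, msg);
}

recordAppend('order-42');
recordAppend('order-42'); // duplicate delivery

recordPut('order-42', 'processed');
recordPut('order-42', 'processed'); // duplicate delivery

console.log(appendLog.length); // 2 — the duplicate produced a second entry
console.log(store.size);       // 1 — the duplicate had no additional effect
```

The duplicate message in my case behaved like `recordAppend`: each delivery created a new document instead of converging on one.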

How I experienced it

A Lambda function, scheduled to run once a day, queries a bunch of data from DynamoDB and publishes a message to an SNS topic. Another Lambda function subscribed to that topic processes each message, and at the end of processing exactly one document should be stored in the table based on the message data it received.

The implementation here failed me because I didn’t keep a couple of things in mind that revolve around “things fail all the time”: a message can, in rare situations, be received more than once, and I had modeled the document without ensuring uniqueness or achieving idempotency for this business process. The unique document that should have been stored in the database once was stored twice.

Figure 1: One event record results in two documents “of the same kind” stored in the database

A step towards idempotency

FIFO Topics & Queues

Distributed systems (like SNS) and client applications sometimes generate duplicate messages. You can avoid duplicated message deliveries from the topic in two ways: either by enabling content-based deduplication on the topic, or by adding a deduplication ID to the messages that you publish. With message content-based deduplication, SNS uses a SHA-256 hash to generate the message deduplication ID using the body of the message. After a message with a specific deduplication ID is published successfully, there is a 5-minute interval during which any message with the same deduplication ID is accepted but not delivered. If you subscribe a FIFO queue to a FIFO topic, the deduplication ID is passed to the queue and it is used by SQS to avoid duplicate messages being received.

Introducing Amazon SNS FIFO — First-In-First-Out Pub/Sub Messaging
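As a rough sketch of what content-based deduplication does, hashing the message body with SHA-256 maps identical payloads to the same deduplication ID (the exact internal format SNS uses is an implementation detail, so treat this as an illustration only):

```typescript
import { createHash } from 'node:crypto';

// Illustrative only: derive a deduplication ID from the message body the way
// content-based deduplication does — identical bodies hash to the same ID.
function dedupIdFor(body: string): string {
  return createHash('sha256').update(body).digest('hex');
}

const a = dedupIdFor(JSON.stringify({ x: 'x_unique_value', y: 'y_unique_value' }));
const b = dedupIdFor(JSON.stringify({ x: 'x_unique_value', y: 'y_unique_value' }));
const c = dedupIdFor(JSON.stringify({ x: 'other_value', y: 'y_unique_value' }));

console.log(a === b); // true  — identical bodies collapse to one dedup ID
console.log(a === c); // false — any change in content yields a new dedup ID
```

This is why publishing the exact same payload twice within the interval counts as a duplicate, while any change to the body makes it a new message.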

To achieve exactly-once message delivery, some conditions must be met:

  • An SQS FIFO queue is subscribed to the SNS FIFO topic
  • The consumer of the SQS queue processes the messages and deletes them before the visibility timeout expires
  • There is no message filtering on the SNS subscription
  • There are no network disruptions that could prevent the message-receipt acknowledgment

All of these conditions can be met through configuration, except for the last one, “there are no network disruptions”, which we will take care of in the next section. With some adjustments to our architecture, it should look like Figure 2.

Figure 2: Adding both a FIFO SNS topic & SQS in front of the lambda to handle message deduplication

In the code snippet below, we pass a value to the MessageDeduplicationId property that is composed of two values whose combination is always unique.

import SNS from 'aws-sdk/clients/sns';

const sns = new SNS({ region: process.env.AWS_REGION });

const xValue = 'x_unique_value';
const yValue = 'y_unique_value';

await sns
  .publish({
    TopicArn: process.env.MY_FIFO_TOPIC_ARN,
    Message: JSON.stringify({ x: xValue, y: yValue }),
    // Combining the two unique values yields a unique deduplication ID
    MessageDeduplicationId: `${xValue}#${yValue}`,
    // A message group ID is required when publishing to a FIFO topic
    MessageGroupId: 'my-message-group',
  })
  .promise();

Using the Serverless Framework, the IaC should look something like the following:

MyFifoSNSTopic:
  Type: AWS::SNS::Topic
  Properties:
    FifoTopic: true
    ContentBasedDeduplication: true
    TopicName: MyFifoSNSTopic.fifo

MyFifoSQSQueue:
  Type: AWS::SQS::Queue
  Properties:
    FifoQueue: true
    ContentBasedDeduplication: true
    QueueName: MyFifoSQSQueue.fifo

MySNSTopicSubscription:
  Type: AWS::SNS::Subscription
  Properties:
    RawMessageDelivery: true
    TopicArn:
      Ref: MyFifoSNSTopic
    Protocol: sqs
    Endpoint:
      Fn::GetAtt:
        - MyFifoSQSQueue
        - Arn

MySQSSNSPolicy:
  Type: AWS::SQS::QueuePolicy
  Properties:
    Queues:
      - Ref: MyFifoSQSQueue
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Action: sqs:SendMessage
          Effect: Allow
          Principal:
            Service: sns.amazonaws.com
          Resource:
            - Fn::GetAtt:
                - MyFifoSQSQueue
                - Arn
          Condition:
            ArnEquals:
              aws:SourceArn:
                Ref: MyFifoSNSTopic

There is a caveat though: the deduplication only happens within a 5-minute interval. Simply put, if the same message (same deduplication ID or message content) is sent again more than 5 minutes after the previous one, SNS and SQS won’t deduplicate it for us and will treat it as a unique message that was not sent before. This is tackled in the next section.
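To see why this window matters, here is a toy in-memory model of time-bounded deduplication (all names are my own illustration, not part of any AWS API): duplicates inside the window are dropped, but the same ID arriving after the window expires is treated as a brand-new message.

```typescript
// Toy model of a 5-minute deduplication window — illustrative only.
const WINDOW_MS = 5 * 60 * 1000;

class DedupWindow {
  private seen = new Map<string, number>(); // dedup ID -> last accepted time

  // Returns true if the message would be delivered, false if deduplicated.
  accept(dedupId: string, nowMs: number): boolean {
    const last = this.seen.get(dedupId);
    if (last !== undefined && nowMs - last < WINDOW_MS) {
      return false; // duplicate within the 5-minute interval: dropped
    }
    this.seen.set(dedupId, nowMs);
    return true; // first sighting, or the previous one fell out of the window
  }
}

const w = new DedupWindow();
console.log(w.accept('x#y', 0));              // true  — first delivery
console.log(w.accept('x#y', 60_000));         // false — duplicate 1 minute later
console.log(w.accept('x#y', 6 * 60_000));     // true  — same ID past the window slips through
```

The third call is the failure mode the broker cannot protect us from, which is why we still need a last line of defense at the storage layer.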

DynamoDB Composite Keys

Figure 3: Composite keys on a DynamoDB table having unique values on the partition and sort keys

The nice thing about composite keys is that if anything happens during the publishing and consuming of the message and we receive it more than once, the composite key guarantees idempotency as a last resort when storing the document in the table. In the code snippet below, we pass a ConditionExpression. Strictly speaking this is not required for idempotency, because a duplicate put with the same composite key simply overwrites the existing document with identical data. But by knowing the reason behind the put failure, I am able to decide what the next step in the execution flow should be. In this case, I just want to exit safely so that the message is considered successfully processed and the SQS queue deletes it.

import type { SQSEvent } from 'aws-lambda';
import { DocumentClient } from 'aws-sdk/clients/dynamodb';

const ddbClient = new DocumentClient({ region: process.env.AWS_REGION });

export default async (event: SQSEvent) => {
  for (const record of event.Records) {
    try {
      const { x, y }: { x: string; y: string } = JSON.parse(record.body);
      await ddbClient
        .put({
          TableName: 'my-table',
          // The composite key: x is the partition key, y is the sort key
          Item: {
            id: x,
            sk: y,
          },
          // Fail the put if a document with this composite key already exists
          ConditionExpression: 'attribute_not_exists(#sk)',
          ExpressionAttributeNames: {
            '#sk': 'sk',
          },
        })
        .promise();
    } catch (error) {
      if (error.code === 'ConditionalCheckFailedException') {
        // Don't retry the message if a document with the same composite key
        // already exists in the table; instead let the message be considered
        // successfully processed so that the SQS queue deletes it
        continue;
      }
      throw error;
    }
  }
};

Conclusion

⚠️ This article reflects my own learnings and understanding; I would appreciate any feedback and improvements to the solution described here.
