Exporting to Amazon S3 Tables with AWS Step Functions Distributed Map

This repository showcases a serverless solution that processes PDF forms using AWS Step Functions Distributed Map, extracts data with Amazon Textract, and stores results in Amazon S3 Tables in Iceberg format.

Overview

The repository demonstrates how to:

Process PDF forms at scale using Step Functions Distributed Map
Extract structured data from PDFs using Amazon Textract
Store data in Amazon S3 Tables (Iceberg format) via Amazon Data Firehose
Schedule automated processing with Amazon EventBridge Scheduler
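
To make the first two points concrete, the sketch below builds an illustrative Distributed Map state as a Python dictionary and prints it as Amazon States Language JSON. It is not taken from this repository's template.yaml: the bucket name, prefix, state name, and concurrency limit are placeholders chosen for illustration, and the child workflow is reduced to a single Textract call.

```python
import json

# Hypothetical placeholders; the real values come from the deployed stack.
SOURCE_BUCKET = "example-source-bucket"
SOURCE_PREFIX = "RawInterestForms/"


def build_distributed_map_state(bucket: str, prefix: str) -> dict:
    """Return an illustrative Distributed Map state (JSONPath query language)."""
    return {
        "Type": "Map",
        # ItemReader lists the objects in S3; each listed object becomes one item.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": bucket, "Prefix": prefix},
        },
        "MaxConcurrency": 10,  # assumed limit, not a value from the repository
        "ItemProcessor": {
            # DISTRIBUTED mode runs each item as its own child workflow execution.
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "ExtractForm",
            "States": {
                "ExtractForm": {
                    "Type": "Task",
                    # AWS SDK service integration for Textract AnalyzeDocument.
                    "Resource": "arn:aws:states:::aws-sdk:textract:analyzeDocument",
                    "Parameters": {
                        "Document": {
                            "S3Object": {"Bucket": bucket, "Name.$": "$.Key"}
                        },
                        "FeatureTypes": ["FORMS"],
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }


state = build_distributed_map_state(SOURCE_BUCKET, SOURCE_PREFIX)
print(json.dumps(state, indent=2))
```

The Firehose step and any filtering by upload time are left out here to keep the fragment short; the deployed workflow may structure these differently.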

Warning: This application is not ready for production use. It was written for demonstration and educational purposes. Review the Security section of this README and consult with your security team before deploying this stack. No warranty is implied in this example.

Note: This architecture creates resources that have costs associated with them. Please see the AWS Pricing page for details and make sure to understand the costs before deploying this stack.

How the application works

The solution comprises the following steps:

A user uploads customer interest forms as scanned PDFs to an Amazon S3 bucket.
An Amazon EventBridge Scheduler schedule triggers at a regular interval, starting a Step Functions workflow execution.
The workflow execution activates a Distributed Map State, which lists all PDF files uploaded to Amazon S3 since the previous run.
The Distributed Map state iterates over the list of objects and passes each object's metadata (Bucket, Key, Size, ETag) to a child workflow execution.
For each object, the child workflow calls Amazon Textract with the provided Bucket and Key to extract raw text and relevant fields (name, email address, mailing address, interest area) from the PDF.
The child workflow writes the extracted data to an Amazon Data Firehose stream, which is configured to forward data to Amazon S3 Tables.
The Firehose stream batches the incoming data from the child workflow and writes it to Amazon S3 Tables at a preconfigured time interval.
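
The extraction and write steps can be illustrated in plain Python. The sketch below assumes a Textract AnalyzeDocument response produced with the FORMS feature, pairs each KEY block with its VALUE block, and wraps the result as a Firehose record. The function names and the `source_bucket` and `source_key` columns are hypothetical; the repository's workflow may call Textract and Firehose directly from Step Functions rather than through code like this.

```python
import json


def _text_of(block: dict, blocks_by_id: dict) -> str:
    """Join the text of the WORD blocks that are children of the given block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_form_fields(response: dict) -> dict:
    """Map each form key found by Textract to the text of its value."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A KEY block points at its VALUE block(s) by Id.
                value_text = " ".join(
                    _text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        # Normalize labels such as "Name:" to "Name".
        key_text = _text_of(block, blocks_by_id).strip().rstrip(":").strip()
        fields[key_text] = value_text
    return fields


def to_firehose_record(fields: dict, bucket: str, key: str) -> dict:
    """Build the Record argument for Firehose PutRecord (Data must be bytes)."""
    # source_bucket and source_key are assumed column names for illustration.
    payload = {"source_bucket": bucket, "source_key": key, **fields}
    return {"Data": json.dumps(payload).encode("utf-8")}
```

With boto3, a record built this way could be passed as the `Record` argument of `put_record` on a Firehose client. The keys of the JSON payload would need to match the columns of the destination table.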

Deployment instructions

Prerequisites

AWS CLI configured with appropriate permissions
AWS SAM CLI installed

Deploy the application with AWS SAM

Clone the repository

git clone https://github.com/aws-samples/sample-exporting-to-amazon-s3-tables-with-aws-step-functions-distributed-map.git
cd sample-exporting-to-amazon-s3-tables-with-aws-step-functions-distributed-map

Deploy the stack
```
sam build
sam deploy --guided
```
Upload test PDFs

Upload PDF forms to your S3 bucket under the path: RawInterestForms/YYYY/WW/

Trigger processing

Start an execution of the Step Functions state machine manually, or wait for the next scheduled run.
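
For the upload step, a small helper can compute the destination prefix from a date. This is a sketch under an assumption the README does not confirm: that YYYY is the ISO year and WW is the zero-padded ISO week number. The function name is hypothetical; check template.yaml for the convention the workflow actually expects.

```python
from datetime import date


def interest_form_prefix(day: date) -> str:
    """Build the upload prefix, assuming YYYY/WW means ISO year and ISO week."""
    iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"RawInterestForms/{iso[0]:04d}/{iso[1]:02d}/"


print(interest_form_prefix(date(2025, 1, 6)))  # RawInterestForms/2025/02/
```

Note that the ISO year can differ from the calendar year near year boundaries: 30 December 2024 falls in ISO week 1 of 2025.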

Clean up

To avoid ongoing charges, delete the stack and associated resources:

sam delete

Manual cleanup required:

S3 Tables bucket and data (if not empty)
CloudWatch log groups (if retention is set)
Any uploaded PDF files in the source bucket

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

DISCLAIMER

The solution architecture sample code is provided without any guarantees, and we do not recommend using it for production-grade workloads. It is intended as content to build on and learn from. Be sure to read the licensing terms.
