How to Use AWS Textract to Extract Text from Scanned Documents in S3 Buckets

A Quick Start Guide for Amazon’s New OCR Service that Uses Python SDK Boto3

AWS made Textract generally available on May 29, 2019.

This article demonstrates how to use AWS Textract to extract text from scanned documents in an S3 bucket.

This goes beyond Amazon’s documentation, whose examples involve only a single image. This post includes a sample code snippet that uses Boto3, the AWS SDK for Python, to help you get started quickly.

Definitions

Amazon Textract is a service that automatically extracts text and data from scanned documents.
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

Code

https://medium.com/media/9852f797e51614e2de63146246da0ef7/href
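For readers who cannot load the embedded snippet, here is a minimal sketch of the same idea. It is not the original code. It assumes the scanned documents are PNG or JPEG objects in a single bucket, that AWS credentials are already configured, and that the synchronous detect_document_text call is enough for them (multi-page PDFs need the asynchronous start_document_text_detection API instead). The function names and the SUPPORTED_SUFFIXES filter are hypothetical, chosen for illustration.

```python
# Sketch only: helper names and the suffix filter are illustrative assumptions,
# not part of Boto3 or of the original article's snippet.
SUPPORTED_SUFFIXES = (".png", ".jpg", ".jpeg")


def iter_image_keys(s3_client, bucket, prefix=""):
    """Yield the keys of image objects in the bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # An empty bucket or prefix returns pages without a "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(SUPPORTED_SUFFIXES):
                yield obj["Key"]


def extract_lines(textract_client, bucket, key):
    """Run synchronous text detection on one S3 object and return its lines."""
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Textract returns PAGE, LINE and WORD blocks; keep only whole lines.
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]


def extract_text_from_bucket(bucket, prefix="", s3_client=None, textract_client=None):
    """Return a dict mapping each image key in the bucket to its detected lines."""
    if s3_client is None or textract_client is None:
        import boto3  # imported lazily so the helpers work with injected clients

        s3_client = s3_client or boto3.client("s3")
        textract_client = textract_client or boto3.client("textract")
    return {
        key: extract_lines(textract_client, bucket, key)
        for key in iter_image_keys(s3_client, bucket, prefix)
    }
```

Calling extract_text_from_bucket("my-scanned-docs"), where the bucket name is a placeholder, would return a dictionary keyed by object name. Passing the clients in as parameters is a design choice that lets you try the logic with stand-in objects before pointing it at a real bucket.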

Closing

Textract is an amazing OCR (optical character recognition) tool. It can save your team countless hours by automating the tedious and error-prone task of manual data entry.

Thanks for reading — and please follow me here on Medium for more interesting software engineering articles!

P.S. We’re hiring! Explore our current openings at https://studios.panya.me/

How to Use AWS Textract to Extract Text from Scanned Documents in S3 Buckets was originally published in Hacker Noon on Medium.
