Amazon Textract


Amazon Textract converts scanned documents to text, including text in tables and handwritten text. Extracted text is returned with coordinates that identify a box-shaped area on the document. This allows for auditing later, since the text can be traced back to a specific area in a specific document. The extracted text is also returned with a score that indicates how confident Textract is in the result. This gives you the option to reject automatic processing of text extracted with a low level of confidence.
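The confidence check described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. It assumes a response shaped like the one returned by the Textract DetectDocumentText API: a "Blocks" list whose LINE entries carry "Text", a "Confidence" score from 0 to 100, and a "Geometry" bounding box. The function name split_by_confidence, the 90.0 cutoff, and the sample data are all made up for this example.

```python
from typing import Any

# Hypothetical cutoff; choose a value that suits your own documents.
MIN_CONFIDENCE = 90.0


def split_by_confidence(response: dict[str, Any],
                        min_confidence: float = MIN_CONFIDENCE):
    """Split LINE blocks into (accepted, needs_review) lists."""
    accepted, needs_review = [], []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE, WORD and other block types
        box = block["Geometry"]["BoundingBox"]
        line = {
            "text": block["Text"],
            "confidence": block["Confidence"],
            # Left/Top/Width/Height are ratios of the page size (0 to 1),
            # which is what lets the text be traced back to an area.
            "box": (box["Left"], box["Top"], box["Width"], box["Height"]),
        }
        if line["confidence"] >= min_confidence:
            accepted.append(line)
        else:
            needs_review.append(line)
    return accepted, needs_review


# Hand-written sample in the shape of a Textract response (not real output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {
            "BlockType": "LINE",
            "Text": "Invoice 1234",
            "Confidence": 99.2,
            "Geometry": {"BoundingBox": {
                "Left": 0.10, "Top": 0.05, "Width": 0.30, "Height": 0.04}},
        },
        {
            "BlockType": "LINE",
            "Text": "Tota1 due",
            "Confidence": 61.5,
            "Geometry": {"BoundingBox": {
                "Left": 0.10, "Top": 0.80, "Width": 0.25, "Height": 0.04}},
        },
    ]
}

accepted, needs_review = split_by_confidence(sample_response)
for line in needs_review:
    print(f"Review needed: {line['text']!r} at {line['box']}")
```

In a real workflow the response would come from a Textract call (for example, the boto3 Textract client's detect_document_text method) rather than from a hand-written dictionary, and the lines in needs_review would be routed to a person for checking.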

Key features

  • Optical Character Recognition (OCR)
  • Form Extraction
  • Table Extraction
  • Handwriting Recognition
  • Built-in Human Review Workflow

Video: Amazon Textract – Extracting text, tables and forms from documents

An 8.34 minute video by Julien Simon that shows the capability of Textract to process some complex documents.

Use cases

The most common use cases for Amazon Textract include:

  • Import Documents and Forms into Business Applications
  • Create Smart Search Indexes 
  • Build Automated Document Processing Workflows
  • Maintain Compliance in Document Archives
  • Extract Text for Natural Language Processing (NLP)
  • Text Extraction for Document Classification


