Amazon Textract
Overview
Amazon Textract extracts text from scanned documents, including text in tables and handwritten forms. Each piece of extracted text is returned with the coordinates of a bounding box that locates it on the page. This supports later auditing, because the text can be traced back to a specific area of a specific document. Each result also carries a confidence score indicating how certain Textract is of it, which gives you the option to exclude low-confidence text from automatic processing.
- AWS docs: https://aws.amazon.com/textract/
- AWS FAQs: https://aws.amazon.com/textract/faqs/
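The bounding box and confidence score described above can be used roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the response shape of Textract's DetectDocumentText API (a "Blocks" list whose LINE blocks carry "Text", "Confidence" on a 0 to 100 scale, and "Geometry.BoundingBox" as page-relative ratios). The helper names, the threshold of 90, and the sample data are illustrative assumptions, not part of the service.

```python
from typing import Any, Dict, List, Tuple


def detect_text(image_bytes: bytes) -> Dict[str, Any]:
    """Call Textract's synchronous text detection (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper below also works offline

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_bytes})


def split_lines_by_confidence(
    response: Dict[str, Any], threshold: float = 90.0
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split LINE blocks into (accepted, needs_review) by confidence score.

    `threshold` is an assumed cut-off; choose one that suits your documents.
    """
    accepted: List[Dict[str, Any]] = []
    needs_review: List[Dict[str, Any]] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        box = block["Geometry"]["BoundingBox"]
        record = {
            "text": block["Text"],
            "confidence": block["Confidence"],
            # Ratios of page width/height, kept so the text can be traced
            # back to its location on the page.
            "box": (box["Left"], box["Top"], box["Width"], box["Height"]),
        }
        if record["confidence"] >= threshold:
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review


# Hand-written sample in the assumed response shape (not real Textract output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {
            "BlockType": "LINE",
            "Text": "Invoice 1042",
            "Confidence": 99.1,
            "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.05, "Width": 0.3, "Height": 0.02}},
        },
        {
            "BlockType": "LINE",
            "Text": "Tota1 due",
            "Confidence": 62.4,
            "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.80, "Width": 0.2, "Height": 0.02}},
        },
    ]
}

accepted, needs_review = split_lines_by_confidence(sample_response)
```

With the sample above, the first line is accepted and the second is set aside for review. Keeping the bounding box with each record is what makes the audit trail possible.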
Key features
- Optical Character Recognition (OCR)
- Form Extraction
- Table Extraction
- Handwriting Recognition
- Built-in Human Review Workflow
Video: Amazon Textract – Extracting text, tables and forms from documents
Use cases
The most common use cases for Amazon Textract include:
- Import Documents and Forms into Business Applications
- Create Smart Search Indexes
- Build Automated Document Processing Workflows
- Maintain Compliance in Document Archives
- Extract Text for Natural Language Processing (NLP)
- Text Extraction for Document Classification