Pull structured data from PDFs — invoices, receipts, contracts, reports — into typed JSON. Better than commercial OCR for common formats.
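As a rough illustration of what "typed JSON" could mean here, the sketch below maps already-extracted invoice text onto a typed record and serializes it. Everything in it is an assumption made for illustration: the `Invoice` schema, the `parse_invoice` helper, the regular expressions, and the sample text are hypothetical and do not describe this tool's actual API or output format. The PDF text extraction step itself is deliberately left out.

```python
import json
import re
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Invoice:
    # Hypothetical schema; field names are illustrative only.
    invoice_number: Optional[str]
    total: Optional[float]
    currency: Optional[str]


def parse_invoice(text: str) -> Invoice:
    """Map already-extracted PDF text onto the typed Invoice record."""
    # Matches e.g. "Invoice No: INV-0042" or "Invoice #INV-0042".
    number = re.search(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", text, re.IGNORECASE)
    # Matches e.g. "Total: USD 1,250.00" (three-letter currency code, two decimals).
    total = re.search(r"Total\s*:?\s*([A-Z]{3})\s*([\d,]+\.\d{2})", text)
    return Invoice(
        invoice_number=number.group(1) if number else None,
        total=float(total.group(2).replace(",", "")) if total else None,
        currency=total.group(1) if total else None,
    )


# Made-up sample text standing in for the output of a PDF text extractor.
sample_text = "Invoice No: INV-0042\nTotal: USD 1,250.00"
invoice = parse_invoice(sample_text)
print(json.dumps(asdict(invoice), indent=2))
```

Fields that cannot be found come back as `None` rather than a guessed value, so downstream code can tell "missing" apart from "zero".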
From Wikipedia
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1993 and used to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, a PDF file encapsulates a complete description of a fixed-layout document, including the text, fonts, vector graphics, raster images and other information needed to display it.