Tesseract Studio (TessStudio) is a Windows application for creating, reviewing and correcting OCR data and for producing searchable PDF files.
It provides a .NET graphical user interface for the open-source Tesseract OCR library, a modern neural-network-based optical character recognition engine that is actively developed and supported by Google engineers and other contributors.
TessStudio is designed for ad-hoc OCR and processes one file at a time. Two licenses are available, Community and Registered:
Feature | Community | Registered |
---|---|---|
Support TIFF, JPEG, PNG images and multi-page PDF files, with or without prior OCR data. | ||
For multi-page files, multiple instances of the Tesseract engine run in parallel for improved performance. | ||
Support multiple OCR languages, including complex documents that mix several languages. | ||
The built-in spell checker automatically tags words not found in the dictionary. | ||
Display OCR words on a faded background of the image with visible boundaries. | ||
Edit OCR mistakes, add missing words, and split, merge, delete or move recognized words. | ||
Support any number of Undo and Redo operations. | ||
Display a sortable list of recognized words with Tesseract-assigned confidence values. | ||
Preserve existing non-OCR text in PDF pages and limit OCR to embedded graphical objects. | ||
Save the OCR data as text hidden behind images in searchable PDF format. | ||
Apply OCR to a single page, specific pages, or all pages of multi-page source documents. | ||
Pick a Tesseract page segmentation mode for specialized layout analysis. | ||
Support fixed-threshold or dynamic algorithms for conversion to a binary (black-and-white) image. | ||
Perform image processing and cleanup (deskew, despeckle, grid and line removal, correct inverted blocks). | ||
Enable a debugging option that captures intermediate images and full recognition data. | ||
Define regular expression rules for post-processing OCR data. | ||
Save entire document or specified pages from the document as a new PDF. | ||
Save as Vector PDF, Searchable PDF, Raster Image or Text Only PDF. Use specific image formats, resolutions or fonts. | ||
Save as an encrypted PDF or in a PDF/A compliant format. |
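
To illustrate what regular expression rules for post-processing OCR data can look like, here is a minimal Python sketch. It does not show TessStudio's own rule syntax, which is not described here. The `RULES` list and the `apply_rules` function are hypothetical names chosen for this example, and the patterns are only typical fixes for common OCR confusions.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs applied in order.
# Illustrative only; this is not TessStudio's actual rule format.
RULES = [
    (re.compile(r"(?<=\d)[lI](?=\d)"), "1"),  # letter l or I between digits -> digit 1
    (re.compile(r"(?<=\d)O(?=\d)"), "0"),     # capital O between digits -> digit 0
    (re.compile(r"\s+([,.;:])"), r"\1"),      # remove whitespace before punctuation
]


def apply_rules(text, rules=RULES):
    """Apply each regex rule to the text, in order, and return the result."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


cleaned = apply_rules("Invoice 2O21 , total 1l5 .")
print(cleaned)
```

Rules run in sequence, so a later rule sees the output of earlier ones. Narrow patterns, such as the lookarounds that require a digit on both sides, help avoid changing correctly recognized words.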