Tesseract Studio (TessStudio) is a Windows application for creating, reviewing and correcting OCR data and for producing searchable PDF files.

It provides a graphical user interface in the .NET environment for the open source Tesseract OCR library, a modern neural-network-based optical character recognition engine that is actively developed and supported by Google engineers and other contributors.

TessStudio is designed for ad-hoc OCR and processes one file at a time. Two licenses are available for TessStudio: Community and Registered.

Please refer to the features section below for a comparison of the two versions.

TessStudio Features

Community Registered
Support TIFF, JPEG, PNG images and multi-page PDF files, with or without prior OCR data.
For multi-page files, multiple instances of the Tesseract engine run in parallel for improved performance.
Support OCR languages, including complex documents that use multiple languages.
The built-in spell checker automatically tags words not found in the dictionary.
Display OCR words on a faded background of the image with visible boundaries.
Edit OCR mistakes, add missing words, split, merge, delete or move recognized words.
Support any number of Undo and Redo operations.
Display a sortable list of recognized words with Tesseract-assigned confidence factors.
Preserve existing non-OCR text in PDF pages and limit OCR to embedded graphical objects.
Save the OCR data as text hidden behind images in searchable PDF format.
Apply OCR to a single page, specific pages, or all pages of multi-page source documents.
Pick a Tesseract page segmentation mode for specialized layout analysis.
Support fixed-threshold or dynamic algorithms for converting images to binary.
Perform image processing and cleanup (deskew, despeckle, grid and line removal, correct inverted blocks).
Provide a debugging option to capture intermediate images and full recognition data.
Define regular expression rules for post-processing OCR data.
Save the entire document or specified pages from the document as a new PDF.
Save as Vector PDF, Searchable PDF, Raster Image or Text Only PDF. Use specific image formats, resolutions or fonts.
Save as an encrypted PDF or in a PDF/A compliant format.
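
The regular expression post-processing feature listed above can be pictured as an ordered list of pattern and replacement pairs applied to the recognized text. The Python sketch below is illustrative only. It is not TessStudio's rule format or code: the rule table RULES, the function name apply_rules and the sample rules themselves are assumptions chosen to show typical OCR cleanups.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement) pairs, applied in order.
# These rules are examples only, not rules shipped with TessStudio.
RULES = [
    (re.compile(r"(?<=\d)[Oo](?=\d)"), "0"),  # letter O between digits -> zero
    (re.compile(r"(?<=\d)[lI](?=\d)"), "1"),  # l or I between digits -> one
    (re.compile(r"\s+([,.;:])"), r"\1"),      # drop whitespace before punctuation
    (re.compile(r" {2,}"), " "),              # collapse runs of spaces
]


def apply_rules(text, rules=RULES):
    """Apply each rule to the text in sequence and return the cleaned result."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(apply_rules("Invoice 2O21 , total  1l5 ."))
```

Rule order matters in a scheme like this: the character substitutions run first, so the later whitespace rules see the corrected text.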
