OCR a PDF — make scans searchable

Drop a scanned PDF and get the same document back — searchable, selectable, and with the text downloadable as .txt. Recognition runs on your device.

Recognizes printed English text. More languages are planned.

Files are processed entirely in your browser. Nothing is uploaded to any server.

Free, private, and actually unlimited.

No daily caps. No upload queue. No spinner that turns into a paywall after the third file.

Private by architecture

Your PDF's contents never leave your device. The editing tools run entirely in your browser, with no upload and no server-side copy, and a Content-Security-Policy blocks any code that tries to send your file elsewhere. Only account and contact actions ever reach our server, and they never carry your file.
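As a rough illustration of how a Content-Security-Policy can confine where page code is allowed to send data, the sketch below assembles a policy string that only permits same-origin requests. Every directive value and the `buildCsp` helper are assumptions made for illustration; this is not the site's actual header.

```typescript
// Hypothetical policy: all values are illustrative assumptions,
// not the header this site really sends.
const directives: Array<[string, string[]]> = [
  ["default-src", ["'self'"]],
  // fetch, XHR and WebSocket connections may only target the page's own origin
  ["connect-src", ["'self'"]],
  // 'wasm-unsafe-eval' allows WebAssembly compilation without enabling eval()
  ["script-src", ["'self'", "'wasm-unsafe-eval'"]],
  // workers may be started from same-origin scripts or blob: URLs
  ["worker-src", ["'self'", "blob:"]],
  // forms may only submit to the page's own origin
  ["form-action", ["'self'"]],
];

// Join each directive with its sources, then join directives with "; ".
function buildCsp(entries: Array<[string, string[]]>): string {
  return entries
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}

const cspHeader = buildCsp(directives);
```

A policy of this shape stops scripts from contacting third-party hosts. Requests to the site's own origin remain possible, which is where the account and contact actions go.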

Truly unlimited

No hourly throttling. No daily or monthly caps. No file-count limit. Edit one PDF or ten thousand — same site, same speed, no nag screen.

No signup, no watermarks

Every tool below works whether or not you have an account, and none asks for an email address. Output PDFs are clean: no stamps, no banners, no preview-mode quality downgrades.

About this tool

A scanned PDF is a stack of photographs: you can read it, but you can't search it, select a sentence, or copy a paragraph into an email. OCR (optical character recognition) fixes that. Drop a scanned PDF here and you get back the same document, with the same pages and the same image quality, visually identical to the original, plus an invisible text layer aligned over the print, so Ctrl+F finds things, text selects naturally, and copy-paste works. You can also download everything the OCR read as a plain .txt file.
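To make "aligned over the print" concrete, here is a minimal sketch of the coordinate conversion such a text layer needs. OCR engines report word boxes in image pixels with the origin at the top-left, while PDF pages are measured in points (1/72 inch) with the origin at the bottom-left. The `PixelBox` and `PdfRect` types and the `pixelBoxToPdfRect` function are hypothetical names, and the sketch assumes an unrotated page whose origin is at (0, 0). In PDF, text rendering mode 3 (neither filled nor stroked) is the usual way to make such text invisible; whether this tool uses exactly that is an assumption.

```typescript
// Word box as an OCR engine reports it: pixels, origin at top-left.
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Rectangle in PDF user space: points, origin at bottom-left.
interface PdfRect { x: number; y: number; width: number; height: number; }

function pixelBoxToPdfRect(box: PixelBox, pageHeightPt: number, dpi: number = 300): PdfRect {
  // 72 points per inch, `dpi` pixels per inch.
  const toPt = (px: number): number => (px * 72) / dpi;
  return {
    x: toPt(box.x0),
    // Flip the vertical axis: the word's bottom edge (y1 in image space)
    // becomes the lower-left y coordinate in PDF space.
    y: pageHeightPt - toPt(box.y1),
    width: toPt(box.x1 - box.x0),
    height: toPt(box.y1 - box.y0),
  };
}

// A word starting one inch in from the top-left of a US Letter page (612 x 792 pt).
const rect = pixelBoxToPdfRect({ x0: 300, y0: 300, x1: 900, y1: 360 }, 792);
// rect.x is 72 pt, rect.width is 144 pt, rect.height is about 14.4 pt,
// and rect.y is about 705.6 pt measured up from the bottom of the page.
```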

The pipeline runs entirely in your browser: PDFium (the engine behind this site's editor) renders each page at 300 DPI, a resolution at which OCR engines perform well, and Tesseract, the open-source OCR engine that Google maintained for years, recognizes the text as WebAssembly running on your own machine. Even the OCR language model is served from this site, not a third-party CDN, so the network panel shows no third-party requests and no upload while your document is processed. That makes this one of the few OCR tools where 'we never see your document' is verifiable rather than a promise.
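For a sense of what rendering at 300 DPI means in pixels, the sketch below converts a page size in PDF points (72 per inch) into bitmap dimensions. The `renderSizePx` helper is a hypothetical name used for illustration, not part of PDFium or of this site's code.

```typescript
const POINTS_PER_INCH = 72;

// Convert a page size in PDF points to pixel dimensions at a given DPI.
function renderSizePx(
  widthPt: number,
  heightPt: number,
  dpi: number = 300
): { width: number; height: number } {
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

const letter = renderSizePx(612, 792);    // US Letter: 2550 x 3300 px
const a4 = renderSizePx(595.28, 841.89);  // A4 (rounded point size): 2480 x 3508 px
```

A US Letter page at 300 DPI is therefore about 8.4 megapixels, which helps explain why recognition takes on the order of seconds per page.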

Expectations, honestly set: recognition quality on clean, printed English documents (contracts, books, letters, invoices) is very good. Quality drops on low-resolution scans, skewed photos, handwriting, and decorative fonts; OCR is probabilistic, and the occasional misread word is normal. The searchable PDF keeps the original scan as the visible page, so even where a word is misread, what you see is still the true page; only the hidden search layer is imperfect. English is the supported language in this version, with more planned.

Frequently asked questions

Is my scan uploaded for processing?
No. OCR runs as WebAssembly in your browser — the renderer, the recognition engine, and the language data all load from this site and your document never leaves your device. You can watch the network panel while it runs: no upload fires.
What does a "searchable PDF" actually mean?
Your original pages stay visually identical; an invisible text layer is positioned over the print. PDF readers then treat it like any digital document — search finds words, selection works, copy-paste works — while you still see the original scan.
How accurate is the recognition?
On clean printed English at normal sizes, expect near-perfect results with occasional misses. Low-quality scans, skew, small print, and decorative fonts reduce accuracy. Handwriting is not supported — that requires a different class of model.
Which languages are supported?
English in this version. The OCR engine itself supports 100+ languages, and adding more is planned — each language adds a few MB of model data, so they'll be opt-in rather than always-loaded.
How long does it take?
Roughly one to three seconds per page on a typical laptop, after a one-time engine load of a few seconds. At that rate a 50-page scan takes about one to two and a half minutes, with per-page progress shown throughout.
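The per-page figures in this answer turn into a simple range estimate. The sketch below is a back-of-the-envelope calculation only; `estimateOcrTime` is a hypothetical helper, and the engine load time is left as a caller-supplied parameter because the answer only says "a few seconds".

```typescript
interface TimeEstimate { minSeconds: number; maxSeconds: number; }

// Per-page bounds quoted in the answer, in seconds.
const MIN_SECONDS_PER_PAGE = 1;
const MAX_SECONDS_PER_PAGE = 3;

// engineLoadSeconds is an assumption supplied by the caller (default 0).
function estimateOcrTime(pages: number, engineLoadSeconds: number = 0): TimeEstimate {
  return {
    minSeconds: engineLoadSeconds + pages * MIN_SECONDS_PER_PAGE,
    maxSeconds: engineLoadSeconds + pages * MAX_SECONDS_PER_PAGE,
  };
}

// 50 pages: 50 to 150 seconds of recognition, before adding the engine load.
const fifty = estimateOcrTime(50);
```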
My PDF already has some text — can I still OCR it?
Yes. The tool warns you when it detects an existing text layer (running OCR on a digital PDF is usually redundant), but it still runs — useful for mixed documents where some pages are scans and others are digital.
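One plausible way to detect an existing text layer in a mixed document is to count the characters that can be extracted from each page and treat pages above a small threshold as digital. The sketch below shows that heuristic only; the function names and the 20-character threshold are assumptions, not a description of how this tool actually decides.

```typescript
type PageKind = "scan" | "digital";

// Hypothetical heuristic: a page with at least `minChars` extractable
// characters is treated as already having a text layer.
function classifyPages(extractedCharsPerPage: number[], minChars: number = 20): PageKind[] {
  return extractedCharsPerPage.map((n): PageKind => (n >= minChars ? "digital" : "scan"));
}

// True when any page already carries text, which is when a warning would make sense.
function hasExistingTextLayer(kinds: PageKind[]): boolean {
  return kinds.some((k) => k === "digital");
}

// Four pages: two pure scans (0 and 3 stray characters) and two digital pages.
const kinds = classifyPages([0, 1532, 3, 980]);
const shouldWarn = hasExistingTextLayer(kinds);
```

With per-page results like these, a tool could warn once for the whole document while still running OCR on the pages classified as scans.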