OCR February 14, 2026 · 6 min read

How to Extract Text From a PDF Image (Scanned Document)

Step-by-step guide to extracting selectable, copyable text from image-based PDFs and scanned documents using free online and offline tools.

AltoUnlockPDF Team

PDF Tools Expert

When you receive a PDF that’s actually a scanned image, you can’t simply click and drag to select text. The document is essentially a photograph of a page, so the text is “baked in” as pixels. To get the text out, you need OCR (optical character recognition).

Here are five methods, covering online tools, desktop apps, and Python.

Why Can’t You Copy Text From Some PDFs?

There are two types of PDFs:

Native/digital PDFs — created from Word, InDesign, etc. Text is stored as actual characters. You can search and copy freely.
Image-based/scanned PDFs — the page is stored as a raster image. No text data exists; just pixels.

Press Ctrl+A (Cmd+A on a Mac) in your PDF viewer. If no text gets selected, you almost certainly have an image-based PDF.
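To run the same check on many files, a short script can look for an existing text layer page by page. This is a minimal sketch that assumes poppler’s `pdftotext` command is on your PATH (poppler is the same package pdf2image needs in Method 3). The function names are illustrative, and the 20-character threshold is an arbitrary heuristic, not a standard.

```python
import subprocess


def pages_without_text(extracted: str, min_chars: int = 20) -> list[int]:
    """Return 1-based numbers of pages with fewer than `min_chars` visible characters.

    `extracted` is pdftotext output, which ends every page with a form-feed character.
    """
    pages = extracted.split("\f")
    if extracted.endswith("\f"):
        pages = pages[:-1]  # drop the empty piece after the final form feed
    return [
        number
        for number, page in enumerate(pages, start=1)
        if sum(1 for ch in page if not ch.isspace()) < min_chars
    ]


def extract_text_layer(pdf_path: str) -> str:
    """Return whatever text layer the PDF already has, via poppler's pdftotext."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],  # "-" writes the text to stdout
        capture_output=True,
        check=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stdout
```

Calling `pages_without_text(extract_text_layer("scan.pdf"))` lists the pages that need OCR; an empty list means every page already has selectable text.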

Method 1: AltoUnlockPDF (Fast, Free, Online)

Visit our OCR tool
Upload your PDF
Select output format: Searchable PDF (keeps original appearance + adds text layer) or Plain Text (.txt)
Choose language
Download output

Takes about 5–30 seconds per page. No signup required.

Method 2: Google Drive (Free, Highly Accurate)

Upload PDF to Google Drive
Right-click → Open with → Google Docs
Wait 30–60 seconds for OCR to complete
The document opens with extracted text above/below each page image
Select all → Copy → paste wherever needed

Works well for short documents (roughly 1–20 pages). Free with any Google account, though very large files may fail to convert, so split long scans first.

Method 3: Python — Programmatic Extraction

For developers or bulk processing:

import pdf2image
import pytesseract

def extract_text_from_scanned_pdf(pdf_path, lang='eng'):
    # Render each PDF page to an image; 300 DPI is a common choice for OCR
    images = pdf2image.convert_from_path(pdf_path, dpi=300)

    text_pages = []
    for i, image in enumerate(images, start=1):
        # Run Tesseract OCR on the page image
        text = pytesseract.image_to_string(image, lang=lang)
        text_pages.append(f"--- Page {i} ---\n{text}")
        print(f"Processed page {i}/{len(images)}")

    return '\n\n'.join(text_pages)

# Usage
text = extract_text_from_scanned_pdf('contract.pdf')
# Write as UTF-8 so accented and non-Latin characters survive on every platform
with open('contract_text.txt', 'w', encoding='utf-8') as f:
    f.write(text)
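`convert_from_path` in the script above renders every page and holds all of them in memory before OCR starts, which can use a lot of RAM on long scans. A minimal variant, assuming the same pdf2image and pytesseract installs, renders a few pages at a time with pdf2image’s `pdfinfo_from_path` and the `first_page`/`last_page` arguments. The helper names and `batch_size=10` are illustrative choices, not recommendations.

```python
def page_ranges(page_count: int, batch_size: int = 10) -> list[tuple[int, int]]:
    """Split pages 1..page_count into inclusive (first, last) ranges."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (first, min(first + batch_size - 1, page_count))
        for first in range(1, page_count + 1, batch_size)
    ]


def extract_text_in_batches(pdf_path: str, batch_size: int = 10, lang: str = "eng") -> str:
    # Imported here so page_ranges() can be used without these packages installed
    import pdf2image
    import pytesseract

    page_count = int(pdf2image.pdfinfo_from_path(pdf_path)["Pages"])
    text_pages = []
    for first, last in page_ranges(page_count, batch_size):
        # Only pages first..last are rendered and held in memory at once
        images = pdf2image.convert_from_path(
            pdf_path, dpi=300, first_page=first, last_page=last
        )
        for page_number, image in enumerate(images, start=first):
            text = pytesseract.image_to_string(image, lang=lang)
            text_pages.append(f"--- Page {page_number} ---\n{text}")
    return "\n\n".join(text_pages)
```

The `lang` argument takes Tesseract language codes, and several can be joined with `+` (for example `lang="eng+deu"`) as long as those language packs are installed.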

Dependencies:

pip install pdf2image pytesseract Pillow
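# Also install the system programs they call: poppler (for pdf2image) and tesseract (for pytesseract)
# macOS (Homebrew): brew install poppler tesseract
# Debian/Ubuntu: sudo apt install poppler-utils tesseract-ocr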
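Most first-run failures come from those system programs being missing rather than from the Python packages. A quick standard-library check, assuming pdf2image’s default backend (poppler’s `pdfinfo` and `pdftoppm`) and the `tesseract` command that pytesseract runs; the function and constant names are illustrative:

```python
import shutil

# Programs the Method 3 script relies on: pdf2image calls poppler's
# pdfinfo and pdftoppm by default, and pytesseract runs tesseract.
REQUIRED_BINARIES = ("pdfinfo", "pdftoppm", "tesseract")


def missing_binaries(names: tuple[str, ...] = REQUIRED_BINARIES) -> list[str]:
    """Return the required programs that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("poppler and tesseract look installed")
```

If you installed them outside PATH (common on Windows), this check reports them missing even though you can still point pdf2image’s `poppler_path` argument and `pytesseract.pytesseract.tesseract_cmd` at their locations.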

Method 4: Adobe Acrobat (Recognize Text)

The paid versions of Adobe Acrobat (Standard and Pro) can recognize text in scanned PDFs:

Open the scanned PDF in Adobe Acrobat
Look for the notification bar: “This document contains only images”
Click “Recognize Text” (appears in the right panel or notification)
Wait for processing
Ctrl+F search and text selection now work

Limitations: the free Adobe Acrobat Reader may display the Recognize Text option, but running it generally requires a paid Acrobat plan or trial.

Method 5: macOS Preview (Built-In on Mac)

macOS Preview can make text in scanned pages selectable through Apple’s Live Text feature:

Open the scanned PDF in Preview
Select the text tool (T)
Click and drag across the text
If nothing highlights, Live Text has not recognized that page; fall back to one of the methods above

Live Text, available in macOS Monterey (12) and later, recognizes text in images automatically when you use the selection tool, so there is no separate OCR step to run.

Extracting Text While Preserving Formatting

Sometimes you need the text with its original structure (columns, tables, headings). Tools for this:

ABBYY FineReader (paid) — best structure preservation
Adobe Acrobat Pro (paid) — good table extraction
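Camelot (Python, free) — specifically for tables in text-based PDFs; scanned pages need an OCR text layer first (for example, a Searchable PDF from Method 1):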

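import camelot

# Camelot reads the PDF's text layer, so the file must be text-based (or already OCR'd)
tables = camelot.read_pdf('annual_report.pdf', pages='1-5')
print(f"Found {tables.n} tables")
if tables.n:
    print(tables[0].df)  # first table as a pandas DataFrame
    tables.export('tables.csv', f='csv')  # writes one CSV file per table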
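Camelot only handles tables. If you stay with pytesseract, its `image_to_data` function reports each recognized word with Tesseract’s block, paragraph, and line numbers plus a confidence score (0 to 100, with -1 on rows that are not words), which is enough to rebuild line and paragraph breaks and drop low-confidence noise. This sketch works on the dictionary returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`; the function name is illustrative and the 60-point cutoff is an assumption to tune.

```python
def rebuild_lines(data: dict, min_conf: float = 60.0) -> str:
    """Rebuild OCR text from pytesseract.image_to_data(..., output_type=Output.DICT).

    Words are grouped by Tesseract's (block_num, par_num, line_num) numbering,
    a blank line separates paragraphs, and words below `min_conf` are dropped.
    """
    lines: dict[tuple[int, int, int], list[str]] = {}
    for i, word in enumerate(data["text"]):
        # Non-word rows (page, block, line) have empty text and confidence -1;
        # float() accepts confidence given as either a number or a string
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (int(data["block_num"][i]), int(data["par_num"][i]), int(data["line_num"][i]))
        lines.setdefault(key, []).append(word)  # rows arrive in reading order

    output = []
    previous_paragraph = None
    for block, par, line in sorted(lines):
        if previous_paragraph is not None and (block, par) != previous_paragraph:
            output.append("")  # blank line between paragraphs
        output.append(" ".join(lines[(block, par, line)]))
        previous_paragraph = (block, par)
    return "\n".join(output)
```

In the Method 3 script, this would replace the `image_to_string` call with `rebuild_lines(pytesseract.image_to_data(image, lang=lang, output_type=pytesseract.Output.DICT))`.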

Which Method Should You Use?

Scenario	Best Method
Quick, one-time extraction	AltoUnlockPDF or Google Drive
Bulk processing (100+ PDFs)	Python + pytesseract
Mac user	macOS Live Text + Preview
Need tables preserved	ABBYY FineReader or Camelot
Developer building a product	pytesseract or a cloud OCR API

The best free combination for most people: Google Drive for single documents, Python script for batch processing.
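For the batch case, a minimal driver might walk a folder and write one `.txt` file next to each PDF. This sketch takes the extraction function as a parameter, skips PDFs whose `.txt` output already exists so an interrupted run can resume, and records failures instead of stopping. The function name, output layout, and skip rule are all assumptions.

```python
from pathlib import Path
from typing import Callable


def ocr_folder(folder: str, extract: Callable[[str], str]) -> dict[str, str]:
    """OCR every PDF under `folder`, writing name.txt next to each name.pdf.

    Returns a status per PDF path: "done", "skipped", or "error: <message>".
    """
    pdfs = sorted(
        p for p in Path(folder).rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"
    )
    status: dict[str, str] = {}
    for pdf in pdfs:
        out = pdf.with_suffix(".txt")
        if out.exists():
            status[str(pdf)] = "skipped"  # finished on an earlier run
            continue
        try:
            text = extract(str(pdf))
        except Exception as exc:  # one unreadable scan should not stop the batch
            status[str(pdf)] = f"error: {exc}"
            continue
        out.write_text(text, encoding="utf-8")
        status[str(pdf)] = "done"
    return status
```

With the Method 3 function defined, `ocr_folder("scans", extract_text_from_scanned_pdf)` processes a whole folder, and the returned dictionary makes it easy to list the files that failed.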