DocTR Tuning REST API

REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

Quick Start

CPU Version

cd src/doctr_service

# Build
docker build -t doctr-api:cpu .

# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu

# Test
curl http://localhost:8003/health

GPU Version

# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .

# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu

Files

File                 | Description
doctr_tuning_rest.py | FastAPI REST service with 9 tunable hyperparameters
dataset_manager.py   | Dataset loader (shared with other services)
Dockerfile           | CPU-only image (amd64 + arm64)
Dockerfile.gpu       | GPU/CUDA image (amd64 + arm64)
requirements.txt     | Python dependencies

API Endpoints

GET /health

Check if service is ready.

{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}

POST /evaluate

Run OCR evaluation with given hyperparameters.

Request (9 tunable parameters, plus pdf_folder and page range):

{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}

Response:

{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}

Note: model_reinitialized indicates if the model was reloaded due to changed processing flags (adds ~2-5s overhead).
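
A minimal call with curl, assuming the service is running on port 8003 as in the Quick Start and that omitted parameters fall back to the defaults listed under Hyperparameters:

curl -X POST http://localhost:8003/evaluate \
  -H "Content-Type: application/json" \
  -d '{
        "pdf_folder": "/app/dataset",
        "assume_straight_pages": true,
        "resolve_lines": true,
        "paragraph_break": 0.035,
        "start_page": 5,
        "end_page": 10
      }'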

Debug Output (debugset)

The debugset folder stores OCR predictions for debugging and analysis. When save_output is set to true in the /evaluate request, predictions are written to /app/debugset.

Enable Debug Output

{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}

Output Structure

debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── doctr/
│       └── ...

Each .txt file contains the OCR-extracted text for that page.

Docker Mount

Add the debugset volume to your docker run command:

docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
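
With the mount in place, a debug run can be triggered as follows (a sketch; it assumes the same port mapping as above and that unspecified hyperparameters keep their defaults):

curl -X POST http://localhost:8003/evaluate \
  -H "Content-Type: application/json" \
  -d '{"pdf_folder": "/app/dataset", "save_output": true, "start_page": 5, "end_page": 10}'

# Extracted text then appears on the host side of the mount
ls ../debugset/*/doctr/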

Use Cases

  • Compare OCR engines: Run the same pages through PaddleOCR, DocTR, and EasyOCR with save_output=true, then diff the results (see the sketch after this list)
  • Debug hyperparameters: See how different settings affect text extraction
  • Ground truth comparison: Compare predictions against expected output
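
A quick comparison sketch: this service only writes the doctr/ subfolder, so the easyocr/ path below is a hypothetical layout produced by the corresponding EasyOCR service with its own debugset mount.

# Compare DocTR output with (hypothetical) EasyOCR output for the same page
diff debugset/doc1/doctr/page_0005.txt debugset/doc1/easyocr/page_0005.txt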

Hyperparameters

Processing Flags (Require Model Reinitialization)

Parameter             | Default | Description
assume_straight_pages | true    | Skip rotation handling for straight documents
straighten_pages      | false   | Pre-straighten pages before detection
preserve_aspect_ratio | true    | Maintain document proportions during resize
symmetric_pad         | true    | Use symmetric padding when preserving aspect ratio

Note: Changing these flags requires model reinitialization (~2-5s).

Orientation Flags

Parameter                | Default | Description
disable_page_orientation | false   | Skip page orientation classification
disable_crop_orientation | false   | Skip crop orientation detection

Output Grouping

Parameter       | Default | Range   | Description
resolve_lines   | true    | bool    | Group words into lines
resolve_blocks  | false   | bool    | Group lines into blocks
paragraph_break | 0.035   | 0.0-1.0 | Minimum space ratio separating paragraphs
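
To illustrate how these parameters map onto /evaluate requests, here is a small sweep over paragraph_break (a sketch; it assumes the service is reachable on port 8003 and that omitted parameters keep their defaults):

for pb in 0.02 0.035 0.05; do
  echo "paragraph_break=$pb"
  curl -s -X POST http://localhost:8003/evaluate \
    -H "Content-Type: application/json" \
    -d "{\"pdf_folder\": \"/app/dataset\", \"paragraph_break\": $pb, \"start_page\": 5, \"end_page\": 10}"
  echo
done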

Model Architecture

DocTR uses a two-stage pipeline:

  1. Detection (det_arch): Localizes text regions

    • Default: db_resnet50 (DBNet with ResNet-50 backbone)
    • Alternatives: linknet_resnet18, db_mobilenet_v3_large
  2. Recognition (reco_arch): Recognizes characters

    • Default: crnn_vgg16_bn (CRNN with VGG-16 backbone)
    • Alternatives: sar_resnet31, master, vitstr_small

The detection and recognition architectures are set via environment variables and are fixed at startup (see Environment Variables below).

GPU Support

Platform Support

Platform                            | CPU     | GPU
Linux x86_64 (amd64)                | PyTorch | CUDA
Linux ARM64 (GH200/GB200/DGX Spark) | PyTorch | CUDA (cu128 index)
macOS ARM64 (M1/M2)                 | PyTorch | Not supported

PyTorch CUDA on ARM64

Unlike PaddlePaddle, PyTorch provides official ARM64 CUDA wheels on the cu128 index:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

This works on both amd64 and arm64 platforms with CUDA support.

GPU Detection

DocTR uses the GPU when one is available:

import torch
from doctr.models import ocr_predictor

print(torch.cuda.is_available())  # True if a CUDA GPU is visible

# Build the predictor, then move it to the GPU when one is available
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()

The /health endpoint shows GPU status:

{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}

Environment Variables

Variable             | Default       | Description
DOCTR_DET_ARCH       | db_resnet50   | Detection architecture
DOCTR_RECO_ARCH      | crnn_vgg16_bn | Recognition architecture
CUDA_VISIBLE_DEVICES | 0             | GPU device selection
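
For example, to start the service with a lighter architecture pair (a sketch; the architecture names are the alternatives listed under Model Architecture):

docker run -d -p 8003:8000 \
  -e DOCTR_DET_ARCH=db_mobilenet_v3_large \
  -e DOCTR_RECO_ARCH=vitstr_small \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu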

CI/CD

Built images are available from the registry:

Image                                 | Architectures
seryus.ddns.net/unir/doctr-cpu:latest | amd64, arm64
seryus.ddns.net/unir/doctr-gpu:latest | amd64, arm64
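
To use a prebuilt image instead of building locally (assuming the registry is reachable and any required credentials are configured):

docker pull seryus.ddns.net/unir/doctr-gpu:latest
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  seryus.ddns.net/unir/doctr-gpu:latest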

Sources