# DocTR Tuning REST API

REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. The model stays loaded in memory, so repeated evaluations during a hyperparameter search are fast.
## Quick Start

### CPU Version

```bash
cd src/doctr_service

# Build
docker build -t doctr-api:cpu .

# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu

# Test
curl http://localhost:8003/health
```
### GPU Version

```bash
# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .

# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu
```
## Files

| File | Description |
|---|---|
| `doctr_tuning_rest.py` | FastAPI REST service with 9 tunable hyperparameters |
| `dataset_manager.py` | Dataset loader (shared with other services) |
| `Dockerfile` | CPU-only image (amd64 + arm64) |
| `Dockerfile.gpu` | GPU/CUDA image (amd64 + arm64) |
| `requirements.txt` | Python dependencies |
## API Endpoints

### GET /health

Check whether the service is ready.

```json
{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}
```
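Before starting an evaluation loop, a client can poll `/health` until the model and dataset are loaded. A minimal stdlib-only sketch (the `is_ready`/`wait_for_service` helper names are ours, not part of the service; port 8003 matches the Quick Start examples):

```python
import json
import time
import urllib.request

def is_ready(health: dict) -> bool:
    """Ready once the model and the dataset are both loaded."""
    return (
        health.get("status") == "ok"
        and health.get("model_loaded", False)
        and health.get("dataset_loaded", False)
    )

def wait_for_service(url: str = "http://localhost:8003/health",
                     poll_seconds: float = 2.0) -> dict:
    """Poll /health until the service reports ready, then return the payload."""
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                health = json.load(resp)
            if is_ready(health):
                return health
        except OSError:
            pass  # container may still be starting
        time.sleep(poll_seconds)
```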
### POST /evaluate

Run OCR evaluation with the given hyperparameters.

Request (9 tunable parameters):

```json
{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}
```

Response:

```json
{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}
```

Note: `model_reinitialized` indicates whether the model was reloaded because a processing flag changed (adds ~2-5 s of overhead).
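A hyperparameter search typically wraps `/evaluate` in a small client. A sketch, assuming the defaults in the request example above (the `build_request`/`evaluate` helper names are ours, not part of the service):

```python
import json
import urllib.request

# Defaults mirror the /evaluate request example above
DEFAULTS = {
    "pdf_folder": "/app/dataset",
    "assume_straight_pages": True,
    "straighten_pages": False,
    "preserve_aspect_ratio": True,
    "symmetric_pad": True,
    "disable_page_orientation": False,
    "disable_crop_orientation": False,
    "resolve_lines": True,
    "resolve_blocks": False,
    "paragraph_break": 0.035,
}

def build_request(**overrides) -> dict:
    """Merge per-trial overrides onto the default hyperparameters."""
    extra = {"start_page", "end_page", "save_output"}
    unknown = set(overrides) - set(DEFAULTS) - extra
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    return {**DEFAULTS, **overrides}

def evaluate(params: dict, url: str = "http://localhost:8003/evaluate") -> dict:
    """POST one trial to /evaluate and return the metrics dict."""
    req = urllib.request.Request(
        url,
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)
```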
## Debug Output (debugset)

The `debugset` folder stores OCR predictions for debugging and analysis. When `save_output=true` is passed to `/evaluate`, predictions are written to `/app/debugset`.
### Enable Debug Output

```json
{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}
```
### Output Structure

```
debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── doctr/
│       └── ...
```

Each `.txt` file contains the OCR-extracted text for that page.
### Docker Mount

Add the `debugset` volume to your `docker run` command:

```bash
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```
### Use Cases

- **Compare OCR engines**: Run the same pages through PaddleOCR, DocTR, and EasyOCR with `save_output=true`, then diff the results
- **Debug hyperparameters**: See how different settings affect text extraction
- **Ground truth comparison**: Compare predictions against expected output
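The per-page `.txt` layout makes cross-engine diffs straightforward. A sketch using the stdlib `difflib` module (the `debugset` layout follows the structure above; engine folder names other than `doctr` are assumptions):

```python
import difflib
from pathlib import Path

def diff_page(debugset: Path, doc: str, page: str,
              engine_a: str = "doctr", engine_b: str = "paddleocr") -> str:
    """Unified diff of one page's extracted text between two engines."""
    a = (debugset / doc / engine_a / page).read_text().splitlines()
    b = (debugset / doc / engine_b / page).read_text().splitlines()
    return "\n".join(difflib.unified_diff(
        a, b, fromfile=engine_a, tofile=engine_b, lineterm=""))
```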
## Hyperparameters

### Processing Flags (Require Model Reinitialization)

| Parameter | Default | Description |
|---|---|---|
| `assume_straight_pages` | `true` | Skip rotation handling for straight documents |
| `straighten_pages` | `false` | Pre-straighten pages before detection |
| `preserve_aspect_ratio` | `true` | Maintain document proportions during resize |
| `symmetric_pad` | `true` | Use symmetric padding when preserving aspect ratio |

Note: Changing these flags requires model reinitialization (~2-5 s).
### Orientation Flags

| Parameter | Default | Description |
|---|---|---|
| `disable_page_orientation` | `false` | Skip page orientation classification |
| `disable_crop_orientation` | `false` | Skip crop orientation detection |
### Output Grouping

| Parameter | Default | Range | Description |
|---|---|---|---|
| `resolve_lines` | `true` | bool | Group words into lines |
| `resolve_blocks` | `false` | bool | Group lines into blocks |
| `paragraph_break` | `0.035` | 0.0-1.0 | Minimum space ratio separating paragraphs |
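With 9 tunable parameters, even the boolean flags alone multiply quickly, so sweeps are usually enumerated programmatically. A small sketch that builds one trial dict per combination (purely illustrative; the parameter names match the tables above):

```python
from itertools import product

def grid(**choices):
    """Yield one hyperparameter dict per combination of the given choices."""
    names = list(choices)
    for values in product(*(choices[n] for n in names)):
        yield dict(zip(names, values))

trials = list(grid(
    resolve_lines=[True, False],
    resolve_blocks=[True, False],
    paragraph_break=[0.02, 0.035, 0.05],
))
# 2 * 2 * 3 = 12 trials, each ready to POST to /evaluate
```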
## Model Architecture

DocTR uses a two-stage pipeline:

1. **Detection** (`det_arch`): Localizes text regions
   - Default: `db_resnet50` (DBNet with a ResNet-50 backbone)
   - Alternatives: `linknet_resnet18`, `db_mobilenet_v3_large`
2. **Recognition** (`reco_arch`): Recognizes characters
   - Default: `crnn_vgg16_bn` (CRNN with a VGG-16 backbone)
   - Alternatives: `sar_resnet31`, `master`, `vitstr_small`

Architecture is set via environment variables (fixed at startup).
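To evaluate a different architecture pair, set the variables at container start, for example (a sketch; `db_mobilenet_v3_large` and `vitstr_small` are the lighter alternatives listed above):

```bash
docker run -d -p 8003:8000 \
  -e DOCTR_DET_ARCH=db_mobilenet_v3_large \
  -e DOCTR_RECO_ARCH=vitstr_small \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```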
## GPU Support

### Platform Support

| Platform | CPU | GPU |
|---|---|---|
| Linux x86_64 (amd64) | ✅ | ✅ PyTorch CUDA |
| Linux ARM64 (GH200/GB200/DGX Spark) | ✅ | ✅ PyTorch CUDA (cu128 index) |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
### PyTorch CUDA on ARM64

Unlike PaddlePaddle, PyTorch publishes official ARM64 CUDA wheels on the cu128 index:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```

This works on both amd64 and arm64 platforms with CUDA support.
### GPU Detection

DocTR automatically uses the GPU when available:

```python
import torch
from doctr.models import ocr_predictor

print(torch.cuda.is_available())  # True if a CUDA GPU is available

# Move the DocTR model to the GPU when one is present
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()
```

The `/health` endpoint reports GPU status:

```json
{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `DOCTR_DET_ARCH` | `db_resnet50` | Detection architecture |
| `DOCTR_RECO_ARCH` | `crnn_vgg16_bn` | Recognition architecture |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU device selection |
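Since the architecture is fixed at startup, the service presumably reads these once when it boots; a minimal sketch of that pattern (variable names and defaults from the table above; `read_config` is our name, not the service's):

```python
import os

def read_config(env=os.environ) -> dict:
    """Read startup configuration, falling back to the documented defaults."""
    return {
        "det_arch": env.get("DOCTR_DET_ARCH", "db_resnet50"),
        "reco_arch": env.get("DOCTR_RECO_ARCH", "crnn_vgg16_bn"),
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES", "0"),
    }
```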
## CI/CD

Pre-built images are available from the registry:

| Image | Architecture |
|---|---|
| `seryus.ddns.net/unir/doctr-cpu:latest` | amd64, arm64 |
| `seryus.ddns.net/unir/doctr-gpu:latest` | amd64, arm64 |