# DocTR Tuning REST API

REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. The model stays loaded in memory, so repeated evaluations during a hyperparameter search are fast.
## Quick Start

### CPU Version

```bash
cd src/doctr_service

# Build
docker build -t doctr-api:cpu .

# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu

# Test
curl http://localhost:8003/health
```
### GPU Version

```bash
# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .

# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu
```
## Files

| File | Description |
|---|---|
| `doctr_tuning_rest.py` | FastAPI REST service with 9 tunable hyperparameters |
| `dataset_manager.py` | Dataset loader (shared with other services) |
| `Dockerfile` | CPU-only image (amd64 + arm64) |
| `Dockerfile.gpu` | GPU/CUDA image (amd64 + arm64) |
| `requirements.txt` | Python dependencies |
## API Endpoints

### GET /health

Check whether the service is ready.

```json
{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}
```
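Before starting an evaluation loop, a client can poll `/health` until the model and dataset are loaded. A minimal stdlib-only sketch (the `is_ready`/`wait_for_service` helper names are ours, not part of the service; port 8003 matches the Quick Start examples):

```python
import json
import time
import urllib.request

def is_ready(health: dict) -> bool:
    """Ready once the model and the dataset are both loaded."""
    return (
        health.get("status") == "ok"
        and health.get("model_loaded", False)
        and health.get("dataset_loaded", False)
    )

def wait_for_service(url: str = "http://localhost:8003/health",
                     poll_seconds: float = 2.0) -> dict:
    """Poll /health until the service reports ready, then return the payload."""
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                health = json.load(resp)
            if is_ready(health):
                return health
        except OSError:
            pass  # container may still be starting
        time.sleep(poll_seconds)
```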
### POST /evaluate

Run OCR evaluation with the given hyperparameters.

Request (9 tunable parameters):

```json
{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}
```

Response:

```json
{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}
```

Note: `model_reinitialized` indicates whether the model was reloaded because a processing flag changed (adds ~2-5 s of overhead).
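A hyperparameter search typically wraps `/evaluate` in a small client. A sketch, assuming the defaults in the request example above (the `build_request`/`evaluate` helper names are ours, not part of the service):

```python
import json
import urllib.request

# Defaults mirror the /evaluate request example above
DEFAULTS = {
    "pdf_folder": "/app/dataset",
    "assume_straight_pages": True,
    "straighten_pages": False,
    "preserve_aspect_ratio": True,
    "symmetric_pad": True,
    "disable_page_orientation": False,
    "disable_crop_orientation": False,
    "resolve_lines": True,
    "resolve_blocks": False,
    "paragraph_break": 0.035,
}

def build_request(**overrides) -> dict:
    """Merge per-trial overrides onto the default hyperparameters."""
    extra = {"start_page", "end_page", "save_output"}
    unknown = set(overrides) - set(DEFAULTS) - extra
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    return {**DEFAULTS, **overrides}

def evaluate(params: dict, url: str = "http://localhost:8003/evaluate") -> dict:
    """POST one trial to /evaluate and return the metrics dict."""
    req = urllib.request.Request(
        url,
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)
```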
## Debug Output (debugset)

The `debugset` folder stores OCR predictions for debugging and analysis. When `save_output=true` is passed to `/evaluate`, predictions are written to `/app/debugset`.
### Enable Debug Output

```json
{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}
```
### Output Structure

```
debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── doctr/
│       └── ...
```

Each `.txt` file contains the OCR-extracted text for that page.
### Docker Mount

Add the `debugset` volume to your `docker run` command:

```bash
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```
### Use Cases

- **Compare OCR engines**: Run the same pages through PaddleOCR, DocTR, and EasyOCR with `save_output=true`, then diff the results
- **Debug hyperparameters**: See how different settings affect text extraction
- **Ground truth comparison**: Compare predictions against expected output
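The per-page `.txt` layout makes cross-engine diffs straightforward. A sketch using the stdlib `difflib` module (the `debugset` layout follows the structure above; engine folder names other than `doctr` are assumptions):

```python
import difflib
from pathlib import Path

def diff_page(debugset: Path, doc: str, page: str,
              engine_a: str = "doctr", engine_b: str = "paddleocr") -> str:
    """Unified diff of one page's extracted text between two engines."""
    a = (debugset / doc / engine_a / page).read_text().splitlines()
    b = (debugset / doc / engine_b / page).read_text().splitlines()
    return "\n".join(difflib.unified_diff(
        a, b, fromfile=engine_a, tofile=engine_b, lineterm=""))
```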
## Hyperparameters

### Processing Flags (Require Model Reinitialization)

| Parameter | Default | Description |
|---|---|---|
| `assume_straight_pages` | `true` | Skip rotation handling for straight documents |
| `straighten_pages` | `false` | Pre-straighten pages before detection |
| `preserve_aspect_ratio` | `true` | Maintain document proportions during resize |
| `symmetric_pad` | `true` | Use symmetric padding when preserving aspect ratio |

Note: Changing these flags requires model reinitialization (~2-5 s).
### Orientation Flags

| Parameter | Default | Description |
|---|---|---|
| `disable_page_orientation` | `false` | Skip page orientation classification |
| `disable_crop_orientation` | `false` | Skip crop orientation detection |
### Output Grouping

| Parameter | Default | Range | Description |
|---|---|---|---|
| `resolve_lines` | `true` | bool | Group words into lines |
| `resolve_blocks` | `false` | bool | Group lines into blocks |
| `paragraph_break` | `0.035` | 0.0-1.0 | Minimum space ratio separating paragraphs |
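With 9 tunable parameters, even the boolean flags alone multiply quickly, so sweeps are usually enumerated programmatically. A small sketch that builds one trial dict per combination (purely illustrative; the parameter names match the tables above):

```python
from itertools import product

def grid(**choices):
    """Yield one hyperparameter dict per combination of the given choices."""
    names = list(choices)
    for values in product(*(choices[n] for n in names)):
        yield dict(zip(names, values))

trials = list(grid(
    resolve_lines=[True, False],
    resolve_blocks=[True, False],
    paragraph_break=[0.02, 0.035, 0.05],
))
# 2 * 2 * 3 = 12 trials, each ready to POST to /evaluate
```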
## Model Architecture

DocTR uses a two-stage pipeline:

1. **Detection** (`det_arch`): Localizes text regions
   - Default: `db_resnet50` (DBNet with a ResNet-50 backbone)
   - Alternatives: `linknet_resnet18`, `db_mobilenet_v3_large`
2. **Recognition** (`reco_arch`): Recognizes characters
   - Default: `crnn_vgg16_bn` (CRNN with a VGG-16 backbone)
   - Alternatives: `sar_resnet31`, `master`, `vitstr_small`

Architecture is set via environment variables (fixed at startup).
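To evaluate a different architecture pair, set the variables at container start, for example (a sketch; `db_mobilenet_v3_large` and `vitstr_small` are the lighter alternatives listed above):

```bash
docker run -d -p 8003:8000 \
  -e DOCTR_DET_ARCH=db_mobilenet_v3_large \
  -e DOCTR_RECO_ARCH=vitstr_small \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```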
## GPU Support

### Platform Support

| Platform | CPU | GPU |
|---|---|---|
| Linux x86_64 (amd64) | ✅ | ✅ PyTorch CUDA |
| Linux ARM64 (GH200/GB200/DGX Spark) | ✅ | ✅ PyTorch CUDA (cu128 index) |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
### PyTorch CUDA on ARM64

Unlike PaddlePaddle, PyTorch publishes official ARM64 CUDA wheels on the cu128 index:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```

This works on both amd64 and arm64 platforms with CUDA support.
### GPU Detection

DocTR automatically uses the GPU when available:

```python
import torch
from doctr.models import ocr_predictor

print(torch.cuda.is_available())  # True if a CUDA GPU is available

# Move the DocTR model to the GPU when one is present
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()
```

The `/health` endpoint reports GPU status:

```json
{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `DOCTR_DET_ARCH` | `db_resnet50` | Detection architecture |
| `DOCTR_RECO_ARCH` | `crnn_vgg16_bn` | Recognition architecture |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU device selection |
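Since the architecture is fixed at startup, the service presumably reads these once when it boots; a minimal sketch of that pattern (variable names and defaults from the table above; `read_config` is our name, not the service's):

```python
import os

def read_config(env=os.environ) -> dict:
    """Read startup configuration, falling back to the documented defaults."""
    return {
        "det_arch": env.get("DOCTR_DET_ARCH", "db_resnet50"),
        "reco_arch": env.get("DOCTR_RECO_ARCH", "crnn_vgg16_bn"),
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES", "0"),
    }
```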
## CI/CD

Pre-built images are available from the registry:

| Image | Architecture |
|---|---|
| `seryus.ddns.net/unir/doctr-cpu:latest` | amd64, arm64 |
| `seryus.ddns.net/unir/doctr-gpu:latest` | amd64, arm64 |