DocTR Tuning REST API

REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

Quick Start

CPU Version

cd src/doctr_service

# Build
docker build -t doctr-api:cpu .

# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu

# Test
curl http://localhost:8003/health

GPU Version

# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .

# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu

Files

File                 | Description
doctr_tuning_rest.py | FastAPI REST service with 9 tunable hyperparameters
dataset_manager.py   | Dataset loader (shared with other services)
Dockerfile           | CPU-only image (amd64 + arm64)
Dockerfile.gpu       | GPU/CUDA image (amd64 + arm64)
requirements.txt     | Python dependencies

API Endpoints

GET /health

Check if service is ready.

{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}

POST /evaluate

Run OCR evaluation with given hyperparameters.

Request (9 tunable parameters, plus pdf_folder and page range):

{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}

Response:

{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}

Note: model_reinitialized indicates if the model was reloaded due to changed processing flags (adds ~2-5s overhead).
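
A minimal call with curl, assuming the service is running on port 8003 as in the Quick Start and that omitted parameters fall back to the defaults listed under Hyperparameters:

curl -X POST http://localhost:8003/evaluate \
  -H "Content-Type: application/json" \
  -d '{
        "pdf_folder": "/app/dataset",
        "assume_straight_pages": true,
        "resolve_lines": true,
        "paragraph_break": 0.035,
        "start_page": 5,
        "end_page": 10
      }'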

Debug Output (debugset)

The debugset folder stores OCR predictions for debugging and analysis. When save_output is set to true in the /evaluate request, predictions are written to /app/debugset.

Enable Debug Output

{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}

Output Structure

debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── doctr/
│       └── ...

Each .txt file contains the OCR-extracted text for that page.

Docker Mount

Add the debugset volume to your docker run command:

docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
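
With the mount in place, a debug run can be triggered as follows (a sketch; it assumes the same port mapping as above and that unspecified hyperparameters keep their defaults):

curl -X POST http://localhost:8003/evaluate \
  -H "Content-Type: application/json" \
  -d '{"pdf_folder": "/app/dataset", "save_output": true, "start_page": 5, "end_page": 10}'

# Extracted text then appears on the host side of the mount
ls ../debugset/*/doctr/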

Use Cases

  • Compare OCR engines: Run the same pages through PaddleOCR, DocTR, and EasyOCR with save_output=true, then diff the results (see the sketch after this list)
  • Debug hyperparameters: See how different settings affect text extraction
  • Ground truth comparison: Compare predictions against expected output
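
A quick comparison sketch: this service only writes the doctr/ subfolder, so the easyocr/ path below is a hypothetical layout produced by the corresponding EasyOCR service with its own debugset mount.

# Compare DocTR output with (hypothetical) EasyOCR output for the same page
diff debugset/doc1/doctr/page_0005.txt debugset/doc1/easyocr/page_0005.txt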

Hyperparameters

Processing Flags (Require Model Reinitialization)

Parameter             | Default | Description
assume_straight_pages | true    | Skip rotation handling for straight documents
straighten_pages      | false   | Pre-straighten pages before detection
preserve_aspect_ratio | true    | Maintain document proportions during resize
symmetric_pad         | true    | Use symmetric padding when preserving aspect ratio

Note: Changing these flags requires model reinitialization (~2-5s).

Orientation Flags

Parameter                | Default | Description
disable_page_orientation | false   | Skip page orientation classification
disable_crop_orientation | false   | Skip crop orientation detection

Output Grouping

Parameter       | Default | Range   | Description
resolve_lines   | true    | bool    | Group words into lines
resolve_blocks  | false   | bool    | Group lines into blocks
paragraph_break | 0.035   | 0.0-1.0 | Minimum space ratio separating paragraphs
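
To illustrate how these parameters map onto /evaluate requests, here is a small sweep over paragraph_break (a sketch; it assumes the service is reachable on port 8003 and that omitted parameters keep their defaults):

for pb in 0.02 0.035 0.05; do
  echo "paragraph_break=$pb"
  curl -s -X POST http://localhost:8003/evaluate \
    -H "Content-Type: application/json" \
    -d "{\"pdf_folder\": \"/app/dataset\", \"paragraph_break\": $pb, \"start_page\": 5, \"end_page\": 10}"
  echo
done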

Model Architecture

DocTR uses a two-stage pipeline:

  1. Detection (det_arch): Localizes text regions

    • Default: db_resnet50 (DBNet with ResNet-50 backbone)
    • Alternatives: linknet_resnet18, db_mobilenet_v3_large
  2. Recognition (reco_arch): Recognizes characters

    • Default: crnn_vgg16_bn (CRNN with VGG-16 backbone)
    • Alternatives: sar_resnet31, master, vitstr_small

The detection and recognition architectures are set via environment variables and are fixed at startup (see Environment Variables below).

GPU Support

Platform Support

Platform                            | CPU     | GPU
Linux x86_64 (amd64)                | PyTorch | CUDA
Linux ARM64 (GH200/GB200/DGX Spark) | PyTorch | CUDA (cu128 index)
macOS ARM64 (M1/M2)                 | PyTorch | Not supported

PyTorch CUDA on ARM64

Unlike PaddlePaddle, PyTorch provides official ARM64 CUDA wheels on the cu128 index:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

This works on both amd64 and arm64 platforms with CUDA support.

GPU Detection

DocTR uses the GPU when one is available:

import torch
from doctr.models import ocr_predictor

print(torch.cuda.is_available())  # True if a CUDA GPU is visible

# Build the predictor, then move it to the GPU when one is available
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()

The /health endpoint shows GPU status:

{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}

Environment Variables

Variable             | Default       | Description
DOCTR_DET_ARCH       | db_resnet50   | Detection architecture
DOCTR_RECO_ARCH      | crnn_vgg16_bn | Recognition architecture
CUDA_VISIBLE_DEVICES | 0             | GPU device selection
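
For example, to start the service with a lighter architecture pair (a sketch; the architecture names are the alternatives listed under Model Architecture):

docker run -d -p 8003:8000 \
  -e DOCTR_DET_ARCH=db_mobilenet_v3_large \
  -e DOCTR_RECO_ARCH=vitstr_small \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu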

CI/CD

Built images are available from the registry:

Image                                 | Architectures
seryus.ddns.net/unir/doctr-cpu:latest | amd64, arm64
seryus.ddns.net/unir/doctr-gpu:latest | amd64, arm64
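
To use a prebuilt image instead of building locally (assuming the registry is reachable and any required credentials are configured):

docker pull seryus.ddns.net/unir/doctr-gpu:latest
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  seryus.ddns.net/unir/doctr-gpu:latest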

Sources