# DocTR Tuning REST API

REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. It keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start

### CPU Version

```bash
cd src/doctr_service

# Build
docker build -t doctr-api:cpu .

# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu

# Test
curl http://localhost:8003/health
```

### GPU Version

```bash
# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .

# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu
```

## Files

| File | Description |
|------|-------------|
| `doctr_tuning_rest.py` | FastAPI REST service with 9 tunable hyperparameters |
| `dataset_manager.py` | Dataset loader (shared with other services) |
| `Dockerfile` | CPU-only image (amd64 + arm64) |
| `Dockerfile.gpu` | GPU/CUDA image (amd64 + arm64) |
| `requirements.txt` | Python dependencies |

## API Endpoints

### `GET /health`

Check if the service is ready.

```json
{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}
```

### `POST /evaluate`

Run an OCR evaluation with the given hyperparameters.
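During a search you typically override only the one or two fields the current trial varies and keep the rest at their documented defaults. A minimal Python sketch of assembling the request body (the field names come from the request schema below; `build_request` and the localhost URL are illustrative, not part of the service):

```python
import json

# Documented defaults for the 9 tunable hyperparameters; override only
# what the current trial varies.
DEFAULTS = {
    "assume_straight_pages": True,
    "straighten_pages": False,
    "preserve_aspect_ratio": True,
    "symmetric_pad": True,
    "disable_page_orientation": False,
    "disable_crop_orientation": False,
    "resolve_lines": True,
    "resolve_blocks": False,
    "paragraph_break": 0.035,
}

def build_request(pdf_folder: str, start_page: int, end_page: int, **overrides) -> str:
    """Return a JSON body for POST /evaluate, rejecting unknown field names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown hyperparameters: {unknown}")
    payload = {
        **DEFAULTS,
        **overrides,
        "pdf_folder": pdf_folder,
        "start_page": start_page,
        "end_page": end_page,
    }
    return json.dumps(payload)

body = build_request("/app/dataset", 5, 10, resolve_blocks=True)
# POST with any HTTP client, e.g.:
#   requests.post("http://localhost:8003/evaluate", data=body,
#                 headers={"Content-Type": "application/json"})
```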
**Request (9 tunable parameters):**

```json
{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**

```json
{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}
```

**Note:** `model_reinitialized` indicates whether the model was reloaded because a processing flag changed (adds ~2-5 s of overhead).

## Debug Output (debugset)

The `debugset` folder allows saving OCR predictions for debugging and analysis. When `save_output=true` is passed to `/evaluate`, predictions are written to `/app/debugset`.

### Enable Debug Output

```json
{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}
```

### Output Structure

```
debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── doctr/
│       └── ...
```

Each `.txt` file contains the OCR-extracted text for that page.
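Because every engine writes one `.txt` per page under the same layout, cross-engine diffs reduce to comparing two files. A small sketch using the standard library (the `paddleocr` folder name is hypothetical; only `doctr/` is produced by this service):

```python
import difflib
from pathlib import Path

def page_similarity(text_a: str, text_b: str) -> float:
    """Similarity ratio in [0, 1] between two OCR outputs for the same page."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio()

def compare_page(debugset: Path, doc: str, page: str,
                 engine_a: str = "doctr", engine_b: str = "paddleocr") -> float:
    # e.g. debugset/doc1/doctr/page_0005.txt vs debugset/doc1/paddleocr/page_0005.txt
    a = (debugset / doc / engine_a / page).read_text(encoding="utf-8")
    b = (debugset / doc / engine_b / page).read_text(encoding="utf-8")
    return page_similarity(a, b)
```

Pages with a low ratio are the ones worth inspecting by hand when comparing engines or hyperparameter settings.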
### Docker Mount

Add the debugset volume to your `docker run` command:

```bash
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```

### Use Cases

- **Compare OCR engines**: Run the same pages through PaddleOCR, DocTR, and EasyOCR with `save_output=true`, then diff the results
- **Debug hyperparameters**: See how different settings affect text extraction
- **Ground truth comparison**: Compare predictions against expected output

## Hyperparameters

### Processing Flags (Require Model Reinitialization)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `assume_straight_pages` | true | Skip rotation handling for straight documents |
| `straighten_pages` | false | Pre-straighten pages before detection |
| `preserve_aspect_ratio` | true | Maintain document proportions during resize |
| `symmetric_pad` | true | Use symmetric padding when preserving aspect ratio |

**Note:** Changing these flags requires model reinitialization (~2-5 s).

### Orientation Flags

| Parameter | Default | Description |
|-----------|---------|-------------|
| `disable_page_orientation` | false | Skip page orientation classification |
| `disable_crop_orientation` | false | Skip crop orientation detection |

### Output Grouping

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `resolve_lines` | true | bool | Group words into lines |
| `resolve_blocks` | false | bool | Group lines into blocks |
| `paragraph_break` | 0.035 | 0.0-1.0 | Minimum space ratio separating paragraphs |

## Model Architecture

DocTR uses a two-stage pipeline:

1. **Detection** (`det_arch`): Localizes text regions
   - Default: `db_resnet50` (DBNet with a ResNet-50 backbone)
   - Alternatives: `linknet_resnet18`, `db_mobilenet_v3_large`
2.
   **Recognition** (`reco_arch`): Recognizes characters
   - Default: `crnn_vgg16_bn` (CRNN with a VGG-16 backbone)
   - Alternatives: `sar_resnet31`, `master`, `vitstr_small`

The architectures are set via environment variables and are fixed at startup.

## GPU Support

### Platform Support

| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 (amd64) | ✅ | ✅ PyTorch CUDA |
| Linux ARM64 (GH200/GB200/DGX Spark) | ✅ | ✅ PyTorch CUDA (cu128 index) |
| macOS ARM64 (M1/M2) | ✅ | ❌ |

### PyTorch CUDA on ARM64

Unlike PaddlePaddle, PyTorch provides **official ARM64 CUDA wheels** on the cu128 index:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```

This works on both amd64 and arm64 platforms with CUDA support.

### GPU Detection

DocTR automatically uses the GPU when one is available:

```python
import torch
from doctr.models import ocr_predictor

print(torch.cuda.is_available())  # True if a GPU is available

# Move the DocTR model to the GPU
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()
```

The `/health` endpoint shows GPU status:

```json
{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DOCTR_DET_ARCH` | `db_resnet50` | Detection architecture |
| `DOCTR_RECO_ARCH` | `crnn_vgg16_bn` | Recognition architecture |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU device selection |

## CI/CD

Built images are available from the registry:

| Image | Architecture |
|-------|--------------|
| `seryus.ddns.net/unir/doctr-cpu:latest` | amd64, arm64 |
| `seryus.ddns.net/unir/doctr-gpu:latest` | amd64, arm64 |

## Sources

- [DocTR Documentation](https://mindee.github.io/doctr/)
- [DocTR GitHub](https://github.com/mindee/doctr)
- [DocTR Model Usage](https://mindee.github.io/doctr/latest/using_doctr/using_models.html)
- [PyTorch ARM64 CUDA Wheels](https://github.com/pytorch/pytorch/issues/160162)