PaddleOCR, EasyOCR, and DocTR GPU support (#4)
All checks were successful
build_docker / essential (push) Successful in 0s
build_docker / build_cpu (push) Successful in 5m0s
build_docker / build_gpu (push) Successful in 22m55s
build_docker / build_easyocr (push) Successful in 18m47s
build_docker / build_easyocr_gpu (push) Successful in 19m0s
build_docker / build_raytune (push) Successful in 3m27s
build_docker / build_doctr (push) Successful in 19m42s
build_docker / build_doctr_gpu (push) Successful in 14m49s
This commit was merged in pull request #4.
src/doctr_service/README.md: 261 lines added (new file)

# DocTR Tuning REST API

REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start

### CPU Version

```bash
cd src/doctr_service

# Build
docker build -t doctr-api:cpu .

# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu

# Test
curl http://localhost:8003/health
```

### GPU Version

```bash
# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .

# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu
```

## Files

| File | Description |
|------|-------------|
| `doctr_tuning_rest.py` | FastAPI REST service with 9 tunable hyperparameters |
| `dataset_manager.py` | Dataset loader (shared with other services) |
| `Dockerfile` | CPU-only image (amd64 + arm64) |
| `Dockerfile.gpu` | GPU/CUDA image (amd64 + arm64) |
| `requirements.txt` | Python dependencies |

## API Endpoints

### `GET /health`

Check whether the service is ready.

```json
{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}
```
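
A tuning driver typically waits for this endpoint before sending work. Below is a minimal polling sketch, assuming the host port mapping from the Quick Start (8003), the `model_loaded`/`dataset_loaded` fields shown above, and the `requests` package being installed; it is an illustration, not part of the service.

```python
# Minimal sketch: block until /health reports the model and dataset are loaded.
# Assumes the service is published on localhost:8003 as in the Quick Start.
import time

import requests


def wait_until_ready(base_url: str = "http://localhost:8003", timeout_s: float = 120.0) -> dict:
    """Poll GET /health until the model and dataset are loaded, then return the payload."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            health = requests.get(f"{base_url}/health", timeout=5).json()
            if health.get("model_loaded") and health.get("dataset_loaded"):
                return health
        except requests.RequestException:
            pass  # container may still be starting up
        time.sleep(2)
    raise TimeoutError("DocTR service did not become ready in time")


if __name__ == "__main__":
    print(wait_until_ready())
```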

### `POST /evaluate`

Run an OCR evaluation with the given hyperparameters.

**Request (9 tunable parameters):**
```json
{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**
```json
{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}
```

**Note:** `model_reinitialized` indicates whether the model was reloaded due to changed processing flags (adds ~2-5s overhead).
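
Because the model stays loaded between calls, a hyperparameter search can simply POST one candidate configuration at a time. A minimal sketch of such a loop, assuming the service is reachable at `http://localhost:8003` and using the request/response fields shown above; the candidate values are illustrative, not a recommended search space.

```python
# Minimal sketch: evaluate a few candidate configurations against POST /evaluate.
# Assumes the service runs on localhost:8003; the candidates are illustrative only.
import requests

BASE_URL = "http://localhost:8003"


def evaluate(params: dict) -> dict:
    """POST one hyperparameter configuration and return the metrics payload."""
    payload = {"pdf_folder": "/app/dataset", "start_page": 5, "end_page": 10, **params}
    response = requests.post(f"{BASE_URL}/evaluate", json=payload, timeout=600)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    candidates = [
        {"resolve_lines": True, "paragraph_break": 0.035},
        {"resolve_lines": True, "resolve_blocks": True, "paragraph_break": 0.05},
    ]
    for params in candidates:
        metrics = evaluate(params)
        print(params, "->", {k: metrics[k] for k in ("CER", "WER", "TIME_PER_PAGE")})
```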

## Debug Output (debugset)

The `debugset` folder allows saving OCR predictions for debugging and analysis. When `save_output=True` is passed to `/evaluate`, predictions are written to `/app/debugset`.

### Enable Debug Output

```json
{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}
```

### Output Structure

```
debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── doctr/
│       └── ...
```

Each `.txt` file contains the OCR-extracted text for that page.

### Docker Mount

Add the debugset volume to your `docker run` command:

```bash
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```

### Use Cases

- **Compare OCR engines**: Run the same pages through PaddleOCR, DocTR, and EasyOCR with `save_output=True`, then diff the results (see the sketch below)
- **Debug hyperparameters**: See how different settings affect text extraction
- **Ground truth comparison**: Compare predictions against the expected output
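
For the engine comparison above, here is a minimal sketch based on the output structure shown earlier. The `easyocr` subfolder name is an assumption for whatever the other OCR service writes alongside `doctr/`; adjust it to the real layout.

```python
# Minimal sketch: diff the per-page text of two engines for one document.
# Assumes the debugset layout shown above; "easyocr" is a hypothetical sibling
# folder name -- use whatever subfolder the other OCR service actually writes.
import difflib
from pathlib import Path


def diff_page(doc_dir: Path, page: str, engine_a: str = "doctr", engine_b: str = "easyocr") -> str:
    """Return a unified diff of one page's text as produced by two engines."""
    lines_a = (doc_dir / engine_a / page).read_text(encoding="utf-8").splitlines()
    lines_b = (doc_dir / engine_b / page).read_text(encoding="utf-8").splitlines()
    return "\n".join(difflib.unified_diff(lines_a, lines_b, fromfile=engine_a, tofile=engine_b, lineterm=""))


if __name__ == "__main__":
    print(diff_page(Path("debugset/doc1"), "page_0005.txt"))
```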

## Hyperparameters

### Processing Flags (Require Model Reinitialization)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `assume_straight_pages` | true | Skip rotation handling for straight documents |
| `straighten_pages` | false | Pre-straighten pages before detection |
| `preserve_aspect_ratio` | true | Maintain document proportions during resize |
| `symmetric_pad` | true | Use symmetric padding when preserving aspect ratio |

**Note:** Changing these flags requires model reinitialization (~2-5s).

### Orientation Flags

| Parameter | Default | Description |
|-----------|---------|-------------|
| `disable_page_orientation` | false | Skip page orientation classification |
| `disable_crop_orientation` | false | Skip crop orientation detection |

### Output Grouping

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `resolve_lines` | true | bool | Group words into lines |
| `resolve_blocks` | false | bool | Group lines into blocks |
| `paragraph_break` | 0.035 | 0.0-1.0 | Minimum space ratio separating paragraphs |
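
Taken together, the tables above define the nine tunable parameters. Below is a minimal sketch of how a tuner might encode that space and draw configurations to send as `/evaluate` request bodies; the bounds simply mirror the documented defaults and ranges and are not a tuned recommendation.

```python
# Illustrative encoding of the 9 tunable parameters described above; the bounds
# mirror the documented defaults/ranges, not a recommended search space.
import random

SEARCH_SPACE = {
    "assume_straight_pages": [True, False],
    "straighten_pages": [True, False],
    "preserve_aspect_ratio": [True, False],
    "symmetric_pad": [True, False],
    "disable_page_orientation": [True, False],
    "disable_crop_orientation": [True, False],
    "resolve_lines": [True, False],
    "resolve_blocks": [True, False],
}
PARAGRAPH_BREAK_RANGE = (0.0, 1.0)  # continuous, default 0.035


def sample_configuration(rng: random.Random) -> dict:
    """Draw one random configuration to send as the /evaluate request body."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    config["paragraph_break"] = rng.uniform(*PARAGRAPH_BREAK_RANGE)
    return config


if __name__ == "__main__":
    print(sample_configuration(random.Random(0)))
```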

## Model Architecture

DocTR uses a two-stage pipeline:

1. **Detection** (`det_arch`): Localizes text regions
   - Default: `db_resnet50` (DBNet with a ResNet-50 backbone)
   - Alternatives: `linknet_resnet18`, `db_mobilenet_v3_large`

2. **Recognition** (`reco_arch`): Recognizes characters
   - Default: `crnn_vgg16_bn` (CRNN with a VGG-16 backbone)
   - Alternatives: `sar_resnet31`, `master`, `vitstr_small`

Both architectures are set via environment variables and are fixed at startup (see the sketch below).
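
A minimal sketch of how a service can build the two-stage predictor from the `DOCTR_DET_ARCH` / `DOCTR_RECO_ARCH` variables listed under Environment Variables; it follows the documented DocTR API but is not necessarily the exact code of `doctr_tuning_rest.py`.

```python
# Minimal sketch: build the predictor from environment variables at startup.
# The actual service code may differ in detail.
import os

from doctr.models import ocr_predictor

det_arch = os.environ.get("DOCTR_DET_ARCH", "db_resnet50")
reco_arch = os.environ.get("DOCTR_RECO_ARCH", "crnn_vgg16_bn")

# Architectures stay fixed for the lifetime of the process; only the
# per-request processing flags vary between /evaluate calls.
model = ocr_predictor(det_arch=det_arch, reco_arch=reco_arch, pretrained=True)
```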

## GPU Support

### Platform Support

| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 (amd64) | ✅ | ✅ PyTorch CUDA |
| Linux ARM64 (GH200/GB200/DGX Spark) | ✅ | ✅ PyTorch CUDA (cu128 index) |
| macOS ARM64 (M1/M2) | ✅ | ❌ |

### PyTorch CUDA on ARM64

Unlike PaddlePaddle, PyTorch provides **official ARM64 CUDA wheels** on the cu128 index:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```

This works on both amd64 and arm64 platforms with CUDA support.

### GPU Detection

DocTR automatically uses the GPU when it is available:

```python
import torch
from doctr.models import ocr_predictor

print(torch.cuda.is_available())  # True if a GPU is available

# Move the DocTR model to the GPU
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()
```

The `/health` endpoint shows GPU status:
```json
{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DOCTR_DET_ARCH` | `db_resnet50` | Detection architecture |
| `DOCTR_RECO_ARCH` | `crnn_vgg16_bn` | Recognition architecture |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU device selection |

## CI/CD

Built images are available from the registry:

| Image | Architecture |
|-------|--------------|
| `seryus.ddns.net/unir/doctr-cpu:latest` | amd64, arm64 |
| `seryus.ddns.net/unir/doctr-gpu:latest` | amd64, arm64 |

## Sources

- [DocTR Documentation](https://mindee.github.io/doctr/)
- [DocTR GitHub](https://github.com/mindee/doctr)
- [DocTR Model Usage](https://mindee.github.io/doctr/latest/using_doctr/using_models.html)
- [PyTorch ARM64 CUDA Wheels](https://github.com/pytorch/pytorch/issues/160162)