# DocTR Tuning REST API
REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.
## Quick Start
### CPU Version
```bash
cd src/doctr_service
# Build
docker build -t doctr-api:cpu .
# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
# Test
curl http://localhost:8003/health
```
### GPU Version
```bash
# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .
# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu
```
## Files
| File | Description |
|------|-------------|
| `doctr_tuning_rest.py` | FastAPI REST service with 9 tunable hyperparameters |
| `dataset_manager.py` | Dataset loader (shared with other services) |
| `Dockerfile` | CPU-only image (amd64 + arm64) |
| `Dockerfile.gpu` | GPU/CUDA image (amd64 + arm64) |
| `requirements.txt` | Python dependencies |
## API Endpoints
### `GET /health`
Check if service is ready.
```json
{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}
```
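Before launching a tuning run, it is worth waiting until the service reports readiness. A minimal client-side sketch (the `is_ready` helper is hypothetical, not part of the API) that checks the `/health` payload:

```python
import json


def is_ready(health_json: str) -> bool:
    """Return True once /health reports both model and dataset loaded.

    Hypothetical client-side helper; pass it the raw response body.
    """
    payload = json.loads(health_json)
    return (
        payload.get("status") == "ok"
        and payload.get("model_loaded", False)
        and payload.get("dataset_loaded", False)
    )
```

A tuner can poll `http://localhost:8003/health` and feed each response body into this check before issuing `/evaluate` calls.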
### `POST /evaluate`
Run OCR evaluation with given hyperparameters.
**Request (9 tunable parameters):**
```json
{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}
```
**Response:**
```json
{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}
```
**Note:** `model_reinitialized` indicates if the model was reloaded due to changed processing flags (adds ~2-5s overhead).
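During a hyperparameter search it is convenient to build the `/evaluate` payload from the documented defaults and only override the parameters under search. A sketch (the `build_payload` helper and `DEFAULTS` table are assumptions, mirroring the parameter tables below, not part of the service):

```python
# Documented defaults for the nine tunable hyperparameters.
DEFAULTS = {
    "assume_straight_pages": True,
    "straighten_pages": False,
    "preserve_aspect_ratio": True,
    "symmetric_pad": True,
    "disable_page_orientation": False,
    "disable_crop_orientation": False,
    "resolve_lines": True,
    "resolve_blocks": False,
    "paragraph_break": 0.035,
}

# Non-hyperparameter request fields accepted alongside the defaults.
EXTRA_FIELDS = {"start_page", "end_page", "save_output"}


def build_payload(pdf_folder: str = "/app/dataset", **overrides) -> dict:
    """Build an /evaluate request body, rejecting unknown parameter names."""
    unknown = set(overrides) - set(DEFAULTS) - EXTRA_FIELDS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"pdf_folder": pdf_folder, **DEFAULTS, **overrides}
```

The resulting dict can be posted as JSON to `http://localhost:8003/evaluate` with any HTTP client.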
## Debug Output (debugset)
The `debugset` folder allows saving OCR predictions for debugging and analysis. When `save_output=True` is passed to `/evaluate`, predictions are written to `/app/debugset`.
### Enable Debug Output
```json
{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}
```
### Output Structure
```
debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
└── doc2/
    └── doctr/
        └── ...
```
Each `.txt` file contains the OCR-extracted text for that page.
### Docker Mount
Add the debugset volume to your docker run command:
```bash
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```
### Use Cases
- **Compare OCR engines**: Run same pages through PaddleOCR, DocTR, EasyOCR with `save_output=True`, then diff results
- **Debug hyperparameters**: See how different settings affect text extraction
- **Ground truth comparison**: Compare predictions against expected output
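To compare engines or settings, the per-page text files can be loaded back from the debugset layout shown above. A sketch (the `load_predictions` helper is hypothetical; engine subfolder names other than `doctr` assume each engine writes to its own subfolder):

```python
from pathlib import Path


def load_predictions(debugset: str, doc: str, engine: str) -> dict:
    """Read per-page OCR text from debugset/<doc>/<engine>/page_*.txt.

    Returns a mapping like {"page_0005": "...extracted text..."}.
    """
    pages = {}
    for txt in sorted(Path(debugset, doc, engine).glob("page_*.txt")):
        pages[txt.stem] = txt.read_text(encoding="utf-8")
    return pages
```

Loading two engines' dicts for the same document and diffing matching page keys gives a quick side-by-side comparison.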
## Hyperparameters
### Processing Flags (Require Model Reinitialization)
| Parameter | Default | Description |
|-----------|---------|-------------|
| `assume_straight_pages` | true | Skip rotation handling for straight documents |
| `straighten_pages` | false | Pre-straighten pages before detection |
| `preserve_aspect_ratio` | true | Maintain document proportions during resize |
| `symmetric_pad` | true | Use symmetric padding when preserving aspect ratio |
**Note:** Changing these flags requires model reinitialization (~2-5s).
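A tuner can exploit this by ordering trials so that configurations sharing the four processing flags run consecutively, paying the rebuild cost only when necessary. A sketch of the client-side check (the `needs_reinit` helper is an assumption; the actual service-side logic lives in `doctr_tuning_rest.py` and may differ):

```python
# The four processing flags that force a model rebuild (per the table above).
REINIT_FLAGS = (
    "assume_straight_pages",
    "straighten_pages",
    "preserve_aspect_ratio",
    "symmetric_pad",
)


def needs_reinit(current: dict, requested: dict) -> bool:
    """True if the requested config differs from the current one
    on any flag that requires model reinitialization."""
    return any(current.get(f) != requested.get(f) for f in REINIT_FLAGS)
```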
### Orientation Flags
| Parameter | Default | Description |
|-----------|---------|-------------|
| `disable_page_orientation` | false | Skip page orientation classification |
| `disable_crop_orientation` | false | Skip crop orientation detection |
### Output Grouping
| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `resolve_lines` | true | bool | Group words into lines |
| `resolve_blocks` | false | bool | Group lines into blocks |
| `paragraph_break` | 0.035 | 0.0-1.0 | Minimum space ratio separating paragraphs |
## Model Architecture
DocTR uses a two-stage pipeline:
1. **Detection** (`det_arch`): localizes text regions
   - Default: `db_resnet50` (DBNet with a ResNet-50 backbone)
   - Alternatives: `linknet_resnet18`, `db_mobilenet_v3_large`
2. **Recognition** (`reco_arch`): recognizes characters
   - Default: `crnn_vgg16_bn` (CRNN with a VGG-16 backbone)
   - Alternatives: `sar_resnet31`, `master`, `vitstr_small`

Both architectures are set via environment variables and are fixed at startup.
## GPU Support
### Platform Support
| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 (amd64) | ✅ | ✅ PyTorch CUDA |
| Linux ARM64 (GH200/GB200/DGX Spark) | ✅ | ✅ PyTorch CUDA (cu128 index) |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
### PyTorch CUDA on ARM64
Unlike PaddlePaddle, PyTorch provides **official ARM64 CUDA wheels** on the cu128 index:
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```
This works on both amd64 and arm64 platforms with CUDA support.
### GPU Detection
DocTR automatically uses GPU when available:
```python
import torch
print(torch.cuda.is_available()) # True if GPU available
# DocTR model moves to GPU
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
model = model.cuda()
```
The `/health` endpoint shows GPU status:
```json
{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}
```
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `DOCTR_DET_ARCH` | `db_resnet50` | Detection architecture |
| `DOCTR_RECO_ARCH` | `crnn_vgg16_bn` | Recognition architecture |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU device selection |
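At startup the service presumably resolves these variables with the documented defaults; a sketch of that lookup (the `resolve_archs` helper is hypothetical, and the actual handling lives in `doctr_tuning_rest.py`):

```python
def resolve_archs(env: dict) -> tuple:
    """Resolve detection/recognition architectures from an
    environment-style mapping, falling back to the documented defaults."""
    return (
        env.get("DOCTR_DET_ARCH", "db_resnet50"),
        env.get("DOCTR_RECO_ARCH", "crnn_vgg16_bn"),
    )
```

To override at launch, pass e.g. `-e DOCTR_DET_ARCH=db_mobilenet_v3_large` to `docker run`; since the architectures are fixed at startup, changing them requires restarting the container.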
## CI/CD
Built images available from registry:
| Image | Architecture |
|-------|--------------|
| `seryus.ddns.net/unir/doctr-cpu:latest` | amd64, arm64 |
| `seryus.ddns.net/unir/doctr-gpu:latest` | amd64, arm64 |
## Sources
- [DocTR Documentation](https://mindee.github.io/doctr/)
- [DocTR GitHub](https://github.com/mindee/doctr)
- [DocTR Model Usage](https://mindee.github.io/doctr/latest/using_doctr/using_models.html)
- [PyTorch ARM64 CUDA Wheels](https://github.com/pytorch/pytorch/issues/160162)