PaddleOCR, EasyOCR, and DocTR GPU support (#4)
All checks were successful
build_docker / essential (push) Successful in 0s
build_docker / build_cpu (push) Successful in 5m0s
build_docker / build_gpu (push) Successful in 22m55s
build_docker / build_easyocr (push) Successful in 18m47s
build_docker / build_easyocr_gpu (push) Successful in 19m0s
build_docker / build_raytune (push) Successful in 3m27s
build_docker / build_doctr (push) Successful in 19m42s
build_docker / build_doctr_gpu (push) Successful in 14m49s
This commit was merged in pull request #4.
src/doctr_service/README.md: 261 lines added (new file)

# DocTR Tuning REST API

REST API service for DocTR (Document Text Recognition) hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start

### CPU Version

```bash
cd src/doctr_service

# Build
docker build -t doctr-api:cpu .

# Run
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu

# Test
curl http://localhost:8003/health
```

### GPU Version

```bash
# Build GPU image
docker build -f Dockerfile.gpu -t doctr-api:gpu .

# Run with GPU
docker run -d -p 8003:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:gpu
```

## Files

| File | Description |
|------|-------------|
| `doctr_tuning_rest.py` | FastAPI REST service with 9 tunable hyperparameters |
| `dataset_manager.py` | Dataset loader (shared with other services) |
| `Dockerfile` | CPU-only image (amd64 + arm64) |
| `Dockerfile.gpu` | GPU/CUDA image (amd64 + arm64) |
| `requirements.txt` | Python dependencies |

## API Endpoints

### `GET /health`

Check whether the service is ready.

```json
{
  "status": "ok",
  "model_loaded": true,
  "dataset_loaded": true,
  "dataset_size": 24,
  "det_arch": "db_resnet50",
  "reco_arch": "crnn_vgg16_bn",
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10"
}
```
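
A tuning driver typically waits for this endpoint before sending work. Below is a minimal polling sketch, assuming the host port mapping from the Quick Start (8003), the `model_loaded`/`dataset_loaded` fields shown above, and the `requests` package being installed; it is an illustration, not part of the service.

```python
# Minimal sketch: block until /health reports the model and dataset are loaded.
# Assumes the service is published on localhost:8003 as in the Quick Start.
import time

import requests


def wait_until_ready(base_url: str = "http://localhost:8003", timeout_s: float = 120.0) -> dict:
    """Poll GET /health until the model and dataset are loaded, then return the payload."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            health = requests.get(f"{base_url}/health", timeout=5).json()
            if health.get("model_loaded") and health.get("dataset_loaded"):
                return health
        except requests.RequestException:
            pass  # container may still be starting up
        time.sleep(2)
    raise TimeoutError("DocTR service did not become ready in time")


if __name__ == "__main__":
    print(wait_until_ready())
```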

### `POST /evaluate`

Run an OCR evaluation with the given hyperparameters.

**Request (9 tunable parameters):**
```json
{
  "pdf_folder": "/app/dataset",
  "assume_straight_pages": true,
  "straighten_pages": false,
  "preserve_aspect_ratio": true,
  "symmetric_pad": true,
  "disable_page_orientation": false,
  "disable_crop_orientation": false,
  "resolve_lines": true,
  "resolve_blocks": false,
  "paragraph_break": 0.035,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**
```json
{
  "CER": 0.0189,
  "WER": 0.1023,
  "TIME": 52.3,
  "PAGES": 5,
  "TIME_PER_PAGE": 10.46,
  "model_reinitialized": false
}
```

**Note:** `model_reinitialized` indicates whether the model was reloaded due to changed processing flags (adds ~2-5s overhead).
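
Because the model stays loaded between calls, a hyperparameter search can simply POST one candidate configuration at a time. A minimal sketch of such a loop, assuming the service is reachable at `http://localhost:8003` and using the request/response fields shown above; the candidate values are illustrative, not a recommended search space.

```python
# Minimal sketch: evaluate a few candidate configurations against POST /evaluate.
# Assumes the service runs on localhost:8003; the candidates are illustrative only.
import requests

BASE_URL = "http://localhost:8003"


def evaluate(params: dict) -> dict:
    """POST one hyperparameter configuration and return the metrics payload."""
    payload = {"pdf_folder": "/app/dataset", "start_page": 5, "end_page": 10, **params}
    response = requests.post(f"{BASE_URL}/evaluate", json=payload, timeout=600)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    candidates = [
        {"resolve_lines": True, "paragraph_break": 0.035},
        {"resolve_lines": True, "resolve_blocks": True, "paragraph_break": 0.05},
    ]
    for params in candidates:
        metrics = evaluate(params)
        print(params, "->", {k: metrics[k] for k in ("CER", "WER", "TIME_PER_PAGE")})
```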

## Debug Output (debugset)

The `debugset` folder allows saving OCR predictions for debugging and analysis. When `save_output=True` is passed to `/evaluate`, predictions are written to `/app/debugset`.

### Enable Debug Output

```json
{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}
```

### Output Structure

```
debugset/
├── doc1/
│   └── doctr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── doctr/
│       └── ...
```

Each `.txt` file contains the OCR-extracted text for that page.

### Docker Mount

Add the debugset volume to your `docker run` command:

```bash
docker run -d -p 8003:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v $(pwd)/../debugset:/app/debugset:rw \
  -v doctr-cache:/root/.cache/doctr \
  doctr-api:cpu
```

### Use Cases

- **Compare OCR engines**: Run the same pages through PaddleOCR, DocTR, and EasyOCR with `save_output=True`, then diff the results (see the sketch below)
- **Debug hyperparameters**: See how different settings affect text extraction
- **Ground truth comparison**: Compare predictions against the expected output
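
For the engine comparison above, here is a minimal sketch based on the output structure shown earlier. The `easyocr` subfolder name is an assumption for whatever the other OCR service writes alongside `doctr/`; adjust it to the real layout.

```python
# Minimal sketch: diff the per-page text of two engines for one document.
# Assumes the debugset layout shown above; "easyocr" is a hypothetical sibling
# folder name -- use whatever subfolder the other OCR service actually writes.
import difflib
from pathlib import Path


def diff_page(doc_dir: Path, page: str, engine_a: str = "doctr", engine_b: str = "easyocr") -> str:
    """Return a unified diff of one page's text as produced by two engines."""
    lines_a = (doc_dir / engine_a / page).read_text(encoding="utf-8").splitlines()
    lines_b = (doc_dir / engine_b / page).read_text(encoding="utf-8").splitlines()
    return "\n".join(difflib.unified_diff(lines_a, lines_b, fromfile=engine_a, tofile=engine_b, lineterm=""))


if __name__ == "__main__":
    print(diff_page(Path("debugset/doc1"), "page_0005.txt"))
```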

## Hyperparameters

### Processing Flags (Require Model Reinitialization)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `assume_straight_pages` | true | Skip rotation handling for straight documents |
| `straighten_pages` | false | Pre-straighten pages before detection |
| `preserve_aspect_ratio` | true | Maintain document proportions during resize |
| `symmetric_pad` | true | Use symmetric padding when preserving aspect ratio |

**Note:** Changing these flags requires model reinitialization (~2-5s).

### Orientation Flags

| Parameter | Default | Description |
|-----------|---------|-------------|
| `disable_page_orientation` | false | Skip page orientation classification |
| `disable_crop_orientation` | false | Skip crop orientation detection |

### Output Grouping

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `resolve_lines` | true | bool | Group words into lines |
| `resolve_blocks` | false | bool | Group lines into blocks |
| `paragraph_break` | 0.035 | 0.0-1.0 | Minimum space ratio separating paragraphs |
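
Taken together, the tables above define the nine tunable parameters. Below is a minimal sketch of how a tuner might encode that space and draw configurations to send as `/evaluate` request bodies; the bounds simply mirror the documented defaults and ranges and are not a tuned recommendation.

```python
# Illustrative encoding of the 9 tunable parameters described above; the bounds
# mirror the documented defaults/ranges, not a recommended search space.
import random

SEARCH_SPACE = {
    "assume_straight_pages": [True, False],
    "straighten_pages": [True, False],
    "preserve_aspect_ratio": [True, False],
    "symmetric_pad": [True, False],
    "disable_page_orientation": [True, False],
    "disable_crop_orientation": [True, False],
    "resolve_lines": [True, False],
    "resolve_blocks": [True, False],
}
PARAGRAPH_BREAK_RANGE = (0.0, 1.0)  # continuous, default 0.035


def sample_configuration(rng: random.Random) -> dict:
    """Draw one random configuration to send as the /evaluate request body."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    config["paragraph_break"] = rng.uniform(*PARAGRAPH_BREAK_RANGE)
    return config


if __name__ == "__main__":
    print(sample_configuration(random.Random(0)))
```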

## Model Architecture

DocTR uses a two-stage pipeline:

1. **Detection** (`det_arch`): Localizes text regions
   - Default: `db_resnet50` (DBNet with a ResNet-50 backbone)
   - Alternatives: `linknet_resnet18`, `db_mobilenet_v3_large`

2. **Recognition** (`reco_arch`): Recognizes characters
   - Default: `crnn_vgg16_bn` (CRNN with a VGG-16 backbone)
   - Alternatives: `sar_resnet31`, `master`, `vitstr_small`

Both architectures are set via environment variables and are fixed at startup (see the sketch below).
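
A minimal sketch of how a service can build the two-stage predictor from the `DOCTR_DET_ARCH` / `DOCTR_RECO_ARCH` variables listed under Environment Variables; it follows the documented DocTR API but is not necessarily the exact code of `doctr_tuning_rest.py`.

```python
# Minimal sketch: build the predictor from environment variables at startup.
# The actual service code may differ in detail.
import os

from doctr.models import ocr_predictor

det_arch = os.environ.get("DOCTR_DET_ARCH", "db_resnet50")
reco_arch = os.environ.get("DOCTR_RECO_ARCH", "crnn_vgg16_bn")

# Architectures stay fixed for the lifetime of the process; only the
# per-request processing flags vary between /evaluate calls.
model = ocr_predictor(det_arch=det_arch, reco_arch=reco_arch, pretrained=True)
```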

## GPU Support

### Platform Support

| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 (amd64) | ✅ | ✅ PyTorch CUDA |
| Linux ARM64 (GH200/GB200/DGX Spark) | ✅ | ✅ PyTorch CUDA (cu128 index) |
| macOS ARM64 (M1/M2) | ✅ | ❌ |

### PyTorch CUDA on ARM64

Unlike PaddlePaddle, PyTorch provides **official ARM64 CUDA wheels** on the cu128 index:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```

This works on both amd64 and arm64 platforms with CUDA support.

### GPU Detection

DocTR automatically uses the GPU when it is available:

```python
import torch
from doctr.models import ocr_predictor

print(torch.cuda.is_available())  # True if a GPU is available

# Move the DocTR model to the GPU
model = ocr_predictor(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()
```

The `/health` endpoint shows GPU status:
```json
{
  "cuda_available": true,
  "device": "cuda",
  "gpu_name": "NVIDIA GB10",
  "gpu_memory_total": "128.00 GB"
}
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DOCTR_DET_ARCH` | `db_resnet50` | Detection architecture |
| `DOCTR_RECO_ARCH` | `crnn_vgg16_bn` | Recognition architecture |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU device selection |

## CI/CD

Built images are available from the registry:

| Image | Architecture |
|-------|--------------|
| `seryus.ddns.net/unir/doctr-cpu:latest` | amd64, arm64 |
| `seryus.ddns.net/unir/doctr-gpu:latest` | amd64, arm64 |

## Sources

- [DocTR Documentation](https://mindee.github.io/doctr/)
- [DocTR GitHub](https://github.com/mindee/doctr)
- [DocTR Model Usage](https://mindee.github.io/doctr/latest/using_doctr/using_models.html)
- [PyTorch ARM64 CUDA Wheels](https://github.com/pytorch/pytorch/issues/160162)