# PaddleOCR Tuning REST API

REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start with Docker Compose

Docker Compose manages building and running containers. The `docker-compose.yml` defines two services:
- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires NVIDIA GPU + Container Toolkit)

### Run CPU Version

```bash
cd src/paddle_ocr

# Build and start (first time takes ~2-3 min to build, ~30s to load model)
docker compose up ocr-cpu

# Or run in background (detached)
docker compose up -d ocr-cpu

# View logs
docker compose logs -f ocr-cpu

# Stop
docker compose down
```

### Run GPU Version

```bash
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
```

### Test the API

Once running, test with:
```bash
# Check health
curl http://localhost:8000/health

# Or use the test script
pip install requests
python test.py --url http://localhost:8000
```

### What Docker Compose Does

```
docker compose up ocr-cpu
        │
        ├─► Builds image from Dockerfile.cpu (if not exists)
        ├─► Creates container "paddle-ocr-cpu"
        ├─► Mounts ../dataset → /app/dataset (your PDF images)
        ├─► Mounts paddlex-cache volume (persists downloaded models)
        ├─► Exposes port 8000
        └─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
```

## Files

| File | Description |
|------|-------------|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (multi-arch) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64) |
| `docker-compose.yml` | Service orchestration |

## API Endpoints

### `GET /health`
Check if service is ready.

```json
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
```

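Because the model takes a while to load, a client may want to poll `/health` before sending work. The following is a minimal sketch using only the standard library. The helper names (`fetch_health`, `is_ready`, `wait_until_ready`), the retry counts, and the choice to treat `status == "ok"` plus `model_loaded` as "ready" are assumptions for illustration, not part of the service.

```python
import json
import time
import urllib.request


def fetch_health(url="http://localhost:8000/health", timeout=5.0):
    """GET the health endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(health):
    """Assumed readiness rule: status is "ok" and the model is loaded."""
    return health.get("status") == "ok" and bool(health.get("model_loaded"))


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until the service is ready; return the health body, or None on timeout."""
    for _ in range(attempts):
        try:
            health = fetch()
            if is_ready(health):
                return health
        except (OSError, ValueError):
            # Connection refused or invalid JSON while the server is still starting
            pass
        sleep(delay)
    return None
```

Calling `wait_until_ready()` with its defaults polls for about a minute before giving up. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running container.
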
### `POST /evaluate`
Run OCR evaluation with given hyperparameters.

**Request:**
```json
{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**
```json
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
```

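The request and response above can be exercised from Python without extra dependencies. This is a sketch under stated assumptions: `build_payload`, `evaluate`, and `time_per_page` are hypothetical helper names, the defaults are simply the values from the sample request, and the URL assumes a local container on port 8000. In the sample response, `TIME_PER_PAGE` equals `TIME / PAGES`, and 5 pages for `start_page=5`, `end_page=10` suggests that `end_page` is exclusive; treat both as observations from the sample rather than documented behavior.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/evaluate"  # assumed local deployment


def build_payload(start_page=5, end_page=10, **overrides):
    """Return the sample request body, with selected hyperparameters overridden."""
    payload = {
        "pdf_folder": "/app/dataset",
        "textline_orientation": True,
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "text_det_thresh": 0.469,
        "text_det_box_thresh": 0.5412,
        "text_det_unclip_ratio": 0.0,
        "text_rec_score_thresh": 0.635,
        "start_page": start_page,
        "end_page": end_page,
    }
    # Reject misspelled keys instead of silently sending them to the API
    unknown = set(overrides) - set(payload)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    payload.update(overrides)
    return payload


def evaluate(payload, url=API_URL, timeout=600):
    """POST the payload as JSON and return the decoded metrics dictionary."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def time_per_page(metrics):
    """Recompute seconds per page from the TIME and PAGES fields."""
    return metrics["TIME"] / metrics["PAGES"]
```

A typical call would be `metrics = evaluate(build_payload(text_det_thresh=0.3))`, after which `metrics["CER"]` and `metrics["WER"]` hold the error rates.
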
### `POST /evaluate_full`
Same as `/evaluate` but runs on ALL pages (ignores `start_page`/`end_page`).

## Building Images

### CPU Image (Multi-Architecture)

```bash
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .

# Multi-arch build with buildx (amd64 + arm64)
# Note: --push needs a tag that points to a registry you can push to
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t paddle-ocr-api:cpu \
  --push .
```

### GPU Image (x86_64 only)

```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
```

## Running

### CPU (Any machine)

```bash
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:cpu
```

### GPU (NVIDIA)

```bash
docker run -d -p 8000:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:gpu
```

## DGX Spark (ARM64 + CUDA)

DGX Spark uses ARM64 (Grace CPU) with an NVIDIA Blackwell GPU. You have three options:

### Option 1: Native ARM64 Build (Recommended)

PaddlePaddle has ARM64 support. Build natively:

```bash
# On DGX Spark or ARM64 machine
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .
```

For GPU acceleration on ARM64, `Dockerfile.gpu` needs a base image that is published for ARM64. The base image it already uses is multi-arch, so the `FROM` line can stay as it is:

```dockerfile
# Base image line in Dockerfile.gpu (no change needed):
FROM nvcr.io/nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
# (same image works on ARM64 when pulled on ARM machine)
```

Then build on the DGX Spark:
```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu-arm64 .
```

### Option 2: x86_64 Emulation via QEMU (Slow)

You CAN run x86_64 images on ARM via emulation, but it's ~10-20x slower:

```bash
# On DGX Spark, enable QEMU emulation
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run x86_64 image with emulation
docker run --platform linux/amd64 -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:cpu
```

**Not recommended** for production due to severe performance penalty.

### Option 3: Cross-compile from x86_64

Build ARM64 images from your x86_64 machine:

```bash
# Setup buildx for multi-arch
docker buildx create --name mybuilder --use

# Build ARM64 image from x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/
# On DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```

## Using with Ray Tune

Update your notebook's `trainable_paddle_ocr` function:

```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call OCR API instead of subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }

    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics=metrics)
    except Exception as e:
        # Report worst-case error rates so a failed trial still yields a result
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```

## Architecture: Model Lifecycle

The model is loaded **once** at container startup and stays in memory for all requests:

```mermaid
flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
        Load --> Ready[API Ready<br/>Models in RAM ~500MB]

        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end

        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
```

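The load-once lifecycle in the diagram can be illustrated without FastAPI or PaddleOCR. The sketch below is not the service's code: `FakeModel` and `Service` are hypothetical stand-ins that only show the pattern of building the model at startup and reusing the same instance for every request.

```python
class FakeModel:
    """Hypothetical stand-in for the OCR model; counts how often it is built."""

    loads = 0

    def __init__(self):
        # In the real service this is where the expensive model load would happen
        FakeModel.loads += 1

    def evaluate(self, params):
        """Echo the parameters back instead of running OCR."""
        return {"params": params}


class Service:
    """Holds one model instance for the lifetime of the process."""

    def __init__(self, loader):
        self._loader = loader
        self.model = None

    def startup(self):
        """Run once when the container starts: pay the load cost here."""
        self.model = self._loader()

    def handle(self, params):
        """Serve a request with the already-loaded model."""
        if self.model is None:
            raise RuntimeError("service not started")
        return self.model.evaluate(params)


service = Service(loader=FakeModel)
service.startup()
for thresh in (0.3, 0.4, 0.5):
    service.handle({"text_det_thresh": thresh})
```

After the loop, `FakeModel.loads` is still 1: three requests were served by a single load, which is the behavior the REST approach relies on.
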
**Subprocess vs REST API comparison:**

```mermaid
flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa
```

## Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|-------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| **REST API** | **Once** | **~0.1s** | **~5.8 hours** |

Over 64 trials, the REST API saves more than an hour by loading the model only once.

## Troubleshooting

### Model download slow on first run
The first run downloads ~500MB of models. Mount the `paddlex-cache` volume so they persist across container restarts.

### Out of memory
Reduce `max_concurrent_trials` in Ray Tune, or increase container memory:
```bash
docker run --memory=8g ...
```

### GPU not detected
Ensure NVIDIA Container Toolkit is installed:
```bash
nvidia-smi  # Should work on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi  # Should work inside a container
```