2026-01-17 10:24:00 +01:00
# PaddleOCR Tuning REST API
REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.
## Quick Start with Docker Compose
Docker Compose manages building and running containers. The `docker-compose.yml` defines two services:
- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires NVIDIA GPU + Container Toolkit)
### Run CPU Version
```bash
cd src/paddle_ocr
# Build and start (first time takes ~2-3 min to build, ~30s to load model)
docker compose up ocr-cpu
# Or run in background (detached)
docker compose up -d ocr-cpu
# View logs
docker compose logs -f ocr-cpu
# Stop
docker compose down
```
### Run GPU Version
```bash
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
```
### Test the API
Once running, test with:
```bash
# Check health
curl http://localhost:8000/health
# Or use the test script
pip install requests
python test.py --url http://localhost:8000
```
### What Docker Compose Does
```
docker compose up ocr-cpu
│
├─► Builds image from Dockerfile.cpu (if not exists)
├─► Creates container "paddle-ocr-cpu"
├─► Mounts ../dataset → /app/dataset (your PDF images)
├─► Mounts paddlex-cache volume (persists downloaded models)
├─► Exposes port 8000
└─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
```
## Files
| File | Description |
|------|-------------|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (multi-arch) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.build-paddle` | PaddlePaddle GPU wheel builder for ARM64 |
| `docker-compose.yml` | Service orchestration |
| `wheels/` | Local PaddlePaddle wheels (created by build-paddle) |
## API Endpoints
### `GET /health`
Check if service is ready.
```json
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
```
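Because the model takes a while to load at startup, a client may want to wait for `model_loaded` before sending work. Below is a minimal polling sketch, assuming the response shape shown above. The helper names `is_ready`, `fetch_health`, and `wait_until_ready`, as well as the retry count and delay, are illustrative and not part of the service.

```python
import json
import time
import urllib.request


def is_ready(health: dict) -> bool:
    """Return True when /health reports status "ok" and a loaded model."""
    return health.get("status") == "ok" and health.get("model_loaded") is True


def fetch_health(url: str = "http://localhost:8000/health") -> dict:
    """GET /health and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_health, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll until the service reports ready or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            # Connection refused or timed out while the container is still starting.
            pass
        time.sleep(delay)
    return False
```

Passing `fetch` as an argument keeps the polling loop independent of the HTTP client, so it can be swapped for `requests` or stubbed out.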
### `POST /evaluate`
Run OCR evaluation with given hyperparameters.
**Request:**
```json
{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}
```
**Response:**
```json
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
```
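For scripted use outside Ray Tune, the request can be built and sent from Python. The sketch below uses only the standard library and copies the field names and values from the request example above. `EXAMPLE_PARAMS`, `build_payload`, and `evaluate` are hypothetical helper names, and the 600-second timeout is an assumption chosen to cover multi-minute evaluations.

```python
import json
import urllib.request

# Field names and values copied from the request example above.
EXAMPLE_PARAMS = {
    "pdf_folder": "/app/dataset",
    "textline_orientation": True,
    "use_doc_orientation_classify": False,
    "use_doc_unwarping": False,
    "text_det_thresh": 0.469,
    "text_det_box_thresh": 0.5412,
    "text_det_unclip_ratio": 0.0,
    "text_rec_score_thresh": 0.635,
    "start_page": 5,
    "end_page": 10,
}


def build_payload(**overrides) -> dict:
    """Copy the example parameters and apply overrides, rejecting unknown keys."""
    unknown = set(overrides) - set(EXAMPLE_PARAMS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**EXAMPLE_PARAMS, **overrides}


def evaluate(payload: dict, url: str = "http://localhost:8000/evaluate") -> dict:
    """POST the payload as JSON and return the decoded metrics."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Evaluations can take minutes, so use a generous timeout.
    with urllib.request.urlopen(request, timeout=600) as resp:
        return json.load(resp)
```

A call such as `evaluate(build_payload(text_det_thresh=0.3))` would return a dictionary with the `CER`, `WER`, and timing keys shown in the response example.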
### `POST /evaluate_full`
Same as `/evaluate` but runs on ALL pages (ignores `start_page`/`end_page`).
## Building Images
### CPU Image (Multi-Architecture)
```bash
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
# Multi-arch build with buildx (amd64 + arm64)
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
--platform linux/amd64,linux/arm64 \
-t paddle-ocr-api:cpu \
--push .
```
### GPU Image (x86_64 only)
```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
```
> **Note:** PaddlePaddle GPU 3.x packages are **not on PyPI**. The Dockerfile installs from PaddlePaddle's official CUDA index (`paddlepaddle.org.cn/packages/stable/cu126/`). This is handled automatically during build.
## Running
### CPU (Any machine)
```bash
docker run -d -p 8000:8000 \
-v $(pwd)/../dataset:/app/dataset:ro \
-v paddlex-cache:/root/.paddlex \
paddle-ocr-api:cpu
```
### GPU (NVIDIA)
```bash
docker run -d -p 8000:8000 --gpus all \
-v $(pwd)/../dataset:/app/dataset:ro \
-v paddlex-cache:/root/.paddlex \
paddle-ocr-api:gpu
```
## GPU Support Analysis
### Host System Reference (DGX Spark)
This section documents GPU support findings based on testing on an NVIDIA DGX Spark:
| Component | Value |
|-----------|-------|
| Architecture | ARM64 (aarch64) |
| CPU | NVIDIA Grace (ARM) |
| GPU | NVIDIA GB10 |
| CUDA Version | 13.0 |
| Driver | 580.95.05 |
| OS | Ubuntu 24.04 LTS |
| Container Toolkit | nvidia-container-toolkit 1.18.1 |
| Docker | 28.5.1 |
| Docker Compose | v2.40.0 |
### PaddlePaddle GPU Platform Support
**Critical Finding:** PaddlePaddle-GPU does **NOT** support ARM64/aarch64 architecture.
| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| Windows x64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| macOS x64 | ✅ | ❌ |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
| Linux ARM64 (Jetson/DGX) | ✅ | ❌ No wheels |
**Source:** [PaddlePaddle-GPU PyPI](https://pypi.org/project/paddlepaddle-gpu/) - only `manylinux_x86_64` and `win_amd64` wheels available.
### Why GPU Doesn't Work on ARM64
1. **No prebuilt wheels**: `pip install paddlepaddle-gpu` fails on ARM64 - no compatible wheels exist
2. **Not a CUDA issue**: The NVIDIA CUDA base images work fine on ARM64 (`nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04`)
3. **Not a container toolkit issue**: `nvidia-container-toolkit` is installed and functional
4. **PaddlePaddle limitation**: The Paddle team hasn't compiled GPU wheels for ARM64
When you run `pip install paddlepaddle-gpu` on ARM64:
```
ERROR: No matching distribution found for paddlepaddle-gpu
```
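The platform constraints above can be turned into a small selection helper for scripts that need to decide which Compose service to start. This is a sketch only: `pick_service` is a hypothetical name, and checking for an `nvidia-smi` binary on the `PATH` is a rough heuristic for "GPU present", not a guarantee that Docker can reach the GPU.

```python
import platform
import shutil


def pick_service(machine: str, has_nvidia_gpu: bool, has_local_gpu_wheel: bool = False) -> str:
    """Map host facts to one of the Compose service names used in this README."""
    arch = machine.lower()
    if not has_nvidia_gpu:
        return "ocr-cpu"
    if arch in ("x86_64", "amd64"):
        return "ocr-gpu"  # prebuilt GPU wheels exist for x86_64
    if arch in ("aarch64", "arm64") and has_local_gpu_wheel:
        return "ocr-gpu"  # ARM64 GPU needs a locally built wheel
    return "ocr-cpu"  # ARM64 without a wheel, or an unknown architecture


if __name__ == "__main__":
    # Assumption: a visible nvidia-smi binary means an NVIDIA GPU is present.
    gpu_present = shutil.which("nvidia-smi") is not None
    print(pick_service(platform.machine(), gpu_present))
```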
### Options for ARM64 Systems
#### Option 1: CPU-Only (Recommended)
Use `Dockerfile.cpu` which works on ARM64:
```bash
# On DGX Spark
docker compose up ocr-cpu
# Or build directly
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
```
**Performance:** CPU inference on ARM64 Grace is surprisingly fast due to high core count. Expect ~2-5 seconds per page.
#### Option 2: Build PaddlePaddle from Source (Docker-based)
Use the included Docker builder to compile PaddlePaddle GPU for ARM64:
```bash
cd src/paddle_ocr
# Step 1: Build the PaddlePaddle GPU wheel (one-time, 2-4 hours)
docker compose --profile build run --rm build-paddle
# Verify wheel was created
ls -la wheels/paddlepaddle*.whl
# Step 2: Build the GPU image (uses local wheel)
docker compose build ocr-gpu
# Step 3: Run with GPU
docker compose up ocr-gpu
# Verify GPU is working
docker compose exec ocr-gpu python -c "import paddle; print(paddle.device.is_compiled_with_cuda())"
```
**What this does:**
1. `build-paddle` compiles PaddlePaddle from source inside a CUDA container
2. The wheel is saved to `./wheels/` directory
3. `Dockerfile.gpu` detects the local wheel and installs it instead of downloading a prebuilt package
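Before starting the GPU image build, it can be useful to confirm from the host that a wheel is actually present. The sketch below is a host-side pre-check only; the real detection logic lives in `Dockerfile.gpu` and is not reproduced here. `find_local_wheel` is a hypothetical helper, and the `paddlepaddle*.whl` pattern is taken from the `ls` command above.

```python
from pathlib import Path
from typing import Optional


def find_local_wheel(wheels_dir: str = "wheels") -> Optional[Path]:
    """Return the most recently modified paddlepaddle wheel, or None if there is none."""
    candidates = sorted(
        Path(wheels_dir).glob("paddlepaddle*.whl"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    wheel = find_local_wheel()
    print(wheel if wheel else "No local wheel found; run the build-paddle step first.")
```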
**Caveats:**
- Build takes 2-4 hours on first run
- Requires ~20GB disk space during build
- Not officially supported by PaddlePaddle team
- May need adjustments for future PaddlePaddle versions
See: [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327)
#### Option 3: Alternative OCR Engines
For ARM64 GPU acceleration, consider alternatives:
| Engine | ARM64 GPU | Notes |
|--------|-----------|-------|
| **Tesseract** | ❌ CPU-only | Good fallback, widely available |
| **EasyOCR** | ⚠️ Via PyTorch | PyTorch has ARM64 GPU support |
| **TrOCR** | ⚠️ Via Transformers | Hugging Face Transformers + PyTorch |
| **docTR** | ⚠️ Via TensorFlow/PyTorch | Both backends have ARM64 support |
EasyOCR with PyTorch is a viable alternative:
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install easyocr
```
### x86_64 GPU Setup (Working)
For x86_64 systems with an NVIDIA GPU, the GPU Docker image works:
```bash
# Verify GPU is accessible
nvidia-smi
# Verify Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Build and run GPU version
docker compose up ocr-gpu
```
### GPU Docker Compose Configuration
The `docker-compose.yml` configures GPU access via:
```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```
This requires Docker Compose v2 and nvidia-container-toolkit.
## DGX Spark / ARM64 Quick Start
For ARM64 systems (DGX Spark, Jetson, Graviton), use CPU-only:
```bash
cd src/paddle_ocr
# Build ARM64-native CPU image
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .
# Run
docker run -d -p 8000:8000 \
-v $(pwd)/../dataset:/app/dataset:ro \
paddle-ocr-api:arm64
# Test
curl http://localhost:8000/health
```
### Cross-Compile from x86_64
Build ARM64 images from an x86_64 machine:
```bash
# Setup buildx for multi-arch
docker buildx create --name mybuilder --use
# Build ARM64 image from x86_64 machine
docker buildx build -f Dockerfile.cpu \
--platform linux/arm64 \
-t paddle-ocr-api:arm64 \
--load .
# Save and transfer to DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/
# On DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```
## Using with Ray Tune
Update your notebook's `trainable_paddle_ocr` function:
```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"


def trainable_paddle_ocr(config):
    """Call OCR API instead of subprocess."""
    # Map the Ray Tune config onto the /evaluate request body.
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }
    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics=metrics)
    except Exception as e:
        # Report worst-case scores so a failed trial ranks last instead of crashing the search.
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```
## Architecture: Model Lifecycle
The model is loaded **once** at container startup and stays in memory for all requests:
```mermaid
flowchart TB
subgraph Container["Docker Container Lifecycle"]
Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
Load --> Ready[API Ready<br/>Models in RAM ~500MB]
subgraph Requests["Incoming Requests - Models Stay Loaded"]
Ready --> R1[Request 1] --> Ready
Ready --> R2[Request 2] --> Ready
Ready --> RN[Request N...] --> Ready
end
Ready --> Stop([Container Stop])
Stop --> Free[Models Freed]
end
style Load fill:#f9f,stroke:#333
style Ready fill:#9f9,stroke:#333
style Requests fill:#e8f4ea,stroke:#090
```
**Subprocess vs REST API comparison:**
```mermaid
flowchart LR
subgraph Subprocess["❌ Subprocess Approach"]
direction TB
S1[Trial 1] --> L1[Load Model ~10s]
L1 --> E1[Evaluate ~60s]
E1 --> U1[Unload]
U1 --> S2[Trial 2]
S2 --> L2[Load Model ~10s]
L2 --> E2[Evaluate ~60s]
end
subgraph REST["✅ REST API Approach"]
direction TB
Start2[Start Container] --> Load2[Load Model ~10s]
Load2 --> Ready2[Model in Memory]
Ready2 --> T1[Trial 1 ~60s]
T1 --> Ready2
Ready2 --> T2[Trial 2 ~60s]
T2 --> Ready2
Ready2 --> TN[Trial N ~60s]
end
style L1 fill:#faa
style L2 fill:#faa
style Load2 fill:#afa
style Ready2 fill:#afa
```
## Performance Comparison
| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|-------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| **REST API** | **Once** | **~0.1s** | **~5.8 hours** |
The REST API saves ~1+ hour by loading the model only once.
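For planning a search of a different size, a back-of-the-envelope estimate can be computed from the per-trial figures. The sketch below assumes trials run sequentially and that each trial costs a fixed evaluation time plus a fixed overhead. The 330.5-second evaluation time is taken from the sample `/evaluate` response, and `estimate_total_hours` is a hypothetical helper. Measured totals such as those in the table can come out higher than this model predicts, since it leaves out costs like interpreter startup, imports, and dataset loading.

```python
def estimate_total_hours(trials: int, eval_seconds: float, overhead_seconds: float) -> float:
    """Total wall-clock hours for sequential trials with a fixed per-trial overhead."""
    return trials * (eval_seconds + overhead_seconds) / 3600.0


# Evaluation time from the sample /evaluate response (TIME: 330.5 s).
rest_api = estimate_total_hours(64, 330.5, 0.1)
subprocess_run = estimate_total_hours(64, 330.5, 10.0)
print(f"REST API:   {rest_api:.2f} h")
print(f"Subprocess: {subprocess_run:.2f} h")
```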
## Troubleshooting
### Model download slow on first run
The first run downloads ~500MB of models. Mount the `paddlex-cache` volume to persist them across container restarts.
### Out of memory
Reduce `max_concurrent_trials` in Ray Tune, or increase container memory:
```bash
docker run --memory=8g ...
```
### GPU not detected
Ensure NVIDIA Container Toolkit is installed:
```bash
nvidia-smi # Should work
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi  # Should work
```
### PaddlePaddle GPU installation fails
PaddlePaddle 3.x GPU packages are **not available on PyPI**. They must be installed from PaddlePaddle's official index:
```bash
# For CUDA 12.x
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# For CUDA 11.8
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```
The Dockerfile.gpu handles this automatically.