# PaddleOCR Tuning REST API

REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start with Docker Compose

Docker Compose manages building and running containers. The `docker-compose.yml` defines two services:
- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires an NVIDIA GPU + the NVIDIA Container Toolkit)

### Run CPU Version

```bash
cd src/paddle_ocr

# Build and start (first time takes ~2-3 min to build, ~30s to load the model)
docker compose up ocr-cpu

# Or run in the background (detached)
docker compose up -d ocr-cpu

# View logs
docker compose logs -f ocr-cpu

# Stop
docker compose down
```

### Run GPU Version

```bash
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
```

### Test the API

Once running, test with:
```bash
# Check health
curl http://localhost:8000/health

# Or use the test script
pip install requests
python test.py --url http://localhost:8000
```

### What Docker Compose Does

```
docker compose up ocr-cpu
│
├─► Builds the image from Dockerfile.cpu (if it doesn't already exist)
├─► Creates container "paddle-ocr-cpu"
├─► Mounts ../dataset → /app/dataset (your PDF images)
├─► Mounts the paddlex-cache volume (persists downloaded models)
├─► Exposes port 8000
└─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
```

## Files

| File | Description |
|------|-------------|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (multi-arch) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64) |
| `docker-compose.yml` | Service orchestration |

## API Endpoints

### `GET /health`
Check if the service is ready.

```json
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
```
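
Before launching a long tuning run, it's worth blocking until `model_loaded` and `dataset_loaded` are both true. A minimal polling sketch against the response above (the URL and timeout are assumptions, adjust to your setup):

```python
import time

import requests

API_BASE = "http://localhost:8000"  # assumed port mapping from the compose file

def wait_until_ready(timeout_s: float = 120.0) -> None:
    """Poll GET /health until the model and dataset are loaded."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            health = requests.get(f"{API_BASE}/health", timeout=5).json()
            if health.get("model_loaded") and health.get("dataset_loaded"):
                print(f"Ready: dataset_size={health.get('dataset_size')}")
                return
        except requests.ConnectionError:
            pass  # container still starting; keep polling
        time.sleep(2)
    raise TimeoutError("OCR service did not become ready in time")

wait_until_ready()
```
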
### `POST /evaluate`
Run an OCR evaluation with the given hyperparameters.

**Request:**
```json
{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**
```json
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
```
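
For a one-off check outside any tuning framework, here is a minimal sketch of calling `/evaluate` with `requests`, using the request fields documented above (the hyperparameter values are illustrative):

```python
import requests

# Illustrative hyperparameters; field names follow the request schema above.
payload = {
    "pdf_folder": "/app/dataset",
    "textline_orientation": True,
    "use_doc_orientation_classify": False,
    "use_doc_unwarping": False,
    "text_det_thresh": 0.469,
    "text_det_box_thresh": 0.5412,
    "text_det_unclip_ratio": 0.0,
    "text_rec_score_thresh": 0.635,
    "start_page": 5,
    "end_page": 10,
}

resp = requests.post("http://localhost:8000/evaluate", json=payload, timeout=600)
resp.raise_for_status()
metrics = resp.json()
print(f"CER={metrics['CER']:.4f} WER={metrics['WER']:.4f} "
      f"({metrics['TIME_PER_PAGE']:.1f}s/page over {metrics['PAGES']} pages)")
```
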
### `POST /evaluate_full`
Same as `/evaluate` but runs on ALL pages (ignores `start_page`/`end_page`).

## Building Images

### CPU Image (Multi-Architecture)

```bash
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .

# Multi-arch build with buildx (amd64 + arm64)
# Note: --push requires a registry-qualified tag, e.g. <registry>/paddle-ocr-api:cpu
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t paddle-ocr-api:cpu \
  --push .
```

### GPU Image (x86_64 only)

```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
```

## Running

### CPU (Any machine)

```bash
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:cpu
```

### GPU (NVIDIA)

```bash
docker run -d -p 8000:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:gpu
```
## DGX Spark (ARM64 + CUDA)

DGX Spark pairs an ARM64 Grace CPU with an NVIDIA GPU. You have three options:

### Option 1: Native ARM64 Build (Recommended)

PaddlePaddle has ARM64 support, so you can build natively:

```bash
# On DGX Spark or another ARM64 machine
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .
```

For GPU acceleration on ARM64, `Dockerfile.gpu` needs no changes: its CUDA base image is multi-arch, so the same `FROM` line resolves to the ARM64 variant when the image is built on an ARM machine:

```dockerfile
# Base image in Dockerfile.gpu; Docker pulls the ARM64 variant
# automatically when building on an ARM64 host.
FROM nvcr.io/nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
```

Then build on the DGX Spark:
```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu-arm64 .
```
### Option 2: x86_64 Emulation via QEMU (Slow)

You CAN run x86_64 images on ARM via emulation, but it's ~10-20x slower:

```bash
# On DGX Spark, enable QEMU emulation
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run the x86_64 image under emulation
docker run --platform linux/amd64 -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:cpu
```

**Not recommended** for production due to the severe performance penalty.

### Option 3: Cross-compile from x86_64

Build ARM64 images from your x86_64 machine:

```bash
# Set up buildx for multi-arch builds
docker buildx create --name mybuilder --use

# Build an ARM64 image from the x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to the DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/

# On the DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```
## Using with Ray Tune

Update your notebook's `trainable_paddle_ocr` function:

```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call the OCR API instead of a subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }

    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics)
    except Exception as e:
        # Report a worst-case score so failed trials rank last.
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```
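
To run the search end to end, the trainable can be handed to a `tune.Tuner`. A minimal sketch; the search-space bounds and trial count here are illustrative, not values from the notebook:

```python
from ray import tune

# Illustrative search space over the API's detection/recognition thresholds.
param_space = {
    "text_det_thresh": tune.uniform(0.0, 0.9),
    "text_det_box_thresh": tune.uniform(0.0, 0.9),
    "text_det_unclip_ratio": tune.uniform(0.5, 3.0),
    "text_rec_score_thresh": tune.uniform(0.0, 0.9),
}

tuner = tune.Tuner(
    trainable_paddle_ocr,
    param_space=param_space,
    tune_config=tune.TuneConfig(metric="CER", mode="min", num_samples=64),
)
results = tuner.fit()
print(results.get_best_result().config)
```
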
## Architecture: Model Lifecycle

The model is loaded **once** at container startup and stays in memory for all requests:

```mermaid
flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
        Load --> Ready[API Ready<br/>Models in RAM ~500MB]

        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end

        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
```
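
In code, this lifecycle is the standard FastAPI load-once pattern. A minimal sketch of the idea, assuming FastAPI's lifespan hook; see `paddle_ocr_tuning_rest.py` for the actual implementation:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

models = {}  # module-level cache; survives across requests

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Heavy import and model load happen once, at container startup.
    from paddleocr import PaddleOCR
    models["ocr"] = PaddleOCR()
    yield  # serve requests with the model resident in memory
    models.clear()  # freed when the container stops

app = FastAPI(lifespan=lifespan)

@app.get("/health")
def health():
    return {"status": "ok", "model_loaded": "ocr" in models}
```
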
**Subprocess vs REST API comparison:**

```mermaid
flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa
```
## Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|-------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| **REST API** | **Once** | **~0.1s** | **~5.8 hours** |

The REST API saves over an hour by loading the model once instead of once per trial.
## Troubleshooting

### Model download slow on first run
The first run downloads ~500MB of models. Use the `paddlex-cache` volume to persist them across restarts.

### Out of memory
Reduce `max_concurrent_trials` in Ray Tune, or raise the container's memory limit:
```bash
docker run --memory=8g ...
```

### GPU not detected
Ensure the NVIDIA Container Toolkit is installed:
```bash
nvidia-smi  # Should work on the host
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi  # Should also work
```