# PaddleOCR Tuning REST API

REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start with Docker Compose

Docker Compose manages building and running containers. The `docker-compose.yml` defines two services:

- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires an NVIDIA GPU + Container Toolkit)

### Run CPU Version

```bash
cd src/paddle_ocr

# Build and start (first time takes ~2-3 min to build, ~30s to load the model)
docker compose up ocr-cpu

# Or run in background (detached)
docker compose up -d ocr-cpu

# View logs
docker compose logs -f ocr-cpu

# Stop
docker compose down
```

### Run GPU Version

```bash
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
```

### Test the API

Once running, test with:

```bash
# Check health
curl http://localhost:8000/health

# Or use the test script
pip install requests
python test.py --url http://localhost:8000
```

### What Docker Compose Does

```
docker compose up ocr-cpu
│
├─► Builds image from Dockerfile.cpu (if it doesn't exist)
├─► Creates container "paddle-ocr-cpu"
├─► Mounts ../dataset → /app/dataset (your PDF images)
├─► Mounts paddlex-cache volume (persists downloaded models)
├─► Exposes port 8000
└─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
```

## Files

| File | Description |
|------|-------------|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (multi-arch) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64) |
| `docker-compose.yml` | Service orchestration |

## API Endpoints

### `GET /health`

Check whether the service is ready.

```json
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
```

### `POST /evaluate`

Run OCR evaluation with the given hyperparameters.

**Request:**

```json
{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**

```json
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
```

### `POST /evaluate_full`

Same as `/evaluate` but runs on ALL pages (ignores `start_page`/`end_page`).

## Building Images

### CPU Image (Multi-Architecture)

```bash
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .

# Multi-arch build with buildx (amd64 + arm64)
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t paddle-ocr-api:cpu \
  --push .
```

### GPU Image (x86_64 only)

```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
```

## Running

### CPU (Any machine)

```bash
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:cpu
```

### GPU (NVIDIA)

```bash
docker run -d -p 8000:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:gpu
```

## DGX Spark (ARM64 + CUDA)

DGX Spark uses ARM64 (Grace CPU) with an NVIDIA Blackwell GPU. You have three options:

### Option 1: Native ARM64 Build (Recommended)

PaddlePaddle has ARM64 support. Build natively:

```bash
# On DGX Spark or another ARM64 machine
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .
```
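After the build, the image runs the same way as the x86_64 CPU image; a quick smoke test (mount paths as in the Running section above):

```bash
# Start the natively built ARM64 container
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:arm64

# Wait for the model to load, then check readiness
curl http://localhost:8000/health
```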
For GPU acceleration on ARM64, check the base image in `Dockerfile.gpu`. The CUDA base image used here is a multi-arch tag, so the same `FROM` line resolves to the ARM64 variant when the image is built on an ARM machine; no edit is needed:

```dockerfile
# Base image in Dockerfile.gpu -- a multi-arch tag that resolves to the
# ARM64 variant when pulled on an ARM machine:
FROM nvcr.io/nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
```

Then build on the DGX Spark:

```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu-arm64 .
```

### Option 2: x86_64 Emulation via QEMU (Slow)

You CAN run x86_64 images on ARM via emulation, but it's ~10-20x slower:

```bash
# On DGX Spark, enable QEMU emulation
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run the x86_64 image under emulation
docker run --platform linux/amd64 -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:cpu
```

**Not recommended** for production due to the severe performance penalty.

### Option 3: Cross-compile from x86_64

Build ARM64 images from your x86_64 machine:

```bash
# Set up buildx for multi-arch builds
docker buildx create --name mybuilder --use

# Build an ARM64 image on the x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to the DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/

# On the DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```

## Using with Ray Tune

Update your notebook's `trainable_paddle_ocr` function:

```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call the OCR API instead of a subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }
    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics=metrics)
    except Exception as e:
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```
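Before launching a long search, it can help to sanity-check the endpoint with the same payload from the shell. A sketch, assuming fields omitted here fall back to server-side defaults:

```bash
# One-off evaluation on a small page range to verify the payload shape
curl -s -X POST http://localhost:8000/evaluate \
  -H "Content-Type: application/json" \
  -d '{"pdf_folder": "/app/dataset", "text_det_thresh": 0.0, "text_det_box_thresh": 0.0, "text_det_unclip_ratio": 1.5, "text_rec_score_thresh": 0.0, "start_page": 5, "end_page": 10}'
```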
## Architecture: Model Lifecycle

The model is loaded **once** at container startup and stays in memory for all requests:

```mermaid
flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
        Load --> Ready[API Ready<br/>Models in RAM ~500MB]
        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end
        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
```

**Subprocess vs REST API comparison:**

```mermaid
flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa
```
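The resident memory is easy to check on a running container (container name as created by `docker compose`):

```bash
# One-shot snapshot of CPU/memory while the models sit in RAM
docker stats --no-stream paddle-ocr-cpu
```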
## Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|-------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| **REST API** | **Once** | **~0.1s** | **~5.8 hours** |

The REST API saves over an hour across 64 trials by loading the model only once.

## Troubleshooting

### Model download slow on first run

The first run downloads ~500MB of models. Use the `paddlex-cache` volume to persist them.

### Out of memory

Reduce `max_concurrent_trials` in Ray Tune, or increase the container memory:

```bash
docker run --memory=8g ...
```

### GPU not detected

Ensure the NVIDIA Container Toolkit is installed:

```bash
nvidia-smi  # Should work on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi  # Should work in a container
```
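If `nvidia-smi` works on the host but the container sees no GPU, the toolkit is usually missing or not wired into Docker. On Ubuntu the setup typically looks like this (a sketch; see NVIDIA's documentation for the apt repository configuration that must precede it):

```bash
# Install the toolkit, register it with Docker, and restart the daemon
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```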