unir/MastersThesis

sergio c4ab0ffad1 doceker support

2026-01-17 10:24:00 +01:00

PaddleOCR Tuning REST API

REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

Quick Start with Docker Compose

Docker Compose manages building and running containers. The docker-compose.yml defines two services:

ocr-cpu - CPU-only version (works everywhere)
ocr-gpu - GPU version (requires NVIDIA GPU + Container Toolkit)

Run CPU Version

cd src/paddle_ocr

# Build and start (first time takes ~2-3 min to build, ~30s to load model)
docker compose up ocr-cpu

# Or run in background (detached)
docker compose up -d ocr-cpu

# View logs
docker compose logs -f ocr-cpu

# Stop
docker compose down

Run GPU Version

# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu

Test the API

Once running, test with:

# Check health
curl http://localhost:8000/health

# Or use the test script
pip install requests
python test.py --url http://localhost:8000

What Docker Compose Does

docker compose up ocr-cpu
       │
       ├─► Builds image from Dockerfile.cpu (if it does not exist yet)
       ├─► Creates container "paddle-ocr-cpu"
       ├─► Mounts ../dataset → /app/dataset (your PDF images)
       ├─► Mounts paddlex-cache volume (persists downloaded models)
       ├─► Exposes port 8000
       └─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000

Files

| File | Description |
| --- | --- |
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (multi-arch) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64) |
| `docker-compose.yml` | Service orchestration |

API Endpoints

`GET /health`

Check if service is ready.

{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
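Because the model takes some time to load after the container starts, a client may want to wait for `model_loaded` to become true before sending evaluations. The following is a minimal sketch using only the standard library; the function names `fetch_health` and `wait_until_ready`, the timeout, and the polling interval are illustrative assumptions, not part of the service.

```python
import json
import time
import urllib.request


def fetch_health(url):
    """GET the health endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(url, timeout=120.0, interval=2.0, fetch=fetch_health):
    """Poll the health endpoint until the model is reported as loaded.

    Returns the health payload on success, or None if `timeout` seconds
    pass first. `fetch` is injectable so the loop can be tested offline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            status = fetch(url)
            if status.get("model_loaded"):
                return status
        except OSError:
            # Connection refused or reset while the container is still starting.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Calling `wait_until_ready("http://localhost:8000/health")` before starting a search avoids counting start-up time as a failed trial.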

`POST /evaluate`

Run OCR evaluation with given hyperparameters.

Request:

{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}

Response:

{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
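For readers unfamiliar with the two error metrics in the response, the sketch below shows how CER and WER are commonly defined: the edit distance between reference and OCR output, divided by the reference length, at character and word level respectively. This is an assumption about the usual definition; the service's own metric code is not shown here and may normalise text differently or rely on a library.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

With the reference `"hello world"` and the OCR output `"hallo world"`, one character out of 11 and one word out of 2 differ, so `cer` returns 1/11 and `wer` returns 0.5. A single wrong character can therefore move WER much more than CER, which matches the gap between the two values in the response above.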

`POST /evaluate_full`

Same as /evaluate but runs on ALL pages (ignores start_page/end_page).
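The example above requests `start_page` 5 and `end_page` 10 and reports 5 pages, which suggests a half-open range like a Python slice. The sketch below illustrates that reading; the function `select_pages` is hypothetical, and the slice semantics (including zero-based indexing) are an inference from the example rather than confirmed behaviour.

```python
def select_pages(pages, start_page=None, end_page=None, full=False):
    """Return the pages an evaluation would cover.

    Assumption: start_page is inclusive and end_page is exclusive,
    like a Python slice. full=True mimics /evaluate_full and ignores both.
    """
    if full:
        return list(pages)
    return list(pages[start_page:end_page])
```

For a hypothetical 24-page document, `select_pages(pages, 5, 10)` yields 5 pages, while `full=True` yields all 24.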

Building Images

CPU Image (Multi-Architecture)

# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .

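# Multi-arch build with buildx (amd64 + arm64)
# --push uploads to a registry, so the tag needs a registry prefix;
# registry.example.com is a placeholder for your own registry.
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/paddle-ocr-api:cpu \
  --push .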

GPU Image (x86_64 only)

docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .

Running

CPU (Any machine)

docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:cpu

GPU (NVIDIA)

docker run -d -p 8000:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:gpu

DGX Spark (ARM64 + CUDA)

DGX Spark pairs an ARM64 CPU (Grace) with an NVIDIA Blackwell GPU. You have three options:

Option 1: Native ARM64 Build (Recommended)

PaddlePaddle has ARM64 support. Build natively:

# On DGX Spark or ARM64 machine
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .

For GPU acceleration on ARM64, the base image in Dockerfile.gpu does not need to change. The same tag resolves to the ARM64 variant when it is pulled on an ARM machine:

# Base image in Dockerfile.gpu (unchanged for ARM64):
FROM nvcr.io/nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04

Then build on the DGX Spark:

docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu-arm64 .

Option 2: x86_64 Emulation via QEMU (Slow)

You CAN run x86_64 images on ARM via emulation, but it's ~10-20x slower:

# On DGX Spark, enable QEMU emulation
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run x86_64 image with emulation
docker run --platform linux/amd64 -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:cpu

Not recommended for production due to severe performance penalty.

Option 3: Cross-compile from x86_64

Build ARM64 images from your x86_64 machine:

# Setup buildx for multi-arch
docker buildx create --name mybuilder --use

# Build ARM64 image from x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/
# On DGX Spark:
docker load < paddle-ocr-arm64.tar.gz

Using with Ray Tune

Update your notebook's trainable_paddle_ocr function:

import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call OCR API instead of subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }

    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics=metrics)
    except Exception as e:
        # Report worst-case metrics so a failed trial does not abort the search.
        tune.report(metrics={"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
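Independently of Ray Tune, the loop that the REST API enables can be sketched as a plain random search: sample a configuration, send it to the service, and keep the one with the lowest CER. The sampling ranges below are illustrative guesses, and `sample_config` and `random_search` are hypothetical helpers, not part of this repository. `evaluate` stands for any callable that posts a config to `/evaluate` and returns the metrics dictionary.

```python
import random


def sample_config(rng):
    """Draw one hyperparameter set; the ranges are illustrative assumptions."""
    return {
        "textline_orientation": rng.choice([True, False]),
        "use_doc_orientation_classify": rng.choice([True, False]),
        "use_doc_unwarping": rng.choice([True, False]),
        "text_det_thresh": rng.uniform(0.0, 0.7),
        "text_det_box_thresh": rng.uniform(0.0, 0.7),
        "text_det_unclip_ratio": rng.uniform(0.0, 1.5),
        "text_rec_score_thresh": rng.uniform(0.0, 0.7),
    }


def random_search(evaluate, num_trials, seed=0):
    """Evaluate num_trials random configs; return (best_config, best_metrics).

    "Best" means lowest CER. A trial that raises is scored worst-case,
    mirroring the error handling in the trainable above.
    """
    rng = random.Random(seed)
    best_config, best_metrics = None, None
    for _ in range(num_trials):
        config = sample_config(rng)
        try:
            metrics = evaluate(config)
        except Exception as exc:
            metrics = {"CER": 1.0, "WER": 1.0, "ERROR": str(exc)[:500]}
        if best_metrics is None or metrics["CER"] < best_metrics["CER"]:
            best_config, best_metrics = config, metrics
    return best_config, best_metrics
```

Ray Tune replaces the uniform sampling with a smarter search algorithm and adds scheduling, but each trial still reduces to one call against the already-loaded model.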

Architecture: Model Lifecycle

The model is loaded once at container startup and stays in memory for all requests:

flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
        Load --> Ready[API Ready<br/>Models in RAM ~500MB]

        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end

        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
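The load-once pattern in the diagram can be sketched in plain Python as follows. This is not the code in `paddle_ocr_tuning_rest.py`; `ModelHolder` and `handle_request` are hypothetical names, and the loader is a stand-in for whatever constructs the PaddleOCR pipeline.

```python
import threading


class ModelHolder:
    """Load an expensive model once and reuse it for every request."""

    def __init__(self, loader):
        self._loader = loader          # callable that builds the model
        self._model = None
        self._lock = threading.Lock()  # guards against concurrent first loads
        self.load_count = 0            # how many times the loader actually ran

    def get(self):
        with self._lock:
            if self._model is None:
                self._model = self._loader()
                self.load_count += 1
            return self._model


def handle_request(holder, page):
    """Stand-in for an endpoint: run the already-loaded model on one input."""
    model = holder.get()
    return model(page)
```

Calling `holder.get()` once at service start-up reproduces the behaviour described above: the load cost is paid before the first request, and `load_count` stays at 1 no matter how many requests follow.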

Subprocess vs REST API comparison:

flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa

Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
| --- | --- | --- | --- |
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| REST API | Once | ~0.1s | ~5.8 hours |

By these estimates, loading the model only once saves roughly an hour over 64 trials compared with the subprocess approach.

Troubleshooting

Model download slow on first run

The first run downloads ~500MB of models. Mount the paddlex-cache volume so they persist across container restarts and are not downloaded again.

Out of memory

Reduce max_concurrent_trials in Ray Tune, or increase container memory:

docker run --memory=8g ...

GPU not detected

Ensure NVIDIA Container Toolkit is installed:

nvidia-smi  # Should list your GPU(s) on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi  # Should list the same GPU(s) from inside a container