# PaddleOCR Tuning REST API

A REST API service for PaddleOCR hyperparameter evaluation. It keeps the model loaded in memory so that repeated evaluations during a hyperparameter search stay fast.

## Quick Start with Docker Compose

Docker Compose manages building and running the containers. The `docker-compose.yml` defines two services:

- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires an NVIDIA GPU + Container Toolkit)

### Run CPU Version

```bash
cd src/paddle_ocr

# Build and start (first time takes ~2-3 min to build, ~30s to load the model)
docker compose up ocr-cpu

# Or run in the background (detached)
docker compose up -d ocr-cpu

# View logs
docker compose logs -f ocr-cpu

# Stop
docker compose down
```

### Run GPU Version

```bash
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
```

### Test the API

Once running, test with:

```bash
# Check health
curl http://localhost:8000/health

# Or use the test script
pip install requests
python test.py --url http://localhost:8000
```

## What Docker Compose Does

```text
docker compose up ocr-cpu
       │
       ├─► Builds image from Dockerfile.cpu (if it does not already exist)
       ├─► Creates container "paddle-ocr-cpu"
       ├─► Mounts ../dataset → /app/dataset (your PDF images)
       ├─► Mounts paddlex-cache volume (persists downloaded models)
       ├─► Exposes port 8000
       └─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
```

## Files

| File | Description |
|------|-------------|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.build-paddle` | PaddlePaddle GPU wheel builder for ARM64 |
| `Dockerfile.build-paddle-cpu` | PaddlePaddle CPU wheel builder for ARM64 |
| `docker-compose.yml` | Service orchestration |
| `docker-compose.cpu-registry.yml` | Pull CPU image from registry |
| `docker-compose.gpu-registry.yml` | Pull GPU image from registry |
| `wheels/` | Local PaddlePaddle wheels (created by `build-paddle`) |

## API Endpoints

### GET /health

Check whether the service is ready.

```json
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
```

### POST /evaluate

Run an OCR evaluation with the given hyperparameters.

Request:

```json
{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}
```

Response:

```json
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
```

### POST /evaluate_full

Same as `/evaluate`, but runs on ALL pages (ignores `start_page`/`end_page`).

## Building Images

### CPU Image (Multi-Architecture)

```bash
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .

# Multi-arch build with buildx (amd64 + arm64)
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t paddle-ocr-api:cpu \
  --push .
```

### GPU Image (x86_64 only)

```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
```

Note: PaddlePaddle GPU 3.x packages are not on PyPI. The Dockerfile installs from PaddlePaddle's official CUDA index (`paddlepaddle.org.cn/packages/stable/cu126/`). This is handled automatically during the build.

## Running

### CPU (any machine)

```bash
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:cpu
```

### GPU (NVIDIA)

```bash
docker run -d -p 8000:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:gpu
```

## GPU Support Analysis

### Host System Reference (DGX Spark)

This section documents GPU support findings based on testing on an NVIDIA DGX Spark:

| Component | Value |
|-----------|-------|
| Architecture | ARM64 (aarch64) |
| CPU | NVIDIA Grace (ARM) |
| GPU | NVIDIA GB10 |
| CUDA Version | 13.0 |
| Driver | 580.95.05 |
| OS | Ubuntu 24.04 LTS |
| Container Toolkit | nvidia-container-toolkit 1.18.1 |
| Docker | 28.5.1 |
| Docker Compose | v2.40.0 |

### PaddlePaddle GPU Platform Support

Critical finding: PaddlePaddle-GPU does NOT support the ARM64/aarch64 architecture.

| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| Windows x64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| macOS x64 | ✅ | ❌ |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
| Linux ARM64 (Jetson/DGX) | ✅ | ❌ No wheels |

Source: PaddlePaddle-GPU on PyPI — only `manylinux_x86_64` and `win_amd64` wheels are available.

### Why GPU Doesn't Work on ARM64

1. No prebuilt wheels: `pip install paddlepaddle-gpu` fails on ARM64 because no compatible wheels exist
2. Not a CUDA issue: the NVIDIA CUDA base images work fine on ARM64 (`nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04`)
3. Not a container-toolkit issue: nvidia-container-toolkit is installed and functional
4. A PaddlePaddle limitation: the Paddle team has not compiled GPU wheels for ARM64

When you run `pip install paddlepaddle-gpu` on ARM64:

```text
ERROR: No matching distribution found for paddlepaddle-gpu
```

### Options for ARM64 Systems

#### Option 1: Use the CPU-Only Image

Use `Dockerfile.cpu`, which works on ARM64:

```bash
# On DGX Spark
docker compose up ocr-cpu

# Or build directly
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
```

Performance: CPU inference on the ARM64 Grace is surprisingly fast thanks to the high core count. Expect roughly 2-5 seconds per page.

#### Option 2: Build PaddlePaddle from Source (Docker-based)

Use the included Docker builder to compile PaddlePaddle GPU for ARM64:

```bash
cd src/paddle_ocr

# Step 1: Build the PaddlePaddle GPU wheel (one-time, 2-4 hours)
docker compose --profile build run --rm build-paddle

# Verify the wheel was created
ls -la wheels/paddlepaddle*.whl

# Step 2: Build the GPU image (uses the local wheel)
docker compose build ocr-gpu

# Step 3: Run with GPU
docker compose up ocr-gpu

# Verify GPU is working
docker compose exec ocr-gpu python -c "import paddle; print(paddle.device.is_compiled_with_cuda())"
```

What this does:

1. `build-paddle` compiles PaddlePaddle from source inside a CUDA container
2. The wheel is saved to the `./wheels/` directory
3. `Dockerfile.gpu` detects the local wheel and uses it instead of PyPI

Caveats:

- Build takes 2-4 hours on the first run
- Requires ~20GB of disk space during the build
- Not officially supported by the PaddlePaddle team
- May need adjustments for future PaddlePaddle versions

See: GitHub Issue #17327

#### Option 3: Alternative OCR Engines

For ARM64 GPU acceleration, consider alternatives:

| Engine | ARM64 GPU | Notes |
|--------|-----------|-------|
| Tesseract | ❌ CPU-only | Good fallback, widely available |
| EasyOCR | ⚠️ Via PyTorch | PyTorch has ARM64 GPU support |
| TrOCR | ⚠️ Via Transformers | Hugging Face Transformers + PyTorch |
| docTR | ⚠️ Via TensorFlow/PyTorch | Both backends have ARM64 support |

EasyOCR with PyTorch is a viable alternative:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install easyocr
```
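
For reference, a minimal EasyOCR sketch (the language list and image filename are placeholders; `gpu=True` requires the CUDA-enabled PyTorch build installed above):

```python
import easyocr

# gpu=True picks up the CUDA-enabled PyTorch build
reader = easyocr.Reader(["en"], gpu=True)

# "page_001.png" stands in for one of your rendered PDF pages
for bbox, text, confidence in reader.readtext("page_001.png"):
    print(f"{confidence:.2f}  {text}")
```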

### x86_64 GPU Setup (Working)

For x86_64 systems with an NVIDIA GPU, the GPU Docker image works:

```bash
# Verify GPU is accessible
nvidia-smi

# Verify Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi

# Build and run GPU version
docker compose up ocr-gpu
```

### GPU Docker Compose Configuration

The `docker-compose.yml` configures GPU access via:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

This requires Docker Compose v2 and nvidia-container-toolkit.

## DGX Spark / ARM64 Quick Start

For ARM64 systems (DGX Spark, Jetson, Graviton), use CPU-only:

```bash
cd src/paddle_ocr

# Build ARM64-native CPU image
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .

# Run
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:arm64

# Test
curl http://localhost:8000/health
```

### Cross-Compile from x86_64

Build ARM64 images from an x86_64 machine:

```bash
# Set up buildx for multi-arch
docker buildx create --name mybuilder --use

# Build ARM64 image from the x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to the DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/

# On the DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```

## Using with Ray Tune

Update your notebook's `trainable_paddle_ocr` function to call the API instead of spawning a subprocess:

```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call the OCR API instead of a subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }

    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics)
    except Exception as e:
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```

## Architecture: Model Lifecycle

The model is loaded once at container startup and stays in memory for all requests:

```mermaid
flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
        Load --> Ready[API Ready<br/>Models in RAM ~500MB]

        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end

        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
```
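
The load-once pattern itself looks roughly like this (a minimal sketch of the idea, not the actual `paddle_ocr_tuning_rest.py`; the `PaddleOCR()` defaults and state handling are placeholders):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from paddleocr import PaddleOCR

state = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Paid once at container start (~10-30s); reused by every request
    state["ocr"] = PaddleOCR()
    yield
    state.clear()  # models freed on container stop

app = FastAPI(lifespan=lifespan)

@app.get("/health")
def health():
    return {"status": "ok", "model_loaded": "ocr" in state}
```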

Subprocess vs REST API comparison:

```mermaid
flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa
```

## Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|--------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| REST API | Once | ~0.1s | ~5.8 hours |

By loading the model only once, the REST API saves more than an hour over 64 trials.

## Troubleshooting

### Model download slow on first run

The first run downloads ~500MB of models. Use the `paddlex-cache` volume to persist them.

### Out of memory

Reduce `max_concurrent_trials` in Ray Tune, or increase the container memory:

```bash
docker run --memory=8g ...
```

### GPU not detected

Ensure the NVIDIA Container Toolkit is installed:

```bash
nvidia-smi  # Should work
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi  # Should work
```

### PaddlePaddle GPU installation fails

PaddlePaddle 3.x GPU packages are not available on PyPI. They must be installed from PaddlePaddle's official index:

```bash
# For CUDA 12.x
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CUDA 11.8
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```

`Dockerfile.gpu` handles this automatically.

## CI/CD Pipeline

The project includes a Gitea Actions workflow (`.gitea/workflows/ci.yaml`) for automated builds.

### What CI Builds

| Image | Architecture | Source |
|-------|--------------|--------|
| `paddle-ocr-cpu:amd64` | amd64 | PyPI `paddlepaddle` |
| `paddle-ocr-cpu:arm64` | arm64 | Pre-built wheel from Gitea packages |
| `paddle-ocr-gpu:amd64` | amd64 | PyPI `paddlepaddle-gpu` |
| `paddle-ocr-gpu:arm64` | arm64 | Pre-built wheel from Gitea packages |

### ARM64 Wheel Workflow

The PyPI wheels don't work on ARM64 (they rely on x86 SSE instructions), so the wheels must be built from source using sse2neon:

1. Built manually on an ARM64 machine (one-time)
2. Uploaded to Gitea generic packages
3. Downloaded by CI when building ARM64 images

#### Step 1: Build ARM64 Wheels (one-time, on an ARM64 machine)

```bash
cd src/paddle_ocr

# Build GPU wheel (requires an NVIDIA GPU, takes 1-2 hours)
sudo docker build -t paddle-builder:gpu-arm64 -f Dockerfile.build-paddle .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:gpu-arm64

# Build CPU wheel (no GPU required, takes 1-2 hours)
sudo docker build -t paddle-builder:cpu-arm64 -f Dockerfile.build-paddle-cpu .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:cpu-arm64

# Verify the wheels were created
ls -la wheels/paddlepaddle*.whl
# paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl (GPU)
# paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl (CPU)
```

#### Step 2: Upload Wheels to Gitea Packages

```bash
export GITEA_TOKEN="your-token-here"

# Upload GPU wheel
curl -X PUT \
  -H "Authorization: token $GITEA_TOKEN" \
  --upload-file wheels/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl \
  "https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl"

# Upload CPU wheel
curl -X PUT \
  -H "Authorization: token $GITEA_TOKEN" \
  --upload-file wheels/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl \
  "https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl"
```

The wheels are then available at:

```text
https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl
https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl
```

#### Step 3: CI Builds Images

CI automatically:

1. Downloads the ARM64 wheels from Gitea packages (for arm64 builds only; see the sketch below)
2. Builds both CPU and GPU images for amd64 and arm64
3. Pushes them to the registry with arch-specific tags
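
The download step is equivalent to something like this sketch (a Python stand-in, not the actual workflow code; whether the token header is required depends on the package's visibility):

```python
import os

import requests

# URL from Step 2; the CPU wheel is shown, the GPU wheel is analogous
WHEEL_URL = (
    "https://seryus.ddns.net/api/packages/unir/generic/"
    "paddlepaddle-cpu-arm64/3.0.0/"
    "paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl"
)

headers = {}
if token := os.environ.get("GITEA_TOKEN"):  # only needed for private packages
    headers["Authorization"] = f"token {token}"

with requests.get(WHEEL_URL, headers=headers, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    with open("wheels/" + WHEEL_URL.rsplit("/", 1)[-1], "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```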

### Required CI Secrets

Configure these in the Gitea repository settings:

| Secret | Description |
|--------|-------------|
| `CI_READWRITE` | Gitea token with registry read/write access |

### Manual Image Push

```bash
# Log in to the registry
docker login seryus.ddns.net

# Build and push CPU (multi-arch)
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t seryus.ddns.net/unir/paddle-ocr-api:cpu \
  --push .

# Build and push GPU (x86_64)
docker build -f Dockerfile.gpu -t seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64 .
docker push seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64

# Build and push GPU (ARM64) - requires the wheel in wheels/
docker buildx build -f Dockerfile.gpu \
  --platform linux/arm64 \
  -t seryus.ddns.net/unir/paddle-ocr-api:gpu-arm64 \
  --push .
```

### Updating the ARM64 Wheels

When PaddlePaddle releases a new version:

1. Update `PADDLE_VERSION` in `Dockerfile.build-paddle` and `Dockerfile.build-paddle-cpu`
2. Rebuild both wheels on an ARM64 machine
3. Upload them to Gitea packages under the new version
4. Update `PADDLE_VERSION` in `.gitea/workflows/ci.yaml`