PaddleOCR Tuning REST API
REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.
Quick Start with Docker Compose
Docker Compose manages building and running containers. The docker-compose.yml defines two services:
- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires NVIDIA GPU + Container Toolkit)
Run CPU Version
cd src/paddle_ocr
# Build and start (first time takes ~2-3 min to build, ~30s to load model)
docker compose up ocr-cpu
# Or run in background (detached)
docker compose up -d ocr-cpu
# View logs
docker compose logs -f ocr-cpu
# Stop
docker compose down
Run GPU Version
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
Test the API
Once running, test with:
# Check health
curl http://localhost:8000/health
# Or use the test script
pip install requests
python test.py --url http://localhost:8000
What Docker Compose Does
docker compose up ocr-cpu
│
├─► Builds image from Dockerfile.cpu (if not exists)
├─► Creates container "paddle-ocr-cpu"
├─► Mounts ../dataset → /app/dataset (your PDF images)
├─► Mounts paddlex-cache volume (persists downloaded models)
├─► Exposes port 8000
└─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
Files
| File | Description |
|---|---|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.build-paddle` | PaddlePaddle GPU wheel builder for ARM64 |
| `Dockerfile.build-paddle-cpu` | PaddlePaddle CPU wheel builder for ARM64 |
| `docker-compose.yml` | Service orchestration |
| `docker-compose.cpu-registry.yml` | Pull CPU image from registry |
| `docker-compose.gpu-registry.yml` | Pull GPU image from registry |
| `wheels/` | Local PaddlePaddle wheels (created by build-paddle) |
API Endpoints
GET /health
Check if service is ready.
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
POST /evaluate
Run OCR evaluation with given hyperparameters.
Request:
{
"pdf_folder": "/app/dataset",
"textline_orientation": true,
"use_doc_orientation_classify": false,
"use_doc_unwarping": false,
"text_det_thresh": 0.469,
"text_det_box_thresh": 0.5412,
"text_det_unclip_ratio": 0.0,
"text_rec_score_thresh": 0.635,
"start_page": 5,
"end_page": 10
}
Response:
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
POST /evaluate_full
Same as /evaluate but runs on ALL pages (ignores start_page/end_page).
Debug Output (debugset)
The debugset folder allows saving OCR predictions for debugging and analysis. When save_output=True is passed to /evaluate, predictions are written to /app/debugset.
Enable Debug Output
{
"pdf_folder": "/app/dataset",
"save_output": true,
"start_page": 5,
"end_page": 10
}
Output Structure
debugset/
├── doc1/
│ └── paddle_ocr/
│ ├── page_0005.txt
│ ├── page_0006.txt
│ └── ...
├── doc2/
│ └── paddle_ocr/
│ └── ...
Each .txt file contains the OCR-extracted text for that page.
Docker Mount
The debugset folder is mounted read-write in docker-compose:
volumes:
- ../debugset:/app/debugset:rw
Use Cases
- Compare OCR engines: run the same pages through PaddleOCR, DocTR, and EasyOCR with `save_output=True`, then diff the results
- Debug hyperparameters: see how different settings affect text extraction
- Ground truth comparison: compare predictions against expected output (see the sketch below)
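For the ground-truth use case, a small script can walk the debugset tree and compare each prediction against an expected text file. This is only a sketch: the ground-truth layout (`groundtruth/<doc>/page_XXXX.txt`) and the word-overlap metric are illustrative assumptions, not part of this project.

```python
from pathlib import Path

DEBUGSET = Path("debugset")
GROUNDTRUTH = Path("groundtruth")  # hypothetical location of expected page text

for pred_file in sorted(DEBUGSET.glob("*/paddle_ocr/page_*.txt")):
    doc = pred_file.parts[-3]                      # e.g. "doc1"
    gt_file = GROUNDTRUTH / doc / pred_file.name   # e.g. groundtruth/doc1/page_0005.txt
    if not gt_file.exists():
        continue
    pred_words = set(pred_file.read_text(encoding="utf-8").split())
    gt_words = gt_file.read_text(encoding="utf-8").split()
    # Crude word-overlap score, just to flag pages that regressed badly.
    hits = sum(1 for w in gt_words if w in pred_words)
    print(f"{doc}/{pred_file.name}: {hits}/{len(gt_words)} ground-truth words found")
```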
Building Images
CPU Image (Multi-Architecture)
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
# Multi-arch build with buildx (amd64 + arm64)
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
--platform linux/amd64,linux/arm64 \
-t paddle-ocr-api:cpu \
--push .
GPU Image (x86_64 + ARM64 with local wheel)
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
Note: PaddlePaddle GPU 3.x packages are not on PyPI. The Dockerfile installs from PaddlePaddle's official CUDA index (paddlepaddle.org.cn/packages/stable/cu126/). This is handled automatically during build.
Running
CPU (Any machine)
docker run -d -p 8000:8000 \
-v $(pwd)/../dataset:/app/dataset:ro \
-v paddlex-cache:/root/.paddlex \
paddle-ocr-api:cpu
GPU (NVIDIA)
docker run -d -p 8000:8000 --gpus all \
-v $(pwd)/../dataset:/app/dataset:ro \
-v paddlex-cache:/root/.paddlex \
paddle-ocr-api:gpu
GPU Support Analysis
Host System Reference (DGX Spark)
This section documents GPU support findings based on testing on an NVIDIA DGX Spark:
| Component | Value |
|---|---|
| Architecture | ARM64 (aarch64) |
| CPU | NVIDIA Grace (ARM) |
| GPU | NVIDIA GB10 |
| CUDA Version | 13.0 |
| Driver | 580.95.05 |
| OS | Ubuntu 24.04 LTS |
| Container Toolkit | nvidia-container-toolkit 1.18.1 |
| Docker | 28.5.1 |
| Docker Compose | v2.40.0 |
PaddlePaddle GPU Platform Support
Note: PaddlePaddle-GPU does NOT have prebuilt ARM64 wheels on PyPI, but ARM64 support is available via custom-built wheels.
| Platform | CPU | GPU |
|---|---|---|
| Linux x86_64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| Windows x64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| macOS x64 | ✅ | ❌ |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
| Linux ARM64 (Jetson/DGX) | ✅ | ⚠️ Limited - see Blackwell note |
Source: PaddlePaddle-GPU PyPI - only manylinux_x86_64 and win_amd64 wheels available on PyPI. ARM64 wheels must be built from source or downloaded from Gitea packages.
ARM64 GPU Support
ARM64 GPU support is available but requires custom-built wheels:
- No prebuilt PyPI wheels: `pip install paddlepaddle-gpu` fails on ARM64; no compatible wheels exist on PyPI
- Custom wheels work: This project provides Dockerfiles to build ARM64 GPU wheels from source
- CI/CD builds ARM64 GPU images: Pre-built wheels are available from Gitea packages
To use GPU on ARM64:
- Use the pre-built images from the container registry, or
- Build the wheel locally using `Dockerfile.build-paddle` (see Option 2 below), or
- Download the wheel from Gitea packages: `wheels/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl`
⚠️ Known Limitation: Blackwell GPU (sm_121 / GB10)
Status: GPU inference does NOT work on NVIDIA Blackwell GPUs (DGX Spark, GB200, etc.)
Symptoms
When running PaddleOCR on Blackwell GPUs:
- CUDA loads successfully ✅
- Basic tensor operations work ✅
- Detection model outputs constant values ❌
- 0 text regions detected
- CER/WER = 100% (nothing recognized)
Root Cause
Confirmed: PaddlePaddle's entire CUDA backend does NOT support Blackwell (sm_121). This is NOT just an inference model problem - even basic operations fail.
Test Results (January 2026):
- PTX JIT test (`CUDA_FORCE_PTX_JIT=1`): `OSError: CUDA error(209), no kernel image is available for execution on the device. [Hint: 'cudaErrorNoKernelImageForDevice']` → Confirmed: no PTX code exists in the PaddlePaddle binaries
- Dynamic graph mode test (bypassing inference models): Conv2D + BatchNorm output min/max/mean are all 0.0000, i.e. dynamic graph mode is BROKEN (constant output) → Confirmed: even a simple nn.Conv2D produces zeros on Blackwell
Conclusion: The issue is PaddlePaddle's compiled CUDA kernels (cubins), not just the inference models. The entire framework was compiled without sm_121 support and without PTX for JIT compilation.
Why building PaddlePaddle from source doesn't fix it:
- ⚠️ Building with `CUDA_ARCH=121` requires CUDA 13.0+ (PaddlePaddle only supports up to CUDA 12.6)
- ❌ Even if you could build it, PaddleOCR models contain pre-compiled CUDA ops
- ❌ These model files were exported/compiled targeting sm_80/sm_90 architectures
- ❌ The model kernels execute on GPU but produce garbage output on sm_121
To truly fix this, the PaddlePaddle team would need to:
- Add sm_121 to their model export pipeline
- Re-export all PaddleOCR models (PP-OCRv4, PP-OCRv5, etc.) with Blackwell support
- Release new model versions
This is tracked in GitHub Issue #17327.
Debug Script
Use the included debug script to verify this issue:
docker exec paddle-ocr-gpu python /app/scripts/debug_gpu_detection.py /app/dataset/0/img/page_0001.png
Expected output showing the problem:
OUTPUT ANALYSIS:
Shape: (1, 1, 640, 640)
Min: 0.000010
Max: 0.000010 # <-- Same as min = constant output
Mean: 0.000010
DIAGNOSIS:
PROBLEM: Output is constant - model inference is broken!
This typically indicates GPU compute capability mismatch.
Workarounds
- Use CPU mode (recommended): `docker compose up ocr-cpu`. The ARM Grace CPU is fast (~2-5 sec/page). This is the reliable option.
- Use EasyOCR or DocTR with GPU: these use PyTorch, which has official ARM64 CUDA wheels (cu128 index):
  # EasyOCR with GPU on DGX Spark
  docker build -f ../easyocr_service/Dockerfile.gpu -t easyocr-gpu ../easyocr_service
  docker run --gpus all -p 8002:8000 easyocr-gpu
- Wait for PaddlePaddle Blackwell support: track GitHub Issue #17327 for updates.
GPU Support Matrix (Updated)
| GPU Architecture | Compute | CPU | GPU |
|---|---|---|---|
| Ampere (A100, A10) | sm_80 | ✅ | ✅ |
| Hopper (H100, H200) | sm_90 | ✅ | ✅ |
| Blackwell (GB10, GB200) | sm_121 | ✅ | ❌ Not supported |
FAQ: Why Doesn't CUDA Backward Compatibility Work?
Q: CUDA normally runs older kernels on newer GPUs. Why doesn't this work for Blackwell?
Per NVIDIA Blackwell Compatibility Guide:
CUDA can run older code on newer GPUs via PTX JIT compilation:
- PTX (Parallel Thread Execution) is NVIDIA's intermediate representation
- If an app includes PTX code, the driver JIT-compiles it for the target GPU
- This allows sm_80 code to run on sm_121
The problem: PaddleOCR inference models contain only pre-compiled cubins (SASS binary), not PTX. Without PTX, there's nothing to JIT-compile.
We tested PTX JIT (January 2026):
# Force PTX JIT compilation
docker run --gpus all -e CUDA_FORCE_PTX_JIT=1 paddle-ocr-gpu \
python /app/scripts/debug_gpu_detection.py /app/dataset/0/img/page_0001.png
# Result:
# OSError: CUDA error(209), no kernel image is available for execution on the device.
Confirmed: No PTX exists in PaddlePaddle binaries. The CUDA kernels are cubins-only (SASS binary), compiled for sm_80/sm_90 without PTX fallback.
Note on sm_121: Per NVIDIA docs, "sm_121 is the same as sm_120 since the only difference is physically integrated CPU+GPU memory of Spark." The issue is general Blackwell (sm_12x) support, not Spark-specific.
FAQ: Does Dynamic Graph Mode Work on Blackwell?
Q: Can I bypass inference models and use PaddlePaddle's dynamic graph mode?
No. We tested dynamic graph mode (January 2026):
# Test script runs: paddle.nn.Conv2D + paddle.nn.BatchNorm2D
python /app/scripts/test_dynamic_mode.py
# Result:
# Input shape: [1, 3, 224, 224]
# Output shape: [1, 64, 112, 112]
# Output min: 0.0000
# Output max: 0.0000 # <-- All zeros!
# Output mean: 0.0000
# Dynamic graph mode: BROKEN (constant output)
Conclusion: The problem isn't limited to inference models. PaddlePaddle's core CUDA kernels (Conv2D, BatchNorm, etc.) produce garbage on sm_121. The entire framework lacks Blackwell support.
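A minimal version of that check, assuming a CUDA-enabled PaddlePaddle build is importable inside the container (this is a sketch of the same idea, not the bundled `test_dynamic_mode.py`):

```python
import paddle
import paddle.nn as nn

paddle.device.set_device("gpu")  # requires a CUDA-enabled PaddlePaddle build

# Push random data through a plain Conv2D + BatchNorm2D block and inspect the output.
x = paddle.randn([1, 3, 224, 224])
block = nn.Sequential(
    nn.Conv2D(3, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2D(64),
)
y = block(x)

print("output shape:", y.shape)  # [1, 64, 112, 112]
print("min/max/mean:", float(y.min()), float(y.max()), float(y.mean()))
# On a healthy GPU these values vary; on Blackwell (sm_121) they come back as all zeros.
```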
FAQ: Can I Run AMD64 Containers on ARM64 DGX Spark?
Q: Can I just run the working x86_64 GPU image via emulation?
Short answer: Yes for CPU, No for GPU.
You can run amd64 containers via QEMU emulation:
# Install QEMU
sudo apt-get install qemu binfmt-support qemu-user-static
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
# Run amd64 container
docker run --platform linux/amd64 paddle-ocr-gpu:amd64 ...
But GPU doesn't work:
- QEMU emulates CPU instructions (x86 → ARM)
- QEMU user-mode does NOT support GPU passthrough
- GPU calls from emulated x86 code cannot reach the ARM64 GPU
So even if the amd64 image works on x86_64:
- ❌ No GPU access through QEMU
- ❌ CPU emulation is 10-100x slower than native ARM64
- ❌ Defeats the purpose entirely
| Approach | CPU | GPU | Speed |
|---|---|---|---|
| ARM64 native (CPU) | ✅ | N/A | Fast (~2-5s/page) |
| ARM64 native (GPU) | ✅ | ❌ Blackwell issue | - |
| AMD64 via QEMU | ⚠️ Works | ❌ No passthrough | 10-100x slower |
Options for ARM64 Systems
Option 1: CPU-Only (Recommended)
Use Dockerfile.cpu which works on ARM64:
# On DGX Spark
docker compose up ocr-cpu
# Or build directly
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
Performance: CPU inference on ARM64 Grace is surprisingly fast due to high core count. Expect ~2-5 seconds per page.
Option 2: Build PaddlePaddle from Source (Docker-based)
Use the included Docker builder to compile PaddlePaddle GPU for ARM64:
cd src/paddle_ocr
# Step 1: Build the PaddlePaddle GPU wheel (one-time, 2-4 hours)
docker compose --profile build run --rm build-paddle
# Verify wheel was created
ls -la wheels/paddlepaddle*.whl
# Step 2: Build the GPU image (uses local wheel)
docker compose build ocr-gpu
# Step 3: Run with GPU
docker compose up ocr-gpu
# Verify GPU is working
docker compose exec ocr-gpu python -c "import paddle; print(paddle.device.is_compiled_with_cuda())"
What this does:
- `build-paddle` compiles PaddlePaddle from source inside a CUDA container
- The wheel is saved to the `./wheels/` directory
- `Dockerfile.gpu` detects the local wheel and uses it instead of PyPI
Caveats:
- Build takes 2-4 hours on first run
- Requires ~20GB disk space during build
- Not officially supported by PaddlePaddle team
- May need adjustments for future PaddlePaddle versions
See: GitHub Issue #17327
Option 3: Alternative OCR Engines
For ARM64 GPU acceleration, consider alternatives:
| Engine | ARM64 GPU | Notes |
|---|---|---|
| Tesseract | ❌ CPU-only | Good fallback, widely available |
| EasyOCR | ⚠️ Via PyTorch | PyTorch has ARM64 GPU support |
| TrOCR | ⚠️ Via Transformers | Hugging Face Transformers + PyTorch |
| docTR | ⚠️ Via TensorFlow/PyTorch | Both backends have ARM64 support |
EasyOCR with PyTorch is a viable alternative:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install easyocr
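A minimal usage sketch once those packages are installed (the image path is just a placeholder; `gpu=True` relies on PyTorch's CUDA support):

```python
import easyocr

# gpu=True uses the PyTorch CUDA backend, which ships ARM64 wheels.
reader = easyocr.Reader(["en"], gpu=True)
results = reader.readtext("dataset/0/img/page_0001.png")
for bbox, text, confidence in results:
    print(f"{confidence:.2f}  {text}")
```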
x86_64 GPU Setup (Working)
For x86_64 systems with NVIDIA GPU, the GPU Docker works:
# Verify GPU is accessible
nvidia-smi
# Verify Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
# Build and run GPU version
docker compose up ocr-gpu
GPU Docker Compose Configuration
The docker-compose.yml configures GPU access via:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
This requires Docker Compose v2 and nvidia-container-toolkit.
DGX Spark / ARM64 Quick Start
For ARM64 systems (DGX Spark, Jetson, Graviton), use CPU-only:
cd src/paddle_ocr
# Build ARM64-native CPU image
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .
# Run
docker run -d -p 8000:8000 \
-v $(pwd)/../dataset:/app/dataset:ro \
paddle-ocr-api:arm64
# Test
curl http://localhost:8000/health
Cross-Compile from x86_64
Build ARM64 images from an x86_64 machine:
# Setup buildx for multi-arch
docker buildx create --name mybuilder --use
# Build ARM64 image from x86_64 machine
docker buildx build -f Dockerfile.cpu \
--platform linux/arm64 \
-t paddle-ocr-api:arm64 \
--load .
# Save and transfer to DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/
# On DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
Using with Ray Tune
Multi-Worker Setup for Parallel Trials
Run multiple workers for parallel hyperparameter tuning:
cd src/paddle_ocr
# Start 2 CPU workers (ports 8001-8002)
sudo docker compose -f docker-compose.workers.yml --profile cpu up -d
# Or for GPU workers (if supported)
sudo docker compose -f docker-compose.workers.yml --profile gpu up -d
# Check workers are healthy
curl http://localhost:8001/health
curl http://localhost:8002/health
Then run the notebook with max_concurrent_trials=2 to use both workers in parallel.
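How the notebook routes trials to workers is not shown here; one simple, hypothetical approach is to pick a worker URL per trial process:

```python
import os

# Hypothetical helper: URLs of the two workers started by docker-compose.workers.yml.
WORKER_URLS = [
    "http://localhost:8001/evaluate",
    "http://localhost:8002/evaluate",
]

def pick_worker_url() -> str:
    """Spread concurrent trials across the workers by hashing the trial's process id.

    This is approximate (two trials can still land on the same worker); a shared
    queue or per-trial index would balance more strictly.
    """
    return WORKER_URLS[os.getpid() % len(WORKER_URLS)]
```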
Single Worker Setup
Update your notebook's trainable_paddle_ocr function:
import requests
from ray import tune  # needed for tune.report

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call OCR API instead of subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }
    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics)
    except Exception as e:
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
Architecture: Model Lifecycle
The model is loaded once at container startup and stays in memory for all requests:
flowchart TB
subgraph Container["Docker Container Lifecycle"]
Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
Load --> Ready[API Ready<br/>Models in RAM ~500MB]
subgraph Requests["Incoming Requests - Models Stay Loaded"]
Ready --> R1[Request 1] --> Ready
Ready --> R2[Request 2] --> Ready
Ready --> RN[Request N...] --> Ready
end
Ready --> Stop([Container Stop])
Stop --> Free[Models Freed]
end
style Load fill:#f9f,stroke:#333
style Ready fill:#9f9,stroke:#333
style Requests fill:#e8f4ea,stroke:#090
Subprocess vs REST API comparison:
flowchart LR
subgraph Subprocess["❌ Subprocess Approach"]
direction TB
S1[Trial 1] --> L1[Load Model ~10s]
L1 --> E1[Evaluate ~60s]
E1 --> U1[Unload]
U1 --> S2[Trial 2]
S2 --> L2[Load Model ~10s]
L2 --> E2[Evaluate ~60s]
end
subgraph REST["✅ REST API Approach"]
direction TB
Start2[Start Container] --> Load2[Load Model ~10s]
Load2 --> Ready2[Model in Memory]
Ready2 --> T1[Trial 1 ~60s]
T1 --> Ready2
Ready2 --> T2[Trial 2 ~60s]
T2 --> Ready2
Ready2 --> TN[Trial N ~60s]
end
style L1 fill:#faa
style L2 fill:#faa
style Load2 fill:#afa
style Ready2 fill:#afa
Performance Comparison
| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|---|---|---|---|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| REST API | Once | ~0.1s | ~5.8 hours |
The REST API saves over an hour by loading the model only once.
Troubleshooting
Model download slow on first run
The first run downloads ~500MB of models. Use the paddlex-cache volume to persist them across container restarts.
Out of memory
Reduce max_concurrent_trials in Ray Tune, or increase container memory:
docker run --memory=8g ...
GPU not detected
Ensure NVIDIA Container Toolkit is installed:
nvidia-smi # Should work
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi # Should work
PaddlePaddle GPU installation fails
PaddlePaddle 3.x GPU packages are not available on PyPI. They must be installed from PaddlePaddle's official index:
# For CUDA 12.x
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# For CUDA 11.8
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
The Dockerfile.gpu handles this automatically.
CI/CD Pipeline
The project includes a Gitea Actions workflow (.gitea/workflows/ci.yaml) for automated builds.
What CI Builds
| Image | Architecture | Source |
|---|---|---|
| `paddle-ocr-cpu:amd64` | amd64 | PyPI paddlepaddle |
| `paddle-ocr-cpu:arm64` | arm64 | Pre-built wheel from Gitea packages |
| `paddle-ocr-gpu:amd64` | amd64 | PyPI paddlepaddle-gpu |
| `paddle-ocr-gpu:arm64` | arm64 | Pre-built wheel from Gitea packages |
ARM64 Wheel Workflow
Since the PyPI wheels don't work on ARM64 (they assume x86 SSE instructions), wheels must be built from source using sse2neon:
- Built manually on an ARM64 machine (one-time)
- Uploaded to Gitea generic packages
- Downloaded by CI when building ARM64 images
Step 1: Build ARM64 Wheels (One-time, on ARM64 machine)
cd src/paddle_ocr
# Build GPU wheel (requires NVIDIA GPU, takes 1-2 hours)
sudo docker build -t paddle-builder:gpu-arm64 -f Dockerfile.build-paddle .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:gpu-arm64
# Build CPU wheel (no GPU required, takes 1-2 hours)
sudo docker build -t paddle-builder:cpu-arm64 -f Dockerfile.build-paddle-cpu .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:cpu-arm64
# Verify wheels were created
ls -la wheels/paddlepaddle*.whl
# paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl (GPU)
# paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl (CPU)
Step 2: Upload Wheels to Gitea Packages
export GITEA_TOKEN="your-token-here"
# Upload GPU wheel
curl -X PUT \
-H "Authorization: token $GITEA_TOKEN" \
--upload-file wheels/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl \
"https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl"
# Upload CPU wheel
curl -X PUT \
-H "Authorization: token $GITEA_TOKEN" \
--upload-file wheels/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl \
"https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl"
Wheels available at:
https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl
https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl
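For a local (non-CI) setup, the same wheel can be fetched into `wheels/` from Python; whether the download needs the Gitea token depends on the package's visibility, so treat the auth header as an assumption:

```python
import os
from pathlib import Path
import requests

WHEEL_URL = (
    "https://seryus.ddns.net/api/packages/unir/generic/"
    "paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl"
)

dest = Path("wheels") / WHEEL_URL.rsplit("/", 1)[-1]
dest.parent.mkdir(exist_ok=True)

headers = {"Authorization": f"token {os.environ['GITEA_TOKEN']}"}  # may be optional for public packages
with requests.get(WHEEL_URL, headers=headers, stream=True, timeout=120) as r:
    r.raise_for_status()
    with open(dest, "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)

print(f"Saved {dest} ({dest.stat().st_size / 1e6:.1f} MB)")
```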
Step 3: CI Builds Images
CI automatically:
- Downloads ARM64 wheels from Gitea packages (for arm64 builds only)
- Builds both CPU and GPU images for amd64 and arm64
- Pushes to registry with arch-specific tags
Required CI Secrets
Configure these in Gitea repository settings:
| Secret | Description |
|---|---|
| `CI_READWRITE` | Gitea token with registry read/write access |
Manual Image Push
# Login to registry
docker login seryus.ddns.net
# Build and push CPU (multi-arch)
docker buildx build -f Dockerfile.cpu \
--platform linux/amd64,linux/arm64 \
-t seryus.ddns.net/unir/paddle-ocr-api:cpu \
--push .
# Build and push GPU (x86_64)
docker build -f Dockerfile.gpu -t seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64 .
docker push seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64
# Build and push GPU (ARM64) - requires wheel in wheels/
docker buildx build -f Dockerfile.gpu \
--platform linux/arm64 \
-t seryus.ddns.net/unir/paddle-ocr-api:gpu-arm64 \
--push .
Updating the ARM64 Wheels
When PaddlePaddle releases a new version:
- Update `PADDLE_VERSION` in `Dockerfile.build-paddle` and `Dockerfile.build-paddle-cpu`
- Rebuild both wheels on an ARM64 machine
- Upload to Gitea packages with the new version
- Update `PADDLE_VERSION` in `.gitea/workflows/ci.yaml`