
# PaddleOCR Tuning REST API

REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start with Docker Compose

Docker Compose manages building and running containers. The `docker-compose.yml` defines two services:
- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires NVIDIA GPU + Container Toolkit)

### Run CPU Version

```bash
cd src/paddle_ocr

# Build and start (first time takes ~2-3 min to build, ~30s to load model)
docker compose up ocr-cpu

# Or run in background (detached)
docker compose up -d ocr-cpu

# View logs
docker compose logs -f ocr-cpu

# Stop
docker compose down
```

### Run GPU Version

```bash
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
```

### Test the API

Once running, test with:
```bash
# Check health
curl http://localhost:8000/health

# Or use the test script
pip install requests
python test.py --url http://localhost:8000
```

### What Docker Compose Does

```
docker compose up ocr-cpu
       │
       ├─► Builds image from Dockerfile.cpu (if not exists)
       ├─► Creates container "paddle-ocr-cpu"
       ├─► Mounts ../dataset → /app/dataset (your PDF images)
       ├─► Mounts paddlex-cache volume (persists downloaded models)
       ├─► Exposes port 8000
       └─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
```

## Files

| File | Description |
|------|-------------|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (multi-arch) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.build-paddle` | PaddlePaddle GPU wheel builder for ARM64 |
| `docker-compose.yml` | Service orchestration |
| `wheels/` | Local PaddlePaddle wheels (created by build-paddle) |

## API Endpoints

### `GET /health`
Check if service is ready.

```json
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
```
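
Because the model takes some time to load at startup, a client may want to poll `/health` before sending work. The following is a minimal sketch using only the standard library. The helper names `fetch_health` and `wait_until_ready`, the timeout values, and the choice to wait on the `status` and `model_loaded` fields are illustrative assumptions, not part of the service.

```python
import json
import time
import urllib.request


def fetch_health(base_url):
    """GET <base_url>/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(base_url, timeout_s=120.0, interval_s=2.0, fetch=fetch_health):
    """Poll /health until the service reports a loaded model, or give up.

    Returns the last health payload on success and raises TimeoutError otherwise.
    `fetch` is injectable so the loop can be exercised without a running server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            health = fetch(base_url)
            if health.get("status") == "ok" and health.get("model_loaded"):
                return health
        except OSError:
            # Connection refused or timed out: the container is probably still starting.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{base_url} not ready after {timeout_s}s")
        time.sleep(interval_s)
```

Calling `wait_until_ready("http://localhost:8000")` before the first `/evaluate` request avoids counting model load time as a failed trial.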

### `POST /evaluate`
Run OCR evaluation with given hyperparameters.

**Request:**
```json
{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**
```json
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
```
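
As a rough illustration of calling this endpoint from Python, the sketch below builds the request body and posts it with the standard library. The helpers `build_payload` and `evaluate` are hypothetical names, and the client-side checks (thresholds within [0, 1], `start_page` smaller than `end_page`) are assumptions added for safety, not rules documented by the service. In the example above, pages 5 to 10 yield `"PAGES": 5`, which suggests `end_page` is exclusive.

```python
import json
import urllib.request

# Threshold fields from the request example above.
THRESHOLD_FIELDS = (
    "text_det_thresh",
    "text_det_box_thresh",
    "text_rec_score_thresh",
)


def build_payload(params, start_page=5, end_page=10, pdf_folder="/app/dataset"):
    """Merge tuning parameters with the page range into an /evaluate request body."""
    if start_page >= end_page:
        raise ValueError("start_page must be smaller than end_page")
    for name in THRESHOLD_FIELDS:
        value = params.get(name)
        # Assumption: these thresholds are scores in [0, 1].
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    return {"pdf_folder": pdf_folder, **params, "start_page": start_page, "end_page": end_page}


def evaluate(params, base_url="http://localhost:8000", timeout_s=600):
    """POST the payload to /evaluate and return the metrics dictionary."""
    body = json.dumps(build_payload(params)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/evaluate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

With the service running, `evaluate({"text_det_thresh": 0.469})["CER"]` would return the character error rate for the default page range.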

### `POST /evaluate_full`
Same as `/evaluate`, but runs on all pages and ignores `start_page`/`end_page`.

## Building Images

### CPU Image (Multi-Architecture)

```bash
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .

# Multi-arch build with buildx (amd64 + arm64)
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t paddle-ocr-api:cpu \
  --push .
```

### GPU Image (x86_64; ARM64 needs a local wheel)

```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
```

> **Note:** PaddlePaddle GPU 3.x packages are **not on PyPI**. The Dockerfile installs from PaddlePaddle's official CUDA index (`paddlepaddle.org.cn/packages/stable/cu126/`). This is handled automatically during build.

## Running

### CPU (Any machine)

```bash
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:cpu
```

### GPU (NVIDIA)

```bash
docker run -d -p 8000:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:gpu
```

## GPU Support Analysis

### Host System Reference (DGX Spark)

This section documents GPU support findings based on testing on an NVIDIA DGX Spark:

| Component | Value |
|-----------|-------|
| Architecture | ARM64 (aarch64) |
| CPU | NVIDIA Grace (ARM) |
| GPU | NVIDIA GB10 |
| CUDA Version | 13.0 |
| Driver | 580.95.05 |
| OS | Ubuntu 24.04 LTS |
| Container Toolkit | nvidia-container-toolkit 1.18.1 |
| Docker | 28.5.1 |
| Docker Compose | v2.40.0 |

### PaddlePaddle GPU Platform Support

**Critical Finding:** PaddlePaddle does **not** publish prebuilt GPU wheels for ARM64/aarch64, so `paddlepaddle-gpu` cannot be installed there with `pip`.

| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| Windows x64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| macOS x64 | ✅ | ❌ |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
| Linux ARM64 (Jetson/DGX) | ✅ | ❌ No wheels |

**Source:** [PaddlePaddle-GPU PyPI](https://pypi.org/project/paddlepaddle-gpu/) - only `manylinux_x86_64` and `win_amd64` wheels available.

### Why GPU Doesn't Work on ARM64

1. **No prebuilt wheels**: `pip install paddlepaddle-gpu` fails on ARM64 - no compatible wheels exist
2. **Not a CUDA issue**: The NVIDIA CUDA base images work fine on ARM64 (`nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04`)
3. **Not a container toolkit issue**: `nvidia-container-toolkit` is installed and functional
4. **PaddlePaddle limitation**: The Paddle team hasn't compiled GPU wheels for ARM64

When you run `pip install paddlepaddle-gpu` on ARM64:
```
ERROR: No matching distribution found for paddlepaddle-gpu
```

### Options for ARM64 Systems

#### Option 1: CPU-Only (Recommended)

Use `Dockerfile.cpu` which works on ARM64:

```bash
# On DGX Spark
docker compose up ocr-cpu

# Or build directly
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
```

**Performance:** CPU inference on the ARM64 Grace CPU is fast thanks to its high core count. Expect roughly 2-5 seconds per page.

#### Option 2: Build PaddlePaddle from Source (Docker-based)

Use the included Docker builder to compile PaddlePaddle GPU for ARM64:

```bash
cd src/paddle_ocr

# Step 1: Build the PaddlePaddle GPU wheel (one-time, 2-4 hours)
docker compose --profile build run --rm build-paddle

# Verify wheel was created
ls -la wheels/paddlepaddle*.whl

# Step 2: Build the GPU image (uses local wheel)
docker compose build ocr-gpu

# Step 3: Run with GPU
docker compose up ocr-gpu

# Verify that the installed PaddlePaddle build was compiled with CUDA support
docker compose exec ocr-gpu python -c "import paddle; print(paddle.device.is_compiled_with_cuda())"
```

**What this does:**
1. `build-paddle` compiles PaddlePaddle from source inside a CUDA container
2. The wheel is saved to `./wheels/` directory
3. `Dockerfile.gpu` detects the local wheel and installs it instead of fetching a prebuilt package

**Caveats:**
- Build takes 2-4 hours on first run
- Requires ~20GB disk space during build
- Not officially supported by PaddlePaddle team
- May need adjustments for future PaddlePaddle versions

See: [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327)
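
Options 1 and 2 can be combined into a small helper that picks the Compose service for the current host. This is only a sketch: `pick_service` is a hypothetical name, the check mirrors the `wheels/paddlepaddle*.whl` pattern used above, and it does not verify that an NVIDIA GPU or the container toolkit is actually present.

```python
import glob
import os
import platform


def pick_service(machine=None, wheels_dir="wheels"):
    """Return the Compose service name that matches this host.

    x86_64 hosts can use the GPU service directly. ARM64 hosts need a locally
    built wheel in `wheels_dir`; without one they fall back to the CPU service.
    """
    machine = (machine or platform.machine()).lower()
    if machine in ("x86_64", "amd64"):
        return "ocr-gpu"
    if machine in ("aarch64", "arm64"):
        has_wheel = bool(glob.glob(os.path.join(wheels_dir, "paddlepaddle*.whl")))
        return "ocr-gpu" if has_wheel else "ocr-cpu"
    # Unknown architecture: the CPU image is the safe default.
    return "ocr-cpu"


if __name__ == "__main__":
    print(f"docker compose up {pick_service()}")
```

Run from `src/paddle_ocr`, the script prints the `docker compose up` command that fits the machine.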

#### Option 3: Alternative OCR Engines

For ARM64 GPU acceleration, consider alternatives:

| Engine | ARM64 GPU | Notes |
|--------|-----------|-------|
| **Tesseract** | ❌ CPU-only | Good fallback, widely available |
| **EasyOCR** | ⚠️ Via PyTorch | PyTorch has ARM64 GPU support |
| **TrOCR** | ⚠️ Via Transformers | Hugging Face Transformers + PyTorch |
| **docTR** | ⚠️ Via TensorFlow/PyTorch | Both backends have ARM64 support |

EasyOCR with PyTorch is a viable alternative:
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install easyocr
```

### x86_64 GPU Setup (Working)

For x86_64 systems with NVIDIA GPU, the GPU Docker works:

```bash
# Verify GPU is accessible
nvidia-smi

# Verify Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Build and run GPU version
docker compose up ocr-gpu
```

### GPU Docker Compose Configuration

The `docker-compose.yml` configures GPU access via:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

This requires Docker Compose v2 and nvidia-container-toolkit.

## DGX Spark / ARM64 Quick Start

For ARM64 systems (DGX Spark, Jetson, Graviton), use CPU-only:

```bash
cd src/paddle_ocr

# Build ARM64-native CPU image
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .

# Run
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:arm64

# Test
curl http://localhost:8000/health
```

### Cross-Compile from x86_64

Build ARM64 images from an x86_64 machine:

```bash
# Setup buildx for multi-arch
docker buildx create --name mybuilder --use

# Build ARM64 image from x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/

# On DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```

## Using with Ray Tune

Update your notebook's `trainable_paddle_ocr` function:

```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call OCR API instead of subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }

    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics=metrics)
    except Exception as e:
        # Report worst-case metrics so a failed trial is penalized rather than crashing the search
        tune.report(metrics={"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```

## Architecture: Model Lifecycle

The model is loaded **once** at container startup and stays in memory for all requests:

```mermaid
flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
        Load --> Ready[API Ready<br/>Models in RAM ~500MB]

        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end

        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
```

**Subprocess vs REST API comparison:**

```mermaid
flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa
```

## Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|-------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| **REST API** | **Once** | **~0.1s** | **~5.8 hours** |

Over 64 trials, the REST API saves more than an hour compared with the subprocess approach by loading the model only once.
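
The totals can be approximated with a simple model: one-time load cost plus, for each trial, evaluation time and per-trial overhead. The sketch below is a rough estimate, not a benchmark. It assumes trials run sequentially and takes the per-trial evaluation time (330.5 s) from the sample `/evaluate` response; `total_hours` is a hypothetical helper.

```python
def total_hours(trials, eval_s, per_trial_overhead_s, one_time_load_s=0.0):
    """Wall-clock estimate for sequential trials, in hours."""
    return (one_time_load_s + trials * (eval_s + per_trial_overhead_s)) / 3600


TRIALS = 64
EVAL_S = 330.5  # assumption: per-trial evaluation time from the sample /evaluate response

# Subprocess: the model is reloaded (~10s) on every trial.
subprocess_h = total_hours(TRIALS, EVAL_S, per_trial_overhead_s=10.0)
# REST API: one ~10s load at startup, then ~0.1s of request overhead per trial.
rest_api_h = total_hours(TRIALS, EVAL_S, per_trial_overhead_s=0.1, one_time_load_s=10.0)

print(f"subprocess: {subprocess_h:.2f} h")
print(f"REST API:   {rest_api_h:.2f} h")
print(f"saved:      {(subprocess_h - rest_api_h) * 60:.1f} min")
```

Under these assumptions the REST API estimate lands near the table's ~5.8 hours, while the ~10s model load alone accounts for roughly ten minutes of savings. The rest of the gap in the table would have to come from costs this simple model leaves out, such as process or container startup, which are not broken down here.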

## Troubleshooting

### Model download slow on first run
The first run downloads ~500MB of models. Use volume `paddlex-cache` to persist them.

### Out of memory
Reduce `max_concurrent_trials` in Ray Tune, or increase container memory:
```bash
docker run --memory=8g ...
```

### GPU not detected
Ensure NVIDIA Container Toolkit is installed:
```bash
nvidia-smi  # Should work
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi  # Should work
```

### PaddlePaddle GPU installation fails
PaddlePaddle 3.x GPU packages are **not available on PyPI**. They must be installed from PaddlePaddle's official index:
```bash
# For CUDA 12.x
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CUDA 11.8
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```
`Dockerfile.gpu` handles this automatically.