# PaddleOCR Tuning REST API

REST API service for PaddleOCR hyperparameter evaluation. It keeps the model loaded in memory for fast repeated evaluations during hyperparameter search.

## Quick Start with Docker Compose

Docker Compose manages building and running containers. The `docker-compose.yml` defines two OCR services:

- `ocr-cpu` - CPU-only version (works everywhere)
- `ocr-gpu` - GPU version (requires NVIDIA GPU + Container Toolkit)

### Run CPU Version

```bash
cd src/paddle_ocr

# Build and start (first time takes ~2-3 min to build, ~30s to load model)
docker compose up ocr-cpu

# Or run in background (detached)
docker compose up -d ocr-cpu

# View logs
docker compose logs -f ocr-cpu

# Stop
docker compose down
```

### Run GPU Version

```bash
# Requires: NVIDIA GPU + nvidia-container-toolkit installed
docker compose up ocr-gpu
```

### Test the API

Once running, test with:

```bash
# Check health
curl http://localhost:8000/health

# Or use the test script
pip install requests
python test.py --url http://localhost:8000
```
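
Because the model takes a while to load on startup, scripts that drive the API may want to wait until `/health` reports both the model and the dataset as loaded. The following is a minimal stdlib-only sketch, assuming the `/health` JSON shape documented under API Endpoints; the helper names `is_ready`, `fetch_health` and `wait_until_ready`, the polling interval and the timeout are all hypothetical choices, not part of this project.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def is_ready(health: dict) -> bool:
    """Return True when /health reports a loaded model and dataset (assumed JSON shape)."""
    return (
        health.get("status") == "ok"
        and bool(health.get("model_loaded"))
        and bool(health.get("dataset_loaded"))
    )


def fetch_health(base_url: str, timeout: float = 5.0) -> Optional[dict]:
    """GET {base_url}/health; return parsed JSON, or None if the service is not reachable yet."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, ConnectionError, TimeoutError, json.JSONDecodeError):
        return None


def wait_until_ready(
    base_url: str = "http://localhost:8000",
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    fetch: Callable[[str], Optional[dict]] = fetch_health,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll /health until is_ready() is True; raise TimeoutError after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        health = fetch(base_url)
        if health is not None and is_ready(health):
            return health
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{base_url} not ready after {timeout_s}s (last: {health})")
        sleep(interval_s)
```

Calling `wait_until_ready()` before starting a tuning run avoids failed first trials while the container is still loading the model. `fetch` and `sleep` are injectable so the polling logic can be tested without a running server.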

### What Docker Compose Does

```
docker compose up ocr-cpu
    │
    ├─► Builds image from Dockerfile.cpu (if not exists)
    ├─► Creates container "paddle-ocr-cpu"
    ├─► Mounts ../dataset → /app/dataset (your PDF images)
    ├─► Mounts paddlex-cache volume (persists downloaded models)
    ├─► Exposes port 8000
    └─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000
```

## Files

| File | Description |
|------|-------------|
| `paddle_ocr_tuning_rest.py` | FastAPI REST service |
| `dataset_manager.py` | Dataset loader |
| `test.py` | API test client |
| `Dockerfile.cpu` | CPU-only image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.gpu` | GPU/CUDA image (x86_64 + ARM64 with local wheel) |
| `Dockerfile.build-paddle` | PaddlePaddle GPU wheel builder for ARM64 |
| `Dockerfile.build-paddle-cpu` | PaddlePaddle CPU wheel builder for ARM64 |
| `docker-compose.yml` | Service orchestration |
| `docker-compose.cpu-registry.yml` | Pull CPU image from registry |
| `docker-compose.gpu-registry.yml` | Pull GPU image from registry |
| `wheels/` | Local PaddlePaddle wheels (created by build-paddle) |

## API Endpoints

### `GET /health`

Check if service is ready.

```json
{"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24}
```

### `POST /evaluate`

Run OCR evaluation with given hyperparameters.

**Request:**

```json
{
  "pdf_folder": "/app/dataset",
  "textline_orientation": true,
  "use_doc_orientation_classify": false,
  "use_doc_unwarping": false,
  "text_det_thresh": 0.469,
  "text_det_box_thresh": 0.5412,
  "text_det_unclip_ratio": 0.0,
  "text_rec_score_thresh": 0.635,
  "start_page": 5,
  "end_page": 10
}
```

**Response:**

```json
{"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1}
```
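
A minimal client sketch using only the Python standard library (`test.py` and the Ray Tune example use `requests` instead). The parameter values in `EXAMPLE_PARAMS` are copied from the request example above and are not necessarily the service's defaults. `post_evaluate` and `check_metrics` are hypothetical helper names, and the consistency check assumes `TIME_PER_PAGE` equals `TIME / PAGES`. That holds for the sample response (330.5 / 5 = 66.1) but is not documented as a contract.

```python
import json
import math
import urllib.request

# Values copied from the request example in this README; adjust per trial.
EXAMPLE_PARAMS = {
    "pdf_folder": "/app/dataset",
    "textline_orientation": True,
    "use_doc_orientation_classify": False,
    "use_doc_unwarping": False,
    "text_det_thresh": 0.469,
    "text_det_box_thresh": 0.5412,
    "text_det_unclip_ratio": 0.0,
    "text_rec_score_thresh": 0.635,
    "start_page": 5,
    "end_page": 10,
}

REQUIRED_KEYS = {"CER", "WER", "TIME", "PAGES", "TIME_PER_PAGE"}


def post_evaluate(base_url: str, params: dict, timeout: float = 600.0) -> dict:
    """POST params as JSON to {base_url}/evaluate and return the decoded metrics."""
    req = urllib.request.Request(
        f"{base_url}/evaluate",
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:  # raises HTTPError on 4xx/5xx
        return json.load(resp)


def check_metrics(metrics: dict) -> dict:
    """Sanity-check a metrics dict (assumes TIME_PER_PAGE == TIME / PAGES)."""
    missing = REQUIRED_KEYS - set(metrics)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if metrics["PAGES"] > 0 and not math.isclose(
        metrics["TIME_PER_PAGE"], metrics["TIME"] / metrics["PAGES"], rel_tol=1e-3
    ):
        raise ValueError("TIME_PER_PAGE does not match TIME / PAGES")
    return metrics
```

Usage against a running container: `check_metrics(post_evaluate("http://localhost:8000", {**EXAMPLE_PARAMS, "text_det_thresh": 0.3}))`. The 600 s timeout matches the Ray Tune example, because a single evaluation can take several minutes.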

### `POST /evaluate_full`

Same as `/evaluate`, but runs on ALL pages (ignores `start_page`/`end_page`).

## Debug Output (debugset)

The `debugset` folder stores OCR predictions for debugging and analysis. When `"save_output": true` is included in an `/evaluate` request, predictions are written to `/app/debugset`.

### Enable Debug Output

```json
{
  "pdf_folder": "/app/dataset",
  "save_output": true,
  "start_page": 5,
  "end_page": 10
}
```

### Output Structure

```
debugset/
├── doc1/
│   └── paddle_ocr/
│       ├── page_0005.txt
│       ├── page_0006.txt
│       └── ...
├── doc2/
│   └── paddle_ocr/
│       └── ...
```

Each `.txt` file contains the OCR-extracted text for that page.
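
For offline analysis, it can help to load a `debugset` tree back into memory. The sketch below assumes the layout shown above (`<doc>/<engine>/page_NNNN.txt`, with `paddle_ocr` as the engine folder). `load_debugset` is a hypothetical helper, and folder names for other engines are assumptions.

```python
import re
from pathlib import Path

PAGE_RE = re.compile(r"page_(\d+)\.txt$")


def load_debugset(root: str, engine: str = "paddle_ocr") -> dict:
    """Return {doc_name: {page_number: text}} for every <doc>/<engine>/page_NNNN.txt under root."""
    results: dict = {}
    for path in sorted(Path(root).glob(f"*/{engine}/page_*.txt")):
        match = PAGE_RE.search(path.name)
        if match is None:
            continue  # skip files that do not follow the page_NNNN.txt pattern
        doc = path.parent.parent.name
        results.setdefault(doc, {})[int(match.group(1))] = path.read_text(encoding="utf-8")
    return results
```

For example, `load_debugset("../debugset")["doc1"][5]` would return the text saved for page 5 of `doc1`.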

### Docker Mount

The `debugset` folder is mounted read-write in docker-compose:

```yaml
volumes:
  - ../debugset:/app/debugset:rw
```

### Use Cases

- **Compare OCR engines**: Run the same pages through PaddleOCR, DocTR, and EasyOCR with `save_output=True`, then diff the results
- **Debug hyperparameters**: See how different settings affect text extraction
- **Ground truth comparison**: Compare predictions against expected output
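
For ground truth comparison, character and word error rates are commonly computed as the Levenshtein edit distance between prediction and reference, divided by the reference length in characters or whitespace-separated words. The sketch below implements that textbook definition. It is not necessarily how `paddle_ocr_tuning_rest.py` computes the `CER`/`WER` it returns; text normalization, for example, may differ.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (cost 0 if equal)
            ))
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance over reference length in characters."""
    return levenshtein(prediction, reference) / max(len(reference), 1)


def wer(prediction: str, reference: str) -> float:
    """Word error rate: edit distance over reference length in whitespace-separated words."""
    ref_words = reference.split()
    return levenshtein(prediction.split(), ref_words) / max(len(ref_words), 1)
```

With this definition, CER can exceed 1.0 when the prediction contains many insertions, so it is not strictly a percentage.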

## Building Images

### CPU Image (Multi-Architecture)

```bash
# Local build (current architecture)
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .

# Multi-arch build with buildx (amd64 + arm64)
# --push needs a registry-qualified tag (see Manual Image Push below)
docker buildx create --name multiarch --use
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t paddle-ocr-api:cpu \
  --push .
```

### GPU Image (x86_64 + ARM64 with local wheel)

```bash
docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu .
```

> **Note:** PaddlePaddle GPU 3.x packages are **not on PyPI**. The Dockerfile installs from PaddlePaddle's official CUDA index (`paddlepaddle.org.cn/packages/stable/cu126/`). This is handled automatically during build.

## Running

### CPU (Any machine)

```bash
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:cpu
```

### GPU (NVIDIA)

```bash
docker run -d -p 8000:8000 --gpus all \
  -v $(pwd)/../dataset:/app/dataset:ro \
  -v paddlex-cache:/root/.paddlex \
  paddle-ocr-api:gpu
```

## GPU Support Analysis

### Host System Reference (DGX Spark)

This section documents GPU support findings from testing on an NVIDIA DGX Spark:

| Component | Value |
|-----------|-------|
| Architecture | ARM64 (aarch64) |
| CPU | NVIDIA Grace (ARM) |
| GPU | NVIDIA GB10 |
| CUDA Version | 13.0 |
| Driver | 580.95.05 |
| OS | Ubuntu 24.04 LTS |
| Container Toolkit | nvidia-container-toolkit 1.18.1 |
| Docker | 28.5.1 |
| Docker Compose | v2.40.0 |

### PaddlePaddle GPU Platform Support

**Note:** PaddlePaddle-GPU does NOT have prebuilt ARM64 wheels on PyPI, but ARM64 support is available via custom-built wheels.

| Platform | CPU | GPU |
|----------|-----|-----|
| Linux x86_64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| Windows x64 | ✅ | ✅ CUDA 10.2/11.x/12.x |
| macOS x64 | ✅ | ❌ |
| macOS ARM64 (M1/M2) | ✅ | ❌ |
| Linux ARM64 (Jetson/DGX) | ✅ | ⚠️ Limited - see Blackwell note |

**Source:** [PaddlePaddle-GPU PyPI](https://pypi.org/project/paddlepaddle-gpu/) - only `manylinux_x86_64` and `win_amd64` wheels are available on PyPI. ARM64 wheels must be built from source or downloaded from Gitea packages.

### ARM64 GPU Support

ARM64 GPU support is available but requires custom-built wheels:

1. **No prebuilt PyPI wheels**: `pip install paddlepaddle-gpu` fails on ARM64 because no compatible wheels exist on PyPI
2. **Custom wheels work**: This project provides Dockerfiles to build ARM64 GPU wheels from source
3. **CI/CD builds ARM64 GPU images**: Pre-built wheels are available from Gitea packages

**To use GPU on ARM64:**

- Use the pre-built images from the container registry, or
- Build the wheel locally using `Dockerfile.build-paddle` (see Option 2 below), or
- Download the pre-built wheel from Gitea packages into `wheels/` (`paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl`)

### ⚠️ Known Limitation: Blackwell GPU (sm_121 / GB10)

**Status: GPU inference does NOT work on the NVIDIA GB10 Blackwell GPU (DGX Spark, sm_121).**

#### Symptoms

When running PaddleOCR on Blackwell GPUs:

- CUDA loads successfully ✅
- Basic tensor operations work ✅
- **Detection model outputs constant values** ❌
  - 0 text regions detected
  - CER/WER = 100% (nothing recognized)

#### Root Cause

**Confirmed:** PaddlePaddle's CUDA backend as a whole does NOT support Blackwell (sm_121). This is NOT just an inference model problem: even basic layers such as `Conv2D` produce wrong output.

**Test Results (January 2026):**

1. **PTX JIT Test** (`CUDA_FORCE_PTX_JIT=1`):

   ```
   OSError: CUDA error(209), no kernel image is available for execution on the device.
   [Hint: 'cudaErrorNoKernelImageForDevice']
   ```

   → Confirmed: No PTX code exists in PaddlePaddle binaries

2. **Dynamic Graph Mode Test** (bypassing inference models):

   ```
   Conv2D + BatchNorm output:
   Output min: 0.0000
   Output max: 0.0000
   Output mean: 0.0000
   Dynamic graph mode: BROKEN (constant output)
   ```

   → Confirmed: Even a simple `nn.Conv2D` produces zeros on Blackwell

**Conclusion:** The issue is PaddlePaddle's compiled CUDA kernels (cubins), not just the inference models. The entire framework was compiled without sm_121 support and without PTX for JIT compilation.

**Why building PaddlePaddle from source doesn't fix it:**

1. ⚠️ Building with `CUDA_ARCH=121` requires CUDA 13.0+ (PaddlePaddle only supports up to CUDA 12.6)
2. ❌ Even if you could build it, PaddleOCR models contain pre-compiled CUDA ops
3. ❌ These model files were exported/compiled targeting sm_80/sm_90 architectures
4. ❌ The model kernels execute on GPU but produce garbage output on sm_121

**To truly fix this**, the PaddlePaddle team would need to:

1. Add sm_121 to their model export pipeline
2. Re-export all PaddleOCR models (PP-OCRv4, PP-OCRv5, etc.) with Blackwell support
3. Release new model versions

This is tracked in [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327).

#### Debug Script

Use the included debug script to verify this issue:

```bash
docker exec paddle-ocr-gpu python /app/scripts/debug_gpu_detection.py /app/dataset/0/img/page_0001.png
```

Expected output showing the problem:

```
OUTPUT ANALYSIS:
Shape: (1, 1, 640, 640)
Min: 0.000010
Max: 0.000010  # <-- Same as min = constant output
Mean: 0.000010

DIAGNOSIS:
PROBLEM: Output is constant - model inference is broken!
This typically indicates GPU compute capability mismatch.
```
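
The diagnosis above comes down to checking whether the detection map has (almost) no spread between its minimum and maximum. The sketch below re-creates that check in plain Python so it can be applied to any flattened model output. It is not the code of `debug_gpu_detection.py`, and the tolerance of `1e-6` is an assumption.

```python
from statistics import fmean
from typing import Iterable


def summarize(values: Iterable[float]) -> dict:
    """Return min/max/mean of a flattened output tensor (e.g. a detection probability map)."""
    data = [float(v) for v in values]
    if not data:
        raise ValueError("empty output")
    return {"min": min(data), "max": max(data), "mean": fmean(data)}


def is_constant_output(values: Iterable[float], tol: float = 1e-6) -> bool:
    """True when max - min <= tol, i.e. the model produced a (near-)constant map."""
    stats = summarize(values)
    return stats["max"] - stats["min"] <= tol
```

With NumPy, `is_constant_output(output.ravel())` can be applied to a detection map of shape `(1, 1, 640, 640)`. A healthy map on a page with text should have a clear spread between background and text regions.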

#### Workarounds

1. **Use CPU mode** (recommended):

   ```bash
   docker compose up ocr-cpu
   ```

   The ARM Grace CPU is fast (~2-5 sec/page). This is the reliable option.

2. **Use EasyOCR or DocTR with GPU**:

   These use PyTorch, which has official ARM64 CUDA wheels (cu128 index):

   ```bash
   # EasyOCR with GPU on DGX Spark
   docker build -f ../easyocr_service/Dockerfile.gpu -t easyocr-gpu ../easyocr_service
   docker run --gpus all -p 8002:8000 easyocr-gpu
   ```

3. **Wait for PaddlePaddle Blackwell support**:

   Track [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327) for updates.

#### GPU Support Matrix (Updated)

| GPU Architecture | Compute Capability | CPU | GPU |
|------------------|--------------------|-----|-----|
| Ampere (A100, A10) | sm_80 / sm_86 | ✅ | ✅ |
| Hopper (H100, H200) | sm_90 | ✅ | ✅ |
| **Blackwell (GB10)** | sm_121 | ✅ | ❌ Not supported |

#### FAQ: Why Doesn't CUDA Backward Compatibility Work?

**Q: CUDA normally runs older kernels on newer GPUs. Why doesn't this work for Blackwell?**

Per the [NVIDIA Blackwell Compatibility Guide](https://docs.nvidia.com/cuda/blackwell-compatibility-guide/):

CUDA **can** run older code on newer GPUs via **PTX JIT compilation**:

1. PTX (Parallel Thread Execution) is NVIDIA's intermediate representation
2. If an application includes PTX code, the driver JIT-compiles it for the target GPU
3. This allows code built for sm_80 to run on sm_121, provided its PTX is included

Pre-compiled cubins alone do not carry over: a cubin is only binary-compatible with GPUs of the same major architecture version, so sm_80 or sm_90 cubins cannot run on an sm_12x GPU.

**The problem**: PaddlePaddle's CUDA kernels ship only as pre-compiled **cubins** (SASS binary), not PTX. Without PTX, there's nothing to JIT-compile.

We tested PTX JIT (January 2026):

```bash
# Force PTX JIT compilation
docker run --gpus all -e CUDA_FORCE_PTX_JIT=1 paddle-ocr-gpu \
  python /app/scripts/debug_gpu_detection.py /app/dataset/0/img/page_0001.png

# Result:
# OSError: CUDA error(209), no kernel image is available for execution on the device.
```

**Confirmed: No PTX exists** in PaddlePaddle binaries. The CUDA kernels are cubins-only (SASS binary), compiled for sm_80/sm_90 without PTX fallback.

**Note on sm_121**: Per NVIDIA docs, "sm_121 is the same as sm_120 since the only difference is physically integrated CPU+GPU memory of Spark." The issue is general Blackwell (sm_12x) support, not Spark-specific.
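
The compatibility rules above can be summarized as a small decision function. This is a simplified sketch of NVIDIA's documented rules. A cubin for `sm_XY` runs on devices with the same major version `X` and a minor version `>= Y`. PTX for `compute_XY` can be JIT-compiled for any device with compute capability `>= X.Y`. The sketch ignores architecture-specific targets such as `sm_90a`, and `can_run` is a hypothetical helper, not part of any CUDA API.

```python
from typing import Iterable, Tuple

CC = Tuple[int, int]  # compute capability as (major, minor), e.g. (12, 1) for sm_121


def can_run(device: CC, cubins: Iterable[CC] = (), ptx: Iterable[CC] = ()) -> bool:
    """Return True if a binary with these cubin/PTX targets can execute on `device`.

    - cubin sm_XY: binary-compatible only with the same major X and minor >= Y
    - PTX compute_XY: JIT-compilable for any device with capability >= X.Y
    """
    major, minor = device
    if any(c_major == major and c_minor <= minor for c_major, c_minor in cubins):
        return True
    return any(p <= device for p in ptx)  # tuple comparison: (8, 0) <= (12, 1)


# PaddlePaddle as tested in this README: cubins for sm_80/sm_90, no PTX.
PADDLE_CUBINS = [(8, 0), (9, 0)]
```

`CUDA_FORCE_PTX_JIT=1` makes the driver ignore embedded cubins, which corresponds to `can_run(device, cubins=(), ptx=...)`. With no PTX, that returns `False`, matching the error 209 observed above.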

#### FAQ: Does Dynamic Graph Mode Work on Blackwell?

**Q: Can I bypass inference models and use PaddlePaddle's dynamic graph mode?**

**No.** We tested dynamic graph mode (January 2026):

```bash
# Test script runs: paddle.nn.Conv2D + paddle.nn.BatchNorm2D
python /app/scripts/test_dynamic_mode.py

# Result:
# Input shape: [1, 3, 224, 224]
# Output shape: [1, 64, 112, 112]
# Output min: 0.0000
# Output max: 0.0000  # <-- All zeros!
# Output mean: 0.0000
# Dynamic graph mode: BROKEN (constant output)
```

**Conclusion:** The problem isn't limited to inference models. PaddlePaddle's core CUDA kernels (Conv2D, BatchNorm, etc.) produce garbage on sm_121. The entire framework lacks Blackwell support.

#### FAQ: Can I Run AMD64 Containers on ARM64 DGX Spark?

**Q: Can I just run the working x86_64 GPU image via emulation?**

**Short answer: Yes for CPU, No for GPU.**

You can run amd64 containers via QEMU emulation:

```bash
# Install QEMU
sudo apt-get install qemu binfmt-support qemu-user-static
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run amd64 container
docker run --platform linux/amd64 paddle-ocr-gpu:amd64 ...
```

**But GPU doesn't work:**

- QEMU emulates CPU instructions (x86 → ARM)
- **QEMU user-mode does NOT support GPU passthrough**
- GPU calls from emulated x86 code cannot reach the ARM64 GPU

So even if the amd64 image works on x86_64:

- ❌ No GPU access through QEMU
- ❌ CPU emulation is 10-100x slower than native ARM64
- ❌ Defeats the purpose entirely

| Approach | CPU | GPU | Speed |
|----------|-----|-----|-------|
| ARM64 native (CPU) | ✅ | N/A | Fast (~2-5s/page) |
| ARM64 native (GPU) | ✅ | ❌ Blackwell issue | - |
| AMD64 via QEMU | ⚠️ Works | ❌ No passthrough | 10-100x slower |

### Options for ARM64 Systems

#### Option 1: CPU-Only (Recommended)

Use `Dockerfile.cpu`, which works on ARM64:

```bash
# On DGX Spark
docker compose up ocr-cpu

# Or build directly
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
```

**Performance:** CPU inference on ARM64 Grace is surprisingly fast due to its high core count. Expect ~2-5 seconds per page.

#### Option 2: Build PaddlePaddle from Source (Docker-based)

Use the included Docker builder to compile PaddlePaddle GPU for ARM64:

```bash
cd src/paddle_ocr

# Step 1: Build the PaddlePaddle GPU wheel (one-time, 2-4 hours)
docker compose --profile build run --rm build-paddle

# Verify wheel was created
ls -la wheels/paddlepaddle*.whl

# Step 2: Build the GPU image (uses local wheel)
docker compose build ocr-gpu

# Step 3: Run with GPU
docker compose up ocr-gpu

# Verify the installed wheel was built with CUDA support
# (this does not prove the kernels produce correct results; see the Blackwell limitation above)
docker compose exec ocr-gpu python -c "import paddle; print(paddle.device.is_compiled_with_cuda())"
```

**What this does:**

1. `build-paddle` compiles PaddlePaddle from source inside a CUDA container
2. The wheel is saved to the `./wheels/` directory
3. `Dockerfile.gpu` detects the local wheel and uses it instead of PyPI

**Caveats:**

- Build takes 2-4 hours on first run
- Requires ~20GB disk space during build
- Not officially supported by the PaddlePaddle team
- May need adjustments for future PaddlePaddle versions

See: [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327)

#### Option 3: Alternative OCR Engines

For ARM64 GPU acceleration, consider alternatives:

| Engine | ARM64 GPU | Notes |
|--------|-----------|-------|
| **Tesseract** | ❌ CPU-only | Good fallback, widely available |
| **EasyOCR** | ⚠️ Via PyTorch | PyTorch has ARM64 GPU support |
| **TrOCR** | ⚠️ Via Transformers | Hugging Face Transformers + PyTorch |
| **docTR** | ⚠️ Via TensorFlow/PyTorch | Both backends have ARM64 support |

EasyOCR with PyTorch is a viable alternative:

```bash
# cu128 index: the one with ARM64 CUDA wheels (see Workarounds above)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install easyocr
```

### x86_64 GPU Setup (Working)

For x86_64 systems with an NVIDIA GPU, the GPU Docker image works:

```bash
# Verify GPU is accessible
nvidia-smi

# Verify Docker GPU access (nvidia-container-toolkit makes nvidia-smi available in the container)
docker run --rm --gpus all ubuntu nvidia-smi

# Build and run GPU version
docker compose up ocr-gpu
```

### GPU Docker Compose Configuration

The `docker-compose.yml` configures GPU access via:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

This requires Docker Compose v2 and nvidia-container-toolkit.

## DGX Spark / ARM64 Quick Start

For ARM64 systems (DGX Spark, Jetson, Graviton), use CPU-only:

```bash
cd src/paddle_ocr

# Build ARM64-native CPU image
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .

# Run
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:arm64

# Test
curl http://localhost:8000/health
```

### Cross-Compile from x86_64

Build ARM64 images from an x86_64 machine:

```bash
# Setup buildx for multi-arch
docker buildx create --name mybuilder --use

# Build ARM64 image from x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/

# On DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```

## Using with Ray Tune

### Multi-Worker Setup for Parallel Trials

Run multiple workers for parallel hyperparameter tuning:

```bash
cd src/paddle_ocr

# Start 2 CPU workers (ports 8001-8002)
sudo docker compose -f docker-compose.workers.yml --profile cpu up -d

# Or for GPU workers (if supported)
sudo docker compose -f docker-compose.workers.yml --profile gpu up -d

# Check workers are healthy
curl http://localhost:8001/health
curl http://localhost:8002/health
```

Then run the notebook with `max_concurrent_trials=2` to use both workers in parallel.
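
This README does not show how the notebook assigns trials to workers. One simple option, sketched below under that assumption, derives a worker from the numeric suffix of the Ray Tune trial ID, so consecutive trials alternate between ports 8001 and 8002. Trial IDs typically look like `abc12_00003`. Inside the trainable, the ID can usually be read from the Tune/Train context; check the API for your Ray version. `WORKER_URLS` and `pick_worker` are hypothetical names. With `max_concurrent_trials=2`, this usually keeps both workers busy, but not always, because trials can finish out of order.

```python
import re
from typing import Sequence

# Hypothetical worker list matching docker-compose.workers.yml (ports 8001-8002).
WORKER_URLS = ["http://localhost:8001", "http://localhost:8002"]

_SUFFIX_RE = re.compile(r"_(\d+)$")


def pick_worker(trial_id: str, workers: Sequence[str] = WORKER_URLS) -> str:
    """Map a trial ID such as 'abc12_00003' onto one of the worker base URLs."""
    if not workers:
        raise ValueError("no workers configured")
    match = _SUFFIX_RE.search(trial_id)
    # Fall back to a deterministic character sum (str hash() is randomized per process).
    index = int(match.group(1)) if match else sum(map(ord, trial_id))
    return workers[index % len(workers)]
```

Inside `trainable_paddle_ocr`, the request would then go to `f"{pick_worker(trial_id)}/evaluate"` instead of a fixed `API_URL`.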

### Single Worker Setup

Update your notebook's `trainable_paddle_ocr` function:

```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"


def trainable_paddle_ocr(config):
    """Call the OCR API instead of running a subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }

    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics)
    except Exception as e:
        # Report worst-case metrics so the failed trial is recorded instead of crashing the run
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```

## Architecture: Model Lifecycle

The model is loaded **once** at container startup and stays in memory for all requests:

```mermaid
flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load[Load PaddleOCR Models<br/>~10-30s one-time cost]
        Load --> Ready[API Ready<br/>Models in RAM ~500MB]

        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end

        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
```

**Subprocess vs REST API comparison:**

```mermaid
flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa
```

## Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|--------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| **REST API** | **Once** | **~0.1s** | **~5.8 hours** |

The REST API saves ~1+ hour by loading the model only once.

## Troubleshooting

### Model download slow on first run

The first run downloads ~500MB of models. Use the `paddlex-cache` volume to persist them.

### Out of memory

Reduce `max_concurrent_trials` in Ray Tune, or increase container memory:

```bash
docker run --memory=8g ...
```

### GPU not detected

Ensure the NVIDIA Container Toolkit is installed:

```bash
nvidia-smi  # Should work on the host
docker run --rm --gpus all ubuntu nvidia-smi  # Should also work inside a container
```

### PaddlePaddle GPU installation fails

PaddlePaddle 3.x GPU packages are **not available on PyPI**. They must be installed from PaddlePaddle's official index:

```bash
# For CUDA 12.x
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CUDA 11.8
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```

`Dockerfile.gpu` handles this automatically.
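
When scripting installs outside Docker, the index choice above can be derived from the local CUDA version. This minimal sketch only knows the two indexes listed above. It mirrors their mapping (any 12.x to `cu126`, 11.8 to `cu118`) and rejects everything else. `paddle_index_url` and `pip_install_args` are hypothetical helpers.

```python
import sys
from typing import List

BASE = "https://www.paddlepaddle.org.cn/packages/stable"


def paddle_index_url(cuda_version: str) -> str:
    """Map a CUDA version string like '12.4' or '11.8' to the PaddlePaddle index used above."""
    parts = cuda_version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 12:
        return f"{BASE}/cu126/"
    if (major, minor) == (11, 8):
        return f"{BASE}/cu118/"
    raise ValueError(f"no index listed in this README for CUDA {cuda_version}")


def pip_install_args(cuda_version: str, paddle_version: str = "3.2.0") -> List[str]:
    """Build a pip command (for subprocess.run) matching the commands above."""
    return [sys.executable, "-m", "pip", "install",
            f"paddlepaddle-gpu=={paddle_version}", "-i", paddle_index_url(cuda_version)]
```

For example, `subprocess.run(pip_install_args("12.4"), check=True)` would run the CUDA 12.x install. Using `sys.executable -m pip` installs into the interpreter that runs the script.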

## CI/CD Pipeline

The project includes a Gitea Actions workflow (`.gitea/workflows/ci.yaml`) for automated builds.

### What CI Builds

| Image | Architecture | Source |
|-------|--------------|--------|
| `paddle-ocr-cpu:amd64` | amd64 | PyPI paddlepaddle |
| `paddle-ocr-cpu:arm64` | arm64 | Pre-built wheel from Gitea packages |
| `paddle-ocr-gpu:amd64` | amd64 | PyPI paddlepaddle-gpu |
| `paddle-ocr-gpu:arm64` | arm64 | Pre-built wheel from Gitea packages |

### ARM64 Wheel Workflow

Since PyPI wheels don't work on ARM64 (x86 SSE instructions), wheels must be built from source using sse2neon. The workflow is:

1. Build the wheels manually on an ARM64 machine (one-time)
2. Upload them to Gitea generic packages
3. CI downloads them when building ARM64 images

#### Step 1: Build ARM64 Wheels (One-time, on ARM64 machine)

```bash
cd src/paddle_ocr

# Build GPU wheel (requires NVIDIA GPU, takes 1-2 hours)
sudo docker build -t paddle-builder:gpu-arm64 -f Dockerfile.build-paddle .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:gpu-arm64

# Build CPU wheel (no GPU required, takes 1-2 hours)
sudo docker build -t paddle-builder:cpu-arm64 -f Dockerfile.build-paddle-cpu .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:cpu-arm64

# Verify wheels were created
ls -la wheels/paddlepaddle*.whl
# paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl (GPU)
# paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl (CPU)
```

#### Step 2: Upload Wheels to Gitea Packages

```bash
export GITEA_TOKEN="your-token-here"

# Upload GPU wheel
curl -X PUT \
  -H "Authorization: token $GITEA_TOKEN" \
  --upload-file wheels/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl \
  "https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl"

# Upload CPU wheel
curl -X PUT \
  -H "Authorization: token $GITEA_TOKEN" \
  --upload-file wheels/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl \
  "https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl"
```
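
The upload URLs above follow Gitea's generic package layout, `{host}/api/packages/{owner}/generic/{package}/{version}/{file}`. The Python sketch below rebuilds these URLs and performs the same authenticated `PUT` as the `curl` commands. `gitea_generic_url` and `upload_wheel` are hypothetical helpers. Reading the token from `GITEA_TOKEN` mirrors the shell example.

```python
import os
import urllib.request
from pathlib import Path

GITEA = "https://seryus.ddns.net"
OWNER = "unir"


def gitea_generic_url(package: str, version: str, filename: str,
                      host: str = GITEA, owner: str = OWNER) -> str:
    """URL of a file in a Gitea generic package."""
    return f"{host}/api/packages/{owner}/generic/{package}/{version}/{filename}"


def upload_wheel(path: str, package: str, version: str, token: str = "") -> int:
    """PUT a local wheel to the Gitea generic registry; returns the HTTP status code."""
    token = token or os.environ["GITEA_TOKEN"]
    wheel = Path(path)
    req = urllib.request.Request(
        gitea_generic_url(package, version, wheel.name),
        data=wheel.read_bytes(),
        headers={"Authorization": f"token {token}"},
        method="PUT",
    )
    # urlopen raises HTTPError on failure, e.g. 409 if the file already exists in that version
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `upload_wheel("wheels/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl", "paddlepaddle-gpu-arm64", "3.0.0")` is equivalent to the first `curl` command.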

Wheels available at:

```
https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl
https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl
```

#### Step 3: CI Builds Images

CI automatically:

1. Downloads ARM64 wheels from Gitea packages (for arm64 builds only)
2. Builds both CPU and GPU images for amd64 and arm64
3. Pushes to registry with arch-specific tags

### Required CI Secrets

Configure these in Gitea repository settings:

| Secret | Description |
|--------|-------------|
| `CI_READWRITE` | Gitea token with registry read/write access |

### Manual Image Push

```bash
# Login to registry
docker login seryus.ddns.net

# Build and push CPU (multi-arch)
docker buildx build -f Dockerfile.cpu \
  --platform linux/amd64,linux/arm64 \
  -t seryus.ddns.net/unir/paddle-ocr-api:cpu \
  --push .

# Build and push GPU (x86_64)
docker build -f Dockerfile.gpu -t seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64 .
docker push seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64

# Build and push GPU (ARM64) - requires wheel in wheels/
docker buildx build -f Dockerfile.gpu \
  --platform linux/arm64 \
  -t seryus.ddns.net/unir/paddle-ocr-api:gpu-arm64 \
  --push .
```

### Updating the ARM64 Wheels

When PaddlePaddle releases a new version:

1. Update `PADDLE_VERSION` in `Dockerfile.build-paddle` and `Dockerfile.build-paddle-cpu`
2. Rebuild both wheels on an ARM64 machine
3. Upload to Gitea packages with the new version
4. Update `PADDLE_VERSION` in `.gitea/workflows/ci.yaml`