# PaddleOCR Tuning REST API REST API service for PaddleOCR hyperparameter evaluation. Keeps the model loaded in memory for fast repeated evaluations during hyperparameter search. ## Quick Start with Docker Compose Docker Compose manages building and running containers. The `docker-compose.yml` defines two services: - `ocr-cpu` - CPU-only version (works everywhere) - `ocr-gpu` - GPU version (requires NVIDIA GPU + Container Toolkit) ### Run CPU Version ```bash cd src/paddle_ocr # Build and start (first time takes ~2-3 min to build, ~30s to load model) docker compose up ocr-cpu # Or run in background (detached) docker compose up -d ocr-cpu # View logs docker compose logs -f ocr-cpu # Stop docker compose down ``` ### Run GPU Version ```bash # Requires: NVIDIA GPU + nvidia-container-toolkit installed docker compose up ocr-gpu ``` ### Test the API Once running, test with: ```bash # Check health curl http://localhost:8000/health # Or use the test script pip install requests python test.py --url http://localhost:8000 ``` ### What Docker Compose Does ``` docker compose up ocr-cpu │ ├─► Builds image from Dockerfile.cpu (if not exists) ├─► Creates container "paddle-ocr-cpu" ├─► Mounts ../dataset → /app/dataset (your PDF images) ├─► Mounts paddlex-cache volume (persists downloaded models) ├─► Exposes port 8000 └─► Runs: uvicorn paddle_ocr_tuning_rest:app --host 0.0.0.0 --port 8000 ``` ## Files | File | Description | |------|-------------| | `paddle_ocr_tuning_rest.py` | FastAPI REST service | | `dataset_manager.py` | Dataset loader | | `test.py` | API test client | | `Dockerfile.cpu` | CPU-only image (x86_64 + ARM64 with local wheel) | | `Dockerfile.gpu` | GPU/CUDA image (x86_64 + ARM64 with local wheel) | | `Dockerfile.build-paddle` | PaddlePaddle GPU wheel builder for ARM64 | | `Dockerfile.build-paddle-cpu` | PaddlePaddle CPU wheel builder for ARM64 | | `docker-compose.yml` | Service orchestration | | `docker-compose.cpu-registry.yml` | Pull CPU image from registry | | `docker-compose.gpu-registry.yml` | Pull GPU image from registry | | `wheels/` | Local PaddlePaddle wheels (created by build-paddle) | ## API Endpoints ### `GET /health` Check if service is ready. ```json {"status": "ok", "model_loaded": true, "dataset_loaded": true, "dataset_size": 24} ``` ### `POST /evaluate` Run OCR evaluation with given hyperparameters. **Request:** ```json { "pdf_folder": "/app/dataset", "textline_orientation": true, "use_doc_orientation_classify": false, "use_doc_unwarping": false, "text_det_thresh": 0.469, "text_det_box_thresh": 0.5412, "text_det_unclip_ratio": 0.0, "text_rec_score_thresh": 0.635, "start_page": 5, "end_page": 10 } ``` **Response:** ```json {"CER": 0.0115, "WER": 0.0989, "TIME": 330.5, "PAGES": 5, "TIME_PER_PAGE": 66.1} ``` ### `POST /evaluate_full` Same as `/evaluate` but runs on ALL pages (ignores start_page/end_page). ## Debug Output (debugset) The `debugset` folder allows saving OCR predictions for debugging and analysis. When `save_output=True` is passed to `/evaluate`, predictions are written to `/app/debugset`. ### Enable Debug Output ```json { "pdf_folder": "/app/dataset", "save_output": true, "start_page": 5, "end_page": 10 } ``` ### Output Structure ``` debugset/ ├── doc1/ │ └── paddle_ocr/ │ ├── page_0005.txt │ ├── page_0006.txt │ └── ... ├── doc2/ │ └── paddle_ocr/ │ └── ... ``` Each `.txt` file contains the OCR-extracted text for that page. 
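For a quick programmatic smoke test of the endpoints above, here is a minimal Python sketch (assuming the service is reachable at `localhost:8000`, as in the Quick Start) that requests a small page range with debug output enabled:

```python
# Minimal sketch: POST to /evaluate with save_output enabled and print metrics.
# Assumes the service is running at localhost:8000 (see Quick Start).
import requests

payload = {
    "pdf_folder": "/app/dataset",
    "save_output": True,   # predictions are written to /app/debugset
    "start_page": 5,
    "end_page": 10,
}

resp = requests.post("http://localhost:8000/evaluate", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())  # {"CER": ..., "WER": ..., "TIME": ..., "PAGES": ..., "TIME_PER_PAGE": ...}
```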
### Docker Mount The `debugset` folder is mounted read-write in docker-compose: ```yaml volumes: - ../debugset:/app/debugset:rw ``` ### Use Cases - **Compare OCR engines**: Run same pages through PaddleOCR, DocTR, EasyOCR with `save_output=True`, then diff results - **Debug hyperparameters**: See how different settings affect text extraction - **Ground truth comparison**: Compare predictions against expected output ## Building Images ### CPU Image (Multi-Architecture) ```bash # Local build (current architecture) docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu . # Multi-arch build with buildx (amd64 + arm64) docker buildx create --name multiarch --use docker buildx build -f Dockerfile.cpu \ --platform linux/amd64,linux/arm64 \ -t paddle-ocr-api:cpu \ --push . ``` ### GPU Image (x86_64 + ARM64 with local wheel) ```bash docker build -f Dockerfile.gpu -t paddle-ocr-api:gpu . ``` > **Note:** PaddlePaddle GPU 3.x packages are **not on PyPI**. The Dockerfile installs from PaddlePaddle's official CUDA index (`paddlepaddle.org.cn/packages/stable/cu126/`). This is handled automatically during build. ## Running ### CPU (Any machine) ```bash docker run -d -p 8000:8000 \ -v $(pwd)/../dataset:/app/dataset:ro \ -v paddlex-cache:/root/.paddlex \ paddle-ocr-api:cpu ``` ### GPU (NVIDIA) ```bash docker run -d -p 8000:8000 --gpus all \ -v $(pwd)/../dataset:/app/dataset:ro \ -v paddlex-cache:/root/.paddlex \ paddle-ocr-api:gpu ``` ## GPU Support Analysis ### Host System Reference (DGX Spark) This section documents GPU support findings based on testing on an NVIDIA DGX Spark: | Component | Value | |-----------|-------| | Architecture | ARM64 (aarch64) | | CPU | NVIDIA Grace (ARM) | | GPU | NVIDIA GB10 | | CUDA Version | 13.0 | | Driver | 580.95.05 | | OS | Ubuntu 24.04 LTS | | Container Toolkit | nvidia-container-toolkit 1.18.1 | | Docker | 28.5.1 | | Docker Compose | v2.40.0 | ### PaddlePaddle GPU Platform Support **Note:** PaddlePaddle-GPU does NOT have prebuilt ARM64 wheels on PyPI, but ARM64 support is available via custom-built wheels. | Platform | CPU | GPU | |----------|-----|-----| | Linux x86_64 | ✅ | ✅ CUDA 10.2/11.x/12.x | | Windows x64 | ✅ | ✅ CUDA 10.2/11.x/12.x | | macOS x64 | ✅ | ❌ | | macOS ARM64 (M1/M2) | ✅ | ❌ | | Linux ARM64 (Jetson/DGX) | ✅ | ⚠️ Limited - see Blackwell note | **Source:** [PaddlePaddle-GPU PyPI](https://pypi.org/project/paddlepaddle-gpu/) - only `manylinux_x86_64` and `win_amd64` wheels available on PyPI. ARM64 wheels must be built from source or downloaded from Gitea packages. ### ARM64 GPU Support ARM64 GPU support is available but requires custom-built wheels: 1. **No prebuilt PyPI wheels**: `pip install paddlepaddle-gpu` fails on ARM64 - no compatible wheels exist on PyPI 2. **Custom wheels work**: This project provides Dockerfiles to build ARM64 GPU wheels from source 3. 
**CI/CD builds ARM64 GPU images**: Pre-built wheels are available from Gitea packages **To use GPU on ARM64:** - Use the pre-built images from the container registry, or - Build the wheel locally using `Dockerfile.build-paddle` (see Option 2 below), or - Download the wheel from Gitea packages: `wheels/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl` ### ⚠️ Known Limitation: Blackwell GPU (sm_121 / GB10) **Status: GPU inference does NOT work on NVIDIA Blackwell GPUs (DGX Spark, GB200, etc.)** #### Symptoms When running PaddleOCR on Blackwell GPUs: - CUDA loads successfully ✅ - Basic tensor operations work ✅ - **Detection model outputs constant values** ❌ - 0 text regions detected - CER/WER = 100% (nothing recognized) #### Root Cause **Confirmed:** PaddlePaddle's entire CUDA backend does NOT support Blackwell (sm_121). This is NOT just an inference model problem - even basic operations fail. **Test Results (January 2026):** 1. **PTX JIT Test** (`CUDA_FORCE_PTX_JIT=1`): ``` OSError: CUDA error(209), no kernel image is available for execution on the device. [Hint: 'cudaErrorNoKernelImageForDevice'] ``` → Confirmed: No PTX code exists in PaddlePaddle binaries 2. **Dynamic Graph Mode Test** (bypassing inference models): ``` Conv2D + BatchNorm output: Output min: 0.0000 Output max: 0.0000 Output mean: 0.0000 Dynamic graph mode: BROKEN (constant output) ``` → Confirmed: Even simple nn.Conv2D produces zeros on Blackwell **Conclusion:** The issue is PaddlePaddle's compiled CUDA kernels (cubins), not just the inference models. The entire framework was compiled without sm_121 support and without PTX for JIT compilation. **Why building PaddlePaddle from source doesn't fix it:** 1. ⚠️ Building with `CUDA_ARCH=121` requires CUDA 13.0+ (PaddlePaddle only supports up to CUDA 12.6) 2. ❌ Even if you could build it, PaddleOCR models contain pre-compiled CUDA ops 3. ❌ These model files were exported/compiled targeting sm_80/sm_90 architectures 4. ❌ The model kernels execute on GPU but produce garbage output on sm_121 **To truly fix this**, the PaddlePaddle team would need to: 1. Add sm_121 to their model export pipeline 2. Re-export all PaddleOCR models (PP-OCRv4, PP-OCRv5, etc.) with Blackwell support 3. Release new model versions This is tracked in [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327). #### Debug Script Use the included debug script to verify this issue: ```bash docker exec paddle-ocr-gpu python /app/scripts/debug_gpu_detection.py /app/dataset/0/img/page_0001.png ``` Expected output showing the problem: ``` OUTPUT ANALYSIS: Shape: (1, 1, 640, 640) Min: 0.000010 Max: 0.000010 # <-- Same as min = constant output Mean: 0.000010 DIAGNOSIS: PROBLEM: Output is constant - model inference is broken! This typically indicates GPU compute capability mismatch. ``` #### Workarounds 1. **Use CPU mode** (recommended): ```bash docker compose up ocr-cpu ``` The ARM Grace CPU is fast (~2-5 sec/page). This is the reliable option. 2. **Use EasyOCR or DocTR with GPU**: These use PyTorch which has official ARM64 CUDA wheels (cu128 index): ```bash # EasyOCR with GPU on DGX Spark docker build -f ../easyocr_service/Dockerfile.gpu -t easyocr-gpu ../easyocr_service docker run --gpus all -p 8002:8000 easyocr-gpu ``` 3. **Wait for PaddlePaddle Blackwell support**: Track [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327) for updates. 
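#### Reproducing the Dynamic-Graph Failure

The dynamic-graph test from the Root Cause section can be reproduced with public `paddle` APIs alone. The following is an illustrative sketch, not the exact contents of `/app/scripts/test_dynamic_mode.py`:

```python
# Sketch of a Blackwell sanity check, similar in spirit to
# /app/scripts/test_dynamic_mode.py (the bundled script may differ).
import paddle

paddle.device.set_device("gpu")

conv = paddle.nn.Conv2D(in_channels=3, out_channels=64,
                        kernel_size=7, stride=2, padding=3)
bn = paddle.nn.BatchNorm2D(64)

x = paddle.randn([1, 3, 224, 224])
y = bn(conv(x))  # expected output shape: [1, 64, 112, 112]

# On a healthy GPU, min/max are nonzero and vary between runs.
# On Blackwell (sm_121), every value is exactly 0.0.
print("Output shape:", y.shape)
print("Output min: ", float(y.min()))
print("Output max: ", float(y.max()))
print("Output mean:", float(y.mean()))
```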
#### GPU Support Matrix (Updated) | GPU Architecture | Compute | CPU | GPU | |------------------|---------|-----|-----| | Ampere (A100, A10) | sm_80 | ✅ | ✅ | | Hopper (H100, H200) | sm_90 | ✅ | ✅ | | **Blackwell (GB10, GB200)** | sm_121 | ✅ | ❌ Not supported | #### FAQ: Why Doesn't CUDA Backward Compatibility Work? **Q: CUDA normally runs older kernels on newer GPUs. Why doesn't this work for Blackwell?** Per [NVIDIA Blackwell Compatibility Guide](https://docs.nvidia.com/cuda/blackwell-compatibility-guide/): CUDA **can** run older code on newer GPUs via **PTX JIT compilation**: 1. PTX (Parallel Thread Execution) is NVIDIA's intermediate representation 2. If an app includes PTX code, the driver JIT-compiles it for the target GPU 3. This allows sm_80 code to run on sm_121 **The problem**: PaddleOCR inference models contain only pre-compiled **cubins** (SASS binary), not PTX. Without PTX, there's nothing to JIT-compile. We tested PTX JIT (January 2026): ```bash # Force PTX JIT compilation docker run --gpus all -e CUDA_FORCE_PTX_JIT=1 paddle-ocr-gpu \ python /app/scripts/debug_gpu_detection.py /app/dataset/0/img/page_0001.png # Result: # OSError: CUDA error(209), no kernel image is available for execution on the device. ``` **Confirmed: No PTX exists** in PaddlePaddle binaries. The CUDA kernels are cubins-only (SASS binary), compiled for sm_80/sm_90 without PTX fallback. **Note on sm_121**: Per NVIDIA docs, "sm_121 is the same as sm_120 since the only difference is physically integrated CPU+GPU memory of Spark." The issue is general Blackwell (sm_12x) support, not Spark-specific. #### FAQ: Does Dynamic Graph Mode Work on Blackwell? **Q: Can I bypass inference models and use PaddlePaddle's dynamic graph mode?** **No.** We tested dynamic graph mode (January 2026): ```bash # Test script runs: paddle.nn.Conv2D + paddle.nn.BatchNorm2D python /app/scripts/test_dynamic_mode.py # Result: # Input shape: [1, 3, 224, 224] # Output shape: [1, 64, 112, 112] # Output min: 0.0000 # Output max: 0.0000 # <-- All zeros! # Output mean: 0.0000 # Dynamic graph mode: BROKEN (constant output) ``` **Conclusion:** The problem isn't limited to inference models. PaddlePaddle's core CUDA kernels (Conv2D, BatchNorm, etc.) produce garbage on sm_121. The entire framework lacks Blackwell support. #### FAQ: Can I Run AMD64 Containers on ARM64 DGX Spark? **Q: Can I just run the working x86_64 GPU image via emulation?** **Short answer: Yes for CPU, No for GPU.** You can run amd64 containers via QEMU emulation: ```bash # Install QEMU sudo apt-get install qemu binfmt-support qemu-user-static docker run --rm --privileged multiarch/qemu-user-static --reset -p yes # Run amd64 container docker run --platform linux/amd64 paddle-ocr-gpu:amd64 ... 
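# Note: user-mode QEMU emulates CPU instructions only; adding --gpus all to
# the emulated container will NOT expose the host GPU (see below)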
```

**But GPU doesn't work:**
- QEMU emulates CPU instructions (x86 → ARM)
- **QEMU user-mode does NOT support GPU passthrough**
- GPU calls from emulated x86 code cannot reach the ARM64 GPU

So even though the amd64 image works on real x86_64 hardware, running it under QEMU on ARM64 gives:
- ❌ No GPU access through QEMU
- ❌ CPU emulation is 10-100x slower than native ARM64
- ❌ Defeats the purpose entirely

| Approach | CPU | GPU | Speed |
|----------|-----|-----|-------|
| ARM64 native (CPU) | ✅ | N/A | Fast (~2-5s/page) |
| ARM64 native (GPU) | ✅ | ❌ Blackwell issue | - |
| AMD64 via QEMU | ⚠️ Works | ❌ No passthrough | 10-100x slower |

### Options for ARM64 Systems

#### Option 1: CPU-Only (Recommended)

Use `Dockerfile.cpu`, which works on ARM64:

```bash
# On DGX Spark
docker compose up ocr-cpu

# Or build directly
docker build -f Dockerfile.cpu -t paddle-ocr-api:cpu .
```

**Performance:** CPU inference on ARM64 Grace is surprisingly fast due to the high core count. Expect ~2-5 seconds per page.

#### Option 2: Build PaddlePaddle from Source (Docker-based)

Use the included Docker builder to compile PaddlePaddle GPU for ARM64:

```bash
cd src/paddle_ocr

# Step 1: Build the PaddlePaddle GPU wheel (one-time, 2-4 hours)
docker compose --profile build run --rm build-paddle

# Verify wheel was created
ls -la wheels/paddlepaddle*.whl

# Step 2: Build the GPU image (uses local wheel)
docker compose build ocr-gpu

# Step 3: Run with GPU
docker compose up ocr-gpu

# Verify GPU is working
docker compose exec ocr-gpu python -c "import paddle; print(paddle.device.is_compiled_with_cuda())"
```

**What this does:**
1. `build-paddle` compiles PaddlePaddle from source inside a CUDA container
2. The wheel is saved to the `./wheels/` directory
3. `Dockerfile.gpu` detects the local wheel and uses it instead of PyPI

**Caveats:**
- Build takes 2-4 hours on first run
- Requires ~20GB disk space during build
- Not officially supported by the PaddlePaddle team
- May need adjustments for future PaddlePaddle versions

See: [GitHub Issue #17327](https://github.com/PaddlePaddle/PaddleOCR/issues/17327)

#### Option 3: Alternative OCR Engines

For ARM64 GPU acceleration, consider alternatives:

| Engine | ARM64 GPU | Notes |
|--------|-----------|-------|
| **Tesseract** | ❌ CPU-only | Good fallback, widely available |
| **EasyOCR** | ⚠️ Via PyTorch | PyTorch has ARM64 GPU support |
| **TrOCR** | ⚠️ Via Transformers | Hugging Face Transformers + PyTorch |
| **docTR** | ⚠️ Via TensorFlow/PyTorch | Both backends have ARM64 support |

EasyOCR with PyTorch is a viable alternative:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install easyocr
```

### x86_64 GPU Setup (Working)

For x86_64 systems with an NVIDIA GPU, the GPU Docker image works:

```bash
# Verify GPU is accessible
nvidia-smi

# Verify Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi

# Build and run GPU version
docker compose up ocr-gpu
```

### GPU Docker Compose Configuration

The `docker-compose.yml` configures GPU access via:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

This requires Docker Compose v2 and nvidia-container-toolkit.

## DGX Spark / ARM64 Quick Start

For ARM64 systems (DGX Spark, Jetson, Graviton), use CPU-only:

```bash
cd src/paddle_ocr

# Build ARM64-native CPU image
docker build -f Dockerfile.cpu -t paddle-ocr-api:arm64 .
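# (optional) confirm the built image is ARM64-native
docker image inspect paddle-ocr-api:arm64 --format '{{.Architecture}}'  # prints arm64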
# Run
docker run -d -p 8000:8000 \
  -v $(pwd)/../dataset:/app/dataset:ro \
  paddle-ocr-api:arm64

# Test
curl http://localhost:8000/health
```

### Cross-Compile from x86_64

Build ARM64 images from an x86_64 machine:

```bash
# Setup buildx for multi-arch
docker buildx create --name mybuilder --use

# Build ARM64 image from x86_64 machine
docker buildx build -f Dockerfile.cpu \
  --platform linux/arm64 \
  -t paddle-ocr-api:arm64 \
  --load .

# Save and transfer to DGX Spark
docker save paddle-ocr-api:arm64 | gzip > paddle-ocr-arm64.tar.gz
scp paddle-ocr-arm64.tar.gz dgx-spark:~/

# On DGX Spark:
docker load < paddle-ocr-arm64.tar.gz
```

## Using with Ray Tune

### Multi-Worker Setup for Parallel Trials

Run multiple workers for parallel hyperparameter tuning:

```bash
cd src/paddle_ocr

# Start 2 CPU workers (ports 8001-8002)
sudo docker compose -f docker-compose.workers.yml --profile cpu up -d

# Or for GPU workers (if supported)
sudo docker compose -f docker-compose.workers.yml --profile gpu up -d

# Check workers are healthy
curl http://localhost:8001/health
curl http://localhost:8002/health
```

Then run the notebook with `max_concurrent_trials=2` to use both workers in parallel.

### Single Worker Setup

Update your notebook's `trainable_paddle_ocr` function:

```python
import requests
from ray import tune

API_URL = "http://localhost:8000/evaluate"

def trainable_paddle_ocr(config):
    """Call OCR API instead of subprocess."""
    payload = {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
    }
    try:
        response = requests.post(API_URL, json=payload, timeout=600)
        response.raise_for_status()
        metrics = response.json()
        tune.report(metrics)
    except Exception as e:
        tune.report({"CER": 1.0, "WER": 1.0, "ERROR": str(e)[:500]})
```

## Architecture: Model Lifecycle

The model is loaded **once** at container startup and stays in memory for all requests:

```mermaid
flowchart TB
    subgraph Container["Docker Container Lifecycle"]
        Start([Container Start]) --> Load["Load PaddleOCR Models<br/>~10-30s one-time cost"]
        Load --> Ready["API Ready<br/>Models in RAM ~500MB"]
        subgraph Requests["Incoming Requests - Models Stay Loaded"]
            Ready --> R1[Request 1] --> Ready
            Ready --> R2[Request 2] --> Ready
            Ready --> RN[Request N...] --> Ready
        end

        Ready --> Stop([Container Stop])
        Stop --> Free[Models Freed]
    end

    style Load fill:#f9f,stroke:#333
    style Ready fill:#9f9,stroke:#333
    style Requests fill:#e8f4ea,stroke:#090
```

**Subprocess vs REST API comparison:**

```mermaid
flowchart LR
    subgraph Subprocess["❌ Subprocess Approach"]
        direction TB
        S1[Trial 1] --> L1[Load Model ~10s]
        L1 --> E1[Evaluate ~60s]
        E1 --> U1[Unload]
        U1 --> S2[Trial 2]
        S2 --> L2[Load Model ~10s]
        L2 --> E2[Evaluate ~60s]
    end

    subgraph REST["✅ REST API Approach"]
        direction TB
        Start2[Start Container] --> Load2[Load Model ~10s]
        Load2 --> Ready2[Model in Memory]
        Ready2 --> T1[Trial 1 ~60s]
        T1 --> Ready2
        Ready2 --> T2[Trial 2 ~60s]
        T2 --> Ready2
        Ready2 --> TN[Trial N ~60s]
    end

    style L1 fill:#faa
    style L2 fill:#faa
    style Load2 fill:#afa
    style Ready2 fill:#afa
```

## Performance Comparison

| Approach | Model Load | Per-Trial Overhead | 64 Trials |
|----------|------------|-------------------|-----------|
| Subprocess (original) | Every trial (~10s) | ~10s | ~7 hours |
| Docker per trial | Every trial (~10s) | ~12-15s | ~7.5 hours |
| **REST API** | **Once** | **~0.1s** | **~5.8 hours** |

The REST API saves over an hour by loading the model only once.

## Troubleshooting

### Model download slow on first run

The first run downloads ~500MB of models. Use the `paddlex-cache` volume to persist them.

### Out of memory

Reduce `max_concurrent_trials` in Ray Tune, or increase container memory:

```bash
docker run --memory=8g ...
```

### GPU not detected

Ensure NVIDIA Container Toolkit is installed:

```bash
nvidia-smi  # Should work
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi  # Should work
```

### PaddlePaddle GPU installation fails

PaddlePaddle 3.x GPU packages are **not available on PyPI**. They must be installed from PaddlePaddle's official index:

```bash
# For CUDA 12.x
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CUDA 11.8
pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```

The Dockerfile.gpu handles this automatically.

## CI/CD Pipeline

The project includes a Gitea Actions workflow (`.gitea/workflows/ci.yaml`) for automated builds.

### What CI Builds

| Image | Architecture | Source |
|-------|--------------|--------|
| `paddle-ocr-cpu:amd64` | amd64 | PyPI paddlepaddle |
| `paddle-ocr-cpu:arm64` | arm64 | Pre-built wheel from Gitea packages |
| `paddle-ocr-gpu:amd64` | amd64 | PyPI paddlepaddle-gpu |
| `paddle-ocr-gpu:arm64` | arm64 | Pre-built wheel from Gitea packages |

### ARM64 Wheel Workflow

The PyPI wheels don't work on ARM64 (the source relies on x86 SSE intrinsics), so wheels must be built from source with sse2neon:

1. Built manually on an ARM64 machine (one-time)
2. Uploaded to Gitea generic packages
3. Downloaded by CI when building ARM64 images

#### Step 1: Build ARM64 Wheels (One-time, on ARM64 machine)

```bash
cd src/paddle_ocr

# Build GPU wheel (requires NVIDIA GPU, takes 1-2 hours)
sudo docker build -t paddle-builder:gpu-arm64 -f Dockerfile.build-paddle .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:gpu-arm64

# Build CPU wheel (no GPU required, takes 1-2 hours)
sudo docker build -t paddle-builder:cpu-arm64 -f Dockerfile.build-paddle-cpu .
sudo docker run --rm -v ./wheels:/wheels paddle-builder:cpu-arm64 # Verify wheels were created ls -la wheels/paddlepaddle*.whl # paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl (GPU) # paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl (CPU) ``` #### Step 2: Upload Wheels to Gitea Packages ```bash export GITEA_TOKEN="your-token-here" # Upload GPU wheel curl -X PUT \ -H "Authorization: token $GITEA_TOKEN" \ --upload-file wheels/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl \ "https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl" # Upload CPU wheel curl -X PUT \ -H "Authorization: token $GITEA_TOKEN" \ --upload-file wheels/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl \ "https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl" ``` Wheels available at: ``` https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-gpu-arm64/3.0.0/paddlepaddle_gpu-3.0.0-cp311-cp311-linux_aarch64.whl https://seryus.ddns.net/api/packages/unir/generic/paddlepaddle-cpu-arm64/3.0.0/paddlepaddle-3.0.0-cp311-cp311-linux_aarch64.whl ``` #### Step 3: CI Builds Images CI automatically: 1. Downloads ARM64 wheels from Gitea packages (for arm64 builds only) 2. Builds both CPU and GPU images for amd64 and arm64 3. Pushes to registry with arch-specific tags ### Required CI Secrets Configure these in Gitea repository settings: | Secret | Description | |--------|-------------| | `CI_READWRITE` | Gitea token with registry read/write access | ### Manual Image Push ```bash # Login to registry docker login seryus.ddns.net # Build and push CPU (multi-arch) docker buildx build -f Dockerfile.cpu \ --platform linux/amd64,linux/arm64 \ -t seryus.ddns.net/unir/paddle-ocr-api:cpu \ --push . # Build and push GPU (x86_64) docker build -f Dockerfile.gpu -t seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64 . docker push seryus.ddns.net/unir/paddle-ocr-api:gpu-amd64 # Build and push GPU (ARM64) - requires wheel in wheels/ docker buildx build -f Dockerfile.gpu \ --platform linux/arm64 \ -t seryus.ddns.net/unir/paddle-ocr-api:gpu-arm64 \ --push . ``` ### Updating the ARM64 Wheels When PaddlePaddle releases a new version: 1. Update `PADDLE_VERSION` in `Dockerfile.build-paddle` and `Dockerfile.build-paddle-cpu` 2. Rebuild both wheels on an ARM64 machine 3. Upload to Gitea packages with new version 4. Update `PADDLE_VERSION` in `.gitea/workflows/ci.yaml`
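For scripted version bumps, the upload in step 3 can also be done from Python. This is a small sketch equivalent to the `curl` commands in Step 2 above (the token handling and URL layout follow those examples; adjust `VERSION` for the new release):

```python
# Sketch: upload a built ARM64 wheel to Gitea generic packages.
# Equivalent to the curl upload in Step 2; VERSION must match the built wheel.
import os
from pathlib import Path

import requests

GITEA = "https://seryus.ddns.net/api/packages/unir/generic"
TOKEN = os.environ["GITEA_TOKEN"]
VERSION = "3.0.0"

wheel = Path(f"wheels/paddlepaddle_gpu-{VERSION}-cp311-cp311-linux_aarch64.whl")
url = f"{GITEA}/paddlepaddle-gpu-arm64/{VERSION}/{wheel.name}"

resp = requests.put(
    url,
    data=wheel.read_bytes(),
    headers={"Authorization": f"token {TOKEN}"},
)
resp.raise_for_status()
print(f"Uploaded {wheel.name} -> {url}")
```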