More docs on gpu for paddle
@@ -214,12 +214,33 @@ When running PaddleOCR on Blackwell GPUs:
#### Root Cause

PaddleOCR uses **pre-compiled inference models** (PP-OCRv4_mobile_det, PP-OCRv5_server_det, etc.) that contain embedded CUDA kernels. These kernels were compiled for older GPU architectures (sm_80 Ampere, sm_90 Hopper) and **do not support Blackwell (sm_121)**.

**Confirmed:** PaddlePaddle's entire CUDA backend does NOT support Blackwell (sm_121). This is not just an inference-model problem: even basic operations fail.

**Test Results (January 2026):**

1. **PTX JIT Test** (`CUDA_FORCE_PTX_JIT=1`):

   ```
   OSError: CUDA error(209), no kernel image is available for execution on the device.
   [Hint: 'cudaErrorNoKernelImageForDevice']
   ```

   → Confirmed: No PTX code exists in PaddlePaddle binaries

2. **Dynamic Graph Mode Test** (bypassing inference models):

   ```
   Conv2D + BatchNorm output:
   Output min: 0.0000
   Output max: 0.0000
   Output mean: 0.0000
   Dynamic graph mode: BROKEN (constant output)
   ```

   → Confirmed: Even simple nn.Conv2D produces zeros on Blackwell

**Conclusion:** The issue is PaddlePaddle's compiled CUDA kernels (cubins), not just the inference models. The entire framework was compiled without sm_121 support and without PTX for JIT compilation.
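
One way to check the cubin-only claim yourself is to run CUDA's `cuobjdump` binary utility against the PaddlePaddle shared libraries and see which `sm_*` targets appear. A minimal sketch of the parsing step; the sample listing is illustrative (modeled on `cuobjdump --list-elf --list-ptx` output), not captured from PaddlePaddle:

```python
import re

def embedded_arches(cuobjdump_listing: str) -> dict:
    """Summarize which sm targets a cuobjdump listing contains.

    Returns {'cubin': {...}, 'ptx': {...}}. Cubins only run on their
    compiled architecture family; PTX can be JIT-compiled for newer GPUs.
    """
    arches = {"cubin": set(), "ptx": set()}
    for line in cuobjdump_listing.splitlines():
        m = re.search(r"\.sm_(\d+)\.(cubin|ptx)", line)
        if m:
            arches[m.group(2)].add(int(m.group(1)))
    return arches

# Illustrative listing: sm_80/sm_90 cubins, no PTX entries at all,
# which is the failure pattern described above.
sample = """\
ELF file    1: libpaddle.1.sm_80.cubin
ELF file    2: libpaddle.2.sm_90.cubin
"""
print(embedded_arches(sample))
```

If the `ptx` set comes back empty, there is nothing the driver can JIT-compile for sm_121, matching the `cudaErrorNoKernelImageForDevice` result above.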

**Why building PaddlePaddle from source doesn't fix it:**

1. ⚠️ Building with `CUDA_ARCH=121` requires CUDA 13.0+ (PaddlePaddle only supports up to CUDA 12.6)
2. ❌ Even if you could build it, the PaddleOCR inference models (`.pdiparams`, `.pdmodel` files) contain pre-compiled CUDA ops
3. ❌ These model files were exported/compiled targeting sm_80/sm_90 architectures
4. ❌ The model kernels execute on GPU but produce garbage output on sm_121

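
The failure mode follows from CUDA's module-loading rules: roughly, a cubin loads only on devices with the same major compute capability (and equal-or-higher minor), while PTX for any older capability can be JIT-compiled for a newer device. A hypothetical checker encoding this simplified rule:

```python
def can_run(device_cc, cubins, ptx):
    """Simplified CUDA loading rule (approximation, not the full spec):
    - a cubin built for (major, minor) loads on devices of the same
      major version with device minor >= compiled minor;
    - PTX built for any capability <= the device's can be JIT-compiled.
    """
    major, minor = device_cc
    if any(cm == major and minor >= cn for (cm, cn) in cubins):
        return True
    return any((pm, pn) <= (major, minor) for (pm, pn) in ptx)

# PaddlePaddle's situation per the tests above: sm_80/sm_90 cubins,
# no PTX, on a Blackwell sm_121 (12.1) device.
print(can_run((12, 1), cubins={(8, 0), (9, 0)}, ptx=set()))     # False
# Had PTX been embedded, the driver's JIT path would have worked:
print(can_run((12, 1), cubins={(8, 0), (9, 0)}, ptx={(8, 0)}))  # True
```

This is why rebuilding the framework alone is not enough: the exported model files carry their own sm_80/sm_90 kernels, and without PTX there is no forward-compatibility path to sm_121.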
@@ -291,17 +312,39 @@ CUDA **can** run older code on newer GPUs via **PTX JIT compilation**:

**The problem**: PaddleOCR inference models contain only pre-compiled **cubins** (SASS binaries), not PTX. Without PTX, there is nothing to JIT-compile.

We tested PTX JIT (January 2026):

```bash
# Force PTX JIT compilation
docker run --gpus all -e CUDA_FORCE_PTX_JIT=1 paddle-ocr-gpu \
  python /app/scripts/debug_gpu_detection.py /app/dataset/0/img/page_0001.png

# Result:
# OSError: CUDA error(209), no kernel image is available for execution on the device.
```

**Confirmed: No PTX exists** in the PaddlePaddle binaries. The CUDA kernels are cubin-only (SASS binaries), compiled for sm_80/sm_90 with no PTX fallback.

**Note on sm_121**: Per NVIDIA docs, "sm_121 is the same as sm_120 since the only difference is physically integrated CPU+GPU memory of Spark." The issue is general Blackwell (sm_12x) support, not Spark-specific.

#### FAQ: Does Dynamic Graph Mode Work on Blackwell?

**Q: Can I bypass inference models and use PaddlePaddle's dynamic graph mode?**

**No.** We tested dynamic graph mode (January 2026):
```bash
# Test script runs: paddle.nn.Conv2D + paddle.nn.BatchNorm2D
python /app/scripts/test_dynamic_mode.py

# Result:
# Input shape: [1, 3, 224, 224]
# Output shape: [1, 64, 112, 112]
# Output min: 0.0000
# Output max: 0.0000   # <-- All zeros!
# Output mean: 0.0000
# Dynamic graph mode: BROKEN (constant output)
```

**Conclusion:** The problem isn't limited to inference models. PaddlePaddle's core CUDA kernels (Conv2D, BatchNorm, etc.) produce garbage on sm_121. The entire framework lacks Blackwell support.

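
The "constant output" signature is easy to check for mechanically. A small helper in the spirit of the test script (pure Python; the names and the tolerance are our own choices, not from the actual script):

```python
def output_looks_broken(values, tol=1e-12):
    """Flag the Blackwell failure signature: a layer whose output is
    (near-)constant, e.g. all zeros from Conv2D + BatchNorm on sm_121."""
    vals = [float(v) for v in values]
    return max(vals) - min(vals) <= tol

print(output_looks_broken([0.0] * 100))       # True: the broken sm_121 case
print(output_looks_broken([0.1, -0.4, 2.3]))  # False: healthy, varying output
```

A check like this (on the flattened output tensor) catches the silent-garbage case, which is worse than the hard `cudaErrorNoKernelImageForDevice` failure because nothing crashes.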
#### FAQ: Can I Run AMD64 Containers on ARM64 DGX Spark?

**Q: Can I just run the working x86_64 GPU image via emulation?**