raytune as docker

2026-01-19 16:32:45 +01:00
parent d67cbd4677
commit 94b25f9752
20 changed files with 7214 additions and 112 deletions
--- a/src/README.md
+++ b/src/README.md
@@ -1,74 +1,153 @@
-# Running Notebooks in Background
-
-## Quick: Check Ray Tune Progress
-
-```bash
-# Is papermill still running?
-ps aux | grep papermill | grep -v grep
-
-# View live log
-tail -f papermill.log
-
-# Find latest Ray Tune run and count completed trials
-LATEST=$(ls -td ~/ray_results/trainable_* 2>/dev/null | head -1)
-echo "Run: $LATEST"
-COMPLETED=$(find "$LATEST" -name "result.json" -size +0 2>/dev/null | wc -l)
-TOTAL=$(ls -d "$LATEST"/trainable_*/ 2>/dev/null | wc -l)
-echo "Completed: $COMPLETED / $TOTAL"
-
-# Check workers are healthy
-for port in 8001 8002 8003; do
-  status=$(curl -s "localhost:$port/health" 2>/dev/null | python3 -c "import sys,json; print(json.load(sys.stdin).get('status','down'))" 2>/dev/null || echo "down")
-  echo "Worker $port: $status"
-done
-
-# Show best result so far
-if [ "$COMPLETED" -gt 0 ]; then
-  find "$LATEST" -name "result.json" -size +0 -exec cat {} \; 2>/dev/null | \
-    python3 -c "import sys,json; results=[json.loads(l) for l in sys.stdin if l.strip()]; best=min(results,key=lambda x:x.get('CER',999)); print(f'Best CER: {best[\"CER\"]:.4f}, WER: {best[\"WER\"]:.4f}')" 2>/dev/null
-fi
-```
-
---
-
-## Option 1: Papermill (Recommended)
-
-Runs notebooks directly without conversion.
-
-```bash
-pip install papermill
-nohup papermill <notebook>.ipynb output.ipynb > papermill.log 2>&1 &
-```
-
-Monitor:
-```bash
-tail -f papermill.log
-```
-
-## Option 2: Convert to Python Script
-
-```bash
-jupyter nbconvert --to script <notebook>.ipynb
-nohup python <notebook>.py > output.log 2>&1 &
-```
-
-**Note:** `%pip install` magic commands need manual removal before running as `.py`
-
-## Important Notes
-
-- Ray Tune notebooks require the OCR service running first (Docker)
-- For Ray workers, imports must be inside trainable functions
-
-## Example: Ray Tune PaddleOCR
-
-```bash
-# 1. Start OCR service
-cd src/paddle_ocr && docker compose up -d ocr-cpu
-
-# 2. Run notebook with papermill
-cd src
-nohup papermill paddle_ocr_raytune_rest.ipynb output_raytune.ipynb > papermill.log 2>&1 &
-
-# 3. Monitor
-tail -f papermill.log
-```
+# OCR Hyperparameter Tuning with Ray Tune
+
+This directory contains the Docker setup for running automated hyperparameter optimization on OCR services using Ray Tune with Optuna.
+
+## Prerequisites
+
+- Docker with NVIDIA GPU support (`nvidia-container-toolkit`)
+- NVIDIA GPU with CUDA support
+- Docker Compose v2 (the `docker compose` plugin)
+
+## Quick Start
+
+```bash
+cd src
+
+# Start PaddleOCR service and run tuning (images pulled from registry)
+docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
+docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
+```
+
+## Available Services
+
+| Service | Port | Compose File |
+|---------|------|--------------|
+| PaddleOCR | 8002 | `docker-compose.tuning.paddle.yml` |
+| DocTR | 8003 | `docker-compose.tuning.doctr.yml` |
+| EasyOCR | 8002 | `docker-compose.tuning.easyocr.yml` |
+
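+**Note:** PaddleOCR and EasyOCR both bind host port 8002, so they cannot run at the same time. Stop one with `docker compose -f <compose-file> down` before starting the other.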
+
+## Usage Examples
+
+### PaddleOCR Tuning
+
+```bash
+# Start service
+docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
+
+# Wait until the health check passes (poll with curl)
+curl http://localhost:8002/health
+
+# Run tuning (64 trials)
+docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
+
+# Stop service
+docker compose -f docker-compose.tuning.paddle.yml down
+```
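+
+Instead of re-running `curl` by hand, a small polling loop can block until the service is ready. This is a minimal sketch: `wait_for_health` is a hypothetical shell function, not part of this repo, and it assumes the health response contains `"status": "ok"` as in the Troubleshooting example.
+
+```bash
+# Hypothetical helper (not in this repo): poll a health URL until the JSON
+# body reports "status": "ok".
+# Args: URL, max attempts (default 60), seconds between attempts (default 5).
+wait_for_health() {
+  local url="$1" tries="${2:-60}" delay="${3:-5}" i=1
+  while [ "$i" -le "$tries" ]; do
+    # -f: treat HTTP errors as failures; -sS: quiet except for errors.
+    if curl -fsS "$url" 2>/dev/null | grep -q '"status": *"ok"'; then
+      echo "ready: $url"
+      return 0
+    fi
+    sleep "$delay"
+    i=$((i + 1))
+  done
+  echo "timed out waiting for $url" >&2
+  return 1
+}
+
+# Usage: wait_for_health http://localhost:8002/health
+```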
+
+### DocTR Tuning
+
+```bash
+docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu
+curl http://localhost:8003/health
+docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --samples 64
+docker compose -f docker-compose.tuning.doctr.yml down
+```
+
+### EasyOCR Tuning
+
+```bash
+docker compose -f docker-compose.tuning.easyocr.yml up -d easyocr-gpu
+curl http://localhost:8002/health
+docker compose -f docker-compose.tuning.easyocr.yml run raytune --service easyocr --samples 64
+docker compose -f docker-compose.tuning.easyocr.yml down
+```
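+
+Because PaddleOCR and EasyOCR share port 8002, tuning both means running the per-service steps above one after the other. The sketch below chains them; `tune_sequentially` and the `SAMPLES` variable are hypothetical, and it relies on Compose v2's `up --wait`, which waits until the container is running, or healthy if the compose file defines a healthcheck for it.
+
+```bash
+# Hypothetical helper (not in this repo): tune one service at a time so that
+# services sharing a host port never run together.
+tune_sequentially() {
+  local svc container file
+  for svc in "$@"; do
+    # Map the --service name to the compose service defined for it.
+    case "$svc" in
+      paddle)  container=paddle-ocr-gpu ;;
+      doctr)   container=doctr-gpu ;;
+      easyocr) container=easyocr-gpu ;;
+      *) echo "unknown service: $svc" >&2; return 1 ;;
+    esac
+    file="docker-compose.tuning.$svc.yml"
+    # --wait blocks until the container is running (or healthy, if a
+    # healthcheck is defined); abort if it never gets there.
+    docker compose -f "$file" up -d --wait "$container" || return 1
+    # --rm removes the one-off raytune container when tuning finishes.
+    docker compose -f "$file" run --rm raytune --service "$svc" --samples "${SAMPLES:-64}"
+    # Always tear down so the next service can bind the port.
+    docker compose -f "$file" down
+  done
+}
+
+# Usage: SAMPLES=64 tune_sequentially paddle easyocr
+```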
+
+### Run Multiple Services (PaddleOCR + DocTR)
+
+```bash
+# Start both services
+docker compose -f docker-compose.tuning.yml up -d paddle-ocr-gpu doctr-gpu
+
+# Run tuning for each
+docker compose -f docker-compose.tuning.yml run raytune --service paddle --samples 64
+docker compose -f docker-compose.tuning.yml run raytune --service doctr --samples 64
+
+# Stop all
+docker compose -f docker-compose.tuning.yml down
+```
+
+## Command Line Options
+
+```bash
+docker compose -f <compose-file> run raytune --service <service> --samples <n>
+```
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--service` | OCR service: `paddle`, `doctr`, `easyocr` | Required |
+| `--samples` | Number of hyperparameter trials | 64 |
+
+## Output
+
+Results are saved to `src/results/` as CSV files:
+- `raytune_paddle_results_<timestamp>.csv`
+- `raytune_doctr_results_<timestamp>.csv`
+- `raytune_easyocr_results_<timestamp>.csv`
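+
+To pick the best configuration from a run, a short helper can print the row with the lowest error. This is a sketch under assumptions: `best_trial` is hypothetical, the CSV is assumed to have a header row with a `CER` column (pass another column name as the second argument), and hyperparameters are assumed to appear as `config/<name>` columns, the layout Ray Tune's `ResultGrid.get_dataframe()` produces.
+
+```bash
+# Hypothetical helper (not in this repo): print the trial with the lowest
+# value of a metric column. Assumes a header row, a metric column (default
+# CER) and hyperparameter columns prefixed with "config/".
+best_trial() {
+  python3 - "$1" "${2:-CER}" <<'EOF'
+import csv
+import sys
+
+path, metric = sys.argv[1], sys.argv[2]
+with open(path, newline="") as f:
+    # Skip rows where the metric is empty (e.g. failed trials).
+    rows = [r for r in csv.DictReader(f) if r.get(metric)]
+if not rows:
+    sys.exit(f"no '{metric}' values in {path}")
+best = min(rows, key=lambda r: float(r[metric]))
+params = {k: v for k, v in best.items() if k.startswith("config/")}
+print(f"{metric}={float(best[metric]):.4f} {params}")
+EOF
+}
+
+# Usage (from src/): best_trial "$(ls -t results/raytune_paddle_results_*.csv | head -1)"
+```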
+
+## Directory Structure
+
+```
+src/
+├── docker-compose.tuning.yml          # All services (PaddleOCR + DocTR)
+├── docker-compose.tuning.paddle.yml   # PaddleOCR only
+├── docker-compose.tuning.doctr.yml    # DocTR only
+├── docker-compose.tuning.easyocr.yml  # EasyOCR only
+├── raytune/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   ├── raytune_ocr.py
+│   └── run_tuning.py
+├── dataset/                           # Input images and ground truth
+├── results/                           # Output CSV files
+└── debugset/                          # Debug output
+```
+
+## Docker Images
+
+All images are pre-built and pulled from the `seryus.ddns.net` registry:
+- `seryus.ddns.net/unir/raytune:latest` - Ray Tune tuning service
+- `seryus.ddns.net/unir/paddle-ocr-gpu:latest` - PaddleOCR GPU
+- `seryus.ddns.net/unir/doctr-gpu:latest` - DocTR GPU
+- `seryus.ddns.net/unir/easyocr-gpu:latest` - EasyOCR GPU
+
+### Build locally (development)
+
+```bash
+# Build raytune image locally
+docker build -t seryus.ddns.net/unir/raytune:latest ./raytune
+```
+
+## Troubleshooting
+
+### Service not ready
+Wait for the health check to pass before running tuning:
+```bash
+# Check service health (PaddleOCR/EasyOCR: port 8002, DocTR: port 8003)
+curl http://localhost:8002/health
+# Expected: {"status": "ok", "model_loaded": true, ...}
+```
+
+### GPU not detected
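+Check that the GPU is visible on the host and from inside a container; the second command fails if `nvidia-container-toolkit` is missing or not configured for Docker:
+```bash
+nvidia-smi  # Should show your GPU
+docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi  # Should show the same GPU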
+```
+
+### Port already in use
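+Find out which container holds the port, then stop the OCR services that may be using it:
+```bash
+docker ps --filter "publish=8002"
+docker compose -f docker-compose.tuning.paddle.yml down
+docker compose -f docker-compose.tuning.easyocr.yml down
+docker compose -f docker-compose.tuning.yml down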
+```