Paddle ocr gpu support. #4
@@ -25,6 +25,7 @@ jobs:
       image_easyocr_gpu: seryus.ddns.net/unir/easyocr-gpu
       image_doctr: seryus.ddns.net/unir/doctr-cpu
       image_doctr_gpu: seryus.ddns.net/unir/doctr-gpu
+      image_raytune: seryus.ddns.net/unir/raytune
     steps:
       - name: Output version info
         run: |
@@ -205,3 +206,32 @@ jobs:
           tags: |
             ${{ needs.essential.outputs.image_doctr_gpu }}:${{ needs.essential.outputs.Version }}
             ${{ needs.essential.outputs.image_doctr_gpu }}:latest
+
+  # Ray Tune OCR image (amd64 only)
+  build_raytune:
+    runs-on: ubuntu-latest
+    needs: essential
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Login to Gitea Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ needs.essential.outputs.repo }}
+          username: username
+          password: ${{ secrets.CI_READWRITE }}
+
+      - name: Build and push Ray Tune image
+        uses: docker/build-push-action@v5
+        with:
+          context: src/raytune
+          file: src/raytune/Dockerfile
+          platforms: linux/amd64
+          push: true
+          tags: |
+            ${{ needs.essential.outputs.image_raytune }}:${{ needs.essential.outputs.Version }}
+            ${{ needs.essential.outputs.image_raytune }}:latest
20
README.md
@@ -18,11 +18,15 @@ Optimizar el rendimiento de PaddleOCR para documentos académicos en español me

 ## Resultados Principales

+**Tabla.** *Comparación de métricas OCR entre configuración baseline y optimizada.*
+
 | Modelo | CER | Precisión Caracteres | WER | Precisión Palabras |
 |--------|-----|---------------------|-----|-------------------|
 | PaddleOCR (Baseline) | 7.78% | 92.22% | 14.94% | 85.06% |
 | **PaddleOCR-HyperAdjust** | **1.49%** | **98.51%** | **7.62%** | **92.38%** |

+*Fuente: Elaboración propia.*
+
 **Mejora obtenida:** Reducción del CER en un **80.9%**

 ### Configuración Óptima Encontrada
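Review note: the relative improvement quoted in this hunk can be recomputed from the two table rows. The sketch below is illustrative only (the function name is not from the repository). With the rounded CER values printed in the table it yields roughly 80.8%, so the reported 80.9% was presumably derived from unrounded measurements.

```python
def relative_reduction(baseline: float, optimized: float) -> float:
    """Return how much `optimized` improves on `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100.0


# Values copied from the README table (already rounded to two decimals).
cer_reduction = relative_reduction(7.78, 1.49)
wer_reduction = relative_reduction(14.94, 7.62)

print(f"CER reduction: {cer_reduction:.2f}%")
print(f"WER reduction: {wer_reduction:.2f}%")
```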
@@ -56,6 +60,8 @@ PDF (académico UNIR)

 ### Experimento de Optimización

+**Tabla.** *Parámetros de configuración del experimento Ray Tune.*
+
 | Parámetro | Valor |
 |-----------|-------|
 | Número de trials | 64 |
@@ -64,6 +70,8 @@ PDF (académico UNIR)
 | Trials concurrentes | 2 |
 | Tiempo total | ~6 horas (CPU) |

+*Fuente: Elaboración propia.*
+
 ---

 ## Estructura del Repositorio
@@ -143,16 +151,20 @@ Se realizó una validación adicional con aceleración GPU para evaluar la viabi

 ## Requisitos

+**Tabla.** *Dependencias principales del proyecto y versiones utilizadas.*
+
 | Componente | Versión |
 |------------|---------|
-| Python | 3.11.9 |
+| Python | 3.12.3 |
 | PaddlePaddle | 3.2.2 |
 | PaddleOCR | 3.3.2 |
 | Ray | 2.52.1 |
-| Optuna | 4.6.0 |
+| Optuna | 4.7.0 |
 | jiwer | (para métricas CER/WER) |
 | PyMuPDF | (para conversión PDF) |

+*Fuente: Elaboración propia.*
+
 ---

 ## Uso
@@ -262,11 +274,15 @@ python3 apply_content.py

 ### Archivos de Entrada y Salida

+**Tabla.** *Relación de scripts de generación con sus archivos de entrada y salida.*
+
 | Script | Entrada | Salida |
 |--------|---------|--------|
 | `generate_mermaid_figures.py` | `docs/*.md` (bloques ```mermaid```) | `thesis_output/figures/figura_*.png`, `figures_manifest.json` |
 | `apply_content.py` | `instructions/plantilla_individual.htm`, `docs/*.md`, `thesis_output/figures/*.png` | `thesis_output/plantilla_individual.htm` |

+*Fuente: Elaboración propia.*
+
 ### Contenido Generado Automáticamente

 - **30 tablas** con formato APA (Tabla X. *Título* + Fuente: ...)
@@ -6,7 +6,8 @@ import os
 from bs4 import BeautifulSoup, NavigableString

 BASE_DIR = os.path.dirname(os.path.abspath(__file__))
-TEMPLATE = os.path.join(BASE_DIR, 'thesis_output/plantilla_individual.htm')
+TEMPLATE_INPUT = os.path.join(BASE_DIR, 'instructions/plantilla_individual.htm')
+TEMPLATE_OUTPUT = os.path.join(BASE_DIR, 'thesis_output/plantilla_individual.htm')
 DOCS_DIR = os.path.join(BASE_DIR, 'docs')

 # Global counters for tables and figures
@@ -365,7 +366,7 @@ def main():
     global table_counter, figure_counter

     print("Reading template...")
-    html_content = read_file(TEMPLATE)
+    html_content = read_file(TEMPLATE_INPUT)
     soup = BeautifulSoup(html_content, 'html.parser')

     print("Reading docs content...")
@@ -595,9 +596,9 @@ def main():

     print("Saving modified template...")
     output_html = str(soup)
-    write_file(TEMPLATE, output_html)
+    write_file(TEMPLATE_OUTPUT, output_html)

-    print(f"✓ Done! Modified: {TEMPLATE}")
+    print(f"✓ Done! Modified: {TEMPLATE_OUTPUT}")
     print("\nTo convert to DOCX:")
     print("1. Open the .htm file in Microsoft Word")
     print("2. Replace [Insertar diagrama Mermaid aquí] placeholders with actual diagrams")
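Review note: before this change the script read and overwrote the same `thesis_output/plantilla_individual.htm`, so a second run would start from an already-modified template. Separating input and output presumably makes the script safe to re-run. The sketch below illustrates that property with a stand-in function; `render_template`, the `{{BODY}}` marker, and the temporary paths are hypothetical and not part of `apply_content.py`.

```python
import tempfile
from pathlib import Path


def render_template(template_input: Path, template_output: Path,
                    marker: str, content: str) -> str:
    """Read the pristine template, substitute `marker`, write the result elsewhere."""
    html = template_input.read_text(encoding="utf-8")
    rendered = html.replace(marker, content)
    template_output.parent.mkdir(parents=True, exist_ok=True)
    template_output.write_text(rendered, encoding="utf-8")
    return rendered


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    src = base / "instructions" / "plantilla_individual.htm"
    dst = base / "thesis_output" / "plantilla_individual.htm"
    src.parent.mkdir(parents=True)
    src.write_text("<p>{{BODY}}</p>", encoding="utf-8")

    # Running twice gives the same output because the input is never overwritten.
    first = render_template(src, dst, "{{BODY}}", "hello")
    second = render_template(src, dst, "{{BODY}}", "hello")
    input_unchanged = src.read_text(encoding="utf-8") == "<p>{{BODY}}</p>"
    output_text = dst.read_text(encoding="utf-8")
```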
@@ -18,6 +18,8 @@ El procesamiento de documentos en español presenta particularidades que complic

 La Tabla 1 resume los principales desafíos lingüísticos del OCR en español:

+**Tabla 1.** *Desafíos lingüísticos específicos del OCR en español.*
+
 | Desafío | Descripción | Impacto en OCR |
 |---------|-------------|----------------|
 | Caracteres especiales | ñ, á, é, í, ó, ú, ü, ¿, ¡ | Confusión con caracteres similares (n/ñ, a/á) |
@@ -25,7 +27,7 @@ La Tabla 1 resume los principales desafíos lingüísticos del OCR en español:
 | Abreviaturas | Dr., Sra., Ud., etc. | Puntos internos confunden segmentación |
 | Nombres propios | Tildes en apellidos (García, Martínez) | Bases de datos sin soporte Unicode |

-*Tabla 1. Desafíos lingüísticos específicos del OCR en español. Fuente: Elaboración propia.*
+*Fuente: Elaboración propia.*

 Además de los aspectos lingüísticos, los documentos académicos y administrativos en español presentan características tipográficas que complican el reconocimiento: variaciones en fuentes entre encabezados, cuerpo y notas al pie; presencia de tablas con bordes y celdas; logotipos institucionales; marcas de agua; y elementos gráficos como firmas o sellos. Estos elementos generan ruido que puede propagarse en aplicaciones downstream como la extracción de entidades nombradas o el análisis semántico.

@@ -37,6 +39,8 @@ La adaptación de modelos preentrenados a dominios específicos típicamente req

 La Tabla 2 ilustra los requisitos típicos para diferentes estrategias de mejora de OCR:

+**Tabla 2.** *Comparación de estrategias de mejora de modelos OCR.*
+
 | Estrategia | Datos requeridos | Hardware | Tiempo | Expertise |
 |------------|------------------|----------|--------|-----------|
 | Fine-tuning completo | >10,000 imágenes etiquetadas | GPU (≥16GB VRAM) | Días-Semanas | Alto |
@@ -44,7 +48,7 @@ La Tabla 2 ilustra los requisitos típicos para diferentes estrategias de mejora
 | Transfer learning | >500 imágenes etiquetadas | GPU (≥8GB VRAM) | Horas | Medio |
 | **Optimización de hiperparámetros** | **<100 imágenes de validación** | **CPU suficiente** | **Horas** | **Bajo-Medio** |

-*Tabla 2. Comparación de estrategias de mejora de modelos OCR. Fuente: Elaboración propia.*
+*Fuente: Elaboración propia.*

 ### La oportunidad: optimización sin fine-tuning

@@ -88,6 +92,8 @@ Una solución técnicamente superior pero impracticable tiene valor limitado. Es

 Este trabajo se centra específicamente en:

+**Tabla 3.** *Delimitación del alcance del trabajo.*
+
 | Aspecto | Dentro del alcance | Fuera del alcance |
 |---------|-------------------|-------------------|
 | **Tipo de documento** | Documentos académicos digitales (PDF) | Documentos escaneados, manuscritos |
@@ -96,7 +102,7 @@ Este trabajo se centra específicamente en:
 | **Método de mejora** | Optimización de hiperparámetros | Fine-tuning, aumento de datos |
 | **Hardware** | Ejecución en CPU | Aceleración GPU |

-*Tabla 3. Delimitación del alcance del trabajo. Fuente: Elaboración propia.*
+*Fuente: Elaboración propia.*

 ### Relevancia y beneficiarios

@@ -8,6 +8,8 @@ Este capítulo establece los objetivos del trabajo siguiendo la metodología SMA

 ### Justificación SMART del Objetivo General

+**Tabla 4.** *Justificación SMART del objetivo general.*
+
 | Criterio | Cumplimiento |
 |----------|--------------|
 | **Específico (S)** | Se define claramente qué se quiere lograr: optimizar PaddleOCR mediante ajuste de hiperparámetros para documentos en español |
@@ -16,6 +18,8 @@ Este capítulo establece los objetivos del trabajo siguiendo la metodología SMA
 | **Relevante (R)** | El impacto es demostrable: mejora la extracción de texto en documentos académicos sin costes adicionales de infraestructura |
 | **Temporal (T)** | El plazo es un cuatrimestre, correspondiente al TFM |

+*Fuente: Elaboración propia.*
+
 ## Objetivos específicos

 ### OE1: Comparar soluciones OCR de código abierto
@@ -115,12 +119,16 @@ class ImageTextDataset:

 #### Modelos Evaluados

+**Tabla 5.** *Modelos OCR evaluados en el benchmark inicial.*
+
 | Modelo | Versión | Configuración |
 |--------|---------|---------------|
 | EasyOCR | - | Idiomas: ['es', 'en'] |
 | PaddleOCR | PP-OCRv5 | Modelos server_det + server_rec |
 | DocTR | - | db_resnet50 + sar_resnet31 |

+*Fuente: Elaboración propia.*
+
 #### Métricas de Evaluación

 Se utilizó la biblioteca `jiwer` para calcular:
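Review note: the thesis computes CER and WER with `jiwer`. For readers without that dependency, the sketch below shows the underlying definition (edit distance divided by reference length) in plain Python. It is a stand-in, not the project's `evaluate_text`, and `jiwer` may apply text normalization that this version does not.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, prediction: str) -> float:
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, prediction) / len(reference)


def wer(reference: str, prediction: str) -> float:
    """Word error rate: word-level edits per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


# Dropped diacritics, the kind of confusion Table 1 describes (n/ñ, o/ó).
sample_cer = cer("el niño comió", "el nino comio")
sample_wer = wer("el niño comió", "el nino comio")
print(f"CER={sample_cer:.4f} WER={sample_wer:.4f}")
```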
@@ -139,6 +147,8 @@ def evaluate_text(reference, prediction):

 #### Hiperparámetros Seleccionados

+**Tabla 6.** *Hiperparámetros seleccionados para optimización.*
+
 | Parámetro | Tipo | Rango/Valores | Descripción |
 |-----------|------|---------------|-------------|
 | `use_doc_orientation_classify` | Booleano | [True, False] | Clasificación de orientación del documento |
@@ -149,6 +159,8 @@ def evaluate_text(reference, prediction):
 | `text_det_unclip_ratio` | Fijo | 0.0 | Coeficiente de expansión (fijado) |
 | `text_rec_score_thresh` | Continuo | [0.0, 0.7] | Umbral de confianza de reconocimiento |

+*Fuente: Elaboración propia.*
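Review note: to make the three parameter kinds in Table 6 concrete (boolean choice, continuous range, fixed value), here is a dependency-free sketch that draws trial configurations from the rows visible in this diff. It uses plain random sampling as a stand-in; the project itself uses Ray Tune with Optuna, whose search strategy is not reproduced here. The `SEARCH_SPACE` layout and `sample_config` helper are hypothetical, and the rows elided from the hunk are omitted.

```python
import random

# Only the rows visible in the diff; each entry is (kind, specification).
SEARCH_SPACE = {
    "use_doc_orientation_classify": ("choice", [True, False]),
    "text_det_unclip_ratio": ("fixed", 0.0),
    "text_rec_score_thresh": ("uniform", (0.0, 0.7)),
}


def sample_config(space: dict, rng: random.Random) -> dict:
    """Draw one configuration from the search space."""
    config = {}
    for name, (kind, spec) in space.items():
        if kind == "choice":
            config[name] = rng.choice(spec)
        elif kind == "uniform":
            low, high = spec
            config[name] = rng.uniform(low, high)
        elif kind == "fixed":
            config[name] = spec
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
    return config


rng = random.Random(0)  # seeded so the draw is reproducible
trials = [sample_config(SEARCH_SPACE, rng) for _ in range(64)]
print(trials[0])
```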
+
 #### Configuración de Ray Tune

 ```python
@@ -235,23 +247,31 @@ Y retorna métricas en formato JSON:

 #### Hardware

+**Tabla 7.** *Especificaciones de hardware del entorno de desarrollo.*
+
 | Componente | Especificación |
 |------------|----------------|
-| CPU | Intel Core (especificar modelo) |
-| RAM | 16 GB |
-| GPU | No disponible (ejecución en CPU) |
+| CPU | AMD Ryzen 7 5800H |
+| RAM | 16 GB DDR4 |
+| GPU | NVIDIA RTX 3060 Laptop (5.66 GB VRAM) |
 | Almacenamiento | SSD |

+*Fuente: Elaboración propia.*
+
 #### Software

+**Tabla 8.** *Versiones de software utilizadas.*
+
 | Componente | Versión |
 |------------|---------|
-| Sistema Operativo | Windows 10/11 |
-| Python | 3.11.9 |
+| Sistema Operativo | Ubuntu 24.04.3 LTS |
+| Python | 3.12.3 |
 | PaddleOCR | 3.3.2 |
 | PaddlePaddle | 3.2.2 |
 | Ray | 2.52.1 |
-| Optuna | 4.6.0 |
+| Optuna | 4.7.0 |

+*Fuente: Elaboración propia.*
+
 ### Limitaciones Metodológicas

@@ -34,6 +34,11 @@ Se seleccionaron tres soluciones OCR de código abierto representativas del esta

 *Fuente: Elaboración propia.*

+**Imágenes Docker disponibles en el registro del proyecto:**
+- PaddleOCR: `seryus.ddns.net/unir/paddle-ocr-gpu`, `seryus.ddns.net/unir/paddle-ocr-cpu`
+- EasyOCR: `seryus.ddns.net/unir/easyocr-gpu`
+- DocTR: `seryus.ddns.net/unir/doctr-gpu`
+
 ### Criterios de Éxito

 Los criterios establecidos para evaluar las soluciones fueron:
@@ -322,7 +327,7 @@ Esta sección ha presentado:

 ### Introducción

-Esta sección describe el proceso de optimización de hiperparámetros de PaddleOCR utilizando Ray Tune con el algoritmo de búsqueda Optuna. Los experimentos fueron implementados en el notebook `src/paddle_ocr_fine_tune_unir_raytune.ipynb` y los resultados se almacenaron en `src/raytune_paddle_subproc_results_20251207_192320.csv`.
+Esta sección describe el proceso de optimización de hiperparámetros de PaddleOCR utilizando Ray Tune con el algoritmo de búsqueda Optuna. Los experimentos fueron implementados en [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py) con la librería de utilidades [`src/raytune_ocr.py`](https://github.com/seryus/MastersThesis/blob/main/src/raytune_ocr.py), y los resultados se almacenaron en [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results).

 La optimización de hiperparámetros representa una alternativa al fine-tuning tradicional que no requiere:
 - Acceso a GPU dedicada
@@ -339,17 +344,17 @@ El experimento se ejecutó en el siguiente entorno:

 | Componente | Versión/Especificación |
 |------------|------------------------|
-| Sistema operativo | Windows 10/11 |
-| Python | 3.11.9 |
+| Sistema operativo | Ubuntu 24.04.3 LTS |
+| Python | 3.12.3 |
 | PaddlePaddle | 3.2.2 |
 | PaddleOCR | 3.3.2 |
 | Ray | 2.52.1 |
-| Optuna | 4.6.0 |
-| CPU | Intel Core (multinúcleo) |
-| RAM | 16 GB |
-| GPU | No disponible (ejecución CPU) |
+| Optuna | 4.7.0 |
+| CPU | AMD Ryzen 7 5800H |
+| RAM | 16 GB DDR4 |
+| GPU | NVIDIA RTX 3060 Laptop (5.66 GB VRAM) |

-*Fuente: Outputs del notebook `src/paddle_ocr_fine_tune_unir_raytune.ipynb`.*
+*Fuente: Configuración del entorno de ejecución. Resultados en `src/results/` generados por `src/run_tuning.py`.*

 #### Arquitectura de Ejecución

@@ -613,7 +618,7 @@ Configuración óptima:
 | text_det_unclip_ratio | 0.0 | 1.5 | -1.5 (fijado) |
 | text_rec_score_thresh | **0.6350** | 0.5 | +0.135 |

-*Fuente: Análisis del notebook.*
+*Fuente: Análisis de [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generados por [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*

 #### Análisis de Correlación

@@ -628,7 +633,7 @@ Se calculó la correlación de Pearson entre los parámetros continuos y las mé
 | `text_rec_score_thresh` | -0.161 | Correlación débil negativa |
 | `text_det_unclip_ratio` | NaN | Varianza cero (valor fijo) |

-*Fuente: Análisis del notebook.*
+*Fuente: Análisis de [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generados por [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*

 **Tabla 24.** *Correlación de parámetros con WER.*

@@ -638,7 +643,7 @@ Se calculó la correlación de Pearson entre los parámetros continuos y las mé
 | `text_det_box_thresh` | +0.227 | Correlación débil positiva |
 | `text_rec_score_thresh` | -0.173 | Correlación débil negativa |

-*Fuente: Análisis del notebook.*
+*Fuente: Análisis de [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generados por [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*

 **Hallazgo clave**: El parámetro `text_det_thresh` muestra la correlación más fuerte (-0.52 con ambas métricas), indicando que valores más altos de este umbral tienden a reducir el error. Este umbral controla qué píxeles se consideran "texto" en el mapa de probabilidad del detector.

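Review note: the correlation tables above report `NaN` for `text_det_unclip_ratio` because a parameter held at a fixed value has zero variance, which makes the Pearson coefficient undefined. The sketch below shows that behaviour in plain Python. The sample numbers are made up for illustration and are not taken from the trial results.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient; NaN when either series has zero variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined: division by a zero standard deviation
    return cov / math.sqrt(var_x * var_y)


# Illustrative data only: a threshold that varies and a parameter fixed at 0.0.
thresholds = [0.1, 0.3, 0.5, 0.7]
cer_values = [0.12, 0.09, 0.05, 0.02]
fixed_param = [0.0, 0.0, 0.0, 0.0]

r_threshold = pearson(thresholds, cer_values)
r_fixed = pearson(fixed_param, cer_values)
print(r_threshold, r_fixed)
```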
@@ -653,7 +658,7 @@ El parámetro booleano `textline_orientation` demostró tener el mayor impacto e
 | True | 3.76% | 7.12% | 12.73% | 32 |
 | False | 12.40% | 14.93% | 21.71% | 32 |

-*Fuente: Análisis del notebook.*
+*Fuente: Análisis de [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generados por [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*

 **Interpretación:**

@@ -741,7 +746,7 @@ optimized_config = {
 | PaddleOCR (Baseline) | 7.78% | 92.22% | 14.94% | 85.06% |
 | PaddleOCR-HyperAdjust | **1.49%** | **98.51%** | **7.62%** | **92.38%** |

-*Fuente: Ejecución final en notebook `src/paddle_ocr_fine_tune_unir_raytune.ipynb`.*
+*Fuente: Validación final. Código en [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py), resultados en [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results).*

 #### Métricas de Mejora

@@ -823,9 +828,9 @@ Esta sección ha presentado:
 4. **Mejora final**: CER reducido de 7.78% a 1.49% (reducción del 80.9%)

 **Fuentes de datos:**
-- `src/paddle_ocr_fine_tune_unir_raytune.ipynb`: Código del experimento
-- `src/raytune_paddle_subproc_results_20251207_192320.csv`: Resultados de 64 trials
-- `src/paddle_ocr_tuning.py`: Script de evaluación
+- [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py): Script principal de optimización
+- [`src/raytune_ocr.py`](https://github.com/seryus/MastersThesis/blob/main/src/raytune_ocr.py): Librería de utilidades Ray Tune
+- [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results): Resultados CSV de los trials

 ## Discusión y análisis de resultados

@@ -1066,8 +1071,13 @@ Este capítulo ha presentado el desarrollo completo de la contribución:
 **Resultado principal**: Se logró alcanzar el objetivo de CER < 2% mediante optimización de hiperparámetros, sin requerir fine-tuning ni recursos GPU.

 **Fuentes de datos:**
-- `src/raytune_paddle_subproc_results_20251207_192320.csv`: Resultados de 64 trials
-- `src/paddle_ocr_fine_tune_unir_raytune.ipynb`: Notebook principal del experimento
+- [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py): Script principal de optimización
+- [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results): Resultados CSV de los trials
+
+**Imágenes Docker:**
+- `seryus.ddns.net/unir/paddle-ocr-gpu`: PaddleOCR con soporte GPU
+- `seryus.ddns.net/unir/easyocr-gpu`: EasyOCR con soporte GPU
+- `seryus.ddns.net/unir/doctr-gpu`: DocTR con soporte GPU

 ### Validación con Aceleración GPU

@@ -10,10 +10,14 @@ Este Trabajo Fin de Máster ha demostrado que es posible mejorar significativame

 El objetivo principal del trabajo era alcanzar un CER inferior al 2% en documentos académicos en español. Los resultados obtenidos confirman el cumplimiento de este objetivo:

+**Tabla 39.** *Cumplimiento del objetivo de CER.*
+
 | Métrica | Objetivo | Resultado |
 |---------|----------|-----------|
 | CER | < 2% | **1.49%** |

+*Fuente: Elaboración propia.*
+
 ### Conclusiones Específicas

 **Respecto a OE1 (Comparativa de soluciones OCR)**:
@@ -48,6 +48,8 @@ MastersThesis/

 ### Sistema de Desarrollo

+**Tabla A1.** *Especificaciones del sistema de desarrollo.*
+
 | Componente | Especificación |
 |------------|----------------|
 | Sistema Operativo | Ubuntu 24.04.3 LTS |
@@ -56,20 +58,30 @@ MastersThesis/
 | GPU | NVIDIA RTX 3060 Laptop (5.66 GB VRAM) |
 | CUDA | 12.4 |

+*Fuente: Elaboración propia.*
+
 ### Dependencias

+**Tabla A2.** *Dependencias del proyecto.*
+
 | Componente | Versión |
 |------------|---------|
-| Python | 3.11 |
-| Docker | 24+ |
+| Python | 3.12.3 |
+| Docker | 29.1.5 |
 | NVIDIA Container Toolkit | Requerido para GPU |
-| Ray | 2.52+ |
-| Optuna | 4.6+ |
+| Ray | 2.52.1 |
+| Optuna | 4.7.0 |

+*Fuente: Elaboración propia.*
+
 ## A.4 Instrucciones de Ejecución de Servicios OCR

 ### PaddleOCR (Puerto 8002)

+**Imágenes Docker:**
+- GPU: `seryus.ddns.net/unir/paddle-ocr-gpu`
+- CPU: `seryus.ddns.net/unir/paddle-ocr-cpu`
+
 ```bash
 cd src/paddle_ocr

@@ -82,6 +94,8 @@ docker compose -f docker-compose.cpu-registry.yml up -d

 ### DocTR (Puerto 8003)

+**Imagen Docker:** `seryus.ddns.net/unir/doctr-gpu`
+
 ```bash
 cd src/doctr_service

@@ -91,6 +105,8 @@ docker compose up -d

 ### EasyOCR (Puerto 8002)

+**Imagen Docker:** `seryus.ddns.net/unir/easyocr-gpu`
+
 ```bash
 cd src/easyocr_service

@@ -165,29 +181,37 @@ analyze_results(results, prefix='raytune_paddle', config_keys=PADDLE_OCR_CONFIG_

 ### Servicios y Puertos

+**Tabla A3.** *Servicios Docker y puertos.*
+
 | Servicio | Puerto | Script de Ajuste |
 |----------|--------|------------------|
 | PaddleOCR | 8002 | `paddle_ocr_payload` |
 | DocTR | 8003 | `doctr_payload` |
 | EasyOCR | 8002 | `easyocr_payload` |

+*Fuente: Elaboración propia.*
+
 ## A.7 Métricas de Rendimiento

 Los resultados detallados de las evaluaciones y ajustes de hiperparámetros se encuentran en:

 - [Métricas Generales](metrics/metrics.md) - Comparativa de los tres servicios
-- [PaddleOCR](metrics/metrics_paddle.md) - Mejor precisión (7.72% CER)
+- [PaddleOCR](metrics/metrics_paddle.md) - Mejor precisión (7.76% CER baseline, **1.49% optimizado**)
 - [DocTR](metrics/metrics_doctr.md) - Más rápido (0.50s/página)
 - [EasyOCR](metrics/metrics_easyocr.md) - Balance intermedio

 ### Resumen de Resultados

+**Tabla A4.** *Resumen de resultados del benchmark por servicio.*
+
 | Servicio | CER Base | CER Ajustado | Mejora |
 |----------|----------|--------------|--------|
 | **PaddleOCR** | 8.85% | **7.72%** | 12.8% |
 | DocTR | 12.06% | 12.07% | 0% |
 | EasyOCR | 11.23% | 11.14% | 0.8% |

+*Fuente: Elaboración propia.*
+
 ## A.8 Licencia

 El código se distribuye bajo licencia MIT.
215
src/README.md
@@ -1,74 +1,153 @@
-# Running Notebooks in Background
+# OCR Hyperparameter Tuning with Ray Tune

-## Quick: Check Ray Tune Progress
+This directory contains the Docker setup for running automated hyperparameter optimization on OCR services using Ray Tune with Optuna.
+
+## Prerequisites
+
+- Docker with NVIDIA GPU support (`nvidia-container-toolkit`)
+- NVIDIA GPU with CUDA support
+
+## Quick Start

 ```bash
-# Is papermill still running?
-ps aux | grep papermill | grep -v grep
-
-# View live log
-tail -f papermill.log
-
-# Find latest Ray Tune run and count completed trials
-LATEST=$(ls -td ~/ray_results/trainable_* 2>/dev/null | head -1)
-echo "Run: $LATEST"
-COMPLETED=$(find "$LATEST" -name "result.json" -size +0 2>/dev/null | wc -l)
-TOTAL=$(ls -d "$LATEST"/trainable_*/ 2>/dev/null | wc -l)
-echo "Completed: $COMPLETED / $TOTAL"
-
-# Check workers are healthy
-for port in 8001 8002 8003; do
-status=$(curl -s "localhost:$port/health" 2>/dev/null | python3 -c "import sys,json; print(json.load(sys.stdin).get('status','down'))" 2>/dev/null || echo "down")
-echo "Worker $port: $status"
-done
-
-# Show best result so far
-if [ "$COMPLETED" -gt 0 ]; then
-find "$LATEST" -name "result.json" -size +0 -exec cat {} \; 2>/dev/null | \
-python3 -c "import sys,json; results=[json.loads(l) for l in sys.stdin if l.strip()]; best=min(results,key=lambda x:x.get('CER',999)); print(f'Best CER: {best[\"CER\"]:.4f}, WER: {best[\"WER\"]:.4f}')" 2>/dev/null
-fi
-```
-
----
-
-## Option 1: Papermill (Recommended)
-
-Runs notebooks directly without conversion.
-
-```bash
-pip install papermill
-nohup papermill <notebook>.ipynb output.ipynb > papermill.log 2>&1 &
-```
-
-Monitor:
-```bash
-tail -f papermill.log
-```
-
-## Option 2: Convert to Python Script
-
-```bash
-jupyter nbconvert --to script <notebook>.ipynb
-nohup python <notebook>.py > output.log 2>&1 &
-```
-
-**Note:** `%pip install` magic commands need manual removal before running as `.py`
-
-## Important Notes
-
-- Ray Tune notebooks require the OCR service running first (Docker)
-- For Ray workers, imports must be inside trainable functions
-
-## Example: Ray Tune PaddleOCR
-
-```bash
-# 1. Start OCR service
-cd src/paddle_ocr && docker compose up -d ocr-cpu
-
-# 2. Run notebook with papermill
-cd src
-nohup papermill paddle_ocr_raytune_rest.ipynb output_raytune.ipynb > papermill.log 2>&1 &
-
-# 3. Monitor
-tail -f papermill.log
+# Start PaddleOCR service and run tuning (images pulled from registry)
+docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
+docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
 ```
+
+## Available Services
+
+| Service | Port | Compose File |
+|---------|------|--------------|
+| PaddleOCR | 8002 | `docker-compose.tuning.paddle.yml` |
+| DocTR | 8003 | `docker-compose.tuning.doctr.yml` |
+| EasyOCR | 8002 | `docker-compose.tuning.easyocr.yml` |
+
+**Note:** PaddleOCR and EasyOCR both use port 8002. Run them separately.
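Review note: since two services share host port 8002, it may help to check whether the port is already taken before starting a second stack. The sketch below is a generic TCP probe; the `is_port_in_use` helper is hypothetical and not part of the repository.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising on refusal.
        return sock.connect_ex((host, port)) == 0


# Ports taken from the "Available Services" table.
for port in (8002, 8003):
    state = "in use" if is_port_in_use(port) else "free"
    print(f"Port {port}: {state}")
```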
+
+## Usage Examples
+
+### PaddleOCR Tuning
+
+```bash
+# Start service
+docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
+
+# Wait for health check (check with)
+curl http://localhost:8002/health
+
+# Run tuning (64 samples)
+docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
+
+# Stop service
+docker compose -f docker-compose.tuning.paddle.yml down
+```
+
+### DocTR Tuning
+
+```bash
+docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu
+curl http://localhost:8003/health
+docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --samples 64
+docker compose -f docker-compose.tuning.doctr.yml down
+```
+
+### EasyOCR Tuning
+
+```bash
+docker compose -f docker-compose.tuning.easyocr.yml up -d easyocr-gpu
+curl http://localhost:8002/health
+docker compose -f docker-compose.tuning.easyocr.yml run raytune --service easyocr --samples 64
+docker compose -f docker-compose.tuning.easyocr.yml down
+```
+
+### Run Multiple Services (PaddleOCR + DocTR)
+
+```bash
+# Start both services
+docker compose -f docker-compose.tuning.yml up -d paddle-ocr-gpu doctr-gpu
+
+# Run tuning for each
+docker compose -f docker-compose.tuning.yml run raytune --service paddle --samples 64
+docker compose -f docker-compose.tuning.yml run raytune --service doctr --samples 64
+
+# Stop all
+docker compose -f docker-compose.tuning.yml down
+```
+
+## Command Line Options
+
+```bash
+docker compose -f <compose-file> run raytune --service <service> --samples <n>
+```
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--service` | OCR service: `paddle`, `doctr`, `easyocr` | Required |
+| `--samples` | Number of hyperparameter trials | 64 |
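Review note: the options table could be backed by an `argparse` definition along these lines. This is a hypothetical reconstruction from the table only; the actual `run_tuning.py` is not shown in this diff and may define more flags or different help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface matching the two documented options."""
    parser = argparse.ArgumentParser(description="Ray Tune OCR hyperparameter tuning")
    parser.add_argument(
        "--service",
        required=True,
        choices=["paddle", "doctr", "easyocr"],
        help="OCR service to tune",
    )
    parser.add_argument(
        "--samples",
        type=int,
        default=64,
        help="Number of hyperparameter trials",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--service", "paddle"])
print(args.service, args.samples)
```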
+
+## Output
+
+Results are saved to `src/results/` as CSV files:
+- `raytune_paddle_results_<timestamp>.csv`
+- `raytune_doctr_results_<timestamp>.csv`
+- `raytune_easyocr_results_<timestamp>.csv`
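Review note: a small helper for picking the newest results file and its best trial might look like the sketch below. It rests on two assumptions that this diff does not confirm: that the `<timestamp>` part sorts lexicographically (as a `YYYYMMDD_HHMMSS` stamp would) and that the CSV has a `CER` column. Both helper names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Optional


def latest_results(results_dir: Path, service: str) -> Optional[Path]:
    """Return the newest raytune_<service>_results_*.csv by file name, or None."""
    matches = sorted(results_dir.glob(f"raytune_{service}_results_*.csv"))
    return matches[-1] if matches else None


def best_trial(csv_path: Path, metric: str = "CER") -> dict:
    """Return the row with the lowest value in the `metric` column (assumed name)."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"no trials found in {csv_path}")
    return min(rows, key=lambda row: float(row[metric]))


# Demonstration on throwaway files with made-up numbers.
with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp)
    (results_dir / "raytune_paddle_results_20250101_000000.csv").write_text(
        "CER,WER\n0.08,0.15\n", encoding="utf-8")
    (results_dir / "raytune_paddle_results_20250102_000000.csv").write_text(
        "CER,WER\n0.05,0.12\n0.02,0.08\n", encoding="utf-8")
    newest = latest_results(results_dir, "paddle")
    newest_name = newest.name
    best = best_trial(newest)
    missing = latest_results(results_dir, "doctr")
```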
+
+## Directory Structure
+
+```
+src/
+├── docker-compose.tuning.yml # All services (PaddleOCR + DocTR)
+├── docker-compose.tuning.paddle.yml # PaddleOCR only
+├── docker-compose.tuning.doctr.yml # DocTR only
+├── docker-compose.tuning.easyocr.yml # EasyOCR only
+├── raytune/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   ├── raytune_ocr.py
+│   └── run_tuning.py
+├── dataset/ # Input images and ground truth
+├── results/ # Output CSV files
+└── debugset/ # Debug output
+```
+
+## Docker Images
+
+All images are pre-built and pulled from registry:
+- `seryus.ddns.net/unir/raytune:latest` - Ray Tune tuning service
+- `seryus.ddns.net/unir/paddle-ocr-gpu:latest` - PaddleOCR GPU
+- `seryus.ddns.net/unir/doctr-gpu:latest` - DocTR GPU
+- `seryus.ddns.net/unir/easyocr-gpu:latest` - EasyOCR GPU
+
+### Build locally (development)
+
+```bash
+# Build raytune image locally
+docker build -t seryus.ddns.net/unir/raytune:latest ./raytune
+```
+
+## Troubleshooting
+
+### Service not ready
+Wait for the health check to pass before running tuning:
+```bash
+# Check service health
+curl http://localhost:8002/health
+# Expected: {"status": "ok", "model_loaded": true, ...}
+```
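Review note: instead of re-running `curl` by hand, the wait could be scripted. The sketch below polls the `/health` endpoint until it reports the `status` and `model_loaded` fields shown in the expected response above. The helper names, retry count, and delay are assumptions; the fetch function is injectable so the loop can be exercised without a running service.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_ready(url: str,
                     fetch: Callable[[str], dict] = fetch_health,
                     attempts: int = 30,
                     delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until status is "ok" and the model is loaded, or give up."""
    for attempt in range(attempts):
        try:
            payload = fetch(url)
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: not ready yet.
            payload = {}
        if payload.get("status") == "ok" and payload.get("model_loaded"):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_ready("http://localhost:8002/health")` before launching the `raytune` container.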
+
+### GPU not detected
+Ensure `nvidia-container-toolkit` is installed:
+```bash
+nvidia-smi # Should show your GPU
+docker run --rm --gpus all nvidia/cuda:12.4.1-base nvidia-smi
+```
+
+### Port already in use
+Stop any running OCR services:
+```bash
+docker compose -f docker-compose.tuning.paddle.yml down
+docker compose -f docker-compose.tuning.easyocr.yml down
+```
50
src/docker-compose.tuning.doctr.yml
Normal file
@@ -0,0 +1,50 @@
+# docker-compose.tuning.doctr.yml - Ray Tune with DocTR GPU
+# Usage:
+# docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu
+# docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --samples 64
+# docker compose -f docker-compose.tuning.doctr.yml down
+
+services:
+  raytune:
+    image: seryus.ddns.net/unir/raytune:latest
+    command: ["--service", "doctr", "--host", "doctr-gpu", "--port", "8000", "--samples", "64"]
+    volumes:
+      - ./results:/app/results:rw
+    environment:
+      - PYTHONUNBUFFERED=1
+    depends_on:
+      doctr-gpu:
+        condition: service_healthy
+
+  doctr-gpu:
+    image: seryus.ddns.net/unir/doctr-gpu:latest
+    container_name: doctr-gpu-tuning
+    ports:
+      - "8003:8000"
+    volumes:
+      - ./dataset:/app/dataset:ro
+      - ./debugset:/app/debugset:rw
+      - doctr-cache:/root/.cache/doctr
+    environment:
+      - PYTHONUNBUFFERED=1
+      - CUDA_VISIBLE_DEVICES=0
+      - DOCTR_DET_ARCH=db_resnet50
+      - DOCTR_RECO_ARCH=crnn_vgg16_bn
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 180s
+
+volumes:
+  doctr-cache:
+    name: doctr-model-cache
51
src/docker-compose.tuning.easyocr.yml
Normal file
@@ -0,0 +1,51 @@
# docker-compose.tuning.easyocr.yml - Ray Tune with EasyOCR GPU
# Usage:
#   docker compose -f docker-compose.tuning.easyocr.yml up -d easyocr-gpu
#   docker compose -f docker-compose.tuning.easyocr.yml run raytune --service easyocr --host easyocr-gpu --port 8000 --samples 64
#   docker compose -f docker-compose.tuning.easyocr.yml down
#
# Note: EasyOCR publishes host port 8002 (same as PaddleOCR), so the two cannot run simultaneously.

services:
  raytune:
    image: seryus.ddns.net/unir/raytune:latest
    command: ["--service", "easyocr", "--host", "easyocr-gpu", "--port", "8000", "--samples", "64"]
    volumes:
      - ./results:/app/results:rw
    environment:
      - PYTHONUNBUFFERED=1
    depends_on:
      easyocr-gpu:
        condition: service_healthy

  easyocr-gpu:
    image: seryus.ddns.net/unir/easyocr-gpu:latest
    container_name: easyocr-gpu-tuning
    ports:
      - "8002:8000"
    volumes:
      - ./dataset:/app/dataset:ro
      - ./debugset:/app/debugset:rw
      - easyocr-cache:/root/.EasyOCR
    environment:
      - PYTHONUNBUFFERED=1
      - CUDA_VISIBLE_DEVICES=0
      - EASYOCR_LANGUAGES=es,en
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s

volumes:
  easyocr-cache:
    name: easyocr-model-cache
50
src/docker-compose.tuning.paddle.yml
Normal file
@@ -0,0 +1,50 @@
# docker-compose.tuning.paddle.yml - Ray Tune with PaddleOCR GPU
# Usage:
#   docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
#   docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --host paddle-ocr-gpu --port 8000 --samples 64
#   docker compose -f docker-compose.tuning.paddle.yml down

services:
  raytune:
    image: seryus.ddns.net/unir/raytune:latest
    command: ["--service", "paddle", "--host", "paddle-ocr-gpu", "--port", "8000", "--samples", "64"]
    volumes:
      - ./results:/app/results:rw
    environment:
      - PYTHONUNBUFFERED=1
    depends_on:
      paddle-ocr-gpu:
        condition: service_healthy

  paddle-ocr-gpu:
    image: seryus.ddns.net/unir/paddle-ocr-gpu:latest
    container_name: paddle-ocr-gpu-tuning
    ports:
      - "8002:8000"
    volumes:
      - ./dataset:/app/dataset:ro
      - ./debugset:/app/debugset:rw
      - paddlex-cache:/root/.paddlex
    environment:
      - PYTHONUNBUFFERED=1
      - CUDA_VISIBLE_DEVICES=0
      - PADDLE_DET_MODEL=PP-OCRv5_mobile_det
      - PADDLE_REC_MODEL=PP-OCRv5_mobile_rec
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  paddlex-cache:
    name: paddlex-model-cache
82
src/docker-compose.tuning.yml
Normal file
@@ -0,0 +1,82 @@
# docker-compose.tuning.yml - Ray Tune with PaddleOCR and DocTR side by side
# Usage:
#   docker compose -f docker-compose.tuning.yml up -d paddle-ocr-gpu doctr-gpu
#   docker compose -f docker-compose.tuning.yml run raytune --service paddle --port 8002 --samples 64
#   docker compose -f docker-compose.tuning.yml run raytune --service doctr --port 8003 --samples 64
#   docker compose -f docker-compose.tuning.yml down
#
# Note: EasyOCR publishes host port 8002 (same as PaddleOCR). Use docker-compose.tuning.easyocr.yml separately.

services:
  raytune:
    image: seryus.ddns.net/unir/raytune:latest
    network_mode: host
    shm_size: '5gb'
    volumes:
      - ./results:/app/results:rw
    environment:
      - PYTHONUNBUFFERED=1

  paddle-ocr-gpu:
    image: seryus.ddns.net/unir/paddle-ocr-gpu:latest
    container_name: paddle-ocr-gpu-tuning
    ports:
      - "8002:8000"
    volumes:
      - ./dataset:/app/dataset:ro
      - ./debugset:/app/debugset:rw
      - paddlex-cache:/root/.paddlex
    environment:
      - PYTHONUNBUFFERED=1
      - CUDA_VISIBLE_DEVICES=0
      - PADDLE_DET_MODEL=PP-OCRv5_mobile_det
      - PADDLE_REC_MODEL=PP-OCRv5_mobile_rec
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

  doctr-gpu:
    image: seryus.ddns.net/unir/doctr-gpu:latest
    container_name: doctr-gpu-tuning
    ports:
      - "8003:8000"
    volumes:
      - ./dataset:/app/dataset:ro
      - ./debugset:/app/debugset:rw
      - doctr-cache:/root/.cache/doctr
    environment:
      - PYTHONUNBUFFERED=1
      - CUDA_VISIBLE_DEVICES=0
      - DOCTR_DET_ARCH=db_resnet50
      - DOCTR_RECO_ARCH=crnn_vgg16_bn
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 180s

volumes:
  paddlex-cache:
    name: paddlex-model-cache
  doctr-cache:
    name: doctr-model-cache
18
src/raytune/Dockerfile
Normal file
@@ -0,0 +1,18 @@
FROM python:3.12-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application files
COPY raytune_ocr.py .
COPY run_tuning.py .

# Create results directory
RUN mkdir -p /app/results

ENV PYTHONUNBUFFERED=1

ENTRYPOINT ["python", "run_tuning.py"]
131
src/raytune/README.md
Normal file
@@ -0,0 +1,131 @@
# Ray Tune OCR Hyperparameter Optimization

Docker-based hyperparameter tuning for OCR services using Ray Tune with Optuna search.

## Structure

```
raytune/
├── Dockerfile          # Python 3.12-slim with Ray Tune + Optuna
├── requirements.txt    # Dependencies
├── raytune_ocr.py      # Shared utilities and search spaces
├── run_tuning.py       # CLI entry point
└── README.md
```

## Quick Start

```bash
cd src

# Build the raytune image (the compose files define no build section for it)
docker build -t seryus.ddns.net/unir/raytune:latest ./raytune

# Or pull from registry
docker pull seryus.ddns.net/unir/raytune:latest
```

## Usage

### PaddleOCR Tuning

```bash
# Start PaddleOCR service
docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu

# Wait for health check, then run tuning (arguments after `raytune` replace the compose command, so pass --host and --port)
docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --host paddle-ocr-gpu --port 8000 --samples 64

# Stop when done
docker compose -f docker-compose.tuning.paddle.yml down
```

### DocTR Tuning

```bash
docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu
docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --host doctr-gpu --port 8000 --samples 64
docker compose -f docker-compose.tuning.doctr.yml down
```

### EasyOCR Tuning

```bash
# Note: EasyOCR publishes host port 8002 (same as PaddleOCR). Cannot run simultaneously.
docker compose -f docker-compose.tuning.easyocr.yml up -d easyocr-gpu
docker compose -f docker-compose.tuning.easyocr.yml run raytune --service easyocr --host easyocr-gpu --port 8000 --samples 64
docker compose -f docker-compose.tuning.easyocr.yml down
```

## CLI Options

```
python run_tuning.py --service {paddle,doctr,easyocr} [--host HOST] [--port PORT] [--samples N]
```

| Option     | Description                          | Default   |
|------------|--------------------------------------|-----------|
| --service  | OCR service to tune (required)       | -         |
| --host     | OCR service host                     | localhost |
| --port     | OCR service port                     | 8000      |
| --samples  | Number of hyperparameter trials      | 64        |

## Search Spaces

### PaddleOCR
- `use_doc_orientation_classify`: [True, False]
- `use_doc_unwarping`: [True, False]
- `textline_orientation`: [True, False]
- `text_det_thresh`: uniform(0.0, 0.7)
- `text_det_box_thresh`: uniform(0.0, 0.7)
- `text_det_unclip_ratio`: [0.0] (fixed)
- `text_rec_score_thresh`: uniform(0.0, 0.7)

### DocTR
- `assume_straight_pages`: [True, False]
- `straighten_pages`: [True, False]
- `preserve_aspect_ratio`: [True, False]
- `symmetric_pad`: [True, False]
- `disable_page_orientation`: [True, False]
- `disable_crop_orientation`: [True, False]
- `resolve_lines`: [True, False]
- `resolve_blocks`: [True, False]
- `paragraph_break`: uniform(0.01, 0.1)

### EasyOCR
- `text_threshold`: uniform(0.3, 0.9)
- `low_text`: uniform(0.2, 0.6)
- `link_threshold`: uniform(0.2, 0.6)
- `slope_ths`: uniform(0.0, 0.3)
- `ycenter_ths`: uniform(0.3, 1.0)
- `height_ths`: uniform(0.3, 1.0)
- `width_ths`: uniform(0.3, 1.0)
- `add_margin`: uniform(0.0, 0.3)
- `contrast_ths`: uniform(0.05, 0.3)
- `adjust_contrast`: uniform(0.3, 0.8)
- `decoder`: ["greedy", "beamsearch"]
- `beamWidth`: [3, 5, 7, 10]
- `min_size`: [5, 10, 15, 20]
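
To make the notation in these lists concrete, here is a minimal sketch of how one trial configuration could be drawn from the PaddleOCR space. It uses plain random sampling from the standard library as a stand-in; the actual tuner uses Ray Tune with Optuna search, which proposes values adaptively rather than uniformly at random. `PADDLE_SPACE`, `Space`, and `sample_config` are hypothetical names introduced only for this illustration.

```python
import random
from typing import Any, Dict, List, Tuple, Union

# Assumption for this sketch: a (low, high) tuple stands for uniform(low, high),
# and a list stands for a categorical choice among its items.
Space = Dict[str, Union[Tuple[float, float], List[Any]]]

PADDLE_SPACE: Space = {
    "use_doc_orientation_classify": [True, False],
    "use_doc_unwarping": [True, False],
    "textline_orientation": [True, False],
    "text_det_thresh": (0.0, 0.7),
    "text_det_box_thresh": (0.0, 0.7),
    "text_rec_score_thresh": (0.0, 0.7),
}


def sample_config(space: Space, rng: random.Random) -> Dict[str, Any]:
    """Draw one value per parameter: uniform for ranges, random pick for lists."""
    config: Dict[str, Any] = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(spec)
    return config
```

Calling `sample_config(PADDLE_SPACE, random.Random(0))` yields one dictionary with the same keys a trial receives as its `config`.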

## Output

Results are saved to `src/results/` as CSV files:
- `raytune_paddle_results_YYYYMMDD_HHMMSS.csv`
- `raytune_doctr_results_YYYYMMDD_HHMMSS.csv`
- `raytune_easyocr_results_YYYYMMDD_HHMMSS.csv`

Each row contains:
- Configuration parameters (prefixed with `config/`)
- Metrics: CER, WER, TIME, PAGES, TIME_PER_PAGE
- Worker URL used for the trial
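
A results file with this layout can be inspected without pandas. The sketch below is one way to do it with the standard library, assuming the column names listed above; `load_rows`, `best_trial`, and `config_of` are hypothetical helpers, not functions from `raytune_ocr.py`.

```python
import csv
from typing import Dict, List


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read every trial row from a results CSV as a dict of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def best_trial(rows: List[Dict[str, str]], metric: str = "CER") -> Dict[str, str]:
    """Return the row with the lowest value of `metric`."""
    if not rows:
        raise ValueError("no trials to compare")
    return min(rows, key=lambda row: float(row[metric]))


def config_of(row: Dict[str, str]) -> Dict[str, str]:
    """Keep only the hyperparameter columns, dropping the 'config/' prefix."""
    prefix = "config/"
    return {k[len(prefix):]: v for k, v in row.items() if k.startswith(prefix)}
```

For example, `config_of(best_trial(load_rows(path)))` returns the hyperparameters of the lowest-CER trial, where `path` points at one of the CSV files above. Trials that failed are reported with a CER of 1.0, so they sort last under this ranking.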
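
## Network Mode

In `docker-compose.tuning.yml`, the raytune container uses `network_mode: host` and reaches the OCR services on the published localhost ports listed below (pass the matching `--port`). The per-service compose files instead address the service by its Compose service name on container port 8000, for example `--host doctr-gpu --port 8000`.
- PaddleOCR: port 8002
- DocTR: port 8003
- EasyOCR: port 8002 (conflicts with PaddleOCR; defined only in `docker-compose.tuning.easyocr.yml`)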
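
The way a host and port combine into a worker URL can be sketched as follows. This mirrors the `OCR_HOST` lookup that `raytune_ocr.py` performs, but `worker_urls` itself is a hypothetical helper written for illustration, and the `env` parameter exists only so the lookup can be exercised without touching the real environment.

```python
import os
from typing import List, Mapping


def worker_urls(ports: List[int], env: Mapping[str, str] = os.environ) -> List[str]:
    """Build one base URL per port, using OCR_HOST or falling back to localhost."""
    host = env.get("OCR_HOST", "localhost")
    return [f"http://{host}:{port}" for port in ports]
```

With host networking, `worker_urls([8002, 8003], env={})` produces the two localhost URLs; with a per-service compose file, setting `OCR_HOST` to the service name and using port 8000 produces the in-network URL.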

## Dependencies

- ray[tune]==2.52.1
- optuna==4.7.0
- requests>=2.28.0
- pandas>=2.0.0
371
src/raytune/raytune_ocr.py
Normal file
@@ -0,0 +1,371 @@
# raytune_ocr.py
# Shared Ray Tune utilities for OCR hyperparameter optimization
#
# Usage:
#   from raytune_ocr import check_workers, create_trainable, run_tuner, analyze_results
#
# Environment variables:
#   OCR_HOST: Host for OCR services (default: localhost)

import os
from datetime import datetime
from typing import List, Dict, Any, Callable

import requests
import pandas as pd

import ray
from ray import tune
from ray.tune.search.optuna import OptunaSearch


def check_workers(
    ports: List[int],
    service_name: str = "OCR",
    timeout: int = 180,
    interval: int = 5,
) -> List[str]:
    """
    Wait for workers to be fully ready (model + dataset loaded) and return healthy URLs.

    Args:
        ports: List of port numbers to check
        service_name: Name for error messages
        timeout: Max seconds to wait for each worker
        interval: Seconds between retries

    Returns:
        List of healthy worker URLs

    Raises:
        RuntimeError if no healthy workers found after timeout
    """
    import time

    host = os.environ.get("OCR_HOST", "localhost")
    worker_urls = [f"http://{host}:{port}" for port in ports]
    healthy_workers = []

    for url in worker_urls:
        print(f"Waiting for {url}...")
        start = time.time()

        while time.time() - start < timeout:
            try:
                health = requests.get(f"{url}/health", timeout=10).json()
                model_ok = health.get('model_loaded', False)
                dataset_ok = health.get('dataset_loaded', False)

                if health.get('status') == 'ok' and model_ok:
                    gpu = health.get('gpu_name', 'CPU')
                    print(f"✓ {url}: ready ({gpu})")
                    healthy_workers.append(url)
                    break

                elapsed = int(time.time() - start)
                print(f" [{elapsed}s] model={model_ok}")
            except requests.exceptions.RequestException:
                elapsed = int(time.time() - start)
                print(f" [{elapsed}s] not reachable")

            time.sleep(interval)
        else:
            print(f"✗ {url}: timeout after {timeout}s")

    if not healthy_workers:
        raise RuntimeError(
            f"No healthy {service_name} workers found.\n"
            f"Checked ports: {ports}"
        )

    print(f"\n{len(healthy_workers)}/{len(worker_urls)} workers ready\n")
    return healthy_workers


def create_trainable(ports: List[int], payload_fn: Callable[[Dict], Dict]) -> Callable:
    """
    Factory to create a trainable function for Ray Tune.

    Args:
        ports: List of worker ports for load balancing
        payload_fn: Function that takes config dict and returns API payload dict

    Returns:
        Trainable function for Ray Tune

    Note:
        Ray Tune 2.x API: tune.report(metrics_dict) - pass dict directly, NOT kwargs.
        See: https://docs.ray.io/en/latest/tune/api/doc/ray.tune.report.html
    """
    def trainable(config):
        import os
        import random
        import requests
        from ray.tune import report  # Ray 2.x: report(dict), not report(**kwargs)

        host = os.environ.get("OCR_HOST", "localhost")
        api_url = f"http://{host}:{random.choice(ports)}"
        payload = payload_fn(config)

        try:
            response = requests.post(f"{api_url}/evaluate", json=payload, timeout=None)
            response.raise_for_status()
            metrics = response.json()
            metrics["worker"] = api_url
            report(metrics)  # Ray 2.x API: pass dict directly
        except Exception as e:
            report({  # Ray 2.x API: pass dict directly
                "CER": 1.0,
                "WER": 1.0,
                "TIME": 0.0,
                "PAGES": 0,
                "TIME_PER_PAGE": 0,
                "worker": api_url,
                "ERROR": str(e)[:500]
            })

    return trainable


def run_tuner(
    trainable: Callable,
    search_space: Dict[str, Any],
    num_samples: int = 64,
    num_workers: int = 1,
    metric: str = "CER",
    mode: str = "min",
) -> tune.ResultGrid:
    """
    Initialize Ray and run hyperparameter tuning.

    Args:
        trainable: Trainable function from create_trainable()
        search_space: Dict of parameter names to tune.* search spaces
        num_samples: Number of trials to run
        num_workers: Max concurrent trials
        metric: Metric to optimize
        mode: "min" or "max"

    Returns:
        Ray Tune ResultGrid
    """
    ray.init(
        ignore_reinit_error=True,
        include_dashboard=False,
        configure_logging=False,
        _metrics_export_port=0,  # Disable metrics export to avoid connection warnings
    )
    print(f"Ray Tune ready (version: {ray.__version__})")

    tuner = tune.Tuner(
        trainable,
        tune_config=tune.TuneConfig(
            metric=metric,
            mode=mode,
            search_alg=OptunaSearch(),
            num_samples=num_samples,
            max_concurrent_trials=num_workers,
        ),
        param_space=search_space,
    )

    return tuner.fit()


def analyze_results(
    results: tune.ResultGrid,
    output_folder: str = "results",
    prefix: str = "raytune",
    config_keys: List[str] = None,
) -> pd.DataFrame:
    """
    Analyze and save tuning results.

    Args:
        results: Ray Tune ResultGrid
        output_folder: Directory to save CSV
        prefix: Filename prefix
        config_keys: List of config keys to show in best result (without 'config/' prefix)

    Returns:
        Results DataFrame
    """
    os.makedirs(output_folder, exist_ok=True)
    df = results.get_dataframe()

    # Save to CSV
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{prefix}_results_{timestamp}.csv"
    filepath = os.path.join(output_folder, filename)
    df.to_csv(filepath, index=False)
    print(f"Results saved: {filepath}")

    # Best configuration
    best = df.loc[df["CER"].idxmin()]
    print(f"\nBest CER: {best['CER']:.6f}")
    print(f"Best WER: {best['WER']:.6f}")

    if config_keys:
        print(f"\nOptimal Configuration:")
        for key in config_keys:
            col = f"config/{key}"
            if col in best:
                val = best[col]
                if isinstance(val, float):
                    print(f" {key}: {val:.4f}")
                else:
                    print(f" {key}: {val}")

    return df


def correlation_analysis(df: pd.DataFrame, param_keys: List[str]) -> None:
    """
    Print correlation of numeric parameters with CER/WER.

    Args:
        df: Results DataFrame
        param_keys: List of config keys (without 'config/' prefix)
    """
    param_cols = [f"config/{k}" for k in param_keys if f"config/{k}" in df.columns]
    numeric_cols = [c for c in param_cols if df[c].dtype in ['float64', 'int64']]

    if not numeric_cols:
        print("No numeric parameters for correlation analysis")
        return

    corr_cer = df[numeric_cols + ["CER"]].corr()["CER"].sort_values(ascending=False)
    corr_wer = df[numeric_cols + ["WER"]].corr()["WER"].sort_values(ascending=False)

    print("Correlation with CER:")
    print(corr_cer)
    print("\nCorrelation with WER:")
    print(corr_wer)


# =============================================================================
# OCR-specific payload functions
# =============================================================================

def paddle_ocr_payload(config: Dict) -> Dict:
    """Create payload for PaddleOCR API. Uses pages 5-10 (first doc) for tuning."""
    return {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
        "start_page": 5,
        "end_page": 10,
        "save_output": False,
    }


def doctr_payload(config: Dict) -> Dict:
    """Create payload for DocTR API. Uses pages 5-10 (first doc) for tuning."""
    return {
        "pdf_folder": "/app/dataset",
        "assume_straight_pages": config.get("assume_straight_pages", True),
        "straighten_pages": config.get("straighten_pages", False),
        "preserve_aspect_ratio": config.get("preserve_aspect_ratio", True),
        "symmetric_pad": config.get("symmetric_pad", True),
        "disable_page_orientation": config.get("disable_page_orientation", False),
        "disable_crop_orientation": config.get("disable_crop_orientation", False),
        "resolve_lines": config.get("resolve_lines", True),
        "resolve_blocks": config.get("resolve_blocks", False),
        "paragraph_break": config.get("paragraph_break", 0.035),
        "start_page": 5,
        "end_page": 10,
        "save_output": False,
    }


def easyocr_payload(config: Dict) -> Dict:
    """Create payload for EasyOCR API. Uses pages 5-10 (first doc) for tuning."""
    return {
        "pdf_folder": "/app/dataset",
        "text_threshold": config.get("text_threshold", 0.7),
        "low_text": config.get("low_text", 0.4),
        "link_threshold": config.get("link_threshold", 0.4),
        "slope_ths": config.get("slope_ths", 0.1),
        "ycenter_ths": config.get("ycenter_ths", 0.5),
        "height_ths": config.get("height_ths", 0.5),
        "width_ths": config.get("width_ths", 0.5),
        "add_margin": config.get("add_margin", 0.1),
        "contrast_ths": config.get("contrast_ths", 0.1),
        "adjust_contrast": config.get("adjust_contrast", 0.5),
        "decoder": config.get("decoder", "greedy"),
        "beamWidth": config.get("beamWidth", 5),
        "min_size": config.get("min_size", 10),
        "start_page": 5,
        "end_page": 10,
        "save_output": False,
    }


# =============================================================================
# Search spaces
# =============================================================================

PADDLE_OCR_SEARCH_SPACE = {
    "use_doc_orientation_classify": tune.choice([True, False]),
    "use_doc_unwarping": tune.choice([True, False]),
    "textline_orientation": tune.choice([True, False]),
    "text_det_thresh": tune.uniform(0.0, 0.7),
    "text_det_box_thresh": tune.uniform(0.0, 0.7),
    "text_det_unclip_ratio": tune.choice([0.0]),
    "text_rec_score_thresh": tune.uniform(0.0, 0.7),
}

DOCTR_SEARCH_SPACE = {
    "assume_straight_pages": tune.choice([True, False]),
    "straighten_pages": tune.choice([True, False]),
    "preserve_aspect_ratio": tune.choice([True, False]),
    "symmetric_pad": tune.choice([True, False]),
    "disable_page_orientation": tune.choice([True, False]),
    "disable_crop_orientation": tune.choice([True, False]),
    "resolve_lines": tune.choice([True, False]),
    "resolve_blocks": tune.choice([True, False]),
    "paragraph_break": tune.uniform(0.01, 0.1),
}

EASYOCR_SEARCH_SPACE = {
    "text_threshold": tune.uniform(0.3, 0.9),
    "low_text": tune.uniform(0.2, 0.6),
    "link_threshold": tune.uniform(0.2, 0.6),
    "slope_ths": tune.uniform(0.0, 0.3),
    "ycenter_ths": tune.uniform(0.3, 1.0),
    "height_ths": tune.uniform(0.3, 1.0),
    "width_ths": tune.uniform(0.3, 1.0),
    "add_margin": tune.uniform(0.0, 0.3),
    "contrast_ths": tune.uniform(0.05, 0.3),
    "adjust_contrast": tune.uniform(0.3, 0.8),
    "decoder": tune.choice(["greedy", "beamsearch"]),
    "beamWidth": tune.choice([3, 5, 7, 10]),
    "min_size": tune.choice([5, 10, 15, 20]),
}


# =============================================================================
# Config keys for results display
# =============================================================================

PADDLE_OCR_CONFIG_KEYS = [
    "use_doc_orientation_classify", "use_doc_unwarping", "textline_orientation",
    "text_det_thresh", "text_det_box_thresh", "text_det_unclip_ratio", "text_rec_score_thresh",
]

DOCTR_CONFIG_KEYS = [
    "assume_straight_pages", "straighten_pages", "preserve_aspect_ratio", "symmetric_pad",
    "disable_page_orientation", "disable_crop_orientation", "resolve_lines", "resolve_blocks",
    "paragraph_break",
]

EASYOCR_CONFIG_KEYS = [
    "text_threshold", "low_text", "link_threshold", "slope_ths", "ycenter_ths",
    "height_ths", "width_ths", "add_margin", "contrast_ths", "adjust_contrast",
    "decoder", "beamWidth", "min_size",
]
4
src/raytune/requirements.txt
Normal file
@@ -0,0 +1,4 @@
ray[tune]==2.52.1
optuna==4.7.0
requests>=2.28.0
pandas>=2.0.0
80
src/raytune/run_tuning.py
Normal file
@@ -0,0 +1,80 @@
#!/usr/bin/env python3
"""Run hyperparameter tuning for OCR services."""

import os
import sys
import argparse
from raytune_ocr import (
    check_workers, create_trainable, run_tuner, analyze_results,
    paddle_ocr_payload, doctr_payload, easyocr_payload,
    PADDLE_OCR_SEARCH_SPACE, DOCTR_SEARCH_SPACE, EASYOCR_SEARCH_SPACE,
    PADDLE_OCR_CONFIG_KEYS, DOCTR_CONFIG_KEYS, EASYOCR_CONFIG_KEYS,
)

SERVICES = {
    "paddle": {
        "payload_fn": paddle_ocr_payload,
        "search_space": PADDLE_OCR_SEARCH_SPACE,
        "config_keys": PADDLE_OCR_CONFIG_KEYS,
        "name": "PaddleOCR",
    },
    "doctr": {
        "payload_fn": doctr_payload,
        "search_space": DOCTR_SEARCH_SPACE,
        "config_keys": DOCTR_CONFIG_KEYS,
        "name": "DocTR",
    },
    "easyocr": {
        "payload_fn": easyocr_payload,
        "search_space": EASYOCR_SEARCH_SPACE,
        "config_keys": EASYOCR_CONFIG_KEYS,
        "name": "EasyOCR",
    },
}

def main():
    parser = argparse.ArgumentParser(description="Run OCR hyperparameter tuning")
    parser.add_argument("--service", choices=["paddle", "doctr", "easyocr"], required=True)
    parser.add_argument("--host", type=str, default="localhost", help="OCR service host")
    parser.add_argument("--port", type=int, default=8000, help="OCR service port")
    parser.add_argument("--samples", type=int, default=64, help="Number of samples")
    args = parser.parse_args()

    # Set environment variable for raytune_ocr module
    os.environ["OCR_HOST"] = args.host

    cfg = SERVICES[args.service]
    ports = [args.port]

    print(f"\n{'='*50}")
    print(f"Hyperparameter Tuning: {cfg['name']}")
    print(f"Host: {args.host}:{args.port}")
    print(f"Samples: {args.samples}")
    print(f"{'='*50}\n")

    # Check workers
    healthy = check_workers(ports, cfg["name"])

    # Create trainable and run tuning
    trainable = create_trainable(ports, cfg["payload_fn"])
    results = run_tuner(
        trainable=trainable,
        search_space=cfg["search_space"],
        num_samples=args.samples,
        num_workers=len(healthy),
    )

    # Analyze results
    df = analyze_results(
        results,
        output_folder="results",
        prefix=f"raytune_{args.service}",
        config_keys=cfg["config_keys"],
    )

    print(f"\n{'='*50}")
    print("Tuning complete!")
    print(f"{'='*50}")

if __name__ == "__main__":
    main()
Binary file not shown.
6075
thesis_output/plantilla_individual.htm.bak
Normal file
File diff suppressed because it is too large