raytune as docker
Some checks failed
build_docker / essential (pull_request) Successful in 1s
build_docker / build_cpu (pull_request) Successful in 4m14s
build_docker / build_easyocr (pull_request) Successful in 12m19s
build_docker / build_easyocr_gpu (pull_request) Successful in 14m2s
build_docker / build_doctr (pull_request) Successful in 12m24s
build_docker / build_doctr_gpu (pull_request) Successful in 13m10s
build_docker / build_raytune (pull_request) Successful in 1m50s
build_docker / build_gpu (pull_request) Has been cancelled

This commit is contained in:
2026-01-19 16:32:45 +01:00
parent d67cbd4677
commit 94b25f9752
20 changed files with 7214 additions and 112 deletions


@@ -25,6 +25,7 @@ jobs:
image_easyocr_gpu: seryus.ddns.net/unir/easyocr-gpu
image_doctr: seryus.ddns.net/unir/doctr-cpu
image_doctr_gpu: seryus.ddns.net/unir/doctr-gpu
image_raytune: seryus.ddns.net/unir/raytune
steps:
- name: Output version info
run: |
@@ -205,3 +206,32 @@ jobs:
tags: |
${{ needs.essential.outputs.image_doctr_gpu }}:${{ needs.essential.outputs.Version }}
${{ needs.essential.outputs.image_doctr_gpu }}:latest
# Ray Tune OCR image (amd64 only)
build_raytune:
runs-on: ubuntu-latest
needs: essential
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Gitea Registry
uses: docker/login-action@v3
with:
registry: ${{ needs.essential.outputs.repo }}
username: username
password: ${{ secrets.CI_READWRITE }}
- name: Build and push Ray Tune image
uses: docker/build-push-action@v5
with:
context: src/raytune
file: src/raytune/Dockerfile
platforms: linux/amd64
push: true
tags: |
${{ needs.essential.outputs.image_raytune }}:${{ needs.essential.outputs.Version }}
${{ needs.essential.outputs.image_raytune }}:latest


@@ -18,11 +18,15 @@ Optimizar el rendimiento de PaddleOCR para documentos académicos en español me
## Main Results
**Table.** *Comparison of OCR metrics between the baseline and optimized configurations.*
| Model | CER | Character Accuracy | WER | Word Accuracy |
|--------|-----|---------------------|-----|-------------------|
| PaddleOCR (Baseline) | 7.78% | 92.22% | 14.94% | 85.06% |
| **PaddleOCR-HyperAdjust** | **1.49%** | **98.51%** | **7.62%** | **92.38%** |
*Source: Own elaboration.*
**Improvement achieved:** CER reduction of **80.9%**
### Optimal Configuration Found
@@ -56,6 +60,8 @@ PDF (académico UNIR)
### Optimization Experiment
**Table.** *Configuration parameters of the Ray Tune experiment.*
| Parameter | Value |
|-----------|-------|
| Number of trials | 64 |
@@ -64,6 +70,8 @@ PDF (académico UNIR)
| Concurrent trials | 2 |
| Total time | ~6 hours (CPU) |
*Source: Own elaboration.*
---
## Repository Structure
@@ -143,16 +151,20 @@ Se realizó una validación adicional con aceleración GPU para evaluar la viabi
## Requirements
**Table.** *Main project dependencies and versions used.*
| Component | Version |
|------------|---------|
| Python | 3.11.9 |
| Python | 3.12.3 |
| PaddlePaddle | 3.2.2 |
| PaddleOCR | 3.3.2 |
| Ray | 2.52.1 |
| Optuna | 4.6.0 |
| Optuna | 4.7.0 |
| jiwer | (for CER/WER metrics) |
| PyMuPDF | (for PDF conversion) |
*Source: Own elaboration.*
---
## Usage
@@ -262,11 +274,15 @@ python3 apply_content.py
### Input and Output Files
**Table.** *Generation scripts with their input and output files.*
| Script | Input | Output |
|--------|---------|--------|
| `generate_mermaid_figures.py` | `docs/*.md` (```mermaid``` blocks) | `thesis_output/figures/figura_*.png`, `figures_manifest.json` |
| `apply_content.py` | `instructions/plantilla_individual.htm`, `docs/*.md`, `thesis_output/figures/*.png` | `thesis_output/plantilla_individual.htm` |
*Source: Own elaboration.*
### Automatically Generated Content
- **30 tables** in APA format (Table X. *Title* + Source: ...)


@@ -6,7 +6,8 @@ import os
from bs4 import BeautifulSoup, NavigableString
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
TEMPLATE = os.path.join(BASE_DIR, 'thesis_output/plantilla_individual.htm')
TEMPLATE_INPUT = os.path.join(BASE_DIR, 'instructions/plantilla_individual.htm')
TEMPLATE_OUTPUT = os.path.join(BASE_DIR, 'thesis_output/plantilla_individual.htm')
DOCS_DIR = os.path.join(BASE_DIR, 'docs')
# Global counters for tables and figures
@@ -365,7 +366,7 @@ def main():
global table_counter, figure_counter
print("Reading template...")
html_content = read_file(TEMPLATE)
html_content = read_file(TEMPLATE_INPUT)
soup = BeautifulSoup(html_content, 'html.parser')
print("Reading docs content...")
@@ -595,9 +596,9 @@ def main():
print("Saving modified template...")
output_html = str(soup)
write_file(TEMPLATE, output_html)
write_file(TEMPLATE_OUTPUT, output_html)
print(f"✓ Done! Modified: {TEMPLATE}")
print(f"✓ Done! Modified: {TEMPLATE_OUTPUT}")
print("\nTo convert to DOCX:")
print("1. Open the .htm file in Microsoft Word")
print("2. Replace [Insertar diagrama Mermaid aquí] placeholders with actual diagrams")


@@ -18,6 +18,8 @@ El procesamiento de documentos en español presenta particularidades que complic
Table 1 summarizes the main linguistic challenges of OCR in Spanish:
**Table 1.** *Spanish-specific linguistic challenges for OCR.*
| Challenge | Description | Impact on OCR |
|---------|-------------|----------------|
| Special characters | ñ, á, é, í, ó, ú, ü, ¿, ¡ | Confusion with similar characters (n/ñ, a/á) |
@@ -25,7 +27,7 @@ La Tabla 1 resume los principales desafíos lingüísticos del OCR en español:
| Abbreviations | Dr., Sra., Ud., etc. | Internal periods confuse segmentation |
| Proper names | Accents in surnames (García, Martínez) | Databases without Unicode support |
*Table 1. Spanish-specific linguistic challenges for OCR. Source: Own elaboration.*
*Source: Own elaboration.*
Beyond the linguistic aspects, academic and administrative documents in Spanish have typographic characteristics that complicate recognition: font variation across headings, body text, and footnotes; tables with borders and cells; institutional logos; watermarks; and graphic elements such as signatures or stamps. These elements introduce noise that can propagate to downstream applications such as named-entity extraction or semantic analysis.
@@ -37,6 +39,8 @@ La adaptación de modelos preentrenados a dominios específicos típicamente req
Table 2 illustrates the typical requirements of different OCR improvement strategies:
**Table 2.** *Comparison of OCR model improvement strategies.*
| Strategy | Data required | Hardware | Time | Expertise |
|------------|------------------|----------|--------|-----------|
| Full fine-tuning | >10,000 labeled images | GPU (≥16GB VRAM) | Days-Weeks | High |
@@ -44,7 +48,7 @@ La Tabla 2 ilustra los requisitos típicos para diferentes estrategias de mejora
| Transfer learning | >500 labeled images | GPU (≥8GB VRAM) | Hours | Medium |
| **Hyperparameter optimization** | **<100 validation images** | **CPU is sufficient** | **Hours** | **Low-Medium** |
*Table 2. Comparison of OCR model improvement strategies. Source: Own elaboration.*
*Source: Own elaboration.*
### The opportunity: optimization without fine-tuning
@@ -88,6 +92,8 @@ Una solución técnicamente superior pero impracticable tiene valor limitado. Es
This work focuses specifically on:
**Table 3.** *Delimitation of the scope of the work.*
| Aspect | In scope | Out of scope |
|---------|-------------------|-------------------|
| **Document type** | Digital academic documents (PDF) | Scanned, handwritten documents |
@@ -96,7 +102,7 @@ Este trabajo se centra específicamente en:
| **Improvement method** | Hyperparameter optimization | Fine-tuning, data augmentation |
| **Hardware** | CPU execution | GPU acceleration |
*Table 3. Delimitation of the scope of the work. Source: Own elaboration.*
*Source: Own elaboration.*
### Relevance and beneficiaries


@@ -8,6 +8,8 @@ Este capítulo establece los objetivos del trabajo siguiendo la metodología SMA
### SMART Justification of the General Objective
**Table 4.** *SMART justification of the general objective.*
| Criterion | Fulfillment |
|----------|--------------|
| **Specific (S)** | It clearly defines what is to be achieved: optimizing PaddleOCR through hyperparameter tuning for documents in Spanish |
@@ -16,6 +18,8 @@ Este capítulo establece los objetivos del trabajo siguiendo la metodología SMA
| **Relevant (R)** | The impact is demonstrable: it improves text extraction from academic documents without additional infrastructure costs |
| **Time-bound (T)** | The timeframe is one semester, corresponding to the Master's thesis (TFM) |
*Source: Own elaboration.*
## Specific objectives
### OE1: Compare open-source OCR solutions
@@ -115,12 +119,16 @@ class ImageTextDataset:
#### Evaluated Models
**Table 5.** *OCR models evaluated in the initial benchmark.*
| Model | Version | Configuration |
|--------|---------|---------------|
| EasyOCR | - | Languages: ['es', 'en'] |
| PaddleOCR | PP-OCRv5 | server_det + server_rec models |
| DocTR | - | db_resnet50 + sar_resnet31 |
*Source: Own elaboration.*
#### Evaluation Metrics
The `jiwer` library was used to compute:
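For reference, CER and WER are edit-distance ratios; a minimal pure-Python sketch of roughly what `jiwer` computes (the real code uses the library's own functions):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def cer(reference, prediction):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, prediction) / len(reference)

def wer(reference, prediction):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / len(ref_words)
```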
@@ -139,6 +147,8 @@ def evaluate_text(reference, prediction):
#### Selected Hyperparameters
**Table 6.** *Hyperparameters selected for optimization.*
| Parameter | Type | Range/Values | Description |
|-----------|------|---------------|-------------|
| `use_doc_orientation_classify` | Boolean | [True, False] | Document orientation classification |
@@ -149,6 +159,8 @@ def evaluate_text(reference, prediction):
| `text_det_unclip_ratio` | Fixed | 0.0 | Expansion coefficient (fixed) |
| `text_rec_score_thresh` | Continuous | [0.0, 0.7] | Recognition confidence threshold |
*Source: Own elaboration.*
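The ranges above define the sampling space; a dependency-free sketch of how one trial configuration could be drawn (rows hidden by the diff are omitted, and the actual experiment declares these with Ray Tune's sampling primitives):

```python
import random

# Sketch of the search space from Table 6; list = categorical,
# tuple = continuous uniform range, scalar = fixed value.
SEARCH_SPACE = {
    "use_doc_orientation_classify": [True, False],
    "text_rec_score_thresh": (0.0, 0.7),
    "text_det_unclip_ratio": 0.0,
}

def sample_config(space, rng=random):
    """Draws one trial configuration from the space."""
    cfg = {}
    for name, spec in space.items():
        if isinstance(spec, list):      # categorical choice
            cfg[name] = rng.choice(spec)
        elif isinstance(spec, tuple):   # uniform over [lo, hi]
            lo, hi = spec
            cfg[name] = rng.uniform(lo, hi)
        else:                           # fixed
            cfg[name] = spec
    return cfg
```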
#### Ray Tune Configuration
```python
@@ -235,23 +247,31 @@ Y retorna métricas en formato JSON:
#### Hardware
**Table 7.** *Hardware specifications of the development environment.*
| Component | Specification |
|------------|----------------|
| CPU | Intel Core (model to be specified) |
| RAM | 16 GB |
| GPU | Not available (CPU execution) |
| CPU | AMD Ryzen 7 5800H |
| RAM | 16 GB DDR4 |
| GPU | NVIDIA RTX 3060 Laptop (5.66 GB VRAM) |
| Storage | SSD |
*Source: Own elaboration.*
#### Software
**Table 8.** *Software versions used.*
| Component | Version |
|------------|---------|
| Operating System | Windows 10/11 |
| Python | 3.11.9 |
| Operating System | Ubuntu 24.04.3 LTS |
| Python | 3.12.3 |
| PaddleOCR | 3.3.2 |
| PaddlePaddle | 3.2.2 |
| Ray | 2.52.1 |
| Optuna | 4.6.0 |
| Optuna | 4.7.0 |
*Source: Own elaboration.*
### Methodological Limitations


@@ -34,6 +34,11 @@ Se seleccionaron tres soluciones OCR de código abierto representativas del esta
*Source: Own elaboration.*
**Docker images available in the project registry:**
- PaddleOCR: `seryus.ddns.net/unir/paddle-ocr-gpu`, `seryus.ddns.net/unir/paddle-ocr-cpu`
- EasyOCR: `seryus.ddns.net/unir/easyocr-gpu`
- DocTR: `seryus.ddns.net/unir/doctr-gpu`
### Success Criteria
The criteria established to evaluate the solutions were:
@@ -322,7 +327,7 @@ Esta sección ha presentado:
### Introduction
This section describes the PaddleOCR hyperparameter optimization process using Ray Tune with the Optuna search algorithm. The experiments were implemented in the notebook `src/paddle_ocr_fine_tune_unir_raytune.ipynb`, and the results were stored in `src/raytune_paddle_subproc_results_20251207_192320.csv`.
This section describes the PaddleOCR hyperparameter optimization process using Ray Tune with the Optuna search algorithm. The experiments were implemented in [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py) with the utility library [`src/raytune_ocr.py`](https://github.com/seryus/MastersThesis/blob/main/src/raytune_ocr.py), and the results were stored in [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results).
Hyperparameter optimization is an alternative to traditional fine-tuning that does not require:
- Access to a dedicated GPU
@@ -339,17 +344,17 @@ El experimento se ejecutó en el siguiente entorno:
| Component | Version/Specification |
|------------|------------------------|
| Operating system | Windows 10/11 |
| Python | 3.11.9 |
| Operating system | Ubuntu 24.04.3 LTS |
| Python | 3.12.3 |
| PaddlePaddle | 3.2.2 |
| PaddleOCR | 3.3.2 |
| Ray | 2.52.1 |
| Optuna | 4.6.0 |
| CPU | Intel Core (multi-core) |
| RAM | 16 GB |
| GPU | Not available (CPU execution) |
| Optuna | 4.7.0 |
| CPU | AMD Ryzen 7 5800H |
| RAM | 16 GB DDR4 |
| GPU | NVIDIA RTX 3060 Laptop (5.66 GB VRAM) |
*Source: outputs of the notebook `src/paddle_ocr_fine_tune_unir_raytune.ipynb`.*
*Source: execution environment configuration. Results in `src/results/` generated by `src/run_tuning.py`.*
#### Execution Architecture
@@ -613,7 +618,7 @@ Configuración óptima:
| text_det_unclip_ratio | 0.0 | 1.5 | -1.5 (fixed) |
| text_rec_score_thresh | **0.6350** | 0.5 | +0.135 |
*Source: notebook analysis.*
*Source: analysis of [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generated by [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*
#### Correlation Analysis
@@ -628,7 +633,7 @@ Se calculó la correlación de Pearson entre los parámetros continuos y las mé
| `text_rec_score_thresh` | -0.161 | Weak negative correlation |
| `text_det_unclip_ratio` | NaN | Zero variance (fixed value) |
*Source: notebook analysis.*
*Source: analysis of [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generated by [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*
**Table 24.** *Correlation of parameters with WER.*
@@ -638,7 +643,7 @@ Se calculó la correlación de Pearson entre los parámetros continuos y las mé
| `text_det_box_thresh` | +0.227 | Weak positive correlation |
| `text_rec_score_thresh` | -0.173 | Weak negative correlation |
*Source: notebook analysis.*
*Source: analysis of [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generated by [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*
**Key finding**: The `text_det_thresh` parameter shows the strongest correlation (-0.52 with both metrics), indicating that higher values of this threshold tend to reduce the error. This threshold controls which pixels are considered "text" in the detector's probability map.
@@ -653,7 +658,7 @@ El parámetro booleano `textline_orientation` demostró tener el mayor impacto e
| True | 3.76% | 7.12% | 12.73% | 32 |
| False | 12.40% | 14.93% | 21.71% | 32 |
*Source: notebook analysis.*
*Source: analysis of [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results) generated by [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py).*
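The per-group means in the table come from splitting the 64 trials by the boolean flag; a small sketch with hypothetical rows:

```python
from statistics import mean

# Hypothetical per-trial rows; the real 64 trials live under src/results/
rows = [
    {"textline_orientation": True,  "CER": 0.0376},
    {"textline_orientation": True,  "CER": 0.0712},
    {"textline_orientation": False, "CER": 0.1240},
    {"textline_orientation": False, "CER": 0.1493},
]

def mean_cer_by_flag(rows, flag="textline_orientation"):
    """Mean CER per value of a boolean hyperparameter."""
    groups = {}
    for row in rows:
        groups.setdefault(row[flag], []).append(row["CER"])
    return {value: mean(cers) for value, cers in groups.items()}

by_flag = mean_cer_by_flag(rows)
```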
**Interpretation:**
@@ -741,7 +746,7 @@ optimized_config = {
| PaddleOCR (Baseline) | 7.78% | 92.22% | 14.94% | 85.06% |
| PaddleOCR-HyperAdjust | **1.49%** | **98.51%** | **7.62%** | **92.38%** |
*Source: final run in the notebook `src/paddle_ocr_fine_tune_unir_raytune.ipynb`.*
*Source: final validation. Code in [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py), results in [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results).*
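The reported improvement is the relative reduction with respect to the baseline; a small helper illustrating the computation:

```python
def relative_reduction(baseline, optimized):
    """Relative error reduction as a percentage of the baseline."""
    return (baseline - optimized) / baseline * 100

# CER 7.78% -> 1.49% gives ~80.8-80.9%, depending on rounding of the inputs
cer_gain = relative_reduction(7.78, 1.49)
```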
#### Improvement Metrics
@@ -823,9 +828,9 @@ Esta sección ha presentado:
4. **Final improvement**: CER reduced from 7.78% to 1.49% (an 80.9% reduction)
**Data sources:**
- `src/paddle_ocr_fine_tune_unir_raytune.ipynb`: experiment code
- `src/raytune_paddle_subproc_results_20251207_192320.csv`: results of the 64 trials
- `src/paddle_ocr_tuning.py`: evaluation script
- [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py): main optimization script
- [`src/raytune_ocr.py`](https://github.com/seryus/MastersThesis/blob/main/src/raytune_ocr.py): Ray Tune utility library
- [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results): trial results as CSV
## Discussion and analysis of results
@@ -1066,8 +1071,13 @@ Este capítulo ha presentado el desarrollo completo de la contribución:
**Main result**: The CER < 2% objective was achieved through hyperparameter optimization, without requiring fine-tuning or GPU resources.
**Data sources:**
- `src/raytune_paddle_subproc_results_20251207_192320.csv`: results of the 64 trials
- `src/paddle_ocr_fine_tune_unir_raytune.ipynb`: main experiment notebook
- [`src/run_tuning.py`](https://github.com/seryus/MastersThesis/blob/main/src/run_tuning.py): main optimization script
- [`src/results/`](https://github.com/seryus/MastersThesis/tree/main/src/results): trial results as CSV
**Docker images:**
- `seryus.ddns.net/unir/paddle-ocr-gpu`: PaddleOCR with GPU support
- `seryus.ddns.net/unir/easyocr-gpu`: EasyOCR with GPU support
- `seryus.ddns.net/unir/doctr-gpu`: DocTR with GPU support
### Validation with GPU Acceleration


@@ -10,10 +10,14 @@ Este Trabajo Fin de Máster ha demostrado que es posible mejorar significativame
The main objective of the work was to achieve a CER below 2% on academic documents in Spanish. The results obtained confirm that this objective was met:
**Table 39.** *Fulfillment of the CER objective.*
| Metric | Target | Result |
|---------|----------|-----------|
| CER | < 2% | **1.49%** |
*Source: Own elaboration.*
### Specific Conclusions
**Regarding OE1 (comparison of OCR solutions)**:


@@ -48,6 +48,8 @@ MastersThesis/
### Development System
**Table A1.** *Development system specifications.*
| Component | Specification |
|------------|----------------|
| Operating System | Ubuntu 24.04.3 LTS |
@@ -56,20 +58,30 @@ MastersThesis/
| GPU | NVIDIA RTX 3060 Laptop (5.66 GB VRAM) |
| CUDA | 12.4 |
*Source: Own elaboration.*
### Dependencies
**Table A2.** *Project dependencies.*
| Component | Version |
|------------|---------|
| Python | 3.11 |
| Docker | 24+ |
| Python | 3.12.3 |
| Docker | 29.1.5 |
| NVIDIA Container Toolkit | Required for GPU |
| Ray | 2.52+ |
| Optuna | 4.6+ |
| Ray | 2.52.1 |
| Optuna | 4.7.0 |
*Source: Own elaboration.*
## A.4 Instructions for Running the OCR Services
### PaddleOCR (Port 8002)
**Docker images:**
- GPU: `seryus.ddns.net/unir/paddle-ocr-gpu`
- CPU: `seryus.ddns.net/unir/paddle-ocr-cpu`
```bash
cd src/paddle_ocr
@@ -82,6 +94,8 @@ docker compose -f docker-compose.cpu-registry.yml up -d
### DocTR (Port 8003)
**Docker image:** `seryus.ddns.net/unir/doctr-gpu`
```bash
cd src/doctr_service
@@ -91,6 +105,8 @@ docker compose up -d
### EasyOCR (Port 8002)
**Docker image:** `seryus.ddns.net/unir/easyocr-gpu`
```bash
cd src/easyocr_service
@@ -165,29 +181,37 @@ analyze_results(results, prefix='raytune_paddle', config_keys=PADDLE_OCR_CONFIG_
### Services and Ports
**Table A3.** *Docker services and ports.*
| Service | Port | Tuning Script |
|----------|--------|------------------|
| PaddleOCR | 8002 | `paddle_ocr_payload` |
| DocTR | 8003 | `doctr_payload` |
| EasyOCR | 8002 | `easyocr_payload` |
*Source: Own elaboration.*
## A.7 Performance Metrics
Detailed results of the evaluations and hyperparameter tuning are available in:
- [General Metrics](metrics/metrics.md) - comparison of the three services
- [PaddleOCR](metrics/metrics_paddle.md) - best accuracy (7.72% CER)
- [PaddleOCR](metrics/metrics_paddle.md) - best accuracy (7.76% CER baseline, **1.49% optimized**)
- [DocTR](metrics/metrics_doctr.md) - fastest (0.50 s/page)
- [EasyOCR](metrics/metrics_easyocr.md) - intermediate trade-off
### Results Summary
**Table A4.** *Benchmark results summary per service.*
| Service | Baseline CER | Tuned CER | Improvement |
|----------|----------|--------------|--------|
| **PaddleOCR** | 8.85% | **7.72%** | 12.8% |
| DocTR | 12.06% | 12.07% | 0% |
| EasyOCR | 11.23% | 11.14% | 0.8% |
*Source: Own elaboration.*
## A.8 License
The code is distributed under the MIT license.


@@ -1,74 +1,153 @@
# Running Notebooks in Background
## Quick: Check Ray Tune Progress
```bash
# Is papermill still running?
ps aux | grep papermill | grep -v grep
# View live log
tail -f papermill.log
# Find latest Ray Tune run and count completed trials
LATEST=$(ls -td ~/ray_results/trainable_* 2>/dev/null | head -1)
echo "Run: $LATEST"
COMPLETED=$(find "$LATEST" -name "result.json" -size +0 2>/dev/null | wc -l)
TOTAL=$(ls -d "$LATEST"/trainable_*/ 2>/dev/null | wc -l)
echo "Completed: $COMPLETED / $TOTAL"
# Check workers are healthy
for port in 8001 8002 8003; do
status=$(curl -s "localhost:$port/health" 2>/dev/null | python3 -c "import sys,json; print(json.load(sys.stdin).get('status','down'))" 2>/dev/null || echo "down")
echo "Worker $port: $status"
done
# Show best result so far
if [ "$COMPLETED" -gt 0 ]; then
find "$LATEST" -name "result.json" -size +0 -exec cat {} \; 2>/dev/null | \
python3 -c "import sys,json; results=[json.loads(l) for l in sys.stdin if l.strip()]; best=min(results,key=lambda x:x.get('CER',999)); print(f'Best CER: {best[\"CER\"]:.4f}, WER: {best[\"WER\"]:.4f}')" 2>/dev/null
fi
```
---
## Option 1: Papermill (Recommended)
Runs notebooks directly without conversion.
```bash
pip install papermill
nohup papermill <notebook>.ipynb output.ipynb > papermill.log 2>&1 &
```
Monitor:
```bash
tail -f papermill.log
```
## Option 2: Convert to Python Script
```bash
jupyter nbconvert --to script <notebook>.ipynb
nohup python <notebook>.py > output.log 2>&1 &
```
**Note:** `%pip install` magic commands must be removed manually before running the converted `.py` script.
## Important Notes
- Ray Tune notebooks require the OCR service running first (Docker)
- For Ray workers, imports must be inside trainable functions
## Example: Ray Tune PaddleOCR
```bash
# 1. Start OCR service
cd src/paddle_ocr && docker compose up -d ocr-cpu
# 2. Run notebook with papermill
cd src
nohup papermill paddle_ocr_raytune_rest.ipynb output_raytune.ipynb > papermill.log 2>&1 &
# 3. Monitor
tail -f papermill.log
```
# OCR Hyperparameter Tuning with Ray Tune
This directory contains the Docker setup for running automated hyperparameter optimization on OCR services using Ray Tune with Optuna.
## Prerequisites
- Docker with NVIDIA GPU support (`nvidia-container-toolkit`)
- NVIDIA GPU with CUDA support
## Quick Start
```bash
cd src
# Start PaddleOCR service and run tuning (images pulled from registry)
docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
```
## Available Services
| Service | Port | Compose File |
|---------|------|--------------|
| PaddleOCR | 8002 | `docker-compose.tuning.paddle.yml` |
| DocTR | 8003 | `docker-compose.tuning.doctr.yml` |
| EasyOCR | 8002 | `docker-compose.tuning.easyocr.yml` |
**Note:** PaddleOCR and EasyOCR both use port 8002. Run them separately.
## Usage Examples
### PaddleOCR Tuning
```bash
# Start service
docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
# Wait for the health check to pass (verify with:)
curl http://localhost:8002/health
# Run tuning (64 samples)
docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
# Stop service
docker compose -f docker-compose.tuning.paddle.yml down
```
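The "wait for the health check" step can be automated rather than polled by hand; a minimal Python sketch (the injectable `probe` argument is hypothetical, added so the loop can be tested without a live service):

```python
import time
import urllib.request
import urllib.error

def wait_for_health(url, probe=None, retries=10, delay=5):
    """Polls a /health endpoint until it answers; returns the attempt index."""
    if probe is None:
        def probe(u):
            try:
                urllib.request.urlopen(u, timeout=5)
                return True
            except (urllib.error.URLError, OSError):
                return False
    for attempt in range(retries):
        if probe(url):
            return attempt
        time.sleep(delay)
    raise TimeoutError(f"{url} not healthy after {retries} attempts")
```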
### DocTR Tuning
```bash
docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu
curl http://localhost:8003/health
docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --samples 64
docker compose -f docker-compose.tuning.doctr.yml down
```
### EasyOCR Tuning
```bash
docker compose -f docker-compose.tuning.easyocr.yml up -d easyocr-gpu
curl http://localhost:8002/health
docker compose -f docker-compose.tuning.easyocr.yml run raytune --service easyocr --samples 64
docker compose -f docker-compose.tuning.easyocr.yml down
```
### Run Multiple Services (PaddleOCR + DocTR)
```bash
# Start both services
docker compose -f docker-compose.tuning.yml up -d paddle-ocr-gpu doctr-gpu
# Run tuning for each
docker compose -f docker-compose.tuning.yml run raytune --service paddle --samples 64
docker compose -f docker-compose.tuning.yml run raytune --service doctr --samples 64
# Stop all
docker compose -f docker-compose.tuning.yml down
```
## Command Line Options
```bash
docker compose -f <compose-file> run raytune --service <service> --samples <n>
```
| Option | Description | Default |
|--------|-------------|---------|
| `--service` | OCR service: `paddle`, `doctr`, `easyocr` | Required |
| `--samples` | Number of hyperparameter trials | 64 |
## Output
Results are saved to `src/results/` as CSV files:
- `raytune_paddle_results_<timestamp>.csv`
- `raytune_doctr_results_<timestamp>.csv`
- `raytune_easyocr_results_<timestamp>.csv`
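The naming pattern suggests the CSV names are built from the service and a timestamp; a hypothetical sketch inferred from the examples above:

```python
from datetime import datetime

def results_filename(service):
    """Builds the timestamped CSV name (pattern inferred, not the actual code)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"raytune_{service}_results_{stamp}.csv"
```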
## Directory Structure
```
src/
├── docker-compose.tuning.yml # All services (PaddleOCR + DocTR)
├── docker-compose.tuning.paddle.yml # PaddleOCR only
├── docker-compose.tuning.doctr.yml # DocTR only
├── docker-compose.tuning.easyocr.yml # EasyOCR only
├── raytune/
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── raytune_ocr.py
│ └── run_tuning.py
├── dataset/ # Input images and ground truth
├── results/ # Output CSV files
└── debugset/ # Debug output
```
## Docker Images
All images are pre-built and pulled from the registry:
- `seryus.ddns.net/unir/raytune:latest` - Ray Tune tuning service
- `seryus.ddns.net/unir/paddle-ocr-gpu:latest` - PaddleOCR GPU
- `seryus.ddns.net/unir/doctr-gpu:latest` - DocTR GPU
- `seryus.ddns.net/unir/easyocr-gpu:latest` - EasyOCR GPU
### Build locally (development)
```bash
# Build raytune image locally
docker build -t seryus.ddns.net/unir/raytune:latest ./raytune
```
## Troubleshooting
### Service not ready
Wait for the health check to pass before running tuning:
```bash
# Check service health
curl http://localhost:8002/health
# Expected: {"status": "ok", "model_loaded": true, ...}
```
### GPU not detected
Ensure `nvidia-container-toolkit` is installed:
```bash
nvidia-smi # Should show your GPU
docker run --rm --gpus all nvidia/cuda:12.4.1-base nvidia-smi
```
### Port already in use
Stop any running OCR services:
```bash
docker compose -f docker-compose.tuning.paddle.yml down
docker compose -f docker-compose.tuning.easyocr.yml down
```


@@ -0,0 +1,50 @@
# docker-compose.tuning.doctr.yml - Ray Tune with DocTR GPU
# Usage:
# docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu
# docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --samples 64
# docker compose -f docker-compose.tuning.doctr.yml down
services:
raytune:
image: seryus.ddns.net/unir/raytune:latest
command: ["--service", "doctr", "--host", "doctr-gpu", "--port", "8000", "--samples", "64"]
volumes:
- ./results:/app/results:rw
environment:
- PYTHONUNBUFFERED=1
depends_on:
doctr-gpu:
condition: service_healthy
doctr-gpu:
image: seryus.ddns.net/unir/doctr-gpu:latest
container_name: doctr-gpu-tuning
ports:
- "8003:8000"
volumes:
- ./dataset:/app/dataset:ro
- ./debugset:/app/debugset:rw
- doctr-cache:/root/.cache/doctr
environment:
- PYTHONUNBUFFERED=1
- CUDA_VISIBLE_DEVICES=0
- DOCTR_DET_ARCH=db_resnet50
- DOCTR_RECO_ARCH=crnn_vgg16_bn
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 180s
volumes:
doctr-cache:
name: doctr-model-cache


@@ -0,0 +1,51 @@
# docker-compose.tuning.easyocr.yml - Ray Tune with EasyOCR GPU
# Usage:
# docker compose -f docker-compose.tuning.easyocr.yml up -d easyocr-gpu
# docker compose -f docker-compose.tuning.easyocr.yml run raytune --service easyocr --samples 64
# docker compose -f docker-compose.tuning.easyocr.yml down
#
# Note: EasyOCR uses port 8002 (same as PaddleOCR). Cannot run simultaneously.
services:
raytune:
image: seryus.ddns.net/unir/raytune:latest
command: ["--service", "easyocr", "--host", "easyocr-gpu", "--port", "8000", "--samples", "64"]
volumes:
- ./results:/app/results:rw
environment:
- PYTHONUNBUFFERED=1
depends_on:
easyocr-gpu:
condition: service_healthy
easyocr-gpu:
image: seryus.ddns.net/unir/easyocr-gpu:latest
container_name: easyocr-gpu-tuning
ports:
- "8002:8000"
volumes:
- ./dataset:/app/dataset:ro
- ./debugset:/app/debugset:rw
- easyocr-cache:/root/.EasyOCR
environment:
- PYTHONUNBUFFERED=1
- CUDA_VISIBLE_DEVICES=0
- EASYOCR_LANGUAGES=es,en
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 120s
volumes:
easyocr-cache:
name: easyocr-model-cache


@@ -0,0 +1,50 @@
# docker-compose.tuning.paddle.yml - Ray Tune with PaddleOCR GPU
# Usage:
# docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
# docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
# docker compose -f docker-compose.tuning.paddle.yml down
services:
raytune:
image: seryus.ddns.net/unir/raytune:latest
command: ["--service", "paddle", "--host", "paddle-ocr-gpu", "--port", "8000", "--samples", "64"]
volumes:
- ./results:/app/results:rw
environment:
- PYTHONUNBUFFERED=1
depends_on:
paddle-ocr-gpu:
condition: service_healthy
paddle-ocr-gpu:
image: seryus.ddns.net/unir/paddle-ocr-gpu:latest
container_name: paddle-ocr-gpu-tuning
ports:
- "8002:8000"
volumes:
- ./dataset:/app/dataset:ro
- ./debugset:/app/debugset:rw
- paddlex-cache:/root/.paddlex
environment:
- PYTHONUNBUFFERED=1
- CUDA_VISIBLE_DEVICES=0
- PADDLE_DET_MODEL=PP-OCRv5_mobile_det
- PADDLE_REC_MODEL=PP-OCRv5_mobile_rec
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
volumes:
paddlex-cache:
name: paddlex-model-cache


@@ -0,0 +1,82 @@
# docker-compose.tuning.yml - Ray Tune with all OCR services (PaddleOCR + DocTR)
# Usage:
# docker compose -f docker-compose.tuning.yml up -d paddle-ocr-gpu doctr-gpu
# docker compose -f docker-compose.tuning.yml run raytune --service paddle --samples 64
# docker compose -f docker-compose.tuning.yml run raytune --service doctr --samples 64
# docker compose -f docker-compose.tuning.yml down
#
# Note: EasyOCR uses port 8002 (same as PaddleOCR). Use docker-compose.tuning.easyocr.yml separately.
services:
  raytune:
    image: seryus.ddns.net/unir/raytune:latest
    network_mode: host
    shm_size: '5gb'
    volumes:
      - ./results:/app/results:rw
    environment:
      - PYTHONUNBUFFERED=1

  paddle-ocr-gpu:
    image: seryus.ddns.net/unir/paddle-ocr-gpu:latest
    container_name: paddle-ocr-gpu-tuning
    ports:
      - "8002:8000"
    volumes:
      - ./dataset:/app/dataset:ro
      - ./debugset:/app/debugset:rw
      - paddlex-cache:/root/.paddlex
    environment:
      - PYTHONUNBUFFERED=1
      - CUDA_VISIBLE_DEVICES=0
      - PADDLE_DET_MODEL=PP-OCRv5_mobile_det
      - PADDLE_REC_MODEL=PP-OCRv5_mobile_rec
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

  doctr-gpu:
    image: seryus.ddns.net/unir/doctr-gpu:latest
    container_name: doctr-gpu-tuning
    ports:
      - "8003:8000"
    volumes:
      - ./dataset:/app/dataset:ro
      - ./debugset:/app/debugset:rw
      - doctr-cache:/root/.cache/doctr
    environment:
      - PYTHONUNBUFFERED=1
      - CUDA_VISIBLE_DEVICES=0
      - DOCTR_DET_ARCH=db_resnet50
      - DOCTR_RECO_ARCH=crnn_vgg16_bn
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 180s

volumes:
  paddlex-cache:
    name: paddlex-model-cache
  doctr-cache:
    name: doctr-model-cache

src/raytune/Dockerfile Normal file

@@ -0,0 +1,18 @@
FROM python:3.12-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY raytune_ocr.py .
COPY run_tuning.py .
# Create results directory
RUN mkdir -p /app/results
ENV PYTHONUNBUFFERED=1
ENTRYPOINT ["python", "run_tuning.py"]

src/raytune/README.md Normal file

@@ -0,0 +1,131 @@
# Ray Tune OCR Hyperparameter Optimization
Docker-based hyperparameter tuning for OCR services using Ray Tune with Optuna search.
## Structure
```
raytune/
├── Dockerfile # Python 3.12-slim with Ray Tune + Optuna
├── requirements.txt # Dependencies
├── raytune_ocr.py # Shared utilities and search spaces
├── run_tuning.py # CLI entry point
└── README.md
```
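The bridge between the tuner and each OCR service is a per-service payload function in `raytune_ocr.py`: it maps a Ray Tune trial config onto the JSON body of the worker's `/evaluate` endpoint, falling back to the service defaults for any parameter the trial does not set. A minimal sketch of the pattern, trimmed to a few parameters (the key names and the fixed pages 5-10 tuning window follow `paddle_ocr_payload`):

```python
from typing import Dict

def paddle_payload_sketch(config: Dict) -> Dict:
    """Map a trial config onto the PaddleOCR /evaluate payload,
    using the service defaults for any unset key."""
    return {
        "pdf_folder": "/app/dataset",
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
        "start_page": 5,   # fixed tuning window: pages 5-10 of the first doc
        "end_page": 10,
        "save_output": False,
    }

payload = paddle_payload_sketch({"text_det_thresh": 0.3})
```

`create_trainable()` takes such a function plus a list of worker ports and wraps the POST/report cycle around it, so adding a new OCR backend only requires a new payload function and search space.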
## Quick Start
```bash
cd src
# Build the raytune image
docker compose -f docker-compose.tuning.paddle.yml build raytune
# Or pull from registry
docker pull seryus.ddns.net/unir/raytune:latest
```
## Usage
### PaddleOCR Tuning
```bash
# Start PaddleOCR service
docker compose -f docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
# Wait for health check, then run tuning
docker compose -f docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
# Stop when done
docker compose -f docker-compose.tuning.paddle.yml down
```
### DocTR Tuning
```bash
docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu
docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --samples 64
docker compose -f docker-compose.tuning.doctr.yml down
```
### EasyOCR Tuning
```bash
# Note: EasyOCR uses port 8002 (same as PaddleOCR). Cannot run simultaneously.
docker compose -f docker-compose.tuning.easyocr.yml up -d easyocr-gpu
docker compose -f docker-compose.tuning.easyocr.yml run raytune --service easyocr --samples 64
docker compose -f docker-compose.tuning.easyocr.yml down
```
## CLI Options
```
python run_tuning.py --service {paddle,doctr,easyocr} --samples N
```
| Option    | Description                     | Default   |
|-----------|---------------------------------|-----------|
| --service | OCR service to tune (required)  | -         |
| --host    | OCR service host                | localhost |
| --port    | OCR service port                | 8000      |
| --samples | Number of hyperparameter trials | 64        |
## Search Spaces
### PaddleOCR
- `use_doc_orientation_classify`: [True, False]
- `use_doc_unwarping`: [True, False]
- `textline_orientation`: [True, False]
- `text_det_thresh`: uniform(0.0, 0.7)
- `text_det_box_thresh`: uniform(0.0, 0.7)
- `text_det_unclip_ratio`: fixed at 0.0 (`tune.choice([0.0])`)
- `text_rec_score_thresh`: uniform(0.0, 0.7)
### DocTR
- `assume_straight_pages`: [True, False]
- `straighten_pages`: [True, False]
- `preserve_aspect_ratio`: [True, False]
- `symmetric_pad`: [True, False]
- `disable_page_orientation`: [True, False]
- `disable_crop_orientation`: [True, False]
- `resolve_lines`: [True, False]
- `resolve_blocks`: [True, False]
- `paragraph_break`: uniform(0.01, 0.1)
### EasyOCR
- `text_threshold`: uniform(0.3, 0.9)
- `low_text`: uniform(0.2, 0.6)
- `link_threshold`: uniform(0.2, 0.6)
- `slope_ths`: uniform(0.0, 0.3)
- `ycenter_ths`: uniform(0.3, 1.0)
- `height_ths`: uniform(0.3, 1.0)
- `width_ths`: uniform(0.3, 1.0)
- `add_margin`: uniform(0.0, 0.3)
- `contrast_ths`: uniform(0.05, 0.3)
- `adjust_contrast`: uniform(0.3, 0.8)
- `decoder`: ["greedy", "beamsearch"]
- `beamWidth`: [3, 5, 7, 10]
- `min_size`: [5, 10, 15, 20]
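These spaces are plain dicts of `tune.uniform`/`tune.choice` samplers (see `raytune_ocr.py`). For eyeballing a space before launching trials, the sampling can be mimicked with the standard library alone; this is a rough stand-in for inspection only, not the Ray Tune API:

```python
import random

# Stand-in for tune.uniform / tune.choice, for quick inspection only.
def sample(space: dict) -> dict:
    out = {}
    for key, (kind, args) in space.items():
        if kind == "uniform":
            out[key] = random.uniform(*args)   # continuous range
        else:  # "choice"
            out[key] = random.choice(args)     # categorical
    return out

# Subset of the PaddleOCR space above, in (kind, args) form.
paddle_space = {
    "use_doc_unwarping": ("choice", [True, False]),
    "text_det_thresh": ("uniform", (0.0, 0.7)),
    "text_rec_score_thresh": ("uniform", (0.0, 0.7)),
}

cfg = sample(paddle_space)
```

Each trial receives one such `cfg` dict; in the real run, Optuna (not uniform random) proposes the values.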
## Output
Results are saved to `src/results/` as CSV files:
- `raytune_paddle_results_YYYYMMDD_HHMMSS.csv`
- `raytune_doctr_results_YYYYMMDD_HHMMSS.csv`
- `raytune_easyocr_results_YYYYMMDD_HHMMSS.csv`
Each row contains:
- Configuration parameters (prefixed with `config/`)
- Metrics: CER, WER, TIME, PAGES, TIME_PER_PAGE
- Worker URL used for the trial
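Once a CSV is on disk, picking the best trial is a one-liner with pandas. A sketch with synthetic rows (column names follow the format above; real files carry many more `config/` columns):

```python
import pandas as pd

# Synthetic stand-in for a raytune_*_results_*.csv file.
df = pd.DataFrame({
    "config/text_det_thresh": [0.10, 0.45, 0.30],
    "CER": [0.021, 0.035, 0.018],
    "WER": [0.060, 0.090, 0.055],
    "TIME_PER_PAGE": [1.2, 1.1, 1.3],
})

best = df.loc[df["CER"].idxmin()]  # same selection analyze_results() uses
print(f"best CER={best['CER']:.3f} at thresh={best['config/text_det_thresh']:.2f}")
# prints: best CER=0.018 at thresh=0.30
```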
## Network Mode
The raytune container uses `network_mode: host` to access OCR services on localhost ports:
- PaddleOCR: port 8002
- DocTR: port 8003
- EasyOCR: port 8002 (conflicts with PaddleOCR)
## Dependencies
- ray[tune]==2.52.1
- optuna==4.7.0
- requests>=2.28.0
- pandas>=2.0.0

src/raytune/raytune_ocr.py Normal file

@@ -0,0 +1,371 @@
# raytune_ocr.py
# Shared Ray Tune utilities for OCR hyperparameter optimization
#
# Usage:
# from raytune_ocr import check_workers, create_trainable, run_tuner, analyze_results
#
# Environment variables:
# OCR_HOST: Host for OCR services (default: localhost)
import os
import time
from datetime import datetime
from typing import Any, Callable, Dict, List

import pandas as pd
import requests

import ray
from ray import tune
from ray.tune.search.optuna import OptunaSearch


def check_workers(
    ports: List[int],
    service_name: str = "OCR",
    timeout: int = 180,
    interval: int = 5,
) -> List[str]:
    """
    Wait for workers to be fully ready (model + dataset loaded) and return healthy URLs.

    Args:
        ports: List of port numbers to check
        service_name: Name for error messages
        timeout: Max seconds to wait for each worker
        interval: Seconds between retries

    Returns:
        List of healthy worker URLs

    Raises:
        RuntimeError if no healthy workers found after timeout
    """
    host = os.environ.get("OCR_HOST", "localhost")
    worker_urls = [f"http://{host}:{port}" for port in ports]
    healthy_workers = []
    for url in worker_urls:
        print(f"Waiting for {url}...")
        start = time.time()
        while time.time() - start < timeout:
            try:
                health = requests.get(f"{url}/health", timeout=10).json()
                model_ok = health.get('model_loaded', False)
                dataset_ok = health.get('dataset_loaded', False)
                if health.get('status') == 'ok' and model_ok and dataset_ok:
                    gpu = health.get('gpu_name', 'CPU')
                    print(f"{url}: ready ({gpu})")
                    healthy_workers.append(url)
                    break
                elapsed = int(time.time() - start)
                print(f"  [{elapsed}s] model={model_ok} dataset={dataset_ok}")
            except requests.exceptions.RequestException:
                elapsed = int(time.time() - start)
                print(f"  [{elapsed}s] not reachable")
            time.sleep(interval)
        else:
            print(f"{url}: timeout after {timeout}s")
    if not healthy_workers:
        raise RuntimeError(
            f"No healthy {service_name} workers found.\n"
            f"Checked ports: {ports}"
        )
    print(f"\n{len(healthy_workers)}/{len(worker_urls)} workers ready\n")
    return healthy_workers


def create_trainable(ports: List[int], payload_fn: Callable[[Dict], Dict]) -> Callable:
    """
    Factory to create a trainable function for Ray Tune.

    Args:
        ports: List of worker ports for load balancing
        payload_fn: Function that takes config dict and returns API payload dict

    Returns:
        Trainable function for Ray Tune

    Note:
        Ray Tune 2.x API: tune.report(metrics_dict) - pass dict directly, NOT kwargs.
        See: https://docs.ray.io/en/latest/tune/api/doc/ray.tune.report.html
    """
    def trainable(config):
        # Imports stay inside the function because Ray serializes it to workers.
        import os
        import random

        import requests
        from ray.tune import report  # Ray 2.x: report(dict), not report(**kwargs)

        host = os.environ.get("OCR_HOST", "localhost")
        api_url = f"http://{host}:{random.choice(ports)}"
        payload = payload_fn(config)
        try:
            response = requests.post(f"{api_url}/evaluate", json=payload, timeout=None)
            response.raise_for_status()
            metrics = response.json()
            metrics["worker"] = api_url
            report(metrics)  # Ray 2.x API: pass dict directly
        except Exception as e:
            report({  # Ray 2.x API: pass dict directly
                "CER": 1.0,
                "WER": 1.0,
                "TIME": 0.0,
                "PAGES": 0,
                "TIME_PER_PAGE": 0,
                "worker": api_url,
                "ERROR": str(e)[:500],
            })
    return trainable


def run_tuner(
    trainable: Callable,
    search_space: Dict[str, Any],
    num_samples: int = 64,
    num_workers: int = 1,
    metric: str = "CER",
    mode: str = "min",
) -> tune.ResultGrid:
    """
    Initialize Ray and run hyperparameter tuning.

    Args:
        trainable: Trainable function from create_trainable()
        search_space: Dict of parameter names to tune.* search spaces
        num_samples: Number of trials to run
        num_workers: Max concurrent trials
        metric: Metric to optimize
        mode: "min" or "max"

    Returns:
        Ray Tune ResultGrid
    """
    ray.init(
        ignore_reinit_error=True,
        include_dashboard=False,
        configure_logging=False,
        _metrics_export_port=0,  # Disable metrics export to avoid connection warnings
    )
    print(f"Ray Tune ready (version: {ray.__version__})")
    tuner = tune.Tuner(
        trainable,
        tune_config=tune.TuneConfig(
            metric=metric,
            mode=mode,
            search_alg=OptunaSearch(),
            num_samples=num_samples,
            max_concurrent_trials=num_workers,
        ),
        param_space=search_space,
    )
    return tuner.fit()


def analyze_results(
    results: tune.ResultGrid,
    output_folder: str = "results",
    prefix: str = "raytune",
    config_keys: List[str] = None,
) -> pd.DataFrame:
    """
    Analyze and save tuning results.

    Args:
        results: Ray Tune ResultGrid
        output_folder: Directory to save CSV
        prefix: Filename prefix
        config_keys: List of config keys to show in best result (without 'config/' prefix)

    Returns:
        Results DataFrame
    """
    os.makedirs(output_folder, exist_ok=True)
    df = results.get_dataframe()

    # Save to CSV
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{prefix}_results_{timestamp}.csv"
    filepath = os.path.join(output_folder, filename)
    df.to_csv(filepath, index=False)
    print(f"Results saved: {filepath}")

    # Best configuration
    best = df.loc[df["CER"].idxmin()]
    print(f"\nBest CER: {best['CER']:.6f}")
    print(f"Best WER: {best['WER']:.6f}")
    if config_keys:
        print("\nOptimal Configuration:")
        for key in config_keys:
            col = f"config/{key}"
            if col in best:
                val = best[col]
                if isinstance(val, float):
                    print(f"  {key}: {val:.4f}")
                else:
                    print(f"  {key}: {val}")
    return df


def correlation_analysis(df: pd.DataFrame, param_keys: List[str]) -> None:
    """
    Print correlation of numeric parameters with CER/WER.

    Args:
        df: Results DataFrame
        param_keys: List of config keys (without 'config/' prefix)
    """
    param_cols = [f"config/{k}" for k in param_keys if f"config/{k}" in df.columns]
    numeric_cols = [c for c in param_cols if df[c].dtype in ('float64', 'int64')]
    if not numeric_cols:
        print("No numeric parameters for correlation analysis")
        return
    corr_cer = df[numeric_cols + ["CER"]].corr()["CER"].sort_values(ascending=False)
    corr_wer = df[numeric_cols + ["WER"]].corr()["WER"].sort_values(ascending=False)
    print("Correlation with CER:")
    print(corr_cer)
    print("\nCorrelation with WER:")
    print(corr_wer)


# =============================================================================
# OCR-specific payload functions
# =============================================================================
def paddle_ocr_payload(config: Dict) -> Dict:
    """Create payload for PaddleOCR API. Uses pages 5-10 (first doc) for tuning."""
    return {
        "pdf_folder": "/app/dataset",
        "use_doc_orientation_classify": config.get("use_doc_orientation_classify", False),
        "use_doc_unwarping": config.get("use_doc_unwarping", False),
        "textline_orientation": config.get("textline_orientation", True),
        "text_det_thresh": config.get("text_det_thresh", 0.0),
        "text_det_box_thresh": config.get("text_det_box_thresh", 0.0),
        "text_det_unclip_ratio": config.get("text_det_unclip_ratio", 1.5),
        "text_rec_score_thresh": config.get("text_rec_score_thresh", 0.0),
        "start_page": 5,
        "end_page": 10,
        "save_output": False,
    }


def doctr_payload(config: Dict) -> Dict:
    """Create payload for DocTR API. Uses pages 5-10 (first doc) for tuning."""
    return {
        "pdf_folder": "/app/dataset",
        "assume_straight_pages": config.get("assume_straight_pages", True),
        "straighten_pages": config.get("straighten_pages", False),
        "preserve_aspect_ratio": config.get("preserve_aspect_ratio", True),
        "symmetric_pad": config.get("symmetric_pad", True),
        "disable_page_orientation": config.get("disable_page_orientation", False),
        "disable_crop_orientation": config.get("disable_crop_orientation", False),
        "resolve_lines": config.get("resolve_lines", True),
        "resolve_blocks": config.get("resolve_blocks", False),
        "paragraph_break": config.get("paragraph_break", 0.035),
        "start_page": 5,
        "end_page": 10,
        "save_output": False,
    }


def easyocr_payload(config: Dict) -> Dict:
    """Create payload for EasyOCR API. Uses pages 5-10 (first doc) for tuning."""
    return {
        "pdf_folder": "/app/dataset",
        "text_threshold": config.get("text_threshold", 0.7),
        "low_text": config.get("low_text", 0.4),
        "link_threshold": config.get("link_threshold", 0.4),
        "slope_ths": config.get("slope_ths", 0.1),
        "ycenter_ths": config.get("ycenter_ths", 0.5),
        "height_ths": config.get("height_ths", 0.5),
        "width_ths": config.get("width_ths", 0.5),
        "add_margin": config.get("add_margin", 0.1),
        "contrast_ths": config.get("contrast_ths", 0.1),
        "adjust_contrast": config.get("adjust_contrast", 0.5),
        "decoder": config.get("decoder", "greedy"),
        "beamWidth": config.get("beamWidth", 5),
        "min_size": config.get("min_size", 10),
        "start_page": 5,
        "end_page": 10,
        "save_output": False,
    }


# =============================================================================
# Search spaces
# =============================================================================
PADDLE_OCR_SEARCH_SPACE = {
    "use_doc_orientation_classify": tune.choice([True, False]),
    "use_doc_unwarping": tune.choice([True, False]),
    "textline_orientation": tune.choice([True, False]),
    "text_det_thresh": tune.uniform(0.0, 0.7),
    "text_det_box_thresh": tune.uniform(0.0, 0.7),
    "text_det_unclip_ratio": tune.choice([0.0]),
    "text_rec_score_thresh": tune.uniform(0.0, 0.7),
}

DOCTR_SEARCH_SPACE = {
    "assume_straight_pages": tune.choice([True, False]),
    "straighten_pages": tune.choice([True, False]),
    "preserve_aspect_ratio": tune.choice([True, False]),
    "symmetric_pad": tune.choice([True, False]),
    "disable_page_orientation": tune.choice([True, False]),
    "disable_crop_orientation": tune.choice([True, False]),
    "resolve_lines": tune.choice([True, False]),
    "resolve_blocks": tune.choice([True, False]),
    "paragraph_break": tune.uniform(0.01, 0.1),
}

EASYOCR_SEARCH_SPACE = {
    "text_threshold": tune.uniform(0.3, 0.9),
    "low_text": tune.uniform(0.2, 0.6),
    "link_threshold": tune.uniform(0.2, 0.6),
    "slope_ths": tune.uniform(0.0, 0.3),
    "ycenter_ths": tune.uniform(0.3, 1.0),
    "height_ths": tune.uniform(0.3, 1.0),
    "width_ths": tune.uniform(0.3, 1.0),
    "add_margin": tune.uniform(0.0, 0.3),
    "contrast_ths": tune.uniform(0.05, 0.3),
    "adjust_contrast": tune.uniform(0.3, 0.8),
    "decoder": tune.choice(["greedy", "beamsearch"]),
    "beamWidth": tune.choice([3, 5, 7, 10]),
    "min_size": tune.choice([5, 10, 15, 20]),
}

# =============================================================================
# Config keys for results display
# =============================================================================
PADDLE_OCR_CONFIG_KEYS = [
    "use_doc_orientation_classify", "use_doc_unwarping", "textline_orientation",
    "text_det_thresh", "text_det_box_thresh", "text_det_unclip_ratio", "text_rec_score_thresh",
]

DOCTR_CONFIG_KEYS = [
    "assume_straight_pages", "straighten_pages", "preserve_aspect_ratio", "symmetric_pad",
    "disable_page_orientation", "disable_crop_orientation", "resolve_lines", "resolve_blocks",
    "paragraph_break",
]

EASYOCR_CONFIG_KEYS = [
    "text_threshold", "low_text", "link_threshold", "slope_ths", "ycenter_ths",
    "height_ths", "width_ths", "add_margin", "contrast_ths", "adjust_contrast",
    "decoder", "beamWidth", "min_size",
]

src/raytune/requirements.txt Normal file

@@ -0,0 +1,4 @@
ray[tune]==2.52.1
optuna==4.7.0
requests>=2.28.0
pandas>=2.0.0

src/raytune/run_tuning.py Normal file

@@ -0,0 +1,80 @@
#!/usr/bin/env python3
"""Run hyperparameter tuning for OCR services."""
import argparse
import os

from raytune_ocr import (
    check_workers, create_trainable, run_tuner, analyze_results,
    paddle_ocr_payload, doctr_payload, easyocr_payload,
    PADDLE_OCR_SEARCH_SPACE, DOCTR_SEARCH_SPACE, EASYOCR_SEARCH_SPACE,
    PADDLE_OCR_CONFIG_KEYS, DOCTR_CONFIG_KEYS, EASYOCR_CONFIG_KEYS,
)

SERVICES = {
    "paddle": {
        "payload_fn": paddle_ocr_payload,
        "search_space": PADDLE_OCR_SEARCH_SPACE,
        "config_keys": PADDLE_OCR_CONFIG_KEYS,
        "name": "PaddleOCR",
    },
    "doctr": {
        "payload_fn": doctr_payload,
        "search_space": DOCTR_SEARCH_SPACE,
        "config_keys": DOCTR_CONFIG_KEYS,
        "name": "DocTR",
    },
    "easyocr": {
        "payload_fn": easyocr_payload,
        "search_space": EASYOCR_SEARCH_SPACE,
        "config_keys": EASYOCR_CONFIG_KEYS,
        "name": "EasyOCR",
    },
}


def main():
    parser = argparse.ArgumentParser(description="Run OCR hyperparameter tuning")
    parser.add_argument("--service", choices=["paddle", "doctr", "easyocr"], required=True)
    parser.add_argument("--host", type=str, default="localhost", help="OCR service host")
    parser.add_argument("--port", type=int, default=8000, help="OCR service port")
    parser.add_argument("--samples", type=int, default=64, help="Number of samples")
    args = parser.parse_args()

    # Set environment variable for raytune_ocr module
    os.environ["OCR_HOST"] = args.host

    cfg = SERVICES[args.service]
    ports = [args.port]
    print(f"\n{'='*50}")
    print(f"Hyperparameter Tuning: {cfg['name']}")
    print(f"Host: {args.host}:{args.port}")
    print(f"Samples: {args.samples}")
    print(f"{'='*50}\n")

    # Check workers
    healthy = check_workers(ports, cfg["name"])

    # Create trainable and run tuning
    trainable = create_trainable(ports, cfg["payload_fn"])
    results = run_tuner(
        trainable=trainable,
        search_space=cfg["search_space"],
        num_samples=args.samples,
        num_workers=len(healthy),
    )

    # Analyze results
    analyze_results(
        results,
        output_folder="results",
        prefix=f"raytune_{args.service}",
        config_keys=cfg["config_keys"],
    )

    print(f"\n{'='*50}")
    print("Tuning complete!")
    print(f"{'='*50}")


if __name__ == "__main__":
    main()

File diff suppressed because it is too large