regen docs
Some checks failed
build_docker / build_cpu (pull_request) Has been cancelled
build_docker / build_gpu (pull_request) Has been cancelled
build_docker / build_easyocr (pull_request) Has been cancelled
build_docker / build_easyocr_gpu (pull_request) Has been cancelled
build_docker / build_raytune (pull_request) Has been cancelled
build_docker / build_doctr (pull_request) Has been cancelled
build_docker / essential (pull_request) Has been cancelled
build_docker / build_doctr_gpu (pull_request) Has been cancelled
@@ -104,16 +104,7 @@ flowchart LR
#### ImageTextDataset Class

A Python class was implemented to load image-text pairs:

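```python
from pathlib import Path
from PIL import Image

class ImageTextDataset:
    def __init__(self, root):
        # Load (image, text) pairs from paired folders
        # (assumed layout: root/images/<name>.png and root/texts/<name>.txt)
        self.root = Path(root)
        self.images = sorted((self.root / "images").glob("*.png"))

    def __getitem__(self, idx):
        # Return (PIL.Image, str)
        image_path = self.images[idx]
        text = (self.root / "texts" / f"{image_path.stem}.txt").read_text(encoding="utf-8")
        return Image.open(image_path).convert("RGB"), text
```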
A Python class was implemented to load image-text pairs; it returns (PIL.Image, str) tuples from paired folders. The full implementation is available in `src/ocr_benchmark_notebook.ipynb` (see Appendix A).

### Phase 2: Comparative Benchmark

@@ -131,17 +122,7 @@ class ImageTextDataset:

#### Evaluation Metrics

The `jiwer` library was used to compute:

```python
from jiwer import wer, cer


def evaluate_text(reference, prediction):
    # Compare the ground-truth text with the OCR output (lower is better for both)
    return {
        'WER': wer(reference, prediction),
        'CER': cer(reference, prediction),
    }
```
The `jiwer` library was used to compute CER (Character Error Rate) and WER (Word Error Rate) by comparing the reference text with the OCR model's prediction. The implementation is available in `src/ocr_benchmark_notebook.ipynb` (see Appendix A).

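To make the two metrics concrete, the dependency-free sketch below computes what they measure: the Levenshtein edit distance between reference and prediction (substitutions S + deletions D + insertions I), divided by the reference length N, i.e. (S + D + I) / N, counted in characters for CER and in whitespace-separated words for WER. It is an illustration, not jiwer's code: jiwer applies its own default text transformations, so results may differ in edge cases such as extra whitespace. The helper names `levenshtein`, `char_error_rate` and `word_error_rate` are hypothetical, and a non-empty reference is assumed.

```python
def levenshtein(ref, hyp):
    """Edit distance: minimum substitutions + deletions + insertions turning ref into hyp.

    Works on strings (characters) and on lists (e.g. words).
    """
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: r is missing from hyp
                curr[j - 1] + 1,         # insertion: h is extra in hyp
                prev[j - 1] + (r != h),  # substitution, free when r == h
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference, prediction):
    """CER over characters (assumes a non-empty reference)."""
    return levenshtein(reference, prediction) / len(reference)


def word_error_rate(reference, prediction):
    """WER over whitespace-separated words (assumes a non-empty reference)."""
    ref_words = reference.split()
    return levenshtein(ref_words, prediction.split()) / len(ref_words)


print(char_error_rate("kitten", "sitting"))  # 0.5 (3 edits / 6 reference characters)
print(word_error_rate("the cat sat", "the cat sit"))  # 1 substituted word out of 3
```

Because both rates are normalized by the reference length, they can exceed 1.0 when the prediction contains many insertions.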
### Phase 3: Search Space

@@ -163,66 +144,45 @@ def evaluate_text(reference, prediction):

#### Ray Tune Configuration

```python
from ray import tune
from ray.tune.search.optuna import OptunaSearch

search_space = {
    "use_doc_orientation_classify": tune.choice([True, False]),
    "use_doc_unwarping": tune.choice([True, False]),
    "textline_orientation": tune.choice([True, False]),
    "text_det_thresh": tune.uniform(0.0, 0.7),
    "text_det_box_thresh": tune.uniform(0.0, 0.7),
    "text_det_unclip_ratio": tune.choice([0.0]),
    "text_rec_score_thresh": tune.uniform(0.0, 0.7),
}

# trainable_paddle_ocr (the per-trial evaluation function) is not shown here;
# see src/raytune/raytune_ocr.py. It must report "CER" for every trial.
tuner = tune.Tuner(
    trainable_paddle_ocr,
    param_space=search_space,  # without this, the search space is never used
    tune_config=tune.TuneConfig(
        metric="CER",
        mode="min",
        search_alg=OptunaSearch(),
        num_samples=64,
        max_concurrent_trials=2,
    ),
)
results = tuner.fit()
```
The search space was defined with `tune.choice()` for the boolean parameters and `tune.uniform()` for the continuous ones (`text_det_unclip_ratio` is fixed at 0.0 through a single-value `tune.choice`), with OptunaSearch as the optimization algorithm, configured to minimize CER over 64 trials. The full implementation is available in `src/raytune/raytune_ocr.py` (see Appendix A).

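To show what a single trial configuration drawn from this space looks like, the sketch below mirrors it with the standard library: a `("choice", values)` entry picks one listed value, as `tune.choice` does, and a `("uniform", (low, high))` entry draws a float in the range, as `tune.uniform` does. It only illustrates the domains under plain random sampling; OptunaSearch does not sample this way, since its default TPE sampler proposes values based on earlier trial results. `SEARCH_SPACE` and `sample_config` are hypothetical names.

```python
import random

# Hypothetical stdlib mirror of the Ray Tune search space above:
# ("choice", values) ~ tune.choice(values), ("uniform", (low, high)) ~ tune.uniform(low, high)
SEARCH_SPACE = {
    "use_doc_orientation_classify": ("choice", [True, False]),
    "use_doc_unwarping": ("choice", [True, False]),
    "textline_orientation": ("choice", [True, False]),
    "text_det_thresh": ("uniform", (0.0, 0.7)),
    "text_det_box_thresh": ("uniform", (0.0, 0.7)),
    "text_det_unclip_ratio": ("choice", [0.0]),  # a single option: effectively fixed
    "text_rec_score_thresh": ("uniform", (0.0, 0.7)),
}


def sample_config(space, rng):
    """Draw one trial configuration by independent random sampling."""
    config = {}
    for name, (kind, arg) in space.items():
        if kind == "choice":
            config[name] = rng.choice(arg)
        elif kind == "uniform":
            low, high = arg
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unknown domain kind: {kind}")
    return config


print(sample_config(SEARCH_SPACE, random.Random(42)))
```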
### Phase 4: Optimization Run

#### Execution Architecture

Due to incompatibilities between Ray and PaddleOCR when running in the same process, a subprocess-based architecture was implemented:

A Docker container-based architecture was implemented to isolate the OCR services and facilitate reproducibility:

```mermaid
---
title: "Execution architecture with subprocesses"
title: "Execution architecture with Docker Compose"
---
flowchart LR
    A["Ray Tune (main process)"]
    subgraph Docker["Docker Compose"]
        A["RayTune Container"]
        B["OCR Service Container"]
    end

    A --> B["Subprocess 1: paddle_ocr_tuning.py --config"]
    B --> B_out["Returns JSON with metrics"]

    A --> C["Subprocess 2: paddle_ocr_tuning.py --config"]
    C --> C_out["Returns JSON with metrics"]
    A -->|"HTTP POST /evaluate"| B
    B -->|"JSON {CER, WER, TIME}"| A
    A -.->|"Health check /health"| B
```
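The HTTP variant of the diagram can be sketched with the Python standard library. The client side makes two calls: a health check against `/health` and a `POST /evaluate` whose JSON response carries `CER`, `WER` and `TIME`. The request body (a JSON object holding the trial's hyperparameters), the `FakeOCRService` stand-in and all helper names are assumptions for illustration; the real service's payload format is not shown here, and the fake returns placeholder numbers, not measured results.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeOCRService(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the OCR service container (local testing only)."""

    def do_GET(self):
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/evaluate":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        config = json.loads(self.rfile.read(length) or b"{}")  # trial hyperparameters
        # A real service would run OCR with `config` and score it against the
        # references; these are placeholder values, not measured results.
        self._send_json(200, {"CER": 0.05, "WER": 0.12, "TIME": 1.5})

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_fake_service():
    """Serve FakeOCRService on a free local port; returns (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), FakeOCRService)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


def is_healthy(base_url, timeout=5.0):
    """Health check: True if GET /health answers with status 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or an HTTP error status
        return False


def evaluate(base_url, config, timeout=600.0):
    """POST a trial's hyperparameters to /evaluate and return the metrics dict."""
    request = urllib.request.Request(
        f"{base_url}/evaluate",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())


server, base_url = start_fake_service()
print(is_healthy(base_url))
print(evaluate(base_url, {"text_det_thresh": 0.3}))
server.shutdown()
```

Checking `/health` before the first trial lets the tuner fail fast when the container is not up yet, instead of losing a trial to a connection error.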

#### Evaluation Script (paddle_ocr_tuning.py)

#### Running with Docker Compose

The script receives the hyperparameters as command-line arguments:

The services are orchestrated with Docker Compose (`src/docker-compose.tuning.*.yml`):

```bash
python paddle_ocr_tuning.py \
    --pdf-folder ./dataset \
    --textline-orientation True \
    --text-det-box-thresh 0.5 \
    --text-det-thresh 0.4 \
    --text-rec-score-thresh 0.6
# Start the OCR service
docker compose -f docker-compose.tuning.doctr.yml up -d doctr-gpu

# Run the optimization (64 trials)
docker compose -f docker-compose.tuning.doctr.yml run raytune --service doctr --samples 64

# Stop the services
docker compose -f docker-compose.tuning.doctr.yml down
```
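In the subprocess variant, each Ray Tune trial has to turn its sampled configuration into command-line flags like those in the `python paddle_ocr_tuning.py` invocation above. A minimal sketch, assuming every search-space key maps to a flag of the same name in kebab case (only `--pdf-folder`, `--textline-orientation`, `--text-det-box-thresh`, `--text-det-thresh` and `--text-rec-score-thresh` are confirmed by the example) and that the script prints its JSON metrics as the last line of standard output; `config_to_args`, `build_command` and `run_trial` are hypothetical helpers.

```python
import json
import subprocess
import sys


def config_to_args(config):
    """Turn {"text_det_thresh": 0.4} into ["--text-det-thresh", "0.4"]."""
    args = []
    for key, value in config.items():
        # Booleans become "True"/"False", matching the example invocation
        args += [f"--{key.replace('_', '-')}", str(value)]
    return args


def build_command(config, pdf_folder="./dataset", script="paddle_ocr_tuning.py"):
    """Full argv for one trial, run with the current Python interpreter."""
    return [sys.executable, script, "--pdf-folder", pdf_folder, *config_to_args(config)]


def run_trial(config, **kwargs):
    """Run the script in a child process and parse its JSON metrics.

    Assumes the metrics are printed as the last line of stdout.
    """
    result = subprocess.run(
        build_command(config, **kwargs), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout.strip().splitlines()[-1])


print(build_command({"textline_orientation": True, "text_det_thresh": 0.4}))
```

Running one child process per trial keeps PaddleOCR out of the Ray worker's own process, which is the isolation the subprocess design is meant to provide.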

The script returns the metrics in JSON format:

The OCR service exposes a REST API that returns the metrics in JSON format:

```json
{