Leyenda
Some checks failed
build_docker / essential (push) Successful in 1s
build_docker / build_paddle_ocr (push) Successful in 4m0s
build_docker / build_paddle_ocr_gpu (push) Successful in 18m53s
build_docker / build_easyocr (push) Successful in 16m12s
build_docker / build_easyocr_gpu (push) Successful in 22m37s
build_docker / build_doctr (push) Successful in 21m22s
build_docker / build_raytune (push) Successful in 2m50s
build_docker / build_doctr_gpu (push) Has been cancelled
@@ -6,7 +6,7 @@ Se realizó un estudio comparativo de tres soluciones OCR de código abierto: Ea

Los resultados demuestran que la optimización de hiperparámetros logró mejoras significativas: el mejor trial individual alcanzó un CER de 0.79% (precisión del 99.21%), cumpliendo el objetivo de CER < 2%. Al validar la configuración optimizada sobre el dataset completo de 45 páginas, se obtuvo una mejora del 12.8% en CER (de 8.85% a 7.72%). El hallazgo más relevante fue que el parámetro `textline_orientation` (clasificación de orientación de línea de texto) tiene un impacto crítico en el rendimiento. Adicionalmente, se identificó que el umbral de detección (`text_det_thresh`) presenta una correlación positiva moderada (0.43) con el error, lo que indica que valores más bajos tienden a mejorar el rendimiento.

-**Fuente:** [`docs/metrics/metrics_paddle.md`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics_paddle.md), [`src/results/correlations/paddle_correlations.csv`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/src/results/correlations/paddle_correlations.csv).
+**Fuente:** [`metrics_paddle.md`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics_paddle.md), [`paddle_correlations.csv`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/src/results/correlations/paddle_correlations.csv).

Este trabajo demuestra que la optimización de hiperparámetros es una alternativa viable al fine-tuning, especialmente útil cuando se dispone de modelos preentrenados para el idioma objetivo. La infraestructura dockerizada desarrollada permite reproducir los experimentos y facilita la evaluación sistemática de configuraciones OCR.

@@ -22,7 +22,7 @@ A comparative study of three open-source OCR solutions was conducted with EasyOC

Results demonstrate that hyperparameter optimization achieved significant improvements. The best individual trial reached a CER of 0.79% (99.21% accuracy), meeting the CER < 2% objective. When validating the optimized configuration on the full 45-page dataset, a 12.8% CER improvement was obtained (from 8.85% to 7.72%). The most relevant finding was that the `textline_orientation` parameter (text line orientation classification) has a critical impact on performance. Additionally, the detection threshold (`text_det_thresh`) showed a moderate positive correlation (0.43) with error, indicating that lower values tend to improve performance.

-Sources: [`docs/metrics/metrics_paddle.md`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics_paddle.md), [`src/results/correlations/paddle_correlations.csv`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/src/results/correlations/paddle_correlations.csv).
+Sources: [`metrics_paddle.md`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics_paddle.md), [`paddle_correlations.csv`](https://seryus.ddns.net/unir/MastersThesis/src/branch/main/src/results/correlations/paddle_correlations.csv).

This work demonstrates that hyperparameter optimization is a viable alternative to fine-tuning, especially useful when pre-trained models for the target language are available. The dockerized infrastructure developed enables experiment reproducibility and facilitates systematic evaluation of OCR configurations.

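The abstract reports a 12.8% CER improvement (from 8.85% to 7.72%), which reads as a relative reduction rather than a difference in percentage points. The sketch below reproduces that arithmetic and shows the usual definition of CER as character-level edit distance divided by reference length. The function names (`levenshtein`, `cer`, `relative_improvement`) are illustrative, and the thesis code may compute CER differently (for example, with text normalization that is not shown in this diff).

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


def relative_improvement(baseline: float, optimized: float) -> float:
    """Fraction by which the error dropped relative to the baseline."""
    return (baseline - optimized) / baseline


if __name__ == "__main__":
    # Figures taken from the abstract: 8.85% baseline, 7.72% optimized.
    print(f"{relative_improvement(8.85, 7.72):.1%}")  # 12.8%
```

In absolute terms the same change is 1.13 percentage points, so stating which of the two is meant avoids ambiguity.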
157
docs/compliance.md
Normal file
@@ -0,0 +1,157 @@
# UNIR Style Compliance Checklist

This document lists the UNIR TFE style requirements to verify before final submission.

## Page Layout

| Requirement | Specification | Check |
|-------------|---------------|-------|
| Page size | A4 | ☐ |
| Left margin | 3.0 cm | ☐ |
| Right margin | 2.0 cm | ☐ |
| Top margin | 2.5 cm | ☐ |
| Bottom margin | 2.5 cm | ☐ |
| Header | Student name + TFE title | ☐ |
| Footer | Page number | ☐ |

## Typography

| Element | Specification | Check |
|---------|---------------|-------|
| Body text | Calibri 12pt, justified, 1.5 line spacing | ☐ |
| Título 1 (H1) | Calibri Light 18pt, blue, numbered (1., 2., ...) | ☐ |
| Título 2 (H2) | Calibri Light 14pt, blue, numbered (1.1, 1.2, ...) | ☐ |
| Título 3 (H3) | Calibri Light 12pt, numbered (1.1.1, 1.1.2, ...) | ☐ |
| Título 4 (H4) | Calibri 12pt, bold, unnumbered | ☐ |
| Footnotes | Calibri 10pt, justified, single spacing | ☐ |
| Code blocks | Consolas 10pt | ☐ |

## Document Structure

| Section | Requirements | Check |
|---------|--------------|-------|
| Portada | Title, Author, Type, Director, Date | ☐ |
| Resumen | 150-300 words in Spanish + Palabras clave (3-5) | ☐ |
| Abstract | 150-300 words in English + Keywords (3-5) | ☐ |
| Índice de contenidos | Auto-generated, new page | ☐ |
| Índice de figuras | Auto-generated, new page | ☐ |
| Índice de tablas | Auto-generated, new page | ☐ |
| Cap. 1 Introducción | 1.1 Motivación, 1.2 Planteamiento, 1.3 Estructura | ☐ |
| Cap. 2 Contexto | 2.1 Contexto, 2.2 Estado del arte, 2.3 Conclusiones | ☐ |
| Cap. 3 Objetivos | 3.1 Objetivo general, 3.2 Específicos, 3.3 Metodología | ☐ |
| Cap. 4 Desarrollo | Structure depends on work type | ☐ |
| Cap. 5 Conclusiones | 5.1 Conclusiones, 5.2 Trabajo futuro | ☐ |
| Referencias | APA format, alphabetical order | ☐ |
| Anexos | Code repository URL, supplementary data | ☐ |

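The Resumen and Abstract limits (150-300 words, 3-5 keywords) are easy to check mechanically. The following is a minimal sketch only: `check_abstract` is a hypothetical helper that is not part of the repository's scripts, and it assumes words are whitespace-separated tokens, which may differ slightly from Word's own count.

```python
import re


def check_abstract(text: str, keywords: list[str]) -> list[str]:
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    # Assumption: a "word" is any run of non-whitespace characters.
    n_words = len(re.findall(r"\S+", text))
    if not 150 <= n_words <= 300:
        problems.append(f"word count {n_words} is outside 150-300")
    if not 3 <= len(keywords) <= 5:
        problems.append(f"keyword count {len(keywords)} is outside 3-5")
    return problems


if __name__ == "__main__":
    sample = " ".join(["palabra"] * 200)
    print(check_abstract(sample, ["OCR", "CER", "Ray Tune"]))  # []
```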
## Tables

| Requirement | Specification | Check |
|-------------|---------------|-------|
| Title position | Above the table | ☐ |
| Title format | **Tabla N.** *Descriptive title in italics.* | ☐ |
| Numbering | Sequential (1, 2, 3...), Anexo uses A1, A2... | ☐ |
| Border style | APA: horizontal lines only (top, header bottom, table bottom) | ☐ |
| Source position | Below the table, centered | ☐ |
| Source format | Fuente: Author, Year. or Fuente: Elaboración propia. | ☐ |
| Leyenda (if needed) | Below Fuente, same style (Piedefoto-tabla) | ☐ |
| In TOT index | All tables appear in Índice de tablas | ☐ |

## Figures

| Requirement | Specification | Check |
|-------------|---------------|-------|
| Title position | Above the figure | ☐ |
| Title format | **Figura N.** *Descriptive title in italics.* | ☐ |
| Numbering | Sequential (1, 2, 3...), Anexo uses A1, A2... | ☐ |
| Alignment | Centered | ☐ |
| Source position | Below the figure, centered | ☐ |
| Source format | Fuente: Author, Year. or Fuente: Elaboración propia. | ☐ |
| Leyenda (if needed) | Below Fuente, same style (Piedefoto-tabla) | ☐ |
| In TOF index | All figures appear in Índice de figuras | ☐ |

## Lists

| Requirement | Specification | Check |
|-------------|---------------|-------|
| Bullet lists | Indented 36pt, bullet symbol (·) | ☐ |
| Numbered lists | Indented 36pt, sequential numbers (1, 2, 3...) | ☐ |
| Spacing | Proper First/Middle/Last paragraph spacing | ☐ |

## Citations and References

| Requirement | Specification | Check |
|-------------|---------------|-------|
| Citation format | APA 7th edition | ☐ |
| Single author | (Author, Year) or Author (Year) | ☐ |
| Two authors | (Author1 & Author2, Year) | ☐ |
| Three+ authors | (Author1 et al., Year) | ☐ |
| Reference list | Alphabetical by first author surname | ☐ |
| Hanging indent | 36pt left margin, -36pt text indent | ☐ |
| DOI/URL | Include when available | ☐ |
| No Wikipedia | Wikipedia citations not allowed | ☐ |
| Source variety | Books, journals, conferences (not just URLs) | ☐ |

## SMART Objectives

All objectives must be SMART:

| Criterion | Requirement | Check |
|-----------|-------------|-------|
| **S**pecific | Clearly defined, unambiguous | ☐ |
| **M**easurable | Quantifiable success metric (e.g., CER < 2%) | ☐ |
| **A**ttainable | Feasible with available resources | ☐ |
| **R**elevant | Demonstrable impact | ☐ |
| **T**ime-bound | Achievable within timeframe | ☐ |

## Writing Style

| Requirement | Check |
|-------------|-------|
| Each chapter starts with an introductory paragraph | ☐ |
| Each paragraph has at least 3 sentences | ☐ |
| No two consecutive headings without text between them | ☐ |
| No superfluous phrases or repetition | ☐ |
| All concepts defined with pertinent citations | ☐ |
| Spelling checked (Word spell checker) | ☐ |
| Logical flow between paragraphs | ☐ |

## Final Checks

| Requirement | Check |
|-------------|-------|
| All cited references appear in reference list | ☐ |
| All references in list are cited in text | ☐ |
| All figures/tables have numbers and titles | ☐ |
| Update all indices (Ctrl+A, F9 in Word) | ☐ |
| Page count: 50-90 pages (excl. cover, indices, annexes) | ☐ |
| Final format: PDF for deposit | ☐ |

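The first two checks in this table are a two-way comparison between the keys cited in the text and the keys in the reference list. As a rough sketch, assuming both sets have already been extracted by some other means, the comparison could look like the following. `cross_check` and the sample keys are hypothetical and are not taken from the repository.

```python
def cross_check(cited: set[str], listed: set[str]) -> tuple[set[str], set[str]]:
    """Compare in-text citation keys against reference-list keys.

    Returns (missing, uncited):
      missing - cited in the text but absent from the reference list
      uncited - present in the reference list but never cited
    """
    return cited - listed, listed - cited


if __name__ == "__main__":
    # Illustrative keys only; real keys would come from the thesis text.
    cited = {"Autor2020", "Otro2019"}
    listed = {"Autor2020", "Tercero2021"}
    missing, uncited = cross_check(cited, listed)
    print(sorted(missing))  # ['Otro2019']
    print(sorted(uncited))  # ['Tercero2021']
```

Both returned sets must be empty for the two checklist rows to pass.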
## Automated Checks (apply_content.py)

The following are automatically handled by the generation scripts:

- ✓ Table/Figure sequential numbering
- ✓ Anexo items use A1, A2... prefix
- ✓ TC fields for Anexo items (appear in indices)
- ✓ Piedefoto-tabla style for Fuente/Leyenda
- ✓ MsoCaption style for titles
- ✓ APA table borders (horizontal only)
- ✓ MsoBibliography style for references
- ✓ MsoQuote style for blockquotes
- ✓ List paragraph classes (First/Middle/Last)
- ✓ Bold H4 headings (unnumbered)

## Color Palette (UNIR Theme)

| Color | Hex | Usage |
|-------|-----|-------|
| Primary Blue | `#0098CD` | Headings, diagram borders |
| Light Blue BG | `#E6F4F9` | Diagram backgrounds |
| Dark Gray | `#404040` | Body text |
| Accent Blue | `#5B9BD5` | Table headers |
| Light Accent | `#9CC2E5` | Table borders |

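Some tools expect RGB triples rather than hex strings. A small sketch of the conversion for the palette above follows. The `PALETTE` dictionary and `hex_to_rgb` helper are illustrative names and are not claimed to exist in `apply_content.py`.

```python
# Hex values copied from the palette table; the dictionary name is hypothetical.
PALETTE = {
    "Primary Blue": "#0098CD",
    "Light Blue BG": "#E6F4F9",
    "Dark Gray": "#404040",
    "Accent Blue": "#5B9BD5",
    "Light Accent": "#9CC2E5",
}


def hex_to_rgb(value: str) -> tuple[int, int, int]:
    """Convert a '#RRGGBB' string into an (R, G, B) tuple of 0-255 integers."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


if __name__ == "__main__":
    for name, code in PALETTE.items():
        print(name, hex_to_rgb(code))
```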
---

**Reference:** UNIR TFE Guidelines (`instructions/instrucciones.pdf`, `instructions/plantilla_individual.pdf`)