Review and validate the documentation for this Master's Thesis project.
Instructions
- Read metrics source files first to get the correct values:
  - `docs/metrics/metrics_paddle.md` - PaddleOCR results
  - `docs/metrics/metrics_doctr.md` - DocTR results
  - `docs/metrics/metrics_easyocr.md` - EasyOCR results
  - `docs/metrics/metrics.md` - Comparative summary
  - `src/results/*.csv` - Raw data from 64 trials per service (5-page tuning subset)
  - `src/*/requirements.txt` - Dependency versions used for the experiments
- Review UNIR guidelines for formatting and structure rules:
  - `instructions/plantilla_individual.htm` - PRIMARY REFERENCE for all styling (CSS classes, Word styles)
  - `instructions/plantilla_individual_files/` - Support files with additional style definitions
  - `instructions/instrucciones.pdf` - TFE writing instructions
  - `instructions/plantilla_individual.pdf` - Official template preview

IMPORTANT: When styling elements (tables, figures, notes, quotes), ALWAYS check `plantilla_individual.htm` for existing Word/CSS classes (e.g., `MsoQuote`, `MsoCaption`, `Piedefoto-tabla`). Use these classes instead of custom inline styles.
UNIR Color Palette (from plantilla_individual.htm)
| Color | Hex | Usage |
|---|---|---|
| Primary Blue | `#0098CD` | Headings, titles, diagram borders |
| Light Blue BG | `#E6F4F9` | Backgrounds, callout boxes, nodes |
| Dark Gray | `#404040` | Primary text |
| Accent Blue | `#5B9BD5` | Table headers, accent elements |
| Light Accent | `#9CC2E5` | Table borders |
| Very Light Blue | `#DEEAF6` | Secondary backgrounds, subgraphs |
| White | `#FFFFFF` | Header text, contrast |
Table Styles (from template)
- `MsoTableGrid` - Basic grid table
- `MsoTable15Grid4Accent1` - Styled table with UNIR colors (header: `#5B9BD5`, borders: `#9CC2E5`)
- `Piedefoto-tabla` - Table caption/source style
- Validate each documentation file checking:
Data Accuracy
- All CER/WER values must match those in `docs/metrics/*.md`
- Verify: baseline, optimized, best trial, percentage improvement
- Verify: GPU vs CPU acceleration factor
- Verify: dataset size (pages)
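
When re-deriving the percentage improvement and the GPU vs CPU acceleration factor from the source values, a small helper makes the arithmetic explicit. The sketch below assumes that improvement means the relative reduction of the error metric and that the acceleration factor is CPU time divided by GPU time. Both definitions are assumptions and should be confirmed against `docs/metrics/*.md`; the function names are hypothetical.

```python
def relative_improvement(baseline, optimized):
    """Percentage reduction of an error metric (CER or WER) relative to the baseline.

    Assumed definition: (baseline - optimized) / baseline * 100.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100


def acceleration_factor(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run is than the CPU run.

    Assumed definition: cpu_seconds / gpu_seconds.
    """
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds
```

Feed these helpers only with values read from the metrics files or CSVs, then compare the result with the figure stated in the documentation.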
UNIR Formatting
- Tables: `**Tabla N.** *Descriptive title in italics.*` followed by the table, then a line that starts with `Fuente:` immediately after the table (no blank lines), e.g., `Fuente: ...`
  - Table titles must describe the content (e.g., "Comparación de modelos OCR")
- Figures: `**Figura N.** *Descriptive title in italics.*`
  - Figure titles must describe the content (e.g., "Pipeline de un sistema OCR moderno")
- Sequential numbering (no duplicates, no gaps)
- APA citation format for references
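
The sequential-numbering rule can be checked mechanically. The following is a minimal sketch, assuming captions begin a line in the exact form `**Tabla N.**` or `**Figura N.**`, that numbering starts at 1, and that the text passed in is the concatenation of all chapters in reading order. The function name is hypothetical.

```python
import re

# Matches caption lines such as "**Tabla 3.** *Título.*" or "**Figura 12.** *Título.*"
CAPTION_RE = re.compile(r"^\*\*(Tabla|Figura) (\d+)\.\*\*")


def numbering_issues(markdown_text):
    """Return messages for duplicated, skipped, or out-of-order caption numbers."""
    seen = {"Tabla": [], "Figura": []}
    for line in markdown_text.splitlines():
        match = CAPTION_RE.match(line.strip())
        if match:
            seen[match.group(1)].append(int(match.group(2)))

    issues = []
    for kind, numbers in seen.items():
        # With no duplicates and no gaps, the numbers read exactly 1, 2, ..., n.
        expected = list(range(1, len(numbers) + 1))
        if numbers != expected:
            issues.append(f"{kind}: found {numbers}, expected {expected}")
    return issues
```

An empty list means the numbering is sequential; any message identifies which series (tables or figures) needs renumbering.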
Word Generation Alignment
- Table sources are only captured when the line immediately after the table starts with `Fuente:` (per `apply_content.py`).
- Mermaid figures use the YAML `title:` for captions in Word output; `**Figura N.**` lines are ignored by the generator but should remain for UNIR compliance.
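
The `Fuente:` adjacency rule is easy to break with a stray blank line, so a quick scan can help. This sketch is not taken from `apply_content.py`. It assumes that table rows are lines starting with `|`, that no pipe-prefixed lines appear inside code fences, and that the source line must start with the literal text `Fuente:`. The function name is hypothetical.

```python
def tables_missing_source(markdown_text):
    """Return 1-based line numbers of final table rows that are not
    immediately followed by a line starting with 'Fuente:'."""
    lines = markdown_text.splitlines()
    missing = []
    for index, line in enumerate(lines):
        is_row = line.lstrip().startswith("|")
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        next_is_row = next_line.lstrip().startswith("|")
        if is_row and not next_is_row:
            # 'line' is the last row of a table; the source must come next.
            if not next_line.startswith("Fuente:"):
                missing.append(index + 1)
    return missing
```

Each reported line number points at the last row of a table whose source line is absent or separated by a blank line.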
Mermaid Diagrams
- All diagrams must be in Mermaid format (no external images for flowcharts/charts)
- All Mermaid diagrams must use the UNIR color theme
- Required YAML frontmatter config (Mermaid v11+):

  ```mermaid
  ---
  title: "Diagram Title"
  config:
    theme: base
    themeVariables:
      primaryColor: "#E6F4F9"
      primaryTextColor: "#404040"
      primaryBorderColor: "#0098CD"
      lineColor: "#0098CD"
  ---
  flowchart LR
      A[Node] --> B[Node]
  ```

- Colors: `#0098CD` (UNIR blue for borders/lines), `#E6F4F9` (light blue background)
- All diagrams must have a descriptive `title:` in YAML frontmatter
- Titles MUST be quoted: `title: "Descriptive Title"` (not `title: Descriptive Title`)
- Titles should describe the diagram content, not generic "Diagrama N"
- Verify theme is applied to all diagrams in `docs/*.md`
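
The title and theme rules can be verified per diagram with a short script. The sketch below assumes diagrams live in fenced blocks tagged `mermaid`, that the frontmatter is not indented, and that the four theme variables listed above are the ones to enforce. The function and constant names are hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed
MERMAID_BLOCK_RE = re.compile(FENCE + r"mermaid\n(.*?)" + FENCE, re.DOTALL)
QUOTED_TITLE_RE = re.compile(r'^title:\s*"[^"]+"\s*$', re.MULTILINE)
REQUIRED_THEME_VALUES = {
    "primaryColor": "#E6F4F9",
    "primaryTextColor": "#404040",
    "primaryBorderColor": "#0098CD",
    "lineColor": "#0098CD",
}


def mermaid_issues(markdown_text):
    """Return (block_index, message) pairs for diagrams that break the rules."""
    issues = []
    blocks = MERMAID_BLOCK_RE.findall(markdown_text)
    for index, block in enumerate(blocks, start=1):
        if not QUOTED_TITLE_RE.search(block):
            issues.append((index, "missing or unquoted title"))
        for key, value in REQUIRED_THEME_VALUES.items():
            pattern = rf'{key}:\s*"{re.escape(value)}"'
            if not re.search(pattern, block, re.IGNORECASE):
                issues.append((index, f"{key} is not {value}"))
    return issues
```

Block indices are 1-based and follow the order in which the diagrams appear in the file.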
Note on Bar Charts (xychart-beta):
- Bar chart colors are automatically converted to UNIR blue (`#0098CD`) during figure generation
- The `xyChart.plotColorPalette` config in YAML frontmatter does NOT work reliably with mmdc
- Instead, `generate_mermaid_figures.py` post-processes the SVG to replace default colors (`#ECECFF`, `#FFF4DD`)
- No manual color configuration is needed in xychart-beta blocks; they will be styled automatically
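
For orientation, the kind of SVG post-processing described here might look like the sketch below. It is not the code of `generate_mermaid_figures.py`. It assumes the default fills appear as literal hex strings in the SVG text and uses `#0098CD` as the replacement color; the function and constant names are hypothetical.

```python
import re

# Default xychart-beta fills named in the note, and the color that replaces them.
DEFAULT_COLORS = ("#ECECFF", "#FFF4DD")
REPLACEMENT_COLOR = "#0098CD"


def recolor_svg(svg_text):
    """Replace the default Mermaid fills with the replacement color, ignoring letter case."""
    pattern = re.compile("|".join(re.escape(c) for c in DEFAULT_COLORS), re.IGNORECASE)
    return pattern.sub(REPLACEMENT_COLOR, svg_text)
```

Matching is case-insensitive because SVG renderers may emit hex colors in either case.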
Files to Review
- `docs/00_resumen.md` - Resumen/Abstract
- `docs/01_introduccion.md` - Introducción
- `docs/02_contexto_estado_arte.md` - Contexto y estado del arte
- `docs/03_objetivos_metodologia.md` - Objetivos y metodología
- `docs/04_desarrollo_especifico.md` - Desarrollo específico (resultados)
- `docs/05_conclusiones_trabajo_futuro.md` - Conclusiones y trabajo futuro
- `docs/06_referencias_bibliograficas.md` - Referencias
- `docs/07_anexo_a.md` - Anexo técnico
- `README.md` - Project overview
- Report findings with:
  - List of incorrect values found (with file:line references)
  - Formatting issues detected
  - Specific corrections needed
  - Overall documentation health assessment
- Language: All `docs/*` files must be in Spanish. `README.md` and `CLAUDE.md` can be in English.
- Audit Run (repeatable process):
  - Validate each Mermaid diagram that contains numbers against its stated source (CSV or metrics file).
  - Confirm every figure/table that includes metrics has a valid `*Fuente:*` line pointing to:
    - `src/results/*.csv`, `src/results/correlations/*.csv`, or `docs/metrics/*.md`, or
    - External sources listed in `docs/07_anexo_a.md`.
  - Record any missing or mismatched sources before making edits.
Writing Style (Required)
- Use fluent Spanish with standard punctuation; avoid long dashes.
- Prefer commas, semicolons, or short sentences over em dashes.
- Keep paragraphs concise and clear; avoid overly long sentences.
Data Integrity (Required)
- Do not invent or estimate values. Every numeric claim must be sourced from `src/results/*.csv`, `docs/metrics/*.md`, or external documentation explicitly listed in `docs/07_anexo_a.md`.
- If a value is not present in those sources, remove it or mark it as unknown and request clarification.
- Source of truth for OCR metrics in `docs/00-07`: use `docs/metrics/*.md` for both "Resultados del Subconjunto de Ajuste" and "Evaluación del Dataset Completo", and `src/results/*.csv` for tuning subset values referenced by those sections.
CSV Verification (Required)
Use the CSVs to validate best-trial values and to confirm that tuning-only figures are not confused with full-dataset results.
Interpretation Rules
- The CSVs are from tuning on pages 5-10, not the full 45-page dataset.
- Values like “best trial CER” and “best trial WER” must match the CSVs.
- Full-dataset metrics must be sourced elsewhere and clearly labeled as full evaluation.
- `src/raytune_paddle_subproc_results_20251207_192320.csv` is CPU-only timing reference; do not use it for accuracy claims.
- GPU results are the primary research driver. CPU results are only used to illustrate timing without GPU.
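
A best-trial check against a results CSV can be sketched as follows. The column names `CER`, `WER`, and `trial` are assumptions, since the actual headers in `src/results/*.csv` are not given here, and the helper names are hypothetical. Adjust both to the real files before use.

```python
import csv
import io


def best_trial(csv_file, metric="CER"):
    """Return the row (as a dict) with the lowest value of `metric`."""
    rows = [row for row in csv.DictReader(csv_file) if row.get(metric)]
    if not rows:
        raise ValueError(f"no rows with a '{metric}' value")
    return min(rows, key=lambda row: float(row[metric]))


def matches_documented(csv_value, documented_value, decimals=2):
    """Compare a CSV value with a documented one after rounding both."""
    return round(float(csv_value), decimals) == round(float(documented_value), decimals)


# Placeholder rows for illustration only; these are not thesis results.
sample = io.StringIO("trial,CER,WER\n0,0.50,0.80\n1,0.25,0.60\n2,0.40,0.70\n")
row = best_trial(sample)
print(row["trial"], row["CER"])
```

With a real file, replace the `io.StringIO` sample with `open(path, newline="")`, and report any mismatch as a tuning-subset discrepancy, not a full-dataset one.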