MastersThesis/AGENTS.md
sergio 0089b34cb3
Documentation review and data consistency.
2026-01-24 15:53:34 +01:00

Repository Guidelines

Project Structure & Module Organization

  • docs/: Thesis chapters 00-07 in Spanish (UNIR structure). Edit these for narrative changes.
  • src/: OCR tuning code, services, notebooks, and results. Key subfolders: raytune/, paddle_ocr/, doctr_service/, easyocr_service/, results/, results/correlations/.
  • instructions/: UNIR template and writing rules (plantilla_individual.htm is the styling source of truth).
  • thesis_output/: Generated thesis HTML and figures (do not edit by hand).
  • Root scripts: generate_mermaid_figures.py (Mermaid to PNG) and apply_content.py (template assembly).
  • Temporary scripts go in tem/scripts/.

Build, Test, and Development Commands

  • source .venv/bin/activate: activate the virtual environment before installing or running Python tools.
  • npm install: install Mermaid CLI (node_modules/.bin/mmdc) for figure generation.
  • python3 generate_mermaid_figures.py: write PNGs to thesis_output/figures/ from docs/*.md.
  • python3 apply_content.py: generate thesis_output/plantilla_individual.htm from docs/ + instructions/.
  • jupyter notebook src/prepare_dataset.ipynb: prepare OCR dataset from PDFs.
  • jupyter notebook src/paddle_ocr_fine_tune_unir_raytune.ipynb: run the main tuning experiment.
  • Docker tuning (GPU):
    • docker compose -f src/docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu
    • docker compose -f src/docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64
    • docker compose -f src/docker-compose.tuning.paddle.yml down
  • Use .claude/commands/word-generation.md to regenerate the thesis output.
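
The two root scripts listed above could be chained from a small wrapper. What follows is a minimal sketch, not part of the repository: BUILD_STEPS and run_build are hypothetical names, and running figure generation before template assembly is an assumption (the commands are listed in that order, but no dependency is stated). It assumes the virtual environment is already active and that repo_root points at the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical wrapper; only the two script names come from the guidelines.
# Assumed order: Mermaid figures first, then template assembly.
BUILD_STEPS = [
    [sys.executable, "generate_mermaid_figures.py"],
    [sys.executable, "apply_content.py"],
]


def run_build(repo_root=".", runner=subprocess.run):
    """Run each build step from the repository root, stopping at the first failure."""
    root = Path(repo_root)
    for cmd in BUILD_STEPS:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner(cmd, cwd=root, check=True)
```

Calling run_build() from the repository root would execute both scripts in sequence. The runner parameter exists only so the sequence can be exercised without launching the real scripts.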

Coding Style & Naming Conventions

  • Python: PEP 8, 4-space indentation, snake_case.
  • Notebooks live in src/; commit them with a clean, sequential execution order.
  • Documentation in docs/ is Spanish; code comments stay in English.

Data, Documentation, and Formatting Rules

  • Run .claude/commands/documentation-review.md before editing docs/00-07.
  • Do not invent numbers. Every numeric claim must come from src/results/*.csv, src/results/correlations/*.csv, docs/metrics/*.md, or external sources listed in docs/07_anexo_a.md.
  • Tables and figures must use UNIR caption format: **Tabla N.** *Título.* / **Figura N.** *Título.* plus *Fuente: ...*.
  • Mermaid diagrams require YAML frontmatter with a quoted title: and UNIR theme variables.
  • Use full repository links in *Fuente:* lines, e.g. https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics.md.
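
The caption rule above lends itself to a quick automated check. The sketch below is illustrative only and not part of the repository: captions_missing_source is a hypothetical helper, and it assumes the caption and the *Fuente: ...* line each sit on their own line, with the source appearing somewhere before the next caption. The guidelines do not specify that layout, so adjust the patterns if the chapters differ.

```python
import re

# Patterns assume the literal Markdown forms quoted in the rule above.
CAPTION_RE = re.compile(r"^\*\*(Tabla|Figura) (\d+)\.\*\* \*.+\*\s*$")
SOURCE_RE = re.compile(r"^\*Fuente: .+\*\s*$")


def captions_missing_source(markdown_text):
    """Return labels such as 'Tabla 3' for captions with no Fuente line before the next caption."""
    missing = []
    pending = None  # most recent caption still waiting for its Fuente line
    for line in markdown_text.splitlines():
        stripped = line.strip()
        match = CAPTION_RE.match(stripped)
        if match:
            if pending is not None:
                missing.append(pending)
            pending = f"{match.group(1)} {match.group(2)}"
        elif pending is not None and SOURCE_RE.match(stripped):
            pending = None
    if pending is not None:
        missing.append(pending)
    return missing
```

Feeding it the text of a chapter file from docs/ yields a list of captions to review; an empty list means every caption found was followed by a source line.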

Testing Guidelines

  • There are no automated tests. Validate changes by running a small tuning run and checking the CSV output in src/results/.
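
Checking the CSV output by hand can be supplemented with a short script. This is a rough sketch under stated assumptions, not an existing project tool: summarize_results and empty_result_files are hypothetical names, the files are assumed to be UTF-8 with a header row, and no particular column names are assumed because the guidelines do not list them.

```python
import csv
from pathlib import Path


def summarize_results(results_dir="src/results"):
    """Map each CSV file name to its header columns and number of data rows."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)  # None when the file is completely empty
            rows = sum(1 for row in reader if row)  # skip blank lines
        summary[path.name] = {"columns": header or [], "rows": rows}
    return summary


def empty_result_files(results_dir="src/results"):
    """List CSV files that contain no data rows, a likely sign of a failed run."""
    return [
        name
        for name, info in summarize_results(results_dir).items()
        if info["rows"] == 0
    ]
```

After a small tuning run, a non-empty return value from empty_result_files() would point at files worth inspecting before committing.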

Commit & Pull Request Guidelines

  • Commit messages are short, sentence case, and may include a tracker reference in parentheses.
  • Keep commits focused; mention generated outputs (figures, HTML) when relevant.

Agent-Specific Notes

  • Follow claude.md for thesis-specific constraints and templates.