# Repository Guidelines

## Project Structure & Module Organization

- `docs/`: Thesis chapters 00-07 in Spanish (UNIR structure). Edit these for narrative changes.
- `src/`: OCR tuning code, services, notebooks, and results. Key subfolders: `raytune/`, `paddle_ocr/`, `doctr_service/`, `easyocr_service/`, `results/`, `results/correlations/`.
- `instructions/`: UNIR template and writing rules (`plantilla_individual.htm` is the styling source of truth).
- `thesis_output/`: Generated thesis HTML and figures (do not edit by hand).
- Root scripts: `generate_mermaid_figures.py` (Mermaid to PNG) and `apply_content.py` (template assembly).
- Temporary scripts go in `tem/scripts/`.

## Build, Test, and Development Commands

- `source .venv/bin/activate` before installing or running Python tools.
- `npm install`: install Mermaid CLI (`node_modules/.bin/mmdc`) for figure generation.
- `python3 generate_mermaid_figures.py`: write PNGs to `thesis_output/figures/` from `docs/*.md`.
- `python3 apply_content.py`: generate `thesis_output/plantilla_individual.htm` from `docs/` + `instructions/`.
- `jupyter notebook src/prepare_dataset.ipynb`: prepare OCR dataset from PDFs.
- `jupyter notebook src/paddle_ocr_fine_tune_unir_raytune.ipynb`: run the main tuning experiment.
- Docker tuning (GPU):
  - `docker compose -f src/docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu`
  - `docker compose -f src/docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64`
  - `docker compose -f src/docker-compose.tuning.paddle.yml down`
- Use `.claude/commands/word-generation.md` to regenerate the thesis output.

## Coding Style & Naming Conventions

- Python: PEP 8, 4-space indentation, `snake_case`.
- Notebooks live in `src/` and should keep execution order clean when committed.
- Documentation in `docs/` is Spanish; code comments stay in English.

## Data, Documentation, and Formatting Rules

- Run `.claude/commands/documentation-review.md` before editing `docs/00-07`.
- Do not invent numbers. Every numeric claim must come from `src/results/*.csv`, `src/results/correlations/*.csv`, `docs/metrics/*.md`, or external sources listed in `docs/07_anexo_a.md`.
- Tables and figures must use UNIR caption format: `**Tabla N.** *Título.*` / `**Figura N.** *Título.*` plus `*Fuente: ...*`.
- Mermaid diagrams require YAML frontmatter with a quoted `title:` and UNIR theme variables.
- Use full repository links in `*Fuente:*` lines, e.g. `https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics.md`.

## Testing Guidelines

- No automated tests. Validate changes by running a small tuning run and checking CSV output in `src/results/`.

## Commit & Pull Request Guidelines

- Commit messages are short, sentence case, and may include a tracker reference in parentheses.
- Keep commits focused; mention generated outputs (figures, HTML) when relevant.

## Agent-Specific Notes

- Follow `claude.md` for thesis-specific constraints and templates.
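
The caption rule under "Data, Documentation, and Formatting Rules" lends itself to a mechanical check before regenerating the thesis output. The sketch below is not part of the repository; it is a minimal illustration under two assumptions: that each caption sits on its own line, and that its `*Fuente: ...*` line appears somewhere after it and before the next caption. The function name `find_caption_problems` and both regular expressions are hypothetical and may need adjusting to the actual UNIR template.

```python
import re
import sys
from pathlib import Path

# Assumed patterns derived from the caption rule in these guidelines.
CAPTION_RE = re.compile(r"^\*\*(Tabla|Figura) \d+\.\*\* \*.+\*$")
SOURCE_RE = re.compile(r"^\*Fuente: .+\*$")
CAPTION_PREFIXES = ("**Tabla", "**Figura")


def find_caption_problems(markdown_text):
    """Return (line_number, message) pairs for captions that break the rule."""
    lines = [line.strip() for line in markdown_text.splitlines()]
    problems = []
    for index, line in enumerate(lines):
        if not line.startswith(CAPTION_PREFIXES):
            continue
        if not CAPTION_RE.match(line):
            problems.append((index + 1, "caption does not match the UNIR format"))
            continue
        # Assumption: the Fuente line follows the caption, before the next caption.
        has_source = False
        for following in lines[index + 1:]:
            if following.startswith(CAPTION_PREFIXES):
                break
            if SOURCE_RE.match(following):
                has_source = True
                break
        if not has_source:
            problems.append((index + 1, "missing *Fuente: ...* line"))
    return problems


if __name__ == "__main__":
    # Check every Markdown file passed on the command line.
    for path in sys.argv[1:]:
        text = Path(path).read_text(encoding="utf-8")
        for line_number, message in find_caption_problems(text):
            print(f"{path}:{line_number}: {message}")
```

Saved under a hypothetical name such as `tem/scripts/check_captions.py`, it could be run as `python3 tem/scripts/check_captions.py docs/*.md`. It only checks caption shape and the presence of a source line; it does not verify that the numbers cited come from the allowed CSV or metrics files.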