MastersThesis/AGENTS.md
sergio 0089b34cb3
Documentation review and data consistency.
2026-01-24 15:53:34 +01:00


# Repository Guidelines
## Project Structure & Module Organization
- `docs/`: Thesis chapters 00-07 in Spanish (UNIR structure). Edit these for narrative changes.
- `src/`: OCR tuning code, services, notebooks, and results. Key subfolders: `raytune/`, `paddle_ocr/`, `doctr_service/`, `easyocr_service/`, `results/`, `results/correlations/`.
- `instructions/`: UNIR template and writing rules (`plantilla_individual.htm` is the styling source of truth).
- `thesis_output/`: Generated thesis HTML and figures (do not edit by hand).
- Root scripts: `generate_mermaid_figures.py` (Mermaid to PNG) and `apply_content.py` (template assembly).
- Temporary scripts go in `tem/scripts/`.
## Build, Test, and Development Commands
- `source .venv/bin/activate` before installing or running Python tools.
- `npm install`: install Mermaid CLI (`node_modules/.bin/mmdc`) for figure generation.
- `python3 generate_mermaid_figures.py`: write PNGs to `thesis_output/figures/` from `docs/*.md`.
- `python3 apply_content.py`: generate `thesis_output/plantilla_individual.htm` from `docs/` + `instructions/`.
- `jupyter notebook src/prepare_dataset.ipynb`: prepare OCR dataset from PDFs.
- `jupyter notebook src/paddle_ocr_fine_tune_unir_raytune.ipynb`: run the main tuning experiment.
- Docker tuning (GPU):
  - `docker compose -f src/docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu`
  - `docker compose -f src/docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64`
  - `docker compose -f src/docker-compose.tuning.paddle.yml down`
- Use `.claude/commands/word-generation.md` to regenerate the thesis output.
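
The two generation scripts above can be chained from one small helper. This is a minimal sketch, not part of the repository: `build_commands` and `run_build` are hypothetical names, and the step order simply mirrors the order in which the commands are listed (figures first, then template assembly), which is an assumption.

```python
import subprocess
import sys
from pathlib import Path

# Assumed order: the same order in which the commands are listed above.
BUILD_STEPS = [
    ["generate_mermaid_figures.py"],
    ["apply_content.py"],
]


def build_commands(python=sys.executable):
    """Return the argv list for each build step, in execution order."""
    return [[python, *step] for step in BUILD_STEPS]


def run_build(repo_root=".", runner=subprocess.run):
    """Run every step from the repository root; check=True stops at the first failure."""
    for argv in build_commands():
        runner(argv, cwd=Path(repo_root), check=True)
```

Calling `run_build()` from the repository root, with the virtual environment active, runs both scripts with the current interpreter.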
## Coding Style & Naming Conventions
- Python: PEP 8, 4-space indentation, `snake_case`.
- Notebooks live in `src/` and should keep execution order clean when committed.
- Documentation in `docs/` is Spanish; code comments stay in English.
## Data, Documentation, and Formatting Rules
- Run `.claude/commands/documentation-review.md` before editing `docs/00-07`.
- Do not invent numbers. Every numeric claim must come from `src/results/*.csv`, `src/results/correlations/*.csv`, `docs/metrics/*.md`, or external sources listed in `docs/07_anexo_a.md`.
- Tables and figures must use UNIR caption format: `**Tabla N.** *Título.*` / `**Figura N.** *Título.*` plus `*Fuente: ...*`.
- Mermaid diagrams require YAML frontmatter with a quoted `title:` and UNIR theme variables.
- Use full repository links in `*Fuente:*` lines, e.g. `https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics.md`.
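
The caption rule above lends itself to a quick mechanical check. The following is a minimal sketch, not an existing repository tool: `find_captions_without_source` is a hypothetical helper, and it assumes each caption sits on its own line and that its `*Fuente: ...*` line appears somewhere before the next caption.

```python
import re

# Matches "**Tabla N.** *Título.*" and "**Figura N.** *Título.*" on a single line.
CAPTION_RE = re.compile(r"^\*\*(Tabla|Figura) (\d+)\.\*\* \*(.+)\*$")
# Matches "*Fuente: ...*" on a single line.
SOURCE_RE = re.compile(r"^\*Fuente: .+\*$")


def find_captions_without_source(markdown_text):
    """Return (kind, number) for each caption with no *Fuente:* line before the next caption."""
    missing = []
    pending = None  # caption still waiting for its source line
    for raw in markdown_text.splitlines():
        line = raw.strip()
        match = CAPTION_RE.match(line)
        if match:
            if pending is not None:
                missing.append(pending)
            pending = (match.group(1), int(match.group(2)))
        elif pending is not None and SOURCE_RE.match(line):
            pending = None
    if pending is not None:
        missing.append(pending)
    return missing
```

Running it over the text of each file in `docs/` returns an empty list when every caption has a source line under these assumptions.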
## Testing Guidelines
- There are no automated tests. Validate changes with a small tuning run (for example, a low `--samples` value) and check the CSV output in `src/results/`.
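
The CSV check can be done by eye, or with a few lines of Python. This is a minimal sketch under assumptions: `summarize_results` and `empty_results` are hypothetical helpers, the files are assumed to be UTF-8 with a header row, and no particular column names are assumed.

```python
import csv
from pathlib import Path


def summarize_results(results_dir="src/results"):
    """Map each CSV file name to (data_row_count, header) so empty outputs stand out."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # first row is assumed to be the header
            row_count = sum(1 for _ in reader)
        summary[path.name] = (row_count, header)
    return summary


def empty_results(summary):
    """Return the names of CSV files that contain no data rows."""
    return [name for name, (rows, _) in summary.items() if rows == 0]
```

After a tuning run, `empty_results(summarize_results())` should return an empty list if every CSV in `src/results/` received at least one data row.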
## Commit & Pull Request Guidelines
- Commit messages are short, sentence case, and may include a tracker reference in parentheses.
- Keep commits focused; mention generated outputs (figures, HTML) when relevant.
## Agent-Specific Notes
- Follow `claude.md` for thesis-specific constraints and templates.