# Repository Guidelines

## Project Structure & Module Organization

- `docs/`: Thesis chapters 00-07 in Spanish (UNIR structure). Edit these for narrative changes.
- `src/`: OCR tuning code, services, notebooks, and results. Key subfolders: `raytune/`, `paddle_ocr/`, `doctr_service/`, `easyocr_service/`, `results/`, `results/correlations/`.
- `instructions/`: UNIR template and writing rules (`plantilla_individual.htm` is the styling source of truth).
- `thesis_output/`: Generated thesis HTML and figures (do not edit by hand).
- Root scripts: `generate_mermaid_figures.py` (Mermaid to PNG) and `apply_content.py` (template assembly).
- Temporary scripts go in `tem/scripts/`.
## Build, Test, and Development Commands

- `source .venv/bin/activate` before installing or running Python tools.
- `npm install`: install Mermaid CLI (`node_modules/.bin/mmdc`) for figure generation.
- `python3 generate_mermaid_figures.py`: write PNGs to `thesis_output/figures/` from `docs/*.md`.
- `python3 apply_content.py`: generate `thesis_output/plantilla_individual.htm` from `docs/` + `instructions/`.
- `jupyter notebook src/prepare_dataset.ipynb`: prepare OCR dataset from PDFs.
- `jupyter notebook src/paddle_ocr_fine_tune_unir_raytune.ipynb`: run the main tuning experiment.
- Docker tuning (GPU):
  - `docker compose -f src/docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu`
  - `docker compose -f src/docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64`
  - `docker compose -f src/docker-compose.tuning.paddle.yml down`
- Use `.claude/commands/word-generation.md` to regenerate the thesis output.
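
The two root scripts can be chained from Python when a single entry point is convenient. The following is a minimal sketch, not part of the repository: `run_build` and `BUILD_STEPS` are hypothetical names, and running figure generation before template assembly is an assumption based on the order in which the commands are listed above.

```python
import subprocess
from pathlib import Path

# Assumed order: figures first, then template assembly (hypothetical helper).
BUILD_STEPS = [
    ["python3", "generate_mermaid_figures.py"],  # docs/*.md -> thesis_output/figures/
    ["python3", "apply_content.py"],             # docs/ + instructions/ -> thesis HTML
]


def run_build(repo_root=".", runner=subprocess.run):
    """Run each build step from repo_root and stop at the first failure.

    `runner` is injectable so the sequence can be dry-run without
    executing anything; by default it is subprocess.run.
    """
    executed = []
    for command in BUILD_STEPS:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner(command, cwd=Path(repo_root), check=True)
        executed.append(command)
    return executed
```

Calling `run_build()` from the repository root, with the virtual environment active and Mermaid CLI installed, would run both steps in sequence.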
## Coding Style & Naming Conventions

- Python: PEP 8, 4-space indentation, `snake_case`.
- Notebooks live in `src/` and should keep execution order clean when committed.
- Documentation in `docs/` is Spanish; code comments stay in English.
## Data, Documentation, and Formatting Rules

- Run `.claude/commands/documentation-review.md` before editing `docs/00-07`.
- Do not invent numbers. Every numeric claim must come from `src/results/*.csv`, `src/results/correlations/*.csv`, `docs/metrics/*.md`, or external sources listed in `docs/07_anexo_a.md`.
- Tables and figures must use UNIR caption format: `**Tabla N.** *Título.*` / `**Figura N.** *Título.*` plus `*Fuente: ...*`.
- Mermaid diagrams require YAML frontmatter with a quoted `title:` and UNIR theme variables.
- Use full repository links in `*Fuente:*` lines, e.g. `https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics.md`.
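
The caption rule above can be checked mechanically. This is a rough sketch rather than an existing repository tool: `captions_missing_source` is a hypothetical helper, and it assumes each caption and each `*Fuente:*` line sits on its own line, with the source line appearing somewhere before the next caption.

```python
import re

# Patterns derived from the caption format quoted in the rule above.
CAPTION_RE = re.compile(r"^\*\*(Tabla|Figura) (\d+)\.\*\* \*.+\.\*$")
SOURCE_RE = re.compile(r"^\*Fuente: .+\*$")


def captions_missing_source(markdown_text):
    """Return labels such as 'Tabla 3' whose caption has no *Fuente:* line
    before the next caption or the end of the text (assumed layout)."""
    missing = []
    pending = None  # caption still waiting for its source line
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        caption = CAPTION_RE.match(line)
        if caption:
            if pending is not None:
                missing.append(pending)
            pending = f"{caption.group(1)} {caption.group(2)}"
        elif pending is not None and SOURCE_RE.match(line):
            pending = None
    if pending is not None:
        missing.append(pending)
    return missing
```

Running it over the text of each file in `docs/` would list captions that still need a source line; it does not verify that the link itself is a full repository URL.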
## Testing Guidelines

- No automated tests. Validate changes by running a small tuning run and checking CSV output in `src/results/`.
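
A quick way to check the CSV output after a small run is to count data rows per file. The sketch below is illustrative only: `summarize_results` and `empty_results` are hypothetical helpers, and they assume each CSV in `src/results/` has a single header row. No column names are assumed.

```python
import csv
from pathlib import Path


def summarize_results(results_dir="src/results"):
    """Map each CSV file name in results_dir to its number of data rows.

    Assumes the first row of every file is a header.
    """
    summary = {}
    for csv_path in sorted(Path(results_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        summary[csv_path.name] = max(len(rows) - 1, 0)
    return summary


def empty_results(results_dir="src/results"):
    """Return the CSV files that contain no data rows."""
    return [name for name, count in summarize_results(results_dir).items() if count == 0]
```

An empty list from `empty_results()` after a tuning run suggests every results file received at least one row; it says nothing about whether the values are plausible, which still needs a manual look.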
## Commit & Pull Request Guidelines

- Commit messages are short, sentence case, and may include a tracker reference in parentheses.
- Keep commits focused; mention generated outputs (figures, HTML) when relevant.
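
The commit message convention could be approximated with a small check. This is a heuristic sketch under stated assumptions, not a project rule: `check_commit_subject` is hypothetical, the 72-character limit stands in for "short", and the tracker reference is assumed to be any trailing parenthetical, since the guidelines do not define its format.

```python
import re

MAX_SUBJECT_LENGTH = 72  # assumption: the guidelines only say "short"
TRACKER_SUFFIX_RE = re.compile(r"\s*\([^()]+\)$")  # optional trailing "(reference)"


def check_commit_subject(subject):
    """Return a list of problems found in a commit subject line (empty if none)."""
    problems = []
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append("too long")
    # Ignore the optional tracker reference when judging the casing.
    text = TRACKER_SUFFIX_RE.sub("", subject).strip()
    if not text:
        problems.append("empty message")
        return problems
    words = text.split()
    if not words[0][0].isupper():
        problems.append("first word not capitalized")
    # Sentence-case heuristic: flag subjects where every later word is capitalized too.
    later = [word for word in words[1:] if word[0].isalpha()]
    if len(later) >= 2 and all(word[0].isupper() for word in later):
        problems.append("looks like Title Case")
    return problems
```

The Title Case test is deliberately loose, so subjects made mostly of proper nouns or acronyms may be flagged incorrectly.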
## Agent-Specific Notes

- Follow `claude.md` for thesis-specific constraints and templates.