Repository Guidelines
Project Structure & Module Organization
- `docs/`: Thesis chapters 00-07 in Spanish (UNIR structure). Edit these for narrative changes.
- `src/`: OCR tuning code, services, notebooks, and results. Key subfolders: `raytune/`, `paddle_ocr/`, `doctr_service/`, `easyocr_service/`, `results/`, `results/correlations/`.
- `instructions/`: UNIR template and writing rules (`plantilla_individual.htm` is the styling source of truth).
- `thesis_output/`: Generated thesis HTML and figures (do not edit by hand).
- Root scripts: `generate_mermaid_figures.py` (Mermaid to PNG) and `apply_content.py` (template assembly).
- Temporary scripts go in `tem/scripts/`.
Build, Test, and Development Commands
- Run `source .venv/bin/activate` before installing or running Python tools.
- `npm install`: install Mermaid CLI (`node_modules/.bin/mmdc`) for figure generation.
- `python3 generate_mermaid_figures.py`: write PNGs to `thesis_output/figures/` from `docs/*.md`.
- `python3 apply_content.py`: generate `thesis_output/plantilla_individual.htm` from `docs/` + `instructions/`.
- `jupyter notebook src/prepare_dataset.ipynb`: prepare OCR dataset from PDFs.
- `jupyter notebook src/paddle_ocr_fine_tune_unir_raytune.ipynb`: run the main tuning experiment.
- Docker tuning (GPU), in this order:
  - `docker compose -f src/docker-compose.tuning.paddle.yml up -d paddle-ocr-gpu`
  - `docker compose -f src/docker-compose.tuning.paddle.yml run raytune --service paddle --samples 64`
  - `docker compose -f src/docker-compose.tuning.paddle.yml down`
- Use `.claude/commands/word-generation.md` to regenerate the thesis output.
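The Docker tuning sequence above can be wrapped in a small helper so the three commands always run in the same order. This is a minimal sketch, not a script that exists in the repository: the names `tuning_commands` and `run_tuning` and the `dry_run` flag are hypothetical, and only the compose file path, the service names, and the `--service` / `--samples` arguments come from the commands listed above.

```python
import subprocess

# Compose file and service names are taken from the commands listed above.
COMPOSE_FILE = "src/docker-compose.tuning.paddle.yml"


def tuning_commands(samples: int = 64) -> list[list[str]]:
    """Return the three docker compose invocations (up, run, down) in order."""
    base = ["docker", "compose", "-f", COMPOSE_FILE]
    return [
        base + ["up", "-d", "paddle-ocr-gpu"],
        base + ["run", "raytune", "--service", "paddle", "--samples", str(samples)],
        base + ["down"],
    ]


def _execute(cmd: list[str], dry_run: bool) -> None:
    """Print the command, and run it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


def run_tuning(samples: int = 64, dry_run: bool = True) -> None:
    """Hypothetical wrapper: start the GPU service, tune, then always tear down."""
    up, tune, down = tuning_commands(samples)
    _execute(up, dry_run)
    try:
        _execute(tune, dry_run)
    finally:
        # Tear the stack down even if the tuning run fails.
        _execute(down, dry_run)
```

Calling `run_tuning(samples=8)` only prints the commands; pass `dry_run=False` to actually execute them.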
Coding Style & Naming Conventions
- Python: PEP 8, 4-space indentation, `snake_case`.
- Notebooks live in `src/` and should keep execution order clean when committed.
- Documentation in `docs/` is Spanish; code comments stay in English.
Data, Documentation, and Formatting Rules
- Run `.claude/commands/documentation-review.md` before editing `docs/00-07`.
- Do not invent numbers. Every numeric claim must come from `src/results/*.csv`, `src/results/correlations/*.csv`, `docs/metrics/*.md`, or external sources listed in `docs/07_anexo_a.md`.
- Tables and figures must use UNIR caption format: `**Tabla N.** *Título.*` / `**Figura N.** *Título.*` plus `*Fuente: ...*`.
- Mermaid diagrams require YAML frontmatter with a quoted `title:` and UNIR theme variables.
- Use full repository links in `*Fuente:*` lines, e.g. `https://seryus.ddns.net/unir/MastersThesis/src/branch/main/docs/metrics/metrics.md`.
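The caption rule above lends itself to a quick automated check. The following is a rough sketch, not an existing repository tool: `find_caption_problems` is a hypothetical name. It assumes that each caption sits on its own line and that its `*Fuente: ...*` line appears somewhere before the next caption; adjust both assumptions if the chapters are laid out differently.

```python
import re

# "**Tabla N.** *Título.*" or "**Figura N.** *Título.*" on a line of its own.
CAPTION_RE = re.compile(r"^\*\*(Tabla|Figura) \d+\.\*\* \*[^*]+\*$")
# "*Fuente: ...*" on a line of its own.
SOURCE_RE = re.compile(r"^\*Fuente: [^*]+\*$")


def find_caption_problems(markdown: str) -> list[str]:
    """Return human-readable problems found in table and figure captions."""
    lines = [ln.strip() for ln in markdown.splitlines()]
    caption_idx = [
        i for i, ln in enumerate(lines) if ln.startswith(("**Tabla", "**Figura"))
    ]
    problems = []
    for pos, i in enumerate(caption_idx):
        if not CAPTION_RE.match(lines[i]):
            problems.append(f"line {i + 1}: malformed caption")
        # Assumption: the Fuente line appears before the next caption starts.
        end = caption_idx[pos + 1] if pos + 1 < len(caption_idx) else len(lines)
        if not any(SOURCE_RE.match(ln) for ln in lines[i + 1:end]):
            problems.append(f"line {i + 1}: no '*Fuente: ...*' line found")
    return problems
```

An empty list means every caption the sketch detected matches the format and has a source line; it does not check the Mermaid frontmatter or the origin of numeric claims.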
Testing Guidelines
- No automated tests. Validate changes by running a small tuning run and checking CSV output in `src/results/`.
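Since validation is manual, a short script can make the "check the CSV output" step repeatable. This is a minimal sketch under stated assumptions: `summarize_results` and `empty_results` are hypothetical helpers, and the first row of each CSV is assumed to be a header. The actual column names are not specified here, so the sketch only counts rows.

```python
import csv
from pathlib import Path


def summarize_results(results_dir: str = "src/results") -> dict[str, int]:
    """Map each CSV file name to its number of data rows (header excluded)."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # Assumption: the first row is a header, so it is not counted.
        summary[path.name] = max(len(rows) - 1, 0)
    return summary


def empty_results(results_dir: str = "src/results") -> list[str]:
    """Return the CSV files that contain no data rows."""
    return [name for name, n in summarize_results(results_dir).items() if n == 0]
```

After a small tuning run, `empty_results()` returning an empty list is a cheap sanity signal; it does not confirm that the values themselves are correct.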
Commit & Pull Request Guidelines
- Commit messages are short, sentence case, and may include a tracker reference in parentheses.
- Keep commits focused; mention generated outputs (figures, HTML) when relevant.
Agent-Specific Notes
- Follow `claude.md` for thesis-specific constraints and templates.