test
claude.md: 44 lines changed
@@ -59,6 +59,11 @@ MastersThesis/
 │   ├── 05_conclusiones_trabajo_futuro.md  # 5. Conclusiones (5.1, 5.2)
 │   ├── 06_referencias_bibliograficas.md  # Referencias bibliográficas (APA format)
 │   └── 07_anexo_a.md  # Anexo A: Código fuente y datos
+├── thesis_output/  # Generated thesis document
+│   ├── plantilla_individual.htm  # Complete TFM (open in Word)
+│   └── figures/  # PNG figures from Mermaid diagrams
+│       ├── figura_1.png ... figura_7.png
+│       └── figures_manifest.json
 ├── src/
 │   ├── paddle_ocr_fine_tune_unir_raytune.ipynb  # Main experiment (64 trials)
 │   ├── paddle_ocr_tuning.py  # CLI evaluation script
@@ -69,7 +74,9 @@ MastersThesis/
 ├── instructions/  # UNIR instructions and template
 │   ├── instrucciones.pdf  # TFE writing guidelines
 │   ├── plantilla_individual.pdf  # Word template (PDF version)
-│   └── plantilla_individual.htm  # Word template (HTML version, readable)
+│   └── plantilla_individual.htm  # Word template (HTML version, source)
+├── apply_content.py  # Generates TFM document from docs/ + template
+├── generate_mermaid_figures.py  # Converts Mermaid diagrams to PNG
 ├── ocr_benchmark_notebook.ipynb  # Initial OCR benchmark
 └── README.md
 ```
@@ -115,19 +122,48 @@ The template (`plantilla_individual.pdf`) requires **5 chapters**. The docs/ fil
 
 ### Completed Tasks
 - [x] **Structure docs/ to match UNIR template** - All chapters now follow exact numbering (1.1, 1.2, etc.)
-- [x] **Add Mermaid diagrams** - 4 diagrams added (OCR pipeline, Ray Tune architecture, CER comparison charts)
+- [x] **Add Mermaid diagrams** - 7 diagrams added (OCR pipeline, Ray Tune architecture, methodology flowcharts, CER comparison charts)
+- [x] **Generate unified thesis document** - `apply_content.py` generates complete document from docs/
+- [x] **Convert Mermaid to PNG** - `generate_mermaid_figures.py` generates figures automatically
+- [x] **Proper template formatting** - Tables/figures use `Piedefoto-tabla` class, references use `MsoBibliography`
 
 ### Priority Tasks
 1. **Validate on other document types** - Test optimal config on invoices, forms, contracts
 2. **Expand dataset** - Current dataset has only 24 pages
-3. **Complete unified thesis document** - Merge docs/ chapters into final UNIR Word format
-4. **Create presentation slides** - For thesis defense
+3. **Create presentation slides** - For thesis defense
+4. **Final document review** - Open in Word, update indices (Ctrl+A, F9), verify formatting
 
 ### Optional Extensions
 - Explore `text_det_unclip_ratio` parameter (was fixed at 0.0)
 - Compare with actual fine-tuning (if GPU access obtained)
 - Multi-objective optimization (CER + WER + inference time)
 
+## Thesis Document Generation
+
+To regenerate the thesis document:
+
+```bash
+# 1. Generate PNG figures from Mermaid diagrams
+python3 generate_mermaid_figures.py
+
+# 2. Apply docs/ content to UNIR template
+python3 apply_content.py
+
+# 3. Open in Word and finalize
+# - Open thesis_output/plantilla_individual.htm in Microsoft Word
+# - Press Ctrl+A then F9 to update all indices
+# - Save as .docx
+```
+
+**What `apply_content.py` does:**
+- Replaces Resumen and Abstract with actual content + keywords
+- Replaces all 5 chapters with content from docs/
+- Replaces Referencias with APA-formatted bibliography
+- Replaces Anexo with repository information
+- Converts Mermaid diagrams to embedded PNG images
+- Formats tables with `Piedefoto-tabla` captions and sources
+- Removes template instruction text ("Importante:", "Ejemplo de nota al pie", etc.)
+
 ---
 
 ## UNIR TFE Document Guidelines
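
The diff names `generate_mermaid_figures.py` as the step that turns Mermaid diagrams into `figura_1.png ... figura_7.png`, but the script itself is not part of this commit. The sketch below illustrates one way the first half of that step could work: finding Mermaid fences in a markdown chapter and pairing each with a target file name. It is an assumption-laden illustration, not the repository's code: `MERMAID_FENCE`, `extract_mermaid_blocks`, and `plan_figures` are hypothetical names, and rendering to PNG is left out entirely.

```python
import re

# Hypothetical pattern (not taken from the repository): a fence that opens
# with three backticks plus "mermaid" and closes with three backticks on
# their own line.
MERMAID_FENCE = re.compile(
    r"^```mermaid[ \t]*\n(.*?)^```[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_mermaid_blocks(markdown: str) -> list[str]:
    """Return the body of every Mermaid fence, in document order."""
    return [match.group(1).strip() for match in MERMAID_FENCE.finditer(markdown)]


def plan_figures(markdown: str, start: int = 1) -> list[dict]:
    """Pair each diagram with a file name following the figura_N.png scheme.

    The numbering scheme mirrors the names listed in the diff; how the real
    script numbers figures across chapters is an assumption.
    """
    return [
        {"file": f"figura_{index}.png", "source": body}
        for index, body in enumerate(extract_mermaid_blocks(markdown), start=start)
    ]
```

Calling `plan_figures(text)` on the contents of one docs/ chapter yields a list that a later rendering step could walk; fences in other languages are ignored because the pattern requires the `mermaid` info string.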