clean up datasources

2025-12-16 00:48:14 +01:00
parent 29aef93f63
commit 5220793328
6 changed files with 99 additions and 57 deletions

@@ -100,8 +100,7 @@ The template (`plantilla_individual.pdf`) requires **5 chapters**. The docs/ fil
## Important Data Files
### Results CSV Files
- `src/raytune_paddle_subproc_results_20251207_192320.csv` - 64 Ray Tune trials with configs and metrics
- `results/ai_ocr_benchmark_finetune_results_20251206_113206.csv` - Per-page OCR benchmark results
- `src/raytune_paddle_subproc_results_20251207_192320.csv` - 64 Ray Tune trials with configs and metrics (PRIMARY DATA SOURCE)
### Key Notebooks
- `src/paddle_ocr_fine_tune_unir_raytune.ipynb` - Main Ray Tune experiment
@@ -454,7 +453,6 @@ Fuente: American Psychological Association, 2020b.
| Data Type | Source File |
|-----------|-------------|
| Ray Tune 64 trials | `src/raytune_paddle_subproc_results_20251207_192320.csv` |
| Per-page benchmark | `results/ai_ocr_benchmark_finetune_results_20251206_113206.csv` |
| Experiment code | `src/paddle_ocr_fine_tune_unir_raytune.ipynb` |
| Final comparison | Output cells in the notebook (baseline vs optimized) |
@@ -463,11 +461,11 @@ Fuente: American Psychological Association, 2020b.
**WRONG:** "EasyOCR achieved 8.5% CER while PaddleOCR achieved 5.2% CER"
(We don't have this comparison data in our results files)
**RIGHT:** "PaddleOCR with baseline configuration achieved CER between 1.54% and 6.40% across pages 5-9 (source: `results/ai_ocr_benchmark_finetune_results_20251206_113206.csv`)"
**RIGHT:** "The optimization reduced CER from 7.78% to 1.49%, a reduction of 80.9% (source: final comparison in `paddle_ocr_fine_tune_unir_raytune.ipynb`)"
**WRONG:** "The optimization improved results by approximately 80%"
**RIGHT:** "The optimization reduced CER from 7.78% to 1.49%, a reduction of 80.9% (source: final comparison in `paddle_ocr_fine_tune_unir_raytune.ipynb`)"
**RIGHT:** "From the 64 trials in `raytune_paddle_subproc_results_20251207_192320.csv`, minimum CER achieved was 1.15%"
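A figure such as the minimum CER can be re-derived from the results file before it is quoted. The following is a minimal sketch, assuming the CSV exposes a numeric column named `CER` that holds fractional values. The column name, the `min_metric` helper, and the sample rows are hypothetical and are not taken from the repository.

```python
import csv
import io


def min_metric(csv_text: str, column: str = "CER") -> float:
    """Return the smallest value found in `column` of the given CSV text.

    The default column name "CER" is an assumption; adjust it to match
    the actual header of the results file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or empty.
    values = [float(row[column]) for row in reader if row.get(column)]
    if not values:
        raise ValueError(f"no values found in column {column!r}")
    return min(values)


# Made-up three-row sample for illustration only, not real trial data.
sample = "trial_id,CER\na,0.05\nb,0.02\nc,0.03\n"
print(f"minimum CER: {min_metric(sample) * 100:.2f}%")
```

To check the real file, its contents could be read with `open(path, newline="").read()` and passed to the same helper, so the quoted number and its source file stay in agreement.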
### When Working on Documentation