assist commands for claude
All checks were successful
build_docker / essential (pull_request) Successful in 1s
build_docker / build_cpu (pull_request) Successful in 5m0s
build_docker / build_gpu (pull_request) Successful in 22m37s
build_docker / build_easyocr (pull_request) Successful in 18m5s
build_docker / build_easyocr_gpu (pull_request) Successful in 15m43s
build_docker / build_doctr (pull_request) Successful in 17m17s
build_docker / build_raytune (pull_request) Successful in 3m24s
build_docker / build_doctr_gpu (pull_request) Successful in 16m54s
claude.md (57 lines changed)
@@ -12,39 +12,41 @@ This is a **Master's Thesis (TFM)** for UNIR's Master in Artificial Intelligence

 ### Why Hyperparameter Optimization Instead of Fine-tuning

-Due to **hardware limitations** (no dedicated GPU, CPU-only execution), the project pivoted from fine-tuning to hyperparameter optimization:
-- Fine-tuning deep learning models without GPU is prohibitively slow
-- Inference time is ~69 seconds/page on CPU
-- Hyperparameter optimization proved to be an effective alternative, achieving 80.9% CER reduction
+The project chose **hyperparameter optimization** over fine-tuning because:
+- Fine-tuning requires extensive labeled datasets specific to the domain
+- Hyperparameter tuning can improve pretrained models without retraining
+- GPU acceleration (RTX 3060) enables efficient exploration of hyperparameter space

-### Main Results
+### Main Results (GPU - Jan 2026)

 | Model | CER | Character Accuracy |
 |-------|-----|-------------------|
-| PaddleOCR Baseline | 7.78% | 92.22% |
-| PaddleOCR-HyperAdjust | **1.49%** | **98.51%** |
+| PaddleOCR Baseline | 8.85% | 91.15% |
+| PaddleOCR-HyperAdjust (full dataset) | **7.72%** | **92.28%** |
+| PaddleOCR-HyperAdjust (best trial) | **0.79%** | **99.21%** |

-**Goal achieved:** CER < 2% (target was < 2%, result is 1.49%)
+**Goal status:** CER < 2% achieved in best trial (0.79%). Full dataset shows 12.8% improvement.

-### Optimal Configuration Found
+### Optimal Configuration Found (GPU)

 ```python
 config_optimizada = {
-    "textline_orientation": True,  # CRITICAL - reduces CER ~70%
-    "use_doc_orientation_classify": False,
+    "textline_orientation": True,  # CRITICAL for complex layouts
+    "use_doc_orientation_classify": True,  # Improves document orientation
     "use_doc_unwarping": False,
-    "text_det_thresh": 0.4690,
-    "text_det_box_thresh": 0.5412,
+    "text_det_thresh": 0.0462,  # -0.52 correlation with CER
+    "text_det_box_thresh": 0.4862,
     "text_det_unclip_ratio": 0.0,
-    "text_rec_score_thresh": 0.6350,
+    "text_rec_score_thresh": 0.5658,
 }
 ```

 ### Key Findings

-1. `textline_orientation=True` is the most impactful parameter (reduces CER by 69.7%)
-2. `text_det_thresh` has -0.52 correlation with CER; values < 0.1 cause catastrophic failures
-3. Document correction modules (`use_doc_orientation_classify`, `use_doc_unwarping`) are unnecessary for digital PDFs
+1. `textline_orientation=True` is critical for documents with mixed layouts
+2. `use_doc_orientation_classify=True` improves document orientation detection in GPU config
+3. `text_det_thresh` has -0.52 correlation with CER; values < 0.01 cause catastrophic failures
+4. `use_doc_unwarping=False` is optimal for digital PDFs (unnecessary processing)

 ## Repository Structure

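The CER, character accuracy, and improvement figures in the results table above are tied together by simple arithmetic. The following is a minimal sketch, assuming the common definitions (CER as character-level edit distance divided by reference length, character accuracy as 100 minus CER, and improvement as the relative reduction against the baseline). The function names are hypothetical and are not taken from the repository.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def cer_percent(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent (assumed definition, see lead-in)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference) * 100.0


def character_accuracy(cer: float) -> float:
    """Character accuracy as the complement of CER, both in percent."""
    return 100.0 - cer


def relative_cer_reduction(baseline_cer: float, tuned_cer: float) -> float:
    """Relative CER reduction in percent: (baseline - tuned) / baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - tuned_cer) / baseline_cer * 100.0


# CER values copied from the GPU results table in this diff.
baseline, full_dataset, best_trial = 8.85, 7.72, 0.79

print(f"{character_accuracy(best_trial):.2f}")                   # 99.21
print(f"{relative_cer_reduction(baseline, full_dataset):.1f}")   # 12.8
```

The two printed values match the 99.21% accuracy and the 12.8% full-dataset improvement stated in the new text.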
@@ -99,13 +101,18 @@ The template (`plantilla_individual.pdf`) requires **5 chapters**. The docs/ fil

 ## Important Data Files

-### Results CSV Files
-- `src/raytune_paddle_subproc_results_20251207_192320.csv` - 64 Ray Tune trials with configs and metrics (PRIMARY DATA SOURCE)
+### Results CSV Files (GPU - PRIMARY)
+- `src/results/raytune_paddle_results_20260119_122609.csv` - 64 Ray Tune trials PaddleOCR GPU (PRIMARY)
+- `src/results/raytune_easyocr_results_20260119_120204.csv` - 64 Ray Tune trials EasyOCR GPU
+- `src/results/raytune_doctr_results_20260119_121445.csv` - 64 Ray Tune trials DocTR GPU

-### Key Notebooks
-- `src/paddle_ocr_fine_tune_unir_raytune.ipynb` - Main Ray Tune experiment
-- `src/prepare_dataset.ipynb` - PDF to image/text conversion
-- `ocr_benchmark_notebook.ipynb` - EasyOCR vs PaddleOCR vs DocTR comparison
+### Results CSV Files (CPU - time reference only)
+- `src/raytune_paddle_subproc_results_20251207_192320.csv` - CPU execution for time comparison (69.4s/page vs 0.84s/page GPU)
+
+### Key Scripts
+- `src/run_tuning.py` - Main Ray Tune optimization script
+- `src/raytune/raytune_ocr.py` - Ray Tune utilities and search spaces
+- `src/paddle_ocr/paddle_ocr_tuning_rest.py` - PaddleOCR REST API

 ## Technical Stack

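The results CSV files listed above hold one row per Ray Tune trial, and the text quotes a correlation between `text_det_thresh` and CER as well as a CPU versus GPU time comparison. The sketch below shows one way such figures could be derived with the standard library only. The column names (`text_det_thresh`, `CER`) are assumptions about the CSV layout, and the three inline rows are placeholder values for illustration, not thesis results.

```python
import csv
import io
import math


def load_trials(handle, columns):
    """Read trial rows from a CSV handle, keeping the given columns as floats."""
    return [{c: float(row[c]) for c in columns} for row in csv.DictReader(handle)]


def best_trial(trials, metric="CER"):
    """Return the trial with the lowest value of the metric."""
    return min(trials, key=lambda trial: trial[metric])


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


# Placeholder rows with ASSUMED column names; replace with open(<results csv>).
SAMPLE_CSV = """text_det_thresh,CER
0.10,0.30
0.20,0.20
0.30,0.10
"""

trials = load_trials(io.StringIO(SAMPLE_CSV), ["text_det_thresh", "CER"])
best = best_trial(trials)
r = pearson([t["text_det_thresh"] for t in trials], [t["CER"] for t in trials])

# Per-page times quoted in the file list above (CPU vs GPU).
speedup = 69.4 / 0.84
print(f"best threshold: {best['text_det_thresh']}, r = {r:.2f}, speedup = {speedup:.1f}x")
```

With the real CSV, the same `pearson` call on the 64 trial rows would be the natural way to check the quoted -0.52 figure, provided the column names match.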
@@ -128,13 +135,13 @@ The template (`plantilla_individual.pdf`) requires **5 chapters**. The docs/ fil

 ### Priority Tasks
 1. **Validate on other document types** - Test optimal config on invoices, forms, contracts
-2. **Expand dataset** - Current dataset has only 24 pages
+2. **Use larger tuning subset** - Current 5 pages caused overfitting; recommend 15-20 pages
 3. **Create presentation slides** - For thesis defense
 4. **Final document review** - Open in Word, update indices (Ctrl+A, F9), verify formatting

 ### Optional Extensions
 - Explore `text_det_unclip_ratio` parameter (was fixed at 0.0)
-- Compare with actual fine-tuning (if GPU access obtained)
+- Compare with actual fine-tuning
 - Multi-objective optimization (CER + WER + inference time)

 ## Thesis Document Generation
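The multi-objective extension listed above (CER + WER + inference time) could be approached by keeping only the non-dominated trials instead of ranking by CER alone. The following is a minimal sketch of Pareto filtering, assuming all three objectives are to be minimized. The `Trial` type and the three example trials are hypothetical placeholders, not measured results.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    """One tuning trial with three objectives, all to be minimized."""
    name: str
    cer: float
    wer: float
    seconds_per_page: float


def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    a_obj, b_obj = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(trials: List[Trial]) -> List[Trial]:
    """Keep the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(other, t) for other in trials)]


# Placeholder trials for illustration only.
candidates = [
    Trial("a", cer=1.0, wer=5.0, seconds_per_page=0.9),
    Trial("b", cer=2.0, wer=4.0, seconds_per_page=0.8),
    Trial("c", cer=2.5, wer=6.0, seconds_per_page=1.0),  # dominated by "a"
]

front = pareto_front(candidates)
print([t.name for t in front])  # ['a', 'b']
```

A weighted sum of the three metrics would be a simpler alternative, at the cost of having to choose the weights up front.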