# 🧠 Intelligent OCR System for Scanned PDF Documents
**Master's Thesis Software Development Project**
**Research line:** Computational perception & machine learning
**Author:** Sergio Jiménez
**Institution:** [UNIR - Universidad Internacional de La Rioja](https://www.unir.net/ingenieria/master-inteligencia-artificial/)
**Date:** 2025
---
## 📘 Overview
This project develops an **intelligent system for text extraction from scanned PDF documents**, combining **computer vision techniques** and **modern OCR models based on deep learning**.
The goal is to overcome the limitations of traditional OCR tools (e.g., Tesseract) when dealing with **low-quality, skewed, or noisy scanned documents**, particularly in **Spanish**.
---
## 🎯 Objectives
- Develop a **modular OCR pipeline** that processes scanned PDFs end-to-end.
- Compare classical OCR tools with **state-of-the-art deep learning approaches** (EasyOCR, TrOCR, CRNN).
- Evaluate performance using **Character Error Rate (CER)** and **Word Error Rate (WER)**.
- Provide a **CLI-based demonstration tool** and analysis module for automated evaluation.
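
To illustrate the two evaluation metrics named above, here is a minimal sketch in Python. It assumes the standard definitions: Levenshtein edit distance divided by the reference length, counted over characters for CER and over whitespace-separated words for WER. The function names (`edit_distance`, `cer`, `wer`) are hypothetical and are not taken from this project's code. No text normalization (case, accents, punctuation) is applied, so a real evaluation module may differ.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits / reference length in characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits / reference length in words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example: the OCR output drops the tilde in "niño".
    truth = "el niño come"
    ocr_output = "el nino come"
    print(f"CER = {cer(truth, ocr_output):.4f}")
    print(f"WER = {wer(truth, ocr_output):.4f}")
```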
---
## 🧩 System Architecture
TODO