Readme

.gitignore (new file, 1 line):
~$*.docx

README.md (modified: the previous 2-line version, whose heading was `# MastersThesis`, is replaced by the 30 lines below):

# 🧠 Intelligent OCR System for Scanned PDF Documents

**Master’s Thesis – Software Development Project**

**Research line:** Computational Perception & Machine Learning

**Author:** Sergio Jiménez

**Institution:** [UNIR - Universidad Internacional de La Rioja](https://www.unir.net/ingenieria/master-inteligencia-artificial/)

**Date:** 2025

---

## 📘 Overview

This project develops an **intelligent system for text extraction from scanned PDF documents**, combining **computer vision techniques** and **modern OCR models based on deep learning**.

The goal is to overcome the limitations of traditional OCR tools (e.g., Tesseract) when dealing with **low-quality, skewed, or noisy scanned documents**, particularly in **Spanish**.

---

## 🎯 Objectives

- Develop a **modular OCR pipeline** that processes scanned PDFs end-to-end.
- Compare classical OCR tools with **state-of-the-art deep learning approaches** (EasyOCR, TrOCR, CRNN).
- Evaluate performance using **Character Error Rate (CER)** and **Word Error Rate (WER)**.
- Provide a **CLI-based demonstration tool** and an **analysis module** for automated evaluation.
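
Both evaluation metrics named above divide an edit distance by the length of the reference (ground-truth) text: with S substitutions, D deletions and I insertions needed to turn the reference into the OCR output, and N reference units, the error rate is (S + D + I) / N, counted over characters for CER and over words for WER. The sketch below is a minimal, dependency-free Python version that assumes plain Levenshtein distance with no text normalization (no lowercasing, accent folding or punctuation stripping, which matter for Spanish); the function names `levenshtein`, `cer` and `wer` are hypothetical and are not taken from this project's analysis module.

```python
from typing import Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] = edit distance between the first i-1 ref items and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # curr[0]: delete all i ref items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution (free if r == h)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical OCR output with two misread characters (m -> n, o -> 0)
    ref = "El informe fue aprobado"
    hyp = "El inforne fue aprobad0"
    print(f"CER = {cer(ref, hyp):.3f}")  # CER = 0.087  (2 / 23 characters)
    print(f"WER = {wer(ref, hyp):.3f}")  # WER = 0.500  (2 / 4 words)
```

The example shows why the two rates are usually reported together: two single-character misreads barely move CER, but each one makes a whole word wrong, so WER jumps to 0.5.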

---

## 🧩 System Architecture

TODO

thesis_report.docx (new binary file; contents not shown)