Readme

2025-10-08 09:45:22 +02:00
parent 72383de9a8
commit 30a352efea
3 changed files with 30 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,30 @@
-# MastersThesis
+# 🧠 Intelligent OCR System for Scanned PDF Documents
+
+**Master’s Thesis – Software Development Project**  
+**Line of work:** Computational Perception & Machine Learning  
+**Author:** Sergio Jiménez   
+**Institution:** [UNIR - Universidad Internacional de La Rioja](https://www.unir.net/ingenieria/master-inteligencia-artificial/)  
+**Date:** 2025  
+
+---
+
+## 📘 Overview
+
+This project develops an **intelligent system for text extraction from scanned PDF documents**, combining **computer vision techniques** and **modern OCR models based on deep learning**.  
+The goal is to overcome the limitations of traditional OCR tools (e.g., Tesseract) when dealing with **low-quality, skewed, or noisy scanned documents**, particularly in **Spanish**.
+
+---
+
+## 🎯 Objectives
+
+- Develop a **modular OCR pipeline** that processes scanned PDFs end-to-end.
+- Compare classical OCR tools with **state-of-the-art deep learning approaches** (EasyOCR, TrOCR, CRNN).
+- Evaluate performance using **Character Error Rate (CER)** and **Word Error Rate (WER)**.
+- Provide a **CLI-based demonstration tool** and an analysis module for automated evaluation.
+
+---
+
+## 🧩 System Architecture
+
+TODO
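
The Objectives in this README name **Character Error Rate (CER)** and **Word Error Rate (WER)** as the evaluation metrics. Both are conventionally defined as the Levenshtein edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth (in characters for CER, in words for WER). The sketch below is a minimal pure-Python illustration of that standard definition, not the thesis implementation. The function names `edit_distance`, `cer`, and `wer` are hypothetical, and whitespace tokenization for WER is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For Spanish text, a dropped diacritic counts as one character substitution: `cer("niño", "nino")` gives 0.25, while `wer("el niño come", "el nino come")` gives one wrong word out of three. Both rates can exceed 1.0 when the OCR output is much longer than the reference.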