MastersThesis/paddle_ocr_fine_tune_unir.ipynb

2025-11-17 10:52:00 +00:00
{
"cells": [
{
"cell_type": "markdown",
"id": "be3c1872",
"metadata": {},
"source": [
"# AI-based OCR Benchmark Notebook\n",
"\n",
"This notebook benchmarks **AI-based OCR models** on scanned Spanish-language PDF documents and images.\n",
"It excludes traditional OCR engines, such as Tesseract, that require a separate external installation."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6a1e98fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: pip in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (25.3)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: jupyter in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.1.1)\n",
"Requirement already satisfied: notebook in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (7.4.7)\n",
"Requirement already satisfied: jupyter-console in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (6.6.3)\n",
"Requirement already satisfied: nbconvert in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (7.16.6)\n",
"Requirement already satisfied: ipykernel in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (7.1.0)\n",
"Requirement already satisfied: ipywidgets in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (8.1.8)\n",
"Requirement already satisfied: jupyterlab in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (4.4.10)\n",
"Requirement already satisfied: comm>=0.1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (0.2.3)\n",
"Requirement already satisfied: debugpy>=1.6.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (1.8.17)\n",
"Requirement already satisfied: ipython>=7.23.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (9.7.0)\n",
"Requirement already satisfied: jupyter-client>=8.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (8.6.3)\n",
"Requirement already satisfied: jupyter-core!=5.0.*,>=4.12 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (5.9.1)\n",
"Requirement already satisfied: matplotlib-inline>=0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (0.2.1)\n",
"Requirement already satisfied: nest-asyncio>=1.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (1.6.0)\n",
"Requirement already satisfied: packaging>=22 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (25.0)\n",
"Requirement already satisfied: psutil>=5.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (7.1.3)\n",
"Requirement already satisfied: pyzmq>=25 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (27.1.0)\n",
"Requirement already satisfied: tornado>=6.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (6.5.2)\n",
"Requirement already satisfied: traitlets>=5.4.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (5.14.3)\n",
"Requirement already satisfied: colorama>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (0.4.6)\n",
"Requirement already satisfied: decorator>=4.3.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (5.2.1)\n",
"Requirement already satisfied: ipython-pygments-lexers>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (1.1.1)\n",
"Requirement already satisfied: jedi>=0.18.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (0.19.2)\n",
"Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (3.0.52)\n",
"Requirement already satisfied: pygments>=2.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (2.19.2)\n",
"Requirement already satisfied: stack_data>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (0.6.3)\n",
"Requirement already satisfied: typing_extensions>=4.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (4.15.0)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=7.23.1->ipykernel->jupyter) (0.2.14)\n",
"Requirement already satisfied: parso<0.9.0,>=0.8.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jedi>=0.18.1->ipython>=7.23.1->ipykernel->jupyter) (0.8.5)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-client>=8.0.0->ipykernel->jupyter) (2.9.0.post0)\n",
"Requirement already satisfied: platformdirs>=2.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-core!=5.0.*,>=4.12->ipykernel->jupyter) (4.5.0)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->jupyter-client>=8.0.0->ipykernel->jupyter) (1.17.0)\n",
"Requirement already satisfied: executing>=1.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel->jupyter) (2.2.1)\n",
"Requirement already satisfied: asttokens>=2.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel->jupyter) (3.0.0)\n",
"Requirement already satisfied: pure-eval in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel->jupyter) (0.2.3)\n",
"Requirement already satisfied: widgetsnbextension~=4.0.14 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets->jupyter) (4.0.15)\n",
"Requirement already satisfied: jupyterlab_widgets~=3.0.15 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets->jupyter) (3.0.16)\n",
"Requirement already satisfied: async-lru>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.0.5)\n",
"Requirement already satisfied: httpx<1,>=0.25.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (0.28.1)\n",
"Requirement already satisfied: jinja2>=3.0.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (3.1.6)\n",
"Requirement already satisfied: jupyter-lsp>=2.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.3.0)\n",
"Requirement already satisfied: jupyter-server<3,>=2.4.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.17.0)\n",
"Requirement already satisfied: jupyterlab-server<3,>=2.27.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.28.0)\n",
"Requirement already satisfied: notebook-shim>=0.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (0.2.4)\n",
"Requirement already satisfied: setuptools>=41.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (65.5.0)\n",
"Requirement already satisfied: anyio in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (4.11.0)\n",
"Requirement already satisfied: certifi in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (2025.10.5)\n",
"Requirement already satisfied: httpcore==1.* in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (1.0.9)\n",
"Requirement already satisfied: idna in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (3.11)\n",
"Requirement already satisfied: h11>=0.16 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpcore==1.*->httpx<1,>=0.25.0->jupyterlab->jupyter) (0.16.0)\n",
"Requirement already satisfied: argon2-cffi>=21.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (25.1.0)\n",
"Requirement already satisfied: jupyter-events>=0.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.12.0)\n",
"Requirement already satisfied: jupyter-server-terminals>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.5.3)\n",
"Requirement already satisfied: nbformat>=5.3.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (5.10.4)\n",
"Requirement already satisfied: overrides>=5.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (7.7.0)\n",
"Requirement already satisfied: prometheus-client>=0.9 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.23.1)\n",
"Requirement already satisfied: pywinpty>=2.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (3.0.2)\n",
"Requirement already satisfied: send2trash>=1.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.8.3)\n",
"Requirement already satisfied: terminado>=0.8.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.18.1)\n",
"Requirement already satisfied: websocket-client>=1.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.9.0)\n",
"Requirement already satisfied: babel>=2.10 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2.17.0)\n",
"Requirement already satisfied: json5>=0.9.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (0.12.1)\n",
"Requirement already satisfied: jsonschema>=4.18.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (4.25.1)\n",
"Requirement already satisfied: requests>=2.31 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2.32.5)\n",
"Requirement already satisfied: sniffio>=1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from anyio->httpx<1,>=0.25.0->jupyterlab->jupyter) (1.3.1)\n",
"Requirement already satisfied: argon2-cffi-bindings in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (25.1.0)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jinja2>=3.0.3->jupyterlab->jupyter) (3.0.3)\n",
"Requirement already satisfied: attrs>=22.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (25.4.0)\n",
"Requirement already satisfied: jsonschema-specifications>=2023.03.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2025.9.1)\n",
"Requirement already satisfied: referencing>=0.28.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (0.37.0)\n",
"Requirement already satisfied: rpds-py>=0.7.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (0.28.0)\n",
"Requirement already satisfied: python-json-logger>=2.0.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (4.0.0)\n",
"Requirement already satisfied: pyyaml>=5.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (6.0.2)\n",
"Requirement already satisfied: rfc3339-validator in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.1.4)\n",
"Requirement already satisfied: rfc3986-validator>=0.1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.1.1)\n",
"Requirement already satisfied: fqdn in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.5.1)\n",
"Requirement already satisfied: isoduration in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (20.11.0)\n",
"Requirement already satisfied: jsonpointer>1.13 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (3.0.0)\n",
"Requirement already satisfied: rfc3987-syntax>=1.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.1.0)\n",
"Requirement already satisfied: uri-template in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.3.0)\n",
"Requirement already satisfied: webcolors>=24.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (25.10.0)\n",
"Requirement already satisfied: beautifulsoup4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (4.14.2)\n",
"Requirement already satisfied: bleach!=5.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bleach[css]!=5.0.0->nbconvert->jupyter) (6.3.0)\n",
"Requirement already satisfied: defusedxml in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (0.7.1)\n",
"Requirement already satisfied: jupyterlab-pygments in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (0.3.0)\n",
"Requirement already satisfied: mistune<4,>=2.0.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (3.1.4)\n",
"Requirement already satisfied: nbclient>=0.5.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (0.10.2)\n",
"Requirement already satisfied: pandocfilters>=1.4.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (1.5.1)\n",
"Requirement already satisfied: webencodings in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bleach!=5.0.0->bleach[css]!=5.0.0->nbconvert->jupyter) (0.5.1)\n",
"Requirement already satisfied: tinycss2<1.5,>=1.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bleach[css]!=5.0.0->nbconvert->jupyter) (1.4.0)\n",
"Requirement already satisfied: fastjsonschema>=2.15 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbformat>=5.3.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2.21.2)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests>=2.31->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (3.4.4)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests>=2.31->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2.5.0)\n",
"Requirement already satisfied: lark>=1.2.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from rfc3987-syntax>=1.1.0->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.3.1)\n",
"Requirement already satisfied: cffi>=1.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from argon2-cffi-bindings->argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2.0.0)\n",
"Requirement already satisfied: pycparser in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2.23)\n",
"Requirement already satisfied: soupsieve>1.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from beautifulsoup4->nbconvert->jupyter) (2.8)\n",
"Requirement already satisfied: arrow>=0.15.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.4.0)\n",
"Requirement already satisfied: tzdata in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from arrow>=0.15.0->isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2025.2)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: ipywidgets in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (8.1.8)\n",
"Requirement already satisfied: comm>=0.1.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (0.2.3)\n",
"Requirement already satisfied: ipython>=6.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (9.7.0)\n",
"Requirement already satisfied: traitlets>=4.3.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (5.14.3)\n",
"Requirement already satisfied: widgetsnbextension~=4.0.14 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (4.0.15)\n",
"Requirement already satisfied: jupyterlab_widgets~=3.0.15 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (3.0.16)\n",
"Requirement already satisfied: colorama>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.4.6)\n",
"Requirement already satisfied: decorator>=4.3.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (5.2.1)\n",
"Requirement already satisfied: ipython-pygments-lexers>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (1.1.1)\n",
"Requirement already satisfied: jedi>=0.18.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.19.2)\n",
"Requirement already satisfied: matplotlib-inline>=0.1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.2.1)\n",
"Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (3.0.52)\n",
"Requirement already satisfied: pygments>=2.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (2.19.2)\n",
"Requirement already satisfied: stack_data>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.6.3)\n",
"Requirement already satisfied: typing_extensions>=4.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (4.15.0)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=6.1.0->ipywidgets) (0.2.14)\n",
"Requirement already satisfied: parso<0.9.0,>=0.8.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jedi>=0.18.1->ipython>=6.1.0->ipywidgets) (0.8.5)\n",
"Requirement already satisfied: executing>=1.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=6.1.0->ipywidgets) (2.2.1)\n",
"Requirement already satisfied: asttokens>=2.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=6.1.0->ipywidgets) (3.0.0)\n",
"Requirement already satisfied: pure-eval in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=6.1.0->ipywidgets) (0.2.3)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: ipykernel in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (7.1.0)\n",
"Requirement already satisfied: comm>=0.1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (0.2.3)\n",
"Requirement already satisfied: debugpy>=1.6.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (1.8.17)\n",
"Requirement already satisfied: ipython>=7.23.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (9.7.0)\n",
"Requirement already satisfied: jupyter-client>=8.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (8.6.3)\n",
"Requirement already satisfied: jupyter-core!=5.0.*,>=4.12 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (5.9.1)\n",
"Requirement already satisfied: matplotlib-inline>=0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (0.2.1)\n",
"Requirement already satisfied: nest-asyncio>=1.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (1.6.0)\n",
"Requirement already satisfied: packaging>=22 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (25.0)\n",
"Requirement already satisfied: psutil>=5.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (7.1.3)\n",
"Requirement already satisfied: pyzmq>=25 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (27.1.0)\n",
"Requirement already satisfied: tornado>=6.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (6.5.2)\n",
"Requirement already satisfied: traitlets>=5.4.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (5.14.3)\n",
"Requirement already satisfied: colorama>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (0.4.6)\n",
"Requirement already satisfied: decorator>=4.3.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (5.2.1)\n",
"Requirement already satisfied: ipython-pygments-lexers>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (1.1.1)\n",
"Requirement already satisfied: jedi>=0.18.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (0.19.2)\n",
"Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (3.0.52)\n",
"Requirement already satisfied: pygments>=2.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (2.19.2)\n",
"Requirement already satisfied: stack_data>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (0.6.3)\n",
"Requirement already satisfied: typing_extensions>=4.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (4.15.0)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=7.23.1->ipykernel) (0.2.14)\n",
"Requirement already satisfied: parso<0.9.0,>=0.8.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jedi>=0.18.1->ipython>=7.23.1->ipykernel) (0.8.5)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-client>=8.0.0->ipykernel) (2.9.0.post0)\n",
"Requirement already satisfied: platformdirs>=2.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-core!=5.0.*,>=4.12->ipykernel) (4.5.0)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->jupyter-client>=8.0.0->ipykernel) (1.17.0)\n",
"Requirement already satisfied: executing>=1.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel) (2.2.1)\n",
"Requirement already satisfied: asttokens>=2.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel) (3.0.0)\n",
"Requirement already satisfied: pure-eval in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel) (0.2.3)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: transformers in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (4.57.1)\n",
"Requirement already satisfied: torch in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (2.9.0)\n",
"Requirement already satisfied: pdf2image in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.17.0)\n",
"Requirement already satisfied: pillow in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (12.0.0)\n",
"Requirement already satisfied: jiwer in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (4.0.0)\n",
"Requirement already satisfied: paddleocr in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (3.3.1)\n",
"Requirement already satisfied: hf_xet in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.2.0)\n",
"Requirement already satisfied: paddlepaddle in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (3.2.1)\n",
"Requirement already satisfied: filelock in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (3.20.0)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (0.36.0)\n",
"Requirement already satisfied: numpy>=1.17 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (2.3.4)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (25.0)\n",
"Requirement already satisfied: pyyaml>=5.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (6.0.2)\n",
"Requirement already satisfied: regex!=2019.12.17 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (2025.11.3)\n",
"Requirement already satisfied: requests in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (2.32.5)\n",
"Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (0.22.1)\n",
"Requirement already satisfied: safetensors>=0.4.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (0.6.2)\n",
"Requirement already satisfied: tqdm>=4.27 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (4.67.1)\n",
"Requirement already satisfied: fsspec>=2023.5.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (2025.10.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (4.15.0)\n",
"Requirement already satisfied: sympy>=1.13.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from torch) (1.14.0)\n",
"Requirement already satisfied: networkx>=2.5.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from torch) (3.5)\n",
"Requirement already satisfied: jinja2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from torch) (3.1.6)\n",
"Requirement already satisfied: click>=8.1.8 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jiwer) (8.2.1)\n",
"Requirement already satisfied: rapidfuzz>=3.9.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jiwer) (3.14.3)\n",
"Requirement already satisfied: paddlex<3.4.0,>=3.3.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (3.3.9)\n",
"Requirement already satisfied: aistudio-sdk>=0.3.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.3.8)\n",
"Requirement already satisfied: chardet in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (5.2.0)\n",
"Requirement already satisfied: colorlog in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (6.10.1)\n",
"Requirement already satisfied: modelscope>=1.28.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.31.0)\n",
"Requirement already satisfied: pandas>=1.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.3.3)\n",
"Requirement already satisfied: prettytable in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (3.16.0)\n",
"Requirement already satisfied: py-cpuinfo in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (9.0.0)\n",
"Requirement already satisfied: pydantic>=2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.12.4)\n",
"Requirement already satisfied: ruamel.yaml in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.18.16)\n",
"Requirement already satisfied: ujson in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (5.11.0)\n",
"Requirement already satisfied: imagesize in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.4.1)\n",
"Requirement already satisfied: opencv-contrib-python==4.10.0.84 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (4.10.0.84)\n",
"Requirement already satisfied: pyclipper in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.3.0.post6)\n",
"Requirement already satisfied: pypdfium2>=4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (5.0.0)\n",
"Requirement already satisfied: python-bidi in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.6.7)\n",
"Requirement already satisfied: shapely in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.1.2)\n",
"Requirement already satisfied: httpx in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlepaddle) (0.28.1)\n",
"Requirement already satisfied: protobuf>=3.20.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlepaddle) (6.33.0)\n",
"Requirement already satisfied: opt-einsum==3.3.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlepaddle) (3.3.0)\n",
"Requirement already satisfied: psutil in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (7.1.3)\n",
"Requirement already satisfied: bce-python-sdk in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.9.50)\n",
"Requirement already satisfied: colorama in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from click>=8.1.8->jiwer) (0.4.6)\n",
"Requirement already satisfied: setuptools in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from modelscope>=1.28.0->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (65.5.0)\n",
"Requirement already satisfied: urllib3>=1.26 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from modelscope>=1.28.0->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.5.0)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2025.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2025.2)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pydantic>=2->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.41.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pydantic>=2->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.41.5)\n",
"Requirement already satisfied: typing-inspection>=0.4.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pydantic>=2->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.4.2)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.17.0)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests->transformers) (3.4.4)\n",
"Requirement already satisfied: idna<4,>=2.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests->transformers) (3.11)\n",
"Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests->transformers) (2025.10.5)\n",
"Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from sympy>=1.13.3->torch) (1.3.0)\n",
"Requirement already satisfied: pycryptodome>=3.8.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bce-python-sdk->aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (3.23.0)\n",
"Requirement already satisfied: future>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bce-python-sdk->aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.0.0)\n",
"Requirement already satisfied: anyio in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx->paddlepaddle) (4.11.0)\n",
"Requirement already satisfied: httpcore==1.* in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx->paddlepaddle) (1.0.9)\n",
"Requirement already satisfied: h11>=0.16 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpcore==1.*->httpx->paddlepaddle) (0.16.0)\n",
"Requirement already satisfied: sniffio>=1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from anyio->httpx->paddlepaddle) (1.3.1)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jinja2->torch) (3.0.3)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prettytable->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.2.14)\n",
"Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ruamel.yaml->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.2.14)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: PyMuPDF in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.26.6)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: pandas in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (2.3.3)\n",
"Requirement already satisfied: numpy>=1.23.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2.3.4)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2025.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2025.2)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: matplotlib in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (3.10.7)\n",
"Requirement already satisfied: contourpy>=1.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (1.3.3)\n",
"Requirement already satisfied: cycler>=0.10 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (0.12.1)\n",
"Requirement already satisfied: fonttools>=4.22.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (4.60.1)\n",
"Requirement already satisfied: kiwisolver>=1.3.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (1.4.9)\n",
"Requirement already satisfied: numpy>=1.23 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (2.3.4)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (25.0)\n",
"Requirement already satisfied: pillow>=8 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (12.0.0)\n",
"Requirement already satisfied: pyparsing>=3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (3.2.5)\n",
"Requirement already satisfied: python-dateutil>=2.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (2.9.0.post0)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.7->matplotlib) (1.17.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: seaborn in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (0.13.2)\n",
"Requirement already satisfied: numpy!=1.24.0,>=1.20 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from seaborn) (2.3.4)\n",
"Requirement already satisfied: pandas>=1.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from seaborn) (2.3.3)\n",
"Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from seaborn) (3.10.7)\n",
"Requirement already satisfied: contourpy>=1.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.3.3)\n",
"Requirement already satisfied: cycler>=0.10 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.12.1)\n",
"Requirement already satisfied: fonttools>=4.22.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.60.1)\n",
"Requirement already satisfied: kiwisolver>=1.3.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.9)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (25.0)\n",
"Requirement already satisfied: pillow>=8 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (12.0.0)\n",
"Requirement already satisfied: pyparsing>=3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.5)\n",
"Requirement already satisfied: python-dateutil>=2.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.2->seaborn) (2025.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.2->seaborn) (2025.2)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.17.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade pip\n",
"%pip install --upgrade jupyter\n",
"%pip install --upgrade ipywidgets\n",
"%pip install --upgrade ipykernel\n",
"\n",
"# Install necessary packages\n",
"%pip install transformers torch pdf2image pillow jiwer paddleocr hf_xet paddlepaddle\n",
"# pdf reading\n",
"%pip install PyMuPDF\n",
"\n",
"# Data analysis and visualization\n",
"%pip install pandas\n",
"%pip install matplotlib\n",
"%pip install seaborn"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ae33632a",
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from pdf2image import convert_from_path\n",
"from PIL import Image, ImageOps\n",
"import torch\n",
"from jiwer import wer, cer\n",
"from paddleocr import PaddleOCR\n",
"import fitz # PyMuPDF\n",
"import re\n",
"from datetime import datetime"
]
},
{
"cell_type": "markdown",
"id": "0e00f1b0",
"metadata": {},
"source": [
"## 1 Configuration"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"PDF_FOLDER = './instructions' # Folder containing PDF files\n",
"OUTPUT_FOLDER = 'results'\n",
"os.makedirs(OUTPUT_FOLDER, exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "243849b9",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\sji\\AppData\\Local\\Temp\\ipykernel_5520\\2286581791.py:7: UserWarning: `lang` and `ocr_version` will be ignored when model names or model directories are not `None`.\n",
" paddleocr_model = PaddleOCR(\n",
"c:\\Users\\sji\\Desktop\\MastersThesis\\.venv\\Lib\\site-packages\\paddle\\utils\\cpp_extension\\extension_utils.py:718: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md\n",
" warnings.warn(warning_message)\n",
"\u001b[32mCreating model: ('PP-LCNet_x1_0_textline_ori', None)\u001b[0m\n",
"\u001b[32mModel files already exist. Using cached files. To redownload, please delete the directory manually: `C:\\Users\\sji\\.paddlex\\official_models\\PP-LCNet_x1_0_textline_ori`.\u001b[0m\n",
"\u001b[32mCreating model: ('PP-OCRv5_server_det', None)\u001b[0m\n",
"\u001b[32mModel files already exist. Using cached files. To redownload, please delete the directory manually: `C:\\Users\\sji\\.paddlex\\official_models\\PP-OCRv5_server_det`.\u001b[0m\n",
"\u001b[32mCreating model: ('PP-OCRv5_server_rec', None)\u001b[0m\n",
"\u001b[32mModel files already exist. Using cached files. To redownload, please delete the directory manually: `C:\\Users\\sji\\.paddlex\\official_models\\PP-OCRv5_server_rec`.\u001b[0m\n"
]
}
],
"source": [
"# PaddleOCR\n",
"# https://www.paddleocr.ai/v3.0.0/en/version3.x/pipeline_usage/OCR.html?utm_source=chatgpt.com#21-command-line\n",
"from paddleocr import PaddleOCR\n",
"\n",
"# Initialize with settings intended for Spanish/Latin text\n",
"# https://www.paddleocr.ai/main/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html?utm_source=chatgpt.com#5-models-and-their-supported-languages\n",
"paddleocr_model = PaddleOCR(\n",
"    text_detection_model_name=\"PP-OCRv5_server_det\",\n",
"    text_recognition_model_name=\"PP-OCRv5_server_rec\",\n",
"    use_doc_orientation_classify=False,\n",
"    use_doc_unwarping=False,\n",
"    use_textline_orientation=True,\n",
"    lang='es',  # ignored when model names are set explicitly (see the UserWarning in this cell's output)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "329da34a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.3.1\n"
]
}
],
"source": [
"import paddleocr\n",
"\n",
"print(paddleocr.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6082e2df",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Exported: paddleocr_pipeline_dump.yaml\n"
]
}
],
"source": [
"# 1) Dump the active paddlex pipeline config to a YAML file\n",
"yaml_path = \"paddleocr_pipeline_dump.yaml\"\n",
"paddleocr_model.export_paddlex_config_to_yaml(yaml_path)\n",
"print(\"Exported:\", yaml_path)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b1541bb6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"c:\\Users\\sji\\Desktop\\MastersThesis\\.venv\\Lib\\site-packages\\paddleocr\n"
]
}
],
"source": [
"# 2) Locate the installed PaddleOCR package\n",
"pkg_dir = os.path.dirname(paddleocr.__file__)\n",
"print(pkg_dir)"
]
},
{
"cell_type": "markdown",
"id": "84c999e2",
"metadata": {},
"source": [
"## 2 Helper Functions"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d8bddf8f",
"metadata": {},
"outputs": [],
"source": [
"# preprocess_image.py\n",
"import cv2\n",
"import numpy as np\n",
"from PIL import Image\n",
"\n",
"def preprocess_for_ocr(pil_image):\n",
"    \"\"\"\n",
"    Preprocessing tuned for PaddleOCR.\n",
"\n",
"    Args:\n",
"        pil_image (PIL.Image.Image): Input PIL image\n",
"\n",
"    Returns:\n",
"        PIL.Image.Image: Preprocessed image in RGB mode\n",
"    \"\"\"\n",
"\n",
"    # Convert the PIL image to a numpy array\n",
"    img_array = np.array(pil_image)\n",
"\n",
"    # Convert to grayscale (handles RGBA, RGB, and already-grayscale input)\n",
"    if img_array.ndim == 3 and img_array.shape[-1] == 4:\n",
"        gray = cv2.cvtColor(img_array, cv2.COLOR_RGBA2GRAY)\n",
"    elif img_array.ndim == 3 and img_array.shape[-1] == 3:\n",
"        gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)\n",
"    else:\n",
"        gray = img_array\n",
"\n",
"    # Upscale small images\n",
"    height, width = gray.shape\n",
"    if height < 1000:\n",
"        scale = 1500 / height\n",
"        new_width = int(width * scale)\n",
"        new_height = int(height * scale)\n",
"        gray = cv2.resize(gray, (new_width, new_height),\n",
"                          interpolation=cv2.INTER_CUBIC)\n",
"\n",
"    # Adaptive binarization\n",
"    binary = cv2.adaptiveThreshold(\n",
"        gray, 255,\n",
"        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,\n",
"        cv2.THRESH_BINARY,\n",
"        11, 2\n",
"    )\n",
"\n",
"    # Denoise\n",
"    denoised = cv2.fastNlMeansDenoising(binary, None, 10, 7, 21)\n",
"\n",
"    # Dilate (a 1x1 kernel is a no-op; use a larger kernel to change stroke width)\n",
"    kernel = np.ones((1,1), np.uint8)\n",
"    dilated = cv2.dilate(denoised, kernel, iterations=1)\n",
"\n",
"    # Convert to RGB\n",
"    rgb_img = cv2.cvtColor(dilated, cv2.COLOR_GRAY2RGB)\n",
"\n",
"    # Convert back to a PIL image\n",
"    pil_img = Image.fromarray(rgb_img)\n",
"\n",
"    return pil_img  # PIL.Image.Image in RGB mode"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9596c7df",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"def show_page(img: Image.Image, text: str, scale: float = 1):\n",
"    \"\"\"\n",
"    Displays the page image at a figure size derived from its pixel dimensions.\n",
"    The OCR text footer is currently commented out, so `text` is unused.\n",
"    \"\"\"\n",
"    # Compute plot size based on image dimensions (but without resizing the image)\n",
"    w, h = img.size\n",
"    figsize = (w * scale / 100, h * scale / 100)  # convert pixels to inches approx\n",
"\n",
"    fig, ax = plt.subplots(figsize=figsize)\n",
"    ax.imshow(img)\n",
"    ax.axis(\"off\")\n",
"\n",
"\n",
"    # Add OCR text below the image (footer)\n",
"    # plt.figtext(0.5, 0.02, text.strip(), wrap=True, ha='center', va='bottom', fontsize=10)\n",
"    plt.tight_layout()\n",
"    plt.show()\n",
"\n",
"def pdf_to_images(pdf_path: str, dpi: int = 300, pages: Optional[List[int]] = None) -> List[Image.Image]:\n",
"    \"\"\"\n",
"    Render a PDF into a list of PIL Images using PyMuPDF.\n",
"    'pages' is 1-based (e.g., range(1, 10) -> pages 1-9).\n",
"    \"\"\"\n",
"    images = []\n",
"\n",
"    if fitz is not None:\n",
"        doc = fitz.open(pdf_path)\n",
"        total_pages = len(doc)\n",
"\n",
"        # Adjust page indices (PyMuPDF uses 0-based indexing)\n",
"        if pages is None:\n",
"            page_indices = list(range(total_pages))\n",
"        else:\n",
"            # Filter out invalid pages and convert to 0-based\n",
"            page_indices = [p - 1 for p in pages if 1 <= p <= total_pages]\n",
"\n",
"        for i in page_indices:\n",
"            page = doc.load_page(i)\n",
"            mat = fitz.Matrix(dpi / 72.0, dpi / 72.0)\n",
"            pix = page.get_pixmap(matrix=mat, alpha=False)\n",
"            img = Image.frombytes(\"RGB\", [pix.width, pix.height], pix.samples)\n",
"\n",
"            images.append(img)\n",
"        doc.close()\n",
"    else:\n",
"        raise RuntimeError(\"Install PyMuPDF to convert PDFs.\")\n",
"\n",
"    return images\n",
"\n",
"def pdf_extract_text(pdf_path, page_num, line_tolerance=15) -> str:\n",
"    \"\"\"\n",
"    Extracts text from a specific PDF page in proper reading order.\n",
"    Adds '\\n' when blocks are vertically separated more than line_tolerance.\n",
"    Removes bullet-like characters (•, ▪, etc.).\n",
"    \"\"\"\n",
"    doc = fitz.open(pdf_path)\n",
"\n",
"    if page_num < 1 or page_num > len(doc):\n",
"        doc.close()  # release the file handle before the early return\n",
"        return \"\"\n",
"\n",
"    page = doc[page_num - 1]\n",
"    blocks = page.get_text(\"blocks\")  # (x0, y0, x1, y1, text, block_no, block_type)\n",
"\n",
"    # Sort blocks: top-to-bottom, left-to-right\n",
"    blocks_sorted = sorted(blocks, key=lambda b: (b[1], b[0]))\n",
"\n",
"    text_lines = []\n",
"    last_y = None\n",
"\n",
"    for b in blocks_sorted:\n",
"        y0 = b[1]\n",
"        text_block = b[4].strip()\n",
"\n",
"        # Remove bullet-like characters\n",
"        text_block = re.sub(r\"[•▪◦●❖▶■]\", \"\", text_block)\n",
"\n",
"        # If new line (based on vertical gap)\n",
"        if last_y is not None and abs(y0 - last_y) > line_tolerance:\n",
"            text_lines.append(\"\")  # blank line for spacing\n",
"\n",
"        text_lines.append(text_block.strip())\n",
"        last_y = y0\n",
"\n",
"    # Join all lines with real newlines\n",
"    text = \"\\n\".join(text_lines)\n",
"\n",
"    # Normalize spaces\n",
"    text = re.sub(r\"\\s*\\n\\s*\", \"\\n\", text).strip()  # remove spaces around newlines\n",
"    text = re.sub(r\" +\", \" \", text).strip()  # collapse multiple spaces to one\n",
"    text = re.sub(r\"\\n{3,}\", \"\\n\\n\", text).strip()  # avoid triple blank lines\n",
"\n",
"    doc.close()\n",
"    return text\n",
"\n",
"def evaluate_text(reference, prediction):\n",
"    return {'WER': wer(reference, prediction), 'CER': cer(reference, prediction)}"
]
},
{
"cell_type": "markdown",
"id": "a79cc4bf",
"metadata": {},
"source": [
"## 3 Model wrappers"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c56f3de2",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import re\n",
"\n",
"def _normalize_box_xyxy(box):\n",
"    \"\"\"\n",
"    Accepts:\n",
"      - [[x,y],[x,y],[x,y],[x,y]] (quad)\n",
"      - [x0, y0, x1, y1] (flat)\n",
"      - [x0, y0, x1, y1, x2, y2, x3, y3] (flat quad)\n",
"    Returns (x0, y0, x1, y1)\n",
"    \"\"\"\n",
"    # Quad as list of points?\n",
"    if isinstance(box, (list, tuple)) and box and isinstance(box[0], (list, tuple)):\n",
"        xs = [p[0] for p in box]\n",
"        ys = [p[1] for p in box]\n",
"        return min(xs), min(ys), max(xs), max(ys)\n",
"\n",
"    # Flat list\n",
"    if isinstance(box, (list, tuple)):\n",
"        if len(box) == 4:\n",
"            x0, y0, x1, y1 = box\n",
"            # ensure order\n",
"            return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)\n",
"        if len(box) == 8:\n",
"            xs = box[0::2]\n",
"            ys = box[1::2]\n",
"            return min(xs), min(ys), max(xs), max(ys)\n",
"\n",
"    # Fallback\n",
"    raise ValueError(f\"Unrecognized box format: {box!r}\")\n",
"\n",
"def ocr_paddle(img, image_array, min_score=0.0, line_tol_factor=0.6):\n",
"    \"\"\"\n",
"    Robust line grouping for PaddleOCR outputs:\n",
"      - normalizes boxes to (x0,y0,x1,y1)\n",
"      - adaptive line tolerance based on median box height\n",
"      - optional confidence filter\n",
"      - inserts '\\n' between lines and preserves left→right order\n",
"    \"\"\"\n",
"    result = paddleocr_model.predict(image_array)\n",
"\n",
"    boxes_all = []  # (x0, y0, x1, y1, y_mid, text, score)\n",
"    for item in result:\n",
"        res = item.json.get(\"res\", {})\n",
"        boxes = res.get(\"rec_boxes\", []) or []  # be defensive\n",
"        texts = res.get(\"rec_texts\", []) or []\n",
"        scores = res.get(\"rec_scores\", None)\n",
"\n",
"        for i, (box, text) in enumerate(zip(boxes, texts)):\n",
"            try:\n",
"                x0, y0, x1, y1 = _normalize_box_xyxy(box)\n",
"            except Exception:\n",
"                # Skip weird boxes gracefully\n",
"                continue\n",
"\n",
"            y_mid = 0.5 * (y0 + y1)\n",
"            score = float(scores[i]) if (scores is not None and i < len(scores)) else 1.0\n",
"\n",
"            t = re.sub(r\"\\s+\", \" \", str(text)).strip()\n",
"            if not t:\n",
"                continue\n",
"\n",
"            boxes_all.append((x0, y0, x1, y1, y_mid, t, score))\n",
"\n",
"    if min_score > 0:\n",
"        boxes_all = [b for b in boxes_all if b[6] >= min_score]\n",
"\n",
"    if not boxes_all:\n",
"        return \"\"\n",
"\n",
"    # Adaptive line tolerance\n",
"    heights = [b[3] - b[1] for b in boxes_all]\n",
"    median_h = float(np.median(heights)) if heights else 20.0\n",
"    line_tol = max(8.0, line_tol_factor * median_h)\n",
"\n",
"    # Sort by vertical mid, then x0\n",
"    boxes_all.sort(key=lambda b: (b[4], b[0]))\n",
"\n",
"    # Group into lines\n",
"    lines, cur, last_y = [], [], None\n",
"    for x0, y0, x1, y1, y_mid, text, score in boxes_all:\n",
"        if last_y is None or abs(y_mid - last_y) <= line_tol:\n",
"            cur.append((x0, text))\n",
"        else:\n",
"            cur.sort(key=lambda t: t[0])\n",
"            lines.append(\" \".join(t[1] for t in cur))\n",
"            cur = [(x0, text)]\n",
"        last_y = y_mid\n",
"\n",
"    if cur:\n",
"        cur.sort(key=lambda t: t[0])\n",
"        lines.append(\" \".join(t[1] for t in cur))\n",
"\n",
"    res = \"\\n\".join(lines)\n",
"    res = re.sub(r\"\\s+\\n\", \"\\n\", res).strip()\n",
"    return res\n"
]
},
{
"cell_type": "markdown",
"id": "e42cae29",
"metadata": {},
"source": [
"## 4 Run AI OCR Benchmark"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9b55c154",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"Referencias bibliográficas: el trabajo debe incluir una sección de bibliografía, en\n",
"la que aparezcan en formato APA los detalles de todos los documentos a los que\n",
"se haga referencia en el TFE.\n",
"Anexos: estos son apartados opcionales que contienen cuestionarios, encuestas,\n",
"resultados de pilotos, documentos adicionales, capturas de pantalla, y otros\n",
"elementos que complementan o amplían la información del trabajo. Los anexos se\n",
"diferencian empleando una letra (Anexo A, Anexo B…).\n",
"En el punto 2 se describen con mayor detalle cada uno de los apartados del TFE.\n",
"La extensión mínima en un TFE individual es de 50 páginas y la máxima de 90 páginas,\n",
"sin contar portada, índices y anexos.\n",
"En el caso de un TFE grupal, la extensión del TFE debe garantizar que cada uno de los\n",
"integrantes del equipo ha dedicado a la elaboración y defensa del trabajo las\n",
"competencias previstas en la memoria. A modo orientativo, y para garantizar la calidad\n",
"y dedicación que requiere un TFE grupal, la extensión mínima ha de ser un 50% superior\n",
"a lo previsto para un TFE individual. Por lo tanto, la extensión mínima en un TFE grupal\n",
"es de 75 páginas.\n",
"1.3. Formatos y plantilla de trabajo\n",
"Para la elaboración del TFE, ya sea individual o grupal, debes utilizar la plantilla\n",
"disponible en el aula virtual. No modifiques los estilos definidos en esta plantilla:\n",
"márgenes, interlineado, tipos de letra, etc. Para cualquier duda que pueda surgirte,\n",
"puedes consultar esos formatos a continuación.\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"El trabajo deberá estar escrito en tamaño de página A4 con los siguientes\n",
"márgenes:\n",
"Izquierdo: 3,0 cm.\n",
"Derecho: 2,0 cm.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"5\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"Referencias bibliográficas: el trabajo debe incluir una sección de bibliografía, en\n",
"la que aparezcan en formato APA los detalles de todos los documentos a los que\n",
"se haga referencia en el TFE.\n",
"Anexos: estos son apartados opcionales que contienen cuestionarios, encuestas,\n",
"resultados de pilotos, documentos adicionales, capturas de pantalla,y otros\n",
"elementos que complementan o amplían la información del trabajo. Los anexos se\n",
"diferencian empleando una letra (Anexo A, Anexo B.….).\n",
"En el punto 2 se describen con mayor detalle cada uno de los apartados del TfE.\n",
"La extensión mínima en un TfE individual es de 50 páginas y la máxima de 90 páginas,\n",
"sin contar portada,índices y anexos.\n",
"En el caso de e un TFE grupal, la extensión del TFE debe garantizar que cada uno de los\n",
"integrantes del equipo ha dedicado a I la elaboración y defensa del trabajo las\n",
"competencias previstas en la memoria. A modo orientativo,y para garantizar la calidad\n",
"y dedicación que requiere un TFE grupal, la extensión mínima ha de ser un 50% superior\n",
"a lo previsto para un TFE individual. Por lo tanto, la extensión mínima en un TFE grupal\n",
"es de 75 páginas.\n",
"1.3. Formatos y plantilla de trabajo\n",
"Para la elaboración del TFE,ya sea individual o grupal, debes utilizar la plantilla\n",
"disponible en el aula virtual. No modifiques los estilos definidos en esta plantilla:\n",
"margenes, interlineado, tipos de letra, etc. Para cualquier duda que pueda surgirte,\n",
"puedes consultar esos formatos a continuacion.\n",
"El trabajo deberá estar escrito en tamaño de página A4 con los siguientes\n",
"márgenes:\n",
"© Universidad Internacional de La Rioja (UNiR) Izquierdo: 3,0 cm.\n",
"Derecho: 2,0 cm.\n",
"Instrucciones para la redacción y elaboración del TfE 5\n",
"Máster Universitario en Inteligencia Artificial\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAACacAAA2dCAYAAADLbTdiAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3Qe4HVW9N+AFhJaQhBIg9N57770pKoKKgiKK91pQr+V6vfbeRdRr7w1QsIAIgvTee+81CT0hQEIJJd/zWzLn2+dwWpKBtPd9ngM5Ze89s2bNmtkzv/1f80ydOnVqAQAAAAAAAAAAgBbN2+aTAQAAAAAAAAAAQAinAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAF
onnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAAAA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHXCaQAAAAAAAAAAALROOA0AAAAAAAAAAIDWCacBAAAAAAAAAADQOuE0AAAAAAAAAAAAWiecBgAAAAAAAAAAQOuE0wAAAAAAAAAAAGidcBoAAAAAAAAAAACtE04DAAAAAAAAAACgdcJpAAAAAAAAAAAAtE44DQAAAAAAAAAAgNYJpwEAAAAAAAAAANA64TQAAAAAAAAAAABaJ5wGAAAAAAAAAABA64TTAAAAAAAAAAAAaJ1wGgAAAAAAAAAAAK0TTgMAAAAAAAAAAKB1wmkAAAAAAAAAAAC0TjgNAAAAAAAAAACA1gmnAQAAAAAAAA
AA0DrhNAAAAAAAAAAAAFonnAYAAAAAAAAAAEDrhNMAAAAAAAAAAABonXAaAAAAAAAAAAAArRNOAwAAAAAAAAAAoHX
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"Superior e inferior: 2,5 cm.\n",
"Formato de párrafo en texto principal (estilo de la plantilla “Normal”):\n",
"Calibri 12, justificado, interlineado 1,5, espacio entre párrafos 6 puntos\n",
"anterior y 6 puntos posterior, sin sangría.\n",
"Títulos:\n",
"Primer nivel (estilo de la plantilla “Título 1”): Calibri Light 18, azul, justificado,\n",
"interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Segundo nivel (estilo de la plantilla “Título 2”): Calibri Light 14, azul,\n",
"justificado, interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6\n",
"puntos posterior, sin sangría.\n",
"Tercer nivel (estilo de la plantilla “Título 3”: Calibri Light 12, justificado,\n",
"interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Notas al pie:\n",
"Calibri 10, justificado, interlineado sencillo, espacio entre párrafos 0 puntos\n",
"anterior y 0 puntos posterior, sin sangría.\n",
"Tablas y figuras:\n",
"Título en la parte superior de la tabla o figura.\n",
"Numeración tabla o figura (Tabla 1/ Figura1): Calibri 12, negrita, justificado.\n",
"Nombre tabla o figura: Calibri 12, cursiva, justificado.\n",
"Cuerpo: la tipografía de las tablas o figuras se pueden reducir hasta los 9\n",
"puntos si estas contienen mucha información. Si la tabla o figura es muy\n",
"grande, también se puede colocar en apaisado dentro de la hoja.\n",
"Fuente de la tabla o figura en la parte inferior. Calibri 9,5, centrado.\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"Encabezado y pie de página:\n",
"Todas las páginas llevarán un encabezado con el nombre completo del\n",
"estudiante y el título del TFE.\n",
"Todas las páginas llevarán también un pie de página con el número de página.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"6\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"Superior e inferior: 2,5 cm.\n",
"Formato de párrafo en texto principal (estilo de la plantilla “Normal\"):\n",
"Calibri 12, justificado, interlineado 1,5, espacio entre párrafos 6 puntos\n",
"anterior y 6 puntos posterior, sin sangría.\n",
"Títulos:\n",
"Primer nivel (estilo de la plantillaTítulo 1\"): Calibri Light 18, azul, justificado,\n",
"interlineado 1,5,espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Segundo nivel (estilo de la plantilla Titulo 2\"): Calibri Light 14, azul,\n",
"justificado, interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6\n",
"puntos posterior, sin sangría.\n",
"Tercer nivel (estilo de la plantilla Título 3\": Calibri Light 12, justificado,\n",
"interlineado 1,5,espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Notas al pie:\n",
"Calibri 10, justificado, interlineado sencillo, espacio entre párrafos O puntos\n",
"anterior y O puntos posterior, sin sangra.\n",
"Tablas y figuras:\n",
"Título en la parte superior de la tabla o figura.\n",
"Numeración tabla o figura (Tabla 1/ Figura1): Calibri 12, negrita, justificado.\n",
"Nombre tabla o figura: Calibri 12, cursiva, justificado.\n",
"Cuerpo: la tipografía de las tablas o figuras se pueden reducir hasta los 9\n",
"puntos si estas contienen mucha información. Si la tabla o figura es muy\n",
"grande, también se puede colocar en apaisado dentro de la hoja.\n",
"Fuente de la tabla o figura en la parte inferior. Calibri 9,5, centrado.\n",
"Encabezado y pie de página:\n",
"Todas las páginas llevarán un encabezado con el nombre completo del\n",
"estudiante y el título del TFE.\n",
"© Universidad Internacional de La Rioja (UNiR)\n",
"Todas las páginas llevarán también un pie de página con el número de página.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"Máster Universitario en Inteligencia Artificial 9\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAACacAAA2dCAYAAADLbTdiAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3Qe8XkWZP/AJhJaEXkLvEHrvVVFXsdBsWLFjW9eylt11bes2/+paVtfeUBQbioCIgKJ0pPfeOwRI6KT8P7+597x575tbw0kBvt/9ZPEm9773nDnnzMyZeeaZcbNnz55dAAAAAAAAAAAAoEWLtflhAAAAAAAAAAAAEILTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAA
AAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAA
AAAAAAAIDWCU4DAAAAAAAAAACgdYLTAAAAAAAAAAAAaJ3gNAAAAAAAAAAAAFonOA0AAAAAAAAAAIDWCU4DAAAAAAA
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"Los borradores intermedios deberán entregarse en formato Word. El documento final\n",
"deberá depositarse en formato PDF.\n",
"1.4. Estética y estilo de redacción\n",
"Es fundamental que el TFE presente un aspecto elegante y correcto. Se trata de un\n",
"trabajo académico y debe reflejar la madurez y el nivel formativo de una persona que\n",
"ha finalizado un estudio de grado o postgrado. Ten en cuenta las siguientes\n",
"recomendaciones en todas y cada una de las entregas que realices y, en especial, en\n",
"el depósito final del documento:\n",
"Verifica la originalidad del documento, asegurándote de que citas todas las\n",
"fuentes consultadas y no existen textos de autoría ajena sin referenciar\n",
"correctamente.\n",
"Cuida la presentación del trabajo. Comprueba que formatos como tipo y tamaño\n",
"de letra, número de páginas, encabezados, justificación de párrafos, interlineado,\n",
"etc., son correctos.\n",
"Revisa la ortografía y la redacción. Utiliza el corrector de Word para asegurarte de\n",
"que no has dejado ninguna errata. Una lectura detenida del documento también\n",
"te ayudará a detectar erratas, omisiones o redundancias. Si es posible, pide a\n",
"alguien cercano que lo lea y te dé su opinión sobre la redacción. Presta especial\n",
"atención a los siguientes aspectos:\n",
"-\n",
"Que los párrafos sigan un orden o hilo argumental lógico.\n",
"-\n",
"Que la información se presente de una manera que facilite su\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"comprensión, definiendo los conceptos necesarios e incluyendo las citas\n",
"bibliográficas pertinentes.\n",
"-\n",
"Elimina párrafos demasiado cortos. Cada párrafo debería tener al menos\n",
"tres oraciones.\n",
"-\n",
"Elimina frases superfluas y repeticiones de ideas.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"7\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"Los borradores intermedios deberán entregarse en formato Word. El documento final\n",
"deberá depositarse en formato PDf.\n",
"1.4. Estética y estilo de redacción\n",
"Es fundamental que el TFE presente un aspecto elegante y correcto. Se trata de un\n",
"trabajo académico y debe reflejar la madurez y el nivel formativo de una persona que\n",
"ha finalizado un estudio de grado o postgrado. Ten en cuenta las siguientes\n",
"recomendaciones en todas y cada una de las entregas que realices y, en especial, en\n",
"el deposito final del documento:\n",
"Verifica la originalidad del documento,asegurándote de que citas todas las\n",
"fuentes consultadas y no existen textos de autoría ajena sin referenciar\n",
"correctamente.\n",
"Cuida la presentación del trabajo. Comprueba que formatos como tipo y tamaño\n",
"de letra, número de páginas, encabezados, justificación de párrafos, interlineado,\n",
"etc., son correctos.\n",
"Revisa la ortografía y la redacción. Utiliza el corrector de Word para asegurarte de\n",
"que no has dejado ninguna errata. Una lectura detenida del documento también\n",
"te ayudará a detectar erratas, omisiones o redundancias. Si es posible, pide a\n",
"alguien cercano que lo lea y te dé su opinión sobre la redacción. Presta especial\n",
"atención a los siguientes aspectos:\n",
"Que los párrafos sigan un orden o hilo argumental lógico.\n",
"Que la información se presente de una manera que facilite su\n",
"comprensión, definiendo los conceptos necesarios e incluyendo las citas\n",
"bibliograficas pertinentes.\n",
"Elimina párrafos demasiado cortos. Cada párrafo debería tener al menos\n",
"© Universidad Internacional de La Rioja (UNiR) tres oraciones.\n",
"Elimina frases superfluas y repeticiones de ideas.\n",
"Instrucciones para la redacción y elaboración del TfE 7\n",
"Máster Universitario en Inteligencia Artificial\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAACacAAA2dCAYAAADLbTdiAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3QeUJUX5N+AmgwQBySA5SY6CmAUxgaCogIABFJWkYiAIKiJiQjFnFCPmjH+zCAoKEgXJUUAByTl+51fz1WzP3Ql3Zns2Ps85e2B3Z+/trq6uqq56+625HnvssccaAAAAAAAAAAAA6NDcXX4YAAAAAAAAAAAAhOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOi
c4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAA
AAAKBzgtMAAAAAAAAAAADonOA0AAAAAAAAAAAAOic4DQAAAAAAAAAAgM4JTgMAAAAAAAAAAKBzgtMAAAAAAAAAAAD
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"-\n",
"Escribe siempre al menos un párrafo de introducción en cada capítulo o\n",
"apartado, explicando de qué vas a tratar en esa sección. Evita que\n",
"aparezcan dos encabezados de nivel consecutivos sin ningún texto entre\n",
"medias.\n",
"Repasa las citas bibliográficas. Comprueba que todas ellas son correctas y siguen\n",
"la normativa que exige la titulación.\n",
"Asegúrate de que las figuras y las tablas se ven clara y correctamente, e incluyen\n",
"número y título, así como su procedencia o fuente.\n",
"Comprueba que los índices se generan correctamente.\n",
"1.5. Normativa de citas\n",
"En esta titulación se cita de acuerdo con la normativa APA.\n",
"Recuerda que tienes una guía con explicaciones y ejemplos en el apartado Citas y\n",
"bibliografía del aula virtual: https://bibliografiaycitas.unir.net/\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"8\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"Escribe siempre al menos un párrafo de introducción en cada capítulo o\n",
"apartado,explicando de qué vas a tratar en esa sección. Evita que\n",
"aparezcan dos encabezados de nivel consecutivos sin ningún texto entre\n",
"medias.\n",
"Repasa las citas bibliográficas. Comprueba que todas ellas son correctas y siguen\n",
"la normativa que exige la titulación.\n",
"Asegúrate de que las figuras y las tablas se ven clara y correctamente, e incluyen\n",
"número y título, así como su procedencia o fuente.\n",
"Comprueba que los índices se generan correctamente.\n",
"1.5. Normativa adecitas\n",
"En esta titulacióon se cita de acuerdo con la normativa Apa.\n",
"Recuerda que tienes una guía con explicaciones y ejemplos en el apartado Citas y\n",
"bibliografía del aula virtual: https://bibliografiaycitas.unir.net/\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"Instrucciones para la redacción y elaboración del TfE\n",
"Máster Universitario en lnteligencia Artificial ∞\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAACacAAA2dCAYAAADLbTdiAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3Qe4XNdZL+5vnzOnN+moS5bce4udxOkJENKo91JD+VNvCDWUCyFwIYHkcuntUkO49EDoCZCQQpyeOLbjXiXbkq1eTu9t9v9ZW5JjS3NU52h0Zt73eUSEJM9Zs2ftMnv99vdleZ7nAQAAAAAAAAAAAFXUVM0XAwAAAAAAAAAAgEQ4DQAAAAAAAAAAgKoTTgMAAAAAAAAAAKDqhNMAAAAAAAAAAACoOuE0AAAAAAAAAAAAqk44DQAAAAAAAAAAgKoTTgMAAAAAAAAAAKDqhNMAAAAAAAAAAACoOuE0AAAAAAAAAKBh/edTo1F6513xkn95OMp5XuvhANQV4TQAAAAAAAAAAACqLstzsV8AAAAAAAAAoDHNlyPG5xeilEV0lZoiy7JaDwmgbginAQAAAAAAAAANK9X0OdrMM8XShNMAqkdbTwAAAAAAAACgYX1y70Rc8bcPxDd9+ImnQ2oAVIdwGgAAAAAAAADQsKYWyrFzYj72T87VeigAdUdbTwAAAAAAAACgYY3OluPx0enoammOy3tbtfUEqCLhNAAAAAAAAACgYZXzPObLeTRlWTRnIZwGUEXaegIAAAAAAAAADevOg1PxFe9/LH78Mzsjr/VgAOpMqdYDAAAAAAAAAAColaGZ+fjs/omYL5d9CABVJpwGAAAAAAAAADSs61Z2xB+9fEus6ShFVuvBANSZLM9zVSkBAAAAAAAAgIa0kOcxt5BHlkW0NmWRpd8AUBVN1XkZAAAAAAAAAIDlZ+vwTLzltl3xzocOheo+ANUlnAYAAAAAAAAANKyd47Px/x4ZjPc+MVTroQDUnVKtBwAAAAAAAAAAUCuX9rbFTz9nbWzuaQsNPQGqK8vzXFVKAAAAAAAAAKAhLeR5zC7k0ZRFtDZlkWUiagDVoq0nAAAAAAAAANCw9kzMx19vHYwPPjVa66EA1B3hNAAAAAAAAACgYT08NBU/8dld8Tv37Q+t5wCqq1Tl1wMAAAAAAAAAWDbWd7bE113cF5f1ttV6KAB1J8vzXPAXAAAAAAAAAGhI5TyP2XJetJ5racoiy7JaDwmgbmjrCQAAAAAAAAA0rANTC/H+J0fis/snaj0UgLojnAYAAAAAAAAANKx7Bybj2z+6I952++7Qeg6gukpVfj0AAAAAAAAAgGVjdXspXrWpJ65c2V7roQDUnSzPc8FfAAAAAAAAAAAAqkrlNAAAAAAAAACgYQ3PLMQjw9PR3dIU165sjyzLaj0kgLrRVOsBAAAAAAAAAADUyuf2T8RL3/tovPHjO0LrOYDqUjkNAAAAAAAAAGhYK9ua4yXru+OaFW21HgpA3cnyPBf8BQAAAAAAAAAa0kI5YmphIZqzLNqbM209AapIW08AAAAAAAAAoGGNzM7H3Qcn49Hh6VoPBaDuCKcBAAAAAAAAAA3rnoGpeP1HtsfP374ntJ4DqC7hNAAAAAAAAACgYXW3NMdV/R1xUU9rrYcCUHeyPM8FfwEAAAAAAACAhjQ5X479k3PR1twUGzpLkWVZrYcEUDdUTgMAAAAAAAAAGtbUfDl2jM3E3onZWg8FoO4IpwEAAAAAAAAADevhoen40U/vit++b39oPQdQXcJpAAAAAAAAAEDDKjVlsaK9FD0tzbUeCkDdyfI8F/wFAAAAAAAAABrS0PRCPDg0FT2tTXFDf0dkWVbrIQHUDZXTAAAAAAAAAICGlbJozcUvoTSAahNOAwAAAAAAAAAa1taRmXjrHXviXQ8dCq3nAKpLOA0AAAAAAAAAaFgL5TzG58oxOb9Q66EA1J0sz3PBXwAAAAAAAACgIR2amo
8vHJyIle2leP6azsi09wSomlL1XgoAAAAAAAAAYHlpac5iTUdLdLVoPgdQbY6sAAAAAAAAAEDDemxkJt52597404cPhdZzANWlchoAAAAAAAAA0LAm5sqxdXgqOpX3Aai6LM9zwV8AAAAAAAAAoCHtm5yLT+8dj1VtzfElm3oiy7JaDwmgbginAQAAAAAAAAANa3q+HAen56OtOYs17SXhNIAqUpQSAAAAAAAAAGhYDw/PxE9+blf83/sPhNZzANVVqvLrAQAAAAAAAAAsG4Mz8/Hx3WMxNrtQ66EA1B1tPQEAAAAAAACAhrVvci4+s3c8+ttL8SUbu7X1BKgi4TQAAAAAAAAAAACqrqn6LwkAAAAAAAAAsDzcc2gyvuUj2+Pnb98d5Tyv9XAA6opwGgAAAAAAAADQsPZOzsc/PjEUt+4arfVQAOqOtp4AAAAAAAAAQMPaOzEXn9g7FqvaSvHlF/RElmW1HhJA3RBOAwAAAAAAAAAaVjmPWMjzSJG05iyE0wCqSFtPAAAAAAAAAKBhPTA4FW/42I745bv2RV7rwQDUGeE0AAAAAAAAAKBh7Zuci3/aPhy37h6t9VAA6k6p1gMAAAAAAAAAAKiVa1Z2xP996eZY09FStPYEoHqyPM9VpQQAAAAAAAAAGtJ8OY/phXI0Z1m0N2eRZSJqANWirScAAAAAAAAA0LC2jszEz962O/7ogYOhug9AdQmnAQAAAAAAAAANa/fEXPz1tqH4z6dGaj0UgLpTqvUAAAAAAAAAAABq5aKe1vjRG9bE5q7W0NAToLqyPM9VpQQAAAAAAAAAGtJcOY/xuYUoZVl0tzRFlomoAVSLtp4AAAAAAAAAQMPaNT4Xf/bwoXi/tp4AVSecBgAAAAAAAAA0rMdHZ+KX794ff/HIodB6DqC6SlV+PQAAAAAAAACAZWNDZ0t84yUr4tK+9loPBaDuZHmeC/4CAAAAAAAAAA1prpzHxNxCNDdl0V1qiizLaj0kgLqhrScAAAAAAAAA0LAOTs3HB54ajc/tn6j1UADqjnAaAAAAAAAAANCwHhyajh/+9M74jbv3hdZzANUlnAYAAAAAAAAANKzelqa4YWV7XN7bVuuhANSdLM9zwV8AAAAAAAAAoCHNl/OYnC9HcxbRWWqKLMtqPSSAuqFyGgAAAAAAAADQsKbm89g+NhN7J+drPRSAuiOcBgAAAAAAAAA0rM/tH4+XvXdrvPETO0LrOYDqEk4DAAAAAAAAABpWW3NTrGlrjpVtpVoPBaDuZHmeC/4CAAAAAAAAAA1pIc9jdiGPpiyitSmLLMtqPSSAuiGcBgAAAAAAAAAAQNVp6wkAAAAAAAAANKwP7RyNznfdHV/y3keirPkcQFUJpwEAAAAAAAAADaucR8yW85hLvwGgqrT1BAAAAAAAAAAa1txCHiNzC9GSZdHb2hRZltV6SAB1QzgNAAAAAAAAAGhYhzt5frFqmnAaQPVo6wkAAAAAAAAANKxP7h2PK//2gfjmj2x/RkQNgGooVeVVAAAAAAAAAACWocn5cjwxPhdrOmZqPRSAuqOtJwAAAAAAAADQsEZny/HY6HR0lZriir42bT0Bqkg4DQAAAAAAAABoWHn+xWaeWZbVdCwA9aap1gMAAAAAAAAAAKiVOw9OxWv+fVv82Kd3RvkZQTUAzl6pCq8BAAAAAAAAALAsHZqej1v3jsfE/EKthwJQd4TTAAAAAAAAAICGdV1/R/zxy7fE2o5SaOoJUF1Z/szmyQAAAAAAAAAADeTY2ESWiagBVEtT1V4JAAAAAAAAAGCZ2ToyG2+5bXf88UOHaj0UgLojnAYAAAAAAAAANKztozPxW/cdiHdvHQit5wCqq1Tl1wMAAAAAAAAAWDYu6W2LN9+4Ni7sbQ8NPQGqK8uPbZ4MAAAAAAAAANAgUmyinEdkWRThtCz9BoCqUDkNAAAAAAAAAGhYO8fn4v1PjcS6zpb47xf11Xo4AHWlqdYDAAAAAAAAAAColY
eGpuNHP7MrfvuefaH1HEB1qZwGAAAAAAAAADSsNR2leO0FPXHlivZaDwWg7mR5ap4MAAAAAAAAANCAUmyinJIT2eH
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"2. Estructura del documento\n",
"En esta sección se describe con mayor profundidad la estructura y los contenidos\n",
"esperados en cada apartado de tu TFE.\n",
"Léela con detenimiento y compárala con la programación semanal que encontrarás\n",
"en el aula virtual, pues en cada borrador deberás entregar completados diferentes\n",
"apartados que se explican a continuación, y que se elaboran de una manera no\n",
"necesariamente lineal.\n",
"Como ya se ha mencionado, la memoria debe estar estructurada en capítulos. Por\n",
"norma general, la estructura de capítulos suele reflejar la línea de discurso del\n",
"trabajo, empezando por una introducción donde se plantea el problema, seguida de\n",
"un estudio de la literatura donde se estudia y describe el contexto. Posteriormente\n",
"se establecen claramente la hipótesis de trabajo y los objetivos concretos de\n",
"investigación, así como la descripción de la metodología seguida para alcanzar los\n",
"objetivos. Posteriormente se describe la contribución del trabajo, seguida de una\n",
"evaluación de la misma. La evaluación da pie a la elaboración de las conclusiones,\n",
"que deben relacionar los resultados obtenidos con los objetivos planteados\n",
"inicialmente. Finalmente, se describen las líneas de trabajo futuro necesarias para\n",
"seguir avanzando hacia la consecución de los objetivos.\n",
"A continuación, te dejamos algunos consejos generales sobre cómo organizar los\n",
"capítulos, pero ten en cuenta que cada trabajo es único y esta organización es una\n",
"guía general adaptable. El director específico de tu TFE podrá aportarte consejos\n",
"sobre cómo organizar la memoria adaptándote al contexto de tu trabajo concreto.\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"Como recomendación general, la estructura de capítulos de tu memoria debería ser\n",
"similar a la siguiente propuesta:\n",
"Organización del trabajo en grupo (solo en trabajos grupales)\n",
"Capítulo 1 Introducción\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"9\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"2.E Estructura del documento\n",
"En esta sección se describe con mayor profundidad la estructura y los contenidos\n",
"esperados en cada apartado de tu Tfe.\n",
"Léela con detenimiento y compárala con la programación semanal que encontraras\n",
"en el aula virtual, pues en cada borrador deberás entregar completados diferentes\n",
"apartados que se explican a continuación,y que se elaboran de una manera no\n",
"necesariamente lineal.\n",
"Como ya se ha mencionado, la memoria debe estar estructurada en capítulos. Por\n",
"norma general, la estructura de capitulos suele reflejar la linea de discurso del\n",
"trabajo, empezando por una introducción donde se plantea el problema, seguida de\n",
"un estudio de la literatura donde se estudia y describe el contexto. Posteriormente\n",
"se establecen claramente la hipótesis de trabajo y los objetivos concretos de\n",
"investigación, así como la descripción de la metodología seguida para alcanzar los\n",
"objetivos. Posteriormente se describe la contribución del trabajo, seguida de una\n",
"evaluación de la misma. La evaluación da pie a la elaboración de las conclusiones,\n",
"que deben relacionar los resultados obtenidos con los objetivos planteados\n",
"inicialmente. Finalmente, se describen las líneas de trabajo futuro necesarias para\n",
"seguir avanzando hacia la consecución de los objetivos.\n",
"A continuación, te dejamos algunos consejos generales sobre cómo organizar los\n",
"capítulos, pero ten en cuenta que cada trabajo es único y esta organización es una\n",
"guía general adaptable. El director especifico de tu TFE podrá aportarte consejos\n",
"sobre cómo organizar la memoria adaptándote al contexto de tu trabajo concreto.\n",
"Como recomendación general, la estructura de capítulos de tu memoria debería ser\n",
"similar a la siguiente propuesta:\n",
"© Universidad Internacional de La Rioja (UNiR)\n",
"Organización del trabajo en grupo (solo en trabajos grupales)\n",
"Capítulo1Introducción\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"Máster Universitario en Inteligencia Artificial 6\n"
]
}
],
"source": [
    "results = []\n",
    "\n",
    "for pdf_file in os.listdir(PDF_FOLDER):\n",
    "    # Skip anything in the folder that is not a PDF\n",
    "    if not pdf_file.lower().endswith('.pdf'):\n",
    "        continue\n",
    "    pdf_path = os.path.join(PDF_FOLDER, pdf_file)\n",
    "    page_range = range(5, 10)\n",
    "\n",
    "    # Render the selected pages at 300 DPI\n",
    "    images = pdf_to_images(pdf_path, 300, page_range)\n",
    "\n",
    "    for i, img in enumerate(images):\n",
    "        # img = preprocess_for_ocr(img)\n",
    "        page_num = page_range[i]\n",
    "        ref = pdf_extract_text(pdf_path, page_num=page_num)\n",
    "        show_page(img, f\"page: {page_num}\", 1)\n",
    "        print(f\"ref: \\n{ref}\")\n",
    "\n",
    "        # Convert PIL image to numpy array\n",
    "        image_array = np.array(img)\n",
    "\n",
    "        # PaddleOCR\n",
    "        paddle_text = ocr_paddle(img, image_array)\n",
    "        print(f\"paddle_text: \\n{paddle_text}\")\n",
    "        results.append({\n",
    "            'PDF': pdf_file,\n",
    "            'Page': page_num,\n",
    "            'Model': 'PaddleOCR',\n",
    "            'Prediction': paddle_text,\n",
    "            **evaluate_text(ref, paddle_text),\n",
    "        })\n",
" "
]
},
{
"cell_type": "markdown",
"id": "0db6dc74",
"metadata": {},
"source": [
"## 5 Save and Analyze Results"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "da3155e3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Benchmark results saved as ai_ocr_benchmark_finetune_results_20251112_095108.csv\n",
" WER CER\n",
"Model \n",
"PaddleOCR 0.109534 0.052167\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df_results = pd.DataFrame(results)\n",
"\n",
"# Generate a unique filename with timestamp\n",
"timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n",
"filename = f\"ai_ocr_benchmark_finetune_results_{timestamp}.csv\"\n",
"filepath = os.path.join(OUTPUT_FOLDER, filename)\n",
"\n",
"df_results.to_csv(filepath, index=False)\n",
"print(f\"Benchmark results saved as {filename}\")\n",
"\n",
"# Summary by model\n",
"summary = df_results.groupby('Model')[['WER', 'CER']].mean()\n",
"print(summary)\n",
"\n",
"# Plot\n",
"summary.plot(kind='bar', figsize=(8,5), title='AI OCR Benchmark (WER & CER)')\n",
"plt.ylabel('Error Rate')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "3e0f00c0",
"metadata": {},
"source": [
"### How to read this chart:\n",
    "- CER (Character Error Rate) measures raw transcription quality at the character level.\n",
    "- WER (Word Error Rate) also penalizes incorrect tokenization and missing spaces, because one wrong character makes the whole word count as an error.\n",
    "- CER and WER are error metrics, which means:\n",
    "  - Higher values = worse performance\n",
    "  - Lower values = better accuracy"
]
},
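  {
   "cell_type": "markdown",
   "id": "cer-wer-sketch",
   "metadata": {},
   "source": [
    "To make the difference between the two metrics concrete, here is a minimal sketch of CER and WER built on a plain Levenshtein distance. It is an illustration only: the helper names `edit_distance`, `cer` and `wer` are hypothetical, and the `evaluate_text` function used in this notebook may compute its scores differently (for example through a library). The sample strings are taken from the page output above, where the OCR result dropped the spaces in a heading.\n",
    "\n",
    "```python\n",
    "def edit_distance(ref, hyp):\n",
    "    # Levenshtein distance between two sequences (strings or lists of words).\n",
    "    prev = list(range(len(hyp) + 1))\n",
    "    for i, r in enumerate(ref, start=1):\n",
    "        curr = [i]\n",
    "        for j, h in enumerate(hyp, start=1):\n",
    "            cost = 0 if r == h else 1\n",
    "            curr.append(min(prev[j] + 1,          # deletion\n",
    "                            curr[j - 1] + 1,      # insertion\n",
    "                            prev[j - 1] + cost))  # substitution or match\n",
    "        prev = curr\n",
    "    return prev[-1]\n",
    "\n",
    "\n",
    "def cer(ref, hyp):\n",
    "    # Character edits divided by the number of reference characters.\n",
    "    return edit_distance(ref, hyp) / max(len(ref), 1)\n",
    "\n",
    "\n",
    "def wer(ref, hyp):\n",
    "    # Word edits divided by the number of reference words.\n",
    "    ref_words, hyp_words = ref.split(), hyp.split()\n",
    "    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)\n",
    "\n",
    "\n",
    "ref = 'Capítulo 1 Introducción'\n",
    "hyp = 'Capítulo1Introducción'\n",
    "print(f'CER = {cer(ref, hyp):.3f}, WER = {wer(ref, hyp):.3f}')\n",
    "```\n",
    "\n",
    "In this example only two characters (the spaces) are missing, so CER stays small, while WER reaches 1.0 because none of the three reference words survives as a separate token."
   ]
  },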
{
"cell_type": "markdown",
"id": "41b427d4",
"metadata": {},
"source": [
"### Compared solutions\n",
"| Model | Type | Components | Key Strengths | Why It Matters |\n",
"| :--------------------- | :--------------------------- | :--------------------------- | :--------------------------------------------------------- | :------------------------------------------------------- |\n",
    "| **EasyOCR**            | End-to-end (det + rec)       | CRAFT + CRNN                 | Lightweight, easy to run, multilingual                     | Serves as a *neural baseline* (fast & reproducible).     |\n",
    "| **PaddleOCR (PP-OCR)** | End-to-end (det + rec + cls) | DB + CRNN/SVTR               | Strong multilingual support, configurable pipeline         | Industrial reference; widely benchmarked.                |\n",
    "| **DocTR**              | End-to-end (det + rec)       | DB/LinkNet + CRNN/SAR/ViTSTR | Research-oriented, clean API, high-level structured output | Represents a recent *PyTorch*-based academic approach.   |\n",
"\n",
"\n",
    "These cover the three major open-source paradigms for deep OCR:\n",
    "\n",
    "- EasyOCR: compact CRNN-based recognizer.\n",
    "- PaddleOCR: large industrial model (PP-OCR family).\n",
    "- DocTR: modular research library from Mindee, built for experimentation.\n",
    "\n",
    "Together they already let you analyse:\n",
    "\n",
    "- accuracy (CER/WER),\n",
    "- inference latency,\n",
    "- model architecture trade-offs."
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv (3.11.9)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}