{
"cells": [
{
"cell_type": "markdown",
"id": "be3c1872",
"metadata": {},
"source": [
"# AI-based OCR Benchmark Notebook\n",
"\n",
"This notebook benchmarks **AI-based OCR models** on scanned Spanish-language PDF documents and images.\n",
"It excludes traditional OCR engines, such as Tesseract, that require external installations."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6a1e98fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: pip in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (25.3)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: jupyter in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.1.1)\n",
"Requirement already satisfied: notebook in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (7.4.7)\n",
"Requirement already satisfied: jupyter-console in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (6.6.3)\n",
"Requirement already satisfied: nbconvert in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (7.16.6)\n",
"Requirement already satisfied: ipykernel in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (7.1.0)\n",
"Requirement already satisfied: ipywidgets in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (8.1.8)\n",
"Requirement already satisfied: jupyterlab in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter) (4.4.10)\n",
"Requirement already satisfied: comm>=0.1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (0.2.3)\n",
"Requirement already satisfied: debugpy>=1.6.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (1.8.17)\n",
"Requirement already satisfied: ipython>=7.23.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (9.7.0)\n",
"Requirement already satisfied: jupyter-client>=8.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (8.6.3)\n",
"Requirement already satisfied: jupyter-core!=5.0.*,>=4.12 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (5.9.1)\n",
"Requirement already satisfied: matplotlib-inline>=0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (0.2.1)\n",
"Requirement already satisfied: nest-asyncio>=1.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (1.6.0)\n",
"Requirement already satisfied: packaging>=22 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (25.0)\n",
"Requirement already satisfied: psutil>=5.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (7.1.3)\n",
"Requirement already satisfied: pyzmq>=25 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (27.1.0)\n",
"Requirement already satisfied: tornado>=6.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (6.5.2)\n",
"Requirement already satisfied: traitlets>=5.4.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel->jupyter) (5.14.3)\n",
"Requirement already satisfied: colorama>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (0.4.6)\n",
"Requirement already satisfied: decorator>=4.3.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (5.2.1)\n",
"Requirement already satisfied: ipython-pygments-lexers>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (1.1.1)\n",
"Requirement already satisfied: jedi>=0.18.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (0.19.2)\n",
"Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (3.0.52)\n",
"Requirement already satisfied: pygments>=2.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (2.19.2)\n",
"Requirement already satisfied: stack_data>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (0.6.3)\n",
"Requirement already satisfied: typing_extensions>=4.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel->jupyter) (4.15.0)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=7.23.1->ipykernel->jupyter) (0.2.14)\n",
"Requirement already satisfied: parso<0.9.0,>=0.8.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jedi>=0.18.1->ipython>=7.23.1->ipykernel->jupyter) (0.8.5)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-client>=8.0.0->ipykernel->jupyter) (2.9.0.post0)\n",
"Requirement already satisfied: platformdirs>=2.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-core!=5.0.*,>=4.12->ipykernel->jupyter) (4.5.0)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->jupyter-client>=8.0.0->ipykernel->jupyter) (1.17.0)\n",
"Requirement already satisfied: executing>=1.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel->jupyter) (2.2.1)\n",
"Requirement already satisfied: asttokens>=2.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel->jupyter) (3.0.0)\n",
"Requirement already satisfied: pure-eval in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel->jupyter) (0.2.3)\n",
"Requirement already satisfied: widgetsnbextension~=4.0.14 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets->jupyter) (4.0.15)\n",
"Requirement already satisfied: jupyterlab_widgets~=3.0.15 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets->jupyter) (3.0.16)\n",
"Requirement already satisfied: async-lru>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.0.5)\n",
"Requirement already satisfied: httpx<1,>=0.25.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (0.28.1)\n",
"Requirement already satisfied: jinja2>=3.0.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (3.1.6)\n",
"Requirement already satisfied: jupyter-lsp>=2.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.3.0)\n",
"Requirement already satisfied: jupyter-server<3,>=2.4.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.17.0)\n",
"Requirement already satisfied: jupyterlab-server<3,>=2.27.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (2.28.0)\n",
"Requirement already satisfied: notebook-shim>=0.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (0.2.4)\n",
"Requirement already satisfied: setuptools>=41.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab->jupyter) (65.5.0)\n",
"Requirement already satisfied: anyio in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (4.11.0)\n",
"Requirement already satisfied: certifi in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (2025.10.5)\n",
"Requirement already satisfied: httpcore==1.* in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (1.0.9)\n",
"Requirement already satisfied: idna in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx<1,>=0.25.0->jupyterlab->jupyter) (3.11)\n",
"Requirement already satisfied: h11>=0.16 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpcore==1.*->httpx<1,>=0.25.0->jupyterlab->jupyter) (0.16.0)\n",
"Requirement already satisfied: argon2-cffi>=21.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (25.1.0)\n",
"Requirement already satisfied: jupyter-events>=0.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.12.0)\n",
"Requirement already satisfied: jupyter-server-terminals>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.5.3)\n",
"Requirement already satisfied: nbformat>=5.3.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (5.10.4)\n",
"Requirement already satisfied: overrides>=5.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (7.7.0)\n",
"Requirement already satisfied: prometheus-client>=0.9 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.23.1)\n",
"Requirement already satisfied: pywinpty>=2.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (3.0.2)\n",
"Requirement already satisfied: send2trash>=1.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.8.3)\n",
"Requirement already satisfied: terminado>=0.8.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.18.1)\n",
"Requirement already satisfied: websocket-client>=1.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.9.0)\n",
"Requirement already satisfied: babel>=2.10 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2.17.0)\n",
"Requirement already satisfied: json5>=0.9.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (0.12.1)\n",
"Requirement already satisfied: jsonschema>=4.18.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (4.25.1)\n",
"Requirement already satisfied: requests>=2.31 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2.32.5)\n",
"Requirement already satisfied: sniffio>=1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from anyio->httpx<1,>=0.25.0->jupyterlab->jupyter) (1.3.1)\n",
"Requirement already satisfied: argon2-cffi-bindings in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (25.1.0)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jinja2>=3.0.3->jupyterlab->jupyter) (3.0.3)\n",
"Requirement already satisfied: attrs>=22.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (25.4.0)\n",
"Requirement already satisfied: jsonschema-specifications>=2023.03.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2025.9.1)\n",
"Requirement already satisfied: referencing>=0.28.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (0.37.0)\n",
"Requirement already satisfied: rpds-py>=0.7.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema>=4.18.0->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (0.28.0)\n",
"Requirement already satisfied: python-json-logger>=2.0.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (4.0.0)\n",
"Requirement already satisfied: pyyaml>=5.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (6.0.2)\n",
"Requirement already satisfied: rfc3339-validator in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.1.4)\n",
"Requirement already satisfied: rfc3986-validator>=0.1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (0.1.1)\n",
"Requirement already satisfied: fqdn in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.5.1)\n",
"Requirement already satisfied: isoduration in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (20.11.0)\n",
"Requirement already satisfied: jsonpointer>1.13 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (3.0.0)\n",
"Requirement already satisfied: rfc3987-syntax>=1.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.1.0)\n",
"Requirement already satisfied: uri-template in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.3.0)\n",
"Requirement already satisfied: webcolors>=24.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (25.10.0)\n",
"Requirement already satisfied: beautifulsoup4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (4.14.2)\n",
"Requirement already satisfied: bleach!=5.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bleach[css]!=5.0.0->nbconvert->jupyter) (6.3.0)\n",
"Requirement already satisfied: defusedxml in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (0.7.1)\n",
"Requirement already satisfied: jupyterlab-pygments in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (0.3.0)\n",
"Requirement already satisfied: mistune<4,>=2.0.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (3.1.4)\n",
"Requirement already satisfied: nbclient>=0.5.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (0.10.2)\n",
"Requirement already satisfied: pandocfilters>=1.4.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbconvert->jupyter) (1.5.1)\n",
"Requirement already satisfied: webencodings in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bleach!=5.0.0->bleach[css]!=5.0.0->nbconvert->jupyter) (0.5.1)\n",
"Requirement already satisfied: tinycss2<1.5,>=1.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bleach[css]!=5.0.0->nbconvert->jupyter) (1.4.0)\n",
"Requirement already satisfied: fastjsonschema>=2.15 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from nbformat>=5.3.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2.21.2)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests>=2.31->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (3.4.4)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests>=2.31->jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter) (2.5.0)\n",
"Requirement already satisfied: lark>=1.2.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from rfc3987-syntax>=1.1.0->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.3.1)\n",
"Requirement already satisfied: cffi>=1.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from argon2-cffi-bindings->argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2.0.0)\n",
"Requirement already satisfied: pycparser in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2.23)\n",
"Requirement already satisfied: soupsieve>1.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from beautifulsoup4->nbconvert->jupyter) (2.8)\n",
"Requirement already satisfied: arrow>=0.15.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (1.4.0)\n",
"Requirement already satisfied: tzdata in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from arrow>=0.15.0->isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter) (2025.2)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: ipywidgets in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (8.1.8)\n",
"Requirement already satisfied: comm>=0.1.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (0.2.3)\n",
"Requirement already satisfied: ipython>=6.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (9.7.0)\n",
"Requirement already satisfied: traitlets>=4.3.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (5.14.3)\n",
"Requirement already satisfied: widgetsnbextension~=4.0.14 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (4.0.15)\n",
"Requirement already satisfied: jupyterlab_widgets~=3.0.15 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipywidgets) (3.0.16)\n",
"Requirement already satisfied: colorama>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.4.6)\n",
"Requirement already satisfied: decorator>=4.3.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (5.2.1)\n",
"Requirement already satisfied: ipython-pygments-lexers>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (1.1.1)\n",
"Requirement already satisfied: jedi>=0.18.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.19.2)\n",
"Requirement already satisfied: matplotlib-inline>=0.1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.2.1)\n",
"Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (3.0.52)\n",
"Requirement already satisfied: pygments>=2.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (2.19.2)\n",
"Requirement already satisfied: stack_data>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (0.6.3)\n",
"Requirement already satisfied: typing_extensions>=4.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=6.1.0->ipywidgets) (4.15.0)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=6.1.0->ipywidgets) (0.2.14)\n",
"Requirement already satisfied: parso<0.9.0,>=0.8.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jedi>=0.18.1->ipython>=6.1.0->ipywidgets) (0.8.5)\n",
"Requirement already satisfied: executing>=1.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=6.1.0->ipywidgets) (2.2.1)\n",
"Requirement already satisfied: asttokens>=2.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=6.1.0->ipywidgets) (3.0.0)\n",
"Requirement already satisfied: pure-eval in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=6.1.0->ipywidgets) (0.2.3)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: ipykernel in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (7.1.0)\n",
"Requirement already satisfied: comm>=0.1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (0.2.3)\n",
"Requirement already satisfied: debugpy>=1.6.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (1.8.17)\n",
"Requirement already satisfied: ipython>=7.23.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (9.7.0)\n",
"Requirement already satisfied: jupyter-client>=8.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (8.6.3)\n",
"Requirement already satisfied: jupyter-core!=5.0.*,>=4.12 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (5.9.1)\n",
"Requirement already satisfied: matplotlib-inline>=0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (0.2.1)\n",
"Requirement already satisfied: nest-asyncio>=1.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (1.6.0)\n",
"Requirement already satisfied: packaging>=22 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (25.0)\n",
"Requirement already satisfied: psutil>=5.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (7.1.3)\n",
"Requirement already satisfied: pyzmq>=25 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (27.1.0)\n",
"Requirement already satisfied: tornado>=6.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (6.5.2)\n",
"Requirement already satisfied: traitlets>=5.4.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipykernel) (5.14.3)\n",
"Requirement already satisfied: colorama>=0.4.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (0.4.6)\n",
"Requirement already satisfied: decorator>=4.3.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (5.2.1)\n",
"Requirement already satisfied: ipython-pygments-lexers>=1.0.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (1.1.1)\n",
"Requirement already satisfied: jedi>=0.18.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (0.19.2)\n",
"Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (3.0.52)\n",
"Requirement already satisfied: pygments>=2.11.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (2.19.2)\n",
"Requirement already satisfied: stack_data>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (0.6.3)\n",
"Requirement already satisfied: typing_extensions>=4.6 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ipython>=7.23.1->ipykernel) (4.15.0)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=7.23.1->ipykernel) (0.2.14)\n",
"Requirement already satisfied: parso<0.9.0,>=0.8.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jedi>=0.18.1->ipython>=7.23.1->ipykernel) (0.8.5)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-client>=8.0.0->ipykernel) (2.9.0.post0)\n",
"Requirement already satisfied: platformdirs>=2.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jupyter-core!=5.0.*,>=4.12->ipykernel) (4.5.0)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->jupyter-client>=8.0.0->ipykernel) (1.17.0)\n",
"Requirement already satisfied: executing>=1.2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel) (2.2.1)\n",
"Requirement already satisfied: asttokens>=2.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel) (3.0.0)\n",
"Requirement already satisfied: pure-eval in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from stack_data>=0.6.0->ipython>=7.23.1->ipykernel) (0.2.3)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: transformers in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (4.57.1)\n",
"Requirement already satisfied: torch in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (2.9.0)\n",
"Requirement already satisfied: pdf2image in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.17.0)\n",
"Requirement already satisfied: pillow in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (12.0.0)\n",
"Requirement already satisfied: jiwer in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (4.0.0)\n",
"Requirement already satisfied: paddleocr in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (3.3.1)\n",
"Requirement already satisfied: hf_xet in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.2.0)\n",
"Requirement already satisfied: paddlepaddle in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (3.2.1)\n",
"Requirement already satisfied: filelock in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (3.20.0)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (0.36.0)\n",
"Requirement already satisfied: numpy>=1.17 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (2.3.4)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (25.0)\n",
"Requirement already satisfied: pyyaml>=5.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (6.0.2)\n",
"Requirement already satisfied: regex!=2019.12.17 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (2025.11.3)\n",
"Requirement already satisfied: requests in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (2.32.5)\n",
"Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (0.22.1)\n",
"Requirement already satisfied: safetensors>=0.4.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (0.6.2)\n",
"Requirement already satisfied: tqdm>=4.27 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from transformers) (4.67.1)\n",
"Requirement already satisfied: fsspec>=2023.5.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (2025.10.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (4.15.0)\n",
"Requirement already satisfied: sympy>=1.13.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from torch) (1.14.0)\n",
"Requirement already satisfied: networkx>=2.5.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from torch) (3.5)\n",
"Requirement already satisfied: jinja2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from torch) (3.1.6)\n",
"Requirement already satisfied: click>=8.1.8 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jiwer) (8.2.1)\n",
"Requirement already satisfied: rapidfuzz>=3.9.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jiwer) (3.14.3)\n",
"Requirement already satisfied: paddlex<3.4.0,>=3.3.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (3.3.9)\n",
"Requirement already satisfied: aistudio-sdk>=0.3.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.3.8)\n",
"Requirement already satisfied: chardet in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (5.2.0)\n",
"Requirement already satisfied: colorlog in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (6.10.1)\n",
"Requirement already satisfied: modelscope>=1.28.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.31.0)\n",
"Requirement already satisfied: pandas>=1.3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.3.3)\n",
"Requirement already satisfied: prettytable in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (3.16.0)\n",
"Requirement already satisfied: py-cpuinfo in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (9.0.0)\n",
"Requirement already satisfied: pydantic>=2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.12.4)\n",
"Requirement already satisfied: ruamel.yaml in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.18.16)\n",
"Requirement already satisfied: ujson in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (5.11.0)\n",
"Requirement already satisfied: imagesize in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.4.1)\n",
"Requirement already satisfied: opencv-contrib-python==4.10.0.84 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (4.10.0.84)\n",
"Requirement already satisfied: pyclipper in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.3.0.post6)\n",
"Requirement already satisfied: pypdfium2>=4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (5.0.0)\n",
"Requirement already satisfied: python-bidi in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.6.7)\n",
"Requirement already satisfied: shapely in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.1.2)\n",
"Requirement already satisfied: httpx in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlepaddle) (0.28.1)\n",
"Requirement already satisfied: protobuf>=3.20.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlepaddle) (6.33.0)\n",
"Requirement already satisfied: opt-einsum==3.3.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from paddlepaddle) (3.3.0)\n",
"Requirement already satisfied: psutil in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (7.1.3)\n",
"Requirement already satisfied: bce-python-sdk in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.9.50)\n",
"Requirement already satisfied: colorama in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from click>=8.1.8->jiwer) (0.4.6)\n",
"Requirement already satisfied: setuptools in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from modelscope>=1.28.0->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (65.5.0)\n",
"Requirement already satisfied: urllib3>=1.26 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from modelscope>=1.28.0->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.5.0)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2025.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2025.2)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pydantic>=2->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.41.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pydantic>=2->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (2.41.5)\n",
"Requirement already satisfied: typing-inspection>=0.4.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pydantic>=2->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.4.2)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->pandas>=1.3->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.17.0)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests->transformers) (3.4.4)\n",
"Requirement already satisfied: idna<4,>=2.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests->transformers) (3.11)\n",
"Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from requests->transformers) (2025.10.5)\n",
"Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from sympy>=1.13.3->torch) (1.3.0)\n",
"Requirement already satisfied: pycryptodome>=3.8.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bce-python-sdk->aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (3.23.0)\n",
"Requirement already satisfied: future>=0.6.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from bce-python-sdk->aistudio-sdk>=0.3.5->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (1.0.0)\n",
"Requirement already satisfied: anyio in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx->paddlepaddle) (4.11.0)\n",
"Requirement already satisfied: httpcore==1.* in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpx->paddlepaddle) (1.0.9)\n",
"Requirement already satisfied: h11>=0.16 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from httpcore==1.*->httpx->paddlepaddle) (0.16.0)\n",
"Requirement already satisfied: sniffio>=1.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from anyio->httpx->paddlepaddle) (1.3.1)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from jinja2->torch) (3.0.3)\n",
"Requirement already satisfied: wcwidth in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from prettytable->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.2.14)\n",
"Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from ruamel.yaml->paddlex<3.4.0,>=3.3.0->paddlex[ocr-core]<3.4.0,>=3.3.0->paddleocr) (0.2.14)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: PyMuPDF in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (1.26.6)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: pandas in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (2.3.3)\n",
"Requirement already satisfied: numpy>=1.23.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2.3.4)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2025.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas) (2025.2)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: matplotlib in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (3.10.7)\n",
"Requirement already satisfied: contourpy>=1.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (1.3.3)\n",
"Requirement already satisfied: cycler>=0.10 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (0.12.1)\n",
"Requirement already satisfied: fonttools>=4.22.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (4.60.1)\n",
"Requirement already satisfied: kiwisolver>=1.3.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (1.4.9)\n",
"Requirement already satisfied: numpy>=1.23 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (2.3.4)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (25.0)\n",
"Requirement already satisfied: pillow>=8 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (12.0.0)\n",
"Requirement already satisfied: pyparsing>=3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (3.2.5)\n",
"Requirement already satisfied: python-dateutil>=2.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib) (2.9.0.post0)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.7->matplotlib) (1.17.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: seaborn in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (0.13.2)\n",
"Requirement already satisfied: numpy!=1.24.0,>=1.20 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from seaborn) (2.3.4)\n",
"Requirement already satisfied: pandas>=1.2 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from seaborn) (2.3.3)\n",
"Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from seaborn) (3.10.7)\n",
"Requirement already satisfied: contourpy>=1.0.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.3.3)\n",
"Requirement already satisfied: cycler>=0.10 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.12.1)\n",
"Requirement already satisfied: fonttools>=4.22.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.60.1)\n",
"Requirement already satisfied: kiwisolver>=1.3.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.9)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (25.0)\n",
"Requirement already satisfied: pillow>=8 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (12.0.0)\n",
"Requirement already satisfied: pyparsing>=3 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.5)\n",
"Requirement already satisfied: python-dateutil>=2.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.2->seaborn) (2025.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from pandas>=1.2->seaborn) (2025.2)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\sji\\desktop\\mastersthesis\\.venv\\lib\\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.17.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade pip\n",
"%pip install --upgrade jupyter\n",
"%pip install --upgrade ipywidgets\n",
"%pip install --upgrade ipykernel\n",
"\n",
"# Install necessary packages\n",
"%pip install transformers torch pdf2image pillow jiwer paddleocr hf_xet paddlepaddle\n",
"# pdf reading\n",
"%pip install PyMuPDF\n",
"\n",
"# Data analysis and visualization\n",
"%pip install pandas\n",
"%pip install matplotlib\n",
"%pip install seaborn"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ae33632a",
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from pdf2image import convert_from_path\n",
"from PIL import Image, ImageOps\n",
"import torch\n",
"from jiwer import wer, cer\n",
"from paddleocr import PaddleOCR\n",
"import fitz # PyMuPDF\n",
"import re\n",
"from datetime import datetime"
]
},
{
"cell_type": "markdown",
"id": "0e00f1b0",
"metadata": {},
"source": [
"## 1 Configuration"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"PDF_FOLDER = './instructions' # Folder containing PDF files\n",
"OUTPUT_FOLDER = 'results'\n",
"os.makedirs(OUTPUT_FOLDER, exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "243849b9",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\sji\\AppData\\Local\\Temp\\ipykernel_5520\\2286581791.py:7: UserWarning: `lang` and `ocr_version` will be ignored when model names or model directories are not `None`.\n",
" paddleocr_model = PaddleOCR(\n",
"c:\\Users\\sji\\Desktop\\MastersThesis\\.venv\\Lib\\site-packages\\paddle\\utils\\cpp_extension\\extension_utils.py:718: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md\n",
" warnings.warn(warning_message)\n",
"\u001b[32mCreating model: ('PP-LCNet_x1_0_textline_ori', None)\u001b[0m\n",
"\u001b[32mModel files already exist. Using cached files. To redownload, please delete the directory manually: `C:\\Users\\sji\\.paddlex\\official_models\\PP-LCNet_x1_0_textline_ori`.\u001b[0m\n",
"\u001b[32mCreating model: ('PP-OCRv5_server_det', None)\u001b[0m\n",
"\u001b[32mModel files already exist. Using cached files. To redownload, please delete the directory manually: `C:\\Users\\sji\\.paddlex\\official_models\\PP-OCRv5_server_det`.\u001b[0m\n",
"\u001b[32mCreating model: ('PP-OCRv5_server_rec', None)\u001b[0m\n",
"\u001b[32mModel files already exist. Using cached files. To redownload, please delete the directory manually: `C:\\Users\\sji\\.paddlex\\official_models\\PP-OCRv5_server_rec`.\u001b[0m\n"
]
}
],
"source": [
"# 3. PaddleOCR\n",
"# https://www.paddleocr.ai/v3.0.0/en/version3.x/pipeline_usage/OCR.html?utm_source=chatgpt.com#21-command-line\n",
"from paddleocr import PaddleOCR\n",
"\n",
"# Initialize with better settings for Spanish/Latin text\n",
"# https://www.paddleocr.ai/main/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html?utm_source=chatgpt.com#5-models-and-their-supported-languages\n",
"paddleocr_model = PaddleOCR(\n",
"    text_detection_model_name=\"PP-OCRv5_server_det\",\n",
"    text_recognition_model_name=\"PP-OCRv5_server_rec\",\n",
"    use_doc_orientation_classify=False,\n",
"    use_doc_unwarping=False,\n",
"    use_textline_orientation=True,\n",
"    lang='es',  # ignored when model names are set explicitly (see the UserWarning in this cell's output)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "329da34a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.3.1\n"
]
}
],
"source": [
"import paddleocr\n",
"\n",
"print(paddleocr.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6082e2df",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Exported: paddleocr_pipeline_dump.yaml\n"
]
}
],
"source": [
"# 1) Dump the active paddlex pipeline config to a YAML file\n",
"yaml_path = \"paddleocr_pipeline_dump.yaml\"\n",
"paddleocr_model.export_paddlex_config_to_yaml(yaml_path)\n",
"print(\"Exported:\", yaml_path)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b1541bb6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"c:\\Users\\sji\\Desktop\\MastersThesis\\.venv\\Lib\\site-packages\\paddleocr\n"
]
}
],
"source": [
"# 2) Locate the installed PaddleOCR package\n",
"pkg_dir = os.path.dirname(paddleocr.__file__)\n",
"print(pkg_dir)"
]
},
{
"cell_type": "markdown",
"id": "84c999e2",
"metadata": {},
"source": [
"## 2 Helper Functions"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d8bddf8f",
"metadata": {},
"outputs": [],
"source": [
"# preprocess_image.py\n",
"import cv2\n",
"import numpy as np\n",
"from PIL import Image\n",
"\n",
"def preprocess_for_ocr(pil_image):\n",
"    \"\"\"\n",
"    Preprocessing pipeline tuned for PaddleOCR.\n",
"\n",
"    Args:\n",
"        pil_image (PIL.Image.Image): Input PIL image\n",
"\n",
"    Returns:\n",
"        PIL.Image.Image: Preprocessed image in RGB mode\n",
"    \"\"\"\n",
"\n",
"    # Convert the PIL image to a NumPy array\n",
"    img_array = np.array(pil_image)\n",
"\n",
"    # If the image is RGBA, convert it to RGB\n",
"    if img_array.ndim == 3 and img_array.shape[-1] == 4:\n",
"        img_array = cv2.cvtColor(img_array, cv2.COLOR_RGBA2RGB)\n",
"\n",
"    # If the image is RGB, convert it to BGR for OpenCV, then to grayscale\n",
"    if img_array.ndim == 3 and img_array.shape[-1] == 3:\n",
"        img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)\n",
"        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)\n",
"    else:\n",
"        # Single-channel input is already grayscale\n",
"        gray = img_array\n",
"\n",
"    # Upscale if needed\n",
"    height, width = gray.shape\n",
"    if height < 1000:\n",
"        scale = 1500 / height\n",
"        new_width = int(width * scale)\n",
"        new_height = int(height * scale)\n",
"        gray = cv2.resize(gray, (new_width, new_height),\n",
"                          interpolation=cv2.INTER_CUBIC)\n",
"\n",
"    # Adaptive binarization\n",
"    binary = cv2.adaptiveThreshold(\n",
"        gray, 255,\n",
"        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,\n",
"        cv2.THRESH_BINARY,\n",
"        11, 2\n",
"    )\n",
"\n",
"    # Denoise\n",
"    denoised = cv2.fastNlMeansDenoising(binary, None, 10, 7, 21)\n",
"\n",
"    # Dilate (note: a 1x1 kernel leaves the image unchanged)\n",
"    kernel = np.ones((1,1), np.uint8)\n",
"    dilated = cv2.dilate(denoised, kernel, iterations=1)\n",
"\n",
"    # Convert to RGB\n",
"    rgb_img = cv2.cvtColor(dilated, cv2.COLOR_GRAY2RGB)\n",
"\n",
"    # Convert back to a PIL image\n",
"    pil_img = Image.fromarray(rgb_img)\n",
"\n",
"    return pil_img  # PIL.Image.Image in RGB mode"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9596c7df",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"def show_page(img: Image.Image, text: str, scale: float = 1):\n",
"    \"\"\"\n",
"    Displays the image at a size scaled by 'scale'.\n",
"    The text footer is currently disabled (the plt.figtext call is commented out).\n",
"    \"\"\"\n",
"    # Compute plot size based on image dimensions (but without resizing the image)\n",
"    w, h = img.size\n",
"    figsize = (w * scale / 100, h * scale / 100)  # convert pixels to inches approx\n",
"\n",
"    fig, ax = plt.subplots(figsize=figsize)\n",
"    ax.imshow(img)\n",
"    ax.axis(\"off\")\n",
"\n",
"\n",
"    # Add OCR text below the image (footer)\n",
"    # plt.figtext(0.5, 0.02, text.strip(), wrap=True, ha='center', va='bottom', fontsize=10)\n",
"    plt.tight_layout()\n",
"    plt.show()\n",
"\n",
"def pdf_to_images(pdf_path: str, dpi: int = 300, pages: Optional[List[int]] = None) -> List[Image.Image]:\n",
"    \"\"\"\n",
"    Render a PDF into a list of PIL Images using PyMuPDF.\n",
"    'pages' is 1-based (e.g., range(1, 10) -> pages 1–9).\n",
"    \"\"\"\n",
"    images = []\n",
"\n",
"    if fitz is not None:\n",
"        doc = fitz.open(pdf_path)\n",
"        total_pages = len(doc)\n",
"\n",
"        # Adjust page indices (PyMuPDF uses 0-based indexing)\n",
"        if pages is None:\n",
"            page_indices = list(range(total_pages))\n",
"        else:\n",
"            # Filter out invalid pages and convert to 0-based\n",
"            page_indices = [p - 1 for p in pages if 1 <= p <= total_pages]\n",
"\n",
"        for i in page_indices:\n",
"            page = doc.load_page(i)\n",
"            mat = fitz.Matrix(dpi / 72.0, dpi / 72.0)\n",
"            pix = page.get_pixmap(matrix=mat, alpha=False)\n",
"            img = Image.frombytes(\"RGB\", [pix.width, pix.height], pix.samples)\n",
"\n",
"            images.append(img)\n",
"        doc.close()\n",
"    else:\n",
"        raise RuntimeError(\"Install PyMuPDF to convert PDFs.\")\n",
"\n",
"    return images\n",
"\n",
"def pdf_extract_text(pdf_path, page_num, line_tolerance=15) -> str:\n",
"    \"\"\"\n",
"    Extracts text from a specific PDF page in proper reading order.\n",
"    Adds '\\n' when blocks are vertically separated more than line_tolerance.\n",
"    Removes bullet-like characters (•, ▪, etc.).\n",
"    \"\"\"\n",
"    doc = fitz.open(pdf_path)\n",
"\n",
"    if page_num < 1 or page_num > len(doc):\n",
"        doc.close()  # release the file handle before the early return\n",
"        return \"\"\n",
"\n",
"    page = doc[page_num - 1]\n",
"    blocks = page.get_text(\"blocks\")  # (x0, y0, x1, y1, text, block_no, block_type)\n",
"\n",
"    # Sort blocks: top-to-bottom, left-to-right\n",
"    blocks_sorted = sorted(blocks, key=lambda b: (b[1], b[0]))\n",
"\n",
"    text_lines = []\n",
"    last_y = None\n",
"\n",
"    for b in blocks_sorted:\n",
"        y0 = b[1]\n",
"        text_block = b[4].strip()\n",
"\n",
"        # Remove bullet-like characters\n",
"        text_block = re.sub(r\"[•▪◦●❖▶■]\", \"\", text_block)\n",
"\n",
"        # If new line (based on vertical gap)\n",
"        if last_y is not None and abs(y0 - last_y) > line_tolerance:\n",
"            text_lines.append(\"\")  # blank line for spacing\n",
"\n",
"        text_lines.append(text_block.strip())\n",
"        last_y = y0\n",
"\n",
"    # Join all lines with real newlines\n",
"    text = \"\\n\".join(text_lines)\n",
"\n",
"    # Normalize spaces\n",
"    text = re.sub(r\"\\s*\\n\\s*\", \"\\n\", text).strip()  # strip whitespace around newlines (also collapses blank lines)\n",
"    text = re.sub(r\" +\", \" \", text).strip()  # collapse multiple spaces to one\n",
"    text = re.sub(r\"\\n{3,}\", \"\\n\\n\", text).strip()  # avoid triple blank lines\n",
"\n",
"    doc.close()\n",
"    return text\n",
"\n",
"def evaluate_text(reference, prediction):\n",
"    return {'WER': wer(reference, prediction), 'CER': cer(reference, prediction)}"
]
},
{
"cell_type": "markdown",
"id": "a79cc4bf",
"metadata": {},
"source": [
"## 3 Model wrappers"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c56f3de2",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import re\n",
"\n",
"def _normalize_box_xyxy(box):\n",
"    \"\"\"\n",
"    Accepts:\n",
"      - [[x,y],[x,y],[x,y],[x,y]] (quad)\n",
"      - [x0, y0, x1, y1] (flat)\n",
"      - [x0, y0, x1, y1, x2, y2, x3, y3] (flat quad)\n",
"    Returns (x0, y0, x1, y1)\n",
"    \"\"\"\n",
"    # Quad as list of points?\n",
"    if isinstance(box, (list, tuple)) and box and isinstance(box[0], (list, tuple)):\n",
"        xs = [p[0] for p in box]\n",
"        ys = [p[1] for p in box]\n",
"        return min(xs), min(ys), max(xs), max(ys)\n",
"\n",
"    # Flat list\n",
"    if isinstance(box, (list, tuple)):\n",
"        if len(box) == 4:\n",
"            x0, y0, x1, y1 = box\n",
"            # ensure order\n",
"            return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)\n",
"        if len(box) == 8:\n",
"            xs = box[0::2]\n",
"            ys = box[1::2]\n",
"            return min(xs), min(ys), max(xs), max(ys)\n",
"\n",
"    # Fallback\n",
"    raise ValueError(f\"Unrecognized box format: {box!r}\")\n",
"\n",
"def ocr_paddle(img, image_array, min_score=0.0, line_tol_factor=0.6):\n",
"    \"\"\"\n",
"    Robust line grouping for PaddleOCR outputs:\n",
"      - normalizes boxes to (x0,y0,x1,y1)\n",
"      - adaptive line tolerance based on median box height\n",
"      - optional confidence filter\n",
"      - inserts '\\n' between lines and preserves left→right order\n",
"    \"\"\"\n",
"    result = paddleocr_model.predict(image_array)\n",
"\n",
"    boxes_all = []  # (x0, y0, x1, y1, y_mid, text, score)\n",
"    for item in result:\n",
"        res = item.json.get(\"res\", {})\n",
"        boxes = res.get(\"rec_boxes\", []) or []  # be defensive\n",
"        texts = res.get(\"rec_texts\", []) or []\n",
"        scores = res.get(\"rec_scores\", None)\n",
"\n",
"        for i, (box, text) in enumerate(zip(boxes, texts)):\n",
"            try:\n",
"                x0, y0, x1, y1 = _normalize_box_xyxy(box)\n",
"            except Exception:\n",
"                # Skip weird boxes gracefully\n",
"                continue\n",
"\n",
"            y_mid = 0.5 * (y0 + y1)\n",
"            score = float(scores[i]) if (scores is not None and i < len(scores)) else 1.0\n",
"\n",
"            t = re.sub(r\"\\s+\", \" \", str(text)).strip()\n",
"            if not t:\n",
"                continue\n",
"\n",
"            boxes_all.append((x0, y0, x1, y1, y_mid, t, score))\n",
"\n",
"    if min_score > 0:\n",
"        boxes_all = [b for b in boxes_all if b[6] >= min_score]\n",
"\n",
"    if not boxes_all:\n",
"        return \"\"\n",
"\n",
"    # Adaptive line tolerance\n",
"    heights = [b[3] - b[1] for b in boxes_all]\n",
"    median_h = float(np.median(heights)) if heights else 20.0\n",
"    line_tol = max(8.0, line_tol_factor * median_h)\n",
"\n",
"    # Sort by vertical mid, then x0\n",
"    boxes_all.sort(key=lambda b: (b[4], b[0]))\n",
"\n",
"    # Group into lines\n",
"    lines, cur, last_y = [], [], None\n",
"    for x0, y0, x1, y1, y_mid, text, score in boxes_all:\n",
"        if last_y is None or abs(y_mid - last_y) <= line_tol:\n",
"            cur.append((x0, text))\n",
"        else:\n",
"            cur.sort(key=lambda t: t[0])\n",
"            lines.append(\" \".join(t[1] for t in cur))\n",
"            cur = [(x0, text)]\n",
"        last_y = y_mid\n",
"\n",
"    if cur:\n",
"        cur.sort(key=lambda t: t[0])\n",
"        lines.append(\" \".join(t[1] for t in cur))\n",
"\n",
"    res = \"\\n\".join(lines)\n",
|||
|
|
" res = re.sub(r\"\\s+\\n\", \"\\n\", res).strip()\n",
|
|||
|
|
" return res\n"
|
|||
|
|
]
|
|||
|
|
},
{
"cell_type": "markdown",
"id": "e42cae29",
"metadata": {},
"source": [
"## 4 Run AI OCR Benchmark"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9b55c154",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"Referencias bibliográficas: el trabajo debe incluir una sección de bibliografía, en\n",
"la que aparezcan en formato APA los detalles de todos los documentos a los que\n",
"se haga referencia en el TFE.\n",
"Anexos: estos son apartados opcionales que contienen cuestionarios, encuestas,\n",
"resultados de pilotos, documentos adicionales, capturas de pantalla, y otros\n",
"elementos que complementan o amplían la información del trabajo. Los anexos se\n",
"diferencian empleando una letra (Anexo A, Anexo B…).\n",
"En el punto 2 se describen con mayor detalle cada uno de los apartados del TFE.\n",
"La extensión mínima en un TFE individual es de 50 páginas y la máxima de 90 páginas,\n",
"sin contar portada, índices y anexos.\n",
"En el caso de un TFE grupal, la extensión del TFE debe garantizar que cada uno de los\n",
"integrantes del equipo ha dedicado a la elaboración y defensa del trabajo las\n",
"competencias previstas en la memoria. A modo orientativo, y para garantizar la calidad\n",
"y dedicación que requiere un TFE grupal, la extensión mínima ha de ser un 50% superior\n",
"a lo previsto para un TFE individual. Por lo tanto, la extensión mínima en un TFE grupal\n",
"es de 75 páginas.\n",
"1.3. Formatos y plantilla de trabajo\n",
"Para la elaboración del TFE, ya sea individual o grupal, debes utilizar la plantilla\n",
"disponible en el aula virtual. No modifiques los estilos definidos en esta plantilla:\n",
"márgenes, interlineado, tipos de letra, etc. Para cualquier duda que pueda surgirte,\n",
"puedes consultar esos formatos a continuación.\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"El trabajo deberá estar escrito en tamaño de página A4 con los siguientes\n",
"márgenes:\n",
"Izquierdo: 3,0 cm.\n",
"Derecho: 2,0 cm.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"5\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"Referencias bibliográficas: el trabajo debe incluir una sección de bibliografía, en\n",
"la que aparezcan en formato APA los detalles de todos los documentos a los que\n",
"se haga referencia en el TFE.\n",
"Anexos: estos son apartados opcionales que contienen cuestionarios, encuestas,\n",
"resultados de pilotos, documentos adicionales, capturas de pantalla,y otros\n",
"elementos que complementan o amplían la información del trabajo. Los anexos se\n",
"diferencian empleando una letra (Anexo A, Anexo B.….).\n",
"En el punto 2 se describen con mayor detalle cada uno de los apartados del TfE.\n",
"La extensión mínima en un TfE individual es de 50 páginas y la máxima de 90 páginas,\n",
"sin contar portada,índices y anexos.\n",
"En el caso de e un TFE grupal, la extensión del TFE debe garantizar que cada uno de los\n",
"integrantes del equipo ha dedicado a I la elaboración y defensa del trabajo las\n",
"competencias previstas en la memoria. A modo orientativo,y para garantizar la calidad\n",
"y dedicación que requiere un TFE grupal, la extensión mínima ha de ser un 50% superior\n",
"a lo previsto para un TFE individual. Por lo tanto, la extensión mínima en un TFE grupal\n",
"es de 75 páginas.\n",
"1.3. Formatos y plantilla de trabajo\n",
"Para la elaboración del TFE,ya sea individual o grupal, debes utilizar la plantilla\n",
"disponible en el aula virtual. No modifiques los estilos definidos en esta plantilla:\n",
"margenes, interlineado, tipos de letra, etc. Para cualquier duda que pueda surgirte,\n",
"puedes consultar esos formatos a continuacion.\n",
"El trabajo deberá estar escrito en tamaño de página A4 con los siguientes\n",
"márgenes:\n",
"© Universidad Internacional de La Rioja (UNiR) Izquierdo: 3,0 cm.\n",
"Derecho: 2,0 cm.\n",
"Instrucciones para la redacción y elaboración del TfE 5\n",
"Máster Universitario en Inteligencia Artificial\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"Superior e inferior: 2,5 cm.\n",
"Formato de párrafo en texto principal (estilo de la plantilla “Normal”):\n",
"Calibri 12, justificado, interlineado 1,5, espacio entre párrafos 6 puntos\n",
"anterior y 6 puntos posterior, sin sangría.\n",
"Títulos:\n",
"Primer nivel (estilo de la plantilla “Título 1”): Calibri Light 18, azul, justificado,\n",
"interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Segundo nivel (estilo de la plantilla “Título 2”): Calibri Light 14, azul,\n",
"justificado, interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6\n",
"puntos posterior, sin sangría.\n",
"Tercer nivel (estilo de la plantilla “Título 3”: Calibri Light 12, justificado,\n",
"interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Notas al pie:\n",
"Calibri 10, justificado, interlineado sencillo, espacio entre párrafos 0 puntos\n",
"anterior y 0 puntos posterior, sin sangría.\n",
"Tablas y figuras:\n",
"Título en la parte superior de la tabla o figura.\n",
"Numeración tabla o figura (Tabla 1/ Figura1): Calibri 12, negrita, justificado.\n",
"Nombre tabla o figura: Calibri 12, cursiva, justificado.\n",
"Cuerpo: la tipografía de las tablas o figuras se pueden reducir hasta los 9\n",
"puntos si estas contienen mucha información. Si la tabla o figura es muy\n",
"grande, también se puede colocar en apaisado dentro de la hoja.\n",
"Fuente de la tabla o figura en la parte inferior. Calibri 9,5, centrado.\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"Encabezado y pie de página:\n",
"Todas las páginas llevarán un encabezado con el nombre completo del\n",
"estudiante y el título del TFE.\n",
"Todas las páginas llevarán también un pie de página con el número de página.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"6\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"Superior e inferior: 2,5 cm.\n",
"Formato de párrafo en texto principal (estilo de la plantilla “Normal\"):\n",
"Calibri 12, justificado, interlineado 1,5, espacio entre párrafos 6 puntos\n",
"anterior y 6 puntos posterior, sin sangría.\n",
"Títulos:\n",
"Primer nivel (estilo de la plantillaTítulo 1\"): Calibri Light 18, azul, justificado,\n",
"interlineado 1,5,espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Segundo nivel (estilo de la plantilla Titulo 2\"): Calibri Light 14, azul,\n",
"justificado, interlineado 1,5, espacio entre párrafos 6 puntos anterior y 6\n",
"puntos posterior, sin sangría.\n",
"Tercer nivel (estilo de la plantilla Título 3\": Calibri Light 12, justificado,\n",
"interlineado 1,5,espacio entre párrafos 6 puntos anterior y 6 puntos\n",
"posterior, sin sangría.\n",
"Notas al pie:\n",
"Calibri 10, justificado, interlineado sencillo, espacio entre párrafos O puntos\n",
"anterior y O puntos posterior, sin sangra.\n",
"Tablas y figuras:\n",
"Título en la parte superior de la tabla o figura.\n",
"Numeración tabla o figura (Tabla 1/ Figura1): Calibri 12, negrita, justificado.\n",
"Nombre tabla o figura: Calibri 12, cursiva, justificado.\n",
"Cuerpo: la tipografía de las tablas o figuras se pueden reducir hasta los 9\n",
"puntos si estas contienen mucha información. Si la tabla o figura es muy\n",
"grande, también se puede colocar en apaisado dentro de la hoja.\n",
"Fuente de la tabla o figura en la parte inferior. Calibri 9,5, centrado.\n",
"Encabezado y pie de página:\n",
"Todas las páginas llevarán un encabezado con el nombre completo del\n",
"estudiante y el título del TFE.\n",
"© Universidad Internacional de La Rioja (UNiR)\n",
"Todas las páginas llevarán también un pie de página con el número de página.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"Máster Universitario en Inteligencia Artificial 9\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 2481x3508 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"ref: \n",
"Los borradores intermedios deberán entregarse en formato Word. El documento final\n",
"deberá depositarse en formato PDF.\n",
"1.4. Estética y estilo de redacción\n",
"Es fundamental que el TFE presente un aspecto elegante y correcto. Se trata de un\n",
"trabajo académico y debe reflejar la madurez y el nivel formativo de una persona que\n",
"ha finalizado un estudio de grado o postgrado. Ten en cuenta las siguientes\n",
"recomendaciones en todas y cada una de las entregas que realices y, en especial, en\n",
"el depósito final del documento:\n",
"Verifica la originalidad del documento, asegurándote de que citas todas las\n",
"fuentes consultadas y no existen textos de autoría ajena sin referenciar\n",
"correctamente.\n",
"Cuida la presentación del trabajo. Comprueba que formatos como tipo y tamaño\n",
"de letra, número de páginas, encabezados, justificación de párrafos, interlineado,\n",
"etc., son correctos.\n",
"Revisa la ortografía y la redacción. Utiliza el corrector de Word para asegurarte de\n",
"que no has dejado ninguna errata. Una lectura detenida del documento también\n",
"te ayudará a detectar erratas, omisiones o redundancias. Si es posible, pide a\n",
"alguien cercano que lo lea y te dé su opinión sobre la redacción. Presta especial\n",
"atención a los siguientes aspectos:\n",
"-\n",
"Que los párrafos sigan un orden o hilo argumental lógico.\n",
"-\n",
"Que la información se presente de una manera que facilite su\n",
"© Universidad Internacional de La Rioja (UNIR)\n",
"comprensión, definiendo los conceptos necesarios e incluyendo las citas\n",
"bibliográficas pertinentes.\n",
"-\n",
"Elimina párrafos demasiado cortos. Cada párrafo debería tener al menos\n",
"tres oraciones.\n",
"-\n",
"Elimina frases superfluas y repeticiones de ideas.\n",
"Instrucciones para la redacción y elaboración del TFE\n",
"7\n",
"Máster Universitario en Inteligencia Artificial\n",
"paddle_text: \n",
"Los borradores intermedios deberán entregarse en formato Word. El documento final\n",
"deberá depositarse en formato PDf.\n",
"1.4. Estética y estilo de redacción\n",
"Es fundamental que el TFE presente un aspecto elegante y correcto. Se trata de un\n",
"trabajo académico y debe reflejar la madurez y el nivel formativo de una persona que\n",
"ha finalizado un estudio de grado o postgrado. Ten en cuenta las siguientes\n",
"recomendaciones en todas y cada una de las entregas que realices y, en especial, en\n",
"el deposito final del documento:\n",
"Verifica la originalidad del documento,asegurándote de que citas todas las\n",
"fuentes consultadas y no existen textos de autoría ajena sin referenciar\n",
"correctamente.\n",
"Cuida la presentación del trabajo. Comprueba que formatos como tipo y tamaño\n",
"de letra, número de páginas, encabezados, justificación de párrafos, interlineado,\n",
"etc., son correctos.\n",
"Revisa la ortografía y la redacción. Utiliza el corrector de Word para asegurarte de\n",
"que no has dejado ninguna errata. Una lectura detenida del documento también\n",
"te ayudará a detectar erratas, omisiones o redundancias. Si es posible, pide a\n",
"alguien cercano que lo lea y te dé su opinión sobre la redacción. Presta especial\n",
"atención a los siguientes aspectos:\n",
"Que los párrafos sigan un orden o hilo argumental lógico.\n",
"Que la información se presente de una manera que facilite su\n",
"comprensión, definiendo los conceptos necesarios e incluyendo las citas\n",
"bibliograficas pertinentes.\n",
"Elimina párrafos demasiado cortos. Cada párrafo debería tener al menos\n",
"© Universidad Internacional de La Rioja (UNiR) tres oraciones.\n",
"Elimina frases superfluas y repeticiones de ideas.\n",
"Instrucciones para la redacción y elaboración del TfE 7\n",
"Máster Universitario en Inteligencia Artificial\n"
]
},
{
"data": {
"text/plain": [
|
|||
|
|
"<Figure size 2481x3508 with 1 Axes>"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "display_data"
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"name": "stdout",
|
|||
|
|
"output_type": "stream",
|
|||
|
|
"text": [
|
|||
|
|
"ref: \n",
|
|||
|
|
"-\n",
|
|||
|
|
"Escribe siempre al menos un párrafo de introducción en cada capítulo o\n",
|
|||
|
|
"apartado, explicando de qué vas a tratar en esa sección. Evita que\n",
|
|||
|
|
"aparezcan dos encabezados de nivel consecutivos sin ningún texto entre\n",
|
|||
|
|
"medias.\n",
|
|||
|
|
"Repasa las citas bibliográficas. Comprueba que todas ellas son correctas y siguen\n",
|
|||
|
|
"la normativa que exige la titulación.\n",
|
|||
|
|
"Asegúrate de que las figuras y las tablas se ven clara y correctamente, e incluyen\n",
|
|||
|
|
"número y título, así como su procedencia o fuente.\n",
|
|||
|
|
"Comprueba que los índices se generan correctamente.\n",
|
|||
|
|
"1.5. Normativa de citas\n",
|
|||
|
|
"En esta titulación se cita de acuerdo con la normativa APA.\n",
|
|||
|
|
"Recuerda que tienes una guía con explicaciones y ejemplos en el apartado Citas y\n",
|
|||
|
|
"bibliografía del aula virtual: https://bibliografiaycitas.unir.net/\n",
|
|||
|
|
"© Universidad Internacional de La Rioja (UNIR)\n",
|
|||
|
|
"Instrucciones para la redacción y elaboración del TFE\n",
|
|||
|
|
"8\n",
|
|||
|
|
"Máster Universitario en Inteligencia Artificial\n",
|
|||
|
|
"paddle_text: \n",
|
|||
|
|
"Escribe siempre al menos un párrafo de introducción en cada capítulo o\n",
|
|||
|
|
"apartado,explicando de qué vas a tratar en esa sección. Evita que\n",
|
|||
|
|
"aparezcan dos encabezados de nivel consecutivos sin ningún texto entre\n",
|
|||
|
|
"medias.\n",
|
|||
|
|
"Repasa las citas bibliográficas. Comprueba que todas ellas son correctas y siguen\n",
|
|||
|
|
"la normativa que exige la titulación.\n",
|
|||
|
|
"Asegúrate de que las figuras y las tablas se ven clara y correctamente, e incluyen\n",
|
|||
|
|
"número y título, así como su procedencia o fuente.\n",
|
|||
|
|
"Comprueba que los índices se generan correctamente.\n",
|
|||
|
|
"1.5. Normativa adecitas\n",
|
|||
|
|
"En esta titulacióon se cita de acuerdo con la normativa Apa.\n",
|
|||
|
|
"Recuerda que tienes una guía con explicaciones y ejemplos en el apartado Citas y\n",
|
|||
|
|
"bibliografía del aula virtual: https://bibliografiaycitas.unir.net/\n",
|
|||
|
|
"© Universidad Internacional de La Rioja (UNIR)\n",
|
|||
|
|
"Instrucciones para la redacción y elaboración del TfE\n",
|
|||
|
|
"Máster Universitario en lnteligencia Artificial ∞\n"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAACacAAA2dCAYAAADLbTdiAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3Qe4XNdZL+5vnzOnN+moS5bce4udxOkJENKo91JD+VNvCDWUCyFwIYHkcuntUkO49EDoCZCQQpyeOLbjXiXbkq1eTu9t9v9ZW5JjS3NU52h0Zt73eUSEJM9Zs2ftMnv99vdleZ7nAQAAAAAAAAAAAFXUVM0XAwAAAAAAAAAAgEQ4DQAAAAAAAAAAgKoTTgMAAAAAAAAAAKDqhNMAAAAAAAAAAACoOuE0AAAAAAAAAAAAqk44DQAAAAAAAAAAgKoTTgMAAAAAAAAAAKDqhNMAAAAAAAAAAACoOuE0AAAAAAAAAKBh/edTo1F6513xkn95OMp5XuvhANQV4TQAAAAAAAAAAACqLstzsV8AAAAAAAAAoDHNlyPG5xeilEV0lZoiy7JaDwmgbginAQAAAAAAAAANK9X0OdrMM8XShNMAqkdbTwAAAAAAAACgYX1y70Rc8bcPxDd9+ImnQ2oAVIdwGgAAAAAAAADQsKYWyrFzYj72T87VeigAdUdbTwAAAAAAAACgYY3OluPx0enoammOy3tbtfUEqCLhNAAAAAAAAACgYZXzPObLeTRlWTRnIZwGUEXaegIAAAAAAAAADevOg1PxFe9/LH78Mzsjr/VgAOpMqdYDAAAAAAAAAAColaGZ+fjs/omYL5d9CABVJpwGAAAAAAAAADSs61Z2xB+9fEus6ShFVuvBANSZLM9zVSkBAAAAAAAAgIa0kOcxt5BHlkW0NmWRpd8AUBVN1XkZAAAAAAAAAIDlZ+vwTLzltl3xzocOheo+ANUlnAYAAAAAAAAANKyd47Px/x4ZjPc+MVTroQDUnVKtBwAAAAAAAAAAUCuX9rbFTz9nbWzuaQsNPQGqK8vzXFVKAAAAAAAAAKAhLeR5zC7k0ZRFtDZlkWUiagDVoq0nAAAAAAAAANCw9kzMx19vHYwPPjVa66EA1B3hNAAAAAAAAACgYT08NBU/8dld8Tv37Q+t5wCqq1Tl1wMAAAAAAAAAWDbWd7bE113cF5f1ttV6KAB1J8vzXPAXAAAAAAAAAGhI5TyP2XJetJ5racoiy7JaDwmgbmjrCQAAAAAAAAA0rANTC/H+J0fis/snaj0UgLojnAYAAAAAAAAANKx7Bybj2z+6I952++7Qeg6gukpVfj0AAAAAAAAAgGVjdXspXrWpJ65c2V7roQDUnSzPc8FfAAAAAAAAAAAAqkrlNAAAAAAAAACgYQ3PLMQjw9PR3dIU165sjyzLaj0kgLrRVOsBAAAAAAAAAADUyuf2T8RL3/tovPHjO0LrOYDqUjkNAAAAAAAAAGhYK9ua4yXru+OaFW21HgpA3cnyPBf8BQAAAAAAAAAa0kI5YmphIZqzLNqbM209AapIW08AAAAAAAAAoGGNzM7H3Qcn49Hh6VoPBaDuCKcBAAAAAAAAAA3rnoGpeP1HtsfP374ntJ4DqC7hNAAAAAAAAACgYXW3NMdV/R1xUU9rrYcCUHeyPM8FfwEAAAAAAACAhjQ5X479k3PR1twUGzpLkWVZrYcEUDdUTgMAAAAAAAAAGtbUfDl2jM3E3onZWg8FoO4IpwEAAAAAAAAADevhoen40U/vit++b39oPQdQXcJpAAAAAAAAAEDDKjVlsaK9FD0tzbUeCkDdyfI8F/wFAAAAAAAAABrS0PRCPDg0FT2tTXFDf0dkWVbrIQHUDZXTAAAAAAAAAICGlbJozcUvoTSAahNOAwAAAAAAAAAa1taRmXjrHXviXQ8dCq3nAKpLOA0AAAAAAAAAaFgL5TzG58oxOb9Q66EA1J0sz3PBXwAAAAAAAACgIR2amo
8vHJyIle2leP6azsi09wSomlL1XgoAAAAAAAAAYHlpac5iTUdLdLVoPgdQbY6sAAAAAAAAAEDDemxkJt52597404cPhdZzANWlchoAAAAAAAAA0LAm5sqxdXgqOpX3Aai6LM9zwV8AAAAAAAAAoCHtm5yLT+8dj1VtzfElm3oiy7JaDwmgbginAQAAAAAAAAANa3q+HAen56OtOYs17SXhNIAqUpQSAAAAAAAAAGhYDw/PxE9+blf83/sPhNZzANVVqvLrAQAAAAAAAAAsG4Mz8/Hx3WMxNrtQ66EA1B1tPQEAAAAAAACAhrVvci4+s3c8+ttL8SUbu7X1BKgi4TQAAAAAAAAAAACqrqn6LwkAAAAAAAAAsDzcc2gyvuUj2+Pnb98d5Tyv9XAA6opwGgAAAAAAAADQsPZOzsc/PjEUt+4arfVQAOqOtp4AAAAAAAAAQMPaOzEXn9g7FqvaSvHlF/RElmW1HhJA3RBOAwAAAAAAAAAaVjmPWMjzSJG05iyE0wCqSFtPAAAAAAAAAKBhPTA4FW/42I745bv2RV7rwQDUGeE0AAAAAAAAAKBh7Zuci3/aPhy37h6t9VAA6k6p1gMAAAAAAAAAAKiVa1Z2xP996eZY09FStPYEoHqyPM9VpQQAAAAAAAAAGtJ8OY/phXI0Z1m0N2eRZSJqANWirScAAAAAAAAA0LC2jszEz962O/7ogYOhug9AdQmnAQAAAAAAAAANa/fEXPz1tqH4z6dGaj0UgLpTqvUAAAAAAAAAAABq5aKe1vjRG9bE5q7W0NAToLqyPM9VpQQAAAAAAAAAGtJcOY/xuYUoZVl0tzRFlomoAVSLtp4AAAAAAAAAQMPaNT4Xf/bwoXi/tp4AVSecBgAAAAAAAAA0rMdHZ+KX794ff/HIodB6DqC6SlV+PQAAAAAAAACAZWNDZ0t84yUr4tK+9loPBaDuZHmeC/4CAAAAAAAAAA1prpzHxNxCNDdl0V1qiizLaj0kgLqhrScAAAAAAAAA0LAOTs3HB54ajc/tn6j1UADqjnAaAAAAAAAAANCwHhyajh/+9M74jbv3hdZzANUlnAYAAAAAAAAANKzelqa4YWV7XN7bVuuhANSdLM9zwV8AAAAAAAAAoCHNl/OYnC9HcxbRWWqKLMtqPSSAuqFyGgAAAAAAAADQsKbm89g+NhN7J+drPRSAuiOcBgAAAAAAAAA0rM/tH4+XvXdrvPETO0LrOYDqEk4DAAAAAAAAABpWW3NTrGlrjpVtpVoPBaDuZHmeC/4CAAAAAAAAAA1pIc9jdiGPpiyitSmLLMtqPSSAuiGcBgAAAAAAAAAAQNVp6wkAAAAAAAAANKwP7RyNznfdHV/y3keirPkcQFUJpwEAAAAAAAAADaucR8yW85hLvwGgqrT1BAAAAAAAAAAa1txCHiNzC9GSZdHb2hRZltV6SAB1QzgNAAAAAAAAAGhYhzt5frFqmnAaQPVo6wkAAAAAAAAANKxP7h2PK//2gfjmj2x/RkQNgGooVeVVAAAAAAAAAACWocn5cjwxPhdrOmZqPRSAuqOtJwAAAAAAAADQsEZny/HY6HR0lZriir42bT0Bqkg4DQAAAAAAAABoWHn+xWaeWZbVdCwA9aap1gMAAAAAAAAAAKiVOw9OxWv+fVv82Kd3RvkZQTUAzl6pCq8BAAAAAAAAALAsHZqej1v3jsfE/EKthwJQd4TTAAAAAAAAAICGdV1/R/zxy7fE2o5SaOoJUF1Z/szmyQAAAAAAAAAADeTY2ESWiagBVEtT1V4JAAAAAAAAAGCZ2ToyG2+5bXf88UOHaj0UgLojnAYAAAAAAAAANKztozPxW/cdiHdvHQit5wCqq1Tl1wMAAAAAAAAAWDYu6W2LN9+4Ni7sbQ8NPQGqK8uPbZ4MAAAAAAAAANAgUmyinEdkWRThtCz9BoCqUDkNAAAAAAAAAGhYO8fn4v1PjcS6zpb47xf11Xo4AHWlqdYDAAAAAAAAAAColY
eGpuNHP7MrfvuefaH1HEB1qZwGAAAAAAAAADSsNR2leO0FPXHlivZaDwWg7mR5ap4MAAAAAAAAANCAUmyinJIT2eH
|
|||
|
|
"text/plain": [
|
|||
|
|
"<Figure size 2481x3508 with 1 Axes>"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "display_data"
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"name": "stdout",
|
|||
|
|
"output_type": "stream",
|
|||
|
|
"text": [
|
|||
|
|
"ref: \n",
|
|||
|
|
"2. Estructura del documento\n",
|
|||
|
|
"En esta sección se describe con mayor profundidad la estructura y los contenidos\n",
|
|||
|
|
"esperados en cada apartado de tu TFE.\n",
|
|||
|
|
"Léela con detenimiento y compárala con la programación semanal que encontrarás\n",
|
|||
|
|
"en el aula virtual, pues en cada borrador deberás entregar completados diferentes\n",
|
|||
|
|
"apartados que se explican a continuación, y que se elaboran de una manera no\n",
|
|||
|
|
"necesariamente lineal.\n",
|
|||
|
|
"Como ya se ha mencionado, la memoria debe estar estructurada en capítulos. Por\n",
|
|||
|
|
"norma general, la estructura de capítulos suele reflejar la línea de discurso del\n",
|
|||
|
|
"trabajo, empezando por una introducción donde se plantea el problema, seguida de\n",
|
|||
|
|
"un estudio de la literatura donde se estudia y describe el contexto. Posteriormente\n",
|
|||
|
|
"se establecen claramente la hipótesis de trabajo y los objetivos concretos de\n",
|
|||
|
|
"investigación, así como la descripción de la metodología seguida para alcanzar los\n",
|
|||
|
|
"objetivos. Posteriormente se describe la contribución del trabajo, seguida de una\n",
|
|||
|
|
"evaluación de la misma. La evaluación da pie a la elaboración de las conclusiones,\n",
|
|||
|
|
"que deben relacionar los resultados obtenidos con los objetivos planteados\n",
|
|||
|
|
"inicialmente. Finalmente, se describen las líneas de trabajo futuro necesarias para\n",
|
|||
|
|
"seguir avanzando hacia la consecución de los objetivos.\n",
|
|||
|
|
"A continuación, te dejamos algunos consejos generales sobre cómo organizar los\n",
|
|||
|
|
"capítulos, pero ten en cuenta que cada trabajo es único y esta organización es una\n",
|
|||
|
|
"guía general adaptable. El director específico de tu TFE podrá aportarte consejos\n",
|
|||
|
|
"sobre cómo organizar la memoria adaptándote al contexto de tu trabajo concreto.\n",
|
|||
|
|
"© Universidad Internacional de La Rioja (UNIR)\n",
|
|||
|
|
"Como recomendación general, la estructura de capítulos de tu memoria debería ser\n",
|
|||
|
|
"similar a la siguiente propuesta:\n",
|
|||
|
|
"Organización del trabajo en grupo (solo en trabajos grupales)\n",
|
|||
|
|
"Capítulo 1 – Introducción\n",
|
|||
|
|
"Instrucciones para la redacción y elaboración del TFE\n",
|
|||
|
|
"9\n",
|
|||
|
|
"Máster Universitario en Inteligencia Artificial\n",
|
|||
|
|
"paddle_text: \n",
|
|||
|
|
"2.E Estructura del documento\n",
|
|||
|
|
"En esta sección se describe con mayor profundidad la estructura y los contenidos\n",
|
|||
|
|
"esperados en cada apartado de tu Tfe.\n",
|
|||
|
|
"Léela con detenimiento y compárala con la programación semanal que encontraras\n",
|
|||
|
|
"en el aula virtual, pues en cada borrador deberás entregar completados diferentes\n",
|
|||
|
|
"apartados que se explican a continuación,y que se elaboran de una manera no\n",
|
|||
|
|
"necesariamente lineal.\n",
|
|||
|
|
"Como ya se ha mencionado, la memoria debe estar estructurada en capítulos. Por\n",
|
|||
|
|
"norma general, la estructura de capitulos suele reflejar la linea de discurso del\n",
|
|||
|
|
"trabajo, empezando por una introducción donde se plantea el problema, seguida de\n",
|
|||
|
|
"un estudio de la literatura donde se estudia y describe el contexto. Posteriormente\n",
|
|||
|
|
"se establecen claramente la hipótesis de trabajo y los objetivos concretos de\n",
|
|||
|
|
"investigación, así como la descripción de la metodología seguida para alcanzar los\n",
|
|||
|
|
"objetivos. Posteriormente se describe la contribución del trabajo, seguida de una\n",
|
|||
|
|
"evaluación de la misma. La evaluación da pie a la elaboración de las conclusiones,\n",
|
|||
|
|
"que deben relacionar los resultados obtenidos con los objetivos planteados\n",
|
|||
|
|
"inicialmente. Finalmente, se describen las líneas de trabajo futuro necesarias para\n",
|
|||
|
|
"seguir avanzando hacia la consecución de los objetivos.\n",
|
|||
|
|
"A continuación, te dejamos algunos consejos generales sobre cómo organizar los\n",
|
|||
|
|
"capítulos, pero ten en cuenta que cada trabajo es único y esta organización es una\n",
|
|||
|
|
"guía general adaptable. El director especifico de tu TFE podrá aportarte consejos\n",
|
|||
|
|
"sobre cómo organizar la memoria adaptándote al contexto de tu trabajo concreto.\n",
|
|||
|
|
"Como recomendación general, la estructura de capítulos de tu memoria debería ser\n",
|
|||
|
|
"similar a la siguiente propuesta:\n",
|
|||
|
|
"© Universidad Internacional de La Rioja (UNiR)\n",
|
|||
|
|
"Organización del trabajo en grupo (solo en trabajos grupales)\n",
|
|||
|
|
"Capítulo1–Introducción\n",
|
|||
|
|
"Instrucciones para la redacción y elaboración del TFE\n",
|
|||
|
|
"Máster Universitario en Inteligencia Artificial 6\n"
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"results = []\n",
|
|||
|
|
"\n",
|
|||
|
|
"for pdf_file in os.listdir(PDF_FOLDER):\n",
|
|||
|
|
" if not pdf_file.lower().endswith('.pdf'):\n",
|
|||
|
|
" continue\n",
|
|||
|
|
" pdf_path = os.path.join(PDF_FOLDER, pdf_file)\n",
|
|||
|
|
" page_range = range(5, 10)\n",
|
|||
|
|
" \n",
|
|||
|
|
" images = pdf_to_images(pdf_path, 300, page_range)\n",
|
|||
|
|
" \n",
|
|||
|
|
" for i, img in enumerate(images):\n",
|
|||
|
|
" # img = preprocess_for_ocr(img)\n",
|
|||
|
|
" page_num = page_range[i]\n",
|
|||
|
|
" ref = pdf_extract_text(pdf_path, page_num=page_num)\n",
|
|||
|
|
" show_page(img, f\"page: {page_num}\", 1)\n",
|
|||
|
|
" print(f\"ref: \\n{ref}\")\n",
|
|||
|
|
" \n",
|
|||
|
|
" # Convert PIL image to numpy array\n",
|
|||
|
|
" image_array = np.array(img)\n",
|
|||
|
|
" \n",
|
|||
|
|
" # PaddleOCR\n",
|
|||
|
|
" paddle_text = ocr_paddle(img, image_array)\n",
|
|||
|
|
" print(f\"paddle_text: \\n{paddle_text}\")\n",
|
|||
|
|
" results.append({'PDF': pdf_file, 'Page': page_num, 'Model': 'PaddleOCR', 'Prediction': paddle_text, **evaluate_text(ref, paddle_text)})\n",
|
|||
|
|
" "
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"id": "0db6dc74",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"## 5 Save and Analyze Results"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "code",
|
|||
|
|
"execution_count": 12,
|
|||
|
|
"id": "da3155e3",
|
|||
|
|
"metadata": {},
|
|||
|
|
"outputs": [
|
|||
|
|
{
|
|||
|
|
"name": "stdout",
|
|||
|
|
"output_type": "stream",
|
|||
|
|
"text": [
|
|||
|
|
"Benchmark results saved as ai_ocr_benchmark_finetune_results_20251112_095108.csv\n",
|
|||
|
|
" WER CER\n",
|
|||
|
|
"Model \n",
|
|||
|
|
"PaddleOCR 0.109534 0.052167\n"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"data": {
|
|||
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAArwAAAIVCAYAAAAzqSxlAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAQLpJREFUeJzt3Qd4VNX29/EVSqhSpBM6hF6lI1W4omIBC0WRqoJSgigQEIgFAa+CiCAoXuTipQsiAoKAgCCdgIpSlSZIEwk9gWTeZ+3/O+NMMglJSGbCzvfzPOeSc86eyZ6TyfWXPevsHeBwOBwCAAAAWCqDvzsAAAAApCYCLwAAAKxG4AUAAIDVCLwAAACwGoEXAAAAViPwAgAAwGoEXgAAAFiNwAsAAACrEXgBAABgNQIvAPjQjBkzJCAgQHbs2HFHXPdu3bpJzpw5b+s5XnrpJfnXv/6VYn3C7WnQoIEMHjyYy4h0hcALWOyjjz4y4ap+/frxttHzffv2TdTz3bhxQyZOnCh169aVu+66ywQh/VqP6TlvoqOj5bPPPpPmzZvL3XffLVmyZJFSpUpJ9+7dPUKfMwg6t0yZMklQUJAJXCdOnEhU/15//XWP58iQIYMUKVJEHn74YdmyZUuingMp6/Dhw/Lpp5/KsGHDzP6ZM2fMzyYkJCROWz2m58LCwuKc69Kli2TOnFmuXr1q9vV94f6zdt+yZs3qety6des8zmXMmFEKFiwoTz75pOzduzdJr2XZsmXm/Z4jRw7zvnriiSfk119/TfI1OX36tLz66qtSsWJFyZ49u3m+2rVry6hRo+TChQuudvo7E99r1Mcm93dnyJAhMnnyZDl16lSS+w7cqTL5uwMAUs+sWbNMuNy2bZscOnRIypUrl+znunLlirRp00bWr19vAqT+x1QD5YoVK0xQWbRokQkE+h9vp2vXrsnjjz9u2jRt2tSEHg29R44ckfnz58t///tfOXbsmBQrVsz1mDfffFNKly4t169fNyFV/2O+ceNG2bNnj0eQSciUKVNMGI+JiZHjx4/LtGnTzPfX61CzZs1kXwMk3QcffGB+ni1atDD7GjaDg4PNzzS2H374wYQ1/dfbuVq1apmA6KR/PGmYjk1DbWz9+/c3YVX/MPvpp59k6tSpJgzr+6pw4cK3fB3bt2+Xxx57TKpUqSL//ve/5eLFi7J06VJzvHLlyom6Fs7neeihh+Ty5cvSuXNnE3SV/vE3duxY+f777+Xbb791tdffjTFjxsR5nty5c8c5ltjfHX0duXLlMn8Q62OAdMEBwEq///67Q3/FFy1a5ChQoIDj9ddf99pO2/Tp0+eWz/fCCy+Yth9++GGcc5MmTTLnevfu7XFcn1ePv//++3Eec/PmTce7777rOH78uNn/7LPPTNvt27d7tBsyZIg5Pm/evFv2MSwszLQ9e/asx/E9e/aY48OGDXP4W3yvM625fPmy+bdr166OHDlyJOs5oqKiHPnz53cMHz7c43j37t0dGTNmdFy6dMnj+2XKlMnx9NNPO3LmzGneH04nT5401+zll192HUtsv9auXWseu2DBAo/jU6ZMMcffeeedRL2WwYMHOwICAhynTp3yOH79+nVHYv3999+OoKAgR6FChRx79+6Nc16f+6233nLtN2vWzFGlSpVbPm9yfnf69u3rKFmypCMmJibR/QfuZJQ0ABaP7ubNm9eMyurHt7qfXH/88Yf85z//kfvuu89r+UOfPn3MCJ6Otmlb52M+/vhjU7s5YMAAr6Nw+rGu++iuN02aNDH//vbbb8nuv3MET0cP3UVGRpqPz3XkW0cLixcvbmob9bi3so/FixdL1apVTVsd6dOR69j0I+SePXtK0aJFTTsdcXvxxRclKioqzvceOHCgFChQwIyKt2vXTs6ePevRRkfndTRdRyLr1Kkj2bJlk2rVqpl9paPquq+jdzpSuGvXLo/H60imjsSXKVPGtNHr0KNHD/nrr7+8lo
Lox/NPP/20ed80btw43uu5e/du02/9yF1HKuOjo4vnzp2TVq1aeRzX59ZSF/cyk61bt8rNmzfNe0KfU7+Hk3PEN6E+JVVS31f6aYY3+jNOLP190PfH+PHjPUoSnAoVKiTDhw8XX7xG/b08evSox3UGbEbgBSylAVfLCQIDA6VTp05y8OBB83FqcnzzzTcmoGgdZXz0nAYWZwjUx+j+s88+K7dDyx+UhrDEOn/+vAlaWi+qIfD55583ga99+/auNlru8Oijj8p7770njzzyiHz44YfStm1bef/996VDhw5ew5vefNWxY0fzkbZ+bKw1nO7h8eTJk1KvXj2ZO3eueQ6tbdbXr2UgztpTp379+smPP/5oArcG4q+//trrHxNaiqIhVPuoH23//fff5mv9+b788svmY/E33njDhBp9ffq6nFatWiW///67qZfW16d9177pR+r/N7jv6amnnjL9HD16tLlm3uh7SP/w0fIC/RkndEPbpk2bTJDWtu6cwdW9rEFDbfny5U1b/SPIvawhocCrP+fYm5YbpPT7Sn+O+keaXnNv1y4xlixZYv5o0T9AE0t/77y9Ri0xup3X6Cyl8FY+AljJ30PMAFLejh07zEeZq1atMvv6sWWxYsUcISEhySppGDBggGm3a9eueNuEh4ebNgMHDjT7+vHzrR7j7WPZ1atXm5IELXX44osvTDlGlixZXKUPiSlpiL3lyZPHsWLFCo+2n3/+uSNDhgyODRs2eByfOnWqecwPP/zgOqb7gYGBjkOHDrmO/fjjj3FKPLp06WKe01u5gvOjY+frbNWqlcfHyXq99GP+CxcuuI7pR87adtOmTa5jK1euNMeyZcvmOHr0qOv4xx9/bI7rR/hOV69ejdOPOXPmmHbff/99nOvWqVOnOO3dSwc2btzoyJUrl6NNmzaJ+ii/c+fOjnz58nk9V7BgQUfLli1d+61btzalDqp9+/aOp556ynWuTp06juDg4Dj98vaz1k2fK3ZJw/Tp0837Sssj9L1Qrlw5U6Kwbds2R2IsXrzYkT17dvMzcr7Hkypv3ryOGjVqJLq9ljTE9xp79ep12787+p5+8cUXk/VagDsNN60BFtLRP/141HmjkI6y6Yjj//73Pxk3bpzXm3oScunSJfOvzswQH+c55+ia89+EHuNN7I+/9WN97fetSh/cLVy40NyUo1lVP0LWm9h0NFZvBmrUqJFps2DBAqlUqZL5aFlHzJx09FKtXbvW1dbZr7Jly7r2q1evbr6HjqAqHVnVkgcdfdXyg9j0Z+DuhRde8DimHz/r6LJ+zKzP7aQ3RDVs2NC175xxQ/tZokSJOMe1P1pqoHQ00UlHpLVUQKekUuHh4a6PvJ169+4d7zXV66Gv7f777zejxPrJwa3o6Hd8I6j33nuvGYHWEUy9DlreoKPtznPvvPOO+VpHnPVjd2+fFOiovY6Mx5Y/f/44x7SUw52WZHz++efmRrZb0RvKdPRcSxH0devPTt/XWgri1Lp1a9PXDRs2xPs8+juR1N8Hff/rTZexeft9SOrvjv5s3N/7gM0IvIBlNEBoINGwq1NCuQciDbtr1qwxoSUpnP+RdgbfxIRiDYO3eow3Ol2SfrQdEREh06dPN3etJ6VOUumMDO6hRz9C1pkBtIxg586d5piWeOi0VBp8vNFyCHfu4dI9MGiJgdL6Ww00WuObGLGfzxkMnc8XXzvn3flab+ztuPvjtbRDyx30/RD79ej1jU3rjb3RsKy14PoxuM6uEbsWOiHxffyv5QlffvmlCbM63Zj2R4Ou0j80tDxEP5LX97CWxngrZ9A/3GKHvPiMHDnSBHwN/fp99ZrEV5cbm9bV6vtHa9Wd04qNGDHCXHMtcVC//PKLKRlJiP5OJPX3Qeu7E/sak/q7oz+b2H+IAbYi8AKW+e677+TPP/80/0HXzdvob1IDr46EOm+Cim9aLz2nnFM0OW/K+fnnn5M0FZjWwDpHSLWmVoOO1rDu378/2Qsg6O
M08H/11Vem9lFDhI7I6g1fOmrnTexAGd+oeHLrORP7fPG1S8zjdVRS62gHDRpkfgbOqdoeeOABj1pfJ/cRYXcamrT
|
|||
|
|
"text/plain": [
|
|||
|
|
"<Figure size 800x500 with 1 Axes>"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
"metadata": {},
|
|||
|
|
"output_type": "display_data"
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"source": [
|
|||
|
|
"df_results = pd.DataFrame(results)\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Generate a unique filename with timestamp\n",
|
|||
|
|
"timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n",
|
|||
|
|
"filename = f\"ai_ocr_benchmark_finetune_results_{timestamp}.csv\"\n",
|
|||
|
|
"filepath = os.path.join(OUTPUT_FOLDER, filename)\n",
|
|||
|
|
"\n",
|
|||
|
|
"df_results.to_csv(filepath, index=False)\n",
|
|||
|
|
"print(f\"Benchmark results saved as {filename}\")\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Summary by model\n",
|
|||
|
|
"summary = df_results.groupby('Model')[['WER', 'CER']].mean()\n",
|
|||
|
|
"print(summary)\n",
|
|||
|
|
"\n",
|
|||
|
|
"# Plot\n",
|
|||
|
|
"summary.plot(kind='bar', figsize=(8,5), title='AI OCR Benchmark (WER & CER)')\n",
|
|||
|
|
"plt.ylabel('Error Rate')\n",
|
|||
|
|
"plt.show()"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"id": "3e0f00c0",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"### How to read this chart:\n",
|
|||
|
|
"- CER (Character Error Rate) focus on raw transcription quality\n",
|
|||
|
|
"- WER (Word Error Rate) penalizes incorrect tokenization or missing spaces\n",
|
|||
|
|
"- CER and WER are error metrics, which means:\n",
|
|||
|
|
" - Higher values = worse performance\n",
|
|||
|
|
" - Lower values = better accuracy"
|
|||
|
|
]
|
|||
|
|
},
|
|||
|
|
{
|
|||
|
|
"cell_type": "markdown",
|
|||
|
|
"id": "41b427d4",
|
|||
|
|
"metadata": {},
|
|||
|
|
"source": [
|
|||
|
|
"### Compared solutions\n",
|
|||
|
|
"| Model | Type | Components | Key Strengths | Why It Matters |\n",
|
|||
|
|
"| :--------------------- | :--------------------------- | :--------------------------- | :--------------------------------------------------------- | :------------------------------------------------------- |\n",
|
|||
|
|
"| **EasyOCR** | End-to-end (det + rec) | DB + CRNN/Transformer | Lightweight, easy to run, multilingual | Serves as *baseline neuronal* (fast & reproducible). |\n",
|
|||
|
|
"| **PaddleOCR (PP-OCR)** | End-to-end (det + rec + cls) | DB + SRN/CRNN | Strong multilingual support, configurable pipeline | Industrial reference; widely benchmarked. |\n",
|
|||
|
|
"| **DocTR** | End-to-end (det + rec) | DB/LinkNet + CRNN/SAR/VitSTR | Research-oriented, clean API, high-level structured output | Represents the latest *PyTorch*-based academic approach. |\n",
|
|||
|
|
"\n",
|
|||
|
|
"\n",
|
|||
|
|
"These cover the three major open-source paradigms for deep OCR:\n",
|
|||
|
|
"\n",
|
|||
|
|
"EasyOCR: compact CRNN-based recognizer.\n",
|
|||
|
|
"\n",
|
|||
|
|
"PaddleOCR: large industrial model (PP-OCR family).\n",
|
|||
|
|
"\n",
|
|||
|
|
"DocTR: modular research library from Mindee, built for experimentation.\n",
|
|||
|
|
"\n",
|
|||
|
|
"Together they already let you analyse:\n",
|
|||
|
|
"\n",
|
|||
|
|
"accuracy (CER/WER),\n",
|
|||
|
|
"\n",
|
|||
|
|
"inference latency,\n",
|
|||
|
|
"\n",
|
|||
|
|
"model architecture trade-offs."
|
|||
|
|
]
|
|||
|
|
}
|
|||
|
|
],
|
|||
|
|
"metadata": {
|
|||
|
|
"kernelspec": {
|
|||
|
|
"display_name": ".venv (3.11.9)",
|
|||
|
|
"language": "python",
|
|||
|
|
"name": "python3"
|
|||
|
|
},
|
|||
|
|
"language_info": {
|
|||
|
|
"codemirror_mode": {
|
|||
|
|
"name": "ipython",
|
|||
|
|
"version": 3
|
|||
|
|
},
|
|||
|
|
"file_extension": ".py",
|
|||
|
|
"mimetype": "text/x-python",
|
|||
|
|
"name": "python",
|
|||
|
|
"nbconvert_exporter": "python",
|
|||
|
|
"pygments_lexer": "ipython3",
|
|||
|
|
"version": "3.11.9"
|
|||
|
|
}
|
|||
|
|
},
|
|||
|
|
"nbformat": 4,
|
|||
|
|
"nbformat_minor": 5
|
|||
|
|
}
|