debug set and locking
Some checks failed
build_docker / essential (pull_request) Successful in 1s
build_docker / build_cpu (linux/amd64) (pull_request) Successful in 4m7s
build_docker / build_gpu (linux/amd64) (pull_request) Successful in 17m38s
build_docker / build_cpu (linux/arm64) (pull_request) Successful in 22m10s
build_docker / build_easyocr (linux/amd64) (pull_request) Successful in 15m39s
build_docker / build_gpu (linux/arm64) (pull_request) Successful in 20m12s
build_docker / manifest_cpu (pull_request) Has been cancelled
build_docker / manifest_gpu (pull_request) Has been cancelled
build_docker / manifest_doctr (pull_request) Has been cancelled
build_docker / build_easyocr_gpu (linux/amd64) (pull_request) Has been cancelled
build_docker / build_easyocr_gpu (linux/arm64) (pull_request) Has been cancelled
build_docker / build_doctr_gpu (linux/amd64) (pull_request) Has been cancelled
build_docker / build_doctr_gpu (linux/arm64) (pull_request) Has been cancelled
build_docker / manifest_easyocr_gpu (pull_request) Has been cancelled
build_docker / manifest_doctr_gpu (pull_request) Has been cancelled
build_docker / build_doctr (linux/amd64) (pull_request) Has been cancelled
build_docker / build_doctr (linux/arm64) (pull_request) Has been cancelled
build_docker / manifest_easyocr (pull_request) Has been cancelled
build_docker / build_easyocr (linux/arm64) (pull_request) Has been cancelled

2026-01-18 18:03:23 +01:00
parent 9a1bf407ca
commit 68efb27a1e
19 changed files with 754 additions and 1290 deletions


@@ -42,4 +42,33 @@ class ImageTextDataset:
        with open(txt_path, "r", encoding="utf-8") as f:
            text = f.read()
        return image, text
    def get_output_path(self, idx, output_subdir, debugset_root="/app/debugset"):
        """Get output path for saving OCR result to debugset folder.

        Args:
            idx: Sample index
            output_subdir: Subdirectory name (e.g., 'paddle_text', 'doctr_text')
            debugset_root: Root folder for debug output (default: /app/debugset)

        Returns:
            Path like /app/debugset/doc1/{output_subdir}/page_001.txt
        """
        img_path, _ = self.samples[idx]
        # img_path: /app/dataset/doc1/img/page_001.png
        # Extract relative path: doc1/img/page_001.png
        parts = img_path.split("/dataset/", 1)
        if len(parts) == 2:
            rel_path = parts[1]  # doc1/img/page_001.png
        else:
            rel_path = os.path.basename(img_path)
        # Replace /img/ with /{output_subdir}/
        rel_parts = rel_path.rsplit("/img/", 1)
        if len(rel_parts) == 2:
            doc_folder, img_name = rel_parts  # doc1, page_001.png
        else:
            # No /img/ component (e.g. after the basename fallback above):
            # avoid an IndexError and write directly under output_subdir.
            doc_folder, img_name = "", os.path.basename(rel_path)
        fname = os.path.splitext(img_name)[0] + ".txt"  # page_001.txt
        out_dir = os.path.join(debugset_root, doc_folder, output_subdir)
        os.makedirs(out_dir, exist_ok=True)
        return os.path.join(out_dir, fname)
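
The path mapping described in the docstring can be checked in isolation with a small sketch like the one below. It is not part of the commit: `map_to_debugset_path` is a hypothetical helper name, it deliberately skips the `os.makedirs` call so it has no side effects, and it uses `posixpath` so the result is the same on any OS. Returning a path directly under `output_subdir` when no `/img/` component is present is an assumption about the intended fallback.

```python
import posixpath


def map_to_debugset_path(img_path, output_subdir, debugset_root="/app/debugset"):
    """Hypothetical, side-effect-free version of the mapping (no directories created)."""
    # Keep only the part after the first /dataset/, e.g. doc1/img/page_001.png
    _, sep, rel_path = img_path.partition("/dataset/")
    if not sep:
        rel_path = posixpath.basename(img_path)
    # Split off the document folder at the last /img/ component
    doc_folder, sep, img_name = rel_path.rpartition("/img/")
    if not sep:
        # Assumed fallback: no document folder, file goes under output_subdir
        doc_folder, img_name = "", posixpath.basename(rel_path)
    stem = posixpath.splitext(img_name)[0]
    return posixpath.join(debugset_root, doc_folder, output_subdir, stem + ".txt")


print(map_to_debugset_path("/app/dataset/doc1/img/page_001.png", "paddle_text"))
print(map_to_debugset_path("/tmp/page_001.png", "doctr_text"))
```

The first call follows the example in the docstring; the second exercises the case where the image path contains neither `/dataset/` nor `/img/`.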