debug set and locking
Some checks failed
build_docker / essential (pull_request) Successful in 1s
build_docker / build_cpu (linux/amd64) (pull_request) Successful in 4m7s
build_docker / build_gpu (linux/amd64) (pull_request) Successful in 17m38s
build_docker / build_cpu (linux/arm64) (pull_request) Successful in 22m10s
build_docker / build_easyocr (linux/amd64) (pull_request) Successful in 15m39s
build_docker / build_gpu (linux/arm64) (pull_request) Successful in 20m12s
build_docker / manifest_cpu (pull_request) Has been cancelled
build_docker / manifest_gpu (pull_request) Has been cancelled
build_docker / manifest_doctr (pull_request) Has been cancelled
build_docker / build_easyocr_gpu (linux/amd64) (pull_request) Has been cancelled
build_docker / build_easyocr_gpu (linux/arm64) (pull_request) Has been cancelled
build_docker / build_doctr_gpu (linux/amd64) (pull_request) Has been cancelled
build_docker / build_doctr_gpu (linux/arm64) (pull_request) Has been cancelled
build_docker / manifest_easyocr_gpu (pull_request) Has been cancelled
build_docker / manifest_doctr_gpu (pull_request) Has been cancelled
build_docker / build_doctr (linux/amd64) (pull_request) Has been cancelled
build_docker / build_doctr (linux/arm64) (pull_request) Has been cancelled
build_docker / manifest_easyocr (pull_request) Has been cancelled
build_docker / build_easyocr (linux/arm64) (pull_request) Has been cancelled
@@ -42,4 +42,33 @@ class ImageTextDataset:
         with open(txt_path, "r", encoding="utf-8") as f:
             text = f.read()
 
-        return image, text
+        return image, text
+
+    def get_output_path(self, idx, output_subdir, debugset_root="/app/debugset"):
+        """Get output path for saving OCR result to debugset folder.
+
+        Args:
+            idx: Sample index
+            output_subdir: Subdirectory name (e.g., 'paddle_text', 'doctr_text')
+            debugset_root: Root folder for debug output (default: /app/debugset)
+
+        Returns:
+            Path like /app/debugset/doc1/{output_subdir}/page_001.txt
+        """
+        img_path, _ = self.samples[idx]
+        # img_path: /app/dataset/doc1/img/page_001.png
+        # Extract relative path: doc1/img/page_001.png
+        parts = img_path.split("/dataset/", 1)
+        if len(parts) == 2:
+            rel_path = parts[1]  # doc1/img/page_001.png
+        else:
+            rel_path = os.path.basename(img_path)
+
+        # Replace /img/ with /{output_subdir}/
+        rel_parts = rel_path.rsplit("/img/", 1)
+        doc_folder = rel_parts[0]  # doc1
+        fname = os.path.splitext(rel_parts[1])[0] + ".txt"  # page_001.txt
+
+        out_dir = os.path.join(debugset_root, doc_folder, output_subdir)
+        os.makedirs(out_dir, exist_ok=True)
+        return os.path.join(out_dir, fname)
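Review note on get_output_path: when img_path contains no "/dataset/" segment, the fallback sets rel_path to a bare file name, so rsplit("/img/", 1) returns a single element and rel_parts[1] raises IndexError. The sketch below is one possible way to make the mapping total. It is not part of this commit; debug_output_relpath is a hypothetical helper name, and returning a relative path without creating directories is an assumption made to keep the example side-effect free.

```python
import posixpath


def debug_output_relpath(img_path: str, output_subdir: str) -> str:
    """Map .../dataset/<doc>/img/<page>.<ext> to <doc>/<output_subdir>/<page>.txt.

    Hypothetical helper: it only computes a relative path and never
    touches the filesystem.
    """
    # Keep the part after the first "/dataset/" when it is present,
    # otherwise fall back to the bare file name.
    _, sep, tail = img_path.partition("/dataset/")
    rel_path = tail if sep else posixpath.basename(img_path)

    # Split at the last "/img/". A path without that component gets an
    # empty document folder instead of raising IndexError.
    doc_folder, sep, img_name = rel_path.rpartition("/img/")
    if not sep:
        doc_folder, img_name = "", posixpath.basename(rel_path)

    stem = posixpath.splitext(img_name)[0]
    return posixpath.join(doc_folder, output_subdir, stem + ".txt")
```

The caller would still join the result onto debugset_root and create the parent directory before writing, as the method in the diff does.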
@@ -14,6 +14,7 @@ services:
       - "8003:8000"
     volumes:
       - ../dataset:/app/dataset:ro
+      - ../debugset:/app/debugset:rw
       - doctr-cache:/root/.cache/doctr
     environment:
       - PYTHONUNBUFFERED=1
@@ -35,6 +36,7 @@ services:
       - "8003:8000"
     volumes:
       - ../dataset:/app/dataset:ro
+      - ../debugset:/app/debugset:rw
       - doctr-cache:/root/.cache/doctr
     environment:
       - PYTHONUNBUFFERED=1
@@ -169,6 +169,7 @@ class EvaluateRequest(BaseModel):
     # Page range
     start_page: int = Field(5, ge=0, description="Start page index (inclusive)")
     end_page: int = Field(10, ge=1, description="End page index (exclusive)")
+    save_output: bool = Field(False, description="Save OCR predictions to debugset folder")
 
 
 class EvaluateResponse(BaseModel):
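Review note on the page range fields: start_page is inclusive and end_page is exclusive, so the defaults 5 and 10 select page indices 5 through 9. The field constraints shown do not by themselves guarantee start_page < end_page or that either value fits the dataset. A minimal sketch of the half-open semantics follows; page_indices and its clamping behaviour are assumptions for illustration, not code from this repository.

```python
def page_indices(start_page: int, end_page: int, n_samples: int) -> range:
    """Return the half-open range [start_page, end_page), clamped to the dataset size.

    Hypothetical helper: an inverted or out-of-bounds request yields an
    empty range rather than an error.
    """
    start = max(0, min(start_page, n_samples))
    end = max(start, min(end_page, n_samples))
    return range(start, end)
```

Whether an inverted range should be silently empty or rejected with a validation error is a design choice the diff does not settle.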
@@ -302,6 +303,12 @@ def evaluate(request: EvaluateRequest):
         )
         time_per_page_list.append(float(time.time() - tp0))
 
+        # Save prediction to debugset if requested
+        if request.save_output:
+            out_path = state.dataset.get_output_path(idx, "doctr_text")
+            with open(out_path, "w", encoding="utf-8") as f:
+                f.write(pred)
+
         m = evaluate_text(ref, pred)
         cer_list.append(m["CER"])
         wer_list.append(m["WER"])
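Review note on the save step: the prediction is written after the per-page time has been appended, so the file I/O is not counted in time_per_page_list. The sketch below isolates that ordering in a standalone function. run_page, ocr_fn, and the optional out_path parameter are hypothetical names for illustration; the real endpoint uses request.save_output and state.dataset.get_output_path as shown in the hunk.

```python
import os
import time


def run_page(ocr_fn, image, out_path=None):
    """Run OCR on one page, time only the OCR call, then optionally save the text."""
    t0 = time.time()
    pred = ocr_fn(image)
    elapsed = float(time.time() - t0)  # timing stops before any file I/O

    if out_path is not None:
        parent = os.path.dirname(out_path)
        if parent:  # a bare file name has no parent directory to create
            os.makedirs(parent, exist_ok=True)
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(pred)

    return pred, elapsed
```

A possible call, with a stand-in OCR function, would be run_page(lambda img: "recognized text", image, out_path="/tmp/debugset/doc1/doctr_text/page_001.txt").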