Docling
Docling is the GPU-first extraction library for DOCX and PPTX files. It runs on the elin GPU, reached via SSH, and provides semantic extraction with rich metadata, replacing the earlier fallback methods. Extraction is constrained by CUDA 12.8. The installation includes PyTorch, transformers, and OCR support, and extraction quality is verified by tests.

The DOCX extractor uses Docling to perform semantic chunking with embedded JSON tables and rich metadata. The PPTX extractor uses Docling for slide-based semantic chunking with embedded tables, images, and optional speaker notes. Both extractors require the GPU hardware on the elin server, which accelerates the extraction and embedding processes.

The Docling extractor depends on the Docling library as its primary extraction tool and is deployed as part of the Backend extraction pipeline, which requires Docling as a mandatory dependency for high-quality DOCX/PPTX extraction. The RQ Worker uses Docling as the exclusive extraction method for these files: if any Docling extraction error occurs, the job fails, and no fallback extraction method is attempted.
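The no-fallback rule can be illustrated with a minimal sketch. It is not the pipeline's actual code: the names extract_strict, ExtractionError, and DOCLING_SUFFIXES are hypothetical, and the Docling call itself is passed in as a plain callable so the example stays self-contained.

```python
from pathlib import Path
from typing import Callable


class ExtractionError(RuntimeError):
    """Hypothetical error type: Docling extraction failed, so the job must fail."""


# Hypothetical constant: the file types routed to Docling in this sketch.
DOCLING_SUFFIXES = {".docx", ".pptx"}


def extract_strict(path: str, docling_extract: Callable[[str], dict]) -> dict:
    """Run the supplied Docling extraction callable with no fallback.

    Any exception raised by the callable is re-raised as ExtractionError,
    with the original exception kept as the cause. No alternative
    extractor is tried.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in DOCLING_SUFFIXES:
        raise ValueError(f"unsupported file type for Docling extraction: {suffix!r}")
    try:
        return docling_extract(path)
    except Exception as exc:
        # Fail loudly instead of silently switching to another method.
        raise ExtractionError(f"Docling extraction failed for {path}") from exc
```

In a worker, the callable passed as docling_extract would presumably wrap the real Docling conversion (for example, a function built around DocumentConverter from docling.document_converter). Letting ExtractionError propagate out of the job function is what marks the job as failed.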