backend/app/extractors/docx_extractor.py
DocxExtractor handles DOCX files in the GPU-first document extraction pipeline. It calls Docling on the elin GPU over SSH and produces semantic chunks with rich metadata: heading hierarchy, provenance, and tables embedded as JSON. The module was refactored for Phase 2 GPU-first processing, in which Docling on elin is a mandatory tool. The module summary also mentions a python-docx fallback for simple files; the RQ worker that calls the extractor, however, fails hard if extraction fails, enforcing the no-fallback policy. The resulting chunks support DS-STAR queries for document reasoning.
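The chunk shape described above (heading hierarchy, provenance, tables embedded as JSON, hard failure when nothing is extracted) might look roughly like the sketch below. This is not the code in docx_extractor.py and it does not call Docling: `Chunk`, `ExtractionError`, `build_chunks`, and the tuple format of `elements` are all hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, field


class ExtractionError(RuntimeError):
    """Hypothetical error: raised instead of falling back to another parser."""


@dataclass
class Chunk:
    text: str
    heading_path: list[str]  # heading hierarchy, e.g. ["Report", "Results"]
    source: str              # provenance: the originating file
    tables_json: list[str] = field(default_factory=list)  # tables as JSON strings


def build_chunks(elements, source):
    """Group a flat element stream into one chunk per heading section.

    `elements` is an assumed format, a list of tuples:
      ("heading", level, text), ("paragraph", text), ("table", rows)
    """
    chunks = []
    path = []       # current heading hierarchy
    current = None  # chunk currently being filled

    for element in elements:
        kind = element[0]
        if kind == "heading":
            _, level, text = element
            # Keep ancestors above this level, then append the new heading.
            path = path[: level - 1] + [text]
            current = None  # the next content element starts a new chunk
            continue

        if current is None:
            current = Chunk(text="", heading_path=list(path), source=source)
            chunks.append(current)

        if kind == "paragraph":
            current.text = (current.text + "\n" + element[1]).strip()
        elif kind == "table":
            # Embed the table in the chunk as a JSON string.
            current.tables_json.append(json.dumps(element[1]))

    if not chunks:
        # Fail hard: no fallback extractor is tried.
        raise ExtractionError(f"no content extracted from {source}")
    return chunks
```

A caller such as the RQ worker would, under this sketch, let `ExtractionError` propagate so the job is marked failed rather than retried with a different parser.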