backend/app/workers/extract.py
The extraction worker (extract.py) runs as one of the RQ workers and processes extraction jobs asynchronously. For DOCX and PPTX files it calls the Docling-based DOCX and PPTX extractors, which are GPU-first. Docling is mandatory for these formats. If the Docling extractor fails, the job fails hard and does not fall back to any other extractor. Extracted data is written through pg_data_service.py rather than DuckDBService. After a successful DOCX/PPTX extraction, the worker chains to the batch vectorize job, which passes the extraction results to the embedding service for batch vectorization on the GPU.
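The dispatch, no-fallback, and chaining behavior can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in extract.py. ExtractionError, extract_docx, extract_pptx, run_extraction_job, and the write_rows and enqueue_vectorize callables are all hypothetical stand-ins for the real Docling extractors, the pg_data_service.py write, and the enqueueing of the batch vectorize job. GPU execution is not modeled, and the handling of unsupported file types is an assumption.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stand-ins: the real extractor functions, data-service calls,
# and job names in extract.py are not given in the source.
Result = Dict[str, object]


class ExtractionError(RuntimeError):
    """Raised when the Docling-based extractor fails; no fallback is attempted."""


def extract_docx(path: Path) -> Result:
    # Placeholder for the Docling-based DOCX extractor.
    return {"source": path.name, "kind": "docx", "chunks": []}


def extract_pptx(path: Path) -> Result:
    # Placeholder for the Docling-based PPTX extractor.
    return {"source": path.name, "kind": "pptx", "chunks": []}


EXTRACTORS: Dict[str, Callable[[Path], Result]] = {
    ".docx": extract_docx,
    ".pptx": extract_pptx,
}


def run_extraction_job(
    path: Path,
    write_rows: Callable[[Result], None],        # stand-in for the pg_data_service.py write
    enqueue_vectorize: Callable[[str], None],    # stand-in for chaining the batch vectorize job
    extractors: Dict[str, Callable[[Path], Result]] = EXTRACTORS,
) -> Result:
    extractor = extractors.get(path.suffix.lower())
    if extractor is None:
        # Assumption: the source does not say how other file types are handled.
        raise ValueError(f"unsupported file type: {path.suffix}")
    try:
        result = extractor(path)
    except Exception as exc:
        # No-fallback policy: fail the job instead of trying another extractor.
        raise ExtractionError(f"Docling extraction failed for {path.name}") from exc
    write_rows(result)                   # persist only after extraction succeeded
    enqueue_vectorize(str(result["source"]))  # then chain to GPU batch vectorization
    return result


if __name__ == "__main__":
    written: List[Result] = []
    queued: List[str] = []
    run_extraction_job(Path("report.docx"), written.append, queued.append)
    print(len(written), queued)  # 1 ['report.docx']
```

Passing the write and enqueue steps in as callables keeps the sketch runnable without Redis or PostgreSQL. In a real RQ worker the second callable would presumably enqueue the vectorize job on a queue, and because a failed extraction raises before either callable runs, nothing is written or vectorized for that file.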