BatchJobIntegrations
Batch extraction
A large-scale, GPU-first extraction process targeting 214 files, expected to complete in about 15-20 minutes. It uses Docling for document parsing and Ollama for embeddings. The batch processor orchestrator manages the background workers for the catalog, extract, vectorize, and prioritize jobs, and it uses the file prioritizer component to assign tiers and determine processing order. Extraction jobs run through the batch extraction subprocess, which is part of the backend batch processing strategy of using subprocess calls instead of RQ.
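
The flow described above (assign tiers, order the files, then run each extraction as a subprocess rather than an RQ job) can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual code: the names `assign_tier`, `prioritize`, `run_extraction`, and `run_batch`, the suffix-based tier rule, and the shape of the extraction command are all assumptions made for the example.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Hypothetical tier rule (assumption): lower tier number = processed earlier.
TIER_BY_SUFFIX = {".pdf": 1, ".docx": 2}
DEFAULT_TIER = 3


def assign_tier(path: Path) -> int:
    """Return the processing tier for a file, based on its suffix."""
    return TIER_BY_SUFFIX.get(path.suffix.lower(), DEFAULT_TIER)


def prioritize(paths) -> list[Path]:
    """Order files by (tier, name) so the order is deterministic within a tier."""
    return sorted((Path(p) for p in paths), key=lambda p: (assign_tier(p), p.name))


def run_extraction(path: Path, command: list[str], timeout: float = 600.0) -> bool:
    """Run one extraction job in a child process; True if it exits with code 0.

    `command` is a placeholder for whatever launches the real extractor;
    the file path is appended as the last argument.
    """
    try:
        result = subprocess.run(
            command + [str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat a job that exceeds the timeout as failed.
        return False
    return result.returncode == 0


def run_batch(paths, command: list[str]) -> dict[str, bool]:
    """Process files in priority order and record success per file."""
    results = {}
    for path in prioritize(paths):
        results[str(path)] = run_extraction(path, command)
    return results


if __name__ == "__main__":
    # Stand-in command that always succeeds; a real setup would call the extractor.
    demo_command = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(run_batch(["notes.txt", "report.pdf", "memo.docx"], demo_command))
```

Running each job in its own child process keeps a crash or hang in one extraction from taking down the orchestrator, which is one plausible reason to prefer subprocess calls over a queue worker here.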