All Domains
1587 entities found
docker-compose.coolify.yml
docker-compose.coolify.yml is part of the Docker Compose configuration used in deployment. The RQ Worker service depends on the service definition in docker-compose.coolify.yml for backend container deployment and job processing. The file defines the postgres, redis, backend, worker, and frontend Docker services, along with the postgres_data, redis_data, and backend_storage volumes.
docker-compose.yml
Docker Deployment contains the docker-compose.yml file for full-stack orchestration. The Dockerfile is used together with docker-compose.yml for multi-container deployment. docker-compose.yml defines the postgres, redis, backend, worker, ironclaw, and frontend Docker services, along with the postgres_data, redis_data, and backend_storage volumes.
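Based on the service and volume names listed in the two compose entries above, the stack layout might be sketched as follows. The images, build contexts, commands, and mount paths are illustrative assumptions, not the actual configuration:

```yaml
# Hypothetical sketch of the described service/volume layout.
services:
  postgres:
    image: postgres:16
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:7
    volumes:
      - redis_data:/data
  backend:
    build: ./backend
    depends_on: [postgres, redis]
    volumes:
      - backend_storage:/app/storage
  worker:                # RQ Worker for background jobs
    build: ./backend
    command: rq worker
    depends_on: [redis, backend]
  ironclaw:              # present in docker-compose.yml only
    build: ./ironclaw
  frontend:
    build: ./frontend
    depends_on: [backend]

volumes:
  postgres_data:
  redis_data:
  backend_storage:
```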
docker-essentials
Docker essentials are used within the DataLens platform backend for containerization. The DataLens platform backend includes and integrates the docker-essentials skill.
DOCKER.md
Acceptance document with unconditional delivery; contents unspecified and no update provided.
Dockerfile
The FastAPI backend includes a production-ready Dockerfile for deployment, and the SvelteKit frontend includes its own multi-stage, production-ready Dockerfile for production builds. DataLens Agent Mode integrates IronClaw running as a sidecar Docker service for deployment. The Dockerfile is used together with docker-compose.yml for multi-container deployment.
Docling
GPU-first document extraction uses Docling for DOCX and PPTX files on the elin GPU via SSH. It provides semantic extraction and rich metadata, replacing fallback methods. Extraction is constrained by CUDA 12.8 and integrated as a mandatory component, ensuring high-quality parsing without fallback. The installation includes PyTorch, transformers, and OCR support, with extraction quality verified via tests. The DOCX extractor uses Docling on the elin GPU as a mandatory tool to perform semantic chunking with embedded JSON tables and rich metadata. The PPTX extractor likewise uses Docling on the elin GPU, mandatorily, for slide-based semantic chunking with embedded tables, images, and optional speaker notes. Docling extraction for DOCX and PPTX files requires the GPU hardware on the elin server to accelerate the extraction and embedding processes. The RQ Worker uses Docling as the exclusive extraction method for DOCX and PPTX files and fails hard on any Docling extraction error, prohibiting fallback extraction methods. The Docling extractor depends on the Docling library as the primary extraction tool with no fallbacks, and is deployed as part of the backend extraction pipeline, which requires Docling as a mandatory dependency for high-quality DOCX/PPTX extraction.
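The entry above describes semantic chunks that carry tables embedded as JSON plus rich metadata (hierarchy, confidence, provenance). A minimal sketch of that chunk shape might look like the following; the field names and the `build_semantic_chunk` helper are illustrative assumptions, not the actual DataLens schema:

```python
import json

def build_semantic_chunk(section_title, text, table_rows, hierarchy, confidence):
    """Assemble one semantic chunk in the shape described above: the table
    is serialized as JSON and embedded inline in the chunk text, and rich
    metadata (hierarchy, confidence, provenance) travels with the chunk."""
    table_json = json.dumps(table_rows)
    return {
        "text": f"{section_title}\n{text}\n[TABLE]{table_json}[/TABLE]",
        "metadata": {
            "hierarchy": hierarchy,      # e.g. document > section > subsection
            "confidence": confidence,    # extractor-reported quality score
            "provenance": {"extractor": "docling", "source_file": "example.docx"},
        },
    }

chunk = build_semantic_chunk(
    section_title="2.1 Results",
    text="Quarterly figures are summarized below.",
    table_rows=[{"quarter": "Q1", "revenue": 100}],
    hierarchy=["Report", "2 Findings", "2.1 Results"],
    confidence=0.97,
)
```

Embedding the table inside the chunk text keeps table content retrievable through the same vector search path as prose, which is the stated rationale for avoiding separate DuckDB tables per document table.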
Docling extraction on elin GPU
Docling extraction on the elin GPU is the core GPU-backed process used by the Docling extraction system for document processing. Docling extraction runs mandatorily on the elin GPU server for DOCX and PPTX extraction with no fallback. The Install_docling_elin.sh script installs Docling 2.75.0 and its dependencies on the elin GPU server for this mandatory extraction. Docling-based extraction leverages elin's RTX 4000 SFF Ada 20GB GPU for DOCX and PPTX file processing.
Docling extraction quality
Docling extraction quality is validated by the implemented Docling extraction system in the GPU-first document extraction pipeline.
Docling extraction system
The Docling extraction system was built as part of the DataLens Phase 2 GPU-first document extraction system. Docling extraction on the elin GPU is the core process the system uses for GPU-based document processing, and the GPU-first pipeline includes it as the mandatory method for DOCX and PPTX extraction; extraction quality is validated within this pipeline. Several business rules are enforced: semantic chunking for section- and slide-based document processing, tables embedded as JSON within semantic chunks, and rich metadata including hierarchy, confidence, and provenance to support DS-STAR reasoning. The system uses Ollama embeddings (nomic-embed-text) to generate vector embeddings for semantic search and reasoning. The DocxExtractor and PptxExtractor handle GPU extraction for DOCX and PPTX documents respectively, and EmbeddingService produces GPU-accelerated embeddings for semantic chunk vectors. The RQ worker depends on the Docling extraction system to process DOCX/PPTX extraction jobs, failing jobs on extraction errors rather than falling back. The DuckDB text_chunks physical table stores the semantic chunks produced by the system for querying and analysis. The Deploy_gpu_extractors.sh script installs and configures the Docling extraction system and related GPU-first extraction components.
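The entry above says chunk vectors are produced via Ollama's nomic-embed-text model. As a sketch, a request to Ollama's embeddings endpoint can be built as below; the endpoint path and payload shape follow Ollama's documented REST API, while the `embedding_request` helper name is an assumption (the actual EmbeddingService implementation is not shown in this catalog):

```python
# Default Ollama endpoint for the embeddings API.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"

def embedding_request(chunk_text: str) -> dict:
    """Build the JSON body for an Ollama embeddings call using the
    nomic-embed-text model named in the entry above. POSTing this body
    to OLLAMA_EMBED_URL returns a JSON object with an "embedding" vector."""
    return {"model": "nomic-embed-text", "prompt": chunk_text}

payload = embedding_request("Quarterly revenue grew 12% year over year.")
```

The returned vector would then be stored alongside the chunk (e.g. in the DuckDB text_chunks table) for semantic search.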
Docling MANDATORY
An enforced GPU-first extraction policy requiring Docling for DOCX and PPTX files, with no fallback to pure-Python tools. Extraction fails hard on errors, ensuring consistently high-quality processing per the recent implementation.
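The fail-hard, no-fallback rule described above can be sketched as a thin wrapper around the extraction call. The `DoclingExtractionError` class and `docling_extract` callable here are hypothetical stand-ins for the real worker code, which this catalog does not show:

```python
class DoclingExtractionError(RuntimeError):
    """Raised when Docling fails; no fallback extractor is attempted."""

def extract_or_fail(path: str, docling_extract):
    """Enforce the fail-hard rule: run the Docling-backed extractor and
    surface any error as a job failure instead of silently falling back
    to python-docx/python-pptx."""
    if not path.lower().endswith((".docx", ".pptx")):
        raise ValueError(f"Docling-only pipeline got unsupported file: {path}")
    try:
        return docling_extract(path)
    except Exception as exc:
        # Fail hard: the RQ job is marked failed rather than degraded.
        raise DoclingExtractionError(f"Docling extraction failed for {path}") from exc

result = extract_or_fail("report.docx", lambda p: {"chunks": 3})
```

The design choice is that a loud failure is preferable to a silent quality regression from a weaker fallback extractor.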
docling>=2.0.0
Dependency pin requiring Docling version 2.0.0 or later for GPU-only DOCX/PPTX extraction, used in the GPU-first extraction pipeline with no fallback.
docs/architecture/CONSOLIDATION_EXAMPLES.md
Provides concrete examples of schema relationships and consolidation processes, aiding comprehension of multi-table queries.
docs/architecture/MULTI_STAGE_TEXT2SQL.md
Contains architecture design for multi-stage SQL generation including improvements and strategies for handling large schemas.
docs/architecture/QUICK_WINS_IMPLEMENTATION.md
Contains implementation plans for rapid deployment of improvements like schema reduction, keyword expansion, and retrial logic.
docs/design/agent-mode-implementation-plan.md
Design document for agent mode implementation, no additional details provided.
docs/design/agent-mode-ironclaw.md
Design document for IronClaw-powered agent mode, no details provided.
docs/DISCOVERY_IMPLEMENTATION.md file
The DISCOVERY_IMPLEMENTATION.md document describes the technical approach and architecture for the data discovery module, including semantic table matching and the guided discovery process.
docs/DISCOVERY_WORKFLOW.md file
Workflow documentation describing the data discovery, consolidation, and analysis process.
docs/INTELLIGENT_CONSOLIDATION.md
Describes the architecture for table relationship discovery, schema relation graphs, and multi-table query optimization.
docs/MIGRATION_DUCKDB_TO_POSTGRESQL.md
Migration guide detailing the switch from DuckDB to PostgreSQL backend, including code modifications, rollback plan, and verification steps, completed on 2026-03-02.
Document ingestion pipeline
Document RAG includes a document ingestion pipeline to prepare documents for retrieval.
Document RAG
Document RAG integrates LlamaIndex and Qdrant with bge-large-en-v1.5 embeddings for semantic document chunking, embedding, and retrieval in the DataLens platform, and uses the RAGAgent for unstructured text search and answer synthesis.
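At its core, the retrieval step described above ranks stored chunk vectors by similarity to a query vector; in the actual pipeline Qdrant performs this server-side over bge-large-en-v1.5 embeddings. A minimal sketch of the ranking logic, with toy two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    """Rank (vector, text) pairs by cosine similarity to the query and
    return the k best texts -- what Qdrant does at scale server-side."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [text for _, text in scored[:k]]

hits = top_k([1.0, 0.0], [([1.0, 0.1], "revenue section"),
                          ([0.0, 1.0], "appendix"),
                          ([0.9, 0.2], "quarterly results")], k=2)
```

The retrieved chunks are then handed to the RAGAgent for answer synthesis.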
Documentation
Complete documentation exists with guides detailing the system architecture and workflows.
DocumentConverter
DOCX extractor
The batch upload pipeline depends on new extractors including the DOCX extractor. The DOCX extractor uses the python-docx third-party component. The DOCX extractor optionally uses Docling as an extraction method for better text quality and semantic structure. The DOCX extractor falls back to python-docx for faster extraction of simple documents. The DOCX extractor implements semantic chunking with section-based chunk boundaries to preserve document structure. The DOCX extractor handles tables by embedding them as JSON within text chunks instead of separate DuckDB tables. The DOCX extractor relies on background workers for asynchronous processing. The DOCX extractor capability is validated by the test_extractors test case.
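The section-based chunk boundaries described above can be sketched by splitting a flat paragraph stream on heading paragraphs. The `(style, text)` pair input and the "Heading" style-name prefix are assumptions modelled on python-docx conventions, not the extractor's actual interface:

```python
def chunk_by_headings(paragraphs):
    """Split (style, text) paragraph pairs into section-based chunks,
    starting a new chunk at each heading to preserve document structure."""
    chunks, current = [], None
    for style, text in paragraphs:
        if style.startswith("Heading"):
            if current:
                chunks.append(current)
            current = {"heading": text, "body": []}
        else:
            if current is None:          # body text before any heading
                current = {"heading": "", "body": []}
            current["body"].append(text)
    if current:
        chunks.append(current)
    return chunks

sections = chunk_by_headings([
    ("Heading 1", "Introduction"),
    ("Normal", "Scope and goals."),
    ("Heading 1", "Methods"),
    ("Normal", "Data sources."),
])
```

Chunking on headings rather than fixed token windows keeps each chunk semantically coherent, which is the stated goal of the extractor's semantic chunking.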
DOCX files
Phase 2 file types include DOCX files. The Phase 2 Strategy Research & Decision Point considers 100 DOCX files for processing in Phase 2 as a medium-priority task. Opus 4.6 recommends parsing all DOCX files if the policy questions are real, and otherwise selectively processing the top 20, valuing DOCX as high ROI for 3-4 hours of effort. Opus 4.6 states DOCX files provide better ROI than PDFs because DOCX files contain cleaner text with narrative context, while PDFs are noisier and require more variable effort.
DOCXExtractor
Refactored for semantic, section-based chunking with optional Docling GPU-accelerated extraction on elin. Uses heading hierarchy to define boundaries, embeds tables as JSON, improves text quality, and integrates with batch processing. The DOCX Extractor is deployed as part of the Backend and produces text chunks and table parsing outputs.
DS-STAR
DataLens uses DS-STAR as part of its integrated architecture for AI cataloging and extraction. DS-STAR reasoning uses rich metadata such as hierarchy and provenance produced by the Docling extraction system for advanced AI cataloging and analysis, and DS-STAR queries leverage that metadata together with the system's embeddings. The DataLens platform backend includes the DS-STAR pipeline and uses it for various processing tasks; the pipeline includes the FileAnalyzer, PlannerAgent, VerifierAgent, and RouterAgent components. The Platform Backend uses DS-STAR subprocess calls to implement cataloging, extraction, and SQL generation features. DS-STAR Intelligence is encompassed within the broader DS-STAR System epic. DataLens incorporates the DS-STAR autonomous extraction pattern as a built-in integration for the SVGV project data processing.
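The subprocess-call integration mentioned above can be sketched as command construction plus invocation. The module name `dsstar`, the task names, and the `--input` flag are hypothetical placeholders; the catalog does not document the real DS-STAR command-line interface:

```python
import shlex

def dsstar_command(task: str, input_path: str) -> list:
    """Build the argv for a DS-STAR subprocess call of the kind the
    Platform Backend entry describes. Task names and flags here are
    illustrative assumptions, not the real DS-STAR CLI."""
    allowed = {"catalog", "extract", "text2sql"}
    if task not in allowed:
        raise ValueError(f"unknown DS-STAR task: {task}")
    # In the backend this argv would be passed to subprocess.run(...).
    return ["python", "-m", "dsstar", task, "--input", input_path]

cmd = dsstar_command("catalog", "data/survey.duckdb")
printable = shlex.join(cmd)  # shell-safe string form for logging
```

Passing an argv list (rather than a shell string) to `subprocess.run` avoids shell-injection risks when file paths come from user uploads.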
DS-STAR Agent API
DS-STAR AI cataloging is integrated via the DS-STAR Agent API running on elin and proxied on theo. The Backend integrates with the DS-STAR API for AI cataloging functions and with DS-STAR AI for autonomous extraction, Text-to-SQL, and RAG. DataLens Development integrates with the DS-STAR Agent API running on elin for AI cataloging and autonomous extraction. The DS-STAR Agent API deployment includes 12 DS-STAR agents providing AI cataloging functionality.
DS-STAR Agent API integration
Integrates DS-STAR's autonomous, iterative data cataloging agents (Planner, Verifier, Router, Orchestrator) to enable quality-verified, local LLM-powered extraction. This enhances automation, data quality, and privacy, replacing manual or cloud API solutions, aligned with platform goals of self-hosting and comprehensive data management.