All Domains
1587 entities found
RESULTS_VISUALIZATION_TEST_REPORT.md
Test report for visualization components, verifying functionality and performance.
Rich metadata
Rich metadata, including hierarchy, confidence, and provenance, is enforced by the Docling extraction system as a business rule; DS-STAR reasoning uses this metadata for advanced AI cataloging and analysis.
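The enforced metadata shape can be sketched as a small validation model. The field names and rules below are assumptions; the source only states that hierarchy, confidence, and provenance must be present:

```python
from dataclasses import dataclass

@dataclass
class ChunkMetadata:
    """Hypothetical shape of the rich metadata on each extracted chunk."""
    hierarchy: list       # e.g. ["report.docx", "Section 2", "Table 1"]
    confidence: float     # extraction confidence in [0, 1]
    provenance: str       # originating file or page reference

    def validate(self):
        """Enforce the business rule: all three fields must be populated."""
        if not self.hierarchy:
            raise ValueError("hierarchy must not be empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if not self.provenance:
            raise ValueError("provenance must not be empty")

meta = ChunkMetadata(["report.docx", "Overview"], 0.92, "report.docx#page=1")
meta.validate()  # passes silently when every field is populated
```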
RingfencedSkills
RingfencedSkills replace raw SQL skills with constrained operations used by SkillExecutor when executing agent skills for the DataLens agent. ElinSkillClient integrates with RingfencedSkills to perform ringfenced skill executions on elin.
Rollback Plan
Router
Router agent manages fixes and extensions to the extraction plan.
Router Architecture
DataLens Development applies the Router Architecture for modular endpoint routing and domain separation.
RouterAgent
RouterAgent is a component of the DS-STAR Intelligence layer and the DS-STAR pipeline. It manages the iteration loop in the extraction plan, deciding between fixing and extending steps (decision logic such as FIX, ADD, or PROCEED), and uses outputs from the Verifier Agent to decide extraction plan adjustments. The DS-STAR Orchestrator uses RouterAgent for decision logic in the extraction process.
Row-Level Security
Vanna 2.0 enforces row-level security by filtering queries based on user permissions.
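A minimal sketch of this per-user filtering, assuming a simple column-based permission model. The user names, columns, and values are hypothetical; the source only says queries are filtered per user permissions:

```python
# Hypothetical per-user policy: which column values each user may see.
USER_PERMISSIONS = {
    "alice": {"region": ("EMEA",)},
    "bob": {"region": ("EMEA", "APAC")},
}

def apply_row_level_security(sql: str, user: str) -> str:
    """Wrap a query so the user only sees rows their permissions allow."""
    perms = USER_PERMISSIONS.get(user)
    if not perms:
        raise PermissionError(f"no row-level policy defined for {user!r}")
    clauses = [
        "{col} IN ({vals})".format(
            col=col, vals=", ".join(f"'{v}'" for v in values)
        )
        for col, values in perms.items()
    ]
    return f"SELECT * FROM ({sql}) AS q WHERE " + " AND ".join(clauses)

print(apply_row_level_security("SELECT * FROM sales", "alice"))
# SELECT * FROM (SELECT * FROM sales) AS q WHERE region IN ('EMEA')
```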
RQ
RQ is a background job management tool used for tasks such as schema profiling and session warming. The Batch Upload process uses the RQ job queue for reliable background job execution, and the queue manages execution of the extract_file_job() function with a timeout for large files. AI Summary Generation runs asynchronously on the RQ job queue to avoid blocking HTTP responses during file list retrieval. Vectorize Progress Tracking queries the RQ job queue and chunk counts to report accurate vectorization progress percentages for asynchronous embedding jobs. The RQ worker consumes jobs from the queue to generate AI summaries and process embeddings asynchronously, and background workers use RQ job chaining to coordinate sequential processing tasks.
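Enqueueing the extraction job with a long timeout can be sketched with rq's public API. The queue name 'extraction' and the function name extract_file_job come from the source; the module path, connection details, and the "30m" value are assumptions:

```python
def enqueue_extraction(file_id, timeout="30m"):
    """Queue extract_file_job on the 'extraction' queue with a per-job
    timeout so large files are not killed mid-extraction."""
    # Imported lazily so this sketch can be read without a live Redis server.
    from redis import Redis
    from rq import Queue

    queue = Queue("extraction", connection=Redis(host="localhost", port=6379))
    # job_timeout is rq's per-job execution limit; "tasks" module is assumed.
    return queue.enqueue("tasks.extract_file_job", file_id, job_timeout=timeout)
```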
rq
RQ depends on Redis as the message broker for job queueing.
RQ job queue
The AI Summary Generation feature uses the RQ job queue for asynchronous summary generation after extraction completes. The queue is managed by backend/app/api/files.py and depends on the Redis service for asynchronous job scheduling and processing.
RQ job queue for async summary generation
RQ Queue
Redis provides the backend queue for the RQ worker in the extraction pipeline. The RQ worker listens to the RQ queue to process extraction jobs.
RQ Queue with Redis backend
Redis backs the RQ queue that manages background jobs in DataLens, including file extraction, summaries, and vectorization. The extraction pipeline depends on the queue for batch extraction job management, and the RQ Worker consumes extraction jobs from it; the extraction queue triggers the extract_file_job(file_id) function to process file extraction jobs. The Backend container and the Docker-compose configuration depend on the RQ extraction queue on Redis, and a misconfiguration of the queue previously caused job processing issues. The Batch Processing Strategy uses the RQ job queue for job management, reliability, and orchestration of batch extraction.
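Orchestration of the extraction, summary, and vectorization stages can be sketched with rq's depends_on parameter, so each stage starts only after the previous job succeeds. Only extract_file_job is named in the source; the other task names are hypothetical:

```python
def enqueue_pipeline(file_id):
    """Chain extraction -> summary -> vectorization so each stage runs
    only after the previous job finishes successfully."""
    # Lazy imports: the sketch does not require a live Redis to read.
    from redis import Redis
    from rq import Queue

    q = Queue("extraction", connection=Redis())
    extract = q.enqueue("tasks.extract_file_job", file_id)
    summary = q.enqueue("tasks.generate_summary_job", file_id,
                        depends_on=extract)
    vectorize = q.enqueue("tasks.vectorize_job", file_id, depends_on=summary)
    return extract, summary, vectorize
```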
RQ serialization
A serialization challenge with RQ was addressed by replacing RQ jobs with subprocess calls for GPU extraction.
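A sketch of the subprocess-based replacement, assuming a standalone extraction script; the script name and timeout value are hypothetical:

```python
import subprocess
import sys

def extract_on_gpu(file_id):
    """Run GPU extraction in a fresh subprocess instead of an RQ job,
    avoiding pickling of non-serializable GPU handles across the queue."""
    subprocess.run(
        [sys.executable, "gpu_extract.py", str(file_id)],  # script name assumed
        check=True,      # raise CalledProcessError if extraction fails
        timeout=1800,    # assumed upper bound for large documents
    )
```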
RQ worker
DataLens depends on the RQ worker to process queued extraction jobs for the 132 SVGV dataset files, with Redis serving as the queue backend. The Backend API interacts with the worker to manage extraction job queues and status, and the worker executes the extract_file_job function to process file extractions. The worker container runs the RQ worker instance and sits idle waiting for extraction jobs after a reset. The FastAPI backend also uses the RQ worker for asynchronous jobs such as embeddings and summary generation.
RQ Worker
The worker process is correctly configured and processing extraction jobs; initial queueing issues were fixed with the proper function name, and the worker handled the 132 SVGV files for re-extraction before going idle on completion. RQ Worker extraction processing depends on Redis RQ job queuing and coordinates extraction tasks through Backend API endpoints; the SVGV Full Reset process relies on it to handle extraction jobs after files and schema are reset, and the extraction queue fix requires the worker to be running and active so that all 132 jobs complete successfully. The Data Discovery system uses the RQ Worker to process background extraction and consolidation jobs asynchronously, and the Backend depends on it for asynchronous tasks such as extraction and AI summary generation. The RQ Worker uses Docling as the exclusive extraction method for DOCX and PPTX files and fails if any Docling extraction error occurs, prohibiting fallback extraction methods. The worker listens to the RQ queue, is hosted in the Worker container, and calls the Extraction API endpoints for each SVGV file.
RQ worker for async job processing
The RQ worker for async job processing consumes jobs from the RQ job queue to generate AI summaries and process embeddings asynchronously. It is part of the backend infrastructure of the DataLens platform, runs on the theo server, and its deployment and availability depend on the Coolify deployment platform configuration. The Backend configures the worker to listen on the 'extraction' queue, and the Extraction Pipeline depends on it to process the extraction queue for files asynchronously.
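Starting a worker bound to the 'extraction' queue can be sketched with rq's Worker class (the `rq worker extraction` CLI is equivalent); the connection details are assumptions:

```python
def run_extraction_worker():
    """Start an RQ worker listening on the 'extraction' queue.
    Blocks, processing jobs until the process is stopped."""
    # Lazy imports so the sketch does not require a live Redis to read.
    from redis import Redis
    from rq import Queue, Worker

    conn = Redis(host="localhost", port=6379)  # assumed connection details
    Worker([Queue("extraction", connection=conn)], connection=conn).work()
```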
RQ Worker service
The RQ Worker service depends on the service definition in docker-compose.coolify.yml for backend container deployment and job processing.
RQ worker service on backend server theo
A background worker, currently in development, that handles file extraction and processing tasks.
RQ workers
The DataLens Platform plans to use RQ background jobs for asynchronous cataloging and extraction in future iterations. RQ workers depend on Redis for job queue management and consume extraction jobs from the RQ queue; the extract.py worker runs as part of the RQ workers to process extraction jobs asynchronously. The workers depend on the Docling extraction system for DOCX/PPTX extraction jobs with no fallback tolerance for failures, and Backend processing relies on them to execute background extraction and embedding jobs.
RTX 4000 SFF Ada
DataLens performs GPU-accelerated workloads on elin using the RTX 4000 SFF Ada 20GB GPU. The GPU-first document extraction system uses this GPU for fast document extraction, embeddings generation, and vectorization, and Docling-based extraction leverages it for DOCX and PPTX file processing. Ollama runs on the same GPU to provide embedding services for batch processing of document chunks. Because the hardware is shared, the DataLens agent is constrained to a maximum GPU memory utilization of 0.5, and GPU resource management policies require monitoring of the shared GPU for extraction and embedding tasks.
RunSqlTool
A tool used within the agent to execute SQL queries on DuckDB for structured data retrieval.
Safety net cleanup
Safety net cleanup complements the SQL extraction regex fix by stripping explanation-text markers after extraction, ensuring pure SQL before execution. It was deployed on theo.
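A sketch of what such a cleanup step might look like. The exact markers stripped here are assumptions, since the source only mentions explanation-text markers:

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this block well-formed

def clean_sql(raw: str) -> str:
    """Safety-net cleanup: strip markdown code fences and leading
    explanation text so only pure SQL reaches execution."""
    sql = raw.strip()
    # Remove surrounding markdown fences (with an optional sql language tag).
    sql = re.sub(rf"^{FENCE}(?:sql)?\s*|\s*{FENCE}$", "", sql,
                 flags=re.IGNORECASE)
    # Drop leading narration such as "Here is the query:" (assumed marker).
    sql = re.sub(r"^(?:here(?:'s| is).*?:)\s*", "", sql,
                 flags=re.IGNORECASE | re.DOTALL)
    return sql.strip()
```

The cleanup runs after the regex extraction step, as a last line of defence before the query is executed.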
sales table
The sales table is stored in DuckDB (analytics.db).
sample_sales table
The sample_sales table is stored in DuckDB (analytics.db) as a physical table.
sample_sales.csv
Scaling Considerations
Scandinavian budget data schema
Schema API endpoints
API endpoints for schema detection and mapping are planned as part of ongoing development improvements, enabling optional, AI-assisted schema assignment.