Project: datalens
81 entity types
Matrix/All Domains

1587 entities found

Capability / Intent

Excel Worker AI Agents

DataLens implements production-ready autonomous agents approximating the functionality of Excel Worker AI Agents such as LangGraph and LangChain. Those agents are prototype implementations following the ReAct pattern; DataLens offers a production-ready equivalent.

Entity

Execute + Visualize

After SQL execution, results are visualized as part of the pipeline.

Entity

execute_query

Runs SQL on DuckDB within project storage, returns results.

DesignDecision / Architecture

EXECUTION_FLOW.md

Describes backend execution flow, highlighting Qdrant lazy initialization to improve startup times and request handling.

DesignDecision / Architecture

Executor

Stakeholder / Intent

Exerun

The Admin User belongs to the organization Exerun.

Requirement / Intent

explore-schema

DataLens Agent Mode implements the explore-schema skill to profile project data schemas.

BusinessProcess / Intent

ExploreSchemaSkill

ExploreSchemaSkill discovers and profiles the project data schema, producing a SkillResult when executed. AgentWarmingService uses ExploreSchemaSkill to assemble warm context by computing schema profiles.

BusinessProcess / Intent

ExportService

ExportService uses DuckDBService to export query results, converting SkillResult data into CSV, Excel, or JSON formats.
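The format-conversion step could look roughly like this pandas-based sketch; the function name, argument shape, and format keys are assumptions, since the actual ExportService API is not shown in this catalog.

```python
import io
import json

import pandas as pd


def export_rows(rows: list, fmt: str) -> bytes:
    """Convert tabular result rows (e.g. from a SkillResult) into
    CSV, Excel, or JSON bytes suitable for download."""
    df = pd.DataFrame(rows)
    if fmt == "csv":
        return df.to_csv(index=False).encode("utf-8")
    if fmt == "json":
        return json.dumps(rows).encode("utf-8")
    if fmt == "excel":
        buf = io.BytesIO()
        df.to_excel(buf, index=False)  # needs an Excel writer, e.g. openpyxl
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")
```

Returning bytes for every format lets a single download endpoint serve all three without format-specific branches.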

Requirement / Intent

extract progress

Requirement / Intent

Extract progress endpoint

Initially missing or broken, now fixed to show accurate extraction progress.

DataEntity / Data Model

extract worker

Background workers include the extract worker as a component. The batch processor orchestrator relies on the extract worker to perform data extraction from files.

ThirdPartyComponent / Architecture

extract-msg

The MSG extractor uses the extract-msg third-party component.

BatchJob / Integrations

extract_file_job

A batch job that extracts data from files such as DOCX, PPTX, PDF, and Excel and triggers GPU extraction workflows; no specific recurrence or failure consequences are documented. The RQ extraction queue on Redis triggers extract_file_job(file_id), which the RQ worker executes with a timeout for large files; results feed into file summaries. DOCX and PPTX extraction runs via SSH to elin using Docling, producing JSON results.
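How such a job might be queued on RQ, as a sketch: the queue name, timeout value, and placeholder job body are assumptions; only the extract_file_job(file_id) shape comes from the catalog.

```python
def extract_file_job(file_id: int) -> dict:
    """Placeholder job body: the real job dispatches DOCX/PPTX/PDF/Excel
    extraction (e.g. Docling over SSH for DOCX/PPTX) and produces JSON."""
    return {"file_id": file_id, "status": "extracted"}


def enqueue_extraction(file_id: int):
    """Enqueue the job on the Redis-backed RQ extraction queue with a
    generous timeout so large files are not killed mid-extraction."""
    from redis import Redis  # imported lazily: only needed where Redis runs
    from rq import Queue

    queue = Queue("extraction", connection=Redis())
    return queue.enqueue(extract_file_job, file_id, job_timeout="30m")
```

Because RQ serializes the function reference by name, the worker must import the job function under the exact same module path, which is why the function-name mismatch noted under extract_file_task broke the pipeline.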

AgentCommand / Agentic Discipline

extract_file_task

Agent command for file extraction; queued jobs are now correctly processed after fixing the function name mismatch, enabling full pipeline operation.

PhysicalTable / Data Model

extracted_tables

Each project schema, such as project_14.*, contains multiple tables whose metadata is registered in the extracted_tables registry table in the public schema. PgDataService manages this registry in PostgreSQL, updating and querying it to track table metadata per project schema.
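The registry could look roughly like the sketch below. Column names are illustrative guesses, not the actual schema; `?` placeholders are generic DB-API style, while PostgreSQL drivers typically use `%s`.

```python
REGISTRY_DDL = """
CREATE TABLE IF NOT EXISTS extracted_tables (
    project_id  INTEGER NOT NULL,
    table_name  TEXT    NOT NULL,
    row_count   INTEGER,
    source_file TEXT,
    PRIMARY KEY (project_id, table_name)
)
"""


def register_table(conn, project_id: int, table_name: str,
                   row_count: int, source_file: str) -> None:
    """Upsert one table's metadata into the registry, keyed by
    (project_id, table_name), so re-extraction updates in place."""
    conn.execute(
        """
        INSERT INTO extracted_tables
            (project_id, table_name, row_count, source_file)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (project_id, table_name)
        DO UPDATE SET row_count = excluded.row_count,
                      source_file = excluded.source_file
        """,
        (project_id, table_name, row_count, source_file),
    )
```

Keying on (project_id, table_name) means re-running extraction for a file refreshes the registry row rather than duplicating it.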

PhysicalTable / Data Model

extracted_tables registry

PhysicalTable / Data Model

extracted_text_chunks

The extracted_text_chunks table stores textual data chunks from all projects centrally, as a single unified table in PostgreSQL's public schema. PgDataService manages inserting into and querying this table for textual data storage and retrieval.

PhysicalTable / Data Model

extracted_text_chunks table

DataEntity / Data Model

ExtractedTable model

Integration / Integrations

Extraction API Endpoint

The Extraction API processes SVGV files to extract data tables and write them to DuckDB. The Backend API exposes the Extraction API endpoints, which RQ worker processes call to trigger extraction of individual SVGV files.

BusinessProcess / Intent

Extraction coordinator (/backend/app/services/extraction_coordinator.py)

The Extraction coordinator is part of the workflow that triggers AI Summary Generation after file extraction completes. BatchProcessor orchestrates the full pipeline and uses ExtractionCoordinator to coordinate data extraction tasks, which FilePrioritizer prioritizes. ExtractionCoordinator uses PrepareDataSkill to process data after extraction across CPU and GPU services, and depends on the DuckDB service for managing extracted text chunks and related data.

Entity

extraction log

Requirement / Intent

Extraction pipeline

The DataLens Platform uses an extraction pipeline involving DS-STAR extractors and DuckDB for data processing; the Backend implements this business process, including DS-STAR integration. The pipeline converts CSV, Excel, and PDF files into DuckDB-usable data, using pandas for data manipulation and for loading extracted data. It depends on the RQ queue and RQ Worker to process batch extraction jobs asynchronously from the extraction queue. The pipeline previously wrote extracted data into DuckDB, causing write locks during extraction; it was modified to write into PostgreSQL, enabling concurrent query operation. Extracted file data and catalog information are stored in the PostgreSQL database with language support.
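The DuckDB-to-PostgreSQL change described above can be sketched with pandas and SQLAlchemy. The function shape and per-project schema naming are assumptions based on the project_14.* convention mentioned elsewhere in this catalog.

```python
import pandas as pd


def load_extracted_table(df: pd.DataFrame, table_name: str,
                         engine, schema=None) -> int:
    """Write one extracted table into the database. Unlike the old
    DuckDB file sink, a PostgreSQL target does not block concurrent
    readers during the write. `schema` would be the per-project
    schema, e.g. 'project_14'."""
    df.to_sql(table_name, engine, schema=schema,
              if_exists="replace", index=False)
    return len(df)
```

In production the engine would point at PostgreSQL (e.g. `create_engine("postgresql+psycopg2://...")`); for local testing an in-memory SQLite engine works with `schema=None`.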

AcceptanceCriteria / Intent

extraction quality metrics

Must show over 90% accuracy in extracting tables, maintaining document structure, and semantic chunking, validated through tests.

DesignDecision / Architecture

Extraction Queue Fix

The extraction queue fix requires the RQ Worker to be running and active; with the fix in place, the worker processes all 132 extraction jobs successfully.

AgentCommand / Agentic Discipline

extraction queueing code

PhysicalTable / Data Model

extraction worker

Extraction Worker currently writes extracted data to DuckDB files. Extraction Worker will write extracted data to PostgreSQL schemas instead of DuckDB.

UserStory / Intent

ExtractionParams

Entity

ExtractionResult