Project: datalens
81 entity types

1587 entities found

IntegrationEndpoint / Integrations

DS-STAR agents

DS-STAR agents are located at /home/ops/datalens/agents/ on elin and are used by the DataLens backend for AI tasks via subprocess integration. They include components such as FileAnalyzer, PlannerAgent, VerifierAgent, and RouterAgent, supporting autonomous data extraction and analysis within the platform. The agents run in a virtual environment at /home/ops/datalens/, and the backend API communicates with them for cataloging, extraction, and query processing.
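The subprocess integration described above can be sketched roughly as follows. The JSON-over-stdin/stdout protocol and the `run_agent` helper are assumptions for illustration, not the actual agent interface.

```python
import json
import subprocess

def run_agent(venv_python, agent_script, payload, timeout=120):
    """Invoke a DS-STAR agent script in the platform virtualenv as a
    subprocess, passing the task as JSON on stdin and reading a JSON
    result from stdout (assumed protocol)."""
    proc = subprocess.run(
        [venv_python, agent_script],
        input=json.dumps(payload),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return json.loads(proc.stdout)

# Hypothetical call shape:
# run_agent("/home/ops/datalens/bin/python",
#           "/home/ops/datalens/agents/file_analyzer.py",
#           {"file": "upload.csv"})
```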

ThirdPartyComponent / Architecture

DS-STAR agents (12 agents on elin)

The DS-STAR AI cataloging functionality is provided by 12 agents running on elin as part of the DS-STAR agent suite.

Requirement / Intent

DS-STAR AI cataloging

DS-STAR AI cataloging is integrated via the DS-STAR Agent API, which runs on elin and is proxied on theo. The cataloging system uses Text-to-SQL with Ollama for natural-language query translation and is backed by the 12 DS-STAR agents running on elin.

DesignDecision / Architecture

DS-STAR Architecture

DataLens Development implements the DS-STAR Architecture, a multi-agent design adapted from Google's DS-STAR paper, for autonomous cataloging and extraction.

Capability / Intent

DS-STAR autonomous extraction

The AI Core capability requires DS-STAR autonomous extraction and includes components such as PlannerAgent, VerifierAgent, RouterAgent, and the Orchestrator. The DS-STAR Intelligence Layer realizes the Autonomous Data Extraction capability with local LLMs and iterative quality refinement.

Capability / Intent

DS-STAR Autonomous Extraction Pattern

BusinessRule / Intent

DS-STAR cataloging (12 agents)

Entity

DS-STAR COMPARISON.md

Page / User Interface

DS-STAR extractor

The DataLens Platform uses DS-STAR extractors to transform catalog elements into structured data; the extractors are invoked through the DS-STAR agents' subprocess integration.

PhysicalTable / Data Model

DS-STAR extractors

Utilizes CSV, Excel, and PDF extractors for data ingestion as part of the Day 2 data extraction plan: the CSV extractor handles CSV data, the Excel extractor manages multi-sheet files, and the PDF extractor pulls tables from PDFs.
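As a rough sketch of the validate-and-clean step the CSV extractor performs before loading rows into DuckDB (the exact cleaning rules are an assumption):

```python
import csv
import io

def clean_csv_rows(text):
    """Strip stray whitespace from each cell and drop fully empty rows --
    a minimal stand-in for the CSV extractor's cleaning pass."""
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text))]
    return [row for row in rows if any(row)]
```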

IntegrationEndpoint / Integrations

DS-STAR FileAnalyzer

The DataLens Platform integrates the DS-STAR FileAnalyzer (file_analyzer.py, part of the DS-STAR Intelligence system) for automatic AI cataloging of uploaded files. The integration depends on the file upload feature: the FastAPI Platform Backend invokes the FileAnalyzer immediately after upload so that every file is cataloged on arrival, and DSStarOrchestrator depends on the file analyzer agent for initial data analysis.

Epic / Intent

DS-STAR Intelligence

Includes components such as PlannerAgent, VerifierAgent, and RouterAgent within the DS-STAR system. Together they implement autonomous file analysis, data validation, and query generation on GPU infrastructure and DuckDB, with iterative extraction and AI-driven quality assessment.

BusinessProcess / Intent

DS-STAR loop orchestrator

The DS-STAR Orchestrator comprises PlannerAgent, VerifierAgent, and RouterAgent for autonomous extraction from CSV, Excel, and PDF files, implementing an iterative refinement loop. Within the DS-STAR Intelligence capability, the loop orchestrator coordinates this agent workflow so that extraction quality improves across iterations.
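The iterative plan-verify-route loop can be sketched generically. The callable signatures below are hypothetical, not the actual agent interfaces:

```python
def refinement_loop(planner, executor, verifier, router, task,
                    max_iters=5, target_score=0.9):
    """DS-STAR-style iterative refinement (sketch): the planner proposes
    an extraction plan, the executor runs it, the verifier scores the
    result, and the router decides whether another pass is worthwhile."""
    feedback = None
    best_score, best_result = -1.0, None
    for step in range(max_iters):
        plan = planner(task, feedback)          # may use verifier feedback
        result = executor(plan)
        score, feedback = verifier(task, result)
        if score > best_score:
            best_score, best_result = score, result
        if score >= target_score or not router(score, step):
            break                               # good enough, or router stops
    return best_score, best_result
```

The loop keeps the best-scoring result seen so far, so a bad late iteration cannot discard earlier progress.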

Entity

DS-STAR Orchestrator

The DS-STAR Intelligence system includes the DS-STAR Orchestrator to run the iterative refinement loop.

UseCase / Intent

DS-STAR planning loop

Objective: implement an autonomous cycle that fixes extraction errors through multi-step reasoning, quality assessment, and iterative refinement.

Page / User Interface

DS-STAR queries

DS-STAR queries use reasoning capabilities that leverage the rich metadata and embeddings from the Docling extraction system. The DOCX extractor produces semantic chunks with hierarchy and provenance metadata, and the PPTX extractor provides slide-based chunks with layout and image-count metadata; DS-STAR queries use both for document reasoning and analysis.
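A semantic chunk of this kind might look like the following sketch; the field names are assumptions, not the real schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SemanticChunk:
    """Illustrative shape of a semantic chunk produced by the Docling-based
    extractors; field names here are assumptions."""
    text: str
    source_file: str
    hierarchy: list = field(default_factory=list)   # e.g. ["Report", "2. Results"]
    page: Optional[int] = None                      # provenance: page or slide number
    metadata: dict = field(default_factory=dict)    # slide layout, image counts, ...
```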

Page / User Interface

DS-STAR sql_agent

The DataLens Platform uses the DS-STAR sql_agent to generate SQL queries from natural language questions.
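A minimal sketch of how such an agent could call Ollama's `/api/generate` endpoint to translate a question into SQL. The prompt format, model name, and host are assumptions; only the Ollama endpoint and response shape are standard:

```python
import json
from urllib import request

def build_sql_prompt(question, schema):
    """Compose a Text-to-SQL prompt from the question and table schemas."""
    tables = "\n".join(f"CREATE TABLE {t} ({', '.join(cols)});"
                       for t, cols in schema.items())
    return (f"Given these tables:\n{tables}\n"
            f"Write one SQL query answering: {question}\nSQL:")

def ask_ollama(question, schema, model="llama3",
               host="http://localhost:11434"):
    """POST the prompt to Ollama's /api/generate endpoint (non-streaming)
    and return the generated text."""
    body = json.dumps({"model": model,
                       "prompt": build_sql_prompt(question, schema),
                       "stream": False}).encode()
    req = request.Request(f"{host}/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```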

Requirement / Intent

DSSTAR_AGENTS_DIR environment variable

Requirement / Intent

DSSTAR_VENV_PYTHON environment variable
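These two variables presumably point the backend at the agent code and the virtualenv interpreter. A minimal sketch of resolving them, with fallback paths inferred from the /home/ops/datalens/ layout described elsewhere in this catalog (the exact defaults are assumptions):

```python
import os
from pathlib import Path

def dsstar_paths(env=None):
    """Resolve the DS-STAR agent directory and virtualenv interpreter from
    the environment; the fallback paths are assumptions."""
    env = os.environ if env is None else env
    agents_dir = Path(env.get("DSSTAR_AGENTS_DIR",
                              "/home/ops/datalens/agents"))
    venv_python = Path(env.get("DSSTAR_VENV_PYTHON",
                               "/home/ops/datalens/bin/python"))
    return agents_dir, venv_python
```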

BusinessProcess / Intent

DSStarService

DSStarService is the HTTP client through which DataLens Development integrates with the DS-STAR Agent API on elin for AI cataloging and extraction. It depends on BatchProcessor to orchestrate Agent API workflows, integrates with DiscoveryService, and its file cataloging workflow requires project context during summary generation to produce contextually relevant AI summaries.
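A skeletal version of such a client might look like this; the endpoint path, payload shape, and port are illustrative assumptions, not the real API:

```python
import json
from urllib import request

class DSStarService:
    """Minimal HTTP-client sketch for the DS-STAR Agent API on elin.
    Endpoint paths and payload fields are assumptions."""

    def __init__(self, base_url="http://elin:8000"):
        self.base_url = base_url.rstrip("/")

    def _request(self, path, payload):
        """Build a JSON POST request for the given API path."""
        body = json.dumps(payload).encode()
        return request.Request(self.base_url + path, data=body,
                               headers={"Content-Type": "application/json"})

    def catalog_file(self, file_path, project_context=None):
        """Ask the Agent API to catalog one file (hypothetical endpoint)."""
        req = self._request("/catalog", {"file": file_path,
                                         "project_context": project_context})
        with request.urlopen(req) as resp:
            return json.loads(resp.read())
```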

Integration / Integrations

DSStarService HTTP client

DesignDecision / Architecture

Dual database architecture

Requirement / Intent

Dual-write then cutover

ExternalSystem / Integrations

DuckDB

DuckDB is used by DataLens for local structured-data storage and querying of extracted analysis files; it is central to analysis and is used by SQLAgent, which executes generated SQL queries against it. The platform is migrating to PostgreSQL to eliminate DuckDB's write lock and improve concurrency. DuckDB stores tables such as sales, surveys, and test data, supporting the platform's in-memory querying and analysis. Qdrant indexes embeddings generated from text chunks stored in DuckDB, enabling semantic search. The OpenClaw Skill API queries DuckDB, which contains 473 extracted budget tables, and DuckDB hosts the tables extracted for Project 14 from SVGV files. The CSV extractor loads validated and cleaned CSV files into DuckDB, the Excel extractor loads normalized Excel data, and the PDF extractor loads tables extracted from PDFs, making DuckDB the unified store for extracted data. The system uses the LangChain framework alongside DuckDB for data-pipeline management, with RAPIDS cuDF optionally used to accelerate large dataframes.

Server / Operations

DuckDB (per-project analytical data)

DuckDB stores structured extracted data and semantic text chunks for each project, currently holding over 700 entities with detailed tabular and textual information from document processing. Isolating projects in separate .duckdb files is a design decision implemented within DataLens Development.

Server / Operations

DuckDB database system

The Qdrant vector index depends on DuckDB for text-chunk storage and as the source of embedding data in the DataLens platform.

Requirement / Intent

DuckDB extraction

Employs DuckDB for data extraction and schema auto-discovery, though initially marked as a technical requirement not yet fully met.

PhysicalTable / Data Model

DuckDB file /app/storage/project_4.duckdb

A DuckDB file at /app/storage/project_4.duckdb holds the extracted data for Project 4, including structured tables and metadata used during processing and analysis. Each project's data lives in a dedicated DuckDB file (e.g., project_4.duckdb) managed by DuckDBService, which uses the file as physical storage for per-project data. The DuckDB engine is mapped to the physical analytics.db database file on elin, and DuckDB file storage is represented as Project Storage for the database files.
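The per-project file mapping implied above can be sketched as a one-line helper; the `project_db_path` name is hypothetical, though the directory and naming pattern come from the catalog entry:

```python
from pathlib import Path

STORAGE_DIR = Path("/app/storage")  # storage root named in the catalog entry

def project_db_path(project_id: int) -> Path:
    """Map a project id to its dedicated DuckDB file, e.g. project_4.duckdb,
    giving each project an isolated database."""
    return STORAGE_DIR / f"project_{project_id}.duckdb"
```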

Integration / Integrations

DuckDB integration

The FastAPI backend integrates with DuckDB for data extraction storage and querying.

Capability / Intent

DuckDB schema auto-discovery

The Text-to-SQL capability in the DataLens Project requires DuckDB schema auto-discovery.
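Schema auto-discovery could work by reading DuckDB's `information_schema.columns` view and summarizing it for the Text-to-SQL prompt. The query below uses a real DuckDB view; the `format_schema` helper and output format are assumptions:

```python
# Columns view DuckDB (and other SQL engines) expose for introspection.
SCHEMA_SQL = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
ORDER BY table_name, ordinal_position
"""

def format_schema(rows):
    """Turn (table, column, type) rows into compact 'table(col TYPE, ...)'
    lines suitable for inclusion in a Text-to-SQL prompt."""
    tables = {}
    for table, col, typ in rows:
        tables.setdefault(table, []).append(f"{col} {typ}")
    return "\n".join(f"{t}({', '.join(cols)})" for t, cols in tables.items())
```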