Project: datalens
81 entity types

1587 entities found

ThirdPartyComponentArchitecture

DuckDB SQL

Within DataLens, the unified question interface routes structured queries to the DuckDB SQL backend: the question router classifies incoming questions and dispatches structured ones to DuckDB, while SQLCoder-7B generates valid DuckDB SQL for execution. The text chunks table in DuckDB is a physical table with full-text search support, mapped to the DuckDB SQL data store and queryable via DuckDB SQL.
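
The routing step can be sketched as a small classifier plus dispatcher. This is a hypothetical sketch: the real QuestionRouter may use an LLM classifier, and the keyword list, function names, and return labels here are assumptions.

```python
# Hypothetical sketch of question routing; the real QuestionRouter in
# DataLens may classify questions with an LLM rather than keywords.

STRUCTURED_HINTS = {"total", "count", "average", "budget"}

def classify_question(question: str) -> str:
    """Return 'structured' for queries answerable via DuckDB SQL,
    'semantic' for free-text questions answered via vector search."""
    q = question.lower()
    if any(hint in q for hint in STRUCTURED_HINTS):
        return "structured"
    return "semantic"

def route(question: str) -> str:
    """Dispatch a question to the appropriate backend."""
    if classify_question(question) == "structured":
        # In DataLens, SQLCoder-7B would generate DuckDB SQL here.
        return "duckdb_sql"
    return "vector_search"
```
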

IntegrationEndpointIntegrations

DuckDB tables

Phase 2 Strategy depends on DuckDB tables loaded by processing Excel files. The DuckDB text_chunks physical table stores semantic chunks produced by the Docling extraction system for querying and analysis. The 132 SVGV files map to DuckDB tables used to store extracted budget data, and the Backend API queries the DuckDB tables produced from the SVGV extraction for budget data.

RequirementIntent

DuckDB Text-to-SQL strategies

UserStoryIntent

DuckDB-NSQL-7B

The DuckDB-NSQL-7B model is planned as a future upgrade for faster and more efficient SQL query generation; currently, SQLCoder-7B handles SQL generation.

CodingGuidelineGuidelines

DUCKDB_TO_PG_TYPES mapping

The DUCKDB_TO_PG_TYPES mapping guides PgDataService in converting DuckDB data types to PostgreSQL types when loading data.
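
Such a mapping can be sketched as a simple dict plus a lookup helper. The specific entries and the fallback below are assumptions for illustration, not the actual contents of DUCKDB_TO_PG_TYPES.

```python
# Hypothetical subset of a DuckDB -> PostgreSQL type mapping; the real
# DUCKDB_TO_PG_TYPES in DataLens may contain different entries.
DUCKDB_TO_PG_TYPES = {
    "VARCHAR": "TEXT",
    "BIGINT": "BIGINT",
    "DOUBLE": "DOUBLE PRECISION",
    "BOOLEAN": "BOOLEAN",
    "TIMESTAMP": "TIMESTAMP",
    "BLOB": "BYTEA",
}

def to_pg_type(duckdb_type: str) -> str:
    """Convert a DuckDB column type to its PostgreSQL equivalent,
    falling back to TEXT for unmapped types (assumed fallback)."""
    return DUCKDB_TO_PG_TYPES.get(duckdb_type.upper(), "TEXT")
```
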

PageUser Interface

DuckDBService

DuckDBService manages data storage and querying using DuckDB as the database. It is defined in backend/app/services/duckdb_service.py and exposes an execute_query method. QuestionRouter uses DuckDBService to execute SQL queries and fetch results as part of structured query execution, ExportService uses it to export query results in various formats, and it provides the database connections and query execution needed by QueryDataSkill. DuckDBService uses per-project DuckDB files as physical storage (e.g., project_4.duckdb), stored via Project Storage, and manages the Project 14 DuckDB file for extracted data operations. PgDataService is a separate, complementary service managing PostgreSQL connections; the two services handle SQL database queries for PostgreSQL and DuckDB respectively. The Backend currently uses DuckDBService for managing extracted data storage and querying, with PgDataService replacing it as the data service backend for written extraction data.
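
A minimal sketch of the execute_query pattern follows. To stay self-contained it uses the stdlib sqlite3 module as a stand-in for the duckdb package; the real DuckDBService opens per-project .duckdb files instead, and the class shape and method signature here are assumptions.

```python
import sqlite3  # stand-in for the duckdb package in this sketch

class DuckDBServiceSketch:
    """Hypothetical shape of DuckDBService: one connection per project file."""

    def __init__(self, db_path: str = ":memory:"):
        # Real service would call e.g. duckdb.connect("project_4.duckdb")
        self.conn = sqlite3.connect(db_path)

    def execute_query(self, sql: str, params: tuple = ()) -> list:
        """Run a SQL query and return all result rows."""
        cur = self.conn.execute(sql, params)
        return cur.fetchall()

svc = DuckDBServiceSketch()
svc.execute_query("CREATE TABLE budget (year INTEGER, amount REAL)")
svc.execute_query("INSERT INTO budget VALUES (2024, 1000.0)")
rows = svc.execute_query("SELECT amount FROM budget WHERE year = ?", (2024,))
```
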

CapabilityIntent

DuckDBService class

The DuckDBService class is defined in backend/app/services/duckdb_service.py and includes the execute_query method. It uses the project_4.duckdb file to store and query project-specific data. PgDataService, provided by the new pg_data_service.py file, supplies PostgreSQL data management functionality replacing DuckDBService for storage and querying of extracted data; it replaced DuckDBService in extract.py and question_router.py to switch the backend from DuckDB to PostgreSQL. The DATA_BACKEND feature flag toggles between PostgreSQL (PgDataService) and DuckDBService as the data backend.
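
The feature-flag toggle can be sketched as a factory keyed on the DATA_BACKEND environment variable. The service stubs and the flag values ("postgres"/"duckdb") below are assumptions; only the flag name comes from the source.

```python
import os

# Hypothetical stand-ins for the two real service classes.
class DuckDBServiceStub:
    backend = "duckdb"

class PgDataServiceStub:
    backend = "postgres"

def get_data_service():
    """Pick the data backend from the DATA_BACKEND env var
    (assumed values: 'postgres' or 'duckdb', defaulting to DuckDB)."""
    if os.environ.get("DATA_BACKEND", "duckdb") == "postgres":
        return PgDataServiceStub()
    return DuckDBServiceStub()

os.environ["DATA_BACKEND"] = "postgres"
service = get_data_service()
```
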

DesignDecisionArchitecture

E2E over features

Emphasizes delivering end-to-end (E2E) functionality over additional feature development to ensure core system robustness.

TestStrategyTesting

E2E test suite

The E2E test suite comprises 15 Playwright tests validating DataLens features on the SVGV dataset, covering core functionality, UX, performance, and data validation. It ensures the Data Discovery system's robustness, stability, and correctness; the tests are run through the script `tests/test-discovery.sh`. It is part of a comprehensive validation framework, now completed and ready for deployment.

RequirementIntent

E2E workflow test

DataLens Development realizes a complete end-to-end flow that covers file upload, data extraction, SQL query execution, and result export.

ThirdPartyComponentArchitecture

E2E_DISCOVERY_TESTS.md

A complete guide for running and understanding the Playwright E2E tests that validate the full DataDiscovery pipeline, from question input to consolidation, analysis, and UI responsiveness using the SVGV dataset.

CapabilityIntent

Efficiency Analyzer

Efficiency Analyzer is a DS-STAR agent focused on operational and efficiency metrics analysis. The Agent Selector directs queries to the Efficiency Analyzer agent.

ServerOperations

elin

DataLens utilizes the shared GPU box named elin for agentic data analysis workloads, running Docling on an RTX 4000 Ada 20GB GPU for document extraction, with SSH and network setup for the Ollama and Qdrant APIs. The platform depends on elin for GPU inference, document extraction, and DS-STAR agents, supporting large document processing with limited tested capacity for large PDFs. elin hosts the Ollama LLM server, including the SQLCoder-7B and Arctic-Text2SQL-R1-7B models accessible by theo. In the hybrid deployment architecture, the backend and all Python and AI workloads run on elin.

ServerOperations

elin (GPU processing)

The Extraction Pipeline (GPU-First) utilizes the elin GPU processing server which hosts the RTX 4000 GPU, runs Docling for extraction, Ollama for embeddings, and CUDA 12.8. The DataLens Platform uses the GPU box (elin) which hosts Ollama, Qdrant, and DS-STAR agents for AI capabilities. The theo backend integrates with the elin GPU server via SSH to orchestrate Docling extraction and embedding jobs on the GPU hardware running on elin. Docling extraction for DOCX and PPTX files requires the GPU hardware on the elin server to accelerate the extraction and embedding processes. The Backend API integrates with the Ollama GPU service to run LLM models qwen3-coder-next and nomic-embed-text for query classification and embeddings generation.
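
The SSH orchestration from the theo backend can be sketched as building a remote command. This is a sketch only: the remote user, host alias, script name, and flags below are assumptions, not the actual DataLens job invocation.

```python
import shlex

def build_extraction_cmd(remote: str, doc_path: str) -> list:
    """Build an ssh argv that would launch a (hypothetical) Docling
    extraction job on the GPU server; quoting protects the file path."""
    remote_cmd = f"python run_docling.py --input {shlex.quote(doc_path)}"
    return ["ssh", remote, remote_cmd]

cmd = build_extraction_cmd("theo@elin", "/data/report.pdf")
# subprocess.run(cmd, check=True)  # not executed in this sketch
```
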

RequirementIntent

Elin environment variables

The Anthropic API key must be set in the elin environment for the OpenClaw Gateway to authenticate API calls to Anthropic. Once the key was added to the elin environment, the OpenClaw Gateway authenticated successfully and Claude responded.

ServerOperations

elin:11434

ServerOperations

elin:6333

IntegrationIntegrations

ElinSkillClient

RingfencedSkills uses ElinSkillClient to execute constrained, ringfenced skill operations on elin for the DataLens agent.

ThirdPartyComponentArchitecture

email-validator

email-validator v2.2.0 is used; no additional details are given.

Entity

embed_texts

Generates text embeddings via Ollama for semantic search.
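A sketch of embed_texts against Ollama's embeddings endpoint follows. The endpoint shape and payload follow Ollama's public /api/embeddings API, and the host/port come from the elin:11434 entry in this catalog; the function signature and default model name are assumptions.

```python
import json
import urllib.request

# Assumed URL: elin:11434 is listed in this catalog as the Ollama server.
OLLAMA_URL = "http://elin:11434/api/embeddings"

def build_payload(model: str, text: str) -> bytes:
    """Serialize one embedding request in Ollama's expected shape."""
    return json.dumps({"model": model, "prompt": text}).encode()

def embed_texts(texts, model="nomic-embed-text"):
    """Request one embedding vector per text from the Ollama server."""
    vectors = []
    for text in texts:
        req = urllib.request.Request(
            OLLAMA_URL,
            data=build_payload(model, text),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            vectors.append(json.loads(resp.read())["embedding"])
    return vectors
```

Only build_payload is exercised offline here; embed_texts needs a reachable Ollama server.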

BusinessProcessIntent

EmbeddingService

EmbeddingService produces embeddings used by QdrantService for semantic search and vector collections. Document RAG uses embedding models for semantic vector generation, optionally on CPU or Ollama GPU.

DataEntityData Model

Entity extraction

The Backend discovery service requires the Entity extraction capability to process Danish questions.

DesignDecisionArchitecture

Entity-Based Query Parsing

RequirementIntent

Environment Variables

The ANTHROPIC_API_KEY is required to be set in the Coolify environment to allow Claude to respond and prevent analysis request timeouts. The user is expected to provide the Anthropic API key, which the OpenClaw Gateway requires to authenticate calls to Claude for query processing. The key must also be set in the elin environment variables for the OpenClaw Gateway to use it.

AcceptanceDocumentGovernance

Example workflows

RequirementIntent

Excel Extraction

Excel extractor component implements the Excel Extraction requirement.

Entity

Excel extractor

DataLens Project includes an Excel extractor tested with multi-sheet support and Unicode handling. Excel Extractor loads normalized Excel data into DuckDB.
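The normalization step before loading into DuckDB can be sketched in pure Python. The real extractor presumably reads sheets with a spreadsheet library such as openpyxl; the header-normalization rules below are assumptions chosen to illustrate multi-sheet and Unicode (e.g. Danish) handling.

```python
def normalize_headers(headers: list) -> list:
    """Lower-case headers and replace spaces with underscores so they
    become stable column names for a DuckDB table (Unicode-safe)."""
    return [h.strip().lower().replace(" ", "_") for h in headers]

def normalize_sheet(rows: list) -> list:
    """Turn a raw sheet (first row = headers) into dict records ready
    for insertion into a DuckDB table."""
    headers = normalize_headers(rows[0])
    return [dict(zip(headers, row)) for row in rows[1:]]

# Toy sheet with Danish headers, standing in for real Excel data.
sheet = [["År", "Beløb DKK"], [2024, 1000], [2025, 1100]]
records = normalize_sheet(sheet)
```
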

BusinessProcessIntent

Excel files

Phase 2 Strategy depends on the prior processing status and results of Excel files from Phase 1. Budget questions are partially answerable from Excel files available after Phase 1 processing. The Phase 2 Strategy Research & Decision Point uses data from Excel files processed in Phase 1 as context for decision making. Extractor Agents handle Excel files for data normalization and extraction.

Entity

Excel sheets

PhysicalTableData Model

Excel survey data

The Excel survey data is mapped to physical tables in DuckDB database.