Data Model
152 entities found
catalog worker
Background workers include the catalog worker as a component. The batch processor orchestrator depends on the catalog worker as the first step in the batch processing pipeline.
Chicago municipal datasets
Cleveland Clinic Dataset
DataLens Development uses the Cleveland Clinic Dataset for end-to-end system testing and validation.
Commit 507c94b
Commit 507c94b adds decimal type support by importing Decimal from the decimal module and updating the numeric column detection logic.
CompressedSchemaBuilder
CompressedSchemaBuilder produces compressed schema representations optimized for SQL generation by Arctic-Text2SQL-R1-7B within token limits. Multi-Stage Text-to-SQL Architecture realizes the CompressedSchemaBuilder use case to generate compact schema representations for SQL generation.
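A minimal sketch of what a compressed schema representation might look like, assuming a character-based token budget and invented class and column names (the real builder's interface is not shown in the source):

```python
from dataclasses import dataclass

@dataclass
class TableInfo:
    name: str
    columns: list  # list of (column_name, sql_type) pairs

class CompressedSchemaBuilder:
    """Hypothetical sketch: render each table as a compact one-line
    signature and stop when a rough token budget is exhausted
    (approximating ~4 characters per token)."""

    def __init__(self, token_limit=2000, chars_per_token=4):
        self.char_budget = token_limit * chars_per_token

    def build(self, tables):
        lines, used = [], 0
        for t in tables:
            line = f"{t.name}({', '.join(f'{c}:{ty}' for c, ty in t.columns)})"
            if used + len(line) > self.char_budget:
                break  # stay within the token limit for the SQL model
            lines.append(line)
            used += len(line) + 1  # +1 for the joining newline
        return "\n".join(lines)
```

The one-line-per-table form keeps column names and types visible to the SQL model while dropping verbose DDL syntax.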
Consolidated Unified Views
The logical Schema Graph of tables maps to the physical Consolidated Unified Views created as transient database views. The Consolidation Mechanism produces Consolidated Unified Views by creating session-scoped joins of related tables for queries.
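A session-scoped view can be sketched as follows, using SQLite's `TEMP VIEW` as a stand-in for the platform's transient database views; the table and column names here are invented for illustration:

```python
import sqlite3

# In-memory stand-in database; the real system targets its analytics DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE budget (year INTEGER, dept_id INTEGER, amount REAL);
    CREATE TABLE dept (dept_id INTEGER, name TEXT);
    INSERT INTO budget VALUES (2023, 1, 500.0);
    INSERT INTO dept VALUES (1, 'Social Services');
""")

# A TEMP VIEW exists only for this connection (session), mirroring the
# transient, session-scoped nature of consolidated unified views.
conn.execute("""
    CREATE TEMP VIEW unified_budget AS
    SELECT b.year, d.name AS department, b.amount
    FROM budget b JOIN dept d ON b.dept_id = d.dept_id
""")
rows = conn.execute("SELECT * FROM unified_budget").fetchall()
```

Because the view is dropped when the session ends, no permanent schema changes accumulate from query-time consolidation.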
consolidated view
The analysis pipeline is planned to use a consolidated view entity when conducting analyses. The consolidated view contains the unified schema for querying purposes.
consolidation
The consolidation entity is planned to be integrated into and used by the analysis pipeline.
Consolidation recommendation store
The Data Consolidation process stores consolidation recommendations for user reference.
CONSOLIDATION_EXAMPLES.md
Sample cases illustrating schema consolidation and table linking strategies.
Danish budget tables
490+ Danish budget tables with complex Danish names and descriptions. The system handles schema comprehension, translation, and relevance filtering using multi-stage semantic matching and re-ranking, focusing on high-quality, fast SQL generation while managing large schemas. The Multi-Stage Text-to-SQL Architecture takes the 490+ Danish budget tables as its input schema. The backend service schema_graph.py uses the Danish budget tables to perform join-key analysis and table clustering for consolidation.
Danish Keywords Dictionary
In use for Danish-language question understanding and keyword-based routing, improving NLP processing in the platform.
Danish municipal budget data
Contains 473 extracted budget tables (SVGV dataset) with ~351,842 rows, structured for Danish municipal budget analysis, queryable via PostgreSQL.
Danish questions
The Discovery Service processes Danish questions for entity extraction, table ranking, and join detection. Users ask Danish-language budget queries of the DataLens SVGV Budget analysis system; the Agent Chat interface handles these Danish-language budget queries from users.
Danish table names
Danish table names, such as 'udgifter_til_sociale_ydelser_2023', are translated into English, e.g., 'expenses social benefits 2023', aiding semantic understanding in data processing.
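The translation step can be sketched as a dictionary lookup over underscore-separated name tokens; the dictionary entries below are hypothetical, and the platform's actual translation mechanism is not shown in the source:

```python
# Hypothetical token dictionary; real coverage would be far larger.
DANISH_TO_ENGLISH = {
    "udgifter": "expenses",
    "til": "",            # filler word, dropped from the translation
    "sociale": "social",
    "ydelser": "benefits",
}

def translate_table_name(name):
    """Split an underscored Danish table name, map each token through
    the dictionary (numbers and unknown tokens pass through unchanged),
    and rejoin the non-empty results into an English phrase."""
    words = []
    for token in name.split("_"):
        mapped = DANISH_TO_ENGLISH.get(token, token)
        if mapped:
            words.append(mapped)
    return " ".join(words)
```

This reproduces the source's example, turning 'udgifter_til_sociale_ydelser_2023' into 'expenses social benefits 2023'.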
Data
Data entities include structured tables and summaries, with ongoing validation of integrity.
data analytics database
PostgreSQL is used as the primary data storage for extracted data in the new platform, supporting concurrent read/write access, schema per project, and full-text search, replacing DuckDB for better performance and scalability.
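The schema-per-project convention might be implemented along these lines; the `project_<id>` naming and the helper functions are assumptions, since the source only states that each project gets its own schema:

```python
def project_schema(project_id):
    """Hypothetical naming convention: one PostgreSQL schema per project."""
    if not isinstance(project_id, int) or project_id < 0:
        raise ValueError("project_id must be a non-negative integer")
    return f"project_{project_id}"

def scoped_query(project_id, sql):
    """Prefix a query with a search_path so unqualified table names
    resolve inside the project's schema."""
    return f"SET search_path TO {project_schema(project_id)};\n{sql}"
```

Validating the id before interpolation keeps the generated identifier safe to embed in SQL.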
Data queryable tables
473 tables created in PostgreSQL for Project 14 after full dataset extraction, totaling ~351,842 rows. Tables are queryable and include Danish field names, supporting analysis and natural language questions.
database schema
Represents the data structure used during data management; no specific details are provided in the messages.
discovery.py service
The Data Discovery feature contains the discovery.py service, which implements semantic table matching through the TableIndex. The Discovery Service processes Danish questions for entity extraction, table ranking, and join detection, and uses a 4-factor relevance score to rank tables by matching criteria. For join discovery it applies three strategies: the Known keys strategy (95% confidence), the ID matching strategy (85% confidence), and the Value overlap strategy, which identifies joins by data overlap (75% confidence). The Discovery Service implements the Schema consolidation mechanism to improve the query success rate. DiscoveryService uses Table to represent database tables with metadata in consolidation recommendations, uses JoinPath to represent joins between database tables, and produces ConsolidationRecommendation to suggest table consolidations for questions. FilePrioritizer uses DiscoveryService outputs to prioritize project files relevant to analytical questions. The Discovery Service uses the Qwen3 LLM for table selection in the intelligent table discovery process and, after table consolidation, passes unified schemas to the Arctic LLM for SQL generation.
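The tiered join strategies and their confidence scores can be sketched as below; the detection predicates are simplified stand-ins, and only the strategy names and confidence values come from the source:

```python
def discover_join(col_a, col_b, known_keys, values_a, values_b):
    """Try the join strategies in descending confidence order and
    return (strategy_name, confidence) for the first that matches,
    or None. The match conditions here are illustrative stand-ins."""
    # Known keys: the column pair is a declared/known key relationship.
    if (col_a, col_b) in known_keys:
        return ("known_keys", 0.95)
    # ID matching: identically named columns that look like identifiers.
    if col_a == col_b and col_a.endswith("_id"):
        return ("id_matching", 0.85)
    # Value overlap: the columns share a large fraction of their values.
    overlap = len(set(values_a) & set(values_b))
    denom = min(len(set(values_a)), len(set(values_b))) or 1
    if overlap / denom >= 0.5:
        return ("value_overlap", 0.75)
    return None
```

Ordering the checks from strongest to weakest evidence means each candidate join is tagged with the highest confidence the data supports.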
doc_text_chunks
The text_chunks table includes a file_id referencing the file from which the text chunk originated. Each text chunk in the text_chunks table is associated with a project via project_id. File uploads produce rows in the text_chunks table, which stores chunks of text extracted from the uploaded files.
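The relationships above can be sketched with SQLite as a stand-in; aside from file_id and project_id, the column names and sample data are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE text_chunks (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,                   -- owning project
        file_id INTEGER NOT NULL REFERENCES files(id), -- source file
        chunk_text TEXT NOT NULL                       -- extracted text
    );
    INSERT INTO files VALUES (1, 'budget_report.pdf');
    INSERT INTO text_chunks VALUES (1, 14, 1, 'Udgifter til sociale ydelser...');
""")

# Join chunks back to their source file, scoped to one project.
row = conn.execute("""
    SELECT f.name, c.chunk_text FROM text_chunks c
    JOIN files f ON c.file_id = f.id WHERE c.project_id = 14
""").fetchone()
```

The foreign key to files plus the project_id column lets chunks be traced to their source file and filtered per project in one query.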
doc_text_chunks table schema
Dockerfile
The FastAPI backend includes a production-ready Dockerfile for deployment, and the SvelteKit frontend includes its own multi-stage, production-ready Dockerfile for production builds. DataLens Agent Mode integrates IronClaw running as a sidecar Docker service for deployment. The Dockerfiles are used together with docker-compose.yml for multi-container deployment.
docs/architecture/CONSOLIDATION_EXAMPLES.md
Provides concrete examples of schema relationships and consolidation processes, aiding comprehension of multi-table queries.
docs/architecture/MULTI_STAGE_TEXT2SQL.md
Contains architecture design for multi-stage SQL generation including improvements and strategies for handling large schemas.
docs/INTELLIGENT_CONSOLIDATION.md
Describes the architecture for table relationship discovery, schema relation graphs, and multi-table query optimization.
DS-STAR extractors
Utilizes CSV, Excel, and PDF extractors for data ingestion: the CSV extractor handles CSV data, the Excel extractor manages multi-sheet files, and the PDF extractor extracts tables from PDFs, as part of the Day 2 data extraction plan.
DuckDB file /app/storage/project_4.duckdb
A DuckDB file located at /app/storage/project_4.duckdb holds the extracted Project 4 data, including structured tables and metadata, used during processing and analysis. Each project's data is stored in a dedicated DuckDB file (e.g., project_4.duckdb) managed by DuckDBService, which uses the file as physical storage for per-project data. The DuckDB engine is mapped to the physical analytics.db database file on elin. DuckDB file storage is represented as Project Storage for DuckDB database files.
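The per-project file layout implied above can be sketched as a path helper; the function name is an assumption, while the /app/storage directory and the project_4.duckdb filename come from the source:

```python
from pathlib import Path

STORAGE_DIR = Path("/app/storage")  # storage root named in the source

def project_db_path(project_id):
    """Resolve the dedicated DuckDB file for a project, following the
    project_<id>.duckdb naming pattern (e.g., project_4.duckdb)."""
    return STORAGE_DIR / f"project_{project_id}.duckdb"
```

Deriving the path from the project id keeps each project's data isolated in its own database file.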
Entity extraction
The Backend discovery service requires the Entity extraction capability to process Danish questions.
Excel survey data
The Excel survey data is mapped to physical tables in DuckDB database.