Data Model
152 entities found
catalog worker
Background workers include the catalog worker as a component. The batch processor orchestrator depends on the catalog worker as the first step in the batch processing pipeline.
Chicago municipal datasets
Cleveland Clinic Dataset
DataLens Development uses the Cleveland Clinic Dataset for end-to-end system testing and validation.
Commit 507c94b
Commit 507c94b adds decimal type support by importing Decimal from the decimal module and updating the numeric column detection logic.
CompressedSchemaBuilder
CompressedSchemaBuilder produces compressed schema representations optimized for SQL generation by Arctic-Text2SQL-R1-7B within token limits. Multi-Stage Text-to-SQL Architecture realizes the CompressedSchemaBuilder use case to generate compact schema representations for SQL generation.
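A minimal sketch of what a compressed schema representation might look like, assuming a character-based token budget and invented class and column names (the real builder's interface is not shown in the source):

```python
from dataclasses import dataclass

@dataclass
class TableInfo:
    name: str
    columns: list  # list of (column_name, sql_type) pairs

class CompressedSchemaBuilder:
    """Hypothetical sketch: render each table as a compact one-line
    signature and stop when a rough token budget is exhausted
    (approximating ~4 characters per token)."""

    def __init__(self, token_limit=2000, chars_per_token=4):
        self.char_budget = token_limit * chars_per_token

    def build(self, tables):
        lines, used = [], 0
        for t in tables:
            line = f"{t.name}({', '.join(f'{c}:{ty}' for c, ty in t.columns)})"
            if used + len(line) > self.char_budget:
                break  # stay within the token limit for the SQL model
            lines.append(line)
            used += len(line) + 1  # +1 for the joining newline
        return "\n".join(lines)
```

The one-line-per-table form keeps column names and types visible to the SQL model while dropping verbose DDL syntax.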
Consolidated Unified Views
The logical Schema Graph of tables maps to the physical Consolidated Unified Views created as transient database views. The Consolidation Mechanism produces Consolidated Unified Views by creating session-scoped joins of related tables for queries.
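A session-scoped view can be sketched as follows, using SQLite's `TEMP VIEW` as a stand-in for the platform's transient database views; the table and column names here are invented for illustration:

```python
import sqlite3

# In-memory stand-in database; the real system targets its analytics DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE budget (year INTEGER, dept_id INTEGER, amount REAL);
    CREATE TABLE dept (dept_id INTEGER, name TEXT);
    INSERT INTO budget VALUES (2023, 1, 500.0);
    INSERT INTO dept VALUES (1, 'Social Services');
""")

# A TEMP VIEW exists only for this connection (session), mirroring the
# transient, session-scoped nature of consolidated unified views.
conn.execute("""
    CREATE TEMP VIEW unified_budget AS
    SELECT b.year, d.name AS department, b.amount
    FROM budget b JOIN dept d ON b.dept_id = d.dept_id
""")
rows = conn.execute("SELECT * FROM unified_budget").fetchall()
```

Because the view is dropped when the session ends, no permanent schema changes accumulate from query-time consolidation.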
consolidated view
The analysis pipeline is planned to use a consolidated view entity when conducting analyses. The consolidated view contains the unified schema for querying purposes.
consolidation
The consolidation entity is planned to be integrated into and used by the analysis pipeline.
Consolidation recommendation store
The Data Consolidation process stores consolidation recommendations for user reference.
CONSOLIDATION_EXAMPLES.md
Sample cases illustrating schema consolidation and table linking strategies.
Danish budget tables
490+ Danish budget tables with complex Danish names and descriptions. The system handles schema comprehension, translation, and relevance filtering using multi-stage semantic matching and re-ranking, focusing on high-quality, fast SQL generation while managing large schemas. The Multi-Stage Text-to-SQL Architecture takes the 490+ Danish budget tables as its input schema. The backend service schema_graph.py uses the Danish budget tables to perform join-key analysis and table clustering for consolidation.
Danish Keywords Dictionary
In use for Danish-language question understanding and keyword-based routing, improving NLP processing in the platform.
Danish municipal budget data
Contains 473 extracted budget tables (SVGV dataset) with ~351,842 rows, structured for Danish municipal budget analysis, queryable via PostgreSQL.
Danish questions
The Discovery Service processes Danish questions for entity extraction, table ranking, and join detection. Users ask Danish-language budget queries of the DataLens SVGV Budget analysis system; the Agent Chat interface handles these Danish-language budget queries from users.
Danish table names
Danish table names, such as 'udgifter_til_sociale_ydelser_2023', are translated into English, e.g., 'expenses social benefits 2023', aiding semantic understanding in data processing.
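The translation step can be sketched as a dictionary lookup over underscore-separated name tokens; the dictionary entries below are hypothetical, and the platform's actual translation mechanism is not shown in the source:

```python
# Hypothetical token dictionary; real coverage would be far larger.
DANISH_TO_ENGLISH = {
    "udgifter": "expenses",
    "til": "",            # filler word, dropped from the translation
    "sociale": "social",
    "ydelser": "benefits",
}

def translate_table_name(name):
    """Split an underscored Danish table name, map each token through
    the dictionary (numbers and unknown tokens pass through unchanged),
    and rejoin the non-empty results into an English phrase."""
    words = []
    for token in name.split("_"):
        mapped = DANISH_TO_ENGLISH.get(token, token)
        if mapped:
            words.append(mapped)
    return " ".join(words)
```

This reproduces the source's example, turning 'udgifter_til_sociale_ydelser_2023' into 'expenses social benefits 2023'.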
Data
Data entities include structured tables and summaries, with ongoing validation of integrity.
data analytics database
PostgreSQL is used as the primary data storage for extracted data in the new platform, supporting concurrent read/write access, schema per project, and full-text search, replacing DuckDB for better performance and scalability.
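The schema-per-project convention might be implemented along these lines; the `project_<id>` naming and the helper functions are assumptions, since the source only states that each project gets its own schema:

```python
def project_schema(project_id):
    """Hypothetical naming convention: one PostgreSQL schema per project."""
    if not isinstance(project_id, int) or project_id < 0:
        raise ValueError("project_id must be a non-negative integer")
    return f"project_{project_id}"

def scoped_query(project_id, sql):
    """Prefix a query with a search_path so unqualified table names
    resolve inside the project's schema."""
    return f"SET search_path TO {project_schema(project_id)};\n{sql}"
```

Validating the id before interpolation keeps the generated identifier safe to embed in SQL.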
Data queryable tables
473 tables created in PostgreSQL for Project 14 after full dataset extraction, totaling ~351,842 rows. Tables are queryable and include Danish field names, supporting analysis and natural language questions.
database schema
Represents the data structure used during data management; no specific details are provided in the messages.
discovery.py service
The Data Discovery feature contains the discovery.py service, which implements semantic table matching through the TableIndex. The Discovery Service processes Danish questions for entity extraction, table ranking, and join detection, and uses a 4-factor relevance score to rank tables by matching criteria. For join discovery it applies three strategies: the Known keys strategy (95% confidence), the ID matching strategy (85% confidence), and the Value overlap strategy, which identifies joins by data overlap (75% confidence). The Discovery Service implements the Schema consolidation mechanism to improve the query success rate. DiscoveryService uses Table to represent database tables with metadata in consolidation recommendations, uses JoinPath to represent joins between database tables, and produces ConsolidationRecommendation to suggest table consolidations for questions. FilePrioritizer uses DiscoveryService outputs to prioritize project files relevant to analytical questions. The Discovery Service uses the Qwen3 LLM for table selection in the intelligent table discovery process and, after table consolidation, passes unified schemas to the Arctic LLM for SQL generation.
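The tiered join strategies and their confidence scores can be sketched as below; the detection predicates are simplified stand-ins, and only the strategy names and confidence values come from the source:

```python
def discover_join(col_a, col_b, known_keys, values_a, values_b):
    """Try the join strategies in descending confidence order and
    return (strategy_name, confidence) for the first that matches,
    or None. The match conditions here are illustrative stand-ins."""
    # Known keys: the column pair is a declared/known key relationship.
    if (col_a, col_b) in known_keys:
        return ("known_keys", 0.95)
    # ID matching: identically named columns that look like identifiers.
    if col_a == col_b and col_a.endswith("_id"):
        return ("id_matching", 0.85)
    # Value overlap: the columns share a large fraction of their values.
    overlap = len(set(values_a) & set(values_b))
    denom = min(len(set(values_a)), len(set(values_b))) or 1
    if overlap / denom >= 0.5:
        return ("value_overlap", 0.75)
    return None
```

Ordering the checks from strongest to weakest evidence means each candidate join is tagged with the highest confidence the data supports.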
doc_text_chunks
The text_chunks table includes a file_id referencing the file from which the text chunk originated. Each text chunk in the text_chunks table is associated with a project via project_id. File uploads produce rows in the text_chunks table, which stores chunks of text extracted from the uploaded files.
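The relationships above can be sketched with SQLite as a stand-in; aside from file_id and project_id, the column names and sample data are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE text_chunks (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,                   -- owning project
        file_id INTEGER NOT NULL REFERENCES files(id), -- source file
        chunk_text TEXT NOT NULL                       -- extracted text
    );
    INSERT INTO files VALUES (1, 'budget_report.pdf');
    INSERT INTO text_chunks VALUES (1, 14, 1, 'Udgifter til sociale ydelser...');
""")

# Join chunks back to their source file, scoped to one project.
row = conn.execute("""
    SELECT f.name, c.chunk_text FROM text_chunks c
    JOIN files f ON c.file_id = f.id WHERE c.project_id = 14
""").fetchone()
```

The foreign key to files plus the project_id column lets chunks be traced to their source file and filtered per project in one query.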
doc_text_chunks table schema
Dockerfile
The FastAPI backend includes a production-ready Dockerfile for deployment, and the SvelteKit frontend includes its own multi-stage, production-ready Dockerfile for production builds. DataLens Agent Mode integrates IronClaw running as a sidecar Docker service for deployment. The Dockerfiles are used together with docker-compose.yml for multi-container deployment.
docs/architecture/CONSOLIDATION_EXAMPLES.md
Provides concrete examples of schema relationships and consolidation processes, aiding comprehension of multi-table queries.
docs/architecture/MULTI_STAGE_TEXT2SQL.md
Contains architecture design for multi-stage SQL generation including improvements and strategies for handling large schemas.
docs/INTELLIGENT_CONSOLIDATION.md
Describes the architecture for table relationship discovery, schema relation graphs, and multi-table query optimization.
DS-STAR extractors
Utilizes CSV, Excel, and PDF extractors for data ingestion: the CSV extractor handles CSV data, the Excel extractor manages multi-sheet files, and the PDF extractor extracts tables from PDFs, as part of the Day 2 data extraction plan.
DuckDB file /app/storage/project_4.duckdb
A DuckDB file located at /app/storage/project_4.duckdb holds the extracted Project 4 data, including structured tables and metadata, used during processing and analysis. Each project's data is stored in a dedicated DuckDB file (e.g., project_4.duckdb) managed by DuckDBService, which uses the file as physical storage for per-project data. The DuckDB engine is mapped to the physical analytics.db database file on elin. DuckDB file storage is represented as Project Storage for DuckDB database files.
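The per-project file layout implied above can be sketched as a path helper; the function name is an assumption, while the /app/storage directory and the project_4.duckdb filename come from the source:

```python
from pathlib import Path

STORAGE_DIR = Path("/app/storage")  # storage root named in the source

def project_db_path(project_id):
    """Resolve the dedicated DuckDB file for a project, following the
    project_<id>.duckdb naming pattern (e.g., project_4.duckdb)."""
    return STORAGE_DIR / f"project_{project_id}.duckdb"
```

Deriving the path from the project id keeps each project's data isolated in its own database file.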
Entity extraction
The Backend discovery service requires the Entity extraction capability to process Danish questions.
Excel survey data
The Excel survey data is mapped to physical tables in DuckDB database.