MKB Explorer

Data Model

152 entities found

DataEntityData Model

NYC municipal datasets

DataEntityData Model

omkostningsberegning

PhysicalTableData Model

omkostningsberegning table

The omkostningsberegning table (Danish for "cost calculation") is part of the PostgreSQL project_14 schema, which stores extracted SVGV data. The Data Discovery System requires the omkostningsberegning table to run budget calculation queries.

EntityAttributeData Model

org_id

PhysicalTableData Model

Organization

The Organization physical table groups the users and projects that belong to an organization. Each organization has a users table that stores information about its members and a projects table that lists all projects under the organization. The Project physical table represents projects that belong to an Organization entity.

DataEntityData Model

PDF files

Phase 2 file types include PDF files. The Phase 2 Strategy Research & Decision Point treats the processing of 48 PDF files as a high-priority Phase 2 task. Opus 4.6 recommends text extraction from all 48 PDFs, with OCR only where relevant, citing medium-high ROI and an effort estimate of 2-3 hours. Opus 4.6 states that DOCX files offer better ROI than PDFs because DOCX yields cleaner text with narrative context, while PDFs are noisier and require more variable effort. Extractor Agents process PDFs to extract tables.

DataEntityData Model

PostgreSQL Decimal type

FindingsGenerator requires support for PostgreSQL Decimal type to correctly process numeric data in findings. The Live Backend supports PostgreSQL Decimal type for findings generation.
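
As an illustration of why explicit Decimal support matters: common Python PostgreSQL drivers return NUMERIC/DECIMAL columns as decimal.Decimal, which the standard json module refuses to serialize by default. The sketch below shows one way to handle that. The names decimal_default and encode_finding and the sample row are hypothetical and are not taken from FindingsGenerator.

```python
import json
from decimal import Decimal


def decimal_default(value):
    """Fallback for json.dumps: render Decimal as a string to keep exact digits."""
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def encode_finding(row: dict) -> str:
    """Serialize one result row (hypothetical shape) to JSON."""
    return json.dumps(row, default=decimal_default, sort_keys=True)


# Illustrative row, not real project data.
row = {"post": "example", "amount": Decimal("1234.50")}
encoded = encode_finding(row)
```

Emitting a string rather than a float avoids binary rounding of monetary values; a consumer can rebuild the exact value with Decimal(...).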

PhysicalTableData Model

PostgreSQL schema

The DataLens Platform uses a multi-tenant PostgreSQL schema for organizations, users, projects, files, and cataloging data. The DataLens Platform uses SQLAlchemy to interact with the PostgreSQL schema for database operations. The Backend uses a PostgreSQL schema for data storage.

PhysicalTableData Model

PostgreSQL theo:5433 proxy

PhysicalTableData Model

processed data directory

Stores processed data files, with no additional details specified.

PhysicalTableData Model

ProcessingJob

The ProcessingJob physical table records background jobs executed for projects, including jobs related to files or data processing. The processing_jobs table contains a project_id column that relates each job to its project. The FileUpload physical table depends on the ProcessingJob physical table, which represents the background processing jobs for uploaded files.

PhysicalTableData Model

Project 14

Project 14 involves processing 131 SVGV files for budget analysis. Progress so far covers uploading, extraction, and initial query testing, with plans to fully validate and use the generated SQL and summaries. Project 14 contains the SVGV Budget 2026 data used in testing the Data Discovery feature. Project 14 initially used DuckDB to store the tables extracted from SVGV files during extraction pipeline operations and for analytical queries, and Arctic-Text2SQL-R1-7B queries Project 14 data through DuckDB schemas for SQL generation. Project 14 data was migrated from DuckDB to PostgreSQL to enable concurrent reads and writes, and Project 14 will transition to using its own PostgreSQL schema for extracted data storage. Project 4 data is stored in a dedicated DuckDB project database file at /app/storage/project_4.duckdb. Arne Hauge is a verified user with access to Project 14. The DataLens project is approved and accessed by the user admin@exerun.com.

PhysicalTableData Model

project_14.duckdb

DuckDBService manages the Project 14 DuckDB file for extracted data operations. The migration script reads data from the Project 14 DuckDB file with read-only access during migration.

PhysicalTableData Model

project_gdpr_flags

The project_gdpr_flags table stores GDPR sensitivity detection results for files and columns, flagging potential PII with confidence scores and timestamps for compliance. Each flag is linked to a project via project_id and references a file through the file_id column. The ProjectGdprFlag data entity uses the SchemaProfile data entity to define schema-related GDPR flags, and the AgentSession data entity uses the ProjectGdprFlag data entity to manage GDPR compliance flags. Projects contain project_gdpr_flags tables that flag GDPR-related data for the project.
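
A minimal sketch of what one flag record might look like, assuming a flat row shape. Only project_id and file_id are named in the description above; column_name, confidence, detected_at, the 0.0 to 1.0 confidence range, and the helper flags_above are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class GdprFlagSketch:
    project_id: int
    file_id: int
    column_name: Optional[str]  # hypothetical: None when the flag covers the whole file
    confidence: float           # hypothetical range: 0.0 to 1.0
    detected_at: datetime       # hypothetical name for the timestamp


def flags_above(flags, threshold):
    """Return the flags whose confidence is at or above the threshold."""
    return [flag for flag in flags if flag.confidence >= threshold]


# Illustrative values only.
now = datetime.now(timezone.utc)
flags = [
    GdprFlagSketch(14, 1, "example_column", 0.92, now),
    GdprFlagSketch(14, 1, None, 0.35, now),
]
high = flags_above(flags, 0.8)
```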

DataEntityData Model

projects

The Backend handles projects, and the DataLens Platform has a Projects module for project creation and management. Organizations contain projects tables that list all projects under the organization. Each project has the following related tables: agent_sessions (session data for analysis), project_gdpr_flags (GDPR-related flags), schema_profiles (profile data of the database schemas associated with the project), processing_jobs (background jobs related to files or data processing), file_uploads (files uploaded to the project), queries (SQL queries run against project data), and insights (generated insights tied to project data analysis). The projects page is part of the DataLens system and is implemented as a SvelteKit page.
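
The parent/child layout described here can be sketched with foreign keys. The example below uses an in-memory SQLite database purely as a stand-in for PostgreSQL so that it runs without a server. Apart from the table names and project_id, every column (including placing org_id on projects) and the helper child_counts are assumptions for illustration, not the platform's actual schema.

```python
import sqlite3

# Hypothetical, reduced DDL: three of the child tables are shown.
DDL = """
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES organizations(id),
    name TEXT NOT NULL
);
CREATE TABLE file_uploads (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    filename TEXT NOT NULL
);
CREATE TABLE processing_jobs (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    status TEXT NOT NULL
);
CREATE TABLE queries (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    question TEXT NOT NULL
);
"""

CHILD_TABLES = ("file_uploads", "processing_jobs", "queries")

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce references
conn.executescript(DDL)
conn.execute("INSERT INTO organizations (id, name) VALUES (1, 'example-org')")
conn.execute("INSERT INTO projects (id, org_id, name) VALUES (14, 1, 'example-project')")
conn.execute("INSERT INTO queries (project_id, question) VALUES (14, 'example question')")
conn.commit()


def child_counts(connection, project_id):
    """Count the rows each child table holds for one project."""
    counts = {}
    for table in CHILD_TABLES:  # fixed tuple, so the f-string is not user-controlled
        sql = f"SELECT COUNT(*) FROM {table} WHERE project_id = ?"
        (count,) = connection.execute(sql, (project_id,)).fetchone()
        counts[table] = count
    return counts


counts = child_counts(conn, 14)
```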

EntityAttributeData Model

projects.default_schema

PhysicalTableData Model

projects/svgv-budget-analysis/ANALYTICAL_RESULTS.md file

Contains answers to 33 of 35 analytical questions and a verified budget of 328.5M DKK, includes sources, and is part of report derivation.

PhysicalTableData Model

Python script question_router.py

question_router.py was modified to reduce max_tables from 5 to 2, improve schema selection, and handle context window issues, which boosted the success rate of query generation.
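
To illustrate how a max_tables cap limits what reaches the prompt, here is a minimal sketch that keeps only the highest-scoring tables. The function select_tables, the relevance scores, and the table names are hypothetical; the actual selection logic in question_router.py is not shown in this entry.

```python
def select_tables(relevance: dict[str, float], max_tables: int = 2) -> list[str]:
    """Return at most max_tables table names, highest relevance first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(relevance.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:max_tables]]


# Illustrative scores only.
scores = {"table_a": 0.91, "table_b": 0.40, "table_c": 0.75, "table_d": 0.10}
selected = select_tables(scores)
```

With the default cap, selected holds table_a and table_c; fewer table schemas in the prompt means less pressure on the model's context window.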

PhysicalTableData Model

Qdrant vector search service

The Qdrant vector search service uses Ollama embeddings to generate vector representations of data. In the DataLens platform, the Qdrant vector index depends on the DuckDB database, which stores the text chunks and serves as the data source for embeddings.

DataEntityData Model

Qdrant vectors

Qdrant vectors store the vector embeddings that Ollama embeddings generate from Docling-extracted chunks for semantic search.

PhysicalTableData Model

Query

Query uses the PostgreSQL Database to persist query history and metadata about user questions and projects. Query history consists of Query records, each representing an individual answered question from a query executed on a project as part of data analysis. The queries table includes a project_id column that links each query to a specific project and a user_id column that associates each query with a user. The Query physical table generates the Insight physical table, which contains analytical insights from executed queries. Projects track queries tables that record SQL queries run against project data.

PhysicalTableData Model

Query Router

Query Router currently reads extracted data from DuckDB. After migration, Query Router will read extracted data from PostgreSQL schemas.

DataEntityData Model

query success rate

Intelligent consolidation improves the query success rate from 70% to over 95%.

PhysicalTableData Model

sales table

The sales table is stored in DuckDB (analytics.db).

PhysicalTableData Model

sample_sales table

The sample_sales table is stored as a physical table in DuckDB (analytics.db).

DataEntityData Model

Scandinavian budget data schema

DataEntityData Model

Schema Graph

Phase A Schema Graph Construction implements the creation of the Schema Graph representing join relationships and clustering of tables. The logical Schema Graph of tables maps to the physical Consolidated Unified Views created as transient database views.
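
A minimal sketch of the idea, assuming join relationships are available as pairs of table names and that clusters are simply the connected components of the join graph. Both assumptions, along with the function names and sample tables, are illustrative; the entry does not specify how Phase A clusters tables.

```python
from collections import defaultdict


def build_schema_graph(joins):
    """Build an undirected adjacency map from (left_table, right_table) pairs."""
    graph = defaultdict(set)
    for left, right in joins:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def cluster_tables(tables, graph):
    """Group tables into connected components of the join graph."""
    seen = set()
    clusters = []
    for table in sorted(tables):
        if table in seen:
            continue
        stack, component = [table], set()
        while stack:  # iterative depth-first traversal
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph.get(current, ()))
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical tables and joins.
tables = ["budget", "department", "invoice", "vendor", "notes"]
joins = [("budget", "department"), ("invoice", "vendor")]
clusters = cluster_tables(tables, build_schema_graph(joins))
```

Each resulting cluster is a candidate for one consolidated view; a table with no joins, such as notes here, forms its own single-table cluster.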

PhysicalTableData Model

schema.sql

The DataLens platform backend uses the schema.sql database schema file.

DataEntityData Model

SchemaProfile

SchemaProfile is a Pydantic model in agent_models.py. Its fields include domain_area, classification, sensitivity_flag, and persistence_type, representing schema profile data. Classification: value_object; domain_area and other attributes are optional. Schema profiles relate to projects through the project_id field in schema_profiles. The ProjectGdprFlag data entity uses the SchemaProfile data entity to define schema-related GDPR flags. Projects include schema_profiles tables, which hold profile data of the database schemas associated with the project.

EntityAttributeData Model

scope field

The scope field represents the project goal concept in the data model; the UI renames it "Project Goal", and the Project Creation Form uses it under that name for project creation. The scope field is required in the ProjectCreate Pydantic model by design and is included in ProjectResponse so that it is always present. ProjectCreate validation enforces word count validation on the scope field, with a hard minimum of 20 words. The scope field also replaces hardcoded text in backend/app/workers/catalog.py to generate more accurate file summaries.
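
The 20-word minimum can be sketched as a plain function. The names MIN_SCOPE_WORDS and validate_scope are hypothetical, and splitting on whitespace is an assumed definition of a "word"; in the real ProjectCreate model the check would more likely live in a Pydantic validator.

```python
MIN_SCOPE_WORDS = 20  # hard minimum stated in the entry above


def validate_scope(scope: str) -> str:
    """Return the stripped scope text, or raise ValueError if it is too short."""
    words = scope.split()  # assumption: words are whitespace-separated tokens
    if len(words) < MIN_SCOPE_WORDS:
        raise ValueError(
            f"scope must contain at least {MIN_SCOPE_WORDS} words, got {len(words)}"
        )
    return scope.strip()
```

Raising ValueError keeps the function easy to call from a validator, which could surface the message to the Project Creation Form.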