Data Model
152 entities found
extract worker
The extract worker is a component of the background workers; the batch processor orchestrator relies on it to perform data extraction from files.
extracted_tables
Each project schema (e.g. project_14.*) contains multiple extracted tables whose metadata is registered in the extracted_tables registry table in the public schema. PgDataService updates and queries this registry in PostgreSQL to track table metadata per project schema.
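A minimal sketch of how such a registry could look and be queried. The column names are assumptions, and sqlite3 stands in for PostgreSQL here:

```python
import sqlite3

# Hypothetical registry of extracted tables; columns are illustrative
# assumptions, and sqlite3 stands in for the PostgreSQL public schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE extracted_tables (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        schema_name TEXT NOT NULL,      -- e.g. 'project_14'
        table_name TEXT NOT NULL,
        row_count INTEGER,
        UNIQUE (schema_name, table_name)
    )
""")
conn.execute(
    "INSERT INTO extracted_tables (project_id, schema_name, table_name, row_count) "
    "VALUES (?, ?, ?, ?)",
    (14, "project_14", "budget_2022", 1200),
)

# Look up one project's registered tables, as a data service might.
rows = conn.execute(
    "SELECT table_name, row_count FROM extracted_tables WHERE project_id = ?",
    (14,),
).fetchall()
print(rows)  # [('budget_2022', 1200)]
```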
extracted_tables registry
extracted_text_chunks
The extracted_text_chunks table stores text chunks from all projects centrally in a single unified table in PostgreSQL's public schema. PgDataService manages inserting into and querying this table for textual data storage and retrieval.
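A sketch of the unified-table pattern described above. Column names are assumptions, and sqlite3 stands in for PostgreSQL:

```python
import sqlite3

# Hypothetical shape of a unified text-chunk table; columns are
# illustrative assumptions, and sqlite3 stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE extracted_text_chunks (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        file_id INTEGER NOT NULL,
        chunk_index INTEGER NOT NULL,
        content TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO extracted_text_chunks (project_id, file_id, chunk_index, content) "
    "VALUES (?, ?, ?, ?)",
    [(14, 1, 0, "Fiscal year 2022 overview"), (15, 2, 0, "HR policy notes")],
)

# Retrieve one project's chunks in order, as a data service might.
chunks = conn.execute(
    "SELECT content FROM extracted_text_chunks WHERE project_id = ? ORDER BY chunk_index",
    (14,),
).fetchall()
print(chunks)  # [('Fiscal year 2022 overview',)]
```

Keeping all projects in one table (rather than one table per project schema) makes cross-project retrieval a single filtered query.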
extracted_text_chunks table
ExtractedTable model
extraction worker
The Extraction Worker currently writes extracted data to DuckDB files; it will be migrated to write extracted data to PostgreSQL schemas instead.
extractors
DS-STAR Intelligence integrates with the existing extractors to improve extraction quality. The plan includes a CSV Extractor to validate, clean, and load CSV data into DuckDB; an Excel Extractor to handle multi-sheet workbooks, normalize headers, detect merged cells, and load data into DuckDB; and a PDF Extractor that uses vLLM to extract tables as JSON and loads validated data into DuckDB. The DataLens Master Implementation Plan depends on these extractor components for data ingestion and processing during extraction phases.
file_upload.catalog_data
file_uploads
The file_uploads table tracks the processing status of the 132 files in the project_14 schema. It contains a project_id column linking uploaded files to projects and an uploaded_by field linking each file to the user who uploaded it. The text_chunks table includes a file_id referencing the file from which each chunk originated, and GDPR flags reference files through the file_id column in project_gdpr_flags. The data catalog generation process depends on the upload directory where user files are stored.
FileUpload record ai_summary column
AI Summary Generation stores summaries in the FileUpload record's ai_summary column after extraction completes in the cataloging workflow. The dashboard's file.ai_summary display renders this column so users can see file summaries in the PostgreSQL-backed FileUpload records. The file prioritizer uses FileUpload columns such as tier and ai_summary to assign processing priorities.
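A toy sketch of tier- and summary-driven prioritization. The field names and ordering rule are illustrative assumptions, not the actual prioritizer logic:

```python
# Hypothetical prioritization over FileUpload-like records; field names
# and the ordering rule are assumptions for illustration only.
def priority(file_record):
    tier = file_record.get("tier", 3)
    has_summary = bool(file_record.get("ai_summary"))
    # Lower tiers first; within a tier, files lacking a summary come
    # first so one can be generated.
    return (tier, has_summary)

files = [
    {"name": "a.csv", "tier": 2, "ai_summary": "Budget data"},
    {"name": "b.pdf", "tier": 1, "ai_summary": None},
]
first = sorted(files, key=priority)[0]["name"]
print(first)  # b.pdf
```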
FL 2022-2028 budget file
Docling was used to extract 30 tables, including from the FL 2022-2028 budget file, within the SVGV Budget Analysis Project.
frontend/tests/test-discovery.sh
A comprehensive test script created to validate the DataDiscovery feature. It supports automated or interactive testing of the table discovery, consolidation, and analysis workflows against real SVGV data, providing full end-to-end functionality and performance validation.
GDPR-blocked data
generate_sql
Converts natural language questions into SQL queries using LLM, schema info, and prompt.
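A minimal sketch of the prompt-assembly step such a function might perform. The prompt template and schema-dict shape are assumptions, not the actual DataLens implementation, and the LLM call itself is omitted:

```python
# Hypothetical prompt builder for NL-to-SQL generation; the template and
# schema_info shape are illustrative assumptions.
def build_prompt(question: str, schema_info: dict) -> str:
    schema_lines = [
        f"TABLE {table} ({', '.join(cols)})" for table, cols in schema_info.items()
    ]
    return (
        "Given the following schema:\n"
        + "\n".join(schema_lines)
        + f"\n\nWrite a SQL query answering: {question}"
    )

prompt = build_prompt(
    "total budget per department",
    {"budget_2022": ["department", "amount"]},
)
print("TABLE budget_2022 (department, amount)" in prompt)  # True
```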
get_all_schemas
Gets complete schema info for all tables in a project, used for prompt building.
get_schema
Retrieves schema details for specific table in a project.
get_tables
Lists all tables within a project's DuckDB database.
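The get_tables / get_schema helpers above could be sketched as follows. The source uses a per-project DuckDB database; sqlite3 stands in here, and the function shapes are assumptions:

```python
import sqlite3

# Illustrative versions of get_tables and get_schema; sqlite3 stands in
# for DuckDB, and the signatures are assumptions.
def get_tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [r[0] for r in rows]

def get_schema(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    return [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE budget (department TEXT, amount REAL)")
print(get_tables(conn))            # ['budget']
print(get_schema(conn, "budget"))  # [('department', 'TEXT'), ('amount', 'REAL')]
```

get_all_schemas would then simply map get_schema over the result of get_tables when building the prompt.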
HR data
information_schema.tables
A system view in SQL databases used to retrieve metadata about existing tables, crucial for detecting and managing schema changes during the data extraction and consolidation processes.
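As a concrete illustration, a consolidation step might check which tables already exist in a project schema with a standard query like the following (the schema name is illustrative):

```sql
-- List existing tables in one project schema (illustrative schema name).
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'project_14'
  AND table_type = 'BASE TABLE';
```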
Insight
The Insight physical table holds insights derived from project data analysis; its project_id column links each insight to a project, and each project possesses an insights table storing its generated insights. The Insight table is populated from the Query physical table, which produces analytical insights from executed queries, and is derived from the AgentFinding data entity representing findings extracted by the agent.
insights table
The insights table is the data source that Analysis Recommendations build upon to generate actionable cards using the project's goal context.
intelligent table discovery
Built as part of DataLens' data discovery system, it automates table ranking and join discovery, enhancing query success from 70% to over 95%. It uses entity extraction, relevance scoring, and pattern recognition to pre-select related tables, creating transient views for Arctic to generate accurate SQL.
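A toy sketch of the relevance-scoring idea, pre-selecting tables whose names and columns overlap the question's terms. The scoring scheme is an illustrative assumption, not the DataLens algorithm:

```python
# Hypothetical relevance scorer for table pre-selection; the token-overlap
# scheme is an assumption for illustration only.
def relevance(question_terms, table_name, columns):
    tokens = set(table_name.lower().split("_")) | {c.lower() for c in columns}
    hits = sum(1 for t in question_terms if t.lower() in tokens)
    return hits / max(len(question_terms), 1)

tables = {
    "budget_2022": ["department", "amount"],
    "hr_staff": ["employee", "salary"],
}
q = ["budget", "department"]
ranked = sorted(tables, key=lambda t: relevance(q, t, tables[t]), reverse=True)
print(ranked[0])  # budget_2022
```

A real implementation would combine such scores with entity extraction and join-path discovery before creating the transient views handed to SQL generation.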
IronClaw agent tables
IronClaw agent tables are physical tables in the PostgreSQL database providing persistent storage for agent sessions and related data. The Agent session data entity maps to these tables, and the IronClaw agent feature depends on them for storing session and message data.
IronClaw database
The IronClaw Agent requires the IronClaw database to be configured and operational in order to persist sessions and support agent session creation. The IronClaw onboarding process configures and initializes this database to enable session storage and persistent agent threads, and the IronClaw service on elin depends on it for session persistence and agent thread management.
Join discovery
The Backend discovery service depends on Join discovery mechanisms to find table relationships.
Join relationships
JSON format (not JSONB/ARRAY)
To ensure cross-database compatibility, the DataLens Platform uses JSON format instead of JSONB or ARRAY for storing structured data.
LLM
WrenAI supports multiple LLM providers including OpenAI, Anthropic, Ollama, and Bedrock.
MDL
A declarative schema layer used in other architectures for data modeling, not primary to DataLens.