Intent
Phase 4 Switch Query Pipeline
Phase 5 Test & Validate
Phase 6 Deploy
Phase 7 Cleanup
Phase A Schema Graph Construction
Phase A Schema Graph Construction implements the creation of the Schema Graph representing join relationships and clustering of tables.
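The idea above can be sketched with plain Python: build an undirected graph from join pairs and cluster tables into connected components. This is a minimal illustration, not the actual Phase A implementation; the function names and the component-based clustering rule are assumptions.

```python
from collections import defaultdict

def build_schema_graph(joins):
    """Build an undirected adjacency map from (table_a, table_b) join pairs."""
    graph = defaultdict(set)
    for a, b in joins:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def cluster_tables(graph):
    """Cluster tables into the connected components of the join graph."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

A real schema graph would likely carry edge metadata (join columns, cardinality), but connected components already give a first-pass table clustering.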
Phase A: Batch upload and enhanced AI cataloging
Phase A includes the batch upload and enhanced AI cataloging as part of the batch upload pipeline implementation. The batch upload pipeline includes Phase A which is batch upload plus enhanced AI cataloging.
Phase B Intelligent Retrieval
Phase B Intelligent Retrieval implements the Query Enhancer, which extracts entities from queries and identifies the tables relevant to them. Phase B, which covers the smart auto-processing pipeline with Qdrant, is part of the smart processing UX model.
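A Query Enhancer of this kind can be sketched as keyword overlap between a question and a table catalog. This is purely illustrative: the catalog contents, function name, and scoring rule are assumptions, and the real Phase B presumably uses vector search over Qdrant rather than literal token matching.

```python
import re

# Hypothetical catalog: table name -> descriptive keywords (illustrative only).
CATALOG = {
    "salaries": {"salary", "pay", "wage"},
    "budgets": {"budget", "allocation", "spend"},
    "departments": {"department", "unit", "office"},
}

def enhance_query(question: str):
    """Extract candidate entities and rank tables by keyword overlap."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    known = set().union(*CATALOG.values())
    scores = {t: len(kw & tokens) for t, kw in CATALOG.items()}
    relevant = [t for t, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s]
    return {"entities": sorted(tokens & known), "tables": relevant}
```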
Phase C Integration
Phase C Integration modifies Backend Application Services to use consolidated views for query analysis within the DataLens System. Phase C, the unified question interface, is part of the smart processing UX model for DataLens.
Pipeline Architecture
The plan calls for testing the full pipeline from upload to query and insight generation.
Port Mapping
POSTGRES_DB environment variable
POSTGRES_PASSWORD environment variable
POSTGRES_USER environment variable
PostgreSQL Health Check
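The port mapping, POSTGRES_* environment variables, and health check above can be wired together in one container invocation. This is a sketch assuming the official postgres image's standard environment variables and its bundled pg_isready tool; the container name, credentials, and image tag are hypothetical.

```shell
# Illustrative only: POSTGRES_* env vars, port mapping, and a pg_isready
# health check for the official postgres image (values are hypothetical).
docker run -d --name datalens-postgres \
  -e POSTGRES_DB=datalens \
  -e POSTGRES_USER=datalens \
  -e POSTGRES_PASSWORD=change-me \
  --health-cmd="pg_isready -U datalens -d datalens" \
  --health-interval=10s \
  --health-retries=5 \
  -p 5432:5432 \
  postgres:16
```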
PostgreSQL metadata storage
Uses PostgreSQL for storing system metadata, with a production-ready deployment.
PostgreSQL MVCC
prepare-data
DataLens Agent Mode implements the prepare-data skill for cleaning and transforming datasets via SQL or Python operations.
PrepareDataSkill
PrepareDataSkill produces SkillResult during data cleaning and transformation executions. ExtractionCoordinator uses PrepareDataSkill to process data after extraction across CPU and GPU services.
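A minimal sketch of this shape follows. The source only says PrepareDataSkill produces a SkillResult, so the fields of SkillResult, the execute method, and the specific cleaning steps shown here are all assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SkillResult:
    # Hypothetical shape: the source only says the skill "produces SkillResult".
    success: bool
    rows: list = field(default_factory=list)
    notes: str = ""

class PrepareDataSkill:
    """Sketch of a cleaning/transformation skill (method names are assumed)."""
    def execute(self, rows):
        cleaned = []
        for row in rows:
            # Drop records whose fields are all empty.
            if not any(v not in (None, "") for v in row.values()):
                continue
            # Normalize whitespace in string fields.
            cleaned.append({k: v.strip() if isinstance(v, str) else v
                            for k, v in row.items()})
        return SkillResult(success=True, rows=cleaned,
                           notes=f"kept {len(cleaned)}/{len(rows)} rows")
```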
prioritize worker
Background workers include the prioritize worker as a component. The batch processor orchestrator uses the prioritize worker to assign processing tiers.
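Tier assignment of this kind might look like the following. The thresholds, tier names, and the priority-org shortcut are invented for illustration; the source only states that the prioritize worker assigns processing tiers.

```python
def assign_tier(file_size_bytes: int, is_priority_org: bool) -> str:
    """Hypothetical tiering rule: priority orgs and small files get the
    fast lane; everything else is tiered by size."""
    if is_priority_org or file_size_bytes < 5 * 1024 * 1024:      # < 5 MB
        return "fast"
    if file_size_bytes < 100 * 1024 * 1024:                       # < 100 MB
        return "standard"
    return "bulk"
```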
Private Model Backend
DataLens Agent Mode supports a Private Model Backend using Ollama self-hosted LLMs for GDPR-compliant inference.
process_message
The process_message function is expected to call the _run_query function to execute queries, but a current data-flow problem stops execution before _run_query is reached. In agent_skills.py, the process_message() function calls _run_query asynchronously to generate query results.
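The intended call path can be sketched with asyncio. The message shape and the _run_query body below are stand-ins, not the real agent_skills.py code; the point is only that process_message awaits _run_query rather than returning early.

```python
import asyncio

async def _run_query(sql: str) -> list:
    # Stand-in for the real query executor in agent_skills.py.
    await asyncio.sleep(0)  # simulate async I/O
    return [{"sql": sql, "rows": 0}]

async def process_message(message: dict) -> list:
    # The data-flow bug described above would bail out somewhere before
    # this point; the intended flow is to await _run_query with the
    # extracted SQL.
    sql = message.get("sql")
    if not sql:
        return []
    return await _run_query(sql)
```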
Production-Ready Infrastructure
Production-Ready Infrastructure relies on DuckDB for analytics data storage with read-only connections and timeouts. Production-Ready Infrastructure integrates with Qdrant for vector storage. Production-Ready Infrastructure uses Ollama for GPU-accelerated LLM inference and embeddings.
Progress Indicator Fix
Corrected progress tracking by removing faulty ORM import, enabling accurate real-time display of catalog, extraction, and vectorization statuses.
Project
Represents a data analysis project involving internal stakeholders with high influence. Data models such as StandardSalaryRecord, StandardHealthRecord, StandardFinancialTransaction, StandardGeographicData, and StandardBudgetRecord structure the data used within projects. The Project entity is linked to physical tables such as FileUpload, Query, Insight, and ProcessingJob, which manage project files, executed queries, insights, and background tasks respectively. Each Project's data is stored in a dedicated DuckDB file (e.g., project_4.duckdb) managed by DuckDBService, and each Project's semantic data is stored in a dedicated Qdrant collection used by QdrantService. Users interact with Project data via the API, querying and managing project-specific information. The Project physical table is associated with multiple FileUpload records representing the files uploaded to the project. The PostgreSQL database stores project metadata such as org_id and the created_by user, and the Query data entity references the Project entity by project_id.
Project 4
SVGV Budget Analysis is Project 4, deployed on the platform for batch extraction and analysis.
Project 9 with SVGV scope
Comprehensive plan to build a multi-tenant data platform for municipal finance data, leveraging AI extraction and structured schemas.
Project context in summary generation
Project context in summary generation is required by the DSStarService file cataloging workflow to produce contextually relevant AI summaries.
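Injecting project context into summary generation might look like the following prompt-assembly sketch. The function name, parameters, and prompt wording are all hypothetical; the source only states that DSStarService needs project context to produce contextually relevant AI summaries.

```python
def build_summary_prompt(file_name: str, columns: list, project_goal: str) -> str:
    """Hypothetical prompt assembly: inject the project's goal so the AI
    summary is relevant to the project rather than generic."""
    return (
        f"Project goal: {project_goal}\n"
        f"File: {file_name}\n"
        f"Columns: {', '.join(columns)}\n"
        "Summarize what this file contributes to the project goal."
    )
```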
Project Goal Feature
A key epic for setting project scope, priorities, and goals, under development and testing.
Project goal field
Project goal generation
Phase 2 requirement: automated creation of project-specific goals, validated and integrated into the platform, enabling targeted data analysis.