Project: datalens
81 entity types

1587 entities found

PhysicalTable (Data Model)

extractors

DS-STAR Intelligence integrates with the existing extractors to improve extraction quality. The plan includes a CSV Extractor that validates, cleans, and loads CSV data into DuckDB; an Excel Extractor that handles multi-sheet workbooks, normalizes headers, detects merged cells, and loads the data into DuckDB; and a PDF Extractor that uses vLLM to extract tables as JSON and loads the validated data into DuckDB. The DataLens Master Implementation Plan depends on these extractor components for data ingestion and processing during extraction phases.
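
The CSV path (validate, clean, load) can be sketched as follows. This is a minimal illustration using Python's standard library, with an in-memory sqlite3 database standing in for DuckDB so the example is self-contained; the header-normalization rule and function names are assumptions, not the project's actual implementation.

```python
import csv
import io
import re
import sqlite3


def normalize_header(name: str) -> str:
    # Lowercase and collapse non-alphanumeric runs to underscores (assumed convention).
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def extract_csv(text: str, table: str, con: sqlite3.Connection) -> int:
    """Validate and clean CSV text, then load it into a table. Returns row count."""
    rows = list(csv.reader(io.StringIO(text)))
    header = [normalize_header(h) for h in rows[0]]
    # Keep only rows that match the header width and are not entirely blank.
    data = [r for r in rows[1:] if len(r) == len(header) and any(c.strip() for c in r)]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    con.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return len(data)


con = sqlite3.connect(":memory:")
# The blank line in the input is dropped by the cleaning step.
n = extract_csv("Name,Unit Price\nWidget,9.99\n\nBolt,0.25\n", "products", con)
```

In the real pipeline the same shape applies, but the connection would be a DuckDB one and type inference would replace the all-TEXT columns.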

BusinessProcess (Intent)

FallbackTableIndex

A simple, in-memory, keyword-based index for table search when Qdrant is unavailable, implemented in backend/app/services/table_index.py.
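
A fallback of this kind might look like the sketch below. The class name matches the entity, but the scoring scheme (token overlap between the query and each table's name plus description) is an assumed heuristic, not the actual implementation in table_index.py.

```python
import re
from typing import Dict, List, Set


class FallbackTableIndex:
    """In-memory keyword index over table names and descriptions.

    Stands in for vector search when Qdrant is unavailable (sketch)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Set[str]] = {}  # table name -> token set

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, table: str, description: str = "") -> None:
        self._docs[table] = self._tokens(table + " " + description)

    def search(self, query: str, limit: int = 5) -> List[str]:
        q = self._tokens(query)
        # Score by token overlap; drop zero-score tables; break ties by name.
        scored = [(len(q & toks), name) for name, toks in self._docs.items()]
        scored = [(s, n) for s, n in scored if s > 0]
        scored.sort(key=lambda p: (-p[0], p[1]))
        return [n for _, n in scored[:limit]]


idx = FallbackTableIndex()
idx.add("file_uploads", "tracks uploaded files and processing status")
idx.add("text_chunks", "chunks of text extracted from uploaded files")
res = idx.search("which files are still processing")
```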

ThirdPartyComponent (Architecture)

FastAPI

The Backend API is implemented with the FastAPI framework. The DataLens Platform backend is a FastAPI app exposing APIs for authentication, projects, files, extraction, and analysis; the backend process runs the FastAPI application along with its dependencies on the theo server. FastAPI implements the API layer and provides asynchronous request handling, including streaming responses via the /ask-stream endpoint, which the IronClaw Agent feature drives with asynchronous generators. The Agent Gateway is a FastAPI module that bridges the frontend and the IronClaw Service. FastAPI depends on Uvicorn for serving the application and uses python-multipart for multipart form-data parsing.

Layer (Architecture)

FastAPI backend

Handles API requests, manages business logic, connects to the data services, and serves the frontend for DataLens, using FastAPI routes and dependencies as its interface. The backend runs on the theo server and orchestrates file extraction, query execution, and database storage: it uses SQLAlchemy ORM for data access, PostgreSQL for multi-tenant metadata storage, and DuckDB for extraction storage and querying, and it exposes a Text-to-SQL query API for natural-language queries backed by SQLCoder-7B. On file upload, the DS-STAR FileAnalyzer performs AI cataloging to automatically discover table structures, and an RQ worker handles asynchronous jobs such as embeddings and summary generation. The SvelteKit frontend communicates with the backend via its API endpoints. The backend provides a production-ready Dockerfile for deployment and passes all 13 tests, which are executed with pytest and httpx. Project 13 uses the backend for file extraction and DuckDB storage as part of its data platform.

Integration (Integrations)

FastAPI OpenAPI Spec

Defines RESTful API specifications for seamless integration and documentation.

ArchitecturalView (Architecture)

FastAPI Swagger

BusinessRule (Intent)

Feature Flag Pattern

BusinessProcess (Intent)

File extraction process

The file extraction process triggers the async embedding queue to generate embeddings asynchronously after extraction completes.
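
The trigger can be sketched as a producer/consumer handoff. This stand-in uses a stdlib queue.Queue and a worker thread in place of the RQ/Redis queue mentioned elsewhere in this catalog, and the function names and payloads are hypothetical.

```python
import queue
import threading

embedding_queue: "queue.Queue[str]" = queue.Queue()  # stand-in for the RQ queue
results = []


def embedding_worker() -> None:
    # Consumes file IDs and produces embeddings asynchronously.
    while True:
        file_id = embedding_queue.get()
        results.append((file_id, f"embedding-for-{file_id}"))  # placeholder model call
        embedding_queue.task_done()


def extract_file(file_id: str) -> str:
    # ...extraction work would happen here...
    status = "extracted"
    # Enqueue only after extraction completes, so embeddings never run
    # against partially extracted data.
    embedding_queue.put(file_id)
    return status


worker = threading.Thread(target=embedding_worker, daemon=True)
worker.start()
status = extract_file("file-42")
embedding_queue.join()  # demo only: block until the async job finishes
```

In production the join would be absent; the request returns immediately and the worker drains the queue on its own schedule.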

Page (User Interface)

File prioritizer

The BatchProcessor orchestrator uses the FilePrioritizer to assign tiers and manage processing order in the pipeline. FilePrioritizer draws on DiscoveryService outputs and on FileUpload columns such as tier and AI summary to prioritize the project files most relevant to the analytical questions; ExtractionCoordinator coordinates the extraction processes in that priority order, and StorageService handles file storage and retrieval for the prioritization process. The project dashboard surfaces the results as tier badges and file ordering. The capability is validated by the test_file_prioritizer test case.
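
A tier-assignment heuristic of the kind described might look like the sketch below. The tier and ai_summary fields come from the description above; the specific rules, thresholds, and other field names are assumptions for illustration.

```python
def assign_tier(file, project_keywords):
    """Assign a processing tier (1 = highest priority). Rules are assumed."""
    name = file["filename"].lower()
    summary = (file.get("ai_summary") or "").lower()
    if any(kw in name or kw in summary for kw in project_keywords):
        return 1  # directly relevant to the project goal: process first
    if name.endswith((".csv", ".xlsx")):
        return 2  # structured data: cheap to extract
    return 3      # PDFs and everything else: extract last


files = [
    {"filename": "notes.pdf"},
    {"filename": "budget_2024.csv", "ai_summary": "quarterly budget figures"},
    {"filename": "contacts.xlsx"},
]
# Sort is stable, so files within a tier keep their upload order.
ordered = sorted(files, key=lambda f: assign_tier(f, {"budget"}))
```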

BusinessProcess (Intent)

File Status Breakdown

UseCase (Intent)

File summaries

File summaries are written to accurately reflect the content, relevance, and key questions for each uploaded data file, informing users and linking files to project goals. Files cataloged within DataLens carry comprehensive AI-generated summaries to improve data understanding.

Requirement (Intent)

file summary prompt

The project goal is used to inform the file summary prompt, replacing the previous hardcoded budget-analysis description.

UseCase (Intent)

File Summary task

Generates concise summaries describing each file's contents, its relevance, and the questions it can answer, during file ingestion and cataloging, to improve data cataloging and documentation. Summary generation uses the project's scope instead of a hardcoded string to contextualize the summaries.
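
Injecting the project goal into the prompt might look like this sketch; build_summary_prompt is a hypothetical helper and the prompt wording is illustrative, not the project's actual template.

```python
def build_summary_prompt(project_goal, filename, preview):
    # The project goal is injected instead of the old hardcoded
    # "budget analysis" description mentioned above.
    return (
        f"Project goal: {project_goal}\n"
        f"File: {filename}\n"
        f"Sample rows:\n{preview}\n\n"
        "Write a concise summary of this file's contents, its relevance to the "
        "project goal, and 2-3 questions it can answer."
    )


prompt = build_summary_prompt(
    "Track churn drivers for the retail business",
    "customers.csv",
    "customer_id,signup_date,churned\n101,2023-04-01,true",
)
```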

Capability (Intent)

File-First Data Platform

Defect (Testing)

file_analyzer.py

The DataLens Project uses the file_analyzer.py agent for file analysis, but it has a datetime serialization bug that needs fixing.
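
The usual fix for this class of bug is a default hook for json.dumps, since the stdlib encoder raises TypeError on datetime objects. This is a generic sketch of that pattern, not the project's actual patch.

```python
import json
from datetime import date, datetime


def to_jsonable(obj):
    # json.dumps calls this only for values it cannot serialize itself.
    # Convert datetimes/dates to ISO-8601 strings; reject anything else loudly.
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError(f"Not JSON serializable: {type(obj).__name__}")


# A catalog entry of the kind file_analyzer.py might emit (hypothetical fields).
catalog = {"file": "sales.csv", "analyzed_at": datetime(2024, 5, 1, 12, 30)}
payload = json.dumps(catalog, default=to_jsonable)
```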

EntityAttribute (Data Model)

file_upload.catalog_data

PhysicalTable (Data Model)

file_uploads

The file_uploads table tracks the processing status of the 132 files in the project_14 schema. It contains a project_id column linking uploaded files to projects and an uploaded_by field linking each file to the user who uploaded it. The text_chunks table stores chunks of text extracted from uploaded files and includes a file_id referencing the source file. GDPR flags reference files through the file_id column in project_gdpr_flags. The data-catalog generation process depends on the upload directory where user files are stored.
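
The described columns can be sketched as DDL. This example uses SQLite so it is self-contained, whereas the platform stores this metadata in PostgreSQL; columns beyond those named above (id, filename, status defaults) are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE file_uploads (
    id          INTEGER PRIMARY KEY,
    project_id  INTEGER NOT NULL,                 -- links the file to its project
    uploaded_by INTEGER NOT NULL,                 -- user who uploaded the file
    filename    TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending',  -- processing status
    ai_summary  TEXT                              -- filled in after cataloging
);
CREATE TABLE text_chunks (
    id      INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES file_uploads(id),  -- source file
    chunk   TEXT NOT NULL
);
""")

con.execute(
    "INSERT INTO file_uploads (project_id, uploaded_by, filename) VALUES (14, 1, 'budget.csv')"
)
pending = con.execute(
    "SELECT COUNT(*) FROM file_uploads WHERE status = 'pending'"
).fetchone()[0]
```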

ThirdPartyComponent (Architecture)

FileAnalyzer

The DataLens DS-STAR Implementation Plan includes the FileAnalyzer as a core component of the DS-STAR pipeline. FileAnalyzer produces the data catalogs used in the extraction planning process; the Planner Agent takes the catalog generated by FileAnalyzer as input.

Entity

files

The Backend manages files.

Integration (Integrations)

Files API Endpoints

The Files API supports AI Summary Generation by exposing the endpoints and data fields needed to store and retrieve AI summaries.

Stakeholder (Intent)

FileUpload

The DataLens Platform includes a file upload feature for CSV, Excel, and PDF files; the Backend implements this requirement. The DS-STAR FileAnalyzer integration depends on the file upload feature to automatically catalog files upon upload. FileUpload records are associated with Project entities, storing the files related to each project (a Project contains many FileUpload entries), and depend on ProcessingJob records that represent background processing of the uploaded files. AI Summary Generation stores the generated summaries in the ai_summary column of the FileUpload records.

EntityAttribute (Data Model)

FileUpload record ai_summary column

AI Summary Generation stores summaries in the ai_summary column of the FileUpload record after extraction completes in the cataloging workflow; the records live in the PostgreSQL database. The dashboard's file.ai_summary display renders this column so users can see file summaries, and the file prioritizer reads FileUpload columns such as tier and ai_summary to assign processing priorities.

UIComponent (User Interface)

FileUpload.svelte

The Frontend uses the FileUpload.svelte UI component for drag-and-drop file upload.

Requirement (Intent)

FILTER categories

The Full Findings Visualization Layer capability requires support for 7 filter categories for filtering findings in the UI.

Capability (Intent)

final-validator-test.png

Image file showing test results or validation outcome for visualization features.

BusinessProcess (Intent)

Finding

Finding is the data structure created and managed by FindingsGenerator to represent individual analytical findings generated from query results.

Entity

findings

The generation and display of findings depend on correct SQL extraction from model output, which was fixed by the regex updates.
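
SQL extraction from model output typically relies on regexes like the following sketch: prefer a fenced sql block, fall back to a bare statement. These patterns are illustrative assumptions, not the actual fixed regex.

```python
import re


def extract_sql(llm_output):
    """Pull the SQL statement out of an LLM reply (sketch; patterns assumed)."""
    # Prefer an explicit ```sql fenced block.
    fenced = re.search(r"```sql\s*(.*?)```", llm_output, re.DOTALL | re.IGNORECASE)
    if fenced:
        return fenced.group(1).strip()
    # Otherwise grab the first bare SELECT/WITH statement up to ';' or end of text.
    bare = re.search(r"(?is)\b(?:SELECT|WITH)\b.*?(?:;|$)", llm_output)
    return bare.group(0).strip() if bare else None


reply = (
    "Here is the query:\n"
    "```sql\nSELECT region, SUM(amount) FROM sales GROUP BY region;\n```"
)
sql = extract_sql(reply)
```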

Page (User Interface)

Findings Board

Page (User Interface)

Findings generation

Logging is needed in findings generation to trace data flow and identify failure points during findings creation. The QuestionRouter integration works with the FindingsGenerator logic in the agent architecture to process questions and generate findings, which FindingsGenerator outputs via the message-streaming format for frontend rendering.

Requirement (Intent)

findings_generator logging

findings_generator logging is diagnostic logging to be added at the start of findings generation, implemented in backend/app/services/findings_generator.py.
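
The requirement could be satisfied with standard logging calls at the entry and exit of the generation routine; generate_findings, the logger name, and the row shape below are hypothetical.

```python
import logging

# Logger named after the module path given in the requirement above.
logger = logging.getLogger("app.services.findings_generator")


def generate_findings(query_results):
    # Log at the start so failures can be traced back to their inputs.
    logger.info("findings generation started: %d result rows", len(query_results))
    findings = []
    for row in query_results:
        findings.append({"metric": row["metric"], "value": row["value"]})
    logger.info("findings generation finished: %d findings", len(findings))
    return findings


out = generate_findings([{"metric": "total_sales", "value": 1250}])
```

Using lazy %-formatting (rather than f-strings) keeps the log calls cheap when the level is disabled.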