MKB Explorer


All Domains

1587 entities found

PandasAI

DataLens supports embedded analytics with DuckDB and SQL, which covers capabilities similar to PandasAI. Both tools let users query and analyze data, but DataLens relies on DuckDB and SQL for more robust analysis.

Requirement / Intent

paragraph boundary splitter

Requirement / Intent

Parallel inference across multiple Ollama instances

Requirement / Intent

Partial Extraction

The Phase 2 MVP includes Partial Extraction of 44 files (Excel and PDF) into DuckDB. Partial Extraction stores the extracted data in a DuckDB file for Project 4.

ThirdPartyComponent / Architecture

passlib

passlib is used by the Auth system in the DataLens Platform to hash passwords securely. It depends on bcrypt for password hashing.

Integration / Integrations

PATCH /api/v1/auth/me API endpoint

The User Language Preference capability uses the PATCH /api/v1/auth/me API endpoint to update the user's language setting.
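A minimal sketch of what such a call could look like from a client, using only the Python standard library. Only the path `/api/v1/auth/me` and the PATCH method come from the entry above; the base URL, the `language` field name, and the bearer-token header are assumptions made for illustration. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL; the listing only names the path.
BASE_URL = "http://localhost:8000"


def build_language_update(token: str, language: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets the user's language.

    The "language" field name and the Bearer scheme are assumptions.
    """
    body = json.dumps({"language": language}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/auth/me",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_language_update(token="example-token", language="da")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because the server address is not known.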

IntegrationEndpoint / Integrations

PATCH /findings/{finding_id}

PATCH /findings/{finding_id} updates a finding (pin/unpin, tag). It complements listing the findings for a project and deleting a finding; together these operations manage the lifecycle of findings in the agent backend.
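The partial-update semantics of a PATCH like this can be sketched as follows: only the fields present in the request body change, and everything else is kept. The `Finding` class and the field names `pinned` and `tags` are hypothetical, inferred from the "pin/unpin, tag" wording, and are not the backend's actual model.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are assumptions."""
    finding_id: str
    pinned: bool = False
    tags: Tuple[str, ...] = ()


# Fields a client is allowed to change in this sketch.
ALLOWED_FIELDS = {"pinned", "tags"}


def apply_patch(finding: Finding, patch: dict) -> Finding:
    """Return a copy of the finding with only the supplied fields changed."""
    unknown = set(patch) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    changes = dict(patch)
    if "tags" in changes:
        changes["tags"] = tuple(changes["tags"])
    return replace(finding, **changes)


original = Finding(finding_id="f-1", tags=("draft",))
pinned = apply_patch(original, {"pinned": True})
print(pinned)
```

Because the body only carried `pinned`, the existing tags are left untouched.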

IntegrationEndpoint / Integrations

PATCH /me

Getting and updating the current user's information are related operations in auth.py.

IntegrationEndpoint / Integrations

PATCH /sessions/{session_id}/model

PATCH /sessions/{session_id}/model switches the LLM backend for a session. Within the agent backend, sending messages and switching models operate on the same session resource.

Capability / Intent

Payment & Commitment Cluster

Payment & Commitment Cluster requires the Consolidation Mechanism to join payments and commitments data tables appropriately.
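As an illustration of the kind of join this implies, the sketch below totals payments per commitment. The table and column names are invented for the example, and the standard-library `sqlite3` module stands in for the project's actual database so the snippet runs without extra dependencies.

```python
import sqlite3

# In-memory database with two hypothetical tables.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE commitments (commitment_id TEXT PRIMARY KEY, committed REAL);
    CREATE TABLE payments (payment_id TEXT PRIMARY KEY, commitment_id TEXT, paid REAL);
    INSERT INTO commitments VALUES ('C1', 1000.0), ('C2', 500.0);
    INSERT INTO payments VALUES ('P1', 'C1', 400.0), ('P2', 'C1', 350.0);
    """
)

# LEFT JOIN keeps commitments that have no payments yet;
# COALESCE turns their missing total into 0.0.
rows = con.execute(
    """
    SELECT c.commitment_id, c.committed, COALESCE(SUM(p.paid), 0.0) AS paid
    FROM commitments AS c
    LEFT JOIN payments AS p ON p.commitment_id = c.commitment_id
    GROUP BY c.commitment_id, c.committed
    ORDER BY c.commitment_id
    """
).fetchall()

for row in rows:
    print(row)
```

A left join is used here so that a commitment with no payments still appears in the consolidated result; whether the real Consolidation Mechanism makes the same choice is not stated in the entry.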

Requirement / Intent

PDF Extraction

The PDF extractor component implements the PDF Extraction requirement, but support for large files is only partially tested.

Entity

PDF extractor

The DataLens Project includes a PDF extractor that extracts tables from PDFs and loads them into DuckDB. Its behavior on large documents has had only limited testing.

DataEntity / Data Model

PDF files

Phase 2 file types include PDF files. The Phase 2 Strategy Research & Decision Point treats processing 48 PDF files as a high-priority Phase 2 task. Opus 4.6 recommends extracting text from all 48 PDFs, with OCR only where relevant, citing medium-high ROI and an estimated effort of 2-3 hours. Opus 4.6 also states that DOCX files provide better ROI than PDFs, because DOCX files contain cleaner text with narrative context, while PDFs are noisier and require more variable effort. Extractor Agents process PDFs to extract tables.

BusinessProcess / Intent

PDF Infrastructure

The Cleveland Clinic E2E use case utilizes the PDF Infrastructure to handle extraction of PDF files. DataLens Development requires PDF Infrastructure capability to support lazy loading and on-demand table extraction from large PDFs. Lazy Loading Implementation is part of the PDF Infrastructure feature set to enhance user experience during large PDF uploads.
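One way to picture lazy loading with on-demand table extraction is a per-page cache: nothing is extracted at upload time, and a page's tables are extracted only when first requested. The class and function names below are hypothetical, and the extractor is passed in as a callable so the sketch does not depend on any particular PDF library.

```python
from typing import Callable, Dict, List

Table = List[List[str]]


class LazyPdfTables:
    """Extract tables page by page, only when a page is first requested."""

    def __init__(self, page_count: int, extract_page: Callable[[int], List[Table]]):
        self._page_count = page_count
        self._extract_page = extract_page
        self._cache: Dict[int, List[Table]] = {}

    def tables_for(self, page: int) -> List[Table]:
        if not 0 <= page < self._page_count:
            raise IndexError(f"page {page} out of range")
        if page not in self._cache:
            # First access: run the (potentially slow) extraction once.
            self._cache[page] = self._extract_page(page)
        return self._cache[page]

    @property
    def extracted_pages(self) -> List[int]:
        return sorted(self._cache)


# Stand-in extractor that records which pages were actually processed.
calls: List[int] = []


def fake_extract(page: int) -> List[Table]:
    calls.append(page)
    return [[["page", str(page)]]]


doc = LazyPdfTables(page_count=500, extract_page=fake_extract)
doc.tables_for(42)
doc.tables_for(42)  # served from the cache, no second extraction
print(calls)
```

With this shape, uploading a 500-page document costs nothing up front, and repeated requests for the same page do not trigger repeated extraction.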

ThirdPartyComponent / Architecture

pdfplumber

Pdfplumber uses python-magic for MIME type detection during PDF table extraction.

Vision / Intent

PermissionFilter

DataLens needs to integrate PermissionFilter mechanisms to enforce row-level security and per-user permission filtering in SQL, so that every query is filtered by the user's access, as Vanna does.
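A minimal sketch of per-user row filtering, assuming the filter works by wrapping a generated query in an outer query with a bound parameter. The function name, the `owner_id` column, and the wrapping strategy are illustrative assumptions; they describe neither the DataLens design nor Vanna's API. `sqlite3` is used only to keep the example runnable.

```python
import sqlite3


def apply_permission_filter(sql: str, column: str = "owner_id") -> str:
    """Wrap a query so that only rows matching a bound user value are returned.

    The inner query must expose the filter column. The user value is bound
    as a parameter rather than interpolated into the SQL text.
    """
    if not column.isidentifier():
        raise ValueError(f"unsafe column name: {column!r}")
    inner = sql.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS filtered WHERE filtered.{column} = ?"


con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE invoices (id INTEGER, owner_id TEXT, amount REAL);
    INSERT INTO invoices VALUES (1, 'alice', 10.0), (2, 'bob', 20.0), (3, 'alice', 30.0);
    """
)

filtered_sql = apply_permission_filter("SELECT id, owner_id, amount FROM invoices;")
visible = con.execute(filtered_sql, ("alice",)).fetchall()
print(visible)
```

Wrapping the query leaves the generated SQL untouched, which is simpler than rewriting it, but it only works when the inner query selects the column being filtered on.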

Capability / Intent

pg_data_service.py

The new pg_data_service.py file provides PostgreSQL data management functionality, replacing DuckDBService. The extract.py worker is modified to write extracted data through pg_data_service.py instead of DuckDBService, and question_router.py is updated to read extracted data through pg_data_service.py in place of DuckDBService.

AgentCommand / Agentic Discipline

pg_dump command

AgentCommand / Agentic Discipline

pg_isready command

ThirdPartyComponent / Architecture

PgDataService

PgDataService manages PostgreSQL data storage for DataLens, replacing DuckDBService. It handles schema creation, the table registry, data insertion, and querying, with support for project-specific schemas. It also stores text chunks and table metadata, and applies type conversions during data loading and query execution in PostgreSQL. PgDataService is also described as an alternative service that manages PostgreSQL connections, complementing DuckDBService, which manages DuckDB databases. DataLensPostgresRunner wraps PgDataService to enforce project-scoped schema isolation for SQL execution.
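Project-scoped schema isolation could look roughly like the sketch below: each project gets its own PostgreSQL schema, table names are validated before being quoted, and queries run with `search_path` pinned to that schema. The `project_<id>` naming scheme and all function names are assumptions for illustration, not the actual PgDataService or DataLensPostgresRunner interface. The code only builds SQL strings and does not connect to a database.

```python
import re
from typing import List

# Conservative identifier pattern for names that end up inside quoted SQL.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def project_schema(project_id: int) -> str:
    """Hypothetical naming scheme: one schema per project."""
    if project_id < 0:
        raise ValueError("project_id must be non-negative")
    return f"project_{project_id}"


def qualified_table(project_id: int, table: str) -> str:
    """Return a schema-qualified, quoted table name for one project."""
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return f'"{project_schema(project_id)}"."{table}"'


def scoped_statements(project_id: int, sql: str) -> List[str]:
    """Statements to run in order so unqualified names resolve in the project schema."""
    return [f'SET search_path TO "{project_schema(project_id)}"', sql]


print(qualified_table(4, "payments"))
print(scoped_statements(4, "SELECT * FROM payments"))
```

Pinning `search_path` means generated SQL can use unqualified table names while still being unable to reach another project's tables by accident; explicit cross-schema references would need a separate check.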

Requirement / Intent

Phase 1 Implementation

Phase 1 Implementation includes Schema Limiting, which reduces the maximum number of tables passed to the SQL generator to avoid Arctic context overflow. The Phase 2 Strategy Research & Decision Point depends on the status of Phase 1, which is currently frozen with Excel file processing only partially completed.
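Schema Limiting can be sketched as ranking tables by relevance to the question and passing only the top few to the SQL generator. The keyword-overlap scoring used here is an assumption chosen for simplicity; the entry above does not say how tables are selected, only that their number is capped.

```python
import re
from typing import Dict, List


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def limit_schema(question: str, tables: Dict[str, List[str]], max_tables: int) -> List[str]:
    """Keep at most max_tables tables, preferring those sharing words with the question."""
    words = _tokens(question)

    def score(name: str) -> int:
        return len(words & _tokens(" ".join([name, *tables[name]])))

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(tables, key=lambda name: (-score(name), name))
    return ranked[:max_tables]


# Hypothetical schema: table name -> column names.
schema = {
    "payments": ["payment_id", "amount", "paid_on"],
    "commitments": ["commitment_id", "amount"],
    "audit_log": ["event", "created_at"],
}

selected = limit_schema("Total payment amount per commitment", schema, max_tables=2)
print(selected)
```

Only the selected tables would then be rendered into the prompt, which bounds the prompt size regardless of how many tables the project contains.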

BusinessProcess / Intent

Phase 1 Setup

Epic / Intent

phase 1 to 7 Implementation steps

WorkPackage / Governance

Phase 1: Language column + LLM prompts

Add a language preference to user profiles and modify the LLM prompts so that responses are given in Danish or English accordingly.
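A minimal sketch of the prompt side of this work package, assuming the preference is stored as a two-letter code and appended to the system prompt as an instruction. The codes `da` and `en`, the English fallback, and the function name are assumptions for illustration.

```python
from typing import Optional

# Supported language codes mapped to the name used in the instruction.
LANGUAGE_NAMES = {"da": "Danish", "en": "English"}
DEFAULT_LANGUAGE = "en"  # Assumed fallback when no valid preference is stored.


def build_system_prompt(base_prompt: str, language: Optional[str]) -> str:
    """Append a response-language instruction based on the user's preference."""
    code = language if language in LANGUAGE_NAMES else DEFAULT_LANGUAGE
    return f"{base_prompt.rstrip()}\n\nAlways respond in {LANGUAGE_NAMES[code]}."


print(build_system_prompt("You are a data analyst.", "da"))
```

Keeping the instruction in one helper means every LLM call picks up the same wording, and an unknown or missing preference degrades to the default instead of failing.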

Server / Operations

Phase 2 file types

PHASE2_UNIFIED_STRATEGY.md defines the pipeline design and tool justifications for processing Phase 2 file types. PHASE2_IMPLEMENTATION_PLAN.md provides a go/no-go recommendation and effort analysis for processing them. Phase 2 file types include PDF, DOCX, PPTX, and MSG files. The Docling extraction system was built as part of the DataLens Phase 2 GPU-first document extraction system.

Requirement / Intent

Phase 2 Implementation

Phase 2 GPU-First Document Extraction involves GPU-first document extraction as its core capability.

BusinessProcess / Intent
