Project: datalens
81 entity types

All Domains

1587 entities found

IntegrationEndpoint / Integrations

PUT /{project_id}

Updates the integration endpoint for a project; the response status is unspecified.

ThirdPartyComponent / Architecture

pydantic-ai-slim

Pydantic-ai-slim integrates with anthropic, mistralai, and groq for AI model support.

ThirdPartyComponent / Architecture

pydantic-graph

ThirdPartyComponent / Architecture

PyPDF2

ThirdPartyComponent / Architecture

pytest

DataLens uses pytest to run local tests before deployment: the deployment workflow requires that pytest tests pass locally before code is pushed for backend testing. pytest executes the tests covering the FastAPI app components of the DataLens Platform, with httpx used alongside pytest and pytest-asyncio for async HTTP testing compatibility.

ThirdPartyComponent / Architecture

pytest-asyncio

Environment / Operations

Python

DataLens uses Python as part of its data analysis environment. The architecture runs all Python and AI components on elin.

ThirdPartyComponent / Architecture

Python client

Requirement / Intent

Python environment setup

GPU Infrastructure requires Python environment setup with all dependencies.

PhysicalTable / Data Model

Python script question_router.py

Modified to reduce max_tables from 5 to 2, improve schema selection, and handle context window issues, boosting success rate in query generation.
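
A minimal sketch of the max_tables change described above (the `MAX_TABLES` constant and `select_tables` helper are illustrative names, not the actual question_router.py code):

```python
MAX_TABLES = 2  # reduced from 5 so the schema prompt stays within the model's context window


def select_tables(ranked_tables: list[str], max_tables: int = MAX_TABLES) -> list[str]:
    """Keep only the top-ranked tables when building the schema for SQL generation."""
    return ranked_tables[:max_tables]
```

Passing fewer, better-ranked schemas to the model is what addresses the context-window issues and lifts the query-generation success rate.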

Server / Operations

Python venv

The DataLens project uses a Python virtual environment with dependencies such as vanna, llama-index, duckdb, and pandas; the implementation requires this environment to be set up with all dependencies installed.

ThirdPartyComponent / Architecture

python-docx

The DOCX extractor originally used python-docx as a fallback extraction method when Docling extraction was not enabled or failed, falling back to it for faster extraction of simple documents. Because this fallback conflicts with the Docling-only extraction strategy, python-docx was removed in favor of Docling alone.
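
The Docling-first, python-docx-fallback behaviour described above (before the fallback was removed) can be sketched generically. The dispatcher and the extractor names below are illustrative, not the DataLens implementation:

```python
from typing import Callable, Sequence


def extract_with_fallback(path: str,
                          extractors: Sequence[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try each (name, extractor) in order; return the first successful (name, text)."""
    errors = []
    for name, extract in extractors:
        try:
            return name, extract(path)
        except Exception as exc:  # a failing or disabled extractor triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


def docx_fallback(path: str) -> str:
    # python-docx reads paragraph text directly, which is fast for simple documents.
    from docx import Document
    return "\n".join(p.text for p in Document(path).paragraphs)
```

Removing python-docx amounts to passing a single-element extractor list, so a Docling failure surfaces as an error instead of silently degrading extraction quality.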

ThirdPartyComponent / Architecture

python-jose

The Auth system in the DataLens Platform uses python-jose as a dependency for security or token management.

ThirdPartyComponent / Architecture

python-magic

Pdfplumber uses python-magic for MIME type detection during PDF table extraction.

ThirdPartyComponent / Architecture

python-multipart

Version 0.0.9; handles multipart form data in Python. The file upload feature in the DataLens Platform uses python-multipart to handle multipart form-data uploads, and FastAPI relies on it for multipart form-data parsing.

ThirdPartyComponent / Architecture

python-pptx

The PPTX extractor was originally based on python-pptx for slide and text extraction with semantic chunking, and used it to extract slide-based chunks during DataLens Phase 2. Because Docling extraction conflicts with a python-pptx fallback and fallbacks are disallowed, python-pptx was removed.

ThirdPartyComponent / Architecture

PyTorch

Used as a dependency in the GPU extraction process, supporting Docling on the elin GPU for document parsing. Docling's dependencies include PyTorch, transformers, and OCR support.

Page / User Interface

q2-sql-capture.png

Screenshot capturing SQL queries or data during quarter 2 analysis.

ExternalSystem / Integrations

Qdrant

Qdrant is employed as a vector database for document semantic search in DataLens, storing embeddings of text chunks. It supports similarity search and retrieval for RAG functions, integrates with the platform's API and backend for fast document similarity ranking, and provides document retrieval and hybrid search; the service is hosted on elin at port 6333. Qdrant indexes embeddings generated from text chunks stored in DuckDB, enabling semantic search in the platform. Semantic search using nomic-embed-text relies on the Qdrant vector database for vector storage and search, and RAGAgent stores its embeddings in Qdrant for retrieval.

ThirdPartyComponent / Architecture

Qdrant semantic search

Question router routes textual queries to Qdrant semantic search service.
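
The routing decision described above can be sketched as a simple classifier; the keyword heuristic and the route labels below are illustrative, not the actual QuestionRouter logic:

```python
# Hypothetical aggregate-style cues that suggest a SQL query rather than text search.
SQL_KEYWORDS = {"sum", "count", "average", "total", "how many", "per month"}


def route_question(question: str) -> str:
    """Return 'sql' for aggregate-style questions, else 'semantic' for Qdrant search."""
    q = question.lower()
    if any(kw in q for kw in SQL_KEYWORDS):
        return "sql"
    return "semantic"
```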

PhysicalTable / Data Model

Qdrant vector search service

The Qdrant vector search service uses Ollama embeddings for generating vector representations of data. The Qdrant vector index depends on the DuckDB database for text chunk storage and embedding data source in the DataLens platform.

DataEntity / Data Model

Qdrant vectors

Qdrant vectors store the vector embeddings generated by Ollama embeddings from Docling extracted chunks for semantic search.

ThirdPartyComponent / Architecture

qdrant-client

Qdrant-client uses requests for HTTP communications. Vanna depends on qdrant-client for vector database integration.

Server / Operations

QDRANT_HOST environment variable

Set to 176.9.90.154 for vector DB connection.

Environment / Operations

QDRANT_PORT environment variable

Set to 6333 for Qdrant vector database access.
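
A typical pattern for consuming the two variables above; the defaults match the documented values, while the `QDRANT_URL` helper is an illustrative addition:

```python
import os

# Fall back to the documented values when the variables are not set.
QDRANT_HOST = os.getenv("QDRANT_HOST", "176.9.90.154")
QDRANT_PORT = int(os.getenv("QDRANT_PORT", "6333"))

QDRANT_URL = f"http://{QDRANT_HOST}:{QDRANT_PORT}"
```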

BusinessProcess / Intent

QdrantService

EmbeddingService produces embeddings used by QdrantService for semantic search and vector collections. QdrantService supports TableIndexService by providing vector collections for semantic table search indexes. DataLensAgentMemory is backed by QdrantService to provide vector-based agent memory. QuestionRouter uses QdrantService for semantic vector search; QdrantService initialization in the QuestionRouter was changed to lazy loading to prevent startup timeouts. QdrantService is defined in backend/app/services/qdrant_service.py and uses the Ollama Embedding Service to create vector embeddings. The search method is part of QdrantService.

IntegrationEndpoint / Integrations

QdrantService class

The QdrantService class is defined in backend/app/services/qdrant_service.py. QuestionRouter uses QdrantService for semantic search and for retrieving relevant document chunks when processing textual or hybrid queries, but it now initializes QdrantService lazily instead of at startup, avoiding startup delays and allowing immediate request handling without waiting for Qdrant health status. QdrantService calls the Ollama embedding API to transform texts into the vector embeddings used for semantic search, and it connects to and manages a Qdrant vector database instance hosted at 176.9.90.154. TableIndexService uses QdrantService to build semantic search indices for database tables. The search method is part of QdrantService.
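
The lazy-loading change described above can be sketched as deferring construction until first use. The class bodies below are illustrative stand-ins, not the DataLens code:

```python
class QdrantService:
    def __init__(self):
        # Stands in for the real service, whose constructor opens a network
        # connection to the Qdrant server and can block at startup.
        self.connected = True

    def search(self, query: str) -> list[str]:
        return [f"chunk matching {query!r}"]


class QuestionRouter:
    def __init__(self):
        self._qdrant = None  # nothing constructed at startup, so requests serve immediately

    @property
    def qdrant(self) -> QdrantService:
        if self._qdrant is None:  # the first access pays the connection cost
            self._qdrant = QdrantService()
        return self._qdrant
```

The router becomes usable as soon as it is constructed; only the first query that actually needs semantic search waits on the Qdrant connection.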

PhysicalTable / Data Model

Query

Query uses the PostgreSQL database to persist query history and metadata about user questions and projects. The Query physical table records queries executed on projects as part of data analysis; projects track the queries table, which records SQL queries run against project data. The queries table includes a project_id column linking each query to a specific project (a reference to the Project entity) and a user_id column associating each query with a user (a reference to the User entity). Query history contains Query records representing individual answered questions stored in the database. The Query physical table generates the Insight physical table, which contains analytical insights from executed queries.
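
The relationships above imply a schema along these lines; sqlite3 is used here as a stand-in for the PostgreSQL database, and the columns beyond project_id and user_id are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users    (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE queries (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),  -- links a query to its project
    user_id    INTEGER NOT NULL REFERENCES users(id),     -- associates a query with a user
    question   TEXT,
    sql_text   TEXT
);
""")
con.execute("INSERT INTO projects VALUES (1, 'datalens')")
con.execute("INSERT INTO users VALUES (1, 'analyst@example.com')")
con.execute("INSERT INTO queries VALUES (1, 1, 1, 'Total revenue?', 'SELECT SUM(amount) FROM sales')")

# Query history joins back to the project and user through the foreign keys.
row = con.execute(
    "SELECT p.name, u.email FROM queries q "
    "JOIN projects p ON q.project_id = p.id "
    "JOIN users u ON q.user_id = u.id"
).fetchone()
```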

Capability / Intent

Query classification

DesignDecision / Architecture

Query Enhancer

Phase B Intelligent Retrieval implements the Query Enhancer for entity extraction and relevant table identification for queries.