Architecture
Project services
Authentication services cooperate with Project services in the backend, and Project services cooperate with Database services in the backend.
project_data schema
ProjectCreate pydantic model
The scope field is required in the ProjectCreate pydantic model as per design; the model was modified to make scope required, with validation enforcing a minimum word count.
ProjectResponse
The scope field is included in the ProjectResponse to ensure it is always present.
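A minimal sketch of what these two models could look like, assuming pydantic v2; the additional fields and the 10-word minimum are illustrative, since the actual threshold is not specified above:

```python
from pydantic import BaseModel, field_validator

MIN_SCOPE_WORDS = 10  # illustrative; the real minimum is not given above


class ProjectCreate(BaseModel):
    name: str
    scope: str  # required, per the design note above

    @field_validator("scope")
    @classmethod
    def scope_min_word_count(cls, value: str) -> str:
        # Enforce the minimum word count on the scope field.
        if len(value.split()) < MIN_SCOPE_WORDS:
            raise ValueError(f"scope must contain at least {MIN_SCOPE_WORDS} words")
        return value


class ProjectResponse(BaseModel):
    id: int  # illustrative field
    name: str
    scope: str  # included so scope is always present in responses
```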
Projects/svgv-budget-analysis folder
Provider Abstraction
Defines a unified interface over multiple data sources and APIs, enabling flexible, scalable integration of diverse data inputs and provider systems while keeping the platform extensible and maintainable.
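A minimal sketch of what such a unified interface could look like; the class and method names are assumptions for illustration, not the actual DataLens API:

```python
from abc import ABC, abstractmethod
from typing import Any


class DataProvider(ABC):
    """Unified interface that every data source / API adapter implements."""

    @abstractmethod
    def connect(self) -> None:
        """Open a connection or session to the underlying source."""

    @abstractmethod
    def fetch(self, query: str) -> list[dict[str, Any]]:
        """Run a query against the source and return rows as dicts."""


class PostgresProvider(DataProvider):
    """One concrete adapter; new providers plug in the same way."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn

    def connect(self) -> None:
        ...  # e.g. open a psycopg2 connection here

    def fetch(self, query: str) -> list[dict[str, Any]]:
        ...  # execute the query and normalize rows to dicts
```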
psycopg2-binary
SQLAlchemy depends on psycopg2-binary as a PostgreSQL database driver.
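In practice the pairing shows up in the SQLAlchemy connection URL; a minimal sketch with a placeholder DSN (user, password, and database name are not from the source):

```python
from sqlalchemy import create_engine, text

# "postgresql+psycopg2" tells SQLAlchemy to use the psycopg2 driver,
# which psycopg2-binary ships as a prebuilt wheel.
engine = create_engine("postgresql+psycopg2://user:password@localhost/datalens")

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))  # smoke-test the connection
```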
pydantic-ai-slim
Pydantic-ai-slim integrates with anthropic, mistralai, and groq for AI model support.
pydantic-graph
PyPDF2
pytest
DataLens uses pytest to run local tests before deployment: the deployment workflow requires that pytest tests pass locally before code is pushed for backend testing. pytest executes the tests covering the FastAPI app components of the DataLens Platform, with httpx used alongside pytest and pytest-asyncio for async HTTP testing compatibility.
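A minimal sketch of that httpx + pytest-asyncio pattern; the import path and endpoint are assumptions, not taken from the DataLens codebase:

```python
import pytest
from httpx import ASGITransport, AsyncClient

from app.main import app  # hypothetical import path for the FastAPI app


@pytest.mark.asyncio
async def test_health_endpoint():
    # Drive the FastAPI app in-process over ASGI; no server required.
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")  # hypothetical endpoint
    assert response.status_code == 200
```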
pytest-asyncio
Python client
python-docx
The DOCX extractor used python-docx, a third-party component, as a fallback extraction method when Docling extraction was not enabled or failed, preferring it for faster extraction of simple documents. Because this fallback conflicted with the Docling-only extraction strategy, python-docx was removed in favor of Docling.
python-jose
The Auth system in the DataLens Platform uses python-jose as a dependency for security or token management.
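A minimal sketch of the kind of token handling python-jose enables; the secret, algorithm, and claims here are placeholders, not DataLens configuration:

```python
from datetime import datetime, timedelta, timezone

from jose import jwt

SECRET_KEY = "change-me"  # placeholder, not a real secret
ALGORITHM = "HS256"


def create_access_token(subject: str) -> str:
    # Sign a short-lived token carrying the user identity.
    claims = {
        "sub": subject,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
    return jwt.encode(claims, SECRET_KEY, algorithm=ALGORITHM)


def verify_token(token: str) -> dict:
    # Raises jose.JWTError if the signature or expiry is invalid.
    return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
```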
python-magic
Pdfplumber uses python-magic for MIME type detection during PDF table extraction.
python-multipart
Version 0.0.9, used for handling multipart form data in Python. The file upload feature in the DataLens Platform uses python-multipart to handle multipart form data uploads, and FastAPI relies on it for multipart form data parsing.
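A minimal sketch of the FastAPI side of this: declaring an UploadFile parameter is what makes python-multipart a required runtime dependency. The route path is illustrative, not necessarily DataLens's:

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI()


@app.post("/upload")  # illustrative route
async def upload_document(file: UploadFile = File(...)):
    # FastAPI delegates multipart/form-data parsing to python-multipart.
    contents = await file.read()
    return {"filename": file.filename, "size_bytes": len(contents)}
```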
python-pptx
The PPTX extractor implementation was based on python-pptx, a third-party component, for slide and text extraction with semantic chunking, and used it to extract slide-based chunks during DataLens Phase 2. Because a python-pptx fallback conflicted with Docling extraction and fallbacks are disallowed, python-pptx was removed.
PyTorch
Used as a dependency in the GPU extraction process, supporting Docling on the elin GPU for document parsing. Docling's dependencies include PyTorch, transformers, and OCR support.
Qdrant semantic search
Question router routes textual queries to Qdrant semantic search service.
qdrant-client
Qdrant-client uses requests for HTTP communications. Vanna depends on qdrant-client for vector database integration.
Query Enhancer
Phase B (Intelligent Retrieval) implements the Query Enhancer, which extracts entities from queries and identifies the tables relevant to answering them.
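A minimal sketch of the shape such a component could take; the names, heuristics, and return structure are assumptions, not the Phase B implementation (which would plausibly use an LLM or NER model for both steps):

```python
from dataclasses import dataclass


@dataclass
class EnhancedQuery:
    original: str
    entities: list[str]           # entities extracted from the question
    candidate_tables: list[str]   # tables judged relevant to answer it


def enhance_query(question: str, schema_tables: list[str]) -> EnhancedQuery:
    # Naive stand-ins: treat capitalized tokens as entities and keep
    # tables whose names appear verbatim in the question.
    entities = [tok for tok in question.split() if tok[:1].isupper()]
    tables = [t for t in schema_tables if t.lower() in question.lower()]
    return EnhancedQuery(question, entities, tables)
```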
Query history
Query history contains Query records representing individual answered questions stored in the database. The Frontend uses the Query History entity for the analysis view.
query tracking middleware
DataLens requires the addition of query tracking middleware that tracks user queries and implements audit logging of queries and users for regulatory compliance.
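A minimal sketch of such middleware for a FastAPI app, assuming stdout logging stands in for a real audit sink; all names are illustrative, not the DataLens implementation:

```python
import time

from fastapi import FastAPI, Request

app = FastAPI()


@app.middleware("http")
async def track_queries(request: Request, call_next):
    start = time.monotonic()
    response = await call_next(request)
    # Record who asked what, the outcome, and how long it took,
    # so compliance audits can reconstruct query activity.
    print(
        "audit:",
        request.client.host if request.client else "unknown",
        request.method,
        request.url.path,
        response.status_code,
        f"{time.monotonic() - start:.3f}s",
    )
    return response
```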
Qwen model
The DataLens team started with a 100K/200K token budget, consumed 147K tokens, and continuously evaluated usage versus quality between the Qwen and Sonnet models, ultimately keeping the Sonnet model for its quality.
Qwen2.5-Coder-14B-AWQ
vLLM model deployed on elin with 14B parameters, optimized for GPU inference and using 4-bit (AWQ) quantization to fit in 10GB VRAM. Qwen2.5-Coder-14B-AWQ is the model the vLLM component deploys under the DataLens DS-STAR Implementation Plan.
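A plausible way to load such a model with vLLM's Python API; the Hugging Face repo id and memory settings are assumptions, not the elin deployment configuration:

```python
from vllm import LLM, SamplingParams

# AWQ 4-bit quantization is what lets a 14B model fit in ~10GB VRAM.
llm = LLM(
    model="Qwen/Qwen2.5-Coder-14B-Instruct-AWQ",  # assumed repo id
    quantization="awq",
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["-- SQL: list all tables"], params)
print(outputs[0].outputs[0].text)
```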
Qwen3
The Multi-Stage Text-to-SQL Architecture uses Qwen3 for schema comprehension and complex SQL tasks: complex query handling, table/schema selection, and answer synthesis. TableReranker uses Qwen3 to re-rank candidate tables and select the most relevant ones for query answering, and AnswerSynthesizer leverages Qwen3 to synthesize human-readable answers in Danish or English from SQL query results. The Data Discovery architecture pairs Qwen3, with its large context window, for table selection with Arctic, with a smaller context, for SQL generation; the Data Discovery System requires Qwen3's table selection capability to identify relevant tables before SQL generation, and the Discovery Service uses the Qwen3 LLM for table selection in the intelligent table discovery process. The Qwen3 response format required a fix to the SQL extraction regex: SQL code blocks could close without a newline before the backticks, so the query was not properly captured. The Live Backend also processes Qwen3 multi-block XML responses for SQL extraction.
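A minimal sketch of the kind of regex fix described above, extracting SQL from fenced code blocks even when no newline precedes the closing backticks; the pattern is illustrative, not the exact DataLens regex:

```python
import re


def extract_sql(response: str) -> str | None:
    # re.DOTALL lets the capture span multiple lines; \s* (rather than a
    # mandatory \n) tolerates ``` coming right after the last SQL token.
    match = re.search(r"```(?:sql)?\s*(.*?)\s*```", response, re.DOTALL)
    return match.group(1) if match else None


print(extract_sql("Here you go: ```sql\nSELECT 1;```"))  # -> SELECT 1;
```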
qwen3-coder-next 80B
RAPIDS cuDF
The plan optionally uses RAPIDS cuDF to accelerate large dataframe computations on the elin GPU, alongside DuckDB.
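A minimal sketch of the drop-in, pandas-like usage pattern cuDF offers; the file name and column names are placeholders, not DataLens data:

```python
import cudf  # pandas-like API executed on the GPU

# Load and aggregate a large dataframe on the GPU instead of the CPU.
df = cudf.read_parquet("large_dataset.parquet")  # placeholder file
summary = df.groupby("category")["amount"].sum()  # placeholder columns
print(summary.head())
```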