Architecture
Project services
Authentication services cooperate with Project services in the backend, and Project services cooperate with Database services in the backend.
project_data schema
ProjectCreate pydantic model
The scope field is required in the ProjectCreate pydantic model as per design; the model was modified to make scope required, with validation enforcing a minimum word count.
ProjectResponse
The scope field is included in the ProjectResponse to ensure it is always present.
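A minimal sketch of what these two models could look like, assuming pydantic v2; the additional fields and the 10-word minimum are illustrative, since the actual threshold is not specified above:

```python
from pydantic import BaseModel, field_validator

MIN_SCOPE_WORDS = 10  # illustrative; the real minimum is not given above


class ProjectCreate(BaseModel):
    name: str
    scope: str  # required, per the design note above

    @field_validator("scope")
    @classmethod
    def scope_min_word_count(cls, value: str) -> str:
        # Enforce the minimum word count on the scope field.
        if len(value.split()) < MIN_SCOPE_WORDS:
            raise ValueError(f"scope must contain at least {MIN_SCOPE_WORDS} words")
        return value


class ProjectResponse(BaseModel):
    id: int  # illustrative field
    name: str
    scope: str  # included so scope is always present in responses
```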
Projects/svgv-budget-analysis folder
Provider Abstraction
Defines a unified interface over multiple data sources and APIs, enabling flexible, scalable integration of diverse data inputs and provider systems while keeping the platform extensible and maintainable.
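A minimal sketch of what such a unified interface could look like; the class and method names are assumptions for illustration, not the actual DataLens API:

```python
from abc import ABC, abstractmethod
from typing import Any


class DataProvider(ABC):
    """Unified interface that every data source / API adapter implements."""

    @abstractmethod
    def connect(self) -> None:
        """Open a connection or session to the underlying source."""

    @abstractmethod
    def fetch(self, query: str) -> list[dict[str, Any]]:
        """Run a query against the source and return rows as dicts."""


class PostgresProvider(DataProvider):
    """One concrete adapter; new providers plug in the same way."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn

    def connect(self) -> None:
        ...  # e.g. open a psycopg2 connection here

    def fetch(self, query: str) -> list[dict[str, Any]]:
        ...  # execute the query and normalize rows to dicts
```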
psycopg2-binary
SQLAlchemy depends on psycopg2-binary as a PostgreSQL database driver.
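In practice the pairing shows up in the SQLAlchemy connection URL; a minimal sketch with a placeholder DSN (user, password, and database name are not from the source):

```python
from sqlalchemy import create_engine, text

# "postgresql+psycopg2" tells SQLAlchemy to use the psycopg2 driver,
# which psycopg2-binary ships as a prebuilt wheel.
engine = create_engine("postgresql+psycopg2://user:password@localhost/datalens")

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))  # smoke-test the connection
```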
pydantic-ai-slim
Pydantic-ai-slim integrates with anthropic, mistralai, and groq for AI model support.
pydantic-graph
PyPDF2
pytest
DataLens uses pytest to run local tests before deployment: the deployment workflow requires that pytest tests pass locally before code is pushed for backend testing. pytest executes the tests covering the FastAPI app components of the DataLens Platform, with httpx used alongside pytest and pytest-asyncio for async HTTP testing compatibility.
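A minimal sketch of that httpx + pytest-asyncio pattern; the import path and endpoint are assumptions, not taken from the DataLens codebase:

```python
import pytest
from httpx import ASGITransport, AsyncClient

from app.main import app  # hypothetical import path for the FastAPI app


@pytest.mark.asyncio
async def test_health_endpoint():
    # Drive the FastAPI app in-process over ASGI; no server required.
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")  # hypothetical endpoint
    assert response.status_code == 200
```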
pytest-asyncio
Python client
python-docx
The DOCX extractor used python-docx, a third-party component, as a fallback extraction method when Docling extraction was not enabled or failed, preferring it for faster extraction of simple documents. Because this fallback conflicted with the Docling-only extraction strategy, python-docx was removed in favor of Docling.
python-jose
The Auth system in the DataLens Platform uses python-jose as a dependency for security or token management.
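A minimal sketch of the kind of token handling python-jose enables; the secret, algorithm, and claims here are placeholders, not DataLens configuration:

```python
from datetime import datetime, timedelta, timezone

from jose import jwt

SECRET_KEY = "change-me"  # placeholder, not a real secret
ALGORITHM = "HS256"


def create_access_token(subject: str) -> str:
    # Sign a short-lived token carrying the user identity.
    claims = {
        "sub": subject,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
    return jwt.encode(claims, SECRET_KEY, algorithm=ALGORITHM)


def verify_token(token: str) -> dict:
    # Raises jose.JWTError if the signature or expiry is invalid.
    return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
```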
python-magic
Pdfplumber uses python-magic for MIME type detection during PDF table extraction.
python-multipart
Version 0.0.9, used for handling multipart form data in Python. The file upload feature in the DataLens Platform uses python-multipart to handle multipart form data uploads, and FastAPI relies on it for multipart form data parsing.
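A minimal sketch of the FastAPI side of this: declaring an UploadFile parameter is what makes python-multipart a required runtime dependency. The route path is illustrative, not necessarily DataLens's:

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI()


@app.post("/upload")  # illustrative route
async def upload_document(file: UploadFile = File(...)):
    # FastAPI delegates multipart/form-data parsing to python-multipart.
    contents = await file.read()
    return {"filename": file.filename, "size_bytes": len(contents)}
```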
python-pptx
The PPTX extractor implementation was based on python-pptx, a third-party component, for slide and text extraction with semantic chunking, and used it to extract slide-based chunks during DataLens Phase 2. Because a python-pptx fallback conflicted with Docling extraction and fallbacks are disallowed, python-pptx was removed.
PyTorch
Used as a dependency in the GPU extraction process, supporting Docling on the elin GPU for document parsing. Docling's dependencies include PyTorch, transformers, and OCR support.
Qdrant semantic search
Question router routes textual queries to Qdrant semantic search service.
qdrant-client
Qdrant-client uses requests for HTTP communications. Vanna depends on qdrant-client for vector database integration.
Query Enhancer
Phase B (Intelligent Retrieval) implements the Query Enhancer, which extracts entities from queries and identifies the tables relevant to answering them.
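A minimal sketch of the shape such a component could take; the names, heuristics, and return structure are assumptions, not the Phase B implementation (which would plausibly use an LLM or NER model for both steps):

```python
from dataclasses import dataclass


@dataclass
class EnhancedQuery:
    original: str
    entities: list[str]           # entities extracted from the question
    candidate_tables: list[str]   # tables judged relevant to answer it


def enhance_query(question: str, schema_tables: list[str]) -> EnhancedQuery:
    # Naive stand-ins: treat capitalized tokens as entities and keep
    # tables whose names appear verbatim in the question.
    entities = [tok for tok in question.split() if tok[:1].isupper()]
    tables = [t for t in schema_tables if t.lower() in question.lower()]
    return EnhancedQuery(question, entities, tables)
```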
Query history
Query history contains Query records representing individual answered questions stored in the database. The Frontend uses the Query History entity for the analysis view.
query tracking middleware
DataLens requires the addition of query tracking middleware that tracks user queries and implements audit logging of queries and users for regulatory compliance.
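A minimal sketch of such middleware for a FastAPI app, assuming stdout logging stands in for a real audit sink; all names are illustrative, not the DataLens implementation:

```python
import time

from fastapi import FastAPI, Request

app = FastAPI()


@app.middleware("http")
async def track_queries(request: Request, call_next):
    start = time.monotonic()
    response = await call_next(request)
    # Record who asked what, the outcome, and how long it took,
    # so compliance audits can reconstruct query activity.
    print(
        "audit:",
        request.client.host if request.client else "unknown",
        request.method,
        request.url.path,
        response.status_code,
        f"{time.monotonic() - start:.3f}s",
    )
    return response
```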
Qwen model
The DataLens team started with a 100K/200K token budget, consumed 147K tokens, and continuously evaluated usage versus quality between the Qwen and Sonnet models, ultimately keeping the Sonnet model for its quality.
Qwen2.5-Coder-14B-AWQ
vLLM model deployed on elin with 14B parameters, optimized for GPU inference and using 4-bit (AWQ) quantization to fit in 10GB VRAM. Qwen2.5-Coder-14B-AWQ is the model the vLLM component deploys under the DataLens DS-STAR Implementation Plan.
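A plausible way to load such a model with vLLM's Python API; the Hugging Face repo id and memory settings are assumptions, not the elin deployment configuration:

```python
from vllm import LLM, SamplingParams

# AWQ 4-bit quantization is what lets a 14B model fit in ~10GB VRAM.
llm = LLM(
    model="Qwen/Qwen2.5-Coder-14B-Instruct-AWQ",  # assumed repo id
    quantization="awq",
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["-- SQL: list all tables"], params)
print(outputs[0].outputs[0].text)
```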
Qwen3
The Multi-Stage Text-to-SQL Architecture uses Qwen3 for schema comprehension and complex SQL tasks: complex query handling, table/schema selection, and answer synthesis. TableReranker uses Qwen3 to re-rank candidate tables and select the most relevant ones for query answering, and AnswerSynthesizer leverages Qwen3 to synthesize human-readable answers in Danish or English from SQL query results. The Data Discovery architecture pairs Qwen3, with its large context window, for table selection with Arctic, with a smaller context, for SQL generation; the Data Discovery System requires Qwen3's table selection capability to identify relevant tables before SQL generation, and the Discovery Service uses the Qwen3 LLM for table selection in the intelligent table discovery process. The Qwen3 response format required a fix to the SQL extraction regex: SQL code blocks could close without a newline before the backticks, so the query was not properly captured. The Live Backend also processes Qwen3 multi-block XML responses for SQL extraction.
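A minimal sketch of the kind of regex fix described above, extracting SQL from fenced code blocks even when no newline precedes the closing backticks; the pattern is illustrative, not the exact DataLens regex:

```python
import re


def extract_sql(response: str) -> str | None:
    # re.DOTALL lets the capture span multiple lines; \s* (rather than a
    # mandatory \n) tolerates ``` coming right after the last SQL token.
    match = re.search(r"```(?:sql)?\s*(.*?)\s*```", response, re.DOTALL)
    return match.group(1) if match else None


print(extract_sql("Here you go: ```sql\nSELECT 1;```"))  # -> SELECT 1;
```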
qwen3-coder-next 80B
RAPIDS cuDF
The plan optionally uses RAPIDS cuDF to accelerate large dataframe computations on the elin GPU, alongside DuckDB.
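A minimal sketch of the drop-in, pandas-like usage pattern cuDF offers; the file name and column names are placeholders, not DataLens data:

```python
import cudf  # pandas-like API executed on the GPU

# Load and aggregate a large dataframe on the GPU instead of the CPU.
df = cudf.read_parquet("large_dataset.parquet")  # placeholder file
summary = df.groupby("category")["amount"].sum()  # placeholder columns
print(summary.head())
```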