Project: datalens
81 entity types

Architecture

232 entities found

TechConstraintArchitecture

Docker multi-stage build

The Frontend's build process is constrained to use a Docker multi-stage build for containerization.

TechConstraintArchitecture

Docker support

The Backend includes Docker support and is designed to be deployable with Docker as a technical constraint.

SystemBoundaryArchitecture

docker-compose.yml

Docker Deployment contains the docker-compose.yml file, which orchestrates the full stack. The Dockerfile is used together with docker-compose.yml for multi-container deployment. The following Docker services are defined in docker-compose.yml: postgres, redis, backend, worker, ironclaw, and frontend. postgres_data, redis_data, and backend_storage are likewise defined in docker-compose.yml.

ThirdPartyComponentArchitecture

Docling

GPU-first document extraction uses Docling for DOCX and PPTX files on the elin GPU via SSH. Docling provides semantic extraction and rich metadata, replacing the earlier fallback methods. Extraction is constrained by CUDA 12.8, and Docling is integrated as a mandatory component so that parsing stays high-quality without fallback. The integration includes PyTorch, transformers, and OCR support, and extraction quality is verified by tests. The DOCX extractor uses Docling on the elin GPU as a mandatory tool for semantic chunking with embedded JSON tables and rich metadata. The PPTX extractor likewise uses Docling on the elin GPU for slide-based semantic chunking with embedded tables, images, and optional speaker notes. Docling extraction for DOCX and PPTX files requires the GPU hardware on the elin server to accelerate extraction and embedding. The RQ Worker uses Docling as the exclusive extraction method for DOCX and PPTX files and fails the job if any Docling extraction error occurs; fallback extraction methods are prohibited. The Docling extractor depends on the Docling library as the primary extraction tool with no fallbacks, and it is deployed as part of the Backend extraction pipeline, which requires Docling as a mandatory dependency for high-quality DOCX/PPTX extraction.
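
The fail-without-fallback rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's worker code: extract_document, ExtractionError, DOCLING_ONLY_SUFFIXES, and the injected docling_extract callable are hypothetical names, and the real Docling call is deliberately left behind that callable.

```python
from pathlib import Path
from typing import Callable


class ExtractionError(RuntimeError):
    """Raised when the mandatory extractor fails; callers must not fall back."""


# Hypothetical set of suffixes that are routed to the mandatory extractor.
DOCLING_ONLY_SUFFIXES = {".docx", ".pptx"}


def extract_document(path: str, docling_extract: Callable[[str], str]) -> str:
    """Run the mandatory extractor and fail the job on any error.

    `docling_extract` is an assumed stand-in for the real Docling call.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in DOCLING_ONLY_SUFFIXES:
        raise ValueError(f"unsupported file type for this sketch: {suffix}")
    try:
        return docling_extract(path)
    except Exception as exc:
        # No alternative extractor is tried: the error is surfaced so the job fails.
        raise ExtractionError(f"Docling extraction failed for {path}") from exc
```

Injecting the extractor as a callable keeps the policy (fail, never fall back) testable without a GPU or the Docling library installed.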

ThirdPartyComponentArchitecture

docling>=2.0.0

Docling is the GPU-only extraction tool for DOCX and PPTX files. Version 2.0.0 is installed, and it is used in the GPU-first extraction pipeline with no fallback.

DesignDecisionArchitecture

docs/design/agent-mode-implementation-plan.md

Design document for the agent mode implementation; no additional details are provided.

DesignDecisionArchitecture

docs/design/agent-mode-ironclaw.md

Design document for the IronClaw-powered agent mode; no details are provided.

DesignDecisionArchitecture

docs/DISCOVERY_IMPLEMENTATION.md file

The DISCOVERY_IMPLEMENTATION.md document describes the technical approach and architecture of the data discovery module, including semantic table matching and the guided discovery process.

ThirdPartyComponentArchitecture

DOCX extractor

The batch upload pipeline depends on new extractors including the DOCX extractor. The DOCX extractor uses the python-docx third-party component. The DOCX extractor optionally uses Docling as an extraction method for better text quality and semantic structure. The DOCX extractor falls back to python-docx for faster extraction of simple documents. The DOCX extractor implements semantic chunking with section-based chunk boundaries to preserve document structure. The DOCX extractor handles tables by embedding them as JSON within text chunks instead of separate DuckDB tables. The DOCX extractor relies on background workers for asynchronous processing. The DOCX extractor capability is validated by the test_extractors test case.
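
Section-based chunking with tables embedded as JSON might look roughly like the sketch below. The block format (dicts with "kind", "text", and "rows" keys), the Chunk class, and chunk_by_section are assumptions made for illustration; the actual extractor's data model is not described in this entry.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Chunk:
    heading: str
    parts: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Heading first, then paragraphs and JSON-encoded tables in document order.
        return "\n".join([self.heading, *self.parts]).strip()


def chunk_by_section(blocks: list[dict]) -> list[Chunk]:
    """Start a new chunk at every heading; keep tables inline as JSON text."""
    chunks: list[Chunk] = []
    current = Chunk(heading="")
    for block in blocks:
        kind = block["kind"]
        if kind == "heading":
            # A heading closes the previous section, if it has any content.
            if current.heading or current.parts:
                chunks.append(current)
            current = Chunk(heading=block["text"])
        elif kind == "paragraph":
            current.parts.append(block["text"])
        elif kind == "table":
            # The table stays inside the text chunk instead of a separate store.
            current.parts.append(json.dumps({"table": block["rows"]}))
    if current.heading or current.parts:
        chunks.append(current)
    return chunks
```

Keeping each table inside the chunk of its section means a retrieved chunk carries both the narrative text and the data it refers to.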

DesignDecisionArchitecture

DS-STAR Agent API integration

Integrates DS-STAR's autonomous, iterative data cataloging agents (Planner, Verifier, Router, Orchestrator) to enable quality-verified, local LLM-powered extraction. This enhances automation, data quality, and privacy, replacing manual or cloud API solutions, aligned with platform goals of self-hosting and comprehensive data management.

ThirdPartyComponentArchitecture

DS-STAR agents (12 agents on elin)

The DS-STAR AI cataloging functionality uses the 12 DS-STAR agents running on elin.

DesignDecisionArchitecture

DS-STAR Architecture

DataLens Development implements the DS-STAR Architecture, adapted from Google's DS-STAR paper. This multi-agent architecture is used for autonomous cataloging and extraction.

DesignDecisionArchitecture

Dual database architecture

ThirdPartyComponentArchitecture

DuckDB SQL

The unified question interface routes structured queries to DuckDB SQL within DataLens: the question router classifies each question and sends structured queries to the DuckDB SQL backend. SQLCoder-7B generates valid DuckDB SQL for query execution. The text chunks table in DuckDB is a physical table with full-text search support; it supports DuckDB SQL queries and is mapped to the DuckDB SQL data store.

DesignDecisionArchitecture

E2E over features

Emphasizes delivering end-to-end (E2E) functionality over additional feature development to ensure core system robustness.

ThirdPartyComponentArchitecture

E2E_DISCOVERY_TESTS.md

A complete guide for running and understanding the Playwright E2E tests that validate the full DataDiscovery pipeline, from question input to consolidation, analysis, and UI responsiveness using the SVGV dataset.

ThirdPartyComponentArchitecture

email-validator

email-validator v2.2.0 is used; no additional details are given.

DesignDecisionArchitecture

Entity-Based Query Parsing

DesignDecisionArchitecture

EXECUTION_FLOW.md

Describes backend execution flow, highlighting Qdrant lazy initialization to improve startup times and request handling.
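
Lazy initialization of this kind can be sketched as follows. LazyClient and the factory argument are hypothetical; in the backend the factory would presumably construct the Qdrant client, but the sketch makes no assumption about how that client is built.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyClient(Generic[T]):
    """Defer building an expensive client until the first request needs it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory  # hypothetical: builds the real client
        self._client: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: the factory runs at most once,
        # even if several threads hit the first request together.
        if self._client is None:
            with self._lock:
                if self._client is None:
                    self._client = self._factory()
        return self._client
```

Because nothing is constructed at import or startup time, the application can begin serving requests before the vector database is reachable; the connection cost is paid on the first call to get().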

DesignDecisionArchitecture

Executor

ThirdPartyComponentArchitecture

extract-msg

The MSG extractor uses the extract-msg third-party component.

DesignDecisionArchitecture

Extraction Queue Fix

The extraction queue fix requires the RQ Worker to be running and active to process extraction jobs. With the fix in place, the RQ Worker processes all 132 extraction jobs successfully.

ThirdPartyComponentArchitecture

FastAPI

The Backend API layer is implemented with the FastAPI framework. The DataLens Platform backend is a FastAPI app that exposes APIs for authentication, projects, files, extraction, and analysis, and the backend process runs the FastAPI application along with its dependencies to serve API requests. Streaming responses via the /ask-stream endpoint are implemented with FastAPI to provide asynchronous request handling. The IronClaw Agent Feature uses FastAPI for its backend API implementation, including asynchronous generators. The Agent Gateway is implemented as a FastAPI module that bridges the frontend and the IronClaw Service. The Backend on theo is implemented with FastAPI for its web API and service operations. FastAPI depends on Uvicorn for serving the application and uses python-multipart for multipart form data parsing.
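
The asynchronous-generator pattern behind a streaming endpoint can be illustrated with the standard library alone. In FastAPI, a generator like this is typically wrapped in a StreamingResponse; the sketch stops short of that so it runs without FastAPI installed. The function names, the server-sent-events framing, and the "[DONE]" sentinel are assumptions, not details of the /ask-stream endpoint.

```python
import asyncio
from typing import AsyncIterator, Iterable


async def stream_answer(tokens: Iterable[str]) -> AsyncIterator[str]:
    """Yield answer fragments one at a time, framed as server-sent events."""
    for token in tokens:
        # Yield control to the event loop so other requests can progress.
        await asyncio.sleep(0)
        yield f"data: {token}\n\n"
    # Hypothetical end-of-stream marker for the client.
    yield "data: [DONE]\n\n"


async def collect(stream: AsyncIterator[str]) -> list[str]:
    """Drain a stream into a list; useful for tests and debugging."""
    return [chunk async for chunk in stream]


if __name__ == "__main__":
    for chunk in asyncio.run(collect(stream_answer(["Hello", "world"]))):
        print(chunk, end="")
```

The client sees each fragment as soon as it is yielded, rather than waiting for the full answer to be assembled.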

LayerArchitecture

FastAPI backend

Handles API requests, manages business logic, connects to data services, and serves the frontend for DataLens, using FastAPI routes and dependencies as its interface. The SvelteKit frontend communicates with the FastAPI backend via API endpoints. The FastAPI backend uses SQLAlchemy ORM for data access and a PostgreSQL database for multi-tenant metadata storage, and it integrates with DuckDB for storing and querying extracted data. It uses the DS-STAR FileAnalyzer for AI cataloging of uploaded files, which automatically discovers table structures on file upload, and it exposes a Text-to-SQL query API for natural language queries. The FastAPI backend runs on the theo server, where it orchestrates file extraction, query execution, and database storage and uses SQLCoder-7B for query processing. The RQ worker handles asynchronous jobs for the FastAPI backend, such as embeddings and summary generation. Project 13 uses the FastAPI backend for file extraction and DuckDB storage as part of its data platform. The FastAPI backend provides a production-ready Dockerfile for deployment. The FastAPI app passes all 13 tests; pytest executes the tests that cover the FastAPI app components, and httpx is used in those tests.

ArchitecturalViewArchitecture

FastAPI Swagger

ThirdPartyComponentArchitecture

FileAnalyzer

FileAnalyzer is a core component of the DS-STAR pipeline and is included in the DataLens DS-STAR Implementation Plan. FileAnalyzer produces the data catalogs used in the extraction planning process; the Planner Agent takes the data catalog generated by FileAnalyzer as input.

ThirdPartyComponentArchitecture

findings_generator.py

The Backend includes the findings_generator.py service, which analyzes query results and generates findings. It realizes the Full Findings Visualization Layer capability by producing structured Finding objects, including metrics, trends, and outliers. The question_router.py module depends on the FindingsGenerator to generate findings from query results, and the FindingsGenerator includes findings in the API response for all structured queries.
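
To make the idea of structured findings concrete, here is a small sketch that derives a metric, a trend, and outliers from one numeric result column. The Finding fields, generate_findings, and the z-score threshold are illustrative assumptions; the actual Finding schema in findings_generator.py is not given in this entry.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Finding:
    kind: str    # "metric", "trend", or "outlier"
    label: str
    value: float


def generate_findings(values: list[float], z_threshold: float = 2.0) -> list[Finding]:
    """Derive simple findings from a single numeric column of query results."""
    if not values:
        return []
    avg = mean(values)
    findings = [Finding("metric", "mean", avg)]
    if len(values) >= 2:
        # Crude trend: difference between the last and first value.
        findings.append(Finding("trend", "change_first_to_last", values[-1] - values[0]))
    spread = pstdev(values)
    if spread > 0:
        for index, value in enumerate(values):
            # Flag values more than z_threshold standard deviations from the mean.
            if abs(value - avg) / spread > z_threshold:
                findings.append(Finding("outlier", f"row_{index}", value))
    return findings
```

Returning plain dataclass instances keeps the findings easy to serialize into an API response alongside the query results.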

LayerArchitecture

FINDINGS_LAYER_SUMMARY.md

TechConstraintArchitecture

Firewall

The Firewall configuration was updated to open port 6333 to allow the Backend server (theo) to access the Qdrant database on elin.
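
Whether the opened port is actually reachable from the Backend server can be checked with a plain TCP connection attempt. The helper below is a hypothetical sketch: port_is_open is not part of the project, and the host name passed to it is whatever address elin resolves to in the deployment.

```python
import socket


def port_is_open(host: str, port: int = 6333, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A True result only shows that something accepts TCP connections on the port; it does not confirm that the listener is Qdrant.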

ThirdPartyComponentArchitecture

Flask