Project: datalens
81 entity types
Matrix/Architecture

Architecture

232 entities found

SystemBoundaryArchitecture

frontend/Dockerfile

ThirdPartyComponentArchitecture

genai-prices

ThirdPartyComponentArchitecture

Google

Google services are listed as third-party AI providers available for integration to support AI inference and model hosting within DataLens.

TechConstraintArchitecture

GPU acceleration

The deployment on theo has no local GPU, which constrains the availability of AI features such as DS-STAR and Ollama inference.

DesignDecisionArchitecture

GPU-first design

DataLens implements a GPU-first design, leveraging Ollama on elin for embeddings and inference. GPU-first document extraction uses Docling for DOCX and PPTX extraction as a mandatory component with no fallback, the theo server for orchestration (FastAPI backend, RQ workers, and job queuing), and the RTX 4000 SFF Ada 20GB GPU on elin for document extraction and embeddings generation. The implementation is validated by the test suite 'test_docling_extractors.py'.

ThirdPartyComponentArchitecture

groq

Pydantic-ai-slim integrates with groq for AI model support.

TechConstraintArchitecture

HTTPS

ThirdPartyComponentArchitecture

httpx

httpx is used in tests covering the FastAPI app implementation of the DataLens Platform, together with pytest and pytest-asyncio for async HTTP testing compatibility.

ThirdPartyComponentArchitecture

i18n/da.json translation file

Translation file for Danish UI strings, supporting multilingual UI.

ThirdPartyComponentArchitecture

i18n/en.json translation file

Translation file for English UI strings, part of internationalization setup.

ThirdPartyComponentArchitecture

IronClaw

DataLens Agent Mode uses IronClaw as the underlying AI agent framework for autonomous data analysis sessions. IronClaw relies on WASM sandboxing for tool isolation, uses a TEE credential vault for security, and conforms to a GDPR-compatible security model to ensure data protection compliance. Agent Skills Integration renders each streamed finding as a separate IronClawMessage. The IronClaw agent frontend components implement the user interface for the feature, and the backend endpoints expose the required API functionality; the back-end logic is integrated with the main app via API router registration. The agent's data and sessions are stored in dedicated agent session database tables, with PostgreSQL holding sessions, messages, findings, and skill logs. OpenClaw was disqualified in favor of IronClaw due to security concerns preventing its use for personal data workloads.

ArchitecturalViewArchitecture

IronClaw agent architecture

A full implementation of the IronClaw agent (4,600+ lines of code) is deployed and accessible via UI buttons that initiate autonomous sessions. Features include a chat interface, GDPR detection, skill management, and backend integration, now online for project analysis.

ThirdPartyComponentArchitecture

IronClaw Service

The Agent Gateway module in FastAPI acts as a bridge to the IronClaw Service, which handles agent session management, reasoning loops, skill execution, and memory management via HTTP and WebSocket communication. The IronClaw Service supports Ollama as a self-hosted private model inference provider and Anthropic Claude as the cloud LLM provider option; on elin, the Claude model is used for large language processing and SQL generation. The service stores session memory, agent tables, and persistent state in PostgreSQL, and requires the IronClaw database for session persistence and agent thread management. The theo backend server connects to the IronClaw service running on elin via the IronClaw Gateway API to delegate agent message processing; the IronClaw Gateway component is responsible for agent orchestration and Claude connectivity.
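A sketch of the theo-to-elin delegation described above. IronClaw is project-internal, so the gateway URL, route, and payload shape here are entirely hypothetical; only the pattern (backend forwards the user message plus its session id, gateway runs the reasoning loop) comes from the description.

```python
import json

# Hypothetical address and route for the IronClaw Gateway on elin
IRONCLAW_GATEWAY = "http://elin:8080/api/agent/messages"

def delegation_payload(session_id: str, text: str) -> bytes:
    # Assumed request shape: the theo backend forwards the user message with
    # its session id; the gateway orchestrates the agent and streams findings
    # back over WebSocket.
    return json.dumps({"session_id": session_id, "message": text}).encode("utf-8")
```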

SystemBoundaryArchitecture

ironclaw-full-test.png

Image demonstrating full testing scope and results of IronClaw features.

ThirdPartyComponentArchitecture

LangChain

DataLens needs to adopt LangChain or LiteLLM to gain flexibility in choosing LLM providers beyond Ollama. LangChain is used within the implementation plan for chaining LLM and agent workflows, and the system uses the LangChain framework alongside DuckDB for data pipeline management.

ThirdPartyComponentArchitecture

LangExtract

ThirdPartyComponentArchitecture

LangGraph

ThirdPartyComponentArchitecture

LiteLLM

DataLens needs to adopt LiteLLM (or a similar abstraction such as LangChain) for provider abstraction, enabling support for multiple LLM backends and flexibility in LLM choice.
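A sketch of the provider-abstraction idea under LiteLLM, where the provider is encoded in the model string so switching backends is a configuration change. The provider map and model names are hypothetical defaults, not DataLens configuration.

```python
PROVIDER_MODELS = {
    # Hypothetical defaults; LiteLLM model strings carry the provider prefix
    "ollama": "ollama/llama3",
    "anthropic": "anthropic/claude-3-5-sonnet-20241022",
}

def build_completion_call(provider: str, prompt: str) -> dict:
    # With LiteLLM installed, these kwargs feed one uniform call regardless
    # of backend: litellm.completion(**build_completion_call("ollama", "hi"))
    if provider not in PROVIDER_MODELS:
        raise ValueError(f"unknown LLM provider: {provider}")
    return {
        "model": PROVIDER_MODELS[provider],
        "messages": [{"role": "user", "content": prompt}],
    }
```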

ThirdPartyComponentArchitecture

llm-judge pattern

Pattern used for quality assessment and verification within the data analysis pipeline.

ThirdPartyComponentArchitecture

logfire-api

ThirdPartyComponentArchitecture

lucide-svelte

npm dependency lucide-svelte@^0.575.0, used in the frontend; licensing and approval status unspecified.

ThirdPartyComponentArchitecture

lux-api

Lux-api depends on pandas for auto-visualization features.

LayerArchitecture

Master branch

The commit identification and deployment process uses the master branch for pushing the reverted commit: the faulty change will be reverted to the last known good commit, pushed to the master branch, and auto-deployed via Coolify as the next step to restore function.

ThirdPartyComponentArchitecture

Metric definitions

ThirdPartyComponentArchitecture

mistralai

Pydantic-ai-slim integrates with mistralai for AI model support.

ThirdPartyComponentArchitecture

mode-watcher

Third-party npm dependency mode-watcher@^1.1.0, used in the frontend package.json; no license info available.

ThirdPartyComponentArchitecture

MSG extractor

The batch upload pipeline depends on new extractors, including the MSG extractor, which uses the extract-msg third-party component and processes message files via background workers.
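A worker-side sketch of the extraction step, assuming the extract-msg library (module `extract_msg`), whose `Message` objects expose `subject`, `sender`, and `body` attributes. The `MsgRecord` normalizer is hypothetical, not DataLens code; `FakeMsg` stands in for a parsed .msg file.

```python
from dataclasses import dataclass

@dataclass
class MsgRecord:
    subject: str
    sender: str
    text: str

def normalize_msg(msg) -> MsgRecord:
    # Accepts extract_msg.Message (msg = extract_msg.Message(path)) or any
    # object exposing the same attributes; None fields become empty strings.
    return MsgRecord(
        subject=(msg.subject or "").strip(),
        sender=(msg.sender or "").strip(),
        text=(msg.body or "").strip(),
    )

@dataclass
class FakeMsg:
    # Stand-in for a parsed Outlook .msg file
    subject: str = "Quarterly report "
    sender: str = "analyst@example.com"
    body: str = "See attached.\n"
```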

ThirdPartyComponentArchitecture

nomic-embed-text

The embedding service uses the nomic-embed-text 768-dimensional model via Ollama for GPU-accelerated vector embedding, and the async embedding queue uses it for batch GPU embedding processing of document chunks. The Semantic Search capability and TableEmbeddingIndex use nomic-embed-text to embed table names and descriptions, improving table ranking in schema selection, and the RAG Agent employs the same model for creating document embeddings.
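A sketch of the embedding flow under stated assumptions: the request body follows Ollama's documented `/api/embeddings` endpoint, the elin hostname and default port 11434 are assumed, and the cosine helper shows how 768-dim vectors would be compared for table ranking.

```python
import math

# Assumed host; 11434 is Ollama's default port
OLLAMA_URL = "http://elin:11434/api/embeddings"

def embed_request(text: str) -> dict:
    # Body for POST /api/embeddings; the response carries a 768-dim vector
    return {"model": "nomic-embed-text", "prompt": text}

def cosine(a, b) -> float:
    # Similarity used to rank tables/chunks by embedding closeness
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```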

ThirdPartyComponentArchitecture

npm

DataLens uses npm to run local frontend tests as part of its deployment workflow, before pushing code.

ThirdPartyComponentArchitecture

Object storage (S3/MinIO)