Architecture
232 entities found
frontend/Dockerfile
genai-prices
Google services
Google services are listed as third-party AI providers available for integration to support AI inference and model hosting within DataLens.
GPU acceleration
The deployment on theo lacks a local GPU, which constrains the availability of GPU-dependent AI features such as DS-STAR and Ollama inference.
GPU-first design
DataLens implements a GPU-first design: Ollama on the elin server handles embeddings and inference, while the theo server handles orchestration (FastAPI backend, RQ workers, and job queuing). GPU-first document extraction uses Docling for DOCX and PPTX extraction as a mandatory component with no fallback, running on elin's RTX 4000 SFF Ada 20 GB GPU alongside embeddings generation. The implementation is validated by the test suite 'test_docling_extractors.py'.
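The "mandatory component with no fallback" policy above can be sketched as a dispatch function: supported formats route to the Docling-based extractor, everything else fails fast. This is an illustrative stdlib-only sketch; `extract_document` and the `"docling:..."` placeholder return value are assumptions, not actual DataLens code.

```python
import pathlib

# Docling is mandatory for these formats; there is deliberately no fallback.
DOCLING_FORMATS = {".docx", ".pptx"}

def extract_document(path: str) -> str:
    """Route a document to the GPU-backed Docling extractor.

    Unsupported formats raise immediately instead of silently
    degrading to a lower-quality extractor.
    """
    suffix = pathlib.Path(path).suffix.lower()
    if suffix in DOCLING_FORMATS:
        # Placeholder for the real Docling extraction call on elin.
        return f"docling:{suffix}"
    raise ValueError(f"No extractor registered for {suffix!r} (no fallback by design)")
```

Failing fast here keeps extraction quality predictable: a broken format surfaces as a job error in the RQ queue rather than as silently degraded text.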
groq
Pydantic-ai-slim integrates with groq for AI model support.
HTTPS
httpx
httpx is used in tests covering the FastAPI app of the DataLens Platform, together with pytest and pytest-asyncio for async HTTP testing compatibility.
i18n/da.json translation file
Translation file for Danish UI strings, supporting multilingual UI.
i18n/en.json translation file
Translation file for English UI strings, part of internationalization setup.
IronClaw
DataLens Agent Mode uses IronClaw as the underlying AI agent framework to power autonomous data analysis sessions. IronClaw uses WASM sandboxing for tool isolation, relies on a TEE credential vault for security, and conforms to a GDPR-compatible security model to ensure data protection compliance. Agent Skills Integration renders each streamed finding as a separate IronClawMessage. Frontend components implement the user interface for the IronClaw agent feature, while backend endpoints, integrated with the main app via API router registration, expose the required API functionality. The agent depends on PostgreSQL to store sessions, messages, findings, and skill logs in dedicated agent session tables. OpenClaw was disqualified in favor of IronClaw due to security concerns preventing its use for personal data workloads.
IronClaw agent architecture
The IronClaw agent is fully implemented and deployed (4,600+ lines of code), accessible via UI buttons that initiate autonomous sessions. Features include a chat interface, GDPR detection, skill management, and backend integration, now online for project analysis.
IronClaw Service
The Agent Gateway module in FastAPI acts as a thin bridge: it depends on the IronClaw Service, over HTTP and WebSocket, for agent session management, reasoning loops, skill execution, and memory management. The IronClaw Service supports Ollama as a self-hosted private LLM provider and Anthropic Claude as a cloud LLM provider option; on elin, the service uses Claude for large language processing and SQL generation. Session memory, agent tables, and persistent state are stored in PostgreSQL, and the service requires the IronClaw database for session persistence and agent thread management. The theo backend server connects to the IronClaw service on elin via the IronClaw Gateway API, the component responsible for agent orchestration and Claude connectivity, to delegate agent message processing.
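The bridge role described above can be sketched as a small client the gateway holds: it forwards a message to the remote service and returns the reply, keeping reasoning and skill execution on the IronClaw side. The endpoint path, payload fields, and class name below are assumptions for illustration only; the HTTP call is injected so the sketch stays stdlib-only (production code would pass an httpx-based function).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class IronClawGatewayClient:
    base_url: str                      # e.g. the IronClaw service on elin
    post: Callable[[str, dict], dict]  # injected HTTP POST (httpx in production)

    def send_message(self, session_id: str, text: str) -> dict:
        # Delegate reasoning and skill execution to the remote service;
        # the gateway itself stays a thin bridge with no agent logic.
        url = f"{self.base_url}/sessions/{session_id}/messages"
        return self.post(url, {"text": text})
```

Injecting the transport also makes the gateway trivially testable with a fake `post`, without standing up the IronClaw service.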
ironclaw-full-test.png
Image demonstrating full testing scope and results of IronClaw features.
LangChain
DataLens needs to adopt LangChain or LiteLLM for provider abstraction, enabling flexibility in choosing LLM providers beyond Ollama. LangChain is used within the implementation plan for chaining LLM and agent workflows, and the system uses the LangChain framework alongside DuckDB for data pipeline management.
LangExtract
LangGraph
LiteLLM
DataLens needs to adopt LiteLLM or a similar abstraction (such as LangChain) for provider abstraction, enabling support for multiple LLM backends beyond Ollama.
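The provider abstraction these entries call for boils down to call sites depending on an interface rather than a concrete backend. In practice LiteLLM or LangChain would supply this layer; the minimal sketch below uses a `Protocol` with an illustrative stub backend (the class and model names are assumptions, not DataLens code).

```python
from typing import Protocol

class LLMProvider(Protocol):
    """What the pipeline needs from any backend: prompt in, text out."""
    def complete(self, prompt: str) -> str: ...

class OllamaProvider:
    """Illustrative stub; real code would POST to Ollama's HTTP API."""
    def __init__(self, model: str = "llama3") -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        # Placeholder response standing in for a real model call.
        return f"[{self.model}] {prompt}"

def run_pipeline(provider: LLMProvider, prompt: str) -> str:
    # Call sites depend only on the interface, so Ollama, Claude, or any
    # LiteLLM-routed backend can be swapped in without changing this code.
    return provider.complete(prompt)
```

Swapping in a cloud backend then means adding another class satisfying `LLMProvider`, with no changes to pipeline code.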
llm-judge pattern
Pattern in which an LLM acts as a judge, used for quality assessment and verification of outputs within the data analysis pipeline.
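A minimal sketch of the llm-judge pattern: a second model call grades a candidate answer against a rubric and the reply is parsed into a score. The prompt wording and `SCORE: <n>` format are assumptions for illustration; `judge` is injected so any LLM backend (or a stub in tests) can play the judge.

```python
import re
from typing import Callable

def judge_answer(question: str, answer: str,
                 judge: Callable[[str], str]) -> int:
    """Ask a judge model to grade an answer; return the parsed 1-5 score."""
    prompt = (
        "Rate the answer from 1 (wrong) to 5 (fully correct).\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Reply with 'SCORE: <n>'."
    )
    reply = judge(prompt)
    match = re.search(r"SCORE:\s*([1-5])", reply)
    if match is None:
        # Fail loudly on unparseable judge output rather than guessing.
        raise ValueError(f"Unparseable judge reply: {reply!r}")
    return int(match.group(1))
```

Constraining the judge to a fixed output format keeps the verification step machine-checkable, which is the point of the pattern.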
logfire-api
lucide-svelte
npm dependency: lucide-svelte@^0.575.0, used in the frontend; licensing and approval status unspecified.
lux-api
Lux-api depends on pandas for auto-visualization features.
Master branch
The commit identification and deployment process uses the master branch: the last known good commit will be reverted, pushed to master, and auto-deployed via Coolify to restore functionality.
Metric definitions
mistralai
Pydantic-ai-slim integrates with mistralai for AI model support.
mode-watcher
Third-party npm dependency: mode-watcher@^1.1.0, used in the frontend package.json; no license info available.
MSG extractor
The batch upload pipeline depends on new extractors, including the MSG extractor, which uses the extract-msg third-party library and background workers to process Outlook .msg message files.
nomic-embed-text
The embedding service uses the 768-dimensional nomic-embed-text model via Ollama for GPU-accelerated vector embedding; the async embedding queue uses the same model for batch GPU embedding of document chunks. Semantic Search and TableEmbeddingIndex use nomic-embed-text to embed table names and descriptions, improving table ranking in schema selection, and the RAG Agent employs it for document embeddings.
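Once tables and queries are embedded with nomic-embed-text, ranking for schema selection is typically cosine similarity over the 768-dimensional vectors. The sketch below shows that ranking step with stdlib math only; fetching the vectors themselves (via Ollama's embeddings API) is outside the sketch, and the two-dimensional test vectors are toy stand-ins.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_tables(query_vec: list[float],
                table_vecs: dict[str, list[float]]) -> list[str]:
    # Highest-similarity tables first, e.g. for schema selection.
    return sorted(table_vecs,
                  key=lambda t: cosine_similarity(query_vec, table_vecs[t]),
                  reverse=True)
```

At index scale, a vector store or PostgreSQL with pgvector would replace this linear scan, but the ranking criterion is the same.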
npm
DataLens's deployment workflow uses npm commands to run local frontend tests before pushing code.