Project: datalens
81 entity types
Operations

97 entities found

ServerOperations

Phase 2 file types

PHASE2_UNIFIED_STRATEGY.md defines the pipeline design and tool justifications for processing Phase 2 file types. PHASE2_IMPLEMENTATION_PLAN.md provides the go/no-go recommendation and effort analysis for Phase 2 file type processing. Phase 2 file types include PDF, DOCX, PPTX, and MSG files. The Docling extraction system was built as part of the DataLens Phase 2 GPU-first document extraction system.
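The Phase 2 scope above can be sketched as a small router that sends only the four Phase 2 file types to the Docling converter. The helper names are illustrative assumptions; the Docling call follows its documented `DocumentConverter` interface, not necessarily the exact wiring in DataLens:

```python
from pathlib import Path

# Extensions taken from the Phase 2 scope in the plan documents.
PHASE2_EXTENSIONS = {".pdf", ".docx", ".pptx", ".msg"}

def is_phase2_file(path: str) -> bool:
    """Return True if the file is one of the Phase 2 document types."""
    return Path(path).suffix.lower() in PHASE2_EXTENSIONS

def extract_with_docling(path: str) -> str:
    """Convert a document to Markdown with Docling (GPU-capable when available)."""
    from docling.document_converter import DocumentConverter  # heavy import kept local
    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()
```

Files outside the Phase 2 set (e.g. Excel, handled in Phase 1) would bypass this path entirely.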

DeploymentProcedureOperations

Phase 2 Strategy Research & Decision Point

Phase 2 Strategy involves using Opus 4.6 to generate PHASE2_UNIFIED_STRATEGY.md and Sonnet 4.6 for PHASE2_IMPLEMENTATION_PLAN.md. The go/no-go decision depends on processing status and the Excel tables loaded during Phase 1, with Jesper responsible for approval. The backend services are currently running old code and await a code update for full functionality.

ServerOperations

postgres-qg4kk8ccggw844wscsossogs-093750206021

The postgres (Docker) service is defined in both docker-compose.yml and docker-compose.coolify.yml.

ServerOperations

postgres_data

Docker service with no exposed ports, used for PostgreSQL data storage. The backend service requires a volume for postgres_data to persist PostgreSQL database files. The postgres_data (Docker) service is defined in both docker-compose.yml and docker-compose.coolify.yml.

ServerOperations

PostgreSQL

DataLens migrated project data from DuckDB to PostgreSQL for enhanced concurrency; backend interactions, data storage, and metadata management rely on PostgreSQL, which supports multilingual features and hosts the project_14 schema with 72 tables. The Extraction Worker will write extracted data to PostgreSQL schemas instead of DuckDB. After migration, the Query Router will read extracted data from PostgreSQL schemas. Post-migration, Arctic SQL will generate SQL queries targeting PostgreSQL with minor translation. PgDataService manages extracted project data stored in PostgreSQL schemas. Each project schema, such as Project 14's schema, is a distinct namespace within PostgreSQL. Text chunks for all projects are stored in a unified table in PostgreSQL's public schema. The extracted_tables table tracks metadata about extracted tables per project inside PostgreSQL. The Phase 2 Strategy Research & Decision Point uses PostgreSQL for file storage and metadata management, as indicated by files stored in PostgreSQL during deployment tests. The backend server uses PostgreSQL for tracking file metadata, users, projects, and authentication. The PostgreSQL database stores the project_14 schema used for SVGV data storage and extraction results. The Data Discovery system integrates with the PostgreSQL database for storing and retrieving data. IronClaw agent tables are physical tables mapped within the PostgreSQL database for persistent storage of agent sessions and related data. The OpenClaw Skill API references the PostgreSQL database for metadata of the 132 budget files. The theo Backend accesses the PostgreSQL database to manage budget file metadata as part of the data platform. PostgreSQL catalogs metadata for SVGV files but does not contain extracted budget tables after a system restart. The PostgreSQL database stores user information for authentication and project tracking, along with project metadata such as org_id and created_by user.
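The schema-per-project layout described above can be sketched with two small helpers. Only the `project_14` naming pattern comes from the text; the helper functions themselves are illustrative:

```python
def project_schema(project_id: int) -> str:
    """Schema name for a project's namespace, e.g. project_14."""
    return f"project_{project_id}"

def qualified_table(project_id: int, table: str) -> str:
    """Fully qualified name for an extracted table inside its project schema."""
    return f'{project_schema(project_id)}."{table}"'
```

Because each schema is a distinct namespace, two projects can both hold a table named `budget` without collision.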

ServerOperations

PostgreSQL (metadata storage)

The POST /files/upload?project_id=4 endpoint stores uploaded files in the PostgreSQL database.
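A minimal client-side sketch of calling this endpoint. The base URL and the multipart field name `file` are assumptions; only the route and the `project_id` query parameter come from the text:

```python
def upload_url(base_url: str, project_id: int) -> str:
    """Build the upload route; project_id travels as a query parameter."""
    return f"{base_url}/files/upload?project_id={project_id}"

def upload_file(base_url: str, project_id: int, path: str) -> dict:
    """POST a file to the backend; requires the `requests` package (imported lazily)."""
    import requests
    with open(path, "rb") as fh:
        resp = requests.post(upload_url(base_url, project_id), files={"file": fh})
    resp.raise_for_status()
    return resp.json()
```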

ServerOperations

PostgreSQL 16 service

The backend service depends on the PostgreSQL 16 service as its metadata store, with the connection configured via an environment variable. The Backend API depends on the PostgreSQL 16 service for metadata storage. The production deployment on theo uses PostgreSQL 16 for metadata storage.
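The environment-variable wiring could look like the sketch below. The variable names and defaults are assumptions; the text only states that the connection is configured via environment variable:

```python
import os

def pg_dsn(env=os.environ) -> str:
    """Assemble a PostgreSQL DSN from environment variables (names assumed)."""
    host = env.get("POSTGRES_HOST", "postgres")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "datalens")
    user = env.get("POSTGRES_USER", "datalens")
    password = env.get("POSTGRES_PASSWORD", "")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

Taking `env` as a parameter keeps the function testable without mutating the real process environment.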

ServerOperations

PostgreSQL container

The Production Deployment on theo includes running the PostgreSQL container. The PostgreSQL container runs on the server named theo as part of the Coolify infrastructure.

ServerOperations

PostgreSQL database

PostgreSQL serves as the backend database platform, acting as the core store for metadata, user, and project data in DataLens. The PostgreSQL database stores FileUpload records, with the ai_summary column holding AI-generated summaries of files. Queries use the PostgreSQL database to persist query history and metadata about user questions and projects.

ServerOperations

PostgreSQL database system

InfrastructureSpecOperations

PostgreSQL project_14 schema

The SVGV Full Reset includes dropping and recreating the PostgreSQL project_14 schema. Arne Hauge accesses data stored in the PostgreSQL project_14 schema for budget analysis. The Data Discovery System requires the PostgreSQL project_14 schema as the primary data store for consolidated tables and query execution. The omkostningsberegning table is part of the PostgreSQL project_14 schema, storing extracted SVGV data.
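The drop-and-recreate step of the reset can be sketched as two SQL statements executed over a DB-API connection. The helper names are illustrative, not the actual reset script:

```python
def reset_statements(project_id: int) -> list:
    """SQL for the drop-and-recreate step of a project-schema reset."""
    schema = f"project_{project_id}"
    return [
        f'DROP SCHEMA IF EXISTS "{schema}" CASCADE',
        f'CREATE SCHEMA "{schema}"',
    ]

def reset_project_schema(conn, project_id: int) -> None:
    """Run the reset against a DB-API connection (e.g. psycopg); illustrative only."""
    with conn.cursor() as cur:
        for stmt in reset_statements(project_id):
            cur.execute(stmt)
    conn.commit()
```

`CASCADE` is what removes all dependent objects (the 72 tables, views, etc.) along with the schema itself.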

ServerOperations

process ID 2097134 (nohup bash)

EnvironmentOperations

Python

DataLens uses Python as part of its data analysis environment. The Architecture includes all Python and AI components running on elin.

ServerOperations

Python venv

The DataLens Project uses a Python virtual environment with dependencies like vanna, llama-index, duckdb, and pandas. The implementation requires Python environment setup with all dependencies.

ServerOperations

QDRANT_HOST environment variable

Set to 176.9.90.154 for vector DB connection.

EnvironmentOperations

QDRANT_PORT environment variable

Set to 6333 for Qdrant vector database access.
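The two Qdrant variables above can be consumed as follows; the defaults mirror the documented values, and the `qdrant-client` usage is a standard connection sketch rather than the DataLens code:

```python
import os

def qdrant_settings(env=os.environ) -> tuple:
    """Read the Qdrant connection from the environment; defaults mirror the text."""
    host = env.get("QDRANT_HOST", "176.9.90.154")
    port = int(env.get("QDRANT_PORT", "6333"))
    return host, port

def qdrant_client():
    """Build a client with qdrant-client (imported lazily, optional dependency)."""
    from qdrant_client import QdrantClient
    host, port = qdrant_settings()
    return QdrantClient(host=host, port=port)
```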

SLADefinitionOperations

question_router.py

question_router.py was updated to use pg_data_service.py (PgDataService) in place of DuckDBService for reading extracted data and for query operations. question_router.py uses translate_duckdb_to_pg to convert DuckDB-style SQL to PostgreSQL-compatible SQL before execution. The module depends on FindingsGenerator to generate findings from query results. The LLM Prompt Injection use case modifies prompts in question_router.py to include a language parameter for answer synthesis.
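A dialect translator like `translate_duckdb_to_pg` typically applies a table of rewrite rules. The rules below are hypothetical examples of minor DuckDB-to-PostgreSQL differences; the real function in question_router.py may cover a different set:

```python
import re

# Hypothetical rewrite rules; the actual translate_duckdb_to_pg may differ.
_RULES = [
    # DuckDB's list(...) aggregate maps to PostgreSQL's array_agg(...).
    (re.compile(r"\blist\(", re.IGNORECASE), "array_agg("),
    # DuckDB accepts // for integer division; PostgreSQL uses /.
    (re.compile(r"//"), "/"),
]

def translate_duckdb_to_pg(sql: str) -> str:
    """Apply each dialect rewrite in order and return PG-compatible SQL."""
    for pattern, replacement in _RULES:
        sql = pattern.sub(replacement, sql)
    return sql
```

Regex-based rewriting suits "minor translation" cases; anything structural (e.g. DuckDB's `GROUP BY ALL`) would need a real SQL parser.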

ServerOperations

Redis

Agent Gateway interacts with Redis as part of the backend ecosystem for caching and background jobs. The theo server uses Redis as a job queue to manage extraction and batch vectorize jobs for document chunks. The DataLens Platform integrates Redis 7 for future use with background jobs and caching. RQ workers depend on Redis for job queue management; the Backend uses Redis as the backing store for the RQ job queue system, and Redis provides the queue backend for the RQ worker in the extraction pipeline. The Data Discovery system uses Redis for job queue management and caching. The RQ job queue depends on the Redis service for managing asynchronous job scheduling and processing. The SSE progress endpoint integrates with Redis pub/sub for real-time progress streaming.
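Enqueueing a job onto the Redis-backed RQ queue follows RQ's standard API. The dotted module path `worker.extract_file_job` is an assumption (the text names the `extract_file_job` function but not its module), as is the Redis connection using defaults:

```python
def enqueue_extraction(file_id: int, queue_name: str = "extraction"):
    """Queue one extraction job; requires the redis and rq packages (imported lazily).

    The job function path is illustrative; only extract_file_job itself
    is named in the source material."""
    from redis import Redis
    from rq import Queue

    queue = Queue(queue_name, connection=Redis())
    return queue.enqueue("worker.extract_file_job", file_id)
```

RQ stores the serialized call in Redis; a worker listening on the same queue name picks it up asynchronously.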

ServerOperations

Redis (job queue)

ServerOperations

Redis 7 service

The backend service depends on the Redis 7 service for job queueing, with the connection configured via an environment variable. The Backend API depends on the Redis 7 service for job queue management. The production deployment on theo uses Redis 7 as the job queue infrastructure.

ServerOperations

redis_data

Docker container with no exposed ports, used for the Redis cache. The backend service requires a volume for redis_data to persist Redis data. The redis and redis_data (Docker) services are defined in both docker-compose.yml and docker-compose.coolify.yml.

EnvironmentOperations

Request Context

Provides contextual information for data analysis, including user identity and environment details.

ContingencyPlanOperations

Rollback Plan

ServerOperations

RQ worker

DataLens depends on the RQ worker to process queued extraction jobs for the SVGV dataset files. The RQ worker processes extraction jobs for the 132 SVGV dataset files. Redis functions as the queue backend supporting the RQ worker processing extraction jobs. The Backend API interacts with the RQ worker to manage extraction job queues and status for SVGV files. The extract_file_job function is executed by the RQ worker to process extraction of files. The worker container runs the RQ worker instance responsible for processing extraction jobs. The worker container is idle, waiting for extraction jobs in the RQ worker queue after the reset. The RQ worker is used by the FastAPI backend to handle asynchronous jobs such as embeddings and summary generation.

ServerOperations

RQ Worker

The worker process is correctly configured and processing extraction jobs; initial queueing issues were fixed with the proper function name, and it is now actively handling 132 files for re-extraction. RQ Worker extraction processing depends on Redis RQ job queuing for managing extraction jobs and uses Backend API endpoints to coordinate extraction tasks. The SVGV Full Reset process depends on RQ Worker extraction processing to handle extraction jobs after resetting files and schema; the RQ Worker processed the SVGV extraction jobs and is currently idle after completion. The Data Discovery system utilizes the RQ Worker to process background extraction and consolidation jobs asynchronously. The RQ Worker uses Docling as the exclusive extraction method for DOCX and PPTX files and fails hard if any Docling extraction error occurs, prohibiting fallback extraction methods. The Extraction Pipeline depends on the RQ Worker to process files asynchronously in the extraction queue, and the Backend depends on it for asynchronous tasks such as extraction and AI summary generation. The RQ worker listens to the RQ queue to process extraction jobs; the Worker container hosts the RQ worker process for asynchronous job processing. The extraction queue fix requires the RQ Worker to be running and active, enabling it to process all 132 extraction jobs successfully. RQ worker processes execute extraction jobs by calling the Extraction API endpoints for each SVGV file.

ServerOperations

RQ worker for async job processing

The RQ worker for async job processing consumes jobs from the RQ job queue to generate AI summaries and process embeddings asynchronously. The RQ worker depends on the RQ job queue to receive tasks for async AI summary generation and embedding processing. The DataLens platform uses the RQ worker for async job processing to handle background tasks for summaries and embeddings. The RQ worker for async job processing is part of the backend infrastructure of the DataLens platform executing asynchronous tasks. The deployment and availability of the RQ worker for async job processing depends on the Coolify deployment platform configuration and deployment status. RQ worker runs on theo server to process async jobs like embeddings and summaries in the DataLens platform. The Extraction Pipeline depends on the RQ Worker to process the extraction queue for files asynchronously. Backend uses the RQ Worker configured to listen on the 'extraction' queue for asynchronous extraction jobs.
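A worker bound to the 'extraction' queue, as described above, can be started either with the `rq worker extraction` CLI or programmatically. The sketch below uses RQ's documented `Worker` API with a default Redis connection, which is an assumption about the deployment:

```python
def run_worker(queue_name: str = "extraction") -> None:
    """Programmatic equivalent of `rq worker extraction`: blocks and processes jobs.

    Requires the redis and rq packages; imported lazily so the sketch
    stays importable without them."""
    from redis import Redis
    from rq import Queue, Worker

    conn = Redis()
    Worker([Queue(queue_name, connection=conn)], connection=conn).work()
```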

ServerOperations

RQ worker service on backend server theo

A background worker, currently in progress, that handles file extraction and processing tasks.

ServerOperations

RTX 4000 SFF Ada

DataLens performs GPU-accelerated workloads on elin, specifically utilizing the RTX 4000 SFF Ada GPU. GPU-first document extraction uses the RTX 4000 SFF Ada 20GB GPU on elin for document extraction and embeddings generation. Ollama runs on the elin RTX 4000 SFF Ada 20GB GPU to provide embedding services for document chunks. The GPU-first document extraction system uses the RTX 4000 GPU on the elin server for fast document extraction and vectorization. The DataLens agent requires a maximum GPU memory utilization of 0.5 on the RTX 4000 SFF Ada due to the shared-hardware constraint. GPU resource management policies require monitoring of the shared RTX 4000 SFF Ada 20GB GPU used for extraction and embedding tasks. Docling-based extraction leverages the RTX 4000 SFF Ada 20GB GPU available on elin for DOCX and PPTX file processing. Ollama embeddings run on the RTX 4000 SFF Ada 20GB GPU for batch processing of text chunks.
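The 0.5 memory cap on the shared 20 GB card can be reasoned about and, from Python, enforced as below. Using torch for enforcement is one possible mechanism, not necessarily what DataLens does:

```python
def gpu_memory_budget_gb(total_gb: float = 20.0, fraction: float = 0.5) -> float:
    """Memory budget under the shared-GPU constraint (20 GB card, 0.5 cap)."""
    return total_gb * fraction

def cap_gpu_memory(fraction: float = 0.5) -> None:
    """One way to enforce the cap; torch usage here is an assumption."""
    import torch  # optional dependency, imported lazily
    if torch.cuda.is_available():
        torch.cuda.set_per_process_memory_fraction(fraction, device=0)
```

With the cap in place, extraction and embedding processes each stay within roughly 10 GB, leaving headroom for the other tenant on the card.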

SLADefinitionOperations

SQLAgent

The plan includes SQLAgent, which uses Text-to-SQL capabilities to generate and execute SQL queries. SQLAgent integrates with Vanna.AI for SQL generation from natural language questions and executes queries using DuckDB as the analytics database. SQLAgent is represented as the sql agent entity. The DataLens Project uses the SQLAgent component to convert natural language to SQL queries. The plan includes the Text-to-SQL Agent integration with Vanna.AI for natural language to SQL translation. SQLAgent realizes the Text-to-SQL and Natural Language Querying capabilities by converting natural language questions into SQL executions. Lightweight Implementation uses SQLAgent for text-to-SQL functionality. DataLens Project includes SQLAgent which converts natural language to SQL using Ollama.
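The SQLAgent flow (question, then model-generated SQL, then execution on the analytics database) can be sketched generically. `generate_sql` stands in for the Vanna.AI/Ollama model call, and `con` is assumed to expose the DuckDB-style `execute(...).fetchall()` interface; neither is the actual DataLens implementation:

```python
def answer_question(question: str, con, generate_sql) -> list:
    """Sketch of the SQLAgent flow: natural language -> SQL -> rows.

    `generate_sql` is a stand-in for the text-to-SQL model call;
    `con` is any DB-API-style connection (DuckDB's matches this shape)."""
    sql = generate_sql(question)
    return con.execute(sql).fetchall()
```

Keeping the model call injectable makes it easy to swap Vanna.AI for a direct Ollama call without touching the execution path.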

ServerOperations

SQLCoder-7B

SQLCoder-7B is deployed on elin at port 11434, replacing qwen3-coder-next for faster Text-to-SQL, reducing inference time to 2-3 seconds. Generates valid DuckDB SQL, used by backend on theo, and supports full SVGV analysis post-deployment. Hosted as Arctic-Text2SQL-R1-7B with 7B parameters, optimized for GPU (16GB VRAM).