Integrations
preview endpoint
The /api/v1/discovery endpoints include the preview endpoint.
Project 13
Project 13 uses the FastAPI backend for file extraction and DuckDB storage as part of its data platform.
project scope
The ProjectCreate Pydantic model is modified to make the scope field required, with word-count validation. The catalog.py:_generate_ai_summary function is modified to replace hardcoded budget-analysis text with the project's scope in prompts. File Summary Generation uses the project's scope instead of a hardcoded string to contextualize the summaries. The Project Creation UI includes a textarea for the scope (project goal) with word-count validation and generation support. The Project Dashboard displays the project's scope as the Project Goal. Analysis Recommendations generation uses the project scope and file catalog to suggest specific analysis questions. The Backend API for Projects validates the scope field with a minimum 20-word requirement. Frontend UI components implement the interface elements for editing and displaying the scope as the Project Goal.
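The required scope field with its 20-word minimum could be sketched as follows, assuming Pydantic v2; the `name` field is an illustrative assumption, only `scope` and the word-count rule come from the document.

```python
# Sketch of the ProjectCreate model with a required scope and a minimum
# 20-word count, per the validation described above. Assumes Pydantic v2;
# the `name` field is an illustrative assumption.
from pydantic import BaseModel, field_validator

class ProjectCreate(BaseModel):
    name: str
    scope: str  # project goal, used to contextualize AI summaries

    @field_validator("scope")
    @classmethod
    def scope_min_words(cls, v: str) -> str:
        if len(v.split()) < 20:
            raise ValueError("scope must contain at least 20 words")
        return v
```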
Projects API Endpoints
DataLens Platform supports CRUD functionality for managing projects. The Backend API for Projects validates the scope field with a minimum 20-word count requirement.
PUT /{project_id}
Updates a project; the response status code is unspecified.
Qdrant
Qdrant is employed as a vector database for document semantic search in DataLens, storing embeddings of text chunks. It facilitates similarity search and retrieval for RAG functions, integrating with the platform's API and backend for fast document similarity ranking, supporting document retrieval and hybrid search capabilities, and is hosted on elin at port 6333. Qdrant indexes embeddings generated from text chunks stored in DuckDB, enabling semantic search in the platform. Semantic Search using nomic-embed-text relies on Qdrant vector database for vector storage and search. RAGAgent stores embeddings in the Qdrant vector database for retrieval.
QdrantService class
The QdrantService class is defined in backend/app/services/qdrant_service.py. It connects to and manages a Qdrant vector database instance hosted at 176.9.90.154, and calls the Ollama embedding API to transform texts into the vector embeddings used for semantic search; its search method performs the vector queries. QuestionRouter uses QdrantService for semantic search and for retrieving relevant document chunks when processing textual or hybrid queries, and now initializes it lazily instead of at startup, so requests can be handled immediately without waiting on Qdrant health status. TableIndexService uses QdrantService to build semantic search indices for database tables.
QUESTION ROUTER
The backend's QuestionRouter in backend/app/services/question_router.py classifies questions and routes queries to the DuckDB SQL backend or Qdrant semantic search. It lazily initializes QdrantService to avoid startup delays, managing structured, textual, and hybrid query paths, and works with TextToSQLService and DuckDBService for SQL generation and execution, handling query classification and routing within the API layer. QdrantService initialization was changed to lazy loading in the QuestionRouter to prevent startup timeouts; the router depends on Qdrant but now falls back gracefully if Qdrant is unavailable. Textual queries are routed to the Qdrant semantic search service. The /ask endpoint uses the QuestionRouter to classify and route user questions, and the routing capability is validated by the test_question_router test case.
QuestionRouter class
The QuestionRouter class is defined within backend/app/services/question_router.py. The QuestionRouter class uses the TEXT-TO-SQL Service for SQL query generation. The QuestionRouter class uses the DUCKDB Service for executing SQL queries and retrieving data. The QuestionRouter class uses the QDRANT SERVICE for semantic search over vector data. The QuestionRouter class is used by the API LAYER including the backend/app/api/analysis.py endpoint for processing queries. QuestionRouter.route() is the main method within QuestionRouter that orchestrates query execution. The route method is part of the QuestionRouter class.
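The routing described in these entries could be sketched as below. The structured/textual/hybrid split, the lazy QdrantService initialization, and the graceful fallback come from the document; the classification heuristic, service interfaces, and method signatures are illustrative assumptions.

```python
# Hypothetical sketch of QuestionRouter.route(). The keyword heuristic and the
# injected service interfaces are assumptions; only the routing structure,
# lazy Qdrant init, and fallback behavior come from the document.
class QuestionRouter:
    def __init__(self, sql_service, duckdb_service, qdrant_factory):
        self._sql = sql_service          # Text-to-SQL generation
        self._duckdb = duckdb_service    # SQL execution
        self._qdrant_factory = qdrant_factory
        self._qdrant = None              # deferred: avoids startup delays

    def _get_qdrant(self):
        if self._qdrant is None:
            self._qdrant = self._qdrant_factory()  # lazy initialization
        return self._qdrant

    def classify(self, question: str) -> str:
        # Toy heuristic standing in for the real classifier.
        q = question.lower()
        if any(w in q for w in ("sum", "count", "average", "total")):
            return "structured"
        return "textual"

    def route(self, question: str):
        if self.classify(question) == "structured":
            sql = self._sql.generate(question)
            return self._duckdb.execute(sql)
        try:
            return self._get_qdrant().search(question)
        except Exception:
            # Graceful degradation when Qdrant is unavailable.
            return {"answer": None, "note": "semantic search unavailable"}
```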
QuestionRouter integration
QuestionRouter integration works with FindingsGenerator logic in the agent architecture to process and generate findings. IronClaw Agent depends on QuestionRouter.route() to route queries and execute them correctly as part of the async processing pipeline.
qwen3-coder-next
The RAG Agent generates answers using the qwen3-coder-next LLM via Ollama; the Ollama API provides access to the model. qwen3-coder-next was deployed and running on the elin server before SQLCoder-7B. SQLCoder-7B replaced qwen3-coder-next as the default model for Text-to-SQL queries, improving inference time from 40-50s to 2-3s per query, while qwen3-coder-next may still be used for summary generation tasks. It also remains a fallback model for AI summary generation when SQLCoder-7B fails in that role.
Redis 7
The Backend API uses Redis 7 as its job queue and caching service; the Platform Backend depends on it for future background job management and caching, although that is not yet implemented. The Redis 7 server is integrated within the Coolify deployment environment for DataLens Development.
Redshift
RQ job queue
The AI Summary Generation feature uses the RQ job queue for asynchronous summary generation after extraction completes. The queue is utilized and managed by backend/app/api/files.py for async AI summary generation processing, and depends on the Redis service for managing asynchronous job scheduling and processing.
RQ job queue for async summary generation
RQ Queue
Redis provides the backend queue for the RQ worker in the extraction pipeline. The RQ worker listens to the RQ queue to process extraction jobs.
RQ Queue with Redis backend
Uses Redis as the backend for managing background jobs such as file extraction, summaries, and vectorization in DataLens. The extraction pipeline depends on the RQ queue for batch extraction job management; the RQ Worker consumes extraction jobs from it, with Redis RQ job queuing managing the jobs. The RQ extraction queue on Redis triggers the extract_file_job(file_id) function to process file extraction jobs for summaries. The Docker Compose configuration depends on this queue for job processing, and was at one point constrained by a misconfiguration of it that caused job processing issues; the Backend container uses the queue for managing extraction jobs. The Batch Processing Strategy uses the RQ job queue for job management and reliability, with batch extraction orchestrated by the Backend.
RQ Worker service
The RQ Worker service depends on the service definition in docker-compose.coolify.yml for backend container deployment and job processing.
Schema API endpoints
API endpoints for schema detection and mapping are planned to enable optional, AI-assisted schema assignment, existing as part of development improvements.
SCP
Data files are transferred to elin GPU server using SCP before Docling extraction runs remotely.
search
Performs vector similarity search in the project's Qdrant collection. The /api/v1/discovery endpoints include the search endpoint.
Skill API
The ops user owns and runs the Skill API on the agent server, handling ringfenced SQL execution isolated from the OpenClaw agent service.
Snowflake
Arctic-Text2SQL-R1-7B is a production model developed by Snowflake.
SQL Server
SQLite
SSE streaming
DataLens needs to implement SSE streaming to provide a streaming UI with progress updates and partial results for a better user experience, similar to Vanna. The SSE progress endpoint implements real-time streaming via Redis pub/sub mechanisms.
SSH
GPU-first document extraction integrates with Docling on elin GPU via SSH and SCP for remote execution and file transfer.
STORAGE_ROOT environment variable
Configured as /app/storage for local data storage. The Backend API requires a file upload directory configured by the STORAGE_ROOT environment variable.
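Resolving the upload directory from this variable could look like the sketch below; the /app/storage value comes from the document, while the default-fallback behavior and per-project layout are illustrative assumptions.

```python
# Hedged sketch of resolving the upload directory from STORAGE_ROOT.
# The fallback default and the per-project subdirectory layout are assumptions.
import os
from pathlib import Path

STORAGE_ROOT = Path(os.environ.get("STORAGE_ROOT", "/app/storage"))

def upload_path(project_id: int, filename: str) -> Path:
    # e.g. <STORAGE_ROOT>/13/report.xlsx
    return STORAGE_ROOT / str(project_id) / filename
```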
Streaming SSE Response
Vanna 2.0 provides streaming server-sent events responses with progress updates and structured UI components, streaming structured response objects like tables and charts. Timeout issues with long-running queries are mitigated by streaming responses via the /ask-stream endpoint, which realizes the Text-to-SQL analysis query endpoint by streaming query results to prevent HTTP client timeouts. The /ask-stream streaming responses are implemented using the FastAPI framework for asynchronous request handling. The SSE progress endpoint integrates with Redis pub/sub for real-time progress streaming.
Supabase
The Backend integrates with Supabase as its project management system.