PUT /{project_id}
Endpoint for updating a project; its implementation status is unspecified.
pydantic-ai-slim
Pydantic-ai-slim integrates with anthropic, mistralai, and groq for AI model support.
pydantic-graph
PyPDF2
pytest
DataLens uses pytest to run local tests before deployment; the deployment workflow requires that pytest tests pass locally before code is pushed for backend testing. pytest executes the tests covering the FastAPI app components of the DataLens Platform, and httpx is used together with pytest and pytest-asyncio for async HTTP testing compatibility.
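The pytest-asyncio combination above can be sketched without the real dependencies. The coroutine below is a hypothetical stand-in for an httpx call against the FastAPI app, and the explicit asyncio.run is exactly the event-loop plumbing pytest-asyncio would normally handle for an `async def` test:

```python
import asyncio

# Hypothetical async handler standing in for an httpx request to a FastAPI
# endpoint under test; the name and payload are illustrative only.
async def health_endpoint() -> dict:
    await asyncio.sleep(0)  # simulate async I/O
    return {"status": "ok"}

# pytest-asyncio lets the test itself be `async def`; without it, a plain
# test function must drive the event loop manually, as done here.
def test_health_endpoint() -> dict:
    result = asyncio.run(health_endpoint())
    assert result["status"] == "ok"
    return result

result = test_health_endpoint()
```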
pytest-asyncio
Python
DataLens uses Python as part of its data analysis environment. The Architecture includes all Python and AI components running on elin.
Python client
Python environment setup
GPU Infrastructure requires Python environment setup with all dependencies.
Python script question_router.py
Modified to reduce max_tables from 5 to 2, improve schema selection, and handle context window issues, boosting success rate in query generation.
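The max_tables cap described above can be sketched as a top-N table selection that shrinks the schema context sent to the model. The keyword-overlap scoring and all names here are hypothetical, not the actual question_router.py implementation:

```python
# Illustrative sketch: rank candidate tables by naive keyword overlap with
# the question and keep only the top N, reducing context-window pressure.
MAX_TABLES = 2  # reduced from 5, per the change described above

def select_tables(question: str, table_columns: dict[str, list[str]],
                  max_tables: int = MAX_TABLES) -> list[str]:
    words = set(question.lower().split())

    def score(item: tuple[str, list[str]]) -> int:
        table, columns = item
        terms = {table.lower(), *(c.lower() for c in columns)}
        return len(words & terms)

    ranked = sorted(table_columns.items(), key=score, reverse=True)
    return [table for table, _ in ranked[:max_tables]]

# Toy schema; only the two most relevant tables survive the cap.
schema = {
    "sales": ["region", "amount"],
    "users": ["email", "region"],
    "logs":  ["ts", "level"],
}
picked = select_tables("total sales amount by region", schema)
```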
Python venv
The DataLens Project uses a Python virtual environment with dependencies like vanna, llama-index, duckdb, and pandas. The implementation requires Python environment setup with all dependencies.
python-docx
The Docling extraction strategy conflicts with using python-docx as a fallback for DOCX extraction, so python-docx was removed in favor of Docling only. Previously, the DOCX extractor used the python-docx third-party component as a fallback for faster extraction of simple documents whenever Docling extraction was not enabled or failed.
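The primary-with-fallback pattern the DOCX extractor used before python-docx was dropped can be sketched generically. The extractor functions below are hypothetical stubs, not Docling or python-docx calls:

```python
# Sketch of a try-primary, fall-back-on-failure extraction strategy.
from typing import Callable

def extract_text(path: str,
                 primary: Callable[[str], str],
                 fallback: Callable[[str], str],
                 primary_enabled: bool = True) -> str:
    if primary_enabled:
        try:
            return primary(path)
        except Exception:
            pass  # primary failed; fall through to the simpler extractor
    return fallback(path)

# Hypothetical stand-ins for Docling and python-docx extraction.
def docling_stub(path: str) -> str:
    raise RuntimeError("GPU extraction unavailable")

def python_docx_stub(path: str) -> str:
    return "plain paragraphs"

text = extract_text("report.docx", docling_stub, python_docx_stub)
```

Removing the fallback, as the entry describes, amounts to deleting the `except` path so a Docling failure surfaces instead of silently degrading.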
python-jose
The Auth system in the DataLens Platform uses python-jose as a dependency for security or token management.
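python-jose handles JWT work for the Auth system; the stdlib-only sketch below shows the HS256 mechanics underneath (`header.payload.signature`), not the python-jose API itself. The secret and claims are made up:

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(),
                   hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify_hs256(token: str, secret: bytes) -> bool:
    header, payload, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(),
                        hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)

token = sign_hs256({"sub": "user-1"}, b"dev-secret")
```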
python-magic
Pdfplumber uses python-magic for MIME type detection during PDF table extraction.
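python-magic identifies MIME types from file content via libmagic. This dependency-free sketch checks only the PDF signature bytes, the case relevant to the pdfplumber pipeline above; it is an illustration of content sniffing, not the python-magic API:

```python
def looks_like_pdf(data: bytes) -> bool:
    # Every PDF file begins with the "%PDF-" magic bytes.
    return data.startswith(b"%PDF-")

def sniff_mime(data: bytes) -> str:
    # Minimal stand-in for magic.from_buffer(data, mime=True).
    return "application/pdf" if looks_like_pdf(data) else "application/octet-stream"

mime = sniff_mime(b"%PDF-1.7 ...")
```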
python-multipart
Version 0.0.9; handles multipart form data in Python. The file upload feature in the DataLens Platform uses python-multipart to handle multipart form-data uploads, and FastAPI relies on it for multipart parsing.
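FastAPI delegates multipart parsing to python-multipart; to show what the wire format looks like (boundary-delimited parts, each with its own headers), the sketch below parses an equivalent hand-built multipart/form-data body with the stdlib email package. The body content is invented:

```python
from email import message_from_bytes
from email.policy import default

# A minimal multipart/form-data body as it would arrive in a file upload.
raw = (
    b"Content-Type: multipart/form-data; boundary=XYZ\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="notes.txt"\r\n'
    b"\r\n"
    b"hello world\r\n"
    b"--XYZ--\r\n"
)
msg = message_from_bytes(raw, policy=default)
part = next(msg.iter_parts())       # the single "file" form field
filename = part.get_filename()      # extracted from Content-Disposition
```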
python-pptx
Docling extraction conflicts with using python-pptx as a fallback for PPTX extraction, and python-pptx was removed because that fallback is disallowed. Previously, the PPTX extractor was based on the python-pptx third-party component for slide and text extraction with semantic chunking, extracting slide-based chunks during DataLens Phase 2.
PyTorch
Used as a dependency in the GPU extraction process, supporting Docling on elin GPU for document parsing. Docling includes PyTorch, transformers, and OCR support as part of its dependencies.
q2-sql-capture.png
Screenshot capturing SQL queries or data during quarter 2 analysis.
Qdrant
Qdrant is employed as the vector database for document semantic search in DataLens, storing embeddings of text chunks. It provides similarity search and retrieval for RAG functions, integrates with the platform's API and backend for fast document similarity ranking and hybrid search, and is hosted on elin at port 6333. Qdrant indexes embeddings generated from text chunks stored in DuckDB, enabling semantic search across the platform. Semantic search with nomic-embed-text relies on Qdrant for vector storage and search, and RAGAgent stores its embeddings in Qdrant for retrieval.
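The similarity ranking Qdrant performs over stored embeddings can be illustrated without the service itself: the dependency-free sketch below does cosine-similarity search over an in-memory list, with toy vectors standing in for Ollama/nomic-embed-text embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec: list[float], collection, top_k: int = 2) -> list[str]:
    # Score every stored (chunk, vector) pair, return the best-ranked chunks.
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in collection]
    return [chunk for _, chunk in sorted(scored, reverse=True)[:top_k]]

# Toy "collection": document chunks with made-up 3-d embeddings.
collection = [
    ("chunk about revenue", [1.0, 0.0, 0.1]),
    ("chunk about logging", [0.0, 1.0, 0.0]),
    ("chunk about profit",  [0.9, 0.1, 0.2]),
]
hits = search([1.0, 0.0, 0.0], collection)
```

A real deployment would issue the same query through qdrant-client against the collection on elin:6333 rather than scanning in memory.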
Qdrant semantic search
Question router routes textual queries to Qdrant semantic search service.
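The routing decision above can be sketched as a classifier that sends analytical-looking questions to SQL generation and free-text questions to Qdrant semantic search. The keyword heuristic and hint set are purely illustrative, not the actual router logic:

```python
# Hypothetical routing heuristic: aggregate-style vocabulary suggests an
# SQL query; anything else goes to semantic (vector) search.
SQL_HINTS = {"count", "sum", "average", "total", "per", "group"}

def route(question: str) -> str:
    words = set(question.lower().replace("?", "").split())
    return "sql" if words & SQL_HINTS else "semantic"

routes = [
    route("total revenue per region?"),
    route("what does the contract say about renewal?"),
]
```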
Qdrant vector search service
The Qdrant vector search service uses Ollama embeddings for generating vector representations of data. The Qdrant vector index depends on the DuckDB database for text chunk storage and embedding data source in the DataLens platform.
Qdrant vectors
Qdrant vectors store the vector embeddings generated by Ollama embeddings from Docling extracted chunks for semantic search.
qdrant-client
Qdrant-client uses requests for HTTP communications. Vanna depends on qdrant-client for vector database integration.
QDRANT_HOST environment variable
Set to 176.9.90.154 for vector DB connection.
QDRANT_PORT environment variable
Set to 6333 for Qdrant vector database access.
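The two settings above are typically read once at startup with the documented values as defaults. A minimal sketch (variable names match the environment variables above; the URL assembly is illustrative):

```python
import os

# Fall back to the documented deployment values when the variables are unset.
QDRANT_HOST = os.getenv("QDRANT_HOST", "176.9.90.154")
QDRANT_PORT = int(os.getenv("QDRANT_PORT", "6333"))

qdrant_url = f"http://{QDRANT_HOST}:{QDRANT_PORT}"
```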
QdrantService
EmbeddingService produces the embeddings that QdrantService uses for semantic search and vector collections. QdrantService supports TableIndexService by providing vector collections for semantic table-search indexes, and backs DataLensAgentMemory with vector-based agent memory. QuestionRouter uses QdrantService for semantic vector search; QdrantService initialization was changed to lazy loading in the QuestionRouter to prevent startup timeouts. QdrantService is defined in backend/app/services/qdrant_service.py, uses the Ollama Embedding Service to create vector embeddings, and exposes a search method.
QdrantService class
The QdrantService class is defined in backend/app/services/qdrant_service.py. It connects to and manages a Qdrant vector database instance hosted at 176.9.90.154 for semantic search, calls the Ollama embedding API to transform texts into the vector embeddings used for that search, and exposes a search method. QuestionRouter uses QdrantService for semantic search and for retrieving relevant document chunks when processing textual or hybrid queries; it now initializes QdrantService lazily instead of at startup, so requests can be handled immediately without waiting on Qdrant health status. TableIndexService uses QdrantService to build semantic search indices for database tables.
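The lazy-loading change described above can be sketched as deferring service construction to first use, so the router's constructor never blocks on a Qdrant connection. Class names mirror the text; the stub service and its counter are hypothetical:

```python
class StubQdrantService:
    """Hypothetical stand-in for QdrantService; counts constructions."""
    instances = 0

    def __init__(self):
        StubQdrantService.instances += 1  # pretend this is a slow connect

    def search(self, text: str) -> list[str]:
        return [f"chunk matching {text!r}"]

class QuestionRouter:
    def __init__(self):
        self._qdrant = None            # nothing built at startup

    @property
    def qdrant(self) -> StubQdrantService:
        if self._qdrant is None:       # first use pays the connection cost
            self._qdrant = StubQdrantService()
        return self._qdrant

    def handle(self, question: str) -> list[str]:
        return self.qdrant.search(question)

router = QuestionRouter()
created_at_startup = StubQdrantService.instances   # still zero
hits = router.handle("renewal terms")
created_after_use = StubQdrantService.instances    # built on first search
```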
Query
Query records persist query history and metadata about user questions and projects in the PostgreSQL Database. The queries physical table records SQL queries executed against project data as part of data analysis; it includes a project_id column linking each query to a specific project and a user_id column associating each query with a user, so the Query data entity references the Project and User entities by those keys. Query history contains Query records representing individual answered questions stored in the database, and the Query physical table generates the Insight physical table containing analytical insights from executed queries. Projects track the queries table entries recorded against their data.
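The queries table shape described above (project_id and user_id foreign keys) can be sketched in SQLite for illustration; column names follow the text, while the surrounding minimal schema is hypothetical and the production store is PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users    (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE queries (
        id         INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL REFERENCES projects(id),
        user_id    INTEGER NOT NULL REFERENCES users(id),
        question   TEXT,
        sql_text   TEXT
    );
""")
conn.execute("INSERT INTO projects VALUES (1, 'demo')")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
conn.execute(
    "INSERT INTO queries (project_id, user_id, question, sql_text) "
    "VALUES (1, 1, 'total sales?', 'SELECT SUM(amount) FROM sales')"
)
row = conn.execute(
    "SELECT project_id, user_id FROM queries WHERE id = 1"
).fetchone()
```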
Query classification
Query Enhancer
Phase B Intelligent Retrieval implements the Query Enhancer for entity extraction and relevant table identification for queries.
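The entity-extraction step named above can be roughly sketched as pulling known entity terms out of a question and mapping them to candidate tables. The vocabulary and table mapping below are hypothetical; the real Query Enhancer belongs to Phase B Intelligent Retrieval:

```python
# Illustrative entity-term vocabulary mapped to candidate tables.
ENTITY_TABLES = {
    "customer": "customers",
    "invoice":  "invoices",
    "revenue":  "sales",
}

def enhance(question: str) -> dict:
    words = question.lower().replace("?", "").split()
    entities = [w for w in words if w in ENTITY_TABLES]
    tables = sorted({ENTITY_TABLES[e] for e in entities})
    return {"entities": entities, "tables": tables}

result = enhance("Which customer generated the most revenue?")
```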