schema detection
Schema detection (AI via Ollama)
AI-based schema detection is a must-have feature. Development is ongoing: Ollama is being integrated to detect schema types and suggest column mappings, with full functionality as the goal.
schema detection via Ollama
Schema Graph
Phase A (Schema Graph Construction) builds the Schema Graph, which represents join relationships and clusters of tables. The logical Schema Graph maps to the physical Consolidated Unified Views, created as transient database views.
Schema Limiting
Schema Limiting reduces the number of database tables passed to Arctic-Text2SQL-R1-7B for each query, avoiding context-window overflow. Phase 1 caps the maximum number of tables handed to the SQL generator.
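As a rough sketch of the limiting step (the function name, data shapes, and relevance scoring are illustrative assumptions, not the platform's actual API):

```python
def limit_schema(tables, relevance, max_tables=10):
    """Keep only the top-N most relevant tables so the serialized schema
    fits in the SQL generator's context window.

    `tables` maps table name -> DDL string; `relevance` maps table name ->
    score (e.g. from vector similarity against the user question).
    """
    ranked = sorted(tables, key=lambda t: relevance.get(t, 0.0), reverse=True)
    return {t: tables[t] for t in ranked[:max_tables]}
```

The cap trades recall for reliability: an occasionally missing table is preferable to truncated prompts that make the generator fail unpredictably.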
schema mapping
schema mapping suggestions
Schema Relationship Explorer
Schema Relationship Explorer is a feature of the User Schema Relationship Mapper used for data visualization. The schema consolidation mechanism is planned to be surfaced in the Schema Relationship Explorer UI for better transparency and customization.
Schema Selection Stage
The Schema Selection Stage was developed to improve table relevancy and joinability detection, enabling more reliable user queries through schema relationship discovery.
schema system
schema.sql
The DataLens platform backend includes schema.sql, its database schema file.
SchemaMapper
A SchemaMapper service is under development to automate column renaming and schema application; the current iteration addresses persistent mapping storage and automated-application issues. SchemaMapper uses StorageService to manage file storage when mapping uploaded file columns to standard schemas.
SchemaMapper Service
The Standard Schemas capability requires the SchemaMapper Service for AI-powered column mapping.
SchemaProfile
Pydantic model in agent_models.py representing schema profile data. Classification: value_object. Fields include domain_area, classification, sensitivity_flag, and persistence_type; domain_area and the other attributes are optional. Schema profiles relate to projects via the project_id field in the schema_profiles table, which holds profile data for the database schemas associated with each project. The ProjectGdprFlag data entity uses SchemaProfile to define schema-related GDPR flags.
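The field list above can be sketched as follows. This is a stdlib dataclass stand-in for illustration only; the actual model in agent_models.py is Pydantic, and the defaults shown are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchemaProfile:
    """Illustrative shape of the SchemaProfile value object.

    All attributes are optional per the description; defaults of None
    are an assumption.
    """
    domain_area: Optional[str] = None
    classification: Optional[str] = None
    sensitivity_flag: Optional[bool] = None
    persistence_type: Optional[str] = None
    project_id: Optional[int] = None  # links the profile to its project
```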
scope field
The project goal concept is represented by the scope field in the data model, though it is renamed "Project Goal" in the UI, including on the Project Creation Form. The scope field is required in the ProjectCreate Pydantic model by design and is always included in ProjectResponse. It replaces hardcoded text in backend/app/workers/catalog.py to generate more accurate file summaries. ProjectCreate validation enforces a hard minimum of 20 words on the field.
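The word-count rule can be sketched as a plain function (the real check lives in a Pydantic validator on ProjectCreate; the function name here is hypothetical, but the 20-word hard minimum matches the description):

```python
def validate_scope(scope: str, min_words: int = 20) -> str:
    """Enforce the hard minimum word count on the `scope` field.

    Raises ValueError when the scope is shorter than `min_words` words,
    mirroring how a Pydantic validator would reject the payload.
    """
    word_count = len(scope.split())
    if word_count < min_words:
        raise ValueError(
            f"scope must contain at least {min_words} words, got {word_count}"
        )
    return scope
```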
SCP
Data files are transferred to the elin GPU server using SCP before Docling extraction runs remotely.
scripts/migrate_duckdb_to_pg.py
scripts/migrate_duckdb_to_postgresql.py
Migration script for transferring existing data from DuckDB to PostgreSQL, run with a specific project ID and path; it completed during the migration process. The script reads Project 14 data from the DuckDB file in read-only mode to avoid conflicts, and writes the migrated data into PostgreSQL via PgDataService.
scripts/monitor-extraction.sh
The monitor-extraction.sh script monitors the extraction progress of the 132 files including queue size and extracted/pending counts.
scripts/reset-and-reextract.py
The reset-and-reextract.py script processes the 132 files for full reset and re-extraction of the SVGV dataset.
search
Performs vector similarity search in the project's Qdrant collection. The /api/v1/discovery endpoints include the search endpoint.
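Conceptually, the endpoint ranks stored embeddings by cosine similarity to the query embedding and returns the top matches. A pure-Python sketch of that ranking (the real implementation delegates to Qdrant; names here are illustrative):

```python
import math

def top_k_similar(query, vectors, k=3):
    """Rank stored vectors by cosine similarity to `query`.

    `vectors` maps point id -> embedding; returns (id, score) pairs,
    highest similarity first.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    scored = [(pid, cosine(query, vec)) for pid, vec in vectors.items()]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:k]
```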
SECRET_KEY environment variable
DataLens Platform requires setting a unique SECRET_KEY environment variable for session signing; the backend API reads its session-signing key from this variable.
section-based chunking
Semantic chunking based on document sections or slides is prioritized because it better preserves document structure. It detects headings and divides content at those boundaries, falling back to fixed-size chunks for efficiency.
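A minimal sketch of the heading-split-with-fallback strategy, assuming Markdown-style headings (the real extractors work on parsed document structure such as Docling output, not raw text):

```python
import re

def chunk_by_sections(text, max_chunk_chars=500):
    """Split text at Markdown-style headings, keeping each heading with
    its section; fall back to fixed-size slices for any section that
    exceeds max_chunk_chars.
    """
    # Lookahead split keeps the heading line attached to its section body.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chunk_chars:
            chunks.append(section)
        else:
            # Fallback: fixed-size slices for oversized sections.
            for i in range(0, len(section), max_chunk_chars):
                chunks.append(section[i:i + max_chunk_chars])
    return chunks
```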
SecurePass123!
Security Checklist
semantic chunking
Technique for dividing documents into meaningful segments for embedding and search, used by the RAG system to improve retrieval accuracy. Semantic chunking is enforced as a business rule in the Docling extraction system for section/slide-based document processing. The DOCX extractor implements it with section-based chunk boundaries and heading-hierarchy tracking to preserve document structure; the PPTX extractor chunks on slide units, splitting dense slides into sub-slide chunks.
Semantic Layer
WrenAI employs a semantic layer with YAML definitions encoding schema, metrics, joins, and governance rules.
Semantic Layer (MDL Models)
WrenAI implements a Semantic Layer using MDL Models, expressed as YAML model definitions, to define table schemas, metrics, joins, and governance rules.
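As a generic illustration of the kind of information such a model encodes (this is not WrenAI's actual MDL syntax; every key and name below is hypothetical):

```yaml
# Hypothetical semantic-layer model; field names are illustrative only.
model: orders
source: warehouse.public.orders
columns:
  - name: order_id
    type: integer
    primary_key: true
  - name: amount
    type: decimal
metrics:
  - name: total_revenue
    expression: sum(amount)
joins:
  - to: customers
    on: orders.customer_id = customers.id
governance:
  row_policy: region = current_user_region()
```

Centralizing schema, metric, and join definitions this way lets the text-to-SQL layer generate queries against governed, pre-agreed semantics rather than raw tables.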