schema system
SchemaMapper
A SchemaMapper service is under development to automate column renaming and schema application; the current iteration addresses persistent storage of mappings and their automatic application. SchemaMapper uses StorageService to manage file storage when mapping uploaded file columns to standard schemas.
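A minimal sketch of the column-mapping idea, assuming a pandas DataFrame input and JSON persistence; the mapping contents, file layout, and function names are hypothetical, not the actual SchemaMapper or StorageService API:

```python
import json
from pathlib import Path

import pandas as pd

# Hypothetical mapping from uploaded-file column names to the standard schema.
EXAMPLE_MAPPING = {"cust_id": "customer_id", "amt": "amount"}

def apply_schema_mapping(df: pd.DataFrame, mapping: dict[str, str]) -> pd.DataFrame:
    """Rename uploaded-file columns to the standard schema, leaving unmapped columns untouched."""
    return df.rename(columns={src: dst for src, dst in mapping.items() if src in df.columns})

def save_mapping(mapping: dict[str, str], path: Path) -> None:
    """Persist the mapping so it can be re-applied automatically on later uploads."""
    path.write_text(json.dumps(mapping, indent=2))

def load_mapping(path: Path) -> dict[str, str]:
    """Load a previously saved mapping, or an empty one if none exists yet."""
    return json.loads(path.read_text()) if path.exists() else {}
```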
SchemaMapper Service
The Standard Schemas capability requires the SchemaMapper Service for AI-powered column mapping.
scripts/reset-and-reextract.py
The reset-and-reextract.py script performs a full reset and re-extraction of the SVGV dataset, processing all 132 files.
SECRET_KEY environment variable
The DataLens Platform requires a unique SECRET_KEY environment variable for session signing; the backend API reads its session signing key from this SECRET_KEY environment variable.
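A minimal sketch of reading the key at startup and failing fast when it is missing; framework integration details are omitted:

```python
import os

# Fail fast if the session-signing key is missing; falling back to a default
# value would silently weaken session security across deployments.
SECRET_KEY = os.environ.get("SECRET_KEY")
if not SECRET_KEY:
    raise RuntimeError("SECRET_KEY environment variable must be set for session signing")
```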
section-based chunking
Semantic chunking based on document sections or slides is prioritized because it better preserves document structure. It involves detecting headings and dividing content accordingly, with a fallback to fixed-size chunks for efficiency.
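A minimal sketch of the approach, assuming plain text input; the heading heuristic and chunk-size limit are illustrative placeholders, not the extractor's actual rules:

```python
import re
from typing import Iterator

# Hypothetical heading heuristic: numbered headings ("2.1 Title") or Markdown-style "# Title".
HEADING_RE = re.compile(r"^(?:\d+(?:\.\d+)*\s+\S|#{1,6}\s+\S)")
MAX_CHARS = 2000  # illustrative chunk-size limit

def chunk_by_sections(text: str) -> Iterator[str]:
    """Split text at detected headings; fall back to fixed-size chunks for oversized sections."""
    sections, current = [], []
    for line in text.splitlines():
        if HEADING_RE.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    for section in sections:
        if len(section) <= MAX_CHARS:
            yield section
        else:
            # Fallback: fixed-size chunks when a section is too dense or no headings were found.
            for i in range(0, len(section), MAX_CHARS):
                yield section[i:i + MAX_CHARS]
```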
Security Checklist
semantic chunking
Technique for dividing documents into meaningful segments for embedding and search in the RAG system; it improves retrieval accuracy and is enforced as a business rule in the Docling extraction system for section- and slide-based document processing. The chunking strategy for DOCX and PPTX files requires semantic chunking on section or slide boundaries, improving retrieval quality for RAG. The DOCX extractor implements semantic chunking with section-based chunk boundaries to preserve document structure, with a planned refactor adding heading hierarchy tracking. The PPTX extractor chunks by slide units, splitting dense slides into sub-slide chunks.
Semantic Layer
WrenAI employs a semantic layer with YAML definitions encoding schema, metrics, joins, and governance rules.
Semantic Layer (MDL Models)
WrenAI implements a semantic layer using MDL models: YAML definitions that encode table schemas, metrics, joins, and governance rules.
Semantic Search
The Semantic Search capability uses nomic-embed-text to create table embeddings that improve table ranking during schema selection. The embeddings are stored in the Qdrant vector database for fast similarity search and ranking, and DataLens performs semantic search by querying these Qdrant vectors. The QuestionRouter routes unstructured textual queries to Semantic Search via QdrantService.
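A minimal sketch of indexing and ranking table embeddings with the qdrant-client library; the endpoint, collection name, and the embed() helper (standing in for a nomic-embed-text call) are assumptions, not the platform's actual code:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")  # hypothetical endpoint
COLLECTION = "table_embeddings"  # hypothetical collection name

def embed(text: str) -> list[float]:
    """Placeholder for a nomic-embed-text call (e.g. via a local embedding server)."""
    raise NotImplementedError

def index_tables(tables: dict[str, str]) -> None:
    """Store one embedding per table description so schema selection can rank tables by similarity."""
    client.recreate_collection(
        collection_name=COLLECTION,
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )
    points = [
        PointStruct(id=i, vector=embed(description), payload={"table": name})
        for i, (name, description) in enumerate(tables.items())
    ]
    client.upsert(collection_name=COLLECTION, points=points)

def rank_tables(question: str, limit: int = 5) -> list[str]:
    """Return the tables most similar to the question, best match first."""
    hits = client.search(collection_name=COLLECTION, query_vector=embed(question), limit=limit)
    return [hit.payload["table"] for hit in hits]
```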
semantic table matching
The Data Discovery feature adds semantic table matching, intended to improve the query success rate from 70% to over 95%. Implementation of the /api/v1/discovery endpoints and discovery components has not yet started. The discovery.py service implements semantic table matching through TableIndex.
SESSION 7: FRESH TEST BATCH
Validated the extraction pipeline on 8 files, confirming a successful data load; issues remain in summaries and API endpoints.
SESSION 8: SVGV 5-FILE FULL VALIDATION
Validated 5 uploaded files across extraction, summaries, and feature workflows, confirming platform readiness while blocker fixes continue.
Shared GPU Policy
Single table with project_id
SkillExecutor
SkillExecutor orchestrates the agent's ReAct loop and produces SkillResult objects. LocalAgentClient instantiates SkillExecutor internally and uses it in send_message to process messages locally and stream query results through an async generator, with no dependency on the IronClaw service; IronClawClient.send_message() likewise depends on SkillExecutor (or a corresponding executor) to process messages and yield responses asynchronously. AgentWarmingService assembles the warm context that SkillExecutor uses in agent sessions, and RingfencedSkills replace raw SQL skills with constrained operations that SkillExecutor runs when executing agent skills.
SkillResult
SkillResult is the result returned from skill execution; SkillExecutor orchestrates the agent's ReAct loop and produces it. Each skill returns a SkillResult: ExploreSchemaSkill when discovering and profiling project data schemas, QueryDataSkill when executing natural language queries via Text-to-SQL, DiscoverInsightsSkill when performing statistical profiling and anomaly detection, VisualizeSkill when generating Plotly chart specifications, PrepareDataSkill during data cleaning and transformation, and GenerateReportSkill when compiling findings into structured reports. ExportService converts SkillResult data into CSV, Excel, or JSON formats.
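A minimal sketch of how an executor of this shape could stream SkillResult objects through an async generator; the class fields and method names are assumptions, not the actual SkillExecutor or LocalAgentClient API:

```python
from dataclasses import dataclass, field
from typing import Any, AsyncIterator

@dataclass
class SkillResult:
    """Hypothetical shape of a skill execution result."""
    skill: str
    output: Any
    success: bool = True
    metadata: dict[str, Any] = field(default_factory=dict)

class SkillExecutor:
    """Drives the agent loop and yields one SkillResult per step."""
    def __init__(self, skills: dict[str, Any]) -> None:
        self.skills = skills  # name -> async callable

    async def run(self, message: str) -> AsyncIterator[SkillResult]:
        # In the real executor the next skill is chosen by the ReAct loop;
        # this sketch simply runs every registered skill once.
        for name, skill in self.skills.items():
            output = await skill(message)
            yield SkillResult(skill=name, output=output)

class LocalAgentClient:
    """Streams executor results without any remote agent-service dependency."""
    def __init__(self, skills: dict[str, Any]) -> None:
        self._executor = SkillExecutor(skills)

    async def send_message(self, message: str) -> AsyncIterator[SkillResult]:
        async for result in self._executor.run(message):
            yield result
```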
slide chunking
The PPTX extractor implements slide-based chunking, with potential sub-slide splits for dense content.
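A minimal sketch of slide-based chunking using the python-pptx library; the density threshold and splitting rule are illustrative, not the extractor's actual logic:

```python
from pptx import Presentation

MAX_CHARS = 1500  # illustrative density threshold for sub-slide splitting

def chunk_pptx(path: str) -> list[str]:
    """Produce one chunk per slide; dense slides are split into fixed-size sub-chunks."""
    chunks: list[str] = []
    for slide in Presentation(path).slides:
        text = "\n".join(
            shape.text_frame.text for shape in slide.shapes if shape.has_text_frame
        )
        if len(text) <= MAX_CHARS:
            chunks.append(text)
        else:
            chunks.extend(text[i:i + MAX_CHARS] for i in range(0, len(text), MAX_CHARS))
    return chunks
```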
slide layout detection
The PPTX extractor design includes slide layout detection such as title, bullets, and blank layouts to improve chunk semantic understanding.
smart processing UX model
DataLens implements smart processing as part of its data handling capabilities. The smart processing UX model comprises the batch upload pipeline, Phase B (the smart auto-processing pipeline with Qdrant), and Phase C (the unified question interface). The batch upload pipeline uses smart processing to handle uploaded data effectively.
Source attribution
DataLens provides source attribution, tracing results back to the exact files and chunks they came from.
SQL accuracy
Targeted to achieve greater than 40% correctness on the DABStep benchmark.
SQL Common Table Expression (CTE)
Planned: SQL CTEs will consolidate multiple related tables, supporting complex multi-table queries with better accuracy. A hedged illustration follows.
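The sketch below uses invented table and column names; the real consolidation rules are not yet defined:

```python
# Hypothetical example of consolidating related tables behind a CTE so a single
# downstream query can be generated against one logical relation.
CONSOLIDATED_QUERY = """
WITH orders_enriched AS (
    SELECT o.order_id, o.amount, c.customer_name, c.region
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
)
SELECT region, SUM(amount) AS total_amount
FROM orders_enriched
GROUP BY region;
"""
```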
SQL extraction regex fix
The SQL extraction regex fix addresses the Qwen3 response format, where the SQL query was not properly captured because of a missing newline before the closing backticks and because multiple code blocks in a response caused incorrect matching. Commit 408be74 fixes this by capturing only SELECT...; patterns inside code blocks, and commit 70f724f adds a safety-net cleanup that strips explanation text markers after extraction so only pure SQL reaches execution. Together these changes removed the SQL syntax errors previously seen in the logs from stray backticks and improper extraction. Logging will be added to trace extraction and execution results, and the fix has been deployed on theo. Correct SQL extraction is a prerequisite for generating and displaying findings.
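A minimal sketch of the extraction approach described above; the regex and the explanation-marker cleanup are illustrative reconstructions, not the exact patterns from commits 408be74 and 70f724f:

```python
import re

# Capture only SELECT ... ; inside a fenced code block, tolerating a missing
# newline before the closing backticks (the Qwen3 quirk described above).
SQL_BLOCK_RE = re.compile(r"```(?:sql)?\s*(SELECT\b.*?;)\s*```", re.IGNORECASE | re.DOTALL)

def extract_sql(response: str) -> str | None:
    """Return the first SELECT statement found in a code block, stripped of explanation text."""
    match = SQL_BLOCK_RE.search(response)
    if not match:
        return None
    sql = match.group(1).strip()
    # Safety net: drop anything after a stray explanation marker so only pure SQL is executed.
    sql = re.split(r"\n(?:Explanation|Note)\s*:", sql, maxsplit=1)[0].strip()
    return sql
```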
SQL Generation Stage
Ongoing development to produce accurate SQL queries from natural language, integrating multi-stage reasoning and handling larger schemas.
SQL migration execution script
The SQL migration execution script is used to apply the agent migration file 003_agent_tables.sql to the PostgreSQL database on theo.
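A minimal sketch of applying the migration file with psycopg2; the connection handling and file location shown here are assumptions about the script, not its actual contents:

```python
import psycopg2

MIGRATION_FILE = "003_agent_tables.sql"  # agent migration file named above

def apply_migration(dsn: str) -> None:
    """Run the migration file against PostgreSQL inside a single transaction."""
    with open(MIGRATION_FILE) as f:
        sql = f.read()
    # The connection context manager commits on success; close explicitly afterwards.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql)
    conn.close()
```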
SQL query
No specific summary has been recorded for this entity; the current focus is on core functionality.