extractors
DS-STAR Intelligence integrates with the existing extractors to improve extraction quality. The plan includes a CSV Extractor to validate, clean, and load CSV data into DuckDB; an Excel Extractor to handle multi-sheet workbooks, normalize headers, detect merged cells, and load data into DuckDB; and a PDF Extractor that uses vLLM to extract tables as JSON and loads validated data into DuckDB. The DataLens Master Implementation Plan depends on the Extractor components for data ingestion and processing during extraction phases.
FallbackTableIndex
A simple, in-memory, keyword-based index used for table search when Qdrant is unavailable; implemented in backend/app/services/table_index.py.
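A minimal sketch of how such a keyword fallback might work; the class name, scoring rule, and method signatures below are illustrative assumptions, not the actual table_index.py API.

```python
# Illustrative keyword-based fallback index; the real FallbackTableIndex
# in backend/app/services/table_index.py may differ.
import re
from collections import defaultdict

class KeywordTableIndex:
    """In-memory inverted index mapping keywords to table names."""

    def __init__(self):
        self._index = defaultdict(set)  # keyword -> {table_name, ...}

    def add(self, table_name: str, description: str) -> None:
        # Tokenize the table name and description into lowercase keywords.
        for token in re.findall(r"[a-z0-9_]+", f"{table_name} {description}".lower()):
            self._index[token].add(table_name)

    def search(self, query: str) -> list[str]:
        # Rank tables by how many query tokens they match.
        scores = defaultdict(int)
        for token in re.findall(r"[a-z0-9_]+", query.lower()):
            for table in self._index.get(token, ()):
                scores[table] += 1
        return sorted(scores, key=scores.get, reverse=True)

index = KeywordTableIndex()
index.add("file_uploads", "tracks uploaded files and processing status")
print(index.search("which files were uploaded"))  # -> ['file_uploads']
```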
FastAPI
The Backend API is implemented with the FastAPI framework: the DataLens Platform backend is a FastAPI app exposing endpoints for authentication, projects, files, extraction, and analysis, and the backend process runs the FastAPI application along with its dependencies on the theo server. FastAPI implements the API Layer and handles requests asynchronously; streaming responses via the /ask-stream endpoint use asynchronous generators, which the IronClaw Agent Feature relies on. The Agent Gateway is a FastAPI module that bridges the frontend and the IronClaw Service. FastAPI depends on Uvicorn for serving the application and uses python-multipart for multipart form data parsing.
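A minimal sketch of a streaming endpoint in the style of /ask-stream; the route path comes from the entry above, but the handler body and token source are assumptions, not the DataLens implementation.

```python
# Sketch of a streaming endpoint backed by an async generator.
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def answer_tokens(question: str):
    # Placeholder for the real agent; yields answer fragments as they arrive.
    for token in ("Analyzing", " your", " question", "..."):
        await asyncio.sleep(0.05)  # simulate model latency
        yield token

@app.get("/ask-stream")
async def ask_stream(question: str):
    # StreamingResponse consumes the async generator chunk by chunk,
    # so the client sees partial output before the answer is complete.
    return StreamingResponse(answer_tokens(question), media_type="text/plain")
```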
FastAPI backend
Handles API requests, manages business logic, connects to data services, and serves the frontend for DataLens, using FastAPI routes and dependencies as its interface. The backend runs on the theo server and orchestrates file extraction, query execution, and database storage. It uses SQLAlchemy ORM for data access against a PostgreSQL database holding multi-tenant metadata, integrates with DuckDB for extracted-data storage and querying, and uses the DS-STAR FileAnalyzer for AI cataloging that automatically discovers table structures on file upload. It exposes a Text-to-SQL query API for natural language queries, backed by SQLCoder-7B, and delegates asynchronous jobs such as embedding and summary generation to an RQ worker. The SvelteKit frontend communicates with the backend via these API endpoints. The backend ships with a production-ready Dockerfile, is used by Project 13 for file extraction and DuckDB storage, and passes all 13 tests, which are executed with pytest and httpx.
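A sketch of the Text-to-SQL flow under the assumptions above: the generate_sql() helper standing in for the SQLCoder-7B call and the /query route are invented for illustration; only the duckdb calls are the library's real API.

```python
# Sketch of a Text-to-SQL endpoint over DuckDB.
import duckdb
from fastapi import FastAPI

app = FastAPI()
con = duckdb.connect("project_data.duckdb")  # extracted tables live here

def generate_sql(question: str, schema: str) -> str:
    # Placeholder for a SQLCoder-7B call; a fixed query stands in for it
    # so the sketch runs without any tables loaded.
    return "SELECT 42 AS answer"

@app.post("/query")
def text_to_sql(question: str):
    schema = con.execute("SELECT table_name FROM information_schema.tables").fetchall()
    sql = generate_sql(question, str(schema))
    rows = con.execute(sql).fetchall()
    return {"sql": sql, "rows": rows}
```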
FastAPI OpenAPI Spec
Defines the RESTful API specification used for integration and documentation.
FastAPI Swagger
Feature Flag Pattern
File extraction process
After extraction completes, the file extraction process enqueues a job on the async embedding queue so embeddings are generated in the background.
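A sketch of that hand-off using RQ (the queue mentioned under FastAPI backend); the queue name and the generate_embeddings job function are illustrative, not the actual DataLens code.

```python
# Enqueue embedding work instead of running it inline.
from redis import Redis
from rq import Queue

embedding_queue = Queue("embeddings", connection=Redis())

def generate_embeddings(file_id: int) -> None:
    """Worker-side job: chunk the file's text and store vectors."""
    ...

def on_extraction_complete(file_id: int) -> None:
    # Hand the job to the RQ worker so the API request returns fast.
    embedding_queue.enqueue(generate_embeddings, file_id)
```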
File prioritizer
The file prioritizer is used by the batch processor orchestrator (BatchProcessor) to assign tiers and manage the order in which project files are processed. It draws on DiscoveryService outputs to prioritize files relevant to analytical questions, and it reads FileUpload columns such as tier and AI summary to assign processing priorities. ExtractionCoordinator coordinates the extraction processes it prioritizes, StorageService handles file storage and retrieval for the prioritization process, and the Project dashboard surfaces the resulting tier badges to organize files. The capability is validated by the test_file_prioritizer test case.
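A sketch of tier assignment under an assumed keyword-overlap scoring rule and assumed tier names; FilePrioritizer's real logic and fields may differ.

```python
# Illustrative tier assignment based on question/summary keyword overlap.
from dataclasses import dataclass

@dataclass
class FileUpload:
    filename: str
    tier: str | None = None
    ai_summary: str = ""

def assign_tier(file: FileUpload, question_keywords: set[str]) -> str:
    # Score relevance by keyword overlap between the question and summary.
    overlap = question_keywords & set(file.ai_summary.lower().split())
    if len(overlap) >= 3:
        return "tier_1"  # process first
    if overlap:
        return "tier_2"
    return "tier_3"      # process last

files = [FileUpload("budget.csv", ai_summary="quarterly budget spend by team")]
for f in files:
    f.tier = assign_tier(f, {"budget", "spend"})
print(files[0].tier)  # -> tier_2 (two overlapping keywords)
```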
File Status Breakdown
File summaries
File summaries describe each uploaded data file's content, relevance, and the key questions it can answer, informing users and linking files to project goals. Files cataloged within DataLens carry these comprehensive AI-generated summaries to enhance data understanding.
file summary prompt
The project goal is used to inform the file summary prompt, replacing the previous hardcoded budget-analysis description.
File Summary task
During file ingestion and cataloging, generates concise summaries describing each file's contents, relevance, and the questions it can answer, improving data cataloging and documentation. File Summary Generation contextualizes the summaries with the project's scope instead of a hardcoded string.
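A hypothetical prompt template showing how a project goal could replace a hardcoded description; the variable names and wording are illustrative only.

```python
# Illustrative file-summary prompt parameterized by the project goal.
SUMMARY_PROMPT = """\
You are cataloging data files for a project whose goal is:
{project_goal}

File: {filename}
Sample rows:
{sample_rows}

Write a concise summary covering: (1) what the file contains,
(2) how it is relevant to the project goal, and (3) two or three
questions this file could help answer."""

prompt = SUMMARY_PROMPT.format(
    project_goal="Analyze departmental budgets for FY2025",
    filename="budget.csv",
    sample_rows="dept,quarter,spend\nEngineering,Q1,120000",
)
```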
File-First Data Platform
file_analyzer.py
The DataLens Project uses the file_analyzer.py agent for file analysis, but it has a datetime serialization bug that needs to be fixed.
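The entry does not say where the bug lives, but the usual cause of a datetime serialization failure is json.dumps() rejecting datetime objects; a common fix, shown here as an assumption rather than the actual patch, is a default serializer.

```python
# Generic fix for datetime serialization failures in JSON output.
import json
from datetime import datetime

def json_default(value):
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Not JSON serializable: {type(value)!r}")

catalog = {"file": "budget.csv", "analyzed_at": datetime(2025, 1, 15, 9, 30)}
print(json.dumps(catalog, default=json_default))
```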
file_upload.catalog_data
file_uploads
The file_uploads table tracks the processing status of the 132 files in the project_14 schema. Each project includes a file_uploads table: its project_id column links uploaded files to projects, and its uploaded_by field links each file to the user who uploaded it. The text_chunks table stores chunks of text extracted from uploaded files and includes a file_id referencing the source file; GDPR flags reference files through the file_id column in project_gdpr_flags. The data catalog generation process depends on the upload directory where user files are stored.
FileAnalyzer
The DataLens DS-STAR Implementation Plan includes the FileAnalyzer as a core component of the DS-STAR pipeline. FileAnalyzer produces data catalogs used in the extraction planning process, and the Planner Agent takes the catalog it generates as input.
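An illustrative shape for a catalog such a component might emit for the Planner Agent; every field name below is an assumption.

```python
# Hypothetical data-catalog structure produced per analyzed file.
catalog = {
    "file": "budget.csv",
    "tables": [
        {
            "name": "budget",
            "columns": [
                {"name": "dept", "dtype": "VARCHAR"},
                {"name": "quarter", "dtype": "VARCHAR"},
                {"name": "spend", "dtype": "DOUBLE"},
            ],
            "row_count": 480,
            "sample_questions": ["Which department spent the most in Q1?"],
        }
    ],
}
```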
files
The Backend manages files.
Files API Endpoints
The Files API supports AI Summary Generation by exposing the endpoints and data fields needed to store and retrieve AI summaries.
FileUpload
DataLens Platform includes a file upload feature for CSV, Excel, and PDF files, implemented by the Backend. DS-STAR FileAnalyzer integration depends on this feature to automatically catalog files upon upload. FileUpload records are associated with Project entities: a Project contains multiple FileUpload records representing its uploaded files, and each FileUpload depends on a ProcessingJob record representing the background processing of that file. AI Summary Generation stores the generated summaries in the ai_summary column of FileUpload records.
FileUpload record ai_summary column
AI Summary Generation writes summaries to the FileUpload record's ai_summary column after extraction completes in the cataloging workflow. The dashboard's file.ai_summary display renders the stored value so users can see file summaries. FileUpload records, including the ai_summary column, live in the PostgreSQL database, and the file prioritizer reads FileUpload columns such as tier and ai_summary to assign processing priorities.
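A hypothetical SQLAlchemy model pulling together the FileUpload columns cited in these entries (project_id, uploaded_by, tier, ai_summary); any column not named in the catalog is a guess.

```python
# Sketch of the FileUpload ORM model; not the actual DataLens schema.
from datetime import datetime
from sqlalchemy import Integer, String, Text, DateTime, ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class FileUpload(Base):
    __tablename__ = "file_uploads"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    project_id: Mapped[int] = mapped_column(ForeignKey("projects.id"))
    uploaded_by: Mapped[int] = mapped_column(ForeignKey("users.id"))
    tier: Mapped[str | None] = mapped_column(String(16))   # set by the file prioritizer
    ai_summary: Mapped[str | None] = mapped_column(Text)   # set by AI Summary Generation
    status: Mapped[str] = mapped_column(String(32), default="pending")
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
```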
FileUpload.svelte
The Frontend uses the FileUpload.svelte UI component for drag-and-drop file upload.
FILTER categories
The Full Findings Visualization Layer capability requires support for 7 filter categories for filtering findings in the UI.
final-validator-test.png
Image file showing test results or validation outcome for visualization features.
Finding
Finding is a data structure created and managed by FindingsGenerator to represent individual analytical findings generated from query results.
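A hypothetical dataclass shape for Finding; the fields are inferred from how findings are described elsewhere in this catalog, not taken from FindingsGenerator.

```python
# Illustrative Finding structure as a dataclass.
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str                 # short headline for the findings board
    detail: str                # narrative explanation of the result
    sql: str                   # the query that produced the evidence
    source_files: list[str] = field(default_factory=list)
    confidence: float = 0.0    # generator's confidence in the finding

finding = Finding(
    title="Engineering leads Q1 spend",
    detail="Engineering spent 42% of the Q1 budget, the largest share.",
    sql="SELECT dept, SUM(spend) FROM budget GROUP BY dept",
    source_files=["budget.csv"],
    confidence=0.9,
)
```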
findings
The generation and display of findings depend on correctly extracting SQL from model output, which was fixed by the regex updates.
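A sketch of what SQL extraction from model output can look like; this pattern is illustrative and is not the regex that was actually updated.

```python
# Pull the first SQL statement out of a model's free-text answer.
import re

def extract_sql(answer: str) -> str | None:
    # Grab the first statement starting with SELECT or WITH, up to a
    # semicolon or the end of the text.
    match = re.search(r"\b(SELECT|WITH)\b.*?(?=;|$)", answer,
                      re.DOTALL | re.IGNORECASE)
    return match.group(0).strip() if match else None

print(extract_sql("The answer uses: SELECT dept, SUM(spend) FROM budget GROUP BY dept;"))
# -> SELECT dept, SUM(spend) FROM budget GROUP BY dept
```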
Findings Board
Findings generation
Logging is needed in findings generation to trace data flow and identify failure points. The QuestionRouter integration works with the FindingsGenerator logic in the agent architecture to process and generate findings, and the FindingsGenerator outputs generated findings via the message streaming format for frontend rendering.
findings_generator logging
Diagnostic logging to be added at the start of findings generation, implemented in backend/app/services/findings_generator.py.
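A sketch of that diagnostic logging; the logged fields and the generate_findings signature are assumptions.

```python
# Illustrative start-of-generation logging for findings_generator.py.
import logging

logger = logging.getLogger("app.services.findings_generator")

def generate_findings(project_id: int, question: str, rows: list[dict]):
    # Log inputs at the start so failures can be traced to their source data.
    logger.info(
        "findings_generation_start project_id=%s question=%r row_count=%d",
        project_id, question, len(rows),
    )
    if not rows:
        logger.warning("findings_generation_empty_input project_id=%s", project_id)
        return []
    ...  # build Finding objects from the query results
```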