Data Model
152 entities found
NYC municipal datasets
omkostningsberegning
omkostningsberegning table
The omkostningsberegning (Danish for "cost calculation") table belongs to the PostgreSQL project_14 schema, which stores extracted SVGV data. The Data Discovery System requires this table to run budget calculation queries.
org_id
Organization
The Organization physical table groups users and projects. User records in the users table are the members belonging to an organization, and the projects table lists all projects under the organization; each Project belongs to an Organization.
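The ownership relationships described above can be sketched as a small relational schema. This is only an illustration: Python's built-in sqlite3 stands in for PostgreSQL, the table name organizations and the name and email columns are assumptions, and only org_id, users, and projects come from this catalog.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; every column other than
# org_id is an illustrative assumption, not the platform's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organizations (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE users (
    id     INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES organizations(id),
    email  TEXT NOT NULL
);
CREATE TABLE projects (
    id     INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES organizations(id),
    name   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO organizations (id, name) VALUES (1, 'Example Org')")
conn.execute("INSERT INTO users (org_id, email) VALUES (1, 'user@example.com')")
conn.execute("INSERT INTO projects (org_id, name) VALUES (1, 'Example Project')")

# Count the members and projects that belong to organization 1.
member_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE org_id = ?", (1,)
).fetchone()[0]
project_count = conn.execute(
    "SELECT COUNT(*) FROM projects WHERE org_id = ?", (1,)
).fetchone()[0]
print(member_count, project_count)
```

Both users and projects carry the same org_id foreign key, so listing everything that belongs to an organization is a single filtered query per table.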
PDF files
Phase 2 file types include PDF files. The Phase 2 Strategy Research & Decision Point treats processing of 48 PDF files as a high-priority Phase 2 task. Opus 4.6 recommends text extraction from all 48 PDFs, with OCR only where relevant, citing medium-high ROI and an estimated effort of 2-3 hours. Opus 4.6 also states that DOCX files provide better ROI than PDFs, because DOCX files contain cleaner text with narrative context, whereas PDFs are noisier and require more variable effort. Extractor Agents process PDFs to extract tables.
PostgreSQL Decimal type
FindingsGenerator requires support for PostgreSQL Decimal type to correctly process numeric data in findings. The Live Backend supports PostgreSQL Decimal type for findings generation.
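One place where Decimal support commonly matters is serialization: Python database drivers typically return PostgreSQL numeric/decimal columns as decimal.Decimal, which the standard json module cannot encode by default. The following is a minimal sketch, assuming findings are emitted as JSON; the function encode_decimal and the sample row are hypothetical and are not taken from FindingsGenerator.

```python
import json
from decimal import Decimal

def encode_decimal(value):
    """json.dumps fallback: render Decimal as a string to avoid float rounding."""
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")

# Hypothetical row, shaped like what a PostgreSQL driver might return.
row = {"label": "total", "amount": Decimal("1234.50")}
payload = json.dumps(row, default=encode_decimal)
print(payload)
```

Rendering the value as a string keeps the exact digits, including trailing zeros; converting to float would be shorter but can lose precision.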
PostgreSQL schema
The DataLens Platform uses a multi-tenant PostgreSQL schema for organizations, users, projects, files, and cataloging data. The DataLens Platform uses SQLAlchemy to interact with the PostgreSQL schema for database operations. The Backend uses a PostgreSQL schema for data storage.
PostgreSQL theo:5433 proxy
processed data directory
Stores processed data files, with no additional details specified.
ProcessingJob
The ProcessingJob physical table records background jobs executed for projects, such as file or data processing. The processing_jobs table has a project_id column relating each job to its project. The FileUpload physical table depends on the ProcessingJob physical table, which represents the background processing jobs of uploaded files.
Project 14
Project 14 involves processing 131 SVGV files for budget analysis and contains the SVGV Budget 2026 data used in testing the Data Discovery feature. Current progress includes uploading, extraction, and initial query testing, with plans to fully validate and use the generated SQL and summaries. Project 14 initially used DuckDB to store the data tables extracted from SVGV files during extraction pipeline operations and to host them for analytical queries; Arctic-Text2SQL-R1-7B queries Project 14 data through DuckDB schemas for SQL generation. Project 14 data was migrated from DuckDB to PostgreSQL to enable concurrent reads and writes, and Project 14 will transition to using its own PostgreSQL schema for extracted data storage. Project 4 data is stored in a dedicated DuckDB project database file at /app/storage/project_4.duckdb. Arne Hauge is a verified user with access to Project 14. The DataLens project is approved and accessed by the user admin@exerun.com.
project_14.duckdb
DuckDBService manages the Project 14 DuckDB file for extracted data operations. The migration script opens the Project 14 DuckDB file with read-only access while reading data during migration.
project_gdpr_flags
The project_gdpr_flags table stores GDPR sensitivity detection results for files and columns: each row indicates potential PII and carries a confidence score and a timestamp, so that personal data sensitivity can be tracked for compliance. Flags are linked to projects via the project_id column and reference files via the file_id column. The ProjectGdprFlag data entity uses the SchemaProfile data entity to define schema-related GDPR flags, and the AgentSession data entity uses ProjectGdprFlag to manage GDPR compliance flags.
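The shape of a flag record described above might look roughly like the following sketch. Only project_id and file_id are named in this catalog; the field names column_name, confidence, and detected_at, and the assumption that confidence lies between 0 and 1, are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ProjectGdprFlag:
    project_id: int              # links the flag to a project
    file_id: int                 # links the flag to the flagged file
    column_name: Optional[str]   # hypothetical: None when the whole file is flagged
    confidence: float            # hypothetical range: 0.0 to 1.0
    detected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self):
        # Reject scores outside the assumed 0..1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

flag = ProjectGdprFlag(project_id=14, file_id=1, column_name="email", confidence=0.92)
print(flag.project_id, flag.column_name, flag.confidence)
```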
projects
The Backend handles projects. The DataLens Platform has a Projects module to handle project creation and management. Each project belongs to an organization, whose projects table lists all projects under it. A project has the following tables: agent_sessions, which provides session data for analysis; project_gdpr_flags, which flags GDPR-related data; schema_profiles, which holds profile data of the database schemas associated with the project; processing_jobs, which tracks background jobs related to files or data processing; file_uploads, which tracks files uploaded to the project; queries, which records SQL queries run against project data; and insights, which stores generated insights tied to project data analysis. The projects page is a SvelteKit page and is part of the DataLens system.
projects.default_schema
projects/svgv-budget-analysis/ANALYTICAL_RESULTS.md file
Contains answers to 33 of 35 analytical questions, a verified budget of 328.5M DKK, and sources; it is part of report derivation.
Python script question_router.py
Modified to reduce max_tables from 5 to 2, improve schema selection, and handle context window issues, boosting success rate in query generation.
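The effect of lowering max_tables can be illustrated with a small sketch: fewer candidate tables are passed on, so less schema text competes for the context window. The function select_tables and the relevance scores below are hypothetical; the actual selection logic in question_router.py is not shown in this catalog.

```python
def select_tables(scores, max_tables=2):
    """Return the names of the highest-scoring tables, at most max_tables."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:max_tables]]

# Hypothetical relevance scores for one question.
scores = {"omkostningsberegning": 0.91, "sales": 0.40, "sample_sales": 0.35}
selected = select_tables(scores)
print(selected)
```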
Qdrant vector search service
The Qdrant vector search service uses Ollama embeddings to generate vector representations of data. In the DataLens platform, the Qdrant vector index depends on the DuckDB database, which stores the text chunks and serves as the source data for embeddings.
Qdrant vectors
Qdrant vectors store the vector embeddings generated by Ollama embeddings from Docling extracted chunks for semantic search.
Query
Query records are persisted in the PostgreSQL database as query history, together with metadata about user questions and projects. The Query physical table holds queries executed on projects as part of data analysis; each record in the query history represents an individual answered question. The queries table has a project_id column that links each query to a specific Project and a user_id column that associates it with a User. The Query physical table generates the Insight physical table, which contains analytical insights from executed queries.
Query Router
Query Router currently reads extracted data from DuckDB. After migration, Query Router will read extracted data from PostgreSQL schemas.
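The difference between the two storage layouts can be sketched as a helper that builds the table reference for each backend. This is an assumption-laden illustration: the function table_reference is hypothetical, the per-project schema name follows the project_14 schema mentioned elsewhere in this catalog, and the actual Query Router code is not shown here.

```python
def table_reference(project_id: int, table: str, backend: str) -> str:
    """Build the SQL reference for an extracted table on the given backend."""
    if backend == "duckdb":
        # Assumption: each project has its own DuckDB file, so no schema prefix.
        return f'"{table}"'
    if backend == "postgresql":
        # Assumption: one schema per project, named like project_14.
        return f'project_{project_id}."{table}"'
    raise ValueError(f"Unsupported backend: {backend}")

before = table_reference(14, "omkostningsberegning", "duckdb")
after = table_reference(14, "omkostningsberegning", "postgresql")
print(before, after)
```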
query success rate
Intelligent consolidation improves the query success rate from 70% to over 95%.
sales table
The sales table is stored in DuckDB (analytics.db).
sample_sales table
The sample_sales table is stored in DuckDB (analytics.db) as a physical table.
Scandinavian budget data schema
Schema Graph
Phase A Schema Graph Construction implements the creation of the Schema Graph representing join relationships and clustering of tables. The logical Schema Graph of tables maps to the physical Consolidated Unified Views created as transient database views.
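A schema graph of this kind can be sketched as an undirected graph whose nodes are tables and whose edges are join relationships. The clustering method is not specified in this catalog; the sketch below assumes the simplest option, connected components, and the function cluster_tables and the join edges are illustrative.

```python
from collections import defaultdict

def cluster_tables(tables, joins):
    """Group tables into clusters of mutually reachable join partners."""
    graph = defaultdict(set)
    for left, right in joins:
        graph[left].add(right)
        graph[right].add(left)

    seen, clusters = set(), []
    for table in tables:
        if table in seen:
            continue
        # Depth-first walk collecting every table reachable through joins.
        stack, component = [table], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Illustrative join edges; sales has no join partner and forms its own cluster.
tables = ["projects", "queries", "insights", "sales"]
joins = [("projects", "queries"), ("queries", "insights")]
clusters = cluster_tables(tables, joins)
print(clusters)
```

Under this assumption, each cluster is a candidate for one consolidated view, since every table in it can be reached from the others through joins.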
schema.sql
The DataLens platform backend uses the schema.sql database schema file.
SchemaProfile
Pydantic model in agent_models.py. Fields include domain_area, classification, sensitivity_flag, and persistence_type, representing schema profile data. Classification: value_object; domain_area and other attributes are optional. Schema profiles relate to projects using the project_id field in schema_profiles. ProjectGdprFlag data entity uses SchemaProfile data entity to define schema-related GDPR flags. Projects include schema_profiles tables which hold profile data of database schemas associated with the project.
scope field
The scope field represents the project goal concept in the data model; the Project Creation Form labels it "Project Goal" in the UI. The scope field is required in the ProjectCreate Pydantic model by design and is included in ProjectResponse so that it is always present. ProjectCreate validation enforces a hard minimum of 20 words on the scope field. The scope field also replaces hardcoded text in backend/app/workers/catalog.py to generate more accurate file summaries.
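The minimum word count rule can be sketched as a plain validation function. The names MIN_SCOPE_WORDS and validate_scope are hypothetical, and counting words by splitting on whitespace is an assumption; the actual ProjectCreate validator is not shown in this catalog.

```python
MIN_SCOPE_WORDS = 20  # hard minimum stated for the scope field

def validate_scope(scope: str) -> str:
    """Return scope unchanged if it has enough words, else raise ValueError."""
    word_count = len(scope.split())  # assumption: words are whitespace-separated
    if word_count < MIN_SCOPE_WORDS:
        raise ValueError(
            f"scope must contain at least {MIN_SCOPE_WORDS} words, got {word_count}"
        )
    return scope

try:
    validate_scope("Analyse the budget")
except ValueError as error:
    print(error)
```

In a Pydantic model, a check like this would normally be attached to the field as a validator so that ProjectCreate rejects short values at request time.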