Project: datalens
81 entity types
Matrix/Intent/Extraction pipeline
RequirementIntent

Extraction pipeline

The DataLens Platform includes an extraction pipeline that converts CSV, Excel, and PDF files into DuckDB-usable data. The pipeline uses DS-STAR extractors, pandas for data manipulation and for loading extracted data, and DuckDB for data processing. The Backend implements the extraction pipeline business process, including the DS-STAR integration.

The pipeline depends on the RQ queue for batch extraction job management, and on the RQ Worker to process the files in the extraction queue asynchronously.

The pipeline previously wrote extracted data into DuckDB, which caused write locks during extraction. It was modified to write extracted data into PostgreSQL, which enables concurrent query operation. Extracted file data and catalog information are now stored in the PostgreSQL database with language support.
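
The flow described above (an RQ job per file, pandas for loading, PostgreSQL as the write target) might be sketched roughly as follows. This is an illustrative sketch only, not the DataLens implementation: the function names, the `extraction` queue name, the `extracted_rows` table, the suffix mapping, the Redis URL, and the use of SQLAlchemy are all assumptions, and PDF handling and the DS-STAR extractors are left out.

```python
from pathlib import Path

# Hypothetical suffix-to-kind mapping; the source only states that
# CSV, Excel, and PDF files are supported.
SUPPORTED = {".csv": "csv", ".xlsx": "excel", ".xls": "excel", ".pdf": "pdf"}


def detect_kind(path: str) -> str:
    """Return the file kind for a path, or raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {path!r}") from None


def extract_file(path: str, dsn: str) -> int:
    """Job body for an RQ worker: load one file and append it to PostgreSQL."""
    # Imported lazily so the module can be imported without these packages.
    import pandas as pd
    from sqlalchemy import create_engine

    kind = detect_kind(path)
    if kind == "csv":
        frame = pd.read_csv(path)
    elif kind == "excel":
        frame = pd.read_excel(path)
    else:
        # PDF extraction is not sketched here.
        raise NotImplementedError("PDF extraction is outside this sketch")

    # Assumed table name; appending to PostgreSQL avoids the DuckDB
    # write lock that the earlier design ran into.
    engine = create_engine(dsn)
    frame.to_sql("extracted_rows", engine, if_exists="append", index=False)
    return len(frame)


def enqueue_extraction(paths, dsn, redis_url="redis://localhost:6379/0"):
    """Queue one extraction job per file on an assumed 'extraction' queue."""
    from redis import Redis
    from rq import Queue

    queue = Queue("extraction", connection=Redis.from_url(redis_url))
    return [queue.enqueue(extract_file, path, dsn) for path in paths]
```

With a Redis server and an RQ worker listening on the assumed `extraction` queue, calling `enqueue_extraction(["sales.csv"], dsn)` would hand each file to the worker, so the caller does not wait for extraction to finish.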