DesignDecisionArchitecture
GPU-first design
DataLens follows a GPU-first design: embeddings and inference run through Ollama on elin, whose RTX 4000 SFF Ada 20GB GPU also handles document extraction. DOCX and PPTX extraction uses Docling as a mandatory component, with no fallback. The theo server handles orchestration, including the FastAPI backend, RQ workers, and job queuing. The document extraction implementation is validated by the test suite 'test_docling_extractors.py'.
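As an illustration of the split between theo (orchestration) and elin (GPU), the sketch below shows how a worker might request an embedding from Ollama over HTTP. It is a minimal sketch, not the DataLens implementation: the port 11434 (Ollama's default), the assumption that the host name elin resolves on the network, the model name nomic-embed-text, and the function names build_embedding_request and embed are all assumptions introduced for this example.

```python
import json
import urllib.request
from typing import List

# Assumptions (not from the source): Ollama's default port 11434, the host
# name "elin" resolving on the network, and the model name below.
OLLAMA_URL = "http://elin:11434/api/embeddings"
EMBED_MODEL = "nomic-embed-text"  # hypothetical choice of embedding model


def build_embedding_request(text: str, model: str = EMBED_MODEL) -> urllib.request.Request:
    """Build the POST request for Ollama's embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, timeout: float = 30.0) -> List[float]:
    """Send text to the GPU host and return its embedding vector.

    Any network or HTTP error propagates to the caller; there is
    deliberately no CPU or local fallback in this sketch.
    """
    with urllib.request.urlopen(build_embedding_request(text), timeout=timeout) as resp:
        return json.loads(resp.read())["embedding"]
```

Letting errors propagate rather than falling back to a local model mirrors the no-fallback stance described above for Docling; whether DataLens applies the same policy to embeddings is not stated in the source.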