ThirdPartyComponentArchitecture
Qwen2.5-Coder-14B-AWQ
A 14B-parameter model served through vLLM on elin. It is optimized for GPU inference and uses 4-bit AWQ quantization to fit in 10GB of VRAM. It is the deployed vLLM model in the DataLens DS-STAR Implementation Plan.
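The 10GB figure can be sanity-checked with rough arithmetic. The sketch below is an estimate only. It assumes a nominal 14e9 parameters, decimal gigabytes, and a hypothetical helper named `weight_footprint_gb`. It ignores quantization metadata (scales and zero points), any layers left unquantized, the KV cache, and activations, so the real footprint on elin will be higher than the weight-only number.

```python
# Back-of-the-envelope weight footprint; assumptions are noted inline.

def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

NOMINAL_PARAMS = 14e9   # assumption: "14B" taken at face value
VRAM_BUDGET_GB = 10.0   # budget stated in the description

fp16_gb = weight_footprint_gb(NOMINAL_PARAMS, 16)  # unquantized 16-bit baseline
awq4_gb = weight_footprint_gb(NOMINAL_PARAMS, 4)   # 4-bit quantized weights
headroom_gb = VRAM_BUDGET_GB - awq4_gb             # left for KV cache and overhead

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {awq4_gb:.1f} GB")
print(f"Headroom in a {VRAM_BUDGET_GB:.0f} GB budget: {headroom_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about 7 GB against roughly 28 GB at 16-bit, which leaves around 3 GB of the budget for the KV cache and runtime overhead.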