semantic chunking
A technique for dividing documents into meaningful segments for embedding and search, used in RAG (retrieval-augmented generation) systems. Rather than splitting text at arbitrary points, semantic chunking aligns chunk boundaries with the document's own structural units, such as sections or slides, which improves retrieval quality. Document RAG uses semantic chunking for this reason.

In the Docling extraction system, semantic chunking is enforced as a business rule: DOCX and PPTX files are chunked on section or slide boundaries. The DOCX extractor is being refactored to implement section-based chunk boundaries with heading-hierarchy tracking, so that chunks preserve the document's structure. The PPTX extractor treats each slide as a chunking unit, with enhancements that split a slide into sub-slide chunks when its content is dense.
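A minimal sketch of the two boundary strategies, for illustration only. It assumes extraction has already produced plain Python data: DOCX content as a list of `(level, text)` pairs (heading level 1 to 9, or 0 for body text) and each slide as a title plus a list of text blocks. The names `Chunk`, `chunk_by_sections`, `chunk_slide`, and the `max_chars` density threshold are hypothetical and are not part of the Docling API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    text: str
    # Hypothetical field: headings (or slide label) that locate this chunk.
    heading_path: List[str] = field(default_factory=list)


def chunk_by_sections(paragraphs: List[Tuple[int, str]]) -> List[Chunk]:
    """Section-based chunking: every heading closes the previous chunk.

    Assumed input: (level, text) pairs, where level > 0 is a heading level
    and level == 0 is body text.
    """
    chunks: List[Chunk] = []
    path: List[str] = []  # current heading hierarchy, e.g. ["Intro", "Scope"]
    body: List[str] = []

    def flush() -> None:
        # Emit the body collected under the current heading path, if any.
        if body:
            chunks.append(Chunk("\n".join(body), list(path)))
            body.clear()

    for level, text in paragraphs:
        if level > 0:
            flush()
            # Drop headings at this level or deeper, then push the new one.
            del path[level - 1:]
            path.append(text)
        else:
            body.append(text)
    flush()
    return chunks


def chunk_slide(slide_number: int, title: str, blocks: List[str],
                max_chars: int = 1000) -> List[Chunk]:
    """Slide-based chunking: one chunk per slide unless the slide is dense.

    Dense slides are split on block boundaries so each sub-slide chunk stays
    near max_chars (newline separators are not counted). A single block longer
    than max_chars is kept whole rather than cut mid-block.
    """
    label = [f"Slide {slide_number}: {title}"]
    chunks: List[Chunk] = []
    current: List[str] = []
    size = 0
    for block in blocks:
        if current and size + len(block) > max_chars:
            chunks.append(Chunk("\n".join(current), label))
            current, size = [], 0
        current.append(block)
        size += len(block)
    if current:
        chunks.append(Chunk("\n".join(current), label))
    return chunks
```

For example, `chunk_by_sections([(1, "Intro"), (0, "a"), (2, "Scope"), (0, "b")])` yields two chunks whose `heading_path` values are `["Intro"]` and `["Intro", "Scope"]`. Keeping the heading path alongside each chunk is one way to preserve document structure for retrieval, for instance by prepending it to the text before embedding.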