xAI: Grok Collections API | SignalBreak


AI Impact Summary

### State-of-the-art RAG system built directly into our API.

Today, we're excited to announce Collections API. With Collections, you can upload and search through entire datasets. From PDFs and Excel sheets to entire codebases, you can upload your files into a knowledge base that supports precise and fast search. This allows developers to build RAG applications without the headache of managing indexing and retrieval infrastructure.

To help you get started, we're making file indexing and storage free for the first week*, with retrieval priced at a flat rate of $2.50 per 1,000 searches.

#### Indexing

- Powerful document understanding: We use OCR and layout-aware parsing to extract text while preserving structure such as the layout of a PDF, hierarchy of an Excel table, or the syntax of code.
- Smart file management: Easily upload, update, and download files. And when a file changes, our system efficiently reindexes it to ensure your collection is never stale.
- Broad format support: Collections supports a wide range of file types. (see full list)

#### Retrieval

Choose the retrieval method that best fits your use case:

- Semantic search: To search using the meaning and intent behind a query.
- Keyword search: For precise term matching.
- Hybrid search: For the highest accuracy, combine keyword and semantic search. We support both a dedicated reranker model and reciprocal rank fusion.

#### Benchmark Results

Our Collections API delivers state-of-the-art retrieval performance, matching or outperforming leading models in real-world RAG tasks across finance, legal, and coding domains. These fields are especially challenging due to their long, dense documents. To avoid hallucinations and deliver reliable answers, models must retrieve the exact passages and reason over them accurately.
Accuracy* (Higher is better)

| Task | xAI Grok 4.1 Fast | Google Gemini Pro 3 | OpenAI GPT 5.1 |
| --- | --- | --- | --- |
| Finance: Tabular and numerical questions | 93.0 | 85.9 | 84.7 |
| Legal: Complex reasoning over multiple chunks | 73.9 | 74.5 | 71.2 |
| Coding: Code understanding and large file systems | 86 | 85 | 81 |

*Internal source.

#### Financial Analysis

Extracting tabular and numerical data from files can be challenging with semantic search alone. Hybrid search enables you to accurately retrieve this data from documents such as SEC filings*, allowing the model to precisely reference information.

#### Legal Analysis (LegalBench)

The LegalBench dataset tests retrieval and reasoning over nuanced legal language and complex cross-references, consisting of 128 challenging question-answer pairs drawn from an extensive corpus of authentic commercial contracts across multiple datasets.

#### Codebase (DeepCodeBench)

Code understanding is crucial for applications such as code summarization and generation. We use the DeepCodeBench dataset to comprehensively benchmark for this. It features a diverse set of tasks drawn from real-world open-source repositories, API usage, and complex algorithmic probl
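
#### Sketch: reciprocal rank fusion

The Retrieval section names reciprocal rank fusion as one way hybrid search combines keyword and semantic results. The announcement does not describe how Collections implements it, so the following is only a minimal sketch of the generic technique: each document receives a score of 1 / (k + rank) from every result list it appears in, and the scores are summed. The function name, the constant `k=60` (a commonly used default), and the file names are all assumptions made for illustration, not part of the Collections API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two retrievers.
keyword_hits = ["10-K_2023.pdf", "10-Q_2024.pdf", "press_release.pdf"]
semantic_hits = ["10-Q_2024.pdf", "earnings_call.pdf", "10-K_2023.pdf"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still stays in the fused list. Because the method uses ranks rather than raw scores, it does not need the keyword and semantic scores to be on a comparable scale.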

Source text

xAI: Grok Collections API
