Introduction

Welcome to the Ragextract Documentation
Here you'll learn how to integrate our Subworkflow APIs and start building production-ready RAG applications.

Ragextract aims to be a simple document search service to power AI agents and automations. It does this by providing:

  1. An API service for uploading PDF, DOCX and PPTX documents of up to 300 MB or 500 pages.
  2. Search and retrieval APIs over the contents of one or many uploaded documents.
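
The upload limits above can be checked on the client before a file is sent. The following is a minimal sketch and not part of the Ragextract API: the function name `check_upload` is hypothetical, "300mb" is read here as 300 × 1024 × 1024 bytes (an assumption), both limits are treated as hard caps (also an assumption), and the page count is supplied by the caller because counting pages depends on the file format.

```python
from pathlib import Path
from typing import List, Optional

# Limits taken from the list above; the byte interpretation of "300mb" is an assumption.
MAX_BYTES = 300 * 1024 * 1024
MAX_PAGES = 500
# Only the formats named above; the service may accept others.
ALLOWED_SUFFIXES = {".pdf", ".docx", ".pptx"}


def check_upload(filename: str, size_bytes: int, page_count: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    # The page count is optional because it may not be known before upload.
    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"document has {page_count} pages, limit is {MAX_PAGES}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a file in a single pass.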

Ragextract is best used in AI projects where Q&A workflows are key, i.e. extracting specific data from within a document.

  • Real Estate & Property: Process hundreds of contracts, surveys and proposals without bottlenecks, allowing you to serve more customers.
  • Construction & Building: Work through RFPs, questionnaires and site plans in seconds to get answers quickly.
  • Legal & Compliance: Be free of arbitrary page limits and easily create vector search indexes over documents from 1 page to 5,000 pages or more.
  • Finance & Accounting: Support your structured outputs workflow when reports are heavy with charts and graphs and typically exceed 100 pages.
  • Academia & Research: Simplify research paper automation without adding infrastructure.

Features

  • Simple API schema: Designed for speed, the Datasets API is light and minimal to make integration easier and faster.
  • Distributed Document Processing: Under the hood, a powerful processing engine ensures durability and reliability for your projects.
  • Document Conversion: Supports PDF, DOCX, PPTX and more, converting pages into images for vision-language models.
  • Efficient Document Page Retrieval: After processing a document, retrieve only the relevant pages rather than all of them to reduce memory load.
  • Automated Search Indexing: Search APIs over document contents are provided via state-of-the-art image embeddings, saving you time.
  • Self-Expiring Asset Links: Protect against leaked asset links when passing pages to an LLM provider, and improve compliance.
  • Web Portal: Manage your organisation, team, workspaces and generate API keys to use the Datasets API.
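
To illustrate the idea behind self-expiring links, here is a generic sketch of one common approach: a URL that carries an expiry timestamp and an HMAC signature. It is not Ragextract's implementation, whose link format is not described here. The secret, the `expires` and `sig` parameter names, and the helpers `make_link` and `is_valid` are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def _sign(path: str, expires: int) -> str:
    """HMAC-SHA256 over the URL path and expiry, as a hex string."""
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_link(base_url: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Append an expiry timestamp and a signature to base_url."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    path = urlparse(base_url).path
    query = urlencode({"expires": expires, "sig": _sign(path, expires)})
    return f"{base_url}?{query}"


def is_valid(link: str, now: Optional[int] = None) -> bool:
    """Accept the link only if the signature matches and it has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(link)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(sig, _sign(parsed.path, expires)):
        return False
    return now < expires
```

With a scheme like this, a link that leaks after its expiry time is useless, and changing the path or the expiry invalidates the signature.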

Getting Support

Email and private chat support are available to our Standard and Enterprise users. Please contact [email protected]. All members, including Starter Plan members, have access to community support via Discord.

About Us

Subworkflow AI is dedicated to building AI subworkflows for AI developers. We're a small products lab based in London, UK, and enjoy collaborating on AI projects all over the world. Visit our Contacts Page to get in touch.

Company

Ragextract is owned and operated by Subworkflow AI Limited (16781125), which is registered in England & Wales, United Kingdom. Our registered address is 71-75 Shelton Street, Covent Garden, London, WC2H 9JQ, United Kingdom. Please read our terms of use, privacy policy and acceptable usage policy before using our service.

2025 © Subworkflow AI Limited. All rights reserved.