Overview

Welcome to the Unstructured.io integration guide for Julep! This integration allows you to extract structured information from a wide variety of document formats, enabling you to build workflows that leverage advanced document processing capabilities. Whether you’re developing a document analysis system, creating a RAG pipeline, or need to convert unstructured documents into structured data, this guide will walk you through the setup and usage.

Prerequisites

To use the Unstructured.io integration, you need an API key. You can obtain this key by signing up at Unstructured.io.

How to Use the Integration

To get started with the Unstructured.io integration, follow these steps to configure and create a task:

1. Configure Your API Key

Add your API key under setup in the tools section of your task definition. This allows Julep to authenticate requests to Unstructured.io on your behalf.

2. Create Task Definition

Use the following YAML configuration to define your document parsing task:
Unstructured Example
name: Unstructured Document Processing Task
tools:
- name: unstructured_processor
  type: integration
  integration:
    provider: unstructured
    method: parse
    setup:
      unstructured_api_key: "UNSTRUCTURED_API_KEY"

main:
- tool: unstructured_processor
  arguments:
    file: document_base64 # placeholder: the Base64-encoded file content
    filename: document.pdf # placeholder: the actual filename
    partition_params:
      key1: value1 # placeholders: replace with real partition parameters
      key2: value2
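
If you build the task definition in code rather than in a YAML file, the API key can be read from an environment variable instead of being hard-coded. The following is a minimal sketch, not the official method: the helper names build_task_definition and task_from_env are hypothetical, the environment variable name UNSTRUCTURED_API_KEY is an assumption, and the optional partition_params are left out for brevity.

```python
import os


def build_task_definition(api_key: str) -> dict:
    """Mirror the YAML task definition above as a plain Python dict."""
    return {
        "name": "Unstructured Document Processing Task",
        "tools": [
            {
                "name": "unstructured_processor",
                "type": "integration",
                "integration": {
                    "provider": "unstructured",
                    "method": "parse",
                    "setup": {"unstructured_api_key": api_key},
                },
            }
        ],
        "main": [
            {
                "tool": "unstructured_processor",
                "arguments": {
                    "file": "document_base64",   # placeholder, as in the YAML
                    "filename": "document.pdf",  # placeholder, as in the YAML
                },
            }
        ],
    }


def task_from_env(var_name: str = "UNSTRUCTURED_API_KEY") -> dict:
    """Build the task definition using a key read from the environment.

    The variable name is an assumption; use whatever name your setup defines.
    """
    api_key = os.environ.get(var_name)
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return build_task_definition(api_key)
```

Keeping the key out of the definition itself means the same file or function can be committed to version control without exposing credentials.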

3. Run Task

Run your task by creating a new execution.
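
As a rough illustration of this step, the sketch below starts an execution from Python. It assumes a client object that exposes executions.create(task_id=..., input=...), as the Julep Python SDK client does. The function name start_execution and the input keys file and filename are hypothetical; the task would need to reference those input values in its arguments, which the placeholder YAML above does not show.

```python
from typing import Any


def start_execution(client: Any, task_id: str, file_b64: str, filename: str) -> Any:
    """Create a new execution of an existing task.

    Assumption: `client` provides `executions.create(task_id=..., input=...)`.
    The input keys used here are illustrative, not mandated by the integration.
    """
    return client.executions.create(
        task_id=task_id,
        input={"file": file_b64, "filename": filename},
    )
```

The client is passed in as a parameter so the function can be exercised with a stub before it is pointed at a real account.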

YAML Explanation

  • name: A descriptive name for the task, in this case, “Unstructured Document Processing Task”.
  • tools: This section lists the tools or integrations being used. Here, unstructured_processor is defined as an integration tool.
  • type: Specifies the type of tool, which is integration in this context.
  • integration: Details the provider and setup for the integration.
    • provider: Indicates the service provider, which is unstructured for Unstructured.io.
    • method: Indicates the method to be used, which is parse for Unstructured.io.
    • setup: Contains the configuration details:
      • unstructured_api_key: (Required) The API key for your Unstructured.io account.
      • server_url: (Optional) Custom API endpoint URL if needed.
      • server: (Optional) Server name to use.
      • url_params: (Optional) Dictionary of parameters to template the server URL with.
      • timeout_ms: (Optional) Request timeout in milliseconds.
  • main: Defines the main execution steps.
    • tool: Refers to the tool defined earlier (unstructured_processor).
    • arguments: Specifies the input parameters for the tool:
      • file: The file contents as a Base64-encoded string.
      • filename: (Optional) The name of the file, which helps with file type detection. If no filename is provided, a random UUID is generated.
      • partition_params: (Optional) Advanced parameters for document processing. To see the full list of parameters, please refer to the Unstructured.io API documentation.
  • Replace UNSTRUCTURED_API_KEY with your actual API key, or supply it through an environment variable. The file must be properly Base64-encoded before it is passed to the integration.
  • Unstructured.io supports a wide range of file types including PDFs, Word documents, PowerPoint presentations, Excel spreadsheets, emails, HTML, images, and more. For a full list of supported file types, please refer to the Unstructured.io documentation.
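
Because the file argument expects a Base64-encoded string, it may help to see one way of producing it. This is a minimal sketch using only the Python standard library; the function name encode_file_base64 is hypothetical.

```python
import base64
from pathlib import Path


def encode_file_base64(path: str) -> str:
    """Read a file as raw bytes and return its contents as a Base64 string."""
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decode to str so it can be placed in the task input.
    return base64.b64encode(raw).decode("ascii")
```

For example, encode_file_base64("document.pdf") would return the string to pass as file, with document.pdf passed as filename.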

Conclusion

With the Unstructured.io integration, you can efficiently convert unstructured documents into structured data for analysis, search, and AI applications. This integration provides a powerful solution for document processing, enhancing your workflow’s capabilities and enabling advanced RAG (Retrieval-Augmented Generation) pipelines.
For more information, please refer to the Unstructured.io documentation.