
Overview

This tutorial demonstrates how to:
  • Upload and process videos using Cloudinary integration
  • Extract and analyze video content
  • Add overlays and transformations
  • Process video subtitles and speaker information

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:
input_schema:
  type: object
  properties:
    upload_file:
      type: string
      description: The url of the file to upload
    public_id:
      type: string
      description: The public id of the file to upload
    transformation_prompt:
      type: string
      description: The prompt for the transformations to apply to the file
    subtitle_vtt:
      type: string
      description: The vtt file content to add subtitles to the video
This schema specifies that our task expects:
  • A video file URL
  • A public ID for the video
  • A transformation prompt describing desired changes
  • VTT subtitle content (optional)
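
For reference, a matching input payload might look like the following sketch; the values are purely illustrative and subtitle_vtt can be omitted:

# Illustrative input payload matching the schema above (all values are examples)
task_input = {
    "upload_file": "https://example.com/videos/team_meeting.mp4",
    "public_id": "team_meeting",
    "transformation_prompt": "Add speaker name overlays and trim the video to 30 seconds",
    "subtitle_vtt": "WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nHello and welcome, everyone.",
}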

2. Tools Configuration

Next, we define the external tools our task will use:
- name: cloudinary_upload
  type: integration
  integration:
    provider: cloudinary
    method: media_upload
    setup:
      cloudinary_api_key: YOUR_CLOUDINARY_API_KEY
      cloudinary_api_secret: YOUR_CLOUDINARY_API_SECRET
      cloudinary_cloud_name: YOUR_CLOUDINARY_CLOUD_NAME

- name: cloudinary_edit
  type: integration
  integration:
    provider: cloudinary
    method: media_edit

- name: ffmpeg_edit
  type: integration
  integration:
    provider: ffmpeg
We’re using two integrations through three tools:
  • Cloudinary (cloudinary_upload and cloudinary_edit) for video uploads and transformations
  • FFmpeg (ffmpeg_edit) for additional video processing capabilities
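
The setup block holds your Cloudinary credentials. Rather than committing them to the YAML, you can inject them when loading the task definition, as in this sketch (the environment variable names and the video_processing_task.yaml filename from the Usage section below are assumptions):

import os
import yaml

# Load the task definition and inject the Cloudinary credentials from the
# environment instead of hard-coding them in the YAML file.
with open("video_processing_task.yaml") as f:
    task_definition = yaml.safe_load(f)

for tool in task_definition.get("tools", []):
    integration = tool.get("integration", {})
    if integration.get("provider") == "cloudinary" and "setup" in integration:
        integration["setup"].update(
            cloudinary_api_key=os.environ["CLOUDINARY_API_KEY"],
            cloudinary_api_secret=os.environ["CLOUDINARY_API_SECRET"],
            cloudinary_cloud_name=os.environ["CLOUDINARY_CLOUD_NAME"],
        )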

3. Main Workflow Steps

Step 1: Initial Video Upload

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
The steps[0].input variable refers to the initial input object passed to the task. It’s used to access the input parameters defined in the input schema.
This step:
  • Takes the input video URL
  • Uploads it to Cloudinary
  • Specifies the resource type as video
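
For intuition, the cloudinary_upload tool call above does roughly what the following direct use of the Cloudinary Python SDK does (a sketch only; in the task, Julep’s integration supplies the credentials and performs the API call for you, and the URL and public ID below are placeholders):

import cloudinary
import cloudinary.uploader

# Configure the SDK with the same credentials used in the tool setup
cloudinary.config(
    cloud_name="YOUR_CLOUDINARY_CLOUD_NAME",
    api_key="YOUR_CLOUDINARY_API_KEY",
    api_secret="YOUR_CLOUDINARY_API_SECRET",
)

# Upload a remote video by URL, mirroring the step's arguments
result = cloudinary.uploader.upload(
    "https://example.com/videos/team_meeting.mp4",  # the upload_file input
    public_id="team_meeting",
    resource_type="video",
)
print(result["secure_url"])  # URL of the uploaded video on Cloudinary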

Step 2: Create Video Preview

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation:
        - start_offset: 0
          end_offset: 30
This step:
  • Creates a 30-second preview of the video
  • Useful for quick analysis and processing
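
Under the hood, start_offset and end_offset map to Cloudinary’s so_ and eo_ URL parameters, so the generated preview is served from a trimmed delivery URL roughly like the one sketched below (cloud name and public ID are placeholders):

# Illustrative delivery URL for the 30-second preview (so_ = start offset, eo_ = end offset)
cloud_name = "YOUR_CLOUDINARY_CLOUD_NAME"
public_id = "team_meeting"
preview_url = f"https://res.cloudinary.com/{cloud_name}/video/upload/so_0,eo_30/{public_id}.mp4"
print(preview_url)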

Step 3: Analyze Video Content

- prompt:
  - role: user
    content: 
      - type: image_url
        image_url:
          url: $ _.url
      - type: text
        text: |-
          Which speakers are speaking in the video? And where does each of them sit?
This step:
  • Analyzes the video content
  • Identifies speakers and their positions
  • Can use the optional VTT subtitle content for additional context
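
The speaker information extracted here feeds the next step through _.speakers_json, so the model’s answer needs to be parsed into a structure along the following lines. The exact shape is an assumption, inferred from the fields the next step reads (speaker, position, timestamps[0].start):

# Hypothetical shape of speakers_json consumed by the next step
speakers_json = [
    {"speaker": "Speaker 1", "position": "left", "timestamps": [{"start": 2.5, "end": 14.0}]},
    {"speaker": "Speaker 2", "position": "right", "timestamps": [{"start": 15.0, "end": 27.5}]},
]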

Step 4: Generate Speaker Transformations

- evaluate:
    speakers_transformations: |-
      $ [
      transform
      for speaker in _.speakers_json
        for transform in [
        {
          "overlay": {"font_family": "Arial", "font_size": 32, "text": speaker.speaker},
          "color": "white"
        },
        {
          "duration": 5,
          "flags": "layer_apply",
          "gravity": "south_east" if speaker.position == "right" else "south_west",
          "start_offset": speaker.timestamps[0].start,
          "y": 80,
          "x": 80
        }
        ]
      ]
This step:
  • Creates transformations for each speaker
  • Adds speaker labels with proper positioning
  • Sets timing for each overlay
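
The evaluate expression is essentially a Python list comprehension. Run standalone against sample data in the shape shown earlier (the sample values are assumptions), it produces a flat list of overlay/placement pairs:

# Sample input: two speakers, one timestamp each
speakers_json = [
    {"speaker": "Speaker 1", "position": "left", "timestamps": [{"start": 2.5}]},
    {"speaker": "Speaker 2", "position": "right", "timestamps": [{"start": 15.0}]},
]

# Equivalent standalone comprehension: one overlay dict plus one placement dict per speaker
speakers_transformations = [
    transform
    for speaker in speakers_json
    for transform in [
        {
            "overlay": {"font_family": "Arial", "font_size": 32, "text": speaker["speaker"]},
            "color": "white",
        },
        {
            "duration": 5,
            "flags": "layer_apply",
            "gravity": "south_east" if speaker["position"] == "right" else "south_west",
            "start_offset": speaker["timestamps"][0]["start"],
            "y": 80,
            "x": 80,
        },
    ]
]

# Two speakers yield four dicts; Cloudinary treats each overlay/layer_apply pair as one text layer
print(len(speakers_transformations))  # 4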

Step 5: Apply Transformations

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation: $ _.speakers_transformations
This step:
  • Uses the Cloudinary upload tool to apply the generated transformations
  • Processes the video with speaker labels and positioning
  • Returns a URL to the transformed video with all overlays applied

Complete Task Definition

Putting it all together, here is the full task definition in YAML:

# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/src/schemas/create_task_request.json
name: Julep Video Processing Task
description: A Julep agent that can process and analyze videos using Cloudinary

########################################################
################### INPUT SCHEMA #######################
########################################################

input_schema:
  type: object
  properties:
    video_url:
      type: string
      description: The url of the file to upload
    public_id:
      type: string
      description: The public id of the file to upload
    transformation_prompt:
      type: string
      description: The prompt for the transformations to apply to the file

########################################################
################### TOOLS ##############################
########################################################

tools:
- name: cloudinary_upload
  type: integration
  integration:
    provider: cloudinary
    method: media_upload
    setup:
      cloudinary_api_key: "YOUR_CLOUDINARY_API_KEY"
      cloudinary_api_secret: "YOUR_CLOUDINARY_API_SECRET"
      cloudinary_cloud_name: "YOUR_CLOUDINARY_CLOUD_NAME"

########################################################
################### MAIN WORKFLOW ######################
########################################################

main:
# Step #0 - Upload the video to cloudinary
- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.video_url
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video

# Step #1 - Analyze the video content with a prompt
- prompt:
  - role: user
    content:

      - type: text
        text: |-
          You are a Cloudinary expert. You are given a media url. It might be an image or a video.
          You need to come up with a json of transformations to apply to the given media.
          Overall the json could have multiple transformation json objects.
          Each transformation json object can have multiple key value pairs.
          Each key value pair should have the key as the transformation name like "aspect_ratio", "crop", "width" etc and the value as the transformation parameter value.
          Given below is an example of a transformation json list. Don't provide explanations and/or comments in the json.
          ```json
          [
            {{
              "aspect_ratio": "1.0",
              "width": 250,
            }},
            {{
              "fetch_format": "auto"
            }},
            {{
              "overlay":
              {{
                "url": "<image_url>"
              }}
            }},
            {{
              "flags": "layer_apply"
            }}
          ]
          ```
      - type: image_url
        image_url:
          url: $ _.url

      - type: text
        text: |-
          $ f'''Hey, check the video above, I need to apply the following transformations using cloudinary.
          {steps[0].input.transformation_prompt}'''

  unwrap: true
  settings:
    model: gemini/gemini-1.5-pro

# Step #2 - Extract the json from the model's response
- evaluate:
    model_transformation: >-
      $ load_json(
        _[_.find("```json")+7:][:_[_.find("```json")+7:].find("```")])

# Step #3 - Re-upload the video to cloudinary with the generated transformations
- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.video_url
    public_id: $ steps[0].input.public_id
    upload_params:
      transformation: $ _.model_transformation
      resource_type: video

# Step #4 - Evaluate the transformed video url
- evaluate:
    transformed_video_url: $ _.url
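
Step #2 above slices the JSON block out of the model’s markdown reply before parsing it with load_json. The same logic as a standalone Python helper (using the standard json module in place of the workflow’s load_json) looks roughly like this:

import json

def extract_json_block(text: str):
    """Slice the first ```json fenced block out of a model reply and parse it."""
    start = text.find("```json") + len("```json")  # same as the +7 offset in the evaluate step
    body = text[start:]
    return json.loads(body[: body.find("```")])

reply = 'Sure!\n```json\n[{"fetch_format": "auto"}]\n```\nLet me know if you need more.'
print(extract_json_block(reply))  # [{'fetch_format': 'auto'}]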

Usage

Here’s how to use this task with the Julep SDK:
import os
import time
import yaml
from julep import Client

# Initialize the client (reads your Julep API key from the environment)
JULEP_API_KEY = os.environ["JULEP_API_KEY"]
client = Client(api_key=JULEP_API_KEY)

transformation_prompt = """
1- I want to add an overlay of the following image to the video, and also apply a layer_apply flag. Here's the image url:
https://res.cloudinary.com/demo/image/upload/logos/cloudinary_icon_white.png

2- I also want you to blur the video, and add fade-in and fade-out effects with a duration of 3 seconds each.
"""
# Create the agent
agent = client.agents.create(
  name="Julep Video Processing Agent",
  about="A Julep agent that can process and analyze videos using Cloudinary and FFmpeg.",
)

# Load the task definition
with open('video_processing_task.yaml', 'r') as file:
  task_definition = yaml.safe_load(file)

# Create the task
task = client.tasks.create(
  agent_id=agent.id,
  **task_definition
)

# Create the execution
execution = client.executions.create(
    task_id=task.id,
    input={
        "video_url":  "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4",
        "public_id": "video_test",
        "transformation_prompt": transformation_prompt,
    }
)
# Wait for the execution to complete
while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
    print(result.status)
    time.sleep(1)

# Print the result
if result.status == "succeeded":
    print(result.output)
else:
    print(f"Error: {result.error}")

Example Output

When the task is run on the sample video input above, the final evaluate step returns transformed_video_url, a link to the processed video with the logo overlay, blur, and fade effects applied.

Next Steps
