
Overview

This tutorial demonstrates how to:
  • Upload and process videos using Cloudinary integration
  • Extract and analyze video content
  • Add overlays and transformations
  • Process video subtitles and speaker information

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:
input_schema:
  type: object
  properties:
    upload_file:
      type: string
      description: The url of the file to upload
    public_id:
      type: string
      description: The public id of the file to upload
    transformation_prompt:
      type: string
      description: The prompt for the transformations to apply to the file
    subtitle_vtt:
      type: string
      description: The vtt file content to add subtitles to the video
This schema specifies that our task expects:
  • A video file URL
  • A public ID for the video
  • A transformation prompt describing desired changes
  • VTT subtitle content (optional)
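
For reference, a matching input payload might look like the following sketch; the values are purely illustrative and subtitle_vtt can be omitted:

# Illustrative input payload matching the schema above (all values are examples)
task_input = {
    "upload_file": "https://example.com/videos/team_meeting.mp4",
    "public_id": "team_meeting",
    "transformation_prompt": "Add speaker name overlays and trim the video to 30 seconds",
    "subtitle_vtt": "WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nHello and welcome, everyone.",
}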

2. Tools Configuration

Next, we define the external tools our task will use:
- name: cloudinary_upload
  type: integration
  integration:
    provider: cloudinary
    method: media_upload
    setup:
      cloudinary_api_key: YOUR_CLOUDINARY_API_KEY
      cloudinary_api_secret: YOUR_CLOUDINARY_API_SECRET
      cloudinary_cloud_name: YOUR_CLOUDINARY_CLOUD_NAME

- name: cloudinary_edit
  type: integration
  integration:
    provider: cloudinary
    method: media_edit

- name: ffmpeg_edit
  type: integration
  integration:
    provider: ffmpeg
We’re using two integrations through three tools:
  • Cloudinary (cloudinary_upload and cloudinary_edit) for video uploads and transformations
  • FFmpeg (ffmpeg_edit) for additional video processing capabilities
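
The setup block holds your Cloudinary credentials. Rather than committing them to the YAML, you can inject them when loading the task definition, as in this sketch (the environment variable names and the video_processing_task.yaml filename from the Usage section below are assumptions):

import os
import yaml

# Load the task definition and inject the Cloudinary credentials from the
# environment instead of hard-coding them in the YAML file.
with open("video_processing_task.yaml") as f:
    task_definition = yaml.safe_load(f)

for tool in task_definition.get("tools", []):
    integration = tool.get("integration", {})
    if integration.get("provider") == "cloudinary" and "setup" in integration:
        integration["setup"].update(
            cloudinary_api_key=os.environ["CLOUDINARY_API_KEY"],
            cloudinary_api_secret=os.environ["CLOUDINARY_API_SECRET"],
            cloudinary_cloud_name=os.environ["CLOUDINARY_CLOUD_NAME"],
        )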

3. Main Workflow Steps

Step 1: Initial Video Upload

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
The steps[0].input variable refers to the initial input object passed to the task. It’s used to access the input parameters defined in the input schema.
This step:
  • Takes the input video URL
  • Uploads it to Cloudinary
  • Specifies the resource type as video
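
For intuition, the cloudinary_upload tool call above does roughly what the following direct use of the Cloudinary Python SDK does (a sketch only; in the task, Julep’s integration supplies the credentials and performs the API call for you, and the URL and public ID below are placeholders):

import cloudinary
import cloudinary.uploader

# Configure the SDK with the same credentials used in the tool setup
cloudinary.config(
    cloud_name="YOUR_CLOUDINARY_CLOUD_NAME",
    api_key="YOUR_CLOUDINARY_API_KEY",
    api_secret="YOUR_CLOUDINARY_API_SECRET",
)

# Upload a remote video by URL, mirroring the step's arguments
result = cloudinary.uploader.upload(
    "https://example.com/videos/team_meeting.mp4",  # the upload_file input
    public_id="team_meeting",
    resource_type="video",
)
print(result["secure_url"])  # URL of the uploaded video on Cloudinary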

Step 2: Create Video Preview

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation:
        - start_offset: 0
          end_offset: 30
This step:
  • Creates a 30-second preview of the video
  • Useful for quick analysis and processing
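
Under the hood, start_offset and end_offset map to Cloudinary’s so_ and eo_ URL parameters, so the generated preview is served from a trimmed delivery URL roughly like the one sketched below (cloud name and public ID are placeholders):

# Illustrative delivery URL for the 30-second preview (so_ = start offset, eo_ = end offset)
cloud_name = "YOUR_CLOUDINARY_CLOUD_NAME"
public_id = "team_meeting"
preview_url = f"https://res.cloudinary.com/{cloud_name}/video/upload/so_0,eo_30/{public_id}.mp4"
print(preview_url)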

Step 3: Analyze Video Content

- prompt:
  - role: user
    content: 
      - type: image_url
        image_url:
          url: $ _.url
      - type: text
        text: |-
          Which speakers are speaking in the video? And where does each of them sit?
This step:
  • Analyzes the video content
  • Identifies speakers and their positions
  • Can use the optional VTT subtitle content for additional context
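
The speaker information extracted here feeds the next step through _.speakers_json, so the model’s answer needs to be parsed into a structure along the following lines. The exact shape is an assumption, inferred from the fields the next step reads (speaker, position, timestamps[0].start):

# Hypothetical shape of speakers_json consumed by the next step
speakers_json = [
    {"speaker": "Speaker 1", "position": "left", "timestamps": [{"start": 2.5, "end": 14.0}]},
    {"speaker": "Speaker 2", "position": "right", "timestamps": [{"start": 15.0, "end": 27.5}]},
]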

Step 4: Generate Speaker Transformations

- evaluate:
    speakers_transformations: |-
      $ [
      transform
      for speaker in _.speakers_json
        for transform in [
        {
          "overlay": {"font_family": "Arial", "font_size": 32, "text": speaker.speaker},
          "color": "white"
        },
        {
          "duration": 5,
          "flags": "layer_apply",
          "gravity": "south_east" if speaker.position == "right" else "south_west",
          "start_offset": speaker.timestamps[0].start,
          "y": 80,
          "x": 80
        }
        ]
      ]
This step:
  • Creates transformations for each speaker
  • Adds speaker labels with proper positioning
  • Sets timing for each overlay
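
The evaluate expression is essentially a Python list comprehension. Run standalone against sample data in the shape shown earlier (the sample values are assumptions), it produces a flat list of overlay/placement pairs:

# Sample input: two speakers, one timestamp each
speakers_json = [
    {"speaker": "Speaker 1", "position": "left", "timestamps": [{"start": 2.5}]},
    {"speaker": "Speaker 2", "position": "right", "timestamps": [{"start": 15.0}]},
]

# Equivalent standalone comprehension: one overlay dict plus one placement dict per speaker
speakers_transformations = [
    transform
    for speaker in speakers_json
    for transform in [
        {
            "overlay": {"font_family": "Arial", "font_size": 32, "text": speaker["speaker"]},
            "color": "white",
        },
        {
            "duration": 5,
            "flags": "layer_apply",
            "gravity": "south_east" if speaker["position"] == "right" else "south_west",
            "start_offset": speaker["timestamps"][0]["start"],
            "y": 80,
            "x": 80,
        },
    ]
]

# Two speakers yield four dicts; Cloudinary treats each overlay/layer_apply pair as one text layer
print(len(speakers_transformations))  # 4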

Step 5: Apply Transformations

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation: $ _.speakers_transformations
This step:
  • Uses the Cloudinary upload tool to apply the generated transformations
  • Processes the video with speaker labels and positioning
  • Returns a URL to the transformed video with all overlays applied

Complete Task Definition

Putting it all together, here is the full task definition in YAML:

# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/src/schemas/create_task_request.json
name: Julep Video Processing Task
description: A Julep agent that can process and analyze videos using Cloudinary

########################################################
################### INPUT SCHEMA #######################
########################################################

input_schema:
  type: object
  properties:
    video_url:
      type: string
      description: The url of the file to upload
    public_id:
      type: string
      description: The public id of the file to upload
    transformation_prompt:
      type: string
      description: The prompt for the transformations to apply to the file

########################################################
################### TOOLS ##############################
########################################################

tools:
- name: cloudinary_upload
  type: integration
  integration:
    provider: cloudinary
    method: media_upload
    setup:
      cloudinary_api_key: "YOUR_CLOUDINARY_API_KEY"
      cloudinary_api_secret: "YOUR_CLOUDINARY_API_SECRET"
      cloudinary_cloud_name: "YOUR_CLOUDINARY_CLOUD_NAME"

########################################################
################### MAIN WORKFLOW ######################
########################################################

main:
# Step #0 - Upload the video to cloudinary
- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.video_url
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video

# Step #1 - Analyze the video content with a prompt
- prompt:
  - role: user
    content:

      - type: text
        text: |-
          You are a Cloudinary expert. You are given a media url. It might be an image or a video.
          You need to come up with a json of transformations to apply to the given media.
          Overall the json could have multiple transformation json objects.
          Each transformation json object can have multiple key value pairs.
          Each key value pair should have the key as the transformation name like "aspect_ratio", "crop", "width" etc and the value as the transformation parameter value.
          Given below is an example of a transformation json list. Don't provide explanations and/or comments in the json.
          ```json
          [
            {{
              "aspect_ratio": "1.0",
              "width": 250,
            }},
            {{
              "fetch_format": "auto"
            }},
            {{
              "overlay":
              {{
                "url": "<image_url>"
              }}
            }},
            {{
              "flags": "layer_apply"
            }}
          ]
          ```
      - type: image_url
        image_url:
          url: $ _.url

      - type: text
        text: |-
          $ f'''Hey, check the video above, I need to apply the following transformations using cloudinary.
          {steps[0].input.transformation_prompt}'''

  unwrap: true
  settings:
    model: gemini/gemini-1.5-pro

# Step #2 - Extract the json from the model's response
- evaluate:
    model_transformation: >-
      $ load_json(
        _[_.find("```json")+7:][:_[_.find("```json")+7:].find("```")])

# Step #3 - Re-upload the video to cloudinary with the generated transformations
- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.video_url
    public_id: $ steps[0].input.public_id
    upload_params:
      transformation: $ _.model_transformation
      resource_type: video

# Step #4 - Evaluate the transformed video url
- evaluate:
    transformed_video_url: $ _.url
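
Step #2 above slices the JSON block out of the model’s markdown reply before parsing it with load_json. The same logic as a standalone Python helper (using the standard json module in place of the workflow’s load_json) looks roughly like this:

import json

def extract_json_block(text: str):
    """Slice the first ```json fenced block out of a model reply and parse it."""
    start = text.find("```json") + len("```json")  # same as the +7 offset in the evaluate step
    body = text[start:]
    return json.loads(body[: body.find("```")])

reply = 'Sure!\n```json\n[{"fetch_format": "auto"}]\n```\nLet me know if you need more.'
print(extract_json_block(reply))  # [{'fetch_format': 'auto'}]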

Usage

Here’s how to use this task with the Julep SDK:
import os
import time
import yaml
from julep import Client

# Initialize the client (reads your Julep API key from the environment)
JULEP_API_KEY = os.environ["JULEP_API_KEY"]
client = Client(api_key=JULEP_API_KEY)

transformation_prompt = """
1- I want to add an overlay of the following image to the video, and also apply a layer_apply flag. Here's the image url:
https://res.cloudinary.com/demo/image/upload/logos/cloudinary_icon_white.png

2- I also want you to blur the video, and add fade-in and fade-out effects with a duration of 3 seconds each.
"""
# Create the agent
agent = client.agents.create(
  name="Julep Video Processing Agent",
  about="A Julep agent that can process and analyze videos using Cloudinary and FFmpeg.",
)

# Load the task definition
with open('video_processing_task.yaml', 'r') as file:
  task_definition = yaml.safe_load(file)

# Create the task
task = client.tasks.create(
  agent_id=agent.id,
  **task_definition
)

# Create the execution
execution = client.executions.create(
    task_id=task.id,
    input={
        "video_url":  "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4",
        "public_id": "video_test",
        "transformation_prompt": transformation_prompt,
    }
)
# Wait for the execution to complete
while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
    print(result.status)
    time.sleep(1)

# Print the result
if result.status == "succeeded":
    print(result.output)
else:
    print(f"Error: {result.error}")

Example Output

When the task is run on the sample video input above, the final evaluate step returns transformed_video_url, a link to the processed video with the logo overlay, blur, and fade effects applied.

Next Steps
