Overview

This tutorial demonstrates how to:

  • Upload and process videos using Cloudinary integration
  • Extract and analyze video content
  • Add overlays and transformations
  • Process video subtitles and speaker information

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:

input_schema:
  type: object
  properties:
    upload_file:
      type: string
      description: The URL of the file to upload
    public_id:
      type: string
      description: The public ID to assign to the uploaded file
    transformation_prompt:
      type: string
      description: The prompt for the transformations to apply to the file
    subtitle_vtt:
      type: string
      description: The VTT file content used to add subtitles to the video (optional)

This schema specifies that our task expects:

  • A video file URL
  • A public ID for the video
  • A transformation prompt describing desired changes
  • VTT subtitle content (optional)
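
As a concrete illustration, a payload matching this schema might look like the sketch below. The URLs and IDs are placeholders, not real assets, and `subtitle_vtt` is omitted because it is optional:

```python
# Example input payload matching the task's input_schema.
# All values below are placeholders, not real assets.
example_input = {
    "upload_file": "https://example.com/videos/meeting.mp4",   # video file URL
    "public_id": "meeting_recording",                          # Cloudinary public ID
    "transformation_prompt": "Add a watermark and trim to 30 seconds",
    # "subtitle_vtt" is optional and omitted here
}

# The three required fields must always be present:
required = {"upload_file", "public_id", "transformation_prompt"}
assert required.issubset(example_input)
```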

2. Tools Configuration

Next, we define the external tools our task will use:

- name: cloudinary_upload
  type: integration
  integration:
    provider: cloudinary
    method: media_upload
    setup:
      cloudinary_api_key: YOUR_CLOUDINARY_API_KEY
      cloudinary_api_secret: YOUR_CLOUDINARY_API_SECRET
      cloudinary_cloud_name: YOUR_CLOUDINARY_CLOUD_NAME

- name: cloudinary_edit
  type: integration
  integration:
    provider: cloudinary
    method: media_edit

- name: ffmpeg_edit
  type: integration
  integration:
    provider: ffmpeg

We’re defining three tools backed by two integration providers:

  • Cloudinary for video uploads and transformations
  • FFmpeg for additional video processing capabilities
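
To make the tool/provider split explicit, here is a small sketch that mirrors the configuration above as plain Python data and derives the set of providers (the credentials from the `setup` block are left out, since they are placeholders anyway):

```python
# Mirror of the tools configuration above, as plain Python data.
tools = [
    {"name": "cloudinary_upload", "type": "integration",
     "integration": {"provider": "cloudinary", "method": "media_upload"}},
    {"name": "cloudinary_edit", "type": "integration",
     "integration": {"provider": "cloudinary", "method": "media_edit"}},
    {"name": "ffmpeg_edit", "type": "integration",
     "integration": {"provider": "ffmpeg"}},
]

# Three tools, two distinct providers.
providers = {tool["integration"]["provider"] for tool in tools}
print(len(tools), sorted(providers))  # 3 ['cloudinary', 'ffmpeg']
```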

3. Main Workflow Steps

Step 1: Initial Video Upload

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.video_url
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video

This step:

  • Takes the input video URL
  • Uploads it to Cloudinary
  • Specifies the resource type as video

Step 2: Create Video Preview

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation:
        - start_offset: 0
          end_offset: 30

This step:

  • Creates a 30-second preview of the video
  • Useful for quick analysis and processing
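
The `start_offset`/`end_offset` pair simply trims the clip. A small helper (my own sketch, not part of the task definition) shows how the preview window could be computed so that videos shorter than the preview length are not extended past their actual duration:

```python
def preview_transformation(duration_s: float, preview_s: float = 30.0) -> list:
    """Build a Cloudinary-style trim transformation for a preview clip.

    The end offset is clamped to the video's duration, so short videos
    are not trimmed past their actual length.
    """
    end = min(preview_s, duration_s)
    return [{"start_offset": 0, "end_offset": end}]

print(preview_transformation(120.0))  # [{'start_offset': 0, 'end_offset': 30.0}]
print(preview_transformation(12.5))   # [{'start_offset': 0, 'end_offset': 12.5}]
```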

Step 3: Analyze Video Content

- prompt:
  - role: user
    content: 
      - type: image_url
        image_url:
          url: trimmed_video_url
      - type: text
        text: |-
          Which speakers are speaking in the video? And where does each of them sit?

This step:

  • Analyzes the video content
  • Identifies speakers and their positions
  • Uses VTT subtitles for additional context
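
In Python terms, this prompt is a single user turn whose content mixes an image part (pointing at the trimmed video URL) and a text part. The helper below is only an illustration of that structure; `trimmed_url` is a placeholder, not a value from a real execution:

```python
def build_analysis_prompt(trimmed_url: str) -> list:
    """Construct the multimodal prompt used to identify speakers."""
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": trimmed_url}},
            {"type": "text",
             "text": "Which speakers are speaking in the video? "
                     "And where does each of them sit?"},
        ],
    }]

messages = build_analysis_prompt("https://example.com/trimmed.mp4")
```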

Step 4: Generate Speaker Transformations

- evaluate:
    speakers_transformations: |-
      $ [
      transform
      for speaker in _.speakers_json
        for transform in [
        {
          "overlay": {"font_family": "Arial", "font_size": 32, "text": speaker.speaker},
          "color": "white"
        },
        {
          "duration": 5,
          "flags": "layer_apply",
          "gravity": "south_east" if speaker.position == "right" else "south_west",
          "start_offset": speaker.timestamps[0].start,
          "y": 80,
          "x": 80
        }
        ]
      ]

This step:

  • Creates transformations for each speaker
  • Adds speaker labels with proper positioning
  • Sets timing for each overlay
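
Since the expression above is essentially a Python comprehension, it can be run directly. The sketch below applies the same logic to a made-up `speakers_json` (the sample data is an assumption, not real task output; dictionary indexing replaces the attribute access used in the task expression):

```python
# Hypothetical analysis output: one speaker on each side of the frame.
speakers_json = [
    {"speaker": "Alice", "position": "left",
     "timestamps": [{"start": 0.0, "end": 5.0}]},
    {"speaker": "Bob", "position": "right",
     "timestamps": [{"start": 5.0, "end": 10.0}]},
]

speakers_transformations = [
    transform
    for speaker in speakers_json
    for transform in [
        # Text overlay carrying the speaker's name.
        {"overlay": {"font_family": "Arial", "font_size": 32,
                     "text": speaker["speaker"]},
         "color": "white"},
        # Placement of that overlay: bottom-right for speakers on the
        # right, bottom-left otherwise, timed to their first utterance.
        {"duration": 5,
         "flags": "layer_apply",
         "gravity": "south_east" if speaker["position"] == "right" else "south_west",
         "start_offset": speaker["timestamps"][0]["start"],
         "y": 80, "x": 80},
    ]
]

# Each speaker contributes an overlay plus a layer_apply placement.
print(len(speakers_transformations))  # 4
```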

Step 5: Apply Transformations

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation: $ _.speakers_transformations

This step:

  • Uses the Cloudinary upload tool to apply the generated transformations
  • Processes the video with speaker labels and positioning
  • Returns a URL to the transformed video with all overlays applied
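
Putting it together, the tool call's arguments could be assembled as below. This is only a sketch: `speakers_transformations` stands in for the list produced in the previous step, and the file URL and public ID are placeholders:

```python
# Stand-in for the output of the evaluate step above.
speakers_transformations = [
    {"overlay": {"font_family": "Arial", "font_size": 32, "text": "Alice"},
     "color": "white"},
    {"duration": 5, "flags": "layer_apply", "gravity": "south_west",
     "start_offset": 0.0, "y": 80, "x": 80},
]

# Arguments handed to the cloudinary_upload tool.
arguments = {
    "file": "https://example.com/videos/meeting.mp4",  # placeholder URL
    "public_id": "meeting_recording",                  # placeholder public ID
    "upload_params": {
        "resource_type": "video",
        "transformation": speakers_transformations,
    },
}
```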

Usage

Here’s how to use this task with the Julep SDK:

import time
import yaml
from julep import Client

# Initialize the client
client = Client(api_key=JULEP_API_KEY)

transformation_prompt = """
1- I want to add an overlay of the following image to the video, and also apply a layer_apply flag. Here's the image URL:
https://res.cloudinary.com/demo/image/upload/logos/cloudinary_icon_white.png

2- I also want you to blur the video, and add fade-in and fade-out effects with a duration of 3 seconds each.
"""
# Create the agent
agent = client.agents.create(
  name="Julep Video Processing Agent",
  about="A Julep agent that can process and analyze videos using Cloudinary and FFmpeg.",
)

# Load the task definition
with open('video_processing_task.yaml', 'r') as file:
  task_definition = yaml.safe_load(file)

# Create the task
task = client.tasks.create(
  agent_id=agent.id,
  **task_definition
)

# Create the execution
execution = client.executions.create(
    task_id=task.id,
    input={
        "video_url": "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4",
        "public_id": "video_test",
        "transformation_prompt": transformation_prompt,
    }
)
# Wait for the execution to complete
while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
    print(result.status)
    time.sleep(1)

# Print the result
if result.status == "succeeded":
    print(result.output)
else:
    print(f"Error: {result.error}")

Example Output

This is an example output when the task is run on the sample video input.

Next Steps