Overview

This tutorial demonstrates how to:
  • Fetch and filter top stories from the Hacker News API
  • Scrape full article content using a web scraping integration
  • Personalize content based on user preferences using AI
  • Generate concise summaries for curated stories
  • Process data in parallel for optimal performance

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:
input_schema:
  type: object
  properties:
    min_score:
      type: integer
      default: 50
      description: Minimum HN score required for a story to be included
    num_stories:
      type: integer
      default: 10
      description: Number of stories to include in newsletter
    user_preferences:
      type: array
      items:
        type: string
      description: User's technology interests (e.g., ["AI/ML", "Python", "Startups"])

This schema allows users to:
  • Set a minimum HN score threshold for quality filtering
  • Specify how many stories to include in the final newsletter
  • Define their technology interests for personalization (see the sample input below)
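
For example, a valid input document (the same values used in the Usage section later) might look like:
{
  "min_score": 100,
  "num_stories": 5,
  "user_preferences": ["AI/ML", "Python", "DevOps", "Cloud Computing"]
}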

2. Tools Configuration

Next, we define the external tools our task will use:
- name: fetch_hn_stories
  type: api_call
  api_call:
    method: GET
    url: https://hacker-news.firebaseio.com/v0/topstories.json
    headers:
      Content-Type: application/json

- name: get_story_details
  type: api_call
  api_call:
    method: GET
    url: "https://example.com"
    headers:
      Content-Type: application/json

- name: get_comment_details
  type: api_call
  api_call:
    method: GET
    url: https://hacker-news.firebaseio.com/v0/item/{{comment_id}}.json

- name: spider_fetch
  type: integration
  integration:
    provider: spider
    setup:
      spider_api_key: YOUR_SPIDER_API_KEY # replace with your Spider API key

We’re using:
  • Direct Hacker News API calls for stories and comments (probed in the snippet below)
  • Spider integration for advanced web scraping capabilities
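
To see what these two endpoints return before wiring them into the task, you can probe the Hacker News API directly. A standalone snippet (assumes the requests package is installed):
import requests

# topstories.json returns a JSON array of up to 500 story IDs
ids = requests.get("https://hacker-news.firebaseio.com/v0/topstories.json", timeout=10).json()
print(f"{len(ids)} IDs fetched, first five: {ids[:5]}")

# item/<id>.json returns an object with title, url, score, kids (comment IDs), descendants, ...
story = requests.get(f"https://hacker-news.firebaseio.com/v0/item/{ids[0]}.json", timeout=10).json()
print(story["title"], story.get("score"), story.get("descendants"))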

3. Main Workflow Steps

Step 1: Fetch Top Story IDs

- tool: fetch_hn_stories
  arguments:
    url: "https://hacker-news.firebaseio.com/v0/topstories.json"
  label: fetch_story_ids

- evaluate:
    story_ids: $ steps["fetch_story_ids"].output.json[:50]
    message: $ f"Fetched {len(steps['fetch_story_ids'].output.json)} stories, processing top 50"
  label: extract_ids

This step:
  • Fetches the current top story IDs from Hacker News (the endpoint returns up to 500)
  • Extracts the first 50 for processing

Step 2: Fetch Story Details in Parallel

- over: $ steps["extract_ids"].output["story_ids"]
  parallelism: 10
  map:
    tool: get_story_details
    arguments:
      method: GET
      url: $ f"https://hacker-news.firebaseio.com/v0/item/{_}.json"
  label: all_stories

- evaluate:
    stories: $ [item["json"] for item in _ if item and "json" in item]
  label: extract_stories

This step:
  • Fetches full details for each story ID
  • Processes 10 stories in parallel for efficiency (see the sketch below)
  • Extracts successfully fetched story data
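
Conceptually, the map with parallelism: 10 behaves like a thread pool with ten workers. For intuition only (Julep handles the fan-out for you), here is the equivalent plain-Python sketch:
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch_item(item_id):
    """Fetch one HN item; return None on failure so downstream steps can skip it."""
    try:
        resp = requests.get(f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json", timeout=10)
        return resp.json() if resp.ok else None
    except requests.RequestException:
        return None

story_ids = [8863, 121003, 2921983]  # stand-in for the IDs from extract_ids
with ThreadPoolExecutor(max_workers=10) as pool:  # mirrors parallelism: 10
    stories = [s for s in pool.map(fetch_item, story_ids) if s]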

Step 3: Filter and Sort Stories

- evaluate:
    filtered: $ [s for s in steps["extract_stories"]["output"]["stories"] 
                 if "score" in s and s["score"] >= inputs.get("min_score", 50)]
  label: filter_stories

- evaluate:
    sorted_stories: '$ sorted(steps["filter_stories"]["output"]["filtered"], key=lambda s: s["score"], reverse=True)[:inputs.get("num_stories", 10)]'
  label: sort_stories

This step:
  • Filters stories by the minimum score threshold
  • Sorts by score and takes the top N stories (worked example below)
  • Ensures quality content for the newsletter
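
To make the filter-then-sort behavior concrete, here is the same logic applied to three toy stories:
# Toy data with only the fields the filter inspects
stories = [
    {"id": 1, "score": 120},
    {"id": 2, "score": 45},
    {"id": 3, "score": 200},
]
min_score, num_stories = 50, 10

filtered = [s for s in stories if s.get("score", 0) >= min_score]  # story 2 is dropped
top = sorted(filtered, key=lambda s: s["score"], reverse=True)[:num_stories]
print([s["id"] for s in top])  # -> [3, 1]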

Step 4: Scrape Full Article Content

- over: $ steps["sort_stories"]["output"]["sorted_stories"]
  parallelism: 4
  map:
    tool: spider_fetch
    arguments:
      url: $ _['url']
      params:
        request: smart_mode
        return_format: markdown
        proxy_enabled: $ True
        filter_output_images: $ True
        filter_output_svg: $ True
        readability: $ True
        limit: 1
  label: fetch_content

- evaluate:
    scraped_contents: '$ [item["result"][0]["content"] if item and "result" in item 
                         and item["result"] and "content" in item["result"][0] else "" 
                         for item in _]'
  label: extract_scraped_content

This step:
  • Scrapes the full article content for each story
  • Converts it to clean markdown format
  • Handles failed scrapes gracefully (see the sketch below)
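
The nested conditional in extract_scraped_content is easier to follow as a loop. An equivalent sketch that makes the empty-string fallback explicit:
def extract_contents(results):
    """Pull markdown content from each Spider result, or '' when a scrape failed."""
    contents = []
    for item in results:
        try:
            contents.append(item["result"][0]["content"])
        except (TypeError, KeyError, IndexError):
            contents.append("")  # keeps the list aligned with sorted_stories
    return contents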

Step 5: Fetch Top Comments

- evaluate:
    comment_pairs: '$ [{"story_id": story["id"], "story_index": idx, "comment_id": kid} 
                      for idx, story in enumerate(steps["sort_stories"]["output"]["sorted_stories"]) 
                      if "kids" in story for kid in story["kids"][:3]]'
  label: prepare_comments

- over: '$ steps["prepare_comments"]["output"]["comment_pairs"]'
  parallelism: 15
  map:
    tool: get_comment_details
    arguments:
      method: GET
      url: '$ f"https://hacker-news.firebaseio.com/v0/item/{_["comment_id"]}.json"'
  label: fetch_all_comments

This step:
  • Prepares comment IDs (up to 3 per story)
  • Fetches comment details with high parallelism
  • Maintains story-comment relationships (see the grouping sketch below)
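
Note that the next step references a comments_with_index step that regroups the fetched comments by story index. That step is not shown above; here is a minimal sketch of what it could look like, assuming fetch_all_comments returns results in the same order as comment_pairs:
- evaluate:
    comments_grouped: '$ [[pair["story_index"], item["json"]]
                          for pair, item in zip(steps["prepare_comments"]["output"]["comment_pairs"], _)
                          if item and "json" in item]'
  label: comments_with_index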

Step 6: Personalize Content

- evaluate:
    stories_with_comments: '$ [dict(story, 
                                   content=steps["extract_scraped_content"]["output"]["scraped_contents"][i], 
                                   top_comments=[item[1] for item in steps["comments_with_index"]["output"]["comments_grouped"] 
                                               if item[0] == i]) 
                              for i, story in enumerate(steps["sort_stories"]["output"]["sorted_stories"])]'
  label: final_stories_with_comments

- over: $ steps["final_stories_with_comments"]["output"]["stories_with_comments"]
  parallelism: 10
  map:
    prompt:
    - role: system
      content: |-
        $ f'''
        You are a content curator. Score this HN story's relevance to the user's interests.
        User interests: {steps[0].input.user_preferences}

        Return only a JSON object with the relevance score (0-100).
        Return ONLY raw JSON, without markdown code blocks.
        '''
    - role: user
      content: >-
        $ f'''
        Story to analyze:
        Title: {_["title"]}
        URL: {_["url"]}
        Score: {_["score"]}
        Content preview: {_["content"]}
        Top comment: {_["top_comments"][0]["text"] if _["top_comments"] else "(no comments yet)"}

        Return format: "relevance_score" from 0 to 100
        '''
    unwrap: true
  label: score_stories
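
The filter below relies on a combine_scores step that pairs each story with the relevance score the model returned. That step is likewise not shown above; a minimal sketch, assuming the evaluate sandbox can call json.loads (adapt the parsing if your runtime differs):
- evaluate:
    scored_stories: '$ [{"story": story, "relevance_score": json.loads(score)["relevance_score"]}
                        for story, score in zip(steps["final_stories_with_comments"]["output"]["stories_with_comments"], _)]'
  label: combine_scores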

- evaluate:
    personalized_stories: $ [item for item in steps["combine_scores"]["output"]["scored_stories"] 
                            if item["relevance_score"] >= 60]
  label: filter_personalized

This step:
  • Combines stories with their content and comments
  • Uses AI to score relevance (0-100) based on user preferences
  • Filters stories with relevance >= 60 for high personalization

Step 7: Generate Summaries and Final Output

- over: $ steps["filter_personalized"]["output"]["personalized_stories"]
  parallelism: 10
  map:
    prompt:
    - role: system
      content: |
        Generate a concise, insightful summary (max 100 words) for this article.
        Focus on key insights and why it matters.
    - role: user
      content: >-
        $ f'''
        Title: {_["story"]["title"]}
        Content: {_["story"]["content"]}
        Top comments: {_["story"]["top_comments"]}
        '''
    unwrap: true
  label: generate_summaries

- evaluate:
    final_output: |
      $ [{
          "title": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"]["title"],
          "url": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"]["url"],
          "hn_url": f"https://news.ycombinator.com/item?id={steps['filter_personalized']['output']['personalized_stories'][i]['story']['id']}",
          "comments_count": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"].get("descendants", 0),
          "summary": steps["generate_summaries"]["output"][i]
      } for i in range(len(steps["filter_personalized"]["output"]["personalized_stories"]))]
  label: prepare_final_output

This step:
  • Generates concise AI summaries (up to 100 words) for each story
  • Formats the final newsletter with all relevant information (see the formatter sketch below)
  • Includes both the article URL and the HN discussion URL
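
Once the execution finishes, the final_output list is ready to render. A minimal plain-text formatter for the shape produced above might look like:
def render_newsletter(final_output):
    """Join the curated stories into a plain-text newsletter body."""
    sections = []
    for story in final_output:
        sections.append(
            f"{story['title']}\n"
            f"{story['summary']}\n"
            f"Article: {story['url']} | Discussion: {story['hn_url']} "
            f"({story['comments_count']} comments)"
        )
    return "\n\n".join(sections)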

Usage

Here’s how to use this task with the Julep SDK:
from julep import Client
import os
import time
import yaml

# Initialize the client (reads your API key from the JULEP_API_KEY environment variable)
client = Client(api_key=os.environ["JULEP_API_KEY"])

# Create the agent
agent = client.agents.create(
  name="Hacker News Agent",
  about="A hacker news agent that can fetch the top stories from Hacker News and summarize them.",
  model="gpt-4o"
)

# Load the task definition
with open('hn_newsletter_task.yaml', 'r') as file:
  task_definition = yaml.safe_load(file)

# Create the task
task = client.tasks.create(
  agent_id=agent.id,
  **task_definition
)

# Create the execution
execution = client.executions.create(
  task_id=task.id,
  input={
    "min_score": 100,
    "num_stories": 5,
    "user_preferences": ["AI/ML", "Python", "DevOps", "Cloud Computing"]
  }
)

# Wait for the execution to complete
while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
    print(result.status)
    time.sleep(5)

# Print the result
if result.status == "succeeded":
    for story in result.output['final_output']:
        print(f"Title: {story['title']}")
        print(f"URL: {story['url']}")
        print(f"HN Discussion: {story['hn_url']}")
        print(f"Comments: {story['comments_count']}")
        print(f"Summary: {story['summary']}")
        print("-" * 80)
else:
    print(f"Error: {result.error}")

Example Output

An example output when running this task with user preferences for AI/ML and Python:

Monitoring Execution

Track the execution progress and debug issues:
# Get execution transitions
transitions = client.executions.transitions.list(execution.id).items

for i, transition in enumerate(transitions):
    print(f"Step {i}: {transition.type}")
    if transition.type == "step":
        print(f"Label: {transition.current.label}")
        print(f"Status: {transition.status}")
        if transition.status == "failed":
            print(f"Error: {transition.error}")
    print("-" * 40)

Customization Ideas

  1. Email Integration: Add email sending to deliver newsletters automatically (see the sketch after this list)
  2. Scheduling: Set up periodic execution for daily/weekly newsletters
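
For the email idea, a minimal sketch using Python's standard library (the SMTP host and sender address are placeholders you would replace with your own):
import smtplib
from email.mime.text import MIMEText

def send_newsletter(body, recipient):
    msg = MIMEText(body, "plain")
    msg["Subject"] = "Your Hacker News Digest"
    msg["From"] = "digest@example.com"  # placeholder sender
    msg["To"] = recipient
    with smtplib.SMTP("localhost") as smtp:  # assumes a local SMTP relay
        smtp.send_message(msg)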

Next Steps