Overview
This tutorial demonstrates how to:
- Fetch and filter top stories from the Hacker News API
- Scrape full article content using web scraping integration
- Personalize content based on user preferences using AI
- Generate concise summaries for curated stories
- Process data in parallel for optimal performance
Task Structure
Let’s break down the task into its core components:
1. Input Schema
First, we define the inputs the task expects. They let the user:
- Set a minimum HN score threshold for quality filtering
- Specify how many stories to include in the final newsletter
- Define their technology interests for personalization
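To make the three inputs concrete, here is a minimal Python sketch of the same schema with basic validation. The field names `min_score`, `num_stories`, and `user_preferences` and the default values are assumptions chosen for illustration; the task's own YAML schema remains the reference.

```python
from dataclasses import dataclass, field


@dataclass
class NewsletterInput:
    """Hypothetical mirror of the task's input schema (names are assumed)."""

    min_score: int = 50      # assumed default: minimum HN score to keep a story
    num_stories: int = 10    # assumed default: stories in the final newsletter
    user_preferences: list[str] = field(default_factory=list)  # technology interests

    def __post_init__(self) -> None:
        # Reject inputs that would make the later steps meaningless.
        if self.min_score < 0:
            raise ValueError("min_score must be non-negative")
        if self.num_stories < 1:
            raise ValueError("num_stories must be at least 1")
        if not self.user_preferences:
            raise ValueError("user_preferences must list at least one interest")
```

For example, `NewsletterInput(min_score=100, num_stories=5, user_preferences=["AI/ML", "Python"])` describes a five-story newsletter for a reader interested in AI/ML and Python.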
2. Tools Configuration
Next, we define the external tools our task will use:
- Direct Hacker News API calls for stories and comments
- Spider integration for advanced web scraping capabilities
3. Main Workflow Steps
Step 1: Fetch Top Story IDs
- Fetches the current top story IDs (up to 500) from Hacker News
- Extracts the first 50 for processing
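Outside of Julep, this step can be sketched in plain Python against the public Hacker News API. The helper names `http_get_json` and `fetch_top_story_ids` are hypothetical, and the fetcher is passed in as a parameter so the logic can be exercised without network access.

```python
import json
from typing import Callable
from urllib.request import urlopen

# Public Hacker News endpoint that returns a JSON array of top story IDs.
TOP_STORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"


def http_get_json(url: str):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_top_story_ids(limit: int = 50, get_json: Callable = http_get_json) -> list[int]:
    """Return the first `limit` IDs from the top-stories list."""
    story_ids = get_json(TOP_STORIES_URL)
    return story_ids[:limit]
```

Calling `fetch_top_story_ids()` with the defaults performs one HTTP request and keeps the first 50 IDs.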
Step 2: Fetch Story Details in Parallel
- Fetches full details for each story ID
- Processes 10 stories in parallel for efficiency
- Extracts successfully fetched story data
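The same pattern can be sketched with a thread pool, shown here as an illustration rather than as what Julep does internally. `fetch_stories` is a hypothetical helper; `get_json` is any callable that takes a URL and returns decoded JSON.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Public Hacker News endpoint for a single item (story or comment).
ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def fetch_stories(story_ids: list[int], get_json: Callable, max_workers: int = 10) -> list[dict]:
    """Fetch story details with up to `max_workers` requests in flight."""

    def fetch_one(story_id: int):
        try:
            return get_json(ITEM_URL.format(id=story_id))
        except Exception:
            return None  # a failed fetch is dropped rather than aborting the batch

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_one, story_ids))  # map preserves input order

    # Keep only the stories that were fetched successfully.
    return [story for story in results if story is not None]
```

Threads suit this step because each request spends most of its time waiting on the network.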
Step 3: Filter and Sort Stories
- Filters stories by minimum score threshold
- Sorts by score and takes the top N stories
- Ensures quality content for the newsletter
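As a sketch, the filter-and-sort logic is a few lines of Python. `select_top_stories` is a hypothetical name; `score` is the field the Hacker News API uses for a story's points.

```python
def select_top_stories(stories: list[dict], min_score: int, num_stories: int) -> list[dict]:
    """Keep stories at or above `min_score`, highest score first, capped at `num_stories`."""
    qualified = [story for story in stories if story.get("score", 0) >= min_score]
    qualified.sort(key=lambda story: story["score"], reverse=True)
    return qualified[:num_stories]
```

Filtering before sorting keeps the sort small, and the final slice enforces the requested newsletter length.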
Step 4: Scrape Full Article Content
Spider scraping parameters explained
- smart_mode: Intelligently extracts main content
- return_format: markdown: Clean, parseable text format
- proxy_enabled: Avoids rate limiting and blocks
- filter_output_images/svg: Text-only content
- readability: Enhanced article parsing
- parallelism: 4: Balanced to avoid overwhelming target sites
- Scrapes full article content for each story
- Converts to clean markdown format
- Handles failed scrapes gracefully
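The sketch below collects the parameters listed above into one dictionary and shows one way to tolerate failed scrapes. The boolean values are assumptions, `scrape` is a hypothetical callable standing in for the Spider integration, and `parallelism` is modeled here as a worker count rather than as a scraper option.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Parameter names as listed in this tutorial; the values are assumed.
SPIDER_PARAMS = {
    "smart_mode": True,
    "return_format": "markdown",
    "proxy_enabled": True,
    "filter_output_images": True,
    "filter_output_svg": True,
    "readability": True,
}
SCRAPE_PARALLELISM = 4  # kept low to avoid overwhelming target sites


def scrape_articles(urls: list[str], scrape: Callable,
                    parallelism: int = SCRAPE_PARALLELISM) -> dict[str, Optional[str]]:
    """Map each URL to its markdown content, or to None if the scrape failed."""

    def scrape_one(url: str) -> Optional[str]:
        try:
            return scrape(url, SPIDER_PARAMS)
        except Exception:
            return None  # one blocked or broken page should not fail the whole run

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return dict(zip(urls, pool.map(scrape_one, urls)))
```

Stories whose content comes back as `None` can still appear in the newsletter using only their title and comments.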
Step 5: Fetch Top Comments
- Prepares comment IDs (up to 3 per story)
- Fetches comment details with high parallelism
- Maintains story-comment relationships
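One way to keep the story-comment relationship intact is to carry the story ID alongside each comment ID, as in this sketch. Both helper names are hypothetical; `kids` is the Hacker News API field holding a story's top-level comment IDs.

```python
def prepare_comment_ids(stories: list[dict], max_per_story: int = 3) -> list[tuple[int, int]]:
    """Return (story_id, comment_id) pairs, at most `max_per_story` per story."""
    pairs = []
    for story in stories:
        for comment_id in story.get("kids", [])[:max_per_story]:
            pairs.append((story["id"], comment_id))
    return pairs


def group_comments(pairs: list[tuple[int, int]],
                   comments_by_id: dict[int, dict]) -> dict[int, list[dict]]:
    """Regroup fetched comments under the story they belong to."""
    grouped: dict[int, list[dict]] = {}
    for story_id, comment_id in pairs:
        comment = comments_by_id.get(comment_id)
        if comment is not None:  # skip comments that failed to fetch
            grouped.setdefault(story_id, []).append(comment)
    return grouped
```

Flattening to pairs lets every comment be fetched in a single parallel batch, and the pairs are enough to rebuild the grouping afterwards.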
Step 6: Personalize Content
- Combines stories with their content and comments
- Uses AI to score relevance (0-100) based on user preferences
- Keeps only stories with a relevance score of 60 or higher, so the newsletter stays closely matched to the user's interests
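The thresholding logic can be sketched independently of the model that produces the score. In the sketch below, `keyword_relevance` is a deliberately crude stand-in for the AI scorer (it is not how the task scores stories), and the `text` key for the combined story content is an assumed name.

```python
from typing import Callable

RELEVANCE_THRESHOLD = 60  # stories scoring below this are dropped


def keyword_relevance(story: dict, preferences: list[str]) -> int:
    """Stand-in scorer: percentage of preferences mentioned in the story text."""
    if not preferences:
        return 0
    text = story.get("text", "").lower()
    hits = sum(1 for preference in preferences if preference.lower() in text)
    return round(100 * hits / len(preferences))


def personalize(stories: list[dict], preferences: list[str],
                score_fn: Callable = keyword_relevance) -> list[dict]:
    """Attach a 0-100 relevance score and keep stories at or above the threshold."""
    kept = []
    for story in stories:
        relevance = score_fn(story, preferences)
        if relevance >= RELEVANCE_THRESHOLD:
            kept.append({**story, "relevance": relevance})
    return kept
```

Passing the scorer in as `score_fn` means a model-backed function can replace the keyword version without touching the filter.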
Step 7: Generate Summaries and Final Output
- Generates 100-word AI summaries for each story
- Formats the final newsletter with all relevant information
- Includes both article URL and HN discussion URL
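The final formatting can be sketched as follows. The entry layout and both helper names are hypothetical, and `truncate_words` is only a length safeguard for a summary the model has already written, not a summarizer. The discussion URL pattern is the standard Hacker News item link.

```python
HN_ITEM_URL = "https://news.ycombinator.com/item?id={id}"


def truncate_words(text: str, max_words: int = 100) -> str:
    """Cap a summary at `max_words` words, marking any cut with an ellipsis."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


def format_entry(story: dict, summary: str) -> str:
    """Render one newsletter entry with both the article and the discussion links."""
    discussion_url = HN_ITEM_URL.format(id=story["id"])
    # Text-only posts such as Ask HN have no external URL, so fall back to the thread.
    article_url = story.get("url", discussion_url)
    return "\n".join([
        f"{story['title']} ({story.get('score', 0)} points)",
        truncate_words(summary),
        f"Article: {article_url}",
        f"Discussion: {discussion_url}",
    ])
```

Joining the entries with blank lines between them yields the newsletter body.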
Complete Task YAML
Usage
Here’s how to use this task with the Julep SDK:
Example Output
An example output when running this task with user preferences for AI/ML and Python:
Example Newsletter Output
Monitoring Execution
Track the execution progress and debug issues:
Customization Ideas
- Email Integration: Add email sending to deliver newsletters automatically
- Scheduling: Set up periodic execution for daily/weekly newsletters
Next Steps
- Try this task yourself: check out the full example in the Hacker News cookbook
- Learn more about the Spider integration for web scraping
- Explore parallel processing patterns in Julep