Learn how to build a personalized newsletter generator that fetches top Hacker News stories, scores their relevance to a user's interests, and creates AI-powered summaries.
Fetches the current top 500 story IDs from Hacker News
Extracts the first 50 for processing
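In plain Python, the fetch-and-slice logic amounts to the following (a sketch with a stubbed ID list standing in for the live HTTP call; the names here are illustrative, not part of the task):

```python
# Hypothetical plain-Python sketch of the first two steps (the real task
# calls the fetch_hn_stories tool instead of making the request itself).
TOP_STORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"

all_ids = list(range(9000, 9500))  # stand-in for the ~500 IDs the endpoint returns
story_ids = all_ids[:50]           # extract_ids keeps only the first 50
```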
2. Fetch Story Details in Parallel
```yaml
- over: $ steps["extract_ids"].output["story_ids"]
  parallelism: 10
  map:
    tool: get_story_details
    arguments:
      method: GET
      url: $ f"https://hacker-news.firebaseio.com/v0/item/{_}.json"
  label: all_stories

- evaluate:
    stories: $ [item["json"] for item in _ if item and "json" in item]
  label: extract_stories
```
This step:
Fetches full details for each story ID
Processes 10 stories in parallel for efficiency
Extracts successfully fetched story data
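The fan-out and the `extract_stories` expression can be sketched in plain Python with a stubbed fetcher (the stub and its return shape are assumptions made so the sketch runs offline):

```python
from concurrent.futures import ThreadPoolExecutor

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"

def fetch_story(story_id):
    # The real map step issues an HTTP GET to ITEM_URL.format(story_id);
    # stubbed here so the sketch runs without network access.
    return {"json": {"id": story_id, "score": story_id % 100}}

story_ids = [101, 102, 103]
with ThreadPoolExecutor(max_workers=10) as pool:  # parallelism: 10
    results = list(pool.map(fetch_story, story_ids))

# Mirrors the extract_stories expression: keep only successful fetches
stories = [item["json"] for item in results if item and "json" in item]
```

`pool.map` preserves input order, which is why the task can later line stories up with their scraped content by index.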
3. Filter and Sort Stories
```yaml
- evaluate:
    filtered: $ [s for s in steps["extract_stories"]["output"]["stories"] if "score" in s and s["score"] >= inputs.get("min_score", 50)]
  label: filter_stories

- evaluate:
    sorted_stories: '$ steps["filter_stories"]["output"]["filtered"][:inputs.get("num_stories", 10)]'
  label: sort_stories
```
This step:
Filters stories by minimum score threshold
Takes the first N stories (the topstories feed is already rank-ordered, so a slice suffices)
Ensures quality content for the newsletter
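The two `evaluate` expressions above reduce to ordinary list operations; here is a runnable mirror with made-up story data:

```python
stories = [
    {"id": 1, "score": 120},
    {"id": 2, "score": 30},   # below the default threshold, dropped
    {"id": 3, "score": 75},
]
inputs = {"min_score": 50, "num_stories": 10}

# filter_stories: drop anything under min_score
filtered = [s for s in stories
            if "score" in s and s["score"] >= inputs.get("min_score", 50)]

# sort_stories: keep the first num_stories entries (the topstories feed
# is already ranked, so the task slices rather than re-sorting)
sorted_stories = filtered[:inputs.get("num_stories", 10)]
```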
4. Scrape Full Article Content
```yaml
- over: $ steps["sort_stories"]["output"]["sorted_stories"]
  parallelism: 4
  map:
    tool: spider_fetch
    arguments:
      url: $ _['url']
      params:
        request: smart_mode
        return_format: markdown
        proxy_enabled: $ True
        filter_output_images: $ True
        filter_output_svg: $ True
        readability: $ True
        limit: 1
  label: fetch_content

- evaluate:
    scraped_contents: '$ [item["result"][0]["content"] if item and "result" in item and item["result"] and "content" in item["result"][0] else "" for item in _]'
  label: extract_scraped_content
```
Spider scraping parameters explained
smart_mode: Intelligently extracts main content
return_format: markdown: Clean, parseable text format
proxy_enabled: Avoids rate limiting and blocks
filter_output_images/svg: Text-only content
readability: Enhanced article parsing
parallelism: 4: Balanced to avoid overwhelming target sites
This step:
Scrapes full article content for each story
Converts to clean markdown format
Handles failed scrapes gracefully
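The graceful-failure behavior comes from the conditional in `extract_scraped_content`: any missing, empty, or malformed result falls back to an empty string. A runnable mirror with made-up scrape results:

```python
# Stand-ins for three spider_fetch results: one success, one total failure,
# one empty result list.
raw = [
    {"result": [{"content": "# Article one"}]},
    None,
    {"result": []},
]

# Mirrors the extract_scraped_content expression
scraped_contents = [
    item["result"][0]["content"]
    if item and "result" in item and item["result"] and "content" in item["result"][0]
    else ""
    for item in raw
]
```

Because failures map to `""` rather than being dropped, the list keeps the same length and order as `sorted_stories`, which later steps rely on for index alignment.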
5. Fetch Top Comments
```yaml
- evaluate:
    comment_pairs: '$ [{"story_id": story["id"], "story_index": idx, "comment_id": kid} for idx, story in enumerate(steps["sort_stories"]["output"]["sorted_stories"]) if "kids" in story for kid in story["kids"][:3]]'
  label: prepare_comments

- over: '$ steps["prepare_comments"]["output"]["comment_pairs"]'
  parallelism: 15
  map:
    tool: get_comment_details
    arguments:
      method: GET
      url: '$ f"https://hacker-news.firebaseio.com/v0/item/{_["comment_id"]}.json"'
  label: fetch_all_comments
```
This step:
Prepares comment IDs (up to 3 per story)
Fetches comment details with high parallelism
Maintains story-comment relationships
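The `prepare_comments` expression flattens stories into one (story, comment) pair per comment ID, carrying the story index along so comments can be regrouped later. A runnable mirror with made-up stories:

```python
sorted_stories = [
    {"id": 10, "kids": [100, 101, 102, 103]},  # only the first 3 kids are taken
    {"id": 11},                                # no comments: skipped entirely
    {"id": 12, "kids": [200]},
]

# Mirrors the prepare_comments expression
comment_pairs = [
    {"story_id": story["id"], "story_index": idx, "comment_id": kid}
    for idx, story in enumerate(sorted_stories)
    if "kids" in story
    for kid in story["kids"][:3]
]
```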
6. Personalize Content
```yaml
- evaluate:
    stories_with_comments: '$ [dict(story, content=steps["extract_scraped_content"]["output"]["scraped_contents"][i], top_comments=[item[1] for item in steps["comments_with_index"]["output"]["comments_grouped"] if item[0] == i]) for i, story in enumerate(steps["sort_stories"]["output"]["sorted_stories"])]'
  label: final_stories_with_comments

- over: $ steps["final_stories_with_comments"]["output"]["stories_with_comments"]
  parallelism: 10
  map:
    prompt:
    - role: system
      content: |-
        $ f'''
        You are a content curator. Score this HN story's relevance to the user's interests.
        User interests: {steps[0].input.user_preferences}
        Return only a JSON object with the relevance score (0-100).
        Return ONLY raw JSON without markdown code blocks
        '''
    - role: user
      content: >-
        $ f'''
        Story to analyze:
        Title: {_["title"]}
        URL: {_["url"]}
        Score: {_["score"]}
        Content preview: {_["content"]}
        Top comment: {_["top_comments"][0]["text"]}
        Return format: "relevance_score" from 0 to 100
        '''
    unwrap: true
  label: score_stories

- evaluate:
    personalized_stories: $ [item for item in steps["combine_scores"]["output"]["scored_stories"] if item["relevance_score"] >= 60]
  label: filter_personalized
```
This step:
Combines stories with their content and comments
Uses AI to score relevance (0-100) based on user preferences
Filters stories with relevance >= 60 for high personalization
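The score-parse-filter pipeline (steps `score_stories` through `filter_personalized`) can be mirrored in plain Python; the raw model replies here are made-up stand-ins for what the prompt asks the model to return:

```python
import json

# Stand-ins for the model's raw JSON replies (the prompt forbids markdown fences)
score_outputs = ['{"relevance_score": 85}', '{"relevance_score": 40}']
stories = [{"title": "A"}, {"title": "B"}]

# Mirrors combine_scores: pair each story with its parsed score
scored_stories = [
    {"story": stories[i],
     "relevance_score": json.loads(score_outputs[i])["relevance_score"]}
    for i in range(len(score_outputs))
]

# Mirrors filter_personalized: keep only highly relevant stories
personalized_stories = [s for s in scored_stories if s["relevance_score"] >= 60]
```

Note that `json.loads` will raise if the model wraps its reply in a markdown code fence, which is why the system prompt insists on raw JSON.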
7. Generate Summaries and Final Output
```yaml
- over: $ steps["filter_personalized"]["output"]["personalized_stories"]
  parallelism: 10
  map:
    prompt:
    - role: system
      content: |
        Generate a concise, insightful summary (max 100 words) for this article.
        Focus on key insights and why it matters.
    - role: user
      content: >-
        $ f'''
        Title: {_["story"]["title"]}
        Content: {_["story"]["content"]}
        Top comments: {_["story"]["top_comments"]}
        '''
    unwrap: true
  label: generate_summaries

- evaluate:
    final_output: |
      $ [{
        "title": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"]["title"],
        "url": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"]["url"],
        "hn_url": f"https://news.ycombinator.com/item?id={steps['filter_personalized']['output']['personalized_stories'][i]['story']['id']}",
        "comments_count": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"].get("descendants", 0),
        "summary": steps["generate_summaries"]["output"][i]
      } for i in range(len(steps["filter_personalized"]["output"]["personalized_stories"]))]
  label: prepare_final_output
```
This step:
Generates concise AI summaries (up to 100 words) for each story
Formats the final newsletter with all relevant information
Includes both article URL and HN discussion URL
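The `prepare_final_output` expression zips the filtered stories with their summaries by index. A runnable mirror with a single made-up story:

```python
# Stand-ins for filter_personalized and generate_summaries outputs
personalized_stories = [
    {"story": {"id": 123, "title": "T", "url": "https://x.y", "descendants": 42}},
]
summaries = ["A short summary."]

# Mirrors the prepare_final_output expression
final_output = [
    {
        "title": personalized_stories[i]["story"]["title"],
        "url": personalized_stories[i]["story"]["url"],
        "hn_url": f"https://news.ycombinator.com/item?id={personalized_stories[i]['story']['id']}",
        "comments_count": personalized_stories[i]["story"].get("descendants", 0),
        "summary": summaries[i],
    }
    for i in range(len(personalized_stories))
]
```

The `.get("descendants", 0)` fallback matters because the HN API omits `descendants` on some item types, and the `hn_url` is built from the story ID so readers can jump straight to the discussion thread.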
Complete Task YAML
```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/src/schemas/create_task_request.json
name: HN Newsletter Generator
description: Fetch top Hacker News stories, personalize content

input_schema:
  type: object
  properties:
    min_score:
      type: integer
      default: 50
    num_stories:
      type: integer
      default: 10
      description: Number of stories to include in newsletter
    user_preferences:
      type: array
      items:
        type: string
      description: User's technology interests (e.g., ["AI/ML", "Python", "Startups"])

tools:
# Fetch top story IDs from Hacker News
- name: fetch_hn_stories
  type: api_call
  api_call:
    method: GET
    url: https://hacker-news.firebaseio.com/v0/topstories.json
    headers:
      Content-Type: application/json

# Get detailed information for a specific story
- name: get_story_details
  type: api_call
  api_call:
    method: GET
    url: "https://example.com"  # placeholder; overridden by the url argument in each map call
    headers:
      Content-Type: application/json

# Fetch individual comment details
- name: get_comment_details
  type: api_call
  api_call:
    method: GET
    url: https://hacker-news.firebaseio.com/v0/item/{{comment_id}}.json

# Spider web scraping integration
- name: spider_fetch
  type: integration
  integration:
    provider: spider
    setup:
      spider_api_key: YOUR_SPIDER_API_KEY

main:
# Step 0: Fetch top story IDs from Hacker News
- tool: fetch_hn_stories
  arguments:
    url: "https://hacker-news.firebaseio.com/v0/topstories.json"
  label: fetch_story_ids

# Step 1: Extract first 50 story IDs
- evaluate:
    story_ids: $ steps["fetch_story_ids"].output.json[:50]
    message: $ f"Fetched {len(steps['fetch_story_ids'].output.json)} stories, processing top 50"
  label: extract_ids

# Step 2: Fetch details for each story in parallel
- over: $ steps["extract_ids"].output["story_ids"]
  parallelism: 10
  map:
    tool: get_story_details
    arguments:
      method: GET
      url: $ f"https://hacker-news.firebaseio.com/v0/item/{_}.json"
  label: all_stories

# Step 3: Extract successfully fetched story data
- evaluate:
    stories: $ [item["json"] for item in _ if item and "json" in item]
  label: extract_stories

# Step 4: Filter by score
- evaluate:
    filtered: $ [s for s in steps["extract_stories"]["output"]["stories"] if "score" in s and s["score"] >= inputs.get("min_score", 50)]
  label: filter_stories

# Step 5: Take the top N stories
- evaluate:
    sorted_stories: '$ steps["filter_stories"]["output"]["filtered"][:inputs.get("num_stories", 10)]'
  label: sort_stories

# Step 6: Fetch full article content using Spider
- over: $ steps["sort_stories"]["output"]["sorted_stories"]
  parallelism: 4
  map:
    tool: spider_fetch
    arguments:
      url: $ _['url']
      params:
        request: smart_mode
        return_format: markdown
        proxy_enabled: $ True
        filter_output_images: $ True
        filter_output_svg: $ True
        readability: $ True
        limit: 1
  label: fetch_content

# Step 7: Extract scraped content
- evaluate:
    scraped_contents: '$ [item["result"][0]["content"] if item and "result" in item and item["result"] and "content" in item["result"][0] else "" for item in _]'
  label: extract_scraped_content

# Step 8: Prepare comment fetching
- evaluate:
    comment_pairs: '$ [{"story_id": story["id"], "story_index": idx, "comment_id": kid} for idx, story in enumerate(steps["sort_stories"]["output"]["sorted_stories"]) if "kids" in story for kid in story["kids"][:3]]'
  label: prepare_comments

# Step 9: Fetch all comment details
- over: '$ steps["prepare_comments"]["output"]["comment_pairs"]'
  parallelism: 15
  map:
    tool: get_comment_details
    arguments:
      method: GET
      url: '$ f"https://hacker-news.firebaseio.com/v0/item/{_["comment_id"]}.json"'
  label: fetch_all_comments

# Step 10: Extract comment data
- evaluate:
    comment_results: '$ [item["json"] for item in _ if item and "json" in item and item["json"]]'
  label: extract_comments

# Step 11: Group comments by story
- evaluate:
    comments_grouped: '$ [[pair["story_index"], steps["extract_comments"]["output"]["comment_results"][i]] for i, pair in enumerate(steps["prepare_comments"]["output"]["comment_pairs"])]'
  label: comments_with_index

# Step 12: Combine stories with content and comments
- evaluate:
    stories_with_comments: '$ [dict(story, content=steps["extract_scraped_content"]["output"]["scraped_contents"][i], top_comments=[item[1] for item in steps["comments_with_index"]["output"]["comments_grouped"] if item[0] == i]) for i, story in enumerate(steps["sort_stories"]["output"]["sorted_stories"])]'
  label: final_stories_with_comments

# Step 13: Score stories based on user preferences
- over: $ steps["final_stories_with_comments"]["output"]["stories_with_comments"]
  parallelism: 10
  map:
    prompt:
    - role: system
      content: |-
        $ f'''
        You are a content curator. Score this HN story's relevance to the user's interests.
        User interests: {steps[0].input.user_preferences}
        Return only a JSON object with the relevance score (0-100).
        Return ONLY raw JSON without markdown code blocks
        '''
    - role: user
      content: >-
        $ f'''
        Story to analyze:
        Title: {_["title"]}
        URL: {_["url"]}
        Score: {_["score"]}
        Content preview: {_["content"]}
        Top comment: {_["top_comments"][0]["text"]}
        Return format: "relevance_score" from 0 to 100
        '''
    unwrap: true
  label: score_stories

# Step 14: Combine with scores
- evaluate:
    scored_stories: '$ [{"story": steps["final_stories_with_comments"]["output"]["stories_with_comments"][i], "relevance_score": json.loads(steps["score_stories"]["output"][i])["relevance_score"]} for i in range(len(steps["score_stories"]["output"]))]'
  label: combine_scores

# Step 15: Filter by relevance
- evaluate:
    personalized_stories: $ [item for item in steps["combine_scores"]["output"]["scored_stories"] if item["relevance_score"] >= 60]
  label: filter_personalized

# Step 16: Generate summaries
- over: $ steps["filter_personalized"]["output"]["personalized_stories"]
  parallelism: 10
  map:
    prompt:
    - role: system
      content: |
        Generate a concise, insightful summary (max 100 words) for this article.
        Focus on key insights and why it matters.
    - role: user
      content: >-
        $ f'''
        Title: {_["story"]["title"]}
        Content: {_["story"]["content"]}
        Top comments: {_["story"]["top_comments"]}
        '''
    unwrap: true
  label: generate_summaries

# Step 17: Prepare final output
- evaluate:
    final_output: |
      $ [{
        "title": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"]["title"],
        "url": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"]["url"],
        "hn_url": f"https://news.ycombinator.com/item?id={steps['filter_personalized']['output']['personalized_stories'][i]['story']['id']}",
        "comments_count": steps["filter_personalized"]["output"]["personalized_stories"][i]["story"].get("descendants", 0),
        "summary": steps["generate_summaries"]["output"][i]
      } for i in range(len(steps["filter_personalized"]["output"]["personalized_stories"]))]
  label: prepare_final_output
```
An example output when running this task with user preferences for AI/ML and Python:
Example Newsletter Output
Title: OpenAI Announces GPT-5 with Revolutionary Reasoning Capabilities
URL: https://openai.com/research/gpt-5
HN Discussion: https://news.ycombinator.com/item?id=12345678
Comments: 234
Summary: OpenAI's GPT-5 demonstrates unprecedented reasoning abilities and multimodal understanding. The model shows significant improvements in code generation, mathematical reasoning, and real-world problem solving. Key breakthrough involves new architecture allowing dynamic computation allocation based on task complexity. Community discusses implications for AI safety and potential applications in scientific research.

Title: Python 3.13 Released with Major Performance Improvements
URL: https://python.org/downloads/release/python-313
HN Discussion: https://news.ycombinator.com/item?id=12345679
Comments: 156
Summary: Python 3.13 brings 40% performance improvements through adaptive bytecode specialization and improved memory management. New features include better error messages, enhanced typing support, and native WASM compilation. Developers report significant speedups in data processing workloads. Discussion highlights compatibility concerns with popular libraries and migration strategies for large codebases.

Title: New ML Framework Achieves 10x Training Speed on Consumer GPUs
URL: https://github.com/fastML/framework
HN Discussion: https://news.ycombinator.com/item?id=12345680
Comments: 189
Summary: FastML framework enables training large language models on consumer hardware through innovative gradient compression and distributed computing techniques. Benchmarks show 10x speedup compared to PyTorch for specific workloads. Framework supports automatic mixed precision and memory-efficient attention mechanisms. Community excited about democratizing ML research but debates production readiness.