Twitter newsfeed in 3 days w/ Ruby on Rails & Tailwind CSS
I spend hours browsing through news & articles to find some quality content around remote work. Mostly, I end up knowing nothing new. If I am lucky, I stumble upon a great piece or a "breaking news" item.
So I built something simple but crazy interesting - an automated feed that gives you the best bite-sized remote work content.
Link: Twitter Newsfeed
In this article, I will delve into the specifics of how I built the newsfeed. For more details on the problem & solution, check out this Twitter thread:
Launched Remote Shorts on @ProductHunt 🚀
☕️ your daily dose of
🍰 best bite-sized
🌏 remote work content
🗣️ w/ personal commentary
⏱️ save hours
🔍 searching content
📜 reading 1000-word articles
🔥 Product Hunt: https://t.co/Mpzn6y6x2K
Here's why & what we have built 👇 pic.twitter.com/fNDp1n2BPN
— Hrishikesh Pardeshi (@hrishiptweets) October 30, 2020
TABLE OF CONTENTS
- Architecture
- Content Fetch & Update
- Content Processing & Storage
- Cron Jobs
- Display
- Integration with Community
- Admin panel
- Future updates
Architecture
Content Fetch & Update
This component primarily interfaces with the Twitter API. The content fetcher is periodically invoked by a set of cron jobs to fetch & update content.
New content is fetched from two sources:
1) Twitter Search API (advanced search)
- List of terms like 'remote work', 'work from home' etc.
- Twitter offers its own ranking of top tweets configured by using result_type: "popular".
2) Twitter Timeline API (user timeline)
- List of user handles pre-populated in DB.
- Store the last tweet fetched for every user handle & fetch new tweets only after that.
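The two fetch paths above can be sketched as plain request builders. This is a minimal sketch, not the code behind the feed: it assumes the v1.1 REST endpoints (the post does not say which API version or client library was used), the helper names `search_uri` and `timeline_uri` are hypothetical, and authentication is left out.

```ruby
require "uri"

# Assumed v1.1 REST endpoints; the post does not name the API version used.
SEARCH_URL   = "https://api.twitter.com/1.1/search/tweets.json"
TIMELINE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"

# Example search terms taken from the post; the full list is not given.
SEARCH_TERMS = ["remote work", "work from home"].freeze

# Build a search request that asks Twitter for its own "popular" ranking.
def search_uri(term)
  uri = URI(SEARCH_URL)
  uri.query = URI.encode_www_form(q: term, result_type: "popular")
  uri
end

# Build a timeline request. since_id limits the results to tweets newer than
# the last one already stored for this handle (nil on the very first fetch).
def timeline_uri(handle, last_tweet_id = nil)
  params = { screen_name: handle }
  params[:since_id] = last_tweet_id if last_tweet_id
  uri = URI(TIMELINE_URL)
  uri.query = URI.encode_www_form(params)
  uri
end
```

Storing the id of the newest tweet per handle and passing it back as `since_id` is what keeps each timeline call small, since only unseen tweets come back.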
For updating content, I loop through all tweets stored in the DB over the last 4 days and call the Twitter API again for each of them to refresh its meta (# of likes, retweets etc.).
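A rough sketch of picking the tweets that fall inside the 4-day update window. `StoredTweet` is a hypothetical in-memory stand-in for a DB row, the window is measured from `created_at` as an assumption, and batching ids before calling the API is my own addition rather than something the post describes.

```ruby
# Hypothetical in-memory stand-in for a stored tweet row.
StoredTweet = Struct.new(:id, :created_at)

UPDATE_WINDOW = 4 * 24 * 60 * 60 # 4 days, in seconds

# Return the tweets that are still young enough to be refreshed.
def tweets_to_update(tweets, now = Time.now)
  tweets.select { |t| t.created_at >= now - UPDATE_WINDOW }
end

# Group ids into batches so that one API call can refresh several tweets.
# The default batch size of 100 is an assumption, not taken from the post.
def id_batches(tweets, batch_size = 100)
  tweets.map(&:id).each_slice(batch_size).to_a
end
```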
Content Processing & Storage
1) Automated categorisation
- Top tweets shown on the feed need to be categorised so that the feed provides a smooth flow of reading.
- The first level of processing that happens on a tweet is to categorise it based on:
- Keywords in the tweet content (e.g. if it has 'remote job', it is probably a job tweet).
- External URLs (possibly a blog/ articles shared).
- Tweets from a list of handles (e.g. if it is from NY Post, it is a news item).
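The three signals above could be combined roughly like this. It is a sketch under assumptions: the keyword and handle lists are hypothetical examples, the order in which the rules are checked is my own choice, and `:opinion` as the fallback category is a guess based on the categories mentioned later in the post.

```ruby
# Hypothetical keyword and handle lists; the real ones are not in the post.
JOB_KEYWORDS = ["remote job", "hiring"].freeze
NEWS_HANDLES = ["nypost"].freeze

# Return a category symbol for a tweet given its text, author handle and URLs.
def categorise(text:, handle:, urls: [])
  # Tweets from known news handles are treated as news items.
  return :news if NEWS_HANDLES.include?(handle.downcase)
  # Job-related keywords in the text mark the tweet as a job post.
  return :job if JOB_KEYWORDS.any? { |kw| text.downcase.include?(kw) }
  # An external (non-Twitter) URL suggests a shared blog post or article.
  return :article if urls.any? { |u| !u.include?("twitter.com") }
  # Assumed fallback when no rule matches.
  :opinion
end
```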
2) Calculate score before saving to DB
- Every tweet stored in the DB has a score attached to it.
- Before every save, the score is refreshed.
- The score is computed based on a combination of likes, replies & retweets for the tweet.
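To make the "refresh before every save" idea concrete, here is a minimal sketch in plain Ruby. The weights are hypothetical (the post only says likes, replies & retweets are combined), and `TweetRecord` with its `save` method is a stand-in for the real model.

```ruby
# Hypothetical weights; the actual formula is not given in the post.
WEIGHTS = { likes: 1, replies: 2, retweets: 3 }.freeze

class TweetRecord
  attr_accessor :likes, :replies, :retweets
  attr_reader :score

  def initialize(likes:, replies:, retweets:)
    @likes = likes
    @replies = replies
    @retweets = retweets
    @score = nil
  end

  # Stand-in for persisting the record. In a Rails model the same effect
  # would normally come from a before_save callback.
  def save
    refresh_score
    true
  end

  private

  # Recompute the score from the current engagement counts.
  def refresh_score
    @score = WEIGHTS[:likes] * likes +
             WEIGHTS[:replies] * replies +
             WEIGHTS[:retweets] * retweets
  end
end
```

Because the score is recomputed on every save, the hourly meta update automatically re-ranks a tweet as its engagement grows.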
Cron Jobs
The content fetch modules are triggered at a regular frequency through a set of cron jobs. These are made independent & spaced out to ensure Twitter limits are not breached now or in future.
Following are the cron jobs & their frequency:
- Timeline API - at minute 20 of every hour
- Search API - at minute 40 of every hour
- Update meta (# of likes, retweets etc.) for each tweet - at minute 60 of every hour, i.e. on the hour
- Deleting past tweets - weekly
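The spacing of the jobs can be sketched as a small schedule table. This is illustrative only: the job names are hypothetical, the 60-minute mark is read as minute 0, and the weekly slot (Sunday at 00:30) is an assumption since the post only says "weekly". In an actual crontab the first job would correspond to an entry such as `20 * * * *`.

```ruby
# Minute of the hour at which each hourly job runs.
# The 60-minute mark is read here as minute 0, which is an assumption.
HOURLY_JOBS = {
  fetch_timelines: 20,
  fetch_search:    40,
  update_meta:     0
}.freeze

# Hypothetical slot for the weekly cleanup: Sunday at 00:30.
WEEKLY_CLEANUP = { wday: 0, hour: 0, min: 30 }.freeze

# Return the names of the jobs due at the given time (minute resolution).
def jobs_due(time)
  due = HOURLY_JOBS.select { |_, minute| minute == time.min }.keys
  c = WEEKLY_CLEANUP
  if time.wday == c[:wday] && time.hour == c[:hour] && time.min == c[:min]
    due << :delete_old_tweets
  end
  due
end
```

Keeping the jobs 20 minutes apart means no two of them hit the Twitter API in the same window.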
Display
The display module interfaces with the DB but has no say in deciding what will be shown in the feed. The top 10 tweets by score are fetched from the DB directly and displayed.
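A minimal sketch of picking the top tweets by score and arranging them by day and then by category for rendering. `FeedTweet` and both helper names are hypothetical, and showing the newest day first is an assumption.

```ruby
require "date"

# Hypothetical in-memory stand-in for a stored tweet row.
FeedTweet = Struct.new(:id, :score, :category, :posted_on)

# Pick the highest-scoring tweets (10 by default).
def top_tweets(tweets, limit = 10)
  tweets.sort_by { |t| -t.score }.first(limit)
end

# Group as { day => { category => [tweets] } }, newest day first.
def group_for_feed(tweets)
  tweets
    .group_by(&:posted_on)
    .sort_by { |day, _| day }
    .reverse
    .to_h { |day, day_tweets| [day, day_tweets.group_by(&:category)] }
end
```

In a Rails app the first step would typically be a single ordered, limited DB query, so the display layer never needs to know how the score was computed.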
Tweets are displayed in the feed first categorised by day and then by category (e.g. opinions, articles, news etc.). To display tweets, I naturally wanted to use Twitter's own embed code, but it resulted in a high page load time.
Problem: The Twitter embed script is super heavy, and it takes a good 10-15s for the page to load if there are more than 7-8 tweets.
Solution:
- I had to write custom CSS replicating the tweet UI.
- Embed code is fetched from the Twitter API, but I explicitly removed the script & unneeded metadata.
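Stripping the script from the embed HTML can be sketched with a single substitution. This assumes the embed markup is a `<blockquote>` followed by a `<script>` tag, and `strip_embed_script` is a hypothetical helper name. A regex is enough for this narrow case, though an HTML parser would be more robust.

```ruby
# Remove every <script>...</script> element from an embed HTML snippet so the
# page can style the remaining <blockquote> markup with its own CSS.
def strip_embed_script(html)
  html.gsub(%r{<script\b[^>]*>.*?</script>}mi, "").strip
end
```

With the script gone, the browser never downloads the widget JS, so the page load cost no longer grows with the number of tweets.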
There's also a subscribe box (for daily newsletter) at the top and a countdown (written in vanilla JS) that shows when the next update is coming.
Integration with Community
Aim:
- Make it super easy for users to start or join a discussion on the nugget they like
- Browse through the top posts from Remote Clan
Solution:
- Sticky right sidebar with top posts from Remote Clan (mention twitter hack)
- Start a discussion - 1-click button to create an automated post on Remote Clan.
- Join discussion - If there's already a post, show the linked post instead.
- Join the community & # of members online CTAs
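The start/join behaviour boils down to one check per nugget. A minimal sketch, assuming each nugget stores the URL of its linked Remote Clan post once a discussion exists. `Nugget`, `discussion_cta` and the URL path are all hypothetical.

```ruby
# Hypothetical shape: linked_post_url is nil until a discussion is started.
Nugget = Struct.new(:id, :linked_post_url)

# Decide which call-to-action to show next to a nugget.
def discussion_cta(nugget)
  if nugget.linked_post_url
    { label: "Join discussion", url: nugget.linked_post_url }
  else
    # Hypothetical path that would create the automated post in one click.
    { label: "Start a discussion", url: "/nuggets/#{nugget.id}/discussion/new" }
  end
end
```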
Admin panel
The automated feed is the default, but a manual override is needed to clean & rearrange content.
When you are in a hurry,
- Queueing system to mark tweets invisible from feed. When you mark a tweet invisible, a fresh candidate shows up.
When there's plenty of time for curation,
- Manually look through all tweets and override the algorithm. Explicitly force a tweet to show up in the feed.
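Both override modes can be layered on top of the score-based selection. A rough sketch, assuming each tweet carries two admin flags. `hidden` and `forced` are hypothetical names, and placing forced tweets ahead of the automatic ones is my own choice.

```ruby
# Hypothetical in-memory stand-in for a tweet row with admin flags.
AdminTweet = Struct.new(:id, :score, :hidden, :forced)

# Forced tweets come first, hidden ones are dropped, and the remaining slots
# are filled by score, so hiding one tweet lets the next candidate in.
def curated_feed(tweets, limit = 10)
  visible = tweets.reject(&:hidden)
  forced, automatic = visible.partition(&:forced)
  (forced + automatic.sort_by { |t| -t.score }).first(limit)
end
```

Because hidden tweets are filtered out before the limit is applied, marking one invisible automatically pulls the next-best candidate into the feed.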
Future updates
- Infinite scroll
- Improve accuracy of curation algorithm
- Include other sources of content (HN, Reddit etc.)
I write regularly about tech, products, startups on Twitter. You can follow my updates there.