Slowly but surely, FeedBlendr is moving into one of the realms that it was originally intended to serve: being a true feed platform. Part of the goal of providing APIs and alternate ways of accessing blends is to help people use Blendr and the blended feeds that it provides in different contexts than just a normal feed reader. That’s exactly what Near-Time is doing – using FeedBlendr to provide feed services for their online publishing platform. In the words of Joel Bush, their Sales Director:
Near-Time is a hosted Web 2.0 collaboration and publishing platform that includes wikis, weblogs, RSS/Atom output, and a number of other tools. The platform is built around the concept of “spaces” – users can build an unlimited number of spaces and include an unlimited number of members. The sidebar of each space has a number of optional components – one of them is the ability to pull in an RSS feed from another Near-Time space, or anywhere on the web for that matter.
You can see the beginnings of this integration by checking out this growing Publishers 2.0 space, where Joel is using FeedBlendr to combine 6 different feeds into one, then using Near-Time’s RSS module to pull in the contents of that feed and display them on the page.
I forgot to mention that I’m at the excellent WordCamp 2007 in San Francisco today and tomorrow, so if you see me (wearing a WordCamp 2006 shirt), come and say hi! WordCamp is a collection of people who are passionate about WordPress getting together and discussing their experiences, looking at what’s coming up, and working together to improve WordPress.
DISCLAIMER: I’m not a “sysadmin” by any stretch of the imagination, but I know my way around the Internets and have spent my fair share of time dealing with DNS, networks, server configuration, automation, HTTP-related stuff etc etc to know my way around things like this. I’m sure some of this would have been a lot easier for someone else, but hey - it works.
ASSUMPTIONS: I’m going to assume you know about Amazon EC2 and S3 and some of the terminology involved therein, so if not, please go read up a little on that first.
So — when I was looking for a new home for FeedBlendr, I wanted something that would be extremely scalable, because I have high hopes (obviously), and it’s part of a much bigger puzzle for me, so the scalability side of things was important. In this sort of application, the biggest issue with scaling and load has been processor time and memory, since my system spends a lot of time downloading feeds from the ‘net and then holding them in memory while it’s blending them and re-ordering them and whatnot. My main issue is not database “bandwidth”, it’s “web processing power”. With that in mind, here’s what I’ve done.
- Right now, my database remains on DreamHost (outside of Amazon entirely)
- I have a relatively dynamic system configured where I can call up a new instance from EC2 based on my own customized AMI. When it loads, it will grab a copy of my latest “distribution” of my web app from S3, install it on itself, and then send me an email (and an SMS) to let me know it’s ready to roll, and to add it into the DNS if I want it to be a part of my main cluster.
- I have 2 instances (servers) running in Amazon, configured using round-robin DNS to handle/balance the requests involved in powering FeedBlendr.
Setting Up an AMI
My first task (once getting myself set up with the EC2 Tools) was to actually set up my own Amazon Machine Image (AMI). This is your “server” if you like - operating system and all. I worked from a Fedora Core 6 base image that someone had shared on the Amazon Developer Forums, so that was a good starting point for me. Basically, this is what I did:
- Got the AMI running, then logged into it (getting logged on using shared certificates etc was new for me, but I got it sorted out)
- Did a bunch of yum update and yum install processes to install some things I needed (Apache, PHP etc)
- Configured everything to work as I wanted. Remember to use name-based domain VirtualHost configuration on your images, because you don’t know what IP they’ll have when they come online (unless you wanted to factor that into your launch procedure somehow)
- While doing all of this, kept track of the process I needed to go through to actually install the codebase that runs FeedBlendr (and some other things) and permissions that needed to be changed etc.
- Built out the deployment process/scripts and did some iterative testing to make sure it worked etc.
- Deployed 2 instances, added their IPs to my DNS service and switched everything over to being hosted by Amazon — EASY! Right?
Custom Deployment Process
So I think what makes my process a little interesting is my deployment process. Rather than install and configure my complete application on my server, then take a snapshot of that and bundle it up as my AMI, I opted for a process where my AMI doesn’t actually contain my code at all. What happens is that the AMI is configured as a relatively barebones Apache+PHP system, capable of serving anything. When it launches, it calls a few very simple commands, which grab a package from S3, then extract it and execute a script contained within it.
That script does all the magic. It handles relocating files to where they need to be, fixing permissions, creating symbolic links, etc etc etc. It does everything it needs to do to deploy my entire system (including 2 websites, the custom feed handling core, a WordPress installation etc) in about 15 lines of bash script.
Why go to all the trouble of having this de-coupled AMI/deployment process? Simple: I work on the code for FeedBlendr a lot, and it’s undergoing pretty constant revisions. I realized very quickly that making AMIs and uploading them into S3 and registering them etc etc… sucks. It’s slow, it’s tedious, and I wanted to do it as little as possible. So doing things this way, I don’t have to make a new AMI every time I change my code, I just make a new distro package, throw it in S3, then launch a new instance and it’s got it all running. I can also “re-launch” an instance that’s already running (to save me dealing with DNS) by running a simple script which goes through the same process as when my instances first start up to get new code and overwrite everything currently running.
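To make the launch step concrete, here is a minimal Python sketch of "extract the package and run the script inside it". It is not the actual FeedBlendr code (that is a short bash script), and the script name `deploy.sh` is a hypothetical placeholder. Downloading the package from S3 is left out, so the function assumes the tarball is already on local disk.

```python
import subprocess
import tarfile
from pathlib import Path


def deploy_package(package: Path, workdir: Path, script_name: str = "deploy.sh") -> int:
    """Extract a distribution tarball and run the deploy script it contains.

    `script_name` is a hypothetical name used for illustration only.
    Returns the exit code of the script.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(package) as archive:
        archive.extractall(workdir)

    script = workdir / script_name
    if not script.is_file():
        raise FileNotFoundError(f"{script_name} not found in {package}")

    # Run from inside the extracted tree so relative paths in the script work.
    result = subprocess.run(["sh", script.name], cwd=workdir)
    return result.returncode
```

Because the same function works on a fresh instance and on one that is already running, it covers both the first launch and the "re-launch" case described above.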
Dealing with DNS
DNS will come up pretty quickly as an issue if you’re working with EC2 - obviously. You launch an instance, you get a new IP. Close it down, launch a new one. New IP. Problem.
The short answer is just get yourself a custom DNS account somewhere. I’m using DynDNS, but they may not be the best. One specific problem I have with them is that there’s no programmatic way to update a hostname that’s configured with round-robin load balancing. I have 2 IPs allocated to the same domain name (feedblendr.com), so I can’t use any of their clients to add/remove/change IPs for that host. That’s something I specifically want to be able to do (have instances automatically jump into my round-robin and start balancing load - so if you know of someone, let me know!). ZoneEdit might be another option, and I know there are all sorts of other providers out there as well.
Set up your hostname in your new DNS service, and configure with a low TTL (Time To Live) (since you want to be able to change the authoritative IPs for your host quickly in case an instance goes away). I have mine set to 300 seconds, but you might even want to go shorter (if your provider will allow you to). On DynDNS, their Custom DNS service (to enable all of this) costs $25 per year, per host. Not too bad.
Now you’re in a position to add/remove IPs to that host and load balance, shift requests to a new instance etc as required. Always remember - instances in EC2 are transient! They may disappear and never come back.
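As a rough illustration of the bookkeeping involved, the sketch below works out which IPs to add to or remove from a round-robin hostname, and estimates how long a dead IP can keep receiving traffic. Both function names are hypothetical, the IPs in the usage note come from the documentation address range, and the estimate ignores resolvers that do not honour TTLs. Actually pushing the changes to a DNS provider is not shown, since that depends entirely on the provider.

```python
def plan_dns_changes(current_records, live_instances):
    """Return (to_add, to_remove) so the round-robin set matches the live instances."""
    current = set(current_records)
    live = set(live_instances)
    return sorted(live - current), sorted(current - live)


def worst_case_stale_seconds(ttl_seconds, detection_seconds):
    """Rough upper bound on how long clients may still be sent to a dead IP.

    A resolver that cached the record just before it was removed keeps it for
    a full TTL, on top of however long it took to notice the failure.
    """
    return ttl_seconds + detection_seconds
```

For example, `plan_dns_changes(["192.0.2.10", "192.0.2.11"], ["192.0.2.11", "192.0.2.12"])` gives `(["192.0.2.12"], ["192.0.2.10"])`, and with a 300 second TTL and a 60 second detection delay the worst case is 360 seconds.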
Deployment Distributions
If you’re wondering how I build my packages for distribution purposes - here’s the deal:
- I use subversion as my code repository/version control system, so everything is in there and up to date at all times (hopefully :-p)
- I love make, it’s capable of some really cool things, so I use it here and there to automate some project management related tasks
- I already used make to do local testing (handling exporting from SVN and then setting up permissions/links etc within the project), so it made sense to extend that process to my deployment packages.
- I can check everything into SVN, then go to my “extras” directory and type make ec2-distro and that’s it
- It exports all the sub-projects that make up everything that will be deployed on the server, sets up permissions within the scope of the project, creates some internal symlinks (relative file-paths of course) and then tar’s it all up. From there, it uses s3curl.pl to send a copy up into S3 in a pre-defined location, and then it’s done.
- That package is what gets downloaded to instances and deployed when they launch.
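The packaging step of that make target can be sketched in Python as follows. This is only an illustration under a few assumptions: the SVN export has already been written to `export_dir`, the archive is gzip-compressed (the post only says it is tar'd), and the upload to S3 via s3curl.pl is not shown. The function name `build_distro` is hypothetical.

```python
import tarfile
from pathlib import Path


def build_distro(export_dir: Path, package: Path) -> list[str]:
    """Tar and gzip everything under export_dir and return the archived names.

    `package` should live outside `export_dir` so the archive does not
    try to include itself.
    """
    with tarfile.open(package, "w:gz") as archive:
        for path in sorted(export_dir.rglob("*")):
            # Store paths relative to the export root so the package can be
            # extracted anywhere on the instance.
            archive.add(path, arcname=path.relative_to(export_dir).as_posix(), recursive=False)
    with tarfile.open(package) as archive:
        return archive.getnames()
```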
Challenges I Faced/Face
It’s not all smooth sailing. I have had, and continue to have some things I’m not entirely happy with in this process, and in my experience with EC2. Here’s a couple specific ones in no particular order:
- DNS: I’m not entirely happy with my DNS set up. Right now if an instance disappears, it relies on me noticing and removing it from my DNS entries, then involves some amount of time before that change is noticed. I plan on trying to improve this by figuring out some sort of heart-beat based monitoring of my instances, possibly using Nagios or something like that. I wanted to use something like WeoCEO, but I’ve not heard back from the guys there in the timeframe I was working under, so had to go it alone.
- Shared Filesystems: I had hoped to make use of the promising S3DFS system, which promises to provide you with a fast-access (through lots of internal caching), shared filesystem, which is backed onto S3, but is accessible as a normal, local filesystem (using the FUSE system). Now here’s the kicker. It promises to enable multiple instances to access the same filesystem simultaneously. I had hoped to be able to use it to have multiple instances share a cache repository between them to improve the performance of my caching backend, and not have both instances downloading the same content right after each other because of round-robin issues. Well to make the story short - there were performance problems that meant that wasn’t an option. BUT! I’ve been in touch with the developers, and they’re working on a beta right now which should address all my problems, so I’m hoping to try it out again and use it in the future.
- Web Stats: I used to use Analog/Webalizer-type tools to look at my server logs, but with multiple instances serving content, that starts to get difficult, unless you’re willing to log to a central server, or write something custom to deal with merging logs etc. Rather than do that, I installed Google Analytics on my site, so I now get centralized stats from that, but it doesn’t cover my non-Javascript enabled content (e.g. any feed accesses). Luckily I log those details myself, but now it’s more important that I build some good tools for peering into that data.
- Hosting a Database: I’ve read all sorts of interesting posts about hosting databases within EC2, but something about it just makes me uneasy. Call me old-fashioned, but I’d like to know that my database was hosted on a machine that’s not going to disappear if it crashes. I suppose it’s just one more level of true redundancy to deal with, right? I haven’t yet figured out master-master replication, which seems to be an obvious requirement for that, so I’m not 100% happy with my database situation just yet.
- Keeping My AMI Generic: Because I wanted to be able to modify my AMI as little as possible over time, I actually ended up moving my PHP and Apache configuration files into my distribution package as well. I have a directory called “extras” which contains things generally related to deployment, including a vhosts.conf file, and a php.ini. During deployment, these files are copied into place on the server and then Apache is automatically restarted. This allows me to customize my Apache configuration (including RewriteRules etc), without having to modify the AMI.
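For anyone who does want to go the "merge the logs" route mentioned under Web Stats, a minimal sketch might look like the one below. It assumes each instance writes Apache-style access logs with a `[dd/Mon/yyyy:HH:MM:SS +zzzz]` timestamp and that each log is already in time order. The function names are hypothetical, and this is not something FeedBlendr actually runs.

```python
import heapq
from datetime import datetime


def log_time(line: str) -> datetime:
    """Parse the bracketed timestamp of an Apache-style access-log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*logs):
    """Merge per-instance logs (each already in time order) into one ordered list."""
    return list(heapq.merge(*logs, key=log_time))
```

The merged list can then be fed to an Analog/Webalizer-type tool as if it came from a single server.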
Handy Tools for EC2/S3
Here are a couple tools I found useful in this process, which might help you out as well:
- s3curl.pl — a really handy little Perl script that you can use as a cURL wrapper for doing command-line requests against S3. Great because it handles the complex authentication stuff, you just give it your access keys and it takes care of things so that you can use it basically the same as you would use cURL on the command line.
- S3 Browser — a very cool, lightweight and simple tool for checking out what you have in S3 buckets (and uploading/downloading/deleting things)
That will do for now - please ask in the comments if you have any questions and I’ll answer them here and/or revise this post to reflect new information.
Cheers — Beau
So those updates that I mentioned I was working on… they’re live now!
As mentioned in that post, this was mostly a bunch of core upgrades that you really won’t see any major differences from, but it makes life a lot easier for me going forward. It’s looking like it might also help with some of the caching problems we’ve been having (since I had to completely overhaul the caching system to factor in my new globalized caching code).
I’ll be keeping an eye on things to make sure the new code settles in nicely, but please let me know here or via email if you notice anything strange.
And that post about Amazon EC2 and S3 is coming soon, honestly. I’ll try to get it written up tomorrow, but I want to give it the attention it deserves.
I’m currently working on some significant core upgrades to FeedBlendr that will also allow the service to start scaling towards some of the other tools that I’m going to be rolling out. These changes affect pretty much everything internally, and will allow me to add a few new features, and to build out the other tools faster:
- New, standardized Caching framework (that will hopefully not cause more problems!)
- Improved database connection handling (this will allow me to more easily use slave DBs etc when required)
- Standardized and centralized “auto-discovery” class for locating feeds
- Improved and generalized HTTP class, customized to handle everything I need and nothing I don’t
- Totally separate “format conversion” layer for doing output in Serialized PHP, JS, JSON etc - some cool things can happen here!
I’m really excited about these changes (and some others that I’m still working on). This is the next step in ramping up towards the rest of the Feedville family. Keep an eye on this blog for more info!
Oh — and in response to my previous post about moving to Amazon Elastic Compute Cloud and Simple Storage Service, I got a few questions about my experiences etc, so I’ll be writing up another quick post summarizing it all soon.
I’ll explain in detail once I’m moved, but if you’re having problems with FeedBlendr, it’s probably because I’m in the process of moving servers (for real this time).
UPDATE 2007-05-13: As promised, here’s some more information about this Amazon business.
After running FeedBlendr on DreamHost for a year and a bit, and having a few problems along the way with causing too much load, they finally pulled the pin on me. They’ve been very good about things, and worked with me to help identify and solve the problems I was causing (I was on a shared server, so my usage was affecting other customers). Basically, in the end there wasn’t a “fix” per se, because I just had too many people requesting blends too many times a day (over a MILLION times a month!), and had to do something about it.
In comes Amazon. For those of you who don’t know, Amazon has started getting into the Web Services world, and 2 of their offerings are of particular interest to me (us!):
- Elastic Compute Cloud (EC2): A system whereby I can request a new “copy” of a complete server on-demand, and use it as part of a cluster of machines to power FeedBlendr.
- Simple Storage Service (S3): Unlimited, fully-redundant (in the good way) online storage, allowing me to keep copies of things out there in “Amazon-space” where my new EC2 servers can get at it.
I won’t bore you with all the details (although feel free to get in touch if you’re interested), but FeedBlendr is now running on 2 Amazon EC2 “instances” in a balanced manner (requests go to both machines), so hopefully performance is a lot better, and things will be more reliable. You may also have noticed that I fixed some caching bugs, so blends load faster, and should be more stable. I’ve also bumped up the minimum age for blends slightly, so you may notice now that blends can get slightly older before they will get refreshed. This is mainly because people were just requesting their blends too often, causing my servers to have to rebuild a complete blend every 5 minutes, just because a single feed changed. I’ll be looking at better ways to address this, but in the mean time, please let me know if this causes any problems.
Here’s to looking forward, and seeing FeedBlendr continue to improve and serve your feed-blending needs better!
Another week, another round of updates and tweaks to the core FeedBlendr service. This week brings:
- Fixed an RDF-parsing problem so that now you’ll get full content of a post if it’s available (was particularly affecting some FeedBurner feeds)
- Added the ability to force re-check a feed URL if it fails (just click the red dot that appears next to it to force check it)
- Fixed a few small things that were preventing feeds from rendering properly in Firefox, so both Atom and RSS versions should work now. Haven’t looked at IE7’s internal feed rendering yet, so I don’t know about that.
- Improved the internal caching models to speed things up all over the place
- Switched over to a system that allows parallel outgoing requests for a pretty significant performance increase (especially on larger blends)
- LOADS of small bug fixes and improvements throughout the core code
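To show why parallel outgoing requests help on larger blends, here is a small Python sketch using a thread pool. It is a stand-in for illustration only: FeedBlendr itself is written in PHP and the post does not say which mechanism it uses. The `fetch` argument is a hypothetical callable that downloads one feed, passed in so the sketch stays independent of any particular HTTP library.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL concurrently.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it.
    """
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # one bad feed should not sink the whole blend
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, results))
```

With sequential requests the total wait is roughly the sum of all the download times. With requests in flight together it is closer to the slowest single feed, which is where the gain on large blends comes from.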
Growth continues, and I haven’t been kicked off DreamHost just yet (fingers crossed). I’m actually looking into Amazon EC2 to see if that will provide a reasonable option, so hopefully I’ll figure that out in the near future. I’ll keep you posted.
Keep on blending!
UPDATE: Looks like there was a bug in some of the caching that was preventing Blendr from picking up any new feed entries, but I believe it’s fixed now. Thanks to Orlin, Steve and benjy for pointing this out!
I’ve just finished uploading and testing a new round of updates for FeedBlendr, please let me know if you spot anything funky!
- Added a “What Are Feeds?” page (link) to the homepage.
- Improved RSS output (stripped some unnecessary attributes, output <category> tags properly, and a couple others).
- Changed all the URLs that you use to request blends and subscribe pages etc. I’ve added redirects to handle all the old formats and redirect them to the new formats, but please take note of the new formats as outlined on a blend’s detail page. This change makes all the request URLs much more RESTful.
- Added the “Source” in the output for mobile reading.
- Got rid of the separate subscription buttons and added a single, “AddThis” button which links to all sorts of readers, both online and actual clients.
- Added caching to JavaScript and JSON output, so hopefully they’ll work a little faster. You may see slightly longer delays with updates of content, but actual page-loads should be a bit quicker for you now.
- Added the option to get headlines-only from JavaScript and JSON outputs. Read about it in the Developers section
You can email me using this address if you notice anything strange, or would like to suggest a new feature.
A couple of days ago, I got an email with the subject “API User” and I thought wow, someone’s actually using that API that I spent so long writing, and I suppose this will be a problem that they’re having with it.
I was half right, it was indeed from someone using the FeedBlendr API, but I was wrong about the second part - he wasn’t having any problems with it, he just wanted to let me know that he was using it, and it was working! The email was from Toby, one of the developers of HubTag.com, a new service which allows you to create, track and promote a “hub tag”. According to their description, “a HubTag is a unique word on the internet used to tag your photos and videos that you have published on different web 2.0 tools”.
Their implementation appears to involve automated blending from a number of public Web2.0 properties, so that you can get a single feed containing everything that uses a certain, unique tag. Toby Beresford, co-founder said “in practice this means www.hubtag.com will promote using unique hubtags for specific purposes like events, projects and personal filing.”
Looks like the guys are still working on the site as it’s already started adding in extra features and tools since I first looked at it, so it could be one to keep an eye on in the future. According to Neil Johnston, the other co-founder, their vision for HubTag is “everyone using tags relevantly and seamlessly to maintain better connections online”.
Great stuff - if anyone else is using the API for anything I’d love to hear about it!
The new version of FeedBlendr has been getting some mentions around the place, so I thought I’d post a quick round-up of some of the mentions I’ve spotted including reviews, flames, praise etc
- Read/Write Web: FeedBlendr - Feed Remix Service, by Alex Iskold of Adaptive Blue
- Newsniche: Feedblendr feeds into one, by Allan Burns
- CyberNews: Combine your feeds into one with FeedBlendr
- Smallbizunplugged: Feedblendr: Relationships still the core of business, including blogging for business
- Technorati: Blogs containing “feedblendr”
- Google Blog Search: Blogs containing “feedblendr”
And as a bonus, here’s a chart from Technorati that shows mentions of “feedblendr” that they’re tracking over time!
After many months of development (the complete engine was re-written from the ground up!), I’m very happy to announce that FeedBlendr 2.0 is now officially launched and available for use. I’ve mentioned in previous posts all the new features in this version, so I won’t repeat myself, but needless to say it’s a major upgrade.
I’ll be fine-tuning and tweaking over the coming weeks, but hopefully this will be a stable revision to the system and will open up some new options for people who couldn’t use FeedBlendr for something in particular in the past.
One glaring omission in this new version is user accounts, which a lot of you asked for. Rest assured that this feature is coming, but it’s part of a bigger picture, so has been left out intentionally for now to allow me to integrate future features into it better.
Please check it out and let me know how the new version goes!
PS: If you’re into that sort of thing, be sure to check out the new Developers page with information on more powerful ways of interacting with FeedBlendr
A new beta version of FeedBlendr is now online at http://beta.feedblendr.com so please check it out. This version adds/updates:
- Improved date-sorting
- OPML output (and improved OPML handling on import/blend creation)
- In-linking (allows you to pre-fill the Blendr homepage with details via a link)
- The integrated feed-reader now supports embedded media players for videos and audio!
- JSON output
- Improved handling of source feed errors
- Some good info on the Developers Page
- Lots of smaller bug fixes
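For readers curious what date-sorting a blend amounts to, here is a minimal sketch. It assumes each entry has already been parsed into a dict with a `published` datetime. That structure and the function name `blend` are hypothetical, and the real engine handles far more (missing dates, duplicates, format differences).

```python
from datetime import datetime, timezone


def blend(feeds, limit=None):
    """Combine the entries of several feeds into one list, newest first."""
    entries = [entry for feed in feeds for entry in feed]
    entries.sort(key=lambda entry: entry["published"], reverse=True)
    return entries[:limit]
```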
Please try it out and again - let me know how it goes!
I’ve (finally) moved the FeedBlendr Blog feed to being served from the wonderful Feedburner, so please update your feed URLs to point to http://feeds.feedburner.com/FeedBlendr. Thanks!
I was checking out Digg this morning and noticed this story over at Valleywag:
How To Get all the tech news you need (in 20 minutes a day)
A quick look told me there’s a better way, so here it is. The article suggests a number of news sources that you should scan on a daily basis to ensure you’re getting a good coverage of what’s going on. Checking multiple sites takes way too long, so why not blend those news sources into a single feed and then subscribe to it in your newsreader, or even view it online?
The only news source I couldn’t include in there was Wired Magazine in “Mainstays”, but that’s much sexier to read in old-school hardcopy anyway, so go subscribe to it and read it on your commute or something.
These blends will all be upgraded with the new beta version of FeedBlendr soon, and then you’ll be able to access the contents of them online, on your cell-phone, wherever you want; care of some of the cool new features about to be released.
Rather than do the trendy, Web 2.0 thing and make FeedBlendr perpetually beta, I’m going to have a limited-time version up at http://beta.feedblendr.com to test out a new version of the core engine that I’ve been working on over the past few months. This is live right now, so please check it out. New features you may (or may not) notice are:
- Updated funky design (thanks again Ray Hernandez!)
- Supports podcasts/video blogs now!
- Fancy online reader for viewing the contents of your blend, including embedded audio/video
- Mobile reader - allows you to view a blend in a lightweight format specially designed for phones/PDAs etc.
- JavaScript “export” to allow you to drop the contents of a blend into your own page and style it to suit your layout using simple CSS.
- Supports custom XML-extensions (like MediaRSS, GeoRSS etc) on the specific entries sourced from feeds which contained them
- Default output format is now Atom
- Much-improved OPML/bulk feed blending operations
- Fancy AJAX feed checking that actually works
- Much better caching/crawling engine (hopefully) to improve response times and server performance
- Whiz-bang REST/OPML/XML-based API that will allow developers to build some cool stuff with Blendr as part of it (if you’re into that sort of thing)
I’d really love it if people could try out some blends on the beta version, and let me know how it goes. Specifically I want to know about any problems you have, but feel free to lavish me with praise as well
No need to tell me that the “about”, “blog”, “tips” and “contact” links don’t work either - they will when it goes live!
WARNING: Blends created on beta.feedblendr.com will NOT be transferred over to the live version once the beta phase is complete (within a few weeks), so please only use it for testing!
Having some sort of problem with refreshing the contents of feeds right now which I’m looking into.
Basically this will mean that your blends will appear to not have any new posts in them, even though the source feeds are updating.
It looks like this is actually a file-system problem being caused by there being too many files in a single directory and thus confusing the system into thinking that files are fresh (there’s over 45,000 files in the cache directory…).
I’ll be fixing this today, but the change won’t be live until tonight most likely. Sorry for the delay.
UPDATE (7:00pm WST): This problem should be “fixed” now, at least for long enough to roll out the new version of FeedBlendr. The problem was indeed in the caching model, and it goes a little something like this:
MagpieRSS uses an md5 hash of the URL where it got a feed from to create a filename to store its cached copy under. That’s fine under normal circumstances, but I had over 45,000 cache files in a single directory, and PHP was choking when trying to figure out if a file was fresh or stale, and for some reason it was assuming it was fresh.
I’ve adjusted the way that Magpie caches files, so now it’s using a directory structure 5 levels deep, based on the md5 name of the cache file, which should mean that at the deepest level, there are fewer files per directory, thus avoiding this problem for a while.
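A minimal Python sketch of that kind of layout is below. The real change was made inside MagpieRSS (PHP), and the post does not say how the md5 name is split into directories, so using one hex character per level is an assumption here, as is the function name `cache_path`.

```python
import hashlib
from pathlib import Path


def cache_path(url: str, root: Path, depth: int = 5) -> Path:
    """Map a feed URL to a cache file nested `depth` directories deep.

    Assumption: each directory level is one hex character of the md5 digest.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return root.joinpath(*digest[:depth], digest)
```

With 16 possible characters per level, 5 levels give 16^5 = 1,048,576 possible leaf directories, so 45,000 cache files would spread out to well under one file per directory on average.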
Thanks for your patience and I hope it didn’t cause too many problems for you all out there. Let’s all look forward to the new version (coming soon, honest!), which will hopefully work around things like this!
From the suggestions of a few users, and my own desire to improve the functionality of FeedBlendr, I’m hoping to increase the number of feed formats/variants/dialects supported in the near future. At the moment, the way that FeedBlendr works means that it only supports (or rather only outputs) pretty bare-bones feeds. This means it almost always works, but it also precludes it from doing really cool things like creating blended podcasts or vlog feeds amongst other things.
I’m going to be working from Niall Kennedy’s list of feed specs, and hope to ensure support for these formats (and possibly more if I can figure out a reliable way to open up the elements I pass through):
- RSS 0.9, 0.91, 1.0, 2.0 with enclosures
- MediaRSS
- iTunes formatted podcasting feeds
- GeoRSS
If there are any other specific formats that you’d like to see FeedBlendr support, please use the comments to let me know!
Change of plans!
Thanks to a new policy at DreamHost, there’s no immediate need to relocate, so FeedBlendr will be staying right where it is for now.
I’m hoping to get a bit of a chance to do some refactoring and what-not in the near future though, which will make blendr a little more flexible, and a little more standards compliant
Some time over the next week, I will be re-locating FeedBlendr.com to a new home at hosting.com. During that process, you may experience some problems accessing blends, but I hope to minimize those problems as much as possible.
I will post again when things look stable
With increasing popularity come a number of problems that are only to be expected. Right now, FeedBlendr is experiencing one of those problems - scalability/load issues.
After a number of warnings/notifications from DreamHost, I’ve been asked to either figure out a way to lower my processor usage immediately, or risk having my account disabled until I can do so. I’m using too much of the processor on my shared server and it’s unfair to the other users - fair enough.
Now I’m faced with a decision: do I shut FeedBlendr down (I don’t want to)? is there something I can do to lower usage (not that I’ve found yet)? can I justify upgrading my hosting (without making any money from FeedBlendr under the current model)?
At the moment I’m considering upgrading to a dedicated server from Hosting.com, which should give me the power/flexibility I need, but it’s a lot of extra management/set up etc that I’d have to deal with just to get it all happening, and life is just plain busy right now. So as a stop-gap, and either way, I’ve decided to open up for donations, allowing anyone who would like to do so to show their appreciation for the service FeedBlendr offers, by dropping a few bucks in my virtual tip-jar.
This may well affect my decision on whether or not to keep FeedBlendr live - if people aren’t even willing to drop a few dollars once-off in appreciation of the service, then I don’t know if I can continue providing it, and pay extra for the privilege of keeping FeedBlendr public and free.
It’s up to you people - so please show how you feel about FeedBlendr and donate now!









