RSS feeds into Rails database

I would like to get RSS feeds from various sites into my Rails database so that I can save the data and use it later for display. I want to run this program at a certain time of day, every day. I can put in the checks to exclude any duplicates. Can someone please help me?

Forgot to mention: I have this piece of code, but I am not sure how to put the values into tables.

# Provides RSS parsing capabilities
require 'rss'

# Allows open to access remote files
require 'open-uri'

# What feed are we parsing?
rss_feed = "http://www.hindustantimes.com/RSSFeed/RssFeed.ashx?c=Business"

# Variable for storing feed content
rss_content = ""

# Read the feed into rss_content
open(rss_feed) do |f|
  rss_content = f.read
end

# Parse the feed, dumping its contents to rss
rss = RSS::Parser.parse(rss_content, false)

# Output the feed title and website URL
puts "Title: #{rss.channel.title}"
puts "RSS URL: #{rss.channel.link}"
puts "Total entries: #{rss.items.size}"

Hello Venu,

I am, as a matter of principle, somewhat opposed to just writing code up for people. However, I would be more than happy to work through this with you. I would encourage you to give it a go using your understanding of Rails so far and post it up here; I'm sure many people would be happy to refactor portions or point out how you might do something better.

I will give you your first clue. For running a cron-job-style function in Rails you can use something like script/runner, which lets you run a method on a model every so often (depending on your cron job). Let me give you an example. Let's say you have a model called Job and Job has a method called do_something... let's fill it out:

class Job < ActiveRecord::Base
  # script/runner will call this as Job.do_something, so it is a class method
  def self.do_something
    jobs = Job.find(:all)
    jobs.each do |job|
      job.name += "augmented from the runner"
      job.save
    end
  end
end
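
To try it out by hand before putting it in cron, you can invoke it from your application root like so (quoting the call is a safe habit, though a single argument works unquoted too):

ruby script/runner "Job.do_something"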

Okay... so let's say you ran this cron job once a month; all it would do is append "augmented from the runner" to the end of every job name. Somewhat useless, but you get the idea of what you can do with something like script/runner...

I hope that gives you somewhere to start. I would really encourage you to go ahead and try building this method first and post some rough code; I'm sure more than one person will be happy to help you out with it.

Enjoy!

Venu,

By far the easiest way to achieve your goal is to use cron (the built-in task scheduler for Linux and most Unix variants) with either wget or curl. For example, the following line inserted into crontab would invoke the refresh method of a controller called cron that you would write for your app. In this case, you'd just inspect params[:target] for the RSS URL you need to scrape (and verify params[:secret_key] to make sure nobody is spoofing a request). Inside your controller's refresh method, you can perform all the necessary actions to update your database. Also note that as written, you'd get a request sent to your Rails app every 5 minutes, which is probably not what you want; "man 5 crontab" or google crontab to get the right settings for your needs.

0-59/5 * * * * /usr/bin/wget -O - -q "http://railsapp.yourhost.com:3000/cron/refresh?target=http://www.not404.com/rss.xml&secret_key=foo"
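
(Note the quotes around the URL; without them the shell would treat the & as a background operator and cut the request short.) To make the controller side concrete, here is a rough sketch of what such a refresh action might look like; the FeedEntry model and its import_from method are made up for illustration:

class CronController < ApplicationController
  # Hypothetical shared secret; in practice, keep it out of public view
  SECRET_KEY = "foo"

  def refresh
    # Reject requests that do not carry the shared secret
    unless params[:secret_key] == SECRET_KEY
      render :text => "Forbidden", :status => 403
      return
    end

    # Hypothetical model method that fetches the feed and saves new entries
    FeedEntry.import_from(params[:target])
    render :text => "OK"
  end
end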

The best thing about this approach is that it doesn't require anything extra in your Rails or Ruby app, and the processing is done within the context of your Rails app, where it belongs. Other "daemon" hacks run outside the scope and context of your Rails app, and can become horrible messes when you later try to add authentication to your solution.

Tip: Don't insert anything into your crontab until you are certain your wget or curl command-line triggers your Rails App as you expect and intend.

If you're trying to do this from a Windows box, Windows has a similar task scheduler called at. All you need is a MinGW or Windows-native version of wget, or another suitable command-line HTTP client.

That should get you started. Feel free to respond in this thread, or PM me if you need more detailed info.

Got it. Thank you so much. I will try this.

Laurence & Venu,

Please note what I said above. It is far simpler to use script/runner, and much more secure than running it from wget with some secret key sent in plain text over the web. I would HIGHLY recommend you take a quick look and see how SIMPLE script/runner is to use; as I outlined in my previous post, it lets you set up cron jobs for your app on your server, keeping things secure and not dependent on a web connection. In fact, I would say the wget method is much more complicated.

To copy your example in the crontab it would look like:

0-59/5 * * * * /path/to/ruby/ruby /path/to/app/script/runner "Model.method"

See how simple that is? And it's secure. Plus, all the logic is in the model and not in a controller.

I would encourage you to explore this route, Venu, as I personally feel it is a best practice for Ruby on Rails; this is exactly what script/runner is for.

Matthew,

Certainly, your approach is acceptable if it's a small application, where everything is guaranteed to run from the same machine, and all off of the same user account. It's just a matter of architecture style and preference.

Generally, I advocate HTTP-client based triggers, as they are much easier to invoke within an Intranet. This is particularly important for anyone who hopes to deploy within an Enterprise, when the triggers may need to be fired from Workflows and Orchestration servers elsewhere on the Enterprise Service Bus.

Designing for script/runner is not SOA- or ESB-friendly, and is an architecture I generally avoid. HTTP or SOAP targets are easier to deploy without getting the client's Network Security guys involved.

If you designed for the script/runner approach, and you later need to work nicely with a Workflow or Orchestration server, you'll need to rewrite your cron rules (and Rails Code) to "http trigger" style. Otherwise, you'll need to go through lots of pain working with the client's Network Security team to get the Rails-App user account associated with a Certificate, in order to remote-execute that script/runner process via SSH.

Considering both approaches are launched via command-line, and aren't much different in terms of coding complexity, I'd lean toward the more Enterprise-scalable approach to future-proof my Rails App. Again, it's all just architecture style and preference. :-)

Hiya,

I'm not sure how many records you plan on importing, but ActiveRecord is not very efficient if you are doing bulk inserts.

Each time you do Model.new and Model.save you incur both memory and SQL overhead. If the import looks like it's taking way longer than it should, have a look at ActiveRecord Extensions here: http://www.continuousthinking.com/are/activerecord-extensions-0-0-5
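
For a rough idea of the difference (treat this as a sketch; the exact API can differ between versions, so check the gem's docs, and FeedEntry is the hypothetical model from earlier in the thread):

require 'ar-extensions'  # after installing the ar-extensions gem

columns = [:title, :link, :published_at]
values  = rss.items.map { |item| [item.title, item.link, item.pubDate] }

# One multi-row INSERT instead of one INSERT (and one AR object) per record
FeedEntry.import(columns, values)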

Hi

Sorry, I could not find time to try this out until now. I am trying to import the feeds into the database: about 50 records each from about 20 sites, and this will run once a day, so users can see the data without being aware of all this happening in the background. I will check out the extensions, as it seems like I will need that. Thank you all.

Hi Matt, thank you so much. Your gem works fine and makes the import faster. Thanks to everyone for all your help.
