The RSS feeds at Pitchfork are pretty broken. The links and guids (which are the same) cycle through a list of servers (www.pitchforkmedia.com, webapp.pitchforkmedia.com:9010, 62.42.344a.static.theplanet.com:9014, etc.). Most of the links don’t work when you try to follow them, and the items keep showing up as new because the guid has changed. Also, links to features are broken: they point to /features/id instead of /feature/id.
I wrote a simple script to fix these problems. It also makes the feed valid by changing the author element to dc:creator (author requires an email address) and removing HTML from the title element. Oh, and setting the content-type properly.
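To give a rough idea of the title cleanup, here is a minimal sketch. The helper name `strip_tags` and the sample title are made up for illustration; the regular expression is the same one the script applies to each item's title.

```ruby
# Hypothetical helper (not part of the script): remove anything that
# looks like an HTML tag, using the same pattern as the script's gsub!.
def strip_tags(title)
  title.gsub(%r{</?.*?>}, '')
end

# Made-up sample title, only for illustration.
puts strip_tags('<i>Album Name</i> by Some Band')
```

This is a blunt instrument: a title containing a literal `<` followed later by a `>` would lose the text in between, so it only suits feeds where angle brackets in titles are always markup.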
For those who are interested, here’s the code:
#!/usr/bin/env ruby
# Runs under mod_ruby: Apache.request is provided by the server.
require 'open-uri'
require 'rss/2.0'
require 'rss/dublincore'
require 'dublincore-rss2'

r = Apache.request
r.content_type = 'application/rss+xml; charset=utf-8'
r.send_http_header
exit(Apache::OK) if r.header_only?

DOMAIN = 'pitchforkmedia.com'
RSS_BASE = "http://#{DOMAIN}/rss/"

# The feed name comes from the path info; fall back to 'today'.
section = r.path_info.match(%r{(?!/)(\w+)}).to_s
section = 'today' if section.empty?
section.untaint

begin
  feed = open(RSS_BASE + section).read
rescue OpenURI::HTTPError
  exit(Apache::HTTP_NOT_FOUND)
end

rss = RSS::Parser.parse(feed, false)
rss.items.each do |item|
  # Point every link and guid at the canonical host on the default port.
  uri = URI.parse(item.guid.content)
  uri.host = DOMAIN
  uri.port = 80
  uri.path = uri.path.sub(%r{/features/}, '/feature/')
  item.guid.content = uri.to_s
  item.link = uri.to_s
  # author requires an email address, so move the name to dc:creator.
  item.dc_creator = item.author
  item.author = nil
  # Strip HTML tags from the title.
  item.title.gsub!(%r{</?.*?>}, '')
end

r.puts rss
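
The link rewriting can be tried on its own, outside Apache. The following is a minimal sketch using only the standard `uri` library; the helper name `canonical_link` and the sample URL paths are invented for illustration, while the host, port and path changes match what the loop above does.

```ruby
require 'uri'

# Hypothetical helper (not part of the script): rewrite a link from any
# of the mirror hosts so it points at the canonical domain.
def canonical_link(url, domain = 'pitchforkmedia.com')
  uri = URI.parse(url)
  uri.host = domain
  uri.port = 80  # the HTTP default port is left out when the URI is printed
  uri.path = uri.path.sub(%r{/features/}, '/feature/')
  uri.to_s
end

# Made-up item URL, only for illustration.
puts canonical_link('http://webapp.pitchforkmedia.com:9010/features/12345')
```

Because every mirror URL collapses to the same string, the guid stays stable between fetches and feed readers stop treating old items as new.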
dublincore-rss2.rb:
# mix dublincore into rss 2.0
# http://www.cozmixng.org/repos/rss/trunk/lib/rss/dublincore/2.0.rb
module RSS
  Rss.install_ns(DC_PREFIX, DC_URI)
  class Rss
    class Channel
      include DublinCoreModel
      class Item; include DublinCoreModel; end
    end
  end
end