RSS Sample Package

The RSS Sample package demonstrates agent-based harvesting of information from the Web and the display of that information via an RDF Server Page (RSP).

Introduction

It is common today for content providers on the Web to syndicate their content using RDF Site Summary (RSS). This makes it possible to increase the distribution of their content by allowing it to be included on other Web sites.


The RSS sample package viewed in a Web browser.

RDF Gateway makes it easy to create a Web site that aggregates information from external data sources. In this article, I describe rss_sample, a package I created for RDF Gateway. The package implements a simple news Web site by aggregating information from several RSS news feeds on the Web. The information from the news feeds is used to generate a Web page that displays a title, a brief description, and a link for each news item. The news items are periodically refreshed from the feeds to keep the content on the site current.

I intentionally kept the implementation of the package simple to highlight specific features of RDF Gateway. If you are interested in expanding upon this package, it is available for download.


RSS Sample Package

The news site can be considered a complete Web application. It is implemented as a single RDF Gateway package named rss_sample. The package uses RDF Gateway data services to gather the RDF data contained in remote RSS news feeds. The information from the news feeds is copied into an RDF Gateway database table, which is kept current by an RDF Gateway timer that periodically updates it with the latest information from the feeds. An RDF Server Page (RSP) formats the news information in the database table as an HTML Web page. Users view the news site by pointing their Web browser at the URL of the RSP.


The rss_sample package runs on RDF Gateway.

In the following sections, I discuss the files that make up the package:

include.rql
default.rsp
global.rsa


include.rql

This file contains common RDFQL that is used by the other package files.


Figure: include.rql

In this file I implemented a function called LoadRSS. Its purpose is to read the remote RDF data from the RSS news feeds and store it in a database table. I use the RDF Gateway INET data service to connect to the remote RDF data sources, and then use an RDFQL query to insert all of the data from the remote sources into the table.

Let's first look at how I create a data service connection. A data service connection is an RDFQL data source, which means it can be queried using any of the RDFQL query commands. I create the data service connection by constructing a DataSource object with a data service connection string as its only argument. A data service connection string uses a format similar to a URL. It always begins with the registered name of the data service, in this case "inet". If the data service requires connection parameters, a "?" character is placed after the name of the data service, followed by the connection parameters, which are URL-encoded. The INET data service requires two connection parameters: url and parsetype. For the url parameter I specify the URL of the remote RSS news feed. I set the parsetype parameter to "rdf" to direct the INET data service to parse the remote source as RDF data.
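To make the connection string format concrete, here is a short Python sketch (not RDFQL) that assembles one such string per feed. It only illustrates the string layout described above; in the package itself the string is built in RDFQL and passed to the DataSource constructor. The function name inet_connection_string is hypothetical, and joining the two parameters with "&", as in a URL query string, is an assumption the article does not state.

```python
from urllib.parse import urlencode

# The three feeds listed in this article (they may have moved since).
FEEDS = [
    "http://slashdot.org/articles.rss",
    "http://www.infoworld.com/rss/news.rdf",
    "http://xmlhack.com/rss10.php",
]


def inet_connection_string(feed_url: str, parsetype: str = "rdf") -> str:
    """Build an INET data service connection string for one feed.

    Layout: "<service name>?<URL-encoded parameters>". Separating the
    parameters with "&" is an assumption of this sketch.
    """
    # urlencode percent-encodes ":" "/" "?" "&" "=" inside the feed URL,
    # so they cannot be confused with the connection string's own syntax.
    params = urlencode({"url": feed_url, "parsetype": parsetype})
    return "inet?" + params


if __name__ == "__main__":
    for feed in FEEDS:
        print(inet_connection_string(feed))
```

Under those assumptions, the Slashdot feed yields `inet?url=http%3A%2F%2Fslashdot.org%2Farticles.rss&parsetype=rdf`.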

Once I have established all of the data service connections, I insert all of the RDF data from them into an RDF Gateway database table named "rss". I first delete all of the existing data from the table because I want the table to contain only the latest news. To make it easier to use all of the data service connections in a query, I add them to an array. This way I only need to specify the array in the USING clause of the query instead of listing each data service connection individually. Because the array variable is treated as an RDFQL expression that is evaluated at runtime, it must be added to the query as a replaceable query parameter, denoted by the "#" character.
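The refresh logic of LoadRSS (clear the table, then insert everything from all sources in one pass) can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's RDFQL: the table is modeled as a set of (subject, predicate, object) triples, and each data service connection is replaced by a hypothetical zero-argument function that returns triples.

```python
from itertools import chain
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]            # (subject, predicate, object)
Source = Callable[[], Iterable[Triple]]  # hypothetical stand-in for a data service connection


def load_rss(table: Set[Triple], sources: List[Source]) -> None:
    """Refresh `table` so it holds only the triples currently in the feeds."""
    # Delete the existing data so the table contains only the latest news.
    table.clear()
    # Read all sources as one combined stream, in the spirit of passing the
    # whole array of connections to a single query's USING clause.
    table.update(chain.from_iterable(source() for source in sources))


if __name__ == "__main__":
    table = {("urn:old", "urn:title", "stale item")}
    feeds: List[Source] = [
        lambda: [("urn:a", "urn:title", "Story A")],
        lambda: [("urn:b", "urn:title", "Story B"), ("urn:a", "urn:title", "Story A")],
    ]
    load_rss(table, feeds)
    print(sorted(table))
```

Modeling the table as a set mirrors the fact that an RDF graph holds each triple at most once. Because the table is emptied before the feeds are read, anything reading it during a refresh could see it empty or partially filled; building a new set and swapping it in afterwards is one way to avoid that window.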

Note that I insert all of the data from the RSS news feeds into the database table. I figured I would leave it to the user interface portion of the package to determine which information to use from the feeds.

Listed below are the URLs for the RSS news feeds used in this package. Note that these URLs may have changed since the time this article was written.

http://slashdot.org/articles.rss
http://www.infoworld.com/rss/news.rdf
http://xmlhack.com/rss10.php

The other purpose of this common include file is to define all of the namespaces used in this package. I like to place these in a common include file so I only need to define them in one place.
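As a Python analogue of defining namespaces in one place (the actual include.rql uses RDFQL declarations that are not reproduced here), a single prefix map can serve every query. The three URIs below are the standard RDF, RSS 1.0, and Dublin Core element namespaces; that the package declares exactly these prefixes is an assumption, and the helper name expand is hypothetical.

```python
# Shared prefix-to-URI map, defined once and reused by every query.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as "rss:title" to a full URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in NAMESPACES:
        raise KeyError(f"undeclared namespace prefix: {prefix!r}")
    return NAMESPACES[prefix] + local
```

For example, `expand("dc:description")` returns `http://purl.org/dc/elements/1.1/description`; changing a URI in the map changes it for every caller.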

default.rsp

This file implements the RDF Server Page (RSP) that produces the HTML Web page for the news site.


Figure: default.rsp

For those of you who are familiar with Active Server Pages (ASP), the syntax of RDF Server Pages (RSP) should be easy to follow. An RSP consists of HTML tags with blocks of RDFQL embedded between the special delimiters <% and %>. The embedded RDFQL is executed when the page is requested, and each block is replaced by its output.

This RSP queries the database table rss for the latest data from the RSS news feeds and formats it as HTML. The RSP could get the data directly from the remote RSS news feeds, but that would be slower than reading from the local database table. The only RSS data this RSP needs is the title, description and link for each item. If you look at the RDFQL, you will see that I create a cursor rs from a SELECT command that returns the title, description and link properties for each RSS item in the table. I then iterate over each row in the cursor and generate the HTML for each news item.
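The query-and-format step can be approximated in Python. The sketch below is not the RSP's RDFQL: it assumes the table is a list of triples, treats an item as any subject typed rss:item, and keeps only items that have all three properties, the way a SELECT requiring title, description and link would. The helper names select_items and render_html, and the choice to HTML-escape feed text, are assumptions of this sketch.

```python
import html
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Row = Tuple[str, str, str]     # (title, description, link)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS = "http://purl.org/rss/1.0/"


def select_items(triples: Iterable[Triple]) -> List[Row]:
    """Return one (title, description, link) row per rss:item.

    Items lacking any of the three properties are skipped. If a property
    occurs more than once, the last value seen wins (a simplification).
    """
    items: Dict[str, None] = {}            # insertion-ordered set of items
    props: Dict[str, Dict[str, str]] = {}  # subject -> {predicate: object}
    for s, p, o in triples:
        if p == RDF_TYPE and o == RSS + "item":
            items[s] = None
        else:
            props.setdefault(s, {})[p] = o
    rows: List[Row] = []
    for item in items:
        values = props.get(item, {})
        title = values.get(RSS + "title")
        description = values.get(RSS + "description")
        link = values.get(RSS + "link")
        if title is not None and description is not None and link is not None:
            rows.append((title, description, link))
    return rows


def render_html(rows: List[Row]) -> str:
    """Format rows as an HTML list, escaping text taken from the feeds."""
    lines = ["<ul>"]
    for title, description, link in rows:
        lines.append(
            f'<li><a href="{html.escape(link)}">{html.escape(title)}</a><br>'
            f"{html.escape(description)}</li>"
        )
    lines.append("</ul>")
    return "\n".join(lines)
```

Because items missing any of the three properties are dropped, an item with no description would silently disappear from the page; the rule base described in the following paragraph addresses exactly that case.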

One powerful feature of RDF Gateway is its support for inference rules. You may have noticed a rule base defined in the default.rsp listing above. A rule base is simply a set of inference rules that are automatically included in your queries. Here I create a rule to infer an RSS description property for every RSS item. With this rule defined, my query can use just the RSS description property (rss:description). The rule infers this property for each RSS item from either the RSS description or the Dublin Core description, or creates one if neither is defined for the item. This ensures that every news item has a description.
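The description rule can be mimicked in Python by computing the value for each item from its property map. Two assumptions: when both descriptions exist, this sketch prefers rss:description over dc:description, and the default text it falls back to is a hypothetical placeholder, since the article does not say what the created description contains. Unlike the RDF Gateway rule base, which is applied inside the query, apply_rule (a hypothetical name) simply returns copies of the property maps with rss:description filled in.

```python
from typing import Dict

RSS_DESCRIPTION = "http://purl.org/rss/1.0/description"
DC_DESCRIPTION = "http://purl.org/dc/elements/1.1/description"
# Hypothetical placeholder: the article does not say what text the rule creates.
DEFAULT_DESCRIPTION = "No description available."


def infer_description(props: Dict[str, str]) -> str:
    """Return the rss:description value for one item's property map.

    Priority (an assumption of this sketch): rss:description, then
    dc:description, then DEFAULT_DESCRIPTION.
    """
    if RSS_DESCRIPTION in props:
        return props[RSS_DESCRIPTION]
    if DC_DESCRIPTION in props:
        return props[DC_DESCRIPTION]
    return DEFAULT_DESCRIPTION


def apply_rule(items: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return copies of every item's properties with rss:description filled in."""
    return {
        item: {**props, RSS_DESCRIPTION: infer_description(props)}
        for item, props in items.items()
    }
```

After apply_rule, a query that asks only for rss:description finds a value for every item, which is the effect the rule base has on the RSP's SELECT.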

If you are not familiar with the RSS and Dublin Core schemas, I've provided some helpful links below. These links were valid at the time I wrote this article.

RDF Site Summary (RSS) 1.0
RDF Site Summary 1.0 Modules: Dublin Core

global.rsa

This is a specially named file that contains all of the package event handler functions called by RDF Gateway. The global.rsa file must be located in the root directory of the package.


Figure: global.rsa

In the global.rsa file for the rss_sample package, I implement one package event handler function, Package_OnStart. If this function exists in a package's global.rsa file, RDF Gateway calls it when the package is started. A package is normally started the first time it is accessed.

In this function I implement the startup logic for the news site. I first create the in-memory database table rss to hold a local cache of all the data from the RSS news feeds. Next, I call my LoadRSS function to fill the table initially with the latest data from the news feeds. I then create a named timer, rss_sample_timer, that fires every 20 minutes. The timer period is specified as a string in the XML Schema duration format, in this case "PT20M" (20 minutes). Finally, I set my LoadRSS function as the event handler for the timer so that the data from the RSS news feeds is updated every 20 minutes. This design improves the performance of the package because user requests are served from the database table instead of from the remote data sources.
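The startup sequence can be sketched in Python with a background thread standing in for the named timer. This is not RDF Gateway's timer API: duration_seconds, start_timer and package_on_start are hypothetical names, creating the rss table is left to the load_rss callable, and the duration parser accepts only day and time components (PnDTnHnMnS) because years and months have no fixed length in seconds.

```python
import re
import threading
from typing import Callable

# Day and time components of an XML Schema duration (PnDTnHnMnS).
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def duration_seconds(text: str) -> float:
    """Convert a duration such as "PT20M" to seconds."""
    m = _DURATION.fullmatch(text)
    # Reject "P", "PT" and a trailing "T" with no time component.
    if m is None or text.endswith("T") or not any(m.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: float(v) if v else 0.0 for k, v in m.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


def start_timer(period: str, handler: Callable[[], None],
                stop: threading.Event) -> threading.Thread:
    """Call `handler` every `period` on a named thread until `stop` is set."""
    interval = duration_seconds(period)

    def run() -> None:
        # Event.wait returns False on timeout (time to fire), True once stop is set.
        while not stop.wait(interval):
            handler()

    thread = threading.Thread(target=run, name="rss_sample_timer", daemon=True)
    thread.start()
    return thread


def package_on_start(load_rss: Callable[[], None],
                     stop: threading.Event) -> threading.Thread:
    """Mirror of the startup sequence: fill the cache, then refresh periodically."""
    load_rss()                                   # initial fill of the cache
    return start_timer("PT20M", load_rss, stop)  # refresh every 20 minutes
```

Here `duration_seconds("PT20M")` returns 1200.0, so the refresh thread wakes every 20 minutes; setting the stop event ends it.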

That's all there is to it. You can download the package and try it out for yourself.

Downloading And Installing The Package

Click on the following link to download the rss_sample package files.

RSS Sample Package Download


Figure: RDF Query Analyzer

Extract the compressed file into a directory that your installation of RDF Gateway can access. Then connect to RDF Gateway using RDF Query Analyzer and execute the RDFQL shown in the figure. You will need to replace the file path "c:/your_package_path" with the path where you extracted the package files.