


Further technical details on the gas price database

Monday, May 05, 2008 in Technical Articles (Views: 5916)
So, I wanted to blog a little about the gas price database. It ran as a result of several components:

1. A bot which ran nightly to extract prices

2. An application which parsed those prices

3. A SQL Stored procedure which handled all of the processing of the records

4. A cleanup procedure to remove any records older than 7 days

...and of course, the web interface for both the home page and the actual gas price database page.

So, let's go through the components:

The bot was nothing more than a glorified wget (a program that downloads web pages). The application then read in the downloaded data and extracted things like the station name, address, prices, and when each was last verified.
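The extraction step can be sketched roughly like this. The page layout, field order, and sample row here are all assumptions for illustration; the real bot scraped a live gas price site whose markup I don't have.

```python
import re

# Hypothetical sample of one scraped table row; the real site's markup differed.
SAMPLE_ROW = '<tr><td>Sunoco</td><td>123 Main Street</td><td>3.59</td><td>2008-05-04</td></tr>'

def parse_station(row_html):
    """Pull station name, address, price, and verification date out of one row."""
    cells = re.findall(r'<td>(.*?)</td>', row_html)
    name, address, price, verified = cells
    return {'name': name, 'address': address,
            'regular': float(price), 'verified': verified}

station = parse_station(SAMPLE_ROW)
```

The parsed records were then handed off to the database for the stored procedure to process.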

This data was then fed into a SQL database, where a stored procedure was waiting for it. The stored procedure would:

1. Take in the information

2. Look to see if the station already existed.

If so, update the record

If not, add the record

3. It would repeat the process with all street names

4. Then, it would check to see if a station with the same address existed.

If so, update the station with the new prices

If not, add a new station with the prices.

Then of course, the process would clean up all records older than 7 days at the end.
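The update-or-insert logic plus the 7-day cleanup can be sketched like this. I'm using Python with an in-memory SQLite table purely for illustration; the original was a SQL stored procedure, and the table and column names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE station (
    address TEXT PRIMARY KEY, name TEXT, regular REAL, verified TEXT)""")

def upsert_station(rec):
    # Look to see if a station with the same address already exists.
    exists = conn.execute("SELECT 1 FROM station WHERE address = ?",
                          (rec['address'],)).fetchone()
    if exists:
        # If so, update the station with the new prices.
        conn.execute("UPDATE station SET regular = ?, verified = ? WHERE address = ?",
                     (rec['regular'], rec['verified'], rec['address']))
    else:
        # If not, add a new station with the prices.
        conn.execute("INSERT INTO station VALUES (?, ?, ?, ?)",
                     (rec['address'], rec['name'], rec['regular'], rec['verified']))

# The same station scraped on two nights produces one row with the newer price.
upsert_station({'address': '123 Main St', 'name': 'Sunoco',
                'regular': 3.59, 'verified': '2008-05-04'})
upsert_station({'address': '123 Main St', 'name': 'Sunoco',
                'regular': 3.49, 'verified': '2008-05-05'})
rows_after_upsert = conn.execute("SELECT COUNT(*), MAX(regular) FROM station").fetchone()

# The cleanup step: drop anything not verified within the last 7 days.
conn.execute("DELETE FROM station WHERE verified < date('now', '-7 days')")
rows_after_cleanup = conn.execute("SELECT COUNT(*) FROM station").fetchone()[0]
```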

So, what about the database itself?

Well, to store so many records (there were about 180,000) while conserving disk space, I normalized the data as much as possible. Consider what a record looked like; it contained the following:

1. A map field

2. The station name

3. The address (including city, state, and zip)

4. Prices for regular, plus, premium, and diesel (if it knew)

5. The date the prices were verified

Now, only some of those fields ACTUALLY existed.

Here is how the tables were set up:

1. A gas price table, which was the main table

2. A table for street names

3. A separate table for ZIP codes

4. A table for gas stations
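A minimal sketch of that normalized layout, again in SQLite for illustration: the table and column names are my assumptions, but the idea is that the main table holds only integer keys plus the few fields that actually existed there (the ZIP and the date verified).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE street   (id INTEGER PRIMARY KEY, name TEXT);           -- street names
CREATE TABLE zipcode  (zip TEXT PRIMARY KEY, city TEXT, state TEXT); -- ZIP codes
CREATE TABLE station  (id INTEGER PRIMARY KEY, name TEXT);           -- e.g. 6 = Sunoco
CREATE TABLE gasprice (                                              -- the main table
    station_id INTEGER REFERENCES station(id),
    street_num INTEGER,
    street_id  INTEGER REFERENCES street(id),
    zip        TEXT REFERENCES zipcode(zip),
    regular REAL, plus REAL, premium REAL, diesel REAL,
    verified TEXT);
""")
table_names = {r[0] for r in
               conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
```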

Here's how the data was presented:

Map: This was a very long SQL query that rendered a URL to MapQuest. This data never existed in the database. It was pulled from address, city, state and zip information.

Station Name: Existed in a separate table, and an index was left in the main table. For example, "6" could mean Sunoco.

Address: This cross-referenced another table as well. The data would look like this: one field would hold 123 and another 244; 123 was the street number, and 244 referenced the street name, such as Main Street.

City and State: These never existed in the main table either. This information was pulled from a separate SQL ZIP code database table.

ZIP: This field existed in the main database.

Date Verified: Also existed in the database.

Needless to say, there were a lot of subqueries going on when the data was bound to a page, but it beat the alternative of having to live with a single flat table.
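The page-binding step can be sketched as one query that joins everything back together, including the MapQuest URL built purely from the joined address fields. The schema, sample data, and URL format below are illustrative assumptions, not the original query.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE street  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE zipcode (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gasprice (station_id, street_num, street_id, zip, regular, verified);
INSERT INTO street  VALUES (244, 'Main Street');
INSERT INTO zipcode VALUES ('15001', 'Aliquippa', 'PA');
INSERT INTO station VALUES (6, 'Sunoco');
INSERT INTO gasprice VALUES (6, 123, 244, '15001', 3.59, '2008-05-04');
""")

# Rebuild the displayed record: station name, full address, map URL, and price.
row = conn.execute("""
SELECT st.name,
       g.street_num || ' ' || s.name || ', ' || z.city || ', ' || z.state
           || ' ' || g.zip,
       'http://www.mapquest.com/maps?address=' || g.street_num || '+'
           || replace(s.name, ' ', '+') || '&city=' || z.city
           || '&state=' || z.state || '&zip=' || g.zip,
       g.regular
FROM gasprice g
JOIN station st ON st.id = g.station_id
JOIN street  s  ON s.id  = g.street_id
JOIN zipcode z  ON z.zip = g.zip""").fetchone()
```

Since the map URL never exists in the database, it costs nothing to store and is always consistent with the address on record.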

The geolocator, which was kind of a new invention at the time, was blogged about previously.

In summary, the gas price database was great while it lasted. It was a good tool, useful for finding the best prices in the general area, and gave me a project with a large database to work with.

