Posted: Tue Aug 30, 2005 7:21 pm Post subject: Regexp Module
This is just me rambling about an idea for a module.
It would take some data to parse, plus a bunch of keywords (which become hash keys), each paired with a regexp to search for in the data.
So like....
Code:
use LWP::Simple;
use Module;    # placeholder name for the proposed module
# Get name data for 'Bob'
my $data = LWP::Simple::get("http://www.weddingvendors.com/baby-names/meaning/bob/");
my %results = Module::parse(
    $data,
    name   => '<h1>(.*?) \(First Name Origin and Meaning\)<\/h1>',
    origin => '<dt>Origin<\/dt>\s*<dd>(.*?)<\/dd>',
    mean   => '<dt>Meaning<\/dt>\s*<dd>(.*?)<\/dd>',
    gender => '<dt>Gender<\/dt>\s*<dd>(.*?)<\/dd>',
);
# And then %results might read:
%results = (
    name   => 'Bob',
    origin => 'German',
    mean   => 'Form of Robert. Bright fame.',
    gender => 'male',
);
Anyhow, it's just an idea: many of our site leechin' commands could just LWP::Simple::get the data from the URL and hand the module a few regexps, instead of doing a lot of m/// lines or if statements like our commands tend to do.
So for big pages like weather reports it could save many lines of code.
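For what it's worth, here is a rough sketch of how such a parse function might look. Everything in it is an assumption for illustration: the function name parse, the choice to keep only the first capture group per key, and the /s flag. The page text is an inline stand-in rather than anything fetched from the real site.

```perl
use strict;
use warnings;

# Hypothetical helper: apply each named pattern to $data and store the
# first capture group under that name. Keys whose pattern does not
# match are simply left out of the result.
sub parse {
    my ($data, %patterns) = @_;
    my %results;
    for my $key (keys %patterns) {
        # /s lets "." match newlines, since HTML often wraps across lines
        if ($data =~ /$patterns{$key}/s) {
            $results{$key} = $1;
        }
    }
    return %results;
}

# Inline sample standing in for a fetched page (not the real site's markup)
my $html = "<h1>Bob (First Name Origin and Meaning)</h1>\n"
         . "<dt>Origin</dt>\n<dd>German</dd>\n"
         . "<dt>Gender</dt>\n<dd>male</dd>\n";

my %results = parse(
    $html,
    name   => '<h1>(.*?) \(First Name Origin and Meaning\)<\/h1>',
    origin => '<dt>Origin<\/dt>\s*<dd>(.*?)<\/dd>',
    mean   => '<dt>Meaning<\/dt>\s*<dd>(.*?)<\/dd>',
    gender => '<dt>Gender<\/dt>\s*<dd>(.*?)<\/dd>',
);

print "$_ => $results{$_}\n" for sort keys %results;
```

With this sample the mean key never matches, so it is absent from %results. A real module would probably want a way to tell "no match" apart from "matched an empty string", which is why the sketch skips the key rather than storing undef.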
Just an idea, might start on it in my free time, or if one such module exists go ahead and reply about it. Save me a project, I have enough as it is, lol
I like the idea of a module such as that. I have been working on a group of modules that encapsulate the data within a particular site, called WebData::SiteName, where SiteName, obviously, is a shortened version of the name of the site. The idea is that the gory details of parsing each site are contained in its module, and the WebData modules then expose a very simple API that is common among all sites. I've written about 7 or 8 so far, but the guts are all hand-coded. Using a module as you suggest would definitely speed up the process of writing new modules. Ultimately, if we had many people writing to this system, we could get access to a lot of interesting and useful data.
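To make the layering concrete, here is a minimal sketch of what one such site module could look like if its guts were a table of regexps. The package name WebData::ExampleNames, the function getFields, and the sample markup are all made up for illustration; they are not part of the actual WebData modules.

```perl
use strict;
use warnings;

# Hypothetical site module: the page-specific regexps live here and nowhere else.
package WebData::ExampleNames;

my %patterns = (
    name   => qr{<h1>(.*?)</h1>}s,
    origin => qr{<dt>Origin</dt>\s*<dd>(.*?)</dd>}s,
);

# Assumed common entry point: take raw page text, return a hash ref of fields.
sub getFields {
    my ($page) = @_;
    my %fields;
    for my $key (keys %patterns) {
        $fields{$key} = $1 if $page =~ $patterns{$key};
    }
    return \%fields;
}

package main;

# Callers only ever see field names, never the markup or the regexps.
my $page   = "<h1>Bob</h1><dt>Origin</dt> <dd>German</dd>";
my $fields = WebData::ExampleNames::getFields($page);
print "$_: $fields->{$_}\n" for sort keys %$fields;
```

If the site changes its markup, only the %patterns table in that one package needs updating, and calling code stays the same.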
Here is one of my example test functions for my US President WebData module.
Code:
sub testPotus
{
    # Fetch the parsed president data from the site module
    my $president_keys = WebData::Potus::getPresidentKeys();
}
That spits out a hash containing field/value pairs for presidents: name, place of birth, date of birth, years of presidency, etc. The data from the site is now in a very easy-to-use format that can be imported directly into a database or used in other ways.