Regexp Module

 
Cer

Posted: Tue Aug 30, 2005 7:21 pm    Post subject: Regexp Module

This is just a rambling about an idea for a module.

The module would take some data to parse through, plus a bunch of keywords (to become hash keys), each paired with a regexp to search for in the data.

So like....
Code:
use LWP::Simple;
use Module;   # placeholder name for the proposed module

# Get name data for 'Bob'
my $data = LWP::Simple::get("http://www.weddingvendors.com/baby-names/meaning/bob/");

# Each key is paired with a regexp; the first capture group becomes the value
my %results = Module::parse (
   $data,
   name   => '<h1>(.*?) \(First Name Origin and Meaning\)<\/h1>',
   origin => '<dt>Origin<\/dt>\s*<dd>(.*?)<\/dd>',
   mean   => '<dt>Meaning<\/dt>\s*<dd>(.*?)<\/dd>',
   gender => '<dt>Gender<\/dt>\s*<dd>(.*?)<\/dd>',
);

# And then %results might read:
%results = (
   name => 'Bob',
   origin => 'German',
   mean => 'Form of Robert. Bright fame.',
   gender => 'male',
);


Anyhow, just an idea: many of our site leechin' commands could just LWP::Simple::get the data from the URL and then hand the module a few regexps, instead of doing a lot of m/// lines or if statements like our commands tend to do.

So for big pages like weather reports it could save many lines of code.
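A minimal sketch of how such a parse routine might work, assuming one regexp per key and keeping only the first capture group. The name parse_fields and the sample markup are made up for illustration; nothing here is an existing module.

```perl
use strict;
use warnings;

# Hypothetical helper: try each key's regexp against the data and
# store the first capture group under that key.
sub parse_fields {
    my ($data, %patterns) = @_;
    my %results;
    for my $key (keys %patterns) {
        # /s lets . match newlines, so a field may span several lines
        if ($data =~ /$patterns{$key}/s) {
            $results{$key} = $1;
        }
    }
    return %results;
}

# Made-up sample markup standing in for a fetched page
my $data = <<'HTML';
<h1>Bob (First Name Origin and Meaning)</h1>
<dl>
<dt>Origin</dt>
<dd>German</dd>
<dt>Gender</dt>
<dd>male</dd>
</dl>
HTML

my %results = parse_fields(
    $data,
    name   => '<h1>(.*?) \(First Name Origin and Meaning\)<\/h1>',
    origin => '<dt>Origin<\/dt>\s*<dd>(.*?)<\/dd>',
    gender => '<dt>Gender<\/dt>\s*<dd>(.*?)<\/dd>',
    mean   => '<dt>Meaning<\/dt>\s*<dd>(.*?)<\/dd>',
);

# Keys whose regexp did not match (here 'mean') are simply absent
print "$_ => $results{$_}\n" for sort keys %results;
```

Leaving unmatched keys out of the hash is one possible design; a command could then check exists $results{mean} before using a field, which covers pages that lack it.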

Just an idea. I might start on it in my free time, or if such a module already exists, go ahead and reply about it. :) Save me a project, I have enough as it is, lol

Mojave

Posted: Wed Aug 31, 2005 4:15 am

I like the idea of a module such as that. I have been working on a group of modules that encapsulate the data within a particular site, called WebData::SiteName, where SiteName, obviously, is a shortened version of the name of the site. The idea is that the gory details of parsing each site are contained in its module, and the WebData modules then expose a very simple API that is common across all sites. I've written about 7 or 8 so far, but the guts are all hand-coded. Using a module as you suggest would definitely speed up the process of writing new modules. Ultimately, if we had many people writing to this system, we could get access to a lot of interesting and useful data.

Here is one of my example test functions for my US President WebData module.

Code:
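use Data::Dumper;
use WebData::Potus;   # my US President module (not shown here)

sub testPotus
{
   # Hash reference whose keys identify the individual presidents
   my $president_keys = WebData::Potus::getPresidentKeys();

   print( Data::Dumper::Dumper( $president_keys ) );
   print( "\n\n" );

   # Look up and dump the details for the first two keys only
   my @keys = keys %$president_keys;
   for( my $i=0; $i<2; $i++ )
   {
      my $president = $keys[$i];
      print( "$president\n" );

      my $info = WebData::Potus::getPresidentInfo( $president );

      print( Data::Dumper::Dumper( $info ) );
      print( "\n" );
   }
}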


That spits out a hash containing field/value pairs for each president: name, place of birth, date of birth, years of presidency, etc. The data from the site is now in a very easy-to-use format that can be imported directly into a database or used in other ways.
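To illustrate the idea of hiding the site-specific parsing behind a common interface, here is a rough sketch. WebData::Example, getInfo, and the markup are all hypothetical names invented for this example; this is not the code of the WebData modules described above, just one way a regexp table could sit inside such a module.

```perl
use strict;
use warnings;

# Hypothetical package, invented for illustration only
package WebData::Example;

# The site-specific "gory details": one regexp per field
my %PATTERNS = (
    name => '<h1>(.*?)<\/h1>',
    born => '<dt>Born<\/dt>\s*<dd>(.*?)<\/dd>',
);

# Common entry point: take the page text, return a hash reference of fields
sub getInfo {
    my ($page) = @_;
    my %info;
    for my $field (keys %PATTERNS) {
        $info{$field} = $1 if $page =~ /$PATTERNS{$field}/s;
    }
    return \%info;
}

package main;

# Made-up markup standing in for a fetched page
my $page = "<h1>Example Person</h1>\n<dl><dt>Born</dt>\n<dd>1 Jan 1900</dd></dl>";

my $info = WebData::Example::getInfo($page);
print "$_: $info->{$_}\n" for sort keys %$info;
```

With this layout, supporting a new site would mostly mean writing a new pattern table, while callers keep using the same function and get the same kind of hash back.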