Posted: Wed Oct 05, 2005 9:08 pm Post subject: searchbot
Hi Faiz,
My name's Mike and I was hoping one of you guys might be able to offer advice or indeed help.
I have a search engine called Searchwire. It's written in PHP, and basically what I am trying to do is find a developer who can create a bot for me which will act in a similar way to Googlebot.
Trawl the net, add links to my site automatically etc. I have searched high and low on the net with no joy. Could you offer advice, or do you know who I should contact about this?
I have made one, in Perl, which trawls the web for links, queues them, and then searches those pages.
It is, however, very inefficient (it takes loads of memory) and won't find non-XHTML links as well (i.e., the regex is such that it only gets href="*?", rather than all the variations).
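To show what "all the variations" means in practice, here is a minimal sketch in Python (not the Perl bot discussed here) that uses the standard library's HTML parser instead of a regex. The names `LinkExtractor` and `extract_links` and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, whatever quoting the page uses."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value.strip()))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


sample = """
<a href="http://example.com/a">double quotes</a>
<A HREF='/b'>single quotes, upper case</A>
<a href=c.html>no quotes at all</a>
"""
print(extract_links(sample, "http://example.com/dir/page.html"))
```

A regex that only matches href="..." would pick up the first link and miss the other two; the parser handles the quoting and the letter case itself.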
I would be happy to give it to you, and to forward your enquiry to someone I know who has done the regexes, etc. Both versions are in Perl, but can be easily converted =].
Quote:
add links to my site automatically etc
It'll be easy(ish) to add the links to a database, but the problem is that the spider can't distinguish "good" links from "bad" links, so you will end up with thousands of results that are useless (my last 20-minute trawl indexed around 20,000 pages, most of which have 20+ links). For instance, imagine a spider getting caught on this board: it'd have thousands of links, mostly the same, to go through.
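One common way to stop a spider getting stuck on a single site is to remember which URLs it has already queued and to cap how many pages it takes from any one host. The sketch below is in Python and is not the Perl bot from this thread; the `Frontier` class, the `max_per_host` limit and the board.example URLs are all assumptions for illustration. It does not judge whether a link is "good", it only limits the repetition.

```python
from collections import deque
from urllib.parse import urldefrag, urlsplit


class Frontier:
    """Queue of URLs to visit that skips repeats and caps pages per host."""

    def __init__(self, max_per_host=50):
        self.queue = deque()
        self.seen = set()
        self.per_host = {}
        self.max_per_host = max_per_host

    def add(self, url):
        # page.php?t=1#post5 is the same page as page.php?t=1
        url, _fragment = urldefrag(url)
        host = urlsplit(url).netloc.lower()
        if not host or url in self.seen:
            return False
        if self.per_host.get(host, 0) >= self.max_per_host:
            return False
        self.seen.add(url)
        self.per_host[host] = self.per_host.get(host, 0) + 1
        self.queue.append(url)
        return True

    def next(self):
        # Returns None once there is nothing left to crawl.
        return self.queue.popleft() if self.queue else None


frontier = Frontier(max_per_host=2)
for link in [
    "http://board.example/viewtopic.php?t=1",
    "http://board.example/viewtopic.php?t=1#post5",  # repeat of the first
    "http://board.example/viewtopic.php?t=2",
    "http://board.example/viewtopic.php?t=3",        # over the host cap
    "http://other.example/",
]:
    print(link, "queued" if frontier.add(link) else "skipped")
```

The `seen` set still grows with every URL, so memory use is not solved here; a long trawl would need to keep that set on disk or in a database.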
Hope this is of some help =]
~ Josh
[ Need bot hosting on a dedicated server? PM me. ]
Hi
That's excellent :) Could I download this, please, to take a look at it?
Posted: Thu Oct 06, 2005 8:20 am Post subject: Bot
Hi Josh :)
Downloaded the proggie, that's cool. So, if I may, I'll give you a little more info. My site is in PHP, coded for me by someone else, and basically my understanding of HTML is 100%, but for Perl and PHP it's 0%.
Is it possible you could contact this dude who can make the bot more intelligent? Here is the idea I have:
Category: Business
There are lots of sub-categories, though; however, the bot could perhaps have trawl instructions to trawl for specifics, e.g. Business > Accountants, and return a search based on this.
OK, so ideally:
The bot trawls the net under the trawl term of Business > Sub Cat and perhaps reads the meta tags of sites, title etc. (robots.txt).
Then the bot reports home with the information and dumps it into a database or whatever, which is then uploaded to the site.
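The steps above (check robots.txt, read the title and meta tags, dump the result into a database) can be sketched roughly as follows. This is Python with only the standard library, not the bot discussed in the thread; the agent name `SearchwireBot`, the `pages` table layout and the helper names are assumptions made up for the example.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="SearchwireBot"):
    """True if the site's robots.txt text lets this agent fetch the URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)


class PageInfo(HTMLParser):
    """Pulls out the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"].strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def store(conn, url, category, html):
    """Parse one page and save what the bot found under the given category."""
    info = PageInfo()
    info.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, category TEXT, title TEXT, "
        "description TEXT, keywords TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?, ?)",
        (url, category, info.title.strip(),
         info.meta.get("description", ""), info.meta.get("keywords", "")),
    )
    conn.commit()


page = """<html><head><title>Smith & Co Accountants</title>
<meta name="description" content="Chartered accountants in Leeds">
<meta name="keywords" content="accountants, tax, payroll"></head></html>"""
robots = "User-agent: *\nDisallow: /private/\n"

conn = sqlite3.connect(":memory:")  # a real bot would use a file or server DB
url = "http://example.com/index.html"
if allowed(robots, url):
    store(conn, url, "Business > Accountants", page)
print(conn.execute("SELECT * FROM pages").fetchall())
```

Note that robots.txt is not where the title or meta tags live: it only says which paths a bot may fetch, so it is checked before the page is read.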
My site has Perl, PHP etc on the hosting.
Hope you understand all this mate,
Finally, for your help to make this bot more intelligent, I would pay, say, £100, if that's OK :)
Please add me on MSN to discuss further. I have PM'd you my MSN email.
Using meta tags will be difficult, as it will miss certain things, because how do you define "business"? You would have to list business types and professions, and then you would still miss some. It's complicated. MSN me to talk about it =].
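To make the problem concrete, here is a minimal Python sketch of the "list the business types" approach. The keyword lists, the "Business > Solicitors" category and the `categorise` function are invented for illustration; they are not from any real bot.

```python
import re

# Hypothetical keyword lists; a real directory would need far more of these.
CATEGORY_KEYWORDS = {
    "Business > Accountants": {"accountant", "accountants", "accountancy",
                               "bookkeeping"},
    "Business > Solicitors": {"solicitor", "solicitors", "conveyancing"},
}


def categorise(text):
    """Return the category whose keywords match the most words, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(categorise("Smith & Co chartered accountants, bookkeeping and payroll"))
print(categorise("We handle your tax returns and year-end books"))
```

The first description is matched because it happens to use the listed words. The second is just as clearly an accountancy firm to a human reader, but it comes back as None, which is exactly the kind of miss described above.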
Quote:
Downloaded the proggie, that's cool
If you mean the one in my signature, that's not it. That's a bot for MSN, not what you want. This is what you want.
~ Josh