Posted on August 11, 2006, 7:42 am
Since they have more than 20,000 entries, I have to go to each one, parse it with regular expressions, and extract the data into a database. This data must be updated every two days.
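For the parsing step, something along these lines is what I have in mind; the pattern and field names are only placeholders, since it all depends on the markup of the target pages:

import re

# Placeholder pattern: pull title/price pairs out of a page's HTML.
# The real expression depends entirely on the target sites' markup.
ENTRY_RE = re.compile(
    r'<h2>(?P<title>.*?)</h2>\s*<span class="price">(?P<price>[\d.]+)</span>',
    re.S,
)

def extract_entries(html):
    # Return a list of (title, price) tuples found in one page.
    return [(m.group('title'), m.group('price'))
            for m in ENTRY_RE.finditer(html)]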
The problem I am analyzing now is that I have a number of client sites on the same machine, and if my program takes up too much of the CPU, the web server might hang and won't accept any connections from outside.
I came up with an idea to reduce processing overhead (a rough sketch in Python follows the list):
1. Go to each site and download all pages without parsing.
2. Once all pages have been downloaded locally, start parsing.
3. Save all the extracted data in a database.
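Roughly, it could look like the sketch below; the URL list, cache directory, and sleep interval are all assumptions, and the point is only that fetching and parsing never compete for the CPU at the same time:

import time
import urllib.request
from pathlib import Path

URLS = ["http://example.com/page1", "http://example.com/page2"]  # placeholder list
CACHE = Path("cache")
CACHE.mkdir(exist_ok=True)

# Phase 1: download everything to disk, pausing between requests
# so the fetcher never saturates the machine.
for i, url in enumerate(URLS):
    data = urllib.request.urlopen(url).read()
    (CACHE / ("%d.html" % i)).write_bytes(data)
    time.sleep(1)  # throttle; tune to whatever the servers tolerate

# Phase 2: parse the local copies at leisure; no network involved.
for path in sorted(CACHE.glob("*.html")):
    html = path.read_text(errors="replace")
    # ... run the regular expressions here and queue rows for the database ...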
If anyone has a better idea, let me know. Thank you for sharing your ideas.
I don't know much about RSS feeds, but they have to be set up on the source site, right? Say I want to get some data from mysite.com; the site has to provide the RSS XML file, right?
What I want to do is crawl the external sites and extract the data.
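For what it's worth, when a site does publish a feed, reading it is much cheaper than scraping the HTML. A minimal sketch with the Python standard library (the feed URL here is made up; a real site advertises its own):

import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical feed URL; only works if the source site provides one.
feed = urllib.request.urlopen("http://mysite.com/rss.xml").read()
root = ET.fromstring(feed)

# RSS 2.0 puts entries under channel/item.
for item in root.iterfind("./channel/item"):
    print(item.findtext("title"), item.findtext("link"))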
Posted on August 11, 2006, 1:01 pm
Hard to tell from your post... is this script running through the web server as a page?
If so, you might try converting it to a command-line script and running it via cron. That way it runs as a separate process, and the OS scheduler can keep it from locking up the web server.
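For instance, a crontab entry along these lines would run the crawler at 3 am on every second day, reniced so the web server keeps priority (the script path is an assumption, and */2 on the day-of-month field skips a beat at month boundaries):

# m  h  dom  mon  dow  command
0  3  */2  *  *  nice -n 19 python /home/me/crawler.py >> /home/me/crawler.log 2>&1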