Python Crawler

This is, IMO, one of the dirtiest ways I've used so far to accomplish a requirement. But it does the job I want. :D

_DISCLAIMER: I'm a learner. There must be a better, smarter and easier way to accomplish the same task._

```
rrs@learner:~/My_Documents/My Books $ cat /home/rrs/devel/eclipse/PythonFun/web_pattern_fetcher.py
```

```python
#!/usr/bin/env python

"""
This tiny little script does the job of crawling into Apache-generated
directory listings and downloading anything matching a specific pattern.
I'm using it to download anything that Apache shows as TXT or IMG.
I'm sure others will be able to extend it more.
"""

import urllib
import urllib2

url = "http://www.wuppy.net.ru/Fun/"
req = urllib2.Request(url)
handle = urllib2.urlopen(req)

x = 1
while x:
    data = ""
    line = handle.readline()
    if "[TXT]" in line or "[IMG]" in line:
        # The fifth whitespace-separated field holds the anchor tag,
        # e.g. <A HREF="file.txt">file.txt</A>
        word_list = line.split(' ')
        word = word_list[4:5]
        req_word = str(word)
        # Break out the relevant data URI: the file name sits
        # between ">" and "</A" inside the anchor
        begin_num = req_word.find(">")
        end_num = req_word.find("</A")
        req_word = list(req_word)
        while begin_num < end_num - 1:
            # str() of a one-character slice looks like "['x']";
            # strip the list decoration to keep just the character
            final_word = str(req_word[begin_num + 1:begin_num + 2]).rstrip("']").lstrip("['")
            data += final_word
            begin_num += 1
        real_url = url + data
        urllib.urlretrieve(real_url, data)
    if line == "":
        x = 0
```
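Since I already admitted there must be a smarter way: the fiddly character-by-character anchor parsing could be replaced with a regular expression. Here's a rough sketch of the same idea on Python 3 (where `urllib2` became `urllib.request`). The function names, the regex, and the sample line are my own assumptions about what a default Apache listing emits, not tested against every Apache version.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

# Matches the HREF target of an Apache listing anchor,
# e.g. <A HREF="notes.txt">notes.txt</A> (old Apache emits uppercase tags)
HREF_RE = re.compile(r'<a\s+href="([^"]+)"', re.IGNORECASE)

def linked_name(line):
    """Return the linked file name from one listing line, or None."""
    match = HREF_RE.search(line)
    return match.group(1) if match else None

def fetch_tagged(url, tags=("[TXT]", "[IMG]")):
    """Download every file the listing marks with one of `tags`."""
    html = urlopen(url).read().decode("utf-8", "replace")
    for line in html.splitlines():
        if any(tag in line for tag in tags):
            name = linked_name(line)
            if name:
                urlretrieve(urljoin(url, name), name)

# A listing line roughly as Apache renders it (my assumption):
sample = '<img src="/icons/text.gif" alt="[TXT]"> <A HREF="notes.txt">notes.txt</A>'
print(linked_name(sample))  # notes.txt
```

The nice part is that `urljoin` also copes with absolute HREFs, which the plain `url + data` concatenation above would mangle.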