Python Crawler

This is, IMO, one of the dirtiest ways I've used so far to accomplish a requirement, but it does the job I want. :D

_DISCLAIMER: I'm a learner. There must be better, smarter, and easier ways to accomplish the same task._

rrs@learner:~/My_Documents/My Books $ cat
/home/rrs/devel/eclipse/PythonFun/web_pattern_fetcher.py  

#!/usr/bin/env python  
  
"""  
This tiny little script does the job of crawling into an Apache-generated directory listing  
and downloading whatever matches a specific pattern.  
I'm using it to download anything that Apache shows as [TXT] or [IMG].  
I'm sure others will be able to extend it more.  
"""  
  
import urllib, urllib2, string  
  
url = "http://www.wuppy.net.ru/Fun/"  
req = urllib2.Request(url)  
handle = urllib2.urlopen(req)  
  
# crude loop flag; data will accumulate the extracted file name  
x = 1  
data = ''  
  
while x:  
    data = ''  
    line = handle.readline()  
    if "[TXT]" in line or "[IMG]" in line:  
        # On this listing's layout, the 5th space-separated chunk holds the <A HREF> anchor  
        word_list = line.split(' ')  
        word = word_list[4:5]  
        req_word = str(word)  
        # Break and take out the relevant data uri  
        begin_num = req_word.find(">")  
        end_num = req_word.find("</A")  
        req_word = list(req_word)  
        # Copy the file name out one character at a time, between ">" and "</A"  
        while begin_num < end_num - 1:  
            final_word = string.lstrip( string.rstrip(str(req_word[begin_num+1:begin_num+2]), "']"), "['")  
            data += final_word  
            begin_num += 1  
            #data.append(req_word[begin_num+1:begin_num+2])  
        # Reassemble the full URL and fetch the file, saving it under the same name  
        real_url = url + data  
        urllib.urlretrieve(real_url, data)  
    # readline() returns an empty string once the listing is exhausted  
    if line == '':  
        x = 0
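
Since I already admitted this is dirty: a slightly cleaner route would probably be to read the whole listing and let a regular expression pull out the href values instead of copying characters one by one. Here is a rough, untested sketch of that idea against the same URL and the same Apache fancy-index markup as above; the regex and the variable names are just illustrative, not part of the original script.

# Rough alternative sketch: fetch the whole listing, regex out the href  
# of every [TXT]/[IMG] row, and download each file. Assumes the same  
# Apache "fancy index" markup and the same URL as the script above.  
  
import re  
import urllib  
import urllib2  
  
url = "http://www.wuppy.net.ru/Fun/"  
listing = urllib2.urlopen(url).read()  
  
for line in listing.splitlines():  
    if "[TXT]" not in line and "[IMG]" not in line:  
        continue  
    # Grab whatever sits inside href="..." on this row (case-insensitive)  
    match = re.search(r'href="([^"]+)"', line, re.IGNORECASE)  
    if match:  
        name = match.group(1)  
        urllib.urlretrieve(url + name, name)  

Same idea, just letting re do the slicing; there is still no error handling, so treat it as a sketch rather than a drop-in replacement.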