RESEARCHUT -- Minds With Innovations

iprint in mod_python

Monday 17 April 2006 at 10:09 pm

This small script demonstrates mod_python.

Python Crawler

Sunday 16 April 2006 at 4:43 pm

This is, IMO, one of the dirtiest ways I have found so far to accomplish a requirement, but it does the job I want. :D

DISCLAIMER: I'm a learner. There must be a better, smarter and easier way to accomplish the same task.

rrs@learner:~/My_Documents/My Books $ cat /home/rrs/devel/eclipse/PythonFun/web_pattern_fetcher.py

#!/usr/bin/env python3

"""
This tiny little script crawls an Apache-generated directory listing
and downloads every entry that matches a specific pattern.
I'm using it to download anything that Apache shows as [TXT] or [IMG].
I'm sure others will be able to extend it further.
"""

import urllib.request

url = "http://www.wuppy.net.ru/Fun/"

with urllib.request.urlopen(url) as handle:
    # Read the listing line by line until the server closes the connection.
    for raw_line in handle:
        line = raw_line.decode("latin-1")
        if "[TXT]" in line or "[IMG]" in line:
            # The 5th space-separated field holds the link, e.g.
            #   HREF="name">name</A>
            word_list = line.split(' ')
            if len(word_list) < 5:
                continue
            req_word = word_list[4]
            # Break and take out the relevant data URI: the link text
            # between the first ">" and the closing "</A".
            begin_num = req_word.find(">")
            end_num = req_word.find("</A")
            if begin_num == -1 or end_num == -1:
                continue
            data = req_word[begin_num + 1:end_num]
            real_url = url + data
            urllib.request.urlretrieve(real_url, data)
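A possibly less dirty variant is sketched below. It is not the script above, just one alternative under stated assumptions: instead of splitting lines on spaces, it feeds the whole listing to Python's standard html.parser, remembers the ALT text of each icon (such as "[TXT]" or "[IMG]"), and collects the HREF of the link that follows it. Reading the HREF rather than the visible link text is deliberate, because Apache's fancy indexing can shorten long names in the displayed text. The names ListingParser, matching_links and download_all are hypothetical helpers introduced only for this example.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin


class ListingParser(HTMLParser):
    """Collect link targets whose preceding icon ALT text is wanted."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.last_alt = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <IMG ALT=...>
        # and <img alt=...> are handled the same way.
        attrs = dict(attrs)
        if tag == "img":
            # Remember the icon's ALT text, e.g. "[TXT]" or "[IMG]".
            self.last_alt = attrs.get("alt")
        elif tag == "a" and self.last_alt in self.wanted:
            href = attrs.get("href")
            if href:
                self.links.append(href)
            # Only the first link after an icon belongs to that icon.
            self.last_alt = None


def matching_links(html, wanted=("[TXT]", "[IMG]")):
    """Return the HREFs of listing entries whose icon ALT text is in wanted."""
    parser = ListingParser(wanted)
    parser.feed(html)
    parser.close()
    return parser.links


def download_all(url, wanted=("[TXT]", "[IMG]")):
    """Fetch the listing at url and save each matching entry locally."""
    with urllib.request.urlopen(url) as handle:
        html = handle.read().decode("latin-1")
    for href in matching_links(html, wanted):
        # urljoin resolves relative HREFs against the listing URL;
        # unquote turns e.g. "a%20pic.png" into a normal local file name.
        local_name = os.path.basename(unquote(href))
        urllib.request.urlretrieve(urljoin(url, href), local_name)
```

Calling download_all("http://www.wuppy.net.ru/Fun/") would then play the role of the original script, assuming that server still returns a standard Apache listing.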

Pythonic Addiction

Saturday 15 April 2006 at 02:47 am

#!/usr/bin/env python3

import os
import subprocess
import sys


def files(root):
    # Yield (directory, filename) for every file below root.
    for path, folders, names in os.walk(root):
        for name in names:
            yield path, name


def find_match(repository):  # aka walk_tree_copy()
    # gzip every .html/.htm file (any letter case) below repository.
    for path, file in files(repository):
        # Alternative match: file.lower().endswith(('html.gz', 'htm.gz'))
        if file.lower().endswith(('html', 'htm')):
            try:
                # Passing the full path as an argument list avoids both
                # the chdir and the shell, so names with spaces are safe.
                subprocess.run(['gzip', os.path.join(path, file)], check=True)
                sys.stdout.write("%s/%s has been gzipped\n" % (path, file))
            except (OSError, subprocess.CalledProcessError):
                sys.stdout.write("Aieeeee.... I got some error with %s!\n\n" % file)


def main():
    repository = input("Please enter a path to look for the files to zip.\n"
                       "Hit the Return key to use the default path, i.e. "
                       "\"/home/rrs/My_Documents/My Books\": ")

    if repository == '':
        repository = "/home/rrs/My_Documents/My Books/"

    find_match(repository)


if __name__ == '__main__':
    main()
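If the external gzip binary is not available, the same job can probably be done with Python's own gzip and shutil modules. The sketch below is an assumption-based alternative, not the script above: gzip_in_place and gzip_html_tree are hypothetical helper names, and they mimic gzip's default behaviour of writing name.gz, copying the original's timestamps and permissions, and then deleting the original.

```python
import gzip
import os
import shutil


def gzip_in_place(filename):
    """Compress filename to filename.gz and remove the original, like `gzip`."""
    gz_name = filename + ".gz"
    with open(filename, "rb") as src, gzip.open(gz_name, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Keep mtime and permission bits, as the gzip command does by default.
    shutil.copystat(filename, gz_name)
    os.remove(filename)
    return gz_name


def gzip_html_tree(root, suffixes=("html", "htm")):
    """Gzip every file below root whose name ends with one of suffixes.

    The comparison is case-insensitive, so .HTML and .Htm also match.
    Returns the list of .gz files that were created.
    """
    done = []
    for path, _dirs, names in os.walk(root):
        for name in names:
            if name.lower().endswith(suffixes):
                done.append(gzip_in_place(os.path.join(path, name)))
    return done
```

Because each file is opened directly by its full path, there is no need for os.chdir or an environment-variable trick to get names with spaces past the shell.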

Ritesh Sarraf Ticketed Again

Wednesday 12 April 2006 at 6:13 pm

Ritesh Sarraf Ticketed Again