The regular expressions for tags and URLs are extremely rough.

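#!/usr/bin/python3

import re
import sys
import urllib.request

# Deliberately rough patterns: fine for a quick script, not a real HTML parser.
tag = re.compile(r'<[^>]+>')
url = re.compile(r'^http://.+')
space = re.compile(r'\s+')

# Require exactly one argument that looks like an http:// URL.
if len(sys.argv) != 2 or not url.match(sys.argv[1]):
    sys.exit('usage: %s URL' % sys.argv[0])
target = sys.argv[1]

# Fetch the page. Decoding as UTF-8 is an assumption; undecodable bytes are replaced.
html = urllib.request.urlopen(target).read().decode('utf-8', 'replace')

# Drop the tags, then collapse each run of whitespace into a single space.
text = space.sub(' ', tag.sub('', html))
print(text)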
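
Since the tag and URL patterns in the script are admittedly rough, here is a minimal sketch of a slightly more careful version using only the standard library. It is not what the script does, just one possible alternative. The names TextExtractor, strip_tags and is_http_url are made up for this example, and accepting https as well as http is an assumption that goes beyond the original pattern.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML string with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def is_http_url(candidate):
    """Assumed rule: scheme is http or https and a host part is present."""
    parts = urlsplit(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
    print(is_http_url("http://example.com/"))
```

Unlike the plain regex substitution, this drops the contents of script and style elements and decodes character references, at the cost of being a fair bit longer than the original one-liner.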

chikoの日記 (chiko's diary)

A Python script that fetches a web page and strips the tags