Original: Python Web Scraping in Practice: Scraping Proxy IPs
You only feel the joy of programming at the moment your code finally runs successfully QAQ
Target site: https://www.kuaidaili.com/free/inha/ (please contact me if this infringes any rights)
All the proxies on that list are HTTP, so I did not write a check for the protocol type; a rough sketch of what such a check could look like follows right below.
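This is only a sketch under an assumption: if the list ever mixes HTTP and HTTPS rows, the protocol label could be captured the same way the port is captured in the main script, but I am guessing the type cell is marked up like the PORT cell (something like <td data-title="类型">HTTP</td>), so verify it against the real page source before relying on the pattern.

import re

# Assumption: the protocol cell mirrors the "PORT" cell used in the main script,
# e.g. <td data-title="类型">HTTP</td>; check the actual page markup first.
TYPE_PATTERN = r'"类型">(HTTPS?)<'

def row_protocols(html):
    # Protocol label (HTTP or HTTPS) of every row, in page order.
    return re.findall(TYPE_PATTERN, html)

Rows whose label is not plain HTTP could then be skipped when the IPs and ports are paired up.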
The complete script is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib.request
import re
import time

# Pretend to be a normal Chrome browser so the site does not reject the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}

def web(url):
    # Download one listing page and extract every IP and port on it.
    req = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(req)      # open the Request so the User-Agent header is actually sent
    html = response.read().decode('UTF-8', 'ignore')
    ip_pattern = r'[0-9]+(?:\.[0-9]+){3}'       # dotted-quad IPv4 address
    port_pattern = r'"PORT">(\d{1,5})<'         # number inside the "PORT" table cell
    ips = re.findall(ip_pattern, html)
    ports = re.findall(port_pattern, html)
    # Each free-proxy page lists 15 rows; pair IPs and ports by position.
    for i in range(min(len(ips), len(ports), 15)):
        store((ips[i], ports[i]))
    print(ips, '\n', ports)

def store(entry):
    # Append one "ip:x.x.x.x<TAB>port:p" line to ip.txt.
    with open('ip.txt', 'a') as f:
        f.write('ip:' + entry[0] + '\tport:' + entry[1] + '\n')
        print('store successfully')

n = 1
while n <= 3313:                                # the free list had about 3313 pages when this was written
    url = 'https://www.kuaidaili.com/free/inha/' + str(n) + '/'
    web(url)
    time.sleep(5)                               # wait between pages so the site does not block us
    n += 1
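Once ip.txt has a few lines, it is worth confirming that a scraped proxy still works before using it, since free proxies die quickly. The snippet below is a minimal sketch and not part of the original script: check_proxy is a helper name I made up, the test URL http://httpbin.org/ip is an arbitrary choice, and the parsing assumes the "ip:...<TAB>port:..." format written by store() above.

import urllib.request

def check_proxy(ip, port, timeout=10):
    # Route one request through the proxy; any exception means it is unusable.
    handler = urllib.request.ProxyHandler({'http': 'http://' + ip + ':' + port})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open('http://httpbin.org/ip', timeout=timeout)
        return True
    except Exception:
        return False

# Example: test the first proxy saved by store().
with open('ip.txt') as f:
    line = f.readline().strip()                 # e.g. "ip:1.2.3.4<TAB>port:8080"
    ip = line.split('\t')[0].split(':')[1]
    port = line.split('\t')[1].split(':')[1]
    print(ip, port, 'usable' if check_proxy(ip, port) else 'dead')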
Copyright notice: this is an original article by vhhi, released under the CC 4.0 BY-SA license; when reposting, please include a link to the original post and this notice.