Robin.Liu - CSDN Blog

Reposted: Implementing asynchronous tasks in Flask

from flask import Flask
import time
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(1)
app = Flask(__name__)
@app.route('/synchronize')
def update_redis():
    exe...

2018-10-29 13:55:20 1817
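
The Flask excerpt above is cut off, so the following is only a minimal sketch of the pattern it appears to set up: a handler submits slow work to a single-worker ThreadPoolExecutor and returns immediately. It uses only the standard library so it runs without Flask; `slow_job` and `handle_request` are hypothetical names standing in for the truncated view function and its work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One worker, as in the excerpt: queued jobs run one at a time in the background.
executor = ThreadPoolExecutor(1)


def slow_job(label):
    """Hypothetical stand-in for the slow work (for example, a cache refresh)."""
    time.sleep(0.2)
    return f"{label} done"


def handle_request(label):
    """Hypothetical stand-in for a view function: queue the job, reply at once."""
    future = executor.submit(slow_job, label)  # returns without waiting for the job
    return "accepted", future


status, future = handle_request("update")
print(status)           # the caller gets its reply before the job finishes
print(future.result())  # blocks here only to show the job's result
```

In a real view the future would usually be dropped or stored rather than awaited, since waiting on it in the handler would make the request synchronous again.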

Original: Simple everyday functions (crawler: requests)

Requesting data
# coding=utf-8
import requests
from fake_useragent import UserAgent
# Private request helper
def _request(url, times=0, type='text'):
    Retry_times = 5  # number of retries
    try:
        res = requests.get(url, headers...

2018-06-28 17:27:45 508
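
The request helper above is truncated before its retry logic, so this is a sketch of one way to retry a request up to a fixed limit, not the author's implementation. It uses a loop rather than whatever the original does with its `times` argument, and it takes the fetch function as a parameter so it runs without network access; `request_with_retry`, `fetch`, and `delay` are hypothetical names.

```python
import time


def request_with_retry(fetch, url, max_retries=5, delay=0.0):
    """Call fetch(url); on failure retry up to max_retries more times.

    Re-raises the last error once all attempts have failed.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose for the sketch
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay)  # optional pause between attempts
    raise last_error


# Demonstration with a hypothetical fetch that fails twice, then succeeds.
attempts = []


def flaky_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "page body"


print(request_with_retry(flaky_fetch, "http://example.com/"))
```

With requests installed, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10).text`, where `headers` is whatever header dict the caller builds.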

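Original: An lxml pitfall

Normally, content fetched with requests is converted with html = etree.HTML(response). The habitual assumption is that html is a selector object. But what if response is "empty"?
a = ''
html = etree.HTML(a)
print(html)
What is html then? It is None. And if it is None, it raises an error...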

2018-06-24 14:20:25 926
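
One way to guard against the None result described above is to check both the input text and the parsed tree before querying. The sketch below is an assumption about how such a guard could look, not the author's fix; the parser is passed in as a parameter so the example runs without lxml, and `parse_or_none`, `extract`, and `fake_parse` are hypothetical names.

```python
def parse_or_none(text, parse):
    """Return parse(text), or None when there is nothing to parse."""
    if text is None or not text.strip():
        return None
    return parse(text)


def extract(tree, query):
    """Run query(tree) only when a tree exists; otherwise return an empty list."""
    if tree is None:
        return []
    return query(tree)


# Hypothetical stand-in parser: splits the text into words instead of building a tree.
def fake_parse(text):
    return text.split()


empty_tree = parse_or_none("", fake_parse)
print(extract(empty_tree, lambda tree: tree[:1]))  # empty input yields []

full_tree = parse_or_none("hello world", fake_parse)
print(extract(full_tree, lambda tree: tree[:1]))   # first word as a one-item list
```

With lxml installed, `parse` would be `etree.HTML` and `query` something like `lambda tree: tree.xpath('//a/@href')`.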

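Original: Notes on MongoDB remote export and local import commands

Three commands, run from the command line:
Connect to the remote database: mongo ip:port/<database name>
Export the remote database: mongodump -h ip --port <port> -d <remote database> -o <local output path>
Import into the local database: mongorestore -d <database name> --drop <database backup path>...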

2018-04-22 12:28:35 2889
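
As a rough illustration of the export and import commands listed above, the sketch below assembles the two command lines from their parts and prints them without running anything. Only flags that appear in the notes are used; the host, port, database name, and paths are made-up example values, and `dump_command` and `restore_command` are hypothetical helper names.

```python
import shlex


def dump_command(host, port, database, out_dir):
    """Build the mongodump argument list from the flags in the notes."""
    return ["mongodump", "-h", host, "--port", str(port), "-d", database, "-o", out_dir]


def restore_command(database, backup_dir):
    """Build the mongorestore argument list; --drop replaces existing collections."""
    return ["mongorestore", "-d", database, "--drop", backup_dir]


# Example values only (assumptions, not taken from the post).
dump = dump_command("10.0.0.5", 27017, "jobs", "./backup")
restore = restore_command("jobs", "./backup/jobs")

print(shlex.join(dump))
print(shlex.join(restore))
```

The argument lists can be handed to `subprocess.run` as they are, which avoids quoting problems with paths that contain spaces.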

Original: Simulating a CSDN login in Python (three methods)

Method 1: send the cookies directly with requests. The code:
import requests
import re
class myLogin():
    def __init__(self):
        self.header = {
            'User-Agent': 'iTunes/4.2 (Macintosh; U; PPC Mac OS X 10.2)',

2017-11-25 23:04:33 880

Original: Multithreaded proxy IP validation (requests)

Straight to the code:
import requests
from queue import Queue
import threading
class proxy_ip():
    # Initialize parameters
    def __init__(self):
        self.url = 'http://www.baidu.com/'
        self.ip_list_queue = Queue()

2017-11-24 03:59:04 8475
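
The proxy validation excerpt above stops after the queue is created, so the following is only a sketch of the general queue-plus-threads pattern it points to, not the author's class. The check is passed in as a function so the example runs offline; `validate_proxies`, `check`, and the sample addresses are hypothetical.

```python
import threading
from queue import Queue, Empty


def validate_proxies(proxies, check, workers=4):
    """Return the proxies for which check(proxy) is truthy, using worker threads."""
    todo = Queue()
    for proxy in proxies:
        todo.put(proxy)

    good = []
    lock = threading.Lock()  # protects the shared result list

    def worker():
        while True:
            try:
                proxy = todo.get_nowait()
            except Empty:
                return  # queue drained, this worker is finished
            try:
                ok = check(proxy)
            except Exception:
                ok = False  # any error counts as a dead proxy
            if ok:
                with lock:
                    good.append(proxy)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return good


# Offline demonstration: pretend only proxies on port 8080 respond.
sample = ["1.1.1.1:8080", "2.2.2.2:3128", "3.3.3.3:8080"]
print(sorted(validate_proxies(sample, lambda p: p.endswith(":8080"))))
```

With requests installed, `check` could issue `requests.get(url, proxies={'http': proxy}, timeout=5)` against a test URL and return whether the status code is 200.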

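Original: A brief look at the user agent in Scrapy for Python crawlers (two methods)

About the user agent: the User Agent, UA for short, is a special string header that lets the server identify the client's operating system and version, CPU type, browser and version, browser rendering engine, browser language, browser plugins, and so on. Getting started with the user agent (testing the responses returned for different types of user agent). Mobile user agent test: Mozilla/5.0 (Linux; U; Android 0.5;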

2017-11-24 00:35:59 19904 6

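Original: Scraping Tencent job listings with Scrapy and storing them in MongoDB (basic)

Goal: scrape the job title, number of openings, category, location, and posting date, plus the job responsibilities and requirements from the detail page.
1. Configure items.py
Now that the goal is set, start by defining items.py:
import scrapy
class TtspiderItem(scrapy.Item):
    mc = scrapy.Field()  # name
    lb = scrapy.Field()  # category
    rs =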

2017-11-15 16:53:56 542

Original: Captcha recognition for Python crawlers (basic)

Without further ado, here is the code ლ(′◉❥◉｀ლ)!!!
# coding = utf-8
import requests
import pytesseract
from PIL import Image
class checkcode():
    def __init__(self):
        # Initialize parameters
        self.start_url = 'http://jxjy.dwjtaq.com/

2017-11-13 22:18:08 481

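Original: Logging in with the selenium module in a Python crawler (basic)

selenium is a very good module. To use selenium, first import the module:
from selenium import webdriver
To do anything with webdriver, you need to instantiate a driver:
driver = webdriver.Chrome()
The driver then has the following methods (only the common ones are covered here):
driver.get(url)  # request data
d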

2017-11-13 20:47:55 564

Original: A multithreaded crawler example (basic), part 1

Single-threaded crawler:
# coding = utf-8
import requests
from lxml import etree
import time
class bdjSpider():
    def __init__(self):
        self.start_url = 'http://www.budejie.com/text/'
        self.headers =

2017-11-13 18:23:23 356