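I came across a lot of excellent articles today. If you also want to learn asynchronous programming in Python 3.4 and later, have a look at the articles I recommend at the end of this post. They helped me a great deal in understanding async, and I would call them top-notch.
First, here are the synchronous and asynchronous versions of the code, with screenshots of their run times:
Synchronous: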
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import time

import requests
from lxml import html
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


def sync_test(url):
    response = requests.get(url, verify=False)
    print('Hello World:%s' % time.time())
    return response.text


def run():
    url = "https://51zuoyejun.com/programCase.html"
    response = requests.get(url, verify=False)
    etree = html.etree
    html_ = etree.HTML(response.text)
    detail_list = html_.xpath('//div[@class="content"]/a/@href')
    for href in detail_list:
        tasks.append(sync_test("https://51zuoyejun.com{}".format(href)))


if __name__ == '__main__':
    tasks = list()
    s = time.time()
    print('start time:{}'.format(s))
    run()
    for t in tasks:
        print(t[:15])
    e = time.time()
    print('end time:{}'.format(e))
    print('spend time:{}s'.format(e - s))
Screenshot of the timings from two runs:
Asynchronous:
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import asyncio
import time

import requests
from aiohttp import ClientSession
from lxml import html
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


async def async_test(url):
    # Create a session object
    async with ClientSession() as session:
        # Use the session to request the URL
        async with session.get(url) as response:
            # Put await in front of any operation that has to wait
            response = await response.read()
            # print('response', response)
            print('Hello World:%s' % time.time())
            return response


def run():
    # The listing page is still fetched synchronously with requests
    url = "https://51zuoyejun.com/programCase.html"
    response = requests.get(url, verify=False)
    etree = html.etree
    html_ = etree.HTML(response.text)
    detail_list = html_.xpath('//div[@class="content"]/a/@href')
    for href in detail_list:
        # Wrap each coroutine in a task and collect it
        task = asyncio.ensure_future(async_test("https://51zuoyejun.com{}".format(href)))
        tasks.append(task)


if __name__ == '__main__':
    tasks = list()
    # Create the event loop
    loop = asyncio.get_event_loop()
    run()
    s = time.time()
    print('start time:{}'.format(s))
    # Have the event loop run the tasks until they all finish
    loop.run_until_complete(asyncio.wait(tasks))
    # Alternatively, read the result off each task yourself
    # for t in tasks:
    #     print(t._result.decode()[:15])
    # Collect the HTTP responses
    result = loop.run_until_complete(asyncio.gather(*tasks))
    for r in result:
        print(r.decode()[:15])
    e = time.time()
    print('end time:{}'.format(e))
    print('spend time:{}s'.format(e - s))
Screenshot of two runs:
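A side note on the asynchronous script above: it calls both asyncio.wait and asyncio.gather on the same tasks. The following is a minimal sketch of how the two differ, not part of the original crawler. The `fetch` coroutine and its delays are hypothetical stand-ins for real HTTP requests, and `asyncio.run` assumes Python 3.7 or later.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical stand-in for an HTTP request that takes `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def main():
    # The slower task is deliberately submitted first.
    tasks = [
        asyncio.ensure_future(fetch("slow", 0.2)),
        asyncio.ensure_future(fetch("fast", 0.05)),
    ]

    # asyncio.wait() returns two sets of Task objects (done, pending),
    # in no particular order; results must be read off each task.
    done, pending = await asyncio.wait(tasks)
    wait_results = sorted(t.result() for t in done)

    # asyncio.gather() returns the results themselves, in submission order.
    # The tasks have already finished here, so this returns immediately.
    gather_results = await asyncio.gather(*tasks)
    return wait_results, len(pending), gather_results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Since gather already waits for every task and hands back the results in order, the earlier wait call in a script like the one above is probably not needed; it does no harm, because the second call finds the tasks already complete.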
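Because of network jitter, and because the target site is somewhat slow to begin with, the synchronous version took as long as 43 seconds at its slowest, and the asynchronous version as long as 19 seconds. On the whole, though, the asynchronous version is much faster than the synchronous one, since its requests wait on the network concurrently instead of one after another. This snippet is adapted from crawler logic I wrote earlier, and in production a speed-up like this really does help.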
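To see where the speed-up comes from without depending on a slow website, here is a small sketch that replaces the network calls with sleeps. Everything in it is an assumption made for illustration: `DELAY`, `N_REQUESTS`, and the `fake_fetch_*` helpers are hypothetical, and `asyncio.run` requires Python 3.7 or later.

```python
import asyncio
import time

DELAY = 0.1      # hypothetical latency of one request, in seconds
N_REQUESTS = 5   # hypothetical number of detail pages


def fake_fetch_sync(i):
    # Stand-in for requests.get(): blocks the whole thread while waiting.
    time.sleep(DELAY)
    return "page-%d" % i


async def fake_fetch_async(i):
    # Stand-in for session.get(): hands control back to the event loop while waiting.
    await asyncio.sleep(DELAY)
    return "page-%d" % i


def run_sync():
    # Requests run one after another, so the waits add up.
    start = time.perf_counter()
    pages = [fake_fetch_sync(i) for i in range(N_REQUESTS)]
    return pages, time.perf_counter() - start


async def _gather_all():
    return await asyncio.gather(*(fake_fetch_async(i) for i in range(N_REQUESTS)))


def run_async():
    # All requests wait at the same time, so the total is roughly one DELAY.
    start = time.perf_counter()
    pages = asyncio.run(_gather_all())
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    sync_pages, sync_time = run_sync()
    async_pages, async_time = run_async()
    print("sync : %.2fs" % sync_time)
    print("async: %.2fs" % async_time)
```

The synchronous total grows with the number of requests, while the asynchronous total stays close to the latency of the slowest single request. Real crawls will not be this clean, because servers, bandwidth, and rate limits all get in the way.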
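I won't go into the usage and internals of the asyncio module here. Instead, here are the articles I read along the way, all of which are very well written.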
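For how to use the asyncio module, I recommend the cnblogs (博客园) article "python异步编程之asyncio(百万并发)". For how asyncio is implemented, I recommend the Jianshu (简书) article "[asyncio随记一]asyncio的实现原理和关键源码分析", which I would call top-notch. Its explanation of the internals brings up the Event object from the threading module, and for background on that I recommend the cnblogs article "python之event事件".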
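As a quick taste of the Event object mentioned in the last recommendation, here is a minimal sketch using threading.Event from the standard library. It is not taken from any of the recommended articles; the `demo` function and the log messages are made up for illustration.

```python
import threading


def demo():
    ready = threading.Event()   # an Event starts out in the "not set" state
    log = []

    def worker():
        ready.wait()            # block this thread until ready.set() is called
        log.append("worker: resumed")

    t = threading.Thread(target=worker)
    t.start()
    log.append("main: setting the event")
    ready.set()                 # wake up every thread waiting on the event
    t.join()
    return log


if __name__ == "__main__":
    for line in demo():
        print(line)
```

asyncio ships its own asyncio.Event with a similar set/wait interface, except that waiting on it suspends a coroutine rather than blocking a thread.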
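The authors of these excellent articles all clearly understand the source code in depth, which shows how much learning to read source code matters.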