Scraping Images with a Crawler

Code

from bs4 import BeautifulSoup
import requests
import os
import shutil

headers = {
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
}

def download_jpg(image_url, image_localpath):
    # Stream the image and copy the raw bytes straight to disk.
    response = requests.get(image_url, stream=True)
    if response.status_code == 200:
        # decode_content lives on response.raw; it tells urllib3 to
        # transparently decompress gzip/deflate-encoded bodies.
        response.raw.decode_content = True
        with open(image_localpath, 'wb') as f:
            shutil.copyfileobj(response.raw, f)

def craw(url):
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    save_dir = os.path.abspath('./download')
    os.makedirs(save_dir, exist_ok=True)  # make sure the target folder exists
    for div in soup.find_all('div', class_='group'):
        for img in div.find_all('img'):
            imgurl = img['src']
            filename = os.path.basename(imgurl)
            imgpath = os.path.join(save_dir, filename)
            print('Downloading %s' % imgurl)
            download_jpg(imgurl, imgpath)


for i in range(1, 10):
    url = 'http://xxxxxx.com/plugin.php?id=group&page=' + str(i)
    print(url)
    print('Page %s' % i)
    craw(url)
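One thing the script above assumes is that every `img` `src` is an absolute URL. On many forum pages the `src` is relative (e.g. `/images/a.jpg`), and `os.path.basename` plus `requests.get` would then fail. A minimal sketch of normalizing the URL with the standard library's `urljoin` before downloading (the helper name and example URLs here are my own illustration, not part of the original post):

```python
from urllib.parse import urljoin

def normalize_img_url(page_url, src):
    # Resolve a possibly-relative img src against the page it came from;
    # absolute URLs pass through unchanged.
    return urljoin(page_url, src)

print(normalize_img_url('http://xxxxxx.com/plugin.php?id=group&page=1',
                        '/images/a.jpg'))
# http://xxxxxx.com/images/a.jpg
print(normalize_img_url('http://xxxxxx.com/plugin.php?id=group&page=1',
                        'http://cdn.example.com/b.jpg'))
# http://cdn.example.com/b.jpg
```

Calling this as `imgurl = normalize_img_url(url, img['src'])` inside the crawl loop would make the downloader tolerant of both relative and absolute paths.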

Run

(screenshot of the run output: 7PiZ22V1Gu.jpg)

A bit thin on substance; that's it for now.

Article: "Scraping Images with a Crawler"

Article link: https://lula.fun/1040.html

Unless otherwise noted, all articles are original work by Lula(噜啦)

Original article; when reposting, please credit the source and include a link to the article
Last modified: October 10, 2019, 04:52 PM
