
Python for Beginners: Scraping Images from Baidu Image Search


2024-8-16 21:39

Original author: 全都有综合资源网. Source: 全都有综合资源网


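The script takes a search string to crawl, for example “超级玛丽” (Super Mario), then loads the result pages one after another and keeps downloading until no images are left.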
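The paging idea described above can be sketched separately from the network code. This is a minimal sketch, not the author's implementation: `build_page_url`, `crawl_pages`, the `fetch_links` callback, and the `max_pages` safety cap are hypothetical names introduced for illustration, and only a subset of the query parameters from the article's URL is kept.

```python
from urllib.parse import urlencode

# Endpoint taken from the article's code
BASE_URL = "http://image.baidu.com/search/index"


def build_page_url(word, pn, rn=30):
    """Return the search URL for one result page (hypothetical helper)."""
    params = {
        "tn": "baiduimage",
        "ie": "utf-8",
        "word": word,  # urlencode percent-encodes non-ASCII keywords
        "pn": pn,      # index of the first image on this page
        "rn": rn,      # number of images per page
    }
    return BASE_URL + "?" + urlencode(params)


def crawl_pages(word, fetch_links, rn=30, max_pages=100):
    """Request page after page until one page yields no links.

    fetch_links is an assumed callback that takes a URL and returns a
    list of image URLs; max_pages is an assumed cap so the loop always ends.
    """
    collected = []
    for page in range(max_pages):
        links = fetch_links(build_page_url(word, page * rn, rn))
        if not links:
            break  # an empty page is treated as "no more images"
        collected.extend(links)
    return collected
```

Passing the fetch step in as a callback keeps the stop condition easy to check without touching the network, and a loop avoids the recursion depth limit that a long crawl could hit.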

Code:
import os
import re

import requests

pn = 0   # index of the first image to fetch
rn = 30  # images per page; pn and rn were found in the Google developer tools, and they are hard to spot
# A Chinese keyword produced a garbled folder name, so pinyin is used here
name = "chaojimali"


def getImagePath(pn=0):
    # Loop page by page (instead of recursing) until a page has no image links
    while True:
        url = '''http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=%s&pn=%d&rn=%d''' % (name, pn, rn)
        headers = {"User-Agent": "Mozilla/5.0"}
        try:
            response = requests.get(url, headers=headers, timeout=10)
        except requests.RequestException:
            break  # the result page could not be loaded, so stop
        # Regex match for quoted image URLs; the dot before the extension is escaped
        links = re.findall(r'"((?:http|ftp)s?://[^"]*?\.(?:png|jpg|jpeg|gif))"', response.text)
        if not links:
            break  # no images left
        os.makedirs(name, exist_ok=True)
        for imgPath in links:
            try:
                image = requests.get(imgPath, timeout=10)
            except requests.RequestException:
                continue
            # Only save the image when the status code is 200
            if image.status_code != 200:
                continue
            print(imgPath)
            try:
                # Try to save the image; skip it on failure
                with open(os.path.join(name, imgPath[imgPath.rfind("/") + 1:]), "wb") as f:
                    f.write(image.content)
            except OSError:
                continue
        pn += rn


# Start
getImagePath(pn)
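
The link-extraction step can be tried on a fixed string without any network access. The following is a small sketch under stated assumptions: `extract_image_urls` and `local_filename` are hypothetical helper names, and the sample HTML in the usage note is made up. Because `re.findall` returns tuples when a pattern contains several capturing groups, the sketch keeps one capturing group and makes the others non-capturing, so each result is a plain string.

```python
import os
import re
from urllib.parse import urlparse

# One capturing group (the whole URL); scheme and extension groups are non-capturing.
# The dot before the extension is escaped so it only matches a literal ".".
IMAGE_URL_RE = re.compile(r'"((?:http|ftp)s?://[^"]+?\.(?:png|jpg|jpeg|gif))"')


def extract_image_urls(html):
    """Return every double-quoted image URL found in the text."""
    return IMAGE_URL_RE.findall(html)


def local_filename(image_url):
    """Return the last path segment of the URL, without any query string."""
    return os.path.basename(urlparse(image_url).path)
```

For example, with the made-up input `'<a x="http://example.com/a/mario.jpg"></a>'`, `extract_image_urls` returns `['http://example.com/a/mario.jpg']` and `local_filename` gives `'mario.jpg'` for that URL.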
