Fetching the Bing Daily Image with Python

Here is a small example to get things started.

import json
import urllib.request
import xml.etree.ElementTree as ET


baseSavePath = '/var/www/html/img/bings/'
baseXMLPath = '/var/www/html/lib/xml/'
XMLName = 'bingdailyimg.xml'
# Assumption: URL prefix under which the saved images are served. The original
# script used WebPath without defining it, so adjust this to your own site.
WebPath = '/img/bings/'

# depth sets how many days of images to fetch, counting back from today
def get_bing_photo(depth=1):
    for i in range(depth):
        url = 'http://cn.bing.com/HPImageArchive.aspx?format=js&idx=' + str(i) + '&n=1&nc=15852669114&FORM=HZY'
        html = urllib.request.urlopen(url).read().decode('utf-8')
        if html == 'null':
            print('open & read bing error! at ' + str(i))
            break

        jsondat = json.loads(html)
        imgjson = jsondat['images'][0]

        # Fields used for the index entry
        hashval = imgjson['hsh']
        name = imgjson['url'].split('/')[-1]
        describe = imgjson['copyright']
        date = imgjson['fullstartdate']
        imgurl = 'http://cn.bing.com' + imgjson['url']

        print('Json data loads success')
        savepath = baseSavePath + name
        urllib.request.urlretrieve(imgurl, savepath)

        print(name + ' save success!')

        # Append an <img> record to the XML index (the file must already exist)
        tree = ET.parse(baseXMLPath + XMLName)
        root = tree.getroot()
        imgtag = ET.SubElement(root, "img")
        hashtag = ET.SubElement(imgtag, "hash")
        hashtag.text = hashval
        nametag = ET.SubElement(imgtag, "name")
        nametag.text = name
        pathtag = ET.SubElement(imgtag, "path")
        pathtag.text = WebPath + name
        desctag = ET.SubElement(imgtag, "describe")
        desctag.text = describe
        timetag = ET.SubElement(imgtag, "time")
        timetag.text = date
        tree.write(baseXMLPath + XMLName)

        print('XML tree construct complete')

get_bing_photo(1)
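
The script above sends one request per day. As a rough Python 3 sketch, the query string can also be assembled with urlencode and the response fields pulled out by a small helper. build_archive_url and parse_images are hypothetical names, the sample JSON is made up and mirrors only the fields the script reads, and the nc and FORM parameters are left out on the assumption that they are optional.

```python
import json
from urllib.parse import urlencode

BASE_URL = 'http://cn.bing.com/HPImageArchive.aspx'

def build_archive_url(idx=0, n=1):
    """Hypothetical helper: URL for n images, starting idx days back."""
    query = urlencode({'format': 'js', 'idx': idx, 'n': n})
    return BASE_URL + '?' + query

def parse_images(text):
    """Hypothetical helper: list of (hash, url, copyright, date) tuples."""
    if text == 'null':
        # The script above treats a literal 'null' body as a failed request
        return []
    data = json.loads(text)
    return [(img['hsh'], img['url'], img['copyright'], img['fullstartdate'])
            for img in data.get('images', [])]

# Made-up sample body, for illustration only
sample = ('{"images": [{"hsh": "abc123", '
          '"url": "/az/hprichbg/rb/Example_1920x1080.jpg", '
          '"copyright": "Example caption", '
          '"fullstartdate": "201501010000"}]}')

print(build_archive_url(idx=2, n=3))
for record in parse_images(sample):
    print(record)
```

Asking for several days in one request (n greater than 1) would let the loop iterate over the returned images instead of issuing a request per day.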

A few notes

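  • http://cn.bing.com/HPImageArchive.aspx is the API for fetching Bing's daily image. It returns JSON data, which you can open in a browser and explore yourself.
    idx is the starting day offset (0 is today); n is the total number of days to return.
  • This example builds an XML file as an index of each daily image's details. That is enough for an ordinary web application, but a database is recommended, since it is more reliable than the XML approach.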
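
One way the database suggestion might look, as a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and the helpers open_index and add_image are assumptions for illustration, not part of the original script. Making the hash the primary key means fetching the same image twice does not create a duplicate entry.

```python
import sqlite3

def open_index(db_path):
    """Hypothetical helper: open the database and create the table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS img ('
        'hash TEXT PRIMARY KEY, name TEXT, path TEXT, '
        'description TEXT, start_date TEXT)'
    )
    return conn

def add_image(conn, hashval, name, path, describe, date):
    """Insert one record; return False if this hash is already indexed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            'INSERT OR IGNORE INTO img VALUES (?, ?, ?, ?, ?)',
            (hashval, name, path, describe, date),
        )
    return cur.rowcount == 1

# In-memory database and made-up values, for demonstration only
conn = open_index(':memory:')
print(add_image(conn, 'abc123', 'Example.jpg', '/img/bings/Example.jpg',
                'Example caption', '201501010000'))
print(add_image(conn, 'abc123', 'Example.jpg', '/img/bings/Example.jpg',
                'Example caption', '201501010000'))
print(conn.execute('SELECT COUNT(*) FROM img').fetchone()[0])
```

In the script, a call to add_image would take the place of the ElementTree section, with a file path instead of ':memory:'.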
