Handling it this way is simpler:
    import requests
    import parsel
    from urllib.parse import urljoin

    # Fetch the index page and collect the link to each story
    def get_url():
        res = requests.get('http://book.sbkk8.com/gushihui/taijiaogushi/')
        selector = parsel.Selector(res.text)
        #print(selector)
        url_s = selector.css('.mulu > ul > li > a::attr(href)').getall()
        #print(url_s)
        for url in url_s:
            # Turn the href into an absolute, directly accessible URL
            url = urljoin('http://book.sbkk8.com/', url)
            # Call the download function
            down_story(url)
        print('All stories downloaded!')

    # Download a single story from its URL
    def down_story(url):
        res = requests.get(url)
        # Use the detected encoding so the page text decodes correctly
        res.encoding = res.apparent_encoding
        #print(res.text)
        selector = parsel.Selector(res.text)
        title = selector.css('#maincontent > h1::text').get()
        story_list = selector.css('#content > p::text').getall()
        #print(story_list)
        with open(title + '.txt', 'w', encoding='utf-8') as f:
            for paragraph in story_list:
                f.write(paragraph)
        print('<' + title + '> downloaded successfully!')

    if __name__ == "__main__":
        get_url()
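One thing to watch for: the script uses the page title directly as the file name, so it will fail if the `h1` selector matches nothing (the title is then `None`) or if a title contains characters the file system rejects. Below is a minimal sketch of one way to guard against that. The helper name `safe_filename` and the `fallback` parameter are my own assumptions, not part of the original script.

```python
import re

def safe_filename(title, fallback="untitled"):
    """Hypothetical helper: turn a scraped title into a usable file name."""
    # A selector that matches nothing yields None, so fall back to a default.
    if not title:
        return fallback
    # Replace characters that Windows and most file systems reject in names.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip()
    # A title made up only of whitespace would otherwise become an empty name.
    return cleaned or fallback

print(safe_filename("A/B: story?"))
print(safe_filename(None))
```

If you want to use it, something like `open(safe_filename(title) + '.txt', 'w', encoding='utf-8')` in `down_story` should work in place of the plain `title + '.txt'`.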