Python scraper: the fetched HTML differs from the page source

Question


asked Jan 27, 2021 in Technique by 深蓝 (71.8m points)

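Please help: I'm scraping the 移民家园 (yiminjiayuan.com) forum, but I can't get any useful content. Why does this happen, and how can I get the actual thread contents? (The attached screenshot shows what the code below fetched.)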

import urllib.request
url = "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"
headers = {
    "User-Agent": "Mozilla/5.0(Windows NT 6.1; Win64; x64) AppleWebKit/537.36(KHTML, like  Gecko) Chrome/75.0.3770.142  Safari/537.36",
    "Referer": "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"
}
req = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(req)
html = response.read().decode("utf-8")
print(html)
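Separately from the cookie issue, the question's code hard-codes `decode("utf-8")`, while the answer below decodes the same page as GBK. Instead of guessing, the charset can be read from the response. A minimal sketch, assuming the site declares its encoding either in the `Content-Type` header or in a `<meta>` tag; `pick_charset` and `decode_page` are hypothetical helper names, and the regexes are deliberately simple rather than a full HTML encoding sniffer.

```python
import re

def pick_charset(content_type, body, default="utf-8"):
    """Guess a page's encoding from the HTTP header, then from a <meta> tag."""
    # 1. A "text/html; charset=gbk" style Content-Type header
    m = re.search(r"charset=([\w-]+)", content_type or "", re.I)
    if m:
        return m.group(1).lower()
    # 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
    #    searched for in the first 4 KB of the raw bytes
    m = re.search(rb"<meta[^>]+charset=[\"']?([\w-]+)", body[:4096], re.I)
    if m:
        return m.group(1).decode("ascii").lower()
    return default

def decode_page(content_type, body):
    """Decode raw response bytes with the detected charset instead of a hard-coded one."""
    charset = pick_charset(content_type, body)
    try:
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8
        return body.decode("utf-8", errors="replace")
```

In the question's code this would replace the last decode step, e.g. `html = decode_page(response.headers.get("Content-Type"), response.read())`.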


304 views

1 Answer

深蓝 · Answer 1 · 2021-01-26T20:53:07+0000

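The site only returns the data when the request carries its cookies. Use one session and request the page twice: the first response sets the cookies, and the session sends them back with the second request.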
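The answer relies on `requests.Session` keeping cookies between calls. The same idea works with only the standard library, using `urllib` plus `http.cookiejar`. To keep the sketch runnable offline, it starts a tiny local server that is a hypothetical stand-in for the forum: it returns a placeholder until the client sends back the cookie it handed out. The cookie name `saltkey` is borrowed loosely from the commented-out `Cookie` line in the answer; the page bodies are made up.

```python
import http.cookiejar
import http.server
import threading
import urllib.request

class CookieGateHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical forum stand-in: real content only when the cookie comes back."""

    def do_GET(self):
        has_cookie = "saltkey=" in (self.headers.get("Cookie") or "")
        body = b"thread list" if has_cookie else b"placeholder"
        self.send_response(200)
        if not has_cookie:
            # First visit: hand out a cookie, the way the real site sets a session key
            self.send_header("Set-Cookie", "saltkey=demo; Path=/")
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def fetch_twice(url):
    """Visit url twice through one cookie-aware opener (urllib's analogue of a Session)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    first = opener.open(url).read()   # response stores Set-Cookie in the jar
    second = opener.open(url).read()  # jar adds the Cookie header automatically
    return first, second

server = http.server.HTTPServer(("127.0.0.1", 0), CookieGateHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]
first, second = fetch_twice(url)
server.shutdown()
server.server_close()
print(first, second)  # b'placeholder' b'thread list'
```

The first response is the "empty" page the question describes; only the second, cookie-bearing request gets the real content.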

import requests

url = "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"
headers = {
    "User-Agent": "Mozilla/5.0(Windows NT 6.1; Win64; x64) AppleWebKit/537.36(KHTML, like  Gecko) Chrome/75.0.3770.142  Safari/537.36",
    "Referer": "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost",
    # "Cookie": "agZD_b1dd_saltkey=s88c1OTO; agZD_b1dd_lastrequest=da9fBUNoIWsWCDoenEkJt1v2UMl1NFvuWruxtrWGzzWv%2FGdOzvGY",
}
# A Session stores the cookies from each response and sends them on later
# requests, so the Cookie header above does not need to be hard-coded.
s = requests.Session()
s.get(url, headers=headers)                     # first visit: the site sets its cookies
content = s.get(url, headers=headers).content   # second visit: cookies are sent back
# Decode as GBK (the encoding this answer assumes); 'ignore' drops undecodable bytes
print(content.decode('gbk', 'ignore'))
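Once the second request returns the real page, the thread list still has to be pulled out of the HTML. A minimal sketch with the standard library's `html.parser`, assuming the forum uses Discuz!-style markup (suggested by the `forum.php?mod=forumdisplay` URL and the `saltkey` cookie) in which each thread-title link carries the CSS class `xst`. That class name is an assumption, not something checked against this site, so inspect the real page source and adjust it; `ThreadTitleParser` and `thread_titles` are hypothetical names.

```python
from html.parser import HTMLParser

class ThreadTitleParser(HTMLParser):
    """Collect (href, title) pairs for <a> tags whose class list contains "xst"."""

    def __init__(self):
        super().__init__()
        self.threads = []
        self._inside = False  # True while inside a matching <a>
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "xst" in classes:  # assumed Discuz!-style title class
            self._inside = True
            self._href = attrs.get("href")  # entities like &amp; are already unescaped
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.threads.append((self._href, "".join(self._text).strip()))
            self._inside = False

def thread_titles(html):
    """Return [(href, title), ...] for every thread-title link in html."""
    parser = ThreadTitleParser()
    parser.feed(html)
    parser.close()
    return parser.threads
```

With the answer's code, this could be used as `for href, title in thread_titles(content.decode('gbk', 'ignore')): print(title, href)`; each `href` can then be fetched with the same session to get the post bodies.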
