My scraper ran into "input type=hidden": is there a way to handle it with lxml alone, without selenium? - SofaSofa

I want to scrape the 同花顺 (10jqka) website to collect executive information for individual stocks. At first everything went fairly smoothly.

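While scraping, however, I found that the tags holding the information I need are visible in the browser's "Inspect" panel but cannot be found in the page source, as if they had been concealed. I noticed that, in "Inspect", the tags for the data I need include an "input type=hidden" field, and I suspect this is what prevents me from capturing the information.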
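One way to test that suspicion: a hidden input is still an ordinary element in the markup, so lxml can read its `value` attribute whenever the element is present in the downloaded HTML. If a tag shows up in "Inspect" but not in the page source, it was most likely added by JavaScript after the page loaded, and that is a separate issue from `type=hidden`. The sketch below runs on made-up sample markup (`SAMPLE_HTML`, the `manager_count` id and both helper functions are hypothetical, not taken from the real page) and only shows how to check both things.

```python
from lxml import etree

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<div id="ml_001">
  <input type="hidden" id="manager_count" value="2">
  <div class="person_table"><h3>Alice</h3></div>
</div>
"""


def hidden_inputs(html_text):
    """Return {id or name: value} for every <input type="hidden"> in the markup."""
    root = etree.HTML(html_text)
    result = {}
    for node in root.xpath('//input[@type="hidden"]'):
        key = node.get("id") or node.get("name") or ""
        result[key] = node.get("value", "")
    return result


def appears_in_source(html_text, xpath_expr):
    """Return True if the XPath matches anything in the raw, unrendered HTML."""
    return bool(etree.HTML(html_text).xpath(xpath_expr))


print(hidden_inputs(SAMPLE_HTML))
print(appears_in_source(SAMPLE_HTML, '//div[contains(@class, "person_table")]'))
```

Running `appears_in_source` on the HTML text that the scraper actually downloaded would show whether the `person_table` blocks are in the response at all. If they are not, no XPath will find them and the data has to come from whatever request the page makes afterwards.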

I am looking for a way around this...

...http://stockpage.10jqka.com.cn/000002/company/#manager...  # target URL

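    from lxml import etree

    # content: the HTML text of the target page, fetched earlier (not shown)
    html = etree.HTML(content)
    # Use XPath to find every executive's block on the page and store them in divs
    # (div[@id="ml_001"] -> all descendants, then "click to get" -> class=person_table)
    divs = html.xpath('//div[@id="ml_001"]//div[contains(@class, "person_table")]')
    print(divs)
    # Each executive's information is stored in its own block
    for div in divs:
        # Build a key-value record so it can be written to the CSV file
        item = {}
        # Extract the fields under this block
        item['name'] = div.xpath('.//thead/tr/td/h3/text()')[0].replace(',', '-')
        item['jobs'] = div.xpath('.//thead/tr/td[2]/text()')[0].replace(',', '/')
        gender_age_education = div.xpath('.//thead/tr[2]/td[1]/text()')[0].split()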
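A side note on the snippet above: every `[0]` raises `IndexError` as soon as an XPath query matches nothing, which is exactly what happens when the blocks are missing from the downloaded source. Below is a minimal sketch of the same extraction with a guard. The sample markup is invented to fit the XPath expressions above rather than copied from the real page, and `first_text` and `parse_managers` are hypothetical helper names.

```python
from lxml import etree

# Hypothetical markup modeled on the XPath expressions in the question.
SAMPLE_HTML = """
<div id="ml_001">
  <div class="person_table">
    <table>
      <thead>
        <tr><td><h3>Zhang,San</h3></td><td>Chairman,Director</td></tr>
        <tr><td>Male 55 Master</td></tr>
      </thead>
    </table>
  </div>
</div>
"""


def first_text(node, expr, default=""):
    """Return the first text result of an XPath query, or a default if nothing matches."""
    found = node.xpath(expr)
    return found[0].strip() if found else default


def parse_managers(html_text):
    """Collect one dict per person_table block; returns [] if no block is present."""
    root = etree.HTML(html_text)
    rows = []
    for div in root.xpath('//div[@id="ml_001"]//div[contains(@class, "person_table")]'):
        rows.append({
            # Commas are replaced so the values do not break a comma-separated file
            "name": first_text(div, ".//thead/tr/td/h3/text()").replace(",", "-"),
            "jobs": first_text(div, ".//thead/tr/td[2]/text()").replace(",", "/"),
            "gender_age_education": first_text(div, ".//thead/tr[2]/td[1]/text()").split(),
        })
    return rows


print(parse_managers(SAMPLE_HTML))
```

With this version an empty result list means the blocks were not in the HTML that was parsed, rather than the script stopping with an exception halfway through.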

Cypher 2020-06-05 19:01

