Beautiful Soup and Crawlers [2]

Finding information

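<>.find_all(name, attrs, recursive, string, **kwargs) returns a list-like object (a ResultSet) that holds the search results.

name: the tag name to search for

attrs: a string to match against tag attribute values; a specific attribute can also be named

recursive: whether to search all descendants; defaults to True

string: a string to match against the text between a tag's opening and closing parts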
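Here is a minimal sketch of how these parameters can be used, one at a time. It assumes a small inline HTML string (a trimmed, hypothetical stand-in for the demo page) so that it runs without network access; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page, trimmed from the demo page used in this post
html = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Courses:
<a class="py1" id="link1">Basic Python</a> and
<a class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# name: match on the tag name
by_name = soup.find_all("a")
# attrs: match on attribute values, here given as a dict
by_attrs = soup.find_all("p", attrs={"class": "course"})
# **kwargs: an attribute passed directly as a keyword argument
by_kwarg = soup.find_all(id="link2")
# string: match the text between tags (exact match for a plain string)
by_string = soup.find_all(string="Basic Python")

print(len(by_name), len(by_attrs), by_kwarg[0].name, by_string)
```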

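import requests
from bs4 import BeautifulSoup

# Fetch the demo page and parse it with the built-in HTML parser
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
# Passing True matches every tag in the document
for tag in soup.find_all(True):
    print(tag.name)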
==================================================
html
head
title
body
p
b
p
a
a

Here, soup is:

<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice
to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>

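As you can see, the name of every tag (whatever sits inside the angle brackets) was printed.

What if we only want the tags whose names contain the character 'b'?

Enter the regular expression library: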

import re
for tag in soup.find_all(re.compile('b')):
    print(tag.name)
============================
body
b
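A compiled pattern is matched anywhere in the tag name, which is why 'body' shows up alongside 'b'. The sketch below, on a hypothetical inline page with the same tag structure, contrasts an unanchored pattern with an anchored one, and also shows that a list of names can be passed instead of a pattern.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical inline page with the same tag structure as the demo page
html = ('<html><head><title>demo</title></head><body>'
        '<p><b>bold</b></p><p><a>one</a><a>two</a></p></body></html>')
soup = BeautifulSoup(html, "html.parser")

# Unanchored pattern: matches any tag name that contains "t"
containing_t = [tag.name for tag in soup.find_all(re.compile("t"))]
# Anchored pattern: matches only tag names that start with "t"
starting_t = [tag.name for tag in soup.find_all(re.compile("^t"))]
# A list matches any of the given names exactly
a_or_b = [tag.name for tag in soup.find_all(["a", "b"])]

print(containing_t, starting_t, a_or_b)
```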

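You can also look for tags that carry a particular attribute value. Compare the two lookups below (from here on find_all is omitted, since soup(...) is shorthand for soup.find_all(...)).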

print(soup('p'))
========================================================
[<p class="title"><b>The demo python introduces several python courses.</b></p>, 
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>]


print(soup('p','course'))
=============================================================
[<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>]

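Clearly, the first p tag is gone from the second result: a string passed as the second argument is matched against the class attribute.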
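The same class filter can be spelled more explicitly. This sketch, on a hypothetical two-paragraph snippet, compares the shorthand with the class_ keyword and with an attrs dict; all three should return the same single tag.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with two p tags that differ only by class
html = ('<p class="title"><b>Intro</b></p>'
        '<p class="course">Courses</p>')
soup = BeautifulSoup(html, "html.parser")

# Shorthand: a string as the second argument filters on the class attribute
short = soup("p", "course")
# class_ keyword (trailing underscore because class is a Python keyword)
by_kwarg = soup.find_all("p", class_="course")
# attrs dict naming the attribute explicitly
by_attrs = soup.find_all("p", attrs={"class": "course"})

print(short == by_kwarg == by_attrs)
```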

Next:

print(soup(id='link1'))
==========================================================
[<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>]

This finds the tag element whose id attribute equals 'link1'.
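Attribute values given as plain strings must match exactly; a partial value finds nothing unless it is wrapped in a regular expression. A small sketch on a hypothetical two-link snippet:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet with two links
html = ('<a id="link1">Basic Python</a> and '
        '<a id="link2">Advanced Python</a>')
soup = BeautifulSoup(html, "html.parser")

exact = soup(id="link1")                # exact value: one match
partial = soup(id="link")               # no id equals "link": empty result
pattern = soup(id=re.compile("link"))   # regex: every id containing "link"

print(len(exact), partial, [tag["id"] for tag in pattern])
```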

# recursive=False checks only the direct children of soup, so no a tag is found
print(soup('a',recursive=False))
==================
[]
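The empty list makes sense once you look at what the direct children are. In this sketch (a hypothetical inline page), the only direct child tag of soup is html, while the a tags are direct children of the second p, so the same non-recursive search succeeds when it starts from that p.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page: the a tags sit three levels below the root
html = ('<html><body><p class="course">'
        '<a id="link1">Basic Python</a> and <a id="link2">Advanced Python</a>'
        '</p></body></html>')
soup = BeautifulSoup(html, "html.parser")

# Only direct children of soup are examined
top_level = soup("a", recursive=False)
direct_children = [tag.name for tag in soup(True, recursive=False)]

# Starting from the p tag, the a tags are direct children
p = soup.find("p", class_="course")
direct_links = [tag["id"] for tag in p("a", recursive=False)]

print(top_level, direct_children, direct_links)
```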

Next, search for strings:

print(soup(string=re.compile('python')))
================================
['This is a python demo page', 'The demo python introduces several python courses.']
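A string search returns the matching text nodes rather than tags, and the pattern is case-sensitive, which is why 'Basic Python' is missing above. The sketch below, on a hypothetical inline page, uses .parent to get back to the enclosing tag and re.IGNORECASE to catch both spellings.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical inline page, trimmed from the demo page used in this post
html = ('<html><head><title>This is a python demo page</title></head>'
        '<body><p><b>The demo python introduces several python courses.</b></p>'
        '<p><a>Basic Python</a> and <a>Advanced Python</a>.</p></body></html>')
soup = BeautifulSoup(html, "html.parser")

# Case-sensitive: only strings containing lowercase "python"
lower = soup(string=re.compile("python"))
# Each result is a text node; .parent is the tag that contains it
parents = [s.parent.name for s in lower]
# Case-insensitive: also picks up "Basic Python" and "Advanced Python"
any_case = soup(string=re.compile("python", re.IGNORECASE))

print(parents, len(any_case))
```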
