Python: how to strip all attributes from HTML tags except img src and a href

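You can use regular expressions in Python to remove attributes from HTML tags, but because HTML is complex, an HTML parsing library such as Beautiful Soup is the better choice. Below are examples of both approaches, each keeping only the src attribute of img tags and the href attribute of a tags: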

Using regular expressions (not recommended; a regex does not really parse HTML, so it can break on unusual markup):

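import re

# Attribute to keep for each tag; every other attribute is removed
ALLOWED_ATTRS = {'img': 'src', 'a': 'href'}

# Opening tag: name, attribute text, optional "/" (closing tags and comments are not matched)
TAG_RE = re.compile(r'<([a-zA-Z][\w:-]*)([^>]*?)(/?)>')

def _clean_tag(match):
    name, attrs, slash = match.groups()
    keep = ALLOWED_ATTRS.get(name.lower())
    kept = ''
    if keep:
        # Find the allowed attribute; its value may be "double", 'single' or unquoted
        found = re.search(r'(?<![\w-])' + keep + r'\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)', attrs, re.I)
        if found:
            kept = ' %s=%s' % (keep, found.group(1))
    return '<%s%s%s>' % (name, kept, slash)

def remove_attributes_except_src_and_href(html_text):
    return TAG_RE.sub(_clean_tag, html_text)

# Original HTML text
html_content = '<p style="color: red;">This is a <a href="https://example.com">link</a>.</p> <img src="image.jpg" alt="Image">'

# HTML after removing all attributes except img src and a href
filtered_html = remove_attributes_except_src_and_href(html_content)
print(filtered_html)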
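To see concretely why a regex approach is fragile, the sketch below runs a naive one-liner of the form <(?!img|a)([^>]+)> against a few sample tags. It illustrates three pitfalls: the lookahead (?!img|a) only compares the first letters, so tags such as <abbr> are skipped as if they were <a>; writing the captured text straight back with r'<\1>' leaves the tag unchanged; and [^>]+ stops at the first ">" even when it sits inside a quoted attribute value. The sample strings are made up for illustration.

```python
import re

# A naive pattern: "any tag whose name does not start with img or a"
naive = re.compile(r'<(?!img|a)([^>]+)>')

# Pitfall 1: <abbr> starts with "a", so the lookahead rejects it just like <a>
abbr_match = naive.match('<abbr title="HTML">')

# Pitfall 2: substituting the captured text back unchanged removes nothing
unchanged = naive.sub(r'<\1>', '<p style="color: red;">')

# Pitfall 3: [^>]+ stops at the first ">", even inside a quoted attribute value
cut_short = re.match(r'<([^>]+)>', '<img alt="a > b" src="x.png">').group(1)

print(abbr_match)       # None
print(unchanged)        # <p style="color: red;">
print(repr(cut_short))  # 'img alt="a '
```

A parser-based approach avoids all three problems because it tokenizes tag names and quoted attribute values properly.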

Using Beautiful Soup (recommended):

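from bs4 import BeautifulSoup

# Attributes to keep for each tag; every other attribute is removed
ALLOWED_ATTRS = {'img': {'src'}, 'a': {'href'}}

def remove_attributes_except_src_and_href(html_text):
    soup = BeautifulSoup(html_text, 'html.parser')

    # Visit every tag in the document
    for tag in soup.find_all(True):
        allowed = ALLOWED_ATTRS.get(tag.name, set())
        # Keep only src on <img> and href on <a> (so alt is dropped too); clear all others
        tag.attrs = {k: v for k, v in tag.attrs.items() if k in allowed}

    return str(soup)

# Original HTML text
html_content = '<p style="color: red;">This is a <a href="https://example.com">link</a>.</p> <img src="image.jpg" alt="Image">'

# HTML after removing all attributes except img src and a href
filtered_html = remove_attributes_except_src_and_href(html_content)
print(filtered_html)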
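If adding the Beautiful Soup dependency is not an option, the same allowlist idea can be sketched with the standard library's html.parser.HTMLParser, which tokenizes the input and lets each tag be written back out with only the permitted attributes. This is a minimal sketch, not a full serializer: the names AttributeStripper and strip_attributes are hypothetical, comments and doctype declarations are passed through, and processing instructions are dropped. Note that HTMLParser lowercases tag and attribute names.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist: only img src and a href survive
ALLOWED_ATTRS = {'img': {'src'}, 'a': {'href'}}


class AttributeStripper(HTMLParser):
    """Re-emits the parsed HTML, keeping only allowlisted attributes."""

    def __init__(self):
        # Report &name; and &#n; as separate events so they can be written back verbatim
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _emit_start(self, tag, attrs, self_closing):
        allowed = ALLOWED_ATTRS.get(tag, set())
        # Attribute values arrive unescaped, so escape them again on output
        kept = ''.join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in allowed and value is not None
        )
        self.parts.append(f'<{tag}{kept}{" /" if self_closing else ""}>')

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, False)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, True)

    def handle_endtag(self, tag):
        self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f'&{name};')

    def handle_charref(self, name):
        self.parts.append(f'&#{name};')

    def handle_comment(self, data):
        self.parts.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.parts.append(f'<!{decl}>')


def strip_attributes(html_text):
    parser = AttributeStripper()
    parser.feed(html_text)
    parser.close()
    return ''.join(parser.parts)


html_content = '<p style="color: red;">This is a <a href="https://example.com">link</a>.</p> <img src="image.jpg" alt="Image">'
print(strip_attributes(html_content))
```

Because the parser, not a pattern, decides where a tag name and each quoted value end, uppercase tags, tags like <abbr>, and values containing ">" are handled without special cases.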

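Whichever method you use, think through the edge cases in your input (uppercase tag names, single-quoted or unquoted attribute values, self-closing tags, attribute values that contain ">") to make sure the result is accurate and robust.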
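One way to act on this advice is to check the filtered output automatically. The sketch below, again using only the standard library's html.parser, lists every (tag, attribute) pair that is still present but not allowed; an empty list means the filter did its job. The allowlist and the name find_disallowed_attributes are assumptions for illustration.

```python
from html.parser import HTMLParser

# Assumed allowlist: only img src and a href may remain
ALLOWED_ATTRS = {'img': {'src'}, 'a': {'href'}}


class AttributeAuditor(HTMLParser):
    """Collects (tag, attribute) pairs that are not in the allowlist."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        allowed = ALLOWED_ATTRS.get(tag, set())
        for name, _value in attrs:
            if name not in allowed:
                self.violations.append((tag, name))
    # Self-closing tags (<img ... />) also reach handle_starttag through
    # HTMLParser's default handle_startendtag


def find_disallowed_attributes(html_text):
    auditor = AttributeAuditor()
    auditor.feed(html_text)
    auditor.close()
    return auditor.violations


print(find_disallowed_attributes('<p style="color: red;">Hi</p> <img src="image.jpg" alt="Image">'))
# [('p', 'style'), ('img', 'alt')]
print(find_disallowed_attributes('<p>Hi</p> <img src="image.jpg">'))
# []
```

Feeding it the filtered HTML should yield an empty list; any pair it returns points at a tag the filter missed.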

Category: Python programming
Tags: none
Published: 2023-08-15 01:46:55
Link: https://www.yelongauto.com/index.php/PYTHONbiancheng/2073.html
