vb.net - Stripping all html tags with Html Agility Pack

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have an HTML string like this:

<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>

I wish to strip all HTML tags so that the resulting string becomes:

foo bar baz

Based on another post here on SO, I came up with this function (which uses the Html Agility Pack):

  Public Shared Function stripTags(ByVal html As String) As String
    Dim plain As String = String.Empty
    Dim htmldoc As New HtmlAgilityPack.HtmlDocument

    htmldoc.LoadHtml(html)
    Dim invalidNodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//html|//body|//p|//a")

    If Not htmldoc Is Nothing Then
      For Each node In invalidNodes
        node.ParentNode.RemoveChild(node, True)
      Next
    End If

    Return htmldoc.DocumentNode.WriteContentTo
  End Function

Unfortunately this does not return what I expect. Instead it gives:

bazbarfoo

Where am I going wrong, and is this the best approach?

Regards and happy coding!

UPDATE: Based on the answer below I came up with this function, which might be useful to others:

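  Public Shared Function stripTags(ByVal html As String) As String
    Dim htmldoc As New HtmlAgilityPack.HtmlDocument
    ' Add a blank line after each paragraph and a line break for each <br/> so they survive as text.
    Dim blankLine As String = Environment.NewLine & Environment.NewLine
    htmldoc.LoadHtml(html.Replace("</p>", "</p>" & blankLine).Replace("<br/>", Environment.NewLine))
    ' InnerText returns the concatenated text of all nodes, without the tags.
    Return htmldoc.DocumentNode.InnerText
  End Function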

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:39:35+0000

Why not just return htmldoc.DocumentNode.InnerText instead of removing all the non-text nodes? It should give you what you want.
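
A minimal sketch of this suggestion, assuming the Html Agility Pack is referenced in the project. The module name HtmlText and the function name StripTagsToText are hypothetical, and the HtmlEntity.DeEntitize call is an optional extra step that is not part of the answer above; it is included on the assumption that entities such as &amp;amp; should be decoded in the plain-text result.

```vbnet
Imports HtmlAgilityPack

Public Module HtmlText
    ' Hypothetical helper (name not from the original post).
    ' Returns the text content of an HTML string with all tags removed.
    Public Function StripTagsToText(ByVal html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' InnerText concatenates the text of all descendant nodes in document order,
        ' so no nodes need to be removed by hand.
        Dim text As String = doc.DocumentNode.InnerText

        ' Optional addition (an assumption, not from the answer): decode HTML entities.
        Return HtmlEntity.DeEntitize(text)
    End Function
End Module
```

For the string in the question, this should return foo bar baz, with the words in their original order.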
