I would like to extract the text from an HTML document while keeping the links inside it. For example, from this HTML code:
<div class="CssClass21">bla1 bla1 bla1 <a href="http://www.ibrii.com">go to ibrii</a> bla2 bla2 bla2 <img src="http://www.contoso.com/hello.jpg"> <span class="cssClass34">hello hello</span>
I would like to extract only this:
bla1 bla1 bla1 <a href="http://www.ibrii.com">go to ibrii</a> bla2 bla2 bla2 hello hello
In another post on Stack Overflow I found the regex <[^>]*>
which extracts the text by replacing every match with an empty string. How can I exclude the anchor tags from the match? It seems that regular expressions do not support inverse matching.
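To show the problem concretely, here is a minimal sketch of that replace-every-match approach. It assumes Python and its standard `re` module, since the question does not name a language. It uses the sample HTML from above, and the variable names are illustrative only.

```python
import re

# Sample input from the question (the <div> is left unclosed, as in the original snippet).
html = ('<div class="CssClass21">bla1 bla1 bla1 <a href="http://www.ibrii.com">go to ibrii</a> '
        'bla2 bla2 bla2 <img src="http://www.contoso.com/hello.jpg"> '
        '<span class="cssClass34">hello hello</span>')

# The pattern from the other post: '<' followed by any non-'>' characters, then '>'.
# It matches every tag, so <a href="..."> and </a> are matched as well.
TAG = re.compile(r'<[^>]*>')

# Replacing every match with an empty string strips all tags, including the anchor.
text = TAG.sub('', html)
print(text)
```

The link text "go to ibrii" survives, but the `<a href="http://www.ibrii.com">` wrapper is lost. Removing the `<img>` tag also leaves two spaces before "hello hello", so the output differs from the desired result in two ways.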