Tuesday, August 19, 2008

Stripping HTML Code / Tags in C#

I needed a function today that takes a string of HTML and returns only the embedded text, i.e. strips out all of the HTML tags.
So I knocked up a simple function using regular expressions in C#. I have added the code below, as it may be of use to someone.

Please note: You will need to import the System.Text.RegularExpressions namespace (using System.Text.RegularExpressions;) for this to work.

Here is the function:

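private string stripHTML(string inputHTML)
{
    // Remove leading and trailing whitespace first.
    inputHTML = inputHTML.Trim();
    // Lazily match everything from '<' to the next '>'.
    // (.|\n) is needed because '.' does not match a newline by default,
    // so this also catches tags that span several lines.
    string toReturn = Regex.Replace(inputHTML, @"<(.|\n)*?>", string.Empty);
    return toReturn;
}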
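
To show what the function does in practice, here is a quick usage sketch. It assumes the pattern <(.|\n)*?> (any character or a newline between the angle brackets), and the class name StripHtmlDemo, the static StripHtml wrapper and the sample string are made up purely for illustration. Bear in mind this is a regex, not an HTML parser: it only removes the tags, so the text inside <script> or <style> blocks stays in the result, entities such as &amp; are left undecoded, and a '>' inside an attribute value will end the match early.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original function.
public static class StripHtmlDemo
{
    // Static variant of the function so it can be called from Main.
    // Assumes the pattern <(.|\n)*?>, which also matches tags split across lines.
    public static string StripHtml(string inputHtml)
    {
        return Regex.Replace(inputHtml.Trim(), @"<(.|\n)*?>", string.Empty);
    }

    public static void Main()
    {
        // Made-up sample input; note the <a> tag is broken over two lines.
        string sample = "<p>Hello <b>world</b>,\n<a\nhref=\"#\">link</a></p>";

        // Prints "Hello world," on one line and "link" on the next.
        Console.WriteLine(StripHtml(sample));
    }
}
```

The newline between "world," and "link" survives because it sits outside any tag; only the newline inside the <a ...> tag is removed along with the tag itself.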