In this post we will learn how to remove HTML tags from a string in C#. You may need this while working on a project, so below we share working sample code along with the details.
A C# string might include HTML elements, and our objective is to eliminate them. This is useful for displaying HTML content as plain text and for removing decorations such as bold and italics.
There are multiple ways to do this, but let’s discuss the following two:
- Use RegEx to remove HTML tags.
- Use custom logic to strip out the HTML tags.
Let’s see how to do this using RegEx:
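// 1. RegEx removal: match '<', then anything that is not '>', then '>'.
// FinalData is assumed to be a string variable that already holds the HTML.
System.Text.RegularExpressions.Regex rx = new System.Text.RegularExpressions.Regex("<[^>]*>");
FinalData = rx.Replace(FinalData, "");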
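To make the snippet above easier to reuse, it can be wrapped in a small helper. This is only a minimal sketch: the class name `HtmlTagStripper` and the method name `StripTags` are hypothetical (they are not part of the original code), and returning an empty string for null input is an assumption about what the caller wants. It reuses the same `<[^>]*>` pattern and keeps the `Regex` in a static field so it is built once.

```csharp
using System.Text.RegularExpressions;

public static class HtmlTagStripper
{
    // Same pattern as above: '<', then any run of characters that are not '>', then '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Hypothetical helper name; returns the input with every tag-like span removed.
    public static string StripTags(string html)
    {
        // Assumption: treat null or empty input as "nothing to strip".
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        return TagPattern.Replace(html, "");
    }
}
```

For example, `HtmlTagStripper.StripTags("<p>Hello <b>world</b></p>")` returns `Hello world`. Because `[^>]` also matches line breaks, a tag that wraps onto a second line is removed as well.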
Remove HTML tags from string C# without Regex:
You can pass your HTML string to the method below and it will strip the tags. It checks for '<' and '>' and removes everything lying between these angle brackets. It is easy to implement and can be customized as per our needs.
public string RemoveHTMLTagsCharArray(string html)
{
    // The output can never be longer than the input.
    char[] charArray = new char[html.Length];
    int index = 0;
    bool isInside = false;
    for (int i = 0; i < html.Length; i++)
    {
        char left = html[i];
        if (left == '<')
        {
            // Start of a tag: skip characters until the closing '>'.
            isInside = true;
            continue;
        }
        if (left == '>')
        {
            // End of a tag: start copying characters again.
            isInside = false;
            continue;
        }
        if (!isInside)
        {
            charArray[index] = left;
            index++;
        }
    }
    return new string(charArray, 0, index);
}
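One thing to be aware of with the loop above is that it drops every '>' character, even one that appears in ordinary text such as "1 > 0". The following is a hedged variation, not the original method: it assumes a hypothetical class `CharScanStripper`, uses a `StringBuilder` instead of a char array, and only treats '>' as the end of a tag when the scan is actually inside one.

```csharp
using System.Text;

public static class CharScanStripper
{
    // Variation on the loop above (an assumption, not the original method):
    // a '>' only closes a tag when we are inside one, so a stray '>' in text survives.
    public static string StripTags(string html)
    {
        var result = new StringBuilder(html.Length);
        bool isInside = false;
        foreach (char c in html)
        {
            if (c == '<')
            {
                // Start of a tag: stop copying.
                isInside = true;
            }
            else if (c == '>' && isInside)
            {
                // End of the current tag: resume copying.
                isInside = false;
            }
            else if (!isInside)
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

With this sketch, `CharScanStripper.StripTags("<i>1 > 0</i>")` returns `1 > 0`. A bare '<' in the text is still treated as the start of a tag, just as in the original loop.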
You can also use custom logic to remove only specific tags from the HTML, for example:
// Removing every <ul>...</ul> block from the HTML string (requires: using System;).
// We search for "<ul" (without ">") so that it also picks up <ul> tags that have a style in them.
// The comparison ignores case, and ">= 0" also catches a list at the very start of the string.
int start;
while ((start = webdata.IndexOf("<ul", StringComparison.OrdinalIgnoreCase)) >= 0)
{
    int end = webdata.IndexOf("</ul>", start, StringComparison.OrdinalIgnoreCase);
    if (end < 0)
    {
        // Bad HTML: an opening <ul with no closing </ul>.
        // Stop here instead of retrying and jumping out with goto.
        break;
    }
    // 5 is the length of "</ul>".
    webdata = webdata.Remove(start, end + 5 - start);
}
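The same idea can be generalized to any tag name. The sketch below is illustrative only: `TagBlockRemover` and `RemoveBlocks` are hypothetical names, and it assumes the blocks are not nested (with nested tags of the same name, the first closing tag ends the removal and a stray closing tag is left behind). It stops when a closing tag is missing instead of relying on exceptions.

```csharp
using System;

public static class TagBlockRemover
{
    // Hypothetical helper: removes every <tagName ...>...</tagName> block, ignoring case.
    public static string RemoveBlocks(string html, string tagName)
    {
        string open = "<" + tagName;          // no '>' so tags with attributes match too
        string close = "</" + tagName + ">";
        int start = html.IndexOf(open, StringComparison.OrdinalIgnoreCase);
        while (start >= 0)
        {
            int end = html.IndexOf(close, start, StringComparison.OrdinalIgnoreCase);
            if (end < 0)
            {
                // Unbalanced HTML: leave the rest untouched rather than loop forever.
                break;
            }
            html = html.Remove(start, end + close.Length - start);
            // Continue searching from where the removed block used to begin.
            start = html.IndexOf(open, start, StringComparison.OrdinalIgnoreCase);
        }
        return html;
    }
}
```

For example, `TagBlockRemover.RemoveBlocks("<UL style=\"x\"><li>a</li></UL>text<ul><li>b</li></ul>", "ul")` returns `text`.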
C# remove div tags from HTML String:
The method below uses RegEx to remove all HTML tags, div tags included. Please try this method; it should work for typical markup.
// requires: using System; using System.Text.RegularExpressions;
public static string removeHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}
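By default, `.` in a .NET regular expression does not match a newline, so the `<.*?>` pattern skips a tag whose attributes wrap onto the next line. A possible adjustment, shown here only as a sketch with the hypothetical names `MultilineTagStripper` and `RemoveHtml`, is to pass `RegexOptions.Singleline`:

```csharp
using System.Text.RegularExpressions;

public static class MultilineTagStripper
{
    // RegexOptions.Singleline makes '.' match '\n' too,
    // so tags that wrap across lines are removed as well.
    public static string RemoveHtml(string input)
    {
        return Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);
    }
}
```

With this option, `MultilineTagStripper.RemoveHtml("<div\nclass=\"a\">Hi</div>")` returns `Hi`.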
How to remove all HTML tags using JavaScript:
You can use the line below in your JS code:
item = item.replace(/<(.|\n)*?>/g, '');
You can check more on RegEx Here.
I hope this post helps you remove HTML tags from your string or web response.
You can read more on Microsoft Learn.