Normally, I'd turn to a regular expression to handle this, however in SQL server, to use regular expressions you need to mess about with COM to get a VBScript interface etc, and I'm lazy, so I wanted a simpler way. A bit of research turned up an article at http://blog.sqlauthority.com/ which appeared to do exactly what I needed using just a normal user-defined function, and as luck would have it, it had only been written 11 days ago, so my timing was pretty good ;) Of course, it wasn't exactly what I needed, but it was pretty close, and after a few changes (mostly copy-paste code to handle each HTML entity that was likely to appear), I had a lovely new function which did exactly what I needed it to do. The performance of it is probably pretty dire, but that's acceptable in this case.
Here's the code, should anyone need it:
ALTER FUNCTION [dbo].[udf_StripHTML]
(
@HTMLText varchar(MAX)
)
RETURNS varchar(MAX)
AS
BEGIN
DECLARE @Start int
DECLARE @End int
DECLARE @Length int
-- Replace the HTML entity & with the '&' character (this needs to be done first, as
-- '&' might be double encoded as '&')
SET @Start = CHARINDEX('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity < with the '<' character
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '<')
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity > with the '>' character
SET @Start = CHARINDEX('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '>')
SET @Start = CHARINDEX('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity & with the '&' character
SET @Start = CHARINDEX('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity with the ' ' character
SET @Start = CHARINDEX(' ', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, ' ')
SET @Start = CHARINDEX(' ', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END
-- Replace any <br> tags with a newline
SET @Start = CHARINDEX('<br>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10))
SET @Start = CHARINDEX('<br>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace any <br/> tags with a newline
SET @Start = CHARINDEX('<br/>', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)')
SET @Start = CHARINDEX('<br/>', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace any <br /> tags with a newline
SET @Start = CHARINDEX('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)')
SET @Start = CHARINDEX('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END
-- Remove anything between <whatever> tags
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '')
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
Of course, most of the credit for this goes to Pinal Dave of SQLAuthority.com, I just hacked his code around a bit to suit my particular needs.
Note the formatting is going a bit awry, mostly due to HTML entities in the code. I'll fix this when I get a chance
12 comments:
Hello,
Good improvement to original code. I have posted link to your version on my blog as well. Users can use either version.
Link to this article
Regards,
Pinal Dave (SQLAuthority.com)
Excellent code. Thanks a lot for posting it, it really helped me out.
Perfect!! THX
Very helpful - Thanks.
Only change I had to make was to allow for quotes (").
This code is really helpfull for my reports.
Excellent code.
Thanks a lot for posting it.
Thanks,
MJ
hey,
thanks for script, I put it into my master project :)
Of course with source-link to you page :)
Best regards.
I understand why & is in a loop, but why not just do a replace for other entities?
select @html = replace(@html, '"', '"')
select @html = replace(@html, '<', '<')
or you could nest those as
select @html = replace(replace(replace(
etc.
It doesn't work to remove the comments when there're tags inside of it. This happens when the HTML is generated by MS Word.
Here's a new version of it:
http://blog.this.com.ar/2010/11/stripping-html-from-text-in-sql.html
Is there anyway I can get this function to ignore sup tags? Thanks
I tried using this function and got this error msg:
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'font'.
Msg 132, Level 15, State 1, Line 2
The label 'size' has already been declared. Label names must be unique within a query batch or stored procedure.
Msg 132, Level 15, State 1, Line 2
The label 'family' has already been declared. Label names must be unique within a query batch or stored procedure.
Msg 132, Level 15, State 1, Line 2
it is helpful to my project....but some modification
Post a Comment