Stripping extended ASCII characters from MS Word copy/paste

(( This isn't my work; I just posted it here so I wouldn't lose it. I'll post the author's name and website when I find it. ))

Expecting to see all sorts of text copied and pasted from MS Word,
containing things like the ellipsis (…), left single quote (‘),
right single quote (’), and smart quotes (left double quote “
and right double quote ”), along with all sorts of naughty high-ASCII
characters? Then do the right thing and convert each one to its numeric
character entity. In ColdFusion this is as simple as constructing two
lists (one containing the actual characters to search for, the other
containing the numeric character entities to replace them with) and
passing both to ReplaceList().

<textarea>
<!--- initialize the variable - bad_chars --->
<cfparam name="bad_chars" default="">
<!--- initialize the variable - good_chars --->
<cfparam name="good_chars" default="">
<!--- loop through the range of high-ascii --->
<cfloop index="i" from="127" to="255">
<!--- append each high-ascii character to a list
contained in the variable bad_chars --->
<cfset bad_chars = ListAppend(bad_chars, Chr(i))>
<!--- append each numeric character entity
representation of the high-ascii character to
a list contained in the variable good_chars --->
<cfset good_chars = ListAppend(good_chars, "&##" & NumberFormat(i, '0000') & ";")>
</cfloop>

<cfquery name="InsertReport" datasource="mysource">
INSERT INTO reportstable (
Content
)
VALUES (
<cfqueryparam value="#Trim(ReplaceList(Content, bad_chars, good_chars))#" cfsqltype="cf_sql_longvarchar">
)
</cfquery>
</textarea>
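
To see what the replacement does without touching a database, the same two lists can be applied to a short test string. This is only a minimal sketch: the variable names sample and cleaned are made up for illustration, and the sample character is built with Chr() so the result does not depend on how the template file itself is encoded.

```cfml
<!--- start from empty lists so the loop never appends to leftovers --->
<cfset bad_chars = "">
<cfset good_chars = "">
<!--- build the two parallel lists, same range as the loop above --->
<cfloop index="i" from="127" to="255">
    <cfset bad_chars = ListAppend(bad_chars, Chr(i))>
    <cfset good_chars = ListAppend(good_chars, "&##" & NumberFormat(i, '0000') & ";")>
</cfloop>

<!--- hypothetical sample input: "caf" followed by character code 233 (e with acute accent) --->
<cfset sample = "caf" & Chr(233)>

<!--- swap every character found in bad_chars for the entity at the same list position --->
<cfset cleaned = ReplaceList(sample, bad_chars, good_chars)>
<!--- cleaned now holds the text caf&#0233; --->
```

Note that ReplaceList() only touches characters that are actually in bad_chars, so this assumes the pasted characters arrive with codes between 127 and 255. Anything with a higher character code would pass through unchanged.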