Channel: Adobe Community: Message List - ColdFusion

Re: I'm trying to remove the HTML from my Solr collections.


I'm indexing a SQL database...

The specific field contains the HTML for the selected page.

(<p> <span style="font-family etc...)

I've tried several things to remove the HTML from the search results. I've tried various stripHTML functions that haven't worked, and I've tried doing it in the Solr schema... I recently converted all of my collections over to Solr, hoping for better search results. In Verity, the following code (which is still in place; I don't know why it won't work in Solr) worked perfectly for removing the HTML.

 

*************************************

example a.

<cfset searchterm = rereplace(searchterm, '%20', ' ', 'all')>

<cfset searchterm = rereplace(searchterm, "acute", "'", "all")>

<cfset searchterm = rereplace(searchterm, "\(", "", "all")>

<cfset searchterm = rereplace(searchterm, "\)", "", "all")>

<cfset searchterm = rereplace(searchterm, "\/", " ", "all")>

<cfset searchterm = rereplace(searchterm, "\\", " ", "all")>

 

<cftry>

<cfsearch name = "getSearchResults2"

collection = "s_mysamplepage"

criteria = "#searchterm#"

status = "info"

ContextPassages = "10"

ContextBytes = "500"

suggestions = "Always"

contextHighlightBegin = "<font color=red><strong>"

contextHighlightEnd = "</strong></font>">

<cfcatch>

<cfoutput>

<p> Invalid Search Criteria.</p>

</cfoutput>

</cfcatch>

</cftry>

****************************************************

Also included in the output query....

*******************************

<cfoutput query="getSearchResults2">

    <cftry>

          <!--- Remove complete tags, then any partial tag cut off at the end or start of the passage --->

          <cfset cleaned = rereplace(Context, "<.*?>", "", "all")>

          <cfset cleaned = rereplace(cleaned, "<.*?$", "", "all")>

          <cfset cleaned = rereplace(cleaned, "^.*?>", "", "all")>

          <cfset cleaned = rereplaceNoCase(cleaned, "#searchterm#", "<font color=red><b>#searchterm#</b></font>", "all")>

          <cfset currPage = replace(URL, '/', '0', 'all')>

    <cfcatch></cfcatch>

    </cftry>

********************************

 

Now I've been pulling my hair out trying to get this to work from the getSearchResults2 query...

Is it possible to strip out the HTML when creating the collection?

What about stripping it during indexing?
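
On the indexing question, one option is to strip the tags from the data before it is handed to cfindex, so the text stored in the collection is already plain. As far as I know, analysis-time filters in the Solr schema only change the indexed tokens, not the stored value returned in results, which may be why a schema-only change appears to do nothing. A minimal sketch, assuming a hypothetical datasource myDatasource and a hypothetical pages table with page_id, page_title, and page_html columns (none of these names come from the actual setup):

```cfml
<!--- Hypothetical datasource, table, and column names; adjust to the real schema. --->
<cfquery name="getPages" datasource="myDatasource">
    SELECT page_id, page_title, page_html
    FROM pages
</cfquery>

<!--- Strip markup from each row before the text reaches Solr. --->
<cfloop query="getPages">
    <!--- Replace every complete tag with a space, then collapse runs of whitespace. --->
    <cfset plainText = REReplace(getPages.page_html, "<[^>]*>", " ", "all")>
    <cfset plainText = Trim(REReplace(plainText, "\s+", " ", "all"))>
    <!--- Write the cleaned text back into the same row of the query. --->
    <cfset QuerySetCell(getPages, "page_html", plainText, getPages.currentRow)>
</cfloop>

<!--- Rebuild the collection from the cleaned query. --->
<cfindex collection="s_mysamplepage"
         action="refresh"
         type="custom"
         query="getPages"
         key="page_id"
         title="page_title"
         body="page_html">
```

This only removes tags; entities such as &nbsp; would still need their own replacement, and a regex like this is a rough approach rather than a real HTML parser.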

 

Any help is appreciated....

 

Thanx

 



