A long time ago I wrote about a very simple technique for quickly adding auto-suggest to websites with the help of Solr: Incredibly fast Solr autosuggest. It used the terms function of Solr which, surprisingly enough, lets us search for terms.
This solution works well if most of your searches are single-term or most of your searchable content is closely related.
But what happens if your website covers a lot of different domains? Let’s say you’re selling both electronics and clothes. Weird suggestions can arise: typing ‘widescreen t’ may in some cases return ‘widescreen top’ or ‘widescreen trousers’. Is this relevant or close to what you were looking for? Probably not. Most likely it won’t even produce a result. You want the full words you have already typed to affect the suggestion. Like the image on the right.
Typeahead with relevancy match
So what we need is a typeahead that takes the previous words into consideration and only offers suggestions that make sense: ones that further filter the result set and are relevant to the words already typed. Like the example below.
To remedy the situation we can tune our previous query a bit without sacrificing any of the awesomeness of Solr.
We can treat the previous words (if any) as an existing search, where the last word (the one we are typing at the moment) is one of the possible facets on the same full-text field. Think of it like this:
Search: widescreen
Facets on full-text terms:
- tv (312)
- monitor (27)
- tablet (12)
- dvd (3)
…
    $ curl 'http://localhost:8983/solr/select?q=widescreen&facet=on&facet.field=text&wt=json&omitHeader=true&facet.limit=5&rows=0&facet.mincount=1' | pp_json
    {
      "facet_counts": {
        "facet_dates": {},
        "facet_fields": {
          "text": [
            "tv",
            312,
            "monitor",
            27,
            "tablet",
            12,
            "dvd",
            3,
            ...
          ]
        },
        "facet_queries": {},
        "facet_ranges": {}
      },
      "response": {
        "docs": [],
        "numFound": 2497,
        "start": 0
      }
    }
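Note that the facet list in the response is flat: terms and counts alternate in a single array. A minimal Python sketch of pairing them up could look like the following. The function name `pair_facets` and the trimmed `sample` response are illustrative assumptions, not part of Solr or of the original setup.

```python
from typing import List, Tuple, Union


def pair_facets(flat: List[Union[str, int]]) -> List[Tuple[str, int]]:
    """Turn Solr's flat [term, count, term, count, ...] list into pairs."""
    terms = flat[0::2]   # even positions hold the terms
    counts = flat[1::2]  # odd positions hold the document counts
    return [(str(term), int(count)) for term, count in zip(terms, counts)]


# Hypothetical, trimmed copy of the response shown above (elided entries left out).
sample = {
    "facet_counts": {
        "facet_fields": {
            "text": ["tv", 312, "monitor", 27, "tablet", 12, "dvd", 3]
        }
    }
}

for term, count in pair_facets(sample["facet_counts"]["facet_fields"]["text"]):
    print(f"{term} ({count})")
```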
So all you have to do is narrow the facets down to those matching the prefix you have already typed. Solr has this feature built in, via the facet.prefix parameter.
Parameter | Value | Description |
---|---|---|
q | ‘widescreen’ | The query string without the last word; if only one word has been (partially) typed, this should be ‘*’ |
facet | ‘on’ | Turning faceting on |
facet.prefix | ‘t’ | The fragment of the last word being typed |
facet.field | ‘text’ | The name of the full text field that you’re querying against |
wt | ‘json’ | Make the output JSON for better parsability |
omitHeader | ‘true’ | Leave out the response header; we don’t need all that crap |
facet.limit | 5 | Limit the facets (suggestions) to 5 |
rows | 0 | Return 0 rows because we are not interested in the documents themselves at the moment |
facet.mincount | 1 | Only return facets that would actually yield results if searched for |
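The parameters in the table can be derived from whatever the user has typed so far. Here is a rough Python sketch, assuming the input is split on whitespace and that trailing whitespace means the last word is finished (so the prefix is empty). The helper names `split_input` and `suggest_url` are made up for illustration; the base URL is the one from the curl examples.

```python
from typing import Tuple
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # same endpoint as the curl examples


def split_input(typed: str) -> Tuple[str, str]:
    """Return (query, prefix): the completed words and the fragment being typed."""
    words = typed.split()
    if not words:
        return "*", ""
    # Assumption: trailing whitespace means the last word is complete.
    if typed[-1].isspace():
        return " ".join(words), ""
    # With a single partial word there is no query yet, so fall back to '*'.
    query = " ".join(words[:-1]) or "*"
    return query, words[-1]


def suggest_url(typed: str, field: str = "text", limit: int = 5) -> str:
    """Build the faceting URL described in the parameter table."""
    query, prefix = split_input(typed)
    params = {
        "q": query,
        "facet": "on",
        "facet.prefix": prefix,
        "facet.field": field,
        "wt": "json",
        "omitHeader": "true",
        "facet.limit": limit,
        "rows": 0,
        "facet.mincount": 1,
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(suggest_url("widescreen t"))
```

Using `urlencode` keeps multi-word queries and special characters safe in the query string.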
As a URL it looks like this:
    $ curl 'http://localhost:8983/solr/select?q=widescreen&facet=on&facet.prefix=t&facet.field=text&wt=json&omitHeader=true&facet.limit=5&rows=0&facet.mincount=1' | pp_json
    {
      "facet_counts": {
        "facet_dates": {},
        "facet_fields": {
          "text": [
            "tv",
            312,
            "tablet",
            12,
            ..
          ]
        },
        "facet_queries": {},
        "facet_ranges": {}
      },
      "response": {
        "docs": [],
        "numFound": 2497,
        "start": 0
      }
    }
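The facets returned here are only the last word of each suggestion, so the words already typed have to be put back in front. This can be sketched in Python as below; `full_suggestions` is a hypothetical helper, and it assumes the ‘*’ placeholder was used when no complete word had been typed yet.

```python
from typing import List, Union


def full_suggestions(query: str, flat_facets: List[Union[str, int]]) -> List[str]:
    """Prepend the already typed words to each facet term."""
    terms = [str(term) for term in flat_facets[0::2]]  # skip the counts
    if query == "*":
        # No completed words yet: the facet terms are the suggestions.
        return terms
    return [f"{query} {term}" for term in terms]


print(full_suggestions("widescreen", ["tv", 312, "tablet", 12]))
```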
Setting up Nginx rules
As before, we can set up an nginx location to proxy our query to Solr.
    location ~ ^/suggest/ {
        rewrite /suggest/(.*)/(.*) /solr/select?q=$1&facet=on&facet.prefix=$2&facet.field=text&wt=json&omitHeader=true&facet.limit=5&rows=0&facet.mincount=1 break;
        proxy_pass http://[SOLR_HOST]:8983;
    }
Please note that we use two parameters here: the first is the query (the completed words), the second is the fragment, or prefix.
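On the client side, the two parameters just need to be placed into the path that the location block expects. Below is a small Python sketch, where `suggest_path` is an illustrative name and percent-encoding each segment is my assumption. How nginx decodes and re-encodes the captured groups is not covered here, so it is worth checking against your own setup.

```python
from urllib.parse import quote


def suggest_path(query: str, prefix: str) -> str:
    """Build /suggest/<query>/<prefix> for the nginx location above."""
    # safe="" also encodes '/', so neither segment can add extra path separators.
    return "/suggest/{}/{}".format(quote(query, safe=""), quote(prefix, safe=""))


print(suggest_path("widescreen", "t"))
```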