Solr dynamic field with data importer

I’m managing a Solr installation that holds the products of an e-commerce site. A new feature was introduced recently: every product type can have its own dynamic attributes, which are configurable at the type level. For example, monitor types have resolution, size, etc., while processor types have clock frequency, socket, L2 cache, etc. The problem arose when the owners wanted to store these attributes in Solr so that the dynamic fields could be searched.

The schema.xml looks like this:
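As a rough illustration of the relevant part, a dynamic field declaration could look like the sketch below. The `attr_*` pattern, the `string` type and the `name` field are assumptions made for this example, not necessarily the site's actual schema.

```xml
<!-- Sketch only: field names and types are assumed for illustration. -->
<fields>
  <field name="id"   type="string" indexed="true" stored="true" required="true"/>
  <field name="name" type="string" indexed="true" stored="true"/>

  <!-- Any field whose name starts with "attr_" (e.g. attr_12) is accepted
       without being declared individually. -->
  <dynamicField name="attr_*" type="string" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```

With a pattern like this, each product attribute can be indexed under its own field name without touching the schema whenever a new attribute is configured.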

Naive start

First I assumed I could simply put it in data-config.xml as usual.
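A naive attempt of this kind might look roughly like the sketch below. Only the `${attrs.id}` placeholder comes from the description here; the table and column names (`product`, `product_attribute`, `value`) and the `attr_` prefix are hypothetical.

```xml
<!-- Sketch only: hypothetical tables and columns. -->
<document>
  <entity name="product" query="SELECT id, name FROM product">
    <entity name="attrs"
            query="SELECT id, value FROM product_attribute
                   WHERE product_id = '${product.id}'">
      <!-- The hope: one Solr field per attribute row, named after the row's id. -->
      <field name="attr_${attrs.id}" column="value"/>
    </entity>
  </entity>
</document>
```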

But this didn’t give me what I wanted. Solr stored only the first value returned by that query and ignored the rest, and it didn’t substitute ${attrs.id} as I had expected.

I found the following bug report/feature request: SOLR-2039. For now, though, that didn’t really help me. I still needed a solution, because it turned out that the Solr data-config doesn’t support templating in the field node.

Solution

Fortunately, the DataImportHandler has a nice feature called transformers: you can pick from several predefined transformations or, even better, write your own with the script transformer (http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer). By default it expects JavaScript, but according to the documentation you can write it in other languages too: “In case you’re using another language, specify on the script tag with attribute 'language="MyLanguage"' (must be supported by java 6)”.

JavaScript is good enough for me.

My data-config.xml now looks like this:
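The sketch below shows one way such a configuration can be put together with the ScriptTransformer. It is an illustration under assumptions rather than the exact file: the function name `addDynamicAttr`, the tables, the column aliases and the `attr_` prefix are all hypothetical, and the dataSource definition is left out.

```xml
<dataConfig>
  <!-- dataSource definition omitted -->
  <script><![CDATA[
    // Called once per row of the "attrs" entity; "row" is a java.util.Map.
    function addDynamicAttr(row) {
      var id = row.get('attribute_id');
      var value = row.get('attribute_value');
      if (id != null && value != null) {
        // Build the field name at import time, e.g. attr_12.
        row.put('attr_' + id, value);
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="product" query="SELECT id, name FROM product">
      <!-- No <field> nodes here: the transformer adds the attr_* keys itself. -->
      <entity name="attrs" transformer="script:addDynamicAttr"
              query="SELECT id AS attribute_id, value AS attribute_value
                     FROM product_attribute
                     WHERE product_id = '${product.id}'"/>
    </entity>
  </document>
</dataConfig>
```

The sketch relies on the importer adding row keys that match a field in the schema, here presumably an `attr_*` dynamic field, even when they are not declared in data-config.xml. The column aliases are there so that the attribute's own `id` column cannot be mistaken for the product's `id`.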

This does exactly what I wanted in the first place.

Results after re-indexing using the ScriptTransformer:

Side effects

Of course, this affects processing speed: a full index is a bit slower. According to my measurements the penalty is under 4%, and since a full re-index takes only a couple of seconds in my case, that 4% is negligible. On the other hand, you should always do delta indexing anyway, so the full index time shouldn’t really matter.
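For reference, delta indexing in the DataImportHandler is configured on the entity itself. A minimal sketch, assuming a hypothetical `last_modified` column on a hypothetical `product` table:

```xml
<!-- Sketch only: table and column names are assumed. -->
<entity name="product" pk="id"
        query="SELECT id, name FROM product"
        deltaQuery="SELECT id FROM product
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM product
                          WHERE id = '${dataimporter.delta.id}'">
  <!-- child entities and transformers stay the same as for a full import -->
</entity>
```

A delta run is then triggered with `command=delta-import` instead of `command=full-import`, so only the rows changed since the last import are fetched and passed through the transformer.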
