Solr dynamic field with data importer

I’m managing a Solr installation that holds the products of an e-commerce site. Recently a new feature was introduced: every product type can have its own dynamic attributes, configurable at the type level. For example, monitor types have resolution, size, etc., while processor types have clock frequency, socket, L2 cache, etc. The problem arose when the owners wanted to store these attributes in Solr so that they could be searched.

The schema.xml looks like this:

Naive start

At first I assumed I could simply put it in data-config.xml as usual.

But this didn’t give me what I wanted: Solr stored only the first value returned by the query, ignored the rest, and didn’t substitute ${attrs.id} as I had expected.

I found a related bug report/feature request, SOLR-2039, but it didn’t really help me for now. It turned out that data-config.xml doesn’t support templating in the field node, so I needed a different solution.

Solution

Fortunately, the DataImportHandler has a nice feature called transformers: you can apply various predefined transformations or, even better, write your own with the ScriptTransformer (http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer). By default it expects JavaScript, but according to the documentation you can write it in other languages too: “In case you’re using another language, specify on the script tag with attribute 'language="MyLanguage"' (must be supported by java 6)”.

JavaScript is good enough for me.
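A minimal sketch of such a transformer function follows. It is not the author's script: it assumes the attribute query returns one row per attribute with the hypothetical columns attribute_name and attribute_value, and that schema.xml declares a dynamic field pattern such as attr_* (also an assumption). In data-config.xml a function like this goes inside a <script><![CDATA[ ... ]]></script> element and is attached to the attribute entity with transformer="script:addDynamicAttribute" (the function name is illustrative). The function sticks to old-style JavaScript because older engines, such as the Rhino version shipped with Java 6, do not support class syntax or arrow functions; the RowShim class below it is Node-only scaffolding for trying the function outside Solr.

```javascript
// Transformer in the shape DIH's ScriptTransformer expects: it receives the
// current row (a java.util.Map inside Solr) and returns it.
// ASSUMPTIONS (hypothetical names): the SQL query aliases its columns as
// attribute_name / attribute_value, and schema.xml has a dynamicField "attr_*".
function addDynamicAttribute(row) {
  var name = row.get('attribute_name');
  var value = row.get('attribute_value');
  if (name != null && value != null) {
    // e.g. attribute_name "resolution" becomes the column "attr_resolution"
    row.put('attr_' + name, value);
  }
  // Drop the source columns so they are not mapped onto some other field.
  row.remove('attribute_name');
  row.remove('attribute_value');
  return row;
}

// ---- Node-only scaffolding for trying the function outside Solr ----
// Minimal stand-in for the get/put/remove subset of java.util.Map used above.
class RowShim {
  constructor(columns) {
    this.columns = new Map(Object.entries(columns));
  }
  get(key) {
    return this.columns.has(key) ? this.columns.get(key) : null;
  }
  put(key, value) {
    const previous = this.get(key);
    this.columns.set(key, value);
    return previous;
  }
  remove(key) {
    const previous = this.get(key);
    this.columns.delete(key);
    return previous;
  }
}

const row = new RowShim({
  product_id: 42,
  attribute_name: 'resolution',
  attribute_value: '1920x1080',
});
addDynamicAttribute(row);
console.log(row.get('attr_resolution')); // prints 1920x1080
```

Because each attribute row now carries its own column name (attr_resolution, attr_socket, ...), every attribute can land in its own dynamic field on the product document, rather than all rows competing for one statically named field.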

My data-config.xml now looks like this:

This does exactly what I wanted in the first place.
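One detail worth guarding against: the attribute names come from the per-type configuration, and labels such as "L2 cache" or "clock frequency" contain spaces, which make awkward Solr field names (Solr's documentation recommends letters, digits and underscores only, not starting with a digit). A sketch of a hypothetical normalising helper that a transformer could apply before building the field name; the attr_ prefix is an assumed dynamic field pattern:

```javascript
// Hypothetical helper: derive a Solr-friendly field name from a type-level
// attribute label, e.g. "L2 cache" -> "attr_l2_cache".
// A letter-first prefix such as "attr_" also rules out a leading digit.
function toDynamicFieldName(label, prefix) {
  var base = String(label)        // also turns a Java string into a JS string in Rhino
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_')  // every run of other characters -> one "_"
    .replace(/^_+|_+$/g, '');     // strip leading/trailing underscores
  return (prefix || 'attr_') + base;
}

console.log(toDynamicFieldName('L2 cache'));              // attr_l2_cache
console.log(toDynamicFieldName('Clock frequency (MHz)')); // attr_clock_frequency_mhz
```

Two caveats of this particular choice: non-ASCII letters are replaced by underscores as well, and two different labels can collapse to the same field name, so the mapping is worth checking against the configured types.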

Results after re-indexing using the ScriptTransformer:

Side effects

Of course, this affects processing speed: a full import is somewhat slower. According to my measurements the penalty is under 4%, and since in my case a full re-index takes only a couple of seconds, that 4% is negligible. Besides, you should always use delta indexing, so full index time shouldn’t really matter anyway.
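To put such a penalty in perspective: the overhead is the relative difference between two full-import durations, and its absolute cost scales with how long the import takes. A tiny helper for comparing two runs; the timings in it are made up for illustration and are not the measurements behind the 4% figure:

```javascript
// Relative slowdown of one import run compared with a baseline run, in percent.
// Both durations must use the same unit.
function overheadPercent(baseline, withTransformer) {
  if (!(baseline > 0)) {
    throw new RangeError('baseline duration must be positive');
  }
  return ((withTransformer - baseline) / baseline) * 100;
}

// Illustrative timings in milliseconds (hypothetical, not real measurements).
var baselineMs = 2000;
var transformerMs = 2070;
console.log(overheadPercent(baselineMs, transformerMs).toFixed(1) + '%'); // 3.5%
console.log(transformerMs - baselineMs + ' ms extra');                     // 70 ms extra
```

For an import that finishes in a couple of seconds, a few percent amounts to well under a tenth of a second per full run.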
