I’m managing a Solr installation which holds products for an e-commerce site. Lately a new feature was introduced: every product type can have specific dynamic attributes, which are configurable on the Type level. For example, monitor types have resolution, size, etc., while processor types have clock frequency, socket, L2 cache, etc. The problem occurred when the owners wanted to store these in Solr to be able to search on these dynamic fields.
The relevant part of schema.xml looks like this:
<dynamicField name="attr_*" type="string" indexed="true" stored="true"/>
Naive start
At first I assumed I could simply put it in the data-config.xml as usual.
<entity name="attrs" query="SELECT attribute_id as id, raw_value FROM ware_wareattribute WHERE ware_id = ${ware.id}">
    <field name="attr_${attrs.id}" column="raw_value"/>
</entity>
But this didn’t give me what I wanted. Solr stored only the first value it saw from that query, ignored the rest, and didn’t substitute ${attrs.id} as I expected.
<str name="attr_">8</str>
I found the following bug report/feature request: SOLR-2039. But for now this didn’t really help me. As it turned out, Solr’s data-config doesn’t support templating in the field node, so I needed another solution.
Solution
Fortunately, the DataImportHandler has a nice feature called transformers. You can pick from several predefined transformations or, even better, write your own using the script transformer (http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer). By default it expects JavaScript, but you can write it in other languages too. According to the documentation: “In case you’re using another language, specify on the script tag with attribute 'language="MyLanguage"' (must be supported by java 6)”.
JavaScript is good enough for me.
My data-config.xml now looks like this:
<dataConfig>
    <script><![CDATA[
        function WareAttributes(row){
            row.put('attr_' + row.get('id'), row.get('raw_value') );
            row.remove('id');
            row.remove('raw_value');
            return row;
        }
    ]]></script>
    ...
            <entity name="attrs" query="SELECT attribute_id as id, raw_value FROM ware_wareattribute WHERE ware_id = ${ware.id}" transformer="script:WareAttributes"/>
        </entity>
    </document>
</dataConfig>
Which does exactly what I wanted in the first place.
Results after re-indexing using the ScriptTransformer:
<str name="attr_1">80</str>
<str name="attr_2">12</str>
<str name="attr_3">8</str>
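If you want to sanity-check the transformer logic outside Solr, the function can be exercised in plain JavaScript (Node.js here). This is only a rough sketch under a few assumptions: FakeRow is a hypothetical helper I made up to stand in for the Map-like row object the DataImportHandler hands to the script (it only mimics get, put and remove), and the sample rows are assumed to mirror the attribute ids and values from the results above.

```javascript
// Hypothetical stand-in for the Map-like row passed to the script by the
// DataImportHandler. It is NOT part of Solr; it only mimics get/put/remove.
class FakeRow {
  constructor(values) {
    this.values = new Map(Object.entries(values));
  }
  get(key) {
    return this.values.has(key) ? this.values.get(key) : null;
  }
  put(key, value) {
    this.values.set(key, value);
  }
  remove(key) {
    this.values.delete(key);
  }
  toObject() {
    return Object.fromEntries(this.values);
  }
}

// Same logic as the transformer function in data-config.xml.
function WareAttributes(row) {
  // Build the dynamic field name from the attribute id.
  row.put('attr_' + row.get('id'), row.get('raw_value'));
  // Drop the helper columns so they do not end up in the document.
  row.remove('id');
  row.remove('raw_value');
  return row;
}

// Sample rows shaped like the output of the attrs query (assumed values).
const rows = [
  { id: 1, raw_value: '80' },
  { id: 2, raw_value: '12' },
  { id: 3, raw_value: '8' },
];

const transformed = rows.map((r) => WareAttributes(new FakeRow(r)).toObject());
console.log(transformed);
```

Each transformed row should be left with a single attr_<id> key holding the raw value, which is what lets the attr_* dynamic field pick it up. It doesn’t prove anything about how Solr runs the script, it just confirms the renaming logic.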
Side effects
Of course, this impacts processing speed: a full index is a bit slower. According to my measurements the penalty is under 4%. And since in my case a full re-index only takes a couple of seconds, that 4% is negligible. On the other hand, you should always do delta indexing anyway, so full index time shouldn’t really matter.