Solr Multithreaded concurrent atomic updates problem.

Solr has a few limitations for data ingestion, because it does not provide a row-level lock on a document.
I faced this problem while uploading data in bulk to Solr 5 in a multithreaded environment, and I solved it with a client-side lock in SolrJ.
When concurrent threads try to make an atomic update to a multivalued field of the same document at the same time, some threads' changes get overwritten. This happens because the previous thread's update takes some time to get indexed.

Data ingestion scenario:
There are two tables in an RDBMS that I need to denormalize into Solr. The steps I was following for an atomic/partial document update:
1- Fetch the existing document.
2- Update the single-valued fields if required and add/set the new values on the multivalued fields.
3- Send the final document back to Solr.

e.g.

Collection fields:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="address" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
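
The overwrite described earlier can be reproduced with a toy model. The sketch below is not Solr or SolrJ code: `LostUpdateDemo` is a hypothetical in-memory stand-in in which a write becomes readable only after `commit()`, an assumption made here to mimic the indexing delay. Each worker follows the three steps (fetch, modify, write back) on the `id` and `address` fields; the city names are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a collection: writes become readable only after commit(). */
class LostUpdateDemo {
    // "address" values per document id that readers can already see.
    private final Map<String, List<String>> visible = new HashMap<>();
    // Values that were written but are not readable yet.
    private final Map<String, List<String>> pending = new HashMap<>();

    /** Step 1: fetch the existing document (committed state only). */
    List<String> fetchAddresses(String id) {
        return new ArrayList<>(visible.getOrDefault(id, Collections.emptyList()));
    }

    /** Step 3: write the whole document back, replacing any pending version. */
    void update(String id, List<String> addresses) {
        pending.put(id, new ArrayList<>(addresses));
    }

    /** Makes all pending writes readable. */
    void commit() {
        visible.putAll(pending);
        pending.clear();
    }

    /** Steps 1 to 3 as performed by a single worker. */
    void addAddress(String id, String address) {
        List<String> current = fetchAddresses(id);
        current.add(address); // step 2: add the new value to the multivalued field
        update(id, current);
    }

    /** Worker B reads before worker A's write is readable, so A's value is lost. */
    static List<String> runWithoutWaiting() {
        LostUpdateDemo index = new LostUpdateDemo();
        index.addAddress("1", "Delhi");  // worker A
        index.addAddress("1", "Mumbai"); // worker B, still sees the old document
        index.commit();
        return index.fetchAddresses("1");
    }

    /** Worker B starts only after worker A's write is readable. */
    static List<String> runWaitingForIndex() {
        LostUpdateDemo index = new LostUpdateDemo();
        index.addAddress("1", "Delhi");
        index.commit();
        index.addAddress("1", "Mumbai");
        index.commit();
        return index.fetchAddresses("1");
    }

    public static void main(String[] args) {
        System.out.println(runWithoutWaiting());  // prints [Mumbai]
        System.out.println(runWaitingForIndex()); // prints [Delhi, Mumbai]
    }
}
```

In this model the second worker's write silently replaces the first one unless it waits, which is the behaviour the client-side lock is meant to prevent.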

img1

img2

I followed the steps below to create a client-side lock that resolves this problem:

  • Store the most recent updates in a least-recently-used set (LRUSet).
  • Set a limit on the maximum number of elements in the LRUSet.
  • If a new update's document is present in the LRUSet, check whether that document has been indexed successfully.
  • If the document is indexed, send the atomic update (set/add) to Solr; otherwise make the current thread wait until the document is indexed in Solr successfully.
  • Add or replace the entry in the LRUSet.
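
The steps above could be sketched roughly as follows. This is a minimal illustration under assumptions, not the original implementation: `LruUpdateGate`, `isIndexed`, `pollMillis` and `sendUpdate` are hypothetical names, the "is it indexed yet" check is left to the caller (for example, a query by document id), and the actual SolrJ request is passed in as a `Runnable`. The LRUSet is modelled with an access-ordered `LinkedHashMap` from the Java standard library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Sketch of a client-side gate that delays an update until the previous one is indexed. */
class LruUpdateGate {
    private final Predicate<String> isIndexed; // hypothetical check supplied by the caller
    private final long pollMillis;             // how long to wait between checks
    private final Map<String, Boolean> recentIds; // access-ordered map used as an LRU set

    LruUpdateGate(final int maxEntries, long pollMillis, Predicate<String> isIndexed) {
        if (pollMillis <= 0) {
            throw new IllegalArgumentException("pollMillis must be positive");
        }
        this.pollMillis = pollMillis;
        this.isIndexed = isIndexed;
        this.recentIds = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries; // drop the least recently used id
            }
        };
    }

    /** Runs sendUpdate once any earlier tracked update for this id is indexed. */
    synchronized void update(String id, Runnable sendUpdate) {
        while (recentIds.containsKey(id) && !isIndexed.test(id)) {
            try {
                wait(pollMillis); // releases the monitor while this thread waits
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for " + id, e);
            }
        }
        sendUpdate.run();                // e.g. the atomic set/add request
        recentIds.put(id, Boolean.TRUE); // add the id or refresh its position
    }

    synchronized boolean isTracked(String id) {
        return recentIds.containsKey(id);
    }

    public static void main(String[] args) {
        AtomicInteger checks = new AtomicInteger();
        // Pretend the document becomes searchable on the third check.
        LruUpdateGate gate = new LruUpdateGate(1000, 10, id -> checks.incrementAndGet() >= 3);
        gate.update("1", () -> System.out.println("first update sent"));
        gate.update("1", () -> System.out.println("second update sent after " + checks.get() + " checks"));
    }
}
```

The sketch sends each request while holding the gate's monitor, which keeps it simple but serializes the sends; a per-document lock would allow more parallelism. An id that is evicted from the set before its update is indexed is no longer checked, so the size limit needs to be large enough for the expected indexing delay.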

Please post a comment if you have any questions.
