
Reindex Subset Data in Elasticsearch

The Elasticsearch Reindex API is a powerful way to copy a subset of existing data into a new index. For a long-term statistics solution, you can aggregate data and store the aggregated values instead of the atomic details. In my company we have an index that contains approximately 150 fields in each document. For a long-term solution only 30 of them are relevant. The Reindex API can fetch just the 30 desired fields and store them in a new index.

The reindex template

curl -XPOST "http://elasticsearch:9200/_reindex" -H 'Content-Type: application/json' -d'{
  "source": {
    "index": "source-index-2017.07.26",
    "_source": [
      "field_1",
      "field_2",
      ..
      "field_30"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "target-index-2017.07.26"
  }
}'

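The general approach is to use source filtering for the reindex action: the _source array in the source section lists the fields to copy, and all other fields are left out of the target index.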
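If the field list lives in code rather than in a hand-edited template, the same request can be assembled programmatically. The following is a minimal Python sketch using only the standard library, not part of the original setup. The helper names build_reindex_body and build_reindex_request are hypothetical, and the three field names are placeholders standing in for the real list. The request is only prepared, not sent.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index, fields, query=None):
    """Return a _reindex request body that copies only the given fields."""
    return {
        "source": {
            "index": source_index,
            # Source filtering: only these fields are read from the source index.
            "_source": list(fields),
            "query": query if query is not None else {"match_all": {}},
        },
        "dest": {"index": dest_index},
    }


def build_reindex_request(base_url, body):
    """Prepare (but do not send) the POST request to the _reindex endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_reindex_body(
    "source-index-2017.07.26",
    "target-index-2017.07.26",
    ["field_1", "field_2", "field_30"],  # placeholder field names
)
request = build_reindex_request("http://elasticsearch:9200", body)

print(request.full_url)
print(json.dumps(body, indent=2))
# To run it against a live cluster: urllib.request.urlopen(request)
```

Keeping the body as a plain dictionary makes it easy to swap match_all for a narrower query, for example a date range, without touching the rest of the request.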


Tan-Vinh Nguyen

Just a coder
