Jay Taylor's notes

Elasticsearch - Search made easy for (web) developers

Original source (spinscale.github.com)
Tags: elasticsearch slides spinscale.github.com
Clipped on: 2012-11-01

JSON as query language

http://host:9200/products/product/_search
{ "query" : { "term" : { "name": "MacBook Air" }}}
{ "query" : { "prefix" : { "name": "Mac" }}}
{ "query" : { "range" : { "price" : { "from" : 1000, "to": 2000 } } } }
{ "from": 0, "size": 10, "query" : { "term" : { "name": "MacBook Air" }}}
{ "sort" : { "name" :  { "order": "asc" } }, "query" : { "term" : { "name": "MacBook Air" }}}
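The request bodies above are plain JSON, so a client can assemble them as strings before sending them to the _search URL. The following is a minimal sketch under that assumption; the class SearchBodies and its methods are hypothetical helpers written for illustration, not part of any Elasticsearch API, and the sketch only builds the body without sending it.

```java
// Hypothetical helper, not an Elasticsearch API: builds the JSON request
// bodies shown on the slide as plain strings.
class SearchBodies {

    // Wrap a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // { "term" : { field : value } }
    static String term(String field, String value) {
        return "{ \"term\" : { " + quote(field) + " : " + quote(value) + " } }";
    }

    // { "prefix" : { field : value } }
    static String prefix(String field, String value) {
        return "{ \"prefix\" : { " + quote(field) + " : " + quote(value) + " } }";
    }

    // { "range" : { field : { "from" : from, "to" : to } } }
    static String range(String field, long from, long to) {
        return "{ \"range\" : { " + quote(field)
            + " : { \"from\" : " + from + ", \"to\" : " + to + " } } }";
    }

    // Wrap a query clause together with the paging parameters from and size.
    static String search(int from, int size, String queryClause) {
        return "{ \"from\" : " + from + ", \"size\" : " + size
            + ", \"query\" : " + queryClause + " }";
    }

    public static void main(String[] args) {
        // Prints the paged term query from the slide.
        System.out.println(search(0, 10, term("name", "MacBook Air")));
        System.out.println(search(0, 10, range("price", 1000, 2000)));
    }
}
```

In a real application a JSON library or the query builders of the Java API would likely be a safer choice than string concatenation; the sketch only shows how the pieces of a request body nest.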

JSON as query language - Response

http://host:9200/products/product/_search

{ "took":206,"timed_out":false,
"_shards":{"total":1,"successful":1,"failed":0},
"hits":{ "total":1,"max_score":2.098612,
  "hits":[ {
    "_index":"products1","_type":"product","_id":"1",
    "_score":2.098612, "_source" : {
      "id": "1",
      "name" : "MacBook Air",
      "price": 1099,
      "descr" : "Some lengthy never-read description", 
      "attributes" : {
        "color" : "silver",
        "display" : 13.3,
        "ram" : 4
      }
}}]}}

Configuration - Mapping

  • On indexing, the JSON document is parsed and the data type of each field is detected
  • Fields are mapped to data types automatically the first time they are indexed
  • Mappings can be configured on a per-type basis
  • String fields can each have their own analyzer
  • Sample types: float, long, boolean, date (+formatting), object
  • One field can be indexed as multiple sub-fields, each analyzed differently (lowercase, query)
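To make the idea of automatic mapping concrete, here is a simplified illustration of inferring a type name for each field of a parsed document. It is a sketch, not Elasticsearch's actual implementation: the class DynamicMappingSketch is hypothetical, the choice of "long" for whole numbers and "double" for floating-point numbers is an assumption, and date detection is left out entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of type inference on first indexing.
// Not Elasticsearch code; the type names chosen are assumptions.
class DynamicMappingSketch {

    // Pick a mapping type name from the Java type of an already parsed value.
    static String inferType(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Float || value instanceof Double) return "double";
        if (value instanceof Map) return "object";
        return "string"; // everything else is treated as text in this sketch
    }

    // Build a field name -> type name table for the top-level fields of a document.
    static Map<String, String> inferMapping(Map<String, Object> doc) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            mapping.put(e.getKey(), inferType(e.getValue()));
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("color", "silver");
        attributes.put("display", 13.3);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "MacBook Air");
        doc.put("price", 1099);
        doc.put("attributes", attributes);

        System.out.println(inferMapping(doc));
    }
}
```

An explicit mapping replaces this kind of guessing wherever the default is not what the application needs, for example to keep an identifier from being analyzed.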

Sample mapping


{
    "product": {
        "properties": {
            "ProductId":            { "type": "string", "index": "not_analyzed" },

            "ProductEnabled":       { "type": "boolean" },
            "PiecesIncluded":       { "type": "long" },
            "LastModified":         { "type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS" },

            "AvailableInventory":   { "type": "float" },
            "Price":                { "type": "float" },

            "LongDescription":      { "type": "string", "include_in_all" : true },
            
            "ProductName" : {
                "type" : "multi_field",
                "include_in_all" : true,
                "fields" : {
                    "ProductName":  { "type": "string", "index": "not_analyzed" },
                    "lowercase":    { "type": "string", "analyzer": "lowercase_analyzer" },
                    "suggest" :     { "type": "string", "analyzer": "suggest_analyzer" }
                }
            }
        }
    }
}

Configuration - Analyzers

  • An analyzer consists of one tokenizer and an arbitrary number of filters
  • Example:
    suggest_analyzer:
      type: custom
      tokenizer: whitespace
      filter: [standard, lowercase, shingle]
  • Stripping HTML code:
    char_filter: html_strip
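The effect of the suggest_analyzer chain can be approximated in plain Java. This is a rough sketch rather than the Lucene implementation: the class SuggestAnalyzerSketch is hypothetical, the standard filter is omitted, and the shingle step assumes two-word shingles emitted alongside the single tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical approximation of: tokenizer whitespace, filters lowercase + shingle.
// Assumes two-word shingles with the single tokens kept; the standard filter is omitted.
class SuggestAnalyzerSketch {

    // Tokenizer: split the input on runs of whitespace.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Filter: lowercase every token.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Filter: emit each token, followed by the two-word shingle that starts at it.
    static List<String> shingle(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + " " + tokens.get(i + 1));
            }
        }
        return out;
    }

    // The tokenizer runs first, then the filters in the configured order.
    static List<String> analyze(String text) {
        return shingle(lowercase(whitespaceTokenize(text)));
    }

    public static void main(String[] args) {
        // Prints [macbook, macbook air, air, air 13, 13]
        System.out.println(analyze("MacBook Air 13"));
    }
}
```

Indexing word pairs next to single words is what lets a suggest field match multi-word phrases such as "macbook air" as a single term.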

Java API - Creating a client

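import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// clusterName, host and port are supplied by the application
Settings settings = ImmutableSettings.settingsBuilder().
    put("cluster.name", clusterName).build();

InetSocketTransportAddress addr =
    new InetSocketTransportAddress(host, port);

Client client = new TransportClient(settings).
    addTransportAddress(addr);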

Starting an embedded server

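import java.io.File;
import org.apache.commons.io.FileUtils;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

File configFile = new File("elasticsearch-local.yml");
String config = FileUtils.readFileToString(configFile);

ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder().
    loadFromSource(config);

// node() builds the node and starts it
Node node = NodeBuilder.nodeBuilder().
    settings(settingsBuilder).node();

Client client = node.client();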