ElasticSearch DSL

2018/05/25 Tool

Introduction

ElasticSearch uses its own JSON-based query language for all kinds of queries. Send a POST request with the JSON in the request body:

curl -XPOST 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} }
}'
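The same request can be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper name build_search_request is hypothetical, and the host and index are simply the ones from the curl command above. It builds the request without sending it, since sending requires a reachable cluster.

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Build (but do not send) a POST request for the _search endpoint."""
    url = f"{base_url}/{index}/_search?pretty"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200", "bank", {"query": {"match_all": {}}}
)

# Sending needs a running cluster, so the call is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```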

Response:

{
  "took" : 26,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "6",
      "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
    }, {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "13",

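took: the time the query took, in milliseconds.

timed_out: whether the query timed out.

_shards: information about the shards that were searched: how many were queried, how many succeeded, and how many failed.

hits: the search results. total is the number of all matching documents; the inner hits array holds the documents actually returned (10 by default).

_score: the document's relevance score, which determines its ranking, much like the ordering of results in any web search engine.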
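As a rough illustration of how these fields might be read in client code, here is a small sketch. The function name summarize_search_response is hypothetical, and the sample dictionary is an abbreviated copy of the response shown above. It assumes hits.total is a plain number, as in that response; newer Elasticsearch versions return an object there instead.

```python
def summarize_search_response(response):
    """Pull the fields described above out of a parsed search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total_matches": hits["total"],   # all matching documents
        "returned": len(hits["hits"]),    # documents actually in this page
        "top_score": hits["max_score"],
    }


# Abbreviated copy of the response shown above (one hit kept).
sample = {
    "took": 26,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": 1.0,
        "hits": [
            {"_index": "bank", "_type": "account", "_id": "1", "_score": 1.0,
             "_source": {"account_number": 1, "firstname": "Amber"}},
        ],
    },
}

summary = summarize_search_response(sample)
```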

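Simple Queries

Term Query

Searches for the keyword exactly as given, without analyzing (tokenizing) it; the document must contain the entire search term: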

{
  "from": 0,
  "size": 10,
  "query": {
    "term": {
      "pkg": "com.youku.phone"
    }
  },
  "sort": {
    "date": {
      "order": "desc"
    }
  }
}

Terms Query

Searches for several keywords at once; a document matches if it contains any of them:

{ 
  "from": 0,
  "size": 10,
  "query": {
    "terms": {
      "tags": ["novel","book"]
    }
  }
}

Match Query

The query text is analyzed (tokenized) first, and the results matching each resulting term are combined:

{
  "from": 0,
  "size": 10,
  "query": {
    "match":{
      "content" : {
        "query" : "我的宝马多少马力"
      }
    }
  }
}

Match Phrase Query

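Goes a step further than Match Query: only documents in which all of the terms match, in the same order and next to each other, are returned: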

{
  "from": 0,
  "size": 10,
  "query": {
    "match_phrase": {
      "content" : {
        "query" : "我的宝马多少马力"
      }
    }
  }
}
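The difference between the two queries can be sketched with a toy model in plain Python. This is not how Elasticsearch is implemented: it assumes a simple lowercase whitespace tokenizer in place of a real analyzer, and the function names are hypothetical. match succeeds when any query term occurs in the document, while match_phrase requires all terms to appear consecutively and in order.

```python
def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def match(doc, query):
    """True if the document contains at least one of the query terms."""
    doc_terms = set(tokenize(doc))
    return any(term in doc_terms for term in tokenize(query))


def match_phrase(doc, query):
    """True if all query terms occur consecutively, in order, in the document."""
    doc_terms = tokenize(doc)
    query_terms = tokenize(query)
    if not query_terms:
        return False
    width = len(query_terms)
    return any(
        doc_terms[i:i + width] == query_terms
        for i in range(len(doc_terms) - width + 1)
    )


doc = "The quick brown fox"
print(match(doc, "brown dog"))         # one shared term is enough
print(match_phrase(doc, "brown fox"))  # adjacent and in order
print(match_phrase(doc, "fox brown"))  # wrong order
```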

Multi Match Query

Matches the query against several fields:

{
  "query": {
    "multi_match": {
        "query" : "我的宝马多少马力",
        "fields" : ["title", "content"]
    }
  }
}

Wildcard Query

Supports the wildcards * (any sequence of characters, including none) and ? (exactly one character):

{ 
    "query": {
        "wildcard": {
             "title": "cr?me"
        }
    }
}
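To make the wildcard semantics concrete, the sketch below translates a wildcard pattern into a Python regular expression and tests whole terms against it. It only models the meaning of * and ?; the function names are hypothetical and this is not how Elasticsearch evaluates the query internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into regex equivalents, escaping everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(term, pattern):
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("creme", "cr?me"))  # ? stands for the single "e"
print(wildcard_match("crme", "cr?me"))   # ? must consume one character
print(wildcard_match("cream", "cr*"))    # * covers "eam"
```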

Range Query

{ 
    "query": {
        "range": {
             "year": {
                  "gte" :1890,
                  "lte":1900
              }
        }
    }
}

Regex Query

{ 
    "query": {
        "regexp": {
             "title": {
                  "value" :"cr.m[ae]",
                  "boost":10.0
              }
        }
    }
}
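One detail worth illustrating is that the pattern has to match the entire term, not just a part of it. The sketch below mimics that with Python's re.fullmatch. Python's regex syntax is not identical to the one Elasticsearch uses, but the pattern from the example above relies only on constructs the two share; the function name regexp_match is hypothetical.

```python
import re

# The pattern from the query above: "cr", any one character, "m", then "a" or "e".
PATTERN = re.compile(r"cr.m[ae]")


def regexp_match(term):
    """True only if the whole term matches the pattern (anchored match)."""
    return PATTERN.fullmatch(term) is not None


print(regexp_match("crema"))   # matches
print(regexp_match("creme"))   # matches
print(regexp_match("cremes"))  # extra trailing character, so no match
```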

Query String

Assembles the query into a single string and passes it to ES:

{
  "query" : {
    "query_string" : {
      "query" : "\"com.tencent.mm\"",
      "fields" : [ "pkg" ]
    }
  }
}

Exists Query

When the field is not inside a Nested Object:

 {
  "from" : 0,
  "size" : 10,
  "query" : {
    "exists":{
        "field":"age"
    }
  }
}

When the field is inside a Nested Object:

 {
  "from" : 0,
  "size" : 10,
  "query" : {
      "nested":{
          "path":"apps",
          "query":{
            "exists":{
                "field":"apps.scene_class"
            }
          }
      }
  }
}

Bool Query

A Bool Query combines multiple query conditions. The supported clauses are must, must_not, should, and filter:

{
  "query" : {
    "bool" : {
      "must" : [ {
        "query_string" : {
          "query" : "\"com.tencent.mm\"",
          "fields" : [ "pkg" ]
        }
      }, {
        "query_string" : {
          "query" : "\"com.xiaomi.channel\"",
          "fields" : [ "pkg" ]
        }
      } ]
    }
  }
}
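When such queries are assembled in code, a small helper can keep the clause lists tidy. The following is a sketch only: the helper bool_query is hypothetical, and the example swaps the query_string clauses above for term clauses on the same pkg field to show must and must_not side by side.

```python
def bool_query(must=(), must_not=(), should=(), filter=()):
    """Build a bool query body, leaving out clause types that are empty."""
    clauses = {
        "must": list(must),
        "must_not": list(must_not),
        "should": list(should),
        "filter": list(filter),
    }
    return {"bool": {name: items for name, items in clauses.items() if items}}


# Documents that have com.tencent.mm but not com.xiaomi.channel.
body = {
    "query": bool_query(
        must=[{"term": {"pkg": "com.tencent.mm"}}],
        must_not=[{"term": {"pkg": "com.xiaomi.channel"}}],
    )
}
```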

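Aggregations

Aggregations come in two kinds, metric aggregations and Bucketing aggregations, and the two can be nested inside each other. The request format is:

"aggregations" : {                  // the aggregations block; "aggs" can be used instead
    "<aggregation_name>" : {        // aggregation name: any string; it becomes the key in the response, which makes the result easy to locate
        "<aggregation_type>" : {    // aggregation type, such as min
            "<aggregation_body>":{} // aggregation body; it differs from one aggregation type to another
        }
        [,"aggregations" : {  }]    // nested sub-aggregations, zero or more
    }
    [,"<aggregation_name_2>" : {  } ] // further aggregations, zero or more
}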

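The response has the following format:

{
  "aggregations": {                    // aggregation results
    "<aggregation_name>": {            // the aggregation name given in the request
       "<aggregation_body>":{}         // the aggregated data; its format varies by aggregation type
    }
  }
}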

Aggregations can also be nested under a filter. In the following, the inner aggregation runs only over the documents that pass the filter:

{
    "aggregations" : {
        "<aggregation_name>" : {
            "filter" : "<query_body>",
            "aggregations" : {
                "<aggregation_name>" : "<aggregation_body>"
            }
        }
    }
}
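A concrete instance of this template might look like the sketch below. The helper aggregation and the aggregation names are hypothetical; the age and balance fields are borrowed from the bank sample document shown earlier, and the filter condition is chosen purely for illustration.

```python
def aggregation(agg_type, body, sub_aggs=None):
    """Build one aggregation node, optionally with nested sub-aggregations."""
    node = {agg_type: body}
    if sub_aggs:
        node["aggs"] = sub_aggs
    return node


# Average balance, computed only over accounts whose owner is 30 or older.
request_body = {
    "size": 0,  # skip the hits; only the aggregation result is wanted
    "aggs": {
        "age_30_plus": aggregation(
            "filter",
            {"range": {"age": {"gte": 30}}},
            sub_aggs={"avg_balance": aggregation("avg", {"field": "balance"})},
        )
    },
}
```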

A filter can also be used on its own, without a sub-aggregation:

{
  "from" : 0,
  "size" : 0,
  "query" : {
    "bool" : { }
  },
  "aggregations" : {
    "生活消费" : {
      "filter" : {
        "nested" : {
          "path" : "apps",
          "query" : {
            "term" : {
              "apps.scene_class" : "生活消费"
            }
          }
        }
      }
    }
  }
}

metric

metric aggregations compute values over numeric fields.

Min Aggregation

Returns the minimum value.

{
  "query": {},
  "aggs": {
    "min_age": {
      "min": {
        "field": "age"
      }
    }
  }
}

Max Aggregation

Returns the maximum value.

{
  "query": {}, 
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}

Sum Aggregation

Returns the sum.

{
  "query": {}, 
  "aggs": {
    "sum_age": {
      "sum": {
        "field": "age"
      }
    }
  }
}

Avg Aggregation

Returns the average.

{
  "query": {}, 
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age"
      }
    }
  }
}

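NOTE: Documents that lack the field are ignored in the calculation: their value is not added to the numerator and they are not counted in the denominator, unless a default value is set with "avg":{"field":"age","missing":10}.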
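The effect of missing on the average can be checked with a few lines of arithmetic. The sketch below models the behavior described in the note in plain Python; the function avg and the three sample documents are made up for illustration.

```python
def avg(docs, field, missing=None):
    """Average a field; documents without it are skipped unless `missing` is set."""
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)  # counted with the default value
    return sum(values) / len(values) if values else None


docs = [{"age": 20}, {"age": 40}, {"name": "no age here"}]

print(avg(docs, "age"))              # (20 + 40) / 2
print(avg(docs, "age", missing=10))  # (20 + 40 + 10) / 3
```

Without missing, the third document affects neither the sum nor the count; with missing set to 10 it contributes 10 to the sum and 1 to the count.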

Bucketing

Basic grouping

{
  "query": {},
  "aggs": {
    "brand": {
      "terms": {
        "field": "brand",
        "size": 50
      }
    }
  }
}
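What a terms grouping returns can be pictured with a toy model: count the documents per distinct field value and keep the largest groups. The sketch below does this in plain Python; the function terms_buckets and the sample brands are hypothetical, and it ignores details of the real aggregation such as per-shard approximation.

```python
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Group documents by field value and return the `size` largest groups."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


docs = [
    {"brand": "a"}, {"brand": "b"}, {"brand": "a"},
    {"brand": "c"}, {"brand": "a"}, {"brand": "b"},
]

print(terms_buckets(docs, "brand", size=2))
```

Each bucket carries the grouped value as key and the number of documents as doc_count, and size caps how many buckets come back.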

Nested Aggregation

{
  "from": 0,
  "size": 10,
  "aggregations": {
    "min_age": {
      "nested": {
        "path": "apps"
      },
      "aggregations": {
        "sum_age": {
          "terms": {
            "field": "apps.times"
          }
        }
      }
    }
  }
}
