1

I can't get a count of fields with a filtered document value.

I have this json

`` {

"took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "net",
        "_type" : "_doc",
        "_id" : "RTHRTH",
        "_score" : 1.0,
        "_source" : {
          "created_at" : "2020-05-31 19:01:01",
          "data" : [...]
      {
        "_index" : "net",
        "_type" : "_doc",
        "_id" : "LLLoIJBHHM",
        "_score" : 1.0,
        "_source" : {
          "created_at" : "2020-06-23 15:11:59",
          "data" : [...]
        }
      }
    ]
  }
}
``

In the "data" field, there are more fields within other fields respectively.

I want to filter the most recent document, and then count a certain value in the most recent document. This is my query:

`{
  "query": {
    "match": {
      "name.keyword": "net"
    }
  },
  "sort": [
    {
      "created_at.keyword": {
        "order": "desc"
      }
    }
  ],
  "size": 1,
  "aggs": {
    "CountValue": {
      "terms": {
        "field": "data.add.serv.desc.keyword",
        "include": "nginx"
      }
    }
  }
}`

And the output is:

`{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "CountValue" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "nginx",
          "doc_count" : 2
        }
      ]
    }
  }`

I suspect that doc_count is the number of documents the value appears in, not the number of times the value is repeated within the filtered document. Any advice I will be very grateful!

1 Answer 1

0

Unless any of the fields under the path data.add.serv are of the nested type, the terms agg will produce per-whole-doc results, not per-field.

Exempli gratia:

POST example/_doc
{
  "serv": [
    {
      "desc": "nginx"
    },
    {
      "desc": "nginx"
    },
    {
      "desc": "nginx"
    }
  ]
}

then 


GET example/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "serv.desc.keyword"
      }
    }
  }
}

produces doc_count==1.

When, however, specified as nested:

DELETE example

PUT example
{
  "mappings": {
    "properties": {
      "serv": {
        "type": "nested"
      }
    }
  }
}

POST example/_doc
{"serv":[{"desc":"nginx"},{"desc":"nginx"},{"desc":"nginx"}]}

then

GET example/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "nested": {
        "path": "serv"
      },
      "aggs": {
        "NAME": {
          "terms": {
            "field": "serv.desc.keyword"
          }
        }
      }
    }
  }
}

we end up with doc_count==3.

This has to do with the way non-nested array types are flattened and de-duplicated. At the end, you may need to reindex your collections after having applied the nested mapping.


EDIT

In order to only take the latest doc, you could do the following:

PUT example
{
  "mappings": {
    "properties": {
      "serv": {
        "type": "nested"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

then

POST example/_doc
{
  "created_at" : "2020-05-31 19:01:01",
  "serv": [
    {
      "desc": "nginx"
    },
    {
      "desc": "nginx"
    },
    {
      "desc": "nginx"
    }
  ]
}

POST example/_doc
{
  "created_at" : "2020-06-23 15:11:59",
  "serv": [
    {
      "desc": "nginx"
    },
    {
      "desc": "nginx"
    }
  ]
}

then use a terms agg of size 1, sorted by timestamp desc:

GET example/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "created_at",
        "order": {
          "_term": "desc"
        },
        "size": 1
      },
      "aggs": {
        "NAME2": {
          "nested": {
            "path": "serv"
          },
          "aggs": {
            "NAME": {
              "terms": {
                "field": "serv.desc.keyword"
              }
            }
          }
        }
      }
    }
  }
}
2
  • Thank you, That was the answer to the "count" problem. I need to filter by the document with the most recent date If you know the way without applying a range between two values, it would be very useful to me. Anyway thank yoy so much Joe Jul 24, 2020 at 12:35
  • Hello Joe, is it possible to get aggregation of true values ​​in boolean field? stackoverflow.com/questions/63093724/… Jul 25, 2020 at 21:27

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.