Merge branch 'main' into disableServerlessMLCache
elasticmachine committed May 17, 2024
2 parents 8fa8129 + befb6ff commit d0c88e4
Showing 54 changed files with 1,540 additions and 444 deletions.
6 changes: 6 additions & 0 deletions docs/changelog/108679.yaml
@@ -0,0 +1,6 @@
pr: 108679
summary: Suppress deprecation warnings from ingest pipelines when deleting trained model
area: Machine Learning
type: bug
issues:
- 105004
6 changes: 6 additions & 0 deletions docs/changelog/108780.yaml
@@ -0,0 +1,6 @@
pr: 108780
summary: Add `continent_code` support to the geoip processor
area: Ingest Node
type: enhancement
issues:
- 85820
5 changes: 5 additions & 0 deletions docs/changelog/108786.yaml
@@ -0,0 +1,5 @@
pr: 108786
summary: Make ingest byte stat names more descriptive
area: Ingest Node
type: enhancement
issues: []
4 changes: 2 additions & 2 deletions docs/reference/cluster/cluster-info.asciidoc
@@ -207,14 +207,14 @@ pipeline.
(integer)
Total number of failed operations for the ingest pipeline.
`ingested_in_bytes`::
`ingested_as_first_pipeline_in_bytes`::
(Optional, integer)
Total number of bytes of all documents ingested by the pipeline.
This field is only present on pipelines which are the first to process a document.
Thus, it is not present on pipelines which only serve as a final pipeline after a default pipeline, a pipeline run after
a reroute processor, or pipelines in pipeline processors.
`produced_in_bytes`::
`produced_as_first_pipeline_in_bytes`::
(Optional, integer)
Total number of bytes of all documents produced by the pipeline.
This field is only present on pipelines which are the first to process a document.
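
These stats are reported by the cluster info API. As a rough sketch (the pipeline name and byte counts are illustrative, and most other per-pipeline stats are omitted), a request and the relevant slice of the response might look like:

[source,console]
----
GET /_info/ingest
----

[source,console-result]
----
{
  "ingest": {
    "pipelines": {
      "my-pipeline": {
        "failed": 0,
        "ingested_as_first_pipeline_in_bytes": 1248,
        "produced_as_first_pipeline_in_bytes": 1302
      }
    }
  }
}
----
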
4 changes: 2 additions & 2 deletions docs/reference/cluster/nodes-stats.asciidoc
@@ -2643,14 +2643,14 @@ pipeline.
(integer)
Total number of failed operations for the ingest pipeline.
`ingested_in_bytes`::
`ingested_as_first_pipeline_in_bytes`::
(Optional, integer)
Total number of bytes of all documents ingested by the pipeline.
This field is only present on pipelines which are the first to process a document.
Thus, it is not present on pipelines which only serve as a final pipeline after a default pipeline, a pipeline run after
a reroute processor, or pipelines in pipeline processors.
`produced_in_bytes`::
`produced_as_first_pipeline_in_bytes`::
(Optional, integer)
Total number of bytes of all documents produced by the pipeline.
This field is only present on pipelines which are the first to process a document.
17 changes: 12 additions & 5 deletions docs/reference/ilm/actions/ilm-rollover.asciidoc
@@ -7,6 +7,13 @@ Phases allowed: hot.
Rolls over a target to a new index when the existing index satisfies
the specified rollover conditions.

[NOTE]
====
When an index is rolled over, the previous index's age is updated to reflect the rollover time.
This date, rather than the index's `creation_date`, is used in {ilm}
`min_age` phase calculations. <<min-age-calculation,Learn more>>.
====

IMPORTANT: If the rollover action is used on a <<ccr-put-follow,follower index>>,
policy execution waits until the leader index rolls over (or is
<<skipping-rollover, otherwise marked complete>>),
@@ -46,11 +53,11 @@ PUT my-index-000001
[[ilm-rollover-options]]
==== Options

A rollover action must specify at least one max_* condition, it may include zero
or more min_* conditions. An empty rollover action is invalid.
A rollover action must specify at least one `max_*` condition; it may include zero
or more `min_*` conditions. An empty rollover action is invalid.

The index will rollover once any max_* condition is satisfied and all
min_* conditions are satisfied. Note, however, that empty indices are not rolled
The index will roll over once any `max_*` condition is satisfied and all
`min_*` conditions are satisfied. Note, however, that empty indices are not rolled
over by default.

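As a minimal sketch (values are illustrative), a valid rollover action that combines one `max_*` condition with an optional `min_*` condition might look like:

[source,console]
----
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "min_docs": 1
          }
        }
      }
    }
  }
}
----
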
// tag::rollover-conditions[]
@@ -256,7 +263,7 @@ PUT _ilm/policy/my_policy
===== Roll over using multiple conditions

When you specify multiple rollover conditions,
the index is rolled over when _any_ of the max_* and _all_ of the min_* conditions are met.
the index is rolled over when _any_ of the `max_*` and _all_ of the `min_*` conditions are met.
This example rolls the index over if it is at least 7 days old or at least 100 gigabytes,
but only as long as the index contains at least 1000 documents.

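A sketch of such a policy, using illustrative values that match the description above, might be:

[source,console]
----
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "100gb",
            "min_docs": 1000
          }
        }
      }
    }
  }
}
----
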
3 changes: 2 additions & 1 deletion docs/reference/ilm/error-handling.asciidoc
@@ -154,11 +154,12 @@ You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the
=== Common {ilm-init} setting issues

[discrete]
[[min-age-calculation]]
==== How `min_age` is calculated

When setting up an <<set-up-lifecycle-policy,{ilm-init} policy>> or <<getting-started-index-lifecycle-management,automating rollover with {ilm-init}>>, be aware that `min_age` can be relative to either the rollover time or the index creation time.

If you use <<ilm-rollover,{ilm-init} rollover>>, `min_age` is calculated relative to the time the index was rolled over. This is because the <<indices-rollover-index,rollover API>> generates a new index. The `creation_date` of the new index (retrievable via <<indices-get-settings>>) is used in the calculation. If you do not use rollover in the {ilm-init} policy, `min_age` is calculated relative to the `creation_date` of the original index.
If you use <<ilm-rollover,{ilm-init} rollover>>, `min_age` is calculated relative to the time the index was rolled over. This is because the <<indices-rollover-index,rollover API>> generates a new index and updates the `age` of the previous index to reflect the rollover time. If the index hasn't been rolled over, then the `age` is the same as the `creation_date` for the index.

You can override how `min_age` is calculated using the `index.lifecycle.origination_date` and `index.lifecycle.parse_origination_date` <<ilm-settings,{ilm-init} settings>>.

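For example, assuming the setting accepts an epoch-millisecond timestamp, a sketch of overriding the origination date on an existing index (the value is illustrative) might be:

[source,console]
----
PUT my-index-000001/_settings
{
  "index.lifecycle.origination_date": 1714521600000
}
----
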
6 changes: 6 additions & 0 deletions docs/reference/ilm/ilm-index-lifecycle.asciidoc
@@ -43,6 +43,12 @@ a "cold" phase with a minimum age either unset, or >= 10 days.
The minimum age defaults to zero, which causes {ilm-init} to move indices to the next phase
as soon as all actions in the current phase complete.

[NOTE]
====
If an index has been <<ilm-rollover,rolled over>>, then the `min_age` value is relative to the time
the index was rolled over, not the index creation time. <<min-age-calculation,Learn more>>.
====
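
As an illustration (the phase layout and values are made up), a policy in which the cold phase's `min_age` is measured from the hot phase's rollover could look like:

[source,console]
----
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "10d" }
        }
      },
      "cold": {
        "min_age": "10d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      }
    }
  }
}
----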

If an index has unallocated shards and the <<cluster-health,cluster health status>> is yellow,
the index can still transition to the next phase according to its {ilm} policy.
However, because {es} can only perform certain clean up tasks on a green
2 changes: 1 addition & 1 deletion docs/reference/ilm/ilm-tutorial.asciidoc
@@ -57,7 +57,7 @@ reaches either a `max_primary_shard_size` of 50 gigabytes or a `max_age` of 30 d

[NOTE]
====
The `min_age` value is relative to the rollover time, not the index creation time.
The `min_age` value is relative to the rollover time, not the index creation time. <<min-age-calculation,Learn more>>.
====

You can create the policy through {kib} or with the
14 changes: 10 additions & 4 deletions docs/reference/ilm/index-rollover.asciidoc
@@ -3,8 +3,7 @@

When indexing time series data like logs or metrics, you can't write to a single index indefinitely.
To meet your indexing and search performance requirements and manage resource usage,
you write to an index until some threshold is met and
then create a new index and start writing to it instead.
you write to an index until some threshold is met and then create a new index and start writing to it instead.
Using rolling indices enables you to:

* Optimize the active index for high ingest rates on high-performance _hot_ nodes.
@@ -35,8 +34,15 @@ more configuration steps and concepts:
You optimize this configuration for ingestion, typically using as many shards as you have hot nodes.
* An _index alias_ that references the entire set of indices.
* A single index designated as the _write index_.
This is the active index that handles all write requests.
On each rollover, the new index becomes the write index.
This is the active index that handles all write requests.
On each rollover, the new index becomes the write index.

[NOTE]
====
When an index is rolled over, the previous index's age is updated to reflect the rollover time.
This date, rather than the index's `creation_date`, is used in {ilm}
`min_age` phase calculations. <<min-age-calculation,Learn more>>.
====
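
For instance, a sketch of bootstrapping the first index of such a series with a write alias (the index and alias names are illustrative) might be:

[source,console]
----
PUT my-index-000001
{
  "aliases": {
    "my-alias": {
      "is_write_index": true
    }
  }
}
----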

[discrete]
[[ilm-automatic-rollover]]
14 changes: 7 additions & 7 deletions docs/reference/ingest/processors/geoip.asciidoc
@@ -48,11 +48,11 @@ field instead.
*Depends on what is available in `database_file`:

* If a GeoLite2 City or GeoIP2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`,
`country_iso_code`, `country_name`, `continent_code`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`,
and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
* If a GeoLite2 Country or GeoIP2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
were configured in `properties`.
`country_iso_code`, `country_name`, `continent_code`, and `continent_name`. The fields actually added depend on what has been found
and which properties were configured in `properties`.
* If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
`asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
in `properties`.
@@ -67,10 +67,10 @@ The fields actually added depend on what has been found and which properties wer
`organization_name`, `network`, `isp`, `isp_organization`, `mobile_country_code`, and `mobile_network_code`. The fields actually added
depend on what has been found and which properties were configured in `properties`.
* If the GeoIP2 Enterprise database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `location`, `asn`,
`organization_name`, `network`, `hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, `residential_proxy`,
`domain`, `isp`, `isp_organization`, `mobile_country_code`, `mobile_network_code`, `user_type`, and `connection_type`. The fields
actually added depend on what has been found and which properties were configured in `properties`.
`country_iso_code`, `country_name`, `continent_code`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`,
`location`, `asn`, `organization_name`, `network`, `hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`,
`residential_proxy`, `domain`, `isp`, `isp_organization`, `mobile_country_code`, `mobile_network_code`, `user_type`, and
`connection_type`. The fields actually added depend on what has been found and which properties were configured in `properties`.


Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
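
A sketch of such a pipeline (the index name, document ID, and IP address are illustrative) is:

[source,console]
----
PUT _ingest/pipeline/geoip
{
  "description": "Add ip geolocation info",
  "processors": [
    {
      "geoip": {
        "field": "ip"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "89.160.20.128"
}
----

To request the newly documented `continent_code` field explicitly, the processor's `properties` option could list it, again as a sketch:

[source,console]
----
PUT _ingest/pipeline/geoip-continent
{
  "processors": [
    {
      "geoip": {
        "field": "ip",
        "properties": [ "ip", "country_iso_code", "continent_code" ]
      }
    }
  ]
}
----
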
38 changes: 37 additions & 1 deletion docs/reference/mapping/types/range.asciidoc
@@ -352,7 +352,7 @@ Will become:
// TEST[s/^/{"_source":/ s/\n$/}/]

[[range-synthetic-source-inclusive]]
Range field values are always represented as inclusive on both sides with bounds adjusted accordingly. For example:
Range field values are always represented as inclusive on both sides with bounds adjusted accordingly. Default values for range bounds are represented as `null`. This is true even if the range bound was explicitly provided. For example:
[source,console,id=synthetic-source-range-normalization-example]
----
PUT idx
@@ -388,6 +388,42 @@ Will become:
----
// TEST[s/^/{"_source":/ s/\n$/}/]

[[range-synthetic-source-default-bounds]]
Default values for range bounds are represented as `null` in synthetic source. This is true even if the range bound was explicitly provided with its default value. For example:
[source,console,id=synthetic-source-range-bounds-example]
----
PUT idx
{
"mappings": {
"_source": { "mode": "synthetic" },
"properties": {
"my_range": { "type": "integer_range" }
}
}
}
PUT idx/_doc/1
{
"my_range": {
"lte": 2147483647
}
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
"my_range": {
"gte": null,
"lte": null
}
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

`date` ranges are formatted using provided `format` or by default using `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format. For example:
[source,console,id=synthetic-source-range-date-example]
----
@@ -90,8 +90,8 @@ teardown:
- gte: { ingest.pipelines.ingest_info_pipeline.time_in_millis: 0 }
- match: { ingest.pipelines.ingest_info_pipeline.current: 0 }
- match: { ingest.pipelines.ingest_info_pipeline.failed: 0 }
- gt: { ingest.pipelines.ingest_info_pipeline.ingested_in_bytes: 0 }
- gt: { ingest.pipelines.ingest_info_pipeline.produced_in_bytes: 0 }
- gt: { ingest.pipelines.ingest_info_pipeline.ingested_as_first_pipeline_in_bytes: 0 }
- gt: { ingest.pipelines.ingest_info_pipeline.produced_as_first_pipeline_in_bytes: 0 }

# Processors section
- is_true: ingest.pipelines.ingest_info_pipeline.processors.0.set
@@ -129,8 +129,8 @@ teardown:
cluster.info:
target: [ ingest ]
- match: { ingest.pipelines.pipeline-1.failed: 1 }
- gt: { ingest.pipelines.pipeline-1.ingested_in_bytes: 0 }
- match: { ingest.pipelines.pipeline-1.produced_in_bytes: 0 }
- gt: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: 0 }
- match: { ingest.pipelines.pipeline-1.produced_as_first_pipeline_in_bytes: 0 }

---
"Test drop processor":
@@ -156,8 +156,8 @@ teardown:
- do:
cluster.info:
target: [ ingest ]
- gt: { ingest.pipelines.pipeline-1.ingested_in_bytes: 0 }
- match: { ingest.pipelines.pipeline-1.produced_in_bytes: 0 }
- gt: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: 0 }
- match: { ingest.pipelines.pipeline-1.produced_as_first_pipeline_in_bytes: 0 }

---
"Test that pipeline processor has byte stats recorded in first pipeline":
@@ -211,11 +211,11 @@ teardown:
- do:
cluster.info:
target: [ ingest ]
- gt: { ingest.pipelines.pipeline-1.ingested_in_bytes: 0 }
- set: { ingest.pipelines.pipeline-1.ingested_in_bytes: ingest_bytes }
- gt: { ingest.pipelines.pipeline-1.produced_in_bytes: $ingest_bytes }
- match: { ingest.pipelines.pipeline-2.ingested_in_bytes: null }
- match: { ingest.pipelines.pipeline-2.produced_in_bytes: null }
- gt: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: 0 }
- set: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: ingest_bytes }
- gt: { ingest.pipelines.pipeline-1.produced_as_first_pipeline_in_bytes: $ingest_bytes }
- match: { ingest.pipelines.pipeline-2.ingested_as_first_pipeline_in_bytes: null }
- match: { ingest.pipelines.pipeline-2.produced_as_first_pipeline_in_bytes: null }

---
"Test that final pipeline has byte stats recorded in first pipeline":
@@ -264,11 +264,11 @@ teardown:
- do:
cluster.info:
target: [ ingest ]
- gt: { ingest.pipelines.pipeline-1.ingested_in_bytes: 0 }
- set: { ingest.pipelines.pipeline-1.ingested_in_bytes: ingest_bytes }
- gt: { ingest.pipelines.pipeline-1.produced_in_bytes: $ingest_bytes }
- match: { ingest.pipelines.pipeline-2.ingested_in_bytes: null }
- match: { ingest.pipelines.pipeline-2.produced_in_bytes: null }
- gt: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: 0 }
- set: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: ingest_bytes }
- gt: { ingest.pipelines.pipeline-1.produced_as_first_pipeline_in_bytes: $ingest_bytes }
- match: { ingest.pipelines.pipeline-2.ingested_as_first_pipeline_in_bytes: null }
- match: { ingest.pipelines.pipeline-2.produced_as_first_pipeline_in_bytes: null }


---
@@ -330,8 +330,8 @@ teardown:
- do:
cluster.info:
target: [ ingest ]
- gt: { ingest.pipelines.pipeline-1.ingested_in_bytes: 0 }
- set: { ingest.pipelines.pipeline-1.ingested_in_bytes: ingest_bytes }
- gt: { ingest.pipelines.pipeline-1.produced_in_bytes: $ingest_bytes }
- match: { ingest.pipelines.pipeline-2.ingested_in_bytes: null }
- match: { ingest.pipelines.pipeline-2.produced_in_bytes: null }
- gt: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: 0 }
- set: { ingest.pipelines.pipeline-1.ingested_as_first_pipeline_in_bytes: ingest_bytes }
- gt: { ingest.pipelines.pipeline-1.produced_as_first_pipeline_in_bytes: $ingest_bytes }
- match: { ingest.pipelines.pipeline-2.ingested_as_first_pipeline_in_bytes: null }
- match: { ingest.pipelines.pipeline-2.produced_as_first_pipeline_in_bytes: null }
@@ -30,6 +30,7 @@ enum Database {
Set.of(
Property.IP,
Property.COUNTRY_ISO_CODE,
Property.CONTINENT_CODE,
Property.COUNTRY_NAME,
Property.CONTINENT_NAME,
Property.REGION_ISO_CODE,
@@ -49,7 +50,7 @@
)
),
Country(
Set.of(Property.IP, Property.CONTINENT_NAME, Property.COUNTRY_NAME, Property.COUNTRY_ISO_CODE),
Set.of(Property.IP, Property.CONTINENT_CODE, Property.CONTINENT_NAME, Property.COUNTRY_NAME, Property.COUNTRY_ISO_CODE),
Set.of(Property.CONTINENT_NAME, Property.COUNTRY_NAME, Property.COUNTRY_ISO_CODE)
),
Asn(
@@ -82,6 +83,7 @@ enum Database {
Property.IP,
Property.COUNTRY_ISO_CODE,
Property.COUNTRY_NAME,
Property.CONTINENT_CODE,
Property.CONTINENT_NAME,
Property.REGION_ISO_CODE,
Property.REGION_NAME,
@@ -235,6 +237,7 @@ enum Property {
IP,
COUNTRY_ISO_CODE,
COUNTRY_NAME,
CONTINENT_CODE,
CONTINENT_NAME,
REGION_ISO_CODE,
REGION_NAME,
@@ -234,6 +234,12 @@ private Map<String, Object> retrieveCityGeoData(GeoIpDatabase geoIpDatabase, Ine
geoData.put("country_name", countryName);
}
}
case CONTINENT_CODE -> {
String continentCode = continent.getCode();
if (continentCode != null) {
geoData.put("continent_code", continentCode);
}
}
case CONTINENT_NAME -> {
String continentName = continent.getName();
if (continentName != null) {
@@ -307,6 +313,12 @@ private Map<String, Object> retrieveCountryGeoData(GeoIpDatabase geoIpDatabase,
geoData.put("country_name", countryName);
}
}
case CONTINENT_CODE -> {
String continentCode = continent.getCode();
if (continentCode != null) {
geoData.put("continent_code", continentCode);
}
}
case CONTINENT_NAME -> {
String continentName = continent.getName();
if (continentName != null) {
@@ -485,6 +497,12 @@ private Map<String, Object> retrieveEnterpriseGeoData(GeoIpDatabas
geoData.put("country_name", countryName);
}
}
case CONTINENT_CODE -> {
String continentCode = continent.getCode();
if (continentCode != null) {
geoData.put("continent_code", continentCode);
}
}
case CONTINENT_NAME -> {
String continentName = continent.getName();
if (continentName != null) {
