AWS Container Insights Receiver

Status
Stability	beta: metrics
Distributions	contrib, aws, observiq, sumo
Warnings	Other
Code Owners	@Aneurysm9, @pxaws

Overview

AWS Container Insights Receiver (awscontainerinsightreceiver) is an AWS-specific receiver that supports CloudWatch Container Insights. CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices. Data is collected as performance log events using the embedded metric format (EMF). From the EMF data, Amazon CloudWatch can create aggregated CloudWatch metrics at the cluster, node, pod, task, and service level.
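To make the shape of a performance log event concrete, the following is a minimal sketch of an EMF document built in Python. The `_aws`, `Timestamp`, `CloudWatchMetrics`, `Namespace`, `Dimensions`, and `Metrics` fields come from the embedded metric format; the helper name `build_emf_event` and all sample values (cluster name, node name, instance ID) are hypothetical and only serve the illustration. The events the collector actually emits carry more attributes than shown here.

```python
import json
import time


def build_emf_event(namespace, dimensions, metrics, attributes, timestamp_ms=None):
    """Build one EMF log event as a plain dict.

    metrics maps a metric name to a (value, unit) pair; attributes holds the
    dimension values and any other fields logged alongside the metrics.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # EMF timestamps are epoch milliseconds
    event = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": dimensions,
                    "Metrics": [
                        {"Name": name, "Unit": unit}
                        for name, (_, unit) in metrics.items()
                    ],
                }
            ],
        }
    }
    # Dimension values and metric values live at the top level of the event.
    event.update(attributes)
    event.update({name: value for name, (value, _) in metrics.items()})
    return event


event = build_emf_event(
    namespace="ContainerInsights",
    dimensions=[["NodeName", "InstanceId", "ClusterName"], ["ClusterName"]],
    metrics={"node_cpu_utilization": (12.5, "Percent")},
    attributes={
        # Hypothetical sample values.
        "ClusterName": "demo-cluster",
        "NodeName": "ip-10-0-0-1.ec2.internal",
        "InstanceId": "i-0123456789abcdef0",
        "Type": "Node",
    },
    timestamp_ms=1700000000000,
)
print(json.dumps(event, indent=2))
```

Each inner list under `Dimensions` is one dimension set, so the event above asks CloudWatch to publish `node_cpu_utilization` once per node and once aggregated per cluster.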

CloudWatch Container Insights has been supported by the ECS Agent and the CloudWatch Agent to collect infrastructure metrics for many resources, such as CPU, memory, disk, and network. To migrate existing customers to OpenTelemetry, AWS Container Insights Receiver (together with the CloudWatch EMF Exporter) aims to support the same CloudWatch Container Insights experience on the following platforms:

Amazon ECS
Amazon EKS
Kubernetes platforms on Amazon EC2

Design of AWS Container Insights Receiver

See the design doc

Configuration

Example configuration:

receivers:
  awscontainerinsightreceiver:
    # all parameters are optional
    collection_interval: 60s
    container_orchestrator: eks
    add_service_as_attribute: true 
    prefer_full_pod_name: false 
    add_full_pod_name_metric_label: false

There is no need to provide any parameters since they are all optional.

collection_interval (optional)

The interval at which metrics should be collected. The default is 60 seconds.

container_orchestrator (optional)

The type of container orchestration service, e.g. eks or ecs. The default is eks.

add_service_as_attribute (optional)

Whether to add the associated service name as an attribute. The default is true.

prefer_full_pod_name (optional)

The "PodName" attribute is set based on the name of the relevant controller, such as a DaemonSet, Job, ReplicaSet, or ReplicationController. If it cannot be set that way and prefer_full_pod_name is true, the "PodName" attribute is set to the pod's own name. The default value is false.

add_full_pod_name_metric_label (optional)

The "FullPodName" attribute is the pod name including its suffix. If false, the FullPodName label is not added. The default value is false.
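The interaction between prefer_full_pod_name and add_full_pod_name_metric_label can be sketched as follows. This is an illustration of the rules described above, not the receiver's implementation: the function name `pod_name_attributes` and its arguments are hypothetical, and the case where no controller name is available and prefer_full_pod_name is false is left without a "PodName" because this README does not describe it.

```python
def pod_name_attributes(pod_name, controller_name=None,
                        prefer_full_pod_name=False,
                        add_full_pod_name_metric_label=False):
    """Return the pod-name attributes implied by the two options.

    controller_name is the name derived from the owning controller
    (DaemonSet, Job, ReplicaSet, ...), or None if it cannot be determined.
    """
    attrs = {}
    if controller_name:
        # Normal case: PodName follows the controller.
        attrs["PodName"] = controller_name
    elif prefer_full_pod_name:
        # Fallback: use the pod's own name when the option allows it.
        attrs["PodName"] = pod_name
    # Otherwise PodName stays unset in this sketch (behavior not specified here).

    if add_full_pod_name_metric_label:
        attrs["FullPodName"] = pod_name
    return attrs


# Hypothetical pod names, for illustration only.
print(pod_name_attributes("web-7d9f8c-abcde", controller_name="web"))
print(pod_name_attributes("standalone-pod", prefer_full_pod_name=True))
print(pod_name_attributes("web-7d9f8c-abcde", controller_name="web",
                          add_full_pod_name_metric_label=True))
```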

Sample configuration for Container Insights

This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an EKS cluster:

# create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: aws-otel-eks
  labels:
    name: aws-otel-eks

---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-otel-sa
  namespace: aws-otel-eks

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["otel-container-insight-clusterleader"]
    verbs: ["get","update"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: aws-otel-sa
    namespace: aws-otel-eks
roleRef:
  kind: ClusterRole
  name: aoc-agent-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  namespace: aws-otel-eks
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    extensions:
      health_check:

    receivers:
      awscontainerinsightreceiver:

    processors:
      batch/metrics:
        timeout: 60s

    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # node metrics
          - dimensions: [[NodeName, InstanceId, ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
              - node_cpu_usage_total
              - node_cpu_limit
              - node_memory_working_set
              - node_memory_limit

          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization
              - pod_network_rx_bytes
              - pod_network_tx_bytes
              - pod_cpu_utilization_over_pod_limit
              - pod_memory_utilization_over_pod_limit
          - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_reserved_capacity
              - pod_memory_reserved_capacity
          - dimensions: [[PodName, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_number_of_container_restarts

          # cluster metrics
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - cluster_node_count
              - cluster_failed_node_count

          # service metrics
          - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - service_number_of_running_pods

          # node fs metrics
          - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
            metric_name_selectors:
              - node_filesystem_utilization

          # namespace metrics
          - dimensions: [[Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - namespace_number_of_running_pods


      debug:
        verbosity: detailed

    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]

      extensions: [health_check]

---
# create Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-otel-eks-ci
  namespace: aws-otel-eks
spec:
  selector:
    matchLabels:
      name: aws-otel-eks-ci
  template:
    metadata:
      labels:
        name: aws-otel-eks-ci
    spec:
      containers:
        - name: aws-otel-collector
          image: {collector-image-url}
          env:
            #- name: AWS_REGION
            #  value: "us-east-1"
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          imagePullPolicy: Always
          command:
            - "/awscollector"
            - "--config=/conf/otel-agent-config.yaml"
          volumeMounts:
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
            - name: otel-agent-config-vol
              mountPath: /conf
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 200m
              memory: 200Mi
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock
          hostPath:
            path: /run/containerd/containerd.sock
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk/
      serviceAccountName: aws-otel-sa

To deploy to an EKS cluster:

kubectl apply -f config.yaml
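The metric_declarations entries in the sample configuration pair lists of metric names with lists of dimension sets. The sketch below shows one way to read them: for a given metric name, collect every dimension set from every declaration whose selectors match. It assumes selectors are treated as regular expressions and matched with an unanchored search; that matching rule, the function name `dimension_sets_for`, and the shortened declaration list are assumptions made for illustration, so consult the awsemfexporter documentation for the exact behavior.

```python
import re

# A shortened copy of two node-level declarations from the sample above.
declarations = [
    {
        "dimensions": [["NodeName", "InstanceId", "ClusterName"]],
        "metric_name_selectors": ["node_cpu_utilization", "node_memory_utilization"],
    },
    {
        "dimensions": [["ClusterName"]],
        "metric_name_selectors": ["node_cpu_utilization", "node_cpu_limit"],
    },
]


def dimension_sets_for(metric_name, declarations):
    """Collect the dimension sets of every declaration that selects metric_name."""
    sets = []
    for decl in declarations:
        selected = any(
            re.search(pattern, metric_name)  # assumption: unanchored regex match
            for pattern in decl["metric_name_selectors"]
        )
        if not selected:
            continue
        for dims in decl["dimensions"]:
            if dims not in sets:  # avoid listing the same dimension set twice
                sets.append(dims)
    return sets


for name in ("node_cpu_utilization", "node_cpu_limit", "pod_cpu_utilization"):
    print(name, dimension_sets_for(name, declarations))
```

Under these assumptions, a metric that no declaration selects ends up with no dimension sets, which is why each metric that should appear in CloudWatch is listed explicitly in the sample.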

Available Metrics and Resource Attributes

Cluster

Metric	Unit
cluster_failed_node_count	Count
cluster_node_count	Count

Resource Attribute
ClusterName
NodeName
Type
Timestamp
Version
Sources

Cluster Namespace

Metric	Unit
namespace_number_of_running_pods	Count

Resource Attribute
ClusterName
NodeName
Namespace
Type
Timestamp
Version
Sources
kubernetes

Cluster Service

Metric	Unit
service_number_of_running_pods	Count

Resource Attribute
ClusterName
NodeName
Namespace
Service
Type
Timestamp
Version
Sources
kubernetes

Node

Metric	Unit
node_cpu_limit	Millicore
node_cpu_request	Millicore
node_cpu_reserved_capacity	Percent
node_cpu_usage_system	Millicore
node_cpu_usage_total	Millicore
node_cpu_usage_user	Millicore
node_cpu_utilization	Percent
node_memory_cache	Bytes
node_memory_failcnt	Count
node_memory_hierarchical_pgfault	Count/Second
node_memory_hierarchical_pgmajfault	Count/Second
node_memory_limit	Bytes
node_memory_mapped_file	Bytes
node_memory_max_usage	Bytes
node_memory_pgfault	Count/Second
node_memory_pgmajfault	Count/Second
node_memory_request	Bytes
node_memory_reserved_capacity	Percent
node_memory_rss	Bytes
node_memory_swap	Bytes
node_memory_usage	Bytes
node_memory_utilization	Percent
node_memory_working_set	Bytes
node_network_rx_bytes	Bytes/Second
node_network_rx_dropped	Count/Second
node_network_rx_errors	Count/Second
node_network_rx_packets	Count/Second
node_network_total_bytes	Bytes/Second
node_network_tx_bytes	Bytes/Second
node_network_tx_dropped	Count/Second
node_network_tx_errors	Count/Second
node_network_tx_packets	Count/Second
node_number_of_running_containers	Count
node_number_of_running_pods	Count

Resource Attribute
ClusterName
InstanceType
NodeName
Timestamp
Type
Version
Sources
kubernetes

Node Disk IO

Metric	Unit
node_diskio_io_serviced_async	Count/Second
node_diskio_io_serviced_read	Count/Second
node_diskio_io_serviced_sync	Count/Second
node_diskio_io_serviced_total	Count/Second
node_diskio_io_serviced_write	Count/Second
node_diskio_io_service_bytes_async	Bytes/Second
node_diskio_io_service_bytes_read	Bytes/Second
node_diskio_io_service_bytes_sync	Bytes/Second
node_diskio_io_service_bytes_total	Bytes/Second
node_diskio_io_service_bytes_write	Bytes/Second

Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
NodeName
Timestamp
EBSVolumeId
device
Type
Version
Sources
kubernetes

Node Filesystem

Metric	Unit
node_filesystem_available	Bytes
node_filesystem_capacity	Bytes
node_filesystem_inodes	Count
node_filesystem_inodes_free	Count
node_filesystem_usage	Bytes
node_filesystem_utilization	Percent

Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
NodeName
Timestamp
EBSVolumeId
device
fstype
Type
Version
Sources
kubernetes

Node Network

Metric	Unit
node_interface_network_rx_bytes	Bytes/Second
node_interface_network_rx_dropped	Count/Second
node_interface_network_rx_errors	Count/Second
node_interface_network_rx_packets	Count/Second
node_interface_network_total_bytes	Bytes/Second
node_interface_network_tx_bytes	Bytes/Second
node_interface_network_tx_dropped	Count/Second
node_interface_network_tx_errors	Count/Second
node_interface_network_tx_packets	Count/Second

Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
NodeName
Timestamp
Type
Version
interface
Sources
kubernetes

Pod

Metric	Unit
pod_cpu_limit	Millicore
pod_cpu_request	Millicore
pod_cpu_reserved_capacity	Percent
pod_cpu_usage_system	Millicore
pod_cpu_usage_total	Millicore
pod_cpu_usage_user	Millicore
pod_cpu_utilization	Percent
pod_cpu_utilization_over_pod_limit	Percent
pod_memory_cache	Bytes
pod_memory_failcnt	Count
pod_memory_hierarchical_pgfault	Count/Second
pod_memory_hierarchical_pgmajfault	Count/Second
pod_memory_limit	Bytes
pod_memory_mapped_file	Bytes
pod_memory_max_usage	Bytes
pod_memory_pgfault	Count/Second
pod_memory_pgmajfault	Count/Second
pod_memory_request	Bytes
pod_memory_reserved_capacity	Percent
pod_memory_rss	Bytes
pod_memory_swap	Bytes
pod_memory_usage	Bytes
pod_memory_utilization	Percent
pod_memory_utilization_over_pod_limit	Percent
pod_memory_working_set	Bytes
pod_network_rx_bytes	Bytes/Second
pod_network_rx_dropped	Count/Second
pod_network_rx_errors	Count/Second
pod_network_rx_packets	Count/Second
pod_network_total_bytes	Bytes/Second
pod_network_tx_bytes	Bytes/Second
pod_network_tx_dropped	Count/Second
pod_network_tx_errors	Count/Second
pod_network_tx_packets	Count/Second
pod_number_of_container_restarts	Count
pod_number_of_containers	Count
pod_number_of_running_containers	Count

Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
K8sPodName
Namespace
NodeName
PodId
Timestamp
Type
Version
Sources
kubernetes
pod_status

Pod Network

Metric	Unit
pod_interface_network_rx_bytes	Bytes/Second
pod_interface_network_rx_dropped	Count/Second
pod_interface_network_rx_errors	Count/Second
pod_interface_network_rx_packets	Count/Second
pod_interface_network_total_bytes	Bytes/Second
pod_interface_network_tx_bytes	Bytes/Second
pod_interface_network_tx_dropped	Count/Second
pod_interface_network_tx_errors	Count/Second
pod_interface_network_tx_packets	Count/Second

Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
K8sPodName
Namespace
NodeName
PodId
Timestamp
Type
Version
interface
Sources
kubernetes
pod_status

Container

Metric	Unit
container_cpu_limit	Millicore
container_cpu_request	Millicore
container_cpu_usage_system	Millicore
container_cpu_usage_total	Millicore
container_cpu_usage_user	Millicore
container_cpu_utilization	Percent
container_memory_cache	Bytes
container_memory_failcnt	Count
container_memory_hierarchical_pgfault	Count/Second
container_memory_hierarchical_pgmajfault	Count/Second
container_memory_limit	Bytes
container_memory_mapped_file	Bytes
container_memory_max_usage	Bytes
container_memory_pgfault	Count/Second
container_memory_pgmajfault	Count/Second
container_memory_request	Bytes
container_memory_rss	Bytes
container_memory_swap	Bytes
container_memory_usage	Bytes
container_memory_utilization	Percent
container_memory_working_set	Bytes
number_of_container_restarts	Count

Resource Attribute
AutoScalingGroupName
ClusterName
ContainerId
ContainerName
InstanceId
InstanceType
K8sPodName
Namespace
NodeName
PodId
Timestamp
Type
Version
Sources
kubernetes
container_status
container_status_reason
container_last_termination_reason

The attribute container_status_reason is present only when container_status is in "Waiting" or "Terminated" State. The attribute container_last_termination_reason is present only when container_status is in "Terminated" State.
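For consumers that validate incoming container events, the presence rule above can be written down as a small check. This is only a sketch of the rule as stated in this README; the function name `expected_status_attributes` is hypothetical, and "Running" is used as an example of a state other than the two named above.

```python
def expected_status_attributes(container_status):
    """List the status-related attributes expected for a given container_status."""
    attrs = ["container_status"]
    if container_status in ("Waiting", "Terminated"):
        attrs.append("container_status_reason")
    if container_status == "Terminated":
        attrs.append("container_last_termination_reason")
    return attrs


for status in ("Running", "Waiting", "Terminated"):
    print(status, expected_status_attributes(status))
```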

This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an ECS cluster to collect the instance level metrics:

receivers:
  awscontainerinsightreceiver:
    collection_interval: 10s
    container_orchestrator: ecs

processors:
  batch/metrics:
    timeout: 60s

exporters:
  awsemf:
    namespace: ContainerInsightsEC2Instance
    log_group_name: '/aws/ecs/containerinsights/{ClusterName}/performance'
    log_stream_name: 'instanceTelemetry/{ContainerInstanceId}'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources]
    metric_declarations:
      # instance metrics
      - dimensions: [[ContainerInstanceId, InstanceId, ClusterName]]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_filesystem_utilization
      - dimensions: [[ClusterName]]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_cpu_usage_total
          - instance_cpu_limit
          - instance_memory_working_set
          - instance_memory_limit
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      exporters: [awsemf,debug]

To deploy to an ECS cluster, check this doc for details.

Available Metrics and Resource Attributes

Instance

Metric	Unit
instance_cpu_limit	Millicore
instance_cpu_reserved_capacity	Percent
instance_cpu_usage_system	Millicore
instance_cpu_usage_total	Millicore
instance_cpu_usage_user	Millicore
instance_cpu_utilization	Percent
instance_memory_cache	Bytes
instance_memory_failcnt	Count
instance_memory_hierarchical_pgfault	Count/Second
instance_memory_hierarchical_pgmajfault	Count/Second
instance_memory_limit	Bytes
instance_memory_mapped_file	Bytes
instance_memory_max_usage	Bytes
instance_memory_pgfault	Count/Second
instance_memory_pgmajfault	Count/Second
instance_memory_reserved_capacity	Percent
instance_memory_rss	Bytes
instance_memory_swap	Bytes
instance_memory_usage	Bytes
instance_memory_utilization	Percent
instance_memory_working_set	Bytes
instance_network_rx_bytes	Bytes/Second
instance_network_rx_dropped	Count/Second
instance_network_rx_errors	Count/Second
instance_network_rx_packets	Count/Second
instance_network_total_bytes	Bytes/Second
instance_network_tx_bytes	Bytes/Second
instance_network_tx_dropped	Count/Second
instance_network_tx_errors	Count/Second
instance_network_tx_packets	Count/Second
instance_number_of_running_tasks	Count

Resource Attribute
ClusterName
InstanceType
AutoScalingGroupName
Timestamp
Type
Version
Sources
ContainerInstanceId
InstanceId

Instance Disk IO

Metric	Unit
instance_diskio_io_serviced_async	Count/Second
instance_diskio_io_serviced_read	Count/Second
instance_diskio_io_serviced_sync	Count/Second
instance_diskio_io_serviced_total	Count/Second
instance_diskio_io_serviced_write	Count/Second
instance_diskio_io_service_bytes_async	Bytes/Second
instance_diskio_io_service_bytes_read	Bytes/Second
instance_diskio_io_service_bytes_sync	Bytes/Second
instance_diskio_io_service_bytes_total	Bytes/Second
instance_diskio_io_service_bytes_write	Bytes/Second

Resource Attribute
ClusterName
InstanceType
AutoScalingGroupName
Timestamp
Type
Version
Sources
ContainerInstanceId
InstanceId
EBSVolumeId

Instance Filesystem

Metric	Unit
instance_filesystem_available	Bytes
instance_filesystem_capacity	Bytes
instance_filesystem_inodes	Count
instance_filesystem_inodes_free	Count
instance_filesystem_usage	Bytes
instance_filesystem_utilization	Percent

Resource Attribute
ClusterName
InstanceType
AutoScalingGroupName
Timestamp
Type
Version
Sources
ContainerInstanceId
InstanceId
EBSVolumeId

Instance Network

Metric	Unit
instance_interface_network_rx_bytes	Bytes/Second
instance_interface_network_rx_dropped	Count/Second
instance_interface_network_rx_errors	Count/Second
instance_interface_network_rx_packets	Count/Second
instance_interface_network_total_bytes	Bytes/Second
instance_interface_network_tx_bytes	Bytes/Second
instance_interface_network_tx_dropped	Count/Second
instance_interface_network_tx_errors	Count/Second
instance_interface_network_tx_packets	Count/Second

Warnings

Root permissions

When using this component, the collector process needs root permission to read files in the following locations:

/
/var/run/docker.sock
/var/lib/docker
/run/containerd/containerd.sock
/sys
/dev/disk

This requirement comes from the fact that this component is based on cAdvisor.
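Before deploying, it can help to confirm that the process user can actually read these locations. The following is a minimal sketch of such a preflight check, not part of the receiver: the names `REQUIRED_PATHS` and `check_read_access` are hypothetical, and the paths are the host paths listed above (inside the sample DaemonSet the host root is mounted at /rootfs instead, so the list would need adjusting there).

```python
import os

# Host locations listed in this section.
REQUIRED_PATHS = [
    "/",
    "/var/run/docker.sock",
    "/var/lib/docker",
    "/run/containerd/containerd.sock",
    "/sys",
    "/dev/disk",
]


def check_read_access(paths=REQUIRED_PATHS):
    """Map each path to True if it exists and the current user may read it."""
    return {
        path: os.path.exists(path) and os.access(path, os.R_OK)
        for path in paths
    }


if __name__ == "__main__":
    # geteuid is only available on Unix-like systems.
    if hasattr(os, "geteuid") and os.geteuid() != 0:
        print("warning: not running as root")
    for path, readable in check_read_access().items():
        status = "ok" if readable else "missing or unreadable"
        print(f"{status:<22} {path}")
```

A path reported as missing is not necessarily a problem: a node typically runs either Docker or containerd, so only one of the two sockets is expected to exist.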
