# AWS Container Insights Receiver

| Status        |           |
| ------------- |-----------|
| Stability     | [beta]: metrics   |
| Distributions | [contrib], [aws], [observiq], [sumo] |
| Warnings      | [Other](#warnings) |
| Issues        | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Areceiver%2Fawscontainerinsight%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Areceiver%2Fawscontainerinsight) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Areceiver%2Fawscontainerinsight%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Areceiver%2Fawscontainerinsight) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner)    | [@Aneurysm9](https://www.github.com/Aneurysm9), [@pxaws](https://www.github.com/pxaws) |

[beta]: https://github.com/open-telemetry/opentelemetry-collector#beta
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib
[aws]: https://github.com/aws-observability/aws-otel-collector
[observiq]: https://github.com/observIQ/observiq-otel-collector
[sumo]: https://github.com/SumoLogic/sumologic-otel-collector

## Overview

AWS Container Insights Receiver (`awscontainerinsightreceiver`) is an AWS-specific receiver that supports [CloudWatch Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html). CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices.
Data are collected as performance log events using [embedded metric format](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html). From the EMF data, Amazon CloudWatch can create the aggregated CloudWatch metrics at the cluster, node, pod, task, and service level.

CloudWatch Container Insights has been supported by [ECS Agent](https://github.com/aws/amazon-ecs-agent) and [CloudWatch Agent](https://github.com/aws/amazon-cloudwatch-agent) to collect infrastructure metrics for many resources such as CPU, memory, disk, and network. To migrate existing customers to use OpenTelemetry, AWS Container Insights Receiver (together with CloudWatch EMF Exporter) aims to support the same CloudWatch Container Insights experience for the following platforms:
  * Amazon ECS
  * Amazon EKS
  * Kubernetes platforms on Amazon EC2

## Design of AWS Container Insights Receiver

See the [design doc](./design.md)

## Configuration

Example configuration:
```
receivers:
  awscontainerinsightreceiver:
    # all parameters are optional
    collection_interval: 60s
    container_orchestrator: eks
    add_service_as_attribute: true
    prefer_full_pod_name: false
    add_full_pod_name_metric_label: false
```
There is no need to provide any parameters since they are all optional.

**collection_interval (optional)**

The interval at which metrics should be collected. The default is 60 seconds.

**container_orchestrator (optional)**

The type of container orchestration service, e.g. eks or ecs. The default is eks.

**add_service_as_attribute (optional)**

Whether to add the associated service name as an attribute. The default is true.

**prefer_full_pod_name (optional)**

The "PodName" attribute is set based on the name of the relevant controllers like Daemonset, Job, ReplicaSet, ReplicationController, ... If it cannot be set that way and `prefer_full_pod_name` is true, the "PodName" attribute is set to the pod's own name. The default value is false.

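The "PodName" fallback described above can be sketched as a small function. This is only an illustration of the documented rule, not the receiver's actual code: the function name `resolve_pod_name` and its arguments are hypothetical, and returning `None` when neither source applies is an assumption about what "cannot be set" means.

```python
from typing import Optional


def resolve_pod_name(
    controller_name: Optional[str],
    pod_name: str,
    prefer_full_pod_name: bool = False,
) -> Optional[str]:
    """Return the value for the "PodName" attribute, or None if it stays unset.

    Hypothetical helper: it mirrors the rule in the README, not the
    receiver's real implementation.
    """
    # Preferred source: the owning controller (DaemonSet, Job, ReplicaSet, ...).
    if controller_name:
        return controller_name
    # Fallback: the pod's own name, only when prefer_full_pod_name is enabled.
    if prefer_full_pod_name:
        return pod_name
    # Assumption: otherwise the attribute is left unset.
    return None


if __name__ == "__main__":
    # Pod owned by a controller: the controller name wins.
    print(resolve_pod_name("coredns", "coredns-5d78c9869d-abcde"))
    # Standalone pod with the default setting: no PodName.
    print(resolve_pod_name(None, "standalone-pod"))
    # Standalone pod with prefer_full_pod_name: true.
    print(resolve_pod_name(None, "standalone-pod", prefer_full_pod_name=True))
```

The pod and controller names in the usage lines are made up for the example.
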
**add_full_pod_name_metric_label (optional)**

The "FullPodName" attribute is the pod name including its suffix. If false, the FullPodName label is not added. The default value is false.

## Sample configuration for Container Insights

This is a sample configuration for AWS Container Insights using the `awscontainerinsightreceiver` and `awsemfexporter` for an EKS cluster:
```
# create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: aws-otel-eks
  labels:
    name: aws-otel-eks

---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-otel-sa
  namespace: aws-otel-eks

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["otel-container-insight-clusterleader"]
    verbs: ["get","update"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: aws-otel-sa
    namespace: aws-otel-eks
roleRef:
  kind: ClusterRole
  name: aoc-agent-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  namespace: aws-otel-eks
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    extensions:
      health_check:

    receivers:
      awscontainerinsightreceiver:

    processors:
      batch/metrics:
        timeout: 60s

    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # node metrics
          - dimensions: [[NodeName, InstanceId, ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
              - node_cpu_usage_total
              - node_cpu_limit
              - node_memory_working_set
              - node_memory_limit

          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization
              - pod_network_rx_bytes
              - pod_network_tx_bytes
              - pod_cpu_utilization_over_pod_limit
              - pod_memory_utilization_over_pod_limit
          - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_reserved_capacity
              - pod_memory_reserved_capacity
          - dimensions: [[PodName, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_number_of_container_restarts

          # cluster metrics
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - cluster_node_count
              - cluster_failed_node_count

          # service metrics
          - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - service_number_of_running_pods

          # node fs metrics
          - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
            metric_name_selectors:
              - node_filesystem_utilization

          # namespace metrics
          - dimensions: [[Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - namespace_number_of_running_pods

      debug:
        verbosity: detailed

    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]

      extensions: [health_check]

---
# create Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-otel-eks-ci
  namespace: aws-otel-eks
spec:
  selector:
    matchLabels:
      name: aws-otel-eks-ci
  template:
    metadata:
      labels:
        name: aws-otel-eks-ci
    spec:
      containers:
        - name: aws-otel-collector
          image: {collector-image-url}
          env:
            #- name: AWS_REGION
            #  value: "us-east-1"
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          imagePullPolicy: Always
          command:
            - "/awscollector"
            - "--config=/conf/otel-agent-config.yaml"
          volumeMounts:
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
            - name: otel-agent-config-vol
              mountPath: /conf
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 200m
              memory: 200Mi
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock
          hostPath:
            path: /run/containerd/containerd.sock
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk/
      serviceAccountName: aws-otel-sa
```

To deploy to an EKS cluster:
```
kubectl apply -f config.yaml
```

## Available Metrics and Resource Attributes
### Cluster
| Metric                    | Unit  |
|---------------------------|-------|
| cluster_failed_node_count | Count |
| cluster_node_count        | Count |

| Resource Attribute |
|--------------------|
| ClusterName        |
| NodeName           |
| Type               |
| Timestamp          |
| Version            |
| Sources            |
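To make the relationship between these metrics, their attributes, and the EMF log events more concrete, here is a rough Python sketch that builds one event in the CloudWatch embedded metric format for the two cluster metrics. The `_aws` / `CloudWatchMetrics` layout follows the EMF specification; the helper `build_emf_event`, the cluster name, and the attribute and metric values are hypothetical, and the events the exporter really writes may carry additional fields.

```python
import json
import time


def build_emf_event(namespace, dimension_sets, metrics, attributes, timestamp_ms=None):
    """Build one embedded metric format (EMF) log event as a dict.

    `metrics` maps a metric name to a (value, unit) pair.
    `attributes` holds the string fields the dimension sets refer to.
    Hypothetical helper for illustration only.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    event = {
        "_aws": {
            "Timestamp": timestamp_ms,  # milliseconds since the Unix epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": dimension_sets,
                    "Metrics": [
                        {"Name": name, "Unit": unit}
                        for name, (_, unit) in metrics.items()
                    ],
                }
            ],
        }
    }
    # Dimension values and other attributes sit at the root of the event.
    event.update(attributes)
    # Metric values sit at the root as well, keyed by metric name.
    for name, (value, _) in metrics.items():
        event[name] = value
    return event


if __name__ == "__main__":
    event = build_emf_event(
        namespace="ContainerInsights",
        dimension_sets=[["ClusterName"]],
        metrics={
            "cluster_node_count": (3, "Count"),
            "cluster_failed_node_count": (0, "Count"),
        },
        # Placeholder attribute values, not taken from a real cluster.
        attributes={"ClusterName": "demo-cluster", "Type": "Cluster"},
    )
    print(json.dumps(event, indent=2))
```

The namespace and the `[[ClusterName]]` dimension set mirror the cluster entry under `metric_declarations` in the sample configuration.
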

### Cluster Namespace
| Metric                           | Unit  |
|----------------------------------|-------|
| namespace_number_of_running_pods | Count |

| Resource Attribute |
|--------------------|
| ClusterName        |
| NodeName           |
| Namespace          |
| Type               |
| Timestamp          |
| Version            |
| Sources            |
| kubernetes         |

### Cluster Service
| Metric                         | Unit  |
|--------------------------------|-------|
| service_number_of_running_pods | Count |

| Resource Attribute |
|--------------------|
| ClusterName        |
| NodeName           |
| Namespace          |
| Service            |
| Type               |
| Timestamp          |
| Version            |
| Sources            |
| kubernetes         |

### Node
| Metric                              | Unit          |
|-------------------------------------|---------------|
| node_cpu_limit                      | Millicore     |
| node_cpu_request                    | Millicore     |
| node_cpu_reserved_capacity          | Percent       |
| node_cpu_usage_system               | Millicore     |
| node_cpu_usage_total                | Millicore     |
| node_cpu_usage_user                 | Millicore     |
| node_cpu_utilization                | Percent       |
| node_memory_cache                   | Bytes         |
| node_memory_failcnt                 | Count         |
| node_memory_hierarchical_pgfault    | Count/Second  |
| node_memory_hierarchical_pgmajfault | Count/Second  |
| node_memory_limit                   | Bytes         |
| node_memory_mapped_file             | Bytes         |
| node_memory_max_usage               | Bytes         |
| node_memory_pgfault                 | Count/Second  |
| node_memory_pgmajfault              | Count/Second  |
| node_memory_request                 | Bytes         |
| node_memory_reserved_capacity       | Percent       |
| node_memory_rss                     | Bytes         |
| node_memory_swap                    | Bytes         |
| node_memory_usage                   | Bytes         |
| node_memory_utilization             | Percent       |
| node_memory_working_set             | Bytes         |
| node_network_rx_bytes               | Bytes/Second  |
| node_network_rx_dropped             | Count/Second  |
| node_network_rx_errors              | Count/Second  |
| node_network_rx_packets             | Count/Second  |
| node_network_total_bytes            | Bytes/Second  |
| node_network_tx_bytes               | Bytes/Second  |
| node_network_tx_dropped             | Count/Second  |
| node_network_tx_errors              | Count/Second  |
| node_network_tx_packets             | Count/Second  |
| node_number_of_running_containers   | Count         |
| node_number_of_running_pods         | Count         |

| Resource Attribute   |
|----------------------|
| ClusterName          |
| InstanceType         |
| NodeName             |
| Timestamp            |
| Type                 |
| Version              |
| Sources              |
| kubernetes           |

### Node Disk IO
| Metric                             | Unit          |
|------------------------------------|---------------|
| node_diskio_io_serviced_async      | Count/Second  |
| node_diskio_io_serviced_read       | Count/Second  |
| node_diskio_io_serviced_sync       | Count/Second  |
| node_diskio_io_serviced_total      | Count/Second  |
| node_diskio_io_serviced_write      | Count/Second  |
| node_diskio_io_service_bytes_async | Bytes/Second  |
| node_diskio_io_service_bytes_read  | Bytes/Second  |
| node_diskio_io_service_bytes_sync  | Bytes/Second  |
| node_diskio_io_service_bytes_total | Bytes/Second  |
| node_diskio_io_service_bytes_write | Bytes/Second  |

| Resource Attribute   |
|----------------------|
| AutoScalingGroupName |
| ClusterName          |
| InstanceId           |
| InstanceType         |
| NodeName             |
| Timestamp            |
| EBSVolumeId          |
| device               |
| Type                 |
| Version              |
| Sources              |
| kubernetes           |

### Node Filesystem
| Metric                      | Unit    |
|-----------------------------|---------|
| node_filesystem_available   | Bytes   |
| node_filesystem_capacity    | Bytes   |
| node_filesystem_inodes      | Count   |
| node_filesystem_inodes_free | Count   |
| node_filesystem_usage       | Bytes   |
| node_filesystem_utilization | Percent |

| Resource Attribute   |
|----------------------|
| AutoScalingGroupName |
| ClusterName          |
| InstanceId           |
| InstanceType         |
| NodeName             |
| Timestamp            |
| EBSVolumeId          |
| device               |
| fstype               |
| Type                 |
| Version              |
| Sources              |
| kubernetes           |

### Node Network
| Metric                             | Unit         |
|------------------------------------|--------------|
| node_interface_network_rx_bytes    | Bytes/Second |
| node_interface_network_rx_dropped  | Count/Second |
| node_interface_network_rx_errors   | Count/Second |
| node_interface_network_rx_packets  | Count/Second |
| node_interface_network_total_bytes | Bytes/Second |
| node_interface_network_tx_bytes    | Bytes/Second |
| node_interface_network_tx_dropped  | Count/Second |
| node_interface_network_tx_errors   | Count/Second |
| node_interface_network_tx_packets  | Count/Second |

| Resource Attribute   |
|----------------------|
| AutoScalingGroupName |
| ClusterName          |
| InstanceId           |
| InstanceType         |
| NodeName             |
| Timestamp            |
| Type                 |
| Version              |
| interface            |
| Sources              |
| kubernetes           |

### Pod
| Metric                                | Unit          |
|---------------------------------------|---------------|
| pod_cpu_limit                         | Millicore     |
| pod_cpu_request                       | Millicore     |
| pod_cpu_reserved_capacity             | Percent       |
| pod_cpu_usage_system                  | Millicore     |
| pod_cpu_usage_total                   | Millicore     |
| pod_cpu_usage_user                    | Millicore     |
| pod_cpu_utilization                   | Percent       |
| pod_cpu_utilization_over_pod_limit    | Percent       |
| pod_memory_cache                      | Bytes         |
| pod_memory_failcnt                    | Count         |
| pod_memory_hierarchical_pgfault       | Count/Second  |
| pod_memory_hierarchical_pgmajfault    | Count/Second  |
| pod_memory_limit                      | Bytes         |
| pod_memory_mapped_file                | Bytes         |
| pod_memory_max_usage                  | Bytes         |
| pod_memory_pgfault                    | Count/Second  |
| pod_memory_pgmajfault                 | Count/Second  |
| pod_memory_request                    | Bytes         |
| pod_memory_reserved_capacity          | Percent       |
| pod_memory_rss                        | Bytes         |
| pod_memory_swap                       | Bytes         |
| pod_memory_usage                      | Bytes         |
| pod_memory_utilization                | Percent       |
| pod_memory_utilization_over_pod_limit | Percent       |
| pod_memory_working_set                | Bytes         |
| pod_network_rx_bytes                  | Bytes/Second  |
| pod_network_rx_dropped                | Count/Second  |
| pod_network_rx_errors                 | Count/Second  |
| pod_network_rx_packets                | Count/Second  |
| pod_network_total_bytes               | Bytes/Second  |
| pod_network_tx_bytes                  | Bytes/Second  |
| pod_network_tx_dropped                | Count/Second  |
| pod_network_tx_errors                 | Count/Second  |
| pod_network_tx_packets                | Count/Second  |
| pod_number_of_container_restarts      | Count         |
| pod_number_of_containers              | Count         |
| pod_number_of_running_containers      | Count         |

| Resource Attribute   |
|----------------------|
| AutoScalingGroupName |
| ClusterName          |
| InstanceId           |
| InstanceType         |
| K8sPodName           |
| Namespace            |
| NodeName             |
| PodId                |
| Timestamp            |
| Type                 |
| Version              |
| Sources              |
| kubernetes           |
| pod_status           |

### Pod Network
| Metric                             | Unit         |
|------------------------------------|--------------|
| pod_interface_network_rx_bytes     | Bytes/Second |
| pod_interface_network_rx_dropped   | Count/Second |
| pod_interface_network_rx_errors    | Count/Second |
| pod_interface_network_rx_packets   | Count/Second |
| pod_interface_network_total_bytes  | Bytes/Second |
| pod_interface_network_tx_bytes     | Bytes/Second |
| pod_interface_network_tx_dropped   | Count/Second |
| pod_interface_network_tx_errors    | Count/Second |
| pod_interface_network_tx_packets   | Count/Second |

| Resource Attribute   |
|----------------------|
| AutoScalingGroupName |
| ClusterName          |
| InstanceId           |
| InstanceType         |
| K8sPodName           |
| Namespace            |
| NodeName             |
| PodId                |
| Timestamp            |
| Type                 |
| Version              |
| interface            |
| Sources              |
| kubernetes           |
| pod_status           |

### Container
| Metric                                  | Unit          |
|-----------------------------------------|---------------|
| container_cpu_limit                     | Millicore     |
| container_cpu_request                   | Millicore     |
| container_cpu_usage_system              | Millicore     |
| container_cpu_usage_total               | Millicore     |
| container_cpu_usage_user                | Millicore     |
| container_cpu_utilization               | Percent       |
| container_memory_cache                  | Bytes         |
| container_memory_failcnt                | Count         |
| container_memory_hierarchical_pgfault   | Count/Second  |
| container_memory_hierarchical_pgmajfault| Count/Second  |
| container_memory_limit                  | Bytes         |
| container_memory_mapped_file            | Bytes         |
| container_memory_max_usage              | Bytes         |
| container_memory_pgfault                | Count/Second  |
| container_memory_pgmajfault             | Count/Second  |
| container_memory_request                | Bytes         |
| container_memory_rss                    | Bytes         |
| container_memory_swap                   | Bytes         |
| container_memory_usage                  | Bytes         |
| container_memory_utilization            | Percent       |
| container_memory_working_set            | Bytes         |
| number_of_container_restarts            | Count         |

| Resource Attribute                |
|-----------------------------------|
| AutoScalingGroupName              |
| ClusterName                       |
| ContainerId                       |
| ContainerName                     |
| InstanceId                        |
| InstanceType                      |
| K8sPodName                        |
| Namespace                         |
| NodeName                          |
| PodId                             |
| Timestamp                         |
| Type                              |
| Version                           |
| Sources                           |
| kubernetes                        |
| container_status                  |
| container_status_reason           |
| container_last_termination_reason |

The attribute `container_status_reason` is present only when `container_status` is in the "Waiting" or "Terminated" state. The attribute `container_last_termination_reason` is present only when `container_status` is in the "Terminated" state.

This is a sample configuration for AWS Container Insights using the `awscontainerinsightreceiver` and `awsemfexporter` for an ECS cluster to collect the instance-level metrics:
```
receivers:
  awscontainerinsightreceiver:
    collection_interval: 10s
    container_orchestrator: ecs

processors:
  batch/metrics:
    timeout: 60s

exporters:
  awsemf:
    namespace: ContainerInsightsEC2Instance
    log_group_name: '/aws/ecs/containerinsights/{ClusterName}/performance'
    log_stream_name: 'instanceTelemetry/{ContainerInstanceId}'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources]
    metric_declarations:
      # instance metrics
      - dimensions: [ [ ContainerInstanceId, InstanceId, ClusterName] ]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_filesystem_utilization
      - dimensions: [ [ClusterName] ]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_cpu_usage_total
          - instance_cpu_limit
          - instance_memory_working_set
          - instance_memory_limit
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      exporters: [awsemf,debug]
```
To deploy to an ECS cluster, see this [doc](https://aws-otel.github.io/docs/setup/ecs#3-setup-the-aws-otel-collector-for-ecs-ec2-instance-metrics) for details.

## Available Metrics and Resource Attributes
### Instance
| Metric                                  | Unit          |
|-----------------------------------------|---------------|
| instance_cpu_limit                      | Millicore     |
| instance_cpu_reserved_capacity          | Percent       |
| instance_cpu_usage_system               | Millicore     |
| instance_cpu_usage_total                | Millicore     |
| instance_cpu_usage_user                 | Millicore     |
| instance_cpu_utilization                | Percent       |
| instance_memory_cache                   | Bytes         |
| instance_memory_failcnt                 | Count         |
| instance_memory_hierarchical_pgfault    | Count/Second  |
| instance_memory_hierarchical_pgmajfault | Count/Second  |
| instance_memory_limit                   | Bytes         |
| instance_memory_mapped_file             | Bytes         |
| instance_memory_max_usage               | Bytes         |
| instance_memory_pgfault                 | Count/Second  |
| instance_memory_pgmajfault              | Count/Second  |
| instance_memory_reserved_capacity       | Percent       |
| instance_memory_rss                     | Bytes         |
| instance_memory_swap                    | Bytes         |
| instance_memory_usage                   | Bytes         |
| instance_memory_utilization             | Percent       |
| instance_memory_working_set             | Bytes         |
| instance_network_rx_bytes               | Bytes/Second  |
| instance_network_rx_dropped             | Count/Second  |
| instance_network_rx_errors              | Count/Second  |
| instance_network_rx_packets             | Count/Second  |
| instance_network_total_bytes            | Bytes/Second  |
| instance_network_tx_bytes               | Bytes/Second  |
| instance_network_tx_dropped             | Count/Second  |
| instance_network_tx_errors              | Count/Second  |
| instance_network_tx_packets             | Count/Second  |
| instance_number_of_running_tasks        | Count         |

| Resource Attribute   |
|----------------------|
| ClusterName          |
| InstanceType         |
| AutoScalingGroupName |
| Timestamp            |
| Type                 |
| Version              |
| Sources              |
| ContainerInstanceId  |
| InstanceId           |

### Instance Disk IO
| Metric                                 | Unit          |
|----------------------------------------|---------------|
| instance_diskio_io_serviced_async      | Count/Second  |
| instance_diskio_io_serviced_read       | Count/Second  |
| instance_diskio_io_serviced_sync       | Count/Second  |
| instance_diskio_io_serviced_total      | Count/Second  |
| instance_diskio_io_serviced_write      | Count/Second  |
| instance_diskio_io_service_bytes_async | Bytes/Second  |
| instance_diskio_io_service_bytes_read  | Bytes/Second  |
| instance_diskio_io_service_bytes_sync  | Bytes/Second  |
| instance_diskio_io_service_bytes_total | Bytes/Second  |
| instance_diskio_io_service_bytes_write | Bytes/Second  |

| Resource Attribute   |
|----------------------|
| ClusterName          |
| InstanceType         |
| AutoScalingGroupName |
| Timestamp            |
| Type                 |
| Version              |
| Sources              |
| ContainerInstanceId  |
| InstanceId           |
| EBSVolumeId          |

### Instance Filesystem
| Metric                          | Unit    |
|---------------------------------|---------|
| instance_filesystem_available   | Bytes   |
| instance_filesystem_capacity    | Bytes   |
| instance_filesystem_inodes      | Count   |
| instance_filesystem_inodes_free | Count   |
| instance_filesystem_usage       | Bytes   |
| instance_filesystem_utilization | Percent |

| Resource Attribute   |
|----------------------|
| ClusterName          |
| InstanceType         |
| AutoScalingGroupName |
| Timestamp            |
| Type                 |
| Version              |
| Sources              |
| ContainerInstanceId  |
| InstanceId           |
| EBSVolumeId          |

### Instance Network
| Metric                                 | Unit         |
|----------------------------------------|--------------|
| instance_interface_network_rx_bytes    | Bytes/Second |
| instance_interface_network_rx_dropped  | Count/Second |
| instance_interface_network_rx_errors   | Count/Second |
| instance_interface_network_rx_packets  | Count/Second |
| instance_interface_network_total_bytes | Bytes/Second |
| instance_interface_network_tx_bytes    | Bytes/Second |
| instance_interface_network_tx_dropped  | Count/Second |
| instance_interface_network_tx_errors   | Count/Second |
| instance_interface_network_tx_packets  | Count/Second |

| Resource Attribute   |
|----------------------|
| ClusterName          |
| InstanceType         |
| AutoScalingGroupName |
| Timestamp            |
| Type                 |
| Version              |
| Sources              |
| ContainerInstanceId  |
| InstanceId           |
| EBSVolumeId          |

# Warnings

## Root permissions

When using this component, the collector process needs root permission to be able to read the content of the files located in the following locations:

* `/`
* `/var/run/docker.sock`
* `/var/lib/docker`
* `/run/containerd/containerd.sock`
* `/sys`
* `/dev/disk`

This requirement comes from the fact that this component is based on [cAdvisor](https://github.com/google/cadvisor).
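
As a quick pre-flight check, a short script along these lines can report which of the locations listed above exist and are readable by the user running it. It is not part of the receiver: the function name `check_paths` and the three status labels are invented for this sketch, and it assumes Python is available wherever the check runs. A "missing" socket is not necessarily a problem, since a node typically runs either Docker or containerd.

```python
import os

# Locations listed in the "Root permissions" warning above.
REQUIRED_PATHS = [
    "/",
    "/var/run/docker.sock",
    "/var/lib/docker",
    "/run/containerd/containerd.sock",
    "/sys",
    "/dev/disk",
]


def check_paths(paths=REQUIRED_PATHS):
    """Return {path: status} with status 'ok', 'missing', or 'unreadable'.

    os.access() only inspects permissions for the real user ID, so this is
    an approximation of what the collector process will be able to read.
    """
    results = {}
    for path in paths:
        if not os.path.exists(path):
            results[path] = "missing"
        elif os.access(path, os.R_OK):
            results[path] = "ok"
        else:
            results[path] = "unreadable"
    return results


if __name__ == "__main__":
    for path, status in check_paths().items():
        print(f"{status:<10} {path}")
```

Running it both as a regular user and as root on a node shows which locations depend on elevated permissions.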