Status | |
---|---|
Stability | beta: metrics |
Distributions | contrib, aws, observiq, sumo |
Warnings | Other |
Issues | |
Code Owners | @Aneurysm9, @pxaws |
AWS Container Insights Receiver (awscontainerinsightreceiver) is an AWS-specific receiver that supports CloudWatch Container Insights. CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices. Data is collected as performance log events using the embedded metric format (EMF). From the EMF data, Amazon CloudWatch can create aggregated CloudWatch metrics at the cluster, node, pod, task, and service level.
CloudWatch Container Insights has been supported by the ECS Agent and the CloudWatch Agent to collect infrastructure metrics for many resources such as CPU, memory, disk, and network. To migrate existing customers to OpenTelemetry, AWS Container Insights Receiver (together with CloudWatch EMF Exporter) aims to support the same CloudWatch Container Insights experience. The sample configurations in this document cover Amazon EKS and Amazon ECS.
See the design doc
Example configuration:
receivers:
  awscontainerinsightreceiver:
    # all parameters are optional
    collection_interval: 60s
    container_orchestrator: eks
    add_service_as_attribute: true
    prefer_full_pod_name: false
    add_full_pod_name_metric_label: false
There is no need to provide any parameters since they are all optional.
collection_interval (optional)
The interval at which metrics should be collected. The default is 60 seconds.
container_orchestrator (optional)
The type of container orchestration service, e.g. eks or ecs. The default is eks.
add_service_as_attribute (optional)
Whether to add the associated service name as an attribute. The default is true.
prefer_full_pod_name (optional)
The "PodName" attribute is set based on the name of the relevant controller, such as a DaemonSet, Job, ReplicaSet, or ReplicationController. If it cannot be set that way and prefer_full_pod_name is true, the "PodName" attribute is set to the pod's own name. The default value is false.
add_full_pod_name_metric_label (optional)
The "FullPodName" attribute is the pod name including its suffix. If this option is false, the FullPodName label is not added. The default value is false.
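To illustrate the two pod-name options described above, the following fragment is a minimal sketch that turns both on. It uses only the parameters documented in this section; the values are chosen for illustration and are not a recommendation.

```yaml
receivers:
  awscontainerinsightreceiver:
    container_orchestrator: eks
    # Fall back to the pod's own name when PodName cannot be derived
    # from a controller (DaemonSet, Job, ReplicaSet, ...).
    prefer_full_pod_name: true
    # Also attach the FullPodName attribute (pod name including suffix).
    add_full_pod_name_metric_label: true
```

Parameters that are left out keep the defaults listed above.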
This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an EKS cluster:
# create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: aws-otel-eks
  labels:
    name: aws-otel-eks
---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-otel-sa
  namespace: aws-otel-eks
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["otel-container-insight-clusterleader"]
    verbs: ["get","update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: aws-otel-sa
    namespace: aws-otel-eks
roleRef:
  kind: ClusterRole
  name: aoc-agent-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  namespace: aws-otel-eks
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    extensions:
      health_check:
    receivers:
      awscontainerinsightreceiver:
    processors:
      batch/metrics:
        timeout: 60s
    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # node metrics
          - dimensions: [[NodeName, InstanceId, ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
              - node_cpu_usage_total
              - node_cpu_limit
              - node_memory_working_set
              - node_memory_limit
          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization
              - pod_network_rx_bytes
              - pod_network_tx_bytes
              - pod_cpu_utilization_over_pod_limit
              - pod_memory_utilization_over_pod_limit
          - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_reserved_capacity
              - pod_memory_reserved_capacity
          - dimensions: [[PodName, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_number_of_container_restarts
          # cluster metrics
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - cluster_node_count
              - cluster_failed_node_count
          # service metrics
          - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - service_number_of_running_pods
          # node fs metrics
          - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
            metric_name_selectors:
              - node_filesystem_utilization
          # namespace metrics
          - dimensions: [[Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - namespace_number_of_running_pods
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]
      extensions: [health_check]
---
# create Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-otel-eks-ci
  namespace: aws-otel-eks
spec:
  selector:
    matchLabels:
      name: aws-otel-eks-ci
  template:
    metadata:
      labels:
        name: aws-otel-eks-ci
    spec:
      containers:
        - name: aws-otel-collector
          image: {collector-image-url}
          env:
            #- name: AWS_REGION
            #  value: "us-east-1"
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          imagePullPolicy: Always
          command:
            - "/awscollector"
            - "--config=/conf/otel-agent-config.yaml"
          volumeMounts:
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
            - name: otel-agent-config-vol
              mountPath: /conf
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 200m
              memory: 200Mi
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock
          hostPath:
            path: /run/containerd/containerd.sock
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk/
      serviceAccountName: aws-otel-sa
To deploy to an EKS cluster:
kubectl apply -f config.yaml
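The EKS ConfigMap above defines a debug exporter, but its metrics pipeline lists only awsemf. A sketch of the service section with debug added, mirroring the exporters list used in the ECS sample in this document, might look like the fragment below. Treat it as a troubleshooting aid rather than a default, since detailed verbosity can produce a large volume of log output.

```yaml
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      # debug is listed next to awsemf so that collected metrics are also
      # written to the collector's own log output
      exporters: [awsemf, debug]
  extensions: [health_check]
```

After editing the ConfigMap, the manifest can be re-applied with the same kubectl apply command; the collector pods may need to be restarted before they pick up the change.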
Metric | Unit |
---|---|
cluster_failed_node_count | Count |
cluster_node_count | Count |
| Resource Attribute |
|--------------------|
| ClusterName |
| NodeName |
| Type |
| Timestamp |
| Version |
| Sources |
Metric | Unit |
---|---|
namespace_number_of_running_pods | Count |
| Resource Attribute |
|--------------------|
| ClusterName |
| NodeName |
| Namespace |
| Type |
| Timestamp |
| Version |
| Sources |
| kubernetes |
Metric | Unit |
---|---|
service_number_of_running_pods | Count |
| Resource Attribute |
|--------------------|
| ClusterName |
| NodeName |
| Namespace |
| Service |
| Type |
| Timestamp |
| Version |
| Sources |
| kubernetes |
Metric | Unit |
---|---|
node_cpu_limit | Millicore |
node_cpu_request | Millicore |
node_cpu_reserved_capacity | Percent |
node_cpu_usage_system | Millicore |
node_cpu_usage_total | Millicore |
node_cpu_usage_user | Millicore |
node_cpu_utilization | Percent |
node_memory_cache | Bytes |
node_memory_failcnt | Count |
node_memory_hierarchical_pgfault | Count/Second |
node_memory_hierarchical_pgmajfault | Count/Second |
node_memory_limit | Bytes |
node_memory_mapped_file | Bytes |
node_memory_max_usage | Bytes |
node_memory_pgfault | Count/Second |
node_memory_pgmajfault | Count/Second |
node_memory_request | Bytes |
node_memory_reserved_capacity | Percent |
node_memory_rss | Bytes |
node_memory_swap | Bytes |
node_memory_usage | Bytes |
node_memory_utilization | Percent |
node_memory_working_set | Bytes |
node_network_rx_bytes | Bytes/Second |
node_network_rx_dropped | Count/Second |
node_network_rx_errors | Count/Second |
node_network_rx_packets | Count/Second |
node_network_total_bytes | Bytes/Second |
node_network_tx_bytes | Bytes/Second |
node_network_tx_dropped | Count/Second |
node_network_tx_errors | Count/Second |
node_network_tx_packets | Count/Second |
node_number_of_running_containers | Count |
node_number_of_running_pods | Count |
| Resource Attribute |
|----------------------|
| ClusterName |
| InstanceType |
| NodeName |
| Timestamp |
| Type |
| Version |
| Sources |
| kubernetes |
Metric | Unit |
---|---|
node_diskio_io_serviced_async | Count/Second |
node_diskio_io_serviced_read | Count/Second |
node_diskio_io_serviced_sync | Count/Second |
node_diskio_io_serviced_total | Count/Second |
node_diskio_io_serviced_write | Count/Second |
node_diskio_io_service_bytes_async | Bytes/Second |
node_diskio_io_service_bytes_read | Bytes/Second |
node_diskio_io_service_bytes_sync | Bytes/Second |
node_diskio_io_service_bytes_total | Bytes/Second |
node_diskio_io_service_bytes_write | Bytes/Second |
| Resource Attribute |
|----------------------|
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| NodeName |
| Timestamp |
| EBSVolumeId |
| device |
| Type |
| Version |
| Sources |
| kubernetes |
Metric | Unit |
---|---|
node_filesystem_available | Bytes |
node_filesystem_capacity | Bytes |
node_filesystem_inodes | Count |
node_filesystem_inodes_free | Count |
node_filesystem_usage | Bytes |
node_filesystem_utilization | Percent |
| Resource Attribute |
|----------------------|
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| NodeName |
| Timestamp |
| EBSVolumeId |
| device |
| fstype |
| Type |
| Version |
| Sources |
| kubernetes |
Metric | Unit |
---|---|
node_interface_network_rx_bytes | Bytes/Second |
node_interface_network_rx_dropped | Count/Second |
node_interface_network_rx_errors | Count/Second |
node_interface_network_rx_packets | Count/Second |
node_interface_network_total_bytes | Bytes/Second |
node_interface_network_tx_bytes | Bytes/Second |
node_interface_network_tx_dropped | Count/Second |
node_interface_network_tx_errors | Count/Second |
node_interface_network_tx_packets | Count/Second |
| Resource Attribute |
|----------------------|
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| NodeName |
| Timestamp |
| Type |
| Version |
| interface |
| Sources |
| kubernetes |
Metric | Unit |
---|---|
pod_cpu_limit | Millicore |
pod_cpu_request | Millicore |
pod_cpu_reserved_capacity | Percent |
pod_cpu_usage_system | Millicore |
pod_cpu_usage_total | Millicore |
pod_cpu_usage_user | Millicore |
pod_cpu_utilization | Percent |
pod_cpu_utilization_over_pod_limit | Percent |
pod_memory_cache | Bytes |
pod_memory_failcnt | Count |
pod_memory_hierarchical_pgfault | Count/Second |
pod_memory_hierarchical_pgmajfault | Count/Second |
pod_memory_limit | Bytes |
pod_memory_mapped_file | Bytes |
pod_memory_max_usage | Bytes |
pod_memory_pgfault | Count/Second |
pod_memory_pgmajfault | Count/Second |
pod_memory_request | Bytes |
pod_memory_reserved_capacity | Percent |
pod_memory_rss | Bytes |
pod_memory_swap | Bytes |
pod_memory_usage | Bytes |
pod_memory_utilization | Percent |
pod_memory_utilization_over_pod_limit | Percent |
pod_memory_working_set | Bytes |
pod_network_rx_bytes | Bytes/Second |
pod_network_rx_dropped | Count/Second |
pod_network_rx_errors | Count/Second |
pod_network_rx_packets | Count/Second |
pod_network_total_bytes | Bytes/Second |
pod_network_tx_bytes | Bytes/Second |
pod_network_tx_dropped | Count/Second |
pod_network_tx_errors | Count/Second |
pod_network_tx_packets | Count/Second |
pod_number_of_container_restarts | Count |
pod_number_of_containers | Count |
pod_number_of_running_containers | Count |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
InstanceId |
InstanceType |
K8sPodName |
Namespace |
NodeName |
PodId |
Timestamp |
Type |
Version |
Sources |
kubernetes |
pod_status |
Metric | Unit |
---|---|
pod_interface_network_rx_bytes | Bytes/Second |
pod_interface_network_rx_dropped | Count/Second |
pod_interface_network_rx_errors | Count/Second |
pod_interface_network_rx_packets | Count/Second |
pod_interface_network_total_bytes | Bytes/Second |
pod_interface_network_tx_bytes | Bytes/Second |
pod_interface_network_tx_dropped | Count/Second |
pod_interface_network_tx_errors | Count/Second |
pod_interface_network_tx_packets | Count/Second |
| Resource Attribute |
|----------------------|
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| K8sPodName |
| Namespace |
| NodeName |
| PodId |
| Timestamp |
| Type |
| Version |
| interface |
| Sources |
| kubernetes |
| pod_status |
Metric | Unit |
---|---|
container_cpu_limit | Millicore |
container_cpu_request | Millicore |
container_cpu_usage_system | Millicore |
container_cpu_usage_total | Millicore |
container_cpu_usage_user | Millicore |
container_cpu_utilization | Percent |
container_memory_cache | Bytes |
container_memory_failcnt | Count |
container_memory_hierarchical_pgfault | Count/Second |
container_memory_hierarchical_pgmajfault | Count/Second |
container_memory_limit | Bytes |
container_memory_mapped_file | Bytes |
container_memory_max_usage | Bytes |
container_memory_pgfault | Count/Second |
container_memory_pgmajfault | Count/Second |
container_memory_request | Bytes |
container_memory_rss | Bytes |
container_memory_swap | Bytes |
container_memory_usage | Bytes |
container_memory_utilization | Percent |
container_memory_working_set | Bytes |
number_of_container_restarts | Count |
Resource Attribute |
---|
AutoScalingGroupName |
ClusterName |
ContainerId |
ContainerName |
InstanceId |
InstanceType |
K8sPodName |
Namespace |
NodeName |
PodId |
Timestamp |
Type |
Version |
Sources |
kubernetes |
container_status |
container_status_reason |
container_last_termination_reason |
The attribute container_status_reason is present only when container_status is in the "Waiting" or "Terminated" state. The attribute container_last_termination_reason is present only when container_status is in the "Terminated" state.
This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an ECS cluster to collect the instance-level metrics:
receivers:
  awscontainerinsightreceiver:
    collection_interval: 10s
    container_orchestrator: ecs
processors:
  batch/metrics:
    timeout: 60s
exporters:
  awsemf:
    namespace: ContainerInsightsEC2Instance
    log_group_name: '/aws/ecs/containerinsights/{ClusterName}/performance'
    log_stream_name: 'instanceTelemetry/{ContainerInstanceId}'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources]
    metric_declarations:
      # instance metrics
      - dimensions: [ [ ContainerInstanceId, InstanceId, ClusterName] ]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_filesystem_utilization
      - dimensions: [ [ClusterName] ]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_cpu_usage_total
          - instance_cpu_limit
          - instance_memory_working_set
          - instance_memory_limit
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      exporters: [awsemf,debug]
To deploy to an ECS cluster, check this doc for details.
Metric | Unit |
---|---|
instance_cpu_limit | Millicore |
instance_cpu_reserved_capacity | Percent |
instance_cpu_usage_system | Millicore |
instance_cpu_usage_total | Millicore |
instance_cpu_usage_user | Millicore |
instance_cpu_utilization | Percent |
instance_memory_cache | Bytes |
instance_memory_failcnt | Count |
instance_memory_hierarchical_pgfault | Count/Second |
instance_memory_hierarchical_pgmajfault | Count/Second |
instance_memory_limit | Bytes |
instance_memory_mapped_file | Bytes |
instance_memory_max_usage | Bytes |
instance_memory_pgfault | Count/Second |
instance_memory_pgmajfault | Count/Second |
instance_memory_reserved_capacity | Percent |
instance_memory_rss | Bytes |
instance_memory_swap | Bytes |
instance_memory_usage | Bytes |
instance_memory_utilization | Percent |
instance_memory_working_set | Bytes |
instance_network_rx_bytes | Bytes/Second |
instance_network_rx_dropped | Count/Second |
instance_network_rx_errors | Count/Second |
instance_network_rx_packets | Count/Second |
instance_network_total_bytes | Bytes/Second |
instance_network_tx_bytes | Bytes/Second |
instance_network_tx_dropped | Count/Second |
instance_network_tx_errors | Count/Second |
instance_network_tx_packets | Count/Second |
instance_number_of_running_tasks | Count |
Resource Attribute |
---|
ClusterName |
InstanceType |
AutoScalingGroupName |
Timestamp |
Type |
Version |
Sources |
ContainerInstanceId |
InstanceId |
Metric | Unit |
---|---|
instance_diskio_io_serviced_async | Count/Second |
instance_diskio_io_serviced_read | Count/Second |
instance_diskio_io_serviced_sync | Count/Second |
instance_diskio_io_serviced_total | Count/Second |
instance_diskio_io_serviced_write | Count/Second |
instance_diskio_io_service_bytes_async | Bytes/Second |
instance_diskio_io_service_bytes_read | Bytes/Second |
instance_diskio_io_service_bytes_sync | Bytes/Second |
instance_diskio_io_service_bytes_total | Bytes/Second |
instance_diskio_io_service_bytes_write | Bytes/Second |
Resource Attribute |
---|
ClusterName |
InstanceType |
AutoScalingGroupName |
Timestamp |
Type |
Version |
Sources |
ContainerInstanceId |
InstanceId |
EBSVolumeId |
Metric | Unit |
---|---|
instance_filesystem_available | Bytes |
instance_filesystem_capacity | Bytes |
instance_filesystem_inodes | Count |
instance_filesystem_inodes_free | Count |
instance_filesystem_usage | Bytes |
instance_filesystem_utilization | Percent |
| Resource Attribute |
|----------------------|
| ClusterName |
| InstanceType |
| AutoScalingGroupName |
| Timestamp |
| Type |
| Version |
| Sources |
| ContainerInstanceId |
| InstanceId |
| EBSVolumeId |
Metric | Unit |
---|---|
instance_interface_network_rx_bytes | Bytes/Second |
instance_interface_network_rx_dropped | Count/Second |
instance_interface_network_rx_errors | Count/Second |
instance_interface_network_rx_packets | Count/Second |
instance_interface_network_total_bytes | Bytes/Second |
instance_interface_network_tx_bytes | Bytes/Second |
instance_interface_network_tx_dropped | Count/Second |
instance_interface_network_tx_errors | Count/Second |
instance_interface_network_tx_packets | Count/Second |
| Resource Attribute |
|----------------------|
| ClusterName |
| InstanceType |
| AutoScalingGroupName |
| Timestamp |
| Type |
| Version |
| Sources |
| ContainerInstanceId |
| InstanceId |
| EBSVolumeId |
When using this component, the collector process needs root permission to be able to read the content of the files located in the following locations:
/
/var/run/docker.sock
/var/lib/docker
/run/containerd/containerd.sock
/sys
/dev/disk
This requirement comes from the fact that this component is based on cAdvisor.
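In the EKS DaemonSet shown earlier, one way to meet this requirement is a container-level securityContext. The fragment below is a sketch that assumes the collector image does not already run as root. securityContext and runAsUser are standard Kubernetes fields, but whether they are needed depends on the image you deploy and on any pod security policies in your cluster.

```yaml
containers:
  - name: aws-otel-collector
    image: {collector-image-url}
    # Assumption: run the collector process as UID 0 (root) so it can read
    # the host paths listed above through the read-only mounts.
    securityContext:
      runAsUser: 0
```

Only the securityContext lines are new; the rest of the container spec stays as in the sample manifest.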