{{ template "chart.header" . }}

{{ template "chart.versionBadge" . }}{{ template "chart.typeBadge" . }}{{ template "chart.appVersionBadge" . }}

{{ template "chart.description" . }}

{{ template "chart.sourcesSection" . }}

{{ template "chart.requirementsSection" . }}

## Chart Repo

Add the following repo to use the chart:

```console
helm repo add grafana https://grafana.github.io/helm-charts
```

## Upgrading

### Upgrading an existing Release to a new major version

Major version upgrades listed here indicate an incompatible breaking change that requires manual action.

### From 0.68.x to 0.69.0

The in-memory `fifocache` has been renamed to the more general `embedded_cache`, which currently doesn't have a `max_size_items` attribute.

```yaml
loki:
  config: |
    chunk_store_config:
      chunk_cache_config:
        embedded_cache:
          enabled: false
```

`compactor_address` has to be set explicitly in the `common` section of the config.

```yaml
loki:
  config: |
    common:
      compactor_address: {{"{{"}} include "loki.compactorFullname" . {{"}}"}}:3100
```

### From 0.41.x to 0.42.0

All containers were previously named "loki". This version changes the container names to make the chart compatible with the loki-mixin. The container names now reflect the component (querier, distributor, ingester, ...). If you use custom Prometheus rules that reference the container name, you will probably have to change them.

### From 0.34.x to 0.35.0

This version updates the `Ingress` API version of the Loki Gateway component to `networking.k8s.io/v1`, provided that the cluster supports it. Note the new structure of the ingress configuration section in `values.yaml`.

```yaml
gateway:
  ingress:
    enabled: true
    # Newly added optional property
    ingressClassName: nginx
    hosts:
      - host: gateway.loki.example.com
        paths:
          # New data structure introduced
          - path: /
            # Newly added optional property
            pathType: Prefix
```

### From 0.30.x to 0.31.0

This version changes the `podManagementPolicy` of the Loki components that run as StatefulSets from the default `OrderedReady` to `Parallel`. This allows Loki to scale better, e.g. in case the pods weren't terminated gracefully. The change requires a manual action: the existing StatefulSets must be deleted before upgrading with Helm.

```bash
# Delete the Ingesters StatefulSets
kubectl delete statefulset RELEASE_NAME-loki-distributed-ingester -n LOKI_NAMESPACE --cascade=orphan
# Delete the Queriers StatefulSets
kubectl delete statefulset RELEASE_NAME-loki-distributed-querier -n LOKI_NAMESPACE --cascade=orphan
```

{{ template "chart.valuesSection" . }}

## Components

The chart supports the components shown in the following table.
Ingester, distributor, querier, and query-frontend are always installed.
The other components are optional.

| Component | Optional | Enabled by default |
| --- | --- | --- |
| gateway | ✅ | ✅ |
| ingester | ❎ | n/a |
| distributor | ❎ | n/a |
| querier | ❎ | n/a |
| query-frontend | ❎ | n/a |
| table-manager | ✅ | ❎ |
| compactor | ✅ | ❎ |
| ruler | ✅ | ❎ |
| index-gateway | ✅ | ❎ |
| memcached-chunks | ✅ | ❎ |
| memcached-frontend | ✅ | ❎ |
| memcached-index-queries | ✅ | ❎ |
| memcached-index-writes | ✅ | ❎ |

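
As a rough sketch, optional components are switched on through their `enabled` flags in the values file. `compactor.enabled` and `ruler.enabled` appear elsewhere in this README; the `indexGateway` key is an assumption based on the chart's camelCase naming and should be checked against the values table before use.

```yaml
# Sketch: turn on a few optional components.
compactor:
  enabled: true
ruler:
  enabled: true
# Assumed key name; confirm it in values.yaml.
indexGateway:
  enabled: true
```
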
## Configuration

This chart configures Loki in microservices mode.
It has been tested to work with [boltdb-shipper](https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/)
and [memberlist](https://grafana.com/docs/loki/latest/configuration/#memberlist_config), while other storage and discovery options should work as well.
However, the chart does not support setting up Consul or Etcd for discovery,
and it is not intended to support these going forward.
They would have to be set up separately.
Instead, memberlist can be used, which does not require a separate key/value store.
The chart creates a headless service for memberlist, which ingester, distributor, querier, and ruler are part of.

----

**NOTE:**
In its default configuration, the chart uses `boltdb-shipper` and `filesystem` as storage.
The reason for this is that the chart can be validated and installed in a CI pipeline.
However, this setup is not fully functional.
Querying will not be possible (or will be limited to the ingesters' in-memory caches) because it would require shared storage between ingesters and queriers,
which the chart does not support and which would require a volume that supports the `ReadWriteMany` access mode anyway.
The recommendation is to use object storage, such as S3, GCS, MinIO, etc., or one of the other options documented at https://grafana.com/docs/loki/latest/storage/.

Alternatively, in order to quickly test Loki using filesystem storage, the [single binary chart](https://github.com/grafana/helm-charts/tree/main/charts/loki) can be used.

----

### Directory and File Locations

* Volumes are mounted to `/var/loki`. The various directories Loki needs should be configured as subdirectories (e.g. `/var/loki/index`, `/var/loki/cache`). Loki will create the directories automatically.
* The config file is mounted to `/etc/loki/config/config.yaml` and passed as a CLI argument.

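
For illustration, the subdirectory convention could look like the following fragment. It is only a sketch: it assumes `boltdb-shipper` is the index store and uses the two example paths from the list above for Loki's `active_index_directory` and `cache_location` settings.

```yaml
loki:
  structuredConfig:
    storage_config:
      boltdb_shipper:
        # Subdirectories of the volume mounted at /var/loki;
        # Loki creates them if they do not exist.
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
```
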
### Example configuration using memberlist, boltdb-shipper, and S3 for storage

```yaml
loki:
  structuredConfig:
    ingester:
      # Disable chunk transfer which is not possible with statefulsets
      # and unnecessary for boltdb-shipper
      max_transfer_retries: 0
      chunk_idle_period: 1h
      chunk_target_size: 1536000
      max_chunk_age: 1h
    storage_config:
      aws:
        s3: s3://eu-central-1
        bucketnames: my-loki-s3-bucket
      boltdb_shipper:
        shared_store: s3
    schema_config:
      configs:
        - from: 2020-09-07
          store: boltdb-shipper
          object_store: aws
          schema: v11
          index:
            prefix: loki_index_
            period: 24h
```

The above configuration selectively overrides default values found in the `loki.config` template file.
Using `loki.structuredConfig`, almost any configuration parameter can be set externally (array elements need special consideration).

```console
helm upgrade loki-distributed grafana/loki-distributed --install -f values.yaml --set loki.structuredConfig.storage_config.aws.bucketnames=my-loki-bucket
```

`loki.config`, `loki.schemaConfig`, and `loki.storageConfig` may also be used in conjunction with `loki.structuredConfig`. Values found in `loki.structuredConfig` take precedence. Array values, such as those found in `loki.schema_config`, are overridden wholesale rather than merged.
For `loki.schema_config`, it is generally expected that it will always be configured per deployment, because its entries record the history of schema versions and schema configurations over the lifetime of a given Loki instance.

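
The wholesale replacement of arrays can be sketched as follows. The entries are illustrative only (they reuse the dates and prefixes from the S3 example); the point is that the `configs` list under `loki.structuredConfig` replaces the list under `loki.schemaConfig` rather than being appended to it.

```yaml
loki:
  # Base schema, e.g. from a shared values file (illustrative values)
  schemaConfig:
    configs:
      - from: 2020-09-07
        store: boltdb-shipper
        object_store: filesystem
        schema: v11
        index:
          prefix: loki_index_
          period: 24h
  # Takes precedence. This list replaces the one above as a whole,
  # so every entry that should survive must be repeated here.
  structuredConfig:
    schema_config:
      configs:
        - from: 2020-09-07
          store: boltdb-shipper
          object_store: aws
          schema: v11
          index:
            prefix: loki_index_
            period: 24h
```
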
Note that `loki.config`, when used, must be configured as a string.
This is required because it is passed through the `tpl` function in order to support templating.

When using `loki.config`, the passed-in template must include template sections for `loki.schemaConfig` and `loki.storageConfig` for those to continue to work as expected.

Because the config file is templated, it is also possible to reference other values provided to Helm, e.g. to externalize S3 bucket names:

```yaml
loki:
  config: |
    storage_config:
      aws:
        s3: s3://eu-central-1
        bucketnames: {{"{{"}} .Values.bucketnames {{"}}"}}
```

```console
helm upgrade loki-distributed grafana/loki-distributed --install -f values.yaml --set bucketnames=my-loki-bucket
```

## Gateway

By default, and inspired by Grafana's [Tanka setup](https://github.com/grafana/loki/tree/master/production/ksonnet/loki), the chart installs the gateway component, an NGINX instance that exposes Loki's API
and automatically proxies requests to the correct Loki components (distributor, querier, query-frontend).
The gateway must be enabled if an Ingress is required, since the Ingress exposes the gateway only.
If the gateway is enabled, Grafana and log shipping agents, such as Promtail, should be configured to use the gateway.
If NetworkPolicies are enabled, they are more restrictive when the gateway is enabled.

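
As a hedged example, a Promtail client section pointing at the gateway might look like the fragment below. The service host is an assumption: it follows the `RELEASE_NAME-loki-distributed-...` naming used in the upgrade notes and presumes the gateway service listens on the default HTTP port, so check the actual service name and port in your cluster.

```yaml
# Promtail configuration fragment (not part of this chart's values)
clients:
  # Assumed gateway service name; replace the placeholders.
  - url: http://RELEASE_NAME-loki-distributed-gateway.LOKI_NAMESPACE.svc/loki/api/v1/push
```
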
## Metrics

Loki exposes Prometheus metrics.
The chart can create ServiceMonitor objects for all Loki components.

```yaml
serviceMonitor:
  enabled: true
```

Furthermore, it is possible to add Prometheus rules:

```yaml
prometheusRule:
  enabled: true
  groups:
    - name: loki-rules
      rules:
        - record: job:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)
        - record: job_route:loki_request_duration_seconds_bucket:sum_rate
          expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)
        - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
          expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (node, namespace, pod, container)
```

## Caching

The chart can configure up to four Memcached instances for the various caches Loki can use.
Configuration works the same for all caches.
The configuration of `memcached-chunks` below demonstrates setting additional options.

Exporters for the Memcached instances can be configured as well.

```yaml
memcachedExporter:
  enabled: true
```

### memcached-chunks

```yaml
memcachedChunks:
  enabled: true
  replicas: 2
  extraArgs:
    - -m 2048
    - -I 2m
    - -v
  resources:
    requests:
      cpu: 500m
      memory: 3Gi
    limits:
      cpu: "2"
      memory: 3Gi

loki:
  config: |
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          consistent_hash: true
          host: {{"{{"}} include "loki.memcachedChunksFullname" . {{"}}"}}
          service: memcached-client
```

### memcached-frontend

```yaml
memcachedFrontend:
  enabled: true

loki:
  config: |
    query_range:
      cache_results: true
      results_cache:
        cache:
          memcached_client:
            consistent_hash: true
            host: {{"{{"}} include "loki.memcachedFrontendFullname" . {{"}}"}}
            max_idle_conns: 16
            service: memcached-client
            timeout: 500ms
            update_interval: 1m
```

### memcached-index-queries

```yaml
memcachedIndexQueries:
  enabled: true

loki:
  config: |
    storage_config:
      index_queries_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          consistent_hash: true
          host: {{"{{"}} include "loki.memcachedIndexQueriesFullname" . {{"}}"}}
          service: memcached-client
```

### memcached-index-writes

NOTE: This cache is not used with `boltdb-shipper` and should not be enabled in that case.

```yaml
memcachedIndexWrites:
  enabled: true

loki:
  config: |
    chunk_store_config:
      write_dedupe_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          consistent_hash: true
          host: {{"{{"}} include "loki.memcachedIndexWritesFullname" . {{"}}"}}
          service: memcached-client
```

## Compactor

Compactor is an optional component which must be enabled explicitly.
The chart automatically sets the correct working directory as a command-line argument.
The correct storage backend must be configured, e.g. `s3`.

```yaml
compactor:
  enabled: true

loki:
  config: |
    compactor:
      shared_store: s3
```

## Ruler

Ruler is an optional component which must be enabled explicitly.
In addition to installing the ruler, the chart also supports creating rules.
Rules files must be added to directories named after the tenants.
See `values.yaml` for a more detailed example.

```yaml
ruler:
  enabled: true
  directories:
    fake:
      rules.txt: |
        groups:
          - name: should_fire
            rules:
              - alert: HighPercentageError
                expr: |
                  sum(rate({app="loki"} |= "error" [5m])) by (job)
                    /
                  sum(rate({app="loki"}[5m])) by (job)
                    > 0.05
                for: 10m
                labels:
                  severity: warning
                annotations:
                  summary: High error percentage
```