User Workload Monitoring

User Workload Monitoring

Prerequisites

Setup User Workload Monitoring

  DEFAULT_STORAGE_CLASS=$(oc get sc -A -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}')
  cat manifests/cluster-monitoring-config.yaml | sed 's/storageClassName:.*/storageClassName: '$DEFAULT_STORAGE_CLASS'/' | oc apply -f  -
  sleep 10
  oc -n openshift-user-workload-monitoring wait --for condition=ready \
   --timeout=180s pod -l app.kubernetes.io/name=prometheus
  oc -n openshift-user-workload-monitoring wait --for condition=ready \
   --timeout=180s pod -l app.kubernetes.io/name=thanos-ruler
  oc get pvc -n openshift-monitoring

Output

  configmap/cluster-monitoring-config created
  pod/prometheus-user-workload-0 condition met
  pod/prometheus-user-workload-1 condition met
  pod/thanos-ruler-user-workload-0 condition met
  pod/thanos-ruler-user-workload-1 condition met
  NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
  prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-c1513ebb-6fcf-4136-b3a2-41a7a3afd83b   50Gi       RWO            ocs-external-storagecluster-ceph-rbd   21s
  prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-f9f9d7b1-3303-41ed-9549-43fab596cebe   50Gi       RWO            ocs-external-storagecluster-ceph-rbd   21s

Verify monitoring stack

oc  get pod -n openshift-user-workload-monitoring

Sample output

NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-787b6f6d75-lrgjg   2/2     Running   0          70s
prometheus-user-workload-0             6/6     Running   0          63s
prometheus-user-workload-1             6/6     Running   0          63s
thanos-ruler-user-workload-0           4/4     Running   0          62s
thanos-ruler-user-workload-1           4/4     Running   0          62s

CPU and Memory used by User Workload Monitoring

Overall resouces consumed by user workload monitoring

CPU Usage

Memory Usage

CPU Quota

Memory Quota

Service Monitoring

Deploy frontend/backend app with custom metrics

oc create -f manifests/frontend-v1-and-backend-v1-JVM.yaml -n project1

Call frontend app with curl

curl https://$(oc get route frontend -o jsonpath='{.spec.host}' -n project1)

Output

Frontend version: v1 => [Backend: http://backend:8080, Response: 200, Body: Backend version:v1, Response:200, Host:backend-v1-5587465b8d-9kvsc, Status:200, Message: Hello, World]

Check for backend's metrics

Check JVM heap size

oc exec -n project1 $(oc get pods -l app=backend \
--no-headers  -o custom-columns='Name:.metadata.name' \
-n project1 | head -n 1 ) \
-- curl -s  http://localhost:8080/q/metrics | grep heap

Sample output

jvm_memory_used_bytes{area="heap",id="Tenured Gen"} 1.1301808E7
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'"} 5261952.0
jvm_memory_used_bytes{area="heap",id="Eden Space"} 2506768.0
jvm_memory_used_bytes{area="nonheap",id="Metaspace"} 3.3255E7
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'"} 1389824.0
jvm_memory_used_bytes{area="heap",id="Survivor Space"} 524288.0
jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space"} 4096808.0
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'"} 1120640.0
...

Check for backend application related metrics

curl https://$(oc get route frontend -n project1 -o jsonpath='{.spec.host}')
oc exec -n project1 $(oc get pods -l app=backend \
--no-headers  -o custom-columns='Name:.metadata.name' \
-n project1 | head -n 1 ) \
-- curl -s  http://localhost:8080/q/metrics | grep http_server_requests_seconds

Check that http_server_requests_seconds_count with method GET and root URI value is 1 with return status 200

# HELP http_server_requests_seconds
http_server_requests_seconds_count{method="GET",outcome="SUCCESS",status="200",uri="root"} 2.0
http_server_requests_seconds_sum{method="GET",outcome="SUCCESS",status="200",uri="root"} 3.769968656
# TYPE http_server_requests_seconds_max gauge
# HELP http_server_requests_seconds_max
http_server_requests_seconds_max{method="GET",outcome="SUCCESS",status="200",uri="root"} 3.486753482

Create Service Monitoring to monitor backend service

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: backend-monitor
  spec:
    endpoints:
    - interval: 30s
      port: http
      path: /q/metrics # Get metrics from URI /q/metrics
      scheme: http
    selector:
      matchLabels:
        app: backend # select only label app = backend

Create service monitor

  oc apply -f manifests/backend-service-monitor.yaml -n project1

Remark: Role monitor-edit is required for create ServiceMonitor and PodMonitor resources. Following example is granting role montior-edit to user1 for project1

  oc adm policy add-role-to-user  monitoring-edit user1 -n project1

Developer Console monitoring metrics

Select application metrics

Application metrics

Scale backend-v1 and frontend-v1 to 5 pod and run load test tool

oc scale deployment backend-v1 --replicas=5 -n project1
oc scale deployment frontend-v1 --replicas=5 -n project1

Use K6

 oc run load-test-frontend -n project1 \
-i --image=loadimpact/k6  \
--rm=true --restart=Never --  run -< manifests/load-test-k6.js \
-e URL=http://frontend:8080 \
-e THREADS=10 \
-e RAMPUP=30s \
-e DURATION=3m \
-e RAMPDOWN=30s \
-e K6_NO_CONNECTION_REUSE=true

Use Siege

oc create -f manifests/tools.yaml -n project1
TOOL=$(oc get po -n project1 -l app=network-tools --no-headers  -o custom-columns='Name:.metadata.name')

Run siege command

oc exec -n project1 $TOOL -- siege -c 20 -t 4m http://frontend:8080

PromQL for query request/sec for GET method to root URI of backend service
```
rate(http_server_requests_seconds_count{method="GET",uri="root"}[1m])
```
Sample stacked graph

K6

This is optional steps. If you have k6 on your manchine

k6 run manifests/load-test-k6.js \
--out json=output.json \
--summary-export=summary-export.json \
-e URL=https://$(oc get route frontend -o jsonpath='{.spec.host}' -n project1) \
-e THREADS=10 \
-e RAMPUP=30s \
-e DURATION=3m \
-e RAMPDOWN=30s \
-e K6_NO_CONNECTION_REUSE=true

Remark: output are stored in output.json and summary-export.json

Run K6 with dashboard (You need to install K6 dashbaord first)

k6 run manifests/load-test-k6.js \
--out dashboard \
-e URL=https://$(oc get route frontend -o jsonpath='{.spec.host}' -n project1) \
-e THREADS=10 \
-e RAMPUP=30s \
-e DURATION=3m \
-e RAMPDOWN=30s \
-e K6_NO_CONNECTION_REUSE=true

View Dashboard

Custom Grafana Dashboard

Install Grafana Operator to project application-monitor

or using CLI

oc create -f manifests/grafana-sub.yaml

Verify

oc get csv -n openshift-operators

Output

NAME                      DISPLAY            VERSION   REPLACES                  PHASE
grafana-operator.v5.6.2   Grafana Operator   5.6.2     grafana-operator.v5.6.1   Succeeded

Use Grafana Operator by Red Hat to deploy Grafana and configure datasource to Thanos Querier Remark: Grafana Operator is Community Edition - not supported by Red Hat

Create project

oc new-project application-monitor --display-name="App Dashboard" --description="Grafana Dashboard for Application Monitoring"

Create Grafana instance

oc create -f manifests/grafana.yaml -n application-monitor
oc create route edge grafana --service=grafana-service --port=3000 -n application-monitor
watch -d oc get pods -n application-monitor

Sample Output

NAME                                 READY   STATUS    RESTARTS   AGE
grafana-deployment-96d5f5954-5hml7   1/1     Running   0          14s

Add cluster role cluster-monitoring-view to Grafana ServiceAccount

oc adm policy add-cluster-role-to-user cluster-monitoring-view \
-z grafana-sa -n application-monitor

Create Grafana DataSource with serviceaccount grafana-serviceaccount's token and connect to thanos-querier

TOKEN=$(oc create token grafana-sa --duration=4294967296s -n application-monitor)
cat manifests/grafana-datasource.yaml|sed 's/Bearer .*/Bearer '"$TOKEN""'"'/'|oc apply -n application-monitor -f -

Create Grafana Dashboard

oc apply -f manifests/grafana-dashboard.yaml -n application-monitor

Login to Grafana Dashboard with following URL
```
echo "Grafana URL => https://$(oc get route grafana -o jsonpath='{.spec.host}' -n application-monitor)"
```
or use link provided by Developer Console

Login with user admin and password openshift

Generate workload

bash script to loop request to frontend application.

FRONTEND_URL=https://$(oc get route frontend -n project1 -o jsonpath='{.spec.host}')
while [ 1 ];
do
  curl $FRONTEND_URL
  printf "\n"
  sleep .2
done

k6 load test tool with 10 threads for 5 minutes

oc run load-test-frontend -n project1 \
-i --image=loadimpact/k6  \
--rm=true --restart=Never --  run -< manifests/load-test-k6.js \
-e URL=http://frontend:8080 \
-e THREADS=15 \
-e RAMPUP=30s \
-e DURATION=5m \
-e RAMPDOWN=30s

Grafana Dashboard

Custom Alert

Check PrometheusRule

backend-app-alert is consists with 2 following alerts:
- ConcurrentBackend severity warning when total concurrent reqeusts is greater than 40 requests/sec
- HighLatency servierity critical when percentile 90th of response time is greater than 5 sec

Create backend-app-alert
```
oc apply -f manifests/backend-custom-alert.yaml
```
Remark: Role monitoring-rules-view is required for view PrometheusRule resource and role monitoring-rules-edit is required to create, modify, and deleting PrometheusRule

Following example is granting role monitoring-rules-view and monitoring-rules-edit to user1 for project1
```
oc adm policy add-role-to-user  monitoring-rules-view user1 -n project1
oc adm policy add-role-to-user  monitoring-rules-edit user1 -n project1
```

Test ConcurrentBackend alert with 25 concurrent requests

   oc run load-test-frontend -n project1 \
  -i --image=loadimpact/k6  \
  --rm=true --restart=Never --  run -< manifests/load-test-k6.js \
  -e URL=http://frontend:8080 \
  -e THREADS=15 \
  -e RAMPUP=30s \
  -e DURATION=5m \
  -e RAMPDOWN=30s

Check for alert in Developer Console by select Menu Monitoring then select Alerts

Check for alert in Developer Console
Console overview status

sum(increase(http_server_requests_seconds_count{outcome="SUCCESS"}[5m]))/sum(increase(http_server_requests_seconds_count[5m]))*100

User Workload Monitoring

User Workload Monitoring

Prerequisites

Service Monitoring

K6

Custom Grafana Dashboard

Custom Alert

results matching ""

No results matching ""