User Workload Monitoring
Prerequisites
- Set up User Workload Monitoring
  # Find the cluster's default StorageClass
  DEFAULT_STORAGE_CLASS=$(oc get sc -A -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}')
  # Substitute it into the monitoring config and apply
  cat manifests/cluster-monitoring-config.yaml | sed 's/storageClassName:.*/storageClassName: '$DEFAULT_STORAGE_CLASS'/' | oc apply -f -
  sleep 10
  # Wait for the user workload Prometheus and Thanos Ruler pods to become ready
  oc -n openshift-user-workload-monitoring wait --for condition=ready \
   --timeout=180s pod -l app.kubernetes.io/name=prometheus
  oc -n openshift-user-workload-monitoring wait --for condition=ready \
   --timeout=180s pod -l app.kubernetes.io/name=thanos-ruler
  # Confirm persistent volumes were provisioned for Prometheus
  oc get pvc -n openshift-monitoring
Output
  configmap/cluster-monitoring-config created
  pod/prometheus-user-workload-0 condition met
  pod/prometheus-user-workload-1 condition met
  pod/thanos-ruler-user-workload-0 condition met
  pod/thanos-ruler-user-workload-1 condition met
  NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
  prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-c1513ebb-6fcf-4136-b3a2-41a7a3afd83b   50Gi       RWO            ocs-external-storagecluster-ceph-rbd   21s
  prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-f9f9d7b1-3303-41ed-9549-43fab596cebe   50Gi       RWO            ocs-external-storagecluster-ceph-rbd   21s
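For reference, manifests/cluster-monitoring-config.yaml is expected to be along these lines, enabling user workload monitoring and persistent storage for Prometheus (field values and sizes are assumptions; the storageClassName value is the placeholder the sed above rewrites):
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      enableUserWorkload: true
      prometheusK8s:
        volumeClaimTemplate:
          spec:
            storageClassName: CHANGE_ME   # rewritten by the sed above
            resources:
              requests:
                storage: 50Gi             # matches the 50Gi PVCs in the output above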
- Verify monitoring stack
  oc get pod -n openshift-user-workload-monitoring
- Sample output
  NAME                                   READY   STATUS    RESTARTS   AGE
  prometheus-operator-787b6f6d75-lrgjg   2/2     Running   0          70s
  prometheus-user-workload-0             6/6     Running   0          63s
  prometheus-user-workload-1             6/6     Running   0          63s
  thanos-ruler-user-workload-0           4/4     Running   0          62s
  thanos-ruler-user-workload-1           4/4     Running   0          62s
- CPU and Memory used by User Workload Monitoring
  - Overall resources consumed by user workload monitoring
  - CPU Usage
  - Memory Usage
  - CPU Quota
  - Memory Quota
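As a CLI alternative to the console graphs, current consumption of the stack can be sampled directly:
  # Per-pod CPU and memory for the user workload monitoring stack
  oc adm top pods -n openshift-user-workload-monitoring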
Service Monitoring
- Deploy frontend/backend app with custom metrics
  oc create -f manifests/frontend-v1-and-backend-v1-JVM.yaml -n project1
- Call frontend app with curl
  curl https://$(oc get route frontend -o jsonpath='{.spec.host}' -n project1)
- Output
  Frontend version: v1 => [Backend: http://backend:8080, Response: 200, Body: Backend version:v1, Response:200, Host:backend-v1-5587465b8d-9kvsc, Status:200, Message: Hello, World]
- Check for backend's metrics
  - Check JVM heap size
    oc exec -n project1 $(oc get pods -l app=backend \
    --no-headers -o custom-columns='Name:.metadata.name' \
    -n project1 | head -n 1 ) \
    -- curl -s http://localhost:8080/q/metrics | grep heap
  - Sample output
    jvm_memory_used_bytes{area="heap",id="Tenured Gen"} 1.1301808E7
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'"} 5261952.0
    jvm_memory_used_bytes{area="heap",id="Eden Space"} 2506768.0
    jvm_memory_used_bytes{area="nonheap",id="Metaspace"} 3.3255E7
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'"} 1389824.0
    jvm_memory_used_bytes{area="heap",id="Survivor Space"} 524288.0
    jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space"} 4096808.0
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'"} 1120640.0
    ...
- Check for backend application related metrics
  curl https://$(oc get route frontend -n project1 -o jsonpath='{.spec.host}')
  oc exec -n project1 $(oc get pods -l app=backend \
  --no-headers -o custom-columns='Name:.metadata.name' \
  -n project1 | head -n 1 ) \
  -- curl -s http://localhost:8080/q/metrics | grep http_server_requests_seconds
- Check that http_server_requests_seconds_count for the GET method on the root URI increments once per request (2.0 here, since the frontend has been called twice above) with return status 200
  # HELP http_server_requests_seconds
  http_server_requests_seconds_count{method="GET",outcome="SUCCESS",status="200",uri="root"} 2.0
  http_server_requests_seconds_sum{method="GET",outcome="SUCCESS",status="200",uri="root"} 3.769968656
  # TYPE http_server_requests_seconds_max gauge
  # HELP http_server_requests_seconds_max
  http_server_requests_seconds_max{method="GET",outcome="SUCCESS",status="200",uri="root"} 3.486753482
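To watch the counter move, send a few more requests and read the metric again (same commands as above, wrapped in a loop):
  FRONTEND_URL=https://$(oc get route frontend -n project1 -o jsonpath='{.spec.host}')
  # Send five more requests through the frontend
  for i in $(seq 1 5); do curl -s -o /dev/null $FRONTEND_URL; done
  # The count should now be higher by the number of requests this backend pod served
  oc exec -n project1 $(oc get pods -l app=backend --no-headers \
    -o custom-columns='Name:.metadata.name' -n project1 | head -n 1) \
    -- curl -s http://localhost:8080/q/metrics | grep http_server_requests_seconds_count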
 
- Create Service Monitoring to monitor backend service
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: backend-monitor
  spec:
    endpoints:
    - interval: 30s
      port: http
      path: /q/metrics # Get metrics from URI /q/metrics
      scheme: http
    selector:
      matchLabels:
        app: backend # select only label app = backend
- Create service monitor
  oc apply -f manifests/backend-service-monitor.yaml -n project1
- Remark: Role monitoring-edit is required to create ServiceMonitor and PodMonitor resources. The following example grants role monitoring-edit to user1 for project1
  oc adm policy add-role-to-user monitoring-edit user1 -n project1
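To confirm the ServiceMonitor object exists in the project (assuming sufficient rights, e.g. the monitoring-edit role granted above):
  oc get servicemonitor backend-monitor -n project1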
- Developer Console monitoring metrics
  - Select application metrics
- Application metrics
  - Scale backend-v1 and frontend-v1 to 5 pods and run a load test tool
    oc scale deployment backend-v1 --replicas=5 -n project1
    oc scale deployment frontend-v1 --replicas=5 -n project1
  - Use K6
    oc run load-test-frontend -n project1 \
    -i --image=loadimpact/k6 \
    --rm=true --restart=Never -- run -< manifests/load-test-k6.js \
    -e URL=http://frontend:8080 \
    -e THREADS=10 \
    -e RAMPUP=30s \
    -e DURATION=3m \
    -e RAMPDOWN=30s \
    -e K6_NO_CONNECTION_REUSE=true
  - Use Siege
    oc create -f manifests/tools.yaml -n project1
    TOOL=$(oc get po -n project1 -l app=network-tools --no-headers -o custom-columns='Name:.metadata.name')
  - Run siege command
    oc exec -n project1 $TOOL -- siege -c 20 -t 4m http://frontend:8080
- PromQL to query requests/sec for the GET method on the backend's root URI
  rate(http_server_requests_seconds_count{method="GET",uri="root"}[1m])
- Sample stacked graph
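The same query can also be run from the CLI against the Thanos Querier API; a sketch, assuming your user is allowed to query cluster metrics (e.g. has cluster-monitoring-view):
  TOKEN=$(oc whoami -t)
  THANOS=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
  # POST the PromQL expression to the Prometheus-compatible query endpoint
  curl -sk -H "Authorization: Bearer $TOKEN" \
    --data-urlencode 'query=rate(http_server_requests_seconds_count{method="GET",uri="root"}[1m])' \
    "https://$THANOS/api/v1/query"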
 
 
K6
These steps are optional, for when you have k6 installed on your machine
k6 run manifests/load-test-k6.js \
--out json=output.json \
--summary-export=summary-export.json \
-e URL=https://$(oc get route frontend -o jsonpath='{.spec.host}' -n project1) \
-e THREADS=10 \
-e RAMPUP=30s \
-e DURATION=3m \
-e RAMPDOWN=30s \
-e K6_NO_CONNECTION_REUSE=true
Remark: results are stored in output.json and summary-export.json
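For a quick look at the exported summary, jq can pull out the request-duration stats (field names follow k6's default summary export format):
  jq '.metrics.http_req_duration' summary-export.json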
Run K6 with dashboard (you need to install the k6 dashboard extension first; a build sketch follows the command below)
k6 run manifests/load-test-k6.js \
--out dashboard \
-e URL=https://$(oc get route frontend -o jsonpath='{.spec.host}' -n project1) \
-e THREADS=10 \
-e RAMPUP=30s \
-e DURATION=3m \
-e RAMPDOWN=30s \
-e K6_NO_CONNECTION_REUSE=true
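One way to get the dashboard output is building k6 with the xk6-dashboard extension; a sketch, assuming a working Go toolchain:
  # Install the xk6 build tool, then produce a ./k6 binary with the dashboard extension
  go install go.k6.io/xk6/cmd/xk6@latest
  xk6 build --with github.com/grafana/xk6-dashboard@latest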
View Dashboard

Custom Grafana Dashboard
- Install Grafana Operator for project application-monitor. The operator is used to deploy Grafana and configure a datasource that connects to Thanos Querier. Remark: the Grafana Operator is a community operator and is not supported by Red Hat
  - or using CLI
    oc create -f manifests/grafana-sub.yaml
  - Verify
    oc get csv -n openshift-operators
  - Output
    NAME                      DISPLAY            VERSION   REPLACES                  PHASE
    grafana-operator.v5.6.2   Grafana Operator   5.6.2     grafana-operator.v5.6.1   Succeeded
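For reference, manifests/grafana-sub.yaml is likely a Subscription along these lines (channel and catalog source are assumptions; check the repo file):
  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: grafana-operator
    namespace: openshift-operators
  spec:
    channel: v5                  # assumed channel for the 5.x stream seen in the CSV output
    name: grafana-operator
    source: community-operators  # community catalog, not supported by Red Hat
    sourceNamespace: openshift-marketplace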
- Create project
  oc new-project application-monitor --display-name="App Dashboard" --description="Grafana Dashboard for Application Monitoring"
- Create Grafana instance
  oc create -f manifests/grafana.yaml -n application-monitor
  oc create route edge grafana --service=grafana-service --port=3000 -n application-monitor
  watch -d oc get pods -n application-monitor
- Sample Output
  NAME                                 READY   STATUS    RESTARTS   AGE
  grafana-deployment-96d5f5954-5hml7   1/1     Running   0          14s
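A sketch of what manifests/grafana.yaml might contain (Grafana Operator v5 CR shape; the name grafana yields the grafana-service used by the route above, and the credentials match the login step below — all assumptions, check the repo file):
  apiVersion: grafana.integreatly.org/v1beta1
  kind: Grafana
  metadata:
    name: grafana
    labels:
      dashboards: grafana         # assumed label, matched by datasource/dashboard selectors
  spec:
    config:
      security:
        admin_user: admin         # matches the login used later in this section
        admin_password: openshift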
- Add cluster role cluster-monitoring-view to Grafana ServiceAccount
  oc adm policy add-cluster-role-to-user cluster-monitoring-view \
   -z grafana-sa -n application-monitor
- Create Grafana DataSource with serviceaccount grafana-sa's token and connect to thanos-querier
  TOKEN=$(oc create token grafana-sa --duration=4294967296s -n application-monitor)
  cat manifests/grafana-datasource.yaml | sed 's/Bearer .*/Bearer '"$TOKEN""'"'/' | oc apply -n application-monitor -f -
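The datasource file is expected to contain a 'Bearer ...' placeholder that the sed above rewrites with the fresh token; a sketch of its likely shape (Grafana Operator v5 CR; values are assumptions, check the repo file):
  apiVersion: grafana.integreatly.org/v1beta1
  kind: GrafanaDatasource
  metadata:
    name: prometheus-thanos
  spec:
    instanceSelector:
      matchLabels:
        dashboards: grafana
    datasource:
      name: Prometheus
      type: prometheus
      access: proxy
      url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
      jsonData:
        httpHeaderName1: Authorization
        tlsSkipVerify: true
      secureJsonData:
        httpHeaderValue1: 'Bearer TOKEN'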
- Create Grafana Dashboard
  oc apply -f manifests/grafana-dashboard.yaml -n application-monitor
- Log in to Grafana with the following URL
  echo "Grafana URL => https://$(oc get route grafana -o jsonpath='{.spec.host}' -n application-monitor)"
- or use the link provided by the Developer Console
- Log in with user admin and password openshift
- Generate workload with a bash script that loops requests to the frontend application
  FRONTEND_URL=https://$(oc get route frontend -n project1 -o jsonpath='{.spec.host}')
  while true; do
    curl $FRONTEND_URL
    printf "\n"
    sleep .2
  done
- k6 load test tool with 15 threads for 5 minutes
  oc run load-test-frontend -n project1 \
  -i --image=loadimpact/k6 \
  --rm=true --restart=Never -- run -< manifests/load-test-k6.js \
  -e URL=http://frontend:8080 \
  -e THREADS=15 \
  -e RAMPUP=30s \
  -e DURATION=5m \
  -e RAMPDOWN=30s
 
- Grafana Dashboard  
Custom Alert
- Check PrometheusRule - backend-app-alert consists of the following 2 alerts (a sketch of the manifest follows this list):
  - ConcurrentBackend, severity warning, fires when total concurrent requests exceed 40 requests/sec
  - HighLatency, severity critical, fires when the 90th percentile of response time exceeds 5 sec
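A sketch of what manifests/backend-custom-alert.yaml could look like given those two thresholds (the exact expressions and labels are assumptions; check the repo file):
  apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    name: backend-app-alert
    namespace: project1
  spec:
    groups:
    - name: backend.rules
      rules:
      - alert: ConcurrentBackend
        # assumed: fires when total request rate exceeds 40 requests/sec
        expr: sum(rate(http_server_requests_seconds_count[1m])) > 40
        for: 1m
        labels:
          severity: warning
      - alert: HighLatency
        # assumed: fires when the 90th percentile of response time exceeds 5s
        expr: histogram_quantile(0.9, sum(rate(http_server_requests_seconds_bucket[5m])) by (le)) > 5
        for: 1m
        labels:
          severity: critical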
 
- Create backend-app-alert
  oc apply -f manifests/backend-custom-alert.yaml
- Remark: Role monitoring-rules-view is required to view PrometheusRule resources, and role monitoring-rules-edit is required to create, modify, and delete PrometheusRules. The following example grants roles monitoring-rules-view and monitoring-rules-edit to user1 for project1
  oc adm policy add-role-to-user monitoring-rules-view user1 -n project1
  oc adm policy add-role-to-user monitoring-rules-edit user1 -n project1
- Test ConcurrentBackend alert with concurrent requests (k6 with 15 threads)
  oc run load-test-frontend -n project1 \
  -i --image=loadimpact/k6 \
  --rm=true --restart=Never -- run -< manifests/load-test-k6.js \
  -e URL=http://frontend:8080 \
  -e THREADS=15 \
  -e RAMPUP=30s \
  -e DURATION=5m \
  -e RAMPDOWN=30s
- Check for alert in Developer Console by selecting menu Monitoring, then Alerts
- Check for alert in Developer Console  
- Console overview status  
- PromQL for the percentage of successful requests over the last 5 minutes
  sum(increase(http_server_requests_seconds_count{outcome="SUCCESS"}[5m]))/sum(increase(http_server_requests_seconds_count[5m]))*100