Skip to content

Initial troubleshooting

Percona Operator for MongoDB uses Custom Resources to manage options for the various components of the cluster.

  • PerconaServerMongoDB Custom Resource with Percona Server for MongoDB options (it has handy psmdb shortname also),

  • PerconaServerMongoDBBackup and PerconaServerMongoDBRestore Custom Resources contain options for Percona Backup for MongoDB used to backup Percona Server for MongoDB and to restore it from backups (psmdb-backup and psmdb-restore shortnames are available for them).

The first thing you can check for the Custom Resource is to query it with kubectl get command:

$ kubectl get psmdb
Expected output
NAME              ENDPOINT                                           STATUS   AGE
my-cluster-name   my-cluster-name-mongos.default.svc.cluster.local   ready    5m26s

The Custom Resource should have Ready status.

Note

You can check which Percona’s Custom Resources are present and get some information about them as follows:

$ kubectl api-resources | grep -i percona
Expected output
perconaservermongodbbackups       psmdb-backup    psmdb.percona.com/v1                   true         PerconaServerMongoDBBackup
perconaservermongodbrestores      psmdb-restore   psmdb.percona.com/v1                   true         PerconaServerMongoDBRestore
perconaservermongodbs             psmdb           psmdb.percona.com/v1                   true         PerconaServerMongoDB

Check the Pods

If Custom Resource is not getting Ready status, it makes sense to check individual Pods. You can do it as follows:

$ kubectl get pods
Expected output
NAME                                               READY   STATUS    RESTARTS   AGE
my-cluster-name-cfg-0                              2/2     Running   0          11m
my-cluster-name-cfg-1                              2/2     Running   1          10m
my-cluster-name-cfg-2                              2/2     Running   1          9m
my-cluster-name-mongos-0                           1/1     Running   0          11m
my-cluster-name-mongos-1                           1/1     Running   0          11m
my-cluster-name-mongos-2                           1/1     Running   0          11m
my-cluster-name-rs0-0                              2/2     Running   0          11m
my-cluster-name-rs0-1                              2/2     Running   0          10m
my-cluster-name-rs0-2                              2/2     Running   0          9m
percona-server-mongodb-operator-665cd69f9b-xg5dl   1/1     Running   0          37m

The above command provides the following insights:

  • READY indicates how many containers in the Pod are ready to serve the traffic. In the above example, my-cluster-name-rs0-0 Pod has all two containers ready (2/2). For an application to work properly, all containers of the Pod should be ready.
  • STATUS indicates the current status of the Pod. The Pod should be in a Running state to confirm that the application is working as expected. You can find out other possible states in the official Kubernetes documentation .
  • RESTARTS indicates how many times containers of Pod were restarted. This is impacted by the Container Restart Policy . In an ideal world, the restart count would be zero, meaning no issues from the beginning. If the restart count exceeds zero, it may be reasonable to check why it happens.
  • AGE: Indicates how long the Pod is running. Any abnormality in this value needs to be checked.

You can find more details about a specific Pod using the kubectl describe pods <pod-name> command.

$ kubectl describe pods my-cluster-name-rs0-0
Expected output
...
Name:         my-cluster-name-rs0-0
Namespace:    default
...
Controlled By:  StatefulSet/my-cluster-name-rs0
Init Containers:
 mongo-init:
...
Containers:
 mongod:
...
   Restart Count:  0
   Limits:
     cpu:     300m
     memory:  500M
   Requests:
     cpu:      300m
     memory:   500M
   Liveness:   exec [/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200] delay=60s timeout=10s period=30s #success=1 #failure=4
   Readiness:  tcp-socket :27017 delay=10s timeout=2s period=3s #success=1 #failure=8
   Environment Variables from:
     internal-my-cluster-name-users  Secret  Optional: false
   Environment:
...
   Mounts:
...
Volumes:
...
Events:                      <none>

This gives a lot of information about containers, resources, container status and also events. So, describe output should be checked to see any abnormalities.


Last update: 2024-09-15