How I debugged a certificate that didn't renew

This morning I found out that my certificate had expired and had not been renewed automatically. Here is how I debugged it, step by step.

Get certificate status

$ kubectl describe cert -n slack slack-tls
Status:
  Conditions:
    Last Transition Time:  2020-01-21T04:15:16Z
    Message:               Certificate is up to date and has not expired
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2020-08-18T01:20:11Z

Try to force certificate renewal

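First, I set spec.renewBefore on the certificate. Let’s Encrypt certificates are valid for 90 days (2160h), so a value of 2159h makes the certificate due for renewal almost immediately.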

kubectl -n <namespace> patch certificate example-certificate --type=merge -p '{"spec":{"renewBefore":"2159h00m00s"}}'
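
To see why 2159h forces a renewal, here is a small sketch of the arithmetic, assuming cert-manager treats a certificate as due once the current time passes notAfter minus renewBefore. The renewal_time helper is hypothetical (it is not part of kubectl or cert-manager) and relies on GNU date.

```shell
#!/usr/bin/env bash
# Sketch: compute the moment a certificate becomes due for renewal.
# renewal_time is a hypothetical helper; it requires GNU date (-d).
renewal_time() {
  local not_after="$1" renew_before_hours="$2"
  local expiry_epoch
  # Convert the expiry timestamp to seconds since the epoch (UTC).
  expiry_epoch="$(date -u -d "$not_after" +%s)"
  # Subtract renewBefore and print the result in the same format.
  date -u -d "@$(( expiry_epoch - renew_before_hours * 3600 ))" +%Y-%m-%dT%H:%M:%SZ
}

# A 90-day certificate expiring at 2020-11-16T02:56:23Z with renewBefore=2159h:
renewal_time 2020-11-16T02:56:23Z 2159   # prints 2020-08-18T03:56:23Z
```

In other words, with this setting a fresh 90-day certificate is already due for renewal about one hour after it was issued.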

But the order was still invalid.

$ kubectl -n slack get order
NAME                  STATE     AGE
slack-tls-488818493   invalid   11m

So I checked whether any events had been recorded.

$ kubectl get event -n slack
LAST SEEN   TYPE      REASON         OBJECT                            MESSAGE
38m         Warning   PresentError   challenge/slack-tls-488818493-0   Error presenting challenge: GoogleCloud API call failed: googleapi: Error 403: Request had insufficient authentication scopes.
More details:
Reason: insufficientPermissions, Message: Insufficient Permission
12m         Warning   CleanUpError   challenge/slack-tls-488818493-0   Error cleaning up challenge: GoogleCloud API call failed: googleapi: Error 403: Request had insufficient authentication scopes.
More details:
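
The two lookups above can be bundled into one helper. This is only a sketch: acme_debug is a hypothetical name, and it assumes the cert-manager CRDs expose the plural resource names orders and challenges.

```shell
#!/usr/bin/env bash
# Sketch: walk the ACME resources and warnings for one namespace.
# acme_debug is a hypothetical helper, not part of kubectl or cert-manager.
acme_debug() {
  local ns="$1"
  # Orders and challenges are cert-manager custom resources.
  kubectl -n "$ns" get orders,challenges
  # Show only Warning events, oldest first.
  kubectl -n "$ns" get events --field-selector type=Warning --sort-by=.lastTimestamp
}

# Usage, with the namespace from this post:
#   acme_debug slack
```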

After some digging, it turned out to be the same cause I described in an earlier post: “Cert-Manager Error presenting challenge: GoogleCloud API call failed: googleapi: Error 403: Request had insufficient authentication scopes.”

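We have two node pools on the GKE cluster. Unfortunately, the cert-manager pod had been scheduled on a node without the cloud-platform access scope. We can add a nodeSelector to make sure it is deployed to the right nodes (this relies on those nodes carrying the www.googleapis.com/cloud-platform: "true" label).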

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cert-manager
  namespace: "cert-manager"
spec:
  template:
    spec:
      nodeSelector:
        www.googleapis.com/cloud-platform: "true"
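
Before relying on the nodeSelector, it helps to confirm which node pool actually has the scope and that its nodes carry the label. The sketch below is one way to do that; has_cloud_platform_scope is a hypothetical helper, and CLUSTER, POOL and ZONE are placeholders. It assumes the full scope URL is https://www.googleapis.com/auth/cloud-platform and that the label in the nodeSelector is one you apply to the nodes yourself.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a scope list read from stdin contains cloud-platform.
# has_cloud_platform_scope is a hypothetical helper.
has_cloud_platform_scope() {
  # Split on the separators a scope list may use, then require an exact
  # match so that cloud-platform.read-only does not count.
  tr ';, ' '\n\n\n' | grep -qxF 'https://www.googleapis.com/auth/cloud-platform'
}

# Usage (CLUSTER, POOL and ZONE are placeholders):
#   gcloud container node-pools describe POOL --cluster CLUSTER --zone ZONE \
#     --format='value(config.oauthScopes)' | has_cloud_platform_scope && echo ok
#
# Label every node in that pool so the nodeSelector above can match:
#   kubectl label nodes -l cloud.google.com/gke-nodepool=POOL \
#     www.googleapis.com/cloud-platform=true
```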

And that was it: cert-manager started working.

$ kubectl describe cert -n slack slack-tls
Status:
  Conditions:
    Last Transition Time:  2020-08-18T03:56:24Z
    Message:               Certificate is up to date and has not expired
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2020-11-16T02:56:23Z
Events:
  Type    Reason         Age   From          Message
  ----    ------         ----  ----          -------
  Normal  OrderCreated   24m   cert-manager  Created Order resource "slack-tls-488818493"
  Normal  OrderComplete  21m   cert-manager  Order "slack-tls-488818493" completed successfully
  Normal  CertIssued     21m   cert-manager  Certificate issued successfully

Remember to remove spec.renewBefore afterwards. Otherwise every new certificate is due for renewal again almost immediately, and you will hit the Let’s Encrypt rate limits.

kubectl -n <namespace> patch certificate example-certificate --type=json -p='[{"op": "remove", "path": "/spec/renewBefore"}]'

Follow-up

We might need something like https://www.elastic.co/guide/en/uptime/current/uptime-certificates.html to watch certificate expiry and alert on it.
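
Until proper monitoring is in place, a plain shell check could cover the basics. This is not the Elastic Uptime approach, just a minimal sketch: warn_expiring is a hypothetical helper, it requires GNU date, and it assumes each Certificate reports its expiry in status.notAfter as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: list certificates that expire within a threshold.
# warn_expiring is a hypothetical helper; it requires GNU date (-d).
# Input on stdin: one "<namespace>/<name> <notAfter>" pair per line.
# Arguments: threshold in days, and optionally "now" as a UTC timestamp.
warn_expiring() {
  local threshold_days="$1" now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local now_epoch name not_after left
  now_epoch="$(date -u -d "$now" +%s)"
  while read -r name not_after; do
    # Skip certificates that have no notAfter yet (never issued).
    [ -n "$not_after" ] || continue
    # Whole days between now and expiry.
    left=$(( ( $(date -u -d "$not_after" +%s) - now_epoch ) / 86400 ))
    if [ "$left" -lt "$threshold_days" ]; then
      echo "$name: $left days left (notAfter $not_after)"
    fi
  done
}

# Usage: warn about anything expiring in less than 14 days.
#   kubectl get certificates --all-namespaces \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.notAfter}{"\n"}{end}' \
#     | warn_expiring 14
```

Run from a cron job, the output could be piped to whatever alerting channel is already in use.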
