
Kubernetes - pull an image from a private ECR registry. Auto-refresh the ECR token.


Although there are a lot of instructions available, I haven't found a straightforward way of deploying a container to a Kubernetes cluster when the image is hosted in a private ECR registry. In this short article, I would like to share a sequence of steps that can be used to perform such a deployment.

Prerequisites

Make sure that the machine that is going to perform the deployment (whether it's your local machine or, more likely, a CI/CD environment) has aws-cli installed and configured with a proper access key ID and secret access key so that it has permission to pull the image. More info in the official AWS article.
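A quick way to sanity-check the setup before deploying (a minimal sketch; the region value here is just an example):

# Confirm aws-cli is installed and which identity it is using
aws --version
aws sts get-caller-identity

# Confirm the credentials can obtain an ECR token (prints nothing on success)
aws ecr get-login-password --region us-east-1 > /dev/null && echo "ECR auth OK"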

Overview

As a good practice, it's always best to deploy different sets of applications into separate namespaces. We will create the namespace manually via the kubectl create command (for reasons I explain later).

From the application standpoint, we'll consider a minimalistic "health check application" that replies with status: ok as an HTTP response. The Kubernetes manifest file consists of the following objects:

  • service (exposed via NodePort)
  • deployment (containing an application)

manifest.yml:

apiVersion: v1
kind: Service
metadata:
  name: health-check-service
  namespace: health-check
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: node-hello-world-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-hello-world-deployment
  namespace: health-check
  labels:
    app: node-hello-world-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: node-hello-world-app
  template:
    metadata:
      labels:
        app: node-hello-world-app
    spec:
      containers:
      - name: node-hello-world-app-container
        image: <aws_account_id>.dkr.ecr.<aws_region>.amazonaws.com/nodejs-hello-world
        imagePullPolicy: Always
        ports:
          - name: web
            containerPort: 3000
      imagePullSecrets:
      - name: regcred

Besides the familiar-looking service and deployment definitions, there are a couple of items that need to be highlighted:

  • ECR Image Registry URL: <aws_account_id>.dkr.ecr.<aws_region>.amazonaws.com/<image-name>:<tag> (a filled-in example is shown after this list)
    • <aws_account_id> - your 12-digit AWS account ID, e.g. 123456789012
    • <aws_region> - aws region name (examples here)
    • <image-name> - image name
    • <tag> - image tag, usually defines a version or simply use latest
  • Image Pull Policy: imagePullPolicy: Always forces Kubernetes to pull the image from the remote repository on every pod start, which avoids unexpected issues with stale cached images.
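For illustration, a fully qualified image reference with the placeholders filled in might look like this (the account ID and region are made up):

123456789012.dkr.ecr.us-east-1.amazonaws.com/nodejs-hello-world:latest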

Deployment steps

  1. Create namespace via kubectl create command:

    kubectl create namespace health-check

    The reason we create the namespace manually, and not in the manifest file above, is that in the next step we have to create a secret within this namespace. This is super important since Kubernetes secrets are scoped to a specific namespace.

    Next, the secret is generated on the command line using aws ecr, which lives outside of the kubectl ecosystem.

  2. Create a registry secret within the above namespace that would be used to pull an image from a private ECR repository:

    kubectl create secret docker-registry regcred \
      --docker-server=${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com \
      --docker-username=AWS \
      --docker-password=$(aws ecr get-login-password) \
      --namespace=health-check

    This command uses the aws-cli command aws ecr get-login-password and saves the generated credentials in a secret of the special docker-registry type. More info about it in the official Kubernetes docs.

    Please note that the username is always set to AWS, for all accounts.

  3. Deploy manifest file using kubectl apply -f command:

    kubectl apply -f manifest.yml
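Before hitting the endpoint, it can help to confirm that everything landed where it should (a small verification sketch; the resource names match the manifest above):

# The pull secret must live in the same namespace as the deployment
kubectl get secret regcred --namespace health-check

# The pod should reach Running; ImagePullBackOff here usually means a credentials problem
kubectl get pods --namespace health-check

# The service listing shows the randomly assigned NodePort (the <node-port> used below)
kubectl get service health-check-service --namespace health-check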

Using the http command I can verify that my deployment is working:

$ http <cluster-ip>:<node-port>
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 15
Content-Type: application/json; charset=utf-8
Date: Mon, 15 Mar 2021 16:47:54 GMT
ETag: W/"f-VaSQ4oDUiZblZNAEkkN+sX+q3Sg"
Keep-Alive: timeout=5
X-Powered-By: Express

{
    "message": "Hello World!"
}

One-liner for CI/CD pipeline

If you need to automate the deployment via CI/CD, or simply would like to use a one-line command, here it is:

NAMESPACE_NAME="health-check" && \
kubectl create namespace $NAMESPACE_NAME || true && \
kubectl create secret docker-registry regcred \
  --docker-server=${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$(aws ecr get-login-password) \
  --namespace=$NAMESPACE_NAME || true && \
kubectl apply -f manifest.yml

Clarification:

  • The NAMESPACE_NAME variable is pulled out for reuse. It's also scoped only to the current bash session.
  • The ... || true bash statement is added to ignore errors about the already-created namespace and secret when the same command is reused to re-apply changes from the manifest file.
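As an alternative to swallowing errors with || true, the same pipeline can be made idempotent with kubectl's client-side dry-run (a sketch, assuming kubectl v1.18+ where --dry-run=client is available):

NAMESPACE_NAME="health-check" && \
kubectl create namespace $NAMESPACE_NAME --dry-run=client -o yaml | kubectl apply -f - && \
kubectl create secret docker-registry regcred \
  --docker-server=${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$(aws ecr get-login-password) \
  --namespace=$NAMESPACE_NAME \
  --dry-run=client -o yaml | kubectl apply -f - && \
kubectl apply -f manifest.yml

Unlike || true, this variant also refreshes the secret on every run instead of failing quietly when it already exists.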

Update - AWS ECR Token refresh

Even though it's possible to succeed with the installation by following the steps above, there is one caveat: ECR tokens expire after 12 hours, so a token obtained at deployment time eventually becomes stale and is rejected. As was mentioned in the comment below by Patrick McMahon:

... if you have deployed an application and the scheduler decides to move the pod to another node after 12 hours you will get an 'ImagePullBackOff' error as the authentication to ECR no longer works
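If you run into this, the symptom is easy to spot (a quick diagnostic sketch; <pod-name> is a placeholder):

# Pods rescheduled after token expiry get stuck in ImagePullBackOff
kubectl get pods --namespace health-check

# The events at the bottom of the describe output show the ECR authorization failure
kubectl describe pod <pod-name> --namespace health-check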

There are 2 possible ways to approach the problem (there might be more, but these are the 2 that I came up with):

  1. Use a cron job on the host OS (or some other remote machine that has access to the cluster) to automatically refresh the token. The downside: if the host OS goes down, this approach stops working until it's fixed. Here are the steps:

    # Create a log file that the cron job will write to
    sudo touch /var/log/aws-ecr-update-credentials.log

    # Make the current user the owner of the file so that the cron job running under that account can write to it
    sudo chown $USER /var/log/aws-ecr-update-credentials.log

    # Create an empty file where the script will reside
    sudo touch /usr/local/bin/aws-ecr-update-credentials.sh

    # Make the current user the owner of the script
    sudo chown $USER /usr/local/bin/aws-ecr-update-credentials.sh

    # Make the script executable
    sudo chmod +x /usr/local/bin/aws-ecr-update-credentials.sh

    Add the following script to the newly created /usr/local/bin/aws-ecr-update-credentials.sh file:

    #!/usr/bin/env bash

    # Refresh the regcred secret in every namespace that already has one
    kube_namespaces=($(kubectl get secret --all-namespaces | grep regcred | awk '{print $1}'))

    for i in "${kube_namespaces[@]}"
    do
      echo "$(date): Updating secret for namespace - $i"
      kubectl delete secret regcred --namespace $i
      kubectl create secret docker-registry regcred \
        --docker-server=${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com \
        --docker-username=AWS \
        --docker-password=$(/usr/local/bin/aws ecr get-login-password) \
        --namespace=$i
    done

    This script will update only namespaces that already have the regcred secret. Don't forget to replace ${AWS_ACCOUNT} and ${AWS_REGION} with the corresponding values, or export the corresponding environment variables.

    The last step is to add the cron job:

    # Open the crontab file
    crontab -e

    # Refresh the ECR credentials every 10 hours (the token expires after 12)
    0 */10 * * * /usr/local/bin/aws-ecr-update-credentials.sh >> /var/log/aws-ecr-update-credentials.log 2>&1

    You can initially replace "0 */10 * * *" with "* * * * *", which will run the script every minute, so you can check the logs and make sure the script works as expected.

  2. Use a CronJob resource on the Kubernetes side:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ecr-registry-helper-secrets
      namespace: health-check
    stringData:
      AWS_SECRET_ACCESS_KEY: "xxxx"
      AWS_ACCESS_KEY_ID: "xxx"
      AWS_ACCOUNT: "xxx"
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ecr-registry-helper-cm
      namespace: health-check
    data:
      AWS_REGION: "xxx"
      DOCKER_SECRET_NAME: regcred
    ---
    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: ecr-registry-helper
      namespace: health-check
    spec:
      schedule: "0 */10 * * *"
      successfulJobsHistoryLimit: 3
      suspend: false
      jobTemplate:
        spec:
          template:
            spec:
              serviceAccountName: sa-health-check
              containers:
              - name: ecr-registry-helper
                image: odaniait/aws-kubectl:latest
                imagePullPolicy: IfNotPresent
                envFrom:
                  - secretRef:
                      name: ecr-registry-helper-secrets
                  - configMapRef:
                      name: ecr-registry-helper-cm
                command:
                  - /bin/sh
                  - -c
                  - |-
                    ECR_TOKEN=`aws ecr get-login-password --region ${AWS_REGION}`
                    NAMESPACE_NAME=health-check
                    kubectl delete secret --ignore-not-found $DOCKER_SECRET_NAME -n $NAMESPACE_NAME
                    kubectl create secret docker-registry $DOCKER_SECRET_NAME \
                    --docker-server=https://${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password="${ECR_TOKEN}" \
                    --namespace=$NAMESPACE_NAME
                    echo "Secret was successfully updated at $(date)"
              restartPolicy: Never
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sa-health-check
      namespace: health-check
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: health-check
      name: role-full-access-to-secrets
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      resourceNames: ["regcred"]
      verbs: ["delete"]
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["create"]
    ---
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: health-check-role-binding
      namespace: health-check
    subjects:
    - kind: ServiceAccount
      name: sa-health-check
      namespace: health-check
      apiGroup: ""
    roleRef:
      kind: Role
      name: role-full-access-to-secrets
      apiGroup: ""

    There are quite a few configuration details here. The biggest difference is that the CronJob resource in this example is scoped to a specific namespace and is only allowed to delete the regcred secret, which makes it more secure. Note that on Kubernetes 1.21+ the CronJob should use apiVersion: batch/v1, since batch/v1beta1 was removed in 1.25.

    At a high level, the Secret/ConfigMap hold the configuration details, while the remaining resources give the CronJob permission to remove and re-create the regcred token (Role -> RoleBinding -> ServiceAccount -> CronJob). As with the previous approach, it's advisable to initially change the cron schedule from "0 */10 * * *" to "* * * * *" to verify that the job works as expected.
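    The CronJob can also be verified without waiting for the schedule by triggering it manually (a sketch; the job name ecr-refresh-manual is arbitrary):

    # Check that the CronJob is registered and when it last ran
    kubectl get cronjob ecr-registry-helper --namespace health-check

    # Run it once on demand and inspect the output
    kubectl create job --from=cronjob/ecr-registry-helper ecr-refresh-manual --namespace health-check
    kubectl logs job/ecr-refresh-manual --namespace health-check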

The source code and readme are also available on GitHub: https://github.com/skryvets/kubernetes-pull-an-image-from-private-ecr-registry

Conclusion

This article provides a basic yet production-ready approach to deploying an application whose image is hosted in a private registry to a Kubernetes cluster. It scales to any number of pods/replica sets/deployments as long as they reside in the same namespace, and the same steps can be repeated for each additional namespace.