Deep dive into HashiCorp Vault’s auto-unseal

7 min read · Jun 12, 2023

Image source: https://storage.googleapis.com/gweb-cloudblog-publish/original_images/Cloud_KMS_1.png

The goal is to understand what happens inside HashiCorp Vault when we choose the auto unseal option. I will demonstrate it with GCP Cloud KMS as the unseal mechanism.

Disclaimer: I assume that you have at least one GCP project where you can create and manage KMS keys. I will also use Terraform and K3D in my demonstration.

Set up the test environment
What are Google Cloud KMS keys?
So where are the unseal keys actually?
Decrypt the keys manually

Set up the test environment

Google Cloud dependencies

The first thing we need is a key in Google Cloud’s Key Management Service. Here is a Terraform snippet that creates a key ring and a key for you.

resource "google_kms_key_ring" "vault_key_ring" {
   project  = "${var.project_id}"
   name     = "vault_key_ring"
   location = "us-central1"
}

resource "google_kms_crypto_key" "vault_crypto_key" {
   name            = "vault_crypto_key"
   key_ring        = "${google_kms_key_ring.vault_key_ring.id}"
   rotation_period = "100000s"
}

Vault needs to interact with GCP KMS, so we should create a Service Account as well and assign it a few roles. I will be honest, I didn’t check what the exact role requirements are. Whatever Vault complained about, I just assigned. (I know this isn’t the best solution, but since it’s not relevant here I chose the fastest and easiest way to do it.)

resource "google_service_account" "vault_kms" {
  project      = "${var.project_id}"
  # Service account IDs may contain only lowercase letters, digits, and hyphens (6-30 characters).
  account_id   = "vault-kms"
  display_name = "Vault's KMS manager"
  description  = "This service account is responsible for the Vault's KMS management"
}

resource "google_project_iam_binding" "vault_kms_role_1" {
  project = "${var.project_id}"
  role    = "roles/cloudkms.admin"
  members = [
    "serviceAccount:${google_service_account.vault_kms.email}"
  ]
}

resource "google_project_iam_binding" "vault_kms_role_2" {
  project = "${var.project_id}"
  role    = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  members = [
    "serviceAccount:${google_service_account.vault_kms.email}"
  ]
}

resource "google_project_iam_binding" "vault_kms_role_3" {
  project = "${var.project_id}"
  role    = "roles/cloudkms.signerVerifier"
  members = [
    "serviceAccount:${google_service_account.vault_kms.email}"
  ]
}

Set up K3D

So we have everything on the GCP side. Let’s create a local Kubernetes cluster where I can set up my amazing auto-unsealing Vault instance. I also need to create a volume to which we can attach Vault’s data directory.

mkdir /tmp/k3dvol
k3d cluster create vault --volume /tmp/k3dvol:/tmp/k3dvol
kubectl get nodes
# k3d-vault-server-0   Ready    control-plane,master   17h   v1.24.4+k3s1

Apply Vault manifests

So we have a cluster that we can work with. Now we need to write the YAML manifests for the necessary resources.

First we will need a Persistent Volume and a Persistent Volume Claim so we can investigate the data directory of the Vault outside of the cluster. It is also important for our case, because for the demonstration of a working auto-unseal I need to remove the existing Pod. If I remove the Pod, the deployment will create a new one, but without the Volume all of the data will be lost.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: vault-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/k3dvol"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vault-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

After that I added the Config Map for Vault’s config. I also added the Google Cloud Service Account’s key as a Config Map (because I live my life in constant danger; outside of a throwaway test, a Secret is the right resource for this).

apiVersion: v1
kind: ConfigMap
metadata:
  name: vault-config
data:
  vault.hcl: |
    storage "file" {
      path = "/opt/vault"
    }

    listener "tcp" {
      address     = "127.0.0.1:8200"
      tls_disable = 1
    }

    seal "gcpckms" {
      credentials = "/usr/vault/kms-service-account.json"
      project     = "PROJECT_ID"
      region      = "us-central1"
      key_ring    = "vault_key_ring"
      crypto_key  = "vault_crypto_key"
    }

    disable_mlock = true
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vault-sa
data:
  kms-service-account.json: |
    {
      "type": "service_account",
      "project_id": "NOPE",
      "private_key_id": "NOPE",
      "private_key": "NOPE",
      "client_email": "NOPE",
      "client_id": "NOPE",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "NOPE",
      "universe_domain": "googleapis.com"
    }

At this point I have everything I need for Vault’s configuration, so I can create the Deployment (and a Service, just for fun).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vault-deployment
  labels:
    app: vault-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vault-pod
  template:
    metadata:
      labels:
        app: vault-pod
    spec:
      restartPolicy: Always
      volumes:
      - name: vault-config
        configMap:
          name: vault-config
          items:
          - key: vault.hcl
            path: vault.hcl
      - name: vault-sa
        configMap:
          name: vault-sa
          items:
          - key: kms-service-account.json
            path: kms-service-account.json
      - name: vault-pv-storage
        persistentVolumeClaim:
          claimName: vault-pv-claim
      containers:
      - name: vault
        image: vault
        ports:
        - containerPort: 8200
        volumeMounts:
        - name: vault-config
          mountPath: /usr/vault/vault.hcl
          subPath: vault.hcl
        - name: vault-sa
          mountPath: /usr/vault/kms-service-account.json
          subPath: kms-service-account.json
        - name: vault-pv-storage
          mountPath: /opt/vault
        command: ["vault"]
        args: ["server", "-config=/usr/vault/vault.hcl"]
---
apiVersion: v1
kind: Service
metadata:
  name: vault-service
spec:
  selector:
    app: vault-pod
  ports:
  - name: rest-api
    protocol: TCP
    port: 8200
    targetPort: 8200

After you have applied all of the resources, you will have a running Vault Pod.

kubectl get pods
# vault-deployment-77bd9ff4fd-zpzxw   1/1     Running   0          17h

kubectl logs vault-deployment-77bd9ff4fd-zpzxw
# [INFO]  core: stored unseal keys supported, attempting fetch
# [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"

You have the Pod, but you will get the WARN shown above. That is because Vault is not initialized yet, so there are no stored keys to fetch. So we need to initialize it. I’m sure there are (only) better ways to do that, but let’s exec into the Pod.

kubectl exec -it vault-deployment-77bd9ff4fd-zpzxw -- sh

(INSIDE)$ VAULT_ADDR=http://127.0.0.1:8200 vault operator init
# Recovery Key 1: NOPE
# Recovery Key 2: NOPE
# Recovery Key 3: NOPE
# Recovery Key 4: NOPE
# Recovery Key 5: NOPE

# Initial Root Token: NOPE

# Success! Vault is initialized

# Recovery key initialized with 5 key shares and a key threshold of 3. Please
# securely distribute the key shares printed above.

(INSIDE)$ VAULT_ADDR=http://127.0.0.1:8200 vault status
# ...
# Initialized              true
# Sealed                   false
# ...

So what happens if we remove this pod?

kubectl delete pods vault-deployment-77bd9ff4fd-zpzxw

kubectl get pods
# NAME                                            READY   STATUS    RESTARTS   AGE
# vault-deployment-77bd9ff4fd-db7mj   1/1     Running   0          21s

kubectl exec -it vault-deployment-77bd9ff4fd-db7mj -- sh

(INSIDE)$ VAULT_ADDR=http://127.0.0.1:8200 vault status
# ...
# Initialized              true
# Sealed                   false
# ...

A new Pod has just been created and if we exec into the container we can see that it’s been unsealed without any problem.

What are Google Cloud KMS keys?

TL;DR: A set of symmetric or asymmetric keys for encryption, decryption and digital signatures.

I don’t want to jump into the details, you can read more from the official documentation: https://cloud.google.com/security-key-management

But it is important to note that Cloud KMS only performs cryptographic operations with these keys: you send it data, it encrypts or decrypts that data, and it keeps nothing of yours. It cannot store Vault’s keys for us. This raises the question: where are the keys stored that auto unseal the Vault instance?

So where are the unseal keys actually?

I mentioned that the Persistent Volume Claim is really important for the whole process. That is because Vault stores those keys inside its data directory.

So when you call vault operator init, Vault creates the main key. By default, this key is split into multiple shares with Shamir’s Secret Sharing. Any time you want to unseal Vault manually, you need to provide M shares out of N (where N is the number of shares and M is the threshold). Vault can then reconstruct the main key from these shares and start to operate.
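
To make the M-of-N idea concrete, here is a small Go sketch of Shamir’s Secret Sharing over GF(2^8). It is a teaching example written from the textbook description, not Vault’s own implementation, and the names split, combine and share are made up for this post. Each byte of the secret becomes the constant term of a random polynomial of degree M-1, and any M points are enough to interpolate that term back.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// share is one point per secret byte; X is the same for all bytes of a share.
type share struct {
	X byte
	Y []byte
}

// gfMul multiplies two elements of GF(2^8) (reduction polynomial 0x11b).
func gfMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1b
		}
		b >>= 1
	}
	return p
}

// gfInv returns the multiplicative inverse of a non-zero element (a^254).
func gfInv(a byte) byte {
	r := byte(1)
	for i := 0; i < 254; i++ {
		r = gfMul(r, a)
	}
	return r
}

// split creates n shares of secret; any m of them reconstruct it.
func split(secret []byte, n, m int) ([]share, error) {
	if m < 2 || n < m || n > 255 {
		return nil, errors.New("need 2 <= m <= n <= 255")
	}
	shares := make([]share, n)
	for i := range shares {
		shares[i] = share{X: byte(i + 1), Y: make([]byte, len(secret))}
	}
	coeffs := make([]byte, m)
	for idx, s := range secret {
		// Random polynomial of degree m-1 whose constant term is the secret byte.
		coeffs[0] = s
		if _, err := rand.Read(coeffs[1:]); err != nil {
			return nil, err
		}
		for i := range shares {
			var y byte
			for c := m - 1; c >= 0; c-- { // Horner evaluation at X
				y = gfMul(y, shares[i].X) ^ coeffs[c]
			}
			shares[i].Y[idx] = y
		}
	}
	return shares, nil
}

// combine interpolates each polynomial at x = 0 (Lagrange) to get the secret.
func combine(shares []share) []byte {
	secret := make([]byte, len(shares[0].Y))
	for idx := range secret {
		for i, si := range shares {
			num, den := byte(1), byte(1)
			for j, sj := range shares {
				if i != j {
					num = gfMul(num, sj.X)      // 0 - x_j equals x_j in GF(2^8)
					den = gfMul(den, si.X^sj.X) // x_i - x_j is XOR
				}
			}
			secret[idx] ^= gfMul(si.Y[idx], gfMul(num, gfInv(den)))
		}
	}
	return secret
}

func main() {
	secret := []byte("main key material")
	shares, err := split(secret, 5, 3) // N = 5 shares, threshold M = 3
	if err != nil {
		panic(err)
	}
	recovered := combine([]share{shares[0], shares[2], shares[4]})
	fmt.Println(string(recovered) == string(secret))
}
```

Any three of the five shares give back the secret, in any order, which is exactly the 5 shares / threshold 3 setup that vault operator init reports.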

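The seal stanza in the config file changes this, and this is where Google Cloud KMS (or any other supported KMS) comes into play. Instead of handing out Shamir shares of the unseal key, Vault encrypts its key material with the KMS key and stores the result in its own storage. Two entries are involved. The stored unseal key lives under data_dir/core/hsm/_barrier-unseal-keys (the constant in Vault’s source is core/hsm/barrier-unseal-keys; the underscore comes from the file backend). The recovery key, whose shares vault operator init printed above, lives (together with some metadata) under data_dir/core/_recovery-key. According to the Vault documentation, recovery keys authorize operations such as rekeying or generating a root token, but they cannot unseal Vault on their own. When Vault starts and performs an auto unseal, it reads the stored unseal key and does some magic.

Magic means that it parses the entry (the payload is stored in protobuf format), asks the KMS to decrypt it, and uses the decrypted key to recover the main key. No Shamir shares have to be typed in.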
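
The encrypted entry follows the envelope encryption pattern: the payload is encrypted locally with a one-off data key, and only that small data key is sent to the KMS to be wrapped. The Go sketch below illustrates the pattern with the standard library only. It is not Vault’s code: fakeKMS is a local stand-in for Cloud KMS, and the envelope struct is a simplified, hypothetical version of the wrapped key, IV and ciphertext fields that the decryption tool later in this post reads.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// envelope is what would be written to storage: no plaintext key in it.
type envelope struct {
	WrappedKey []byte // data key, encrypted by the KMS
	IV         []byte // nonce used for the payload encryption
	Ciphertext []byte // payload, encrypted with the data key
}

// fakeKMS stands in for Cloud KMS: its key never leaves this type.
type fakeKMS struct{ aead cipher.AEAD }

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// mustRandom returns n random bytes and panics if the system RNG fails.
func mustRandom(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

func newFakeKMS(key []byte) (fakeKMS, error) {
	aead, err := newGCM(key)
	return fakeKMS{aead: aead}, err
}

// Encrypt returns nonce || ciphertext.
func (k fakeKMS) Encrypt(plaintext []byte) []byte {
	nonce := mustRandom(k.aead.NonceSize())
	return k.aead.Seal(nonce, nonce, plaintext, nil)
}

// Decrypt splits off the nonce and authenticates and decrypts the rest.
func (k fakeKMS) Decrypt(ciphertext []byte) ([]byte, error) {
	n := k.aead.NonceSize()
	if len(ciphertext) < n {
		return nil, errors.New("ciphertext too short")
	}
	return k.aead.Open(nil, ciphertext[:n], ciphertext[n:], nil)
}

// seal encrypts payload with a fresh data key and wraps that key with the KMS.
func seal(k fakeKMS, payload []byte) (envelope, error) {
	dataKey := mustRandom(32)
	aead, err := newGCM(dataKey)
	if err != nil {
		return envelope{}, err
	}
	iv := mustRandom(aead.NonceSize())
	wrapped := k.Encrypt(dataKey)
	return envelope{WrappedKey: wrapped, IV: iv, Ciphertext: aead.Seal(nil, iv, payload, nil)}, nil
}

// open reverses seal: unwrap the data key via the KMS, then decrypt locally.
func open(k fakeKMS, e envelope) ([]byte, error) {
	dataKey, err := k.Decrypt(e.WrappedKey)
	if err != nil {
		return nil, err
	}
	aead, err := newGCM(dataKey)
	if err != nil {
		return nil, err
	}
	return aead.Open(nil, e.IV, e.Ciphertext, nil)
}

func main() {
	k, err := newFakeKMS(mustRandom(32))
	if err != nil {
		panic(err)
	}
	env, err := seal(k, []byte("secret key material"))
	if err != nil {
		panic(err)
	}
	plain, err := open(k, env)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

The point of the pattern is that everything in the envelope can sit on disk in the data directory, yet it is useless to anyone who cannot call the KMS.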

Relevant parts of the code from Vault’s source:

Decrypt the keys manually

So we know how auto unseal works for Vault, but let’s try to decrypt these keys ourselves. I wrote a small Go tool (of course it imports half of the HashiCorp code base) that does it for us. Let’s see the end result.

package main

import (
	"context"
	"encoding/json"
	"fmt"

	cloudkms "cloud.google.com/go/kms/apiv1"
	"cloud.google.com/go/kms/apiv1/kmspb"
	wrapping "github.com/hashicorp/go-kms-wrapping/v2"
	gcpckms "github.com/hashicorp/go-kms-wrapping/wrappers/gcpckms/v2"
	"github.com/hashicorp/vault/sdk/physical"
	"google.golang.org/protobuf/proto"
)

// example holds the JSON entry that the file storage backend wrote to disk.
var example = []byte(`COPY THE CONTENT OF /tmp/k3dvol/core/_recovery-key`)

var (
	project   = "PROJECT"
	location  = "us-central1"
	keyRing   = "vault_key_ring"
	cryptoKey = "vault_crypto_key"
)

func main() {
	// The file backend stores each entry as JSON; Value carries the protobuf payload.
	var pe physical.Entry
	if err := json.Unmarshal(example, &pe); err != nil {
		panic(err)
	}

	blobInfo := &wrapping.BlobInfo{}
	if err := proto.Unmarshal(pe.Value, blobInfo); err != nil {
		panic(err)
	}

	result, err := Decrypt(context.Background(), blobInfo)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(result))
}

// Decrypt is used to decrypt the ciphertext.
func Decrypt(ctx context.Context, in *wrapping.BlobInfo, opt ...wrapping.Option) ([]byte, error) {
	// Default to mechanism used before key info was stored
	if in.KeyInfo == nil {
		in.KeyInfo = &wrapping.KeyInfo{
			Mechanism: gcpckms.GcpCkmsEncrypt,
		}
	}

	// Without explicit options the client uses Application Default Credentials.
	kmsClient, err := cloudkms.NewKeyManagementClient(ctx)
	if err != nil {
		return nil, fmt.Errorf("create KMS client: %w", err)
	}
	defer kmsClient.Close()

	keyName := fmt.Sprintf("projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s", project, location, keyRing, cryptoKey)

	switch in.KeyInfo.Mechanism {
	case gcpckms.GcpCkmsEnvelopeAesGcmEncrypt:
		// Step 1: let Cloud KMS unwrap the data key.
		resp, err := kmsClient.Decrypt(ctx, &kmspb.DecryptRequest{
			Name:       keyName,
			Ciphertext: in.KeyInfo.WrappedKey,
		})
		if err != nil {
			return nil, fmt.Errorf("unwrap data key: %w", err)
		}

		// Step 2: decrypt the payload locally with the unwrapped data key.
		envInfo := &wrapping.EnvelopeInfo{
			Key:        resp.Plaintext,
			Iv:         in.Iv,
			Ciphertext: in.Ciphertext,
		}
		return wrapping.EnvelopeDecrypt(envInfo, opt...)
	default:
		// This trimmed copy only handles the envelope mechanism.
		return nil, fmt.Errorf("unsupported mechanism: %d", in.KeyInfo.Mechanism)
	}
}

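The Decrypt function is a trimmed copy of the GCP KMS decryption implementation in go-kms-wrapping.
The fmt.Println(string(result)) call prints the decrypted content of _recovery-key, which is the recovery key. Keep in mind that, according to the Vault documentation, the recovery key cannot unseal Vault by itself; as far as I can tell from the source, the key that auto unseal actually reads is stored in the same encrypted format under core/hsm/_barrier-unseal-keys.
Vault has a Seal interface with an autoSeal implementation that is used for auto unseal. This is important because it shows that Vault is not using Shamir for the unsealing operation.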
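
If you only want to look at the on-disk entry before pulling in the Vault SDK, the first step of the tool can be sketched with the standard library alone. The entry struct below is a hypothetical mirror of the two physical.Entry fields the tool reads, the sample JSON is made up, and filePath encodes my assumption about how the file backend lays out keys (the last path element gets a "_" prefix), which matches the core/_recovery-key path seen above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// entry mirrors the two fields of physical.Entry that the tool reads.
type entry struct {
	Key   string
	Value []byte // encoding/json decodes a base64 string into []byte
}

// decodeEntry parses the JSON document stored for one key.
func decodeEntry(raw []byte) (entry, error) {
	var e entry
	err := json.Unmarshal(raw, &e)
	return e, err
}

// filePath maps a storage key to its file inside the data directory.
// Assumption: the file backend prefixes the last path element with "_".
func filePath(dataDir, key string) string {
	return filepath.Join(dataDir, filepath.Dir(key), "_"+filepath.Base(key))
}

func main() {
	// Made-up sample; a real Value holds the protobuf-encoded blob.
	sample := []byte(`{"Key":"core/recovery-key","Value":"aGVsbG8="}`)
	e, err := decodeEntry(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d payload bytes\n", e.Key, len(e.Value))
	fmt.Println(filePath("/tmp/k3dvol", e.Key))
}
```

The decoded Value is what gets handed to proto.Unmarshal in the tool above.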

Conclusion

So why did I write this? Firstly, because if I don’t write it down, my brain removes the information by the next day. Secondly, it is important to know how the systems we use daily work. I created the auto-unsealed Vault instance for our internal development/test environment (yes, for development, not even close to production), but something inside me just couldn’t let it go. I needed to know where the keys that make auto unseal possible are stored. So I opened the Vault source code (we have been using Vault since 2017, and it wasn’t the first time) and checked how it works.