K8s Troubleshooting

Kubernetes Alerts

I've integrated EKS and configured alerts for it. Why don't I see any alerts?

Check whether the alert rule has any resource groups selected. If it does, update the alert rule so that no resource groups are selected.

Kubernetes Compliance

How do I check if the Node and Cluster Collectors are running on a Kubernetes cluster?

  1. List all pods in the Lacework namespace for the cluster:

    kubectl get pods -o wide -n lacework
  2. For a Kubernetes cluster with N nodes, check that there are N+1 Lacework Agent pods (one Node Collector per node plus the Cluster Collector); a quick way to count them is shown after the notes below:

    Example output for a K8s cluster of 2 nodes
    NAME                                      READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
    lacework-agent-7brq8                      1/1     Running   0          18s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
    lacework-agent-cluster-7c8dd4ccb9-2wh89   1/1     Running   0          22s   xxx.xx.x.xx      ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
    lacework-agent-tgfv8                      1/1     Running   0          10s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
    • The Cluster Collector pod is named with a prefix of lacework-agent-cluster-*. There should be one per Kubernetes cluster.
    • The Node Collector pods are named with a prefix of lacework-agent-*. There should be one for each node in the Kubernetes cluster.
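
    As a quick check, you can compare the number of pods in the lacework namespace with the number of cluster nodes; the extra pod is the Cluster Collector. This is a minimal sketch using standard kubectl options:

    Example commands
    # Number of Lacework Agent pods (expected: N+1)
    kubectl get pods -n lacework --no-headers | wc -l
    # Number of nodes in the cluster (N)
    kubectl get nodes --no-headers | wc -l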

How do I check if any custom changes made to configuration files have been processed successfully?

  1. List all pods in the Lacework namespace for the cluster:

    kubectl get pods -o wide -n lacework
    Example output for a K8s cluster of 2 nodes
    NAME                                      READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
    lacework-agent-7brq8                      1/1     Running   0          18s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
    lacework-agent-cluster-7c8dd4ccb9-2wh89   1/1     Running   0          22s   xxx.xx.x.xx      ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
    lacework-agent-tgfv8                      1/1     Running   0          10s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
  2. Check the logs of the pods on any nodes where configuration changes were made:

    Example command for a Node Collector log
    kubectl logs lacework-agent-7brq8 -n lacework

    The entries in the Node Collector logs will look similar to the output below if the configuration cannot be read:

    Example Node Collector log output for disabled functionality
    time="2022-09-30T15:07:22.254Z" level=error msg="Cannot parse config file [/var/lib/lacework/config/config.json] Unable to sanitize content: [Config parsing error 5 (k8sNodeScrapeIntervalMins, 5)]" caller="json.go:261" pid=1111

    If this message appears, check your config.json for erroneous characters or spaces.
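
    To surface only the configuration parse errors without reading the full log, you can filter the log output. This is a minimal sketch that greps for the error text shown above:

    Example command
    kubectl logs lacework-agent-7brq8 -n lacework | grep -i "cannot parse config"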

Node Collector

Find Node Collectors by listing all pods in the Lacework namespace for the cluster:

kubectl get pods -o wide -n lacework

The Node Collector pods are named with a prefix of lacework-agent-*. There should be one for each node in a Kubernetes cluster:

Example output for a K8s cluster of 2 nodes
NAME                                      READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
lacework-agent-7brq8                      1/1     Running   0          18s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
lacework-agent-cluster-7c8dd4ccb9-2wh89   1/1     Running   0          22s   xxx.xx.x.xx      ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
lacework-agent-tgfv8                      1/1     Running   0          10s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
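
If you only want the Node Collector pods, you can filter the listing by name. This is a minimal sketch that assumes the default lacework-agent-* naming shown above:

Example command
kubectl get pods -o wide -n lacework | grep '^lacework-agent-' | grep -v 'lacework-agent-cluster-'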

How do I check whether the Node Collector has started successfully?

Check the logs on any lacework-agent-* pods:

Example command for a Node Collector log
kubectl logs lacework-agent-tgfv8 -n lacework

The entries in the Node Collector logs should look similar to the output below (when running successfully):

Example Node Collector log output
time="2022-09-30T15:25:34.093Z" level=info msg="k8sNodeCollector is enabled..Starting module" caller="collector.go:75" pid=30794
time="2022-09-30T15:25:34.093Z" level=info msg="Initialized module: k8sNodeCollectorConfig" caller="lwmodule.go:50" pid=30794
time="2022-09-30T15:25:34.094Z" level=info msg="Initialized module: k8sNodeCollectorSink" caller="lwmodule.go:50" pid=30794
time="2022-09-30T15:25:34.094Z" level=info msg="Initialized module: k8sNodeCollector" caller="lwmodule.go:50" pid=30794
time="2022-09-30T15:25:34.096Z" level=info msg="Started module: k8sNodeCollectorConfig" caller="lwmodule.go:69" pid=30794
time="2022-09-30T15:25:34.096Z" level=info msg="Started module: k8sNodeCollectorSink" caller="lwmodule.go:69" pid=30794
time="2022-09-30T15:25:34.104Z" level=info msg="k8sNodeCollector, Ticker started." caller="k8snodecollector.go:57" pid=30794
time="2022-09-30T15:25:34.104Z" level=info msg="Started module: k8sNodeCollector" caller="lwmodule.go:69" pid=30794
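
To confirm the key startup messages without reading the full log, you can filter for the module start entries (the match string is taken from the sample output above):

Example command
kubectl logs lacework-agent-tgfv8 -n lacework | grep "Started module"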

How do I check whether the Node Collector functionality is enabled or disabled?

Check the logs on any lacework-agent-* pods:

Example command for a Node Collector log
kubectl logs lacework-agent-tgfv8 -n lacework

The entries in the Node Collector logs will look similar to the output below if the Node Collector functionality is disabled:

Example Node Collector log output for disabled functionality
time="2022-09-28T19:39:20.506Z" level=info msg="k8sNodeCollector is disabled" caller="collector.go:89" pid=1111
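
A quick way to check the state directly is to filter for the k8sNodeCollector status line, which reads is enabled or is disabled as shown in the sample outputs above:

Example command
kubectl logs lacework-agent-tgfv8 -n lacework | grep "k8sNodeCollector is"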

Cluster Collector

Find the Cluster Collector by listing all pods in the Lacework namespace for the cluster:

kubectl get pods -o wide -n lacework

The Cluster Collector pod is named with a prefix of lacework-agent-cluster-*. There should be one for each Kubernetes cluster:

Example output for a K8s cluster of 2 nodes
NAME                                      READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
lacework-agent-7brq8                      1/1     Running   0          18s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
lacework-agent-cluster-7c8dd4ccb9-2wh89   1/1     Running   0          22s   xxx.xx.x.xx      ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
lacework-agent-tgfv8                      1/1     Running   0          10s   xxx.xxx.xx.xxx   ip-xxx-xxx-xx-xxx.us-west-2.compute.internal   <none>           <none>
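
If you only want the Cluster Collector pod, you can filter the listing by name. This is a minimal sketch that assumes the default lacework-agent-cluster-* naming shown above:

Example command
kubectl get pods -o wide -n lacework | grep 'lacework-agent-cluster-'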

How do I check whether the Cluster Collector has started successfully?

Check the Cluster Collector pod logs for messages containing took, and confirm that they report Completed discovery and Collection status OK:

Example command
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep took
Example log output
2023-01-11T18:19:52.013Z INFO lwmodule/lwmodule.go:52 Initializing modules done, took: 119.706µs
2023-01-11T18:19:52.014Z INFO lwmodule/lwmodule.go:72 Starting modules done, took: 223.344µs
2023-01-11T18:19:52.931Z INFO discovery/discovery.go:535 Completed discovery, took 754.51598ms
2023-01-11T18:20:00.244Z INFO driver/driver.go:211 Collection status OK, started 2023-01-11T18:19:52Z ended 2023-01-11T18:20:00Z took: 8.230019046s
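
If the took messages are missing, scanning the same log for error-level entries may help pinpoint why startup or discovery did not complete. This is a simple sketch:

Example command
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep -i error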

How do I check if the Cluster Collector has sent data to Lacework?

Check the Cluster Collector pod logs for messages containing Sending. Confirm that there are "collectionStatusType":"STARTED" and "collectionStatusType":"COMPLETED" entries with an empty error message ("errorMessage":""):

Example command
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep -i Sending
Example log output
2022-11-23T16:08:45.076Z    INFO    driver/driver.go:222    Sending config_summary_t msg {"MsgType":"config_summary_t","COLLECTION_TYPE":{"formatVersion":"1","collectionType":"SNAPSHOT"},"COLLECTION_VERSION":"1.0","REQUEST_GUID":"B3BE74CC-CD32-B5B9-4140-2010841FF57B","START_TIME":"1669219715557","END_TIME":"1669219725076","COLLECTION_CONFIG":{"formatVersion":"1","k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4","k8sClusterVersion":{"buildDate":"2021-07-15T20:59:07Z","compiler":"gc","gitCommit":"ca643a4d1f7bfe34773c74f79527be4afd95bf39","gitTreeState":"clean","gitVersion":"v1.21.3","goVersion":"go1.16.6","major":"1","minor":"21","platform":"linux/amd64"},"k8sClusterType":"eks","k8sClusterName":"test","cloudRegion":"us-west-2","hostName":"ip-192-168-59-234.us-west-2.compute.internal","k8sCollectorConfigFileData":"","k8sCollectorConfigMerged":""},"COLLECTION_START_TIME":"1669219715557","COLLECTION_END_TIME":"1669219725076","COLLECTION_STATUS":{"formatVersion":"1","collectionStatusType":"STARTED","errorType":"OK","errorMessage":""},"PROPS":{"collectionStats":{"numDefaultAllows":"0","numDefaultDenies":"0","numOverrideAllows":"0","numOverrideDenies":"0","numDetailErrs":"0","numStatusErrs":"0","numSummaryErrs":"0","timeTakenForCollectionInMsecs":"0","timeTakenForCollectionHumanReadable":""},"k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4"},"KEYS":{"totalCollectionRecords":""}}
2022-11-23T16:08:57.703Z INFO driver/driver.go:222 Sending config_summary_t msg {"MsgType":"config_summary_t","COLLECTION_TYPE":{"formatVersion":"1","collectionType":"SNAPSHOT"},"COLLECTION_VERSION":"1.0","REQUEST_GUID":"B3BE74CC-CD32-B5B9-4140-2010841FF57B","START_TIME":"1669219715557","END_TIME":"1669219737703","COLLECTION_CONFIG":{"formatVersion":"1","k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4","k8sClusterVersion":{"buildDate":"2021-07-15T20:59:07Z","compiler":"gc","gitCommit":"ca643a4d1f7bfe34773c74f79527be4afd95bf39","gitTreeState":"clean","gitVersion":"v1.21.3","goVersion":"go1.16.6","major":"1","minor":"21","platform":"linux/amd64"},"k8sClusterType":"eks","k8sClusterName":"test","cloudRegion":"us-west-2","hostName":"ip-192-168-59-234.us-west-2.compute.internal","k8sCollectorConfigFileData":"","k8sCollectorConfigMerged":""},"COLLECTION_START_TIME":"1669219715557","COLLECTION_END_TIME":"1669219737703","COLLECTION_STATUS":{"formatVersion":"1","collectionStatusType":"COMPLETED","errorType":"OK","errorMessage":""},"PROPS":{"collectionStats":{"numDefaultAllows":"0","numDefaultDenies":"0","numOverrideAllows":"0","numOverrideDenies":"30","numDetailErrs":"0","numStatusErrs":"0","numSummaryErrs":"0","timeTakenForCollectionInMsecs":"22145","timeTakenForCollectionHumanReadable":"22.145797895s"},"k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4"},"KEYS":{"totalCollectionRecords":"578"}}
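
To surface only entries where the error message is populated, you can filter on the errorMessage field shown above; no output means the Sending entries reported no errors:

Example command
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep -i Sending | grep '"errorMessage":"[^"]'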

Why don't I see any Compliance Changed alerts after integrating my Kubernetes cluster?

After integrating your Kubernetes cluster with the Lacework Compliance platform, Compliance Changed alerts are not generated until at least two full evaluations have been performed on your cluster. This generally takes less than 48 hours after integration.

Compliance Changed alerts are produced by comparing two existing evaluations. If a change in compliance is detected during the comparison, a Compliance Changed alert is generated.

New Violation alerts can be generated after the first evaluation of your Kubernetes cluster (for any detected policy violations).

Partial collection available due to EC2MetadataError

The Cluster Collector retrieves AWS instance metadata for the EKS cluster. This metadata is required to link Node Collector and Cluster Collector data together and to provide configuration visibility in the Lacework platform.

If metadata requests from the Cluster Collector are blocked, errors such as Partial Collection can appear in the Lacework Console.

Check the Cluster Collector logs for any errors relating to metadata retrieval:

Example command
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep EC2Metadata
Example log output
2023-01-20T12:26:59.763Z INFO discovery/cdiscovery.go:117 AWS:Error in getting instance-id from EC2 metadata EC2MetadataError: failed to make EC2Metadata requestrequest blocked by allow-route-regexp "^$": /latest/meta-data/instance-idstatus code: 404, request id:

These errors can appear when the EKS cluster is running KIAM (or a similar tool). KIAM uses a proxy that intercepts metadata requests, and its policies can be configured to block AWS metadata requests from EKS cluster pods (where the Cluster Collector is installed).

If KIAM is installed on your cluster, ensure that AWS metadata requests are permitted from pods.
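
To verify whether pods can reach the instance metadata endpoint at all, you can run a short-lived test pod in the lacework namespace. This is a sketch only: the pod name and curl image are placeholders, and if IMDSv2 is enforced the request needs a session token, so a failure here does not necessarily mean KIAM is blocking the Cluster Collector:

Example command
kubectl run imds-test --rm -it --restart=Never -n lacework --image=curlimages/curl -- \
  curl -s http://169.254.169.254/latest/meta-data/instance-id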

Partial collection available due to mismatched EKS cluster name

The error Partial collection available. The cloud collector has not been configured. may appear in the Console if the kubernetesCluster value provided during installation does not match the EKS cluster name in AWS.

Check the name of your EKS cluster in AWS, and then run the EKS Compliance integration Helm command again.

Ensure that the following configuration value is set to your Amazon EKS cluster name (as it appears in AWS):

Example
--set laceworkConfig.kubernetesCluster=myEksClusterName \
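
To confirm the exact cluster name as AWS reports it, you can list the EKS clusters in the region with the AWS CLI (the region below is taken from the examples above; substitute your own):

Example command
aws eks list-clusters --region us-west-2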