K8s Troubleshooting
Kubernetes Alerts
I've integrated EKS and configured alerts for it. Why don't I see any alerts?
Check whether the alert rule has any resource groups selected. If it does, update the alert rule so that no resource groups are selected.
Kubernetes Compliance
How do I check if the Node and Cluster Collectors are running on a Kubernetes cluster?
List all pods in the Lacework namespace for the cluster:
kubectl get pods -o wide -n lacework
For each Kubernetes cluster of N nodes, check that there are N+1 Lacework Agent pods (a quick count check follows the list below).
Example output for a K8s cluster of 2 nodes:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lacework-agent-7brq8 1/1 Running 0 18s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-cluster-7c8dd4ccb9-2wh89 1/1 Running 0 22s xxx.xx.x.xx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-tgfv8 1/1 Running 0 10s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
- The Cluster Collector pod is named with a prefix of lacework-agent-cluster-*. There should be one for each Kubernetes cluster.
- The Node Collector pods are named with a prefix of lacework-agent-*. There should be one for each node in a Kubernetes cluster.
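As a quick sanity check of the N+1 count described above, you can compare the node count with the number of agent pods. This is a minimal sketch that assumes the default lacework namespace:
# Number of nodes in the cluster (N)
kubectl get nodes --no-headers | wc -l
# Number of Lacework Agent pods; this should be N+1 (N Node Collectors plus 1 Cluster Collector)
kubectl get pods -n lacework --no-headers | wc -l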
How do I check if any custom changes made to configuration files have been processed successfully?
List all pods in the Lacework namespace for the cluster:
kubectl get pods -o wide -n lacework
Example output for a K8s cluster of 2 nodes:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lacework-agent-7brq8 1/1 Running 0 18s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-cluster-7c8dd4ccb9-2wh89 1/1 Running 0 22s xxx.xx.x.xx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-tgfv8 1/1 Running 0 10s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
Check the logs on any nodes where configuration changes were made:
Example command for a Node Collector log:
kubectl logs lacework-agent-7brq8 -n lacework
The entries in the Node Collector logs will look similar to the output below if the configuration cannot be read:
Example Node Collector log output for a configuration error:
time="2022-09-30T15:07:22.254Z" level=error msg="Cannot parse config file [/var/lib/lacework/config/config.json] Unable to sanitize content: [Config parsing error 5 (k8sNodeScrapeIntervalMins, 5)]" caller="json.go:261" pid=1111
If this message is seen, check your config.json for any erroneous characters or spaces.
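If you suspect a syntax problem, you can validate the JSON locally before applying it. This is a minimal sketch that assumes jq is installed and that your local copy of the file is named config.json:
# Prints a parse error and returns a non-zero exit code if the JSON is malformed
jq . config.json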
Node Collector
Find Node Collectors by listing all pods in the Lacework namespace for the cluster:
kubectl get pods -o wide -n lacework
The Node Collector pods are named with a prefix of lacework-agent-*. There should be one for each node in a Kubernetes cluster:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lacework-agent-7brq8 1/1 Running 0 18s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-cluster-7c8dd4ccb9-2wh89 1/1 Running 0 22s xxx.xx.x.xx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-tgfv8 1/1 Running 0 10s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
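Because the Node Collector runs one pod per node, it is typically managed by a DaemonSet, so you can also check its rollout at that level. This sketch does not assume a specific DaemonSet name:
# DESIRED, CURRENT, and READY should all equal the number of nodes in the cluster
kubectl get daemonsets -n lacework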
How do I check whether the Node Collector has started successfully?
Check the logs on any lacework-agent-* pods:
kubectl logs lacework-agent-tgfv8 -n lacework
When the Node Collector is running successfully, its log entries should look similar to the output below:
time="2022-09-30T15:25:34.093Z" level=info msg="k8sNodeCollector is enabled..Starting module" caller="collector.go:75" pid=30794
time="2022-09-30T15:25:34.093Z" level=info msg="Initialized module: k8sNodeCollectorConfig" caller="lwmodule.go:50" pid=30794
time="2022-09-30T15:25:34.094Z" level=info msg="Initialized module: k8sNodeCollectorSink" caller="lwmodule.go:50" pid=30794
time="2022-09-30T15:25:34.094Z" level=info msg="Initialized module: k8sNodeCollector" caller="lwmodule.go:50" pid=30794
time="2022-09-30T15:25:34.096Z" level=info msg="Started module: k8sNodeCollectorConfig" caller="lwmodule.go:69" pid=30794
time="2022-09-30T15:25:34.096Z" level=info msg="Started module: k8sNodeCollectorSink" caller="lwmodule.go:69" pid=30794
time="2022-09-30T15:25:34.104Z" level=info msg="k8sNodeCollector, Ticker started." caller="k8snodecollector.go:57" pid=30794
time="2022-09-30T15:25:34.104Z" level=info msg="Started module: k8sNodeCollector" caller="lwmodule.go:69" pid=30794
How do I check whether the Node Collector functionality is enabled or disabled?
Check the logs on any lacework-agent-* pods:
kubectl logs lacework-agent-tgfv8 -n lacework
The entries in the Node Collector logs will look similar to the output below if the Node Collector functionality is disabled:
time="2022-09-28T19:39:20.506Z" level=info msg="k8sNodeCollector is disabled" caller="collector.go:89" pid=1111
Cluster Collector
Find the Cluster Collector by listing all pods in the Lacework namespace for the cluster:
kubectl get pods -o wide -n lacework
The Cluster Collector pod is named with a prefix of lacework-agent-cluster-*. There should be one for each Kubernetes cluster:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lacework-agent-7brq8 1/1 Running 0 18s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-cluster-7c8dd4ccb9-2wh89 1/1 Running 0 22s xxx.xx.x.xx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
lacework-agent-tgfv8 1/1 Running 0 10s xxx.xxx.xx.xxx ip-xxx-xxx-xx-xxx.us-west-2.compute.internal <none> <none>
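Because the Cluster Collector pod name includes a ReplicaSet hash (7c8dd4ccb9 in the example above), it is typically managed by a Deployment, so you can also check it at that level. This sketch does not assume a specific Deployment name:
# Expect a single Cluster Collector Deployment with READY 1/1
kubectl get deployments -n lacework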
How do I check whether the Cluster Collector has started successfully?
Check the Cluster Collector pod logs for messages containing took that state Completed discovery and Collection status OK:
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep took
2023-01-11T18:19:52.013Z INFO lwmodule/lwmodule.go:52 Initializing modules done, took: 119.706µs
2023-01-11T18:19:52.014Z INFO lwmodule/lwmodule.go:72 Starting modules done, took: 223.344µs
2023-01-11T18:19:52.931Z INFO discovery/discovery.go:535 Completed discovery, took 754.51598ms
2023-01-11T18:20:00.244Z INFO driver/driver.go:211 Collection status OK, started 2023-01-11T18:19:52Z ended 2023-01-11T18:20:00Z took: 8.230019046s
How do I check if the Cluster Collector has sent data to Lacework?
Check the Cluster Collector pod logs for messages containing Sending that have "collectionStatusType":"STARTED" and "collectionStatusType":"COMPLETED" entries with no error message ("errorMessage":""):
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep -i Sending
2022-11-23T16:08:45.076Z INFO driver/driver.go:222 Sending config_summary_t msg {"MsgType":"config_summary_t","COLLECTION_TYPE":{"formatVersion":"1","collectionType":"SNAPSHOT"},"COLLECTION_VERSION":"1.0","REQUEST_GUID":"B3BE74CC-CD32-B5B9-4140-2010841FF57B","START_TIME":"1669219715557","END_TIME":"1669219725076","COLLECTION_CONFIG":{"formatVersion":"1","k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4","k8sClusterVersion":{"buildDate":"2021-07-15T20:59:07Z","compiler":"gc","gitCommit":"ca643a4d1f7bfe34773c74f79527be4afd95bf39","gitTreeState":"clean","gitVersion":"v1.21.3","goVersion":"go1.16.6","major":"1","minor":"21","platform":"linux/amd64"},"k8sClusterType":"eks","k8sClusterName":"test","cloudRegion":"us-west-2","hostName":"ip-192-168-59-234.us-west-2.compute.internal","k8sCollectorConfigFileData":"","k8sCollectorConfigMerged":""},"COLLECTION_START_TIME":"1669219715557","COLLECTION_END_TIME":"1669219725076","COLLECTION_STATUS":{"formatVersion":"1","collectionStatusType":"STARTED","errorType":"OK","errorMessage":""},"PROPS":{"collectionStats":{"numDefaultAllows":"0","numDefaultDenies":"0","numOverrideAllows":"0","numOverrideDenies":"0","numDetailErrs":"0","numStatusErrs":"0","numSummaryErrs":"0","timeTakenForCollectionInMsecs":"0","timeTakenForCollectionHumanReadable":""},"k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4"},"KEYS":{"totalCollectionRecords":""}}
2022-11-23T16:08:57.703Z INFO driver/driver.go:222 Sending config_summary_t msg {"MsgType":"config_summary_t","COLLECTION_TYPE":{"formatVersion":"1","collectionType":"SNAPSHOT"},"COLLECTION_VERSION":"1.0","REQUEST_GUID":"B3BE74CC-CD32-B5B9-4140-2010841FF57B","START_TIME":"1669219715557","END_TIME":"1669219737703","COLLECTION_CONFIG":{"formatVersion":"1","k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4","k8sClusterVersion":{"buildDate":"2021-07-15T20:59:07Z","compiler":"gc","gitCommit":"ca643a4d1f7bfe34773c74f79527be4afd95bf39","gitTreeState":"clean","gitVersion":"v1.21.3","goVersion":"go1.16.6","major":"1","minor":"21","platform":"linux/amd64"},"k8sClusterType":"eks","k8sClusterName":"test","cloudRegion":"us-west-2","hostName":"ip-192-168-59-234.us-west-2.compute.internal","k8sCollectorConfigFileData":"","k8sCollectorConfigMerged":""},"COLLECTION_START_TIME":"1669219715557","COLLECTION_END_TIME":"1669219737703","COLLECTION_STATUS":{"formatVersion":"1","collectionStatusType":"COMPLETED","errorType":"OK","errorMessage":""},"PROPS":{"collectionStats":{"numDefaultAllows":"0","numDefaultDenies":"0","numOverrideAllows":"0","numOverrideDenies":"30","numDetailErrs":"0","numStatusErrs":"0","numSummaryErrs":"0","timeTakenForCollectionInMsecs":"22145","timeTakenForCollectionHumanReadable":"22.145797895s"},"k8sClusterId":"47337eba-d271-4100-a00f-acf7cd7b33f4"},"KEYS":{"totalCollectionRecords":"578"}}
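To confirm that none of these entries carry an error without reading the full JSON payloads, you can extract just the status and error fields. This is a minimal sketch that assumes the fields appear in the same order as in the sample output above:
# Every line should show an errorType of "OK" and an empty errorMessage
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep -i Sending | grep -o '"collectionStatusType":"[^"]*","errorType":"[^"]*","errorMessage":"[^"]*"'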
Why don't I see any Compliance Changed alerts after integrating my Kubernetes cluster?
After integrating your Kubernetes cluster with the Lacework Compliance platform, Compliance Changed alerts are not generated until at least two full evaluations have been performed on your cluster. This is generally less than 48 hours after integrating your cluster.
Compliance Changed alerts occur when two existing evaluations are compared to detect any changes. If a compliance violation is found during the comparison, a resulting Compliance Changed alert is generated.
New Violations can be generated after the first evaluation on your Kubernetes cluster (for any detected policy violations).
Partial collection available due to EC2MetadataError
The Cluster Collector retrieves AWS instance metadata for the EKS cluster, which is crucial for connecting Node and Cluster Collector data and providing configuration visibility in the Lacework platform.
If the metadata requests from the Cluster Collector are blocked, this can result in errors seen in the Lacework Console (such as Partial Collection).
Check the Cluster Collector logs for any errors relating to the retrieval of metadata:
kubectl logs lacework-agent-cluster-7c8dd4ccb9-2wh89 -n lacework | grep EC2Metadata
2023-01-20T12:26:59.763Z INFO discovery/cdiscovery.go:117 AWS:Error in getting instance-id from EC2 metadata EC2MetadataError: failed to make EC2Metadata requestrequest blocked by allow-route-regexp "^$": /latest/meta-data/instance-idstatus code: 404, request id:
These errors can be seen when the EKS cluster is running KIAM (or similar). KIAM utilizes a proxy that intercepts metadata requests. KIAM policies can sometimes be configured to disallow AWS metadata requests from EKS cluster pods (where the Cluster Collector is installed).
If KIAM is installed on your cluster, ensure that AWS metadata requests are permitted from pods.
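To test whether pods in the cluster can reach the instance metadata service at all, you can run a short-lived pod and query the endpoint directly. This is a minimal diagnostic sketch; the pod name and image are arbitrary choices, not part of the Lacework installation:
# A successful request returns the instance ID; a blocked request times out or returns an error
kubectl run metadata-test --rm -it --restart=Never --image=busybox -- \
  wget -q -O - -T 5 http://169.254.169.254/latest/meta-data/instance-id
Note that if the cluster enforces IMDSv2, an unauthenticated request like this may also be rejected even when KIAM is not blocking it.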
Partial collection available due to mismatched EKS cluster name
Errors such as "Partial collection available. The cloud collector has not been configured." may be seen on the Console if the kubernetesCluster value provided during installation does not match the EKS cluster name in AWS.
Check the name of your EKS cluster in AWS, and then run the EKS Compliance integration Helm command again.
Ensure that the following configuration value is set to your Amazon EKS cluster name (as it appears in AWS):
--set laceworkConfig.kubernetesCluster=myEksClusterName \
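To confirm the exact name that AWS reports for the cluster, you can list EKS clusters with the AWS CLI. A minimal sketch (the region shown is an example):
# The name passed to laceworkConfig.kubernetesCluster must match one of these exactly
aws eks list-clusters --region us-west-2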