Anthos Multi Cluster Ingress — intermittent connectivity and disappearing backend service
I'm running a private two-cluster GKE setup in europe-west2: a dedicated config cluster for MCI and a worker cluster for workloads. Both clusters are registered with the Anthos hub, and the Ingress feature is enabled on the config cluster. The worker cluster also runs the latest ASM, 1.12.2.
As far as MCI is concerned, my deployment is 'standard', i.e. based on the available docs (https://cloud.google.com/architecture/distributed-services-on-gke-private-using-anthos-service-mesh#configure-multi-cluster-ingress, the terraform-example-foundation repo, etc.).
Everything works, but I'm hitting an intermittent connectivity issue no matter how many times I redeploy the entire stack. My eyes are bleeding from staring at the logging dashboard; I've run out of dots to connect.
I'm probing some endpoints exposed by my cluster, which most of the time return 200 with the following logged under resource.type="http_load_balancer":
{
  httpRequest: {
    latency: "0.081658s"
    remoteIp: "20.83.144.189"
    requestMethod: "GET"
    requestSize: "360"
    requestUrl: "https://foo.bar.io/"
    responseSize: "1054"
    serverIp: "100.64.72.136"
    status: 200
    ...
  }
  insertId: "10mjvz4e8g0nq"
  jsonPayload: {
    @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
    statusDetails: "response_sent_by_backend"
  }
  ...
  resource: {
    labels: {
      backend_service_name: "mci-4z8mmz-80-asm-ingress-mcs-istio"
      forwarding_rule_name: "mci-4z8mmz-fws-asm-ingress-mci-istio"
      project_id: "prj-foo-bar"
      target_proxy_name: "mci-4z8mmz-asm-ingress-mci-istio"
      url_map_name: "mci-4z8mmz-asm-ingress-mci-istio"
      zone: "global"
    }
    type: "http_load_balancer"
  }
  severity: "INFO"
  spanId: "2a986abfc69bef6f"
  timestamp: "2022-02-04T15:24:14.160642Z"
  ...
}
At random intervals, anywhere between 1 and 5 hours, the probes start failing with 404 for a period of 5-10 minutes, and the following is logged:
{
  httpRequest: {
    ...
    requestMethod: "GET"
    ...
    requestUrl: "https://foo.bar.io/"
    ...
    status: 404
    ...
  }
  insertId: "10mjvz4e8g0nq"
  jsonPayload: {
    @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
    statusDetails: "internal_error"
  }
  ...
  resource: {
    labels: {
      backend_service_name: ""
      forwarding_rule_name: "mci-4z8mmz-fws-asm-ingress-mci-istio"
      project_id: "prj-foo-bar"
      target_proxy_name: "mci-4z8mmz-asm-ingress-mci-istio"
      url_map_name: "mci-4z8mmz-asm-ingress-mci-istio"
      zone: "global"
    }
    type: "http_load_balancer"
  }
  severity: "WARNING"
  ...
}
backend_service_name and serverIp disappear, and the external LB provisioned via MCI goes for an extended nap. If I try to access the endpoints in a browser during that period I get 404'd and eventually "connection was closed".
I've searched the logs far and wide and cannot find any leads.
Has anyone experienced a similar issue? Could this be a regional thing? I'm yet to try deploying to another region.
Any info/links/ideas much appreciated.
Edit:
I've also confirmed that the health checks are fine and there are no transitions. The pods never receive the requests, so the 404s are coming from the external LB.
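For reference, the failing windows can be isolated with a Cloud Logging query along these lines (the forwarding rule name is taken from my log entries above; substitute your own):

```
resource.type="http_load_balancer"
resource.labels.forwarding_rule_name="mci-4z8mmz-fws-asm-ingress-mci-istio"
jsonPayload.statusDetails="internal_error"
```

During healthy periods the same query returns nothing, which is how I correlated the 404s with the empty backend_service_name.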
1 Answer
I had the same/similar issue when using HTTPS with MultiClusterIngress.
Google support suggested using a literal static IP address for the annotation:
networking.gke.io/static-ip: STATIC_IP_ADDRESS
Try using a literal IP address, e.g.
34.102.201.47
instead of
https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/addresses/ADDRESS_NAME
as described in https://cloud.google.com/kubernetes-engine/docs/how-to/multi-cluster-ingress#static.
If that doesn't solve the problem, try contacting Google support.
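For illustration, a minimal MultiClusterIngress sketch with the literal IP form of the annotation (the resource names, namespace, and IP here are placeholders, not taken from the asker's setup):

```yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: asm-ingress-mci
  namespace: asm-ingress
  annotations:
    # Literal reserved global IP, not the full compute API resource URL
    networking.gke.io/static-ip: "34.102.201.47"
spec:
  template:
    spec:
      backend:
        # Must match an existing MultiClusterService in the same namespace
        serviceName: asm-ingress-mcs
        servicePort: 80
```

The IP should be a reserved global static address in the same project, otherwise the controller will fail to attach it to the forwarding rule.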