2019-07-04

GitOps - Flux Notes

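IaC (Infrastructure as Code): all state is managed in git.
- Every change is atomic and transactional.
- Production problems are fixed by opening a pull request, not by operating on the cluster directly.
No more kubectl.
The CI system does not need cluster credentials.
New images are deployed automatically.
- Flux watches the Docker registry.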
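
The pull model can be pictured as a reconcile step that runs inside the cluster and compares what is committed in git with what is running. The following is a minimal sketch of that idea, not Flux's actual implementation; `plan_sync`, the dictionary layout, and the image names are hypothetical and chosen only for illustration.

```python
def plan_sync(desired, running):
    """Return the actions needed to make `running` match `desired`.

    Both arguments map a workload name to an image reference.
    `desired` stands for the state committed in git, `running` for
    what the cluster currently runs (hypothetical data layout).
    """
    actions = []
    for name, image in sorted(desired.items()):
        if name not in running:
            actions.append(("create", name, image))
        elif running[name] != image:
            actions.append(("update", name, image))
    for name in sorted(running):
        if name not in desired:
            actions.append(("delete", name, running[name]))
    return actions


# Made-up example state.
git_state = {"api": "example/api:1.1", "web": "example/web:2.0"}
cluster_state = {"api": "example/api:1.0", "worker": "example/worker:0.3"}
for action in plan_sync(git_state, cluster_state):
    print(action)
```

Because the plan is derived only from git and the observed cluster state, nothing outside the cluster needs write access to it.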

Flux Flow: From https://github.com/weaveworks/flux

Typical push pipeline with read/write permission outside of the cluster.

Push Strategy: From https://www.weave.works/technologies/gitops/#pull-pipeline

Pull pipeline: credentials are kept inside the cluster.

Pull Strategy: From https://www.weave.works/technologies/gitops/#pull-pipeline

Telepresence - Talk to other Kubernetes services from your local machine and develop quickly

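Telepresence uses sshuttle to create a VPN-like tunnel over an SSH connection, which gives you a two-way network proxy (more details). You can even put an ingress and cert-manager in front of the service to get a local HTTPS development environment.

In addition, the CNCF has accepted Telepresence as a project, which gives a bit more confidence in the tool.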

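Problems it solves

When the cluster runs many microservices, testing everything locally with minikube is not feasible.
No need to wait for CI/CD to push the code to the cluster before seeing the result.
No need to set up an extra VPN (OpenVPN, WireGuard) to reach other services in the cluster.
Code can call Kubernetes cluster services directly.
- e.g. requests.get('http://client.elasticsearch:9200')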
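
With the tunnel up, cluster DNS names resolve from the local process. Below is a minimal sketch using only the standard library, assuming a service named `client` in the `elasticsearch` namespace as in the example above and plain HTTP. `service_url` and `fetch` are hypothetical helpers, and `fetch` only succeeds while a Telepresence session is running.

```python
from urllib.request import urlopen


def service_url(service, namespace, port, path="/"):
    """Build the URL of a Kubernetes Service as seen from inside the cluster.

    Cluster DNS resolves `<service>.<namespace>`; the scheme is
    assumed to be plain HTTP in this sketch.
    """
    return f"http://{service}.{namespace}:{port}{path}"


def fetch(url, timeout=5):
    """GET the URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Needs an active Telepresence session, so it is left commented out:
# print(fetch(service_url("client", "elasticsearch", 9200)))
print(service_url("client", "elasticsearch", 9200))
```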

The result first

InvalidReplicaSetConfig: Our replica set config is invalid or we are not a member of it

Sometimes the Kubernetes cluster restarts, and then this problem appears. Just reconfigure the replica set with the same config and force it.

Solution

rs.reconfig(rs.config(),{force:true})
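
The same step could be scripted from Python. This is a hedged sketch, not a tested recovery procedure: it assumes `client` behaves like a `pymongo.MongoClient` connected directly to the affected member, and that the `replSetGetConfig` command still returns the stored config there, as `rs.config()` does in the shell. `force_reconfig` is a hypothetical helper name.

```python
def force_reconfig(client):
    """Re-apply the stored replica set config with force=True.

    `client` is expected to behave like a pymongo.MongoClient that is
    connected directly to the affected member (assumption of this sketch).
    """
    config = client.admin.command("replSetGetConfig")["config"]
    # The mongo shell helper rs.reconfig() also bumps the version first.
    config["version"] += 1
    return client.admin.command("replSetReconfig", config, force=True)
```

The helper takes the client as an argument, so it can be exercised against a stub before it is pointed at a real member.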

Problems

InvalidReplicaSetConfig

configReplSet:OTHER> rs.status()
{
	"operationTime" : Timestamp(1558623903, 1),
	"ok" : 0,
	"errmsg" : "Our replica set config is invalid or we are not a member of it",
	"code" : 93,
	"codeName" : "InvalidReplicaSetConfig",
	"$gleStats" : {
		"lastOpTime" : Timestamp(0, 0),
		"electionId" : ObjectId("000000000000000000000000")
	},
	"lastCommittedOpTime" : Timestamp(0, 0),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1558623903, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

AlreadyInitialized

configReplSet:OTHER> rs.initiate(
...   {
...     _id: "configReplSet",
...     configsvr: true,
...     members: [
...       { _id : 0, host : "mongo-configsvr-0.mongo-configsvr:27019" },
...       { _id : 1, host : "mongo-configsvr-1.mongo-configsvr:27019" },
...       { _id : 2, host : "mongo-configsvr-2.mongo-configsvr:27019" }
...     ]
...   }
... )
{
	"operationTime" : Timestamp(1558612647, 1),
	"ok" : 0,
	"errmsg" : "already initialized",
	"code" : 23,
	"codeName" : "AlreadyInitialized",
	"$gleStats" : {
		"lastOpTime" : Timestamp(0, 0),
		"electionId" : ObjectId("000000000000000000000000")
	},
	"lastCommittedOpTime" : Timestamp(0, 0),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1558623903, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

2019-05-16

Grafana Troubleshooting

grafana Alert validation error: Data source used by alert rule not found, alertName=Index Size alert, datasource="$server"

Queries used by an alert rule cannot use template variables; otherwise the alert does not work correctly.

https://github.com/grafana/grafana/issues/9334
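
An exported dashboard can be checked for this before it is imported. The sketch below is a heuristic under stated assumptions: it expects top-level `panels`, each with optional `alert`, `datasource`, and `targets` keys, and it treats any `$` in those fields as a template variable, so a literal `$` in a query (for example a regex anchor) is a false positive. `panels_with_variable_alerts` and the sample data are hypothetical.

```python
def _contains_variable(value):
    """True if any string nested inside `value` contains a `$`."""
    if isinstance(value, str):
        return "$" in value
    if isinstance(value, dict):
        return any(_contains_variable(v) for v in value.values())
    if isinstance(value, list):
        return any(_contains_variable(v) for v in value)
    return False


def panels_with_variable_alerts(dashboard):
    """Return titles of panels that define an alert while their
    datasource or query targets reference a template variable.

    Only top-level panels are inspected (assumed layout).
    """
    flagged = []
    for panel in dashboard.get("panels", []):
        if "alert" not in panel:
            continue
        fields = [panel.get("datasource"), panel.get("targets", [])]
        if _contains_variable(fields):
            flagged.append(panel.get("title", "<untitled>"))
    return flagged


# Made-up dashboard fragment for illustration.
sample_dashboard = {
    "panels": [
        {"title": "Index Size", "alert": {"name": "Index Size alert"},
         "datasource": "$server", "targets": []},
        {"title": "Heap", "alert": {"name": "Heap alert"},
         "datasource": "prod", "targets": [{"expr": "up"}]},
    ]
}
print(panels_with_variable_alerts(sample_dashboard))
```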

Provisioning

https://grafana.com/docs/administration/provisioning/

Cannot use provisioning/dashboards/dashboard.yaml

No solution. After changing a dashboard and clicking save, the only option is to export it.

logger=alerting.notifier error="open : no such file or directory

… Todo

Importing dashboard.json does not automatically create alerts

You have to click save again before they take effect.

2019-05-13

Elasticsearch Troubleshooting

master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster

The node cannot join the master nodes correctly.

The two env variables must be set together, otherwise this error appears.

- name: discovery.seed_hosts
  value: "elasticsearch-master"
- name: cluster.initial_master_nodes
  value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2"

Ref: https://discuss.elastic.co/t/master-not-discovered-yet-this-node-has-not-previously-joined-a-bootstrapped-v7-cluster/176304/2
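
The node list is easy to get wrong by hand when the replica count changes. As a small sketch, the two values can be derived from the StatefulSet name and replica count, relying on the fact that StatefulSet pods are named `<name>-0` through `<name>-(replicas-1)`. `master_env` is a hypothetical helper, and it assumes the seed host is a Service with the same name as the StatefulSet, as in the values above.

```python
def master_env(statefulset, replicas):
    """Build the two discovery settings for an Elasticsearch 7 StatefulSet.

    Assumes the seed host is a Service named like the StatefulSet.
    """
    nodes = ",".join(f"{statefulset}-{i}" for i in range(replicas))
    return [
        {"name": "discovery.seed_hosts", "value": statefulset},
        {"name": "cluster.initial_master_nodes", "value": nodes},
    ]


for entry in master_env("elasticsearch-master", 3):
    print(entry)
```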

Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid than local cluster uuid rejecting

Add the new master nodes before removing the old ones: master nodes store the cluster metadata, which has to be replicated to the new master nodes first.

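Pods are not deleted automatically during CrashLoopBackOff

When a mistake in the yaml makes the container crash and the Pod ends up in CrashLoopBackOff, Kubernetes does not remove the old Pod automatically, even after a corrected yaml is applied. The only way out is kubectl delete pod podname.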

Kind: StatefulSet
Replicas: 1
kubectl:
  Client: v1.14.1
  Server: v1.12.7-gke.10
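
Finding the stuck Pods can be scripted. The sketch below reads the parsed output of `kubectl get pods -o json` and prints a delete command for every Pod that has a container waiting with reason `CrashLoopBackOff`. `crashlooping_pods` is a hypothetical helper and the sample data is made up; the commands are printed, not executed.

```python
import json


def crashlooping_pods(pod_list):
    """Return names of pods with a container waiting in CrashLoopBackOff.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                names.append(pod["metadata"]["name"])
                break
    return names


# Trimmed, made-up sample of `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "es-master-0"},
   "status": {"containerStatuses": [
     {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
  {"metadata": {"name": "es-master-1"},
   "status": {"containerStatuses": [{"state": {"running": {}}}]}}
]}
""")

for name in crashlooping_pods(sample):
    print(f"kubectl delete pod {name}")
```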

Rammus

A Taiwan Developer