Aaron's Dev Path

gitGraph:
  commit id:"Graduate From High School" tag:"Linfen, China"
  commit id:"Got Driver Licence" tag:"2013.08"
  branch TYUT
  commit id:"Enrollment TYUT 🥰"  tag:"Taiyuan, China"
  commit id:"Develop Game App" tag:"“Hello Hell”" type: HIGHLIGHT
  commit id:"Plan:3+1" tag:"2016.09"
  branch Briup.Ltd
  commit id:"First Internship" tag:"Suzhou, China"
  commit id:"CRUD boy" 
  commit id:"Dimission" tag:"2017.01" type:REVERSE
  checkout TYUT
  merge Briup.Ltd id:"Final Presentation" tag:"2017.04"
  checkout Briup.Ltd
  branch Enjoyor.PLC
  commit id:"Second Internship" tag:"Hangzhou,China"
  checkout TYUT
  merge Enjoyor.PLC id:"Got SE Bachelor Degree " tag:"2017.07"
  checkout Enjoyor.PLC
  commit id:"First Full Time Job" tag:"2017.07"
  commit id:"Dimssion" tag:"2018.04"
  checkout main
  merge Enjoyor.PLC id:"Plan To Study Aboard"
  commit id:"Get Some Rest" tag:"2018.06"
  branch TOEFL-GRE
  commit id:"Learning At Huahua.Ltd" tag:"Beijing,China"
  commit id:"Got USC Admission" tag:"2018.11" type: HIGHLIGHT
  checkout main
  merge TOEFL-GRE id:"Prepare To Leave" tag:"2018.12"
  branch USC
  commit id:"Pass Pre-School" tag:"Los Angeles,USA"
  checkout main
  merge USC id:"Back Home,Summer Break" tag:"2019.06"
  commit id:"Back School" tag:"2019.07"
  checkout USC
  merge main id:"Got Straight As"
  commit id:"Leaning ML, DL, GPT"
  checkout main
  merge USC id:"Back,Due to COVID-19" tag:"2021.02"
  checkout USC
  commit id:"Got DS Master Degree" tag:"2021.05"
  checkout main
  commit id:"Got An offer" tag:"2021.06"
  branch Zhejianglab
  commit id:"Second Full Time" tag:"Hangzhou,China"
  commit id:"Got Promotion" tag:"2024.01"
  commit id:"For Now"

Subsections of Aaron's Dev Path

🐙Argo (CI/CD)

Content

CheatSheets

  • decode password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
  • relogin
ARGOCD_PASS=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
MASTER_IP=$(kubectl get nodes --selector=node-role.kubernetes.io/control-plane -o jsonpath='{$.items[0].status.addresses[?(@.type=="InternalIP")].address}')
argocd login --insecure --username admin $MASTER_IP:30443 --password $ARGOCD_PASS
  • force delete
argocd app terminate-op <$APP_NAME>
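To find the application name to pass in, you can list the applications and their status first (standard Argo CD CLI command):

argocd app list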

Subsections of 🐙Argo (CI/CD)

Subsections of Argo WorkFlow

Subsections of Template

DAG Template


apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-diamond-
spec:
  entrypoint: entry
  serviceAccountName: argo-workflow
  templates:
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  - name: entry
    dag:
      tasks:
      - name: start
        template: echo
        arguments:
            parameters: [{name: message, value: DAG initialized}]
      - name: diamond
        template: diamond
        dependencies: [start]
  - name: diamond
    dag:
      tasks:
      - name: A
        template: echo
        arguments:
          parameters: [{name: message, value: A}]
      - name: B
        dependencies: [A]
        template: echo
        arguments:
          parameters: [{name: message, value: B}]
      - name: C
        dependencies: [A]
        template: echo
        arguments:
          parameters: [{name: message, value: C}]
      - name: D
        dependencies: [B, C]
        template: echo
        arguments:
          parameters: [{name: message, value: D}]
      - name: end
        dependencies: [D]
        template: echo
        arguments:
          parameters: [{name: message, value: end}]
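To try this template, one option is to submit it with the Argo Workflows CLI (a sketch, assuming the CLI is installed, the manifest above is saved as dag-diamond.yaml, and the workflows run in the argo namespace):

# submit the workflow and watch the DAG run to completion
argo submit -n argo --watch dag-diamond.yaml
# list recent workflows in the namespace
argo list -n argo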

Subsections of 📃Articles

Subsections of BuckUp

ES [Local Disk]

Preliminary

  • ElasticSearch has been installed; if not, check this link

  • The elasticsearch.yml has path.repo configured, which should be set to the same value as settings.location (this is handled by the helm chart, don't worry)

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: elastic-search
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: elasticsearch
        targetRevision: 19.11.3
        helm:
          releaseName: elastic-search
          values: |
            global:
              kibanaEnabled: true
            clusterName: elastic
            image:
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            security:
              enabled: false
            service:
              type: ClusterIP
            extraConfig:
              path:
                repo: /tmp
            ingress:
              enabled: true
              annotations:
                cert-manager.io/cluster-issuer: self-signed-ca-issuer
                nginx.ingress.kubernetes.io/rewrite-target: /$1
              hostname: elastic-search.dev.tech
              ingressClassName: nginx
              path: /?(.*)
              tls: true
            master:
              masterOnly: false
              replicaCount: 1
              persistence:
                enabled: false
              resources:
                requests:
                  cpu: 2
                  memory: 1024Mi
                limits:
                  cpu: 4
                  memory: 4096Mi
              heapSize: 2g
            data:
              replicaCount: 0
              persistence:
                enabled: false
            coordinating:
              replicaCount: 0
            ingest:
              enabled: true
              replicaCount: 0
              service:
                enabled: false
                type: ClusterIP
              ingress:
                enabled: false
            metrics:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            volumePermissions:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            sysctlImage:
              enabled: true
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            kibana:
              elasticsearch:
                hosts:
                  - '{{ include "elasticsearch.service.name" . }}'
                port: '{{ include "elasticsearch.service.ports.restAPI" . }}'
            esJavaOpts: "-Xmx2g -Xms2g"        
      destination:
        server: https://kubernetes.default.svc
        namespace: application

    diff from original file:

    extraConfig:
        path:
          repo: /tmp

Methods

There are two ways to back up Elasticsearch:

  1. Export the data to text files, e.g. using tools such as elasticdump or esm to dump the data stored in Elasticsearch into files.
  2. Use the snapshot API to take snapshots, which supports incremental backups.

The first approach is simple and practical when the data volume is small, but for large data volumes the snapshot API is recommended.
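For the first approach, a minimal sketch using elasticdump (assuming it is installed, e.g. via npm, and that a books index exists; the host and file names are examples):

# dump the mapping and then the documents of the books index into local JSON files
elasticdump --input=https://elastic-search.dev.tech:32443/books --output=books_mapping.json --type=mapping
elasticdump --input=https://elastic-search.dev.tech:32443/books --output=books_data.json --type=data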

Steps

backup


  1. Create a snapshot repository -> my_fs_repository
curl -k -X PUT "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/tmp"
  }
}
'

You can also use a storage class to mount a path into the pod and store the snapshot files on that mounted volume.

  2. Verify that every node in the cluster can use this snapshot repository
curl -k -X POST "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository/_verify?pretty"
  3. List all snapshot repositories
curl -k -X GET "https://elastic-search.dev.tech:32443/_snapshot/_all?pretty"
  4. View the settings of a specific snapshot repository
curl -k -X GET "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository?pretty"
  5. Analyze a snapshot repository
curl -k -X POST "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository/_analyze?blob_count=10&max_blob_size=1mb&timeout=120s&pretty"
  6. Take a snapshot manually
curl -k -X PUT "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository/ay_snap_02?pretty"
  7. List the available snapshots in a given snapshot repository
curl -k -X GET "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository/*?verbose=false&pretty"
  8. Test the restore
# Delete an index
curl -k -X DELETE "https://elastic-search.dev.tech:32443/books?pretty"

# restore that index
curl -k -X POST "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository/ay_snap_02/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "books"
}
'

# query
curl -k -X GET "https://elastic-search.dev.tech:32443/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'
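Once a snapshot is no longer needed, it can be deleted from the repository (a sketch using the standard snapshot API):

# delete a single snapshot from the repository
curl -k -X DELETE "https://elastic-search.dev.tech:32443/_snapshot/my_fs_repository/ay_snap_02?pretty"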

ES [S3 Compatible]

Preliminary

  • ElasticSearch has been installed; if not, check this link

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: elastic-search
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: elasticsearch
        targetRevision: 19.11.3
        helm:
          releaseName: elastic-search
          values: |
            global:
              kibanaEnabled: true
            clusterName: elastic
            image:
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            security:
              enabled: true
            service:
              type: ClusterIP
            extraEnvVars:
            - name: S3_ACCESSKEY
              value: admin
            - name: S3_SECRETKEY
              value: ZrwpsezF1Lt85dxl
            extraConfig:
              s3:
                client:
                  default:
                    protocol: http
                    endpoint: "http://192.168.31.111:9090"
                    path_style_access: true
            initScripts:
              configure-s3-client.sh: |
                elasticsearch_set_key_value "s3.client.default.access_key" "${S3_ACCESSKEY}"
                elasticsearch_set_key_value "s3.client.default.secret_key" "${S3_SECRETKEY}"
            hostAliases:
            - ip: 192.168.31.111
              hostnames:
              - minio-api.dev.tech
            ingress:
              enabled: true
              annotations:
                cert-manager.io/cluster-issuer: self-signed-ca-issuer
                nginx.ingress.kubernetes.io/rewrite-target: /$1
              hostname: elastic-search.dev.tech
              ingressClassName: nginx
              path: /?(.*)
              tls: true
            master:
              masterOnly: false
              replicaCount: 1
              persistence:
                enabled: false
              resources:
                requests:
                  cpu: 2
                  memory: 1024Mi
                limits:
                  cpu: 4
                  memory: 4096Mi
              heapSize: 2g
            data:
              replicaCount: 0
              persistence:
                enabled: false
            coordinating:
              replicaCount: 0
            ingest:
              enabled: true
              replicaCount: 0
              service:
                enabled: false
                type: ClusterIP
              ingress:
                enabled: false
            metrics:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            volumePermissions:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            sysctlImage:
              enabled: true
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            kibana:
              elasticsearch:
                hosts:
                  - '{{ include "elasticsearch.service.name" . }}'
                port: '{{ include "elasticsearch.service.ports.restAPI" . }}'
            esJavaOpts: "-Xmx2g -Xms2g"        
      destination:
        server: https://kubernetes.default.svc
        namespace: application

    diff from original file:

    extraEnvVars:
    - name: S3_ACCESSKEY
      value: admin
    - name: S3_SECRETKEY
      value: ZrwpsezF1Lt85dxl
    extraConfig:
      s3:
        client:
          default:
            protocol: http
            endpoint: "http://192.168.31.111:9090"
            path_style_access: true
    initScripts:
      configure-s3-client.sh: |
        elasticsearch_set_key_value "s3.client.default.access_key" "${S3_ACCESSKEY}"
        elasticsearch_set_key_value "s3.client.default.secret_key" "${S3_SECRETKEY}"
    hostAliases:
    - ip: 192.168.31.111
      hostnames:
      - minio-api.dev.tech

Methods

There are two ways to back up Elasticsearch:

  1. Export the data to text files, e.g. using tools such as elasticdump or esm to dump the data stored in Elasticsearch into files.
  2. Use the snapshot API to take snapshots, which supports incremental backups.

The first approach is simple and practical when the data volume is small, but for large data volumes the snapshot API is recommended.

Steps

backup


  1. Create a snapshot repository -> my_s3_repository
curl -k -X PUT "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "local-test",
    "client": "default",
    "endpoint": "http://192.168.31.111:9000"
  }
}
'

You can also use a storage class to mount a path into the pod and store the snapshot files on that mounted volume.

  2. Verify that every node in the cluster can use this snapshot repository
curl -k -X POST "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository/_verify?pretty"
  3. List all snapshot repositories
curl -k -X GET "https://elastic-search.dev.tech:32443/_snapshot/_all?pretty"
  4. View the settings of a specific snapshot repository
curl -k -X GET "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository?pretty"
  5. Analyze a snapshot repository
curl -k -X POST "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository/_analyze?blob_count=10&max_blob_size=1mb&timeout=120s&pretty"
  6. Take a snapshot manually
curl -k -X PUT "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository/ay_s3_snap_02?pretty"
  7. List the available snapshots in a given snapshot repository
curl -k -X GET "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository/*?verbose=false&pretty"
  8. Test the restore
# Delete an index
curl -k -X DELETE "https://elastic-search.dev.tech:32443/books?pretty"

# restore that index
curl -k -X POST "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository/ay_s3_snap_02/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "books"
}
'

# query
curl -k -X GET "https://elastic-search.dev.tech:32443/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'
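To watch the progress of a snapshot that is still running, the _status endpoint can be used (a sketch based on the standard snapshot API):

# show state, shard counts and per-index progress of the snapshot
curl -k -X GET "https://elastic-search.dev.tech:32443/_snapshot/my_s3_repository/ay_s3_snap_02/_status?pretty"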

ES Auto BackUp

Preliminary

  • ElasticSearch has been installed; if not, check this link

  • We use local disk to save the snapshots; for more details check this link

  • And the security is enabled.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: elastic-search
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: elasticsearch
        targetRevision: 19.11.3
        helm:
          releaseName: elastic-search
          values: |
            global:
              kibanaEnabled: true
            clusterName: elastic
            image:
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            security:
              enabled: true
              tls:
                autoGenerated: true
            service:
              type: ClusterIP
            extraConfig:
              path:
                repo: /tmp
            ingress:
              enabled: true
              annotations:
                cert-manager.io/cluster-issuer: self-signed-ca-issuer
                nginx.ingress.kubernetes.io/rewrite-target: /$1
              hostname: elastic-search.dev.tech
              ingressClassName: nginx
              path: /?(.*)
              tls: true
            master:
              masterOnly: false
              replicaCount: 1
              persistence:
                enabled: false
              resources:
                requests:
                  cpu: 2
                  memory: 1024Mi
                limits:
                  cpu: 4
                  memory: 4096Mi
              heapSize: 2g
            data:
              replicaCount: 0
              persistence:
                enabled: false
            coordinating:
              replicaCount: 0
            ingest:
              enabled: true
              replicaCount: 0
              service:
                enabled: false
                type: ClusterIP
              ingress:
                enabled: false
            metrics:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            volumePermissions:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            sysctlImage:
              enabled: true
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            kibana:
              elasticsearch:
                hosts:
                  - '{{ include "elasticsearch.service.name" . }}'
                port: '{{ include "elasticsearch.service.ports.restAPI" . }}'
            esJavaOpts: "-Xmx2g -Xms2g"        
      destination:
        server: https://kubernetes.default.svc
        namespace: application

    diff from original file:

    security:
      enabled: true
    extraConfig:
        path:
          repo: /tmp

Methods

Steps

auto backup
  1. Create a snapshot repository -> slm_fs_repository
curl --user elastic:L9shjg6csBmPZgCZ -k -X PUT "https://10.88.0.143:30294/_snapshot/slm_fs_repository?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/tmp"
  }
}
'

You can also use a storage class to mount a path into the pod and store the snapshot files on that mounted volume.

  2. Verify that every node in the cluster can use this snapshot repository
curl --user elastic:L9shjg6csBmPZgCZ  -k -X POST "https://10.88.0.143:30294/_snapshot/slm_fs_repository/_verify?pretty"
  3. List all snapshot repositories
curl --user elastic:L9shjg6csBmPZgCZ  -k -X GET "https://10.88.0.143:30294/_snapshot/_all?pretty"
  4. View the settings of a specific snapshot repository
curl --user elastic:L9shjg6csBmPZgCZ  -k -X GET "https://10.88.0.143:30294/_snapshot/slm_fs_repository?pretty"
  5. Analyze a snapshot repository
curl --user elastic:L9shjg6csBmPZgCZ  -k -X POST "https://10.88.0.143:30294/_snapshot/slm_fs_repository/_analyze?blob_count=10&max_blob_size=1mb&timeout=120s&pretty"
  6. List the available snapshots in a given snapshot repository
curl --user elastic:L9shjg6csBmPZgCZ  -k -X GET "https://10.88.0.143:30294/_snapshot/slm_fs_repository/*?verbose=false&pretty"
  7. Create an SLM admin role
curl --user elastic:L9shjg6csBmPZgCZ -k -X POST "https://10.88.0.143:30294/_security/role/slm-admin?pretty" -H 'Content-Type: application/json' -d'
{
  "cluster": [ "manage_slm", "cluster:admin/snapshot/*" ],
  "indices": [
    {
      "names": [ ".slm-history-*" ],
      "privileges": [ "all" ]
    }
  ]
}
'
  8. Create the automatic backup policy (cron schedule)
curl --user elastic:L9shjg6csBmPZgCZ -k -X PUT "https://10.88.0.143:30294/_slm/policy/nightly-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 30 1 * * ?",       
  "name": "<nightly-snap-{now/d}>", 
  "repository": "slm_fs_repository",    
  "config": {
    "indices": "*",                 
    "include_global_state": true    
  },
  "retention": {                    
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
'
  9. Trigger the automatic backup
curl --user elastic:L9shjg6csBmPZgCZ -k -X POST "https://10.88.0.143:30294/_slm/policy/nightly-snapshots/_execute?pretty"
  10. View the SLM backup history
curl --user elastic:L9shjg6csBmPZgCZ -k -X GET "https://10.88.0.143:30294/_slm/stats?pretty"
  11. Test the restore
# Delete an index
curl --user elastic:L9shjg6csBmPZgCZ  -k -X DELETE "https://10.88.0.143:30294/books?pretty"

# restore that index
curl --user elastic:L9shjg6csBmPZgCZ  -k -X POST "https://10.88.0.143:30294/_snapshot/slm_fs_repository/my_snapshot_2099.05.06/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "books"
}
'

# query
curl --user elastic:L9shjg6csBmPZgCZ  -k -X GET "https://10.88.0.143:30294/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'
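You can also inspect the policy itself to confirm the schedule and see the last success and next execution time (a sketch using the standard SLM API; output fields may vary by ES version):

curl --user elastic:L9shjg6csBmPZgCZ -k -X GET "https://10.88.0.143:30294/_slm/policy/nightly-snapshots?human&pretty"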

Subsections of Cheat Sheet

Subsections of Aliyun Related

OSSutil

Aliyun's counterpart of MinIO (https://min.io/)

download ossutil

First, you need to download ossutil.

OS:
curl https://gosspublic.alicdn.com/ossutil/install.sh  | sudo bash
curl -o ossutil-v1.7.19-windows-386.zip https://gosspublic.alicdn.com/ossutil/1.7.19/ossutil-v1.7.19-windows-386.zip

config ossutil

./ossutil config
Params          | Description                                             | Instruction
endpoint        | the Endpoint of the region where the Bucket is located |
accessKeyID     | OSS AccessKey                                           | get from user info panel
accessKeySecret | OSS AccessKeySecret                                     | get from user info panel
stsToken        | token for sts service                                   | could be empty
Info

and you can also modify /home/<$user>/.ossutilconfig file directly to change the configuration.
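For reference, a minimal sketch of writing that config file directly (the keys mirror the ossutil config prompts; the endpoint and credentials below are placeholders, adjust them to your account):

cat > ~/.ossutilconfig <<'EOF'
[Credentials]
language=EN
endpoint=oss-cn-hangzhou.aliyuncs.com
accessKeyID=<$ACCESS_KEY_ID>
accessKeySecret=<$ACCESS_KEY_SECRET>
EOF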

list files

ossutil ls oss://<$PATH>
ossutil ls oss://csst-data/CSST-20240312/dfs/

download file/dir

you can use cp to download or upload file

ossutil cp -r oss://<$PATH> <$OTHER_PATH>
ossutil cp -r oss://csst-data/CSST-20240312/dfs/ /data/nfs/data/pvc...

upload file/dir

ossutil cp -r <$SOURCE_PATH> oss://<$PATH>
ossutil cp -r /data/nfs/data/pvc/a.txt  oss://csst-data/CSST-20240312/dfs/b.txt

ECS DNS

ZJADC (Aliyun Directed Cloud)

Append content in /etc/resolv.conf

options timeout:2 attempts:3 rotate
nameserver 10.255.9.2
nameserver 10.200.12.5

And then you probably need to modify yum.repo.d as well, check link


YQGCY (Aliyun Directed Cloud)

Append content in /etc/resolv.conf

nameserver 172.27.205.79

And then restart kube-system.coredns-xxxx


Google DNS

nameserver 8.8.8.8
nameserver 4.4.4.4
nameserver 223.5.5.5
nameserver 223.6.6.6

Restart DNS

OS:
vim /etc/NetworkManager/NetworkManager.conf

add "dns=none" under '[main]' part

systemctl restart NetworkManager

Modify ifcfg-ethX [Optional]

if you cannot get ipv4 address, you can try to modify ifcfg-ethX

vim /etc/sysconfig/network-scripts/ifcfg-ens33

set ONBOOT=yes

OS Mirrors

Fedora

  • Fedora 40 located in /etc/yum.repos.d/
    [updates]
    name=Fedora $releasever - $basearch - Updates
    #baseurl=http://download.example/pub/fedora/linux/updates/$releasever/Everything/$basearch/
    metalink=https://mirrors.fedoraproject.org/metalink?repo=updates-released-f$releasever&arch=$basearch
    enabled=1
    countme=1
    repo_gpgcheck=0
    type=rpm
    gpgcheck=1
    metadata_expire=6h
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$releasever-$basearch
    skip_if_unavailable=False
    
    [updates-debuginfo]
    name=Fedora $releasever - $basearch - Updates - Debug
    #baseurl=http://download.example/pub/fedora/linux/updates/$releasever/Everything/$basearch/debug/
    metalink=https://mirrors.fedoraproject.org/metalink?repo=updates-released-debug-f$releasever&arch=$basearch
    enabled=0
    repo_gpgcheck=0
    type=rpm
    gpgcheck=1
    metadata_expire=6h
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$releasever-$basearch
    skip_if_unavailable=False
    
    [updates-source]
    name=Fedora $releasever - Updates Source
    #baseurl=http://download.example/pub/fedora/linux/updates/$releasever/Everything/SRPMS/
    metalink=https://mirrors.fedoraproject.org/metalink?repo=updates-released-source-f$releasever&arch=$basearch
    enabled=0
    repo_gpgcheck=0
    type=rpm
    gpgcheck=1
    metadata_expire=6h
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$releasever-$basearch
    skip_if_unavailable=False

CentOS

  • CentOS 7 located in /etc/yum.repos.d/

    [base]
    name=CentOS-$releasever
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
    baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
    gpgcheck=1
    gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-7
    
    [extras]
    name=CentOS-$releasever
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
    baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
    gpgcheck=1
    gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-7
    [base]
    name=CentOS-$releasever - Base - mirrors.aliyun.com
    failovermethod=priority
    baseurl=http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
    gpgcheck=1
    gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
    
    [extras]
    name=CentOS-$releasever - Extras - mirrors.aliyun.com
    failovermethod=priority
    baseurl=http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
    gpgcheck=1
    gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
    [base]
    name=CentOS-$releasever - Base - 163.com
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
    baseurl=http://mirrors.163.com/centos/$releasever/os/$basearch/
    gpgcheck=1
    gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-7
    
    [extras]
    name=CentOS-$releasever - Extras - 163.com
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
    baseurl=http://mirrors.163.com/centos/$releasever/extras/$basearch/
    gpgcheck=1
    gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-7

  • CentOS 8 stream located in /etc/yum.repos.d/

    [baseos]
    name=CentOS Linux - BaseOS
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=BaseOS&infra=$infra
    baseurl=http://mirror.centos.org/centos/8-stream/BaseOS/$basearch/os/
    gpgcheck=1
    enabled=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
    
    [extras]
    name=CentOS Linux - Extras
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
    baseurl=http://mirror.centos.org/centos/8-stream/extras/$basearch/os/
    gpgcheck=1
    enabled=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
    
    [appstream]
    name=CentOS Linux - AppStream
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=AppStream&infra=$infra
    baseurl=http://mirror.centos.org/centos/8-stream/AppStream/$basearch/os/
    gpgcheck=1
    enabled=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
    [base]
    name=CentOS-8.5.2111 - Base - mirrors.aliyun.com
    baseurl=http://mirrors.aliyun.com/centos-vault/8.5.2111/BaseOS/$basearch/os/
    gpgcheck=0
    gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-Official
    
    [extras]
    name=CentOS-8.5.2111 - Extras - mirrors.aliyun.com
    baseurl=http://mirrors.aliyun.com/centos-vault/8.5.2111/extras/$basearch/os/
    gpgcheck=0
    gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-Official
    
    [AppStream]
    name=CentOS-8.5.2111 - AppStream - mirrors.aliyun.com
    baseurl=http://mirrors.aliyun.com/centos-vault/8.5.2111/AppStream/$basearch/os/
    gpgcheck=0
    gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-Official

Ubuntu

  • Ubuntu 18.04 located in /etc/apt/sources.list

    deb http://archive.ubuntu.com/ubuntu/ bionic main restricted
    deb http://archive.ubuntu.com/ubuntu/ bionic-updates main restricted
    deb http://archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
    deb http://security.ubuntu.com/ubuntu/ bionic-security main restricted

  • Ubuntu 20.04 located in /etc/apt/sources.list

    deb http://archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu/ focal-updates main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu/ focal-backports main restricted universe multiverse
    deb http://security.ubuntu.com/ubuntu/ focal-security main restricted

  • Ubuntu 22.04 located in /etc/apt/sources.list

    deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
    deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
    deb http://archive.ubuntu.com/ubuntu/ jammy-backports main restricted universe multiverse
    deb http://security.ubuntu.com/ubuntu/ jammy-security main restricted

Debian

  • Debian Buster located in /etc/apt/sources.list

    deb http://deb.debian.org/debian buster main
    deb http://security.debian.org/debian-security buster/updates main
    deb http://deb.debian.org/debian buster-updates main
    deb http://mirrors.aliyun.com/debian/ buster main non-free contrib
    deb http://mirrors.aliyun.com/debian-security buster/updates main
    deb http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib
    deb http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib
    deb http://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free
    deb http://mirrors.tuna.tsinghua.edu.cn/debian/ buster-updates main contrib non-free
    deb http://mirrors.tuna.tsinghua.edu.cn/debian/ buster-backports main contrib non-free
    deb http://security.debian.org/debian-security buster/updates main contrib non-free

  • Debian Bullseye located in /etc/apt/sources.list

    deb http://deb.debian.org/debian bullseye main
    deb http://security.debian.org/debian-security bullseye-security main
    deb http://deb.debian.org/debian bullseye-updates main
    deb http://mirrors.aliyun.com/debian/ bullseye main non-free contrib
    deb http://mirrors.aliyun.com/debian-security/ bullseye-security main
    deb http://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib
    deb http://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib
    deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main contrib non-free
    deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main contrib non-free
    deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-backports main contrib non-free
    deb http://security.debian.org/debian-security bullseye-security main contrib non-free

Anolis

  • Anolis 3 located in /etc/yum.repos.d/

    [alinux3-module]
    name=alinux3-module
    baseurl=http://mirrors.aliyun.com/alinux/3/module/$basearch/
    gpgkey=http://mirrors.aliyun.com/alinux/3/RPM-GPG-KEY-ALINUX-3
    enabled=1
    gpgcheck=1
    
    [alinux3-os]
    name=alinux3-os
    baseurl=http://mirrors.aliyun.com/alinux/3/os/$basearch/
    gpgkey=http://mirrors.aliyun.com/alinux/3/RPM-GPG-KEY-ALINUX-3
    enabled=1
    gpgcheck=1
    
    [alinux3-plus]
    name=alinux3-plus
    baseurl=http://mirrors.aliyun.com/alinux/3/plus/$basearch/
    gpgkey=http://mirrors.aliyun.com/alinux/3/RPM-GPG-KEY-ALINUX-3
    enabled=1
    gpgcheck=1
    
    [alinux3-powertools]
    name=alinux3-powertools
    baseurl=http://mirrors.aliyun.com/alinux/3/powertools/$basearch/
    gpgkey=http://mirrors.aliyun.com/alinux/3/RPM-GPG-KEY-ALINUX-3
    enabled=1
    gpgcheck=1
    
    [alinux3-updates]
    name=alinux3-updates
    baseurl=http://mirrors.aliyun.com/alinux/3/updates/$basearch/
    gpgkey=http://mirrors.aliyun.com/alinux/3/RPM-GPG-KEY-ALINUX-3
    enabled=1
    gpgcheck=1
    
    [epel]
    name=Extra Packages for Enterprise Linux 8 - $basearch
    baseurl=http://mirrors.aliyun.com/epel/8/Everything/$basearch
    failovermethod=priority
    enabled=1
    gpgcheck=1
    gpgkey=http://mirrors.aliyun.com/epel/RPM-GPG-KEY-EPEL-8
    
    [epel-module]
    name=Extra Packages for Enterprise Linux 8 - $basearch
    baseurl=http://mirrors.aliyun.com/epel/8/Modular/$basearch
    failovermethod=priority
    enabled=0
    gpgcheck=1
    gpgkey=http://mirrors.aliyun.com/epel/RPM-GPG-KEY-EPEL-8

  • Anolis 2 located in /etc/yum.repos.d/


Refresh Repo

OS:
dnf clean all && dnf makecache
yum clean all && yum makecache
apt-get clean all

Subsections of App Related

Mirrors [Aliyun, Tsinghua]

Gradle Tencent Mirror

https://mirrors.cloud.tencent.com/gradle/gradle-8.0-bin.zip

PIP Tuna Mirror -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
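To make the Tuna index the default so you don't have to pass -i every time (a sketch using pip's standard config command):

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple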

Maven Mirror

<mirror>
    <id>aliyunmaven</id>
    <mirrorOf>*</mirrorOf>
    <name>阿里云公共仓库</name>
    <url>https://maven.aliyun.com/repository/public</url>
</mirror>

Subsections of Git Related

Not Allow Push

Cannot push to your own branch


  1. Edit .git/config file under your repo directory.

  2. Find the url= entry under the [remote "origin"] section.

  3. Change it from:

    url=https://gitlab.com/AaronYang2333/ska-src-dm-local-data-preparer.git/

    to:

    url=ssh://git@gitlab.com/AaronYang2333/ska-src-dm-local-data-preparer.git

    (alternatively, see the git remote set-url sketch after this list)

  4. try push again
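Alternatively, instead of editing .git/config by hand, the same switch to SSH can be done with git remote set-url (a minimal sketch using the repository URL above):

git remote set-url origin ssh://git@gitlab.com/AaronYang2333/ska-src-dm-local-data-preparer.git
git remote -v   # confirm that origin now points at the ssh URL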

Subsections of Linux Related

Disable Service

Disable the firewall, selinux, dnsmasq, and swap services

systemctl disable --now firewalld 
systemctl disable --now dnsmasq
systemctl disable --now NetworkManager

setenforce 0
sed -i 's#SELINUX=permissive#SELINUX=disabled#g' /etc/sysconfig/selinux
sed -i 's#SELINUX=permissive#SELINUX=disabled#g' /etc/selinux/config
reboot
getenforce


swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab

Example Shell Script

Init ES Backup Setting

Create an ES backup setting in S3, and take a snapshot after creation.

#!/bin/bash
ES_HOST="http://192.168.58.2:30910"
ES_BACKUP_REPO_NAME="s3_fs_repository"
S3_CLIENT="default"
ES_BACKUP_BUCKET_IN_S3="es-snapshot"
ES_SNAPSHOT_TAG="auto"

CHECK_RESPONSE=$(curl -s -k -X POST "$ES_HOST/_snapshot/$ES_BACKUP_REPO_NAME/_verify?pretty" )
CHECKED_NODES=$(echo "$CHECK_RESPONSE" | jq -r '.nodes')


if [ "$CHECKED_NODES" == null ]; then
  echo "Doesn't exist an ES backup setting..."
  echo "A default backup setting will be generated. (using '$S3_CLIENT' s3 client and all backup files will be saved in a bucket : '$ES_BACKUP_BUCKET_IN_S3'"

  CREATE_RESPONSE=$(curl -s -k -X PUT "$ES_HOST/_snapshot/$ES_BACKUP_REPO_NAME?pretty" -H 'Content-Type: application/json' -d "{\"type\":\"s3\",\"settings\":{\"bucket\":\"$ES_BACKUP_BUCKET_IN_S3\",\"client\":\"$S3_CLIENT\"}}")
  CREATE_ACKNOWLEDGED_FLAG=$(echo "$CREATE_RESPONSE" | jq -r '.acknowledged')

  if [ "$CREATE_ACKNOWLEDGED_FLAG" == true ]; then
    echo "Buckup setting '$ES_BACKUP_REPO_NAME' has been created successfully!"
  else
    echo "Failed to create backup setting '$ES_BACKUP_REPO_NAME', since $$CREATE_RESPONSE"
  fi
else
  echo "Already exist an ES backup setting '$ES_BACKUP_REPO_NAME'"
fi

CHECK_RESPONSE=$(curl -s -k -X POST "$ES_HOST/_snapshot/$ES_BACKUP_REPO_NAME/_verify?pretty" )
CHECKED_NODES=$(echo "$CHECK_RESPONSE" | jq -r '.nodes')

if [ "$CHECKED_NODES" != null ]; then
  SNAPSHOT_NAME="meta-data-$ES_SNAPSHOT_TAG-snapshot-$(date +%s)"
  SNAPSHOT_CREATION=$(curl -s -k -X PUT "$ES_HOST/_snapshot/$ES_BACKUP_REPO_NAME/$SNAPSHOT_NAME")
  echo "Snapshot $SNAPSHOT_NAME has been created."
else
  echo "Failed to create snapshot $SNAPSHOT_NAME ."
fi

Login Without Pwd

copy id_rsa to other nodes

yum install sshpass -y
mkdir -p /extend/shell

cat >>/extend/shell/fenfa_pub.sh<< EOF
#!/bin/bash
ROOT_PASS=root123
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
for ip in 101 102 103 
do
sshpass -p\$ROOT_PASS ssh-copy-id -o StrictHostKeyChecking=no 192.168.29.\$ip
done
EOF

cd /extend/shell
chmod +x fenfa_pub.sh

./fenfa_pub.sh

Set Http Proxy

set http proxy

export https_proxy=http://localhost:20171
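A slightly fuller sketch covering both protocols, the usual exclusions, and how to undo it (the proxy address is an example):

export http_proxy=http://localhost:20171
export https_proxy=http://localhost:20171
export no_proxy=localhost,127.0.0.1
# remove the proxy settings from the current shell
unset http_proxy https_proxy no_proxy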

Subsections of Storage Related

User Based Policy


you can change <$bucket> to control the permission

App:
  • ${aws:username} is a built-in variable, indicating the logged-in user name.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUserToSeeBucketListInTheConsole",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid": "AllowRootAndHomeListingOfCompanyBucket",
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<$bucket>"
            ],
            "Condition": {
                "StringEquals": {
                    "s3:prefix": [
                        "",
                        "<$path>/",
                        "<$path>/${aws:username}"
                    ],
                    "s3:delimiter": [
                        "/"
                    ]
                }
            }
        },
        {
            "Sid": "AllowListingOfUserFolder",
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<$bucket>"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "<$path>/${aws:username}/*"
                    ]
                }
            }
        },
        {
            "Sid": "AllowAllS3ActionsInUserFolder",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::<$bucket>/<$path>/${aws:username}/*"
            ]
        }
    ]
}
  • <$uid> is Aliyun UID
{
    "Version": "1",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "oss:*"
        ],
        "Principal": [
            "<$uid>"
        ],
        "Resource": [
            "acs:oss:*:<$oss_id>:<$bucket>/<$path>/*"
        ]
    }, {
        "Effect": "Allow",
        "Action": [
            "oss:ListObjects",
            "oss:GetObject"
        ],
        "Principal": [
             "<$uid>"
        ],
        "Resource": [
            "acs:oss:*:<$oss_id>:<$bucket>"
        ],
        "Condition": {
            "StringLike": {
            "oss:Prefix": [
                    "<$path>/*"
                ]
            }
        }
    }]
}
Example:
{
	"Version": "1",
	"Statement": [{
		"Effect": "Allow",
		"Action": [
			"oss:*"
		],
		"Principal": [
			"203415213249511533"
		],
		"Resource": [
			"acs:oss:*:1007296819402486:conti-csst/test/*"
		]
	}, {
		"Effect": "Allow",
		"Action": [
			"oss:ListObjects",
			"oss:GetObject"
		],
		"Principal": [
			"203415213249511533"
		],
		"Resource": [
			"acs:oss:*:1007296819402486:conti-csst"
		],
		"Condition": {
			"StringLike": {
				"oss:Prefix": [
					"test/*"
				]
			}
		}
	}]
}

Subsections of Command

Git CMD

Init global config

git config --list
git config --global user.name "AaronYang"
git config --global user.email aaron19940628@gmail.com
git config --global pager.branch false
git config --global pull.ff only
git --no-pager diff

change user and email (locally)

git config user.name ""
git config user.email ""

list all remote repo

git remote -v
git remote set-url origin git@github.com:<$user>/<$repo>.git

Get specific file from remote

git archive --remote=git@github.com:<$user>/<$repo>.git <$branch>:<$source_file_path> -o <$target_source_path>
git archive --remote=git@github.com:AaronYang2333/LOL_Overlay_Assistant_Tool.git master:paper/2003.11755.pdf -o a.pdf

Clone specific branch

git clone -b slurm-23.02 --single-branch --depth=1 https://github.com/SchedMD/slurm.git

Update submodule

git submodule add --depth 1 https://github.com/xxx/xxxx a/b/c

git submodule update --init --recursive

Save credential

login first and then execute this

git config --global credential.helper store
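If you prefer not to keep credentials in plain text on disk, a common alternative is the cache helper, which keeps them in memory for a limited time (a sketch; the timeout value is an example):

git config --global credential.helper 'cache --timeout=3600'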

Delete Branch

  • Deleting a remote branch
    git push origin --delete <branch>  # Git version 1.7.0 or newer
    git push origin -d <branch>        # Shorter version (Git 1.7.0 or newer)
    git push origin :<branch>          # Git versions older than 1.7.0
  • Deleting a local branch
    git branch --delete <branch>
    git branch -d <branch> # Shorter version
    git branch -D <branch> # Force-delete un-merged branches

Prune remote branches

git remote prune origin

Update remote repo

git remote set-url origin http://xxxxx.git

Linux

useradd

sudo useradd <$name> -m -r -s /bin/bash -p <$password>
echo '<$name> ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers

telnet

a command line interface for communication with a remote device or server

telnet <$ip> <$port>
telnet 172.27.253.50 9000 #test application connectivity

lsof (list open files)

everything is a file

lsof <$option:value>

-a AND the selection conditions together (instead of ORing them)

-c <process_name> List files opened by the specified process

-g List files for processes in the specified process group ID

-d <file_descriptor> List processes using the specified file descriptor number

+d List open files in a directory

+D Recursively list open files in a directory

-N List NFS files

-i List eligible processes. (protocol, :port, @ip)

-p List files opened by the specified process ID

-u List files opened by the specified user ID

lsof -i:30443 # find port 30443 
lsof -i -P -n # list all connections

awk (Aho, Weinberger, and Kernighan [Names])

awk is a scripting language used for manipulating data and generating reports.

# awk [params] 'script' 
awk <$params> <$string_content>

filter values bigger than 3

echo -e "1\n2\n3\n4\n5\n" | awk '$1>3'


ss (socket statistics)

view detailed information about your system’s network connections, including TCP/IP, UDP, and Unix domain sockets

ss [options]
Options | Description
-t      | Display TCP sockets
-l      | Display listening sockets
-n      | Show numerical addresses instead of resolving
-a      | Display all sockets (listening and non-listening)
#show all listening TCP connection
ss -tln
#show all established TCP connections
ss -tan

clean up files older than 3 days

find /aaa/bbb/ccc/*.gz -mtime +3 -exec rm {} \;

ssh without affecting $HOME/.ssh/known_hosts

ssh -o "UserKnownHostsFile /dev/null" root@aaa.domain.com
ssh -o "UserKnownHostsFile /dev/null" -o "StrictHostKeyChecking=no" root@aaa.domain.com

sync clock

[yum|dnf] install -y chrony \
    && systemctl enable chronyd \
    && (systemctl is-active chronyd || systemctl start chronyd) \
    && chronyc sources \
    && chronyc tracking \
    && timedatectl set-timezone 'Asia/Shanghai'

set hostname

hostnamectl set-hostname develop

add remote key to other server

ssh -o "UserKnownHostsFile /dev/null" \
    root@aaa.bbb.ccc \
    "mkdir -p /root/.ssh && chmod 700 /root/.ssh && echo '$SOME_PUBLIC_KEY' \
    >> /root/.ssh/authorized_keys && chmod 600 /root/.ssh/authorized_keys"
ssh -o "UserKnownHostsFile /dev/null" \
    root@17.27.253.67 \
    "mkdir -p /root/.ssh && chmod 700 /root/.ssh && echo 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC00JLKF/Cd//rJcdIVGCX3ePo89KAgEccvJe4TEHs5pI5FSxs/7/JfQKZ+by2puC3IT88bo/d7nStw9PR3BXgqFXaBCknNBpSLWBIuvfBF+bcL+jGnQYo2kPjrO+2186C5zKGuPRi9sxLI5AkamGB39L5SGqwe5bbKq2x/8OjUP25AlTd99XsNjEY2uxNVClHysExVad/ZAcl0UVzG5xmllusXCsZVz9HlPExqB6K1sfMYWvLVgSCChx6nUfgg/NZrn/kQG26X0WdtXVM2aXpbAtBioML4rWidsByDb131NqYpJF7f+x3+I5pQ66Qpc72FW1G4mUiWWiGhF9tL8V9o1AY96Rqz0AVaxAQrBEuyCWKrXbA97HeC3Xp57Luvlv9TqUd8CIJYq+QTL0hlIDrzK9rJsg34FRAvf9sh8K2w/T/gC9UnRjRXgkPUgKldq35Y6Z9wP6KY45gCXka1PU4nVqb6wicO+RHcZ5E4sreUwqfTypt5nTOgW2/p8iFhdN8= Administrator@AARON-X1-8TH' \
    >> /root/.ssh/authorized_keys && chmod 600 /root/.ssh/authorized_keys"

set -x

This will print each command to the standard error before executing it, which is useful for debugging scripts.

set -x

set -e

Exit immediately if a command exits with a non-zero status.

set -e
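These options are often combined at the top of a script; a common sketch:

#!/bin/bash
# exit on error or undefined variable, fail a pipeline if any stage fails, and trace each command
set -euxo pipefail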

sed (Stream Editor)

sed <$option> <$file_path>

replace unix -> linux

echo "linux is great os. unix is opensource. unix is free os." | sed 's/unix/linux/'

or you can check https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/
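By default sed replaces only the first match on each line; add the g flag to replace all matches, and -i to edit a file in place (the file name below is hypothetical):

echo "linux is great os. unix is opensource. unix is free os." | sed 's/unix/linux/g'
sed -i 's/unix/linux/g' notes.txt   # hypothetical file, edited in place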

fdisk

list all disk

fdisk -l

create XFS file system

Use the mkfs.xfs command to create an XFS file system with an internal log on the same disk; an example is shown below:

mkfs.xfs <$path>
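A small end-to-end sketch (the device /dev/sdb1 and the mount point are hypothetical; double-check the device first, since mkfs erases it):

mkfs.xfs /dev/sdb1
mkdir -p /data/xfs
mount /dev/sdb1 /data/xfs
df -hT /data/xfs   # confirm the file system type is xfs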

modprobe

program to add and remove modules from the Linux Kernel

modprobe nfs && modprobe nfsd
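To confirm the modules are actually loaded, you can grep the kernel's module list:

lsmod | grep nfs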

disown

disown command in Linux is used to remove jobs from the job table.

disown [options] jobID1 jobID2 ... jobIDN

for example, there is a job running in the background

ping google.com > /dev/null &

using jobs -l to list all running jobs

jobs -l

using disown -a to remove all jobs from the job table

disown -a

using disown %2 to remove the #2 job

disown %2

generate SSH key

ssh-keygen -t rsa -b 4096 -C "aaron19940628@gmail.com"
sudo ln -sf <$install_path>/bin/* /usr/local/bin

append dir into $PATH (temporary)

export PATH="/root/bin:$PATH"

copy public key to ECS

ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.200.60.53

Maven

1. build from submodule

You don't need to build from the root of the project.

./mvnw clean package -DskipTests  -rf :<$submodule-name>

you can find the <$submodule-name> in the submodule's pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

	<modelVersion>4.0.0</modelVersion>

	<parent>
		<groupId>org.apache.flink</groupId>
		<artifactId>flink-formats</artifactId>
		<version>1.20-SNAPSHOT</version>
	</parent>

	<artifactId>flink-avro</artifactId>
	<name>Flink : Formats : Avro</name>

Then you can modify the command as

./mvnw clean package -DskipTests  -rf :flink-avro
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO] ------------------------------------------------------------------------
[INFO] Detecting the operating system and CPU architecture
[INFO] ------------------------------------------------------------------------
[INFO] os.detected.name: linux
[INFO] os.detected.arch: x86_64
[INFO] os.detected.bitness: 64
[INFO] os.detected.version: 6.7
[INFO] os.detected.version.major: 6
[INFO] os.detected.version.minor: 7
[INFO] os.detected.release: fedora
[INFO] os.detected.release.version: 38
[INFO] os.detected.release.like.fedora: true
[INFO] os.detected.classifier: linux-x86_64
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] Flink : Formats : Avro                                             [jar]
[INFO] Flink : Formats : SQL Avro                                         [jar]
[INFO] Flink : Formats : Parquet                                          [jar]
[INFO] Flink : Formats : SQL Parquet                                      [jar]
[INFO] Flink : Formats : Orc                                              [jar]
[INFO] Flink : Formats : SQL Orc                                          [jar]
[INFO] Flink : Python                                                     [jar]
...

Normally, building Flink starts from the flink-parent module

2. skip some other tests

For example, you can skip the RAT check by doing this:

./mvnw clean package -DskipTests '-Drat.skip=true'

Gradle

1. spotless

keep your code spotless; check more details at https://github.com/diffplug/spotless

there are several files that need to be configured.

  1. settings.gradle.kts
plugins {
    id("org.gradle.toolchains.foojay-resolver-convention") version "0.7.0"
}
  1. build.gradle.kts
plugins {
    id("com.diffplug.spotless") version "6.23.3"
}
configure<com.diffplug.gradle.spotless.SpotlessExtension> {
    kotlinGradle {
        target("**/*.kts")
        ktlint()
    }
    java {
        target("**/*.java")
        googleJavaFormat()
            .reflowLongStrings()
            .skipJavadocFormatting()
            .reorderImports(false)
    }
    yaml {
        target("**/*.yaml")
        jackson()
            .feature("ORDER_MAP_ENTRIES_BY_KEYS", true)
    }
    json {
        target("**/*.json")
        targetExclude(".vscode/settings.json")
        jackson()
            .feature("ORDER_MAP_ENTRIES_BY_KEYS", true)
    }
}

And then, you can execute the following command to format your code.

./gradlew spotlessApply
./mvnw spotless:apply

2. shadowJar

ShadowJar can combine a project's dependency classes and resources into a single jar. Check https://imperceptiblethoughts.com/shadow/

you need to modify your build.gradle.kts

import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

plugins {
    java // Optional 
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

tasks.named<ShadowJar>("shadowJar") {
    archiveBaseName.set("connector-shadow")
    archiveVersion.set("1.0")
    archiveClassifier.set("")
    manifest {
        attributes(mapOf("Main-Class" to "com.example.xxxxx.Main"))
    }
}
./gradlew shadowJar

3. check dependency

list your project’s dependencies in tree view

you need to modify your build.gradle.kts

configurations {
    compileClasspath
}
./gradlew dependencies --configuration compileClasspath
./gradlew :<$module_name>:dependencies --configuration compileClasspath

result will look like this

compileClasspath - Compile classpath for source set 'main'.
+--- org.projectlombok:lombok:1.18.22
+--- org.apache.flink:flink-hadoop-fs:1.17.1
|    \--- org.apache.flink:flink-core:1.17.1
|         +--- org.apache.flink:flink-annotations:1.17.1
|         |    \--- com.google.code.findbugs:jsr305:1.3.9 -> 3.0.2
|         +--- org.apache.flink:flink-metrics-core:1.17.1
|         |    \--- org.apache.flink:flink-annotations:1.17.1 (*)
|         +--- org.apache.flink:flink-shaded-asm-9:9.3-16.1
|         +--- org.apache.flink:flink-shaded-jackson:2.13.4-16.1
|         +--- org.apache.commons:commons-lang3:3.12.0
|         +--- org.apache.commons:commons-text:1.10.0
|         |    \--- org.apache.commons:commons-lang3:3.12.0
|         +--- commons-collections:commons-collections:3.2.2
|         +--- org.apache.commons:commons-compress:1.21 -> 1.24.0
|         +--- org.apache.flink:flink-shaded-guava:30.1.1-jre-16.1
|         \--- com.google.code.findbugs:jsr305:1.3.9 -> 3.0.2
...

Elastic Search DSL

Basic Query

exist query

Returns documents that contain an indexed value for a field.

GET /_search
{
  "query": {
    "exists": {
      "field": "user"
    }
  }
}

The following search returns documents that are missing an indexed value for the user.id field.

GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "user.id"
        }
      }
    }
  }
}
fuzzy query

Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.

GET /_search
{
  "query": {
    "fuzzy": {
      "filed_A": {
        "value": "ki"
      }
    }
  }
}

The same query, with all optional parameters spelled out:

GET /_search
{
  "query": {
    "fuzzy": {
      "filed_A": {
        "value": "ki",
        "fuzziness": "AUTO",
        "max_expansions": 50,
        "prefix_length": 0,
        "transpositions": true,
        "rewrite": "constant_score_blended"
      }
    }
  }
}

rewrite:

  • constant_score_boolean
  • constant_score_filter
  • top_terms_blended_freqs_N
  • top_terms_boost_N, top_terms_N
  • frequent_terms, score_delegating
ids query

Returns documents based on their IDs. This query uses document IDs stored in the _id field.

GET /_search
{
  "query": {
    "ids" : {
      "values" : ["2NTC5ZIBNLuBWC6V5_0Y"]
    }
  }
}
prefix query

The following search returns documents where the filed_A field contains a term that begins with ki.

GET /_search
{
  "query": {
    "prefix": {
      "filed_A": {
        "value": "ki",
         "rewrite": "constant_score_blended",
         "case_insensitive": true
      }
    }
  }
}

You can simplify the prefix query syntax by combining the <field> and value parameters.

GET /_search
{
  "query": {
    "prefix" : { "filed_A" : "ki" }
  }
}
range query

Returns documents that contain terms within a provided range.

GET /_search
{
  "query": {
    "range": {
      "filed_number": {
        "gte": 10,
        "lte": 20,
        "boost": 2.0
      }
    }
  }
}
GET /_search
{
  "query": {
    "range": {
      "filed_timestamp": {
        "time_zone": "+01:00",        
        "gte": "2020-01-01T00:00:00", 
        "lte": "now"                  
      }
    }
  }
}
regex query

Returns documents that contain terms matching a regular expression.

GET /_search
{
  "query": {
    "regexp": {
      "filed_A": {
        "value": "k.*y",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score_blended"
      }
    }
  }
}
term query

Returns documents that contain an exact term in a provided field.

You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.

GET /_search
{
  "query": {
    "term": {
      "filed_A": {
        "value": "kimchy",
        "boost": 1.0
      }
    }
  }
}
wildcard query

Returns documents that contain terms matching a wildcard pattern.

A wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard operator matches zero or more characters. You can combine wildcard operators with other characters to create a wildcard pattern.

GET /_search
{
  "query": {
    "wildcard": {
      "filed_A": {
        "value": "ki*y",
        "boost": 1.0,
        "rewrite": "constant_score_blended"
      }
    }
  }
}

Subsections of 🧪Demos

Subsections of Game

LOL Overlay Assistant

Using deep learning techniques to help you to win the game.

State Machine · Event Bus · Python 3.6 · TensorFlow2

ScreenShots

There are four main functions in this tool.

  1. The first one detects your game client process and recognizes which state you are in.

  2. The second one recommends champions to play. Based on the champions banned by the enemy team, this tool provides three more choices to counter your enemies.

  3. The third one scans the mini-map, and when someone is heading toward you, a notification window pops up.

  4. The last one provides gear recommendations based on your enemy's item list.

Framework

(framework diagram: MVC)

Check it out on Bilibili

Check it out on YouTube

Repo

you can get code from github, gitee

Roller Coin Assistant

Using deep learning techniques to help you mine cryptos, such as BTC, ETH and DOGE.

ScreenShots

There are two main functions in this tool.

  1. Help you to crack the game
  • only supports the ‘Coin-Flip’ game for now.

    Right, rollercoin.com has decreased the benefit from this game; that's why I made the repo public.

  2. Help you to pass the geetest.

How to use

  1. open a web browser.
  2. go to https://rollercoin.com and create an account.
  3. keep the language set to ‘English’ (you can click the bottom button to change it).
  4. click the ‘Game’ button.
  5. start the application, and enjoy it.

Tips

  1. only supports 1920*1080, 2560*1440 and higher resolution screens.
  2. and if you use a 1920*1080 screen, it is strongly recommended to fullscreen your web browser.

Repo

you can get code from gitee

Subsections of HPC

Slurm On K8S


Trying to run a Slurm cluster on Kubernetes

Install

You can directly use helm to manage this slurm chart

  1. helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
  2. helm install slurm ay-helm-mirror/slurm --version 1.0.4

And then, you should see something like this:

Also, you can modify the values.yaml by yourself, and reinstall the slurm cluster

helm upgrade --create-namespace -n slurm --install -f ./values.yaml slurm ay-helm-mirror/slurm --version=1.0.4
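To verify the release, a quick check of the pods in the slurm namespace (a sketch; the exact pod names depend on the chart's resource naming):

kubectl -n slurm get pods
# once the login pod is Running, you can open a shell and check the cluster state, e.g.
# kubectl -n slurm exec -it <$login_pod_name> -- sinfo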
Important

And you can even build your own image, especially for people who want to use their own libs. For now, the images we use are:

login -> docker.io/aaron666/slurm-login:intel-mpi

slurmd -> docker.io/aaron666/slurm-slurmd:intel-mpi

slurmctld -> docker.io/aaron666/slurm-slurmctld:latest

slurmdbd -> docker.io/aaron666/slurm-slurmdbd:latest

munged -> docker.io/aaron666/slurm-munged:latest

Slurm Operator

if you want to change the Slurm configuration, please check the Slurm configuration generator: click

  • for helm users

    just run for fun!

    1. helm repo add ay-helm-repo https://aaronyang0628.github.io/helm-chart-mirror/charts
    2. helm install slurm ay-helm-repo/slurm --version 1.0.4
  • for operator users

    pull an image and apply

    1. docker pull aaron666/slurm-operator:latest
    2. kubectl apply -f https://raw.githubusercontent.com/AaronYang0628/helm-chart-mirror/refs/heads/main/templates/slurm/install.yaml
    3. kubectl apply -f https://raw.githubusercontent.com/AaronYang0628/helm-chart-mirror/refs/heads/main/templates/slurm/slurmdeployment.values.yaml

Subsections of Plugins

Flink S3 F3 Multiple

Normally, Flink can access only one S3 endpoint at runtime. But we need to process files from multiple MinIO instances simultaneously.

So I modified the original flink-s3-fs-hadoop and enabled Flink to do so.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000L, CheckpointingMode.EXACTLY_ONCE);
env.setParallelism(1);
env.setStateBackend(new HashMapStateBackend());
env.getCheckpointConfig().setCheckpointStorage("file:///./checkpoints");

final FileSource<String> source =
    FileSource.forRecordStreamFormat(
            new TextLineInputFormat(),
            new Path(
                "s3u://admin:ZrwpsezF1Lt85dxl@10.11.33.132:9000/user-data/home/conti/2024-02-08--10"))
        .build();

final FileSource<String> source2 =
    FileSource.forRecordStreamFormat(
            new TextLineInputFormat(),
            new Path(
                "s3u://minioadmin:minioadmin@10.101.16.72:9000/user-data/home/conti"))
        .build();

env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source")
    .union(env.fromSource(source2, WatermarkStrategy.noWatermarks(), "file-source2"))
    .print("union-result");
    
env.execute();

Using the default flink-s3-fs-hadoop, the configuration values are written into the Hadoop configuration map. Only one set of values is in effect at a time, so there is no way for the user to address different endpoints within a single job context.

Configuration pluginConfiguration = new Configuration();
pluginConfiguration.setString("s3a.access-key", "admin");
pluginConfiguration.setString("s3a.secret-key", "ZrwpsezF1Lt85dxl");
pluginConfiguration.setString("s3a.connection.maximum", "1000");
pluginConfiguration.setString("s3a.endpoint", "http://10.11.33.132:9000");
pluginConfiguration.setBoolean("s3a.path.style.access", Boolean.TRUE);
FileSystem.initialize(
    pluginConfiguration, PluginUtils.createPluginManagerFromRootFolder(pluginConfiguration));

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000L, CheckpointingMode.EXACTLY_ONCE);
env.setParallelism(1);
env.setStateBackend(new HashMapStateBackend());
env.getCheckpointConfig().setCheckpointStorage("file:///./checkpoints");

final FileSource<String> source =
    FileSource.forRecordStreamFormat(
            new TextLineInputFormat(), new Path("s3a://user-data/home/conti/2024-02-08--10"))
        .build();
env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source").print();

env.execute();

Usage


Install From

For now, you can directly download flink-s3-fs-hadoop-$VERSION.jar and load it into your project.
$VERSION is the Flink version you are using.

  implementation(files("flink-s3-fs-hadoop-$flinkVersion.jar"))
  <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-s3-fs-hadoop</artifactId>
      <version>$flinkVersion</version>
      <scope>system</scope>
      <systemPath>${project.basedir}/flink-s3-fs-hadoop-$flinkVersion.jar</systemPath>
  </dependency>
The jar we provide is based on the original flink-s3-fs-hadoop plugin, so you should use the original protocol prefix s3a://

Or you can wait for the PR: after it is merged into flink-master, you won't need to do anything except update your Flink version
and directly use s3u://

Repo

you can get code from github, gitlab

Subsections of Stream

Cosmic Antenna

Design Architecture

  • objects

continuously processes antenna signal records, converts them into 3-dimensional data matrices, and sends them to different astronomical algorithm endpoints.

  • how data flows


Building From Zero

Following these steps, you can build cosmic-antenna from scratch.

1. install podman

you can check article Install Podman

2. install kind and kubectl

you can check article install kubectl

# create a cluster using podman
curl -o kind.cluster.yaml -L https://gitlab.com/-/snippets/3686427/raw/main/kind-cluster.yaml \
&& export KIND_EXPERIMENTAL_PROVIDER=podman \
&& kind create cluster --name cs-cluster --image m.daocloud.io/docker.io/kindest/node:v1.27.3 --config=./kind.cluster.yaml
Modify ~/.kube/config

vim ~/.kube/config

in line 5, change server: http://::xxxx -> server: http://0.0.0.0:xxxxx


3. [Optional] pre-downloaded slow images

DOCKER_IMAGE_PATH=/root/docker-images && mkdir -p $DOCKER_IMAGE_PATH
BASE_URL="https://resource-ops-dev.lab.zjvis.net:32443/docker-images"
for IMAGE in "quay.io_argoproj_argocd_v2.9.3.dim" \
    "ghcr.io_dexidp_dex_v2.37.0.dim" \
    "docker.io_library_redis_7.0.11-alpine.dim" \
    "docker.io_library_flink_1.17.dim"
do
    IMAGE_FILE=$DOCKER_IMAGE_PATH/$IMAGE
    if [ ! -f $IMAGE_FILE ]; then
        TMP_FILE=$IMAGE_FILE.tmp \
        && curl -o "$TMP_FILE" -L "$BASE_URL/$IMAGE" \
        && mv $TMP_FILE $IMAGE_FILE
    fi
    kind -n cs-cluster load image-archive $IMAGE_FILE
done

4. install argocd

you can check article Install ArgoCD

5. install essential app on argocd

# install cert manger    
curl -LO https://gitlab.com/-/snippets/3686424/raw/main/cert-manager.yaml \
&& kubectl -n argocd apply -f cert-manager.yaml \
&& argocd app sync argocd/cert-manager

# install ingress
curl -LO https://gitlab.com/-/snippets/3686426/raw/main/ingress-nginx.yaml \
&& kubectl -n argocd apply -f ingress-nginx.yaml \
&& argocd app sync argocd/ingress-nginx

# install flink-kubernetes-operator
curl -LO https://gitlab.com/-/snippets/3686429/raw/main/flink-operator.yaml \
&& kubectl -n argocd apply -f flink-operator.yaml \
&& argocd app sync argocd/flink-operator

6. install git

sudo dnf install -y git \
&& rm -rf $HOME/cosmic-antenna-demo \
&& mkdir $HOME/cosmic-antenna-demo \
&& git clone --branch pv_pvc_template https://github.com/AaronYang2333/cosmic-antenna-demo.git $HOME/cosmic-antenna-demo

7. prepare application image

# cd into  $HOME/cosmic-antenna-demo
sudo dnf install -y java-11-openjdk.x86_64 \
&& $HOME/cosmic-antenna-demo/gradlew :s3sync:buildImage \
&& $HOME/cosmic-antenna-demo/gradlew :fpga-mock:buildImage
# save and load into cluster
VERSION="1.0.3"
podman save --quiet -o $DOCKER_IMAGE_PATH/fpga-mock_$VERSION.dim localhost/fpga-mock:$VERSION \
&& kind -n cs-cluster load image-archive $DOCKER_IMAGE_PATH/fpga-mock_$VERSION.dim
podman save --quiet -o $DOCKER_IMAGE_PATH/s3sync_$VERSION.dim localhost/s3sync:$VERSION \
&& kind -n cs-cluster load image-archive $DOCKER_IMAGE_PATH/s3sync_$VERSION.dim
Modify role config
kubectl -n flink edit role/flink -o yaml

add services and endpoints to the rules.resources

8. prepare k8s resources [pv, pvc, sts]

cp -rf $HOME/cosmic-antenna-demo/flink/*.yaml /tmp \
&& podman exec -d cs-cluster-control-plane mkdir -p /mnt/flink-job
# create persist volume
kubectl -n flink create -f /tmp/pv.template.yaml
# create pv claim
kubectl -n flink create -f /tmp/pvc.template.yaml
# start up flink application
kubectl -n flink create -f /tmp/job.template.yaml
# start up ingress
kubectl -n flink create -f /tmp/ingress.forward.yaml
# start up fpga UDP client, sending data 
cp $HOME/cosmic-antenna-demo/fpga-mock/client.template.yaml /tmp \
&& kubectl -n flink create -f /tmp/client.template.yaml

9. check dashboard in browser

http://job-template-example.flink.lab.zjvis.net

Repo

you can get code from github


Reference

  1. https://github.com/ben-wangz/blog/tree/main/docs/content/6.kubernetes/7.installation/ha-cluster

Subsections of Design

Yaml Crawler

Steps

  1. define which web URL you want to crawl, let's say https://www.xxx.com/aaa.apex
  2. create a page pojo org.example.business.page.MainPage to describe that page

Then you can create a yaml file named root-pages.yaml and its content is

- '@class': "org.example.business.page.MainPage"
  url: "https://www.xxx.com/aaa.apex"
  1. and then define a process-flow yaml file, describing how to process the web pages the crawler will encounter.
processorChain:
  - '@class': "org.example.crawler.core.processor.decorator.ExceptionRecord"
    processor:
      '@class': "org.example.crawler.core.processor.decorator.RetryControl"
      processor:
        '@class': "org.example.crawler.core.processor.decorator.SpeedControl"
        processor:
          '@class': "org.example.business.hs.code.MainPageProcessor"
          application: "app-name"
        time: 100
        unit: "MILLISECONDS"
      retryTimes: 1
  - '@class': "org.example.crawler.core.processor.decorator.ExceptionRecord"
    processor:
      '@class': "org.example.crawler.core.processor.decorator.RetryControl"
      processor:
        '@class': "org.example.crawler.core.processor.decorator.SpeedControl"
        processor:
          '@class': "org.example.crawler.core.processor.download.DownloadProcessor"
          pagePersist:
            '@class': "org.example.business.persist.DownloadPageDatabasePersist"
            downloadPageRepositoryBeanName: "downloadPageRepository"
          downloadPageTransformer:
            '@class': "org.example.crawler.download.DefaultDownloadPageTransformer"
          skipExists:
            '@class': "org.example.crawler.download.SkipExistsById"
        time: 1
        unit: "SECONDS"
      retryTimes: 1
nThreads: 1
pollWaitingTime: 30
pollWaitingTimeUnit: "SECONDS"
waitFinishedTimeout: 180
waitFinishedTimeUnit: "SECONDS" 

ExceptionRecord, RetryControl and SpeedControl are provided by the yaml crawler itself, don't worry. You only need to define how your page MainPage is processed, for example by implementing a MainPageProcessor. Each processor produces a set of further pages or DownloadPage objects. A DownloadPage is like a ship carrying the information you need, and the framework will process each DownloadPage and download or persist it.

  1. Voilà, then run your crawler.

Repo

you can get code from github, gitlab

Subsections of 🐿️Apache Flink

Subsections of On K8s Operator

Job Privilieges

Template

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: flink
  name: flink-deployment-manager
rules:
- apiGroups: 
  - flink.apache.org
  resources: 
  - flinkdeployments
  verbs: 
  - 'get'
  - 'list'
  - 'create'
  - 'update'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: flink-deployment-manager-binding
  namespace: flink
subjects:
- kind: User
  name: "277293711358271379"  
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: flink-deployment-manager
  apiGroup: rbac.authorization.k8s.io

OSS Template

Template

apiVersion: "flink.apache.org/v1beta1"
kind: "FlinkDeployment"
metadata:
  name: "financial-job"
spec:
  image: "cr.registry.res.cloud.wuxi-yqgcy.cn/mirror/financial-topic:1.5-oss"
  flinkVersion: "v1_17"
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "8"
    fs.oss.endpoint: http://ay-test.oss-cn-jswx-xuelang-d01-a.ops.cloud.wuxi-yqgcy.cn/
    fs.oss.accessKeyId: 4gqOVOfQqCsCUwaC
    fs.oss.accessKeySecret: xxx
  ingress:
    template: "flink.k8s.io/{{namespace}}/{{name}}(/|$)(.*)"
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "self-signed-ca-issuer"
      nginx.ingress.kubernetes.io/rewrite-target: "/$2"
  serviceAccount: "flink"
  podTemplate:
    apiVersion: "v1"
    kind: "Pod"
    metadata:
      name: "financial-job"
    spec:
      containers:
        - name: "flink-main-container"
          env:
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-oss-fs-hadoop-1.17.2.jar
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: "local:///app/application.jar"
    parallelism: 1
    upgradeMode: "stateless"

S3 Template

Template

apiVersion: "flink.apache.org/v1beta1"
kind: "FlinkDeployment"
metadata:
  name: "financial-job"
spec:
  image: "cr.registry.res.cloud.wuxi-yqgcy.cn/mirror/financial-topic:1.5"
  flinkVersion: "v1_17"
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "8"
    s3a.endpoint: http://172.27.253.89:9000
    s3a.access-key: minioadmin
    s3a.secret-key: minioadmin
  ingress:
    template: "flink.k8s.io/{{namespace}}/{{name}}(/|$)(.*)"
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "self-signed-ca-issuer"
      nginx.ingress.kubernetes.io/rewrite-target: "/$2"
  serviceAccount: "flink"
  podTemplate:
    apiVersion: "v1"
    kind: "Pod"
    metadata:
      name: "financial-job"
    spec:
      containers:
        - name: "flink-main-container"
          env:
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-s3-fs-hadoop-1.17.2.jar
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: "local:///app/application.jar"
    parallelism: 1
    upgradeMode: "stateless"

Subsections of CDC

Mysql CDC

More often than not, we can grab a minimal example from CDC Connectors. But people still need to google a few unavoidable problems before using it.

preliminary

Flink: 1.17 JDK: 11

Flink CDC Version    Flink Version
1.0.0                1.11.*
1.1.0                1.11.*
1.2.0                1.12.*
1.3.0                1.12.*
1.4.0                1.13.*
2.0.*                1.13.*
2.1.*                1.13.*
2.2.*                1.13.*, 1.14.*
2.3.*                1.13.*, 1.14.*, 1.15.*
2.4.*                1.13.*, 1.14.*, 1.15.*
3.0.*                1.14.*, 1.15.*, 1.16.*
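Besides matching versions, the MySQL server itself must have the binary log enabled in ROW format, and the connector user typically needs REPLICATION SLAVE / REPLICATION CLIENT privileges; this is one of the things people usually end up googling. A typical my.cnf fragment (the values here are illustrative):

[mysqld]
# a unique, non-zero server id is required for binlog clients
server-id        = 223344
# enable the binary log
log_bin          = mysql-bin
# Flink CDC (Debezium-based) requires row-based binlog
binlog_format    = ROW
binlog_row_image = FULL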

usage for DataStream API

Importing only com.ververica:flink-connector-mysql-cdc is not enough.

implementation("com.ververica:flink-connector-mysql-cdc:2.4.0")

//you also need these following dependencies
implementation("org.apache.flink:flink-shaded-guava:30.1.1-jre-16.1")
implementation("org.apache.flink:flink-connector-base:1.17")
implementation("org.apache.flink:flink-table-planner_2.12:1.17")
<dependency>
  <groupId>com.ververica</groupId>
  <!-- add the dependency matching your database -->
  <artifactId>flink-connector-mysql-cdc</artifactId>
  <!-- The dependency is available only for stable releases, SNAPSHOT dependencies need to be built based on master or release- branches by yourself. -->
  <version>2.4.0</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-guava -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-shaded-guava</artifactId>
  <version>30.1.1-jre-16.1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-base -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-base</artifactId>
  <version>1.17.1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-planner -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner_2.12</artifactId>
  <version>1.17.1</version>
</dependency>

Example Code

MySqlSource<String> mySqlSource =
    MySqlSource.<String>builder()
        .hostname("192.168.56.107")
        .port(3306)
        .databaseList("test") // set captured database
        .tableList("test.table_a") // set captured table
        .username("root")
        .password("mysql")
        .deserializer(
            new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
        .serverTimeZone("UTC")
        .build();

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// enable checkpoint
env.enableCheckpointing(3000);

env.fromSource(mySqlSource, WatermarkStrategy.noWatermarks(), "MySQL Source")
    // set 4 parallel source tasks
    .setParallelism(4)
    .print()
    .setParallelism(1); // use parallelism 1 for sink to keep message ordering

env.execute("Print MySQL Snapshot + Binlog");

usage for table/SQL API

Connector

☸️Kubernetes

Subsections of ☸️Kubernetes

Prepare k8s Cluster

To build a K8s cluster, you can choose one of the following methods.

Install Kubectl

Build Cluster

Install By

Prerequisites

  • Hardware Requirements:

    1. At least 2 GB of RAM per machine (minimum 1 GB)
    2. 2 CPUs on the master node
    3. Full network connectivity among all machines (public or private network)
  • Operating System:

    1. Ubuntu 20.04/18.04, CentOS 7/8, or any other supported Linux distribution.
  • Network Requirements:

    1. Unique hostname, MAC address, and product_uuid for each node.
    2. Certain ports need to be open (e.g., 6443, 2379-2380, 10250, 10251, 10252, 10255, etc.)
  • Disable Swap:

    sudo swapoff -a

Steps to Setup Kubernetes Cluster

  1. Prepare Your Servers. Update the package index and install the necessary packages on all your nodes (both master and worker):
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

Add the Kubernetes APT Repository

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

Install kubeadm, kubelet, and kubectl

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
  1. Initialize the Master Node. On the master node, initialize the Kubernetes control plane:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

The --pod-network-cidr flag is used to set the Pod network range. You might need to adjust this based on your network provider.

Set up Local kubeconfig

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
  1. Install a Pod Network Add-on. You can install a network add-on like Flannel, Calico, or Weave. For example, to install Calico:
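(the Calico version below is only an example; check the Calico docs for the current manifest URL)

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml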
  1. Join Worker Nodes to the Cluster. On each worker node, run the kubeadm join command provided at the end of the kubeadm init output on the master node. It will look something like this:
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

If you lost the join command, you can create a new token on the master node:

sudo kubeadm token create --print-join-command
  1. Verify the Cluster. Once all nodes have joined, you can verify the cluster status from the master node:
kubectl get nodes

This command should list all your nodes with the status “Ready”.

Subsections of Prepare k8s Cluster

Kind

Preliminary

  • Kind binary has installed, if not check 🔗link

  • Hardware Requirements:

    1. At least 2 GB of RAM per machine (minimum 1 GB)
    2. 2 CPUs on the master node
    3. Full network connectivity among all machines (public or private network)
  • Operating System:

    1. Ubuntu 22.04/14.04, CentOS 7/8, or any other supported Linux distribution.
  • Network Requirements:

    1. Unique hostname, MAC address, and product_uuid for each node.
    2. Certain ports need to be open (e.g., 6443, 2379-2380, 10250, 10251, 10252, 10255, etc.)

Customize your cluster

Creating a Kubernetes cluster is as simple as kind create cluster

kind create cluster --name test
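If you need more than a single node, kind also accepts a config file. A minimal sketch (the file name kind-config.yaml is just an example):

# kind-config.yaml -- one control-plane node plus two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

kind create cluster --name test --config kind-config.yaml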

Reference

and then you can visit https://kind.sigs.k8s.io/docs/user/quick-start/ for more detail.

K3s

Preliminary

  • Hardware Requirements:

    1. The server needs at least 2 cores and 2 GB RAM
    2. An agent needs 1 core and 512 MB RAM
  • Operating System:

    1. K3s is expected to work on most modern Linux systems.
  • Network Requirements:

    1. The K3s server needs port 6443 to be accessible by all nodes.
    2. If you wish to utilize the metrics server, all nodes must be accessible to each other on port 10250.

Init server

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn sh -s - server --cluster-init --flannel-backend=vxlan --node-taint "node-role.kubernetes.io/control-plane=true:NoSchedule"

Get token

cat /var/lib/rancher/k3s/server/node-token

Join worker

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn K3S_URL=https://<master-ip>:6443 K3S_TOKEN=<join-token> sh -

Copy kubeconfig

mkdir -p $HOME/.kube
cp /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
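To confirm the cluster is up after copying the kubeconfig (k3s also bundles its own client as k3s kubectl):

kubectl get nodes -o wide
# or, using the bundled client
k3s kubectl get nodes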

Minikube

Preliminary

  • Minikube binary has installed, if not check 🔗link

  • Hardware Requirements:

    1. At least 2 GB of RAM per machine (minimum 1 GB)
    2. 2 CPUs on the master node
    3. Full network connectivity among all machines (public or private network)
  • Operating System:

    1. Ubuntu 20.04/18.04, CentOS 7/8, or any other supported Linux distribution.
  • Network Requirements:

    1. Unique hostname, MAC address, and product_uuid for each node.
    2. Certain ports need to be open (e.g., 6443, 2379-2380, 10250, 10251, 10252, 10255, etc.)

[Optional] Disable aegis service and reboot system for Aliyun

sudo systemctl disable aegis && sudo reboot

Customize your cluster

minikube start --driver=podman  --image-mirror-country=cn --kubernetes-version=v1.33.1 --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --cpus=6 --memory=20g --disk-size=50g --force

Restart minikube

minikube stop && minikube start

Add alias

alias kubectl="minikube kubectl --"

Forward

ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip) -L '*:30443:0.0.0.0:30443' -N -f

and then you can visit https://minikube.sigs.k8s.io/docs/start/ for more detail.

FAQ

This is usually caused by the Metrics Server not being installed correctly, or by the External Metrics API being missing.

# enable Minikube's metrics-server addon
minikube addons enable metrics-server

# wait for the deployment to finish (about 1-2 minutes)
kubectl wait --for=condition=available deployment/metrics-server -n kube-system --timeout=180s

# verify that the Metrics Server is running
kubectl -n kube-system get pods  | grep metrics-server
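Once the metrics-server pod is running, the metrics API should start answering after a minute or so:

# both commands rely on the Metrics Server being available
kubectl top nodes
kubectl top pods -A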


Subsections of Command

Kubectl CheatSheet

Switch Context

  • use different config
kubectl --kubeconfig /root/.kube/config_ack get pod

Resource

  • create resource

    Resource From
      kubectl create -n <$namespace> -f <$file_url>
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/component: server
        app.kubernetes.io/instance: argo-cd
        app.kubernetes.io/name: argocd-server-external
        app.kubernetes.io/part-of: argocd
        app.kubernetes.io/version: v2.8.4
      name: argocd-server-external
    spec:
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: 8080
        nodePort: 30443
      selector:
        app.kubernetes.io/instance: argo-cd
        app.kubernetes.io/name: argocd-server
      type: NodePort
    
      helm install <$resource_id> <$resource_id> \
          --namespace <$namespace> \
          --create-namespace \
          --version <$version> \
          --repo <$repo_url> \
          --values resource.values.yaml \
          --atomic
    crds:
        install: true
        keep: false
    global:
        revisionHistoryLimit: 3
        image:
            repository: m.daocloud.io/quay.io/argoproj/argocd
            imagePullPolicy: IfNotPresent
    redis:
        enabled: true
        image:
            repository: m.daocloud.io/docker.io/library/redis
        exporter:
            enabled: false
            image:
                repository: m.daocloud.io/bitnami/redis-exporter
        metrics:
            enabled: false
    redis-ha:
        enabled: false
        image:
            repository: m.daocloud.io/docker.io/library/redis
        configmapTest:
            repository: m.daocloud.io/docker.io/koalaman/shellcheck
        haproxy:
            enabled: false
            image:
                repository: m.daocloud.io/docker.io/library/haproxy
        exporter:
            enabled: false
            image: m.daocloud.io/docker.io/oliver006/redis_exporter
    dex:
        enabled: true
        image:
            repository: m.daocloud.io/ghcr.io/dexidp/dex
    

  • debug resource

kubectl -n <$namespace> describe <$resource_id>
  • logging resource
kubectl -n <$namespace> logs -f <$resource_id>
  • port forwarding resource
kubectl -n <$namespace> port-forward  <$resource_id> --address 0.0.0.0 8080:80 # local:pod
  • delete all resource under specific namespace
kubectl delete all --all -n <$namespace>
kubectl delete all --all --all-namespaces
  • delete error pods
kubectl -n <$namespace> delete pods --field-selector status.phase=Failed
  • force delete
kubectl -n <$namespace> delete pod <$resource_id> --force --grace-period=0
  • opening a Bash Shell inside a Pod
kubectl -n <$namespace> exec -it <$resource_id> -- bash  
  • copy secret to another namespace
kubectl -n <$namespaceA> get secret <$secret_name> -o json \
    | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' \
    | kubectl -n <$namespaceB> apply -f -
  • copy secret to another name
kubectl -n <$namespace> get secret <$old_secret_name> -o json | \
jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid","ownerReferences","annotations","labels"]) | .metadata.name = "<$new_secret_name>"' | \
kubectl apply -n <$namespace> -f -
  • delete all completed job
kubectl delete jobs -n <$namespace> --field-selector status.successful=1 

Nodes

  • add taint
kubectl taint nodes <$node_ip> <key:value>
kubectl taint nodes node1 dedicated:NoSchedule
  • remove taint
kubectl taint nodes <$node_ip> <key:value>-
kubectl taint nodes node1 dedicated:NoSchedule-
  • show info extract by json path
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'

Deploy

  • rollout: show rollout history
kubectl -n <$namespace> rollout history deploy/<$deploy_resource_id>

undo rollout

kubectl -n <$namespace> rollout undo deploy <$deploy_resource_id>  --to-revision=1

Helm Chart CheatSheet

Finding Charts

helm search hub wordpress

Adding Repositories

helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
helm repo update

Showing Chart Values

helm show values bitnami/wordpress

Packaging Charts

helm package --dependency-update --destination /tmp/ /root/metadata-operator/environments/helm/metadata-environment/charts

Subsections of Container

CheatSheet

  1. remove specific image
podman rmi <$image_id>
  1. remove all <none> images
podman rmi `podman images | grep  '<none>' | awk '{print $3}'`
  1. remove all stopped containers
podman container prune
  1. remove all docker images not used
podman image prune

sudo podman volume prune

  1. find ip address of a container
podman inspect --format='{{.NetworkSettings.IPAddress}}' minio-server
  1. exec into container
podman exec -it <$container_id> /bin/bash
  1. run with environment
podman run -d --replace \
    -p 18123:8123 -p 19000:9000 \
    --name clickhouse-server \
    -e ALLOW_EMPTY_PASSWORD=yes \
    --ulimit nofile=262144:262144 \
    quay.m.daocloud.io/kryptonite/clickhouse-docker-rootless:20.9.3.45 

--ulimit nofile=262144:262144: 262144 is the maximum number of open file descriptors (soft and hard limit) allowed for the container process.

ulimit is a Linux shell built-in (raising hard limits requires admin access) used to see, set, or limit the resource usage of the current user, including the number of open file descriptors a process may hold.
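To check what the host currently allows before overriding it:

# soft limit of open file descriptors for the current shell
ulimit -n
# hard limit
ulimit -Hn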

  1. login registry
podman login --tls-verify=false --username=ascm-org-1710208820455 cr.registry.res.cloud.zhejianglab.com -p 'xxxx'
  1. tag image
podman tag 76fdac66291c cr.registry.res.cloud.zhejianglab.com/ay-dev/datahub-s3-fits:1.0.0
  1. push image
podman push cr.registry.res.cloud.zhejianglab.com/ay-dev/datahub-s3-fits:1.0.0
  1. remove specific image
docker rmi <$image_id>
  1. remove all <none> images
docker rmi `docker images | grep  '<none>' | awk '{print $3}'`
  1. remove all stopped containers
docker container prune
  1. remove all docker images not used
docker image prune
  1. find ip address of a container
docker inspect --format='{{.NetworkSettings.IPAddress}}' minio-server
  1. exec into container
docker exec -it <$container_id> /bin/bash
  1. run with environment
docker run -d -p 18123:8123 -p 19000:9000 --name clickhouse-server -e ALLOW_EMPTY_PASSWORD=yes --ulimit nofile=262144:262144 quay.m.daocloud.io/kryptonite/clickhouse-docker-rootless:20.9.3.45 

--ulimit nofile=262144:262144: sets the maximum number of open file descriptors for the container process, same as in the podman example above.

  1. copy file

    Copy a local file into container

    docker cp ./some_file CONTAINER:/work

    or copy files from container to local path

    docker cp CONTAINER:/var/logs/ /tmp/app_logs
  2. load a volume

docker run --rm \
    --entrypoint bash \
    -v $PWD/data:/app:ro \
    -it docker.io/minio/mc:latest \
    -c "mc --insecure alias set minio https://oss-cn-hangzhou-zjy-d01-a.ops.cloud.zhejianglab.com/ g83B2sji1CbAfjQO 2h8NisFRELiwOn41iXc6sgufED1n1A \
        && mc --insecure ls minio/csst-prod/ \
        && mc --insecure mb --ignore-existing minio/csst-prod/crp-test \
        && mc --insecure cp /app/modify.pdf minio/csst-prod/crp-test/ \
        && mc --insecure ls --recursive minio/csst-prod/"

Subsections of Template

Subsections of DevContainer Template

Java 21 + Go 1.24

prepare .devcontainer.json

{
  "name": "Go & Java DevContainer",
  "build": {
    "dockerfile": "Dockerfile"
  },
  "mounts": [
    "source=/root/.kube/config,target=/root/.kube/config,type=bind",
    "source=/root/.minikube/profiles/minikube/client.crt,target=/root/.minikube/profiles/minikube/client.crt,type=bind",
    "source=/root/.minikube/profiles/minikube/client.key,target=/root/.minikube/profiles/minikube/client.key,type=bind",
    "source=/root/.minikube/ca.crt,target=/root/.minikube/ca.crt,type=bind"
  ],
  "customizations": {
    "vscode": {
      "extensions": [
        "golang.go",
        "vscjava.vscode-java-pack",
        "redhat.java",
        "vscjava.vscode-maven",
        "Alibaba-Cloud.tongyi-lingma",
        "vscjava.vscode-java-debug",
        "vscjava.vscode-java-dependency",
        "vscjava.vscode-java-test"
      ]
    }
  },
  "remoteUser": "root",
  "postCreateCommand": "go version && java -version && mvn -v"
}

prepare Dockerfile

FROM m.daocloud.io/docker.io/ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    git \
    wget \
    gnupg \
    vim \
    lsb-release \
    apt-transport-https \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# install OpenJDK 21 
RUN mkdir -p /etc/apt/keyrings && \
    wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | gpg --dearmor -o /etc/apt/keyrings/adoptium.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/adoptium.gpg arch=amd64] https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | tee /etc/apt/sources.list.d/adoptium.list > /dev/null && \
    apt-get update && \
    apt-get install -y temurin-21-jdk && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# set java env
ENV JAVA_HOME=/usr/lib/jvm/temurin-21-jdk-amd64

# install maven
ARG MAVEN_VERSION=3.9.10
RUN wget https://dlcdn.apache.org/maven/maven-3/${MAVEN_VERSION}/binaries/apache-maven-${MAVEN_VERSION}-bin.tar.gz -O /tmp/maven.tar.gz && \
    mkdir -p /opt/maven && \
    tar -C /opt/maven -xzf /tmp/maven.tar.gz --strip-components=1 && \
    rm /tmp/maven.tar.gz

ENV MAVEN_HOME=/opt/maven
ENV PATH="${MAVEN_HOME}/bin:${PATH}"

# install go 1.24.4 
ARG GO_VERSION=1.24.4
RUN wget https://dl.google.com/go/go${GO_VERSION}.linux-amd64.tar.gz -O /tmp/go.tar.gz && \
    tar -C /usr/local -xzf /tmp/go.tar.gz && \
    rm /tmp/go.tar.gz

# set go env
ENV GOROOT=/usr/local/go
ENV GOPATH=/go
ENV PATH="${GOROOT}/bin:${GOPATH}/bin:${PATH}"

# install other binarys
ARG KUBECTL_VERSION=v1.33.0
RUN wget https://files.m.daocloud.io/dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl -O /tmp/kubectl && \
    chmod u+x /tmp/kubectl && \
    mv -f /tmp/kubectl /usr/local/bin/kubectl 

ARG HELM_VERSION=v3.13.3
RUN wget https://files.m.daocloud.io/get.helm.sh/helm-${HELM_VERSION}-linux-amd64.tar.gz -O /tmp/helm-${HELM_VERSION}-linux-amd64.tar.gz && \
    mkdir -p /opt/helm && \
    tar -C /opt/helm -xzf /tmp/helm-${HELM_VERSION}-linux-amd64.tar.gz && \
    rm /tmp/helm-${HELM_VERSION}-linux-amd64.tar.gz

ENV HELM_HOME=/opt/helm/linux-amd64
ENV PATH="${HELM_HOME}:${PATH}"

USER root
WORKDIR /workspace

Subsections of DEV

Devpod

Preliminary

  • Kubernetes has installed, if not check 🔗link
  • Devpod has installed, if not check 🔗link

1. Get provider config

# just copy ~/.kube/config

for example, the original config

apiVersion: v1
clusters:
- cluster:
    certificate-authority: <$file_path>
    extensions:
    - extension:
        provider: minikube.sigs.k8s.io
        version: v1.33.0
      name: cluster_info
    server: https://<$minikube_ip>:8443
  name: minikube
contexts:
- context:
    cluster: minikube
    extensions:
    - extension:
        provider: minikube.sigs.k8s.io
        version: v1.33.0
      name: context_info
    namespace: default
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate: <$file_path>
    client-key: <$file_path>

you need to rename clusters.cluster.certificate-authority, clusters.cluster.server, users.user.client-certificate, users.user.client-key.

clusters.cluster.certificate-authority -> clusters.cluster.certificate-authority-data
clusters.cluster.server -> ip set to `localhost`
users.user.client-certificate -> users.user.client-certificate-data
users.user.client-key -> users.user.client-key-data

the data you paste after each key should be base64-encoded

cat <$file_path> | base64
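Note that GNU base64 wraps long output across lines, while kubeconfig expects the value on a single line; assuming GNU coreutils you can disable wrapping like this (on macOS the output is already unwrapped by default):

base64 -w 0 <$file_path>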

then, the modified config file should look like this:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: xxxxxxxxxxxxxx
    extensions:
    - extension:
        provider: minikube.sigs.k8s.io
        version: v1.33.0
      name: cluster_info
    server: https://127.0.0.1:8443 
  name: minikube
contexts:
- context:
    cluster: minikube
    extensions:
    - extension:
        provider: minikube.sigs.k8s.io
        version: v1.33.0
      name: context_info
    namespace: default
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate-data: xxxxxxxxxxxx
    client-key-data: xxxxxxxxxxxxxxxx

then we should forward the minikube port on your own PC

#where you host minikube
MACHINE_IP_ADDRESS=10.200.60.102
USER=ayay
MINIKUBE_IP_ADDRESS=$(ssh -o 'UserKnownHostsFile /dev/null' $USER@$MACHINE_IP_ADDRESS '$HOME/bin/minikube ip')
ssh -o 'UserKnownHostsFile /dev/null' $USER@$MACHINE_IP_ADDRESS -L "*:8443:$MINIKUBE_IP_ADDRESS:8443" -N -f

2. Create workspace

  1. get git repo link
  2. choose appropriate provider
  3. choose ide type and version
  4. and go!

Useful Command

Install Kubectl

for more information, you can check 🔗link to install kubectl

  • How to use it in devpod

    Everything works fine.

    when you are in the pod and using kubectl, you should change clusters.cluster.server in ~/.kube/config to https://<$minikube_ip>:8443

  • exec into devpod

kubectl -n devpod exec -it <$resource_id> -c devpod -- bin/bash
  • add DNS item
10.aaa.bbb.ccc gitee.zhejianglab.com
  • shutdown ssh tunnel
    # check if port 8443 is already open
    netstat -aon|findstr "8443"
    
    # find PID
    ps | grep ssh
    
    # kill the process
    taskkill /PID <$PID> /T /F
    # check if port 8443 is already open
    netstat -aon|findstr "8443"
    
    # find PID
    ps | grep ssh
    
    # kill the process
    kill -9 <$PID>

Dev Container

write .devcontainer.json

Deploy

Subsections of Operator

KubeBuilder

Basic

Kubebuilder is an SDK for building K8s APIs with CRDs. In short, it:

  • is built on top of controller-runtime and client-go
  • provides an extensible API framework so that users can develop CRDs, Controllers and Admission Webhooks from scratch to extend K8s
  • also provides scaffolding tools to initialize a CRD project and auto-generate boilerplate template code and configuration

Architecture

mvc

Main.go

import (
	_ "k8s.io/client-go/plugin/pkg/client/auth"

	ctrl "sigs.k8s.io/controller-runtime"
)
// nolint:gocyclo
func main() {
    ...

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})

    ...
    if err = (&controller.GuestbookReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller", "controller", "Guestbook")
        os.Exit(1)
    }

    ...
    if os.Getenv("ENABLE_WEBHOOKS") != "false" {
        if err = webhookwebappv1.SetupGuestbookWebhookWithManager(mgr); err != nil {
            setupLog.Error(err, "unable to create webhook", "webhook", "Guestbook")
            os.Exit(1)
        }
    }

Manager

The Manager is the core component: it coordinates multiple controllers and handles the cache, clients, leader election and more. See https://github.com/kubernetes-sigs/controller-runtime/blob/v0.20.0/pkg/manager/manager.go

  • Client is responsible for talking to the Kubernetes API Server, operating on resource objects, and reading/writing the cache. It comes in two flavors:
    • Reader: reads from the Cache first to avoid hammering the API Server; results fetched with Get are cached.
    • Writer: supports write operations (Create, Update, Delete, Patch) and interacts with the API Server directly.
    • Informers are the core components provided by client-go; they watch the API Server for change events (Create/Update/Delete) of a specific resource type (such as Pod, Deployment, or a custom CRD).
      • The Client relies on the Informer mechanism to keep the cache in sync automatically. When a resource changes in the API Server, the Informer updates the local cache so that subsequent reads return the latest data.
  • Cache
    • The Cache watches resource changes on the API Server through the built-in client's ListWatcher mechanism.
    • Events are written into the local cache (e.g. an Indexer), avoiding frequent requests to the API Server.
    • The purpose of the Cache is to reduce direct requests to the API Server while still letting controllers quickly read the latest state of resources.
  • Event

    The Kubernetes API Server pushes resource change events over long-lived HTTP connections, and client-go's Informer listens for these messages.

    • Event: an event is the message passed between the API Server and a Controller; it carries the resource type, resource name, event type (ADDED, MODIFIED, DELETED) and so on, and is converted into requests, check link
    • API Server → Manager's Informer → Cache → Controller's Watch → Predicate filtering → WorkQueue → Controller's Reconcile() method

Controller

It’s a controller’s job to ensure that, for any given object the actual state of the world matches the desired state in the object. Each controller focuses on one root Kind, but may interact with other Kinds.

func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    ...
}
func (r *GuestbookReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&webappv1.Guestbook{}).
		Named("guestbook").
		Complete(r)
}

If you want to build your own controller, please check https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md

  1. On initialization, each Controller registers the resource types it cares about with the Manager (for example, declaring interest in Pod resources via Owns(&v1.Pod{})).

  2. Based on the Controller's registration, the Manager creates the corresponding Informer and Watch for those resources, check link

  3. When a resource change event occurs, the Informer takes the event out of the cache, and a Predicate (filter) decides whether the reconcile logic needs to be triggered.

  4. If the event passes the filter, the Controller puts it into a queue (WorkQueue) and finally calls the user-implemented Reconcile() function to handle it, check link

func (c *Controller[request]) Start(ctx context.Context) error {

	c.ctx = ctx

	queue := c.NewQueue(c.Name, c.RateLimiter)

    c.Queue = &priorityQueueWrapper[request]{TypedRateLimitingInterface: queue}

	err := func() error {

            // start to sync event sources
            if err := c.startEventSources(ctx); err != nil {
                return err
            }

            for i := 0; i < c.MaxConcurrentReconciles; i++ {
                go func() {
                    for c.processNextWorkItem(ctx) {

                    }
                }()
            }
	}()

	c.LogConstructor(nil).Info("All workers finished")
}
func (c *Controller[request]) processNextWorkItem(ctx context.Context) bool {
	obj, priority, shutdown := c.Queue.GetWithPriority()

	c.reconcileHandler(ctx, obj, priority)

}

Webhook

Webhooks are a mechanism to intercept requests to the Kubernetes API server. They can be used to validate, mutate, or even proxy requests.

func (d *GuestbookCustomDefaulter) Default(ctx context.Context, obj runtime.Object) error {}

func (v *GuestbookCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {}

func (v *GuestbookCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {}

func (v *GuestbookCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {}

func SetupGuestbookWebhookWithManager(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).For(&webappv1.Guestbook{}).
		WithValidator(&GuestbookCustomValidator{}).
		WithDefaulter(&GuestbookCustomDefaulter{}).
		Complete()
}

Subsections of KubeBuilder

Quick Start

Prerequisites

  • go version v1.23.0+
  • docker version 17.03+.
  • kubectl version v1.11.3+.
  • Access to a Kubernetes v1.11.3+ cluster.

Installation

# download kubebuilder and install locally.
curl -L -o kubebuilder "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/

Create A Project

mkdir -p ~/projects/guestbook
cd ~/projects/guestbook
kubebuilder init --domain my.domain --repo my.domain/guestbook

Just try again!

rm -rf ~/projects/guestbook/*
kubebuilder init --domain my.domain --repo my.domain/guestbook

Create An API

kubebuilder create api --group webapp --version v1 --kind Guestbook
apt-get -y install make
rm -rf ~/projects/guestbook/*
kubebuilder init --domain my.domain --repo my.domain/guestbook
kubebuilder create api --group webapp --version v1 --kind Guestbook

Prepare a K8s Cluster

create a cluster in minikube
minikube start --kubernetes-version=v1.27.10 --image-mirror-country=cn --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --cpus=4 --memory=4g --disk-size=50g --force


you can modify the file ~/projects/guestbook/api/v1/guestbook_types.go

type GuestbookSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Foo is an example field of Guestbook. Edit guestbook_types.go to remove/update
	Foo string `json:"foo,omitempty"`
}

which corresponds to the file ~/projects/guestbook/config/samples/webapp_v1_guestbook.yaml
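The scaffolded sample, once you fill in the spec, looks roughly like this (the foo value is illustrative, and the kubebuilder-generated labels are omitted):

apiVersion: webapp.my.domain/v1
kind: Guestbook
metadata:
  name: guestbook-sample
spec:
  foo: foo-value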

If you are editing the API definitions, generate the manifests such as Custom Resources (CRs) or Custom Resource Definitions (CRDs) using

make manifests

you can modify the file ~/projects/guestbook/internal/controller/guestbook_controller.go

// 	"fmt"
// "k8s.io/apimachinery/pkg/api/errors"
// "k8s.io/apimachinery/pkg/types"
// 	appsv1 "k8s.io/api/apps/v1"
//	corev1 "k8s.io/api/core/v1"
//	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// The context is used to allow cancellation of requests, and potentially things like tracing. 
	_ = log.FromContext(ctx)

	fmt.Printf("I am a controller ->>>>>>")
	fmt.Printf("Name: %s, Namespace: %s", req.Name, req.Namespace)

	guestbook := &webappv1.Guestbook{}
	if err := r.Get(ctx, req.NamespacedName, guestbook); err != nil {
		return ctrl.Result{}, err
	}

	fooString := guestbook.Spec.Foo
	replicas := int32(1)
	fmt.Printf("Foo String: %s", fooString)

	// labels := map[string]string{
	// 	"app": req.Name,
	// }

	// dep := &appsv1.Deployment{
	// 	ObjectMeta: metav1.ObjectMeta{
	// 		Name:      fooString + "-deployment",
	// 		Namespace: req.Namespace,
	// 		Labels:    labels,
	// 	},
	// 	Spec: appsv1.DeploymentSpec{
	// 		Replicas: &replicas,
	// 		Selector: &metav1.LabelSelector{
	// 			MatchLabels: labels,
	// 		},
	// 		Template: corev1.PodTemplateSpec{
	// 			ObjectMeta: metav1.ObjectMeta{
	// 				Labels: labels,
	// 			},
	// 			Spec: corev1.PodSpec{
	// 				Containers: []corev1.Container{{
	// 					Name:  fooString,
	// 					Image: "busybox:latest",
	// 				}},
	// 			},
	// 		},
	// 	},
	// }

	// existingDep := &appsv1.Deployment{}
	// err := r.Get(ctx, types.NamespacedName{Name: dep.Name, Namespace: dep.Namespace}, existingDep)
	// if err != nil {
	// 	if errors.IsNotFound(err) {
	// 		if err := r.Create(ctx, dep); err != nil {
	// 			return ctrl.Result{}, err
	// 		}
	// 	} else {
	// 		return ctrl.Result{}, err
	// 	}
	// }

	return ctrl.Result{}, nil
}

And you can use make run to test your controller.

make run

and use following command to send a request

make sure you have installed the CRDs (make install) before you execute the following command

make install
kubectl apply -k config/samples/

your controller terminal should look like this

I am a controller ->>>>>>Name: guestbook-sample, Namespace: defaultFoo String: foo-value

Install CRDs

check installed crds in k8s

kubectl get crds

install guestbook crd in k8s

cd ~/projects/guestbook
make install

uninstall CRDs

make uninstall

make undeploy

Deploy to cluster

make docker-build IMG=aaron666/guestbook-operator:test
make docker-build docker-push IMG=<some-registry>/<project-name>:tag
make deploy IMG=<some-registry>/<project-name>:tag

Operator-SDK

    Subsections of Proxy

    Daocloud

    1. install container tools

    systemctl stop firewalld && systemctl disable firewalld
    sudo dnf install -y podman
    podman run -d -P m.daocloud.io/docker.io/library/nginx

    KubeVPN

    1.install krew

      1. download and install krew
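    The upstream install snippet looks like this (check the krew docs for the current version of this command):

    (
      set -x; cd "$(mktemp -d)" &&
      OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
      ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
      KREW="krew-${OS}_${ARCH}" &&
      curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
      tar zxvf "${KREW}.tar.gz" &&
      ./"${KREW}" install krew
    )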
      1. Add the $HOME/.krew/bin directory to your PATH environment variable.
    export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
      1. Run kubectl krew to check the installation
    kubectl krew list

    2. Download kubevpn from github

    kubectl krew index add kubevpn https://gitclone.com/github.com/kubenetworks/kubevpn.git
    kubectl krew install kubevpn/kubevpn
    kubectl kubevpn 

    3. Deploy VPN in some cluster

    Using different config to access different cluster and deploy vpn in that k8s.

    kubectl kubevpn connect
    kubectl kubevpn connect --kubeconfig /root/.kube/xxx_config

    Your terminal should look like this:

    ➜  ~ kubectl kubevpn connect
    Password:
    Starting connect
    Getting network CIDR from cluster info...
    Getting network CIDR from CNI...
    Getting network CIDR from services...
    Labeling Namespace default
    Creating ServiceAccount kubevpn-traffic-manager
    Creating Roles kubevpn-traffic-manager
    Creating RoleBinding kubevpn-traffic-manager
    Creating Service kubevpn-traffic-manager
    Creating MutatingWebhookConfiguration kubevpn-traffic-manager
    Creating Deployment kubevpn-traffic-manager
    
    Pod kubevpn-traffic-manager-66d969fd45-9zlbp is Pending
    Container     Reason            Message
    control-plane ContainerCreating
    vpn           ContainerCreating
    webhook       ContainerCreating
    
    Pod kubevpn-traffic-manager-66d969fd45-9zlbp is Running
    Container     Reason           Message
    control-plane ContainerRunning
    vpn           ContainerRunning
    webhook       ContainerRunning
    
    Forwarding port...
    Connected tunnel
    Adding route...
    Configured DNS service
    +----------------------------------------------------------+
    | Now you can access resources in the kubernetes cluster ! |
    +----------------------------------------------------------+

    Already connected to the cluster network; use the command kubectl kubevpn status to check the status.

    ➜  ~ kubectl kubevpn status
    ID Mode Cluster   Kubeconfig                  Namespace            Status      Netif
    0  full ops-dev   /root/.kube/zverse_config   data-and-computing   Connected   utun0

    use pod productpage-788df7ff7f-jpkcs IP 172.29.2.134

    ➜  ~ kubectl get pods -o wide
    NAME                                       AGE     IP                NODE              NOMINATED NODE  GATES
    authors-dbb57d856-mbgqk                    7d23h   172.29.2.132      192.168.0.5       <none>         
    details-7d8b5f6bcf-hcl4t                   61d     172.29.0.77       192.168.104.255   <none>         
    kubevpn-traffic-manager-66d969fd45-9zlbp   74s     172.29.2.136      192.168.0.5       <none>         
    productpage-788df7ff7f-jpkcs               61d     172.29.2.134      192.168.0.5       <none>         
    ratings-77b6cd4499-zvl6c                   61d     172.29.0.86       192.168.104.255   <none>         
    reviews-85c88894d9-vgkxd                   24d     172.29.2.249      192.168.0.5       <none>         

    use ping to test connection, seems good

    ➜  ~ ping 172.29.2.134
    PING 172.29.2.134 (172.29.2.134): 56 data bytes
    64 bytes from 172.29.2.134: icmp_seq=0 ttl=63 time=55.727 ms
    64 bytes from 172.29.2.134: icmp_seq=1 ttl=63 time=56.270 ms
    64 bytes from 172.29.2.134: icmp_seq=2 ttl=63 time=55.228 ms
    64 bytes from 172.29.2.134: icmp_seq=3 ttl=63 time=54.293 ms
    ^C
    --- 172.29.2.134 ping statistics ---
    4 packets transmitted, 4 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 54.293/55.380/56.270/0.728 ms

    use service productpage IP 172.21.10.49

    ➜  ~ kubectl get services -o wide
    NAME                      TYPE        CLUSTER-IP     PORT(S)              SELECTOR
    authors                   ClusterIP   172.21.5.160   9080/TCP             app=authors
    details                   ClusterIP   172.21.6.183   9080/TCP             app=details
    kubernetes                ClusterIP   172.21.0.1     443/TCP              <none>
    kubevpn-traffic-manager   ClusterIP   172.21.2.86    84xxxxxx0/TCP        app=kubevpn-traffic-manager
    productpage               ClusterIP   172.21.10.49   9080/TCP             app=productpage
    ratings                   ClusterIP   172.21.3.247   9080/TCP             app=ratings
    reviews                   ClusterIP   172.21.8.24    9080/TCP             app=reviews

    use command curl to test service connection

    ➜  ~ curl 172.21.10.49:9080
    <!DOCTYPE html>
    <html>
      <head>
        <title>Simple Bookstore App</title>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    seems good too~

    Domain resolve

    a Pod/Service named productpage in the default namespace can be successfully resolved by the following names:

    • productpage
    • productpage.default
    • productpage.default.svc.cluster.local
    ➜  ~ curl productpage.default.svc.cluster.local:9080
    <!DOCTYPE html>
    <html>
      <head>
        <title>Simple Bookstore App</title>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    Short domain resolve

    To access a service in the cluster, you can use its service name or the short domain name, such as productpage

    ➜  ~ curl productpage:9080
    <!DOCTYPE html>
    <html>
      <head>
        <title>Simple Bookstore App</title>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    ...

    Disclaimer: This only works on the namespace where kubevpn-traffic-manager is deployed.

    Subsections of Serverless

    Subsections of Kserve

    Install Kserve

    Preliminary

    • v 1.30 + Kubernetes has installed, if not check 🔗link
    • Helm has installed, if not check 🔗link

    Installation

    Install By

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. Helm binary has installed, if not check 🔗link


    1.install from script directly

    curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.15/hack/quick_install.sh" | bash
    Expected Output

    Installing Gateway API CRDs …

    😀 Successfully installed Istio

    😀 Successfully installed Cert Manager

    😀 Successfully installed Knative

    But you will probably encounter some errors due to the network.

    In that case, you need to reinstall some components:

    export KSERVE_VERSION=v0.15.2
    export deploymentMode=Serverless
    helm upgrade --namespace kserve kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version $KSERVE_VERSION
    helm upgrade --namespace kserve kserve oci://ghcr.io/kserve/charts/kserve --version $KSERVE_VERSION --set-string kserve.controller.deploymentMode="$deploymentMode"
    # helm upgrade knative-operator --namespace knative-serving  https://github.com/knative/operator/releases/download/knative-v1.15.7/knative-operator-v1.15.7.tgz

    Preliminary

    1. If you have only one node in your cluster, you need at least 6 CPUs, 6 GB of memory, and 30 GB of disk storage.


    2. If you have multiple nodes in your cluster, for each node you need at least 2 CPUs, 4 GB of memory, and 20 GB of disk storage.


    1.install knative serving CRD resources

    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-crds.yaml

    2.install knative serving components

    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-core.yaml
    # kubectl apply -f https://raw.githubusercontent.com/AaronYang0628/assets/refs/heads/main/knative/serving/release/download/knative-v1.18.0/serving-core.yaml

    3.install network layer Istio

    kubectl apply -l knative.dev/crd-install=true -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/istio.yaml
    kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/istio.yaml
    kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/net-istio.yaml

    Monitor the Knative components until all of the components show a STATUS of Running or Completed.

    kubectl get pods -n knative-serving
    
    #NAME                                      READY   STATUS    RESTARTS   AGE
    #3scale-kourier-control-54cc54cc58-mmdgq   1/1     Running   0          81s
    #activator-67656dcbbb-8mftq                1/1     Running   0          97s
    #autoscaler-df6856b64-5h4lc                1/1     Running   0          97s
    #controller-788796f49d-4x6pm               1/1     Running   0          97s
    #domain-mapping-65f58c79dc-9cw6d           1/1     Running   0          97s
    #domainmapping-webhook-cc646465c-jnwbz     1/1     Running   0          97s
    #webhook-859796bc7-8n5g2                   1/1     Running   0          96s

    4.install cert manager

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.2/cert-manager.yaml

    5.install kserve

    kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve.yaml
    kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve-cluster-resources.yaml
    Reference

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. ArgoCD has installed, if not check 🔗link


    3. Helm binary has installed, if not check 🔗link


    1.install gateway API CRDs

    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml

    2.install cert manager

    Reference

    following 🔗link to install cert manager

    3.install istio system

    Reference

    following 🔗link to install three istio components (istio-base, istiod, istio-ingressgateway)

    4.install Knative Operator

    kubectl -n argocd apply -f - << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: knative-operator
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://knative.github.io/operator
        chart: knative-operator
        targetRevision: v1.18.1
        helm:
          releaseName: knative-operator
          values: |
            knative_operator:
              knative_operator:
                image: m.daocloud.io/gcr.io/knative-releases/knative.dev/operator/cmd/operator
                tag: v1.18.1
                resources:
                  requests:
                    cpu: 100m
                    memory: 100Mi
                  limits:
                    cpu: 1000m
                    memory: 1000Mi
              operator_webhook:
                image: m.daocloud.io/gcr.io/knative-releases/knative.dev/operator/cmd/webhook
                tag: v1.18.1
                resources:
                  requests:
                    cpu: 100m
                    memory: 100Mi
                  limits:
                    cpu: 500m
                    memory: 500Mi
      destination:
        server: https://kubernetes.default.svc
        namespace: knative-serving
    EOF

    5.sync by argocd

    argocd app sync argocd/knative-operator

    6.install kserve serving CRD

    kubectl apply -f - <<EOF
    apiVersion: operator.knative.dev/v1beta1
    kind: KnativeServing
    metadata:
      name: knative-serving
      namespace: knative-serving
    spec:
      version: 1.18.0 # this is knative serving version
      config:
        domain:
          example.com: ""
    EOF

    7.install kserve CRD

    kubectl -n argocd apply -f - << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: kserve-crd
      annotations:
        argocd.argoproj.io/sync-options: ServerSideApply=true
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
        - ServerSideApply=true
      project: default
      source:
        repoURL: https://aaronyang0628.github.io/helm-chart-mirror/charts
        chart: kserve-crd
        targetRevision: v0.15.2
        helm:
          releaseName: kserve-crd 
      destination:
        server: https://kubernetes.default.svc
        namespace: kserve
    EOF
After the KnativeServing resource is reconciled, the knative-serving namespace should show pods like:

knative-serving    activator-cbf5b6b55-7gw8s                                 Running        116s
    knative-serving    autoscaler-c5d454c88-nxrms                                Running        115s
    knative-serving    autoscaler-hpa-6c966695c6-9ld24                           Running        113s
    knative-serving    cleanup-serving-serving-1.18.0-45nhg                      Completed      113s
    knative-serving    controller-84f96b7676-jjqfp                               Running        115s
    knative-serving    net-istio-controller-574679cd5f-2sf4d                     Running        112s
    knative-serving    net-istio-webhook-85c99487db-mmq7n                        Running        111s
    knative-serving    storage-version-migration-serving-serving-1.18.0-k28vf    Completed      113s
    knative-serving    webhook-75d4fb6db5-qqcwz                                  Running        114s

    8.sync by argocd

    argocd app sync argocd/kserve-crd

    9.install kserve Controller

    kubectl -n argocd apply -f - << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: kserve
      annotations:
        argocd.argoproj.io/sync-options: ServerSideApply=true
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
        - ServerSideApply=true
      project: default
      source:
        repoURL: https://aaronyang0628.github.io/helm-chart-mirror/charts
        chart: kserve
        targetRevision: v0.15.2
        helm:
          releaseName: kserve
          values: |
            kserve:
              agent:
                image: m.daocloud.io/docker.io/kserve/agent
              router:
                image: m.daocloud.io/docker.io/kserve/router
              storage:
                image: m.daocloud.io/docker.io/kserve/storage-initializer
                s3:
                  accessKeyIdName: AWS_ACCESS_KEY_ID
                  secretAccessKeyName: AWS_SECRET_ACCESS_KEY
                  endpoint: ""
                  region: ""
                  verifySSL: ""
                  useVirtualBucket: ""
                  useAnonymousCredential: ""
              controller:
                deploymentMode: "Serverless"
                rbacProxyImage: m.daocloud.io/quay.io/brancz/kube-rbac-proxy:v0.18.0
                rbacProxy:
                  resources:
                    limits:
                      cpu: 100m
                      memory: 300Mi
                    requests:
                      cpu: 100m
                      memory: 300Mi
                gateway:
                  domain: example.com
                image: m.daocloud.io/docker.io/kserve/kserve-controller
                resources:
                  limits:
                    cpu: 100m
                    memory: 300Mi
                  requests:
                    cpu: 100m
                    memory: 300Mi
              servingruntime:
                tensorflow:
                  image: tensorflow/serving
                  tag: 2.6.2
                mlserver:
                  image: m.daocloud.io/docker.io/seldonio/mlserver
                  tag: 1.5.0
                sklearnserver:
                  image: m.daocloud.io/docker.io/kserve/sklearnserver
                xgbserver:
                  image: m.daocloud.io/docker.io/kserve/xgbserver
                huggingfaceserver:
                  image: m.daocloud.io/docker.io/kserve/huggingfaceserver
                  devShm:
                    enabled: false
                    sizeLimit: ""
                  hostIPC:
                    enabled: false
                huggingfaceserver_multinode:
                  shm:
                    enabled: true
                    sizeLimit: "3Gi"
                tritonserver:
                  image: nvcr.io/nvidia/tritonserver
                pmmlserver:
                  image: m.daocloud.io/docker.io/kserve/pmmlserver
                paddleserver:
                  image: m.daocloud.io/docker.io/kserve/paddleserver
                lgbserver:
                  image: m.daocloud.io/docker.io/kserve/lgbserver
                torchserve:
                  image: pytorch/torchserve-kfs
                  tag: 0.9.0
                art:
                  image: m.daocloud.io/docker.io/kserve/art-explainer
              localmodel:
                enabled: false
                controller:
                  image: m.daocloud.io/docker.io/kserve/kserve-localmodel-controller
                jobNamespace: kserve-localmodel-jobs
                agent:
                  hostPath: /mnt/models
                  image: m.daocloud.io/docker.io/kserve/kserve-localmodelnode-agent
              inferenceservice:
                resources:
                  limits:
                    cpu: "1"
                    memory: "2Gi"
                  requests:
                    cpu: "1"
                    memory: "2Gi"
      destination:
        server: https://kubernetes.default.svc
        namespace: kserve
    EOF
If the first sync fails with an error like:

Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": failed to call webhook: Post "https://kserve-webhook-server-service.kserve.svc:443/validate-serving-kserve-io-v1alpha1-clusterservingruntime?timeout=10s": no endpoints available for service "kserve-webhook-server-service"

just wait a while and resync; it will succeed once the webhook service has endpoints.

    10.sync by argocd

    argocd app sync argocd/kserve
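After the sync finishes, the KServe controller should be running in the kserve namespace (a minimal check; the exact pod names depend on the chart version):

kubectl -n kserve get pods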

    11.install kserve eventing CRD

    kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.18.1/eventing-crds.yaml

    12.install kserve eventing

    kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.18.1/eventing-core.yaml
    knative-eventing   eventing-controller-cc45869cd-fmhg8        1/1     Running       0          3m33s
    knative-eventing   eventing-webhook-67fcc6959b-lktxd          1/1     Running       0          3m33s
    knative-eventing   job-sink-7f5d754db-tbf2z                   1/1     Running       0          3m33s

    FAQ


    Subsections of Serving

    Subsections of Inference

    First Pytorch ISVC

    Mnist Inference

    More Information about mnist service can be found 🔗link

    1. create a namespace
    kubectl create namespace kserve-test
1. deploy a sample torchserve mnist service
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "first-torchserve"
      namespace: kserve-test
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
          resources:
            limits:
              memory: 4Gi
    EOF
    1. Check InferenceService status
    kubectl -n kserve-test get inferenceservices first-torchserve 
    kubectl -n kserve-test get pod
    #NAME                                           READY   STATUS    RESTARTS   AGE
    #first-torchserve-predictor-00001-deplo...      2/2     Running   0          25s
    
    kubectl -n kserve-test get inferenceservices first-torchserve
#NAME               URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
#first-torchserve   http://first-torchserve.kserve-test.example.com   True           100                              first-torchserve-predictor-00001   2m59s

    After all pods are ready, you can access the service by using the following command

    Access By

    If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway.

    export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')

If the EXTERNAL-IP value is none (or perpetually pending), your environment does not provide an external load balancer for the ingress gateway. In this case, you can access the gateway using the service's node port:

export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')

Alternatively, port-forward the ingress gateway and use the forwarded port:

export INGRESS_HOST=$(minikube ip)
kubectl port-forward --namespace istio-system svc/istio-ingressgateway 30080:80
export INGRESS_PORT=30080
1. Perform a prediction

First, prepare your inference input request inside a file:
    wget -O ./mnist-input.json https://raw.githubusercontent.com/kserve/kserve/refs/heads/master/docs/samples/v1beta1/torchserve/v1/imgconv/input.json
    ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip) -L "*:${INGRESS_PORT}:0.0.0.0:${INGRESS_PORT}" -N -f
    1. Invoke the service
    SERVICE_HOSTNAME=$(kubectl -n kserve-test get inferenceservice first-torchserve  -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    # http://first-torchserve.kserve-test.example.com 
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist:predict" -d @./mnist-input.json
    *   Trying 192.168.58.2...
    * TCP_NODELAY set
    * Connected to 192.168.58.2 (192.168.58.2) port 32132 (#0)
    > POST /v1/models/mnist:predict HTTP/1.1
    > Host: my-torchserve.kserve-test.example.com
    > User-Agent: curl/7.61.1
    > Accept: */*
    > Content-Type: application/json
    > Content-Length: 401
    > 
    * upload completely sent off: 401 out of 401 bytes
    < HTTP/1.1 200 OK
    < content-length: 19
    < content-type: application/json
    < date: Mon, 09 Jun 2025 09:27:27 GMT
    < server: istio-envoy
    < x-envoy-upstream-service-time: 1128
    < 
    * Connection #0 to host 192.168.58.2 left intact
    {"predictions":[2]}

    First Custom Model

    AlexNet Inference

    More Information about AlexNet service can be found 🔗link

    1. Implement Custom Model using KServe API
import argparse
import base64
import io
import time

from fastapi.middleware.cors import CORSMiddleware
from torchvision import models, transforms
from typing import Dict
import torch
from PIL import Image

import kserve
from kserve import Model, ModelServer, logging
from kserve.model_server import app
from kserve.utils.utils import generate_uuid


class AlexNetModel(Model):
    def __init__(self, name: str):
        super().__init__(name, return_response_headers=True)
        self.name = name
        self.load()
        self.ready = False

    def load(self):
        self.model = models.alexnet(pretrained=True)
        self.model.eval()
        # The ready flag is used by model ready endpoint for readiness probes,
        # set to True when model is loaded successfully without exceptions.
        self.ready = True

    async def predict(
        self,
        payload: Dict,
        headers: Dict[str, str] = None,
        response_headers: Dict[str, str] = None,
    ) -> Dict:
        start = time.time()
        # Input follows the Tensorflow V1 HTTP API for binary values
        # https://www.tensorflow.org/tfx/serving/api_rest#encoding_binary_values
        img_data = payload["instances"][0]["image"]["b64"]
        raw_img_data = base64.b64decode(img_data)
        input_image = Image.open(io.BytesIO(raw_img_data))
        preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])
        input_tensor = preprocess(input_image).unsqueeze(0)
        output = self.model(input_tensor)
        torch.nn.functional.softmax(output, dim=1)
        values, top_5 = torch.topk(output, 5)
        result = values.flatten().tolist()
        end = time.time()
        response_id = generate_uuid()

        # Custom response headers can be added to the inference response
        if response_headers is not None:
            response_headers.update(
                {"prediction-time-latency": f"{round((end - start) * 1000, 9)}"}
            )

        return {"predictions": result}


parser = argparse.ArgumentParser(parents=[kserve.model_server.parser])
args, _ = parser.parse_known_args()

if __name__ == "__main__":
    # Configure kserve and uvicorn logger
    if args.configure_logging:
        logging.configure_logging(args.log_config_file)
    model = AlexNetModel(args.model_name)
    model.load()
    # Custom middlewares can be added to the model
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
    ModelServer().start([model])
    1. create requirements.txt
    kserve
    torchvision==0.18.0
    pillow>=10.3.0,<11.0.0
    1. create Dockerfile
    FROM m.daocloud.io/docker.io/library/python:3.11-slim
    
    WORKDIR /app
    
    COPY requirements.txt .
    RUN pip install --no-cache-dir  -r requirements.txt 
    
    COPY model.py .
    
    CMD ["python", "model.py", "--model_name=custom-model"]
    1. build and push custom docker image
    docker build -t ay-custom-model .
docker tag ay-custom-model docker-registry.lab.zverse.space/ay/ay-custom-model:latest
    docker push docker-registry.lab.zverse.space/ay/ay-custom-model:latest
    1. create a namespace
    kubectl create namespace kserve-test
    1. deploy a sample custom-model service
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: ay-custom-model
    spec:
      predictor:
        containers:
          - name: kserve-container
            image: docker-registry.lab.zverse.space/ay/ay-custom-model:latest
    EOF
    1. Check InferenceService status
    kubectl -n kserve-test get inferenceservices ay-custom-model
    kubectl -n kserve-test get pod
    #NAME                                           READY   STATUS    RESTARTS   AGE
    #ay-custom-model-predictor-00003-dcf4rk         2/2     Running   0        167m
    
    kubectl -n kserve-test get inferenceservices ay-custom-model
    #NAME           URL   READY     PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
    #ay-custom-model   http://ay-custom-model.kserve-test.example.com   True           100                              ay-custom-model-predictor-00003   177m

    After all pods are ready, you can access the service by using the following command

    Access By

    If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway.

    export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')

If the EXTERNAL-IP value is none (or perpetually pending), your environment does not provide an external load balancer for the ingress gateway. In this case, you can access the gateway using the service's node port:

export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')

Alternatively, port-forward the ingress gateway and use the forwarded port:

export INGRESS_HOST=$(minikube ip)
kubectl port-forward --namespace istio-system svc/istio-ingressgateway 30080:80
export INGRESS_PORT=30080
    1. Perform a prediction

    First, prepare your inference input request inside a file:

    wget -O ./alex-net-input.json https://kserve.github.io/website/0.15/modelserving/v1beta1/custom/custom_model/input.json
    ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip) -L "*:${INGRESS_PORT}:0.0.0.0:${INGRESS_PORT}" -N -f
    1. Invoke the service
    export SERVICE_HOSTNAME=$(kubectl -n kserve-test get inferenceservice ay-custom-model  -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    # http://ay-custom-model.kserve-test.example.com
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -X POST "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/custom-model:predict" -d @.//alex-net-input.json
    *   Trying 192.168.58.2:30704...
    * Connected to 192.168.58.2 (192.168.58.2) port 30704
    > POST /v1/models/custom-model:predict HTTP/1.1
    > Host: ay-custom-model.kserve-test.example.com
    > User-Agent: curl/8.5.0
    > Accept: */*
    > Content-Type: application/json
    > Content-Length: 105339
    > 
    * We are completely uploaded and fine
    < HTTP/1.1 200 OK
    < content-length: 110
    < content-type: application/json
    < date: Wed, 11 Jun 2025 03:38:30 GMT
    < prediction-time-latency: 89.966773987
    < server: istio-envoy
    < x-envoy-upstream-service-time: 93
    < 
    * Connection #0 to host 192.168.58.2 left intact
    {"predictions":[14.975619316101074,14.0368070602417,13.966034889221191,12.252280235290527,12.086270332336426]}

    First Model In Minio

    Inference Model In Minio

    More Information about Deploy InferenceService with a saved model on S3 can be found 🔗link

    Create Service Account

Create create-s3-sa.yaml:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sa
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3access # replace with your IAM role ARN
        serving.kserve.io/s3-endpoint: s3.amazonaws.com # replace with your s3 endpoint e.g minio-service.kubeflow:9000
        serving.kserve.io/s3-usehttps: "1" # by default 1, if testing with minio you can set to 0
        serving.kserve.io/s3-region: "us-east-2"
        serving.kserve.io/s3-useanoncredential: "false" # omitting this is the same as false, if true will ignore provided credential and use anonymous credentials

Apply it:

    kubectl apply -f create-s3-sa.yaml

    Create S3 Secret and attach to Service Account

    Create a secret with your S3 user credential, KServe reads the secret annotations to inject the S3 environment variables on storage initializer or model agent to download the models from S3 storage.

    Create S3 secret

Create create-s3-secret.yaml:

    apiVersion: v1
    kind: Secret
    metadata:
      name: s3creds
      annotations:
         serving.kserve.io/s3-endpoint: s3.amazonaws.com # replace with your s3 endpoint e.g minio-service.kubeflow:9000
         serving.kserve.io/s3-usehttps: "1" # by default 1, if testing with minio you can set to 0
         serving.kserve.io/s3-region: "us-east-2"
         serving.kserve.io/s3-useanoncredential: "false" # omitting this is the same as false, if true will ignore provided credential and use anonymous credentials
    type: Opaque
    stringData: # use `stringData` for raw credential string or `data` for base64 encoded string
      AWS_ACCESS_KEY_ID: XXXX
      AWS_SECRET_ACCESS_KEY: XXXXXXXX

    Attach secret to a service account


    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sa
    secrets:
    - name: s3creds

Apply it:

    kubectl apply -f create-s3-secret.yaml

Note: If you are running KServe with Istio sidecars enabled, there can be a race condition between the Istio proxy becoming ready and the agent pulling models. This results in a tcp dial connection refused error when the agent tries to download from S3.

To resolve it, Istio allows blocking the other containers in a pod until the proxy container is ready. You can enable this by setting `proxy.holdApplicationUntilProxyStarts: true` in the `istio-sidecar-injector` configmap. The `proxy.holdApplicationUntilProxyStarts` flag was introduced in Istio 1.7 as an experimental feature and is turned off by default.
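For reference, a minimal sketch of the equivalent mesh-wide setting via Istio's mesh config (this assumes a default installation whose mesh config lives in the istio configmap in istio-system; the per-pod annotation proxy.istio.io/config is another option):

# fragment to merge under the "mesh" key of the istio configmap
defaultConfig:
  holdApplicationUntilProxyStarts: true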
    

    Deploy the model on S3 with InferenceService

    Create the InferenceService with the s3 storageUri and the service account with s3 credential attached.

New Schema:

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "mnist-s3"
    spec:
      predictor:
        serviceAccountName: sa
        model:
          modelFormat:
            name: tensorflow
          storageUri: "s3://kserve-examples/mnist"
    ```
    

Old Schema:

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "mnist-s3"
    spec:
      predictor:
        serviceAccountName: sa
        tensorflow:
          storageUri: "s3://kserve-examples/mnist"
    ```
    

Apply the mnist-s3.yaml to create the InferenceService:

kubectl apply -f mnist-s3.yaml

    Run a prediction

Now the ingress can be accessed at ${INGRESS_HOST}:${INGRESS_PORT}; follow the Access By steps earlier in this guide to find out the ingress IP and port.
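For example, on a minikube/NodePort setup like the earlier sections (adjust to your own ingress):

export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')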

    SERVICE_HOSTNAME=$(kubectl get inferenceservice mnist-s3 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    
    MODEL_NAME=mnist-s3
    INPUT_PATH=@./input.json
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH

Expected Output:

```bash
    Note: Unnecessary use of -X or --request, POST is already inferred.
    *   Trying 35.237.217.209...
    * TCP_NODELAY set
    * Connected to mnist-s3.default.35.237.217.209.xip.io (35.237.217.209) port 80 (#0)
    > POST /v1/models/mnist-s3:predict HTTP/1.1
    > Host: mnist-s3.default.35.237.217.209.xip.io
    > User-Agent: curl/7.55.1
    > Accept: */*
    > Content-Length: 2052
    > Content-Type: application/x-www-form-urlencoded
    > Expect: 100-continue
    >
    < HTTP/1.1 100 Continue
    * We are completely uploaded and fine
    < HTTP/1.1 200 OK
    < content-length: 251
    < content-type: application/json
    < date: Sun, 04 Apr 2021 20:06:27 GMT
    < x-envoy-upstream-service-time: 5
    < server: istio-envoy
    <
    * Connection #0 to host mnist-s3.default.35.237.217.209.xip.io left intact
    {
        "predictions": [
            {
                "predictions": [0.327352405, 2.00153053e-07, 0.0113353515, 0.203903764, 3.62863029e-05, 0.416683704, 0.000281196437, 8.36911859e-05, 0.0403052084, 1.82206513e-05],
                "classes": 5
            }
        ]
    }
    ```
    

    Kafka Sink Transformer

    AlexNet Inference

    More Information about Custom Transformer service can be found 🔗link

    1. Implement Custom Transformer ./model.py using Kserve API
import os
import argparse
import json

from typing import Dict, Union
from kafka import KafkaProducer
from cloudevents.http import CloudEvent
from cloudevents.conversion import to_structured

from kserve import (
    Model,
    ModelServer,
    model_server,
    logging,
    InferRequest,
    InferResponse,
)

from kserve.logging import logger
from kserve.utils.utils import generate_uuid

kafka_producer = KafkaProducer(
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    bootstrap_servers=os.environ.get('KAFKA_BOOTSTRAP_SERVERS', 'localhost:9092')
)

class ImageTransformer(Model):
    def __init__(self, name: str):
        super().__init__(name, return_response_headers=True)
        self.ready = True


    def preprocess(
        self, payload: Union[Dict, InferRequest], headers: Dict[str, str] = None
    ) -> Union[Dict, InferRequest]:
        logger.info("Received inputs %s", payload)
        logger.info("Received headers %s", headers)
        self.request_trace_key = os.environ.get('REQUEST_TRACE_KEY', 'algo.trace.requestId')
        if self.request_trace_key not in payload:
            logger.error("Request trace key '%s' not found in payload, you cannot trace the prediction result", self.request_trace_key)
            if "instances" not in payload:
                raise ValueError(
                    f"Request trace key '{self.request_trace_key}' not found in payload and 'instances' key is missing."
                )
        else:
            headers[self.request_trace_key] = payload.get(self.request_trace_key)

        return {"instances": payload["instances"]}

    def postprocess(
        self,
        infer_response: Union[Dict, InferResponse],
        headers: Dict[str, str] = None,
        response_headers: Dict[str, str] = None,
    ) -> Union[Dict, InferResponse]:
        logger.info("postprocess headers: %s", headers)
        logger.info("postprocess response headers: %s", response_headers)
        logger.info("postprocess response: %s", infer_response)

        attributes = {
            "source": "data-and-computing/kafka-sink-transformer",
            "type": "org.zhejianglab.zverse.data-and-computing.kafka-sink-transformer",
            "request-host": headers.get('host', 'unknown'),
            "kserve-isvc-name": headers.get('kserve-isvc-name', 'unknown'),
            "kserve-isvc-namespace": headers.get('kserve-isvc-namespace', 'unknown'),
            self.request_trace_key: headers.get(self.request_trace_key, 'unknown'),
        }

        _, cloudevent = to_structured(CloudEvent(attributes, infer_response))
        try:
            kafka_producer.send(os.environ.get('KAFKA_TOPIC', 'test-topic'), value=cloudevent.decode('utf-8').replace("'", '"'))
            kafka_producer.flush()
        except Exception as e:
            logger.error("Failed to send message to Kafka: %s", e)
        return infer_response

parser = argparse.ArgumentParser(parents=[model_server.parser])
args, _ = parser.parse_known_args()

if __name__ == "__main__":
    if args.configure_logging:
        logging.configure_logging(args.log_config_file)
    logging.logger.info("available model name: %s", args.model_name)
    logging.logger.info("all args: %s", args.model_name)
    model = ImageTransformer(args.model_name)
    ModelServer().start([model])
    1. modify ./pyproject.toml
    [tool.poetry]
    name = "custom_transformer"
    version = "0.15.2"
    description = "Custom Transformer Examples. Not intended for use outside KServe Frameworks Images."
    authors = ["Dan Sun <dsun20@bloomberg.net>"]
    license = "Apache-2.0"
    packages = [
        { include = "*.py" }
    ]
    
    [tool.poetry.dependencies]
    python = ">=3.9,<3.13"
    kserve = {path = "../kserve", develop = true}
    pillow = "^10.3.0"
    kafka-python = "^2.2.15"
    cloudevents = "^1.11.1"
    
    [[tool.poetry.source]]
    name = "pytorch"
    url = "https://download.pytorch.org/whl/cpu"
    priority = "explicit"
    
    [tool.poetry.group.test]
    optional = true
    
    [tool.poetry.group.test.dependencies]
    pytest = "^7.4.4"
    mypy = "^0.991"
    
    [tool.poetry.group.dev]
    optional = true
    
    [tool.poetry.group.dev.dependencies]
    black = { version = "~24.3.0", extras = ["colorama"] }
    
    [tool.poetry-version-plugin]
    source = "file"
    file_path = "../VERSION"
    
    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"
    1. prepare ../custom_transformer.Dockerfile
    ARG PYTHON_VERSION=3.11
    ARG BASE_IMAGE=python:${PYTHON_VERSION}-slim-bookworm
    ARG VENV_PATH=/prod_venv
    
    FROM ${BASE_IMAGE} AS builder
    
    # Install Poetry
    ARG POETRY_HOME=/opt/poetry
    ARG POETRY_VERSION=1.8.3
    
    RUN python3 -m venv ${POETRY_HOME} && ${POETRY_HOME}/bin/pip install poetry==${POETRY_VERSION}
    ENV PATH="$PATH:${POETRY_HOME}/bin"
    
    # Activate virtual env
    ARG VENV_PATH
    ENV VIRTUAL_ENV=${VENV_PATH}
    RUN python3 -m venv $VIRTUAL_ENV
    ENV PATH="$VIRTUAL_ENV/bin:$PATH"
    
    COPY kserve/pyproject.toml kserve/poetry.lock kserve/
    RUN cd kserve && poetry install --no-root --no-interaction --no-cache
    COPY kserve kserve
    RUN cd kserve && poetry install --no-interaction --no-cache
    
    COPY custom_transformer/pyproject.toml custom_transformer/poetry.lock custom_transformer/
    RUN cd custom_transformer && poetry install --no-root --no-interaction --no-cache
    COPY custom_transformer custom_transformer
    RUN cd custom_transformer && poetry install --no-interaction --no-cache
    
    
    FROM ${BASE_IMAGE} AS prod
    
    COPY third_party third_party
    
    # Activate virtual env
    ARG VENV_PATH
    ENV VIRTUAL_ENV=${VENV_PATH}
    ENV PATH="$VIRTUAL_ENV/bin:$PATH"
    
    RUN useradd kserve -m -u 1000 -d /home/kserve
    
    COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
    COPY --from=builder kserve kserve
    COPY --from=builder custom_transformer custom_transformer
    
    USER 1000
    ENTRYPOINT ["python", "-m", "custom_transformer.model"]
    1. regenerate poetry.lock
    poetry lock --no-update
    1. build and push custom docker image
    cd python
    podman build -t docker-registry.lab.zverse.space/data-and-computing/ay-dev/msg-transformer:dev9 -f custom_transformer.Dockerfile .
    
    podman push docker-registry.lab.zverse.space/data-and-computing/ay-dev/msg-transformer:dev9

    Subsections of Generative

    First Generative Service

A KServe InferenceService builds on Knative Serving (autoscaling and canary releases), Istio (traffic management and security), and a storage backend (S3/GCS/PVC).

Deploy an inference service with a single YAML

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      namespace: kserve-test
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          resources: {}
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

Check the InferenceService, find the ingress, and run a prediction:

    kubectl -n kserve-test get inferenceservices sklearn-iris 
    kubectl -n istio-system get svc istio-ingressgateway 
    export INGRESS_HOST=$(minikube ip)
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
    SERVICE_HOSTNAME=$(kubectl -n kserve-test get inferenceservice sklearn-iris  -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    # http://sklearn-iris.kserve-test.example.com 
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    How to deploy your own ML model

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: huggingface-llama3
      namespace: kserve-test
      annotations:
        serving.kserve.io/deploymentMode: RawDeployment
        serving.kserve.io/autoscalerClass: none
    spec:
      predictor:
        model:
          modelFormat:
            name: huggingface
          storageUri: pvc://llama-3-8b-pvc/hf/8b_instruction_tuned
        workerSpec:
          pipelineParallelSize: 2
          tensorParallelSize: 1
      containers:
        - name: worker-container
          resources:
            requests:
              nvidia.com/gpu: "8"

    check https://kserve.github.io/website/0.15/modelserving/v1beta1/llm/huggingface/multi-node/#workerspec-and-servingruntime

    Canary Policy

    KServe supports canary rollouts for inference services. Canary rollouts allow for a new version of an InferenceService to receive a percentage of traffic. Kserve supports a configurable canary rollout strategy with multiple steps. The rollout strategy can also be implemented to rollback to the previous revision if a rollout step fails.

    KServe automatically tracks the last good revision that was rolled out with 100% traffic. The canaryTrafficPercent field in the component’s spec needs to be set with the percentage of traffic that should be routed to the new revision. KServe will then automatically split the traffic between the last good revision and the revision that is currently being rolled out according to the canaryTrafficPercent value.

When the first revision of an InferenceService is deployed, it receives 100% of the traffic. When multiple revisions are deployed, as in step 2, and the canary rollout strategy is configured to route 10% of the traffic to the new revision, 90% of the traffic goes to the LatestRolledoutRevision. If an unhealthy or bad revision is applied, traffic is not routed to it. In step 3, the rollout strategy promotes the LatestReadyRevision from step 2 to the LatestRolledoutRevision. Since it is now promoted, the LatestRolledoutRevision gets 100% of the traffic and is fully rolled out. If a rollback needs to happen, 100% of the traffic is pinned to the previous healthy/good revision, the PreviousRolledoutRevision.
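At any point during a rollout you can inspect the current traffic split directly from the InferenceService status (the same field is shown in full in the tag-routing example later in this chapter):

kubectl -n kserve-test get isvc sklearn-iris -o jsonpath="{.status.components.predictor.traffic}"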

Figures: Canary rollout strategy, steps 1-2 and step 3.

    Reference

    For more information, see Canary Rollout.

    Subsections of Canary Policy

    Rollout Example

    Create the InferenceService

    Follow the First Inference Service tutorial. Set up a namespace kserve-test and create an InferenceService.

    After rolling out the first model, 100% traffic goes to the initial model with service revision 1.

    kubectl -n kserve-test get isvc sklearn-iris
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION             AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor--00001   46s

    Apply Canary Rollout Strategy

    • Add the canaryTrafficPercent field to the predictor component
    • Update the storageUri to use a new/updated model.
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      namespace: kserve-test
    spec:
      predictor:
        canaryTrafficPercent: 10
        model:
          args: ["--enable_docs_url=True"]
          modelFormat:
            name: sklearn
          resources: {}
          runtime: kserve-sklearnserver
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
    EOF

    After rolling out the canary model, traffic is split between the latest ready revision 2 and the previously rolled out revision 1.

    kubectl -n kserve-test get isvc sklearn-iris
    NAME       URL              READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
    sklearn-iris   http://sklearn-iris.kserve-test.example.com   True    90     10       sklearn-iris-predictor-00002   sklearn-iris-predictor-00003   19h

Check the running pods; you should now see two pods running, one for the old model and one for the new model, with 10% of the traffic routed to the new model. Note that in this example the old revision's pod contains 00002 in its name, while the new revision's pod contains 00003.

    kubectl get pods 
    
    NAME                                                        READY   STATUS    RESTARTS   AGE
    sklearn-iris-predictor-00002-deployment-c7bb6c685-ktk7r     2/2     Running   0          71m
    sklearn-iris-predictor-00003-deployment-8498d947-fpzcg      2/2     Running   0          20m

    Run a prediction

    Follow the next two steps (Determine the ingress IP and ports and Perform inference) in the First Inference Service tutorial.

    Send more requests to the InferenceService to observe the 10% of traffic that routes to the new revision.

    Promote the canary model

    If the canary model is healthy/passes your tests,

    you can promote it by removing the canaryTrafficPercent field and re-applying the InferenceService custom resource with the same name sklearn-iris

    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      namespace: kserve-test
    spec:
      predictor:
        model:
          args: ["--enable_docs_url=True"]
          modelFormat:
            name: sklearn
          resources: {}
          runtime: kserve-sklearnserver
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
    EOF

    Now all traffic goes to the revision 2 for the new model.

    kubectl get isvc sklearn-iris
    NAME       URL                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
    sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-00002   17m

The pods for revision generation 1 automatically scale down to 0 as they no longer receive traffic.

    kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris
    NAME                                                           READY   STATUS        RESTARTS   AGE
    sklearn-iris-predictor-00001-deployment-66c5f5b8d5-gmfvj   1/2     Terminating   0          17m
    sklearn-iris-predictor-00002-deployment-5bd9ff46f8-shtzd   2/2     Running       0          15m

    Rollback and pin the previous model

    You can pin the previous model (model v1, for example) by setting the canaryTrafficPercent to 0 for the current model (model v2, for example). This rolls back from model v2 to model v1 and decreases model v2’s traffic to zero.

    Apply the custom resource to set model v2’s traffic to 0%.

    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        canaryTrafficPercent: 0
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
    EOF

    Check the traffic split, now 100% traffic goes to the previous good model (model v1) for revision generation 1.

    kubectl get isvc sklearn-iris
    NAME       URL                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION              LATESTREADYREVISION                AGE
    sklearn-iris   http://sklearn-iris.kserve-test.example.com   True    100    0        sklearn-iris-predictor-00002   sklearn-iris-predictor-00003   18m

The previous revision (model v1) now receives 100% of the traffic, while the new model (model v2) receives 0%.

    kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris
    
    NAME                                                       READY   STATUS        RESTARTS   AGE
    sklearn-iris-predictor-00002-deployment-66c5f5b8d5-gmfvj   1/2     Running       0          35s
    sklearn-iris-predictor-00003-deployment-5bd9ff46f8-shtzd   2/2     Running       0          16m

    Route traffic using a tag

    You can enable tag based routing by adding the annotation serving.kserve.io/enable-tag-routing, so traffic can be explicitly routed to the canary model (model v2) or the old model (model v1) via a tag in the request URL.

    Apply model v2 with canaryTrafficPercent: 10 and serving.kserve.io/enable-tag-routing: "true".

    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      annotations:
        serving.kserve.io/enable-tag-routing: "true"
    spec:
      predictor:
        canaryTrafficPercent: 10
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
    EOF

    Check the InferenceService status to get the canary and previous model URL.

    kubectl get isvc sklearn-iris -ojsonpath="{.status.components.predictor}"  | jq

    The output should look like

    {
        "address": {
        "url": "http://sklearn-iris-predictor-.kserve-test.svc.cluster.local"
        },
        "latestCreatedRevision": "sklearn-iris-predictor--00003",
        "latestReadyRevision": "sklearn-iris-predictor--00003",
        "latestRolledoutRevision": "sklearn-iris-predictor--00001",
        "previousRolledoutRevision": "sklearn-iris-predictor--00001",
        "traffic": [
        {
            "latestRevision": true,
            "percent": 10,
            "revisionName": "sklearn-iris-predictor--00003",
            "tag": "latest",
            "url": "http://latest-sklearn-iris-predictor-.kserve-test.example.com"
        },
        {
            "latestRevision": false,
            "percent": 90,
            "revisionName": "sklearn-iris-predictor--00001",
            "tag": "prev",
            "url": "http://prev-sklearn-iris-predictor-.kserve-test.example.com"
        }
        ],
        "url": "http://sklearn-iris-predictor-.kserve-test.example.com"
    }

    Since we updated the annotation on the InferenceService, model v2 now corresponds to sklearn-iris-predictor--00003.

    You can now send the request explicitly to the new model or the previous model by using the tag in the request URL. Use the curl command from Perform inference and add latest- or prev- to the model name to send a tag based request.

    For example, set the model name and use the following commands to send traffic to each service based on the latest or prev tag.

    curl the latest revision

    MODEL_NAME=sklearn-iris
    curl -v -H "Host: latest-${MODEL_NAME}-predictor-.kserve-test.example.com" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d @./iris-input.json

    or curl the previous revision

    curl -v -H "Host: prev-${MODEL_NAME}-predictor-.kserve-test.example.com" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d @./iris-input.json

    Auto Scaling

    Soft Limit

You can configure an InferenceService with the annotation autoscaling.knative.dev/target for a soft limit. The soft limit is a targeted limit rather than a strictly enforced bound; in particular, if there is a sudden burst of requests, this value can be exceeded.

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      namespace: kserve-test
      annotations:
        autoscaling.knative.dev/target: "5"
    spec:
      predictor:
        model:
          args: ["--enable_docs_url=True"]
          modelFormat:
            name: sklearn
          resources: {}
          runtime: kserve-sklearnserver
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

    Hard Limit

You can also configure an InferenceService with the containerConcurrency field for a hard limit. The hard limit is an enforced upper bound: if concurrency reaches the hard limit, surplus requests are buffered and must wait until enough capacity is free to execute them.

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      namespace: kserve-test
    spec:
      predictor:
        containerConcurrency: 5
        model:
          args: ["--enable_docs_url=True"]
          modelFormat:
            name: sklearn
          resources: {}
          runtime: kserve-sklearnserver
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

    Scale with QPS

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      namespace: kserve-test
    spec:
      predictor:
        scaleTarget: 1
        scaleMetric: qps
        model:
          args: ["--enable_docs_url=True"]
          modelFormat:
            name: sklearn
          resources: {}
          runtime: kserve-sklearnserver
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

    Scale with GPU

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "flowers-sample-gpu"
      namespace: kserve-test
    spec:
      predictor:
        scaleTarget: 1
        scaleMetric: concurrency
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
          runtimeVersion: "2.6.2-gpu"
          resources:
            limits:
              nvidia.com/gpu: 1

    Enable Scale To Zero

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
      namespace: kserve-test
    spec:
      predictor:
        minReplicas: 0
        model:
          args: ["--enable_docs_url=True"]
          modelFormat:
            name: sklearn
          resources: {}
          runtime: kserve-sklearnserver
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

    Prepare Concurrent Requests Container

    # export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
    podman run --rm \
          -v /root/kserve/iris-input.json:/tmp/iris-input.json \
          --privileged \
          -e INGRESS_HOST=$(minikube ip) \
          -e INGRESS_PORT=32132 \
          -e MODEL_NAME=sklearn-iris \
          -e INPUT_PATH=/tmp/iris-input.json \
          -e SERVICE_HOSTNAME=sklearn-iris.kserve-test.example.com \
          -it m.daocloud.io/docker.io/library/golang:1.22  bash -c "go install github.com/rakyll/hey@latest; bash"

    Fire

Send traffic for 30 seconds, maintaining 100 concurrent in-flight requests (hey's -c flag).

    hey -z 30s -c 100 -m POST -host ${SERVICE_HOSTNAME} -D $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
    Summary:
      Total:        30.1390 secs
      Slowest:      0.5015 secs
      Fastest:      0.0252 secs
      Average:      0.1451 secs
      Requests/sec: 687.3483
      
      Total data:   4371076 bytes
      Size/request: 211 bytes
    
    Response time histogram:
      0.025 [1]     |
      0.073 [14]    |
      0.120 [33]    |
      0.168 [19363] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.216 [1171]  |■■
      0.263 [28]    |
      0.311 [6]     |
      0.359 [0]     |
      0.406 [0]     |
      0.454 [0]     |
      0.502 [100]   |
    
    
    Latency distribution:
      10% in 0.1341 secs
      25% in 0.1363 secs
      50% in 0.1388 secs
      75% in 0.1462 secs
      90% in 0.1587 secs
      95% in 0.1754 secs
      99% in 0.1968 secs
    
    Details (average, fastest, slowest):
      DNS+dialup:   0.0000 secs, 0.0252 secs, 0.5015 secs
      DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0000 secs
      req write:    0.0000 secs, 0.0000 secs, 0.0005 secs
      resp wait:    0.1451 secs, 0.0251 secs, 0.5015 secs
      resp read:    0.0000 secs, 0.0000 secs, 0.0003 secs
    
    Status code distribution:
      [500] 20716 responses
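While hey is running, you can watch the autoscaler add (and later remove) predictor pods from a second terminal, using the label KServe puts on the pods:

kubectl -n kserve-test get pods -l serving.kserve.io/inferenceservice=sklearn-iris -w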

    Reference

    For more information, please refer to the KPA documentation.

    Subsections of Knative

    Subsections of Eventing

    Broker

The Knative Broker is the core component of the Knative Eventing system. Its main job is to act as the hub for event routing and distribution, providing decoupled, reliable event transport between event producers (event sources) and event consumers (services).

The key roles of a Knative Broker in detail:

Event ingestion point:

The Broker is the entry point where event streams converge. Event sources of all kinds (Kafka topics, HTTP sources, Cloud Pub/Sub, GitHub webhooks, timers, custom sources, and so on) send their events to the Broker.

Event producers only need to know the Broker's address; they do not need to care which consumers exist or where they run.

Event storage and buffering:

A Broker is usually backed by a persistent messaging system (such as Apache Kafka, Google Cloud Pub/Sub, RabbitMQ, NATS Streaming, or the in-memory InMemoryChannel). This provides:

Persistence: events are not lost before consumers process them (depending on the underlying channel implementation).

Buffering: when consumers are temporarily unavailable or cannot keep up with the event rate, the Broker buffers events instead of losing them or overwhelming producers and consumers.

Retry: if a consumer fails to process an event, the Broker can redeliver it (usually combined with the retry policy on the Trigger/Subscription).

Decoupling event sources from event consumers:

This is one of the Broker's most important roles. Event sources only send events to the Broker and have no knowledge of which services will consume them.

Event consumers declare which events they are interested in by creating a Trigger against the Broker. Consumers only need to know the Broker exists; they do not need to know which concrete source produced an event.

This decoupling greatly improves flexibility and maintainability:

Independent evolution: sources and consumers can be added, removed, or changed independently, as long as they follow the Broker's contract.

Dynamic routing: events are routed to different consumers based on event attributes (such as type and source), without changing producer or consumer code.

Multicast: the same event can be consumed by multiple consumers at once (one event -> Broker -> multiple matching Triggers -> multiple services).

Event filtering and routing (via Triggers):

The Broker itself does not implement complex filtering logic; filtering and routing are handled by Trigger resources.

A Trigger is bound to a specific Broker.

A Trigger defines:

Subscriber: the address of the target service (a Knative Service, Kubernetes Service, Channel, and so on).

Filter: a condition on event attributes (mainly type and source, plus other extension attributes). Only events matching the filter are routed by the Broker through that Trigger to its subscriber.

When the Broker receives an event, it evaluates the filters of all Triggers bound to it. For every matching Trigger, the Broker delivers the event to that Trigger's subscriber.
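As a concrete sketch of attribute-based filtering (the names reuse the Display Broker Message example later in this chapter), a Trigger that only forwards CloudEvents with a given type and source could look like:

kubectl apply -f - <<EOF
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: filtered-display-trigger
  namespace: kserve-test
spec:
  broker: first-broker
  filter:
    attributes:
      type: test.type      # only CloudEvents with this type...
      source: curl-test    # ...and this source are delivered
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display
EOF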

Standard event interface:

The Broker follows the CloudEvents specification; every event it receives and delivers is in CloudEvents format. This gives events from different sources, and the consumers that process them, a single unified format and simplifies integration.

Multi-tenancy and namespace isolation:

Brokers are deployed into specific Kubernetes namespaces, and a namespace can contain multiple Brokers.

This makes it possible to isolate event flows for different teams, applications, or environments (such as dev and staging) within the same cluster. Each team or application manages the Brokers and Triggers in its own namespace.

An analogy:

You can think of a Knative Broker as a highly automated postal sorting center:

Receiving letters (events): letters (events) from all over the world (different event sources) are mailed to the sorting center (the Broker).

Storing letters: the sorting center has a warehouse (persistence/buffering) that temporarily holds letters so they are not lost.

Sorting rules (Triggers): the sorting center employs many sorters (Triggers). Each sorter is responsible for letters of a particular kind or from a particular region (filtering on event attributes).

Delivering letters: each sorter (Trigger) picks out the letters (events) that match its rules and delivers them to the right recipient (the subscriber service).

Decoupling: the sender (event source) only needs the sorting center's (Broker's) address and never needs to know who the recipients (consumers) are or where they live. A recipient (consumer) only registers its address with the sorter responsible for its kind of mail (by creating a Trigger) and does not care who sent the letter. The sorting center (Broker) and its sorters (Triggers) handle all the routing in between.

The core value a Broker provides:

Loose coupling: event producers and consumers are fully decoupled.

Flexibility: consumers can be added or removed and routing rules changed dynamically (by creating, modifying, or deleting Triggers).

Reliability: event persistence and retry (depending on the underlying implementation).

Scalability: the Broker and its consumers scale independently.

Standardization: based on CloudEvents.

Simpler development: developers focus on business logic (producing or consuming events) instead of building their own event-bus infrastructure.

    Subsections of Broker

    Install Kafka Broker

    About

Figure: Kafka broker event flow.

• Source: curl, KafkaSource
    • Broker
    • Trigger
    • Sink: ksvc, isvc

    Install a Channel (messaging) layer

    kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/download/knative-v1.18.0/eventing-kafka-controller.yaml
    configmap/kafka-broker-config created
    configmap/kafka-channel-config created
    customresourcedefinition.apiextensions.k8s.io/kafkachannels.messaging.knative.dev created
    customresourcedefinition.apiextensions.k8s.io/consumers.internal.kafka.eventing.knative.dev created
    customresourcedefinition.apiextensions.k8s.io/consumergroups.internal.kafka.eventing.knative.dev created
    customresourcedefinition.apiextensions.k8s.io/kafkasinks.eventing.knative.dev created
    customresourcedefinition.apiextensions.k8s.io/kafkasources.sources.knative.dev created
    clusterrole.rbac.authorization.k8s.io/eventing-kafka-source-observer created
    configmap/config-kafka-source-defaults created
    configmap/config-kafka-autoscaler created
    configmap/config-kafka-features created
    configmap/config-kafka-leader-election created
    configmap/kafka-config-logging created
    configmap/config-namespaced-broker-resources created
    configmap/config-tracing configured
    clusterrole.rbac.authorization.k8s.io/knative-kafka-addressable-resolver created
    clusterrole.rbac.authorization.k8s.io/knative-kafka-channelable-manipulator created
    clusterrole.rbac.authorization.k8s.io/kafka-controller created
    serviceaccount/kafka-controller created
    clusterrolebinding.rbac.authorization.k8s.io/kafka-controller created
    clusterrolebinding.rbac.authorization.k8s.io/kafka-controller-addressable-resolver created
    deployment.apps/kafka-controller created
    clusterrole.rbac.authorization.k8s.io/kafka-webhook-eventing created
    serviceaccount/kafka-webhook-eventing created
    clusterrolebinding.rbac.authorization.k8s.io/kafka-webhook-eventing created
    mutatingwebhookconfiguration.admissionregistration.k8s.io/defaulting.webhook.kafka.eventing.knative.dev created
    mutatingwebhookconfiguration.admissionregistration.k8s.io/pods.defaulting.webhook.kafka.eventing.knative.dev created
    secret/kafka-webhook-eventing-certs created
    validatingwebhookconfiguration.admissionregistration.k8s.io/validation.webhook.kafka.eventing.knative.dev created
    deployment.apps/kafka-webhook-eventing created
    service/kafka-webhook-eventing created
    kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/download/knative-v1.18.0/eventing-kafka-channel.yaml
    configmap/config-kafka-channel-data-plane created
    clusterrole.rbac.authorization.k8s.io/knative-kafka-channel-data-plane created
    serviceaccount/knative-kafka-channel-data-plane created
    clusterrolebinding.rbac.authorization.k8s.io/knative-kafka-channel-data-plane created
    statefulset.apps/kafka-channel-dispatcher created
    deployment.apps/kafka-channel-receiver created
    service/kafka-channel-ingress created

    Install a Broker layer

    kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/download/knative-v1.18.0/eventing-kafka-broker.yaml
    configmap/config-kafka-broker-data-plane created
    clusterrole.rbac.authorization.k8s.io/knative-kafka-broker-data-plane created
    serviceaccount/knative-kafka-broker-data-plane created
    clusterrolebinding.rbac.authorization.k8s.io/knative-kafka-broker-data-plane created
    statefulset.apps/kafka-broker-dispatcher created
    deployment.apps/kafka-broker-receiver created
    service/kafka-broker-ingress created
    Reference

Please check the StatefulSets:

    root@ay-k3s01:~# kubectl -n knative-eventing  get sts
    NAME                       READY   AGE
    kafka-broker-dispatcher    1/1     19m
    kafka-channel-dispatcher   0/0     22m

If a StatefulSet shows 0 replicas (like kafka-channel-dispatcher above), check it before relying on that component.
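If you actually need the channel dispatcher (for example, you create KafkaChannel resources) and it stays at 0, a minimal manual fix is to scale it up; the controller may also scale it automatically once channels exist:

kubectl -n knative-eventing scale statefulset kafka-channel-dispatcher --replicas=1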

    [Optional] Install Eventing extensions

    • kafka sink
    kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/download/knative-v1.18.0/eventing-kafka-sink.yaml
    Reference

    for more information, you can check 🔗https://knative.dev/docs/eventing/sinks/kafka-sink/

    • kafka source
    kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/download/knative-v1.18.0/eventing-kafka-source.yaml
    Reference

    for more information, you can check 🔗https://knative.dev/docs/eventing/sources/kafka-source/

    Display Broker Message

    Flow

    flowchart LR
        A[Curl] -->|HTTP| B{Broker}
        B -->|Subscribe| D[Trigger1]
        B -->|Subscribe| E[Trigger2]
        B -->|Subscribe| F[Trigger3]
        E --> G[Display Service]

    Setps

    1. Create Broker Setting

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kafka-broker-config
      namespace: knative-eventing
    data:
      default.topic.partitions: "10"
      default.topic.replication.factor: "1"
      bootstrap.servers: "kafka.database.svc.cluster.local:9092" #kafka service address
      default.topic.config.retention.ms: "3600"
    EOF

    2. Create Broker

    kubectl apply -f - <<EOF
    apiVersion: eventing.knative.dev/v1
    kind: Broker
    metadata:
      annotations:
        eventing.knative.dev/broker.class: Kafka
      name: first-broker
      namespace: kserve-test
    spec:
      config:
        apiVersion: v1
        kind: ConfigMap
        name: kafka-broker-config
        namespace: knative-eventing
    EOF

(Optionally, the Broker spec can also carry a delivery.deadLetterSink; the isvc-broker example in the next section shows this.)

    3. Create Trigger

    kubectl apply -f - <<EOF
    apiVersion: eventing.knative.dev/v1
    kind: Trigger
    metadata:
      name: display-service-trigger
      namespace: kserve-test
    spec:
      broker: first-broker
      subscriber:
        ref:
          apiVersion: serving.knative.dev/v1
          kind: Service
          name: event-display
    EOF

    4. Create Sink Service (Display Message)

    kubectl apply -f - <<EOF
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: event-display
      namespace: kserve-test
    spec:
      template:
        spec:
          containers:
            - image: gcr.io/knative-releases/knative.dev/eventing/cmd/event_display
    EOF

    5. Test

    kubectl run curl-test --image=curlimages/curl -it --rm --restart=Never -- \
      -v "http://kafka-broker-ingress.knative-eventing.svc.cluster.local/kserve-test/first-broker" \
      -X POST \
      -H "Ce-Id: $(date +%s)" \
      -H "Ce-Specversion: 1.0" \
      -H "Ce-Type: test.type" \
      -H "Ce-Source: curl-test" \
      -H "Content-Type: application/json" \
      -d '{"test": "Broker is working"}'

    6. Check message

    kubectl -n kserve-test logs -f deploy/event-display-00001-deployment 
    2025/07/02 09:01:25 Failed to read tracing config, using the no-op default: empty json tracing config
    ☁️  cloudevents.Event
    Context Attributes,
      specversion: 1.0
      type: test.type
      source: curl-test
      id: 1751446880
      datacontenttype: application/json
    Extensions,
      knativekafkaoffset: 6
      knativekafkapartition: 6
    Data,
      {
        "test": "Broker is working"
      }

    Kafka Broker Invoke ISVC

    1. Prepare RBAC

    • create cluster role to access CRD isvc
    kubectl apply -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kserve-access-for-knative
    rules:
    - apiGroups: ["serving.kserve.io"]
      resources: ["inferenceservices", "inferenceservices/status"]
      verbs: ["get", "list", "watch"]
    EOF
    • create rolebinding and grant privileges
    kubectl apply -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kafka-controller-kserve-access
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kserve-access-for-knative
    subjects:
    - kind: ServiceAccount
      name: kafka-controller
      namespace: knative-eventing
    EOF

    2. Create Broker Setting

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kafka-broker-config
      namespace: knative-eventing
    data:
      default.topic.partitions: "10"
      default.topic.replication.factor: "1"
      bootstrap.servers: "kafka.database.svc.cluster.local:9092" #kafka service address
      default.topic.config.retention.ms: "3600"
    EOF

    3. Create Broker

    kubectl apply -f - <<EOF
    apiVersion: eventing.knative.dev/v1
    kind: Broker
    metadata:
      annotations:
        eventing.knative.dev/broker.class: Kafka
      name: isvc-broker
      namespace: kserve-test
    spec:
      config:
        apiVersion: v1
        kind: ConfigMap
        name: kafka-broker-config
        namespace: knative-eventing
      delivery:
        deadLetterSink:
          ref:
            apiVersion: serving.knative.dev/v1
            kind: Service
            name: event-display
    EOF

    4. Create InferenceService

    Reference

    You can create the first-torchserve InferenceService by following this 🔗link
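
    For reference, a minimal sketch of that InferenceService (mirroring the model settings used in the async flow later on this page; the linked guide remains authoritative):

    kubectl apply -f - <<EOF
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: first-torchserve
      namespace: kserve-test
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
    EOF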

    5. Create Trigger

    kubectl apply -f - << EOF
    apiVersion: eventing.knative.dev/v1
    kind: Trigger
    metadata:
      name: kserve-trigger
      namespace: kserve-test
    spec:
      broker: isvc-broker
      filter:
        attributes:
          type: prediction-request
      subscriber:
        uri: http://first-torchserve.kserve-test.svc.cluster.local/v1/models/mnist:predict
    EOF

    6. Test

    Normally, you can invoke first-torchserve by executing:

    export MASTER_IP=192.168.100.112
    export ISTIO_INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
    export SERVICE_HOSTNAME=$(kubectl -n kserve-test get inferenceservice first-torchserve  -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    # http://first-torchserve.kserve-test.example.com 
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" "http://${MASTER_IP}:${ISTIO_INGRESS_PORT}/v1/models/mnist:predict" -d @./mnist-input.json

    Now, you can invoke the model through the broker by executing:

    export KAFKA_BROKER_INGRESS_PORT=$(kubectl -n knative-eventing get service kafka-broker-ingress -o jsonpath='{.spec.ports[?(@.name=="http-container")].nodePort}')
    curl -v "http://${MASTER_IP}:${KAFKA_BROKER_INGRESS_PORT}/kserve-test/isvc-broker" \
      -X POST \
      -H "Ce-Id: $(date +%s)" \
      -H "Ce-Specversion: 1.0" \
      -H "Ce-Type: prediction-request" \
      -H "Ce-Source: event-producer" \
      -H "Content-Type: application/json" \
      -d @./mnist-input.json 

    Then, check the messages in Kafka:

    # list all topics, find suffix is `isvc-broker` -> knative-broker-kserve-test-isvc-broker
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
        'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --list'
    # retrieve msg from that topic
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVER --consumer.config $CLIENT_CONFIG_FILE --topic knative-broker-kserve-test-isvc-broker --from-beginning'

    And then, you could see

    {
        "instances": [
            {
                "data": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC"
            }
        ]
    }
    {
        "predictions": [
            2
        ]
    }

    Subsections of Plugin

    Subsections of Eventing Kafka Broker

    Prepare Dev Environment

    1. update go -> 1.24
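
    A minimal sketch for this step, mirroring the Golang Binary section later in this document:

    wget https://go.dev/dl/go1.24.4.linux-amd64.tar.gz
    sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.24.4.linux-amd64.tar.gz
    export PATH=$PATH:/usr/local/go/bin
    go version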

    2. install ko -> v0.18.0

    go install github.com/google/ko@latest
    # wget https://github.com/ko-build/ko/releases/download/v0.18.0/ko_0.18.0_Linux_x86_64.tar.gz
    # tar -xzf ko_0.18.0_Linux_x86_64.tar.gz  -C /usr/local/bin/ko
    # cp /usr/local/bin/ko/ko /root/bin
    3. protoc
    PB_REL="https://github.com/protocolbuffers/protobuf/releases"
    curl -LO $PB_REL/download/v30.2/protoc-30.2-linux-x86_64.zip
    # mkdir -p ${HOME}/bin/
    mkdir -p /usr/local/bin/protoc
    unzip protoc-30.2-linux-x86_64.zip -d /usr/local/bin/protoc
    cp /usr/local/bin/protoc/bin/protoc /root/bin
    # export PATH="$PATH:/root/bin"
    rm -rf protoc-30.2-linux-x86_64.zip
    4. protoc-gen-go -> 1.5.4
    go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
    export GOPATH=/usr/local/go/bin
    5. clone the source code
    mkdir -p ${GOPATH}/src/knative.dev
    cd ${GOPATH}/src/knative.dev
    git clone git@github.com:knative/eventing.git # clone eventing repo
    git clone git@github.com:AaronYang0628/eventing-kafka-broker.git
    cd eventing-kafka-broker
    git remote add upstream https://github.com/knative-extensions/eventing-kafka-broker.git
    git remote set-url --push upstream no_push
    export KO_DOCKER_REPO=docker-registry.lab.zverse.space/data-and-computing/ay-dev

    Build Async Prediction Flow

    Flow

    flowchart LR
        A[User Curl] -->|HTTP| B{ISVC-Broker:Kafka}
        B -->|Subscribe| D[Trigger1]
        B -->|Subscribe| E[Kserve-Trigger]
        B -->|Subscribe| F[Trigger3]
        E --> G[Mnist Service]
        G --> |Kafka-Sink| B

    Steps

    1. Create Broker Setting

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kafka-broker-config
      namespace: knative-eventing
    data:
      default.topic.partitions: "10"
      default.topic.replication.factor: "1"
      bootstrap.servers: "kafka.database.svc.cluster.local:9092" #kafka service address
      default.topic.config.retention.ms: "3600"
    EOF

    2. Create Broker

    kubectl apply -f - <<EOF
    apiVersion: eventing.knative.dev/v1
    kind: Broker
    metadata:
      annotations:
        eventing.knative.dev/broker.class: Kafka
      name: isvc-broker
      namespace: kserve-test
    spec:
      config:
        apiVersion: v1
        kind: ConfigMap
        name: kafka-broker-config
        namespace: knative-eventing
    EOF

    3. Create Trigger

    kubectl apply -f - << EOF
    apiVersion: eventing.knative.dev/v1
    kind: Trigger
    metadata:
      name: kserve-trigger
      namespace: kserve-test
    spec:
      broker: isvc-broker
      filter:
        attributes:
          type: prediction-request-udf-attr # you can change this
      subscriber:
        uri: http://prediction-and-sink.kserve-test.svc.cluster.local/v1/models/mnist:predict
    EOF

    4. Create InferenceService

    kubectl apply -f - <<EOF
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: prediction-and-sink
      namespace: kserve-test
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      transformer:
        containers:
          - image: docker-registry.lab.zverse.space/data-and-computing/ay-dev/msg-transformer:dev9
            name: kserve-container
            env:
            - name: KAFKA_BOOTSTRAP_SERVERS
              value: kafka.database.svc.cluster.local
            - name: KAFKA_TOPIC
              value: test-topic # result will be saved in this topic
            - name: REQUEST_TRACE_KEY
              value: test-trace-id # use this key to retrieve the prediction result
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist
    EOF
    root@ay-k3s01:~# kubectl -n kserve-test get pod
    NAME                                                              READY   STATUS    RESTARTS   AGE
    prediction-and-sink-predictor-00001-deployment-f64bb76f-jqv4m     2/2     Running   0          3m46s
    prediction-and-sink-transformer-00001-deployment-76cccd867lksg9   2/2     Running   0          4m3s

    Source code of the docker-registry.lab.zverse.space/data-and-computing/ay-dev/msg-transformer:dev9 could be found 🔗here

    [Optional] 5. Invoke InferenceService

    • preparation
    wget -O ./mnist-input.json https://raw.githubusercontent.com/kserve/kserve/refs/heads/master/docs/samples/v1beta1/torchserve/v1/imgconv/input.json
    SERVICE_NAME=prediction-and-sink
    MODEL_NAME=mnist
    INPUT_PATH=@./mnist-input.json
    PLAIN_SERVICE_HOSTNAME=$(kubectl -n kserve-test get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    • fire!!
    export INGRESS_HOST=192.168.100.112
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
    curl -v -H "Host: ${PLAIN_SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
    curl -v -H "Host: ${PLAIN_SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
    *   Trying 192.168.100.112:31855...
    * Connected to 192.168.100.112 (192.168.100.112) port 31855
    > POST /v1/models/mnist:predict HTTP/1.1
    > Host: prediction-and-sink.kserve-test.ay.test.dev
    > User-Agent: curl/8.5.0
    > Accept: */*
    > Content-Type: application/json
    > Content-Length: 401
    > 
    < HTTP/1.1 200 OK
    < content-length: 19
    < content-type: application/json
    < date: Wed, 02 Jul 2025 08:55:05 GMT,Wed, 02 Jul 2025 08:55:04 GMT
    < server: istio-envoy
    < x-envoy-upstream-service-time: 209
    < 
    * Connection #0 to host 192.168.100.112 left intact
    {"predictions":[2]}

    6. Invoke Broker

    • preparation
    cat > image-with-trace-id.json << EOF
    {
        "test-trace-id": "16ec3446-48d6-422e-9926-8224853e84a7",
        "instances": [
            {
                "data": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC"
            }
        ]
    }
    EOF
    • fire!!
    export MASTER_IP=192.168.100.112
    export KAFKA_BROKER_INGRESS_PORT=$(kubectl -n knative-eventing get service kafka-broker-ingress -o jsonpath='{.spec.ports[?(@.name=="http-container")].nodePort}')
    curl -v "http://${MASTER_IP}:${KAFKA_BROKER_INGRESS_PORT}/kserve-test/isvc-broker" \
      -X POST \
      -H "Ce-Id: $(date +%s)" \
      -H "Ce-Specversion: 1.0" \
      -H "Ce-Type: prediction-request-udf-attr" \
      -H "Ce-Source: event-producer" \
      -H "Content-Type: application/json" \
      -d @./image-with-trace-id.json 
    • check input data in kafka topic knative-broker-kserve-test-isvc-broker
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVER --consumer.config $CLIENT_CONFIG_FILE --topic knative-broker-kserve-test-isvc-broker --from-beginning'
    {
        "test-trace-id": "16ec3446-48d6-422e-9926-8224853e84a7",
        "instances": [
        {
            "data": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC"
        }]
    }
    {
        "predictions": [2] // result will be saved in this topic as well
    }
    • check response result in kafka topic test-topic
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVER --consumer.config $CLIENT_CONFIG_FILE --topic test-topic --from-beginning'
    {
        "specversion": "1.0",
        "id": "822e3115-0185-4752-9967-f408dda72004",
        "source": "data-and-computing/kafka-sink-transformer",
        "type": "org.zhejianglab.zverse.data-and-computing.kafka-sink-transformer",
        "time": "2025-07-02T08:57:04.133497+00:00",
        "data":
        {
            "predictions": [2]
        },
        "request-host": "prediction-and-sink-transformer.kserve-test.svc.cluster.local",
        "kserve-isvc-name": "prediction-and-sink",
        "kserve-isvc-namespace": "kserve-test",
        "test-trace-id": "16ec3446-48d6-422e-9926-8224853e84a7"
    }
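
    Since REQUEST_TRACE_KEY is propagated into the result event, you can filter the test-topic output for a specific request by its trace id (a simple sketch reusing the consumer command above):

    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVER --consumer.config $CLIENT_CONFIG_FILE --topic test-topic --from-beginning' \
      | grep '16ec3446-48d6-422e-9926-8224853e84a7'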

    🪀Software

    Subsections of 🪀Software

    Subsections of Application

    Install Cert Manager

    Installation

    Install By

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. Helm binary has installed, if not check 🔗link


    1.get helm repo

    helm repo add cert-manager-repo https://charts.jetstack.io
    helm repo update

    2.install chart

    helm install cert-manager-repo/cert-manager --generate-name --version 1.17.2
    Using Mirror
    helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts \
      && helm install ay-helm-mirror/cert-manager --generate-name --version 1.17.2

    for more information, you can check 🔗https://aaronyang0628.github.io/helm-chart-mirror/

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. ArgoCD has installed, if not check 🔗link


    3. Helm binary has installed, if not check 🔗link


    1.prepare `cert-manager.yaml`

    kubectl -n argocd apply -f - << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: cert-manager
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://aaronyang0628.github.io/helm-chart-mirror/charts
        chart: cert-manager
        targetRevision: 1.17.2
        helm:
          releaseName: cert-manager
          values: |
            installCRDs: true
            image:
              repository: m.daocloud.io/quay.io/jetstack/cert-manager-controller
              tag: v1.17.2
            webhook:
              image:
                repository: m.daocloud.io/quay.io/jetstack/cert-manager-webhook
                tag: v1.17.2
            cainjector:
              image:
                repository: m.daocloud.io/quay.io/jetstack/cert-manager-cainjector
                tag: v1.17.2
            acmesolver:
              image:
                repository: m.daocloud.io/quay.io/jetstack/cert-manager-acmesolver
                tag: v1.17.2
            startupapicheck:
              image:
                repository: m.daocloud.io/quay.io/jetstack/cert-manager-startupapicheck
                tag: v1.17.2
      destination:
        server: https://kubernetes.default.svc
        namespace: basic-components
    EOF

    3.sync by argocd

    argocd app sync argocd/cert-manager

    4.prepare self-signed.yaml

    kubectl apply  -f - <<EOF
    ---
    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      namespace: basic-components
      name: self-signed-issuer
    spec:
      selfSigned: {}
    
    ---
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      namespace: basic-components
      name: my-self-signed-ca
    spec:
      isCA: true
      commonName: my-self-signed-ca
      secretName: root-secret
      privateKey:
        algorithm: ECDSA
        size: 256
      issuerRef:
        name: self-signed-issuer
        kind: Issuer
        group: cert-manager.io
    
    ---
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: self-signed-ca-issuer
    spec:
      ca:
        secretName: root-secret
    EOF

    Preliminary

    1. Docker|Podman|Buildah has installed, if not check 🔗link


    1.just run

    docker run --name cert-manager -e ALLOW_EMPTY_PASSWORD=yes bitnami/cert-manager:latest
    Using Proxy

    you can use an additional DaoCloud mirror to accelerate your pulling, check Daocloud Proxy

    docker run --name cert-manager \
      -e ALLOW_EMPTY_PASSWORD=yes \
      m.daocloud.io/docker.io/bitnami/cert-manager:latest

    FAQ

    You can add standard markdown syntax:

    • multiple paragraphs
    • bullet point lists
    • emphasized, bold and even bold emphasized text
    • links
    • etc.
    ...and even source code

    the possibilities are endless (almost - including other shortcodes may or may not work)


    Install Chart Museum

    Installation

    Install By

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. Helm binary has installed, if not check 🔗link


    1.get helm repo

    helm repo add chart-museum-repo https://chartmuseum.github.io/charts
    helm repo update

    2.install chart

    helm install chart-museum-repo/chartmuseum --generate-name --version 3.10.3
    Using Mirror
    helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts \
      && helm install ay-helm-mirror/cert-manager --generate-name --version 1.17.2

    for more information, you can check 🔗https://aaronyang0628.github.io/helm-chart-mirror/

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. ArgoCD has installed, if not check 🔗link


    3. Helm binary has installed, if not check 🔗link


    4. Ingress has installed on argoCD, if not check 🔗link


    5. Minio has installed, if not check 🔗link


    1.prepare `chart-museum-credentials`

    Storage In
    kubectl get namespaces basic-components > /dev/null 2>&1 || kubectl create namespace basic-components
    kubectl -n basic-components create secret generic chart-museum-credentials \
        --from-literal=username=admin \
        --from-literal=password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)
    
    kubectl get namespaces basic-components > /dev/null 2>&1 || kubectl create namespace basic-components
    kubectl -n basic-components create secret generic chart-museum-credentials \
        --from-literal=username=admin \
        --from-literal=password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) \
        --from-literal=aws_access_key_id=$(kubectl -n storage get secret minio-credentials -o jsonpath='{.data.rootUser}' | base64 -d) \
        --from-literal=aws_secret_access_key=$(kubectl -n storage get secret minio-credentials -o jsonpath='{.data.rootPassword}' | base64 -d)
    

    2.prepare `chart-museum.yaml`

    Storage In
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: chart-museum
    spec:
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
      project: default
      source:
        repoURL: https://chartmuseum.github.io/charts
        chart: chartmuseum
        targetRevision: 3.10.3
        helm:
          releaseName: chart-museum
          values: |
            replicaCount: 1
            image:
              repository: ghcr.io/helm/chartmuseum
            env:
              open:
                DISABLE_API: false
                STORAGE: local
                AUTH_ANONYMOUS_GET: true
              existingSecret: "chart-museum-credentials"
              existingSecretMappings:
                BASIC_AUTH_USER: "username"
                BASIC_AUTH_PASS: "password"
            persistence:
              enabled: false
              storageClass: ""
            volumePermissions:
              image:
                registry: m.daocloud.io/docker.io
            ingress:
              enabled: true
              ingressClassName: nginx
              annotations:
                cert-manager.io/cluster-issuer: self-signed-ca-issuer
                nginx.ingress.kubernetes.io/rewrite-target: /$1
              hosts:
                - name: chartmuseum.ay.dev
                  path: /?(.*)
                  tls: true
                  tlsSecret: chartmuseum.ay.dev-tls
      destination:
        server: https://kubernetes.default.svc
        namespace: basic-components
    
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: chart-museum
    spec:
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
      project: default
      source:
        repoURL: https://chartmuseum.github.io/charts
        chart: chartmuseum
        targetRevision: 3.10.3
        helm:
          releaseName: chart-museum
          values: |
            replicaCount: 1
            image:
              repository: ghcr.io/helm/chartmuseum
            env:
              open:
                DISABLE_API: false
                STORAGE: amazon
                STORAGE_AMAZON_ENDPOINT: http://minio-api.ay.dev:32080
                STORAGE_AMAZON_BUCKET: chart-museum
                STORAGE_AMAZON_PREFIX: charts
                STORAGE_AMAZON_REGION: us-east-1
                AUTH_ANONYMOUS_GET: true
              existingSecret: "chart-museum-credentials"
              existingSecretMappings:
                BASIC_AUTH_USER: "username"
                BASIC_AUTH_PASS: "password"
                AWS_ACCESS_KEY_ID: "aws_access_key_id"
                AWS_SECRET_ACCESS_KEY: "aws_secret_access_key"
            persistence:
              enabled: false
              storageClass: ""
            volumePermissions:
              image:
                registry: m.daocloud.io/docker.io
            ingress:
              enabled: true
              ingressClassName: nginx
              annotations:
                cert-manager.io/cluster-issuer: self-signed-ca-issuer
                nginx.ingress.kubernetes.io/rewrite-target: /$1
              hosts:
                - name: chartmuseum.ay.dev
                  path: /?(.*)
                  tls: true
                  tlsSecret: chartmuseum.ay.dev-tls
      destination:
        server: https://kubernetes.default.svc
        namespace: basic-components
    

    3.sync by argocd

    argocd app sync argocd/chart-museum

    install based on docker

    echo  "start from head is important"
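
    A minimal Docker-based sketch (the image tag and the host charts directory are assumptions; adjust them to your environment) that serves charts from a local directory on port 8080:

    docker run --rm -d \
      -p 8080:8080 \
      -e DEBUG=1 \
      -e STORAGE=local \
      -e STORAGE_LOCAL_ROOTDIR=/charts \
      -v $(pwd)/charts:/charts \
      ghcr.io/helm/chartmuseum:v0.16.2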

    Uploading a Chart Package

    Follow the “install based on docker” section above to get ChartMuseum up and running at http://localhost:8080

    First create mychart-0.1.0.tgz using the Helm CLI:

    cd mychart/
    helm package .

    Upload mychart-0.1.0.tgz:

    curl --data-binary "@mychart-0.1.0.tgz" http://localhost:8080/api/charts

    If you’ve signed your package and generated a provenance file, upload it with:

    curl --data-binary "@mychart-0.1.0.tgz.prov" http://localhost:8080/api/prov

    Both files can also be uploaded at once (or one at a time) on the /api/charts route using the multipart/form-data format:

    curl -F "chart=@mychart-0.1.0.tgz" -F "prov=@mychart-0.1.0.tgz.prov" http://localhost:8080/api/charts

    You can also use the helm-push plugin:

    helm cm-push mychart/ chartmuseum
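
    If the cm-push plugin is not installed yet, you can add it first (assuming network access to GitHub):

    helm plugin install https://github.com/chartmuseum/helm-push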

    Installing Charts into Kubernetes

    Add the URL to your ChartMuseum installation to the local repository list:

    helm repo add chartmuseum http://localhost:8080

    Search for charts:

    helm search repo chartmuseum/

    Install chart:

    helm install chartmuseum/mychart --generate-name

    FAQ

    You can add standard markdown syntax:

    • multiple paragraphs
    • bullet point lists
    • emphasized, bold and even bold emphasized text
    • links
    • etc.
    ...and even source code

    the possibilities are endless (almost - including other shortcodes may or may not work)


    Install Flink Operator

    Installation

    Install By

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. Helm has installed, if not check 🔗link


    3. Cert-manager has installed and there is a ClusterIssuer named self-signed-ca-issuer, if not check 🔗link


    1.get helm repo

    helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.11.0/
    helm repo update

    latest version : 🔗https://flink.apache.org/downloads/#apache-flink-kubernetes-operator

    2.install chart

    helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator --set image.repository=apache/flink-kubernetes-operator --set webhook.create=false

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. ArgoCD has installed, if not check 🔗link


    3. Cert-manager has installed on argocd and there is a ClusterIssuer named self-signed-ca-issuer, if not check 🔗link


    4. Ingress has installed on argoCD, if not check 🔗link


    2.prepare `flink-operator.yaml`

    kubectl -n argocd apply -f - << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: flink-operator
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://downloads.apache.org/flink/flink-kubernetes-operator-1.11.0
        chart: flink-kubernetes-operator
        targetRevision: 1.11.0
        helm:
          releaseName: flink-operator
          values: |
            image:
              repository: m.daocloud.io/ghcr.io/apache/flink-kubernetes-operator
              pullPolicy: IfNotPresent
              tag: "1.11.0"
          version: v3
      destination:
        server: https://kubernetes.default.svc
        namespace: flink
    EOF

    3.sync by argocd

    argocd app sync argocd/flink-operator
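
    To verify the operator, you can submit a minimal FlinkDeployment (a sketch adapted from the upstream basic example; the image, versions, and the flink service account are assumptions to adjust for your environment):

    kubectl -n flink apply -f - <<EOF
    apiVersion: flink.apache.org/v1beta1
    kind: FlinkDeployment
    metadata:
      name: basic-example
    spec:
      image: flink:1.17
      flinkVersion: v1_17
      flinkConfiguration:
        taskmanager.numberOfTaskSlots: "2"
      serviceAccount: flink
      jobManager:
        resource:
          memory: "2048m"
          cpu: 1
      taskManager:
        resource:
          memory: "2048m"
          cpu: 1
      job:
        jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
        parallelism: 2
        upgradeMode: stateless
    EOF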

    FAQ

    You can add standard markdown syntax:

    • multiple paragraphs
    • bullet point lists
    • emphasized, bold and even bold emphasized text
    • links
    • etc.
    ...and even source code

    the possibilities are endless (almost - including other shortcodes may or may not work)


    Deploy GateKeeper Server

    Official Website: https://open-policy-agent.github.io/gatekeeper/website/

    Preliminary

    • Kubernetes version must be v1.16 or later

    Components

    Gatekeeper is a Kubernetes admission controller built on Open Policy Agent (OPA). It lets users define and enforce custom policies that control the creation, update, and deletion of resources in a Kubernetes cluster.

    • Core components
      • Constraint Templates: define the rule logic of a policy, written in the Rego language. A template is an abstract policy that can be reused by multiple constraint instances.
      • Constraint Instances: concrete policies created from a constraint template; they specify the parameters and match rules that determine which resources the policy applies to.
      • Admission Controller (no modification required): intercepts requests to the Kubernetes API server and evaluates them against the defined constraints; any request that violates a constraint is rejected.


        • gatekeeper-audit
          • Periodic compliance checks: at a preset interval it scans all existing resources in the cluster and checks whether they comply with the defined constraints (periodic, batch checking).
          • Audit reports: after each scan, gatekeeper-audit produces a detailed report stating which resources violate which constraints, so administrators can quickly assess the cluster's compliance status.
        • gatekeeper-controller-manager
          • Real-time admission control: acting as the admission webhook, gatekeeper-controller-manager intercepts resource create, update, and delete requests as they happen and evaluates them against the constraint templates and constraints (real-time, event-driven).
          • Decision handling: if the resource satisfies all constraints the request is allowed to proceed; if it violates any rule the request is rejected, keeping non-compliant resources out of the cluster. A quick way to check that both components are running is shown right after this list.
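
    Both components run as Deployments in the gatekeeper-system namespace after installation; a quick check (a minimal sketch, assuming the default namespace and component names):

    kubectl -n gatekeeper-system get deployments
    # expect gatekeeper-audit and gatekeeper-controller-manager to be listed as Available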

    Features

    1. Constraint management

      • Custom constraint templates: users can write custom constraint templates in the Rego language to implement complex policy logic.

        For example, a policy can require every Namespace to carry specific labels, or restrict certain namespaces to a fixed set of images.

            ```shell
            kubectl get constrainttemplates
            kubectl get constraints
            ```
        
            ```shell
            kubectl apply -f - <<EOF
            apiVersion: templates.gatekeeper.sh/v1
            kind: ConstraintTemplate
            metadata:
              name: k8srequiredlabels
            spec:
              crd:
                spec:
                  names:
                    kind: K8sRequiredLabels
                  validation:
                    openAPIV3Schema:
                      type: object
                      properties:
                        labels:
                          type: array
                          items:
                            type: string
              targets:
                - target: admission.k8s.gatekeeper.sh
                  rego: |
                    package k8srequiredlabels

                    violation[{"msg": msg, "details": {"missing_labels": missing}}] {
                        provided := {label | input.review.object.metadata.labels[label]}
                        required := {label | label := input.parameters.labels[_]}
                        missing := required - provided
                        count(missing) > 0
                        msg := sprintf("you must provide labels: %v", [missing])
                    }
            EOF
            ```
        

      • Template reuse: a constraint template can be reused by multiple constraint instances, which improves the maintainability and reusability of policies.

        For example, you can create one generic label template and then create different constraint instances in different namespaces, each requiring its own labels.

            Require every Namespace to carry the label "gatekeeper":
        
            ```yaml
            apiVersion: constraints.gatekeeper.sh/v1beta1
            kind: K8sRequiredLabels
            metadata:
              name: ns-must-have-gk-label
            spec:
              enforcementAction: dryrun
              match:
                kinds:
                  - apiGroups: [""]
                    kinds: ["Namespace"]
              parameters:
                labels: ["gatekeeper"]
            ```
        

      • Constraint updates: when a constraint template or a constraint is updated, Gatekeeper automatically re-evaluates all related resources so that the policy takes effect immediately.

    2. Resource control

      • Admission interception: when a resource create or update request arrives, Gatekeeper intercepts it in real time and evaluates it against the policies. If the request violates a policy, it is rejected immediately with a detailed error message that helps the user locate the problem.

      • Create/update restrictions: Gatekeeper can block resource create and update requests that do not comply with policy.

        For example, if a policy requires every Deployment to declare resource requests and limits, any attempt to create or update a Deployment without them will be rejected.

        This behaviour is controlled by enforcementAction, which can be dryrun | deny | warn.

        check https://open-policy-agent.github.io/gatekeeper-library/website/validation/containerlimits

      • Resource type filtering: the match field of a constraint specifies the resource types and namespaces the policy applies to.

        For example, a policy can target only Pods in specific namespaces, or only resources of a particular API group and version.

        A sync configuration (syncSet) can be used to specify which resources are replicated for audit and which are ignored:

            ```yaml
            apiVersion: config.gatekeeper.sh/v1alpha1
            kind: Config
            metadata:
              name: config
              namespace: "gatekeeper-system"
            spec:
              sync:
                syncOnly:
                  - group: ""
                    version: "v1"
                    kind: "Namespace"
                  - group: ""
                    version: "v1"
                    kind: "Pod"
              match:
                - excludedNamespaces: ["kube-*"]
                  processes: ["*"]
            ```
        

    3. Compliance assurance

      • Industry standards and internal rules: Gatekeeper can ensure that resources in the cluster comply with industry standards and with the administrator's internal security rules.

        For example, policies can require that all containers use the latest security patches, or that all storage volumes are encrypted.

        The Gatekeeper library already provides close to 50 ready-made constraint policies, which you can browse at https://open-policy-agent.github.io/gatekeeper-library/website/.

      • Audit and reporting: Gatekeeper records every policy evaluation result, which makes auditing and reporting straightforward. By reviewing the audit logs, administrators can see which resources violated which policies.

      • Audit export: audit results can be exported and consumed by downstream systems.

        See https://open-policy-agent.github.io/gatekeeper/website/docs/pubsub/ for details.

    Installation

    install from
    kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.18.2/deploy/gatekeeper.yaml
    helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
    helm install gatekeeper/gatekeeper --name-template=gatekeeper --namespace gatekeeper-system --create-namespace

    Make sure that:

    • You have Docker version 20.10 or later installed.
    • Your kubectl context is set to the desired installation cluster.
    • You have a container registry you can write to that is readable by the target cluster.
    git clone https://github.com/open-policy-agent/gatekeeper.git \
    && cd gatekeeper 
    • Build and push Gatekeeper image:
    export DESTINATION_GATEKEEPER_IMAGE=<add registry like "myregistry.docker.io/gatekeeper">
    make docker-buildx REPOSITORY=$DESTINATION_GATEKEEPER_IMAGE OUTPUT_TYPE=type=registry
    • And the deploy
    make deploy REPOSITORY=$DESTINATION_GATEKEEPER_IMAGE

    Subsections of Binary

    Argo Workflow Binary

    MIRROR="files.m.daocloud.io/"
    VERSION=v3.5.4
    curl -sSLo argo-linux-amd64.gz "https://${MIRROR}github.com/argoproj/argo-workflows/releases/download/${VERSION}/argo-linux-amd64.gz"
    gunzip argo-linux-amd64.gz
    chmod u+x argo-linux-amd64
    mkdir -p ${HOME}/bin
    mv -f argo-linux-amd64 ${HOME}/bin/argo
    rm -f argo-linux-amd64.gz

    ArgoCD Binary

    MIRROR="files.m.daocloud.io/"
    VERSION=v2.9.3
    [ $(uname -m) = x86_64 ] && curl -sSLo argocd "https://${MIRROR}github.com/argoproj/argo-cd/releases/download/${VERSION}/argocd-linux-amd64"
    [ $(uname -m) = aarch64 ] && curl -sSLo argocd "https://${MIRROR}github.com/argoproj/argo-cd/releases/download/${VERSION}/argocd-linux-arm64"
    chmod u+x argocd
    mkdir -p ${HOME}/bin
    mv -f argocd ${HOME}/bin

    [Optional] add to PATH

    cat >> ~/.bashrc  << 'EOF'
    export PATH=$PATH:/root/bin
    EOF
    source ~/.bashrc

    Golang Binary

    # sudo rm -rf /usr/local/go  # remove any previous Go installation
    wget https://go.dev/dl/go1.24.4.linux-amd64.tar.gz
    tar -C /usr/local -xzf go1.24.4.linux-amd64.tar.gz
    # add the following line to ~/.bashrc (e.g. via `vim ~/.bashrc`), then reload the shell
    export PATH=$PATH:/usr/local/go/bin
    source ~/.bashrc
    rm -rf ./go1.24.4.linux-amd64.tar.gz

    Helm Binary

    ARCH_IN_FILE_NAME=linux-amd64
    FILE_NAME=helm-v3.18.3-${ARCH_IN_FILE_NAME}.tar.gz
    curl -sSLo ${FILE_NAME} "https://files.m.daocloud.io/get.helm.sh/${FILE_NAME}"
    tar zxf ${FILE_NAME}
    mkdir -p ${HOME}/bin
    mv -f ${ARCH_IN_FILE_NAME}/helm ${HOME}/bin
    rm -rf ./${FILE_NAME}
    rm -rf ./${ARCH_IN_FILE_NAME}
    chmod u+x ${HOME}/bin/helm

    JQ Binary

    jq

    JQ_VERSION=1.7
    JQ_BINARY=jq-linux-amd64
    wget https://github.com/jqlang/jq/releases/download/jq-${JQ_VERSION}/${JQ_BINARY} -O /usr/bin/jq && chmod +x /usr/bin/jq

    Kind Binary

    MIRROR="files.m.daocloud.io/"
    VERSION=v0.29.0
    [ $(uname -m) = x86_64 ] && curl -sSLo kind "https://${MIRROR}github.com/kubernetes-sigs/kind/releases/download/${VERSION}/kind-linux-amd64"
    [ $(uname -m) = aarch64 ] && curl -sSLo kind "https://${MIRROR}github.com/kubernetes-sigs/kind/releases/download/${VERSION}/kind-linux-arm64"
    chmod u+x kind
    mkdir -p ${HOME}/bin
    mv -f kind ${HOME}/bin

    Krew Binary

    cd "$(mktemp -d)" &&
    OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
    ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
    KREW="krew-${OS}_${ARCH}" &&
    curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
    tar zxvf "${KREW}.tar.gz" &&
    ./"${KREW}" install krew

    Kubectl Binary

    MIRROR="files.m.daocloud.io/"
    VERSION=$(curl -L -s https://${MIRROR}dl.k8s.io/release/stable.txt)
    [ $(uname -m) = x86_64 ] && curl -sSLo kubectl "https://${MIRROR}dl.k8s.io/release/${VERSION}/bin/linux/amd64/kubectl"
    [ $(uname -m) = aarch64 ] && curl -sSLo kubectl "https://${MIRROR}dl.k8s.io/release/${VERSION}/bin/linux/arm64/kubectl"
    chmod u+x kubectl
    mkdir -p ${HOME}/bin
    mv -f kubectl ${HOME}/bin

    Maven Binary

    wget https://dlcdn.apache.org/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz
    tar xzf apache-maven-3.9.6-bin.tar.gz -C /usr/local
    ln -sfn /usr/local/apache-maven-3.9.6/bin/mvn /root/bin/mvn  
    export PATH=$PATH:/usr/local/apache-maven-3.9.6/bin
    source ~/.bashrc

    Minikube Binary

    MIRROR="files.m.daocloud.io/"
    [ $(uname -m) = x86_64 ] && curl -sSLo minikube "https://${MIRROR}storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64"
    [ $(uname -m) = aarch64 ] && curl -sSLo minikube "https://${MIRROR}storage.googleapis.com/minikube/releases/latest/minikube-linux-arm64"
    chmod u+x minikube
    mkdir -p ${HOME}/bin
    mv -f minikube ${HOME}/bin

    Open Java

    mkdir -p /etc/apt/keyrings && \
    wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | gpg --dearmor -o /etc/apt/keyrings/adoptium.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/adoptium.gpg arch=amd64] https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | tee /etc/apt/sources.list.d/adoptium.list > /dev/null && \
    apt-get update && \
    apt-get install -y temurin-21-jdk && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

    YQ Binary

    YQ_VERSION=v4.40.5
    YQ_BINARY=yq_linux_amd64
    wget https://github.com/mikefarah/yq/releases/download/${YQ_VERSION}/${YQ_BINARY}.tar.gz -O - | tar xz && mv ${YQ_BINARY} /usr/bin/yq

    CICD

    Articles

    FAQ

    You can add standard markdown syntax:

    • multiple paragraphs
    • bullet point lists
    • emphasized, bold and even bold emphasized text
    • links
    • etc.
    ...and even source code

    the possibilities are endless (almost - including other shortcodes may or may not work)

    Subsections of CICD

    Install Argo CD

    Preliminary

    • Kubernetes has installed, if not check 🔗link
    • Helm binary has installed, if not check 🔗link

    1. install argoCD binary

    2. install components

    Install By
    1. Prepare argocd.values.yaml
    crds:
      install: true
      keep: false
    global:
      revisionHistoryLimit: 3
      image:
        repository: m.daocloud.io/quay.io/argoproj/argocd
        imagePullPolicy: IfNotPresent
    redis:
      enabled: true
      image:
        repository: m.daocloud.io/docker.io/library/redis
      exporter:
        enabled: false
        image:
          repository: m.daocloud.io/bitnami/redis-exporter
      metrics:
        enabled: false
    redis-ha:
      enabled: false
      image:
        repository: m.daocloud.io/docker.io/library/redis
      configmapTest:
        repository: m.daocloud.io/docker.io/koalaman/shellcheck
      haproxy:
        enabled: false
        image:
          repository: m.daocloud.io/docker.io/library/haproxy
      exporter:
        enabled: false
        image: m.daocloud.io/docker.io/oliver006/redis_exporter
    dex:
      enabled: true
      image:
        repository: m.daocloud.io/ghcr.io/dexidp/dex
    
    2. Install argoCD
    helm install argo-cd argo-cd \
      --namespace argocd \
      --create-namespace \
      --version 5.46.7 \
      --repo https://aaronyang0628.github.io/helm-chart-mirror/charts \
      --values argocd.values.yaml \
      --atomic
    

    By default, you can install Argo CD using the official manifest:

    kubectl create namespace argocd \
    && kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

    Or, you can use your own manifest file URL.

    4. prepare argocd-server-external.yaml

    Install By
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/component: server
        app.kubernetes.io/instance: argo-cd
        app.kubernetes.io/name: argocd-server-external
        app.kubernetes.io/part-of: argocd
        app.kubernetes.io/version: v2.8.4
      name: argocd-server-external
    spec:
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: 8080
        nodePort: 30443
      selector:
        app.kubernetes.io/instance: argo-cd
        app.kubernetes.io/name: argocd-server
      type: NodePort
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/component: server
        app.kubernetes.io/instance: argo-cd
        app.kubernetes.io/name: argocd-server-external
        app.kubernetes.io/part-of: argocd
        app.kubernetes.io/version: v2.8.4
      name: argocd-server-external
    spec:
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: 8080
        nodePort: 30443
      selector:
        app.kubernetes.io/name: argocd-server
      type: NodePort

    5. create external service

    kubectl -n argocd apply -f argocd-server-external.yaml

    6. get the argocd initial admin password

    kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

    7. login argocd

    ARGOCD_PASS=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
    MASTER_IP=$(kubectl get nodes --selector=node-role.kubernetes.io/control-plane -o jsonpath='{$.items[0].status.addresses[?(@.type=="InternalIP")].address}')
    argocd login --insecure --username admin $MASTER_IP:30443 --password $ARGOCD_PASS
    open https://$MASTER_IP:30443

    Install Argo WorkFlow

    Preliminary

    • Kubernetes has installed, if not check 🔗link
    • Argo CD has installed, if not check 🔗link
    • cert-manager has installed on argocd and there is a ClusterIssuer named self-signed-ca-issuer, if not check 🔗link

    1. prepare argo-workflows.yaml

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: argo-workflows
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://argoproj.github.io/argo-helm
        chart: argo-workflows
        targetRevision: 0.40.11
        helm:
          releaseName: argo-workflows
          values: |
            crds:
              install: true
              keep: false
            singleNamespace: false
            controller:
              image:
                registry: m.daocloud.io/quay.io
              workflowNamespaces:
                - business-workflows
            executor:
              image:
                registry: m.daocloud.io/quay.io
            workflow:
              serviceAccount:
                create: true
              rbac:
                create: true
            server:
              enabled: true
              image:
                registry: m.daocloud.io/quay.io
              ingress:
                enabled: true
                ingressClassName: nginx
                annotations:
                  cert-manager.io/cluster-issuer: self-signed-ca-issuer
                  nginx.ingress.kubernetes.io/rewrite-target: /$1
                hosts:
                  - argo-workflows.dev.geekcity.tech
                paths:
                  - /?(.*)
                tls:
                  - secretName: argo-workflows-tls
                    hosts:
                      - argo-workflows.dev.geekcity.tech
              authModes:
                - server
              sso:
                enabled: false
      destination:
        server: https://kubernetes.default.svc
        namespace: workflows

    2. install argo workflow binary

    kubectl get namespace business-workflows > /dev/null 2>&1 || kubectl create namespace business-workflows

    4. apply to k8s

    kubectl -n argocd apply -f argo-workflows.yaml

    5. sync by argocd

    argocd app sync argocd/argo-workflows

    6. check workflow status

    # list all flows
    argo -n business-workflows list
    # get specific flow status
    argo -n business-workflows get <$flow_name>
    # get specific flow log
    argo -n business-workflows logs <$flow_name>
    # get specific flow log continuously
    argo -n business-workflows logs <$flow_name> --watch

    Container

    Articles

    FAQ

    You can add standard markdown syntax:

    • multiple paragraphs
    • bullet point lists
    • emphasized, bold and even bold emphasized text
    • links
    • etc.
    ...and even source code

    the possibilities are endless (almost - including other shortcodes may or may not work)

    Subsections of Container

    Install Buildah

    Reference

    Installation

    Caution

    If you already have something wrong with apt update, please check the following 🔗link, adding extra package sources won't help you solve that problem.

    sudo dnf update -y 
    sudo dnf -y install buildah
    sudo yum install -y buildah
    sudo apt-get update
    sudo apt-get -y install buildah

    Verify that the installation is successful:

    buildah version

    Info

    • Buildah is daemonless and shares its local image storage with Podman (under containers/storage).

    Mirror

    You can configure registry mirrors for Buildah and Podman in /etc/containers/registries.conf, for example:

    • https://docker.mirrors.ustc.edu.cn

    Install Docker Engine

    Reference

    Installation

    Caution

    If you already have something wrong with apt update, please check the following 🔗link, adding the Docker apt source won't help you solve that problem.

    sudo dnf update -y 
    sudo dnf config-manager --add-repo=https://download.docker.com/linux/fedora/docker-ce.repo
    sudo dnf install docker-ce docker-ce-cli containerd.io 

    Once the installation is complete, start the Docker service

    sudo systemctl enable docker
    sudo systemctl start docker
    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo 
    sudo yum install docker-ce --nobest --allowerasing -y

    Once the installation is complete, start the Docker service

    sudo systemctl enable docker
    sudo systemctl start docker
    1. Set up Docker’s apt repository.
    # Add Docker's official GPG key:
    sudo apt-get update
    sudo apt-get install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add the repository to Apt sources:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
    1. Install the Docker packages.

    latest version

    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

    specific version

     apt-cache madison docker-ce | awk '{ print $3 }'
     export DOCKER_VERSION=5:28.2.1-1~XXXXX
     sudo apt-get install docker-ce=$DOCKER_VERSION docker-ce-cli=$DOCKER_VERSION containerd.io docker-buildx-plugin docker-compose-plugin
    1. Verify that the installation is successful by running the hello-world image:
    sudo docker run hello-world

    Info

    • Docker Image saved in /var/lib/docker

    Mirror

    You can modify /etc/docker/daemon.json

    {
      "registry-mirrors": ["<$mirror_url>"]
    }

    for example:

    • https://docker.mirrors.ustc.edu.cn
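
    After editing /etc/docker/daemon.json, restart the Docker daemon so the mirror configuration takes effect:

    sudo systemctl daemon-reload
    sudo systemctl restart docker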

    Install Podman

    Reference

    Installation

    Caution

    If you already have something wrong with apt update, please check the following 🔗link, adding extra package sources won't help you solve that problem.

    sudo dnf update -y 
    sudo dnf -y install podman
    sudo yum install -y podman
    sudo apt-get update
    sudo apt-get -y install podman

    Run Params

    start an container

    podman run [params]

    -rm: delete if failed

    -v: load a volume

    Example

    podman run --rm\
          -v /root/kserve/iris-input.json:/tmp/iris-input.json \
          --privileged \
         -e MODEL_NAME=sklearn-iris \
         -e INPUT_PATH=/tmp/iris-input.json \
         -e SERVICE_HOSTNAME=sklearn-iris.kserve-test.example.com \
          -it m.daocloud.io/docker.io/library/golang:1.22  sh -c "command A; command B; exec bash"

    Subsections of Database

    Install Clickhouse

    Installation

    Install By

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. Helm has installed, if not check 🔗link


    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. argoCD has installed, if not check 🔗link


    3. cert-manager has installed on argocd and there is a ClusterIssuer named `self-signed-ca-issuer`, if not check 🔗link


    1.prepare admin credentials secret

    kubectl get namespaces database > /dev/null 2>&1 || kubectl create namespace database
    kubectl -n database create secret generic clickhouse-admin-credentials \
        --from-literal=password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

    2.prepare `deploy-clickhouse.yaml`

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: clickhouse
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: clickhouse
        targetRevision: 4.5.1
        helm:
          releaseName: clickhouse
          values: |
            serviceAccount:
              name: clickhouse
            image:
              registry: m.daocloud.io/docker.io
              pullPolicy: IfNotPresent
            volumePermissions:
              enabled: false
              image:
                registry: m.daocloud.io/docker.io
                pullPolicy: IfNotPresent
            zookeeper:
              enabled: true
              image:
                registry: m.daocloud.io/docker.io
                pullPolicy: IfNotPresent
              replicaCount: 3
              persistence:
                enabled: true
                storageClass: nfs-external
                size: 8Gi
              volumePermissions:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
                  pullPolicy: IfNotPresent
            shards: 2
            replicaCount: 3
            ingress:
              enabled: true
              annotations:
                cert-manager.io/cluster-issuer: self-signed-ca-issuer
                nginx.ingress.kubernetes.io/rewrite-target: /$1
              hostname: clickhouse.dev.geekcity.tech
              ingressClassName: nginx
              path: /?(.*)
              tls: true
            persistence:
              enabled: false
            resources:
              requests:
                cpu: 2
                memory: 512Mi
              limits:
                cpu: 3
                memory: 1024Mi
            auth:
              username: admin
              existingSecret: clickhouse-admin-credentials
              existingSecretKey: password
            metrics:
              enabled: true
              image:
                registry: m.daocloud.io/docker.io
                pullPolicy: IfNotPresent
              serviceMonitor:
                enabled: true
                namespace: monitor
                jobLabel: clickhouse
                selector:
                  app.kubernetes.io/name: clickhouse
                  app.kubernetes.io/instance: clickhouse
                labels:
                  release: prometheus-stack
            extraDeploy:
              - |
                apiVersion: apps/v1
                kind: Deployment
                metadata:
                  name: clickhouse-tool
                  namespace: database
                  labels:
                    app.kubernetes.io/name: clickhouse-tool
                spec:
                  replicas: 1
                  selector:
                    matchLabels:
                      app.kubernetes.io/name: clickhouse-tool
                  template:
                    metadata:
                      labels:
                        app.kubernetes.io/name: clickhouse-tool
                    spec:
                      containers:
                        - name: clickhouse-tool
                          image: m.daocloud.io/docker.io/clickhouse/clickhouse-server:23.11.5.29-alpine
                          imagePullPolicy: IfNotPresent
                          env:
                            - name: CLICKHOUSE_USER
                              value: admin
                            - name: CLICKHOUSE_PASSWORD
                              valueFrom:
                                secretKeyRef:
                                  key: password
                                  name: clickhouse-admin-credentials
                            - name: CLICKHOUSE_HOST
                              value: csst-clickhouse.csst
                            - name: CLICKHOUSE_PORT
                              value: "9000"
                            - name: TZ
                              value: Asia/Shanghai
                          command:
                            - tail
                          args:
                            - -f
                            - /etc/hosts
      destination:
        server: https://kubernetes.default.svc
        namespace: database

    3.deploy clickhouse

    kubectl -n argocd apply -f deploy-clickhouse.yaml

    4.sync by argocd

    argocd app sync argocd/clickhouse

    5.prepare `clickhouse-interface.yaml`

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/component: clickhouse
        app.kubernetes.io/instance: clickhouse
      name: clickhouse-interface
    spec:
      ports:
      - name: http
        port: 8123
        protocol: TCP
        targetPort: http
        nodePort: 31567
      - name: tcp
        port: 9000
        protocol: TCP
        targetPort: tcp
        nodePort: 32005
      selector:
        app.kubernetes.io/component: clickhouse
        app.kubernetes.io/instance: clickhouse
        app.kubernetes.io/name: clickhouse
      type: NodePort

    6.apply to k8s

    kubectl -n database apply -f clickhouse-interface.yaml

    7.extract clickhouse admin credentials

    kubectl -n database get secret clickhouse-admin-credentials -o jsonpath='{.data.password}' | base64 -d

    8.invoke http api

    add `$K8S_MASTER_IP clickhouse.dev.geekcity.tech` to **/etc/hosts**
    CK_PASS=$(kubectl -n database get secret clickhouse-admin-credentials -o jsonpath='{.data.password}' | base64 -d)
    echo 'SELECT version()' | curl -k "https://admin:${CK_PASS}@clickhouse.dev.geekcity.tech:32443/" --data-binary @-
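    Besides the HTTP endpoint, the native TCP port exposed by `clickhouse-interface` in step 6 (nodePort 32005) can be checked the same way; a minimal sketch, assuming `$K8S_MASTER_IP` is reachable and podman is available:

    CK_PASS=$(kubectl -n database get secret clickhouse-admin-credentials -o jsonpath='{.data.password}' | base64 -d)
    podman run --rm \
        --entrypoint clickhouse-client \
        -it m.daocloud.io/docker.io/clickhouse/clickhouse-server:23.11.5.29-alpine \
        --host $K8S_MASTER_IP \
        --port 32005 \
        --user admin \
        --password $CK_PASS \
        --query "select version()"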

    Preliminary

    1. Docker has installed, if not check 🔗link


    Using Proxy

    you can pull images through an additional DaoCloud mirror to accelerate image pulling, check Daocloud Proxy

    1.init server

    mkdir -p clickhouse/{data,logs}
    podman run --rm \
        --ulimit nofile=262144:262144 \
        --name clickhouse-server \
        -p 18123:8123 \
        -p 19000:9000 \
        -v $(pwd)/clickhouse/data:/var/lib/clickhouse \
        -v $(pwd)/clickhouse/logs:/var/log/clickhouse-server \
        -e CLICKHOUSE_DB=my_database \
        -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 \
        -e CLICKHOUSE_USER=ayayay \
        -e CLICKHOUSE_PASSWORD=123456 \
        -d m.daocloud.io/docker.io/clickhouse/clickhouse-server:23.11.5.29-alpine

    2.check dashboard

    And then you can visit 🔗http://localhost:18123

    3.use cli api

    And then you can visit 🔗http://localhost:19000
    podman run --rm \
      --entrypoint clickhouse-client \
      -it m.daocloud.io/docker.io/clickhouse/clickhouse-server:23.11.5.29-alpine \
      --host host.containers.internal \
      --port 19000 \
      --user ayayay \
      --password 123456 \
      --query "select version()"

    4.use visual client

    podman run --rm -p 8080:80 -d m.daocloud.io/docker.io/spoonest/clickhouse-tabix-web-client:stable

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. ArgoCD has installed, if not check 🔗link


    3. Argo Workflow has installed, if not check 🔗link


    1.prepare `argocd-login-credentials`

    kubectl get namespace business-workflows > /dev/null 2>&1 || kubectl create namespace business-workflows
    # assuming the built-in `admin` account is used; change the username/password to match your ArgoCD setup
    kubectl -n business-workflows create secret generic argocd-login-credentials \
        --from-literal=username=admin \
        --from-literal=password=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath='{.data.password}' | base64 -d)

    2.apply rolebinding to k8s

    kubectl apply -f - <<EOF
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: application-administrator
    rules:
      - apiGroups:
          - argoproj.io
        resources:
          - applications
        verbs:
          - '*'
      - apiGroups:
          - apps
        resources:
          - deployments
        verbs:
          - '*'
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: argocd
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: application
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    EOF

    3.prepare clickhouse admin credentials secret

    kubectl get namespace application > /dev/null 2>&1 || kubectl create namespace application
    kubectl -n application create secret generic clickhouse-admin-credentials \
      --from-literal=password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

    4.prepare `deploy-clickhouse-flow.yaml`

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: deploy-argocd-app-ck-
    spec:
      entrypoint: entry
      artifactRepositoryRef:
        configmap: artifact-repositories
        key: default-artifact-repository
      serviceAccountName: argo-workflow
      templates:
      - name: entry
        inputs:
          parameters:
          - name: argocd-server
            value: argo-cd-argocd-server.argocd:443
          - name: insecure-option
            value: --insecure
        dag:
          tasks:
          - name: apply
            template: apply
          - name: prepare-argocd-binary
            template: prepare-argocd-binary
            dependencies:
            - apply
          - name: sync
            dependencies:
            - prepare-argocd-binary
            template: sync
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
          - name: wait
            dependencies:
            - sync
            template: wait
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
      - name: apply
        resource:
          action: apply
          manifest: |
            apiVersion: argoproj.io/v1alpha1
            kind: Application
            metadata:
              name: app-clickhouse
              namespace: argocd
            spec:
              syncPolicy:
                syncOptions:
                - CreateNamespace=true
              project: default
              source:
                repoURL: https://charts.bitnami.com/bitnami
                chart: clickhouse
                targetRevision: 4.5.3
                helm:
                  releaseName: app-clickhouse
                  values: |
                    image:
                      registry: docker.io
                      repository: bitnami/clickhouse
                      tag: 23.12.3-debian-11-r0
                      pullPolicy: IfNotPresent
                    service:
                      type: ClusterIP
                    volumePermissions:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
                    ingress:
                      enabled: true
                      ingressClassName: nginx
                      annotations:
                        cert-manager.io/cluster-issuer: self-signed-ca-issuer
                        nginx.ingress.kubernetes.io/rewrite-target: /$1
                      path: /?(.*)
                      hostname: clickhouse.dev.geekcity.tech
                      tls: true
                    shards: 2
                    replicaCount: 3
                    persistence:
                      enabled: false
                    auth:
                      username: admin
                      existingSecret: clickhouse-admin-credentials
                      existingSecretKey: password
                    zookeeper:
                      enabled: true
                      image:
                        registry: m.daocloud.io/docker.io
                        repository: bitnami/zookeeper
                        tag: 3.8.3-debian-11-r8
                        pullPolicy: IfNotPresent
                      replicaCount: 3
                      persistence:
                        enabled: false
                      volumePermissions:
                        enabled: false
                        image:
                          registry: m.daocloud.io/docker.io
                          pullPolicy: IfNotPresent
              destination:
                server: https://kubernetes.default.svc
                namespace: application
      - name: prepare-argocd-binary
        inputs:
          artifacts:
          - name: argocd-binary
            path: /tmp/argocd
            mode: 755
            http:
              url: https://files.m.daocloud.io/github.com/argoproj/argo-cd/releases/download/v2.9.3/argocd-linux-amd64
        outputs:
          artifacts:
          - name: argocd-binary
            path: "{{inputs.artifacts.argocd-binary.path}}"
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          command:
          - sh
          - -c
          args:
          - |
            ls -l {{inputs.artifacts.argocd-binary.path}}
      - name: sync
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          - name: WITH_PRUNE_OPTION
            value: --prune
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app sync argocd/app-clickhouse ${WITH_PRUNE_OPTION} --timeout 300
      - name: wait
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app wait argocd/app-clickhouse

    5.submit to argo workflow client

    argo -n business-workflows submit deploy-clickhouse-flow.yaml

    6.extract clickhouse admin credentials

    kubectl -n application get secret clickhouse-admin-credentials -o jsonpath='{.data.password}' | base64 -d

    7.invoke http api

    add `$K8S_MASTER_IP clickhouse.dev.geekcity.tech` to **/etc/hosts**
    CK_PASSWORD=$(kubectl -n application get secret clickhouse-admin-credentials -o jsonpath='{.data.password}' | base64 -d) && echo 'SELECT version()' | curl -k "https://admin:${CK_PASSWORD}@clickhouse.dev.geekcity.tech/" --data-binary @-

    8.create external interface

    kubectl -n application apply -f - <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/component: clickhouse
        app.kubernetes.io/instance: app-clickhouse
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: clickhouse
        app.kubernetes.io/version: 23.12.2
        argocd.argoproj.io/instance: app-clickhouse
        helm.sh/chart: clickhouse-4.5.3
      name: app-clickhouse-service-external
    spec:
      ports:
      - name: tcp
        port: 9000
        protocol: TCP
        targetPort: tcp
        nodePort: 30900
      selector:
        app.kubernetes.io/component: clickhouse
        app.kubernetes.io/instance: app-clickhouse
        app.kubernetes.io/name: clickhouse
      type: NodePort
    EOF
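    The NodePort service above exposes the native protocol on port 30900; a minimal connectivity check, assuming `$K8S_MASTER_IP` points to a cluster node and podman is available:

    CK_PASSWORD=$(kubectl -n application get secret clickhouse-admin-credentials -o jsonpath='{.data.password}' | base64 -d)
    podman run --rm \
        --entrypoint clickhouse-client \
        -it m.daocloud.io/docker.io/clickhouse/clickhouse-server:23.11.5.29-alpine \
        --host $K8S_MASTER_IP \
        --port 30900 \
        --user admin \
        --password $CK_PASSWORD \
        --query "select version()"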


    Install ElasticSearch

    Preliminary

    • Kubernetes has been installed, if not check 🔗link
    • argoCD has been installed, if not check 🔗link
    • ingress has been installed on argoCD, if not check 🔗link
    • cert-manager has been installed on argoCD and a ClusterIssuer named `self-signed-ca-issuer` exists, if not check 🔗link

    Steps

    1. prepare elastic-search.yaml

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: elastic-search
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: elasticsearch
        targetRevision: 19.11.3
        helm:
          releaseName: elastic-search
          values: |
            global:
              kibanaEnabled: true
            clusterName: elastic
            image:
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            security:
              enabled: false
            service:
              type: ClusterIP
            ingress:
              enabled: true
              annotations:
                cert-manager.io/cluster-issuer: self-signed-ca-issuer
                nginx.ingress.kubernetes.io/rewrite-target: /$1
              hostname: elastic-search.dev.tech
              ingressClassName: nginx
              path: /?(.*)
              tls: true
            master:
              masterOnly: false
              replicaCount: 1
              persistence:
                enabled: false
              resources:
                requests:
                  cpu: 2
                  memory: 1024Mi
                limits:
                  cpu: 4
                  memory: 4096Mi
              heapSize: 2g
            data:
              replicaCount: 0
              persistence:
                enabled: false
            coordinating:
              replicaCount: 0
            ingest:
              enabled: true
              replicaCount: 0
              service:
                enabled: false
                type: ClusterIP
              ingress:
                enabled: false
            metrics:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            volumePermissions:
              enabled: false
              image:
                registry: m.zjvis.net/docker.io
                pullPolicy: IfNotPresent
            sysctlImage:
              enabled: true
              registry: m.zjvis.net/docker.io
              pullPolicy: IfNotPresent
            kibana:
              elasticsearch:
                hosts:
                  - '{{ include "elasticsearch.service.name" . }}'
                port: '{{ include "elasticsearch.service.ports.restAPI" . }}'
            esJavaOpts: "-Xmx2g -Xms2g"        
      destination:
        server: https://kubernetes.default.svc
        namespace: application

    2. apply to k8s

    kubectl -n argocd apply -f elastic-search.yaml

    3. sync by argocd

    argocd app sync argocd/elastic-search

    [Optional] Test REST API call

    add $K8S_MASTER_IP elastic-search.dev.tech to /etc/hosts

    curl -k "https://elastic-search.dev.tech:32443/?pretty"

    [Optional] Add Single Document

    curl -k -H "Content-Type: application/json" \
        -X POST "https://elastic-search.dev.tech:32443/books/_doc?pretty" \
        -d '{"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470}'

    Install Kafka

    Installation

    Install By

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. Helm binary has installed, if not check 🔗link


    1.get helm repo

    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm repo update

    2.install chart

    helm upgrade --create-namespace -n database kafka --install bitnami/kafka \
      --set global.imageRegistry=m.daocloud.io/docker.io \
      --set zookeeper.enabled=false \
      --set controller.replicaCount=1 \
      --set broker.replicaCount=1 \
      --set persistence.enabled=false \
      --version 28.0.3
    
    
    kubectl -n database \
      create secret generic client-properties \
      --from-literal=client.properties="$(printf "security.protocol=SASL_PLAINTEXT\nsasl.mechanism=SCRAM-SHA-256\nsasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username=\"user1\" password=\"$(kubectl get secret kafka-user-passwords --namespace database -o jsonpath='{.data.client-passwords}' | base64 -d | cut -d , -f 1)\";\n")"
    kubectl -n database apply -f - << EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kafka-client-tools
      labels:
        app: kafka-client-tools
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kafka-client-tools
      template:
        metadata:
          labels:
            app: kafka-client-tools
        spec:
          volumes:
          - name: client-properties
            secret:
              secretName: client-properties
          containers:
          - name: kafka-client-tools
            image: m.daocloud.io/docker.io/bitnami/kafka:3.6.2
            volumeMounts:
            - name: client-properties
              mountPath: /bitnami/custom/client.properties
              subPath: client.properties
              readOnly: true
            env:
            - name: BOOTSTRAP_SERVER
              value: kafka.database.svc.cluster.local:9092
            - name: CLIENT_CONFIG_FILE
              value: /bitnami/custom/client.properties
            command:
            - tail
            - -f
            - /etc/hosts
            imagePullPolicy: IfNotPresent
    EOF

    3.validate function

    - list topics
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
        'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --list'
    - create topic
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --create --if-not-exists --topic test-topic'
    - describe topic
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --describe --topic test-topic'
    - produce message
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'for message in $(seq 0 10); do echo $message | kafka-console-producer.sh --bootstrap-server $BOOTSTRAP_SERVER --producer.config $CLIENT_CONFIG_FILE --topic test-topic; done'
    - consume message
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVER --consumer.config $CLIENT_CONFIG_FILE --topic test-topic --from-beginning'
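    To clean up the test topic afterwards, the same client deployment can be reused; a minimal sketch following the pattern above:

    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --delete --topic test-topic'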

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. ArgoCD has installed, if not check 🔗link


    3. Helm binary has installed, if not check 🔗link


    1.prepare `deploy-kafka.yaml`

    kubectl -n argocd apply -f - << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: kafka
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: kafka
        targetRevision: 28.0.3
        helm:
          releaseName: kafka
          values: |
            image:
              registry: m.daocloud.io/docker.io
            controller:
              replicaCount: 1
              persistence:
                enabled: false
              logPersistence:
                enabled: false
              extraConfig: |
                message.max.bytes=5242880
                default.replication.factor=1
                offsets.topic.replication.factor=1
                transaction.state.log.replication.factor=1
            broker:
              replicaCount: 1
              persistence:
                enabled: false
              logPersistence:
                enabled: false
              extraConfig: |
                message.max.bytes=5242880
                default.replication.factor=1
                offsets.topic.replication.factor=1
                transaction.state.log.replication.factor=1
            externalAccess:
              enabled: false
              autoDiscovery:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
            volumePermissions:
              enabled: false
              image:
                registry: m.daocloud.io/docker.io
            metrics:
              kafka:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
              jmx:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
            provisioning:
              enabled: false
            kraft:
              enabled: true
            zookeeper:
              enabled: false
      destination:
        server: https://kubernetes.default.svc
        namespace: database
    EOF
    kubectl -n argocd apply -f - << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: kafka
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: kafka
        targetRevision: 28.0.3
        helm:
          releaseName: kafka
          values: |
            image:
              registry: m.daocloud.io/docker.io
            listeners:
              client:
                protocol: PLAINTEXT
              interbroker:
                protocol: PLAINTEXT
            controller:
              replicaCount: 0
              persistence:
                enabled: false
              logPersistence:
                enabled: false
              extraConfig: |
                message.max.bytes=5242880
                default.replication.factor=1
                offsets.topic.replication.factor=1
                transaction.state.log.replication.factor=1
            broker:
              replicaCount: 1
              minId: 0
              persistence:
                enabled: false
              logPersistence:
                enabled: false
              extraConfig: |
                message.max.bytes=5242880
                default.replication.factor=1
                offsets.topic.replication.factor=1
                transaction.state.log.replication.factor=1
            externalAccess:
              enabled: false
              autoDiscovery:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
            volumePermissions:
              enabled: false
              image:
                registry: m.daocloud.io/docker.io
            metrics:
              kafka:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
              jmx:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
            provisioning:
              enabled: false
            kraft:
              enabled: false
            zookeeper:
              enabled: true
              image:
                registry: m.daocloud.io/docker.io
              replicaCount: 1
              auth:
                client:
                  enabled: false
                quorum:
                  enabled: false
              persistence:
                enabled: false
              volumePermissions:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
              metrics:
                enabled: false
              tls:
                client:
                  enabled: false
                quorum:
                  enabled: false
      destination:
        server: https://kubernetes.default.svc
        namespace: database
    EOF

    2.sync by argocd

    argocd app sync argocd/kafka

    3.set up client tool

    kubectl -n database \
        create secret generic client-properties \
        --from-literal=client.properties="$(printf "security.protocol=SASL_PLAINTEXT\nsasl.mechanism=SCRAM-SHA-256\nsasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username=\"user1\" password=\"$(kubectl get secret kafka-user-passwords --namespace database -o jsonpath='{.data.client-passwords}' | base64 -d | cut -d , -f 1)\";\n")"
    kubectl -n database \
        create secret generic client-properties \
        --from-literal=client.properties="security.protocol=PLAINTEXT"

    4.prepare `kafka-client-tools.yaml`

    kubectl -n database apply -f - << EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kafka-client-tools
      labels:
        app: kafka-client-tools
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kafka-client-tools
      template:
        metadata:
          labels:
            app: kafka-client-tools
        spec:
          volumes:
          - name: client-properties
            secret:
              secretName: client-properties
          containers:
          - name: kafka-client-tools
            image: m.daocloud.io/docker.io/bitnami/kafka:3.6.2
            volumeMounts:
            - name: client-properties
              mountPath: /bitnami/custom/client.properties
              subPath: client.properties
              readOnly: true
            env:
            - name: BOOTSTRAP_SERVER
              value: kafka.database.svc.cluster.local:9092
            - name: CLIENT_CONFIG_FILE
              value: /bitnami/custom/client.properties
            - name: ZOOKEEPER_CONNECT
              value: kafka-zookeeper.database.svc.cluster.local:2181
            command:
            - tail
            - -f
            - /etc/hosts
            imagePullPolicy: IfNotPresent
    EOF

    5.validate function

    - list topics
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
        'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --list'
    - create topic
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --create --if-not-exists --topic test-topic'
    - describe topic
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-topics.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --describe --topic test-topic'
    - produce message
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'for message in $(seq 0 10); do echo $message | kafka-console-producer.sh --bootstrap-server $BOOTSTRAP_SERVER --producer.config $CLIENT_CONFIG_FILE --topic test-topic; done'
    - consume message
    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVER --consumer.config $CLIENT_CONFIG_FILE --topic test-topic --from-beginning'
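    Consumer offsets can also be inspected from the same client pod with kafka-consumer-groups.sh; a minimal sketch:

    kubectl -n database exec -it deployment/kafka-client-tools -- bash -c \
      'kafka-consumer-groups.sh --bootstrap-server $BOOTSTRAP_SERVER --command-config $CLIENT_CONFIG_FILE --list'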

    Preliminary

    1. Docker has installed, if not check 🔗link


    Using Proxy

    you can pull images through an additional DaoCloud mirror to accelerate image pulling, check Daocloud Proxy

    1.init server

    mkdir -p kafka/data
    chmod -R 777 kafka/data
    podman run --rm \
        --name kafka-server \
        --hostname kafka-server \
        -p 9092:9092 \
        -p 9094:9094 \
        -v $(pwd)/kafka/data:/bitnami/kafka/data \
        -e KAFKA_CFG_NODE_ID=0 \
        -e KAFKA_CFG_PROCESS_ROLES=controller,broker \
        -e KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka-server:9093 \
        -e KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://:9094 \
        -e KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://host.containers.internal:9094 \
        -e KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT \
        -e KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER \
        -d m.daocloud.io/docker.io/bitnami/kafka:3.6.2

    2.list topic

    BOOTSTRAP_SERVER=host.containers.internal:9094
    podman run --rm \
        -it m.daocloud.io/docker.io/bitnami/kafka:3.6.2 kafka-topics.sh \
            --bootstrap-server $BOOTSTRAP_SERVER --list

    3.create topic

    BOOTSTRAP_SERVER=host.containers.internal:9094
    # BOOTSTRAP_SERVER=10.200.60.64:9094
    TOPIC=test-topic
    podman run --rm \
        -it m.daocloud.io/docker.io/bitnami/kafka:3.6.2 kafka-topics.sh \
            --bootstrap-server $BOOTSTRAP_SERVER \
            --create \
            --if-not-exists \
            --topic $TOPIC

    4.consume record

    BOOTSTRAP_SERVER=host.containers.internal:9094
    # BOOTSTRAP_SERVER=10.200.60.64:9094
    TOPIC=test-topic
    podman run --rm \
        -it m.daocloud.io/docker.io/bitnami/kafka:3.6.2 kafka-console-consumer.sh \
            --bootstrap-server $BOOTSTRAP_SERVER \
            --topic $TOPIC \
            --from-beginning
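    A matching producer can be run the same way to push a few test records before consuming; a minimal sketch, assuming the server from step 1 is still running:

    BOOTSTRAP_SERVER=host.containers.internal:9094
    TOPIC=test-topic
    podman run --rm \
        -it m.daocloud.io/docker.io/bitnami/kafka:3.6.2 bash -c \
            "for message in \$(seq 0 10); do echo \$message | kafka-console-producer.sh --bootstrap-server $BOOTSTRAP_SERVER --topic $TOPIC; done"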


    Install MariaDB

    Installation

    Install By

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. Helm has installed, if not check 🔗link


    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. argoCD has installed, if not check 🔗link


    3. cert-manager has been installed on argocd and a ClusterIssuer named `self-signed-ca-issuer` exists, if not check 🔗link


    1.prepare mariadb credentials secret

    kubectl get namespaces database > /dev/null 2>&1 || kubectl create namespace database
    kubectl -n database create secret generic mariadb-credentials \
        --from-literal=mariadb-root-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) \
        --from-literal=mariadb-replication-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) \
        --from-literal=mariadb-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

    2.prepare `deploy-mariadb.yaml`

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: mariadb
    spec:
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
      project: default
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: mariadb
        targetRevision: 16.3.2
        helm:
          releaseName: mariadb
          values: |
            architecture: standalone
            auth:
              database: test-mariadb
              username: aaron.yang
              existingSecret: mariadb-credentials
            primary:
              extraFlags: "--character-set-server=utf8mb4 --collation-server=utf8mb4_bin"
              persistence:
                enabled: false
            secondary:
              replicaCount: 1
              persistence:
                enabled: false
            image:
              registry: m.daocloud.io/docker.io
              pullPolicy: IfNotPresent
            volumePermissions:
              enabled: false
              image:
                registry: m.daocloud.io/docker.io
                pullPolicy: IfNotPresent
            metrics:
              enabled: false
              image:
                registry: m.daocloud.io/docker.io
                pullPolicy: IfNotPresent
      destination:
        server: https://kubernetes.default.svc
        namespace: database

    3.deploy mariadb

    kubectl -n argocd apply -f deploy-mariadb.yaml

    4.sync by argocd

    argocd app sync argocd/mariadb

    5.check mariadb

    kubectl -n database get secret mariadb-credentials -o jsonpath='{.data.mariadb-root-password}' | base64 -d
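    A quick connectivity check can then be run from inside the server pod itself; a minimal sketch, assuming the chart created a statefulset named `mariadb` (pod `mariadb-0`) in the `database` namespace:

    kubectl -n database exec -it mariadb-0 -- bash -c \
        'mysql -h 127.0.0.1 -uroot -p$MARIADB_ROOT_PASSWORD -e "select version()"'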

    Preliminary

    1. Kubernetes has installed, if not check 🔗link


    2. ArgoCD has installed, if not check 🔗link


    3. Argo Workflow has installed, if not check 🔗link


    1.prepare `argocd-login-credentials`

    kubectl get namespace business-workflows > /dev/null 2>&1 || kubectl create namespace business-workflows
    # assuming the built-in `admin` account is used; change the username/password to match your ArgoCD setup
    kubectl -n business-workflows create secret generic argocd-login-credentials \
        --from-literal=username=admin \
        --from-literal=password=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath='{.data.password}' | base64 -d)

    2.apply rolebinding to k8s

    kubectl -n argocd apply -f - <<EOF
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: application-administrator
    rules:
      - apiGroups:
          - argoproj.io
        resources:
          - applications
        verbs:
          - '*'
      - apiGroups:
          - apps
        resources:
          - deployments
        verbs:
          - '*'
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: argocd
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: application
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    EOF

    3.prepare mariadb credentials secret

    kubectl -n application create secret generic mariadb-credentials \
      --from-literal=mariadb-root-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) \
      --from-literal=mariadb-replication-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) \
      --from-literal=mariadb-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

    4.prepare `deploy-mariadb-flow.yaml`

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: deploy-argocd-app-mariadb-
    spec:
      entrypoint: entry
      artifactRepositoryRef:
        configmap: artifact-repositories
        key: default-artifact-repository
      serviceAccountName: argo-workflow
      templates:
      - name: entry
        inputs:
          parameters:
          - name: argocd-server
            value: argo-cd-argocd-server.argocd:443
          - name: insecure-option
            value: --insecure
        dag:
          tasks:
          - name: apply
            template: apply
          - name: prepare-argocd-binary
            template: prepare-argocd-binary
            dependencies:
            - apply
          - name: sync
            dependencies:
            - prepare-argocd-binary
            template: sync
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
          - name: wait
            dependencies:
            - sync
            template: wait
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
          - name: init-db-tool
            template: init-db-tool
            dependencies:
            - wait
      - name: apply
        resource:
          action: apply
          manifest: |
            apiVersion: argoproj.io/v1alpha1
            kind: Application
            metadata:
              name: app-mariadb
              namespace: argocd
            spec:
              syncPolicy:
                syncOptions:
                - CreateNamespace=true
              project: default
              source:
                repoURL: https://charts.bitnami.com/bitnami
                chart: mariadb
                targetRevision: 16.5.0
                helm:
                  releaseName: app-mariadb
                  values: |
                    architecture: standalone
                    auth:
                      database: geekcity
                      username: aaron.yang
                      existingSecret: mariadb-credentials
                    primary:
                      persistence:
                        enabled: false
                    secondary:
                      replicaCount: 1
                      persistence:
                        enabled: false
                    image:
                      registry: m.daocloud.io/docker.io
                      pullPolicy: IfNotPresent
                    volumePermissions:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
                    metrics:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
              destination:
                server: https://kubernetes.default.svc
                namespace: application
      - name: prepare-argocd-binary
        inputs:
          artifacts:
          - name: argocd-binary
            path: /tmp/argocd
            mode: 755
            http:
              url: https://files.m.daocloud.io/github.com/argoproj/argo-cd/releases/download/v2.9.3/argocd-linux-amd64
        outputs:
          artifacts:
          - name: argocd-binary
            path: "{{inputs.artifacts.argocd-binary.path}}"
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          command:
          - sh
          - -c
          args:
          - |
            ls -l {{inputs.artifacts.argocd-binary.path}}
      - name: sync
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          - name: WITH_PRUNE_OPTION
            value: --prune
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app sync argocd/app-mariadb ${WITH_PRUNE_OPTION} --timeout 300
      - name: wait
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app wait argocd/app-mariadb
      - name: init-db-tool
        resource:
          action: apply
          manifest: |
            apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: app-mariadb-tool
              namespace: application
              labels:
                app.kubernetes.io/name: mariadb-tool
            spec:
              replicas: 1
              selector:
                matchLabels:
                  app.kubernetes.io/name: mariadb-tool
              template:
                metadata:
                  labels:
                    app.kubernetes.io/name: mariadb-tool
                spec:
                  containers:
                    - name: mariadb-tool
                      image:  m.daocloud.io/docker.io/bitnami/mariadb:10.5.12-debian-10-r0
                      imagePullPolicy: IfNotPresent
                      env:
                        - name: MARIADB_ROOT_PASSWORD
                          valueFrom:
                            secretKeyRef:
                              key: mariadb-root-password
                              name: mariadb-credentials
                        - name: TZ
                          value: Asia/Shanghai

    5.submit to argo workflow client

    argo -n business-workflows submit deploy-mariadb-flow.yaml

    6.decode password

    kubectl -n application get secret mariadb-credentials -o jsonpath='{.data.mariadb-root-password}' | base64 -d
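    A quick connectivity check can be run through the tool deployment created by the workflow; a minimal sketch, assuming the chart service is named `app-mariadb` in the `application` namespace:

    kubectl -n application exec -it deployment/app-mariadb-tool -- bash -c \
        'mysql -h app-mariadb.application -uroot -p$MARIADB_ROOT_PASSWORD -e "select version()"'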

    Preliminary

    1. Docker has installed, if not check 🔗link


    Using Proxy

    you can pull images through an additional DaoCloud mirror to accelerate image pulling, check Daocloud Proxy

    1.init server

    mkdir -p mariadb/data
    podman run  \
        -p 3306:3306 \
        -e MARIADB_ROOT_PASSWORD=mysql \
        -d m.daocloud.io/docker.io/library/mariadb:11.2.2-jammy \
        --log-bin \
        --binlog-format=ROW

    2.use web console

    And then you can visit 🔗http://localhost:8080

    username: `root`

    password: `mysql`

    podman run --rm -p 8080:80 \
        -e PMA_ARBITRARY=1 \
        -d m.daocloud.io/docker.io/library/phpmyadmin:5.1.1-apache

    3.use internal client

    podman run --rm \
        -e MYSQL_PWD=mysql \
        -it m.daocloud.io/docker.io/library/mariadb:11.2.2-jammy \
        mariadb \
        --host host.containers.internal \
        --port 3306 \
        --user root \
        --database mysql \
        --execute 'select version()'

    Useful SQL

    1. list all bin logs
    SHOW BINARY LOGS;
    2. delete previous bin logs
    PURGE BINARY LOGS TO 'mysqld-bin.0000003'; # delete mysqld-bin.0000001 and mysqld-bin.0000002
    PURGE BINARY LOGS BEFORE 'yyyy-MM-dd HH:mm:ss';
    PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY); # delete bin log files older than three days

    If you are using master-slave mode, you can replace BINARY with MASTER in the statements above.
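    Instead of purging by hand, the server can expire old binary logs automatically; a minimal sketch reusing the client pattern above (binlog_expire_logs_seconds assumes MariaDB 10.6 or newer; 259200 seconds is 3 days):

    podman run --rm \
        -e MYSQL_PWD=mysql \
        -it m.daocloud.io/docker.io/library/mariadb:11.2.2-jammy \
        mariadb \
        --host host.containers.internal \
        --port 3306 \
        --user root \
        --execute "SET GLOBAL binlog_expire_logs_seconds = 259200;"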


    Install Milvus

    Preliminary

    • Kubernetes has been installed, if not check link
    • argoCD has been installed, if not check link
    • cert-manager has been installed on argocd and a ClusterIssuer named self-signed-ca-issuer exists, if not check link
    • minio has been installed, if not check link

    Steps

    1. copy minio credentials secret

    kubectl get namespaces database > /dev/null 2>&1 || kubectl create namespace database
    kubectl -n storage get secret minio-secret -o json \
        | jq 'del(.metadata["namespace","creationTimestamp","resourceVersion","selfLink","uid"])' \
        | kubectl -n database apply -f -

    2. prepare deploy-milvus.yaml

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: milvus
    spec:
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
      project: default
      source:
        repoURL: registry-1.docker.io/bitnamicharts
        chart: milvus
        targetRevision: 11.2.4
        helm:
          releaseName: milvus
          values: |
            global:
              security:
                allowInsecureImages: true
            milvus:
              image:
                registry: m.lab.zverse.space/docker.io
                repository: bitnami/milvus
                tag: 2.5.7-debian-12-r0
                pullPolicy: IfNotPresent
              auth:
                enabled: false
            initJob:
              forceRun: false
              image:
                registry: m.lab.zverse.space/docker.io
                repository: bitnami/pymilvus
                tag: 2.5.6-debian-12-r0
                pullPolicy: IfNotPresent
              resources:
                requests:
                  cpu: 2
                  memory: 512Mi
                limits:
                  cpu: 2
                  memory: 2Gi
            dataCoord:
              replicaCount: 1
              resources:
                requests:
                  cpu: 500m
                  memory: 512Mi
                limits:
                  cpu: 2
                  memory: 2Gi
              metrics:
                enabled: true
                
            rootCoord:
              replicaCount: 1
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
            queryCoord:
              replicaCount: 1
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
            indexCoord:
              replicaCount: 1
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
            dataNode:
              replicaCount: 1
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
            queryNode:
              replicaCount: 1
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 2Gi
            indexNode:
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 2Gi
            proxy:
              replicaCount: 1
              service:
                type: ClusterIP
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 2Gi
            attu:
              image:
                registry: m.lab.zverse.space/docker.io
                repository: bitnami/attu
                tag: 2.5.5-debian-12-r1
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
              service:
                type: ClusterIP
              ingress:
                enabled: true
                ingressClassName: "nginx"
                annotations:
                  cert-manager.io/cluster-issuer: alidns-webhook-zverse-letsencrypt
                hostname: milvus.dev.tech
                path: /
                pathType: ImplementationSpecific
                tls: true
            waitContainer:
              image:
                registry: m.lab.zverse.space/docker.io
                repository: bitnami/os-shell
                tag: 12-debian-12-r40
                pullPolicy: IfNotPresent
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
            externalS3:
              host: "minio.storage"
              port: 9000
              existingSecret: "minio-secret"
              existingSecretAccessKeyIDKey: "root-user"
              existingSecretKeySecretKey: "root-password"
              bucket: "milvus"
              rootPath: "file"
            etcd:
              enabled: true
              image:
                registry: m.lab.zverse.space/docker.io
              replicaCount: 1
              auth:
                rbac:
                  create: false
                client:
                  secureTransport: false
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 2Gi
              persistence:
                enabled: true
                storageClass: ""
                size: 2Gi
              preUpgradeJob:
                enabled: false
            minio:
              enabled: false
            kafka:
              enabled: true
              image:
                registry: m.lab.zverse.space/docker.io
              controller:
                replicaCount: 1
                livenessProbe:
                  failureThreshold: 8
                resources:
                  requests:
                    cpu: 500m
                    memory: 1Gi
                  limits:
                    cpu: 2
                    memory: 2Gi
                persistence:
                  enabled: true
                  storageClass: ""
                  size: 2Gi
              service:
                ports:
                  client: 9092
              extraConfig: |-
                offsets.topic.replication.factor=3
              listeners:
                client:
                  protocol: PLAINTEXT
                interbroker:
                  protocol: PLAINTEXT
                external:
                  protocol: PLAINTEXT
              sasl:
                enabledMechanisms: "PLAIN"
                client:
                  users:
                    - user
              broker:
                replicaCount: 0
      destination:
        server: https://kubernetes.default.svc
        namespace: database

    3. apply to k8s

    kubectl -n argocd apply -f deploy-milvus.yaml

    4. sync by argocd

    argocd app sync argocd/milvus

    5. check Attu WebUI

    milvus address: milvus-proxy:19530

    milvus database: default

    https://milvus.dev.tech:32443/#/
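    To quickly verify that the Milvus deployment is healthy before opening Attu, you can probe the proxy's health endpoint. This is a minimal sketch: it assumes the proxy service is named `milvus-proxy` in the `database` namespace (as implied by the address above) and that the service also exposes the 9091 metrics/health port.

    # forward the proxy's metrics port locally (assumption: the service exposes 9091)
    kubectl -n database port-forward svc/milvus-proxy 9091:9091 &
    # /healthz should answer OK once all Milvus components are up
    curl http://127.0.0.1:9091/healthz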

    6. [Optional] import data

    import data using a SQL file

    MARIADB_ROOT_PASSWORD=$(kubectl -n database get secret mariadb-credentials -o jsonpath='{.data.mariadb-root-password}' | base64 -d)
    POD_NAME=$(kubectl get pod -n database -l "app.kubernetes.io/name=mariadb-tool" -o jsonpath="{.items[0].metadata.name}") \
    && export SQL_FILENAME="Dump20240301.sql" \
    && kubectl -n database cp ${SQL_FILENAME} ${POD_NAME}:/tmp/${SQL_FILENAME} \
    && kubectl -n database exec -it deployment/app-mariadb-tool -- bash -c \
        'echo "create database ccds;" | mysql -h mariadb.database -uroot -p$MARIADB_ROOT_PASSWORD' \
    && kubectl -n database exec -it ${POD_NAME} -- bash -c \
        "mysql -h mariadb.database -uroot -p\${MARIADB_ROOT_PASSWORD} \
        ccds < /tmp/Dump20240301.sql"
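    As a quick sanity check after the import (a sketch reusing the `POD_NAME` variable and the in-pod `MARIADB_ROOT_PASSWORD` convention from the commands above), list the tables that were created:

    kubectl -n database exec -it ${POD_NAME} -- bash -c \
        'mysql -h mariadb.database -uroot -p${MARIADB_ROOT_PASSWORD} -e "SHOW TABLES;" ccds'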

    7. [Optional] decode password

    kubectl -n database get secret mariadb-credentials -o jsonpath='{.data.mariadb-root-password}' | base64 -d

    8. [Optional] execute SQL in pod

    kubectl -n database exec -it xxxx -- bash
    mariadb -h 127.0.0.1 -u root -p$MARIADB_ROOT_PASSWORD

    And then you can check connections by running:

    show status like  'Threads%';

    Install Neo4j

    Installation

    Install By

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link


    1.get helm repo

    helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
    helm repo update

    2.install chart

    helm install ay-helm-mirror/kube-prometheus-stack --generate-name
    Using Proxy

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link

    3. ArgoCD has been installed, if not check 🔗link


    1.prepare `deploy-xxxxx.yaml`

    2.apply to k8s

    kubectl -n argocd apply -f xxxx.yaml

    3.sync by argocd

    argocd app sync argocd/xxxx

    4.prepare yaml-content.yaml

    5.apply to k8s

    kubectl apply -f xxxx.yaml

    6.apply xxxx.yaml directly

    kubectl apply -f - <<EOF
    
    EOF

    Preliminary

    1. Docker|Podman|Buildah has been installed, if not check 🔗link


    Using Proxy

    you can run an additional DaoCloud proxy image to accelerate image pulling, check Daocloud Proxy

    1.init server

    mkdir -p neo4j/data
    podman run --rm \
        --name neo4j \
        -p 7474:7474 \
        -p 7687:7687 \
        -e NEO4J_AUTH=neo4j/mysql \
        -v $(pwd)/neo4j/data:/data \
        -d docker.io/library/neo4j:5.18.0-community-bullseye
    and then you can visit 🔗[http://localhost:7474]


    username: `neo4j`
    password: `mysql`
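    To verify the instance from the command line (a sketch assuming `cypher-shell` is bundled in the community image, which it normally is), run a trivial query inside the container:

    podman exec -it neo4j cypher-shell -u neo4j -p mysql "RETURN 1 AS ok;"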

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link

    3. ArgoCD has been installed, if not check 🔗link

    4. Argo Workflow has been installed, if not check 🔗link


    1.prepare `argocd-login-credentials`

    kubectl get namespaces database > /dev/null 2>&1 || kubectl create namespace database

    2.apply rolebinding to k8s

    kubectl apply -f - <<EOF
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: application-administrator
    rules:
      - apiGroups:
          - argoproj.io
        resources:
          - applications
        verbs:
          - '*'
      - apiGroups:
          - apps
        resources:
          - deployments
        verbs:
          - '*'
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: argocd
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: application
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    EOF

    3.prepare `deploy-xxxx-flow.yaml`

    4.submit to argo workflow client

    argo -n business-workflows submit deploy-xxxx-flow.yaml

    5.decode password

    kubectl -n application get secret xxxx-credentials -o jsonpath='{.data.xxx-password}' | base64 -d

    FAQ


    Install Postgresql

    Installation

    Install By

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link


    1.get helm repo

    helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
    helm repo update

    2.install chart

    helm install ay-helm-mirror/kube-prometheus-stack --generate-name
    Using Proxy

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link

    3. ArgoCD has been installed, if not check 🔗link


    1.prepare `deploy-xxxxx.yaml`

    2.apply to k8s

    kubectl -n argocd apply -f xxxx.yaml

    3.sync by argocd

    argocd app sync argocd/xxxx

    4.prepare yaml-content.yaml

    5.apply to k8s

    kubectl apply -f xxxx.yaml

    6.apply xxxx.yaml directly

    kubectl apply -f - <<EOF
    
    EOF

    Preliminary

    1. Docker|Podman|Buildah has been installed, if not check 🔗link


    Using Proxy

    you can run an additional DaoCloud proxy image to accelerate image pulling, check Daocloud Proxy

    1.init server

    mkdir -p $(pwd)/postgresql/data
    podman run --rm \
        --name postgresql \
        -p 5432:5432 \
        -e POSTGRES_PASSWORD=postgresql \
        -e PGDATA=/var/lib/postgresql/data/pgdata \
        -v $(pwd)/postgresql/data:/var/lib/postgresql/data \
        -d docker.io/library/postgres:15.2-alpine3.17

    2.use web console

    podman run --rm \
      -p 8080:80 \
      -e 'PGADMIN_DEFAULT_EMAIL=ben.wangz@foxmail.com' \
      -e 'PGADMIN_DEFAULT_PASSWORD=123456' \
      -d docker.io/dpage/pgadmin4:6.15
    And then you can visit 🔗[http://localhost:8080]


    3.use internal client

    podman run --rm \
        --env PGPASSWORD=postgresql \
        --entrypoint psql \
        -it docker.io/library/postgres:15.2-alpine3.17 \
        --host host.containers.internal \
        --port 5432 \
        --username postgres \
        --dbname postgres \
        --command 'select version()'

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link

    3. ArgoCD has been installed, if not check 🔗link

    4. Argo Workflow has been installed, if not check 🔗link


    5. Minio artifact repository has been configured, if not check 🔗link


    - endpoint: minio.storage:9000

    1.prepare `argocd-login-credentials`

    kubectl get namespaces database > /dev/null 2>&1 || kubectl create namespace database
    ARGOCD_USERNAME=admin
    ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
    kubectl -n business-workflows create secret generic argocd-login-credentials \
        --from-literal=username=${ARGOCD_USERNAME} \
        --from-literal=password=${ARGOCD_PASSWORD}
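    As a quick optional check, confirm the secret landed in the `business-workflows` namespace before the workflow tries to use it:

    kubectl -n business-workflows get secret argocd-login-credentials -o jsonpath='{.data.username}' | base64 -d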

    2.apply rolebinding to k8s

    kubectl apply -f - <<EOF
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: application-administrator
    rules:
      - apiGroups:
          - argoproj.io
        resources:
          - applications
        verbs:
          - '*'
      - apiGroups:
          - apps
        resources:
          - deployments
        verbs:
          - '*'
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: argocd
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: application
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    EOF

    3.prepare postgresql admin credentials secret

    kubectl -n application create secret generic postgresql-credentials \
      --from-literal=postgres-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) \
      --from-literal=password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) \
      --from-literal=replication-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

    4.prepare `deploy-postgresql-flow.yaml`

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: deploy-argocd-app-pg-
    spec:
      entrypoint: entry
      artifactRepositoryRef:
        configmap: artifact-repositories
        key: default-artifact-repository
      serviceAccountName: argo-workflow
      templates:
      - name: entry
        inputs:
          parameters:
          - name: argocd-server
            value: argo-cd-argocd-server.argocd:443
          - name: insecure-option
            value: --insecure
        dag:
          tasks:
          - name: apply
            template: apply
          - name: prepare-argocd-binary
            template: prepare-argocd-binary
            dependencies:
            - apply
          - name: sync
            dependencies:
            - prepare-argocd-binary
            template: sync
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
          - name: wait
            dependencies:
            - sync
            template: wait
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
          - name: init-db-tool
            template: init-db-tool
            dependencies:
            - wait
      - name: apply
        resource:
          action: apply
          manifest: |
            apiVersion: argoproj.io/v1alpha1
            kind: Application
            metadata:
              name: app-postgresql
              namespace: argocd
            spec:
              syncPolicy:
                syncOptions:
                - CreateNamespace=true
              project: default
              source:
                repoURL: https://charts.bitnami.com/bitnami
                chart: postgresql
                targetRevision: 14.2.2
                helm:
                  releaseName: app-postgresql
                  values: |
                    architecture: standalone
                    auth:
                      database: geekcity
                      username: aaron.yang
                      existingSecret: postgresql-credentials
                    primary:
                      persistence:
                        enabled: false
                    readReplicas:
                      replicaCount: 1
                      persistence:
                        enabled: false
                    backup:
                      enabled: false
                    image:
                      registry: m.daocloud.io/docker.io
                      pullPolicy: IfNotPresent
                    volumePermissions:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
                    metrics:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
              destination:
                server: https://kubernetes.default.svc
                namespace: application
      - name: prepare-argocd-binary
        inputs:
          artifacts:
          - name: argocd-binary
            path: /tmp/argocd
            mode: 755
            http:
              url: https://files.m.daocloud.io/github.com/argoproj/argo-cd/releases/download/v2.9.3/argocd-linux-amd64
        outputs:
          artifacts:
          - name: argocd-binary
            path: "{{inputs.artifacts.argocd-binary.path}}"
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          command:
          - sh
          - -c
          args:
          - |
            ls -l {{inputs.artifacts.argocd-binary.path}}
      - name: sync
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          - name: WITH_PRUNE_OPTION
            value: --prune
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app sync argocd/app-postgresql ${WITH_PRUNE_OPTION} --timeout 300
      - name: wait
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app wait argocd/app-postgresql
      - name: init-db-tool
        resource:
          action: apply
          manifest: |
            apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: app-postgresql-tool
              namespace: application
              labels:
                app.kubernetes.io/name: postgresql-tool
            spec:
              replicas: 1
              selector:
                matchLabels:
                  app.kubernetes.io/name: postgresql-tool
              template:
                metadata:
                  labels:
                    app.kubernetes.io/name: postgresql-tool
                spec:
                  containers:
                    - name: postgresql-tool
                      image: m.daocloud.io/docker.io/bitnami/postgresql:14.4.0-debian-11-r9
                      imagePullPolicy: IfNotPresent
                      env:
                        - name: POSTGRES_PASSWORD
                          valueFrom:
                            secretKeyRef:
                              key: postgres-password
                              name: postgresql-credentials
                        - name: TZ
                          value: Asia/Shanghai
                      command:
                        - tail
                      args:
                        - -f
                        - /etc/hosts

    5.submit to argo workflow client

    argo -n business-workflows submit deploy-postgresql-flow.yaml

    6.decode password

    kubectl -n application get secret postgresql-credentials -o jsonpath='{.data.postgres-password}' | base64 -d

    7.import data

    POSTGRES_PASSWORD=$(kubectl -n application get secret postgresql-credentials -o jsonpath='{.data.postgres-password}' | base64 -d) \
    POD_NAME=$(kubectl get pod -n application -l "app.kubernetes.io/name=postgresql-tool" -o jsonpath="{.items[0].metadata.name}") \
    && export SQL_FILENAME="init_dfs_table_data.sql" \
    && kubectl -n application cp ${SQL_FILENAME} ${POD_NAME}:/tmp/${SQL_FILENAME} \
    && kubectl -n application exec -it deployment/app-postgresql-tool -- bash -c \
        'echo "CREATE DATABASE csst;" | PGPASSWORD="$POSTGRES_PASSWORD" \
        psql --host app-postgresql.application -U postgres -d postgres -p 5432' \
    && kubectl -n application exec -it deployment/app-postgresql-tool -- bash -c \
        'PGPASSWORD="$POSTGRES_PASSWORD" psql --host app-postgresql.application \
        -U postgres -d csst -p 5432 < /tmp/init_dfs_table_data.sql'
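    To confirm the import (a sketch following the same in-pod `POSTGRES_PASSWORD` convention as the commands above), list the tables in the new database:

    kubectl -n application exec -it deployment/app-postgresql-tool -- bash -c \
        'PGPASSWORD="$POSTGRES_PASSWORD" psql --host app-postgresql.application \
        -U postgres -d csst -p 5432 -c "\dt"'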

    FAQ


    Install Redis

    Installation

    Install By

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link


    1.get helm repo

    helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
    helm repo update

    2.install chart

    helm install ay-helm-mirror/kube-prometheus-stack --generate-name
    Using Proxy

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link

    3. ArgoCD has been installed, if not check 🔗link


    1.prepare `deploy-xxxxx.yaml`

    2.apply to k8s

    kubectl -n argocd apply -f xxxx.yaml

    3.sync by argocd

    argocd app sync argocd/xxxx

    4.prepare yaml-content.yaml

    5.apply to k8s

    kubectl apply -f xxxx.yaml

    6.apply xxxx.yaml directly

    kubectl apply -f - <<EOF
    
    EOF

    Preliminary

    1. Docker|Podman|Buildah has been installed, if not check 🔗link


    Using Proxy

    you can run an additional DaoCloud proxy image to accelerate image pulling, check Daocloud Proxy

    1.init server

    mkdir -p $(pwd)/redis/data
    podman run --rm \
        --name redis \
        -p 6379:6379 \
        -v $(pwd)/redis/data:/data \
        -d docker.io/library/redis:7.2.4-alpine

    2.use internal client

    podman run --rm \
        -it docker.io/library/redis:7.2.4-alpine \
        redis-cli \
        -h host.containers.internal \
        set mykey somevalue
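    and you can read the value back to confirm the write (a minimal sketch mirroring the command above):

    podman run --rm \
        -it docker.io/library/redis:7.2.4-alpine \
        redis-cli \
        -h host.containers.internal \
        get mykey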

    Preliminary

    1. Kubernetes has been installed, if not check 🔗link

    2. Helm has been installed, if not check 🔗link

    3. ArgoCD has been installed, if not check 🔗link

    4. Argo Workflow has been installed, if not check 🔗link


    5. Minio artifact repository has been configured, if not check 🔗link


    - endpoint: minio.storage:9000

    1.prepare `argocd-login-credentials`

    ARGOCD_USERNAME=admin
    ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
    kubectl -n business-workflows create secret generic argocd-login-credentials \
        --from-literal=username=${ARGOCD_USERNAME} \
        --from-literal=password=${ARGOCD_PASSWORD}

    2.apply rolebinding to k8s

    kubectl apply -f - <<EOF
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: application-administrator
    rules:
      - apiGroups:
          - argoproj.io
        resources:
          - applications
        verbs:
          - '*'
      - apiGroups:
          - apps
        resources:
          - deployments
        verbs:
          - '*'
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: argocd
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: application-administration
      namespace: application
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: application-administrator
    subjects:
      - kind: ServiceAccount
        name: argo-workflow
        namespace: business-workflows
    EOF

    3.prepare redis credentials secret

    kubectl -n application create secret generic redis-credentials \
      --from-literal=redis-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

    4.prepare `deploy-redis-flow.yaml`

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: deploy-argocd-app-redis-
    spec:
      entrypoint: entry
      artifactRepositoryRef:
        configmap: artifact-repositories
        key: default-artifact-repository
      serviceAccountName: argo-workflow
      templates:
      - name: entry
        inputs:
          parameters:
          - name: argocd-server
            value: argocd-server.argocd:443
          - name: insecure-option
            value: --insecure
        dag:
          tasks:
          - name: apply
            template: apply
          - name: prepare-argocd-binary
            template: prepare-argocd-binary
            dependencies:
            - apply
          - name: sync
            dependencies:
            - prepare-argocd-binary
            template: sync
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
          - name: wait
            dependencies:
            - sync
            template: wait
            arguments:
              artifacts:
              - name: argocd-binary
                from: "{{tasks.prepare-argocd-binary.outputs.artifacts.argocd-binary}}"
              parameters:
              - name: argocd-server
                value: "{{inputs.parameters.argocd-server}}"
              - name: insecure-option
                value: "{{inputs.parameters.insecure-option}}"
      - name: apply
        resource:
          action: apply
          manifest: |
            apiVersion: argoproj.io/v1alpha1
            kind: Application
            metadata:
              name: app-redis
              namespace: argocd
            spec:
              syncPolicy:
                syncOptions:
                - CreateNamespace=true
              project: default
              source:
                repoURL: https://charts.bitnami.com/bitnami
                chart: redis
                targetRevision: 18.16.0
                helm:
                  releaseName: app-redis
                  values: |
                    architecture: replication
                    auth:
                      enabled: true
                      sentinel: true
                      existingSecret: redis-credentials
                    master:
                      count: 1
                      disableCommands:
                        - FLUSHDB
                        - FLUSHALL
                      persistence:
                        enabled: false
                    replica:
                      replicaCount: 3
                      disableCommands:
                        - FLUSHDB
                        - FLUSHALL
                      persistence:
                        enabled: false
                    image:
                      registry: m.daocloud.io/docker.io
                      pullPolicy: IfNotPresent
                    sentinel:
                      enabled: false
                      persistence:
                        enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
                    metrics:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
                    volumePermissions:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
                    sysctl:
                      enabled: false
                      image:
                        registry: m.daocloud.io/docker.io
                        pullPolicy: IfNotPresent
              destination:
                server: https://kubernetes.default.svc
                namespace: application
      - name: prepare-argocd-binary
        inputs:
          artifacts:
          - name: argocd-binary
            path: /tmp/argocd
            mode: 755
            http:
              url: https://files.m.daocloud.io/github.com/argoproj/argo-cd/releases/download/v2.9.3/argocd-linux-amd64
        outputs:
          artifacts:
          - name: argocd-binary
            path: "{{inputs.artifacts.argocd-binary.path}}"
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          command:
          - sh
          - -c
          args:
          - |
            ls -l {{inputs.artifacts.argocd-binary.path}}
      - name: sync
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          - name: WITH_PRUNE_OPTION
            value: --prune
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app sync argocd/app-redis ${WITH_PRUNE_OPTION} --timeout 300
      - name: wait
        inputs:
          artifacts:
          - name: argocd-binary
            path: /usr/local/bin/argocd
          parameters:
          - name: argocd-server
          - name: insecure-option
            value: ""
        container:
          image: m.daocloud.io/docker.io/library/fedora:39
          env:
          - name: ARGOCD_USERNAME
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: username
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-login-credentials
                key: password
          command:
          - sh
          - -c
          args:
          - |
            set -e
            export ARGOCD_SERVER={{inputs.parameters.argocd-server}}
            export INSECURE_OPTION={{inputs.parameters.insecure-option}}
            export ARGOCD_USERNAME=${ARGOCD_USERNAME:-admin}
            argocd login ${INSECURE_OPTION} --username ${ARGOCD_USERNAME} --password ${ARGOCD_PASSWORD} ${ARGOCD_SERVER}
            argocd app wait argocd/app-redis

    5.submit to argo workflow client

    argo -n business-workflows submit deploy-redis-flow.yaml

    6.decode password

    kubectl -n application get secret redis-credentials -o jsonpath='{.data.redis-password}' | base64 -d
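    To check connectivity against the in-cluster deployment, the sketch below runs a throwaway redis-cli pod; it assumes the bitnami chart with release name `app-redis` exposes a master service called `app-redis-master` in the `application` namespace.

    REDIS_PASSWORD=$(kubectl -n application get secret redis-credentials -o jsonpath='{.data.redis-password}' | base64 -d)
    kubectl -n application run redis-client --rm -it \
        --image=m.daocloud.io/docker.io/library/redis:7.2.4-alpine -- \
        redis-cli -h app-redis-master -a "${REDIS_PASSWORD}" ping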

    FAQ


    HPC

      Subsections of Monitor

      Install Prometheus Stack

      Installation

      Install By

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. Helm has been installed, if not check 🔗link


      helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
      helm repo update
      helm install ay-helm-mirror/kube-prometheus-stack --generate-name

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. ArgoCD has been installed, if not check 🔗link

      3. Ingress has been installed on ArgoCD, if not check 🔗link

      4. Cert-manager has been installed on ArgoCD and a ClusterIssuer named `self-signed-ca-issuer` is available, if not check 🔗link

      1.prepare `prometheus-stack-credentials` secret

      kubectl get namespaces monitor > /dev/null 2>&1 || kubectl create namespace monitor
      kubectl -n monitor create secret generic prometheus-stack-credentials \
        --from-literal=grafana-username=admin \
        --from-literal=grafana-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

      2.prepare `prometheus-stack.yaml`
        apiVersion: argoproj.io/v1alpha1
        kind: Application
        metadata:
          name: prometheus-stack
        spec:
          syncPolicy:
            syncOptions:
              - CreateNamespace=true
              - ServerSideApply=true
          project: default
          source:
            repoURL: https://aaronyang0628.github.io/helm-chart-mirror/charts
            chart: kube-prometheus-stack
            targetRevision: 72.6.2
            helm:
              releaseName: prometheus-stack
              values: |
                crds:
                  enabled: true
                global:
                  rbac:
                    create: true
                  imageRegistry: ""
                  imagePullSecrets: []
                alertmanager:
                  enabled: true
                  ingress:
                    enabled: false
                  serviceMonitor:
                    selfMonitor: true
                    interval: ""
                  alertmanagerSpec:
                    image:
                      registry: m.daocloud.io/quay.io
                      repository: prometheus/alertmanager
                      tag: v0.28.1
                    replicas: 1
                    resources: {}
                    storage:
                      volumeClaimTemplate:
                        spec:
                          storageClassName: ""
                          accessModes: ["ReadWriteOnce"]
                          resources:
                            requests:
                              storage: 2Gi
                grafana:
                  enabled: true
                  ingress:
                    enabled: true
                    annotations:
                      cert-manager.io/cluster-issuer: self-signed-issuer
                      kubernetes.io/ingress.class: nginx
                    hosts:
                      - grafana.dev.tech
                    path: /
                    pathtype: ImplementationSpecific
                    tls:
                    - secretName: grafana.dev.tech-tls
                      hosts:
                      - grafana.dev.tech
                prometheusOperator:
                  admissionWebhooks:
                    patch:
                      resources: {}
                      image:
                        registry: m.daocloud.io/registry.k8s.io
                        repository: ingress-nginx/kube-webhook-certgen
                        tag: v1.5.3  
                  image:
                    registry: m.daocloud.io/quay.io
                    repository: prometheus-operator/prometheus-operator
                  prometheusConfigReloader:
                    image:
                      registry: m.daocloud.io/quay.io
                      repository: prometheus-operator/prometheus-config-reloader
                    resources: {}
                  thanosImage:
                    registry: m.daocloud.io/quay.io
                    repository: thanos/thanos
                    tag: v0.38.0
                prometheus:
                  enabled: true
                  ingress:
                    enabled: true
                    annotations:
                      cert-manager.io/cluster-issuer: self-signed-issuer
                      kubernetes.io/ingress.class: nginx
                    hosts:
                      - prometheus.dev.tech
                    path: /
                    pathtype: ImplementationSpecific
                    tls:
                    - secretName: prometheus.dev.tech-tls
                      hosts:
                      - prometheus.dev.tech
                  prometheusSpec:
                    image:
                      registry: m.daocloud.io/quay.io
                      repository: prometheus/prometheus
                      tag: v3.4.0
                    replicas: 1
                    shards: 1
                    resources: {}
                    storageSpec: 
                      volumeClaimTemplate:
                        spec:
                          storageClassName: ""
                          accessModes: ["ReadWriteOnce"]
                          resources:
                            requests:
                              storage: 2Gi
                thanosRuler:
                  enabled: false
                  ingress:
                    enabled: false
                  thanosRulerSpec:
                    replicas: 1
                    storage: {}
                    resources: {}
                    image:
                      registry: m.daocloud.io/quay.io
                      repository: thanos/thanos
                      tag: v0.38.0
          destination:
            server: https://kubernetes.default.svc
            namespace: monitor
        3.apply to k8s

        kubectl -n argocd apply -f prometheus-stack.yaml

        4.sync by argocd

        argocd app sync argocd/prometheus-stack

        5.decode grafana password

        kubectl -n monitor get secret prometheus-stack-credentials -o jsonpath='{.data.grafana-password}' | base64 -d

        > add `$K8S_MASTER_IP grafana.dev.tech` to **/etc/hosts**

        > add `$K8S_MASTER_IP prometheus.dev.tech` to **/etc/hosts**

      prometheus-server: https://prometheus.dev.tech:32443/


      grafana-console: https://grafana.dev.tech:32443/
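      Once the /etc/hosts entries above are in place, both endpoints can also be probed from the shell (a quick sketch; `-k` is needed because the certificates are self-signed):

      curl -ks https://prometheus.dev.tech:32443/-/healthy
      curl -ks https://grafana.dev.tech:32443/api/health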


      install based on docker

      echo  "start from head is important"

      FAQ


      Subsections of Networking

      Install Ingress

      Installation

      Install By

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. Helm has been installed, if not check 🔗link


      helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
      helm repo update
      helm install ay-helm-mirror/kube-prometheus-stack --generate-name

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. ArgoCD has been installed, if not check 🔗link


      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: ingress-nginx
      spec:
        syncPolicy:
          syncOptions:
          - CreateNamespace=true
        project: default
        source:
          repoURL: https://aaronyang0628.github.io/helm-chart-mirror/charts
          chart: ingress-nginx
          targetRevision: 4.11.3
          helm:
            releaseName: ingress-nginx
            values: |
              controller:
                image:
                  registry: m.daocloud.io
                  image: registry.k8s.io/ingress-nginx/controller
                  tag: "v1.9.5"
                  pullPolicy: IfNotPresent
                service:
                  enabled: true
                  type: NodePort
                  nodePorts:
                    http: 32080
                    https: 32443
                    tcp:
                      8080: 32808
                admissionWebhooks:
                  enabled: true
                  patch:
                    enabled: true
                    image:
                      registry: m.daocloud.io
                      image: registry.k8s.io/ingress-nginx/kube-webhook-certgen
                      tag: v20231011-8b53cabe0
                      pullPolicy: IfNotPresent
              defaultBackend:
                enabled: false
        destination:
          server: https://kubernetes.default.svc
          namespace: basic-components
      
        kubectl -n argocd apply -f ingress-nginx.yaml
        argocd app sync argocd/ingress-nginx
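        To verify the controller is up before creating any Ingress resources (a sketch assuming the chart's standard labels), check the pods and the NodePort service in the `basic-components` namespace:

        kubectl -n basic-components get pods -l app.kubernetes.io/name=ingress-nginx
        kubectl -n basic-components get svc -l app.kubernetes.io/name=ingress-nginx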

      install based on docker

      echo  "start from head is important"

      FAQ


      Install Istio

      Installation

      Install By

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. Helm has been installed, if not check 🔗link


      1.get helm repo

      helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
      helm repo update

      2.install chart

      helm install ay-helm-mirror/kube-prometheus-stack --generate-name
      Using Proxy

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. Helm has been installed, if not check 🔗link

      3. ArgoCD has been installed, if not check 🔗link


      1.prepare `deploy-istio-base.yaml`

      kubectl -n argocd apply -f - << EOF
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: istio-base
      spec:
        syncPolicy:
          syncOptions:
          - CreateNamespace=true
        project: default
        source:
          repoURL: https://istio-release.storage.googleapis.com/charts
          chart: base
          targetRevision: 1.23.2
          helm:
            releaseName: istio-base
            values: |
              defaults:
                global:
                  istioNamespace: istio-system
                base:
                  enableCRDTemplates: false
                  enableIstioConfigCRDs: true
                defaultRevision: "default"
        destination:
          server: https://kubernetes.default.svc
          namespace: istio-system
      EOF

      2.sync by argocd

      argocd app sync argocd/istio-base

      3.prepare `deploy-istiod.yaml`

      kubectl -n argocd apply -f - << EOF
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: istiod
      spec:
        syncPolicy:
          syncOptions:
          - CreateNamespace=true
        project: default
        source:
          repoURL: https://istio-release.storage.googleapis.com/charts
          chart: istiod
          targetRevision: 1.23.2
          helm:
            releaseName: istiod
            values: |
              defaults:
                global:
                  istioNamespace: istio-system
                  defaultResources:
                    requests:
                      cpu: 10m
                      memory: 128Mi
                    limits:
                      cpu: 100m
                      memory: 128Mi
                  hub: m.daocloud.io/docker.io/istio
                  proxy:
                    autoInject: disabled
                    resources:
                      requests:
                        cpu: 100m
                        memory: 128Mi
                      limits:
                        cpu: 2000m
                        memory: 1024Mi
                pilot:
                  autoscaleEnabled: true
                  resources:
                    requests:
                      cpu: 500m
                      memory: 2048Mi
                  cpu:
                    targetAverageUtilization: 80
                  podAnnotations:
                    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
        destination:
          server: https://kubernetes.default.svc
          namespace: istio-system
      EOF

      4.sync by argocd

      argocd app sync argocd/istiod
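      Before moving on, you can confirm the control plane is running and its CRDs are registered (a quick sketch using istiod's standard `app=istiod` label):

      kubectl -n istio-system get pods -l app=istiod
      kubectl get crds | grep 'istio.io' | head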

      5.prepare `deploy-istio-ingressgateway.yaml`

      kubectl -n argocd apply -f - << EOF
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: istio-ingressgateway
      spec:
        syncPolicy:
          syncOptions:
          - CreateNamespace=true
        project: default
        source:
          repoURL: https://istio-release.storage.googleapis.com/charts
          chart: gateway
          targetRevision: 1.23.2
          helm:
            releaseName: istio-ingressgateway
            values: |
              defaults:
                replicaCount: 1
                podAnnotations:
                  inject.istio.io/templates: "gateway"
                  sidecar.istio.io/inject: "true"
                  cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
                resources:
                  requests:
                    cpu: 100m
                    memory: 128Mi
                  limits:
                    cpu: 2000m
                    memory: 1024Mi
                service:
                  type: LoadBalancer
                  ports:
                  - name: status-port
                    port: 15021
                    protocol: TCP
                    targetPort: 15021
                  - name: http2
                    port: 80
                    protocol: TCP
                    targetPort: 80
                  - name: https
                    port: 443
                    protocol: TCP
                    targetPort: 443
                autoscaling:
                  enabled: true
                  minReplicas: 1
                  maxReplicas: 5
        destination:
          server: https://kubernetes.default.svc
          namespace: istio-system
      EOF

      6.sync by argocd

      argocd app sync argocd/istio-ingressgateway
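      A quick check that the gateway came up (a sketch; the chart's release name `istio-ingressgateway` is assumed to be used for both the pod label and the LoadBalancer service):

      kubectl -n istio-system get pods -l app=istio-ingressgateway
      kubectl -n istio-system get svc istio-ingressgateway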

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. Helm has been installed, if not check 🔗link

      3. ArgoCD has been installed, if not check 🔗link

      4. Argo Workflow has been installed, if not check 🔗link


      1.prepare `argocd-login-credentials`

      kubectl get namespaces database > /dev/null 2>&1 || kubectl create namespace database

      2.apply rolebinding to k8s

      kubectl apply -f - <<EOF
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: application-administrator
      rules:
        - apiGroups:
            - argoproj.io
          resources:
            - applications
          verbs:
            - '*'
        - apiGroups:
            - apps
          resources:
            - deployments
          verbs:
            - '*'
      
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: application-administration
        namespace: argocd
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: application-administrator
      subjects:
        - kind: ServiceAccount
          name: argo-workflow
          namespace: business-workflows
      
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: application-administration
        namespace: application
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: application-administrator
      subjects:
        - kind: ServiceAccount
          name: argo-workflow
          namespace: business-workflows
      EOF

      3.prepare `deploy-xxxx-flow.yaml`

      4.submit to argo workflow client

      argo -n business-workflows submit deploy-xxxx-flow.yaml

      5.decode password

      kubectl -n application get secret xxxx-credentials -o jsonpath='{.data.xxx-password}' | base64 -d

      FAQ


      Install Nginx

      1. prepare server.conf

      cat << EOF > default.conf
      server {
        listen 80;
        location / {
            root   /usr/share/nginx/html;
            autoindex on;
        }
      }
      EOF

      2. install

      mkdir -p $(pwd)/data
      podman run --rm -p 8080:80 \
          -v $(pwd)/data:/usr/share/nginx/html:ro \
          -v $(pwd)/default.conf:/etc/nginx/conf.d/default.conf:ro \
          -d docker.io/library/nginx:1.19.9-alpine
      echo 'this is a test' > $(pwd)/data/some-data.txt
      Tip

      you can run an additional DaoCloud proxy image to accelerate image pulling, check Daocloud Proxy

      visit http://localhost:8080
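      or fetch the test file directly to confirm the volume mount works:

      curl http://localhost:8080/some-data.txt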

      Subsections of RPC

      gRpc

      This guide gets you started with gRPC in C++ with a simple working example.

      In the C++ world, there’s no universally accepted standard for managing project dependencies. You need to build and install gRPC before building and running this quick start’s Hello World example.

      Build and locally install gRPC and Protocol Buffers. The steps in this section explain how to build and locally install gRPC and Protocol Buffers using cmake. If you’d rather use bazel, see Building from source.

      1. Setup

      Choose a directory to hold locally installed packages. This page assumes that the environment variable MY_INSTALL_DIR holds this directory path. For example:

      export MY_INSTALL_DIR=$HOME/.local

      Ensure that the directory exists:

      mkdir -p $MY_INSTALL_DIR

      Add the local bin folder to your path variable, for example:

      export PATH="$MY_INSTALL_DIR/bin:$PATH"
      Important

      We strongly encourage you to install gRPC locally — using an appropriately set CMAKE_INSTALL_PREFIX — because there is no easy way to uninstall gRPC after you’ve installed it globally.

      2. Install Essentials

      2.1 Install Cmake

      You need version 3.13 or later of cmake. Install it by following these instructions:

      Install on
      sudo apt install -y cmake
      brew install cmake
      cmake --version
      2.2 Install basic tools required to build gRPC
      Install on
      sudo apt install -y build-essential autoconf libtool pkg-config
      brew install autoconf automake libtool pkg-config
      2.3 Clone the grpc repo

      Clone the grpc repo and its submodules:

      git clone --recurse-submodules -b v1.62.0 --depth 1 --shallow-submodules https://github.com/grpc/grpc
      2.4 Build and install gRPC and Protocol Buffers

      While not mandatory, gRPC applications usually leverage Protocol Buffers for service definitions and data serialization, and the example code uses proto3.

      The following commands build and locally install gRPC and Protocol Buffers:

      cd grpc
      mkdir -p cmake/build
      pushd cmake/build
      cmake -DgRPC_INSTALL=ON \
            -DgRPC_BUILD_TESTS=OFF \
            -DCMAKE_INSTALL_PREFIX=$MY_INSTALL_DIR \
            ../..
      make -j 4
      make install
      popd
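      If the install succeeded, the toolchain should now resolve from $MY_INSTALL_DIR/bin (a quick sanity check, assuming the PATH export from step 1 is still in effect):

      which protoc grpc_cpp_plugin
      protoc --version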

      3. Run the example

      The example code is part of the grpc repo source, which you cloned as part of the steps of the previous section.

      3.1 change to the example’s directory:
      cd examples/cpp/helloworld
      3.2 build the example project by using cmake

      make sure `echo $MY_INSTALL_DIR` still returns a valid path

      mkdir -p cmake/build
      pushd cmake/build
      cmake -DCMAKE_PREFIX_PATH=$MY_INSTALL_DIR ../..
      make -j 4

      3.3 run the server

      ./greeter_server

      3.4 from a different terminal, run the client and see the client output:

      ./greeter_client

      and the result should be like this:

      Greeter received: Hello world

      Subsections of Storage

      Deploy Artifact Repository

      Preliminary

      • Kubernetes has been installed, if not check link
      • minio is ready for artifact repository

        endpoint: minio.storage:9000

      Steps

      1. prepare bucket for s3 artifact repository

      # K8S_MASTER_IP could be your master IP or a LoadBalancer external IP
      K8S_MASTER_IP=172.27.253.27
      MINIO_ACCESS_SECRET=$(kubectl -n storage get secret minio-secret -o jsonpath='{.data.root-password}' | base64 -d)
      podman run --rm \
      --entrypoint bash \
      --add-host=minio-api.dev.geekcity.tech:${K8S_MASTER_IP} \
      -it docker.io/minio/mc:latest \
      -c "mc alias set minio http://minio-api.dev.geekcity.tech admin ${MINIO_ACCESS_SECRET} \
          && mc ls minio \
          && mc mb --ignore-existing minio/argo-workflows-artifacts"

      2. prepare secret s3-artifact-repository-credentials

      will create the business-workflows namespace if it does not exist

      kubectl get namespaces business-workflows > /dev/null 2>&1 || kubectl create namespace business-workflows
      MINIO_ACCESS_KEY=$(kubectl -n storage get secret minio-secret -o jsonpath='{.data.root-user}' | base64 -d)
      kubectl -n business-workflows create secret generic s3-artifact-repository-credentials \
          --from-literal=accessKey=${MINIO_ACCESS_KEY} \
          --from-literal=secretKey=${MINIO_ACCESS_SECRET}

      3. prepare configMap artifact-repositories.yaml

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: artifact-repositories
        annotations:
          workflows.argoproj.io/default-artifact-repository: default-artifact-repository
      data:
        default-artifact-repository: |
          s3:
            endpoint: minio.storage:9000
            insecure: true
            accessKeySecret:
              name: s3-artifact-repository-credentials
              key: accessKey
            secretKeySecret:
              name: s3-artifact-repository-credentials
              key: secretKey
            bucket: argo-workflows-artifacts

      4. apply artifact-repositories.yaml to k8s

      kubectl -n business-workflows apply -f artifact-repositories.yaml
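      to double-check the wiring, print the rendered repository definition back out of the ConfigMap:

      kubectl -n business-workflows get configmap artifact-repositories \
          -o jsonpath='{.data.default-artifact-repository}'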

      Install Minio

      Installation

      Install By

      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. Helm binary has been installed, if not check 🔗link


      Preliminary

      1. Kubernetes has been installed, if not check 🔗link

      2. ArgoCD has been installed, if not check 🔗link

      3. Ingress has been installed on ArgoCD, if not check 🔗link

      4. Cert-manager has been installed on ArgoCD and a ClusterIssuer named `self-signed-ca-issuer` is available, if not check 🔗link


      1.prepare minio credentials secret

      kubectl get namespaces storage > /dev/null 2>&1 || kubectl create namespace storage
      kubectl -n storage create secret generic minio-secret \
          --from-literal=root-user=admin \
          --from-literal=root-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

      2.prepare `deploy-minio.yaml`

      kubectl -n argocd apply -f - << EOF
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: minio
      spec:
        syncPolicy:
          syncOptions:
          - CreateNamespace=true
        project: default
        source:
          repoURL: https://aaronyang0628.github.io/helm-chart-mirror/charts
          chart: minio
          targetRevision: 16.0.10
          helm:
            releaseName: minio
            values: |
              global:
                imageRegistry: "m.daocloud.io/docker.io"
                imagePullSecrets: []
                storageClass: ""
                security:
                  allowInsecureImages: true
                compatibility:
                  openshift:
                    adaptSecurityContext: auto
              image:
                registry: m.daocloud.io/docker.io
                repository: bitnami/minio
              clientImage:
                registry: m.daocloud.io/docker.io
                repository: bitnami/minio-client
              mode: standalone
              defaultBuckets: ""
              auth:
                # rootUser: admin
                # rootPassword: ""
                existingSecret: "minio-secret"
              statefulset:
                updateStrategy:
                  type: RollingUpdate
                podManagementPolicy: Parallel
                replicaCount: 1
                zones: 1
                drivesPerNode: 1
              resourcesPreset: "micro"
              resources: 
                requests:
                  memory: 512Mi
                  cpu: 250m
                limits:
                  memory: 512Mi
                  cpu: 250m
              ingress:
                enabled: true
                ingressClassName: "nginx"
                hostname: minio-console.ay.dev
                path: /?(.*)
                pathType: ImplementationSpecific
                annotations: 
                  nginx.ingress.kubernetes.io/rewrite-target: /$1
                tls: true
                selfSigned: true
                extraHosts: []
              apiIngress:
                enabled: true
                ingressClassName: "nginx"
                hostname: minio-api.ay.dev
                path: /?(.*)
                pathType: ImplementationSpecific
                annotations: 
                  nginx.ingress.kubernetes.io/rewrite-target: /$1
              persistence:
                enabled: false
                storageClass: ""
                mountPath: /bitnami/minio/data
                accessModes:
                  - ReadWriteOnce
                size: 8Gi
                annotations: {}
                existingClaim: ""
              metrics:
                prometheusAuthType: public
                enabled: false
                serviceMonitor:
                  enabled: false
                  namespace: ""
                  labels: {}
                  jobLabel: ""
                  paths:
                    - /minio/v2/metrics/cluster
                    - /minio/v2/metrics/node
                  interval: 30s
                  scrapeTimeout: ""
                  honorLabels: false
                prometheusRule:
                  enabled: false
                  namespace: ""
                  additionalLabels: {}
                  rules: []
        destination:
          server: https://kubernetes.default.svc
          namespace: storage
      EOF

      3.sync by argocd

      argocd app sync argocd/minio

      4.decode minio secret

      kubectl -n storage get secret minio-secret -o jsonpath='{.data.root-password}' | base64 -d

      5.visit web console

      Login Credentials

      add $K8S_MASTER_IP minio-console.dev.tech to /etc/hosts

      address: 🔗http://minio-console.dev.tech:32080/login

      access key: admin

      secret key: ``

      6.using mc

      K8S_MASTER_IP=$(kubectl get node -l node-role.kubernetes.io/control-plane -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
      MINIO_ACCESS_SECRET=$(kubectl -n storage get secret minio-secret -o jsonpath='{.data.root-password}' | base64 -d)
      podman run --rm \
          --entrypoint bash \
          --add-host=minio-api.dev.tech:${K8S_MASTER_IP} \
          -it m.daocloud.io/docker.io/minio/mc:latest \
          -c "mc alias set minio http://minio-api.dev.tech:32080 admin ${MINIO_ACCESS_SECRET} \
              && mc ls minio \
              && mc mb --ignore-existing minio/test \
              && mc cp /etc/hosts minio/test/etc/hosts \
              && mc ls --recursive minio"
      K8S_MASTER_IP=$(kubectl get node -l node-role.kubernetes.io/control-plane -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
      MINIO_ACCESS_SECRET=$(kubectl -n storage get secret minio-secret -o jsonpath='{.data.root-password}' | base64 -d)
      podman run --rm \
          --entrypoint bash \
          --add-host=minio-api.dev.tech:${K8S_MASTER_IP} \
          -it m.daocloud.io/docker.io/minio/mc:latest

      Preliminary

1. Docker has been installed; if not, check 🔗link


      Using Proxy

you can run an additional DaoCloud proxy image to speed up image pulls; check Daocloud Proxy

      1.init server

      mkdir -p $(pwd)/minio/data
      podman run --rm \
          --name minio-server \
          -p 9000:9000 \
          -p 9001:9001 \
          -v $(pwd)/minio/data:/data \
          -d docker.io/minio/minio:latest server /data --console-address :9001
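Once the container is running, you can optionally poll MinIO's liveness endpoint before opening the console (assuming the standard health API of recent MinIO releases):

curl -f http://localhost:9000/minio/health/live && echo "minio is up"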

      2.use web console

      And then you can visit 🔗http://localhost:9001

      username: `minioadmin`

      password: `minioadmin`

      3.use internal client

      podman run --rm \
          --entrypoint bash \
          -it docker.io/minio/mc:latest \
          -c "mc alias set minio http://host.docker.internal:9000 minioadmin minioadmin \
              && mc ls minio \
              && mc mb --ignore-existing minio/test \
              && mc cp /etc/hosts minio/test/etc/hosts \
              && mc ls --recursive minio"


      Install NFS

      Installation

      Install By

      Preliminary

1. Kubernetes has been installed; if not, check 🔗link

2. Helm has been installed; if not, check 🔗link

Preliminary

1. Kubernetes has been installed; if not, check 🔗link

2. argoCD has been installed; if not, check 🔗link

3. Ingress has been installed on argoCD; if not, check 🔗link


      1.prepare `nfs-provisioner.yaml`

      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: nfs-provisioner
      spec:
        syncPolicy:
          syncOptions:
          - CreateNamespace=true
        project: default
        source:
          repoURL: https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner
          chart: nfs-subdir-external-provisioner
          targetRevision: 4.0.18
          helm:
            releaseName: nfs-provisioner
            values: |
              image:
                repository: m.daocloud.io/registry.k8s.io/sig-storage/nfs-subdir-external-provisioner
                pullPolicy: IfNotPresent
              nfs:
                server: nfs.services.test
                path: /
                mountOptions:
                  - vers=4
                  - minorversion=0
                  - rsize=1048576
                  - wsize=1048576
                  - hard
                  - timeo=600
                  - retrans=2
                  - noresvport
                volumeName: nfs-subdir-external-provisioner-nas
                reclaimPolicy: Retain
              storageClass:
                create: true
                defaultClass: true
                name: nfs-external-nas
        destination:
          server: https://kubernetes.default.svc
          namespace: storage

2.apply `nfs-provisioner.yaml` to k8s

      kubectl -n argocd apply -f nfs-provisioner.yaml

3.sync by argocd

      argocd app sync argocd/nfs-provisioner
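To confirm the provisioner works end to end, you can create a throw-away PVC against the new storage class (a sketch assuming the class name nfs-external-nas configured above) and check that it becomes Bound:

kubectl -n storage apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-provisioner-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-external-nas
  resources:
    requests:
      storage: 1Gi
EOF
kubectl -n storage get pvc nfs-provisioner-test
kubectl -n storage delete pvc nfs-provisioner-test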

      Preliminary

1. Docker has been installed; if not, check 🔗link


      Using Proxy

you can run an additional DaoCloud proxy image to speed up image pulls; check Daocloud Proxy

      1.init server

      echo -e "nfs\nnfsd" > /etc/modules-load.d/nfs4.conf
      modprobe nfs && modprobe nfsd
      mkdir -p $(pwd)/data/nfs/data
      echo '/data *(rw,fsid=0,no_subtree_check,insecure,no_root_squash)' > $(pwd)/data/nfs/exports
      podman run \
          --name nfs4 \
          --rm \
          --privileged \
          -p 2049:2049 \
          -v $(pwd)/data/nfs/data:/data \
          -v $(pwd)/data/nfs/exports:/etc/exports:ro \
          -d docker.io/erichough/nfs-server:2.2.1
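2.test the mount from the host

A quick check, assuming the NFS client utilities (nfs-common / nfs-utils) are installed on the host; because the export uses fsid=0, the NFSv4 pseudo-root is mounted as /:

mkdir -p $(pwd)/mnt/nfs
sudo mount -t nfs4 -o proto=tcp,port=2049 127.0.0.1:/ $(pwd)/mnt/nfs
ls $(pwd)/mnt/nfs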

      Preliminary

1. The CentOS yum repo source has been updated; if not, check 🔗link

      1.install nfs util

      sudo apt update -y
      sudo apt-get install nfs-common
      dnf update -y
dnf install -y nfs-utils rpcbind
      sudo apt update -y
      sudo apt-get install nfs-common

      2. create share folder

      mkdir /data && chmod 755 /data

      3.edit `/etc/exports`

      /data *(rw,sync,insecure,no_root_squash,no_subtree_check)

      4.start nfs server

      systemctl enable rpcbind
      systemctl enable nfs-server
      systemctl start rpcbind
      systemctl start nfs-server

      5.test load on localhost

      showmount -e localhost
      Export list for localhost:
      /data *

      6.test load on other ip

      showmount -e 192.168.aa.bb
Export list for 192.168.aa.bb:
      /data *

      7.mount nfs disk

      mkdir -p $(pwd)/mnt/nfs
      sudo mount -v 192.168.aa.bb:/data $(pwd)/mnt/nfs  -o proto=tcp -o nolock

      8.set nfs auto mount

      echo "192.168.aa.bb:/data /data nfs rw,auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0" >> /etc/fstab
      df -h
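To validate the new fstab entry without rebooting, mount everything listed in /etc/fstab and inspect the mount point:

sudo mount -a
findmnt /data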

      Notes

[Optional] create a new partition, with either fdisk or parted
fdisk /dev/vdb

# n
# p
# w
parted

# select /dev/vdb
# mklabel gpt
# mkpart primary 0% 100%
# print
[Optional] format the disk
mkfs.xfs /dev/vdb1 -f
[Optional] mount the disk to the folder
mount /dev/vdb1 /data
[Optional] mount automatically on restart
# vim /etc/fstab
/dev/vdb1     /data  xfs   defaults   0 0


Install Redis

      Preliminary

• Kubernetes has been installed; if not, check link
• argoCD has been installed; if not, check link
• Ingress has been installed on argoCD; if not, check link
• cert-manager has been installed on argoCD and a ClusterIssuer named self-signed-ca-issuer exists; if not, check link

      Steps

      1. prepare secret

      kubectl get namespaces storage > /dev/null 2>&1 || kubectl create namespace storage
      kubectl -n storage create secret generic redis-credentials \
          --from-literal=redis-password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)

      2. prepare redis.yaml

      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: redis
      spec:
        syncPolicy:
          syncOptions:
          - CreateNamespace=true
        project: default
        source:
          repoURL: https://charts.bitnami.com/bitnami
          chart: redis
          targetRevision: 18.16.0
          helm:
            releaseName: redis
            values: |
              architecture: replication
              auth:
                enabled: true
                sentinel: true
                existingSecret: redis-credentials
              master:
                count: 1
                disableCommands:
                  - FLUSHDB
                  - FLUSHALL
                persistence:
                  enabled: true
                  storageClass: nfs-external
                  size: 8Gi
              replica:
                replicaCount: 3
                disableCommands:
                  - FLUSHDB
                  - FLUSHALL
                persistence:
                  enabled: true
                  storageClass: nfs-external
                  size: 8Gi
              image:
                registry: m.daocloud.io/docker.io
                pullPolicy: IfNotPresent
              sentinel:
                enabled: false
                persistence:
                  enabled: false
                image:
                  registry: m.daocloud.io/docker.io
                  pullPolicy: IfNotPresent
              metrics:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
                  pullPolicy: IfNotPresent
              volumePermissions:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
                  pullPolicy: IfNotPresent
              sysctl:
                enabled: false
                image:
                  registry: m.daocloud.io/docker.io
                  pullPolicy: IfNotPresent
              extraDeploy:
                - |
                  apiVersion: apps/v1
                  kind: Deployment
                  metadata:
                    name: redis-tool
namespace: storage
                    labels:
                      app.kubernetes.io/name: redis-tool
                  spec:
                    replicas: 1
                    selector:
                      matchLabels:
                        app.kubernetes.io/name: redis-tool
                    template:
                      metadata:
                        labels:
                          app.kubernetes.io/name: redis-tool
                      spec:
                        containers:
                        - name: redis-tool
                          image: m.daocloud.io/docker.io/bitnami/redis:7.2.4-debian-12-r8
                          imagePullPolicy: IfNotPresent
                          env:
                          - name: REDISCLI_AUTH
                            valueFrom:
                              secretKeyRef:
                                key: redis-password
                                name: redis-credentials
                          - name: TZ
                            value: Asia/Shanghai
                          command:
                          - tail
                          - -f
                          - /etc/hosts
        destination:
          server: https://kubernetes.default.svc
          namespace: storage

      3. apply to k8s

      kubectl -n argocd apply -f redis.yaml

      4. sync by argocd

      argocd app sync argocd/redis

      5. decode password

      kubectl -n storage get secret redis-credentials -o jsonpath='{.data.redis-password}' | base64 -d

      tests

      • kubectl -n storage exec -it deployment/redis-tool -- \
            redis-cli -c -h redis-master.storage ping
      • kubectl -n storage exec -it deployment/redis-tool -- \
            redis-cli -c -h redis-master.storage set mykey somevalue
      • kubectl -n storage exec -it deployment/redis-tool -- \
            redis-cli -c -h redis-master.storage get mykey
      • kubectl -n storage exec -it deployment/redis-tool -- \
            redis-cli -c -h redis-master.storage del mykey
      • kubectl -n storage exec -it deployment/redis-tool -- \
            redis-cli -c -h redis-master.storage get mykey
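Since the values above deploy one master and three replicas, you can also check that the replicas are attached, using the same redis-tool deployment:

kubectl -n storage exec -it deployment/redis-tool -- \
    redis-cli -h redis-master.storage info replication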

      Subsections of 👨‍💻Schedmd Slurm

      Subsections of Build&Install

      Install On Debian

      Cluster Setting

1 manager node, 1 login node, and 2 compute nodes:

| hostname | IP | role | quota |
| --- | --- | --- | --- |
| manage01 | 192.168.56.115 | manager | 2C4G |
| login01 | 192.168.56.116 | login | 2C4G |
| compute01 | 192.168.56.117 | compute | 2C4G |
| compute02 | 192.168.56.118 | compute | 2C4G |

      Software Version:

| software | version |
| --- | --- |
| os | Debian 12 bookworm |
| slurm | 24.05.2 |

      Prepare Steps (All Nodes)

      1. Modify the /etc/network/interfaces file (if you cannot get ipv4 address)

      Append the following lines to the file

allow-hotplug enp0s8
iface enp0s8 inet dhcp

      restart the network

      systemctl restart networking
2. Modify the /etc/apt/sources.list file to use the TUNA mirror
      cat > /etc/apt/sources.list << EOF
      deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
      deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
      
      deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
      deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
      
      deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
      deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
      
      deb https://mirrors.tuna.tsinghua.edu.cn/debian-security/ bookworm-security main contrib non-free non-free-firmware
      deb-src https://mirrors.tuna.tsinghua.edu.cn/debian-security/ bookworm-security main contrib non-free non-free-firmware
      EOF
3. update the apt cache
      apt clean all && apt update
4. set the hostname on each node
Run the matching command on the corresponding node:
      hostnamectl set-hostname manage01
      hostnamectl set-hostname login01
      hostnamectl set-hostname compute01
      hostnamectl set-hostname compute02
5. set the hosts file
      cat >> /etc/hosts << EOF
      192.168.56.115 manage01
      192.168.56.116 login01
      192.168.56.117 compute01
      192.168.56.118 compute02
      EOF
6. disable the firewall
      systemctl stop nftables && systemctl disable nftables
7. install the ntpdate package
      apt-get -y install ntpdate

      sync server time

      ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
      echo 'Asia/Shanghai' >/etc/timezone
      ntpdate time.windows.com
8. add a cron job to sync time
      crontab -e
      */5 * * * * /usr/sbin/ntpdate time.windows.com
9. create an ssh key pair on each node
      ssh-keygen -t rsa -b 4096 -C $HOSTNAME
10. ssh login without password (All Nodes)
Run these on each node, copying the public key to every other node:
      ssh-copy-id -i ~/.ssh/id_rsa.pub root@login01
      ssh-copy-id -i ~/.ssh/id_rsa.pub root@compute01
      ssh-copy-id -i ~/.ssh/id_rsa.pub root@compute02
      ssh-copy-id -i ~/.ssh/id_rsa.pub root@manage01
      ssh-copy-id -i ~/.ssh/id_rsa.pub root@compute01
      ssh-copy-id -i ~/.ssh/id_rsa.pub root@compute02

      Install Components

      1. Install NFS server (Manager Node)

      there are many ways to install NFS server

      create shared folder

      mkdir /data
      chmod 755 /data

      modify vim /etc/exports

      /data *(rw,sync,insecure,no_subtree_check,no_root_squash)

      start nfs server

      systemctl start rpcbind 
      systemctl start nfs-server 
      
      systemctl enable rpcbind 
      systemctl enable nfs-server

      check nfs server

      showmount -e localhost
      
      # Output
      Export list for localhost:
      /data *
2. Install munge service
      • add user munge (All Nodes)
      groupadd -g 1108 munge
      useradd -m -c "Munge Uid 'N' Gid Emporium" -d /var/lib/munge -u 1108 -g munge -s /sbin/nologin munge
      • Install rng-tools-debian (Manager Nodes)
      apt-get install -y rng-tools-debian
      # modify service script
      vim /usr/lib/systemd/system/rngd.service
      [Service]
      ExecStart=/usr/sbin/rngd -f -r /dev/urandom
      systemctl daemon-reload
      systemctl start rngd
      systemctl enable rngd
• install munge packages (All Nodes)
apt-get install -y libmunge-dev libmunge2 munge
      • generate secret key (Manager Nodes)
      dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
      • copy munge.key from manager node to the rest node (All Nodes)
      scp -p /etc/munge/munge.key root@login01:/etc/munge/
      scp -p /etc/munge/munge.key root@compute01:/etc/munge/
      scp -p /etc/munge/munge.key root@compute02:/etc/munge/
      • grant privilege on munge.key (All Nodes)
      chown munge: /etc/munge/munge.key
      chmod 400 /etc/munge/munge.key
      
      systemctl start munge
      systemctl enable munge

Use systemctl status munge to check whether the service is running

      • test munge
      munge -n | ssh compute01 unmunge
3. Install Mariadb (Manager Nodes)
      apt-get install -y mariadb-server
      • create database and user
      systemctl start mariadb
      systemctl enable mariadb
      
      ROOT_PASS=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) 
      mysql -e "CREATE USER root IDENTIFIED BY '${ROOT_PASS}'"
      mysql -uroot -p$ROOT_PASS -e 'create database slurm_acct_db'
• create user slurm, and grant all privileges on database slurm_acct_db
      mysql -uroot -p$ROOT_PASS
      create user slurm;
      
      grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by '123456' with grant option;
      
      flush privileges;
      • create Slurm user
      groupadd -g 1109 slurm
      useradd -m -c "Slurm manager" -d /var/lib/slurm -u 1109 -g slurm -s /bin/bash slurm

      Install Slurm (All Nodes)

      • Install basic Debian package build requirements:
      apt-get install -y build-essential fakeroot devscripts equivs
      • Unpack the distributed tarball:
      wget https://download.schedmd.com/slurm/slurm-24.05.2.tar.bz2 -O slurm-24.05.2.tar.bz2 &&
      tar -xaf slurm*tar.bz2
      • cd to the directory containing the Slurm source:
      cd slurm-24.05.2 &&   mkdir -p /etc/slurm && ./configure 
      • compile slurm
      make install
      • modify configuration files (Manager Nodes)

        cp /root/slurm-24.05.2/etc/slurm.conf.example /etc/slurm/slurm.conf
        vim /etc/slurm/slurm.conf

        focus on these options:

SlurmctldHost=manage01

AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=manage01
        AccountingStoragePass=/var/run/munge/munge.socket.2
        AccountingStoragePort=6819  
        AccountingStorageType=accounting_storage/slurmdbd  
        
        JobCompHost=localhost
        JobCompLoc=slurm_acct_db
        JobCompPass=123456
        JobCompPort=3306
        JobCompType=jobcomp/mysql
        JobCompUser=slurm
        JobContainerType=job_container/none
        JobAcctGatherType=jobacct_gather/linux
        cp /root/slurm-24.05.2/etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
        vim /etc/slurm/slurmdbd.conf
        • modify /etc/slurm/cgroup.conf
        cp /root/slurm-24.05.2/etc/cgroup.conf.example /etc/slurm/cgroup.conf
        • send configuration files to other nodes
        scp -r /etc/slurm/*.conf  root@login01:/etc/slurm/
        scp -r /etc/slurm/*.conf  root@compute01:/etc/slurm/
        scp -r /etc/slurm/*.conf  root@compute02:/etc/slurm/
      • grant privilege on some directories (All Nodes)

      mkdir /var/spool/slurmd
      chown slurm: /var/spool/slurmd
      mkdir /var/log/slurm
      chown slurm: /var/log/slurm
      
      mkdir /var/spool/slurmctld
      chown slurm: /var/spool/slurmctld
      
      chown slurm: /etc/slurm/slurmdbd.conf
      chmod 600 /etc/slurm/slurmdbd.conf
      • start slurm services on each node
Run the commands for the corresponding node:
      systemctl start slurmdbd
      systemctl enable slurmdbd
      
      systemctl start slurmctld
      systemctl enable slurmctld
      
      systemctl start slurmd
      systemctl enable slurmd
Use `systemctl status xxxx` to check whether the `xxxx` service is running
      ```text
      # vim /usr/lib/systemd/system/slurmdbd.service
      
      
      [Unit]
      Description=Slurm DBD accounting daemon
      After=network-online.target remote-fs.target munge.service mysql.service mysqld.service mariadb.service sssd.service
      Wants=network-online.target
      ConditionPathExists=/etc/slurm/slurmdbd.conf
      
      [Service]
      Type=simple
      EnvironmentFile=-/etc/sysconfig/slurmdbd
      EnvironmentFile=-/etc/default/slurmdbd
      User=slurm
      Group=slurm
      RuntimeDirectory=slurmdbd
      RuntimeDirectoryMode=0755
      ExecStart=/usr/local/sbin/slurmdbd -D -s $SLURMDBD_OPTIONS
      ExecReload=/bin/kill -HUP $MAINPID
      LimitNOFILE=65536
      
      
      # Uncomment the following lines to disable logging through journald.
      # NOTE: It may be preferable to set these through an override file instead.
      #StandardOutput=null
      #StandardError=null
      
      [Install]
      WantedBy=multi-user.target
      ```
      
      ```text
      # vim /usr/lib/systemd/system/slurmctld.service
      
      
      [Unit]
      Description=Slurm controller daemon
      After=network-online.target remote-fs.target munge.service sssd.service
      Wants=network-online.target
      ConditionPathExists=/etc/slurm/slurm.conf
      
      [Service]
      Type=notify
      EnvironmentFile=-/etc/sysconfig/slurmctld
      EnvironmentFile=-/etc/default/slurmctld
      User=slurm
      Group=slurm
      RuntimeDirectory=slurmctld
      RuntimeDirectoryMode=0755
      ExecStart=/usr/local/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS
      ExecReload=/bin/kill -HUP $MAINPID
      LimitNOFILE=65536
      
      
      # Uncomment the following lines to disable logging through journald.
      # NOTE: It may be preferable to set these through an override file instead.
      #StandardOutput=null
      #StandardError=null
      
      [Install]
      WantedBy=multi-user.target
      ```
      
      ```text
      # vim /usr/lib/systemd/system/slurmd.service
      
      
      [Unit]
      Description=Slurm node daemon
      After=munge.service network-online.target remote-fs.target sssd.service
      Wants=network-online.target
      #ConditionPathExists=/etc/slurm/slurm.conf
      
      [Service]
      Type=notify
      EnvironmentFile=-/etc/sysconfig/slurmd
      EnvironmentFile=-/etc/default/slurmd
      RuntimeDirectory=slurm
      RuntimeDirectoryMode=0755
      ExecStart=/usr/local/sbin/slurmd --systemd $SLURMD_OPTIONS
      ExecReload=/bin/kill -HUP $MAINPID
      KillMode=process
      LimitNOFILE=131072
      LimitMEMLOCK=infinity
      LimitSTACK=infinity
      Delegate=yes
      
      
      # Uncomment the following lines to disable logging through journald.
      # NOTE: It may be preferable to set these through an override file instead.
      #StandardOutput=null
      #StandardError=null
      
      [Install]
      WantedBy=multi-user.target
      ```
      
On login01, compute01 and compute02, only slurmd is required:
systemctl start slurmd
systemctl enable slurmd
Use `systemctl status slurmd` to check whether the `slurmd` service is running
• test slurm: check the cluster configuration
      scontrol show config

      check cluster status

      sinfo
      scontrol show partition
      scontrol show node

      submit job

      srun -N2 hostname
      scontrol show jobs
      check job status
      squeue -a

      Install From Binary

(All) means every node should install this component.

(Mgr) means only the manager node should install this component.

(Auth) means only the auth node should install this component.

(Cmp) means only the compute nodes should install this component.

Typically, three nodes are required to run Slurm: 1 Manager (Mgr), 1 Auth, and N Compute (Cmp). You can also choose to install all services on a single node. check

Prerequisites

      1. change hostname (All)
        hostnamectl set-hostname (manager|auth|computeXX)
      2. modify /etc/hosts (All)
        echo "192.aa.bb.cc (manager|auth|computeXX)" >> /etc/hosts
3. disable firewall, selinux, dnsmasq, swap (All). More details here
4. NFS Server (Mgr). NFS provides the shared file system used across the cluster.
5. NFS Client (All). All nodes should mount the NFS share
        mount <$nfs_server>:/data /data -o proto=tcp -o nolock
      6. Munge (All). The auth/munge plugin will be built if the MUNGE authentication development library is installed. MUNGE is used as the default authentication mechanism.

All nodes need the munge user and group.

        groupadd -g 1108 munge
        useradd -m -c "Munge Uid 'N' Gid Emporium" -d /var/lib/munge -u 1108 -g munge -s /sbin/nologin munge
        yum install epel-release -y
        yum install munge munge-libs munge-devel -y

        Create global secret key

        /usr/sbin/create-munge-key -r
        dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

        sync secret to the rest of nodes

        scp -p /etc/munge/munge.key root@<$rest_node>:/etc/munge/
        ssh root@<$rest_node> "chown munge: /etc/munge/munge.key && chmod 400 /etc/munge/munge.key"
        ssh root@<$rest_node> "systemctl start munge && systemctl enable munge"

        test munge if it works

        munge -n | unmunge
      7. Database (Mgr). MySQL support for accounting will be built if the MySQL or MariaDB development library is present. A currently supported version of MySQL or MariaDB should be used.

        install mariadb

        yum -y install mariadb-server
        systemctl start mariadb && systemctl enable mariadb
        ROOT_PASS=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16) 
        mysql -e "CREATE USER root IDENTIFIED BY '${ROOT_PASS}'"

        login mysql

        mysql -u root -p${ROOT_PASS}
        create database slurm_acct_db;
        create user slurm;
        grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by '123456' with grant option;
        flush privileges;
        quit


      Install Slurm

      1. create slurm user (All)
        groupadd -g 1109 slurm
        useradd -m -c "slurm manager" -d /var/lib/slurm -u 1109 -g slurm -s /bin/bash slurm
      Install Slurm from

      Build RPM package

1. install dependencies (Mgr)

        yum -y install gcc gcc-c++ readline-devel perl-ExtUtils-MakeMaker pam-devel rpm-build mysql-devel python3
      2. build rpm package (Mgr)

        wget https://download.schedmd.com/slurm/slurm-24.05.2.tar.bz2 -O slurm-24.05.2.tar.bz2
        rpmbuild -ta --nodeps slurm-24.05.2.tar.bz2

The rpm files will be created under the $HOME/rpmbuild directory of the user building them.

      3. send rpm to rest nodes (Mgr)

        ssh root@<$rest_node> "mkdir -p /root/rpmbuild/RPMS/"
scp -r -p $HOME/rpmbuild/RPMS/x86_64 root@<$rest_node>:/root/rpmbuild/RPMS/x86_64
      4. install rpm (Mgr)

        ssh root@<$rest_node> "yum localinstall /root/rpmbuild/RPMS/x86_64/slurm-*"
      5. modify configuration file (Mgr)

        cp /etc/slurm/cgroup.conf.example /etc/slurm/cgroup.conf
        cp /etc/slurm/slurm.conf.example /etc/slurm/slurm.conf
        cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
        chmod 600 /etc/slurm/slurmdbd.conf
        chown slurm: /etc/slurm/slurmdbd.conf

cgroup.conf doesn't need to be changed.

        edit /etc/slurm/slurm.conf, you can use this link as a reference

        edit /etc/slurm/slurmdbd.conf, you can use this link as a reference
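For reference, a minimal slurmdbd.conf sketch consistent with the MariaDB setup above (slurm user, password 123456, database slurm_acct_db on the manager node); adjust hosts, paths and the password to your environment:

cat > /etc/slurm/slurmdbd.conf << 'EOF'
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
DebugLevel=info
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StorageUser=slurm
StoragePass=123456
StorageLoc=slurm_acct_db
EOF
chmod 600 /etc/slurm/slurmdbd.conf && chown slurm: /etc/slurm/slurmdbd.conf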

      Install yum repo directly

      1. install slurm (All)

yum -y install slurm-wlm slurmdbd
      2. modify configuration file (All)

        vim /etc/slurm-llnl/slurm.conf
        vim /etc/slurm-llnl/slurmdbd.conf

cgroup.conf doesn't need to be changed.

        edit /etc/slurm/slurm.conf, you can use this link as a reference

        edit /etc/slurm/slurmdbd.conf, you can use this link as a reference

      1. send configuration (Mgr)
         scp -r /etc/slurm/*.conf  root@<$rest_node>:/etc/slurm/
   ssh root@<$rest_node> "mkdir /var/spool/slurmd && chown slurm: /var/spool/slurmd"
   ssh root@<$rest_node> "mkdir /var/log/slurm && chown slurm: /var/log/slurm"
   ssh root@<$rest_node> "mkdir /var/spool/slurmctld && chown slurm: /var/spool/slurmctld"
2. start services on the manager node (Mgr)
  systemctl start slurmdbd && systemctl enable slurmdbd
  systemctl start slurmctld && systemctl enable slurmctld
3. start slurmd (All)
  systemctl start slurmd && systemctl enable slurmd
  ssh root@<$rest_node> "systemctl start slurmd && systemctl enable slurmd"

      Test

      1. show cluster status
      scontrol show config
      sinfo
      scontrol show partition
      scontrol show node
2. submit job
      srun -N2 hostname
      scontrol show jobs
3. check job status
      squeue -a

      Reference:

      1. https://slurm.schedmd.com/documentation.html
      2. https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
      3. https://github.com/Artlands/Install-Slurm

      Subsections of CheatSheet

      Common Environment Variables

| Variable | Description |
| --- | --- |
| $SLURM_JOB_ID | The job ID. |
| $SLURM_JOBID | Deprecated. Same as $SLURM_JOB_ID. |
| $SLURM_SUBMIT_HOST | The hostname of the node used for job submission. |
| $SLURM_JOB_NODELIST | Contains the definition (list) of the nodes that are assigned to the job. |
| $SLURM_NODELIST | Deprecated. Same as $SLURM_JOB_NODELIST. |
| $SLURM_CPUS_PER_TASK | Number of CPUs per task. |
| $SLURM_CPUS_ON_NODE | Number of CPUs on the allocated node. |
| $SLURM_JOB_CPUS_PER_NODE | Count of processors available to the job on this node. |
| $SLURM_CPUS_PER_GPU | Number of CPUs requested per allocated GPU. |
| $SLURM_MEM_PER_CPU | Memory per CPU. Same as --mem-per-cpu. |
| $SLURM_MEM_PER_GPU | Memory per GPU. |
| $SLURM_MEM_PER_NODE | Memory per node. Same as --mem. |
| $SLURM_GPUS | Number of GPUs requested. |
| $SLURM_NTASKS | Same as -n, --ntasks. The number of tasks. |
| $SLURM_NTASKS_PER_NODE | Number of tasks requested per node. |
| $SLURM_NTASKS_PER_SOCKET | Number of tasks requested per socket. |
| $SLURM_NTASKS_PER_CORE | Number of tasks requested per core. |
| $SLURM_NTASKS_PER_GPU | Number of tasks requested per GPU. |
| $SLURM_NPROCS | Same as -n, --ntasks. See $SLURM_NTASKS. |
| $SLURM_TASKS_PER_NODE | Number of tasks to be initiated on each node. |
| $SLURM_ARRAY_JOB_ID | Job array's master job ID number. |
| $SLURM_ARRAY_TASK_ID | Job array ID (index) number. |
| $SLURM_ARRAY_TASK_COUNT | Total number of tasks in a job array. |
| $SLURM_ARRAY_TASK_MAX | Job array's maximum ID (index) number. |
| $SLURM_ARRAY_TASK_MIN | Job array's minimum ID (index) number. |

      A full list of environment variables for SLURM can be found by visiting the SLURM page on environment variables.
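As a quick illustration, a small batch script can print a few of the variables above for its own allocation (a sketch; submit it with sbatch):

#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --output=env_demo_%j.out

# Print a few of the variables listed above for this allocation
echo "Job ID:       ${SLURM_JOB_ID}"
echo "Submit host:  ${SLURM_SUBMIT_HOST}"
echo "Node list:    ${SLURM_JOB_NODELIST}"
echo "Total tasks:  ${SLURM_NTASKS}"
echo "CPUs on node: ${SLURM_CPUS_ON_NODE}"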

      File Operations

      File Distribution

      • sbcast is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
  • Features
    1. Distribute files: quickly copy files to every compute node assigned to the job, avoiding manual distribution. It is faster than traditional scp or rsync, especially when distributing to many nodes.
    2. Simplify scripts: a single command distributes files to all nodes assigned to the job.
    3. Improve performance: parallel transfers speed up file distribution, especially for large or numerous files.
  • Usage
    1. Alone
    sbcast <source_file> <destination_path>
    2. Embedded in a job script
          #!/bin/bash
          #SBATCH --job-name=example_job
          #SBATCH --output=example_job.out
          #SBATCH --error=example_job.err
          #SBATCH --partition=compute
          #SBATCH --nodes=4
          
          # Use sbcast to distribute the file to the /tmp directory of each node
          sbcast data.txt /tmp/data.txt
          
          # Run your program using the distributed files
          srun my_program /tmp/data.txt

      File Collection

1. File redirection: when submitting a job, you can use the #SBATCH --output and #SBATCH --error directives to redirect standard output and standard error to the specified files.

         #SBATCH --output=output.txt
         #SBATCH --error=error.txt

        Or

        sbatch -N2 -w "compute[01-02]" -o result/file/path xxx.slurm
2. Copy manually: use scp or rsync inside the job to copy the files from the compute nodes to the submit node (see the sketch after this list).

3. Using NFS: if a shared file system (such as NFS, Lustre, or GPFS) is configured in the computing cluster, the result files can be written directly to the shared directory. That way, the result files generated by all nodes automatically end up in the same location.

      4. Using sbcast

      Submit Jobs

      3 Type Jobs

      • srun is used to submit a job for execution or initiate job steps in real time.

        • Example
    1. run a shell command
    srun -N2 /bin/hostname
    2. run a script
    srun -N1 test.sh
      • sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.

        • Example

          1. submit a batch job
          sbatch -N2 -w "compute[01-02]" -o job.stdout /data/jobs/batch-job.slurm
          #!/bin/bash
          
          #SBATCH -N 1
          #SBATCH --job-name=cpu-N1-batch
          #SBATCH --partition=compute
          #SBATCH --mail-type=end
          #SBATCH --mail-user=xxx@email.com
          #SBATCH --output=%j.out
          #SBATCH --error=%j.err
          
          srun -l /bin/hostname #you can still write srun <command> in here
          srun -l pwd
          
2. submit a parallel job that processes different data partitions
          sbatch /data/jobs/parallel.slurm
          #!/bin/bash
          #SBATCH -N 2 
          #SBATCH --job-name=cpu-N2-parallel
          #SBATCH --partition=compute
          #SBATCH --time=01:00:00
#SBATCH --array=1-4  # define a job array, assuming the data is split into 4 shards
#SBATCH --ntasks-per-node=1 # run only one task per node
          #SBATCH --output=process_data_%A_%a.out
          #SBATCH --error=process_data_%A_%a.err
          
          TASK_ID=${SLURM_ARRAY_TASK_ID}
          
          DATA_PART="data_part_${TASK_ID}.txt" #make sure you have that file
          
          if [ -f ${DATA_PART} ]; then
              echo "Processing ${DATA_PART} on node $(hostname)"
              # python process_data.py --input ${DATA_PART}
          else
              echo "File ${DATA_PART} does not exist!"
          fi
          
split -l 1000 data.txt data_part_ \
  && mv data_part_aa data_part_1 \
  && mv data_part_ab data_part_2
          
      • salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.

        • Example
1. allocate resources (similar to reserving an interactive environment)
salloc -N2 bash
This command creates a job that allocates 2 nodes and spawns a bash shell on the submitting node; inside that shell you can run srun commands against the allocation. After your computing task finishes, remember to shut down your job.
scancel <$job_id>
When you exit the shell, the resources are released.
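A typical interactive session looks like this (a sketch; the srun call runs inside the shell spawned by salloc):

salloc -N2 bash          # request 2 nodes and start a shell inside the allocation
srun hostname            # should print the hostnames of the allocated nodes
exit                     # leaving the shell releases the allocation
# or, from another terminal: scancel <job_id>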

      Subsections of MPI Libs

      Test Intel MPI Jobs

Using MPI (Message Passing Interface) for parallel computing on a SLURM cluster usually involves the following steps:

1. Install an MPI library

Make sure an MPI library is installed on your cluster nodes. Common MPI implementations include:

• OpenMPI
• Intel MPI
• MPICH

You can check whether MPI is installed with the following commands:
mpicc --version  # check the MPI compiler wrapper
mpirun --version # check the MPI runtime

2. Test MPI performance

      mpirun -n 2 IMB-MPI1 pingpong

3. Write the MPI program

You can compile MPI programs with mpicc (for C) or mpic++ (for C++). For example:

Below is a simple MPI "Hello, World!" example program, assumed to be saved as hello_mpi.c, and a second example that computes a vector dot product, saved as dot_product.c; pick either one:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);

    // Get this process's rank and the total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Print this process's information
    printf("Hello, World! I am process %d out of %d processes.\n", rank, size);

    // Finalize the MPI environment
    MPI_Finalize();

    return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 8  // vector size

// Compute the local part of the dot product
double compute_local_dot_product(double *A, double *B, int start, int end) {
    double local_dot = 0.0;
    for (int i = start; i < end; i++) {
        local_dot += A[i] * B[i];
    }
    return local_dot;
}

void print_vector(double *Vector) {
    for (int i = 0; i < N; i++) {
        printf("%f ", Vector[i]);
    }
    printf("\n");
}

int main(int argc, char *argv[]) {
    int rank, size;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Vectors A and B
    double A[N], B[N];

    // Rank 0 initializes vectors A and B
    if (rank == 0) {
        for (int i = 0; i < N; i++) {
            A[i] = i + 1;  // sample data
            B[i] = (i + 1) * 2;  // sample data
        }
    }

    // Broadcast vectors A and B to all processes
    MPI_Bcast(A, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // Each process computes its own chunk
    int local_n = N / size;  // number of elements handled by each process
    int start = rank * local_n;
    int end = (rank + 1) * local_n;

    // The last process also handles any remaining elements (N % size)
    if (rank == size - 1) {
        end = N;
    }

    double local_dot_product = compute_local_dot_product(A, B, start, end);

    // Use MPI_Reduce to sum the local dot products into rank 0
    double global_dot_product = 0.0;
    MPI_Reduce(&local_dot_product, &global_dot_product, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    // Rank 0 prints the final result
    if (rank == 0) {
        printf("Vector A is\n");
        print_vector(A);
        printf("Vector B is\n");
        print_vector(B);
        printf("Dot Product of A and B: %f\n", global_dot_product);
    }

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

4. Create the SLURM job script

Create a SLURM job script to run the MPI program. Below is a basic SLURM job script, assumed to be saved as mpi_test.slurm:

      #!/bin/bash
      #SBATCH --job-name=mpi_job       # Job name
      #SBATCH --nodes=2                # Number of nodes to use
      #SBATCH --ntasks-per-node=1      # Number of tasks per node
      #SBATCH --time=00:10:00          # Time limit
      #SBATCH --output=mpi_test_output_%j.log     # Standard output file
      #SBATCH --error=mpi_test_output_%j.err     # Standard error file
      
      # Manually set Intel OneAPI MPI and Compiler environment
      export I_MPI_PMI=pmi2
      export I_MPI_PMI_LIBRARY=/usr/lib/x86_64-linux-gnu/slurm/mpi_pmi2.so
      export I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.14
      export INTEL_COMPILER_ROOT=/opt/intel/oneapi/compiler/2025.0
      export PATH=$I_MPI_ROOT/bin:$INTEL_COMPILER_ROOT/bin:$PATH
      export LD_LIBRARY_PATH=$I_MPI_ROOT/lib:$INTEL_COMPILER_ROOT/lib:$LD_LIBRARY_PATH
      export MANPATH=$I_MPI_ROOT/man:$INTEL_COMPILER_ROOT/man:$MANPATH
      
      # Compile the MPI program
      icx-cc -I$I_MPI_ROOT/include  hello_mpi.c -o hello_mpi -L$I_MPI_ROOT/lib -lmpi
      
      # Run the MPI job
      
      mpirun -np 2 ./hello_mpi
      #!/bin/bash
      #SBATCH --job-name=mpi_job       # Job name
      #SBATCH --nodes=2                # Number of nodes to use
      #SBATCH --ntasks-per-node=1      # Number of tasks per node
      #SBATCH --time=00:10:00          # Time limit
      #SBATCH --output=mpi_test_output_%j.log     # Standard output file
      #SBATCH --error=mpi_test_output_%j.err     # Standard error file
      
      # Manually set Intel OneAPI MPI and Compiler environment
      export I_MPI_PMI=pmi2
      export I_MPI_PMI_LIBRARY=/usr/lib/x86_64-linux-gnu/slurm/mpi_pmi2.so
      export I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.14
      export INTEL_COMPILER_ROOT=/opt/intel/oneapi/compiler/2025.0
      export PATH=$I_MPI_ROOT/bin:$INTEL_COMPILER_ROOT/bin:$PATH
      export LD_LIBRARY_PATH=$I_MPI_ROOT/lib:$INTEL_COMPILER_ROOT/lib:$LD_LIBRARY_PATH
      export MANPATH=$I_MPI_ROOT/man:$INTEL_COMPILER_ROOT/man:$MANPATH
      
      # Compile the MPI program
      icx-cc -I$I_MPI_ROOT/include  dot_product.c -o dot_product -L$I_MPI_ROOT/lib -lmpi
      
      # Run the MPI job
      
      mpirun -np 2 ./dot_product

5. Compile the MPI program

Before running the job, you need to compile the MPI program. Use mpicc on the cluster. Assuming the program is saved as hello_mpi.c (or dot_product.c), compile it with the following commands:

      mpicc -o hello_mpi hello_mpi.c
      mpicc -o dot_product dot_product.c

6. Submit the SLURM job

Save the job script above (mpi_test.slurm) and submit it with the following command:

      sbatch mpi_test.slurm

7. Check the job status

You can check the status of the job with the following command:

      squeue -u <your_username>

8. Check the output

After the job completes, the output is saved to the files specified in the job script (e.g. mpi_test_output_<job_id>.log). You can view it with cat or any text editor:

      cat mpi_test_output_*.log

Example output: if everything works correctly, the output will look similar to:

      Hello, World! I am process 0 out of 2 processes.
      Hello, World! I am process 1 out of 2 processes.
      Result Matrix C (A * B):
      14 8 2 -4 
      20 10 0 -10 
      -1189958655 1552515295 21949 -1552471397 
      0 0 0 0 

      Test Open MPI Jobs

Using MPI (Message Passing Interface) for parallel computing on a SLURM cluster usually involves the following steps:

1. Install an MPI library

Make sure an MPI library is installed on your cluster nodes. Common MPI implementations include:

• OpenMPI
• Intel MPI
• MPICH

You can check whether MPI is installed with the following commands:
mpicc --version  # check the MPI compiler wrapper
mpirun --version # check the MPI runtime

2. Write the MPI program

You can compile MPI programs with mpicc (for C) or mpic++ (for C++). For example:

Below is a simple MPI "Hello, World!" example program, assumed to be saved as hello_mpi.c, and a second example that computes a vector dot product, saved as dot_product.c; pick either one:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);

    // Get this process's rank and the total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Print this process's information
    printf("Hello, World! I am process %d out of %d processes.\n", rank, size);

    // Finalize the MPI environment
    MPI_Finalize();

    return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 8  // vector size

// Compute the local part of the dot product
double compute_local_dot_product(double *A, double *B, int start, int end) {
    double local_dot = 0.0;
    for (int i = start; i < end; i++) {
        local_dot += A[i] * B[i];
    }
    return local_dot;
}

void print_vector(double *Vector) {
    for (int i = 0; i < N; i++) {
        printf("%f ", Vector[i]);
    }
    printf("\n");
}

int main(int argc, char *argv[]) {
    int rank, size;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Vectors A and B
    double A[N], B[N];

    // Rank 0 initializes vectors A and B
    if (rank == 0) {
        for (int i = 0; i < N; i++) {
            A[i] = i + 1;  // sample data
            B[i] = (i + 1) * 2;  // sample data
        }
    }

    // Broadcast vectors A and B to all processes
    MPI_Bcast(A, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // Each process computes its own chunk
    int local_n = N / size;  // number of elements handled by each process
    int start = rank * local_n;
    int end = (rank + 1) * local_n;

    // The last process also handles any remaining elements (N % size)
    if (rank == size - 1) {
        end = N;
    }

    double local_dot_product = compute_local_dot_product(A, B, start, end);

    // Use MPI_Reduce to sum the local dot products into rank 0
    double global_dot_product = 0.0;
    MPI_Reduce(&local_dot_product, &global_dot_product, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    // Rank 0 prints the final result
    if (rank == 0) {
        printf("Vector A is\n");
        print_vector(A);
        printf("Vector B is\n");
        print_vector(B);
        printf("Dot Product of A and B: %f\n", global_dot_product);
    }

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

3. Create the SLURM job script

Create a SLURM job script to run the MPI program. Below is a basic SLURM job script, assumed to be saved as mpi_test.slurm:

#!/bin/bash
#SBATCH --job-name=mpi_test                 # job name
#SBATCH --nodes=2                           # number of nodes
#SBATCH --ntasks-per-node=1                 # tasks per node
#SBATCH --time=00:10:00                     # maximum run time
#SBATCH --output=mpi_test_output_%j.log     # output log file

# Load the MPI module (if using environment modules)
module load openmpi

# Run the MPI program
mpirun --allow-run-as-root -np 2 ./hello_mpi
#!/bin/bash
#SBATCH --job-name=mpi_test                 # job name
#SBATCH --nodes=2                           # number of nodes
#SBATCH --ntasks-per-node=1                 # tasks per node
#SBATCH --time=00:10:00                     # maximum run time
#SBATCH --output=mpi_test_output_%j.log     # output log file

# Load the MPI module (if using environment modules)
module load openmpi

# Run the MPI program
mpirun --allow-run-as-root -np 2 ./dot_product

4. Compile the MPI program

Before running the job, you need to compile the MPI program. Use mpicc on the cluster. Assuming the program is saved as hello_mpi.c (or dot_product.c), compile it with the following commands:

      mpicc -o hello_mpi hello_mpi.c
      mpicc -o dot_product dot_product.c

5. Submit the SLURM job

Save the job script above (mpi_test.slurm) and submit it with the following command:

      sbatch mpi_test.slurm

6. Check the job status

You can check the status of the job with the following command:

      squeue -u <your_username>

7. Check the output

After the job completes, the output is saved to the files specified in the job script (e.g. mpi_test_output_<job_id>.log). You can view it with cat or any text editor:

      cat mpi_test_output_*.log

Example output: if everything works correctly, the output will look similar to:

      Hello, World! I am process 0 out of 2 processes.
      Hello, World! I am process 1 out of 2 processes.
      Result Matrix C (A * B):
      14 8 2 -4 
      20 10 0 -10 
      -1189958655 1552515295 21949 -1552471397 
      0 0 0 0 

      Try OpenSCOW

      What is SCOW?

SCOW is an HPC cluster management system built by PKU.

SCOW uses four virtual machines to run a Slurm cluster. It is a good way to learn how to use Slurm.

      You should check https://pkuhpc.github.io/OpenSCOW/docs/hpccluster, it works well.