Subsections of Import Data

Import CCDS MariaDB Data

Preliminary

  • MariaDB has installed though argo-workflow, if not check link
  • MariaDB server pod named is app-mariadb and in namespace application
Warning

if the pod name and namespace isn’t match, you might need to modify following shell.

Download SQL file

wget https://inner-gitlab.citybrain.org/chang.liu/ccds-server/-/raw/ccds-v1/deploy/ccds-db-init.sql -O ccds-mariadb-init.sql

TODO the content we download is all html, fff

Using import tool

MARIADB_ROOT_PASSWORD=$(kubectl -n application get secret mariadb-credentials -o jsonpath='{.data.mariadb-root-password}' | base64 -d) \
TOOL_POD_NAME=$(kubectl get pod -n application -l "app.kubernetes.io/name=mariadb-tool" -o jsonpath="{.items[0].metadata.name}") \
&& export SQL_FILENAME="ccds-mariadb-init.sql" \
&& kubectl -n application cp ${SQL_FILENAME} ${TOOL_POD_NAME}:/tmp/${SQL_FILENAME} \
&& kubectl -n application exec -it deployment/app-mariadb-tool -- bash -c \
    'echo "create database ccds;" | mysql -h app-mariadb.application -uroot -p$MARIADB_ROOT_PASSWORD' \
&& kubectl -n application exec -it ${TOOL_POD_NAME} -- bash -c \
    "mysql -h app-mariadb.application -uroot -p\${MARIADB_ROOT_PASSWORD} \
    ccds < /tmp/ccds-mariadb-init.sql"

Build Clickhouse NFS Server

Preliminary

  • Podman has installed, if not check link

1. create new partition

parted

#select /dev/vdb 
#mklabel gpt 
#mkpart primary 0 -1
#Cancel
#mkpart primary 0% 100%
#print

2. Format disk

mkfs.xfs /dev/vdb1 -f

3. mount disk to folder

mount /dev/vdb1 /data

4. mount after restart server

#vim `/etc/fstab` 
/dev/vdb1     /data  xfs   defaults   0 0

fstab fstab

5. init NFSv4 Server

echo -e "nfs\nnfsd" > /etc/modules-load.d/nfs4.conf
modprobe nfs && modprobe nfsd
mkdir -p $(pwd)/data/nfs/data
echo '/data *(rw,fsid=0,no_subtree_check,insecure,no_root_squash)' > $(pwd)/data/nfs/exports
podman run \
    --name nfs4 \
    --rm \
    --privileged \
    -p 12049:2049 \
    -v $(pwd)/data/nfs/data:/data \
    -v $(pwd)/data/nfs/exports:/etc/exports:ro \
    -d docker.io/erichough/nfs-server:2.2.1
Tip

you can run an addinational daocloud image to accelerate your pulling, check Daocloud Proxy

6. [Optional] test load

install nfs on
sudo yum install -y nfs-utils
sudo apt-get install nfs-common
sudo dnf install -y nfs-utils

client is ok for normal user

mkdir -p $(pwd)/mnt/nfs
sudo mount -t nfs4 -o port=2049 -v localhost:/ $(pwd)/mnt/nfs
df -h

7.create NFS provisioner

prepare csst-ck-nfs-provisioner.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: csst-ck-nfs-provisioner
spec:
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
  project: default
  source:
    repoURL: https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner
    chart: nfs-subdir-external-provisioner
    targetRevision: 4.0.18
    helm:
      releaseName: csst-ck-nfs-provisioner
      values: |
        image:
          repository: m.daocloud.io/registry.k8s.io/sig-storage/nfs-subdir-external-provisioner
          pullPolicy: IfNotPresent
        nfs:
          server: <$nfs.service.ip.addr>
          path: /
          mountOptions:
            - port=12049
            - vers=4
            - minorversion=0
            - rsize=1048576
            - wsize=1048576
            - hard
            - timeo=600
            - retrans=2
            - noresvport
          volumeName: csst-ck-nfs-subdir-external-provisioner-nas
          reclaimPolicy: Retain
        storageClass:
          create: true
          defaultClass: true
          name: csst-nfs-external-nas        
  destination:
    server: https://kubernetes.default.svc
    namespace: basic-components

Your NFS server will be host on <$ip>:12049

8. apply to k8s

kubectl -n argocd apply -f csst-ck-nfs-provisioner.yaml

9. sync by argocd

argocd app sync argocd/csst-ck-nfs-provisioner

Then you can use storage class csst-nfs-external-nas to create pv or pvc

Import CSST Clickhouse Data

Preliminary

  • Clickhouse has installed though argo-workflow, if not check link
  • Clickhouse server pod named is app-clickhouse and in namespace application
Warning

if the pod name and namespace isn’t match, you might need to modify following shell.

Download Data file

Currently, we mount a NFS disk to retrieve data.

Using client tool

CK_HOST="172.27.253.66"
CK_PASSWORD=$(kubectl -n application get secret clickhouse-admin-credentials -o jsonpath='{.data.password}' | base64 -d) \
&& podman run --rm --entrypoint clickhouse-client -it m.daocloud.io/docker.io/clickhouse/clickhouse-server:23.11.5.29-alpine \
     --host ${CK_HOST} \
     --port 30900 \
     --user admin \
     --password ${CK_PASSWORD} \
     --query "select version()"

Init Database

CREATE DATABASE IF NOT EXISTS csst ON CLUSTER default;

CREATE TABLE IF NOT EXISTS csst.msc_level2_catalog_local ON CLUSTER default
(
    level2_id Int64,
    OBSID String DEFAULT '',
    CCDNO Int32,
    objID Int32,
    X Float32,
    XErr Float64,
    Y Float32,
    YErr Float64,
    RA Float64,
    RAErr Float64,
    DEC Float64,
    DECErr Float64,
    A Float32,
    AErr Float32,
    B Float32,
    BErr Float32,
    PA Float32,
    Flag Int32,
    Flag_ISO Int32,
    Flag_ISO_Num Int32,
    FWHM Float32,
    AB Float32,
    E Float32,
    Flux_Kron Float64,
    FluxErr_Kron Float32,
    Mag_Kron Float64,
    MagErr_Kron Float64,
    Radius_Kron Float64,
    Sky Float32,
    Flux_Aper1 Float32,
    FluxErr_Aper1 Float32,
    Mag_Aper1 Float32,
    MagErr_Aper1 Float32,
    Flux_Aper2 Float32,
    FluxErr_Aper2 Float32,
    Mag_Aper2 Float32,
    MagErr_Aper2 Float32,
    Flux_Aper3 Float32,
    FluxErr_Aper3 Float32,
    Mag_Aper3 Float32,
    MagErr_Aper3 Float32,
    Flux_Aper4 Float32,
    FluxErr_Aper4 Float32,
    Mag_Aper4 Float32,
    MagErr_Aper4 Float32,
    Flux_Aper5 Float32,
    FluxErr_Aper5 Float32,
    Mag_Aper5 Float32,
    MagErr_Aper5 Float32,
    Flux_Aper6 Float32,
    FluxErr_Aper6 Float32,
    Mag_Aper6 Float32,
    MagErr_Aper6 Float32,
    Flux_Aper7 Float32,
    FluxErr_Aper7 Float32,
    Mag_Aper7 Float32,
    MagErr_Aper7 Float32,
    Flux_Aper8 Float32,
    FluxErr_Aper8 Float32,
    Mag_Aper8 Float32,
    MagErr_Aper8 Float32,
    Flux_Aper9 Float32,
    FluxErr_Aper9 Float32,
    Mag_Aper9 Float32,
    MagErr_Aper9 Float32,
    Flux_Aper10 Float32,
    FluxErr_Aper10 Float32,
    Mag_Aper10 Float32,
    MagErr_Aper10 Float32,
    Flux_Aper11 Float32,
    FluxErr_Aper11 Float32,
    Mag_Aper11 Float32,
    MagErr_Aper11 Float32,
    Flux_Aper12 Float32,
    FluxErr_Aper12 Float32,
    Mag_Aper12 Float32,
    MagErr_Aper12 Float32,
    Type Int32,
    R20 Float32,
    R50 Float32,
    R90 Float32,
    X_PSF Float64,
    Y_PSF Float64,
    RA_PSF Float64,
    DEC_PSF Float64,
    Chi2_PSF Float32,
    Flux_PSF Float32,
    FluxErr_PSF Float32,
    Mag_PSF Float32,
    MagErr_PSF Float32,
    X_Model Float64,
    Y_Model Float64,
    RA_Model Float64,
    DEC_Model Float64,
    Chi2_Model Float32,
    Flag_Model Int32,
    Flux_Model Float32,
    FluxErr_Model Float32,
    Mag_Model Float32,
    MagErr_Model Float32,
    Flux_Bulge Float32,
    FluxErr_Bulge Float32,
    Mag_Bulge Float32,
    MagErr_Bulge Float32,
    Re_Bulge Float32,
    ReErr_Bulge Float32,
    E_Bulge Float32,
    EErr_Bulge Float32,
    PA_Bulge Float32,
    PAErr_Bulge Float32,
    Flux_Disk Float32,
    FluxErr_Disk Float32,
    Mag_Disk Float32,
    MagErr_Disk Float32,
    Re_Disk Float32,
    ReErr_Disk Float32,
    E_Disk Float32,
    EErr_Disk Float32,
    PA_Disk Float32,
    PAErr_Disk Float32,
    Ratio_Disk Float32,
    RatioErr_Disk Float32,
    Spread_Model Float32,
    SpreadErr_Model Float32,
    Filter String DEFAULT '',
    Brick_Id Int32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/csst_msc_level2_catalog','{replica}')
PARTITION BY round(modulo(Brick_Id,64))
ORDER BY (level2_id,Brick_Id)
SETTINGS index_granularity = 8192;

CREATE TABLE IF NOT EXISTS csst.msc_level2_catalog_all ON CLUSTER default AS csst.msc_level2_catalog_local
    ENGINE = Distributed(default, csst, msc_level2_catalog_local, rand());



CREATE DATABASE IF NOT EXISTS ephem ON CLUSTER default;

CREATE TABLE IF NOT EXISTS ephem.gaia3_source_local ON CLUSTER default
(
    solution_id Int64,
    designation String DEFAULT '',
    source_id Int64,
    random_index Int64,
    ref_epoch Float64,
    ra Float64,
    ra_error Float64,
    `dec` Float64,
    dec_error Float64,
    parallax Float64,
    parallax_error Float64,
    parallax_over_error Float64,
    pm Float64,
    pmra Float64,
    pmra_error Float64,
    pmdec Float64,
    pmdec_error Float64,
    ra_dec_corr Float64,
    ra_parallax_corr Float64,
    ra_pmra_corr Float64,
    ra_pmdec_corr Float64,
    dec_parallax_corr Float64,
    dec_pmra_corr Float64,
    dec_pmdec_corr Float64,
    parallax_pmra_corr Float64,
    parallax_pmdec_corr Float64,
    pmra_pmdec_corr Float64,
    astrometric_n_obs_al Int64,
    astrometric_n_obs_ac Int64,
    astrometric_n_good_obs_al Int64,
    astrometric_n_bad_obs_al Int64,
    astrometric_gof_al Float64,
    astrometric_chi2_al Float64,
    astrometric_excess_noise Float64,
    astrometric_excess_noise_sig Float64,
    astrometric_params_solved Int64,
    astrometric_primary_flag UInt8,
    nu_eff_used_in_astrometry Float64,
    pseudocolour Float64,
    pseudocolour_error Float64,
    ra_pseudocolour_corr Float64,
    dec_pseudocolour_corr Float64,
    parallax_pseudocolour_corr Float64,
    pmra_pseudocolour_corr Float64,
    pmdec_pseudocolour_corr Float64,
    astrometric_matched_transits Int64,
    visibility_periods_used Int64,
    astrometric_sigma5d_max Float64,
    matched_transits Int64,
    new_matched_transits Int64,
    matched_transits_removed Int64,
    ipd_gof_harmonic_amplitude Float64,
    ipd_gof_harmonic_phase Float64,
    ipd_frac_multi_peak Int64,
    ipd_frac_odd_win Int64,
    ruwe Float64,
    scan_direction_strength_k1 Float64,
    scan_direction_strength_k2 Float64,
    scan_direction_strength_k3 Float64,
    scan_direction_strength_k4 Float64,
    scan_direction_mean_k1 Float64,
    scan_direction_mean_k2 Float64,
    scan_direction_mean_k3 Float64,
    scan_direction_mean_k4 Float64,
    duplicated_source UInt8,
    phot_g_n_obs Int64,
    phot_g_mean_flux Float64,
    phot_g_mean_flux_error Float64,
    phot_g_mean_flux_over_error Float64,
    phot_g_mean_mag Float64,
    phot_bp_n_obs Int64,
    phot_bp_mean_flux Float64,
    phot_bp_mean_flux_error Float64,
    phot_bp_mean_flux_over_error Float64,
    phot_bp_mean_mag Float64,
    phot_rp_n_obs Int64,
    phot_rp_mean_flux Float64,
    phot_rp_mean_flux_error Float64,
    phot_rp_mean_flux_over_error Float64,
    phot_rp_mean_mag Float64,
    phot_bp_rp_excess_factor Float64,
    phot_bp_n_contaminated_transits Float64,
    phot_bp_n_blended_transits Float64,
    phot_rp_n_contaminated_transits Float64,
    phot_rp_n_blended_transits Float64,
    phot_proc_mode Float64,
    bp_rp Float64,
    bp_g Float64,
    g_rp Float64,
    radial_velocity Float64,
    radial_velocity_error Float64,
    rv_method_used Float64,
    rv_nb_transits Float64,
    rv_nb_deblended_transits Float64,
    rv_visibility_periods_used Float64,
    rv_expected_sig_to_noise Float64,
    rv_renormalised_gof Float64,
    rv_chisq_pvalue Float64,
    rv_time_duration Float64,
    rv_amplitude_robust Float64,
    rv_template_teff Float64,
    rv_template_logg Float64,
    rv_template_fe_h Float64,
    rv_atm_param_origin Float64,
    vbroad Float64,
    vbroad_error Float64,
    vbroad_nb_transits Float64,
    grvs_mag Float64,
    grvs_mag_error Float64,
    grvs_mag_nb_transits Float64,
    rvs_spec_sig_to_noise Float64,
    phot_variable_flag String DEFAULT '',
    l Float64,
    b Float64,
    ecl_lon Float64,
    ecl_lat Float64,
    in_qso_candidates UInt8,
    in_galaxy_candidates UInt8,
    non_single_star Int64,
    has_xp_continuous UInt8,
    has_xp_sampled UInt8,
    has_rvs UInt8,
    has_epoch_photometry UInt8,
    has_epoch_rv UInt8,
    has_mcmc_gspphot UInt8,
    has_mcmc_msc UInt8,
    in_andromeda_survey UInt8,
    classprob_dsc_combmod_quasar Float64,
    classprob_dsc_combmod_galaxy Float64,
    classprob_dsc_combmod_star Float64,
    teff_gspphot Float64,
    teff_gspphot_lower Float64,
    teff_gspphot_upper Float64,
    logg_gspphot Float64,
    logg_gspphot_lower Float64,
    logg_gspphot_upper Float64,
    mh_gspphot Float64,
    mh_gspphot_lower Float64,
    mh_gspphot_upper Float64,
    distance_gspphot Float64,
    distance_gspphot_lower Float64,
    distance_gspphot_upper Float64,
    azero_gspphot Float64,
    azero_gspphot_lower Float64,
    azero_gspphot_upper Float64,
    ag_gspphot Float64,
    ag_gspphot_lower Float64,
    ag_gspphot_upper Float64,
    ebpminrp_gspphot Float64,
    ebpminrp_gspphot_lower Float64,
    ebpminrp_gspphot_upper Float64,
    libname_gspphot String DEFAULT '',
    NS8HIdx Int32,
    NS16HIdx Int32,
    NS32HIdx Int32,
    NS64HIdx Int32,
    fileIdx Int32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/gaia3','{replica}')
PARTITION BY round(modulo(NS8HIdx,20))
ORDER BY (NS8HIdx,NS16HIdx,NS32HIdx,NS64HIdx,fileIdx)
SETTINGS index_granularity = 8192;

CREATE TABLE IF NOT EXISTS ephem.gaia3_source_all ON CLUSTER default AS ephem.gaia3_source_local
    ENGINE = Distributed(default, ephem, gaia3_source_local, rand());

Import Data

Import
podman run --rm  -v /tmp/deploy:/tmp/deploy -v /tmp/native:/share/diskdata/gaia3 \
     --entrypoint clickhouse-client \
     -it m.daocloud.io/docker.io/clickhouse/clickhouse-server:23.11.5.29-alpine \
     --host ${CK_HOST}  \
     --port 30900  \
     --user admin  \
     --password ${CK_PASSWORD} \
     --query "insert into ephem.gaia3_source_all from infile /share/diskdata/gaia3/100751.native FORMAT Native" 

1. first you need to run a container

podman run --rm  -u root  -v /data:/data:ro   --entrypoint tail   -it docker.io/bitnami/clickhouse:23.10.5-debian-11-r0 -f /etc/hosts

## get into pod
# podman exec -it <$container_id> bash

dir /data saved all the xxx.native file

2. when you in pod, you need to create a shell script run.sh

#!/bin/sh

INDEX=0
for filename in $(ls -l /data | awk '{print $NF}');
do
        INDEX=$(($INDEX+1))
        echo $(date) $INDEX $filename >> import.log
        clickhouse-client -h 172.27.253.66 --port=30900 --user admin --password YEkvhrhEaeZTf7E0 \
                --query "insert into ephem.gaia3_source_local FORMAT Native" < /data/$filename \
        || echo $filename >> import_err.log

done

Then you can use sh run.sh to import data into clickhouse and view import_err.log to trace the error.

Import CSST Postgres Data

Preliminary

  • Postgresql has installed though argo-workflow, if not check link
  • Postgresql server pod named is app-postgresql and in namespace application
Warning

if the pod name and namespace isn’t match, you might need to modify following shell.

Download SQL file

wget https://inner-gitlab.citybrain.org/csst/csst-py/-/raw/main/deploy/pg/init_dfs_table_data.sql -O init_dfs_table_data.sql

TODO the content we download is all html, fff

Using import tool

POSTGRES_PASSWORD=$(kubectl -n application get secret postgresql-credentials -o jsonpath='{.data.postgres-password}' | base64 -d) \
POD_NAME=$(kubectl get pod -n application -l "app.kubernetes.io/name=postgresql-tool" -o jsonpath="{.items[0].metadata.name}") \
&& export SQL_FILENAME="init_dfs_table_data.sql" \
&& kubectl -n application cp ${SQL_FILENAME} ${POD_NAME}:/tmp/${SQL_FILENAME} \
&& kubectl -n application exec -it deployment/app-postgresql-tool -- bash -c \
     'echo "CREATE DATABASE csst;" | PGPASSWORD="$POSTGRES_PASSWORD" \
     psql --host app-postgresql.application -U postgres -d postgres -p 5432' \
&& kubectl -n application exec -it deployment/app-postgresql-tool -- bash -c \
     'PGPASSWORD="$POSTGRES_PASSWORD" psql --host app-postgresql.application \
     -U postgres -d csst -p 5432 < /tmp/init_dfs_table_data.sql'