How do I restart a MariaDB Galera Cluster?

Question

After all the nodes crashed, I tried to recover the cluster, but without success. I have only 2 nodes.

As the documentation says, I set this parameter on one of the nodes:

set global wsrep_provider_options="pc.bootstrap=true";

Then I tried to start the first node:

systemctl start mariadb

After that, I got an error:

Oct 11 16:11:12 proxy1 sh[2367]: 2016-10-11 16:11:12 140291677038720 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB) starting as process 2402 ... Oct 11 16:11:15 proxy1 sh[2367]: WSREP: Recovered position b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB) starting as process 2434 ... Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Read nil XID from storage engines, skipping position init Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so' Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_load(): Galera 25.3.18(r3632) by Codership Oy <info@codership.com> loaded successfully. Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: CRC-32C: using hardware acceleration. 
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Found saved state: b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:-1 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_Host = 192.168.0.41; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140046790919936 [Note] WSREP: Service thread queue flushed. 
Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Assign initial position for certification: 141, protocol version: -1 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_sst_grab() Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Start replication Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Setting initial position to b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: protonet asio version 0 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Using CRC-32C for message checksums. Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: backend: asio Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: gcomm thread scheduling priority set to other:0 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory) Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: restore pc from disk failed Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: GMCast version 0 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: EVS version 0 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: gcomm: connecting to group 'test_cluster', peer '192.168.0.41:,192.168.0.42:' Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 
140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 30a7b2e6 tcp://192.168.0.41:4567 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.41:4567' points to own listening address, blacklisting Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 30a7b2e6 tcp://192.168.0.41:4567 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 1ef15511 tcp://192.168.0.42:4567 Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: declaring 1ef15511 at tcp://192.168.0.42:4567 stable Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: no nodes coming from prim view, prim not possible Oct 11 16:11:15 proxy1 mysqld[2434]: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: view(view_id(NON_PRIM,1ef15511,2) memb { Oct 11 16:11:15 proxy1 mysqld[2434]: 1ef15511,0 Oct 11 16:11:15 proxy1 mysqld[2434]: 30a7b2e6,0 Oct 11 16:11:15 proxy1 mysqld[2434]: } joined { Oct 11 16:11:15 proxy1 mysqld[2434]: } left { Oct 11 16:11:15 proxy1 mysqld[2434]: } partitioned { Oct 11 16:11:15 proxy1 mysqld[2434]: }) Oct 11 16:11:18 proxy1 mysqld[2434]: 2016-10-11 16:11:18 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting off Oct 11 16:11:19 proxy1 mysqld[2434]: 2016-10-11 16:11:19 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.42:4567 Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: forgetting 
1ef15511 (tcp://192.168.0.42:4567) Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting off Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Warning] WSREP: no nodes coming from prim view, prim not possible Oct 11 16:11:20 proxy1 mysqld[2434]: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: view(view_id(NON_PRIM,30a7b2e6,3) memb { Oct 11 16:11:20 proxy1 mysqld[2434]: 30a7b2e6,0 Oct 11 16:11:20 proxy1 mysqld[2434]: } joined { Oct 11 16:11:20 proxy1 mysqld[2434]: } left { Oct 11 16:11:20 proxy1 mysqld[2434]: } partitioned { Oct 11 16:11:20 proxy1 mysqld[2434]: 1ef15511,0 Oct 11 16:11:20 proxy1 mysqld[2434]: }) Oct 11 16:11:25 proxy1 mysqld[2434]: 2016-10-11 16:11:25 140047023368320 [Note] WSREP: cleaning up 1ef15511 (tcp://192.168.0.42:4567) Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [Note] WSREP: view((empty)) Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) Oct 11 16:11:46 proxy1 mysqld[2434]: at gcomm/src/pc.cpp:connect():162 Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out) Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1380: Failed to open channel 'test_cluster' at 'gcomm://192.168.0.41,192.168.0.42': -110 (Connection timed out) Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs connect failed: Connection timed out Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: wsrep::connect(gcomm://192.168.0.41,192.168.0.42) failed: 7 Oct 11 16:11:46 proxy1 mysqld[2434]: 2016-10-11 16:11:46 
140047023368320 [ERROR] Aborting Oct 11 16:11:47 proxy1 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE Oct 11 16:11:47 proxy1 systemd[1]: Failed to start MariaDB database server. -- Subject: Unit mariadb.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit mariadb.service has failed. -- -- The result is failed. Oct 11 16:11:47 proxy1 systemd[1]: Unit mariadb.service entered failed state. Oct 11 16:11:47 proxy1 systemd[1]: mariadb.service failed. Oct 11 16:11:47 proxy1 polkitd[570]: Unregistered Authentication Agent for unix-process:2360:148848 (system bus name :1.15, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

How do I recover the cluster?

Oleksandr · Accepted Answer

MariaDB Galera Cluster:

Solution 1:

1) I changed the safe_to_bootstrap parameter to 1 on one of the nodes, in the file /var/lib/mysql/grastate.dat:

safe_to_bootstrap: 1

2) After that, I killed all MySQL processes:

killall -KILL mysql mysqld_safe mysqld mysql-systemd

3) and started a new cluster:

galera_new_cluster

4) I then reconnected all the other nodes to the new cluster:

systemctl restart mariadb

P.S. To install killall on CentOS, use the psmisc package:

sudo yum install psmisc
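The grastate.dat edit from step 1 can also be scripted. This is only a minimal sketch, assuming the usual "key: value" layout of grastate.dat. The helper names grastate_field and mark_safe_to_bootstrap are made up for this example and are not part of Galera or MariaDB. Run it only while MariaDB is stopped on that node.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting and editing a Galera grastate.dat file.

# Print the value of one field (for example seqno or safe_to_bootstrap).
# Usage: grastate_field FILE FIELD
grastate_field() {
    # Lines look like "seqno:   -1"; the "# GALERA saved state" comment
    # line never matches because its first field is the whole line.
    awk -F':[ \t]*' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Rewrite the safe_to_bootstrap line so that it reads "safe_to_bootstrap: 1".
# The file is overwritten in place (not replaced) to keep its owner and mode.
# Usage: mark_safe_to_bootstrap FILE
mark_safe_to_bootstrap() {
    tmp=$(mktemp) || return 1
    sed 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example (as root, with MariaDB stopped):
#   grastate_field /var/lib/mysql/grastate.dat seqno
#   mark_safe_to_bootstrap /var/lib/mysql/grastate.dat
```

Checking the seqno field on every node before editing helps avoid bootstrapping from a node that is behind the others.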

Solution 2:

Another way to restart a MariaDB Galera Cluster is to use the --wsrep-new-cluster parameter.

1) Kill all MySQL processes:

killall -KILL mysql mysqld_safe mysqld mysql-systemd

2) On the most up-to-date node, start a new cluster:

/etc/init.d/mysql start --wsrep-new-cluster

3) Now the other nodes can be connected:

service mysql start --wsrep_cluster_address="gcomm://192.168.0.101,192.168.0.102,192.168.0.103" --wsrep_cluster_name="my_cluster"
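Step 2 depends on knowing which node is the most up to date. One way to find out, sketched here under assumptions, is to compare the recovered seqno of each node, as printed in a "WSREP: Recovered position <uuid>:<seqno>" log line like the one in the question (such a line is also produced by starting mysqld with --wsrep-recover). The function names and the host name proxy2 below are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing recovered positions between nodes.

# Read log text on stdin and print the seqno from the last
# "WSREP: Recovered position <uuid>:<seqno>" entry found.
recovered_seqno() {
    sed -n 's/.*WSREP: Recovered position [0-9a-f-]*:\(-\{0,1\}[0-9][0-9]*\).*/\1/p' |
        tail -n 1
}

# Read "host seqno" pairs on stdin (one per line) and print the host
# with the highest seqno, i.e. the candidate for bootstrapping.
most_advanced_node() {
    sort -k2,2n | tail -n 1 | awk '{ print $1 }'
}

# Example:
#   journalctl -u mariadb | recovered_seqno
#   printf 'proxy1 141\nproxy2 139\n' | most_advanced_node
```

If two nodes report the same seqno, either one can be used to start the new cluster.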

Percona XtraDB Cluster:

Solution 1:

If you can still connect to the most up-to-date node (that is, mysqld is running on it), you can tell that node to bootstrap a new primary component with the following SQL:

SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
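After running that statement, it is worth checking that the node actually reports a primary component. The sketch below assumes the check is done by reading the wsrep_cluster_status status variable through the mysql client. The function name cluster_is_primary is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the status row read on stdin
# reports a Primary component. Expected input is the output of:
#   mysql -N -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
# which is a single row of the form "wsrep_cluster_status<TAB>Primary".
cluster_is_primary() {
    awk '$1 == "wsrep_cluster_status" { status = $2 }
         END { exit (status == "Primary") ? 0 : 1 }'
}

# Example:
#   if mysql -N -e "SHOW STATUS LIKE 'wsrep_cluster_status'" | cluster_is_primary
#   then echo "node is in the primary component"
#   else echo "node is NOT in the primary component"
#   fi
```

Any other value, such as non-Primary, means the node is still not accepting writes.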

Solution 2:

If all your nodes are down and cannot be started, you can shut down the old cluster and create a new one. You must stop every cluster node, because each of them still holds information about the nodes of the old cluster.

1) Kill all MySQL processes on all nodes:

killall -KILL mysql mysqld_safe mysqld mysql-systemd

2) Start a new cluster on the most up-to-date node:

systemctl start mysql@bootstrap.service

3) Start the other nodes:

systemctl start mysql