Percona - Impossible de démarrer MySQL après le redémarrage

Question

Je viens de configurer un cluster de Percona Xtradb à Ubuntu avec 2 noeuds.

C'est ce que j'ai fait.

lancé 2 nouveaux serveurs.
Installation de Percona Cluster sur les deux nœuds.
Arrêté MySQL sur les deux nœuds.

Sur Server1:

/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81 binlog_format=ROW wsrep_slave_threads= 8 wsrep_log_conflicts innodb_autoinc_lock_mode=2 wsrep_node_address=172.31.5.172 wsrep_cluster_name=pxc-cluster pxc_strict_mode=DISABLED wsrep_sst_method=xtrabackup-v2 wsrep_sst_auth="sstuser:s3cretPass"

Sur le serveur 2:

/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81 binlog_format=ROW wsrep_slave_threads= 8 wsrep_log_conflicts innodb_autoinc_lock_mode=2 wsrep_node_address=172.31.13.81 wsrep_cluster_name=pxc-cluster pxc_strict_mode=DISABLED wsrep_sst_method=xtrabackup-v2 wsrep_sst_auth="sstuser:s3cretPass"

Sur Server1: /etc/init.d/mysql bootstrap-pxc Alors mysql a commencé.

CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass'; GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost'; FLUSH PRIVILEGES;

Sur le serveur 2:

service mysql start

À l'instant, pas de problèmes, testé la réplicatine. Ensuite, j'ai fait un redémarrage des deux nœuds.

Après que MySQL ne démarre pas.

Server1 :

2017-11-05T17:22:03.505946Z mysqld_safe Logging to '/var/log/mysqld.log'. 2017-11-05T17:22:03.515607Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql 2017-11-05T17:22:03.520623Z mysqld_safe Skipping wsrep-recover for cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 pair 2017-11-05T17:22:03.521158Z mysqld_safe Assigning cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 to wsrep_start_position 2017-11-05T17:22:03.706708Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2017-11-05T17:22:03.708134Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57-log) starting as process 11089 ... 2017-11-05T17:22:03.709661Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-5-172-bin' to avoid this problem. 2017-11-05T17:22:03.709964Z 0 [Note] WSREP: Setting wsrep_ready to false 2017-11-05T17:22:03.709977Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization. 2017-11-05T17:22:03.709982Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so' 2017-11-05T17:22:03.712249Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <info@codership.com> loaded successfully. 2017-11-05T17:22:03.712290Z 0 [Note] WSREP: CRC-32C: using hardware acceleration. 2017-11-05T17:22:03.712549Z 0 [Note] WSREP: Found saved state: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0, safe_to_bootsrap: 1 2017-11-05T17:22:03.713823Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_Host = 172.31.5.172; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992; 2017-11-05T17:22:03.727615Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1 2017-11-05T17:22:03.727630Z 0 [Note] WSREP: Preparing to initiate SST/IST 2017-11-05T17:22:03.727634Z 0 [Note] WSREP: Starting replication 2017-11-05T17:22:03.727640Z 0 [Note] WSREP: Setting initial position to cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 2017-11-05T17:22:03.727741Z 0 [Note] WSREP: Using CRC-32C for message checksums. 2017-11-05T17:22:03.727792Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 2017-11-05T17:22:03.727852Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:03.727859Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:03.728266Z 0 [Note] WSREP: GMCast version 0 2017-11-05T17:22:03.728398Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 2017-11-05T17:22:03.728405Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 2017-11-05T17:22:03.728686Z 0 [Note] WSREP: EVS version 0 2017-11-05T17:22:03.728771Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:' 2017-11-05T17:22:03.728686Z 0 [Note] WSREP: EVS version 0 2017-11-05T17:22:03.728771Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:' 2017-11-05T17:22:03.729595Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567 2017-11-05T17:22:03.729609Z 0 [Warning] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') address 'tcp://172.31.5.172:4567' points to own listening address, blacklisting 2017-11-05T17:22:06.729830Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:06.729954Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer d73a400b with addr tcp://172.31.5.172:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:06.730072Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:06.730101Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,1) memb { d73a400b,0 } joined { } left { } partitioned { } ) 2017-11-05T17:22:07.230228Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50154S), skipping check 2017-11-05T17:22:10.730540Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:14.731257Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:16.154486Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567 2017-11-05T17:22:16.154851Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 2017-11-05T17:22:16.654815Z 0 [Note] WSREP: declaring de905d70 at tcp://172.31.13.81:4567 stable 2017-11-05T17:22:16.655411Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:16.655466Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,2) memb { d73a400b,0 de905d70,0 } joined { } left { } partitioned { } ) 2017-11-05T17:22:19.232193Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting off

Server2 :

2017-11-05T17:22:15.979415Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2017-11-05T17:22:15.981265Z 0 [Note] mysqld (mysqld 5.7.19-17-57-log) starting as process 1325 ... 2017-11-05T17:22:15.983852Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem. 2017-11-05T17:22:15.984031Z 0 [Note] WSREP: Setting wsrep_ready to false 2017-11-05T17:22:15.984043Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization. 2017-11-05T17:22:15.984047Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so' 2017-11-05T17:22:16.013201Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <info@codership.com> loaded successfully. 2017-11-05T17:22:16.013252Z 0 [Note] WSREP: CRC-32C: using hardware acceleration. 2017-11-05T17:22:16.015682Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 0 2017-11-05T17:22:16.019709Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_Host = 172.31.13.81; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992; 2017-11-05T17:22:16.034431Z 0 [Note] WSREP: GCache history reset: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 -> 00000000-0000-0000-0000-000000000000:-1 2017-11-05T17:22:16.035026Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1 2017-11-05T17:22:16.035038Z 0 [Note] WSREP: Preparing to initiate SST/IST 2017-11-05T17:22:16.035041Z 0 [Note] WSREP: Starting replication 2017-11-05T17:22:16.035048Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1 2017-11-05T17:22:16.035159Z 0 [Note] WSREP: Using CRC-32C for message checksums. 2017-11-05T17:22:16.035214Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 2017-11-05T17:22:16.036228Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:16.036250Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:16.036685Z 0 [Note] WSREP: GMCast version 0 2017-11-05T17:22:16.036828Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 2017-11-05T17:22:16.036836Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 2017-11-05T17:22:16.037997Z 0 [Note] WSREP: EVS version 0 2017-11-05T17:22:16.038072Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:' 2017-11-05T17:22:16.039496Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567 2017-11-05T17:22:16.039513Z 0 [Warning] WSREP: (de905d70, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting 2017-11-05T17:22:16.039960Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567 2017-11-05T17:22:16.040014Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 2017-11-05T17:22:16.540329Z 0 [Note] WSREP: declaring d73a400b at tcp://172.31.5.172:4567 stable 2017-11-05T17:22:16.540836Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:16.540862Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,2) memb { d73a400b,0 de905d70,0 } joined { } left { } partitioned { } ) 2017-11-05T17:22:19.039641Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection to peer de905d70 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S 2017-11-05T17:22:19.539775Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting off 2017-11-05T17:22:37.061048Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.31.5.172:4567 2017-11-05T17:22:38.061306Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') reconnecting to d73a400b (tcp://172.31.5.172:4567), attempt 0 2017-11-05T17:22:41.562259Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspecting node: d73a400b 2017-11-05T17:22:41.562325Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspected node without join message, declaring inactive 2017-11-05T17:22:42.562445Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,d73a400b,2) memb { de905d70,0 } joined { } left { } partitioned { d73a400b,0 } ) 2017-11-05T17:22:42.562525Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:22:42.562550Z 0 [Note] WSREP: Current view of cluster as seen by this node view (view_id(NON_PRIM,de905d70,3) memb { de905d70,0 } joined { } left { } partitioned { d73a400b,0 } ) 2017-11-05T17:22:47.071058Z 0 [Note] WSREP: Current view of cluster as seen by this node view ((empty)) 2017-11-05T17:22:47.071334Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():158 2017-11-05T17:22:47.071356Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out) 2017-11-05T17:22:47.071412Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out) 2017-11-05T17:22:47.071421Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out 2017-11-05T17:22:47.071426Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7) 2017-11-05T17:22:47.071430Z 0 [ERROR] Aborting 2017-11-05T17:22:47.071433Z 0 [Note] Giving 0 client threads a chance to die gracefully 2017-11-05T17:22:47.071437Z 0 [Note] WSREP: Waiting for active wsrep applier to exit 2017-11-05T17:22:47.071440Z 0 [Note] WSREP: Service disconnected. 2017-11-05T17:22:47.071443Z 0 [Note] WSREP: Waiting to close threads...... 2017-11-05T17:22:52.072202Z 0 [Note] WSREP: Some threads may fail to exit. 2017-11-05T17:22:52.072285Z 0 [Note] Binlog end 2017-11-05T17:22:52.072905Z 0 [Note] mysqld: Shutdown complete 2017-11-05T17:22:52.322273Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2017-11-05T17:22:52.325253Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem. 2017-11-05T17:22:52.344397Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown 2017-11-05T17:22:52.346176Z 0 [Warning] WSREP: (f4348c81, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting 2017-11-05T17:22:52.847728Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible 2017-11-05T17:23:22.856010Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():158 2017-11-05T17:23:22.856051Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out) 2017-11-05T17:23:22.856128Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out) 2017-11-05T17:23:22.856140Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out 2017-11-05T17:23:22.856147Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7) 2017-11-05T17:23:22.856151Z 0 [ERROR] Aborting

HamoonDBA · Accepted Answer

Oui, Rick a raison. Parce que tous les nœuds sont en panne, vous devez bootstrap cluster à nouveau.

Vous avez arrêté tous les nœuds gracieusement.

Si vous savez quel est le dernier (noeud) arrêté, choisissez ce nœud aussi bootstrap puis démarrez un autre nœud.

Mais si vous ne vous souvenez pas que vous ne vous souvenez pas du dernier nœud que vous avez éteint, ouvrez grastate.dat (localiser dans le répertoire mysql) à partir de tous les nœuds et comparez SEQNO Number Number Inside and Choice A Nice qui a un numéro plus grand et commencez par lancer ce nœud. Si vous choisissez un mauvais nœud pour Bootstrap, un autre nœud doit supprimer toutes les données et demander à SST à nouveau. donc sois prudent