web-dev-qa-db-fra.com

Percona - Impossible de démarrer MySQL après le redémarrage

Je viens de configurer un cluster de Percona Xtradb à Ubuntu avec 2 noeuds.

C'est ce que j'ai fait.

  1. lancé 2 nouveaux serveurs.
  2. Installation de Percona Cluster sur les deux nœuds.
  3. Arrêté MySQL sur les deux nœuds.

Sur Server1:

/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf
wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81
binlog_format=ROW
wsrep_slave_threads= 8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_node_address=172.31.5.172
wsrep_cluster_name=pxc-cluster
pxc_strict_mode=DISABLED
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:s3cretPass"

Sur le serveur 2:

/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf
wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81
binlog_format=ROW
wsrep_slave_threads= 8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_node_address=172.31.13.81
wsrep_cluster_name=pxc-cluster
pxc_strict_mode=DISABLED
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:s3cretPass"

Sur Server1: /etc/init.d/mysql bootstrap-pxc Alors mysql a commencé.

CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass';
GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;

Sur le serveur 2:

service mysql start

À l'instant, pas de problèmes, testé la réplicatine. Ensuite, j'ai fait un redémarrage des deux nœuds.

Après que MySQL ne démarre pas.

Server1 :

2017-11-05T17:22:03.505946Z mysqld_safe Logging to '/var/log/mysqld.log'.
2017-11-05T17:22:03.515607Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2017-11-05T17:22:03.520623Z mysqld_safe Skipping wsrep-recover for cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 pair
2017-11-05T17:22:03.521158Z mysqld_safe Assigning cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 to wsrep_start_position
2017-11-05T17:22:03.706708Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-05T17:22:03.708134Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57-log) starting as process 11089 ...
2017-11-05T17:22:03.709661Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-5-172-bin' to avoid this problem.
2017-11-05T17:22:03.709964Z 0 [Note] WSREP: Setting wsrep_ready to false
2017-11-05T17:22:03.709977Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2017-11-05T17:22:03.709982Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-11-05T17:22:03.712249Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <[email protected]> loaded successfully.
2017-11-05T17:22:03.712290Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-11-05T17:22:03.712549Z 0 [Note] WSREP: Found saved state: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0, safe_to_bootsrap: 1
2017-11-05T17:22:03.713823Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_Host = 172.31.5.172; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-11-05T17:22:03.727615Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2017-11-05T17:22:03.727630Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-11-05T17:22:03.727634Z 0 [Note] WSREP: Starting replication
2017-11-05T17:22:03.727640Z 0 [Note] WSREP: Setting initial position to cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0
2017-11-05T17:22:03.727741Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-11-05T17:22:03.727792Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-11-05T17:22:03.727852Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:03.727859Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:03.728266Z 0 [Note] WSREP: GMCast version 0
2017-11-05T17:22:03.728398Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-11-05T17:22:03.728405Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-11-05T17:22:03.728686Z 0 [Note] WSREP: EVS version 0
2017-11-05T17:22:03.728771Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:'
2017-11-05T17:22:03.728686Z 0 [Note] WSREP: EVS version 0
2017-11-05T17:22:03.728771Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:'
2017-11-05T17:22:03.729595Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567
2017-11-05T17:22:03.729609Z 0 [Warning] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') address 'tcp://172.31.5.172:4567' points to own listening address, blacklisting
2017-11-05T17:22:06.729830Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:06.729954Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer d73a400b with addr tcp://172.31.5.172:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:06.730072Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:06.730101Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,1)
memb {
        d73a400b,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2017-11-05T17:22:07.230228Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50154S), skipping check
2017-11-05T17:22:10.730540Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:14.731257Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:16.154486Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567
2017-11-05T17:22:16.154851Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-11-05T17:22:16.654815Z 0 [Note] WSREP: declaring de905d70 at tcp://172.31.13.81:4567 stable
2017-11-05T17:22:16.655411Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:16.655466Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,2)
memb {
        d73a400b,0
        de905d70,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2017-11-05T17:22:19.232193Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting off

Server2 :

2017-11-05T17:22:15.979415Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-05T17:22:15.981265Z 0 [Note] mysqld (mysqld 5.7.19-17-57-log) starting as process 1325 ...
2017-11-05T17:22:15.983852Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem.
2017-11-05T17:22:15.984031Z 0 [Note] WSREP: Setting wsrep_ready to false
2017-11-05T17:22:15.984043Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2017-11-05T17:22:15.984047Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-11-05T17:22:16.013201Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <[email protected]> loaded successfully.
2017-11-05T17:22:16.013252Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-11-05T17:22:16.015682Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 0
2017-11-05T17:22:16.019709Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_Host = 172.31.13.81; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-11-05T17:22:16.034431Z 0 [Note] WSREP: GCache history reset: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 -> 00000000-0000-0000-0000-000000000000:-1
2017-11-05T17:22:16.035026Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-11-05T17:22:16.035038Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-11-05T17:22:16.035041Z 0 [Note] WSREP: Starting replication
2017-11-05T17:22:16.035048Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-11-05T17:22:16.035159Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-11-05T17:22:16.035214Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-11-05T17:22:16.036228Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:16.036250Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:16.036685Z 0 [Note] WSREP: GMCast version 0
2017-11-05T17:22:16.036828Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-11-05T17:22:16.036836Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-11-05T17:22:16.037997Z 0 [Note] WSREP: EVS version 0
2017-11-05T17:22:16.038072Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:'
2017-11-05T17:22:16.039496Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567
2017-11-05T17:22:16.039513Z 0 [Warning] WSREP: (de905d70, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting
2017-11-05T17:22:16.039960Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567
2017-11-05T17:22:16.040014Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-11-05T17:22:16.540329Z 0 [Note] WSREP: declaring d73a400b at tcp://172.31.5.172:4567 stable
2017-11-05T17:22:16.540836Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:16.540862Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,2)
memb {
        d73a400b,0
        de905d70,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2017-11-05T17:22:19.039641Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection to peer de905d70 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:19.539775Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-11-05T17:22:37.061048Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.31.5.172:4567
2017-11-05T17:22:38.061306Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') reconnecting to d73a400b (tcp://172.31.5.172:4567), attempt 0
2017-11-05T17:22:41.562259Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspecting node: d73a400b
2017-11-05T17:22:41.562325Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspected node without join message, declaring inactive
2017-11-05T17:22:42.562445Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,2)
memb {
        de905d70,0
        }
joined {
        }
left {
        }
partitioned {
        d73a400b,0
        }
)
2017-11-05T17:22:42.562525Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:42.562550Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,de905d70,3)
memb {
        de905d70,0
        }
joined {
        }
left {
        }
partitioned {
        d73a400b,0
        }
)
2017-11-05T17:22:47.071058Z 0 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2017-11-05T17:22:47.071334Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2017-11-05T17:22:47.071356Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-11-05T17:22:47.071412Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out)
2017-11-05T17:22:47.071421Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-11-05T17:22:47.071426Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7)
2017-11-05T17:22:47.071430Z 0 [ERROR] Aborting

2017-11-05T17:22:47.071433Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-11-05T17:22:47.071437Z 0 [Note] WSREP: Waiting for active wsrep applier to exit
2017-11-05T17:22:47.071440Z 0 [Note] WSREP: Service disconnected.
2017-11-05T17:22:47.071443Z 0 [Note] WSREP: Waiting to close threads......
2017-11-05T17:22:52.072202Z 0 [Note] WSREP: Some threads may fail to exit.
2017-11-05T17:22:52.072285Z 0 [Note] Binlog end
2017-11-05T17:22:52.072905Z 0 [Note] mysqld: Shutdown complete

2017-11-05T17:22:52.322273Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-05T17:22:52.325253Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem.
2017-11-05T17:22:52.344397Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:52.346176Z 0 [Warning] WSREP: (f4348c81, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting
2017-11-05T17:22:52.847728Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:23:22.856010Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2017-11-05T17:23:22.856051Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-11-05T17:23:22.856128Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out)
2017-11-05T17:23:22.856140Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-11-05T17:23:22.856147Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7)
2017-11-05T17:23:22.856151Z 0 [ERROR] Aborting
1
Bhuvanesh

Oui, Rick a raison. Parce que tous les nœuds sont en panne, vous devez bootstrap cluster à nouveau.

Vous avez arrêté tous les nœuds gracieusement.

Si vous savez quel est le dernier (noeud) arrêté, choisissez ce nœud aussi bootstrap puis démarrez un autre nœud.

Mais si vous ne vous souvenez pas que vous ne vous souvenez pas du dernier nœud que vous avez éteint, ouvrez grastate.dat (localiser dans le répertoire mysql) à partir de tous les nœuds et comparez SEQNO Number Number Inside and Choice A Nice qui a un numéro plus grand et commencez par lancer ce nœud. Si vous choisissez un mauvais nœud pour Bootstrap, un autre nœud doit supprimer toutes les données et demander à SST à nouveau. donc sois prudent

2
HamoonDBA