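Site expansion stalls during Cassandra rebuild
Environment
Issue
The site expansion stalled because the expansion process selected a site that is not a suitable source for the Cassandra nodetool rebuild.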
Data from the Cassandra system.log:
INFO [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,752 RangeStreamer.java (line 127) Rebuild: range (7543867250329265734,7544102375703298946] exists on /<IP_3H> for keyspace accounts
INFO [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,753 RangeStreamer.java (line 127) Rebuild: range (7543867250329265734,7544102375703298946] exists on /<IP_3I> for keyspace accounts
WARN [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,753 StorageService.java (line 1503) Parameter error while rebuilding node
java.lang.IllegalStateException: Unable to find sufficient sources for streaming range (-6636921090683170249,-6636701783084431689] in keyspace accounts
at org.apache.cassandra.dht.RangeStreamer.handleSourceNotFound(RangeStreamer.java:306)
at org.apache.cassandra.dht.RangeStreamer.getRangeFetchMap(RangeStreamer.java:285)
at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:129)
at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1429)
at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1343)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
INFO [RMI TCP Connection(22716)-127.0.0.1] 2024-03-07 05:30:20,425 StorageService.java (line 1402) starting rebuild for (All keyspaces), (All tokens), RESET_NO_SNAPSHOT, included DCs: group20
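The affected token ranges and keyspaces can be pulled out of a saved copy of the log with a short script. The following is a minimal sketch, assuming the exception message has exactly the format shown in the excerpt above; the function name `find_rebuild_failures` is hypothetical and is not part of Cassandra or any vendor tool.

```python
import re

# Matches the exception message seen in the log excerpt above.
# Assumption: the message format is the same across the whole log.
FAILURE_RE = re.compile(
    r"Unable to find sufficient sources for streaming range "
    r"\((-?\d+),(-?\d+)\] in keyspace (\S+)"
)


def find_rebuild_failures(log_text):
    """Return (start_token, end_token, keyspace) for each failed range."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            start, end, keyspace = match.groups()
            failures.append((int(start), int(end), keyspace))
    return failures
```

Passing the contents of the log as a string returns one tuple per failed range, which makes it easier to see whether the failures are confined to a single keyspace.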
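When the nodetool status command is run from the expansion site (Site 4 / group40), the Site 2 (group20) nodes are reported as DS (Down/Stopped), even though the Site 2 (group20) nodes are running and are reachable from the other sites.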
Datacenter: group10
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
-- Address Load Tokens Owns (effective) Host ID Rack
UN <IP_1A> 1.77 TiB 256 50.9% <UUID_1A> unknown
UN <IP_1B> 1.52 TiB 256 49.0% <UUID_1B> unknown
UN <IP_1C> 1.48 TiB 256 49.2% <UUID_1C> unknown
UN <IP_1D> 1.68 TiB 256 49.5% <UUID_1D> unknown
UN <IP_1E> 1.77 TiB 256 51.4% <UUID_1E> unknown
UN <IP_1F> 1.56 TiB 256 49.9% <UUID_1F> unknown
Datacenter: group20
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
-- Address Load Tokens Owns (effective) Host ID Rack
DS <IP_2A> 3.06 TiB 256 100.0% <UUID_2A> unknown
DS <IP_2B> 3.13 TiB 256 100.0% <UUID_2B> unknown
DS <IP_2C> 2.99 TiB 256 100.0% <UUID_2C> unknown
Datacenter: group30
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
-- Address Load Tokens Owns (effective) Host ID Rack
UN <IP_3A> 1.18 TiB 256 33.4% <UUID_3A> unknown
UN <IP_3B> 1.03 TiB 256 33.6% <UUID_3B> unknown
UN <IP_3C> 1.1 TiB 256 34.8% <UUID_3C> unknown
UN <IP_3D> 1.03 TiB 256 31.4% <UUID_3D> unknown
UN <IP_3E> 1.02 TiB 256 34.1% <UUID_3E> unknown
UN <IP_3F> 964.94 GiB 256 32.1% <UUID_3F> unknown
UN <IP_3G> 1.02 TiB 256 34.7% <UUID_3G> unknown
UN <IP_3H> 969.99 GiB 256 31.6% <UUID_3H> unknown
UN <IP_3I> 1.1 TiB 256 34.3% <UUID_3I> unknown
Datacenter: group40
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
-- Address Load Tokens Owns (effective) Host ID Rack
UN <IP_4A> 10.35 GiB 256 37.1% <UUID_4A> unknown
UN <IP_4B> 7.62 GiB 256 35.7% <UUID_4B> unknown
UN <IP_4C> 4.83 GiB 256 39.9% <UUID_4C> unknown
UN <IP_4D> 11.75 GiB 256 40.1% <UUID_4D> unknown
UN <IP_4E> 10.69 GiB 256 38.8% <UUID_4E> unknown
UN <IP_4F> 6.12 GiB 256 35.2% <UUID_4F> unknown
UN <IP_4G> 8.06 GiB 256 37.0% <UUID_4G> unknown
UN <IP_4H> 14.86 GiB 256 36.0% <UUID_4H> unknown
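To spot datacenters whose nodes are not reported as UN (Up/Normal), output like the above can be scanned programmatically. This is a minimal sketch, assuming the text layout shown above (a `Datacenter:` line followed by node rows that start with a two-letter status/state code); the function name `nodes_not_up_normal` is hypothetical.

```python
def nodes_not_up_normal(status_text):
    """Map each datacenter name to the addresses of nodes not in state UN."""
    problems = {}
    datacenter = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
            continue
        fields = line.split()
        # Node rows start with a two-letter code: status U/D, then state N/L/J/M/S.
        is_node_row = (
            datacenter is not None
            and len(fields) >= 2
            and len(fields[0]) == 2
            and fields[0][0] in "UD"
            and fields[0][1] in "NLJMS"
        )
        if is_node_row and fields[0] != "UN":
            problems.setdefault(datacenter, []).append(fields[1])
    return problems
```

Applied to the output above, the result would contain only group20 with its three node addresses, matching the DS rows in the listing.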