[jira] [Commented] (ZOOKEEPER-1277) servers stop serving when lower 32bits of zxid roll over

Discussion:

Dave Latham (JIRA)

2013-04-16 21:49:19 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633423#comment-13633423 ]

Dave Latham commented on ZOOKEEPER-1277:
----------------------------------------

Excuse me, we were running 3.4.3, not 3.4.4

servers stop serving when lower 32bits of zxid roll over
--------------------------------------------------------
Key: ZOOKEEPER-1277
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1277
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.3
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
Fix For: 3.3.5, 3.4.4, 3.5.0
Attachments: ZOOKEEPER-1277_br33.patch, ZOOKEEPER-1277_br33.patch, ZOOKEEPER-1277_br33.patch, ZOOKEEPER-1277_br33.patch, ZOOKEEPER-1277_br34.patch, ZOOKEEPER-1277_br34.patch, ZOOKEEPER-1277_trunk.patch, ZOOKEEPER-1277_trunk.patch
When the lower 32 bits of a zxid "roll over" (zxid is a 64-bit number, but the upper 32 bits are treated as the epoch number), the epoch number (upper 32 bits) is incremented and the lower 32 bits start at 0 again.
This should work fine; however, in the current 3.3 branch the followers see this as a NEWLEADER message, which it's not, and effectively stop serving clients. Attached clients seem to eventually time out, given that heartbeats (or any other operations) are no longer processed. The follower doesn't recover from this.
I've tested this on the 3.3 branch and confirmed the problem; however, I haven't tried it on 3.4/3.5. It may not happen on the newer branches due to ZOOKEEPER-335; however, there is certainly an issue with updating the "acceptedEpoch" files contained in the datadir. (I'll enter a separate JIRA for that.)
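
The zxid layout described above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic; the helper names (`epoch_of`, `counter_of`, `make_zxid`) are hypothetical and are not part of ZooKeeper's API.

```python
# Illustrative sketch only: helper names are hypothetical, not ZooKeeper's API.
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # 0xffffffff


def epoch_of(zxid: int) -> int:
    """Return the upper 32 bits of a 64-bit zxid (the epoch)."""
    return (zxid >> COUNTER_BITS) & COUNTER_MASK


def counter_of(zxid: int) -> int:
    """Return the lower 32 bits of a 64-bit zxid (the per-epoch counter)."""
    return zxid & COUNTER_MASK


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter into a single 64-bit zxid."""
    return ((epoch & COUNTER_MASK) << COUNTER_BITS) | (counter & COUNTER_MASK)


# The last zxid of an (assumed) epoch 0x19: the counter is at its maximum.
last = make_zxid(0x19, COUNTER_MASK)

# Adding 1 carries out of the lower 32 bits into the upper 32 bits,
# so the epoch appears to advance and the counter restarts at 0.
rolled = last + 1
print(hex(last), "->", hex(rolled))
print("epoch:", hex(epoch_of(rolled)), "counter:", counter_of(rolled))
```

The carry into the upper half is what makes a plain increment look like an epoch change to any code that reads the upper 32 bits as the epoch.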

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Dave Latham (JIRA)

2013-04-16 21:49:17 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633420#comment-13633420 ]

Dave Latham commented on ZOOKEEPER-1277:
----------------------------------------

We recently experienced an HBase outage that I believe was caused by this issue. Running on ZK 3.4.4, the log for the leader shows this:

{noformat}
2013-04-12 17:46:25,894 INFO org.apache.zookeeper.server.quorum.Leader: Have quorum of supporters; starting up and setting last processed zxid: 0x1a00000004
2013-04-12 17:46:25,895 WARN org.apache.zookeeper.server.FinalRequestProcessor: Zxid outstanding 111669149696 is less than current 111669149697
2013-04-12 17:46:25,895 WARN org.apache.zookeeper.server.quorum.LearnerHandler: ******* GOODBYE /10.0.1.100:34796 ********
2013-04-12 17:46:25,896 ERROR org.apache.zookeeper.server.NIOServerCnxnFactory: Thread LearnerHandler Socket[addr=/10.0.1.100,port=34796,localport=2888] tickOfLastAck:897811 synced?:true queuedPacketLength:0 died
java.lang.IllegalThreadStateException
at java.lang.Thread.start(Thread.java:638)
at org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.startSessionTracker(LeaderZooKeeperServer.java:87)
at org.apache.zookeeper.server.ZooKeeperServer.startup(ZooKeeperServer.java:394)
at org.apache.zookeeper.server.quorum.Leader.processAck(Leader.java:531)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:497)
{noformat}

Immediately after this, one of the followers went through a new election and became a follower again. Also, the heap on the leader immediately climbed until the process became stuck spending most of its time in GC. At this point HBase region servers started dropping like flies, and then the ZK node was killed.

I'm adding this comment now for two purposes. First, so that if other people see the same symptoms in their logs they may find this issue faster. Second, I'd love to hear from anyone more familiar with ZooKeeper whether this issue does indeed explain the observations described above.
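
As a quick check on the log above, the decimal zxids in the FinalRequestProcessor warning can be decoded into epoch and counter. The sketch below assumes only the 32/32 split given in the issue description; the `split` helper and the dictionary labels are made up for illustration.

```python
# Decode the zxids that appear in the leader log above.
# Assumes the 32-bit epoch / 32-bit counter layout from the issue description;
# `split` is a hypothetical helper, not a ZooKeeper function.
LOG_ZXIDS = {
    "last processed zxid": 0x1a00000004,
    "zxid outstanding": 111669149696,
    "current": 111669149697,
}


def split(zxid: int) -> tuple[int, int]:
    """Return (epoch, counter) for a 64-bit zxid."""
    return divmod(zxid, 1 << 32)


for label, zxid in LOG_ZXIDS.items():
    epoch, counter = split(zxid)
    print(f"{label}: {zxid:#x} -> epoch {epoch:#x}, counter {counter}")
```

All three values decode to epoch 0x1a with counters of 4, 0 and 1, so the leader was at the very start of an epoch when the warning fired. That is consistent with a counter that had just restarted, although it does not by itself prove a rollover, since any new epoch also starts with a low counter.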

Patrick Hunt (JIRA)

2013-04-16 23:43:16 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633563#comment-13633563 ]

Patrick Hunt commented on ZOOKEEPER-1277:
-----------------------------------------

Hi [~davelatham], it seems unlikely to me. Are you only running HBase against ZK? In that case the number of changes to ZK is going to be far smaller than 4 billion (the amount necessary to roll over the lower 32 bits). I've only seen the rollover case with tens of thousands of clients doing large numbers of operations per second. HBase just doesn't drive that much traffic; it mainly uses ZK for failover and table management.

You might have hit an issue with 3.4 that was fixed in a subsequent release. However the symptoms you mentioned don't ring a bell either....

Dave Latham (JIRA)

2013-04-16 23:47:15 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633565#comment-13633565 ]

Dave Latham commented on ZOOKEEPER-1277:
----------------------------------------

Thanks for the response, [~phunt]. It is only HBase, but there are 1000 region servers and we are using replication, which puts a much greater load on ZK. Taking a recent sample, I see the zxid going up by thousands per second.

Patrick Hunt (JIRA)

2013-04-16 23:59:16 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633580#comment-13633580 ]

Patrick Hunt commented on ZOOKEEPER-1277:
-----------------------------------------

[~davelatham] this could be it then. Thousands of zxids per second means roughly a month before rollover.
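
The estimate can be reproduced with simple arithmetic: the lower 32 bits allow 2^32 (about 4.29 billion) zxids per epoch. The sketch below assumes a steady rate and a counter that starts at 0; the function name and the sample rates are illustrative, not measurements from this cluster.

```python
# Rough time-to-rollover estimate for the lower 32 bits of the zxid.
# Assumes a constant zxid rate and a counter starting at 0 (both simplifications).
SECONDS_PER_DAY = 86_400
COUNTER_SPACE = 1 << 32  # number of distinct values in the lower 32 bits


def days_until_rollover(zxids_per_second: float) -> float:
    """Days needed to consume the whole 32-bit counter at a steady rate."""
    return COUNTER_SPACE / zxids_per_second / SECONDS_PER_DAY


# Sample rates are illustrative only.
for rate in (1_000, 2_000, 5_000):
    print(f"{rate} zxids/s -> {days_until_rollover(rate):.1f} days")
```

At 1,000 zxids per second the counter lasts a little under 50 days, and at 2,000 per second about 25 days, which matches the "roughly a month" figure for rates in the low thousands.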

Lu Xuehui (JIRA)

2014-02-28 10:21:20 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915635#comment-13915635 ]

Lu Xuehui commented on ZOOKEEPER-1277:
--------------------------------------

When the zxid rolls over, epoch++; when a new leader arises, epoch += 2. Could this approach avoid throwing the exception?

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Alexandr Orlov (JIRA)

2015-12-07 12:43:10 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044853#comment-15044853 ]

Alexandr Orlov commented on ZOOKEEPER-1277:
-------------------------------------------

Hi! We still have a problem with versions 3.4.6 and 3.5.1-alpha:
{noformat}
2015-12-03 11:02:48,073 - ERROR [ProcessThread(sid:5 cport:-1)::***@139] - Unexpected exception
org.apache.zookeeper.server.RequestProcessor$RequestProcessorException: zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start
at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.processRequest(ProposalRequestProcessor.java:80)
at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:673)
at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:131)
Caused by: org.apache.zookeeper.server.quorum.Leader$XidRolloverException: zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start
at org.apache.zookeeper.server.quorum.Leader.propose(Leader.java:746)
at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.processRequest(ProposalRequestProcessor.java:78)
... 2 more
2015-12-03 11:02:48,073 - WARN [LearnerHandler-/2a02:6b8:0:1602:37a6:f71a:79c1:e5f3:48040:***@658] - Ignoring unexpected exception
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
2015-12-03 11:02:49,766 - INFO [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2183:***@493] - Shutting down
2015-12-03 11:02:49,766 - INFO [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2183:***@714] - LOOKING
2015-12-03 11:02:49,766 - DEBUG [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2183:***@645] - Initializing leader election protocol...
{noformat}

According to https://zookeeper.apache.org/doc/r3.5.1-alpha/releasenotes.html this problem should be resolved, but it isn't.

Flavio Junqueira (JIRA)

2015-12-07 13:57:11 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044976#comment-15044976 ]

Flavio Junqueira commented on ZOOKEEPER-1277:
---------------------------------------------

[~frenzzz] The log messages you posted say that it is triggering a new election, which starts a new epoch and consequently resets the zxid. What's the problem you're observing more precisely?

Alexandr Orlov (JIRA)

2015-12-07 18:05:11 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045368#comment-15045368 ]

Alexandr Orlov commented on ZOOKEEPER-1277:
-------------------------------------------

I mean that when the zxid rollover occurs and a new leader election is triggered, ZooKeeper stops serving. Leader activation in our environment takes about 30 seconds, and the zxid rollover happens about twice per week, which is not great. It would be great, if possible, to find a solution that avoids the re-election.
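
For a sense of scale, the figures in this comment (two rollovers per week, about 30 seconds of leader activation each) can be turned into an implied zxid rate and a weekly unavailability estimate. This is a back-of-the-envelope sketch that assumes a steady rate and evenly spaced rollovers; the variable names are illustrative.

```python
# Back-of-the-envelope figures derived from the comment above.
# Assumes evenly spaced rollovers and a steady zxid rate (simplifications).
SECONDS_PER_WEEK = 7 * 86_400
COUNTER_SPACE = 1 << 32

rollovers_per_week = 2          # "about two times per week"
election_seconds = 30           # "about 30sec" of leader activation

# Rate needed to consume 2**32 zxids in half a week.
implied_rate = COUNTER_SPACE * rollovers_per_week / SECONDS_PER_WEEK

# Time per week spent in rollover-triggered elections.
downtime_seconds = rollovers_per_week * election_seconds
downtime_fraction = downtime_seconds / SECONDS_PER_WEEK

print(f"implied rate: {implied_rate:,.0f} zxids/s")
print(f"downtime: {downtime_seconds} s/week ({downtime_fraction:.4%})")
```

Under these assumptions the cluster would be generating on the order of 14,000 zxids per second and losing about a minute of service per week to rollover-triggered elections.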

Benedict Jin (JIRA)

2017-05-17 10:08:04 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013819#comment-16013819 ]

Benedict Jin commented on ZOOKEEPER-1277:
-----------------------------------------

@Patrick Hunt Hi, Patrick Hunt. Suppose ZK hits the `32bits` overflow and forces a leader re-election, and at the same time my `keep alive` shell script runs the command `zkServer.sh start` from outside. Could that cause a problem?

Patrick Hunt (JIRA)

2017-05-24 02:27:05 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022221#comment-16022221 ]

Patrick Hunt commented on ZOOKEEPER-1277:
-----------------------------------------

[~benedict jin] Not sure I follow that question. I believe it should be ok to add a new server during a re-election, even if that election were triggered by an epoch overflow. I've never tried that, however.

Benedict Jin (JIRA)

2017-05-24 02:31:04 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022227#comment-16022227 ]

Benedict Jin commented on ZOOKEEPER-1277:
-----------------------------------------

I see. Thank you! :D

Benedict Jin (JIRA)

2017-05-24 02:35:05 UTC

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022229#comment-16022229 ]

Benedict Jin commented on ZOOKEEPER-1277:
-----------------------------------------

I created a new JIRA, ZOOKEEPER-2789, to discuss reassigning `ZXID` to solve the 32-bit overflow problem. Could you please offer some advice on it?
