Kafka Troubleshooting Notes
Something Wrong With One Broker in a Cluster
Consumer connected to a broker but not receiving messages. Looking at various topics, broker 0 is marked as an out of sync replica, e.g.:
./kafka-topics.sh --describe my-topic --zookeeper myzk:2181/kafka
Topic: my-topic Partition: 6 Leader: 4 Replicas: 0,4,1 Isr: 4,1
Topic: __consumer_offsets Partition: 1 Leader: 2 Replicas: 0,1,2 Isr: 2,1
Notice ‘0’ is missing from the
Isr list above.
Also, I tried publishing a test message manually to the topic using
broker-0 and it didn’t seem to arrive. Lag did not increase, whereas lag does increase if I publish directly to
In this case, the problem was caused by some internal problem in the broker which caused it to drop out of the broker cluster. Restarting the broker resolved the immediate problem. However, the (Pykafka) consumers repeatedly attempted to reset to offset
-1 on startup. To resolve this, the consumers can be force reset to a specific offset (assuming there is a record somewhere of the latest offset that has actually been consumed). There is a simple script here to reset offsets.