# librdkafka v1.7.0

librdkafka v1.7.0 is a feature release:

 * [KIP-360](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820) - Improve reliability of transactional producer.
   Requires Apache Kafka 2.5 or later.
 * OpenSSL Engine support (`ssl.engine.location`) by @adinigam and @ajbarb.


## Enhancements

 * Added `connections.max.idle.ms` to automatically close idle broker
   connections.
   This feature is disabled by default unless `bootstrap.servers` contains
   the string `azure` in which case the default is set to <4 minutes to improve
   connection reliability and circumvent limitations with the Azure load
   balancers (see #3109 for more information).
 * Bumped to OpenSSL 1.1.1k in binary librdkafka artifacts.
 * The binary librdkafka artifacts for Alpine are now using Alpine 3.12 and
   OpenSSL 1.1.1k.
 * Improved static librdkafka Windows builds using MinGW (@neptoess, #3130).


## Upgrade considerations

 * The C++ `oauthbearer_token_refresh_cb()` was missing a `Handle *`
   argument that has now been added. This is a breaking change but the original
   function signature is considered a bug.
   This change only affects C++ OAuth developers.
 * [KIP-735](https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout) The consumer `session.timeout.ms`
   default was changed from 10 to 45 seconds to make consumer groups more
   robust and less sensitive to temporary network and cluster issues.
 * Statistics: `consumer_lag` is now using the `committed_offset`,
   while the new `consumer_lag_stored` is using `stored_offset`
   (offset to be committed).
   This is more correct than the previous `consumer_lag` which was using
   either `committed_offset` or `app_offset` (last message passed
   to application).
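
To make the `consumer_lag` change above concrete, here is a minimal plain-Python sketch that computes both values for a single partition. It assumes lag is measured against the partition high watermark, called `hi_offset` here; that field name and the sample numbers are illustrative assumptions and are not taken from this changelog.

```python
# Illustrative only: per-partition offsets as they might appear in a
# statistics payload. `hi_offset` and all numbers are assumptions.
partition_stats = {
    "hi_offset": 1000,        # assumed: high watermark on the broker
    "committed_offset": 940,  # last offset committed for the group
    "stored_offset": 975,     # offset stored locally, to be committed next
}


def consumer_lag(stats):
    """Lag relative to what has actually been committed."""
    return stats["hi_offset"] - stats["committed_offset"]


def consumer_lag_stored(stats):
    """Lag relative to the locally stored (not yet committed) offset."""
    return stats["hi_offset"] - stats["stored_offset"]


print(consumer_lag(partition_stats))         # 60
print(consumer_lag_stored(partition_stats))  # 25
```

With auto commit enabled the two values typically differ by whatever has been processed since the last commit, so `consumer_lag_stored` is the closer proxy for application progress.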


## Fixes

### General fixes

 * Fix accesses to freed metadata cache mutexes on client termination (#3279)
 * There was a race condition on receiving updated metadata where a broker id
   update (such as bootstrap to proper broker transformation) could finish after
   the topic metadata cache was updated, leading to existing brokers seemingly
   being not available.
   One occurrence of this issue was query_watermark_offsets() that could return
   `ERR__UNKNOWN_PARTITION` for existing partitions shortly after the
   client instance was created.
 * The OpenSSL context is now initialized with `TLS_client_method()`
   (on OpenSSL >= 1.1.0) instead of the deprecated and outdated
   `SSLv23_client_method()`.
 * The initial cluster connection on client instance creation could sometimes
   be delayed up to 1 second if a `group.id` or `transactional.id`
   was configured (#3305).
 * Speed up triggering of new broker connections in certain cases by exiting
   the broker thread io/op poll loop when a wakeup op is received.
 * SASL GSSAPI: The Kerberos kinit refresh command was triggered from
   `rd_kafka_new()` which made this call blocking if the refresh command
   was taking long. The refresh is now performed by the background rdkafka
   main thread.
 * Fix busy-loop (100% CPU on the broker threads) during the handshake phase
   of an SSL connection.
 * Disconnects during SSL handshake are now propagated as transport errors
   rather than SSL errors, since these disconnects are at the transport level
   (e.g., incorrect listener, flaky load balancer, etc) and not due to SSL
   issues.
 * Increment metadata fast refresh interval backoff exponentially (@ajbarb, #3237).
 * Unthrottled requests are no longer counted in the `brokers[].throttle`
   statistics object.
 * Log CONFWARN warning when global topic configuration properties
   are overwritten by explicitly setting a `default_topic_conf`.
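
As an illustration of the exponential backoff mentioned for the metadata fast refresh interval, the following plain-Python sketch doubles a wait time up to a cap. It shows the generic technique only: the function name, the 250 ms base and the 10 second cap are hypothetical values chosen for the example and do not describe librdkafka's actual schedule.

```python
def fast_refresh_intervals(base_ms, cap_ms, attempts):
    """Return the wait before each refresh attempt, doubling up to cap_ms."""
    intervals = []
    wait = base_ms
    for _ in range(attempts):
        intervals.append(min(wait, cap_ms))  # never wait longer than the cap
        wait *= 2                            # exponential growth per attempt
    return intervals


# Hypothetical base and cap, for illustration only.
print(fast_refresh_intervals(base_ms=250, cap_ms=10_000, attempts=7))
# [250, 500, 1000, 2000, 4000, 8000, 10000]
```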

### Consumer fixes

 * If a rebalance happened during a `consume_batch..()` call the already
   accumulated messages for revoked partitions were not purged, which would
   pass messages to the application for partitions that were no longer owned
   by the consumer. Fixed by @jliunyu. #3340.
 * Fix balancing and reassignment issues with the cooperative-sticky assignor.
   #3306.
 * Fix incorrect detection of first rebalance in sticky assignor (@hallfox).
 * Aborted transactions with no messages produced to a partition could
   cause further successfully committed messages in the same Fetch response to
   be ignored, resulting in consumer-side message loss.
   A log message along the lines of `Abort txn ctrl msg bad order at offset
   7501: expected before or at 7702: messages in aborted transactions may be delivered to the application`
   would be seen.
   This is a rare occurrence where a transactional producer would register with
   the partition but not produce any messages before aborting the transaction.
 * The consumer group deemed cached metadata up to date by checking
   `topic.metadata.refresh.interval.ms`: if this property was set too low
   it would cause cached metadata to be unusable and new metadata to be fetched,
   which could delay the time it took for a rebalance to settle.
   It now correctly uses `metadata.max.age.ms` instead.
 * The consumer group timed auto commit would attempt commits during rebalances,
   which could result in "Illegal generation" errors. This is now fixed: the
   timed auto committer is only employed in the steady state when no rebalances
   are taking place. Offsets are still auto committed when partitions are
   revoked.
 * Retriable FindCoordinatorRequest errors are no longer propagated to
   the application as they are retried automatically.
 * Fix rare crash (assert `rktp_started`) on consumer termination
   (introduced in v1.6.0).
 * Fix unaligned access and possibly corrupted snappy decompression when
   building with MSVC (@azat)
 * A consumer configured with the `cooperative-sticky` assignor did
   not actively Leave the group on unsubscribe(). This delayed the
   rebalance for the remaining group members by up to `session.timeout.ms`.
 * The current subscription list was sometimes leaked when unsubscribing.

### Producer fixes

 * The timeout value of `flush()` was not respected when delivery reports
   were scheduled as events (such as for confluent-kafka-go) rather than
   callbacks.
 * There was a race condition in `purge()` which could cause newly
   created partition objects, or partitions that were changing leaders, to
   not have their message queues purged. This could cause
   `abort_transaction()` to time out. This issue is now fixed.
 * In certain high-throughput produce rate patterns producing could stall for
   1 second, regardless of `linger.ms`, due to rate-limiting of internal
   queue wakeups. This is now fixed by not rate-limiting queue wakeups but
   instead limiting them to one wakeup per queue reader poll. #2912.

### Transactional Producer fixes

 * KIP-360: Fatal Idempotent producer errors are now recoverable by the
   transactional producer and will raise a `txn_requires_abort()` error.
 * If the cluster went down between `produce()` and `commit_transaction()`
   and before any partitions had been registered with the coordinator, the
   messages would time out but the commit would succeed because nothing
   had been sent to the coordinator. This is now fixed.
 * If the current transaction failed while `commit_transaction()` was
   checking the current transaction state an invalid state transition could
   occur which in turn would trigger an assertion crash.
   This issue showed up as "Invalid txn state transition: .." crashes, and is
   now fixed by properly synchronizing both checking and transition of state.



# librdkafka v1.6.1

librdkafka v1.6.1 is a maintenance release.

## Upgrade considerations

 * Fatal idempotent producer errors are now also fatal to the transactional
   producer. This is a necessary step to maintain data integrity prior to
   librdkafka supporting KIP-360. Applications should check any transactional
   API errors for the is_fatal flag and decommission the transactional producer
   if the flag is set.
 * The consumer error raised by `auto.offset.reset=error` now has error-code
   set to `ERR__AUTO_OFFSET_RESET` to allow an application to differentiate
   between auto offset resets and other consumer errors.


## Fixes

### General fixes

 * Admin API and transactional `send_offsets_to_transaction()` coordinator
   requests, such as TxnOffsetCommitRequest, could in rare cases be sent
   multiple times which could cause a crash.
 * `ssl.ca.location=probe` is now enabled by default on Mac OSX since the
   librdkafka-bundled OpenSSL might not have the same default CA search paths
   as the system or brew installed OpenSSL. Probing scans all known locations.

### Transactional Producer fixes

 * Fatal idempotent producer errors are now also fatal to the transactional
   producer.
 * The transactional producer could crash if the transaction failed while
   `send_offsets_to_transaction()` was called.
 * Group coordinator requests for transactional
   `send_offsets_to_transaction()` calls would leak memory if the
   underlying request was attempted to be sent after the transaction had
   failed.
 * When gradually producing to multiple partitions (resulting in multiple
   underlying AddPartitionsToTxnRequests) subsequent partitions could get
   stuck in pending state under certain conditions. These pending partitions
   would not send queued messages to the broker and eventually trigger
   message timeouts, failing the current transaction. This is now fixed.
 * Committing an empty transaction (no messages were produced and no
   offsets were sent) would previously raise a fatal error due to invalid state
   on the transaction coordinator. We now allow empty/no-op transactions to
   be committed.

### Consumer fixes

 * The consumer will now retry indefinitely (or until the assignment is changed)
   to retrieve committed offsets. This fixes the issue where only two retries
   were attempted when outstanding transactions were blocking OffsetFetch
   requests with `ERR_UNSTABLE_OFFSET_COMMIT`. #3265



# librdkafka v1.6.0

librdkafka v1.6.0 is a feature release:

 * [KIP-429 Incremental rebalancing](https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol) with sticky
   consumer group partition assignor (KIP-54) (by @mhowlett).
 * [KIP-480 Sticky producer partitioning](https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner) (`sticky.partitioning.linger.ms`) -
   achieves higher throughput and lower latency through sticky selection
   of random partition (by @abbycriswell).
 * AdminAPI: Add support for `DeleteRecords()`, `DeleteGroups()` and
   `DeleteConsumerGroupOffsets()` (by @gridaphobe)
 * [KIP-447 Producer scalability for exactly once semantics](https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics) -
   allows a single transactional producer to be used for multiple input
   partitions. Requires Apache Kafka 2.5 or later.
 * Transactional producer fixes and improvements, see **Transactional Producer fixes** below.
 * The [librdkafka.redist](https://www.nuget.org/packages/librdkafka.redist/)
   NuGet package now supports Linux ARM64/Aarch64.
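
The idea behind sticky producer partitioning (KIP-480) can be sketched in a few lines of plain Python: pick a random partition and keep using it until the sticky time has elapsed. This is a simplified model and not librdkafka's implementation; the `StickyPartitioner` class, its parameters and the explicit `now_ms` clock are hypothetical names introduced for the example.

```python
import random


class StickyPartitioner:
    """Pick a random partition and reuse it until linger_ms has elapsed."""

    def __init__(self, partition_count, linger_ms, rng=None):
        self.partition_count = partition_count
        self.linger_ms = linger_ms
        self.rng = rng or random.Random()
        self.current = None       # currently selected sticky partition
        self.chosen_at_ms = None  # time at which it was selected

    def partition(self, now_ms):
        expired = (
            self.current is None
            or now_ms - self.chosen_at_ms >= self.linger_ms
        )
        if expired:
            # Simplification: the new pick may equal the previous partition.
            self.current = self.rng.randrange(self.partition_count)
            self.chosen_at_ms = now_ms
        return self.current


p = StickyPartitioner(partition_count=6, linger_ms=10, rng=random.Random(1))
first = [p.partition(now_ms=t) for t in (0, 3, 9)]  # same sticky window
later = p.partition(now_ms=10)                      # window expired, re-pick
print(first, later)
```

All messages arriving within one sticky window land on the same partition, which lets them share a batch instead of being spread thinly across every partition.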


## Upgrade considerations

 * Sticky producer partitioning (`sticky.partitioning.linger.ms`) is
   enabled by default (10 milliseconds). This affects the distribution of
   randomly partitioned messages: where previously these messages would be
   evenly distributed over the available partitions, they are now partitioned
   to a single partition for the duration of the sticky time
   (10 milliseconds by default) before a new random sticky partition
   is selected.
 * The new KIP-447 transactional producer scalability guarantees are only
   supported on Apache Kafka 2.5 or later; on earlier releases you will
   need to use one producer per input partition for EOS. This limitation
   is not enforced by the producer or broker.
 * Error handling for the transactional producer has been improved, see
   the **Transactional Producer fixes** below for more information.


## Known issues

 * The Transactional Producer's API timeout handling is inconsistent with the
   underlying protocol requests. It is therefore strongly recommended that
   applications call `rd_kafka_commit_transaction()` and
   `rd_kafka_abort_transaction()` with the `timeout_ms` parameter
   set to `-1`, which will use the remaining transaction timeout.


## Enhancements

 * KIP-107, KIP-204: AdminAPI: Added `DeleteRecords()` (by @gridaphobe).
 * KIP-229: AdminAPI: Added `DeleteGroups()` (by @gridaphobe).
 * KIP-496: AdminAPI: Added `DeleteConsumerGroupOffsets()`.
 * KIP-464: AdminAPI: Added support for broker-side default partition count
   and replication factor for `CreateTopics()`.
 * Windows: Added `ssl.ca.certificate.stores` to specify a list of
   Windows Certificate Stores to read CA certificates from, e.g.,
   `CA,Root`. `Root` remains the default store.
 * Use reentrant `rand_r()` on supporting platforms which decreases lock
   contention (@azat).
 * Added `assignor` debug context for troubleshooting consumer partition
   assignments.
 * Updated to OpenSSL v1.1.1i when building dependencies.
 * Update bundled lz4 (used when `./configure --disable-lz4-ext`) to v1.9.3
   which has vast performance improvements.
 * Added `rd_kafka_conf_get_default_topic_conf()` to retrieve the
   default topic configuration object from a global configuration object.
 * Added `conf` debugging context to `debug` - shows set configuration
   properties on client and topic instantiation. Sensitive properties
   are redacted.
 * Added `rd_kafka_queue_yield()` to cancel a blocking queue call.
 * Will now log a warning when multiple ClusterIds are seen, which is an
   indication that the client might be erroneously configured to connect to
   multiple clusters which is not supported.
 * Added `rd_kafka_seek_partitions()` to seek multiple partitions to
   per-partition specific offsets.


## Fixes

### General fixes

 * Fix a use-after-free crash when certain coordinator requests were retried.
 * The C++ `oauthbearer_set_token()` function would call `free()` on
   a `new`-created pointer, possibly leading to crashes or heap corruption (#3194)

### Consumer fixes

 * The consumer assignment and consumer group implementations have been
   decoupled, simplified and made more strict and robust. This will sort out
   a number of edge cases for the consumer where the behaviour was previously
   undefined.
 * Partition fetch state was not set to STOPPED if OffsetCommit failed.
 * The session timeout is now enforced locally also when the coordinator
   connection is down, which was not previously the case.


### Transactional Producer fixes

 * Transaction commit or abort failures on the broker, such as when the
   producer was fenced by a newer instance, were not propagated to the
   application resulting in failed commits seeming successful.
   This was a critical race condition for applications that had a delay after
   producing messages (or sending offsets) before committing or
   aborting the transaction. This issue has now been fixed and test coverage
   improved.
 * The transactional producer API would return `RD_KAFKA_RESP_ERR__STATE`
   when API calls were attempted after the transaction had failed. We now
   try to return the error that caused the transaction to fail in the first
   place, such as `RD_KAFKA_RESP_ERR__FENCED` when the producer has
   been fenced, or `RD_KAFKA_RESP_ERR__TIMED_OUT` when the transaction
   has timed out.
 * Transactional producer retry count for transactional control protocol
   requests has been increased from 3 to infinite: retriable errors
   are now automatically retried by the producer until success or the
   transaction timeout is exceeded. This fixes the case where
   `rd_kafka_send_offsets_to_transaction()` would fail the current
   transaction into an abortable state when `CONCURRENT_TRANSACTIONS` was
   returned by the broker (which is a transient error) and the 3 retries
   were exhausted.


### Producer fixes

 * Calling `rd_kafka_topic_new()` with a topic config object with
   `message.timeout.ms` set could sometimes adjust the global `linger.ms`
   property (if not explicitly configured) which was not desired. This is now
   fixed and the auto adjustment is only done based on the
   `default_topic_conf` at producer creation.
 * `rd_kafka_flush()` could previously return `RD_KAFKA_RESP_ERR__TIMED_OUT`
   just as the timeout was reached if the messages had been flushed but
   there were now no more messages. This has been fixed.



# librdkafka v1.5.3

librdkafka v1.5.3 is a maintenance release.

## Upgrade considerations

 * CentOS 6 is now EOL and is no longer included in binary librdkafka packages,
   such as NuGet.

## Fixes

### General fixes

 * Fix a use-after-free crash when certain coordinator requests were retried.
 * Coordinator requests could be left uncollected on instance destroy which
   could lead to a hang.
 * Fix rare 1 second stalls by forcing rdkafka main thread wakeup when a new
   next-timer-to-be-fired is scheduled.
 * Fix additional cases where broker-side automatic topic creation might be
   triggered unexpectedly.
 * AdminAPI: The operation_timeout (on-broker timeout) previously defaulted to 0,
   but now defaults to `socket.timeout.ms` (60s).
 * Fix possible crash for Admin API protocol requests that fail at the
   transport layer or prior to sending.


### Consumer fixes

 * Consumer would not filter out messages for aborted transactions
   if the messages were compressed (#3020).
 * Consumer destroy without prior `close()` could hang in certain
   cgrp states (@gridaphobe, #3127).
 * Fix possible null dereference in `Message::errstr()` (#3140).
 * The `roundrobin` partition assignment strategy could get stuck in an
   endless loop or generate uneven assignments in case the group members
   had asymmetric subscriptions (e.g., c1 subscribes to t1,t2 while c2
   subscribes to t2,t3). (#3159)
 * Mixing committed and logical or absolute offsets in the partitions
   passed to `rd_kafka_assign()` would in previous releases ignore the
   logical or absolute offsets and use the committed offsets for all partitions.
   This is now fixed. (#2938)



# librdkafka v1.5.2

librdkafka v1.5.2 is a maintenance release.


## Upgrade considerations

 * The default value for the producer configuration property `retries` has
   been increased from 2 to infinity, effectively limiting Produce retries to
   only `message.timeout.ms`.
   As the reasons for the automatic internal retries vary (various broker error
   codes as well as transport layer issues), it doesn't make much sense to limit
   the number of retries for retriable errors, but instead only limit the
   retries based on the allowed time to produce a message.
 * The default value for the producer configuration property
   `request.timeout.ms` has been increased from 5 to 30 seconds to match
   the Apache Kafka Java producer default.
   This change yields increased robustness for broker-side congestion.


## Enhancements

 * The generated `CONFIGURATION.md` (through `rd_kafka_conf_properties_show()`)
   now includes all properties and values, regardless of whether they were
   included in the build, and setting a disabled property or value through
   `rd_kafka_conf_set()` now returns `RD_KAFKA_CONF_INVALID` and provides
   a more useful error string saying why the property can't be set.
 * Consumer configs on producers and vice versa will now be logged with
   warning messages on client instantiation.

## Fixes

### Security fixes

 * There was an incorrect call to zlib's `inflateGetHeader()` with
   uninitialized memory pointers that could lead to the GZIP header of a fetched
   message batch to be copied to arbitrary memory.
   This function call has now been completely removed since the result was
   not used.
   Reported by Ilja van Sprundel.


### General fixes

 * `rd_kafka_topic_opaque()` (used by the C++ API) would cause object
   refcounting issues when used on light-weight (error-only) topic objects
   such as consumer errors (#2693).
 * Handle name resolution failures when formatting IP addresses in error logs,
   and increase printed hostname limit to ~256 bytes (was ~60).
 * Broker sockets would be closed twice (thus leading to potential race
   condition with fd-reuse in other threads) if a custom `socket_cb` would
   return error.

### Consumer fixes

 * The `roundrobin` `partition.assignment.strategy` could crash (assert)
   for certain combinations of members and partitions.
   This is a regression in v1.5.0. (#3024)
 * The C++ `KafkaConsumer` destructor did not destroy the underlying
   C `rd_kafka_t` instance, causing a leak if `close()` was not used.
 * Expose rich error strings for C++ Consumer `Message->errstr()`.
 * The consumer could get stuck if an outstanding commit failed during
   rebalancing (#2933).
 * Topic authorization errors during fetching are now reported only once (#3072).

### Producer fixes

 * Topic authorization errors are now properly propagated for produced messages,
   both through delivery reports and as `ERR_TOPIC_AUTHORIZATION_FAILED`
   return value from `produce*()` (#2215)
 * Treat cluster authentication failures as fatal in the transactional
   producer (#2994).
 * The transactional producer code did not properly reference-count partition
   objects which could in very rare circumstances lead to a use-after-free bug
   if a topic was deleted from the cluster when a transaction was using it.
 * `ERR_KAFKA_STORAGE_ERROR` is now correctly treated as a retriable
   produce error (#3026).
 * Messages that timed out locally would not fail the ongoing transaction.
   If the application did not take action on failed messages in its delivery
   report callback and went on to commit the transaction, the transaction would
   be successfully committed, simply omitting the failed messages.
 * EndTxnRequests (sent on commit/abort) are only retried in allowed
   states (#3041).
   Previously the transaction could hang on commit_transaction() if an abortable
   error was hit and the EndTxnRequest was to be retried.


*Note: there was no v1.5.1 librdkafka release*



# librdkafka v1.5.0

The v1.5.0 release brings usability improvements, enhancements and fixes to
librdkafka.

## Enhancements

 * Improved broker connection error reporting with more useful information and
   hints on the cause of the problem.
 * Consumer: Propagate errors when subscribing to unavailable topics (#1540)
 * Producer: Add `batch.size` producer configuration property (#638)
 * Add `topic.metadata.propagation.max.ms` to allow newly manually created
   topics to be propagated throughout the cluster before reporting them
   as non-existent. This fixes race issues where CreateTopics() is
   quickly followed by produce().
 * Prefer least idle connection for periodic metadata refreshes, et.al.,
   to allow truly idle connections to time out and to avoid load-balancer-killed
   idle connection errors (#2845)
 * Added `rd_kafka_event_debug_contexts()` to get the debug contexts for
   a debug log line (by @wolfchimneyrock).
 * Added Test scenarios which define the cluster configuration.
 * Added MinGW-w64 builds (@ed-alertedh, #2553)
 * `./configure --enable-XYZ` now requires the XYZ check to pass,
   and `--disable-XYZ` disables the feature altogether (@benesch)
 * Added `rd_kafka_produceva()` which takes an array of produce arguments
   for situations where the existing `rd_kafka_producev()` va-arg approach
   can't be used.
 * Added `rd_kafka_message_broker_id()` to see the broker that a message
   was produced or fetched from, or an error was associated with.
 * Added RTT/delay simulation to mock brokers.


## Upgrade considerations

 * Subscribing to non-existent and unauthorized topics will now propagate
   errors `RD_KAFKA_RESP_ERR_UNKNOWN_TOPIC_OR_PART` and
   `RD_KAFKA_RESP_ERR_TOPIC_AUTHORIZATION_FAILED` to the application through
   the standard consumer error (the err field in the message object).
 * Consumer will no longer trigger auto creation of topics,
   `allow.auto.create.topics=true` may be used to re-enable the old deprecated
   functionality.
 * The default consumer pre-fetch queue threshold `queued.max.messages.kbytes`
   has been decreased from 1GB to 64MB to avoid excessive network usage for low
   and medium throughput consumer applications. High throughput consumer
   applications may need to manually set this property to a higher value.
 * The default consumer Fetch wait time has been increased from 100ms to 500ms
   to avoid excessive network usage for low throughput topics.
 * If OpenSSL is linked statically, or `ssl.ca.location=probe` is configured,
   librdkafka will probe known CA certificate paths and automatically use the
   first one found. This should alleviate the need to configure
   `ssl.ca.location` when the statically linked OpenSSL's OPENSSLDIR differs
   from the system's CA certificate path.
 * The heuristics for handling Apache Kafka < 0.10 brokers have been removed to
   improve connection error handling for modern Kafka versions.
   Users on Brokers 0.9.x or older should already be configuring
   `api.version.request=false` and `broker.version.fallback=...` so there
   should be no functional change.
 * The default producer batch accumulation time, `linger.ms`, has been changed
   from 0.5ms to 5ms to improve batch sizes and throughput while reducing
   the per-message protocol overhead.
   Applications that require lower produce latency than 5ms will need to
   manually set `linger.ms` to a lower value.
 * librdkafka's build tooling now requires Python 3.x (python3 interpreter).


## Fixes

### General fixes

 * The client could crash in rare circumstances on ApiVersion or
   SaslHandshake request timeouts (#2326)
 * `./configure --LDFLAGS='a=b, c=d'` with arguments containing = are now
   supported (by @sky92zwq).
 * `./configure` arguments now take precedence over cached `configure` variables
   from previous invocation.
 * Fix theoretical crash on coord request failure.
 * Unknown partition error could be triggered for existing partitions when
   additional partitions were added to a topic (@benesch, #2915)
 * Quickly refresh topic metadata for desired but non-existent partitions.
   This will speed up the initial discovery delay when new partitions are added
   to an existing topic (#2917).


### Consumer fixes

 * The roundrobin partition assignor could crash if subscriptions
   were asymmetrical (different sets from different members of the group).
   Thanks to @ankon and @wilmai for identifying the root cause (#2121).
 * The consumer assignors could ignore some topics if there were more subscribed
   topics than consumers taking part in the assignment.
 * The consumer would connect to all partition leaders of a topic even
   for partitions that were not being consumed (#2826).
 * Initial consumer group joins should now be a couple of seconds quicker
   thanks to expedited query intervals (@benesch).
 * Fix crash and/or inconsistent subscriptions when using multiple consumers
   (in the same process) with wildcard topics on Windows.
 * Don't propagate temporary offset lookup errors to application.
 * Immediately refresh topic metadata when partitions are reassigned to other
   brokers, avoiding a fetch stall of up to `topic.metadata.refresh.interval.ms`. (#2955)
 * Memory for batches containing control messages would not be freed when
   using the batch consume APIs (@pf-qiu, #2990).


### Producer fixes

 * Proper locking for transaction state in EndTxn handler.



# librdkafka v1.4.4

v1.4.4 is a maintenance release with the following fixes and enhancements:

 * Transactional producer could crash on request timeout due to dereferencing
   NULL pointer of non-existent response object.
 * Mark `rd_kafka_send_offsets_to_transaction()` CONCURRENT_TRANSACTION (et.al)
   errors as retriable.
 * Fix crash on transactional coordinator FindCoordinator request failure.
 * Minimize broker re-connect delay when broker's connection is needed to
   send requests.
 * Proper locking for transaction state in EndTxn handler.
 * `socket.timeout.ms` was ignored when `transactional.id` was set.
 * Added RTT/delay simulation to mock brokers.

*Note: there was no v1.4.3 librdkafka release*



# librdkafka v1.4.2

v1.4.2 is a maintenance release with the following fixes and enhancements:

 * Fix produce/consume hang after partition goes away and comes back,
   such as when a topic is deleted and re-created.
 * Consumer: Reset the stored offset when partitions are un-assign()ed (fixes #2782).
   This fixes the case where a manual offset-less commit() or the auto-committer
   would commit a stored offset from a previous assignment before
   a new message was consumed by the application.
 * Probe known CA cert paths and set default `ssl.ca.location` accordingly
   if OpenSSL is statically linked or `ssl.ca.location` is set to `probe`.
 * Per-partition OffsetCommit errors were unhandled (fixes #2791)
 * Seed the PRNG (random number generator) by default, allow application to
   override with `enable.random.seed=false` (#2795)
 * Fix stack overwrite (of 1 byte) when SaslHandshake MechCnt is zero
 * Align bundled c11 threads (tinycthreads) constants to glibc and musl (#2681)
 * Fix return value of rd_kafka_test_fatal_error() (by @ckb42)
 * Ensure CMake sets disabled defines to zero on Windows (@benesch)


*Note: there was no v1.4.1 librdkafka release*



# Older releases

See https://github.com/edenhill/librdkafka/releases