Revision tags: v9.0.0-rc2, v9.0.0-rc1, v9.0.0-rc0

# 303e6f54 | 11-Mar-2024 | Hao Xiang <hao.xiang@bytedance.com>

migration/multifd: Implement zero page transmission on the multifd thread.

1. Add a zero_pages field to MultiFDPacket_t.
2. Implement zero page detection and handling on the multifd threads for the non-compression, zlib and zstd compression backends.
3. Add a new value 'multifd' to the ZeroPageDetection enumeration.
4. Add zero page counters and update the multifd send/receive tracing format to track the newly added counters.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>

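As a rough illustration of the mechanism described above, here is a minimal C sketch (not QEMU's actual code) of zero page detection on a send thread: the page offsets of one packet are partitioned so normal pages come first and zero pages last, letting the receiver clear the zero pages locally instead of reading them from the channel. MiniPacket, scan_pages() and PAGE_SIZE are illustrative stand-ins for the real MultiFDPacket_t machinery, and QEMU uses the optimized buffer_is_zero() helper rather than the plain loop shown here.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-in; the real MultiFDPacket_t lives in migration/multifd.h. */
    #define PAGE_SIZE 4096
    typedef struct {
        uint32_t normal_pages;   /* pages that carry payload */
        uint32_t zero_pages;     /* pages detected as all-zero */
        uint64_t offset[128];    /* normal page offsets first, zero page offsets last */
    } MiniPacket;

    static bool page_is_zero(const uint8_t *page)
    {
        /* QEMU uses the optimized buffer_is_zero(); plain C shown here. */
        for (size_t i = 0; i < PAGE_SIZE; i++) {
            if (page[i]) {
                return false;
            }
        }
        return true;
    }

    /* Partition one packet's pages: normal page offsets go to the front of
     * offset[], zero page offsets to the back, so only normal pages need to
     * be put on the wire. */
    static void scan_pages(MiniPacket *p, const uint8_t *ram,
                           const uint64_t *offsets, uint32_t count)
    {
        uint32_t normal = 0, zero = 0;

        for (uint32_t i = 0; i < count; i++) {
            if (page_is_zero(ram + offsets[i])) {
                p->offset[count - 1 - zero++] = offsets[i];
            } else {
                p->offset[normal++] = offsets[i];
            }
        }
        p->normal_pages = normal;
        p->zero_pages = zero;
    }

    int main(void)
    {
        static uint8_t ram[2 * PAGE_SIZE];   /* page 0 all-zero, page 1 not */
        uint64_t offs[2] = { 0, PAGE_SIZE };
        MiniPacket p = { 0 };

        ram[PAGE_SIZE] = 0xff;
        scan_pages(&p, ram, offs, 2);
        printf("normal=%u zero=%u\n", p.normal_pages, p.zero_pages);
        return 0;
    }
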
Revision tags: v8.2.2, v7.2.10

# 4aac6b1e | 29-Feb-2024 | Fabiano Rosas <farosas@suse.de>

migration/multifd: Cleanup multifd_recv_sync_main

Some minor cleanups and documentation for multifd_recv_sync_main.

Use thread_count as done in other parts of the code. Remove p->id from the multifd_recv_state sync, since that is global and not tied to a channel. Add documentation for the sync steps.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>

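For context, here is a compact sketch of the two-step sync that the documentation added by this commit describes, assuming one shared semaphore posted by each recv thread plus a per-channel semaphore that releases them; the names are simplified stand-ins for the QemuSemaphore fields in migration/multifd.c, not the actual code.

    #include <semaphore.h>

    /* Illustrative stand-ins for MultiFDRecvParams / multifd_recv_state. */
    typedef struct {
        sem_t sem_sync;   /* posted by main to let this recv thread continue */
    } RecvChannel;

    static struct {
        int thread_count;
        sem_t sem_sync;   /* posted once per channel when its sync packet arrives */
        RecvChannel *channels;
    } recv_state;

    /* Main-thread side of the sync: wait until every recv thread has reached
     * the sync point, then release them all for the next round. */
    static void recv_sync_main_sketch(void)
    {
        for (int i = 0; i < recv_state.thread_count; i++) {
            sem_wait(&recv_state.sem_sync);              /* step 1: collect all channels */
        }
        for (int i = 0; i < recv_state.thread_count; i++) {
            sem_post(&recv_state.channels[i].sem_sync);  /* step 2: let them proceed */
        }
    }
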
# 3ab4441d | 02-Feb-2024 | Peter Xu <peterx@redhat.com>

migration/multifd: Split multifd_send_terminate_threads()

Split multifd_send_terminate_threads() into two functions:

- multifd_send_set_error(): used when an error happened on the sender side; set the error and quit state only
- multifd_send_terminate_threads(): used only by the main thread to kick all multifd send threads out of sleep, for the last recycling.

Use multifd_send_set_error() in the three old call sites where only the error will be set.

Use multifd_send_terminate_threads() in the last one, where the main thread will kick the multifd threads at last in multifd_save_cleanup().

Both helpers will need to set quitting=1.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>

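The split can be pictured with the following sketch, where the field and function names are simplified stand-ins for MultiFDSendParams and friends (and the real error helper also records the Error* with migrate_set_error()): the error path only flags the quit state, while the terminate path additionally wakes every send thread so it can observe the flag and exit.

    #include <stdatomic.h>
    #include <semaphore.h>

    /* Illustrative stand-ins for the multifd send state. */
    typedef struct {
        sem_t sem;                 /* send thread sleeps here waiting for work */
    } SendChannel;

    static struct {
        atomic_int exiting;        /* the "quitting" flag checked by every send thread */
        int channel_count;
        SendChannel *channels;
    } send_state;

    /* Error path: only record that something went wrong and request quit. */
    static void send_set_error_sketch(void)
    {
        atomic_store(&send_state.exiting, 1);
    }

    /* Cleanup path (main thread only): request quit AND wake every send thread
     * so it can see the flag and exit, ready for the final join/recycle. */
    static void send_terminate_threads_sketch(void)
    {
        atomic_store(&send_state.exiting, 1);
        for (int i = 0; i < send_state.channel_count; i++) {
            sem_post(&send_state.channels[i].sem);
        }
    }
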
Revision tags: v8.2.1, v8.1.5, v7.2.9, v8.1.4, v7.2.8, v8.2.0, v8.2.0-rc4, v8.2.0-rc3, v8.2.0-rc2, v8.2.0-rc1, v7.2.7, v8.1.3, v8.2.0-rc0

# 7aa6070d | 17-Oct-2023 | Peter Xu <peterx@redhat.com>

migration: Refactor error handling in source return path

rp_state.error was a boolean used to show that an error happened in the return path thread. That not only duplicates error reporting (migrate_set_error), but is also not good enough in that we only do error_report() and set it to true; we can never keep the exact error and show it in query-migrate.

To make this better, a few things are done:

- Use error_setg() rather than error_report() across the whole lifecycle of the return path thread, keeping the error in an Error*.
- With the above, there is no need for mark_source_rp_bad(); remove it, along with rp_state.error itself.
- Use migrate_set_error() to apply that captured error to the global migration object when an error occurred in this thread.
- Do the same when a qemufile error is detected in the source return path.

We need to re-export qemu_file_get_error_obj() to do the last one.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-2-peterx@redhat.com>

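The error-propagation pattern the commit moves to can be sketched roughly as below. Error, error_setg_sketch() and migrate_set_error_sketch() are tiny self-contained stand-ins for QEMU's real Error API and migrate_set_error(), shown only to illustrate keeping the error object instead of a boolean plus error_report().

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Tiny stand-in for QEMU's Error object. */
    typedef struct Error { char msg[256]; } Error;

    static void error_setg_sketch(Error **errp, const char *fmt, ...)
    {
        va_list ap;
        Error *err = malloc(sizeof(*err));

        va_start(ap, fmt);
        vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
        va_end(ap);
        *errp = err;
    }

    static struct { Error *error; } mig_state;   /* global migration object stand-in */

    /* Keep the first captured error so a later query can report it, instead of
     * a boolean rp_state.error that forgets the details. */
    static void migrate_set_error_sketch(Error *err)
    {
        if (!mig_state.error) {
            mig_state.error = err;
        }
    }

    static void return_path_thread_sketch(void)
    {
        Error *local_err = NULL;

        /* On failure, keep the error instead of error_report() + bool. */
        error_setg_sketch(&local_err, "received invalid message length %d", 42);
        if (local_err) {
            migrate_set_error_sketch(local_err);
        }
    }

    int main(void)
    {
        return_path_thread_sketch();
        if (mig_state.error) {
            printf("query-migrate would now report: %s\n", mig_state.error->msg);
        }
        return 0;
    }
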
# 3e5f3bcd | 30-Oct-2023 | Peter Xu <peterx@redhat.com>

migration: Add tracepoints for downtime checkpoints

This patch is inspired by Joao Martins' patch here:

https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com

Add tracepoints for major downtime checkpoints on both src and dst. They share the same tracepoint, with a string showing its stage.

Besides the checkpoints in the previous patch, this patch also adds destination checkpoints.

On src, we have these checkpoints added:

- src-downtime-start: right before the vm stops on src
- src-vm-stopped: after the vm is fully stopped
- src-iterable-saved: after all iterables are saved (END sections)
- src-non-iterable-saved: after all non-iterables are saved (FULL sections)
- src-downtime-stop: migration fully completed

On dst, we have these checkpoints added:

- dst-precopy-loadvm-completes: after loadvm is all done for precopy
- dst-precopy-bh-*: record BH steps to resume the VM for precopy
- dst-postcopy-bh-*: record BH steps to resume the VM for postcopy

On the dst side, we don't have a good way to trace the total time consumed by iterable or non-iterable sections for now. We could mark it by the first time a FULL / END section is received, but rather than that let's just rely on the other tracepoints added for vmstates to back up the information.

With this patch, one can enable the "vmstate_downtime*" tracepoints and that will enable all the tracepoints necessary for downtime measurements.

Drop the loadvm_postcopy_handle_run_bh() tracepoints alongside, because they serve the same purpose and were only for postcopy. We then have a unified prefix for all downtime-relevant tracepoints.

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-6-peterx@redhat.com>

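A stand-alone sketch of the "one tracepoint, stage carried as a string" idea follows. The trace-events entry name in the comment is an assumption based on the vmstate_downtime* prefix mentioned above; in QEMU the trace_* call is generated from trace-events, while the function below simply timestamps each checkpoint string.

    #include <stdio.h>
    #include <time.h>

    /* In QEMU the tracepoint would come from a trace-events entry roughly like
     *   vmstate_downtime_checkpoint(const char *checkpoint) "%s"
     * (name assumed). This stand-in just prints a monotonic timestamp + stage. */
    static void trace_downtime_checkpoint_sketch(const char *checkpoint)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        printf("%lld.%09ld vmstate_downtime_checkpoint %s\n",
               (long long)ts.tv_sec, ts.tv_nsec, checkpoint);
    }

    int main(void)
    {
        /* One shared tracepoint; the stage is carried in the string argument. */
        trace_downtime_checkpoint_sketch("src-downtime-start");
        trace_downtime_checkpoint_sketch("src-vm-stopped");
        trace_downtime_checkpoint_sketch("src-iterable-saved");
        trace_downtime_checkpoint_sketch("src-non-iterable-saved");
        trace_downtime_checkpoint_sketch("src-downtime-stop");
        return 0;
    }
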
# 3c80f142 | 30-Oct-2023 | Peter Xu <peterx@redhat.com>

migration: Add per vmstate downtime tracepoints

We have a bunch of savevm_section* tracepoints. They're good for analyzing the migration stream, but not always suitable if someone would like to analyze the migration downtime. Two major problems:

- The savevm_section* tracepoints dump all sections, while we only care about the sections that contribute to the downtime.
- They don't have an identifier to show the type of section, so there is no easy way to filter downtime information either.

We could add the type into those tracepoints, but instead of doing so, this patch keeps them untouched and adds a set of downtime-specific tracepoints, so one can enable the "vmstate_downtime*" tracepoints and get a full picture of how the downtime is distributed across iterative and non-iterative vmstate save/load.

Note that here both save() and load() need to be traced, because both of them may contribute to the downtime. The contribution is not a simple "add them together", though: consider that the src can be doing a save() of device1 while the dest is load()ing device2, so they can happen concurrently.

Tracking both sides makes sense because device load() and save() can be imbalanced: one device can save() super fast but load() super slow, or vice versa. We can't figure that out without tracing both.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-4-peterx@redhat.com>

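The per-vmstate timing can be pictured as wrapping each device's save()/load() callback with a timestamp pair, roughly as in this sketch. timed_vmstate_op() and the trace name in the comment are illustrative assumptions based on the vmstate_downtime* prefix, not QEMU's actual helpers.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static int64_t now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    }

    typedef int (*vmstate_op)(void *opaque);

    /* Time one device's save() or load() and report the elapsed time per
     * vmstate, the way a vmstate_downtime_{save,load} tracepoint would
     * (names assumed from the commit's "vmstate_downtime*" prefix). */
    static int timed_vmstate_op(const char *op, const char *idstr,
                                uint32_t instance_id, vmstate_op fn, void *opaque)
    {
        int64_t start = now_us();
        int ret = fn(opaque);

        printf("vmstate_downtime %s %s[%u] took %" PRId64 " us\n",
               op, idstr, instance_id, now_us() - start);
        return ret;
    }

    static int dummy_save(void *opaque)
    {
        (void)opaque;
        return 0;
    }

    int main(void)
    {
        timed_vmstate_op("save", "0000:00:02.0/virtio-net", 0, dummy_save, NULL);
        return 0;
    }
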
Revision tags: v8.1.2

# 967e3889 | 12-Oct-2023 | Fabiano Rosas <farosas@suse.de>

migration/multifd: Clarify Error usage in multifd_channel_connect

The function is currently called from two sites: one always gives it a NULL Error and the other always gives it a non-NULL Error.

In the non-NULL case, all it does is trace the error and return. One of the callers already has tracing; add a tracepoint to the other and stop passing the error into the function.

Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-4-farosas@suse.de>

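A before/after shape of the call, with simplified illustrative names rather than the actual multifd signatures, might look like this: the callee loses its Error parameter, and the caller that actually owns a non-NULL error traces it at the call site.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative stand-ins; the real names and signatures in
     * migration/multifd.c differ. */
    typedef struct Error { const char *msg; } Error;

    /* After the change: the connect helper takes no Error parameter at all. */
    static bool channel_connect_sketch(void)
    {
        /* ... attempt the connection; failures are handled by this function's
         * own error path, not by an Error handed in from the caller ... */
        return true;
    }

    /* The caller that may hold a non-NULL error traces it itself. */
    static void tls_handshake_done_sketch(Error *err)
    {
        if (err) {
            printf("trace: tls handshake failed: %s\n", err->msg);
            return;
        }
        channel_connect_sketch();
    }
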
# b1b38387 | 11-Oct-2023 | Juan Quintela <quintela@redhat.com>

migration/rdma: Remove qemu_ prefix from exported functions

Functions are long enough even without this.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011203527.9061-10-quintela@redhat.com>

# 8b239597 | 10-Oct-2023 | Peter Xu <peterx@redhat.com>

migration: Allow user to specify available switchover bandwidth

Migration bandwidth is a very important value for live migration, because it is one of the major factors in deciding when to switch over to the destination in a precopy process.

This value is currently estimated by QEMU during the whole live migration process by monitoring how fast we are sending the data. This can be the most accurate bandwidth in the ideal world, where we are always feeding unlimited data into the migration channel and are limited only by the bandwidth that is available.

However, in reality it may be very different. E.g., over a 10Gbps network we can see query-migrate showing a migration bandwidth of only a few tens of MB/s, just because there are plenty of other things the migration thread might be doing. For example, the migration thread can be busy scanning zero pages, or it can be fetching the dirty bitmap from other external dirty sources (like vhost or KVM). It means we may not be pushing data as fast as possible into the migration channel, so the bandwidth estimated from "how much data we sent on the channel" can be dramatically inaccurate sometimes.

With that, the decision to switch over will be affected, by assuming that we may not be able to switch over at all with such a low bandwidth, when in reality we can.

The migration may not even converge at all within the downtime specified, with that wrong estimation of bandwidth keeping it iterating forever.

The issue is that QEMU itself may not be able to avoid those uncertainties in measuring the real "available migration bandwidth". At least not in any way I can think of so far.

One way to fix this is, when the user is fully aware of the available bandwidth, to allow the user to help by providing an accurate value.

For example, if the user has a dedicated 10Gbps channel for migrating this specific VM, the user can specify this bandwidth so QEMU can always do the calculation based on this fact, trusting the user as long as it is specified. It may not be the exact bandwidth when switching over (in which case QEMU will push migration data as fast as possible), but much better than QEMU trying to wildly guess, especially when it guesses very wrong.

A new parameter "avail-switchover-bandwidth" is introduced just for this. So when the user specifies this parameter, instead of trusting the estimated value from QEMU itself (based on the QEMUFile send speed), it trusts the user more by using this value to decide when to switch over, assuming that we'll have such bandwidth available then.

Note that specifying this value will not throttle the bandwidth for switchover yet, so QEMU will always use the full bandwidth possible for sending switchover data, assuming that should always be the most important way to use the network at that time.

This can resolve issues like "unconverging migration", which is caused by an absurdly low "migration bandwidth" detected for whatever reason.

Reported-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231010221922.40638-1-peterx@redhat.com>

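To see why the parameter matters, here is a self-contained sketch of the switchover arithmetic it feeds into, with illustrative names rather than QEMU's: the remaining dirty data must fit within the downtime limit at the assumed bandwidth, and a misleadingly low measured bandwidth can block a switchover that the user-provided value would allow.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative parameter block, not QEMU's MigrationParameters. */
    typedef struct {
        uint64_t avail_switchover_bandwidth; /* bytes/s, 0 = not set by the user */
        uint64_t estimated_bandwidth;        /* bytes/s, measured on the channel */
        uint64_t downtime_limit_ms;
    } MigParamsSketch;

    /* Prefer the user-provided bandwidth when present; otherwise fall back to
     * the measured estimate, then check whether the remaining dirty data can
     * be sent within the allowed downtime. */
    static bool can_switchover(const MigParamsSketch *p, uint64_t pending_bytes)
    {
        uint64_t bw = p->avail_switchover_bandwidth ?
                      p->avail_switchover_bandwidth : p->estimated_bandwidth;
        uint64_t threshold = bw * p->downtime_limit_ms / 1000;

        return pending_bytes <= threshold;
    }

    int main(void)
    {
        MigParamsSketch p = {
            .avail_switchover_bandwidth = 1250ULL * 1000 * 1000, /* ~10 Gbps */
            .estimated_bandwidth = 50ULL * 1000 * 1000,          /* misleading 50 MB/s */
            .downtime_limit_ms = 300,
        };

        /* 300 MB of pending dirty data: fits at 10 Gbps, not at 50 MB/s. */
        printf("switchover possible: %s\n",
               can_switchover(&p, 300ULL * 1000 * 1000) ? "yes" : "no");
        return 0;
    }
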
# 2c88739c | 28-Sep-2023 | Markus Armbruster <armbru@redhat.com>

migration/rdma: Replace flawed device detail dump by tracing

qemu_rdma_dump_id() dumps RDMA device details to stdout.

rdma_start_outgoing_migration() calls it via qemu_rdma_source_init() and qemu_rdma_resolve_host() to show source device details. rdma_start_incoming_migration() arranges its call via rdma_accept_incoming_migration() and qemu_rdma_accept() to show destination device details.

Two issues:

1. rdma_start_outgoing_migration() can run in HMP context. The information should arguably go to the monitor, not stdout.

2. ibv_query_port() failure is reported as an error. Its callers remain unaware of this failure (qemu_rdma_dump_id() can't fail), so reporting this to the user as an error is problematic.

Fixable, but the device detail dump is noise, except when troubleshooting. Tracing is a better fit. The similar function qemu_rdma_dump_gid() was converted to tracing in commit 733252deb8b (Tracify migration/rdma.c).

Convert qemu_rdma_dump_id(), too.

While there, touch up qemu_rdma_dump_gid()'s outdated comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-54-armbru@redhat.com>

# 87a24ca3 | 28-Sep-2023 | Markus Armbruster <armbru@redhat.com>

migration/rdma: Consistently use uint64_t for work request IDs

We use int instead of uint64_t in a few places. Change them to uint64_t.

This cleans up a comparison of signed qemu_rdma_block_for_wrid() parameter @wrid_requested with unsigned @wr_id. Harmless, because the actual arguments are non-negative enumeration constants.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-6-armbru@redhat.com>

# b5631d5b | 28-Sep-2023 | Markus Armbruster <armbru@redhat.com>

migration/rdma: Drop fragile wr_id formatting

wrid_desc[] uses 4001 pointers to map four integer values to strings.

print_wrid() accesses wrid_desc[] out of bounds when passed a negative argument. It returns null for values 2..1999 and 2001..3999.

qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id] and pass print_wrid(wr_id) to tracepoints. This could conceivably crash trying to format a null string. I believe out-of-bounds access is not possible.

Not worth cleaning up. Dumb it down to show just the numeric wr_id.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-5-armbru@redhat.com>
