Pull request

Andrey Drobyshev's qemugdb script improvements and my --device
 scsi-block,migrate-pr=on|off live migration support for SCSI Persistent
 Reservations.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmmJ8t0ACgkQnKSrs4Gr
 c8goXggAgx8Fehp5q1e1xUulb/WwnHw14lfl2+O4Or3FxK9TDWSUjT0Htk0+QwAf
 W+7Q7MTnSzLTDYKbsPj+4RxZ+Pth/ra2rhIS3YWMQLNAjFKAIWKvQdD0krOlJ8t+
 i3DkERhaw/ke2ImR7GSr7SZjJjhHaxTaC+R/DEPWVxgK1j4mLt/pwAhigWxlvVLT
 SInnZAvfy7+OspFu3AcBtwDEe0MvIQKdTgxZS7wSf/tWS/9WZqsM8pSL/1+ozPGg
 hWjHevhGI6LS4QfRqdF6+dq/XaGT81hFNosCL2o9YWbLuipk/9TyUSX7uevo1IFz
 SpXwxFltCyPicaGJcufX4MjASJqjrg==
 =DKtL
 -----END PGP SIGNATURE-----

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

# gpg: Signature made Mon Feb  9 14:44:45 2026 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  docs: add SCSI migrate-pr documentation
  scsi: save/load SCSI reservation state
  scsi: track SCSI reservation state for live migration
  scsi: add error reporting to scsi_SG_IO()
  scsi: generalize scsi_SG_IO_FROM_DEV() to scsi_SG_IO()
  scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  scripts/qemugdb: timers: Improve 'qemu timers' command readability
  scripts/qemugdb: timers: Fix KeyError in 'qemu timers' command
  scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit addresses

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
pull/319/head
Peter Maydell 2 months ago
parent
commit
b3abdfa486
 docs/system/device-emulation.rst        |   1
 docs/system/devices/scsi/index.rst      |  10
 docs/system/devices/scsi/migrate-pr.rst |  54
 hw/core/machine.c                       |   4
 hw/scsi/scsi-bus.c                      |   3
 hw/scsi/scsi-disk.c                     |  90
 hw/scsi/scsi-generic.c                  | 291
 hw/scsi/trace-events                    |   2
 include/hw/scsi/scsi.h                  |  15
 include/scsi/constants.h                |  21
 scripts/qemugdb/coroutine.py            | 257
 scripts/qemugdb/mtree.py                |   2
 scripts/qemugdb/timers.py               |  54

1
docs/system/device-emulation.rst

@@ -95,6 +95,7 @@ Emulated Devices
    devices/keyboard.rst
    devices/net.rst
    devices/nvme.rst
+   devices/scsi/index.rst
    devices/usb-u2f.rst
    devices/usb.rst
    devices/vfio-user.rst

10
docs/system/devices/scsi/index.rst

@@ -0,0 +1,10 @@
+SCSI Devices
+============
+
+Several SCSI devices are available in QEMU. They are primarily used for block
+storage.
+
+.. toctree::
+   :maxdepth: 1
+
+   migrate-pr.rst

54
docs/system/devices/scsi/migrate-pr.rst

@@ -0,0 +1,54 @@
+..
+   SPDX-License-Identifier: GPL-2.0-or-later
+
+.. _scsi_migrate_pr:
+
+SCSI Persistent Reservation Live Migration
+==========================================
+
+This document explains how to live migrate SCSI Persistent Reservations.
+
+The ``scsi-block`` device migrates SCSI Persistent Reservations when the
+``migrate-pr=on`` parameter is given. Migration is enabled by default in
+versioned machine types since QEMU 11.0. It is disabled by default on older
+machine types and needs to be explicitly enabled with ``--device
+scsi-block,migrate-pr=on,...``.
+
+When migration is enabled, QEMU snoops PERSISTENT RESERVATION OUT commands and
+tracks the reservation key registered by the guest as well as reservations that
+the guest acquires. This information is migrated along with the guest and the
+destination QEMU submits a PERSISTENT RESERVATION OUT command with the PREEMPT
+service action to atomically transfer the reservation to the destination before
+the guest starts running on the destination.
+
+The following persistent reservation capabilities reported by the PERSISTENT
+RESERVATION IN command with the REPORT CAPABILITIES service action are masked
+from the guest by QEMU when migration is enabled:
+
+* Specify Initiator Ports Capable (SIP_C)
+* All Target Ports Capable (ATC_C)
+
+When migration is disabled, the ``scsi-block`` device is live migrated but
+reservations remain in place on the source. Usually this is not the intended
+behavior unless there is another mechanism to update reservations during
+migration. The PERSISTENT RESERVATION IN command also does not mask
+capabilities reported to the guest when migration is disabled.
+
+Limitations
+-----------
+
+QEMU does not remember snooped reservation details across restart, so software
+inside the guest must acquire the reservation after boot in order for live
+migration to work. Similarly, if the reservation is acquired outside the guest
+then it will not live migrate along with the guest.
+
+Snooping only considers the PERSISTENT RESERVATION OUT commands from the guest
+and does not track reservation changes made by other SCSI initiators. QEMU's
+snooped reservation details can become stale if another SCSI initiator
+makes changes to the reservation.
+
+Guests running on the same host share a single SCSI initiator identity unless
+Fibre Channel N_Port ID Virtualization is configured. As a consequence,
+multiple guests on the same host may observe unexpected behavior if they use
+the same physical LUN. From the LUN's perspective all guests are the same
+initiator and there is no way to distinguish between guests.
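
The capability masking described in migrate-pr.rst can be illustrated with a small Python model. This is only a sketch, not QEMU code: the function name mask_pr_capabilities and the sample response bytes are hypothetical. The 0x0c mask mirrors the `r->buf[2] &= ~0xc` statement added to hw/scsi/scsi-generic.c in this commit, and the assignment of SIP_C to bit 3 and ATC_C to bit 2 of byte 2 is an assumption based on the REPORT CAPABILITIES parameter data layout.

```python
SIP_C = 0x08  # Specify Initiator Ports Capable (assumed: byte 2, bit 3)
ATC_C = 0x04  # All Target Ports Capable (assumed: byte 2, bit 2)


def mask_pr_capabilities(response: bytes, migrate_pr: bool) -> bytes:
    """Return REPORT CAPABILITIES data as the guest would see it."""
    if not migrate_pr or len(response) < 3:
        return response  # nothing is patched when migration is disabled
    patched = bytearray(response)
    patched[2] &= ~(SIP_C | ATC_C) & 0xFF
    return bytes(patched)


# Hypothetical device response with several capability bits set in byte 2.
sample = bytes([0x00, 0x08, 0x1D, 0x80, 0x00, 0x00, 0x00, 0x00])
masked = mask_pr_capabilities(sample, migrate_pr=True)
print(f"byte 2 before: {sample[2]:#04x}, after: {masked[2]:#04x}")
```

All other bits in the response are passed through unchanged, which matches the documented behavior of masking only these two capabilities.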

4
hw/core/machine.c

@@ -38,7 +38,9 @@
 #include "hw/acpi/generic_event_device.h"
 #include "qemu/audio.h"

-GlobalProperty hw_compat_10_2[] = {};
+GlobalProperty hw_compat_10_2[] = {
+    { "scsi-block", "migrate-pr", "off" },
+};
 const size_t hw_compat_10_2_len = G_N_ELEMENTS(hw_compat_10_2);

 GlobalProperty hw_compat_10_1[] = {

3
hw/scsi/scsi-bus.c

@ -393,6 +393,7 @@ static void scsi_qdev_realize(DeviceState *qdev, Error **errp)
}
qemu_mutex_init(&dev->requests_lock);
qemu_mutex_init(&dev->pr_state.mutex);
QTAILQ_INIT(&dev->requests);
scsi_device_realize(dev, &local_err);
if (local_err) {
@ -417,6 +418,8 @@ static void scsi_qdev_unrealize(DeviceState *qdev)
scsi_device_unrealize(dev);
qemu_mutex_destroy(&dev->pr_state.mutex);
blockdev_mark_auto_del(dev->conf.blk);
}

90
hw/scsi/scsi-disk.c

@ -28,6 +28,7 @@
#include "qemu/hw-version.h"
#include "qemu/memalign.h"
#include "hw/scsi/scsi.h"
#include "migration/misc.h"
#include "migration/qemu-file-types.h"
#include "migration/vmstate.h"
#include "hw/scsi/emulation.h"
@ -122,6 +123,7 @@ struct SCSIDiskState {
*/
uint16_t rotation_rate;
bool migrate_emulated_scsi_request;
NotifierWithReturn migration_notifier;
};
static void scsi_free_request(SCSIRequest *req)
@ -2737,6 +2739,29 @@ static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun,
}
#ifdef __linux__
/*
* Preempt on the SCSI Persistent Reservation on the source when migration
* fails because the destination may have already preempted and we need to get
* the reservation back.
*/
static int scsi_block_migration_notifier(NotifierWithReturn *notifier,
MigrationEvent *e, Error **errp)
{
if (e->type == MIG_EVENT_PRECOPY_FAILED) {
SCSIDiskState *s =
container_of(notifier, SCSIDiskState, migration_notifier);
SCSIDevice *d = &s->qdev;
Error *local_err = NULL;
if (!scsi_generic_pr_state_preempt(d, &local_err)) {
/* MIG_EVENT_PRECOPY_FAILED cannot fail, so just warn */
error_prepend(&local_err, "scsi-block migration rollback: ");
warn_report_err(local_err);
}
}
return 0;
}
static int get_device_type(SCSIDiskState *s)
{
uint8_t cmd[16];
@@ -2748,8 +2773,8 @@ static int get_device_type(SCSIDiskState *s)
     cmd[0] = INQUIRY;
     cmd[4] = sizeof(buf);

-    ret = scsi_SG_IO_FROM_DEV(s->qdev.conf.blk, cmd, sizeof(cmd),
-                              buf, sizeof(buf), s->qdev.io_timeout);
+    ret = scsi_SG_IO(s->qdev.conf.blk, SG_DXFER_FROM_DEV, cmd, sizeof(cmd),
+                     buf, sizeof(buf), s->qdev.io_timeout, NULL);
     if (ret < 0) {
         return -1;
     }
@ -2815,6 +2840,16 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
scsi_realize(&s->qdev, errp);
scsi_generic_read_device_inquiry(&s->qdev);
migration_add_notifier(&s->migration_notifier,
scsi_block_migration_notifier);
}
static void scsi_block_unrealize(SCSIDevice *dev)
{
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
migration_remove_notifier(&s->migration_notifier);
}
typedef struct SCSIBlockReq {
@ -3209,6 +3244,47 @@ static const Property scsi_hd_properties[] = {
DEFINE_BLOCK_CHS_PROPERTIES(SCSIDiskState, qdev.conf),
};
#ifdef __linux__
static bool scsi_disk_pr_state_post_load_errp(void *opaque, int version_id,
Error **errp)
{
SCSIDiskState *s = opaque;
SCSIDevice *dev = &s->qdev;
return scsi_generic_pr_state_preempt(dev, errp);
}
static bool scsi_disk_pr_state_needed(void *opaque)
{
SCSIDiskState *s = opaque;
SCSIPRState *pr_state = &s->qdev.pr_state;
bool ret;
if (!s->qdev.migrate_pr) {
return false;
}
/* A reservation requires a key, so checking this field is enough */
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
ret = pr_state->key;
}
return ret;
}
static const VMStateDescription vmstate_scsi_disk_pr_state = {
.name = "scsi-disk/pr",
.version_id = 1,
.minimum_version_id = 1,
.post_load_errp = scsi_disk_pr_state_post_load_errp,
.needed = scsi_disk_pr_state_needed,
.fields = (const VMStateField[]) {
VMSTATE_UINT64(qdev.pr_state.key, SCSIDiskState),
VMSTATE_UINT8(qdev.pr_state.resv_type, SCSIDiskState),
VMSTATE_END_OF_LIST()
}
};
#endif /* __linux__ */
static const VMStateDescription vmstate_scsi_disk_state = {
.name = "scsi-disk",
.version_id = 1,
@@ -3221,7 +3297,13 @@ static const VMStateDescription vmstate_scsi_disk_state = {
         VMSTATE_BOOL(tray_open, SCSIDiskState),
         VMSTATE_BOOL(tray_locked, SCSIDiskState),
         VMSTATE_END_OF_LIST()
-    }
+    },
+    .subsections = (const VMStateDescription * const []) {
+#ifdef __linux__
+        &vmstate_scsi_disk_pr_state,
+#endif
+        NULL
+    },
 };

 static void scsi_hd_class_initfn(ObjectClass *klass, const void *data)
@ -3301,6 +3383,7 @@ static const Property scsi_block_properties[] = {
-1),
DEFINE_PROP_UINT32("io_timeout", SCSIDiskState, qdev.io_timeout,
DEFAULT_IO_TIMEOUT),
DEFINE_PROP_BOOL("migrate-pr", SCSIDiskState, qdev.migrate_pr, true),
};
static void scsi_block_class_initfn(ObjectClass *klass, const void *data)
@ -3310,6 +3393,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, const void *data)
SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
sc->realize = scsi_block_realize;
sc->unrealize = scsi_block_unrealize;
sc->alloc_req = scsi_block_new_request;
sc->parse_cdb = scsi_block_parse_cdb;
sdc->dma_readv = scsi_block_dma_readv;
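
The hand-over performed by the post-load hook, and the rollback performed by the migration notifier on MIG_EVENT_PRECOPY_FAILED, can be pictured with a toy model of a LUN's reservation table. This is a heavily simplified sketch: FakeLun, pr_state_preempt and the nexus names are hypothetical, and the PREEMPT semantics modelled here (remove every other I_T nexus registered with the given key and inherit its reservation) ignore many cases covered by the SCSI specification.

```python
class FakeLun:
    """Toy model of one LUN's persistent reservation table."""

    def __init__(self):
        self.registrations = {}  # I_T nexus name -> registered key
        self.holder = None       # nexus holding the reservation, if any
        self.resv_type = 0

    def register(self, nexus, key):
        self.registrations[nexus] = key

    def reserve(self, nexus, resv_type):
        assert nexus in self.registrations, "must register before reserving"
        self.holder, self.resv_type = nexus, resv_type

    def preempt(self, nexus, key, resv_type):
        """Unregister other nexuses using `key`; take over their reservation."""
        assert self.registrations.get(nexus) == key
        victims = [n for n, k in self.registrations.items()
                   if n != nexus and k == key]
        for victim in victims:
            del self.registrations[victim]
            if self.holder == victim:
                self.holder, self.resv_type = nexus, resv_type


def pr_state_preempt(lun, nexus, key, resv_type):
    """REGISTER followed by PREEMPT, skipped when no key is tracked."""
    if key:
        lun.register(nexus, key)
        lun.preempt(nexus, key, resv_type)


lun = FakeLun()
lun.register("source-host", 0xABC)   # guest registers a key on the source
lun.reserve("source-host", 5)        # guest acquires a reservation

pr_state_preempt(lun, "dest-host", 0xABC, 5)    # destination post-load
after_migration = (lun.holder, sorted(lun.registrations))

pr_state_preempt(lun, "source-host", 0xABC, 5)  # source rollback on failure
after_rollback = (lun.holder, sorted(lun.registrations))

print(after_migration, after_rollback)
```

Because both hosts use the same key, the same REGISTER-then-PREEMPT sequence moves the reservation in either direction, which is why the rollback path can reuse the function that the destination runs after loading state.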

291
hw/scsi/scsi-generic.c

@ -265,6 +265,248 @@ static int scsi_generic_emulate_block_limits(SCSIGenericReq *r, SCSIDevice *s)
return r->buflen;
}
/*
* Patch persistent reservation capabilities that are not emulated.
*/
static void scsi_handle_persistent_reserve_in_reply(SCSIGenericReq *r,
SCSIDevice *s)
{
uint8_t service_action = r->req.cmd.buf[1] & 0x1f;
if (!s->migrate_pr) {
return; /* when migration is disabled there is no need for patching */
}
if (service_action == PRI_REPORT_CAPABILITIES) {
assert(r->buflen >= 3);
/*
* Clear specify initiator ports capable (SIP_C) and all target ports
* capable (ATC_C).
*
* SPEC_I_PT is not supported because the guest sees an emulated SCSI
* bus and does not have the underlying transport IDs needed to use
* SPEC_I_PT.
*
* ALL_TG_PT is not supported because we only track the state of this
* emulated I_T nexus, not the underlying device's target ports.
*/
r->buf[2] &= ~0xc;
}
}
static int scsi_generic_read_reservation(SCSIDevice *s, uint64_t *key,
uint8_t *resv_type, Error **errp)
{
uint8_t cmd[10] = {};
uint8_t buf[24] = {};
uint32_t additional_length;
int ret;
*key = 0;
*resv_type = 0;
cmd[0] = PERSISTENT_RESERVE_IN;
cmd[1] = PRI_READ_RESERVATION;
cmd[8] = sizeof(buf);
ret = scsi_SG_IO(s->conf.blk, SG_DXFER_FROM_DEV, cmd, sizeof(cmd),
buf, sizeof(buf), s->io_timeout, errp);
if (ret < 0) {
return ret;
}
memcpy(&additional_length, &buf[4], sizeof(additional_length));
be32_to_cpus(&additional_length);
if (additional_length >= 0x10) {
memcpy(key, &buf[8], sizeof(*key));
be64_to_cpus(key);
*resv_type = buf[21] & 0xf;
}
return 0;
}
/*
* Snoop changes to registered keys and reservations so that this information
* can be transferred during live migration.
*/
static void scsi_handle_persistent_reserve_out_reply(
SCSIGenericReq *r,
SCSIDevice *s)
{
SCSIPRState *pr_state = &s->pr_state;
uint8_t service_action = r->req.cmd.buf[1] & 0x1f;
uint8_t resv_type = r->req.cmd.buf[2] & 0xf;
uint64_t old_key;
uint64_t new_key;
assert(r->buflen >= 16);
memcpy(&old_key, &r->buf[0], sizeof(old_key));
memcpy(&new_key, &r->buf[8], sizeof(new_key));
be64_to_cpus(&old_key);
be64_to_cpus(&new_key);
trace_scsi_generic_persistent_reserve_out_reply(service_action, resv_type,
old_key, new_key);
switch (service_action) {
case PRO_REGISTER: /* fallthrough */
case PRO_REGISTER_AND_IGNORE_EXISTING_KEY:
if (service_action == PRO_REGISTER && old_key == 0 && new_key == 0) {
/* Do nothing */
} else {
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
pr_state->key = new_key;
if (new_key == 0) {
pr_state->resv_type = 0; /* release reservation */
}
}
}
break;
case PRO_RESERVE:
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
pr_state->resv_type = resv_type;
}
break;
case PRO_RELEASE:
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
pr_state->resv_type = 0;
}
break;
case PRO_CLEAR:
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
pr_state->key = 0;
pr_state->resv_type = 0;
}
break;
case PRO_REPLACE_LOST_RESERVATION:
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
pr_state->key = new_key;
pr_state->resv_type = resv_type;
}
break;
case PRO_PREEMPT: /* fallthrough */
case PRO_PREEMPT_AND_ABORT: {
uint64_t dev_key;
uint8_t dev_resv_type;
Error *local_err = NULL;
/* Not enough information to know actual state, ask the device */
if (!scsi_generic_read_reservation(s, &dev_key, &dev_resv_type,
&local_err)) {
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
if (pr_state->key == dev_key) {
pr_state->resv_type = dev_resv_type;
} else {
pr_state->resv_type = 0;
}
}
}
if (local_err) {
warn_report_err(local_err);
}
break;
}
/*
* PRO_REGISTER_AND_MOVE cannot be implemented since it involves the
* physical SCSI bus target ports.
*/
default:
break; /* do nothing */
}
}
static bool scsi_generic_pr_register(SCSIDevice *s, uint64_t key, Error **errp)
{
uint8_t cmd[10] = {};
uint8_t buf[24] = {};
uint64_t key_be = cpu_to_be64(key);
int ret;
cmd[0] = PERSISTENT_RESERVE_OUT;
cmd[1] = PRO_REGISTER;
cmd[8] = sizeof(buf);
memcpy(&buf[8], &key_be, sizeof(key_be));
ret = scsi_SG_IO(s->conf.blk, SG_DXFER_TO_DEV, cmd, sizeof(cmd),
buf, sizeof(buf), s->io_timeout, errp);
if (ret < 0) {
error_prepend(errp, "PERSISTENT RESERVE OUT with REGISTER");
return false;
}
return true;
}
static bool scsi_generic_pr_preempt(SCSIDevice *s, uint64_t key,
uint8_t resv_type, Error **errp)
{
uint8_t cmd[10] = {};
uint8_t buf[24] = {};
uint64_t key_be = cpu_to_be64(key);
int ret;
cmd[0] = PERSISTENT_RESERVE_OUT;
cmd[1] = PRO_PREEMPT;
cmd[2] = resv_type & 0xf;
cmd[8] = sizeof(buf);
memcpy(&buf[0], &key_be, sizeof(key_be));
memcpy(&buf[8], &key_be, sizeof(key_be));
ret = scsi_SG_IO(s->conf.blk, SG_DXFER_TO_DEV, cmd, sizeof(cmd),
buf, sizeof(buf), s->io_timeout, errp);
if (ret < 0) {
error_prepend(errp, "PERSISTENT RESERVE OUT with PREEMPT");
return false;
}
return true;
}
/* Register keys and preempt reservations after live migration */
bool scsi_generic_pr_state_preempt(SCSIDevice *s, Error **errp)
{
SCSIPRState *pr_state = &s->pr_state;
uint64_t key;
uint8_t resv_type;
WITH_QEMU_LOCK_GUARD(&pr_state->mutex) {
key = pr_state->key;
resv_type = pr_state->resv_type;
}
trace_scsi_generic_pr_state_preempt(key, resv_type);
if (key) {
if (!scsi_generic_pr_register(s, key, errp)) {
return false;
}
/*
* Two cases:
*
* 1. There is no reservation (resv_type is 0) and the other I_T nexus
* will be unregistered. This is important so the source host does
* not leak registered keys across live migration.
*
* 2. There is a reservation (resv_type is not 0) and the other I_T
* nexus will be unregistered and its reservation is atomically
* taken over by us. This is the scenario where a reservation is
* migrated along with the guest.
*/
if (!scsi_generic_pr_preempt(s, key, resv_type, errp)) {
return false;
}
}
return true;
}
static void scsi_read_complete(void * opaque, int ret)
{
SCSIGenericReq *r = (SCSIGenericReq *)opaque;
@ -347,6 +589,9 @@ static void scsi_read_complete(void * opaque, int ret)
if (r->req.cmd.buf[0] == INQUIRY) {
len = scsi_handle_inquiry_reply(r, s, len);
}
if (r->req.cmd.buf[0] == PERSISTENT_RESERVE_IN) {
scsi_handle_persistent_reserve_in_reply(r, s);
}
req_complete:
scsi_req_data(&r->req, len);
@ -396,6 +641,9 @@ static void scsi_write_complete(void * opaque, int ret)
s->blocksize = (r->buf[9] << 16) | (r->buf[10] << 8) | r->buf[11];
trace_scsi_generic_write_complete_blocksize(s->blocksize);
}
if (r->req.cmd.buf[0] == PERSISTENT_RESERVE_OUT) {
scsi_handle_persistent_reserve_out_reply(r, s);
}
scsi_command_complete_noio(r, ret);
}
@@ -525,16 +773,17 @@ static int read_naa_id(const uint8_t *p, uint64_t *p_wwn)
     return -EINVAL;
 }

-int scsi_SG_IO_FROM_DEV(BlockBackend *blk, uint8_t *cmd, uint8_t cmd_size,
-                        uint8_t *buf, uint8_t buf_size, uint32_t timeout)
+int scsi_SG_IO(BlockBackend *blk, int direction, uint8_t *cmd,
+               uint8_t cmd_size, uint8_t *buf, uint8_t buf_size,
+               uint32_t timeout, Error **errp)
 {
     sg_io_hdr_t io_header;
-    uint8_t sensebuf[8];
+    uint8_t sensebuf[8] = {};
     int ret;

     memset(&io_header, 0, sizeof(io_header));
     io_header.interface_id = 'S';
-    io_header.dxfer_direction = SG_DXFER_FROM_DEV;
+    io_header.dxfer_direction = direction;
     io_header.dxfer_len = buf_size;
     io_header.dxferp = buf;
     io_header.cmdp = cmd;
@ -549,6 +798,29 @@ int scsi_SG_IO_FROM_DEV(BlockBackend *blk, uint8_t *cmd, uint8_t cmd_size,
io_header.driver_status || io_header.host_status) {
trace_scsi_generic_ioctl_sgio_done(cmd[0], ret, io_header.status,
io_header.host_status);
if (ret < 0) {
error_setg_errno(errp, -ret, "SG_IO ioctl failed");
} else {
g_autofree char *sensebuf_hex =
g_strdup_printf("%02x%02x%02x%02x%02x%02x%02x%02x",
sensebuf[0],
sensebuf[1],
sensebuf[2],
sensebuf[3],
sensebuf[4],
sensebuf[5],
sensebuf[6],
sensebuf[7]);
error_setg(errp, "SG_IO SCSI command failed with status=0x%x "
"driver_status=0x%x host_status=0x%x sensebuf=%s "
"sb_len_wr=%u",
io_header.status,
io_header.driver_status,
io_header.host_status,
sensebuf_hex,
io_header.sb_len_wr);
}
return -1;
}
return 0;
@@ -574,8 +846,8 @@ static void scsi_generic_set_vpd_bl_emulation(SCSIDevice *s)
     cmd[2] = 0x00;
     cmd[4] = sizeof(buf);

-    ret = scsi_SG_IO_FROM_DEV(s->conf.blk, cmd, sizeof(cmd),
-                              buf, sizeof(buf), s->io_timeout);
+    ret = scsi_SG_IO(s->conf.blk, SG_DXFER_FROM_DEV, cmd, sizeof(cmd),
+                     buf, sizeof(buf), s->io_timeout, NULL);
     if (ret < 0) {
         /*
          * Do not assume anything if we can't retrieve the
@@ -610,8 +882,8 @@ static void scsi_generic_read_device_identification(SCSIDevice *s)
     cmd[2] = 0x83;
     cmd[4] = sizeof(buf);

-    ret = scsi_SG_IO_FROM_DEV(s->conf.blk, cmd, sizeof(cmd),
-                              buf, sizeof(buf), s->io_timeout);
+    ret = scsi_SG_IO(s->conf.blk, SG_DXFER_FROM_DEV, cmd, sizeof(cmd),
+                     buf, sizeof(buf), s->io_timeout, NULL);
     if (ret < 0) {
         return;
     }
@@ -662,7 +934,8 @@ static int get_stream_blocksize(BlockBackend *blk)
     cmd[0] = MODE_SENSE;
     cmd[4] = sizeof(buf);

-    ret = scsi_SG_IO_FROM_DEV(blk, cmd, sizeof(cmd), buf, sizeof(buf), 6);
+    ret = scsi_SG_IO(blk, SG_DXFER_FROM_DEV, cmd, sizeof(cmd),
+                     buf, sizeof(buf), 6, NULL);
     if (ret < 0) {
         return -1;
     }
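
The snooping logic in scsi_handle_persistent_reserve_out_reply() is essentially a small state machine over (key, resv_type). The following Python model is a sketch that follows the switch statement in this diff; the names PRState and snoop_pr_out and the read_reservation callback (standing in for scsi_generic_read_reservation()) are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

# Service action codes as defined in include/scsi/constants.h by this commit.
PRO_REGISTER = 0x00
PRO_RESERVE = 0x01
PRO_RELEASE = 0x02
PRO_CLEAR = 0x03
PRO_PREEMPT = 0x04
PRO_PREEMPT_AND_ABORT = 0x05
PRO_REGISTER_AND_IGNORE_EXISTING_KEY = 0x06
PRO_REPLACE_LOST_RESERVATION = 0x08


@dataclass
class PRState:
    key: int = 0        # 0 if no registered key
    resv_type: int = 0  # 0 if no reservation


def snoop_pr_out(state, action, resv_type, old_key, new_key,
                 read_reservation=None):
    """Update `state` after a successful PERSISTENT RESERVE OUT command."""
    if action in (PRO_REGISTER, PRO_REGISTER_AND_IGNORE_EXISTING_KEY):
        if action == PRO_REGISTER and old_key == 0 and new_key == 0:
            return  # nothing changes
        state.key = new_key
        if new_key == 0:
            state.resv_type = 0  # unregistering releases the reservation
    elif action == PRO_RESERVE:
        state.resv_type = resv_type
    elif action == PRO_RELEASE:
        state.resv_type = 0
    elif action == PRO_CLEAR:
        state.key = 0
        state.resv_type = 0
    elif action == PRO_REPLACE_LOST_RESERVATION:
        state.key = new_key
        state.resv_type = resv_type
    elif action in (PRO_PREEMPT, PRO_PREEMPT_AND_ABORT):
        # The outcome cannot be derived from the command alone, so ask the
        # device. A failed query (None) leaves the tracked state untouched.
        result = read_reservation() if read_reservation else None
        if result is not None:
            dev_key, dev_type = result
            state.resv_type = dev_type if state.key == dev_key else 0
    # Other service actions (e.g. REGISTER AND MOVE) are not tracked.


state = PRState()
snoop_pr_out(state, PRO_REGISTER, 0, old_key=0, new_key=0x1234)
snoop_pr_out(state, PRO_RESERVE, 5, old_key=0x1234, new_key=0)
print(state)
```

Only the key and the reservation type need to be carried across migration, which is why the vmstate subsection in scsi-disk.c contains exactly these two fields.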

2
hw/scsi/trace-events

@ -390,3 +390,5 @@ scsi_generic_realize_blocksize(int blocksize) "block size %d"
scsi_generic_aio_sgio_command(uint32_t tag, uint8_t cmd, uint32_t timeout) "generic aio sgio: tag=0x%x cmd=0x%x timeout=%u"
scsi_generic_ioctl_sgio_command(uint8_t cmd, uint32_t timeout) "generic ioctl sgio: cmd=0x%x timeout=%u"
scsi_generic_ioctl_sgio_done(uint8_t cmd, int ret, uint8_t status, uint8_t host_status) "generic ioctl sgio: cmd=0x%x ret=%d status=0x%x host_status=0x%x"
scsi_generic_persistent_reserve_out_reply(uint8_t service_action, uint8_t resv_type, uint64_t old_key, uint64_t new_key) "persistent reserve out reply service_action=%u resv_type=%u old_key=0x%" PRIx64 " new_key=0x%" PRIx64
scsi_generic_pr_state_preempt(uint64_t key, uint8_t resv_type) "key=0x%" PRIx64 " resv_type=%u"

15
include/hw/scsi/scsi.h

@ -57,6 +57,13 @@ struct SCSIRequest {
QTAILQ_ENTRY(SCSIRequest) next;
};
/* Per-SCSIDevice Persistent Reservation state */
typedef struct {
QemuMutex mutex; /* protects all fields (e.g. from multiple IOThreads) */
uint64_t key; /* 0 if no registered key */
uint8_t resv_type; /* 0 if no reservation */
} SCSIPRState;
#define TYPE_SCSI_DEVICE "scsi-device"
OBJECT_DECLARE_TYPE(SCSIDevice, SCSIDeviceClass, SCSI_DEVICE)
@ -97,6 +104,9 @@ struct SCSIDevice
uint32_t io_timeout;
bool needs_vpd_bl_emulation;
bool hba_supports_iothread;
bool migrate_pr;
SCSIPRState pr_state;
};
extern const VMStateDescription vmstate_scsi_device;
@@ -236,13 +246,14 @@ void scsi_device_report_change(SCSIDevice *dev, SCSISense sense);
 void scsi_device_unit_attention_reported(SCSIDevice *dev);
 void scsi_generic_read_device_inquiry(SCSIDevice *dev);
 int scsi_device_get_sense(SCSIDevice *dev, uint8_t *buf, int len, bool fixed);
-int scsi_SG_IO_FROM_DEV(BlockBackend *blk, uint8_t *cmd, uint8_t cmd_size,
-                        uint8_t *buf, uint8_t buf_size, uint32_t timeout);
+int scsi_SG_IO(BlockBackend *blk, int direction, uint8_t *cmd, uint8_t cmd_size,
+               uint8_t *buf, uint8_t buf_size, uint32_t timeout, Error **errp);
 SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int target, int lun);
 SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int target, int lun);

 /* scsi-generic.c. */
 extern const SCSIReqOps scsi_generic_req_ops;
+bool scsi_generic_pr_state_preempt(SCSIDevice *s, Error **errp);

 /* scsi-disk.c */
 #define SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR 0
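
The scsi_generic_pr_state_preempt() function declared above issues two PERSISTENT RESERVE OUT commands through scsi_SG_IO(). A sketch of the CDB and parameter list bytes it builds, following scsi_generic_pr_register() and scsi_generic_pr_preempt() in this diff, might look like the following. The helper names build_register and build_preempt are hypothetical; the opcode value 0x5F for PERSISTENT RESERVE OUT is the standard SCSI opcode.

```python
import struct

PERSISTENT_RESERVE_OUT = 0x5F
PRO_REGISTER = 0x00
PRO_PREEMPT = 0x04
PARAM_LIST_LEN = 24  # matches `uint8_t buf[24]` in the C code


def _cdb(service_action, resv_type=0):
    cdb = bytearray(10)
    cdb[0] = PERSISTENT_RESERVE_OUT
    cdb[1] = service_action
    cdb[2] = resv_type & 0x0F
    cdb[8] = PARAM_LIST_LEN
    return bytes(cdb)


def build_register(key):
    """REGISTER: reservation key 0, service action key = key."""
    return _cdb(PRO_REGISTER), struct.pack(">QQ8x", 0, key)


def build_preempt(key, resv_type):
    """PREEMPT: both keys are the migrated key, so the nexus that still
    holds the same key on the other host is the one being preempted."""
    return _cdb(PRO_PREEMPT, resv_type), struct.pack(">QQ8x", key, key)


cdb, params = build_preempt(0xABC, 5)
print(cdb.hex(), params.hex())
```

Both keys are stored big-endian, corresponding to the cpu_to_be64() conversion in the C helpers.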

21
include/scsi/constants.h

@ -319,4 +319,25 @@
#define IDENT_DESCR_TGT_DESCR_SIZE 32
#define XCOPY_BLK2BLK_SEG_DESC_SIZE 28
/*
* PERSISTENT RESERVATION IN service action codes
*/
#define PRI_READ_KEYS 0x00
#define PRI_READ_RESERVATION 0x01
#define PRI_REPORT_CAPABILITIES 0x02
#define PRI_READ_FULL_STATUS 0x03
/*
* PERSISTENT RESERVATION OUT service action codes
*/
#define PRO_REGISTER 0x00
#define PRO_RESERVE 0x01
#define PRO_RELEASE 0x02
#define PRO_CLEAR 0x03
#define PRO_PREEMPT 0x04
#define PRO_PREEMPT_AND_ABORT 0x05
#define PRO_REGISTER_AND_IGNORE_EXISTING_KEY 0x06
#define PRO_REGISTER_AND_MOVE 0x07
#define PRO_REPLACE_LOST_RESERVATION 0x08
#endif
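
To show how these service action codes are used, here is a Python sketch that decodes the service action from a CDB and parses a READ RESERVATION response the same way scsi_generic_read_reservation() does in this commit. The enum class name, helper names and the sample buffer are hypothetical; 0x5E is the standard opcode for PERSISTENT RESERVE IN.

```python
import struct
from enum import IntEnum


class PRInAction(IntEnum):
    READ_KEYS = 0x00
    READ_RESERVATION = 0x01
    REPORT_CAPABILITIES = 0x02
    READ_FULL_STATUS = 0x03


def service_action(cdb: bytes) -> int:
    """The service action lives in the low five bits of CDB byte 1."""
    return cdb[1] & 0x1F


def parse_read_reservation(buf: bytes):
    """Return (key, resv_type); (0, 0) when no reservation is reported."""
    (additional_length,) = struct.unpack_from(">I", buf, 4)
    if additional_length < 0x10:
        return 0, 0
    (key,) = struct.unpack_from(">Q", buf, 8)
    return key, buf[21] & 0x0F


# Hypothetical 24-byte response describing a reservation held with key 0xABC.
sample = bytearray(24)
struct.pack_into(">I", sample, 4, 0x10)
struct.pack_into(">Q", sample, 8, 0xABC)
sample[21] = 0x05

cdb = bytes([0x5E, PRInAction.READ_RESERVATION] + [0] * 8)
print(PRInAction(service_action(cdb)).name, parse_read_reservation(bytes(sample)))
```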

257
scripts/qemugdb/coroutine.py

@ -9,10 +9,119 @@
# This work is licensed under the terms of the GNU GPL, version 2
# or later. See the COPYING file in the top-level directory.
import atexit
import gdb
import os
import pty
import re
import struct
import textwrap
from collections import OrderedDict
from copy import deepcopy
VOID_PTR = gdb.lookup_type('void').pointer()
# Registers in the same order they're present in ELF coredump file.
# See asm/ptrace.h
PT_REGS = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10', 'r9',
'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax', 'rip', 'cs',
'eflags', 'rsp', 'ss']
coredump = None
class Coredump:
_ptregs_suff = '.ptregs'
def __init__(self, coredump, executable):
atexit.register(self._cleanup)
self.coredump = coredump
self.executable = executable
self._ptregs_blob = coredump + self._ptregs_suff
self._dirty = False
with open(coredump, 'rb') as f:
while f.read(4) != b'CORE':
pass
gdb.write(f'core file {coredump}: found "CORE" at 0x{f.tell():x}\n')
# Looking for struct elf_prstatus and pr_reg field in it (an array
# of general purpose registers). See sys/procfs.h.
# lseek(f.fileno(), 4, SEEK_CUR): go to elf_prstatus
f.seek(4, 1)
# lseek(f.fileno(), 112, SEEK_CUR):
# offsetof(struct elf_prstatus, pr_reg)
f.seek(112, 1)
self._ptregs_offset = f.tell()
# If binary blob with the name /path/to/coredump + '.ptregs'
# exists, that means proper cleanup didn't happen during previous
# GDB session with the same coredump, and registers in the dump
# itself might've remained patched. Thus we restore original
# registers values from this blob
if os.path.exists(self._ptregs_blob):
with open(self._ptregs_blob, 'rb') as b:
orig_ptregs_bytes = b.read()
self._dirty = True
else:
orig_ptregs_bytes = f.read(len(PT_REGS) * 8)
values = struct.unpack(f"={len(PT_REGS)}q", orig_ptregs_bytes)
self._orig_ptregs = OrderedDict(zip(PT_REGS, values))
if not os.path.exists(self._ptregs_blob):
gdb.write(f'saving original pt_regs in {self._ptregs_blob}\n')
with open(self._ptregs_blob, 'wb') as b:
b.write(orig_ptregs_bytes)
gdb.write('\n')
def patch_regs(self, regs):
# Set dirty flag early on to make sure regs are restored upon cleanup
self._dirty = True
gdb.write(f'patching core file {self.coredump}\n')
patched_ptregs = deepcopy(self._orig_ptregs)
int_regs = {k: int(v) for k, v in regs.items()}
patched_ptregs.update(int_regs)
with open(self.coredump, 'r+b') as f:
gdb.write(f'assume pt_regs at 0x{self._ptregs_offset:x}\n')
f.seek(self._ptregs_offset, 0)
gdb.write('writing regs:\n')
for reg in self._orig_ptregs.keys():
if reg in int_regs:
gdb.write(f" {reg}: {int_regs[reg]:#16x}\n")
f.write(struct.pack(f"={len(PT_REGS)}q", *patched_ptregs.values()))
gdb.write('\n')
def restore_regs(self):
if not self._dirty:
return
gdb.write(f'\nrestoring original regs in core file {self.coredump}\n')
with open(self.coredump, 'r+b') as f:
gdb.write(f'assume pt_regs at 0x{self._ptregs_offset:x}\n')
f.seek(self._ptregs_offset, 0)
f.write(struct.pack(f"={len(PT_REGS)}q",
*self._orig_ptregs.values()))
self._dirty = False
gdb.write('\n')
def _cleanup(self):
if os.path.exists(self._ptregs_blob):
self.restore_regs()
gdb.write(f'\nremoving saved pt_regs file {self._ptregs_blob}\n')
os.unlink(self._ptregs_blob)
def pthread_self():
'''Fetch the base address of TLS.'''
return gdb.parse_and_eval("$fs_base")
@ -77,6 +186,55 @@ def symbol_lookup(addr):
return f"{func_str} at {path}:{line}"
def run_with_pty(cmd):
# Create a PTY pair
master_fd, slave_fd = pty.openpty()
pid = os.fork()
if pid == 0: # Child
os.close(master_fd)
# Attach stdin/stdout/stderr to the PTY slave side
os.dup2(slave_fd, 0)
os.dup2(slave_fd, 1)
os.dup2(slave_fd, 2)
os.close(slave_fd)
os.execvp("gdb", cmd) # Runs gdb and doesn't return
# Parent
os.close(slave_fd)
output = bytearray()
try:
while True:
data = os.read(master_fd, 65536)
if not data:
break
output.extend(data)
except OSError: # in case subprocess exits and we get EBADF on read()
pass
finally:
try:
os.close(master_fd)
except OSError: # in case we get EBADF on close()
pass
# Wait for child to finish (reap zombie)
os.waitpid(pid, 0)
return output.decode('utf-8')
def dump_backtrace_patched(regs):
cmd = ['gdb', '-batch',
'-ex', 'set debuginfod enabled off',
'-ex', 'set complaints 0',
'-ex', 'set style enabled on',
'-ex', 'python print("----split----")',
'-ex', 'bt', coredump.executable, coredump.coredump]
coredump.patch_regs(regs)
out = run_with_pty(cmd).split('----split----')[1]
gdb.write(out)
def dump_backtrace(regs):
'''
Backtrace dump with raw registers, mimic GDB command 'bt'.
@ -120,7 +278,7 @@ def dump_backtrace_live(regs):
selected_frame.select()
def bt_jmpbuf(jmpbuf):
def bt_jmpbuf(jmpbuf, detailed=False):
'''Backtrace a jmpbuf'''
regs = get_jmpbuf_regs(jmpbuf)
try:
@@ -128,8 +286,12 @@ def bt_jmpbuf(jmpbuf):
         # but only works with live sessions.
         dump_backtrace_live(regs)
     except:
-        # If above doesn't work, fallback to poor man's unwind
-        dump_backtrace(regs)
+        if detailed:
+            # Obtain detailed trace by patching regs in copied coredump
+            dump_backtrace_patched(regs)
+        else:
+            # If above doesn't work, fallback to poor man's unwind
+            dump_backtrace(regs)

 def co_cast(co):
     return co.cast(gdb.lookup_type('CoroutineUContext').pointer())
@ -138,28 +300,90 @@ def coroutine_to_jmpbuf(co):
coroutine_pointer = co_cast(co)
return coroutine_pointer['env']['__jmpbuf']
def init_coredump():
global coredump
files = gdb.execute('info files', False, True).split('\n')
if not 'core dump' in files[1]:
return False
core_path = re.search("`(.*)'", files[2]).group(1)
exec_path = re.match('^Symbols from "(.*)".$', files[0]).group(1)
if coredump is None:
coredump = Coredump(core_path, exec_path)
return True
 class CoroutineCommand(gdb.Command):
-    '''Display coroutine backtrace'''
+    __doc__ = textwrap.dedent("""\
+        Display coroutine backtrace
+        Usage: qemu coroutine COROPTR [--detailed]
+        Show backtrace for a coroutine specified by COROPTR
+        --detailed obtain detailed trace by copying coredump, patching
+        regs in it, and running gdb subprocess to get
+        backtrace from the patched coredump
+    """)
def __init__(self):
gdb.Command.__init__(self, 'qemu coroutine', gdb.COMMAND_DATA,
gdb.COMPLETE_NONE)
def _usage(self):
gdb.write('usage: qemu coroutine <coroutine-pointer> [--detailed]\n')
return
     def invoke(self, arg, from_tty):
         argv = gdb.string_to_argv(arg)
-        if len(argv) != 1:
-            gdb.write('usage: qemu coroutine <coroutine-pointer>\n')
+        argc = len(argv)
+        if argc == 0 or argc > 2 or (argc == 2 and argv[1] != '--detailed'):
+            return self._usage()
+        detailed = True if argc == 2 else False
+        is_coredump = init_coredump()
+        if detailed and not is_coredump:
+            gdb.write('--detailed is only valid when debugging core dumps\n')
             return
-        bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])))
+        try:
+            bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])),
+                      detailed=detailed)
+        finally:
+            coredump.restore_regs()
 class CoroutineBt(gdb.Command):
-    '''Display backtrace including coroutine switches'''
+    __doc__ = textwrap.dedent("""\
+        Display backtrace including coroutine switches
+        Usage: qemu bt [--detailed]
+        --detailed obtain detailed trace by copying coredump, patching
+        regs in it, and running gdb subprocess to get
+        backtrace from the patched coredump
+    """)
def __init__(self):
gdb.Command.__init__(self, 'qemu bt', gdb.COMMAND_STACK,
gdb.COMPLETE_NONE)
def _usage(self):
gdb.write('usage: qemu bt [--detailed]\n')
return
def invoke(self, arg, from_tty):
argv = gdb.string_to_argv(arg)
argc = len(argv)
if argc > 1 or (argc == 1 and argv[0] != '--detailed'):
return self._usage()
detailed = True if argc == 1 else False
is_coredump = init_coredump()
if detailed and not is_coredump:
gdb.write('--detailed is only valid when debugging core dumps\n')
return
gdb.execute("bt")
@@ -173,13 +397,16 @@ class CoroutineBt(gdb.Command):
         if co_ptr == False:
             return

-        while True:
-            co = co_cast(co_ptr)
-            co_ptr = co["base"]["caller"]
-            if co_ptr == 0:
-                break
-            gdb.write("Coroutine at " + str(co_ptr) + ":\n")
-            bt_jmpbuf(coroutine_to_jmpbuf(co_ptr))
+        try:
+            while True:
+                co = co_cast(co_ptr)
+                co_ptr = co["base"]["caller"]
+                if co_ptr == 0:
+                    break
+                gdb.write("\nCoroutine at " + str(co_ptr) + ":\n")
+                bt_jmpbuf(coroutine_to_jmpbuf(co_ptr), detailed=detailed)
+        finally:
+            coredump.restore_regs()

 class CoroutineSPFunction(gdb.Function):
     def __init__(self):
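
The --detailed mode relies on overwriting the saved general purpose registers inside the core file at a known offset. The following standalone Python sketch shows that technique on a throwaway temporary file rather than a real core dump. The helper names read_regs and patch_regs and the demo file layout are hypothetical. The sketch opens the file in 'r+b' mode so that the write lands at the sought offset; on POSIX systems a file opened in append mode sends every write to the end of the file regardless of seek().

```python
import os
import struct
import tempfile

# Same register order as PT_REGS in scripts/qemugdb/coroutine.py.
PT_REGS = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10', 'r9',
           'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax', 'rip', 'cs',
           'eflags', 'rsp', 'ss']
REGS_FMT = f"={len(PT_REGS)}q"
REGS_SIZE = struct.calcsize(REGS_FMT)


def read_regs(path, offset):
    """Read the register block stored at `offset` as a name -> value dict."""
    with open(path, 'rb') as f:
        f.seek(offset)
        values = struct.unpack(REGS_FMT, f.read(REGS_SIZE))
    return dict(zip(PT_REGS, values))


def patch_regs(path, offset, updates):
    """Overwrite selected registers in place without changing the file size."""
    regs = read_regs(path, offset)
    regs.update({name: int(value) for name, value in updates.items()})
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(struct.pack(REGS_FMT, *(regs[name] for name in PT_REGS)))


# Demo on a fake "core file": 16 header bytes, a zeroed register block, a tail.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'HEAD' * 4 + bytes(REGS_SIZE) + b'TAIL')

size_before = os.path.getsize(demo_path)
patch_regs(demo_path, 16, {'rip': 0x401000, 'rsp': 0x7ffd0000})
patched = read_regs(demo_path, 16)
size_after = os.path.getsize(demo_path)
os.unlink(demo_path)

print(hex(patched['rip']), hex(patched['rsp']), size_before == size_after)
```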

2
scripts/qemugdb/mtree.py

@@ -25,7 +25,7 @@ def int128(p):
     if p.type.code == gdb.TYPE_CODE_STRUCT:
         return int(p['lo']) + (int(p['hi']) << 64)
     else:
-        return int(("%s" % p), 16)
+        return int(("%s" % p), 0)

 class MtreeCommand(gdb.Command):
     '''Display the memory tree hierarchy'''
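
The one-character change above switches int() from a fixed base of 16 to base 0, which lets Python infer the base from the string's prefix. The sketch below illustrates the difference in plain Python. Which textual form GDB actually produces for a 128-bit value is an assumption here, and the helper name parse_gdb_int is hypothetical.

```python
def parse_gdb_int(text: str) -> int:
    """Parse an integer whose base is indicated by its prefix (0x, 0o, 0b)."""
    return int(text, 0)


hex_form = parse_gdb_int("0x1000")   # prefixed hexadecimal
dec_form = parse_gdb_int("4096")     # plain decimal
misread = int("4096", 16)            # decimal text misread as hexadecimal
wide = parse_gdb_int("340282366920938463463374607431768211455")

print(hex_form, dec_form, misread, wide == 2**128 - 1)
```

With a fixed base of 16, a decimal string is silently interpreted as a much larger hexadecimal number, whereas base 0 handles both spellings consistently.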

54
scripts/qemugdb/timers.py

@ -21,14 +21,53 @@ class TimersCommand(gdb.Command):
gdb.Command.__init__(self, 'qemu timers', gdb.COMMAND_DATA,
gdb.COMPLETE_NONE)
def _format_expire_time(self, expire_time, scale):
"Return human-readable expiry time (ns) with scale info."
secs = expire_time / 1e9
# Select unit and compute value
if secs < 1:
val, unit = secs * 1000, "ms"
elif secs < 60:
val, unit = secs, "s"
elif secs < 3600:
val, unit = secs / 60, "min"
elif secs < 86400:
val, unit = secs / 3600, "hrs"
else:
val, unit = secs / 86400, "days"
scale_map = {1: "ns", 1000: "us", 1000000: "ms",
1000000000: "s"}
scale_str = scale_map.get(scale, f"scale={scale}")
return f"{val:.2f} {unit} [{scale_str}]"
def _format_attribute(self, attr):
"Given QEMUTimer attributes value, return a human-readable string"
# From include/qemu/timer.h
if attr == 0:
value = 'NONE'
elif attr == 1 << 0:
value = 'ATTR_EXTERNAL'
elif attr == int(0xffffffff):
value = 'ATTR_ALL'
else:
value = 'UNKNOWN'
return f'{attr} <{value}>'
     def dump_timers(self, timer):
         "Follow a timer and recursively dump each one in the list."
         # timer should be of type QemuTimer
-        gdb.write(" timer %s/%s (cb:%s,opq:%s)\n" % (
-            timer['expire_time'],
-            timer['scale'],
-            timer['cb'],
-            timer['opaque']))
+        scale = int(timer['scale'])
+        expire_time = int(timer['expire_time'])
+        attributes = int(timer['attributes'])
+        time_str = self._format_expire_time(expire_time, scale)
+        attr_str = self._format_attribute(attributes)
+        gdb.write(f" timer at {time_str} (attr:{attr_str}, "
+                  f"cb:{timer['cb']}, opq:{timer['opaque']})\n")
         if int(timer['next']) > 0:
             self.dump_timers(timer['next'])
@@ -36,10 +75,9 @@ class TimersCommand(gdb.Command):
     def process_timerlist(self, tlist, ttype):
         gdb.write("Processing %s timers\n" % (ttype))
-        gdb.write(" clock %s is enabled:%s, last:%s\n" % (
+        gdb.write(" clock %s is enabled:%s\n" % (
             tlist['clock']['type'],
-            tlist['clock']['enabled'],
-            tlist['clock']['last']))
+            tlist['clock']['enabled']))
         if int(tlist['active_timers']) > 0:
             self.dump_timers(tlist['active_timers'])
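
The new expiry-time formatting can be tried outside GDB. The function below is a standalone copy of the logic in _format_expire_time() from this diff, written as a free function so that it runs without the gdb module; the sample values are made up for illustration.

```python
SCALE_NAMES = {1: "ns", 1000: "us", 1000000: "ms", 1000000000: "s"}


def format_expire_time(expire_time, scale):
    """Return a human-readable expiry time (input in ns) with scale info."""
    secs = expire_time / 1e9

    # Pick the largest unit that keeps the value readable.
    if secs < 1:
        val, unit = secs * 1000, "ms"
    elif secs < 60:
        val, unit = secs, "s"
    elif secs < 3600:
        val, unit = secs / 60, "min"
    elif secs < 86400:
        val, unit = secs / 3600, "hrs"
    else:
        val, unit = secs / 86400, "days"

    scale_str = SCALE_NAMES.get(scale, f"scale={scale}")
    return f"{val:.2f} {unit} [{scale_str}]"


for expire_time, scale in [(500_000_000, 1),
                           (90_000_000_000, 1000),
                           (7_200_000_000_000, 1_000_000),
                           (172_800_000_000_000, 7)]:
    print(format_expire_time(expire_time, scale))
```

An unrecognised scale falls back to printing the raw value, so unexpected timer configurations remain visible in the output.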
