.\"	$NetBSD: libnvmm.3,v 1.27 2020/09/05 07:22:25 maxv Exp $
.\"
.\" Copyright (c) 2018-2020 Maxime Villard, m00nbsd.net
.\" All rights reserved.
.\"
.\" This code is part of the NVMM hypervisor.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd May 25, 2021
.Dt LIBNVMM 3
.Os
.Sh NAME
.Nm libnvmm
.Nd NetBSD Virtualization API
.Sh LIBRARY
.Lb libnvmm
.Sh SYNOPSIS
.In nvmm.h
.Ft int
.Fn nvmm_init "void"
.Ft int
.Fn nvmm_capability "struct nvmm_capability *cap"
.Ft int
.Fn nvmm_machine_create "struct nvmm_machine *mach"
.Ft int
.Fn nvmm_machine_destroy "struct nvmm_machine *mach"
.Ft int
.Fn nvmm_machine_configure "struct nvmm_machine *mach" "uint64_t op" \
    "void *conf"
.Ft int
.Fn nvmm_vcpu_create "struct nvmm_machine *mach" "nvmm_cpuid_t cpuid" \
    "struct nvmm_vcpu *vcpu"
.Ft int
.Fn nvmm_vcpu_destroy "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu"
.Ft int
.Fn nvmm_vcpu_configure "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu" \
    "uint64_t op" "void *conf"
.Ft int
.Fn nvmm_vcpu_getstate "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu" \
    "uint64_t flags"
.Ft int
.Fn nvmm_vcpu_setstate "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu" \
    "uint64_t flags"
.Ft int
.Fn nvmm_vcpu_inject "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu"
.Ft int
.Fn nvmm_vcpu_run "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu"
.Ft int
.Fn nvmm_hva_map "struct nvmm_machine *mach" "uintptr_t hva" "size_t size"
.Ft int
.Fn nvmm_hva_unmap "struct nvmm_machine *mach" "uintptr_t hva" "size_t size"
.Ft int
.Fn nvmm_gpa_map "struct nvmm_machine *mach" "uintptr_t hva" "gpaddr_t gpa" \
    "size_t size" "int prot"
.Ft int
.Fn nvmm_gpa_unmap "struct nvmm_machine *mach" "uintptr_t hva" "gpaddr_t gpa" \
    "size_t size"
.Ft int
.Fn nvmm_gva_to_gpa "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu" \
    "gvaddr_t gva" "gpaddr_t *gpa" "nvmm_prot_t *prot"
.Ft int
.Fn nvmm_gpa_to_hva "struct nvmm_machine *mach" "gpaddr_t gpa" \
    "uintptr_t *hva" "nvmm_prot_t *prot"
.Ft int
.Fn nvmm_assist_io "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu"
.Ft int
.Fn nvmm_assist_mem "struct nvmm_machine *mach" "struct nvmm_vcpu *vcpu"
.Sh DESCRIPTION
.Nm
provides a library for emulator software to handle hardware-accelerated virtual
machines in
.Nx .
A virtual machine is described by an opaque structure,
.Cd nvmm_machine .
Emulator software should not attempt to modify this structure directly, and
should use the API provided by
.Nm
to manage virtual machines.
A virtual CPU is described by a public structure,
.Cd nvmm_vcpu .
.Pp
.Fn nvmm_init
initializes NVMM.
See
.Sx NVMM Initialization
below for details.
.Pp
.Fn nvmm_capability
gets the capabilities of NVMM.
See
.Sx NVMM Capability
below for details.
.Pp
.Fn nvmm_machine_create
creates a virtual machine in the kernel.
The
.Fa mach
structure is initialized, and describes the machine.
.Pp
.Fn nvmm_machine_destroy
destroys the virtual machine described in
.Fa mach .
.Pp
.Fn nvmm_machine_configure
configures, on the machine
.Fa mach ,
the parameter indicated in
.Fa op .
.Fa conf
describes the value of the parameter.
.Pp
.Fn nvmm_vcpu_create
creates a virtual CPU in the machine
.Fa mach ,
giving it the CPU id
.Fa cpuid ,
and initializes
.Fa vcpu .
.Pp
.Fn nvmm_vcpu_destroy
destroys the virtual CPU identified by
.Fa vcpu
in the machine
.Fa mach .
.Pp
.Fn nvmm_vcpu_configure
configures, on the VCPU
.Fa vcpu
of machine
.Fa mach ,
the parameter indicated in
.Fa op .
.Fa conf
describes the value of the parameter.
.Pp
.Fn nvmm_vcpu_getstate
gets the state of the virtual CPU identified by
.Fa vcpu
in the machine
.Fa mach .
.Fa flags
is the bitmap of the components that are to be retrieved.
The components are located in
.Fa vcpu->state .
See
.Sx VCPU State Area
below for details.
.Pp
.Fn nvmm_vcpu_setstate
sets the state of the virtual CPU identified by
.Fa vcpu
in the machine
.Fa mach .
.Fa flags
is the bitmap of the components that are to be set.
The components are located in
.Fa vcpu->state .
See
.Sx VCPU State Area
below for details.
.Pp
.Fn nvmm_vcpu_inject
injects into the CPU identified by
.Fa vcpu
of the machine
.Fa mach
an event described by
.Fa vcpu->event .
See
.Sx Event Injection
below for details.
.Pp
.Fn nvmm_vcpu_run
runs the CPU identified by
.Fa vcpu
in the machine
.Fa mach ,
until a VM exit is triggered.
The
.Fa vcpu->exit
structure is filled to indicate the exit reason, and the associated parameters
if any.
.Pp
.Fn nvmm_hva_map
maps at address
.Fa hva
a buffer of size
.Fa size
in the calling process' virtual address space.
This buffer is allowed to be subsequently mapped in a virtual machine.
.Pp
.Fn nvmm_hva_unmap
unmaps the buffer of size
.Fa size
at address
.Fa hva
from the calling process' virtual address space.
.Pp
.Fn nvmm_gpa_map
maps into the guest physical memory beginning on address
.Fa gpa
the buffer of size
.Fa size
located at address
.Fa hva
of the calling process' virtual address space.
The
.Fa hva
parameter must point to a buffer that was previously mapped with
.Fn nvmm_hva_map .
.Pp
.Fn nvmm_gpa_unmap
removes the guest physical memory area beginning on address
.Fa gpa
and of size
.Fa size
from the machine
.Fa mach .
.Pp
.Fn nvmm_gva_to_gpa
translates, on the CPU
.Fa vcpu
from the machine
.Fa mach ,
the guest virtual address given in
.Fa gva
into a guest physical address returned in
.Fa gpa .
The associated page permissions are returned in
.Fa prot .
.Fa gva
must be page-aligned.
.Pp
.Fn nvmm_gpa_to_hva
translates, on the machine
.Fa mach ,
the guest physical address indicated in
.Fa gpa
into a host virtual address returned in
.Fa hva .
The associated page permissions are returned in
.Fa prot .
.Fa gpa
must be page-aligned.
.Pp
.Fn nvmm_assist_io
emulates the I/O operation described in
.Fa vcpu->exit
on CPU
.Fa vcpu
from machine
.Fa mach .
See
.Sx I/O Assist
below for details.
.Pp
.Fn nvmm_assist_mem
emulates the Mem operation described in
.Fa vcpu->exit
on CPU
.Fa vcpu
from machine
.Fa mach .
See
.Sx Mem Assist
below for details.
.Ss NVMM Initialization
NVMM initialization is performed by the
.Fn nvmm_init
function, which must be invoked by emulator software before any other NVMM
function.
.Pp
.Fn nvmm_init
opens the NVMM device, and expects to have the proper permissions to do so.
In a default configuration, this implies being part of the "nvmm" group.
If using a special configuration, emulator software should arrange to have the
proper permissions before invoking
.Fn nvmm_init ,
and can drop them after the call has completed.
.Pp
Note that
.Fn nvmm_init
may perform non-re-entrant operations, and should be called only once.
.Ss NVMM Capability
The
.Cd nvmm_capability
structure helps emulator software identify the capabilities offered by NVMM on
the host:
.Bd -literal
struct nvmm_capability {
	uint64_t version;
	uint64_t state_size;
	uint64_t max_machines;
	uint64_t max_vcpus;
	uint64_t max_ram;
	struct {
		...
	} arch;
};
.Ed
.Pp
For example, the
.Cd max_machines
field indicates the maximum number of virtual machines supported, while
.Cd max_vcpus
indicates the maximum number of VCPUs supported per virtual machine.
.Ss Machine Ownership
When a process creates a virtual machine via
.Fn nvmm_machine_create ,
it is considered the owner of this machine.
No process other than the owner can operate a virtual machine.
.Pp
When an owner exits, all the virtual machines associated with it are destroyed,
if they were not already destroyed by the owner itself via
.Fn nvmm_machine_destroy .
.Pp
Virtual machines are not inherited across
.Xr fork 2
operations.
.Ss Machine Configuration
Emulator software can configure several parameters of a virtual machine by using
.Fn nvmm_machine_configure .
Currently, no parameters are implemented.
.Ss VCPU Configuration
Emulator software can configure several parameters of a VCPU by using
.Fn nvmm_vcpu_configure ,
which can take the following operations:
.Bd -literal
#define NVMM_VCPU_CONF_CALLBACKS	0
	...
.Ed
.Pp
The higher fields depend on the architecture.
.Ss Guest-Host Mappings
Each virtual machine has an associated guest physical memory.
Emulator software is allowed to modify this guest physical memory by mapping
it into some parts of its virtual address space.
.Pp
Emulator software should take the following steps to achieve that:
.Pp
.Bl -bullet -offset indent -compact
.It
Call
.Fn nvmm_hva_map
to create in the host's virtual address space an area of memory that can
be shared with a guest.
Typically, the
.Fa hva
parameter will be a pointer to an area that was previously mapped via
.Fn mmap .
.Fn nvmm_hva_map
will replace the content of the area, and will make it read-write (but not
executable).
.It
Make available in the guest an area of guest physical memory, by calling
.Fn nvmm_gpa_map
and passing in the
.Fa hva
parameter the value that was previously given to
.Fn nvmm_hva_map .
.Fn nvmm_gpa_map
does not replace the content of any memory, it only creates a direct link
from
.Fa gpa
into
.Fa hva .
.Fn nvmm_gpa_unmap
removes this link without modifying
.Fa hva .
.El
.Pp
The guest will then be able to use the guest physical address passed in the
.Fa gpa
parameter of
.Fn nvmm_gpa_map .
Each change the guest makes in
.Fa gpa
will be reflected in the host's
.Fa hva ,
and vice versa.
.Pp
It is illegal for emulator software to use
.Fn munmap
on an area that was mapped via
.Fn nvmm_hva_map .
.Ss VCPU State Area
A VCPU state area is a structure that entirely defines the content of the
registers of a VCPU.
Only one such structure exists, for x86:
.Bd -literal
struct nvmm_x64_state {
	struct nvmm_x64_state_seg segs[NVMM_X64_NSEG];
	uint64_t gprs[NVMM_X64_NGPR];
	uint64_t crs[NVMM_X64_NCR];
	uint64_t drs[NVMM_X64_NDR];
	uint64_t msrs[NVMM_X64_NMSR];
	struct nvmm_x64_state_intr intr;
	union savefpu fpu;
};
#define nvmm_vcpu_state nvmm_x64_state
.Ed
.Pp
Refer to functional examples to see precisely how to use this structure.
.Pp
A VCPU state area is divided into sub-states.
A
.Fa flags
parameter is used to set and get the VCPU state; it acts as a bitmap which
indicates which sub-states to set or get.
.Pp
During VM exits, a partial VCPU state area is provided in
.Va exitstate ,
see
.Sx Exit Reasons
below for details.
.Ss VCPU Programming Model
A VCPU is described by a public structure,
.Cd nvmm_vcpu :
.Bd -literal
struct nvmm_vcpu {
	nvmm_cpuid_t cpuid;
	struct nvmm_vcpu_state *state;
	struct nvmm_vcpu_event *event;
	struct nvmm_vcpu_exit *exit;
};
.Ed
.Pp
This structure is used both publicly by emulator software and internally by
.Nm .
Emulator software should not modify the pointers of this structure, because
they are initialized to special values by
.Nm .
.Pp
A call to
.Fn nvmm_vcpu_getstate
will fetch the desired parts of the VCPU state and put them in
.Fa vcpu->state .
A call to
.Fn nvmm_vcpu_setstate
will install in the VCPU the desired parts of
.Fa vcpu->state .
A call to
.Fn nvmm_vcpu_inject
will inject in the VCPU the event in
.Fa vcpu->event .
A call to
.Fn nvmm_vcpu_run
will fill
.Fa vcpu->exit
with the VCPU exit information.
.Pp
If emulator software uses several threads, a VCPU should be associated with
only one thread, and only this thread should perform VCPU modifications.
Emulator software should not modify the state of a VCPU with several
different threads.
.Ss Exit Reasons
The
.Cd nvmm_vcpu_exit
structure is used to handle VM exits:
.Bd -literal
/* Generic. */
#define NVMM_VCPU_EXIT_NONE		0x0000000000000000ULL
#define NVMM_VCPU_EXIT_INVALID		0xFFFFFFFFFFFFFFFFULL
/* x86: operations. */
#define NVMM_VCPU_EXIT_MEMORY		0x0000000000000001ULL
#define NVMM_VCPU_EXIT_IO		0x0000000000000002ULL
/* x86: changes in VCPU state. */
#define NVMM_VCPU_EXIT_SHUTDOWN		0x0000000000001000ULL
#define NVMM_VCPU_EXIT_INT_READY	0x0000000000001001ULL
#define NVMM_VCPU_EXIT_NMI_READY	0x0000000000001002ULL
#define NVMM_VCPU_EXIT_HALTED		0x0000000000001003ULL
#define NVMM_VCPU_EXIT_TPR_CHANGED	0x0000000000001004ULL
/* x86: instructions. */
#define NVMM_VCPU_EXIT_RDMSR		0x0000000000002000ULL
#define NVMM_VCPU_EXIT_WRMSR		0x0000000000002001ULL
#define NVMM_VCPU_EXIT_MONITOR		0x0000000000002002ULL
#define NVMM_VCPU_EXIT_MWAIT		0x0000000000002003ULL
#define NVMM_VCPU_EXIT_CPUID		0x0000000000002004ULL

struct nvmm_vcpu_exit {
	uint64_t reason;
	union {
		...
	} u;
	struct {
		...
	} exitstate;
};
.Ed
.Pp
The
.Va reason
field indicates the reason for the VM exit.
Additional parameters describing the exit can be present in
.Va u .
.Va exitstate
contains a partial, implementation-specific VCPU state, usable as a fast-path
to retrieve certain state values.
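.Pp
As a minimal sketch of how emulator software might dispatch on the
.Va reason
field, the following standalone fragment sorts exit reasons into a few
coarse actions.
The
.Vt exit_action
enumeration and the
.Fn classify_exit
helper are hypothetical and not part of
.Nm ;
the policy they encode (resume on
.Cd NVMM_VCPU_EXIT_NONE ,
stop on anything not handled) is an assumption chosen for illustration.
The three constants are repeated from the listing above only so that the
fragment builds without
.In nvmm.h .
.Bd -literal
#include <stdint.h>

/* Repeated from nvmm.h so that the sketch builds on its own. */
#ifndef NVMM_VCPU_EXIT_NONE
#define NVMM_VCPU_EXIT_NONE	0x0000000000000000ULL
#define NVMM_VCPU_EXIT_MEMORY	0x0000000000000001ULL
#define NVMM_VCPU_EXIT_IO	0x0000000000000002ULL
#endif

/* Hypothetical: what the run loop should do after a VM exit. */
enum exit_action {
	EXIT_RESUME,		/* nothing to emulate, run again */
	EXIT_ASSIST_IO,		/* hand over to nvmm_assist_io() */
	EXIT_ASSIST_MEM,	/* hand over to nvmm_assist_mem() */
	EXIT_STOP		/* not handled by this sketch */
};

/* Hypothetical helper: map an exit reason to an action. */
static enum exit_action
classify_exit(uint64_t reason)
{
	switch (reason) {
	case NVMM_VCPU_EXIT_NONE:
		return EXIT_RESUME;
	case NVMM_VCPU_EXIT_IO:
		return EXIT_ASSIST_IO;
	case NVMM_VCPU_EXIT_MEMORY:
		return EXIT_ASSIST_MEM;
	default:
		return EXIT_STOP;
	}
}
.Ed
.Pp
In a run loop, the argument would be
.Fa vcpu->exit->reason
as filled by
.Fn nvmm_vcpu_run .
A complete emulator would handle more reasons than the three shown here.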
.Pp
It is possible that a VM exit was caused by a reason internal to the host
kernel, one that emulator software need not be concerned with.
In this case, the exit reason is set to
.Cd NVMM_VCPU_EXIT_NONE .
This gives a chance for emulator software to halt the VM in its tracks.
.Pp
Refer to functional examples to see precisely how to handle VM exits.
.Ss Event Injection
It is possible to inject an event into a VCPU.
An event can be a hardware interrupt, a software interrupt, or a software
exception, defined by:
.Bd -literal
#define NVMM_VCPU_EVENT_EXCP	0
#define NVMM_VCPU_EVENT_INTR	1

struct nvmm_vcpu_event {
	u_int type;
	uint8_t vector;
	union {
		struct {
			uint64_t error;
		} excp;
	} u;
};
.Ed
.Pp
This describes an event of type
.Va type ,
to be sent to vector number
.Va vector ,
with a possible additional
.Va error
code that is implementation-specific.
.Pp
It is possible that the VCPU is in a state where it cannot receive this
event, if:
.Pp
.Bl -bullet -offset indent -compact
.It
the event is a hardware interrupt, and the VCPU runs with interrupts disabled,
or
.It
the event is a non-maskable interrupt (NMI), and the VCPU is already in an
in-NMI context.
.El
.Pp
Emulator software can manage interrupt and NMI window-exiting via the
.Va intr
component of the VCPU state.
When such window-exiting is enabled, NVMM will cause a VM exit with reason
.Cd NVMM_VCPU_EXIT_INT_READY
or
.Cd NVMM_VCPU_EXIT_NMI_READY
to indicate that the guest is now able to handle the corresponding class
of interrupts.
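.Pp
The event descriptor above could be filled as in the following sketch.
The
.Fn event_set_intr
and
.Fn event_set_excp
helpers are hypothetical and not part of
.Nm .
The structure is a standalone copy of the layout shown above, with
.Vt u_int
spelled as
.Vt unsigned int ,
and is skipped when the real definitions are already visible.
The sketch assumes that
.Cd NVMM_VCPU_EVENT_INTR
is the type used for interrupts and
.Cd NVMM_VCPU_EVENT_EXCP
the type used for exceptions.
.Bd -literal
#include <stdint.h>
#include <string.h>

/* Standalone copy of the definitions above; skipped with nvmm.h. */
#ifndef NVMM_VCPU_EVENT_EXCP
#define NVMM_VCPU_EVENT_EXCP	0
#define NVMM_VCPU_EVENT_INTR	1

struct nvmm_vcpu_event {
	unsigned int type;
	uint8_t vector;
	union {
		struct {
			uint64_t error;
		} excp;
	} u;
};
#endif

/* Hypothetical helper: describe an interrupt on the given vector. */
static void
event_set_intr(struct nvmm_vcpu_event *ev, uint8_t vector)
{
	memset(ev, 0, sizeof(*ev));
	ev->type = NVMM_VCPU_EVENT_INTR;
	ev->vector = vector;
}

/* Hypothetical helper: describe an exception with an error code. */
static void
event_set_excp(struct nvmm_vcpu_event *ev, uint8_t vector, uint64_t error)
{
	memset(ev, 0, sizeof(*ev));
	ev->type = NVMM_VCPU_EVENT_EXCP;
	ev->vector = vector;
	ev->u.excp.error = error;
}
.Ed
.Pp
In an emulator, the descriptor passed to these helpers would be
.Fa vcpu->event ,
and a call to
.Fn nvmm_vcpu_inject
would follow.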
.Ss Assist Callbacks
In order to assist emulation of certain operations,
.Nm
requires emulator software to register, via
.Fn nvmm_vcpu_configure ,
a set of callbacks described in the following structure:
.Bd -literal
struct nvmm_assist_callbacks {
	void (*io)(struct nvmm_io *);
	void (*mem)(struct nvmm_mem *);
};
.Ed
.Pp
These callbacks are used by
.Nm
each time
.Fn nvmm_assist_io
or
.Fn nvmm_assist_mem
is invoked.
Emulator software that does not intend to use either of these assists can put
.Dv NULL
in the callbacks.
.Ss I/O Assist
When a VM exit occurs with reason
.Cd NVMM_VCPU_EXIT_IO ,
it is necessary for emulator software to emulate the associated I/O operation.
.Nm
provides an easy way for emulator software to perform that.
.Pp
.Fn nvmm_assist_io
will call the registered
.Fa io
callback function and give it a
.Cd nvmm_io
structure as argument.
This structure describes an I/O transaction:
.Bd -literal
struct nvmm_io {
	struct nvmm_machine *mach;
	struct nvmm_vcpu *vcpu;
	uint16_t port;
	bool in;
	size_t size;
	uint8_t *data;
};
.Ed
.Pp
The callback can emulate the operation using this descriptor, distinguishing
two cases:
.Pp
.Bl -bullet -offset indent -compact
.It
The operation is an input.
In this case, the callback should fill
.Va data
with the desired value.
.It
The operation is an output.
In this case, the callback should read
.Va data
to retrieve the desired value.
.El
.Pp
In either case,
.Va port
will indicate the I/O port,
.Va in
will indicate if the operation is an input, and
.Va size
will indicate the size of the access.
.Ss Mem Assist
When a VM exit occurs with reason
.Cd NVMM_VCPU_EXIT_MEMORY ,
it is necessary for emulator software to emulate the associated memory
operation.
.Nm
provides an easy way for emulator software to perform that, similar to the I/O
Assist.
.Pp
.Fn nvmm_assist_mem
will call the registered
.Fa mem
callback function and give it a
.Cd nvmm_mem
structure as argument.
This structure describes a Mem transaction:
.Bd -literal
struct nvmm_mem {
	struct nvmm_machine *mach;
	struct nvmm_vcpu *vcpu;
	gpaddr_t gpa;
	bool write;
	size_t size;
	uint8_t *data;
};
.Ed
.Pp
The callback can emulate the operation using this descriptor, distinguishing
two cases:
.Pp
.Bl -bullet -offset indent -compact
.It
The operation is a read.
In this case, the callback should fill
.Va data
with the desired value.
.It
The operation is a write.
In this case, the callback should read
.Va data
to retrieve the desired value.
.El
.Pp
In either case,
.Va gpa
will indicate the guest physical address,
.Va write
will indicate if the access is a write, and
.Va size
will indicate the size of the access.
.Sh RETURN VALUES
Upon successful completion, each of these functions returns zero.
Otherwise, a value of \-1 is returned and the global
variable
.Va errno
is set to indicate the error.
.Sh FILES
.Bl -tag -width XXXX -compact
.It Lk https://www.netbsd.org/~maxv/nvmm/nvmm-demo.zip
Functional example (demonstrator).
Contains an emulator that uses the
.Nm
API, and a small kernel that exercises this emulator.
.It Pa src/sys/dev/virtual/nvmm/
Source code of the kernel
.Xr nvmm 4
driver.
.It Pa src/lib/libnvmm/
Source code of the
.Nm
library.
.El
.Sh ERRORS
These functions will fail if:
.Bl -tag -width [ENOBUFS]
.It Bq Er EEXIST
An attempt was made to create a machine or a VCPU that already exists.
.It Bq Er EFAULT
An attempt was made to emulate a memory-based operation in a guest, and the
guest page tables did not have the permissions necessary for the operation
to complete successfully.
.It Bq Er EINVAL
An inappropriate parameter was used.
.It Bq Er ENOBUFS
The maximum number of machines or VCPUs was reached.
.It Bq Er ENOENT
A query was made on a machine or a VCPU that does not exist.
.It Bq Er EPERM
An attempt was made to access a machine that does not belong to the process.
.El
.Sh SEE ALSO
.Xr nvmm 4 ,
.Xr nvmmctl 8
.Sh AUTHORS
NVMM was designed and implemented by
.An Maxime Villard .