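Enabling vGPU_unlock on Proxmox VE 7.1 for GPU virtualization
1: Understanding NVIDIA vGPU
The figure below shows how NVIDIA vGPU works. The vGPU driver is installed on the host, and the NVIDIA vGPU Manager controls the vGPUs: it creates multiple mdev devices, i.e. vGPUs, which are passed through to virtual machines, and inside the VMs the NVIDIA driver drives the vGPU. This is somewhat similar to Intel GVT-g. The most important component here is the NVIDIA vGPU Manager.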
After the NVIDIA vGPU driver is installed on the host, there are two services:
- nvidia-vgpud.service
- nvidia-vgpu-mgr.service
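Briefly, this is what the two services do when a vGPU starts:
1. With a vGPU-capable card, the normal flow is that at boot the nvidia-vgpud service queries all GPUs known to the kernel and checks their vGPU capability. If it finds a GPU that supports vGPU, nvidia-vgpud registers the corresponding mdev (mediated device) types, and the system creates the /sys/class/mdev_bus directory.
2. These devices are assigned to VMs. When a VM starts, it opens its mdev device. nvidia-vgpu-mgr then talks to the kernel through ioctl calls. When nvidia-vgpu-mgr asks whether the GPU supports vGPU, the driver answers yes, and nvidia-vgpu-mgr then tries to initialize the vGPU device.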
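To check on the host whether step 1 worked, you can list the entries under /sys/class/mdev_bus. The sketch below is a hypothetical helper: the name list_mdev_parents and its optional root-directory argument (which only exists so the function can be tried on a fake directory tree) are my own assumptions, not part of the NVIDIA driver.

```shell
# Hypothetical helper: print the PCI devices that have registered mdev types.
# $1: optional filesystem root (defaults to the real root), for testing.
list_mdev_parents() {
    local dir="${1:-}/sys/class/mdev_bus" dev
    if [ ! -d "$dir" ]; then
        echo "no $dir: no mdev types registered (is nvidia-vgpud running?)" >&2
        return 1
    fi
    # Each entry is named after a PCI address such as 0000:05:00.0
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue
        basename "$dev"
    done
}
```

On the host used in this article you would expect it to print the card's PCI address, 0000:05:00.0.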
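vgpu_unlock currently supports only time-sliced vGPU, which means the GPU's performance is shared dynamically between the GPU instances. For example, on a P4 with only one GPU instance running, that instance gets close to 100% of the performance; with two instances, each gets about half.
Under NVIDIA's vGPU limits, each GPU instance needs at least 1G of VRAM. A P4 with 8G can therefore run at most eight 1G instances at the same time.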
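The instance limit is plain integer division of the total VRAM by the per-instance VRAM. A small sketch (max_instances is a hypothetical helper name); the same formula reproduces the max_instance values listed for the 24G Tesla P40 profiles in section 6, e.g. 24576 / 2048 = 12 for GRID P40-2B.

```shell
# Maximum number of vGPU instances of one type on a card.
# $1: total VRAM in MiB, $2: VRAM per instance in MiB.
# Shell arithmetic is integer-only, so the result is rounded down.
max_instances() {
    echo $(( $1 / $2 ))
}

# Usage:
#   max_instances 8192 1024    # -> 8 (a P4 with 1G instances)
#   max_instances 24576 4096   # -> 6 (GRID P40-4C)
```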
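vgpu_unlock makes consumer cards support vGPU technology; it does not crack the licensing. If you need a license, you still have to buy it from NVIDIA!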
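2: How vgpu_unlock works
Recall the vGPU startup flow described above. With a consumer card, the nvidia-vgpud service checks the card type and, since it is a consumer card, naturally does not create any mdev devices. vgpu_unlock intercepts nvidia-vgpud's calls and tricks it into believing this is a vGPU card, so it goes ahead and produces the mdev device information.
When the mdev device is passed through to a VM and the VM starts, vgpu_unlock intercepts the nvidia-vgpu-mgr service in turn and tells it that the GPU supports vGPU, so it initializes the device.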
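3: GPUs supported by vGPU_unlock
Check your card against the list below. For a professional card, it is enough that its GPU core belongs to the same generation as the cards listed. Each entry reads [PCI device ID] GPU core [product name] -> the vGPU-capable card it is presented as.
RTX 30 series cards are NOT supported!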
[21c4] TU116 [GeForce GTX 1660 SUPER] -> Quadro RTX 6000
[21d1] TU116BM [GeForce GTX 1660 Ti Mobile] -> Quadro RTX 6000
[21c2] TU116 -> Quadro RTX 6000
[2182] TU116 [GeForce GTX 1660 Ti] -> Quadro RTX 6000
[2183] TU116 -> Quadro RTX 6000
[2184] TU116 [GeForce GTX 1660] -> Quadro RTX 6000
[2187] TU116 [GeForce GTX 1650 SUPER] -> Quadro RTX 6000
[2188] TU116 [GeForce GTX 1650] -> Quadro RTX 6000
[2191] TU116M [GeForce GTX 1660 Ti Mobile] -> Quadro RTX 6000
[2192] TU116M [GeForce GTX 1650 Ti Mobile] -> Quadro RTX 6000
[21ae] TU116GL -> Quadro RTX 6000
[21bf] TU116GL -> Quadro RTX 6000
[2189] TU116 [CMP 30HX] -> Quadro RTX 6000
[1fbf] TU117GL -> Quadro RTX 6000
[1fbb] TU117GLM [Quadro T500 Mobile] -> Quadro RTX 6000
[1fd9] TU117BM [GeForce GTX 1650 Mobile Refresh] -> Quadro RTX 6000
[1ff9] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000
[1fdd] TU117BM [GeForce GTX 1650 Mobile Refresh] -> Quadro RTX 6000
[1f96] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000
[1f99] TU117M -> Quadro RTX 6000
[1fae] TU117GL -> Quadro RTX 6000
[1fb8] TU117GLM [Quadro T2000 Mobile / Max-Q] -> Quadro RTX 6000
[1fb9] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000
[1f97] TU117M [GeForce MX450] -> Quadro RTX 6000
[1f98] TU117M [GeForce MX450] -> Quadro RTX 6000
[1f9c] TU117M [GeForce MX450] -> Quadro RTX 6000
[1f9d] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000
[1fb0] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000
[1fb1] TU117GL [T600] -> Quadro RTX 6000
[1fb2] TU117GLM [Quadro T400 Mobile] -> Quadro RTX 6000
[1fba] TU117GLM [T600 Mobile] -> Quadro RTX 6000
[1f42] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000
[1f47] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000
[1f50] TU106BM [GeForce RTX 2070 Mobile / Max-Q] -> Quadro RTX 6000
[1f51] TU106BM [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f54] TU106BM [GeForce RTX 2070 Mobile] -> Quadro RTX 6000
[1f55] TU106BM [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f81] TU117 -> Quadro RTX 6000
[1f82] TU117 [GeForce GTX 1650] -> Quadro RTX 6000
[1f91] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000
[1f92] TU117M [GeForce GTX 1650 Mobile] -> Quadro RTX 6000
[1f94] TU117M [GeForce GTX 1650 Mobile] -> Quadro RTX 6000
[1f95] TU117M [GeForce GTX 1650 Ti Mobile] -> Quadro RTX 6000
[1f76] TU106GLM [Quadro RTX 3000 Mobile Refresh] -> Quadro RTX 6000
[1f07] TU106 [GeForce RTX 2070 Rev. A] -> Quadro RTX 6000
[1f08] TU106 [GeForce RTX 2060 Rev. A] -> Quadro RTX 6000
[1f09] TU106 [GeForce GTX 1660 SUPER] -> Quadro RTX 6000
[1f0a] TU106 [GeForce GTX 1650] -> Quadro RTX 6000
[1f10] TU106M [GeForce RTX 2070 Mobile] -> Quadro RTX 6000
[1f11] TU106M [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f12] TU106M [GeForce RTX 2060 Max-Q] -> Quadro RTX 6000
[1f14] TU106M [GeForce RTX 2070 Mobile / Max-Q Refresh] -> Quadro RTX 6000
[1f15] TU106M [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f2e] TU106M -> Quadro RTX 6000
[1f36] TU106GLM [Quadro RTX 3000 Mobile / Max-Q] -> Quadro RTX 6000
[1f0b] TU106 [CMP 40HX] -> Quadro RTX 6000
[1eb5] TU104GLM [Quadro RTX 5000 Mobile / Max-Q] -> Quadro RTX 6000
[1eb6] TU104GLM [Quadro RTX 4000 Mobile / Max-Q] -> Quadro RTX 6000
[1eb8] TU104GL [Tesla T4] -> Quadro RTX 6000
[1eb9] TU104GL -> Quadro RTX 6000
[1ebe] TU104GL -> Quadro RTX 6000
[1ec2] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000
[1ec7] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000
[1ed0] TU104BM [GeForce RTX 2080 Mobile] -> Quadro RTX 6000
[1ed1] TU104BM [GeForce RTX 2070 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1ed3] TU104BM [GeForce RTX 2080 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1f02] TU106 [GeForce RTX 2070] -> Quadro RTX 6000
[1f04] TU106 -> Quadro RTX 6000
[1f06] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000
[1ef5] TU104GLM [Quadro RTX 5000 Mobile Refresh] -> Quadro RTX 6000
[1e81] TU104 [GeForce RTX 2080 SUPER] -> Quadro RTX 6000
[1e82] TU104 [GeForce RTX 2080] -> Quadro RTX 6000
[1e84] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000
[1e87] TU104 [GeForce RTX 2080 Rev. A] -> Quadro RTX 6000
[1e89] TU104 [GeForce RTX 2060] -> Quadro RTX 6000
[1e90] TU104M [GeForce RTX 2080 Mobile] -> Quadro RTX 6000
[1e91] TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1e93] TU104M [GeForce RTX 2080 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1eab] TU104M -> Quadro RTX 6000
[1eae] TU104M -> Quadro RTX 6000
[1eb0] TU104GL [Quadro RTX 5000] -> Quadro RTX 6000
[1eb1] TU104GL [Quadro RTX 4000] -> Quadro RTX 6000
[1eb4] TU104GL [T4G] -> Quadro RTX 6000
[1e04] TU102 [GeForce RTX 2080 Ti] -> Quadro RTX 6000
[1e07] TU102 [GeForce RTX 2080 Ti Rev. A] -> Quadro RTX 6000
[1e2d] TU102 [GeForce RTX 2080 Ti Engineering Sample] -> Quadro RTX 6000
[1e2e] TU102 [GeForce RTX 2080 Ti 12GB Engineering Sample] -> Quadro RTX 6000
[1e30] TU102GL [Quadro RTX 6000/8000] -> Quadro RTX 6000
[1e36] TU102GL [Quadro RTX 6000] -> Quadro RTX 6000
[1e37] TU102GL [GRID RTX T10-4/T10-8/T10-16] -> Quadro RTX 6000
[1e38] TU102GL -> Quadro RTX 6000
[1e3c] TU102GL -> Quadro RTX 6000
[1e3d] TU102GL -> Quadro RTX 6000
[1e3e] TU102GL -> Quadro RTX 6000
[1e78] TU102GL [Quadro RTX 6000/8000] -> Quadro RTX 6000
[1e09] TU102 [CMP 50HX] -> Quadro RTX 6000
[1dba] GV100GL [Quadro GV100] -> Tesla V100 32GB PCIE
[1e02] TU102 [TITAN RTX] -> Quadro RTX 6000
[1cfa] GP107GL [Quadro P2000] -> Tesla P40
[1cfb] GP107GL [Quadro P1000] -> Tesla P40
[1d01] GP108 [GeForce GT 1030] -> Tesla P40
[1d10] GP108M [GeForce MX150] -> Tesla P40
[1d11] GP108M [GeForce MX230] -> Tesla P40
[1d12] GP108M [GeForce MX150] -> Tesla P40
[1d13] GP108M [GeForce MX250] -> Tesla P40
[1d16] GP108M [GeForce MX330] -> Tesla P40
[1d33] GP108GLM [Quadro P500 Mobile] -> Tesla P40
[1d34] GP108GLM [Quadro P520] -> Tesla P40
[1d52] GP108BM [GeForce MX250] -> Tesla P40
[1d56] GP108BM [GeForce MX330] -> Tesla P40
[1d81] GV100 [TITAN V] -> Tesla V100 32GB PCIE
[1cb6] GP107GL [Quadro P620] -> Tesla P40
[1cba] GP107GLM [Quadro P2000 Mobile] -> Tesla P40
[1cbb] GP107GLM [Quadro P1000 Mobile] -> Tesla P40
[1cbc] GP107GLM [Quadro P600 Mobile] -> Tesla P40
[1cbd] GP107GLM [Quadro P620] -> Tesla P40
[1ccc] GP107BM [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1ccd] GP107BM [GeForce GTX 1050 Mobile] -> Tesla P40
[1ca8] GP107GL -> Tesla P40
[1caa] GP107GL -> Tesla P40
[1cb1] GP107GL [Quadro P1000] -> Tesla P40
[1cb2] GP107GL [Quadro P600] -> Tesla P40
[1cb3] GP107GL [Quadro P400] -> Tesla P40
[1c70] GP106GL -> Tesla P40
[1c81] GP107 [GeForce GTX 1050] -> Tesla P40
[1c82] GP107 [GeForce GTX 1050 Ti] -> Tesla P40
[1c83] GP107 [GeForce GTX 1050 3GB] -> Tesla P40
[1c8c] GP107M [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1c8d] GP107M [GeForce GTX 1050 Mobile] -> Tesla P40
[1c8e] GP107M -> Tesla P40
[1c8f] GP107M [GeForce GTX 1050 Ti Max-Q] -> Tesla P40
[1c90] GP107M [GeForce MX150] -> Tesla P40
[1c91] GP107M [GeForce GTX 1050 3 GB Max-Q] -> Tesla P40
[1c92] GP107M [GeForce GTX 1050 Mobile] -> Tesla P40
[1c94] GP107M [GeForce MX350] -> Tesla P40
[1c96] GP107M [GeForce MX350] -> Tesla P40
[1ca7] GP107GL -> Tesla P40
[1c36] GP106 [P106M] -> Tesla P40
[1c07] GP106 [P106-100] -> Tesla P40
[1c09] GP106 [P106-090] -> Tesla P40
[1c20] GP106M [GeForce GTX 1060 Mobile] -> Tesla P40
[1c21] GP106M [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1c22] GP106M [GeForce GTX 1050 Mobile] -> Tesla P40
[1c23] GP106M [GeForce GTX 1060 Mobile Rev. 2] -> Tesla P40
[1c2d] GP106M -> Tesla P40
[1c30] GP106GL [Quadro P2000] -> Tesla P40
[1c31] GP106GL [Quadro P2200] -> Tesla P40
[1c35] GP106M [Quadro P2000 Mobile] -> Tesla P40
[1c60] GP106BM [GeForce GTX 1060 Mobile 6GB] -> Tesla P40
[1c61] GP106BM [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1c62] GP106BM [GeForce GTX 1050 Mobile] -> Tesla P40
[1bb8] GP104GLM [Quadro P3000 Mobile] -> Tesla P40
[1bb9] GP104GLM [Quadro P4200 Mobile] -> Tesla P40
[1bbb] GP104GLM [Quadro P3200 Mobile] -> Tesla P40
[1bc7] GP104 [P104-101] -> Tesla P40
[1be0] GP104BM [GeForce GTX 1080 Mobile] -> Tesla P40
[1be1] GP104BM [GeForce GTX 1070 Mobile] -> Tesla P40
[1c00] GP106 -> Tesla P40
[1c01] GP106 -> Tesla P40
[1c02] GP106 [GeForce GTX 1060 3GB] -> Tesla P40
[1c03] GP106 [GeForce GTX 1060 6GB] -> Tesla P40
[1c04] GP106 [GeForce GTX 1060 5GB] -> Tesla P40
[1c06] GP106 [GeForce GTX 1060 6GB Rev. 2] -> Tesla P40
[1b87] GP104 [P104-100] -> Tesla P40
[1ba0] GP104M [GeForce GTX 1080 Mobile] -> Tesla P40
[1ba1] GP104M [GeForce GTX 1070 Mobile] -> Tesla P40
[1ba2] GP104M [GeForce GTX 1070 Mobile] -> Tesla P40
[1ba9] GP104M -> Tesla P40
[1baa] GP104M -> Tesla P40
[1bad] GP104 [GeForce GTX 1070 Engineering Sample] -> Tesla P40
[1bb0] GP104GL [Quadro P5000] -> Tesla P40
[1bb1] GP104GL [Quadro P4000] -> Tesla P40
[1bb3] GP104GL [Tesla P4] -> Tesla P40
[1bb4] GP104GL [Tesla P6] -> Tesla P40
[1bb5] GP104GLM [Quadro P5200 Mobile] -> Tesla P40
[1bb6] GP104GLM [Quadro P5000 Mobile] -> Tesla P40
[1bb7] GP104GLM [Quadro P4000 Mobile] -> Tesla P40
[1b06] GP102 [GeForce GTX 1080 Ti] -> Tesla P40
[1b07] GP102 [P102-100] -> Tesla P40
[1b30] GP102GL [Quadro P6000] -> Tesla P40
[1b38] GP102GL [Tesla P40] -> Tesla P40
[1b70] GP102GL -> Tesla P40
[1b78] GP102GL -> Tesla P40
[1b80] GP104 [GeForce GTX 1080] -> Tesla P40
[1b81] GP104 [GeForce GTX 1070] -> Tesla P40
[1b82] GP104 [GeForce GTX 1070 Ti] -> Tesla P40
[1b83] GP104 [GeForce GTX 1060 6GB] -> Tesla P40
[1b84] GP104 [GeForce GTX 1060 3GB] -> Tesla P40
[1b39] GP102GL [Tesla P10] -> Tesla P40
[1b00] GP102 [TITAN X] -> Tesla P40
[1b01] GP102 [GeForce GTX 1080 Ti 10GB] -> Tesla P40
[1b02] GP102 [TITAN Xp] -> Tesla P40
[1b04] GP102 -> Tesla P40
[179c] GM107 [GeForce 940MX] -> Tesla M10
[17c2] GM200 [GeForce GTX TITAN X] -> Tesla M60
[17c8] GM200 [GeForce GTX 980 Ti] -> Tesla M60
[17f0] GM200GL [Quadro M6000] -> Tesla M60
[17f1] GM200GL [Quadro M6000 24GB] -> Tesla M60
[17fd] GM200GL [Tesla M40] -> Tesla M60
[1617] GM204M [GeForce GTX 980M] -> Tesla M60
[1618] GM204M [GeForce GTX 970M] -> Tesla M60
[1619] GM204M [GeForce GTX 965M] -> Tesla M60
[161a] GM204M [GeForce GTX 980 Mobile] -> Tesla M60
[1667] GM204M [GeForce GTX 965M] -> Tesla M60
[1725] GP100 -> Tesla P40
[172e] GP100 -> Tesla P40
[172f] GP100 -> Tesla P40
[174d] GM108M [GeForce MX130] -> Tesla M10
[174e] GM108M [GeForce MX110] -> Tesla M10
[1789] GM107GL [GRID M3-3020] -> Tesla M10
[1402] GM206 [GeForce GTX 950] -> Tesla M60
[1406] GM206 [GeForce GTX 960 OEM] -> Tesla M60
[1407] GM206 [GeForce GTX 750 v2] -> Tesla M60
[1427] GM206M [GeForce GTX 965M] -> Tesla M60
[1430] GM206GL [Quadro M2000] -> Tesla M60
[1431] GM206GL [Tesla M4] -> Tesla M60
[1436] GM206GLM [Quadro M2200 Mobile] -> Tesla M60
[15f0] GP100GL [Quadro GP100] -> Tesla P40
[15f1] GP100GL -> Tesla P40
[1404] GM206 [GeForce GTX 960 FAKE] -> Tesla M60
[13d8] GM204M [GeForce GTX 970M] -> Tesla M60
[13d9] GM204M [GeForce GTX 965M] -> Tesla M60
[13da] GM204M [GeForce GTX 980 Mobile] -> Tesla M60
[13e7] GM204GL [GeForce GTX 980 Engineering Sample] -> Tesla M60
[13f0] GM204GL [Quadro M5000] -> Tesla M60
[13f1] GM204GL [Quadro M4000] -> Tesla M60
[13f2] GM204GL [Tesla M60] -> Tesla M60
[13f3] GM204GL [Tesla M6] -> Tesla M60
[13f8] GM204GLM [Quadro M5000M / M5000 SE] -> Tesla M60
[13f9] GM204GLM [Quadro M4000M] -> Tesla M60
[13fa] GM204GLM [Quadro M3000M] -> Tesla M60
[13fb] GM204GLM [Quadro M5500] -> Tesla M60
[1401] GM206 [GeForce GTX 960] -> Tesla M60
[13b3] GM107GLM [Quadro K2200M] -> Tesla M10
[13b4] GM107GLM [Quadro M620 Mobile] -> Tesla M10
[13b6] GM107GLM [Quadro M1200 Mobile] -> Tesla M10
[13b9] GM107GL [NVS 810] -> Tesla M10
[13ba] GM107GL [Quadro K2200] -> Tesla M10
[13bb] GM107GL [Quadro K620] -> Tesla M10
[13bc] GM107GL [Quadro K1200] -> Tesla M10
[13bd] GM107GL [Tesla M10] -> Tesla M10
[13c0] GM204 [GeForce GTX 980] -> Tesla M60
[13c1] GM204 -> Tesla M60
[13c2] GM204 [GeForce GTX 970] -> Tesla M60
[13c3] GM204 -> Tesla M60
[13d7] GM204M [GeForce GTX 980M] -> Tesla M60
[1389] GM107GL [GRID M30] -> Tesla M10
[1390] GM107M [GeForce 845M] -> Tesla M10
[1391] GM107M [GeForce GTX 850M] -> Tesla M10
[1392] GM107M [GeForce GTX 860M] -> Tesla M10
[1393] GM107M [GeForce 840M] -> Tesla M10
[1398] GM107M [GeForce 845M] -> Tesla M10
[1399] GM107M [GeForce 945M] -> Tesla M10
[139a] GM107M [GeForce GTX 950M] -> Tesla M10
[139b] GM107M [GeForce GTX 960M] -> Tesla M10
[139c] GM107M [GeForce 940M] -> Tesla M10
[139d] GM107M [GeForce GTX 750 Ti] -> Tesla M10
[13b0] GM107GLM [Quadro M2000M] -> Tesla M10
[13b1] GM107GLM [Quadro M1000M] -> Tesla M10
[13b2] GM107GLM [Quadro M600M] -> Tesla M10
[1347] GM108M [GeForce 940M] -> Tesla M10
[1348] GM108M [GeForce 945M / 945A] -> Tesla M10
[1349] GM108M [GeForce 930M] -> Tesla M10
[134b] GM108M [GeForce 940MX] -> Tesla M10
[134d] GM108M [GeForce 940MX] -> Tesla M10
[134e] GM108M [GeForce 930MX] -> Tesla M10
[134f] GM108M [GeForce 920MX] -> Tesla M10
[137a] GM108GLM [Quadro K620M / Quadro M500M] -> Tesla M10
[137b] GM108GLM [Quadro M520 Mobile] -> Tesla M10
[137d] GM108M [GeForce 940A] -> Tesla M10
[1380] GM107 [GeForce GTX 750 Ti] -> Tesla M10
[1381] GM107 [GeForce GTX 750] -> Tesla M10
[1382] GM107 [GeForce GTX 745] -> Tesla M10
[1340] GM108M [GeForce 830M] -> Tesla M10
[1341] GM108M [GeForce 840M] -> Tesla M10
[1344] GM108M [GeForce 845M] -> Tesla M10
[1346] GM108M [GeForce 930M] -> Tesla M10
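To find out whether your card is on the list, you need its PCI device ID. A sketch, assuming you have saved the list above to a text file yourself (the file name and the helper names nvidia_device_ids and lookup_device_id are hypothetical). It relies on lspci -nn printing IDs as [vendor:device], where 10de is NVIDIA's vendor ID.

```shell
# Print the 4-digit NVIDIA device IDs found in `lspci -nn` output on stdin.
nvidia_device_ids() {
    sed -n 's/.*\[10de:\([0-9a-f]\{4\}\)].*/\1/p'
}

# Look up one device ID (case-insensitive) in a saved copy of the list,
# one entry per line, e.g. "[1c09] GP106 [P106-090] -> Tesla P40".
# $1: device ID, $2: path of the saved list.
lookup_device_id() {
    grep -i "^\[$1]" "$2"
}

# Usage on the host (supported.txt is wherever you saved the list):
#   lspci -nn -d 10de: | nvidia_device_ids | while read -r id; do
#       lookup_device_id "$id" supported.txt || echo "$id: not in the list"
#   done
```

The card's HDMI audio function also carries an NVIDIA ID and will be reported as not in the list, which is expected.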
4: Preparing the environment
4.1: Configure the package sources
rm /etc/apt/sources.list
rm /etc/apt/sources.list.d/*
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-backports main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bullseye-security main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/proxmox/debian bullseye pve-no-subscription">>/etc/apt/sources.list
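The five echo lines can also be written as a single heredoc. A sketch (write_tuna_sources is a hypothetical helper): unlike the rm above, it keeps the old file as sources.list.bak, and it does not touch /etc/apt/sources.list.d, so the second rm line is still needed.

```shell
# Write the five repository lines from 4.1 into one file.
# $1: target file (defaults to /etc/apt/sources.list); an existing file
# is kept as <target>.bak rather than deleted.
write_tuna_sources() {
    local target="${1:-/etc/apt/sources.list}"
    local mirror="https://mirrors.tuna.tsinghua.edu.cn"
    if [ -f "$target" ]; then
        cp "$target" "$target.bak" || return 1
    fi
    cat > "$target" <<EOF
deb $mirror/debian/ bullseye main contrib non-free
deb $mirror/debian/ bullseye-updates main contrib non-free
deb $mirror/debian/ bullseye-backports main contrib non-free
deb $mirror/debian-security bullseye-security main contrib non-free
deb $mirror/proxmox/debian bullseye pve-no-subscription
EOF
}

# Usage:
#   write_tuna_sources && apt update
```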
4.2 Install the required packages
apt update && apt install dkms git build-essential pve-kernel-5.15 pve-headers-5.15 cargo jq uuid-runtime -y
Install mdevctl
wget -P /opt/ http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
dpkg -i /opt/mdevctl_0.81-1_all.deb
4.3 Configure the kernel
echo vfio >> /etc/modules
echo vfio_iommu_type1 >> /etc/modules
echo vfio_pci >> /etc/modules
echo vfio_virqfd >> /etc/modules
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
update-initramfs -k all -u
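Running the echo >> lines twice appends every entry twice. A sketch of an idempotent variant, assuming GNU grep (append_once and configure_vfio_modules are hypothetical helper names):

```shell
# Append a line to a file only if that exact line is not already there.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Register the four VFIO modules from the steps above.
# $1: modules file (defaults to /etc/modules).
configure_vfio_modules() {
    local modules_file="${1:-/etc/modules}" m
    for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
        append_once "$m" "$modules_file"
    done
}

# Usage on the host:
#   configure_vfio_modules
#   append_once "blacklist nouveau" /etc/modprobe.d/blacklist.conf
#   update-initramfs -k all -u
```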
4.4 Configure the bootloader
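#Edit grub. Do not change it blindly; choose the settings that match your environment
nano /etc/default/grub
#Find this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
#and change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
#For an AMD CPU, use this instead:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
#Update the bootloader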
update-grub
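The manual edit can also be scripted. A sketch that only handles the stock value GRUB_CMDLINE_LINUX_DEFAULT="quiet" and picks the parameter from the vendor_id field of /proc/cpuinfo; it assumes the host boots through GRUB, and iommu_param and set_grub_iommu are hypothetical names.

```shell
# Map a CPU vendor string (vendor_id in /proc/cpuinfo) to the IOMMU parameter.
iommu_param() {
    case "$1" in
        GenuineIntel) echo "intel_iommu=on" ;;
        AuthenticAMD) echo "amd_iommu=on" ;;
        *) return 1 ;;
    esac
}

# Rewrite GRUB_CMDLINE_LINUX_DEFAULT="quiet"; a customised line is left alone.
# $1: grub file (default /etc/default/grub), $2: vendor (default: read cpuinfo).
set_grub_iommu() {
    local grub_file="${1:-/etc/default/grub}"
    local vendor="${2:-$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo)}"
    local param
    param="$(iommu_param "$vendor")" || return 1
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"quiet\"\$/GRUB_CMDLINE_LINUX_DEFAULT=\"quiet $param\"/" "$grub_file"
}

# Usage:
#   set_grub_iommu && update-grub
```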
4.5 Install the driver
Reboot the host. After the reboot, verify that the running kernel is 5.15:
root@pve:~# uname -r
5.15.30-2-pve
If it shows 5.15, the kernel is correct.
Verify that IOMMU is enabled.
If iommu group lines like the following appear, it worked:
root@pve3:~# dmesg |grep iommu
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[ 0.075784] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[ 0.352588] iommu: Default domain type: Passthrough (set via kernel command line)
[ 1.373583] pci 0000:00:00.0: Adding to iommu group 0
[ 1.373592] pci 0000:00:02.0: Adding to iommu group 1
[ 1.373605] pci 0000:00:14.0: Adding to iommu group 2
[ 1.373613] pci 0000:00:17.0: Adding to iommu group 3
[ 1.373623] pci 0000:00:1c.0: Adding to iommu group 4
[ 1.373637] pci 0000:00:1d.0: Adding to iommu group 5
[ 1.373647] pci 0000:00:1d.2: Adding to iommu group 6
[ 1.373656] pci 0000:00:1d.3: Adding to iommu group 7
[ 1.373675] pci 0000:00:1f.0: Adding to iommu group 8
[ 1.373683] pci 0000:00:1f.2: Adding to iommu group 8
[ 1.373691] pci 0000:00:1f.3: Adding to iommu group 8
[ 1.373699] pci 0000:00:1f.4: Adding to iommu group 8
[ 1.373707] pci 0000:00:1f.6: Adding to iommu group 9
[ 1.373717] pci 0000:01:00.0: Adding to iommu group 10
[ 1.373726] pci 0000:03:00.0: Adding to iommu group 11
[ 1.373735] pci 0000:05:00.0: Adding to iommu group 12
[ 1.656483] intel_iommu=on
Verify that nouveau is not loaded.
No output means it is not loaded:
root@pve3:~# lsmod|grep nouveau
root@pve3:~#
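The three checks above can be bundled into one script to run after each reboot. A sketch (check_kernel, check_iommu and check_nouveau are hypothetical names): instead of grepping dmesg it counts the entries in /sys/kernel/iommu_groups, and instead of lsmod it reads /proc/modules, the file lsmod itself reads. The optional arguments only exist so the checks can be tried on sample values.

```shell
# Post-reboot checks for section 4.5. Each prints a verdict and returns
# non-zero on failure; the optional argument replaces the live value.
check_kernel() {
    local release="${1:-$(uname -r)}"
    case "$release" in
        5.15.*) echo "kernel OK: $release" ;;
        *) echo "kernel is $release, expected 5.15.x" >&2; return 1 ;;
    esac
}

# One numbered directory per IOMMU group exists when the IOMMU is on.
check_iommu() {
    local dir="${1:-/sys/kernel/iommu_groups}" n
    n=$(( $(ls "$dir" 2>/dev/null | wc -l) ))
    if [ "$n" -gt 0 ]; then
        echo "IOMMU OK: $n groups"
    else
        echo "no IOMMU groups found" >&2
        return 1
    fi
}

# Lines in /proc/modules start with the module name.
check_nouveau() {
    local modules="${1:-/proc/modules}"
    if grep -q '^nouveau ' "$modules"; then
        echo "nouveau is loaded" >&2
        return 1
    fi
    echo "nouveau not loaded"
}

# Usage:
#   check_kernel && check_iommu && check_nouveau
```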
Download the driver
#Download the driver to /opt
wget https://mirrors.apqa.cn/d/vGPU/vgpu_unlock/drivers/NVIDIA-Linux-x86_64-510.47.03-vgpu-kvm-custom.run -P /opt
Make the driver executable
chmod +x /opt/NVIDIA-Linux-x86_64-510.47.03-vgpu-kvm-custom.run
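Install the driver with DKMS
/opt/NVIDIA-Linux-x86_64-510.47.03-vgpu-kvm-custom.run --dkms
After you run the command, the installer asks whether to register the kernel module with DKMS; choose Yes and press Enter.
An Xorg warning appears; ignore it.
You are asked whether to install the 32-bit compatibility libraries; either answer is fine.
The driver installation starts.
Once the progress bar completes, you are done. This may take a while.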
5: Configuring vgpu_unlock
5.1 Build
cd /opt && git clone https://github.com/mbilker/vgpu_unlock-rs.git
cd /opt/vgpu_unlock-rs
cargo build --release
mkdir /etc/systemd/system/{nvidia-vgpud.service.d,nvidia-vgpu-mgr.service.d}
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf
systemctl daemon-reload
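The same drop-ins can be written with printf, which behaves the same in every shell (in dash, echo -e prints the -e literally). A sketch with a hypothetical helper name and an optional root directory for a dry run:

```shell
# Write the LD_PRELOAD drop-in for both vGPU services.
# $1: optional root directory (defaults to /), e.g. a temp dir for a dry run.
install_vgpu_unlock_dropins() {
    local root="${1:-}" svc dir
    local lib="/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so"
    for svc in nvidia-vgpud nvidia-vgpu-mgr; do
        dir="$root/etc/systemd/system/$svc.service.d"
        mkdir -p "$dir" || return 1
        printf '[Service]\nEnvironment=LD_PRELOAD=%s\n' "$lib" > "$dir/vgpu_unlock.conf"
    done
    # Warn without failing if the library is missing; this checks the
    # real path, not the one under $root.
    [ -f "$lib" ] || echo "warning: $lib not found, run cargo build --release first" >&2
}

# Usage:
#   install_vgpu_unlock_dropins && systemctl daemon-reload
#   systemctl cat nvidia-vgpud    # should show the LD_PRELOAD line
```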
Reboot the host.
6: Verification
After the reboot, run nvidia-smi
and check that it shows the GPU information, as below.
root@pve:~# nvidia-smi
Wed Apr 27 23:33:10 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 P106-090 Off | 00000000:05:00.0 Off | N/A |
| 31% 35C P0 28W / 75W | 11MiB / 3071MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Run mdevctl types
to verify that the mdev device types appear:
root@pve:/opt/vgpu_unlock-rs# mdevctl types
0000:05:00.0
nvidia-156
Available instances: 12
Device API: vfio-pci
Name: GRID P40-2B
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-215
Available instances: 12
Device API: vfio-pci
Name: GRID P40-2B4
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-241
Available instances: 24
Device API: vfio-pci
Name: GRID P40-1B4
Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-283
Available instances: 6
Device API: vfio-pci
Name: GRID P40-4C
Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=6
nvidia-284
Available instances: 4
Device API: vfio-pci
Name: GRID P40-6C
Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=4096x2160, max_instance=4
nvidia-285
Available instances: 3
Device API: vfio-pci
Name: GRID P40-8C
Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=3
nvidia-286
Available instances: 2
Device API: vfio-pci
Name: GRID P40-12C
Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=4096x2160, max_instance=2
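For a compact overview, the output can be condensed to one line per type. A sketch using awk (summarize_mdev_types is a hypothetical name; it assumes the field layout shown above, with or without indentation):

```shell
# Condense `mdevctl types` output on stdin to one line per type:
#   <type id> <available instances> <name>
summarize_mdev_types() {
    awk '
        /^ *nvidia-[0-9]+ *$/  { type = $1 }
        /Available instances:/ { avail = $3 }
        /Name:/ {
            sub(/^ *Name: */, "")
            print type, avail, $0
        }
    '
}

# Usage:
#   mdevctl types | summarize_mdev_types
```

On the host above, the first line would read nvidia-156 12 GRID P40-2B.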
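7: Getting started
7.0 Native vGPU vs. vGPU-rs
Native vGPU: inside the guest the device is recognized as a vGPU device; you install the vGPU guest driver, and a license is required.
vGPU-rs: the vGPU information is rewritten so that the guest recognizes the device as a professional card, and you install that professional card's driver. No license is needed, but you lose the noVNC console. Some configuration is required (see below).
7.1 Configure the vGPU parameters (optional; skip this if you use native vGPU)
#Create the configuration directory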
mkdir /etc/vgpu_unlock
#Create the vGPU configuration file
touch /etc/vgpu_unlock/profile_override.toml
Write the vGPU configuration into /etc/vgpu_unlock/profile_override.toml.
The vgpu-mgr service reads this file every time a vGPU device starts, so changes take effect the next time the vGPU starts.
[profile.nvidia-18]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
cuda_enabled = 1
frl_enabled = 0
framebuffer = 12348030976
pci_id = 0x17F011A9
pci_device_id = 0x17F0
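Parameter reference:
- [profile.nvidia-18]: these settings apply to the vGPU type nvidia-18. If the type you are configuring is nvidia-46, change it to [profile.nvidia-46]. See section 7.2.
- num_displays: maximum number of displays.
- display_width = 1920, display_height = 1080, max_pixels = 2073600: the resolution of the virtual display; max_pixels is width × height.
- cuda_enabled = 1: whether CUDA is enabled.
- frl_enabled = 0: whether the frame rate limiter is enabled; 0 means no limit.
- framebuffer: the VRAM size; see 7.1.1.
- pci_id: the combined PCI ID; see 7.1.2.
- pci_device_id: the DID (device ID); see 7.1.2.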
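Because max_pixels must equal display_width × display_height, it can be convenient to generate the stanza instead of typing it. A minimal sketch (profile_stanza is a hypothetical helper); it only emits the display and CUDA/FRL fields, so add framebuffer, pci_id and pci_device_id yourself if you need them.

```shell
# Print a [profile.<type>] stanza for profile_override.toml.
# Args: type width height [cuda_enabled, default 1] [frl_enabled, default 0]
# max_pixels is computed from the resolution so the two always agree.
profile_stanza() {
    local type="$1" width="$2" height="$3" cuda="${4:-1}" frl="${5:-0}"
    printf '[profile.%s]\n' "$type"
    printf 'num_displays = 1\n'
    printf 'display_width = %d\n' "$width"
    printf 'display_height = %d\n' "$height"
    printf 'max_pixels = %d\n' "$((width * height))"
    printf 'cuda_enabled = %d\n' "$cuda"
    printf 'frl_enabled = %d\n' "$frl"
}

# Usage:
#   profile_stanza nvidia-46 1920 1080 >> /etc/vgpu_unlock/profile_override.toml
```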
7.1.1 framebuffer
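framebuffer is the vGPU VRAM size set by the vGPU manager.
Convert the value with an online unit converter, e.g. the file size converter (bit, bytes, KB, MB, GB, TB) on BeJSON.com.
For example, if you want 2048M of VRAM, use 2048-128=1920.
Update: newer versions reserve 64M, so subtract 64 instead of 128.
Convert the number on the site above; the value in bytes is what we need.
For 1920M the result is 2013265920 (with the newer 64M reservation, 1984M gives 2080374784).
Note! Do not change the framebuffer unless necessary, otherwise the mdev device cannot be initialized.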
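The conversion is just (desired MiB minus reserved MiB) × 1024 × 1024. A sketch (framebuffer_bytes is a hypothetical name) that defaults to the newer 64M reservation. The log in section 9 is consistent with the older 128M rule: the P40-1Q profile there has framebuffer 0x38000000 (896 MiB) and framebuffer_reservation 0x8000000 (128 MiB), 1024 MiB in total.

```shell
# framebuffer value in bytes for a desired vGPU VRAM size.
# $1: desired size in MiB, $2: reserved MiB (default 64, the newer value;
# pass 128 for the older rule).
framebuffer_bytes() {
    local reserved="${2:-64}"
    echo $(( ($1 - reserved) * 1024 * 1024 ))
}

# Usage:
#   framebuffer_bytes 2048 128   # -> 2013265920
#   framebuffer_bytes 2048       # -> 2080374784
```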
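7.1.2 pci_id and pci_device_id
Normally, when a vGPU device is passed through to a VM, it carries the vGPU's device ID, so inside the guest the vGPU is recognized as a model such as P40-1A or RTX6000-1A. You then install the NVIDIA vGPU guest driver, which uses the device as a vGPU device, including license management.
vGPU cards and ordinary consumer cards share the same GPU cores and differ only in their drivers, which is why their features differ. That is why the vgpu_unlock project exists: to let consumer cards support vGPU too.
That is the host side.
On the VM side, a vGPU's core is really the same as the physical card's core, so in theory, changing the vGPU's device ID to a consumer card's ID should also let the consumer driver drive it.
However, since some professional features are unavailable on consumer cards, it is recommended to change the vGPU's device ID to that of a professional card.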
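The pci_id = 0x17F011A9 and pci_device_id = 0x17F0 entries in the configuration file change exactly this device information. The vGPU manager reads them and rewrites the vGPU configuration, which makes the device more stable and realistic.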
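pci_device_id: the device ID the vGPU presents.
Look it up here: https://devicehunt.com/view/type/pci/vendor/10DE/
Our goal is to rewrite the vGPU information so that the VM recognizes it as a professional card, bypassing the vGPU driver restriction so that no license is needed. So you need to download the professional card's driver.
Therefore, choose this device ID according to the core of your physical card.
For example, if you use a GTX 1080 for vGPU, the site above shows that the 1080's core is GP104 (entry [1b80] in the list in section 3).
You should then choose a card with a GP104GL core, i.e. the P5000 ([1bb0]) or the P4000 ([1bb1]), and download the P5000 or P4000 driver from NVIDIA.
So with a 1080, set pci_device_id = 0x1BB0. With this ID, however, the driver may fail to recognize the hardware during installation. In that case just try another one: if the P5000 ID does not work, try the P4000 ID (0x1BB1).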
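pci_id: the device ID (upper 16 bits) combined with the SDID (lower 16 bits). For example, in the log in section 9, pci_id 472977896 (0x1C3111E8) shows up as Virtual Device Id: 0x1c31:0x11e8.
The figure below makes pci_id and pci_device_id easy to understand.
SDID is the subsystem device ID; it can be the same as the DID.
SVID is the subsystem vendor ID; it can be the same as the VID.
If you do not know these values, you can simply write pci_id = 0x1BB010DE (the device ID followed by NVIDIA's vendor ID 0x10DE as a placeholder).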
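Composing pci_id means placing the device ID in the upper 16 bits and the subsystem device ID in the lower 16 bits. A sketch (make_pci_id is a hypothetical name); the decimal form of its output matches the value in the section 9 log line Patching nvidia-46/pci_id: 456659432 -> 472977896.

```shell
# Build a profile_override.toml pci_id value.
# $1: device ID (e.g. 0x1BB0), $2: subsystem device ID (e.g. 0x10DE).
# Output is upper-case hex with the 0x prefix, as in the config example.
make_pci_id() {
    printf '0x%04X%04X\n' "$(( $1 ))" "$(( $2 ))"
}

# Usage:
#   make_pci_id 0x1BB0 0x10DE                  # -> 0x1BB010DE
#   echo $(( $(make_pci_id 0x1C31 0x11E8) ))   # -> 472977896
```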
7.2 vGPU types
Running mdevctl types
prints a lot of information, including the vGPU types:
root@pve2:/opt/vgpu_unlock# mdevctl types
0000:01:00.0
nvidia-156
Available instances: 0
Device API: vfio-pci
Name: GRID P40-2B
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-215
Available instances: 0
Device API: vfio-pci
Name: GRID P40-2B4
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
What does this all mean?
Take this example:
nvidia-257
Available instances: 4
Device API: vfio-pci
Name: GRID RTX6000-2Q
Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=4
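- nvidia-257 --> the vGPU type
- Available instances --> the number of devices that can still be created
- Name --> the display name
- Description --> the description: framebuffer is the VRAM, frl_config is the frame rate limiter (maximum FPS), followed by the maximum resolution and the maximum number of devices
Here GRID RTX6000-2Q is the mdev name: RTX6000 is the card name, 2 means 2G of VRAM, and Q stands for vWS.
The last letter means: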
A = Virtual Applications (vApps)
B = Virtual Desktops (vPC)
C = AI/Machine Learning/Training (vCS or vWS)
Q = Virtual Workstations (vWS) (best performance)
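Each kind of GPU card has its own set of vGPU types, e.g. P4-1B on a P4, or RTX6000-1B on an RTX 6000.
What stays the same are the rules above:
By VRAM: P4-1B and P4-1Q both have 1G of VRAM.
By function: P4-1B is a vPC device and P4-1Q a vDWS device; they require different licenses.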
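A type name can be split into these parts mechanically. A sketch using sed (parse_vgpu_name and describe_series are hypothetical helpers); any characters after the series letter, such as the 4 in P40-2B4, are ignored.

```shell
# Split a type name like "GRID RTX6000-2Q" into "<card> <VRAM in GB> <letter>".
parse_vgpu_name() {
    printf '%s\n' "$1" |
        sed -n 's/^GRID \([A-Za-z0-9]*\)-\([0-9]*\)\([ABCQ]\).*$/\1 \2 \3/p'
}

# Meaning of the series letter, as listed above.
describe_series() {
    case "$1" in
        A) echo "Virtual Applications (vApps)" ;;
        B) echo "Virtual Desktops (vPC)" ;;
        C) echo "AI/Machine Learning/Training (vCS or vWS)" ;;
        Q) echo "Virtual Workstations (vWS)" ;;
        *) return 1 ;;
    esac
}

# Usage:
#   set -- $(parse_vgpu_name "GRID P40-1Q")
#   echo "$1 with ${2}G, $(describe_series "$3")"
#   # -> P40 with 1G, Virtual Workstations (vWS)
```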
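At the virtualization level we only care about the vGPU type, i.e. nvidia-257.
When configuring a vGPU, we must pick the correct type.
As shown above, find the vGPU type you need in the output of mdevctl types, set its parameters in profile_override.toml,
and then add the vGPU in the web UI to complete the deployment.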
7.4.1 Modify the VM configuration (required) // not needed on PVE 7.2-7 and later
Add the following line to the VM's conf file:
args: -uuid 00000000-0000-0000-0000-000000000100
Note that the last part of the uuid must be replaced with your VMID. If your VMID is 3333, change it to
args: -uuid 00000000-0000-0000-0000-000000003333
If your VMID is 121, change it to
args: -uuid 00000000-0000-0000-0000-000000000121
Note that the length and format of the uuid must not change; only replace the trailing digits with your VMID.
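The zero padding can be done with printf so you do not have to count zeros. A sketch (vgpu_uuid is a hypothetical name): the VMID goes into the last group, zero-padded to 12 digits. The result matches the uuid 00000000-0000-0000-0000-000000003561 seen for VM 3561 in the log in section 9; that PVE's own generate_mdev_uuid (used in 7.4.2) produces the same format is my assumption.

```shell
# Print the guest UUID for a VM: the VMID, zero-padded to 12 digits,
# becomes the last group; everything else stays zero.
vgpu_uuid() {
    printf '00000000-0000-0000-0000-%012d\n' "$1"
}

# Usage (check first that the conf file has no other args: line,
# since the key should appear only once):
#   echo "args: -uuid $(vgpu_uuid 121)" >> /etc/pve/qemu-server/121.conf
```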
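7.4.2 Patch the PVE code to skip the step above and simplify deployment // not needed from PVE 7.2-7 on, since the feature has been added officially
Back up the file first: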
cp /usr/share/perl5/PVE/QemuServer.pm /usr/share/perl5/PVE/QemuServer.pm.back
Open /usr/share/perl5/PVE/QemuServer.pm with vim, nano or any other editor you are familiar with.
Find the following section:
push @$cmd, $kvm_binary;
push @$cmd, '-id', $vmid;
my $vmname = $conf->{name} || "vm$vmid";
push @$cmd, '-name', $vmname;
push @$cmd, '-no-shutdown';
my $use_virtio = 0;
Tip: search for -no-shutdown to locate it quickly.
Below the -no-shutdown line, add these 2 lines:
my $vmuuid = PVE::SysFSTools::generate_mdev_uuid($vmid);
push @$cmd, '-uuid' , $vmuuid;
The final result is shown below.
Save the changes,
then restart pvedaemon:
systemctl restart pvedaemon.service
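If there is an error, or the restart fails, you have certainly made a mistake in the edit; check it carefully.
The benefit is that the system generates the VM's UUID automatically, so you no longer need to edit the VM configuration file by hand; just add the vGPU in the web UI.
Note: possible conflicts with other mdev devices, such as GVT-g, have not been tested. Please test this yourself.
After a PVE upgrade, the change has to be made again.
To roll back, run: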
cp /usr/share/perl5/PVE/QemuServer.pm.back /usr/share/perl5/PVE/QemuServer.pm
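Because the patch has to be redone after every upgrade, it may be worth scripting. A sketch (patch_qemuserver is a hypothetical helper) that makes the same .back copy, inserts the two lines after the -no-shutdown line with awk, and does nothing if the lines are already present. It is only meant for PVE versions before 7.2-7; restart pvedaemon afterwards as described above.

```shell
# Insert the two UUID lines after the '-no-shutdown' line of QemuServer.pm.
# $1: path to QemuServer.pm (defaults to the PVE location).
patch_qemuserver() {
    local pm="${1:-/usr/share/perl5/PVE/QemuServer.pm}"
    # Already patched: the exact call from section 7.4.2 is present.
    if grep -qF 'generate_mdev_uuid($vmid);' "$pm"; then
        echo "already patched"
        return 0
    fi
    # Same backup as the cp command above; the patched file is rebuilt from it.
    cp "$pm" "$pm.back" || return 1
    awk '
        { print }
        index($0, "push @$cmd, '\''-no-shutdown'\'';") {
            print "    my $vmuuid = PVE::SysFSTools::generate_mdev_uuid($vmid);"
            print "    push @$cmd, '\''-uuid'\'' , $vmuuid;"
        }
    ' "$pm.back" > "$pm"
}

# Usage:
#   patch_qemuserver && systemctl restart pvedaemon.service
```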
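7.5 Create the virtual machine
For vGPU, a Windows release of 21H1 or later is recommended.
7.5.1 Create the VM and install the OS
Create a VM. Either SeaBIOS or OVMF works, but the chipset must be Q35! Switch to i440fx only if Q35 really does not work for you. Do not pass through any display device at this point.
A reference configuration is shown below.
Inside the guest, the vGPU is a 3D device, so the VM needs an additional display adapter; in other words, do not set the display to none!
After the OS is installed, enable a remote access method in it, such as Remote Desktop, ToDesk, VNC, Sunlogin or Parsec.
The reason is that systems such as Windows 10 go online and install drivers automatically. Once the vGPU is passed through and its driver installed, the guest runs in dual-monitor mode, and the PVE web console may go black or show only the secondary screen, leaving you unable to operate the VM, as shown below.
If you fall into this trap, shut down the VM, detach the vGPU, and enable remote access first.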
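7.5.2 Pass through the vGPU device
In the panel, click Add PCI Device and check All Functions and PCI-Express (the vGPU is a 3D device, so do not check Primary GPU). Select the vGPU type under MDev Type; see the sections above for which one to choose.
The final VM configuration looks like this:
Now you can start the VM. If you followed the tutorial above exactly, nothing unexpected should happen.
If you see the following message: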
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:05:00.0/00000000-0000-0000-0000-000000003561,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: warning: vfio 00000000-0000-0000-0000-000000003561: Could not enable error recovery for the device
TASK OK
Don't worry about it; it is only a warning, and the task still ends with TASK OK.
7.5.3 Install the graphics driver
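The guest driver version must be lower than or equal to the host vGPU driver version, otherwise you get Code 43. It must not be too old either.
For example, with the 460.73 host driver (the version in the log in section 9), 453.10 works perfectly well, while 473 is certain to cause problems!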
Where do you download the driver? Allow me a small rant: from NVIDIA, of course:
https://www.nvidia.com/Download/Find.aspx?lang=en-us
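Choose the driver for the card being emulated; the DCH driver is recommended.
I have put the drivers with good compatibility on my network drive: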
https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/guestdrivers/
Download the driver that fits your setup.
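The version rule from 7.5.3 can be checked with GNU sort -V, which orders dotted version strings numerically. A sketch (guest_driver_ok is a hypothetical name) that only compares the numbers; it cannot tell whether a driver is too old.

```shell
# Succeed (exit 0) if the guest driver version is not newer than the host's.
# $1: host vGPU driver version, $2: guest driver version.
guest_driver_ok() {
    local newest
    newest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)
    [ "$newest" = "$1" ]
}

# Usage:
#   guest_driver_ok 460.73.01 453.10 && echo "OK"
#   guest_driver_ok 460.73.01 473 || echo "guest driver too new, expect Code 43"
```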
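7.6 Accessing the VM
If the driver installs normally, Device Manager should show the vGPU device emulated as a professional card.
The screen also becomes dual-monitor.
Since the vGPU is virtual, it cannot output to a physical monitor, so access the VM through a remote protocol. Parsec is recommended for streaming, but Parsec needs a hardware encoder, so it cannot be used on cards without one, such as the P106.
With two screens, it is recommended to display only on the vGPU screen. Below is a screenshot of a Ludashi benchmark run through VNC inside the guest (not PVE's noVNC console).
To stress it again: for gaming and the like, use streaming software. RDP will not work.
9: Troubleshooting
Troubleshooting requires knowledge of KVM, vGPU and basic Linux.
As mentioned at the beginning, vGPU has two services.
You can view the nvidia-vgpu logs with these two commands: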
journalctl -u nvidia-vgpud
journalctl -u nvidia-vgpu-mgr
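To see only the overrides that vgpu_unlock-rs applied (the Patching lines in the log below), the journal can be filtered. A sketch with sed (show_overrides is a hypothetical name) that turns each Patching line into type, field, old value and new value.

```shell
# Read journal lines on stdin and print "<type> <field> <old> -> <new>"
# for every "Patching <type>/<field>: <old> -> <new>" entry.
show_overrides() {
    sed -n 's|.*Patching \([^/]*\)/\([^:]*\): \(.*\) -> \(.*\)$|\1 \2 \3 -> \4|p'
}

# Usage (current boot only):
#   journalctl -u nvidia-vgpu-mgr -b | show_overrides
```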
For example, the vGPU initialization part:
Apr 28 00:15:58 pve nvidia-vgpu-mgr[2534]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
#Creating the vGPU device
Apr 28 00:20:17 pve nvidia-vgpu-mgr[2534]: VgpuStart {
uuid: {00000000-0000-0000-0000-000000003561},
config_params: "vgpu_type_id=46",
unknown_410: [75, 13, 0, 0, 0, 5, 0, 0, 1, 0, 0, 0, 0, 5, 0, 0],
}
#The default vGPU configuration
Apr 28 00:20:17 pve nvidia-vgpu-mgr[3528]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
...skipping...
num_displays: 4,
display_width: 5120,
display_height: 2880,
max_pixels: 17694720,
frl_config: 60,
cuda_enabled: 1,
ecc_supported: 1,
mig_instance_size: 0,
multi_vgpu_supported: 0,
pci_id: 0x1b3811e8,
pci_device_id: 0x1b38,
framebuffer: 0x38000000,
mappable_video_size: 0x400000,
framebuffer_reservation: 0x8000000,
encoder_capacity: 0x64,
bar1_length: 0x100,
blob: [71, 82, 73, 68, 32, 80, 52, 48, 45, 49, 81, 0, 96, 1, 0, 0, 8, 80, 244, 134, 2, 179, 255, 255, 0, 0, 0, 0, 96, 1, 0>
license_type: "NVIDIA RTX Virtual Workstation",
}
#Reading /etc/vgpu_unlock/profile_override.toml and overriding the vGPU configuration
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Applying profile nvidia-46 overrides
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/num_displays: 4 -> 1
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/display_width: 5120 -> 1920
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/display_height: 2880 -> 1080
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/max_pixels: 17694720 -> 2073600
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/cuda_enabled: 1 -> 1
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/pci_id: 456659432 -> 472977896
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/pci_device_id: 6968 -> 7217
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/frl_enabled: 1 -> 0
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: op_type: 0xa0810115 failed.
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Setting mappable_cpu_host_aperture to 10M
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): gpu-pci-id : 0x500
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Framebuffer: 0x38000000
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Virtual Device Id: 0x1c31:0x11e8
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: ######## vGPU Manager Information: ########
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: Driver Version: 460.73.01
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: op_type: 0x2080012f failed.
#Getting the vGPU information in the VM
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Cannot query ECC status. vGPU ECC support will be disabled.
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): vGPU migration disabled
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: display_init inst: 0 successful
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: ######## Guest NVIDIA Driver Information: ########
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: Driver Version: 453.10
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: vGPU version: 0x70001
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Current max guest pfn = 0x17cd58!
Appendix: laptops are supported too.
Author: 佛西
Link: https://foxi.buduanwang.vip/virtualization/pve/1683.html/
Copyright belongs to the author; do not repost without permission.