Discussion:
[win-pv-devel] Windows on Xen bad IO performance
Jakub Kulesza
2018-07-30 15:07:43 UTC
I have a number of different hosts with different Xen and Windows versions,
but they all show the same behaviour. Each time I install the Xen Windows PV
drivers 8.2.0 from here:
https://www.xenproject.org/developer...v-drivers.html I get worse IO
performance than before, with the standard Windows drivers.

This setup, for example:

- Host X5450, 8 GB RAM, Samsung EVO SSD
- Ubuntu 18.04 LTS, Xen 4.9
- VM: Windows 2016, 4 GB RAM, all CPU cores, LVM volume used as the VM's drive
(in all cases below).
- Xen without PV drivers:
- I'm getting about seq read 34 MB/s, seq write 34 MB/s, random seek +
rw 34 MB/s in Passmark
- Atto benchmark runs and provides so-so results
- system is always usable
- Xen with PV drivers:
- I'm getting about seq read 239 MB/s, seq write 242 MB/s,
random seek + rw 241 MB/s in Passmark
- Atto benchmark runs and after a few minutes halts the system; the
results are given below
- When the IO is saturated (or something else) the VM halts and takes
hours to complete tasks, like the Atto benchmark
- KVM with signed drivers from Fedora:
- I'm getting about seq read 147 MB/s, seq write 187 MB/s, random seek +
rw 189 MB/s in Passmark
- Atto benchmark runs and provides so-so results (so-so, but better
than Xen with PV)
- system is always usable


I found out that I need to modify the gnttab_max_frames parameter of the
Xen hypervisor at boot time. A lot of links and reading starts here:
https://wiki.gentoo.org/wiki/Xen#Xen..._kernel_4.3.2B
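
(For reference, an illustrative sketch rather than anything quoted from that
page: on Debian/Ubuntu the parameter goes on the hypervisor command line,
typically by editing GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub.d/xen.cfg
or /etc/default/grub, e.g.

    GRUB_CMDLINE_XEN_DEFAULT="gnttab_max_frames=256"

then running update-grub and rebooting the host. The active value can be
checked afterwards with "xl dmesg | grep Command".)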

I did some testing and I am very confused right now. gnttab_max_frames
defaults to 32 (increased to 64 in some Xen version), and to solve the
issues I would supposedly need to raise it to 256. The results I get seem
to show something totally different.

New test rig:

- Ubuntu 18.04 LTS with everything from normal repositories, updated,
Xen 4.9
- i5-8500, 16 GB RAM, Samsung 850 EVO SSD,
- Windows 2016 installed on an LVM volume,
- Xen PV drivers 8.2.0 installed on Windows,
- logged into the VM using VNC from a laptop on the same local network.


I've tested this at a number of values of gnttab_max_frames from 4 to 4096.

Passmark provides consistent results at around 510 MB/s READ, 305 MB/s
WRITE, 330 MB/s Random ReadWrite, regardless of the setting of
gnttab_max_frames. I guess it does not saturate Xen's grant table
mechanism that much. But with ATTO, the situation is very different.

- gnttab_max_frames = 4
- Windows is very snappy and responsive, even under heavy load from ATTO.
- Atto shows good results, with some signs of saturation with packets
bigger than 512 KB.
- gnttab_max_frames = 10
- Windows starts out snappy but stops being responsive under heavy
load from ATTO.
- Atto shows mediocre results; saturation is very high with packets
bigger than 512 KB.
- gnttab_max_frames = 64
- You can feel that windows open a little bit slower; the system
feels dead under high load from ATTO.
- Atto shows bad results; saturation kills the system with packets
bigger than 512 KB. The system comes back OK after ATTO finishes.
- gnttab_max_frames = 256
- Even worse than 64; the results are similar to 64, but the system
just did not react. I got fed up with waiting.
- gnttab_max_frames = 4096
- Windows did not boot. I just got fed up with waiting.


Atto screenshots are here; each has a caption saying at which
gnttab_max_frames setting it was taken. A comment: if you run the ATTO
benchmark on a normal drive (or on old Xen with GPLPV drivers on Windows
2008) you get stable results from 64 KB up; the bars don't get shorter.
Shorter bars at larger packet sizes mean that the IO queue gets saturated
or there is some IO usage going on elsewhere - I made sure that does not
happen in these tests.

Screenshots: https://imgur.com/gallery/aUPSsCo

To sum up:

- Windows behaves better when I reduce gnttab_max_frames. Quite the
opposite of what the internet is saying.
- What did I do wrong?




--
Pozdrawiam
Jakub Kulesza
Paul Durrant
2018-07-31 07:51:55 UTC
De-htmling... Responses below...

-----
From: win-pv-devel [mailto:win-pv-devel-***@lists.xenproject.org] On Behalf Of Jakub Kulesza
Sent: 30 July 2018 16:08
To: win-pv-***@lists.xenproject.org
Subject: [win-pv-devel] Windows on Xen bad IO performance

[cut]
-----

As discussed on IRC, it would be useful if you tried the 8.2.2 drivers and also highly useful if you could capture logging from QEMU.

One other thing that occurs to me is that XENVBD implements indirect granting but this is relatively under-tested because the only backend that implements it is blkback, and we don't use that in XenServer. Whilst it may be slower overall, you might get more stability using QEMU qdisk. (We have a couple of performance fixes for this in the pipeline at Citrix as we are now starting to use it as our default backend, but it should be reasonable as-is).

Paul
Jakub Kulesza
2018-07-31 09:01:30 UTC
[cut]
Post by Paul Durrant
As discussed on IRC, it would be useful if you tried the 8.2.2 drivers and also highly useful if you could capture logging from QEMU.
One other thing that occurs to me is that XENVBD implements indirect granting but this is relatively under tested because the only backend that implements it is blkback, and we don't use that in XenServer. Whilst is may be slower overall, you might get more stability using QEMU qdisk. (We have a couple of performance fixes for this in the pipeline in Citrix as we are now starting to use it as our default backend, but it should be reasonable as-is).
Paul
I did test the 8.2.2 PV drivers. I did not manage to get QEMU logging though.
Will read more and retry.

Results on the i5-8500 rig - everything set the same as in the tests
mentioned above:

https://imgur.com/gallery/PTm5f4G

gnttab_max_frames = 4:
no or very few signs of saturation, everything is flying,
scores are better than with 8.2.0

gnttab_max_frames = default for ubuntu 18.04 (so 32 or 64)
saturation, system goes unresponsive, as bad as before

gnttab_max_frames = 256
saturation, system goes unresponsive, as bad as before

Passmark shows better results at all gnttab_max_frames settings:
Read: 514-515 MB/s (same as 8.2.0)
Write: 477 MB/s (better!)
Random ReadWrite: 300-360 MB/s (same as 8.2.0)

Is this behaviour (lowering max frames to get better results) working
as expected?

How low should I NOT go with max_frames?

Does XenServer recommend any Windows guest drivers if used with the QEMU backend?
--
Pozdrawiam
Jakub Kulesza
Paul Durrant
2018-07-31 09:44:10 UTC
-----Original Message-----
From: win-pv-devel [mailto:win-pv-devel-***@lists.xenproject.org] On Behalf Of Jakub Kulesza
Sent: 31 July 2018 10:02
Subject: Re: [win-pv-devel] Windows on Xen bad IO performance
[cut]
Is this behaviour (lowering max frames to get better results) working
as expected?
How low should I NOT go with max_frames?
In general you should not be lowering it from the default. The only thing that will achieve is starving the guest frontend of grants. If it is having a positive impact then that indicates a problem with the frontend.
Does XenServer recommend any windows guest drivers if used with qemu backend?
XenServer is basically using 8.2.1 plus some branding and workaround patches. We're likely to move to an 8.2.2 XENVBD though.

Paul
Jakub Kulesza
2018-09-27 22:07:31 UTC
OK, so I did some more tests.

The testbed:
* dom0 Debian Stretch with Xen 4.8.4
* 4 cores @ 2.66 GHz, 20 GB RAM
* 4 spinning disks in RAID 5 on a hardware controller; dd-tested reads of
about 77 MB/s, writes 58 MB/s
* domU Windows 2016, domU config with QEMU logging enabled:
https://pastebin.com/g8ddMVbV (an illustrative snippet follows this list)
* gnttab_max_frames left at default
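
(The pastebin above has the real config; purely as an illustration of one way
such logging can be enabled - the exact options here are an assumption, not a
quote from it - the xen_platform_log trace event can be turned on from the
domU config via the device model arguments:

    device_model_args = [ '-trace', 'enable=xen_platform_log' ]

with qemu-xen then writing those messages to its log under /var/log/xen/.)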

Test procedure:
* install windows 2016
* bcdedit /set testsigning on
* reboot (and create a snapshot; drivers are installed on the snapshot version
of Windows)
* install pv drivers
* reboot
* get Atto 3.05
* Atto all on default, except testing drive "d" (plain LVM, no
snapshot) and setting queue length to 10.

* QEMU log from install up to running the Atto test below (drivers installed:
the latest): https://pastebin.com/C1TasWtn

I think that Atto is quite a good indicator of how a heavily used server
will behave, as we see the same symptoms on another host running Windows
2016 in a domU with a heavily used MSSQL database.

== testing the latest drivers as of 2018-09-27 from
http://xenbits.xen.org/pvdrivers/win/

Atto test run in qemu log: https://pastebin.com/saq3N6PH
screenshot: https://imgur.com/gallery/ouTQo7b
The test takes a few minutes

What is wrong:
* notice the flat areas on the HDD graphs? This is when the system
becomes unresponsive. It recovers quite quickly, but the problem is
there.
* Reads and writes should not fall so low at 128 KB packets. 128 KB
should be at the level of 16, 32 and 64 KB and continue onwards at the
same level.

What is better than in the earlier experiments:
* the latest drivers do not make the system go nuts for minutes after
Atto finishes, and the system remains more or less usable during the test.

== testing pv drivers 8.2.0 (latest signed)

For this I created another snapshot of the system, so I could install
the drivers on a fresh Windows that had no previous version of the
drivers.

Atto test run in qemu log: https://pastebin.com/9PauBcUK
screenshot with results: https://imgur.com/gallery/HC2aSiW
the test takes about an hour (!) and another 20-30 minutes for the system to settle down.

What is wrong:
* system responsiveness is way worse than with the latest drivers; the VM is
unusable. SQL Server would refuse to serve queries with such IO waits.

What is different in the qemu logs is this:

***@1538082446.673267:xen_platform_log xen platform:
XENVBD|__BufferReaperThread:Reaping Buffers (8346 > 32)
***@1538082447.752598:xen_platform_log xen platform:
XENVBD|__BufferReaperThread:Reaping Buffers (1061 > 32)
***@1538082449.768223:xen_platform_log xen platform:
XENVBD|__BufferReaperThread:Reaping Buffers (1700 > 32)
***@1538082462.879887:xen_platform_log xen platform:
XENVBD|__BufferReaperThread:Reaping Buffers (2898 > 32)
***@1538082464.009918:xen_platform_log xen platform:
XENVBD|__BufferReaperThread:Reaping Buffers (5157 > 32)
***@1538082465.066077:xen_platform_log xen platform:
XENVBD|__BufferReaperThread:Reaping Buffers (966 > 32)

Reaping buffers does not happen with the latest drivers.

== questions:

* so you guys must have done something in the right direction since
8.2.0. BRAVO.
* what is the expected write and read speed on hardware that can
deliver (measured with dd) reads of about 77 MB/s and writes of 58 MB/s?
* do you guys plan to improve anything more? How can I help to test
and debug it?
* when are you planning to have the next signed release?
* how come Atto in a domU is getting better reads and writes than the
hardware for some packet sizes? Wouldn't it be wise to disable these
caches and allow Linux in dom0 (and its kernel) to handle the I/O of all
VMs?


Best regards, Jakub Kulesza
Jakub Kulesza
2018-09-28 07:46:45 UTC
I'll do a shorter summary, as my English was not precise enough last
night I guess:

* 8.2.0 drivers:
- Atto test took over an hour to complete
- the VM lost keyboard and responsiveness after Atto finished
- VM was unresponsive during the test, the results were bad
- configs, logs, results given below
- you could see the following in the logs:
XENVBD|__BufferReaperThread:Reaping Buffers (1700 > 32), which is absent in
the test of the latest drivers
* latest 2018-09-27 drivers
- Atto test took a few minutes to complete
- VM was responsive. It had its hiccups and was not as good as a bare
metal system, but I guess that right now this is on par with VMware.
KVM with virtio behaves better.
- configs, logs, results given below
- results were much better than with the non-PV-drivers version. Atto did
show that something saturates in the pipeline, but not very heavily.

So how can I help to get the latest patches released as a new, signed
version? Or is there more planned on the roadmap in terms of code
changes? I can definitely help with testing.

Does it still make sense to change gnttab_max_frames with the latest
changes in the PV drivers?
--
Pozdrawiam
Jakub Kulesza
Jakub Kulesza
2018-09-28 11:03:36 UTC
Fri, 28 Sep 2018 at 10:46 Paul Durrant <***@citrix.com> wrote:
[cut]
Post by Paul Durrant
Thanks for the very detailed analysis!
Actually 8.2.1 are the latest signed drivers.
Retesting this again on the same testbed. Results are exactly the same
as in the case of 8.2.0.

[cut]
Post by Paul Durrant
I notice from your QEMU log that you are suffering grant table exhaustion. See line 142 onwards. This will *severely* affect the performance so I suggest you expand your grant table. You'll still see the buffer reaping, but the perf. should be better.
I have compared gnttab_max_frames 32 and 128. Results:

== pv drivers 8.2.1, gnttab_max_frames=32 (Debian 9 default, same
testbed as the last tests)
Atto results: https://imgur.com/gallery/ElSwBqM
responsiveness: a tad better than 8.2.0, and the large-packet graph
shows this. IO saturation and dead IO graphs are still there. It's
better and noticeably more responsive than 8.2.0. Responsiveness
recovers instantly after Atto is done. Still bad, but better.
After Atto is done, Xen's VNC has lost its mouse. Keyboard works. Funny.
XENVBD|__BufferReaperThread:Reaping Buffers is there in the logs

== pv drivers 8.2.1, gnttab_max_frames=128 (same testbed as the last tests)
Atto results: https://imgur.com/gallery/7x8k2RS
responsiveness: up to Atto transfer sizes of 12 MB I cannot say if it's
different. IO saturation and dead IO graphs are still there. When it
started testing the 16 MB read, suddenly everything got unblocked like
magic. I need to do more testing. This looks unreal.
After Atto is done, the mouse did not get lost :)

XENVBD|__BufferReaperThread:Reaping Buffers (2305 > 32) is there in the logs.

# xl dmesg | grep mem | head -n 1
(XEN) Command line: placeholder dom0_mem=4096M gnttab_max_frames=128

I would say that in the case of Atto (which is REALLY IO heavy) the impact
is very marginal. On the other hand I see that SQL Server
workloads benefit from changing gnttab_max_frames.

Side note, what does this actually mean:
***@1538131510.689960:xen_platform_log xen platform:
XENBUS|GnttabExpand: added references [00003a00 - 00003bff]
***@1538131512.359271:xen_platform_log xen platform:
XENBUS|RangeSetPop: fail1 (c000009a)


[cut]
Post by Paul Durrant
Post by Jakub Kulesza
XENVBD|__BufferReaperThread:Reaping Buffers (966 > 32)
Reaping buffers does not happen with the latest drivers.
The fact that you are clearly seeing a lot of buffers is interesting in itself. The buffer code is there to provide memory for bouncing SRBs when the storage stack fails to honour the minimum 512 byte sector alignment needed by the blkif protocol. These messages indicate that Atto is not honouring that alignment.
Maybe Atto is not honouring it, but neither, it seems, is MS SQL. This is
visible when testing with Atto on both 8.2.1 and 8.2.0, and not visible on
9.0-dev-20180927. The 9.0-dev is getting lower results with smaller packet
sizes, but it is stable and working across the whole Atto test.
Post by Paul Durrant
Post by Jakub Kulesza
* so you guys must have done something in the right direction since
8.2.0. BRAVO.
The master branch has a lot of re-work and the buffering code is one of the places that was modified. It now uses a XENBUS_CACHE to acquire bounce buffers and these caches do not reap in the same way. The cache code uses a slab allocator and this simply frees slabs when all the contained objects become unreferenced. The bounce objects are quite small and thus, with enough alloc/free interleaving, it's probably quite likely that the cache will remain hot so little slab freeing or allocation will actually be happening so the bounce buffer allocation and freeing overhead will be very small.
Also the master branch should default to a single (or maybe 2?) page ring, even if the backend can do 16 whereas all the 8.2.X drivers will use all 16 pages (which is why you need a heap more grant entries).
Can this be tweaked somehow on the current 8.2.X drivers to get a single
page ring? max_ring_page_order on xen_blkback in dom0?
Post by Paul Durrant
Post by Jakub Kulesza
* what is the expected write and read speed on a harware that can
deliver (measured with dd) reads at about 77MB/s, and writes 58MB/s.
* do you guys plan to improve something more? How can I help to test
and debug it?
* when are you planning to have a next signed release?
All the real improvements are in master (not even in the as-yet-unsigned 8.2.2), so maybe we're nearing the point where a 9.0.0 release makes sense. This means we need to start doing full logo kit runs on all the drivers to shake out any weird bugs or compatibility problems, which takes quite a bit of effort so I'm not sure how soon we'll get to that. Hopefully within a few months though.
You could try setting up a logo kit yourself and try testing XENVBD to see if it passes... that would be useful knowledge.
Seems fun. Where can I read up on how to set up the logo kit?

Is there an acceptance testplan that should be run?

Is there a list of issues that you'll want to get fixed for 9.0? Is
Citrix interested right now in getting their customers' Windows VMs
running better :)? Testing Windows VMs on VMware the same way (with
VMware's paravirtual IO) is not stellar anyway; it looks crap when you
compare it to virtio on KVM. And 9.0-dev, I'd say, would be on par with
the big competitor.

Funny story: I've tried getting virtio QEMU devices running within a
Xen VM, but this is not stable enough. I managed to get the
device to show up in Windows, but didn't manage to put a filesystem on it
under Windows.
Post by Paul Durrant
Post by Jakub Kulesza
* how come Atto in a domU is getting better reads and writes than
hardware for some packet sizes? Wouldn't it be wise to disable these
caches and allow linux in dom0 (and it's kernel) to handle I/O of all
VMs?
We have no caching internally in XENVBD. The use of the XENBUS_CACHE objects is merely for bouncing so any real caching of data will be going on in the Windows storage stack, over which we don't have much control, or in your dom0 kernel.
ACK.


[cut]


--
Pozdrawiam
Jakub Kulesza
Paul Durrant
2018-09-28 12:00:20 UTC
-----Original Message-----
Sent: 28 September 2018 12:04
Subject: Re: [win-pv-devel] Windows on Xen bad IO performance
[cut]
== pv drivers 8.2.1, gnttab_max_frames=128 (same testbed as last tests)
Atto results: https://imgur.com/gallery/7x8k2RS
responsiveness: Up to atto transfer sizes of 12MB, cannot say if it's
different. IO saturation and dead IO graphs are still there. When it
started testing 16MB read, suddenly everything got unblocked like
magic. I need to do more testing. This looks unreal.
After atto is done, mouse did not get lost :)
XENVBD|__BufferReaperThread:Reaping Buffers (2305 > 32) is there in the logs.
At 16MB I suspect things suddenly became aligned and so all the bouncing stopped. Thus all the log spam ceased and things got a lot more stable.
[cut]
XENBUS|GnttabExpand: added references [00003a00 - 00003bff]
XENBUS|RangeSetPop: fail1 (c000009a)
Logically these messages should be read the other way round (I expect there was another GnttabExpand after that RangeSetPop).

When a new grant table page is added (by GnttabExpand) a new set of refs (in this case from 3a00 to 3bff) becomes available. These are added into the XENBUS_RANGE_SET used by the XENBUS_GNTTAB code. When something wants to allocate a ref then RangeSetPop is called to get an available ref. When that call fails it means the range set is empty and so a new page needs to be added, so GnttabExpand is called again to do that.
[cut]
can this be tweaked somehow on current 8.2.X drivers? to get a single
page ring? max_ring_page_order on xen_blkback in dom0?
Yes, tweaking the mod param in blkback will do the trick.
[cut]
seems fun. Where can I read on how to set up the logo kit?
See https://docs.microsoft.com/en-us/windows-hardware/test/hlk/windows-hardware-lab-kit
Is there an acceptance testplan that should be run?
I've not used the kit in a while but I believe it should automatically select all the tests relevant to the driver you elect to test (which is XENVBD in this case).
Is there a list of issues that you'll want to get fixed for 9.0? Is
Citrix interested right now in getting Windows VMs of their customers
running better :)?
Indeed Citrix should be interested, but testing and updating the branded drivers has to be prioritized against other things. Whether Citrix wants to update branded drivers does not stop me signing and releasing the Xen Project drivers though... it just means they won't get as much testing, so I'd rather wait... but only if it doesn't take too long.
Testing windows VMs on VMware the same way (with
VMware's paravirtual IO) is not stellar anyway, looks crap when you
compare it to virtio on KVM. And 9.0-dev I'd say would be on par with
the big competitor.
Funny story, I've tried getting virtio qemu devices running within a
XEN VM, but this is not stable enough. I have managed to get the
device show up in Windows, didn't manage to put a filesystem on it
under windows.
A lot of virtio's performance comes from the fact that KVM is a type-2 and so the backend always has full privilege over the frontend. This means that QEMU is set up in such a way that it has all of guest memory mapped all the time. Thus virtio has much less overhead, as it does not have to care about things like grant tables.

Cheers,

Paul
Jakub Kulesza
2018-09-28 12:51:10 UTC
Fri, 28 Sep 2018 at 14:00 Paul Durrant <***@citrix.com> wrote:
[cut]
[cut]
can this be tweaked somehow on current 8.2.X drivers? to get a single
page ring? max_ring_page_order on xen_blkback in dom0?
Yes, tweaking the mod param in blkback will do the trick.
Current debian defaults are:
log_stats=0
max_buffer_pages=1024
max_persistent_grants=1056
max_queues=4
max_ring_page_order=4

what would you tweak? max_queues and max_ring_page_order to 1?

[cut]
[cut]
See https://docs.microsoft.com/en-us/windows-hardware/test/hlk/windows-hardware-lab-kit
I've not used the kit in a while but I believe it should automatically select all the tests relevant to the driver you elect to test (which is XENVBD in this case).
I will read and see what I can do about this. I can sacrifice a few
evenings for sure.
[cut]
Indeed Citrix should be interested, but testing and updating the branded drivers has to be prioritized against other things. Whether Citrix wants to update branded drivers does not stop me signing and releasing the Xen Project drivers though... it just means they won't get as much testing, so I'd rather wait... but only if it doesn't take too long.
Ech, priorities, resources, deadlines. I'll hook you up on LinkedIn :)
[cut]
A lot of virtio's performance comes from the fact that KVM is a type-2 and so the backend always has full privilege over the frontend. This means that QEMU is set up in such a way that it has all of guest memory mapped all the time. Thus virtio has much less overhead, as it does not have to care about things like grant tables.
clear.
--
Pozdrawiam
Jakub Kulesza
Paul Durrant
2018-09-28 14:03:33 UTC
-----Original Message-----
Sent: 28 September 2018 13:51
Subject: Re: [win-pv-devel] Windows on Xen bad IO performance
[cut]
log_stats=0
max_buffer_pages=1024
max_persistent_grants=1056
max_queues=4
max_ring_page_order=4
what would you tweak? max_queues and max_ring_page_order to 1?
1 will give you a 2 page ring, which should be fine.
[cut]
I will read and see what I can do about this. I can sacrifice a few
evenings for sure.
Cool.
[cut]
ech, priorities, resources, deadlines. I'll hook you up on Linkedin :)
:-)

Cheers,

Paul
Paul Durrant
2018-09-28 14:04:42 UTC
-----Original Message-----
From: Paul Durrant
Sent: 28 September 2018 15:04
Subject: RE: [win-pv-devel] Windows on Xen bad IO performance
[cut]
what would you tweak? max_queues and max_ring_page_order to 1?
1 will give you a 2 page ring, which should be fine.
Sorry.. should have said set max_queues to 1 too. Multi-queue isn't that much use yet.

Paul
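
(A sketch for reference, not something quoted from this thread: on a
Debian-style dom0 where xen-blkback is loaded as a module, those two
parameters can be set persistently with a modprobe options file, e.g.

    # /etc/modprobe.d/xen-blkback.conf
    options xen-blkback max_ring_page_order=1 max_queues=1

taking effect after the module is reloaded or the dom0 rebooted. The resulting
values can then be checked under /sys/module/xen_blkback/parameters/, as in
the reply below.)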
Jakub Kulesza
2018-09-28 19:50:44 UTC
Permalink
Well, this turned out strange. It did not get better; it got worse.

Atto gives these results: https://imgur.com/gallery/D4erdER
So it's on par with 8.2.1 with gnttab at 32, but stability is worse than
before.

Settings from kernel:
# cat /sys/module/xen_blkback/parameters/*
0
1024
1056
1
1
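
Side note: the bare numbers above are the parameter values in alphabetical
order of the files under that sysfs directory. A slightly more
self-documenting way to dump them prints each value next to its parameter
name; given the values above, the output should look roughly like this:

# grep -H . /sys/module/xen_blkback/parameters/*
/sys/module/xen_blkback/parameters/log_stats:0
/sys/module/xen_blkback/parameters/max_buffer_pages:1024
/sys/module/xen_blkback/parameters/max_persistent_grants:1056
/sys/module/xen_blkback/parameters/max_queues:1
/sys/module/xen_blkback/parameters/max_ring_page_order:1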
--
Regards
Jakub Kulesza
Paul Durrant
2018-09-30 10:07:38 UTC
Permalink
-----Original Message-----
Sent: 28 September 2018 20:51
Subject: Re: [win-pv-devel] Windows on Xen bad IO performance
Well, this turned out strange. It did not get better; it got worse.
Atto gives these results: https://imgur.com/gallery/D4erdER
So it's on par with 8.2.1 with gnttab at 32, but stability is worse than
before.
# cat /sys/module/xen_blkback/parameters/*
0
1024
1056
1
1
I'm guessing the grant table exhaustion has gone, but the bounce buffering is still going to hurt... and that's just a consequence of the benchmark not honouring the alignment requirements :-(

Paul
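
As a quick cross-check of the alignment point, one can measure what the
backing LVM volume does with properly aligned, direct (unbuffered) I/O from
dom0. A minimal sketch, assuming a hypothetical volume path, and reading
rather than writing so the guest's data is not touched:

# Sequential read of the guest's LVM volume with O_DIRECT from dom0.
# bs=1M keeps every request page/sector aligned, so no bounce buffering is
# needed on this path; compare the figure with what ATTO reports inside the
# guest. /dev/vg0/win2016-disk is an illustrative path.
dd if=/dev/vg0/win2016-disk of=/dev/null bs=1M count=4096 iflag=direct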