Multiple GPUs / HW device selection / FFStream decode debugging / nvidia eGPU / VDPAU  


PhyllisSmith
(@phyllissmith)
Member Moderator
Joined: 2 years ago
Posts: 155
25/04/2020 7:28 pm  

@Ig0r

Thanks for the feedback.  Well, running prof2 is more at a programmer level, and it is not that easy to explain how to proceed.  However, meanwhile another performance improvement has been implemented here which affects redrawing the video/audio on the timeline, but I do not think that will help with your video.  The test file I have been using here is Big Buck Bunny, which is 3840x2160 at 60fps, and "Play every frame" is almost always near 60.  That is using either the X11 (software) or the X11-OpenGL video driver and no hardware acceleration, but I have 16 CPUs on this laptop. 

I think that upgrading to LTS 20 is probably a good idea, BUT we have not yet created a partition here to install it on, which we need in order to create a new build on April 30th, so it has not been tested.  We will do that this coming week so that it is available, but there will be very little specific testing.  When we do that, I will check how to run "prof2" on it and see what is taking so much time.  Meanwhile, another user has provided some profile information that we can look at.


PhyllisSmith
(@phyllissmith)
Member Moderator
Joined: 2 years ago
Posts: 155
25/04/2020 7:40 pm  

@Ig0r

Do not worry about getting back to this in any time frame.  You are busy and I have plenty of other things to do! 

About the prof2 error: if/when you have time, just skip the step that did not work, that is:

cp: target '/usr/local/bin/' is not a directory

and try to run the next steps, and let me know if there is another problem.  We had tested prof2 on Ubuntu 18 a few months ago and at that time we did not have to do the "cp" step, so it may not even be needed.
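For reference, GNU `cp` prints `target '...' is not a directory` when it is given several source files and the destination does not exist as a directory. The prof2 instructions themselves are not quoted in this thread, so this is only a minimal Python sketch, assuming the skipped step copies a few files into `/usr/local/bin/` and turns out to be needed at all: it creates the target directory first (the equivalent of running `mkdir -p` before the `cp`) and then copies each file. The helper name `copy_into_dir` is hypothetical.

```python
import os
import shutil


def copy_into_dir(sources, target_dir):
    """Copy each file in `sources` into `target_dir`, creating the directory first.

    Hypothetical helper, equivalent to: mkdir -p target_dir && cp -p sources... target_dir
    """
    # A multi-file cp fails unless the target already exists as a directory,
    # so create it (no error if it is already there).
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for src in sources:
        # copy2 keeps permission bits and timestamps, like `cp -p`,
        # and returns the path of the new file inside target_dir.
        copied.append(shutil.copy2(src, target_dir))
    return copied
```

Writing into `/usr/local/bin` normally needs root, so a script using this would have to be run with `sudo`.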


Ig0r
(@ig0r)
Active Member
Joined: 2 months ago
Posts: 19
26/04/2020 9:05 am  

@phyllissmith

WOW, this high resolution at 60FPS sounds like quite heavy work for the hardware. But as I said, using the VAAPI approach I do get superb performance (see the attached screenshot): almost a steady 60FPS too. Via the eGPU, however (even though we have since confirmed that VDPAU is initialized successfully), preview playback lags far behind (only ~40FPS, with stuttering), while eGPU rendering is awesome! The slow preview playback on the eGPU is what we are now trying to debug further here. 

And yeah, I have an old (but gold!) X230 with an Intel i5-3320M (4) @ 3.300GHz, but you know the whole story is about getting the most out of old hardware and letting newer hardware (the GTX970 eGPU) do the heavy lifting.

I tried to follow along, but I failed on the second-to-last step:

~/Downloads/cin-ub18-debug/prof2$ ./prof -o /tmp/prof_list.txt /home/ig0r/Downloads/cin-ub18-debug/cin
cant find symbol 'main'

I'm not sure if this has something to do with the "cp" step we just skipped. The two required packages are definitely installed. Whenever you have time, just let me know how to proceed; I'd like to help you with this before I switch to 20.04 LTS (since I think they implemented quite a few new features, especially around GPU support, among other things the "Launch using dedicated graphics card" GNOME Shell integration: https://ubuntu.com/blog/whats-new-in-ubuntu-desktop-20-04-lts).

Enjoy your weekend! All the best.


PhyllisSmith
(@phyllissmith)
Member Moderator
Joined: 2 years ago
Posts: 155
27/04/2020 1:27 am  

@Ig0r

I keep losing focus. So getting back to the following:

The slow preview playback on the eGPU is what we are now trying to debug further here. 

At least once I did something that brought the FPS down from the expected 60 to 29, but now I cannot repeat it. I was hoping that would provide a hint as to why the eGPU does not run as well. I have to go back over this forum thread to refresh my memory.

 

Meanwhile, GG downloaded Ubuntu 20 and created a Cinelerra-GG static build there, for later if we get this debugged.  It is at:

https://cinelerra-gg.org/download/testing/cinelerra-5.1-ub20-x86_64.static.txz


PhyllisSmith
(@phyllissmith)
Member Moderator
Joined: 2 years ago
Posts: 155
27/04/2020 7:39 pm  

@Ig0r

Today we ran some more tests on our GTX970 computer using Big Buck Bunny 3840x2160, which I had converted from mkv to mp4.  I uploaded that exact test file to your video_debug shared drive (please delete it after you download it, as I am not sure about the licensing).

With "vdpau" it consistently plays at 60fps and 88% cpu usage.

With "vaapi" it consistently plays at 60fps and 334% cpu usage (it is probably emulating).

With "none" it consistently plays at 58fps and 380% cpu usage.

Conclusion that we have come to here might be as you stated earlier:

"It could be that we now reached the inherent bottleneck of the setup is the by far less optimum communication between CPU and eGPU vs the inherent optimization between the intel cpu and the onboard "gpu""

But GG cannot explain why rendering, which you said runs at full GPU, should be any faster than playback on your eGPU, since the same sort of traffic goes back and forth across the ExpressCard bus/slot.

Unfortunately, there is no further debugging we can think of doing from here without the same setup.  The one suggestion GG had is to run "top" (or an equivalent) on the command line, updating every 5 seconds, and watch whether the CPU usage drops very low and then suddenly jumps very high, back and forth.  That might indicate that the software is doing something time-consuming.  I do not know if running the profiler, prof2, would lead to any discovery.
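GG's suggestion amounts to running `top -d 5` (refresh every 5 seconds) and watching the numbers. For a record that is easier to scan afterwards, here is a minimal Python sketch, assuming a Linux system with `/proc/stat`: it samples the aggregate CPU counters every 5 seconds and prints system-wide usage (so other processes are included) on the scale `top` uses for per-process %CPU, where 100% is one fully busy core, the same scale as figures like 334% above. The helper names and the 150-point `swing` threshold for flagging sudden low/high jumps are arbitrary assumptions, not anything from Cinelerra.

```python
import os
import time


def read_cpu_times(path="/proc/stat"):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    with open(path) as f:
        fields = f.readline().split()
    # fields[0] is 'cpu'; then user nice system idle iowait irq softirq steal ...
    values = [int(v) for v in fields[1:]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    # guest/guest_nice (fields 9 and 10) are already included in user/nice.
    total = sum(values[:8])
    return total - idle, total


def top_style_percent(before, after, ncpu):
    """Usage between two samples, scaled like top's %CPU (100% = one core)."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    if total <= 0:
        return 0.0
    return 100.0 * ncpu * busy / total


def watch(interval=5.0, swing=150.0, samples=12):
    """Print system-wide usage every `interval` seconds, flagging big jumps."""
    ncpu = os.cpu_count() or 1
    prev_sample, prev_pct = read_cpu_times(), None
    for _ in range(samples):
        time.sleep(interval)
        sample = read_cpu_times()
        pct = top_style_percent(prev_sample, sample, ncpu)
        flag = ""
        if prev_pct is not None and abs(pct - prev_pct) >= swing:
            flag = "  <-- sudden swing"
        print(f"{pct:6.1f}%{flag}")
        prev_sample, prev_pct = sample, pct
```

Start playback in Cinelerra, then call `watch()` from a Python session in another terminal; with the defaults it runs for about a minute.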


Ig0r
(@ig0r)
Active Member
Joined: 2 months ago
Posts: 19
15/05/2020 8:53 pm  

@phyllissmith

It's been a while, sorry about that. In the meantime I upgraded to Ubuntu 20.04 LTS, installed all the NVIDIA drivers and did some testing. Some preliminary notes on that up front:

  • The command for using the eGPU in mpv changed a bit. mpv now actually complains about the pixel format of the GoPro footage:

mpv --vo=vdpau --hwdec=vdpau video.mp4

gives

[vo/vdpau] Warning: this compatibility VO is low quality and may have issues with OSD, scaling, screenshots and more.
[vo/vdpau] vo=gpu is the preferred choice in any case and includes VDPAU support via hwdec=vdpau or vdpau-copy.
[ffmpeg] AVHWFramesContext: Unsupported sw format: yuvj420p
Failed to allocate hw frames.

An extra option works around the issue, giving almost zero CPU usage with the eGPU taking the full load:

mpv --hwdec-image-format=yuv420p --vo=vdpau --hwdec=vdpau video.mp4

This at least shows that handling yuvj420p is not a piece of cake, but that it is somehow possible to fully utilize the GPU for optimum performance.

  • Further proof that the GPU handles this format very well is a transcoding test using ffmpeg, which is blazing fast (~3x normal playback speed) with almost 0% CPU usage:

ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -vsync 0 -i video.mp4 -c:v h264_nvenc -b:v 60M -maxrate:v 61M -bufsize:v 80M -profile:v main -rc:v vbr_hq -rc-lookahead:v 32 -spatial_aq:v 1 -aq-strength:v 15 -coder:v cabac -f mp4 h264_test_dec.MP4

This also produces the video in yuv420p, which could be used in case we find out that it really is only the yuvj420p pixel format giving us a hard time (which I don't think; read below). 

  • Cross-checking with VLC also works as expected: VDPAU is used as the hardware decoder.
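Some background on the `yuvj420p` format that mpv's VDPAU path complains about: the `j` pixel formats are FFmpeg's older, now deprecated way of marking full-range ("JPEG") YUV, in which 8-bit luma and chroma use the whole 0 to 255 range, whereas ordinary `yuv420p` video is normally limited ("TV") range, with luma 16 to 235 and chroma 16 to 240. This is also what swscaler's "deprecated pixel format used, make sure you did set range correctly" warning in the Cinelerra logs is about. The minimal Python sketch below shows the standard 8-bit full-to-limited mapping; the function names are made up for illustration, and real scalers use fixed-point arithmetic, so their results can differ by one code value.

```python
def full_to_limited_luma(y):
    """Map 8-bit full-range ("JPEG", yuvj) luma 0..255 to limited range 16..235."""
    return round(16 + y * 219 / 255)


def full_to_limited_chroma(c):
    """Map 8-bit full-range chroma 0..255 (128 = neutral) to limited range 16..240."""
    return round(128 + (c - 128) * 224 / 255)


# Black, mid-grey and white luma, then the chroma extremes and the neutral point.
print([full_to_limited_luma(v) for v in (0, 128, 255)])    # [16, 126, 235]
print([full_to_limited_chroma(v) for v in (0, 128, 255)])  # [16, 128, 240]
```

Treating full-range data as limited range (or the other way round) without this mapping gives crushed or washed-out contrast, which is why the scaler wants the range set explicitly.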

In terms of testing I think we're good to go: the eGPU is fully operational, with full hardware decoding support.


Switching back to Cinelerra (I used the static build from 30.04.2020, thanks for that!), things get messy again. 

  • First, I checked whether everything works using the Intel VAAPI path. And yes, the performance is indeed as good as in the past (I'm happy to point this out again: this is still the only video editing software I have found with such great hardware playback support!). 

Trying VAAPI with your Big Buck Bunny test video gives a fairly good 48fps at moderate CPU usage. A libva message about /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so pops up which I never saw before. As a matter of fact, if I remember correctly, I already did the Big Buck Bunny test on the old Ubuntu and that gave me a permanently smooth 60fps, so there is still something fishy with the current VAAPI drivers, but it is already very usable:

~/Program_data/cinelerra_static$ ./cin
Cinelerra Infinity - built: Apr 30 2020 09:07:59
git://git.cinelerra-gg.org/goodguy/cinelerra.git
(c) 2006-2019 Heroine Virtual Ltd. by Adam Williams
2007-2020 mods for Cinelerra-GG by W.P.Morrow aka goodguy
Cinelerra is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. There is absolutely no warranty for Cinelerra.

[h264 @ 0x7f3cec2d9c00] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f3cec2d8d40] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f3cec277700] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f3cec3fb780] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f3cec3fa580] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f3c817cd840] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f3c84011380] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f3cec275400] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f3c8403b840] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVHWDeviceContext @ 0x7f3c8403d800] Trying to use DRM render node for device 0.
[AVHWDeviceContext @ 0x7f3c8403d800] libva: VA-API version 1.7.0
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Found init function __vaDriverInit_1_7
[AVHWDeviceContext @ 0x7f3c8403d800] libva: /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so init failed
[AVHWDeviceContext @ 0x7f3c8403d800] libva: va_openDriver() returns 1
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
[AVHWDeviceContext @ 0x7f3c8403d800] libva: Found init function __vaDriverInit_1_6
[AVHWDeviceContext @ 0x7f3c8403d800] libva: va_openDriver() returns 0
[AVHWDeviceContext @ 0x7f3c8403d800] Initialised VAAPI connection: version 1.7
[AVHWDeviceContext @ 0x7f3c8403d800] VAAPI driver: Intel i965 driver for Intel(R) Ivybridge Mobile - 2.4.0.
[AVHWDeviceContext @ 0x7f3c8403d800] Driver not found in known nonstandard list, using standard behaviour.
[h264 @ 0x7f3c859e1600] Reinit context to 3840x2160, pix_fmt: vaapi_vld
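The libva lines in this log look mostly informational: libva first tries Intel's newer iHD driver, whose initialization fails, then falls back to the older i965 driver, which succeeds and reports Ivybridge. As far as I know, iHD (the Intel Media Driver) only supports Broadwell and newer GPUs, so on the Ivy Bridge i5-3320M this fallback is expected. When comparing logs from different runs, a small Python helper can list each driver libva tried and whether it loaded. This is a minimal sketch that assumes the exact message wording shown above and treats a `va_openDriver()` return value of 0 (`VA_STATUS_SUCCESS`) as success; the function name is hypothetical.

```python
import re

# Patterns for the libva messages as they appear in the log above (assumed wording).
OPEN_RE = re.compile(r"libva: Trying to open (\S+)")
RET_RE = re.compile(r"libva: va_openDriver\(\) returns (-?\d+)")


def libva_driver_attempts(log_text):
    """Return [(driver_path, loaded_ok), ...] in the order libva tried them."""
    attempts = []
    current = None
    for line in log_text.splitlines():
        m = OPEN_RE.search(line)
        if m:
            current = m.group(1)  # remember which driver is being tried
            continue
        m = RET_RE.search(line)
        if m and current is not None:
            # 0 is VA_STATUS_SUCCESS; any other status means this driver failed.
            attempts.append((current, int(m.group(1)) == 0))
            current = None
    return attempts
```

Run on the log above, it should report iHD_drv_video.so as failed and i965_drv_video.so as loaded.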

 

  • I did another test using my old short test video, and damn, the performance is great! A steady 60fps in both the Viewer and the Compositor.
  • So let's check the VDPAU stuff, and this is where sh** hits the fan again. Trying your BBB video gives me a lousy 20fps after "Successfully created a VDPAU device...". On top of that, the CPU usage is quite high (~70%) AND the GPU load is high (~95%, with ~60% video engine utilization).

~/Program_data/cinelerra_static$ ./cin
Cinelerra Infinity - built: Apr 30 2020 09:07:59
git://git.cinelerra-gg.org/goodguy/cinelerra.git
(c) 2006-2019 Heroine Virtual Ltd. by Adam Williams
2007-2020 mods for Cinelerra-GG by W.P.Morrow aka goodguy
Cinelerra is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. There is absolutely no warranty for Cinelerra.

[h264 @ 0x7f68f0170540] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f68f0177700] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68f016e240] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68f02d1ac0] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f68f02d0bc0] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68dd7cd300] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68dd7e5b00] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVHWDeviceContext @ 0x7f68dd7d0900] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 435.21 Sun Aug 25 08:06:02 CDT 2019) on X11 display :1
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[h264 @ 0x7f68d4010e00] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVIOContext @ 0x7f68f02e1000] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68d4039e80] Reinit context to 3840x2160, pix_fmt: yuvj420p
[swscaler @ 0x7f68deaf9000] deprecated pixel format used, make sure you did set range correctly
[AVHWDeviceContext @ 0x7f68d4016700] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 435.21 Sun Aug 25 08:06:02 CDT 2019) on X11 display :1
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[h264 @ 0x7f68d403ca40] Reinit context to 3840x2160, pix_fmt: vdpau
[h264 @ 0x7f68d403ca40] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68deb221c0] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68d4158780] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f69389901c0] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f69389cdc80] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f6938075980] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[AVIOContext @ 0x7f68d4228bc0] Statistics: 9614994 bytes read, 3 seeks
[AVIOContext @ 0x7f68d4017fc0] Statistics: 1069060 bytes read, 2 seeks
[h264 @ 0x7f68b800ec40] Reinit context to 3840x2160, pix_fmt: yuvj420p
[h264 @ 0x7f68b802d640] Reinit context to 3840x2160, pix_fmt: yuvj420p
[AVHWDeviceContext @ 0x7f68b85293c0] Successfully created a VDPAU device (NVIDIA VDPAU Driver Shared Library 435.21 Sun Aug 25 08:06:02 CDT 2019) on X11 display :1
[h264 @ 0x7f68b86d5e80] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68b9cea540] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ddcdf500] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68ba6a6840] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68dd857a00] deprecated pixel format used, make sure you did set range correctly
[h264 @ 0x7f68dd9c0fc0] Reinit context to 3840x2160, pix_fmt: vdpau
[swscaler @ 0x7f68ba6a7980] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69f340] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69f340] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69e7c0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x7f68ba69

  • A quick test with my old short test video gives similar results. 
  • I also did some tests with videos in the yuv420p pixel format, with similar performance. So the pixel format is not to blame after all, and the devices are initialized correctly.
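Logs like the VDPAU one above get long quickly. Here is a minimal Python sketch for summarizing them, assuming the console output was saved to a file (for example with `./cin 2>&1 | tee cin.log`; the file name is arbitrary). It counts decoder re-initializations per pixel format (hardware `vdpau` versus software `yuvj420p`), VDPAU device creations, and swscaler "deprecated pixel format" warnings. swscaler prints that warning when it sets up a conversion context for a `yuvj` format, so a long run of them suggests software scaling being set up over and over; that is a hint worth checking, not a diagnosis. The function and key names are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "[h264 @ 0x...] Reinit context to 3840x2160, pix_fmt: vdpau"
REINIT_RE = re.compile(r"Reinit context to \d+x\d+, pix_fmt: (\w+)")


def summarize_ffmpeg_log(log_text):
    """Tally decoder pixel formats, VDPAU device creations and swscaler warnings."""
    pix_fmts = Counter()
    vdpau_devices = 0
    swscale_warnings = 0
    for line in log_text.splitlines():
        m = REINIT_RE.search(line)
        if m:
            pix_fmts[m.group(1)] += 1  # one decoder context (re)initialization
        elif "Successfully created a VDPAU device" in line:
            vdpau_devices += 1
        elif line.startswith("[swscaler") and "deprecated pixel format" in line:
            swscale_warnings += 1  # one scaler context set up for a yuvj format
    return {
        "pix_fmts": dict(pix_fmts),
        "vdpau_devices": vdpau_devices,
        "swscale_warnings": swscale_warnings,
    }
```

Usage: `summarize_ffmpeg_log(open("cin.log").read())`. Comparing the result for the VAAPI and VDPAU runs side by side makes it easier to see which path keeps falling back to software conversion.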

Conclusion:

For now I use Cinelerra with the VAAPI driver only, since the eGPU doesn't give me any advantage. So yes, you were right: I was too euphoric when I said rendering is faster on the eGPU. It only seemed so from the GPU usage. That is kind of misleading, since the GPU is now also very heavily used, without any performance advantage. 

Question:

Are you guys still motivated to debug this further? It's hard, as you said, since everything seems to work fine on your setup, even with the same GPU. Still, it's such a shame that the hardware is capable of everything, but somehow there is a missing link to fully utilize it in Cinelerra. I stand by it: the software would take another huge leap forward compared to other software out there. I hope to have some more time on my hands soon so I can help however I can. 

Please let me know if and how we proceed. Thanks so much for your time! All the best,
Christopher


PhyllisSmith
(@phyllissmith)
Member Moderator
Joined: 2 years ago
Posts: 155
15/05/2020 10:02 pm  

@ig0r

We would definitely like to have this working at full GPU usage, but right now we only have one idea to try.  That would be for us to install Ubuntu 20 on the same computer that had Fedora, which we were using for testing, to see if there is some difference there that affects how Cinelerra runs with a GPU (still no eGPU, but it might tell us something).

 

I will have to read your last note in more detail to see if I can come up with any further debugging techniques that might lead to a solution.  Right now we have no ideas that you could try on your side.  It is quite puzzling.

PhyllisSmith
(@phyllissmith)
Member Moderator
Joined: 2 years ago
Posts: 155
16/05/2020 8:51 pm  

@ig0r

We installed Ubuntu 20.04 and the nvidia 390 drivers, and with default settings using vdpau, Big Buck Bunny played at 60fps.   BUT, believe it or not, when we switched Settings->Preferences, Playback A, Video driver to X11-OpenGL, we only got 13 fps.

 

So please check that you have "X11" set as the Video driver and test BBB again.  That is, we reproduced a problem that matches yours, except for the eGPU, and completely solved it by getting 60fps with X11.  Let us know the results; I hope this works for you too.  (It must be thrashing, and that would explain why MPV, etc. do not exhibit the same slowdown: they most likely use only software and not OpenGL in this situation.)

Ig0r
(@ig0r)
Active Member
Joined: 2 months ago
Posts: 19
17/05/2020 8:50 am  

@phyllissmith

Unfortunately that's not it. I tried this setting several times, and while the performance with OpenGL is indeed even worse (~6fps), the default X11 option gives me ~24fps using the eGPU (see the attached screenshot). You'll also see some performance stats from NVIDIA Settings and the CPU usage from nmon.

OK, it seems we're stuck now, but great that you are still motivated! Mine flared up again too! 🙂 

Let me know if you have further ideas - I'll continue with some testing on my own here.


PhyllisSmith
(@phyllissmith)
Member Moderator
Joined: 2 years ago
Posts: 155
17/05/2020 3:49 pm  

@ig0r

Actually you gave GG an idea, so thanks for testing X11-OpenGL along with X11 as the video driver.  There must be something MPV does when setting up the hardware that avoids potential GPU thrashing.  There are probably some graphics experts out there who might know the answer, but we are mostly lacking in this area.  GG might look at the MPV code to see if there is some setup that can solve the issue, but that is a lot of code, so narrowing it down may be difficult.  It does point to the eGPU, though, since we do not have the problem with our GTX970 when using the X11 driver.

 

Meanwhile, I will search the web to see if I can find a clue too.  So we are not done yet!

