View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000090||Cinelerra-GG||[All Projects] Feature||public||2018-12-30 19:53||2019-08-07 22:28|
|Target Version||Fixed in Version|
|Summary||0000090: Allow optional hardware-supported encoding during rendering|
|Description||Depending on the system, there might be hardware rendering (GPU) available. If so, the supported options are listed using the "vainfo" command in a terminal window. On Mint at least, you have to install vainfo first. I don't know how much effort it is to retrieve that info using native Cinelerra code.|
If hardware acceleration is available for a chosen format (say h.264 or h.265), could it be made available in the "compression" dropdown list in the video settings of the rendering window? Maybe the quality is lower than with pure software rendering, but if the hardware gives a considerable speedup, then certainly for non-professional use this would be a nice option to have.
|Tags||No tags attached.|
|Closing -- good discussions here though! You can now do "Allow optional hardware-supported encoding during rendering" as described in the notes and PDF file.|
Well, all those components (kernel part, 2D/video part for X, OpenGL, OpenCL, CUDA, Vulkan, nvenc/dec) are quite closely related. I don't think kernel developers are very happy about a 10 MB 'black box' module doing all those DMA and memory operations in kernel address space, at least if you want to send a kernel bug report.
With nouveau, the current load of some engines is implemented as a Gallium HUD (and some OpenGL extension). For clocks you can try:
0f: core 575 MHz shader 1438 MHz memory 850 MHz
AC: core 399 MHz shader 810 MHz memory 499 MHz
[as root, after mounting debugfs]
Earlier NV cards (up to GeForce 750) can be reclocked manually by echoing a higher performance level to this file:
echo 0f > /sys/kernel/debug/dri/0/pstate
0f: core 575 MHz shader 1438 MHz memory 850 MHz AC DC *
AC: core 576 MHz shader 1458 MHz memory 499 MHz
You see, in my case (nv92) the memory failed to reclock, a known problem. And even this level of reclocking requires a small kernel patch (not all nv92 GF 8800 cards actually have the same type of memory, so some are reclockable and some are not). With newer, higher-performance cards the boot clocks can be quite low, as low as 50 MHz (!) for some Fermi cards! No wonder anything OpenGL is actually just a bit faster than llvmpipe...
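The comparison made by eye above (requested level `0f` versus the clocks reported on the `AC:` line) can be sketched in Python. This is only an illustration: it assumes, as the note itself does, that the `AC:` line reports the clocks actually in effect. The function names and the 5% tolerance are hypothetical choices, not part of nouveau.

```python
import re

# Matches pairs such as "core 575 MHz" inside a pstate line.
CLOCK_RE = re.compile(r"(\w+)\s+(\d+)\s*MHz")

def parse_pstate_line(line):
    """Split a line like '0f: core 575 MHz ...' into (label, {domain: MHz})."""
    label, _, rest = line.partition(":")
    clocks = {name: int(mhz) for name, mhz in CLOCK_RE.findall(rest)}
    return label.strip(), clocks

def failed_domains(requested, actual, tolerance=0.05):
    """Return clock domains whose actual value is off by more than `tolerance`.

    The 5% default is an arbitrary assumption chosen for this example.
    """
    return sorted(
        domain
        for domain, mhz in requested.items()
        if abs(actual.get(domain, 0) - mhz) > tolerance * mhz
    )

# The two lines quoted above, after echoing 0f into the pstate file.
_, requested = parse_pstate_line("0f: core 575 MHz shader 1438 MHz memory 850 MHz AC DC *")
_, actual = parse_pstate_line("AC: core 576 MHz shader 1458 MHz memory 499 MHz")

print("not reclocked:", failed_domains(requested, actual))
```

With the values from this note, only the memory domain is reported, which matches the observation that core and shader reclocked while memory stayed at 499 MHz.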
There is a list of nouveau tasks, but they require a lot of knowledge to even try:
https://www.x.org/wiki/SummerOfCodeIdeas/ -> Nouveau (Open Source NVIDIA driver)
Just to give an idea of how much info must be parsed and passed into the driver for 'simple' h264 decoding:
The problem is definitely exacerbated by the fact that the firmware running on the card's video engines (there seems to be more than one: the Bitstream Processor handles sequential tasks, the Video Processor handles parallelizable tasks, and some Xtensa microcontroller coordinates them, IIRC) is ALSO closed-source!
There was a parallel effort on reverse-engineering some ARM video SoCs:
Maybe one can convince them to try to help nouveau, too (because it is used with some Tegra platforms, which have an ARM CPU).
I'll try to convince the nouveau developers to add some dmesg warnings about secure [GPU] boot/reclocking problems, so they will be more visible, rather than sitting on some project website that no one visits BEFORE running those cards with the default nouveau driver...
With Nvidia it is not only the driver; the bundled OpenGL/GLX also provides considerably more speed than the Mesa (SGI) version installed by the distribution. (glxinfo | grep vendor)
(Videos played with mpv stutter with the distribution's combination! As long as the drivers are delivered by the manufacturer, I see no reason why I should limit my paid-for hardware to a minimum.)
> How did you measure video engine load?
nvidia-settings -> Graphics Card Information,
shows among other things the current load of the GPU and the VE.
I have no doubt that they are doing their best and I am sure that if it were possible, they would gladly produce a driver at least as good as Nvidia's.
But I'm not a developer, I can't help them, and I need a driver that meets my current expectations for video editing with Cin-GG. For the moment, it is the Nvidia driver that allows me to work best.
re: nouveau (libre nvidia driver)
Some problems were explained on IRC a few days ago:
13:59 karolherbst: and we have no fix for tearing so far
14:00 karolherbst: what's missing is reverse engineering of the line buffer
14:00 karolherbst: I think... but it depends on what kind of tearing you got
16:20 imirkin_: as always, lack of interest/time is the reason
16:20 kreyren: how many active developers are there in nouveau?
16:20 imirkin_: maybe like 2, depending on how you account for things
14:33 imirkin: the gpu boots to low clocks
14:33 imirkin: the pmu must change to high clocks
14:34 kreyren: so can we do that on nouveau?
14:34 imirkin: on gm20x, it's a hilarious situation since we can actually change the clocks from non-secure mode
14:34 imirkin: but we *can't* control the fan
14:34 imirkin: on gp10x, we can't do the RE necessary to figure out how to perform the reclocking sequence
14:34 kreyren: is there an issue with overheating if we can't controll the fan? or it that controlled by GPU itself?
14:35 imirkin: the driver controls the fan
14:35 imirkin: so the gpu would overheat
14:35 imirkin: except on a laptop
See also the mailing list for current issues (the archives can also be interesting to read):
I just want to point out that nouveau's issues come from a real technosocial problem (hard-to-drive hardware with many complex processors on board, coupled with a serious lack of manpower), NOT because the devs are lazy or don't give a damn.
|It's been a long time since I tested the free driver for my Nvidia card. But from memory, with my display spread over three 1080p monitors, it was impossible for me to completely eliminate video tearing, whereas I manage it well with the Nvidia driver.|
|I certainly can't speak for everyone, but for my situation only the proprietary Nvidia drivers work best. With the free drivers I can't run two 4K monitors, that only works with the proprietary graphics drivers because the additional display port via the free drivers isn't activated properly. There were also problems with the 4K resolution. From time to time, I also play games and there you can see the differences enormously.|
Oh, are the Nvidia open source drivers that bad? I was not aware of that. I don't do gaming, so I have no need for a separate GPU.
Of course the end result is the most important. If only the Nvidia proprietary drivers work acceptably on Nvidia hardware, then indeed they should be supported if possible. I withdraw my argument that proprietary Nvidia drivers like Cuda should be avoided.
By the way, the video encoding/decoding is not Cuda or OpenCL, and this issue (90) was about hardware-accelerated encoding/decoding. I think that has been achieved. At least in my tests I did not see any difference between hardware and software output; I looked at the encoded file with VLC. vdpau was a little slower than vaapi.
Issue 215 is more about generic hardware acceleration such as Cuda or OpenCL.
Most people who work with Linux probably do it for the same reasons, because of the freedom of open source code. I try to use open source free software wherever possible. But I have to agree with Pierre. Nvidia is one area where I use the proprietary drivers for performance reasons. The free drivers are so bad on my notebook/laptop that I can't use the power of my laptop. I work almost exclusively with my laptop because I'm always on the go.
On laptops only Nvidia graphics cards actually work properly at the moment. For this reason I would leave it to the user which drivers he wants to work with and that includes Cuda. Nevertheless, I welcome any development concerning open source and would like to see the graphics cards on free driver basis become more powerful in the future, so that I can remove the last proprietary drivers. But until then I find this temporary solution acceptable.
My comment would probably require a separate discussion, but here it is.
I don't think Cin-GG should limit compatibility with proprietary drivers and their options such as Nvidia's, where possible.
I am fully aware of the advantages offered by the open source world... but not at the price of reduced or limited performance. Cin-GG aims as much as possible to match the possibilities and performance of professional video editing (NLE) software. Those who use such software in the still highly competitive context of audio-visual production know that it is essential to do everything possible to preserve the maximum quality of the source video sequences (regardless of their origin) and the efficiency of the subsequent processing. Let compatibility with proprietary software and codecs be a separate option, okay... but it should remain possible, easy and not limited.
@Olaf: yes, the vaapi and older vdpau interfaces address the specialized video decoding/encoding hardware blocks in a GPU, not the mostly game-related big GPU blocks. According to "https://wiki.archlinux.org/index.php/Hardware_video_acceleration" vdpau is legacy, vaapi is current for AMD/Intel/Nvidia, except when using the Nvidia closed source driver, for which Nvidia has a proprietary set of interfaces. I think Cin-GG should go the open source route where possible.
For that last reason I also think Cuda should be avoided - it is proprietary Nvidia. Much better to use OpenCL, which is supported by all three major GPU hardware vendors.
How did you measure video engine load?
Correct me if I'm wrong. Vdpau and Vaapi are drivers for the video engine. They are not drivers for GPU Utilization.
A test to use Vdpau or Vaapi should at least include the results of the load of the video engine. Something like this (only exemplary values):
Without Video Engine: 45% CPU load, 0% Video Engine load, Composer 18 fps
Comparison with Video Engine: 20% CPU load, 80% Video Engine load, Composer 25 fps
Be careful with rendering using the Video Engine, you should always check whether the same high-quality results are produced as with software rendering (CPU). Only then is a comparison possible.
(For the future, because of a request: Drivers for the Video Engine do not support ICC profiles.)
I reviewed your results and thank you for the tests you did.
Very interesting (I did not expect it!) that in the presence of effects RGBA-FLOAT is more efficient than RGBA-8bit. Maybe because Cin works internally in floating point and so it combines better with a color space of the same type?
It is also interesting that you set the number of CPUs to 10 (for an 8-thread CPU). I thought it was only valid for the plugin engine and not for the CinGG engine in general.
Vaapi seems better than Vdpau (which in fact Nvidia no longer develops in favor of Nvenc).
I have done some testing on the effects of using hardware accelerated video decoding and encoding. This is using the vaapi interface, on an AMD GPU. Details below.
CinelerraGG 2019-05 hardware acceleration tests on AMD Ryzen 5 2400G (3.6 GHz, 4core/8threads, built-in GPU), SSD storage.
Memory DDR4 2400 MHz, motherboard Asrock B450 Pro 4, BIOS 3.10 . No overclocking of CPU, GPU or memory.
OS software Mint XFCE 19.1, kernel 4.18.0-21, graphics driver amdgpu, Mesa 18.2.8 .
The XFCE panel app "cpu graph properties" was used to track CPU usage (it has low CPU usage itself). Update rate set to "slowest".
The hardware video decoding/encoding is done by special function units in the GPU, likely not affected by the size of the GPU but only by the clock speed, which I left at the BIOS default "auto". This is a fairly fast machine; on slower machines a video card might provide more of a speedup.
libva info: VA-API version 1.1.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_1
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.1 (libva 2.1.0)
vainfo: Driver version: Mesa Gallium driver 18.2.8 for AMD RAVEN (DRM 3.26.0, 4.18.0-21-generic, LLVM 7.0.0)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileVP9Profile2 : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
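The listing above can also be read mechanically. In VA-API, a `VAEntrypointVLD` entry means hardware decoding is offered for that profile and `VAEntrypointEncSlice` means hardware encoding. The following Python sketch groups profiles by entrypoint. It assumes the `vainfo` text has already been captured into a string; the function name `profiles_by_entrypoint` is hypothetical and is not Cinelerra code.

```python
import re

# A subset of the vainfo lines from the listing above (AMD Raven, Mesa 18.2.8).
VAINFO_SAMPLE = """\
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
"""

LINE_RE = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")

def profiles_by_entrypoint(text):
    """Map each entrypoint name to the sorted list of profiles that offer it."""
    found = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            profile, entrypoint = match.groups()
            found.setdefault(entrypoint, set()).add(profile)
    return {entrypoint: sorted(profiles) for entrypoint, profiles in found.items()}

caps = profiles_by_entrypoint(VAINFO_SAMPLE)
print("hardware encode:", caps.get("VAEntrypointEncSlice", []))
print("hardware decode:", caps.get("VAEntrypointVLD", []))
```

On this machine the encode list contains only the H.264 Main/High and HEVC Main profiles, while decoding covers more formats.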
Cin "Settings->Preferences->Performance, Use HW device" set to "none" or "vaapi", project SMP CPUs set to 10. The CPU has 8 threads max, but I never got close to 100% load unless I set it to 10.
Cin "Settings->Preferences->Appearance": YUV color space BT709, YUV color range JPEG.
Cin is always started from a terminal to see any errors. After changing HW device, exit and restart cin (to be sure).
Source HD video, 350MByte, mp4, 1920x1080x50p, 10Mb/s, 268 secs.
Load video as "replace current project".
If the video is loaded and HW device is vaapi, the terminal shows:
mesa: for the -simplifycfg-sink-common option: may only occur zero or one times!
mesa: for the -global-isel-abort option: may only occur zero or one times!
After loading, Settings->Format->Color Model is set to RGBA-8bit. Depending on the test, change it to RGBA-FLOAT.
Depending on the test, put the 3-way color effect on it but don't change anything in the effect's settings.
1. HW device none, RGBA-8 bit, effect none: CPU load 10%.
2. HW device none, RGBA-FLOAT, effect none: CPU load 12%.
3. HW device vaapi, RGBA-8 bit, effect none: CPU load 7%.
4. HW device vaapi, RGBA-FLOAT, effect none: CPU load 8%.
5. HW device none, RGBA-8 bit, effect Color 3-way: CPU load 73%.
6. HW device none, RGBA-FLOAT, effect Color 3-way: CPU load 66%.
7. HW device vaapi, RGBA-8 bit, effect Color 3-way: CPU load 60%.
8. HW device vaapi, RGBA-FLOAT, effect Color 3-way: CPU load 53%.
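To make the eight playback measurements easier to compare, the Python sketch below pairs each "HW device none" result with its vaapi counterpart and prints the difference. The CPU load numbers are copied from tests 1 to 8; the function name and the layout of the `loads` table are only illustrative.

```python
# CPU load in percent, keyed by (HW device, color model, effect), from tests 1-8.
loads = {
    ("none", "RGBA-8bit", "none"): 10,
    ("none", "RGBA-FLOAT", "none"): 12,
    ("vaapi", "RGBA-8bit", "none"): 7,
    ("vaapi", "RGBA-FLOAT", "none"): 8,
    ("none", "RGBA-8bit", "Color 3-way"): 73,
    ("none", "RGBA-FLOAT", "Color 3-way"): 66,
    ("vaapi", "RGBA-8bit", "Color 3-way"): 60,
    ("vaapi", "RGBA-FLOAT", "Color 3-way"): 53,
}

def vaapi_savings(table):
    """Return rows of (model, effect, software load, vaapi load, points saved, % saved)."""
    rows = []
    for (hw, model, effect), software in table.items():
        if hw != "none":
            continue
        hardware = table[("vaapi", model, effect)]
        saved = software - hardware
        rows.append((model, effect, software, hardware, saved, round(100 * saved / software, 1)))
    return rows

for model, effect, software, hardware, saved, percent in vaapi_savings(loads):
    print(f"{model:10} {effect:11} {software:3}% -> {hardware:3}%  (-{saved} points, {percent}% lower)")
```

With the effect active, vaapi decoding saves 13 percentage points of CPU load in both color models, which is the headroom the conclusion refers to.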
Render testing. This is done with the effect Color 3-way active and the HW device set to vaapi (this should have no effect on rendering). File format ffmpeg.mp4, insertion strategy "create new resources only".
Video compression settings depend on the test. Rendering times were taken both from the Cin terminal and manually with a stopwatch. When using h264_vaapi, the bitrate was set to 10 Mb/s, else it would encode at 31 Mb/s and produce a much bigger file. And the source was only 10 Mb/s.
9. RGBA-8 bit, bitrate 0, h264.mp4: CPU load wildly varying 50-98%, 954 secs, 14 fps, filesize 289 MByte, bitrate 8,6 Mb/s.
10. RGBA-FLOAT, bitrate 0, h264.mp4: CPU load wildly varying 55-98%, 1252 secs, 11 fps, filesize 287 MByte, bitrate 8,6 Mb/s.
11. RGBA-8 bit, bitrate 10000000, h264_vaapi.mp4: CPU load 51-63%, 741 secs, 18 fps, filesize 339 MByte, bitrate 10 Mb/s.
12. RGBA-FLOAT, bitrate 10000000, h264_vaapi.mp4: CPU load 46-60%, 1050 secs, 13 fps, filesize xxx MByte, bitrate xxx Mb/s.
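The fps and bitrate figures in tests 9 to 12 can be cross-checked against the source clip (268 secs at 50 fps, so about 13400 frames). The Python sketch below does that arithmetic. It assumes 1 MByte = 10^6 bytes and treats the whole file, audio included, as the bitrate. The helper names are hypothetical.

```python
SOURCE_SECONDS = 268   # clip length from the source description
SOURCE_FPS = 50        # 1920x1080x50p

def render_fps(render_seconds):
    """Average frames rendered per second of wall-clock time."""
    return SOURCE_SECONDS * SOURCE_FPS / render_seconds

def average_bitrate_mbps(filesize_mbyte):
    """Average bitrate in Mb/s, assuming 1 MByte = 10**6 bytes."""
    return filesize_mbyte * 8 / SOURCE_SECONDS

# test number -> (render time in secs, file size in MByte or None if not reported)
tests = {9: (954, 289), 10: (1252, 287), 11: (741, 339), 12: (1050, None)}

for number, (seconds, size) in tests.items():
    line = f"test {number}: {render_fps(seconds):.1f} fps"
    if size is not None:
        line += f", {average_bitrate_mbps(size):.1f} Mb/s"
    print(line)
```

The computed values round to the reported 14, 11, 18 and 13 fps and to 8.6 and 10 Mb/s, so the reported numbers are consistent with each other. The hardware encode in test 11 finishes in about 78% of the time of test 9.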
Whereas the Cin terminal output at the end of rendering normally looks like:
Render::render_single: Session finished.
** rendered 13406 frames in 267.184 secs, 50.175 fps
audio0 pad 64 0 (64)
in tests 11) and 12) it additionally displayed:
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
FFStream::decode: avcodec_send_packet failed.
err: Invalid data found when processing input
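When repeating such tests, the terminal output can be summarized automatically. The Python sketch below pulls the frame count, time and fps out of the `** rendered` line and counts the "Failed to get HW surface format." messages. The sample text simply combines the lines quoted above for illustration; the function name is hypothetical, and the regular expression only assumes the line format shown in this note.

```python
import re

SUMMARY_RE = re.compile(r"\*\* rendered (\d+) frames in ([\d.]+) secs, ([\d.]+) fps")
HW_FAILURE = "Failed to get HW surface format."

def summarize_render_log(log_text):
    """Return ((frames, seconds, fps) or None, number of HW surface failures)."""
    summary = None
    match = SUMMARY_RE.search(log_text)
    if match:
        summary = (int(match.group(1)), float(match.group(2)), float(match.group(3)))
    failures = sum(1 for line in log_text.splitlines() if line.strip() == HW_FAILURE)
    return summary, failures

# Lines quoted in this note, combined into one sample for illustration only.
SAMPLE_LOG = "\n".join(
    [
        "Render::render_single: Session finished.",
        "** rendered 13406 frames in 267.184 secs, 50.175 fps",
        "audio0 pad 64 0 (64)",
    ]
    + [HW_FAILURE] * 7
    + [
        "FFStream::decode: avcodec_send_packet failed.",
        "err: Invalid data found when processing input",
    ]
)

summary, failures = summarize_render_log(SAMPLE_LOG)
print("summary:", summary, "HW surface failures:", failures)
```

A non-zero failure count like this one is a signal to compare the rendered file against a software render before trusting the timing numbers.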
My conclusion is that hardware-accelerated de- and encoding provides definite benefits. Especially on encoding it leaves more CPU over to do more effects. The GPU document only mentions Intel for vaapi, it should mention AMD as well.
Cuda is very different from hardware decoding/encoding such as controlled by the vaapi/vdpau interfaces. It should be compared to OpenCL, I think.
And OpenCL is vendor-independent, unlike Cuda. I think enabling the hardware de/encoding should be separate from using a GPU compute interface like Cuda/OpenCL in the performance tab.
I'll be posting some vaapi tests after this.
|What a positive surprise that the integration of CUDA has worked. Unfortunately I am not able to test it currently, because I still have outdated drivers installed. I will switch to the latest OpenSuse version in a few days anyway and then use this opportunity to install the latest graphics drivers and also test CUDA. In a few days I will add this new feature to the feature page. Thanks again for the great work.|
With the addition of an optional Cuda build/usage, this task is complete as far as I know. A document covering all of the encode/decode/Cuda parts has been updated and is temporarily at:
It has been added locally to the manual which will be updated later. I will mark this resolved later in case anyone wants to add a comment yet.
Hardware acceleration of encoding for rendering is now available for most graphics cards. It will automatically be built into the monthly builds. The board vendors have limited which codecs they implemented. Currently available are:
Nvidia: h264 and h265 (h265 works only on later Nvidia boards - at least for Maxwell chip boards).
Intel HD graphics: h264, mpeg2, mjpeg, h265, and possibly vp8/vp9 (h264 and mpeg2 work on, for example, Intel Broadwell).
|As Andrew stated, this is partially implemented with ENCODING, but not for Nvidia graphics boards which need NVENC. I will leave this open until gg has a chance to look into ffmpeg having nvenc removed from the non-free list.|
Partially present since https://git.cinelerra-gg.org/git/?p=goodguy/cinelerra.git;a=commit;h=1d4f5d708de0d8ec19300b417354a3374d00ed47
See email at https://lists.cinelerra-gg.org/pipermail/cin/2019-May/000603.html
|2018-12-30 19:53||MatN||New Issue|
|2019-05-14 04:12||Andrew-R||Note Added: 0001521|
|2019-05-16 03:25||PhyllisSmith||Assigned To||=> goodguy|
|2019-05-16 03:25||PhyllisSmith||Status||new => assigned|
|2019-05-16 03:27||PhyllisSmith||Status||assigned => acknowledged|
|2019-05-16 03:27||PhyllisSmith||Note Added: 0001547|
|2019-06-16 22:04||PhyllisSmith||Status||acknowledged => feedback|
|2019-06-16 22:04||PhyllisSmith||Note Added: 0001735|
|2019-06-21 00:37||PhyllisSmith||Note Added: 0001744|
|2019-06-21 00:50||Sam||Note Added: 0001747|
|2019-06-23 20:44||MatN||Note Added: 0001772|
|2019-06-23 20:44||MatN||Status||feedback => assigned|
|2019-06-23 20:59||MatN||Note Added: 0001773|
|2019-06-24 16:24||Andrea_Paz||Note Added: 0001774|
|2019-06-25 08:13||Olaf||Note Added: 0001775|
|2019-06-25 17:32||MatN||Note Added: 0001776|
|2019-06-25 18:25||Pierre||Note Added: 0001779|
|2019-06-25 19:26||Sam||Note Added: 0001780|
|2019-06-25 19:28||Sam||Note Edited: 0001780||View Revisions|
|2019-06-25 20:51||MatN||Note Added: 0001783|
|2019-06-25 21:54||Sam||Note Added: 0001784|
|2019-06-25 22:04||Pierre||Note Added: 0001785|
|2019-06-25 23:43||Andrew-R||Note Added: 0001786|
|2019-06-26 00:14||Pierre||Note Added: 0001787|
|2019-06-26 09:11||Olaf||Note Added: 0001789|
|2019-06-26 14:56||Andrew-R||Note Added: 0001793|
|2019-08-07 22:28||PhyllisSmith||Status||assigned => closed|
|2019-08-07 22:28||PhyllisSmith||Resolution||open => fixed|
|2019-08-07 22:28||PhyllisSmith||Note Added: 0002004|