0000090: Allow optional hardware-supported encoding during rendering

ID	Project	Category	View Status	Date Submitted	Last Update

0000090	Cinelerra-GG	[All Projects] Feature	public	2018-12-30 19:53	2019-08-07 22:28

Reporter	MatN	Assigned To	goodguy
Priority	normal	Severity	feature	Reproducibility	always
Status	closed	Resolution	fixed
Platform	x86_64	OS	LinuxMint	OS Version	18.3
Product Version	2018-11
Target Version		Fixed in Version

Summary	0000090: Allow optional hardware-supported encoding during rendering
Description	Depending on the system, there might be hardware rendering (GPU) available. If so, the supported options can be listed with the "vainfo" command in a terminal window. On Mint at least, you have to install vainfo first. I don't know how much effort it is to retrieve that info using native Cinelerra code. If hardware acceleration is available for a chosen format (say h.264 or h.265), could it be made available in the "compression" dropdown list in the video settings of the rendering window? Maybe the quality is lower than with pure software rendering, but if the hardware gives a considerable speedup then, certainly for non-professional use, this would be a nice option to have.
Tags	No tags attached.
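
For illustration, the check requested in the description could be sketched roughly as follows: parse the output of "vainfo" and list the profiles that offer an encode entrypoint (VAEntrypointEncSlice). This is only a Python sketch, not Cinelerra code. The sample text is abbreviated from the vainfo output posted in note ~0001773, and the function names are invented for this example.

```python
import re
import shutil
import subprocess

# Abbreviated sample in the format printed by vainfo (from note ~0001773).
SAMPLE_VAINFO = """\
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
"""

# One "profile : entrypoint" pair per line.
PAIR_RE = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")


def parse_vainfo(text):
    """Map each VA profile to the set of entrypoints listed for it."""
    caps = {}
    for line in text.splitlines():
        match = PAIR_RE.match(line)
        if match:
            caps.setdefault(match.group(1), set()).add(match.group(2))
    return caps


def profiles_with(caps, entrypoint):
    """Return the sorted profiles that list the given entrypoint."""
    return sorted(p for p, eps in caps.items() if entrypoint in eps)


def read_vainfo():
    """Run vainfo if it is installed; return None otherwise."""
    if shutil.which("vainfo") is None:
        return None
    result = subprocess.run(["vainfo"], capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    text = read_vainfo() or SAMPLE_VAINFO
    caps = parse_vainfo(text)
    print("encode:", profiles_with(caps, "VAEntrypointEncSlice"))
    print("decode:", profiles_with(caps, "VAEntrypointVLD"))
```

A render dialog could use such a list to decide which hardware entries to show. How Cinelerra-GG actually detects this is not stated in this issue.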

PhyllisSmith 2019-08-07 22:28 manager ~0002004	Closing. Good discussions here, though! You can now do "Allow optional hardware-supported encoding during rendering" as described in the notes and PDF file.

Andrew-R 2019-06-26 14:56 reporter ~0001793	Well, all those components (kernel part, 2D/video part for X, OpenGL, OpenCL, CUDA, Vulkan, nvenc/dec) are quite closely related. I don't think kernel developers are very happy about a 10 MB 'black box' module doing all those DMA and memory operations in kernel address space, at least if you want to send a kernel bug report. With nouveau, the current load of some engines is implemented as a Gallium HUD (and some OpenGL extension). For clocks you can try [as root, after mounting debugfs]:
cat /sys/kernel/debug/dri/0/pstate
0f: core 575 MHz shader 1438 MHz memory 850 MHz
AC: core 399 MHz shader 810 MHz memory 499 MHz
Earlier NV cards (up to GeForce 750) can be reclocked manually by echoing a higher performance level to this file:
echo 0f > /sys/kernel/debug/dri/0/pstate
cat /sys/kernel/debug/dri/0/pstate
0f: core 575 MHz shader 1438 MHz memory 850 MHz AC DC *
AC: core 576 MHz shader 1458 MHz memory 499 MHz
You see, in my case (nv92) the memory failed to reclock, a known problem. And even this level of reclocking requires a small kernel patch (not all nv92 GF 8800 cards actually have the same type of memory, so some are reclockable and some are not). With newer and higher-performance cards the boot clocks can be quite low, as low as 50 MHz (!) for some Fermi cards! No wonder anything OpenGL is actually just a bit faster than llvmpipe... There is a list of nouveau tasks, but they require a lot of knowledge to even try: https://www.x.org/wiki/SummerOfCodeIdeas/ -> Nouveau (Open Source NVIDIA driver). For just an idea of how much info must be parsed and passed into the driver for 'simple' h264 decoding: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nouveau_vp3_video_vp.c The problem is definitely exacerbated by the fact that the firmware running on the card's video engines (there seems to be more than one: the Bitstream Processor handles sequential tasks, the Video Processor handles parallelizable tasks, and some xtensa microcontroller coordinates them, IIRC) is ALSO closed-source!
There was a parallel effort to reverse-engineer some ARM video SoCs: http://linux-sunxi.org/CedarX https://bootlin.com/blog/tag/vpu/ Maybe one can convince them to try and help nouveau, too (because it is used with some Tegra platforms, which have an ARM CPU). I'll try to convince the nouveau developers to add some dmesg warnings about secure [GPU] boot/reclocking problems, so they will be more visible instead of sitting on some project website that no one visits BEFORE running those cards with the default nouveau driver...
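
To make the reclocking check in this note concrete, here is a small Python sketch that parses pstate text in the format quoted above and reports which clock domains did not reach the requested level. It assumes that the "AC:" line shows the clocks currently in effect, as the note reads, and that every clock is printed as a single "<domain> <number> MHz" value. The function names and the 5% tolerance are choices made for this example only.

```python
import re

# Output quoted in the note, after "echo 0f > .../pstate" on an nv92 card.
SAMPLE_PSTATE = """\
0f: core 575 MHz shader 1438 MHz memory 850 MHz AC DC *
AC: core 576 MHz shader 1458 MHz memory 499 MHz
"""

# Matches pairs such as "core 575 MHz".
CLOCK_RE = re.compile(r"(\w+) (\d+) MHz")


def parse_pstate(text):
    """Return {level: {domain: MHz}} for each 'level: ...' line."""
    levels = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'level:' prefix
        levels[name.strip()] = {d: int(mhz) for d, mhz in CLOCK_RE.findall(rest)}
    return levels


def lagging_domains(requested, actual, tolerance=0.05):
    """Domains whose actual clock is more than `tolerance` below the request."""
    return sorted(
        domain
        for domain, mhz in requested.items()
        if actual.get(domain, 0) < mhz * (1 - tolerance)
    )


if __name__ == "__main__":
    levels = parse_pstate(SAMPLE_PSTATE)
    print(lagging_domains(levels["0f"], levels["AC"]))
```

On the sample, only the memory domain is reported, which matches the "memory failed to reclock" observation.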

Olaf 2019-06-26 09:11 reporter ~0001789	With Nvidia it is not only the driver; the bundled OpenGL/GLX also provides considerably more speed than the Mesa (SGI) installed by the distribution. (glxinfo | grep vendor) (Videos with mpv jerk with the distribution's combination! As long as the drivers are delivered by the manufacturer I see no reason why I should keep my paid hardware to a minimum.) @MatN > How did you measure video engine load? nvidia-settings -> Graphics Card Information shows, among other things, the current load of the GPU and the VE.

Pierre 2019-06-26 00:14 updater ~0001787	I have no doubt that they are doing their best and I am sure that if it were possible, they would gladly produce a driver at least as good as Nvidia's. But I'm not a developer, I can't help them, and I need a driver that meets my current expectations for video editing with Cin-GG. For the moment, it is the Nvidia driver that allows me to work best.

Andrew-R 2019-06-25 23:43 reporter ~0001786	re: nouveau (libre nvidia driver). Some problems were explained on IRC a few days ago:
https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-06-18
13:59 karolherbst: and we have no fix for tearing so far
14:00 karolherbst: what's missing is reverse engineering of the line buffer
14:00 karolherbst: I think... but it depends on what kind of tearing you got
https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-06-19
16:20 imirkin_: as always, lack of interest/time is the reason
16:20 kreyren: how many active developers are there in nouveau?
16:20 imirkin_: maybe like 2, depending on how you account for things
https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-06-20
14:33 imirkin: the gpu boots to low clocks
14:33 imirkin: the pmu must change to high clocks
14:34 kreyren: so can we do that on nouveau?
14:34 imirkin: on gm20x, it's a hilarious situation since we can actually change the clocks from non-secure mode
14:34 imirkin: but we can't control the fan
14:34 imirkin: on gp10x, we can't do the RE necessary to figure out how to perform the reclocking sequence
14:34 kreyren: is there an issue with overheating if we can't controll the fan? or it that controlled by GPU itself?
14:35 imirkin: the driver controls the fan
14:35 imirkin: so the gpu would overheat
14:35 imirkin: except on a laptop
See also the mailing list for current issues (the archives can also be interesting to read): https://lists.freedesktop.org/archives/nouveau/
I just want to point out that nouveau's issues come from a real technosocial problem (hard-to-drive hardware with many complex processors on board, coupled with a serious lack of manpower), NOT because the devs are lazy or don't give a damn.

Pierre 2019-06-25 22:04 updater ~0001785	It's been a long time since I tested the free driver for my Nvidia card. But from memory, with my display spread over three 1080p monitors, it was impossible for me to completely eliminate video tearing, whereas I manage it well with the Nvidia driver.

Sam 2019-06-25 21:54 administrator ~0001784	I certainly can't speak for everyone, but in my situation only the proprietary Nvidia drivers work well. With the free drivers I can't run two 4K monitors; that only works with the proprietary graphics drivers, because the free drivers don't activate the additional DisplayPort output properly. There were also problems with the 4K resolution. From time to time I also play games, and there the differences are enormous.

MatN 2019-06-25 20:51 reporter ~0001783	Oh, are the Nvidia open source drivers that bad? I was not aware of that. I don't do gaming, so I have no need for a separate GPU. Of course the end result is the most important. If only Nvidia's proprietary drivers work acceptably on Nvidia hardware, then indeed they should be supported if possible. I withdraw my argument that proprietary Nvidia software like Cuda should be avoided. By the way, the video encoding/decoding is not Cuda or OpenCL, and this issue (90) was about hardware accelerated encoding/decoding. I think that has been achieved. At least in my tests I did not see any difference between hardware and software output; I looked at the encoded file with VLC. vdpau was a little slower than vaapi. Issue 215 is more about generic hardware acceleration such as Cuda or OpenCL.

Sam 2019-06-25 19:26 administrator ~0001780 Last edited: 2019-06-25 19:28	Most people who work with Linux probably do it for the same reason: the freedom of open source code. I try to use free, open source software wherever possible. But I have to agree with Pierre. Nvidia is one such topic where I use the proprietary drivers for performance reasons. The free drivers are so bad on my notebook/laptop that I can't use the power of my laptop. I work almost exclusively with my laptop because I'm always on the go. On laptops only Nvidia graphics cards actually work properly at the moment. For this reason I would leave it to the user which drivers he wants to work with, and that includes Cuda. Nevertheless, I welcome any development concerning open source and would like to see graphics cards on free drivers become more powerful in the future, so that I can remove the last proprietary drivers. But until then I find this temporary solution acceptable.

Pierre 2019-06-25 18:25 updater ~0001779	@MatN My comment would probably require a separate discussion, but here it is. I don't think Cin-GG should limit compatibility with proprietary drivers and their options, such as Nvidia's, where possible. I am fully aware of the advantages offered by the open source world... but not at the price of reduced or limited performance. Cin-GG aims as much as possible to match the possibilities and performance of professional video editing (NLE) software. Those who use such software in the still highly competitive audio-visual field know that it is essential to do everything possible to preserve the maximum quality of the source video sequences (regardless of their origin) and the efficiency of the subsequent processing. Let compatibility with proprietary software and codecs be a separate option, okay... but it should remain possible, easy and not limited.

MatN 2019-06-25 17:32 reporter ~0001776	@Olaf: yes, the vaapi and older vdpau interfaces are for the specialized video decoding/encoding hardware blocks in a GPU, not the mostly game-related big GPU blocks. According to https://wiki.archlinux.org/index.php/Hardware_video_acceleration vdpau is legacy and vaapi is current for AMD/Intel/Nvidia, except when using the Nvidia closed source driver, for which Nvidia has a proprietary set of interfaces. I think Cin-GG should go the open source route where possible. For that last reason I also think Cuda should be avoided: it is proprietary to Nvidia. Much better to use OpenCL, which is supported by all three major GPU vendors. How did you measure video engine load?

Olaf 2019-06-25 08:13 reporter ~0001775	Correct me if I'm wrong. Vdpau and Vaapi are drivers for the video engine. They are not drivers for GPU Utilization. A test to use Vdpau or Vaapi should at least include the results of the load of the video engine. Something like this (only exemplary values): Without Video Engine: 45% CPU load, 0% Video Engine load, Composer 18 fps Comparison with Video Engine: 20% CPU load, 80% Video Engine load, Composer 25 fps Be careful with rendering using the Video Engine, you should always check whether the same high-quality results are produced as with software rendering (CPU). Only then is a comparison possible. (For the future, because of a request: Drivers for the Video Engine do not support ICC profiles.)

Andrea_Paz 2019-06-24 16:24 manager ~0001774	I reviewed your results and thank you for the tests you did. Very interesting (I did not expect it!) that in the presence of effects RGBA-FLOAT is more efficient than RGBA-8bit. Maybe because Cin works internally in floating point and so it combines better with a color model of the same type? It is also interesting that you set the number of CPUs to 10 (for an 8-thread CPU). I thought that setting was only valid for the plugin engine and not for the CinGG engine in general. Vaapi seems better than Vdpau (which in fact Nvidia no longer develops, in favor of Nvenc).

MatN 2019-06-23 20:59 reporter ~0001773	I have done some testing on the effects of using hardware accelerated video decoding and encoding. This is using the vaapi interface, on an AMD GPU. Details below.
CinelerraGG 2019-05 hardware acceleration tests on AMD Ryzen 5 2400G (3.6 GHz, 4 cores/8 threads, built-in GPU), SSD storage. Memory DDR4 2400 MHz, motherboard Asrock B450 Pro 4, BIOS 3.10. No overclocking of CPU, GPU or memory. OS software Mint XFCE 19.1, kernel 4.18.0-21, graphics driver amdgpu, Mesa 18.2.8. The XFCE panel app "cpu graph properties" was used to track CPU usage (it has low CPU usage itself). Update rate set to "slowest".
The hardware video decoding/encoding is done by special function units in the GPU, likely not affected by the size of the GPU but only by the clock speed, which I left at the BIOS default "auto". This is a fairly fast machine; on slower machines a video card might provide more of a speedup.
vainfo shows:
libva info: VA-API version 1.1.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_1
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.1 (libva 2.1.0)
vainfo: Driver version: Mesa Gallium driver 18.2.8 for AMD RAVEN (DRM 3.26.0, 4.18.0-21-generic, LLVM 7.0.0)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileVP9Profile2 : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
Cin "Settings->Preferences->Performance, Use HW device" set to "none" or "vaapi", project SMP CPUs set to 10. The CPU has 8 threads max, but I never got close to 100% load unless I set it to 10. Cin "Settings->Preferences->Appearance": YUV color space BT709, YUV color range JPEG. Cin is always started from a terminal to see any errors. After changing the HW device, exit and restart cin (to be sure).
Source HD video, 350 MByte, mp4, 1920x1080x50p, 10 Mb/s, 268 secs. Load the video as "replace current project". If the video is loaded and the HW device is vaapi, the terminal shows:
mesa: for the -simplifycfg-sink-common option: may only occur zero or one times!
mesa: for the -global-isel-abort option: may only occur zero or one times!
After loading, Settings->Format->Color Model is set to RGBA-8bit. Depending on the test, change it to RGBA-FLOAT. Depending on the test, put the Color 3-way effect on it but don't change anything in the effect's settings.
Playback testing.
1. HW device none, RGBA-8 bit, effect none: CPU load 10%.
2. HW device none, RGBA-FLOAT, effect none: CPU load 12%.
3. HW device vaapi, RGBA-8 bit, effect none: CPU load 7%.
4. HW device vaapi, RGBA-FLOAT, effect none: CPU load 8%.
5. HW device none, RGBA-8 bit, effect Color 3-way: CPU load 73%.
6. HW device none, RGBA-FLOAT, effect Color 3-way: CPU load 66%.
7. HW device vaapi, RGBA-8 bit, effect Color 3-way: CPU load 60%.
8. HW device vaapi, RGBA-FLOAT, effect Color 3-way: CPU load 53%.
Render testing. This is done with the effect Color 3-way active and the HW device set to vaapi (should have no effect on rendering). File format ffmpeg.mp4, insertion strategy "create new resources only". Video compression settings depend on the test. Rendering times were taken both from the cin terminal and manually with a stopwatch. When using h264_vaapi, the bitrate was set to 10 Mb/s, otherwise it would encode at 31 Mb/s and produce a much bigger file, while the source was only 10 Mb/s.
9. RGBA-8 bit, bitrate 0, h264.mp4: CPU load wildly varying 50-98%, 954 secs, 14 fps, filesize 289 MByte, bitrate 8,6 Mb/s.
10. RGBA-FLOAT, bitrate 0, h264.mp4: CPU load wildly varying 55-98%, 1252 secs, 11 fps, filesize 287 MByte, bitrate 8,6 Mb/s.
11. RGBA-8 bit, bitrate 10000000, h264_vaapi.mp4: CPU load 51-63%, 741 secs, 18 fps, filesize 339 MByte, bitrate 10 Mb/s.
12. RGBA-FLOAT, bitrate 10000000, h264_vaapi.mp4: CPU load 46-60%, 1050 secs, 13 fps, filesize xxx MByte, bitrate xxx Mb/s.
The Cin terminal at the end of rendering normally looks like:
Render::render_single: Session finished.
** rendered 13406 frames in 267.184 secs, 50.175 fps
audio0 pad 64 0 (64)
For tests 11 and 12 it additionally displayed:
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
FFStream::decode: avcodec_send_packet failed.
file:/home/mat/Gouda_vaapi_high.mp4
err: Invalid data found when processing input
FFStream::decode: failed
My conclusion is that hardware-accelerated decoding and encoding provide definite benefits. Especially on encoding, it leaves more CPU free for effects. The GPU document only mentions Intel for vaapi; it should mention AMD as well.
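
As a cross-check of the render figures in this note, the reported fps and bitrate values can be recomputed from the render times and file sizes. The Python sketch below assumes that each render produced the 13406 frames shown in the quoted terminal line, that the output clips are 268 seconds long like the source, and that 1 MByte means 10**6 bytes. The function names are invented for the example.

```python
FRAMES = 13406      # frame count from the quoted "rendered 13406 frames" line
SOURCE_SECS = 268   # length of the source clip in seconds

# (test number, encoder, color model, render seconds, output size in MByte)
RUNS = [
    (9, "h264", "RGBA-8bit", 954, 289),
    (10, "h264", "RGBA-FLOAT", 1252, 287),
    (11, "h264_vaapi", "RGBA-8bit", 741, 339),
    (12, "h264_vaapi", "RGBA-FLOAT", 1050, None),  # size not reported
]


def render_fps(frames, secs):
    """Average rendering speed in frames per second."""
    return frames / secs


def bitrate_mbps(size_mbyte, clip_secs):
    """Average bitrate in Mb/s, taking 1 MByte as 10**6 bytes."""
    return size_mbyte * 8 / clip_secs


def speedup(software_secs, hardware_secs):
    """How many times faster the hardware render finished."""
    return software_secs / hardware_secs


if __name__ == "__main__":
    for num, enc, model, secs, size in RUNS:
        rate = f"{bitrate_mbps(size, SOURCE_SECS):.1f} Mb/s" if size else "n/a"
        print(f"{num:>2} {enc:<11} {model:<10} {render_fps(FRAMES, secs):5.1f} fps  {rate}")
    print(f"8-bit speedup: {speedup(954, 741):.2f}x")
    print(f"float speedup: {speedup(1252, 1050):.2f}x")
```

Under these assumptions the recomputed values round to the reported 14, 11, 18 and 13 fps, and the vaapi renders finish about 1.29 times (8-bit) and 1.19 times (float) faster than the software renders.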

MatN 2019-06-23 20:44 reporter ~0001772	Cuda is very different from hardware decoding/encoding as controlled by the vaapi/vdpau interfaces. It should be compared to OpenCL, I think. And OpenCL is vendor-independent, unlike Cuda. I think enabling the hardware decoding/encoding should be separate from using a GPU compute interface like Cuda/OpenCL in the performance tab. I'll be posting some vaapi tests after this.

Sam 2019-06-21 00:50 administrator ~0001747	What a positive surprise that the integration of CUDA has worked. Unfortunately I am not able to test it currently, because I still have outdated drivers installed. I will switch to the latest OpenSuse version in a few days anyway and then use this opportunity to install the latest graphics drivers and also test CUDA. In a few days I will add this new feature to the feature page. Thanks again for the great work.

PhyllisSmith 2019-06-21 00:37 manager ~0001744	With the addition of an optional Cuda build/usage, this task is complete as far as I know. A document covering all of the encode/decode/Cuda parts has been updated and is temporarily at: https://www.cinelerra-gg.org/download/GPU_potential_speedup.pdf It has been added locally to the manual which will be updated later. I will mark this resolved later in case anyone wants to add a comment yet.

PhyllisSmith 2019-06-16 22:04 manager ~0001735	Hardware acceleration of encoding for rendering is now available for most graphics cards. It will automatically be built into the monthly builds. The board vendors have limited which codecs they implemented. Currently available are: Nvidia: h264 and h265 (h265 works only on later Nvidia boards, Maxwell chips at least). Intel HD graphics: h264, mpeg2, mjpeg, h265, and possibly vp8/vp9 (h264 and mpeg2 work on Intel Broadwell, for example).
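
The codec lists in this note can be written down as a small lookup table. The Python sketch below is an illustration only: the codec sets are copied from the note, while pairing Nvidia with ffmpeg's "_nvenc" encoders and Intel with the "_vaapi" encoders, and spelling h265 as "hevc", are assumptions based on ffmpeg's encoder naming (the thread itself only shows h264_vaapi). The table and function names are hypothetical.

```python
# Codec sets as listed in the note; vp8/vp9 are marked "possibly" there.
# The encoder suffixes follow ffmpeg naming and are an assumption here.
HW_ENCODE_SUPPORT = {
    "nvidia": {"suffix": "nvenc", "codecs": {"h264", "hevc"}},
    "intel": {
        "suffix": "vaapi",
        "codecs": {"h264", "mpeg2", "mjpeg", "hevc", "vp8", "vp9"},
    },
}

# ffmpeg names the h265 codec "hevc".
ALIASES = {"h265": "hevc"}


def hw_encoder_name(vendor, codec):
    """Return an ffmpeg-style encoder name, or None if not listed."""
    entry = HW_ENCODE_SUPPORT.get(vendor.lower())
    codec = ALIASES.get(codec.lower(), codec.lower())
    if entry is None or codec not in entry["codecs"]:
        return None
    return f"{codec}_{entry['suffix']}"


if __name__ == "__main__":
    print(hw_encoder_name("Nvidia", "h265"))
    print(hw_encoder_name("Intel", "h264"))
    print(hw_encoder_name("Nvidia", "mpeg2"))
```

Whether a given board really supports an entry still has to be checked on the machine, since support depends on the chip generation as the note says.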

PhyllisSmith 2019-05-16 03:27 manager ~0001547	As Andrew stated, this is partially implemented with ENCODING, but not for Nvidia graphics boards which need NVENC. I will leave this open until gg has a chance to look into ffmpeg having nvenc removed from the non-free list.

Andrew-R 2019-05-14 04:12 reporter ~0001521	Partially present since https://git.cinelerra-gg.org/git/?p=goodguy/cinelerra.git;a=commit;h=1d4f5d708de0d8ec19300b417354a3374d00ed47 See email at https://lists.cinelerra-gg.org/pipermail/cin/2019-May/000603.html

Date Modified	Username	Field	Change
2018-12-30 19:53	MatN	New Issue
2019-05-14 04:12	Andrew-R	Note Added: 0001521
2019-05-16 03:25	PhyllisSmith	Assigned To	=> goodguy
2019-05-16 03:25	PhyllisSmith	Status	new => assigned
2019-05-16 03:27	PhyllisSmith	Status	assigned => acknowledged
2019-05-16 03:27	PhyllisSmith	Note Added: 0001547
2019-06-16 22:04	PhyllisSmith	Status	acknowledged => feedback
2019-06-16 22:04	PhyllisSmith	Note Added: 0001735
2019-06-21 00:37	PhyllisSmith	Note Added: 0001744
2019-06-21 00:50	Sam	Note Added: 0001747
2019-06-23 20:44	MatN	Note Added: 0001772
2019-06-23 20:44	MatN	Status	feedback => assigned
2019-06-23 20:59	MatN	Note Added: 0001773
2019-06-24 16:24	Andrea_Paz	Note Added: 0001774
2019-06-25 08:13	Olaf	Note Added: 0001775
2019-06-25 17:32	MatN	Note Added: 0001776
2019-06-25 18:25	Pierre	Note Added: 0001779
2019-06-25 19:26	Sam	Note Added: 0001780
2019-06-25 19:28	Sam	Note Edited: 0001780
2019-06-25 20:51	MatN	Note Added: 0001783
2019-06-25 21:54	Sam	Note Added: 0001784
2019-06-25 22:04	Pierre	Note Added: 0001785
2019-06-25 23:43	Andrew-R	Note Added: 0001786
2019-06-26 00:14	Pierre	Note Added: 0001787
2019-06-26 09:11	Olaf	Note Added: 0001789
2019-06-26 14:56	Andrew-R	Note Added: 0001793
2019-08-07 22:28	PhyllisSmith	Status	assigned => closed
2019-08-07 22:28	PhyllisSmith	Resolution	open => fixed
2019-08-07 22:28	PhyllisSmith	Note Added: 0002004
