View Issue Details

ID: 0000090
Project: Cinelerra-GG
Category: [All Projects] Feature
View Status: public
Last Update: 2019-08-07 22:28
Reporter: MatN
Assigned To: goodguy
Priority: normal
Severity: feature
Reproducibility: always
Status: closed
Resolution: fixed
Platform: x86_64
OS: LinuxMint
OS Version: 18.3
Product Version: 2018-11
Target Version:
Fixed in Version:
Summary: 0000090: Allow optional hardware-supported encoding during rendering
Description: Depending on the system, hardware rendering (GPU) might be available. If it is, the supported options can be listed with the "vainfo" command in a terminal window. On Mint at least, you have to install vainfo first. I don't know how much effort it is to retrieve that info using native Cinelerra code.

If hardware acceleration is available for a chosen format (say h.264 or h.265), could it be made available in the "compression" dropdown list in the video settings of the rendering window? The quality may be lower than with pure software rendering, but if the hardware gives a considerable speedup, this would be a nice option to have, certainly for non-professional use.
Tags: No tags attached.

Activities

PhyllisSmith

2019-08-07 22:28

manager   ~0002004

Closing. Good discussions here, though! You can now do "Allow optional hardware-supported encoding during rendering" as described in the notes and PDF file.
Andrew-R

2019-06-26 14:56

reporter   ~0001793

Well, all those components (kernel part, 2D/video part for X, OpenGL, OpenCL, CUDA, Vulkan, nvenc/dec) are quite closely related. I don't think kernel developers are very happy about a 10 MB 'black box' module doing all those DMA and memory operations in kernel address space, at least if you want to send a kernel bug report.

With nouveau, the current load of some engines is exposed through the Gallium HUD (and some OpenGL extension). For clocks you can try:
 cat /sys/kernel/debug/dri/0/pstate
0f: core 575 MHz shader 1438 MHz memory 850 MHz
AC: core 399 MHz shader 810 MHz memory 499 MHz
[as root, after mounting debugfs]

Earlier NV cards (up to GeForce 750) can be reclocked manually by echoing a higher performance level to this file:
echo 0f > /sys/kernel/debug/dri/0/pstate
cat /sys/kernel/debug/dri/0/pstate
0f: core 575 MHz shader 1438 MHz memory 850 MHz AC DC *
AC: core 576 MHz shader 1458 MHz memory 499 MHz

As you can see, in my case (nv92) the memory failed to reclock, which is a known problem. Even this level of reclocking requires a small kernel patch (not all nv92 GF 8800 cards actually have the same type of memory, so some are reclockable and some are not). With newer, higher-performance cards the boot clocks can be quite low, as low as 50 MHz (!) for some Fermi cards! No wonder anything OpenGL is actually just a bit faster than llvmpipe...
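
For anyone who wants to compare the requested level with what the card is actually running, the pstate listing shown above can be parsed in a few lines. This is only a rough sketch in Python. It assumes the simple "name value MHz" layout of the sample (other cards or kernels may print something different), and `parse_pstate`, `lagging_domains` and the 5% tolerance are made-up names and values for illustration, not anything provided by nouveau. Reading the real file needs root and a mounted debugfs, so the sketch works on a pasted copy of the output.

```python
import re

# Pasted copy of the pstate output after "echo 0f" (from the listing above).
SAMPLE = """\
0f: core 575 MHz shader 1438 MHz memory 850 MHz AC DC *
AC: core 576 MHz shader 1458 MHz memory 499 MHz
"""


def parse_pstate(text):
    """Return {level: {domain: MHz}} for lines like '0f: core 575 MHz ...'.

    Assumption: every clock is printed as '<domain> <number> MHz'.
    """
    levels = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'level: clocks' line
        clocks = re.findall(r"(\w+) (\d+) MHz", rest)
        levels[name.strip()] = {domain: int(mhz) for domain, mhz in clocks}
    return levels


def lagging_domains(levels, target, current="AC", tolerance=0.05):
    """List clock domains whose current value is well below the target level.

    The 5% tolerance is an arbitrary illustration value.
    """
    want, have = levels[target], levels[current]
    return [d for d in want if have.get(d, 0) < want[d] * (1 - tolerance)]


if __name__ == "__main__":
    levels = parse_pstate(SAMPLE)
    print(lagging_domains(levels, "0f"))  # prints ['memory']
```

With the sample above, only the memory domain is reported as lagging, which matches the failed memory reclock described in this note.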

There is a list of nouveau tasks, but they require a lot of knowledge to even try:
 https://www.x.org/wiki/SummerOfCodeIdeas/ -> Nouveau (Open Source NVIDIA driver)

Just to give an idea of how much info must be parsed and passed into the driver for 'simple' h264 decoding:
https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nouveau_vp3_video_vp.c

The problem is definitely exacerbated by the fact that the firmware running on the card's video engines (there seems to be more than one: the Bitstream Processor handles sequential tasks, the Video Processor handles parallelizable tasks, and some xtensa microcontroller coordinates them, IIRC) is ALSO closed-source!

There was a parallel effort to reverse-engineer some ARM video SoCs:
http://linux-sunxi.org/CedarX
https://bootlin.com/blog/tag/vpu/

Maybe one can convince them to try and help nouveau, too (because it is used with some Tegra platforms, which have an ARM CPU).
I'll try to convince the nouveau developers to add some dmesg warnings about secure [GPU] boot/reclocking problems, so they will be more visible, instead of sitting on some project website that no one visits BEFORE running those cards with the default nouveau driver ....
Olaf

2019-06-26 09:11

reporter   ~0001789

With Nvidia it is not only the driver; the bundled OpenGL/GLX also provides considerably more speed than the Mesa (SGI) installed by the distribution. (glxinfo | grep vendor)
(Videos played with mpv stutter with the distribution's combination! As long as the drivers are delivered by the manufacturer, I see no reason why I should keep the hardware I paid for running at a minimum.)

@MatN
> How did you measure video engine load?
nvidia-settings -> Graphics Card Information,
shows among other things the current load of the GPU and the VE.
Pierre

2019-06-26 00:14

updater   ~0001787

I have no doubt that they are doing their best and I am sure that if it were possible, they would gladly produce a driver at least as good as Nvidia's.

But I'm not a developer and I can't help them. I need a driver that meets my current expectations for video editing with Cin-GG. For the moment, it is the Nvidia driver that allows me to work best.
Andrew-R

2019-06-25 23:43

reporter   ~0001786

re: nouveau (libre nvidia driver)
Some problems were explained on IRC a few days ago:

https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-06-18
-----------
13:59 karolherbst: and we have no fix for tearing so far
14:00 karolherbst: what's missing is reverse engineering of the line buffer
14:00 karolherbst: I think... but it depends on what kind of tearing you got
-----------

https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-06-19
------------
16:20 imirkin_: as always, lack of interest/time is the reason
16:20 kreyren: how many active developers are there in nouveau?
16:20 imirkin_: maybe like 2, depending on how you account for things
------------

https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-06-20
---------
14:33 imirkin: the gpu boots to low clocks
14:33 imirkin: the pmu must change to high clocks
14:34 kreyren: so can we do that on nouveau?
14:34 imirkin: on gm20x, it's a hilarious situation since we can actually change the clocks from non-secure mode
14:34 imirkin: but we *can't* control the fan
14:34 imirkin: on gp10x, we can't do the RE necessary to figure out how to perform the reclocking sequence
14:34 kreyren: is there an issue with overheating if we can't controll the fan? or it that controlled by GPU itself?
14:35 imirkin: the driver controls the fan
14:35 imirkin: so the gpu would overheat
14:35 imirkin: except on a laptop
--------------------

See also the mailing list for current issues (the archives can also be interesting to read):
https://lists.freedesktop.org/archives/nouveau/

I just want to point out that nouveau's issues come from a real technosocial problem (hard-to-drive hardware with many complex processors on board, coupled with a serious lack of manpower), NOT because the devs are lazy or don't give a damn.
Pierre

2019-06-25 22:04

updater   ~0001785

It's been a long time since I tested the free driver for my Nvidia card. But from memory, with my display spread over three 1080p monitors, it was impossible for me to completely eliminate video tearing, whereas I manage it well with the Nvidia driver.
Sam

2019-06-25 21:54

administrator   ~0001784

I certainly can't speak for everyone, but in my situation only the proprietary Nvidia drivers work well. With the free drivers I can't run two 4K monitors; that only works with the proprietary graphics drivers, because the additional DisplayPort isn't activated properly by the free drivers. There were also problems with the 4K resolution. From time to time I also play games, and there the difference is enormous.
MatN

2019-06-25 20:51

reporter   ~0001783

Oh, are the Nvidia open source drivers that bad? I was not aware of that. I don't do gaming, so I have no need for a separate GPU.

Of course the end result is the most important. If only the proprietary Nvidia drivers work acceptably on Nvidia hardware, then indeed they should be supported if possible. I withdraw my argument that proprietary Nvidia technology like Cuda should be avoided.

By the way, the video encoding/decoding is not Cuda or OpenCL, and this issue (90) was about hardware-accelerated encoding/decoding. I think that has been achieved. At least in my tests I did not see any difference between hardware and software output; I looked at the encoded file with VLC. vdpau was a little slower than vaapi.

Issue 215 is more about generic hardware acceleration such as Cuda or OpenCL.
Sam

2019-06-25 19:26

administrator   ~0001780

Last edited: 2019-06-25 19:28

Most people who work with Linux probably do so for the same reason: the freedom of open source code. I try to use free, open source software wherever possible. But I have to agree with Pierre. Nvidia is one case where I use the proprietary drivers for performance reasons. The free drivers are so bad on my laptop that I can't use its full power. I work almost exclusively with my laptop because I'm always on the go.

On laptops only Nvidia graphics cards actually work properly at the moment. For this reason I would leave it to the user which drivers he wants to work with and that includes Cuda. Nevertheless, I welcome any development concerning open source and would like to see the graphics cards on free driver basis become more powerful in the future, so that I can remove the last proprietary drivers. But until then I find this temporary solution acceptable.

Pierre

2019-06-25 18:25

updater   ~0001779

@MatN

My comment would probably require a separate discussion, but here it is.

I don't think Cin-GG should limit compatibility with proprietary drivers and their options such as Nvidia's, where possible.

I am fully aware of the advantages offered by the open source world... but not at the price of reduced or limited performance. Cin-GG aims as much as possible to match the possibilities and performance of professional video editing (NLE) software. Those who use such software in the still highly competitive audio-visual field know that it is essential to do everything possible to preserve the maximum quality of the source video sequences (regardless of their origin) and the efficiency of the subsequent processing. Let compatibility with proprietary software and codecs be a separate option, okay... but it should remain possible, easy and not limited.
MatN

2019-06-25 17:32

reporter   ~0001776

@Olaf: yes, the vaapi and older vdpau interfaces are for the specialized video decoding/encoding hardware blocks in a GPU, not the mostly game-related big GPU blocks. According to "https://wiki.archlinux.org/index.php/Hardware_video_acceleration", vdpau is legacy and vaapi is current for AMD/Intel/Nvidia, except when using the Nvidia closed source driver, for which Nvidia has a proprietary set of interfaces. I think Cin-GG should go the open source route where possible.
For that last reason I also think Cuda should be avoided: it is proprietary to Nvidia. Much better to use OpenCL, which is supported by all three major GPU vendors.

How did you measure video engine load?
Olaf

2019-06-25 08:13

reporter   ~0001775

Correct me if I'm wrong. Vdpau and Vaapi are drivers for the video engine. They are not drivers for GPU Utilization.

A test of Vdpau or Vaapi should at least include the load of the video engine in its results. Something like this (example values only):
Without Video Engine: 45% CPU load, 0% Video Engine load, Composer 18 fps
Comparison with Video Engine: 20% CPU load, 80% Video Engine load, Composer 25 fps

Be careful when rendering with the Video Engine: you should always check whether it produces the same high-quality results as software rendering (CPU). Only then is a comparison possible.

(For the future, because of a request: Drivers for the Video Engine do not support ICC profiles.)
Andrea_Paz

2019-06-24 16:24

updater   ~0001774

I reviewed your results and thank you for the tests you did.
It is very interesting (I did not expect it!) that with effects present RGBA-FLOAT is more efficient than RGBA-8bit. Maybe because Cin works internally in floating point and so combines better with a color model of the same type?
It is also interesting that you set the number of CPU cores to 10 (on an 8-thread CPU). I thought that setting applied only to the plugin engine and not to the CinGG engine in general.
Vaapi seems better than Vdpau (which Nvidia in fact no longer develops, in favor of Nvenc).
MatN

2019-06-23 20:59

reporter   ~0001773

I have done some testing on the effects of using hardware accelerated video decoding and encoding. This is using the vaapi interface, on an AMD GPU. Details below.

CinelerraGG 2019-05 hardware acceleration tests on AMD Ryzen 5 2400G (3.6 GHz, 4 cores/8 threads, built-in GPU), SSD storage.
Memory DDR4 2400 MHz, motherboard Asrock B450 Pro 4, BIOS 3.10. No overclocking of CPU, GPU or memory.
OS software Mint XFCE 19.1, kernel 4.18.0-21, graphics driver amdgpu, Mesa 18.2.8.
The XFCE panel app "cpu graph properties" was used to track CPU usage (it has low CPU usage itself). Update rate set to "slowest".

The hardware video decoding/encoding is done by special function units in the GPU, likely not affected by the size of the GPU but only by the clock speed, which I left at the BIOS default "auto". This is a fairly fast machine; on slower machines a video card might provide more of a speedup.

vainfo shows:

libva info: VA-API version 1.1.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_1
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.1 (libva 2.1.0)
vainfo: Driver version: Mesa Gallium driver 18.2.8 for AMD RAVEN (DRM 3.26.0, 4.18.0-21-generic, LLVM 7.0.0)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple : VAEntrypointVLD
      VAProfileMPEG2Main : VAEntrypointVLD
      VAProfileVC1Simple : VAEntrypointVLD
      VAProfileVC1Main : VAEntrypointVLD
      VAProfileVC1Advanced : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264Main : VAEntrypointVLD
      VAProfileH264Main : VAEntrypointEncSlice
      VAProfileH264High : VAEntrypointVLD
      VAProfileH264High : VAEntrypointEncSlice
      VAProfileHEVCMain : VAEntrypointVLD
      VAProfileHEVCMain : VAEntrypointEncSlice
      VAProfileHEVCMain10 : VAEntrypointVLD
      VAProfileVP9Profile0 : VAEntrypointVLD
      VAProfileVP9Profile2 : VAEntrypointVLD
      VAProfileNone : VAEntrypointVideoProc
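
In case it helps with reading such a listing: in VA-API, VAEntrypointVLD marks decoding and VAEntrypointEncSlice marks encoding, so the profiles this GPU can encode are the ones with an EncSlice line. Below is a small, unofficial Python sketch that pulls those out of pasted vainfo text. The function names `parse_vainfo` and `encode_profiles` are made up for illustration, and the sample is an abbreviated copy of the listing above, not the full output.

```python
import re
from collections import defaultdict

# Abbreviated copy of the "Supported profile and entrypoints" lines above.
SAMPLE = """\
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264Main : VAEntrypointVLD
      VAProfileH264Main : VAEntrypointEncSlice
      VAProfileHEVCMain : VAEntrypointVLD
      VAProfileHEVCMain : VAEntrypointEncSlice
      VAProfileHEVCMain10 : VAEntrypointVLD
      VAProfileVP9Profile0 : VAEntrypointVLD
"""

# One 'profile : entrypoint' pair per line; other vainfo lines are skipped.
PAIR = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")


def parse_vainfo(text):
    """Return {profile: set of entrypoints} from vainfo-style text."""
    caps = defaultdict(set)
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            caps[match.group(1)].add(match.group(2))
    return dict(caps)


def encode_profiles(caps):
    """Profiles that list the slice-level encode entrypoint."""
    return sorted(p for p, eps in caps.items() if "VAEntrypointEncSlice" in eps)


if __name__ == "__main__":
    for profile in encode_profiles(parse_vainfo(SAMPLE)):
        print(profile)
```

On the sample this lists the H264 ConstrainedBaseline, H264 Main and HEVC Main profiles, while HEVCMain10 and VP9 are left out because they only have a decode (VLD) entry.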

Cin "Settings->Preferences->Performance, Use HW device" set to "none" or "vaapi", project SMP CPUs set to 10. The CPU has 8 threads max, but I never got close to 100% CPU load unless I set it to 10.
Cin "Settings->Preferences->Appearance", YUV color space BT709, YUV color range JPEG.
Cin is always started from a terminal to see any errors. After changing the HW device, exit and restart Cin (to be sure).

Source HD video, 350 MByte, mp4, 1920x1080x50p, 10 Mb/s, 268 secs.
Load video as "replace current project".
If the video is loaded and the HW device is vaapi, the terminal shows:
mesa: for the -simplifycfg-sink-common option: may only occur zero or one times!
mesa: for the -global-isel-abort option: may only occur zero or one times!

After loading, Settings->Format->Color Model is set to RGBA-8bit. Depending on the test, change it to RGBA-FLOAT.
Depending on the test, put the 3-way color effect on it but don't change anything in the effect's settings.

Playback testing.
1. HW device none, RGBA-8 bit, effect none: CPU load 10%.
2. HW device none, RGBA-FLOAT, effect none: CPU load 12%.
3. HW device vaapi, RGBA-8 bit, effect none: CPU load 7%.
4. HW device vaapi, RGBA-FLOAT, effect none: CPU load 8%.
5. HW device none, RGBA-8 bit, effect Color 3-way: CPU load 73%.
6. HW device none, RGBA-FLOAT, effect Color 3-way: CPU load 66%.
7. HW device vaapi, RGBA-8 bit, effect Color 3-way: CPU load 60%.
8. HW device vaapi, RGBA-FLOAT, effect Color 3-way: CPU load 53%.
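
To put the playback numbers side by side, the relative drop in CPU load with the HW device enabled can be worked out from the eight figures above. A small Python sketch, using only those measured values; the helper name `relative_reduction` is made up for illustration.

```python
# CPU load in percent from playback tests 1-8 above:
# (color model, effect) -> (HW device none, HW device vaapi)
PLAYBACK = {
    ("RGBA-8 bit", "none"): (10, 7),
    ("RGBA-FLOAT", "none"): (12, 8),
    ("RGBA-8 bit", "Color 3-way"): (73, 60),
    ("RGBA-FLOAT", "Color 3-way"): (66, 53),
}


def relative_reduction(software_pct, hardware_pct):
    """Percentage by which the CPU load drops when the HW device is used."""
    return (software_pct - hardware_pct) / software_pct * 100


if __name__ == "__main__":
    for (model, effect), (sw, hw) in PLAYBACK.items():
        drop = relative_reduction(sw, hw)
        print(f"{model}, effect {effect}: {sw}% -> {hw}% ({drop:.0f}% lower)")
```

Without an effect the load drops by roughly a third, and with Color 3-way active by roughly a fifth, since the effect itself still runs on the CPU.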

Render testing. This is done with the effect Color 3-way active and the HW device set to vaapi (should have no effect on rendering). File format ffmpeg.mp4, insertion strategy "create new resources only". Video compression settings depend on the test. Rendering times were taken both from the Cin terminal and manually with a stopwatch. When using h264_vaapi, the bitrate was set to 10 Mb/s, otherwise it would encode at 31 Mb/s and produce a much bigger file, while the source was only 10 Mb/s.
9. RGBA-8 bit, bitrate 0, h264.mp4: CPU load wildly varying 50-98%, 954 secs, 14 fps, filesize 289 MByte, bitrate 8.6 Mb/s.
10. RGBA-FLOAT, bitrate 0, h264.mp4: CPU load wildly varying 55-98%, 1252 secs, 11 fps, filesize 287 MByte, bitrate 8.6 Mb/s.
11. RGBA-8 bit, bitrate 10000000, h264_vaapi.mp4: CPU load 51-63%, 741 secs, 18 fps, filesize 339 MByte, bitrate 10 Mb/s.
12. RGBA-FLOAT, bitrate 10000000, h264_vaapi.mp4: CPU load 46-60%, 1050 secs, 13 fps, filesize xxx MByte, bitrate xxx Mb/s.

Whereas the Cin terminal output at the end of rendering normally looks like:
Render::render_single: Session finished.
** rendered 13406 frames in 267.184 secs, 50.175 fps
audio0 pad 64 0 (64)

for tests 11 and 12 it additionally displayed:
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
Failed to get HW surface format.
FFStream::decode: avcodec_send_packet failed.
file:/home/mat/Gouda_vaapi_high.mp4
  err: Invalid data found when processing input
FFStream::decode: failed

My conclusion is that hardware-accelerated de- and encoding provides definite benefits. Especially when encoding, it leaves more CPU free for effects. The GPU document only mentions Intel for vaapi; it should mention AMD as well.
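
As a cross-check of the render figures in tests 9-12, the frame rates, the speedup and the expected file sizes can be recomputed from the render times. This is a back-of-the-envelope Python sketch, not part of the measurements. It assumes the clip has the 13406 frames shown in the sample render log (consistent with 268 secs at 50 fps), and the size estimate covers video only, in decimal MByte, ignoring audio and container overhead. The function names are made up for illustration.

```python
# Assumption: frame count taken from the sample render log above.
FRAMES = 13406
DURATION_S = 268  # source clip length in seconds

# Render times in seconds from tests 9-12.
RENDER_SECS = {
    "h264, RGBA-8 bit": 954,
    "h264, RGBA-FLOAT": 1252,
    "h264_vaapi, RGBA-8 bit": 741,
    "h264_vaapi, RGBA-FLOAT": 1050,
}


def render_fps(frames, seconds):
    """Average frames rendered per second."""
    return frames / seconds


def speedup(software_secs, hardware_secs):
    """How many times faster the hardware-encoded render finished."""
    return software_secs / hardware_secs


def video_size_mbyte(bitrate_mbit_s, duration_s):
    """Video-only size estimate: Mbit/s times seconds, divided by 8 bits."""
    return bitrate_mbit_s * duration_s / 8


if __name__ == "__main__":
    for label, secs in RENDER_SECS.items():
        print(f"{label}: {render_fps(FRAMES, secs):.1f} fps")
    print(f"RGBA-8 bit speedup: {speedup(954, 741):.2f}x")
    print(f"RGBA-FLOAT speedup: {speedup(1252, 1050):.2f}x")
    print(f"10 Mb/s video estimate: {video_size_mbyte(10, DURATION_S):.0f} MByte")
```

The recomputed frame rates round to the 14, 11, 18 and 13 fps reported above, the vaapi renders come out roughly 1.2 to 1.3 times faster, and the 10 Mb/s estimate of 335 MByte sits just under the 339 MByte file from test 11.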
MatN

2019-06-23 20:44

reporter   ~0001772

Cuda is very different from hardware decoding/encoding as controlled by the vaapi/vdpau interfaces. It should be compared to OpenCL, I think.
And OpenCL is vendor-independent, unlike Cuda. I think enabling the hardware de/encoding should be a separate setting from using a GPU compute interface like Cuda/OpenCL in the performance tab.
I'll be posting some vaapi tests after this.
Sam

2019-06-21 00:50

administrator   ~0001747

What a positive surprise that the integration of CUDA has worked. Unfortunately I am not able to test it currently, because I still have outdated drivers installed. I will switch to the latest OpenSuse version in a few days anyway and then use this opportunity to install the latest graphics drivers and also test CUDA. In a few days I will add this new feature to the feature page. Thanks again for the great work.
PhyllisSmith

2019-06-21 00:37

manager   ~0001744

With the addition of an optional Cuda build/usage, this task is complete as far as I know. A document covering all of the encode/decode/Cuda parts has been updated and is temporarily at:

https://www.cinelerra-gg.org/download/GPU_potential_speedup.pdf

It has been added locally to the manual which will be updated later. I will mark this resolved later in case anyone wants to add a comment yet.
PhyllisSmith

2019-06-16 22:04

manager   ~0001735

Hardware acceleration of encoding for rendering is now available for most graphics cards. It will automatically be built into the monthly builds. The board vendors have limited which codecs they implemented. Currently available are:

Nvidia: h264 and h265 (h265 works only on later Nvidia boards, at least Maxwell chip boards).
Intel HD graphics: h264, mpeg2, mjpeg, h265, and possibly vp8/vp9 (h264 and mpeg2 work on, for example, Intel Broadwell).
PhyllisSmith

2019-05-16 03:27

manager   ~0001547

As Andrew stated, this is partially implemented with ENCODING, but not for Nvidia graphics boards which need NVENC. I will leave this open until gg has a chance to look into ffmpeg having nvenc removed from the non-free list.
Andrew-R

2019-05-14 04:12

reporter   ~0001521

Partially present since https://git.cinelerra-gg.org/git/?p=goodguy/cinelerra.git;a=commit;h=1d4f5d708de0d8ec19300b417354a3374d00ed47

See email at https://lists.cinelerra-gg.org/pipermail/cin/2019-May/000603.html

Issue History

Date Modified     Username      Field                 Change
2018-12-30 19:53  MatN          New Issue
2019-05-14 04:12  Andrew-R      Note Added: 0001521
2019-05-16 03:25  PhyllisSmith  Assigned To           => goodguy
2019-05-16 03:25  PhyllisSmith  Status                new => assigned
2019-05-16 03:27  PhyllisSmith  Status                assigned => acknowledged
2019-05-16 03:27  PhyllisSmith  Note Added: 0001547
2019-06-16 22:04  PhyllisSmith  Status                acknowledged => feedback
2019-06-16 22:04  PhyllisSmith  Note Added: 0001735
2019-06-21 00:37  PhyllisSmith  Note Added: 0001744
2019-06-21 00:50  Sam           Note Added: 0001747
2019-06-23 20:44  MatN          Note Added: 0001772
2019-06-23 20:44  MatN          Status                feedback => assigned
2019-06-23 20:59  MatN          Note Added: 0001773
2019-06-24 16:24  Andrea_Paz    Note Added: 0001774
2019-06-25 08:13  Olaf          Note Added: 0001775
2019-06-25 17:32  MatN          Note Added: 0001776
2019-06-25 18:25  Pierre        Note Added: 0001779
2019-06-25 19:26  Sam           Note Added: 0001780
2019-06-25 19:28  Sam           Note Edited: 0001780
2019-06-25 20:51  MatN          Note Added: 0001783
2019-06-25 21:54  Sam           Note Added: 0001784
2019-06-25 22:04  Pierre        Note Added: 0001785
2019-06-25 23:43  Andrew-R      Note Added: 0001786
2019-06-26 00:14  Pierre        Note Added: 0001787
2019-06-26 09:11  Olaf          Note Added: 0001789
2019-06-26 14:56  Andrew-R      Note Added: 0001793
2019-08-07 22:28  PhyllisSmith  Status                assigned => closed
2019-08-07 22:28  PhyllisSmith  Resolution            open => fixed
2019-08-07 22:28  PhyllisSmith  Note Added: 0002004