0000294: Support for RGBA-Float in playback3d.C

ID	Project	Category	View Status	Date Submitted	Last Update

0000294	Cinelerra-GG	[All Projects] Feature	public	2019-09-05 23:51	2020-09-04 20:04

Reporter	Andrew-R	Assigned To	PhyllisSmith
Priority	normal	Severity	minor	Reproducibility	always
Status	acknowledged	Resolution	open
Product Version	2019-06
Target Version		Fixed in Version

Summary	0000294: Support for RGBA-Float in playback3d.C
Description	Hello! I was testing yet another build of CinGG (Cinelerra Infinity - built: Sep 6 2019 00:54:56) and found that while the RGBA-8 colormodel plays at 25 fps for both X11 and OpenGL outputs, RGBA-FLOAT drops down to 5-10 fps. Looking at cinelerra-5.1/cinelerra/playback3d.C I see:
void Playback3D::convert_cmodel(Canvas *canvas, VFrame *output, int dst_cmodel)
{
// Do nothing if colormodels are equivalent in OpenGL & the image is in hardware.
    int src_cmodel = output->get_color_model();
    if( (output->get_opengl_state() == VFrame::TEXTURE ||
        output->get_opengl_state() == VFrame::SCREEN) &&
// OpenGL has no floating point.
        ( (src_cmodel == BC_RGB888 && dst_cmodel == BC_RGB_FLOAT) ||
        (src_cmodel == BC_RGBA8888 && dst_cmodel == BC_RGBA_FLOAT) ||
        (src_cmodel == BC_RGB_FLOAT && dst_cmodel == BC_RGB888) ||
        (src_cmodel == BC_RGBA_FLOAT && dst_cmodel == BC_RGBA8888) ||
// OpenGL sets alpha to 1 on import
        (src_cmodel == BC_RGB888 && dst_cmodel == BC_RGBA8888) ||
        (src_cmodel == BC_YUV888 && dst_cmodel == BC_YUVA8888) ||
        (src_cmodel == BC_RGB_FLOAT && dst_cmodel == BC_RGBA_FLOAT) ) )
        return;
A bit further down in the same file there is a table describing some conversions, but I'm not sure it covers all cases:
void Playback3D::convert_cmodel_sync(Playback3DCommand *command)
{skip}
static cmodel_shader_table_t cmodel_shader_table[] = {
    { BC_RGB888, BC_YUV888, rgb_to_yuv, rgb_to_yuv_frag },
    { BC_RGB888, BC_YUVA8888, rgb_to_yuv, rgb_to_yuv_frag },
    { BC_RGBA8888, BC_RGB888, rgb_to_rgb, rgba_to_rgb_frag },
    { BC_RGBA8888, BC_RGB_FLOAT, rgb_to_rgb, rgba_to_rgb_frag },
    { BC_RGBA8888, BC_YUV888, rgb_to_yuv, rgba_to_yuv_frag },
    { BC_RGBA8888, BC_YUVA8888, rgb_to_yuv, rgb_to_yuv_frag },
    { BC_RGB_FLOAT, BC_YUV888, rgb_to_yuv, rgb_to_yuv_frag },
    { BC_RGB_FLOAT, BC_YUVA8888, rgb_to_yuv, rgb_to_yuv_frag },
    { BC_RGBA_FLOAT, BC_RGB888, rgb_to_rgb, rgba_to_rgb_frag },
    { BC_RGBA_FLOAT, BC_RGB_FLOAT, rgb_to_rgb, rgba_to_rgb_frag },
    { BC_RGBA_FLOAT, BC_YUV888, rgb_to_yuv, rgba_to_yuv_frag },
    { BC_RGBA_FLOAT, BC_YUVA8888, rgb_to_yuv, rgb_to_yuv_frag },
    { BC_YUV888, BC_RGB888, yuv_to_rgb, yuv_to_rgb_frag },
    { BC_YUV888, BC_RGBA8888, yuv_to_rgb, yuv_to_rgb_frag },
    { BC_YUV888, BC_RGB_FLOAT, yuv_to_rgb, yuv_to_rgb_frag },
    { BC_YUV888, BC_RGBA_FLOAT, yuv_to_rgb, yuv_to_rgb_frag },
    { BC_YUVA8888, BC_RGB888, yuv_to_rgb, yuva_to_rgb_frag },
    { BC_YUVA8888, BC_RGBA8888, yuv_to_rgb, yuv_to_rgb_frag },
    { BC_YUVA8888, BC_RGB_FLOAT, yuv_to_rgb, yuva_to_rgb_frag },
    { BC_YUVA8888, BC_RGBA_FLOAT, yuv_to_rgb, yuv_to_rgb_frag },
    { BC_YUVA8888, BC_YUV888, yuv_to_yuv, yuva_to_yuv_frag },
Thing is, you apparently CAN use floating-point textures and renderbuffers in OpenGL 3+! https://learnopengl.com/Advanced-Lighting/HDR
-----quote-----
Floating point framebuffers
To implement high dynamic range rendering we need some way to prevent color values getting clamped after each fragment shader run. When framebuffers use a normalized fixed-point color format (like GL_RGB) as their colorbuffer's internal format OpenGL automatically clamps the values between 0.0 and 1.0 before storing them in the framebuffer.
This operation holds for most types of framebuffer formats, except for floating point formats that are used for their extended range of values. When the internal format of a framebuffer's colorbuffer is specified as GL_RGB16F, GL_RGBA16F, GL_RGB32F or GL_RGBA32F the framebuffer is known as a floating point framebuffer that can store floating point values outside the default range of 0.0 and 1.0. This is perfect for rendering in high dynamic range! To create a floating point framebuffer the only thing we need to change is its colorbuffer's internal format parameter:
glBindTexture(GL_TEXTURE_2D, colorBuffer);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, SCR_WIDTH, SCR_HEIGHT, 0, GL_RGBA, GL_FLOAT, NULL);
The default framebuffer of OpenGL (by default) only takes up 8 bits per color component. With a floating point framebuffer with 32 bits per color component (when using GL_RGB32F or GL_RGBA32F) we're using 4 times more memory for storing color values. As 32 bits isn't really necessary unless you need a high level of precision using GL_RGBA16F will suffice.
---end of quote-----
Because the RGBA-Float project colormodel mostly needs RGBA-F16 conversions and texture modes, it should be relatively simple ... ah, not if you count all the plugins :/ But if the whole pipeline ran at R16G16B16A16, then ffmpeg would convert video streams into this format at the decoding stage, and Cinelerra would only need to deal with floating-point RGB(A) until the final encoding stage, where ffmpeg would again pick up the data, convert it for the specific encoder and make it all work. But the plugins all use the old 8-bit/channel OpenGL logic, so they will probably require additional attention.
glxinfo for me (OpenGL 3.3/DX10 era hw) gives:
glxinfo | grep float
GLX_ARB_fbconfig_float, GLX_ARB_framebuffer_sRGB, GLX_ARB_multisample, GLX_EXT_fbconfig_packed_float, GLX_EXT_framebuffer_sRGB, GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, GLX_EXT_create_context_es_profile, GLX_EXT_fbconfig_packed_float, GLX_ARB_fbconfig_float, GLX_ARB_framebuffer_sRGB, GLX_EXT_fbconfig_packed_float, GLX_EXT_framebuffer_sRGB, GL_ARB_color_buffer_float, GL_ARB_compressed_texture_pixel_storage, GL_ARB_debug_output, GL_ARB_depth_buffer_float, GL_ARB_depth_clamp, GL_ARB_get_texture_sub_image, GL_ARB_half_float_pixel, GL_ARB_half_float_vertex, GL_ARB_instanced_arrays, GL_ARB_texture_filter_anisotropic, GL_ARB_texture_float, GL_ATI_blend_equation_separate, GL_ATI_texture_float, GL_EXT_framebuffer_sRGB, GL_EXT_packed_depth_stencil, GL_EXT_packed_float, GL_ARB_clip_control, GL_ARB_color_buffer_float, GL_ARB_compatibility, GL_ARB_debug_output, GL_ARB_depth_buffer_float, GL_ARB_depth_clamp, GL_ARB_half_float_pixel, GL_ARB_half_float_vertex, GL_ARB_texture_float, GL_ARB_texture_mirror_clamp_to_edge, GL_ATI_texture_float, GL_ATI_texture_mirror_once, GL_EXT_abgr, GL_EXT_packed_float, GL_EXT_packed_pixels, GL_EXT_pixel_buffer_object, GL_EXT_clip_control, GL_EXT_clip_cull_distance, GL_EXT_color_buffer_float, GL_EXT_draw_elements_base_vertex, GL_EXT_float_blend, GL_EXT_frag_depth, GL_OES_texture_border_clamp, GL_OES_texture_float, GL_OES_texture_float_linear, GL_OES_texture_half_float, GL_OES_texture_half_float_linear, GL_OES_texture_npot, GL_OES_vertex_half_float
(for Core and Compat profiles, and GLX server/client parts)
Mesa has this enabled by default since June, 2018 (on hardware where it matters and all paths actually done in specific driver) https://gitlab.freedesktop.org/lima/mesa/commit/66673bef941af344314fe9c91cad8cd330b245eb
Anyway, leaving this ticket hanging around, just in case GG or anyone else will have some time to play around with this idea.
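To make the memory trade-off in the quoted passage concrete, here is a minimal sketch of how a texture internal format could be picked from the project colormodel, and what one frame then costs. It is not Cinelerra code: the Cmodel enum is a hypothetical stand-in for the BC_* constants, pick_texture_format and texture_bytes are illustrative names, and the GL enum values are copied from the OpenGL headers so the sketch builds without a GL context. The byte counts are nominal; a driver may pad 3-component formats.

```cpp
#include <cassert>
#include <cstddef>

// Enum values copied from the OpenGL headers so the sketch builds without GL.
constexpr unsigned kGL_UNSIGNED_BYTE = 0x1401;
constexpr unsigned kGL_FLOAT         = 0x1406;
constexpr unsigned kGL_RGB8          = 0x8051;
constexpr unsigned kGL_RGBA8         = 0x8058;
constexpr unsigned kGL_RGBA32F       = 0x8814;
constexpr unsigned kGL_RGB32F        = 0x8815;
constexpr unsigned kGL_RGBA16F       = 0x881A;
constexpr unsigned kGL_RGB16F        = 0x881B;

// Hypothetical stand-in for the BC_* colormodel constants.
enum class Cmodel { RGB888, RGBA8888, RGB_FLOAT, RGBA_FLOAT };

struct TextureFormat {
    unsigned internal_format;     // internalformat argument of glTexImage2D
    unsigned upload_type;         // type argument of glTexImage2D
    std::size_t bytes_per_texel;  // nominal storage, ignoring driver padding
};

// Pick a texture format for a colormodel; half_precision selects 16F over 32F.
inline TextureFormat pick_texture_format(Cmodel cmodel, bool half_precision) {
    const bool alpha = cmodel == Cmodel::RGBA8888 || cmodel == Cmodel::RGBA_FLOAT;
    const bool is_float = cmodel == Cmodel::RGB_FLOAT || cmodel == Cmodel::RGBA_FLOAT;
    const std::size_t channels = alpha ? 4 : 3;
    if (!is_float)
        return { alpha ? kGL_RGBA8 : kGL_RGB8, kGL_UNSIGNED_BYTE, channels };
    if (half_precision)
        return { alpha ? kGL_RGBA16F : kGL_RGB16F, kGL_FLOAT, channels * 2 };
    return { alpha ? kGL_RGBA32F : kGL_RGB32F, kGL_FLOAT, channels * 4 };
}

// Nominal size in bytes of one width x height texture in the given format.
inline std::size_t texture_bytes(const TextureFormat& format,
                                 std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return width * height * format.bytes_per_texel;
}
```

Under these assumptions a 1920x1080 RGBA frame takes 8,294,400 bytes at 8 bits per channel, 16,588,800 bytes as GL_RGBA16F and 33,177,600 bytes as GL_RGBA32F.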
Steps To Reproduce	Set the project format to RGBA-Float (instead of the default RGBA-8) and the output driver to X11-OpenGL. Observe the slowdown when playing even a single-track video.
Additional Information	git version
commit 721a106de35567bcab14a0e92718767189acf176 (grafted, HEAD -> master, origin/master, origin/HEAD)
Author: Good Guy <[email protected]>
Date: Wed Sep 4 12:26:37 2019 -0600
    add crop plugin, add timeline bars, render setup err chks, minor tweaks
Tags	No tags attached.

Andrew-R 2020-09-04 20:04 reporter ~0003989	Well, I found similar patch/commit in Olive (related to color management?): https://github.com/olive-editor/olive/commit/9c920cfcc3357229208cdc8b537b0a09151451b7 "added settings for bit depth" Also, my 'new' GeForce GT240 (reclocked, still with nouveau) seems to be faster than my G92 :}

Andrew-R 2020-03-16 16:20 reporter ~0002905	Another thought, while reading about OpenEXR ... Here, for example, you can find an interesting statement: https://developer.nvidia.com/gpugems/gpugems/part-iv-image-processing/chapter-26-openexr-image-file-format
--quote start-----
26.1.2 A "Half" Format
Early in 2003, ILM released a new HDR file format with 16-bit floating-point color-component values. Because the IEEE 754 floating-point specification does not define a 16-bit format, ILM created a half format that matches NVIDIA's 16-bit format. The half type provides an excellent storage structure for high-dynamic-range image content. This type is directly supported in the OpenEXR format. The 16-bit, or "half-precision," floating-point format is modeled after the IEEE 754 single-precision and double-precision formats. A half-precision number consists of a sign bit, a 5-bit exponent, and a 10-bit mantissa. The smallest and largest possible exponent values are reserved for representing zero, denormalized numbers, infinities, and NaNs. In OpenEXR's C++ implementation, numbers of type half generally behave like the built-in C++ floating-point types, float and double. The half, float, and double types can be mixed freely in arithmetic expressions. Here are a few examples:
half a (3.5);
float b (a + sqrt(a));
a += b;
b += a;
b = a + 7;
---quote end----
Does this mean that if Cinelerra is built with OpenEXR, this new datatype can be used in only slightly modified float routines? Some more links: https://stackoverflow.com/questions/1659440/32-bit-to-16-bit-floating-point-conversion https://galfar.vevb.net/wp/2011/16bit-half-float-in-pascaldelphi/ Found this while thinking about b44 OpenEXR compression ...
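The bit layout described in the quote (1 sign bit, 5 exponent bits, 10 mantissa bits) can be sketched as a pair of conversion routines. This is an illustration only and not OpenEXR's half class: float_to_half and half_to_float are hypothetical names, and for brevity the narrowing conversion truncates the mantissa instead of rounding to nearest, so a production converter would differ in the last bit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert a 32-bit float to half-precision bits (1 sign, 5 exponent, 10 mantissa).
// Assumption for brevity: the mantissa is truncated, not rounded to nearest.
inline std::uint16_t float_to_half(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu)  // infinity or NaN: keep NaN-ness with one mantissa bit
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));

    const int half_exp = static_cast<int>(exponent) - 127 + 15;  // re-bias 127 -> 15
    if (half_exp >= 31)     // too large for half: overflow to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    if (half_exp <= 0) {    // too small for a normal half
        if (half_exp < -9)  // below the smallest subnormal: signed zero
            return static_cast<std::uint16_t>(sign);
        mantissa |= 0x800000u;  // restore the implicit leading 1
        return static_cast<std::uint16_t>(sign | (mantissa >> (14 - half_exp)));
    }
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(half_exp) << 10) | (mantissa >> 13));
}

// Convert half-precision bits back to a 32-bit float (exact, no rounding needed).
inline float half_to_float(std::uint16_t h) {
    const std::uint32_t sign = (static_cast<std::uint32_t>(h) & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits;
    if (exponent == 0x1Fu) {           // infinity or NaN
        bits = sign | 0x7F800000u | (mantissa << 13);
    } else if (exponent != 0) {        // normal number: re-bias 15 -> 127
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    } else if (mantissa == 0) {        // signed zero
        bits = sign;
    } else {                           // subnormal half: normalise it
        std::uint32_t shift = 0;
        while (!(mantissa & 0x400u)) { mantissa <<= 1; ++shift; }
        bits = sign | ((113u - shift) << 23) | ((mantissa & 0x3FFu) << 13);
    }
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

With this layout 3.5 is stored as 0x4300 and the largest finite half, 65504, as 0x7BFF; anything larger becomes infinity. That limited range is why float routines would need some care, not just a type swap.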

Olaf 2019-09-11 11:47 reporter ~0002121	@Andrew-R, Any graphics card with GeForce GTS 450 or higher and the OpenGL drivers from Nvidia should play 1080/25p without any problems (RGBA-float/X11-OpenGL) and without any hardware modifications. My 450 card reaches about 115 FPS with unprocessed fullhd FFvhuff material on the timeline, including pcm audio tracks. (glxgears: 109910 frames in 5.0 seconds = 21981.947 FPS) If later the performance can be improved by 1-2 FPS by a code optimization, all the better. But as sorry as I am to say, in image processing the practical use of the proprietary powerful drivers contrasts with the ideological aspects.

Andrew-R 2019-09-11 09:52 reporter ~0002120	@Olaf: Yes, I use mesa/nouveau, but my card can be partially reclocked, and even its boot clocks are not very low (as they are on many other cards), so it is slower, but not intolerably slow. Also, my first test stream was an av1 1080p video (no audio), and my second test stream was a 720x400 h264 file scaled up to 1080p. The second case was faster, but still around 20 fps, not 25 (speaking about the RGBA-float project colorspace and X11-OpenGL output, specifically).

Olaf 2019-09-11 08:00 reporter ~0002119	Andrew-R, your test results (5-10 fps) refer to nouveau and mesa? Nvidia delivers with its drivers an OpenGL that is much faster than mesa. With these in connection with my age old graphics card I play 1080/25p in RGBA-FLOAT with 25 FPS. Only after image manipulation the FPS break in.

Andrew-R 2019-09-11 02:17 reporter ~0002118	And for some reason my attempt at a reply disappeared :/ Maybe I just typed it in too slowly! Anyway, I was about to add this 'brilliant' idea about contacting the X/mesa (and nouveau) mailing lists in the hope that they will find some interest in actual application hacking. Links:
https://lists.x.org/archives/xorg-devel/2018-February/055861.html
Depth 30 enablement for modesetting-ddx and fixups for glamor. Mario Kleiner mario.kleiner.de at gmail.com
---------quote------
I used my photometer to make sure the bits come through while testing on NVidia + nouveau, and then also quickly tested on old AMD gfx + radeon-kms and on Intel + intel-kms to make sure that regular desktop and OpenGL apps render correctly on that hw as well.
---end quote-----
https://lists.freedesktop.org/archives/mesa-dev/2019-February/214689.html
[Mesa-dev] 10-bit fbconfigs break most video players using VAAPI+GLX
https://github.com/skeggsb/nouveau/commit/ca5fe1a3e31e1f1e77274616e18296ddd0daba32
kms/nv50-: add fp16 scanout support
Phyllis - I'm sorry about you and your dog! OpenGL is a big standard, but I hope this quest can be resolved with time and collective work...
PS: llvmpipe with mesa 19.3.0-git actually should be able to run some compute shaders - not very fast, but maybe good for prototyping on machines where the video cards are too old to have OpenGL 4.3+ in hardware (like my machine).

PhyllisSmith 2019-09-11 01:25 manager ~0002116	@Andrew GG did look at your patch. He is not sure how this will affect plugins and it is unclear if there might be risks involved. You probably already know that he is no expert when it comes to OpenGL (me and the dog have had to cover our ears in the last couple of months when he was working on OpenGL coding). As always, it is good to have this logged as an issue for future consideration by others.

PhyllisSmith 2019-09-11 00:49 manager ~0002114	@Andrea "I still have one thing to understand: doesn't implementing a CMS mean having internal LUTs that avoid making continuous conversions between color spaces and color models? Or, in any case, to make them faster and more precise because they always refer to the same absolute XYZ coordinates as the colours?" I am not sure that the below quote from GG helps illuminate the question above, but here it is anyway: Start quote: "There is a optional feature that can be used via .opts lines from the ffmpeg decoded files. This is via the video_filter=colormatrix=...ffmpeg plugin. There may be other good plugins (lut3d...) that can also accomplish a desired color transform. This .opts feature affects the file colorspace on a file by file basis, although in principle it should be possible to setup a histogram plugin or any of the F_lut* plugins to remap the colortable, either by table or interp. For output, the yuv<->rgb transformations are via the YUV class, and its tables are initialized using YUV::yuv_set_colors. This sets up the transfer tables for just one version (one of bt601,709,2020) and the color range for mpeg or jpeg. This is limited, but since the product is usually a render, and this transform is designed to match display parameters, it is often correct to select the output colorspace once. If the render needs a colorspace mapping, then the session can be nested and the session output remapped by a plugin. This is all not very glitzy or highly automated, but it does provide a wide color mapping capability." End Quote. With our limited equipment and lack of color management knowledge, I think CMS can only be integrated into CinGG by someone with in depth knowledge of both video and programming skills.
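As a rough illustration of what "one of bt601,709,2020" changes in a yuv<->rgb transform, the sketch below derives both directions from the luma coefficients Kr and Kb of each standard. It is not the YUV class or YUV::yuv_set_colors: the type and function names are hypothetical, values are normalised doubles, and only the full ("jpeg") range is handled, so the mpeg-range scaling mentioned in the quote is left out.

```cpp
#include <cassert>
#include <cmath>

// Luma coefficients (Kr, Kb) as published in the respective ITU-R recommendations.
struct LumaCoefficients { double kr, kb; };
constexpr LumaCoefficients BT601{0.299, 0.114};
constexpr LumaCoefficients BT709{0.2126, 0.0722};
constexpr LumaCoefficients BT2020{0.2627, 0.0593};

struct Rgb { double r, g, b; };  // each component in [0, 1]
struct Yuv { double y, u, v; };  // y in [0, 1], u and v in [-0.5, 0.5]

// Full-range RGB -> YUV for the chosen standard.
inline Yuv rgb_to_yuv(const Rgb& c, const LumaCoefficients& k) {
    const double kg = 1.0 - k.kr - k.kb;
    const double y = k.kr * c.r + kg * c.g + k.kb * c.b;
    return { y, (c.b - y) / (2.0 * (1.0 - k.kb)), (c.r - y) / (2.0 * (1.0 - k.kr)) };
}

// Full-range YUV -> RGB, the algebraic inverse of rgb_to_yuv.
inline Rgb yuv_to_rgb(const Yuv& c, const LumaCoefficients& k) {
    const double kg = 1.0 - k.kr - k.kb;
    const double r = c.y + 2.0 * (1.0 - k.kr) * c.v;
    const double b = c.y + 2.0 * (1.0 - k.kb) * c.u;
    const double g = (c.y - k.kr * r - k.kb * b) / kg;
    return { r, g, b };
}
```

Decoding with a different standard than the one used for encoding shifts the colours, which is why a session that mixes media from several standards needs a per-file conversion step rather than a single global table.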

Andrew-R 2019-09-08 13:10 reporter ~0002089	Something like this patch?
----------
diff --git a/cinelerra-5.1/cinelerra/vdevicex11.C b/cinelerra-5.1/cinelerra/vdevicex11.C
index 24d1be0..6d87692 100644
--- a/cinelerra-5.1/cinelerra/vdevicex11.C
+++ b/cinelerra-5.1/cinelerra/vdevicex11.C
@@ -333,6 +333,13 @@ void VDeviceX11::new_output_buffer(VFrame *result, int file_colormodel, EDL ed
         }
         break;
 
+    case BC_RGBA_FLOAT:
+    case BC_RGB_FLOAT:
+        if( device->out_config->driver == PLAYBACK_X11_GL
+            && !output->use_scrollbars )
+            bitmap_type = BITMAP_PRIMARY;
+        break;
+
     case BC_YUV420P:
         if( device->out_config->driver == PLAYBACK_X11_XV &&
             window->accel_available(display_colormodel, 0) &&
diff --git a/cinelerra-5.1/guicast/bctexture.C b/cinelerra-5.1/guicast/bctexture.C
index 52787e1..f1fd166 100644
--- a/cinelerra-5.1/guicast/bctexture.C
+++ b/cinelerra-5.1/guicast/bctexture.C
@@ -124,9 +124,9 @@ void BC_Texture::create_texture(int w, int h, int colormodel)
         glGenTextures(1, (GLuint*)&texture_id);
         glBindTexture(GL_TEXTURE_2D, (GLuint)texture_id);
         glEnable(GL_TEXTURE_2D);
-        int internal_format = texture_components == 4 ? GL_RGBA8 : GL_RGB8 ;
+        int internal_format = texture_components == 4 ? GL_RGBA32F : GL_RGB32F ;
         glTexImage2D(GL_TEXTURE_2D, 0, internal_format, texture_w, texture_h,
-            0, GL_RGBA, GL_UNSIGNED_BYTE, 0);
+            0, GL_RGBA, GL_FLOAT, 0);
         window_id = BC_WindowBase::get_synchronous()->current_window->get_id();
         BC_WindowBase::get_synchronous()->put_texture(texture_id,
             texture_w, texture_h, texture_components);
-----------------
Note, it only adds 32F floating-point textures, not rendering into a floating-point framebuffer (glxinfo still does not mark any of those FBconfigs as floating-point capable; maybe a new kernel is needed, or a new card :}). I can see 30-bit integer formats (with 2-bit alpha), but I'm not sure how useful those can be:
558 GLXFBConfigs:
    visual  x   bf lv rg d st  colorbuffer  sr ax dp st accumbuffer  ms  cav
  id dep cl sp  sz l  ci b ro  r  g  b  a F gb bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------------
0x075  0 tc  0  32  0 r  .  .  10 10 10  2 .  .  0  0  0  0  0  0  0  0 0 None
opengl3_patch_very_small_speedup.patch (1,525 bytes)

Andrew-R 2019-09-08 10:36 reporter ~0002088	And while looking at cinelerra-5.1/cinelerra/vdevicex11.C: the function void VDeviceX11::new_output_buffer(VFrame *result, int file_colormodel, EDL *edl) first sets bitmap_type = BITMAP_TEMP; and then overrides it to bitmap_type = BITMAP_PRIMARY; when the switch on display_colormodel hits case BC_BGR8888, or one of three other cases with additional constraints (BC_YUV420P, BC_YUV422P, BC_YUV422). It doesn't include BC_RGB_FLOAT or BC_RGBA_FLOAT, so a temporary conversion is used in this case (from the project's RGBA-float (32 bits per component, yes?) to the output's format) in VDeviceX11::write_buffer. So, if I want a speed-up I must include the rgba-32f type in this switch, allocate textures with this format too, and update some check so that the OpenGL display can accelerate even RGB(A)-float output.

Andrea_Paz 2019-09-08 08:22 manager ~0002087	I thank GG for the clear and thorough explanation. I'm a bit sad because I seem to have understood that CinGG will never have a color management (CMS) unless you change EVERYTHING. This is impossible and not even desirable. I still have one thing to understand: doesn't implementing a CMS mean having internal LUTs that avoid making continuous conversions between color spaces and color models? Or, in any case, to make them faster and more precise because they always refer to the same absolute XYZ coordinates as the colours?

Andrew-R 2019-09-08 07:04 reporter ~0002086	Wow, thanks a lot to both Phyllis and GG for such a technical answer! I see your point about the rgb8 -> rgba_float conversion step done in Cinelerra itself being the slow path. I was looking at babl (http://gegl.org/babl/index.html#Usage), which I have installed for GIMP anyway. Maybe some function in babl-0.1.72/extensions/sse4-int8.c can be useful for Cin (even if just for testing the speedup/idea)? I see functions like:
#if defined(USE_SSE4_1)
/* SSE 4 */
#include <smmintrin.h>
#include <stdint.h>
#include <stdlib.h>
#include "babl.h"
#include "babl-cpuaccel.h"
#include "extensions/util.h"
static inline void
conv_y8_yF (const Babl *conversion, const uint8_t *src, float *dst, long samples)
{
  const float factor = 1.0f / 255.0f;
  const __v4sf factor_vec = {1.0f / 255.0f, 1.0f / 255.0f, 1.0f / 255.0f, 1.0f / 255.0f};
  const uint32_t *s_vec;
  __v4sf *d_vec;
  long n = samples;
  s_vec = (const uint32_t *)src;
  d_vec = (__v4sf *)dst;
  while (n >= 4)
    {
      __m128i in_val;
      __v4sf out_val;
      in_val = _mm_insert_epi32 ((__m128i)_mm_setzero_ps(), *s_vec++, 0);
      in_val = _mm_cvtepu8_epi32 (in_val);
      out_val = _mm_cvtepi32_ps (in_val) * factor_vec;
      _mm_storeu_ps ((float *)d_vec++, out_val);
      n -= 4;
    }
  src = (const uint8_t *)s_vec;
  dst = (float *)d_vec;
  while (n)
    {
      *dst++ = (float)(*src++) * factor;
      n -= 1;
    }
}
[...]
static void
conv_rgb8_rgbF (const Babl *conversion, const uint8_t *src, float *dst, long samples)
{
  conv_y8_yF (conversion, src, dst, samples * 3);
}
static void
conv_rgba8_rgbaF (const Babl *conversion, const uint8_t *src, float *dst, long samples)
{
  conv_y8_yF (conversion, src, dst, samples * 4);
}
#endif
--------------
Also, because babl sort of specializes in color management too, maybe it can be reused at least for some stages if/when color management comes to Cinelerra-GG.
PS: my CPU has SSE4.1:
cat /proc/cpuinfo | grep sse4
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
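For comparison with the SSE4.1 routine quoted above, the same 8-bit to float step can be sketched portably with a 256-entry lookup table, which replaces the per-sample int-to-float conversion and multiply with one table read. This is only an illustration of the idea, not BC_Xfer::xfer_rgba8888_to_rgba_float and not babl code; byte_to_float_table and rgba8_to_rgba_float are hypothetical names, and whether a table beats SIMD on a given CPU would have to be measured.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build the table once: entry i holds i / 255 as a float in [0, 1].
inline const std::array<float, 256>& byte_to_float_table() {
    static const std::array<float, 256> table = [] {
        std::array<float, 256> t{};
        for (std::size_t i = 0; i < t.size(); ++i)
            t[i] = static_cast<float>(i) / 255.0f;
        return t;
    }();
    return table;
}

// Convert `pixels` packed RGBA8888 pixels to packed RGBA float.
// dst must have room for pixels * 4 floats.
inline void rgba8_to_rgba_float(const std::uint8_t* src, float* dst, std::size_t pixels) {
    assert(src != nullptr && dst != nullptr);
    const std::array<float, 256>& table = byte_to_float_table();
    const std::size_t samples = pixels * 4;  // four channels per pixel
    for (std::size_t i = 0; i < samples; ++i)
        dst[i] = table[src[i]];
}
```

Note that dividing by 255 and multiplying by a precomputed 1/255 (as the babl code does) can differ in the last bit of the result, so outputs of the two approaches should be compared with a small tolerance.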

PhyllisSmith 2019-09-08 01:35 manager ~0002085	Short answer by Phyllis: we could duplicate your results on my previous laptop, but the 2 computers we use daily have multiple cpus and the fps stayed at just about 24 all of the time. It may be possible to re-write a bunch of stuff to specialize it for opengl data formats and texture design, but that is a great distance from the current design, which targets the internal data model you specify. And, in response to the concern about "covering all cases": "The if (src_cmodel == BC_RGB888 && dst_cmodel == BC_RGB_FLOAT) ... is testing to see if the conversion can be skipped (returns call). Whereas the table accesses a shader needed to do a color conversion."
Long "real" answer by GoodGuy:
hi
This is a sort of fuzzy analysis of the data transfers for the different rendering drivers, color models and formats that are used when you do stuff. This data was collected using the "prof2" program which is in the main cin5 src directory. This profiler program uses an alarm signal to frequently collect stack traces to see a bird's eye view of the program execution. Here is a piece of some of the data I collected using this program:
this top part is the time observed for each alarm interrupt. In this case, it is cpu time slices, 100 per sec, that create an integral histogram of where it was when it was interrupted.
  1.540s  1.4% ff_hevc_put_hevc_qpel_hv8_8_sse4 /mnt0/build5/cinelerra-5.1/bin/cin
  1.600s  1.4% ff_hevc_put_hevc_bi_qpel_hv8_8_sse4 /mnt0/build5/cinelerra-5.1/bin/cin
  1.800s  1.6% shmdt /lib64/libc-2.28.so
  2.070s  1.8% ff_hevc_deblocking_boundary_strengths /mnt0/build5/cinelerra-5.1/bin/cin
  2.130s  1.9% copy_CTB_to_hv /mnt0/build5/cinelerra-5.1/bin/cin
 19.020s 17.0% yuv420_bgr32_mmx /mnt0/build5/cinelerra-5.1/bin/cin
 23.230s 20.8% BC_Xfer::xfer_rgba8888_to_rgba_float(unsigned int, unsigned int) /mnt0/build5/cinelerra-5.1/bin/cin
 37.430s 33.4% _fini /mnt0/build5/cinelerra-5.1/bin/cin
------------
this part tries to walk the stack, and show the cpu stack path histogram of the time spent in execution. It shows how it got to the bad guys at the bottom of the stack (above).
 10.180s  9.1% BC_Xfer::xfer_slices(int) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.760s 10.5% FFVideoStream::load(VFrame, long) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.760s 10.5% FFVideoConvert::convert_cmodel(VFrame, AVFrame) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.760s 10.5% FFMPEG::decode(int, long, VFrame) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.760s 10.5% FileFFMPEG::read_frame(VFrame) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.910s 10.6% File::read_frame(VFrame, int) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.920s 10.6% VEdit::read_frame(VFrame, long, int, CICache, int, int, int) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.930s 10.7% VRender::process_buffer(long, int) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 11.970s 10.7% VRender::run() 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 13.280s 11.9% non-virtual thunk to BC_Xfer::Slicer::run() 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 19.020s 17.0% yuv420_bgr32_mmx 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 23.230s 20.8% BC_Xfer::xfer_rgba8888_to_rgba_float(unsigned int, unsigned int) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 27.020s 24.1% Thread::entrypoint(void*) 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
 27.020s 24.1% start_thread 1.0 /lib64/libpthread-2.28.so
 37.520s 33.5% _fini 1.0 /mnt0/build5/cinelerra-5.1/bin/cin
This is mostly just smoke and mirrors, but it does show that the program is spending a great deal of time converting first from the media format (yuv420) to rgb, using ffmpeg software scale (sws) transfers. This is a good idea, since there are a bunch of yuv models (BT601, BT709, BT2020 and MPEG/JPG color ranges) and cin5 only supports one of these at a time. There may be more than one type in your session. That makes the ffmpeg yuv->rgb conversion a needed feature.
Next, since your internal buffers are rgb float, the data is converted by BC_Xfer::xfer_rgba8888_to_rgba_float. That requires the use of the floating point unit to operate the conversion and memory transfers. The float instructions are much slower than integer instructions, and performance varies greatly depending on cpu models. The rgb8888 to rgb float step is in there because "you said to" in the session format.
OpenGL can render float or 8bit, and probably at nearly the same speed, but that is not where the time is spent. It is mostly the decode, and the media data prep for the session render format, that is soaking up all of the time. It is true that textures support a wide variety of data/color models, but the demand is for the render setup, not usually for graphics performance. This puts a big constraint on what needs to be programmed, since the result is targeting the software renders (always used as the reference for render), and depending on opengl can produce results that are hard to control. There are a very high number of rendering options for opengl, and every implementation may or may not be exactly identical. So, it may be possible to re-write a bunch of stuff to specialize it for opengl data formats and texture design, but that is a great distance from the current design, which targets the internal data model you specify.
convert_cmodel is only used to convert frames that are from nested edl renders, not composer canvas render drawing.
composer canvas draws are normally Playback3D::write_buffer_sync->Playback3D::draw_output, which may use an opengl fragment shader yuv_to_rgb_frag to convert if the drawn frame is yuv. The shader tables you reference are opengl fragment shaders that are used to convert data for the nested renders. The if (src_cmodel == BC_RGB888 && dst_cmodel == BC_RGB_FLOAT) ... is testing to see if the conversion can be skipped (returns call). The table accesses a shader needed to do a color conversion.
To affect drawing, the screen buffer format is normally defined by the x11 visual chosen during glx probe, and is constrained to rgb8888 by BC_WindowBase::glx_window_fb_configs. With modern video and device formats, this may need to be upgraded very soon, but I can't talk phyllis into buying any new video graphics cards, high depth monitors, or any new tvs to try out any of this, so for the time being, this is what I can actually test, because it is what is here at my house.
The texture formats are also (sadly) always 8bit, see BC_Texture::create_texture. It sets the basic parameters for texture internal formats, which are almost always used as the data design for opengl operations. This also needs to be upgraded, but it would use more graphics memory and may introduce performance issues also. Mesa (software) opengl is used by almost all distros, unless you specify that you want something else... so the internal format choice may widely affect mesa, and therefore the speed of rendering in cin5.
And so in summary, it is true that cin5 may be able to use better opengl configurations and parameters, but usually it is doing what it does for pretty good reasons. The main purpose for opengl in cin5 seems to be to speed up editing, not to produce the best render design.
gg
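The distinction GG draws between the skip test and the shader table can be sketched as two small functions. This is an illustration, not the playback3d.C code: the Cmodel enum is a hypothetical stand-in for the BC_* constants, the table holds only five of the entries listed in the description, and the shader is represented by its name rather than by GLSL source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the BC_* colormodel constants.
enum Cmodel { RGB888, RGBA8888, RGB_FLOAT, RGBA_FLOAT, YUV888, YUVA8888 };

struct ShaderEntry {
    Cmodel src;
    Cmodel dst;
    const char* frag_name;  // name of the fragment shader that does the conversion
};

// A few rows taken from the table quoted in the description.
static const ShaderEntry shader_table[] = {
    { RGB888,     YUV888,     "rgb_to_yuv_frag"  },
    { RGBA8888,   RGB888,     "rgba_to_rgb_frag" },
    { RGBA_FLOAT, RGB_FLOAT,  "rgba_to_rgb_frag" },
    { YUV888,     RGBA_FLOAT, "yuv_to_rgb_frag"  },
    { YUVA8888,   YUV888,     "yuva_to_yuv_frag" },
};

// Skip test: mirrors the pairs in the quoted if() that need no shader pass,
// and only applies when the frame already lives in a texture or on screen.
inline bool can_skip_conversion(Cmodel src, Cmodel dst, bool in_hardware) {
    if (!in_hardware) return false;
    return (src == RGB888     && dst == RGB_FLOAT)  ||
           (src == RGBA8888   && dst == RGBA_FLOAT) ||
           (src == RGB_FLOAT  && dst == RGB888)     ||
           (src == RGBA_FLOAT && dst == RGBA8888)   ||
           (src == RGB888     && dst == RGBA8888)   ||
           (src == YUV888     && dst == YUVA8888)   ||
           (src == RGB_FLOAT  && dst == RGBA_FLOAT);
}

// Table lookup: returns the matching entry, or nullptr when no shader is listed.
inline const ShaderEntry* find_shader(Cmodel src, Cmodel dst) {
    for (std::size_t i = 0; i < sizeof shader_table / sizeof shader_table[0]; ++i)
        if (shader_table[i].src == src && shader_table[i].dst == dst)
            return &shader_table[i];
    return nullptr;
}
```

Seen this way, the two lists are complementary: a pair such as RGBA8888 to RGBA_FLOAT is absent from the table because the skip test handles it first, so that absence by itself does not mean the case is uncovered.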

Andrea_Paz 2019-09-06 07:40 manager ~0002079	I'm sorry I can't help but I thank you for the work you do, which I think is really important for CinGG. It could be the beginning of a color management.

Andrew-R 2019-09-06 02:08 reporter ~0002077	Actually, I tried to hack a bit on Cinelerra, but while my hack seems to work, in that it shows an image in the Compositor window, it doesn't speed things up :/
--------------------
diff --git a/cinelerra-5.1/cinelerra/playback3d.C b/cinelerra-5.1/cinelerra/playback3d.C
index a7f185b..e45edc6 100644
--- a/cinelerra-5.1/cinelerra/playback3d.C
+++ b/cinelerra-5.1/cinelerra/playback3d.C
@@ -1491,11 +1491,14 @@ void Playback3D::convert_cmodel(Canvas *canvas,
     if( (output->get_opengl_state() == VFrame::TEXTURE ||
         output->get_opengl_state() == VFrame::SCREEN) &&
+(
 // OpenGL has no floating point.
+/*
         ( (src_cmodel == BC_RGB888 && dst_cmodel == BC_RGB_FLOAT) ||
         (src_cmodel == BC_RGBA8888 && dst_cmodel == BC_RGBA_FLOAT) ||
         (src_cmodel == BC_RGB_FLOAT && dst_cmodel == BC_RGB888) ||
-        (src_cmodel == BC_RGBA_FLOAT && dst_cmodel == BC_RGBA8888) ||
+        (src_cmodel == BC_RGBA_FLOAT && dst_cmodel == BC_RGBA8888) ||
+*/
 // OpenGL sets alpha to 1 on import
         (src_cmodel == BC_RGB888 && dst_cmodel == BC_RGBA8888) ||
         (src_cmodel == BC_YUV888 && dst_cmodel == BC_YUVA8888) ||
diff --git a/cinelerra-5.1/guicast/bctexture.C b/cinelerra-5.1/guicast/bctexture.C
index 52787e1..cc50454 100644
--- a/cinelerra-5.1/guicast/bctexture.C
+++ b/cinelerra-5.1/guicast/bctexture.C
@@ -124,9 +124,9 @@ void BC_Texture::create_texture(int w, int h, int colormodel)
         glGenTextures(1, (GLuint*)&texture_id);
         glBindTexture(GL_TEXTURE_2D, (GLuint)texture_id);
         glEnable(GL_TEXTURE_2D);
-        int internal_format = texture_components == 4 ? GL_RGBA8 : GL_RGB8 ;
+        int internal_format = texture_components == 4 ? GL_RGBA16F : GL_RGB16F ;
         glTexImage2D(GL_TEXTURE_2D, 0, internal_format, texture_w, texture_h,
-            0, GL_RGBA, GL_UNSIGNED_BYTE, 0);
+            0, GL_RGBA, GL_FLOAT, 0);
         window_id = BC_WindowBase::get_synchronous()->current_window->get_id();
         BC_WindowBase::get_synchronous()->put_texture(texture_id,
             texture_w, texture_h, texture_components);
--------------
ogl3_no_speedup.diff (1,799 bytes)

Date Modified	Username	Field	Change
2019-09-05 23:51	Andrew-R	New Issue
2019-09-06 02:08	Andrew-R	File Added: ogl3_no_speedup.diff
2019-09-06 02:08	Andrew-R	Note Added: 0002077
2019-09-06 07:40	Andrea_Paz	Note Added: 0002079
2019-09-08 01:35	PhyllisSmith	Assigned To	=> PhyllisSmith
2019-09-08 01:35	PhyllisSmith	Status	new => acknowledged
2019-09-08 01:35	PhyllisSmith	Note Added: 0002085
2019-09-08 07:04	Andrew-R	Note Added: 0002086
2019-09-08 08:22	Andrea_Paz	Note Added: 0002087
2019-09-08 10:36	Andrew-R	Note Added: 0002088
2019-09-08 13:10	Andrew-R	File Added: opengl3_patch_very_small_speedup.patch
2019-09-08 13:10	Andrew-R	Note Added: 0002089
2019-09-11 00:49	PhyllisSmith	Note Added: 0002114
2019-09-11 01:25	PhyllisSmith	Note Added: 0002116
2019-09-11 02:17	Andrew-R	Note Added: 0002118
2019-09-11 08:00	Olaf	Note Added: 0002119
2019-09-11 09:52	Andrew-R	Note Added: 0002120
2019-09-11 11:47	Olaf	Note Added: 0002121
2020-03-16 16:20	Andrew-R	Note Added: 0002905
2020-09-04 20:04	Andrew-R	Note Added: 0003989
