summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-06-07radv: enable HTILE for images that might need variable sample locationsSamuel Pitoiset1-7/+0
This is now supported. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07radv: handle sample locations during automatic layout transitionsSamuel Pitoiset2-18/+168
From the Vulkan spec 1.1.109: "Some implementations may need to evaluate depth image values while performing image layout transitions. To accommodate this, instances of the VkSampleLocationsInfoEXT structure can be specified for each situation where an explicit or automatic layout transition has to take place. [...] and VkRenderPassSampleLocationsBeginInfoEXT can be chained from VkRenderPassBeginInfo to provide sample locations for layout transitions performed implicitly by a render pass instance." Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07radv: determine the first subpass id for every attachmentsSamuel Pitoiset2-1/+20
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07radv: handle sample locations during explicit depth/stencil transitionsSamuel Pitoiset1-7/+28
From the Vulkan spec 1.1.109, "Some implementations may need to evaluate depth image values while performing image layout transitions. To accommodate this, instances of the VkSampleLocationsInfoEXT structure can be specified for each situation where an explicit or automatic layout transition has to take place. VkSampleLocationsInfoEXT can be chained from VkImageMemoryBarrier structures to provide sample locations for layout transitions performed by vkCmdWaitEvents and vkCmdPipelineBarrier calls." This handles explicit depth/stencil layout transitions performed with CmdWaitEvents() or CmdPipelineBarrier(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07radv: allow the depth decompress pass to emit dynamic sample locationsSamuel Pitoiset3-7/+31
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07radv: allow to set dynamic sample locations to the depth decompress passSamuel Pitoiset1-1/+8
If VK_EXT_sample_locations is used, the driver might need to emit the sample locations specified during layout transitions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07radv: allow to save/restore sample locations during meta operationsSamuel Pitoiset2-0/+14
This will be used for the depth decompress pass that might need to emit variable sample locations during layout transitions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07iris: Sweep the NIR in iris_create_uncompiled_shader().Kenneth Graunke1-0/+2
We run a ton of backend specific passes here (mostly brw_preprocess_nir) and ought to sweep up any unused memory at this point, since we're going to hang on to this NIR for as long as the linked program lives.
2019-06-07ir3: Use the new NIR lowering pass for integer multiplicationEduardo Lima Mitev2-17/+16
Shader-db stats courtesy of Eric Anholt: total instructions in shared programs: 6480215 -> 6475457 (-0.07%) instructions in affected programs: 662105 -> 657347 (-0.72%) helped: 1209 HURT: 13 total constlen in shared programs: 1432704 -> 1427769 (-0.34%) constlen in affected programs: 100063 -> 95128 (-4.93%) helped: 512 HURT: 0 total max_sun in shared programs: 875561 -> 873387 (-0.25%) max_sun in affected programs: 46179 -> 44005 (-4.71%) helped: 1087 HURT: 0 Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07ir3/nir: Add new NIR AlgebraicPass for lowering imulEduardo Lima Mitev3-1/+64
Currently, ir3 backend compiler is lowering integer multiplication from: dst = a * b to: dst = (al * bl) + (ah * bl << 16) + (al * bh << 16) by emitting this code: mull.u tmp0, a, b ; mul low, i.e. al * bl madsh.m16 tmp1, a, b, tmp0 ; mul-add shift high mix, i.e. ah * bl << 16 madsh.m16 dst, b, a, tmp1 ; i.e. al * bh << 16 which at that point has very low chances of being optimized. This patch adds a new nir_algebraic.AlgebraicPass to performs this lowering during NIR algebraic optimization passes, giving it a better chance for optimizing the resulting code. Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16Eduardo Lima Mitev2-0/+55
For umul_low (al * bl), zero is returned if the low 16-bits word of either source is zero. for imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl' is zero. A couple of nir_search_helpers are added: is_upper_half_zero() returns true if the highest word of all components of an integer NIR alu src are zero. is_lower_half_zero() returns true if the lowest word of all components of an integer nir alu src are zero. Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'Eduardo Lima Mitev1-0/+6
They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively. Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodesEduardo Lima Mitev1-1/+14
'umul_low' is the low 32-bits of unsigned integer multiply. It maps directly to ir3's MULL_U. 'imadsh_mix16' is multiply add with shift and mix, an ir3 specific instruction that maps directly to ir3's IMADSH_M16. Both are necessary for the lowering of integer multiplication on Freedreno, which will be introduced later in this series. Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07v3d: don't emit point coordinates varyings if the FS doesn't read themIago Toral Quiroga4-5/+27
We still need to emit them in V3D 3.x since there there is no mechanism to disable them. Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07v3d: add a helper to track variables that need point coordinatesIago Toral Quiroga1-5/+10
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06egl/x11: calloc dri2_surf so it's properly zeroedKenneth Graunke1-1/+1
Commit 2282ec0a refactored drawable creation across various platforms into a new dri2_create_drawable helper function. The GBM code in platform_drm.c code passed in dri2_surf->gbm_surf as the loaderPrivate, while most other backends passed in dri2_surf directly. To try and handle this, the patch checked if dri2_surf->gbm_surf was non-NULL, and if so, presumed that the caller is the DRM platform and we should use the dri2_surf->gbm_surf pointer. This worked for most platforms, which calloc their dri2_surf structure, zeroing the data. Unfortunately, platform_x11.c used malloc, leaving most of the dri2_surf as garbage. In particular, dri2_surf->gbm_surf was often non-NULL, causing dri2_create_drawable to try and use it, passing a garbage pointer to the createNewDrawable hook, usually leading to a SIGBUS or SIGSEGV when trying to dereference that bad pointer. Since most callers calloc the data, make platform_x11.c follow suit. Fixes crashes with i915_dri.so when running dEQP-GLES2. Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-06tests/graw: use C99 print conversion specifier for 32 bit buildsMark Janes1-1/+2
Fixes formatting errors for 32 bit compilations, eg: error: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat] printf("result1 = %lu result2 = %lu\n", res1.u64, res2.u64); Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06panfrost/midgard: Fix crash with unused SSA valuesAlyssa Rosenzweig1-0/+4
Crash introduced in "b38dab101ca7e0896255dccbd85fd510c47d84d1" but not adding a Fixes tag since it's our bug anyway. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-06panfrost: Report sRGB colorspace as not supportedBoris Brezillon1-0/+4
The driver does not support sRGB yet, so let's report it as unsupported. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-06docs: do not use div for line-breakingErik Faye-Lund1-2/+2
HTML has the <p>-tag for this purpose. It adds some margins, but that just makes this read better, IMO. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06docs: fixup code-tag positioningErik Faye-Lund1-1/+1
This reads better if we include the asterisk in the code-block, as it's part of the function-reference, even though it's not technically speaking code. But as the <code>-tag isn't purely for code, this should be fine. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06docs: add missing code-tagsErik Faye-Lund3-26/+30
Looks like I missed a few cases when I recently added more code-tags here. So let's add these cases as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06docs: add accidentally dropped "at"Erik Faye-Lund1-3/+3
When rewriting 20c56e18c21 after review, I accidentally dropped the "at" here. Sorry for that, and let's fix it up! Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: 20c56e18c21 ("docs: use proper links instead of code-tags") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-opGurchetan Singh1-0/+5
AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation defined flexible YUV format.  Most of the times, it's NV12 or YV12. On Intel, NV12 is preferred since it can be used by the display engine.   This API adds a dependency between gralloc and buffer consumers, unfortunately. Right now, the code seems to work for i915 gralloc, but not cros_gralloc. Add a preprocessor flag to fix this. TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-06ac/nir: Remove stale TODOConnor Abbott1-1/+7
While we're here, copy the comment explaining this from radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-06radeonsi: Don't force dcc disable for loadsConnor Abbott2-13/+0
When e9d935ed0e2 added force_dcc_off(), we forced it off for any preloaded image descriptor which had stores associated with them, since the same preloaded descriptors were used for loads and stores. However, when the preloading was removed in 16be87c9042, the existing logic was kept despite it not being necessary anymore. The comment above force_dcc_off() only mentions stores, so only force DCC off for stores. Cc: Nicolai Hähnle <nicolai.haehnle@amd.com> Cc: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-06mesa/main: Expose EXT_clip_control and related enums and the functionGert Wollny4-2/+26
Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-06mapi/glapi/registry: Update gl.xml to latest upstream versionGert Wollny1-1141/+3218
The old copy didn't include EXT_clip_control, so update it. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-06virgl: Enable CAP_CLIP_HALFZ if host supports itGert Wollny2-1/+3
On according hosts this enables the piglits as "pass": arb_clip_control-* v2: sync flag with host Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1) Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-06svga: Remove unnecessary check for the pre flush bit for setting vertex buffersCharmaine Lee1-4/+4
This fixes the missing rebind when the can_pre_flush bit is not set and the vertex buffers are the same as what have been sent. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Neha Bhende <bhenden@vmware.com> Signed-off-by: Charmaine Lee <charmainel@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2019-06-06winsys/svga/drm: Fix 32-bit RPCI send messageDeepak Rawat1-12/+23
Depending on whether compiled with frame-pointer or not, the temporary memory location used for the bp parameter in these macros are referenced relative to the stack pointer or the frame pointer. Hence we can never reference that parameter when we've modified either the stack pointer or the frame pointer, because then the compiler would generate an incorrect stack reference. Fix this by pushing the temporary memory parameter on a known location on the stack before modifying the stack- and frame pointers. Also in case of failuire RPCI channel is not closed which lead to vmx running out of channels. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2019-06-06radv: set the subpass before any initial subpass transitionsSamuel Pitoiset1-2/+3
This might fix initial subpass transitions when multiview is used. Noticed while implementing sample locations during layout transitions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-06anv: Fix check for isl_fmt in assertNataraj Deshpande1-1/+1
Checking isl_fmt returned value in assert seems appropriate instead of format variable. Fixes: f1654fa7e31 "anv/android: support creating images from external format" Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-06-06v3d: fix scheduling dependency tracking for ALU with small immediatesIago Toral Quiroga1-1/+4
We were not accountint for small immediates in the B mux so the scheduler was interpreting these are regular register file accesses, which could lead to additional (incorrect) write-read dependencies. Shader-db changes: total instructions in shared programs: 9163664 -> 9137263 (-0.29%) instructions in affected programs: 3931035 -> 3904634 (-0.67%) helped: 12457 HURT: 2563 total max-temps in shared programs: 1325787 -> 1325597 (-0.01%) max-temps in affected programs: 5746 -> 5556 (-3.31%) helped: 186 HURT: 16 helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1 helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88% 95% mean confidence interval for max-temps value: -1.04 -0.84 95% mean confidence interval for max-temps %-change: -4.16% -3.07% Max-temps are helped. Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06lima/ppir: add missing handling of min/max ops for vec4 add slotVasily Khoruzhick1-0/+6
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-06lima/ppir: fix crash when program uses no registers at allVasily Khoruzhick1-0/+4
Program may need no regalloc at all, e.g. in case when program consists of single discard op. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-06util/hash_table: Assert that keys are not reserved pointersJason Ekstrand1-1/+9
If we insert a NULL key, it will appear to succeed but will mess up entry counting. Similar errors can occur if someone accidentally inserts the deleted key. The later is highly unlikely but technically possible so we should guard against it too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06util/set: Assert that keys are not reserved pointersJason Ekstrand1-0/+10
If we insert a NULL key, it will appear to succeed but will mess up entry counting. Similar errors can occur if someone accidentally inserts the deleted key. The later is highly unlikely but technically possible so we should guard against it too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06glsl/loop_analysis: Don't search for NULL variables in the hash tableJason Ekstrand1-0/+3
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-06nir/propagate_invariant: Don't add NULL vars to the hash tableJason Ekstrand1-1/+10
Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-05intel/compiler: Treat b32csel as potentially producing a Boolean result for ↵Ian Romanick1-2/+4
resolve analysis If the 2nd and 3rd source are both Boolean values, we can potentially avoid a resolve by only resolving the result of the b32csel. No changes on any Gen6+ Intel platform. v2: Use ?: instead of cast from bool to unsigned. Suggested by Caio. Iron Lake total instructions in shared programs: 8142729 -> 8142677 (<.01%) instructions in affected programs: 12890 -> 12838 (-0.40%) helped: 26 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.39% Instructions are helped. total cycles in shared programs: 188549632 -> 188549394 (<.01%) cycles in affected programs: 60754 -> 60516 (-0.39%) helped: 25 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8 helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for cycles value: -12.91 -5.40 95% mean confidence interval for cycles %-change: -0.84% -0.23% Cycles are helped. GM45 total instructions in shared programs: 5013119 -> 5013093 (<.01%) instructions in affected programs: 6764 -> 6738 (-0.38%) helped: 13 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.34% Instructions are helped. total cycles in shared programs: 128977804 -> 128977700 (<.01%) cycles in affected programs: 37738 -> 37634 (-0.28%) helped: 13 HURT: 0 helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -8.00 -8.00 95% mean confidence interval for cycles %-change: -0.36% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05intel/fs: Improve discard_if code generationIan Romanick1-3/+43
Previously we would blindly emit an sequence like: mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */ (+f0.1) cmp.z.f0.1(16) null<1>D g7<8,8,1>D 0D The first move sets the flags based on the initial execution mask. Later discard sequences contain a predicated compare that can only remove more SIMD channels. Often times the only user of the result from the first compare is the second compare. Instead, generate a sequence like mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */ (+f0.1) cmp.ge.f0.1(8) null<1>F g5<8,8,1>F 0x41700000F /* 15F */ If the results stored in g7 and f0.0 are not used, the comparison will be eliminated. This removes an instruction and potentially reduces register pressure. v2: Major re-write of the commit message (including fixing the assembly code). Suggested by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224434 -> 17198659 (-0.15%) instructions in affected programs: 2908125 -> 2882350 (-0.89%) helped: 18891 HURT: 5 helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02% HURT stats (abs) min: 9 max: 105 x̄: 51.40 x̃: 35 HURT stats (rel) min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56% 95% mean confidence interval for instructions value: -1.39 -1.34 95% mean confidence interval for instructions %-change: -1.79% -1.73% Instructions are helped. total cycles in shared programs: 361468458 -> 361170679 (-0.08%) cycles in affected programs: 38470116 -> 38172337 (-0.77%) helped: 16202 HURT: 1456 helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18 helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18% HURT stats (abs) min: 1 max: 5982 x̄: 87.51 x̃: 28 HURT stats (rel) min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64% 95% mean confidence interval for cycles value: -18.24 -15.49 95% mean confidence interval for cycles %-change: -2.26% -2.14% Cycles are helped. total spills in shared programs: 12147 -> 12176 (0.24%) spills in affected programs: 175 -> 204 (16.57%) helped: 8 HURT: 5 total fills in shared programs: 25262 -> 25292 (0.12%) fills in affected programs: 269 -> 299 (11.15%) helped: 8 HURT: 5 Haswell total instructions in shared programs: 13530316 -> 13502647 (-0.20%) instructions in affected programs: 2507824 -> 2480155 (-1.10%) helped: 18859 HURT: 10 helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1 helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41% HURT stats (abs) min: 5 max: 39 x̄: 25.70 x̃: 31 HURT stats (rel) min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31% 95% mean confidence interval for instructions value: -1.49 -1.44 95% mean confidence interval for instructions %-change: -2.42% -2.34% Instructions are helped. total cycles in shared programs: 377865412 -> 377639034 (-0.06%) cycles in affected programs: 40169572 -> 39943194 (-0.56%) helped: 15550 HURT: 1938 helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18 helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25% HURT stats (abs) min: 1 max: 4862 x̄: 89.17 x̃: 35 HURT stats (rel) min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75% 95% mean confidence interval for cycles value: -14.42 -11.47 95% mean confidence interval for cycles %-change: -2.05% -1.91% Cycles are helped. total spills in shared programs: 26769 -> 26814 (0.17%) spills in affected programs: 826 -> 871 (5.45%) helped: 9 HURT: 10 total fills in shared programs: 38383 -> 38425 (0.11%) fills in affected programs: 834 -> 876 (5.04%) helped: 9 HURT: 10 LOST: 5 GAINED: 10 Ivy Bridge total instructions in shared programs: 12079250 -> 12044139 (-0.29%) instructions in affected programs: 2409680 -> 2374569 (-1.46%) helped: 16135 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2 helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68% 95% mean confidence interval for instructions value: -2.21 -2.14 95% mean confidence interval for instructions %-change: -2.76% -2.67% Instructions are helped. total cycles in shared programs: 180116747 -> 179900405 (-0.12%) cycles in affected programs: 25439823 -> 25223481 (-0.85%) helped: 13817 HURT: 1499 helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18 helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97% HURT stats (abs) min: 1 max: 3684 x̄: 98.99 x̃: 52 HURT stats (rel) min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42% 95% mean confidence interval for cycles value: -15.68 -12.57 95% mean confidence interval for cycles %-change: -1.77% -1.63% Cycles are helped. LOST: 8 GAINED: 10 Sandy Bridge total instructions in shared programs: 10878990 -> 10863659 (-0.14%) instructions in affected programs: 1806702 -> 1791371 (-0.85%) helped: 13023 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1 helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10% 95% mean confidence interval for instructions value: -1.18 -1.17 95% mean confidence interval for instructions %-change: -1.68% -1.62% Instructions are helped. total cycles in shared programs: 154082878 -> 153862810 (-0.14%) cycles in affected programs: 20199374 -> 19979306 (-1.09%) helped: 12048 HURT: 510 helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18 helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52% HURT stats (abs) min: 1 max: 448 x̄: 54.39 x̃: 16 HURT stats (rel) min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17% 95% mean confidence interval for cycles value: -17.97 -17.08 95% mean confidence interval for cycles %-change: -1.84% -1.75% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 8155075 -> 8142729 (-0.15%) instructions in affected programs: 949495 -> 937149 (-1.30%) helped: 5810 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.59% -2.48% Instructions are helped. total cycles in shared programs: 188584610 -> 188549632 (-0.02%) cycles in affected programs: 17274446 -> 17239468 (-0.20%) helped: 3881 HURT: 90 helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6 helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30% HURT stats (abs) min: 2 max: 10 x̄: 2.80 x̃: 2 HURT stats (rel) min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07% 95% mean confidence interval for cycles value: -9.35 -8.27 95% mean confidence interval for cycles %-change: -0.85% -0.77% Cycles are helped. GM45 total instructions in shared programs: 5019308 -> 5013119 (-0.12%) instructions in affected programs: 489028 -> 482839 (-1.27%) helped: 2912 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.54% -2.39% Instructions are helped. total cycles in shared programs: 129002592 -> 128977804 (-0.02%) cycles in affected programs: 12669152 -> 12644364 (-0.20%) helped: 2759 HURT: 37 helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4 helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31% HURT stats (abs) min: 2 max: 10 x̄: 3.62 x̃: 4 HURT stats (rel) min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04% 95% mean confidence interval for cycles value: -9.53 -8.20 95% mean confidence interval for cycles %-change: -0.79% -0.70% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05intel/fs: Add need_dest parameter to fs_visitor::nir_emit_aluIan Romanick2-4/+7
This is the same as the need_dest parameter to prepare_alu_destination_and_sources. This allows us to not change the register that is expected to hold an result if an instruction is re-emitted. This is particularly a problem if the re-emitted instruction is a partial write. A later patch will use this feature. No shader-db changes on any Intel platform. v2: Don't do the Boolean resolve when there is no destination. If the ALU instruction didn't write a register, there's nothing to resolve. This replaces an earlier patch "intel/fs: Allocate dummy destination register when need_dest is false". Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05intel/fs: Allow cmod propagation across reads and writes of different flagsIan Romanick2-6/+272
This also helps a later patch (intel/fs: Improve discard_if code generation) on about 200 shaders. v2: Document that other instruction sequences are also valid in subtract_merge_with_compare_intervening_mismatch_flag_write. Suggested by Caio. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224438 -> 17224434 (<.01%) instructions in affected programs: 296 -> 292 (-1.35%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.04% -0.81% Instructions are helped. total cycles in shared programs: 361468455 -> 361468458 (<.01%) cycles in affected programs: 2862 -> 2865 (0.10%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31% HURT stats (abs) min: 3 max: 4 x̄: 3.50 x̃: 3 HURT stats (rel) min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51% 95% mean confidence interval for cycles value: -4.34 5.84 95% mean confidence interval for cycles %-change: -0.70% 0.90% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05intel/fs: Fix flag_subreg handling in cmod propagationIan Romanick2-0/+196
There were two errors. First, the pass could propagate conditional modifiers from an instruction that writes on flag register to an instruction that writes a different flag register. For example, cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F could be come cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F Second, if an instruction writes f0.1 has it's condition propagated, the modified instruction will incorrectly write flag f0.0. For example, linterp(16) vgrf6:F, g2:F, attr0:F cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F (-f0.1) discard_jump(16) (null):UD could become linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F (-f0.1) discard_jump(16) (null):UD None of these cases will occur currently. The only time we use f0.1 is for generating discard intrinsics. In all those cases, we generate a squence like: cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d (-f0.1) discard_jump(16) (null):UD Due to the mixed types and incompatible conditions, this sequence would never see any cmod propagation. The next patch will change this. No shader-db changes on any Intel platform. v2: Fix typo in comment in test case subtract_delete_compare_other_flag. Noticed by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05intel/fs: Add missing tests for cmod_propagate_notIan Romanick1-0/+278
Tests like this should have been added in 4467040cb65 ("i965/fs: Propagate conditional modifiers from not instructions"). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05i965: Allow signed/unsigned integer conversions in miptree up/downloadKenneth Graunke1-24/+0
BLORP now handles this so there's no reason to fall back. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-05intel/blorp: Handle SINT/UINT clamping on blits.Kenneth Graunke2-0/+38
This patch makes blorp_blit handle SINT<->UINT blit value clamping. After reading the source's integer data (which is expanded to 32-bit), we either IMAX with 0 (for SINT -> UINT, to clamp negative numbers) or UMIN with (1 << 31) - 1 (for UINT -> SINT, to clamp positive numbers outside of the representable range). Such blits are not allowed by the OpenGL or Vulkan APIs directly: The Vulkan 1.1 spec for vkCmdBlitImage says: "Integer formats can only be converted to other integer formats with the same signedness." The GL 4.5 spec for glBlitFramebuffer says: "An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions: [...] * The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values. * The read buffer contains signed integer values and any draw buffer does not contain signed integer values." However, they are useful for other operations, such as texture upload and download, which typically are implemented via blorp_blit(). i965 has code to fall back in this case (which the next commit will delete), and Gallium expects blit() to handle this case for texture upload. Fixes the following tests on iris: - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-05anv/pipeline: Move lowering of nir_var_mem_global laterCaio Marcelo de Oliveira Filho1-3/+3
This let deref optimizations apply to globals before lowering them. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-05st/nir: Don't use GLSL IR's MOD_TO_FLOOR lowering when using NIR.Kenneth Graunke1-1/+1
Both GLSL IR and NIR perform the same mod -> floor lowering for 32-bit types. But nir_lower_double_ops is slightly more defensive against lowered drcp precision loss, and handles mod(x, x) = 0 directly. This works well...assuming nir_lower_double_ops actually gets an fmod op to lower in the first place. The previous patches enabled NIR-based lowering for the remaining drivers, so we can stop using the GLSL IR lowering when using NIR. Fixes KHR-GL45.gpu_shader_fp64.builtin.mod_dvec[234] on iris. Reviewed-by: Marek Olšák <marek.olsak@amd.com>