Commit graph

36 commits

Author SHA1 Message Date
Martin Möhrmann
9d59b42d1f internal/cpu: remove option to mark cpu features required
With the removal of SSE2 runtime detection made in
golang.org/cl/344350 we can remove this mechanism as there
are no required features anymore.

For making sure CPUs running a go program support all
the minimal hardware requirements the go runtime should
do feature checks early in the runtime initialization
before it is likely any compiler emitted but unsupported
instructions are used. This is already the case for e.g.
checking MMX support on 386 arch targets.

Change-Id: If7b1cb6f43233841e917d37a18314d06a334a734
Reviewed-on: https://go-review.googlesource.com/c/go/+/354209
Trust: Martin Möhrmann <martin@golang.org>
Run-TryBot: Martin Möhrmann <martin@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2021-10-06 18:44:56 +00:00
Martin Möhrmann
ce914eab65 all: replace runtime SSE2 detection with GO386 setting
When GO386=sse2 we can assume sse2 to be present without
a runtime check. If GO386=softfloat is set we can avoid
the usage of SSE2 even if detected.

This might cause a memcpy, memclr and bytealg slowdown of Go
binaries compiled with softfloat on machines that support
SSE2. Such setups are rare and should use GO386=sse2 instead
if performance matters.

On targets that support SSE2 we avoid the runtime overhead of
dynamic cpu feature dispatch.

The removal of runtime sse2 checks also allows to simplify
internal/cpu further by removing handling of the required
feature option as a followup after this CL.

Change-Id: I90a853a8853a405cb665497c6d1a86556947ba17
Reviewed-on: https://go-review.googlesource.com/c/go/+/344350
Trust: Martin Möhrmann <martin@golang.org>
Run-TryBot: Martin Möhrmann <martin@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-08-23 21:22:58 +00:00
Martin Möhrmann
56b6d30531 runtime: use RDTSCP for instruction stream serialized read of TSC
To measure all instructions having been completed before reading
the time stamp counter with RDTSC an instruction sequence that
has instruction stream serializing properties which guarantee
waiting until all previous instructions have been executed is
needed. This does not necessary mean to wait for all stores to
be globally visible.

This CL aims to remove vendor specific logic for determining the
instruction sequence with CPU feature flag checks that are
CPU vendor independent.

For intel LFENCE has the wanted properties at least
since it was introduced together with SSE2 support.

On AMD instruction stream serializing LFENCE is supported by setting
an MSR C001_1029[1]=1 on AMD family 10h/12h/14h/15h/16h/17h processors.
AMD family 0Fh/11h processors support LFENCE as serializing always.
AMD plans support for this MSR and access to this bit for all future processors.
Source: https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf

Reading the MSR to determine LFENCE properties is not always possible
or reliable (hypervisors). The Linux kernel is relying on serializing
LFENCE on AMD CPUs since a commit in July 2019: https://lkml.org/lkml/2019/7/22/295
and the MSR C001_1029 to enable serialization has been set by default
with the Spectre v1 mitigations.

Using an MFENCE on AMD is waiting on previous instructions having been executed
but in addition also flushes store buffers.

To align the serialization properties without runtime detection
of CPU manufacturers we can use the newer RDTSCP instruction which
waits until all previous instructions have been executed.

RDTSCP is available on Intel since around 2008 and on AMD CPUs since
around 2006. Support for RDTSCP can be checked independently
of manufacturer by checking CPUID bits.

Using RDTSCP is the default in Linux to read TSC in program order
when the instruction is available.
e22ce8eb63/arch/x86/include/asm/msr.h (L231)

Change-Id: Ifa841843b9abb2816f8f0754a163ebf01385306d
Reviewed-on: https://go-review.googlesource.com/c/go/+/344429
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Martin Möhrmann <martin@golang.org>
Run-TryBot: Martin Möhrmann <martin@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-23 20:32:04 +00:00
Martin Möhrmann
703a7414bd internal/cpu: fix and cleanup ARM64 cpu feature fields and options
Remove all cpu features from the ARM64 struct that are not initialized
to reduce cache lines used and to avoid those features being
accidentially used without actual detection if they are present.

Add missing option to mask the CPUID feature.

Change-Id: I94bf90c0655de1af2218ac72117ac6c52adfc289
Reviewed-on: https://go-review.googlesource.com/c/go/+/267658
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Trust: Martin Möhrmann <moehrmann@google.com>
2020-11-05 10:46:08 +00:00
Jonathan Swinney
776987d857 runtime: improve memmove performance on arm64
Replace the memmove implementation for moves of 17 bytes or larger
with an implementation from ARM optimized software. The moves of 16
bytes or fewer are unchanged, but the registers used are updated to
match the rest of the implementation.

This implementation makes use of new optimizations:
 - software pipelined loop for large (>128 byte) moves
 - medium size moves (17..128 bytes) have a new implementation
 - address realignment when src or dst is unaligned
 - preference for aligned src (loads) or dst (stores) depending on CPU

To support preference for aligned loads or aligned stores, a new CPU
flag is added. This flag indicates that the detected micro
architecture performs better with aligned loads. Some tested CPUs did
not exhibit a significant difference and are left with the default
behavior of realigning based on the destination address (stores).

Neoverse N1 (Tested on Graviton 2)
name                               old time/op    new time/op     delta
Memmove/0-4                          1.88ns ± 1%     1.87ns ± 1%   -0.58%  (p=0.020 n=10+10)
Memmove/1-4                          4.40ns ± 0%     4.40ns ± 0%     ~     (all equal)
Memmove/8-4                          3.88ns ± 3%     3.80ns ± 0%   -1.97%  (p=0.001 n=10+9)
Memmove/16-4                         3.90ns ± 3%     3.80ns ± 0%   -2.49%  (p=0.000 n=10+9)
Memmove/32-4                         4.80ns ± 0%     4.40ns ± 0%   -8.33%  (p=0.000 n=9+8)
Memmove/64-4                         5.86ns ± 0%     5.00ns ± 0%  -14.76%  (p=0.000 n=8+8)
Memmove/128-4                        8.46ns ± 0%     8.06ns ± 0%   -4.62%  (p=0.000 n=10+10)
Memmove/256-4                        12.4ns ± 0%     12.2ns ± 0%   -1.61%  (p=0.000 n=10+10)
Memmove/512-4                        19.5ns ± 0%     19.1ns ± 0%   -2.05%  (p=0.000 n=10+10)
Memmove/1024-4                       33.7ns ± 0%     33.5ns ± 0%   -0.59%  (p=0.000 n=10+10)
Memmove/2048-4                       62.1ns ± 0%     59.0ns ± 0%   -4.99%  (p=0.000 n=10+10)
Memmove/4096-4                        117ns ± 1%      110ns ± 0%   -5.66%  (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4             6.41ns ± 0%     5.62ns ± 0%  -12.32%  (p=0.000 n=10+7)
MemmoveUnalignedDst/128-4            9.40ns ± 0%     8.34ns ± 0%  -11.24%  (p=0.000 n=10+10)
MemmoveUnalignedDst/256-4            12.8ns ± 0%     12.8ns ± 0%     ~     (all equal)
MemmoveUnalignedDst/512-4            20.4ns ± 0%     19.7ns ± 0%   -3.43%  (p=0.000 n=9+10)
MemmoveUnalignedDst/1024-4           34.1ns ± 0%     35.1ns ± 0%   +2.93%  (p=0.000 n=9+9)
MemmoveUnalignedDst/2048-4           61.5ns ± 0%     60.4ns ± 0%   -1.77%  (p=0.000 n=10+10)
MemmoveUnalignedDst/4096-4            122ns ± 0%      113ns ± 0%   -7.38%  (p=0.002 n=8+10)
MemmoveUnalignedSrc/64-4             7.25ns ± 1%     6.26ns ± 0%  -13.64%  (p=0.000 n=9+9)
MemmoveUnalignedSrc/128-4            10.5ns ± 0%      9.7ns ± 0%   -7.52%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4            17.1ns ± 0%     17.3ns ± 0%   +1.17%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-4            27.0ns ± 0%     27.0ns ± 0%     ~     (all equal)
MemmoveUnalignedSrc/1024-4           46.7ns ± 0%     35.7ns ± 0%  -23.55%  (p=0.000 n=10+9)
MemmoveUnalignedSrc/2048-4           85.2ns ± 0%     61.2ns ± 0%  -28.17%  (p=0.000 n=10+8)
MemmoveUnalignedSrc/4096-4            162ns ± 0%      113ns ± 0%  -30.25%  (p=0.000 n=10+10)

name                               old speed      new speed       delta
Memmove/4096-4                     35.2GB/s ± 0%   37.1GB/s ± 0%   +5.56%  (p=0.000 n=10+9)
MemmoveUnalignedSrc/1024-4         21.9GB/s ± 0%   28.7GB/s ± 0%  +30.90%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/2048-4         24.0GB/s ± 0%   33.5GB/s ± 0%  +39.18%  (p=0.000 n=10+9)
MemmoveUnalignedSrc/4096-4         25.3GB/s ± 0%   36.2GB/s ± 0%  +43.50%  (p=0.000 n=10+7)

Cortex-A72 (Graviton 1)
name                               old time/op    new time/op    delta
Memmove/0-4                          3.06ns ± 3%    3.08ns ± 1%     ~     (p=0.958 n=10+9)
Memmove/1-4                          8.72ns ± 0%    7.85ns ± 0%   -9.98%  (p=0.002 n=8+10)
Memmove/8-4                          8.29ns ± 0%    8.29ns ± 0%     ~     (all equal)
Memmove/16-4                         8.29ns ± 0%    8.29ns ± 0%     ~     (all equal)
Memmove/32-4                         8.19ns ± 2%    8.29ns ± 0%     ~     (p=0.114 n=10+10)
Memmove/64-4                         18.3ns ± 4%    10.0ns ± 0%  -45.36%  (p=0.000 n=10+10)
Memmove/128-4                        14.8ns ± 0%    17.4ns ± 0%  +17.77%  (p=0.000 n=10+10)
Memmove/256-4                        21.8ns ± 0%    23.1ns ± 0%   +5.96%  (p=0.000 n=10+10)
Memmove/512-4                        35.8ns ± 0%    37.2ns ± 0%   +3.91%  (p=0.000 n=10+10)
Memmove/1024-4                       63.7ns ± 0%    67.2ns ± 0%   +5.49%  (p=0.000 n=10+10)
Memmove/2048-4                        126ns ± 0%     123ns ± 0%   -2.38%  (p=0.000 n=10+10)
Memmove/4096-4                        238ns ± 1%     243ns ± 1%   +1.93%  (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4             19.3ns ± 1%    12.0ns ± 1%  -37.49%  (p=0.000 n=10+10)
MemmoveUnalignedDst/128-4            17.2ns ± 0%    17.4ns ± 0%   +1.16%  (p=0.000 n=10+10)
MemmoveUnalignedDst/256-4            28.2ns ± 8%    29.2ns ± 0%     ~     (p=0.352 n=10+10)
MemmoveUnalignedDst/512-4            49.8ns ± 3%    48.9ns ± 0%     ~     (p=1.000 n=10+10)
MemmoveUnalignedDst/1024-4           89.5ns ± 0%    80.5ns ± 1%  -10.02%  (p=0.000 n=10+10)
MemmoveUnalignedDst/2048-4            180ns ± 0%     127ns ± 0%  -29.44%  (p=0.000 n=9+10)
MemmoveUnalignedDst/4096-4            347ns ± 0%     244ns ± 0%  -29.59%  (p=0.000 n=10+9)
MemmoveUnalignedSrc/128-4            16.1ns ± 0%    21.8ns ± 0%  +35.40%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4            24.9ns ± 8%    26.6ns ± 0%   +6.70%  (p=0.015 n=10+10)
MemmoveUnalignedSrc/512-4            39.4ns ± 6%    40.6ns ± 0%     ~     (p=0.352 n=10+10)
MemmoveUnalignedSrc/1024-4           72.5ns ± 0%    83.0ns ± 1%  +14.44%  (p=0.000 n=9+10)
MemmoveUnalignedSrc/2048-4            129ns ± 1%     128ns ± 1%     ~     (p=0.179 n=10+10)
MemmoveUnalignedSrc/4096-4            241ns ± 0%     253ns ± 1%   +4.99%  (p=0.000 n=9+9)

Cortex-A53 (Raspberry Pi 3)
name                               old time/op    new time/op    delta
Memmove/0-4                          11.0ns ± 0%    11.0ns ± 1%     ~     (p=0.294 n=8+10)
Memmove/1-4                          29.6ns ± 0%    28.0ns ± 1%   -5.41%  (p=0.000 n=9+10)
Memmove/8-4                          23.5ns ± 0%    22.1ns ± 0%   -6.11%  (p=0.000 n=8+8)
Memmove/16-4                         23.7ns ± 1%    22.1ns ± 0%   -6.59%  (p=0.000 n=10+8)
Memmove/32-4                         27.9ns ± 0%    27.1ns ± 0%   -3.13%  (p=0.000 n=8+8)
Memmove/64-4                         33.8ns ± 0%    31.5ns ± 1%   -6.99%  (p=0.000 n=8+10)
Memmove/128-4                        45.6ns ± 0%    44.2ns ± 1%   -3.23%  (p=0.000 n=9+10)
Memmove/256-4                        69.3ns ± 0%    69.3ns ± 0%     ~     (p=0.072 n=8+8)
Memmove/512-4                         127ns ± 0%     110ns ± 0%  -13.39%  (p=0.000 n=8+8)
Memmove/1024-4                        222ns ± 0%     205ns ± 1%   -7.66%  (p=0.000 n=7+10)
Memmove/2048-4                        411ns ± 0%     366ns ± 0%  -10.98%  (p=0.000 n=8+9)
Memmove/4096-4                        795ns ± 1%     695ns ± 1%  -12.63%  (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4             44.0ns ± 0%    40.5ns ± 0%   -7.93%  (p=0.000 n=8+8)
MemmoveUnalignedDst/128-4            59.6ns ± 0%    54.9ns ± 0%   -7.85%  (p=0.000 n=9+9)
MemmoveUnalignedDst/256-4            98.2ns ±11%    90.0ns ± 1%     ~     (p=0.130 n=10+10)
MemmoveUnalignedDst/512-4             161ns ± 2%     145ns ± 1%   -9.96%  (p=0.000 n=10+10)
MemmoveUnalignedDst/1024-4            281ns ± 0%     265ns ± 0%   -5.65%  (p=0.000 n=9+8)
MemmoveUnalignedDst/2048-4            528ns ± 0%     482ns ± 0%   -8.73%  (p=0.000 n=8+9)
MemmoveUnalignedDst/4096-4           1.02µs ± 1%    0.92µs ± 0%  -10.00%  (p=0.000 n=10+8)
MemmoveUnalignedSrc/64-4             42.4ns ± 1%    40.5ns ± 0%   -4.39%  (p=0.000 n=10+8)
MemmoveUnalignedSrc/128-4            57.4ns ± 0%    57.0ns ± 1%   -0.75%  (p=0.048 n=9+10)
MemmoveUnalignedSrc/256-4            88.1ns ± 1%    89.6ns ± 0%   +1.70%  (p=0.000 n=9+8)
MemmoveUnalignedSrc/512-4             160ns ± 2%     144ns ± 0%   -9.89%  (p=0.000 n=10+8)
MemmoveUnalignedSrc/1024-4            286ns ± 0%     266ns ± 1%   -6.69%  (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-4            525ns ± 0%     483ns ± 1%   -7.96%  (p=0.000 n=9+10)
MemmoveUnalignedSrc/4096-4           1.01µs ± 0%    0.92µs ± 1%   -9.40%  (p=0.000 n=8+10)

Change-Id: Ia1144e9d4dfafdece6e167c5e576bf80f254c8ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/243357
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-11-02 15:23:43 +00:00
Tobias Klauser
d818d754bd internal/cpu: use anonymous struct for CPU feature vars
Like in x/sys/cpu, use anonymous structs to declare the CPU feature vars
instead of defining single-use types. Also, order the vars
alphabetically.

Change-Id: Iedd3ca51916e3cbb852d2aeed18b3a4c6613e778
Reviewed-on: https://go-review.googlesource.com/c/go/+/221757
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2020-03-02 08:32:38 +00:00
Meng Zhuo
aa739e3236 internal/cpu: add MIPS64x feature detection
Change-Id: Iacdad1758aa15e4703fccef38c08ecb338b95fd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/200579
Run-TryBot: Meng Zhuo <mengzhuo1203@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-28 23:18:52 +00:00
Ainar Garipov
69c14df03f all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:

  #!/usr/bin/env sh

  set -x

  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'

Then, go through the results manually and fix the most obvious typos in
the non-vendored code.

Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-08 17:28:20 +00:00
bill_ofarrell
36c3da6bc1 internal/cpu: add detection for the new ECDSA and EDDSA capabilities on s390x
This CL will check for the Message-Security-Assist Extension 9 facility
which enables the KDSA instruction.

Change-Id: I659aac09726e0999ec652ef1f5983072c8131a48
Reviewed-on: https://go-review.googlesource.com/c/go/+/174529
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-30 23:31:26 +00:00
Michael Munday
abfeaefa3e internal/cpu: change s390x API to match x/sys/cpu
This CL changes the internal/cpu API to more closely match the
public version in x/sys/cpu (added in CL 163003). This will make it
easier to update the dependencies of vendored code. The most prominent
renaming is from VE1 to VXE for the vector-enhancements facility 1.
VXE is the mnemonic used for this facility in the HWCAP vector.

Change-Id: I922d6c8bb287900a4bd7af70567e22eac567b5c1
Reviewed-on: https://go-review.googlesource.com/c/164437
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-02-28 13:46:45 +00:00
bill_ofarrell
fa7b7887a6 crypto/elliptic: utilize faster z14 multiply/square instructions (when available)
In the s390x assembly implementation of NIST P-256 curve, utilize faster multiply/square
instructions introduced in the z14. These new instructions are designed for crypto
and are constant time. The algorithm is unchanged except for faster
multiplication when run on a z14 or later. On z13, the original mutiplication
(also constant time) is used.

P-256 performance is critical in many applications, such as Blockchain.

name            old time      new time     delta
BaseMultP256    24396 ns/op   21564 ns/op  1.13x
ScalarMultP256  87546 ns/op   72813 ns/op. 1.20x

Change-Id: I7e6d8b420fac56d5f9cc13c9423e2080df854bac
Reviewed-on: https://go-review.googlesource.com/c/146022
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
2018-12-05 10:58:44 +00:00
Martin Möhrmann
4054eb80eb internal/cpu: move GODEBUGCPU options into GODEBUG
Change internal/cpu feature configuration to use
GODEBUG=cpu.feature1=value,cpu.feature2=value...
instead of GODEBUGCPU=feature1=value,feature2=value... .

This is not a backwards compatibility breaking change
since GODEBUGCPU was introduced in go1.11 as an
undocumented compiler experiment.

Fixes #28757

Change-Id: Ib21b3fed2334baeeb061a722ab1eb513d1137e87
Reviewed-on: https://go-review.googlesource.com/c/149578
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-14 21:47:50 +00:00
Martin Möhrmann
43c7ae18d6 internal/cpu: remove unused and not required ppc64(le) feature detection
Minimum Go requirement for ppc64(le) architecture support is POWER8.
https://github.com/golang/go/wiki/MinimumRequirements#ppc64-big-endian

Reduce CPU features supported in internal/cpu to those needed to
test minimum requirements and cpu feature kernel support for ppc64(le).
Currently no internal/cpu feature variables are used to guard code
from using unsupported instructions. The IsPower9 feature variable
and detection is kept as it will soon be used to guard code execution.

Reducing the set of detected CPU features for ppc64(le) makes
implementing Go support for new operating systems easier as
CPU feature detection for ppc64(le) needs operating system support
(e.g. hwcap on Linux and getsystemcfg syscall on AIX).

Change-Id: Ic4c17b31610970e481cd139c657da46507391d1d
Reviewed-on: https://go-review.googlesource.com/c/145117
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-29 19:23:48 +00:00
Martin Möhrmann
3a03f451de internal/cpu: add options and warnings for required cpu features
Updates #27218

Change-Id: I8603f3a639cdd9ee201c4f1566692e5b88877fc4
Reviewed-on: https://go-review.googlesource.com/c/144107
Run-TryBot: Martin Möhrmann <martisch@uos.de>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-24 06:27:53 +00:00
Martin Möhrmann
2f4b08269d internal/cpu: add invalid option warnings and support to enable cpu features
This CL adds the ability to enable the cpu feature FEATURE by specifying
FEATURE=on in GODEBUGCPU. Syntax support to enable cpu features is useful
in combination with a preceeding all=off to disable all but some specific
cpu features. Example:

GODEBUGCPU=all=off,sse3=on

This CL implements printing of warnings for invalid GODEBUGCPU settings:
- requests enabling features that are not supported with the current CPU
- specifying values different than 'on' or 'off' for a feature
- settings for unkown cpu feature names

Updates #27218

Change-Id: Ic13e5c4c35426a390c50eaa4bd2a408ef2ee21be
Reviewed-on: https://go-review.googlesource.com/c/141800
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-15 21:46:44 +00:00
Akhil Indurti
9bb790b61f internal/cpu: expose ARM feature flags for FMA
This change exposes feature flags needed to implement an FMA intrinsic
on ARM CPUs via auxv's HWCAP bits. Specifically, it exposes HasVFPv4 to
detect if an ARM processor has the fourth version of the vector floating
point unit. The relevant instruction for this CL is VFMA, emitted in Go
as FMULAD.

Updates #26630.

Change-Id: Ibbc04fb24c2b4d994f93762360f1a37bc6d83ff7
Reviewed-on: https://go-review.googlesource.com/c/126315
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2018-10-15 10:57:04 +00:00
Martin Möhrmann
b612cef047 internal/cpu: use 'off' for disabling cpu capabilities instead of '0'
Updates #27218

Change-Id: I4ce20376fd601b5f958d79014af7eaf89e9de613
Reviewed-on: https://go-review.googlesource.com/c/141818
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12 16:44:26 +00:00
Martin Möhrmann
412a0d0825 internal/cpu: enable support for GODEBUGCPU in non-experimental builds
Enabling GODEBUGCPU without the need to set GOEXPERIMENT=debugcpu  enables
trybots and builders to run tests for GODEBUGCPU features in upcoming CLs
that will implement the new syntax and features for non-experimental
GODEBUGCPU support from proposal golang.org/issue/27218.

Updates #27218

Change-Id: Icc69e51e736711a86b02b46bd441ffc28423beba
Reviewed-on: https://go-review.googlesource.com/c/141817
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12 15:40:45 +00:00
Martin Möhrmann
59943e6039 runtime: replace sys.CacheLineSize by corresponding internal/cpu const and vars
sys here is runtime/internal/sys.

Replace uses of sys.CacheLineSize for padding by
cpu.CacheLinePad or cpu.CacheLinePadSize.
Replace other uses of sys.CacheLineSize by cpu.CacheLineSize.
Remove now unused sys.CacheLineSize.

Updates #25203

Change-Id: I1daf410fe8f6c0493471c2ceccb9ca0a5a75ed8f
Reviewed-on: https://go-review.googlesource.com/126601
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-08-24 18:28:25 +00:00
Martin Möhrmann
61eca08f29 internal/cpu: add a CacheLinePadSize constant
The new constant CacheLinePadSize can be used to compute best effort
alignment of structs to cache lines.

e.g. the runtime can use this in the locktab definition:
var locktab [57]struct {
        l   spinlock
        pad [cpu.CacheLinePadSize - unsafe.Sizeof(spinlock{})]byte
}

Change-Id: I86f6fbfc5ee7436f742776a7d4a99a1d54ffccc8
Reviewed-on: https://go-review.googlesource.com/131237
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-24 17:45:28 +00:00
Martin Möhrmann
916e6a75d4 runtime: move arm hardware division support detection to internal/cpu
Assumes mandatory VFP and VFPv3 support to be present by default
but not IDIVA if AT_HWCAP is not available.

Adds GODEBUGCPU options to disable the use of code paths in the runtime
that use hardware support for division.

Change-Id: Ida02311bd9b9701de3fc120697e69445bf6c0853
Reviewed-on: https://go-review.googlesource.com/114826
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-24 14:27:07 +00:00
Ian Lance Taylor
555d3d9224 runtime: don't use linkname to refer to internal/cpu
The runtime package already imports the internal/cpu package, so there
is no reason for it to use go:linkname comments to refer to
internal/cpu functions and variables. Since internal/cpu is internal,
we can just export those names. Removing the obscurity of go:linkname
outweighs the minor additional complexity added to the internal/cpu API.

Change-Id: Id89951b7f3fc67cd9bce67ac6d01d44a647a10ad
Reviewed-on: https://go-review.googlesource.com/128755
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2018-08-21 14:36:09 +00:00
Martin Möhrmann
e7915146db internal/cpu: add and use cpu.CacheLinePad for padding structs
Add a CacheLinePad struct type to internal/cpu that has a size of CacheLineSize.
This can be used for padding structs in order to avoid false sharing.

Updates #25203

Change-Id: Icb95ae68d3c711f5f8217140811cad1a1d5be79a
Reviewed-on: https://go-review.googlesource.com/116276
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-20 15:41:37 +00:00
Michael Munday
f976716d93 crypto, internal/cpu: fix s390x AES feature detection and update SHA implementations
Hardware AES support in Go on s390x currently requires ECB, CBC
and CTR modes be available. It also requires that either the
GHASH or GCM facilities are available. The existing checks missed
some of these constraints.

While we're here simplify the cpu package on s390x, moving masking
code out of assembly and into Go code. Also, update SHA-{1,256,512}
implementations to use the cpu package since that is now trivial.

Finally I also added a test for internal/cpu on s390x which loads
/proc/cpuinfo and checks it against the flags set by internal/cpu.

Updates #25822 for changes to vet whitelist.

Change-Id: Iac4183f571643209e027f730989c60a811c928eb
Reviewed-on: https://go-review.googlesource.com/114397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-11 11:41:31 +00:00
Elias Naur
8e1f964b79 cmd/compile,go/build,internal/cpu: gofmt
I don't know why these files were not formatted. Perhaps because
their changes came from Github PRs?

Change-Id: Ida8d7b9a36f0d1064caf74ca1911696a247a9bbe
Reviewed-on: https://go-review.googlesource.com/114824
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-26 20:52:04 +00:00
Anit Gandhi
9cc86063e4 crypto/{aes,internal/cipherhw,tls}: use common internal/cpu in place of cipherhw
When the internal/cpu package was introduced, the AES package still used
the custom crypto/internal/cipherhw package for amd64 and s390x. This
change removes that package entirely in favor of directly referencing the
cpu feature flags set and exposed by the internal/cpu package. In
addition, 5 new flags have been added to the internal/cpu s390x struct
for detecting various cipher message (KM) features.

Change-Id: I77cdd8bc1b04ab0e483b21bf1879b5801a4ba5f4
GitHub-Last-Rev: a611e3ecb1f480dcbfce3cb0c8c9e4058f56c1a4
GitHub-Pull-Request: golang/go#24766
Reviewed-on: https://go-review.googlesource.com/105695
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-23 22:22:09 +00:00
Martin Möhrmann
7cc321e799 internal/cpu: add experiment to disable CPU features with GODEBUGCPU
Needs the go compiler to be build with GOEXPERIMENT=debugcpu to be active.

The GODEBUGCPU environment variable can be used to disable usage of
specific processor features in the Go standard library.
This is useful for testing and benchmarking different code paths that
are guarded by internal/cpu variable checks.

Use of processor features can not be enabled through GODEBUGCPU.

To disable usage of AVX and SSE41 cpu features on GOARCH amd64 use:
GODEBUGCPU=avx=0,sse41=0

The special "all" option can be used to disable all options:
GODEBUGCPU=all=0

Updates #12805
Updates #15403

Change-Id: I699c2e6f74d98472b6fb4b1e5ffbf29b15697aab
Reviewed-on: https://go-review.googlesource.com/91737
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-22 18:49:31 +00:00
Meng Zhuo
5137f1ebb0 internal/cpu,runtime: call cpu.initialize before alginit
runtime.alginit needs runtime/support_{aes,ssse3,sse41} feature flag
to init aeshash function but internal/cpu.init not be called yet.
This CL will call internal/cpu.initialize before runtime.alginit, so
that we can move all cpu features related code to internal/cpu.

Change-Id: I00b8e403ace3553f8c707563d95f27dade0bc853
Reviewed-on: https://go-review.googlesource.com/104636
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-10 16:13:52 +00:00
Meng Zhuo
576d896006 internal/cpu: update arm64 cpu features
Follow the Linux Kernel 4.15
Add Arm64 minimalFeatures test

Change-Id: I1c092521ba59b1e4096c27786fa0464f9ef7d311
Reviewed-on: https://go-review.googlesource.com/103636
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-31 07:36:32 +00:00
Keith Randall
1de695fb50 internal/bytealg: move IndexByte asssembly to the new bytealg package
Move the IndexByte function from the runtime to a new bytealg package.
The new package will eventually hold all the optimized assembly for
groveling through byte slices and strings. It seems a better home for
this code than randomly keeping it in runtime.

Once this is in, the next step is to move the other functions
(Compare, Equal, ...).

Update #19792

This change seems complicated enough that we might just declare
"not worth it" and abandon.  Opinions welcome.

The core assembly is all unchanged, except minor modifications where
the code reads cpu feature bits.

The wrapper functions have been cleaned up as they are now actually
checked by vet.

Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796
Reviewed-on: https://go-review.googlesource.com/98016
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-02 22:46:15 +00:00
Fangming.Fang
9e63ba8dc2 internal/cpu: detect cpu features in internal/cpu package
change hash/crc32 package to use cpu package instead of using
runtime internal variables to check crc32 instruction

Change-Id: I8f88d2351bde8ed4e256f9adf822a08b9a00f532
Reviewed-on: https://go-review.googlesource.com/76490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2017-11-14 19:07:15 +00:00
Ilya Tocar
122b678a1b cmd/internal/obj/x86: add ADX extension
Add support for ADX cpuid bit detection and all instructions,
implied by that bit (ADOX/ADCX). They are useful for rsa and math/big in
general.

Change-Id: Idaa93303ead48fd18b9b3da09b3e79de2f7e2193
Reviewed-on: https://go-review.googlesource.com/74850
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-02 15:41:50 +00:00
Martin Möhrmann
eb238e2799 internal/cpu: add support for x86 FMA cpu feature detection
Change-Id: I88ea39de01b07e6afa1c187c0df6a258da4aa8e4
Reviewed-on: https://go-review.googlesource.com/62650
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-11 05:20:01 +00:00
Agniva De Sarker
b8e2625d81 all: fix easy-to-miss typos
Using the wonderful https://github.com/client9/misspell tool.

Change-Id: Icdbc75a5559854f4a7a61b5271bcc7e3f99a1a24
Reviewed-on: https://go-review.googlesource.com/57851
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-23 03:07:12 +00:00
Carlos Eduardo Seo
9cd63e6046 runtime, internal/cpu: CPU capabilities detection for ppc64x
This change replaces the current runtime capabilities check for ppc64x with the
new internal/cpu package. It also adds support for the new POWER9 ISA and
capabilities.

Updates #15403

Change-Id: I5b64a79e782f8da3603e5529600434f602986292
Reviewed-on: https://go-review.googlesource.com/53830
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2017-08-14 12:16:42 +00:00
Martin Möhrmann
b2207a94a3 internal/cpu: new package to detect cpu features
Implements detection of x86 cpu features that
are used in the go standard library.

Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.

Updates: #15403

Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-05-10 17:02:21 +00:00