LoongArch Options (Using the GNU Compiler Collection (GCC))

3.19.22 LoongArch Options

These command-line options are defined for LoongArch targets:

-march=arch-type

Generate instructions for the machine type arch-type. -march=arch-type allows GCC to generate code that may not run at all on processors other than the one indicated.

The choices for arch-type are:

native: Local processor type detected by the native compiler.
loongarch64: Generic LoongArch 64-bit processor.
la464: LoongArch LA464-based processor with LSX, LASX.
la664: LoongArch LA664-based processor with LSX, LASX and all LoongArch v1.1 instructions.
la64v1.0: LoongArch64 ISA version 1.0.
la64v1.1: LoongArch64 ISA version 1.1.

More information about LoongArch ISA versions can be found at https://github.com/loongson/la-toolchain-conventions.

-mtune=tune-type

Optimize the generated code for the given processor target.

The choices for tune-type are:

native: Local processor type detected by the native compiler.
generic: Generic LoongArch processor.
loongarch64: Generic LoongArch 64-bit processor.
la464: LoongArch LA464 core.
la664: LoongArch LA664 core.

-mabi=base-abi-type

Generate code for the specified calling convention. base-abi-type can be one of:

lp64d: Uses 64-bit general purpose registers and 32/64-bit floating-point registers for parameter passing. Data model is LP64, where int is 32 bits, while long int and pointers are 64 bits.
lp64f: Uses 64-bit general purpose registers and 32-bit floating-point registers for parameter passing. Data model is LP64, where int is 32 bits, while long int and pointers are 64 bits.
lp64s: Uses 64-bit general purpose registers and no floating-point registers for parameter passing. Data model is LP64, where int is 32 bits, while long int and pointers are 64 bits.
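
All three ABIs share the LP64 data model, which can be checked from C with sizeof. The following is a minimal sketch; the helper name data_model_is_lp64 is hypothetical and not part of GCC.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when the compilation target uses the
   LP64 data model described above (32-bit int, 64-bit long and pointers).  */
bool data_model_is_lp64(void)
{
    return sizeof(int) == 4
           && sizeof(long) == 8
           && sizeof(void *) == 8;
}
```

The result depends only on the data model, so it is expected to be the same for lp64d, lp64f and lp64s; the three ABIs differ in which registers carry floating-point arguments, which is not visible to sizeof.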

-mfpu=fpu-type

Generate code for the specified FPU type, which can be one of:

64: Allow the use of hardware floating-point instructions for 32-bit and 64-bit operations.
32: Allow the use of hardware floating-point instructions for 32-bit operations.
none, 0: Prevent the use of hardware floating-point instructions.

-msimd=simd-type

Enable generation of LoongArch SIMD instructions for vectorization and via builtin functions. The value can be one of:

lasx: Enable generating instructions from the 256-bit LoongArch Advanced SIMD Extension (LASX) and the 128-bit LoongArch SIMD Extension (LSX).
lsx: Enable generating instructions from the 128-bit LoongArch SIMD Extension (LSX).
none: No LoongArch SIMD instruction may be generated.
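
As an illustration, an element-wise loop like the one below is the kind of code the vectorizer may turn into LSX or LASX instructions when optimization is enabled (for example with -O3 -msimd=lasx); whether it does so depends on the optimization level and the cost model. The function names are hypothetical. The macros __loongarch_sx and __loongarch_asx come from the LoongArch toolchain conventions rather than from this section, so their use here is an assumption.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i].  The restrict qualifiers state
   that the arrays do not overlap, which makes vectorization easier.  */
void scaled_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Width in bits of the widest SIMD extension the compiler may use.
   Assumption: __loongarch_asx / __loongarch_sx are predefined when
   LASX / LSX is enabled.  Returns 0 when neither macro is defined.  */
int enabled_simd_bits(void)
{
#if defined(__loongarch_asx)
    return 256;
#elif defined(__loongarch_sx)
    return 128;
#else
    return 0;
#endif
}
```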

-msoft-float

Force -mfpu=none and prevent the use of floating-point registers for parameter passing. This option may change the target ABI.

-msingle-float

Force -mfpu=32 and allow the use of 32-bit floating-point registers for parameter passing. This option may change the target ABI.

-mdouble-float

Force -mfpu=64 and allow the use of 32/64-bit floating-point registers for parameter passing. This option may change the target ABI.

-mlasx

-mno-lasx

-mlsx

-mno-lsx

Incrementally adjust the scope of the SIMD extensions (none / LSX / LASX) that can be used by the compiler for code generation. Enabling LASX with -mlasx automatically enables LSX, and disabling LSX with -mno-lsx automatically disables LASX. These driver-only options act upon the final -msimd configuration state and make incremental changes in the order they appear on the GCC driver's command line, deriving the final, canonicalized -msimd option that is passed to the compiler proper.
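
The left-to-right folding can be modelled with a small state machine. This is only a sketch of the behavior described above, not the driver's code: the names simd_scope, fold_simd_flags and simd_scope_name are hypothetical, and the handling of -mlsx when LASX is already on and of -mno-lasx (falling back to LSX) is an assumption.

```c
#include <string.h>

/* SIMD scope, ordered so that a larger value includes the smaller ones.  */
enum simd_scope { SIMD_NONE = 0, SIMD_LSX = 1, SIMD_LASX = 2 };

/* Hypothetical model of the left-to-right folding.  Not GCC code.  */
enum simd_scope fold_simd_flags(enum simd_scope start,
                                const char *const *flags, int count)
{
    enum simd_scope s = start;

    for (int i = 0; i < count; i++) {
        if (strcmp(flags[i], "-mlasx") == 0) {
            s = SIMD_LASX;              /* enabling LASX also enables LSX */
        } else if (strcmp(flags[i], "-mno-lsx") == 0) {
            s = SIMD_NONE;              /* disabling LSX also disables LASX */
        } else if (strcmp(flags[i], "-mlsx") == 0) {
            if (s == SIMD_NONE)
                s = SIMD_LSX;           /* assumption: never narrows LASX */
        } else if (strcmp(flags[i], "-mno-lasx") == 0) {
            if (s == SIMD_LASX)
                s = SIMD_LSX;           /* assumption: LSX stays enabled */
        }
    }
    return s;
}

/* Name of the -msimd= value that corresponds to a scope.  */
const char *simd_scope_name(enum simd_scope s)
{
    return s == SIMD_LASX ? "lasx" : s == SIMD_LSX ? "lsx" : "none";
}
```

Under this model, the sequence -mlasx -mno-lsx folds to none, because the later option disables both extensions.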

-mbranch-cost=n

Set the cost of branches to roughly n instructions.

-mcheck-zero-division

-mno-check-zero-division

Trap (do not trap) on integer division by zero. The default is -mcheck-zero-division for -O0 or -Og, and -mno-check-zero-division for other optimization levels.

-mcond-move-int

-mno-cond-move-int

Conditional moves for integral data in general-purpose registers are enabled (disabled). The default is -mcond-move-int.

-mcond-move-float

-mno-cond-move-float

Conditional moves for floating-point registers are enabled (disabled). The default is -mcond-move-float.
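
For orientation, simple selections such as the following are typical candidates for conditional moves. This is a sketch with hypothetical function names; whether GCC actually uses a conditional move or a branch for them depends on these options, the optimization level and its cost estimates.

```c
/* Selects between two integer values held in general-purpose registers.  */
long select_long(int cond, long if_true, long if_false)
{
    return cond ? if_true : if_false;
}

/* Selects between two floating-point values.  */
double select_double(int cond, double if_true, double if_false)
{
    return cond ? if_true : if_false;
}
```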

-mmemcpy

-mno-memcpy

Force (do not force) the use of memcpy for non-trivial block moves. The default is -mno-memcpy, which allows GCC to inline most constant-sized copies. Setting the optimization level to -Os also forces the use of memcpy, but -mno-memcpy overrides this behavior if explicitly specified, regardless of the order of these options on the command line.
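
The two functions below are examples of constant-sized block moves that this option affects. It is a minimal sketch; the type and function names are hypothetical.

```c
#include <string.h>

struct packet {
    unsigned char payload[64];   /* size is known at compile time */
};

/* A structure assignment is a block move of sizeof (struct packet) bytes.  */
void copy_packet(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}

/* An explicit memcpy with a constant size is another such block move.  */
void copy_payload(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 64);
}
```

With -mno-memcpy GCC may expand copies like these inline; with -mmemcpy it is expected to emit a call to memcpy instead.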

-mstrict-align

-mno-strict-align

Avoid or allow generating memory accesses that may not be aligned on a natural object boundary as described in the architecture specification. The default is -mno-strict-align.
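
A typical case where this choice matters is reading a multi-byte value from a buffer with no alignment guarantee. The sketch below uses a hypothetical helper name; it is valid C under either setting, and only the generated instructions may differ.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address with no alignment guarantee.
   memcpy keeps the access well defined in C regardless of alignment.  */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t value;

    memcpy(&value, p, sizeof value);
    return value;
}
```

With -mno-strict-align the compiler is free to implement the copy as a single, possibly unaligned, 32-bit load. With -mstrict-align it has to avoid such an access, for example by using narrower loads.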

-msmall-data-limit=number

Put global and static data smaller than number bytes into a special section (on some targets). The default value is 0.

-mmax-inline-memcpy-size=n

Inline all block moves (such as calls to memcpy or structure copies) less than or equal to n bytes. The default value of n is 1024.

-mcmodel=code-model

Set the code model to one of:

tiny-static (Not implemented yet)
tiny (Not implemented yet)
normal: The text segment must be within 128MB addressing space. The data segment must be within 2GB addressing space.
medium: The text segment and data segment must be within 2GB addressing space.
large (Not implemented yet)
extreme: This mode does not limit the size of the code segment and data segment. The -mcmodel=extreme option is incompatible with -fplt and/or -mexplicit-relocs=none.

The default code model is normal.

-mexplicit-relocs=style

Set when to use assembler relocation operators when dealing with symbolic addresses. The alternative is to use assembler macros instead, which may limit instruction scheduling but allow linker relaxation. With -mexplicit-relocs=none the assembler macros are always used. With -mexplicit-relocs=always the assembler relocation operators are always used. With -mexplicit-relocs=auto the compiler uses the relocation operators where linker relaxation is impossible, to improve the code quality, and macros elsewhere.

The default value for the option is determined by the assembler capability detected at GCC build time and by the setting of -mrelax: -mexplicit-relocs=none if the assembler does not support relocation operators at all, -mexplicit-relocs=always if the assembler supports relocation operators but -mrelax is not enabled, -mexplicit-relocs=auto if the assembler supports relocation operators and -mrelax is enabled.

-mexplicit-relocs

An alias of -mexplicit-relocs=always for backward compatibility.

-mno-explicit-relocs

An alias of -mexplicit-relocs=none for backward compatibility.

-mdirect-extern-access

-mno-direct-extern-access

Do not use (use) the GOT to access external symbols. The default is -mno-direct-extern-access: the GOT is used for external symbols with default visibility, but not for other external symbols.

With -mdirect-extern-access, the GOT is not used and all external symbols are PC-relatively addressed. It is only suitable for environments where no dynamic linking is performed, such as firmware, OS kernels, and executables linked with -static or -static-pie. -mdirect-extern-access is not compatible with -fPIC or -fpic.

-mrelax

-mno-relax

Take (do not take) advantage of linker relaxations. If -mpass-mrelax-to-as is enabled, this option is also passed to the assembler. The default is determined at GCC build time by detecting the corresponding assembler support: -mrelax if the assembler supports both the -mrelax option and conditional branch relaxation, -mno-relax otherwise. Conditional branch relaxation is required because otherwise the .align directives and conditional branch instructions in the assembly code output by GCC may be rejected by the assembler because of a relocation overflow.

-mpass-mrelax-to-as

-mno-pass-mrelax-to-as

Pass (do not pass) the -mrelax or -mno-relax option to the assembler. The default is determined during GCC build-time by detecting corresponding assembler support: -mpass-mrelax-to-as if the assembler supports the -mrelax option, -mno-pass-mrelax-to-as otherwise. This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one.

-mrecip

This option enables use of the reciprocal estimate and reciprocal square root estimate instructions with additional Newton-Raphson steps to increase precision instead of doing a divide or square root and divide for floating-point arguments. These instructions are generated only when -funsafe-math-optimizations is enabled together with -ffinite-math-only and -fno-trapping-math. This option is off by default. Before using this option, make sure the target CPU supports the frecipe and frsqrte instructions. Note that while the throughput of the sequence is higher than the throughput of the non-reciprocal instruction, the precision of the sequence can be decreased by up to 2 ulp (for example, the inverse of 1.0 equals 0.99999994).
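
The Newton-Raphson refinement mentioned above can be sketched in portable C. This shows the mathematical iteration only, not the instruction sequence GCC emits: the function names are hypothetical, and the starting estimate is passed in by the caller as a stand-in for the value a hardware estimate instruction would provide.

```c
/* One Newton-Raphson step for 1/a:  x' = x * (2 - a * x).  */
float refine_recip(float a, float x)
{
    return x * (2.0f - a * x);
}

/* One Newton-Raphson step for 1/sqrt(a):  y' = y * (1.5 - 0.5 * a * y * y).  */
float refine_rsqrt(float a, float y)
{
    return y * (1.5f - 0.5f * a * y * y);
}

/* Refines a rough starting estimate of 1/a with the given number of steps.  */
float recip_from_estimate(float a, float estimate, int steps)
{
    float x = estimate;

    for (int i = 0; i < steps; i++)
        x = refine_recip(a, x);
    return x;
}
```

Each step roughly squares the relative error of the estimate, which is why a small number of steps is enough to approach full single precision. The last few bits can still differ from a true division, which is the precision loss noted above.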

-mrecip=opt

This option controls which reciprocal estimate instructions may be used. opt is a comma-separated list of options, which may be preceded by a ! to invert the option:

all: Enable all estimate instructions.
default: Enable the default instructions, equivalent to -mrecip.
none: Disable all estimate instructions, equivalent to -mno-recip.
div: Enable the approximation for scalar division.
vec-div: Enable the approximation for vectorized division.
sqrt: Enable the approximation for scalar square root.
vec-sqrt: Enable the approximation for vectorized square root.
rsqrt: Enable the approximation for scalar reciprocal square root.
vec-rsqrt: Enable the approximation for vectorized reciprocal square root.

So, for example, -mrecip=all,!sqrt enables all of the reciprocal approximations, except for scalar square root.
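
The list syntax can be illustrated with a small parser that folds the items into a bit mask from left to right. This is a sketch, not GCC's implementation: all names are hypothetical, the item default is left out because this section does not spell out which set it stands for, and treating none as the inverse of all is an assumption.

```c
#include <stddef.h>
#include <string.h>

enum {
    RECIP_DIV       = 1 << 0,
    RECIP_VEC_DIV   = 1 << 1,
    RECIP_SQRT      = 1 << 2,
    RECIP_VEC_SQRT  = 1 << 3,
    RECIP_RSQRT     = 1 << 4,
    RECIP_VEC_RSQRT = 1 << 5,
    RECIP_ALL       = (1 << 6) - 1
};

static const struct { const char *name; int bits; } recip_names[] = {
    { "all",       RECIP_ALL },
    { "div",       RECIP_DIV },
    { "vec-div",   RECIP_VEC_DIV },
    { "sqrt",      RECIP_SQRT },
    { "vec-sqrt",  RECIP_VEC_SQRT },
    { "rsqrt",     RECIP_RSQRT },
    { "vec-rsqrt", RECIP_VEC_RSQRT },
};

/* Parses a list such as "all,!sqrt" from left to right.  Returns the
   resulting mask, or -1 if an item is empty or not recognized.  */
int parse_recip_list(const char *list)
{
    int mask = 0;
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");       /* length of the current item */
        const char *item = p;
        size_t item_len = len;
        int invert = 0;
        int bits = -1;

        if (item_len > 0 && item[0] == '!') {
            invert = 1;                     /* a leading ! inverts the item */
            item++;
            item_len--;
        }
        if (item_len == 4 && strncmp(item, "none", 4) == 0) {
            bits = RECIP_ALL;               /* assumption: none == !all */
            invert = !invert;
        } else {
            for (size_t i = 0; i < sizeof recip_names / sizeof recip_names[0]; i++)
                if (strlen(recip_names[i].name) == item_len
                    && strncmp(recip_names[i].name, item, item_len) == 0)
                    bits = recip_names[i].bits;
        }
        if (bits < 0)
            return -1;
        mask = invert ? (mask & ~bits) : (mask | bits);

        p += len;
        if (*p == ',')
            p++;                            /* skip the separator */
    }
    return mask;
}
```

Under these assumptions, parse_recip_list("all,!sqrt") yields every bit except RECIP_SQRT, matching the example above.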

-mfrecipe

-mno-frecipe

Use (do not use) frecipe.{s/d} and frsqrte.{s/d} instructions. The option is enabled by default when building with -march=la664; otherwise the default is -mno-frecipe.

-mdiv32

-mno-div32

Use (do not use) div.w[u] and mod.w[u] instructions with input not sign-extended. The option is enabled by default when building with -march=la664; otherwise the default is -mno-div32.

-mlam-bh

-mno-lam-bh

Use (do not use) am{swap/add}[_db].{b/h} instructions. The option is enabled by default when building with -march=la664; otherwise the default is -mno-lam-bh.

-mlamcas

-mno-lamcas

Use (do not use) amcas[_db].{b/h/w/d} instructions. The option is enabled by default when building with -march=la664; otherwise the default is -mno-lamcas.
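
The byte and halfword atomic operations that -mlam-bh and -mlamcas are concerned with can be written with C11 atomics. This is a sketch with hypothetical function names; whether GCC selects instructions such as amadd.b or amcas.h for these operations depends on the options above and on the compiler version.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomic add on a byte: the kind of operation -mlam-bh is about.
   Returns the value the counter held before the addition.  */
uint8_t bump_counter(_Atomic uint8_t *counter)
{
    return atomic_fetch_add(counter, 1);
}

/* Atomic compare-and-swap on a halfword: the kind of operation -mlamcas
   is about.  In this sketch the value 0 marks a free slot.  */
bool claim_slot(_Atomic uint16_t *slot, uint16_t owner)
{
    uint16_t expected = 0;

    return atomic_compare_exchange_strong(slot, &expected, owner);
}
```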

-mld-seq-sa

-mno-ld-seq-sa

Control whether a same-address load-load barrier (dbar 0x700) is needed. -mld-seq-sa is enabled by default when building with -march=la664; otherwise the default is -mno-ld-seq-sa, under which the load-load barrier is needed.

-mtls-dialect=opt

This option controls which TLS dialect may be used for the general dynamic and local dynamic TLS models.

trad: Use traditional TLS. This is the default.
desc: Use TLS descriptors.

--param loongarch-vect-unroll-limit=n

The vectorizer will use available tuning information to determine whether it would be beneficial to unroll the main vectorized loop and by how much. This parameter sets the upper bound of how much the vectorizer will unroll the main loop. The default value is six.