Inconsistent truncation of unsigned bit-field integer expressions between C++ and C in different compilers

Edit 2:

I was debugging a strange test failure: a function that used to live in a C++ source file started returning incorrect results after it was moved verbatim into a C file. The MVE below reproduces the problem with GCC. However, when I compiled the example with Clang on a whim (and later with VS), I got a different result! I cannot figure out whether to treat this as a bug in one of the compilers or as a manifestation of an undefined result allowed by the C or C++ standard. Strangely, none of the compilers gave me any warnings about the expression.

The culprit is this expression:

ctl.b.p52 << 12;

Here p52 is typed as uint64_t; it is also part of a union (see control_t below). The shift operation does not lose any data, as the result still fits into 64 bits. However, GCC then decides to truncate the result to 52 bits if I use the C compiler! With the C++ compiler, all 64 bits of the result are preserved.

To illustrate this, the example program below compiles two functions with identical bodies and then compares their results. c_behavior() is placed in a C source file and cpp_behavior() in a C++ file, and main() does the comparison.

Repository with the example code: https://github.com/grigory-rechistov/c-cpp-bitfields

The header common.h defines a union of 64-bit-wide bit-fields and an integer, and declares two functions:

#ifndef COMMON_H
#define COMMON_H

#include <stdint.h>

typedef union control {
        uint64_t q;
        struct {
                uint64_t a: 1;
                uint64_t b: 1;
                uint64_t c: 1;
                uint64_t d: 1;
                uint64_t e: 1;
                uint64_t f: 1;
                uint64_t g: 4;
                uint64_t h: 1;
                uint64_t i: 1;
                uint64_t p52: 52;
        } b;
} control_t;

#ifdef __cplusplus
extern "C" {
#endif

uint64_t cpp_behavior(control_t ctl);
uint64_t c_behavior(control_t ctl);

#ifdef __cplusplus
}
#endif

#endif // COMMON_H

The functions have identical bodies, except that one is treated as C and the other as C++.

c-part.c:

#include <stdint.h>
#include "common.h"
uint64_t c_behavior(control_t ctl) {
    return ctl.b.p52 << 12;
}

cpp-part.cpp:

#include <stdint.h>
#include "common.h"
uint64_t cpp_behavior(control_t ctl) {
    return ctl.b.p52 << 12;
}

main.c:

#include <stdio.h>
#include "common.h"

int main() {
    control_t ctl;
    ctl.q = 0xfffffffd80236000ull;

    uint64_t c_res = c_behavior(ctl);
    uint64_t cpp_res = cpp_behavior(ctl);
    const char *announce = c_res == cpp_res? "C == C++" : "OMG C != C++";
    printf("%s\n", announce);

    return c_res == cpp_res? 0: 1;
}

GCC shows a difference between the results they return:

$ gcc -Wpedantic main.c c-part.c cpp-part.cpp

$ ./a.exe
OMG C != C++

However, with Clang, C and C++ behave identically and as expected:

$ clang -Wpedantic main.c c-part.c cpp-part.cpp

$ ./a.exe
C == C++

With Visual Studio I get the same result as with Clang:

C:\Users\user\Documents>cl main.c c-part.c cpp-part.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24234.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.c
c-part.c
Generating Code...
Compiling...
cpp-part.cpp
Generating Code...
Microsoft (R) Incremental Linker Version 14.00.24234.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj
c-part.obj
cpp-part.obj

C:\Users\user\Documents>main.exe
C == C++

I tried the examples on Windows, even though the original problem with GCC was discovered on Linux.

— Grigory Rechistov

Bit-fields are notoriously underspecified for large widths. I came across similar issues in this question: stackoverflow.com/questions/58846584/...

— chqrlie

@chqrlie I read the C << operator as requiring truncation.

— Andrew Henle

Please post a minimal reproducible example (stackoverflow.com/help/minimal-reproducible-example). The current code does not have main.c and probably causes undefined behavior in several ways. IMO it would be clearer to post a single-file MRE that produces different output when compiled with each compiler, since C/C++ interop is not well specified by the standard. Also note that union aliasing causes UB in C++.

— M.M

@MM Right, it slipped through when I was posting the question. I have added it now, and I also think that a small repository with it might be a good idea.

— Grigory Rechistov

@MM "IMO it would be clearer to post a single-file MRE that produces different output when compiled with each compiler." I did not think about that while I was reducing my production code to something smaller, but it should be possible to reformulate the reproducer into a single file.

— Grigory Rechistov

Answers:

C and C++ treat the types of bit-field members differently.

C 2018 6.7.2.1 10 says:

A bit-field is interpreted as having a signed or unsigned integer type consisting of the specified number of bits…

Observe that this is not specific about the type (it is some integer type), and it does not say the type is the one that was used to declare the bit-field, as in the uint64_t a : 1; shown in the question. This apparently leaves it open to the implementation to choose the type.

C++ 2017 draft n4659 12.2.4 [class.bit] 1 says, of a bit-field declaration:

… The bit-field attribute is not part of the type of the class member…

This implies that, in a declaration such as uint64_t a : 1;, the : 1 is not part of the type of the class member a, so the type is as if it were uint64_t a;, and thus the type of a is uint64_t.

So it appears GCC treats a bit-field in C as some integer type 32 bits or narrower if it fits, and a bit-field in C++ as its declared type, and this does not appear to violate the standards.
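If the goal is simply to get the same value from both compilers, one option that follows from the above is to convert the bit-field to uint64_t explicitly before shifting, so the shift no longer depends on which type the compiler assigns to the bit-field. The sketch below is an illustration only: the helper names shift_p52_full and shift_p52_truncated are hypothetical, the union is a shortened stand-in for control_t from the question, and the static_assert records the assumption that the bit-fields pack into a single 64-bit unit.

```c
#include <assert.h>
#include <stdint.h>

/* Shortened stand-in for control_t from the question. */
typedef union control {
    uint64_t q;
    struct {
        uint64_t low12 : 12;   /* stands in for fields a..i (12 bits in total) */
        uint64_t p52   : 52;
    } b;
} control_t;

/* Assumption of this sketch: the two bit-fields share one 64-bit unit. */
static_assert(sizeof(control_t) == sizeof(uint64_t),
              "sketch assumes the bit-fields pack into 64 bits");

/* Hypothetical helper: convert to the declared type first, so the shift is
   carried out in uint64_t whichever type the compiler gives ctl.b.p52. */
uint64_t shift_p52_full(control_t ctl) {
    return (uint64_t)ctl.b.p52 << 12;
}

/* Hypothetical helper: request the 52-bit result explicitly instead of
   relying on the compiler to truncate. */
uint64_t shift_p52_truncated(control_t ctl) {
    const uint64_t mask52 = (UINT64_C(1) << 52) - 1;
    return ((uint64_t)ctl.b.p52 << 12) & mask52;
}
```

With the cast in place, the C and C++ builds should agree, because the conversion to uint64_t happens before the shift in both languages.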

— Eric Postpischil

I read truncation in C as mandatory per 6.5.7 4 (the C18 wording is similar): "The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type." E1 in this case is a 52-bit bit-field.
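As a rough model of that rule, assuming the result type is an unsigned type exactly `width` bits wide, the arithmetic can be written out as follows. The function name shl_mod_width is hypothetical, and this only illustrates the modular reduction; it does not say which width a given compiler actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the quoted rule for an unsigned left operand whose type is
   `width` bits wide: E1 * 2^E2 reduced modulo 2^width.
   Hypothetical helper, for illustration only. */
uint64_t shl_mod_width(uint64_t e1, unsigned e2, unsigned width) {
    assert(width >= 1 && width <= 64);
    assert(e2 < width);   /* shifting by the width or more is undefined in C */

    /* Largest value representable in `width` bits. */
    uint64_t max = (width == 64) ? UINT64_MAX : (UINT64_C(1) << width) - 1;
    assert(e1 <= max);    /* e1 must be representable in the modelled type */

    /* The uint64_t shift already wraps modulo 2^64; masking then reduces
       modulo 2^width, because 2^width divides 2^64. */
    return (e1 << e2) & max;
}
```

For the value in the question, shl_mod_width(0xfffffffd80236, 12, 52) gives 0x000ffffd80236000, while the same call with width 64 gives 0xfffffffd80236000.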

— Andrew Henle

@AndrewHenle: I see what you are saying. The type of an n-bit bit-field is "n-bit integer" (neglecting signedness for now). I was interpreting it as: the type of an n-bit bit-field is some integer type, which the implementation chooses. Based solely on the wording in 6.7.2.1 10, I favor your interpretation. But a problem with that is that, given a uint64_t a : 33 set to 2^33-1 in a structure s, then, in a C implementation with 32-bit int, s.a+s.a should yield 2^33-2 due to wrapping, but Clang produces 2^34-2; it apparently treats it as uint64_t.

— Eric Postpischil

@AndrewHenle: (More on the reasoning: in s.a+s.a, the usual arithmetic conversions would not change the type of s.a, since it is wider than unsigned int, so the arithmetic would be done in the 33-bit type.)

— Eric Postpischil

"but Clang produces 2^34-2; it apparently treats it as uint64_t." If that is a 64-bit compilation, that seems to make Clang consistent with how GCC treats 64-bit compilations by not truncating. Does Clang treat 32- and 64-bit compilations differently? (And it seems I just learned another reason to avoid bit-fields...)

— Andrew Henle

@AndrewHenle: Well, old Apple Clang 1.7 produces 2^32-2 (not 2^33-2; it lost a bit!) both with -m32 and with -m64, with a warning that the type is a GCC extension. With Apple Clang 11.0, I do not have libraries to run 32-bit code, but the generated assembly shows pushl $3 and pushl $-2 before calling printf, so I think that is 2^34-2. So Apple Clang does not differ between 32-bit and 64-bit targets, but it did change over time.
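For reference, the three candidate results mentioned in this exchange are the same sum reduced modulo different widths, and the two pushl operands reassemble into the 64-bit value as shown in the last line. This is only a worked check of the arithmetic, not a statement about what any compiler ought to do:

```latex
\begin{align*}
(2^{33}-1) + (2^{33}-1) &= 2^{34} - 2 \\
(2^{34}-2) \bmod 2^{33} &= 2^{33} - 2 && \text{(33-bit type)} \\
(2^{34}-2) \bmod 2^{32} &= 2^{32} - 2 && \text{(32-bit type)} \\
(2^{34}-2) \bmod 2^{64} &= 2^{34} - 2 && \text{(64-bit type)} \\
3 \cdot 2^{32} + (2^{32} - 2) &= 2^{34} - 2 && \text{(high word 3, low word $-2$ read as unsigned 32-bit)}
\end{align*}
```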

— Eric Postpischil

Andrew Henle suggested a strict interpretation of the C Standard: the type of a bit-field is a signed or unsigned integer type with exactly the specified width.

Here is a test that supports this interpretation: using the C11 _Generic() construct, I try to determine the type of bit-fields of different widths. I had to define them with the type long long int to avoid warnings when compiling with clang.

Here is the source:

#include <stdint.h>
#include <stdio.h>

#define typeof(X)  _Generic((X),                         \
                       long double: "long double",       \
                       double: "double",                 \
                       float: "float",                   \
                       unsigned long long int: "unsigned long long int",  \
                       long long int: "long long int",   \
                       unsigned long int: "unsigned long int",  \
                       long int: "long int",             \
                       unsigned int: "unsigned int",     \
                       int: "int",                       \
                       unsigned short: "unsigned short", \
                       short: "short",                   \
                       unsigned char: "unsigned char",   \
                       signed char: "signed char",       \
                       char: "char",                     \
                       _Bool: "_Bool",                   \
                       __int128_t: "__int128_t",         \
                       __uint128_t: "__uint128_t",       \
                       default: "other")

#define stype long long int
#define utype unsigned long long int

struct s {
    stype s1 : 1;
    stype s2 : 2;
    stype s3 : 3;
    stype s4 : 4;
    stype s5 : 5;
    stype s6 : 6;
    stype s7 : 7;
    stype s8 : 8;
    stype s9 : 9;
    stype s10 : 10;
    stype s11 : 11;
    stype s12 : 12;
    stype s13 : 13;
    stype s14 : 14;
    stype s15 : 15;
    stype s16 : 16;
    stype s17 : 17;
    stype s18 : 18;
    stype s19 : 19;
    stype s20 : 20;
    stype s21 : 21;
    stype s22 : 22;
    stype s23 : 23;
    stype s24 : 24;
    stype s25 : 25;
    stype s26 : 26;
    stype s27 : 27;
    stype s28 : 28;
    stype s29 : 29;
    stype s30 : 30;
    stype s31 : 31;
    stype s32 : 32;
    stype s33 : 33;
    stype s34 : 34;
    stype s35 : 35;
    stype s36 : 36;
    stype s37 : 37;
    stype s38 : 38;
    stype s39 : 39;
    stype s40 : 40;
    stype s41 : 41;
    stype s42 : 42;
    stype s43 : 43;
    stype s44 : 44;
    stype s45 : 45;
    stype s46 : 46;
    stype s47 : 47;
    stype s48 : 48;
    stype s49 : 49;
    stype s50 : 50;
    stype s51 : 51;
    stype s52 : 52;
    stype s53 : 53;
    stype s54 : 54;
    stype s55 : 55;
    stype s56 : 56;
    stype s57 : 57;
    stype s58 : 58;
    stype s59 : 59;
    stype s60 : 60;
    stype s61 : 61;
    stype s62 : 62;
    stype s63 : 63;
    stype s64 : 64;

    utype u1 : 1;
    utype u2 : 2;
    utype u3 : 3;
    utype u4 : 4;
    utype u5 : 5;
    utype u6 : 6;
    utype u7 : 7;
    utype u8 : 8;
    utype u9 : 9;
    utype u10 : 10;
    utype u11 : 11;
    utype u12 : 12;
    utype u13 : 13;
    utype u14 : 14;
    utype u15 : 15;
    utype u16 : 16;
    utype u17 : 17;
    utype u18 : 18;
    utype u19 : 19;
    utype u20 : 20;
    utype u21 : 21;
    utype u22 : 22;
    utype u23 : 23;
    utype u24 : 24;
    utype u25 : 25;
    utype u26 : 26;
    utype u27 : 27;
    utype u28 : 28;
    utype u29 : 29;
    utype u30 : 30;
    utype u31 : 31;
    utype u32 : 32;
    utype u33 : 33;
    utype u34 : 34;
    utype u35 : 35;
    utype u36 : 36;
    utype u37 : 37;
    utype u38 : 38;
    utype u39 : 39;
    utype u40 : 40;
    utype u41 : 41;
    utype u42 : 42;
    utype u43 : 43;
    utype u44 : 44;
    utype u45 : 45;
    utype u46 : 46;
    utype u47 : 47;
    utype u48 : 48;
    utype u49 : 49;
    utype u50 : 50;
    utype u51 : 51;
    utype u52 : 52;
    utype u53 : 53;
    utype u54 : 54;
    utype u55 : 55;
    utype u56 : 56;
    utype u57 : 57;
    utype u58 : 58;
    utype u59 : 59;
    utype u60 : 60;
    utype u61 : 61;
    utype u62 : 62;
    utype u63 : 63;
    utype u64 : 64;
} x;

int main(void) {
#define X(v)  printf("typeof(" #v "): %s\n", typeof(v))
    X(x.s1);
    X(x.s2);
    X(x.s3);
    X(x.s4);
    X(x.s5);
    X(x.s6);
    X(x.s7);
    X(x.s8);
    X(x.s9);
    X(x.s10);
    X(x.s11);
    X(x.s12);
    X(x.s13);
    X(x.s14);
    X(x.s15);
    X(x.s16);
    X(x.s17);
    X(x.s18);
    X(x.s19);
    X(x.s20);
    X(x.s21);
    X(x.s22);
    X(x.s23);
    X(x.s24);
    X(x.s25);
    X(x.s26);
    X(x.s27);
    X(x.s28);
    X(x.s29);
    X(x.s30);
    X(x.s31);
    X(x.s32);
    X(x.s33);
    X(x.s34);
    X(x.s35);
    X(x.s36);
    X(x.s37);
    X(x.s38);
    X(x.s39);
    X(x.s40);
    X(x.s41);
    X(x.s42);
    X(x.s43);
    X(x.s44);
    X(x.s45);
    X(x.s46);
    X(x.s47);
    X(x.s48);
    X(x.s49);
    X(x.s50);
    X(x.s51);
    X(x.s52);
    X(x.s53);
    X(x.s54);
    X(x.s55);
    X(x.s56);
    X(x.s57);
    X(x.s58);
    X(x.s59);
    X(x.s60);
    X(x.s61);
    X(x.s62);
    X(x.s63);
    X(x.s64);

    X(x.u1);
    X(x.u2);
    X(x.u3);
    X(x.u4);
    X(x.u5);
    X(x.u6);
    X(x.u7);
    X(x.u8);
    X(x.u9);
    X(x.u10);
    X(x.u11);
    X(x.u12);
    X(x.u13);
    X(x.u14);
    X(x.u15);
    X(x.u16);
    X(x.u17);
    X(x.u18);
    X(x.u19);
    X(x.u20);
    X(x.u21);
    X(x.u22);
    X(x.u23);
    X(x.u24);
    X(x.u25);
    X(x.u26);
    X(x.u27);
    X(x.u28);
    X(x.u29);
    X(x.u30);
    X(x.u31);
    X(x.u32);
    X(x.u33);
    X(x.u34);
    X(x.u35);
    X(x.u36);
    X(x.u37);
    X(x.u38);
    X(x.u39);
    X(x.u40);
    X(x.u41);
    X(x.u42);
    X(x.u43);
    X(x.u44);
    X(x.u45);
    X(x.u46);
    X(x.u47);
    X(x.u48);
    X(x.u49);
    X(x.u50);
    X(x.u51);
    X(x.u52);
    X(x.u53);
    X(x.u54);
    X(x.u55);
    X(x.u56);
    X(x.u57);
    X(x.u58);
    X(x.u59);
    X(x.u60);
    X(x.u61);
    X(x.u62);
    X(x.u63);
    X(x.u64);

    return 0;
}

Here is the program's output when compiled with 64-bit clang:

typeof(x.s1): long long int
typeof(x.s2): long long int
typeof(x.s3): long long int
typeof(x.s4): long long int
typeof(x.s5): long long int
typeof(x.s6): long long int
typeof(x.s7): long long int
typeof(x.s8): long long int
typeof(x.s9): long long int
typeof(x.s10): long long int
typeof(x.s11): long long int
typeof(x.s12): long long int
typeof(x.s13): long long int
typeof(x.s14): long long int
typeof(x.s15): long long int
typeof(x.s16): long long int
typeof(x.s17): long long int
typeof(x.s18): long long int
typeof(x.s19): long long int
typeof(x.s20): long long int
typeof(x.s21): long long int
typeof(x.s22): long long int
typeof(x.s23): long long int
typeof(x.s24): long long int
typeof(x.s25): long long int
typeof(x.s26): long long int
typeof(x.s27): long long int
typeof(x.s28): long long int
typeof(x.s29): long long int
typeof(x.s30): long long int
typeof(x.s31): long long int
typeof(x.s32): long long int
typeof(x.s33): long long int
typeof(x.s34): long long int
typeof(x.s35): long long int
typeof(x.s36): long long int
typeof(x.s37): long long int
typeof(x.s38): long long int
typeof(x.s39): long long int
typeof(x.s40): long long int
typeof(x.s41): long long int
typeof(x.s42): long long int
typeof(x.s43): long long int
typeof(x.s44): long long int
typeof(x.s45): long long int
typeof(x.s46): long long int
typeof(x.s47): long long int
typeof(x.s48): long long int
typeof(x.s49): long long int
typeof(x.s50): long long int
typeof(x.s51): long long int
typeof(x.s52): long long int
typeof(x.s53): long long int
typeof(x.s54): long long int
typeof(x.s55): long long int
typeof(x.s56): long long int
typeof(x.s57): long long int
typeof(x.s58): long long int
typeof(x.s59): long long int
typeof(x.s60): long long int
typeof(x.s61): long long int
typeof(x.s62): long long int
typeof(x.s63): long long int
typeof(x.s64): long long int
typeof(x.u1): unsigned long long int
typeof(x.u2): unsigned long long int
typeof(x.u3): unsigned long long int
typeof(x.u4): unsigned long long int
typeof(x.u5): unsigned long long int
typeof(x.u6): unsigned long long int
typeof(x.u7): unsigned long long int
typeof(x.u8): unsigned long long int
typeof(x.u9): unsigned long long int
typeof(x.u10): unsigned long long int
typeof(x.u11): unsigned long long int
typeof(x.u12): unsigned long long int
typeof(x.u13): unsigned long long int
typeof(x.u14): unsigned long long int
typeof(x.u15): unsigned long long int
typeof(x.u16): unsigned long long int
typeof(x.u17): unsigned long long int
typeof(x.u18): unsigned long long int
typeof(x.u19): unsigned long long int
typeof(x.u20): unsigned long long int
typeof(x.u21): unsigned long long int
typeof(x.u22): unsigned long long int
typeof(x.u23): unsigned long long int
typeof(x.u24): unsigned long long int
typeof(x.u25): unsigned long long int
typeof(x.u26): unsigned long long int
typeof(x.u27): unsigned long long int
typeof(x.u28): unsigned long long int
typeof(x.u29): unsigned long long int
typeof(x.u30): unsigned long long int
typeof(x.u31): unsigned long long int
typeof(x.u32): unsigned long long int
typeof(x.u33): unsigned long long int
typeof(x.u34): unsigned long long int
typeof(x.u35): unsigned long long int
typeof(x.u36): unsigned long long int
typeof(x.u37): unsigned long long int
typeof(x.u38): unsigned long long int
typeof(x.u39): unsigned long long int
typeof(x.u40): unsigned long long int
typeof(x.u41): unsigned long long int
typeof(x.u42): unsigned long long int
typeof(x.u43): unsigned long long int
typeof(x.u44): unsigned long long int
typeof(x.u45): unsigned long long int
typeof(x.u46): unsigned long long int
typeof(x.u47): unsigned long long int
typeof(x.u48): unsigned long long int
typeof(x.u49): unsigned long long int
typeof(x.u50): unsigned long long int
typeof(x.u51): unsigned long long int
typeof(x.u52): unsigned long long int
typeof(x.u53): unsigned long long int
typeof(x.u54): unsigned long long int
typeof(x.u55): unsigned long long int
typeof(x.u56): unsigned long long int
typeof(x.u57): unsigned long long int
typeof(x.u58): unsigned long long int
typeof(x.u59): unsigned long long int
typeof(x.u60): unsigned long long int
typeof(x.u61): unsigned long long int
typeof(x.u62): unsigned long long int
typeof(x.u63): unsigned long long int
typeof(x.u64): unsigned long long int

All the bit-fields appear to have the declared type rather than a type specific to the declared width.

Here is the program's output when compiled with 64-bit gcc:

typestr(x.s1): other
typestr(x.s2): other
typestr(x.s3): other
typestr(x.s4): other
typestr(x.s5): other
typestr(x.s6): other
typestr(x.s7): other
typestr(x.s8): signed char
typestr(x.s9): other
typestr(x.s10): other
typestr(x.s11): other
typestr(x.s12): other
typestr(x.s13): other
typestr(x.s14): other
typestr(x.s15): other
typestr(x.s16): short
typestr(x.s17): other
typestr(x.s18): other
typestr(x.s19): other
typestr(x.s20): other
typestr(x.s21): other
typestr(x.s22): other
typestr(x.s23): other
typestr(x.s24): other
typestr(x.s25): other
typestr(x.s26): other
typestr(x.s27): other
typestr(x.s28): other
typestr(x.s29): other
typestr(x.s30): other
typestr(x.s31): other
typestr(x.s32): int
typestr(x.s33): other
typestr(x.s34): other
typestr(x.s35): other
typestr(x.s36): other
typestr(x.s37): other
typestr(x.s38): other
typestr(x.s39): other
typestr(x.s40): other
typestr(x.s41): other
typestr(x.s42): other
typestr(x.s43): other
typestr(x.s44): other
typestr(x.s45): other
typestr(x.s46): other
typestr(x.s47): other
typestr(x.s48): other
typestr(x.s49): other
typestr(x.s50): other
typestr(x.s51): other
typestr(x.s52): other
typestr(x.s53): other
typestr(x.s54): other
typestr(x.s55): other
typestr(x.s56): other
typestr(x.s57): other
typestr(x.s58): other
typestr(x.s59): other
typestr(x.s60): other
typestr(x.s61): other
typestr(x.s62): other
typestr(x.s63): other
typestr(x.s64): long long int
typestr(x.u1): other
typestr(x.u2): other
typestr(x.u3): other
typestr(x.u4): other
typestr(x.u5): other
typestr(x.u6): other
typestr(x.u7): other
typestr(x.u8): unsigned char
typestr(x.u9): other
typestr(x.u10): other
typestr(x.u11): other
typestr(x.u12): other
typestr(x.u13): other
typestr(x.u14): other
typestr(x.u15): other
typestr(x.u16): unsigned short
typestr(x.u17): other
typestr(x.u18): other
typestr(x.u19): other
typestr(x.u20): other
typestr(x.u21): other
typestr(x.u22): other
typestr(x.u23): other
typestr(x.u24): other
typestr(x.u25): other
typestr(x.u26): other
typestr(x.u27): other
typestr(x.u28): other
typestr(x.u29): other
typestr(x.u30): other
typestr(x.u31): other
typestr(x.u32): unsigned int
typestr(x.u33): other
typestr(x.u34): other
typestr(x.u35): other
typestr(x.u36): other
typestr(x.u37): other
typestr(x.u38): other
typestr(x.u39): other
typestr(x.u40): other
typestr(x.u41): other
typestr(x.u42): other
typestr(x.u43): other
typestr(x.u44): other
typestr(x.u45): other
typestr(x.u46): other
typestr(x.u47): other
typestr(x.u48): other
typestr(x.u49): other
typestr(x.u50): other
typestr(x.u51): other
typestr(x.u52): other
typestr(x.u53): other
typestr(x.u54): other
typestr(x.u55): other
typestr(x.u56): other
typestr(x.u57): other
typestr(x.u58): other
typestr(x.u59): other
typestr(x.u60): other
typestr(x.u61): other
typestr(x.u62): other
typestr(x.u63): other
typestr(x.u64): unsigned long long int

This is consistent with each width having a different type.

The expression E1 << E2 has the type of the promoted left operand, so any width less than INT_WIDTH is promoted to int via integer promotion, and any width greater than INT_WIDTH is left alone. The result of the expression should indeed be truncated to the width of the bit-field if this width is greater than INT_WIDTH. More precisely, it should be truncated for an unsigned type, and it might be implementation-defined for signed types.
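The narrow case can be checked directly. The following sketch assumes a C11 compiler; the names struct narrow, IS_INT and shift_narrow_field are made up for the illustration. It shows a 3-bit unsigned int bit-field being promoted to int before the shift, so the result is not reduced to 3 bits.

```c
#include <assert.h>
#include <limits.h>

struct narrow {
    unsigned int u3 : 3;   /* narrower than int, so it promotes to int */
};

/* Evaluates to 1 if the expression has type int, 0 otherwise. */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* Hypothetical helper: shifts a 3-bit field left by `count` bits. */
int shift_narrow_field(unsigned value, int count) {
    struct narrow s;
    assert(value <= 7u);   /* must fit in 3 bits */
    /* Keep 7 << count within the range of int. */
    assert(count >= 0 && count < (int)(sizeof(int) * CHAR_BIT) - 4);
    s.u3 = value;
    return s.u3 << count;  /* s.u3 is promoted to int before shifting */
}
```

Here shift_narrow_field(7u, 1) returns 14 rather than 6 (14 mod 8), and IS_INT(s.u3 << 1) is 1 for a struct narrow s, whereas IS_INT(7u << 1) is 0 because an unsigned int operand stays unsigned int.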

The same should occur for E1 + E2 and other arithmetic operators if E1 or E2 are bit-fields with a width greater than that of int. The operand with the smaller width is converted to the type with the larger width, and the result has that type too. This very counterintuitive behavior, which causes many unexpected results, may be the reason for the widespread belief that bit-fields are bogus and should be avoided.

Many compilers do not seem to follow this interpretation of the C Standard, nor is this interpretation obvious from the current wording. It would be useful to clarify the semantics of arithmetic operations involving bit-field operands in a future version of the C Standard.

— chqrlie

I think the key term is "integer promotions". The discussion of bit-fields with integer promotions (C11 §6.3.1.1: "If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions."; see also §6.3.1.8 and §6.7.2.1) does not cover the case where the width of a bit-field is wider than an int.

— Jonathan Leffler

It does not help that the standard leaves undefined (at best implementation-defined) which types are allowed for bit-fields other than int, unsigned int and _Bool.

— Jonathan Leffler

"any width less than 32", "any width greater than 32" and "if this width is greater than 32" should presumably reflect the number of bits in plain int rather than a fixed 32.

— Ben Voigt

I agree that there is a problem (an oversight) in the C standard. One could argue that since the standard does not sanction the use of uint64_t bit-fields, the standard does not have to say anything about them: that should be covered by the implementation's documentation of the implementation-defined parts of the behavior of bit-fields. In particular, just because the 52 bits of the bit-field do not fit in a (32-bit) int, it should not mean that they are squeezed into a 32-bit unsigned int, but that is what a literal reading of 6.3.1.1 says.

— Jonathan Leffler

Also, if C++ has resolved the "big bit-field" issues explicitly, then C should follow that lead as closely as possible, unless there is something inherently C++-specific about that resolution (which is unlikely).

— Jonathan Leffler

The problem seems to be specific to gcc's 32-bit code generator in C mode:

You can compare the assembly code using Godbolt's Compiler Explorer.

Here is the source code for this test:

#include <stdint.h>

typedef union control {
    uint64_t q;
    struct {
        uint64_t a: 1;
        uint64_t b: 1;
        uint64_t c: 1;
        uint64_t d: 1;
        uint64_t e: 1;
        uint64_t f: 1;
        uint64_t g: 4;
        uint64_t h: 1;
        uint64_t i: 1;
        uint64_t p52: 52;
    } b;
} control_t;

uint64_t test(control_t ctl) {
    return ctl.b.p52 << 12;
}

Output in C mode (flags -xc -O2 -m32):

test:
        push    esi
        push    ebx
        mov     ebx, DWORD PTR [esp+16]
        mov     ecx, DWORD PTR [esp+12]
        mov     esi, ebx
        shr     ebx, 12
        shr     ecx, 12
        sal     esi, 20
        mov     edx, ebx
        pop     ebx
        or      esi, ecx
        mov     eax, esi
        shld    edx, esi, 12
        pop     esi
        sal     eax, 12
        and     edx, 1048575
        ret

The problem is the last instruction, and edx, 1048575, which clips the 12 most significant bits.
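To see why that mask amounts to a 52-bit truncation, the edx:eax register pair can be modelled in plain C. This is a sketch with a hypothetical helper name (mask_high_half); it relies only on the fact that 32-bit x86 returns a 64-bit value with the high half in edx and the low half in eax.

```c
#include <assert.h>
#include <stdint.h>

/* 1048575 is 2^20 - 1, i.e. 0xFFFFF: twenty one-bits. */
static_assert(UINT32_C(1048575) == (UINT32_C(1) << 20) - 1,
              "1048575 is 2^20 - 1");

/* Models the edx:eax pair holding a 64-bit return value on 32-bit x86,
   with `and edx, 1048575` applied to the high half only. */
uint64_t mask_high_half(uint64_t value) {
    uint32_t eax = (uint32_t)value;          /* low 32 bits, left untouched */
    uint32_t edx = (uint32_t)(value >> 32);  /* high 32 bits */
    edx &= UINT32_C(1048575);                /* keeps 20 of the 32 high bits */
    return ((uint64_t)edx << 32) | eax;      /* 20 + 32 = 52 bits survive */
}
```

Applied to 0xfffffffd80236000, the model yields 0x000ffffd80236000, which is the value truncated to 52 bits.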

The output in C++ mode is identical, except that the final and instruction is missing:

test(control):
        push    esi
        push    ebx
        mov     ebx, DWORD PTR [esp+16]
        mov     ecx, DWORD PTR [esp+12]
        mov     esi, ebx
        shr     ebx, 12
        shr     ecx, 12
        sal     esi, 20
        mov     edx, ebx
        pop     ebx
        or      esi, ecx
        mov     eax, esi
        shld    edx, esi, 12
        pop     esi
        sal     eax, 12
        ret

The output in 64-bit mode is much simpler and correct, yet it differs between the C and C++ compilers:

#C code:
test:
        movabs  rax, 4503599627366400
        and     rax, rdi
        ret

# C++ code:
test(control):
        mov     rax, rdi
        and     rax, -4096
        ret
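The two constants in these listings can be decoded with a few lines of C. The macro and function names below (C_MASK, CPP_MASK, apply_c_mask, apply_cpp_mask) are made up for this sketch, which only restates the masks from the assembly above.

```c
#include <assert.h>
#include <stdint.h>

#define C_MASK   UINT64_C(4503599627366400)   /* movabs rax, ... in the C output */
#define CPP_MASK ((uint64_t)INT64_C(-4096))   /* and rax, -4096 in the C++ output */

/* C mode: bits 12..51 are kept, so the shifted value is cut to 52 bits. */
static_assert(C_MASK == (UINT64_C(1) << 52) - (UINT64_C(1) << 12),
              "C mask is 2^52 - 2^12");

/* C++ mode: bits 12..63 are kept, so all 64 bits of the result survive. */
static_assert(CPP_MASK == UINT64_C(0xFFFFFFFFFFFFF000),
              "C++ mask clears only the low 12 bits");

/* q is the raw 64-bit contents of the union (ctl.q in the question). */
uint64_t apply_c_mask(uint64_t q)   { return q & C_MASK; }
uint64_t apply_cpp_mask(uint64_t q) { return q & CPP_MASK; }
```

For q = 0xfffffffd80236000, apply_c_mask gives 0x000ffffd80236000 and apply_cpp_mask gives 0xfffffffd80236000, matching the discrepancy reported in the question.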

You should file a bug report on the gcc bug tracker.

— chqrlie

My experiments were only for 64-bit targets, but your 32-bit case is even more bizarre. I guess a bug report is due. First, I need to recheck it on the latest GCC version available to me.

— Grigory Rechistov

@GrigoryRechistov Given the wording of the C standard, the bug may well be the 64-bit target failing to truncate the result to 52 bits. I personally would view it that way.

— Andrew Henle