Trivial loop not vectorized by gcc 4.8.5

Question

I am trying to learn more about auto-vectorization in GCC. In my project I have to use gcc 4.8.5, and I see that several of my loops are not vectorized. So I created a small example to experiment with and understand why.

What I want to know is why gcc does not vectorize this loop and how I can get it vectorized. Unfortunately, I am not very familiar with GCC's diagnostic output.

a) I would expect this loop to be vectorized, since it is a trivial case.

b) Is there something trivial that I am missing?

Thank you all very much in advance.

A small example:

#include <iostream>
#include <vector>

using namespace std;

class test
{

public:
    test();
    ~test();
    void calc_test();
};

test::test()
{
}

test::~test()
{
}

void
test::calc_test(void)
{
vector<int> ffs_psd(10000,5.0);
vector<int> G_qh_sp(10000,1.0);
vector<int> G_qv_sp(10000,3.0);
vector<int> B_erm_qh(10000,50.0);
vector<int> B_erm_qv(10000,2.0);


for ( uint ang=0; ang < 6808; ang++)
{
   ffs_psd[0] += (G_qh_sp[ang] * B_erm_qh[ang])  +  (G_qv_sp[ang] * B_erm_qv[ang]);      
}

}

int main(int argc, char * argv[])
{
  test m_test;
  m_test.calc_test();
}

I compile it with GCC 4.8.5:

c++ -O3 -ftree-vectorize -fopt-info-vec-missed -ftree-vectorizer-verbose=5 -std=c++11 test.cpp

The output I get from the compiler:

test.cpp:34: note: ===vect_slp_analyze_bb===

test.cpp:34: note: === vect_analyze_data_refs ===

test.cpp:34: note: get vectype with 4 units of type value_type
test.cpp:34: note: vectype: vector(4) int
test.cpp:34: note: get vectype with 4 units of type value_type
test.cpp:34: note: vectype: vector(4) int
test.cpp:34: note: get vectype with 4 units of type value_type
test.cpp:34: note: vectype: vector(4) int
test.cpp:34: note: get vectype with 4 units of type value_type
test.cpp:34: note: vectype: vector(4) int
test.cpp:34: note: get vectype with 4 units of type value_type
test.cpp:34: note: vectype: vector(4) int
test.cpp:34: note: === vect_pattern_recog ===
test.cpp:34: note: vect_is_simple_use: operand _27
test.cpp:34: note: def_stmt: _27 = (long unsigned int) ang_212;

test.cpp:34: note: type of def: 3.
test.cpp:34: note: vect_is_simple_use: operand ang_212
test.cpp:34: note: def_stmt: ang_212 = PHI <ang_43(78), 0(76)>

test.cpp:34: note: type of def: 2.
test.cpp:34: note: vect_is_simple_use: operand 4
test.cpp:34: note: vect_recog_widen_mult_pattern: detected: 
test.cpp:34: note: get vectype with 4 units of type uint
test.cpp:34: note: vectype: vector(4) unsigned int
test.cpp:34: note: get vectype with 2 units of type long unsigned int
test.cpp:34: note: vectype: vector(2) long unsigned int
test.cpp:34: note: patt_2 = ang_212 w* 4;

test.cpp:34: note: pattern recognized: patt_2 = ang_212 w* 4;

test.cpp:34: note: vect_is_simple_use: operand _29
test.cpp:34: note: def_stmt: _29 = *_67;

test.cpp:34: note: type of def: 3.
test.cpp:34: note: vect_is_simple_use: operand _34
test.cpp:34: note: def_stmt: _34 = *_69;

test.cpp:34: note: type of def: 3.
test.cpp:34: note: === vect_analyze_dependences ===
test.cpp:34: note: can't determine dependence between *_67 and MEM[(value_type &)__first_111]
test.cpp:34: note: can't determine dependence between *_68 and MEM[(value_type &)__first_111]
test.cpp:34: note: can't determine dependence between *_69 and MEM[(value_type &)__first_111]
test.cpp:34: note: can't determine dependence between *_70 and MEM[(value_type &)__first_111]
test.cpp:34: note: === vect_analyze_data_refs_alignment ===
test.cpp:34: note: vect_compute_data_ref_alignment:
test.cpp:34: note: SLP: step doesn't divide the vector-size.
test.cpp:34: note: Unknown alignment for access: *__first_125
test.cpp:34: note: vect_compute_data_ref_alignment:
test.cpp:34: note: SLP: step doesn't divide the vector-size.
test.cpp:34: note: Unknown alignment for access: *__first_153
test.cpp:34: note: vect_compute_data_ref_alignment:
test.cpp:34: note: SLP: step doesn't divide the vector-size.
test.cpp:34: note: Unknown alignment for access: *__first_139
test.cpp:34: note: vect_compute_data_ref_alignment:
test.cpp:34: note: SLP: step doesn't divide the vector-size.
test.cpp:34: note: Unknown alignment for access: *__first_167
test.cpp:34: note: vect_compute_data_ref_alignment:
test.cpp:34: note: can't force alignment of ref: MEM[(value_type &)__first_111]
test.cpp:34: note: === vect_analyze_data_ref_accesses ===
test.cpp:34: note: not consecutive access MEM[(value_type &)__first_111] = _41;

test.cpp:34: note: === vect_analyze_slp ===
test.cpp:34: note: Failed to SLP the basic block.
test.cpp:34: note: not vectorized: failed to find SLP opportunities in basic block.

EDIT: After Matt's answer below:

@Matt:

Thank you very much for the answer. I did not know that a vector's storage is not guaranteed to be aligned. That information is very useful, because many people assume a loop will be vectorized even when they use a vector as the container.

Unfortunately, even with your changes, gcc still reports that the loop is not vectorized (this time with different messages):

test.cpp:47: note: misalign = 0 bytes of ref MEM[(value_type &)&ffs_psd]
test.cpp:47: note: not consecutive access _25 = MEM[(value_type &)&ffs_psd];

test.cpp:47: note: Failed to SLP the basic block.
test.cpp:47: note: not vectorized: failed to find SLP opportunities in basic block.

test.cpp:47: note: misalign = 0 bytes of ref MEM[(value_type &)&ffs_psd]
test.cpp:47: note: not consecutive access _25 = MEM[(value_type &)&ffs_psd];

test.cpp:47: note: Failed to SLP the basic block.
test.cpp:47: note: not vectorized: failed to find SLP opportunities in basic block.

The resulting assembly (I hope I copied the right section, because my assembly knowledge is not very good):

.L16:
vmovdqa 40000(%rsp,%rax), %ymm1
vmovdqa 80000(%rsp,%rax), %ymm0
vpmulld 120000(%rsp,%rax), %ymm1, %ymm1
vpmulld 160000(%rsp,%rax), %ymm0, %ymm0
vpaddd  %ymm0, %ymm1, %ymm0
vpaddd  (%rsp,%rax), %ymm0, %ymm0
vmovdqa %ymm0, (%rsp,%rax)
addq    $32, %rax
cmpq    $27232, %rax
jne .L16

c++ gcc auto-vectorization

user7075711, Nov 1 '18 at 15:39

1 answer

Solution

user2630666, Nov 2 '18 at 04:59 · Accepted Answer

To use the aligned vector load and store instructions (such as the vmovdqa seen below), the operands must be aligned to the appropriate boundaries, for example with __attribute__((aligned(32))) or __attribute__((aligned(16))). The default allocator for std::vector does not guarantee such alignment, even if the class itself is aligned. For example, std::vector<__m64> A creates a vector of a SIMD data type, but its elements may not be aligned, because std::allocator does not align everything. In my opinion, the simplest change is to use std::array with __attribute__((aligned(32))):

#include <iostream>
#include <array>

using namespace std;

int main()
{
    array<int, 10000> ffs_psd __attribute__((aligned(32)));
    ffs_psd.fill(5);
    array<int, 10000> G_qh_sp __attribute__((aligned(32)));
    G_qh_sp.fill(1);
    array<int, 10000> G_qv_sp __attribute__((aligned(32)));
    G_qv_sp.fill(3);
    array<int, 10000> B_erm_qh __attribute__((aligned(32)));
    B_erm_qh.fill(50);
    array<int, 10000> B_erm_qv __attribute__((aligned(32)));
    B_erm_qv.fill(2);


    for ( uint ang=0; ang < 6808; ang++)
    {
        ffs_psd[0] += (G_qh_sp[ang] * B_erm_qh[ang])  +  (G_qv_sp[ang] * B_erm_qv[ang]);      
    }
    cout << ffs_psd[0] << endl;
}

The loop compiles to:

vmovdqa ymm2, YMMWORD PTR [rsp+40000+rax]
vmovdqa ymm1, YMMWORD PTR [rsp+80000+rax]
vpmulld ymm2, ymm2, YMMWORD PTR [rsp+120000+rax]
vpmulld ymm1, ymm1, YMMWORD PTR [rsp+160000+rax]
add     rax, 32
vpaddd  ymm1, ymm2, ymm1
cmp     rax, 27232
vpaddd  ymm0, ymm0, ymm1
jne     .L13
vmovdqa xmm1, xmm0

on Godbolt with GCC 4.8.3 and the flags -std=c++11 -Wall -Wextra -pedantic-errors -O2 -ftree-vectorize -march=native.
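
In the assembly above, the running total is kept in the register ymm0 inside the loop instead of being written back to ffs_psd[0] on every iteration. The log in the question also reports "can't determine dependence" between the input loads and the store to the accumulator element. A minimal sketch, assuming that accumulating into a local variable gives the compiler the same freedom in the std::vector version; the helper weighted_sum and its parameter names are hypothetical, and whether GCC 4.8.5 actually vectorizes it has not been verified here:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the original post.
// Sums G_qh_sp[i] * B_erm_qh[i] + G_qv_sp[i] * B_erm_qv[i] for i in [0, count).
// The caller must make sure all four vectors hold at least `count` elements.
int weighted_sum(const std::vector<int>& G_qh_sp,
                 const std::vector<int>& B_erm_qh,
                 const std::vector<int>& G_qv_sp,
                 const std::vector<int>& B_erm_qv,
                 std::size_t count)
{
    // A local accumulator cannot alias the vectors' heap storage,
    // unlike a store to ffs_psd[0] inside the loop.
    int sum = 0;
    for (std::size_t ang = 0; ang < count; ++ang)
    {
        sum += (G_qh_sp[ang] * B_erm_qh[ang]) + (G_qv_sp[ang] * B_erm_qv[ang]);
    }
    return sum;
}
```

With the question's data (every element of G_qh_sp is 1, B_erm_qh is 50, G_qv_sp is 3, B_erm_qv is 2, and count is 6808), each iteration adds 56, so the function returns 381248. The caller would then write ffs_psd[0] += weighted_sum(...) once, after the loop.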

Another option is to use boost::alignment::aligned_allocator with your vector.

Finally, you can write your own allocator for vector to use, so that its storage is properly aligned. Here is an article explaining the allocator requirements. Also, here is a question about the same underlying issue.
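
A minimal sketch of such an allocator, assuming a POSIX system (it relies on posix_memalign) and C++11. The names aligned_allocator, is_aligned, aligned_int_vector and aligned_weighted_sum are hypothetical and do not come from the original answer. The allocator only guarantees that the vector's buffer starts on the requested boundary; it does not by itself guarantee that the compiler vectorizes a loop over it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <stdlib.h>  // posix_memalign (POSIX, not ISO C++)
#include <new>       // std::bad_alloc
#include <vector>

// Hypothetical minimal C++11 allocator returning Alignment-byte aligned memory.
// Alignment must be a power of two and a multiple of sizeof(void*).
template <typename T, std::size_t Alignment>
struct aligned_allocator {
    typedef T value_type;

    aligned_allocator() noexcept {}
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Alignment>&) noexcept {}

    // Must be spelled out because Alignment is a non-type template parameter.
    template <typename U>
    struct rebind { typedef aligned_allocator<U, Alignment> other; };

    T* allocate(std::size_t n) {
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// The allocator is stateless, so any two instances compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return false; }

// True if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

typedef std::vector<int, aligned_allocator<int, 32> > aligned_int_vector;

// Same computation as the loop in the question, on 32-byte aligned vectors.
int aligned_weighted_sum(std::size_t count) {
    aligned_int_vector G_qh_sp(10000, 1), B_erm_qh(10000, 50);
    aligned_int_vector G_qv_sp(10000, 3), B_erm_qv(10000, 2);
    assert(count <= G_qh_sp.size());
    assert(is_aligned(G_qh_sp.data(), 32));

    int sum = 0;  // local accumulator, added to ffs_psd[0] by the caller
    for (std::size_t ang = 0; ang < count; ++ang)
        sum += (G_qh_sp[ang] * B_erm_qh[ang]) + (G_qv_sp[ang] * B_erm_qv[ang]);
    return sum;
}
```

The element type and the 32-byte alignment match the AVX2 registers used in the assembly above; for SSE-only targets 16 would be the corresponding value.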