Processing two files with awk

I'm reading Comparing two files using Unix and Awk. It's really interesting. I have read and tested it, but I can't fully understand it or apply it to other cases.

I have two files. file1 has one field and file2 has 16 fields. I want to read the elements of file1 and compare them with the 3rd field of file2. For each element that has a match, I want to sum field 5 of file2. As an example:

file1

1
2
3

file2

For element 1 in file1, I want to add up the values in field 5 of file2 on every line where field 3 equals 1, and then do the same for elements 2 and 3 of file1. The output for 1 is 7 (3 + 4), for 2 it is 2, and for 3 it is 4.

I don't know how to write this in awk.

— user55340

Here's one way. I wrote it as an awk script so that I could add comments:

#!/usr/local/bin/awk -f

{
    ## FNR is the line number of the current file, NR is the number of 
    ## lines that have been processed. If you only give one file to
    ## awk, FNR will always equal NR. If you give more than one file,
    ## FNR will go back to 1 when the next file is reached but NR
    ## will continue incrementing. Therefore, NR == FNR only while
    ## the first file is being processed.
    if(NR == FNR){
      ## If this is the first file, save the values of $1
      ## in the array n.
      n[$1] = 0
    }
    ## If we have moved on to the 2nd file
    else{
      ## If the 3rd field of the second file exists in
      ## the first file.
      if($3 in n){
        ## Add the value of the 5th field to the corresponding value
        ## of the n array.
        n[$3]+=$5
      }
    }
}
## The END{} block is executed after all files have been processed.
## This is useful since you may have more than one line whose 3rd
## field was specified in the first file so you don't want to print
## as you process the files.
END{
    ## For each element in the n array
    for (i in n){
    ## print the element itself and then its value
    print i,":",n[i];
    }
}

You can save this as a file (foo.awk here), make it executable, and run it like this:

$ chmod a+x foo.awk
$ ./foo.awk file1 file2
1 : 7
2 : 2
3 : 4
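To see the NR == FNR test from the comments in isolation, here is a small sketch that prints FILENAME, NR and FNR for every line of two throwaway files. The names a.txt, b.txt and nr_fnr_demo are hypothetical, and it assumes mktemp -d is available (as on Linux and macOS). One known pitfall of this idiom: if the first file is empty, NR == FNR stays true while the second file is read, so the second file would be treated as the first.

```shell
#!/bin/sh
# Shows how NR and FNR behave across two input files.
# a.txt and b.txt are hypothetical throwaway files in a temp directory.
nr_fnr_demo() {
    tmp=$(mktemp -d) || return 1
    printf 'x\ny\n' > "$tmp/a.txt"   # first file: 2 lines
    printf 'z\n'    > "$tmp/b.txt"   # second file: 1 line
    # cd into the temp dir so FILENAME prints as a bare file name
    ( cd "$tmp" &&
      awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first" : "later") }' a.txt b.txt )
    rm -r "$tmp"
}

nr_fnr_demo
# a.txt 1 1 first
# a.txt 2 2 first
# b.txt 3 1 later
```

FNR drops back to 1 when b.txt starts while NR keeps counting, which is exactly why NR == FNR only holds for the first file.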

Or you can condense it into a one-liner:

awk '
     (NR == FNR){n[$1] = 0; next}
     {if($3 in n){n[$3]+=$5}}
     END{for (i in n){print i,":",n[i]} }' file1 file2
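Note that for (i in n) visits array elements in an unspecified order (POSIX leaves it to the implementation), so the 1, 2, 3 order shown above is not guaranteed. Here is a minimal sketch of a variant that keeps file1's order by recording the keys in a second array; order, count and sum_in_file1_order are names made up for illustration, and the sample file2 is a hypothetical, trimmed-down one with five fields instead of 16. It also uses the pattern $3 in n instead of an if inside the action.

```shell
#!/bin/sh
# Variant of the script above that prints results in file1's order.
# order[], count and sum_in_file1_order are hypothetical names.
sum_in_file1_order() {
    awk '
        NR == FNR {
            # first file (file1): remember each key once, in order
            if (!($1 in n)) { order[++count] = $1; n[$1] = 0 }
            next
        }
        # second file (file2): add field 5 when field 3 is a known key
        $3 in n { n[$3] += $5 }
        END {
            # walk the keys in the order they appeared in file1
            for (k = 1; k <= count; k++) {
                print order[k], ":", n[order[k]]
            }
        }
    ' "$1" "$2"
}

# Hypothetical sample data (5 fields per file2 line instead of 16).
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/file1"
printf '%s\n' 'a b 1 x 3' 'a b 2 x 2' 'a b 1 x 4' 'a b 3 x 4' 'a b 9 x 5' > "$tmp/file2"
sum_in_file1_order "$tmp/file1" "$tmp/file2"
# 1 : 7
# 2 : 2
# 3 : 4
rm -r "$tmp"
```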

— terdon

awk '
  NR == FNR {n[$3] += $5; next}
  {print $1 ": " n[$1]}' file2 file1

— Stéphane Chazelas
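This version reads file2 first, sums field 5 for every field-3 key, and then prints one line per line of file1, so the output follows file1's order. One side effect: a file1 value that never appears in field 3 of file2 prints with an empty sum (e.g. "4: "). A hedged sketch of the same one-liner wrapped in a hypothetical function sum_file2_first, with one change of my own: adding 0 forces a numeric result, so such keys print 0. The sample data is hypothetical.

```shell
#!/bin/sh
# The one-liner above wrapped in a function, with one hypothetical change:
# "+ 0" so a key missing from file2 prints 0 instead of an empty string.
sum_file2_first() {
    # $1 = file1 (keys), $2 = file2 (data); awk reads file2 first.
    awk '
        NR == FNR { n[$3] += $5; next }   # file2: sum field 5 for every field-3 key
        { print $1 ": " (n[$1] + 0) }     # file1: print the sum for each key, in order
    ' "$2" "$1"
}

tmp=$(mktemp -d)
printf '1\n2\n3\n4\n' > "$tmp/file1"   # 4 does not occur in field 3 of file2
printf '%s\n' 'a b 1 x 3' 'a b 2 x 2' 'a b 1 x 4' 'a b 3 x 4' 'a b 9 x 5' > "$tmp/file2"
sum_file2_first "$tmp/file1" "$tmp/file2"
# 1: 7
# 2: 2
# 3: 4
# 4: 0
rm -r "$tmp"
```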

This does extra work by also summing the fields that don't match.

— Emmanuel
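To make this trade-off concrete, here is a rough sketch that counts how many keys each approach ends up storing in the array n. count_keys is a hypothetical helper, and the sample data is hypothetical, with one file2 row whose key (9) is not in file1.

```shell
#!/bin/sh
# Counts the keys each approach keeps in its array n.
# count_keys is a hypothetical helper; $1 = file1, $2 = file2.
count_keys() {
    # terdon's approach: only keys listed in file1 are stored
    a=$(awk 'NR == FNR { n[$1] = 0; next }
             $3 in n   { n[$3] += $5 }
             END { c = 0; for (k in n) c++; print c }' "$1" "$2")
    # Stephane's approach: every distinct field-3 value of file2 is stored
    b=$(awk 'NR == FNR { n[$3] += $5; next }
             END { c = 0; for (k in n) c++; print c }' "$2" "$1")
    echo "$a $b"
}

tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/file1"
printf '%s\n' 'a b 1 x 3' 'a b 2 x 2' 'a b 1 x 4' 'a b 3 x 4' 'a b 9 x 5' > "$tmp/file2"
count_keys "$tmp/file1" "$tmp/file2"
# 3 4
rm -r "$tmp"
```

The extra key (9) is the extra work mentioned above; how much it matters in practice depends on how many non-matching rows file2 contains.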

@Emmanuel, it's still just one instruction per line of file2, which makes it shorter and faster than terdon's.

— Stéphane Chazelas

Nice solution!

— Ronald Pauffert