How to filter an Apache access log using awk, sed or cut

Question

This is my Apache access log file. I want a count of each unique URL in the log.

"2011-09-07 17:00:00" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/"
"2011-09-07 17:00:17" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:21" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:16" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:29" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:22" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:38" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:44" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:06" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:14" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http
"2011-09-07 17:00:51" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:45" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:59" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:02:00" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:02:09" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:09" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/

The file above is only a sample; the real log file keeps growing.
Expected output

/abc/index.php/contentapi/discontent/  - 3  
/abc/index.php/data/dataContent/  - 3  
/abc/index.php/Api/ApiContent/ - 5  
/abc/index.php/site/siteContent/ - 6  
/abc/index.php/htmlrequest/htmlContent/ - 5

awk sed grep cut

user4923188 · May 21 '15 at 05:45

3 answers

user3277393 · May 21 '15 at 06:04 · Answer 1

I think there may be some typos in the Apache log, but how about this:

$ grep -o 'abc/[^ 0-9]*/' apache.log | sort | uniq -c | sort -rn
6 abc/index.php/site/siteContent/
5 abc/index.php/htmlrequest/htmlContent/
5 abc/index.php/Api/ApiContent/
3 abc/index.php/data/dataContent/
2 abc/index.php/contentapi/discontent/
1 abc/index.php/contentapi/
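
The [^ 0-9]* pattern keeps everything up to the last slash before the first space or digit, so the result depends on where the first digit happens to fall in the path (here, inside the ID 4fd590d1762eb). A minimal sketch that instead takes /abc plus exactly three more path segments, assuming every request path starts with /abc as in the question's sample; the function name count_prefixes is hypothetical:

```shell
#!/bin/sh
# Sketch: count each URL made of /abc plus the next three path segments.
# Unlike [^ 0-9]*, this does not depend on where the first digit appears.
# Assumption: every request path starts with /abc, as in the sample log.

count_prefixes() {
    # -o: print only the matched text; -E: extended regex, needed for {3}.
    # A segment is one or more characters other than "/", space or '"'.
    # sort -rn orders the uniq -c output by count, highest first.
    grep -oE '/abc(/[^/ "]+){3}/' | sort | uniq -c | sort -rn
}

# Demo on three lines taken from the question's sample log.
count_prefixes <<'EOF'
"2011-09-07 17:00:00" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/"
"2011-09-07 17:00:17" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
EOF
```

On the real file this would be run as count_prefixes < apache.log; for the demo lines it reports 2 for /abc/index.php/contentapi/discontent/ and 1 for /abc/index.php/site/siteContent/.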

user3790962 · May 21 '15 at 06:59 · Answer 2

This extracts the fourth field, which is taken to be the URL, and keeps its first four path segments:

awk '{print $4}' logfile | awk -F'/' '{print $2"/"$3"/"$4"/"$5}' | sort | uniq -c
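
The two awk calls in this pipeline can be folded into one. A minimal sketch, assuming (as the answer does) that the default whitespace split puts the request path in field 4; the function name url_prefix is hypothetical. Unlike the original pipeline, it skips lines whose path has fewer than four segments:

```shell
#!/bin/sh
# Sketch: a single awk process instead of awk | awk.
# Assumption: with default whitespace splitting, field 4 is the request
# path, exactly as in the answer above.

url_prefix() {
    awk '{
        # Split the path on "/"; p[1] is empty because the path starts with "/".
        n = split($4, p, "/")
        # Print the first four segments; skip shorter paths.
        if (n >= 5) print p[2] "/" p[3] "/" p[4] "/" p[5]
    }'
}

# Usage on a real file: url_prefix < logfile | sort | uniq -c
url_prefix <<'EOF' | sort | uniq -c
"2011-09-07 17:00:00" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:06" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
EOF
```

The printed keys have the same form as the answer's output, e.g. abc/index.php/data/dataContent with no leading or trailing slash.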

user1745001 · May 21 '15 at 12:38 · Answer 3

With GNU awk, for gensub():

$ awk '{cnt[gensub(/(([/][^/]+){4}[/]).*/,"\\1",1,$4)]++} END{for (url in cnt) print url " - " cnt[url]}' file
/abc/index.php/contentapi/discontent/ - 3
/abc/index.php/data/dataContent/ - 3
/abc/index.php/site/siteContent/ - 6
/abc/index.php/Api/ApiContent/ - 5
/abc/index.php/htmlrequest/htmlContent/ - 5
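
gensub() is a GNU awk extension, so the command above will not run under mawk or other awks that implement only POSIX features. A sketch of a portable variant under the same assumption (field 4 is the request path), using only split(); the function name url_report is hypothetical. It also sorts by count, which the for (url in cnt) loop does not guarantee:

```shell
#!/bin/sh
# Sketch: the same "URL - count" report as the gensub() answer,
# using only POSIX awk features (split() instead of gensub()).
# Assumption: field 4 is the request path, as in that answer.

url_report() {
    awk '{
        n = split($4, p, "/")   # p[1] is empty: the path starts with "/"
        if (n < 6) next         # need four segments followed by a slash
        cnt["/" p[2] "/" p[3] "/" p[4] "/" p[5] "/"]++
    }
    END {
        for (url in cnt) print cnt[url], url
    }' | sort -k1,1nr | awk '{ print $2 " - " $1 }'
}

# Usage on a real file: url_report < apache.log
url_report <<'EOF'
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
EOF
```

For the demo lines this prints /abc/index.php/site/siteContent/ - 2 followed by /abc/index.php/Api/ApiContent/ - 1. Note that a path with fewer than four segments is skipped here, whereas gensub() would leave it unchanged and count it under its full path.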
