Understanding Filebeat, the Log Collection Tool, in One Article

Editor: 樂樂 | Source: DevOps技術棧

Link: cnblogs.com/zsql/p/13137833.html

程式設計技術圈 (ID: study_tech), post no. 1167

This article uses Filebeat 7.7.0 and covers the following topics:

What Filebeat is and what it can be used for
How Filebeat works and what it is made of
How to use Filebeat

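What is Filebeat

Filebeat and Beats

Filebeat is a member of the Beats family.

Beats is a family of lightweight data shippers with six members. Early ELK architectures used Logstash to collect and parse logs, but Logstash consumes a considerable amount of memory, CPU, and I/O. Compared with Logstash, the CPU and memory footprint of Beats is almost negligible.

Beats currently includes six tools:

Packetbeat: network data (collects network traffic data)
Metricbeat: metrics (collects CPU and memory usage and similar data at the system, process, and file system level)
Filebeat: log files (collects file data)
Winlogbeat: Windows event logs (collects Windows event log data)
Auditbeat: audit data (collects audit logs)
Heartbeat: uptime monitoring (collects data about system uptime)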

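What Filebeat is

Filebeat is a lightweight shipper for forwarding and centralizing log data. Filebeat monitors the log files or locations that you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.

Filebeat works as follows: when you start Filebeat, it starts one or more inputs that look in the locations you have specified for log data. For each log that Filebeat finds, it starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output that you have configured for Filebeat.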
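The flow described above can be sketched in a few lines of Python. This is only an illustration of the input, harvester, and libbeat roles, not Filebeat's actual code: the function names harvest, publish, and run_input, the batch size, and the in-memory "log" are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List


def harvest(path: str, lines: Iterable[str]) -> Iterable[dict]:
    """Hypothetical harvester: turn each line of one log into an event."""
    for line in lines:
        yield {"source": path, "message": line.rstrip("\n")}


def publish(events: Iterable[dict], batch_size: int,
            output: Callable[[List[dict]], None]) -> None:
    """Stand-in for libbeat: aggregate events and hand batches to the output."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            output(batch)
            batch = []
    if batch:  # flush the final partial batch
        output(batch)


def run_input(logs: Dict[str, List[str]], batch_size: int,
              output: Callable[[List[dict]], None]) -> None:
    """Stand-in for an input: one harvester per log that was found."""
    for path, lines in logs.items():
        publish(harvest(path, lines), batch_size, output)


received: List[List[dict]] = []
run_input({"/var/log/app.log": ["a\n", "b\n", "c\n"]}, 2, received.append)
print([len(batch) for batch in received])  # prints [2, 1]
```

Three lines with a batch size of 2 reach the output as one full batch and one partial batch, which mirrors the idea that the output sees aggregated events rather than individual lines.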

[Figure: Filebeat workflow diagram]

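Filebeat and Logstash

Because Logstash runs on the JVM and consumes a lot of resources, its author later wrote logstash-forwarder in Golang, a lightweight tool with fewer features but a much smaller resource footprint. The author was working alone, however. After he joined Elastic (http://elastic.co), the company also acquired another open source project, Packetbeat, which was written in Golang and had a whole team behind it. Elastic therefore merged the development of logstash-forwarder into the same Golang team, and the new project was named Filebeat.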

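How Filebeat works

Components of Filebeat

Filebeat consists of two main components: inputs and harvesters. These components work together to tail files and send event data to the output that you specify.

A harvester is responsible for reading the content of a single file. It reads the file line by line and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running. If a file is removed or renamed while it is being harvested, Filebeat continues to read it. The side effect is that the space on disk stays reserved until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.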

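Closing a harvester has the following consequences:

The file handler is closed, freeing the underlying resources if the file was deleted while the harvester was still reading it.
Harvesting of the file starts again only after scan_frequency has elapsed.
If the file is moved or removed while the harvester is closed, harvesting of the file does not continue.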

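An input is responsible for managing the harvesters and finding all sources to read from. If the input type is log, the input finds all files on the drive that match the defined paths and starts a harvester for each file. Each input runs in its own goroutine. Filebeat currently supports multiple input types, and each input type can be defined multiple times. The log input checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored.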
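The scanning step of a log input can be approximated as follows. This is a minimal sketch, assuming that Python's glob behaves similarly enough to the glob matching described here for illustration; the scan function and the running set are hypothetical names, and the real input applies further checks (file age, stored state) that are left out.

```python
import glob
import os
import tempfile
from typing import List, Set


def scan(patterns: List[str], running: Set[str]) -> List[str]:
    """Return files matching the patterns that have no harvester yet."""
    to_start = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path) and path not in running:
                to_start.append(path)
    return to_start


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "app"))
    top = os.path.join(root, "top.log")
    nested = os.path.join(root, "app", "nested.log")
    for path in (top, nested):
        with open(path, "w") as handle:
            handle.write("hello\n")

    pattern = os.path.join(root, "*", "*.log")
    running: Set[str] = set()
    first = scan([pattern], running)
    running.update(first)            # pretend a harvester was started for each
    second = scan([pattern], running)

    found_nested_only = first == [nested]  # top.log is not one level down
    nothing_new = second == []             # harvester already running
```

The pattern with a single `*` directory component only reaches files exactly one level below the root, and a second scan starts nothing because every matching file already has a harvester.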

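How Filebeat keeps the state of files

Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester read from and to ensure that all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and continues reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory for each input. When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position.

For each input, Filebeat keeps a state of every file it finds. Because files can be renamed or moved, the file name and path are not enough to identify a file. For each file, Filebeat stores a unique identifier to detect whether the file was harvested previously.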
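To make the idea of a registry concrete, here is a small sketch of offset tracking that survives a rename and a simulated restart. It is an illustration under stated assumptions, not Filebeat's registry format: the JSON layout, the file name registry.json, and the use of device and inode numbers as the "unique identifier" are choices made for this example.

```python
import json
import os
import tempfile


def file_id(path: str) -> str:
    """Identify a file by device and inode rather than by name."""
    info = os.stat(path)
    return f"{info.st_dev}-{info.st_ino}"


def read_new_lines(path: str, registry: dict) -> list:
    """Read lines past the stored offset and record the new offset."""
    key = file_id(path)
    with open(path, "rb") as handle:
        handle.seek(registry.get(key, 0))
        lines = [raw.decode().rstrip("\n") for raw in handle.readlines()]
        registry[key] = handle.tell()
    return lines


with tempfile.TemporaryDirectory() as root:
    log = os.path.join(root, "app.log")
    registry_path = os.path.join(root, "registry.json")  # hypothetical name
    with open(log, "w") as handle:
        handle.write("line 1\nline 2\n")

    registry = {}
    first = read_new_lines(log, registry)
    with open(registry_path, "w") as handle:   # flush the state to disk
        json.dump(registry, handle)

    # Simulate a restart: the file is renamed and grows, the state is reloaded.
    renamed = os.path.join(root, "app.log.1")
    os.rename(log, renamed)
    with open(renamed, "a") as handle:
        handle.write("line 3\n")
    with open(registry_path) as handle:
        registry = json.load(handle)
    second = read_new_lines(renamed, registry)
```

Because the state is keyed by the file's identity instead of its path, the second read returns only the line appended after the rename, not the two lines that were already shipped.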

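How Filebeat ensures at-least-once delivery

Filebeat guarantees that events are delivered to the configured output at least once and with no data loss, because it stores the delivery state of each event in the registry file. When the defined output is blocked and has not confirmed all events, Filebeat keeps trying to send the events until the output acknowledges that it has received them. If Filebeat shuts down while it is sending events, it does not wait for the output to acknowledge all events before shutting down. When Filebeat is restarted, any events that were not acknowledged before the shutdown are sent to the output again. This ensures that each event is sent at least once, but duplicate events may end up being sent to the output. You can configure Filebeat to wait a specific amount of time before shutting down by setting the shutdown_timeout option.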
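The following sketch shows why acknowledgement-based resending yields at-least-once delivery with possible duplicates. The Output class, its drop_next_ack switch, and the ship function are hypothetical and exist only for this illustration; the count of acknowledged events stands in for the delivery state kept in the registry.

```python
from typing import List


class Output:
    """Hypothetical output that may lose the acknowledgement for an event."""

    def __init__(self) -> None:
        self.received: List[str] = []
        self.drop_next_ack = False

    def send(self, event: str) -> bool:
        self.received.append(event)       # the event always arrives
        if self.drop_next_ack:
            self.drop_next_ack = False
            return False                  # but its acknowledgement is lost
        return True


def ship(events: List[str], acked: int, output: Output) -> int:
    """Send events that come after the last acknowledged one.

    Returns the new number of acknowledged events and stops at the first
    missing acknowledgement, as if the shipper shut down at that moment.
    """
    for event in events[acked:]:
        if not output.send(event):
            break
        acked += 1
    return acked


events = ["e1", "e2", "e3"]
output = Output()

acked = ship(events[:1], 0, output)   # e1 delivered and acknowledged
output.drop_next_ack = True
acked = ship(events, acked, output)   # e2 delivered, ack lost, "shutdown"
acked = ship(events, acked, output)   # "restart": resend from stored state
print(output.received)                # prints ['e1', 'e2', 'e2', 'e3']
```

No event is lost, but e2 arrives twice because the shipper cannot tell a lost acknowledgement from a lost event, which is the trade-off the paragraph above describes.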

How to use Filebeat

Installing from the archive

This article installs the Linux version from the archive filebeat-7.7.0-linux-x86_64.tar.gz.

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz

Sample configuration file: filebeat.reference.yml (contains all non-deprecated configuration options)

Configuration file: filebeat.yml

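Basic commands

For details, see the official documentation: https://www.elastic.co/guide/en/beats/filebeat/current/command-line-options.html

export    # export
run       # run (the default command)
test      # test the configuration
keystore  # manage the secrets keystore
modules   # manage module configuration
setup     # set up the initial environment

For example, ./filebeat test config checks whether the configuration file is valid.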

Inputs and outputs

Supported input types:

Azure Event Hub, Cloud Foundry, Container, Docker, Google Pub/Sub, HTTP JSON, Kafka, Log, MQTT, NetFlow, Office 365 Management Activity API, Redis, S3, Stdin, Syslog, TCP, UDP (Log is the most commonly used)

Supported output types:

Elasticsearch, Logstash, Kafka, Redis, File, Console, Elastic Cloud (Elasticsearch and Logstash are the most commonly used)

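Using the keystore

The keystore protects sensitive information, such as passwords, from being exposed. For the Elasticsearch password, for example, you can create an entry whose key is ES_PWD and whose value is the Elasticsearch password, and then refer to the password as ${ES_PWD} wherever it is needed.

Create a keystore for storing secrets: filebeat keystore create
Add a key to it (the value is entered at the prompt): filebeat keystore add ES_PWD
Overwrite the value of an existing key: filebeat keystore add ES_PWD --force
Remove a key: filebeat keystore remove ES_PWD
List the existing keys: filebeat keystore list

The value can then be referenced as ${ES_PWD}, for example:

output.elasticsearch.password: "${ES_PWD}"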

Configuring filebeat.yml (using the log input type as an example)

For details, see the official documentation: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html

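type: log  # the input type is log
enabled: true  # enables this log input configuration
paths:  # log files to monitor; paths are expanded with Go's glob function and directories are not searched recursively. For example:
  - /var/log/*/*.log  # looks for files ending in ".log" in all subdirectories of /var/log, but not for ".log" files directly under /var/log
recursive_glob.enabled:  # enables recursive globbing, e.g. /foo/** covers /foo, /foo/*, /foo/*/*
encoding:  # encoding of the monitored files; both plain and utf-8 can handle Chinese logs
exclude_lines: ['^DBG']  # drops lines that match any of these regular expressions
include_lines: ['^ERR', '^WARN']  # keeps only lines that match any of these regular expressions
harvester_buffer_size: 16384  # size in bytes of the buffer each harvester uses when fetching a file
max_bytes: 10485760  # maximum number of bytes a single log message can have; bytes after max_bytes are discarded and not sent. The default is 10 MB (10485760)
exclude_files: ['.gz$']  # list of regular expressions matching the files that Filebeat should ignore
ignore_older: 0  # default 0 (disabled); accepts values such as 2h or 2m. ignore_older must be greater than close_inactive
  # ignores files that have not been updated for longer than the configured time, including files that were never harvested
close_*  # the close_* options close the harvester after a certain criterion or time. Closing the harvester means closing the file handler
  # if the file is updated after the harvester is closed, it is picked up again after scan_frequency has elapsed
  # if the file is moved or removed while the harvester is closed, Filebeat cannot pick it up again and any data the harvester has not read is lost
close_inactive  # when enabled, closes the file handle if the file has not been harvested for the specified duration
  # the countdown starts when the harvester reads the last log line; it is not based on the file's modification time
  # if the closed file changes, a new harvester is started after scan_frequency has elapsed
  # set a value larger than the least frequent updates to the log files; use multiple inputs for log files that are updated at different rates
  # an internal timestamp records when the log was last read, and the countdown restarts each time the last line is read; durations are written as 2h, 5m, and so on
close_renamed  # when enabled, Filebeat closes the harvester when the file is renamed or moved
close_removed  # when enabled, Filebeat closes the harvester when the file is removed; if you disable this option, you must also disable clean_removed
close_eof  # suited to files that are written only once; Filebeat closes the harvester as soon as the end of the file is reached
close_timeout  # when enabled, Filebeat gives each harvester a predefined lifetime; the harvester is closed once the time has elapsed, whether or not the file is still being read
  # close_timeout should not equal ignore_older, otherwise the file is not picked up when it is updated
  # if the output never emits any log events, the timeout is not triggered; at least one event must be sent before the harvester is closed
  # a value of 0 disables the option
clean_inactive  # removes the state of previously harvested files from the registry file
  # must be greater than ignore_older + scan_frequency, so that no state is removed while a file is still being harvested
  # helps reduce the size of the registry file, especially if a large number of new files are generated every day
  # can also be used to prevent Filebeat problems caused by inode reuse on Linux
clean_removed  # when enabled, Filebeat removes a file's state from the registry if the file can no longer be found on disk
  # if close_removed is disabled, clean_removed must be disabled as well
scan_frequency  # how often the input checks for new files in the paths specified for harvesting; the default is 10s
tail_files:  # if true, Filebeat starts reading at the end of each file and sends every newly appended line as an event, instead of resending all content from the beginning
symlinks:  # allows Filebeat to harvest symbolic links in addition to regular files; Filebeat opens and reads the original file even though it reports the path of the symlink
backoff:  # how aggressively Filebeat checks files for updates: the time Filebeat waits before checking a file again after reaching EOF. The default is 1s
max_backoff:  # maximum time Filebeat waits before checking a file again after reaching EOF
backoff_factor:  # factor by which the wait time is multiplied after each check, until max_backoff is reached; the default is 2
harvester_limit:  # limits the number of harvesters an input starts in parallel, which directly affects the number of open files
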
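tags  # adds tags to the tags list, used for filtering, e.g. tags: ["json"]
fields  # optional fields that add extra information to the output; values can be scalars, arrays, dictionaries, or any nested combination
  # by default the fields are grouped under a fields sub-dictionary, for example:
filebeat.inputs:
- type: log
  fields:
    app_id: query_engine_12
fields_under_root  # if true, the custom fields are stored at the top level of the output document

multiline.pattern  # the regular expression to match
multiline.negate  # whether the pattern is negated; the default is false
  # with the pattern '^b', match: after, and negate: false, consecutive lines that start with b are appended to the preceding line that does not
  # with negate: true, consecutive lines that do not start with b are appended to the preceding line that does
multiline.match  # how Filebeat combines matching lines into an event, after or before, depending on negate above
multiline.max_lines  # maximum number of lines that can be combined into one event; additional lines are discarded. The default is 500
multiline.timeout  # after this timeout Filebeat sends the multiline event even if no line starting a new event has been found. The default is 5s
max_procs  # maximum number of CPUs that can execute simultaneously; defaults to the number of logical CPUs available in the system
name  # name of this Filebeat instance; defaults to the hostname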
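The interplay of multiline.pattern and multiline.negate is easier to follow with a small simulation. The sketch below only mimics the match: after behaviour and ignores max_lines and timeout; the combine function and the sample log lines are made up for this example and are not part of Filebeat.

```python
import re
from typing import List


def combine(lines: List[str], pattern: str, negate: bool) -> List[str]:
    """Group lines into events, mimicking multiline.match: after.

    A line is a continuation when (pattern matches) != negate; continuation
    lines are appended to the event started by the preceding other line.
    """
    regex = re.compile(pattern)
    events: List[str] = []
    for line in lines:
        continuation = bool(regex.search(line)) != negate
        if continuation and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


log = [
    "[2020-06-15] ERROR request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Handler.run(Handler.java:42)",
    "[2020-06-15] INFO recovered",
]

# negate=True with match: after means that lines NOT starting with "["
# are appended to the preceding line that does start with "[".
events = combine(log, r"^\[", negate=True)
print(len(events))  # prints 2
```

With these settings the stack trace stays attached to the ERROR line, and the next line that starts with "[" opens a new event.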

Example 1: Logstash as the output

filebeat.yml configuration:

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:  # multiple log paths can be configured
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_ddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after

#================================ Outputs =====================================

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts; list several hosts to use load balancing
  hosts: ["192.168.110.130:5044","192.168.110.131:5044","192.168.110.132:5044","192.168.110.133:5044"]
  loadbalance: true  # enable load balancing

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

./filebeat -e    # start Filebeat

Logstash configuration:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["http://192.168.110.130:9200"]  # several hosts can be listed here
    index => "query-%{+yyyyMMdd}"
  }
}

Example 2: Elasticsearch as the output

filebeat.yml configuration:

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_dddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after

#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: filebeat222

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

#cloud.auth:

#================================ Outputs =====================================

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","192.168.110.131:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "${ES_PWD}"  # the password is set through the keystore

./filebeat -e    # start Filebeat

In the Elasticsearch cluster you can now see an index with the default name pattern filebeat-%{[agent.version]}-%{+yyyy.MM.dd}.
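As a rough illustration of how such a name pattern expands, the sketch below builds an index name from a version string and a date. The default_index function is hypothetical and only mirrors the shape filebeat-<version>-<yyyy.MM.dd>; it does not reproduce how Filebeat or Elasticsearch actually resolve the pattern, and the date used is arbitrary.

```python
from datetime import date


def default_index(version: str, day: date) -> str:
    """Build a name of the form filebeat-<version>-<yyyy.MM.dd>."""
    return f"filebeat-{version}-{day.strftime('%Y.%m.%d')}"


name = default_index("7.7.0", date(2020, 6, 15))
print(name)  # prints filebeat-7.7.0-2020.06.15
```

Since the date is part of the name, events from different days end up in different indices.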

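Filebeat modules

Official documentation: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules.html

Here I use the Elasticsearch module to parse the Elasticsearch search slow logs. The steps are as follows, and other modules work the same way.

Prerequisite: Elasticsearch and Kibana are installed before Filebeat is used.

The detailed procedure is in the official documentation: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules-quickstart.html

Step 1: configure the filebeat.yml file: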

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "192.168.110.130:5601"  # the Kibana host
  username: "elastic"  # user
  password: "${ES_PWD}"  # password, read from the keystore to avoid a plaintext password

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","192.168.110.131:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"  # Elasticsearch user
  password: "${ES_PWD}"  # Elasticsearch password
  # Do not set index here: no template is configured, so an index named filebeat-%{[agent.version]}-%{+yyyy.MM.dd} is created automatically

Step 2: configure the path of the Elasticsearch slow logs:

cd filebeat-7.7.0-linux-x86_64/modules.d

vim elasticsearch.yml

Step 3: enable the Elasticsearch module:

./filebeat modules enable elasticsearch

List the enabled modules:

./filebeat modules list

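Step 4: initialize the environment:

./filebeat setup -e

Step 5: start Filebeat:

./filebeat -e

Checking the Elasticsearch cluster shows that the search slow log entries have been parsed automatically.

This completes the experiment with the Elasticsearch module.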
