1. Introduction

Logstash is a data ingestion tool: it moves data from one place to another. Much of Logstash's power and popularity comes from its rich set of filter plugins. Filters do far more than filtering; they can apply complex processing logic to the raw events passing through them, and can even emit new events into the downstream flow.
A Logstash configuration file consists of three sections: input, filter, and output. The input and output sections are required; the filter section is optional. Filters are the filter plugins, and this is where all kinds of log-processing logic are implemented.
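As a minimal sketch of the three-section layout, the pipeline below simply echoes console input back to the console:

input  { stdin { } }    # required: where events come from
filter { }              # optional: how events are transformed
output { stdout { } }   # required: where events go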
2. Installation

wget -c https://artifacts.elastic.co/downloads/logstash/logstash-7.17.5-linux-x86_64.tar.gz
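A typical follow-up, assuming the 7.17.5 tarball downloaded above, is to unpack the archive and run a quick smoke test from the bin directory:

tar -zxvf logstash-7.17.5-linux-x86_64.tar.gz
cd logstash-7.17.5
bin/logstash -e 'input { stdin { } } output { stdout { } }'   # inline config as a smoke test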
3. Getting started

Logstash ships with a sample configuration file, logstash-sample.conf, in the config directory.
The sample file is usually left untouched; instead, create a new configuration file under the config directory, then start Logstash pointing at it with the following command:
logstash.bat -f ../config/<your-config-file>
3.1 Input plugins: reading files
Logstash uses a Ruby gem called filewatch to watch for file changes, and records the read progress of each watched log file in a database file named .sincedb. By default this sincedb file lives under <path.data>/plugins/inputs/file with a name like .sincedb_123456, where <path.data> is the Logstash data directory, LOGSTASH_HOME/data by default.
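One practical consequence: if you want Logstash to re-read a file from the top on every run, you can point sincedb at a throwaway path so no progress is persisted. A minimal sketch (the log path is an illustrative assumption):

input {
  file {
    path => "/var/log/test.log"     # hypothetical log file
    start_position => "beginning"   # start at the top of newly discovered files
    sincedb_path => "/dev/null"     # discard progress, so the file is re-read each run
  }
}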
input supports a number of plugins, and more than one can be declared in the same pipeline. Common ones include:

file: reads a file from the filesystem, much like the UNIX command tail -0F
syslog: listens on port 514 and parses log data according to RFC 3164
redis: reads from a redis server, supporting both channel (pub/sub) and list modes; redis commonly acts as the "broker" in a Logstash cluster, holding the event queue for Logstash to consume
tcp: receives data over the network (see the sketch after this list)
stdin: standard input, reading from the console
other extensions, such as jdbc
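For instance, a tcp input can receive events pushed over the network. A minimal sketch, where the port and codec are illustrative assumptions:

input {
  tcp {
    port => 5000            # hypothetical listening port
    codec => json_lines     # parse each incoming line as a JSON event
  }
}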
A fuller input example showing stdin, file, and syslog with their common options:

input {
  stdin { }

  file {
    path => "/data/*"                     # path can be a single glob...
    path => ["/data/*.log", "F:/*.log"]   # ...or an array of paths (use one or the other)
    exclude => "1.log"                    # files to skip
    add_field => { "test" => "test" }     # add a custom field to every event
    tags => "tag1"                        # tag events from this input
    delimiter => "\n"                     # line delimiter
    discover_interval => 15               # how often (s) to look for new files
    stat_interval => 1                    # how often (s) to stat watched files
    start_position => "beginning"         # read new files from the start
    sincedb_path => "/test.txt"           # where to store read progress
    sincedb_write_interval => 15          # how often (s) to flush the sincedb
  }

  syslog {
    type => "system-syslog"
    port => 10514
  }
}
3.2 Output plugins

output sends events out of the pipeline. An event can pass through multiple outputs, and once every output has processed it, the event has completed its run. Commonly used outputs include (several output modes can be combined):
file: writes the data to a file on disk.
elasticsearch: sends the data to Elasticsearch, which stores it efficiently and makes it easy to query.
stdout: writes to the console.
(1) Writing to standard output (stdout)
output {
  stdout {
    codec => rubydebug
  }
}
(2) Saving to a file (file)
output {
  file {
    path => "/data/log/%{+yyyy-MM-dd}/%{host}_%{+HH}.log"
  }
}
(3) Sending to Elasticsearch
output {
  elasticsearch {
    hosts => ["192.168.1.1:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
hosts: an array of Elasticsearch node addresses and ports; the default port is 9200, and multiple addresses can be listed.
index: the name of the Elasticsearch index to write to; variables may be used here. Logstash supports the %{+YYYY.MM.dd} notation: when the parser sees a value starting with +, it treats what follows as a time format and renders the rest of the string accordingly. Splitting indices by day like this makes it easy to delete old data or search a specific time range. Note that index names must not contain uppercase letters.
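For example, with several Elasticsearch nodes (the addresses and index prefix below are illustrative), hosts takes an array and index embeds the date:

output {
  elasticsearch {
    hosts => ["192.168.1.1:9200", "192.168.1.2:9200"]   # hypothetical cluster nodes
    index => "nginx-log-%{+YYYY.MM.dd}"                 # one index per day; must be lowercase
  }
}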
3.4 A simple example

(1) Console input and output
input {
  stdin { }   # events come from the console
}
output {
  stdout {
    codec => rubydebug
  }
}
Result:
234234234234234234
{
    "@timestamp" => 2022-09-01T14:44:04.277Z,
       "message" => "234234234234234234",
      "@version" => "1",
          "host" => "k8s-master01"
}
3.5 Filter plugins

filter is optional. It processes the data read by input and is typically used for log parsing.
Grok regex capture

grok is an extremely powerful filter plugin. It can parse arbitrary text with regular expressions, turning unstructured log data into a structured, easily queried form. It is currently the best way in Logstash to parse unstructured log data.
The grok matching syntax is %{SYNTAX:SEMANTIC}, where SYNTAX is the name of a pattern and SEMANTIC is the field name the matched text is stored under.
For example, given the input:
172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039
the %{IP:ip} pattern yields: ip: 172.16.213.132
the %{HTTPDATE:timestamp} pattern yields: timestamp: 07/Feb/2019:16:24:19 +0800
and the %{QS:referrer} pattern yields: referrer: "GET / HTTP/1.1"
Below is a combined pattern that captures everything in the input above:
%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}
This combined pattern splits the input into five parts, i.e., five named fields. Breaking input into named data fields makes later parsing and querying of log data far easier, which is exactly the point of using grok.
Example:
input {
  stdin { }
}
filter {
  grok {
    match => ["message", "%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}"]
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
}
Input and output:
172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039
{
          "host" => "k8s-master01",
       "message" => "172.16.213.132 [07/Feb/2019:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
      "@version" => "1",
      "clientip" => "172.16.213.132",
      "referrer" => "\"GET / HTTP/1.1\"",
     "timestamp" => "07/Feb/2019:16:24:19 +0800",
      "response" => "403",
         "bytes" => "5039",
    "@timestamp" => 2022-09-01T14:48:40.999Z
}
Built-in grok patterns
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT %{IPORHOST}:%{POSINT}

# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?>/(?>[\w_%!$@:.,-]+|\\.)*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHNUM2 (?:0[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])

# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)

# Years?
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}

# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{NONNEGINT:facility}.%{NONNEGINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}

# Shortcuts
QS %{QUOTEDSTRING}

# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}

# Log Levels
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
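If none of the built-in patterns fits, grok lets you define patterns inline with its pattern_definitions option. A sketch, where the pattern name MYID and the field names are illustrative assumptions:

filter {
  grok {
    # MYID is a hypothetical custom pattern: three uppercase letters followed by digits
    pattern_definitions => { "MYID" => "[A-Z]{3}[0-9]+" }
    match => { "message" => "%{MYID:order_id} %{GREEDYDATA:rest}" }
  }
}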
Date processing (date)

The date plugin is especially important for sorting events and backfilling old data. It converts the time field in a log record into a LogStash::Timestamp object, which is then stored in the @timestamp field (or another target field).
Below is an example date plugin configuration (only the filter section is shown):
filter {
  grok {
    match => ["message", "%{HTTPDATE:timestamp}"]
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "timestamp"
  }
}
Data modification (mutate)
gsub replaces values matched by a regular expression within a field; it works on string fields only. Below is a gsub example from the mutate plugin (filter section only):
filter {
  mutate {
    gsub => ["filed_name_1", "/", "_"]
  }
}
This example replaces every "/" character in the filed_name_1 field with "_".
split splits a field's string value into an array on a given separator. Below is a split example from the mutate plugin (filter section only):
filter {
  mutate {
    split => ["filed_name_2", "|"]
  }
}
This example splits the filed_name_2 field into an array on the "|" separator.
rename renames a field. Below is a rename example from the mutate plugin (filter section only):
filter {
  mutate {
    rename => { "old_field" => "new_field" }
  }
}
This example renames the field old_field to new_field.
remove_field deletes a field. Below is a remove_field example from the mutate plugin (filter section only):
filter {
  mutate {
    remove_field => ["timestamp"]
  }
}
This example deletes the timestamp field.
GeoIP lookup (geoip)

The geoip plugin looks up geographic information for an IP address held in a field:

filter {
  geoip {
    source => "ip_field"
  }
}
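geoip attaches a geoip object to the event with fields such as country_name, city_name, and location. If you only need some of them, the plugin's fields option can trim the output; a sketch, where the chosen field list is an illustrative assumption:

filter {
  geoip {
    source => "ip_field"
    fields => ["country_name", "city_name", "location"]   # keep only these geoip fields
  }
}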
A complete example combining the plugins above:

input {
  file {
    path => "/home/es/elastic-stack/logstash/example/test.log"
    start_position => "beginning"
    add_field => { "test" => "test" }   # add a custom field
    stat_interval => 1
  }
}
filter {
  grok {
    match => ["message", "%{IP:clientIp}\ \-\ \-\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}\ %{GREEDYDATA:ua}"]
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "timestamp"
  }
  mutate {
    gsub => ["ua", "\"-\" ", ""]
    gsub => ["ua", "\"", ""]
    gsub => ["ua", "\r", ""]
    gsub => ["referrer", "\"", ""]
    split => ["clientIp", "."]
    rename => ["timestamp", "create_time"]
    remove_field => ["message"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
  # file { ... }
}
4. Configuration file (logstash.yml)

# Node
node.name: test
path.data:

# Pipeline
pipeline.id: main
pipeline.workers:
pipeline.batch.size: 125
pipeline.batch.delay: 50
pipeline.unsafe_shutdown: false
pipeline.ordered: auto

# Config loading
path.config:
config.string:
config.test_and_exit: false
config.reload.automatic: false
config.reload.interval: 3s
config.debug: false
config.support_escapes: false

# HTTP API
http.enabled: true
http.host: 127.0.0.1
http.port: 9600-9700

# Modules
modules:
  - name: MODULE_NAME
    var.PLUGINTYPE1.PLUGINNAME1.KEY1: VALUE
    var.PLUGINTYPE1.PLUGINNAME1.KEY2: VALUE
    var.PLUGINTYPE2.PLUGINNAME1.KEY1: VALUE
    var.PLUGINTYPE3.PLUGINNAME3.KEY1: VALUE

# Queue
queue.type: memory
path.queue:
queue.max_events: 0
queue.max_bytes: 1024mb
queue.checkpoint.acks: 1024
queue.checkpoint.writes: 1024
queue.checkpoint.interval: 1000

# Dead letter queue
dead_letter_queue.enable: false
dead_letter_queue.max_bytes: 1024mb
path.dead_letter_queue:

# Logging and misc
log.level: info
log.format:
path.logs:
path.plugins: []
pipeline.separate_logs: false
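Two of the settings above also exist as command-line flags, which is handy while iterating on a pipeline. Assuming a config file named test.conf:

bin/logstash -f config/test.conf --config.test_and_exit     # validate the config, then exit
bin/logstash -f config/test.conf --config.reload.automatic  # hot-reload the config on change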