At work I occasionally have to inspect ELB access logs by hand. A downloaded access log is space-delimited, and matching each value to its field name by eye is tiring.

So I wrote a small Ruby program that parses the access log and converts it to CSV, which can then be opened in Excel for review.
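The awkward part is that the quoted `request` and `user_agent` fields contain spaces, so simply splitting on spaces does not line values up with field names. The script in the source section handles this with a regular expression. As a lighter alternative (not what the script does), Ruby's `CSV` parser can be given a space as its column separator, because it already understands double-quoted fields. This is a minimal sketch assuming fields are separated by exactly one space and quoted values contain no `"`; `FIELDS`, `log_line_to_hash` and the sample line are illustrative, not part of the original tool.

```ruby
require 'csv'

# Field names of the Classic ELB access log format, in log order
# (same list as the format string in the script).
FIELDS = %w[timestamp elb client:port backend:port
            request_processing_time backend_processing_time response_processing_time
            elb_status_code backend_status_code received_bytes sent_bytes
            request user_agent ssl_cipher ssl_protocol].freeze

# Hypothetical helper: split one log line into a field-name => value hash.
# Using CSV with col_sep ' ' keeps quoted values that contain spaces intact.
# Assumes single-space separators and no '"' inside quoted values.
def log_line_to_hash(line)
  values = CSV.parse_line(line, col_sep: ' ')
  return nil if values.nil? || values.size != FIELDS.size
  FIELDS.zip(values).to_h
end

# Illustrative sample line in the Classic ELB format
sample = '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 ' \
         '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 ' \
         '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -'

# Print each field name next to its value
log_line_to_hash(sample).each { |name, value| puts "#{name.ljust(26)} #{value}" }
```

The regex approach in the script is stricter (it checks that numeric fields look numeric and splits `client:port` into two columns), while the `CSV` trick only relies on the quoting rules.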

Purpose

Reads an access log and converts it to CSV.

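It is mainly meant for ad-hoc inspection of small access logs while tracking down a problem. For actual data analysis over access logs, you still need to bring in Amazon Redshift.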

Usage

Use `--src_access_log` (short form `-s`) to specify the input access log file and `--dest_csv_file` (short form `-d`) to specify the output CSV file.

```shell
ruby aws_elb_access_log_parser.rb --src_access_log source_log --dest_csv_file dest_csv
```
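For reference, here is a sketch of how those two options can be parsed with Ruby's standard `OptionParser`, including the mandatory-option check. It runs against an explicit argument array instead of `ARGV` so the result can be inspected. `parse_options` is a hypothetical helper for illustration; the script itself parses `ARGV` at the top level.

```ruby
require 'optparse'

# Hypothetical helper: parse the script's two options from an explicit
# argv array and return [options, list_of_missing_option_keys].
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: aws_elb_access_log_parser.rb [options]'
    # The long form declares the FILE argument, so -s/-d also require one
    opts.on('-s', '--src_access_log FILE', String, 'Source access log file') { |v| options[:src_access_log] = v }
    opts.on('-d', '--dest_csv_file FILE', String, 'Output CSV file') { |v| options[:dest_csv_file] = v }
  end
  parser.parse!(argv)
  missing = %i[src_access_log dest_csv_file].reject { |key| options[key] }
  [options, missing]
end

options, missing = parse_options(%w[--src_access_log source_log --dest_csv_file dest_csv])
puts "parsed: #{options}, missing: #{missing}"

# Short forms work the same way; here the output file is left out on purpose
_, missing = parse_options(%w[-s source_log])
puts "missing: #{missing}"
```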

Source code

Gist: https://gist.github.com/jibing57/ea180bfc3f7cb96e4a1fa67aa7a7c0c2

```ruby
require 'csv'
require 'optparse'

class AWSELBAccessLogParser
  # Field layout of a Classic ELB access log entry (kept for reference)
  ELB_ACCESS_LOG_FORMAT = %q(timestamp elb client:port backend:port request_processing_time backend_processing_time response_processing_time elb_status_code backend_status_code received_bytes sent_bytes "request" "user_agent" ssl_cipher ssl_protocol)

  # One named group per field; the group names double as the CSV header
  LINE_REGEX = /
    (?<timestamp>[^ ]*)                                  # timestamp
    \s+(?<elb>[^ ]*)                                     # elb
    \s+(?<client>[^ ]*):(?<client_port>[0-9]*)           # client:port
    \s+(?:(?<backend>[^ ]*):(?<backend_port>[0-9]*)|-)   # backend:port, or "-" when no backend handled the request
    \s+(?<request_processing_time>[-.0-9]*)              # value: 0.000056 or -1
    \s+(?<backend_processing_time>[-.0-9]*)              # value: 0.093779 or -1
    \s+(?<response_processing_time>[-.0-9]*)             # value: 0.000049 or -1
    \s+(?<elb_status_code>-|[0-9]*)                      # elb_status_code
    \s+(?<backend_status_code>-|[0-9]*)                  # backend_status_code
    \s+(?<received_bytes>[-0-9]*)                        # received_bytes
    \s+(?<sent_bytes>[-0-9]*)                            # sent_bytes
    \s+"(?<request>[^"]*)"                               # entire request, may also be "- - - " (e.g. TCP listeners)
    \s+"(?<user_agent>.*)"                               # entire user_agent
    \s+(?<ssl_cipher>[^ ]*)                              # ssl_cipher
    \s+(?<ssl_protocol>[^ ]*)                            # ssl_protocol
  /x

  # Returns a MatchData, or nil if the line does not match the format.
  # chomp keeps the trailing newline out of the last field.
  def parse_line(line)
    return nil if line.nil?
    LINE_REGEX.match(line.chomp)
  end

  def parse_log_to_csv(src_file, dst_file)
    if src_file.nil? || dst_file.nil?
      puts "please enter the right src_file and dst_file"
      return false
    end
    unless File.readable?(src_file)
      puts "src_file[#{src_file}] is not readable"
      return false
    end
    unless File.writable?(File.dirname(dst_file))
      puts "dst_file[#{dst_file}] is not writable"
      return false
    end

    CSV.open(dst_file, "w") do |csv|
      # Header row comes from the regex group names, so it no longer
      # depends on the first log line being parseable
      csv << LINE_REGEX.names
      # File.foreach reads line by line and closes the file when done
      File.foreach(src_file) do |line|
        next if line.strip.empty?
        parts = parse_line(line)
        if parts.nil?
          puts "Error -- Can't parse line [#{line.chomp}]"
          next
        end
        # captures returns the group values in the same order as names
        csv << parts.captures
      end
    end
    true
  end
end

# Parse the command line
options = {}
opts = OptionParser.new
opts.banner = "Usage: #{$PROGRAM_NAME} [options] ..."
opts.separator ''
opts.separator 'Options:'
opts.on('-s src_access_log', '--src_access_log src_access_log', String,
        'Set source access log file') { |key| options[:src_access_log] = key }
opts.on('-d dest_csv_file', '--dest_csv_file dest_csv_file', String,
        'Set output csv file name') { |key| options[:dest_csv_file] = key }
opts.on('-h', '--help', 'Show this message') do
  puts opts
  exit
end

begin
  opts.parse!
  # Enforce the presence of the mandatory options
  mandatory = [:src_access_log, :dest_csv_file]
  missing = mandatory.select { |param| options[param].nil? }
  unless missing.empty?
    puts "Missing options: #{missing.join(', ')}"
    puts opts
    exit 1
  end
rescue OptionParser::ParseError => e
  # Covers invalid options, missing arguments, etc.
  puts e.message
  puts opts
  exit 1
end

AWSELBAccessLogParser.new.parse_log_to_csv(options[:src_access_log], options[:dest_csv_file])
```
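Once the CSV exists, it can also be filtered in Ruby instead of Excel when hunting for failed requests. The following is a minimal sketch, not part of the original tool: `failed_requests` is a hypothetical helper that assumes the header row uses the regex group names (as the script writes them) and keeps rows whose `elb_status_code` is 4xx or 5xx. The two sample rows are made up.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: return the CSV rows whose elb_status_code is 4xx or 5xx.
# Rows with "-" or an empty status code are skipped.
def failed_requests(csv_path)
  CSV.read(csv_path, headers: true).select do |row|
    row['elb_status_code'].to_s.match?(/\A[45]\d\d\z/)
  end
end

# Demo on a small made-up CSV that uses a subset of the script's column names
Tempfile.create(['elb', '.csv']) do |f|
  f.write(<<~CSV_DATA)
    timestamp,elb_status_code,request
    2015-05-13T23:39:43.945958Z,200,GET http://www.example.com:80/ HTTP/1.1
    2015-05-13T23:39:44.000000Z,503,GET http://www.example.com:80/api HTTP/1.1
  CSV_DATA
  f.flush
  failed_requests(f.path).each do |row|
    puts "#{row['timestamp']} #{row['elb_status_code']} #{row['request']}"
  end
end
```

Reading with `headers: true` lets rows be addressed by column name, so the filter keeps working even if columns are reordered.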
