(1) Add a user
# useradd tigk
# passwd tigk
Changing password for user tigk.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.

(2) Authorization
A regular user only has full permissions under its own home directory; other directories require authorization. Since root privileges are frequently needed, they can be granted by editing the sudoers file so that the user can run commands with sudo.
# Grant write permission
# chmod -v u+w /etc/sudoers
mode of ‘/etc/sudoers’ changed from 0440 (r--r-----) to 0640 (rw-r-----)

Edit the sudoers file to add the new user: vi /etc/sudoers, adding the line "tigk ALL=(ALL) ALL":
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
tigk    ALL=(ALL)       ALL

Revoke the write permission:
# chmod -v u-w /etc/sudoers
mode of ‘/etc/sudoers’ changed from 0640 (rw-r-----) to 0440 (r--r-----)

Create the tigk installation directory:
# su - tigk
$ mkdir /home/tigk/.local

(3) Create directories for the TIGK-related files
# mkdir /data/tigk
# chown tigk:tigk /data/tigk
# su - tigk
$ mkdir /data/tigk/telegraf
$ mkdir /data/tigk/influxdb
$ mkdir /data/tigk/kapacitor

The executable is at {telegraf root}/usr/bin/telegraf and the configuration file lives under the etc directory of the installation; a configuration can also be generated directly:
Show help: telegraf --help
Generate a configuration file: telegraf config > telegraf.conf
Generate a configuration file with the cpu, memory, http_listener, and influxdb plugins: telegraf --input-filter cpu:mem:http_listener --output-filter influxdb config > telegraf.conf
Run the program: telegraf --config telegraf.conf
Start in the background: nohup telegraf --config telegraf.conf > /dev/null 2>&1 &
$ cd /home/tigk/.local/telegraf/usr/bin
$ ./telegraf --help
$ ./telegraf config > telegraf.conf
$ ./telegraf --input-filter cpu:mem:http_listener --output-filter influxdb config > telegraf.conf
$ mkdir /data/tigk/telegraf/logs
$ mkdir /data/tigk/telegraf/conf
$ cp /home/tigk/.local/telegraf/usr/bin/telegraf.conf /data/tigk/telegraf/conf
$ vim /data/tigk/telegraf/conf/telegraf.conf

Find the [[outputs.influxdb]] section and provide the username and password, editing the file as follows:

[[outputs.influxdb]]
  urls = ["http://10.0.165.2:8085"]
  timeout = "5s"
  username = "tigk"
  password = "tigk"
[agent]
  logfile = "/data/tigk/telegraf/logs/telegraf.log"

Start:
$ cd /home/tigk/.local/telegraf/usr/bin
$ nohup ./telegraf --config /data/tigk/telegraf/conf/telegraf.conf &

(1) Download the rpm package
wget https://dl.influxdata.com/telegraf/releases/telegraf-1.14.4-1.x86_64.rpm

(2) Install the rpm package
sudo yum localinstall telegraf-1.14.4-1.x86_64.rpm

(3) Start the service and enable it at boot
systemctl start telegraf.service
service telegraf status
systemctl enable telegraf.service

(4) Check the version and edit the configuration file
telegraf --version

Default configuration file location: /etc/telegraf/telegraf.conf. Edit the telegraf configuration file:
vim /etc/telegraf/telegraf.conf

(5) Start
service telegraf start

(1) Command overview: telegraf -h
$ ./telegraf -h
Telegraf, The plugin-driven server agent for collecting and reporting metrics.

Usage:

  telegraf [commands|flags]

The commands & flags are:

  config              print out full sample configuration to stdout
  version             print the version to stdout

  --aggregator-filter <filter>   filter the aggregators to enable, separator is :
  --config <file>                configuration file to load
  --config-directory <directory> directory containing additional *.conf files
  --plugin-directory             directory containing *.so files, this directory will be
                                 searched recursively. Any Plugin found will be loaded
                                 and namespaced.
  --debug                        turn on debug logging
  --input-filter <filter>        filter the inputs to enable, separator is :
  --input-list                   print available input plugins.
  --output-filter <filter>       filter the outputs to enable, separator is :
  --output-list                  print available output plugins.
  --pidfile <file>               file to write our pid to
  --pprof-addr <address>         pprof address to listen on, don't activate pprof if empty
  --processor-filter <filter>    filter the processors to enable, separator is :
  --quiet                        run in quiet mode
  --section-filter               filter config sections to output, separator is :
                                 Valid values are 'agent', 'global_tags', 'outputs',
                                 'processors', 'aggregators' and 'inputs'
  --sample-config                print out full sample configuration
  --test                         gather metrics, print them out, and exit;
                                 processors, aggregators, and outputs are not run
  --test-wait                    wait up to this many seconds for service inputs to complete in test mode
  --usage <plugin>               print usage for a plugin, ie, 'telegraf --usage mysql'
  --version                      display the version and exit

Examples:

  # generate a telegraf config file:
  telegraf config > telegraf.conf

  # generate config with only cpu input & influxdb output plugins defined
  telegraf --input-filter cpu --output-filter influxdb config

  # run a single telegraf collection, outputing metrics to stdout
  telegraf --config telegraf.conf --test

  # run telegraf with all plugins defined in config file
  telegraf --config telegraf.conf

  # run telegraf, enabling the cpu & memory input, and influxdb output plugins
  telegraf --config telegraf.conf --input-filter cpu:mem --output-filter influxdb

  # run telegraf with pprof
  telegraf --config telegraf.conf --pprof-addr localhost:6060

(2) Command usage
telegraf --help
    Show help.
telegraf config > telegraf.conf
    Generate a configuration template on stdout.
telegraf --input-filter cpu --output-filter influxdb config
    Generate a configuration template containing only the cpu input plugin and the influxdb output plugin.
telegraf --config telegraf.conf --test
    Run a test with the given configuration file, printing the collected data to stdout.
telegraf --config telegraf.conf
    Start telegraf with the given configuration file.
telegraf --config telegraf.conf --input-filter cpu:mem --output-filter influxdb
    Start telegraf with the given configuration file, filtered to the cpu and mem input plugins and the influxdb output plugin.
(3) Configuration file locations

Linux RPM package: /etc/telegraf/telegraf.conf; supplementary config directory /etc/telegraf/telegraf.d
Linux tar package: {install dir}/etc/telegraf/telegraf.conf; supplementary config directory {install dir}/etc/telegraf/telegraf.d

(4) Configuration loading. By default the command loads telegraf.conf plus every configuration file under /etc/telegraf/telegraf.d; the --config and --config-directory options change this behavior. Each input section in the configuration gets its own collection thread, so a duplicated input definition wastes resources, as the sketch below illustrates.
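For example (hypothetical file layout), if the same input appears in both the main file and a telegraf.d drop-in, both copies are loaded and the cpu metrics are collected twice:

# /etc/telegraf/telegraf.conf
[[inputs.cpu]]
  percpu = true

# /etc/telegraf/telegraf.d/cpu.conf -- duplicate definition, also loaded
[[inputs.cpu]]
  percpu = true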
(5) Configure global tags. Key/value pairs of the form key = "value" defined in the [global_tags] section of the configuration file are attached as tags to every collected metric, as in the sketch below.
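A minimal sketch (tag names and values are illustrative):

[global_tags]
  dc = "cn-east-1"
  env = "production"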
(6) Agent configuration. The [agent] section configures all data collection performed by the agent on this host; a sample section follows the table.

interval: data collection interval
round_interval: whether to round collection times to the interval. E.g. with interval = "10s", collection happens at :00, :10, :20, :30 ... of each minute
metric_batch_size: batch size for data sent to outputs
metric_buffer_limit: size of the buffer for data awaiting output
collection_jitter: maximum random sleep before collecting, mainly to prevent all agents from collecting at the same moment
flush_interval: interval for flushing data to outputs
flush_jitter: maximum random sleep before flushing, mainly to prevent a large write spike when many agents flush at once
precision: timestamp precision
logfile: log file name
debug: whether to run in debug mode
quiet: quiet mode, log only error messages
hostname: defaults to os.Hostname(); overrides it if set
omit_hostname: whether to omit the hostname tag from metrics

(7) Common input plugin configuration (see the sketch after this list)
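A sketch of a typical [agent] section (values shown are the sample-config defaults, with the log path from the deployment above):

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  logfile = "/data/tigk/telegraf/logs/telegraf.log"
  omit_hostname = false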
interval: collection interval for this input; overrides the agent setting if set
name_override: replaces the name of the output measurement
name_prefix: prefix added to the measurement name
name_suffix: suffix added to the measurement name
tags: a map of tags added to the output measurement

(8) Common output plugin configuration: there is none.
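For instance (the override name and tag are illustrative), the cpu input could be renamed and tagged like this:

[[inputs.cpu]]
  percpu = true
  name_override = "host_cpu"
  [inputs.cpu.tags]
    team = "tigk"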
(9) Measurement filtering. These options can be defined in input, output, and other plugin sections; a sketch follows the list.

namepass: only data points whose measurement name matches the configured pattern pass
namedrop: data points whose measurement name matches the pattern are dropped
fieldpass: only fields whose field key matches the pattern pass
fielddrop: fields whose field key matches the pattern are dropped
tagpass: only points with a matching tag pass
tagdrop: points with a matching tag are dropped
taginclude: only matching tags are kept on a point; non-matching tags are removed
tagexclude: matching tags are removed
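A short sketch (mount paths and measurement names are illustrative): tagpass keeps only points from the listed mount points on the disk input, while namepass lets only the cpu and mem measurements through to the output:

[[inputs.disk]]
  [inputs.disk.tagpass]
    path = ["/", "/data"]

[[outputs.influxdb]]
  namepass = ["cpu", "mem"]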
(10) Typical configuration examples

① Input - System - cpu

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false

② Input - System - disk
# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default stats will be gathered for all mount points.
  ## Set mount_points will restrict the stats to only the specified mount points.
  # mount_points = ["/"]

  ## Ignore mount points by filesystem type.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

③ Input - System - kernel
# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration

④ Input - System - mem
# Read metrics about memory usage
[[inputs.mem]]
  # no configuration

⑤ Input - System - netstat
# # Read TCP metrics such as established, time wait and sockets counts.
# [[inputs.netstat]]
#   # no configuration

⑥ Input - System - processes
# Get the number of processes and group them by status
[[inputs.processes]]
  # no configuration

⑦ Input - System - system
# Read metrics about system load & uptime
[[inputs.system]]
  ## Uncomment to remove deprecated metrics.
  # fielddrop = ["uptime_format"]

⑧ Input - System - ping
# # Ping given url(s) and return statistics
# [[inputs.ping]]
#   ## Hosts to send ping packets to.
#   urls = ["example.org"]
#
#   ## Method used for sending pings, can be either "exec" or "native". When set
#   ## to "exec" the systems ping command will be executed. When set to "native"
#   ## the plugin will send pings directly.
#   ##
#   ## While the default is "exec" for backwards compatibility, new deployments
#   ## are encouraged to use the "native" method for improved compatibility and
#   ## performance.
#   # method = "exec"
#
#   ## Number of ping packets to send per interval. Corresponds to the "-c"
#   ## option of the ping command.
#   # count = 1
#
#   ## Time to wait between sending ping packets in seconds. Operates like the
#   ## "-i" option of the ping command.
#   # ping_interval = 1.0
#
#   ## If set, the time to wait for a ping response in seconds. Operates like
#   ## the "-W" option of the ping command.
#   # timeout = 1.0
#
#   ## If set, the total ping deadline, in seconds. Operates like the -w option
#   ## of the ping command.
#   # deadline = 10
#
#   ## Interface or source address to send ping from. Operates like the -I or -S
#   ## option of the ping command.
#   # interface = ""
#
#   ## Specify the ping executable binary.
#   # binary = "ping"
#
#   ## Arguments for ping command. When arguments is not empty, the command from
#   ## the binary option will be used and other options (ping_interval, timeout,
#   ## etc) will be ignored.
#   # arguments = ["-c", "3"]
#
#   ## Use only IPv6 addresses when resolving a hostname.
#   # ipv6 = false

⑨ Input - App - procstat
# [[inputs.procstat]]
#   ## PID file to monitor process
#   pid_file = "/var/run/nginx.pid"
#   ## executable name (ie, pgrep <exe>)
#   # exe = "nginx"
#   ## pattern as argument for pgrep (ie, pgrep -f <pattern>)
#   # pattern = "nginx"
#   ## user as argument for pgrep (ie, pgrep -u <user>)
#   # user = "nginx"
#   ## Systemd unit name
#   # systemd_unit = "nginx.service"
#   ## CGroup name or path
#   # cgroup = "systemd/system.slice/nginx.service"
#
#   ## Windows service name
#   # win_service = ""
#
#   ## override for process_name
#   ## This is optional; default is sourced from /proc/<pid>/status
#   # process_name = "bar"
#
#   ## Field name prefix
#   # prefix = ""
#
#   ## When true add the full cmdline as a tag.
#   # cmdline_tag = false
#
#   ## Add PID as a tag instead of a field; useful to differentiate between
#   ## processes whose tags are otherwise the same. Can create a large number
#   ## of series, use judiciously.
#   # pid_tag = false
#
#   ## Method to use when finding process IDs. Can be one of 'pgrep', or
#   ## 'native'. The pgrep finder calls the pgrep executable in the PATH while
#   ## the native finder performs the search directly in a manor dependent on the
#   ## platform. Default is 'pgrep'
#   # pid_finder = "pgrep"

⑩ Input - App - redis
# # Read metrics from one or many redis servers
# [[inputs.redis]]
#   ## specify servers via a url matching:
#   ##  [protocol://][:password]@address[:port]
#   ##  e.g.
#   ##    tcp://localhost:6379
#   ##    tcp://:password@192.168.99.100
#   ##    unix:///var/run/redis.sock
#   ##
#   ## If no servers are specified, then localhost is used as the host.
#   ## If no port is specified, 6379 is used
#   servers = ["tcp://localhost:6379"]
#
#   ## specify server password
#   # password = "s#cr@t%"
#
#   ## Optional TLS Config
#   # tls_ca = "/etc/telegraf/ca.pem"
#   # tls_cert = "/etc/telegraf/cert.pem"
#   # tls_key = "/etc/telegraf/key.pem"
#   ## Use TLS but skip chain & host verification
#   # insecure_skip_verify = true

⑪ Input - App - kafka_consumer
# # Read metrics from Kafka topics
# [[inputs.kafka_consumer]]
#   ## Kafka brokers.
#   brokers = ["localhost:9092"]
#
#   ## Topics to consume.
#   topics = ["telegraf"]
#
#   ## When set this tag will be added to all metrics with the topic as the value.
#   # topic_tag = ""
#
#   ## Optional Client id
#   # client_id = "Telegraf"
#
#   ## Set the minimal supported Kafka version. Setting this enables the use of new
#   ## Kafka features and APIs. Must be 0.10.2.0 or greater.
#   ##   ex: version = "1.1.0"
#   # version = ""
#
#   ## Optional TLS Config
#   # enable_tls = true
#   # tls_ca = "/etc/telegraf/ca.pem"
#   # tls_cert = "/etc/telegraf/cert.pem"
#   # tls_key = "/etc/telegraf/key.pem"
#   ## Use TLS but skip chain & host verification
#   # insecure_skip_verify = false
#
#   ## SASL authentication credentials. These settings should typically be used
#   ## with TLS encryption enabled using the "enable_tls" option.
#   # sasl_username = "kafka"
#   # sasl_password = "secret"
#
#   ## SASL protocol version. When connecting to Azure EventHub set to 0.
#   # sasl_version = 1
#
#   ## Name of the consumer group.
#   # consumer_group = "telegraf_metrics_consumers"
#
#   ## Initial offset position; one of "oldest" or "newest".
#   # offset = "oldest"
#
#   ## Consumer group partition assignment strategy; one of "range", "roundrobin" or "sticky".
#   # balance_strategy = "range"
#
#   ## Maximum length of a message to consume, in bytes (default 0/unlimited);
#   ## larger messages are dropped
#   max_message_len = 1000000
#
#   ## Maximum messages to read from the broker that have not been written by an
#   ## output. For best throughput set based on the number of metrics within
#   ## each message and the size of the output's metric_batch_size.
#   ##
#   ## For example, if each message from the queue contains 10 metrics and the
#   ## output metric_batch_size is 1000, setting this to 100 will ensure that a
#   ## full batch is collected and the write is triggered immediately without
#   ## waiting until the next flush_interval.
#   # max_undelivered_messages = 1000
#
#   ## Data format to consume.
#   ## Each data format has its own unique set of configuration options, read
#   ## more about them here:
#   ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
#   data_format = "influx"

⑫ Input - App - exec
# # Read metrics from one or more commands that can output to stdout
# [[inputs.exec]]
#   ## Commands array
#   commands = [
#     "/tmp/test.sh",
#     "/usr/bin/mycollector --foo=bar",
#     "/tmp/collect_*.sh"
#   ]
#
#   ## Timeout for each command to complete.
#   timeout = "5s"
#
#   ## Data format to consume.
#   ## Each data format has its own unique set of configuration options, read
#   ## more about them here:
#   ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
#   data_format = "influx"

⑬ Output - influxdb_v2
# # Configuration for sending metrics to InfluxDB
# [[outputs.influxdb_v2]]
#   ## The URLs of the InfluxDB cluster nodes.
#   ##
#   ## Multiple URLs can be specified for a single cluster, only ONE of the
#   ## urls will be written to each interval.
#   ##   ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
#   urls = ["http://127.0.0.1:9999"]
#
#   ## Token for authentication.
#   token = ""
#
#   ## Organization is the name of the organization you wish to write to; must exist.
#   organization = ""
#
#   ## Destination bucket to write into.
#   bucket = ""
#
#   ## The value of this tag will be used to determine the bucket. If this
#   ## tag is not set the 'bucket' option is used as the default.
#   # bucket_tag = ""
#
#   ## If true, the bucket tag will not be added to the metric.
#   # exclude_bucket_tag = false
#
#   ## Timeout for HTTP messages.
#   # timeout = "5s"
#
#   ## Additional HTTP headers
#   # http_headers = {"X-Special-Header" = "Special-Value"}
#
#   ## HTTP Proxy override, if unset values the standard proxy environment
#   ## variables are consulted to determine which proxy, if any, should be used.
#   # http_proxy = "http://corporate.proxy:3128"
#
#   ## HTTP User-Agent
#   # user_agent = "telegraf"
#
#   ## Content-Encoding for write request body, can be set to "gzip" to
#   ## compress body or "identity" to apply no encoding.
#   # content_encoding = "gzip"
#
#   ## Enable or disable uint support for writing uints influxdb 2.0.
#   # influx_uint_support = false
#
#   ## Optional TLS Config for use on HTTP connections.
#   # tls_ca = "/etc/telegraf/ca.pem"
#   # tls_cert = "/etc/telegraf/cert.pem"
#   # tls_key = "/etc/telegraf/key.pem"
#   ## Use TLS but skip chain & host verification
#   # insecure_skip_verify = false

As an example, to capture the applications running in YARN and store them in InfluxDB: ① use the exec input plugin to run a script whose standard output is in the InfluxDB line protocol; ② inside the script, call the YARN REST API to get the running applications.
#!/usr/bin/env python
# Query the YARN REST API for running applications and print them
# in InfluxDB line protocol (Python 2).
import json
import urllib
import httplib

host = "10.0.165.3:8088"
path = "/ws/v1/cluster/apps"
data = urllib.urlencode({'state': "RUNNING", "applicationTypes": "Apache Flink"})
path = path + "?" + data
headers = {"Accept": "application/json"}

conn = httplib.HTTPConnection(host)
conn.request("GET", path, headers=headers)
result = conn.getresponse()
if result.status:
    content = result.read()
    apps = json.loads(content)["apps"]["app"]
    for app in apps:
        # skip test applications
        if "test" in app["name"] or "TEST" in app["name"] or "Test" in app["name"]:
            continue
        # escape spaces in tag values, as required by the line protocol
        app["escaped_name"] = app["name"].replace(' ', '\\ ')
        print "APPLICATION.RUNNING,appname=%s,appid=%s field_appname=\"%s\",field_appid=\"%s\" " % (app["escaped_name"], app["id"], app["name"], app["id"])

The output looks like:

APPLICATION.RUNNING,appname=iot_road_traffic,appid=application_1592979353214_0175 field_appname="iot_road_traffic",field_appid="application_1592979353214_0175"

Configure the exec input plugin as follows:
[[inputs.exec]]
  ## Commands to run; their stdout is ingested as metrics.
  commands = ["python /data/tigk/telegraf/exec/getRunningFlinkJob.py"]
  ## Timeout for the command to complete.
  timeout = "5s"
  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "influx"
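After updating the configuration, the collection can be verified without running the full agent (assuming the paths used above): telegraf --config /data/tigk/telegraf/conf/telegraf.conf --test gathers once, prints the APPLICATION.RUNNING points to stdout, and exits.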