A Guided Tour of the Impala Query Profile, Part 3


    In this third post of the series, I will continue to focus on the Query Plan and the Execution Summary.

    Now, let's look at the Planner Timeline and Query Timeline sections of the profile:

    Planner Timeline: 21.773ms

           - Analysis finished: 1.432ms (1.432ms)

           - Equivalence classes computed: 1.478ms (46.487us)

           - Single node plan created: 21.005ms (19.526ms)

           - Runtime filters computed: 21.020ms (14.599us)

           - Distributed plan created: 21.096ms (76.103us)

           - Lineage info computed: 21.156ms (59.924us)

           - Planning finished: 21.773ms (617.260us)

        Query Timeline: 14m8s

           - Query submitted: 250.799us (250.799us)

           - Planning finished: 24.359ms (24.108ms)

           - Submit for admission: 25.375ms (1.016ms)

           - Completed admission: 25.565ms (189.521us)

           - Ready to start on 6 backends: 26.997ms (1.432ms)

           - All 6 execution backends (7 fragment instances) started: 33.022ms (6.024ms)

           - Rows available: 14m8s (14m8s)

           - First row fetched: 14m8s (695.112ms)

           - Unregister query: 14m8s (5.330ms)

         - ComputeScanRangeAssignmentTimer: 110.741us

    The information on each line is easy to follow: planning the query took 21.773ms, the plan was submitted for admission at 25.375ms, execution started on the worker nodes at 33.022ms, the result rows were ready at 14m8s, and the first row was fetched by the client at 14m8s. This gives you a clear picture of how long each phase took; if any phase is slow it will stand out immediately, and we can then dig deeper to understand what might have happened.
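    To make the relationship between the cumulative timestamps and the per-step deltas in parentheses concrete, below is a minimal Python sketch (my own illustration, not part of Impala or its tooling) that parses the duration strings and recomputes each delta as the difference from the previous event. The duration parser is an assumption based on the formats seen above ("250.799us", "24.359ms", "14m8s").

        import re

        UNITS = {'h': 3600.0, 'm': 60.0, 's': 1.0, 'ms': 1e-3, 'us': 1e-6, 'ns': 1e-9}

        def to_seconds(text):
            """Convert an Impala duration string such as '14m8s' or '24.359ms' to seconds."""
            return sum(float(v) * UNITS[u] for v, u in re.findall(r'([\d.]+)(ms|us|ns|h|m|s)', text))

        timeline = [                                   # copied from the Query Timeline above
            ('Query submitted',                  '250.799us'),
            ('Planning finished',                '24.359ms'),
            ('Submit for admission',             '25.375ms'),
            ('Completed admission',              '25.565ms'),
            ('Ready to start on 6 backends',     '26.997ms'),
            ('All 6 execution backends started', '33.022ms'),
            ('Rows available',                   '14m8s'),
            ('First row fetched',                '14m8s'),
            ('Unregister query',                 '14m8s'),
        ]

        prev = 0.0
        for event, stamp in timeline:
            now = to_seconds(stamp)
            # Roughly reproduces the parenthesised delta; coarse values like "14m8s"
            # are rounded in the profile, so sub-second deltas are lost here.
            print(f'{event:35s} cumulative {now:10.3f}s   delta {now - prev:10.3f}s')
            prev = now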

    From the Query Timeline we can also see that getting from the start of execution (All 6 execution backends started) to data being available (Rows available) took almost 14 minutes (14m8s). Since this query is just a count(*), there is no Last row fetched entry.

    Contrast that with the following case, taken from another profile:

        Query Compilation: 16.268ms

           - Metadata of all 1 tables cached: 1.786ms (1.786ms)

           - Analysis finished: 6.162ms (4.376ms)

           - Value transfer graph computed: 6.537ms (374.918us)

           - Single node plan created: 7.955ms (1.417ms)

           - Runtime filters computed: 8.274ms (318.815us)

           - Distributed plan created: 8.430ms (156.307us)

           - Lineage info computed: 9.664ms (1.234ms)

           - Planning finished: 16.268ms (6.603ms)

        Query Timeline: 35m46s

           - Query submitted: 0.000ns (0.000ns)

           - Planning finished: 22.001ms (22.001ms)

           - Submit for admission: 23.001ms (1.000ms)

           - Completed admission: 23.001ms (0.000ns)

           - Ready to start on 2 backends: 24.001ms (1.000ms)

           - All 2 execution backends (2 fragment instances) started: 36.001ms (12.000ms)

           - Rows available: 5m51s (5m51s)

           - First row fetched: 5m52s (950.045ms)

           - Last row fetched: 35m46s (29m53s)

           - Released admission control resources: 35m46s (1.000ms)

           - Unregister query: 35m46s (30.001ms)

         - ComputeScanRangeAssignmentTimer: 0.000ns

    This Impala query had been running for a very long time, and the customer wanted to find out why. From the Query Timeline we can clearly see that getting from the start of execution (All 2 execution backends started) to data being available (Rows available) took almost 6 minutes (5m51s). Those 6 minutes of execution may be perfectly normal; for queries that join several large datasets, running for a few minutes is common.

    However, we can also see that Impala spent about 30 minutes delivering the data back to the client: the first row was fetched around the 6-minute mark, while the last row was not fetched until around the 36-minute mark. From this we can suspect a network issue between the Impala coordinator and the client, i.e. on the path the data takes when a client such as impala-shell or Hue fetches it from the coordinator host. Another possibility is that the client was doing extra work while fetching the results, such as printing them to the screen; since the returned dataset can be large, that alone can be time-consuming.
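    One way to tell these two possibilities apart is to time the fetch loop from the client side while suppressing any rendering. The sketch below is a hypothetical check that assumes the impyla client library, the default Impala HS2 port 21050, and placeholder host and query values (none of these come from the profile above); if the fetch loop still dominates even without printing anything, the client-to-coordinator path is the more likely culprit.

        # Hypothetical client-side check; replace the placeholder host and query.
        import time
        from impala.dbapi import connect

        conn = connect(host='impala-coordinator.example.com', port=21050)  # placeholder host
        cur = conn.cursor()

        t0 = time.time()
        cur.execute('SELECT * FROM some_db.some_table')   # placeholder query
        t1 = time.time()

        # Fetch in batches and do NOT print the rows, so client-side rendering
        # cost is taken out of the picture.
        fetched = 0
        while True:
            batch = cur.fetchmany(10000)
            if not batch:
                break
            fetched += len(batch)
        t2 = time.time()

        print(f'execute(): {t1 - t0:.1f}s, fetch of {fetched} rows: {t2 - t1:.1f}s')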

    Either way, this part of the profile points us in the right direction when hunting for the bottleneck.

    A few key points when reading the Execution Summary:

    1) Read it bottom-up, because that is the order in which Impala executes the operators.

    2) Compare the "Avg Time" and "Max Time" columns.

    3) Compare the "#Rows" and "Est. #Rows" columns.

    4) Check the "Detail" column to see the join type used by each operator.

    Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail

    ----------------------------------------------------------------------------------------------------------------------------------

    03:AGGREGATE        1  316.053us  316.053us      1           1   40.00 KB       10.00 MB  FINALIZE

    02:EXCHANGE         1      14m8s      14m8s      6           1          0              0  UNPARTITIONED

    01:AGGREGATE        6    2.746ms    5.205ms      6           1   67.00 KB       10.00 MB

    00:SCAN KUDU        6      2m33s      14m8s  6.10M          -1  299.00 KB              0  bms_ldgdb.ldg_690marketization_ledger t

    Straight away I noticed the large gap between "Avg Time" and "Max Time" for the SCAN KUDU operator: the average time across hosts was 2 minutes 33 seconds, but one of the 6 hosts took 14 minutes 8 seconds.
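    Checks 2) and 3) above are mechanical enough to script. Below is a rough Python illustration (my own sketch: the rows are copied from the Summary table and hand-converted to seconds, and the 3x/10x thresholds are arbitrary choices, not Impala rules) that flags exactly this kind of time skew, plus suspicious row-count estimates.

        summary = [
            # (operator,      avg_time_s, max_time_s,  rows,      est_rows)
            ('03:AGGREGATE',  0.000316,   0.000316,    1,         1),
            ('02:EXCHANGE',   848.0,      848.0,       6,         1),          # 14m8s
            ('01:AGGREGATE',  0.002746,   0.005205,    6,         1),
            ('00:SCAN KUDU',  153.0,      848.0,       6_100_000, -1),         # 2m33s vs 14m8s
        ]

        for op, avg_t, max_t, rows, est in summary:
            if avg_t > 0 and max_t / avg_t > 3:
                print(f'{op}: skewed, Max Time {max_t:.0f}s vs Avg Time {avg_t:.0f}s -> check per-host instances')
            if est > 0 and rows > 0 and max(rows, est) / min(rows, est) > 10:
                print(f'{op}: cardinality estimate off ({rows} vs {est}) -> check table/column stats')
            elif est < 0:
                print(f'{op}: Est. #Rows is -1, which usually means table statistics are missing')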

    So my next thought was to work out which hosts were running slower than the rest, and whether the slowness came from a single host. To do that, I searched the profile file for the string "id=0", where "0" is the operator number at the start of the "00:SCAN KUDU" row in the Summary section.

    Searching from the beginning of the file for "id=0" turned up the sections below; I have kept only the parts that matter here.
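    That filtering step can also be scripted. Here is a minimal sketch (the file name is a placeholder, and the regexes are assumptions based on the line layout shown below) that prints, for every KUDU_SCAN_NODE (id=0), the owning fragment instance's host together with the node's total time, so the slow host stands out at a glance.

        import re

        host = None
        with open('query_profile.txt') as f:                 # placeholder file name
            for line in f:
                m = re.search(r'Instance \S+ \(host=([^)]+)\)', line)
                if m:
                    host = m.group(1)                        # remember the current fragment instance's host
                m = re.search(r'KUDU_SCAN_NODE \(id=0\):\(Total: ([^,]+),', line)
                if m and host:
                    print(f'{host:25s} {m.group(1)}')        # e.g. rrs-hdp-dn04:22000  14m8s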

    ……

    Instance 2945a77ff619defe:b658730000000005 (host=rrs-hdp-dn04:22000):(Total: 14m8s, non-child: 2.048ms, % non-child: 0.00%)

    KUDU_SCAN_NODE (id=0):(Total: 14m8s, non-child: 14m8s, % non-child: 100.00%)

              BytesRead(16s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 299.00 KB (306176)

               - RowsRead: 1.02M (1016239)

               - RowsReturned: 1.02M (1016239)

               - RowsReturnedRate: 1.20 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 1 (1)

               - ScannerThreadsTotalWallClockTime: 14m8s

                 - MaterializeTupleTime(*): 14m8s

                 - ScannerThreadsSysTime: 49.992ms

                 - ScannerThreadsUserTime: 313.952ms

               - ScannerThreadsVoluntaryContextSwitches: 1.67K (1669)

               - TotalKuduScanRoundTrips: 1.56K (1560)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000002 (host=rrs-hdp-dn13:22000):(Total: 39s554ms, non-child: 3.704ms, % non-child: 0.01%)

    KUDU_SCAN_NODE (id=0):(Total: 39s477ms, non-child: 39s477ms, % non-child: 100.00%)

              BytesRead(1s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1016532)

               - RowsReturned: 1.02M (1016532)

               - RowsReturnedRate: 25.75 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 3 (3)

               - ScannerThreadsTotalWallClockTime: 39s478ms

                 - MaterializeTupleTime(*): 39s474ms

                 - ScannerThreadsSysTime: 6.998ms

                 - ScannerThreadsUserTime: 137.979ms

               - ScannerThreadsVoluntaryContextSwitches: 100 (100)

               - TotalKuduScanRoundTrips: 89 (89)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000006 (host=rrs-hdp-dn05:22000):(Total: 10s311ms, non-child: 2.554ms, % non-child: 0.02%)

    KUDU_SCAN_NODE (id=0):(Total: 10s235ms, non-child: 10s235ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1017341)

               - RowsReturned: 1.02M (1017341)

               - RowsReturnedRate: 99.39 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 2 (2)

               - ScannerThreadsTotalWallClockTime: 10s235ms

                 - MaterializeTupleTime(*): 10s231ms

                 - ScannerThreadsSysTime: 20.996ms

                 - ScannerThreadsUserTime: 107.983ms

               - ScannerThreadsVoluntaryContextSwitches: 34 (34)

               - TotalKuduScanRoundTrips: 31 (31)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000001 (host=rrs-hdp-dn03:22000):(Total: 9s700ms, non-child: 2.187ms, % non-child: 0.02%)

    KUDU_SCAN_NODE (id=0):(Total: 9s632ms, non-child: 9s632ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1017071)

               - RowsReturned: 1.02M (1017071)

               - RowsReturnedRate: 105.58 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 1 (1)

               - ScannerThreadsTotalWallClockTime: 9s632ms

                 - MaterializeTupleTime(*): 9s628ms

                 - ScannerThreadsSysTime: 10.998ms

                 - ScannerThreadsUserTime: 110.983ms

               - ScannerThreadsVoluntaryContextSwitches: 33 (33)

               - TotalKuduScanRoundTrips: 31 (31)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000004 (host=rrs-hdp-dn11:22000):(Total: 8s165ms, non-child: 2.354ms, % non-child: 0.03%)

    KUDU_SCAN_NODE (id=0):(Total: 8s090ms, non-child: 8s090ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1018748)

               - RowsReturned: 1.02M (1018748)

               - RowsReturnedRate: 125.92 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 2 (2)

               - ScannerThreadsTotalWallClockTime: 8s091ms

                 - MaterializeTupleTime(*): 8s085ms

                 - ScannerThreadsSysTime: 9.998ms

                 - ScannerThreadsUserTime: 114.982ms

               - ScannerThreadsVoluntaryContextSwitches: 31 (31)

               - TotalKuduScanRoundTrips: 28 (28)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000003 (host=rrs-hdp-dn08:22000):(Total: 6s024ms, non-child: 2.396ms, % non-child: 0.04%)

    KUDU_SCAN_NODE (id=0):(Total: 5s952ms, non-child: 5s952ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.01M (1014934)

               - RowsReturned: 1.01M (1014934)

               - RowsReturnedRate: 170.50 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 0 (0)

               - ScannerThreadsTotalWallClockTime: 5s952ms

                 - MaterializeTupleTime(*): 5s948ms

                 - ScannerThreadsSysTime: 8.998ms

                 - ScannerThreadsUserTime: 88.986ms

               - ScannerThreadsVoluntaryContextSwitches: 25 (25)

               - TotalKuduScanRoundTrips: 22 (22)

               - TotalReadThroughput: 0.00 /sec

    Here you can see six KUDU_SCAN_NODE sections with id=0. Why six? Because the Summary shows #Hosts = 6 for this operator:

    Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail

    ----------------------------------------------------------------------------------------------------------------------------------

    03:AGGREGATE        1  316.053us  316.053us      1           1   40.00 KB       10.00 MB  FINALIZE

    02:EXCHANGE         1      14m8s      14m8s      6           1          0              0  UNPARTITIONED

    01:AGGREGATE        6    2.746ms    5.205ms      6           1   67.00 KB       10.00 MB

    00:SCAN KUDU        6      2m33s      14m8s  6.10M          -1  299.00 KB              0  bms_ldgdb.ldg_690marketization_ledger t

    Now I can clearly see which server is the bottleneck: host rrs-hdp-dn04, which took 14 minutes 8 seconds, while every other host finished in roughly 5 to 40 seconds. Averaged across the 6 hosts, those per-instance times work out to almost exactly the 2m33s shown in the "Avg Time" column.
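    A quick sanity check of that average, using the per-instance KUDU_SCAN_NODE totals listed above (rounded to seconds by hand):

        # 14m8s, 39s477ms, 10s235ms, 9s632ms, 8s090ms and 5s952ms, in seconds
        totals_s = [848, 39.5, 10.2, 9.6, 8.1, 6.0]
        print(f'average: {sum(totals_s) / len(totals_s):.1f}s')   # -> 153.6s, i.e. roughly 2m33s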

    From this I concluded that something was wrong with server rrs-hdp-dn04 and that it needed to be looked at and fixed.

    Below are screenshots of the relevant pages: the table has 6 partitions, and since host rrs-hdp-dn04 was the one that took the longest, I went to rrs-hdp-dn04 to take a closer look.

    That wraps up the analysis. The cluster is managed by a dedicated operations team, so my goal here was only to understand why the query was so slow. I would have liked to go further, tune the cluster, adjust a few parameters and re-test, but my access is limited and I cannot log in to arbitrary servers, so I had to stop here. If I get the chance later, I will follow up with posts on parameter tuning and SQL optimization.
