A Guided Tour of the Impala Query Profile, Part 3


    In this third post of the series, I will continue to focus on the Query Plan and the Execution Summary.

    Now, let's look at the Planner Timeline and Query Timeline sections of the profile:

    Planner Timeline: 21.773ms

           - Analysis finished: 1.432ms (1.432ms)

           - Equivalence classes computed: 1.478ms (46.487us)

           - Single node plan created: 21.005ms (19.526ms)

           - Runtime filters computed: 21.020ms (14.599us)

           - Distributed plan created: 21.096ms (76.103us)

           - Lineage info computed: 21.156ms (59.924us)

           - Planning finished: 21.773ms (617.260us)

        Query Timeline: 14m8s

           - Query submitted: 250.799us (250.799us)

           - Planning finished: 24.359ms (24.108ms)

           - Submit for admission: 25.375ms (1.016ms)

           - Completed admission: 25.565ms (189.521us)

           - Ready to start on 6 backends: 26.997ms (1.432ms)

           - All 6 execution backends (7 fragment instances) started: 33.022ms (6.024ms)

           - Rows available: 14m8s (14m8s)

           - First row fetched: 14m8s (695.112ms)

           - Unregister query: 14m8s (5.330ms)

         - ComputeScanRangeAssignmentTimer: 110.741us

    The information on each line is easy to follow: planning the query took 21.773ms, the plan was submitted for admission at 25.375ms, execution started on the worker nodes at 33.022ms, the result rows were ready at 14m8s, and the first row was fetched by the client at 14m8s. This gives you a clear picture of how long each phase took; if any phase is slow it will stand out immediately, and we can then dig deeper to understand what might have happened.
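    To make the relationship between the cumulative timestamps and the per-step deltas in parentheses concrete, below is a minimal Python sketch (my own illustration, not part of Impala or its tooling) that parses the duration strings and recomputes each delta as the difference from the previous event. The duration parser is an assumption based on the formats seen above ("250.799us", "24.359ms", "14m8s").

        import re

        UNITS = {'h': 3600.0, 'm': 60.0, 's': 1.0, 'ms': 1e-3, 'us': 1e-6, 'ns': 1e-9}

        def to_seconds(text):
            """Convert an Impala duration string such as '14m8s' or '24.359ms' to seconds."""
            return sum(float(v) * UNITS[u] for v, u in re.findall(r'([\d.]+)(ms|us|ns|h|m|s)', text))

        timeline = [                                   # copied from the Query Timeline above
            ('Query submitted',                  '250.799us'),
            ('Planning finished',                '24.359ms'),
            ('Submit for admission',             '25.375ms'),
            ('Completed admission',              '25.565ms'),
            ('Ready to start on 6 backends',     '26.997ms'),
            ('All 6 execution backends started', '33.022ms'),
            ('Rows available',                   '14m8s'),
            ('First row fetched',                '14m8s'),
            ('Unregister query',                 '14m8s'),
        ]

        prev = 0.0
        for event, stamp in timeline:
            now = to_seconds(stamp)
            # Roughly reproduces the parenthesised delta; coarse values like "14m8s"
            # are rounded in the profile, so sub-second deltas are lost here.
            print(f'{event:35s} cumulative {now:10.3f}s   delta {now - prev:10.3f}s')
            prev = now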

    From the Query Timeline we can also see that getting from the start of execution (All 6 execution backends started) to data being available (Rows available) took almost 14 minutes (14m8s). Since this query is just a count(*), there is no Last row fetched entry.

    Contrast that with the following case, taken from another profile:

        Query Compilation: 16.268ms

           - Metadata of all 1 tables cached: 1.786ms (1.786ms)

           - Analysis finished: 6.162ms (4.376ms)

           - Value transfer graph computed: 6.537ms (374.918us)

           - Single node plan created: 7.955ms (1.417ms)

           - Runtime filters computed: 8.274ms (318.815us)

           - Distributed plan created: 8.430ms (156.307us)

           - Lineage info computed: 9.664ms (1.234ms)

           - Planning finished: 16.268ms (6.603ms)

        Query Timeline: 35m46s

           - Query submitted: 0.000ns (0.000ns)

           - Planning finished: 22.001ms (22.001ms)

           - Submit for admission: 23.001ms (1.000ms)

           - Completed admission: 23.001ms (0.000ns)

           - Ready to start on 2 backends: 24.001ms (1.000ms)

           - All 2 execution backends (2 fragment instances) started: 36.001ms (12.000ms)

           - Rows available: 5m51s (5m51s)

           - First row fetched: 5m52s (950.045ms)

           - Last row fetched: 35m46s (29m53s)

           - Released admission control resources: 35m46s (1.000ms)

           - Unregister query: 35m46s (30.001ms)

         - ComputeScanRangeAssignmentTimer: 0.000ns

    This Impala query had been running for a very long time, and the customer wanted to find out why. From the Query Timeline we can clearly see that getting from the start of execution (All 2 execution backends started) to data being available (Rows available) took almost 6 minutes (5m51s). Those 6 minutes of execution may be perfectly normal; for queries that join several large datasets, running for a few minutes is common.

    However, we can also see that Impala spent about 30 minutes delivering the data back to the client: the first row was fetched around the 6-minute mark, while the last row was not fetched until around the 36-minute mark. From this we can suspect a network issue between the Impala coordinator and the client, i.e. on the path the data takes when a client such as impala-shell or Hue fetches it from the coordinator host. Another possibility is that the client was doing extra work while fetching the results, such as printing them to the screen; since the returned dataset can be large, that alone can be time-consuming.
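    One way to tell these two possibilities apart is to time the fetch loop from the client side while suppressing any rendering. The sketch below is a hypothetical check that assumes the impyla client library, the default Impala HS2 port 21050, and placeholder host and query values (none of these come from the profile above); if the fetch loop still dominates even without printing anything, the client-to-coordinator path is the more likely culprit.

        # Hypothetical client-side check; replace the placeholder host and query.
        import time
        from impala.dbapi import connect

        conn = connect(host='impala-coordinator.example.com', port=21050)  # placeholder host
        cur = conn.cursor()

        t0 = time.time()
        cur.execute('SELECT * FROM some_db.some_table')   # placeholder query
        t1 = time.time()

        # Fetch in batches and do NOT print the rows, so client-side rendering
        # cost is taken out of the picture.
        fetched = 0
        while True:
            batch = cur.fetchmany(10000)
            if not batch:
                break
            fetched += len(batch)
        t2 = time.time()

        print(f'execute(): {t1 - t0:.1f}s, fetch of {fetched} rows: {t2 - t1:.1f}s')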

    Either way, this part of the profile points us in the right direction when hunting for the bottleneck.

    A few key points when reading the Execution Summary:

    1) Read it bottom-up, because that is the order in which Impala executes the operators.

    2) Compare the "Avg Time" and "Max Time" columns.

    3) Compare the "#Rows" and "Est. #Rows" columns.

    4) Check the "Detail" column to see the join type used by each operator.

    Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail

    ----------------------------------------------------------------------------------------------------------------------------------

    03:AGGREGATE        1  316.053us  316.053us      1           1   40.00 KB       10.00 MB  FINALIZE

    02:EXCHANGE         1      14m8s      14m8s      6           1          0              0  UNPARTITIONED

    01:AGGREGATE        6    2.746ms    5.205ms      6           1   67.00 KB       10.00 MB

    00:SCAN KUDU        6      2m33s      14m8s  6.10M          -1  299.00 KB              0  bms_ldgdb.ldg_690marketization_ledger t

    Straight away I noticed the large gap between "Avg Time" and "Max Time" for the SCAN KUDU operator: the average time across hosts was 2 minutes 33 seconds, but one of the 6 hosts took 14 minutes 8 seconds.
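    Checks 2) and 3) above are mechanical enough to script. Below is a rough Python illustration (my own sketch: the rows are copied from the Summary table and hand-converted to seconds, and the 3x/10x thresholds are arbitrary choices, not Impala rules) that flags exactly this kind of time skew, plus suspicious row-count estimates.

        summary = [
            # (operator,      avg_time_s, max_time_s,  rows,      est_rows)
            ('03:AGGREGATE',  0.000316,   0.000316,    1,         1),
            ('02:EXCHANGE',   848.0,      848.0,       6,         1),          # 14m8s
            ('01:AGGREGATE',  0.002746,   0.005205,    6,         1),
            ('00:SCAN KUDU',  153.0,      848.0,       6_100_000, -1),         # 2m33s vs 14m8s
        ]

        for op, avg_t, max_t, rows, est in summary:
            if avg_t > 0 and max_t / avg_t > 3:
                print(f'{op}: skewed, Max Time {max_t:.0f}s vs Avg Time {avg_t:.0f}s -> check per-host instances')
            if est > 0 and rows > 0 and max(rows, est) / min(rows, est) > 10:
                print(f'{op}: cardinality estimate off ({rows} vs {est}) -> check table/column stats')
            elif est < 0:
                print(f'{op}: Est. #Rows is -1, which usually means table statistics are missing')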

    So my next thought was to work out which hosts were running slower than the rest, and whether the slowness came from a single host. To do that, I searched the profile file for the string "id=0", where "0" is the operator number at the start of the "00:SCAN KUDU" row in the Summary section.

    Searching from the beginning of the file for "id=0" turned up the sections below; I have kept only the parts that matter here.
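    That filtering step can also be scripted. Here is a minimal sketch (the file name is a placeholder, and the regexes are assumptions based on the line layout shown below) that prints, for every KUDU_SCAN_NODE (id=0), the owning fragment instance's host together with the node's total time, so the slow host stands out at a glance.

        import re

        host = None
        with open('query_profile.txt') as f:                 # placeholder file name
            for line in f:
                m = re.search(r'Instance \S+ \(host=([^)]+)\)', line)
                if m:
                    host = m.group(1)                        # remember the current fragment instance's host
                m = re.search(r'KUDU_SCAN_NODE \(id=0\):\(Total: ([^,]+),', line)
                if m and host:
                    print(f'{host:25s} {m.group(1)}')        # e.g. rrs-hdp-dn04:22000  14m8s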

    ……

    Instance 2945a77ff619defe:b658730000000005 (host=rrs-hdp-dn04:22000):(Total: 14m8s, non-child: 2.048ms, % non-child: 0.00%)

    KUDU_SCAN_NODE (id=0):(Total: 14m8s, non-child: 14m8s, % non-child: 100.00%)

              BytesRead(16s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 299.00 KB (306176)

               - RowsRead: 1.02M (1016239)

               - RowsReturned: 1.02M (1016239)

               - RowsReturnedRate: 1.20 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 1 (1)

               - ScannerThreadsTotalWallClockTime: 14m8s

                 - MaterializeTupleTime(*): 14m8s

                 - ScannerThreadsSysTime: 49.992ms

                 - ScannerThreadsUserTime: 313.952ms

               - ScannerThreadsVoluntaryContextSwitches: 1.67K (1669)

               - TotalKuduScanRoundTrips: 1.56K (1560)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000002 (host=rrs-hdp-dn13:22000):(Total: 39s554ms, non-child: 3.704ms, % non-child: 0.01%)

    KUDU_SCAN_NODE (id=0):(Total: 39s477ms, non-child: 39s477ms, % non-child: 100.00%)

              BytesRead(1s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1016532)

               - RowsReturned: 1.02M (1016532)

               - RowsReturnedRate: 25.75 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 3 (3)

               - ScannerThreadsTotalWallClockTime: 39s478ms

                 - MaterializeTupleTime(*): 39s474ms

                 - ScannerThreadsSysTime: 6.998ms

                 - ScannerThreadsUserTime: 137.979ms

               - ScannerThreadsVoluntaryContextSwitches: 100 (100)

               - TotalKuduScanRoundTrips: 89 (89)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000006 (host=rrs-hdp-dn05:22000):(Total: 10s311ms, non-child: 2.554ms, % non-child: 0.02%)

    KUDU_SCAN_NODE (id=0):(Total: 10s235ms, non-child: 10s235ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1017341)

               - RowsReturned: 1.02M (1017341)

               - RowsReturnedRate: 99.39 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 2 (2)

               - ScannerThreadsTotalWallClockTime: 10s235ms

                 - MaterializeTupleTime(*): 10s231ms

                 - ScannerThreadsSysTime: 20.996ms

                 - ScannerThreadsUserTime: 107.983ms

               - ScannerThreadsVoluntaryContextSwitches: 34 (34)

               - TotalKuduScanRoundTrips: 31 (31)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000001 (host=rrs-hdp-dn03:22000):(Total: 9s700ms, non-child: 2.187ms, % non-child: 0.02%)

    KUDU_SCAN_NODE (id=0):(Total: 9s632ms, non-child: 9s632ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1017071)

               - RowsReturned: 1.02M (1017071)

               - RowsReturnedRate: 105.58 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 1 (1)

               - ScannerThreadsTotalWallClockTime: 9s632ms

                 - MaterializeTupleTime(*): 9s628ms

                 - ScannerThreadsSysTime: 10.998ms

                 - ScannerThreadsUserTime: 110.983ms

               - ScannerThreadsVoluntaryContextSwitches: 33 (33)

               - TotalKuduScanRoundTrips: 31 (31)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000004 (host=rrs-hdp-dn11:22000):(Total: 8s165ms, non-child: 2.354ms, % non-child: 0.03%)

    KUDU_SCAN_NODE (id=0):(Total: 8s090ms, non-child: 8s090ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.02M (1018748)

               - RowsReturned: 1.02M (1018748)

               - RowsReturnedRate: 125.92 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 2 (2)

               - ScannerThreadsTotalWallClockTime: 8s091ms

                 - MaterializeTupleTime(*): 8s085ms

                 - ScannerThreadsSysTime: 9.998ms

                 - ScannerThreadsUserTime: 114.982ms

               - ScannerThreadsVoluntaryContextSwitches: 31 (31)

               - TotalKuduScanRoundTrips: 28 (28)

               - TotalReadThroughput: 0.00 /sec

    ……

    Instance 2945a77ff619defe:b658730000000003 (host=rrs-hdp-dn08:22000):(Total: 6s024ms, non-child: 2.396ms, % non-child: 0.04%)

    KUDU_SCAN_NODE (id=0):(Total: 5s952ms, non-child: 5s952ms, % non-child: 100.00%)

              BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

               - BytesRead: 0

               - KuduRemoteScanTokens: 0 (0)

               - NumScannerThreadsStarted: 1 (1)

               - PeakMemoryUsage: 122.00 KB (124928)

               - RowsRead: 1.01M (1014934)

               - RowsReturned: 1.01M (1014934)

               - RowsReturnedRate: 170.50 K/sec

               - ScanRangesComplete: 1 (1)

               - ScannerThreadsInvoluntaryContextSwitches: 0 (0)

               - ScannerThreadsTotalWallClockTime: 5s952ms

                 - MaterializeTupleTime(*): 5s948ms

                 - ScannerThreadsSysTime: 8.998ms

                 - ScannerThreadsUserTime: 88.986ms

               - ScannerThreadsVoluntaryContextSwitches: 25 (25)

               - TotalKuduScanRoundTrips: 22 (22)

               - TotalReadThroughput: 0.00 /sec

    Here you can see six KUDU_SCAN_NODE sections with id=0. Why six? Because the Summary shows #Hosts = 6 for this operator:

    Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail

    ----------------------------------------------------------------------------------------------------------------------------------

    03:AGGREGATE        1  316.053us  316.053us      1           1   40.00 KB       10.00 MB  FINALIZE

    02:EXCHANGE         1      14m8s      14m8s      6           1          0              0  UNPARTITIONED

    01:AGGREGATE        6    2.746ms    5.205ms      6           1   67.00 KB       10.00 MB

    00:SCAN KUDU        6      2m33s      14m8s  6.10M          -1  299.00 KB              0  bms_ldgdb.ldg_690marketization_ledger t

    Now I can clearly see which server is the bottleneck: host rrs-hdp-dn04, which took 14 minutes 8 seconds, while every other host finished in roughly 5 to 40 seconds. Averaged across the 6 hosts, those per-instance times work out to almost exactly the 2m33s shown in the "Avg Time" column.
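    A quick sanity check of that average, using the per-instance KUDU_SCAN_NODE totals listed above (rounded to seconds by hand):

        # 14m8s, 39s477ms, 10s235ms, 9s632ms, 8s090ms and 5s952ms, in seconds
        totals_s = [848, 39.5, 10.2, 9.6, 8.1, 6.0]
        print(f'average: {sum(totals_s) / len(totals_s):.1f}s')   # -> 153.6s, i.e. roughly 2m33s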

    From this I concluded that something was wrong with server rrs-hdp-dn04 and that it needed to be looked at and fixed.

    Below are screenshots of the relevant pages: the table has 6 partitions, and since host rrs-hdp-dn04 was the one that took the longest, I went to rrs-hdp-dn04 to take a closer look.

    That wraps up the analysis. The cluster is managed by a dedicated operations team, so my goal here was only to understand why the query was so slow. I would have liked to go further, tune the cluster, adjust a few parameters and re-test, but my access is limited and I cannot log in to arbitrary servers, so I had to stop here. If I get the chance later, I will follow up with posts on parameter tuning and SQL optimization.
