1. Data preparation
join_d1.c1 <int> | join_d1.c2 <string> | join_d1.c3 <double> | join_d1.c4 <string>
1 | a | 1.1 | a
2 | b | 1.2 | b
3 | c | 1.3 | c
4 | d | 1.4 | d
2 | e | 1.5 | e
3 | f | 1.6 | f
2. Viewing the execution plan
hive> explain select c1,upper(c2) from join_d1 where c3 > '1.2';
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: join_d1
            Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (c3 > 1.3) (type: boolean)
              Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: c1 (type: int), c2 (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 10
                  Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 50 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 10
      Processor Tree:
        ListSink

Time taken: 0.041 seconds, Fetched: 35 row(s)

Note: the pasted plan shows the predicate (c3 > 1.3), a Limit of 10 and no upper() call, so it appears to come from a slightly different variant of the query. The analysis below follows the query as written, with c3 > 1.2.
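3. Analyzing the execution plan
Note: the Hive execution engine takes the SQL through lexical analysis and parsing to produce an AST, builds a QB (query block) from the AST, and then builds an OperatorTree. The OperatorTree is optimized and finally turned into MR (MapReduce) tasks.
Here we are interested in how the OperatorTree executes. Operators execute as a chain: each operator processes a row and hands it to the next one, which can be seen as the chain-of-responsibility pattern.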
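The chained execution described above can be sketched in a few lines of Python. This is only an illustration of the chain-of-responsibility idea, not Hive's actual Java implementation: the class names `Operator`, `FilterOp` and `SinkOp` and the methods `process` and `forward` are hypothetical names chosen for this sketch.

```python
class Operator:
    """Hypothetical base class: handle a row, then hand it to the child operator."""

    def __init__(self, child=None):
        self.child = child

    def process(self, row):
        # Default behaviour: pass the row through unchanged.
        self.forward(row)

    def forward(self, row):
        # Push the row to the next operator in the chain, if there is one.
        if self.child is not None:
            self.child.process(row)


class FilterOp(Operator):
    """Forwards only the rows for which the predicate is true."""

    def __init__(self, predicate, child=None):
        super().__init__(child)
        self.predicate = predicate

    def process(self, row):
        # A row that fails the predicate is dropped: it is simply not forwarded.
        if self.predicate(row):
            self.forward(row)


class SinkOp(Operator):
    """End of the chain: collects every row that reaches it."""

    def __init__(self):
        super().__init__(None)
        self.rows = []

    def process(self, row):
        self.rows.append(row)


# Toy chain: pass-through operator -> filter (even numbers only) -> sink.
sink = SinkOp()
chain = Operator(child=FilterOp(lambda value: value % 2 == 0, child=sink))
for value in [1, 2, 3, 4]:
    chain.process(value)

print(sink.rows)  # [2, 4]
```

The point of the design is that the driver only ever calls `process` on the head of the chain; whether a row reaches the end is decided by the operators in between.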
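First, this SQL is split into two stages.
Stage-0 is the FetchTask, and it depends on Stage-1. The FetchTask creates a FetchOperator, which reads the data from HDFS row by row and pushes each row into the operator chain.
Stage-1 is the chain TableScanOperator -> FilterOperator -> SelectOperator -> ..., and the result is finally output by the ListSinkOperator of Stage-0.
Strictly speaking, the plan above runs Stage-1 as a Map Reduce job that ends in a File Output Operator, and the Fetch Operator of Stage-0 reads that output. The walkthrough below simplifies this into a single row-by-row pipeline.
The process is as follows: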
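FetchOperator reads the first row 1,a,1.1,a from HDFS -> TableScanOperator (a limit can be applied here) -> FilterOperator (1.1 > 1.2) is false, so the row is discarded; move on to the next row
FetchOperator reads the second row 2,b,1.2,b from HDFS -> TableScanOperator (a limit can be applied here) -> FilterOperator (1.2 > 1.2) is false, so the row is discarded; move on to the next row
FetchOperator reads the third row 3,c,1.3,c from HDFS -> TableScanOperator (a limit can be applied here) -> FilterOperator (1.3 > 1.2) is true -> SelectOperator selects c1 and upper(c2) -> ListSinkOperator
... and so on for the remaining rows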
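The row-by-row walkthrough can be replayed on the six sample rows with a small Python simulation. It is a simplified sketch, not Hive code: the function names (`fetch`, `table_scan`, `filter_op`, `select_op`, `list_sink`) are made up for the illustration, and the string literal '1.2' from the query is assumed to be compared as the number 1.2.

```python
# The six rows of join_d1 from the data preparation section: (c1, c2, c3, c4).
ROWS = [
    (1, "a", 1.1, "a"),
    (2, "b", 1.2, "b"),
    (3, "c", 1.3, "c"),
    (4, "d", 1.4, "d"),
    (2, "e", 1.5, "e"),
    (3, "f", 1.6, "f"),
]


def list_sink(row, out):
    # End of the chain: collect the output row.
    out.append(row)


def select_op(row, out):
    # select c1, upper(c2)
    c1, c2, _c3, _c4 = row
    list_sink((c1, c2.upper()), out)


def filter_op(row, out):
    # where c3 > 1.2; a row that fails the predicate goes no further.
    if row[2] > 1.2:
        select_op(row, out)


def table_scan(row, out):
    # Entry point of the chain; here it just forwards the row.
    filter_op(row, out)


def fetch(rows):
    # Stand-in for FetchOperator: read each row and push it into the chain.
    out = []
    for row in rows:
        table_scan(row, out)
    return out


result = fetch(ROWS)
print(result)  # [(3, 'C'), (4, 'D'), (2, 'E'), (3, 'F')]
```

The first two rows stop at the filter, exactly as in the walkthrough, and the remaining four reach the sink with c2 upper-cased.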