SlideShare a Scribd company logo
1 of 58
Download to read offline
Debugging Planning Issues
Using Calcite's Built-in Loggers
Stamatis Zampetakis
Alessandro Solimando
Virtual Meetup
March 15, 2023
Alessandro Solimando @asolimando
Senior Software Engineer @ Rakuten*
Committer of Apache Calcite & Apache Hive
PhD in Data Management, University of Genoa
(* the work presented here was done while at Cloudera)
Stamatis Zampetakis @szampetak
Staff Software Engineer @ Cloudera, Hive query optimizer team
PMC member of Apache Calcite & Apache Hive
PhD in Data Management, INRIA & Paris-Sud University
About us
Outline
● Motivation / Common planning issues
● Calcite’s built-in logging (RuleEventLogger)
○ Configuration (XML/Properties)
○ Output explained
● Hive case studies:
○ Hanging TPC-DS queries
○ OutOfMemoryError (HIVE-25758)
○ Wrong results (HIVE-26722)
● Conclusion
Common Planning Issues
Issue Diagnostic Information
OutOfMemoryError Heap dumps
StackOverflowError Stack traces
Unresponsive server Stack traces
Very slow (or hanging) queries Stack traces
Wrong results Query plans
Query crashes (NPE, AssertionError, ClassCastException) Stack traces
Common Planning Issues
● Many times the root cause lies in rule based transformations
Issue Diagnostic Information
OutOfMemoryError Heap dumps
StackOverflowError Stack traces
Unresponsive server Stack traces
Very slow (or hanging) queries Stack traces
Wrong results Query plans
Query crashes (NPE, AssertionError, ClassCastException) Stack traces
Common Planning Issues
● Many times the root cause lies in rule based transformations
● Good logs are key for finding this root cause
Issue Diagnostic Information
OutOfMemoryError Heap dumps
StackOverflowError Stack traces
Unresponsive server Stack traces
Very slow (or hanging) queries Stack traces
Wrong results Query plans
Query crashes (NPE, AssertionError, ClassCastException) Stack traces
Common Planning Issues
● Interactive debugging not always feasible:
○ CI/CU environment with restricted access
○ Data specific problem
● Debugging planning issues interactively (without logs) is difficult:
○ Large number of transformations
○ Complex transformation logic
○ Lots of intermediate objects
○ Repeated occurrences of transformations
Built-in logging: RuleEventLogger
RuleEventLogger - XML Configuration
https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration
Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.xml file
<Configuration>
<Appenders>
<Console name="A1" target="SYSTEM_OUT">
<PatternLayout
pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="A1"/>
</Root>
<logger name="org.apache.calcite.plan.RelOptPlanner"level="DEBUG">
<MarkerFilter marker="FULL_PLAN" onMatch="DENY" onMismatch="NEUTRAL"/>
</logger>
</Loggers>
</Configuration>
RuleEventLogger - XML Configuration
https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration
Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.xml file
<Configuration>
<Appenders>
<Console name="A1" target="SYSTEM_OUT">
<PatternLayout
pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="A1"/>
</Root>
<logger name="org.apache.calcite.plan.RelOptPlanner"level="DEBUG">
<MarkerFilter marker="FULL_PLAN" onMatch="DENY" onMismatch="NEUTRAL"/>
</logger>
</Loggers>
</Configuration>
RuleEventLogger - XML Configuration
https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration
Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.xml file
<Configuration>
<Appenders>
<Console name="A1" target="SYSTEM_OUT">
<PatternLayout
pattern="%m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="A1"/>
</Root>
<logger name="org.apache.calcite.plan.RelOptPlanner"level="DEBUG">
<MarkerFilter marker="FULL_PLAN" onMatch="DENY" onMismatch="NEUTRAL"/>
</logger>
</Loggers>
</Configuration>
CALCITE-4704 (1.29.0)
CALCITE-4991 (1.30.0)
RuleEventLogger - Properties Configuration
appenders = A1
appender.A1.type = Console
appender.A1.name = A1
appender.A1.target = SYSTEM_OUT
appender.A1.layout.type = PatternLayout
appender.A1.layout.pattern = %m%n
loggers = CBORuleLogger
rootLogger.level = INFO
rootLogger.appenderRefs = A1
rootLogger.appenderRef.A1.ref = A1
logger.CBORuleLogger.name = org.apache.calcite.plan.RelOptPlanner
logger.CBORuleLogger.level = DEBUG
logger.CBORuleLogger.filter.marker.type= MarkerFilter
logger.CBORuleLogger.filter.marker.marker= FULL_PLAN
logger.CBORuleLogger.filter.marker.onMatch= DENY
logger.CBORuleLogger.filter.marker.onMismatch= NEUTRAL
https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration
Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.properties file
CALCITE-4704 (1.29.0)
CALCITE-4991 (1.30.0)
RuleEventLogger - Properties Configuration (HIVE)
appenders = A1
appender.A1.type = Console
appender.A1.name = A1
appender.A1.target = SYSTEM_OUT
appender.A1.layout.type = PatternLayout
appender.A1.layout.pattern = %m%n
loggers = …,CBORuleLogger
rootLogger.level = INFO
rootLogger.appenderRefs = A1
rootLogger.appenderRef.A1.ref = A1
logger.CBORuleLogger.name = org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger
logger.CBORuleLogger.level = DEBUG
logger.CBORuleLogger.filter.marker.type= MarkerFilter
logger.CBORuleLogger.filter.marker.marker= FULL_PLAN
logger.CBORuleLogger.filter.marker.onMatch= DENY
logger.CBORuleLogger.filter.marker.onMismatch= NEUTRAL
Hive tests: Modify data/conf/hive-log4j2.properties file
HIVE-25816
RuleEventLogger - Example
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
logger.CBORuleLogger.filter.marker.onMatch= DENY
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate]
call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject]
call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject]
call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject]
call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject]
call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject]
logger.CBORuleLogger.filter.marker.onMatch= DENY
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate]
call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject]
call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject]
call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject]
call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject]
call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject]
logger.CBORuleLogger.filter.marker.onMatch= DENY
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate]
call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject]
call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject]
call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject]
call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject]
call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject]
logger.CBORuleLogger.filter.marker.onMatch= DENY
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate]
call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject]
call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject]
call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject]
call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject]
call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject]
logger.CBORuleLogger.filter.marker.onMatch= DENY
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate]
call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject]
call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject]
call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject]
call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject]
call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject]
logger.CBORuleLogger.filter.marker.onMatch= DENY
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate]
call#0: Full plan for rule input [rel#11:LogicalAggregate]:
LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT($2)])
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject]
call#0: Full plan for [rel#15:LogicalProject]:
LogicalProject(EMPNO=[$0], DEPTNO=[$1], $f2=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)])
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
logger.CBORuleLogger.filter.marker.onMatch= ACCEPT
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate]
call#0: Full plan for rule input [rel#11:LogicalAggregate]:
LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT($2)])
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject]
call#0: Full plan for [rel#15:LogicalProject]:
LogicalProject(EMPNO=[$0], DEPTNO=[$1], $f2=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)])
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
logger.CBORuleLogger.filter.marker.onMatch= ACCEPT
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject]
call#1: Full plan for rule input [rel#13:LogicalProject]:
LogicalProject(EMPNO=[$0], EXPR$1=[$2])
LogicalProject(EMPNO=[$0], DEPTNO=[$1], $f2=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)])
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
call#1: Full plan for rule input [rel#15:LogicalProject]:
...
call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject]
call#1: Full plan for [rel#17:LogicalProject]:
LogicalProject(EMPNO=[$0], EXPR$1=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)])
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
logger.CBORuleLogger.filter.marker.onMatch= ACCEPT
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
RuleEventLogger - Output Explained
./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3
RelOptRulesTest > testAggregateRemove3() STANDARD_OUT
call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject]
call#2: Full plan for rule input [rel#17:LogicalProject]:
LogicalProject(EMPNO=[$0], EXPR$1=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)])
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
call#2: Full plan for rule input [rel#9:LogicalProject]:
LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject]
call#2: Full plan for [rel#19:LogicalProject]:
LogicalProject(EMPNO=[$0], EXPR$1=[CASE(IS NOT NULL($3), 1:BIGINT, 0:BIGINT)])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
logger.CBORuleLogger.filter.marker.onMatch= ACCEPT
SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
Hive Case Studies
Hanging TPC-DS queries
● Context: Upgrade Calcite version from 1.25.0 to 1.33.0
● Symptom: TPCDS queries hanging
Hanging TPC-DS queries
● Context: Upgrade Calcite version from 1.25.0 to 1.33.0
● Symptom: TPCDS queries hanging (e.g., query13)
select avg(ss_quantity),
avg(ss_ext_sales_price),
avg(ss_ext_wholesale_cost),
sum(ss_ext_wholesale_cost)
from store_sales
,store
,customer_demographics
,household_demographics
,customer_address
,date_dim
where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and (
(ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M'
and cd_education_status = '4 yr Degree' and ss_sales_price between 100.00 and 150.00 and hd_dep_count = 3) or
(ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'D'
and cd_education_status = 'Primary' and ss_sales_price between 50.00 and 100.00 and hd_dep_count = 1) or
(ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'U'
and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00 and hd_dep_count = 1))
and(
(ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM')
and ss_net_profit between 100 and 200) or
(ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN')
and ss_net_profit between 150 and 300) or
(ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV')
and ss_net_profit between 50 and 250));
Hanging TPC-DS queries
● Context: Upgrade Calcite version from 1.25.0 to 1.33.0
● Symptom: TPCDS queries hanging (e.g., query13)
select avg(ss_quantity),
avg(ss_ext_sales_price),
avg(ss_ext_wholesale_cost),
sum(ss_ext_wholesale_cost)
from store_sales
,store
,customer_demographics
,household_demographics
,customer_address
,date_dim
where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and (
(ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M'
and cd_education_status = '4 yr Degree' and ss_sales_price between 100.00 and 150.00 and hd_dep_count = 3) or
(ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'D'
and cd_education_status = 'Primary' and ss_sales_price between 50.00 and 100.00 and hd_dep_count = 1) or
(ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'U'
and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00 and hd_dep_count = 1))
and(
(ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM')
and ss_net_profit between 100 and 200) or
(ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN')
and ss_net_profit between 150 and 300) or
(ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV')
and ss_net_profit between 50 and 250));
Hanging TPC-DS queries
● Why does it take so long?
● Where is it stuck?
● What does it do?
● Gather information:
○ Profile the application (async-profiler)
○ Collect stack traces (jstack)
○ Check the logs
Hanging TPC-DS queries
● Why does it take so long?
● Where is it stuck?
● What does it do?
● Gather information:
○ Profile the application (async-profiler)
○ Collect stack traces (jstack)
○ Check the logs
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.compare(TreeMap.java:1294)
at java.util.TreeMap.put(TreeMap.java:538)
at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272)
at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222)
at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225)
at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64)
at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41)
at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778)
at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761)
at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279)
at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.compare(TreeMap.java:1294)
at java.util.TreeMap.put(TreeMap.java:538)
at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272)
at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222)
at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225)
at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64)
at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41)
at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778)
at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761)
at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279)
at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
Depth:~ 2K
Lines
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.compare(TreeMap.java:1294)
at java.util.TreeMap.put(TreeMap.java:538)
at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272)
at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222)
at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225)
at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64)
at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41)
at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778)
at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761)
at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279)
at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
Depth:~ 2K
Lines
From stack + code we can infer that the plan has more than 1K nested Filter operators
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.compare(TreeMap.java:1294)
at java.util.TreeMap.put(TreeMap.java:538)
at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272)
at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222)
at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225)
at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64)
at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41)
at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793)
at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778)
at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761)
at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279)
at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248)
at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source)
at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source)
at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
Depth:~ 2K
Lines
From stack + code we can infer that the plan has more than 1K nested Filter operators
What creates the operators? Why?
Hanging TPC-DS queries - Logs to the rescue
grep -A 10 "Rule.*produced" hive.log
Hanging TPC-DS queries - Logs to the rescue
2022-10-07T05:52:23,575 DEBUG calcite.RuleEventLogger: call#1: Rule [HivePreFilteringRule] produced [rel#84:HiveFilter]
2022-10-07T05:52:23,576 DEBUG calcite.RuleEventLogger: call#1: Full plan for [rel#84:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
HiveTableScan(table=[[default, store]], table:alias=[store])
--
2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Rule [HivePreFilteringRule] produced [rel#89:HiveFilter]
2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Full plan for [rel#89:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
--
2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Rule [HivePreFilteringRule] produced [rel#94:HiveFilter]
2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Full plan for [rel#94:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
Hanging TPC-DS queries - Logs to the rescue
2022-10-07T05:52:23,575 DEBUG calcite.RuleEventLogger: call#1: Rule [HivePreFilteringRule] produced [rel#84:HiveFilter]
2022-10-07T05:52:23,576 DEBUG calcite.RuleEventLogger: call#1: Full plan for [rel#84:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
HiveTableScan(table=[[default, store]], table:alias=[store])
--
2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Rule [HivePreFilteringRule] produced [rel#89:HiveFilter]
2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Full plan for [rel#89:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
--
2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Rule [HivePreFilteringRule] produced [rel#94:HiveFilter]
2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Full plan for [rel#94:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
Hanging TPC-DS queries - Root cause
2022-10-07T05:52:23,575 DEBUG calcite.RuleEventLogger: call#1: Rule [HivePreFilteringRule] produced [rel#84:HiveFilter]
2022-10-07T05:52:23,576 DEBUG calcite.RuleEventLogger: call#1: Full plan for [rel#84:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
HiveTableScan(table=[[default, store]], table:alias=[store])
--
2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Rule [HivePreFilteringRule] produced [rel#89:HiveFilter]
2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Full plan for [rel#89:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
--
2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Rule [HivePreFilteringRule] produced [rel#94:HiveFilter]
2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Full plan for [rel#94:HiveFilter]:
HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '),
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
HivePreFilteringRule matches infinitely and creates identical filters multiple times
Hanging TPC-DS queries - Interactive debugging
OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U'))
Hanging TPC-DS queries - Interactive debugging
OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U'))
// 3. If the new conjuncts are already present in the plan, we bail out
final List<RexNode> newConjuncts =
HiveCalciteUtil.getPredsNotPushedAlready(filter.getInput(),operandsToPushDown);
Hanging TPC-DS queries - Interactive debugging
OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U'))
SEARCH($62, Sarg[_UTF-16LE'D', _UTF-16LE'M', _UTF-16LE'U']:CHAR(1))
// 3. If the new conjuncts are already present in the plan, we bail out
final List<RexNode> newConjuncts =
HiveCalciteUtil.getPredsNotPushedAlready(filter.getInput(),operandsToPushDown);
mq.getPulledUpPredicates(inp).pulledUpPredicates
Hanging TPC-DS queries - Interactive debugging
OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U'))
SEARCH($62, Sarg[_UTF-16LE'D', _UTF-16LE'M', _UTF-16LE'U']:CHAR(1))
Hanging TPC-DS queries - Interactive debugging
OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U'))
SEARCH($62, Sarg[_UTF-16LE'D', _UTF-16LE'M', _UTF-16LE'U']:CHAR(1))
CALCITE-4173 Add internal SEARCH operator and Sarg literal, replacing use of IN in RexCall (1.26.0)
CALCITE-5036 RelMetadataQuery#getPulledUpPredicates support to analyze constant key for the operator of
IS_NOT_DISTINCT_FROM (1.31.0)
Check What CBO Rules Are Applied
Command: grep --no-filename "produced" hive.log | cut -c 100-
Output format: call#$X: Rule [$RULE_DESCRIPTION] produced [rel#$Y:$REL_KIND]
Sample output:
call#1: Rule [HivePreFilteringRule] produced [rel#73:HiveFilter]
call#12: Rule [ReduceExpressionsRule(Filter)] produced [rel#76:HiveFilter]
call#14: Rule [FilterCondition] produced [rel#78:HiveFilter]
call#17: Rule [ReduceExpressionsRule(Filter)] produced [rel#80:HiveFilter]
call#21: Rule [FilterCondition] produced [rel#82:HiveFilter]
call#29: Rule [HiveProjectFilterPullUpConstantsRule] produced [rel#84:HiveProject]
call#46: Rule [HiveJoinAddNotNullRule] produced [rel#89:HiveJoin]
Once you have the rel number ($Y), you can look it up in the logs to see it printed in full
If you are interested in a particular rule application, search for the call ($X), input and output rels are printed close to the
corresponding “produced” line
OutOfMemory at Planning Time
● We look for a sequence of rule applications which loops from a certain point
of the query planning process
● Run “EXPLAIN” against the offending query and look at the rules which are
invoked in the logs to identify the loop
● Once identified, find the first occurrence of applications of this list, and start
analyzing the plans generated right before/after it
● Note: the rel triggering the loop might also be created outside this sequence
OutOfMemory at Planning Time - Example (HIVE-25758)
SELECT c.month, d.con_usd
FROM
(SELECT
cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(),
'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month
FROM test1
UNION ALL
SELECT month
FROM test2
WHERE month = 202110
) c
JOIN test3 d ON c.month = d.mth;
● Symptoms: OOM due to the creation of ever increasingly complex filters
OutOfMemory at Planning Time - Example (HIVE-25758)
HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not
available])
HiveUnion(all=[true])
HiveProject(month=[CAST(regexp_replace(...):INTEGER**])
HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...):INTEGER)])
HiveTableScan(table=[[default, test1]], table:alias=[test1])
HiveProject(month=[CAST(202110):INTEGER])
HiveFilter(condition=[=($0, 202110)])
HiveTableScan(table=[[default, test2]], table:alias=[test2])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, test3]], table:alias=[d])
** Abbreviation of “CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
_UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647)
CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET
"UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET
"UTF-16LE")):INTEGER”
OutOfMemory at Planning Time - Example (HIVE-25758)
call#198: Rule [HiveJoinPushTransitivePredicatesRule] produced [rel#136:HiveJoin]
call#212: Rule [HiveFilterSetOpTransposeRule] produced [rel#142:HiveUnion]
call#223: Rule [HiveFilterProjectTransposeRule] produced [rel#148:HiveProject]
call#228: Rule [HiveFilterMergeRule] produced [rel#152:HiveFilter]
call#239: Rule [HiveFilterProjectTransposeRule] produced [rel#153:HiveProject]
call#258: Rule [HiveJoinPushTransitivePredicatesRule] produced [rel#155:HiveJoin]
call#273: Rule [HiveFilterProjectTransposeRule] produced [rel#160:HiveProject]
call#278: Rule [HiveFilterMergeRule] produced [rel#164:HiveFilter]
call#283: Rule [ReduceExpressionsRule(Filter)] produced [rel#166:HiveFilter]
RHS → LHS
LHS → RHS
OutOfMemory at Planning Time - Example (HIVE-25758)
HiveJoin(condition=[=($0, $1)], joinType=[inner], [...])
HiveUnion(all=[true])
HiveProject(month=[CAST(regexp_replace(...)):INTEGER**])
HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...):INTEGER)])
HiveTableScan(table=[[default, test1]], table:alias=[test1])
HiveProject(month=[CAST(202110):INTEGER])
HiveFilter(condition=[=($0, 202110)])
HiveTableScan(table=[[default, test2]], table:alias=[test2])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, test3]], table:alias=[d])
** Abbreviation of “CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
_UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647)
CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET
"UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET
"UTF-16LE")):INTEGER”
OutOfMemory at Planning Time - Example (HIVE-25758)
HiveJoin(condition=[=($0, $1)], joinType=[inner], [...])
HiveUnion(all=[true])
HiveProject(month=[CAST(regexp_replace(...)):INTEGER])
HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...)):INTEGER)])
HiveTableScan(table=[[default, test1]], table:alias=[test1])
HiveProject(month=[CAST(202110):INTEGER])
HiveFilter(condition=[=($0, 202110)])
HiveTableScan(table=[[default, test2]], table:alias=[test2])
HiveFilter(condition=[OR(=($0, CAST(regexp_replace(...)):INTEGER), =($0,
202110))])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, test3]], table:alias=[d])
call#258: Rule [HiveJoinPushTransitivePredicatesRule] produced [rel#155:HiveJoin]
OutOfMemory at Planning Time - Example (HIVE-25758)
HiveJoin(condition=[=($0, $1)], joinType=[inner], [...])
HiveUnion(all=[true])
HiveProject(month=[CAST(regexp_replace(...)):INTEGER])
HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...)):INTEGER)])
HiveTableScan(table=[[default, test1]], table:alias=[test1])
HiveProject(month=[CAST(202110):INTEGER])
HiveFilter(condition=[=($0, 202110)])
HiveTableScan(table=[[default, test2]], table:alias=[test2])
HiveFilter(condition=[AND(OR(=($0, CAST(regexp_replace(...)):INTEGER), =($0,
202110)), IS NOT NULL($0))])
HiveTableScan(table=[[default, test3]], table:alias=[d])
call#278: Rule [HiveFilterMergeRule] produced [rel#164:HiveFilter]
OutOfMemory at Planning Time - Example (HIVE-25758)
HiveJoin(condition=[=($0, $1)], joinType=[inner], [...])
HiveUnion(all=[true])
HiveProject(month=[CAST(regexp_replace(...)):INTEGER])
HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...)):INTEGER)])
HiveTableScan(table=[[default, test1]], table:alias=[test1])
HiveProject(month=[CAST(202110):INTEGER])
HiveFilter(condition=[=($0, 202110)])
HiveTableScan(table=[[default, test2]], table:alias=[test2])
HiveFilter(condition=[AND(OR(=($0, CAST(regexp_replace(...)):INTEGER), =($0,
202110)), IS NOT NULL($0))])
HiveTableScan(table=[[default, test3]], table:alias=[d])
ReduceExpressionsRule(Filter) can’t simplify the predicate
OutOfMemory at Planning Time - Example (HIVE-25758)
HiveFilter(condition=[
AND(
IN($0, CAST(regexp_replace(...)):INTEGER, 202110),
OR(
AND(
OR(
IS NOT NULL(CAST(regexp_replace(...)):INTEGER),
=(CAST(regexp_replace(...)):INTEGER, 202110)
),
=($0, CAST(regexp_replace(...)):INTEGER)
),
=($0, 202110)
)
)
])
Incomplete / Incorrect Plan
● Usually boils down to identifying the first rel that looks “bad”, and trace it back
by looking for the rel#, which is unique, and should appear in the preceding
part of the logs
● Resolution highly depends on the specific issue, but with a precise call# or
rel# it’s possible to set a conditional breakpoint (rules can be invoked multiple
times before producing the issue)
Incomplete / Incorrect Plan Hive - Example (HIVE-26722)
● OK:
SELECT * FROM (
SELECT a, b FROM t
UNION ALL
SELECT a, NULL FROM t
) AS t2 WHERE a = 1;
● KO: (missing results)
SELECT * FROM (
SELECT a, b FROM t
UNION ALL
SELECT a, cast(NULL as STRING) FROM t
) AS t2 WHERE a = 1;
HiveUnion(all=[true])
HiveProject(a=[$0], b=[$1])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1)])
HiveTableScan(table=[[default, t]], table:alias=[t])
HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647)
CHARACTER SET "UTF-16LE"])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1)])
HiveTableScan(table=[[default, t]], table:alias=[t])
HiveProject(a=[$0], b=[$1])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1)])
HiveTableScan(table=[[default, t]], table:alias=[t])
Incomplete / Incorrect Plan Hive - Example (HIVE-26722)
call#5: Rule [HiveFilterProjectTransposeRule]
produced [rel#41:HiveFilter]
call#6: Rule [HiveFilterSetOpTransposeRule]
produced [rel#43:HiveFilter
]
call#7: Rule [HiveFilterProjectTransposeRule]
produced [rel#47:HiveProject]
call#24: Rule [HivePartitionPruneRule(Filter)]
produced [rel#52:HiveFilter]
call#32: Rule [HiveFieldTrimmerRule] produced
[rel#70:HiveFilter]
call#41: Rule [HiveFilterProjectTSTransposeRule]
produced [rel#77:HiveProject]
call#43: Rule [HiveCardinalityPreservingJoinRule]
produced [rel#87:HiveProject]
call#5: Rule [HiveFilterProjectTransposeRule]
produced [rel#47:HiveFilter]
call#6: Rule [HiveFilterSetOpTransposeRule]
produced [rel#51:HiveUnion
]
call#7: Rule [HiveFilterProjectTransposeRule]
produced [rel#57:HiveProject]
call#13: Rule [HiveFilterProjectTransposeRule]
produced [rel#62:HiveProject]
call#15: Rule [ReduceExpressionsRule(Project)]
produced [rel#66:HiveProject]
call#19: Rule [HiveFilterProjectTransposeRule]
produced [rel#69:HiveProject]
call#49: Rule [HivePartitionPruneRule(Filter)]
produced [rel#73:HiveFilter]
call#64: Rule [HiveFieldTrimmerRule] produced
...
Disabling Rules via Configuration
● AbstractRelOptPlanner#setRuleDescExclusionFilter allows to exclude rules
based on a regex over their description, even if registered in the planner
● Benefits:
○ It avoids recompiling to exclude some rules from planning while troubleshooting
○ Can be a quick workaround to customers once the faulty rule(s) is identified
○ Can be activated “per-query”
○ Less invasive than disabling CBO altogether (e.g., “hive.cbo.enable=false”)
● HIVE-25880: “Add property to exclude CBO rules by a regex on their
description”
● Example:
set hive.cbo.rule.exclusion.regex=HiveJoinPushTransitivePredicatesRule|HivePreFilteringRule;
Conclusion
● Many planning issues boil down to rule transformations making plans bigger
and bigger
● Built-in loggers indispensable for fast troubleshooting
● Easy configuration via XML/Property files (Log4j2)
● Configurable verbosity via FULL_PLAN marker
● Common pain points:
○ Using multiple equivalent operators (OR, SEARCH, IN, etc.)
○ Push/Pull predicate logic in various rules
○ Inconsistent simplifications during planning
● Exploit AbstractRelOptPlanner#setRuleDescExclusionFilter for quick
workarounds and hypotheses verification
Thank you

More Related Content

Similar to Debugging Planning Issues Using Calcite's Built-in Loggers

OSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at Runtime
OSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at RuntimeOSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at Runtime
OSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at RuntimeNETWAYS
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsGraham Dumpleton
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To PrometheusEtienne Coutaud
 
手把手教你如何串接 Log 到各種網路服務
手把手教你如何串接 Log 到各種網路服務手把手教你如何串接 Log 到各種網路服務
手把手教你如何串接 Log 到各種網路服務Mu Chun Wang
 
How Many Ways Can I Manage Oracle GoldenGate?
How Many Ways Can I Manage Oracle GoldenGate?How Many Ways Can I Manage Oracle GoldenGate?
How Many Ways Can I Manage Oracle GoldenGate?Enkitec
 
Dot Net Application Monitoring
Dot Net Application MonitoringDot Net Application Monitoring
Dot Net Application MonitoringRavi Okade
 
Garelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoringGarelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoringJano Suchal
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profilerIhor Bobak
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performanceEngine Yard
 
Native Java with GraalVM
Native Java with GraalVMNative Java with GraalVM
Native Java with GraalVMSylvain Wallez
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]Igor Lozynskyi
 
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...Brian Brazil
 
How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?Wojciech Barczyński
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2Alex Zaballa
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2Alex Zaballa
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek PROIDEA
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
Squeezing Blood From a Stone V1.2
Squeezing Blood From a Stone V1.2Squeezing Blood From a Stone V1.2
Squeezing Blood From a Stone V1.2Jen Costillo
 
.NET @ apache.org
 .NET @ apache.org .NET @ apache.org
.NET @ apache.orgTed Husted
 

Similar to Debugging Planning Issues Using Calcite's Built-in Loggers (20)

OSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at Runtime
OSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at RuntimeOSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at Runtime
OSMC 2021 | inspectIT Ocelot: Dynamic OpenTelemetry Instrumentation at Runtime
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To Prometheus
 
手把手教你如何串接 Log 到各種網路服務
手把手教你如何串接 Log 到各種網路服務手把手教你如何串接 Log 到各種網路服務
手把手教你如何串接 Log 到各種網路服務
 
How Many Ways Can I Manage Oracle GoldenGate?
How Many Ways Can I Manage Oracle GoldenGate?How Many Ways Can I Manage Oracle GoldenGate?
How Many Ways Can I Manage Oracle GoldenGate?
 
Dot Net Application Monitoring
Dot Net Application MonitoringDot Net Application Monitoring
Dot Net Application Monitoring
 
Garelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoringGarelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoring
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profiler
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performance
 
Sprint 71
Sprint 71Sprint 71
Sprint 71
 
Native Java with GraalVM
Native Java with GraalVMNative Java with GraalVM
Native Java with GraalVM
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]
 
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire...
 
How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Squeezing Blood From a Stone V1.2
Squeezing Blood From a Stone V1.2Squeezing Blood From a Stone V1.2
Squeezing Blood From a Stone V1.2
 
.NET @ apache.org
 .NET @ apache.org .NET @ apache.org
.NET @ apache.org
 

Recently uploaded

SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 

Recently uploaded (20)

SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 

Debugging Planning Issues Using Calcite's Built-in Loggers

  • 1. Debugging Planning Issues Using Calcite's Built-in Loggers Stamatis Zampetakis Alessandro Solimando Virtual Meetup March 15, 2023
  • 2. Alessandro Solimando @asolimando Senior Software Engineer @ Rakuten* Committer of Apache Calcite & Apache Hive PhD in Data Management, University of Genoa (* the work presented here was done while at Cloudera) Stamatis Zampetakis @szampetak Staff Software Engineer @ Cloudera, Hive query optimizer team PMC member of Apache Calcite & Apache Hive PhD in Data Management, INRIA & Paris-Sud University About us
  • 3. Outline ● Motivation / Common planning issues ● Calcite’s built-in logging (RuleEventLogger) ○ Configuration (XML/Properties) ○ Output explained ● Hive case studies: ○ Hanging TPC-DS queries ○ OutOfMemoryError (HIVE-25758) ○ Wrong results (HIVE-26722) ● Conclusion
  • 4. Common Planning Issues Issue Diagnostic Information OutOfMemoryError Heap dumps StackOverflowError Stack traces Unresponsive server Stack traces Very slow (or hanging) queries Stack traces Wrong results Query plans Query crashes (NPE, AssertionError, ClassCastException) Stack traces
  • 5. Common Planning Issues ● Many times the root cause lies in rule based transformations Issue Diagnostic Information OutOfMemoryError Heap dumps StackOverflowError Stack traces Unresponsive server Stack traces Very slow (or hanging) queries Stack traces Wrong results Query plans Query crashes (NPE, AssertionError, ClassCastException) Stack traces
  • 6. Common Planning Issues ● Many times the root cause lies in rule based transformations ● Good logs are key for finding this root cause Issue Diagnostic Information OutOfMemoryError Heap dumps StackOverflowError Stack traces Unresponsive server Stack traces Very slow (or hanging) queries Stack traces Wrong results Query plans Query crashes (NPE, AssertionError, ClassCastException) Stack traces
  • 7. Common Planning Issues ● Interactive debugging not always feasible: ○ CI/CU environment with restricted access ○ Data specific problem ● Debugging planning issues interactively (without logs) is difficult: ○ Large number of transformations ○ Complex transformation logic ○ Lots of intermediate objects ○ Repeated occurrences of transformations
  • 9. RuleEventLogger - XML Configuration https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.xml file <Configuration> <Appenders> <Console name="A1" target="SYSTEM_OUT"> <PatternLayout pattern="%m%n"/> </Console> </Appenders> <Loggers> <Root level="INFO"> <AppenderRef ref="A1"/> </Root> <logger name="org.apache.calcite.plan.RelOptPlanner"level="DEBUG"> <MarkerFilter marker="FULL_PLAN" onMatch="DENY" onMismatch="NEUTRAL"/> </logger> </Loggers> </Configuration>
  • 10. RuleEventLogger - XML Configuration https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.xml file <Configuration> <Appenders> <Console name="A1" target="SYSTEM_OUT"> <PatternLayout pattern="%m%n"/> </Console> </Appenders> <Loggers> <Root level="INFO"> <AppenderRef ref="A1"/> </Root> <logger name="org.apache.calcite.plan.RelOptPlanner"level="DEBUG"> <MarkerFilter marker="FULL_PLAN" onMatch="DENY" onMismatch="NEUTRAL"/> </logger> </Loggers> </Configuration>
  • 11. RuleEventLogger - XML Configuration https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.xml file <Configuration> <Appenders> <Console name="A1" target="SYSTEM_OUT"> <PatternLayout pattern="%m%n"/> </Console> </Appenders> <Loggers> <Root level="INFO"> <AppenderRef ref="A1"/> </Root> <logger name="org.apache.calcite.plan.RelOptPlanner"level="DEBUG"> <MarkerFilter marker="FULL_PLAN" onMatch="DENY" onMismatch="NEUTRAL"/> </logger> </Loggers> </Configuration> CALCITE-4704 (1.29.0) CALCITE-4991 (1.30.0)
  • 12. RuleEventLogger - Properties Configuration appenders = A1 appender.A1.type = Console appender.A1.name = A1 appender.A1.target = SYSTEM_OUT appender.A1.layout.type = PatternLayout appender.A1.layout.pattern = %m%n loggers = CBORuleLogger rootLogger.level = INFO rootLogger.appenderRefs = A1 rootLogger.appenderRef.A1.ref = A1 logger.CBORuleLogger.name = org.apache.calcite.plan.RelOptPlanner logger.CBORuleLogger.level = DEBUG logger.CBORuleLogger.filter.marker.type= MarkerFilter logger.CBORuleLogger.filter.marker.marker= FULL_PLAN logger.CBORuleLogger.filter.marker.onMatch= DENY logger.CBORuleLogger.filter.marker.onMismatch= NEUTRAL https://logging.apache.org/log4j/2.x/manual/configuration.html#automatic-configuration Calcite unit tests: Add/Modify core/src/test/resources/log4j2-test.properties file CALCITE-4704 (1.29.0) CALCITE-4991 (1.30.0)
  • 13. RuleEventLogger - Properties Configuration (HIVE) appenders = A1 appender.A1.type = Console appender.A1.name = A1 appender.A1.target = SYSTEM_OUT appender.A1.layout.type = PatternLayout appender.A1.layout.pattern = %m%n loggers = …,CBORuleLogger rootLogger.level = INFO rootLogger.appenderRefs = A1 rootLogger.appenderRef.A1.ref = A1 logger.CBORuleLogger.name = org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger logger.CBORuleLogger.level = DEBUG logger.CBORuleLogger.filter.marker.type= MarkerFilter logger.CBORuleLogger.filter.marker.marker= FULL_PLAN logger.CBORuleLogger.filter.marker.onMatch= DENY logger.CBORuleLogger.filter.marker.onMismatch= NEUTRAL Hive tests: Modify data/conf/hive-log4j2.properties file HIVE-25816
  • 14. RuleEventLogger - Example ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 logger.CBORuleLogger.filter.marker.onMatch= DENY SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 15. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate] call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject] call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject] call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject] call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject] call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject] logger.CBORuleLogger.filter.marker.onMatch= DENY SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 16. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate] call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject] call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject] call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject] call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject] call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject] logger.CBORuleLogger.filter.marker.onMatch= DENY SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 17. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate] call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject] call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject] call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject] call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject] call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject] logger.CBORuleLogger.filter.marker.onMatch= DENY SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 18. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate] call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject] call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject] call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject] call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject] call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject] logger.CBORuleLogger.filter.marker.onMatch= DENY SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 19. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate] call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject] call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject] call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject] call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject] call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject] logger.CBORuleLogger.filter.marker.onMatch= DENY SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 20. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate] call#0: Full plan for rule input [rel#11:LogicalAggregate]: LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT($2)]) LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject] call#0: Full plan for [rel#15:LogicalProject]: LogicalProject(EMPNO=[$0], DEPTNO=[$1], $f2=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)]) LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) logger.CBORuleLogger.filter.marker.onMatch= ACCEPT SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 21. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#0: Apply rule [AggregateRemoveRule] to [rel#11:LogicalAggregate] call#0: Full plan for rule input [rel#11:LogicalAggregate]: LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT($2)]) LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) call#0: Rule [AggregateRemoveRule] produced [rel#15:LogicalProject] call#0: Full plan for [rel#15:LogicalProject]: LogicalProject(EMPNO=[$0], DEPTNO=[$1], $f2=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)]) LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) logger.CBORuleLogger.filter.marker.onMatch= ACCEPT SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 22. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#1: Apply rule [ProjectMergeRule] to [rel#13:LogicalProject,rel#15:LogicalProject] call#1: Full plan for rule input [rel#13:LogicalProject]: LogicalProject(EMPNO=[$0], EXPR$1=[$2]) LogicalProject(EMPNO=[$0], DEPTNO=[$1], $f2=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)]) LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) call#1: Full plan for rule input [rel#15:LogicalProject]: ... call#1: Rule [ProjectMergeRule] produced [rel#17:LogicalProject] call#1: Full plan for [rel#17:LogicalProject]: LogicalProject(EMPNO=[$0], EXPR$1=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)]) LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) logger.CBORuleLogger.filter.marker.onMatch= ACCEPT SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 23. RuleEventLogger - Output Explained ./gradlew :core:test --tests RelOptRulesTest.testAggregateRemove3 RelOptRulesTest > testAggregateRemove3() STANDARD_OUT call#2: Apply rule [ProjectMergeRule] to [rel#17:LogicalProject,rel#9:LogicalProject] call#2: Full plan for rule input [rel#17:LogicalProject]: LogicalProject(EMPNO=[$0], EXPR$1=[CASE(IS NOT NULL($2), 1:BIGINT, 0:BIGINT)]) LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) call#2: Full plan for rule input [rel#9:LogicalProject]: LogicalProject(EMPNO=[$0], DEPTNO=[$7], MGR=[$3]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) call#2: Rule [ProjectMergeRule] produced [rel#19:LogicalProject] call#2: Full plan for [rel#19:LogicalProject]: LogicalProject(EMPNO=[$0], EXPR$1=[CASE(IS NOT NULL($3), 1:BIGINT, 0:BIGINT)]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) logger.CBORuleLogger.filter.marker.onMatch= ACCEPT SELECT empno, count(mgr) FROM sales.emp GROUP BY empno, deptno
  • 25. Hanging TPC-DS queries ● Context: Upgrade Calcite version from 1.25.0 to 1.33.0 ● Symptom: TPCDS queries hanging
  • 26. Hanging TPC-DS queries ● Context: Upgrade Calcite version from 1.25.0 to 1.33.0 ● Symptom: TPCDS queries hanging (e.g., query13) select avg(ss_quantity), avg(ss_ext_sales_price), avg(ss_ext_wholesale_cost), sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and ( (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M' and cd_education_status = '4 yr Degree' and ss_sales_price between 100.00 and 150.00 and hd_dep_count = 3) or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'D' and cd_education_status = 'Primary' and ss_sales_price between 50.00 and 100.00 and hd_dep_count = 1) or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'U' and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00 and hd_dep_count = 1)) and( (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 100 and 200) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 300) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 250));
  • 27. Hanging TPC-DS queries ● Context: Upgrade Calcite version from 1.25.0 to 1.33.0 ● Symptom: TPCDS queries hanging (e.g., query13) select avg(ss_quantity), avg(ss_ext_sales_price), avg(ss_ext_wholesale_cost), sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and ( (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M' and cd_education_status = '4 yr Degree' and ss_sales_price between 100.00 and 150.00 and hd_dep_count = 3) or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'D' and cd_education_status = 'Primary' and ss_sales_price between 50.00 and 100.00 and hd_dep_count = 1) or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'U' and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00 and hd_dep_count = 1)) and( (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 100 and 200) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 300) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 250));
  • 28. Hanging TPC-DS queries ● Why does it take so long? ● Where is it stuck? ● What does it do? ● Gather information: ○ Profile the application (async-profiler) ○ Collect stack traces (jstack) ○ Check the logs
  • 29. Hanging TPC-DS queries ● Why does it take so long? ● Where is it stuck? ● What does it do? ● Gather information: ○ Profile the application (async-profiler) ○ Collect stack traces (jstack) ○ Check the logs
  • 30. java.lang.Thread.State: RUNNABLE at java.util.TreeMap.compare(TreeMap.java:1294) at java.util.TreeMap.put(TreeMap.java:538) at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272) at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222) at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225) at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64) at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41) at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812) at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793) at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778) at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761) at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source) at java.util.ArrayList.forEach(ArrayList.java:1259) at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279) at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841)
  • 31. java.lang.Thread.State: RUNNABLE at java.util.TreeMap.compare(TreeMap.java:1294) at java.util.TreeMap.put(TreeMap.java:538) at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272) at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222) at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225) at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64) at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41) at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812) at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793) at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778) at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761) at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source) at java.util.ArrayList.forEach(ArrayList.java:1259) at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279) at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) Depth:~ 2K Lines
  • 32. java.lang.Thread.State: RUNNABLE at java.util.TreeMap.compare(TreeMap.java:1294) at java.util.TreeMap.put(TreeMap.java:538) at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272) at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222) at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225) at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64) at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41) at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812) at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793) at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778) at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761) at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source) at java.util.ArrayList.forEach(ArrayList.java:1259) at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279) at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) Depth:~ 2K Lines From stack + code we can infer that the plan has more than 1K nested Filter operators
  • 33. java.lang.Thread.State: RUNNABLE at java.util.TreeMap.compare(TreeMap.java:1294) at java.util.TreeMap.put(TreeMap.java:538) at org.apache.hive.com.google.common.collect.TreeRangeSet.replaceRangeWithSameLowerBound(TreeRangeSet.java:272) at org.apache.hive.com.google.common.collect.TreeRangeSet.add(TreeRangeSet.java:222) at org.apache.hive.com.google.common.collect.RangeSet.addAll(RangeSet.java:225) at org.apache.hive.com.google.common.collect.AbstractRangeSet.addAll(AbstractRangeSet.java:64) at org.apache.hive.com.google.common.collect.TreeRangeSet.addAll(TreeRangeSet.java:41) at org.apache.calcite.rex.RexSimplify$RexSargBuilder.addSarg(RexSimplify.java:3056) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2b(RexSimplify.java:2894) at org.apache.calcite.rex.RexSimplify$SargCollector.accept2(RexSimplify.java:2812) at org.apache.calcite.rex.RexSimplify$SargCollector.accept_(RexSimplify.java:2793) at org.apache.calcite.rex.RexSimplify$SargCollector.accept(RexSimplify.java:2778) at org.apache.calcite.rex.RexSimplify$SargCollector.access$400(RexSimplify.java:2761) at org.apache.calcite.rex.RexSimplify.lambda$simplifyAnd$3(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify$$Lambda$1099/1247334493.accept(Unknown Source) at java.util.ArrayList.forEach(ArrayList.java:1259) at org.apache.calcite.rex.RexSimplify.simplifyAnd(RexSimplify.java:1488) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:279) at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:248) at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:223) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:299) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) at org.apache.calcite.rel.metadata.RelMdPredicates.getPredicates(RelMdPredicates.java:292) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates_$(Unknown Source) at org.apache.calcite.rel.metadata.janino.GeneratedMetadata_PredicatesHandler.getPredicates(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getPulledUpPredicates(RelMetadataQuery.java:841) Depth:~ 2K Lines From stack + code we can infer that the plan has more than 1K nested Filter operators What creates the operators? Why?
  • 34. Hanging TPC-DS queries - Logs to the rescue grep -A 10 "Rule.*produced" hive.log
  • 35. Hanging TPC-DS queries - Logs to the rescue 2022-10-07T05:52:23,575 DEBUG calcite.RuleEventLogger: call#1: Rule [HivePreFilteringRule] produced [rel#84:HiveFilter] 2022-10-07T05:52:23,576 DEBUG calcite.RuleEventLogger: call#1: Full plan for [rel#84:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales]) HiveTableScan(table=[[default, store]], table:alias=[store]) -- 2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Rule [HivePreFilteringRule] produced [rel#89:HiveFilter] 2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Full plan for [rel#89:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales]) -- 2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Rule [HivePreFilteringRule] produced [rel#94:HiveFilter] 2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Full plan for [rel#94:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
  • 36. Hanging TPC-DS queries - Logs to the rescue 2022-10-07T05:52:23,575 DEBUG calcite.RuleEventLogger: call#1: Rule [HivePreFilteringRule] produced [rel#84:HiveFilter] 2022-10-07T05:52:23,576 DEBUG calcite.RuleEventLogger: call#1: Full plan for [rel#84:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales]) HiveTableScan(table=[[default, store]], table:alias=[store]) -- 2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Rule [HivePreFilteringRule] produced [rel#89:HiveFilter] 2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Full plan for [rel#89:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales]) -- 2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Rule [HivePreFilteringRule] produced [rel#94:HiveFilter] 2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Full plan for [rel#94:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
  • 37. Hanging TPC-DS queries - Root cause 2022-10-07T05:52:23,575 DEBUG calcite.RuleEventLogger: call#1: Rule [HivePreFilteringRule] produced [rel#84:HiveFilter] 2022-10-07T05:52:23,576 DEBUG calcite.RuleEventLogger: call#1: Full plan for [rel#84:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales]) HiveTableScan(table=[[default, store]], table:alias=[store]) -- 2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Rule [HivePreFilteringRule] produced [rel#89:HiveFilter] 2022-10-07T05:52:23,601 DEBUG calcite.RuleEventLogger: call#2: Full plan for [rel#89:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales]) -- 2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Rule [HivePreFilteringRule] produced [rel#94:HiveFilter] 2022-10-07T05:52:23,610 DEBUG calcite.RuleEventLogger: call#3: Full plan for [rel#94:HiveFilter]: HiveFilter(condition=[AND(=($27, $6), =($22, $99), =($105, 2001), =($4, $73), =($60, $3), OR(AND(=($62, _UTF-16LE'M'), =($63, _UTF-16LE'4 yr Degree HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveFilter(condition=[AND(OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')), OR(=($63, _UTF-16LE'4 yr Degree '), HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available]) HivePreFilteringRule matches infinitely and creates identical filters multiple times
  • 38. Hanging TPC-DS queries - Interactive debugging OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U'))
  • 39. Hanging TPC-DS queries - Interactive debugging OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')) // 3. If the new conjuncts are already present in the plan, we bail out final List<RexNode> newConjuncts = HiveCalciteUtil.getPredsNotPushedAlready(filter.getInput(),operandsToPushDown);
  • 40. Hanging TPC-DS queries - Interactive debugging OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')) SEARCH($62, Sarg[_UTF-16LE'D', _UTF-16LE'M', _UTF-16LE'U']:CHAR(1)) // 3. If the new conjuncts are already present in the plan, we bail out final List<RexNode> newConjuncts = HiveCalciteUtil.getPredsNotPushedAlready(filter.getInput(),operandsToPushDown); mq.getPulledUpPredicates(inp).pulledUpPredicates
  • 41. Hanging TPC-DS queries - Interactive debugging OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')) SEARCH($62, Sarg[_UTF-16LE'D', _UTF-16LE'M', _UTF-16LE'U']:CHAR(1))
  • 42. Hanging TPC-DS queries - Interactive debugging OR(=($62, _UTF-16LE'M'), =($62, _UTF-16LE'D'), =($62, _UTF-16LE'U')) SEARCH($62, Sarg[_UTF-16LE'D', _UTF-16LE'M', _UTF-16LE'U']:CHAR(1)) CALCITE-4173 Add internal SEARCH operator and Sarg literal, replacing use of IN in RexCall (1.26.0) CALCITE-5036 RelMetadataQuery#getPulledUpPredicates support to analyze constant key for the operator of IS_NOT_DISTINCT_FROM (1.31.0)
  • 43. Check What CBO Rules Are Applied Command: grep --no-filename "produced" hive.log | cut -c 100- Output format: call#$X: Rule [$RULE_DESCRIPTION] produced [rel#$Y:$REL_KIND] Sample output: call#1: Rule [HivePreFilteringRule] produced [rel#73:HiveFilter] call#12: Rule [ReduceExpressionsRule(Filter)] produced [rel#76:HiveFilter] call#14: Rule [FilterCondition] produced [rel#78:HiveFilter] call#17: Rule [ReduceExpressionsRule(Filter)] produced [rel#80:HiveFilter] call#21: Rule [FilterCondition] produced [rel#82:HiveFilter] call#29: Rule [HiveProjectFilterPullUpConstantsRule] produced [rel#84:HiveProject] call#46: Rule [HiveJoinAddNotNullRule] produced [rel#89:HiveJoin] Once you have the rel number ($Y), you can look it up in the logs to see it printed in full If you are interested in a particular rule application, search for the call ($X), input and output rels are printed close to the corresponding “produced” line
  • 44. OutOfMemory at Planning Time ● We look for a sequence of rule applications which loops from a certain point of the query planning process ● Run “EXPLAIN” against the offending query and look at the rules which are invoked in the logs to identify the loop ● Once identified, find the first occurrence of applications of this list, and start analyzing the plans generated right before/after it ● Note: the rel triggering the loop might also be created outside this sequence
  • 45. OutOfMemory at Planning Time - Example (HIVE-25758) SELECT c.month, d.con_usd FROM (SELECT cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month FROM test1 UNION ALL SELECT month FROM test2 WHERE month = 202110 ) c JOIN test3 d ON c.month = d.mth; ● Symptoms: OOM due to the creation of ever increasingly complex filters
  • 46. OutOfMemory at Planning Time - Example (HIVE-25758) HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available]) HiveUnion(all=[true]) HiveProject(month=[CAST(regexp_replace(...):INTEGER**]) HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...):INTEGER)]) HiveTableScan(table=[[default, test1]], table:alias=[test1]) HiveProject(month=[CAST(202110):INTEGER]) HiveFilter(condition=[=($0, 202110)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, test3]], table:alias=[d]) ** Abbreviation of “CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER”
  • 47. OutOfMemory at Planning Time - Example (HIVE-25758) call#198: Rule [HiveJoinPushTransitivePredicatesRule] produced [rel#136:HiveJoin] call#212: Rule [HiveFilterSetOpTransposeRule] produced [rel#142:HiveUnion] call#223: Rule [HiveFilterProjectTransposeRule] produced [rel#148:HiveProject] call#228: Rule [HiveFilterMergeRule] produced [rel#152:HiveFilter] call#239: Rule [HiveFilterProjectTransposeRule] produced [rel#153:HiveProject] call#258: Rule [HiveJoinPushTransitivePredicatesRule] produced [rel#155:HiveJoin] call#273: Rule [HiveFilterProjectTransposeRule] produced [rel#160:HiveProject] call#278: Rule [HiveFilterMergeRule] produced [rel#164:HiveFilter] call#283: Rule [ReduceExpressionsRule(Filter)] produced [rel#166:HiveFilter] RHS → LHS LHS → RHS
  • 48. OutOfMemory at Planning Time - Example (HIVE-25758) HiveJoin(condition=[=($0, $1)], joinType=[inner], [...]) HiveUnion(all=[true]) HiveProject(month=[CAST(regexp_replace(...)):INTEGER**]) HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...):INTEGER)]) HiveTableScan(table=[[default, test1]], table:alias=[test1]) HiveProject(month=[CAST(202110):INTEGER]) HiveFilter(condition=[=($0, 202110)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, test3]], table:alias=[d]) ** Abbreviation of “CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER”
  • 49. OutOfMemory at Planning Time - Example (HIVE-25758) HiveJoin(condition=[=($0, $1)], joinType=[inner], [...]) HiveUnion(all=[true]) HiveProject(month=[CAST(regexp_replace(...)):INTEGER]) HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...)):INTEGER)]) HiveTableScan(table=[[default, test1]], table:alias=[test1]) HiveProject(month=[CAST(202110):INTEGER]) HiveFilter(condition=[=($0, 202110)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveFilter(condition=[OR(=($0, CAST(regexp_replace(...)):INTEGER), =($0, 202110))]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, test3]], table:alias=[d]) call#258: Rule [HiveJoinPushTransitivePredicatesRule] produced [rel#155:HiveJoin]
  • 50. OutOfMemory at Planning Time - Example (HIVE-25758) HiveJoin(condition=[=($0, $1)], joinType=[inner], [...]) HiveUnion(all=[true]) HiveProject(month=[CAST(regexp_replace(...)):INTEGER]) HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...)):INTEGER)]) HiveTableScan(table=[[default, test1]], table:alias=[test1]) HiveProject(month=[CAST(202110):INTEGER]) HiveFilter(condition=[=($0, 202110)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveFilter(condition=[AND(OR(=($0, CAST(regexp_replace(...)):INTEGER), =($0, 202110)), IS NOT NULL($0))]) HiveTableScan(table=[[default, test3]], table:alias=[d]) call#278: Rule [HiveFilterMergeRule] produced [rel#164:HiveFilter]
  • 51. OutOfMemory at Planning Time - Example (HIVE-25758) HiveJoin(condition=[=($0, $1)], joinType=[inner], [...]) HiveUnion(all=[true]) HiveProject(month=[CAST(regexp_replace(...)):INTEGER]) HiveFilter(condition=[IS NOT NULL(CAST(regexp_replace(...)):INTEGER)]) HiveTableScan(table=[[default, test1]], table:alias=[test1]) HiveProject(month=[CAST(202110):INTEGER]) HiveFilter(condition=[=($0, 202110)]) HiveTableScan(table=[[default, test2]], table:alias=[test2]) HiveFilter(condition=[AND(OR(=($0, CAST(regexp_replace(...)):INTEGER), =($0, 202110)), IS NOT NULL($0))]) HiveTableScan(table=[[default, test3]], table:alias=[d]) ReduceExpressionsRule(Filter) can’t simplify the predicate
  • 52. OutOfMemory at Planning Time - Example (HIVE-25758) HiveFilter(condition=[ AND( IN($0, CAST(regexp_replace(...)):INTEGER, 202110), OR( AND( OR( IS NOT NULL(CAST(regexp_replace(...)):INTEGER), =(CAST(regexp_replace(...)):INTEGER, 202110) ), =($0, CAST(regexp_replace(...)):INTEGER) ), =($0, 202110) ) ) ])
  • 53. Incomplete / Incorrect Plan ● Usually boils down to identifying the first rel that looks “bad”, and trace it back by looking for the rel#, which is unique, and should appear in the preceding part of the logs ● Resolution highly depends on the specific issue, but with a precise call# or rel# it’s possible to set a conditional breakpoint (rules can be invoked multiple times before producing the issue)
  • 54. Incomplete / Incorrect Plan Hive - Example (HIVE-26722) ● OK: SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, NULL FROM t ) AS t2 WHERE a = 1; ● KO: (missing results) SELECT * FROM ( SELECT a, b FROM t UNION ALL SELECT a, cast(NULL as STRING) FROM t ) AS t2 WHERE a = 1; HiveUnion(all=[true]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1)]) HiveTableScan(table=[[default, t]], table:alias=[t]) HiveProject(a=[$0], b=[$1]) HiveFilter(condition=[=(CAST($0):DOUBLE, 1)]) HiveTableScan(table=[[default, t]], table:alias=[t])
  • 55. Incomplete / Incorrect Plan Hive - Example (HIVE-26722) call#5: Rule [HiveFilterProjectTransposeRule] produced [rel#41:HiveFilter] call#6: Rule [HiveFilterSetOpTransposeRule] produced [rel#43:HiveFilter ] call#7: Rule [HiveFilterProjectTransposeRule] produced [rel#47:HiveProject] call#24: Rule [HivePartitionPruneRule(Filter)] produced [rel#52:HiveFilter] call#32: Rule [HiveFieldTrimmerRule] produced [rel#70:HiveFilter] call#41: Rule [HiveFilterProjectTSTransposeRule] produced [rel#77:HiveProject] call#43: Rule [HiveCardinalityPreservingJoinRule] produced [rel#87:HiveProject] call#5: Rule [HiveFilterProjectTransposeRule] produced [rel#47:HiveFilter] call#6: Rule [HiveFilterSetOpTransposeRule] produced [rel#51:HiveUnion ] call#7: Rule [HiveFilterProjectTransposeRule] produced [rel#57:HiveProject] call#13: Rule [HiveFilterProjectTransposeRule] produced [rel#62:HiveProject] call#15: Rule [ReduceExpressionsRule(Project)] produced [rel#66:HiveProject] call#19: Rule [HiveFilterProjectTransposeRule] produced [rel#69:HiveProject] call#49: Rule [HivePartitionPruneRule(Filter)] produced [rel#73:HiveFilter] call#64: Rule [HiveFieldTrimmerRule] produced ...
  • 56. Disabling Rules via Configuration ● AbstractRelOptPlanner#setRuleDescExclusionFilter allows to exclude rules based on a regex over their description, even if registered in the planner ● Benefits: ○ It avoids recompiling to exclude some rules from planning while troubleshooting ○ Can be a quick workaround to customers once the faulty rule(s) is identified ○ Can be activated “per-query” ○ Less invasive than disabling CBO altogether (e.g., “hive.cbo.enable=false”) ● HIVE-25880: “Add property to exclude CBO rules by a regex on their description” ● Example: set hive.cbo.rule.exclusion.regex=HiveJoinPushTransitivePredicatesRule|HivePreFilteringRule;
  • 57. Conclusion ● Many planning issues boil down to rule transformations making plans bigger and bigger ● Built-in loggers indispensable for fast troubleshooting ● Easy configuration via XML/Property files (Log4j2) ● Configurable verbosity via FULL_PLAN marker ● Common pain points: ○ Using multiple equivalent operators (OR, SEARCH, IN, etc.) ○ Push/Pull predicate logic in various rules ○ Inconsistent simplifications during planning ● Exploit AbstractRelOptPlanner#setRuleDescExclusionFilter for quick workarounds and hypotheses verification