HIVE-29376:Using partition spec in DESC FORMATTED sql is unsupported … #6259
base: master
…for Iceberg table
    return null;
  }

  private Partition getPartition(Table tab, Map<String, String> partitionSpec) throws HiveException {
The logic here is almost the same as getPartition() in DescTableOperation.java https://github.com/apache/hive/pull/6259/changes#diff-641c62b42b01bff41c89a3b3661c15d6c08fce0e48740347f36fe32448984147R131
Check whether you can move it into a utility so it can be reused.
Thanks for checking it. I moved this logic out of DescTableOperation.java and DescTableAnalyzer.java into HiveIcebergStorageHandler.java:
https://github.com/apache/hive/pull/6259/changes#diff-93864ecf035fe51b92185015da842a56837cea89064813de39c278c6f8fed03cR2079
Please take a look.
soumyakanti3578
left a comment
DDLUtils.isIcebergTable() has been used in many places in the compiler. I think we should not use a specific table type here.
  }

  private Partition getPartition(Table tab, Map<String, String> partitionSpec) throws HiveException {
    boolean isIcebergTable = DDLUtils.isIcebergTable(tab);
Please don't use DDLUtils.isIcebergTable, as this API is too specific for the compiler. Instead, please use tab.isNonNative() in conjunction with other APIs if needed.
Thanks for checking it. I tried to make the changes more generic to non-native tables instead of Iceberg-centric; please take a look.
@ramitg254, consider joining with Table.hasNonNativePartitionSupport
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
  try {
    List<String> partNames = getPartitionNames(icebergTable, partitionSpec, false);
    return !partNames.isEmpty() &&
        Warehouse.makePartName(partitionSpec, false).equals(partNames.getFirst());
Do we really need the
Warehouse.makePartName(partitionSpec, false).equals(partNames.getFirst())
check?
Isn't !partNames.isEmpty() enough? We already pass the partitionSpec to getPartitionNames.
But say, for example, the partition spec is c=6 and the only partition available is c=6/d=hello6. In that case we would like to throw a partition-not-found exception, and that's why the .equals check is required as well.
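The case described above can be illustrated with a minimal, self-contained sketch. It does not use the Hive classes: `makePartName` is a simplified stand-in for `Warehouse.makePartName` (it skips escaping), and `matching` is a hypothetical stand-in for a lookup that, like `getPartitionNames` is assumed to here, returns partitions matching a partial spec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ExactPartitionMatch {

  // Simplified stand-in for Warehouse.makePartName: joins key=value pairs with '/'.
  // The real method also escapes special characters; that is omitted here.
  static String makePartName(Map<String, String> spec) {
    return spec.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining("/"));
  }

  // Hypothetical lookup: returns every known partition name that contains
  // all key=value pairs of the (possibly partial) spec.
  static List<String> matching(List<String> known, Map<String, String> spec) {
    List<String> result = new ArrayList<>();
    for (String name : known) {
      List<String> segments = Arrays.asList(name.split("/"));
      if (spec.entrySet().stream()
          .allMatch(e -> segments.contains(e.getKey() + "=" + e.getValue()))) {
        result.add(name);
      }
    }
    return result;
  }

  // A non-empty result is not enough: the first match must equal the full
  // name built from the spec, otherwise the spec only matched partially.
  static boolean hasExactPartition(List<String> known, Map<String, String> spec) {
    List<String> names = matching(known, spec);
    return !names.isEmpty() && makePartName(spec).equals(names.get(0));
  }
}
```

With only c=6/d=hello6 known, a spec of c=6 still produces a non-empty match list, so the emptiness check alone would accept it; the name comparison is what rejects it.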
Makes sense, but could we please do a minor refactor and move the whole getPartition into IcebergTableUtil?
public static Partition getPartition(Configuration conf,
    org.apache.hadoop.hive.ql.metadata.Table table, Map<String, String> partitionSpec)
    throws SemanticException {
  List<String> partNames =
      getPartitionNames(conf, table, partitionSpec, false);
  if (partNames.isEmpty()) {
    return null;
  }
  try {
    String expectedName = Warehouse.makePartName(partitionSpec, false);
    if (!expectedName.equals(partNames.getFirst())) {
      return null;
    }
    return new DummyPartition(table, expectedName, partitionSpec);
  } catch (MetaException e) {
    throw new SemanticException("Unable to construct dummy partition", e);
  }
}

public static List<String> getPartitionNames(Configuration conf,
    org.apache.hadoop.hive.ql.metadata.Table table, Map<String, String> partSpecMap,
    boolean latestSpecOnly) throws SemanticException {
  Table icebergTable = getTable(conf, table.getTTable());
  ....
}
also please drop Boolean.TRUE.equals(latestSpecOnly) and Boolean.FALSE.equals(latestSpecOnly)
@ramitg254, please check this; I think that's the only remaining item.
So if we are moving it, then the HMS table should be used as the parameter.
If you think the Iceberg table should be the method's parameter, then moving it won't be good, IMO.
My bad, I saw that some methods take the HMS table as a parameter as well. I'll update it.
Done, moved getPartition but kept hasPartition, since it can have its own independent use case rather than keeping all of it in getPartition.
Please drop hasPartition and use the attached snippet (hasPartition doesn't have any other usages).
Note that this also requires changing the getPartitionNames signature to take Configuration conf, org.apache.hadoop.hive.ql.metadata.Table.
"but it should have iceberg table as parameter instead of hms table"
No, it shouldn't. The HMS table is just fine.
IcebergTableUtil.getTable is duplicated in 2 caller methods and should be dropped.
Please check the complete snippet above.
Done, used the snippet. One change was needed: I had to move
  List<String> partNames =
      getPartitionNames(conf, table, partitionSpec, false);
  if (partNames.isEmpty()) {
    return null;
  }
inside the try block as well, because it can throw:
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: No partition column by the name: pcol
at org.apache.iceberg.mr.hive.IcebergTableUtil.generateExpressionFromPartitionSpec
This comes from generateExpressionFromPartitionSpec in getPartitionNames when executing explain insert queries, so that exception needs to be caught and null returned as well.
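The adjustment described above can be sketched roughly as follows. This is a self-contained illustration with stand-in types, not the code from the PR: `SpecException` stands in for `SemanticException`, the lookup is a hypothetical simplification of `getPartitionNames`, and a plain `String` is returned where the real method builds a `DummyPartition`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GuardedPartitionLookup {

  // Stand-in for SemanticException (org.apache.hadoop.hive.ql.parse in Hive).
  static class SpecException extends Exception {
    SpecException(String message) {
      super(message);
    }
  }

  // Hypothetical lookup: rejects spec keys that are not partition columns,
  // then returns the known partition names containing every key=value pair.
  static List<String> getPartitionNames(List<String> partCols, List<String> known,
      Map<String, String> spec) throws SpecException {
    for (String col : spec.keySet()) {
      if (!partCols.contains(col)) {
        throw new SpecException("No partition column by the name: " + col);
      }
    }
    List<String> result = new ArrayList<>();
    for (String name : known) {
      List<String> segments = Arrays.asList(name.split("/"));
      if (spec.entrySet().stream()
          .allMatch(e -> segments.contains(e.getKey() + "=" + e.getValue()))) {
        result.add(name);
      }
    }
    return result;
  }

  // Returns the exact partition name, or null when the spec is rejected
  // or does not match a partition exactly.
  static String getPartition(List<String> partCols, List<String> known,
      Map<String, String> spec) {
    try {
      // The lookup sits inside the try block so that a rejected spec yields null.
      List<String> names = getPartitionNames(partCols, known, spec);
      if (names.isEmpty()) {
        return null;
      }
      String expected = spec.entrySet().stream()
          .map(e -> e.getKey() + "=" + e.getValue())
          .collect(Collectors.joining("/"));
      return expected.equals(names.get(0)) ? expected : null;
    } catch (SpecException ex) {
      return null;
    }
  }
}
```

Whether swallowing the exception is appropriate depends on the caller; the sketch only mirrors the behavior described in the comment, where an unknown column such as pcol leads to a null result instead of a failure.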
- List<FieldSchema> partitionColumns = table.isPartitioned() ? table.getPartCols() : null;
+ List<FieldSchema> partitionColumns = null;
+ if (table.isPartitioned()) {
+   partitionColumns = table.hasNonNativePartitionSupport() ?
Not sure, but maybe we could move the hasNonNativePartitionSupport logic inside Table#getPartCols(), similar to
public List<String> getPartColNames() {
  List<FieldSchema> partCols = hasNonNativePartitionSupport() ?
      getStorageHandler().getPartitionKeys(this) : getPartCols();
  return partCols.stream().map(FieldSchema::getName)
      .collect(Collectors.toList());
}
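As a rough illustration of this suggestion, the sketch below puts the native/non-native decision in a single accessor. All types are hypothetical stand-ins (`Handler` for the storage handler, `TableModel` for Hive's `Table`, plain strings for `FieldSchema`); it shows the shape of the idea only and says nothing about whether it is safe for existing callers of getPartCols.

```java
import java.util.Arrays;
import java.util.List;

class PartColsDelegationSketch {

  // Hypothetical stand-in for a storage handler that knows its own partition keys.
  interface Handler {
    List<String> getPartitionKeys(TableModel table);
  }

  // Hypothetical stand-in for the Table class.
  static class TableModel {
    private final List<String> nativePartCols;
    private final Handler handler; // null for native tables

    TableModel(List<String> nativePartCols, Handler handler) {
      this.nativePartCols = nativePartCols;
      this.handler = handler;
    }

    boolean hasNonNativePartitionSupport() {
      return handler != null;
    }

    // Single place that decides where partition columns come from,
    // so callers no longer repeat the ternary themselves.
    List<String> getPartCols() {
      return hasNonNativePartitionSupport()
          ? handler.getPartitionKeys(this)
          : nativePartCols;
    }
  }

  public static void main(String[] args) {
    TableModel nativeTable = new TableModel(Arrays.asList("dt"), null);
    TableModel nonNativeTable = new TableModel(Arrays.asList(), t -> Arrays.asList("c", "d"));
    System.out.println(nativeTable.getPartCols());
    System.out.println(nonNativeTable.getPartCols());
  }
}
```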
If you can check this, that would be great, but I think there was some issue. We might handle this in a follow-up; I'm just trying to avoid code duplication.
Yes, my first approach was actually to do exactly this, but it caused issues: computing statistics after an insert resulted in an index-out-of-bounds exception, because that code depends on getPartCols. That is the only failure known to me, but getPartCols may be used in other scenarios that could break as well, so I'm avoiding this for now.
I will tackle this code duplication in a follow-up ticket once we have an updated implementation of getPartCols.
  List<String> values = new ArrayList<String>();
- for (FieldSchema fs : this.getTable().getPartCols()) {
+ for (FieldSchema fs :
+     table.hasNonNativePartitionSupport() ?
same here
same #6259 (comment)
deniskuzZ
left a comment
LGTM, left minor comments
    }
  }

  public static boolean hasPartition(Table icebergTable,
please drop this, and use the snippet attached
done, dropped
What changes were proposed in this pull request?
Adds support for using a partition spec with the DESCRIBE statement for Iceberg tables.
Also updates the other test outputs, since partition information is now printed for DESC statements after these changes.
Why are the changes needed?
Currently, using a partition spec with the DESCRIBE statement for an Iceberg table results in an unsupported exception.
Does this PR introduce any user-facing change?
Yes, this statement will no longer result in an exception.
How was this patch tested?
Built locally, ran CI tests, and added q tests.