Recently I was working to set up a Big Data environment in Azure.
From Azure Data Factory I was spinning up an on-demand HDInsight cluster with an external metastore.
Unfortunately, I kept getting the following error: “Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient”.
After I contacted Microsoft support about this error, they found that it was caused by a known Hive bug:
https://issues.apache.org/jira/browse/HIVE-12536
In short, the error was caused by dashes (-) in the name of the metastore database. After I removed the dashes, the problem disappeared and I was able to create the on-demand HDInsight cluster.
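If you create metastore databases from a script, it may be worth checking the name before the cluster is spun up. The following is a minimal sketch, assuming (as in this post) that the dash is the only problematic character; the function names are hypothetical and not part of any Azure or Hive API.

```python
def has_dash(db_name: str) -> bool:
    """Return True if the metastore database name contains a dash (-)."""
    return "-" in db_name


def suggest_name(db_name: str) -> str:
    """Suggest a dash-free name by simply removing the dashes, as done here."""
    return db_name.replace("-", "")


if __name__ == "__main__":
    name = "db-metastore-p"  # the name that triggered the error in this post
    if has_dash(name):
        print(f"'{name}' contains dashes; consider '{suggest_name(name)}' instead")
```

Removing the dashes (rather than replacing them with another character) mirrors what fixed the problem here; any other naming scheme is an untested assumption.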
Here is an excerpt of the error log (my metastore database was named db-metastore-p):
Logging initialized using configuration in file:/C:/apps/dist/hive-0.14.0.2.2.9.1-1/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.6.0.2.2.9.1-1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.4.2.2.9.1-1-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:445)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:619)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:63)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:73)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2743)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2762)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:426)
... 8 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1481)
... 13 more
Caused by: javax.jdo.JDOUserException: Could not create "increment"/"table" value-generation container db-metastore-p.dbo.SEQUENCE_TABLE since autoCreate flags do not allow it.
NestedThrowables:
org.datanucleus.exceptions.NucleusUserException: Could not create "increment"/"table" value-generation container db-metastore-p.dbo.SEQUENCE_TABLE since autoCreate flags do not allow it.
Hi Olandese,
I’m facing the same problem. I’m using an ARM template to deploy my data factory, following a Microsoft tutorial (if you google ‘data factory build your first pipeline using ARM’ you will find it). My problem is that I can’t see any way to specify the metastore database name in the JSON parameters for the ARM template. How are you changing the metastore database name?
Thanks!
James
Hi,
you have to add the following property to the on-demand HDInsight linked service: "hcatalogLinkedServiceName": "{yourexternalstorenamehere}"
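As an illustration of where the property goes, here is a minimal sketch that builds the on-demand linked service definition as a Python dict and prints it as JSON. It assumes the property sits under typeProperties of an HDInsightOnDemand linked service (as in the Data Factory v1 schema); apart from hcatalogLinkedServiceName, the values (cluster size, time to live, the storage linked service name and "MyMetastoreLinkedService") are hypothetical placeholders.

```python
import json


def on_demand_linked_service(hcatalog_linked_service: str) -> dict:
    """Build a hypothetical HDInsight on-demand linked service definition.

    Only "hcatalogLinkedServiceName" comes from this thread; the other
    values are placeholders assumed for illustration.
    """
    return {
        "name": "HDInsightOnDemandLinkedService",
        "properties": {
            "type": "HDInsightOnDemand",
            "typeProperties": {
                # Placeholder cluster settings (assumptions, adjust to your setup)
                "clusterSize": 1,
                "timeToLive": "00:30:00",
                "linkedServiceName": "AzureStorageLinkedService",
                # Points the on-demand cluster at the external metastore linked service
                "hcatalogLinkedServiceName": hcatalog_linked_service,
            },
        },
    }


if __name__ == "__main__":
    # "MyMetastoreLinkedService" is a hypothetical linked service name
    print(json.dumps(on_demand_linked_service("MyMetastoreLinkedService"), indent=2))
```

Remember that the metastore database behind that linked service must still have a name without dashes.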
Great, thanks for your help, Olandese! Unfortunately this hasn’t worked for me: I get the error ‘HCatalog integration is not enabled for this subscription.’ when I try to use that property.
I have created an Azure SQL database and added a SQL Database linked service to my data factory definition:
{
  "dependsOn": [ "[concat('Microsoft.DataFactory/dataFactories/', variables('dataFactoryName'))]" ],
  "name": "AzureSqlHiveMetastoreLinkedService",
  "type": "linkedservices",
  "apiVersion": "[variables('apiVersion')]",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": ""
    }
  }
}
I then added the hcatalogLinkedServiceName property pointing to that linked service, but I get the following error when deploying the ARM template:
‘GpPrescribingDataFactory/HDInsightOnDemandLinkedService’ failed with message ‘HCatalog integration is not enabled for this subscription.’
Do you have any further suggestions? As far as I’m aware my subscription doesn’t have any limitations.
You are right. I encountered the same problem on my private subscription (it worked on the enterprise subscription of the company I work for). I posted a question to Microsoft on Yammer Azure Advisors and got this answer: “we used to have hcat and schemageneration enabled prior to GA. For subs that were using it during public preview we automatically whitelisted them. We are working on re-enabling this feature soon”
Aargh!
Okay, thanks for the information, you’ve been extremely helpful 🙂
I’m trying to sign up to Azure Advisors now. It’s very frustrating because, as far as I can see, using Hive with on-demand HDInsight is basically impossible right now!