Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

We have ETL jobs in Python (Luigi). They all connect to the Hive Metastore to get partition info.

Code:

from hive_metastore import ThriftHiveMetastore
client = ThriftHiveMetastore.Client(protocol)  # protocol: a TBinaryProtocol over a buffered TSocket transport, created earlier
partitions = client.get_partition_names('sales', 'salesdetail', -1)

Here -1 is max_parts (the maximum number of partitions returned); -1 means no limit, so all partition names are returned.
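The snippet above does not show where `protocol` comes from. A minimal sketch of the usual setup follows, assuming the same `thrift` and `hive_metastore` packages that appear in the traceback. The function name `open_metastore_client`, the host placeholder and the 60-second default are illustrative and not taken from Luigi; 9083 is the metastore's conventional default port. One detail worth checking when experimenting with timeouts: thrift's `TSocket.setTimeout()` takes milliseconds, while Python's `socket.settimeout()` and `socket.setdefaulttimeout()` take seconds.

```python
def open_metastore_client(host, port=9083, timeout_ms=60000):
    """Open a Hive Metastore Thrift client with an explicit socket timeout.

    Returns (client, transport); the caller is responsible for closing the
    transport. host, port and timeout_ms are placeholders to adapt.
    """
    # Imported inside the function so this helper can be defined even where
    # the thrift / hive_metastore packages are not installed.
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket, TTransport
    from hive_metastore import ThriftHiveMetastore

    sock = TSocket.TSocket(host, port)
    sock.setTimeout(timeout_ms)  # milliseconds; used for both connect and recv
    transport = TTransport.TBufferedTransport(sock)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHiveMetastore.Client(protocol)
    transport.open()
    return client, transport


# Example (needs a reachable metastore; 'metastore-host' is a placeholder):
# client, transport = open_metastore_client('metastore-host')
# try:
#     partitions = client.get_partition_names('sales', 'salesdetail', -1)
# finally:
#     transport.close()
```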

It randomly times out like this:

  File "/opt/conda/envs/etl/lib/python2.7/site-packages/luigi/contrib/hive.py", line 210, in _existing_partitions
    partition_strings = client.get_partition_names(database, table, -1)
  File "/opt/conda/envs/etl/lib/python2.7/site-packages/hive_metastore/ThriftHiveMetastore.py", line 1703, in get_partition_names
    return self.recv_get_partition_names()
  File "/opt/conda/envs/etl/lib/python2.7/site-packages/hive_metastore/ThriftHiveMetastore.py", line 1716, in recv_get_partition_names
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "/opt/conda/envs/etl/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
    sz = self.readI32()
  File "/opt/conda/envs/etl/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
    buff = self.trans.readAll(4)
  File "/opt/conda/envs/etl/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/opt/conda/envs/etl/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 159, in read
    self.__rbuf = StringIO(self.__trans.read(max(sz, self.__rbuf_size)))
  File "/opt/conda/envs/etl/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 105, in read
    buff = self.handle.recv(sz)
timeout: timed out
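The bottom frames show where the time is spent: `TSocket.read()` is blocked in `recv()` inside the `readI32()` call of `readMessageBegin()`, which suggests the client was still waiting for the reply to begin when the socket timeout fired. The stdlib-only sketch below reproduces that failure mode with a local socket pair instead of a metastore; the helper name `recv_with_timeout` and the 0.2-second timeout are illustrative.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_s):
    """Read up to nbytes from sock, returning None if nothing arrives in time.

    Mirrors the last traceback frame: recv() on a socket with a timeout
    raises socket.timeout when the peer sends nothing before it expires.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


# A connected pair where the "server" end stays silent at first.
client_end, server_end = socket.socketpair()
try:
    silent = recv_with_timeout(client_end, 4, 0.2)  # nothing sent yet: times out
    # 4 bytes: a binary-protocol reply header (version 1, message type REPLY).
    server_end.sendall(b"\x80\x01\x00\x02")
    answered = recv_with_timeout(client_end, 4, 0.2)
finally:
    client_end.close()
    server_end.close()

print(silent)    # None
print(answered)  # b'\x80\x01\x00\x02'
```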

This error happens only occasionally.
There is a 15-minute timeout on the Hive Metastore.
When I run get_partition_names on its own to investigate, it returns data within a few seconds.
Even when I set the socket timeout to 1 or 2 seconds, the query completes.
There is no record of a socket-close or connection message in the Hive Metastore logs (cat /var/log/hive/..log.out).

The tables it usually times out on have a large number of partitions (~10K+). But, as mentioned before, they only time out randomly, and they return partition metadata quickly when that portion of the code is tested in isolation.

Any ideas why it times out randomly, how to catch these timeout errors in the metastore logs, or how to fix them?
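Since the metastore log stays silent, one way to at least catch and record these timeouts is on the client side. The sketch below is a generic retry wrapper, not part of Luigi or the metastore API: `call_with_reconnect`, `connect` and `call` are hypothetical names, and the demo uses stand-in objects so it runs without a metastore. It opens a fresh connection for each attempt, because after a timeout the abandoned reply may still arrive on the old socket and could be read as the answer to the next request.

```python
import logging
import socket
import time

log = logging.getLogger(__name__)


def call_with_reconnect(connect, call, retries=3, backoff_s=1.0):
    """Run call(client) on a fresh connection, retrying after socket timeouts.

    connect: zero-argument factory returning (client, transport).
    call:    function taking the client, e.g. a get_partition_names wrapper.
    Only timeouts raised by call() are retried; errors from connect() propagate.
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    last_error = None
    for attempt in range(1, retries + 1):
        client, transport = connect()
        try:
            return call(client)
        except socket.timeout as exc:
            last_error = exc
            # Client-side record of the timeout, since the metastore log may be silent.
            log.warning("metastore call timed out (attempt %d/%d): %s", attempt, retries, exc)
        finally:
            transport.close()  # never reuse a connection that timed out
        if attempt < retries:
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    raise last_error


# Demo with stand-ins for the Thrift objects (no metastore needed).
class _FakeTransport(object):
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


opened = []
attempts = []


def fake_connect():
    transport = _FakeTransport()
    opened.append(transport)
    return "client", transport


def flaky_get_partition_names(client):
    attempts.append(client)
    if len(attempts) < 3:  # the first two calls time out, the third succeeds
        raise socket.timeout("timed out")
    return ["dt=2021-01-01"]


result = call_with_reconnect(fake_connect, flaky_get_partition_names, retries=3, backoff_s=0)
```

Against a real metastore, `connect` would build the TSocket, TBufferedTransport and TBinaryProtocol stack seen in the traceback, and `call` could be `lambda c: c.get_partition_names('sales', 'salesdetail', -1)`. A retry hides the symptom rather than explaining it; the logged timestamps at least give something to compare against the metastore's own logs and load.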

Question from: https://stackoverflow.com/questions/65898409/python-hive-metastore-partition-timeout
