Google Dataproc Presto: how to run queries using Python

I have set up a Google Dataproc cluster running Presto by going through the steps in this link.

It works fine, and I am able to run queries through the gcloud command-line tool as shown at that link, like this:

    gcloud dataproc jobs submit hive \
        --cluster presto-cluster \
        --region=${REGION} \
        --execute "SELECT COUNT(*) FROM chicago_taxi_trips_parquet;"

At the end, the tutorial shows how to run queries on Presto from a Java application. I am trying to find a similar solution for Python. Is there a way to run queries on the Dataproc cluster from my Python application?

I know there are Python clients for Presto, but I was not able to find resources on how to connect them to the Presto instance running on the Dataproc cluster.

Similarly, there is a Python library for submitting jobs to Dataproc, but I could not find resources on how to submit Presto query jobs to the Dataproc cluster with it.

Can someone tell me how to connect to Presto on Google Dataproc and run queries on it remotely from a Python application?

1 Answer

You can find examples of using the Dataproc Jobs API to submit supported jobs (including Presto) in the official documentation for the Dataproc Python client library: https://cloud.google.com/dataproc/docs/tutorials/python-library-example#submit_a_job_to_a_cluster
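As a rough illustration of that approach, here is a minimal sketch that adapts the job-submission pattern from the linked page to a Presto job. It assumes the google-cloud-dataproc package is installed and application default credentials are configured. The function names build_presto_job and submit_presto_query are made up for this example; the presto_job payload with its query_list field follows the Dataproc Jobs API.

```python
def build_presto_job(cluster_name, query):
    """Build a Dataproc job payload that runs a single Presto query.

    Hypothetical helper: it only assembles a plain dict in the shape
    the Dataproc Jobs API expects for a Presto job.
    """
    return {
        "placement": {"cluster_name": cluster_name},
        "presto_job": {"query_list": {"queries": [query]}},
    }


def submit_presto_query(project_id, region, cluster_name, query):
    """Submit the query as a Dataproc job and wait for it to finish.

    Returns the Cloud Storage URI under which Dataproc stores the
    driver output of the job.
    """
    # Imported lazily so build_presto_job stays usable without the library.
    from google.cloud import dataproc_v1

    # The job controller has to be addressed through the regional endpoint.
    client = dataproc_v1.JobControllerClient(
        client_options={
            "api_endpoint": f"{region}-dataproc.googleapis.com:443"
        }
    )
    operation = client.submit_job_as_operation(
        request={
            "project_id": project_id,
            "region": region,
            "job": build_presto_job(cluster_name, query),
        }
    )
    # Blocks until the job reaches a terminal state.
    finished_job = operation.result()
    return finished_job.driver_output_resource_uri
```

Calling submit_presto_query with your own project ID and region, the cluster name presto-cluster, and the SELECT COUNT(*) query from the question should submit the same kind of job that gcloud submits. The project ID and region are placeholders you need to supply, and whether the table name resolves without a catalog or schema prefix depends on how Presto is configured on the cluster.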
