Impala Export to CSV
Apache Impala is an open source massively parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop.
In some cases, impala-shell is installed manually on machines that are not managed through Cloudera Manager. From such external machines you can still launch impala-shell and submit queries to a DataNode where impalad is running. To do this, start impala-shell and connect it to the remote host by passing the appropriate hostname and port (the port can be omitted if it is the default, 21000).
To connect impala-shell to an Impala daemon running on another DataNode, you only need the DataNode's hostname and the port on which impalad is configured to receive queries. Pass both to the connect command as connect <hostname:port>, as shown in the following example (the port is omitted here, so the default 21000 is used):
[Not connected] > connect datanode-hostname
[datanode-hostname:21000]
To export query results to a CSV file:
impala-shell -B -o output.csv --output_delimiter=',' -q "use test; select * from teams;"
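The same export can be wrapped in a small Bash function so that the database, query, output file, and (optionally) the remote impalad are parameters. This is only a sketch: the function name impala_export_csv and its argument order are hypothetical, and it uses just the impala-shell flags that appear in this note (-B, -o, --output_delimiter, -q, and -i).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of impala-shell): export the result of a query
# in the given database to a comma-delimited file.
# Usage: impala_export_csv <database> <query> <output-file> [host:port]
impala_export_csv() {
  local db="$1" query="$2" out="$3" target="${4:-}"
  # -B: plain delimited output instead of a pretty-printed table;
  # -o: write the results to a file instead of stdout.
  local -a args=(-B -o "$out" --output_delimiter=',')
  # Pass -i only when a remote impalad is given; otherwise impala-shell
  # connects to its default target.
  if [[ -n "$target" ]]; then
    args+=(-i "$target")
  fi
  impala-shell "${args[@]}" -q "use ${db}; ${query}"
}
```

Example: impala_export_csv test "select * from teams;" output.csv datanode-hostname:21000. Building the arguments in a Bash array keeps a query that contains spaces as a single -q argument.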
To submit the query from a file:
impala-shell -B -f my-query.txt -o query_result.txt '--output_delimiter=,'
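The query file is plain SQL text that impala-shell reads with -f. A minimal sketch of generating it with a heredoc and then running the command above, assuming the same test.teams table as the earlier export; the helper names write_query_file and run_query_file_export are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the SQL statements that impala-shell will read with -f.
# The quoted 'SQL' delimiter stops the shell from expanding * or $ in the query.
write_query_file() {
  local path="$1"
  cat > "$path" <<'SQL'
use test;
select * from teams;
SQL
}

# Hypothetical helper: create my-query.txt, then run it with the same flags as above.
run_query_file_export() {
  write_query_file my-query.txt
  impala-shell -B -f my-query.txt -o query_result.txt '--output_delimiter=,'
}
```

Keeping the SQL in a file avoids nested shell quoting, which becomes error-prone once a query contains quotes of its own.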
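Alternatively, to target a specific impalad with -i and write pipe-delimited output (\174 is the octal escape for the | character):
impala-shell -i <hostname:port> -B -q "SELECT from_unixtime(field00) AS in_date, field01, field02 FROM <table> LIMIT 100;" -o query_out.csv '--output_delimiter=\174'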
To include a header row with the column names, add --print_header to the command:
impala-shell -B -f my-query.txt -o query_result.txt --print_header '--output_delimiter=,'
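With --print_header, the first line of the output file holds the column names. A quick sanity check of such a file can be done with standard tools; this sketch assumes one record per line (no field value contains a newline), and the function names csv_header and count_data_rows are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a file written with --print_header.
# Assumes one record per line, i.e. no field value contains an embedded newline.

# Print the header line (the column names).
csv_header() {
  head -n 1 "$1"
}

# Count data rows: skip line 1 (the header) and count the remaining lines.
# tr strips the padding that some wc implementations add.
count_data_rows() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}
```

Example: csv_header query_result.txt prints the column names, and count_data_rows query_result.txt prints the number of exported rows.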