Apache Sqoop

Posted by Zhang huirui on June 18, 2019

Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured, and unstructured data sources. Relational databases are examples of structured data sources, with a well-defined schema for the data they store. Cassandra and HBase are examples of semi-structured data sources, and HDFS is an example of an unstructured data source that Sqoop can support.

Apache Sqoop

data sources

  • structured data sources

    • Relational databases
      • MySQL
  • semi-structured data sources

    • Cassandra, HBase
  • unstructured data sources

    • HDFS

Sqoop (1.4.7)

sqoop-import

The import tool imports an individual table from an RDBMS to HDFS. Each row from a table is represented as a separate record in HDFS. Records can be stored as text files (one record per line), or in binary representation as Avro or SequenceFiles.

sqoop-import-all-tables

The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.
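
As a rough sketch of how the import and import-all-tables invocations differ, the Python helpers below assemble argument lists for both tools. The JDBC URL, table name, and HDFS paths are hypothetical, and a real run would typically also need credentials (for example `--username` and `--password-file`), which are left out here.

```python
from typing import List

# Sqoop 1.4.x flags that select the on-disk record format.
FORMAT_FLAGS = {
    "text": "--as-textfile",
    "avro": "--as-avrodatafile",
    "sequence": "--as-sequencefile",
}


def import_table_cmd(jdbc_url: str, table: str, target_dir: str,
                     file_format: str = "text") -> List[str]:
    """Argument list for importing one table into one HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        FORMAT_FLAGS[file_format],
    ]


def import_all_tables_cmd(jdbc_url: str, warehouse_dir: str) -> List[str]:
    """Argument list for importing every table; each table is written to
    its own subdirectory under warehouse_dir."""
    return [
        "sqoop", "import-all-tables",
        "--connect", jdbc_url,
        "--warehouse-dir", warehouse_dir,
    ]


if __name__ == "__main__":
    # Hypothetical connection string, table, and paths.
    url = "jdbc:mysql://db.example.com/shop"
    print(" ".join(import_table_cmd(url, "orders", "/data/shop/orders", "avro")))
    print(" ".join(import_all_tables_cmd(url, "/data/shop")))
```

The lists can be passed to `subprocess.run` on a machine where Sqoop is installed. Building a list rather than a single string avoids shell-quoting problems in the connection string.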

sqoop-import-mainframe

The import-mainframe tool imports all sequential datasets in a partitioned dataset (PDS) on a mainframe to HDFS. A PDS is akin to a directory on open systems. The records in a dataset can contain only character data. Records will be stored with the entire record as a single text field.
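
A minimal sketch of an import-mainframe invocation, again as a Python helper that only builds the argument list. The host name, PDS name, and target directory are hypothetical; `--connect` takes the mainframe host here rather than a JDBC URL, and `--dataset` names the PDS.

```python
from typing import List


def import_mainframe_cmd(host: str, pds_name: str, target_dir: str) -> List[str]:
    """Argument list for importing every sequential dataset in one PDS."""
    return [
        "sqoop", "import-mainframe",
        "--connect", host,       # mainframe host, not a JDBC URL
        "--dataset", pds_name,   # the partitioned dataset to import
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    # Hypothetical host, dataset, and HDFS path.
    print(" ".join(import_mainframe_cmd("z390", "EMPLOYEES", "/data/employees")))
```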

sqoop-export

The export tool exports a set of files from HDFS back to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specified delimiters.
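
To make the delimiter step concrete, the sketch below pairs a simplified record parser with a helper that builds an export command. `parse_record` is only an illustration of splitting on a user-specified field delimiter; Sqoop's own parsing also handles enclosing and escape characters, which this ignores. The JDBC URL, table name, and export directory are hypothetical.

```python
from typing import List


def parse_record(line: str, field_delim: str = ",") -> List[str]:
    """Split one text record into fields (illustration only: no handling
    of enclosing or escape characters)."""
    return line.rstrip("\n").split(field_delim)


def export_cmd(jdbc_url: str, table: str, export_dir: str,
               field_delim: str = ",") -> List[str]:
    """Argument list for exporting HDFS files into an existing table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,            # must already exist in the database
        "--export-dir", export_dir,  # HDFS directory holding the input files
        "--input-fields-terminated-by", field_delim,
    ]


if __name__ == "__main__":
    # One tab-delimited record as it might appear in an HDFS text file.
    print(parse_record("1\tAlice\t30\n", "\t"))
    print(" ".join(export_cmd("jdbc:mysql://db.example.com/shop",
                              "orders", "/data/shop/orders")))
```

The delimiter passed to `--input-fields-terminated-by` has to match the one used when the files were written, otherwise rows will not split into the columns the target table expects.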

validation

sqoop-job

sqoop-metastore

sqoop-merge

sqoop-codegen

sqoop-create-hive-table

Sqoop2 (1.99.7)

Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured, and unstructured data sources. Relational databases are examples of structured data sources, with a well-defined schema for the data they store. Cassandra and HBase are examples of semi-structured data sources, and HDFS is an example of an unstructured data source that Sqoop can support.

Client modes

  • interactive mode

  • batch mode

    • create

    • update

    • clone

References

http://sqoop.apache.org