Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured, and unstructured data sources. Relational databases are examples of structured data sources, with a well-defined schema for the data they store. Cassandra and HBase are examples of semi-structured data sources, and HDFS is an example of an unstructured data source that Sqoop can support.
Apache Sqoop
data sources
- structured data sources
  - Relational databases
  - MySQL
- semi-structured data sources
  - Cassandra, HBase
- unstructured data sources
  - HDFS
Sqoop(1.4.7)
The import tool imports an individual table from an RDBMS to HDFS. Each row of the table is represented as a separate record in HDFS. Records can be stored as text files (one record per line) or in binary representation as Avro or SequenceFiles.
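As a rough illustration of the import tool, the sketch below assembles a `sqoop import` command line and prints it instead of running it. The JDBC URL, user name, password file, table name, and HDFS paths are all hypothetical placeholders; the flags shown are standard Sqoop 1.4.x import arguments, with `--as-avrodatafile` selecting the Avro binary representation.

```shell
# Hypothetical connection details; replace with values for your environment.
JDBC_URL="jdbc:mysql://db.example.com/corp"
TABLE="employees"
TARGET_DIR="/user/etl/employees"

# Build the argument list for a single-table import stored as Avro data files.
# Use --as-textfile (the default) or --as-sequencefile for the other formats.
set -- import \
  --connect "$JDBC_URL" \
  --username etl_user \
  --password-file /user/etl/.db-password \
  --table "$TABLE" \
  --target-dir "$TARGET_DIR" \
  --as-avrodatafile \
  --num-mappers 4

# Print the command for review; to execute it, run: sqoop "$@"
IMPORT_CMD="sqoop $*"
printf '%s\n' "$IMPORT_CMD"
```

Printing the command first makes it easy to check the arguments before pointing Sqoop at a real database.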
The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.
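A minimal sketch of an `import-all-tables` invocation follows, again printed rather than executed. The database URL, credentials, warehouse path, and excluded table names are assumptions made up for the example. With `--warehouse-dir`, each imported table should end up in its own subdirectory under that parent path.

```shell
# Hypothetical connection details and parent directory in HDFS.
JDBC_URL="jdbc:mysql://db.example.com/corp"
WAREHOUSE_DIR="/user/etl/warehouse"

# Build the argument list; --exclude-tables takes a comma-separated list
# of tables to skip (the table names here are placeholders).
set -- import-all-tables \
  --connect "$JDBC_URL" \
  --username etl_user \
  --password-file /user/etl/.db-password \
  --warehouse-dir "$WAREHOUSE_DIR" \
  --exclude-tables audit_log,tmp_staging

# Print the command for review; to execute it, run: sqoop "$@"
IMPORT_ALL_CMD="sqoop $*"
printf '%s\n' "$IMPORT_ALL_CMD"
```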
The import-mainframe tool imports all sequential datasets in a partitioned dataset (PDS) on a mainframe to HDFS. A PDS is akin to a directory on open systems. The records in a dataset can contain only character data. Records will be stored with the entire record as a single text field.
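The following sketch shows what an `import-mainframe` command line might look like. The host name, dataset name, user name, and paths are hypothetical; note that for this tool `--connect` takes the mainframe host rather than a JDBC URL, and `--dataset` names the PDS to import. The command is only printed here.

```shell
# Hypothetical mainframe host and partitioned dataset name.
MAINFRAME_HOST="mainframe.example.com"
PDS_NAME="EMPLOYEES"

# Build the argument list; every sequential dataset in the PDS is imported
# into the target directory as text, one record per line.
set -- import-mainframe \
  --connect "$MAINFRAME_HOST" \
  --dataset "$PDS_NAME" \
  --username etl_user \
  --password-file /user/etl/.mainframe-password \
  --target-dir /user/etl/mainframe/employees

# Print the command for review; to execute it, run: sqoop "$@"
MAINFRAME_CMD="sqoop $*"
printf '%s\n' "$MAINFRAME_CMD"
```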
The export tool exports a set of files from HDFS back to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specified delimiters.
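To make the export description concrete, here is a sketch that builds a `sqoop export` command and prints it. The JDBC URL, credentials, table name, and HDFS directory are placeholders, and the comma delimiter is an assumption about how the input files were written. The target table (here the hypothetical `employees_export`) has to be created in the database beforehand.

```shell
# Hypothetical connection details; the target table must already exist.
JDBC_URL="jdbc:mysql://db.example.com/corp"
TABLE="employees_export"
EXPORT_DIR="/user/etl/employees"

# Build the argument list; --input-fields-terminated-by tells Sqoop how to
# split each line of the HDFS files into columns (a comma is assumed here).
set -- export \
  --connect "$JDBC_URL" \
  --username etl_user \
  --password-file /user/etl/.db-password \
  --table "$TABLE" \
  --export-dir "$EXPORT_DIR" \
  --input-fields-terminated-by ','

# Print the command for review; to execute it, run: sqoop "$@"
EXPORT_CMD="sqoop $*"
printf '%s\n' "$EXPORT_CMD"
```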
Sqoop2(1.99.7)
Client modes
- interactive mode
- batch mode
- create
- update
- clone
-