Oracle Big Data SQL FAQ

Starting with BDA V4.0, Oracle Big Data SQL (BDSQL) supports queries against data stored in multiple data sources, including Hadoop. Using Oracle Big Data SQL, you can query data stored in a Hadoop cluster using the complete SQL syntax. BDSQL allows you to view and analyze data from various data stores together, as if it were all stored in an Oracle database. You can execute the most complex SQL SELECT statements against data in Hadoop, either manually or using your existing applications.

Does this mean that Big Data SQL supports the full set of Oracle SQL?

Yes, you can apply all of Oracle SQL against the external tables that are defined over data in Hadoop/NoSQL sources. Basically BDSQL extentds Oracle SQL to Hadoop and NoSQL.

Is an Exadata and BDA required to use BDSQL?

Yes, BDSQL requires an Exadata and BDA. It can not be installed/used in other environments.

So just to clarify does that mean BDSQL can not run on comodity hardware?

Correct. BDSQL is designed for Engineering Systems. It runs on BDA and Exadata.

Is there any way to test it out without a BDA/Exadata.

See the blog: The Oracle Big Data Lite 4.0 Virtual Machine Now Available. Big Data Lite 4.0 can be downloaded. There is a hands-on lab to learn how to securely analyze all your data – across both Hadoop and Oracle Database 12c – using Big Data SQL.

Does BDSQL required a separate licence?

Yes, see Oracle Big Data Appliance Licensing documentation section on Oracle Big Data SQL.

Does Cloudera Manager(CM) provide support to manage BDSQL?

Yes, from CM you can start, stop, and monitor BDSQL services and roles. See: Instructions to Enable and Disable and Check Status of
Oracle Big Data SQL on Oracle BDA with Mammoth V4.0 Release (Doc ID 1930571.1)

Will CM send an alert if BDSQL services/roles are down?

Yes.

Does Oracle Big Data SQL use external tables?

Yes Oracle Big Data SQL provides external tables. An external table is an Oracle Database object that identifies and describes the location of data outside of a database. You can query an external table using the same SQL SELECT syntax that you use for any other database tables.

Does BDSQL support data in HDFS?

Yes. you can create Oracle external tables directly over files stored in HDFS.

Does BDSQL support data in Hive?

Yes, you can create Oracle external tables over Apache Hive data sources.

Does BDSQL support HBase?

BDSQL does not directly support HBase but BDSQL can access data stored in HBase that have Hive tables defined for them.

Hive has a concept of StorageHandlers to allow for HBase integration. This feature allows Hive QL statements to access HBase tables for both read (SELECT) and write (INSERT). It is possible to combine access to HBase tables with native Hive tables via joins and unions. Since Big Data SQL supports Hive, HBase queries from Hive via the above should be supported.

Does BDSQL support Smart Scan Technology?

Yes. External tables do not have traditional indexes, so that queries against them typically require a full table scan. However, Oracle Big Data SQL extends SmartScan capabilities, such as filter-predicate offloads, to Oracle external tables with the installation of Exadata storage server software on Oracle Big Data Appliance. This technology enables Oracle Big Data Appliance to discard a huge portion of irrelevant data—up to 99 percent of the total—and return much smaller result sets to Oracle Exadata Database Machine. End users obtain the results of their queries significantly faster, as the direct result of a reduced load on Oracle Database and reduced traffic on the network.

Does BDSQl use Hive for its metadata?

Yes BDSQL uses Hive for its metadata i.e. to locate and parse data stored in Hadoop/NoSQL sources. BDSQL uses Smart Scan for the core capability of its processing engine however..

Does BDSQL have security features?

Big Data SQL provides for the full range of Oracle Database security features. This allows you to apply the same security policies and rules to your Hadoop data that you apply to your relational data. Additionally the BDA provides these security features to protect data stored in CDH 1) Kerberos authentication, 2) Apache Sentry authorization, 3) On-disk encryption, 4) Oracle Audit Vault and Database Firewall monitoring.

This allows applications to run Oracle SQL over all their data with a unified security model.

Is OPatch used to patch BDSQL on the BDA?

No. On the BDA patching for BDSQL is performed with Mammoth. See the Oracle Big Data Appliance Owner’s Guide section on Installing a One-Off Patch

Two environment variables $CDPATH and $CELLTRACE on the BDA with Big Data SQL installed have ASM names in their path. Is this a bug?

Yes, Oracle Dev has confirmed that this is a bug in the bd_cell rpm which implements BigDataSQL and It only appears after that rpm is installed.

Bug 20123815 $CDPATH AND $CELLTRACE HAVE ASM IN THE PATH ON BDA WITH BIG DATA SQL INSTALLED.

How do you install Exadata Storage Server Software on Oracle Big Data Appliance ? Do you have to install it manually on the BDA?

No. You do not have to install it on the BDA manually. You just need to run: ‘bdacli enable bigdatasql’ on the first node of the BDA cluster to install the required software on each node of the cluster.

Can one Exadata talk to multiple BDA clusters using Big Data SQL?

Yes, One Exadata can talk to multiple BDA clusters.

Just download the bds-exa-install.sh script from the first node of the each cluster and when installing a second or third cluster use the –install-as-secondary=true option.

Without this option the last cluster installed will be the default cluster, but with this option the default cluster will be unchanged.

As bds-exa-install.sh is a cluster specific, you need to download a new bds-exa-install.sh for each BDA cluster. This script is generated by Mammoth and contains cluster specific content. Once downloaded, one can rename bds-exa-install.sh to <cluster_name>_bds_exa_install.sh or anything they like and then run it.

The bda-exa-install.sh should only be run on the exadata nodes. Note that it should be run on all db compute nodes of a RAC db cluster separately. The script works best if the order in which it is run follows the instance id’s. So run it on instance id 1 first and instance n last, though this is not critical. This is because only on instance n the script registers the extproc service with crs, in which crs will try to start up all instances of the extproc services which if not installed yet, may fail to start, but eventually crs will retry and when the instance is installed then the extproc will come up on that instance.

Most of the script actions are described in the help page, but it in normal scenarios no options are needed, with the exception of the –install-as-secondary=true option.

On the BDA side, one has to enable big_data_sql for each cluster if not already done. On the first node of the cluster run bdacli getinfo cluster_big_data_sql_enabled if this returns false then enable big_data_sql with bdacli enable big_data_sql

Once enabled the bds-exa-install.sh can be downloaded on each Exadata compute node using curl. The instructions for this are in the bda software guide but are old.

https://docs.oracle.com/cd/E57371_01/doc.41/e57351/bigsql.htm#BIGUG76720

since these are for BDS 1.0 . BDS1.1 the install script should be run as the oracle user and the script will generate a root script which should be run before continuing.

Is a running 12c database (not just the software binaries) required in order to use Big Data SQL? Can BDA connect to an Exadata machine running an 11g database?

The BDA Software User’s Guide has a section on Installing Big Data SQL. You must have fully installed and configured the 12c database before you can proceed with running the post-installation script (bds-exa-install.sh). See section 6.2.3 of the Software User’s Guide.

Note: The restriction has nothing to with connecting Exadata and BDA. Big Data Connectors work with Exadata machines that run 11g database. The physical IB connections can also be made at any time.

The restriction is very specific to configuring an Exadata database for Big Data SQL usage. You can cable the Exadata rack to the BDA, you can complete the BDA side of the Big Data SQL installation process, but you have to wait with executing the post-installation step described in section 6.2.3 until you have a running 12c database.

This post installation step configures a specific database home (which must be 12c) such that it can execute Big Data SQL queries against a specific Hadoop cluster.

For Big Data SQL is it an absolute requirement to have an InfinBand network connection between the BDA and Exadata?

Yes. Big Data SQL is similar to an Exadata storage cell and uses the same protocols with the database. This is only supported over IB.

For Big Data SQL v1.1 is there a security concern when creating a public DB link?

To elaborate on the question further, is there a security concern when creating a public DB link like:

create public database link BDSQL$_bigdatalite using ‘extproc_connection_data’;
create public database link BDSQL$_DEFAULT_CLUSTER using ‘extproc_connection_data’;

and the need for a directory which is accessible to everyone?

Public DB links to the BDSQL configuration directory do not permit all users access to all data stored in Hadoop. User or role specific security policies can still be put in place for any ORACLE_HIVE or ORACLE_HIVE external table. These public objects simply ensure than any user can reach the BDSQL database-side executable and the Hadoop configuration files. Hence the above is not currently considered these a security risk.

For Big Data SQL v1.1 if two users access the same Hadoop cluster is there a direct way of preventing one user from accessing a different set of Hive table names than the other?

Currently there is no direct way of preventing one user from accessing a different set of Hive tables than another user.

Instead one would work with the normal oracle object privileges by having a non restricted user (dba) create the external tables and revoke read access from the default directory from restricted users.

The latter can be done by revoke public access to directory “ORA_BIGDATA_CL_<cl>” where cl is the cluster name, and then grant read access only to the unrestricted users.

Yes, that should be correct.

What roles/privileges are necessary in order for a general user to execute the “Copying Oracle Tables to Hadoop” operation?

No additional database privilege (other than what is documented in Section 8 Copying Oracle Tables to Hadoop of the Big Data Appliance 4.3 Software User’s Guide) is necessary. The DATAPUMP_EXP_FULL_DATABASE role is not required. If a user outside the schema has read privileges on the table, that user can create the data pump files from the table to use with Copy to BDA.

However, note that the files are written out as data pump files with owner as the OS user running the database (typically “oracle”). Generally this user and users in this group have read permissions on those files. So only someone with the OS privileges of the user running the database, or a member of that OS group, is able to access the files after they are created by the external table.

Note that this operation is different from the export utility for exporting data from the database. Copy to BDA works only with data pump files created using the syntax below, it does not work with data pump files created with the export utility.
Table 8-1 of the Big Data Appliance Software User’s Guide, Section: About Data Type Conversions, explains Oracle to Hive data type conversions. But what about Oracle data types like “LONG”, “BFILE” and “BLOB” which are not explained in Table 8-1?

The Big Data Appliance Software User’s Guide, Section: About Data Type Conversions, does not explain conversions for “LONG”, “BFILE” and “BLOB”.

The following data types are not supported by the Data Pump access driver:
LONG
LONG RAW
BFILE

But the “Oracle Database Utilities Doc”, Section on UnSupported Data Types, has workarounds for these unsupported types.

The following data type are converted to Hive STRING or Hive BINARY:

On BDS V2.0 + BDA 4.3.0

CLOB/NCLOB -> Hive STRING
BLOB/ROWID/UROWID/RAW -> Hive BINARY

On BDS V1.1 + BDA 4.2.0

RAW -> Hive BINARY

Note: CLOB/NCLOB and BLOB/ROWID/UROWID are not converted on BDS V1.1 + BDA 4.2.0.

Does BDS V2.0 support optimizations for Parquet?

There is no optimization in BDS today for Parquet.

Is wallet compatible with BDS V2.0?

Oracle Big Data SQL does not connect to database through listener (or SQLNET). So if this is a SQLNET wallet then BDS should not be impacted.

How should Active SQL Monitor Report be used with BDS V2.0?

Oracle Enterprise Manager is the best way to do this. Generally speaking it’s great point to start debug BDS queries.

We just upgraded to Bundle Patch 13 as part of the OCT 2015 QFSDP do we need to patch again and to which Bundle Patch?

Bundle patch 13 is now supported and a BDS2.0 patch on top of bp13 exists.
Click on (patch number 21419192 ) in the Big Data Appliance 4.4. documentation and it will pull up the patches. Please select the one for 12.1.0.2.13.

You should see this:

Patch 21419192: TRACKING BUG FOR BIG DATA SQL V2.0 RELEASE

Last Updated Dec 20, 2015 9:18 PM (1+ month ago)

Product Oracle Database – Enterprise Edition

Release Oracle 12.1.0.2.13 DB Eng Sys and DB IM
Platform Linux x86-64

Bugs Resolved by This Patch
21419192 TRACKING BUG FOR BIG DATA SQL V2.0 RELEASE

Follow the readme for installation instructions.

The documentation specifies Exadata storage version 12.1.1.1 or 12.1.1.0. Will this work with 12.1.2.2.0 and is that supported?

Yes, it should work. Storage software on Exdata cells are independent of BDS cell software and can be upgraded independently.

Is the patch number 21419192 available for SAP databases?

12.1.0.2 is certified for SAP on Exadata. But you need an SAP Bundle Patch (SBP) for this. The SAP Bundle Patch is a superset of a BP. And the Jan16 BP is part of the February SAP Bundle Patch.

It is not supported to run SAP on Exadata just with the BP. The SBP must be installed. So if 21419192 is part of BP16 then it is also part for February SAP Bundle Patch. It is the case that Patch 21419192 is required for:
12.1.0.2.13 (12.1.0.2 with BP13)
and
12.1.0.2.10 (12.1.0.2 with BP 10)

BDS 2.0 is included in BP Jan16 but you will need Feb16 SBP for SAP.

On BDA V4.4 with Big Data SQL 3.0 can Big Data SQL be run off the BDA?

Big Data SQL 3.0.0 is available in 2 forms

– a BDA patch for BDA v4.4.0 to Exadata

– a standalone (software-only) installer for non-BDA to non-Exadata

It is not supported to run the second form on a BDA cluster. Running BDS 3.0.0 on BDA to non-Exadata (or non-BDA to Exadata) is not currently supported.

Is Big Data SQL v3.0.0 provided by default on BDA V4.4?

No. Big Data SQL 3.0.0 is the latest available for BDA v4.4.0, but it is not included as a part of the BDA Mammoth v4.4.0 download. On BDA V4.4, the released version still contains Big Data SQL 2.0 as it always has. In general versions in already released bundles are not updated. Installing BDS 3.0.0 on BDA v4.4.0 requires applying the patch p22911748_440_Linux-x86-64.zip – whether on BDA X5-2 or BDA X6-2. As always, neither BDAMammoth nor BigDataSQL ships with the appliance from the factory. There are no plans to change this.

With Big Data SQL v3.0.0 what configurations are supported?

1. Exadata to BDA with CDH
2. non-Exadata to non-BDA (both CDH and HDP)

No other configuration is supported for BDS 3.0.0.

Is there way to limit the resources used by Big Data SQL?

The concern is that without limiting the resources used by Big Data SQL the BDA’s resources will be fully consumed by “stray” queries launched by some users. Some possible solutions on the database side might be:

1) Put all the external tables on a capped instance / database.

2) Use the Oracle resource manager for the queries.

Hence the question are there any equivalents on the BDA?

Big Data SQL is behind a Linux c-group with a standard install. That c-group prevents it from grabbing more than allocated on the BDA side.

Why is BDS 2.0.2 discovered on a new BDA 4.5 installation when BDS 3.0.1 should have been installed?

The BDS rpm is only a part of the BDS release and the BDS rpm version number does not always match the official release number. So in fact, you are running Big Data SQL 3.0.1 which has BDS rpm version 2.0.2.

What about Big Data SQL recognition of compressed file?

For example, if files are compressed on CDH, Hive automatically recognizes that input files ending with .gz are compressed. This means GZip-compressed files and plain text files can be intermixed in the same table and Hive can read them. Does Big Data SQL act the same way? Or should GZip-compressed files be gunzipped before reading them with Big Data SQL?

Big Data SQL does act the same way. So there is no need to Gzip-compress files before reading them with Big Data SQL.

Does the database parameter _external_table_smart_scan need to be “ON” for smart scan to work?

Yes. If the database parameter _external_table_smart_scan is set to OFF, smart scan will not work.

Does BDS use the listener TCP or SDP?

No. BDS does not use the listener TCP or SDP. BDS uses the listener solely for the extproc MTA connection hand-off to establish a communication channel between the database foreground/pq process and the extproc thread. Since both of these reside on the same host, this is an IPC end point, although most of the communication between these endpoints actually happens through shared memory.

Does the MOS note: Setup Listener on Infiniband Network using both SDP and TCP Protocol (Doc ID 1580584.1) apply to BDS?

No. That note does not apply in any way to Big Data SQL. This is because, as per the previous update, BDS does not use listener TCP or SDP. In the Big Data SQL documentation no reference is made anywhere to any of the steps presented in Setup Listener on Infiniband Network using both SDP and TCP Protocol (Doc ID 1580584.1). The steps in the Big Data SQL installation guide are the steps supported. As it turns out Big Data SQL does not support modifications to the environment after install other then editing config files for example bigdata.properties to add java jars to class path for custom input formats/serdes etc.

With BDA V4.7/ BDS 3.0.1 is is possible to install BDS on an Exadata and access a non-BDA cluster?

No. This is not a supported configuration for BDS 3.0.1. With BDS 3.0.1 an Exadata can only be used with a BDA hadoop cluster.

What about BDS 3.1?

As of 3.1 the following are supported:
– Exadata to BDA both over IB as well as Ethernet.
– Exadata to non-BDA Hadoop over Ethernet.
– non-Exadata DB to BDA over Ethernet.
– non-Exadata to non-BDA Hadoop over Ethernet.

Where can more information on specific Big Data SQL errors be found?

See: Frequently Asked Questions (FAQ) on Big Data SQL Errors (Doc ID 2134254.1).

Where is the BDS compatibility maxtrix?

The BDS Compatibility Matrix can be found in: Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1).

REFERENCES

NOTE:1641871.1 – DataNode Health turns BAD after Reboot of a Node in CDH Cluster on Oracle Big Data Appliance V2.5 and higher with On-Disk Encryption Enabled
NOTE:1641829.1 – How to Add, Update or Remove Password-based On-Disk Encryption on Oracle Big Data Appliance v2.5/V2.6/V3.*/V4.* Using mammoth-reconfig

Related

Categories