{ Author Archives }

Data Virtualization Technologies – dilemma

1. Denodo and Cisco both offer data virtualization technologies. 2. The products let you add a middle layer over different data sources (RDBMSs such as Oracle and SQL Server, as well as Excel, XML, etc.). You can build views in this middle layer. There are tuning opportunities up to a certain extent so […]

How can I become a Hadoop expert

1. There are different Hadoop distributions. After Cloudera and Hortonworks merged, they are coming up with a common commercial distribution. Go to the Cloudera website, where there are plenty of resources. 2. There are multiple components in a Hadoop cluster: Hive, Impala, Spark, Hue, YARN, Cloudera Manager […]

Ten Activities in a big data project

1. You need good PowerPoint presentations to present your project to the business areas. The presentation needs to strike a balance between technical terms and easily understood words. You need to highlight the main advantages that the business areas can get from your big data project. 2. A senior manager should […]

Experience with Oracle Big Data Appliance

1. We have around 26 Big Data Appliance nodes in production. We are using Cloudera CDH 5.15.2 as our Hadoop distribution. 2. Recently, when we added new nodes to the production cluster, the CPUs in the new nodes were not compatible with CDH 5.13. So we were not able to scale […]

Day 5 – Big Data Project

1. Developing a Java app that shows a list of HDFS folders by calling an Impala host that many other users are also using simultaneously will cause issues for the web app if Impala hangs. Either make sure that the Impala node that the Java host is using […]

Day 4 – Big Data Project

1. When you try to deploy certificates and the limit on the number of alternate/alias names is reached, you could try using multiple certificates for multiple services. 2. Users' Impala queries are taking all available memory on a node and causing memory issues for other queries. The solution is to configure load balancing for […]

Day 3 – Big Data Project

1. If your project is using Informatica BDM or EIC and you have multiple Linux hosts, for example hosting the Informatica services, you might want to deploy just one certificate on all hosts. The certificate should have all the host names and the load balancer names. This will ensure that even though the services […]

Day 2 – Big Data Project

Involved in activities related to enabling smart card login to a Windows server. This requires a Windows GPO, which in turn requires our service to have a specific tree in Active Directory. Creating a new tree in Active Directory involves multiple process management activities, which delay the actual technical work that needs to be done. […]

Day 1 – Big Data Project

1. Users had issues running Impala queries: out-of-memory errors while multiple users were running queries. Each node had 97 GB of physical RAM. One user occupied 75 GB, so the other user suffered. Impala logs identified missing statistics. Advised the user to gather statistics so that a more efficient plan can be found. 2. An issue with source system […]

Oracle resource manager

Oracle Resource Manager