Hadoop

Industry Transformation: the new business as usual

The Industry Transformation category at the Data Impact Awards has never been mo...

Dancing with Elephants in 5 Easy Steps

The Corner Office is pressing their direct reports across the company to “Move T...

Global View Distributed File System with Mount Points

Apache Hadoop Distributed File System (HDFS) is the most popular file system in ...

Apache Ozone Powers Data Science in CDP Private Cloud

Apache Ozone is a scalable distributed object store that can efficiently manage ...

Large Scale Industrialization Key to Open Source Innovation

Cloudera’s open source licensing policies have evolved with the changing dynamic...

03 sqoop import 2

Interacting with AWS S3 using Java on EC2

Many web applications are being built on top of the Cloud Infrastructure. Let'...

Creating a K8S Cluster on AWS using eksctl

As mentioned in the official K8S documentation (1), there are different ways of...

Changes to the AWS EC2 Instance Metadata Service (IMDS) around the recent Capital One hack

Capital One Bank (1) and 30 different organizations were hacked around the end of J...

Running Containers on K8S using AWS Fargate

Container orchestration is all the hype. And there are different ways of running...

Prajval in '32nd South Zone Aquatic Championship – 2019'

My son SS Prajval got a Silver Medal in the finals of ‘4x50 mts Medley Relay – G...

Debugging K8S applications with Ephemeral Containers

It's always CRITICAL to pack a Container image with the minimal binaries req...

AWS AMI vs Launch Templates

Very often I get asked about the differences between AMI (Amazon Machine Images...

How does Key Pair work behind the scenes for Linux EC2 authentication?

Different ways of authenticating against Linux EC2: Once a Linux EC2 instance...

How does the K8S cluster gets bootstrapped?

Although the different Cloud vendors provide managed services like AWS EKS, GCP ...

MicroK8S - Easiest way to get started with K8S for those familiar with AWS

AWS provides different ways of running K8S in the Cloud via EKS, ECS and with/w...

How K8S authentication and authorization works in kubeadm?

To get started and practice a few things around K8S, I have set up a K8S cluster...

Sticking to the AWS free tier and building K8S cluster using kops

In one of the previous blogs we saw how to set up K8S on AWS using t2.micro...

Accessing private resources using AWS Client VPN

AWS VPC supports creating public and private subnets. Any EC2 in the public sub...

Creating a VPC and connecting to the EC2 in the Private Subnet

An AWS VPC (Virtual Private Cloud) is a logically isolated network for isolating...

Connecting to S3 Service via VPC Gateway Endpoint

Let's say we are building an image processing application using ML which gets the ...

Optimal VirtualBox network setting for K8S on Laptop

In one of the previous blogs we looked at setting up K8S on a laptop. The advanta...

Connecting Lens IDE to K8S Cluster using port forwarding

In the previous blogs (1, 2), I wrote about setting up a K8S Cluster on a laptop...

Using the same Keypair across AWS Regions

In one of the previous blogs (1), we looked at what happens behind the scenes when w...

Automating EC2 or Linux tasks using "tmux"

We often create multiple EC2 instances and install the same software...

Provisioning AWS infrastructure using Ansible

Cloud infrastructure provisioning can be automated using code. The main advantage i...

Setting up additional EC2 users with username/password and Keypair authentication

When an Ubuntu EC2 instance is created in the AWS Cloud, we should be able to c...

Applications around the intersection of Big Data / Machine Learning and AWS

As many of the readers of this blog know, I am a big fan of Big Data and the AWS ...

Bicycling - my new hobby

It has been quite some time since I blogged here. Lately I have been a bit busy with pe...

Installing K8S on AWS EC2 and connecting via Lens

There are tons of ways of setting up K8S on AWS. Today we will see one of the ea...

Using MFA with AWS CLI

Let's say that the AWS account credentials get compromised, the hacker should be ...

ETL VS ELT: The Key Differences

In this article, we will explore the topic of ETL (Extract, Transform, Load) and...

Hadoop vs SQL

In today’s world, organisations rely on Big Data to fuel their operations, and H...

PHP Vs Javascript: The Right Tech For Your Next Big Project

78.9% of all websites use PHP as their server-side programming language. Some of...

How can software engineers and data scientists work together?

Data scientists are excellent mathematicians with extensive cross-disciplinary k...

A Beginner’s Guide to SQL Programming + 10 Basic Commands to Learn

Image Credit – SQLWatchmen Developers and programmers are likely familiar with i...

Top 85 Big Data Interview Questions and Answers for 2024

The global big data and business analytics market was valued at 169 billion U.S....

The Field of Data Science Continues to Change the World for the Better

Sci-fi novels and television shows predicted the invention of self-driving cars,...

The Data Engineer’s Roadmap

Data engineering is a fascinating, if not fulfilling, career. You are at the helm o...

30 Popular Best Data Science Tools to use in 2024

Data science is a rapidly evolving field that uses scientific methods, processes...

Essential AI Engineer Skills and Tools you Should Master

As the field of Artificial Intelligence (AI) continues to expand, the demand for...

Apache Phoenix

Apache Phoenix is installed under /opt/mas/phoenix/. Launch the Phoenix CLI with ...

Kafka Connect

Kafka Connect: https://www.confluent.io/product/connectors/ Kafka Connect Kaf...

HBase Architecture and inserting data

HBase Architecture: Master Slave ...

Amazon Kinesis

Kinesis: Can be run on EC2 instances. Similar to Kafka. Records of a stream c...

SQL Server commands

Get the table name and the row count in a DB: SELECT t.NAME AS TableName,...

AWS

Command to copy from the local Linux file system to S3: aws s3 sync ${v_input_pat...

OOZIE

OOZIE Schedule: frequency=0 09,15,22 * * * This frequency indicates Job to g...

CICD Process

CICD Process: Continuous Integration and Continuous Deployment. Git is the sou...

StreamSets

StreamSets is a data pipeline tool and has multiple built-in, ready-to-use proces...

XML Parsing

XML Parsing: Source: https://medium.com/@tennysusanto/use-databricks-spark-x...

Creating Spark Scala SBT Project in Intellij

Below are the steps for creating a Spark Scala SBT project in IntelliJ: 1. Ope...

Configuring a Spark-submit Job

Configuring Spark-submit parameters: Before going further, let's discuss the...

Updating data in a Hive table

This can be achieved without the ORC file format and transaction=false, can be ac...

Mongo Spark Connector

This article explains how to write, read and update data in MongoDB. One o...

Improve Spark job performance

Below are 2 useful links: https://medium.com/swlh/4-simple-tips-to-improve...

Reconciliation in Spark

Input configuration as CSV and get primary keys for the respective tables and Up...

Spark Scala vs pySpark

Performance: Many articles say that "Spark Scala is 10 times faster than pySpark...

Data lake vs. Data warehouse

What is the difference between a data lake and a data warehouse? A data lake and...

Database Architecture

How to improve an existing Data Architecture? How do you choose Datalake vs Data...

Normalization in DBMS

Problems: Redundancy. Different kinds of Normal Forms: 1NF, 2NF, 3NF etc. Data Modelli...

PipEnv

Pipenv is a Python virtualenv management tool that supports a multitude of syst...

Pytest

pipenv pip install pytest. Now, I have a simple function in a file t.py: def square(...

Databricks

Databricks provides a community edition for free and can be used to explore it'...

Classes and Object Oriented Python

In Python, you define a class by using the class keyword followed by a name and...

What is the difference between DELETE and truncate in Hive with examples

Introduction to Apache Hive: Apache Hive is a data warehousing and SQL-like tool ...

50 + AWS MCQ Questions and answers

Are you looking to deepen your knowledge of Amazon Web Services (AWS)? Do you wa...

What is Namenode in Hadoop? Key Functions, Handling Datanode Failure, and Best Practices

Introduction to Namenode in Hadoop: The Namenode is a crucial component in the Ha...

Top 22+ FREE Microsoft Power BI Tutorials & Courses Online

Microsoft Power BI is a powerful business intelligence tool that enables users t...

How to Convert a Python Script to a Shell Script: A Step-by-Step Guide with Examples

Python is a versatile and powerful programming language that excels in various a...
