Hadoop
Industry Transformation: the new business as usual
The Industry Transformation category at the Data Impact Awards has never been mo...
Dancing with Elephants in 5 Easy Steps
The Corner Office is pressing their direct reports across the company to “Move T...
Global View Distributed File System with Mount Points
Apache Hadoop Distributed File System (HDFS) is the most popular file system in ...
Apache Ozone Powers Data Science in CDP Private Cloud
Apache Ozone is a scalable distributed object store that can efficiently manage ...
Large Scale Industrialization Key to Open Source Innova...
Cloudera’s open source licensing policies have evolved with the changing dynamic...
Interacting with AWS S3 using Java on EC2
Many web applications are being built on top of the Cloud Infrastructure. Let'...
Creating a K8S Cluster on AWS using eksctl
As mentioned in the official K8S documentation (1), there are different ways of...
Changes to the AWS EC2 Instance Metadata Service (IMDS)...
Capital One Bank (1) and 30 different organizations were hacked around the end of J...
Running Containers on K8S using AWS Fargate
Container orchestration is all the hype, and there are different ways of running...
Prajval in '32nd South Zone Aquatic Championship – 2019'
My son SS Prajval won a Silver Medal in the finals of the ‘4x50 mts Medley Relay – G...
Debugging K8S applications with Ephemeral Containers
It's always CRITICAL to pack a Container image with the minimal binaries req...
AWS AMI vs Launch Templates
Very often I get asked about the differences between AMI (Amazon Machine Images...
How does Key Pair work behind the scenes for Linux EC2 ...
Different ways of authenticating against Linux EC2 Once a Linux EC2 instance...
How does the K8S cluster get bootstrapped?
Although the different Cloud vendors provide managed services like AWS EKS, GCP ...
MicroK8S - Easiest way to get started with K8S for thos...
AWS provides different ways of running K8S in the Cloud via EKS, ECS and with/w...
How K8S authentication and authorization works in kubeadm?
To get started and practice a few things around K8S, I have set up a K8S cluster...
Sticking to the AWS free tier and building K8S cluster ...
In one of the previous blogs we have seen how to set up K8S on AWS using t2.micro...
Accessing private resources using AWS Client VPN
AWS VPC supports creating public and private subnets. Any EC2 in the public sub...
Creating a VPC and connecting to the EC2 in the Private...
An AWS VPC (Virtual Private Cloud) is a logically isolated network for isolating...
Connecting to S3 Service via VPC Gateway Endpoint
Let's say we are building an image processing application using ML which gets the ...
Optimal VirtualBox network setting for K8S on Laptop
In one of the previous blogs we looked at setting up K8S on a laptop. The advanta...
Connecting Lens IDE to K8S Cluster using port forwarding
In the previous blogs (1, 2), I described setting up a K8S cluster on a laptop...
Using the same Keypair across AWS Regions
In one of the previous blogs (1), we looked at what happens behind the scenes when w...
Automating EC2 or Linux tasks using "tmux"
A lot of times we create multiple EC2 instances and install the same software...
Provisioning AWS infrastructure using Ansible
Cloud infrastructure provisioning can be automated using code. The main advantage i...
Setting up additional EC2 users with username/password ...
When an Ubuntu EC2 instance is created in the AWS Cloud, we should be able to c...
Applications around the intersection of Big Data / Mach...
As many of the readers of this blog know I am a big fan of Big Data and the AWS ...
Bicycling - my new hobby
It has been quite some time since I blogged here. Lately I have been a bit busy with pe...
Installing K8S on AWS EC2 and connecting via Lens
There are tons of ways of setting up K8S on AWS. Today we will see one of the ea...
Using MFA with AWS CLI
Let's say that the AWS account credentials get compromised, the hacker should be ...
ETL VS ELT: The Key Differences
In this article, we will explore the topic of ETL (Extract, Transform, Load) and...
PHP Vs Javascript: The Right Tech For Your Next Big Pro...
78.9% of all websites use PHP as their server-side programming language. Some of...
How can software engineers and data scientists work tog...
Data scientists are excellent mathematicians with extensive cross-disciplinary k...
A Beginner’s Guide to SQL Programming + 10 Basic Comman...
Image Credit – SQLWatchmen Developers and programmers are likely familiar with i...
Top 85 Big Data Interview Questions and Answers for 2024
The global big data and business analytics market was valued at 169 billion U.S....
The Field of Data Science Continues to Change the World...
Sci-fi novels and television shows predicted the invention of self-driving cars,...
The Data Engineer’s Roadmap
Data engineering is a fascinating, if not fulfilling, career. You are at the helm o...
30 Popular Best Data Science Tools to use in 2024
Data science is a rapidly evolving field that uses scientific methods, processes...
Essential AI Engineer Skills and Tools you Should Master
As the field of Artificial Intelligence (AI) continues to expand, the demand for...
Apache Phoenix
Apache Phoenix is installed under /opt/mas/phoenix/ Launch Phoenix CLI with ...
Kafka Connect
Kafka Connect- https://www.confluent.io/product/connectors/ Kafka Connect Kaf...
HBase Architecture and inserting data
HBase Architecture: Master Slave ...
Amazon Kinesis
Kinesis: Can be run on EC2 Instances. Similar to Kafka. Records of a stream c...
SQL Server commands
Get the table name and the row count in a DB: SELECT t.NAME AS TableName,...
AWS
Command to copy from the local Linux file system to S3. aws s3 sync ${v_input_pat...
OOZIE
OOZIE Schedule frequency=0 09,15,22 * * * This frequency indicates Job to g...
StreamSets
StreamSets is a data pipeline tool and has multiple built-in, ready-to-use proces...
XML Parsing
XML Parsing: Source: https://medium.com/@tennysusanto/use-databricks-spark-x...
Creating Spark Scala SBT Project in Intellij
Below are the steps for creating a Spark Scala SBT project in IntelliJ: 1. Ope...
Configuring a Spark-submit Job
Configuring Spark-submit parameters. Before going further, let's discuss the...
Updating data in a Hive table
This can be achieved without ORC file format and transaction=false, can be ac...
Mongo Spark Connector
This Article explains the way to Write, Read and Update data to MongoDB. One o...
Improve Spark job performance
Below are the 2 useful links, https://medium.com/swlh/4-simple-tips-to-improve...
Reconciliation in Spark
Input configuration as CSV and get primary keys for the respective tables and Up...
Spark Scala vs pySpark
Performance: Many articles say that "Spark Scala is 10 times faster than pySpark...
Data lake vs. Data warehouse
What is the difference between a data lake and a data warehouse? A data lake and...
Database Architecture
How to improve an existing Data Architecture? How do you choose Datalake vs Data...
Normalization in DBMS
Problems: Redundancy. Different kinds of Normal Forms: 1NF, 2NF, 3NF, etc. Data Modelli...
PipEnv
Pipenv is a Python virtualenv management tool that supports a multitude of syst...
Pytest
pipenv pip install pytest Now, I have a simple function in a filet.py def square(...
Databricks
Databricks provides a community edition for free and can be used to explore it'...
Classes and Object Oriented Python
In Python, you define a class by using the class keyword followed by a name and...
What is the difference between DELETE and truncate in H...
Introduction to Apache Hive: Apache Hive is a data warehousing and SQL-like tool ...
50 + AWS MCQ Questions and answers
Are you looking to deepen your knowledge of Amazon Web Services (AWS)? Do you wa...
What is Namenode in Hadoop? Key Functions, Handling Dat...
Introduction to Namenode in Hadoop: The Namenode is a crucial component in the Ha...
Top 22+ FREE Microsoft Power BI Tutorials & Courses Online
Microsoft Power BI is a powerful business intelligence tool that enables users t...
How to Convert a Python Script to a Shell Script: A Ste...
Python is a versatile and powerful programming language that excels in various a...