Author: beginnershadoop

July 2, 2023

Databricks Unity Catalog

The Databricks Unity Catalog is a feature of the Databricks Unified Data Analytics Platform that lets you organize and manage metadata about your data assets, such as...

Uncategorized

July 2, 2023

What is the Airflow task decorator?

In Apache Airflow, the task decorator is a Python decorator used to define tasks within a Directed Acyclic Graph (DAG). Airflow is an open-source platform used to programmatically...
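The excerpt stops before any code, so here is a toy, plain-Python model of the idea behind the TaskFlow `@task` decorator. Calling a decorated function inside a DAG does not run it. Instead it registers a task, and passing one task's return value to another records a dependency. Everything in it (`ToyDAG`, `task`, `TaskRef`) is hypothetical and is not how Airflow implements the decorator. In Airflow 2.x the real decorators are imported with `from airflow.decorators import dag, task`.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TaskRef:
    """Placeholder returned when a decorated function is called inside a DAG."""
    task_id: str


@dataclass
class ToyDAG:
    """Hypothetical DAG container: records tasks and their upstream dependencies."""
    tasks: dict[str, tuple[Callable[..., Any], tuple, dict]] = field(default_factory=dict)
    upstream: dict[str, set[str]] = field(default_factory=dict)

    def __enter__(self) -> ToyDAG:
        global _current_dag
        _current_dag = self
        return self

    def __exit__(self, *exc: object) -> None:
        global _current_dag
        _current_dag = None

    def run(self) -> dict[str, Any]:
        """Execute tasks in dependency order and return each task's result."""
        results: dict[str, Any] = {}

        def resolve(value: Any) -> Any:
            # Replace a TaskRef argument with the upstream task's actual result.
            return results[value.task_id] if isinstance(value, TaskRef) else value

        pending = dict(self.upstream)
        while pending:
            ready = sorted(t for t, deps in pending.items() if deps.issubset(results))
            if not ready:
                raise ValueError("cycle or missing dependency")
            for task_id in ready:
                fn, args, kwargs = self.tasks[task_id]
                results[task_id] = fn(*(resolve(a) for a in args),
                                      **{k: resolve(v) for k, v in kwargs.items()})
                del pending[task_id]
        return results


_current_dag: ToyDAG | None = None


def task(fn: Callable[..., Any]) -> Callable[..., TaskRef]:
    """Hypothetical @task: calling the function registers a task instead of running it."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> TaskRef:
        if _current_dag is None:
            raise RuntimeError("call decorated tasks inside a `with ToyDAG()` block")
        task_id = fn.__name__  # task id taken from the function name
        _current_dag.tasks[task_id] = (fn, args, kwargs)
        # Every TaskRef passed in as an argument becomes an upstream dependency.
        _current_dag.upstream[task_id] = {
            v.task_id for v in (*args, *kwargs.values()) if isinstance(v, TaskRef)
        }
        return TaskRef(task_id)
    return wrapper


@task
def extract() -> dict[str, int]:
    return {"a": 1, "b": 2}


@task
def total(values: dict[str, int]) -> int:
    return sum(values.values())


with ToyDAG() as dag:
    total(extract())  # nothing runs yet; this only wires extract -> total

print(dag.upstream)
print(dag.run())
```

Deferring execution this way lets a scheduler see the whole graph before any task runs. The toy then simply executes the graph in dependency order in a single process.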

Data Architecture

March 25, 2023

Delta Lake Architecture

Delta Lake is an open-source storage layer that allows developers to build scalable and efficient data pipelines for big data workloads. Delta Lake provides reliability, performance, and flexibility...

Data Architecture

March 25, 2023

Data Architecture for Beginners

Big data is a term used to describe large and complex data sets that require advanced computational and analytical tools to process and interpret. The field of big...

Scala / Spark / Spark SQL

May 10, 2020

Apache Spark: WindowSpec & Window

WindowSpec is a window specification that defines which rows are included in a window (frame), i.e., the set of rows that are associated with the current row by some relation. WindowSpec takes the following when...
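A rough way to see what a frame means is to model one in plain Python. The sketch below uses a hypothetical `moving_sum` helper and made-up rows to mimic a specification such as `Window.partitionBy("user").orderBy("ts").rowsBetween(-1, Window.currentRow)`. Rows are grouped by key and ordered, and each row's frame is the previous row plus itself. In Spark you would get the same numbers from `F.sum("amount").over(spec)`. This code only illustrates the semantics and ignores nulls and ordering ties.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def moving_sum(rows: list[dict[str, Any]], partition_key: str, order_key: str,
               value_key: str, start: int, end: int) -> list[dict[str, Any]]:
    """Model of ROWS BETWEEN start AND end: for every row, sum value_key over
    the rows at offsets start..end from it, within its partition, after
    ordering the partition by order_key."""
    partitions: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    out = []
    for part in partitions.values():
        ordered = sorted(part, key=lambda r: r[order_key])
        for i, row in enumerate(ordered):
            # A frame never crosses the partition: clamp offsets to its bounds.
            lo = max(0, i + start)
            hi = min(len(ordered) - 1, i + end)
            frame = ordered[lo:hi + 1]
            out.append({**row, "moving_sum": sum(r[value_key] for r in frame)})
    return out


rows = [
    {"user": "a", "ts": 1, "amount": 10},
    {"user": "a", "ts": 2, "amount": 20},
    {"user": "a", "ts": 3, "amount": 30},
    {"user": "b", "ts": 1, "amount": 5},
]
result = moving_sum(rows, "user", "ts", "amount", start=-1, end=0)
for r in result:
    print(r)
```

User "a" gets running pair sums of 10, 30 and 50. User "b" has a single row, so its frame contains only that row.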

Scala / Spark

April 2, 2020

Barrier Execution Mode in Spark

The barrier execution mode is experimental and handles only a limited set of scenarios. See SPIP: Barrier Execution Mode and Design Doc. In case of a task failure, instead of only restarting the...
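The all-or-nothing retry behavior of a barrier stage can be illustrated with a small plain-Python simulation built on `threading.Barrier`. It is only a model under simple assumptions: one thread per task, a fixed attempt limit, and the hypothetical helpers `run_barrier_stage` and `make_task`. In Spark itself, a barrier stage is created with `rdd.barrier().mapPartitions(...)`, and tasks synchronize through `BarrierTaskContext.get().barrier()`.

```python
from __future__ import annotations

import threading
from typing import Callable

TaskFn = Callable[[int, threading.Barrier], int]


def _launch(fn: TaskFn, i: int, attempt: int, barrier: threading.Barrier,
            results: list[int | None], failed: threading.Event) -> None:
    try:
        results[i] = fn(attempt, barrier)
    except Exception:
        failed.set()
        barrier.abort()  # wake any peers still blocked in barrier.wait()


def run_barrier_stage(task_fns: list[TaskFn], max_attempts: int = 3) -> list[int | None]:
    """Toy barrier stage: all tasks are launched together, and if any task
    fails, every task in the stage is relaunched, not just the failed one."""
    for attempt in range(1, max_attempts + 1):
        barrier = threading.Barrier(len(task_fns))
        results: list[int | None] = [None] * len(task_fns)
        failed = threading.Event()
        threads = [
            threading.Thread(target=_launch,
                             args=(fn, i, attempt, barrier, results, failed))
            for i, fn in enumerate(task_fns)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if not failed.is_set():
            return results
    raise RuntimeError(f"barrier stage failed after {max_attempts} attempts")


launches: list[tuple[int, int]] = []


def make_task(index: int) -> TaskFn:
    def task(attempt: int, barrier: threading.Barrier) -> int:
        launches.append((index, attempt))  # record every (task, attempt) launch
        barrier.wait()  # plays the role of BarrierTaskContext.barrier()
        if index == 1 and attempt == 1:
            raise RuntimeError("simulated failure in task 1")
        return index * 10
    return task


print(run_barrier_stage([make_task(i) for i in range(3)]))
print(sorted(launches))
```

Tasks 0 and 2 never fail, yet the recorded launches show each of them running twice. When task 1 fails, the whole stage is relaunched.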

Hadoop / Impala

October 2, 2019

Impala Export to CSV

Apache Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. In some cases, impala-shell is installed manually...

Spark

October 1, 2019

Spark Structured Streaming and Streaming Queries

Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you...
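The idea of writing a streaming query the same way as a batch one can be sketched with a toy micro-batch loop in plain Python. The helpers `word_count` and `run_micro_batches` and the sample lines are hypothetical. In Spark the query would be written against `spark.readStream` and started with `writeStream`, and `outputMode("complete")` would emit the full result table after each trigger.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def word_count(lines: Iterable[str]) -> Counter[str]:
    """The query, written once as if the input were a static table."""
    return Counter(word for line in lines for word in line.split())


def run_micro_batches(batches: list[list[str]]) -> list[dict[str, int]]:
    """Toy incremental engine: each trigger processes only the newly arrived
    lines, folds them into running state, and emits the whole result table
    (similar in spirit to Spark's "complete" output mode)."""
    state: Counter[str] = Counter()
    snapshots = []
    for batch in batches:
        state.update(word_count(batch))  # merge partial counts into state
        snapshots.append(dict(state))
    return snapshots


batches = [["spark streams", "spark"], ["streams of data"]]
snapshots = run_micro_batches(batches)
print(snapshots)

# Same query over all the data at once, as a batch job would run it.
batch_result = dict(word_count(line for b in batches for line in b))
print(snapshots[-1] == batch_result)
```

Incremental updating like this works directly only for aggregations whose partial results can be merged, such as counts and sums. That is why the final snapshot matches the batch result.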

Python / Scala / Spark / Spark SQL

October 1, 2019

Add a constant column in Spark

A column with a default value can be added directly in Spark. In Spark 2.2 there are two ways to add a constant value in...

Hadoop / Scala / Spark

October 1, 2019

Salesforce connector in Spark

Salesforce is a customer relationship management (CRM) solution that brings companies and customers together. It’s one integrated CRM platform that gives all your departments, including marketing, sales, commerce,...
