An RDD (Resilient Distributed Dataset) is Spark's fault-tolerant collection of elements that can be operated on in parallel. There are two ways to create an RDD: parallelizing an existing collection in the driver program, or referencing a dataset in an external storage system such as HDFS. RDD operations fall into two classes: transformations, which lazily produce one or more new RDDs from an input RDD without modifying it, and actions, which trigger the actual computation and return a result. Transformations are either narrow, where each output partition depends on a single input partition (e.g. map, filter), or wide, where an output partition draws data from multiple input partitions and therefore requires a shuffle (e.g. groupByKey, reduceByKey).