Exploring GitHub data with Apache Drill on Arm64
Ganesh Raju, Naresh Bhat
Who Are We Anyway?
What is Linaro: Leading collaboration in the Arm ecosystem
Apache Drill
Open source distributed SQL query engine for non-relational datastores
- JSON document model
- Columnar
Key Advantages
- Columnar
- Schema on the fly
- Integrates with any non-relational datastore
- Elastic scalability
- Data can be treated like SQL tables
- SQL-like query syntax
- No overhead (creating and maintaining schemas, ETL processes, etc.)
- Vectorization (SIMD instructions)
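As a rough illustration of the "schema on the fly" and "SQL-like query syntax" points above, the sketch below builds a request for Drill's REST query endpoint (`POST /query.json`, web port 8047 by default) that queries JSON files in place, with no table definition or ETL step. The host `localhost` and the helper name `build_drill_request` are assumptions for illustration; the file path is the one used later in this deck.

```python
import json
import urllib.request


def build_drill_request(sql, host="localhost", port=8047):
    """Build (but do not send) an HTTP request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# No CREATE TABLE or ETL step: the JSON files are queried where they sit.
request = build_drill_request("SELECT * FROM dfs.`/usersummary/*.json` LIMIT 20")

# Sending it needs a running Drillbit, so that part is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the sketch easy to check without a cluster; in a notebook the commented lines would be enabled.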
Apache Drill on Arm64 Server
Test environment - SW basic configuration
Architecture: Gigabyte Marvell® ThunderX2® "Saber" 3-node cluster
OS platform: Debian GNU/Linux 9.9 (stretch)
Linux kernel version: Debian 4.16.13.linaro.290-1
GCC version: gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
glibc version: Debian GLIBC 2.24-11+deb9u4
Java version: openjdk version "1.8.0_191"
    OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_191-b12)
    OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.191-b12, mixed mode)
Hadoop version: Hadoop 2.8.5
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 0b8464d75227fcee2c6e7f2410377b3d53d3d5f8
    Compiled by jdu on 2018-09-10T03:32Z
    Compiled with protoc 2.5.0
Using upstream release packages from apache.org.
Running on a commercially available Arm server based on Marvell ThunderX2.
Test environment - SW basic configuration
ZooKeeper and libzookeeper-java version: 3.4.9-3+deb9u2
Apache Drill version: v1.16.0
Jupyter Notebook version:
    jupyter core 4.5.0
    jupyter-notebook 6.0.1
    qtconsole 4.5.4
    ipython 7.7.0
    ipykernel 5.1.2
    jupyter client 5.3.1
    jupyter lab 1.0.9
    nbconvert 5.6.0
    ipywidgets 7.5.1
    nbformat 4.4.0
    traitlets 4.3.2
Dataset: 3 TB+ GitHub activity dataset containing a full snapshot of the content of more than 2.8 million open source GitHub repositories, including more than 145 million unique commits.
This demo can be replicated using upstream release packages and an open source data set.
SELECT * FROM sys.drillbits;
SHOW FILES IN dfs;
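When statements like the two above are submitted through Drill's REST API instead of the shell, the result comes back as JSON with a `columns` list and a `rows` list of objects. The sketch below turns such a response into ordered tuples. The sample payload is made up for illustration (host names and values are placeholders, not output from the cluster described here), and `rows_as_tuples` is a hypothetical helper name.

```python
def rows_as_tuples(response):
    """Order each row dict by the response's column list."""
    columns = response["columns"]
    return [tuple(row.get(col) for col in columns) for row in response["rows"]]


# Made-up sample shaped like a drillbits listing; not real cluster output.
sample = {
    "columns": ["hostname", "user_port", "current"],
    "rows": [
        {"hostname": "node1.example", "user_port": "31010", "current": "true"},
        {"hostname": "node2.example", "user_port": "31010", "current": "false"},
    ],
}

for row in rows_as_tuples(sample):
    print(row)
```

Ordering by `columns` keeps the output stable even if the row objects list their keys in a different order.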
Top projects this year on GitHub
Need to paste Apache Drill query snapshot
Top contributors by year
Need to paste Apache Drill query snapshot
Top contributors to Linux by year
Need to paste Apache Drill query snapshot
Top contributors to Big Data
(Hadoop, Spark, HBase, Hive, Drill, etc.)
Need to paste Apache Drill query snapshot
Contributors by Country
SELECT * FROM dfs.`/usersummary/*.json` LIMIT 20;
Language Popularity Score
SELECT * FROM dfs.`/usersummary/*.json` LIMIT 20;
Top Python repositories by commit count
SELECT * FROM dfs.`/usersummary/*.json` LIMIT 20;
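The queries above only preview the raw files. A ranking such as "top repositories by commit count" would normally add a GROUP BY and an ORDER BY over the same file path. The sketch below builds such a statement as a string. The field name `repo_name` is purely hypothetical, since the slides do not show the schema of the `usersummary` files, and `top_n_query` is an illustrative helper rather than anything from the original demo.

```python
def top_n_query(path, group_field, n=20):
    """Return a Drill SQL string counting rows per group_field, largest first."""
    # Backticks quote the file path and the field name in Drill SQL.
    return (
        f"SELECT `{group_field}`, COUNT(*) AS cnt "
        f"FROM dfs.`{path}` "
        f"GROUP BY `{group_field}` "
        f"ORDER BY cnt DESC "
        f"LIMIT {int(n)}"
    )


# `repo_name` is an assumed field name, used here only for illustration.
print(top_n_query("/usersummary/*.json", "repo_name", n=10))
```

The same helper could be reused for the other rankings in this deck by swapping in whichever field the real data exposes.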
Top Apache Projects by contribution
Need to paste Apache Drill query snapshot
Who Are We Anyway?
We are Linaro: Leading collaboration in the Arm ecosystem
Linaro: Open Source
Delivering high value collaboration
Top 5 company contributor to the Linux kernel
Contributor to >70 open source projects; many maintained by Linaro engineers
Rank  Company  Changesets (4.8-4.13)  %
1     Intel    10,833                 13.1%
2     Red Hat  5,965                  7.2%
3     Linaro   4,636                  5.6%
Source: Linux Kernel Development Report, Linux Foundation
Selected projects Linaro contributes to
Linaro: BigData Objective
● Ensure that Arm is a first class platform for Hadoop and Spark.
● Profile Hadoop and Spark for real world workloads on 64-bit Arm server systems.
● Ensure that OpenJDK is running optimally against Hadoop and Spark workloads.
❏ Founded in November 1990
❏ Designs RISC processor cores
❏ Licenses Arm core designs to partners who fabricate and sell to their customers
Arm Ecosystem momentum continues to accelerate
www.arm.com
Software stack layers: Workloads; Networking; Virtualization & Containers; Language & Library; Operating system
Marvell
Company founded: 1995
FY19 revenue: $2.9B
Employees: 6,000+
Located in: Santa Clara, CA
R&D centers: US, Israel, India, Germany, China
Patents worldwide: 10,000+
© 2019 Marvell Confidential, All Rights Reserved.
ThunderX2 Second Generation High-End Armv8-A Server SoC
• Up to 32 custom Armv8.1 cores, up to 2.5GHz
• Full Out-of-Order, 1, 2, 4 threads per core
• 1S and 2S Configuration
• Up to 8 DDR4-2667 Memory Controllers, 1 & 2 DPC
• Up to 56 lanes of PCIe Gen3, 14 PCIe controllers
Marvell powers the world’s fastest Arm-based supercomputer
Driven by 145,152 (5,184 CPUs x 28 cores) ThunderX2 cores
Securing U.S. nuclear arsenal
Marvell-University of Michigan Partnership
Built on Cavium/Marvell-Michigan relationship
Deploy ThunderX for Big Data
● 4800 Cores
● 25 TB Memory
● 40 & 100 Gbps networking
● 3 PB Hadoop File System
Accelerating the software ecosystem for data science for Arm.
Directly consuming Linaro Big Data software builds
We bring an advanced user base in the data science domain
Questions?
Contact Us:
Ganesh Raju
ganesh.raju@linaro.org
Naresh Bhat
nareshb@marvell.com
naresh.bhat@linaro.org
Blogpost
https://nbhatlinaro.blogspot.com/2019/04/apache-drill-on-arm64.html
Thanks to Linaro Team:
Yuqi Gu
Jun He
Guodong Xu
Inspiration from Felipe Hoffa’s talks on Google BigQuery
https://s3.amazonaws.com/connect.linaro.org/bkk19/presentations/bkk19-300k1.pdf

Exploring Github Data with Apache Drill on ARM64

  • 1. Exploring GitHub data with Apache Drill on Arm64 Ganesh Raju Naresh Bhat
  • 2. Who Are We Anyway?
  • 3. What is Linaro: Leading collaboration in the ARM ecosystem
  • 4. Apache Drill: open source distributed SQL query engine for non-relational datastores (JSON document model; columnar). Key advantages: columnar; schema on the fly; integrates with any non-relational datastore; elastic scalability; data can be treated like SQL tables; SQL-like query syntax; no overhead (creating and maintaining schemas, ETL process, etc.); vectorization (SIMD instructions).
  • 5. Apache Drill on Arm64 Server
  • 6. Test environment - SW basic configuration. Architecture: Gigabyte Marvell® ThunderX2® "Saber" 3-node cluster. OS platform: Debian GNU/Linux 9.9 (stretch). Linux kernel version: Debian 4.16.13.linaro.290-1. GCC version: gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516. Glibc version: Debian GLIBC 2.24-11+deb9u4. Java version: openjdk version "1.8.0_191"; OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_191-b12); OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.191-b12, mixed mode). Hadoop version: Hadoop 2.8.5; Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 0b8464d75227fcee2c6e7f2410377b3d53d3d5f8; compiled by jdu on 2018-09-10T03:32Z; compiled with protoc 2.5.0. Using upstream release packages from apache.org. Running on a commercially available Arm server based on Marvell ThunderX2.
  • 7. Test environment - SW basic configuration. Zookeeper and libzookeeper-java version: 3.4.9-3+deb9u2. Apache Drill version: v1.16.0. Jupyter Notebook version: jupyter core 4.5.0; jupyter-notebook 6.0.1; qtconsole 4.5.4; ipython 7.7.0; ipykernel 5.1.2; jupyter client 5.3.1; jupyter lab 1.0.9; nbconvert 5.6.0; ipywidgets 7.5.1; nbformat 4.4.0; traitlets 4.3.2. Dataset: 3 TB+ GitHub activity dataset containing a full snapshot of more than 2.8 million open source GitHub repositories, including more than 145 million unique commits. This demo can be replicated using upstream release packages and an open source dataset.
  • 8. Select * from drillbits;
  • 10. Top projects this year in GitHub (Apache Drill query snapshot still to be pasted)
  • 11. Top contributors by year (Apache Drill query snapshot still to be pasted)
  • 12. Top contributors to Linux by year (Apache Drill query snapshot still to be pasted)
  • 13. Top contributors to Big Data (Hadoop, Spark, HBase, Hive, Drill, etc.) (Apache Drill query snapshot still to be pasted)
  • 14. Contributors by Country SELECT * FROM dfs.`/usersummary/*.json` limit 20
  • 15. Language Popularity Score SELECT * FROM dfs.`/usersummary/*.json` limit 20
  • 16. Top Python repositories by their commits count SELECT * FROM dfs.`/usersummary/*.json` limit 20
  • 17. Top Apache projects by contribution (Apache Drill query snapshot still to be pasted)
  • 18. Who Are We Anyway? We are Linaro: Leading collaboration in the Arm ecosystem
  • 19. Linaro: Open Source. Delivering high value collaboration. Top 5 company contributor in Linux kernel. Contributor to >70 open source projects; many maintained by Linaro engineers. Changesets by company, kernels 4.8-4.13: 1. Intel, 10,833 (13.1%); 2. Red Hat, 5,965 (7.2%); 3. Linaro, 4,636 (5.6%). Source: Linux Kernel Development Report, Linux Foundation. Selected projects Linaro contributes to.
  • 20. Linaro: BigData Objective ● Ensure that Arm is a first class platform for Hadoop and Spark. ● Profile Hadoop and Spark for real world workloads on 64-bit Arm server systems. ● Ensure that OpenJDK is running optimally against Hadoop and Spark workloads.
  • 21. ❏ Founded in November 1990 ❏ Designs the RISC processor cores ❏ Licenses Arm core designs to partners who fabricate and sell to their customers
  • 22. Arm ecosystem momentum continues to accelerate (www.arm.com). Layers shown: workloads; networking; virtualization & containers; language & library; operating system.
  • 23. Marvell. Company founded: 1995. FY19 revenue: $2.9B. Employees: 6,000+. Located in: Santa Clara, CA. R&D centers: US, Israel, India, Germany, China. Patents worldwide: 10,000+. © 2019 Marvell Confidential, All Rights Reserved.
  • 24. ThunderX2: Second Generation High-End Armv8-A Server SoC. Up to 32 custom Armv8.1 cores, up to 2.5GHz; full out-of-order, 1, 2, 4 threads per core; 1S and 2S configuration; up to 8 DDR4-2667 memory controllers, 1 & 2 DPC; up to 56 lanes of PCIe Gen3, 14 PCIe controllers.
  • 25. Marvell powers the world’s fastest Arm-based supercomputer. Driven by 145,152 (5,184 CPUs x 28 cores) ThunderX2 cores. Securing the U.S. nuclear arsenal.
  • 26. Marvell-University of Michigan Partnership. Built on the Cavium/Marvell-Michigan relationship. Deploy ThunderX for Big Data: 4800 cores; 25 TB memory; 40 & 100 Gbps networking; 3 PB Hadoop File System. Accelerating the software ecosystem for data science for Arm. Directly consuming Linaro Big Data software builds. We bring an advanced user base in the data science domain.
  • 27. Questions? Contact us: Ganesh Raju (ganesh.raju@linaro.org); Naresh Bhat (nareshb@marvell.com, naresh.bhat@linaro.org). Blog post: https://nbhatlinaro.blogspot.com/2019/04/apache-drill-on-arm64.html Thanks to the Linaro team: Yuqi Gu, Jun He, Guodong Xu. Inspiration from Felipe Hoffa’s talks on Google BigQuery. https://s3.amazonaws.com/connect.linaro.org/bkk19/presentations/bkk19-300k1.pdf
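
The queries on slides 8 and 14-16 could be run from a Jupyter notebook through Drill's REST interface. The following is a minimal sketch, not the presenters' actual notebook code. It assumes a Drillbit whose web port is reachable at `localhost:8047` (Drill's default), and the helper names `build_drill_request` and `run_drill_query` are hypothetical. The `sys.` qualifier on `drillbits` is also an assumption for sessions where no default schema has been selected.

```python
import json
import urllib.request

# Assumption: a Drillbit with its REST/web port on the default 8047.
DRILL_URL = "http://localhost:8047/query.json"

# Query text from slides 14-16 (trailing semicolon omitted for REST use).
USER_SUMMARY_QUERY = "SELECT * FROM dfs.`/usersummary/*.json` LIMIT 20"
# Slide 8 lists the cluster's Drillbits; the sys schema is spelled out here.
DRILLBITS_QUERY = "SELECT * FROM sys.drillbits"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, url=DRILL_URL):
    """Send the query and return the result rows as a list of dicts."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body.get("rows", [])


if __name__ == "__main__":
    # Requires a running Drill cluster; prints one dict per returned row.
    for row in run_drill_query(USER_SUMMARY_QUERY):
        print(row)
```

Building the request separately from sending it keeps the payload easy to inspect in a notebook cell before anything is submitted to the cluster.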