BinaryEdge.io
Be Ready. Be Safe. Be Secure.
Stories from the Trenches of Building a Data Architecture
Florentino Bexiga
Data Engineer/ Platform Developer
fb@binaryedge.io
WHO WE ARE AND WHAT WE DO
VNC
RDP
Files People
Social
Company
registration
internal
external
Phone
Email
Linked urls
BGP
AS
Whois
AS membership
AS peer
List of IPs
Shared
infrastructure
Co-hosted
sites
Contact
Geolocation
Office
locations
Social
networks
Phone
portscan
dns
Screenshots
Web
Services
http https
Users
Apps
Files
Banners
Image
Classifier
Vulnerabilities
DATA POINTS
metadata
Photos
Family&friends
Behaviour
Likes
Topics
Search
News
Forums
Sub-reddits
Domains
AXFR
MX records
Webserver
Framework
Headers
Cookies
Certificate
Configuration
Authorities
Entities
OCR
SW
ip address
url address
SMB
torrents
peers, torrent name, category, source, hashes of files
AGENDA
01 THE NEED OF A DATA ARCHITECTURE
02 SIMPLE ARCHITECTURE OVERVIEW
03 THE BASIC SURVIVAL KIT
04 MESSAGE QUEUE
05 STREAM PROCESSING
06 BATCH PROCESSING
07 DATABASES
08 BONUS ROUND: MANAGEMENT
09 ARCHITECTURE REVISITED
10 CLOUD-BASED ARCHITECTURES
THE NEED OF A DATA ARCHITECTURE
Rules before building a data architecture
Think about what you need to do with the data
There are no more rules
Typical list of needs
Gather a lot of data coming from different places
Process that data in (close to) real-time
Make data available in multiple formats
Provide ways to easily process that data
SIMPLE ARCHITECTURE OVERVIEW
SENSOR
SENSOR
SENSOR
DATA SINK
MESSAGE QUEUE
STREAM PROCESSING
FILE STORAGE
BATCH PROCESSING
DATABASES
APIs
PORTALS
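A minimal sketch of how the stages named on this slide might hand data to one another, using only the Python standard library. The wiring between stages, the function name `run_pipeline`, and the record fields are assumptions made for illustration; an in-process `queue.Queue`, a list, and a dict stand in for the message queue, file storage, and database.

```python
import queue
from collections import defaultdict


def run_pipeline(sensor_readings):
    """Push sensor readings through toy versions of each stage."""
    mq = queue.Queue()    # stand-in for the message queue
    file_storage = []     # stand-in for raw file storage
    database = {}         # stand-in for the serving database

    # Sensors publish raw records to the queue.
    for name, readings in sensor_readings.items():
        for value in readings:
            mq.put({"sensor": name, "value": value})

    # Stream processing: handle each record as it is consumed,
    # keep the latest value per sensor, and archive the raw record.
    while not mq.empty():
        record = mq.get()
        database[record["sensor"]] = record["value"]
        file_storage.append(record)

    # Batch processing: recompute aggregates over everything archived.
    totals = defaultdict(int)
    for record in file_storage:
        totals[record["sensor"]] += record["value"]

    return database, dict(totals)


latest, totals = run_pipeline({"portscan": [1, 2, 3], "dns": [10]})
print(latest, totals)
```

The point of the split is that the stream path answers "what is the state right now" while the batch path can recompute anything from the archived raw data.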
THE BASIC SURVIVAL KIT
Apache Hadoop
MapReduce
HDFS
Yarn
Why Apache Hadoop?
Interoperability with many other tools
Great community
Gets the job done
THE BASIC SURVIVAL KIT
Points of attention
YARN
Available resources per node for processing
Timeouts
Heap, heap...
HDFS
Same as above
Primary/ Secondary nodes - high availability
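A rough sketch of the arithmetic behind "available resources per node" and "heap", assuming memory is the limiting resource. The helper names and the 0.8 heap fraction are assumptions (a commonly quoted rule of thumb, not a value from the talk). The results correspond to the kind of values one would put in `yarn.nodemanager.resource.memory-mb`, `mapreduce.map.memory.mb`, and the `-Xmx` part of `mapreduce.map.java.opts`.

```python
def yarn_memory_mb(node_memory_mb, reserved_mb):
    """Memory left for YARN containers after reserving some for the OS and daemons."""
    if reserved_mb < 0 or reserved_mb > node_memory_mb:
        raise ValueError("reserved_mb must be between 0 and node_memory_mb")
    return node_memory_mb - reserved_mb


def containers_per_node(node_memory_mb, reserved_mb, container_memory_mb):
    """How many containers of a given size fit on one node."""
    if container_memory_mb <= 0:
        raise ValueError("container_memory_mb must be positive")
    return yarn_memory_mb(node_memory_mb, reserved_mb) // container_memory_mb


def heap_for_container_mb(container_memory_mb, heap_fraction=0.8):
    """JVM heap sized below the container limit to leave room for off-heap memory."""
    return int(container_memory_mb * heap_fraction)


# Hypothetical 64 GB node, 8 GB reserved, 4 GB containers.
print(containers_per_node(65536, 8192, 4096))
print(heap_for_container_mb(4096))
```

If the heap is set equal to the container size, the JVM's off-heap usage can push the process over the container limit and YARN may kill it, which is one way the "heap, heap..." problem shows up.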
MESSAGE QUEUE
Apache Kafka
Originally developed by LinkedIn
Massively scalable publish/ subscribe message queue
High throughput
Low latency
Concepts
Topics
Consumers
Consumer groups
Partitions
Replicas
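A simplified sketch of how the concepts above relate: a topic is split into partitions, a message key decides the partition, and the partitions are divided among the consumers of one consumer group. This is an illustration in plain Python, not Kafka's implementation: the CRC32 hash is a stand-in (the Java client's default partitioner uses murmur2), and `partition_for` and `assign_partitions` are hypothetical helper names.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always lands in the same partition."""
    return zlib.crc32(key) % num_partitions


def assign_partitions(partitions, consumers):
    """Spread partitions round-robin over the consumers of one consumer group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


print(assign_partitions(range(6), ["c1", "c2"]))
print(assign_partitions([0, 1], ["a", "b", "c"]))
```

The second call shows why the partition count caps parallelism: with two partitions and three consumers in the group, one consumer gets nothing to read.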
MESSAGE QUEUE
Points of attention
Timeouts
Message sizes
Log retention vs cleanup interval !!!!
Also, do not, for the love of god, simply delete all the subdirectories in your "kafka-logs" directory; you will cry.
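A hedged sketch of two sanity checks suggested by the points above. The configuration keys are real Kafka settings (`message.max.bytes`, `replica.fetch.max.bytes`, `max.partition.fetch.bytes`, `max.request.size`), but the helper functions are hypothetical, and whether a mismatch is a hard failure depends on the Kafka version. The retention estimate is one reading of "retention vs cleanup interval": it assumes time-based segment rolling and ignores size-based rolling.

```python
def check_message_sizes(cfg):
    """Return warnings where client or replica limits disagree with the broker limit."""
    warnings = []
    broker_max = cfg["message.max.bytes"]
    if cfg["max.request.size"] > broker_max:
        warnings.append("producer may send messages the broker rejects")
    if cfg["replica.fetch.max.bytes"] < broker_max:
        warnings.append("replicas may be unable to fetch the largest messages")
    if cfg["max.partition.fetch.bytes"] < broker_max:
        warnings.append("consumers may be unable to fetch the largest messages")
    return warnings


def worst_case_deletion_delay_ms(retention_ms, check_interval_ms, segment_roll_ms):
    """Rough upper bound on how long a message can outlive its write time.

    A message written at the start of a segment waits for the segment to roll,
    then for the retention period, then for the next cleanup check.
    """
    return segment_roll_ms + retention_ms + check_interval_ms


cfg = {
    "message.max.bytes": 1_000_000,
    "max.request.size": 1_000_000,
    "replica.fetch.max.bytes": 500_000,   # deliberately too small
    "max.partition.fetch.bytes": 1_000_000,
}
print(check_message_sizes(cfg))
print(worst_case_deletion_delay_ms(3_600_000, 300_000, 600_000))
```

The practical takeaway is that data is not removed the instant it passes the retention period, so disk sizing should account for the extra segment and check-interval time.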
STREAM PROCESSING
Apache Spark vs. Apache Storm vs. Apache Flink
STREAM PROCESSING
Apache Spark
The good parts
Very simple programming model and APIs
Multilanguage support
DataFrame API
ML Libraries
Wide community
Wide range of addons
Points of attention
Mini-batch processing, not real stream
Heavy resource footprint
Prone to timeouts or memory errors
Hard to fine-tune to get the right performance
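To illustrate "mini-batch processing, not real stream", here is a small sketch in plain Python (no Spark involved). It groups time-ordered events into fixed windows and reports how long each event waits before its batch can be processed. The function names and the millisecond timestamps are assumptions for the example; a true per-event stream processor would handle each event on arrival instead.

```python
def micro_batches(events, interval_ms):
    """Group time-ordered (timestamp_ms, value) events into fixed-size windows.

    Yields (window_end_ms, batch) pairs; empty windows between events are
    yielded too, since a micro-batch engine still fires on every interval.
    """
    batch = []
    window_end = None
    for ts, value in events:
        if window_end is None:
            window_end = (ts // interval_ms + 1) * interval_ms
        while ts >= window_end:
            yield window_end, batch
            batch = []
            window_end += interval_ms
        batch.append((ts, value))
    if batch:
        yield window_end, batch


def added_latency_ms(events, interval_ms):
    """Time each event waits until the end of its window."""
    return [
        (value, end - ts)
        for end, batch in micro_batches(events, interval_ms)
        for ts, value in batch
    ]


events = [(100, "a"), (400, "b"), (1200, "c")]
print(added_latency_ms(events, 1000))  # [('a', 900), ('b', 600), ('c', 800)]
```

Every event pays up to one full batch interval of extra latency before processing even starts, which is the trade-off the slide points at.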
STREAM PROCESSING
Apache Storm
The good parts
Stream processing
Multilanguage support
Works without much configuration effort
Low resources configuration
Wide community
Lots of connectors and addons
Great performance, like, "The Flash" great
Points of attention
Slightly more complex programming model
Some support for other languages
STREAM PROCESSING
Apache Flink
The good parts
Stream processing
Multilanguage support
Simple API (very similar to Spark)
Dataset API
ML Libraries
Good handling of resources
Low configuration/ optimisation overhead
Buuuuut.....
Does not have a wide community
Does not have that many connectors and addons
BATCH PROCESSING
The good parts
Apache Spark
Multilanguage support
Simple API
DataFrame API
ML Libraries
Wide community
Wide range of addons
Apache Flink
Multilanguage support
Simple API (very similar)
DataSet API
ML Libraries
BATCH PROCESSING
Points of attention
Apache Spark
Heavy resource footprint
Prone to timeouts or memory errors
Hard to fine-tune to get the right performance
Apache Flink
Fewer configuration problems
Better handling of resources
Not a big community
Not many addons
DATABASES
Before committing to a database
01 Think about how you need to access the data
02 Read 1 again
03 Seriously, read 1 again
Select a database based on your needs, e.g.:
Hardcore read/ write workload and not much advanced querying: HBase
Heavy read/ write workload and minimally dynamic querying: Cassandra
Advanced text querying and a lighter read/ write workload: something else
BONUS ROUND: MANAGEMENT
Apache Ambari
Provision a Hadoop Cluster
Manage a Hadoop Cluster
Monitor a Hadoop Cluster
Ambari uses Hadoop ecosystem distributions such as:
Hortonworks
Cloudera
ARCHITECTURE REVISITED
SENSOR
SENSOR
SENSOR
DATA SINK
APACHE KAFKA
APACHE STORM
APACHE HDFS
APACHE SPARK
APACHE HBASE/ CASSANDRA
APIs
PORTALS
CLOUD BASED ARCHITECTURES
Pros
Less configuration overhead
Less maintenance overhead
Easily scalable
Reliable
Return focus back to data and product
Cons
$$$$$$$$$$
CLOUD BASED ARCHITECTURES
SENSOR
SENSOR
SENSOR
DATA SINK
GOOGLE PUBSUB
GOOGLE DATAFLOW
GOOGLE CLOUD STORAGE
GOOGLE DATAPROC
GOOGLE BIGTABLE/ BIGQUERY
APIs
PORTALS
CLOUD BASED ARCHITECTURES
SENSOR
SENSOR
SENSOR
DATA SINK
AMAZON SIMPLE QUEUE SERVICE
AMAZON DATA PIPELINE
AMAZON S3
AMAZON ELASTIC MAPREDUCE
AMAZON DYNAMODB/ REDSHIFT
APIs
PORTALS
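The three architecture slides can be summarised as one lookup table. The sketch below is a reading of the diagrams, not something stated in the talk: the role names and the `stack_for` helper are hypothetical, and each product is assigned to a role by the position it occupies relative to the simple architecture overview.

```python
# Role -> product per deployment option, as read from the diagram slides.
STACKS = {
    "message_queue": {
        "on_prem": "Apache Kafka",
        "google": "Google PubSub",
        "amazon": "Amazon Simple Queue Service",
    },
    "stream_processing": {
        "on_prem": "Apache Storm",
        "google": "Google Dataflow",
        "amazon": "Amazon Data Pipeline",
    },
    "file_storage": {
        "on_prem": "Apache HDFS",
        "google": "Google Cloud Storage",
        "amazon": "Amazon S3",
    },
    "batch_processing": {
        "on_prem": "Apache Spark",
        "google": "Google Dataproc",
        "amazon": "Amazon Elastic MapReduce",
    },
    "databases": {
        "on_prem": "Apache HBase / Cassandra",
        "google": "Google Bigtable / BigQuery",
        "amazon": "Amazon DynamoDB / Redshift",
    },
}


def stack_for(provider):
    """Return the role -> product mapping for one deployment option."""
    return {role: options[provider] for role, options in STACKS.items()}


for role, product in stack_for("google").items():
    print(f"{role}: {product}")
```

Laying the options side by side makes the slide's point visible: the roles stay the same across all three diagrams, and only the product filling each role changes.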
BE READY. BE SAFE. BE SECURE.
BinaryEdge AG
Freigutstrasse 40,
8001 Zurich
Switzerland
info@binaryedge.io
www.binaryedge.io
+ 41 78 713 40 00

Pixels Camp 2017 - Stories from the trenches of building a data architecture

  • 1. BinaryEdge.io Be Ready. Be Safe. Be Secure. Florentino Bexiga Stories from the Trenches of Building a Data Architecture Data Engineer/ Platform Developer fb@binaryedge.io
  • 2. WHO WE ARE AND WHAT WE DO VNC RDP Files People Social Company registration internal external Phone Email Linked urls BGP AS Whois AS membership AS peer List of IPs Shared infrastructure Co-hosted sites Contact Geolocation Office locations Social networks Phone portscan dns Screenshots Web Services http https Users Apps Files Banners Image Classifier Vulnerabilities DATA POINTS metadata Photos Family&friends Behaviour Likes Topics Search News Forums Sub-reddits Domains AXFR MX records Webserver Framework Headers Cookies Certificate Configuration Authorities Entities OCR SW ip address url address SMB torrents peers torrent name category source hashes of files
  • 3. AGENDA 01 02 THE NEED OF A DATA ARCHITECTURE 03 SIMPLE ARCHITECTURE OVERVIEW 04 05 MESSAGE QUEUE STREAM PROCESSING 06 BATCH PROCESSING 07 DATABASES 08 BONUS ROUND: MANAGEMENT 09 ARCHITECTURE REVISITED 10 CLOUD-BASED ARCHITECTURES THE BASIC SURVIVAL KIT
  • 4. THE NEED OF A DATA ARCHITECTURE Rules before building a data architecture Typical list of needs Think about what you need to do with the data There are no more rules Gather a lot of data coming from different places Process that data in (close to) real-time Make data available in multiple formats Provide ways to easily process that data
  • 5. SIMPLE ARCHITECTURE OVERVIEW SENSOR STREAM PROCESSING SENSOR SENSOR DATA SINK MESSAGE QUEUE FILE STORAGE BATCH PROCESSING DATABASES APIs PORTALS
  • 6. THE BASIC SURVIVAL KIT Apache Hadoop MapReduce HDFS YARN Why Apache Hadoop? Interoperability with many other tools Great community Gets the job done
  • 7. THE BASIC SURVIVAL KIT Points of attention YARN Available resources per node for processing Timeouts Heap, heap... HDFS Same as above Primary/ Secondary nodes - high availability
  • 8. MESSAGE QUEUE Apache Kafka Originally developed by LinkedIn Massively scalable publish/ subscribe message queue High throughput Low latency Concepts Topics Consumers Consumer groups Partitions Replicas
  • 9. MESSAGE QUEUE Points of attention Timeouts Message sizes Retention logs vs cleanup interval !!!! Also, do not, for the love of god, simply delete all the subdirectories in your "kafka-logs" directory, you will cry.
  • 11. STREAM PROCESSING The good parts Very simple programming model and APIs Multilanguage support DataFrame API ML Libraries Wide community Wide range of addons Points of attention Mini-batch processing, not real stream Heavy resource footprint Prone to timeouts or memory errors Hard to fine-tune to get the right performance
  • 12. STREAM PROCESSING The good parts Stream processing Multilanguage support Points of attention Slightly more complex programming model Some support for other languages Works without much configuration effort Low resources configuration Wide community Lots of connectors and addons Great performance, like, "The Flash" great
  • 13. STREAM PROCESSING The good parts Stream processing Multilanguage support Buuuuut..... Does not have a wide community Does not have that many connectors and addons Simple API (very similar to Spark) Dataset API ML Libraries Good handling of resources Low configuration/ optimisation overhead
  • 14. BATCH PROCESSING Apache Spark Multilanguage support Simple API DataFrame API ML Libraries Wide community Wide range of addons Apache Flink The good parts Multilanguage support Simple API (very similar) DataSet API ML Libraries
  • 15. BATCH PROCESSING Points of attention Apache Spark Heavy resource footprint Prone to timeouts or memory errors Hard to fine-tune to get the right performance Apache Flink Fewer configuration problems Better handling of resources Not a big community Not many addons
  • 16. DATABASES Before committing to a database 01 Think about how you need to access the data 02 Read 1 again 03 Seriously, read 1 again Select a database, based on your needs, i.e.: Hardcore read/ write workload and not much advanced querying: HBase Heavy read/ write workload and minimally dynamic querying: Cassandra Advanced text querying and not such heavy read/ write workload: something else
  • 17. BONUS ROUND: MANAGEMENT Apache Ambari Provision a Hadoop Cluster Manage a Hadoop Cluster Monitor a Hadoop Cluster Ambari uses Hadoop ecosystem distributions such as: Hortonworks Cloudera
  • 19. CLOUD BASED ARCHITECTURES Pros Less configuration overhead Less maintenance overhead Easily scalable Reliable Return focus back to data and product Cons $$$$$$$$$$
  • 20. CLOUD BASED ARCHITECTURES SENSOR GOOGLE DATAFLOW SENSOR SENSOR DATA SINK GOOGLE PUBSUB GOOGLE CLOUD STORAGE GOOGLE DATAPROC APIs PORTALS GOOGLE BIGTABLE/ BIGQUERY
  • 21. CLOUD BASED ARCHITECTURES SENSOR AMAZON DATA PIPELINE SENSOR SENSOR DATA SINK AMAZON SIMPLE QUEUE SERVICE AMAZON S3 AMAZON ELASTIC MAPREDUCE APIs PORTALS AMAZON DYNAMODB/ REDSHIFT
  • 22. BE READY. BE SAFE. BE SECURE. BinaryEdge AG Freigutstrasse 40, 8001 Zurich Switzerland info@binaryedge.io www.binaryedge.io + 41 78 713 40 00 CONTINGENCY THREAT SAFE IRRELEVANT
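
As a rough illustration of the Kafka concepts listed on slide 8 (topics, partitions, consumers, consumer groups), here is a minimal in-memory sketch in Python. It is not Kafka client code and does not come from the talk: the `Topic` class, `partition_for`, `assign_partitions`, the topic name and the worker names are all hypothetical, and CRC32 is only a stand-in for whatever hash a real client uses. It shows two ideas: messages with the same key land in the same partition, and the partitions of a topic are split among the consumers of one group.

```python
from zlib import crc32


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition with a stable hash (CRC32 is an assumption)."""
    return crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(consumers, num_partitions):
    """Spread partitions round-robin over the consumers of a single group."""
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for partition in range(num_partitions):
        owner = ordered[partition % len(ordered)]
        assignment[owner].append(partition)
    return assignment


class Topic:
    """Toy topic: a named list of append-only partitions."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key: str, value) -> int:
        """Append the message to the partition chosen by its key; return that partition."""
        partition = partition_for(key, len(self.partitions))
        self.partitions[partition].append((key, value))
        return partition


if __name__ == "__main__":
    topic = Topic("scan-results", num_partitions=4)  # hypothetical topic name
    for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
        print(ip, "-> partition", topic.publish(ip, {"port": 443}))

    # Each partition is read by exactly one consumer within the group.
    print(assign_partitions(["worker-a", "worker-b"], num_partitions=4))
```

With more consumers in a group than partitions in the topic, the extra consumers in this sketch receive an empty assignment, which is one reason the partition count matters when sizing a topic.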