Dnscluster @ DevOps Krakow 2013
1. DevOps Krakow #Meet 1
DNS CLUSTER
Automated Internal DNS Service with Amazon VPC integration
Sławomir Skowron
System Engineer (DevOps Team)
slawomir.skowron@getbase.com
2013
3. WHAT IS DNS?
• The Domain Name System is a hierarchical, distributed naming system
• Essentially a name service for TCP/IP networks
• Provides a mechanism for resolving names to IP addresses
• Adds a tree-based domain name space
• The name space is subdivided into zones, starting with the root zone
• One of the first NoSQL key-value databases
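A minimal Python sketch of the tree-based name space: it lists the domains on the path from the root down to a given name. The function name `zone_chain` and the example name are illustrative assumptions, not from the talk. Real zone boundaries depend on where delegation happens, so these are candidate domains rather than actual zones.

```python
def zone_chain(name: str) -> list[str]:
    """Return the domains from the root down to `name`, fully qualified."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]  # the root of the tree
    # Walk the labels right to left: "com.", then "example.com.", and so on.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(zone_chain("www.example.com"))
# ['.', 'com.', 'example.com.', 'www.example.com.']
```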
6. DOMAIN NAME SERVERS
Software on servers that stores, manages, and serves information about its own
part of the domain name space, called a zone
Two types of servers: master and slave
7. DNS QUERIES
Two types of external queries: Recursive and Iterative
• Recursive - the server queries other servers on the client's behalf until it gets an answer
• Iterative - the server answers from local data (cache, local zone) or returns a referral saying where to look next
Cached Queries - the DNS cache improves latency and throughput
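A toy Python sketch of the difference between the two query types, plus caching. The server names, the `SERVERS` table, and the record are made up for illustration, and no real DNS traffic is involved. `iterative_query` does one step (answer or referral); `RecursiveResolver` follows referrals for the client and caches the result. Real caches also honor record TTLs, which this sketch leaves out.

```python
# Toy data: each "server" either knows the answer or refers to another server.
SERVERS = {
    "root": {"referrals": {"internal.": "ns-internal"}},
    "ns-internal": {"answers": {"db1.internal.": "10.0.0.12"}},
}
MAX_HOPS = 10  # guard against referral loops in the toy data


def iterative_query(server, name):
    """One iterative step: answer from local data, or say where to look next."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return ("answer", data["answers"][name])
    for suffix, next_server in data.get("referrals", {}).items():
        if name.endswith(suffix):
            return ("referral", next_server)
    return ("nxdomain", None)


class RecursiveResolver:
    """Follows referrals on behalf of the client and caches what it learns."""

    def __init__(self, start="root"):
        self.start = start
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, name):
        if name in self.cache:  # cache hit: no upstream traffic at all
            return self.cache[name]
        server = self.start
        for _ in range(MAX_HOPS):
            self.upstream_queries += 1
            kind, value = iterative_query(server, name)
            if kind == "referral":
                server = value
                continue
            self.cache[name] = value  # an address, or None if the name is unknown
            return value
        raise RuntimeError("too many referrals")
```

The first `resolve("db1.internal.")` costs two upstream queries (root, then ns-internal); a repeat of the same query costs none, which is the latency and throughput gain the slide refers to.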
10. AMAZON EC2 DNS (VPC) PROBLEMS
• Route 53 does not (right now) support internal DNS domains
• Amazon VPC internal DNS supports only ec2.internal domains
• Amazon VPC DHCP by default supports only AWS DNS
12. USE CASE
Our own DNS Service
• Available only in the LAN and through VPN
• Only A and SRV records - infrastructure DNS
• Resolve locally and forward if the name does not exist locally
• No zone transfers, no slaves, no masters
• Updates are simple, secure and fast
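The "resolve locally, forward otherwise" rule can be sketched in a few lines of Python. This is only an illustration of the lookup order, not the service itself: the zone contents, the names, and the `forward` callable are hypothetical.

```python
# Hypothetical local zone holding only A and SRV records, as in the use case.
LOCAL_ZONE = {
    ("app1.example.internal.", "A"): ["10.0.1.10"],
    ("_http._tcp.example.internal.", "SRV"): ["0 0 8080 app1.example.internal."],
}


def resolve(name, rtype, forward):
    """Answer from the local zone; otherwise hand the query to `forward`."""
    key = (name.lower(), rtype.upper())
    if key in LOCAL_ZONE:
        return LOCAL_ZONE[key]
    # Not ours: pass the original query to the upstream resolver.
    return forward(name, rtype)
```

Here `forward` stands in for whatever upstream resolver is configured; it is called only when the local zone has no matching record.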
13. SOLUTION
Our own DNS Service
• Clustering for high availability and performance
• Integration with our VPC’s DHCP
• Availability in every Amazon region
• Caching
• Fully automated and integrated with instance provisioning
• Support for our name space
17. SOLUTION
• Puppet 3 as the configuration management solution
• Puppet Hiera, PuppetDB integration
• TheForeman - http://theforeman.org/
• Foreman integrates with BIND
• Unbound as DNSCluster core - local zones, forwarder, cache
• Git for storing and versioning zones
19. WHAT’S WRONG WITH PUPPET?
• Puppet is slow
• The development workflow with Puppet is hard and slow
• Hard to integrate on machines that were already running before Puppet
• PuppetDB is OK, but it is not scalable enough
• Everything goes through Foreman and BIND in our case
22. ANSIBLE
• Minimal setup: Python + libs, pip install ansible
• Uses existing auth (root, sudo) over SSH as the default transport, or accelerated mode
• Ad-hoc operations built in
• Async, sync and parallel operations
• Predictable, easy to extend (plugins, connectors, filters, modules)
• Powerful templates in Jinja2
• Output in JSON
• Configuration in YAML
26. SOLUTION
• Ansible
• Unbound as DNSCluster core - local zones, forwarder, cache
• Git for storing and versioning zones
• Amazon VPC DHCP integration - under development
• ETCD integration - under development
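A rough sketch of how zone data kept in Git could be turned into an Unbound configuration fragment. The talk uses Ansible with Jinja2 templates for this; plain Python string formatting stands in for the template here. The zone name, records, and forwarder address are hypothetical, and the `transparent` zone type is an assumption chosen to match "resolve locally, forward otherwise". `local-zone`, `local-data`, `forward-zone`, and `forward-addr` are standard unbound.conf options.

```python
def render_unbound(zone, records, forwarders):
    """Render local zone data and forwarders as an unbound.conf fragment."""
    lines = ["server:", f'    local-zone: "{zone}" transparent']
    for name, rtype, value in records:
        # One local-data line per record, written as a DNS resource record.
        lines.append(f'    local-data: "{name} IN {rtype} {value}"')
    # Everything not answered locally goes to the forwarders.
    lines.append("forward-zone:")
    lines.append('    name: "."')
    for addr in forwarders:
        lines.append(f"    forward-addr: {addr}")
    return "\n".join(lines) + "\n"


# Hypothetical records, as they might be loaded from a file in the Git repo.
records = [
    ("app1.example.internal.", "A", "10.0.1.10"),
    ("_http._tcp.example.internal.", "SRV", "0 0 8080 app1.example.internal."),
]
print(render_unbound("example.internal.", records, ["10.0.0.2"]))
```

Because the output is plain text generated from versioned input, a change to a zone is a Git commit followed by a re-render and a reload on each node.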
28. IMPROVEMENT
KISS as core thinking
• Simple workflow
• Faster development
• Fast deploys with low memory/CPU consumption
• No central DB
• All data is stored in 3 places and can be restored from running machines
• Works in a push or pull workflow
• Integrated with VPC DHCP when a new DNSCluster is created
32. DNSCLUSTER PERFORMANCE
[Chart: Queries per second / Concurrency. Y axis: QPS, 0 to 2500. X axis: Concurrency, 1 to 1000.]
Series:
• AWS DNS
• DNSCLUSTER 1 node (1 cpu core, ec2.x1.small)
• UNBOUND local cache (forwarders: 3 dnscluster nodes, 3 x ec2.x1.small), 1st pass, 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes, 3 x ec2.x1.small), 2nd pass, from cache, 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes, 3 x ec2.x1.small), 2nd pass, from cache, 2 unbound threads
33. DNSCLUSTER PERFORMANCE
[Chart: Latency / Concurrency. Y axis: Latency [seconds], 0 to 0.12. X axis: Concurrency, 1 to 1000.]
Series:
• AWS DNS
• DNSCLUSTER 1 node (1 cpu core, ec2.x1.small)
• UNBOUND local cache (forwarders: 3 dnscluster nodes, 3 x ec2.x1.small), 1st pass, 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes, 3 x ec2.x1.small), 2nd pass, from cache, 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes, 3 x ec2.x1.small), 2nd pass, from cache, 2 unbound threads
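The slides do not say which tool produced these measurements. As a rough illustration of what "QPS at a given concurrency" and "latency at a given concurrency" mean, here is a small Python harness, assuming any callable `resolve(name)` that performs one lookup. The function names are illustrative, and the numbers it produces are not comparable to the charts.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(resolve, name):
    """Run one lookup and return how long it took, in seconds."""
    start = time.perf_counter()
    resolve(name)
    return time.perf_counter() - start


def measure(resolve, names, concurrency):
    """Run resolve(name) for every name using `concurrency` worker threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda n: timed_call(resolve, n), names))
    elapsed = time.perf_counter() - started
    return {
        "queries": len(latencies),
        "qps": len(latencies) / elapsed,  # throughput over the whole run
        "mean_latency_s": sum(latencies) / len(latencies),
    }
```

For a quick try against the system resolver, something like `measure(socket.gethostbyname, ["example.com"] * 100, 10)` would work (after `import socket`). Running the same list twice mirrors the "1st pass" and "2nd pass, from cache" series.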
34. SOON / NEXT TIME?
Ansible Universal Template Flow
Created @ Base for simple, consistent creation and destruction of instances
Monitoring and Alerting
Second element of our auto scaling