Ironfan
 Your Foundation for Flexible Big Data Infrastructure

                                    Infochimps brings the power of Big
                                    Data infrastructure to your fingertips.

                                    Traditional systems configuration is a time-consuming process,
                                    vulnerable to human error. Infochimps leverages the power and
                                    simplicity of Ironfan as its provisioning and deployment layer,
                                    allowing users to easily launch and orchestrate repeatable
                                    infrastructure.

                                    The Infochimps Platform reduces cycle time to provision a server
                                    from days or weeks to minutes, enabling simple scaling and rapid
                                    system evolution, dramatically lowering the cost of starting new
                                    data analysis jobs. Infochimps even enables continual monitoring
                                    of your system through automated machine provisioning. Spend
                                    your time finding insights, not building infrastructure.

Benefits
With Ironfan, you can expect:

•	 Reduced cycle time.
   Provision servers in minutes, not days.

•	 Improved visibility.
   Increased transparency means faster problem solving and sharing.

•	 Lower support costs.
   Experience fewer reactive support issues.

•	 Lower network costs.
   Only use the nodes you need for the job you are running.

•	 Lower risk, more agility.
   Deploy and manage a big data stack with minimal resources.




© 2012 Infochimps, Inc. All rights reserved.
Why Infochimps?
                                    Specialized. Ironfan, Infochimps’ systems configuration tool,
                                    leverages three years of internal development and external
                                    contributions to its code base. This specialized experience helps
                                    organizations reduce the initial adoption cost and experimentation
                                    necessary to produce well-tuned clusters.


                                    Integrated. Infochimps’ tool development and Big Data expertise
                                    mean our team understands and is equipped with the tools to
                                    successfully navigate and troubleshoot the entire Big Data
                                    ecosystem of an organization.


                                    Flexible Cost. Infochimps’ Ironfan lets you take advantage of
                                    IaaS (Infrastructure as a Service) providers such as Amazon Web
                                    Services. This allows for all infrastructure costs to be treated as
                                    operating expenses (use what you need) and not capital
                                    expenditures (pay whether you need it or not). Switching from
                                    CapEx to OpEx can dramatically lower the funding barrier to
                                    adopting Big Data internally in an enterprise.


                                    Context. Perhaps best of all, the Infochimps Platform, enabled by
                                    Ironfan, can be used to provide context to an enterprise’s
                                    internal data, whether through public opinion mining (via social
                                    networks), geo-located information, word corpus training for
                                    machine learning, or other commonly useful (but difficult to
                                    accumulate) data. All of these capabilities combine to make
                                    Infochimps a great choice for providing Big Data services to the
                                    budget- and process-conscious enterprise customer.




Understanding the Tools
                                    What is Chef? Chef is a configuration management system,
                                    designed to be a general purpose tool for building repeatable
                                    infrastructure. It uses a Ruby DSL (Domain Specific Language)
                                    allowing you to write out specifications (as cookbooks, roles, etc.)
                                    for infrastructure that is fully composable.


                                    Chef can be used in a number of ways, allowing it to fit into a
                                    variety of existing architectures. Its flexibility, however, means that
                                    it cannot as easily build higher-level abstractions on top of the
                                    architecture it provides.


                                    What is Ironfan? Ironfan, the foundation of The Infochimps
                                    Platform, is a systems provisioning and deployment tool. Ironfan
                                    automates not only machine configuration, but entire systems
                                    configuration to enable the entire Big Data stack, including tools
                                    for data ingestion, scraping, storage, computation, and
                                    monitoring.


                                    Ironfan builds on Chef, but is opinionated about its
                                    architecture, which allows broader integration between
                                    components. It assumes a source repository, a central Chef
                                    Server, and a modern POSIX-compliant operating system for a
                                    base image. Currently, it works best with Git, Amazon Web
                                    Services and Ubuntu 11.04, with exploration into other
                                    virtualization platforms (Vagrant, etc.) and operating systems
                                    (CentOS, FreeBSD, etc.) ongoing, both inside and outside of
                                    Infochimps.




Benefits for the Entire Team
                                    For Systems Administrators, Ironfan removes the guesswork
                                    from building systems, because it reduces the cycle time to build
                                    a server from days or weeks to minutes. Instead of
                                    following long lists of manual processes, a system administrator
                                    makes changes to their Ironfan homebase, and then ushers those
                                    changes into the appropriate systems with the Chef knife and
                                    client programs. This enables rapid iterative development, a
                                    practice of Agile programming shops for years. Up until recently,
                                    this kind of fast-paced development was unavailable to the
                                    average systems administrator. Ironfan also enables repeatable
                                    architecture, another powerful tool. Now, replacing malfunctioning
                                    components with completely new ones, built from scratch and
                                    loaded with data from live exports or backups, is a simple, reliable,
                                    and rapid process, instead of a last-ditch solution. Finally, Ironfan
                                    allows you to make infrastructure inevitable: you can write
                                    definitions that automatically attach new servers to your
                                    existing architecture, instead of manually wiring them into
                                    central services like monitoring, log ingestion, or orchestration,
                                    and so avoid the attendant risk of human error.


                                    For Data Scientists or Business Intelligence Teams,
                                    Ironfan can currently build a Hadoop cluster from scratch in less
                                    than an hour with just a small handful of commands, and expand
                                    it in minutes with a single command. Other large scale cluster
                                    technologies (HBase, ElasticSearch, Redis, Flume, etc.) are just
                                    as simple to build. This dramatically reduces the cost of
                                    starting new data analysis jobs, allowing for greater experimentation.
                                    Because the underlying architecture is rented by machine-hour,
                                    jobs with predictable costs in machine-hours can be optimized for
                                    rapid execution without large increases in cost. Should the
                                    underlying processing time prove greater than anticipated,
                                    clusters can be scaled up while in use, to improve the chances of
                                    hitting deadlines.
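
The machine-hour argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: the 300 machine-hour job and the $0.50 hourly rate are made-up numbers, not Infochimps or AWS figures, and it assumes the work divides perfectly across nodes and ignores per-hour billing granularity, which real jobs only approximate.

```python
def wall_clock_hours(machine_hours: float, nodes: int) -> float:
    """Hours a job takes if its work divides evenly across `nodes` machines."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return machine_hours / nodes


def rental_cost(machine_hours: float, hourly_rate: float) -> float:
    """Rental cost depends only on the total machine-hours consumed."""
    return machine_hours * hourly_rate


# Hypothetical job size and hourly rate, chosen only for illustration.
JOB_MACHINE_HOURS = 300
HOURLY_RATE = 0.50

for nodes in (10, 30, 60):
    hours = wall_clock_hours(JOB_MACHINE_HOURS, nodes)
    cost = rental_cost(hours * nodes, HOURLY_RATE)
    print(f"{nodes:>3} nodes: {hours:5.1f} h wall clock, ${cost:.2f}")
```

Under these assumptions every cluster size costs the same; only the wall-clock time changes, which is why a larger cluster can buy a faster result without a large increase in cost.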


                                    For Systems Architects or Core Infrastructure Team,
                                    Ironfan allows you to build the repeatable architecture
                                    recommended by ITIL (Information Technology Infrastructure
                                    Library) for reliable IT infrastructure. It becomes simpler to scale
                                    or evolve systems rapidly. Ironfan takes the grunt-work out of
                                    distributing those changes, allowing architects to spend more of
                                    their focus on design details, instead of implementation details.
                                    Since everything is stored in source control, both architects and
                                    administrators can make changes to the infrastructure, confident
                                    that they are not obliterating important history. Also, because the
                                    same code can be used to create development, staging, and
                                    production environments, the usual barriers to deployment
                                    caused by differences in the underlying architectures and
                                    deployment mechanisms are significantly reduced.


                                    Because starting new instances with Ironfan is trivial, and paid for
                                    by the hour, capacity can be managed as OpEx rather than
                                    CapEx. This also means that problems with huge capacity spikes
                                    can be considered; turning up a thousand nodes for three days,
                                    then turning them off again, is no longer a laughable fantasy.
                                    Migrations also become significantly easier, as new infrastructure
                                    can be spun up in parallel with the old, without a long term
                                    increase in expense.




Case Study
                                    How Infochimps Uses Ironfan to Create TrstRank

What is TrstRank?
TrstRank is an Infochimps-developed dataset and API that provides
Twitter influence metrics with the click of a button! TrstRank
measures Twitter user reputation, importance and influence in a far
more robust way than counting the number of followers. It is a
sophisticated measure of a user’s relative importance within the
entire Twitter network.

                                    Since the launch of Twitter, people have clamored for ways to
                                    access and “slice and dice” its data. One of the most common
                                    ways people use the Twitter data corpus is to measure a person’s
                                    importance and influence. Klout is an example of one product that
                                    specializes in this kind of “influencer” data.

                                    A few years ago, we created our own special version of Klout,
                                    one that took advantage of our vast historical record of the
                                    relationships to create an accurate number describing how
                                    influential a Twitter user is. It’s called TrstRank and it ranks a user
                                    on a scale of 1-10, with 10 being the most influential you
                                    can get.

                                    Coming up with a number like TrstRank is no small task.
                                    Setting aside the issues of getting the data, there are some very
                                    real Big Data problems surrounding the product that require
                                    special tools for getting it done efficiently. And when you’re a
                                    bootstrapped startup, like we were at the time, you have to be
                                    resourceful if you are going to get by.

                                    The biggest issue with pursuing a new data product like TrstRank
                                    is the same one any company faces when it decides to venture
                                    into new territory: the high risk of wasting time and money.

                                    Wasting Time
                                    One of the first problems you run into as a small team trying your
                                    hand at data science is the excess time spent on server and
                                    machine configuration, instead of focusing on modeling, algorithms,
                                    and manipulating the data.


                                    Ramp-up time for even the first phase of a project like TrstRank
                                    can be a whole day or more of engineering time.



                                    Wasting Money
                                    From our earliest days Infochimps has been based on Amazon
                                    Web Services’ (AWS) cloud, taking advantage of the flexibility
                                    and scalability it provides. With AWS, you pay for what you use,
                                    so you are always inclined to eliminate waste. In our early days
                                    we even created decision trees for whether to shut down a cluster,
                                    depending on how many hours it would be up but not used.


                                    This can set conflicting goals for the data scientist who would
                                    prefer to leave a cluster up overnight, even if it’s unused, so they
                                    don’t have to deal with setting everything up again the next day!
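
The shutdown trade-off described above can be sketched as a single comparison. This is not the decision tree Infochimps actually used, which the text does not spell out; it is a minimal illustration, and every number in it (idle hours, node count, hourly rate, engineer cost, rebuild times) is a hypothetical assumption.

```python
def should_shut_down(idle_hours: float, nodes: int, hourly_rate: float,
                     rebuild_cost: float) -> bool:
    """Return True when paying for idle machines costs more than rebuilding."""
    idle_cost = idle_hours * nodes * hourly_rate
    return idle_cost > rebuild_cost


# Hypothetical overnight scenario: a 10-node cluster idle for 14 hours.
IDLE_HOURS = 14
NODES = 10
HOURLY_RATE = 0.50        # assumed dollars per machine-hour
ENGINEER_RATE = 50.0      # assumed dollars per engineer-hour

# Assumed rebuild effort: 4 engineer-hours by hand, 15 minutes when automated.
manual_rebuild = 4 * ENGINEER_RATE
automated_rebuild = 0.25 * ENGINEER_RATE

print(should_shut_down(IDLE_HOURS, NODES, HOURLY_RATE, manual_rebuild))
print(should_shut_down(IDLE_HOURS, NODES, HOURLY_RATE, automated_rebuild))
```

With these made-up figures the idle cluster costs $70 overnight, so leaving it up is the cheaper choice when rebuilding is slow and manual, and shutting it down wins once rebuilding is cheap.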


                                    Enter Ironfan
                                    We created Ironfan to solve our own problems of how to save
                                    time and money during our data science operations in the cloud.
                                    When we came up with the idea for TrstRank, it was a simple
                                    operation to spin up a cluster for early analysis and
                                    experimentation. We could validate some of our algorithms and ideas on a
                                    simple cluster before moving to something more heavyweight.


                                    Ironfan and TrstRank, Now
                                    Ironfan has continued as a key tool for our monthly TrstRank
                                    operation. We continue to scrape Twitter for follower information,
                                    and with the updated data every month we crunch the TrstRank
                                    numbers again.


                                    With Ironfan, we’re able to run a multi-step operation on
                                    8 billion tweets on clusters of 30 m1.xlarge EC2 machines,
                                    while only running the resources we need when they’re needed.
                                    TrstRank takes 72 hours to complete, with resources being paid
                                    for commensurately. Without Ironfan, we’d be looking at 2-3x the
                                    costs in time and money!
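
As a rough check on the scale of that run, the figures above translate into machine-hours as follows. The sketch assumes a single 30-machine cluster running for the full 72 hours and applies the "2-3x" estimate directly to machine-hours; both are simplifying assumptions, and no dollar price is assumed.

```python
NODES = 30       # m1.xlarge machines, from the case study
RUN_HOURS = 72   # duration of one TrstRank run, from the case study

# Total rented capacity for one run, assuming all nodes run the whole time.
machine_hours = NODES * RUN_HOURS

# Assumption: the "2-3x" estimate scales machine-hours directly.
low, high = 2 * machine_hours, 3 * machine_hours

print(f"With Ironfan:    {machine_hours} machine-hours per run")
print(f"Without Ironfan: {low} to {high} machine-hours per run")
```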


About Infochimps
                                    Our mission is to make the world’s data more accessible.
                                    Infochimps helps companies understand their data. We provide
                                    tools and services that connect their internal data, leverage the
                                    power of cloud computing and new technologies such as Hadoop,
                                    and provide a wealth of external datasets, which organizations
                                    can connect to their own data.


                                    Contact Us
                                    Infochimps, Inc.
                                    1214 W 6th St. Suite 202
                                    Austin, TX 78703


                                    1-855-DATA-FUN (1-855-328-2386)


                                    www.infochimps.com
                                    info@infochimps.com


                                    Twitter: @infochimps




                      Get a free Big Data consultation
                          Let’s talk Big Data in the enterprise!

     Get a free conference with the leading big data experts regarding your enterprise big data
     project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop
     about your project objectives, design, infrastructure, tools, etc. Find out how other
     companies are solving similar problems. Learn best practices and get recommendations, free.





Apprenda - Overview of the Apprenda PlatformApprenda - Overview of the Apprenda Platform
Apprenda - Overview of the Apprenda PlatformApprenda
 
Conduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminarConduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminarSyed Shaaf
 
Where can you use serverless?  How does it relate to APIs, integration and mi...
Where can you use serverless?  How does it relate to APIs, integration and mi...Where can you use serverless?  How does it relate to APIs, integration and mi...
Where can you use serverless?  How does it relate to APIs, integration and mi...Kim Clark
 
CLOUD ARCHITECTURE AND SERVICES.pptx
CLOUD ARCHITECTURE AND SERVICES.pptxCLOUD ARCHITECTURE AND SERVICES.pptx
CLOUD ARCHITECTURE AND SERVICES.pptxDr Geetha Mohan
 
Softchoice Webinar: IBM PureSystems launch
 Softchoice Webinar: IBM PureSystems launch Softchoice Webinar: IBM PureSystems launch
Softchoice Webinar: IBM PureSystems launchSoftchoice Corporation
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 

Similar to Ironfan: Your Foundation for Flexible Big Data Infrastructure (20)

50120140507002
5012014050700250120140507002
50120140507002
 
50120140507002
5012014050700250120140507002
50120140507002
 
50120140507002
5012014050700250120140507002
50120140507002
 
Ansible: Simple yet powerful IT automation tool
Ansible: Simple yet powerful IT automation toolAnsible: Simple yet powerful IT automation tool
Ansible: Simple yet powerful IT automation tool
 
Monitoring IAAS & PAAS Solutions
Monitoring IAAS & PAAS SolutionsMonitoring IAAS & PAAS Solutions
Monitoring IAAS & PAAS Solutions
 
Enterprise virtual machine on IBM Cloud
Enterprise virtual machine on IBM CloudEnterprise virtual machine on IBM Cloud
Enterprise virtual machine on IBM Cloud
 
Intro cloud-1
Intro cloud-1Intro cloud-1
Intro cloud-1
 
Intro cloud-1
Intro cloud-1Intro cloud-1
Intro cloud-1
 
Solu technology partners cloud computing
Solu technology partners   cloud computingSolu technology partners   cloud computing
Solu technology partners cloud computing
 
Apprenda - Overview of the Apprenda Platform
Apprenda - Overview of the Apprenda PlatformApprenda - Overview of the Apprenda Platform
Apprenda - Overview of the Apprenda Platform
 
Conduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminarConduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminar
 
SmartOS
SmartOSSmartOS
SmartOS
 
Where can you use serverless?  How does it relate to APIs, integration and mi...
Where can you use serverless?  How does it relate to APIs, integration and mi...Where can you use serverless?  How does it relate to APIs, integration and mi...
Where can you use serverless?  How does it relate to APIs, integration and mi...
 
CLOUD ARCHITECTURE AND SERVICES.pptx
CLOUD ARCHITECTURE AND SERVICES.pptxCLOUD ARCHITECTURE AND SERVICES.pptx
CLOUD ARCHITECTURE AND SERVICES.pptx
 
Softchoice Webinar: IBM PureSystems launch
 Softchoice Webinar: IBM PureSystems launch Softchoice Webinar: IBM PureSystems launch
Softchoice Webinar: IBM PureSystems launch
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Infrastructure as Code.docx
Infrastructure as Code.docxInfrastructure as Code.docx
Infrastructure as Code.docx
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 

More from Infochimps, a CSC Big Data Business (9)

[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
 
Report: CIOs & Big Data
Report: CIOs & Big DataReport: CIOs & Big Data
Report: CIOs & Big Data
 
Infographic: CIOs & Big Data
Infographic: CIOs & Big DataInfographic: CIOs & Big Data
Infographic: CIOs & Big Data
 
5 Big Data Use Cases for 2013
5 Big Data Use Cases for 20135 Big Data Use Cases for 2013
5 Big Data Use Cases for 2013
 
[Webinar] Top Strategies for Successful Big Data Projects
[Webinar] Top Strategies for Successful Big Data Projects[Webinar] Top Strategies for Successful Big Data Projects
[Webinar] Top Strategies for Successful Big Data Projects
 
[Webinar] High Speed Retail Analytics
[Webinar] High Speed Retail Analytics[Webinar] High Speed Retail Analytics
[Webinar] High Speed Retail Analytics
 
The Other Way of Doing Big Data
The Other Way of Doing Big DataThe Other Way of Doing Big Data
The Other Way of Doing Big Data
 
Real-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the AgencyReal-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the Agency
 
The Power of Elasticsearch
The Power of ElasticsearchThe Power of Elasticsearch
The Power of Elasticsearch
 

Recently uploaded

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Recently uploaded (20)

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Ironfan: Your Foundation for Flexible Big Data Infrastructure

  • 1. Ironfan: Your Foundation for Flexible Big Data Infrastructure
    Infochimps brings the power of Big Data infrastructure to your fingertips.
    Traditional systems configuration is a time-consuming process, vulnerable to human error. Infochimps leverages the power and simplicity of Ironfan as its provisioning and deployment layer, allowing users to easily launch and orchestrate repeatable infrastructure.
    The Infochimps Platform reduces cycle time to provision a server from days or weeks to minutes, enabling simple scaling and rapid system evolution, dramatically lowering the cost of starting new data analysis jobs. Infochimps even enables continual monitoring of your system through automated machine provisioning. Spend your time finding insights, not building infrastructure.
    Benefits. With Ironfan, you can expect:
      • Reduced cycle time. Provision servers in minutes, not days.
      • Improved visibility. Increased transparency means faster problem solving and sharing.
      • Lower support costs. Experience fewer reactive support issues.
      • Lower network costs. Only use the nodes you need for the job you are running.
      • Lower risk, more agility. Deploy and manage a big data stack with minimal resources.
  • 2. Why Infochimps?
    Specialized. Ironfan, Infochimps’ systems configuration tool, leverages three years of internal development and external contributions to its code base. This specialized experience helps organizations reduce the initial adoption cost and experimentation necessary to produce well-tuned clusters.
    Integrated. Infochimps’ tool development and Big Data expertise means our team understands and is equipped with the tools to successfully navigate and troubleshoot the entire Big Data ecosystem of an organization.
    Flexible Cost. Infochimps’ Ironfan lets you take advantage of IaaS (Infrastructure as a Service) providers such as Amazon Web Services. This allows for all infrastructure costs to be treated as operating expenses (use what you need) and not capital expenditures (pay whether you need it or not). Switching from CapEx to OpEx can dramatically lower the funding barrier to adopting Big Data internally in an enterprise.
    Context. Perhaps best of all, the Infochimps Platform, enabled by Ironfan, can be used to provide context to an enterprise’s internal data, whether through public opinion mining (via social networks), geo-located information, word corpus training for machine learning, and other commonly useful (but difficult to accumulate) data.
    All of these capabilities combine to make Infochimps a great choice for providing Big Data services to the budget and process-conscious enterprise customer.
  • 3. Understanding the Tools
    What is Chef? Chef is a configuration management system, designed to be a general purpose tool for building repeatable infrastructure. It uses a Ruby DSL (Domain Specific Language) allowing you to write out specifications (as cookbooks, roles, etc.) for infrastructure that is fully composable. Chef can be used in a number of ways, allowing it to fit into a variety of existing architectures. Its flexibility, however, means that it cannot as easily build higher-level abstractions on top of the architecture it provides.
    What is Ironfan? Ironfan, the foundation of the Infochimps Platform, is a systems provisioning and deployment tool. Ironfan automates not only machine configuration, but entire systems configuration to enable the entire Big Data stack, including tools for data ingestion, scraping, storage, computation, and monitoring.
    Ironfan builds on Chef, but is opinionated about its architecture, which allows broader integration between components. It assumes a source repository, a central Chef Server, and a modern POSIX-compliant operating system for a base image. Currently, it works best with Git, Amazon Web Services, and Ubuntu 11.04, with exploration into other virtualization platforms (Vagrant, etc.) and operating systems (CentOS, FreeBSD, etc.) ongoing, both inside and outside of Infochimps.
  • 4. Benefits for the Entire Team
    For Systems Administrators, Ironfan removes the guesswork from building systems, because it reduces the cycle time to build a server from days or weeks to minutes. Instead of following long lists of manual processes, a system administrator makes changes to their Ironfan homebase, and then ushers those changes into the appropriate systems with the Chef knife and client programs. This enables rapid iterative development, a practice of Agile programming shops for years. Up until recently, this kind of fast-paced development was unavailable to the average systems administrator.
    Ironfan also enables repeatable architecture, another powerful tool. Now, replacing malfunctioning components with completely new ones, built from scratch and loaded with data from live exports or backups, is a simple, reliable, and rapid process, instead of a last-ditch solution.
    Finally, Ironfan allows you to make infrastructure inevitable: you can write definitions that automatically attach new servers to your existing architecture, instead of wiring them into central services like monitoring, log ingestion, or orchestration manually, with the attendant risk of human error.
    For Data Scientists or Business Intelligence Teams, Ironfan can currently build a Hadoop cluster from scratch in less than an hour with just a small handful of commands, and expand it in minutes with a single command. Other large scale cluster technologies (HBase, ElasticSearch, Redis, Flume, etc.) are just as simple to build. This dramatically reduces the cost of starting new data analysis jobs, allowing for greater experimentation.
    Because the underlying architecture is rented by machine-hour, jobs with predictable costs in machine-hours can be optimized for rapid execution without large increases in cost. Should the underlying processing time prove greater than anticipated, clusters can be scaled up while in use, to improve the chances of hitting deadlines.
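The machine-hour point on slide 4 can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes the job scales perfectly linearly across nodes and that the provider bills each node per started hour; the function names and the 240 machine-hour job size are hypothetical and do not come from the white paper.

```python
import math


def wall_clock_hours(job_machine_hours: float, nodes: int) -> float:
    """Hours the job takes on `nodes` machines, assuming perfect linear scaling."""
    return job_machine_hours / nodes


def billed_machine_hours(nodes: int, hours: float) -> int:
    """Machine-hours billed when each node is charged per started hour (assumption)."""
    return nodes * math.ceil(hours)


if __name__ == "__main__":
    job = 240.0  # hypothetical job size in machine-hours
    for nodes in (10, 20, 40):
        hours = wall_clock_hours(job, nodes)
        print(nodes, "nodes:", hours, "h wall clock,",
              billed_machine_hours(nodes, hours), "machine-hours billed")
```

Under these assumptions, quadrupling the cluster cuts the wall-clock time to a quarter while the billed machine-hours stay the same; real jobs rarely scale perfectly, so the larger cluster usually costs somewhat more.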
  • 5. Benefits for the Entire Team (continued)
    For Systems Architects or Core Infrastructure Team, Ironfan allows you to build the repeatable architecture recommended by ITIL (Information Technology Infrastructure Library) for reliable IT infrastructure. It becomes simpler to scale or evolve systems rapidly. Ironfan takes the grunt-work out of distributing those changes, allowing architects to spend more of their focus on design details, instead of implementation details.
    Since everything is stored in source control, both architects and administrators can make changes to the infrastructure, confident that they are not obliterating important history. Also, because the same code can be used to create development, staging, and production environments, the usual barriers to deployment caused by differences in the underlying architectures and deployment mechanisms are significantly reduced.
    Because starting new instances with Ironfan is trivial, and paid for by the hour, capacity can be managed as OpEx rather than CapEx. This also means that problems with huge capacity spikes can be considered; turning up a thousand nodes for three days, then turning them off again, is no longer a laughable fantasy. Migrations also become significantly easier, as new infrastructure can be spun up in parallel with the old, without a long term increase in expense.
  • 6. Case Study: How Infochimps Uses Ironfan to Create TrstRank
    Since the launch of Twitter, people have clamored for ways to access and “slice and dice” its data. One of the most common ways people use the Twitter data corpus is to measure a person’s importance and influence. Klout is an example of one product that specializes in this kind of “influencer” data.
    A few years ago, we created our own special version of Klout, one that took advantage of our vast historical record of the relationships to create an accurate number describing how influential a Twitter user is. It’s called TrstRank and it ranks a user on a scale of 1-10, with 10 being the most influential you can get.
    Coming up with such a number like TrstRank is no small task. Setting aside the issues of getting the data, there are some very real Big Data problems surrounding the product that require special tools for getting it done efficiently. And when you’re a bootstrapped startup, like we were at the time, you have to be resourceful if you are going to get by.
    The biggest issue with pursuing a new data product like TrstRank is the same one any company faces when they decide to venture into new territory: the high risks of wasting time and money.
    Wasting Time. One of the first problems you run into as a small team trying your hand at data science is the excess time spent on server and machine configuration, instead of focusing on modeling, algorithms, and manipulating the data. Ramp-up time for even the first phase of a project like TrstRank can be a whole day or more of engineering time.
    What is TrstRank? TrstRank is an Infochimps developed dataset and API that provides Twitter influence metrics. This API provides Twitter influence metrics with the click of a button! TrstRank measures Twitter user reputation, importance and influence in a far more robust way than counting the number of followers. It is a sophisticated measure of a user’s relative importance within the entire Twitter network.
  • 7. Case Study (continued): How Infochimps Uses Ironfan to Create TrstRank
    Wasting Money. From our earliest days Infochimps has been based on Amazon Web Services’ (AWS) cloud, taking advantage of the flexibility and scalability it provides. With AWS, you pay for what you use, so you are always inclined to eliminate waste. In our early days we even created decision trees for when to shut down a cluster or not, depending on how many hours it was to be up but not used. This can set conflicting goals for the data scientist who would prefer to leave a cluster up overnight, even if it’s unused, so they don’t have to deal with setting everything up again the next day!
    Enter Ironfan. We created Ironfan to solve our own problems of how to save time and money during our data science operations in the cloud. When we came up with the idea for TrstRank, it was a simple operation to spin up a cluster for early analysis and experimentation. We could validate some of our algorithms and ideas on a simple cluster before moving to something more heavyweight.
    Ironfan and TrstRank, Now. Ironfan has continued as a key tool for our monthly TrstRank operation. We continue to scrape Twitter for follower information, and with the updated data every month we crunch the TrstRank numbers again. With Ironfan, we’re able to run a multiple step operation on 8 billion tweets on clusters of 30 m1.xlarge EC2 machines, while only running the resources we need when they’re needed. TrstRank takes 72 hours to complete, with resources being paid for commensurately. Without Ironfan, we’d be looking at 2-3x the costs in time and money!
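The shut-down decision described on slide 7 can be sketched as a comparison between the cost of leaving an idle cluster running and the cost of rebuilding it the next day. This is a minimal sketch and not the decision tree Infochimps actually used: the function names, the node price, and the engineer rate are all hypothetical; only the 30-node cluster size, the "whole day or more" manual ramp-up, and the "less than an hour" automated build echo figures from the text.

```python
def idle_cost(nodes: int, idle_hours: float, node_rate: float) -> float:
    """Cost of keeping `nodes` machines running but unused for `idle_hours`."""
    return nodes * idle_hours * node_rate


def rebuild_cost(rebuild_hours: float, engineer_rate: float) -> float:
    """Cost of the engineering time needed to bring the cluster back up."""
    return rebuild_hours * engineer_rate


def should_shut_down(nodes: int, idle_hours: float, node_rate: float,
                     rebuild_hours: float, engineer_rate: float) -> bool:
    """Shut down when idling costs more than rebuilding."""
    return idle_cost(nodes, idle_hours, node_rate) > rebuild_cost(rebuild_hours, engineer_rate)


if __name__ == "__main__":
    NODE_RATE = 0.50       # hypothetical USD per node-hour
    ENGINEER_RATE = 60.0   # hypothetical USD per engineer-hour
    # Manual rebuild: roughly a working day of engineering time.
    print(should_shut_down(30, 12, NODE_RATE, 8.0, ENGINEER_RATE))
    # Automated rebuild: well under an hour.
    print(should_shut_down(30, 12, NODE_RATE, 0.5, ENGINEER_RATE))
```

With these made-up rates, a 30-node cluster idling for 12 hours is cheaper to leave up when a rebuild takes a full day, and cheaper to shut down once the rebuild is automated, which is the conflict the slide describes and the one fast provisioning removes.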
  • 8. About Infochimps
    Our mission is to make the world’s data more accessible. Infochimps helps companies understand their data. We provide tools and services that connect their internal data, leverage the power of cloud computing and new technologies such as Hadoop, and provide a wealth of external datasets, which organizations can connect to their own data.
    Contact Us
    Infochimps, Inc.
    1214 W 6th St. Suite 202
    Austin, TX 78703
    1-855-DATA-FUN (1-855-328-2386)
    www.infochimps.com
    info@infochimps.com
    Twitter: @infochimps
    Get a free Big Data consultation. Let’s talk Big Data in the enterprise! Get a free conference with the leading big data experts regarding your enterprise big data project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop about your project objectives, design, infrastructure, tools, etc. Find out how other companies are solving similar problems. Learn best practices and get recommendations for free.