OpenSearch
-Abhi Jain
Agenda
● OpenSearch
○ What is it?
○ Benefits/ Uses
○ How to use it
○ Features
● Migrate from Elastic to OpenSearch
● Tools & Plugins
About Me
● Lead Dev
● Located in Florida
● Trainer
● Presenter
● .NET Developer
● Youtuber: Coach4Dev
● Husband/ Father
Amazon Elasticsearch
● Launched in 2015
● Gained popularity for log analytics usage
● Used open-source ElasticSearch under Apache License v2
● Jan 2021
○ Elastic NV changed licensing strategy
○ Releases after ElasticSearch 7.10.2 & Kibana 7.10.2
■ No longer released under Apache License v2
■ Released under the Elastic License
OpenSearch
● Sep 2021:
○ Amazon Elasticsearch Service renamed to Amazon OpenSearch Service
● Open-source fork of ElasticSearch 7.10.2 and Kibana 7.10.2
● Highly scalable
● Fast access & response to large volumes of data
● Powered by Apache Lucene Search library
Apache Lucene
● Apache Lucene project develops open-source search software
○ Releases a core search library named Lucene core
● Lucene Core
○ Java Library providing powerful indexing and search features
Apache Solr
● Open source search platform
● Built on Apache Lucene
Solr vs ElasticSearch
● Similar performance mostly.
● ES has better support for scalability
○ due to horizontal scaling
■ Better cloud support too
● ES can support multiple doc types in a single index better
○ More difficult to do this in Solr
● ES supports native DSL (Domain Specific Language)
○ Need to program queries in Solr
● https://mindmajix.com/elasticsearch-vs-solr
Why OpenSearch
● Huge amount of machine generated data these days
○ Growing exponentially
● Getting insights is important
● Interactive log analytics
● Real-time application monitoring
● Website Search, etc.
OpenSearch Features
● Easy to set-up and configure
● In-place upgrades
● Enables data monitoring & setting alerts based on thresholds
● Supports authentication, encryption & compliance requirements
OpenSearch vs ElasticSearch
● OpenSearch was forked from ElasticSearch
○ Now they are separate from each other
● Each is adding features separately
● OpenSearch
○ Inbuilt support from AWS
OpenSearch features not in ES (free version)
● Centralized user accounts / access control
● Cross-cluster replication
● IP filtering
● Configurable retention period
● Anomaly detection
● Tableau connector
● JDBC driver
● ODBC driver
● Machine learning features such as regression and classification
● Link
ElasticSearch Features
● Based on subscription levels
● https://www.elastic.co/subscriptions
OpenSearch & ElasticSearch Version Support
● Amazon OpenSearch Service currently supports the following OpenSearch versions:
○ 1.3, 1.2, 1.1, 1.0
● And supports the following ElasticSearch versions:
○ 7.10, 7.9, 7.8, 7.7, 7.4, 7.1
○ 6.8, 6.7, 6.5, 6.4, 6.3, 6.2, 6.0
○ 5.6, 5.5, 5.3, 5.1
○ 2.3
○ 1.5
What is Kibana
● Free & open front end application
● Charting tool for Elastic Stack
● Sits on top of Elastic Stack
● Sample Dashboard
OpenSearch Dashboards
● Default visualization tool for data in OpenSearch
● Filter data with queries
● Comes with Amazon OpenSearch Service
Terminologies
OpenSearch Cluster
● Synonymous with domain
● Domains are clusters with
○ settings,
○ instance types,
○ instance counts,
○ and storage resources that you specify.
● Group of nodes
○ With same cluster.name attribute
OpenSearch Node
● Member of a cluster
● A distinct host
● With IP address
Getting Started
● Create a domain
● Size the domain appropriately for your workload
● Control access to your domain using a domain access policy or fine-grained
access control
● Index data manually or from other AWS services
● Use OpenSearch Dashboards to search your data and create visualizations
Custom Endpoint
● Use when you want an easier-to-read or custom domain name
● Can use HTTPS
○ Upload SSL certificate
Run OpenSearch locally
● Install docker
● wsl -d docker-desktop
● sysctl -w vm.max_map_count=262144
● Ctrl+C
● docker-compose up
● Visit http://localhost:5601/
● Use admin/admin to login and explore
● Link
Upload Data
● One at a time
● Bulk
Upload Data One At a time
● curl -XPUT -u "master:XXXX" \
"https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_doc/1" \
-H "Content-Type: application/json" \
-d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996, "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"], "title": "Mars Attacks!"}'
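A rough Python equivalent of the request above, using only the standard library. This is a sketch: the endpoint and credentials are placeholders (assumptions), and the request is only built here, not sent.

```python
import base64
import json
import urllib.request

# Placeholder endpoint and credentials (assumptions): substitute your own domain values.
ENDPOINT = "https://search-my-domain.us-west-1.es.amazonaws.com"
USER, PASSWORD = "master", "XXXX"


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        url=f"{ENDPOINT}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


movie = {
    "director": "Burton, Tim",
    "genre": ["Comedy", "Sci-Fi"],
    "year": 1996,
    "actor": ["Jack Nicholson", "Pierce Brosnan", "Sarah Jessica Parker"],
    "title": "Mars Attacks!",
}
request = build_index_request("movies", 1, movie)
print(request.get_method(), request.full_url)
# To actually send it against a reachable domain: urllib.request.urlopen(request)
```

Building the body with json.dumps avoids the shell-quoting problems that nested double quotes cause on the command line.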
Upload Data Bulk
● curl -XPOST -u "master:XXXXX" \
"https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_bulk" \
--data-binary @bulk_movies.txt -H "Content-Type: application/json"
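The slide does not show what bulk_movies.txt contains. The bulk API expects newline-delimited JSON: an action line followed by the document, repeated, with a trailing newline. A minimal sketch that generates such a body (the second document is made-up sample data):

```python
import json


def build_bulk_body(index, documents):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc_id, source in documents.items():
        # Action line: tells the bulk API to index the next line under this _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


docs = {
    "1": {"title": "Mars Attacks!", "year": 1996},
    "2": {"title": "Example Movie", "year": 2000},  # made-up sample document
}
bulk_body = build_bulk_body("movies", docs)
print(bulk_body)
```

Writing bulk_body to a file gives something that could be passed to curl with --data-binary, which preserves the newlines.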
How to Query?
Searching Data
● URI Searches
● Command Line
● OpenSearch Dashboards
Searching Data - URI
● GET Request
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true
● Searches all fields of the movies index
URI Search Specific fields
● Search movies index and title property
● GET https://search-my-domain.us-west-1.es.amazonaws.com/movies/_search?q=title:house
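A small sketch of how such URI search URLs can be assembled in Python. The helper name is hypothetical, and the endpoint is the placeholder domain from the slide. Note that urlencode percent-encodes the colon in field:value, which the server decodes.

```python
from urllib.parse import urlencode

# Placeholder domain endpoint taken from the slide.
ENDPOINT = "https://search-my-domain.us-west-1.es.amazonaws.com"


def uri_search_url(index, text, field=None, pretty=False):
    """Build a URI search URL; restrict to one field with the field:value syntax."""
    query = f"{field}:{text}" if field else text
    params = {"q": query}
    if pretty:
        params["pretty"] = "true"
    return f"{ENDPOINT}/{index}/_search?{urlencode(params)}"


print(uri_search_url("movies", "rebel", pretty=True))   # all fields of the index
print(uri_search_url("movies", "house", field="title"))  # title field only
```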
Get Search Results - Command Line
● curl -XGET -u "master:XXXXX" \
"https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true"
Query DSL
● For more complex queries
○ OpenSearch Domain Specific Language (DSL)
● POST request with query body
Get Search Results - Dev Tools
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_dashboards/app/dev_tools#/console
GET _search
{
  "query": {
    "match_all": {}
  }
}
Search on only specific fields
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "U.S.",
      "fields": ["title", "actor", "director"]
    }
  }
}
Search - Boosting fields
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "john",
      "fields": ["title^4", "actor", "director^4"]
    }
  }
}
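The field^weight suffix in the query above multiplies that field's contribution to the relevance score. A minimal sketch that builds the same request body from a mapping of field names to weights (the helper name is hypothetical):

```python
import json


def multi_match_query(text, boosts, size=20):
    """boosts maps field name -> weight; a weight of 1 leaves the field unboosted."""
    fields = [
        name if weight == 1 else f"{name}^{weight}"
        for name, weight in boosts.items()
    ]
    return {
        "size": size,
        "query": {"multi_match": {"query": text, "fields": fields}},
    }


body = multi_match_query("john", {"title": 4, "actor": 1, "director": 4})
print(json.dumps(body, indent=2))
```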
Search - Pagination
GET _search
{
  "from": 0,
  "size": 1,
  "query": {
    "multi_match": {
      "query": "Drama",
      "fields": ["genre"]
    }
  }
}
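In the request above, "from" is the zero-based offset of the first hit and "size" is the page length. A sketch of turning a one-based page number into those two values (function names are hypothetical):

```python
def page_params(page, page_size):
    """Translate a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def paged_genre_search(genre, page, page_size):
    """Build a paginated multi_match body over the genre field."""
    body = page_params(page, page_size)
    body["query"] = {"multi_match": {"query": genre, "fields": ["genre"]}}
    return body


print(paged_genre_search("Drama", page=1, page_size=1))
print(paged_genre_search("Drama", page=3, page_size=10))
```

By default from + size cannot exceed the index.max_result_window setting (10,000), so this style suits shallow paging rather than deep result scans.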
Query - With Highlights in Response
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "Manchurian",
      "fields": ["title^4", "actor", "director"]
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    },
    "pre_tags": "<strong>",
    "post_tags": "</strong>",
    "fragment_size": 200,
    "boundary_chars": ".,!? "
  }
}
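Each hit that matched on a highlighted field carries a "highlight" object next to "_source", holding the tagged fragments. A sketch of pulling those fragments out of a response; the sample response below is hypothetical and heavily trimmed:

```python
def title_highlights(response):
    """Map document _id -> highlighted title fragments, skipping hits without any."""
    results = {}
    for hit in response.get("hits", {}).get("hits", []):
        fragments = hit.get("highlight", {}).get("title", [])
        if fragments:
            results[hit["_id"]] = fragments
    return results


# Hypothetical, trimmed search response used only to exercise the helper.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "42",
                "_source": {"title": "The Manchurian Candidate"},
                "highlight": {"title": ["The <strong>Manchurian</strong> Candidate"]},
            },
            {"_id": "43", "_source": {"title": "Another Film"}},
        ]
    }
}
print(title_highlights(sample_response))
```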
Query - Count
GET movies/_count
{
  "query": {
    "multi_match": {
      "query": "Manchurian",
      "fields": ["title^4", "actor", "director"]
    }
  }
}
Dashboards Query Language (DQL)
● Use DQL in OpenSearch Dashboards
○ Search for data and visualizations
● Terms Query
○ Search for any text
■ E.g. www.example.com
○ Access object’s nested field
■ E.g. coordinates.lat:43.7102
○ Leading and trailing wildcards
■ host.keyword:*.example.com/*
● Operators
○ AND
○ OR
Dashboards Query Language (DQL)
● Date and range Queries
○ bytes >= 15 and memory < 15
○ @timestamp > "2020-12-14T09:35:33"
● Nested field query
○ superheroes: {hero-name: Superman}
Dashboard Plugins
Query Workbench
● SQL
○ Run SQL
○ Treat indices as tables
● PPL
○ Piped Processing Language
○ Commands delimited by pipes
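To illustrate "commands delimited by pipes": a PPL query starts with a source and chains commands with the | character, while SQL treats the index as a table. A sketch that builds request bodies for both; the bodies would be POSTed to _plugins/_ppl and _plugins/_sql respectively, and the example filter on year is an assumption based on the movies documents shown earlier.

```python
def ppl_request(index, *commands):
    """Join a source clause and further PPL commands with pipes."""
    pipeline = " | ".join([f"source={index}", *commands])
    return {"query": pipeline}


def sql_request(statement):
    """Wrap a SQL statement in the body shape the SQL endpoint expects."""
    return {"query": statement}


# Body for POST _plugins/_ppl
ppl_body = ppl_request("movies", "where year > 1990", "fields title, year")
# Body for POST _plugins/_sql
sql_body = sql_request("SELECT title, year FROM movies WHERE year > 1990")

print(ppl_body["query"])
print(sql_body["query"])
```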
Reporting
● Multiple file formats
● On demand/ Scheduled
● Generate from
○ Dashboard
○ Visualization
○ Discover
Anomaly Detection
● Detect unusual behavior in time series data
● Anomaly Grade
● Confidence Score
Notifications
● Supported
○ Amazon Chime
○ SNS
○ SES
○ SMTP
○ Slack
○ Custom Webhooks
Observability plugin
● Visualize/Query time series data
● Event analytics
● Compare the data the way you like
Index Management
● Create ISM policy
● To manage your indexes
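An ISM policy is a small state machine: each state lists actions to run and transitions to other states. The sketch below builds a two-state policy body that rolls an index over and later deletes it. It is meant as the body of PUT _plugins/_ism/policies/<policy_id>; the state names, ages, and index pattern are illustrative assumptions, and field names should be checked against the ISM documentation for your version.

```python
import json


def rollover_then_delete_policy(index_pattern, rollover_age, delete_after):
    """Build an ISM policy body: 'hot' rolls over, then transitions to 'delete'."""
    return {
        "policy": {
            "description": "Roll over, then delete old indices (illustrative)",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [{"rollover": {"min_index_age": rollover_age}}],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": delete_after},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }


policy = rollover_then_delete_policy("logs-*", rollover_age="1d", delete_after="30d")
print(json.dumps(policy, indent=2))
```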
Security plugin
● Set up RBAC
Migrate from ElasticSearch to OpenSearch
Three major approaches
● Snapshot
● Rolling Upgrade
● Cluster Restart
Snapshot Method
● Generate snapshot in ElasticSearch
● Save in shared directory
● Restore in OpenSearch
● Snapshot
○ Backup of entire cluster state
○ Useful for recovery from failure and migration
● Link
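The three steps above map onto the snapshot REST API roughly as follows. This sketch only lists the calls; the repository name, snapshot name, and shared path are hypothetical. A shared-filesystem ("fs") repository also needs its location allowed via path.repo on every node, and managed AWS domains typically use an S3 repository instead.

```python
def snapshot_migration_steps(repo, snapshot, location, indices="*"):
    """Return (cluster, method, path, body) tuples for a snapshot-based migration."""
    repo_body = {"type": "fs", "settings": {"location": location}}
    return [
        # 1. Register the shared directory as a repository on the source cluster.
        ("source", "PUT", f"_snapshot/{repo}", repo_body),
        # 2. Take the snapshot and wait until it finishes.
        ("source", "PUT", f"_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
        # 3. Register the same directory on the OpenSearch cluster.
        ("target", "PUT", f"_snapshot/{repo}", repo_body),
        # 4. Restore the indices there, leaving cluster-wide state untouched.
        (
            "target",
            "POST",
            f"_snapshot/{repo}/{snapshot}/_restore",
            {"indices": indices, "include_global_state": False},
        ),
    ]


steps = snapshot_migration_steps("migration-repo", "snapshot-1", "/mnt/snapshots")
for cluster, method, path, body in steps:
    print(cluster, method, path, body)
```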
Snapshot Method
● Check Index compatibility
○ E.g.: Can't restore a 7.6.0 snapshot into a 7.5.0 cluster
● Link
● Fastest
● Easiest
● Most efficient
Rolling Upgrade
● Official way to migrate cluster
● Without interruption
● Rolling upgrades are supported:
○ Between minor versions
○ From 5.6 to 6.8
○ From 6.8 to 7.14.1
○ From any version since 7.14.0 to 7.14.1
Rolling Upgrade
● Shut down one node at a time
○ Minimal disruption
Cluster Restart Upgrades
● Shut down all nodes
● Perform the upgrade
● Restart the cluster
Mapping
OpenSearch Mapping
● Dynamic
○ When you index a document
○ OpenSearch adds fields automatically
○ It deduces their types by itself
● Explicit
○ If you know your data types
○ Preferred way of doing things
OpenSearch Mapping
● If you do not define a mapping ahead of time, OpenSearch dynamically
creates a mapping for you.
● If you do decide to define your own mapping, you can do so at index creation.
● ONE mapping is defined per index. Once the index has been created, we can
only add new fields to a mapping. We CANNOT change the mapping of an
existing field.
● If you must change the type of an existing field, you must create a new index
with the desired mapping, then reindex all documents into the new index.
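A sketch of the "new index plus reindex" workflow described above. The first body would be sent with PUT <new-index> and the second with POST _reindex. The index names and field types are illustrative assumptions based on the movies documents used earlier.

```python
import json


def explicit_mapping(properties):
    """Build an index-creation body from a field name -> type mapping."""
    return {
        "mappings": {
            "properties": {name: {"type": ftype} for name, ftype in properties.items()}
        }
    }


def reindex_body(source_index, dest_index):
    """Build the body that copies all documents from one index into another."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Body for: PUT movies-v2   (hypothetical new index with the corrected types)
new_index_body = explicit_mapping(
    {"title": "text", "director": "keyword", "year": "integer"}
)
# Body for: POST _reindex
copy_body = reindex_body("movies", "movies-v2")

print(json.dumps(new_index_body, indent=2))
print(json.dumps(copy_body, indent=2))
```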
Text vs keyword data types
● Text type
○ Full text searches
● Keyword type
○ Exact searches
○ Aggregations
○ Sorting
Text vs Keyword
● Inverted Index
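A toy illustration of why the two types behave differently. A text field is analyzed into terms before it enters the inverted index, while a keyword field is stored as one unmodified term. The analyzer below is a crude stand-in (lowercase, split on non-alphanumerics), not the engine's actual analysis chain, and the titles are sample data.

```python
import re
from collections import defaultdict


def analyze(text):
    """Crude stand-in for an analyzer: lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(titles):
    """Return (text_index, keyword_index), each mapping a term to a set of doc ids."""
    text_index = defaultdict(set)     # one entry per analyzed term
    keyword_index = defaultdict(set)  # one entry per whole, unmodified value
    for doc_id, title in titles.items():
        keyword_index[title].add(doc_id)
        for term in analyze(title):
            text_index[term].add(doc_id)
    return text_index, keyword_index


titles = {1: "Mars Attacks!", 2: "Life on Mars"}
text_index, keyword_index = build_indexes(titles)

print(sorted(text_index.get("mars", set())))              # [1, 2]
print(sorted(keyword_index.get("Mars Attacks!", set())))  # [1]
print(sorted(keyword_index.get("mars", set())))           # []
```

The full-text lookup finds both documents from a single lowercase term, whereas the keyword lookup only succeeds on the exact original value, which is also what makes it suitable for sorting and aggregations.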
Aggregations
OpenSearch Aggregations
● Analyze data
○ In real time too
● Extract statistics
● More expensive than queries
○ In terms of CPU and memory
○ In general
Aggregation Query
● Use aggs or aggregations
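A sketch of an aggregation request body and of reading its result. Setting "size" to 0 suppresses the hits so only the aggregation comes back. The field name year is an assumption taken from the movies documents shown earlier, and the sample response is hypothetical.

```python
def average_aggregation(field, name=None):
    """Build a search body that returns only the average of a numeric field."""
    name = name or f"avg_{field}"
    return {"size": 0, "aggs": {name: {"avg": {"field": field}}}}


def read_average(response, name):
    """Extract the computed value from the aggregations section of a response."""
    return response["aggregations"][name]["value"]


body = average_aggregation("year")
print(body)

# Hypothetical, trimmed response used only to show where the result lives.
sample_response = {"aggregations": {"avg_year": {"value": 1996.0}}}
print(read_average(sample_response, "avg_year"))
```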
Example
● Get average of
Data Streams
Data Streams in OpenSearch
● Ingesting time series data
○ Logs
○ Events
○ Metrics, etc.
● Number of documents grows rapidly
● Append Only data
● Don't need to update older documents (Very rarely)
Rollover
● If data is growing rapidly
● Write to index up to a certain threshold
○ Then create a new index
○ And start writing to it
● Optimize the active index for high ingest rates on high-performance hot
nodes.
● Optimize for search performance on warm nodes.
● Shift older, less frequently accessed data to less expensive cold nodes.
● Delete data according to your retention policies by removing entire indices.
Index Template
● Data Stream requires an index template
● The template specifies:
○ A name or wildcard (*) pattern for the data stream.
○ The data stream’s timestamp field. This field must be mapped as a date or date_nanos field data type and must be included in every document indexed to the data stream.
○ The mappings and settings applied to each backing index when it’s created.
ILM Policy
● Index Lifecycle Management (ILM) policy; the OpenSearch counterpart is Index State Management (ISM)
● Can be applied to any number of indices
● Usage
○ Allocate
○ Delete
○ Rollover
○ Read Only
○ Wait for snapshot
ILM Policy
● Create a policy:
● Link
Create ILM Policy
Index Template
● Tells OpenSearch how to configure an index when it is created
● For data streams
○ Configures the stream’s backing indices
○ Configured prior to index creation
Templates Types
● Component Templates
○ Reusable building blocks that configure
■ mappings,
■ settings, and
■ Aliases
○ Not directly applied to indices
● Index Template
○ Collection of component templates
○ Directly applied to indices
○ Some defaults: metrics-*-*, logs-*-*
Create Component Template
● Link
Create Index Template
● Data Stream requires matching index template
● PUT _index_template/{template_name}
Create Index Template
● Link
Create data stream
● Documents must contain timestamp field
● PUT _data_stream/my-data-stream
● Stream’s name must match one of your index template’s index patterns
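A sketch tying the last few slides together: an index template body that enables data streams (for PUT _index_template/{template_name}) and a check that a stream name matches one of its patterns before calling PUT _data_stream/<name>. The "timestamp_field" shape follows the OpenSearch form of the API, and the pattern check with fnmatch is only an approximation of server-side wildcard matching.

```python
from fnmatch import fnmatchcase


def data_stream_template(patterns, timestamp_field="@timestamp", priority=100):
    """Build an index template body whose matching names become data streams."""
    return {
        "index_patterns": list(patterns),
        "data_stream": {"timestamp_field": {"name": timestamp_field}},
        "priority": priority,
    }


def matches_template(stream_name, template):
    """Approximate check that a stream name matches one of the template patterns."""
    return any(
        fnmatchcase(stream_name, pattern) for pattern in template["index_patterns"]
    )


template = data_stream_template(["my-data-stream*"])
print(template)
print(matches_template("my-data-stream", template))   # True
print(matches_template("other-stream", template))     # False
```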
Get Info About Data Stream
● GET _data_stream/my-data-stream
Delete Data Stream
● DELETE _data_stream/my-data-stream
Cross Cluster Replication
Cross Cluster Replication
● Cross Cluster replication plugin
○ Replicates indexes, mapping & metadata from one cluster to another
● Advantages
○ Continue to handle search requests if there is an outage
○ Can help reduce latency in application
■ Replicating data across geographically distant data centers
Replication
● Active passive model
○ Follower index pulls data from leader index
● It can be
○ Started
○ Paused
○ Stopped
○ Resumed
● Can be secured
○ Security plugin
○ Encrypt cross cluster traffic
Exercise
● Create 2 domains in AWS OpenSearch
● Link
Exercise
● Source Domain Connections Tab -> Outbound ->
○ Create Connection to Destination Domain
● Set access policy on destination domain:
● Link
Exercise
● Get Connection status
○ GET _plugins/_replication/connect1/_status
● Start syncing
PUT _plugins/_replication/connect1/_start
{
  "leader_alias": "Connect1",
  "leader_index": "movies",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}
Plugins
OpenSearch plugins
● Standalone components
○ That add features and capabilities
● Huge number of plugins available
● E.g.
○ Replication Plugin
○ Security plugin
○ Notification plugin
SQL Plugin
● Lets you run SQL queries on OpenSearch indices
● Add data
PUT movies/_doc/1
{ "title": "Spirited Away" }
● Query data
POST _plugins/_sql
{
  "query": "SELECT * FROM movies LIMIT 50"
}
SQL Plugin
● Delete data from an OpenSearch index
● Enable Delete via SQL plugin
PUT _plugins/_query/settings
{
  "transient": {
    "plugins.sql.delete.enabled": "true"
  }
}
SQL Plugin - Delete
● To Delete the data
POST _plugins/_sql
{
  "query": "DELETE FROM movies"
}
Asynchronous Search
● Large volumes of data
● Can take longer to search
● Async
○ Run searches in the background
○ Monitor progress of these searches
○ Get back partial results as they become available
Asynchronous Search
● POST _plugins/_asynchronous_search
● Response contents:
○ ID
■ Can be used to track the state of the search
■ Get partial results
○ State
■ Running
■ Completed
■ Persisted
● Link
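A sketch of the "track the state" step: poll the search by its ID until it is no longer running. The fetch_status callable is hypothetical. In practice it would wrap GET _plugins/_asynchronous_search/<id>. The uppercase state strings are an assumption about the response format, and the demo uses a fake in place of a cluster.

```python
import time


def wait_for_search(fetch_status, search_id, interval=1.0, max_polls=30):
    """Poll until the reported state is no longer RUNNING, then return the status."""
    for _ in range(max_polls):
        status = fetch_status(search_id)
        if status.get("state") != "RUNNING":
            return status
        time.sleep(interval)
    raise TimeoutError(f"search {search_id} still running after {max_polls} polls")


# Fake status source standing in for real HTTP calls (demo only).
_fake_statuses = iter(
    [
        {"state": "RUNNING"},
        {"state": "RUNNING"},
        {"state": "SUCCEEDED", "response": {"hits": {"hits": []}}},
    ]
)
final = wait_for_search(lambda _id: next(_fake_statuses), "abc123", interval=0)
print(final["state"])
```

Passing the fetch function in keeps the polling logic independent of any particular HTTP client.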
OpenSearch Clients
Clients
● OpenSearch Python client
● OpenSearch JavaScript (Node.js) client
● OpenSearch .NET clients
● OpenSearch Go client
● OpenSearch PHP client
OpenSearch Client for .NET
● OpenSearch.Net
○ Low level client
● OpenSearch.Client
○ High level client
● Sample code: Link
Exercise
● Create a .NET application
● Add a document to OpenSearch using the .NET Application
○ OpenSearch.Client (.NET High level client)
Agents and Ingestion Tools
Beats
● Data shippers
● Agents on servers
● Send data to ES/ Logstash
Grafana
● An open source visualization tool
● Various sources can be used as data source:
○ InfluxDB
○ MySQL
○ ElasticSearch
○ PostgreSQL
● Better suited for metrics visualizations
● Does not allow full text data querying
Logstash
● Free/ Open-Source
● Data processing pipeline
● Ingests data from multitude of sources
● Transforms it
● Sends it to your favorite stash
Logstash - Ingestion
● Data of all shapes/ sizes/ source
○ Can be ingested
● It can parse/ transform your data
Logstash - Output
● ElasticSearch
● MongoDB
● S3
● Etc.
● Link
AWS OpenSearch Security
● Use multi-factor authentication (MFA) with each account.
● Use SSL/TLS to communicate with AWS resources. We recommend TLS 1.2
or later.
● Set up API and user activity logging with AWS CloudTrail.
● Use AWS encryption solutions, along with all default security controls within
AWS services.
● Use advanced managed security services such as Amazon Macie, which
assists in discovering and securing personal data that is stored in Amazon S3.
● If you require FIPS 140-2 validated cryptographic modules when accessing
AWS through a command line interface or an API, use a FIPS endpoint.
Summary
● OpenSearch
○ Open Source Search solution
● Upcoming and supported by AWS
● Caters to most search use cases
○ Great Query performance
● Powerful tools
● Community Support
Connect with me
● Trainings on various tech topics
● For any questions:
○ https://linkedin.com/in/coach4dev

MYjobs Presentation Django-based project
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 

OpenSearch: A Guide to the Powerful Open Source Search and Analytics Engine

  • 2. Agenda ● OpenSearch ○ What is it? ○ Benefits/ Uses ○ How to use it ○ Features ● Migrate from Elastic to OpenSearch ● Tools & Plugins
  • 3. About Me ● Lead Dev ● Located in Florida ● Trainer ● Presenter ● .NET Developer ● Youtuber: Coach4Dev ● Husband/ Father
  • 4. Amazon Elasticsearch ● Launched in 2015 ● Gained popularity for log analytics usage ● Used open-source Elastic under Apache License v2 ● Jan 2021 ○ Elastic NV changed licensing strategy ○ After ElasticSearch 7.10.2 & Kibana 7.10.2 ■ Not released under Apache License v2 ■ Released under Elastic License
  • 5. OpenSearch ● Sep 2021: ○ Renamed from ElasticSearch to OpenSearch ● OpenSource fork from Elastic 7.10.2 and Kibana 7.10.2 ● Highly scalable ● Fast access & response to large volumes of data ● Powered by Apache Lucene Search library
  • 6. Apache Lucene ● Apache Lucene project develops open-source search software ○ Releases a core search library named Lucene core ● Lucene Core ○ Java Library providing powerful indexing and search features
  • 7. Apache Solr ● Open source search platform ● Built on Apache Lucene
  • 8. Solr vs ElasticSearch ● Similar performance mostly. ● ES has better support for scalability ○ due to horizontal scaling ■ Better cloud support too ● ES can support multiple doc types in a single index better ○ More difficult to do this in Solr ● ES supports native DSL (Domain Specific Language) ○ Need to program queries in Solr ● https://mindmajix.com/elasticsearch-vs-solr
  • 9. Why OpenSearch ● Huge amount of machine generated data these days ○ Growing exponentially ● Getting insights is important ● Interactive log analytics ● Real-time application monitoring ● Website Search, etc.
  • 10. OpenSearch Features ● Easy to set-up and configure ● In-place upgrades ● Enables data monitoring & setting alerts based on thresholds ● Supports authentication, encryption & compliance requirements
  • 11. OpenSearch vs ElasticSearch ● OpenSearch was forked from Elastic Search ○ Now they are separate from each other ● Each is adding features separately ● OpenSearch ○ Inbuilt support from AWS
  • 12. OpenSearch features not in ES (free version) ● Centralized user accounts / access control ● Cross-cluster replication ● IP filtering ● Configurable retention period ● Anomaly detection ● Tableau connector ● JDBC driver ● ODBC driver ● Machine learning features such as regression and classification ● Link
  • 13. ElasticSearch Features ● Based on subscription levels ● https://www.elastic.co/subscriptions
  • 14. OpenSearch & ElasticSearch Version Support ● Currently supports the following OpenSearch versions: ○ 1.3, 1.2, 1.1, 1.0 ● And supports the following ElasticSearch versions: ○ 7.10, 7.9, 7.8, 7.7, 7.4, 7.1 ○ 6.8, 6.7, 6.5, 6.4, 6.3, 6.2, 6.0 ○ 5.6, 5.5, 5.3, 5.1 ○ 2.3 ○ 1.5
  • 15. What is Kibana ● Free & open front end application ● Charting tool for Elastic Stack ● Sits on top of Elastic Stack ● Sample Dashboard
  • 16. OpenSearch Dashboards ● Default visualization tool for data in OpenSearch ● Filter data with queries ● Comes with OpenSearch Service
  • 18. OpenSearch Cluster ● Synonymous with domain ● Domains are clusters with ○ settings, ○ instance types, ○ instance counts, ○ and storage resources that you specify. ● Group of nodes ○ With same cluster.name attribute
  • 19. OpenSearch Node ● Member of a cluster ● A distinct host ● With IP address
  • 20. Getting Started ● Create a domain ● Size the domain appropriately for your workload ● Control access to your domain using a domain access policy or fine-grained access control ● Index data manually or from other AWS services ● Use OpenSearch Dashboards to search your data and create visualizations
  • 21. Custom Endpoint ● If we want easier to read or custom domain name ● Can use Https ○ Upload SSL certificate
  • 22. Run OpenSearch locally ● Install docker ● wsl -d docker-desktop ● sysctl -w vm.max_map_count=262144 ● Ctrl+C ● docker-compose up ● Visit http://localhost:5601/ ● Use admin/admin to login and explore ● Link
  • 23. Upload Data ● One at a time ● Bulk
  • 24. Upload Data One At a time ● curl -XPUT -u "master:XXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_doc/1" -d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996, "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"], "title": "Mars Attacks!"}' -H "Content-Type: application/json"
  • 25. Upload Data Bulk ● curl -XPOST -u "master:XXXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_bulk" --data-binary @bulk_movies.txt -H "Content-Type: application/json"
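The contents of bulk_movies.txt are not shown on the slide. As a minimal sketch, assuming the standard _bulk layout (one action line followed by one document line, with a trailing newline), the request body could be generated as below. The helper name build_bulk_body and the second sample movie are illustrative, not taken from the deck.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for the _bulk API.

    Each document becomes two lines: an "index" action line, then the source.
    """
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative sample data (hypothetical, trimmed to two fields).
movies = {
    "1": {"title": "Mars Attacks!", "year": 1996},
    "2": {"title": "Spirited Away"},
}
body = build_bulk_body("movies", movies)
```

Writing body to a file gives something that can be passed to curl with --data-binary, as on the slide.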
  • 27. Searching Data ● URI Searches ● Command Line ● OpenSearch Dashboards
  • 28. Searching Data - URI ● GET Request ● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true ● Searches all fields of the movies index
  • 29. URI Search Specific fields ● Search movies index and title property ● GET https://search-my-domain.us-west-1.es.amazonaws.com/movies/_search?q=title:house
  • 30. Get Search Results - Command Line ● curl -XGET -u "master:XXXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true"
  • 31. Query DSL ● For more complex queries ○ OpenSearch Domain Specific Language (DSL) ● POST request with query body
  • 32. Get Search Results - Dev Tools ● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_dashboards/app/dev_tools#/console ● GET _search { "query": { "match_all": {} } }
  • 33. Search on only specific fields GET _search { "size": 20, "query": { "multi_match": { "query": "U.S.", "fields": ["title", "actor", "director"] } } }
  • 34. Search - Boosting fields GET _search { "size": 20, "query": { "multi_match": { "query": "john", "fields": ["title^4", "actor", "director^4"] } } }
  • 35. Search - Pagination GET _search { "from": 0, "size": 1, "query": { "multi_match": { "query": "Drama", "fields": ["genre"] } } }
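The "from" and "size" values above select one page of results. A minimal sketch of how a 1-based page number could be turned into that request body follows; the function name page_query is hypothetical, and the query shape simply mirrors the multi_match example on the slide.

```python
def page_query(text, fields, page, page_size):
    """Return a search body for the given 1-based page of results."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        # "from" is the zero-based offset of the first hit to return.
        "from": (page - 1) * page_size,
        "size": page_size,
        "query": {"multi_match": {"query": text, "fields": fields}},
    }


first = page_query("Drama", ["genre"], page=1, page_size=1)
third = page_query("Drama", ["genre"], page=3, page_size=20)
```

By default, from + size cannot exceed the index.max_result_window setting (10,000), so this style of paging suits shallow result sets.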
  • 36. Query -With Highlights In Response GET _search { "size": 20, "query": { "multi_match": { "query": "Manchurian", "fields": ["title^4", "actor", "director"] } }, "highlight": { "fields": { "title": {} }, "pre_tags": "<strong>", "post_tags": "</strong>", "fragment_size": 200, "boundary_chars": ".,!? " } }
  • 37. Query - Count GET movies/_count { "query": { "multi_match": { "query": "Manchurian", "fields": ["title^4", "actor", "director"] } } }
  • 38. Dashboard Query Language ● Use DQL in Dashboards ○ Search for data and visualizations ● Terms Query ○ Search for any text ■ E.g. www.example.com ○ Access object’s nested field ■ E.g. coordinates.lat:43.7102 ○ Leading and trailing wildcards ■ host.keyword:*.example.com/* ● Operators ○ AND ○ OR
  • 39. Dashboard Query Language ● Date and range Queries ○ bytes >= 15 and memory < 15 ○ @timestamp > "2020-12-14T09:35:33" ● Nested field query ○ superheroes: {hero-name: Superman}
  • 41. Query Workbench ● SQL ○ Run SQL ○ Treat indices as tables ● PPL ○ Piped Processing Language ○ Commands delimited by pipes
  • 42. Reporting ● Multiple file formats ● On demand/ Scheduled ● Generate from ○ Dashboard ○ Visualization ○ Discover
  • 43. Anomaly Detection ● Detect unusual behavior in time series data ● Anomaly Grade ● Confidence Score
  • 44. Notifications ● Supported ○ Amazon Chime ○ SNS ○ SES ○ SMTP ○ Slack ○ Custom Webhooks
  • 45. Observability plugin ● Visualize/Query time series data ● Event analytics ● Compare the data the way you like
  • 46. Index Management ● Create ISM policy ● To manage your indexes
  • 47. Security plugin ● Set up RBAC
  • 48. Migrate from ElasticSearch to OpenSearch
  • 49. Three major approaches ● Snapshot ● Rolling Upgrade ● Cluster Restart
  • 50. Snapshot Method ● Generate snapshot in ElasticSearch ● Save in shared directory ● Restore in OpenSearch ● Snapshot ○ Backup of entire cluster state ○ Useful for recovery from failure and migration ● Link
  • 51. Snapshot Method ● Check Index compatibility ○ E.g.: Can't restore a 7.6.0 snapshot into a 7.5.0 cluster ● Link ● Fastest ● Easiest ● Most efficient
  • 52. Rolling Upgrade ● Official way to migrate cluster ● Without interruption ● Rolling upgrades are supported: ○ Between minor versions ○ From 5.6 to 6.8 ○ From 6.8 to 7.14.1 ○ From any version since 7.14.0 to 7.14.1
  • 53. Rolling Upgrade ● Shut down one node at a time ○ Minimal disruption
  • 54. Cluster Restart Upgrades ● Shut down all nodes ● Perform the upgrade ● Restart the cluster
  • 56. OpenSearch Mapping ● Dynamic ○ When you index a document ○ OpenSearch adds fields automatically ○ It deduces their types by itself ● Explicit ○ If you know your data types ○ Preferred way of doing things
  • 57. OpenSearch Mapping ● If you do not define a mapping ahead of time, OpenSearch dynamically creates a mapping for you. ● If you do decide to define your own mapping, you can do so at index creation. ● ONE mapping is defined per index. Once the index has been created, we can only add new fields to a mapping. We CANNOT change the mapping of an existing field. ● If you must change the type of an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.
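To make the "new index, then reindex" step concrete, here is a minimal sketch of the two request bodies involved. The index names movies and movies_v2 and the chosen field types are assumptions for illustration; the bodies would be sent with PUT movies_v2 and POST _reindex respectively.

```python
import json

# Body for `PUT movies_v2`: an explicit mapping defined at index creation.
# Field names follow the movies example; the types are illustrative choices.
create_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed, for full-text search
            "director": {"type": "keyword"},  # exact match, sorting, aggregations
            "year": {"type": "integer"},
        }
    }
}

# Body for `POST _reindex`: copy every document from the old index to the new one.
reindex_body = {
    "source": {"index": "movies"},
    "dest": {"index": "movies_v2"},
}

create_index_json = json.dumps(create_index_body, indent=2)
reindex_json = json.dumps(reindex_body, indent=2)
```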
  • 58. Text vs keyword data types ● Text type ○ Full text searches ● Keyword type ○ Exact searches ○ Aggregations ○ Sorting
  • 59. Text vs Keyword ● Inverted Index
  • 61. OpenSearch Aggregations ● Analyze data ○ In real time too ● Extract statistics ● More expensive than queries ○ In terms of CPU and memory ○ In general
  • 62. Aggregation Query ● Use aggs or aggregations
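The slide does not show a full request, so the following is only a sketch: a terms aggregation that counts movies per genre, plus a helper that reads the buckets out of a response. It assumes dynamic mapping created a genre.keyword sub-field; the aggregation name genres, both function names, and the sample response values are made up for illustration.

```python
def genre_counts_query(top_n=10):
    """Return a body for `GET movies/_search` that only computes an aggregation."""
    return {
        "size": 0,  # skip the hits; only the aggregation result is needed
        "aggs": {
            "genres": {"terms": {"field": "genre.keyword", "size": top_n}}
        },
    }


def bucket_counts(response, agg_name):
    """Map each bucket key to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Hand-written response fragment in the shape a terms aggregation returns.
sample_response = {
    "aggregations": {
        "genres": {
            "buckets": [
                {"key": "Drama", "doc_count": 3},
                {"key": "Comedy", "doc_count": 2},
            ]
        }
    }
}
counts = bucket_counts(sample_response, "genres")
```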
  • 65. Data Streams in OpenSearch ● Ingesting time series data ○ Logs ○ Events ○ Metrics, etc. ● Number of documents grows rapidly ● Append Only data ● Don't need to update older documents (Very rarely)
  • 66. Rollover ● If data is growing rapidly ● Write to index up to a certain threshold ○ Then create a new index ○ And start writing to it ● Optimize the active index for high ingest rates on high-performance hot nodes. ● Optimize for search performance on warm nodes. ● Shift older, less frequently accessed data to less expensive cold nodes. ● Delete data according to your retention policies by removing entire indices.
  • 67. Index Template ● Data Stream requires an index template ● A name or wildcard (*) pattern for the data stream. ● The data stream’s timestamp field. This field must be mapped as a date or date_nanos field data type and must be included in every document indexed to the data stream. ● The mappings and settings applied to each backing index when it’s created.
  • 68. ILM Policy ● Index Lifecycle Management Policy ● Can be applied to any number of indices ● Usage ○ Allocate ○ Delete ○ Rollover ○ Read Only ○ Wait for snapshot
  • 69. ILM Policy ● Create a policy: ● Link
  • 73. Index Template ● Tells ElasticSearch how to configure an index when it is created ● For data streams ○ Configures the stream’s backing indices ○ Configured prior to index creation
  • 74. Templates Types ● Component Templates ○ Reusable building blocks that configure ■ mappings, ■ settings, and ■ Aliases ○ Not directly applied to indices ● Index Template ○ Collection of component templates ○ Directly applied to indices ○ Some defaults: metrics-*-*, logs-*-*
  • 76. Create Index Template ● Data Stream requires matching index template ● PUT _index_template/{template_name}
  • 78. Create data stream ● Documents must contain timestamp field ● PUT _data_stream/my-data-stream ● Stream’s name must match one of your index template’s index patterns
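As a rough sketch of the name-matching rule above, the snippet below builds an index template body for `PUT _index_template/{template_name}` and checks whether a stream name matches one of its patterns. The function names are hypothetical, the empty "data_stream" object assumes the default @timestamp field, and fnmatch is only an approximation of the server's wildcard matching.

```python
from fnmatch import fnmatchcase


def data_stream_template(patterns, priority=100):
    """Return an index template body that marks matching names as data streams."""
    return {
        "index_patterns": list(patterns),
        "data_stream": {},  # empty object: use the default timestamp field
        "priority": priority,
    }


def matches_template(stream_name, template):
    """Approximate check that a stream name matches a template pattern."""
    return any(fnmatchcase(stream_name, p) for p in template["index_patterns"])


template = data_stream_template(["my-data-stream*", "logs-*"])
ok = matches_template("my-data-stream", template)
not_ok = matches_template("metrics-app", template)
```

Running this check before `PUT _data_stream/my-data-stream` is one way to catch a stream name that no template pattern covers.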
  • 79. Get Info About Data Stream ● GET _data_stream/my-data-stream
  • 80. Delete Data Stream ● DELETE _data_stream/my-data-stream
  • 82. Cross Cluster Replication ● Cross Cluster replication plugin ○ Replicates indexes, mapping & metadata from one cluster to another ● Advantages ○ Continue to handle search requests if there is an outage ○ Can help reduce latency in application ■ Replicating data across geographically distant data centers
  • 83. Replication ● Active passive model ○ Follower index pulls data from leader index ● It can be ○ Started ○ Paused ○ Stopped ○ Resumed ● Can be secured ○ Security plugin ○ Encrypt cross cluster traffic
  • 84. Exercise ● Create 2 domains in AWS OpenSearch ● Link
  • 85. Exercise ● Source Domain Connections Tab -> Outbound -> ○ Create Connection to Destination Domain ● Set access policy on destination domain: ● Link
  • 86. Exercise ● Get Connection status ○ GET _plugins/_replication/connect1/_status ● Start syncing ○ PUT _plugins/_replication/connect1/_start { "leader_alias": "Connect1", "leader_index": "movies", "use_roles": { "leader_cluster_role": "all_access", "follower_cluster_role": "all_access" } }
  • 88. OpenSearch plugins ● Standalone components ○ That add features and capabilities ● Huge number of plugins available ● E.g. ○ Replication Plugin ○ Security plugin ○ Notification plugin
  • 89. SQL Plugin ● Lets you run SQL queries on ESDB ● Add data ○ PUT movies/_doc/1 { "title": "Spirited Away" } ● Query data ○ POST _plugins/_sql { "query": "SELECT * FROM movies LIMIT 50" }
  • 90. SQL Plugin ● Delete data from ESDB Index ● Enable Delete via SQL plugin ○ PUT _plugins/_query/settings { "transient": { "plugins.sql.delete.enabled": "true" } }
  • 91. SQL Plugin - Delete ● To Delete the data ○ POST _plugins/_sql { "query": "DELETE FROM movies" }
  • 92. Asynchronous Search ● Large volumes of data ● Can take longer to search ● Async ○ Run searches in the background ○ Monitor progress of these searches ○ Get back partial results as they become available
  • 93. Asynchronous Search ● POST _plugins/_asynchronous_search ● Response contents: ○ ID ■ Can be used to track the state of the search ■ Get partial results ○ State ■ Running ■ Completed ■ Persisted ● Link
  • 95. Clients ● OpenSearch Python client ● OpenSearch JavaScript (Node.js) client ● OpenSearch .NET clients ● OpenSearch Go client ● OpenSearch PHP client
  • 96. OpenSearch Client for .NET ● OpenSearch.Net ○ Low level client ● OpenSearch.Client ○ High level client ● Sample code: Link
  • 97. Exercise ● Create a .NET application ● Add a document to OpenSearch using the .NET Application ○ OpenSearch.Client (.NET High level client)
  • 99. Beats ● Data shippers ● Agents on servers ● Send data to ES/ Logstash
  • 100. Grafana ● An open source visualization tool ● Various sources can be used as data source: ○ InfluxDB ○ MySQL ○ ElasticSearch ○ PostgreSQL ● Better suited for metrics visualizations ● Does not allow full text data querying
  • 101. Logstash ● Free/ Open-Source ● Data processing pipeline ● Ingests data from multitude of sources ● Transforms it ● Sends it to your favorite stash
  • 102. Logstash - Ingestion ● Data of all shapes/ sizes/ source ○ Can be ingested ● It can parse/ transform your data
  • 103. Logstash - Output ● ElasticSearch ● Mongodb ● S3 ● Etc. ● Link
  • 104. AWS OpenSearch Security ● Use multi-factor authentication (MFA) with each account. ● Use SSL/TLS to communicate with AWS resources. We recommend TLS 1.2 or later. ● Set up API and user activity logging with AWS CloudTrail. ● Use AWS encryption solutions, along with all default security controls within AWS services. ● Use advanced managed security services such as Amazon Macie, which assists in discovering and securing personal data that is stored in Amazon S3. ● If you require FIPS 140-2 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint.
  • 105. Summary ● OpenSearch ○ Open Source Search solution ● Upcoming and supported by AWS ● Caters to most search use cases ○ Great Query performance ● Powerful tools ● Community Support
  • 106. Connect with me ● Trainings on various tech topics ● For any questions: ○ https://linkedin.com/in/coach4dev