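Enterprise Search Engine Solr: An Introduction
    Keep the devotion of the day you took your vows, and Buddhahood is well within reach
     http://www.yeeach.com

       December 2008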
Contents

Lucene Basics
Solr Basics

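About Lucene
 A Java-based full-text information retrieval toolkit
 Lucene is not a complete full-text search engine. It is a full-text indexing
 toolkit written in Java that can easily be embedded in an application to give
 it full-text indexing and retrieval.
 Lucene history
  Created by Doug Cutting in 1999
  Donated to Apache in 2001
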
Powered by Lucene

   Technorati
   Wikipedia
   Internet Archive
   LinkedIn
   monster.com
   Eclipse help search
   Jira
   SouFun
   IBM Omnifind Y! Edition
   …


http://wiki.apache.org/lucene-java/PoweredBy
Lucene Sub-projects
 Nutch
   Web crawler with document parsing

 Hadoop
   Distributed data processor
   Implements MapReduce

 Solr
Lucene Search Mechanism Diagram
 [Figure: how Lucene indexing and search work]

Lucene Architecture: High Level
Main Packages
 org.apache.lucene.index
 org.apache.lucene.search
 org.apache.lucene.analysis
 org.apache.lucene.document
 org.apache.lucene.queryParser
 org.apache.lucene.store
 org.apache.lucene.util
Lucene Architecture: Relation
 [Diagram relating the components: DOC, analyzer, document, index, INVERTED FILE, QUERY, query parser, search]

Lucene Data Structures Compared with a Database
 [Figure: analogy between Lucene data structures and a database]

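Lucene Query Parser: Terms

 A query is broken up into terms and operators. There are two types of
 terms: single terms and phrases.
 A single term is a single word, such as "test" or "hello".
 A phrase is a group of words surrounded by double quotes, such as "hello dolly".
 Multiple terms can be combined with Boolean operators to form a more complex query.
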
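Lucene Query Parser: Fields

 Lucene supports fields. You can search a specific field or rely on the
 default field. The field names and the default field are determined by the
 indexer implementation.
 Usage: field name + ":" + the term to search for.

 Example:
 Suppose a Lucene index contains two fields, title and text, and text is the default field. To
 find the document titled "The Right Way" that contains the text "don't go this way", enter:
 title:"The Right Way" AND text:go
    or, relying on the default field,
 title:"The Right Way" AND go
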
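Lucene Query Parser: Term Modifiers

 Wildcard searches
   "?" matches any single character; "*" matches multiple characters.

 Fuzzy searches
   Lucene supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm.
   To run a fuzzy search, append "~" to a single term.
   For example, roam~ searches for terms spelled similarly to "roam" and matches words
   such as foam and roams.

 Proximity searches
   Lucene supports finding words that are within a given distance of each other. For a
   proximity search, append "~" to a phrase.
   For example, to find "apache" and "jakarta" within 10 words of each other in a document:
   "jakarta apache"~10
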
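The edit distance behind fuzzy search can be illustrated with a short Python sketch. This is a textbook dynamic-programming Levenshtein distance, not Lucene's own implementation, and the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Both words from the slide are one edit away from "roam".
print(levenshtein("roam", "foam"), levenshtein("roam", "roams"))
```

A query such as roam~ can be thought of as accepting terms whose distance to "roam" is small relative to the term length; the exact threshold Lucene applies is not modeled here.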
Lucene Query Parser: Boolean Operators (1)

 Boolean operators combine terms through logical operations. Lucene supports
 AND, "+", OR, NOT and "-" as operators.
 OR
  OR is the default conjunction operator: if there is no Boolean operator between two terms,
  OR is used.
  For example, to find documents that contain either "jakarta apache" or just "jakarta", use the query:
      "jakarta apache" jakarta
  or
      "jakarta apache" OR jakarta

 AND
  AND matches documents in which both terms appear, which is equivalent to a set intersection.
  The symbol && can be used in place of AND.
  For example, to find documents that contain both "jakarta apache" and "jakarta lucene", use the query:
      "jakarta apache" AND "jakarta lucene"

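Lucene Query Parser: Boolean Operators (2)

 +
        The "+" (required) operator requires that the term after the "+" exist in the corresponding field of a document.
        For example, to find documents that must contain "jakarta" and may contain "lucene", use the query:
        +jakarta lucene

 NOT
        NOT excludes documents that contain the term after NOT, which is equivalent to a set difference.
        The symbol ! can be used in place of NOT.
        For example, to find documents that contain "jakarta apache" but not "jakarta lucene", use the query:
        "jakarta apache" NOT "jakarta lucene"

 -
        The "-" (prohibit) operator excludes documents that contain the term after the "-".
        For example, to find documents that contain "jakarta apache" but not "jakarta lucene", use the query:
     "jakarta apache" -"jakarta lucene"
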
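The set analogy used for AND (intersection) and NOT (difference) can be made concrete with a toy inverted index. This is only a sketch: the index contents and the helper `docs` are hypothetical, and real Lucene also scores and ranks the matches, which is ignored here.

```python
# Toy inverted index: term -> set of ids of the documents containing it.
# The data is made up for illustration.
index = {
    "jakarta": {1, 2, 3},
    "apache": {1, 3, 4},
    "lucene": {2, 3},
}


def docs(term):
    """Return the set of document ids that contain the term."""
    return index.get(term, set())


both = docs("jakarta") & docs("apache")     # jakarta AND apache
either = docs("jakarta") | docs("apache")   # jakarta OR apache (the default)
without = docs("jakarta") - docs("lucene")  # jakarta NOT lucene

print(both, either, without)
```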
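Lucene Query Parser: Grouping

 Lucene supports using parentheses to group clauses into sub-queries, which is
 useful for controlling the Boolean logic of a query.
 For example, to find documents that contain either "jakarta" or "apache" and also
 contain "website", use the query:
 (jakarta OR apache) AND website
 This removes any ambiguity: website must be present, and at least one of jakarta and apache must be present too.
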
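Lucene Query Parser: Escaping Special Characters

 Lucene supports escaping special characters. The special characters are:
  + - && || ! ( ) { } [ ] ^ " ~ * ? : \

 To escape a special character, put a backslash (\) before it. For example, to search for (1+1):2
 use the query:
   \(1\+1\)\:2
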
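Escaping is easy to get wrong by hand, so client code usually does it in one place. The sketch below is one possible helper, assuming the special-character list from the slide; the name `escape_query` is hypothetical, and && and || are handled by escaping each & and | individually.

```python
import re

# Every character that appears in the slide's list of special characters.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\])')


def escape_query(text: str) -> str:
    """Prefix each Lucene special character in text with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_query("(1+1):2"))
```

Only user-supplied terms should be passed through such a helper; escaping a whole query would also neutralize the operators that are meant to be operators.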
Contents

Lucene Basics
Solr Basics

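What Is Solr
 Solr is a Java search server built on Lucene. It provides faceted search and
 hit highlighting, and supports multiple output formats (including XML/XSLT and
 JSON). It is easy to install and configure, and it ships with an HTTP-based
 administration interface.
 Solr history
   Developed by Yonik Seeley at CNET
   Donated to Apache in 2006
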
Solr Features
 Features
    Servlet
    Web Administration Interface
    XML/HTTP, JSON Interfaces
    Faceting
    Schema to define types and fields
    Highlighting
    Caching
    Index Replication (Master / Slaves)
    Pluggable
    Java 5
Powered by Solr
  Netflix
  CNET
  AOL: sports and music
  Shopper.com
  Drupal module
  GameSpot
  Reddit
  Instructables
  http://news.com
  …
http://wiki.apache.org/solr/PublicServers
Solr Architecture
 [Diagram, top to bottom]
 HTTP Request Servlet: Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, XML Response Writer
 Update Servlet: XML Update Interface
 Solr Core: Config, Schema, Analysis, Caching, Concurrency, Update Handler
 Replication
 Lucene

Solr Installation and Configuration: Preparing the Environment

 Because Solr is written in Java, it deploys well on both Windows and Linux.
 However, Solr ships with shell scripts that make testing, administration and
 maintenance easier, so Linux is recommended for production deployments.
 The steps below install and configure Solr on Linux; Windows is similar.
   wget http://apache.mirror.phpchina.com/tomcat/tomcat-6/v6.0.16/bin/apache-tomcat-6.0.16.zip
   unzip apache-tomcat-6.0.16.zip
   mv apache-tomcat-6.0.16 /opt/tomcat
   chmod 755 /opt/tomcat/bin/*
   wget http://apache.mirror.phpchina.com/lucene/solr/1.2/apache-solr-1.2.0.tgz
   tar xzf apache-solr-1.2.0.tgz

 Installation methods
   The hardest part of installing and configuring Solr is understanding and setting solr.solr.home
   Three installation methods: based on the current working directory, based on the solr.solr.home JVM system property, or based on JNDI configuration

Solr Installation 1: Based on the Current Working Directory

 cp apache-solr-1.2.0/dist/apache-solr-1.2.0.war /opt/tomcat/webapps/solr.war
 mkdir /opt/solr-tomcat
 cp -r apache-solr-1.2.0/example/solr/ /opt/solr-tomcat/
 cd /opt/solr-tomcat
 /opt/tomcat/bin/startup.sh
 Note:
 In this case (neither the solr.solr.home system property nor JNDI is set), Solr looks for
 ./solr, so you must change to /opt/solr-tomcat before starting Tomcat.

Solr Installation 2: Based on a JVM System Property

   Add solr.solr.home to the current user's environment (.bash_profile):
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"
. ~/.bash_profile

   Alternatively, add the following line to /opt/tomcat/bin/catalina.sh:
   export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"

Solr Installation 3: Based on JNDI Configuration

   mkdir -p /opt/tomcat/conf/Catalina/localhost
   touch /opt/tomcat/conf/Catalina/localhost/solr.xml, then edit the file so that it contains:
<Context docBase="/opt/tomcat/webapps/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/opt/solr-tomcat/solr"
   override="true" />
</Context>

Testing Solr 1: Posting Index Data
 Test Solr with the shell script (curl):
 cd apache-solr-1.2.0/example/exampledocs
 vi post.sh        set the URL variable to match Tomcat's IP address and port

 ./post.sh *.xml

 Test Solr with the bundled Java tool:
 Show help: java -jar post.jar -help
 Post the test data:
 java -Durl=http://localhost:8080/solr/update -Ddata=files -jar post.jar *.xml

Testing Solr 2: Querying the Index
 Query test
 Query through the Solr admin interface at http://localhost:8080/solr/admin

 Query with curl:
 export URL="http://localhost:8080/solr/select/"
 curl "$URL?indent=on&q=liangchuan&fl=*,score"

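The same query URL can be assembled programmatically, which takes care of encoding spaces and other characters in the query. A minimal sketch using only the Python standard library; the function name `build_select_url` and its defaults are illustrative, and the host and port are the ones used on the slide.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields="*,score", indent="on"):
    """Build a /select URL like the curl example, URL-encoding the values."""
    params = [("indent", indent), ("q", query), ("fl", fields)]
    # Keep '*' and ',' readable, as in the hand-written URL.
    return base_url + "?" + urlencode(params, safe="*,")


url = build_select_url("http://localhost:8080/solr/select/", "liangchuan")
print(url)
```

The resulting string can be passed to curl or to `urllib.request.urlopen` once a Solr instance is actually listening at that address.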
Solr Admin Interface
 [Screenshot of the Solr admin interface]

Solr Admin Interface Features
Show Config, Schema, Distribution info
Query Interface
Statistics
  Caches: lookups, hits, hitratio, inserts, evictions, size
  RequestHandlers: requests, errors
  UpdateHandler: adds, deletes, commits, optimizes
  IndexReader, open-time, index-version, numDocs, maxDocs
Analysis Debugger
  Shows tokens after each Analyzer stage
  Shows token matches for query vs index
Configuration (solrconfig.xml)
<mainIndex>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
</mainIndex>


<requestHandler name="standard" class="solr.StandardRequestHandler" />
<requestHandler name="custom" class="your.package.CustomRequestHandler" />


<autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>1000</maxTime>
</autoCommit>


<queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter"
    default="true"/>
Schema
 Lucene has no notion of a schema
   Sorting - string vs. numeric
   Ranges - val:42 included in val:[1 TO 5] ?
   Lucene QueryParser has date-range support, but must
   guess.

 Defines fields, their types, properties
 Defines unique key field, default search field,
 Similarity implementation

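The sorting and range bullets both come down to comparing values as strings instead of numbers when no type information is available. A small Python illustration of the difference (plain Python comparisons, not Lucene code):

```python
values = ["10", "9", "42", "5", "1"]

# Without a schema every value is just a string, so order is lexicographic.
as_strings = sorted(values)
# With a numeric type the values sort by magnitude.
as_numbers = sorted(values, key=int)

# The range question from the slide: is 42 inside [1 TO 5]?
in_range_as_string = "1" <= "42" <= "5"  # lexicographic comparison
in_range_as_number = 1 <= 42 <= 5        # numeric comparison

print(as_strings, as_numbers, in_range_as_string, in_range_as_number)
```

Compared as strings, 42 falls inside the range; compared as numbers it does not, which is the kind of ambiguity a typed schema removes.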
Schema (schema.xml)
Fields
<uniqueKey>id</uniqueKey>


<field name="products" type="text" indexed="true" stored="true"/>
<field name="keywords" type="text_ws" indexed="true" stored="true"/>
<field name="keywordsSorted" type="text_sorted" indexed="true" stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>


<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
<dynamicField name="desc_*" type="string" indexed="true" stored="false"/>


<copyField source="keywords" dest="keywordsSorted"/>
Field Definitions
   Field Attributes: name, type, indexed, stored,
   multiValued, omitNorms
<field name="id" type="string" indexed="true" stored="true"/>
<field name="sku" type="textTight" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="reviews" type="text" indexed="true" stored="false"/>
<field name="category" type="text_ws" indexed="true" stored="true"
    multiValued="true"/>


   Dynamic Fields, in the spirit of Lucene!
<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

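How a dynamic field pattern picks a type for a field that was never declared explicitly can be sketched as a simple glob match. This is a simplified model: the helper `resolve_type` is hypothetical, explicit fields are checked first, and the first matching pattern wins, which may differ from the precedence rules Solr really applies.

```python
from fnmatch import fnmatchcase

# Dynamic field patterns and types taken from the slide.
DYNAMIC_FIELDS = [("*_i", "sint"), ("*_s", "string"), ("*_t", "text")]


def resolve_type(field_name, explicit_fields=None):
    """Return the type for field_name, or None if nothing matches."""
    explicit_fields = explicit_fields or {}
    if field_name in explicit_fields:
        return explicit_fields[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


print(resolve_type("price_i"), resolve_type("sku", {"sku": "textTight"}))
```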
Schema-Analyzers
<fieldtype name="nametext" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
</fieldtype>

<fieldtype name="text" class="solr.TextField">
    <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StandardFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory"/>
            <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldtype>

<fieldtype name="myfieldtype" class="solr.TextField">
    <analyzer>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SnowballPorterFilterFactory" language="German" />
    </analyzer>
</fieldtype>
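Solr Indexing Interface Overview
 Solr exposes a standard HTTP interface for adding, deleting, updating and querying
 indexed data. Users start indexing and searching by sending HTTP requests to the
 Solr web application deployed in a servlet container. Solr accepts the request,
 determines the appropriate SolrRequestHandler, and processes the request. The response
 is returned over HTTP in the same way. The default configuration returns Solr's
 standard XML response; Solr can also be configured to use alternative response formats.
 Four different index requests can be sent to the Solr index servlet:
   add/update adds documents to Solr or updates existing ones. Additions and updates are not searchable until they are committed.
   commit tells Solr to make all changes since the last commit searchable.
   optimize restructures Lucene's files to improve search performance. It is usually a good idea to optimize once indexing
   is complete. If updates are frequent, schedule optimization for periods of low usage. An index runs fine without
   being optimized. Optimization is a time-consuming process.
   delete can be specified by id or by query. Delete by id removes the document with the given id; delete by query
   removes all documents returned by the query.
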
Solr: Add Operation
    HTTP POST to http://localhost:8080/solr/update/
<add>
    <doc>
             <field name="employeeId">05991</field>
             <field name="office">Bridgewater</field>
             <field name="skills">Perl</field>
             <field name="skills">Java</field>
    </doc>
    [<doc> ... </doc>[<doc> ... </doc>]]
</add>

Documents or fields can have boosts attached
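Building the add message with an XML library rather than string concatenation ensures that field values are escaped correctly. A minimal sketch with Python's standard library; the helper name `build_add_message` is hypothetical, and actually POSTing the result to the update URL is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(documents):
    """Build an <add> message; each document is a list of (name, value) pairs."""
    add = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document:
            # Repeating a name (e.g. "skills") yields a multi-valued field.
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = value
    return ET.tostring(add, encoding="unicode")


message = build_add_message([[
    ("employeeId", "05991"),
    ("office", "Bridgewater"),
    ("skills", "Perl"),
    ("skills", "Java"),
]])
print(message)
```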
Solr: Update / Delete Operations
 Inserting a document with already present
 uniqueKey will erase the original
 Deleting
   By uniqueKey field
   <delete><id>05991</id></delete>

   By query
   <delete><query>name:Anthony</query></delete>

 <commit/>
 <optimize/>
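The delete messages are small enough to write by hand, but generating them keeps ids and queries properly escaped. A sketch using the Python standard library; the function names are illustrative, and sending the messages to the update URL is not shown.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Return a delete message for the document with the given uniqueKey."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Return a delete message for every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


# Deletes, like adds, only become visible to searches after a commit.
messages = [delete_by_id("05991"), delete_by_query("name:Anthony"), "<commit/>"]
print(messages)
```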
Solr: Search Operation
 Core parameters
    • qt – query type (request handler)
    • wt – writer type (response writer)
 Common parameters
    • q
    • sort
    • start
    • rows
    • fq – filters
    • fl – return fields
Default Parameters
Query Arguments for HTTP GET/POST to /select

param     default  description
q                  The query
start     0        Offset into the list of matches
rows      10       Number of documents to return
fl        *        Stored fields to return
qt        standard Query type; maps to query
                   handler
df        (schema) Default field to search

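The effect of these defaults can be sketched as a merge of the request's parameters over a table of default values. This is only an illustration of the table above: `effective_params` is a hypothetical helper, and df is left out because its default comes from the schema.

```python
# Defaults from the table; q has no default and df is defined by the schema.
DEFAULTS = {"start": "0", "rows": "10", "fl": "*", "qt": "standard"}


def effective_params(request_params):
    """Return the parameters a /select request runs with after defaults apply."""
    if "q" not in request_params:
        raise ValueError("q (the query) has no default and must be supplied")
    merged = dict(DEFAULTS)
    merged.update(request_params)  # explicit values override the defaults
    return merged


print(effective_params({"q": "ipod", "rows": "0"}))
```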
Default Query Syntax
Lucene Query Syntax [; sort specification]
1.   mission impossible; releaseDate desc
2.   +mission +impossible -actor:cruise
3.   "mission impossible" -actor:cruise
4.   title:spiderman^10 description:spiderman
5.   description:"spiderman movie"~10
6.   +HDTV +weight:[0 TO 100]
7.   Wildcard queries: te?t, te*t, test*

Solr: Search Example
http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.field=cat&facet.mincount=1&facet.field=inStock

<response>
    <responseHeader>
            <status>0</status>
            <QTime>3</QTime>
    </responseHeader>
    <result numFound="4" start="0"/>
    <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
            <lst name="cat">
                        <int name="music">1</int>
                        <int name="connector">2</int>
                        <int name="electronics">3</int>
            </lst>
            <lst name="inStock">
                        <int name="false">3</int>
                        <int name="true">1</int>
            </lst>
    </lst>
    </lst>
</response>
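A client typically turns the facet section of such a response into a nested mapping. The sketch below parses a copy of the response shown above with Python's standard library; `facet_counts` is a hypothetical helper, and fetching the XML over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# The example response from the slide, without the response header.
RESPONSE = """
<response>
  <result numFound="4" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="music">1</int>
        <int name="connector">2</int>
        <int name="electronics">3</int>
      </lst>
      <lst name="inStock">
        <int name="false">3</int>
        <int name="true">1</int>
      </lst>
    </lst>
  </lst>
</response>
"""


def facet_counts(xml_text):
    """Return {field: {value: count}} from a Solr XML response."""
    root = ET.fromstring(xml_text)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    if fields is None:
        return {}
    return {
        field.get("name"): {
            item.get("name"): int(item.text) for item in field.findall("int")
        }
        for field in fields.findall("lst")
    }


print(facet_counts(RESPONSE))
```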
Solr-Search Faceting
 Faceting
   Available in StandardRequestHandler and
   DisMaxRequestHandler
Faceted Browsing
 [Diagram] A search for "computer" with the filters computer_type:PC and memory:[1GB TO *], sorted by price asc,
 runs as Search(Query, Filter[], Sort, offset, n) and produces:
   DocList: section of ordered results, returned in the Query Response
   DocSet: unordered set of all results
 The size of the intersection of the DocSet with each facet gives the facet counts:
   proc_manu:Intel       = 594
   proc_manu:AMD         = 382
   price:[0 TO 500]      = 247
   price:[500 TO 1000]   = 689
   manu:Dell             = 104
   manu:HP               = 92
   manu:Lenovo           = 75

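The "intersection size" step can be sketched with plain Python sets: each facet count is the number of documents shared by the result set and the set of documents matching that facet. The document ids below are made up, and `count_facets` is a hypothetical helper rather than Solr's DocSet API.

```python
def count_facets(result_docset, facet_docsets):
    """Return {facet: size of (results intersected with the facet's documents)}."""
    return {
        facet: len(result_docset & docs)
        for facet, docs in facet_docsets.items()
    }


# Hypothetical document ids: the unordered set of all results for the query,
# and the sets of documents matching each facet constraint.
results = {1, 2, 3, 4, 5}
facets = {
    "manu:Dell": {1, 2, 9},
    "manu:HP": {3},
    "manu:Lenovo": {7, 8},
}

counts = count_facets(results, facets)
print(counts)
```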
Faceted Browsing
Example
Caching
IndexSearcher’s view of an index is fixed
      Aggressive caching possible
      Consistency for multi-query requests
filterCache – unordered set of document ids matching a query
resultCache – ordered subset of document ids matching a query
documentCache – the stored fields of documents
userCaches – application specific, custom query handlers




Warming for Speed
 Lucene IndexReader warming
   field norms, FieldCache, tii – the term index

 Static Cache warming
   Configurable static requests to warm new Searchers

 Smart Cache Warming (autowarming)
   Using MRU items in the current cache to pre-populate the
   new cache

 Warming in parallel with live requests

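Autowarming can be sketched as follows: take the most recently used keys of the old cache and recompute their values for the new cache. This is a simplified model under stated assumptions: the `LRUCache` class, `autowarm` function and `regenerate` callback are all hypothetical, and an LRU eviction policy is assumed for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """A small least-recently-used cache (assumed policy, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used entry comes first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

    def mru_keys(self, n):
        """Return up to n keys, most recently used first."""
        return list(reversed(self.items))[:n]


def autowarm(old_cache, new_cache, regenerate, n):
    """Pre-populate new_cache with the n MRU keys of old_cache.

    Values are recomputed with regenerate(key) because entries cached
    against the old index view may no longer be valid for the new one.
    """
    # Insert the least recent of the n keys first so recency order is kept.
    for key in reversed(old_cache.mru_keys(n)):
        new_cache.put(key, regenerate(key))


old = LRUCache(capacity=3)
for key in ("a", "b", "c"):
    old.put(key, key.upper())
old.get("a")  # "a" becomes the most recently used key

new = LRUCache(capacity=3)
autowarm(old, new, regenerate=str.upper, n=2)
print(new.mru_keys(2))
```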
Smart Cache Warming
 [Diagram] An On-Deck Solr IndexSearcher is warmed while the Registered Solr IndexSearcher serves Live Requests.
 Both searchers hold a User Cache, Filter Cache, Result Cache and Doc Cache; the Registered searcher also shows the Field Cache and Field Norms.
   1  Live Requests from the Request Handler go to the Registered IndexSearcher
   2  Warming Requests from the Request Handler go to the On-Deck IndexSearcher
   3  Regenerators autowarm the On-Deck User, Filter and Result caches from the Registered ones
 Autowarming – warm n MRU cache keys w/ new Searcher

Search Relevancy
 [Diagram] Document Analysis and Query Analysis side by side

 Stage                  Document Analysis                     Query Analysis
 Input                  PowerShot SD 500                      power-shot sd500
 WhitespaceTokenizer    PowerShot | SD | 500                  power-shot | sd500
 WordDelimiterFilter    Power | Shot | SD | 500, PowerShot    power | shot | sd | 500
                        (catenateWords=1)                     (catenateWords=0)
 LowercaseFilter        power | shot | sd | 500, powershot    power | shot | sd | 500

 A Match!

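The two analysis chains can be imitated in a few lines of Python to see why the document and the query end up matching. This is a rough imitation, not Solr's filters: the splitting rule only approximates WordDelimiterFilter, catenation joins all alphabetic parts of a token, token positions are ignored, and every function name here is hypothetical.

```python
import re

# Word parts: a capitalized or lowercase word, an all-caps run, or a digit run.
# Hyphens and other non-alphanumeric characters act as delimiters.
_PARTS = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def whitespace_tokenize(text):
    return text.split()


def word_delimiter(tokens, catenate_words):
    out = []
    for token in tokens:
        parts = _PARTS.findall(token)
        out.extend(parts)
        words = [part for part in parts if part.isalpha()]
        if catenate_words and len(words) > 1:
            out.append("".join(words))  # e.g. Power + Shot -> PowerShot
    return out


def analyze(text, catenate_words):
    tokens = word_delimiter(whitespace_tokenize(text), catenate_words)
    return [token.lower() for token in tokens]


index_terms = analyze("PowerShot SD 500", catenate_words=True)
query_terms = analyze("power-shot sd500", catenate_words=False)
# Every query term was also produced for the document, so the query matches.
print(index_terms, query_terms, set(query_terms) <= set(index_terms))
```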
Configuring Relevancy
<fieldtype name="text" class="solr.TextField">
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory"
         synonyms="synonyms.txt"/>
  <filter class="solr.StopFilterFactory"
         words="stopwords.txt"/>
  <filter class="solr.EnglishPorterFilterFactory"
         protected="protwords.txt"/>
</analyzer>
</fieldtype>

copyField
   Copies one field to another at index time
   Usecase: Analyze same field different ways
     copy into a field with a different analyzer
     boost exact-case, exact-punctuation matches
     language translations, thesaurus, soundex
<field name="title" type="text"/>
<field name="title_exact" type="text_exact" stored="false"/>
<copyField source="title" dest="title_exact"/>


   Usecase: Index multiple fields into single searchable
   field

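High Availability
 [Diagram]
 Appservers (Dynamic HTML Generation) send HTTP search requests through a Load Balancer to the Solr Searchers
 The Solr Master receives updates from an Updater that reads from the DB, and updates and admin queries from an admin terminal
 Index Replication copies the index from the Solr Master to the Solr Searchers
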
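Replication
 [Diagram] Lucene index segments move from the Master's solr/data/index to the Searcher's solr/data/index
   1. hard links: on the Master, solr/data/index is snapshotted to solr/data/snapshot-2006062950000
   2. hard links: on the Searcher, solr/data/index is linked into solr/data/snapshot-2006062950000-WIP
   3. rsync: the new segment is copied from the Master snapshot into the Searcher's -WIP directory
   4. mv dir: after rsync the -WIP directory is renamed; after mv the new segment is available on the Searcher
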
Resources
 WWW
   http://wiki.apache.org/solr/
   http://www.ibm.com/developerworks/cn/java/j-solr1/
   http://www.ibm.com/developerworks/cn/java/j-solr2/
   http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=1
   http://lucene.apache.org/java/docs/queryparsersyntax.html
   http://www.blogjava.net/RongHao/archive/2007/11/06/158621.html

 Mailing Lists
   solr-user-subscribe@lucene.apache.org
   solr-dev-subscribe@lucene.apache.org


Thank you!

More Related Content

What's hot

Ruby on Rails 2.1 What's New Chinese Version
Ruby on Rails 2.1 What's New Chinese VersionRuby on Rails 2.1 What's New Chinese Version
Ruby on Rails 2.1 What's New Chinese VersionLibin Pan
 
تركيب سيرفر ومجلة ومنتدي
تركيب سيرفر ومجلة ومنتديتركيب سيرفر ومجلة ومنتدي
تركيب سيرفر ومجلة ومنتديnansyrigan
 
مقدمة عن أندرويد
مقدمة عن أندرويدمقدمة عن أندرويد
مقدمة عن أندرويدahmed_hassan
 
Archydro
ArchydroArchydro
Archydroabkhiz
 
قیمت ماسک فلت فولد
قیمت ماسک فلت فولدقیمت ماسک فلت فولد
قیمت ماسک فلت فولدmohammaddoge
 
IPV9人类共同的理想/IPv9 - The common ideal for human being
IPV9人类共同的理想/IPv9 - The common ideal for human beingIPV9人类共同的理想/IPv9 - The common ideal for human being
IPV9人类共同的理想/IPv9 - The common ideal for human beingshizhao
 
گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷
گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷
گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷Bazaar Insight
 
لماذا يحرم الإسلام الخنزير
لماذا يحرم الإسلام الخنزيرلماذا يحرم الإسلام الخنزير
لماذا يحرم الإسلام الخنزيرAbdullah Baspren
 
CEO-018-領導的基本概念
CEO-018-領導的基本概念CEO-018-領導的基本概念
CEO-018-領導的基本概念handbook
 
ブラウザでMap Reduce風味の並列分散処理
ブラウザでMap Reduce風味の並列分散処理ブラウザでMap Reduce風味の並列分散処理
ブラウザでMap Reduce風味の並列分散処理Shinya Miyazaki
 
Text Mining and SEASR
Text Mining and SEASRText Mining and SEASR
Text Mining and SEASRLoretta Auvil
 
Building a New Substation from the Ground Up- Start off on the right foot wit...
Building a New Substation from the Ground Up- Start off on the right foot wit...Building a New Substation from the Ground Up- Start off on the right foot wit...
Building a New Substation from the Ground Up- Start off on the right foot wit...Roozbeh Molavi
 
QM-076-六標準差管理方法的解題邏輯與策略
QM-076-六標準差管理方法的解題邏輯與策略QM-076-六標準差管理方法的解題邏輯與策略
QM-076-六標準差管理方法的解題邏輯與策略handbook
 
『Ficia』インフラとPerlにまつわるエトセトラ
『Ficia』インフラとPerlにまつわるエトセトラ『Ficia』インフラとPerlにまつわるエトセトラ
『Ficia』インフラとPerlにまつわるエトセトラMasaaki HIROSE
 
Republic 3 4
Republic 3 4Republic 3 4
Republic 3 4huquanwei
 
秩序从哪里来?
秩序从哪里来?秩序从哪里来?
秩序从哪里来?guest8430ea2
 
الاليات القانونية لحماية البيئة
الاليات القانونية لحماية البيئةالاليات القانونية لحماية البيئة
الاليات القانونية لحماية البيئةباجي مختار
 

What's hot (19)

Ruby on Rails 2.1 What's New Chinese Version
Ruby on Rails 2.1 What's New Chinese VersionRuby on Rails 2.1 What's New Chinese Version
Ruby on Rails 2.1 What's New Chinese Version
 
تركيب سيرفر ومجلة ومنتدي
تركيب سيرفر ومجلة ومنتديتركيب سيرفر ومجلة ومنتدي
تركيب سيرفر ومجلة ومنتدي
 
It Flyer Page08
It Flyer Page08It Flyer Page08
It Flyer Page08
 
مقدمة عن أندرويد
مقدمة عن أندرويدمقدمة عن أندرويد
مقدمة عن أندرويد
 
Archydro
ArchydroArchydro
Archydro
 
قیمت ماسک فلت فولد
قیمت ماسک فلت فولدقیمت ماسک فلت فولد
قیمت ماسک فلت فولد
 
20070329 Phpconf2007 Training
20070329 Phpconf2007 Training20070329 Phpconf2007 Training
20070329 Phpconf2007 Training
 
IPV9人类共同的理想/IPv9 - The common ideal for human being
IPV9人类共同的理想/IPv9 - The common ideal for human beingIPV9人类共同的理想/IPv9 - The common ideal for human being
IPV9人类共同的理想/IPv9 - The common ideal for human being
 
گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷
گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷
گزارش توسعه‌دهندگان بازار در فصل پاییز ۱۳۹۷
 
لماذا يحرم الإسلام الخنزير
لماذا يحرم الإسلام الخنزيرلماذا يحرم الإسلام الخنزير
لماذا يحرم الإسلام الخنزير
 
CEO-018-領導的基本概念
CEO-018-領導的基本概念CEO-018-領導的基本概念
CEO-018-領導的基本概念
 
ブラウザでMap Reduce風味の並列分散処理
ブラウザでMap Reduce風味の並列分散処理ブラウザでMap Reduce風味の並列分散処理
ブラウザでMap Reduce風味の並列分散処理
 
Text Mining and SEASR
Text Mining and SEASRText Mining and SEASR
Text Mining and SEASR
 
Building a New Substation from the Ground Up- Start off on the right foot wit...
Building a New Substation from the Ground Up- Start off on the right foot wit...Building a New Substation from the Ground Up- Start off on the right foot wit...
Building a New Substation from the Ground Up- Start off on the right foot wit...
 
QM-076-六標準差管理方法的解題邏輯與策略
QM-076-六標準差管理方法的解題邏輯與策略QM-076-六標準差管理方法的解題邏輯與策略
QM-076-六標準差管理方法的解題邏輯與策略
 
『Ficia』インフラとPerlにまつわるエトセトラ
『Ficia』インフラとPerlにまつわるエトセトラ『Ficia』インフラとPerlにまつわるエトセトラ
『Ficia』インフラとPerlにまつわるエトセトラ
 
Republic 3 4
Republic 3 4Republic 3 4
Republic 3 4
 
秩序从哪里来?
秩序从哪里来?秩序从哪里来?
秩序从哪里来?
 
الاليات القانونية لحماية البيئة
الاليات القانونية لحماية البيئةالاليات القانونية لحماية البيئة
الاليات القانونية لحماية البيئة
 

Enterprise Search Engine Solr: Discussion

  • 1. Enterprise Search Engine Solr: An Introduction
      出家如初,成佛有余, http://www.yeeach.com
      December 2008
  • 2. Contents
      Introduction to Lucene
      Introduction to Solr
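  • 3. Introduction to Lucene
      A Java-based toolkit for full-text information retrieval.
      Lucene is not a complete full-text search engine. It is a full-text indexing engine library written in Java that can be embedded easily in applications to give them full-text indexing and search.
      Lucene history:
        Created by Doug Cutting in 1999
        Donated to Apache in 2001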
  • 4. Powered by Lucene
      Technorati
      Wikipedia
      Internet Archive
      LinkedIn
      monster.com
      Eclipse help search
      Jira
      SouFun
      IBM OmniFind Y! Edition
      …
      http://wiki.apache.org/lucene-java/PoweredBy
  • 5. Lucene Sub-projects
      Nutch
        Web crawler with document parsing
      Hadoop
        Distributed data processor
        Implements MapReduce
      Solr
  • 8. Main Packages
      org.apache.lucene.index
      org.apache.lucene.search
      org.apache.lucene.analysis
      org.apache.lucene.document
      org.apache.lucene.queryParser
      org.apache.lucene.store
      org.apache.lucene.util
  • 9. Lucene Architecture: Relation
      [Diagram; components shown: DOC, analyzer, document, index, inverted file, QUERY, query parser, search]
  • 10. Lucene Data Structures: An Analogy with Databases
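  • 11. Lucene Query Parser: Terms
      A query is broken up into terms and operators. There are two types of terms: single terms and phrases.
      A single term is a single word, such as "test" or "hello".
      A phrase is a group of words surrounded by double quotes, such as "hello dolly".
      Multiple terms can be combined with Boolean operators to form a more complex query.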
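  • 12. Lucene Query Parser: Fields
      Lucene supports fielded data. A search can name a specific field or fall back to the default field. The field names and the default field depend on how the index was built.
      Syntax: the field name, then a colon ":", then the term to search for.
      Example: suppose a Lucene index has two fields, title and text, with text as the default field. To find the document titled "The Right Way" that contains the text "don't go this way", enter:
        title:"The Right Way" AND text:go
      or
        title:"Do it right" AND right
      In the second form the unqualified term right is searched in the default field, text.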
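  • 13. Lucene Query Parser: Term Modifiers
      Wildcard searches
        "?" matches any single character; "*" matches zero or more characters.
      Fuzzy searches
        Lucene supports fuzzy searches based on Levenshtein distance (also called edit distance). To run a fuzzy search, append "~" to a single term.
        For example, roam~ finds terms spelled similarly to "roam" and matches words such as foam and roams.
      Proximity searches
        Lucene can find words that occur within a given distance of each other. To run a proximity search, append "~" and a number to a phrase.
        For example, to find "apache" and "jakarta" within 10 words of each other in a document:
          "jakarta apache"~10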
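The fuzzy operator on this slide is defined in terms of edit distance. As a rough illustration, here is a minimal Python sketch of Levenshtein distance. The function name `levenshtein` is our own, and this is not Lucene's implementation, which also applies a similarity threshold when deciding what matches.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed
    # so far and b[:j]; it starts as the distance from "" to b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


for candidate in ("foam", "roams"):
    print(candidate, levenshtein("roam", candidate))
```

Both foam and roams are one edit away from roam, which is why the slide lists them as matches for roam~.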
  • 14. Lucene Query Parser: Boolean Operators (1)
      Boolean operators combine terms with logic. Lucene supports AND, "+", OR, NOT, and "-".
      OR
        OR is the default conjunction operator: if two terms have no Boolean operator between them, OR is used.
        For example, to find documents that contain either "jakarta apache" or just "jakarta", use:
          "jakarta apache" jakarta
        or
          "jakarta apache" OR jakarta
      AND
        AND matches documents in which both terms occur. This is equivalent to set intersection. The symbol && can be used in place of AND.
        For example, to find documents that contain both "jakarta apache" and "jakarta lucene", use:
          "jakarta apache" AND "jakarta lucene"
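  • 15. Lucene Query Parser: Boolean Operators (2)
      +
        The "+" operator, also called the required operator, requires that the term after "+" exist in the corresponding field of the document.
        To find documents that must contain "jakarta" and may contain "lucene", use:
          +jakarta lucene
      NOT
        NOT excludes documents that contain the term after NOT. This is equivalent to set difference. The symbol ! can be used in place of NOT.
        For example, to find documents that contain "jakarta apache" but not "jakarta lucene", use:
          "jakarta apache" NOT "jakarta lucene"
      -
        The "-" operator, also called the prohibit operator, excludes documents that contain the term after "-".
        To find documents that contain "jakarta apache" but not "jakarta lucene", use:
          "jakarta apache" -"jakarta lucene"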
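  • 16. Lucene Query Parser: Grouping
      Lucene supports parentheses for grouping clauses into sub-queries. This is useful when you need to control the Boolean logic of a query.
      For example, to find documents that contain either "jakarta" or "apache" and also contain "website", use:
        (jakarta OR apache) AND website
      This removes any ambiguity: website must be present, and at least one of jakarta and apache must be present as well.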
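The set analogy used on the Boolean operator slides (AND as intersection, NOT as difference) can be tried out on a toy inverted index. The sketch below is illustrative only: the four-document corpus is made up, and real Lucene scoring and the "+"/"-" operators involve more than plain set algebra.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text
docs = {
    1: "jakarta apache website",
    2: "apache lucene",
    3: "jakarta tomcat",
    4: "solr website",
}

# Build an inverted index: term -> set of ids of documents containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# (jakarta OR apache) AND website
either = index["jakarta"] | index["apache"]   # OR  -> union
result = either & index["website"]            # AND -> intersection
print(sorted(result))

# apache NOT lucene
without_lucene = index["apache"] - index["lucene"]  # NOT -> difference
print(sorted(without_lucene))
```

Evaluating the parenthesised group first is what guarantees that website is required no matter which of the two alternatives matched.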
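  • 17. Lucene Query Parser: Escaping Special Characters
      Lucene supports escaping the special characters that are part of the query syntax:
        + - && || ! ( ) { } [ ] ^ " ~ * ? : \
      To escape a special character, put a backslash (\) before it. For example, to search for (1+1):2, use:
        \(1\+1\)\:2
  • 18. Contents
      Introduction to Lucene
      Introduction to Solr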
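When query text comes from user input, the escaping rule on this slide is usually applied programmatically. The following is a small Python sketch of that idea; `escape_query` is a hypothetical helper name, and it escapes & and | one character at a time, which also covers && and ||.

```python
import re

# Special characters from the slide, plus the backslash itself.
# && and || are handled by escaping each & and | individually.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\])')


def escape_query(text: str) -> str:
    """Prefix every query-syntax special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_query("(1+1):2"))  # prints \(1\+1\)\:2
```

Escape only the user-supplied fragment, not the whole query string, otherwise operators you add yourself (field prefixes, quotes, AND/OR grouping) would be escaped too.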
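  • 19. What Is Solr
      Solr is a Java search server built on Lucene. It provides faceted search and hit highlighting, and supports several output formats (including XML/XSLT and JSON). It is easy to install and configure, and it ships with an HTTP-based administration interface.
      Solr history:
        Yonik Seeley
        Developed at CNET
        Donated to Apache in 2006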
  • 20. Solr Features
      Servlet
      Web Administration Interface
      XML/HTTP, JSON Interfaces
      Faceting
      Schema to define types and fields
      Highlighting
      Caching
      Index Replication (Master / Slaves)
      Pluggable
      Java 5
  • 21. Powered by Solr
      Netflix
      CNET
      AOL: sports and music
      Shopper.com
      Drupal module
      GameSpot
      Reddit
      Instructables
      http://news.com
      …
      http://wiki.apache.org/solr/PublicServers
  • 22. Solr Architecture
      [Diagram; components shown: HTTP Request Servlet, Update Servlet, Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, XML Response Writer, XML Update Interface, Config, Schema, Caching, Update Handler, Solr Core, Analysis, Concurrency, Replication, Lucene]
  • 23. Installing and Configuring Solr: Preparing the Environment
      Because Solr is written in Java, it deploys well on both Windows and Linux. Solr also ships with shell scripts that simplify testing, administration, and maintenance, so Linux is recommended for production.
      The steps below use Linux; Windows is similar.
        wget http://apache.mirror.phpchina.com/tomcat/tomcat-6/v6.0.16/bin/apache-tomcat-6.0.16.zip
        unzip apache-tomcat-6.0.16.zip
        mv apache-tomcat-6.0.16 /opt/tomcat
        chmod 755 /opt/tomcat/bin/*
        wget http://apache.mirror.phpchina.com/lucene/solr/1.2/apache-solr-1.2.0.tgz
        tar xzf apache-solr-1.2.0.tgz
      Installation methods
        The trickiest part of installing Solr is understanding and configuring solr.solr.home.
        There are three ways to set it: the current working directory, the solr.solr.home system property, and JNDI.
  • 24. Installing Solr (1): Using the Current Working Directory
        cp apache-solr-1.2.0/dist/apache-solr-1.2.0.war /opt/tomcat/webapps/solr.war
        mkdir /opt/solr-tomcat
        cp -r apache-solr-1.2.0/example/solr/ /opt/solr-tomcat/
        cd /opt/solr-tomcat
        /opt/tomcat/bin/startup.sh
      Note: when neither the solr.solr.home property nor JNDI is configured, Solr looks for ./solr, so you must change to /opt/solr-tomcat before starting Tomcat.
  • 25. Installing Solr (2): Using a JVM System Property
      Add solr.solr.home to the current user's environment (.bash_profile), then reload it:
        export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"
        . .bash_profile
      Add the same setting to /opt/tomcat/bin/catalina.sh:
        export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"
  • 27. Installing Solr (3): Using JNDI
        mkdir -p /opt/tomcat/conf/Catalina/localhost
        touch /opt/tomcat/conf/Catalina/localhost/solr.xml
      Contents of solr.xml:
        <Context docBase="/opt/tomcat/webapps/solr.war" debug="0" crossContext="true">
          <Environment name="solr/home" type="java.lang.String" value="/opt/solr-tomcat/solr" override="true" />
        </Context>
  • 28. Testing Solr (1): Posting Data to the Index
      Using the shell script (curl):
        cd apache-solr-1.2.0/example/exampledocs
        vi post.sh        (set the URL variable to match Tomcat's IP address and port)
        ./post.sh *.xml
      Using the Java tool shipped with Solr:
        Show help:
          java -jar post.jar -help
        Post the test data:
          java -Durl=http://localhost:8080/solr/update -Ddata=files -jar post.jar *.xml
  • 29. Testing Solr (2): Querying the Index
      Query through the Solr admin interface at http://localhost:8080/solr/admin
      Query with curl:
        export URL="http://localhost:8080/solr/select/"
        curl "$URL?indent=on&q=liangchuan&fl=*,score"
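The curl call on this slide works because its query is a single plain word. For queries that contain spaces, quotes, or colons the parameters need percent-encoding. Below is a small Python sketch that only builds the URL and sends nothing. `build_select_url` is a hypothetical helper, and the base URL simply mirrors the Tomcat setup assumed on the slides.

```python
from urllib.parse import urlencode

# Assumed base URL, matching the Tomcat deployment used on the slide.
BASE_URL = "http://localhost:8080/solr/select/"


def build_select_url(query: str, **params: str) -> str:
    """Return a /select URL with percent-encoded query parameters."""
    all_params = {"q": query, "indent": "on", "fl": "*,score"}
    all_params.update(params)  # caller-supplied values override the defaults
    return BASE_URL + "?" + urlencode(all_params)


print(build_select_url("liangchuan"))
print(build_select_url('title:"The Right Way"', rows="5"))
```

The resulting string can be passed to curl in quotes, or fetched with any HTTP client once a Solr instance is actually running at that address.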
  • 31. Solr Admin Interface Features
      Show Config, Schema, Distribution info
      Query Interface
      Statistics
        Caches: lookups, hits, hitratio, inserts, evictions, size
        RequestHandlers: requests, errors
        UpdateHandler: adds, deletes, commits, optimizes
        IndexReader: open-time, index-version, numDocs, maxDocs
      Analysis Debugger
        Shows tokens after each Analyzer stage
        Shows token matches for query vs index
  • 32. Configuration (solrconfig.xml)
        <mainIndex>
          <useCompoundFile>false</useCompoundFile>
          <mergeFactor>10</mergeFactor>
          <maxBufferedDocs>1000</maxBufferedDocs>
          <maxMergeDocs>2147483647</maxMergeDocs>
          <maxFieldLength>10000</maxFieldLength>
        </mainIndex>
        <requestHandler name="standard" class="solr.StandardRequestHandler" />
        <requestHandler name="custom" class="your.package.CustomRequestHandler" />
        <autoCommit>
          <maxDocs>10000</maxDocs>
          <maxTime>1000</maxTime>
        </autoCommit>
        <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>
  • 33. Schema
      Lucene has no notion of a schema
        Sorting: string vs. numeric
        Ranges: is val:42 included in val:[1 TO 5]?
        Lucene QueryParser has date-range support, but must guess.
      The Solr schema:
        Defines fields, their types, properties
        Defines unique key field, default search field, Similarity implementation
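The sorting and range questions on this slide come down to whether a value is compared as text or as a number. Here is a short Python illustration of that difference. It models the idea only, not how Lucene or Solr store or compare field values internally.

```python
values = ["5", "42", "1", "100"]

# Without type information every value is a string: lexicographic order.
print(sorted(values))            # ['1', '100', '42', '5']
# With a declared numeric type the order is by magnitude.
print(sorted(values, key=int))   # ['1', '5', '42', '100']


def in_range_as_string(value: str, low: str, high: str) -> bool:
    """Range test using string comparison, as for an untyped field."""
    return low <= value <= high


def in_range_as_int(value: str, low: str, high: str) -> bool:
    """Range test after converting to integers, as for a numeric field."""
    return int(low) <= int(value) <= int(high)


print(in_range_as_string("42", "1", "5"))  # True: "42" sorts between "1" and "5"
print(in_range_as_int("42", "1", "5"))     # False: 42 is greater than 5
```

Declaring the field type in the schema is what lets the server pick the second behaviour consistently for sorting and range queries.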
  • 35. Field Definitions
      Field attributes: name, type, indexed, stored, multiValued, omitNorms
        <field name="id" type="string" indexed="true" stored="true"/>
        <field name="sku" type="textTight" indexed="true" stored="true"/>
        <field name="name" type="text" indexed="true" stored="true"/>
        <field name="reviews" type="text" indexed="true" stored="false"/>
        <field name="category" type="text_ws" indexed="true" stored="true" multiValued="true"/>
      Dynamic fields, in the spirit of Lucene!
        <dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
        <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
        <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
  • 36. Schema: Analyzers
        <fieldtype name="nametext" class="solr.TextField">
          <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
        </fieldtype>
        <fieldtype name="text" class="solr.TextField">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StandardFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory"/>
            <filter class="solr.PorterStemFilterFactory"/>
          </analyzer>
        </fieldtype>
        <fieldtype name="myfieldtype" class="solr.TextField">
          <analyzer>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SnowballPorterFilterFactory" language="German" />
          </analyzer>
        </fieldtype>
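Each analyzer definition on this slide is one tokenizer followed by an ordered chain of filters. The Python sketch below imitates that pipeline shape with two simple filters. The function names and the stop-word list are made up for illustration, and the tokenizer is far cruder than Solr's StandardTokenizerFactory.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def tokenize(text: str) -> List[str]:
    """Tokenizer stage: split the text into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens: Iterable[str]) -> List[str]:
    """Filter stage: normalise case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Filter stage: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str,
            filters: List[Callable[[Iterable[str]], List[str]]]) -> List[str]:
    """Run the tokenizer, then each filter in the order given."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Right Way to Index", [lowercase, remove_stop_words]))
```

Filter order matters: with the stop filter placed before the lowercase filter, "The" would survive because it does not match the lowercase stop word, which is why the slide's chain lowercases first.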
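  • 37. Solr Indexing Interface Overview
    Solr exposes a standard HTTP interface for adding, deleting, updating, and querying indexed data. Users start indexing and searching by sending HTTP requests to the Solr web application deployed in a servlet container. Solr accepts the request, determines the appropriate SolrRequestHandler, and processes the request. The response is returned over HTTP in the same way. The default configuration returns Solr's standard XML response; Solr can also be configured to use alternative response formats.
    Four different index requests can be sent to the Solr index servlet:
      add/update: adds documents to Solr or updates existing ones. Additions and updates are not searchable until they are committed.
      commit: tells Solr to make all changes since the last commit searchable.
      optimize: restructures Lucene's files to improve search performance. It is usually good to optimize once indexing is complete. If updates are frequent, schedule optimization for periods of low usage. An index runs correctly without optimization, and optimization is a time-consuming process.
      delete: can be specified by id or by query. Deleting by id removes the document with the given id; deleting by query removes every document the query returns.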
  • 38. Solr-Add Operation
    HTTP POST to http://localhost:8080/solr/update/
      <add>
        <doc>
          <field name="employeeId">05991</field>
          <field name="office">Bridgewater</field>
          <field name="skills">Perl</field>
          <field name="skills">Java</field>
        </doc>
        [<doc> ... </doc>[<doc> ... </doc>]]
      </add>
    Documents or fields can have boosts attached
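The add message above can be generated rather than written by hand. This sketch builds the same XML with the standard library. `build_add_message` is a hypothetical helper; it only produces the request body and does not send it, and boosts are left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """docs: list of dicts; a list value yields one <field> per item."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


message = build_add_message([
    {"employeeId": "05991", "office": "Bridgewater", "skills": ["Perl", "Java"]}
])
print(message)
```

A multi-valued field such as skills is expressed by repeating the field element, which is why list values are expanded.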
  • 39. Solr-Update / Delete Operations
    Inserting a document with an already present uniqueKey will erase the original.
    Deleting:
      By uniqueKey field: <delete><id>05991</id></delete>
      By query: <delete><query>name:Anthony</query></delete>
    <commit/>
    <optimize/>
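The update, delete, and commit semantics can be pictured with a toy in-memory model. This is only an illustration under simplifying assumptions (documents are dicts, a query is a Python predicate). `ToyIndex` is invented for the sketch and says nothing about how Solr stores data.

```python
class ToyIndex:
    """In-memory stand-in for an index keyed by a uniqueKey field."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.pending = {}      # changes made since the last commit
        self.searchable = {}   # state visible to searches

    def add(self, doc):
        # Same uniqueKey: the new document replaces the original.
        self.pending[doc[self.unique_key]] = doc

    def delete_by_id(self, doc_id):
        self.pending.pop(doc_id, None)

    def delete_by_query(self, predicate):
        self.pending = {key: doc for key, doc in self.pending.items()
                        if not predicate(doc)}

    def commit(self):
        # Make everything changed since the last commit searchable.
        self.searchable = dict(self.pending)


index = ToyIndex("employeeId")
index.add({"employeeId": "05991", "name": "Anthony"})
index.add({"employeeId": "05991", "name": "Tony"})  # overwrites the first
print(len(index.searchable))  # 0: nothing is visible before commit
index.commit()
print(index.searchable["05991"]["name"])  # Tony
```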
  • 40. Solr-Search Operation
    Core parameters
      • qt – query type (request handler)
      • wt – writer type (response writer)
    Common parameters
      • q
      • sort
      • start
      • rows
      • fq – filters
      • fl – return fields
  • 41. Default Parameters
    Query arguments for HTTP GET/POST to /select
      param   default    description
      q                  The query
      start   0          Offset into the list of matches
      rows    10         Number of documents to return
      fl      *          Stored fields to return
      qt      standard   Query type; maps to query handler
      df      (schema)   Default field to search
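A request to /select is an ordinary URL with these parameters in the query string. The sketch below assembles one, assuming the base URL used in the search example later in the deck. `build_select_url` is a hypothetical helper, and the defaults are sent explicitly only to make them visible, since the server applies them anyway.

```python
from urllib.parse import urlencode

# Defaults taken from the table on the slide (df comes from the schema).
DEFAULTS = {"start": 0, "rows": 10, "fl": "*", "qt": "standard"}


def build_select_url(base_url, q, **overrides):
    """Merge the query, the defaults, and any overrides into a /select URL."""
    params = {"q": q, **DEFAULTS, **overrides}
    return base_url + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr", "ipod", rows=5, fl="id,name")
print(url)
# http://localhost:8983/solr/select?q=ipod&start=0&rows=5&fl=id%2Cname&qt=standard
```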
  • 42. Default Query Syntax
    Lucene Query Syntax [; sort specification]
      1. mission impossible; releaseDate desc
      2. +mission +impossible -actor:cruise
      3. "mission impossible" -actor:cruise
      4. title:spiderman^10 description:spiderman
      5. description:"spiderman movie"~10
      6. +HDTV +weight:[0 TO 100]
      7. Wildcard queries: te?t, te*t, test*
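The wildcard forms in item 7 follow the usual convention: `?` stands for exactly one character and `*` for zero or more. This sketch shows that matching rule by translating a pattern into a regular expression. It illustrates the semantics only and is not how Lucene evaluates wildcard queries against an index.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (zero or more) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, term):
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(matches("te?t", "test"))      # True
print(matches("te?t", "tet"))       # False: ? needs exactly one character
print(matches("te*t", "tet"))       # True: * may match nothing
print(matches("test*", "testing"))  # True
```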
  • 43. Solr-Search Example
    http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.field=cat&facet.mincount=1&facet.field=inStock
      <response>
        <responseHeader>
          <status>0</status>
          <QTime>3</QTime>
        </responseHeader>
        <result numFound="4" start="0"/>
        <lst name="facet_counts">
          <lst name="facet_queries"/>
          <lst name="facet_fields">
            <lst name="cat">
              <int name="music">1</int>
              <int name="connector">2</int>
              <int name="electronics">3</int>
            </lst>
            <lst name="inStock">
              <int name="false">3</int>
              <int name="true">1</int>
            </lst>
          </lst>
        </lst>
      </response>
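A client usually turns the facet section of this response into a nested mapping. The sketch below does that with the standard library, using a trimmed copy of the response above as input. `facet_field_counts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Facet portion of the response shown on the slide.
RESPONSE = """<response>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="music">1</int>
        <int name="connector">2</int>
        <int name="electronics">3</int>
      </lst>
      <lst name="inStock">
        <int name="false">3</int>
        <int name="true">1</int>
      </lst>
    </lst>
  </lst>
</response>"""


def facet_field_counts(xml_text):
    """Return {field name: {facet value: count}} from a facet response."""
    root = ET.fromstring(xml_text)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    return {
        field.get("name"): {entry.get("name"): int(entry.text)
                            for entry in field.findall("int")}
        for field in fields.findall("lst")
    }


counts = facet_field_counts(RESPONSE)
print(counts["cat"])      # {'music': 1, 'connector': 2, 'electronics': 3}
print(counts["inStock"])  # {'false': 3, 'true': 1}
```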
  • 44. Solr-Search Faceting
    Faceting is available in StandardRequestHandler and DisMaxRequestHandler
  • 45. Faceted Browsing
    [Diagram] Search(Query, Filter[], Sort, offset, n) returns a DocList (a section of the ordered results) and a DocSet (the unordered set of all results).
    Inputs shown: computer, computer_type:PC, memory:[1GB TO *], price asc
    Facet counts, each obtained as intersection Size():
      proc_manu:Intel = 594
      proc_manu:AMD = 382
      price:[0 TO 500] = 247
      price:[500 TO 1000] = 689
      manu:Dell = 104
      manu:HP = 92
      manu:Lenovo = 75
    The DocList and the facet counts form the query response.
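The counting step in the diagram amounts to intersecting the DocSet with the set of documents matching each facet query and taking the size. This is a minimal sketch with made-up document ids, where Python sets stand in for whatever DocSet implementation Solr uses.

```python
def facet_counts(doc_set, facet_filters):
    """doc_set: ids matching the query; facet_filters: label -> set of ids."""
    return {label: len(doc_set & ids) for label, ids in facet_filters.items()}


# Hypothetical ids, chosen only to exercise the function.
doc_set = {1, 2, 3, 4, 5}
facet_filters = {
    "manu:Dell": {1, 2, 9},
    "manu:HP": {3},
    "manu:Lenovo": {7, 8},
}
print(facet_counts(doc_set, facet_filters))
# {'manu:Dell': 2, 'manu:HP': 1, 'manu:Lenovo': 0}
```

Because the per-facet sets do not depend on the main query, they are natural candidates for caching.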
  • 47. Caching
    IndexSearcher's view of an index is fixed
      Aggressive caching possible
      Consistency for multi-query requests
    filterCache – unordered set of document ids matching a query
    resultCache – ordered subset of document ids matching a query
    documentCache – the stored fields of documents
    userCaches – application specific, custom query handlers
  • 48. Warming for Speed
    Lucene IndexReader warming
      field norms, FieldCache, tii – the term index
    Static cache warming
      Configurable static requests to warm new Searchers
    Smart cache warming (autowarming)
      Using MRU items in the current cache to pre-populate the new cache
    Warming in parallel with live requests
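Autowarming can be sketched as re-running the most recently used keys of the old cache against the new searcher. The example below is a simplified model. `LRUCache`, `autowarm`, and the `regenerate` callback are names invented here and are not Solr classes.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

    def most_recent_keys(self, n):
        return list(self.items)[-n:] if n > 0 else []


def autowarm(old_cache, new_cache, regenerate, n):
    """Recompute the n most recently used keys for the new cache."""
    for key in old_cache.most_recent_keys(n):
        new_cache.put(key, regenerate(key))


old = LRUCache(3)
for query in ("a", "b", "c"):
    old.put(query, query + "-old")
old.get("a")  # "a" becomes the most recently used key

new = LRUCache(3)
autowarm(old, new, regenerate=lambda key: key + "-new", n=2)
print(new.most_recent_keys(2))  # ['c', 'a']
```

Values are regenerated rather than copied because the new searcher sees a different version of the index.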
  • 49. Smart Cache Warming
    [Diagram] Live requests reach the registered Solr IndexSearcher through the Request Handler, while warming requests go to the on-deck Solr IndexSearcher. Each searcher is drawn with User, Filter, Result, and Doc caches; Regenerators connect the old caches to the new ones, and Field Cache and Field Norms are shown alongside. Steps are numbered 1 to 3.
    Autowarming – warm n MRU cache keys w/ new Searcher
  • 50. Search Relevancy
    Document analysis: PowerShot SD 500
      WhitespaceTokenizer: PowerShot | SD | 500
      WordDelimiterFilter catenateWords=1: Power | Shot | SD | 500 | PowerShot
      LowercaseFilter: power | shot | sd | 500 | powershot
    Query analysis: power-shot sd500
      WhitespaceTokenizer: power-shot | sd500
      WordDelimiterFilter catenateWords=0: power | shot | sd | 500
      LowercaseFilter: power | shot | sd | 500
    A Match!
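The two analysis chains can be imitated in a few lines to see why the query matches. This is a sketch under stated assumptions: `word_delimiter` is a rough approximation that splits on punctuation, case changes, and letter/digit boundaries, and its catenation rule is simplified compared with Solr's WordDelimiterFilter.

```python
import re


def whitespace_tokenize(text):
    return text.split()


def word_delimiter(tokens, catenate_words=False):
    """Split tokens into sub-words; optionally add the joined word parts."""
    out = []
    for token in tokens:
        # Runs of: capitalized/lowercase word, all-caps word, or digits.
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", token)
        out.extend(parts)
        words = [part for part in parts if part.isalpha()]
        if catenate_words and len(words) > 1:
            out.append("".join(words))
    return out


def lowercase(tokens):
    return [token.lower() for token in tokens]


doc_terms = lowercase(word_delimiter(whitespace_tokenize("PowerShot SD 500"),
                                     catenate_words=True))
query_terms = lowercase(word_delimiter(whitespace_tokenize("power-shot sd500"),
                                       catenate_words=False))
print(doc_terms)    # ['power', 'shot', 'powershot', 'sd', '500']
print(query_terms)  # ['power', 'shot', 'sd', '500']
print(set(query_terms) <= set(doc_terms))  # True
```

Every query term appears among the indexed terms, which is the match the slide points out.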
  • 51. Configuring Relevancy
      <fieldtype name="text" class="solr.TextField">
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
          <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
          <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        </analyzer>
      </fieldtype>
  • 52. copyField
    Copies one field to another at index time
    Use case: analyze the same field in different ways
      copy into a field with a different analyzer
      boost exact-case, exact-punctuation matches
      language translations, thesaurus, soundex
        <field name="title" type="text"/>
        <field name="title_exact" type="text_exact" stored="false"/>
        <copyField source="title" dest="title_exact"/>
    Use case: index multiple fields into a single searchable field
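Both use cases reduce to copying source values into a destination field before analysis. The sketch below models that on plain dicts. `apply_copy_fields` is a hypothetical helper, the "all" destination field is invented for the second use case, and real copyField behavior (for example with multi-valued sources) is not reproduced.

```python
COPY_FIELDS = [("title", "title_exact")]  # (source, dest), from the slide


def apply_copy_fields(doc, copy_fields=COPY_FIELDS):
    """Return a new document with each source value appended to its dest."""
    result = dict(doc)
    for source, dest in copy_fields:
        if source in doc:
            existing = result.get(dest, [])
            if not isinstance(existing, list):
                existing = [existing]
            result[dest] = existing + [doc[source]]
    return result


print(apply_copy_fields({"title": "Spider-Man"}))
# {'title': 'Spider-Man', 'title_exact': ['Spider-Man']}

# Second use case: several fields gathered into one searchable field.
combined = apply_copy_fields(
    {"title": "Spider-Man", "description": "A superhero movie"},
    copy_fields=[("title", "all"), ("description", "all")],
)
print(combined["all"])  # ['Spider-Man', 'A superhero movie']
```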
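  • 53. High Availability
    [Diagram] Appservers (dynamic HTML generation) send HTTP search requests through a Load Balancer to the Solr Searchers. An Updater takes updates from the DB and sends them to the Solr Master, and index replication carries the index from the Solr Master to the Solr Searchers. An admin terminal issues admin queries.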
  • 54. Replication
    [Diagram] Master and Searcher each hold Lucene index segments in solr/data/index. Labeled steps:
      1. hard links (solr/data/snapshot-2006062950000)
      2. hard links (solr/data/snapshot-2006062950000-WIP)
      3. rsync (new segment; state shown after rsync)
      4. mv dir (state shown after mv)
  • 55. Resources
    WWW
      http://wiki.apache.org/solr/
      http://www.ibm.com/developerworks/cn/java/j-solr1/
      http://www.ibm.com/developerworks/cn/java/j-solr2/
      http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=1
      http://lucene.apache.org/java/docs/queryparsersyntax.html
      http://www.blogjava.net/RongHao/archive/2007/11/06/158621.html
    Mailing Lists
      solr-user-subscribe@lucene.apache.org
      solr-dev-subscribe@lucene.apache.org