This session is for those who are excited by the great power of Apache Solr search for Drupal and want to take things even further. Do you want to take complete control over your search interface and offer more than the default features? Have you ever wondered what it takes to add data to your search index? Curious about defining facets, custom sorting, or making cool new widgets for filtering and faceting? Join us for a technical deep dive into the world of Solr search.
The general topics of this presentation will overlap with those covered at DrupalCon SF for the Drupal 6 version, but we will focus on using the API as found in the Drupal 7 version.
Introducing the Solr index
* Learn about Solr fields, and how to map Drupal data onto them
* See how to add data to the search index
* Execute a search in PHP code and use the results
Using the API for custom search paths and interfaces
* How to use the prepare and alter hooks for the query object, and why they differ.
* Make user-facing changes, or add filters that are transparent to the user.
Build custom facets based on node fields
* What comes out of the box (OOTB)
* Hooks to add facets for additional field types
2. We hope you will leave having learned about:
• What Solr is and how to run it locally
• Getting Drupal data into Solr
• Changes in Drupal 7
• Field API integration
• Searching Solr from Drupal
• Modifying what’s searched and the results
• Theming search results
3. Drupal Interacts with Solr via HTTP
• Drupal sends data to Solr as XML documents
• Solr accepts documents POSTed to /update
• A different XML message (a <delete> command), also POSTed to /update, deletes documents
• Searching, etc. are GET requests
• If something is not working as expected, you can try searching directly in Solr via URL
• Solr also includes admin and analysis interfaces (you need to lock these down for production).
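The following standalone sketch in plain PHP, with no Drupal, makes the HTTP exchange concrete. It builds three message shapes: an <add> document and a <delete> command, both of which are POSTed to /update, and a search URL for a GET request. The base URL http://localhost:8983/solr, the field values and the helper names are illustrative assumptions. They are not the apachesolr module's API.

```php
<?php
// Hypothetical helpers showing the shape of Solr's HTTP messages.
// Nothing is actually sent; the strings are only built and printed.

/** Builds an <add> message holding one document. */
function sketch_solr_add_xml(array $fields): string {
  $xml = '<add><doc>';
  foreach ($fields as $name => $values) {
    // A multi-valued field is sent as repeated <field> elements.
    foreach ((array) $values as $value) {
      $xml .= '<field name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . '</field>';
    }
  }
  return $xml . '</doc></add>';
}

/** Builds a <delete> message for one document id. */
function sketch_solr_delete_xml(string $id): string {
  return '<delete><id>' . htmlspecialchars($id, ENT_QUOTES, 'UTF-8') . '</id></delete>';
}

/** Builds the URL for a search, which is a plain GET request. */
function sketch_solr_select_url(string $base, array $params): string {
  return $base . '/select?' . http_build_query($params);
}

echo sketch_solr_add_xml(['id' => 'abc123/node/42', 'title' => 'Tom & Jerry', 'im_vid_1' => [3, 7]]), "\n";
echo sketch_solr_delete_xml('abc123/node/42'), "\n";
echo sketch_solr_select_url('http://localhost:8983/solr', ['q' => 'drupal', 'rows' => 10]), "\n";
```

If you paste the printed select URL into a browser, you get the same "search directly in Solr via URL" check mentioned on the slide.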
4. Run Solr Using the Example Directory
Replace the example's schema.xml and solrconfig.xml with the ones from the Drupal module.
Then invoke start.jar, which launches the bundled Jetty server:
java -jar start.jar
6. Schema: Defines Types & Fields
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="drupal-0.9.5" version="1.2">
  <types>
    ...
  </types>
  <fields>
    <!-- The document id is derived from a site-specific key (hash) and the node ID like:
         $document->id = $hash . '/node/' . $node->nid; -->
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <!-- These are the fields that correspond to a Drupal node. -->
    <field name="site" type="string" indexed="true" stored="true"/>
    <field name="hash" type="string" indexed="true" stored="true"/>
    <field name="url" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
    <field name="sort_title" type="sortString" indexed="true" stored="false"/>
    <field name="body" type="text" indexed="true" stored="true" termVectors="true"/>
    <field name="teaser" type="text" indexed="false" stored="true"/>
    ...
  </fields>
  <uniqueKey>id</uniqueKey>
  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>body</defaultSearchField>
  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="AND"/>
</schema>
8. Dynamic Fields Provide Flexibility
<!-- Dynamic field definitions will be used if the name matches any of the patterns.
     The glob-like pattern in the name attribute must have "*" only at the start or the end.
     Longer patterns will be matched first. -->
<dynamicField name="is_*" type="integer" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="im_*" type="integer" indexed="true" stored="true" multiValued="true"/>
...
<dynamicField name="ss_*" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ts_*" type="text" indexed="true" stored="true" multiValued="false" termVectors="true"/>
<dynamicField name="ds_*" type="date" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="dm_*" type="date" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="tds_*" type="tdate" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="tdm_*" type="tdate" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="bm_*" type="boolean" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="bs_*" type="boolean" indexed="true" stored="true" multiValued="false"/>
...
<!-- Sortable version of the dynamic string field -->
<dynamicField name="sort_ss_*" type="sortString" indexed="true" stored="false"/>
<copyField source="ss_*" dest="sort_ss_*"/>
<!-- A random sort field -->
<dynamicField name="random_*" type="rand" indexed="true" stored="true"/>
<!-- This field is used to store node access records, as opposed to CCK field data -->
<dynamicField name="nodeaccess*" type="integer" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*" type="ignored" multiValued="true" />
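The prefixes encode the type and the cardinality. is_ is a single-valued integer, im_ a multi-valued integer, ss_ a single-valued string, ds_/dm_ are dates and bs_/bm_ are booleans. The sketch below imitates, in plain PHP, how a field name is matched against some of these patterns, trying longer patterns first. It illustrates the rule stated in the schema comment. It is not Solr's implementation, and the function name is hypothetical.

```php
<?php
// A subset of the dynamic fields from the schema above.
const SKETCH_DYNAMIC_FIELDS = [
  'is_*' => ['type' => 'integer', 'multiValued' => false],
  'im_*' => ['type' => 'integer', 'multiValued' => true],
  'ss_*' => ['type' => 'string', 'multiValued' => false],
  'ts_*' => ['type' => 'text', 'multiValued' => false],
  'tds_*' => ['type' => 'tdate', 'multiValued' => false],
  'sort_ss_*' => ['type' => 'sortString', 'multiValued' => false],
  'nodeaccess*' => ['type' => 'integer', 'multiValued' => true],
  '*' => ['type' => 'ignored', 'multiValued' => true],
];

/** Returns the pattern a field name would match, or null if none does. */
function sketch_match_dynamic_field(string $name): ?string {
  $patterns = array_keys(SKETCH_DYNAMIC_FIELDS);
  // Longer patterns first, so the catch-all '*' is always tried last.
  usort($patterns, fn(string $a, string $b): int => strlen($b) <=> strlen($a));
  foreach ($patterns as $pattern) {
    $literal = trim($pattern, '*');
    if ($pattern[0] === '*') {
      // Suffix pattern such as '*_foo' (or the bare '*').
      $matches = $literal === '' || substr($name, -strlen($literal)) === $literal;
    }
    else {
      // Prefix pattern such as 'is_*'.
      $matches = strncmp($name, $literal, strlen($literal)) === 0;
    }
    if ($matches) {
      return $pattern;
    }
  }
  return null;
}

foreach (['is_uid', 'tds_created', 'nodeaccess_all', 'misspelled_field'] as $field) {
  echo $field, ' => ', sketch_match_dynamic_field($field), "\n";
}
```

Any name without a known prefix falls through to the catch-all *, whose type is ignored. A misspelled prefix is therefore likely to be dropped silently rather than rejected.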
13. Use the factory method to get an object for building your queries:
$query = apachesolr_drupal_query($keys, $filters, $solrsort, $base_path, $solr);
All arguments are optional: each defaults to '' except $solr, which defaults to NULL.
14. The actual class that is returned is
determined by a Drupal variable:
variable_get('apachesolr_query_class',
array('apachesolr', 'Solr_Base_Query'));
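The following hypothetical, heavily simplified query class illustrates what a query object manages. It is not Solr_Base_Query. It only shows how keys, filters and a sort could turn into the Solr request parameters q, fq and sort, and how a module can later append to $query->params. The class name, method names and default fl value are assumptions.

```php
<?php
// Hypothetical stand-in for a query class; not the apachesolr module's code.
class SketchQuery {
  public array $params;
  private array $filters = [];

  public function __construct(string $keys = '', string $solrsort = '') {
    // 'fl' lists the stored fields Solr should return (assumed default).
    $this->params = ['q' => $keys, 'fl' => 'id,nid,title'];
    if ($solrsort !== '') {
      $this->params['sort'] = $solrsort;
    }
  }

  // Each filter becomes one fq ("filter query") clause, e.g. type:page.
  public function addFilter(string $field, string $value): void {
    $this->filters[] = $field . ':' . $value;
  }

  public function getParams(): array {
    $params = $this->params;
    if ($this->filters) {
      $params['fq'] = $this->filters;
    }
    return $params;
  }
}

$query = new SketchQuery('drupal', 'sort_title asc');
$query->addFilter('type', 'page');
// The same kind of change hook_apachesolr_modify_query() makes later on.
$query->params['fl'] .= ',ss_image_relative';
print_r($query->getParams());
```

Solr expects repeated fq parameters (fq=...&fq=...). The fq array therefore has to be serialized that way, not in PHP's fq[0]=... style.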
22. Use the factory method to get an object for
sending requests to Solr:
$solr = apachesolr_get_solr($host, $port, $path);
23. The actual class that is returned is
determined by a Drupal variable:
variable_get('apachesolr_service_class',
array('apachesolr',
'Drupal_Apache_Solr_Service.php',
'Drupal_Apache_Solr_Service')
);
24. This allows you to customize the way search works by providing a Solr service class other than the standard one:
variable_set('apachesolr_service_class',
array('acquia_search',
'Acquia_Search_Service.php',
'Acquia_Search_Service')
);
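The pluggable-class mechanism works in three steps: read a [module, file, class] triple from a variable, load the file, then instantiate the named class. The single-file sketch below illustrates those steps. SketchVariables stands in for variable_get()/variable_set(), and it and both service classes are hypothetical. Only the shape of the triple comes from the slides.

```php
<?php
// Hypothetical stand-in for Drupal's variable_get()/variable_set().
final class SketchVariables {
  private static array $values = [];

  public static function get(string $name, $default) {
    return self::$values[$name] ?? $default;
  }

  public static function set(string $name, $value): void {
    self::$values[$name] = $value;
  }
}

// Two hypothetical service classes sharing one constructor signature.
class SketchDefaultService {
  public string $url;

  public function __construct(string $host, int $port, string $path) {
    $this->url = "http://$host:$port$path";
  }
}

class SketchCustomService extends SketchDefaultService {}

// Reads [module, file, class] and instantiates whichever class is configured.
function sketch_get_solr(string $host, int $port, string $path): SketchDefaultService {
  [$module, $file, $class] = SketchVariables::get('sketch_service_class',
    ['sketch', 'SketchDefaultService.php', 'SketchDefaultService']);
  // A real module would include $file from $module's directory here;
  // in this single-file sketch both classes are already defined.
  return new $class($host, $port, $path);
}

echo get_class(sketch_get_solr('localhost', 8983, '/solr')), "\n";
SketchVariables::set('sketch_service_class',
  ['other_module', 'SketchCustomService.php', 'SketchCustomService']);
echo get_class(sketch_get_solr('localhost', 8983, '/solr')), "\n";
```

Because of this indirection, a module such as acquia_search can swap in its own class with a single variable_set() call instead of patching apachesolr.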
32. Drupal 7 Changes
(Drupal 6 → Drupal 7)
$query, $params → $query->params
$solr->search() → $query->search()
• Taxonomy on a node is now a term reference field (works as part of the Field API integration).
• Fixes to the core search module APIs mean that some hacks are gone: e.g. no hook_menu_alter; we can set apachesolr as the default via the search UI.
33. You Can Add Any Data to the Index
hook_apachesolr_update_index(&$document, $node, $namespace)
• Used to add more data to a document before it’s sent to Solr.
• Can also be used to alter or replace data added by apachesolr or another module.
• This is it! (It works like an _alter hook.)
34. Image Data Using Dynamic Fields
/**
 * Implementation of hook_apachesolr_update_index().
 */
function apachesolr_image_apachesolr_update_index(&$document, $node, $namespace) {
  if ($node->type == 'image' && $document->entity == 'node') {
    // Compute the pixel area of every derivative size.
    $areas = array();
    $sizes = image_get_derivative_sizes($node->images['_original']);
    foreach ($sizes as $name => $info) {
      $areas[$name] = $info['width'] * $info['height'];
    }
    // Sort by area and take the first, i.e. smallest, derivative.
    asort($areas);
    $image_path = FALSE;
    foreach ($areas as $preset => $size) {
      $image_path = $node->images[$preset];
      break;
    }
    if ($image_path) {
      // ss_* is a single-valued string dynamic field.
      $document->ss_image_relative = $image_path;
      // Support multi-site too.
      $document->ss_image_absolute = file_create_url($image_path);
    }
  }
}
/**
 * Implementation of hook_apachesolr_modify_query().
 */
function apachesolr_image_apachesolr_modify_query($query, $caller) {
  // Also retrieve image thumbnail links.
  $query->params['fl'] .= ',ss_image_relative';
}
36. UI to Exclude Whole Content Types
• ?q=admin/config/search/apachesolr/content-bias
37. Control Indexing More Precisely
hook_apachesolr_node_exclude($node, $namespace)
in_array($node->type, variable_get('apachesolr_exclude_comments_types', array()))
hook_node_update_index($node)
• The output of hook_node_update_index is added to the body.
• We can create multiple documents from one node (e.g. one document per comment).
hook_apachesolr_document_handlers($type, $namespace)
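The two ideas on this slide are excluding nodes by type and turning one node into several documents. The hypothetical plain-PHP sketch below illustrates both. The helper names, the "$hash/comment/$cid" id format and the is_nid back-reference are assumptions made for illustration. They are not taken from the apachesolr module.

```php
<?php
// Same idea as the in_array() test above: exclude nodes of listed types.
function sketch_node_excluded(stdClass $node, array $excluded_types): bool {
  return in_array($node->type, $excluded_types, TRUE);
}

// One Solr document per comment instead of one per node. The id follows the
// "$hash/node/$nid" pattern from the schema slide, with "comment" instead.
function sketch_comment_documents(stdClass $node, string $hash): array {
  $documents = [];
  foreach ($node->comments as $comment) {
    $documents[] = [
      'id' => $hash . '/comment/' . $comment['cid'],
      // Single-valued integer (is_*) pointing back at the parent node.
      'is_nid' => $node->nid,
      'body' => $comment['text'],
    ];
  }
  return $documents;
}

$node = (object) [
  'nid' => 42,
  'type' => 'story',
  'comments' => [['cid' => 5, 'text' => 'Nice post'], ['cid' => 9, 'text' => 'Agreed']],
];
if (!sketch_node_excluded($node, ['page'])) {
  print_r(sketch_comment_documents($node, 'abc123'));
}
```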
38. Field API Integration
• Most of the Field API integration follows directly from the 6.x-2.x CCK integration.
• In Drupal 7, we match field types, rather than looking at the widget.
• By default, the data will be indexed to Solr as multi-valued, and named by combining the field module and field name:
sm_$module_$fieldname
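Here is a minimal sketch of the default naming rule, applied to Drupal 7 Field API data, where items live in $node->field_name[langcode][delta]['value']. 'und' is the value of Drupal 7's LANGUAGE_NONE constant. The function name and the example field are hypothetical.

```php
<?php
// Hypothetical helper: builds the default sm_$module_$fieldname entry.
function sketch_default_field_values(stdClass $node, string $module, string $field_name): array {
  $values = [];
  // Read every delta of the language-neutral ('und') items, if any.
  foreach ($node->{$field_name}['und'] ?? [] as $item) {
    $values[] = (string) $item['value'];
  }
  // Always a list: 's' = string, 'm' = multi-valued.
  return ['sm_' . $module . '_' . $field_name => $values];
}

$node = (object) ['field_color' => ['und' => [['value' => 'red'], ['value' => 'blue']]]];
print_r(sketch_default_field_values($node, 'text', 'field_color'));
```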
39. Typically Need 4 Things:
• What field types (or field instances) to look for during indexing.
• The data type to use in the index (index_type).
• A function for extracting the data from the field while indexing (indexing_callback).
• A function for displaying the data from the field during searches (display_callback).
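The sketch below shows how the four pieces might fit together in one mapping keyed by field type. number_integer is a real Drupal 7 field type, and index_type, indexing_callback and display_callback are the names from the slide. The array layout, the function names and the prefix choice are assumptions for illustration, not the module's actual hook format.

```php
<?php
// Hypothetical mapping: the key says which field type to look for.
function sketch_field_mappings(): array {
  return [
    'number_integer' => [
      'index_type' => 'integer',
      'indexing_callback' => 'sketch_integer_indexing_callback',
      'display_callback' => 'sketch_integer_display_callback',
    ],
  ];
}

// Indexing: pull the raw values out of the field's items.
function sketch_integer_indexing_callback(array $items): array {
  return array_map(fn(array $item): int => (int) $item['value'], $items);
}

// Display: format an indexed value for facets or results.
function sketch_integer_display_callback(int $value): string {
  return number_format($value);
}

// index_type picks the dynamic-field prefix ('im_' = integer, multi-valued);
// the indexing callback supplies the values.
function sketch_index_field(string $field_type, string $field_name, array $items): array {
  $mapping = sketch_field_mappings()[$field_type];
  $prefix = $mapping['index_type'] === 'integer' ? 'im_' : 'sm_';
  $values = call_user_func($mapping['indexing_callback'], $items);
  return [$prefix . $field_name => $values];
}

$items = [['value' => '1999'], ['value' => '2011']];
print_r(sketch_index_field('number_integer', 'field_year', $items));
echo call_user_func(sketch_field_mappings()['number_integer']['display_callback'], 1234), "\n";
```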
80. theme_apachesolr_search_snippets: sets the snippet
// Custom implementation in template.php ("mcgill" is the theme name).
function mcgill_apachesolr_search_snippets($document, $snippets) {
  return 'anything you want!';
}
81. Analysis of an apachesolr search request
search_view()
$response = $query->search(...)
$results = apachesolr_search_process_response($response, $final_query)
theme('search_results', $results, ...)
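The processing step sits between $query->search() and theme(). The following hypothetical, simplified stand-in for it turns a Solr JSON response (the wt=json format, with response.docs and a highlighting section keyed by document id) into result arrays that a theme function could render. The function name and the result keys are assumptions.

```php
<?php
// Hypothetical stand-in for response processing; not the module's code.
function sketch_process_response(string $json): array {
  $response = json_decode($json, TRUE);
  $results = [];
  foreach ($response['response']['docs'] as $doc) {
    // Highlighting is keyed by document id, then by field name.
    $snippets = $response['highlighting'][$doc['id']]['body'] ?? [];
    $results[] = [
      'link' => $doc['url'],
      'title' => $doc['title'],
      'snippet' => implode(' ... ', $snippets),
    ];
  }
  return $results;
}

$json = '{"response":{"numFound":1,"start":0,"docs":[{"id":"abc123/node/42","url":"http://example.com/node/42","title":"Hello"}]},"highlighting":{"abc123/node/42":{"body":["<strong>Solr</strong> rocks"]}}}';
print_r(sketch_process_response($json));
```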
82. search-result.tpl.php: renders a single search result
<?php print $result['node']->ss_course_code; ?>
If this value comes from user input, use check_plain(): Solr sends back exactly the (possibly unsafe) user input you indexed.
See apachesolr_clean_text() if you want to index text without tags.
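The warning above can be demonstrated without Drupal. Drupal 7's check_plain() is htmlspecialchars() with ENT_QUOTES and UTF-8, so the sketch below uses a stand-in with a hypothetical name. It also uses strip_tags() as a rough stand-in for cleaning text at index time. That stand-in is an assumption: apachesolr_clean_text() may do more.

```php
<?php
// Stand-in for Drupal 7's check_plain().
function sketch_check_plain(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Whatever was indexed from user input comes back from Solr verbatim.
$course_code = '<script>alert("x")</script>COMP-250';

// Output time: escape before printing into the page.
echo sketch_check_plain($course_code), "\n";

// Index time: strip_tags() removes the tags but keeps the text between
// them, so 'alert("x")' survives; escaping on output is still needed.
echo strip_tags($course_code), "\n";
```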
83. Extra thanks to James McKinney for the use of his slides and for ideas.
jpmckinney on drupal.org
http://evolvingweb.ca/