Enhancing Acquia Search by Indexing Attachments via Search API Attachments

Goal

To help Drupal developers enhance Acquia Search by indexing attachments, such as PDFs, with the Search API Attachments module.

Overview

This article was co-authored by Kass Pepper and JP McNeal.

People love PDFs. They are used to share everything from recipes and restaurant menus to financial reports. From a website search perspective, though, these beloved docs might as well be scrawled in invisible ink: their content stays buried without the right indexing.

For Drupal developers using Acquia Search with Search API, contrib modules offer a powerful solution. The Search API Attachments module leverages the capabilities of Apache Solr to extend and add value to the search feature on your site, allowing users to find relevant content within various file types (yes, including PDFs). Here's a detailed guide on how to get this set up for a Drupal 10 site hosted on Acquia.

Note: This tutorial assumes you have installed and configured Acquia Search.

  1. Step 1: Install and enable the Search API Attachments module

    Begin by enhancing your Drupal installation with the ability to process and index file attachments:

    • Confirm the Acquia Search and Acquia Connector modules are installed and working as expected.
    • Add the Search API Attachments module to your code base.
    • Sign in with administrator credentials.
    • Navigate via the admin toolbar to Manage > Extend.
    • Enable the Search API Attachments module by selecting its checkbox, then click "Install" at the bottom of the page.
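
If you manage the code base with Composer and have Drush available, the "add to code base" and "enable" bullets above can also be done from the command line. The following is a minimal sketch, not the only supported route. It assumes a Composer-based Drupal project with `composer` and `drush` on your PATH, and it wraps the commands in a hypothetical helper named `install_search_api_attachments` so that nothing runs until you call it.

```shell
# Hypothetical helper (not part of the module): add Search API Attachments
# to a Composer-managed code base and enable it with Drush.
# Assumes you call it from the project root, where composer.json lives.
install_search_api_attachments() {
  # Download the contrib module and record it in composer.json.
  composer require drupal/search_api_attachments || return 1

  # Enable the module (machine name: search_api_attachments).
  drush pm:enable search_api_attachments --yes
}
```

Call `install_search_api_attachments` in a local checkout, then commit and deploy the updated `composer.json` and `composer.lock` as you would any other code change.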
  2. Step 2: Configure Search API Attachments to Use Acquia Search

    Configure the module to tap into the power of Acquia Search:

    • Go to Manage > Configuration > Search and metadata > Search API attachments.
    • Select Solr Extractor from the Extraction method dropdown.
    • From the Solr server dropdown, choose the server with an Acquia Search configuration (it will be named Acquia Search API Solr server, or similar).
    • Test the configuration by clicking Submit and test extraction.
    • If the test succeeds, you will see a confirmation message, such as "Congratulations! The extraction seems working! Yay!"
    Image: Search API Attachments Settings Page
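
To double-check what the settings form saved, you can inspect the site from the command line. This is a sketch under two assumptions: that Drush is available, and that the module stores its settings in a config object named `search_api_attachments.admin_config` (check your exported configuration if the name differs in your module version). `show_attachments_settings` is a hypothetical helper name.

```shell
# Hypothetical helper: list the Search API servers, then print the
# settings saved by the Search API Attachments configuration form.
show_attachments_settings() {
  # Shows the available servers, including the Acquia Search one.
  drush search-api:server-list || return 1

  # Assumed config object name; the extraction method shown here should
  # refer to the Solr extractor after completing this step.
  drush config:get search_api_attachments.admin_config
}
```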
  3. Step 3: Reconfigure the Search Index

    The next step instructs Search API to consider file attachments:

    • While signed in as an administrator, navigate to Manage > Configuration > Search and metadata > Search API.
    • In the row for your index, click Edit under Operations.
    • On the index configuration page, select the "Processors" tab.
    • Enable the "File attachments" processor by checking its box.
    Image: File Attachments Processor
    • Configure the attachment processor to exclude certain file types and apply any limits that may be required.
    Image: Search API Attachments Processor Settings
    • Click "Save" to apply the processor settings.
    • Now, shift to the "Fields" tab and click "Add fields".
    Image: Add Fields Configuration Page
    • Choose the attachment fields to index and configure their indexing options, such as boosting factors.
    Image: Adding Fields Dialog Box
    • Save the changes, then reindex your site's search index so that they take effect.
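
The reindex in the last bullet can also be triggered with Search API's Drush commands instead of the admin UI. A minimal sketch, assuming Drush is available; `reindex_search_index` is a hypothetical helper, and the index machine name you pass in is site-specific (`drush search-api:list` shows the IDs on your site).

```shell
# Hypothetical helper: queue every item for reindexing, then index it.
# $1 is the machine name of your search index.
reindex_search_index() {
  # Abort with a usage message if no index ID was given.
  : "${1:?usage: reindex_search_index INDEX_ID}"

  # Mark all items in the index as needing to be indexed again.
  drush search-api:reset-tracker "$1" || return 1

  # Index the queued items now instead of waiting for cron.
  drush search-api:index "$1"
}
```

For example, `reindex_search_index my_index`, where `my_index` stands in for your own index ID.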
  4. Step 4: Update the Search View(s)

    Finally, we need to adjust the views to include content from indexed attachments:

    • Navigate to Manage > Structure > Views.
    • Find and edit the "Search" view.
    • In the "Filter Criteria" section, augment the search to include the newly indexed attachment fields.
    • Make sure you're adding to the existing fields, not replacing them – use Ctrl (Command on macOS) to multi-select.
    Image: Configuring Full text Search Filter
    • Note that the example in the image above shows a "Fulltext" filter, which allows fuzzy search criteria across multiple fields in the Solr search. Another option is to add a new search field that searches only the internal contents of attachments.
    • Apply the changes and save the view.
    • It may be necessary to clear the site's caches and reindex to ensure the updates are reflected in search results.
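
If you prefer the command line for that last bullet, the cache clear and a quick indexing check might look like the sketch below. It assumes Drush is available; `refresh_search_results` is a hypothetical helper name.

```shell
# Hypothetical helper: rebuild Drupal's caches, then report how many
# items each Search API index has indexed so far.
refresh_search_results() {
  # Clear all caches so the updated view configuration is picked up.
  drush cache:rebuild || return 1

  # Show indexing progress for the search indexes on the site.
  drush search-api:status
}
```

If the status output shows items still waiting to be indexed, run the index again before testing searches against attachment contents.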

Configuring Search API Attachments in this way gives your Drupal site a better search experience: users can find content that was previously hidden inside file attachments. With proper indexing, the invisible ink of PDFs and other documents becomes visible and searchable.

Thumbnail Photo by Agence Olloweb on Unsplash