Thursday, August 23, 2012

Install PDF iFilter in SharePoint 2010 Foundation

Introduction: Recently a customer asked me to install a PDF IFilter in SharePoint Foundation 2010. After reading a few blogs and doing some experimentation, I learned that this would not be as straightforward as I first thought. The purpose of this blog post is to guide you through the process so you can get it done quickly and easily.

1.  Why a PDF IFilter? 

According to Microsoft - "When crawling content, the crawler uses an IFilter to read individual file types. Some IFilters read only one file type, whereas others can read several file types. If you have to crawl a file type that is not supported by an IFilter that is provided with Microsoft SharePoint Server 2010, you must install and register the appropriate IFilter on the crawl server."

Since PDF is a file type that is not supported by any IFilter provided with Microsoft SharePoint Server 2010, you will have to install this IFilter on your search servers if you want to be able to search inside PDF files. (A complete list of IFilters that are provided by default can be found here.)

2. Where can I get the PDF IFilter for SharePoint 2010?

The Adobe PDF IFilter 9 for 64-bit platforms is available for download here.

3.  Process Overview:

Before we get started, here is an overview of the process steps (you will follow these steps on each SharePoint server that runs the SharePoint Search Service):

A.  Install the PDF IFilter
B.  Save the PDF file icon to the images folder
C.  Add the mapping entry to the DocIcon.XML file
D.  Modify the Registry setting (PowerShell)
E.  Add the extension (PowerShell)
F.  Stop/Start the SharePoint Search Service (PowerShell)
G.  Reboot the server
H.  Run a full crawl
I.  Run a PDF search test


4. Step A - Install the PDF IFilter

Once you have downloaded the PDF IFilter from the link above (or here), copy the downloaded file to your SharePoint Server, unpack the zip file, and run the PDFFilter64Installer.

5. Step B - Save the PDF file icon to the images folder

Download the Adobe PDF file icon from here, and save it as pdf16.gif (the file name used by the DocIcon.xml mapping in Step C) in the following folder:

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\Template\Images\  
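If you would rather script the copy, here is a minimal PowerShell sketch. It assumes the icon has already been downloaded to a local file (the source path in the usage line is hypothetical) and saves it as pdf16.gif so it matches the DocIcon.xml mapping in Step C. Copy-PdfIcon is a hypothetical helper, not something that ships with SharePoint.

```powershell
# Hypothetical helper: copy a downloaded PDF icon into the SharePoint 2010 Images folder.
# Assumption: the target name pdf16.gif matches <Mapping Key="pdf" Value="pdf16.gif"/>.
function Copy-PdfIcon {
    param(
        [Parameter(Mandatory = $true)]
        [string] $SourceFile,
        [string] $ImagesFolder = "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\Template\Images",
        [string] $IconName = "pdf16.gif"
    )

    if (-not (Test-Path -LiteralPath $SourceFile -PathType Leaf)) {
        throw "Icon file not found: $SourceFile"
    }

    $destination = Join-Path $ImagesFolder $IconName
    # -Force overwrites an icon of the same name if one is already there
    Copy-Item -LiteralPath $SourceFile -Destination $destination -Force
    Write-Host "Copied icon to $destination"
    return $destination
}
```

Usage (hypothetical download location): `Copy-PdfIcon -SourceFile "C:\Temp\pdf16.gif"`. Run it from an elevated prompt, since the Images folder lives under Program Files.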

6. Step C - Add the mapping entry to the DocIcon.XML file

Navigate to this folder: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\Template\Xml

Make a backup copy of the DocIcon.xml file. Then edit the original DocIcon.xml file in Notepad and add the following entry inside the <ByExtension> section, placed in alphabetical order by key:

<Mapping Key="pdf" Value="pdf16.gif"/>
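The same edit can be scripted. This sketch makes a one-time backup (DocIcon.xml.bak), adds the mapping under the <ByExtension> element before the first key that sorts after "pdf", and skips the change if a pdf mapping already exists. It assumes DocIcon.xml has a <DocIcons> root element with a <ByExtension> child; Add-DocIconMapping is a hypothetical helper. Saving through System.Xml.XmlDocument re-indents the file, which is harmless but will show up in a diff.

```powershell
# Hypothetical helper: add <Mapping Key="pdf" Value="pdf16.gif"/> to DocIcon.xml.
# Assumption: the file layout is <DocIcons><ByExtension><Mapping .../>...</ByExtension></DocIcons>.
function Add-DocIconMapping {
    param(
        # Use a full path: XmlDocument.Load/Save resolve relative paths against
        # the process directory, not the current PowerShell location.
        [string] $Path = "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\Template\Xml\DocIcon.xml",
        [string] $Key = "pdf",
        [string] $Value = "pdf16.gif"
    )

    # Back up the untouched original only once, so reruns never overwrite it
    if (-not (Test-Path -LiteralPath "$Path.bak")) {
        Copy-Item -LiteralPath $Path -Destination "$Path.bak"
    }

    $xml = New-Object System.Xml.XmlDocument
    $xml.Load($Path)

    $byExtension = $xml.SelectSingleNode("/DocIcons/ByExtension")
    if ($null -eq $byExtension) { throw "No <ByExtension> element found in $Path" }

    if ($null -ne $byExtension.SelectSingleNode("Mapping[@Key='$Key']")) {
        Write-Host "Mapping for '$Key' already exists"
        return $false
    }

    $mapping = $xml.CreateElement("Mapping")
    $mapping.SetAttribute("Key", $Key)
    $mapping.SetAttribute("Value", $Value)

    # Find the first existing mapping whose key sorts after ours (case-insensitive)
    $next = $null
    foreach ($existing in $byExtension.SelectNodes("Mapping")) {
        if ([string]::Compare($existing.GetAttribute("Key"), $Key, $true) -gt 0) {
            $next = $existing
            break
        }
    }

    if ($null -ne $next) {
        [void] $byExtension.InsertBefore($mapping, $next)
    } else {
        [void] $byExtension.AppendChild($mapping)
    }

    $xml.Save($Path)
    return $true
}
```

Running `Add-DocIconMapping` with no arguments targets the default 14-hive path; like the icon copy, it needs an elevated prompt.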

7. PowerShell Script:  For steps D, E and F we will use the following PowerShell Script from Thierry Buisson:

################################
# Thierry BUISSON
# http://www.thierrybuisson.fr
#
# Activate pdf extension for Foundation 2010 Search
# source http://support.microsoft.com/kb/2518465
################################

function AddExtension([string] $extension){

    # A [string] parameter becomes "" rather than $null, so test for empty
    if ([string]::IsNullOrEmpty($extension)) {
        Write-Host "No extension specified"
    }
    else{
        Write-Host "Activating extension $extension"

        # COM object for the SharePoint Foundation search gatherer (service SPSearch4)
        $gadmin = New-Object -ComObject "SPSearch4.GatherMgr.1" -Strict

        foreach ($application in $gadmin.GatherApplications)
        {
            # $() is needed to expand a property inside a string
            Write-Host "application name is $($application.Name)"
            foreach ($project in $application.GatherProjects)
            {
                # List the current extensions, then add the new one
                Write-Host $project.Gather.Extensions
                $project.Gather.Extensions.Add($extension)
            }
        }
    }
}

function AddPdfRegKey(){

    $pdfKey = "HKLM:\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf"

    $pdfguid = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

    if (Test-Path $pdfKey) {
        Write-Host "Pdf registry key already exists"

        # Print the existing values of the key
        $key = Get-Item $pdfKey
        $values = Get-ItemProperty $key.PSPath
        foreach ($value in $key.Property) { $value + "=" + $values.$value }
    }
    else {
        Write-Host "creating key $pdfKey"

        #create key
        New-Item -Path $pdfKey

        #Set default value to the PDF IFilter GUID
        $defaultKeyName = "(default)"
        Set-ItemProperty -Path $pdfKey -Name $defaultKeyName -Value $pdfguid
    }
}

# Create the registry key, then register the extension with the gatherer
AddPdfRegKey
AddExtension "pdf"

# Restart the SharePoint Foundation Search service so the changes take effect
& net stop SPSearch4
& net start SPSearch4
 
8. Step D - Modify the Registry setting (PowerShell)

To make the necessary registry entries, the script runs the "AddPdfRegKey" function.
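After the script has run, you can confirm the registry change. This is a minimal sketch that checks that the .pdf key exists and that its default value is the GUID the script writes; the key path and GUID are copied from the script above, and Test-PdfFilterRegistration is a hypothetical helper name.

```powershell
# Hypothetical helper: verify the .pdf extension key created by AddPdfRegKey.
function Test-PdfFilterRegistration {
    param(
        [string] $KeyPath = "HKLM:\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
        [string] $ExpectedGuid = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"
    )

    if (-not (Test-Path -LiteralPath $KeyPath)) {
        Write-Host "Key not found: $KeyPath"
        return $false
    }

    # The registry provider exposes a key's default value as the '(default)' property
    $default = (Get-ItemProperty -LiteralPath $KeyPath).'(default)'
    # -ne on strings is case-insensitive, so GUID letter case does not matter
    if ($default -ne $ExpectedGuid) {
        Write-Host "Unexpected default value: $default"
        return $false
    }

    Write-Host "PDF extension key is registered with $default"
    return $true
}
```

On a configured search server, `Test-PdfFilterRegistration` should return True; a False result means the AddPdfRegKey step did not take effect.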

9. Step E - Add the extension (PowerShell)

To add the extension the script will run the "AddExtension" function. 
 

10. Step F - Stop/Start the SharePoint Search Service (PowerShell)

To stop and start the SharePoint Search Service, the script runs the "net stop SPSearch4" and "net start SPSearch4" commands.

11. Step G - Reboot the server

Note: Repeat the steps above on any additional SharePoint server that has the SharePoint Search Service running on it. (I know it is uncommon to have more than one search server, especially in Foundation, but this kind of build can be done as well.)

12. Step H - Run a full crawl

Run a full crawl on the SharePoint Search Server by using the following stsadm command (be sure you have added some PDF content to each of the web applications before running the full crawl):

stsadm -o spsearch -action fullcrawlstart

13. Step I - Run a PDF search test

Go to your sites that have PDF content and search for words that appear only inside those PDF files. If the PDFs show up in the results, the IFilter is working.

Conclusion: Installing a PDF IFilter on your SharePoint Foundation 2010 search server(s) can be challenging, but by following these steps you should be able to get it done quickly and easily.

I hope that helps!

Tom