iSearch V1.9c
Modified by Ambient Solutions
-------------

iSearch homepage: http://www.ambientsolutions.com/searchnow.html


COPYRIGHT/LICENSE NOTICE
------------------------

Copyright 2002-2003 Ian Willis and Ambient Solutions. All rights reserved.

Use of this script by companies or on commercial web sites requires a
commercial license. For commercial license details and costs, please visit:

    http://www.ambientsolutions.com

This script may be edited/changed as long as you don't remove any copyright
statements and "Powered By iSearch" messages.

By using this script you agree to take full responsibility for it. Ian Willis
is in no way accountable for any damage caused.

Reselling or distributing this code without prior written consent is
expressly forbidden.

If you have any questions about this copyright or license, please contact
info@ambientsolutions.com


INTRODUCTION
------------

iSearch is a tool that allows visitors to a website to search the contents of
the site. Unlike other such tools, its spidering engine is written in PHP, so
no binaries need to be run on the server to generate the search index.

iSearch records the following data from the HTML head section of each page:

    page title

In addition, all words from the body are put into the search index.

iSearch performs simple page match scoring. Keywords score highly, and some
words (those in headings) are given higher relevance in search scoring.


REQUIREMENTS
------------

iSearch has the following requirements:

1. A server that supports PHP4. This should include file operations on URLs
   (i.e. the allow_url_fopen option enabled). Since version 1.9b, iSearch
   automatically selects an alternative file reading method if
   allow_url_fopen is not enabled.
2. A server that supports MySQL.


UPGRADE
-------

To upgrade from a previous version you must reset the URL tables. Once
installed, click the "Reset URL Index" button on the configuration page, then
the "Spider" button.


INSTALLATION AND SUPPORT
------------------------

Please call Ambient Solutions: 0870 1995415


ACKNOWLEDGEMENTS
----------------

Thank you to the following people for their contributions in reporting
problems and suggesting improvements to iSearch:

    Jason Cumberland
    Joel Stanford
    Heinrich


REVISION HISTORY
----------------

1.0 - 27th August, 2002
  - First release.

1.1 - 2nd September, 2002
  - Fixed problem initialising tables when first installed.
  - Made maximum file size a configuration variable.

1.2 - 16th September, 2002
  - Changed lock timeout from 1 hour to 10 minutes. Added a configuration
    option for this.
  - Improved partial page exclusion to allow sections to be specifically
    excluded as well as specifically included.
  - Tested and improved the site map feature. Site maps can now be generated
    automatically.
  - Fixed bug with "../.." relative references.
  - Fixed problem for some users when sorting results.
  - Fixed bug with file extension extraction from URLs containing a question
    mark.
  - URLs without a file name extension are always allowed and are treated as
    directories.
  - URLs that are not found are correctly removed from the table.

1.3 - 20th November, 2002
  - Added user help, displayed in a popup window.
  - Limited the number of search matches displayed on one page. More than 10
    (configurable) matches are now spread over several pages.
  - HTML comments are stripped before other tags, to prevent problems with
    some JavaScript which confused the PHP strip_tags function.
  - Added "Internal" password protection mechanism to the admin pages, using
    a cookie-based password.
  - Made web server directory protection easier to set up by generating the
    .htaccess file contents.
  - Made spidering run in an auto-refreshing popup window. This eliminates
    problems with servers timing out when indexing a large number of pages,
    and also allows the administrator to see the spidering progress.

1.4 - 19th December, 2002
  - Added the missing help.php file to the distribution.
  - Added experimental frame-following support.
  - Added better checking of URL type.
  - Fixed bug which prevented any pages being indexed when the Exclude URL
    list is empty.

1.5 - 21st January, 2003
  - Added exact string matching and bracketed search expressions.
  - Made max file size and number of results per page configurable from the
    admin page.
  - Added a feature to follow URLs in frame sets.
  - Added Google-style results: the description is extracted by looking for
    search words in the document body text.
  - Allow found documents to be opened in a different frame, configurable from
    the admin page.
  - Modified the search form include file to allow the target frame for search
    results to be set.
  - Added a separate log for searches and the ability to automatically email
    the log to the administrator.
  - Added log file viewing and clearing to the admin page.
  - Added more "breaking characters" to the list.
  - Fixed bug that caused lowercase meta tag names to be ignored.
  - Fixed problem with queries containing "NOT" operators.
  - Directories are always indexed.
  - Added highlighting of searched words in the results display.
  - Fixed web page caching problem that sometimes prevented spidering from
    happening correctly.
  - Added comments to all configuration options on the admin page.

1.6 - 17th March 2003
  - Correctly reads GET and POST variables.
  - Added language selection option.
  - Added style selection option, with CSS for setting style options.
  - Added configuration "Wizard" mode.
  - Added ability to keep a cached copy of each page that is spidered.
  - Added an option to remove GET variables from URLs before storing them in
    the search index. This is useful for stripping session variables, such as
    PHPSESSID.
  - Default setup does not use index locks.
  - Accented characters are now stored in the search index.
  - Store and display the size of HTML files in the search index.
  - Now replaces nbsp characters with a space.
  - Fixed bug if more than 1 tag appeared in a document.
  - All matching words are translated from HTML special characters to plain
    ASCII.
  - Fixed a bug caused by empty lines in the Allowed URLs list.
  - Removed erroneous quotes around the META REFRESH tag in reindex.php.

1.7 - 21 Mar 2003
  - Fixed bug if there are single quotes in the page title or description.
  - Added more logging when checking new URLs.
  - Fixed a bug with Wizard mode when the start URL referred to a directory
    but did not have a trailing slash.
  - Fixed a problem following links.

1.8 - 21 May 2003
  - Main include file split into 3 parts: the core, the spider engine and the
    search engine.
  - Auto-spider no longer uses locks when they are disabled in the
    configuration.
  - Added a more robust replacement for getting GET and POST variables.
  - Added header and footer files, for easier customisation.
  - Fixed problems with accented characters.
  - Fixed "Warning: ereg_replace() [function.ereg-replace]: REG_ERANGE"
    messages.
  - Added spidering support for tags.
  - A tag is inserted into cached pages without one.
  - Added "Allow URL(s) beginning" and "Disallow URL(s) beginning"
    configuration options.
  - Added ability to add/strip www subdomains.
  - Added ability to strip default file names.
  - Added detection of pages not changing, to improve spidering times.
  - Added removal of duplicate pages from the search index.
  - Fixed: some Wizard mode configuration options could not be saved (they
    could be saved from Advanced mode).
  - Added php_info button to the configuration menu to show the PHP
    configuration options for the server.
  - Added maximum execution time configuration option.
  - Added robots.txt parsing.
  - Improved results scoring algorithm.
  - Fixed some mistakes in the German language file.
  - Added support for the "noarchive" robots meta tag data.
  - Fixed problems with matches on keywords or description only.
  - Fixed case sensitivity problem on tag.
  - Fixed a problem with stripping GET variables from URLs containing &
    characters.
  - Fixed a problem with site map generation when a page contained a link to
    itself.

1.9 - 9th July 2003
  - Added Internet Search feature.
  - Allow the default operator to be changed.
  - Allow the option of letting visitors override the default operator.
  - Logs are now stored in separate tables to improve performance when logs
    get large.
  - Fixed bug with accented characters and multiple results pages.
  - Added workaround for PHP strip_tags bug.
  - Fixed bug with ' character in meta tags.
  - Added performance improvement to link processing.
  - Fixed bug with HTML encoding of URLs.
  - Fixed bug with admin page not including the search include file.
  - Fixed admin bugs when register_globals is turned off.
  - Dutch language file contributed.
  - Fix for fread bug when using PHP 4.3.2.

1.9a - 10th July 2003
  - Fixes for the following bugs introduced in version 1.9:
      - Parse error in isearch_form.inc.php
      - Error in robots.txt code
      - Error in search log code regarding mailing of the log file

1.9b - 29th August, 2003
  - Fixes for the following bugs:
  - Use an explicit database connection in all MySQL queries.
  - Added fix for forms that are not UTF-8 encoded. A new hidden value tells
    iSearch whether a search form is on a UTF-8 encoded page or not.
  - A page that had not changed used to be removed when respidering from the
    advanced configuration page.
  - Fixed bug causing an infinite loop in the frame handling code.
  - Fixed link finding in nested frames.
  - Fixed a bug which caused some versions of PHP to find no links on any
    page.
  - Fixed file extension detection code. Previously a directory name could be
    used as a file extension.
  - Fixed problems with starting the reindex.php page.
  - Added support for Basic HTTP Authorisation (.htaccess protected files).
  - Added the characters .,-@ (dot, comma, dash and at) to the set of
    characters that are indexed.
  - Added scoring criteria to the isearch_config.inc.php file, to allow the
    scoring system to be tweaked for individual site requirements.
  - Added a limit on the maximum number of matching pages that will be
    returned.
  - Added Romanian language definitions, provided by Liviu Rau.
  - Automatically selects an alternative file reading method if the
    allow_url_fopen php.ini option is not enabled.

1.9c - 2nd October, 2003
  - Fixes for the following bugs:
  - Default operator configuration option did not work.
  - Fixed a parsing problem with old versions of PHP.
  - Fixed warnings when no search words were entered (e.g. by searching for
    just double quotes).
  - Fixed previous and next page links with accented characters.
  - Really fixed the bug that caused pages to be removed when respidering from
    the advanced configuration page if the page had not changed.
  - Fixed a bug with relative references from URLs containing a "?".
  - Fixed some problems with the admin page for sites with register_globals
    set Off.
  - Fixed problem with reindex.php for versions of PHP without $_GET support
    (versions prior to 4.1.0).
  - Allow the character set of the results page to be changed.
  - Added support for 16-bit character sets.
  - isearch_find now returns the number of matches found.
  - reindex.php now detects whether it is being run from the command line
    (e.g. as a cron task) and, if so, spiders the whole site.
  - Added Russian and Spanish languages.
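
The simple page match scoring described in the introduction (keywords score
highly, heading words get higher relevance than ordinary body words) can be
illustrated with a short sketch. This is not iSearch's own code: iSearch is
written in PHP, and its real scoring criteria live in isearch_config.inc.php.
The sketch assumes each page has already been split into title, keywords,
heading and body text; the names Page, tokenize, score and WEIGHTS, and all
the weight values, are hypothetical placeholders.

```python
import re
from dataclasses import dataclass

# Hypothetical per-field weights (placeholders, NOT iSearch's real values).
# Keywords score highest and heading words outrank plain body text,
# mirroring the behaviour described in the introduction.
WEIGHTS = {"keywords": 10, "title": 5, "headings": 3, "body": 1}


@dataclass
class Page:
    """Text already extracted from one spidered page (hypothetical type)."""
    title: str = ""
    keywords: str = ""
    headings: str = ""
    body: str = ""


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def score(page: Page, query: str) -> int:
    """Sum weight * occurrences of every query word over every field."""
    words = set(tokenize(query))
    total = 0
    for field_name, weight in WEIGHTS.items():
        tokens = tokenize(getattr(page, field_name))
        total += weight * sum(1 for token in tokens if token in words)
    return total


if __name__ == "__main__":
    pages = {
        "a.html": Page(title="PHP search", keywords="search, spider",
                       body="A search tool."),
        "b.html": Page(title="Home", body="We mention search once."),
    }
    # Rank pages by descending score for the query "search".
    ranked = sorted(pages, key=lambda url: score(pages[url], "search"),
                    reverse=True)
    print(ranked)  # prints ['a.html', 'b.html']
```

With these placeholder weights, a page containing the search word in its
keywords, title and body scores 10 + 5 + 1 = 16, while a page mentioning it
once in the body scores 1, so the first page is listed first.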