
Project Description

Yioop! is a search engine written in PHP. It can be configured either as a general-purpose search engine for the whole Web or to provide search results for a specific set of URLs or domains. Yioop! can crawl pages or directly index archives such as ARC and WARC files. It supports indexing several file formats, including HTML, Atom, PDF, DOC, PPT, RTF, RSS, XML, SVG, PNG, JPG, BMP, and GIF, as well as sitemaps. The Yioop! crawler can be deployed on one or many machines. It supports one or more crawl scheduler processes, as well as multiple fetchers and mirrors. Crawling respects robots.txt, including Crawl-delay. Yioop! crawls are stored in a Web archive format that is easy to move around, so crawling can be done on one machine and the results deployed elsewhere. Yioop! supports mixing of crawls. Yioop! comes with a search front end that can be localized as desired using a GUI, which supports RTL languages. Crawls can also be managed through this GUI. Yioop! can be configured in a straightforward manner to use file caching or memcached, if available.
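As an illustration of what respecting robots.txt and Crawl-delay involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not Yioop!'s own PHP code; the user agent name `ExampleBot`, the robots.txt contents, and the example.com URLs are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this sketch.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler checks each URL before fetching it...
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/page.html")

# ...and waits at least this many seconds between requests to the host.
delay = parser.crawl_delay("ExampleBot")
```

A fetcher following this pattern would skip URLs for which `can_fetch` returns `False` and sleep for `delay` seconds between requests to the same host.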

System Requirements

System requirements are not defined.

The following is information regarding project releases and project resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2011-09-10 10:43
0.74

This release adds support for if: conditions in crawl mixes and for general searching. It improves the crawl status UI and the performance of negation in queries. It adds support for OpenSearch RSS output of search results. There are many bug fixes.
Tags: Minor

2011-08-16 06:26
0.72

This version adds support for crawling xlsx, pptx, and epub files. The HTML processor now supports the base tag. The code for the front end (not the crawler) has been changed to work in cPanel hosting environments with only PHP 5.2. Filecache support has been added in addition to memcached support. Yioop! can now filter out hosts from search results, if needed, after indexing has been done. Improvements have been made to the starting and stopping of the crawling process. A command-line query tool has been added.
Tags: Minor

2011-07-31 17:02
0.70

This version moves Yioop! from a bag-of-words index to a positional index. Proximity scores are calculated for multi-word queries. The host of an inlink is now also incorporated into ranking. An indexing plugin architecture has been created to make it easier to customize the crawl process; an example recipe plugin is included. An SEO cache view and improved test query statistics have been added. Finally, this version includes bug fixes for memcached and BMP handling.
Tags: Minor
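
To illustrate the difference a positional index makes in release 0.70, the following is a minimal sketch, not Yioop!'s actual implementation. It stores the token positions of each term per document and derives a simple proximity score from the smallest window containing all query terms. The function names, the sample documents, and the scoring formula (number of distinct terms divided by window width) are assumptions chosen for this example.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> doc_id -> list of token positions (a bag-of-words
    index would keep only the term -> doc_id part)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def smallest_window(index, doc_id, terms):
    """Width in tokens of the smallest span of doc_id containing every
    query term, or None if some term is missing from the document."""
    wanted = set(terms)
    events = []
    for term in wanted:
        positions = index.get(term, {}).get(doc_id)
        if not positions:
            return None
        events.extend((pos, term) for pos in positions)
    events.sort()

    counts = defaultdict(int)
    have, left, best = 0, 0, None
    for right_pos, term in events:
        counts[term] += 1
        if counts[term] == 1:
            have += 1
        # Shrink the window from the left while it still covers all terms.
        while have == len(wanted):
            left_pos, left_term = events[left]
            width = right_pos - left_pos + 1
            if best is None or width < best:
                best = width
            counts[left_term] -= 1
            if counts[left_term] == 0:
                have -= 1
            left += 1
    return best


def proximity_score(index, doc_id, terms):
    """Hypothetical score: 1.0 when the terms are adjacent, lower as
    they spread apart, 0.0 when a term is missing."""
    window = smallest_window(index, doc_id, terms)
    return 0.0 if window is None else len(set(terms)) / window


docs = {
    1: "open source search engine written in php",
    2: "search the web with an open engine",
}
index = build_positional_index(docs)
query = ["search", "engine"]
scores = {doc_id: proximity_score(index, doc_id, query) for doc_id in docs}
```

With these sample documents, document 1 scores higher than document 2 for the query, because "search" and "engine" are adjacent in document 1 and far apart in document 2. A bag-of-words index could not tell the two apart on this basis.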

2011-05-20 15:34
0.68

This version expands the capabilities of crawl mixing. You can now take a query and specify that the first result should come from a particular crawl, such as an open Web crawl, the second result from Wikipedia pages, the next result should be an image, and all remaining results should again come from the open Web crawl. This version also reworks the index dictionary to try to make search results appear faster. It also improves the way pages are ranked when indexing raw ARC, Open Directory RDF, or MediaWiki dumps.
Tags: Minor
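
The crawl mixing described for release 0.68 can be sketched as follows. This is only an illustration of the idea, not Yioop!'s mixing code: the rule format (a result count paired with a source name, where `None` means "all remaining results"), the function name `mix_results`, and the sample result lists are assumptions invented for this example.

```python
from itertools import islice


def mix_results(rules, sources, limit):
    """Build one result list by taking results from several crawls.

    rules   -- list of (count, source_name); count None = all remaining
    sources -- dict of source_name -> ranked results for the query
    limit   -- maximum number of results to return
    """
    # One shared iterator per source, so a source that appears twice in
    # the rules continues where it left off instead of repeating results.
    iterators = {name: iter(results) for name, results in sources.items()}
    mixed = []
    for count, name in rules:
        remaining = limit - len(mixed)
        if remaining <= 0:
            break
        take = remaining if count is None else min(count, remaining)
        mixed.extend(islice(iterators[name], take))
    return mixed


# Hypothetical ranked results for one query from three separate crawls.
sources = {
    "web": ["w1", "w2", "w3", "w4"],
    "wikipedia": ["k1", "k2"],
    "images": ["i1"],
}

# First result from the open Web crawl, second from Wikipedia, third an
# image, and everything after that from the open Web crawl again.
rules = [(1, "web"), (1, "wikipedia"), (1, "images"), (None, "web")]

page = mix_results(rules, sources, limit=6)
```

Here `page` starts with one result each from the Web crawl, Wikipedia, and the image crawl, then continues with the remaining Web results in their original order.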

2011-01-28 16:35
0.66

This version provides preliminary support for archive crawling of ARC, MediaWiki, and Open Directory RDF files. It allows re-crawls of previously created Yioop! WebArchives. It also makes it easier to add stemmers for languages other than English (for which a stemmer already exists). Finally, it fixes several bugs in indexing and improves the group-by iterator.
Tags: Minor

Project Resources