Difference between revisions of "Web Frameworks - Workbook"

From mi-linux
 
* [[Workshop - week 11]] - Demonstrations
* [[Workshop - week 12]] - Demonstrations
 
 
  
  
(0622597)
 
 
Roll your own search with Zend_Search_Lucene

Creating the index
 
 
<?php

require_once 'Zend/Feed.php';
require_once 'Zend/Search/Lucene.php';

function sanitize($input) {
    return htmlentities(strip_tags($input));
}

//create the index
$index = new Zend_Search_Lucene('/tmp/feeds_index', true);

$feeds = Array(
    'http://feeds.feedburner.com/ZendDeveloperZone',
    'http://www.planet-php.net/rss/',
    'http://www.sitepoint.com/blogs/category/php/feed/',
);

//grab each feed
foreach ($feeds as $feed) {
    $channel = Zend_Feed::import($feed);
    echo $channel->title()."\n";

    // index each item
    foreach ($channel->items as $item) {
        if ($item->link() && $item->title() && $item->description()) {
            $doc = new Zend_Search_Lucene_Document();

            $doc->addField(Zend_Search_Lucene_Field::Keyword('link',
                sanitize($item->link())));
            $doc->addField(Zend_Search_Lucene_Field::Text('title',
                sanitize($item->title())));
            $doc->addField(Zend_Search_Lucene_Field::Unstored('contents',
                sanitize($item->description())));

            echo "\tAdding: ".$item->title()."\n";
            $index->addDocument($doc);
        }
    }
}

$index->commit();
echo $index->count()." Documents indexed.\n";
 
 
Next, we specify the RSS feeds we are interested in and fetch them in a loop. Then, for each feed, we loop through the articles and index each one as a separate Zend_Search_Lucene document.
 
 
$feeds = Array(
    'http://feeds.feedburner.com/ZendDeveloperZone',
    'http://www.planet-php.net/rss/',
    'http://www.sitepoint.com/blogs/category/php/feed/',
);

//grab each feed
foreach ($feeds as $feed) {
    $channel = Zend_Feed::import($feed);
    echo $channel->title()."\n";

    // index each item
    foreach ($channel->items as $item) {
        if ($item->link() && $item->title() && $item->description()) {
            // create and index a Zend_Search_Lucene document (shown in the next listing)
        }
    }
}
 
To add a document to our index, we create the document object and specify content for the document's fields. Zend_Search_Lucene provides different ways to analyze and store fields depending on how we need to search them and return the results. In this example, for each RSS item, we want to index the link, title, and description.
 
$doc = new Zend_Search_Lucene_Document();

$doc->addField(Zend_Search_Lucene_Field::Keyword('link',
    sanitize($item->link())));
$doc->addField(Zend_Search_Lucene_Field::Text('title',
    sanitize($item->title())));
$doc->addField(Zend_Search_Lucene_Field::Unstored('contents',
    sanitize($item->description())));

echo "\tAdding: ".$item->title()."\n";
$index->addDocument($doc);
 
Field type   Stored?   Indexed?   Tokenized?   Binary?
Keyword      yes       yes        no           no
UnIndexed    yes       no         no           no
Binary       yes       no         no           yes
Text         yes       yes        yes          no
UnStored     no        yes        yes          no
 
Keyword fields are stored and indexed, meaning I can search them as well as display them back in my search results. They are not split up into separate words by tokenization. My link field is a good candidate for a Keyword because I might want to search articles by link URL, and I definitely want to display the link in the search results since the link is serving as my external identifier for the document. Enumerated database fields usually translate well to Keyword fields in Zend_Search_Lucene.
 
It's usually a good idea to store an identifier for each document that can be used as a lookup mechanism in the search results. For this example, it makes sense to use the RSS item's link. If we were building an index from an existing relational database, we would want to store the primary key of the record, and if we were indexing a file system we would probably want to store the path to the file.
 
UnIndexed fields are not searchable, but they are returned with search hits. Database timestamps, primary keys, file system paths, and other external identifiers are good candidates for UnIndexed fields.
 
Binary fields are not tokenized or indexed, but are stored for retrieval with search hits. They can be used to store any data encoded as a binary string, such as an image icon.
 
Text fields are stored, indexed, and tokenized. Text fields are appropriate for storing information like subjects and titles that need to be searchable as well as returned with search results. In my example, the title of each RSS article is indexed as a Text field.

UnStored fields are tokenized and indexed, but not stored in the index. Large amounts of text are best indexed using this type of field. Storing data creates a larger index on disk, so if you need to search but not redisplay the data, use an UnStored field. In my example, the RSS description, the main body of text, is indexed as an UnStored field. UnStored fields are particularly practical when using a Zend_Search_Lucene index in combination with a relational database. You can index large data fields with UnStored fields for searching, and retrieve them from your relational database by using a separate field as an identifier.
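The table above can also be read as a lookup: given what a field needs to do, which type fits? The following is a minimal sketch in plain PHP. It does not call Zend_Search_Lucene at all, and `chooseFieldType` and `$fieldTypes` are hypothetical names introduced only for this illustration.

```php
<?php
// Properties of each field type, transcribed from the table above.
$fieldTypes = array(
    'Keyword'   => array('stored' => true,  'indexed' => true,  'tokenized' => false, 'binary' => false),
    'UnIndexed' => array('stored' => true,  'indexed' => false, 'tokenized' => false, 'binary' => false),
    'Binary'    => array('stored' => true,  'indexed' => false, 'tokenized' => false, 'binary' => true),
    'Text'      => array('stored' => true,  'indexed' => true,  'tokenized' => true,  'binary' => false),
    'UnStored'  => array('stored' => false, 'indexed' => true,  'tokenized' => true,  'binary' => false),
);

// Hypothetical helper (not part of Zend_Search_Lucene): returns the name of
// the field type whose properties match the request, or null if none does.
function chooseFieldType(array $fieldTypes, $stored, $indexed, $tokenized, $binary = false)
{
    $wanted = array(
        'stored'    => $stored,
        'indexed'   => $indexed,
        'tokenized' => $tokenized,
        'binary'    => $binary,
    );
    foreach ($fieldTypes as $name => $properties) {
        if ($properties == $wanted) {
            return $name;
        }
    }
    return null;
}

// The three fields used in the RSS example:
echo chooseFieldType($fieldTypes, true, true, false), "\n";  // link
echo chooseFieldType($fieldTypes, true, true, true), "\n";   // title
echo chooseFieldType($fieldTypes, false, true, true), "\n";  // contents
```

Some combinations have no matching type. A field that is neither stored nor indexed, for example, would serve no purpose, and the helper returns null for it.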
 
It's also important to note that we named the field to store the description 'contents'. This is no accident. This is the field name that Zend_Search_Lucene will search by default. Internal discussion with the Framework development team is leading to the idea that Zend_Search_Lucene may break away from the Lucene norm and implement a simple way to search all fields instead of just the 'contents' field.
 
Searching the Index
 
Now that we have created a Zend_Search_Lucene index, let's put it to use by performing some searches. You can implement search on an index in just a couple dozen lines of code:
 
<?php

require_once 'Zend/Search/Lucene.php';

//open the index
$index = new Zend_Search_Lucene('/tmp/feeds_index');

$query = 'framework';

$hits = $index->find($query);

echo "Index contains ".$index->count()." documents.\n\n";
echo "Search for '".$query."' returned " .count($hits). " hits\n\n";

foreach ($hits as $hit) {
    echo $hit->title."\n";
    echo "\tScore: ".sprintf('%.2f', $hit->score)."\n";
    echo "\t".$hit->link."\n\n";
}

?>
 
Could it be any easier? We include the library, open our index, search for a term, and iterate through the result set. Note that because we used the default case-insensitive text analyzer to build the index, the search query should be lowercase.
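If the search string comes from a user, one way to meet the lowercase requirement is to normalize it before calling find(). This is only a sketch: `normalizeQuery` is a hypothetical helper, not part of Zend_Search_Lucene, and it assumes the query uses only the simple term and field:term forms shown in this article.

```php
<?php
// Hypothetical helper: trims and lowercases a raw search string before it is
// handed to the index. Everything is lowercased, so an uppercase operator
// such as AND or OR in the input would be lowercased as well.
function normalizeQuery($rawQuery)
{
    return strtolower(trim($rawQuery));
}

echo normalizeQuery('  Zend Framework '), "\n"; // prints "zend framework"
```

The result would then be passed on in place of the hard-coded string, for example `$hits = $index->find(normalizeQuery($userInput));` where `$userInput` is an assumed variable holding the submitted text.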
 
The Zend_Search_Lucene query format is powerful but simple. It's a snap to specify multiple query terms with a special syntax.
 
To search our RSS index for articles that must contain the word 'framework' in the 'contents' field:
 
$query = '+framework';
 
For articles with 'Zend' in the title:
 
$query = 'title:zend';
 
For articles containing the word 'framework' but without the word 'Zend' in the title:
 
$query = 'framework -title:zend';
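Query strings like the three above can also be assembled from lists of terms rather than typed by hand. The sketch below is plain PHP string handling and does not touch the index. `buildQuery` is a hypothetical helper introduced for illustration, and it assumes each term is already lowercase and may carry a field prefix such as `title:`.

```php
<?php
// Hypothetical helper: builds a query string from optional terms, required
// terms (prefixed with +) and prohibited terms (prefixed with -).
function buildQuery(array $optional, array $required = array(), array $prohibited = array())
{
    $parts = $optional;
    foreach ($required as $term) {
        $parts[] = '+' . $term;
    }
    foreach ($prohibited as $term) {
        $parts[] = '-' . $term;
    }
    return implode(' ', $parts);
}

echo buildQuery(array(), array('framework')), "\n";                       // prints "+framework"
echo buildQuery(array('title:zend')), "\n";                               // prints "title:zend"
echo buildQuery(array('framework'), array(), array('title:zend')), "\n";  // prints "framework -title:zend"
```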
 
Conclusion
 
In these simple examples, we have seen that the Zend_Search_Lucene module provides an easy way to add customized search functionality to any PHP application without a dependence on external software packages. As the Zend_Search_Lucene module matures, it will no doubt prove to be a prized component of the Zend Framework. In future articles I hope to explore advanced indexing and search capabilities of Zend_Search_Lucene, and put the module through some real-life benchmarks using large data sets, comparing indexing and search performance against some other currently popular methods.
 

Revision as of 18:48, 2 April 2009


Workshop schedule:


Useful information:

  • Binay Randhawa (0719961): CRUD functions (add, edit, delete) - http://weierophinney.net/matthew/uploads/2007-02-28-FrameworkPresentation.pdf


Some useful links I have found (0610970)

http://www.developertutorials.com/tutorials/php/zend-framwork-tutorial-8-08-13/page1.html

http://blog.astrumfutura.com/archives/353-An-Example-Zend-Framework-Blog-Application-Part-2-The-MVC-Application-Architecture.html - detailed information about the MVC pattern

http://zendguru.wordpress.com/category/zend-framework/ - explanation of Zend forms

http://www.killerphp.com/zend-framework/videos/ - video tutorial about MVC pattern

http://webdeveloper.econsultant.com/ajax-demos-examples-code-samples/ - ajax tutorial

http://ajbrown.org/blog/2009/01/04/automated-testing-using-zend-framework-part-1.html - Zend automated testing information

(0707202) Useful link about Automatic testing of MVC applications created with Zend Framework - http://www.alexatnet.com/node/12