Assign 'tags' to Lucene documents
- Getting Started With Lucene.NET
- Searching and more detail on Documents Fields
- Advanced Queries with Lucene.NET
- Case Sensitivity in Lucene.NET Searches
- Faceted searches with Lucene.NET
One of the good aspects of working with Lucene.NET is that it is really similar to a NoSQL database, because it lets you store “documents”, where a document is a generic collection of fields. Lucene can store not only textual fields but also numeric fields, which opens up interesting scenarios because you are not limited to storing and searching text. Suppose you want to categorize all the posts of a blog, where each post can have one or more tags and a pertinence value associated with each tag. The technique used to determine which tags to associate with a blog post is not the subject of this discussion; what I need is only a technical way in Lucene.NET to add tags with an integer value to a document and issue queries on them. For the sake of this discussion, we can say that the blog user chooses one or more tag words to associate with the post and gives each one a value from 1 to 10 to indicate how pertinent the tag is to the post. Adding tags to the document that represents a post takes only a few lines of code.
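A minimal sketch of that step could look like the following. It is an illustration, not the original listing of this post, and it assumes the Lucene.NET 2.9-era API (`NumericField`, `Field.Index.ANALYZED`). The class name `TaggedPostIndexer`, the method names, the `title` and `content` field names, and the sample pertinence values are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical helper, for illustration only.
public static class TaggedPostIndexer
{
    // Adds one blog post to the index. Each tag becomes its own numeric
    // field: the field name is the tag, the value is the pertinence (1 to 10).
    public static void AddPost(IndexWriter writer, string title, string content,
                               IDictionary<string, int> tags)
    {
        var document = new Document();
        document.Add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        document.Add(new Field("content", content, Field.Store.NO, Field.Index.ANALYZED));

        foreach (KeyValuePair<string, int> tag in tags)
        {
            // Passing true as the third argument indexes the value, so it can
            // later be matched by a NumericRangeQuery.
            var tagField = new NumericField(tag.Key, Field.Store.YES, true);
            tagField.SetIntValue(tag.Value);
            document.Add(tagField);
        }

        writer.AddDocument(document);
    }

    // Illustrative call with assumed values: a post strongly related to ORM and CQRS.
    public static void AddSamplePost(IndexWriter writer)
    {
        AddPost(writer, "Sample title", "Sample content",
                new Dictionary<string, int> { { "orm", 8 }, { "cqrs", 9 } });
    }
}
```

The writer is passed in rather than created here, so the sketch stays independent of how the index directory and analyzer are configured.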
The snippet above states that this blog post is strongly related to ORM and CQRS. The important point is that each document can carry a different set of fields, because a document is schemaless, just as in NoSQL databases. You can now query the index for a tag and a range of pertinence values.
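As a hedged sketch of such a query, assuming the tag was indexed as a numeric field named after the tag, a range query on that field is enough. The class and method names here (`TagQueries`, `FindByTag`) are hypothetical, and the searcher is supplied by the caller.

```csharp
using Lucene.Net.Search;

// Hypothetical helper, for illustration only.
public static class TagQueries
{
    // Returns the posts whose tag field holds a value in
    // [minPertinence, maxPertinence], both bounds included.
    public static TopDocs FindByTag(IndexSearcher searcher, string tag,
                                    int minPertinence, int maxPertinence,
                                    int maxResults)
    {
        Query query = NumericRangeQuery.NewIntRange(
            tag, minPertinence, maxPertinence, true, true);
        return searcher.Search(query, maxResults);
    }
}
```

Calling `TagQueries.FindByTag(searcher, "orm", 5, 10, 20)` would ask for up to 20 posts tagged “orm” with a pertinence between 5 and 10.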
This query retrieves all the documents that have an associated tag named “orm” with a pertinence value in the range [5 to 10]. You can of course compose queries to express more complex criteria, for example: all posts that are pertinent to orm with a value from 5 to 10 and pertinent to cqrs with a value from 1 to 10, and so on.
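The composed criterion just described can be sketched as below. This assumes the Lucene.NET 2.9-style `BooleanClause.Occur` naming used in this post; later ports expose the same values as `Occur.MUST` and `Occur.SHOULD`. The class and method names are hypothetical.

```csharp
using Lucene.Net.Search;

// Hypothetical helper, for illustration only.
public static class CombinedTagQueries
{
    // Posts pertinent to orm (5 to 10) AND pertinent to cqrs (1 to 10).
    public static Query OrmAndCqrs()
    {
        var query = new BooleanQuery();

        // MUST means the clause is required, so two MUST clauses act as a logical AND.
        query.Add(NumericRangeQuery.NewIntRange("orm", 5, 10, true, true),
                  BooleanClause.Occur.MUST);
        query.Add(NumericRangeQuery.NewIntRange("cqrs", 1, 10, true, true),
                  BooleanClause.Occur.MUST);

        return query;
    }
}
```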
As you can see, it is really simple to build a BooleanQuery, using BooleanClause.Occur.MUST to combine clauses with a logical AND or BooleanClause.Occur.SHOULD to combine them with a logical OR. To make everything simpler, you can inherit from QueryParser to build a specialized parser for the tags of your blog.
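One possible shape for such a parser is sketched here, under stated assumptions: the class name `TagQueryParser` is hypothetical, `title` and `content` are taken as the only standard fields, and the override targets `QueryParser.GetRangeQuery`. The access modifier of that method differs between Lucene.NET ports, so `protected` may need to become `public` depending on the version you reference.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical parser, for illustration only.
public class TagQueryParser : QueryParser
{
    // Fields handled by the stock parser; any other field name is treated as a tag.
    private static readonly HashSet<string> StandardFields =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "title", "content" };

    public TagQueryParser(Version matchVersion, string defaultField, Analyzer analyzer)
        : base(matchVersion, defaultField, analyzer)
    {
    }

    // Assumption: change "protected" to "public" if your Lucene.NET version
    // declares GetRangeQuery as public.
    protected override Query GetRangeQuery(string field, string part1,
                                           string part2, bool inclusive)
    {
        if (StandardFields.Contains(field))
        {
            // Standard text field: keep the default range behaviour.
            return base.GetRangeQuery(field, part1, part2, inclusive);
        }

        // Tag field: interpret both bounds as integers and query the numeric field.
        int min = int.Parse(part1, CultureInfo.InvariantCulture);
        int max = int.Parse(part2, CultureInfo.InvariantCulture);
        return NumericRangeQuery.NewIntRange(field, min, max, inclusive, inclusive);
    }
}
```

This sketch does no validation: a range on a tag field with non-numeric bounds makes `int.Parse` throw, which a real parser would probably want to turn into a parse error.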
The logic is really simple: if the field is one of the standard fields (title or content in this example), the parser simply uses the basic QueryParser capability. Every field that is not a standard field of the document is by convention a tag and generates a NumericRangeQuery, so you can issue a query like “NHibernate cqrs:[5 TO 10]” to find all posts that contain the word nhibernate and also have an associated cqrs tag whose value is from 5 to 10.
Alk.