(About Lucene

Apache Lucene is an open source search engine library that makes it easy to add full-text search to Java software. Lucene's most important job is to index every word of a document; searching the index is far more efficient than traditional word-by-word comparison. Lucene provides a set of APIs for parsing, filtering, and analyzing documents and for building and using indexes. Besides being efficient and simple, its greatest strength is that users can customize its behavior at any time to suit their needs.

The function of Lucene

Lucene is a subproject of the Apache Software Foundation [4] Jakarta project group. It is an open source [5] full-text search engine toolkit: not a complete full-text search engine, but a full-text search engine architecture that provides a complete query engine and indexing engine, plus part of a text analysis engine (for two Western languages, English and German). Lucene aims to give software developers a simple, easy-to-use toolkit for implementing full-text retrieval in their target systems, or for building a complete full-text search engine on top of it.

The contributors to Lucene

Lucene's original author is Doug Cutting, a veteran full-text indexing and retrieval expert. He was once the lead developer of the V-Twin search engine [6] and later served as a senior system architect at Excite [7]; he is currently engaged in research on the underlying architecture of the Internet. Lucene was first published on the author's own site (lucene/), later on SourceForge [8], and became a subproject of the Apache Software Foundation's Jakarta project in late 2001: jakarta.apache.org/lucene/.

The uses, characteristics, and advantages of Lucene

Since its inception as an open source project, Lucene has drawn a strong response from the open source community.
Programmers use it not only to build dedicated full-text search applications but also to integrate it into various kinds of system software and Web applications; even some commercial software has adopted Lucene as the core of its internal full-text retrieval subsystem. The Apache Software Foundation website uses Lucene as its full-text search engine, version 2.1 of IBM's open source software Eclipse [9] uses Lucene as the full-text index engine of its help subsystem, and the corresponding IBM commercial software WebSphere [10] has adopted Lucene as well. Thanks to its open source nature, excellent index structure, and sound system architecture, Lucene is being used more and more widely.

Lucene's outstanding strengths as a full-text search engine:

(1) The index file format is independent of the application that uses it. Lucene defines an index file format based on 8-bit bytes, so that compatible systems and different applications can share the same index files.
(2) Building on the inverted index of traditional full-text search engines, Lucene implements segmented (block) indexing: it builds a small index for new files, which speeds up indexing, and then merges it with the existing index to optimize the result.
(3) The excellent object-oriented system architecture lowers the learning curve for extending Lucene and makes it easy to add new functionality.
(4) Lucene defines a text analysis interface that is independent of language and file format. The indexer creates index files by accepting a Token stream, so to support a new language or file format users only need to implement the text analysis interface.
(5) A powerful query engine is implemented by default. Users can give their system powerful query capabilities without writing their own code.
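The ideas in points (2) and (4) above can be sketched in plain Java. This is only a toy illustration under stated assumptions, not Lucene's actual code, API, or index file format: the class name SegmentIndexSketch and all of its methods are hypothetical, and the tokenizer is a simple stand-in for a real text analyzer. Each new document gets its own small index, which is then merged into the main inverted index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index with per-document segments (hypothetical, not Lucene's API). */
class SegmentIndexSketch {

    /** Main index: term -> sorted ids of the documents that contain the term. */
    private final Map<String, TreeSet<Integer>> mainIndex = new TreeMap<>();

    /** Stand-in analyzer: lowercases the text and splits on non-letter, non-digit runs. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Builds a small, independent index (a "segment") for one new document. */
    static Map<String, TreeSet<Integer>> buildSegment(int docId, String text) {
        Map<String, TreeSet<Integer>> segment = new TreeMap<>();
        for (String token : tokenize(text)) {
            segment.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        return segment;
    }

    /** Merges a segment into the main index by uniting the posting lists term by term. */
    void merge(Map<String, TreeSet<Integer>> segment) {
        for (Map.Entry<String, TreeSet<Integer>> e : segment.entrySet()) {
            mainIndex.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
        }
    }

    /** Returns a copy of the ids of documents containing the term (empty if none). */
    Set<Integer> search(String term) {
        TreeSet<Integer> postings = mainIndex.get(term.toLowerCase(Locale.ROOT));
        return postings == null ? new TreeSet<>() : new TreeSet<>(postings);
    }

    /** Boolean AND: ids of documents that contain every given term. */
    Set<Integer> searchAll(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> hits = search(term);
            if (result == null) {
                result = hits;
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

For example, merging buildSegment(1, "Lucene builds an inverted index") and buildSegment(2, "The index is merged later") into one SegmentIndexSketch makes search("index") return documents 1 and 2, while searchAll("inverted", "index") returns only document 1.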
Lucene's query implementation provides Boolean operations, fuzzy queries (FuzzySearch [11]), grouped queries, and more by default.

The prospects of Lucene

Compared with the commercial full-text search engines that already exist, Lucene has considerable advantages.

First is its open source distribution model (under the Apache Software License [12]). On this basis, programmers can not only make full use of the powerful features Lucene provides but also study full-text search engine implementation techniques and object-oriented programming practice in depth, and then write a better full-text search engine tailored to their actual application. In this respect, commercial software is far less flexible than Lucene.

Second, Lucene inherits the open source tradition of good architecture and has a sound, highly extensible object-oriented design. Programmers can extend Lucene in various ways, for example by adding Chinese text processing or by extending from plain text to formats such as HTML and PDF [13]. Writing these extensions is not complicated, and because Lucene abstracts system devices appropriately, the extensions can easily be made cross-platform.

Finally, after its transfer to the Apache Software Foundation, with the help of the foundation's network platform, programmers can easily and)