Java: Lucene Document Classes

Monday, April 2, 2018

Lucene Document Classes

Introduction
A Document consists of named fields. After a Document object is created, fields holding the data are added to it. The Document is then indexed with an IndexWriter. The IndexWriter holds an Analyzer, which it uses to tokenize the document's text fields as it indexes them.
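The flow described above can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation. It assumes Lucene 8.x or 9.x on the classpath; the class name IndexingFlowSketch, the method indexAndCount and the field name "body" are made up for illustration, and an in-memory ByteBuffersDirectory stands in for a real index directory.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical class name, used only for this sketch.
public class IndexingFlowSketch {

  // Indexes a single document and returns how many documents contain
  // the given term in the "body" field.
  static int indexAndCount(String body, String term) throws IOException {
    try (Directory directory = new ByteBuffersDirectory();
         StandardAnalyzer analyzer = new StandardAnalyzer()) {
      // The IndexWriter uses the Analyzer to tokenize text fields.
      try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("body", body, Field.Store.YES));
        writer.addDocument(doc);
      }
      // Open the index that was just written and run an exact term lookup.
      try (DirectoryReader reader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        return searcher.search(new TermQuery(new Term("body", term)), 10).scoreDocs.length;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(indexAndCount("Lucene indexes documents", "lucene"));
  }
}
```

StandardAnalyzer lowercases tokens, so the term passed to a TermQuery has to be given in lowercase to match.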

Document Class
Introduction
We include the following line.

import org.apache.lucene.document.Document;

The description is as follows.

A document is a collection of fields

constructor
We do it as follows.

Document doc = new Document();

add method
Example
We do it as follows.

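Document doc = new Document();
String text = "This is the text to be indexed.";
// StringField indexes the whole value as a single token, without analysis.
// For full-text search on individual words, TextField is the usual choice.
doc.add(new StringField("fieldname", text, Field.Store.NO));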

Example

We do it as follows.

private static void indexDoc(IndexWriter writer, String name, String content) 
        throws IOException {
  Document document = new Document();
  document.add(new TextField("name", name, Field.Store.YES));
  document.add(new TextField("body", content, Field.Store.YES));

  writer.addDocument(document);
}

Field Class
Introduction
We include the following line.

import org.apache.lucene.document.Field;

The description is as follows.

Field is the lowest unit or the starting point of the indexing process. It represents the key value pair relationship where a key is used to identify the value to be indexed.

constructor
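The signature is as follows. This constructor belongs to the older API: it has been deprecated since Lucene 4, and Field.Index is no longer available in recent major versions, where a FieldType is passed instead.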

Field(String name, String value, Store store, Index index)

For fields that are stored but not indexed, the explanation is as follows.

Stored fields that are not indexed are useful to store meta-data about a document that the user won't use to query the index. An example might be a database id where a document comes from. This id will never be used by the user since they do not know about it, so it is generally useless to index it. But if you store it, you can use it to gather extra information from your db at runtime.

Example
We do it as follows.

// Field.Index.ANALYZED also indexes the id; Field.Index.NO would give a stored-only field.
new Field("id", "1", Field.Store.YES, Field.Index.ANALYZED);
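With the newer API, a stored-but-not-indexed value such as a database id can be expressed with StoredField. The following is a minimal sketch under assumptions: Lucene 8.x or 9.x, and the class name StoredOnlyFieldSketch, the method findDbId and the field names "db_id" and "body" are hypothetical. It shows that the id comes back with a hit but cannot be searched on.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical class name, used only for this sketch.
public class StoredOnlyFieldSketch {

  // Searches the given field for a term and returns the stored "db_id"
  // of the first hit, or null when nothing matches.
  static String findDbId(String field, String term) throws IOException {
    try (Directory directory = new ByteBuffersDirectory();
         StandardAnalyzer analyzer = new StandardAnalyzer()) {
      try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        // Stored only: returned with hits, never searchable.
        doc.add(new StoredField("db_id", "42"));
        // Indexed only: searchable, not returned with hits.
        doc.add(new TextField("body", "lucene stores fields", Field.Store.NO));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        ScoreDoc[] hits = searcher.search(new TermQuery(new Term(field, term)), 1).scoreDocs;
        // IndexSearcher.doc(int) is assumed available (Lucene 9.x and earlier).
        return hits.length == 0 ? null : searcher.doc(hits[0].doc).get("db_id");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(findDbId("body", "lucene")); // found through the indexed field
    System.out.println(findDbId("db_id", "42"));    // no hit: db_id is not indexed
  }
}
```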

constructor - key + value + FieldType
Suppose we have the following code.

String stringToProcess = "...";
FieldType ft = new FieldType(TextField.TYPE_STORED);
ft.setOmitNorms(false);
ft.setStoreTermVectors(true);

We do it as follows.

Field f = new Field("content", stringToProcess, ft);
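To see what such a FieldType actually configures, its flags can be read back from the Field. A minimal sketch, assuming Lucene 8.x or 9.x; the class name FieldTypeSketch and the helper contentField are hypothetical. The copy constructor is used because the shared TextField.TYPE_STORED constant is frozen and cannot be modified directly.

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

// Hypothetical class name, used only for this sketch.
public class FieldTypeSketch {

  // Builds a stored, tokenized field that also keeps term vectors.
  static Field contentField(String text) {
    // Copy the predefined type; the original constant is frozen.
    FieldType ft = new FieldType(TextField.TYPE_STORED);
    ft.setStoreTermVectors(true);
    // Freezing guards against accidental changes after the type is in use.
    ft.freeze();
    return new Field("content", text, ft);
  }

  public static void main(String[] args) {
    Field f = contentField("some text");
    System.out.println("stored=" + f.fieldType().stored());
    System.out.println("tokenized=" + f.fieldType().tokenized());
    System.out.println("termVectors=" + f.fieldType().storeTermVectors());
    System.out.println("omitNorms=" + f.fieldType().omitNorms());
  }
}
```

TYPE_STORED already keeps norms, so calling setOmitNorms(false) on the copy leaves that setting unchanged.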

TextField Class

We include the following line.

import org.apache.lucene.document.TextField;

It is used for fields whose text is tokenized by the analyzer and then indexed.

constructor

We do it as follows.

String cityText = ...;
Document doc = new Document();
doc.add(new TextField("city_text", cityText, Field.Store.YES));
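The effect of tokenization is easiest to see next to StringField, which indexes the value as one token. The following is a rough sketch, assuming Lucene 8.x or 9.x; the class name TokenizedVsExactSketch, the method count and the field name "city_exact" are hypothetical, and "New York" is just sample data.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical class name, used only for this sketch.
public class TokenizedVsExactSketch {

  // Indexes "New York" both ways and counts exact term matches in one field.
  static int count(String field, String term) throws IOException {
    try (Directory directory = new ByteBuffersDirectory();
         StandardAnalyzer analyzer = new StandardAnalyzer()) {
      try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        // Tokenized and lowercased by the analyzer: terms "new" and "york".
        doc.add(new TextField("city_text", "New York", Field.Store.YES));
        // Not analyzed: the single term "New York".
        doc.add(new StringField("city_exact", "New York", Field.Store.YES));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        return searcher.search(new TermQuery(new Term(field, term)), 10).scoreDocs.length;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(count("city_text", "york"));
    System.out.println(count("city_exact", "New York"));
  }
}
```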

constructor - Reader

To index the entire contents of a file, we do the following. A TextField created from a Reader is indexed but not stored.

Reader reader = new FileReader(file.getCanonicalPath());
doc.add(new TextField("contents", reader));

Example
We do it as follows.

StandardAnalyzer standardAnalyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(standardAnalyzer);
IndexWriter writer = new IndexWriter(directory, config);
Document document = new Document();

document.add(new TextField("content", new FileReader("document.txt"))); 
writer.addDocument(document);
writer.close();
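A Reader-based TextField is searchable, but its text does not come back with the matching document. The sketch below illustrates this under assumptions: Lucene 8.x or 9.x, a StringReader standing in for the file so the example has no file dependency, and hypothetical names ReaderFieldSketch and searchableButNotStored.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical class name, used only for this sketch.
public class ReaderFieldSketch {

  // Returns true when the term is found in "contents" and the matching
  // document carries no stored value for that field.
  static boolean searchableButNotStored(String text, String term) throws IOException {
    try (Directory directory = new ByteBuffersDirectory();
         StandardAnalyzer analyzer = new StandardAnalyzer()) {
      try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        // The reader is consumed while the document is being indexed.
        doc.add(new TextField("contents", new StringReader(text)));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        ScoreDoc[] hits = searcher.search(new TermQuery(new Term("contents", term)), 1).scoreDocs;
        if (hits.length == 0) {
          return false;
        }
        // IndexSearcher.doc(int) is assumed available (Lucene 9.x and earlier).
        return searcher.doc(hits[0].doc).get("contents") == null;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(searchableButNotStored("text read from a reader", "reader"));
  }
}
```

If the original text is needed at display time, one option is to read it into a String and add it with Field.Store.YES, or to keep a stored path field that points back to the file.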
