Abstract:
Archives hold large collections of historical and ancient documents that are invaluable,
since written documents have long been the most common means of sharing information.
However, searching these documents is time-consuming, and their deteriorated condition can
make them unusable. For this reason, digitization of these documents has become very
popular in recent years, but digitization alone is not sufficient to make the information
accessible, particularly in historical manuscripts. Transcribing these documents is
difficult because of poor preservation, varied writing styles, and other factors.
Keyword spotting, an information retrieval technique that identifies occurrences of a word
in document images, continues to attract researchers' interest. It is an attractive
alternative to transcription, which can be challenging, especially for historical
documents.
In this thesis, we study keyword spotting in handwritten historical documents using a
segmentation-based Query-by-Example (QbE) approach. The word images in the document are
extracted and represented by a collection of textural features. These features are then
used to match the query word image against the images in the reference base and retrieve
the relevant documents. Several textural descriptors are used to capture word shape,
including oriented Basic Image Features (oBIFs) and their column scheme at different
scales, Local Phase Quantization (LPQ), Local Binary Patterns (LBP), Local Directional
Number Pattern (LDNP), Completed Local Binary Patterns (CLBP), and Completed Robust Local
Binary Pattern (CRLBP). Likewise, multiple distance measures are examined for the matching
phase. For the experiments, we used the ICFHR-2014 Word Spotting Competition database.
Evaluated on this database, the proposed method yields promising results comparable to the
state of the art.
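
The matching step described above can be illustrated with a minimal sketch. It uses only
the basic 3x3 LBP descriptor, one of the features listed, and assumes a chi-square
distance for matching; the abstract does not name the distance measures studied, so that
choice is an assumption. The function names (lbp_histogram, chi_square, rank_words) are
hypothetical, and the word images are assumed to be already segmented 2-D grayscale
arrays of at least 3x3 pixels. This is not the thesis implementation.

```python
import numpy as np


def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 3x3 LBP codes (illustrative only)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    # Eight neighbours, clockwise from top-left; each one sets one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


def chi_square(p, q, eps=1e-10):
    """Chi-square distance between two histograms (assumed matching measure)."""
    return 0.5 * float(np.sum((p - q) ** 2 / (p + q + eps)))


def rank_words(query_img, word_imgs):
    """Return indices of word_imgs sorted from most to least similar to the query."""
    q = lbp_histogram(query_img)
    dists = [chi_square(q, lbp_histogram(w)) for w in word_imgs]
    return sorted(range(len(word_imgs)), key=lambda i: dists[i])
```

Because the descriptor is a normalized histogram, word images of different sizes can be
compared directly; the top-ranked indices are the candidate occurrences of the query word.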