Intellexer Comparator is a semantic solution developed specifically to take the hassle out of comparing content. It compares documents accurately and determines the degree of similarity between them.
Intellexer Comparator analyzes each text document and converts it into a representation that captures the document's essential meaning. In this representation, the concrete words, phrases and semantic relations between them, which initially form the term base of the document, acquire a generalized structure and meaning. As a result, information is processed at the level of the possible meanings of each word and at the level of the ideas that each sentence, and the context as a whole, may express.
The degree of proximity between documents is calculated on a scale from 0% to 100%, where 0% means "absolutely different texts" and 100% means "the same text".
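The 0-100% scale can be illustrated with a much simpler measure than the one Intellexer Comparator uses. The sketch below is an assumption for illustration only: it computes the cosine similarity of plain word-count vectors, which for non-negative counts always lies between 0 and 1, and multiplies it by 100. It is purely lexical and has none of the semantic generalization described above. The functions `WordCounts` and `SimilarityPercent` are hypothetical and are not part of the Intellexer SDK.

```cpp
#include <cctype>
#include <cmath>
#include <map>
#include <string>

// Hypothetical helper (not an Intellexer SDK function): splits text into
// lowercase alphanumeric words and counts how often each word occurs.
std::map<std::string, int> WordCounts(const std::string& text)
{
    std::map<std::string, int> counts;
    std::string word;
    for (char ch : text) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            ++counts[word];
            word.clear();
        }
    }
    if (!word.empty()) ++counts[word];
    return counts;
}

// Hypothetical lexical stand-in for a similarity score: cosine similarity of
// the two word-count vectors, scaled to the 0-100% range.
double SimilarityPercent(const std::string& a, const std::string& b)
{
    const std::map<std::string, int> ca = WordCounts(a);
    const std::map<std::string, int> cb = WordCounts(b);
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& kv : ca) {
        normA += double(kv.second) * kv.second;
        auto it = cb.find(kv.first);
        if (it != cb.end()) dot += double(kv.second) * it->second;
    }
    for (const auto& kv : cb) normB += double(kv.second) * kv.second;
    if (normA == 0.0 || normB == 0.0) return 0.0;  // a text with no words shares nothing
    return 100.0 * dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

Identical texts score 100% and texts with no words in common score 0%; anything in between depends only on shared vocabulary, which is exactly the limitation a semantic comparator aims to overcome.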
Additional tools let you reduce or increase the number of retrieved documents by setting a minimum proximity value. Intellexer Comparator also lets you preview the portions of documents that are similar in meaning.
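A minimum-proximity filter of this kind can be sketched as follows. The `Match` struct and the `FilterByMinimumProximity` function are hypothetical, and the sketch assumes each result carries a similarity value between 0.0 and 1.0; they are not Intellexer SDK types.

```cpp
#include <vector>

// Hypothetical result record (an assumption, not the SDK's result type):
// a document ID and a similarity value between 0.0 and 1.0.
struct Match {
    int id;
    double similarity;
};

// Keeps only the matches whose similarity, expressed as a percentage,
// is at least minPercent. Raising minPercent returns fewer documents;
// lowering it returns more.
std::vector<Match> FilterByMinimumProximity(const std::vector<Match>& matches, double minPercent)
{
    std::vector<Match> kept;
    for (const Match& m : matches) {
        if (100.0 * m.similarity >= minPercent) kept.push_back(m);
    }
    return kept;
}
```

For example, with similarities of 0.9, 0.5 and 0.2, a 40% minimum keeps two documents and a 60% minimum keeps one.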
Intellexer Comparator is designed to simplify document comparison and offers the following advantages to specialists in various areas of expertise:
- Effective comparison of one document to multiple versions;
- Comparison of documents of the same subject matter;
- Identification of review changes between document versions.
Intellexer Comparator is useful in a number of applications. In patent search, for example, a research-and-development engineer looking for relevant texts among millions of documents can identify one relevant document and ask the search system to find similar ones. Intellexer Comparator is also an effective solution for detecting duplicate web pages or text files.
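For duplicate detection, a common lexical baseline (not Intellexer's semantic method) is to compare the sets of overlapping word sequences, or shingles, of two texts using the Jaccard index. The sketch below assumes whitespace tokenization, 3-word shingles and a 0.9 threshold; the function names and default values are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds the set of k-word shingles (contiguous word
// sequences) of a whitespace-tokenized text. Texts shorter than k words
// become a single shingle; an empty text yields an empty set.
std::set<std::string> Shingles(const std::string& text, std::size_t k)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    std::set<std::string> shingles;
    if (words.empty()) return shingles;
    const std::size_t width = std::min(std::max<std::size_t>(k, 1), words.size());
    for (std::size_t i = 0; i + width <= words.size(); ++i) {
        std::string s = words[i];
        for (std::size_t j = 1; j < width; ++j) s += ' ' + words[i + j];
        shingles.insert(s);
    }
    return shingles;
}

// Jaccard index: size of the intersection divided by size of the union (0.0 .. 1.0).
double Jaccard(const std::set<std::string>& a, const std::set<std::string>& b)
{
    if (a.empty() && b.empty()) return 1.0;  // two empty texts count as identical
    std::size_t common = 0;
    for (const std::string& s : a) common += b.count(s);
    return double(common) / double(a.size() + b.size() - common);
}

// Hypothetical near-duplicate test with assumed default parameters.
bool IsNearDuplicate(const std::string& a, const std::string& b,
                     double threshold = 0.9, std::size_t k = 3)
{
    return Jaccard(Shingles(a, k), Shingles(b, k)) >= threshold;
}
```

Lowering the threshold flags more pairs as duplicates at the cost of more false positives; because the comparison is purely lexical, paraphrased copies are not caught.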
For developers and integrators
Intellexer Comparator can easily be integrated into custom document or knowledge management systems written in C/C++ or C#. The SDK contains all the include files and import libraries needed to link user applications with the Comparator module.
Here is a C++ example of how to add Intellexer Comparator to your application:
#include <iostream>
#include <Indexer.h>
#include <IndexManager.h>
#include <Comparator.h>
#include <ComparatorInt.h>
#include <LPXml.h>
#include <LRCore.h>

using std::cout;
using std::cerr;
using namespace NsSemSDK;

int main()
{
    try
    {
        // Provide the path to the license file for each SDK module used.
        SetComparatorLicensePath("../../ISDK_License.xml");
        SetLPXMLLicensePath("../../ISDK_License.xml");
        SetLanguageRecognizerLicensePath("../../ISDK_License.xml");

        // Load the indexer database that is required to create a comparator instance.
        // It may be shared among several comparator and indexer instances.
        CInterfacePtr<IIndexerDB> pDB(LoadIndexerDB("../../LDB", "../../LPlugins"));

        // Create the comparator instance.
        CInterfacePtr<IComparator> pComparator(CreateComparator(*pDB));

        // Open the document index using the Firebird DB provider (FBIndexDriver).
        CInterfacePtr<IIndexManager> pIndex((IIndexManager *)CreateProvider("FBIndexDriver", "CreateIndexManager"));
        pIndex->Open("../Data/Indexer/Index.FDB");
        if (pIndex->GetDocumentCount() == 0)
        {
            // The index must first be populated with the IndexerSample program.
            cerr << "Please fill the index with IndexerSample before using ComparatorSample\n";
            return 1;
        }

        // Find documents in the index that are similar to the sample file.
        CInterfacePtr<IFindSimilarResults>
            pResults(pComparator->FindSimilar("../Data/Comparator/Test.htm", *pIndex));

        // Print the search results.
        if (pResults != NULL)
        {
            for (int i = 0, nCount = pResults->GetResultCount(); i < nCount; ++i)
            {
                const FIND_SIMILAR_RESULT& result(pResults->GetResult(i));

                // Look up the stored information (such as the path) for the matched document ID.
                CInterfacePtr<IEnumDocumentInfo> pEnumDocumentsInfo;
                pIndex->GetDocumentsInfo(1, &result.nID, &pEnumDocumentsInfo);
                pEnumDocumentsInfo->Reset();
                const IDocumentInfo *pDocumentInfo = NULL;
                if (pEnumDocumentsInfo->Next(&pDocumentInfo))
                {
                    // The similarity value is multiplied by 100 to print it as a percentage.
                    cout << "Similarity between ../Data/Comparator/Test.htm and "
                         << pDocumentInfo->GetPath() << " is " << (100 * result.dSimilarity) << "%\n";
                }
            }
        }
    }
    catch (const CSemBaseException& x)
    {
        // Report SDK errors and exit with a non-zero status.
        cerr << x.what() << "\n";
        return 1;
    }
    return 0;
}
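As a result, the program finds all indexed documents that are similar to the sample text (../Data/Comparator/Test.htm) and prints the similarity of each one as a percentage.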
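The sample prints the hits in the order in which `FindSimilar` returns them. If only the closest matches are needed, they can be ranked and trimmed on the application side. The sketch below uses a hypothetical `Hit` struct in place of the SDK's result type and assumes similarity values between 0.0 and 1.0; `TopMatches` is likewise a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one search hit (an assumption, not the SDK type):
// a document ID and a similarity value between 0.0 and 1.0.
struct Hit {
    int id;
    double similarity;
};

// Returns at most topN hits, most similar first.
// std::stable_sort keeps hits with equal similarity in their original order.
std::vector<Hit> TopMatches(std::vector<Hit> hits, std::size_t topN)
{
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.similarity > b.similarity; });
    if (hits.size() > topN) hits.resize(topN);
    return hits;
}
```

Taking the vector by value lets the caller keep the unsorted original while the function sorts its own copy.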