FAQ
- How does indexing work at infotiger?
- Is there any extended search syntax?
- And how can I filter results even more?
- How can I submit my URL for indexing?
- What about the similarity search?
- How do I know that infotiger is visiting my website?
- Is infotiger reachable via TOR (.onion) networks?
How does indexing work at infotiger?
Text search engine
At the moment, infotiger is a text-only search engine covering two languages (English and German) in two separate indexes.
Pre-processing
Prior to indexing text crawled from websites, a few pre-processing steps are done. The same pre-processing is also applied to every search query, before sending the query to the index.
Tokenizing
In a first step, the "words" in a text or search query are separated, usually at characters such as " " or ".". For example, "de.wikipedia.org" would be split into the three words "de", "wikipedia", and "org".
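As an illustration, the splitting step could be sketched in Python as follows. This is a minimal sketch, not infotiger's actual tokenizer: the function name `tokenize`, the choice to split at every non-word character, and the lowercasing are all assumptions made for the example.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into tokens at whitespace and punctuation."""
    # Assumption: any run of non-word characters acts as a separator,
    # and tokens are lowercased. The real rules may differ.
    return [token for token in re.split(r"\W+", text.lower()) if token]

print(tokenize("de.wikipedia.org"))  # ['de', 'wikipedia', 'org']
```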
Stopword removal
After tokenizing, the (language-dependent) stopwords are removed, as they carry very little information. For example, "in", "have", and "do" are not indexed and are not searchable, but they still appear in the results and snippets.
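Stopword removal can be pictured as a simple filter over the token list. The sketch below uses a tiny, hypothetical English stopword set; the actual lists used for the English and German indexes are not given here.

```python
# Hypothetical stopword set for illustration only.
STOPWORDS = {"in", "have", "do", "the", "a", "of"}

def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword set."""
    return [token for token in tokens if token.lower() not in STOPWORDS]

print(remove_stopwords(["do", "cats", "live", "in", "trees"]))
# ['cats', 'live', 'trees']
```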
Stemming
Finally, stemming algorithms are applied, so that, for example, "fishing", "fished", and "fisher" are all stemmed to "fish".
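To give a feel for what stemming does, here is a deliberately naive suffix stripper. It is a toy stand-in, not the stemming algorithm infotiger uses (which is not named here); real stemmers apply many more rules and may treat individual words differently. The suffix list and the minimum stem length are assumptions.

```python
# Toy suffix list for illustration; real stemmers use far richer rule sets.
SUFFIXES = ("ing", "ed", "er")

def toy_stem(word: str) -> str:
    """Strip one known suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([toy_stem(w) for w in ["fishing", "fished", "fisher"]])
# ['fish', 'fish', 'fish']
```

Because the same step is applied to documents and queries alike, a search for any of the three forms would match pages containing the others.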
Similarity index
Besides the classical
Is there any extended search syntax?
The art of phrasing queries :)
Boolean search
By default, search terms are treated as "OR", so
Occam's razor
is identical to:
Occam's OR razor
If you want all of your search terms to appear in a page, you could try:
Occam's AND razor
or even:
(Occam's AND razor) OR (Ockhams AND Rasiermesser)
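Conceptually, OR corresponds to a union of the pages matching each term, and AND to an intersection. The following sketch shows this on a small, made-up inverted index; the index contents, page IDs, and function names are hypothetical, and the terms are assumed to be already pre-processed.

```python
# Hypothetical inverted index: term -> IDs of pages containing that term.
INDEX = {
    "occam": {1, 2, 3},
    "razor": {2, 3, 4},
    "ockham": {5},
    "rasiermesser": {5, 6},
}

def postings(term: str) -> set[int]:
    """Return the page IDs for a term, or an empty set if it is unknown."""
    return INDEX.get(term, set())

def search_or(*terms: str) -> set[int]:
    """Pages containing at least one of the terms (set union)."""
    result: set[int] = set()
    for term in terms:
        result |= postings(term)
    return result

def search_and(*terms: str) -> set[int]:
    """Pages containing all of the terms (set intersection)."""
    if not terms:
        return set()
    result = set(postings(terms[0]))
    for term in terms[1:]:
        result &= postings(term)
    return result

print(sorted(search_or("occam", "razor")))   # [1, 2, 3, 4]
print(sorted(search_and("occam", "razor")))  # [2, 3]
# Parenthesized groups combine the same way:
print(sorted(search_and("occam", "razor") | search_and("ockham", "rasiermesser")))
# [2, 3, 5]
```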
Search for full phrases
Surround your phrase with quotation marks ("):
"Latin lex parsimoniae"
Site search
You may restrict your search to a particular site:
site:de.wikipedia.org
or even search within that site:
site:de.wikipedia.org AND "Ockhams razor"
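A site restriction can be thought of as a filter on the host name of each result URL. The sketch below is one possible reading, not a description of how infotiger implements it: the function name `matches_site` is hypothetical, and treating subdomains of the given site as matches is an assumption.

```python
from urllib.parse import urlsplit

def matches_site(url: str, site: str) -> bool:
    """Check whether a URL's host equals the site or is a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower()
    # Assumption: subdomains of the requested site also count as matches.
    return host == site or host.endswith("." + site)

urls = [
    "https://de.wikipedia.org/wiki/Ockhams_Rasiermesser",
    "https://en.wikipedia.org/wiki/Occam%27s_razor",
]
print([u for u in urls if matches_site(u, "de.wikipedia.org")])
# ['https://de.wikipedia.org/wiki/Ockhams_Rasiermesser']
```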
And how can I filter results even more?
With infotiger you have the possibility to narrow down the results of your search in a variety of ways.
Filter by language
As our search index is language-dependent, you have to decide which index to search. The search language is preset to the language your browser uses for displaying web pages, which in most cases corresponds to your national language. To search the English or the German index instead, set the language for the search query in the drop-down menu.
Filter by publication date
If a web page has a valid publication date, it will be displayed together with the search result. To limit the result set to a period of time, use the corresponding drop-down menu below the input box. Please note: pages without a publication date will not be displayed when applying this filter.
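The effect of the date filter, including the dropping of undated pages, can be sketched like this. The data layout, the function name `filter_by_date`, and the "maximum age in days" parameter are assumptions for the example; the actual options in the drop-down menu may be expressed differently.

```python
from datetime import date, timedelta
from typing import Optional

# A result is a (URL, publication date) pair; the date may be missing.
Page = tuple[str, Optional[date]]

def filter_by_date(pages: list[Page], max_age_days: int, today: date) -> list[Page]:
    """Keep pages published within the last max_age_days; drop undated pages."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        (url, published)
        for url, published in pages
        if published is not None and published >= cutoff
    ]

pages = [
    ("https://example.org/new", date(2024, 5, 20)),
    ("https://example.org/old", date(2020, 1, 1)),
    ("https://example.org/undated", None),
]
recent = filter_by_date(pages, max_age_days=30, today=date(2024, 6, 1))
print([url for url, _ in recent])  # ['https://example.org/new']
```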
Filter by TigeRank
TigeRank is a ranking for web pages that assigns higher ranks to popular pages. The procedure is based on the well-known PageRank algorithm, but is not identical to it. You can, for example, restrict the results to the top 5%, so that only pages among the 5% rated highest by TigeRank are displayed.
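A "top 5%" restriction amounts to a percentile cut over the ranking scores. The sketch below only illustrates that cut; it does not compute TigeRank itself, and the scores, the function name `top_percent`, and the rounding rule (round up, so at least one page survives in a non-empty set) are assumptions.

```python
import math

def top_percent(scores: dict[str, float], percent: int) -> set[str]:
    """Return the URLs whose score is within the top `percent` of all pages."""
    # Assumption: round up, so a non-empty index always keeps at least one page.
    keep = math.ceil(len(scores) * percent / 100)
    ranked = sorted(scores, key=lambda url: scores[url], reverse=True)
    return set(ranked[:keep])

# Hypothetical scores for 20 pages; page19 has the highest score.
scores = {f"page{i}": float(i) for i in range(20)}
print(top_percent(scores, 5))  # {'page19'}
```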
What about the similarity search?
Similarity search
How can I submit my URL for indexing?
You may submit your URL or site to the infotiger index at our add URL page. Currently, only pages in English or German have a chance of making their way into the index.
How do I know that infotiger is visiting my website?
Infotiger visits web pages to index them. If you run a website yourself, read more about our web crawler here.
Is infotiger reachable via TOR (.onion) networks?
Yes, it is: infotiger4xywbfq45mvd5drh43jpqeurakg2ya7gqwvjf2bbwnixzqd.onion