Source:Webpage


GEODI can also use web pages and RSS news sources as content. Web pages vary widely in structure, so the Web Page data source offers many options to handle this variety.

Please note some important issues related to web page indexing. Indexing web pages may require intensive internet usage, and some sites may treat it as a DDoS attack and ban your IP address. In addition, the legal terms of some sites may prohibit processing or using their content outside the site. DECE's sole responsibility is to provide the software; DECE cannot be held responsible for any consequences of its use.

Conditions for connection

  1. Access to the web page

  2. The information required for verification, if the site requires it: a token, or the values asked for in the page's user verification fields


You can provide a single address or multiple addresses. Domain restriction settings apply to each address independently.

With level=0, only the given page is indexed; higher levels also follow links from that page to further pages. The level must be large enough to reach all the pages you want indexed. For paginated content, the level may need to be 1000 or more.
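To illustrate how a level limit behaves, here is a minimal sketch of a breadth-first crawl with a depth limit, run over a hypothetical in-memory link graph instead of real HTTP requests. It assumes that the level counts link hops from the start address and that domain restriction keeps only links on the start address's host. The names `crawl`, `LINKS`, and `same_domain` are illustrative only; they are not GEODI settings or APIs.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for real pages:
# each URL maps to the links found on that page.
LINKS = {
    "https://sample.com": ["https://sample.com/a", "https://other.org/x"],
    "https://sample.com/a": ["https://sample.com/b"],
    "https://sample.com/b": ["https://sample.com/c"],
    "https://sample.com/c": [],
    "https://other.org/x": [],
}


def crawl(start: str, level: int, same_domain: bool = True) -> list[str]:
    """Breadth-first crawl following links at most `level` hops from `start`."""
    start_host = urlparse(start).netloc
    seen = {start}
    order = []
    queue = deque([(start, 0)])  # (url, hops from the start address)
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # stands in for "index this page"
        if depth == level:
            continue  # depth limit reached: do not follow this page's links
        for link in LINKS.get(url, []):
            # Domain restriction is checked against this start address only.
            if same_domain and urlparse(link).netloc != start_host:
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


print(crawl("https://sample.com", 0))  # ['https://sample.com']
print(crawl("https://sample.com", 2))  # ['https://sample.com', 'https://sample.com/a', 'https://sample.com/b']
```

Under this assumption, a paginated listing where each page links only to the next one needs N-1 hops to reach page N, which is why paginated content can call for a level of 1000 or more.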

Many web pages use URL parameters. By default, GEODI creates separate content for each unique URL. However, some parameters do not change the content of the page. In such cases, you can ignore those parameters to avoid duplicate entries and get a better index.

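For example, if the backtomail parameter does not change the page, the following two addresses return the same content, and ignoring backtomail keeps the page from being indexed twice:

https://sample.com

https://sample.com?backtomail=true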
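The effect of ignoring a parameter can be sketched as URL normalization: remove the ignored query parameters, then treat URLs that normalize to the same string as one page. This is a minimal illustration using Python's standard `urllib.parse`, assuming parameters are matched by exact name; the `normalize` function is hypothetical and not part of GEODI.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url: str, ignored: set[str]) -> str:
    """Drop ignored query parameters so equivalent URLs map to one key."""
    parts = urlsplit(url)
    # Keep every parameter whose name is not in the ignore list, in order.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    # Note: urlencode may re-encode the remaining values (e.g. spaces as "+").
    return urlunsplit(parts._replace(query=urlencode(kept)))


urls = ["https://sample.com", "https://sample.com?backtomail=true"]
unique = {normalize(u, {"backtomail"}) for u in urls}
print(unique)  # {'https://sample.com'}
```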


Some websites may have social media links, advertising pages, or similar pages that you do not want in the index.

  1. You can list as many pages to exclude as you need. Separate each page with “;”.

  2. Wildcards are allowed. For example, *adds* ; *last.html excludes any address that contains adds or ends with last.html.
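A minimal sketch of how such an exclusion list could be evaluated, using Python's standard `fnmatch` module. It assumes that spaces around “;” are ignored, that each pattern is matched against the full address, and that an address is excluded if it matches any pattern. `parse_patterns` and `is_excluded` are hypothetical helpers, not GEODI functions, and GEODI's own wildcard rules may differ (for example, `fnmatch` also treats `?` and `[...]` as special).

```python
from fnmatch import fnmatchcase


def parse_patterns(setting: str) -> list[str]:
    """Split a ';'-separated exclusion list, trimming surrounding spaces."""
    return [p.strip() for p in setting.split(";") if p.strip()]


def is_excluded(url: str, patterns: list[str]) -> bool:
    """True if the URL matches any pattern; '*' matches any run of characters."""
    return any(fnmatchcase(url, p) for p in patterns)


patterns = parse_patterns("*adds* ; *last.html")
print(is_excluded("https://sample.com/adds/banner.html", patterns))  # True
print(is_excluded("https://sample.com/news/last.html", patterns))  # True
print(is_excluded("https://sample.com/news/first.html", patterns))  # False
```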

GEODI can apply rules specific to individual websites, and some of these rules come pre-configured. For example, on Wikipedia pages only the "info box" that holds the content is processed. Pagination controls found on some web pages (such as links labeled 1, 2, 3, ..., 10 that lead to the individual pages) are processed automatically.
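GEODI handles pagination controls on its own. Purely as an illustration, and not GEODI's implementation, the sketch below shows one simple heuristic for recognizing such controls: treat links whose visible text is a page number as pagination links. It uses Python's standard `html.parser`; the class name `PaginationLinks` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PaginationLinks(HTMLParser):
    """Collects <a href> links whose visible text is a page number (1, 2, 3, ...)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text collected inside that <a> element
        self.pages = {}    # page number -> absolute URL

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text.isdecimal():  # link text such as "1", "2", "10"
                self.pages[int(text)] = urljoin(self.base_url, self._href)
            self._href = None


html = ('<div class="pager"><a href="?page=1">1</a> <a href="?page=2">2</a> '
        '<a href="/about">About</a></div>')
parser = PaginationLinks("https://sample.com/news")
parser.feed(html)
print(parser.pages)  # {1: 'https://sample.com/news?page=1', 2: 'https://sample.com/news?page=2'}
```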