CentreStack allows you to create multiple tenants, each with its own URL for external access, so you may want to control search engine exposure for each of these external URLs separately. In other words, you may want some URLs to be indexed but not others. The difficulty is that all of the CentreStack URLs point to the same folder in the installation directory. Therefore, if you add a robots.txt file to the root folder of your self-hosted CentreStack installation, IIS will serve that same robots.txt file for every tenant, and all tenants will be affected equally.
If you want to block search engines for all of the CentreStack URLs, the solution is simple: add a robots.txt file to the C:\Program Files (x86)\Gladinet Cloud Enterprise\root\ folder with the following text:
User-agent: *
Disallow: /
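For reference, the opposite of the file above is a robots.txt that allows all crawlers. An empty file has the same effect, but the intent can also be written out explicitly with an empty Disallow directive:
User-agent: *
Disallow: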
IIS URL Rewrite
For separate per-tenant search indexing, you can serve different robots.txt files based on IIS rewrite rules. To do so, first install the URL Rewrite module on your IIS web server, since it is not installed by default (https://www.iis.net/downloads/microsoft/url-rewrite - please note that these rules were tested with IIS 10, but they should work with other versions as well). Then, follow these steps:
- Open the web.config file located in the C:\Program Files (x86)\Gladinet Cloud Enterprise\root\ folder of your CentreStack server with Notepad.
- Add the following text inside the <system.webServer> tag, which can be found close to the bottom of the file (a sketch of where this section ends up in the overall file structure follows these steps):
<rewrite>
  <rules>
    <rule name="Rewrite Robots.txt Based On Sub-Domain">
      <match url="robots.txt" ignoreCase="true" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="([a-z0-9]+)(.?)sync4share\.com$" ignoreCase="true" />
      </conditions>
      <action type="Rewrite" url="/robots{C:1}.txt" appendQueryString="false" />
    </rule>
  </rules>
</rewrite>
- Change the sync4share\.com in the code above to your own domain name. For example, if your domain name is example.com, the pattern attribute should be "([a-z0-9]+)(.?)example\.com$".
- Save and close the file. You may need to edit the NTFS permissions of the web.config file in order to be able to save it back to the protected Program Files (x86) folder.
- Now create several robots.txt files in the root folder. Each file name should have the tenant (sub-domain) name appended after "robots", because the rewrite rule inserts the captured sub-domain ({C:1}) between "robots" and ".txt"; a request to tenant1.example.com/robots.txt is therefore served robotstenant1.txt, while a request to the bare domain does not match the condition and is served the plain robots.txt. For example:
robots.txt
robotstenant1.txt
robotstenant2.txt
robotstenant3.txt
- Inside each of the files, insert the necessary robots.txt code. If you do not wish to block any search engines, leave the file blank. If you wish to block all search engines, insert the following code:
User-agent: *
Disallow: /
- Now test the website carefully to make sure that you are seeing the correct robots.txt file when you try to access the following URLs:
http://example.com/robots.txt
http://tenant1.example.com/robots.txt
http://tenant2.example.com/robots.txt
http://tenant3.example.com/robots.txt
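For orientation, the <rewrite> section from the steps above ends up nested inside <system.webServer> roughly as sketched below. This is a trimmed illustration only; your actual web.config contains many other elements that must be left untouched:
<configuration>
  <!-- other existing sections -->
  <system.webServer>
    <!-- existing handlers, modules, and other settings -->
    <rewrite>
      <rules>
        <rule name="Rewrite Robots.txt Based On Sub-Domain">
          <!-- match, conditions, and action from the rule above -->
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>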
HTML Meta Tag
Another, much simpler hack is to replace the static C:\Program Files (x86)\Gladinet Cloud Enterprise\root\index.htm page with an ASP page that outputs a dynamic HTML meta tag for robots. Meta tags are not as effective as robots.txt, but they are much easier to implement and should take effect after a few days. To implement this, follow these steps:
- Create a copy of the index.htm page and archive the old file as index.htm_old.
- Rename the copy to default.aspx. This will allow you to add server-side code to the page.
- Edit the default.aspx page with Notepad and add the following VB code block just after the <head> tag:
<%
If Request.Url.Host.ToLower().Contains("mysubdomain.example.com") Then
    Response.Write("<meta name=""ROBOTS"" content=""NOINDEX, NOFOLLOW"" />")
End If
%>
- Replace the mysubdomain.example.com in the code above with the external URL of your tenant. Please make sure to enter the entire URL in lowercase, since the host name is lowercased before the comparison. If several tenant URLs need to be excluded, see the variation sketched after these steps.
- Save the file and test the website. You may need to edit the NTFS permissions of the default.aspx file in order to be able to save it back to the protected Program Files (x86) folder.
- Search engines update their indexes at regular intervals based on the popularity, age, and update frequency of the website. Well-established domain names receive visits from Googlebot several times a day, every day. If your site is new and was only recently indexed, Googlebot might visit you just once a day, or once a week. So allow several days for the change to take effect and the URLs to be removed. You can manage the indexed URLs for Google with Google Webmaster Tools (https://www.google.com/webmasters). You can also do quick Google index checks by simply searching Google for:
site:mysubdomain.example.com
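If several tenants need to be excluded from indexing, the code block from the steps above can be extended to compare the host name against a list. The following is only a sketch; the host names are placeholders that you would replace with the lowercase external URLs of your own tenants:
<%
' Hypothetical tenant host names - replace with your own external tenant URLs, in lowercase
Dim blockedHosts() As String = New String() {"tenant1.example.com", "tenant2.example.com"}
Dim currentHost As String = Request.Url.Host.ToLower()
For Each h As String In blockedHosts
    ' Emit the NOINDEX meta tag only for tenants that should not be indexed
    If currentHost.Contains(h) Then
        Response.Write("<meta name=""ROBOTS"" content=""NOINDEX, NOFOLLOW"" />")
        Exit For
    End If
Next
%>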