XenForo 301 Redirects & Crawling Problems

  • Thread starter JGaulard
I'm noticing that the many 301 redirects built into the XenForo software are causing problems for the search engines, and I'd like to address that in this post. I'll begin by describing where these redirects originate, then move on to how they affect both crawling and indexing. Finally, I'll explain how the problem can be remedied. But before all that, I'd like to talk about some of what I've seen and heard regarding other people's XenForo websites.

Since installing this software, I've noticed that search engine crawling of my site is highly inefficient. There is tons of crawling of pages that shouldn't be crawled and that don't matter in the grand scheme of the website. Examples of these pages are member profiles, attachments, and 301 redirects. Allowing these types of pages to be crawled causes two problems. First, much of the site's crawl budget gets used up, so the good pages aren't crawled nearly as often as they should be, if at all. Second, when a search engine crawls a website and comes across thin pages such as member profiles, 403 Forbidden pages, and redirects, it wants to turn around and run for the hills. Crawlers don't like these types of pages, and in my experience, when they encounter enough of them, they lower their overall crawl rate for the site, compounding the problem. It's a no-win situation you're putting yourself in by allowing search engines to see these lousy pages.

I've read a lot on the XenForo community forum from other users about how very little of their sites are being crawled and indexed by Google, Bing, and the other smaller search engines. I've also read about how some sites have suffered ranking drops as well as search engine penalties. One member said that only 30% of their site had been indexed after a migration from vBulletin. The site used to rank just fine in Google, but after the switch to XenForo, the site's rank fell and has yet to return. That sentiment was echoed by other members. As I was reading these posts, I thought of my own issues, which seemed to mirror those of the others. Why was Google not liking my site? Why was it not liking theirs? It seems like the search engines should be gobbling them up. But no. They're not. They're scanning them and leaving many of their pages out of the index. But why?

Let's talk about the easy issues first: member pages and attachments. None of these pages should be crawled. I'll tell you that right off the bat, and I won't explain why other than to say they're all thin. You have a choice here. You can block non-registered users (including crawlers) from accessing these pages through the XenForo permission system, or you can block them in the site's robots.txt file. If you block them in the permission system but leave them unblocked in robots.txt, crawlers will still request them and receive tons of 403 Forbidden errors, which Googlebot will hate. I've gone back and forth on which solution is best, and I've decided that blocking them in the robots.txt file is the way to go. I came to this conclusion after studying a number of the largest and most successful XenForo forums on the internet. Virtually all of them block these pages one way or another, and most of them do it in the robots.txt file. It wasn't my first choice, but I think it's the best option. So just do it:

Disallow: /attachments/
Disallow: /members/
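If you want to sanity-check prefix rules like these before deploying them, Python's standard-library robots.txt parser can do it. This is a minimal sketch, not anything XenForo provides: the `User-agent: *` line and the example.com URLs are assumptions I've added for illustration (the stdlib parser ignores Disallow lines that don't sit under a User-agent line).

```python
from urllib.robotparser import RobotFileParser

# Rules from the post. "User-agent: *" is an added assumption: the stdlib
# parser ignores Disallow lines that are not inside a User-agent group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /attachments/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on a placeholder domain.
urls = [
    "https://example.com/members/jgaulard.1/",
    "https://example.com/attachments/screenshot-png.123/",
    "https://example.com/threads/some-thread.456/",
]

for url in urls:
    # can_fetch() applies plain prefix matching against the Disallow paths.
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict}: {url}")
```

The member and attachment URLs should come back blocked while the thread URL stays crawlable.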

Now I'll move on to the 301 redirect issue. And I know, I'm going out of order here. I got so excited once I began writing that I screwed up the flow. But who cares. It'll all work out in the end.

The redirects in question stem from a few different sources: node lists, thread lists, thread pages, and quotes within posts. I won't go into the technical details here, but I will tell you that after analyzing one of my websites, I found that when Google crawls one of these redirecting URLs before the actual page, it's the redirecting URL that takes precedence. Even though the original URL redirects to the target URL, Google, for some reason, isn't honoring the redirect and is choosing the URL that ends in /post-xxxx over the canonical one. I've checked this numerous times in Google Search Console, and it's true, even for pages that were crawled weeks, months, and sometimes years ago. The 301 redirected URLs are simply not canonicalizing to the target URLs. There's a whole host of reasons why this is terrible, but the simplest to explain is that it creates a big duplicate content issue. If you have any questions about anything I'm writing or would like me to elaborate, please ask below. I'd be happy to go further into all of this.

Anyway, here's the solution. You need to block these URLs in the robots.txt file, and you also need to make changes to several templates in the XenForo system. Basically, you don't want Google and the others to ever see these 301 redirected URLs. You don't want them crawled, at least not until Google decides to actually honor them, which probably won't happen. Yes, it's true. I'm telling you that Google doesn't always honor 301 redirects. While the URL will in fact redirect when clicked by a human or crawled by a search engine, the redirecting URL and the target URL aren't guaranteed to be merged into one. What I'm seeing is that the redirecting URL remains the indexed one and the target (canonical) URL isn't indexed at all. That's terrible, because the target is the one that's actually live on the site. The redirecting URL usually disappears after some time. It's all pretty deep, so again, if you have questions, please let me know.

In the most basic sense, what you're left with is a large number of live URLs that aren't indexed and a large number of URLs that are indexed, but that have no links pointing to them anymore, remaining in the index as orphan pages. What a mess.

Here's what you do to alleviate this problem. In the robots.txt file:

Disallow: /goto/
Disallow: /threads/*/post
Disallow: /threads/*/latest
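One caveat if you test these rules: Python's built-in urllib.robotparser only does plain prefix matching and doesn't understand the `*` wildcard, so it can't check the last two lines. Below is a small hypothetical matcher written just for illustration. It assumes Google's documented semantics (`*` matches any run of characters, a trailing `$` anchors the end, matching starts at the beginning of the path plus query string), handles Disallow rules only, and ignores Allow precedence. The example URLs follow the /post-xxxx and /latest shapes described above but are made up.

```python
import re
from urllib.parse import urlsplit

# Disallow rules from the post.
DISALLOW = ["/goto/", "/threads/*/post", "/threads/*/latest"]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a regex (Google-style wildcards)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything, then turn each escaped "*" back into ".*".
    body = re.escape(rule).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


PATTERNS = [rule_to_regex(r) for r in DISALLOW]


def is_blocked(url: str) -> bool:
    """True if the URL's path (plus query string) matches any Disallow rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return any(p.match(target) for p in PATTERNS)


# Hypothetical URLs on a placeholder domain.
for url in [
    "https://example.com/goto/post?id=123",
    "https://example.com/threads/some-thread.456/post-7890",
    "https://example.com/threads/some-thread.456/latest",
    "https://example.com/threads/some-thread.456/",
    "https://example.com/threads/some-thread.456/page-2",
]:
    print("blocked" if is_blocked(url) else "allowed", url)
```

The first three (the redirecting URLs) should be blocked, while the thread page itself and its pagination stay crawlable. Note that `/threads/*/post` is broad: it would also catch any other thread sub-path that begins with "/post".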

By the way, if you're interested in my current robots.txt file for this site, just click here.

In the template system, make these changes (template line numbers are current as of version 2.2.1):

Code:
-------------------------------------------------

TEMPLATE: "THREAD_LIST_MACROS"

(remove date link for guests)

line 182

<!-- REMOVE DATE LINK FOR GUESTS -->

<xf:if is="$xf.visitor.user_id">

<li class="structItem-startDate"><a href="{{ link('threads', $thread) }}" rel="nofollow"><xf:date time="{$thread.post_date}" /></a></li>

<xf:else />

<li class="structItem-startDate"><xf:date time="{$thread.post_date}" /></li>

</xf:if>

<!-- REMOVE DATE LINK FOR GUESTS -->

-------------------------------------------------


Code:
-------------------------------------------------

TEMPLATE: "THREAD_LIST_MACROS"

(remove date link for guests)

line 218

<!-- REMOVE DATE LINK FOR GUESTS -->

<xf:if is="$xf.visitor.user_id">

<a href="{{ link('threads/latest', $thread) }}" rel="nofollow"><xf:date time="{$thread.last_post_date}" class="structItem-latestDate" /></a>

<xf:else />

<xf:date time="{$thread.last_post_date}" class="structItem-latestDate" />

</xf:if>

<!-- REMOVE DATE LINK FOR GUESTS -->

-------------------------------------------------


Code:
-------------------------------------------------

TEMPLATE: "NODE_LIST_FORUM"

(remove only the "post" portion of URL for guests)

line 126

<!-- REMOVE THREAD POST PORTION IN LINK FOR GUESTS -->

<xf:if is="$xf.visitor.user_id">

<a href="{{ link('threads/post', $extras.LastThread, {'post_id': $extras.last_post_id}) }}" class="node-extra-title" title="{$extras.LastThread.title}">{{ prefix('thread', $extras.LastThread) }}{$extras.LastThread.title}</a>

<xf:else />

<a href="{{ link('threads', $extras.LastThread) }}" class="node-extra-title" title="{$extras.LastThread.title}">{{ prefix('thread', $extras.LastThread) }}{$extras.LastThread.title}</a>

</xf:if>

<!-- REMOVE THREAD POST PORTION IN LINK FOR GUESTS -->

-------------------------------------------------


Code:
-------------------------------------------------

TEMPLATE: "NODE_LIST_SEARCH_FORUM"

(remove only the "post" portion of URL for guests)

line 144

<!-- REMOVE THREAD POST PORTION IN LINK FOR GUESTS -->

<xf:if is="$xf.visitor.user_id">

<a href="{{ link('threads/post', $extras.LastThread, {'post_id': $extras.last_post_id}) }}" class="node-extra-title" title="{$extras.LastThread.title}">
    {{ prefix('thread', $extras.LastThread) }}{$extras.LastThread.title}
</a>

<xf:else />

<a href="{{ link('threads', $extras.LastThread) }}" class="node-extra-title" title="{$extras.LastThread.title}">
    {{ prefix('thread', $extras.LastThread) }}{$extras.LastThread.title}
</a>

</xf:if>

<!-- REMOVE THREAD POST PORTION IN LINK FOR GUESTS -->

-------------------------------------------------


Code:
-------------------------------------------------

TEMPLATE: "POST_MACROS"

(remove post date link for guests)

line 108

<!-- REMOVE POST DATE LINK FOR GUESTS -->

<xf:if is="$xf.visitor.user_id">

<a href="{{ link('threads/post', $thread, {'post_id': $post.post_id}) }}" rel="nofollow">
    <xf:date time="{$post.post_date}" itemprop="datePublished" />
</a>

<xf:else />

<xf:date time="{$post.post_date}" itemprop="datePublished" />

</xf:if>

<!-- REMOVE POST DATE LINK FOR GUESTS -->

-------------------------------------------------


Code:
-------------------------------------------------

TEMPLATE: "POST_MACROS"

(remove share tooltip and link for guests)

line 128

<!-- REMOVE SHARE TOOLTIP LINK FOR GUESTS -->

<xf:if is="$xf.visitor.user_id">

<a href="{{ link('threads/post', $thread, {'post_id': $post.post_id}) }}"
    class="message-attribution-gadget"
    data-xf-init="share-tooltip"
    data-href="{{ link('posts/share', $post) }}"
    rel="nofollow">
    <xf:fa icon="fa-share-alt"/>
</a>

<xf:else />

<xf:fa icon="fa-share-alt"/>

</xf:if>

<!-- REMOVE SHARE TOOLTIP LINK FOR GUESTS -->

-------------------------------------------------


Code:
-------------------------------------------------

TEMPLATE: "POST_MACROS"

(remove post number link for guests)

line 149

<!-- REMOVE POST NUMBER LINK FOR GUESTS -->

<xf:if is="$xf.visitor.user_id">

<a href="{{ link('threads/post', $thread, {'post_id': $post.post_id}) }}" rel="nofollow">
    #{{ number($post.position + 1) }}
</a>

<xf:else />

#{{ number($post.position + 1) }}

</xf:if>

<!-- REMOVE POST NUMBER LINK FOR GUESTS -->

-------------------------------------------------

These changes remove most of the redirect links altogether. They also remove the nofollow links that pointed at some of the redirects, which is beneficial as well. And in the node list templates, the change alters the link URL so the redirect is bypassed but the link to the thread page remains. That's good too. By the way, these changes only apply when a user isn't logged in. When a user is logged in, everything works the way it did before. The goal is to hide these links from search engine crawlers.
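To confirm the edits took effect, you can look at a page as a logged-out visitor and scan it for links that still point at redirecting URLs. Here's a rough sketch using Python's html.parser. The regexes are my own approximation of the redirect URL shapes discussed above (assumptions, not anything XenForo defines), and the sample HTML is made up. In practice you'd feed it HTML saved from a logged-out browser session.

```python
import re
from html.parser import HTMLParser

# Assumed redirect URL shapes (approximations, not defined by XenForo).
REDIRECT_PATTERNS = [
    re.compile(r"/goto/"),
    re.compile(r"/threads/[^/]+/post-\d+"),
    re.compile(r"/threads/[^/]+/latest"),
]


class RedirectLinkFinder(HTMLParser):
    """Collect <a href> values that match one of the redirect patterns."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and any(p.search(href) for p in REDIRECT_PATTERNS):
            self.hits.append(href)


def find_redirect_links(html: str) -> list:
    """Return every href in the HTML that still points at a redirect."""
    finder = RedirectLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


# Made-up guest HTML: one clean thread link, one leftover post link.
guest_html = """
<li class="structItem-startDate"><time>Jan 1, 2021</time></li>
<a href="/threads/some-thread.456/" class="node-extra-title">Some thread</a>
<a href="/threads/some-thread.456/post-7890" rel="nofollow">#2</a>
"""

print(find_redirect_links(guest_html))
```

An empty list on your real guest pages would suggest the template edits are doing their job.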

After I recently implemented these changes, I noticed my site's crawl rate increase, and pages that had never been indexed are now being included. It's a refreshing change. While things are still in flux, I can definitely sense a positive improvement. Some pages need to drop out of the index altogether, and that will take some time, but I have a sense that my site's crawling and ranking will improve greatly because of these changes.

Ask me questions. I would love to answer them.