Xenforo Duplicate Content

JGaulard

Moderator
Staff member
I am beginning to realize that Xenforo forum software creates a lot of duplicate content. Listen, I'm not complaining and I absolutely love the software. Best I've ever used. It's as smooth as butter and the developers are as smart as they come. There is a lot of duplicate content though, so you need to be careful.

The reason I say this is that for the past month or two, I've been going back and forth on whether to block some directories in the robots.txt file or to let them be crawled, with the pages in them returning 301 redirect status codes. What I've learned is that Google doesn't really seem to treat 301 redirects as directives. It apparently treats them as suggestions. Today I was searching Google for something on the Xenforo site itself and clicked on one of the results for their community. The link was actually a 301 redirect that still hadn't been canonicalized to the original thread URL, and the best part is that this thread began in early 2019.

Here's the deal - I just went to the homepage of this very website (IndyFor). To the right of the forum links are links to individual posts. Here's an example of one:

https://indyfor.com/threads/6/post-56

Do you see that post-56 at the end of the URL? That little bit points the user to a specific post, in this case the most recent post in the thread. When I click the link, here's what the final URL looks like:

https://indyfor.com/threads/6/#post-56

So the initial URL 301 redirects to a different one. This is actually the original thread URL:

https://indyfor.com/threads/6/

That's the URL all the other redirects are supposed to canonicalize to. They're supposed to merge into the original, which matters because a thread with 200 replies has 200 of these redirecting post URLs. And trust me, threads can have many more replies than that. Thousands and thousands of them.
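To make the "many URLs, one thread" point concrete, here is a minimal Python sketch that collapses post links into their thread URL. It assumes the simplified URL shapes shown in this post (/threads/6/post-56 and /threads/6/#post-56); the function name canonical_thread_url is hypothetical and this is not how Xenforo itself builds or resolves URLs. It also drops the #post-56 fragment, since the part after # is never sent to the server.

```python
import re
from urllib.parse import urldefrag, urlsplit, urlunsplit

# Assumed URL shape, taken from the examples in this post:
#   /threads/<id>/post-<n>   or   /threads/<id>/#post-<n>
THREAD_RE = re.compile(r"^/threads/(?P<thread>[^/]+)/")


def canonical_thread_url(url: str) -> str:
    """Hypothetical helper: map a post link to its original thread URL."""
    # Strip any '#post-N' fragment; browsers never send it to the server.
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    match = THREAD_RE.match(parts.path)
    if match is None:
        return url  # not a thread URL, leave it alone
    path = f"/threads/{match.group('thread')}/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Illustration only: 200 made-up post links in thread 6.
post_urls = [f"https://indyfor.com/threads/6/post-{n}" for n in range(1, 201)]
print(len(post_urls), len({canonical_thread_url(u) for u in post_urls}))
# prints: 200 1
```

Two hundred distinct crawlable URLs all boil down to a single thread page, which is exactly the consolidation the 301s are supposed to signal.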

There are other instances of similar activity across the site, but for the sake of brevity, I'll leave them out of this discussion (okay fine, I'll mention them below).

Anyway, here's what I'm noticing - when I block these 301 redirect URLs in my robots.txt file like this:

Disallow: /threads/*/post
Disallow: /threads/*/latest
Disallow: /goto
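If you want to sanity-check which URLs rules like these catch before relying on them, here is a minimal Python sketch of Google-style wildcard matching, where * matches any run of characters and everything else is a prefix match. It assumes the three rules sit under a "User-agent: *" group, ignores Allow rules and precedence, and the /goto/post?id=56 path is only an assumed example. The matcher is hand-written because, as far as I know, Python's built-in urllib.robotparser treats these paths as plain prefixes and does not interpret * inside them.

```python
import re

# Disallow rules from this post (assumed to be under "User-agent: *").
DISALLOW = ["/threads/*/post", "/threads/*/latest", "/goto"]


def rule_to_regex(rule: str):
    """Translate one robots.txt path rule using Google's wildcard syntax:
    '*' matches any sequence of characters, a trailing '$' anchors the end,
    and anything else must match as a literal prefix of the path."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


RULES = [rule_to_regex(r) for r in DISALLOW]


def is_blocked(path: str) -> bool:
    """True if any Disallow rule matches the path (plus query string)."""
    return any(rx.match(path) for rx in RULES)


for path in ["/threads/6/post-56", "/threads/6/latest",
             "/goto/post?id=56", "/threads/6/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
# prints:
# /threads/6/post-56 blocked
# /threads/6/latest blocked
# /goto/post?id=56 blocked
# /threads/6/ allowed
```

The important line is the last one: the redirecting post URLs get blocked while the original thread URL stays crawlable.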

All of a sudden, threads that weren't indexed before show up. When I use the site:indyfor.com command at Google.com and filter the results to the past 24 hours, threads I've never seen indexed before appear. It's as if they've finally had their duplicates blocked and can reveal themselves. I've tested this over and over, and I'm certain that what I'm seeing is actually occurring, which surprises me because I've always thought of 301 redirects as pretty stable commands. Or directives. Whatever you want to call them.

I'm considering leaving these directories blocked to see what happens. I don't like having so many "Blocked by robots.txt" errors in Google Search Console, but so be it. I'd rather have them than duplicate URLs that are keeping other URLs from being crawled and indexed.

Do you run this software? What are your thoughts on this? Have you experienced something similar? Please let me know down below.
 
Xenforo Duplicate Content was posted on 08-17-2020 by JGaulard in the Tech Forum.