500 Response on Robots.txt Fetch Can Impact Rich Results


Google’s John Mueller received feedback about a gap in how Search Console validates rich results. Google will drop images from rich results when the CDN hosting those images returns a server error for a request for a non-existent robots.txt. The bug is that Search Console and Google’s rich results test failed to alert the publisher to the error and still reported the structured data as valid.

In programming, a bug is software behaving in an unexpected way. A bug isn’t always a coding mistake. In this case it is better described as a failure to anticipate a situation, which in turn leads to unintended consequences.

The publisher asking the question attempted to use Google’s tools to diagnose the reasons for their missing rich results and was surprised to find that they were of no use for this particular error.

While this issue affected image previews in Google’s recipe rich results, the same underlying problem can surface in other situations, so it is good to be aware of it.

Recipe rich results image preview disappeared

The person asking the question provided background on what happened:

“We ran into a tiger trap, I would say, in terms of rich recipe results.

We have hundreds of thousands of recipes that are indexed and a lot of traffic is coming from the recipe gallery.

And then… after a while it stopped.

And all the metadata was checked and Google Search Console was saying… it’s all rich recipe ingredients, it’s all good, it can be shown.

We finally noticed that in Preview, when you preview the result, the image was missing.

And it looks like there was a change at Google, and if a robots.txt was required in order to retrieve the images, nothing we could see in the tool was actually saying anything was invalid.

And so it’s a little weird, when you check something to say “is this a valid rich recipe result?” And it says, yes, this is great, this is absolutely great, we have all the metadata.

And you check all the URLs and all the images are correct, but it turns out behind the scenes that there was a new requirement that you have a robots.txt.”

John Mueller asked:

“What do you mean you must have robots.txt?”

The questioner replied:

“What we found is that if you requested robots.txt from our CDN, it gave you like a 500.

When we put robots.txt in there, immediately the preview started showing correctly.

And that includes crawling and placing it on a static site, I guess.

So we operationally found that robots.txt worked.”

John Mueller nodded and said:

“Yeah okay.

So from our point of view, it’s not like a robots.txt file is needed. But it needs to have a proper result code.

So if you don’t have one, it should return a 404.

If you have one, we can clearly read that.

But if you return a server error for the robots.txt file, our systems will assume that there is probably a server problem and we will not crawl.

And it’s something that’s been like that from the beginning.

But issues like this where especially when you’re on a CDN and it’s on a different hostname, it’s really hard to detect sometimes.

And I imagine the rich results test, at least as far as I know, focuses on the content that is on the HTML page.

So the JSON-LD markup that you have there, it probably doesn’t check to see if the images are actually fetchable.

And then if they can’t be fetched, of course, we can’t even use them in the carousel.

So this might be something we need to figure out how to highlight better.”
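Because the rich results test looks only at the HTML page, a publisher may want to list the hostnames that actually serve the images named in the structured data, since each of those hosts has its own robots.txt. The following is a minimal sketch using only the Python standard library. The class and function names (`JsonLdCollector`, `image_hosts`) are illustrative and are not part of any Google tool, and the sketch assumes the images are declared under an `image` key as a URL string, an object with a `url` field, or a list of either.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlsplit


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def _urls_from_image_value(value):
    """Yield URLs from an "image" value: a string, an object with "url", or a list."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict) and isinstance(value.get("url"), str):
        yield value["url"]
    elif isinstance(value, list):
        for item in value:
            yield from _urls_from_image_value(item)


def _image_urls(node):
    """Walk nested dicts and lists, yielding URLs found under any "image" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image":
                yield from _urls_from_image_value(value)
            else:
                yield from _image_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from _image_urls(item)


def image_hosts(html):
    """Return the set of hostnames serving images declared in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    hosts = set()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than fail the whole page
        for url in _image_urls(data):
            host = urlsplit(url).netloc
            if host:  # relative URLs have no host and are skipped
                hosts.add(host)
    return hosts
```

Any hostname this returns that differs from the page’s own hostname, such as a CDN, is a candidate for a separate robots.txt check.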

500 error response for CDN robots.txt can cause problems

This is one of those show-stopping SEO problems that are hard to diagnose but can cause a lot of damage, as the person asking the question noted.

Typically, a request for a non-existent robots.txt should result in a 404 server response code, which means that the file does not exist.

So if the request for the robots.txt file generates a 500 response code, it indicates that something is misconfigured on the server, CDN, or CMS.
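The three outcomes Mueller describes can be checked by hand for any hostname. Below is a minimal sketch in Python using only the standard library. The function names, the user-agent string, and the wording of the returned labels are assumptions made for illustration. The mapping covers only the cases mentioned in this article, and other status codes are deliberately left unclassified.

```python
import urllib.error
import urllib.request


def classify_robots_status(status):
    """Map a robots.txt HTTP status code to the outcome described in the article."""
    if 200 <= status < 300:
        return "readable: the crawler can read the rules in the file"
    if status == 404:
        return "missing: treated as no robots.txt, so crawling can proceed"
    if 500 <= status < 600:
        return "server error: the crawler assumes a server problem and does not crawl"
    return "other: not covered by the article, check the crawler's documentation"


def fetch_robots_status(host, timeout=10.0):
    """Return the HTTP status code for https://<host>/robots.txt."""
    request = urllib.request.Request(
        f"https://{host}/robots.txt",
        headers={"User-Agent": "robots-status-check/0.1"},  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx and 5xx responses;
        # the status code is available on the exception.
        return error.code


# Example usage (requires network access; the hostname is a placeholder):
#   status = fetch_robots_status("cdn.example.com")
#   print(status, classify_robots_status(status))
```

Running the check against the hostname that serves the images, not just the main site, matters here, because the CDN in this case sat on a different hostname from the pages themselves.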

The short-term solution is to upload a robots.txt file.

But it is a good idea to dig into the CMS or server configuration to find the underlying problem.

Receiving a 500 Response Code for Robots.txt

Losing images in recipe rich result previews because a CDN returns a 500 error response for robots.txt is likely a rare issue.

A 500 server error response typically occurs when the server-side code encounters something unexpected or missing, stops processing the request, and returns a 500 response code.

For example, if you edit a PHP file and forget to indicate the end of a section of code, it can cause the server to stop processing the code and throw a 500 response.

Whatever the reason for the error response, it is worth keeping this in mind for the rare event that it happens to you when Google tries to fetch robots.txt.

Citation

CDN Bug for Image and Recipe Rich Results

Watch at the 51:45 minute mark




