Hand crafted bot accounts and community targeted ads, what's the story?

bulwark@infosec.pub · 10 months ago

Deebster@lemmyrs.org · edit-2 10 months ago

Seems like Lemmy should add a rel=canonical link when browsing federated communities - this would “solve” this issue (and would be the correct thing to do anyway).

jonne@infosec.pub · 10 months ago

I believe Lemmy instances disallow crawling by default, so SEO is probably not why. Would be nice to find Lemmy results in Google if they can sort out the canonical URL problem. Reddit was a great resource for random questions, and if people move here those answers should still be easy to find.

Admiral Patrick@dubvee.org · 10 months ago

Nope, it’s allowed.

The default robots.txt disallows access to a few paths but not /post or /comment.
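For anyone who wants to check this sort of thing themselves, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt text below is an illustrative stand-in, not a copy of Lemmy's actual file; it only mirrors the one rule named in this thread (/create_community disallowed). The helper name is_allowed and the example paths are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT Lemmy's real file: it only reproduces the
# single rule mentioned in the thread.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /create_community
"""


def is_allowed(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample robots.txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/post/123", "/comment/456", "/create_community"):
        print(path, is_allowed(path))
```

To test a live instance instead, RobotFileParser also has set_url() and read(), which fetch the file over HTTP.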

There are lots of crawler bots hitting my instance (ByteSpider being the most aggressive). I just have a list of User-Agent regexes I use to block them via Nginx. Some, like Semrush, have IP ranges I can block completely at the firewall (in addition to the UA filters).
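The matching logic behind that kind of filter can be sketched in Python. This is not the Nginx config described above, just a rough illustration of the same idea: a case-insensitive regex over the User-Agent header plus an IP range check. The patterns are guesses based on the bot names in this comment, and 192.0.2.0/24 is a documentation-only placeholder range, not a real crawler range.

```python
import ipaddress
import re

# Assumed patterns, based only on the bot names mentioned in the comment.
BLOCKED_UA_PATTERNS = [r"bytespider", r"semrush"]
BLOCKED_UA_RE = re.compile("|".join(BLOCKED_UA_PATTERNS), re.IGNORECASE)

# Placeholder range (TEST-NET-1, reserved for documentation), NOT a real
# crawler range. Substitute ranges published by the crawler operator.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_blocked(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request matches a blocked UA pattern or IP range."""
    if BLOCKED_UA_RE.search(user_agent):
        return True
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("Mozilla/5.0 (compatible; Bytespider)", "203.0.113.5"))
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0", "203.0.113.5"))
```

Checking the IP range as well as the UA matters because a User-Agent string is trivially spoofed, while the source address is not.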

Deebster@lemmyrs.org · 10 months ago

What makes you say that? robots.txt just disallows things like /create_community, and there are no robots, googlebot, etc. meta tags in the source that I can see, and no nofollow apart from on a few things like feeds.

Also, I’m sure I’ve seen Lemmy appearing in search results already.

StudioLE@programming.dev · 10 months ago

Do you mean rel="nofollow"?

Deebster@lemmyrs.org · edit-2 10 months ago

No, I was referring to the bit about having lots of copies of the same content on each different instance. If example.com/c/comm@* had a link tag in its head giving the origin community as the rel=canonical target, then search engines would index only the origin copy instead of every federated duplicate.
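A rough Python sketch of that idea, to make it concrete. It is not how Lemmy builds its pages. It assumes the origin serves the community over https at /c/<name>, and the function name canonical_link and the host origin.example are made up for illustration.

```python
from html import escape


def canonical_link(local_host: str, path: str) -> str:
    """Build a rel=canonical tag for a community page path.

    "/c/comm@origin.example" points at the origin instance, while a plain
    "/c/comm" is treated as a community hosted on local_host.
    Assumes https and the /c/<name> URL layout on the origin.
    """
    prefix = "/c/"
    if not path.startswith(prefix):
        raise ValueError(f"not a community path: {path!r}")
    name, _, origin = path[len(prefix):].partition("@")
    host = origin or local_host
    href = f"https://{host}/c/{name}"
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


if __name__ == "__main__":
    print(canonical_link("example.com", "/c/comm@origin.example"))
    print(canonical_link("example.com", "/c/comm"))
```

Every federated copy would then emit the same canonical URL, which is what lets a search engine collapse the copies into one result.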

rel=nofollow is a good idea too, but less interesting to this semantic HTML nerd.