Sep 15, 2009

How Reddit does permalinks

This is essentially a “nothing” post. It's just something my digging has unearthed: it was in plain sight all along, but somebody out there might be interested.

Reddit’s URLs for a submission's comments page are made up of these parts:

  • http://
  • www.reddit.com/
  • r/
  • [SubredditName]/
  • comments/
  • [SubmissionID]/
  • [SanitizedSubmissionTitle]/
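
As a rough illustration, the parts above can be joined like this. It is only a sketch: the function name `comments_url` and the example values are made up for this post, not taken from Reddit's code.

```python
def comments_url(subreddit, submission_id, sanitized_title):
    """Join the URL parts listed above into one comments URL."""
    return "http://www.reddit.com/r/{}/comments/{}/{}/".format(
        subreddit, submission_id, sanitized_title
    )


# Hypothetical values, for illustration only.
print(comments_url("programming", "abc123", "how_reddit_does_permalinks"))
```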

The [SanitizedSubmissionTitle] part is calculated by taking the real title of the submission and applying the following steps, so that it makes a sane, search-engine-friendly URL.

  1. Force the title to only use Unicode characters (this is the step I understand the least)
  2. Replace all white space (tabs, spaces, etc.) with underscores
  3. Remove unprintable characters (yes, such things exist)
  4. Collapse any run of multiple consecutive underscores into a single underscore
  5. Remove any underscore at the end of the title
  6. Convert the title to all lowercase
  7. Trim the title to the maximum allowed length of 50 characters
  8. If the title was longer than 50 characters, trim the title again, this time back to the last word boundary (the last underscore)
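
The steps above can be sketched in Python roughly as follows. This is my own approximation, not Reddit's actual code: the name `sanitize_title` is made up, Step 1 is assumed to mean "decode bytes to Unicode text", and Step 3 is assumed to mean "drop anything that is not a letter, digit, or underscore", which is a broader reading than "unprintable".

```python
import re

MAX_LENGTH = 50  # the limit given in Step 7


def sanitize_title(title, max_length=MAX_LENGTH):
    """Approximate the eight steps listed above (a sketch, not Reddit's code)."""
    # Step 1 (assumption): make sure we are working with Unicode text.
    if isinstance(title, bytes):
        title = title.decode("utf-8", errors="replace")
    # Step 2: replace each whitespace character with an underscore.
    title = re.sub(r"\s", "_", title)
    # Step 3 (assumption): drop everything except letters, digits, underscores.
    title = re.sub(r"\W", "", title)
    # Step 4: collapse runs of underscores into a single underscore.
    title = re.sub(r"_+", "_", title)
    # Step 5: remove any underscore at the end of the title.
    title = title.rstrip("_")
    # Step 6: lowercase.
    title = title.lower()
    # Steps 7 and 8: trim to the limit, then back to the last word boundary.
    if len(title) > max_length:
        title = title[:max_length]
        last_boundary = title.rfind("_")
        if last_boundary > 0:
            title = title[:last_boundary]
    return title


print(sanitize_title("How Reddit does  permalinks!"))
# prints: how_reddit_does_permalinks
```

Titles of 50 characters or fewer skip Steps 7 and 8 entirely, so only long titles ever lose a word.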

You could say there’s a flaw in the logic of Step 8: sometimes trimming to 50 characters happens to cut exactly at a word boundary, so the trimmed title already ends in a whole word. Step 8 would still snip off this last word, even though we’ve got complete words and we’re within the 50 character limit. But this is just a nitpick and I doubt anyone has ever noticed.
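
To make the nitpick concrete, here is a small sketch of Steps 7 and 8 on their own, next to a variant that keeps the last word when the cut lands exactly on a word boundary. Both function names are hypothetical, and the variant is just one possible fix of my own, not something Reddit does.

```python
def truncate_as_described(slug, max_length=50):
    """Steps 7 and 8 as described: always trim back to the last underscore."""
    if len(slug) > max_length:
        slug = slug[:max_length]
        last_boundary = slug.rfind("_")
        if last_boundary > 0:
            slug = slug[:last_boundary]
    return slug


def truncate_keeping_whole_word(slug, max_length=50):
    """Hypothetical variant: skip Step 8 when the cut falls on a boundary."""
    if len(slug) > max_length:
        # If the first dropped character is an underscore, the kept part
        # already ends in a complete word.
        cut_on_boundary = slug[max_length] == "_"
        slug = slug[:max_length]
        if not cut_on_boundary:
            last_boundary = slug.rfind("_")
            if last_boundary > 0:
                slug = slug[:last_boundary]
    return slug


# The first 50 characters of this slug end exactly after the word "title".
slug = "reddit_permalinks_are_built_from_a_sanitized_title_of_the_submission"

print(truncate_as_described(slug))        # loses the complete word "title"
print(truncate_keeping_whole_word(slug))  # keeps it, still 50 characters
```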
