How Reddit does permalinks
Sep 15, 2009
This is essentially a “nothing” post: just something my digging unearthed. It was in plain sight all along, but somebody out there might be interested.
Reddit’s URLs for comments look like this:
http://www.reddit.com/r/[SubredditName]/comments/[SubmissionID]/[SanitizedSubmissionTitle]/
The [SanitizedSubmissionTitle] part is calculated by taking the real title of the submission and applying the following steps to produce a sane, search-engine-friendly URL:
1. Force the title into a Unicode string (this is the step I understand the least)
2. Replace all whitespace (tabs, spaces, etc.) with underscores
3. Remove unprintable characters (yes, such things exist)
4. Collapse runs of consecutive underscores into a single underscore
5. Remove any underscore at the end of the title
6. Convert the title to all lowercase
7. Trim the title to the maximum allowed length of 50 characters
8. If the title was longer than 50 characters, trim it again, this time back to the last word boundary
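Here's a rough Python sketch of those steps. It is my approximation, not Reddit's actual code, and a few parts are guesses: step 1 is assumed to mean "decode bytes as UTF-8", step 3 is assumed to mean "drop anything `str.isprintable()` rejects", and the `permalink` helper (with its made-up subreddit and submission ID) is hypothetical, just filling in the URL pattern above.

```python
import re

MAX_LENGTH = 50  # the limit from steps 7 and 8

_WHITESPACE = re.compile(r"\s")           # any single whitespace character (Unicode-aware for str)
_MULTI_UNDERSCORE = re.compile(r"_{2,}")  # two or more underscores in a row


def sanitize_title(title, max_length=MAX_LENGTH):
    """Approximate the slug steps from the post. A sketch, not Reddit's code."""
    # Step 1 (assumption): make sure we have text, decoding bytes as UTF-8.
    if isinstance(title, bytes):
        title = title.decode("utf-8", errors="replace")
    # Step 2: every whitespace character becomes an underscore.
    title = _WHITESPACE.sub("_", title)
    # Step 3 (assumption): treat "unprintable" as str.isprintable() == False,
    # which drops control characters, zero-width characters and the like.
    title = "".join(ch for ch in title if ch.isprintable())
    # Step 4: collapse runs of underscores into one.
    title = _MULTI_UNDERSCORE.sub("_", title)
    # Step 5: drop trailing underscores.
    title = title.rstrip("_")
    # Step 6: lowercase.
    title = title.lower()
    if len(title) > max_length:
        # Step 7: hard cut at the length limit.
        title = title[:max_length]
        # Step 8: cut back to the last underscore (word boundary), if there is one.
        last_break = title.rfind("_")
        if last_break > 0:
            title = title[:last_break]
    return title


def permalink(subreddit, submission_id, title):
    """Hypothetical helper that fills in the comment URL pattern."""
    return "http://www.reddit.com/r/{}/comments/{}/{}/".format(
        subreddit, submission_id, sanitize_title(title))


# Made-up subreddit and submission ID, purely for illustration.
print(permalink("pics", "abc12", "Hello   World\t"))
# http://www.reddit.com/r/pics/comments/abc12/hello_world/
```

Using `rstrip` in step 5 removes every trailing underscore, but after step 4 there can only be one anyway, so it matches the "remove any underscore at the end" wording.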
You could say there’s a flaw in the logic of Step 8 for the case where the 50-character cut happens to land exactly at the end of a word. Step 8 still snips off that last word, even though what remains is made of complete words and is already within the 50-character limit. But this is just a nitpick, and I doubt anyone has ever noticed.
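To make the nitpick concrete, here are steps 7 and 8 on their own, next to a possible fix that only cuts back when the hard cut actually split a word (i.e. when the character just past the limit isn't an underscore). The fix is my own suggestion, not something Reddit does, and the slug is made up: I counted it so that "honest" ends exactly at character 50.

```python
def trim_like_post(slug, max_length=50):
    """Steps 7 and 8 as described in the post (a sketch)."""
    if len(slug) > max_length:
        slug = slug[:max_length]
        last_break = slug.rfind("_")
        if last_break > 0:
            slug = slug[:last_break]
    return slug


def trim_keeping_whole_word(slug, max_length=50):
    """Hypothetical fix: only cut back to a word boundary if a word was split."""
    if len(slug) > max_length:
        # If the character right after the cut is an underscore, the cut
        # already fell on a word boundary, so the last word is complete.
        cut_on_boundary = slug[max_length] == "_"
        slug = slug[:max_length]
        if not cut_on_boundary:
            last_break = slug.rfind("_")
            if last_break > 0:
                slug = slug[:last_break]
    return slug


slug = "this_title_is_exactly_fifty_characters_long_honest_really"
print(trim_like_post(slug))           # this_title_is_exactly_fifty_characters_long
print(trim_keeping_whole_word(slug))  # this_title_is_exactly_fifty_characters_long_honest
```

The post's version drops "honest" even though the first 50 characters already end on a complete word; the fix keeps it and still stays within the limit.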