URL Encoding Mistakes That Break Real Applications

URL encoding bugs are rarely obvious in a code review. A link looks correct for ordinary test data, then fails when a customer name contains an ampersand, a filename contains a plus sign, or a redirect includes an already encoded parameter. The underlying mistake is usually a confusion between data and structure: software encodes too much, too little, or at the wrong layer.

Double encoding changes the original value

When a space is encoded once, it becomes %20. If that result is encoded again, the percent sign becomes %25, producing %2520. A receiver that decodes once sees the literal text %20 rather than a space. Double encoding often appears when several layers each assume they own serialization.

Encoding should happen at the boundary that constructs the URL. Internal code should pass structured, unencoded values until that point. Likewise, decoding should occur once after parsing. Clear contracts prevent middleware, helpers, and application code from repeating the same transformation.
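
The effect is easy to reproduce with Python's standard urllib.parse module. The sketch below is illustrative only; the helper name build_search_url and the example.com address are assumptions, not part of any particular framework.

```python
from urllib.parse import quote, unquote

value = "annual report"

once = quote(value, safe="")   # 'annual%20report'
twice = quote(once, safe="")   # 'annual%2520report': the % itself was escaped

# A receiver that decodes once recovers the encoded text, not the value.
received = unquote(twice)      # 'annual%20report'


def build_search_url(base: str, term: str) -> str:
    """Hypothetical boundary helper: the only place that encodes `term`."""
    return f"{base}?q={quote(term, safe='')}"
```

Callers pass the raw string "annual report" to the helper and never call quote themselves, so there is exactly one encoding step to reason about.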

Plus does not always mean space

In form-style query encoding, a plus sign represents a space. In other URL components, plus is an ordinary character. A value such as a phone number or Base64 string can be corrupted when a generic query parser silently turns plus signs into spaces. Encoding the literal plus as %2B preserves it.

Problems arise when producers and consumers use different conventions. Tests should include literal plus signs and spaces, and API documentation should identify whether query values follow application/x-www-form-urlencoded rules or strict percent encoding.
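
Python's standard library exposes both conventions side by side, which makes the mismatch easy to demonstrate. The phone number below is made-up sample data.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

phone = "+1 555 0100"

strict = quote(phone, safe="")   # '%2B1%20555%200100' (strict percent encoding)
form = quote_plus(phone)         # '%2B1+555+0100'     (form-style encoding)

# The same text decodes differently under each convention.
as_strict = unquote("a+b")       # 'a+b'
as_form = unquote_plus("a+b")    # 'a b'

# parse_qs follows form rules, so an unescaped plus is read as a space.
corrupted = parse_qs("phone=+15550100")     # {'phone': [' 15550100']}
preserved = parse_qs("phone=%2B15550100")   # {'phone': ['+15550100']}
```

Both encoders escape the literal plus as %2B; they differ only in how they write the space. The corruption comes from a producer that skips encoding altogether while the consumer applies form rules.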

Reserved characters must be encoded as data

Ampersand separates query pairs, equals separates names from values, hash begins a fragment, question mark begins a query, and slash separates path segments. When those characters belong inside a value, leaving them unencoded changes the URL's structure. A search for “salt & pepper” sent unencoded is parsed as the value “salt ” followed by a second, valueless parameter named “ pepper”, not as one phrase.

Component-aware builders solve most of these bugs. They know that a slash may remain structural in a full path but must be escaped inside one dynamic segment. Generic replacement functions do not understand context and are easy to apply incorrectly.
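
A minimal sketch of a component-aware builder, using only urllib.parse. The function name build_url, the host, and the sample values are assumptions chosen for illustration; a real application would more likely rely on the URL builder its framework provides.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit, urlunsplit


def build_url(host: str, segments: list[str], params: dict[str, str]) -> str:
    """Encode each path segment and query value as data, then assemble."""
    # safe="" escapes "/" too, so a slash inside one segment stays data.
    path = "/" + "/".join(quote(segment, safe="") for segment in segments)
    # quote_via=quote selects strict percent encoding (%20) over form-style "+".
    query = urlencode(params, quote_via=quote)
    return urlunsplit(("https", host, path, query, ""))


url = build_url(
    "example.com",
    ["files", "a/b report.pdf"],
    {"q": "salt & pepper", "tag": "c#"},
)

# Parsing the result recovers the original values.
round_trip = parse_qs(urlsplit(url).query)

# Without encoding, the ampersand splits the value into two parameters.
broken = parse_qs("q=salt & pepper", keep_blank_values=True)
```

Here url is https://example.com/files/a%2Fb%20report.pdf?q=salt%20%26%20pepper&tag=c%23, and broken contains the keys "q" and " pepper" rather than a single phrase.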

Decoding before validation can create security gaps

Security filters sometimes inspect a raw URL while routing code later decodes it. An attacker may percent-encode dangerous path sequences so the filter sees harmless text and the router sees a traversal or restricted route. The reverse can also happen when a proxy normalizes more aggressively than the application.

All layers that make routing, authorization, caching, or signature decisions need a shared canonical representation. Normalize once according to a defined policy, reject malformed encodings, and apply security checks to the same form the application uses.
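
The gap between a raw-string filter and a decoding router can be sketched as follows. The policy shown (decode once, reject nested escapes, collapse dot segments, then check a /static prefix) is one possible choice made for illustration, and every function name here is hypothetical.

```python
import posixpath
import re
from urllib.parse import unquote

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def naive_filter(raw_path: str) -> bool:
    """Inspects the raw text only, so encoded dot segments slip through."""
    return ".." not in raw_path


def canonical_path(raw_path: str) -> str:
    """Decode once, refuse anything still escaped, then normalize."""
    decoded = unquote(raw_path, errors="strict")  # invalid UTF-8 raises ValueError
    if _ESCAPE.search(decoded):
        # Assumed policy: treat a surviving escape as a double-encoding attempt.
        raise ValueError("nested percent escape")
    return posixpath.normpath(decoded)


def is_allowed(raw_path: str) -> bool:
    """Apply the access check to the same form the router would use."""
    path = canonical_path(raw_path)
    return path == "/static" or path.startswith("/static/")
```

For the input /static/%2e%2e/admin, naive_filter returns True while is_allowed returns False, because the canonical form is /admin. Rejecting nested escapes also rejects legitimate values that contain literal escape-like text, so that part of the policy is a trade-off rather than a rule.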

Open redirects often hide in innocent parameters

Login and payment flows frequently accept a return URL. Passing that value through encoding does not make it trustworthy. An attacker can supply an encoded external address and send users to a phishing site after a legitimate action. The problem is authorization, not syntax.

Return destinations should be restricted to approved hosts or, preferably, represented as internal route identifiers. Parse the destination, validate its scheme and host, and beware of user-info syntax, misleading subdomains, and alternate encodings.
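
A minimal sketch of such a check, assuming a single approved host. ALLOWED_HOSTS and is_safe_return_url are hypothetical names, and the rules are deliberately conservative; they are a starting point, not a complete defense.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"app.example.com"}  # hypothetical allow-list


def is_safe_return_url(target: str) -> bool:
    """Accept same-site paths or https URLs on an approved host."""
    # Reject whitespace, control characters, and backslashes, which some
    # browsers interpret differently from server-side parsers.
    if not target or any(ord(ch) <= 0x20 or ch in "\\\x7f" for ch in target):
        return False
    try:
        parts = urlsplit(target)
    except ValueError:
        return False
    if not parts.scheme and not parts.netloc:
        # Relative destination: must be an absolute path, not "//host".
        return target.startswith("/") and not target.startswith("//")
    if parts.scheme != "https":
        return False
    if parts.username is not None or parts.password is not None:
        return False  # user-info syntax such as https://trusted@evil.example/
    return parts.hostname in ALLOWED_HOSTS
```

The function validates the parsed components and never decodes the input itself, so an encoded address such as %2F%2Fevil.example is rejected as a non-path rather than repaired into something routable.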

Signatures require exact normalization

Signed URLs are sensitive to every textual difference. Parameter order, hexadecimal letter case in percent escapes (%2f versus %2F), spaces represented as plus or %20, and omitted default ports can all change the signature input while preserving an equivalent logical URL. If signer and verifier normalize differently, valid requests fail, or a signature computed over one form is accepted for a request that a later component interprets differently.

A signing protocol must define canonicalization precisely. Both sides should use the same implementation where possible and test examples containing repeated keys, Unicode, empty values, and reserved characters.
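
One possible canonicalization, sketched with HMAC-SHA256. The rules here (strict percent encoding with uppercase hex, pairs sorted by encoded name and then value, newline-joined message) are assumptions made for this example and do not describe any particular provider's scheme.

```python
import hashlib
import hmac
from urllib.parse import quote


def canonical_query(params: list[tuple[str, str]]) -> str:
    """Encode every name and value the same way, then sort the pairs."""
    # quote() emits uppercase hex and leaves only unreserved characters bare.
    encoded = sorted((quote(k, safe=""), quote(v, safe="")) for k, v in params)
    return "&".join(f"{k}={v}" for k, v in encoded)


def sign(secret: bytes, method: str, path: str, params: list[tuple[str, str]]) -> str:
    """Sign the canonical form, never the URL text as it happened to arrive."""
    message = "\n".join([method.upper(), path, canonical_query(params)])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str,
           params: list[tuple[str, str]], signature: str) -> bool:
    """Recompute with the same function and compare in constant time."""
    return hmac.compare_digest(sign(secret, method, path, params), signature)
```

Because both sides work from decoded name and value pairs, canonical_query([("b", "2"), ("a", "x y"), ("a", "")]) yields a=&a=x%20y&b=2 regardless of the order or escape style used on the wire.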

Malformed escapes should fail clearly

A percent sign followed by incomplete or non-hexadecimal text is not a valid percent escape. Parsers differ in how generously they handle malformed input: some reject it, some preserve the text, and others replace bytes during character decoding. Accepting several interpretations creates ambiguity between security layers and makes bugs difficult to reproduce.

Applications should reject malformed encodings at the edge and return a clear client error. A strict policy prevents later components from “repairing” the same request in different ways. It also distinguishes invalid transport syntax from valid values that merely fail business validation.
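
Python's own unquote is lenient, so a strict edge policy needs an explicit check. The sketch below is one way to write it; strict_unquote is a hypothetical helper, and mapping its ValueError to a 400 response is left to the surrounding framework.

```python
import re
from urllib.parse import unquote, unquote_to_bytes

# A percent sign that is not followed by exactly two hex digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def strict_unquote(component: str) -> str:
    """Decode one URL component, raising ValueError on malformed input."""
    if _BAD_PERCENT.search(component):
        raise ValueError("malformed percent escape")
    # Strict UTF-8 decoding raises UnicodeDecodeError, a ValueError subclass,
    # for byte sequences that are not valid text.
    return unquote_to_bytes(component).decode("utf-8")


# The lenient default accepts the same inputs with different "repairs".
kept_as_text = unquote("%zz")    # '%zz'     (escape left in place)
replaced = unquote("%FF")        # '\ufffd'  (byte replaced during decoding)
```

With this helper, "caf%C3%A9" decodes to "café", while "%zz", a trailing "100%", and a lone "%FF" all fail in the same way at the same place.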

Debug the URL at every boundary

When a value arrives corrupted, record the original structured value, the constructed URL, the raw request target, and the parsed result. Comparing those stages reveals where transformation occurred. Browser tools, proxies, frameworks, and server logs may each display a decoded or normalized version, so know which representation you are viewing.
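
The four stages can be captured together in a test or a debug log. This is a minimal sketch; trace_round_trip, the example.com address, and the /search path are all placeholders.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit


def trace_round_trip(params: dict[str, str]) -> dict[str, object]:
    """Record each representation of the same values for comparison."""
    url = "https://example.com/search?" + urlencode(params, quote_via=quote)
    parts = urlsplit(url)
    return {
        "structured": params,                          # what the caller meant
        "url": url,                                    # what was constructed
        "raw_target": f"{parts.path}?{parts.query}",   # what goes on the wire
        "parsed": dict(parse_qsl(parts.query, keep_blank_values=True)),
    }


stages = trace_round_trip({"q": "a+b c"})
```

If stages["parsed"] differs from stages["structured"], the fault lies in encoding or parsing; if they match but the server sees something else, the transformation happened in a layer between them.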

URL encoding is predictable when ownership is clear. Keep values raw inside the application, build addresses with component-aware APIs, decode once, normalize consistently, and validate meaning separately from syntax. Those habits prevent the class of bugs that seem random only because too many layers are changing the same string.