What Are HTML Entities?
HTML entities are special codes used to represent characters that have a reserved meaning in HTML, or characters that cannot easily be typed on a keyboard. They begin with an ampersand (&) and end with a semicolon (;). For example, the less-than sign < must be written as &lt; in HTML to prevent it from being interpreted as the start of an HTML tag.
There are three forms of HTML entities: named entities like &amp;, &lt; and &copy;; decimal numeric entities like &#60; (for <); and hexadecimal numeric entities like &#x3C; (also for <). All three forms are supported by all modern browsers and produce identical output.
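As a quick check that the three forms are equivalent, the sketch below decodes each one with Python's standard html.unescape function. The helper name decode_all and the sample dictionary are illustrative only and not part of any library.

```python
import html

# Named, decimal, and hexadecimal references for the same three characters.
forms = {
    "<": ["&lt;", "&#60;", "&#x3C;"],
    "&": ["&amp;", "&#38;", "&#x26;"],
    "©": ["&copy;", "&#169;", "&#xA9;"],
}


def decode_all(references):
    """Decode each character reference to the character it stands for."""
    return [html.unescape(reference) for reference in references]


for char, references in forms.items():
    decoded = decode_all(references)
    # Every form of a reference should decode to the same single character.
    print(char, decoded, all(d == char for d in decoded))
```

Each row should report True, since a conforming parser treats the three spellings as the same character.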
The 5 Essential HTML Characters to Escape
| Character | Named Entity | Decimal | Hex | Why escape it? |
|---|---|---|---|---|
| & | &amp; | &#38; | &#x26; | Starts all HTML entities; must be encoded first |
| < | &lt; | &#60; | &#x3C; | Starts HTML tags; left unescaped, it creates XSS vulnerabilities |
| > | &gt; | &#62; | &#x3E; | Closes HTML tags |
| " | &quot; | &#34; | &#x22; | Breaks out of HTML attribute values |
| ' | &apos; | &#39; | &#x27; | Breaks out of single-quoted attribute values |
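The reason the ampersand has to be encoded first can be seen in a small hand-rolled escaper. This is a minimal sketch for illustration, not a replacement for a library function; escape_html and escape_html_wrong_order are hypothetical names, and the apostrophe is written in its decimal form &#39; because the named form &apos; was not defined in HTML 4.

```python
def escape_html(text: str) -> str:
    """Escape the five essential characters, ampersand first."""
    replacements = [
        ("&", "&amp;"),   # must run first, before any entity is introduced
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#39;"),
    ]
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text


def escape_html_wrong_order(text: str) -> str:
    """Deliberately buggy: replaces & after < and so double-encodes."""
    text = text.replace("<", "&lt;")
    text = text.replace("&", "&amp;")  # also rewrites the & inside &lt;
    return text


print(escape_html('<a href="x">Tom & Jerry</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
print(escape_html_wrong_order("<"))
# &amp;lt;
```

With the wrong order, a browser would display the literal text &lt; instead of the < character the user typed.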
HTML Encoding and XSS Prevention
Cross-Site Scripting (XSS) is one of the most common web vulnerabilities. It occurs when user-supplied input containing HTML or JavaScript is inserted into a page without proper escaping. An attacker might submit a value like <script>alert('XSS')</script> which, if unencoded, executes as JavaScript in the victim's browser. Properly HTML-encoding all user input before rendering it in HTML contexts converts those characters into harmless entities that display as text.
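To make the attack concrete, the following sketch passes the payload above through Python's standard html.escape before placing it in markup. The render_comment function and the surrounding <p> template are assumptions made for the example; a real application would normally rely on its template engine's auto-escaping.

```python
import html


def render_comment(comment: str) -> str:
    """Wrap user-supplied text in a paragraph, escaping it at output time."""
    return f"<p>{html.escape(comment)}</p>"


payload = "<script>alert('XSS')</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;</p>
```

The browser renders the result as visible text rather than parsing it as a script element.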
Always encode user input at the point of output (when inserting into HTML), not at the point of input. Different contexts (HTML body, HTML attributes, JavaScript, CSS, URLs) require different escaping strategies; HTML encoding alone is not sufficient inside <script> blocks or URL attributes.
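The sketch below shows one way the same untrusted value might be prepared for three different contexts using only the Python standard library. The function names are hypothetical, and the script-block helper follows a common convention (JSON encoding plus Unicode escapes for <, > and &) rather than any single mandated method. It does not cover CSS, and it does not validate URL schemes.

```python
import html
import json
from urllib.parse import quote


def for_html(value: str) -> str:
    """HTML body or quoted attribute value: entity-encode the value."""
    return html.escape(value)


def for_url_component(value: str) -> str:
    """Query-string or path component: percent-encode every reserved character."""
    return quote(value, safe="")


def for_script_block(value: str) -> str:
    """Inline <script> data: emit a JSON string literal and hide <, > and &
    so the value cannot close the script element early."""
    encoded = json.dumps(value)
    return (
        encoded.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


value = "</script><b>Tom & Jerry</b>"
print(for_html(value))
print(for_url_component(value))
print(for_script_block(value))

# A URL placed in an attribute needs both layers: percent-encode the
# component, then entity-encode the finished attribute value.
href = html.escape("/search?q=" + for_url_component(value))
print(f'<a href="{href}">search</a>')
```

Each helper produces a different string from the same input, which is why a single HTML-encoding pass cannot protect every context.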
HTML Encoding in Code
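Most languages provide built-in or widely used library functions for HTML encoding. In PHP: htmlspecialchars($str, ENT_QUOTES, 'UTF-8'). In Python: html.escape(s), which escapes both quote characters by default. In JavaScript (browser): assign the string to an element's textContent and read back its innerHTML, for example el.textContent = str followed by el.innerHTML. Reading textContent from a text node returns the string unchanged, and the innerHTML technique does not escape quotes, so it is not safe for attribute values. In Node.js: use the he or entities npm package. In Java: StringEscapeUtils.escapeHtml4(str) from Apache Commons Text. In C# .NET: HttpUtility.HtmlEncode(str) or WebUtility.HtmlEncode(str).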
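For the Python call, the quote parameter decides whether quote characters are escaped, in much the same spirit as ENT_QUOTES in the PHP call. The two wrapper names below are hypothetical and exist only to label the two behaviours.

```python
import html


def escape_for_text(value: str) -> str:
    """Element content only: quotes are left alone (quote=False)."""
    return html.escape(value, quote=False)


def escape_for_attribute(value: str) -> str:
    """Attribute values: also escape " and ' (quote=True is the default)."""
    return html.escape(value, quote=True)


sample = 'She said "hi" & left'
print(escape_for_text(sample))
# She said "hi" &amp; left
print(escape_for_attribute(sample))
# She said &quot;hi&quot; &amp; left
```

Using the default everywhere is the simpler habit, since escaped quotes are harmless in element content.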