Avoid XSS and allow some html tags with JavaScript

To protect an application from XSS attacks, I usually follow these rules:
  1. Determine the level of security your application needs.
    Several tools can protect your application; in my experience the OWASP tools, ESAPI and AntiSamy, provide the best security.
    Note: sanitization does not guarantee that all malicious code is filtered out, so some tools are more secure than others.
  2. Decide whether you need to sanitize on the client, on the server, or on both sides. In most cases it’s enough to do it on the server side.
  3. Decide whether you need to preserve HTML tags (and which ones). As stated previously, not allowing HTML tags at all is the more secure solution.
Based on this you can choose a suitable approach.
1. For server-side sanitization I have used jsoup, which I find a pretty good tool for the job.
To check whether an input is vulnerable, I usually use the following vector:
';alert(String.fromCharCode(88,83,83))//\';alert(String.fromCharCode(88,83,83))//";alert(String.fromCharCode(88,83,83))//\";alert(String.fromCharCode(88,83,83))//-->

">'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>

2. If you need to prevent XSS on the client side, you can use the following tools:
    a) JSSANItazer, which seems a bit outdated
    b) Dust, maintained by Twitter
These tools make it easy to sanitize your input, and they are the main answer to your question.
The server-side tools are mentioned above.
Regarding the 3rd point: if you don’t need to handle HTML tags, you can simply use ESAPI on the server side and ESAPI4JS on the client side. As I understand it, that doesn’t work for you.
From your description I understand that you are storing an email message, so in your case you must sanitize the input on the server side (using one of the tools above); whether to also sanitize on the client side is up to you. You only need to decide whether to add another sanitization step in the UI or to render your “preview page” on the server.

Best regex to catch XSS (Cross-site Scripting) attack (in Java)?

Don’t do this with regular expressions. Remember, you’re not protecting just against valid HTML; you’re protecting against the DOM that web browsers create. Browsers can be tricked into producing valid DOM from invalid HTML quite easily.
For example, see this list of obfuscated XSS attacks. Are you prepared to tailor a regex to prevent this real world attack on Yahoo and Hotmail on IE6/7/8?
<HTML><BODY>
<?xml:namespace prefix="t" ns="urn:schemas-microsoft-com:time">
<?import namespace="t" implementation="#default#time2">
<t:set attributeName="innerHTML" to="XSS&lt;SCRIPT DEFER&gt;alert(&quot;XSS&quot;)&lt;/SCRIPT&gt;">
</BODY></HTML>
How about this attack that works on IE6?
<TABLE BACKGROUND="javascript:alert('XSS')">
How about attacks that are not listed on this site? The problem with Jeff’s approach is that it’s not a whitelist, as claimed. As someone on that page adeptly notes:
The problem with it, is that the html must be clean. There are cases where you can pass in hacked html, and it won’t match it, in which case it’ll return the hacked html string as it won’t match anything to replace. This isn’t strictly whitelisting.
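To make that point concrete, here is a minimal sketch of how a blacklist regex fails. The filter below is hypothetical (the class name, the pattern and the sample inputs are my own assumptions, not the code discussed on that page); it only strips complete script blocks, which is roughly what a naive regex approach does.

```java
import java.util.regex.Pattern;

final class NaiveRegexFilterSketch {

    // Hypothetical blacklist: remove <script>...</script> blocks, ignoring case.
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script[^>]*>.*?</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Single-pass replacement, as a simple regex filter would do it.
    static String naiveFilter(String html) {
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Case 1: the well-formed attack the regex was written for is removed.
        System.out.println(naiveFilter("<p>hi</p><SCRIPT>alert('XSS')</SCRIPT>"));

        // Case 2: no script tag at all, so nothing matches and the attack passes through.
        System.out.println(naiveFilter("<TABLE BACKGROUND=\"javascript:alert('XSS')\">"));

        // Case 3: removing the inner blocks reassembles a working script tag.
        System.out.println(naiveFilter(
                "<scr<script></script>ipt>alert('XSS')</scr<script></script>ipt>"));
    }
}
```

In the second and third cases the "filtered" output is still an attack, which is exactly the "it won’t match anything to replace" problem from the quote.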
I would suggest a purpose-built tool like AntiSamy. It works by actually parsing the HTML, then traversing the DOM and removing anything that’s not in the configurable whitelist. The major difference is its ability to gracefully handle malformed HTML.
The best part is that it actually unit tests for all the XSS attacks on the above site. Besides, what could be easier than this API call:
public String toSafeHtml(String html) throws ScanException, PolicyException {
    // POLICY_FILE is the path to your AntiSamy policy XML (the configurable whitelist)
    Policy policy = Policy.getInstance(POLICY_FILE);
    AntiSamy antiSamy = new AntiSamy();
    CleanResults cleanResults = antiSamy.scan(html, policy);
    return cleanResults.getCleanHTML().trim();
}

http://stackoverflow.com/questions/24723/best-regex-to-catch-xss-cross-site-scripting-attack-in-java

Java Best Practices to Prevent Cross Site Scripting

The normal practice is to HTML-escape any user-controlled data when redisplaying it in JSP, not while processing the submitted data in the servlet and not while storing it in the DB. In JSP you can use the JSTL (to install it, just drop jstl-1.2.jar in /WEB-INF/lib) <c:out> tag or fn:escapeXml function for this. E.g.
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
...
<p>Welcome <c:out value="${user.name}" /></p>
and
<%@ taglib uri="http://java.sun.com/jsp/jstl/functions" prefix="fn" %>
...
<input name="username" value="${fn:escapeXml(param.username)}">
That’s it. No need for a blacklist. Note that user-controlled data covers everything that comes in with an HTTP request: the request parameters, body and headers(!!).
If you also HTML-escape while processing the submitted data and/or storing it in the DB, the escaping ends up spread over the business code and/or the database. That only causes maintenance trouble, and you risk double-escaping or worse when it happens in different places (e.g. & would become &amp;amp; instead of &amp;, so the end user would literally see &amp; instead of & in the view). The business code and the DB are not sensitive to XSS; only the view is. So escape it only there, in the view.
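To illustrate what escaping at output time does, and why escaping twice is harmful, here is a minimal hand-rolled sketch. It is not the JSTL implementation; the class and method names are hypothetical, and in a real JSP you would simply use c:out or fn:escapeXml as described above.

```java
final class HtmlEscapeSketch {

    // Replaces the five characters that are significant in HTML text and attribute values.
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Tom & <script>alert(1)</script>";

        // Escaped once, in the view: the browser displays the original text harmlessly.
        String once = escapeHtml(raw);
        System.out.println(once);

        // Escaped again (e.g. once before storing, once in the view): the user sees literal entities.
        String twice = escapeHtml(once);
        System.out.println(twice);
    }
}
```

The second line of output contains &amp;amp; and &amp;lt;, which is the double-escaping problem you get when the escaping is spread over several layers.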

Cross-Site Scripting And HttpOnly Attribute

Microsoft Internet Explorer (1) has an interesting feature which is not very well known. If a cookie has been set with the ‘HttpOnly’ attribute, the browser will forbid any access to it from client-side code. JavaScript will not be able to read, write or acknowledge information stored in the cookie.
At first sight this might not seem very useful, but once we bring web application security into the picture, and especially cross-site scripting (XSS) vulnerabilities, things get interesting. One of the classical examples of an XSS attack is the one in which a hacker manages to read the user’s session identifier from a cookie and uses it to access a resource (2).
The most obvious remedy would be to use the HttpOnly attribute when setting the JSESSIONID cookie. Unfortunately this step is done by the application server itself, and as of now most of them do not use HttpOnly (3). What we might try to do is rewrite the cookie after it has been created, as shown here: http://keepitlocked.net/archive/2007/11/05/java-and-httponly.aspx.
But there’s also another way which you might consider.
We are going to use a Servlet filter to do the following:
  1. On the first request within a session we create a token; it should be a random value, just as JSESSIONID is. We store the token as an HttpOnly cookie and also as a session attribute.
  2. On every subsequent request we check whether the cookie token is equal to the value stored in the session.
  3. If the cookie token is missing or the two values differ, we can assume that a third party is trying to impersonate the user with a stolen JSESSIONID.
  4. In that case we can invalidate the whole session, inform an administrator about the attempt, or even let the original user know that somebody has been trying to hack in.
Aside from giving us a chance to trace this particular type of abuse, this mechanism can also be used to deal with one more case. Some older servlet containers (that will remain nameless 🙂) create session identifiers of a fixed length without providing any configuration option to make them longer. If your application happens to be audited, a session identifier that is too short will most likely be treated as a vulnerability that has to be fixed.
With the mechanism described above we can effectively control the length of the session id, because from a practical perspective the session identifier length equals the JSESSIONID length plus the length of our hand-generated token. In real life this approach has helped me get the approval of the ethical hacking team more than once.
Proof-of-concept code follows; feel free to suggest changes.
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;
 
public class HttpOnlyTokenFilter implements Filter {
 
    private static final String TOKEN_KEY = "HTTP_ONLY_TOKEN";
 
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
 
        validateToken((HttpServletRequest)request, (HttpServletResponse)response);
        chain.doFilter(request, response);
    }
 
    private void validateToken(HttpServletRequest request, HttpServletResponse response) {
        HttpSession session = request.getSession(true);
        String token = (String)session.getAttribute(TOKEN_KEY);
        if (token == null) {
            token = getRandomString();
            session.setAttribute(TOKEN_KEY, token);
            response.addHeader("Set-Cookie", TOKEN_KEY + "=" + token + ";httpOnly");
        }
        else {
            String cookieToken = getCookieValue(request.getCookies(), TOKEN_KEY);
            if (token.equals(cookieToken) == false) {
                session.invalidate();
            }
        }
    }
 
    private String getCookieValue(Cookie[] cookies, String cookieName) {
        if (cookies == null) return null;
        for (Cookie cookie : cookies) {
            if (cookieName.equals(cookie.getName())) return cookie.getValue();
        }
        return null;
    }
 
    private String getRandomString() {
        return String.valueOf(System.currentTimeMillis()); // TODO: replace with more random value. I mean it!
    }
 
    public void init(FilterConfig filterConfig) throws ServletException {}
    public void destroy() {}
}
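The TODO in getRandomString() matters: System.currentTimeMillis() is easy to guess. Below is one possible sketch of a replacement, assuming a 128-bit token is long enough for your purposes. The class and method names are hypothetical and not part of the filter above. It also includes a comparison helper that avoids leaking, through timing, how many leading characters matched, which could be used in place of token.equals(cookieToken).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

final class HttpOnlyTokenSketch {

    // SecureRandom is thread-safe, so one shared instance is enough.
    private static final SecureRandom RANDOM = new SecureRandom();

    // 16 random bytes (128 bits), encoded as 22 URL-safe Base64 characters without padding.
    // The alphabet (A-Z, a-z, 0-9, '-', '_') is safe to put in a cookie value.
    static String newToken() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Returns true only if both tokens are present and identical.
    // MessageDigest.isEqual compares the byte arrays in a time-constant manner.
    static boolean tokensMatch(String sessionToken, String cookieToken) {
        if (sessionToken == null || cookieToken == null) {
            return false;
        }
        return MessageDigest.isEqual(
                sessionToken.getBytes(StandardCharsets.UTF_8),
                cookieToken.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String token = newToken();
        System.out.println("token length: " + token.length());
        System.out.println("same token matches: " + tokensMatch(token, token));
        System.out.println("missing cookie matches: " + tokensMatch(token, null));
    }
}
```

In the filter, getRandomString() could return newToken(), and the check could become if (!tokensMatch(token, cookieToken)) session.invalidate();.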
(1) It seems that soon all major browsers will support this feature as well, for example http://blogs.securiteam.com/index.php/archives/849
(2) As in http://www.owasp.org/index.php/Session_hijacking_attack
(3) Things are improving here as well: https://issues.apache.org/bugzilla/show_bug.cgi?id=44382