Java and UTF-8 encoding

The J2SE platform has come a long way in internationalization, but handling non-ASCII text in the J2EE world isn’t nearly as easy.

To get the same result in a web application, you have to change both your code and your web server settings.

First, make sure the response carries the right charset in its Content-Type header so the browser detects the encoding correctly. Place the following directive at the beginning of the JSP:

<%@ page contentType="text/html; charset=utf-8" pageEncoding="UTF-8" %>

Next, create a filter that implements the ‘javax.servlet.Filter’ interface so that the request parameters are decoded as UTF-8:

package com.samaxes.filters;

import javax.servlet.*;
import java.io.IOException;

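/**
 * Servlet filter that sets the request character encoding to UTF-8.
 * It must run before any request parameter is read.
 *
 * @author : samaxes
 */
public class UTF8Filter implements Filter {

    public void init(FilterConfig filterConfig) {
        // No initialization required.
    }

    public void destroy() {
        // No resources to release.
    }

    public void doFilter(ServletRequest servletRequest,
                         ServletResponse servletResponse,
                         FilterChain filterChain)
            throws IOException, ServletException {
        // Only affects the request body (e.g. POST form data), and only if
        // called before the first parameter is read.
        servletRequest.setCharacterEncoding("UTF-8");
        filterChain.doFilter(servletRequest, servletResponse);
    }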
}

Now your server reads the POST parameters correctly…

But there is still an issue with GET requests.

The trouble is that the browser sends no charset information to the web server with a GET or POST request. The filter only overrides the encoding of the request body, so it does not affect the query string. The server has no way of knowing how to interpret the URL-encoded GET parameters, so it assumes ISO-8859-1.
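To see why the ISO-8859-1 assumption garbles input, here is a minimal sketch that uses only the JDK's URLEncoder and URLDecoder. It runs outside any container and merely simulates a browser percent-encoding a value as UTF-8 and a server decoding it with each charset; the class name QueryDecodingDemo, its method names, and the sample value are illustrative assumptions, not part of the setup described here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: simulates encoding and decoding of a query parameter.
public class QueryDecodingDemo {

    // Percent-encodes a value the way a UTF-8 page would submit it.
    public static String encodeAsBrowser(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    // Decodes a percent-encoded value using the charset the server assumes.
    public static String decodeAsServer(String encoded, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "\u00e3" is 'a' with tilde; the escape keeps this source file pure ASCII.
        String original = "S\u00e3o Paulo";
        String encoded = encodeAsBrowser(original);

        String asUtf8 = decodeAsServer(encoded, "UTF-8");
        String asLatin1 = decodeAsServer(encoded, "ISO-8859-1");

        System.out.println("encoded: " + encoded);
        // Print comparisons rather than the strings, since the console
        // encoding may itself distort non-ASCII output.
        System.out.println("UTF-8 round trip intact: " + original.equals(asUtf8));
        System.out.println("ISO-8859-1 round trip intact: " + original.equals(asLatin1));
    }
}
```

Decoded as ISO-8859-1, each byte of the two-byte UTF-8 sequence becomes a character of its own, which is the typical garbled text seen in GET parameters.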

Fortunately, the fix is simple: specify URIEncoding="UTF-8" in your Tomcat’s connector settings within the server.xml file.

Your application should now handle UTF-8 just fine.

Published by

Samuel Santos

Java developer, Open Source hacker, Web technologist, JUG Leader.

8 thoughts on “Java and UTF-8 encoding”

  1. Hi,
    Good one.
    But is there any way to get UTF characters in the catalina log (taking Tomcat as the case)? If so, what modifications do we need to make?

  2. It may be related to the encoding of the machine where you are running Tomcat.
    Are you opening the file as UTF-8?

  3. Dear Sam,
    Thanks for the reply. I have changed the encoding to UTF-8 in server.xml, but my System.out.println calls still don't give me Unicode characters; they are printed in ASCII only. Is there any other setting we need to change to get Unicode characters in the System.out stream?

  4. Try adding the JVM option -Dfile.encoding=UTF-8 to your server startup script, then restart your server.

    In a DOS console you won’t see any Unicode character; you should use an editor to open your server log in UTF-8 encoding.

  5. As for the POST solution using your filter, you still need to edit web.xml in Tomcat to make it handle the filter, right?

    1. Correct, you must declare it in your web application deployment descriptor (web.xml).
      Alternatively you can use the @WebFilter annotation (only if your container supports the Servlet 3.0 spec).
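
On the System.out question raised in the comments above: besides the -Dfile.encoding option, one alternative is to wrap the output stream in a PrintStream with an explicit charset. This is a different technique from the one suggested in the thread, shown only as a sketch; the class name Utf8Console and the helper utf8Stream are hypothetical. Whether the characters display correctly still depends on the console or on the editor used to open the log.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: writes text as UTF-8 regardless of the platform default.
public class Utf8Console {

    // Wraps any stream in an auto-flushing PrintStream that encodes as UTF-8.
    public static PrintStream utf8Stream(OutputStream target)
            throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8Stream(System.out);
        // Shows which default encoding the JVM was started with.
        out.println("file.encoding = " + System.getProperty("file.encoding"));
        // "\u00e1" is 'a' with acute accent; the escape keeps the source ASCII-only.
        out.println("Ol\u00e1, mundo");
    }
}
```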
