
Getting Rid of the Squigglies

Have you created a FeedSweep widget that displayed correctly in the Designer, but when you put it on your own web page it has "the squigglies"? Perhaps something like this example?

[Image: an example of the squigglies]

This is caused by an incorrect Content-Type setting on the page hosting your FeedSweep.

FeedSweep is used around the world and because of this, the question of character sets, encoding and Content-Type settings comes up over and over again. This topic causes much confusion and misunderstanding, even among more knowledgeable web developers.

Understanding Content-Type

The problem of matching content encoding is rarely emphasized in classes and tutorials on how to create web pages. But it is one of the most important factors in producing valid HTML.

Unfortunately, many web developers are not aware of this, or of the basics of handling it. It is rarely discussed or written about because it is such a confusing topic. The article “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets” does a good job explaining why. Consider this a must-read.

It is a fundamental responsibility of the web developer to display page content in the encoding in which it was created.

Let's start by imagining you work for a map company and have been given the job of creating a list of the names of all the cities in the world. This task seems rather straightforward, until you find out that one of the requirements is to be able to list these names in 40 different languages.

At first look, this task does not seem too difficult. You are not required to translate any sentences into foreign languages, just names. A name in one language is the same in another language, isn't it?

Well, not really. Many names use local spelling and characters unique to the local language. So you can go about your task in one of two ways: make 40 lists, one for each language, or figure out a way to create a single list that can be displayed in any one of the 40 languages. Of course, the second option is the easier one. This means you need to obtain your list of names, save it in a form that can be interpreted by any language and then be able to display it in an application that uses that language.
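The single-list idea can be sketched in a few lines of Python. This is only an illustration: the city names and the helper fits_in are made up for the example, and UTF-8 stands in for the "form that can be interpreted by any language".

```python
# One list of city names, stored once in a single encoding (UTF-8).
city_names = ["München", "São Paulo", "Kraków", "東京"]

# Encode every name to UTF-8 bytes, the form in which it would be saved or sent.
stored = [name.encode("utf-8") for name in city_names]

# Any reader that decodes the bytes as UTF-8 gets the original names back.
restored = [raw.decode("utf-8") for raw in stored]


def fits_in(name, encoding):
    """Return True if every character of name exists in the given encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A regional encoding such as windows-1252 (cp1252 in Python) cannot hold every name.
unsupported = [name for name in city_names if not fits_in(name, "cp1252")]
```

Here restored equals the original list, while unsupported contains the one name that windows-1252 has no characters for.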

Correct Content-Type for your FeedSweep

Configuring your web page to properly host a FeedSweep widget is no different.

The first step is done for you. All FeedSweep widgets are encoded with a nearly universal encoding called UTF-8. Think of a UTF-8 encoded FeedSweep in this fashion: "Give me a FeedSweep but display German characters properly for my German visitors and Portuguese characters for my readers from Portugal." It is important to understand we are NOT translating the FeedSweep from one language to the other. We are simply taking any word that uses a foreign character (such as a name) and displaying it properly.

The second step is up to you. You have to tell your visitors' browsers that the page hosting the FeedSweep widget is “encoded” as UTF-8. If your current page declares an encoding of "windows-1252" and FeedSweep is "UTF-8", then you have a mismatch, and your readers will see odd characters wherever the content contains a character outside plain A to Z text.
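The mismatch is easy to reproduce. The following minimal sketch, using an arbitrary sample word, encodes text as UTF-8 and then decodes the same bytes as windows-1252 (named cp1252 in Python), which is roughly what a browser does when the page declares the wrong charset.

```python
# Text as a UTF-8 widget would deliver it: encoded as UTF-8 bytes.
original = "München"
utf8_bytes = original.encode("utf-8")

# A page declared as windows-1252 makes the browser decode those bytes wrongly:
# the two bytes that make up the letter u-umlaut are shown as two separate characters.
garbled = utf8_bytes.decode("cp1252")

# Decoding with the matching encoding restores the text.
correct = utf8_bytes.decode("utf-8")
```

The value of garbled is one character longer than the original, because the single accented letter has turned into two unrelated symbols. Those symbols are the squigglies.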

Setting Content-Type

If you take anything away from this article, understand that for your web page to be valid you MUST declare the character encoding. This tells your visitor's browser which character set to use for the page, whether A to Z letters (i.e. Latin), Chinese, Arabic or some other set. You can set this "charset" parameter in one of three ways:

  • In the HTTP Content-Type header
  • In an XML declaration (if you are serving XHTML pages)
  • In a META tag

Precedence Rules

Before you go about choosing where you are going to set your Content-Type parameters, it is important to understand that in the case of conflict between multiple encoding declarations, precedence rules apply to determine which declaration wins out. For XHTML and HTML, the precedence is as follows from highest to lowest:

  1. In the HTTP Content-Type header
  2. In an XML declaration (if you are serving XHTML pages)
  3. In a META tag
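The precedence list above can be expressed as a small function. This is a sketch of the rule as stated in this article, not of any particular browser's implementation; the function name and parameters are hypothetical.

```python
def effective_charset(http_header=None, xml_declaration=None, meta_tag=None):
    """Return the charset that wins, checking sources from highest to lowest precedence.

    Each argument is the charset declared by that source, or None if the
    source declares nothing.
    """
    for declared in (http_header, xml_declaration, meta_tag):
        if declared:
            return declared.lower()
    return None
```

For example, effective_charset("windows-1252", None, "utf-8") returns "windows-1252": the HTTP header wins even though the META tag asks for UTF-8.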

Precedence rules can be a real problem.

Most web servers are configured to send a default declaration. To find out which default yours sends, look at the "Content-Type" value in the HTTP response headers of the page hosting your FeedSweep widget.
If your server is using an HTTP header Content-Type declaration different from UTF-8 then, even if you include the proper UTF-8 XML declaration (if you are serving XHTML pages) or META tag, the FeedSweep widget will still NOT display properly.
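One way to check which charset a server declares is to request the page and read its Content-Type response header. The sketch below uses only the Python standard library; the function names are hypothetical, and the URL you pass in would be that of your own page.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def fetch_header_charset(url):
    """Request url and return the charset declared in its HTTP Content-Type header."""
    with urlopen(url) as response:
        # response.headers is an email-style message object, so it offers
        # the same get_content_charset() helper.
        return response.headers.get_content_charset()
```

Calling charset_from_content_type("text/html; charset=windows-1252") returns "windows-1252", which would indicate a mismatch with a UTF-8 widget. A result of None means the header declares no charset at all.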

Setting Content-Type with a META Tag

As long as the individual webpage that hosts your FeedSweep is not overridden by the HTTP response header, you can insert this META tag into the HTML of your page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
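To confirm that a page really carries this META tag, you can parse its HTML and read the declared charset back. The following is a rough sketch built on Python's standard html.parser module; the class and function names are made up for illustration, and it only looks at the http-equiv form of the tag shown above.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remember the charset of the first http-equiv Content-Type META tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            msg = Message()
            msg["Content-Type"] = attributes.get("content") or ""
            self.charset = msg.get_content_charset()


def meta_charset(html_text):
    """Return the charset declared by a META tag in html_text, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

Feeding it a page whose head contains the tag above yields "utf-8".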

Setting Content-Type in an XHTML Page

Setting the proper Content-Type with XHTML is much more complicated. The problem stems from the fact that different browsers (in particular Internet Explorer 6) handle the XML declaration in non-standard ways. If you choose to set your Content-Type in an XHTML page rather than in the HTTP header, it is recommended you read the W3C tutorial, "Character sets & encodings in XHTML, HTML and CSS".

A good rule of thumb is to start with the following settings and then test accordingly. Always place an XML declaration at the top of the file as the first line and then add the appropriate DOCTYPE declaration as the next line:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
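When testing, it can help to verify that the XML declaration really is the first thing in the file and that it names the encoding you expect. The sketch below does this with a regular expression; the pattern is a simplification written for this example, not a full XML parser, and the function name is hypothetical.

```python
import re

# Matches an XML declaration at the very start of the text and captures
# the value of its optional encoding attribute.
XML_DECLARATION = re.compile(
    r"""<\?xml\s+version\s*=\s*["'][^"']+["']"""
    r"""(?:\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["'])?"""
)


def xml_declared_encoding(document_text):
    """Return the encoding named in a leading XML declaration, or None."""
    match = XML_DECLARATION.match(document_text)
    if match and match.group(1):
        return match.group(1).lower()
    return None
```

Because match() only looks at the start of the text, a declaration that is preceded by a blank line or any other content is reported as missing.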

Setting Content-Type in the HTTP Header

This is the best way to set Content-Type for an individual page because it is highest on the Precedence Rules list. The HTTP header value for the web page hosting your FeedSweep widget can be set in any one of the following server-side scripting languages:

.NET

Content type and charset are set on the response object. To set the charset, use:

Response.ContentType = "text/html; charset=UTF-8";

Perl

Output the correct header before any part of the actual page. After the last header, use a double linebreak.

print "Content-Type: text/html; charset=utf-8\n\n";

Python

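Use the same solution as for Perl. In Python 3, print is a function and adds one line break itself, so a single \n in the string is enough to produce the double linebreak that ends the headers.

print("Content-Type: text/html; charset=utf-8\n")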
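The print approach applies to CGI scripts. If your Python page is instead served through WSGI (an assumption; the article does not say how your site is deployed), the header is passed to start_response rather than printed. A minimal sketch, with a made-up page body:

```python
def application(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the HTTP Content-Type header."""
    # Encode the page body with the same encoding that the header declares.
    body = "<html><body><p>Grüße aus München</p></body></html>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, application).serve_forever() will serve it on port 8000.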

PHP

Use the header() function before generating any content.

header('Content-type: text/html; charset=utf-8');

Java Servlets

Use the setContentType method on the ServletResponse before obtaining any object (Stream or Writer) used for output.

response.setContentType("text/html;charset=utf-8");

If you use a Writer, the Servlet automatically takes care of the conversion from Java Strings to the encoding selected.

JSP

Use the page directive:

<%@ page contentType="text/html; charset=UTF-8"%>

Output from out.println() or the expression elements (<%= object%>) is automatically converted to the encoding selected. Also, the page itself is interpreted as being in this encoding.

ASP

Content type and charset are set on the response object. To set the charset, use:

<%Response.charset="utf-8"%>