How to Use Filters

Summary: FeedSweep filtering is very powerful. Filtering is to FeedSweep what searching is to Google. It can make your FeedSweep much more interesting and relevant to your audience. Learn how to make the most of it with this class in filtering.

A Class in Filtering

How does filtering work?

The filtering feature uses an “assumed AND”. This means that when you type in two or more words as a filter, it assumes your filter is: "word1 AND word2". It also assumes the filter will look for the words in the article contents or article title. You can change these assumptions by applying more advanced filters.

EXAMPLE FILTER: tent medical soldier

This filter will include all articles from all feeds that contain all of these words in any order. It will look for “tent AND medical AND soldier” in all article contents or article titles. It looks for complete word matches, so partial matches are not included.

EXAMPLE FILTER: tent OR medical OR soldier

This filter is much broader than the one above. It will include all articles from all feeds that contain any of these words in any order within article contents or title.
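The difference between the assumed AND and an explicit OR can be sketched in a few lines of Python. This is only an illustration, not FeedSweep's actual code: the function names are hypothetical, and splitting text into words with a regular expression is an assumption about how whole-word matching might be done.

```python
import re

def words(text):
    """Lowercase the text and split it into a set of whole words (assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))

def matches_all(terms, text):
    """Assumed AND: every term must appear as a complete word."""
    found = words(text)
    return all(term.lower() in found for term in terms)

def matches_any(terms, text):
    """OR: at least one term must appear as a complete word."""
    found = words(text)
    return any(term.lower() in found for term in terms)

article = "Soldier treated in a medical tent near the border"
print(matches_all(["tent", "medical", "soldier"], article))  # True
print(matches_all(["tent", "hospital"], article))            # False
print(matches_any(["tent", "hospital"], article))            # True
print(matches_all(["tents"], article))                       # False: no partial matches
```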

What can be filtered?

Filters can be applied against all feeds in a FeedSweep widget and/or against all of the individual articles (or entries) contained within each feed.

Internally, the FeedSweep engine converts any RSS, Atom or RDF-based feed into a Universal Feed Format. All these types of feeds are seen by the filtering function as having a similar format. For example, a filter applied against author: is comparing the same information whether the original feed(s) was in Atom or RSS format.

When you are applying a filter at the feed level, you are excluding or including all articles contained within that feed. From this viewpoint, a feed filter is a global filter. For example, a filter that does not match the id: of a feed will exclude all the articles within that feed, even if other article filters attempt to include individual articles.

When you are applying a filter on articles, you are excluding or including articles on an individual basis. For example, a filter of "NOT marshmallow" (or "-marshmallow") might exclude only a single article from all the articles of all the feeds.

Important! Consider article filters rather than (global-like) feed filters.
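The way a feed filter acts globally can be pictured with the following sketch. It is a simplified model, not the real engine: the dictionary layout of a feed and the function name select_articles are assumptions made for illustration.

```python
def select_articles(feeds, feed_filter, article_filter):
    """Apply the feed-level filter first, then the article-level filter."""
    selected = []
    for feed in feeds:
        if not feed_filter(feed):
            # The whole feed is excluded; its articles are never tested.
            continue
        selected.extend(a for a in feed["articles"] if article_filter(a))
    return selected

feeds = [
    {"title": "myblog", "articles": ["tent sale", "medical news"]},
    {"title": "otherblog", "articles": ["tent review"]},
]

# Feed filter keeps only "myblog"; article filter wants the word "tent".
result = select_articles(
    feeds,
    feed_filter=lambda feed: "myblog" in feed["title"],
    article_filter=lambda article: "tent" in article.split(),
)
print(result)  # ['tent sale']
```

Note that "tent review" is lost even though it satisfies the article filter, because its feed was excluded first.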

Does capitalization matter?

Filters are mostly not case sensitive. All letters, regardless of how you enter them, are understood as lower case. For example, searches for "george washington," "George Washington," and "George washington" all return the same results. The only exception is when filter terms are enclosed in quotation marks. This creates an exact filter, and case could then be relevant. Non-English characters are handled properly as long as they are properly UTF-8 encoded in the source feeds.

How does filtering make choices?

Put simply, filters are either true or false, matched or not matched. There is no such thing as a partial match. Each filter condition is applied and the result either includes or excludes the feed or article being tested. 

If a filter is to be applied to a title: or description:, then the text you provide is considered "a word". Words are defined as having a space before and after the text, just like the words in this paragraph. The only exception is when filter terms are enclosed in quotation marks.

If you are creating a filter that is to apply against any other element (like id: or link:), the text you provide for the filter is compared exactly as is.

Why the difference? A filter against a title: or description: should not include words within words. For example, if a filter word of "land" were used, you would not want a match against an article with the word "landlocked" in it. But filter text of "yahoo" applied to the link: element could never match "yahoo.com/news/" if spaces were required before and after the search text.
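The two comparison modes described above might look like this in Python. The function name and the set of word-matched fields are hypothetical, and lowercasing both sides for every field is an assumption; the actual engine may treat case differently for elements such as id: or link:.

```python
import re

# Fields that are matched word by word (per the description above).
WORD_FIELDS = {"title", "description"}

def field_matches(field, term, value):
    """Whole-word match for title/description, plain substring match otherwise."""
    term = term.lower()
    value = value.lower()
    if field in WORD_FIELDS:
        # The term must not be glued to other word characters on either side.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        return re.search(pattern, value) is not None
    return term in value

print(field_matches("description", "land", "A landlocked country"))  # False
print(field_matches("description", "land", "This land is ours"))     # True
print(field_matches("link", "yahoo", "http://yahoo.com/news/"))      # True
```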

Does filtering observe stop words?

FeedSweep filtering ignores common words and characters known as stop words. These include most pronouns and articles. FeedSweep automatically disregards such terms as "where" and "how," as well as certain single digits and single letters. These terms rarely help to narrow a filter and can significantly slow the process. If you want to use stop words in your filter, enclose your phrase containing stop words in quotation marks.
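As a rough illustration of stop-word handling, the sketch below drops stop words from bare terms but keeps quoted phrases intact. The stop-word list is a small made-up sample (FeedSweep's real list is not given here), and effective_terms is a hypothetical helper name.

```python
import re

# Illustrative sample only; the real stop-word list is not documented here.
STOP_WORDS = {"a", "an", "the", "how", "where", "it", "he", "she", "they"}

def effective_terms(filter_text):
    """Return the terms that remain after stop words are ignored."""
    terms = []
    # Each match is either a quoted phrase (group 1) or a bare word (group 2).
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', filter_text):
        if phrase:
            terms.append(phrase)          # quoted: kept as typed, stop words included
        elif word and word.lower() not in STOP_WORDS:
            terms.append(word.lower())    # bare word that is not a stop word
    return terms

print(effective_terms('where the eagle'))    # ['eagle']
print(effective_terms('"where the eagle"'))  # ['where the eagle']
```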

Does FeedSweep filtering use stemming?

To provide the most accurate results, FeedSweep does not use "stemming" or support "wildcard" filters. Rather, filters compare the exact words that you enter into the search box.


For example, filters of "airlin" or "airlin*" will not yield "airline" or "airlines". If in doubt, try both forms, for example: "airline" and "airlines".

Why am I not getting any results?

The most likely reason is you are filtering on feeds alone. A filter on a feed will exclude the entire feed - meaning all articles contained within that feed are excluded even if other filters attempt to include articles.

Important! Filters on feeds take precedence over filters on articles.

If this is not the case, then consider that filters are automatically "AND" filters. If your filter has several words in it, then these are automatically seen as "word1 AND word2 AND word3". The filter result is excluding more and more articles with the addition of each AND word condition. On the other hand, if you were to use "word1 OR word2 OR word3", this increases the potential number of articles that might appear.

"OR" filters

FeedSweep filtering supports the logical "OR" operator. To retrieve articles that include either word A or word B, use an uppercase "OR" between terms. You can also use a vertical bar ("|") in place of OR.

EXAMPLE FILTER: london OR paris

This filter will return articles with either "London" or "Paris" in the article contents or title.

Excluding filters

You can exclude a word by putting a minus sign ("-") or a NOT immediately in front of the term you want to exclude. Make sure you include a space before the minus sign with no space directly following.

EXAMPLE FILTER: bass -music

This filter will return articles about bass that do not contain the word "music."
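A minimal sketch of how excluding terms could be separated from including terms is shown below. The function names are hypothetical, and the whole-word tokenization is an assumption rather than FeedSweep's documented behavior.

```python
import re

def split_terms(filter_text):
    """Separate a filter into (include, exclude) word lists."""
    include, exclude = [], []
    negate_next = False
    for token in filter_text.split():
        if token == "NOT":
            negate_next = True            # the next term is excluded
            continue
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        elif negate_next:
            exclude.append(token.lower())
        else:
            include.append(token.lower())
        negate_next = False
    return include, exclude

def article_passes(filter_text, text):
    """All included words must be present and no excluded word may be."""
    include, exclude = split_terms(filter_text)
    found = set(re.findall(r"\w+", text.lower()))
    return all(t in found for t in include) and not any(t in found for t in exclude)

print(split_terms("bass -music"))                                    # (['bass'], ['music'])
print(article_passes("bass -music", "Fishing for bass in the lake")) # True
print(article_passes("bass -music", "Bass guitar music lessons"))    # False
```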

Exact phrase filters

You can add an exact phrase filter by adding quotation marks (sometimes called double quotes) around your filter text. This causes the FeedSweep engine to match all occurrences of that exact series of characters, not including the quotation marks. For example: "Articles of the CONSTITUTION in 1884". Exact phrase searches using quotation marks are useful when searching for famous sayings or specific names.

EXAMPLE FILTER: "father-in-law"

This filter requires the exact-phrase quotation marks because the hyphens could otherwise be interpreted as NOT (minus) operators.


Advanced Operators

A list of all the filter operators, with explanations, is provided below.

Important: There can be no space between any filter operator and the following word or text. Correct: author:staffwriter   Wrong: author: staffwriter
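To see why the space matters, consider this sketch of splitting a filter into operator and value pairs. It assumes the filter is split on whitespace before operators are recognized, and that an unrecognized token falls back to the default description: search; both are assumptions for illustration, and parse_token is a hypothetical name. The operator names are copied from the list below.

```python
# Operator names as listed in this article.
KNOWN_OPERATORS = {
    "author", "description", "id", "link", "pubdate", "limit", "title",
    "feedtitle", "feedlanguage", "feeddescripton", "feedlink",
}

def parse_token(token):
    """Split 'operator:value' into a pair; fall back to the default operator."""
    name, sep, value = token.partition(":")
    if sep and name in KNOWN_OPERATORS and value:
        return name, value
    return "description", token

print([parse_token(t) for t in "author:staffwriter".split()])
# [('author', 'staffwriter')]
print([parse_token(t) for t in "author: staffwriter".split()])
# [('description', 'author:'), ('description', 'staffwriter')]
```

With the space, the operator and its value arrive as two separate tokens, so under these assumptions the author: restriction is lost.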

author:
If you preface your filter term with author:, the filter is applied against the author name of each article. For example, author:Swift includes articles that contain the word "Swift" in the entry's author name. If there are multiple authors listed in the feed, only the first is compared. Note there can be no space between author: and the following name text.

description:

If you preface your filter term with description:, the filter word is applied against the contents and title of each article. This is the default. A filter of description:flowers AND description:pretty is the same as flowers pretty. Note there can be no space between description: and the following word.

id:

If you preface your filter term with id:, the filter is applied against the unique GUID of each article. For example, id:9887 includes articles where the article's GUID is "9887" or "123459887" or "111198871111".  While there is no wild card matching, partial matches do apply in this case. Unfortunately, there is no standard whatsoever as to how a GUID should be formed by feed creators, so this can be difficult to use in pattern matching. Note there can be no space between id: and the following text.

link:

If you preface your filter term with link:, the filter enables you to restrict your filter results based on the article's self-pointing URL that points back to a human-readable webpage containing that article. To do this, use the link:sampledomain.com syntax in the filter. While there is no wild card matching, partial matches do apply in this case, so link:sampledomain.com matches www.sampledomain.com or sampledomain.com/whatever/thispage.html. Note there can be no space between link: and the following text.

pubdate:

The filter pubdate: enables you to include articles where the published date of the article is dated within the last X days, where X is a number between 1 and 120. For example, pubdate:7 includes only those articles that have been published within the last week. Note there can be no space between pubdate: and the following number.
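The pubdate: test amounts to a simple date comparison, sketched below. The function name is hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

def within_last_days(published, days, now=None):
    """True if 'published' falls within the last 'days' days (1 to 120)."""
    if not 1 <= days <= 120:
        raise ValueError("days must be between 1 and 120")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days) <= published <= now

# Fixed reference time so the example is reproducible.
now = datetime(2008, 12, 12, tzinfo=timezone.utc)
print(within_last_days(datetime(2008, 12, 8, tzinfo=timezone.utc), 7, now))  # True
print(within_last_days(datetime(2008, 12, 1, tzinfo=timezone.utc), 7, now))  # False
```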

limit:

The filter limit: enables you to include X number of articles, where X is a number between 1 and 50. Important: this filter is per feed. For example, if you included 3 separate feeds in your FeedSweep and use limit:4, then you will get a maximum of 12 articles/entries included even if other filter options attempt to include more. Note there can be no space between limit: and the following number.
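The per-feed arithmetic can be checked with a small sketch. The names are hypothetical, and keeping the first X articles of each feed is an assumption; the article does not say which articles are kept.

```python
def apply_limit(feeds, limit):
    """Keep at most 'limit' articles from each feed (limit is 1 to 50)."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    return {name: articles[:limit] for name, articles in feeds.items()}

feeds = {
    "feed-a": [f"a{i}" for i in range(10)],
    "feed-b": [f"b{i}" for i in range(6)],
    "feed-c": [f"c{i}" for i in range(5)],
}
limited = apply_limit(feeds, 4)
print(sum(len(articles) for articles in limited.values()))  # 12 (3 feeds x 4 articles)
```

A feed with fewer than X articles simply contributes what it has, so 12 is a maximum, not a guarantee.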

title:

If you preface your filter term with title:, the filter word is applied against the title of each article only. For example, title:samson includes articles that contain the word "samson" in the title. Note there can be no space between title: and the following word.

feedtitle:

If you preface your filter term with feedtitle:, the filter word is applied against the title of the feed itself. All articles within that feed are included. An article-based filter could then eliminate more articles, but it is not possible to have an article-based filter add additional articles for inclusion once a feedtitle: filter is in effect. Important: Do not confuse this with the title of each article. For example, feedtitle:myblog includes all articles within a feed where the feed title contains the word "myblog". Note there can be no space between feedtitle: and the following word.

feedlanguage:

If you preface your filter term with feedlanguage:, the filter word is applied against the designated language of the feed as defined by the W3C language codes. For example, feedlanguage:en-US includes articles in a feed that defines its language as the U.S. version of English. Important: Several filters might be necessary to include all derivatives of a language. For example, you might want to use feedlanguage:en-US OR feedlanguage:en-GB to capture both the U.S. and British versions of English. Note there can be no space between feedlanguage: and the following text.

feeddescripton:

If you preface your filter term with feeddescripton:, the filter word is applied against the description of the feed itself. All articles within that feed are included. For example, feeddescripton:news includes all articles within a feed whose description contains the word "news". Note there can be no space between feeddescripton: and the following word.

feedlink:

The filter feedlink: enables you to restrict your filter results based on the feed's URL. To do this, use the feedlink:somedomain.com syntax in the filter. While there is no wild card matching, partial matches do apply in this case, so feedlink:somedomain.com matches somedomain.com/whatever/thisfeed.rss. Note there can be no space between feedlink: and the following text.

What should I do if I am still having problems with filtering?

If you have tried the suggestions above and you are still having problems, you might want to read the article "Filtering and the Universal Feed Format". Feel free to contact us. We would love to help.

 


Posted by: Admin, on 12/12/2008, in category "FeedSweep How-To's"