WorldWideWeb Filtering

This webpage is an archive from 2009 (or even earlier). I no longer help to administer the computers at the Ipswich Middle/High School as a parent volunteer, as my kids outgrew our public schools several years ago.

Although Ipswich Middle/High School (IMHS) tries not to restrict student uses of the web too much, IMHS must do at least a little bit of filtering. First, IMHS must comply with the CIPA law, which more or less boils down to preventing access to pornography. Second, in order to keep teachers happy, IMHS also filters out sites whose principal purpose is to waste time. This especially means filtering out twitch and other mindless games, typically including almost all Flash-based games. Third, in order to keep students out of trouble, IMHS filters out sites that make it very easy to do something illegal or unwise such as online gambling. And fourth, in order to keep students focused on education rather than non-educational uses of the web, IMHS filters out social networking sites.

IMHS' web filtering software runs directly on their firewall computer, with the whole internal LAN on one side and the world wide web WAN on the other. As the filter runs in parallel with the other firewall limitations rather than in series with them, IMHS avoids maintenance problems where the filter would allow something yet the firewall continues to stop it.

The Open Source Software IMHS uses —DansGuardian (and Squid)— is covered widely. (The back half of DansGuardian is slaved to some other proxy software, typically Squid. When used this way, most of the proxy software's possible standalone capabilities are ignored by DansGuardian and are not easily usable.) Information sources include:

(The DansGuardian wiki is often an especially good source for current information and information related to the way IMHS uses the software, and the DansGuardian Wiki FAQ includes a lot of more general information.)

IMHS uses the transparent-intercepting configuration family. By simply intercepting all traffic directed to port 80, IMHS gains several advantages. IMHS filters all uses of the web from their network, regardless of whether it comes from one of their computers or from a computer brought in from outside. IMHS makes it easy to use different browsers —even multiple browsers from the same computer— as no browser proxy settings at all are needed. And IMHS makes it easy for some computers (usually those assigned to teachers) to move back and forth from a home network to the school network without having to change any settings.
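One reason no browser proxy settings are needed is that an intercepted request still carries enough information to rebuild the full URL. A browser that does not know about a proxy sends only the path on the request line and names the site in the Host header. The following is a minimal sketch of that reconstruction, not the DansGuardian or Squid code; the function name is hypothetical and it assumes plain HTTP on port 80.

```python
def intercepted_url(raw_request: bytes) -> str:
    """Rebuild the URL of an intercepted plain-HTTP request (illustrative helper)."""
    # Only the header section matters; drop any request body.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ", 2)

    # A browser configured for an explicit proxy already sends the full URL.
    if target.startswith("http://"):
        return target

    # Otherwise the site name is only in the Host header.
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return "http://" + headers["host"] + target


request = b"GET /index.html HTTP/1.1\r\nHost: www.example.org\r\n\r\n"
print(intercepted_url(request))
```

A request with no Host header raises KeyError in this sketch; a real proxy would have to handle that case itself.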

Of course using the transparent-intercepting configuration family has disadvantages too. One is that only port 80 (not even the https port 443) can be filtered. IMHS overcomes that disadvantage by using Shorewall/IPtables to filter the other ports including the https port 443. Another is that IMHS cannot reliably figure out which username is using which computer. IMHS overcomes that disadvantage by using IPaddresses instead of usernames as identifiers. (IMHS has arranged that their DHCP never reuses an IPaddress, so IMHS can do this reliably.)
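Identifying users by IPaddress rather than username can be pictured as a simple lookup from address range to filter group. The sketch below is only an illustration under assumed values: the address ranges, the group names, and the function name are all hypothetical, not the actual IMHS address plan.

```python
import ipaddress

# Hypothetical address plan; ranges and group names are illustrative only.
GROUPS = [
    (ipaddress.ip_network("10.0.1.0/24"), "teachers"),
    (ipaddress.ip_network("10.0.2.0/23"), "students"),
]


def filter_group(client_ip: str, default: str = "guests") -> str:
    """Return the filter group for a client, keyed on its IP address alone."""
    addr = ipaddress.ip_address(client_ip)
    for network, group in GROUPS:
        if addr in network:
            return group
    # Addresses outside every known range (e.g. visiting laptops) get the default.
    return default


print(filter_group("10.0.1.17"))
```

This only works as an identifier if an address always means the same machine, which is why the DHCP arrangement mentioned above matters.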

DansGuardian is a web content filter. Content filtering is an excellent approach for younger schoolchildren, but may not be appropriate at the high school level. So for a time DansGuardian was reconfigured so that during the schoolday it acted as a pure URI filter. DansGuardian reverted to its usual content filter configuration at night while the entire day's activity was replayed. In other words the DansGuardian content filtering capability was used as a review mechanism, pointing out problematic URIs so they could be added to the pure URI filter configuration immediately. This review process enabled keeping very close tabs on student web activity even while minimizing interference with web traffic.
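The nightly review can be sketched roughly as follows: replay the day's URLs, score each page's content, and report any host that scores too high and is not already in the URI filter. This is an assumption-laden illustration rather than the tooling IMHS actually used; the phrase weights, the limit, and the function names are all hypothetical, and `fetch` stands in for whatever retrieves a page's text.

```python
from urllib.parse import urlsplit

# Hypothetical weighted phrases and limit; real lists would be far larger.
WEIGHTED_PHRASES = {"casino": 40, "poker": 30, "jackpot": 30, "homework": -20}
SCORE_LIMIT = 50


def score_content(text: str) -> int:
    """Sum the weight of every occurrence of every listed phrase."""
    lowered = text.lower()
    return sum(weight * lowered.count(phrase)
               for phrase, weight in WEIGHTED_PHRASES.items())


def review_day(visited_urls, fetch, blocked_hosts):
    """Return {host: score} for hosts that should be considered for the URI filter."""
    candidates = {}
    for url in visited_urls:
        host = urlsplit(url).hostname
        # Skip hosts already blocked or already flagged during this review.
        if host is None or host in blocked_hosts or host in candidates:
            continue
        score = score_content(fetch(url))
        if score >= SCORE_LIMIT:
            candidates[host] = score
    return candidates


pages = {
    "http://games.example/play": "Casino jackpot poker",
    "http://school.example/hw": "homework help",
}
print(review_day(list(pages), pages.get, blocked_hosts=set()))
```

A person would still review the reported hosts before adding them to the daytime URI filter.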

Additionally, for a while filtering of search terms was implemented. Filtering of search terms is not directly implemented by DansGuardian (or any other known filter), and was technically rather difficult. Implementing search term filtering ultimately required both source code changes to DansGuardian and creation of special tools, and often resulted in exceedingly large (over 1000 characters) regular expressions. Even more problematic was that since no one else was doing anything similar, there were no example tool skeletons to follow and no advice was available. Even without any input from anyone else, some rules of thumb became obvious:

Lessons Learned

Without a plethora of excellent management tools, filtering of search terms could easily result in erroneous yet unrealized massive overblocking. Search term filtering, which may or may not make sense with pure URI filtering, is clearly unnecessary with content filtering. So when the review strategy involving pure URI filtering was abandoned, search term filtering was abandoned as well.
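To make the overblocking risk concrete, here is a small sketch of search term filtering: pull the search terms out of the URL's query string, then match them against one regular expression built from a list of terms. It is not the modified DansGuardian code; the term list, the query parameter names, and the function names are assumptions for illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical blocked terms and search-engine parameter names.
BLOCKED_TERMS = ["poker", "slot machine"]
QUERY_PARAMS = ("q", "p", "query")

# The \b word boundaries are what keep "poker" from also blocking "pokerface".
BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLOCKED_TERMS) + r")\b",
    re.IGNORECASE,
)


def search_terms(url: str) -> str:
    """Return the decoded search terms found in the URL, or an empty string."""
    query = parse_qs(urlsplit(url).query)
    return " ".join(value for name in QUERY_PARAMS for value in query.get(name, []))


def is_blocked_search(url: str) -> bool:
    return BLOCKED_RE.search(search_terms(url)) is not None


print(is_blocked_search("http://search.example/?q=video+poker"))
print(is_blocked_search("http://search.example/?q=pokerface+lyrics"))
```

Dropping the word boundaries from the pattern would silently block the second search as well, and with hundreds of terms in one expression that kind of mistake is hard to notice.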

The difficulty of maintaining the configuration and operation of a really good filter may be so high it's not worth it. Taking other considerations into account, a less extensive filtering technology might be more appropriate. Other filtering technologies include OpenDNS and Squid host filtering. (OpenDNS is an alternative name service that filters out access to certain sites by providing a bogus IPaddress for reaching them. Its shared filter is maintained by the combined efforts of all its users, and so is much more extensive than one individual could do no matter how dedicated.)
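The name-service approach can be sketched in a few lines: a blocked name (or any subdomain of it) resolves to the address of a block page, and everything else goes to the real lookup. This illustrates the general idea only and is not how OpenDNS is implemented; the domain list, the block-page address, and the function names are hypothetical.

```python
# Hypothetical values; the addresses come from documentation-only ranges.
BLOCK_PAGE_IP = "192.0.2.1"
BLOCKED_DOMAINS = {"gambling.example", "social.example"}


def is_blocked(hostname: str) -> bool:
    """True if the hostname or any parent domain is on the blocked list."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in BLOCKED_DOMAINS for i in range(len(labels)))


def resolve(hostname: str, real_lookup) -> str:
    """Answer with the block page address for blocked names, else look up normally."""
    if is_blocked(hostname):
        return BLOCK_PAGE_IP
    return real_lookup(hostname)


print(resolve("www.gambling.example", lambda host: "198.51.100.7"))
```

Because the decision is made per hostname, this kind of filter cannot tell one page of a site from another, which is part of why it is less extensive than a content filter.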

Photo filtering is a very difficult technological problem. Many available solutions not only greatly degrade performance, but even then deliver huge percentages of both false positives and false negatives.

Filtering for different ages is so different that not only different standards but even different technologies should be used. For elementary ages, a few false positives are acceptable but even one false negative is catastrophic, and the kids are largely unaware of the filter's existence. For high school ages on the other hand, a very few false negatives are acceptable but false positives are a significant issue, and many of the kids actively attempt to subvert the filter.
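The age tradeoff can be illustrated with a score-based filter and a blocking threshold: a low threshold suits elementary ages (no false negatives, some false positives), a high one suits high school ages (the reverse). The scores and labels below are made-up sample data and the function name is hypothetical; this is a sketch of the tradeoff, not a measurement.

```python
def error_counts(scored_pages, threshold):
    """Count (false positives, false negatives) when blocking at score >= threshold.

    scored_pages is a list of (score, is_objectionable) pairs.
    """
    false_pos = sum(1 for score, bad in scored_pages if score >= threshold and not bad)
    false_neg = sum(1 for score, bad in scored_pages if score < threshold and bad)
    return false_pos, false_neg


# Made-up sample: (content score, whether a human judged the page objectionable).
sample = [(10, False), (35, False), (45, True), (60, False), (80, True), (120, True)]

print(error_counts(sample, threshold=40))   # strict setting
print(error_counts(sample, threshold=100))  # lenient setting
```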

Email comments to Chuck Kollars

All content on this Personal Website (including text, photographs, audio files, and any other original works), unless otherwise noted on individual webpages, is available to anyone for re-use (reproduction, modification, derivation, distribution, etc.) for any non-commercial purpose under a Creative Commons License.