Guest column: What ISPs need to consider when it comes to Web filtering

By Wayne Neich
Mar 25 2008 11:11AM

As the evolving Web makes managing appropriate surfing and bandwidth use more difficult, it also introduces new security threats that filtering may be uniquely well suited to address. However, the Web content security role makes inaccurate site rating, poor coverage, and delayed rating of new URLs even more costly.


In the early days of URL filtering, the challenge was getting a large enough population of URLs rated to make it unlikely a student or employee could view objectionable Web pages. Back then, the primary drivers were legal liability and productivity.

Early generation filtering technology covered both bases with varying degrees of adequacy. Vendors chose Web crawlers and site mining as the fastest and easiest means of developing large URL databases, but a high percentage of the sites chosen for rating were so obscure as to be irrelevant to users. This made direct comparisons of “number of URLs rated” difficult.

New content threats provide new opportunities and new challenges for Web filtering. As firewalls and desktop antivirus became ubiquitous, hackers and unethical entrepreneurs found the only remaining open door to be the Web browser. Web content threats are the fastest growing computer danger because most organisations leave ports 80 and 443 open through their firewalls. The browser has become the soft underbelly of network security.

Database “coverage” and classification “accuracy” are the most important factors in effectively enforcing appropriate use policy and securing Web content. If either is missing, policies simply won’t work and users will be vulnerable. To be viable, a Web filtering architecture must be as “future proof” as possible to ensure coverage is optimised for both known and new Web pages. The architecture must also provide a means of accurately reflecting the complexity inherent in contemporary Web pages. Overly simplified rating structures are quickly overwhelmed by millions of unique, and often multidisciplinary, Web sites. These are just a few of the many elements that must be addressed to deliver a high database coverage rate with highly accurate classification.

Database Coverage



Coverage is the ability of a filtering product to identify all websites which should be placed in a given category. Coverage answers the question, “Of 100 websites that were actually category ‘X’ (Pornography, Spyware, Gambling, etc.), how many did the filter actually categorise as ‘X’?” The higher the percentage, the greater the filter’s coverage.

To have the best coverage, a web filtering product must be able to:

  • Rate domains (rather than URL or IP address) where appropriate
    An individual domain may have thousands of unique URLs underneath it. New URLs may be added under these domains daily, or in some cases, by the minute. For homogeneous domains, there are coverage and performance advantages to rating the domain instead of the URL or IP. By rating the domain, all new URLs added under that domain are instantly covered. This also requires less space in the database, which improves overall performance.


  • Categorise websites by IP address, as well as by URL as appropriate
    Websites are accessed not only via URL, but also via IP address. Although this sounds simplistic, not all filtering products are able to categorise both. Some early generation filtering products attempt to infer ratings for requested IP addresses from known URLs by using reverse-DNS lookups, but this is slow and unreliable.


  • Rate sites harvested primarily from user requests
    Another measure of coverage quality is the relevance of a filtering database. No vendor can rate all 16 billion+ pages on the Web, and it’s not necessary to do so. A large percentage of those pages are defunct or so obscure that including a rating adds no value. They are not relevant for policy enforcement, yet they add a performance cost and hence should be avoided. Harvesting sites primarily from actual user requests keeps the database focused on the pages users really visit.


  • Transparently pull updates on demand
    Being able to pull new ratings on demand as needed provides better real-time coverage than frequently pushing batches of recent URL ratings to the local copy of the filtering database. Automated pulling checks for up-to-the-second ratings of the specific Web page being accessed. In contrast, pushing updates at intervals is more likely to result in missing a relevant web site. Frequent pushes also use more bandwidth for thousands of sub-optimal refreshes per month, most of which cover pages that users in your organisation will not see on a given day.


  • Categorise new or unrated Web sites on the fly
    Tens of millions of new pages are created each month, and approximately 30,000 new pornographic pages a day. Web crawlers and data mining are prone to finding irrelevant pages, and such a “boil the ocean” approach finds new pages too slowly. High coverage requires the ability to rate new pages in real time, at the moment a user accesses the page. This is a complement to the strategy of rating only sites users actually visit.


  • Include relevant categories from a policy enforcement standpoint
    Early generation filtering products often inflated their reported coverage rates by creating meaningless catch-all or miscellaneous categories. This also inflated their number of categories, but added no value for policy enforcement.


  • Recognise and categorise websites in a wide range of languages
    The Internet is a global tool, used by enterprises and organisations with offices worldwide. Therefore, the ability to categorise web pages and sites across a broad set of languages is critical for web filtering solutions.
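
The lookup order implied by several of the points above (domain ratings, locally held IP ratings, and pulling a rating on demand) can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: `RatingStore`, `is_ip_literal`, `fake_pull`, and the `.test` hostnames are all hypothetical, and walking up to parent domains so that a rated domain also covers its subdomains is an added assumption.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_literal(host):
    """Return True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


class RatingStore:
    """Hypothetical local rating database with an on-demand pull fallback."""

    def __init__(self, local_ratings, pull_rating):
        self.local = dict(local_ratings)  # key -> set of category names
        self.pull_rating = pull_rating    # callable(host) -> set of names, or None
        self.pulls = 0                    # number of on-demand pulls made

    def lookup(self, url):
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # 1. Exact URL, then the host itself (a domain or a literal IP).
        for key in (host + parts.path, host):
            if key in self.local:
                return self.local[key]
        # 2. Parent domains (assumption: a rated domain covers subdomains).
        if not is_ip_literal(host):
            labels = host.split(".")
            for i in range(1, len(labels) - 1):
                parent = ".".join(labels[i:])
                if parent in self.local:
                    return self.local[parent]
        # 3. Nothing held locally: pull a rating on demand and keep it.
        self.pulls += 1
        rating = self.pull_rating(host) or {"Unrated"}
        self.local[host] = rating
        return rating


def fake_pull(host):
    # Stand-in for a vendor rating service (hypothetical).
    return {"News"} if host == "brand-new.test" else None


store = RatingStore(
    {"casino.test": {"Gambling"}, "203.0.113.7": {"Gambling"}},
    fake_pull,
)
```

In this sketch a new URL under `casino.test` is covered by the domain rating without a new database entry, a request by IP address is answered from the local table rather than by reverse DNS, and only a genuinely unknown host triggers a pull, whose result is then held locally for later requests.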


Classification Accuracy



Accuracy is the ability of a filtering product to precisely and consistently categorise sites. Accuracy answers the question, “Of the 100 websites the filter categorised as ‘X’ (Pornography, Spyware, Gambling, etc.), how many actually were ‘X’?” The higher the percentage, the greater the filter’s accuracy.
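
The two questions, one for coverage and one for accuracy, correspond to what are commonly called recall and precision. A small sketch of how both could be computed for one category from a hand-labelled sample is shown here; the function name, the site names, and the sample data are hypothetical and chosen only for illustration.

```python
def coverage_and_accuracy(actual, predicted, category):
    """Return (coverage, accuracy) for one category.

    actual:    site -> true category (hand-labelled sample)
    predicted: site -> category assigned by the filter (may omit sites)
    """
    hits = sum(
        1 for site, cat in actual.items()
        if cat == category and predicted.get(site) == category
    )
    really_x = sum(1 for cat in actual.values() if cat == category)
    rated_x = sum(1 for cat in predicted.values() if cat == category)
    coverage = hits / really_x if really_x else 0.0  # of sites that are X, share rated X
    accuracy = hits / rated_x if rated_x else 0.0    # of sites rated X, share that are X
    return coverage, accuracy


# Hypothetical sample: four gambling sites and one news site.
actual = {"a": "Gambling", "b": "Gambling", "c": "Gambling",
          "d": "Gambling", "e": "News"}
# The filter misrates "c", has no rating for "d", and wrongly flags "e".
predicted = {"a": "Gambling", "b": "Gambling", "c": "News", "e": "Gambling"}

coverage, accuracy = coverage_and_accuracy(actual, predicted, "Gambling")
```

With this sample the filter finds two of the four gambling sites (coverage of 0.5), and two of the three sites it flags really are gambling (accuracy of about 0.67), which shows that the two measures can move independently.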

To achieve the highest accuracy, a web filtering product must be able to:

  • Accurately categorise the sites users are ultimately attempting to access
    Users can bypass early generation URL filtering through several widely known techniques. All of these techniques rely on an intermediary Web page that pulls the content a user selects from an entirely different kind or category of Web page. Early generation filtering only “sees” the intermediary page, rather than the true destination content, so the rating it returns is superficial and of little use for enforcing policy.


  • Place websites in multiple categories, as necessary
    Web pages do not always fit easily into a single category. A sports betting site, for example, belongs in both a sports category and a gambling category. An accurate web filter would recognise this and classify the site into both categories, as many enterprises will allow access to sports sites but block access to gambling sites altogether.


  • Categorise subdirectories, as well as top-level domains
    An accurate web filtering product should recognise sites that host home pages for users, and categorise the actual content on each specific URL.
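
The last two points, multiple categories per site and ratings for subdirectories, can be illustrated with a short sketch. It assumes ratings are keyed by `host/path` prefixes and that the most specific rated prefix wins; `rate`, `is_blocked`, and the `hosting.test` entries are hypothetical names used only for this example.

```python
def rate(key, ratings):
    """Return the categories of the longest rated prefix of 'host/path'."""
    segments = key.split("/")
    for end in range(len(segments), 0, -1):
        prefix = "/".join(segments[:end])
        if prefix in ratings:
            return ratings[prefix]
    return set()  # nothing rated at any level


def is_blocked(key, ratings, blocked_categories):
    """Block when any category of the page is on the blocked list."""
    return bool(rate(key, ratings) & blocked_categories)


# Hypothetical ratings for a site that hosts users' home pages.
ratings = {
    "hosting.test": {"Personal Pages"},
    "hosting.test/~alice": {"Sports"},
    "hosting.test/~bob": {"Sports", "Gambling"},
}
blocked = {"Gambling"}
```

Under a policy that allows sports but blocks gambling, `is_blocked("hosting.test/~alice/scores.html", ratings, blocked)` is `False` while `is_blocked("hosting.test/~bob/odds.html", ratings, blocked)` is `True`, even though both pages sit under the same domain. A page with no rating of its own falls back to the domain-level rating.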


Performance



  • Process rating requests “on proxy”
    To minimise impact on user productivity, and scale to the needs of large enterprises, a content filtering solution must be efficiently architected to deliver very high performance. Some commodity operating systems are inherently slower at processing rating requests. Common configurations, such as hosting the filtering intelligence in pass-by mode off-box, are inherently slow.


  • Include IP ratings locally
    Some early generation filtering systems attempt to provide ratings for the IP version of URLs in the database by performing a reverse-DNS lookup whenever just the IP is requested. However, this adds considerable latency to processing the rating request. Frequently, requests are handled so slowly an error message is returned instead of a rating. Such short-cuts only benefit the filtering vendor, not the user.


The nature of Web traffic and browsing habits has evolved far beyond early generation URL filtering architectures. Enforcing appropriate use policy and providing robust Web content security requires a truly dynamic filtering solution. Further, such a solution gives IT the visibility and control necessary to keep up with future challenges and opportunities.

Wayne Neich is the Country Manager of Blue Coat Systems, Australia and New Zealand.