To Boolean or Not

We once heard a brilliant scientist refer to his frustration with Internet searching, equating the process to "sifting through a toxic waste pit." That's graphic enough. And we know he's not alone.

It's indeed a challenge for professionals today - scientists, engineers and environmental managers alike - to quickly find critical information on the Web. Basic word or phrase searches commonly used on search engines usually return hundreds of hits; the majority of these are likely to be irrelevant to your need.

The answer for many, frequently ending hours of frustrating search attempts, is to conduct a Boolean search. A Boolean search - so named for the British mathematician and logician George Boole - is a method that allows you to specify the ordering, grouping and relationships among the keywords and phrases that your Web search contains. It is, however, counterintuitive to the uninitiated. Many common errors result from poor command of Boolean operators.

Helpful Basic Operators

The most commonly used Boolean operators (search terms) are AND, OR and NOT. Using these search terms as well as a few other characters such as ? or * can help you to narrow the breadth of your searches and thus return fewer results that are more likely to contain the information you are searching for.

Let's look more closely at the use of these most common search terms. The AND operator says to return only those results where both keywords joined by the AND operator are contained within the searched document. If only one of the keywords is found, the search fails. Only when both keywords are present will a hit result.

Multiple keywords can be entered, with each keyword joined by the AND operator. The evaluation of results is the same: all keywords must be present within the searched document or the search fails. For example, if a search were entered using the keywords "wastewater" and "disinfection" and they were joined with the AND operator, then all hits containing both those keywords would be returned. If the same keywords were used in a search, but the OR operator were used instead ("wastewater" OR "disinfection"), the search would return all hits that contain either or both of the keywords.

To narrow the search even further, use the NOT operator to refine your results. For example, if a search using "wastewater" AND "disinfection" NOT "ultraviolet" were entered, the search would return all hits that contained the words "wastewater" and "disinfection" but do not contain "ultraviolet" within the document.
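As a rough illustration of how the three operators combine, here is a minimal Python sketch. The four sample documents and the helper names (`words`, `search`) are invented for this example, and a real search engine indexes text far more elaborately; only the AND, OR and NOT logic is the point.

```python
import re

def words(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical sample documents, invented for illustration.
docs = {
    "A": "Ultraviolet disinfection of municipal wastewater",
    "B": "Chlorine disinfection of wastewater effluent",
    "C": "Wastewater collection system design",
    "D": "Disinfection of drinking water",
}

def search(predicate):
    """Return the sorted names of documents whose word set satisfies predicate."""
    return sorted(name for name, text in docs.items() if predicate(words(text)))

# "wastewater" AND "disinfection": both keywords must be present.
and_hits = search(lambda w: "wastewater" in w and "disinfection" in w)

# "wastewater" OR "disinfection": either keyword (or both) is enough.
or_hits = search(lambda w: "wastewater" in w or "disinfection" in w)

# "wastewater" AND "disinfection" NOT "ultraviolet": drop hits with the third keyword.
not_hits = search(
    lambda w: "wastewater" in w and "disinfection" in w and "ultraviolet" not in w
)

print(and_hits)  # ['A', 'B']
print(or_hits)   # ['A', 'B', 'C', 'D']
print(not_hits)  # ['B']
```

Note how OR widens the result list while AND and NOT shrink it, which is why the narrowing operators are usually the ones to reach for first.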

Although these types of searches can be useful in narrowing the number of hits, most times even more has to be done to return only useful results. That's why more advanced searches such as a proximity search can be very useful. Wildcard searches and searches using the characters ?, *, and ( ) can also be added to enhance search efficiency. Figure 1 depicts what these operators are and what they do when used in a search. Note that other symbols may be used with the search engine you are using. Please refer to your search engine documentation for more details.

Figure 1

Character or Word: " "
Example: "ultraviolet wastewater disinfection"
Function: Quotes are used for phrase searches. The search will look for the exact words in the order they are placed between the quotes.

Character or Word: AND or &
Example: "ultraviolet & chlorine wastewater disinfection"
Function: Search for all hits that contain both "ultraviolet wastewater disinfection" and "chlorine wastewater disinfection."

Character or Word: OR or |
Example: "ultraviolet | chlorine wastewater disinfection"
Function: Using the symbol | will allow a combination search. It will find all hits that contain either "ultraviolet wastewater disinfection" or "chlorine wastewater disinfection."

Character or Word: NOT or ~
Example: "ultraviolet ~ chlorine wastewater disinfection"
Function: Search for all hits that contain "ultraviolet wastewater disinfection" but not "chlorine wastewater disinfection."

Character or Word: ( )
Note: this is known as a proximity search.
Example: (ultraviolet wastewater disinfection)
Function: Using brackets will search for those words individually within 100 characters of each other.

Character or Word: ?
Note: this is known as a fuzzy word search.
Example: chl?ro?????
Function: Using a ? allows you to substitute or hold a place for a character in a word. The search engine will find all variations of that word in a document containing the exact number of combined alpha characters and "?" marks. In this example, the words "chloroprene" and "chlorinated" would both be found.

Character or Word: *
Note: this is known as a wildcard search.
Example: Waste* or *chloro*
Function: Using an * within a search will look for the word or series of characters placed before or between the asterisks, along with other words that begin with or contain that set of characters. The search Waste* will find "waste," "wasted" and "wastewater" (among others), and the search *chloro* would find the word "dichlorobenzene."
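The ? and * characters can be tried out with Python's standard `fnmatch` module, which happens to use the same two symbols. This is only a sketch under a strict reading (? stands for exactly one character, * for any run of characters, including none); the word list is invented, and your search engine may interpret the symbols differently.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern, word):
    """Case-insensitive match: ? is exactly one character, * is any run of characters."""
    return fnmatchcase(word.lower(), pattern.lower())

# Hypothetical vocabulary, invented for illustration.
vocabulary = ["waste", "wasted", "wastewater", "dichlorobenzene",
              "chloroprene", "chlorine", "water"]

def hits(pattern):
    """Return the vocabulary words that match the wildcard pattern."""
    return [w for w in vocabulary if wildcard_match(pattern, w)]

print(hits("Waste*"))       # ['waste', 'wasted', 'wastewater']
print(hits("*chloro*"))     # ['dichlorobenzene', 'chloroprene']
print(hits("chl?ro?????"))  # ['chloroprene']
```

Under this strict reading, a word such as "chlorinated" would not match chl?ro????? because its sixth letter is "i" rather than "o"; an engine that treats ? more loosely may still return it.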

Advancing to More Complex Searches

Even more complex and advanced searches can be conducted by using combinations of the aforementioned search operators, as words and/or characters, or by setting specific parameters to affect the evaluation of the search.

The first example is a nested search. You can create a nested search by combining any number of search expressions and operators to create highly refined searches.

Figure 2 shows how a nested search would be constructed where the objective is to return all hits dealing with variations on the phrase "underground storage tank closure" or "UST closure." Wildcards are placed after "tank" and "clo," and the terms are joined as an AND search. The search will return every document that contains the phrase "underground storage tank" (or tanks, tanker, tankers) together with closure, or UST together with closure.
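Because the figure itself is not reproduced here, the following Python sketch shows only one plausible reading of such a nested search: ("underground storage tank*" OR UST) AND clo*. The helper names (`has_word`, `has_phrase`, `AND`, `OR`) and the sample sentences are hypothetical; the idea is simply that small search expressions can be combined into larger ones.

```python
import re
from fnmatch import fnmatchcase

def tokens(text):
    """Split a document into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def has_word(pattern):
    """Predicate: some word in the document matches the wildcard pattern."""
    return lambda toks: any(fnmatchcase(t, pattern) for t in toks)

def has_phrase(*patterns):
    """Predicate: consecutive words match the given wildcard patterns in order."""
    def check(toks):
        n = len(patterns)
        return any(
            all(fnmatchcase(toks[i + j], patterns[j]) for j in range(n))
            for i in range(len(toks) - n + 1)
        )
    return check

def AND(*preds):
    """Predicate: every nested expression must be satisfied."""
    return lambda toks: all(p(toks) for p in preds)

def OR(*preds):
    """Predicate: at least one nested expression must be satisfied."""
    return lambda toks: any(p(toks) for p in preds)

# Assumed reading of the nested query: ("underground storage tank*" OR UST) AND clo*
query = AND(
    OR(has_phrase("underground", "storage", "tank*"), has_word("ust")),
    has_word("clo*"),
)

# Hypothetical sample sentences, invented for illustration.
samples = [
    "Underground storage tanks require a closure plan.",
    "UST closure report",
    "Underground storage tank inspection schedule",
    "Closure of the landfill cell",
]
results = [query(tokens(s)) for s in samples]
print(results)  # [True, True, False, False]
```

Keep in mind that a wildcard as short as clo* also matches unrelated words such as "cloud" or "clock," so a longer stem is usually the safer choice.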

A proximity search allows you to adjust search sensitivity by setting various parameters. Compared to a logical search - which finds selected keywords anywhere in a document - the proximity search specifies that the keywords must exist close to each other. This closeness helps to ensure that you get relevant hits, or responses. This closeness is known as span, and is shown either as the number of words, or characters, that can be used to separate the keywords from each other.

Figures 2-5

A simple search would not find "Air Pollution Control Agency" because the span of characters (including the first word "air" and all spaces between the words) from "air" to "agency" is greater than 21. However, a span of 22 would find the phrase in Figure 3.
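The span arithmetic in this example can be checked with a short Python sketch. The counting convention is an assumption chosen to reproduce the numbers in the text: the span runs from the first character of the first keyword to the first character of the last keyword. The function names are hypothetical, and a given search engine may count differently.

```python
import re

def starts(text, word):
    """Character offsets at which word begins (whole word, case-insensitive)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, re.IGNORECASE)]

def within_span(text, first, last, span):
    """True if some occurrence of `last` begins at most `span` characters after `first`."""
    return any(
        0 <= b - a <= span
        for a in starts(text, first)
        for b in starts(text, last)
    )

phrase = "Air Pollution Control Agency"
print(starts(phrase, "agency"))                  # [22]
print(within_span(phrase, "air", "agency", 21))  # False
print(within_span(phrase, "air", "agency", 22))  # True
```

"Air Pollution Control " is 22 characters long including its three spaces, which is why a span of 21 misses the phrase and a span of 22 finds it.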

In addition to a proximity search, you can also conduct an expandable count search. With this type of search, you can tell the search engine to allow some words to be missing from the document, and still the document would appear.

The example in Figure 4 will find any document containing any two of the three words within 100 characters of each other. In other words, it would allow any one of the words to be missing from the document, provided the other two words are found within 100 characters of each other.
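A search of this kind ("any two of three words within 100 characters") might be sketched in Python as follows. The keywords and test sentences are invented, since the figure's actual query is not reproduced here, and measuring distance between the starting positions of the words is an assumption.

```python
import re
from itertools import combinations, product

def starts(text, word):
    """Character offsets at which word begins (whole word, case-insensitive)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, re.IGNORECASE)]

def count_search(text, keywords, required, span):
    """True if at least `required` keywords occur within `span` characters of each other."""
    positions = {k: starts(text, k) for k in keywords}
    for subset in combinations(keywords, required):
        # Try every choice of one occurrence per keyword in this subset;
        # a keyword that never occurs contributes no choices at all.
        for combo in product(*(positions[k] for k in subset)):
            if max(combo) - min(combo) <= span:
                return True
    return False

# Hypothetical keywords and documents, invented for illustration.
keywords = ["ultraviolet", "wastewater", "disinfection"]

close = "Ultraviolet systems are increasingly common for disinfection at small plants."
lonely = "Wastewater permits were renewed."
far = "Ultraviolet lamps. " + "x " * 60 + "disinfection"

print(count_search(close, keywords, required=2, span=100))   # True
print(count_search(lonely, keywords, required=2, span=100))  # False
print(count_search(far, keywords, required=2, span=100))     # False
```

The first document is found even though "wastewater" is missing; the second has only one keyword, and in the third the two keywords are 139 characters apart.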

There is also a search method known as an expandable proximity search. This type of search allows you to specify the number of words your search term may be separated by, while allowing one of the words to be missing, yet only if the rest of the search term words are found within the specified parameters, or limits.

The example in Figure 5 will allow any one word to be missing from the document, provided that the other words are found within 45 characters of each other. If all three words are found in the document, then any two of them must be within 45 characters of each other.

The phrase, "Permitting authority means the State air pollution control agency, local agency" will be found because "permitting," "air" and "agency" are found within 45 characters of each other.

In contrast, the phrase, "Attainment dates of national primary and secondary air quality standards submitted on August 8, 1972, by the New Hampshire Air Pollution Control Agency" will not be found because the words "permitting," "air" and "agency" are not found within a 45-character span.

Also, you may find yourself needing to look up a proper name such as George W. Bush, or Apple Computer. To do this, simply capitalize the first letter of each word. The search engine, in this way, will find only pages that contain those terms as a proper name.

When All Else Fails

Consider the following tips offered by reference librarians, those professionals who spend hours helping people recognize and correct searching mistakes.

Spelling errors are the most common at all age and education levels. For the search engine to get you there, you've got to give it the right directions. In cases where the search engine supports fuzzy logic (very few do), the degree of fuzziness can overcome spelling errors. Fuzziness in this case corresponds to the number of misspelled letters that can occur, yet still allow you to find the word you are looking for.
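One common way to model "the number of misspelled letters" is edit distance: the fewest single-letter insertions, deletions or substitutions needed to turn one word into another. The Python sketch below uses that measure as an assumption; the article does not say which measure any particular search engine applies, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-letter insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a letter from a
                curr[j - 1] + 1,     # insert a letter into a
                prev[j - 1] + cost,  # substitute (or keep) a letter
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(query, word, fuzziness):
    """True if the query is within `fuzziness` letter edits of the word."""
    return edit_distance(query.lower(), word.lower()) <= fuzziness

print(edit_distance("dichlorobenzne", "dichlorobenzene"))   # 1
print(fuzzy_match("dichlorobenzne", "dichlorobenzene", 1))  # True
print(fuzzy_match("diclorobenzne", "dichlorobenzene", 1))   # False
```

With a fuzziness of 1, a query missing one "e" still finds the word, but a query missing two letters does not.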

Conceptual errors are the most complex. Many people, embarking on a search, go at it with the general impression that there has to be an answer out there somewhere, even though the question itself may be ill-conceived. Not all information is necessarily available, especially in the format you want. Understanding search logic, content, engine search features and systems at the conceptual level takes time and practice.

As you can see, there are smarter ways to conduct online searches. The techniques are flexible yet well defined, and they permit the combining of simpler searches into more complex searches (by using nesting and parentheses). What should be clear is that by using these advanced search techniques, you'll get fewer, but more relevant, hits. Ultimately, that will save you time, eliminating those lengthy, mind-numbing lists that become such a frustration.

When you do your next search, consider these tips:

  • Tighten your search with any of the Boolean techniques. The more highly defined, the better.
  • If the returned results are too narrow, you can always make your search broader to include more types of content.
  • As the techniques become more familiar, your search results will be greatly enhanced. And, if you still have questions, you can reach us, toll free. Perhaps we can help.
  • Be sure you've got three "e's" in "dichlorobenzene."



This article appeared in the March 2001 issue of Environmental Protection, Vol. 12, No. 3, on page 65.

