Focus on the Household


  • by Jock Bickert, November/December 1995

For centuries, the human race has tried to make sense of the millions of behavioral variants of its members. Why on earth do friends, lovers, family members, business acquaintances, and enemies behave the way they do? In the search for some consistency in both explaining and predicting behavior, people have turned to typologies for assistance. If you can put someone in a category whose members behave reliably, you’ve gone a long way toward making some sense of that person’s behavior. Or so goes the reasoning.

Marketers, in their continual need to explain and predict consumer behavior, have turned to typologies, or segmentation systems, for assistance. Their prayers appeared to be answered in the 1970s with the advent of the first geodemographic systems: PRIZM from the Claritas Corporation and ACORN from CACI. These systems benefited from growth in computing power, which made practical the relatively new statistical technique known as “cluster analysis.”

In their seminal 1970 book, Cluster Analysis, co-authors Robert Tryon and Daniel Bailey had applied the technique to four decades of U.S. census data from the San Francisco Bay Area. They “clustered” the Bay Area’s census tracts on variables such as socioeconomic status, employment, and condition of housing. To their astonishment, when they examined 40 years of varied election results within those clusters, they found that election behavior for each cluster of tracts had remained the same over the four decades. Even though families moved into and out of those tracts, the aggregate political behavior of the tracts stayed constant. In other words, not only did birds of a feather flock together, but successive generations of those birds flocked in similar fashion.

The commercial geodemographic systems quickly became the staple of mainstream marketers seeking consistency in consumer behavior. These systems have proved successful in store location analyses, in directing media buys, and in locating pockets of new customers. But they have been arguably less successful in providing significant response “lift” in many direct-mail applications. Therefore, the direct-marketing industry has been bereft of marketing typologies, aside from limited survey-generated groups developed by agencies such as Stone & Adler, Grey Direct, and O&M Direct. Those systems shared a common limitation: They identified types of mail-order buyers, but gave the potential user no way to locate individual prospects who belonged in those desirable mail-order categories.

In short, while it was helpful to know where to find a specific flock, what direct marketers really needed was a way to get to know each bird on an individual basis. Because they are based on data from the Census Bureau, which is prohibited from releasing information about individuals, the neighborhood-level systems can only offer generalizations and suppositions.

But the census is no longer the only powerful repository of information about American households. The late 1980s and early 1990s saw the proliferation of large, data-rich, household-level databases in the direct-marketing industry. Those databases have since achieved coverage nearly equal to that of the census and, in several instances, hold more marketing-relevant data than the census does.

These private-sector data also tend to be more reliable than the information available from the census. Because the census does not release information on individual households (only aggregate data for a given unit of geography, such as census tracts or block groups), systems that use census data as building blocks can only estimate a household’s characteristics from the aggregate of that household’s census geography. If a block group is reported as having a median income of $53,147, every household in that block group is assigned an income of $53,147. In today’s increasingly diverse environment, a household whose annual family income is imputed to be $53,147 may actually earn as little as $25,000 or as much as $100,000.
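A minimal Python sketch makes the imputation problem concrete. The $53,147 median and the $25,000 and $100,000 extremes come from the example above; the other household incomes are hypothetical, chosen so that the block-group median works out to $53,147, and the function name is illustrative only.

```python
from statistics import median


def impute_from_block_group(actual_incomes):
    """Mimic a census-based system: every household in the block group is
    assigned the block-group median. Returns (imputed_values, errors), where
    each error is the imputed income minus that household's actual income."""
    bg_median = median(actual_incomes)
    imputed = [bg_median] * len(actual_incomes)
    errors = [bg_median - actual for actual in actual_incomes]
    return imputed, errors


if __name__ == "__main__":
    # Hypothetical incomes for seven households in one block group.
    incomes = [25_000, 38_000, 47_500, 53_147, 61_000, 82_000, 100_000]
    imputed, errors = impute_from_block_group(incomes)
    for actual, guess, err in zip(incomes, imputed, errors):
        print(f"actual ${actual:>7,}  imputed ${guess:>7,}  error ${err:>+8,}")
```

Every household gets the same figure, so the lowest-income household is overstated by $28,147 and the highest understated by $46,853, even though the block-group median itself is exactly right.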

Although household-based databases have existed for 10 to 15 years, the industry has been slow to derive segmentation typologies from those databases. Instead, segmentation processes have consisted of custom modeling in which individual data items were used to predict direct-response behavior. The models have been successful in generating response lift, but because they are customized to a specific mailer as well as a specific mailing, they are not useful in predicting response across mailers, or even across mailings.
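Response lift from a custom model is commonly reported by ranking the names in a mailing by model score and comparing each group’s response rate with the overall rate. The Python sketch below assumes the scores come from some already-fitted model (not shown) and that each household carries a yes/no response flag from the mailing; decile_lift is a hypothetical helper, not any vendor’s tool.

```python
def decile_lift(scores, responded, n_groups=10):
    """Rank households by model score (highest first), split them into
    n_groups equal-size groups, and return each group's lift index:
    group response rate / overall response rate * 100."""
    if len(scores) != len(responded):
        raise ValueError("scores and responded must be the same length")
    if len(scores) < n_groups:
        raise ValueError("need at least one household per group")
    # Order the response flags by descending model score.
    ranked = [r for _, r in sorted(zip(scores, responded),
                                   key=lambda pair: pair[0], reverse=True)]
    overall_rate = sum(ranked) / len(ranked)
    if overall_rate == 0:
        raise ValueError("no responders; lift is undefined")
    size = len(ranked) // n_groups
    lifts = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = start + size if g < n_groups - 1 else len(ranked)
        group = ranked[start:end]
        lifts.append(100 * (sum(group) / len(group)) / overall_rate)
    return lifts


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # hypothetical scores
    responded = [1, 1, 1, 0, 0, 1, 0, 0]               # 1 = responded
    print(decile_lift(scores, responded, n_groups=4))  # [200.0, 100.0, 100.0, 0.0]
```

An index of 200 means a group responded at twice the average rate. Because both the scores and the response flags are tied to one mailer and one mailing, lifts computed this way say nothing about other mailings, which is the limitation described above.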

In the last several years, three major household-based segmentation systems have appeared within the direct-response industry. They are DNA, from Metromail and Fair Isaac; Niches, from The Polk Company; and Cohorts II, from Looking Glass, Inc., in concert with what used to be National Demographics & Lifestyles and is now NDL/The Polk Company.

The DNA system actually consists of two elements: DNA Demographic and DNA Lifestyle. Drawn from data on 77 million households, DNA Demographic is based on life stage, beginning with ten age groups. Each age group consists of 6 to 14 Cells, and each Cell is composed of households with similar demographic characteristics. The resulting 104 Cells have been aggregated into 25 “Super Cells” in three broad age bands: 20-34, 35-54, and 55+.

DNA Lifestyle assigns households to 1 of 100 Cells based on lifestyle and behavioral information from the 25 million survey respondents in Metromail’s BehaviorBank file. This classification ignores demographic characteristics and focuses instead on behavioral information. Those 100 Cells have also been aggregated into 25 Super Cells. Users can combine DNA Lifestyle with DNA Demographic or with specific individual data.

Data from Mediamark Research have been appended to each Cell in order to enhance the clustering. Those data include magazine and newspaper readership, vehicle ownership, and product usage. Each Cell has been given a four-digit number, with the first two digits indicating the age range and the last two digits being an income indicator. For example, Cell 3001 indicates the most affluent group in the 30-to-34 age group.
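The numbering scheme lends itself to mechanical decoding. The Python sketch below is hypothetical: it assumes the first two digits give the lower bound of the age range (30 for the 30-to-34 group) and that a lower income indicator means greater affluence, as 01 does for Cell 3001. It also places each Cell in one of the three broad Super Cell age bands; Metromail’s actual code tables are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnaCell:
    age_start: int    # assumed: lower bound of the Cell's age range
    income_rank: int  # assumed: 1 = most affluent within the age range
    age_band: str     # broad Super Cell band: "20-34", "35-54" or "55+"


def decode_dna_cell(code):
    """Decode a four-digit Cell number such as "3001" (hypothetical helper)."""
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a four-digit Cell number, got {code!r}")
    age_start, income_rank = int(code[:2]), int(code[2:])
    if age_start < 35:
        band = "20-34"
    elif age_start < 55:
        band = "35-54"
    else:
        band = "55+"
    return DnaCell(age_start, income_rank, band)


if __name__ == "__main__":
    for code in ("3001", "3502", "4504"):
        print(code, decode_dna_cell(code))
```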

The DNA system is clearly positioned as a tool for direct marketers; its most touted strength is prospect identification in direct-marketing applications.

Niches, from The Polk Company, represents a marriage of household-based segmentation and geodemography. The system begins by identifying three dimensions that appear to differentiate individual households in the 80-million-name Polk database: (1) needs, as measured by life-cycle stages (the critical definers being age of adults in the household, number of children, and children’s ages); (2) buying power, as measured by wealth factors such as income, dwelling type, and homeownership; and (3) spending patterns, as measured by mail response frequency, credit-card usage, and new car and truck purchases.

The combination of those dimensions produced 108 SuperNiches (groups), which were then cluster analyzed to produce 26 Niches for broader applications. The Niches system employs clever nomenclature, capitalizing on the correspondence between the number of Niches and the letters of the alphabet. The designations A to Z indicate decreasing affluence. For example, the first five Niches (Already Affluent, Big Spender Parents, Cash-to-Carry, Diamonds-to-Go, and Easy Street) are the most affluent (i.e., incomes greater than $75,000 a year). The last seven Niches have annual incomes below $20,000, ending with Young-at-Heart and Zero Mobility.
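Because the letters encode rank, a Niche designation can be read mechanically. A minimal Python sketch; niche_profile is a hypothetical helper, and it uses only the two income cutoffs stated above. The article gives no income range for the middle Niches, so they are simply labelled "middle".

```python
import string


def niche_profile(letter):
    """Return (affluence_rank, income_tier) for a Niche letter.

    Rank 1 (A) is the most affluent and 26 (Z) the least, following the
    A-to-Z convention. The first five Niches exceed $75,000 and the last
    seven fall below $20,000; everything in between is labelled "middle".
    """
    letter = letter.upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"expected a single letter A-Z, got {letter!r}")
    rank = string.ascii_uppercase.index(letter) + 1
    if rank <= 5:
        tier = "over $75,000"
    elif rank >= 26 - 7 + 1:  # the last seven letters, T through Z
        tier = "under $20,000"
    else:
        tier = "middle"
    return rank, tier


if __name__ == "__main__":
    for code in "AEMZ":
        print(code, niche_profile(code))
```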

The descriptions of the groups have also been enhanced by overlaying the system with data from Mediamark, so users know, for example, that the Easy Street category (affluent families over age 65) has a greater-than-average propensity to own home computers and engage in credit-card spending.

Niches has been used as a shortcut to expensive custom segmentation techniques in direct-mail prospect identification. When customer files are enhanced with Niches categories, Polk’s TotaList Network database is used on a match basis. For unmatched records, GeoNiches fills in by applying a geodemographic extension based on either a census or a postal geographic unit.
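The match-then-fill logic can be sketched in Python. Both lookup tables are hypothetical stand-ins: household_index plays the role of a household-level database such as TotaList, keyed by some match key (how Polk actually matches records is not described here), and geo_index maps a census or postal unit to a single Niche, which assumes that GeoNiches assigns one Niche per geographic unit.

```python
def assign_niche(record, household_index, geo_index):
    """Return (niche, source) for one customer record.

    household_index: hypothetical dict of match key -> Niche letter.
    geo_index: hypothetical dict of geographic unit (e.g. a ZIP code) -> Niche.
    """
    # First choice: an individual household match.
    niche = household_index.get(record["match_key"])
    if niche is not None:
        return niche, "household"
    # Fallback: the Niche assigned to the record's geographic unit.
    niche = geo_index.get(record["geo_unit"])
    if niche is not None:
        return niche, "geographic"
    return None, "unmatched"


if __name__ == "__main__":
    households = {"SMITH|12 ELM ST|80202": "E"}  # hypothetical match keys
    geo = {"80202": "M"}
    customers = [
        {"match_key": "SMITH|12 ELM ST|80202", "geo_unit": "80202"},
        {"match_key": "JONES|9 OAK AVE|80202", "geo_unit": "80202"},
        {"match_key": "LEE|4 PINE RD|99999", "geo_unit": "99999"},
    ]
    for c in customers:
        print(c["match_key"], assign_niche(c, households, geo))
```

Recording the source of each assignment lets a user keep household-level and geographic assignments apart when judging results.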

Although performance results have not been made public, Polk officials point to a number of case studies in the publishing, financial services, member recruitment, and travel industries. In those instances, Niches has outperformed traditional geodemographic systems and has shown only slightly less power in head-to-head comparisons with harder-to-use and more costly regression techniques.

A recent entry in the household segmentation arena is Cohorts II, a consumer typology developed by Looking Glass, Inc., using the household-level demographic and lifestyle data in the 35-million-household Lifestyle Selector database. Unlike either DNA or Niches, Cohorts uses only self-reported data rather than a combination of self-reporting and imputation from other sources.

Seven demographic variables (gender, marital status, income, occupation, homeownership, and presence and ages of children) and 75 activities and interests (skiing, golf, foreign travel, gourmet cooking, political activity, etc.) were cluster analyzed. This resulted in 27 homogeneous clusters: 11 married groups, 8 groups of single women, and 8 groups of single men.

The initial analysis revealed that not all households can be conveniently assigned to a cohesive cluster. There are grandmothers who ride motorcycles, male hunters who crochet, and sexagenarians who are raising second families. These people defy categorization and have been lumped into a potpourri group known as the “Omegas.” Nearly 9 percent of all U.S. households are Omegas.
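The article does not say which clustering algorithm Looking Glass used. As a hedged illustration of the idea, the Python sketch below substitutes plain k-means, a common clustering method that is not necessarily the one behind Cohorts II. It then labels any household farther than a chosen distance from its nearest cluster center as a residual group (-1), in the spirit of the Omegas. The function name, the threshold, and the toy data are all hypothetical; real inputs would be numeric encodings of demographics and interests.

```python
import math
import random


def kmeans_with_residual(points, k, threshold, iters=50, seed=0):
    """Plain k-means on numeric tuples, followed by a residual pass.

    Returns (labels, centroids). A label of -1 marks a point farther than
    `threshold` from its nearest centroid. In this simplification, residual
    points still influence the centroids while they are being fitted.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the closest current centroid (Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        # Assignment step.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ends up empty.
        new = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new

    labels = []
    for p in points:
        j = nearest(p)
        labels.append(j if math.dist(p, centroids[j]) <= threshold else -1)
    return labels, centroids


if __name__ == "__main__":
    # Two tight hypothetical groups plus one household that fits neither.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
    labels, _ = kmeans_with_residual(pts, k=2, threshold=3.0)
    print(labels)
```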

The development process also uncovered a correlation between cluster membership and given names. In the 35-million-name database, there were many names that appeared with unusual frequency in only one cluster. For that reason, all of the clusters were given high-indexing first names, resulting in titles like “Jules & Roz” (affluent and physically active urbanites with children), “Denise” (single mothers on a tight budget), and “Elmer” (very sedentary older men). Even the unclassifiable Omegas had a distinguishing characteristic: that group contained all of the classic Greek and Roman mythological names, like Ulysses, Aphrodite, Hercules, and Apollo.
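The article says only that certain names appeared with unusual frequency in a single cluster. One common way to measure that is an index: a name’s share of a cluster divided by its share of the whole file, times 100. The Python sketch below is hypothetical, including its minimum-count filter (an assumption), which keeps rare names from winning by chance.

```python
from collections import Counter, defaultdict


def top_name_by_cluster(records, min_count=2):
    """records: iterable of (first_name, cluster) pairs.

    For each cluster, return (index, name) for the first name with the
    highest index, where index = (name's share within the cluster /
    name's share of the whole file) * 100. Names seen fewer than
    min_count times in a cluster are ignored.
    """
    records = list(records)
    total = len(records)
    overall = Counter(name for name, _ in records)
    by_cluster = defaultdict(Counter)
    for name, cluster in records:
        by_cluster[cluster][name] += 1

    best = {}
    for cluster, counts in by_cluster.items():
        size = sum(counts.values())
        scored = [
            (100 * (n / size) / (overall[name] / total), name)
            for name, n in counts.items()
            if n >= min_count
        ]
        if scored:
            best[cluster] = max(scored)
    return best


if __name__ == "__main__":
    data = ([("Jules", "A")] * 3 + [("Mary", "A")] * 2
            + [("Mary", "B")] * 4 + [("Elmer", "B")] * 2 + [("Jules", "B")])
    for cluster, (index, name) in sorted(top_name_by_cluster(data).items()):
        print(cluster, name, round(index))
```

In this toy file "Mary" is the most common name in cluster B, yet "Elmer" wins there because it is far more concentrated in B than in the file as a whole.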

The Cohorts II typology is used to (1) identify true, existing market segments; (2) locate underpenetrated but potentially profitable market segments; (3) evaluate market potential in small areas of geography; (4) identify specific prospect households within targeted segments using The Lifestyle Selector; and (5) develop appropriate advertising and marketing communications.

For decades, the mailing-list subsegment of the direct-marketing industry has been the most pragmatic of businesses. Armed with the marketing accountability of measurable response rates, many direct marketers have been relatively uninterested in seeking explanations for direct-mail behavior. The maxim “Just tell me what works; don’t bother me with why it works” has been industry gospel for years. However, as economic pressures and mountainous direct-mail volumes focus attention on eking out marginally higher response rates, direct marketers are admitting they need to mail “smarter.”

But mailing smarter means understanding customers and prospects better. Sophisticated response models may lift response rates, but they rarely provide insights into why direct-mail recipients behave the way they do. Segmentation systems offer the promise of understanding. Tryon and Bailey reasoned that typologies are desirable because “since a particular type includes many individuals better understood than if no such cumulative information were available.” In other words, knowledge of the many provides insights into the one. If mailers can pinpoint the mail-order behavior of any cluster or segment, be it 3502, Big Spender Parents, or Buddy & Carole, that behavior should be consistent across mailers as well as across mailings.

Household-based segmentation systems should also provide the direct-marketing industry with the solution to another of its historical shortcomings. Lester Wunderman, in a keynote address to the Direct Marketing Association’s Fall Conference in 1993, chided the industry for not exploiting one of its strengths. “We (direct marketers) segregate customer groups or database groups to whom we address the same message,” he observed. “That’s not individualized or personalized marketing. What that is is mini-mass marketing.” Cohesive market typologies should allow marketers to craft messages and communications packages that are tailored to the idiosyncrasies of each segment. Marketers can speak the language of 3502 or Buddy & Carole and communicate with them differently than if they were talking to 4504 or Jules & Roz.

Household systems do not come without blemishes. One is the problem of “decay,” in which household data are not updated when significant life-stage changes occur. Some household systems avoid that problem by using only records below a certain age (e.g., no older than 18 months). As older names drop off one end, the database is replenished by new names coming in.
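An age cutoff of this kind is easy to express. A minimal Python sketch, assuming each record carries a hypothetical updated date field; for simplicity, the month arithmetic counts calendar months and ignores the day of the month.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from earlier to later (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def fresh_records(records, as_of, max_age_months=18):
    """Keep records last updated no more than max_age_months before as_of."""
    return [r for r in records
            if months_between(r["updated"], as_of) <= max_age_months]


if __name__ == "__main__":
    as_of = date(1995, 11, 1)
    file = [
        {"id": 1, "updated": date(1994, 5, 15)},  # 18 months old: kept
        {"id": 2, "updated": date(1994, 4, 30)},  # 19 months old: dropped
        {"id": 3, "updated": date(1995, 10, 1)},  # 1 month old: kept
    ]
    print([r["id"] for r in fresh_records(file, as_of)])
```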

Another drawback is the potential for inaccurate data. That, obviously, is a problem with any database, particularly those that rely heavily on imputation from secondary sources. Self-reported data are not free from bugs, of course; individuals can always misstate vital information about themselves.

Perhaps the biggest bugaboo of the new systems is their failure to match 100 percent of a user’s records. No system provides total coverage, just as the U.S. census fails to reach all households. But incomplete coverage need not diminish the explanatory power of household systems, provided, of course, that their coverage is unbiased.
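A user can run a rough check for coverage bias against their own file. The Python sketch below assumes some variable is known for every record, matched or not; the matched and spend field names are hypothetical. Agreement on one variable does not prove the coverage is unbiased, but a large gap between matched records and the full file is a warning sign.

```python
from statistics import mean


def coverage_report(records):
    """records: dicts with a boolean 'matched' flag and a numeric 'spend'
    value known for every record (both field names are hypothetical).

    Returns (match_rate, pct_gap): the share of records matched, and how
    far matched records' mean spend departs from the full file's mean,
    as a percentage of the full-file mean.
    """
    if not records:
        raise ValueError("no records")
    matched = [r for r in records if r["matched"]]
    match_rate = len(matched) / len(records)
    all_mean = mean(r["spend"] for r in records)
    if all_mean == 0:
        raise ValueError("full-file mean is zero; percentage gap undefined")
    if not matched:
        return match_rate, float("nan")
    matched_mean = mean(r["spend"] for r in matched)
    return match_rate, 100 * (matched_mean - all_mean) / all_mean


if __name__ == "__main__":
    customers = [
        {"matched": True, "spend": 120},
        {"matched": True, "spend": 80},
        {"matched": False, "spend": 100},
        {"matched": True, "spend": 100},
    ]
    print(coverage_report(customers))
```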

Arguments will continue to be made for the merits of both types of systems, neighborhood-based and household-based. The resolution will be found in the marketing laboratory, as practitioners experiment to see which system best meets their individual needs.

More Info

  • Cohorts II: Looking Glass, (303) 893-8600; contact: Jo Moniak
  • DNA: Metromail, (800) 523-7022
  • Niches: The Polk Company, (800) 635-5522