DATA ANALYSIS IN MARKETING RESEARCH
MARKETING RESEARCH PROCEDURE
Marketing research is undertaken to improve the understanding of a marketing
situation or problem and, consequently, the quality of decision-making
related to it. The usefulness of the marketing research output depends on
the way the research has been designed and implemented at each stage of the process.
There are five steps in every marketing research process:
C) FIELD WORK
D) DATA ANALYSIS
D) DATA ANALYSIS:
After
you have collected the data, you need to process, organise and arrange it in a
format that makes it easy to understand and directly helps the decision-making
process. Raw data has to be processed and analysed to obtain information. There
are three phases for analysing the data:
a) Classifying the raw data in a more orderly manner;
b) Summarising the data;
c) Applying analytical methods to manipulate the data to highlight
their inter-relationship and quantitative significance.
a) Classifying the raw data: The most commonly used classifications in
marketing research are quantitative, qualitative, chronological and
geographical.
Quantitative: In this
classification, data is classified by a numerical measure such as number of
respondents in each market segment, number of years employed, number of family
members, number of units consumed, number of brands stocked or some such
numerical characteristic.
Qualitative: In this classification, the data is classified by some
non-numerical attribute such as type of occupation, type of family structure
(nuclear or joint family), or type of retail outlet (speciality, general
merchant, department store, etc.).
Chronological classification is that
in which data is classified according to the time when the event occurred.
In the geographical classification, the data is classified by location, which
may be a country, state, region, city, village, etc.
b) Summarising the data: The first step in summarising the data is tabulation.
Individual observations are placed in the class to which they belong and then
counted. Thus we know the number of times, or the frequency, with which a
particular value occurs. Such tabulation leads to a frequency distribution as
illustrated in Table 1.
The frequency distribution may involve a single
variable as in Table 1 or it may involve two or more variables which is known
as cross-classification or cross-tabulation.
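The tabulation step can be sketched in a few lines of Python. This is only an
illustration: the observations, the class width of 20 and the helper name
class_label are assumptions made for the example, and the numbers are not the
data behind Table 1.

```python
from collections import Counter


def class_label(value, width=20, start=101):
    """Return the class interval (e.g. '121-140') that a value falls into."""
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical observations (not the data behind Table 1).
observations = [105, 118, 122, 125, 131, 136, 139, 144, 152, 127]

# Single-variable frequency distribution: count the observations per class.
frequency = Counter(class_label(v) for v in observations)

# Hypothetical two-variable records: (occupation, preferred outlet type).
records = [
    ("salaried", "department store"),
    ("salaried", "general merchant"),
    ("self-employed", "general merchant"),
    ("salaried", "department store"),
    ("self-employed", "speciality"),
]

# Cross-tabulation: count each combination of the two attributes.
cross_tab = Counter(records)

for label in sorted(frequency):
    print(label, frequency[label])
for (occupation, outlet), count in sorted(cross_tab.items()):
    print(occupation, "|", outlet, "|", count)
```

Counting pairs of attributes instead of single values is all that separates a
cross-tabulation from a simple frequency distribution.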
The frequency distribution presented per se may not yield any specific result
or inference. What we want is a single, condensed representative figure which
will help us to make useful inferences about the data and also provide a
yardstick for comparing different sets of data. Measures of average or central
tendency provide one such yardstick. The three types of averages are the mode,
median and mean.
Mode: The mode is the value or item that occurs most frequently. When the data
is organised as a frequency distribution, the mode is the category which has
the maximum number of observations (the 121-140 category in Table 1). A
shopkeeper ordering fresh stock of shoes for the season would make use of the
mode to determine the size which is most frequently sold. The advantage of the
mode is that it is easy to compute, is not affected by extreme values in the
frequency distribution and is representative if the observations are clustered
at one particular value or class.
Median: The median is the item which lies in the middle position when the data
is arranged in ascending or descending order. It depends on the position of
the observations rather than on their magnitude, so extreme values do not
distort it. Suppose you have the data on monthly income of households in a
particular area. The median value would give you that monthly income which
divides the number of households into two equal parts. Fifty per cent of all
households have a monthly income above the median value and fifty per cent of
households have a monthly income below the median value.
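As a small sketch of the two averages described so far, the following uses
Python's statistics module. The shoe sizes and household incomes are
hypothetical figures invented for the illustration.

```python
from statistics import median, mode

# Hypothetical data: shoe sizes sold last season and monthly household incomes.
shoe_sizes_sold = [7, 8, 8, 9, 8, 10, 7, 8, 9, 6]
monthly_incomes = [1200, 1500, 1500, 1800, 2100, 2400, 9000]

# The mode is the size that was sold most often.
most_sold_size = mode(shoe_sizes_sold)

# The median is the middle income once the values are sorted.
median_income = median(monthly_incomes)

# Making the highest income ten times larger leaves the median unchanged,
# because the median depends on position and not on magnitude.
median_with_outlier = median(monthly_incomes[:-1] + [90000])

print(most_sold_size, median_income, median_with_outlier)
```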
Mean: The mean is the common arithmetic average. It is computed by dividing
the sum of the values of the observations by the number of items observed. A
firm wants to introduce a new packing of sliced bread aimed at the customer
segment of small nuclear families of four members. It wishes to introduce the
concept of a 'single-day pack', i.e., a pack which contains only that number
of bread slices that is usually eaten in a single day. This strategy would
help to keep the price of the pack well within the family's limited budget.
The firm has many opinions on the ideal number of slices that the pack should
contain, ranging from three to as high as twelve. The firm decides to hire a
professional marketing agency to conduct market research and recommend the
number of bread slices it should pack.
The research agency goes about the task in two steps. In the first step, it
randomly chooses five families (who are consumers of bread) in each of the
four colonies in the city. These families are asked to maintain for one week a
record of the exact number of slices they consumed each day. From this data,
the agency calculates the average (or mean) number of bread slices eaten per
family per day. There would be twenty such mean values (5 families in each of
4 colonies; sample size 20). In the second step, the modal value of these
means would provide the answer to the number of bread slices to be packed in
each pack.
The mode in this
frequency distribution is 8. Eight slices is the most commonly occurring
consumption pattern. The agency's recommendation is to pack eight bread slices
in the single-day pack.
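The agency's two-step procedure can be sketched as follows. The weekly totals
are hypothetical figures made up for this illustration (each stands for the
sum of one family's seven daily counts), and rounding each family's daily mean
to a whole slice is an assumption of the sketch, not something stated in the
study.

```python
from collections import Counter

# Hypothetical weekly slice totals for 5 families in each of 4 colonies.
weekly_totals = {
    "colony_1": [56, 55, 57, 49, 63],
    "colony_2": [56, 42, 58, 54, 70],
    "colony_3": [56, 21, 57, 48, 62],
    "colony_4": [56, 55, 35, 64, 56],
}

# Step 1: mean slices per day for each family, rounded to whole slices.
daily_means = [
    round(total / 7)
    for totals in weekly_totals.values()
    for total in totals
]

# Step 2: the modal value of the twenty means is the recommended pack size.
distribution = Counter(daily_means)
recommended_pack_size, families_at_mode = distribution.most_common(1)[0]

print(sorted(distribution.items()))
print("Recommended slices per pack:", recommended_pack_size)
```

With these made-up totals the modal value also comes out at eight slices, but
that is a property of the invented data, not a reproduction of the agency's
figures.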
The mean, mode and median are measures of central tendency or average. They
measure the most typical value around which most values in the distribution
tend to converge. However, there are always extreme values in each
distribution. These extreme values indicate the spread or the dispersion of
the distribution. To make a valid marketing decision you need not only the
measures of central tendency but also relevant measures of dispersion.
Measures of dispersion tell you how far, and how often, values differ
substantially from the mean, median or mode. If the number of observations at
the extreme values is large enough to form a substantial number, it indicates
an opportunity for market segmentation. In the earlier example of bread, if in
a larger sample you find that the number of households who consume three
slices per day is also substantially large, the firm may find it worthwhile to
introduce a 3-slice pack for light bread consumers. Such variations from the
central tendency can be found by using measures of dispersion. The two
commonly used measures of dispersion are range and standard deviation.
Range: The range is the difference between the largest and smallest observed
value. Using the data in step I in the bread illustration, the largest
observed value is 6 and the smallest observed value is 2, therefore the range
is 4. The smaller the range, the more compact and homogeneous is the
distribution.
Variance and standard deviation: These two measures of dispersion are based on
the deviations from the mean. The variance is the average of the squared
deviations of the observed values from the mean of the distribution. Standard
deviation is the square root of the variance. The standard deviation can be
used to compare two samples which have the same mean. The distribution with
the smaller standard deviation is more homogeneous.
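A minimal sketch of these measures, using two hypothetical samples that share
the same mean. The population forms pvariance and pstdev from Python's
statistics module are used here because they match the definition above (the
average of the squared deviations); the sample forms, which divide by n - 1,
would give slightly larger values.

```python
from statistics import mean, pstdev, pvariance

# Two hypothetical samples of daily bread consumption with the same mean.
sample_a = [7, 8, 8, 8, 9]
sample_b = [3, 6, 8, 10, 13]

for name, sample in (("A", sample_a), ("B", sample_b)):
    value_range = max(sample) - min(sample)   # largest minus smallest value
    variance = pvariance(sample)              # mean of squared deviations
    std_dev = pstdev(sample)                  # square root of the variance
    print(name, mean(sample), value_range, variance, round(std_dev, 3))

# The sample with the smaller standard deviation is the more homogeneous one.
more_homogeneous = "A" if pstdev(sample_a) < pstdev(sample_b) else "B"
print("More homogeneous sample:", more_homogeneous)
```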
c) Selecting analytical methods: Besides having a summary of the data, the
marketing manager would also like information on inter-relationships between
variables and the qualitative aspects of the variables.
Correlation: The correlation coefficient measures the degree to which a change
in one variable (the dependent variable) is associated with a change in the
other variable (the independent one). As a marketing manager, you would like
to know if there is any relation between the amount of money you spend on
advertising and the sales you achieve. Sales are the dependent variable and
advertising budget is the independent variable. The correlation coefficient,
in this case, would tell you the extent of the relationship between these two
variables: whether the relationship is direct (an increase or decrease in
advertising is associated with an increase or decrease in sales), or inverse
(an increase in advertising is associated with a decrease in sales and vice
versa), or whether there is no relationship between the two variables.
However, it is important to note that the correlation coefficient does not
indicate a causal relationship. Sales are not a direct result of advertising
alone; there are many other factors which affect sales. Correlation only
indicates that there is some kind of association. Whether it is causal or
merely coincidental can be determined only after further investigation. You
may find a correlation between the height of your salesmen and the sales, but
obviously it is of no significance.
Regression Analysis: To model a presumed causal relationship between two
variables, you may use regression analysis. Using this technique you can
predict the dependent variable on the basis of the independent variables. In
1970, NCAER (National Council of Applied Economic Research) predicted the
annual stock of scooters using a regression model in which real personal
disposable income and relative weighted price index of scooters were used as
independent variables.
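To make the two techniques concrete, here is a minimal sketch that computes
the Pearson correlation coefficient and a least-squares regression line by
hand. The advertising and sales figures are hypothetical, and the function
names pearson_r and fit_line are labels chosen for this example only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Hypothetical figures: advertising budget (independent) and sales (dependent).
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 50, 80, 95]

r = pearson_r(advertising, sales)
intercept, slope = fit_line(advertising, sales)
predicted_sales = intercept + slope * 60   # forecast for a budget of 60

print(round(r, 3), intercept, slope, predicted_sales)
```

A coefficient close to +1 indicates a strong direct association in this
made-up data. It says nothing about whether advertising caused the sales.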
So far we have considered relationships only between two variables, for which
correlation and regression analysis are suitable techniques. But in reality
you would rarely find a one-to-one causal relationship; rather, you would find
that the dependent variable is affected by a number of independent variables.
Sales are affected by the advertising budget, the media plan, the content of
the advertisements, number of salesmen, price of the product, efficiency of
the distribution network and a host of other variables. For determining causal
relationships involving more than two variables, multi-variate statistical
techniques are applicable. The most important of these are multiple regression
analysis, discriminant analysis and factor analysis.
Multiple regression analysis is a variation of the regression analysis
technique discussed above. The difference is that instead of considering one
independent variable, you may have two or more.
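A rough sketch of multiple regression with two independent variables, written
with the standard library only. It solves the least-squares normal equations
by Gaussian elimination. In practice a statistics package would normally be
used instead. The data is hypothetical and constructed from a known formula
(sales = 5 + 2 x advertising + 3 x salesmen) so that the recovered
coefficients can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (advertising budget, number of salesmen) -> sales.
predictors = [(10, 2), (20, 3), (30, 2), (40, 5), (50, 4), (60, 6)]
sales = [5 + 2 * ad + 3 * men for ad, men in predictors]

intercept, b_advertising, b_salesmen = fit_multiple_regression(predictors, sales)
print(round(intercept, 6), round(b_advertising, 6), round(b_salesmen, 6))
```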
Discriminant analysis: In our discussion of dependent and independent
variables, we have so far taken sales as the dependent variable. Sales are
expressed in a numerical form. But not all dependent marketing variables can
be expressed in numbers. Suppose you want to find out the reasons for
customers' brand preference for Thums Up vs. Coca-Cola. In this case, the
dependent variable, the brand, is not numerical in nature. Discriminant
analysis is used in such cases to identify the variables that best distinguish
between the categories of the dependent variable. A company is planning to
introduce a new brand of detergent bar in the market and wants to find out the
consumer traits associated with detergent bar as compared to detergent powder.
This information would help the company focus its advertising strategy to
exploit such associated traits. Several studies aiming to discriminate between
users and non-users of a particular brand of a product have been carried out.
In one such study for a popular brand of shirt, it was found that significant
differences in personality traits could discriminate between users and
non-users.
Factor analysis: The multiple regression technique is based on the idea that
you use truly independent variables. These variables are neither influenced by
the dependent variable nor are they influenced by other independent variables.
But in real life situations, there are many independent variables which are
influenced by other independent variables, i.e. these independent variables
have a high inter-correlation. You may find such an inter-correlation between
the dealer discount structure and the 'push' which the dealer provides to your
product. Factor analysis is a statistical procedure which tries to determine a
few basic factors that may underlie and explain the inter-correlation among a
large number of variables.
Statistical inference: These procedures involve the use of sample data to make
inferences about the population. The three approaches used here are: estimates
of population values, tests of hypotheses about population values and tests of
association between values in the population. Statistical inference as an
analytical tool for marketing decisions is gaining wide acceptance.
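The first of these approaches, estimating a population value from a sample,
can be sketched as an interval estimate of the population mean. The sample
below is hypothetical, and the sketch assumes a normal approximation. With
only 20 observations a t-distribution would usually be preferred and would
give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample: mean slices eaten per day by 20 families.
sample = [8, 8, 8, 7, 9, 8, 6, 8, 8, 10, 8, 3, 8, 7, 9, 8, 8, 5, 9, 8]

confidence = 0.95
# Two-sided critical value of the standard normal (about 1.96 for 95%).
z = NormalDist().inv_cdf(0.5 + confidence / 2)

sample_mean = mean(sample)
# Standard error of the mean, using the sample standard deviation.
standard_error = stdev(sample) / sqrt(len(sample))

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Estimated population mean: {sample_mean:.2f} "
      f"(95% interval {lower:.2f} to {upper:.2f})")
```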