Employee surveys are powerful tools, and we sometimes get asked ‘how do we know if our survey results are valid?’. The question sounds reasonable and simple, but the subject is actually quite complicated. In fact, validity is only part of the picture, and there are several things to consider. This post is intended to clarify some of the key concepts when it comes to employee survey validity.
Surveys are not psychometric instruments
Employee surveys are not generally considered to be ‘tests’. They are not intended to prove ‘efficacy’ or be a reliable measure of something such as personality.
Employee surveys are simply about seeking feedback from people about how they feel about working for your organisation. Who are we to say whether or not those opinions are valid?
Having said that, there are things that should be done to ensure your employee survey can be relied upon enough for you to be confident you can take action based on the results.
The three key concepts – validity, reliability and significance
Survey validity is only part of the picture. The three concepts to be aware of are:
- Validity – does your survey measure what it is supposed to measure?
- Reliability – does the survey produce the same results when repeated?
- Significance – is a difference between two sets of results real, or could it have occurred by chance?
The different types of survey validity
Survey Validity refers to whether a survey accurately measures what it is intended to measure. For instance, if you’re running an employee engagement survey, validity ensures the questions are measuring employee engagement – not something else like IT training effectiveness.
There are a few types of validity that are important to consider:
- Content Validity: Does the survey cover all the relevant topics? For example, a survey asking about employee engagement should address key aspects such as leadership, culture, wellbeing, job satisfaction etc.
- Construct Validity: Are the questions measuring the concepts they claim to measure? If you’re asking about stress levels, ensure that the wording clearly reflects the intention, avoiding ambiguity.
- Criterion Validity: This assesses how well the survey’s results predict or correlate with a separate outcome. For example, does the employee engagement score correlate with employee turnover rates?
How to ensure survey validity
- Expert Design: Having an expert in employee survey design write your survey questions ensures that they measure the right concepts.
- Pilot Testing: Before rolling out the survey to the entire workforce, you could pilot it with a small group to check if employees interpret the questions correctly.
- Clear and Concise Questions: Avoid complex or leading questions. Make sure each question addresses one concept at a time and is easy for everyone to understand.
- Frequent Updates: Employee attitudes and company dynamics change over time. Regularly review and update your surveys to ensure they remain relevant.
The different types of survey reliability
Reliability is about consistency. If the same group of employees were to take the survey again under the same conditions, would the results be the same? If yes, the survey is reliable.
There are a few common ways to measure reliability:
- Test-Retest Reliability: The way to establish this is to administer the same survey to the same group of people at two different times and check if the results are consistent. Of course, the purpose of repeating an employee survey is that you hope the results will have changed, which makes test-retest reliability a little redundant!
- Internal Consistency: This checks if questions measuring the same concept give consistent results. A common statistical method for this is Cronbach’s Alpha, which provides a score between 0 and 1. A score above 0.7 is generally considered acceptable for internal consistency (there’s a short calculation sketch after this list).
- Inter-Rater Reliability: If multiple people assess the same employees or concepts, inter-rater reliability checks if these different assessments are consistent. This is largely redundant for employee surveys, as employees are not providing an objective assessment of a concept – they are giving their opinion on how they feel about it. Your benefits might be best-in-class, but if some people are unhappy with them, they will still score negatively.
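To make internal consistency concrete, here is a minimal sketch of how Cronbach’s Alpha can be calculated in Python. The scores below are made up for illustration – a real survey would have far more respondents.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's Alpha for a (respondents x questions) matrix of scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of questions in the scale
    item_variances = scores.var(axis=0, ddof=1)      # variance of each question
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of each person's total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 5 respondents answering 4 engagement questions on a 1-5 scale
scores = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
print(f"Cronbach's Alpha: {cronbach_alpha(scores):.2f}")  # above 0.7 is usually acceptable
```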
How to ensure reliability in employee surveys
- Standardised Questions: Question wording is an essential part of employee survey design, and standardisation reduces confusion and increases reliability. Avoid using different wording or formats for the same question in different parts of the survey.
- Appropriate Response Scales: Make sure response options are clear and consistent across questions. For example, using a Likert scale in the same format (Strongly Agree to Strongly Disagree) throughout the survey can provide consistency.
Remember that a survey can be reliable without being valid – you could get consistent results for questions that don’t actually measure what you need – but for the survey to be trusted, it should be both valid and reliable.
How to measure statistical significance
Statistical significance helps determine whether the results you’ve obtained from your employee survey are due to chance or represent a true reflection of employee opinions.
If your results are statistically significant, you can be more confident that the patterns you observe reflect your entire workforce, not just the group that responded.
- Sample Size: The larger the sample, the more confident you can be in your results. We recommend inviting all of your employees to take part in the survey rather than using only a sample. Not everyone will respond, so inviting everyone maximises the number of responses you receive.
- Sample Makeup: Sample size alone isn’t enough – the sample also needs to be representative. After the survey closes, it is useful to break down response rates by demographic group. For example, if 70% of your employees respond to the survey but only 10% of respondents are women, the results may not represent how women feel. A quick breakdown like the sketch after this list can flag this kind of imbalance.
- Statistical tests: Once you have your survey data, you can use statistical testing to check whether the patterns you see are significant. The most common measure to use is something called a p-value.
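As a rough sketch of the representativeness check described above, here is how you might compare response rates across groups in Python with pandas. All figures are hypothetical.

```python
import pandas as pd

# Hypothetical headcounts and response counts by group
headcount = pd.Series({"Women": 400, "Men": 550, "Prefer not to say": 50})
responses = pd.Series({"Women": 120, "Men": 410, "Prefer not to say": 30})

response_rates = (responses / headcount * 100).round(1)
print(response_rates)
# Women                30.0
# Men                  74.5
# Prefer not to say    60.0
# A group with a much lower response rate (here, women) is a warning sign
# that the overall results may not represent how that group feels.
```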
Using p-values to determine statistical significance
In statistics a p-value is a number that helps you understand whether the results of your survey or experiment happened by chance or are likely to represent a real effect. It gives you a way to test the strength of your data and see if your findings are significant.
In simple terms, the p-value tells you how likely it would be to see results like yours purely by chance, if there were no real effect. If the p-value is very low, results like yours would be unlikely to occur by chance alone, suggesting that there’s something meaningful in your data.
- A low p-value (typically less than 0.05) means the results are statistically significant. In other words, if there were really no difference, there would be less than a 5% chance of seeing results like yours – so they likely reflect something real in the wider population.
- A high p-value (greater than 0.05) suggests that the results could easily have happened by chance, and there’s not enough evidence to say your findings are significant.
However, statistical significance still depends on how the research (survey) was designed and the sample size so a small p-value in isolation does not tell you the whole story.
Examples of how to use p-values in employee surveys
- Imagine you run a survey across different departments to see if a new leadership programme has improved employee engagement. You compare engagement scores before and after the programme. If the p-value is less than 0.05, you can be confident that the increase in scores isn’t just due to random chance – good evidence (though not proof on its own) that the programme has had an impact on engagement.
- Suppose you want to know whether job satisfaction is different between two locations of your company. After collecting survey results, you calculate the p-value. If the p-value is less than 0.05, you can conclude that there is a statistically significant difference in job satisfaction between the two locations, not just random variations in the responses.
- In a survey on workplace inclusion, you ask employees to rate how inclusive they feel their teams are. You want to know if women rate inclusion differently than men. After analysing the results, if the p-value is more than 0.05, it would suggest that any difference in inclusion ratings between men and women is likely due to chance and not a meaningful difference.
How are p-values calculated?
The mathematics behind p-values is complex, and in practice most statisticians use software to calculate them.
P-values can also be calculated in different ways: choosing the right statistical test depends on the type of data you have and the question you are trying to answer. Below are some common tests and the situations in which they are used, each followed by a small illustrative sketch in Python.
1. T-Test
- Used for: Comparing the means of two groups.
- Types:
- Independent T-Test: Used when comparing two separate groups (e.g., comparing job satisfaction between two departments).
- Paired T-Test: Used when comparing the same group at two different times (e.g., before and after a training program).
- Test Statistic: T-value.
- Example: You want to compare average satisfaction scores between two teams to see if there’s a significant difference. A t-test will give you a p-value to determine if the difference in scores is likely due to chance.
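As an illustration, here is a minimal sketch of both kinds of t-test in Python using scipy, with made-up 1–5 satisfaction scores:

```python
from scipy import stats

# Independent t-test: two separate groups (e.g. two teams' satisfaction scores)
team_a = [4, 5, 3, 4, 4, 5, 3, 4]
team_b = [3, 2, 4, 3, 3, 2, 3, 3]
t_stat, p_value = stats.ttest_ind(team_a, team_b)
print(f"Independent t-test: t = {t_stat:.2f}, p = {p_value:.3f}")

# Paired t-test: the same people before and after a training programme
before = [3, 3, 2, 4, 3, 3]
after = [4, 3, 3, 5, 4, 4]
t_stat, p_value = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```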
2. ANOVA (Analysis of Variance)
- Used for: Comparing means across more than two groups.
- Test Statistic: F-value.
- Example: You’re comparing job satisfaction scores across three departments (e.g., Marketing, Sales, and Finance). ANOVA will help determine if there is a significant difference between the groups and will provide a p-value to guide your decision.
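A minimal sketch in Python, using scipy’s one-way ANOVA with hypothetical scores for three departments:

```python
from scipy import stats

# Hypothetical 1-5 job satisfaction scores per department
marketing = [4, 5, 4, 3, 4, 5]
sales = [3, 3, 2, 3, 4, 2]
finance = [4, 4, 5, 5, 4, 4]

f_stat, p_value = stats.f_oneway(marketing, sales, finance)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")
# A low p-value says at least one department differs; a follow-up (post-hoc)
# test is needed to find out which one.
```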
3. Chi-Square Test
- Used for: Categorical data, to test relationships between variables.
- Test Statistic: Chi-square value.
- Example: If your survey asks employees whether they are satisfied (Yes/No) and you want to test whether satisfaction is associated with department, a chi-square test will help determine if there’s a significant association between these two categorical variables.
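A sketch of this in Python, using scipy with a hypothetical table of satisfied/unsatisfied counts per department:

```python
from scipy.stats import chi2_contingency

# Rows: departments; columns: satisfied Yes / No (hypothetical counts)
table = [
    [45, 15],  # Marketing
    [30, 30],  # Sales
    [50, 10],  # Finance
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"Chi-square = {chi2:.2f}, p = {p_value:.3f}")
```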
4. Z-Test
- Used for: Comparing proportions or means when sample sizes are large (typically n > 30).
- Test Statistic: Z-value.
- Example: You want to compare the proportion of satisfied employees between two large locations (e.g., London vs. Manchester). A z-test can determine if the difference in proportions is statistically significant.
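A sketch using the statsmodels library in Python, with hypothetical counts of satisfied respondents at each location:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: satisfied employees out of total respondents per location
satisfied = [210, 165]    # London, Manchester
respondents = [300, 280]

z_stat, p_value = proportions_ztest(count=satisfied, nobs=respondents)
print(f"Z-test: z = {z_stat:.2f}, p = {p_value:.3f}")
```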
5. Mann-Whitney U Test (Non-Parametric)
- Used for: Comparing the ranks of two groups when data is not normally distributed.
- Test Statistic: U-value.
- Example: If your satisfaction scores are heavily skewed (e.g., most people rated either very high or very low), the Mann-Whitney U test is a non-parametric alternative to the t-test that helps determine if there’s a significant difference between the groups.
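A minimal sketch in Python with scipy, using made-up scores where most answers sit at the extremes:

```python
from scipy.stats import mannwhitneyu

# Hypothetical skewed 1-5 scores for two groups
group_a = [5, 5, 1, 5, 5, 5, 4, 5, 1, 5]
group_b = [1, 1, 5, 1, 2, 1, 1, 5, 1, 1]

u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")
```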
6. Wilcoxon Signed-Rank Test (Non-Parametric)
- Used for: Paired data with non-normal distributions.
- Test Statistic: W-value.
- Example: If you are comparing satisfaction scores for the same group of employees before and after a policy change, and the data is not normally distributed, the Wilcoxon signed-rank test will give you a p-value indicating whether the change was significant.
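A sketch in Python with scipy, comparing hypothetical scores for the same employees before and after a policy change:

```python
from scipy.stats import wilcoxon

# Hypothetical 1-5 scores for the same eight employees, before and after
before = [2, 1, 2, 5, 1, 2, 1, 3]
after = [3, 2, 2, 5, 2, 4, 2, 4]

w_stat, p_value = wilcoxon(before, after)
print(f"Wilcoxon W = {w_stat:.1f}, p = {p_value:.3f}")
```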
7. Correlation Tests (Pearson or Spearman)
- Used for: Measuring the strength and direction of the relationship between two continuous variables.
- Types:
- Pearson Correlation: Used when both variables are normally distributed.
- Spearman Correlation: Used for non-normal data or ordinal variables.
- Test Statistic: Correlation coefficient (r) and corresponding p-value.
- Example: If you want to see if there’s a significant correlation between years of service and job satisfaction, a correlation test will provide a p-value to show whether the relationship is significant.
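A sketch of both correlation tests in Python with scipy, using hypothetical tenure and satisfaction figures:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: years of service and 1-5 job satisfaction
years_of_service = [1, 2, 3, 5, 7, 10, 12, 15, 20]
satisfaction = [3, 2, 3, 4, 3, 4, 5, 4, 5]

r, p_value = pearsonr(years_of_service, satisfaction)
print(f"Pearson: r = {r:.2f}, p = {p_value:.3f}")

rho, p_value = spearmanr(years_of_service, satisfaction)
print(f"Spearman: rho = {rho:.2f}, p = {p_value:.3f}")
```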
8. Logistic Regression
- Used for: Analysing the relationship between one or more independent variables and a binary outcome.
- Test Statistic: Z-value or Wald chi-square.
- Example: If you want to determine whether factors like age, gender, and department predict whether an employee is satisfied or not (binary outcome), logistic regression will help you calculate p-values for each factor.
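As a rough sketch using the statsmodels library in Python – the tiny made-up dataset below is just for illustration, and a real analysis would need far more rows:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-employee data; 'satisfied' is the binary outcome (1 = yes)
df = pd.DataFrame({
    "satisfied": [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    "age": [25, 31, 45, 52, 28, 39, 47, 33, 26, 58, 41, 36],
    "department": ["Sales", "Sales", "Finance", "Finance", "Sales", "Marketing",
                   "Finance", "Marketing", "Sales", "Finance", "Marketing", "Finance"],
})

model = smf.logit("satisfied ~ age + C(department)", data=df).fit()
print(model.summary())  # p-values for each factor appear in the 'P>|z|' column
```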
When should you not use p-values?
While p-values are a valuable tool in statistical analysis, there are several situations where they may not be suitable for analysing survey data. Here are some key instances when using p-values might not be appropriate:
Non-Random Sampling
- When the sample isn’t random: P-values assume that the sample being analysed is randomly selected from the population. If your survey uses a biased or non-random sample (for example, if you only survey certain types of employees), the results may not generalise to the whole population. In such cases, p-values would be misleading.
- Example: If you only survey high-performing employees about job satisfaction, the sample won’t reflect the entire workforce, and using p-values could lead to inaccurate conclusions.
Small Sample Sizes
- When the sample size is too small: P-values rely on sufficient data to be meaningful. With very small sample sizes, the results of a statistical test, including the p-value, may not be reliable. Small samples can lead to high variability, making it difficult to detect true differences.
- Example: If only 10 employees respond to a survey out of 1,000, calculating a p-value could falsely indicate significance (or lack thereof) due to the small and unrepresentative sample.
Qualitative Data
- When the data is qualitative: P-values are designed for quantitative data – averages, proportions, and counts. If your survey data is qualitative (e.g., open-ended responses, interview transcripts), p-values can’t be used. In these cases, thematic analysis or coding techniques are more appropriate.
- Example: If your survey asks employees for their feedback on company culture using text responses, analysing this data with p-values wouldn’t make sense. Instead, you’d look for patterns or themes in the responses.
Non-Normal Data Distributions
- When the data isn’t normally distributed: Many of the tests used to produce p-values (such as the t-test) assume the data follows a normal (bell-shaped) distribution. If your survey data is heavily skewed or has outliers, those p-values might not be appropriate, and alternative non-parametric methods (e.g., Mann-Whitney U test) might be needed.
- Example: If your survey contains a question where most respondents answer at the extreme ends (like “Strongly Agree” or “Strongly Disagree”), the distribution of responses might not be normal, making it problematic to calculate and interpret a p-value.
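One way to check this, sketched in Python with scipy, is a normality test such as Shapiro-Wilk, run on the kind of polarised responses described above (the data is hypothetical):

```python
from scipy.stats import shapiro

# Hypothetical polarised responses: most people answered 1 or 5
responses = [5, 5, 1, 5, 1, 5, 5, 1, 5, 5, 1, 5, 1, 1, 5, 5]

stat, p_value = shapiro(responses)
if p_value < 0.05:
    print("Responses look non-normal - consider a non-parametric test instead")
```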
Multiple Comparisons
- When performing multiple tests: If you’re running many statistical tests on your survey data, relying on p-values can lead to “false positives”—results that appear significant by chance. This is called the “multiple comparisons problem,” where testing many hypotheses increases the likelihood of finding significant results by luck.
- Example: If you compare job satisfaction across 10 different departments and calculate a p-value for each one, you increase the chances of finding a difference that isn’t real. In such cases, corrections like the Bonferroni correction are needed, or you might use a different analysis method.
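As a sketch, here is how a Bonferroni correction might look in Python with statsmodels, applied to a made-up set of p-values from ten departmental comparisons:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from ten department-level comparisons
p_values = [0.040, 0.200, 0.010, 0.850, 0.030, 0.600, 0.002, 0.450, 0.070, 0.300]

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
for raw, adj, significant in zip(p_values, p_adjusted, reject):
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}, significant: {significant}")
# Only the comparison with raw p = 0.002 survives the correction (0.002 * 10 = 0.02)
```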
Non-Actionable Results
- When p-values don’t provide practical insight: Sometimes, a p-value might indicate statistical significance, but the result isn’t practically significant or meaningful for decision-making. A small p-value doesn’t always imply a large effect, and it’s important to look at the context of the survey results.
- Example: A p-value might show that there’s a statistically significant difference in engagement scores between two teams, but if the difference in the actual scores is tiny and wouldn’t change how you manage the teams, then the result may not be useful in practice.
When You Need Descriptive, Not Inferential, Analysis
- When the goal is to describe the data: P-values are used for inferential analysis—when you want to make conclusions about a population based on a sample. However, if the goal of your survey is simply to describe the opinions of respondents (descriptive analysis), p-values aren’t necessary.
- Example: If you’re conducting a satisfaction survey and want to report that 75% of employees are happy with their work-life balance, there’s no need for a p-value to describe this proportion.
Employee survey validity in action
To ensure that your employee survey results are meaningful and actionable:
- Start with a clear purpose: Know what you want to measure before designing your survey.
- Craft your questions carefully: They should be clear, unbiased, and targeted at the concepts you want to measure.
- Maximise sample size: Invite (and encourage) all of your employees to take part in your survey rather than relying on a sample. The more people who respond, the more representative the data will be.
- Check reliability and validity: Where possible, pilot your survey and review the results for consistency and accuracy – but bear in mind that a pilot involves only a small sample of people, so treat its results with caution.
- Use appropriate tools to measure significance: Once you have the data, use statistical methods to confirm that your results are robust and not due to random chance.
- Be pragmatic: At the end of the day, employee surveys are designed to gather feedback on how your people feel at a particular moment in time.
In conclusion
By ensuring the validity, reliability, and statistical significance of your employee surveys, you can be confident that the feedback you gather reflects the true sentiments of your workforce. This allows you to make informed decisions that positively impact employee satisfaction, engagement, and productivity.