# COMMON QUESTIONS

This page collects the most frequently asked questions.

Check whether yours is among them.

## WHAT IS THE SUITABLE SAMPLE SIZE FOR MY RESEARCH?

There is no magic answer to this question. It is as if I asked you, "What should I wear tonight?" Without knowing where I'm going, what I'm going to do, or whether it will be hot or cold, you can't answer correctly.

To calculate the sample size, you need to know some information:

- The purpose of the research (description of a population, comparison between population means, etc.).

- The main outcome variable (primary variable) of the research and its type (nominal qualitative, ordinal qualitative, or quantitative).

- The degree of confidence required in the population estimate. Usually this means defining the Alpha (Type I) and Beta (Type II) error rates; in biology the most commonly used values are Alpha = 5% and Beta = 20%.

- For comparisons between populations, the smallest difference between the populations that is of clinical importance and that the research therefore needs to be able to detect.

- An estimate of the variability of the primary variable in the populations studied (e.g., for a quantitative variable, the standard deviation in the population).

Without this information, it is impossible to determine the appropriate sample size.
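To show how these pieces of information fit together, here is a minimal sketch for one specific case: comparing the means of two independent groups with a quantitative variable. It uses the common normal-approximation formula n = 2 × ((z_alpha + z_beta) × SD / difference)² per group. The function name, the difference of 5 units, and the standard deviation of 10 units are illustrative assumptions, and other study designs need other formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(delta, sd, alpha=0.05, beta=0.20):
    """Approximate sample size PER GROUP for a two-sided comparison of two means.

    delta: smallest difference of clinical importance (hypothetical value).
    sd:    estimated standard deviation of the primary variable (hypothetical).
    Uses the normal approximation, so a t-based calculation may give a
    slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided Type I error
    z_beta = z.inv_cdf(1 - beta)        # Type II error (power = 1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Illustrative inputs: detect a difference of 5 units when the SD is 10 units.
print(sample_size_two_means(delta=5.0, sd=10.0))             # Alpha 5%, Beta 20%
print(sample_size_two_means(delta=5.0, sd=10.0, beta=0.10))  # Alpha 5%, Beta 10%
```

With these assumed inputs the sketch gives 63 cases per group for Beta = 20% and 85 per group for Beta = 10%. Halving the difference to be detected roughly quadruples the sample size.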

## WHAT IS “POWER OF THE TEST”?

The statistical power of a test is the probability that the test will reject the null hypothesis when it is in fact false. In hypothesis testing, the Beta error (Type II error) is the probability of failing to reject the null hypothesis when it is false, so the power of the test is given by 1 - Beta.

The desired power is an important input when calculating the sample size. In biology, it is usual to adopt a power of 80% or 90% (Beta = 20% or Beta = 10%), bearing in mind that the greater the power adopted, the larger the sample size needed.
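The relationship between power and sample size can be illustrated with a small sketch. It assumes a two-sided comparison of two independent means under the normal approximation; the function name and the input values (difference of 5 units, standard deviation of 10 units) are hypothetical and serve only as an example.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group, delta, sd, alpha=0.05):
    """Approximate power (1 - Beta) to detect a difference `delta` between two means.

    Normal approximation; the negligible contribution of the opposite tail
    is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)  # SE of the difference in means
    return z.cdf(delta / standard_error - z_alpha)


# Power grows with the sample size for the same difference and variability.
for n in (20, 63, 85):
    print(n, round(power_two_means(n, delta=5.0, sd=10.0), 3))
```

Under these assumptions, about 63 cases per group give a power close to 80%, and about 85 per group give a power close to 90%.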

## WHAT IS THE CORRECT WAY TO TABULATE MY SURVEY DATA TO PERFORM STATISTICAL ANALYSIS?

There is a video on the PESQUISE channel on YouTube that provides guidelines and examples of data tabulation for statistical analysis. These guidelines are also available on the website of the Faculty of Dentistry of Bauru-USP.

## ONE OF THE SPECIMENS IN MY RESEARCH SHOWED A VERY DISCREPANT VALUE COMPARED TO THE OTHERS. CAN I JUST DISCARD IT FROM THE STATISTICAL ANALYSIS?

Outliers can occur for several reasons. It is important to identify what happened before discarding the value of one or a few specimens.

First, check whether there was a methodological flaw in the preparation or measurement of the specimen. If there was, describe what happened in the research report and discard the value from the analysis. If this has occurred in several specimens, it is better to review the methodology used, as it is not maintaining good standardization.

If no methodological problem is identified and the discrepant values occurred in only one or a few cases, there are statistical criteria for deciding whether the values can be considered outliers and removed from the analysis.
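The text does not name a specific criterion, so the following sketch uses one common rule as an assumption: Tukey's fences, which flag values more than 1.5 times the interquartile range below the first quartile or above the third quartile. The function name and the data are hypothetical, and other criteria (for example Grubbs' test) may be more appropriate for a given study.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR).

    Quartiles come from statistics.quantiles with its default method, so the
    fences can differ slightly from those reported by other software.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements from eight specimens.
measurements = [10, 11, 12, 11, 10, 12, 11, 30]
print(tukey_outliers(measurements))
```

A flagged value is only a candidate for removal. The methodological check described above still comes first, and any exclusion should be reported.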

## WHAT IS SYSTEMATIC ERROR AND RANDOM ERROR?

When measurements are made with methods in which the person taking the measurement may interfere with the result, it is worth assessing whether that interference remains at acceptable levels. To assess the “measurement error”, we repeat the measurement of some cases. When the repetition is performed by the same examiner, we calculate the “intra-examiner error”; when it is performed by another examiner, we calculate the “inter-examiner error”.

In quantitative measurements, we usually evaluate two types of error:

- “Systematic error”: we check whether the second set of measurements differs, on average, significantly from the first. Ideally there is no significant difference, since a significant difference would indicate that the measurements in one of the repetitions are systematically larger than in the other.

- “Random error” (also called “casual error”): we estimate the difference between the probable real value of the measurement and the value the examiner obtains, that is, whether the individual differences for each element of the research are large or small when the measurement is repeated. The random error is reported in the unit of the quantity being measured (e.g., mm, cm) rather than as a significant or non-significant difference.
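As an illustration, the sketch below computes both quantities for a set of repeated measurements. The text does not say which formulas to use, so two common choices are assumed here: a paired t statistic for the systematic error and Dahlberg's formula, sqrt(sum(d²) / 2n), for the random error. The function names and the measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """t statistic of the paired differences (systematic error check).

    Compare |t| with the critical value of Student's t for n - 1 degrees of
    freedom. Not defined when every difference is identical (zero variance).
    """
    d = [a - b for a, b in zip(first, second)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def dahlberg_error(first, second):
    """Random error by Dahlberg's formula, in the unit of the measurement."""
    d = [a - b for a, b in zip(first, second)]
    return sqrt(sum(x * x for x in d) / (2 * len(d)))


# Hypothetical repeated measurements (mm) of four cases by the same examiner.
first = [10.0, 12.0, 11.0, 13.0]
second = [10.5, 11.5, 11.5, 12.5]
print(paired_t_statistic(first, second))        # near 0: no systematic trend
print(round(dahlberg_error(first, second), 3))  # random error in mm
```

In this made-up example the differences cancel out on average, so there is no sign of systematic error, while the random error is about 0.35 mm.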

## HOW DO I INTERPRET THE CORRELATION COEFFICIENT BETWEEN TWO VARIABLES?

The correlation coefficient (r) measures the degree of relationship between two variables. It ranges from -1 to 1. Positive values indicate that the two variables (x and y) move in the same direction: as x increases, y also increases (e.g., the older a child, the taller the child tends to be). Negative values indicate that x and y move in opposite directions: as x increases, y decreases (e.g., the richer a country, the lower the infant mortality rate). As for the magnitude, the closer r is to -1 or 1, the stronger the correlation between the two variables, and the closer to 0 (zero), the weaker the correlation. Thus r = 0.92 indicates a strong correlation between x and y, and r = 0.11 indicates a weak correlation.

A statistical test is usually performed to check whether r is statistically significant, so a p-value accompanies the value of r. It is important to know that the null hypothesis of this test is that the correlation in the population is zero. A value of p < 0.05 therefore only indicates that the correlation between x and y in the population is not zero. A significant p-value does not indicate that the correlation is strong, only that it is not null. In large samples, r = 0.11 can be significant (p < 0.05) without being a strong correlation.
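The effect of sample size on the p-value can be seen in a short sketch. It computes Pearson's r from its definition and approximates the two-sided p-value with Fisher's z-transformation, which is an assumption made here to keep the example within the standard library. Statistical packages normally use the exact t-based test, so their p-values will differ slightly. The function names and sample sizes are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: population correlation = 0.

    Uses Fisher's z-transformation (requires n > 3 and |r| < 1).
    """
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same weak correlation, r = 0.11, in a small and in a large sample.
print(approx_p_value(0.11, n=30))    # not significant
print(approx_p_value(0.11, n=1000))  # significant, yet still a weak correlation
```

With n = 30 the approximate p-value is well above 0.05, while with n = 1000 it is far below 0.05, although the strength of the relationship is identical.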

## I USED THE KAPPA STATISTIC TO VERIFY THE DEGREE OF AGREEMENT BETWEEN TWO EVALUATORS ON A NOMINAL QUALITATIVE VARIABLE. ALTHOUGH THE PERCENTAGE OF AGREEMENT WAS HIGH, THE KAPPA VALUE WAS VERY LOW. WHY?

This happens mainly when one of the categories of the variable occurs very few times and the evaluators do not agree on those occurrences. For example, if in an evaluation of 100 cases evaluator A classifies 98 cases as “Yes” and 2 cases as “No”, while evaluator B classifies all 100 cases as “Yes”, the percentage of agreement will be 98% but kappa will be 0.00 (zero). Kappa assesses whether there is good agreement across all the categories in which cases can be classified. In the example above there is good agreement when the classification is “Yes” but poor agreement when it is “No”, which results in a low kappa value.
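The example above can be reproduced with a short sketch of Cohen's kappa, computed as (observed agreement - expected agreement) / (1 - expected agreement). The function name is hypothetical; the data are the 100 cases described in the text.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters.

    Not defined when the expected agreement equals 1 (both raters use a
    single, identical category for every case).
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Agreement expected by chance, from each rater's category proportions.
    expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return observed, (observed - expected) / (1 - expected)


# Evaluator A: 98 "Yes" and 2 "No"; evaluator B: 100 "Yes".
rater_a = ["Yes"] * 98 + ["No"] * 2
rater_b = ["Yes"] * 100
observed, kappa = cohen_kappa(rater_a, rater_b)
print(observed, kappa)
```

Here the agreement expected by chance is also 98% (0.98 × 1.00 + 0.02 × 0.00), so the observed agreement is no better than chance and kappa is zero.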