TL;DR: Experiment about drug use, have subjects fill out form either anonymously or non-anonymously. How do I tease out results?
In my study, I'm trying to observe the effect that answering an anonymous vs. a non-anonymous survey has on reported drug use. I have now collected surveys from 230 students (though only 216 submitted a properly filled out survey). For my sampling design, I chose to block my sample by both grade in school and the difficulty of their class. I chose to visit English classes since English is required of all students, no students skip grades like they might in math, and for every year there is an advanced/AP class and a regular class. So, I visited 8 classes total, one for each combination of grade level and class difficulty, and handed out roughly the same number of anonymous vs. non-anonymous surveys in each class.
The meat of my question lies in analyzing the results. What I've done so far to see the cumulative effect of anonymous vs. non-anonymous is to track the percentage of each classroom that answered yes to drug use in both scenarios. Then, I subtract the % of people who answered yes anonymously from the % of people who answered yes non-anonymously. If the number is negative, then people answered yes to drug use more often when they were anonymous. If it is positive, then people answered yes to drug use more often when they had to write down their names. Then, once I have these percentage differences, I multiply each one by the proportion that its class is of the total sample (the class sizes ranged from 21 to 33). If I'm doing this right, then I should add all the weighted %s, and that should be the net effect that answering anonymously has on reported drug use.
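To make the arithmetic concrete, here is a minimal sketch of the calculation described above. The class labels and counts are made up for illustration and are not my real data; the function names are hypothetical too. It computes both the class-size-weighted sum and the plain unweighted sum of the per-class differences.

```python
# Hypothetical per-class counts, NOT the real survey data:
# (anonymous yes, anonymous total, non-anonymous yes, non-anonymous total)
classes = {
    "grade 9 regular": (5, 14, 3, 13),
    "grade 9 advanced": (4, 12, 2, 12),
    "grade 10 regular": (7, 15, 4, 14),
}


def class_differences(classes):
    """Per-class difference: non-anonymous yes-rate minus anonymous yes-rate."""
    return {
        name: named_yes / named_n - anon_yes / anon_n
        for name, (anon_yes, anon_n, named_yes, named_n) in classes.items()
    }


def weighted_difference(classes):
    """Sum of per-class differences, each weighted by the class's share of all surveys."""
    total = sum(anon_n + named_n for _, anon_n, _, named_n in classes.values())
    diffs = class_differences(classes)
    return sum(
        diffs[name] * (anon_n + named_n) / total
        for name, (_, anon_n, _, named_n) in classes.items()
    )


def unweighted_sum(classes):
    """Plain sum of per-class differences (no class-size weights)."""
    return sum(class_differences(classes).values())


print("weighted net difference:", weighted_difference(classes))
print("unweighted sum of differences:", unweighted_sum(classes))
```

Because the weights add up to 1, the weighted result always lies between the smallest and largest per-class difference, while the unweighted sum grows with the number of classes.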
My project teammate disagrees with the whole blocking by grade and class level thing, thinks we should focus on one of the two. Additionally, he thinks my last step, where I weight the % change in reported drug use according to the class size, counts as double weighting, and I should add up the percentages before that step to arrive at my net result (which turns out to be something crazy like -62.2%).
Who is right? Is my blocking by grade + class level a statistically valid way of designing an experiment? Is my weighting step correct? I'm attaching my Excel file for reference; it has all the analysis I've done so far.
EDIT: Some data: Of the 14 nonresponses, 11 were from grades 10 and 12, 10 were from regular (not advanced) classes, and 12 were non-anonymous. I imagine this skews the data, but can't quite figure out how to quantify it.
Some more data:
My data in excel. AS and AP are advanced classes, CP are regular classes.

My first step would ignore grade level and class difficulty. Take the total number $n_1$ of valid anonymous surveys and the number $x_1$ of them reporting drug use. Similarly, take the total number $n_2$ of valid non-anonymous surveys and the number $x_2$ of them reporting drug use. Then compare the sample proportions $p_1 = x_1/n_1$ and $p_2 = x_2/n_2$ to see if they differ significantly.
You are testing the null hypothesis $H_0:\theta_1 = \theta_2$ against the alternative $H_a: \theta_1 \ne \theta_2,$ where the $\theta_i$ are the proportions of admitted drug users in the two populations, anonymous and not. The test statistic is $$Z = \frac{p_1 - p_2}{\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}},$$ which (based on the sample sizes you mention) would be approximately standard normal under $H_0.$ So you would reject $H_0$ at the 5% level of significance if $|Z| > 1.96.$ (I believe that all the necessary 'weighting' is taken care of in the computation of the $p_i$s and the test statistic $Z.$)
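As a rough illustration, the test above could be computed as follows. The counts in the example call are placeholders, not your data, and the function name `two_proportion_z` is just something I made up for this sketch. The two-sided P-value uses the standard normal approximation.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic and two-sided P-value for H0: theta_1 = theta_2.

    Uses the unpooled standard error, matching the formula for Z above.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts only: x1 of n1 anonymous, x2 of n2 non-anonymous say yes
z, p_value = two_proportion_z(x1=40, n1=113, x2=22, n2=103)
print(f"Z = {z:.3f}, two-sided P-value = {p_value:.4f}")
print("reject H0 at the 5% level" if abs(z) > 1.96 else "do not reject H0 at the 5% level")
```

Replace the placeholder counts with your own totals of valid surveys and yes answers in each group.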
If you find a significant difference, then it may be worthwhile to see if differences are larger or smaller depending on grade level and course difficulty. From my interpretation of what you say, perhaps that would be a two factor ANOVA design with factors grade (at four levels) and difficulty (at two levels). Details of that analysis would depend on the numbers of valid questionnaires at each of the eight combinations of levels. That might be a topic for a later discussion, given more information about your data and your objectives.
Note: You mention that not all of the surveys were properly completed. A possible source of bias might exist if there were more invalid non-anonymous surveys than anonymous ones. You do not mention any difference, but you should check into that.
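The edit to the question reports that 12 of the 14 improperly completed surveys were non-anonymous. One simple way to gauge whether that split is lopsided is an exact binomial test, sketched below. It rests on an assumption that goes beyond what is stated: that equal numbers of anonymous and non-anonymous surveys were handed out, so that under the null hypothesis each invalid survey is equally likely to be of either type. The function name is hypothetical.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))


# 12 of the 14 invalid surveys were non-anonymous (from the question's edit).
# ASSUMPTION: equal numbers of each survey type were handed out, so p = 0.5.
p_value = binom_two_sided_p(k=12, n=14, p=0.5)
print(f"two-sided P-value = {p_value:.4f}")
```

A small P-value here would suggest that students given the non-anonymous form were more likely to hand in an unusable survey, which is the kind of bias mentioned in the note. If the two survey types were not handed out in equal numbers, `p` should be set to the actual share of non-anonymous surveys.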