My research follows these rules, with ideas and feedback from Jimmy Lin and his research group. As usual, feedback does not imply endorsement. Be sure to read his guide if you haven’t already.
Justify every claim and choice with an argument of one or more of the following types, in decreasing order of strength:
- Deduction from first principles (e.g., mathematical proof).
- Inductive and abductive reasoning (e.g., strong experimental evidence).
- Literature (e.g., following X et al.).
- Bandwagoning (e.g., everybody uses this model).
- Authority (e.g., Google did it).
- Personal anecdote (e.g., our preliminary runs show this, with no data given).
Pick a hypothesis that satisfies the following properties:
- Falsifiable: a hypothesis is not a mere statement of fact. It is a claim that you need to support with the aforementioned argument types.
- Concrete: the wording of the hypothesis must be precise. It must leave no room for misinterpretation, as it’s the centerpiece of the paper.
- Explainable: ideally, the hypothesis has an explanatory clause following it, which provides critical understanding and insight.
Here are some examples:
- Terrible: “BERT is sometimes prunable and other times not.” (unfalsifiable, vague, and unimpactful)
- Bad: “BERT achieves a compression ratio of 10.5 on the SST-2 dataset.” (unfalsifiable)
- Okay: “BERT achieves a higher pruning compression ratio on tasks with shorter sentences than it was pretrained on.”
- Good: “Pretrained transformers achieve higher compression ratios on tasks with shorter sentences than they were pretrained on, because the class token attends to fewer tokens, reducing the parameter need.”
Use plain syntax. Readers are here to admire the problem and the solutions, not the prose.
Use fun semantics. Make the problem and the solutions interesting.
Give simple yet intriguing specific examples. Defer the general case until later (or move it to the appendix if advisable). Readers’ attention falls off quickly, so it’s important to hold it for as long as possible. A specific case of your problem lets them immediately taste what you’re trying to solve, without the complexities of the general case. For example,
- Bad: “Let there be a parametric, differentiable classifier \(f(x; \theta)\) parameterized by (a possibly infinite number of parameters) \(\theta\) over the set of reals. Let there be a generative model \(g(x; \phi)\) for the nondegenerate data probability distribution \(p(x)\) where …”
- Good: “Consider the specific text classification task, with a classification model \(f(x)\) and a language model \(g(x)\).”
Lighten the cognitive load of the reader as much as possible. A good paper uses as few neurons as possible to defend itself.
Use the background and related work section to frame the problem and your solution. The introduction describes the problem, and the methods section solves it. To bridge the two, the background and related work section explains why the problem is hard and where previous work falls short.