The NLP Analysis tab provides a project-wide quantitative overview of your data. It uses advanced AI models to extract linguistic features from both the Wrong and Corrected texts, then visualizes them in charts designed to reveal trends, differences, and statistically reliable patterns.
A key challenge in comparing the two sets of texts is that they are rarely the same length. To make comparisons meaningful, Error Analyzer uses normalized frequencies (rate per 1,000 words) instead of raw counts. This ensures that differences reflect real linguistic shifts rather than differences in text length.
Why Normalization Matters
If one text is longer than another, raw counts are misleading. For example:
- Wrong Text: 10,000 words, 500 nouns.
- Corrected Text: 12,000 words, 600 nouns.
At first glance, you might think the corrected text has “more nouns.” But in fact, both texts use nouns at the same proportional rate (50 nouns per 1,000 words).
Normalization formula:
(Number of times a feature appears / Total number of words) × 1000
This method:
- Creates a fair “apples-to-apples” comparison.
- Aligns with standard practices in corpus linguistics and computational text analysis.
- Provides more interpretable results. For example, “220 nouns per 1,000 words” directly tells you something about the density of nouns in the text.
All charts in this tab use normalized values by default.
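The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Error Analyzer's own code; the function name `rate_per_thousand` is an assumption made for this example.

```python
def rate_per_thousand(feature_count: int, total_words: int) -> float:
    """Normalized frequency: occurrences of a feature per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return feature_count * 1000 / total_words


# The noun example from this section: different raw counts, same rate.
wrong_rate = rate_per_thousand(500, 10_000)
corrected_rate = rate_per_thousand(600, 12_000)
print(wrong_rate, corrected_rate)
```

Both calls return the same rate, which is why the raw difference of 100 nouns does not count as a linguistic shift.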
Understanding the Charts
1. Part-of-Speech (POS) and Named Entity (NER) Charts
- What they show: Bar charts comparing categories such as Noun, Verb, Adjective (POS) and Person, Location, Organization (NER).
- Method: Rates per 1,000 words for Wrong (red bars) and Corrected (blue bars), plus a Delta (Δ) chart showing net change.
- Significance: Useful for high-level lexical patterns. For example, an increase in nouns may indicate more concrete expression, while a decrease in adjectives could suggest conciseness.
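As a rough sketch of how the values behind a Delta chart could be derived, the snippet below turns two lists of category tags into rates per 1,000 words and subtracts them. It assumes that Delta means the Corrected rate minus the Wrong rate and that each token carries exactly one tag. The tag lists and function names are hypothetical, and in practice the tags would come from a tagger.

```python
from collections import Counter


def rates_per_thousand(tags: list[str]) -> dict[str, float]:
    """Rate per 1,000 tokens for each tag (assumes one tag per token)."""
    total = len(tags)
    if total == 0:
        raise ValueError("tags must not be empty")
    return {tag: n * 1000 / total for tag, n in Counter(tags).items()}


def delta(wrong: dict[str, float], corrected: dict[str, float]) -> dict[str, float]:
    """Net change per category, assumed here to be Corrected minus Wrong."""
    categories = sorted(set(wrong) | set(corrected))
    return {c: corrected.get(c, 0.0) - wrong.get(c, 0.0) for c in categories}


# Hypothetical tag sequences standing in for tagger output.
wrong_tags = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN"]
corrected_tags = ["NOUN", "VERB", "ADJ", "NOUN", "NOUN", "VERB"]

changes = delta(rates_per_thousand(wrong_tags), rates_per_thousand(corrected_tags))
for category, change in changes.items():
    print(f"{category}: {change:+.1f} per 1,000 words")
```

A positive value means the category is denser in the Corrected text and a negative value means it is sparser. A category missing from one side is treated as a rate of zero.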
2. Dependency (Syntax) Charts
- What they show: Frequencies of grammatical relations such as nsubj (nominal subject) or dobj (direct object).
- Method: Wrong vs. Corrected distributions, plus Delta visualization.
- Significance: Provides insight into structural changes. For example, a rise in nsubj:pass across the dataset would suggest systematic movement toward passive voice.
3. Tense, Number, and Surface Edit Charts
- What they show:
  - Tense usage (past, present, future).
  - Number usage (singular vs. plural nouns).
  - Surface edits as a pie chart (additions, deletions, replacements).
- Significance: Offers quick diagnostic insights into typical learner issues such as verb tense or agreement, and into general correction style.
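One way to arrive at addition, deletion, and replacement counts like those in the pie chart is a token-level diff. The sketch below uses Python's standard `difflib` module with whitespace tokenization. Both choices are assumptions for illustration, since how Error Analyzer aligns the two texts is not described here, and the function name `surface_edit_counts` is hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def surface_edit_counts(wrong: str, corrected: str) -> Counter:
    """Count token-level additions, deletions, and replacements."""
    a, b = wrong.split(), corrected.split()
    counts = Counter()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "insert":
            counts["addition"] += j2 - j1
        elif op == "delete":
            counts["deletion"] += i2 - i1
        elif op == "replace":
            # A replaced span may differ in length; count the longer side.
            counts["replacement"] += max(i2 - i1, j2 - j1)
    return counts


counts = surface_edit_counts(
    "She go to school yesterday",
    "She went to the school yesterday",
)
print(dict(counts))
```

Here "go" to "went" is counted as one replacement and the inserted "the" as one addition. Summing such counts over every sentence pair in a project would give the proportions for a pie chart.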
4. Advanced Statistical Charts (Slope and Volcano)
- Slope Chart:
  - What it shows: A line chart tracking the most significant dependencies from Wrong to Corrected texts.
  - Significance: Highlights consistent, project-wide changes in an intuitive “before → after” view.
- Volcano Plot:
  - What it shows: A scatter plot combining effect size (magnitude of change) and statistical reliability.
  - Significance: Points toward the top-left or top-right corners represent the most important findings: large shifts in either direction that are statistically unlikely to be random.
These charts allow you to distinguish between changes that look interesting and those that are actually robust.
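To make the two axes of a volcano plot concrete, the sketch below computes one point for a single feature. The statistics used by Error Analyzer are not specified here, so this example assumes a log2 ratio of the two rates as the effect size and a two-sided two-proportion z-test for reliability, plotted as -log10(p). These are common choices for this kind of plot, not necessarily the tool's own, and the function name `volcano_point` is hypothetical.

```python
import math


def volcano_point(wrong_count: int, wrong_total: int,
                  corrected_count: int, corrected_total: int) -> tuple[float, float]:
    """Return (effect size, -log10 p) for one feature.

    Assumed statistics: log2 rate ratio and a two-proportion z-test.
    """
    if min(wrong_count, corrected_count) <= 0:
        raise ValueError("both counts must be positive for a log ratio")
    p_wrong = wrong_count / wrong_total
    p_corr = corrected_count / corrected_total
    effect = math.log2(p_corr / p_wrong)  # x-axis: > 0 means an increase

    # Pooled proportion and standard error under the null of equal rates.
    pooled = (wrong_count + corrected_count) / (wrong_total + corrected_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / wrong_total + 1 / corrected_total))
    z = (p_corr - p_wrong) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

    # Clamp to avoid log10(0) when the p-value underflows.
    significance = -math.log10(max(p_value, 1e-300))  # y-axis
    return effect, significance


# A feature whose rate doubles lands to the right and high up.
print(volcano_point(100, 10_000, 200, 10_000))
# A feature with identical rates sits at the origin.
print(volcano_point(500, 10_000, 600, 12_000))
```

Features with a large absolute effect and a high significance value are the ones that appear in the upper corners of the plot.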
Why These Methods Matter for Qualitative Error Analysis
The charts are not just descriptive. They play a crucial role in error analysis by:
- Revealing systematic shifts in word choice, syntax, and style.
- Supporting the identification of error patterns that may not be obvious from individual examples.
- Combining qualitative interpretation with quantitative evidence, strengthening the reliability of your findings.
Normalization ensures fairness, Delta charts make shifts visible, and advanced statistical plots allow you to focus only on changes that are both meaningful and trustworthy. Together, these methods give you a multi-layered perspective on how errors and corrections differ.
From Visualization to Narrative
At the top of the tab, the Generate Report button uses a large language model (LLM) to transform your quantitative results into a written summary. This feature provides:
- A human-readable explanation of what the charts show.
- Highlights of the most important linguistic shifts.
- A structured narrative you can integrate into your reports, papers, or presentations.
This way, you can move directly from raw charts to publishable insights.