I’m an EAP teacher and I hate Corpus Linguistics

There, I’ve said it…not sure if I’ll keep my job or not, but I felt I needed to get this off my chest. Now, let’s just qualify this to start with: I’m not sure I actually hate corpus linguistics per se. What I think I hate is this cloud of obligation hanging over EAP teachers that they should somehow be ‘doing’ corpus linguistics. Every time I go to a conference, there seem to be more and more sessions on the importance and value of creating corpora in the classroom, of how we should all be mining data from corpora and getting our students to analyse them. And I diligently attend these sessions, but always leave slightly perplexed with the same question bouncing around my head: yes, but what am I supposed to do exactly?

I’m also not convinced that the people touting the value of corpus data are always telling the truth about what goes on in their classes. The way they tell it, students are earnestly and diligently analysing corpus data, sharing tremendous insights about how the language is used and finding it very, very useful. It never quite seems as if they are describing actual students.

Because it all sounds great, until you try to do it yourself. You trot off to one of the famous sites where you can pull out corpus data, such as Lex Tutor or COCA. Not the prettiest sites in the world to start with; a bit of dusting and tidying up of the layout might not go amiss. And the language? Yeah, not massively welcoming to someone trying to get their head round this for the first time. You look on the home page and try to work out what N-Gram means or RT Builder. Focus on Word Forms sounds promising but then it has I.-D.xszxWORDSvgft TTS-DICTATOR underneath it. Did the guy typing this just fall asleep on his keyboard or does that actually mean something?



tonight we’re going to surf the web like it’s 1999

But then you spot something that looks familiar: Concordance. That’s definitely something to do with Corpus Linguistics, so let’s give that a go. So, you click on that and it takes you to another text-heavy page with phrases such as ‘Associated Word’ space expanded to handle multiple items (for better handling of homographs). Man, my head’s beginning to spin.


Geocities called from 1997, it wants its website back

But you persevere. You click on Clean Sentence Concs as that sounds like, well, it might have concordances and they are clean (better than a dirty one, I’m assuming). Now, there are different corpuses to choose from. Which one do I want? Brown, 2k mini, AWL mini, BNC written/spoken? God I’m horribly confused. Ok, I’ll choose AWL mini, I mean that’s something to do with the Academic Word List, right? It says 550k in brackets next to it, that sounds like a lot and I suspect with a corpus, more is better. Could be wrong about this though.

Ok, there’s a space for a word. I’ll type something in. Let’s try the word ‘research’, that’s pretty academic. This has got to offer up something interesting. And it does, sort of. I get a page of sentences from various sources (presumably academic) with the word research in the middle. Cool.


what I learn from this is that corpus linguistics is confusing

What do I do now? I should probably ‘analyse’ these examples for useful patterns that I can then share with my students or create activities for them to explore them. So I look through and nothing immediately leaps out. There are some obvious collocations in there (major research, research centres/laboratories etc) but if I was a non-native speaker, I’m not sure they would leap out at me. The rest just seem to be a random collection of sentences that tell me little about the word ‘research’.

I suppose what they tell me is that language is messy and unstable and doesn’t always fit into neat categories. That’s probably true, but that doesn’t help me with my lesson tomorrow. I can’t go in front of my students and say, ‘look, we’re going to look at some corpus data today but you probably won’t get much from it because, well, it’s all over the bloody place and I wasn’t sure how I could make it accessible for you. But enjoy nonetheless…’

Basically, EAP teachers need help with this stuff and there’s really not much out there to give us some guidance of where to even get started. Yes, I know that we can’t shirk responsibility when it comes to keeping up with our profession, we have to be ready to take on new trends, techniques and technologies, but this assumes there’s some clear and logical way to access this information.

Now, I think I’ve learnt a bit about corpus data because, well, I’m interested in technology and I like poking around websites and blogs to find out more. But not everyone is going to have that same level of curiosity and if the information is hidden behind impenetrable and obscure websites, it’s unlikely most teachers are going to persist because they are humans and have better things to do with their time.

What’s needed are some solid books or websites (ideally from publishers) that will take teachers step by step through this process and help them understand the link between the amorphous subject Corpus Linguistics and the job of getting their students to write a little bit better by using the right collocation occasionally. Between those two things is a huge chasm of confusion and anxiety.

Luckily there are a few websites out there that deserve an honorable mention for trying to make linguistic data as accessible and useful as possible. Flax is one of those. Although the site design is a little clunky, they do try to make the results visually attractive and to give a clear indication of what they mean. So, for example, if I type in a word, I get a list of common collocations for the word, each with a number next to it indicating frequency. Clicking on a collocation will bring up a dropdown menu with further fine-grained collocations of the one clicked, and then you can click on these to see them in their original context. All really useful stuff, and for teachers and students it’s very clear what information you are getting. If you’re willing to put in a bit of extra time to explore the site, you’ll find lots of extra useful features such as corpuses of abstracts that you can run text analysis tools over. There’s also a place where you can create collocation games that you can send to your students.
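For anyone curious what a "collocation with a frequency number next to it" boils down to, here is a minimal Python sketch. It is not how Flax works (real tools use much larger corpora and statistical measures); it simply counts which words turn up within a couple of words of a search word. The function name `collocates`, the `window` size and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

def collocates(text, node, window=2, stopwords=frozenset()):
    """Count the words appearing within `window` words of `node`.

    Hypothetical helper: raw neighbour counts only, no statistics,
    and the window ignores sentence boundaries.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Words just before and just after this occurrence of the node word
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(w for w in neighbours if w not in stopwords)
    return counts.most_common()

sample = "Major research needs funding. Major research centres do major research."
for word, freq in collocates(sample, "research", window=1):
    print(word, freq)
```

With a text of any real size you would want to pass in a stopword set (the, of, and so on), otherwise the function words swamp everything else.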


this makes a lot more sense

Webcorp is another one I recently discovered, and it’s useful for pasting in text and getting some basic data out about it. If you put in a text, it will show you the words that appear most frequently in it, and clicking on one of them will show you the sentences it appears in as a set of concordances. This is something that can be easily done with students in class to get them to analyse texts they’ve read and to extract useful vocab.
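The two steps described here (most frequent words, then concordance lines for one of them) are simple enough to sketch in a few lines of Python. This is only an illustration of the idea under my own assumptions, not Webcorp's actual method; the function names, the 30-character context width and the sample text are invented for the example.

```python
import re
from collections import Counter

def top_words(text, n=10, stopwords=frozenset()):
    """Return the n most frequent words in the text, skipping stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords).most_common(n)

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Each line shows up to `width` characters of context on either side,
    with the keyword in square brackets so the hits line up in a column.
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].rstrip()
        right = text[m.end():m.end() + width].lstrip()
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

sample = "Research is hard. Good research takes time. The research centre closed."
print(top_words(sample, n=3))
for line in concordance(sample, "research"):
    print(line)
```

Students could paste in a text they have just read, look at the top of the frequency list, and then pull up the concordance lines for any word that looks worth studying.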


Just the Word is another one I recommend to students and teachers because the interface is simple to understand (the search box is very Googley) and the results reasonably easy to interpret. It lists all the collocations grouped by meaning and indicates the frequency using a line and number. This is very simple for students to understand.

just the word

I’d also recommend the Oxford Text Checker, where you can paste in any text and check how many of its words come from the top 3000 most frequent words in English or from the academic wordlist. It’s a simple tool but actually a lot more useful for teachers and students in helping them guide their vocabulary learning. A quick check of a text or chunk of a text studied in class and you can quickly identify keywords to focus on for more intense vocabulary work.
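The underlying check is easy to picture in code. Below is a rough Python sketch, assuming you have your own word list to hand as a set of words; the Oxford 3000 and the Academic Word List are not included here, and the tiny list in the example is made up. It matches exact word forms only, whereas a real checker would presumably also deal with inflected forms and word families.

```python
import re

def coverage(text, wordlist):
    """Return (share of tokens found in wordlist, sorted off-list words).

    Hypothetical helper: exact, lowercased word forms only.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0, []
    known = {w.lower() for w in wordlist}
    on_list = sum(1 for t in tokens if t in known)
    off_list = sorted({t for t in tokens if t not in known})
    return on_list / len(tokens), off_list

# Made-up mini word list, purely for illustration
mini_list = {"the", "study", "shows", "a", "shift"}
share, unknown = coverage("The study shows a significant paradigm shift", mini_list)
print(f"{share:.0%} of words are on the list")
print("Words to focus on:", unknown)
```

The off-list words are the interesting part for a lesson: they are the candidates for more intense vocabulary work.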

oxford 3000 text checker

Now the problem with these sites is that they don’t seem ‘scientific’ by virtue of being easy to use. When I use them, I tend to think that I’m not really doing “proper” corpus linguistics because it’s all too simple and basic (most EAP teachers have that natural anxiety working in academia that what we’re doing is somehow less valid or worthwhile because it’s not willfully obscure and impenetrable). But that’s not something to worry about as this is what our students need. They are not studying to be applied linguists (well, a few of them may be) so they don’t need to learn to navigate and interpret confusing and challenging corpus websites.

But at the same time, I think it would be good to see a greater effort to make more complex corpus analysis more accessible to EAP teachers. So, for example, I’d like to know how to create my own corpus based on student essays, analyse the types of mistake they generally make and then use that to inform what I do in the class. But I don’t really know how to do that and I can’t find anywhere on the internet that could help me. And I’m reasonably tech savvy; a teacher who finds technology a little frightening wouldn’t stand a chance trying to navigate these websites.

Ok, I’ll take back my original statement. I don’t hate corpus linguistics, but I do wish there was more stuff out there that would make it accessible to me and other teachers, because at the moment most of these tools feel like the preserve of researchers and professors in Applied Linguistics. Much has been made of the gap between teaching and research, and this kind of thing only serves to heighten that.

I’d be interested to hear from other EAP teachers about their experiences trying to bring in corpus data into the classroom. Have you found a way to make it accessible to students or are you struggling with it? If anyone has any really good techniques, materials or websites, I’d love to hear about them.

About David Read

I work at the English Language Teaching Centre at the University of Sheffield as the Director of TEL (technology-enhanced learning). I've been an EFL/EAP teacher and teacher trainer for over 20 years and have worked in 14 different countries. Settling down is clearly an issue for me.

12 thoughts on “I’m an EAP teacher and I hate Corpus Linguistics”

  • Jennifer Sizer

    Interesting assessment. I thought I was going to hate this article as I’m quite keen on corpus linguistics. But actually loved the recommendations especially the Oxford text checker I wasn’t aware of and will be using from now on.

    I wondered if you had used/heard of SKELL: https://skell.sketchengine.co.uk/run.cgi/skell which I think is a good all rounder while easy to use for students too.

  • Julie Moore

    Ha ha, this did make me laugh … although in a rather disheartened way. As someone who uses a corpus almost every day (my background’s in dictionaries, so I’ve been doing it professionally for years), I do find it frustrating that the publicly available corpus tools are such a confusing mess. The commercial publishers (who hold very large corpora) actually all use the same corpus software by Sketch Engine. Admittedly, it’s not quite as user-friendly as it perhaps could be, but it doesn’t take much to get used to and for those of us who work in the world of commercial corpus research, at least we have a familiar interface to work with.

    Outside of the commercial corpora though, there’s a mishmash of tools, some more user-friendly than others, all with their pros and cons, advocates and detractors. Which isn’t very helpful for a teacher who just wants to dip a toe occasionally.

    There is a book which gives a step-by-step guide for teachers who want to use Sketch Engine: “Discovering English with Sketch Engine” by James Thomas. I haven’t tried working through it properly, but it looks quite helpful. It does though depend on using the subscription version of Sketch Engine.

    And Collins are hoping to launch a publicly available version of their corpus soon, which should have a slightly more user-friendly interface.

    Until then … good luck!

  • anthonyteacher

    Great post! I have fallen in and out of love with corpus linguistics and DDL. As a tech savvy person and part-time coder, even I get frustrated with the academic corpus tools and switch to sites like you listed above. While there is a lot of potential with using corpora in the classroom, you need to have the right ingredients for it to work right, including the right students and a firm belief in your ability to not only use these tools but to derive learning benefits from them. You’d have to be overly enthusiastic and hope this enthusiasm is contagious.

    I helped my colleague complete his master’s thesis based on introducing, using, and testing BYU-BNC as a DDL tool. After working with students for a month on using corpora, I think they still did not find much benefit in it.

  • mura nava

    hi David
    your perception that there seems to be increasing amount of CL being used in HE is interesting as it backs up increasing usage in this sector from Chris Tribble’s 2012 survey – https://googledrive.com/host/0B7FW2BYaBgeiTWRkVS1RMXViNlE/13Tribblefinal_rev.pdf

    i agree that interfaces like lextutor can be confusing for students (if not properly trained on them like any tool) and maybe profiling tools such as that you mention included in webcorp could be an alternative; Paul Raine has recently released a particularly good one called Text Genie – https://www.apps4efl.com/views/special/text_genie/

    if you want to create your own corpus of student essays have a read of the advice the creators of a Korean Learner Corpus have outlined here http://koreanlearnercorpusblog.blogspot.fr/p/corpus.html

    the best rationale i have read on integrating corpus information is Frankenberg-Garcia on Integrating corpora with everyday language teaching https://www.academia.edu/3368339/Integrating_corpora_with_everyday_language_teaching –>>recommended read

    finally the G+ CL community is worth a look now and again : ) https://plus.google.com/communities/101266284417587206243


  • Chad Langford

    Hey, David.

    I, too, thought I was going to hate this post (see Jennifer Sizer’s comment), as I myself find great value in corpus linguistics and in using corpora in general. (Actually, what I’m interested in is real data, and corpora/corpus linguistics enables me to do with that data things I couldn’t do otherwise.)

    Instead, I ended up reading a really interesting and enlightening post, and learned a bit as well.

    I agree that the Compleat Lexical Tutor is off-putting visually, and consequently discouraging. But a little experimentation into what I can use it for has reconciled me with that. Unfortunately, most of us don’t usually have time to experiment — we need to know, and know quickly, what the tools can or cannot do, and then decide which, if any, of them are useful to us.

    In any case, thanks for the post. I’ll definitely be back.

    Best, – Chad

    (p.s. Would also like to mention that the guy who runs the Compleat Lexical Tutor (can’t remember his name) is quick to answer questions and can be very helpful.)

  • tom cobb

    Surely not too big a stretch to discover that Lextutor has both a Facebook group for elaborate discussions of use as well as about 35 research papers involving its various routines at lextutor.ca/research
    ~ Tom/developer and chief cheerleader