There, I’ve said it…not sure if I’ll keep my job or not, but I felt I needed to get this off my chest. Now, let’s just qualify this to start with: I’m not sure I actually hate corpus linguistics per se. What I think I hate is this cloud of obligation hanging over EAP teachers that they should somehow be ‘doing’ corpus linguistics. Every time I go to a conference, there seem to be more and more sessions on the importance and value of creating corpora in the classroom, of how we should all be mining data from corpora and getting our students to analyse them. And I diligently attend these sessions, but always leave slightly perplexed with the same question bouncing around my head: yes, but what am I supposed to do exactly?
I’m also not convinced that the people touting the value of corpus data are always telling the truth about what goes on in their classes. The way they tell it, students are earnestly and diligently analysing corpus data, sharing tremendous insights about how the language is used and finding it very, very useful. It never quite seems they are describing actual students.
Because it all sounds great, until you try to do it yourself. You trot off to one of the famous sites where you can pull out corpus data, such as Lex Tutor or COCA. Not the prettiest sites in the world to start with; a bit of dusting and tidying up of the layout might not go amiss. And the language? Yeah, not massively welcoming to someone trying to get their head round this for the first time. You look on the home page and try to work out what N-Gram or RT Builder means. Focus on Word Forms sounds promising but then it has I.-D.xszxWORDSvgft TTS-DICTATOR underneath it. Did the guy typing this just fall asleep on his keyboard or does that actually mean something?
But then you spot something that looks familiar: Concordance. That’s definitely something to do with Corpus Linguistics, let’s give that a go. So, you click on that and it takes you to another text-heavy page with phrases such as ‘Associated Word’ space expanded to handle multiple items (for better handling of homographs). Man, my head’s beginning to spin.
But you persevere. You click on Clean Sentence Concs as that sounds like, well, it might have concordances and they are clean (better than dirty ones, I’m assuming). Now, there are different corpuses to choose from. Which one do I want? Brown, 2k mini, AWL mini, BNC written/spoken? God, I’m horribly confused. Ok, I’ll choose AWL mini, I mean that’s something to do with the Academic Word List, right? It says 550k in brackets next to it, that sounds like a lot and I suspect with a corpus, more is better. Could be wrong about this though.
Ok, there’s a space for a word. I’ll type something in. Let’s try the word ‘research’, that’s pretty academic. This has got to offer up something interesting. And it does, sort of. I get a page of sentences from various sources (presumably academic) with the word research in the middle. Cool.
What do I do now? I should probably ‘analyse’ these examples for useful patterns that I can then share with my students, or create activities so they can explore the patterns themselves. So I look through and nothing immediately leaps out. There are some obvious collocations in there (major research, research centres/laboratories etc) but if I was a non-native speaker, I’m not sure they would leap out at me. The rest just seem to be a random collection of sentences that tell me little about the word ‘research’.
I suppose what they tell me is that language is messy and unstable and doesn’t always fit into neat categories. That’s probably true, but that doesn’t help me with my lesson tomorrow. I can’t go in front of my students and say, ‘look, we’re going to look at some corpus data today but you probably won’t get much from it because, well, it’s all over the bloody place and I wasn’t sure how I could make it accessible for you. But enjoy nonetheless…’
Basically, EAP teachers need help with this stuff and there’s really not much out there to give us some guidance of where to even get started. Yes, I know that we can’t shirk responsibility when it comes to keeping up with our profession, we have to be ready to take on new trends, techniques and technologies, but this assumes there’s some clear and logical way to access this information.
Now, I think I’ve learnt a bit about corpus data because, well, I’m interested in technology and I like poking around websites and blogs to find out more. But not everyone is going to have that same level of curiosity and if the information is hidden behind impenetrable and obscure websites, it’s unlikely most teachers are going to persist because they are humans and have better things to do with their time.
What’s needed are some solid books or websites (ideally from publishers) that will take teachers step by step through this process and help them understand the link between the amorphous subject Corpus Linguistics and the job of getting their students to write a little bit better by using the right collocation occasionally. Between those two things is a huge chasm of confusion and anxiety.
Luckily there are a few websites out there that deserve an honorable mention for trying to make linguistic data as accessible and useful as possible. Flax is one of those. Although the site design is a little clunky, they do try to make the results visually attractive, with a clear indication of what they mean. So, for example, if I type in a word, I get a list of common collocations for the word as well as a number next to each one indicating frequency. Clicking on a collocation will bring up a dropdown menu with finer-grained collocations of the one clicked, and then you can click on these to see them in their original context. All really useful stuff, and for teachers and students it’s very clear what information you are getting. If you’re willing to put in a bit of extra time to explore the site, you’ll find lots of extra useful features such as corpuses of abstracts that you can run text analysis tools over. There’s also a place where you can create collocation games that you can send to your students.
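For the curious, here is a rough idea of what a "collocations with a frequency number" list boils down to. This is only a minimal Python sketch and certainly not how Flax itself works: the `collocates` function, the one-word window and the sample text are all my own assumptions, made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and pull out simple word tokens (hypothetical helper).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def collocates(text, node, window=1):
    # Count the words that appear within `window` words of `node`.
    # Note: this naive version ignores sentence boundaries.
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Invented sample text, standing in for a real corpus.
sample = (
    "Major research centres fund major research projects. "
    "Recent research suggests that research funding matters."
)

counts = collocates(sample, "research")
for word, freq in counts.most_common(5):
    print(word, freq)
```

On a text this small the numbers mean very little, but the principle is the same: a collocation list is a count of which words keep turning up next to the one you searched for.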
Webcorp is another one I recently discovered, and it’s useful for pasting in text and getting some basic data out of it. If you put in a text, it will show you the words that appear most frequently, and clicking on one of them will show you the sentences it appears in as a set of concordances. This is something that can easily be done with students in class to get them to analyse texts they’ve read and to extract useful vocab.
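If you want to see what is going on under the bonnet, the two steps described here (a frequency list, then concordance lines for one word) can be approximated in a few lines of Python. Treat this as a sketch under my own assumptions rather than a description of how Webcorp is built: `most_frequent`, `kwic`, the tiny stopword set and the sample text are all invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and pull out simple word tokens (hypothetical helper).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def most_frequent(text, n=5, stopwords=frozenset()):
    # Return the n most common words, skipping anything in `stopwords`.
    words = (t for t in tokenize(text) if t not in stopwords)
    return Counter(words).most_common(n)


def kwic(text, keyword, width=30):
    # Build keyword-in-context lines: `width` characters either side of each hit.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample text, standing in for something read in class.
sample = (
    "Research takes time. Good research needs funding, "
    "and funding shapes research."
)

top_words = most_frequent(sample, n=2, stopwords=frozenset({"and"}))
print(top_words)
for line in kwic(sample, "research"):
    print(line)
```

The concordance lines are lined up on the keyword, which is what makes patterns to the left and right of it easier to spot.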
Just the Word is another one I recommend to students and teachers because the interface is simple to understand (the search box is very Googley) and the results reasonably easy to interpret. It lists all the collocations grouped by meaning and indicates the frequency using a line and number. This is very simple for students to understand.
I’d also recommend the Oxford Text Checker, where you can paste in any text and check how many of its words come from the 3000 most frequent words in English or from the academic word list. It’s a simple tool, but actually very useful for teachers and students in guiding their vocabulary learning. A quick check of a text or chunk of a text studied in class and you can identify keywords to focus on for more intense vocabulary work.
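The idea behind this kind of checker is simple enough to sketch in Python: compare each word in the text against a word list and report what falls outside it. This is my own toy version, not the Oxford tool's actual method, and the `coverage` function and the five-word `wordlist` are placeholders; a real check would load a proper frequency list or the Academic Word List.

```python
import re


def tokenize(text):
    # Lowercase the text and pull out simple word tokens (hypothetical helper).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(text, wordlist):
    # Return the share of tokens found in `wordlist`, plus the words that are not.
    tokens = tokenize(text)
    if not tokens:
        return 0.0, []
    off_list = [t for t in tokens if t not in wordlist]
    share = (len(tokens) - len(off_list)) / len(tokens)
    return share, sorted(set(off_list))


# Placeholder word list: a real one would hold thousands of entries.
wordlist = {"the", "data", "is", "we", "analyse"}

text = "We analyse the data. The corpus is messy."
share, off_list = coverage(text, wordlist)
print(f"{share:.0%} of words are on the list")
print("Words to look at:", off_list)
```

The off-list words are the interesting part for teaching, since they are the candidates for vocabulary work. Note that this sketch matches exact word forms only, so 'analysed' would not count as 'analyse' unless the list included it.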
Now the problem with these sites is that they don’t seem ‘scientific’ by virtue of being easy to use. When I use them, I tend to think that I’m not really doing “proper” corpus linguistics because it’s all too simple and basic (most EAP teachers have that natural anxiety working in academia that what we’re doing is somehow less valid or worthwhile because it’s not willfully obscure and impenetrable). But that’s not something to worry about as this is what our students need. They are not studying to be applied linguists (well, a few of them may be) so they don’t need to learn to navigate and interpret confusing and challenging corpus websites.
But at the same time, I think it would be good to see a greater effort to make more complex corpus analysis more accessible to EAP teachers. So, for example, I’d like to know how to create my own corpus based on student essays, analyse the types of mistake they generally make and then use that to inform what I do in the class. But I don’t really know how to do that and I can’t find anywhere on the internet that could help me. And I’m reasonably tech savvy; a teacher who finds technology a little frightening wouldn’t stand a chance trying to navigate these websites.
Ok, I’ll take back my original statement. I don’t hate corpus linguistics, but I do wish there was more stuff out there that would make it accessible to me and other teachers, because at the moment most of these tools feel like the preserve of researchers and professors in Applied Linguistics. Much has been made of the gap between teaching and research, and this kind of thing only serves to heighten that.
I’d be interested to hear from other EAP teachers about their experiences of trying to bring corpus data into the classroom. Have you found a way to make it accessible to students or are you struggling with it? If anyone has any really good techniques, materials or websites, I’d love to hear about them.