VP-ellipsis 2.0

Ah, that wonderful feeling of finishing and submitting a paper; there's nothing quite like it. Today was one of those joyous days: I submitted the revised version of my chapter on VP-ellipsis for the upcoming second edition of the Blackwell Companion to Syntax (and yes, once again I missed the submission deadline by quite a margin, and once again I'm not happy about that).

I'm of course biased, but I think this version is a substantial improvement over the previous one: the cross-linguistic ellipsis data are now more tightly integrated into the rest of the paper (making their relevance and importance for the theory of ellipsis clearer), the work by Andrew Kehler is now acknowledged (not mentioning it was a substantial omission in the first version), at several points I've incorporated some additional, very recent work on VP-ellipsis (most notably two interesting papers by Philip Miller and collaborators), and of course I got to use the fantastic sentence The precise nature of the movement operation responsible for evacuating Donald Duck out of the ellipsis site is a matter of much debate.

Given that you're all very busy and can't be expected to read every single overview article that crosses your path (let alone a second version of such an article), I thought I'd give you a quick and easy wordcloud summary of the paper. This is what it looks like:

No big surprises here, except perhaps the relatively high frequency of rutabagas, madame, and spanella, but if you're familiar with what is arguably one of the most important (if not the most important) papers on VP-ellipsis, these terms too quickly fall into place.

  1. Yes, I'm cheating slightly, because in the original quote Donald Duck is in italics, but let's not split hairs here.

Elias's adventures in COMP-agreement land

When my son was born, I was determined not to become one of those typical linguist-parents. You know the type: they diligently write down Every Single Utterance produced by their offspring (no matter how seemingly insignificant) and their work suddenly shows a skew towards language acquisition that was previously entirely absent. That was not going to be me. I was going to enjoy fatherhood to the fullest. No work, all play.

Sure, there are things you can't help noticing and find amusing: left branch extractions ("Daddy, how are you tall?"), a continuous struggle to assign nouns to the correct gender (even with exceptionless classes like diminutives), and my goodness, past participle formation is a bitch! I let it all slide, though: this was not a linguistic interview with a native speaker, it was a (usually dinosaur-related) play session with my three-year-old son.

But then he started doing something that I just cannot let slide. Here's what he said earlier today:

      azz-e   wij   thuis   zijn
      if-pl   we    home    are
      "when we're home"

Note the plural agreement ending on the complementizer. This is a phenomenon conveniently known as comp(lementizer)-agreement in the linguistics literature. It's a phenomenon I've published on and given talks about, i.e. it's something I have a profound professional interest in. And the thing is: Elias shouldn't be doing this. He was born and raised in a non-comp-agreement area, by non-comp-agreeing parents, and surrounded by similarly non-comp-agreeing friends and family. To drive home this point, here's a map of Elias's home vs. the (relevant) comp-agreeing part of the Low Countries:

In other words, he's not just imitating what he hears in his direct surroundings, nor is comp-agreement a surfacy curiosity situated at the very fringe of grammar and not worthy of serious theoretical investigation. There's something deep about this phenomenon, something that indeed warrants the attention it has received in the generative literature. Given that I remain committed to not becoming a linguist-parent, I will not be using Elias's data in any upcoming publications or talks (nor will I be prodding him for more data), but I do want to thank him for suggesting that daddy might be onto something.

  1. And a great source for meta-conversations like the following:
    Elias: We've swimmed this morning, haven't we daddy?
    Me: Swum. We've swum.
    Elias: What is "swum"?
    Me: Uhm, it's like swim, but uhm...
    Elias: No, we've not swum, we've swimmed!
    Me: Okay then.

  2. One possible source of comp-agreement is his best friend's father, who is Dutch and so might be a speaker of a comp-agreement dialect. I haven't heard him use it, but I'll be listening closely next time we meet.

  3. Comp-agreement comes in various shapes and sizes. What I've mapped here is comp-agreement ending in schwa in the first person plural, i.e. of the type Elias was using.

  4. Nor is he simply copying—via some process of analogy—the verbal affix onto the complementizer, as he's even using comp-agreement in so-called double-agreement contexts, whereby the complementizer and the verb have a different ending:

        azz-e   wij   thuis   kom-t
        if-pl   we    home    come-pl
        "when we're coming home"

Academic social network etiquette

If you look at my Facebook page, I'm arguably one of the least qualified persons to speak up about this: a modest 136 friends (mostly linguists), no personal info apart from some bare necessities, hardly any posts (the occasional conference or talk announcement), and even fewer likes (the only thing I've liked so far is birthday wishes, mainly because I didn't know what else to do with them and because it seemed rude to do nothing). The discrepancy between the amount of Facebook information I consume and the amount I produce has even earned me the rather dubious title of 'Facebook lurker' in some circles.

Be that as it may, however, I was inspired by a brief rant by Casey Liss in one of the early episodes of Analog(ue) (I think it was episode 3 or 4), in which he railed against subtweeting your loved ones and/or taking spousal fights to Twitter. This made me realize that there are a number of bad practices that academics engage in on social networks that rattle my chain.

The first concerns tweeting about (interactions with) students. Numerous are posts of the type You'll never guess what a student just mailed me: he wants a deadline extension on a term paper because he's going on a skiing holiday. Here's his e-mail: ... Yes, these posts are typically anonymized (or even translated) and no, they are mostly not public, but in my opinion it's simply not done. Dealing with students is a core part of our job, and you're supposed to do so with professionalism, regardless of the content of the interactions. Besides, Facebook's privacy settings are sufficiently opaque to constantly leave that residue of doubt about who exactly can see what you're posting, and if there's one thing academics tend to forget about their students, it's that they have access to the internet too. Equally annoying, mind you, are overly positive messages about students, of the type Great class today; what a wonderful group of students! I realize that there might be a genuine sentiment of enthusiasm and love for one's job behind such a post, but at the same time, it feels like sucking up (let's not forget: students have access to the internet too) and as such, it is just as inappropriate as publicly criticizing students.

The second (or third, depending on how you count) thing that gets under my skin are posts or tweets about papers that one is reviewing. Typically, these are derogatory comments, meant to illustrate how stupid or uninteresting this paper is, and what a waste of one's valuable time and considerable intellect it is that one has to review this heap of parrot droppings. To such a post (and its writer) I say: tough titty for you, fish face. Writing reviews is part of your job, it is a way of doing service to the academic community, so shut up and get to work.

Thirdly and finally, let's all collectively agree to stop using Facebook and Twitter to whine about a (brilliant, of course) paper of ours that wasn't accepted for publication because some (ignorant, obviously) reviewer was too dim-witted to see the sheer life-changingness of our ideas. There's this fantastic (and probably apocryphal, but who cares) story about the legendary Morris Halle, who, whenever a student came into his office to complain about a paper or abstract that wasn't accepted, pulled open a very long drawer filled to the brim with papers and said "These are all papers and abstracts of mine that weren't accepted", instantly silencing the student in question. Maybe Morris should put that same sentiment in a Facebook post or a tweet?

So there you go, my three pet peeves about the intersection of academia and social media. Now, the (arguably very few) avid readers of this blog might object that what I did in my earlier post entitled A reviewer's review violates the second (or a mix of the second and the third) etiquette rule outlined above. In that post I complained about what a reviewer had written about a paper of mine, claiming that he was mainly trying to promote his own work. Assuming I don't want to hide behind technicalities, I have to plead at least partially guilty to these charges. The reason I felt justified in breaking my own rules is that the post wanted to bring a positive message (the three ground rules about reviewing) and that I tried (and managed, I'd say) to keep the complaining and whining to a minimum. Whether or not that warranted me engaging in online behavior I don't approve of, I'll leave for others to evaluate.

  1. My second, pseudonymous Facebook account presents an even more barren landscape. Bonus points for those of you who can track it down.

  2. Although this post is about social networks in general, most examples that come to mind are from Facebook. On Twitter I rarely see these things—maybe I don't follow enough academics there.

    1. The post was about a review, not a to-be-reviewed paper.
    2. The post was not on a social network site, but on my own blog, which—truth be told—has a smaller audience than my Facebook page or Twitter account.
    3. I did my best to anonymize everything. In fact, I didn't even make clear which paper of mine it was that had received this review.
    4. I didn't call into question the quality of the review—on the contrary, I admit that adhering to the reviewer's advice has made my own paper better.

Quantity and quality in linguistics

"This book advises you to be wary of forecasters who say that the science is not very important to their jobs, or scientists who say that forecasting is not very important to their jobs! These activities are essentially and intimately related. A forecaster who says he doesn't care about the science is like the cook who says he doesn't care about food. What distinguishes science, and what makes a forecast, is that it is concerned with the objective world. What makes forecasts fail is when our concern only extends as far as the method, maxim, or model."

A great quote from a great book: Nate Silver's The signal and the noise. The art and science of prediction. I read this book over the summer and thoroughly enjoyed it. Nate Silver is a statistician who runs and writes for the highly recommendable blog FiveThirtyEight. (As a quick aside, before reading the book, all I knew about Silver was that he had received an honorary doctoral degree from my university and so I assumed that he was a statistics professor at some Ivy League university, but it turns out he started his career as a baseball analyst and online poker player.)

Back to the quote: I think it hits the nail squarely on the head when it comes to characterizing the relationship between statistics and scientific theory. More specifically, if you replace 'forecasters' with 'quantitative linguists' and 'science' with 'theoretical linguistics', you arrive at the motto that characterizes my current thinking about the field. Theoretical linguistics has to acknowledge and embrace the wealth of quantitative data, methods, and techniques that is out there, but at the same time, pure number crunching uninformed by insights from theoretical linguistics—no matter how methodologically or mathematically impressive—is not going to lead to a deeper understanding of natural language.

Let's make this a bit more concrete. A book that I've drawn a lot of inspiration from lately is Marco René Spruit's 2008 PhD dissertation Quantitative perspectives on syntactic variation in Dutch dialects. It is an excellent, impressive, and innovative piece of work, in which the empirical results of (the first half of) the SAND-project are analysed from a quantitative point of view. However, because Spruit is not a linguist (he's a computer scientist by training), his book forms the perfect illustration of the dichotomy introduced above. For instance, when looking for associations between 485 syntactic variables (associations of the type If a dialect has property A, what are the odds that it also has property B?), he finds no fewer than 10,730 of them with an accuracy of 90 percent or higher. And when either the antecedent or the consequent is allowed to contain a disjunction, that number even goes up to 56,267,729 (yes, fifty-six million!). With numbers like these you need someone to separate the wheat from the chaff, and that someone, guess what, is a theoretical linguist. In Spruit's own words:

"From a statistical perspective many more linguistically interesting variable associations can be expected to surface upon closer investigation. The explorations described above merely attempt to indicate the great potential of association rule mining as a meaningful contribution to linguistic theory in general and syntactic theory in particular. (..) However, every approach will require extensive consultation with syntactic theorists to meaningfully interpret the data."

It is this synthesis between quantitative-statistical and formal-theoretical approaches that I've been pursuing in a number of recent talks, and there's more to come, so stay tuned.
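For readers who want to see what an association of the type "if a dialect has property A, it also has property B" amounts to in practice, here is a minimal Python sketch. It is not Spruit's actual implementation: the toy data, the function names, and the reading of "accuracy" as the share of A-dialects that also have B (what the association-rule literature usually calls confidence) are all my assumptions.

```python
from itertools import permutations

# Hypothetical toy data: each dialect is mapped to the set of
# syntactic properties it has. None of this comes from the SAND data.
dialects = {
    "dialect_1": {"A", "B", "C"},
    "dialect_2": {"A", "B"},
    "dialect_3": {"A", "B", "C"},
    "dialect_4": {"B"},
    "dialect_5": {"C"},
}


def rule_accuracy(data, antecedent, consequent):
    """Share of the dialects with `antecedent` that also have `consequent`.

    Returns None when no dialect has the antecedent at all.
    """
    with_antecedent = [props for props in data.values() if antecedent in props]
    if not with_antecedent:
        return None
    hits = sum(1 for props in with_antecedent if consequent in props)
    return hits / len(with_antecedent)


def mine_rules(data, threshold=0.9):
    """Return every ordered pair (A, B, accuracy) that reaches the threshold."""
    properties = sorted(set().union(*data.values()))
    rules = []
    for a, b in permutations(properties, 2):
        accuracy = rule_accuracy(data, a, b)
        if accuracy is not None and accuracy >= threshold:
            rules.append((a, b, accuracy))
    return rules


rules = mine_rules(dialects)
for a, b, accuracy in rules:
    print(f"if {a} then {b}: accuracy {accuracy:.2f}")
```

In this toy set only "if A then B" survives the 90 percent cut. Note how quickly the search space grows: if each of the 485 variables is treated as a single yes/no property (again an assumption), there are already 485 × 484 = 234,740 ordered pairs to test, and nothing in the procedure itself tells you which of the surviving rules are linguistically meaningful.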

  1. I'll probably end up reading it a second time, because by the side of the pool I was too lazy to read all—or rather, any—of the footnotes.

Gmail UI woes

A couple of months ago, our IT-department configured the university's firewall to block all outgoing SMTP-traffic (except for its own Exchange server of course). As a consequence, I've been using the Gmail web interface a lot lately, and Oh! My! God! has it been driving me crazy. I'm no designer by any stretch of the imagination, but I'm guessing that whoever is responsible for this jumbled mess wasn't getting straight A's in designer school either. Every day there are numerous aspects of mail.google.com that confuse, irritate, and bewilder me, but for the sake of keeping the amount of complaining on this blog to a minimum, let me pick out my two main grievances. First off, take a look at the following pair:

Pop quiz, hotshot: which of these is the back button and which is the reply button? If I had a eurocent for every time I've mixed these two up, well, I'd have a lot of eurocents. And to add insult to injury: in the view where these two buttons pop up together—the detailed view of a single (thread of) message(s)—the back button is superfluous, because the left-hand column also contains a button to take you to the inbox, which is exactly what the back button does. But hey, it's a good thing replying to messages is not something one does frequently in an email client, right?

A second thing one rarely does, is write new messages. In order to help you execute this obscure task, Google has devised this beauty:

Oh, where to start? There's so many things wrong here. First off, it doesn't look like a button at all. Here's what it looks like in context:

It doesn't look anything like the other buttons in this column. If anything, it looks like the title of the column, something that's not even clickable. I'm guessing the reasoning here was: "Composing a new message is a very common task, so let's make a Big Fat Red Button for it", but the effect has been the exact opposite; by making it stand out so prominently, it completely disappears and becomes invisible.

Looks aside, though, let's focus on what the button says. The Dutch verb opstellen is the literal (Google Translate-style) translation of English 'to compose'. Aha, exactly what someone who wants to compose a new message needs, right? Nope. You see, the verb opstellen is only rarely used in combination with mail or e-mail. You don't have to take my word for it; we can look at some numbers. I did a couple of searches in the Corpus of Contemporary Dutch, a corpus of over 70 million words from (among others) newspapers, journals, legal documents, television news broadcasts, novels, and internet texts. The verb opstellen (in any of its inflectional forms) occurs 24,650 times, the noun e-mail 12,264 times, and mail 8,320 times. The question now is to what extent these sets overlap. It turns out that (e-)mail co-occurs with opstellen only a meagre 16 (sixteen!) times. Compare this to the numbers for sturen 'to send' and schrijven 'to write':

             mail   e-mail   total
opstellen      11        5      16
sturen       1706     1409    3115
schrijven     621      771    1392

What does this mean? Well, it means that Google should have chosen a different name for their button. The verb they've put on there is only very rarely associated with the action connected to the button, which makes it unintuitive and hard to use. That said, though, I think the problem is more fundamental than the choice of verb. Even if the numbers in the table had been reversed, I still think the button would have been poorly designed. The most informative part of the verb phrase een nieuw bericht opstellen 'to compose a new message' is the direct object een nieuw bericht 'a new message', not the verb. So if anything, that's what should have been on the button: nieuw 'new' or nieuw bericht 'new message'. It would sidestep the whole issue of which verb to use, and instead would focus much more directly on the result the user is trying to achieve.

Anyway, the good news is that I will be changing workplaces soon—more on that in a future post—at which point my days of Gmail-web-interface-suffering will be over. I can't wait.

  1. Both nouns are used interchangeably in Dutch.

  2. A quick word on my methodology: for both mail and e-mail I did two searches: one with the noun preceding the verb and one with the noun following the verb. In both cases I allowed for anywhere between 0 and 20 optional intervening words. For opstellen the number of hits was so small that I was able to manually verify all of them and throw out any false positives. For sturen and schrijven I didn't do that because there were too many hits. Note that sturen occurs 72,022 times in the whole corpus, and schrijven 209,487 times.
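The search procedure described in this footnote can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the corpus's own query interface is not shown here, so `count_cooccurrences`, the tokenized example sentence, and the sets of word forms are all hypothetical, and the function counts noun/verb pairs rather than corpus hits. The second half puts the reported totals side by side as shares of each verb's overall frequency.

```python
def count_cooccurrences(tokens, noun_forms, verb_forms, max_gap=20):
    """Count noun/verb pairs, in either order, with 0..max_gap words in between.

    Hypothetical helper: it ignores sentence boundaries and does no
    lemmatization, so all inflected forms must be listed explicitly.
    """
    noun_positions = [i for i, tok in enumerate(tokens) if tok in noun_forms]
    verb_positions = [i for i, tok in enumerate(tokens) if tok in verb_forms]
    count = 0
    for n in noun_positions:
        for v in verb_positions:
            intervening = abs(n - v) - 1  # words strictly between the two
            if 0 <= intervening <= max_gap:
                count += 1
    return count


# Made-up example sentence: "I'll send you a mail tomorrow".
tokens = "ik stuur je morgen een mail".split()
hits = count_cooccurrences(tokens, {"mail", "e-mail"}, {"stuur", "sturen", "gestuurd"})

# Figures reported in the post: verb totals and co-occurrences with (e-)mail.
verb_totals = {"opstellen": 24650, "sturen": 72022, "schrijven": 209487}
with_mail = {"opstellen": 16, "sturen": 3115, "schrijven": 1392}
share = {verb: with_mail[verb] / verb_totals[verb] for verb in verb_totals}

for verb, value in share.items():
    print(f"{verb}: {value:.2%} of occurrences are near (e-)mail")
```

On these figures roughly 0.06% of all occurrences of opstellen are near (e-)mail, against about 4.3% for sturen and 0.66% for schrijven. Keep in mind that only the opstellen hits were checked by hand, so the other two shares may include false positives.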