Linguistics: now with magnets, gamma rays, and lego bricks!

Following up on the previous post, something I think about from time to time is how to convey what I do in my research to a non-linguistic audience. Not because I have an uncontrollable missionary urge or because I feel scientific research cannot have a place in our society unless it is "useful" in some concrete—usually economic—sense, but mostly because I feel people are missing out on all the incredibly cool stuff that we find in natural language. Part of this is arguably also 'hard science envy': what do people think of when you say you're going to show them a cool science experiment? They think of a roll of Mentos going down a Coke bottle, or the Crazy Russian Hacker making a bubble-making machine out of soap and dry ice. Whatever it is, I can guarantee you no one is thinking of a linguist, standing in the corner of the room, saying something like: "Hey guys, you know what's really cool about this sentence?"

My physics envy was aggravated recently as I was watching De Schuur van Scheire, a show on Belgian national television, in which a group of nerds has fun trying out all kinds of tricks, experiments, and hacks. Not surprisingly—especially considering the physics background of the host of the show—all these hacks have a strong natural science or engineering bias. Recent examples include a home-made battery, a bike with a pulse jet engine, and Ikea-hacking. The social sciences are completely absent in this show. Understandably so, right? I mean, who needs a bunch of boring men and women reading boring books and writing even more boring papers about those books? That doesn't make for exciting science experiments (let alone exciting television). Well, I beg to differ: linguistics can be every bit as cool, nerdy, and exciting as the hard sciences. As an illustration, I give you: linguistic magnets, gamma rays, and lego bricks!

magnets

Magnets, Wikipedia informs us, are materials or objects that produce a magnetic field. This field is invisible, but it pulls on some materials, such as iron, and repels others, such as other magnets. Exciting, right? Invisible stuff attracting or repelling other stuff! I bet you never find that in natural language. Wrong. Allow me to set up two magnets (a positive and a negative one, if you will):

  1. John is helping.
  2. John isn't helping.

There, that was easy. Now let's throw some material into the invisible field created by these magnets. I'm going to start with the word anyone. It is attracted by the negative magnet but repelled by the positive one:

  3. *John is helping anyone.
  4. John isn't helping anyone.

See how that first example went wrong? The positive magnetic field created by the sentence is repelling anyone so strongly that it wants to push it right out of the sentence. Now let's try the opposite. We're going to throw in pretty well. This expression has the opposite magnetic properties of anyone: it is attracted by positive magnets, but repelled by negative ones:

  5. John is helping pretty well.
  6. *John isn't helping pretty well.

Now let's see what happens if we start stacking magnets. If I take the positive magnetic field trying to repel anyone in example 3 and put a negative magnetic field on top of it, the attraction between anyone and the negative force is stronger than the repulsion from the positive force:

  7. I don't think John is helping anyone.

But if we try the opposite, i.e. if we take the negative field repelling pretty well in example 6 and embed it in a larger positive magnetic field, it still goes wrong:

  8. *I think John isn't helping pretty well.
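For the programmatically inclined, here's the whole attraction-and-repulsion pattern as a toy Python sketch. To be clear: this is entirely my own illustration, with made-up names like acceptable, and nowhere near an actual theory of polarity items.

    # A toy model of the magnet pattern in examples 1-8: 'anyone' needs a
    # negative field somewhere above it, while 'pretty well' refuses to sit
    # inside one, even when a positive field is stacked on top.

    def acceptable(item, clause_negative, higher_clause_negative=False):
        """Judge a polarity-sensitive item given the polarity of its own
        clause and, optionally, of a clause stacked on top of it."""
        if item == "anyone":
            # attracted by negation, even one clause up (example 7)
            return clause_negative or higher_clause_negative
        if item == "pretty well":
            # repelled by local negation; a positive clause on top
            # does not rescue it (example 8)
            return not clause_negative
        raise ValueError(f"no magnetic profile for: {item}")

    print(acceptable("anyone", clause_negative=False))               # False: example 3
    print(acceptable("anyone", clause_negative=True))                # True:  example 4
    print(acceptable("anyone", False, higher_clause_negative=True))  # True:  example 7
    print(acceptable("pretty well", clause_negative=True))           # False: examples 6 and 8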

Believe me when I say that this is just the tip of the magnetic linguistic iceberg. We have no time to linger, though, as we have gamma rays to get to!

gamma rays

Gamma rays—thank goodness for Wikipedia—are a type of electromagnetic radiation of extremely high frequency. They are basically high-energy photons. As you might have guessed from my repeated use of the adjective high in the preceding sentences, gamma rays are pretty dangerous and pretty powerful. Just to give you an idea: if we were to install a gamma ray cannon—if such a thing exists—on one side of the room and put a plywood door in its line of fire, the gamma rays would have no trouble reaching the other side of the room, straight through the door. Same story if you put two plywood doors in between. Or three. Or ten. Or a hundred. In fact, if you want to stop the gamma rays, you're going to need an inches-thick slab of lead.

Gamma ray cannons may or may not exist in real life, but they certainly do in natural language. We call them questions. The gamma rays that shoot out of them are called question words. Typical examples are who, what, where, etc. Let's warm up our cannon with a simple shot:

  1. What is John eating?

Note how the question word what does not appear to the immediate right of the verb eating, where we typically find the food in this sentence (cf. John is eating an apple). Instead, it has an uncontrollable urge to move all the way to the left, like, well, a gamma ray shot out of a gamma ray cannon. Now let's put a plywood door in the line of fire of this cannon:

  2. What does Mary say that John is eating?

Sentence boundaries are the plywood doors of language. Note that the question word what in this example still refers to the food that John is eating. This means that it started out to the immediate right of the verb eating, but from there it has shot leftward uncontrollably. In particular, it has blasted through the sentence boundary marked by that without even the slightest hesitation. So let's give our cannon a bit more of a challenge: we're going to put two plywood doors in the line of fire:

  3. What does Sally think that Mary says that John is eating?

Boom! Straight through, as if the doors simply aren't there. Clearly we need a stronger boundary. Let's try putting five plywood doors in the line of fire of our gamma ray cannon:

  4. What do you think that Paul claims that Ellen believes that Sally thinks that Mary says that John is eating?

Like actual gamma rays, our question word keeps shooting through sentence boundaries as if they're simply not there. In order to stop these rays, we need to find something denser. We need to find the linguistic equivalent of a thick slab of lead. Turns out that a particular type of clause boundary is made of lead. Imagine John was eating something at the same time that Mary was off to the store, and you would like to know what that was. You load up your gamma ray cannon, fire off your question word, and BAM! it hits a lead wall:

  5. *What is Mary off to the store while John is eating?

So there you go: that is made of plywood, while while is pure lead. Next time you need to shield yourself from gamma rays, choose your complementizers wisely!
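And for the coders among you, the same contrast as a toy Python sketch, again entirely my own illustration (real clause boundaries come in more than two materials, of course):

    # Plywood versus lead: a question word shoots through 'that' boundaries
    # no matter how many there are, but is stopped by a single 'while'.

    PLYWOOD = {"that"}   # boundaries a question word passes straight through
    LEAD = {"while"}     # boundaries that stop it cold

    def can_escape(boundaries):
        """Can a question word move past this list of clause boundaries?"""
        return all(b in PLYWOOD for b in boundaries)

    print(can_escape(["that"]))        # True:  one plywood door (example 2)
    print(can_escape(["that"] * 5))    # True:  five plywood doors (example 4)
    print(can_escape(["while"]))       # False: a slab of lead (example 5)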

lego bricks

Lego bricks are the ultimate nerd toys: eminently hackable, limited only by one's own imagination, and available in a wide gamut of Star Wars merchandise. If only there were a way of having them with us all the time. Well, look no further: the very words you speak are nature's best lego bricks. Consider for example the following (not very uplifting but perfectly grammatical) sentence:

  1. John hit the dog with a hat in the garden.

Let's forget about John for the moment; he won't have any role to play in the remainder of the discussion. The rest of the sentence is best represented as four separate bricks: [hit] [the dog] [with a hat] [in the garden].

What can you do with bricks like these? You can start building bigger structures of course. Well, here's a kicker for you: the towers you can build based on these four bricks correspond exactly to the different interpretations of the example in 1. Let's see what I mean by doing some actual building. Suppose we start by connecting the first two bricks: [hit [the dog]].

We've now created a little tower that contains both hit and the dog, i.e. we're saying that part of the interpretation of this sentence involves a dog getting hit (which seems pretty accurate). Let's continue building:

The second green brick has joined hit the dog and with a hat, giving [[hit [the dog]] [with a hat]], i.e. we're now saying that the dog-hitting was performed using a hat. Time for the final step in our tower:

The final brick combines hit the dog with a hat with in the garden: [[[hit [the dog]] [with a hat]] [in the garden]]. This gives us a nice four-stage tower—though one that is leaning heavily to the right—and it also completes the interpretation of the sentence: a dog was being hit using a hat, and the hitting took place in the garden. However, just as this is by no means the only way we can put these four yellow bricks together, it is also by no means the only interpretation this sentence can have. Suppose we had started off as follows: [the dog [with a hat]].

This time the first green brick combines the dog with with a hat. What does this mean? It means we're not talking about any old dog. No sir, we're talking about this feisty specimen:

Photo by Digital Vision/Photodisc/Getty Images

Combining the second and third yellow bricks tells us that we are dealing with a dog who's wearing a hat. If we now want to express that it is this unfortunate creature that's being hit, we have to join the first brick to our tower: [hit [the dog [with a hat]]].

The final step is the same as in our first tower; the fourth yellow brick is added to the structure: [[hit [the dog [with a hat]]] [in the garden]].

Our second tower is done, as is our second interpretation of this sentence: this time, the creature that's being hit is a dog wearing a hat. The sentence provides no information as to the instrument of the hitting (could be a hat, could be a baseball bat, could be anything), but it does tell us the hitting took place in the garden.

Once again, though, we have exhausted neither the combinatorial possibilities of our lego bricks, nor the possible interpretations of this sentence. Suppose we had combined our hat-wearing dog not with hit, but with in the garden: [[the dog [with a hat]] [in the garden]].


What does this mean? It means we're not focusing on just any old hat-wearing dog. It turns out John has hat-wearing dogs all over his house: he has one in the kitchen, in the basement, in the bathroom, and he also has one in the garden. And today, he's decided to exercise his animal cruelty on that latter dog:

Behold tower number three ([hit [[the dog [with a hat]] [in the garden]]]) and, like its shadow, interpretation number three right behind it. What this says is that the dog that is being hit is the hat-wearing specimen that resides in the garden. Under this interpretation, the sentence provides no information about the instrument used to perform the hitting or the location where the hitting took place.

Three different interpretations for that one (simple) example in 1. Surely we've exhausted this sentence now, right? Well, have we exhausted our tower building options? Clearly not:

Now we start by combining the last two bricks ([with a hat [in the garden]]), and once again, we need to ask ourselves: what is the linguistic correlate of this real-world building activity? We're combining with a hat and in the garden. This means that we're not talking about any old hat, we're talking about a garden hat. You see, John is not only cruel to his dogs, he also gets bored easily. He doesn't like having to use the same hat every time he wants to hit a dog, and so he has hats lying all over his house: in the kitchen, the bathroom, the basement, and, yes, the garden. And today he is hitting the dog...

...using his garden hat:

Tower number four ([[hit [the dog]] [[with a hat] [in the garden]]], the most stable one so far) and interpretation number four. Note, though, that John is not the only one who can enjoy variety in his hat selection. Suppose we give the garden hat to the dog:

Behold a dog who enjoys the finer things in life. He enjoys wearing hats, yes, but it's not like he'll put just any old headgear on his canine head. This dog only wears garden hats. Unfortunately for him, though, this is not his lucky day:

Our poor fashion-sensitive friend is getting hit. We don't know with what or where, but we do know that he is the recipient of a beating. More generally, we have arrived at our fifth and final tower ([hit [the dog [[with a hat] [in the garden]]]]), as well as our fifth and final interpretation of the example in 1.

If you've read this far, chances are you're either really interested or my mom, so let me give you one final bonus round: what would have happened if you had wanted to combine the second brick with the fourth one first? Well, you would have ended up building a bridge: the dog and in the garden joined together over the top of with a hat.


Houston, we have a bit of a construction problem: the third brick is stuck under the bridge and cannot partake in any further tower building activities. And you know what's really cool (even if it shouldn't come as a surprise by now)? Construction problems with our lego bricks correspond to interpretation problems for our sentence. You see, one thing the sentence John hit the dog with a hat in the garden cannot mean is that the object of John's hitting is the dog in the garden (the garden dog, let's say) and that the instrument he uses in that hitting activity is a hat. (Take a moment to mull that one over; you'll see I'm right.) And that is exactly the interpretation we were just trying to build: by combining the dog with in the garden, we were singling out a particular dog, namely the garden dog. Once again, then, Lego and linguistics go hand in hand.
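In fact, you can let a computer do the tower building. The little Python sketch below (my own toy, not a parser) builds every tower you can make from the four bricks under the single rule that only adjacent (sub)towers may combine. It churns out exactly five towers—one per interpretation—and the crossing configuration never shows up:

    # Enumerate every binary tower over four contiguous bricks.
    BRICKS = ["hit", "the dog", "with a hat", "in the garden"]

    def towers(lo, hi):
        """All binary towers over the contiguous brick span BRICKS[lo:hi]."""
        if hi - lo == 1:
            return [BRICKS[lo]]
        result = []
        for mid in range(lo + 1, hi):  # every way to split the span in two
            for left in towers(lo, mid):
                for right in towers(mid, hi):
                    result.append([left, right])
        return result

    for number, tower in enumerate(towers(0, len(BRICKS)), 1):
        print(number, tower)
    # Prints exactly five towers -- one per interpretation of the sentence.
    # A tower pairing 'the dog' directly with 'in the garden' while skipping
    # 'with a hat' never appears: it would require a crossing combination.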

This post is long enough as it is, so I'll stop here, but I hope to have shown to at least some of you what an exciting research topic natural language can be. Next time: dark matter!

cobblers

If you move in the same Facebook circles as me—and if you're reading this, chances are there's at least an overlap—you've probably already come across this blog post, in which a Helsinki-based linguist called Joe McVeigh critically reviews this paper, which appeared in PNAS earlier this year. McVeigh really goes to town on the paper, pointing out several flaws (and rightly so, I'd say), but the reason I wanted to rehash it here is that his opening statement struck a chord with me:

"I can understand the temptation to research and report on language. We all use it and we feel like masters of it. But that’s what makes language a tricky thing. You never hear people complain about math when they only have a high-school-level education in the subject. The “authorities” on language, however, are legion. My body has, like, a bunch of cells in it, but you don’t see me writing papers on biology."

You see, of the fourteen authors of the PNAS-paper, not a single one was a linguist. Some Reddit commenters saw in this blog post—and the above quote in particular—a plea against interdisciplinary research and even found it "smacking of arrogance", but that's not at all what this is about. There's something peculiar about language as the object of scientific study, something which invites many a non-linguist to posit hypotheses and develop theories of his own. I see at least two reasons for this. One is the simple fact—also pointed out in the quote—that language is something that we all own. Everyone reads, writes, speaks, intentionally manipulates language in word play, or experiments with it in poetry or literature, whereas only very few of us do the same with the DNA-sequence of a fruit fly. In other words, the object of study is much more accessible than it is in most of the hard sciences. The second reason, I think, is the fact that the basic linguistic insights—the axioms of linguistics if you will—are insufficiently known outside of the field, which in turn might be due to a lack of intradisciplinary consensus on what those axioms are. In short, there's still a lot to do for linguistics, both in terms of moving the field forward, and in terms of informing the outside world of that progress.

Where's where?

Time for another Random Linguistic Observation (henceforth RLO). I'm calling this one The Mystery of the Missing Location. I stumbled upon it in the main Dutch reference grammar, where it's referred to as 'to know with the meaning of to know where'. Here's an example:

  1. Ik weet Jan wonen.
     I know John live
     "I know where John lives."

Note the discrepancy between the gloss and the translation: the otherwise obligatory locative complement of wonen 'to live' is missing, yet the sentence is perfectly grammatical and moreover, it is interpreted as me having knowledge of where John lives. This in and of itself would be enough to grant RLO-status to this construction, but as it turns out, it has a couple of additional quirks. First, the location disappearance trick only works with complements, not with adjuncts:

  2. *Ik weet Jan slapen.
     I know John sleep
     intended: "I know where John sleeps."

Second, the embedded verb cannot be ditransitive:

  3. *Ik weet Jan het boek leggen.
     I know John the book put
     intended: "I know where John is putting the book."

  4. *Ik weet Jan zich bevinden.
     I know John self be.find
     intended: "I know where John is."

In (3) I'm using the verb leggen 'to put', which requires both a direct object and a locational complement, while (4) features zich bevinden 'to be' (lit. to be-find oneself), an inherently reflexive verb that means the same as zijn 'to be' in its locational sense, as in I am in London (and note that when used in that sense, zijn can occur in the construction under discussion here). Leggen is truly ditransitive, zich bevinden only superficially so (due to its being inherently reflexive), but for the mystery of the missing location, they're all the same, i.e. excluded.

The third and final quirk of the construction is that it is only locative complements that can go missing. Verbs with complements expressing duration or manner do not allow those complements to disappear when selected by weten 'to know':

  5. *Ik weet de film duren.
     I know the movie last
     intended: "I know how long the movie lasts."

  6. *Ik weet Jan te werk gaan.
     I know John to work go
     intended: "I know how John operates."

I must admit I'm quite puzzled by this: I know of no other context that is so specific and selective in its ellipsis properties. If any of you have any ideas, or if the phenomenon occurs in languages other than Dutch, I'd love to hear about it.
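In the meantime, and for concreteness, here is the three-quirk generalization as a toy Python predicate. The feature names are made up by me and the encoding is purely descriptive; it restates the puzzle rather than solving it:

    # Can a verb's locative dependent go missing under weten 'to know'?
    # Quirk 1: only complements, not adjuncts.
    # Quirk 2: the embedded verb cannot be ditransitive.
    # Quirk 3: only locative dependents, not duration or manner.

    def location_can_go_missing(role, kind, ditransitive):
        return (role == "complement"      # quirk 1
                and not ditransitive      # quirk 2
                and kind == "locative")   # quirk 3

    cases = {
        "wonen 'live'":   ("complement", "locative", False),  # ex. (1): fine
        "slapen 'sleep'": ("adjunct",    "locative", False),  # ex. (2): out
        "leggen 'put'":   ("complement", "locative", True),   # ex. (3): out
        "duren 'last'":   ("complement", "duration", False),  # ex. (5): out
    }
    for verb, profile in cases.items():
        print(verb, location_can_go_missing(*profile))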

  1. I'm giving the Belgian Dutch version here. In Netherlandic Dutch, the complement of weten 'to know' would be a to-infinitive, not a bare one.

  2. If any of my first-year students are reading this: this makes for an excellent test to distinguish between a locational complement ('voorwerp van plaats') and a locational adjunct ('bijwoordelijke bepaling van plaats').

  3. I guess the ill-formedness of (6) could be due to the fixed expression te werk gaan 'operate' (lit. to work go) being ditransitive in some weird sense (cf. the second quirk), but I know of no verb that only has a subject and a manner complement.

refme

One of the necessary evils of academic writing is references: you have to add them, format them, keep them up to date, etc. Now, granted, the advent of LaTeX/BibTeX has made life a lot easier in this respect: you add a reference to your bib-file once, and from there on out you always have it at your fingertips in whatever referencing style you need. But ... you still have to add the references in the first place. I know, I know, this is quite the first world problem, but like so many first world problems these days, there's an app for that! Enter RefME, an app that promises to be "the free tool to generate citations, reference lists and bibliographies".

What does it do?

In a nutshell, it creates references for you. All you have to do is scan a barcode (of a book), enter a DOI-number, or simply search based on author or title and poof! a reference magically appears, in one of over 6,000 referencing styles for you to choose from. There's an iOS-app and a web-app, and once you have the references you need, you can export them to a variety of places like your clipboard, Evernote, Word, EndNote, BibTeX, etc. It sounded pretty good, so I had to put it to the test.

Does it work?

I first decided to scan the barcodes of all the books I had lying about in my immediate vicinity:

  1. R. Zanuttini & L. Horn Micro-syntactic variation in North American English
  2. D. Geeraerts Theories of lexical semantics
  3. H. Smessaert Basisbegrippen morfologie
  4. C. Wheelan Naked statistics
  5. R. Munroe Wat als
  6. L. Gonick & W. Smith The cartoon guide to statistics
  7. F. du Bois & I. Boons Gin & Tonic: de complete gids voor de perfecte mix

I used the iOS-app to scan the barcode on the Geeraerts-book, and in the blink of an eye, the following appeared on screen:

@book{Geeraerts_2009, title={Theories of Lexical Semantics}, ISBN={9780198700319}, publisher={Oxford University Press, USA}, author={Geeraerts, Dirk}, year={2009}, month={Jan}}

Ok, not perfect—publisher's location is missing—but pretty good nonetheless. The same was true for numbers 1, 4, and 6: here and there information was missing, but it got the basics right. The app struggled, though, with the Dutch books on my list. In one case (number 5), it filled in author and title but nothing more, for another (number 3) it could only give me the ISBN-number, while the third one (the G&T-book) it didn't recognize at all. Then I entered some DOI-numbers in the web-app. This produced great results: all of them yielded perfectly formatted and complete references. Finally, I looked for some papers based on their authors and (keywords from) titles. It worked well with journal articles (I looked for Merchant's 2013 LI-article on voice and ellipsis and Adger's 2006 paper on combinatorial variability in the Journal of Linguistics), but behaved strangely with book chapters. I searched for "luigi rizzi fine structure left periphery" in the section Book chapters and found the relevant Haegeman-volume, but without title and author of the chapter filled in, while in the section Journal articles I found title and author, but it looked like the paper had appeared in the 'journal' Kluwer International Handbooks of Linguistics. A similar fate befell Kratzer's seminal 1996-paper on severing the external argument from the verb.

In short, mixed results: when it works, it works quite well and is highly useful, but there are also still clear gaps in RefME's bibliographical knowledge, some unsurprising (I wouldn't expect a Belgian Dutch guide on how to mix the perfect Gin & Tonic to be in their database), others quite mysterious (like the Rizzi-paper).
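As an aside, the flawless DOI results are no accident: DOIs come with a standard content-negotiation service, so you can pull a BibTeX record straight from doi.org yourself, no app required. Here's a minimal Python sketch of that trick (the DOI in the last line is a placeholder; substitute one of your own):

    # Ask doi.org for BibTeX via content negotiation.
    import requests

    def doi_to_bibtex(doi):
        """Fetch a BibTeX record for the given DOI."""
        response = requests.get("https://doi.org/" + doi,
                                headers={"Accept": "application/x-bibtex"},
                                timeout=10)
        response.raise_for_status()
        return response.text

    print(doi_to_bibtex("10.1234/placeholder"))  # put a real DOI here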

How much does it cost?

Nothing. Nada. Bupkis. Typically, this is a cause for concern. Here's what RefME themselves say on their support page in reply to the question "How does RefME make money?":

RefME is very lucky to be supported by investors focused on growth. Our goal is to reach 10 million users within the next 18 months and we are already well on the way to reaching that number. We do know how to make money but don't worry, we aren't selling anyones [sic] data or working with any publishers (and never will!). RefME will also always be free to students :)

Translation: you'll have to accept on good faith that we'll be good. I actually wouldn't mind paying for this app, because it clearly has the potential to be very useful—imagine having to add 100 references to your bib-file in one go; RefME could speed up that process considerably—but then it has to become much more accurate.

  1. Don't judge me.

  2. The HPSG-community has quite a good solution to this conundrum: they host a central bibliography.

so far and yet so near

While I'm at it, here's another new paper of mine, this one in collaboration with Tanja Temmerman. It starts out from a seemingly crazy idea originally proposed by Kyle Johnson in a paper in Lingua, namely that two elements that for all intents and purposes look to be non-adjacent can nonetheless be considered adjacent under a multidominant analysis. I was quite skeptical of the idea at first—witness footnote 22 in Johnson's paper—but as Tanja and I discovered, it allows you to have your cake and eat it too when it comes to negative indefinites such as no car. As many people have pointed out, negative indefinites seem to be semantically—and in some cases morphologically—composite in that they contain both negation and an existential. Accordingly, some of the earliest analyses of the phenomenon assumed a kind of fusion or amalgamation approach, whereby clausal negation and the indefinite determiner of the direct object fuse into a single negative determiner no. However, while that works fine for an OV-language like Dutch, where negation linearly precedes the indefinite determiner of the object, it goes awry in English, where the verb intervenes between clausal negation and the direct object (John did not eat a cookie).

Enter Johnson's idea, and all of a sudden, negation and direct object can be adjacent even in a language like English. What's interesting about this approach—and this is actually the main topic of our paper—is that it correctly predicts negative indefinites to interact with ellipsis in ways that are unexpected under a movement- or Agree-based approach. The paper is relatively technical—and some of the trees rather funky-looking—but the general message I think is clear and quite thought-provoking: being close to one another does not necessarily mean the same thing for morphosyntax as it does for Spell-Out. In other words, Metallica had it right all along.