Archive for October, 2007

MSU Colloquium: Andries Coetzee

Tuesday, October 30th, 2007

MSU Colloquium Series (2007)

Dr. Andries Coetzee.
“Lexical Frequency and Variation”

Wells-A607
Nov. 1; 4:30 pm. A coffee hour will be held at 3:30

Abstract
The problem. Variable phonological processes are influenced by the same grammatical factors as categorical processes. In English, t/d variably deletes from word-final clusters – cf. (1). Table 1 (next page) shows that the frequency of deletion is at least partially determined by phonological context. Several formal models have been developed over the past decade or so that can account fairly well for this grammatical influence on variable processes (Anttila 1997; Boersma & Hayes 2001; Coetzee 2006; etc.).
(1) Pre-C context: west bank ~ wes bank
Pre-V context: west end ~ wes end
Pre-Pause context: west ~ wes
However, usage frequency also influences the application frequency of a variable process. t/d-deletion is more likely in more frequent words – west and vest are very similar, but west is more likely to undergo t/d-deletion, corresponding to its higher usage frequency (Table 2). Current models of variation are all strictly grammatical, and cannot account for this frequency influence. I propose a model that allows grammar and lexical frequency to co-determine the application frequency of a variable process.
(2) *PRE-C: No word-final [-Ct/d] before a C-initial word.
*PRE-V: No word-final [-Ct/d] before a V-initial word.
*PRE-##: No word-final [-Ct/d] before a pause.
Chicano English ranking: MAX-L1 >> *PRE-C >> MAX-L2 >> *PRE-V >> MAX-L3 >> *PRE-## >> MAX-L4.
The proposal. (i) Variable lexical indexation. I assume that faithfulness constraints can be indexed to lexical classes, and that these constraints are interspersed between the markedness constraints, as shown in (2). An indexed constraint only evaluates words that share its indexation. The novel proposal here is that words do not have to belong to one lexical class exclusively. Since a word can vary its affiliation, it can be evaluated by different indexed constraints on different occasions, resulting in variation. Assume that /west/ can be assigned to L1, L2, L3, or L4. The faithful candidate of /west bank/ violates *PRE-C, and the deletion candidate violates one of the indexed MAX-constraints, depending on /west/’s lexical class affiliation. If it is assigned to L1, the faithful candidate is optimal, but any other indexation results in deletion. Pre-vocalically (/west end/), the faithful candidate violates *PRE-V. Now two indexations result in preservation (L1, L2), and two in deletion (L3, L4) (cf. tableau below). Pre-pausally, only an L4-affiliation results in deletion. The grammatical influence on variation is hence captured – deletion is observed under 3/4 indexations pre-consonantally, 2/4 pre-vocalically, and only 1/4 pre-pausally.
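The 3/4, 2/4, and 1/4 proportions can be checked mechanically. The following is a minimal Python sketch, not part of the abstract: it assumes the ranking in (2), a single faithful candidate and a single deletion candidate, and equal weight for each of the four indexations. The function names are hypothetical.

```python
# Ranking from (2), highest-ranked constraint first.
RANKING = ["MAX-L1", "*PRE-C", "MAX-L2", "*PRE-V", "MAX-L3", "*PRE-##", "MAX-L4"]

# Markedness constraint violated by the faithful candidate in each context.
CONTEXT_CONSTRAINT = {"C": "*PRE-C", "V": "*PRE-V", "pause": "*PRE-##"}


def deletes(lexical_class: int, context: str) -> bool:
    """Return True if the deletion candidate wins for this indexation."""
    markedness = CONTEXT_CONSTRAINT[context]
    faithfulness = f"MAX-L{lexical_class}"
    # The faithful candidate violates the markedness constraint; the deletion
    # candidate violates the indexed MAX constraint. Deletion wins when the
    # markedness constraint is ranked higher (has a smaller index).
    return RANKING.index(markedness) < RANKING.index(faithfulness)


def deletion_rate(context: str) -> float:
    """Share of the four indexations (L1-L4) under which deletion wins."""
    classes = [1, 2, 3, 4]
    return sum(deletes(k, context) for k in classes) / len(classes)


if __name__ == "__main__":
    for ctx in ("C", "V", "pause"):
        print(ctx, deletion_rate(ctx))
```

Treating the four indexations as equally likely is only a baseline; part (ii) of the proposal replaces it with a word-specific distribution.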
(ii) Frequency and lexical class affiliation. In the current model, the lexical class of a word is determined at each evaluation occasion. I propose that this process is influenced by the word’s usage frequency. Every word is stored with its own probability distribution function. These functions range from 0 to 1, with the range divided into regions corresponding to the lexical classes. In the example here, values from 0 to .25 correspond to L1, .25 to .5 to L2, etc. Every time a word is submitted to the grammar, a value is chosen randomly from its probability distribution to determine its lexical class affiliation for that evaluation occasion. If a value under .25 is selected it will be evaluated by MAX-L1, etc.
The shape of a word’s distribution function is determined by its frequency. Frequent words have left-skewed distributions so that their distribution mass is concentrated at the higher end. A frequent word will hence more likely select a value resulting in it being classified as L3 or L4 than L1 or L2. Consequently, a frequent word is more likely to be protected only by low-ranking faithfulness, and hence to undergo deletion. Infrequent words have right-skewed distributions. By similar reasoning, they are more likely to be assigned to L1 or L2, and hence to resist deletion (cf. figure below). Since usage frequency determines the shape of the distribution functions, lexical frequency gets to influence the likelihood of deletion.
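The sampling step can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not name a distribution family, so Beta distributions stand in for the skewed functions, and the shape parameters (4, 2) and (2, 4) are hypothetical values chosen for illustration, not estimates for any real word. The cut-offs at .25, .5, and .75 and the contexts follow the text.

```python
import random


def lexical_class(value: float) -> int:
    """Map a value in [0, 1] to a class: under .25 -> L1, .25 to .5 -> L2, etc."""
    return min(int(value * 4) + 1, 4)


# Highest class that still preserves t/d in each context, per the ranking
# in (2): pre-C only L1 is protected, pre-V L1-L2, pre-pause L1-L3.
PROTECTED_UP_TO = {"C": 1, "V": 2, "pause": 3}


def simulate_deletion_rate(alpha, beta, context, trials=20000, seed=0):
    """Estimate how often deletion applies for a word whose class values are
    drawn from Beta(alpha, beta) (an assumed stand-in distribution)."""
    rng = random.Random(seed)
    deleted = 0
    for _ in range(trials):
        value = rng.betavariate(alpha, beta)
        if lexical_class(value) > PROTECTED_UP_TO[context]:
            deleted += 1
    return deleted / trials


if __name__ == "__main__":
    for ctx in ("C", "V", "pause"):
        frequent = simulate_deletion_rate(4, 2, ctx)    # mass near 1 (left-skewed)
        infrequent = simulate_deletion_rate(2, 4, ctx)  # mass near 0 (right-skewed)
        print(ctx, frequent, infrequent)
```

Under these assumptions the simulated frequent word deletes more often than the infrequent one in every context, while the ordering pre-C, pre-V, pre-pause is preserved for both.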
Conclusions. There is mounting evidence that lexical factors (usage frequency) play a role in phonology. An adequate model of phonology must include a mechanism through which such lexical factors can contribute to phonological performance. Lexically indexed constraints allow lexical information an indirect entrance into the grammar, which I exploit here to allow grammar and the lexicon to co-determine the frequency with which variable processes apply.

Linguistics Dept. Open House

Sunday, October 28th, 2007

JOIN US!

Linguistics Department Open House
9:30-10:30 AM, October 31st, 2007 - HALLOWEEN
4th Floor, Lorch Hall

Wear a costume to be entered into our prize drawing!

Free Donuts and Cider from Washtenaw Dairy

New book: Semi-supervised Learning for Computational Linguistics

Sunday, October 28th, 2007


Abney, Steven. 2007. Semi-supervised Learning for Computational Linguistics. Chapman & Hall/CRC Computer Science & Data Analysis Volume: 8

From the publisher:
-Offers applications in information extraction, parsing, and word senses, such as WordNet
-Provides background material in machine learning that includes the areas of classification and clustering
-Covers a variety of methods, including co-boosting, transductive SVMs, McLachlan’s algorithm, and the EM algorithm
-Examines in detail the concept of label propagation in a graph
-Discusses spectral methods, including the definition of harmonics, the eigenvectors of matrices and graphs, spectral clustering, and the connection to label propagation
-Introduces the necessary mathematics in a just-in-time manner

The rapid advancement in the theoretical understanding of statistical and machine learning methods for semisupervised learning has made it difficult for nonspecialists to keep up to date in the field. Providing a broad, accessible treatment of the theory as well as linguistic applications, Semisupervised Learning for Computational Linguistics offers self-contained coverage of semisupervised methods that includes background material on supervised and unsupervised learning.

The book presents a brief history of semisupervised learning and its place in the spectrum of learning methods before moving on to discuss well-known natural language processing methods, such as self-training and co-training. It then centers on machine learning techniques, including the boundary-oriented methods of perceptrons, boosting, support vector machines (SVMs), and the null-category noise model. In addition, the book covers clustering, the expectation-maximization (EM) algorithm, related generative methods, and agreement methods. It concludes with the graph-based method of label propagation as well as a detailed discussion of spectral methods.

Taking an intuitive approach to the material, this lucid book facilitates the application of semisupervised learning methods to natural language processing and provides the framework and motivation for a more systematic study of machine learning.

New PhD Update: Catherine Fortin

Friday, October 26th, 2007

This fall, Catherine Fortin (dissertation: Indonesian Sluicing and Verb Phrase Ellipsis: Description and Explanation in a Minimalist Framework, defended summer 2007) began a two-year appointment as a Visiting Assistant Professor in the Carleton College Linguistics Program. She will be teaching several different courses in syntax and related subfields of linguistics (semantics, morphology, intro, acquisition, and ‘field methods’). This term she is teaching Introduction to the Theory of Syntax and First Language Acquisition. With only one other full-time faculty member in Linguistics, Catherine constitutes about half of the program!

Catherine will also be presenting a paper at the upcoming Western Conference of Linguistics (WECOL) at UCSD in November, on some of her dissertation research, Verb Phrase Ellipsis in Indonesian.

Sally Thomason nominated for President-Elect of the Linguistic Society of America

Friday, October 19th, 2007

Candidate for Vice President/President-Elect (1-year term, with two additional years on the Executive Committee as President and Past President)

Sarah Thomason (Ph.D. Yale, 1968) is the William J. Gedney Collegiate Professor of Linguistics at U Michigan, where she has taught since 1999. Before that she taught at U Pittsburgh and Yale. She was editor of Language (1988-1994) and has served on various LSA committees, including a stint as a member of the Executive Committee (2001-2003); she has also taught at three LSA Linguistic Institutes (1969, 1993, and as the Hermann and Klara H. Collitz Professor in 1999). Since 1981 she has worked with the Salish & Pend d’Oreille Culture Committee in Montana, compiling a dictionary and text collection; she held an NSF fellowship for this research in 1999-2001. In 2000 she was President of the Society for the Study of the Indigenous Languages of the Americas. She has served Section Z of the American Association for the Advancement of Science in various capacities, including as chair (1996) and as section secretary (2001-2005). She is currently a member of the international Advisory Boards of the Netherlands Graduate School of Linguistics and the Max Planck Institute for Evolutionary Anthropology. Most of her research focuses on historical linguistics (especially language contact) and Montana Salish; her two major books are Language Contact, Creolization, and Genetic Linguistics (with Terrence Kaufman, 1988) and Language Contact: An Introduction (2001).

New Book: Handbook of Language Development

Friday, October 19th, 2007

E. Hoff & M. Shatz (eds.) Blackwell Handbook of Language Development. It includes chapters from people in linguistics, psych, speech & hearing, and communications.

From the publisher:
The Blackwell Handbook of Language Development provides a comprehensive treatment of the major topics and current concerns in the field. Covering new academic terrain in areas such as brain development, computational skills, bilingualism, education, and cross-linguistic comparisons, this volume explores the progress of twenty-first century research in language development while considering its precursors and looking towards promising research topics for the future. This balanced and accessible volume collects the work of a generation of researchers who are enlarging the field to consider internal and external bases for language development and to address a wide range of language development outcomes.

Presenting recent research in the traditional topics of language development from infancy through early childhood, this book also expands upon those topics to include work on older children, exploring how linguistic knowledge develops with experiences such as learning a second language and acquiring writing skills. The expansive coverage of foundational and emerging topics makes this book an excellent resource for researchers, instructors, and graduate students in developmental psychology, linguistics, and education.

New Paper: Control in Greek

Friday, October 19th, 2007

Title: Control in Greek: It’s another good move
Authors: Konstantia Kapetangianni (U of Michigan) & T. Daniel Seely (Eastern Michigan U).
Published in: New Horizons in the Analysis of Control and Raising
Series: Studies in Natural Language and Linguistic Theory , Vol. 71 (Publisher: Springer)
Davies, William D.; Dubinsky, Stanley (Eds.)

Abstract:

In this paper we attempt an exercise in explanation by deduction relative to subjunctive clauses in Greek. We argue that the standardly noted properties of OC vs. NOC subjunctive clauses, along with a range of (unnoticed) properties problematic to previous accounts, can be “explained” within a reductivist minimalist framework. We start with the question: what is the least we can say? We answer that we can go a surprisingly long way with just this: in some cases, the abstract Agr(eement) element associated with the subjunctive verb form is phi-defective (i.e. does not have the full set of abstract phi features); elsewhere the Agr element associated with subjunctive is phi-complete. With respect to their surface morphology, the subjunctive clauses are identical; with respect to underlying abstract phi features, they are not. It is an irreducible property of certain verbs that they select defective Agr in the subjunctive clause. Phi-complete Agr occurs elsewhere. We argue that this simple featural distinction goes a surprisingly long way in deducing, and hence explaining, the properties of subjunctive clauses, and has consequences beyond.

Conference on phonological features

Friday, October 19th, 2007

San Duanmu went to a feature conference at University of Paris 3 (Sorbonne-nouvelle) Oct 4-5. The conference title was ‘Where Do Features Come From? Phonological Primitives in the Brain, the Mouth, and the Ear’.

The weather in Paris was great. The conference was very interesting and focused. All the talks were on features, mostly with an experimental component. The organizers have asked speakers to provide their handouts or PowerPoint files on the internet, so they should soon be available from the conference website:

This conference was a concluding event of a multi-year research project on features at Paris 3. Several PhD students worked on 1-2 features each and presented their work. Ken Stevens also gave a special session on the quantal theory and the latest work by him and his colleagues and students.

Linguistics Club Meeting: School House Rock

Monday, October 15th, 2007

Report from Ultrafest and Haskins Labs

Thursday, October 11th, 2007

Pam Beddor, Andries Coetzee, and Kevin McGowan have recently returned from NYU where they attended Ultrafest IV. As previously mentioned here, Ultrafest is an annual opportunity for linguists and speech scientists using ultrasound to get together, share work they’re doing with this relatively new tool and discuss common solutions to ultrasound’s unique challenges. We learned a great deal about how ultrasound is used, what its strengths are, and what challenges we can expect to face as we move in this new direction. The department is now researching ultrasound hardware options and will be reviewing demonstration models soon.

Pam and Kevin also had the opportunity to visit the new home of Haskins Laboratories where Pam gave an invited talk on “The phonetics and phonology of nasal gestures” as part of the Haskins Staff Talk series.

During the visit they toured the facilities and were given a hands (and chins)-on introduction to HOCUS (the Haskins Optically-Corrected Ultrasound System) — a bold, multi-year project at Haskins to use optical tracking to allow free and natural head motion during analysis of running speech while still providing the data necessary to orient ultrasound images to the location of the passive articulators in four dimensions.

“Ultrasound systems for research in linguistics range from compact laptop-sized units one can take into the field to finely-tuned installations such as those at Haskins or Maureen Stone’s lab at the University of Maryland, Baltimore”, Kevin reported. “This trip will definitely let us take advantage of others’ experiences with ultrasound as we add this tool to our own lab.”