Archive for the ‘Computational’ Category

New book: Semisupervised Learning for Computational Linguistics

Sunday, October 28th, 2007


Abney, Steven. 2007. Semisupervised Learning for Computational Linguistics. Chapman & Hall/CRC Computer Science & Data Analysis, Volume 8.

From the publisher:
- Offers applications in information extraction, parsing, and word senses, such as WordNet
- Provides background material in machine learning that includes the areas of classification and clustering
- Covers a variety of methods, including co-boosting, transductive SVMs, McLachlan’s algorithm, and the EM algorithm
- Examines in detail the concept of label propagation in a graph
- Discusses spectral methods, including the definition of harmonics, the eigenvectors of matrices and graphs, spectral clustering, and the connection to label propagation
- Introduces the necessary mathematics in a just-in-time manner

The rapid advancement in the theoretical understanding of statistical and machine learning methods for semisupervised learning has made it difficult for nonspecialists to keep up to date in the field. Providing a broad, accessible treatment of the theory as well as linguistic applications, Semisupervised Learning for Computational Linguistics offers self-contained coverage of semisupervised methods that includes background material on supervised and unsupervised learning.

The book presents a brief history of semisupervised learning and its place in the spectrum of learning methods before moving on to discuss well-known natural language processing methods, such as self-training and co-training. It then centers on machine learning techniques, including the boundary-oriented methods of perceptrons, boosting, support vector machines (SVMs), and the null-category noise model. In addition, the book covers clustering, the expectation-maximization (EM) algorithm, related generative methods, and agreement methods. It concludes with the graph-based method of label propagation as well as a detailed discussion of spectral methods.

Taking an intuitive approach to the material, this lucid book facilitates the application of semisupervised learning methods to natural language processing and provides the framework and motivation for a more systematic study of machine learning.

Conference presentation: Named Entity Recognition

Thursday, October 4th, 2007

Li Yang has just returned from the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007) in Melbourne, where he presented a paper titled Named Entity Recognition Using Syntactic/Semantic Information, co-authored with Steven Abney.

In their paper, Yang and Abney show that combining deep syntactic knowledge with machine learning methods significantly improves performance on named entity recognition.

Deep processing was the major theme of PACLING 2007.

Team USA takes top honors at the International Linguistics Olympiad

Thursday, September 13th, 2007

Drago Radev coached the US team on its first trip to the Linguistics Olympiad.

From the news report:
Six American high-school students took top honors in the 2007 International Linguistics Olympiad in St. Petersburg, Russia, earlier this month. This year was the first time a delegation represented the United States at the annual competition. Their victory brings a new focus to computational linguistics.

This year’s International Olympiad featured 15 teams representing 9 countries, including the Netherlands, Russia and Spain. Competitors were given problem sets consisting of sentences in languages most people are not familiar with, together with their English translations. The languages included Tatar, Georgian, Movima (spoken by indigenous people in Bolivia), the Papua New Guinean language Ndom, Hawaiian, and Turkish. With just this information, the competitors then had to translate more sentences from these languages into English. Winners were judged by how accurately and quickly they could figure out the rules and structure of the languages and complete their translations.
