Past Brown Bag Talks
College of Humanities and Social Sciences (CHSS)
[Supported by the U.S. National Science Foundation: Grant No. 1048406]
The purpose of the group is to discuss issues in large datasets, language and speech processing. We are hoping to bring our cross-disciplinary strengths to these areas, to share ideas, to discuss the current state of the art, and to collaborate on research topics.
Beata Klebanov & Michael Flor, Educational Testing Service (Princeton)
February 15, 2017
Stepping Stones to English Competence – Idiomatic Expressions in Argumentative Essays Written by Non-Native English Speakers.
We present a work-in-progress that focuses on idiomatic expressions in student essays. We analyzed a corpus of essays written by non-native speakers of English, in response to several different prompt questions for the TOEFL test. We describe a computational procedure for automatic detection of idiom-candidate phrases in essay texts. A subset of the data was manually annotated for idiomatic expressions. In the annotated dataset, we found that some idiomatic expressions are highly topic-specific, while others have more generic usage. We show that topic-specific idioms are often clearly aligned with a certain stance (for or against) and/or with a certain line of argument. This might stem from idioms’ tendency to have strong evaluative (positive or negative) connotations.
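The detection procedure itself is not spelled out in this abstract; as a purely illustrative sketch, a dictionary-driven candidate finder might look like the following (the idiom inventory and matching strategy are stand-ins, not the actual ETS method, which would use a large idiom dictionary and handle inflection and gaps):

```python
import re

# Hypothetical mini-inventory of idioms for illustration only.
IDIOMS = [
    "a double-edged sword",
    "on the other hand",
    "food for thought",
]

def idiom_candidates(text):
    """Return (idiom, start_offset) pairs for idioms found in the text."""
    hits = []
    lowered = text.lower()
    for idiom in IDIOMS:
        for m in re.finditer(re.escape(idiom), lowered):
            hits.append((idiom, m.start()))
    return hits

essay = ("Technology is a double-edged sword; on the other hand, "
         "it gives us food for thought.")
print(idiom_candidates(essay))
```

Each hit would then be passed to manual annotation or downstream classification, as described in the abstract.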
Bios:
Beata Beigman Klebanov is a Senior Research Scientist in the Research and Development division at Educational Testing Service in Princeton, NJ. She received her Ph.D. in computer science in 2008 and her B.S. degree in computer science in 2000—both from The Hebrew University of Jerusalem. She received her M.S. degree (with distinction) in cognitive science from the University of Edinburgh in 2001. Before joining ETS, she was a post-doctoral fellow at the Northwestern Institute for Complex Systems and Kellogg School of Management where she researched computational approaches to political rhetoric. Her interests include discourse modeling, analyzing argumentative and figurative language, and automated semantic and pragmatic analysis of text. At ETS, her focus is on automatically scoring content in student writing. She researches methods to analyze cohesion in student essays, metaphor, topicality, and sentiment, among others.
Michael Flor is a Research Scientist in the Natural Language Processing group of the ETS R&D division. He earned his PhD in cognitive psychology with specialization in psycholinguistics, from Tel Aviv University, Israel. Michael has also worked as a computational linguist for start-up companies, developing natural-language processing algorithms. At ETS, Michael specializes in research and systems development for educational applications, focused on automatic processing of text data, combining statistical, linguistic and cognitive approaches.
**********************************************************************************
Andrew Rosenberg, IBM TJ Watson Research Center
December 8, 2016
Text-to-Speech Synthesis: State of the Art and Opportunities for Improvement.
This talk will describe the major components of IBM Watson’s Expressive Unit-Selection Speech Synthesis engine. I will highlight the linguistic, data, and technological requirements for building a state-of-the-art text-to-speech system and identify legacy approaches that are ripe for innovation.
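The engine's internals are proprietary, but the core of any unit-selection synthesizer is a search for the sequence of database units that minimizes combined target and join costs. A toy version of that dynamic-programming search (unit names and costs are invented for illustration; a real engine derives them from acoustic and linguistic features):

```python
# Toy unit-selection search: for each target phone, pick one candidate unit
# from the database so that total target cost + join cost is minimized.

def select_units(candidates, target_cost, join_cost):
    """candidates: list (one entry per target position) of lists of unit ids."""
    # best[i][u] = (cost, prev_unit) for choosing unit u at position i
    best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for u in candidates[i]:
            c, p = min(
                (best[i - 1][v][0] + join_cost(v, u) + target_cost(i, u), v)
                for v in candidates[i - 1]
            )
            layer[u] = (c, p)
        best.append(layer)
    # backtrack from the cheapest final unit
    u = min(best[-1], key=lambda x: best[-1][x][0])
    path = [u]
    for i in range(len(candidates) - 1, 1 - 1, -1):
        u = best[i][u][1]
        path.append(u)
    return path[::-1]

candidates = [["a1", "a2"], ["b1", "b2"]]
tc = {"a1": 0, "a2": 1, "b1": 1, "b2": 0}
join = lambda v, u: 0 if (v, u) == ("a1", "b2") else 2
print(select_units(candidates, lambda i, u: tc[u], join))  # ['a1', 'b2']
```

The "legacy approaches ripe for innovation" mentioned in the abstract largely concern how these cost functions are defined and learned, not the search itself.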
Bio:
Andrew Rosenberg is a Research Staff Member at IBM TJ Watson Research Center. He received a PhD from Columbia in 2009. His research is on speech synthesis and recognition and NLP, with a focus on how language production characteristics carry and convey information. This involves analyzing prosody (intonation) in speech, and typing behavior in text. From 2009-2016, he was a professor of computer science at CUNY Queens College. He directed the computational linguistics program at the CUNY Graduate Center from 2013-2016. He has authored over 50 papers. He is an NSF CAREER award winner. While a professor, his lab was supported by NSF, DARPA, IARPA, and AFOSR.
**********************************************************************************
Masayuki Gibson, Synfonica LLC
November 17, 2016
The interaction of tone and intonation: evidence for a sequential model of speech melody
Previous studies on intonation in tone languages tend either to be overly narrow, focusing on modeling F0 output for a single language—often just for a single speaker—without accounting for cross-speaker variation or cross-linguistic typological phenomena, or to over-generalize, characterizing prosodic features across languages in broad strokes and lumping together superficially similar features but glossing over language-specific and category-specific phonetic details. This talk presents and analyzes acoustic and perceptual data from Mandarin, Cantonese, North Kyeongsang Korean, and Kansai Japanese—part of a larger study that attempts to fill in the analytical gap left by those previous studies. Several examples of tone-dependent intonation implementation are highlighted and offered as support for a sequential model of speech melody, in which tone and intonation interact prior to phonetic implementation, as opposed to an overlay model, in which they are encoded independently and implemented at the phonetic level in parallel.
Bio:
Masayuki Gibson is a speech scientist at Synfonica LLC, a speech synthesis technology company. He holds B.A.s in Linguistics and Music, respectively, from Rutgers University and a Ph.D. in Linguistics from Cornell University. His theoretical research interests include tone, prosody, and the phonetics-phonology interface. At Synfonica, he works on research and development related to multiple speech synthesis projects, including the company’s knowledge-based synthesis rules and its speech therapy applications. Together with company president Dr. Susan Hertz, he is also working on an extensive set of educational materials for teaching about methods and models for speech research.
**********************************************************************************
Lucas Champollion, NYU
November 30, 2016
The interaction of compositional semantics and event semantics
The success of event-based semantic analyses in the wake of Davidson’s seminal 1967 paper has not yet been matched by a clear picture of how events fit into the syntax-semantics interface. It has been argued that compositional accounts of various scope-taking elements are either problematic or require nontrivial theoretical commitments once events are introduced (as suggested by Beaver and Condoravdi (2007) for quantification, Krifka (1989) for negation, and Lasersohn (1995, chapter 14) for conjunction).
I suggest a novel perspective on event semantics that overcomes these difficulties. The main innovation is that the event quantifier is part of the lexical entry of the verb. The resulting framework combines with standard treatments of scope-taking elements in a well-behaved way. It is compatible with simple and intuitive accounts of the syntax-semantics interface of quantification, negation and conjunction. This result is relevant to syntacticians and semanticists who are interested in the extent to which a commitment to events favors various analyses of scope-taking expressions, or who would simply like to use events without taking sides in ongoing semantic debates.
The talk will feature a demonstration of the lambda calculator (Version 2 is joint work with Dylan Bumford).
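As a rough illustration of the central move (not the paper's formal system), one can model verb denotations as functions that themselves existentially quantify over events, so that operators like negation naturally take scope over the event quantifier. The toy model and event domain below are invented for illustration:

```python
# Toy extensional model: events are dicts; the verb's lexical entry itself
# existentially quantifies over events.
EVENTS = [
    {"type": "laugh", "agent": "Mary"},
    {"type": "laugh", "agent": "John"},
]

# laugh: takes an event-predicate modifier f and an agent x, and asserts
# that some laughing event by x satisfies f (a simplification of the
# typed lambda terms in the paper).
def laugh(f, x):
    return any(e["type"] == "laugh" and e["agent"] == x and f(e) for e in EVENTS)

everything = lambda e: True   # vacuous modifier: plain "x laughed"
negation = lambda p: not p    # sentential negation composes outside

print(laugh(everything, "Mary"))            # Mary laughed -> True
print(negation(laugh(everything, "Sue")))   # Sue didn't laugh -> True
```

Because the event quantifier is packaged inside `laugh`, negation applied to the sentence denotation automatically outscopes it, which is the well-behaved interaction the abstract describes.
**********************************************************************************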
Eliecer Crespo-Fernández, University of Castilla-La Mancha, Spain
March 23, 2016
Euphemism in US Local and State Politics. An Example from New Jersey
Political language is consciously constructed with particular goals in mind. Finding the right words to address potential voters is key for political actors, who seek to project a positive image not only of themselves but also of the parties they represent. As it is the convention in politics to appear sensitive to people’s worries and sensitivities, politicians try hard to avoid or attenuate words which may sound unpleasant, cause discomfort, and hence put their public images at risk. To this end, they resort to a wide range of euphemistic strategies when dealing with delicate or embarrassing issues.
Following a critical discourse approach to political language, the goal of this talk is to gain an insight into the strategic functions that euphemism performs in the discourse of local and state politicians from New Jersey. The analysis is based on a sample of their public comments excerpted from New Jersey’s largest newspaper, The Star-Ledger.
The results reveal that New Jersey local and state legislators employ euphemism, as part of a major strategy of positive self-presentation, for a variety of purposes: first, to refer to delicate topics without sounding inconsiderate to socially disadvantaged groups; second, to provide a safe ground for verbal attack and criticism; and third, to deliberately conceal controversial topics from the public.
Bio:
Eliecer Crespo-Fernández, Department of Modern Languages, University of Castilla-La Mancha, Spain. His research interests focus on the lexical, semantic and pragmatic dimensions of euphemism and dysphemism, which he has approached from different frameworks such as (critical) discourse analysis, applied cognitive semantics and critical metaphor theory. He has recently focused on euphemism in political language, with special attention to the role of metaphor, and on the axiological potential of anglicisms in discourse.
He has authored four books: El eufemismo y el disfemismo (2007); El lenguaje de los epitafios (2014); Sex in Language. Euphemistic and Dysphemistic Metaphors in Internet Forums (2015); and Describing English. A Practical Grammar Course (in press). He has also edited the special issue entitled Current Trends in Persuasive Discourse (2009) and co-edited the collective volume Euphemism as a Word Formation Process (2012). He has published a number of book chapters and research articles in major journals such as Text&Talk, Spanish in Context, Bulletin of Hispanic Studies and Review of Cognitive Linguistics.
**********************************************************************************
Dr. Iryna Dilay, Ivan Franko Lviv National University, Ukraine
February 24, 2016
Cognitive verbs in English: an attempt at a comprehensive lexicosemantic study
The presentation is concerned with an attempt at a comprehensive study of paradigmatic, syntagmatic and motivational properties of English cognitive verbs as a lexicosemantic group within the semantic field of cognition. I am going to start with the inventory procedure, followed by the analysis of semantic relations of the verbs, focusing on both semasiological and onomasiological vantage points of the research. Closely related to the paradigmatic semantic properties are syntagmatic properties of the cognitive verbs manifested in the peculiarities of their valence, collocations and frame structure. I will discuss the deep structure relations of the cognitive verbs underlying their prevalent surface structure S + V + O ± D. Then, based on the corpus-driven evidence, the subliminal meaning of cognitive verb collocations will be subject to analysis. Finally, the motivational relations of the cognitive verbs as the third dimension of the lexical system will be addressed in terms of semantic change and cognitive mappings. The suggested approach reveals the principal tendencies underlying the development of the group of English cognitive verbs as a non-discrete dynamic system reflecting the generative potential of the mental lexicon.
Bio:
Dr. Dilay is an Associate Professor in the English Department at Ivan Franko Lviv National University in Ukraine and a Fulbright visiting researcher in the Linguistics Department at Montclair State University (September 2015 – June 2016). Her research focuses on corpus-based study of English verbs. Iryna Dilay’s scholarly interests encompass corpus linguistics, semantics and syntax of verbs, pragmatics, cognitive linguistics, computational linguistics, and Natural Language Processing.
**********************************************************************************
Dr. Byron Ahn, Swarthmore College
November 20, 2016
Focusing on Reflexives
In this talk, I analyze novel data in which focus accents in English occasionally occur in unexpected places.
Compare the placement of focus in the two question/answer pairs below.
(1) Q: Who embarrassed Jenna? (Agent Question)
A: DANNY embarrassed Jenna. (Sole Narrow Focus on Agent)
(2) Q: Who embarrassed Jenna? (Agent Question)
A: Jenna embarrassed HERSELF. (Sole Narrow Focus on Reflexive)
This pattern in (2), in which a reflexive anaphor bears the focus accent, is striking: WH questions about the *agent* typically require a focus accent on the *agent* in the answer (e.g. Halliday 1967, Krifka 2004, among many others). Critically, this pattern only arises in certain syntactic contexts. Exploration of where it is (un)available indicates that syntactic derivations must directly influence prosodic representations.
This research leads to three important conclusions. First, there are at least two types of reflexives in English, though they appear morphologically identical. Second, English reflexivity involves hidden structures that resemble more obvious structures in other languages. Third, and most broadly, the distribution of focal stress (and prosody in general) can be used by theoreticians, learners, and hearers as cues for abstract syntactic structures.
Bio:
Dr. Ahn recently completed his PhD at UCLA (Jan 2015), and has been a visiting assistant professor at Boston University (2014-2015) and Swarthmore College (2015-present). In general, his research program is aimed at reducing the amount of theoretical machinery involved in our model of Language, while expanding its empirical base. Specifically, he is most interested in the syntactic nature of predicates and their arguments, and explores it in English with syntactic and prosodic tools. He’s worked on a range of topics touching on syntax and prosody: the syntax of phrasal stress, reflexive anaphora, emphatic reflexives, grammatical voice, the nature of the syntax-phonology interface, nominal structure in Tongan, intonational contours in yes/no questions of English, tough constructions in English, and Japanese case marking.
**********************************************************************************
Adam Meyers, New York University
April 29, 2015
Escaping Verb-Centrism through NomLex and NomBank: Missing Links in Predicate Argument Structure
**********************************************************************************
Mats Rooth, Cornell University
February 25, 2015
Headed Span Theory in the Finite State Calculus
Headed span theory in phonology is an account of the phonological substance that represents an autosegment such as a nasality or ATR feature as a labeled interval in a line, rather than as a vertex in a graph. The intervals (or spans) have distinguished head positions. Span theory is attractive in computational approaches to phonology that work with finite state sets of strings and finite state relations between strings, because of the possibility of straightforward string encodings of the phonological representations. This talk takes up the problem of working out a detailed, computationally executable construction of span theory in a finite state calculus. This includes a construction of the constraint families of headed span theory as operators. For instance, the headed faithfulness constraint FthHdSp(F,X) penalizes underlying segments with value X for the feature F which do not head an [F,X] span on the surface. The family is represented as an operator that constructs, from a feature and a value, a finite state relation that inserts violation marks. The constraint is directly used in the finite state calculus to optimize a candidate set. At the end, I sketch a proposal for representing transparent segments in harmony using embedded exception spans.
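To make the violation-marking idea concrete, here is a small Python analogue (not the finite-state implementation itself) of counting FthHdSp(Nasal,+) violation marks over aligned underlying/surface candidate pairs. The encoding of spans as position-aligned tuples is an assumption for illustration:

```python
# Underlying form: list of (segment, has_nasal_feature).
# Surface form:    list of (segment, span_id or None, is_head), aligned 1:1.
# FthHdSp(Nasal,+) is violated by each underlying [+nasal] segment whose
# surface correspondent does not head a nasal span.

def fth_hd_sp_violations(underlying, surface):
    marks = 0
    for (_, u_nasal), (_, span, head) in zip(underlying, surface):
        if u_nasal and not (span is not None and head):
            marks += 1
    return marks

underlying = [("b", False), ("a", False), ("n", True), ("a", False)]
# candidate 1: nasal span covers "ana", headed by the n
surface_ok = [("b", None, False), ("a", 1, False), ("n", 1, True), ("a", 1, False)]
# candidate 2: no nasal span at all
surface_bad = [("b", None, False), ("a", None, False), ("n", None, False), ("a", None, False)]
print(fth_hd_sp_violations(underlying, surface_ok))   # 0
print(fth_hd_sp_violations(underlying, surface_bad))  # 1
```

In the finite state calculus, the same counting is realized by a relation that inserts violation marks into candidate strings, which an optimization operator then uses to filter the candidate set.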
Bio:
Mats Rooth is a professor at Cornell University, in the departments of Linguistics and Computing & Information Science. He does research in two areas: computational linguistics and natural language semantics. He has worked extensively on mixed symbolic/probabilistic models of syntax and the lexicon, on contrastive intonation (what is called focus), and on related phenomena such as ellipsis and presupposition. In addition to these, he is currently working on finite state optimality theory and web harvesting of intonational data. http://conf.ling.cornell.edu/mr249/
**********************************************************************************
Jiwon Yun, Stony Brook University
January 21, 2015
The deterministic prosody of wh-indeterminates
In a number of languages, wh-words are ambiguous between interrogatives and indefinites (e.g. nuku ‘who/someone’ in Korean). This talk concerns how the ambiguity of such ‘wh-indeterminates’ is resolved by prosody, focusing on the case of Korean. Contrary to previous impressionistic observations that indefinites are distinguished from their interrogative counterparts by lack of phonological prominence, my experimental results indicate that it is phonological phrasing that plays a crucial role in disambiguating wh-indeterminates. The results further show that the role of phonological prominence is rather to force a semantically wide scope interpretation.
**********************************************************************************
Seongyeon Ko, CUNY
January 21, 2015
Debunking the vowel shift hypotheses in Mongolic and Korean
The goal of this talk is to reject the two famous vowel shift hypotheses in the so-called “Altaic” linguistics, which hold in their core that the vowel harmony systems of the oldest attested Mongolic and Korean language (= Old Mongolian and Middle Korean) are based on the “palatal” contrast (front vs. back vowels) and have developed into the modern systems through sequential chain shifts of vowel qualities. Instead, it is argued based on the comparative method, the typology of vowel shifts, and the phonetics of vowel features that both Old Mongolian and Middle Korean vowel harmonies can be best characterized as those based on the feature [Retracted Tongue Root]. It follows then that there were no vowel shifts in the vocalic history of Mongolic and Korean as previously claimed.
**********************************************************************************
Natasha Abner
March 18, 2015
Morphology in Child Homesign: Evidence from Number Marking
Homesigners are deaf individuals who are not exposed to conventional sign languages and create sign systems ‘from scratch’ to communicate with the people around them. This situation provides an opportunity to study the role of language input in language development by investigating the language-like properties that do and do not emerge in homesign. In this research, we investigate the innovation of number language by child homesigners. We show that child homesigners have distinct gestural devices for expressing information about number and that they combine these devices with both deictic (pointing) and iconic gestures. Analysis of the homesigners’ form-based gesture classes also reveals that they exhibit systematic form-meaning mappings characteristic of a morphological system. We also compare number language in child homesign to number expressions in mature (sign) languages.
**********************************************************************************
Jing Peng, Computer Science, Montclair State University
Anna Feldman, Linguistics & Computer Science, Montclair State University
Ekaterina Vylomova, Computer Science, Bauman Moscow State Technical University & Linguistics, Montclair State University
October 8, 2014
Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions
We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are high-ranking representatives of a common topic of discussion are less likely to be part of an idiomatic expression. Our additional hypothesis is that the contexts in which idioms occur are typically more affective; therefore, we incorporate a simple analysis of the intensity of emotions expressed by contexts. We investigate the bag-of-words topic representation of one to three paragraphs containing an expression that should be classified as idiomatic or literal (a target phrase). We extract topics from paragraphs containing idioms and from paragraphs containing literals using an unsupervised clustering method, Latent Dirichlet Allocation (LDA) (Blei et al., 2003). Since idiomatic expressions exhibit the property of non-compositionality, we assume that they usually present different semantics than the words used in the local topic. We treat idioms as semantic outliers, and the identification of a semantic shift as outlier detection. Thus, this topic representation allows us to differentiate idioms from literals using local semantic contexts. Our results are encouraging.
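The actual system uses LDA topics plus emotion-intensity features; as a greatly simplified stand-in, one can approximate the "local topic" with frequent content words and score a target phrase by how detached it is from that topic. The stopword list, paragraph sizes, and threshold behavior below are illustrative only:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to",
             "was", "he", "she", "it", "his", "her", "from"}

def topic_words(paragraph, k=5):
    """Crude stand-in for an LDA topic: the k most frequent content words."""
    words = [w.strip(".,;!?").lower() for w in paragraph.split()]
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {w for w, _ in counts.most_common(k)}

def outlier_score(target_phrase, paragraph):
    """Fraction of the phrase's content words outside the local topic.
    A high score marks the phrase as semantically detached: an idiom candidate."""
    topic = topic_words(paragraph)
    content = [w.lower() for w in target_phrase.split() if w.lower() not in STOPWORDS]
    if not content:
        return 0.0
    return sum(w not in topic for w in content) / len(content)

literal_para = ("The bucket fell. He kicked the bucket across the floor. "
                "Water spilled from the bucket.")
idiom_para = ("Grandpa was very ill. The illness got worse and worse every day. "
              "Doctors came and doctors went. Last winter grandpa finally kicked the bucket.")
print(outlier_score("kicked the bucket", literal_para))  # low: literal use
print(outlier_score("kicked the bucket", idiom_para))    # high: idiom candidate
```

In the published method, the topic model is learned over many paragraphs rather than one, and the emotion-intensity analysis supplies a second, independent signal.
**********************************************************************************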
Keelan Evanini, ETS
April 11, 2014
Using Pitch Contours to Improve Automated Spoken Language Proficiency Assessment
An utterance’s prosodic cues (such as intonation and stress) are important for successful communication, but many non-native speakers have difficulty mastering these aspects of a second language. In particular, pitch contours can be difficult for non-native speakers of English, since a variety of contours, each associated with different meanings, can be appropriate for a given utterance. This study focuses on the assessment of pitch contours in the context of an automated spoken language proficiency assessment eliciting read speech. Various ways of representing a speaker’s pitch contour will be presented, along with several methods of comparing a given contour to a model of native speaker pitch contours. Additionally, linguistic information is used to select subsets of the words in the reading passage that are important for the utterance’s prosody in order to improve the performance of the features. Results show that the inclusion of the proposed features based on a speaker’s pitch contour improves the performance of an automated spoken language assessment system.
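The talk's specific features are not detailed in this abstract, but one standard way to compare a learner's contour to a native model is a dynamic-time-warping (DTW) distance over normalized F0 values; a minimal sketch, with invented contour values:

```python
def znorm(contour):
    """Z-score normalize an F0 contour to abstract away from pitch range."""
    m = sum(contour) / len(contour)
    sd = (sum((x - m) ** 2 for x in contour) / len(contour)) ** 0.5 or 1.0
    return [(x - m) / sd for x in contour]

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two contours."""
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

native = znorm([120, 140, 180, 160, 130])    # model rise-fall contour
learner = znorm([200, 230, 290, 265, 215])   # same shape, higher pitch range
flat = znorm([150, 150, 150, 150, 150])      # flat contour
print(dtw_distance(native, learner) < dtw_distance(native, flat))  # True
```

Normalization removes speaker-specific pitch range, so the learner's matching rise-fall shape scores closer to the native model than the flat rendition does.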
Bio:
Keelan Evanini is a Managing Research Scientist in the NLP & Speech group in the Research & Development division at Educational Testing Service. His research focuses on developing methods to automatically assess various linguistic aspects of non-native speaking proficiency, including pronunciation, intonation, fluency, and content appropriateness. He has also conducted research into improving other components of a system for automated non-native spoken language assessment, such as the detection of plagiarized spoken responses and optimizing ASR performance for non-native speech. He received a B.A. in Linguistics from the University of California, Berkeley and a Ph.D. in Linguistics from the University of Pennsylvania.
**********************************************************************************
Mark Hubey, Computer Science, Montclair State University
November 22, 2013
Dissimilarity/Distance, Correlation, and All that
**********************************************************************************
Emily Hill, Computer Science, Montclair State University
November 15, 2013
Evaluating Feature Location Techniques for Software Maintenance
Today’s software is large and complex, with systems consisting of millions of lines of code. Developers new to a software project face significant challenges in locating the code relevant to their maintenance tasks, such as fixing bugs or adding new features, a problem called feature location. Developers can simply be assigned a bug and told to fix it—even when they have no idea where to begin. In fact, research has shown that a developer typically spends more time locating and understanding code during maintenance than modifying it. We can significantly reduce the cost of software maintenance by reducing the time and effort needed to find and understand the code relevant to a software maintenance task.
In this talk, we will explore state of the art approaches to feature location for software maintenance, how they are currently evaluated, and whether elements of a feature can be generically categorized.
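A common baseline in the feature-location literature is information retrieval over identifier and comment text; a minimal TF-IDF ranking of methods against a bug-report query (a generic baseline sketch, not any specific tool from the talk; the method names and text are invented):

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def rank_methods(methods, query):
    """methods: {name: identifier/comment text}.
    Returns method names ranked by a simple TF-IDF score against the query."""
    docs = {name: Counter(tokenize(text)) for name, text in methods.items()}
    n = len(docs)
    df = Counter()
    for c in docs.values():
        df.update(set(c))
    def score(c):
        return sum(c[t] * math.log(n / df[t]) for t in tokenize(query) if t in c)
    return sorted(docs, key=lambda name: score(docs[name]), reverse=True)

methods = {
    "save_file": "save file write disk path",
    "print_page": "print page render printer",
    "open_file": "open file read disk dialog",
}
print(rank_methods(methods, "write file to disk"))  # ['save_file', 'open_file', 'print_page']
```

Real feature-location tools split camelCase identifiers, apply stemming, and often combine textual scores with structural or dynamic information, which is where the evaluation questions in the talk arise.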
Bio:
Emily Hill is an Assistant Professor at Montclair State University. Her primary research interests are in software engineering, specifically in reducing software maintenance costs by building intuitive software engineering and program comprehension tools. Her research is inter-disciplinary and combines aspects of software engineering, program analysis, natural language processing, computational linguistics, information retrieval, text mining, and machine learning. You can find out more on her website: http://msuweb.montclair.edu/~hillem/.
**********************************************************************************
Marina Kunshchikova, Department of German Linguistics, The Ural State University, Yekaterinburg, Russia
October 25, 2013
Classroom bilingualism and second language wordplay
Bilingualism, understood as the interaction of two linguistic worlds, is a well-known phenomenon in second language acquisition (SLA) that has attracted great academic interest across many fields of study. Classroom bilingualism (CB) is a relatively new term that has its own place in the theory of SLA and bilingualism, and it is studied by several linguistic disciplines, including general linguistics, psycholinguistics, cognitive linguistics and sociolinguistics. We analyze the second-language speech of students with a high level of English proficiency (bilinguals). Our research deals with the non-standard (creative) linguistic features of classroom bilingual speech. The material of the research is drawn from students’ written English in situations of classroom English–Russian bilingualism (80 essays, or 600,000 printed characters) and from transcripts of oral English classes (60 academic hours, or 500,000 printed characters). As a result, examples of wordplay were found at all levels of the language system (the graphical, morphological, lexical and syntactic levels). These newly found second-language coinages help to develop the theory of language creativity and wordplay and to contribute to the theory of interlanguage.
Bio:
Marina Kunshchikova is a postgraduate student at the Ural Federal University and an instructor of English in the German philology department. She’s currently a Fulbright visiting researcher at Montclair State University. Her research interests are second language acquisition, psycholinguistics, bilingualism and language creativity as well as corpus linguistics.
**********************************************************************************
Xiaofei Lu, Associate Professor, Department of Applied Linguistics, The Pennsylvania State University
September 27, 2013
A historical analysis of text complexity of the American reading curriculum
The widely adopted Common Core State Standards (CCSS) call for raising the level of text complexity in textbooks used by American students across all grade levels; the authors of the English language arts component of the CCSS build their case for higher complexity in part upon a research base they say shows a steady decline in the difficulty of student reading textbooks over the past half century. In this interdisciplinary study, we offer our own independent analysis of third and sixth grade reading textbooks used throughout the past 115 years. Our dataset consists of 8,041 reading texts selected from 117 textbook series issued by 30 publishers, resulting in a corpus of roughly 10 million words. Each reading text in the corpus was assessed using a large set of readability, lexical complexity, and syntactic complexity measures. Contrary to previous reports, we find that text complexity has either risen or stabilized over the past half century; these findings have significant implications for the justification of the CCSS as well as for our understanding of a “decline” within American schooling more generally.
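The study's measure battery is far larger than any one formula, but a classic example of the readability measures it draws on is the Flesch-Kincaid grade level, computable from word, sentence, and (approximate) syllable counts. The vowel-group syllable counter below is a crude approximation, and the sample texts are invented:

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Flesch-Kincaid grade level:
    0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

simple = "The cat sat. The dog ran. We had fun."
harder = ("Contemporary educational researchers systematically investigate "
          "increasingly sophisticated theoretical frameworks.")
print(fk_grade(simple) < fk_grade(harder))  # True
```

Applying a battery of such measures to 8,041 texts across 115 years is what lets the authors track complexity trends, rather than judge individual passages.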
Bio:
Xiaofei Lu is Gil Watz Early Career Professor in Language and Linguistics and Associate Professor of Applied Linguistics at The Pennsylvania State University. His research interests are primarily in corpus linguistics, computational linguistics, and intelligent computer-assisted language learning.
**********************************************************************************
Matt Mulholland, Montclair State University
Joanne Quinn, Montclair State University
September 25, 2013
Suicidal Tendencies: The Automatic Classification of Suicidal and Non-Suicidal Lyricists Using NLP
**********************************************************************************
Lisa Radding, Ethnic Technologies, LLC, S. Hackensack, NJ
May 3, 2013
Applied Linguistics: Onomastics and Direct Marketing
I am an Onomastician. But to what end? Direct Marketing. More specifically, I facilitate target marketing by ethnicity, religion, language preference, etc. Companies can increase their ROI by creating specialized marketing campaigns, meant to engage specific clientele. To do this, they require predictive consumer intelligence that doesn’t encroach on an individual’s privacy. But basic information about an individual exists within the person’s name. I design a software product that uses onomastic research at its core to enable target marketing.
Linguists develop logical thinking and analytical skills, and gain experience organizing data sets and testing the validity of hypothesized general rules. This valuable skill set can be applied to multiple passions, in a variety of industries. In my case study, the passion is Onomastics and the industry is Direct Marketing.
Bio:
Lisa Radding is the Director of Research at Ethnic Technologies, LLC. In this position, she envisions, researches, and writes methodology enhancements for E-Tech, the industry-leading product in multicultural marketing. As a linguist, but specifically an onomastician (an expert in the study of proper names), she maintains and improves the ethnic name research at the core of E-Tech. Additionally, she is instrumental in the development of sister products under the E-Tech brand that enable added database segmentation particularly focused on the Hispanic, Asian, and African American markets. Ms. Radding has published work in onomastics in the Geographical Review, and has presented her work in this academic field, and in the context of Direct Marketing. Additionally, she currently serves on the Executive Council of the American Name Society.
**********************************************************************************
Michael Flor, Educational Testing Service, Princeton, NJ
April 19, 2013
ConSpel: Automatic spelling correction and the power of context.
Single-token non-word misspellings are the most common type of misspelling in student essays. This talk presents an investigation into using four different types of contextual information to improve the accuracy of automatic correction of such errors. The presentation has three parts. In part one, I describe the methodology and tools we used for annotating misspellings in a corpus of 3,000 student essays written by native and non-native speakers of English in response to the writing prompts of the TOEFL® and GRE® tests. Part two presents the principles and architecture of a new spell-checking system (ConSpel) that utilizes contextual information for automatic correction of non-word misspellings. The task is framed as contextually-informed re-ranking of correction candidates. I will also briefly touch on technical innovations that make this system possible. Part three evaluates the effectiveness of the four types of contextual information on the annotated corpus of essays. Using context-informed re-ranking of candidate suggestions, the ConSpel system exhibits very strong error-correction results. It also corrects errors generated by non-native English writers with almost the same rate of success as for native English speakers.
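ConSpel's actual features and models are not detailed in this abstract; as a minimal illustration of context-informed re-ranking, the sketch below generates edit-distance-1 candidates from a toy lexicon and ranks them by left-bigram counts (the lexicon and counts are invented stand-ins for a large n-gram corpus):

```python
from collections import Counter

LEXICON = {"form", "from", "farm", "fort"}
# Toy bigram counts standing in for corpus statistics.
BIGRAMS = Counter({("came", "from"): 50, ("the", "form"): 30, ("the", "farm"): 20})

def edits1(word):
    """All strings one edit away (deletes, transposes, replaces, inserts)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(misspelling, left_word):
    """Rank in-lexicon candidates by how often they follow the left context."""
    candidates = edits1(misspelling) & LEXICON
    if not candidates:
        return misspelling
    return max(candidates, key=lambda c: BIGRAMS[(left_word, c)])

print(correct("frm", "came"))  # from
print(correct("frm", "the"))   # form
```

The same misspelling gets different corrections depending on context, which is the effect the contextual re-ranking in ConSpel exploits at much larger scale.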
Bio:
Michael Flor is an associate research scientist in the NLP & Speech group, at the R&D Division of Educational Testing Service (ETS). He earned his PhD in cognitive psychology, with a specialization in psycholinguistics, from Tel Aviv University, Israel. Michael has also worked as a computational linguist for start-up companies, developing natural-language processing algorithms for content-personalization and search-engine applications. At ETS, Michael specializes in NLP research and systems development, focused on automatic processing of text data, combining statistical, linguistic and cognitive approaches.
**********************************************************************************
Barbara Landau, Johns Hopkins University
March 22, 2013
Origins and development of spatial language: Some complexities
The acquisition of spatial language has historically provided an important test-bed for theories of the relationship between language and non-linguistic representation. According to one hypothesis, spatial language emerges from a foundation of non-linguistic spatial concepts that are present pre-linguistically; in this view, the child selects from a prior set of spatial concepts on the basis of linguistic input, electing only a subset of the spatial distinctions that are universally present in his non-linguistic repertoire. These basic distinctions are shown in the child’s earliest spoken language, and correspond to concepts such as containment and support. By contrast, a second hypothesis states that spatial language emerges strictly as a function of linguistic input; in this view, the child creates new spatial concepts (not previously available) on the basis of this linguistic input. In this talk, I consider several challenges to both of these hypotheses. Chief among these challenges is the role played by the combinatorics inherent in spatial language, and hence the meanings that can be expressed across languages. Acknowledging the complexities of the mapping between spatial language and spatial concepts forces us to abandon simplistic hypotheses and begin to think about learning in new and more subtle ways.
About Barbara Landau:
http://en.wikipedia.org/wiki/Barbara_Landau
http://cogsci.jhu.edu/people/landau.html
**********************************************************************************
Smaranda Muresan, Rutgers University
March 1, 2013
Computational Models for Context-dependent Deep Language Understanding
Within the last decade, machine learning techniques have significantly advanced the field of natural language processing. However, most state-of-the-art machine learning models suffer from a fundamental limitation: they are based on finite-state or context-free formalisms, which are inadequate for capturing the whole range of human language. Without richer, linguistically-inspired formalisms, these models may soon reach a plateau for studies and applications in which deep language processing is needed. In this talk I will introduce a grammar formalism for deep linguistic processing — Lexicalized Well-Founded Grammar (LWFG) — that allows context-dependent interpretation of utterances and at the same time is learnable from data. Once a grammar is learned, an LWFG parser and semantic/pragmatic interpreter map text to its underlying meaning representation. I will discuss three key features of my framework for deep language understanding: 1) a rich but learnable grammar formalism; 2) a model that can learn complex representations from a small amount of data; and 3) context modeling. I will discuss how this framework can be used for learning consumer-health terminologies from text. If time allows, I will talk about our new NSF-funded project on developing technologies for teaching computer agents to follow instructions given in natural language, in order to allow them to learn how to carry out complex tasks on behalf of a user (project in collaboration with Michael Littman and Marie desJardins).
Bio:
Smaranda Muresan is an assistant professor in the Library and Information Science Department, School of Communication and Information at Rutgers University. She is the co-director of the Laboratory for the Study of Applied Language Technologies and Society, and a graduate faculty in the department of Computer Science. She received her PhD in Computer Science from Columbia University in 2006. Before coming to Rutgers she was a Postdoctoral Research Associate at the Institute for Advanced Computer Studies at the University of Maryland. Her research focuses on computational models for language understanding and learning, with applications to health informatics, human-computer instruction, and computational social science. Her research is funded primarily by NSF and DARPA.
**********************************************************************************
Nitin Madnani, Educational Testing Service, Princeton, NJ
February 22, 2013
What Test Takers Say: Analyzing Argument Organization and Topical Trends in Essays
In this talk, I will present two strands of natural language processing research at ETS that were designed to help us understand the nature of test-taker writing in essays.
In the first, I will talk about our research on test-taker responses to argument-driven prompts, which contain not only language expressing claims and evidence, but also language used to organize those claims and pieces of evidence. Differentiating between the two may be useful for many applications, and I will discuss our automated approach to detecting such high-level organizational elements in argumentative discourse.
In the second part of the talk, I will discuss our research on test takers' responses to more generic prompts about social issues. Without an understanding of the trends reflected in these responses, automated scoring systems may not be reliable and may also worsen over time. Our preliminary approach analyzes topical trends in test takers' responses and correlates these trends with those found in the news. We find evidence that many trends are similar across essays and the news but also observe some interesting differences.
Bio: Nitin Madnani is currently a Research Scientist with the Text, Language and Computation group at the Educational Testing Service in Princeton, NJ. He received his PhD in Computer Science in 2010 from the University of Maryland, College Park, where he worked with Bonnie Dorr and Philip Resnik on a number of NLP topics but focused on Statistical Machine Translation and Automatic Paraphrase Generation. At ETS, he spends his time building paraphrase models for use in automated test scoring, improving grammatical error correction, analyzing sentiment in essays, and being the resident Python and information visualization geek. More details about his work can be found at http://www.desilinguist.org
**********************************************************************************
Serguei Pakhomov, College of Pharmacy, University of Minnesota
November 16, 2012
Computerized Assessment of Spoken Language for Pharmacodynamic Analysis
In this talk, I will introduce a new area of research that applies computational linguistic methods and tools to the assessment of adverse effects of neuroactive medications. Some anti-epileptic medications have been reported to cause word finding difficulties in a subset of people who take these medications. These word finding problems are currently not very well defined and are difficult to quantify and measure precisely. Furthermore, the underlying brain mechanism(s) that are responsible for these deficits are currently not known. One of the main objectives of our interdisciplinary group at the University of Minnesota Center for Clinical and Cognitive Neuropharmacology (C3N) is to develop methods for more precise and reproducible measurement of these deficits towards better characterization of their behavioral manifestations and the underlying mechanism(s). I will describe a range of computerized instruments that we have developed and that are designed to extract speech and language characteristics from spontaneous speech. In particular, I will present the results of a study of 20 volunteers that were randomized to receive an anti-epileptic medication (topiramate), an anxiolytic medication (lorazepam), or placebo. In one of the cognitive assessment tasks, the subjects were asked to describe a picture. Their responses were audio recorded and subsequently examined in a semi-automated fashion to measure speech fluency. Our findings so far indicate that speech fluency characteristics including duration of silent pauses and the rate of disfluent speech events (um’s and ah’s, word fragments and repetitions) are sensitive to the effects of topiramate and constitute a promising direction for further research.
Bio:
Dr. Pakhomov currently is an Associate Professor at the University of Minnesota College of Pharmacy. He is a co-founder of the University of Minnesota Center for Clinical and Cognitive Neuropharmacology, a member of the Center for Cognitive Sciences and an affiliate member of the Institute for Health Informatics at the University of Minnesota. Dr. Pakhomov earned a Doctorate degree in Linguistics with a Cognitive Science minor from the University of Minnesota in 2001. Prior to his academic appointment at the University of Minnesota, he worked as a research scientist in several commercial and academic organizations including the Mayo Clinic, Lernout and Hauspie, Inc. and Linguistic Technologies, Inc. Dr. Pakhomov’s primary research interest is in applying computational methods to analyze and quantify speech and language characteristics affected by neurodegenerative disorders and neuroactive medications.
**********************************************************************************
Jing Peng, Computer Science
Anna Feldman, Linguistics & Computer Science
October 12, 2012
Identifying Figurative Language in Text: First Results
In our talk we will discuss several experiments whose goal is to automatically identify figurative language in text. While our approach need not be limited to a specific type of figurative language, we concentrate on idioms (and, to some extent, metaphors). We explore several hypotheses: 1) the problem of automatic idiom detection can be reduced to the problem of identifying an outlier in a dataset; 2) instead of extracting multiword expressions and then determining which of them belong to the idiomatic class, we can view idiom detection as a binary classification of sentences. We apply principal component analysis (PCA) (Jolliffe 1986; Shyu et al. 2003) for outlier detection. Detecting idioms as lexical outliers does not exploit class-label information, so in the following experiments we use linear discriminant analysis (LDA) (Fukunaga 1990) to obtain a discriminant subspace and then apply the three-nearest-neighbor (3NN) classifier to measure accuracy. We discuss the pros and cons of each approach. All the approaches are more general than previous algorithms for idiom detection: they neither rely on target idiom types, lexicons, or large manually annotated corpora, nor limit the search space to a particular type of linguistic construction.
**********************************************************************************
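The second framing, binary classification of sentences decided by a three-nearest-neighbor vote, can be sketched minimally. The 2-D feature vectors below are invented for illustration; in the actual experiments, features are first projected into an LDA discriminant subspace before the 3NN step.

```python
# Minimal sketch of idiom detection as binary classification of
# sentences, decided by a 3-nearest-neighbor majority vote.
# The feature vectors are toy placeholders, not real sentence features.

from collections import Counter

def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest
    labeled training vectors."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# (feature_vector, label): 1 = sentence contains an idiom, 0 = literal
train = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.7), 1),
         ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.15, 0.25), 0)]

print(knn_predict(train, (0.75, 0.8)))  # 1: votes with the idiomatic cluster
print(knn_predict(train, (0.1, 0.15)))  # 0: votes with the literal cluster
```

The point of the LDA projection in the talk is precisely to put sentences into a space where such a simple distance-based vote separates the two classes well.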
Alice Freed, Professor of Linguistics, Emerita
April 20, 2012
Language and gender in the public eye: Still “different” after all these years.
This talk is a preliminary review and update of the evidence and arguments I compiled about 10 years ago for a chapter in The Handbook on Language and Gender (Freed 2003). In the chapter, I argued that after 30 years of research on language, sex and gender – now nearly 40 – a significant discrepancy existed between public perceptions of how women and men speak (and how they are expected to speak) and the actual character of the language that people use. The persistence of this contradiction seemed to underscore the vitality of well-entrenched stereotypes about sex and gender and the weight and influence of societal efforts to maintain the impression of difference between women and men. What has changed is the type of evidence being presented today in support of these so-called differences: brain research is increasingly used to support claims of male-female language difference.
In 2003, I explained that despite the vast quantities of naturally occurring speech samples from a wide range of contexts that sociolinguists, linguistic anthropologists, and other scholars had analyzed – from the amount of talk, to the structure of narratives, the use of questions, to the availability of cooperative and competitive speech styles – no consistent pattern had been found between either sex or gender and the characteristics of the way we use language. Yet, despite the sheer volume of our research results, the public representation of the way women and men speak was, and I will argue still is, almost identical to the characterization provided in the middle of the last century.
Sources of evidence about public views of language difference will come mainly from (1) a review of several articles that have recently appeared in the popular press/mass media and (2) a preliminary analysis using several on-line language corpora and library databases, searching for occurrences of “gender difference” in media sources for the years 2000-2010.
Freed, Alice F. 2003. “Epilogue: Reflections on Language and Gender Research.” In The Handbook on Language and Gender. Janet Holmes and Miriam Meyerhoff, (Eds.) Oxford: Blackwell Publishers. Pp. 699-721.
**********************************************************************************
Joel Tetreault
April 11, 2012
A New Twist on Methodologies for ESL Grammatical Error Detection
The long-term goal of our work is to develop a system which detects errors in grammar and usage so that appropriate feedback can be given to non-native English writers, a large and growing segment of the world’s population. Estimates are that in China alone as many as 300 million people are currently studying English as a second language (ESL). In particular, usage errors involving prepositions are among the most common types seen in the writing of non-native English speakers. For example, Izumi et al., (2003) reported error rates for English prepositions that were as high as 10% in a Japanese learner corpus.
Since prepositions are such a nettlesome problem for ESL writers, developing an NLP application that can reliably detect these types of errors will provide an invaluable learning resource to ESL students. In this talk we first review one popular machine learning methodology for detecting preposition and article errors in texts written by ESL writers. Next, we describe a novel approach to ESL grammatical error detection: using round-trip machine translation to automatically correct errors.
This is joint work with Nitin Madnani (ETS) and Martin Chodorow (CUNY).
Bio:
Joel Tetreault is a Managing Research Scientist specializing in Computational Linguistics in the Research & Development Division at Educational Testing Service in Princeton, NJ. His research focus is Natural Language Processing with specific interests in anaphora, dialogue and discourse processing, machine learning, and applying these techniques to the analysis of English language learning and automated essay scoring. Currently he is working on automated methods for detecting grammatical errors by non-native speakers, plagiarism detection, and content scoring methods. Previously, he was a postdoctoral research scientist at the University of Pittsburgh’s Learning Research and Development Center (2004-2007). There he worked on developing spoken dialogue tutoring systems. Tetreault received his B.A. in Computer Science from Harvard University (1998) and his M.S. and Ph.D. in Computer Science from the University of Rochester (2004).
**********************************************************************************
Matt Huenerfauth, Associate Professor, City University of New York (CUNY)
January 31, 2012
Learning to Generate Understandable Animations of American Sign Language
A majority of deaf high school graduates in the U.S. have a fourth-grade English reading level or below, and so computer-generated animations of American Sign Language (ASL) could make more information and services accessible to these individuals. Instead of presenting English text on websites or computer software, information could be conveyed in the form of animations of virtual human characters performing ASL (produced by a computer through automatic translation software or by an ASL-knowledgeable human scripting the animation). Unfortunately, getting the details of such animations linguistically accurate enough that they are clear and understandable is difficult, and methods are needed for automating the creation of high-quality ASL animations.
This talk will discuss my lab’s research, which is at the intersection of the fields of assistive technology for people with disabilities, computational linguistics, and the linguistics of ASL. Our methodology includes: experimental evaluation studies with native ASL signers, motion-capture data collection of an ASL corpus, linguistic analysis of this corpus, statistical modeling techniques, and animation synthesis technologies. In this way, we investigate new models that underlie the accurate and natural movements of virtual human characters performing ASL; our current work focuses on modeling how signers use 3D points in space and how this affects the hand-movements required for ASL verb signs.
Bio:
Matt Huenerfauth is an associate professor of computer science and linguistics at the City University of New York (CUNY); his research focuses on the design of computer technology to benefit people who are deaf or have low levels of written-language literacy. He serves as an associate editor of the ACM Transactions on Accessible Computing, the major computer science journal in the field of accessibility for people with disabilities. In 2008, he received a five-year Faculty Early Career Development (CAREER) Award from the National Science Foundation to support his research. In 2005 and 2007, he received the Best Paper Award at the ACM SIGACCESS Conference on Computers and Accessibility, the major computer science conference on assistive technology for people with disabilities; he is serving as general chair for this conference in 2012. He received his PhD from the University of Pennsylvania in 2006.
**********************************************************************************
Dr. Marie Nadolske, Linguistics
December 12, 2011
On the roles of CODAs in sign language research: Separating L1/L2 acquisition from hearing status
This study examines narratives of three groups of American Sign Language (ASL) signers: Deaf native signers (DOD), hearing native signers (CODAs), and highly proficient non-native hearing signers (L2). Through the examination of several language domains, acquisition patterns can be identified based on whether ASL was learned as a first or second language. Separately, differing language patterns were identified based on whether a signer had “normal” hearing or was Deaf. These findings resulted from the inclusion of the CODA group in this study. Without their valuable data, differences between the hearing L2 signers and Deaf L1 signers would be solely attributable to language acquisition status, with no acknowledgement of the potential complications of being a bimodal-bilingual individual.
**********************************************************************************
Dr. Tara McAllister Byun, Communication Sciences and Disorders
November 11, 2011
Perception-Production Relations in Phonological Development
Many children who neutralize phonemic contrasts in production exhibit diminished perceptual discrimination of the same contrasts. It has proven difficult to determine whether these parallel errors reflect the influence of a primary perceptual deficit on production, or vice versa. I will offer evidence on the direction of causation by comparing positional influences on speech production and perception in one four-year-old boy with phonological disorder. The case study subject neutralized some phonemic contrasts only in initial position, a context known to have enhanced perceptual salience to adult listeners. This unique phenomenon in child phonology has been proposed to arise from a child-specific pattern of perceptual sensitivity favoring final position. However, in a nonword discrimination task, the subject was significantly more accurate in detecting contrasts in initial position, where his production errors occurred. In light of this mismatch, I conclude that the subject’s errors must be the consequence of a production-oriented factor. Independent of position, however, the subject’s perception of a phonemic contrast he neutralized in production was decreased relative to other contrasts. I thus argue that this case represents an unambiguous example of a perceptual deficit arising from a primary deficit in the production domain.
**********************************************************************************
Dr. Laura Lakusta, Psychology, Montclair State University
October 14, 2011
Language and memory for motion events: The asymmetry between source and goal paths over development
Human beings talk about events. The capacity to do so requires an interface between spatial cognition and language. However, given that the format of linguistic and non-linguistic representations is likely to differ, the question arises of how these two systems map onto each other and how these mappings are learned. I will present research suggesting one possible solution to this problem: a homology exists between the non-linguistic and linguistic representations of Source and Goal paths. First, when linguistically describing a broad range of events, children and adults are more likely to encode the Goal path rather than the Source path. A Goal bias is also found when individuals represent events non-linguistically, and even extends to the event representations of pre-linguistic infants. Thus, an asymmetry between Goal and Source paths is common to both linguistic and non-linguistic structure and is found early in development. In the second part of my talk, I will present research exploring the strength of this homology. Is a Goal bias rooted generally in cognition, or is it specific to intentional events? Research with infants, children, and adults suggests the latter – a Goal bias in non-linguistic cognition shows up most strongly for intentional events. These findings raise the important question of how children learn to collapse over conceptual domains for purposes of expressing Paths in language.
**********************************************************************************
Dr. Mary Call, Linguistics
Dr. David Townsend, Psychology
October 6, 2011
Analysis of Temporal Processing During Sentence Comprehension
For several years, we have been studying the comprehension of temporal relationships in English by native speakers of English. We are now extending this research to include the processing of temporal relationships by English language learners. In a pilot study, we collected self-paced reading data from native speakers of Spanish that differs in interesting ways from similar data collected from native speakers of English. One of our current projects is to carry out a larger scale study of this phenomenon.
In addition, we are testing the influence of the first language (Spanish, in this case) on the judgments that learners make about English sentences containing stative verbs that occur in a context-establishing when clause. If these learners are relying on their Spanish (L1) strategies, we predict that they will choose verb forms that are either ungrammatical or less-preferred in English.
**********************************************************************************
Meredyth Krych Appelbaum, Ph.D., Department of Psychology
April 27, 2011
Coordinating two minds: Do familiar partners such as friends and couples have a communicative advantage compared to strangers?
While one might think that language understanding is a relatively straightforward, passive process, in reality people must actively work together to establish the mutual belief that they have been understood (Clark, 1996; Clark & Krych, 2004; Clark & Wilkes-Gibbs, 1986). Much of the research in referential communication involves strangers, because it is easier to study the establishment of their common ground, as opposed to friends or couples who might already share a great deal of information to which the experimenter is not privy. And yet, much of everyday communication occurs between people who are familiar with their conversational partners. I will provide evidence that language coordination is much more complicated than it would at first seem (Clark & Krych, 2004). Further, I will discuss a recent study that examines the impact of partner familiarity (strangers vs. friends vs. couples) on the efficiency of communication for referential communication tasks in which partners have no privately shared common ground. There is mixed evidence in the literature as to whether familiar partners can communicate more effectively than strangers. Based on previous research (Krych-Appelbaum, et al., 2007), we expected and subsequently found that familiar partners did no better than strangers. One possible reason is that familiar partners may wrongly assume that their partners should understand them better than they actually do.
**********************************************************************************
Paul C. Amrhein, Psychology
March 25, 2011
How Speech Act Theory Informs Psychotherapy Outcomes or Linguistic Pragmatics Meets Psychiatric Medicine
The narrative genre constructed during a psychotherapeutic session is rife with speech acts, most notably requests and commitments, mutually exchanged by clinician and client. However, a theory capturing this phenomenon had not been put to empirical test until Amrhein, Miller, Yahne, Palmer and Fulcher (2003). Drawing from Austin, Searle, and McCawley, this theory (Amrhein, 2004) posits that much of the work done during psychotherapy concerns the clinician's evocation of client utterances denoting desires, abilities, needs and reasons leading to expressions of commitment to maintain current behavior patterns with deleterious health consequences or, ideally, to change them. More specifically, it is what Searle calls the “illocutionary force strength” of client verbal commitments that is proposed to be especially prognostic of future behavior. My talk will present this theory and evidence to date indicating that commitment strength is a malleable, psychological construct influenced by treatment modality, therapist skill, and client intellectual characteristics.
**********************************************************************************
Mary Boyle, Communication Sciences and Disorders
February 25, 2011
Semantic Feature Analysis Treatment for Aphasic Word Retrieval Problems: The Challenge of Moving from Naming to Discourse Production
Evidence from single-subject designs suggests that Semantic Feature Analysis Treatment improves confrontation naming of treated items and untreated items for people with mild or moderate aphasia. However, generalization of this improvement to word retrieval during discourse production has been mixed. Providing treatment at the discourse level, rather than at the confrontation naming level, has yielded some promising results, but has generated a new set of questions, including the best way to measure word retrieval in discourse, the stability of such measurements from day to day, and the relationships of these measures to listeners’ perception of a person’s word retrieval ability. This talk will review research results and discuss current projects at the single-word and discourse levels of treatment.
**********************************************************************************
Jing Peng, Computer Science
December 3, 2010
Transfer Learning with Applications to Text Classification
When labeled examples are difficult to obtain in a target domain, transfer learning, which exploits knowledge obtained from a source domain to improve performance in the target domain, can be very useful. Existing techniques require that the sampling distributions of the two domains be the same; however, this requirement is often violated in practice. In this talk, we describe a technique that maps both target and source domain data into a space where we can bound the difference between the two induced distributions, thereby dramatically improving performance. We provide experiments that demonstrate the superiority of the proposed technique.
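As a minimal illustration of reducing distributional mismatch between domains, one can standardize each domain's features so that first and second moments agree. This is a much simpler operation than the mapping proposed in the talk, which bounds the difference between the induced distributions more rigorously, and the numbers below are invented.

```python
# Crude illustration of aligning two domains' feature distributions:
# standardize each domain to zero mean and unit variance so that simple
# first- and second-moment differences vanish. This is only a stand-in
# for the talk's technique, not a reconstruction of it.

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def standardize(xs):
    """Map a feature column to zero mean and unit variance."""
    m, s = mean(xs), std(xs)
    return [(x - m) / s for x in xs]

source = [10.0, 12.0, 14.0, 16.0]   # e.g., a feature in the source domain
target = [1.0, 2.0, 3.0, 4.0]       # the same feature in the target domain

src_z, tgt_z = standardize(source), standardize(target)
# After the mapping, both domains share mean 0 and variance 1, so a
# model trained on src_z transfers more gracefully to tgt_z.
print(round(mean(src_z), 10), round(std(tgt_z), 10))
```

A classifier trained on raw `source` values would see inputs on a completely different scale at test time on `target`; after standardization, that particular mismatch is gone, which is the spirit (if not the substance) of mapping both domains into a common space.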
**********************************************************************************
Jen Pardo, Psychology
October 27, 2010
Dominance and Accommodation During Conversational Interaction
Phonetic variation is a problem for psycholinguistic theories of speech production and perception. If the goal of communication is parity between sender and receiver, then the demands of efficient communication should lead to matching in the phonetic forms employed by interacting talkers. However, phonetic variation is neither random nor due solely to physiology, and is the rule rather than the exception in communication. Therefore, there must be other communicative goals that influence the phonetic forms talkers use when speaking. The current project aims to delineate some of the individual, social, and situational factors that influence phonetic form variation in ordinary conversational interactions. In three studies, unacquainted talkers were recorded before, during, and after performing a conversational task together. The recordings were analyzed and excerpts were presented to naive listeners who made perceptual similarity judgments that assessed the degree to which the talkers converged in phonetic form. Overall, there was a reliable tendency for the talkers to become more similar phonetically, but this tendency was subtle and was influenced by sex of the talker and the role of the talker in the interaction. Moreover, the patterns derived from the global assessments of phonetic similarity provided by the listeners were not related to analyses of individual acoustic attributes in a straightforward manner. These patterns of phonetic variation have important implications for an understanding of the processes of speech production, perception, and their connection.
**********************************************************************************
Eileen Fitzpatrick, Linguistics
September 24, 2010
Linguistic Cues to Deception