Current Research Projects
The Montclair State Linguistics Department faculty are all actively engaged in research. Most of the research projects also provide valuable opportunities for hands-on work in applied linguistics. If you are interested in internship opportunities, please read the descriptions of the research projects and contact the relevant faculty member.
The specific projects include:
Metaphor Comprehension in Spanish-English Bilinguals
Understanding metaphors like “couch potato” requires individuals to rapidly identify relevant semantic features and determine how they are connected, making them an ideal test case for examining semantic processing (i.e., how humans understand meaning). There is a longstanding debate regarding whether and how our cerebral hemispheres are specialized for comprehending figurative language, although the majority of this work has focused on language processing with monolingual native speakers. In this project, we examine how bilingual speakers process complex semantic information, employing an experimental paradigm in which words are presented to either the left hemisphere or the right hemisphere. We use bilingualism, a linguistic state rife with variability, as a lens into the cognitive mechanisms underpinning language processing. To that end, this project investigates how individual differences in literacy, vocabulary, memory, and pragmatic abilities modulate bilingual metaphor processing. This study is conducted in the Experimental Linguistics Laboratory (Schmit Hall).
Contact: Lauren Covey
Linguistic Variation in University Writing
Previous studies of university writing (e.g., Goulart, 2021; Gardner et al., 2018; Hardy & Römer, 2013) have shown that undergraduate assignments (e.g., essays, critiques, laboratory reports, etc.) have different linguistic profiles. For example, laboratory reports tend to have more phrasal features (premodifying nouns, noun-noun sequences, etc.), while essays tend to have more clausal features (adverbials, verb complement clauses, etc.). These studies, however, did not consider the extent to which the same assignment can have distinct linguistic profiles across disciplines. That is, a research report written for a history class might have different linguistic characteristics than a research report written for a psychology class. This project addresses this gap by investigating the extent to which there is linguistic variation in undergraduate writing across disciplines and communicative purposes. Contact: Larisa Goulart. More at CORAL Corpus Lab.
Corpus linguistics for teaching
Corpus linguistics is a research methodology that involves the analysis of large collections of language data, or corpora, in order to identify patterns and regularities in language use. Since the late 1980s and early 1990s, linguists have suggested that corpus tools have the potential to revolutionize language teaching (see Johns, 1986; Rundell & Stock, 1992; Johns, 1991; among others). These researchers point out a number of advantages to using corpora in the classroom, including: (1) Authenticity – Corpora allow students to explore natural language data; (2) Autonomy – Students can act as language detectives and identify language patterns on their own; (3) Specificity – Teachers can use corpora that reflect the type of discourse or disciplinary practices that students will encounter in their daily lives. Even though the call for the inclusion of corpora in the language classroom has been around for many years, teachers still encounter many challenges when working with corpora. This project seeks to advance our understanding of how teacher corpus training can influence the inclusion of corpora in the classroom. It tests a five-component theoretical framework for measuring teachers’ perceived corpus literacy (CL) and its subskills. Contact: Larisa Goulart. More at CORAL Corpus Lab.
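For readers new to corpus tools, the kind of pattern exploration described above can be illustrated with a keyword-in-context (KWIC) concordance. The following is a minimal sketch, not the project's own tooling: the function name `kwic`, the sample sentences, and the three-word context window are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    # Very simple tokenizer (an assumption): lowercase alphabetic words only.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Hypothetical mini-corpus; a classroom corpus would be far larger.
sample = (
    "Students make progress when they make decisions on their own. "
    "Teachers make time for feedback."
)

# Print the hits aligned on the keyword, as a concordancer would.
for left, kw, right in kwic(sample, "make"):
    print(f"{left:>25} | {kw} | {right}")
```

Lining up every occurrence of a word in this way lets a student notice which words tend to follow it (here, "make progress", "make decisions", "make time") without being told the pattern in advance.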
Detection and Recognition of Euphemisms
Detecting and interpreting figurative language is a rapidly growing area in Natural Language Processing (NLP). Euphemisms, however, have so far received little attention in NLP. The project addresses the following: 1) algorithm design for detecting and interpreting euphemisms, and 2) interpretability of black-box neural models by creating a series of new datasets and tasks that explore the embedding space of transformer language models for euphemism recognition. The key insights are 1) euphemistic expressions and their paraphrased counterparts differ in the strength of the sentiment they convey; 2) euphemistic and non-euphemistic interpretation is context-sensitive; 3) euphemisms are vaguer than the taboo expressions they substitute. The experiments test what linguistic properties of euphemisms the deep learning approaches capture and why. The algorithm developed can detect new euphemisms, not previously recorded in dictionaries, without human intervention. The computational work on euphemisms is important to further the understanding of how strategic use of language can bias people’s perceptions of important and highly contentious actions and perhaps to find ways to de-bias language models. This work on euphemisms helps us understand what topics are controversial or sensitive in a specific culture. Applying the algorithm to diachronic data and detecting changes in euphemism usage leads to a better understanding of cultural change. The corpora produced are useful for answering questions at the intersection of AI, NLP, linguistics, cultural anthropology, and social psychology. The range of languages provides a natural way of making interesting linguistic observations about euphemisms.
Since euphemisms are a form of verbal behavior, finding a way to detect and interpret euphemisms automatically may lead to a better understanding of human behavior in general.
Funded
See more at the NLP Lab
Contact: Anna Feldman
Prosodic Event Annotation and Detection in Three Varieties of English
The term prosody encompasses all aspects of speech beyond the properties of individual segments, in particular intonation, stress and rhythm. Every utterance is organized prosodically, and prosodic properties encode a wide range of communicative content, including syntactic grouping, emphasis, turn-taking, affect and speech acts. Applications of prosody in automatic spoken language tasks have been well demonstrated. Yet our understanding of prosody, and its integration in speech technologies, have lagged behind, particularly so for language varieties that are already underrepresented.
This project has three related aims. First, we will generate a prosodically annotated, machine-analyzed corpus of spontaneous, task-oriented speech for three varieties of American English: European American English (EAE), African American English (AAE) and Latine English (LE), as spoken in New Jersey. Second, we will use the prosodic annotations and acoustic analyses of these data to develop procedures for the automatic detection of prosodic events in these varieties, specifically prominence and boundary. Third, we will elicit annotations from everyday listeners in order to validate the expert annotations and to uncover the acoustic cues exploited by listeners. Contact: Jonathan Howell
Bilingual Lexical-Conceptual Structure and Its Activation in Codeswitching
Most previous studies of codeswitching (CS) describe some observable grammatical and lexical constraints on such bilingual speech, but they often remain at a rather surface-based observational level. This study investigates bilinguals’ linguistic motivations for CS by exploring the nature and activity of the bilingual mental lexicon in bilingual speech involving CS. It adopts the Matrix Language Frame Model (Myers-Scotton, 1993; Myers-Scotton & Jake, 1995) and the Bilingual Lemma Activation Model (Wei, 2002, 2006, 2015) in describing some outstanding grammatical and lexical constraints as perceived in naturally occurring bilingual speech involving CS. The focus of this study is on the bilingual mental lexicon and its lexical-conceptual structure retrieved and activated as a solution to bilingual communication. It claims that the bilingual mental lexicon contains abstract entries called ‘lemmas’ about lexemes. It further claims that lexemes are universal, but lemmas are language specific and are in contact in bilingual speech production. This study argues that bilingual lexical-conceptual structure is composite and complex, and it is the cross-linguistic differences in lemmas of certain lexical items which motivate CS. Abundant instances of CS involving various language pairs are needed to support such an argument. Contact: Longxing Wei
The Nature and Activity of the Bilingual Mental Lexicon in Second Language Acquisition
First language (L1) transfer in second language acquisition (SLA) has been recognized as a natural and universal phenomenon, and numerous studies of SLA have provided various explanations of sources of language transfer or learner errors. However, most previous studies remain at a surface level of observation and description. This study proposes an ‘abstract’ approach that goes beyond surface-level studies. This approach is abstract in the sense that it explores the nature and activity of the bilingual systems in contact during L2 learning. This study tests the assumptions and claims of the Bilingual Lemma Activation (BLA) Model (Wei, 2002, 2006, 2015) of SLA. According to this model, the bilingual mental lexicon does not simply contain lexemes, but also abstract entries called ‘lemmas’ about them at three levels of linguistic abstraction: lexical-conceptual structure, predicate-argument structure, and morphological realization patterns. The BLA Model assumes that lemmas in the bilingual mental lexicon are language specific, and they compete in L2 learning. Typical instances of language transfer as observed in L2 performance by learners with different L1s are needed to reach the conclusion that any incomplete acquisition of the lemmas of the target language’s particular lexical items may cause learner errors and that reduced activation of learners’ L1 lemmas drives more successful SLA. Contact: Longxing Wei
Interlanguage as an Outcome of Bilingual Linguistic Systems in Contact
Most previous studies of interlanguage (IL) relate IL performance errors in second language (L2) learning to the developing IL system itself. Though in such studies, language transfer is regarded as one of the processes responsible for IL development, little attention has been paid to the relationship and interaction between learners’ first language (L1) and target language (TL). This study assumes that IL, as a developing linguistic system, involves several linguistic systems, such as learners’ L1, learners’ TL, and learners’ currently acquired L2, and such linguistic systems are in contact in learners’ target-oriented speech production, each contributing different amounts to the developing IL system. This study further assumes that the nature and activity of the bilingual mental lexicon may play a significant role in IL development. As claimed in this study, the bilingual mental lexicon contains language-specific ‘lemmas’ (i.e., abstract entries in the mental lexicon about individual lexemes), and such lemmas are in contact in IL production. Thus, IL performance errors are viewed as consequences of ‘lemma transfer’ of learners’ L1 abstract lexical structure. The lexical structure is ‘abstract’ because it contains three abstract levels of linguistic organization: lexical-conceptual structure, predicate-argument structure, and morphological realization patterns. This study treats IL as an outcome of bilingual linguistic systems in contact at an abstract level. Sufficient IL performance data need to be collected and analyzed to test the hypothesis that the developing IL system is driven by an incompletely acquired abstract lexical structure of the TL, and IL development is always a predictable and target-oriented process. Contact: Longxing Wei
Past Research
Automatic acoustic detection of semantic focus
By emphasizing words acoustically, speakers are able to convey which parts of a sentence are backgrounded and which parts they wish to foreground or contrast. This feature of speech, known as focus, is pervasive in English, yet is inadequately modeled in state-of-the-art speech technologies.
The long-term objective of the research is to develop a method of automatically detecting focus that is both useful in speech technology and advances our scientific understanding of how focus is realized acoustically and conditioned pragmatically.
The project is innovative in its use both of speech that has been recorded in a laboratory under controlled conditions, and also of speech that occurs naturally, such as in podcasts and videos. Contact: Jonathan Howell. Funded.
The Bilingual Mental Lexicon in Interlanguage Development
The research places interlanguage in the domain of language contact. Interlanguage is understood as a composite developing system, and the bilingual mental lexicon contains lemmas resulting from an abstract level in language contact. It is assumed that entries in the bilingual mental lexicon (i.e., lemmas) are composed of three levels of abstract structure: lexical-conceptual structure, predicate-argument structure, and morphological realization patterns, and these levels in any one lemma (originating with the L1 or any target L2s or even lemmas from other languages) can be split and recombined with levels from another source. Language transfer in second language learning and interlanguage transfer in third language learning are regarded as lemma transfer. The language data for this research project are collected from second and third language learners’ interlanguage production, including both oral and written production. Contact: Longxing Wei
Intrasentential Codeswitching
This research investigates a commonly observed bilingual behavior in so-called ‘mixed’ speech production. Bilinguals may switch to another language within sentence boundaries, that is, morphemes from another language are switched into sentences (i.e., intrasentential codeswitching (ICS)). It is assumed that two languages involved in ICS are not equally activated, with the Matrix Language (i.e., the ‘main’ or ‘host’ language the bilingual is using at the moment of speaking) providing the sentential frame into which morphemes from the Embedded Language (i.e., the ‘guest’ language activated at a certain point of speech production) are switched. It is also assumed that there is a distinction between ‘content’ (lexical) and ‘system’ (functional) morphemes. In ICS, only content morphemes can be switched into the sentential frame provided by the Matrix Language. This research regards ICS as a language contact phenomenon and tests the Bilingual Lemma Activation Model (Wei, 2002, 2003, 2005, 2006) in ICS studies. The data for the research is collected from bilingual natural speech production. Contact: Longxing Wei
A Linguistically-Informed Approach for Measuring and Circumventing Internet Censorship
Internet censorship consists of restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. However, members of the Internet Freedom community lack comprehensive real-time awareness of where and how censorship is being imposed. The challenges to achieving such awareness include but are not limited to coverage, scalability, adoption, and safety. The project explores a linguistically-informed approach for measuring and circumventing Internet censorship. The research takes a new perspective on the problem by investigating a hybrid method for censorship detection and evasion through the lens of linguistic analysis. Joint work with Chris Leberknight (Computer Science), Mung Chiang (Purdue), and Prateek Mittal (Princeton). Contact: Anna Feldman. Funded.
Deception Detection
Development of a novel approach for the application of natural language processing and prosodic analysis to the recognition of deceptive statements. Joint work with Deception Detection Technologies. Funded. Contact: Eileen Fitzpatrick
Code-Switched Text Messages (SMSs) in Multiple Languages
Text messaging practices have been studied primarily from a sociolinguistic or discourse analytic perspective, but there are also studies that focus on syntactic and morphological aspects of text messages as well as their pragmatic functions. However, very few studies have investigated code-switching, or language alternation, in text messages or computer-mediated communication (CMC). This research project builds upon previous studies conducted on code-switching as a global phenomenon in SMSs and Facebook chats. The forms, functions, and linguistic creativity of code-switching in this mode of digital discourse will be investigated. Contact: Susana Sotillo
Automatic Idiom Recognition
The main goal of this research project is to develop a language-independent method for automatic idiom recognition. Idiomatic phrases such as “hit the sack,” “eat my hat,” “blow my top,” or “go cold turkey” are confusing for computers – and for language learners – to translate because they can often be taken literally, as well as figuratively. To address these challenges, an algorithm is proposed that neither relies on target idiom types, lexicons, or large manually annotated corpora, nor limits the search space by a particular type of linguistic construction. Joint work with Jing Peng (Computer Science). Contact: Anna Feldman. Funded by the NSF through Jan. 30, 2018.
Contrastive Academic Cultures
A contrastive examination of differences that may exist in academic cultures. Data is collected during interviews with international students and with Montclair State professors and analyzed with the goal of creating materials that will help international students integrate more smoothly into Montclair’s academic community. (Mary Call)
Sentence Processing
We are conducting on-line sentence processing experiments to investigate the role of various properties of verbs on sentence comprehension. We are particularly interested in the roles of transitivity and telicity, and our experiments are intended to determine the point in the comprehension of a sentence at which verb properties come into play — whether at the moment that the verb is encountered or at a later point when a syntactic or semantic structure is assigned to a phrase or sentence. The broader significance of this research is that it attempts to determine which properties of sentences are based on the lexical characteristics of individual words, and which are the result of a higher level of syntactic and semantic processing. (Mary Call)
Arabic-English medical lexicon
Construction of an Arabic-English medical lexicon for use in a machine translation system. The project has developed an ontology of terms necessary for doctor-patient interaction and is providing several thousand terms in both languages for MT. Contact: Eileen Fitzpatrick. Funded.
Corpus use in language arts teaching
Examination of the use of language corpora in the teaching of English grammar. Contact: Susana Sotillo
The project collects English text written by English as a Second Language (ESL) students. It stores the text online, collects data on the student writers that is relevant to their second language skills, and annotates the text to permit retrieval of usage information and analysis of errors. Contact: Eileen Fitzpatrick
Gender Studies Terminology
The term gender is increasingly replacing the word sex in public discourse (and in the media). In theory, this is not the case in sociolinguistics and language and gender research, but a preliminary analysis suggests that in practice a similar phenomenon is occurring. This project involves a thorough investigation of the use of these terms. (Alice Freed).
Speech Segmentation
Phonetic segmentation of speech and annotation of prosodic features. Contact: Eileen Fitzpatrick. Funded.
Portable Language Technology
The focus of this research is on the portability of technology to new languages and on rapid language technology development. This research takes a novel approach to rapid, low-cost development of taggers by exploring the possibility of taking existing resources for one language and applying them to another, related language. Languages that are related either by common heritage (e.g., Czech and Russian) or by “contact” (e.g., Bulgarian and Greek) often share a number of exploitable properties: morphological systems, word order, and vocabulary. Contact: Anna Feldman. Funded.
Questions in Institutional Discourse
The research investigates the use of questions in institutional discourse (and in other sorts of fixed or partially scripted discourse) and the role that questioning plays in (a) constituting the institutional context itself and (b) constructing and/or co-constructing participant roles and identities for speakers in these contexts. (Alice Freed).