Description
Idea A: Exploratory Data Analysis for Linguistic Signal for Suicidality in Social Media
Idea B: Predictive Modeling Using Linguistic Signal for Suicidality in Social Media
An important note
In this project, you will be looking at posts written by users in an online discussion forum. The dataset comes from an ongoing research project where the goal is finding new ways to help prevent suicides.
If you’re feeling like you (or someone you know) could use some support or assistance, please take advantage of one of the following resources:
• National Suicide Prevention Lifeline: 1-800-273-8255 (TALK).
– Veterans please press 1 to reach specialized support.
• Spanish: 1-800-SUICIDA
• Crisis Text Line: Text "START" to 741-741
• Online chat: http://www.suicidepreventionlifeline.org/gethelp/lifelinechat.aspx
• https://www.reddit.com/r/SuicideWatch/wiki/hotlines – This page provides information about phone and chat hotlines and online resources in the U.S. and worldwide.
Also, although the posts we’re working with are anonymous, it is absolutely essential that you read and understand Section 2, below, on the ethics of working with social media data.
1 Introduction
Let’s start with some numbers.
• Combining direct and indirect costs, the global cost of mental health conditions for 2010 was estimated at $2.5 trillion.
• Looking at the economic toll of non-communicable diseases, the projected worldwide cost of mental illness, taking indirect costs into account (e.g. not just medical care costs, but also things like lost income), outstrips that of cardiovascular diseases, and is more than the costs of diabetes, cancer, and chronic respiratory diseases combined.
• In the U.S., more than 115 million people live in federally designated Mental Health Professional Shortage Areas — that is, they live in places where it's hard to get mental health treatment, even if they realize they need help in the first place, which often they don't.
• Suicide is the third leading cause of death among youths and young adults aged 10 to 24 years, and the second leading cause for ages 15-19.
• Based on a comprehensive meta-analysis of 365 studies (3,428 total risk factor effect sizes), Franklin et al. (2017) concluded that predictive ability for suicidal thoughts and behavior has not improved across 50 years of research.
I could go on, but it seems clear that the importance of mental health as a problem space cannot be overstated.
For clinical psychologists, language plays a central role in diagnosis and in monitoring of patients. Indeed, many clinical instruments fundamentally rely on what is, in effect, manual coding of patient language. For example, in assessment for formal thought disorders, analysis of natural speech is an essential factor in the diagnosis, as the clinician must assess the patient’s language for diagnostic features such as incoherence, derailment, loose associations, and tangentiality (American Psychiatric Association, 2013). Applying language technology in this domain, e.g. in language-based assessment, could potentially have an enormous impact, because many individuals are motivated to underreport psychiatric symptoms (consider active duty soldiers, for example) or lack the self-awareness to report accurately (consider individuals involved in substance abuse who do not recognize their own addiction), and also because many people — e.g. those without adequate insurance or in rural areas — cannot even obtain access to a clinician who is qualified to perform a psychological evaluation (APA, 2013; Sibelius, 2013). Bringing language technology to bear on these problems could potentially lead to inexpensive screening or monitoring methods that could be administered by a wider array of healthcare professionals, which is particularly important since the majority of individuals who present with symptoms of mental health problems do so in a primary care physician’s office. Given the burden on primary care physicians to diagnose mental health disorders in very little time, the American Academy of Family Physicians has recognized the need for diagnostic tools for physicians that are “suited to the realities of their practice”.
This project focuses on suicidality. The majority of assessment for suicide risk takes place via in-person interactions with clinicians, using ratings scales and structured clinical interviews (Batterham et al., 2015; Joiner Jr et al., 1999, 2005). However, such interactions can take place only after patient-clinician contact has been made, and only when access to a clinician is available.
This project focuses on risk assessment for suicidality based on social media postings. It is intended to provide you with the opportunity to exercise what you have learned in class on a challenging, open research problem. In this document, we’ll describe the data, along with the basic goals of the project. (And, of course, how you’ll be graded.) There are no guarantees that you will get sensible or interpretable results. But that’s ok: what matters is how thoughtfully you approach things, how much you demonstrate mastery of ideas and techniques that we’ve learned about over the course of the semester, and how carefully and coherently you describe what you did.
Under normal circumstances this project would have two parts: exploratory data analysis, to dig into the dataset, and then, using what you’ve discovered, predictive modeling. Because the COVID-19 crisis is creating significant new challenges for people, however, this year the project will involve doing either exploratory data analysis or predictive modeling (with error analysis then providing the opportunity to look more closely at the data).
2 Ethical use of data
2.1 General notes
Whenever you’re working with data that originates with human beings, it’s important to spend some time thinking about appropriate uses of the data, both in terms of official rules and in terms of broader ethical considerations, whether or not those are officially mandated.
As an important starting point, “human subjects research” is defined as (a) a systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge, that involves (b) a living individual about whom a research investigator obtains data through intervention or interaction with the individual, or individually identifiable information. In the U.S., the official definition of human subjects research and the rules surrounding it grew out of abuses that took place in the absence of formalized regulation, when researchers convinced themselves that the benefits of their studies outweighed what should have been obvious harms. At universities (and many other organizations), the proper conduct of human subjects research is overseen by a committee called an Institutional Review Board, or IRB.
Formally speaking, this project is actually not human subjects research, for two reasons. First, in general class assignments are not research, because they are intended to help train students or give them experience with research methods, as opposed to collecting information systematically with the intent to develop or contribute to generalizable knowledge. (The intent matters: I hope you’ll learn enough from this assignment to be able to do good research, possibly even to follow up this class assignment with a real research project — see below — but the work you’re doing for this project during this semester is not intended to produce publications.) In addition, this project in particular doesn’t involve human subjects research according to the formal definition: we are working with publicly available social media behavior, which involves neither intervention, nor interaction with individuals, nor individually identifiable information, since Reddit is an anonymous social media site and the data have gone through another layer of automatic de-identification as an additional safeguard.
That said, any project involving social media needs to be handled with great sensitivity, particularly when touchy issues like mental health are involved. It is important, therefore, that you not disseminate or share the data we are working with, and it would also be completely inappropriate to use Web searches to look for further information from or about a user in these datasets, even for benign purposes. Following Benton et al. (2017), rather than quoting any individual postings in your writeup, you should carefully paraphrase, so that someone else doing a search would be less likely to find the posting.
If you are interested in further reading about the ethics of research on social media, ask me; there are a lot of new papers emerging.
2.2 Use of the dataset
The primary dataset for this educational project has been collected from an online social media source. The following specifies the conditions for your proper use of the dataset. If you are unable to meet these conditions, please select Project C, not Project A or B as described here. If you have decided to do Project A or Project B, please send me the statements below followed by the names of the members of your group as a signature. Please then also do that again in the final project writeup.
1. We have read Benton et al. (2017).
2. We understand that privacy of the users and their data is critical, and absolutely no attempt can be made to de-anonymize or interact in any way with users.
3. We understand that this project is being done solely for educational purposes, and the results cannot be used directly in research papers. If we get promising results and would like to develop the ideas into a research paper for publication, or to use what we have done further for another class, we will talk with Prof. Resnik about obtaining suitable Institutional Review Board review. (It’s not hard.)
5. We will store the dataset and any derivatives on computers that require password access. If we are working in an environment where other people can log in, e.g. a department server, we will set file permissions restrictively so that only we have access. We can also use group permissions limited to members of our group — but under no circumstances will data related to this project be world-readable.
6. Any copies of the data or derivatives of it will be accompanied by a clear README.txt file identifying Prof. Resnik as the contact person and stating further re-distribution is not to take place without contacting him first. If anyone we know is interested in the dataset, we will refer them to Prof. Resnik, rather than providing the data ourselves.
7. Once we have completed the project, we will delete any copy of the dataset we have made, including any derived files (e.g. tokenized versions of the documents).
8. We will not cut/paste any text content from this dataset into our project proposal, project writeup, onto the class discussion board, into e-mail, etc. If we want to identify a specific posting, e.g. in discussion on the class discussion board, we will use the ID from the dataset. If we want to give examples, we will create a paraphrase instead of the original text. For example, if a posting said What’s this world come to? http://t.co/XxI4QnMew we could change it to I wonder what this world has come to? http://t.co/YYY. (Or just make up a post that demonstrates whatever it is you want to describe.)
In your final project writeup please also include the following statement: We have deleted all our copies of the project dataset.
And again, if you have any questions or concerns, of course please speak with me.
3 Background on the problem
3.1 Some prior computational work
There are too many references here for you to review all of them, but I’m erring on the side of too much rather than too little information. I’ll try to provide some guidance as to the most useful things to look at, and I’m happy to answer questions; please feel free to also use the class discussion board to talk about this.
3.2 Background on risk factors for suicidality
• Thoughts
– Thinking about suicide, having suicide on their mind
– Having told friends or family they are thinking about suicide
– Feeling that they are a burden to others
– Endorsement of suicidal beliefs, even without the word suicide (e.g., I deserve to die, I can never be forgiven for the mistakes I made)
– A “fuck it” (screw it, game over, farewell) thought pattern
• Feelings
– A sense of agitation, not being able to “stand still” physically or mentally (see also Popovic et al., 2015)
– Indications of being impulsive; risky behavior (e.g. reckless driving, promiscuity)
– Expressing lack of hope for things to get better
• Logistics
– Talking about plans that involve suicide
– Talking about methods of attempting suicide, even if not planning
– Preparation, actually taking actions to prepare for an attempt
– Having access to lethal means (a way to take their own life), especially firearms
– Having enough privacy or isolation to make an attempt
• Context
– Previous attempts
– An event or life change that is leading them to think about suicide
– Isolation from friends and family
In addition, there is some quite recent work that has been attempting to formalize and validate suicidal crisis in diagnostic terms as a mental state that is specifically characteristic of imminent suicide risk, as distinguished from the mental state associated with depression or lifelong suicide risk. A particularly promising line of work involves a definition of suicide crisis syndrome. To be identified as having suicide crisis syndrome, the individual must meet both criterion A and two of the criteria from B:
• Criterion A: Frantic hopelessness or state of entrapment defined as being stuck in a life situation that is painful and intolerable, and a feeling that all routes of escape are blocked.
• Criterion B:
– Affective dyscontrol, including emotional pain or mental pain; severe panic with agitation, and dissociation; rapid mood swings that can include happiness; and acute anhedonia.
– Cognitive dyscontrol, which can include ruminative flooding associated with headache or head pressure; cognitive rigidity; and inability to suppress the ruminative thoughts. (For example, you might assess by asking: “Do you control the thoughts or do the thoughts control you?”)
– Overarousal with insomnia and agitation.
– Social withdrawal and isolation, and evading communication.
See https://www.mdedge.com/podcasts/psychcast/dr-igor-galynker-identifying-suicide-crisis-syndrome-part-1 for discussion and some references.
3.3 Project dataset
• We identified the 11,129 users who had ever posted to r/SuicideWatch (which we’ll sometimes abbreviate as SW), a discussion forum where the aim is to provide peer support for people considering suicide. Posters on SW generally tend to fall into one of three categories: people who themselves are considering the possibility of self-harm, people who are worried about a friend or loved one, and people who want to help. The fact that someone posted to SuicideWatch can be viewed as a form of indirect supervision, i.e. a noisy indicator of possible suicidality.
• Of those 11,129 users, a subset of 934 were randomly selected, and crowd workers labeled each individual on a four-point scale for risk based on reading their SW postings. Those risk labels can be viewed as a moderately reliable labeling for risk, based on our analysis of inter-rater agreement.
• Of those 934 users, a subset of 242 were labeled on the four-point risk scale by four experts in suicide prevention looking at their SW posts. Each individual was looked at by all four experts, and the inter-expert reliability (agreement on risk levels) was high, so their consensus ratings can reasonably be considered ground truth.
• As a “control” group, we identified 934 users who never posted on SW or on any other mental health related subreddits.
Note that for all of these users, in addition to posts on SW we also have all the posts they ever posted anywhere on Reddit.
Annotators assigned risk labels on the following scale:
(a) No Risk (or “None”): I don’t see evidence that this person is at risk for suicide;
(c) Moderate Risk: I see indications that there could be a genuine risk of this person making a suicide attempt;
(d) Severe Risk: I believe this person is at high risk of attempting suicide in the near future.
Also, for purposes of this project, you should exclude any posts to forums related to mental health, since, even if evidence from those forums proved highly predictive, the content there is specifically generated by people talking about their mental health issues and results would not be generalizable to social media outside of Reddit. The set of mental health subreddits to exclude includes Anger, BPD, EatingDisorders, MMFB, StopSelfHarm, SuicideWatch, addiction, alcoholism, depression, feelgood, getting over it, hardshipmates, mentalhealth, psychoticreddit, ptsd, rapecounseling, schizophrenia, socialanxiety, survivorsofabuse, and traumatoolbox.
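As a concrete sketch of the exclusion step, something like the following would work; the post fields ("subreddit", "text") are just an assumption about however you end up representing the data:

```python
# Sketch: drop posts from mental-health subreddits before analysis.
# The field names here are assumptions about your data representation.
EXCLUDED = {s.lower() for s in [
    "Anger", "BPD", "EatingDisorders", "MMFB", "StopSelfHarm",
    "SuicideWatch", "addiction", "alcoholism", "depression", "feelgood",
    "getting over it", "hardshipmates", "mentalhealth", "psychoticreddit",
    "ptsd", "rapecounseling", "schizophrenia", "socialanxiety",
    "survivorsofabuse", "traumatoolbox",
]}

def filter_posts(posts):
    """Keep only posts whose subreddit is not on the exclusion list."""
    return [p for p in posts if p["subreddit"].lower() not in EXCLUDED]

posts = [
    {"subreddit": "AskReddit", "text": "a post"},
    {"subreddit": "depression", "text": "another post"},
]
kept = filter_posts(posts)
```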
3.4 Additional data
A subset of the MyPersonality data is being made available for this project with permission of the researchers who created the dataset. It includes one subset of data where users filled out a survey used for assessing depression, and another subset where users filled out a personality inventory involving the traits of openness, conscientiousness, extraversion, agreeableness, and neuroticism. The last of these, neuroticism, is a predictor of depression.
4 The project problem
There are two alternatives to choose from: exploratory data analysis using computational linguistics methods and models, and supervised learning to distinguish severe-risk users posting on SuicideWatch from users who are not at severe risk. Your group only needs to do Project A or Project B, not both.
4.1 Project A: Exploratory data analysis
4.1.1 Ideas for potentially relevant features of language
The goal here is for you to go deeper than you have so far with techniques for exploring differences in language use. Are there detectable differences in the language of users who are high risk versus those who are not?
To tackle this, you should identify features of language that you think might be worth exploring in social media for identifying positive users, and formulate ideas for how to implement the relevant analysis. Here are just a few ideas, but you should consider these simply as examples and also generate ideas of your own. You should definitely look at some of the relevant literature that I cited for ideas.
Operationalizing features that are specifically related to suicide risk. Background on potentially relevant features appears above in Section 3.2. This is probably the most interesting avenue to pursue in terms of doing something new and interesting, although the other features below are important to consider also, because (a) you’ll want to look at the value of interesting features compared to less interesting baselines, and (b) implementing less interesting features that you understand well is a good way of making sure your code is correct.

vegetative/energy level: sleep tired night bed morning class early tomorrow wake late asleep long hours day sleeping nap today fall stay time
somatic: hurts sick eyes hurt cold head tired back nose itches hate stop starting water neck hand stomach feels kind sore
negative/trouble coping: don(’t) hate doesn care didn(’t) understand anymore feel isn(’t) stupid make won(’t) wouldn talk scared wanted wrong mad stop shouldn(’t)
anger/frustration: hate damn stupid sucks hell shit crap man ass god don blah thing bad suck doesn fucking fuck freaking real
emotional stress: feel feeling thinking makes make felt feels things nervous scared lonely feelings afraid moment happy worry comfortable stress excited guilty
anxiety: feel happy things lot sad good makes bad make hard mind happen crazy cry day worry times talk great wanted

Table 1: LDA-induced themes related to depression.
Word-based techniques. A baseline approach to any language-based classification task is to look at surface language use, e.g. using simple unigram or n-gram features or association-based methods like the ones we exercised in the homework assignments. It’s possible that n-gram language models could potentially pick up differences in use compared to typical language use. Some of the relevant background papers discuss specific sets of words associated with depression or suicidality; it would be interesting to compare what they found with your findings on this dataset.
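As a minimal sketch of one word-based approach, you could compare smoothed log-odds of unigram frequencies between two groups of users. The data below is a toy example, and a real analysis would use a proper tokenizer rather than whitespace splitting:

```python
import math
from collections import Counter

def log_odds(group_a_texts, group_b_texts):
    """Add-one-smoothed log-odds of each word in group A vs. group B.
    Positive scores mean the word is relatively more frequent in group A."""
    tok = lambda t: t.lower().split()  # crude tokenizer, for illustration only
    a = Counter(w for t in group_a_texts for w in tok(t))
    b = Counter(w for t in group_b_texts for w in tok(t))
    vocab = set(a) | set(b)
    na, nb, v = sum(a.values()), sum(b.values()), len(vocab)
    return {
        w: math.log((a[w] + 1) / (na + v)) - math.log((b[w] + 1) / (nb + v))
        for w in vocab
    }

scores = log_odds(["i feel tired and alone"], ["great game last night"])
```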
Syntactic variation. Another intriguing possibility to consider is that variation in syntactic choices might be related to underlying mental health status. It is well known from the lexical semantics literature that grammatical constructions are linked to underlying semantic properties such as causation (was an event caused or did it just happen?), volition (did the agent of the event intend to make it happen?), telicity (did the event have a defined endpoint?), and affectedness (was the object of an event affected by it?). Greene and Resnik (2009) showed that these semantic properties can mediate the relationship between what people hear and their judgments based on what they hear — for example, given a story about an event where somebody kills someone else by drowning them, a headline like Victim drowns is perceived as more sympathetic to the perpetrator than Perpetrator Drowns Victim, because, in contrast to a subject-verb-object transitive structure, an inchoative construction like Victim drowns de-emphasizes the causal and volitional role of the perpetrator and the affectedness of the victim. As a real-world example, when the chairman of British Petroleum testified in front of the U.S. Congress about the Deepwater Horizon oil rig disaster, he referred to “an explosion in which eleven workers were lost”, not an explosion that killed eleven workers.
How might syntactic variation be related to depression or suicidality? One could use computational linguistics methods to explore, for example, a number of hypotheses related to the concept of negative attentional bias, that is, the finding that people suffering from depression tend to focus more on negative information (Feng et al., 2015). For example, one hypothesis might be that, beyond simply using more negative words (which is already well established), someone who is depressed might be more likely to put themselves as the object of a negative verb, consistent with the perception of being affected by negative states or events. Conversely, one might hypothesize that a depressed person might be less likely to view themselves as capable of causally affecting things around them in a positive way, and therefore less likely to use language where they are the agent of a positive, causal event. Pennebaker has found predictive differences in pronoun use: depressed people use the word “I” much more often than emotionally stable people, likely reflecting an inward-facing perspective; but of course that pronoun in English only appears in subject position, so could there be something deeper going on that involves not only the subject but also the syntactic constructions and/or the positive-or-negative valence of the verb? Taking this a step further, perhaps similar distinctions in viewpoint might exist more generally whether or not the person himself or herself is involved in the event, e.g. a greater use of detransitivizing constructions (inchoative, passive) might be connected with a general view of the world as involving things that “just happen” as opposed to being caused with a purpose.
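As a very crude first pass at the subject-vs-object idea, you could count first-person forms by surface position. Note that this is only a rough proxy — English case marking on pronouns happens to encode some of this — and a real analysis would use a dependency parser (e.g. spaCy) to find the grammatical subject and object of each (negative) verb:

```python
import re

def first_person_positions(text):
    """Crude proxy: count subject-form 'I' vs. object-form 'me'/'myself'.
    A real analysis would use a dependency parse to identify the verb's
    grammatical subject and object, and the verb's valence."""
    subj = len(re.findall(r"\bI\b", text))
    obj = len(re.findall(r"\b(?:me|myself)\b", text, flags=re.IGNORECASE))
    return subj, obj

s, o = first_person_positions("I think nothing ever works out for me. It hurts me.")
```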
Other forms of dimensionality reduction. Bedi et al. (2015) use latent semantic analysis (LSA) as a way to capture semantic content, in order to operationalize the idea that people suffering schizophrenia often manifest greater discontinuity of thought, e.g. “derailment”, where someone’s language includes sequences of unrelated or only remotely related ideas. Along with LDA, LSA or deep learning techniques could be used to explore lower-dimensional lexical, sentence, or document representations, and/or semantic trajectories or consistency of content within or across posts. The latter point also raises the possibility that other sequential characteristics of the language might be relevant.
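Before bringing in LSA or embeddings, a simple stand-in for measuring semantic consistency across consecutive posts is bag-of-words cosine similarity; this toy sketch just illustrates the trajectory idea, not a serious semantic representation:

```python
import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity between two word-count vectors (Counters)."""
    num = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    den = (math.sqrt(sum(v * v for v in c1.values()))
           * math.sqrt(sum(v * v for v in c2.values())))
    return num / den if den else 0.0

def coherence_trajectory(posts):
    """Similarity between each consecutive pair of posts — a crude
    stand-in for LSA-based semantic trajectories."""
    vecs = [Counter(p.lower().split()) for p in posts]
    return [cosine(a, b) for a, b in zip(vecs, vecs[1:])]

traj = coherence_trajectory(["the cat sat", "the cat slept", "stock markets fell"])
```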
Non-language measures. The main focus for this project, obviously, is natural language processing. But it would be perfectly reasonable to also explore some non-language characteristics such as average volume of postings, lengths of postings, or temporal patterns in postings such as whether people are more likely to be posting late at night (e.g. bucketing timestamps into 3- or 4-hour windows). One could also combine that with language characteristics, too, e.g. perhaps symptom domains such as agitation can be detected from language but are more relevant when they’re seen very late at night.
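For the temporal-pattern idea, bucketing post timestamps into fixed-width windows is easy with the standard library. Note that Reddit timestamps are UTC, so mapping a bucket to a user's local "late at night" would require further assumptions:

```python
from datetime import datetime, timezone

def hour_bucket(utc_timestamp, width_hours=4):
    """Map a UNIX timestamp to an hour bucket like '16-20' (UTC)."""
    hour = datetime.fromtimestamp(utc_timestamp, tz=timezone.utc).hour
    start = (hour // width_hours) * width_hours
    return f"{start:02d}-{start + width_hours:02d}"

b = hour_bucket(1590000000)  # 2020-05-20 18:40 UTC
```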
Visualizations. There are many interesting ways to visualize relevant information — feel free to explore some of these, though please also resist the temptation to get sucked into visualization itself too much and not spend enough time thinking about the problem itself from an NLP perspective. One visualization that’s been used in the mental health setting that’s easy and interesting is “Venn clouds”; see https://github.com/coppersmith/vennclouds. (It’s also worth noting that although a lot of people really dislike word clouds as a visualization method for text, in practice I’ve found that one really practical way to use them is as a visualization for topic models, where each topic gets its own cloud and the “weight” of a word in a topic is used in place of frequency.)
These are just a few ideas — you should look at relevant papers and it’s likely you’ll also come up with others!
Once you’ve got a set of features that you hypothesize might be useful, there are a number of ways you might consider exploring them in the data. Statistical hypothesis testing is certainly one: for any given feature, you could evaluate the hypothesis that it appears among positive users more often than among control users. This is also a way of doing feature selection for supervised learning (see e.g. http://scikit-learn.org/stable/modules/feature_selection.html). Another possibility would be using principal components analysis (PCA) to take a larger set of features and reduce it to (hopefully) interpretable subsets. Still another would be to take a representation learning approach to see whether a network could learn higher-level abstract features that capture information relevant to the task. And yet another after that might be to use attention in a neural network to highlight parts of postings that are particularly relevant; in this domain, a nice example is the explanation generation approach in Kshirsagar et al. (2017). And of course your assignments have included examples of potential outcomes of exploratory data analysis, e.g. hypothesis tests, top-N features that distinguish among the groups of interest, or heat maps or other visualizations that might help bring interesting patterns to the surface.
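For instance, a two-proportion z-test of whether a binary feature occurs more often among positive users than among controls can be sketched with just the standard library (in practice you might prefer scipy or statsmodels, and if you test many features you should think about multiple-comparison correction):

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """One-sided z-test: is the feature rate in group 1 higher than in
    group 2? Returns (z, p). Normal approximation; fine for large n."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided upper tail
    return z, p_value

# e.g. feature present for 40/100 positive users vs 20/100 controls
z, p = two_proportion_z(40, 100, 20, 100)
```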
4.1.2 Defining contrasts for analysis
You have some flexibility in how you decide to draw contrasts for purposes of your analysis.
Note that if this were a project that involved exploratory data analysis and predictive modeling, then using the test data for exploratory analysis would be prohibited. For purposes of this class project, however, if you are doing exploratory work you are welcome to use the expert-labeled individuals. It will just be important to make sure that no insights from that ever go into developing predictive models that are evaluated on the same dataset.
4.2 Project B: Supervised classification
The goal here is to develop a classifier to identify “positive” users from controls, using linguistic and possibly other features.
4.2.1 Defining the classification task
Even with that, there are actually several different possible tasks you could be doing, depending on what information you make available to the system. Zirikly et al. define three variations:
• Task A is about risk assessment: the task simulates a scenario in which there is already online evidence that a person might be in need of help, and the goal is to assess the degree of risk from what they posted. This task uses the smallest amount of data, with each user typically having no more than a few SuicideWatch posts. This would be a binary classification of severe-risk (d) versus the lower-risk categories (a-c).
• Task C is about screening. Here predictions are made only from users’ posts that are not on SuicideWatch (even though ground truth about the individual was determined by looking at their SW posts). This task simulates a scenario in which someone has opted in to having their social media monitored (e.g., a new mother at risk for postpartum depression, a veteran returning from a deployment, a patient whose therapist has suggested it), and the goal is to identify whether they are at risk even if they have not explicitly presented with a problem. This task would be a binary classification of severe-risk (d) versus controls, and you could also train or develop your system using labeled data in categories (a-c), or not, as you see fit. (Note that this definition is different from the Zirikly et al. Task C, which used d versus a-c just as in the other tasks.)
You are welcome to choose which of these three tasks you are going to work on. Task A most closely resembles standard supervised classification tasks, although in this case you might have multiple documents per individual and it is the individual, not the document, that gets labeled. Task B is a variation where the really interesting question is how you might be able to exploit more information about the individual if you had it. Task C is undoubtedly the most challenging, since the idea is to find people who are at severe risk without any evidence that they have reached out for help on a peer-support group.
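Whichever task you choose, the label mapping itself is simple; here is one sketch, assuming the risk labels are stored as the letters a-d plus a "control" marker (that representation is an assumption, not part of the dataset spec):

```python
def task_a_label(risk):
    """Task A: severe risk (d) vs. the lower-risk categories (a-c)."""
    return 1 if risk == "d" else 0

def task_c_label(risk_or_control):
    """Task C: severe risk (d) vs. controls. Users labeled (a-c) may
    optionally be used during training, so they map to neither class here."""
    if risk_or_control == "d":
        return 1
    if risk_or_control == "control":
        return 0
    return None  # (a-c): up to you whether/how to use these in training

labels = [task_a_label(r) for r in ["a", "c", "d"]]
```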
4.2.2 Classification approach
All of the possibilities in Section 4.1 are certainly fair game in terms of features, as are the symptoms or characteristics in Section 3.2, and of course you can propose other elements of analysis that might be predictive, if you like. But the thing I definitely do not want you to do is to simply treat this as a generic supervised classification problem, e.g. by dumping bag-of-words features into a large-margin classifier (the old fashioned approach) or by simply dumping the data into a standard deep learning pipeline where you start with BERT and a typical classifier architecture, fine-tune on the task, train, and test. It’s fine to do things like that as baselines for comparison, and certainly fine to build on standard approaches and packages — don’t reinvent the wheel! — but if you’re not doing anything that involves thinking about this particular problem and data, you’re not really doing the project that I’m assigning. I care much more about you connecting what you’re doing here with what you’ve learned over the semester, than I do about your system actually performing well on a classification task.
As a detail to consider that’s particular to this project and different from most other supervised classification problems, one key decision to make is whether to use all of the data from a user as a single training instance (“document”), or whether to do something more sophisticated in terms of modeling. Among the things to consider:
• The fact that a user is a positive instance certainly does not mean that everything they post reflects their suicidality.
There are lots and lots of possibilities. What’s most important is being thoughtful about your choices, demonstrating your understanding/proficiency with material we’ve covered in the course, and producing a strong, convincing report of what you did.
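To make the instance-construction decision concrete, here are the two simplest options side by side (the data format is hypothetical; note the per-post variant inherits the label-noise problem described above):

```python
def user_as_one_document(posts_by_user):
    """One instance per user: all of their posts concatenated."""
    return {u: " ".join(ps) for u, ps in posts_by_user.items()}

def per_post_instances(posts_by_user, labels):
    """One instance per post, inheriting the user's label. Noisy:
    not every post by a positive user reflects suicidality."""
    return [(p, labels[u]) for u, ps in posts_by_user.items() for p in ps]

data = {"u1": ["post one", "post two"], "u2": ["another post"]}
docs = user_as_one_document(data)
pairs = per_post_instances(data, {"u1": 1, "u2": 0})
```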
4.2.3 Evaluation
Evaluation should include F1; you may also want to report a ROC curve (http://en.wikipedia.org/wiki/Receiver_operating_characteristic). I recommend doing this also, although it’s optional.
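Since predictions here are per user (not per post), a user-level precision/recall/F1 computation is straightforward; a self-contained sketch:

```python
def f1_score(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (severe-risk) class,
    computed over one prediction per user."""
    tp = sum(t == positive == p for t, p in zip(y_true, y_pred))
    fp = sum(p == positive != t for t, p in zip(y_true, y_pred))
    fn = sum(t == positive != p for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

prec, rec, f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```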
Error analysis. I don’t expect you to spend as much time digging around in the data as if you were doing Project A (exploratory data analysis). However, in a supervised classification setting one really good way to get into the data is via error analysis. You can do this in formative evaluations, in the process of improving your system, or you can do it as part of the summative evaluation of the system, looking at the performance of the classifier on the real test data at the end, or you can do both.
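As a sketch of what summative evaluation plus the error analysis described above might look like, assuming scikit-learn and using entirely hypothetical predictions: compute F1 and the area under the ROC curve, then pull out the misclassified instances for manual inspection.

```python
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical gold labels, hard predictions, and classifier scores
# for six test instances.
y_true  = [1, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.4, 0.2, 0.7, 0.8, 0.1]
texts = ["post a", "post b", "post c", "post d", "post e", "post f"]

f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)   # area under the ROC curve

# Error analysis: collect false positives and false negatives separately
# for manual inspection -- the two kinds of errors often have different causes.
false_pos = [t for t, g, p in zip(texts, y_true, y_pred) if g == 0 and p == 1]
false_neg = [t for t, g, p in zip(texts, y_true, y_pred) if g == 1 and p == 0]
```

Reading through `false_pos` and `false_neg` by hand, formatively during development or summatively on the final test data, is exactly the kind of “getting into the data” this section has in mind.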
5 What you need to do
Follow the top-level guidance for all projects in terms of proposal, deliverables, etc. Here are some notes worth bearing in mind particularly for Project Ideas A and B.
Project proposal. This project is deliberately underspecified: the first part of your job, assuming you’ve figured out who you’ll work with, is to scope out a project that will be feasible within the necessary time frame. The biggest risk here is carving off a project that is too ambitious to be done with the time you have, so try to propose a project plan that describes what your group plans to try in enough detail that we can steer you away from approaches that are likely to get you bogged down. Make sure to leave room for unanticipated problems — messy data that could need to be cleaned up, etc. As I emphasize further below, this is not a textbook exercise; you’re playing with real-world stuff, and real-world problems are unpredictable.
I strongly recommend that you look at relevant papers to (a) identify relevant properties of language that you’re going to explore, and (b) sketch out how you plan to operationalize those properties algorithmically. As part of this process, I recommend taking a look at the dataset. (To be pure, don’t look at test data.)
Note that you are more than welcome to use/adapt off-the-shelf code rather than implementing things yourselves. In fact, this is strongly encouraged. It’s better to spend your time exercising what you’ve learned, not creating your own from-scratch implementations of SVM classification, LDA, deep learning classifiers, syntactic parsing, etc. You can also use the class discussion board to talk about code, implementations, etc. No group will be penalized for intellectual generosity in sharing what they learn with other groups! Please just make sure to acknowledge others in your writeup at the end if they were helpful to your group, saying explicitly what they did that helped.
Project writeup. See the main top-level project page for what’s expected in writeups. Here’s some additional guidance particularly for these project choices.
• Introduction. High level description of what you decided to do and why, and what you expected (or at least hoped) to get out of it. Although in principle it would be good for you to get practice at providing a motivating introduction like I gave in Section 1, the way that people do in a conference or journal paper, you do not need to do so, unless you’ve got something new to say that hasn’t been said above or in previous literature. I already know what I said and I’ve already read a lot of the previous literature, and I would much rather you spend your time on the sections of the writeup that really matter. Definitely do not just spit back material from this document.
• Data and methods. Some recommended things to include:
1. Any data and resources you used other than what I’ve already given you. (For the latter you can just say you used the data and resources that you were given for the project.) For any other data or resources: how you got them, basic properties (size, etc.). If applicable, include anything you needed to do with the data or resources (including what I gave you) in order to work with them.
2. Basic information about preprocessing, e.g. how you did tokenization, removal of stopwords (if you did that), etc.
3. Relevant descriptions of which language (and metadata, if applicable) characteristics you looked at and why. Make sure to cite relevant sources as appropriate. (Please use a citation style that includes the authors inline, e.g. “a fantastic paper (Resnik, 1999)”, not “a fantastic paper [13]”.)
Include a description of what you did to operationalize or implement the text analysis to capture those characteristics. You do not need to regurgitate textbook- or article-style descriptions of existing algorithms; if you are using something that exists rather than designing something new, just point to the source (bibliographic reference and, if relevant, where you got the code). Again, note that you are not required to invent new things or implement from scratch for this project; applying what you’ve learned to this new problem space is fine. However, you do need to provide relevant details. As a good example, consider something like this excerpt (made up for purposes of illustration):
“To obtain word classes based on topic modeling, we trained Chang’s implementation of sLDA (Blei et al. 2008, http://cran.r-project.org/web/packages/lda) with 40 topics, using each author’s combined set of posts as the document, and that author’s group (+1 for positive, -1 for control) as the response variable. We chose k = 40 as the number of topics by trying values between 20 and 50 to see which worked best on heldout (dev) data. See Table 3 for the 40 topics, and see Appendix A for excerpts from several documents, modified to preserve anonymity, along with the posterior distribution of topics for each example document.”
• Relevant information about any other algorithms and models, e.g. PCA, supervised classification, etc. Identify what approach you took and why, which software you used and its relevant parameters, etc.
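As a concrete companion to the sLDA excerpt above, here is a minimal sketch of the “choose the number of topics on heldout (dev) data” idea, using scikit-learn’s plain (unsupervised) LDA as a stand-in for sLDA, on a made-up toy corpus; everything here is hypothetical illustration, not a prescribed recipe.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical tiny corpus; in practice, e.g., one document per author.
train_docs = ["sleep tired awake night", "game team win score",
              "night sleep dream tired", "score win play team"]
dev_docs = ["tired night sleep", "team game score"]

vec = CountVectorizer()
X_train = vec.fit_transform(train_docs)
X_dev = vec.transform(dev_docs)

# Select the number of topics by heldout (dev) perplexity, as in the
# quoted example (which used sLDA; plain LDA stands in here).
best_k, best_perplexity = None, float("inf")
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    lda.fit(X_train)
    perplexity = lda.perplexity(X_dev)  # lower is better
    if perplexity < best_perplexity:
        best_k, best_perplexity = k, perplexity
```

Note that sLDA additionally conditions on a response variable; this sketch shows only the model-selection-on-dev-data part of the quoted description.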
• Evaluation
1. For exploratory data analysis, present a well structured, informative discussion of what you found (or didn’t find), including examples, figures, tables, etc. as appropriate.
2. For classification, describe how you evaluated what you did in development and final testing, e.g. including details like cross-validation if you used it, evaluation metrics, etc., plus discussion of how you did error analysis and what you found. For discussion of evaluation metrics and presentation of results, see Resnik and Lin (2010), Evaluation of NLP Systems, and good prior papers. As an example of describing development, you might find yourself including a statement like the following:
“We tuned the α and β parameters using a grid search with values of 0.01, 0.05, and 0.1 for each parameter. For testing, we then used the combination of α and β that performed best in the grid search as evaluated using 5-fold cross validation on the training data.”
3. Ethical issues. Read Benton et al., “Ethical Research Protocols for Social Media Health Research”, http://www.ethicsinnlp.org/workshop/EthNLP-2017.pdf#page=106 and discuss each of the issues in Sections 3.1 through 3.8 in terms of what you did or did not do in this project (or, if it’s not relevant, explain why). A sentence or two is fine for these — this doesn’t need to be a focus of the writeup or take a lot of time, but I do want each group to have read and discussed this. Then make sure to include the statement and typed signatures as discussed in 2.2.
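The development procedure in the quoted example above (tune hyperparameters by grid search, scored with 5-fold cross-validation on the training data, then carry the best combination into final testing) maps directly onto scikit-learn’s GridSearchCV. The data and the parameter grid below are hypothetical stand-ins for the α/β example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Hypothetical stand-in data; in the project this would be your training
# split's feature matrix and user-level labels.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Grid over the regularization strength C (playing the role of the
# alpha/beta grid in the quoted example), scored by 5-fold cross-validation.
grid = GridSearchCV(LinearSVC(), param_grid={"C": [0.01, 0.05, 0.1]}, cv=5)
grid.fit(X, y)
best_params = grid.best_params_  # carry this combination into final testing
```

Because the grid search sees only training data, the final test set remains untouched until the single summative evaluation, which is the point of the quoted procedure.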
• Discussion and future work
1. Qualitative discussion and conclusions. In what ways did you succeed, and in which ways didn’t you? Are there any surprises in the data, or interesting things to highlight — more generally, what did you learn? What directions seem most promising for future work?
2. Optionally, any particular difficulties or hurdles you encountered. Please feel free to include ways in which final projects like this could be made better.
References
APA. 2013. The critical need for psychologists in rural America.
American Psychiatric Association. 2013. Diagnostic and Statistical Manual of Mental Disorders, 5th edition. American Psychiatric Association.
Philip J Batterham, Maria Ftanou, Jane Pirkis, Jacqueline L Brewer, Andrew J Mackinnon, Annette Beautrais, A Kate Fairweather-Schmidt, and Helen Christensen. 2015. A systematic review and evaluation of measures for suicidal ideation and behaviors in population-based research. Psychological assessment 27(2):501.
Gillinder Bedi, Facundo Carrillo, Guillermo A Cecchi, Diego Fernández Slezak, Mariano Sigman, Natália B Mota, Sidarta Ribeiro, Daniel C Javitt, Mauro Copelli, and Cheryl M Corcoran. 2015. Automated analysis of free speech predicts psychosis onset in high-risk youths. npj Schizophrenia 1.
Adrian Benton, Glen Coppersmith, and Mark Dredze. 2017. Ethical research protocols for social media health research. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. pages 94–102.
Rafael A Calvo, David N Milne, M Sazzad Hussain, and Helen Christensen. 2017. Natural language processing in mental health applications using non-clinical texts. Natural Language Engineering 23(5):649–685.
Glen Coppersmith, Mark Dredze, Craig Harman, Kristy Hollingshead, and Margaret Mitchell. 2015. CLPsych 2015 shared task: Depression and PTSD on Twitter. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. North American Chapter of the Association for Computational Linguistics, Denver, Colorado, USA.
Zhengzhi Feng, Xiaoxia Wang, Keyu Liu, Xiao Liu, Lifei Wang, Xiao Chen, and Qin Dai. 2015. The neural mechanism of negative cognitive bias in major depression—theoretical and empirical issues. https://www.intechopen.com/books/major-depressive-disorder-cognitive-and-neurobiological-mechanisms/the-neural-mechanism-of-negative-cognitive-bias-in-major-depression-theoretical-and-empirical-issues.
SK Fineberg, S Deutsch-Link, M Ichinose, T McGuinness, AJ Bessette, CK Chung, and PR Corlett. 2015. Word use in first-person accounts of schizophrenia. The British Journal of Psychiatry 206(1):32–38.
Stephan Greene and Philip Resnik. 2009. More than words: Syntactic packaging and implicit sentiment. In Proceedings of human language technologies: The 2009 annual conference of the north american chapter of the association for computational linguistics. Association for Computational Linguistics, pages 503–511.
Sharath Chandra Guntuku, David B Yaden, Margaret L Kern, Lyle H Ungar, and Johannes C Eichstaedt. 2017. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 18:43–49.
Eric Horvitz and Deirdre Mulligan. 2015. Data, privacy, and the greater good. Science 349(6245):253–255.
Oliver P John and Sanjay Srivastava. 1999. The big five trait taxonomy: History, measurement, and theoretical perspectives. Handbook of personality: Theory and research 2:102–138.
Thomas E Joiner Jr, Rheeda L Walker, Jeremy W Pettit, Marisol Perez, and Kelly C Cukrowicz. 2005. Evidence-based assessment of depression in adults. Psychological Assessment 17(3):267.
Thomas E Joiner Jr, Rheeda L Walker, M David Rudd, and David A Jobes. 1999. Scientizing and routinizing the assessment of suicidality in outpatient practice. Professional psychology: Research and practice 30(5):447.
Rohan Kshirsagar, Robert Morris, and Samuel Bowman. 2017. Detecting and explaining crisis. In Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality. Association for Computational Linguistics, pages 66–73. http://aclweb.org/anthology/W17-3108.
Naomi Lee. 2014. Trouble on the radar. Lancet 384(9958):1917.
François Mairesse, Marilyn A. Walker, Matthias R. Mehl, and Roger K. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res. 30:457–500.
D.N. Milne. 2017. Triaging content in online peer-support: an overview of the 2017 CLPsych shared task. Available online at http://clpsych.org/shared-task-2017.
Danielle Mowery, Hilary Smith, Tyler Cheney, Greg Stoddard, Glen Coppersmith, Craig Bryan, and Mike Conway. 2017. Understanding depressive symptoms and psychosocial stressors on twitter: A corpus-based study. Journal of Medical Internet Research 19(2).
James W Pennebaker and Laura A King. 1999. Linguistic styles: language use as an individual difference. Journal of personality and social psychology 77(6):1296.
Dina Popovic, Eduard Vieta, Jean-Michel Azorin, Jules Angst, Charles L Bowden, Sergey Mosolov, Allan H Young, and Giulio Perugi. 2015. Suicide attempts in major depressive episode: evidence from the bridgeii-mix study. Bipolar disorders 17(7):795–803.
Philip Resnik and Jimmy Lin. 2010. Evaluation of NLP systems. The Handbook of Computational Linguistics and Natural Language Processing 57:271.
Philip Resnik, Rebecca Resnik, and Margaret Mitchell, editors. 2014. Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. Association for Computational Linguistics, Baltimore, Maryland, USA. http://www.aclweb.org/anthology/W/W14/W14-32.
Kathleen Sibelius. 2013. Increasing access to mental health services. http://www.whitehouse.gov/blog/2013/04/10/increasing-access-mental-health-services.
Andrew Yates, Arman Cohan, and Nazli Goharian. 2017. Depression and self-harm risk assessment in online forums. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, pages 2968–2978. https://www.aclweb.org/anthology/D17-1322.