I am excited to be given the opportunity to co-host a tutorial on the
textnets package for R at the 2018 European Symposium Series on Societal Challenges with my amazing colleagues Taylor Brown and Marcus Mann. The symposium will take place in Cologne, Germany from December 5 to 7 with our workshop and many more scheduled for the first day of the event.
NB: This post describes the tentative schedule and will be updated as we finalize the details for the tutorial.
- December 5, 2018.
The tutorial will introduce participants to a newly developed R package that leverages techniques from network analysis/graph theory to improve and innovate on approaches to automated text analysis. There is growing interest in automated detection of latent themes in large-scale, unstructured text data. While topic models have become a popular choice for such tasks, our alternative provides several significant advantages over these conventional “bag of words” models. Three distinct advantages of our network-based approach are: 1) the potential to apply the concept of triadic closure to identify the meaning of words, i.e. the meaning of any two connected words can be understood more accurately in the context of a third word; 2) the greater flexibility in document length compared to the generally required long texts for topic models, which is a significant advantage in an age where short social media messages are pervasive; and 3) the incorporation of recent advances in community detection, which provides an innovative way to group words and documents by leveraging clustering observed within networks.
Tutorial participants will acquire a robust understanding of the fundamentals of both automated text analysis and network analysis, and will be encouraged to explore how their combination can improve upon existing text analysis methods. The main objective of the tutorial will be to enable participants to apply
textnets methods to their own research questions using our R package. In doing so, participants will develop the skills to tailor the existing functionality of the package to their particular needs. As a second objective and in the spirit of open science, participants will be introduced to the tools and given the opportunity to contribute to our open-source repository on Github–suggesting changes or extending the functionality of the package for the benefit of future users.
The tentative structure of the tutorial will be as follows:
Introduction and motivation for the
- What is automated text analysis, what are its applications, and what are its limitations?
- History: Content analysis
- Methods: Topic models and word embeddings
- Limitations of current methods
- What is network analysis and how can it address these limitations?
- Network analysis basics: nodes, edges, network measures
- Meaning from triadic relationships
- Lower threshold for required text lengths
- Two mode networks and their visual representation (documents connected by words and vice versa)
- Network measures & community detection
- Polar ties for opposing views on topic
Introduction of the package and example application
- What is the textnets package, what are its functions and their arguments?
- Input data format
- Preparing text data
- Creating text networks
- Visualizing text networks
- Interpreting text networks
- Measuring brokerage
- The US gun debate on Twitter - a textnets application
- Introduction of corpora of tweets
- Show preparation of tweets
- Identifying communities in Twitter account text network
- Identifying topics in communities
Hands-on example application and brainstorming application ideas
- BYOData or try out textnets on a dataset we provide
- Small group discussions to identify promising applications
Concluding presentations and discussion
- Present participant-generated text networks
- Discussion of brainstorming ideas
- Future textnets developments