We studied the community dynamics of Ubuntu IRC forums over one month. Through this study, we tracked the development of the chat network over time and examined dynamic participation patterns. We also observed factors that affect a user's embeddedness in the network and their 'importance' as discussion initiators or responders. These findings informed a web application that predicts the best times for discussion on a given topic for practical collaboration.
Project Type: social network analysis, data visualization, front-end development
Role: graph visualization, data mining, UX design, prototyping, front-end development
Tools: Python (BeautifulSoup, scikit-learn) for data mining, Gephi for graph visualization, Python (matplotlib, seaborn) for visualization, Figma for design and prototyping, Python (Tkinter) for app development
Date: February - April 2019
Code: GitHub
Open source software holds advantages over proprietary software in terms of security, cost, and flexibility. Because these products are widely used, new releases are deployed quickly and their performance is assessed in the field. Developers often run specialized support forums on Internet Relay Chat (IRC) channels to help users navigate these changes. These channels provide opportunities to apply computational methods to improve collaboration across a large community.
The primary objective of this study was to decipher connections and restructure communication strategies among users in the channel to facilitate information flow. These strategies could guide a variety of users towards achieving their desired objectives via online communication. Our analysis aimed to:
The biggest advantage of the Ubuntu IRC over other IRCs is that it follows a peer-to-peer approach to primarily software-related discussions, so research on it can carry over to other domains such as business, collaborative learning, and military control. As more people join the IRC, they bring diverse approaches to problem-solving, which increases the quality of solutions. The community thus evolves into a goal-oriented one whose primary objective is to facilitate discussions.
Our dataset contained 601,705 messages from 9,731 users. To build the chat corpus, we scraped Ubuntu technical support channel logs. Each message had four relevant properties: channel, timestamp, username, and data.
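As a rough illustration of how a scraped log line could be turned into the four properties listed above, here is a minimal sketch. The line shape "[HH:MM] <username> text", the names parse_line and Message, and the idea that the channel and date come from the log file rather than the line itself are all assumptions made for this example, not details taken from the project code.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape: "[HH:MM] <username> message text".
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\]\s+<([^>]+)>\s?(.*)$")


class Message(NamedTuple):
    channel: str
    timestamp: str  # "YYYY-MM-DDTHH:MM"
    username: str
    data: str


def parse_line(line: str, channel: str, date: str) -> Optional[Message]:
    """Return a Message, or None for lines that are not chat messages."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. join/part notices or blank lines
    hour, minute, username, data = match.groups()
    return Message(channel, f"{date}T{hour}:{minute}", username, data)
```

Lines that do not match the assumed shape are skipped rather than guessed at, which keeps system notices out of the corpus.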
Participants were divided into 'experts' and 'non-experts': users with high network authority scores (nodes with a high volume of incoming and outgoing content) were labelled experts, and the rest were labelled non-experts. This notion of role-influenced behaviour, borrowed from centralized systems, let us explore informally how these social interactions look when such roles are mapped onto conversations.
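One way such a split could be computed is sketched below. It assumes the authority score is a HITS-style score obtained by power iteration over weighted sender-to-recipient edges, and that 'experts' are the top fraction of users by that score. Both the algorithm choice and the top_fraction cutoff are illustrative assumptions; the write-up does not state the exact scoring method or threshold.

```python
def hits(edges, iterations=50):
    """HITS-style hub and authority scores for weighted directed edges.

    edges maps (source, target) pairs to a positive weight.
    """
    nodes = {node for pair in edges for node in pair}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of nodes pointing in.
        auth = dict.fromkeys(nodes, 0.0)
        for (src, dst), weight in edges.items():
            auth[dst] += weight * hub[src]
        total = sum(auth.values()) or 1.0
        auth = {node: score / total for node, score in auth.items()}
        # Hub: weighted sum of authority scores of nodes pointed to.
        new_hub = dict.fromkeys(nodes, 0.0)
        for (src, dst), weight in edges.items():
            new_hub[src] += weight * auth[dst]
        total = sum(new_hub.values()) or 1.0
        hub = {node: score / total for node, score in new_hub.items()}
    return hub, auth


def label_experts(auth, top_fraction=0.1):
    """Label the highest-authority users as experts (assumed cutoff)."""
    ranked = sorted(auth, key=auth.get, reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:cutoff])
```

A fixed fraction is only one possible cutoff; an absolute score threshold would work the same way with this structure.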
We used reply structure and word context techniques to extract a network from the data. Because IRC etiquette states that users address each other directly ('@___'), we were able to establish ties between participants. The network consisted of outward links from message senders to recipients. Multiple references built stronger ties between the corresponding nodes (users). Thus, our directed weighted network consisted of IRC users as nodes, with the edge weights corresponding to their communication frequency. Nodes were sized according to the betweenness centrality of the participant. Finally, sub-networks were coloured according to their modularity classes.
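The tie-building step described above can be sketched roughly as follows. This assumes mentions take the form '@name' as described, that a mention only counts when it names a known user, and that self-mentions are ignored. The function name build_edges and the (sender, text) input shape are hypothetical, and the centrality and modularity steps done in Gephi are not reproduced here.

```python
import re
from collections import Counter

# Assumed mention form: "@" followed by a username made of word characters.
MENTION_RE = re.compile(r"@(\w+)")


def build_edges(messages, known_users):
    """Count directed sender -> recipient ties from '@name' mentions.

    messages is an iterable of (sender, text) pairs. The returned Counter
    maps (sender, recipient) to the number of mentions, i.e. the edge weight.
    """
    edges = Counter()
    for sender, text in messages:
        for recipient in MENTION_RE.findall(text):
            if recipient in known_users and recipient != sender:
                edges[(sender, recipient)] += 1
    return edges
```

Each key is one directed edge and each count is its weight, so repeated references between the same pair of users strengthen the tie, as in the network described above.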
Often, participants on an open forum such as IRC are subjected to long wait times until their queries are resolved. If an expert is unavailable for a long time, the question can get buried under others and eventually remain unsolved, leading to a lossy information exchange. We identified the times of day when experts discussed a topic to help new users predict when their query would most likely be resolved.
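A minimal sketch of this idea: count expert messages that mention a topic keyword, grouped by hour of day, and return the busiest hours. The function name busiest_hours, the (timestamp, username, text) tuple shape, the ISO-style timestamps, and plain whitespace tokenization are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(messages, experts, keywords, top_n=3):
    """Rank hours of day by how many expert messages mention a keyword.

    messages: iterable of (timestamp, username, text), with timestamps
    such as "2019-02-01T14:05". experts and keywords are sets of strings.
    """
    counts = Counter()
    for timestamp, username, text in messages:
        if username not in experts:
            continue
        words = set(text.lower().split())
        if words & keywords:  # at least one topic keyword appears
            counts[datetime.fromisoformat(timestamp).hour] += 1
    return [hour for hour, _ in counts.most_common(top_n)]
```

For example, calling busiest_hours(messages, experts, {"grub"}) would suggest the hours at which a new user asking about GRUB is most likely to find an expert active.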
The machine learning approaches we used for studying linguistic behaviour and topic models[1] are based on the assumption that the corpus is dynamic. Therefore, we developed an indicative term-based categorization approach using chat sessionization and keyword mining. We restricted the report to hour-long bins so that the analysis was neither too narrow nor too broad.
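Sessionization and keyword mining could look roughly like the sketch below. It assumes a session ends when the gap between consecutive messages exceeds a fixed threshold, and that indicative terms are simply the most frequent non-stopword tokens in a session. The 10-minute gap, the function names, and the tokenization are illustrative assumptions rather than the project's actual parameters.

```python
from collections import Counter
from datetime import datetime, timedelta


def sessionize(messages, max_gap_minutes=10):
    """Split (timestamp, username, text) tuples into time-gap sessions."""
    sessions, current, previous = [], [], None
    gap = timedelta(minutes=max_gap_minutes)
    # ISO-style timestamps in one format sort chronologically as strings.
    for message in sorted(messages, key=lambda m: m[0]):
        moment = datetime.fromisoformat(message[0])
        if previous is not None and moment - previous > gap:
            sessions.append(current)
            current = []
        current.append(message)
        previous = moment
    if current:
        sessions.append(current)
    return sessions


def top_terms(session, stopwords, n=3):
    """Return the n most frequent non-stopword tokens in a session."""
    counts = Counter(
        word
        for _, _, text in session
        for word in text.lower().split()
        if word not in stopwords
    )
    return [term for term, _ in counts.most_common(n)]
```

The terms returned for each session can then serve as its topic label before the sessions are grouped into hour-long bins.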
I incorporated this approach into a user-facing and administrator-operated message analysis system.
It was interesting to examine IRCs because "people who are located in geographically distant locales, who are of different national and linguistic backgrounds, and who might otherwise never come into contact, can engage in real-time interactions that resemble the immediacy of in-person face-to-face encounters"[2]. We provided:
[1] Guille, Adrien, Hakim Hacid, Cécile Favre, and Djamel A. Zighed. “Information diffusion in online social networks: A survey.” ACM SIGMOD Record 42, no. 1 (2013): 17-28.
[2] Paolillo, John. “The virtual speech community: Social network and language variation on IRC.” Journal of Computer-Mediated Communication 4, no. 4 (1999).