#Stop the spread

--

Source of the image: https://www.theguardian.com/environment/2020

With the second wave of COVID-19 about to start, there are calls to raise public awareness about masking up to stop the spread and save as many lives as possible. As of October 2020, the number of people infected with COVID-19 worldwide had exceeded 34 million, and the number in the United States had exceeded 7 million (source: Johns Hopkins Coronavirus Resource Center).

With that in mind, I wanted to explore the network topology of #stopthespread on Twitter: the structure of the tweeting around this hashtag, its vertices and edges, and its most influential nodes. I used NodeXL, with a dataset extracted from Twitter. NodeXL supports data extracted from YouTube, Twitter, and Flickr, and of these Twitter is the most relevant medium for this hashtag. NodeXL also used to support data extracted from Facebook, but that feature has been deactivated because of changes in Facebook's data policies. I extracted 1000 tweets posted between April and October 2020.

Nodes are the Twitter accounts (usernames) that appear in tweets containing #stopthespread. Edges reflect the relationship between nodes: an account tweeted the hashtag, retweeted another account, mentioned another account, or replied to one.
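To make the node and edge definitions concrete, here is a minimal sketch in plain Python, independent of NodeXL. The record fields (`author`, `retweet_of`, `reply_to`, `mentions`) and the sample tweets are hypothetical, and representing a plain tweet as a self-loop is an assumption made for this sketch rather than something taken from the dataset.

```python
from collections import Counter

def build_edges(tweets):
    """Turn tweet records into (source, target, relationship) edges."""
    edges = []
    for t in tweets:
        author = t["author"].lower()
        if t.get("retweet_of"):
            edges.append((author, t["retweet_of"].lower(), "Retweet"))
        if t.get("reply_to"):
            edges.append((author, t["reply_to"].lower(), "Replies to"))
        for name in t.get("mentions", []):
            edges.append((author, name.lower(), "Mentions"))
        # Assumption: a tweet that involves no other account is a self-loop.
        if not (t.get("retweet_of") or t.get("reply_to") or t.get("mentions")):
            edges.append((author, author, "Tweet"))
    return edges

# Hypothetical sample records, not taken from the #stopthespread data.
tweets = [
    {"author": "alice", "mentions": ["ONThealth"]},
    {"author": "bob", "retweet_of": "alice"},
    {"author": "carol", "reply_to": "bob"},
    {"author": "dave"},
]

edges = build_edges(tweets)
vertices = {v for s, t, _ in edges for v in (s, t)}
by_relationship = Counter(rel for _, _, rel in edges)
print(len(vertices), "vertices,", len(edges), "edges")
print(dict(by_relationship))
```

Counting the edges by their relationship label, as in the last lines, gives the same kind of breakdown as a tweet, retweet, mention, and reply summary.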

The data is composed of 1153 vertices and 1587 edges. The diagram below shows the big-picture connections between the vertices and the 1587 edges between them. There are three big clusters (marked with blue arrows). Later in the report, I discuss the top three clusters in more detail, with a zoomed-in view of each.

Fig 1: Vertices and edges for #stopthespread

Fig 2 shows the breakdown of the 1587 edges by the relationship between pairs of vertices: tweet, retweet, mention, or reply.

Fig 2: Edges by Relationship

The table below shows the top 20 vertices sorted in descending order by in-degree, so it lists the top influencers by in-degree, together with their centrality measures based on betweenness (bridges) and closeness. The most-referenced vertices are ONT Health, Karen Carter Peterson, Rep. Party of Louisiana, Governor Tony Evers, PA Dept. of Health, and Ivana Yelich. Sorting the vertices by out-degree alone does not provide any useful information, because the accounts with the highest out-degree are not well-known users.
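For readers who want to see what these measures compute, the sketch below implements in-degree, out-degree, betweenness, and closeness in plain Python on a tiny hypothetical graph. It uses common textbook definitions: betweenness counts shortest paths on the undirected version of the graph (Brandes' algorithm), and closeness is the inverse of the sum of shortest-path distances. Both choices are assumptions, and NodeXL's exact conventions and normalization may differ.

```python
from collections import defaultdict, deque

def degrees(edges):
    """In- and out-degree over unique directed edges, ignoring self-loops."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for s, t in set(edges):
        if s != t:
            out_deg[s] += 1
            in_deg[t] += 1
    return in_deg, out_deg

def undirected_adjacency(edges):
    adj = defaultdict(set)
    for s, t in edges:
        adj[s], adj[t]  # make sure both endpoints exist as keys
        if s != t:
            adj[s].add(t)
            adj[t].add(s)
    return dict(adj)

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

def closeness(adj, v):
    """Inverse of the summed shortest-path distances from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1 / total if total else 0.0

# Hypothetical star-shaped graph: three accounts mention "hub", "hub" mentions "d".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "d")]
in_deg, out_deg = degrees(edges)
adj = undirected_adjacency(edges)
bc = betweenness(adj)
top = sorted(adj, key=lambda v: in_deg[v], reverse=True)
print(top[0], in_deg[top[0]], bc[top[0]], closeness(adj, top[0]))
```

In this toy graph the hub has the highest in-degree and sits on every shortest path between the other accounts, which is why it also has the highest betweenness.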

Table 1: Top vertices based on in-degree

The table below shows the top vertices sorted in descending order by total degree (in-degree plus out-degree).

Table 2: Top vertices based on Degree (in and out)

Then, for the top 10 vertices in #stopthespread from the table above, I ran another query to retrieve the last 500 tweets of each of those 10 accounts, to see the linkage between them and their centrality measures. The results are shown in Table 3. The 10 vertices (the top 10 by in-degree in #stopthespread) have 4954 edges between them, among those that are connected.

Table 3: Centrality for top vertices

The diagram below shows the connections between the top 10 vertices that tweeted or retweeted about stopping the spread and have high in-degree, based on the second query, which retrieved the latest 500 tweets for those 10 accounts.

Fig 3: Edges between Top vertices
Fig 4: Total overall tweets to date by top 10 vertices

Network Topology (Types of Nodes for Top Influencers, Based on In-Degree and Followers):

Out of the 1153 vertices, only 57 had one million followers or more. I selected a sample of those vertices to see the shape of the network around each vertex individually, showing up to 2.5 levels of adjacent vertices in each subgroup.
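The idea of showing a vertex with a fractional number of levels of adjacent vertices can be sketched as follows. This is a plain Python illustration on a hypothetical edge list, not NodeXL's implementation. It assumes the usual reading of the levels: the whole number is how many hops from the chosen vertex are included, and the extra ".5" also keeps the edges that connect two vertices in the outermost ring.

```python
from collections import deque

def ego_subgraph(edges, ego, levels):
    """Return (vertices, edges) within `levels` of `ego` (e.g. 1.0, 1.5, 2.0, 2.5)."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)

    depth = int(levels)                # 2.5 -> 2 hops
    keep_outer_edges = levels > depth  # the ".5" part

    # Breadth-first search, stopping at the requested depth.
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)

    kept = []
    for s, t in edges:
        if s in dist and t in dist:
            both_outer = dist[s] == depth and dist[t] == depth
            if keep_outer_edges or not both_outer:
                kept.append((s, t))
    return set(dist), kept

# Hypothetical edge list; "ego" stands in for the selected account.
edges = [("ego", "a"), ("b", "ego"), ("a", "b"), ("a", "c"),
         ("b", "d"), ("c", "d"), ("d", "e")]

for levels in (1.0, 1.5, 2.0, 2.5):
    vertices, kept = ego_subgraph(edges, "ego", levels)
    print(levels, sorted(vertices), len(kept))
```

Going from 2.0 to 2.5 adds no new vertices here; it only adds the edge between `c` and `d`, which are both two hops away from the selected vertex.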

Based on the diagrams below, Donald Trump and Melania Trump appear to be hub-type nodes, while Joe Biden, WHO, the White House, and the CDC are closer to bridge-type nodes.

Fig 5: Donald Trump. Betweenness centrality is 12180 and closeness centrality is 0.002.
Fig 6: Joe Biden. Betweenness centrality is 12895 and closeness centrality is 0.002.
Fig 7: WHO. Betweenness centrality is 358 and closeness centrality is 0.021.
Fig 8: CDC. Betweenness centrality is 3892 and closeness centrality is 0.001.
Fig 9: The White House. Betweenness centrality is 100 and closeness centrality is 0.001.
Fig 10: Melania Trump. Betweenness centrality is 2349 and closeness centrality is 0.001.

Top Three Clusters:

Fig 11 shows the largest cluster, which is the top-right cluster in Fig 1, zoomed in to show the nodes and edges clearly. There are 162 vertices in this cluster. Public figures and well-known accounts in this cluster include the House Democrats, Fox News, CNN, Donald Trump, Joe Biden, Nancy Pelosi, the White House, POTUS, and FLOTUS.

Fig 11: Largest Cluster in the data set

Fig 12 shows a close-up view of the second-largest cluster. The vertex in the center is the official account of Ontario’s Ministry of Health. The account is followed by 609 and is following 72,292. It had posted 8,671 tweets as of October 2020. ONT Health is in a group composed of 115 vertices.

Fig 12: Second Largest Cluster (In the Center- ONTHealth)

Fig 13 shows the third-largest cluster, which is the top-left cluster in Fig 1. There are 29 vertices in this cluster. Public figures and well-known accounts in this cluster include WHO, WHO East Asia, BBC StoryWorks, UNICEF India, and the National AIDS Control Organization.

Since this cluster includes UNICEF India and WHO East Asia, most of the user accounts grouped here are located in the same region.

Fig 13: Third largest Cluster

Words appearing with this Hashtag

The table below lists some of the words that were paired with or appeared alongside #stopthespread. It was created through Graph Metrics > Words and Word Pairs, with the options then set to tweets.
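As a rough illustration of what a word and word-pair count does, the sketch below tallies single words and adjacent word pairs across a few hypothetical tweets in plain Python. The stop-word list, the tokenizing rule, and the sample texts are all assumptions made for this example; NodeXL's own word lists and counting rules are not reproduced here.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not NodeXL's list).
STOPWORDS = {"the", "to", "a", "and", "of", "in", "is", "for", "rt"}

def tokens(text):
    """Lowercase the text, drop URLs, keep words, hashtags, and mentions."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [w for w in re.findall(r"[#@]?\w+", text) if w not in STOPWORDS]

def word_and_pair_counts(texts):
    words, pairs = Counter(), Counter()
    for text in texts:
        toks = tokens(text)
        words.update(toks)
        # Pairs are adjacent tokens after stop words have been removed.
        pairs.update(zip(toks, toks[1:]))
    return words, pairs

# Hypothetical tweets, not taken from the dataset.
sample = [
    "Wear a mask to #StopTheSpread https://example.com/x",
    "Wear a mask, wash your hands #stopthespread",
]

words, pairs = word_and_pair_counts(sample)
print(words.most_common(3))
print(pairs.most_common(3))
```

Because stop words are dropped before pairing, "wear a mask" contributes the pair ("wear", "mask") in both sample tweets.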

Table 4: words appearing with #stopthespread

Finally, this report presents information and some analysis of 1000 tweets from #stopthespread posted between April and October 2020. One limitation of the study is the small sample size, but that was chosen in order to find a meaningful network topology and connections. The more vertices and edges there are, the harder it can be to draw any insights or conclusions.

No filtering was done by location, which added many vertices outside the U.S., but the result still gives a global view of the connections and spread of vertices.

The edges are based on tweets, so the same report could be replicated using follower links to see what differs in that case.

Another limitation is the absence of influencers who have an impact but do not have a Twitter account. For example, Dr. Fauci is also spreading the message across TV, radio, and other media channels, but he does not use Twitter. Otherwise, he might have been one of the highest-degree vertices.

Regarding the tool, NodeXL is really powerful, but it lacks good tutorials and training materials. I believe deeper analysis could be done with more practice. All the work above was based on trial and error until a meaningful visualization was found. The limited online tutorials for NodeXL are built on data that is not real, so the results look great, but when the same steps are applied to real data (actual tweets) with large values, connections cannot be inferred easily, and it takes a solid idea of the dataset and familiarity with it to organize the graph in a way that reveals meaning.

Tutorials used:

--

--