Whom Might Americans Favor for the 2024 Presidency If They Decide to Run?

Image sources: Michelle Obama, Times.com; Andrew Cuomo, https://www.wxxinews.org/

Michelle Obama vs. Andrew Cuomo

In 2020, users on Twitter called for Michelle Obama and Andrew Cuomo to run in the upcoming presidential election. Although neither has expressed any intention of doing so for now, I was curious to compare Twitter sentiment toward the two public figures to see whom the public favors more!

Twitter was selected as the data source because the calls and hashtags were raised on that platform. Since people use different combinations of hashtags and tags, I selected several pairs of search strings and compared their polarity (sentiment) scores. The pairs are (Michelle Obama vs. Andrew Cuomo), (michelle obama vs. andrew cuomo), (Michelle For President vs. Cuomo For President), (michelle obama vs. cuomo), and finally (Michelle 2024 vs. Cuomo 2024). These combinations were chosen after manually browsing Twitter to see which hashtags the two figures' fans were using.
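For reference, the five pairs can be kept as a list of tuples and looped over, so that every pair goes through the same retrieval and scoring steps. This is a minimal sketch; the `QUERY_PAIRS` name is hypothetical and not taken from the article's code.

```python
# The five (Michelle, Cuomo) search-string pairs described above.
# Keeping them in one list lets every pair run through the same pipeline,
# so case and wording differences can be compared side by side.
QUERY_PAIRS = [
    ("Michelle Obama", "Andrew Cuomo"),
    ("michelle obama", "andrew cuomo"),
    ("Michelle For President", "Cuomo For President"),
    ("michelle obama", "cuomo"),
    ("Michelle 2024", "Cuomo 2024"),
]

for michelle_query, cuomo_query in QUERY_PAIRS:
    print(f"{michelle_query!r} vs. {cuomo_query!r}")
```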

In most of the combinations above, Michelle Obama received higher scores and more positive sentiment. However, one possible source of bias is case sensitivity: writing both names in upper or lower case changed the results and could flip the winner. That is why the pairs were tested in both upper- and lower-case forms.

Bias may also come from my somewhat arbitrary choice of hashtags, which was based on words I thought would help locate the relevant tweets, so other combinations could help increase confidence in the results. The selection and cleaning were done with functions from the Python libraries listed later in the article rather than manually, and no group or set of tweets was intentionally excluded, so I believe there should be no ethical concern regarding selection and filtering.

Another bias became apparent after checking a sample of the retrieved tweets: many were unrelated to the presidency and instead concerned Michelle Obama being on Netflix’s board of trustees. Andrew Cuomo, for his part, had many tweets about COVID and other New York issues. That is why two more pairs were added that specifically include the year “2024” or the phrase “for president”.

The data were retrieved and analyzed with these Python libraries: tweepy, textblob, preprocessor (including preprocessor.api), statistics, and typing.

First, an account was created on developer.twitter.com to get the API key and API secret key. Then, four functions were developed: one to retrieve tweets, one to clean them, one to determine their sentiment (polarity), and one to calculate the average sentiment score. Further details about the four functions are given below. Regarding bugs: I have experience with Python, but text extraction and cleaning functions were new to me. Most bugs were ordinary ones, such as choosing the right built-in function or passing the wrong parameters, and Stack Overflow was used to find solutions.
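A minimal sketch of the authentication step, assuming tweepy and app-only (OAuth 2) authentication, which needs only the API key and API secret key. The `make_api()` helper and the environment variable names `TWITTER_API_KEY` and `TWITTER_API_SECRET` are hypothetical; the article does not show how the keys were stored. Whether the v1.1 search endpoint is available to you depends on your developer account's access level.

```python
import os


def make_api():
    """Build a tweepy API client from the API key and API secret key.

    Assumes the keys live in the hypothetical environment variables
    TWITTER_API_KEY and TWITTER_API_SECRET.
    """
    api_key = os.environ["TWITTER_API_KEY"]
    api_secret = os.environ["TWITTER_API_SECRET"]

    # Imported here so this module still loads where tweepy is not installed.
    import tweepy  # third-party: pip install tweepy

    # App-only authentication uses just the API key and API secret key.
    auth = tweepy.AppAuthHandler(api_key, api_secret)
    # Sleep until the rate-limit window resets instead of raising an error.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the keys out of the source file means the script can be shared without leaking credentials.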

Retrieve_tweets() takes a search word (a string) and uses tweepy.Cursor to retrieve top tweets in English, appending each one to a list of all tweets. The function returns the full text of all the tweets. Separate rounds were run with 10, 100, and 1,000 tweets, and the results were compared across these three values and across the pairs of strings listed above.
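The retrieval step might look like the following sketch, assuming tweepy 4.x, where the v1.1 search method is `API.search_tweets` (tweepy 3.x called it `API.search`). The lower-case name `retrieve_tweets`, the extra `api` and `count` parameters, the `tweet_text()` helper, and reading "top tweets" as `result_type="popular"` are all assumptions rather than the article's exact code.

```python
def tweet_text(status):
    """Return a status's text, preferring the untruncated `full_text`."""
    # With tweet_mode="extended", v1.1 statuses expose `full_text`
    # instead of the (possibly truncated) `text` attribute.
    full = getattr(status, "full_text", None)
    return full if full is not None else status.text


def retrieve_tweets(api, word, count):
    """Collect the text of up to `count` English tweets matching `word`."""
    import tweepy  # third-party: pip install tweepy

    all_tweets = []
    cursor = tweepy.Cursor(
        api.search_tweets,      # tweepy 4.x name; tweepy 3.x used api.search
        q=word,
        lang="en",
        result_type="popular",  # assumed meaning of "top"; "mixed"/"recent" also exist
        tweet_mode="extended",  # request untruncated text
    )
    # items(count) pages through results until `count` tweets are yielded.
    for status in cursor.items(count):
        all_tweets.append(tweet_text(status))
    return all_tweets
```

Passing `count` as a parameter lets one function serve all three rounds instead of keeping a separate copy for each tweet count.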

The images below show the retrieve function repeated three times, once each for 10, 100, and 1,000 tweets. Whether these sample sizes are sufficient is itself an open question.

cleantweets() takes the list of retrieved tweets (strings) and returns a list of cleaned tweets, using the clean() function from the preprocessor.api module as shown below.
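A sketch of the cleaning step, assuming the `tweet-preprocessor` package (imported as `preprocessor`). The snake_case name `clean_tweets` and the optional `cleaner` parameter are hypothetical additions; the parameter only exists so another cleaning function can be swapped in, for example when testing without the library installed.

```python
def clean_tweets(tweets, cleaner=None):
    """Return a cleaned copy of each tweet string in `tweets`.

    By default this uses clean() from tweet-preprocessor's preprocessor.api
    module, which removes items such as URLs, mentions, hashtags and emojis.
    `cleaner` is a hypothetical hook for substituting another function.
    """
    if cleaner is None:
        # pip install tweet-preprocessor
        from preprocessor.api import clean as cleaner
    return [cleaner(tweet) for tweet in tweets]
```

Because clean() removes hashtags by default, a search term such as a hashtag will not appear in the cleaned text that is scored for sentiment.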

sentiment() computes each tweet's sentiment using TextBlob. TextBlob breaks text into words and sentences (it can also extract noun phrases), and its default analyzer assigns each text a polarity score from -1.0 (most negative) to 1.0 (most positive). The function returns a list with one score for each cleaned tweet.
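A minimal sketch of the sentiment step, assuming TextBlob's default analyzer accessed through `TextBlob(text).sentiment.polarity`. The optional `polarity` parameter is a hypothetical hook for substituting another scorer and is not part of the article's function.

```python
def sentiment(tweets, polarity=None):
    """Return one polarity score in [-1.0, 1.0] per cleaned tweet."""
    if polarity is None:
        from textblob import TextBlob  # pip install textblob

        def polarity(text):
            # .sentiment is a (polarity, subjectivity) named tuple.
            return TextBlob(text).sentiment.polarity

    return [polarity(tweet) for tweet in tweets]
```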

The scores() function calls the three retrieve functions (for 10, 100, and 1,000 tweets), passes their output to the three cleaning functions, and then uses mean() from the statistics library to average the sentiment scores. The scores() function is called in the main block shown below, and the output for each number of tweets is printed so the scores for Michelle Obama and Andrew Cuomo can be compared.
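Instead of three copies of the retrieve and clean functions, the sketch below loops over the three round sizes, which is a restructuring of what the article describes. The step functions are passed in as parameters (a hypothetical design) so scores() works with any retrieval, cleaning, and sentiment implementation; mean() comes from the statistics module as in the article. Treating an empty round as neutral is an added assumption.

```python
from statistics import mean

# Round sizes used in the article.
ROUND_SIZES = (10, 100, 1000)


def scores(word, retrieve, clean, sentiment):
    """Return the average polarity for `word` at each round size.

    `retrieve(word, n)`, `clean(tweets)` and `sentiment(tweets)` stand in
    for the retrieval, cleaning and sentiment steps described above.
    """
    averages = []
    for n in ROUND_SIZES:
        polarities = sentiment(clean(retrieve(word, n)))
        # mean() raises StatisticsError on an empty list, so a round with
        # no tweets is treated as neutral (0.0): an assumption, not the article's rule.
        averages.append(mean(polarities) if polarities else 0.0)
    return averages
```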

In the main block, the two strings are entered and then passed to the scores() function, which calls the retrieve(), clean(), and sentiment() functions.

scores() returns three sentiment scores per string (for 10, 100, and 1,000 tweets). An if statement then determines who has the better sentiment across the three values, and the final message is displayed accordingly.
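One way to implement the final comparison is to count how many of the three rounds each name wins, assuming the article's if statement picks the name with more round wins. The `compare()` function and its message format are hypothetical.

```python
def compare(name_a, scores_a, name_b, scores_b):
    """Count rounds won by each name and return the final message."""
    # A round goes to whichever name has the higher average polarity;
    # equal scores count for neither side.
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    tally = f"{name_a} {wins_a} vs. {name_b} {wins_b} rounds"
    if wins_a > wins_b:
        return f"{name_a} has better sentiment ({tally})"
    if wins_b > wins_a:
        return f"{name_b} has better sentiment ({tally})"
    return f"Tie ({tally})"
```

In the main block, the two lists returned by scores() would be passed to compare() and the returned message printed.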

Results vary with the pair entered. In most results, Michelle Obama received higher scores. For some pairs, Cuomo won one round while Michelle won the other two (a round is one run at a given number of retrieved tweets). Surprisingly, the only pair in which Cuomo scored higher in all three rounds was the one with the year 2024 added next to the names (Michelle 2024 vs. Cuomo 2024), as shown below. When the word “President” was included in the search, Michelle won all three rounds. The images below show screenshots of the results for the different pairs.

The results below show each pair highlighted in yellow, along with the average sentiment score for each round. The first three lines are the scores for Michelle Obama; the next three are for Cuomo.

# Both have “For President” after their names

# Both have “2024”

# Both lower case, with Andrew’s last name only

# Both lower case

# First and last names, with initials capitalized

Finally, out of the 16 round outputs, Michelle won 9 times versus 7 for Cuomo. One limitation is the number of tweets retrieved per round. Still, based on what was retrieved, it seems the public may favor Michelle Obama over Cuomo if she runs for the presidency!

Resources for Python: https://stackoverflow.com/
