Outlier Detection in Election Data Using Geospatial Analysis: A Case Study on Ensuring Election Integrity
In the aftermath of the recent elections, the Independent National Electoral Commission (INEC) faced several legal challenges concerning the integrity and accuracy of the election results. Allegations of vote manipulation and irregularities were widespread, prompting a thorough investigation into the matter. Our mission was to uncover potential voting irregularities and ensure the transparency of the election results by identifying outlier polling units where the voting results deviate significantly from neighboring units, indicating potential influences or rigging.
Task Overview
The objective of our analysis was to identify outlier polling units based on the votes each party received. Using geospatial techniques, we aimed to find neighboring polling units and calculate an outlier score for each party in each unit. The goal was to pinpoint polling units where the voting results significantly deviated from their neighbors, indicating potential irregularities or influences.
Methodology
Dataset Preparation
Data Collection:
- We started by downloading the {YOUR_SELECTED_STATE}_crosschecked spreadsheet or CSV file for our selected state of origin.
- The initial dataset lacked geographic coordinates (latitude and longitude) for each polling unit or ward. Therefore, we employed geocoding techniques to obtain these values.
Geocoding:
- Geocoding involves converting addresses into geographic coordinates. We used the Google Maps Geocoding API to perform this task.
- Each address (polling unit) was sent to the API, which returned the corresponding latitude and longitude.
- We appended these coordinates to the original dataset, ensuring that each polling unit had accurate location data.
Here’s an example of how we added the coordinates using Python:
import pandas as pd
from geopy.exc import GeopyError
from geopy.geocoders import GoogleV3

# Load the dataset
df = pd.read_csv('mee.csv')

# Initialize the Google Geocoder
geolocator = GoogleV3(api_key='YOUR_GOOGLE_MAPS_API_KEY')

# Function to geocode addresses; returns (None, None) when the lookup fails
def geocode_address(address):
    try:
        location = geolocator.geocode(address)
    except GeopyError:
        return None, None
    if location is None:  # the API found no match for this address
        return None, None
    return location.latitude, location.longitude

# Apply the geocode function to each address in the dataset
df['latitude'], df['longitude'] = zip(*df['PU-Name'].apply(geocode_address))

# Save the updated dataset
df.to_csv('mee_with_coordinates.csv', index=False)
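Because the geocoding step sends one request per row, repeated polling unit names cost repeated API calls. The following is a minimal sketch of one way to reduce that, assuming a hypothetical helper named `make_cached_geocoder` (it is not part of geopy) that wraps any function mapping an address to a `(latitude, longitude)` pair. The pause length is an assumption and should be tuned to the quota of the API plan in use.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float]]


def make_cached_geocoder(
    lookup: Callable[[str], Coords],
    pause_seconds: float = 0.1,  # assumed pause; tune to your API quota
) -> Callable[[str], Coords]:
    """Wrap `lookup` so each distinct address is requested only once."""
    cache: Dict[str, Coords] = {}

    def cached_lookup(address: str) -> Coords:
        # Treat case and surrounding-whitespace variants as the same address.
        key = address.strip().lower()
        if key not in cache:
            cache[key] = lookup(address)
            time.sleep(pause_seconds)  # throttle only real (uncached) requests
        return cache[key]

    return cached_lookup


# Stand-in for geocode_address so the sketch runs without an API key.
calls = []


def fake_lookup(address: str) -> Coords:
    calls.append(address)
    return 6.5, 3.4  # made-up coordinates for illustration


geocode = make_cached_geocoder(fake_lookup, pause_seconds=0)
geocode("Town Hall I")
geocode("town hall i ")
print(len(calls))  # 1
```

With the real geocoder, the wrapper would be built once as `make_cached_geocoder(geocode_address)` and passed to `df['PU-Name'].apply(...)` in place of `geocode_address`. Failed lookups are cached too, so a name that returns `(None, None)` is not retried within the same run.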
Neighbor Identification
To identify neighboring polling units, we defined a radius of 1 kilometer. Using the geodesic distance method, we calculated the distance between each polling unit and all other units. Units within the defined radius were considered neighbors.
Distance Calculation:
- We utilized the geopy library in Python to calculate the geodesic distance between each polling unit and its potential neighbors.
- A function was created to calculate the distance between two geographic points (latitude and longitude).
from geopy.distance import geodesic

# Geodesic distance in kilometers between two (latitude, longitude) points
def calculate_distance(lat1, lon1, lat2, lon2):
    return geodesic((lat1, lon1), (lat2, lon2)).kilometers
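geopy's `geodesic` measures distance on an ellipsoidal model of the Earth. As a dependency-free cross-check, the haversine formula treats the Earth as a sphere and usually agrees with the geodesic result to within a fraction of a percent, which is small relative to a 1 km radius. The sketch below is an illustration under that spherical assumption, not the method used in the analysis; the name `haversine_km` and the radius constant are choices made here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111.2 km on this sphere.
print(round(haversine_km(0, 0, 0, 1), 1))  # 111.2
```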
Identifying Neighbors:
- For each polling unit, we identified its neighbors based on the predefined radius (1 km).
- We iterated through each polling unit and calculated the distance to all other units. If the distance was within 1 km, the unit was considered a neighbor.
# Rows without coordinates cannot be compared, so drop them first
df = df.dropna(subset=['latitude', 'longitude'])

neighbors = {}
for index, row in df.iterrows():
    current_pu = (row['latitude'], row['longitude'])
    current_neighbors = []
    for idx, compare_row in df.iterrows():
        if index != idx:
            compare_pu = (compare_row['latitude'], compare_row['longitude'])
            distance = calculate_distance(*current_pu, *compare_pu)
            if distance <= 1:  # within the 1 km radius
                current_neighbors.append(idx)
    neighbors[index] = current_neighbors

df['neighbors'] = df.index.map(neighbors)
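The nested loop above measures every pair of units twice, once in each direction. Since distance is symmetric, each pair only needs to be measured once. The sketch below shows that idea with a hypothetical `find_neighbors` function; the `distance_km` argument stands in for the geodesic calculation, and the toy data uses positions along a straight road rather than real coordinates so that it runs without geopy.

```python
from itertools import combinations


def find_neighbors(points, distance_km, radius_km=1.0):
    """Map each key in `points` to the keys of all other points within radius_km.

    points: dict of key -> coordinates (any form `distance_km` accepts)
    distance_km: function taking two coordinate values and returning kilometers
    """
    neighbors = {key: [] for key in points}
    # Distance is symmetric, so each unordered pair is measured only once.
    for (key_a, pos_a), (key_b, pos_b) in combinations(points.items(), 2):
        if distance_km(pos_a, pos_b) <= radius_km:
            neighbors[key_a].append(key_b)
            neighbors[key_b].append(key_a)
    return neighbors


# Toy data: positions along a straight road, in kilometers (not real coordinates).
toy_points = {"A": 0.0, "B": 0.5, "C": 1.5, "D": 5.0}
toy_neighbors = find_neighbors(toy_points, lambda a, b: abs(a - b))
print(toy_neighbors)  # {'A': ['B'], 'B': ['A', 'C'], 'C': ['B'], 'D': []}
```

With the real dataset, `points` would map each row index to its `(latitude, longitude)` tuple and `distance_km` would be `lambda a, b: geodesic(a, b).kilometers`. This halves the number of distance calls, but the work still grows with the square of the number of polling units.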
Outlier Score Calculation
For each polling unit, we compared the votes each party received with those of its neighboring units and calculated an outlier score for each party based on the deviation of votes from neighboring units.
Calculating Differences:
- For each polling unit, we calculated the absolute difference in votes for each party compared to the votes of its neighboring units.
Outlier Score:
- We calculated an outlier score for each party as the absolute difference between the unit’s votes and the average votes of its neighbors.
def calculate_outlier_score(votes, neighbors_votes):
    # Absolute gap between this unit's votes and the mean of its neighbors' votes
    return abs(votes - neighbors_votes.mean())

parties = ['APC', 'LP', 'PDP', 'NNPP']

for party in parties:
    df[f'{party}_outlier_score'] = df.apply(
        lambda row: calculate_outlier_score(row[party], df.loc[row['neighbors'], party]),
        axis=1
    )
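To make the score concrete, here is a small worked example on made-up vote counts for a single party. It is a sketch of the same calculation in plain Python, with one assumption added: a unit with no neighbors gets `None` instead of a score, since there is nothing to compare it against. The function name `outlier_score` and all numbers are illustrative.

```python
def outlier_score(own_votes, neighbor_votes):
    """Absolute gap between a unit's votes and the mean of its neighbors' votes.

    Returns None when the unit has no neighbors, since no comparison is possible.
    """
    if not neighbor_votes:
        return None
    neighbor_mean = sum(neighbor_votes) / len(neighbor_votes)
    return abs(own_votes - neighbor_mean)


# Made-up vote counts for one party, for illustration only.
toy_votes = {"A": 120, "B": 110, "C": 480, "D": 95}
toy_neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"], "D": []}

scores = {
    unit: outlier_score(toy_votes[unit], [toy_votes[n] for n in toy_neighbors[unit]])
    for unit in toy_votes
}
print(scores)  # {'A': 175.0, 'B': 190.0, 'C': 365.0, 'D': None}
```

Unit C stands out with a score of 365, but its high count also inflates the scores of A and B, because C is part of their neighbor averages. This is worth keeping in mind when reading the ranked results.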
Sorting and Reporting
The final step involved sorting the dataset by the outlier scores for each party to identify the most significant outliers. We provided a detailed report explaining the methodology and findings, highlighting the top 3 outliers and their closest polling units.
- We sorted the dataset by outlier scores for each party and identified the top 3 outliers.
# One top-3 table per party, keyed by party name
top_outliers = {p: df.sort_values(by=f'{p}_outlier_score', ascending=False).head(3) for p in parties}
- A detailed report was generated, explaining the methodology, findings, and key insights.
- Visualizations, such as maps and charts, were used to illustrate the results.
Findings
The analysis revealed several polling units with significant deviations in voting results compared to their neighbors. These units were flagged as potential outliers, indicating possible irregularities or influences.
Top 3 Outliers:
- The top 3 outliers for each party were identified and analyzed in detail.
- Each outlier’s closest polling units were examined to understand the deviations.
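Finding the closest polling units to a flagged outlier can be sketched as a sort by distance. The function below is a hypothetical helper, not code from the analysis; as in the neighbor search, `distance_km` stands in for the geodesic calculation, and the toy positions are kilometers along a straight road rather than real coordinates.

```python
def closest_units(target, points, distance_km, k=3):
    """Return the k keys in `points` nearest to `target`, as (key, distance) pairs."""
    distances = [
        (key, distance_km(points[target], pos))
        for key, pos in points.items()
        if key != target  # a unit is not its own neighbor
    ]
    # Sort by distance, nearest first, and keep the first k entries.
    return sorted(distances, key=lambda pair: pair[1])[:k]


# Toy data: positions along a straight road, in kilometers (not real coordinates).
toy_points = {"A": 0.0, "B": 0.5, "C": 1.5, "D": 5.0}
print(closest_units("C", toy_points, lambda a, b: abs(a - b), k=2))
# [('B', 1.0), ('A', 1.5)]
```

Unlike the 1 km neighbor search, this always returns up to `k` units regardless of distance, so isolated outliers still get comparison points.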
Visualization:
- Maps and charts were used to visualize the outliers and their neighboring units, providing a clear picture of the potential irregularities.
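One simple route to a map is to export the flagged units as GeoJSON, which most mapping tools can load directly. The sketch below uses only the standard library and assumes each unit is a dict with `latitude` and `longitude` keys; the function name `to_geojson` and the sample record are made up for illustration and do not describe the visualization tooling used in the analysis.

```python
import json


def to_geojson(units):
    """Build a GeoJSON FeatureCollection from dicts with 'latitude' and 'longitude' keys.

    All other keys are copied into each feature's properties.
    """
    features = []
    for unit in units:
        properties = {k: v for k, v in unit.items() if k not in ("latitude", "longitude")}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [unit["longitude"], unit["latitude"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up record for illustration; real rows would come from the flagged outliers.
sample = [{"PU-Name": "Example PU", "latitude": 6.5, "longitude": 3.4, "APC_outlier_score": 365.0}]
geojson_text = json.dumps(to_geojson(sample), indent=2)
print(geojson_text)
```

For a pandas DataFrame of flagged units (called `flagged` here as a placeholder), `flagged.to_dict('records')` yields a list of dicts in the shape this function expects.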
Through geocoding and geospatial outlier detection, we identified polling units whose voting results deviated significantly from those of their neighbors. These flagged units give investigators a concrete starting point for examining potential voting irregularities, which contributes to the transparency and integrity of the election results.