A comparison of database systems for rapid search results in large amounts of leaked data
A large portion of the information on the Internet is stored in databases, since databases are the natural place among online services for storage of user information. Hackers occasionally breach web sites and publish database records online, resulting in personal user information being leaked into the public domain and made available for abuse by malicious actors.

Several web services offer the opportunity to search numerous data breaches containing as many as six billion records or more, with response times of less than one second per query. This thesis investigates performance optimization methods that can be applied to the database systems MySQL Percona, MongoDB Percona, Splunk and Elasticsearch to provide rapid search results on leaked data with average hardware specifications.

Background information on database systems and core functionality is described along with possible configuration options for operating system tuning. State-of-the-art database solutions for handling big data are also presented.

The database systems have been extensively tested, and the tests show that all of them can be tuned for improved performance. MySQL and MongoDB deliver query results in almost real time for exact searches when using indexes, while Splunk is among the faster solutions for both exact and wildcard queries. The research confirms that Elasticsearch is the fastest-performing solution for searching leaked data, with an average response time of 1.58 seconds on data sets containing between 10 and 100 million records.
Master's thesis Information- and communication technology IKT590 - University of Agder 2019