Week 5
May 8 - May 12
Summary
Through week 5 of my senior project, I released an official version of KeyFlare, available on all desktop environments. I heavily refined KeyFlare, an interactive, semantic image segmentation application, and gathered valuable user feedback, which led me to recognize that the project is not yet viable for daily use. This week marks a strategic shift in the project, one that emphasizes building user engagement and gaining experience.
My recent accomplishments include:
Rewriting the class "pipeline" (short for image pipeline) into two classes, "ImagePipeline" and "GUI," which allows coordinates of interest to be selected faster (a structural sketch follows this list).
Optimizing the code to reduce the average processing time for coordinates of interest from roughly 0.369 seconds to 0.127 seconds, resulting in a noticeably more responsive application.
Expanding the user base by having two new beta users test the project on macOS and Ubuntu. I learned that the macOS version does not work yet.
Allowing users to exit the script by pressing any character not shown on the screen and enhancing the GUI design.
Numerous small performance and usability improvements.
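As a rough illustration of the class split mentioned above (the class names come from this post; the method names and bodies are my own placeholder assumptions, not KeyFlare's actual code):

```python
class ImagePipeline:
    """Finds coordinates of interest in a screenshot (placeholder skeleton)."""

    def process(self, image):
        boxes = self.find_boxes(image)    # contour detection (see Documentation)
        return self.filter_boxes(boxes)   # R-tree overlap filtering

    def find_boxes(self, image):
        raise NotImplementedError         # hypothetical hook for the real logic

    def filter_boxes(self, boxes):
        raise NotImplementedError


class GUI:
    """Draws key labels over the boxes and waits for the user's key press."""

    def __init__(self, pipeline):
        self.pipeline = pipeline          # image work is delegated, not inherited

    def select(self, image):
        coords = self.pipeline.process(image)
        # ...render a label per coordinate, read a key press, return the match
        return coords
```

Separating the two concerns means the image-processing code can be profiled and optimized independently of the interface, which is presumably part of what enabled the speedup noted above.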
I sent the project to 100 Medfield High School students to gather feedback and increase engagement with the project. Although the response rate was low, a few meaningful responses highlighted significant areas for improvement.
In the upcoming week, I will make KeyFlare available on popular marketplaces so that it is easier to download and attracts more user engagement. Hopefully, this step will help ensure the project's success.
Overall, week five has been a period of growth and learning. I have learned a lot about creating applications, and the shift in focus is something I wish I had begun earlier. Even though I began late, this shift has shaped, and will continue to shape, KeyFlare into an even more useful application for its target audience.
Documentation
Image Processing
Converts the original image from RGB to grayscale.
Applies an adaptive thresholding algorithm using a Gaussian filter.
Identifies contours.
Applies a dilation operation with kernel size (1, 4) and identifies contours a second time.
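A minimal sketch of these steps, assuming OpenCV (cv2) and NumPy; the threshold block size and offset are illustrative guesses rather than KeyFlare's actual parameters:

```python
import cv2
import numpy as np

def find_boxes(image):
    # 1. Convert the original image from RGB to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

    # 2. Adaptive thresholding with a Gaussian-weighted neighborhood
    thresh = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 11, 2)

    # 3. First contour pass
    contours, _ = cv2.findContours(
        thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # 4. Dilate with a (1, 4) kernel to merge nearby marks horizontally,
    #    then run a second contour pass
    kernel = np.ones((1, 4), np.uint8)
    dilated = cv2.dilate(thresh, kernel)
    contours2, _ = cv2.findContours(
        dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Bounding boxes from both passes feed the data-processing stage below
    return [cv2.boundingRect(c) for c in list(contours) + list(contours2)]
```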
Data Processing
Configures an R-tree spatial index for efficient overlap queries.
Inserts every bounding box into the R-tree.
Identifies overlapping regions and removes the larger region.
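A sketch of this filtering step, assuming the Python rtree package and the (x, y, w, h) boxes produced above; the exact tie-breaking rule is my own guess:

```python
from rtree import index

def filter_boxes(boxes):
    """Drop the larger region whenever two bounding boxes overlap (sketch)."""
    idx = index.Index()
    for i, (x, y, w, h) in enumerate(boxes):
        idx.insert(i, (x, y, x + w, y + h))   # rtree wants (minx, miny, maxx, maxy)

    keep = set(range(len(boxes)))
    for i, (x, y, w, h) in enumerate(boxes):
        if i not in keep:
            continue
        for j in idx.intersection((x, y, x + w, y + h)):
            if j == i or j not in keep:
                continue
            jw, jh = boxes[j][2], boxes[j][3]
            if w * h >= jw * jh:              # remove the larger of the pair
                keep.discard(i)
                break
            keep.discard(j)
    return [boxes[i] for i in sorted(keep)]
```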
Rationale
I discovered that UIs follow an important pattern: they are divisible into sections. If you take a ruler and draw a straight line that extends to both edges of the screen, you can almost always find such a line that divides the UI into two parts, and you can repeat this process down to the base components, i.e., the clickable elements. This process, however, is recursive in nature, which makes it poorly suited to image processing. I wanted a parallelizable algorithm, one that can perform every step independently instead of relying on previous or intermediate steps, and then combine everything at the end.

To build such an algorithm, I use Canny edge detection, which traces edges where color intensity changes sharply, together with morphological operations, which blur the image at different strengths, to find overlapping boxes. I then use those overlaps to remove the larger boxes, and sometimes the smaller ones, to filter the candidates effectively. In effect, this approximates the recursive subdivision by looking at the larger sections first, then the smaller elements, and forming a guess at what is a background image and what is not.

This process is still messy, but it works in most cases. Improving it would require a substantial rewrite of the R-tree filtering step to gain reliability, usability, and efficiency. That rewrite is necessary for a large user base but unnecessary for a small one: when the filter guesses wrong about whether a component is background, all it costs the user is typing one extra character to make a selection. A rough illustration of the edge-plus-morphology idea follows.
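This snippet makes the idea concrete, again assuming OpenCV; the Canny thresholds and kernel sizes are arbitrary examples, not KeyFlare's values:

```python
import cv2
import numpy as np

image = cv2.imread("screenshot.png")            # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Canny traces edges where pixel intensity changes sharply
edges = cv2.Canny(gray, 50, 150)

# Morphological closing at increasing "strengths" merges nearby edges into
# progressively larger blobs, producing the overlapping candidate boxes
for k in (3, 7, 15):
    kernel = np.ones((k, k), np.uint8)
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(
        closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    print(f"kernel {k}x{k}: {len(contours)} candidate boxes")
```

Each pass runs independently of the others, which is what makes the approach parallelizable, unlike the recursive subdivision.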
Previous Goal
The goal of this week was to deepen my knowledge of Flutter and Python in order to implement Scikit-learn's MultinomialNB Naive Bayes classifier and small dialog boxes using Flutter, altogether creating the final prototype and user GUI that lets users interact with their screen through a keyboard.
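For context, here is a minimal sketch of how MultinomialNB is typically used in scikit-learn; the toy texts and labels are placeholders I invented, since the post does not describe KeyFlare's actual features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Placeholder training data, not KeyFlare's real inputs
texts = ["open settings menu", "close the window", "open new file", "quit the app"]
labels = ["open", "close", "open", "close"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)      # word-count feature matrix
clf = MultinomialNB().fit(X, labels)     # fit the Naive Bayes classifier

print(clf.predict(vectorizer.transform(["open the settings"])))  # expect ['open']
```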