Research

Instance-level Object Segmentation Within Video Frames Captured by Vehicle

Autonomous Driving has attracted tremendous attention in the last few years. The deployment of reliable self-driving cars will change how we get around forever. Among the many enabling technologies for autonomous driving, environmental perception is the most relevant to the vision community.

When you're driving, how important is it to be able to quickly tell the difference between a person and a stop sign? It's a hugely important, but typically very simple, distinction that you would make reflexively. Autonomous vehicles are not able to do this quite as effortlessly. In this task, participants are given a set of video sequences with fine per-pixel labeling; in particular, instances of moving objects such as vehicles and pedestrians are also labeled. The goal is to evaluate the state of the art in video-based scene parsing, a task that has not been evaluated previously due to the lack of fine labeling. This competition is hosted by the 2018 CVPR workshop on autonomous driving (WAD), with dataset and evaluation metric contributed by Baidu Inc.

Improving Current Techniques in Deep Learning for Time Series Analysis in the Context of Sales Forecasting

Time series analysis and forecasting have been an important area of study since the earliest moments of mathematics and signal processing. However, traditional methods, often accepted by industry, are statically created models with parameters that are inserted by human experts. With the advent of machine learning and, even more recently, deep learning, the world of time series analysis is ready for a revision. The key value proposition of deep learning is that it enables end-to-end machine learning, i.e. feature engineering and learning are both done within a single model. In doing so, the need for human experts to perform feature engineering is removed, allowing organizations, corporations, and individuals with little domain knowledge, if given enough data, to develop accurate and impactful models. Applying deep learning to time series analysis is (relatively speaking) not a new idea, and it has been successful.

Our study aims to improve the existing techniques within deep learning for time series modelling. In particular, we will look at ways to improve sales forecasting using deep learning methods on time series. While many studies have been done in areas such as financial markets, few have addressed the sales forecasting problem. This matters because it is always difficult for companies to predict the sales of their products, and if it is done poorly, companies can experience a huge loss simply because of over-stocking. This is particularly the case for grocery stores, since many of their goods are perishable. Predicting sales is also complicated. A paper published in 2017 states that “Sales forecast is a challenging problem in that sales is affected by many factors including promotion activities, price changes, and user preferences”. Moreover, it mentions that traditional machine learning approaches have limited accuracy on this problem. Therefore, our research aims to find a better model that can contribute to this field. To do so, we will study the impact of ensemble models, unique representations of data, and specific network architectures, and seek to understand the nature of particular problems that involve time series, with the goal of creating a more streamlined and accurate technique or set of techniques.

Furthermore, one of the major problems within time series modelling is the high amount of random noise within time series data. From slightly inaccurate instruments to inherent randomness, a significant amount of noise is present in every data set. In addition, time as a feature cannot be treated like any other feature in a classification or regression problem - there is no explicit relationship between the 5th second and the 6th second. The final challenge is actually a good one - there are many models within machine learning, and deep learning in particular, that can model time series; the challenge is to find the model or models that work best.
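One common way to present a time series to a learning model, given that the raw timestamp is not a useful feature on its own, is to slice the series into sliding windows so that each sample consists of the most recent observations and the value to predict. The sketch below is only an illustration under that assumption; the function name `make_windows`, its parameters, and the sales figures are hypothetical and are not taken from the study.

```python
def make_windows(series, window, horizon=1):
    """Turn a 1-D series into (inputs, target) pairs using a sliding window.

    Each sample uses `window` consecutive values as inputs and the value
    `horizon` steps after the end of the window as the target.
    """
    samples = []
    for start in range(len(series) - window - horizon + 1):
        inputs = list(series[start:start + window])
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical daily unit sales for one product (illustrative data only).
daily_sales = [12, 15, 14, 18, 20, 19, 23]
pairs = make_windows(daily_sales, window=3)

for inputs, target in pairs:
    print(inputs, "->", target)
```

With a window of three days, the seven observations above yield four training samples, each pairing three consecutive days with the following day's sales.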

Detecting Driver Drowsiness and Attentiveness Through Facial Recognition

Safely operating a vehicle is essential to the safety of the driver as well as everyone surrounding the vehicle. An inattentive, drowsy, or asleep driver, however, is a hazard to both the driver and the people surrounding the vehicle. The National Highway Traffic Safety Administration estimates that drowsy driving was responsible for 72,000 crashes, 44,000 injuries, and 800 deaths in 2013. However, these numbers are likely underestimates, and up to 6,000 fatal crashes each year may be caused by drowsy drivers. Drowsiness and inattentiveness are very common, due to heavy workloads and distracting technologies, respectively.

In order to combat and prevent this from affecting drivers on the road, we are researching the possibility of implementing facial recognition in combination with deep learning to detect whether a driver is inattentive, tired, or asleep. Our group wants to study facial recognition as an application of computer vision to improve the safety and security of our society, as the concept has very high potential to prevent future car accidents due to drowsiness and inattentiveness. One difficulty of this research will be identifying suitable facial recognition databases on which to conduct our research. There are also many possibilities that we must consider. For example, sunlight might make a driver squint, and the deep learning algorithm might then identify the driver as drowsy. We must also research how facial recognition works - specifically the algorithms and machine learning principles. We need to find and analyze the tangible results of facial recognition implementations. This technology is still fairly new, so there are still plenty of unknown factors and issues surrounding it and its use.

The results would consist of how accurately the model is able to determine whether someone is about to fall asleep while driving. The percentage of times the model accurately predicts someone’s level of drowsiness would be the basis for how we determine success. The model would be deemed successful if it performs noticeably better than the current models that we attempt to improve on. The results would let future researchers know which of the variables we studied were necessary or unnecessary for improving accuracy, so that they can develop their research accordingly. In conclusion, if the model proves to be very successful, then bringing this model in some way to every car should be discussed. For example, the model could be deployed in phone applications or used by car manufacturers in order to integrate it into future car models.

Electronic Access Control System Using Facial Recognition, Gait Analysis, and Global Scene Understanding

Electronic access control systems, such as vision-based biometric authentication systems, are a useful way of enhancing security in a variety of environments. High functioning security systems have the potential to ensure better protection of both property and people in an unpredictable world. By using facial recognition in conjunction with gait analysis, issues such as differentiating between users and creating a non-intrusive system can be solved, making security all the more accessible. Additionally, global scene understanding can be used to analyze video footage to compare a 2D image to a 3D image. This feature will prevent potential threats should a 2D replica be used to simulate a 3D model, given that the goal of global scene understanding is to analyze an image or video footage of an environment filled with objects that are organized in a meaningful way and to comprehend the interaction between those objects.

Using this information, we aim to explore whether a recognition system that takes global scene understanding into account is possible, in order to create an access control system that prevents security breaches through 2D images and provides a unique distinction for each user through gait analysis. Additionally, through global scene understanding, we will be able to detect multiple users and restrict access when the system recognizes something suspicious, i.e. the presence of an unauthorized user. We hope to challenge the capabilities of relatively well-explored fields such as facial recognition and gait analysis in conjunction with developing fields such as scene understanding to optimize security.

Though this proposed system has the potential to improve security while making it accessible to the user, its use of global scene understanding presents certain challenges. Global scene understanding depends on resolving certain ambiguities that the human brain is easily able to detect. Implementing an understanding of ambiguities between visual objects in a machine is difficult because it is an incredibly nuanced topic. These ambiguities include, but are not limited to, occlusion, viewpoint, illumination, and background clutter. Nevertheless, this issue has the potential to be solved with layered recognition as opposed to hierarchical recognition, which has previously been favored among researchers. Hierarchical recognition fails to take into account the objects that human vision prioritizes, while layered recognition aims to mimic human vision.

It is anticipated that this form of authentication will be capable of higher accuracy and security compared to today’s facial recognition. There have been experiments where twins were unable to be differentiated by Apple’s Face ID on the iPhone X. Identical twins may have identical facial features, but if they live different lifestyles they may have different body types and movement habits so the gait analysis would be an added layer of security. Because we plan to measure the movement of the user, a unique attribute, it would be nearly impossible to impersonate a user.

Improving 9-1-1 Operating Efficiency Using Natural Language Processing

With 240 million calls made to 9-1-1 in 2017, it is evident that U.S. citizens are taught to rely on the emergency number to provide help when they need it most. Because U.S. citizens have become so reliant on their local Public Safety Answering Point (PSAP) to provide adequate help as quickly as possible, and because new problems are always emerging, there is a constant need to improve the efficiency of the system. One of the biggest problems with the 9-1-1 system is the wait time callers have to deal with when their local PSAP is flooded with calls. Whether multiple people are calling about the same incident or there is an unexplained spike in calls, backups happen quite frequently in PSAPs. When backups occur, callers are on hold in a queue, waiting to be connected to an operator. These moments, where life and death rely on how quickly help can arrive, need to be minimized at all costs.

Our team has decided to research ways to reduce the amount of time callers are held in a queue after calling 9-1-1. In order to do this, we have proposed to use natural language processing (NLP) to prioritize callers in dire situations when on hold. If we are successfully able to reduce the time wasted in queue after dialing 9-1-1, we would reduce the amount of time it takes for help to arrive to the caller, potentially saving thousands of lives.

The biggest anticipated difficulty with our research plan is figuring out how to use the pitch, volume, etc. from the audio files to find out if someone deserves to be moved up in a queue or not. Everybody speaks differently, and being able to analyze every single combination of accent, vocal frequency, and language accurately will prove to be a very difficult task.

In the future, our team hopes this model will be used when a 9-1-1 caller is sent to a queue before being serviced by a 9-1-1 operator. There are already systems in place for a recorded message to play once a caller is on hold, so we could use this to ask the caller a few basic questions like, “What is going on?” or “Where are you located?” The caller’s responses to these questions will be analyzed by the model we create to see if they need to be serviced before someone else in the queue with a less severe situation. This would reduce the amount of time callers in severe situations would be on hold, reducing the amount of time it takes for first responders to reach those who need it most.
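The reordering step described above can be sketched as a priority queue keyed on a severity score. This is only an illustration: it assumes that some model (not shown here) has already turned a caller's recorded answers into a numeric severity, and the class name `CallQueue`, its methods, and the scores are all hypothetical.

```python
import heapq
import itertools


class CallQueue:
    """Hold queue that serves the most severe call first.

    Callers with equal severity are served in arrival order.
    """

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # increasing tie-breaker

    def add(self, caller_id, severity):
        # heapq is a min-heap, so the severity is negated to pop the
        # highest severity first; the arrival counter keeps ties FIFO.
        heapq.heappush(self._heap, (-severity, next(self._arrival), caller_id))

    def next_caller(self):
        _, _, caller_id = heapq.heappop(self._heap)
        return caller_id

    def __len__(self):
        return len(self._heap)


# Hypothetical severity scores in [0, 1], assumed to come from the NLP model.
queue = CallQueue()
queue.add("caller-A", 0.2)
queue.add("caller-B", 0.9)
queue.add("caller-C", 0.9)
queue.add("caller-D", 0.5)

order = [queue.next_caller() for _ in range(len(queue))]
print(order)
```

Breaking ties by arrival order means the scoring only changes who is served first when the model sees a real difference in severity; otherwise the queue behaves like the ordinary first-come, first-served hold queue.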

What’s This? Landmark Detection in DC

In a sightseeing city like DC, there is an immense number of landmarks, and it can be difficult to tell which one is which. Our machine learning model would take a picture of a landmark, identify it, and return information about that landmark. The purpose of this would be to aid tourists in their tours, which would benefit them as well as the city, since people would be more inclined to visit a city with easily identifiable landmarks. The main difficulty we expect is that a significant number of landmarks look very similar.

The results would consist of whether the model is able to accurately name / recognize the landmark. The percentage of times the model accurately recognizes the landmark would be the basis for how we determine the success. The model would be deemed successful if it performs noticeably better than current models that we attempt to improve on.

Predicting Bee Behavior

Bees are an important and necessary part of our world’s ecology. Unfortunately, bees are diminishing at an alarming rate. Perhaps if people understood them better, they would care for them more. Our study will try to save the bees in two ways: first, by trying to gain deeper insight into why they are dying; second, by increasing awareness of how amazing and important bees are by identifying their behaviors with machine learning using time-series image/video capture. Without bees, numerous crops that humans currently produce would die out. This would lead to mass starvation and famine. One problem we could encounter is that bees, by themselves, are very complex creatures. Also, a swarm of bees is somewhat of an entity, but each bee within the swarm is also an individual. Identifying bees within a swarm could pose an issue because our classifier could potentially try to pick up every single bee, which could cause a lot of noise and potential outliers. Predicting the behavior of a swarm is challenging because in some ways the model needs to “think” like a bee. The model would need to understand what it is like to be a bee, understanding what the bee is doing and why it is doing it.

By creating an app that allows people to look at bees and see what they’re doing, we hope people will understand bees more. When people understand how much of an impact bees have on our world, they will start to care more about the future of the bee population. In turn, the bees will be a step closer to being saved. Also, if the app gets popular enough, the data contributed by its users might give us some deep insight into why the bees are in trouble. Perhaps, in the end, our machine learning model will even be able to discern something very complex, such as swarm flight patterns.

Multimodal Sentiment Analysis for Visual, Textual, and Aural Content

Right now, machine learning is being used everywhere online to analyze text and predict various outcomes. Google suggests searches based on what you have previously typed, Facebook analyzes your browsing and “like” patterns to show you ads based on your interests, and machines like Siri and Alexa listen to you, answer questions you might have, and even perform tasks per your requests. While these uses for natural language processing have been researched for years, topics such as sentiment analysis have been researched less and have flown under the radar. Sentiment analysis is the process of determining the mood of text based on word usage and grammatical structure. Our goal as researchers is to extend the process of sentiment analysis from the largely researched field of textual sentiment analysis to visual sentiment analysis. Sentiment analysis is very practical in our society for many reasons. With the massive amount of data on the internet, finding a good opinion on certain topics can be difficult. Many marketers use sentiment analysis to analyze massive amounts of data in order to find the average opinion. Researchers can also gauge sentiment toward our current president by analyzing tweets or news articles about him. Sentiment analysis could be used to test how the public feels about an experimental product or new movie, and this information could then be used to generate greater revenue, which is good for the economy. Likewise, the government could adjust policy by taking into account the sentiment of the population. All of these opportunities make sentiment analysis very useful in our society. However, with the rise of video sharing sites like YouTube, Facebook, and Vine, analyzing text has become less necessary, and visual/aural sentiment analysis is becoming more and more necessary. What was once written online is now spoken, and researchers need new ways to analyze videos and audio to determine sentiment across the wider range of content that is found online.

One of the challenges that comes with visual and aural sentiment analysis is that introducing video and audio data into a dataset is much more difficult, because the machine must take the extra step of turning the audio into text. Video data is difficult as well, because analyzing a 2D set of pixels and understanding them as a coherent piece requires more code and training.

The anticipated outcome of the research is a combined sentiment extraction system that works on videos with audio and a clear speaker present. Our system is expected to output a consensus from our three separate extraction systems and give us a predicted polarity of the speaker. Our research should allow companies or people to quickly and efficiently determine the sentiment of a speaker in a video towards a certain topic. This will allow for better data aggregation on certain topics. Our research, if successful, should serve as a stepping stone for more accurate and efficient sentiment extraction systems overall. The combined nature of our system should become a focus in current research, as the three separate systems all have predictions and confidence scores. Their combined output allows for an accurate and efficient extraction system, which should become the focal point of research in multimodal sentiment analysis. All things considered, we hope that through our research we are able to create a useful model that successfully incorporates aural, visual, and textual cues in order to discover the sentiment of videos found on the internet.
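One simple way to form a consensus from three systems that each report a prediction and a confidence score is a confidence-weighted vote. The sketch below is an assumption about how such a combination could look, not the method used in this research; the function `fuse_polarity`, the polarity encoding (+1 positive, -1 negative), and the example numbers are hypothetical.

```python
def fuse_polarity(predictions):
    """Combine per-modality predictions into one polarity.

    `predictions` maps a modality name to (polarity, confidence), where
    polarity is +1 or -1 and confidence lies in [0, 1].
    Returns (label, strength): label is +1, -1, or 0 for a tie, and
    strength is the absolute value of the confidence-weighted average.
    """
    total = sum(confidence for _, confidence in predictions.values())
    if total == 0:
        return 0, 0.0  # no modality is confident enough to vote
    score = sum(p * c for p, c in predictions.values()) / total
    if score > 0:
        label = 1
    elif score < 0:
        label = -1
    else:
        label = 0
    return label, abs(score)


# Hypothetical outputs of the three extraction systems for one video.
example = {
    "visual": (1, 0.6),
    "textual": (-1, 0.9),
    "aural": (1, 0.5),
}
label, strength = fuse_polarity(example)
print(label, round(strength, 3))
```

In this example the textual system is the most confident single voter, but the visual and aural systems together outweigh it, so the consensus is weakly positive. A weighted vote like this keeps each system independent, which makes it easy to compare against any one modality on its own.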

AI Versus the Complexity of Malicious Code

With the rapid growth and continuous change in technology within the past few decades, cybersecurity has become an extremely crucial factor to take into account when dealing with assets, reputations, and business. Malicious code poses a serious risk to all of these areas, which has resulted in the growth of the cybersecurity field. Most threat protection solutions have conventionally relied on two things. First, the malware must be recognizable (i.e. it needs to have been seen already). This is not ideal because as our technology becomes more advanced, so do the hackers with malicious intent. We cannot expect every malware attack to be one already handled; rather, new types of attacks could pop up every single day. Second, most cybersecurity tactics rely heavily on human experts. This may not seem that bad, but humans are a valuable resource; we do not have an infinite number of cybersecurity specialists at our disposal. This is where artificial intelligence can come closer to the ideal scenario for handling cybersecurity threats (i.e. malicious code). Artificial intelligence does not necessarily need to recognize a problem from past recollections (i.e. it can be made adaptable to recognize new malware on its own), and it most definitely does not require as many human experts to run. This machine learning security approach can allow us to confidently protect ourselves against today’s malware as well as the malware of the future. Unlike humans, artificial intelligence does not “forget” what has happened previously and, as mentioned earlier, does not rely on having seen a particular piece of malware beforehand to know how to handle it. AI can also be used to handle malicious files without any connection to the internet.
Even though we will never be able to reach 100% malware prevention in cybersecurity, artificial intelligence-based solutions seem to be the next step in tackling these problems more effectively and efficiently, allowing us to save system resources, make businesses more secure, and stop malware even before it executes.

This problem is particularly important to all individuals because everything around us in this age of technology - from the food we order through an app, to the textbooks we order for class, to streaming a show on Netflix - is vulnerable to cyber attacks and data breaches. The main difficulty in doing research on this topic is that the field of cybersecurity is vast, and if we want sufficient research, we would be forced to formulate some sort of basic AI system that can handle malicious code. At our current level of understanding, we have almost no idea how to do something like that. Also, AI in cybersecurity seems to be a fairly new concept (as seen when looking at prior research), so there may not be as many tools and resources available to help us address the research problem in as much depth as we would like.

At the end of this project, we anticipate having implemented a system, based on machine learning techniques researched throughout the semester, that analyzes and identifies malicious code hidden in otherwise safe programs. Many of the methods we will be implementing will be applicable to a wide range of scenarios, which means that some of our work may be transferable to other problems similar to detecting malicious code. If all goes well, we will both create a useful system and learn a tremendous amount about machine learning and cybersecurity over the course of the research project.

Deep Learning Approach to Credit Card Fraud Detection Using Time Series Data

Credit card fraud is arguably one of the greatest concerns of banks, credit cardholders, and retailers. Criminals acquire credit cardholders’ information to purchase items, which causes banks across the world to lose vast amounts of money. As of 2014, approximately 5 cents for every 100 dollars spent was lost due to fraud, totalling $16.31 billion in losses. This trend is expected to worsen over time, as $35.54 billion is expected to be lost due to credit card fraud in 2020. Therefore, it is our goal to create a model that will assist banks in detecting fraudulent activity in order to prevent losses from reaching this level.

There are several major difficulties that will come with doing this research. One of them is that credit card fraud is, in general, not very common among the US population. This makes it difficult for our model to be accurate, because with so few fraudulent examples it is very possible to produce false positives or to miss fraud entirely. Another potential concern is that the people performing these fraudulent activities are getting more advanced in their methods. In this way, our research could become obsolete because criminals are catching on to the techniques used to detect fraud. For example, once criminals know that we are paying special attention to how their behavior differs from the model already built for a given cardholder, they may change their tactics and follow the patterns of the victim. In addition, the model that we are creating will be tailored to identifying outliers, but an algorithm can only work to a certain extent, and some outliers will inevitably go undetected within the dataset. This is a problem because these outliers must be used to detect fraud, since instances of fraud are outliers.
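The effect of fraud being rare can be made concrete with a small calculation. In the sketch below, a trivial "model" that labels every transaction as legitimate scores very high accuracy while catching no fraud at all, which is why recall has to be checked alongside accuracy. The data set size and fraud count are invented for illustration, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 meaning fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, tn, fp, fn):
    # Share of actual fraud cases that were flagged; 0.0 if there are none.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented data: 1000 transactions, of which only 2 are fraudulent.
y_true = [1, 1] + [0] * 998
# A "model" that never flags anything.
y_pred = [0] * 1000

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("recall:", recall(*counts))
```

Here the accuracy is 998 correct labels out of 1000 even though both fraudulent transactions are missed, so the recall is zero.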

We expect that the model will provide more accurate results than traditional credit card fraud detection methods. This is because our model will detect fraud based on each individual’s transaction history, which means that the model would be specific to each person. Therefore, we expect that the model will maximize the number of true negatives and minimize the number of false negatives. This model can be used by banks, as Kültür and Çağlayan suggested for their model, to alert customers that they may be victims of fraud and to decline fraudulent transactions. We are optimistic that our model will provide our intended results.
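To illustrate what it means for detection to be specific to each person's transaction history, the sketch below flags a new transaction when its amount lies far from that cardholder's own past amounts. It deliberately uses a plain z-score rule, not the deep learning model proposed here, and the function `is_outlier`, the threshold of 3 standard deviations, and the amounts are all hypothetical.

```python
from statistics import mean, stdev


def is_outlier(history, amount, threshold=3.0):
    """Flag `amount` if it is unusual for this cardholder.

    `history` is a list of the cardholder's past transaction amounts.
    The amount is flagged when it lies more than `threshold` sample
    standard deviations from the mean of that history.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # every past amount was identical
    return abs(amount - mu) / sigma > threshold


# Invented history for one cardholder, in dollars.
history = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0]

print(is_outlier(history, 400.0))  # far above this person's usual spending
print(is_outlier(history, 26.0))   # within the usual range
```

The same 400 dollar purchase could be entirely ordinary for a different cardholder, which is the motivation for modelling each individual separately. A rule this simple ignores the order and timing of transactions, which a time series model would take into account.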