Month: September 2021


<标题>Project Goals

<标题>In this project, you will be developing a simple Java application (processfile) using an agile, test-driven process involving multiple deliverables. While you will receive one grade for the entire project, each deliverable must be completed by its own due date, and all deliverables will contribute to the overall project grade.

Specification of the processfile

Utility processfile is a simple command-line utility written in Java with the following specification:


processfile allows for simple text manipulation of the content of a file.


processfile OPTIONS FILE


<标题>Program processfile performs basic text transformations on lines of text from an input FILE. Unless the -f option (see below) is specified, the program writes transformed text to stdout and errors/usage messages to stderr. The FILE parameter is required and must be the last parameter. OPTIONS may be zero or more of the following and may occur in any order:

● -f Edit file in place. The program overwrites the input file with transformed text instead of writing to stdout.

<标题>● -n Add line numbers to each line, unpadded starting from 1.

● -s Keep only the lines containing the given string.

● -r Replaces the first instance of string1 in each line with string2.

● -g Used with the -r flag ONLY; replaces all occurrences of string1 in each line with


Project Overview

<标题>This project will be built based on the paper Visual Text Correction written by Amir Mazaheri and Mubarak Shah[1]. In this project, given a video clip word description and its corresponding video, our proposed framework will detect the inaccurate word and replace it with a more proper one. The topics such as constructing a more realistic dataset and improving an existing NLP result will be explored and discussed in our project.

Summary of previous work

The foundation of our project is built on the paper written by Amir Mazaheri and Mubarak Shah. In that paper, a system utilizes two modules, the inaccuracy detection module and the correct word prediction module, to find the most inaccurate word in a video description and replace it with a more proper one. The inaccuracy detection module uses a convolutional n-gram network and LSTM to extract the textural information and uses a gating module to extract the visual information. A combination of two makes the prediction of the more inaccurate word. For the correct word prediction module, the system uses text encoding and video encoding to make the prediction for the word.

<标题>Potential solutions

Approach 1: Generate a more realistic inaccurate description:

<标题>The original false dataset is created only according to the frequency of words and by swapping words with the same part of speech in the sentence. As a result, some extreme unrealistic examples like “swimming in the kitchen” are added to the dataset. Thus, one of our approaches can improve the false dataset by adding more metrics to construct it. We can construct the false dataset by replacing the original word with some words which have a better coherence to the location and movement. In addition, the possible replacement can be built in a tree structure and we can generate the false one from it.

Object-Oriented Programming

1 Preparation

A Zip archive containing the files for the assignment is provided in Minerva. When you unzip it, you should see three files of earthquake data, along with four .java files. One of these,, is complete and ready for you to use; the other three files containing empty Java classes.

There are ‘To Do’ comments inside the classes that summarise the code that you need to write, and detailed step-by-step instructions are provided below.

Before you begin programming, take some time to study the sample data. You can use a spreadsheet application to do this. Each row in the dataset represents a single earthquake. The different columns represent various parameters measured for an earthquake. You don’t need to know what most of these are, although further details can be found at the USGS Earthquake Hazards Program website if you are interested. The only columns relevant to this assignment are: latitude, longitude, depth and magnitude.

<标题>Note that filenames consist of two parts: a severity level (often expressed in terms of earthquake magnitude) and a time period. The two parts are separated by the underscore character. Thus, 2.5_day.csv contains details of earthquakes of magnitude 2.5 or greater, recorded over a single day; 4.5_week.csv records earthquakes of magnitude 4.5 or greater, over a one-week period; and significant_month.csv records ‘significant’ earthquakes recorded during a one-month period.

2 Quake Class

This class encapsulates the details a single earthquake.

1. Edit in a text editor. Add to the class fields that represent the latitude, longitude, depth and magnitude of an earthquake.

<标题>2. Create a constructor for Quake that takes a single String parameter. This string represents all of the data provided by the USGS for a single earthquake, with each value separated from the others by commas. To see real examples of what this string looks like, open one of the data files in a text editor (not in a spreadsheet), and examine any of the lines after the first. Note: you can use a method of the String class to help you implement this. Your constructor should perform some basic validation, checking that latitude is within the allowed range of ?90.0 ? to 90.0 ? , and that longitude is within the allowed range of ?180.0 ? to 180.0 ? . You should throw an instance of QuakeException, containing an appropriate error message, if latitude or longitude are invalid.

<标题>3. Write simple ‘getter’ methods for each of the fields. These methods should simply return the values of the fields.

4. Write a toString method for Quake. This should generate and return a string representation of the Quake object. The string should contain magnitude, depth, latitude and longitude, in that order, formatted like this example: M5.0, 12.6 km, (35.4975°, 141.0217°) Note: the ‘degrees’ symbol can be generated with the Unicode escape sequence \u00b0. As you complete each step of implementing this class, remove the corresponding ‘To Do’ comment and check that your code compiles. Once you’ve written the constructor and getter methods, you can test them by writing a small program that creates a Quake object from a string. You can use one of the lines from the given data files as the string.