Saturday, September 29, 2007

12.WebSpiderReview

Reviewed Package Author: Andrew Wong


Overview

The system is logically designed and seems to use both code and variables very efficiently. In general, the code is tested quite well, however, the system is prone to fail if the passed arguments are not in the correct order. There also is a discrepancy between the way that this system counts the total number of links and the way other systems count the total number of links. This system counts the start url as a link and only counts links that have not been visited before. In other words, it counts the total number of distinct links and the start url (even though technically it should not). Because of this, the totallinks function returns a value that is off from what other systems are returning (usually less).


Installation Review

The project was extracted from the zip file successfully.

"ant -f verify.build.xml" executed successfully. Results were:

JUnit

ran successfully.

CheckStyle

ran successfully.

PMD

ran successfully.

FindBugs

ran successfully.

Emma

coverage was not 100% but covers 100% of the WebSpider package. Does not cover 100% of the hackystat logger package (missing catch statements).

Emma Coverage summary


class: 100% (7/7)
method: 96% (26/27)
block: 97% (636/659)
line: 96% (156/162)


The creation of the jar file using "ant -f dist.build.xml" was also completed successfully.

Invoking "java -jar webspider-wongandr -totallinks http://hackystat.org 100" runs the WebSpider program without incident.

Code Format and Conventions Review


Code Formatting Corrections Recommended:



































FileLinesViolationComments
WebAddressDatabase.java37, 55, 73, *EJS-61Some comments are unncessary
WebAddressDatabase.java106, 107EJS-52 space indent
TestWebSpider.java18, 39EJS-42blank line between description and Javadoc tags
WebSpiderLogger.java40EJS-42blank line between description and Javadoc tags





Test Case Review
Black Box Perspective:


WebAddressDatabase.java:


Both test classes TestWebAddressDatabase.java and TestWebSpider.java invoke WebAddressDatabase.java although only TestWebAddressDatabase.java invokes it directly. Interestingly, TestWebAddressDatabase.java does not invoke all methods from WebAddressDatabase.java even though it is specifically created to test WebAddressDatabase.java. Between the two, however, all methods are tested.


WebAddressEntry.java:


This data structure is tested thoroughly by TestWebAddressDatabase.java. The data structure is tested to make sure it is functioning as expected.


WebSpiderLogger.java:


Based on the output generated when the jar file is invoked with logging, WebSpiderLogger appears to be working correctly. However, no tests in either test classes exist to see whether the logger is performing as expected. The class is tested indirectly by using the logging command in both test cases (in TestWebSpider.java). The minimum tests I would recommend would be to at least test to make sure that when "-logging" is not included as an argument, that logging is then turned off.


WebSpider.java:


The test classes cover all the methods within WebSpider.java. However, there is no testing to see if the passed arguments are valid, in the right order, or even if there are the correct number of arguments. Also, the testing for WebSpider.java does not see what happens when the "-logging" is not included. Because of this, the system seems prone to errors because of the lack of testing of the passed arguments.



White Box Perspective:


WebAddressDatabase.java:


This class does have any real errors that can be thrown. Because of this, TestWebAddressDatabase.java does a thorough job of testing this class.


WebAddressEntry.java:


Overall, this class was tested thoroughly. It functions as expected, so no further error checking must be done.


WebSpiderLogger.java:


This class is tested thoroughly as well. It functions as would be expected, the coverage is adequate.

WebSpider.java:


Only the normal operations of this class were tested.

The main method assumes that at least 3 arguments will always be passed to it. If there are more than 3 arguments (4 or more), then it only takes the 4th argument and checks if it is "-logging". No error conditions are present for if there are less than 3 arguments or if the 2nd or 3rd arguments are out of order. If the first argument is not "-totallinks" or "-mostpopular" then the getTotalLinks() and the getMostPopular() methods are not called, but the 2nd and 3rd arguments are still passed to create the WebSpider constructor and the integer value for maxPages.

I would recommend testing for no arguments, for some of the arguments, for arguments that are out of order, and for arguments that are in the right order but missing the "-logging" command. And then modifying the code to check for these errors.

The way that the code is set up, testing the passed arguments seems to be the only area that requires additional testing.

Break da buggah:


/**
* Test that causes the system to fail.
*
* @throws Exception Thrown if a web page fails to parse.
*/
@Test
public void breakDaBuggah() throws Exception {
String test2[] = new String[4];

test2[0] = "-mostpopular";
test2[1] = "20";
test2[2] = "http://www.google.com";
test2[3] = "-logging";

try {
WebSpider.main(test2);
}
catch (Exception e) {
fail("Webspider execution failed.");
}
}


This test reverses the order of 2nd and 3rd passed arguments. It causes the system to fail because of a number format exception that is not caught in the main method. The system tries to pass a string as an int. Here is the error message generated from the command line:

Exception in thread "main" java.lang.NumberFormatException: For input string: "http://hackystat.org"
at java.lang.NumberFormatException.forInputString
at java.lang.Integer.parseInt
at java.lang.Integer.valueOf
at edu.hawaii.webspider.WebSpider.main


Summary and Lessons Learned

From this exercise, I've learned that testing is almost as time consuming as actually coding the system. Thoroughly testing the system involves far more than just making sure that the Emma coverage is at or near 100%. It involves thinking about every method and every error that can be thrown. Not only do you have to make sure that the system performs as expected under normal circumstances, but that the system is resilient when it comes to handling improper and unexpected input (either for a method or for passed arguments). This involves adding additional error checking features to the system to make sure that the system does not fail from simple errors such as passing the arguments in the wrong order.

I've also learned that thorough testing involves testing each individual class, no matter how small or irrelevant it may seem to the overall system. Even though the some classes are relatively insignificant when compared to the overall system, if these smaller classes are not at least tested to ensure that they do what they are supposed to do, then the system itself is already possibly prone to serious errors.

I've also gained more knowledge working with the various tools and figuring what needs to be changed in the various files (such as build.xml) and what needs to be added in order for them to all work (junit.build). And although, some of the tools such as PMD may seem quite annoying at times (being very picky) they ultimately help you to consistently build a functional system that adheres to certain standards, as efficiently as possible.

No comments: