Friday, November 7, 2014

Exercise 6: Data Normalization, Geocoding, and Error Assessment

Introduction

The goal of this lab was to develop skills in data normalization and geocoding, and then in assessing the errors after those processes have been run.  For this exercise the class was given an Excel spreadsheet containing the locations of different sand mines across Wisconsin.  The table included addresses, facility names, operator, city name, county, and more (see figure 2 below).  The only problem was that some of the addresses were not normalized and contained only the PLSS information, for example: NE SW Sec 2, 7N, 3W.  Our task was to fix these addresses and other information so the data could be used in ArcMap for geocoding.  The last step of the exercise involved querying out the mines assigned to us and assessing the location of the mines using ArcMap.

Methods

As I stated above, each student was given an Excel sheet containing information regarding sand mines in Wisconsin (figure 2).  The task involving the spreadsheet was to normalize the data so that the correct facility address, community (city), zip code, state, and unique mine ID field were correctly entered into a new personal Excel sheet for future use.  Each student was responsible for at least 16 mines; I ended up normalizing 22 mines.  Dr. Christina Hupy, course instructor, developed a system where each student was assigned a number and each number was assigned to a different mine.  As a result, four students attempted to normalize each mine, which let us test the accuracy of our geocoding, as can be seen in the results section below.  To find the correct address for each mine I used Google Maps, Google search, and the PLSS shapefile our geospatial technician, Martin Goetll, provided for us.  Finding the correct addresses was a large task, but searching the web using some of the addresses provided and the facility name made them easier to find.  Normalizing the table with the correct information is a critical step in the process: if you were to try to geocode the mines before normalizing, it would not work.
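The normalization described above was done by hand in Excel, but the idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (Mine_ID, Address, City, State, Zip) and the regular expression for PLSS-style entries are hypothetical and are not taken from the class spreadsheet.

```python
import re

# Hypothetical pattern for PLSS-only descriptions such as "NE SW Sec 2, 7N, 3W":
# optional quarter-section codes, a section number, then township and range.
PLSS_PATTERN = re.compile(
    r"^\s*(?:(?:NE|NW|SE|SW)\s+)*"          # zero or more quarter-section codes
    r"Sec(?:tion)?\.?\s*(\d{1,2})\s*,?\s*"  # section number, e.g. "Sec 2"
    r"T?(\d+)\s*([NS])\s*,?\s*"             # township, e.g. "7N"
    r"R?(\d+)\s*([EW])\s*$",                # range, e.g. "3W"
    re.IGNORECASE,
)


def is_plss_only(address):
    """Return True if the address field holds only a PLSS description."""
    return PLSS_PATTERN.match(address) is not None


def normalize_record(mine_id, address, city, state, zip_code):
    """Clean one row into consistent fields (field names are assumptions)."""
    return {
        "Mine_ID": str(mine_id).strip(),
        "Address": " ".join(address.split()).upper(),  # collapse extra spaces
        "City": city.strip().title(),
        "State": state.strip().upper(),
        "Zip": str(zip_code).strip()[:5],              # keep the 5-digit zip
    }
```

Rows where is_plss_only returns True are the ones that still need a street address looked up by hand before geocoding; the code does not find that address for you.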

After normalizing the data, the next step is to geocode the mines in ArcMap using the geocode tool.  This involves signing into Esri's ArcGIS Online, adding the Excel sheet of normalized mines, filling out the correct parameters, and finally running the tool.  After geocoding the addresses, a table like figure 1 will appear on the screen.  In my case all 22 of my mines matched with a score of 90 or higher, so I did not have to manually match any of my mines.  In fact, all but six had a score of 100, showing that I did a good job normalizing my table for geocoding.
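The match summary can also be tallied outside ArcMap once the geocoded table is exported. The sketch below assumes each row is a dictionary with "Status" and "Score" keys, and it treats a score of 90 as the cutoff for needing a manual rematch; both the key names and the cutoff are assumptions for illustration.

```python
def summarize_matches(rows, min_score=90):
    """Count accepted, perfect, and needs-review rows in a geocode result.

    rows: list of dicts with assumed keys "Status" ("M" = matched) and "Score".
    """
    accepted = [r for r in rows if r["Status"] == "M" and r["Score"] >= min_score]
    perfect = [r for r in accepted if r["Score"] == 100]
    return {
        "total": len(rows),
        "accepted": len(accepted),
        "perfect": len(perfect),
        "needs_review": len(rows) - len(accepted),
    }
```

For a table like mine, with 22 matched rows of which all but six scored 100, this would report 22 accepted, 16 perfect, and none needing review.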

Figure 1: Results showing how many addresses matched after geocoding

After geocoding the mines, the next step is to merge all of my classmates' mines into one feature class and then query out the mines that match mine to compare accuracy.  Querying out the matching mines was a somewhat difficult task because some classmates did not correctly normalize the data.  Not all of the mine IDs were located in the same field, so I had to search through the attribute table to find all the mine IDs that matched mine.  After querying out the matching mines, the Point Distance tool was run to give the distance between my mines and the matching mines as a measure of accuracy.
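Because the mine IDs ended up in different fields, the query amounts to checking several candidate columns for each row. A minimal sketch of that search follows, assuming the merged attribute table has been read into a list of dictionaries; the field names passed in are hypothetical.

```python
def find_matching_rows(rows, my_ids, candidate_fields):
    """Return rows whose mine ID appears in any of the candidate fields.

    rows: list of dicts from the merged table.
    my_ids: the mine IDs assigned to me.
    candidate_fields: field names where classmates may have stored the ID
                      (hypothetical names, supplied by the caller).
    """
    wanted = {str(i).strip() for i in my_ids}
    matches = []
    for row in rows:
        for field in candidate_fields:
            value = row.get(field)
            if value is not None and str(value).strip() in wanted:
                matches.append(row)
                break  # one hit per row is enough
    return matches
```

Comparing IDs as stripped strings avoids missing a match just because one student stored the ID as a number and another as text.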


Results

Figure 2: Mine table before it was normalized

Figure 3: Part of my normalized table

Figure 2 shows the table provided to us with the information on the mines, and figure 3 shows part of my normalized table.  Figure 4 below shows the location of my mines in purple and the queried/matched mines in black.  As you can see, some of my mines and my classmates' mines matched up perfectly, while others did not match but were reasonably close.  One thing to note is that the same mine can appear two or three times in black, because each mine was normalized four times by our class.  This results in more black squares than purple triangles.

Figure 4: Map showing my mines in purple and queried mines in black

The table below shows my mines (input), the queried mines (near), and the distance, in meters, between them.  As you can see, about two-thirds of my mines matched very accurately with the queried mines, but some of them did not match well at all.
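To show what a distance table like this contains, here is a small stand-in written in Python. It is not the Point Distance tool: it uses the haversine great-circle formula on latitude and longitude with an assumed mean Earth radius, so its values would differ slightly from distances measured in a projected coordinate system.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_table(inputs, nears):
    """Build (input_id, near_id, meters) rows for every input/near pair.

    inputs, nears: dicts mapping an ID to a (lat, lon) tuple.
    """
    table = []
    for in_id, (lat1, lon1) in inputs.items():
        for near_id, (lat2, lon2) in nears.items():
            table.append((in_id, near_id, haversine_m(lat1, lon1, lat2, lon2)))
    return table
```

Two points geocoded to exactly the same address come out at a distance of zero, which is the situation the accurate rows of the table reflect.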

Figure 5: Table showing the distance in meters between my mines and the queried mines


Discussion

I experienced many errors while working through this exercise, but I think this assignment was meant to produce errors and challenge us in handling them.  Using my classmates' normalized tables caused errors when querying out the mines I needed.  Some people in the class did not use the correct mine ID field when normalizing, which made it difficult to find the mines I needed.  Another source of error came when trying to find the correct address, zip code, and city of each mine.  Because some of the mines only contained PLSS information, finding the correct address of the mine was difficult.  Also, because some of the mines do not even exist yet and are proposed or inactive, finding an existing address was tough.  After running the Point Distance tool to check the accuracy of my mines against my classmates' mines, errors were common in the table.  In figure 5, from input 2 down, you can see the locations do not agree well at all.  This was because of a data entry error when normalizing the tables.  That is either on me or on one of my classmates.  I did notice that for one of my entries I did not put an s in front of the address, causing it to not be in the same place as my classmates who did include an s.  Not including the s meant the geocoding process did not match the entry to the right address, which caused an error.

How can we know which points are actually correct and which ones are not?  We can tell which mines are accurate by looking at the match address field after geocoding.  If the matched address field contains the same address for both my entry and my classmate's entry, then that is most likely the correct location of the mine.  You can also look at the distance field (figure 5) for support: if the distance is very close to zero, the two points agree on the location.  A score of 100 and a status of M, which means matched, also supports a correct location.  By looking at these three things in both my mines table and my classmates' mines table, I can tell which mines contain the actual correct location and which do not.
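The three checks above can be combined into a single test. The sketch below assumes each geocoded row is a dictionary with "Match_addr", "Status", and "Score" keys and uses a 1 meter tolerance for "very close to zero"; the key names and the tolerance are assumptions, not values from the exercise.

```python
def is_likely_correct(mine, classmate, distance_m, tolerance_m=1.0):
    """Apply the three checks: same matched address, full match, near-zero distance.

    mine, classmate: dicts with assumed keys "Match_addr", "Status", "Score".
    distance_m: distance between the two points in meters.
    tolerance_m: hypothetical cutoff for treating the distance as zero.
    """
    same_address = (
        mine["Match_addr"].strip().upper() == classmate["Match_addr"].strip().upper()
    )
    both_matched = all(
        row["Status"] == "M" and row["Score"] == 100 for row in (mine, classmate)
    )
    return same_address and both_matched and distance_m <= tolerance_m
```

Passing all three checks only shows that two students agree with each other and with the geocoder. If both made the same data entry mistake, the point could still be in the wrong place.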

Conclusion

This exercise developed skills in normalizing tables, geocoding, and then assessing errors.  It also helped develop new skills for dealing with the location of a building.  It shows how accurate and clean you need to be when working with geocoding in ArcMap.  I was pretty happy with the results I got.  I think I normalized my table well, and the accuracy of my mines compared to my classmates' was average.  If we do another assignment like this, I am sure we will be more efficient when normalizing the tables and finding the correct addresses.

Sources:
Mines provided by Christina Hupy
U.S. Census Bureau
