Instructor: David Schuff

Weekly Question #6: Complete by October 19, 2017

Leave your response as a comment on this post by the beginning of class on October 19, 2017. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your opinions, not so much particular “facts” from the class!If you sign in using your AccessNet ID and password you won’t have to fill in the name, email and captcha fields when you leave your comment.

Here is the question:

In class we discussed several examples of how data can be stored in different formats in two different sources (i.e., a date can be represented mm/dd/yyyy or dd/mm/yyyy). Give your own example of a scenario, other than the ones we discussed in class, where data could be represented two different ways. How would you reconcile this conflict when combining the data sources?

22 Responses to Weekly Question #6: Complete by October 19, 2017

  • Time is an example of data that is able to be entered into a database in two different ways. Most of the world uses the 24-hour clock, but places like the U.S. use the 12-hour clock. For example, 15:00 and 3:00 pm both mean the same thing. It’s important to keep the time values consistent to avoid confusion. It would be best to use the universal 24-hour clock, but this must be used throughout the entire database if you decide to implement it.

  • Time is another data format the can vary among data sources. One data source may use the 24-hour clock while another may represent time on a 12-hour clock. The second data source will show 5:30 PM as 5:30 PM and the first will just show 17:30. To reconcile the differences I would change the 24-hour clock to the 12-hour clock since this format is easier to read and understand.

  • I’m actually working on an issue with this on a website project me and a couple of friends are working on – there are two methods of displaying a post ID that someone makes on the website. There is a 64-bit “Snowflake” value that is almost guaranteed to be unique and avoid duplicates, due to the fact that it’s 20 digits long, and a 32-bit, base 64 (1234567890 and the alphabet, both capitalized and not capitalized, ! and ?), 6 digit long “Short ID” that is displayed to the user. The Snowflake value is stored in the back end database while the Short ID gets stored in the user’s cache – the reconciling comes from trying to make the Short ID’s compression algorithm as close to lossless as possible, so that the Snowflake value can be restored from the Short ID.

  • The rate data can be presented in two ways: “%” or “.”. For example, the revenue of Company A increased by 0.6 or 60% in 2016. We’ll reconcile the data by change into % or “.” consistently.

  • A good example of data that can be represented in multiple different ways is Currency. Different countries have unique ways of formatting currency, and the formats vary drastically. While a thousand dollars in the US format looks like this: “1,000.00” , one thousand of the Swedish Krona looks like this: “1 000,00”. In order to reconcile a conflict where you are trying to join a databases with multiple currency formats, an established data standard would need to be established in order to avoid confusion and problems within the database.

  • Phone number is another example that can be entered in different format. For example, a database might key in phone number as ‘xxxxxxxxxx’ in a database or as ‘xxx-xxx-xxxx’. This two different ways of keying in data can become a problem if a person wants to combine or use both database. In my opinion, it is best to let the data type to be VARCHAR(12) so that both format can be used universally.

  • One example of two different formats is the metric system vs. the american measuring system. Something may be measured in inches on one database and centimeters on the other. Deciding between the two may depend on where the company is now based. Changing the measurements might require going through each record and converting the measurement to the other form. The other option would be to leave them as they are, but if you are a company such as Lowes or Home Depot and this database is a part of your website, it may confuse and frustrate customers leaving the burden of converting units to them.

  • An example of a scenario where data could be represented two different ways would be time. In the United States we use the 12 hour clock, but most other countries use the 24 hour clock. 1:00 PM is the same as 13:00. The 12 hour clock would be difficult to use in a database, because you would have to include if it is AM or PM. For databases you should probably use a 24 hour clock if you are recording times.

  • An ecample of a scenario where data could be represented two different ways would be months. For example, you could write out September or you could just use the number 9. I think to make it easier for a database, I would reconcile so all months would go by numbers. This would help with spelling errors as well.

  • Zip codes tend to have two different formats. One of the formats is a five digit number, and the other is a five digit number, a hyphen, and four more digits after that. The most common zip code format that I see is the five digit number. An easy way to reconcile this conflict when combining the different data sources would be to just show the first five digits of the zip code. This would remove the last four digits on the nine digit zip code, but would not necessarily change the zip code itself.

  • Time can be stored in a few different ways (whether it’s clocking in or out, or how many seconds it took to complete something).
    If, for example, it took a machine 4 minutes and 5 seconds to finish a task, the database can store it as 04:05 or 4:05 (it will depend on the INT).
    If, for example, employees are clocking in and clocking out, let’s say at 1:30PM, the database can store that time as 01:30, 1:30, or 13:30.

  • At work, I am doing a lot of cost basis exceptions for advisors. An advisor reaches out to our team and requests updated share amounts for tax purposes. The biggest problem I have come across is the way in which companies display their account numbers. For example Mass Mutual displays their account number as (xx-xxxxxx) and Wells Fargo’s account numbers are (xxxx-xxxx). It’s very tedious placing the dash in the appropriate place for each company. To solve this problem, one would have to design our database to either remove the dashes all together or create a macro to place the dashes into the appropriate places on the excel spreadsheet we use.

  • Product/materials weights can be labeled differently depending on where in the world the data is being input. Weight would usually be described in pounds in the USA, but might be described in grams or kilograms in the U.K. A way to reconcile this difference in data inputs could be to run one set of data through a program which automatically converts the unit of measurement to the other wholesale so that every entry from the one geographic location is converted into the unit of measurement used in the other location.

  • An example of something that can be labeled differently from one database to another would be distance. As we all know there is the metric system and the standard system. If you are transferring data from databases in two different continents of the world such as the United States and Germany, you will have one measuring distance in meters, kilometers, etc. and another in feet, inches, and miles. One way to reconcile this is to have a formula that you multiply each value by in order to convert the data from either metric to standard or standard to metric. This would be a tedious process, but easily applied to massive amounts of data at once.

  • Another example of data could be stored differently in two databases is different email addresses. I know that I have multiple accounts for several websites using my temple email address and my personal email address. For example, I had an account for Hulu under my personal email address and an account for Spotify under my student email address. So when the companies partnered up I was not able to get the free Hulu with student because I already had an account.

  • I suppose phone numbers could be in different formats. For example a company dealing with American phone numbers could write 215-555-1234 versus 2155551234. If you are an international business, you also have to deal with the country code, and I have seen that written as either +276 or (+276). In the former’s case, one would just have to remove or add hyphens. For the latter’s, one would just have to add or remove the parenthesis.

  • An example of a piece of data that can be stored two ways is a phone number with an extension. This could be stored in one column with a larger varchar limit or in two columns. When combining this data, these two field could be joined into one, or you could increase the varchar limit of one of the columns before joining the two tables.

  • A social security number can be represented in two ways. First is the way it is usually presented: xxx-xx-xxxx. Websites for some places (such as banks) that require your social security number will ask that you type it out as above with the dashes, while some places only ask that you enter it is xxxxxxxxx. The simplest way to handle this, I imagine, would be to enter a command that removes all “-” characters from the social security number entries in a database.

  • One way data could be represented in two different ways would be with weight. In the United States, we use pounds as a measurement of weight, but in other countries weight is measured in kilograms. You could reconcile this conflict by maybe having a feature in the database that will convert the weight to the appropriate metric system based on what country the information is entered for.

  • Another scenario of how data can be represented different ways in a database is attributes that have a “Yes” or “No” answer to them that could be represented by binary. For example, when joining customers from two different movie theater databases, an attribute can be whether the customer is a premium member. In a table this can be represented by the variable characters datatype or just binary. I would reconcile the conflict by coding all Yes values as 1’s and No values as 0’s.

  • An example of how data can be stored in two different formats is a phone number. One way phone numbers can be stored is through dashes for example 123-456-7890. But, its perfectly ok to not use the dashes and it will still make the same phone call for example 1234567890. I would store them as xxx-xxx-xxxx because you can then use the area code as a indicator of where the call is being made.

  • Measurements can be stored in different formats. For example, a measurement can be entered using either the imperial system(inch, ft, miles) or the metric system(km,cm). When entering a measurement into a database in the United States it would make sense to use the imperial system, but if a Company is international it might be difficult to know which measurement system to use.

Leave a Reply

Your email address will not be published. Required fields are marked *

Where and when do we meet?
Alter Hall 232
12:30 - 1:50 Tuesdays and Thursdays
Office Hours
David Schuff (instructor):
2:00-3:00, Tuesdays and Thursdays
Speakman Hall 210E and email (see my site)

Lauren Soentgen (ITA):
1:00 - 2:00 Mondays
11:00-12:00 Fridays
Speakman Hall 210D (and see her site)