Tips for Excel, Word, PowerPoint and Other Applications
Text to Column in Excel - Fixed Width
Why It Matters To You
Many times, the information we need to get into Excel comes in list (web page, word, text document, PDF) etc, that doesn't automatically dump into the appropriate columns and rows. If you're lucky, you'll get a data set that is delimited in some form (more on this in another Tip), but often times it isn't.
I recently had an opportunity to help a co-worker out. We were looking at air travel booked through our travel agent and trying to separate domestic vs. international travel. Unfortunately, all we had were the airport codes for the start and end of each travel segment.
The initial data set was 13+K travel records, all with multiple airport codes, so we needed a way to automatically look up airport codes and designate them as domestic or international (more on this in a future tip). Our first step however, was just to build a list of airport codes, which I found out later numbers close to 2,400 airports. Fortunately, a quick Google search for "airport code list" pops up a nice page (see below).
Data Sources and Exercise Files
- http://www.photius.com/wfb2001/airport_codes_alpha.html - source of my data
- texttocolumn_fixed.xls - all of the named tabs referred to below are in this file. The original list of 2400 airport codes has been trimmed down to the first 100 for brevity sake.
How To ...
Example 1: Source Web Site
Getting the Raw Data Into Excel
If you go to the above web site and simply highlight the airport code listing, copy, then Edit: Paste Special: Text (ALT + E, S, V), you'll get a list like Example 2 below. I pasted the entire data set in the next tab, called "Raw List".
Seperating Airport Codes from Full Name
The first step is always to look at your data set and understand the logical rules that dictate how it's set up. The first thing I notice is that the 3-letter airport code is separated from the description by a hyphen. However, a further inspection also shows hyphens in other areas, which will limit the utility of "-" as a delimiter.
Fortunately, I notice that all airport codes are 3 letters and are separated from the description by SPACE-HYPHEN-SPACE (" - "). A quick scroll down the pasted data set confirms this, for all of the lines that I care about. I make this last comment because you'll see some header and footer data (e.g., Row 162 and 165 in "Raw List") that I plan to delete, so I don't really care what happens to them.
You can use the data in "Raw List" for this exercise. Starting with cell B2, select the entire data set. The last entry is Row 2632, so make sure you get everything.
Once you have your data set selected, let's activate the Text to Column wizard, by going to the main menu and selecting Data:Text to Columns (ALT+D, E). You should see a window that looks like this:
You get to choose between Delimited (which means there is a character (e.g., comma, pipe, etc) that separates distinct data elements) or Fixed Width, which means that each data element starts at the same # of characters, starting from the left. We're going to select Fixed Width.
Clicking Next will take you to Step 2 of 3. Here is where you get to set all of your breaks. Basically, each break marks how the columns will get created.
Excel will try to guess at where the likely breaks are to give you a start, but you can fine tune it by 1) clicking on the ruler to add a new break, and 2) grabbing a break line and moving it left or right. In this case, let's grab the break line at position 5 and move it right one, to position 6. This will make the new column start at the beginning of the airport name.
Scroll right/left and up/down to see how your breaks will affect other rows in your data set.
If you're happy and you know it, click on Next to got to Step 3 of 3. Excel will give you a preview of what your columns will look like. The destination cell is important. If you don't change it, it will actually parse your source data. Also, if you have data adjacent to your source, it could get overwritten. Click on Finish to convert your data.
This is what I end up with (full data set is in the tab "Data to Columns - Step 1:
Equally important, I check to see how my header and footer information converted. Here's what I see.
Do you notice anything? Again, we're looking for patterns that can help us expedite the filtering and cleaning process. In this case, if I look at the second column, only valid airport codes contain the hyphen, so if I sort by column C, it should group all of my valid codes and strip out everything else. Well, let's try it out ... after I add a column heading.
Here's the sort I used. You can see how it worked out in the tab, "Data to Columns - Step 2"
The sort worked out pretty well. After I do this, I simply delete the noise at the end of the data set (former header and footer data) and delete the hypens in the middle. Next step ... Identify the domestic airports ... But we'll save that for another Tip.
When all is said and done, here's what I finally ended up with.
Imported, Cleaned, and Formatted Data
Notes
Last updated | 9/3/07 |
Application Version | Excel 2003 |
Author | Michael Kan |
Pre-requisites | None |
Related Tips |