C-CWEL Excel Practice

From CoolWiki
Revision as of 22:47, 9 May 2013 by Rebull (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Email from 23 Apr

The basic skills you'd need in Excel are:

(1) Reading a text file into Excel. (I do have a screencapture tutorial on that -- again, developed for another team, but fundamentally the same concepts even if the first little bit isn't directly applicable : http://www.youtube.com/watch?v=nCJ3ctOGvNk&list=UUQfN0BTwaSZ6ABsQcieCpdA&index=1&feature=plcp

(2) Convincing Excel how to ignore certain data. In my world, I need a placeholder for 'no data', so in most of my tables, -9.00 means 'no data available'. The most efficient way to convince Excel that that is 'no data' is just a global search and replace. There are some subtle tricks on this, which is why I am listing this as a separate skill.

(3) Manipulating columns. For example, programming column E to be column B-D. More handling of "no data" issues here. (For a test set, I could limit this to just be objects where there are data in all columns, making the 'no data' issues non-existent. But for the "real thing", this will be an issue.)

(4) Plotting columns against other columns, and setting limits on what is plotted such that the 'no data' objects don't appear. Personally, I think plotting in Excel is a gigantic pain in the @$$ and I never get it right on the first try. it is far easier for me to write code to make plots, so that is what I do. I will be of limited help on this; I will just end up poking buttons until I stumble on what works for your excel version. You need to pick 'xy plots' or 'scatter plots' or other words like those (exactly what you pick is dependent on your excel version.) There are several you tube tutorials from the general public that I can find that address making plots in Excel. (Again, for a test set, I could limit this to just be objects where there are data in all columns, making the 'no data' issues non-existent, but this will be an issue in our "real" data set.)

(5) Plotting lines on top of such plots, NOT ones that just connect the points as plotted in the prior step, but instead can be represented by equations. You could create an additional pair of columns (or set of columns) on another sheet within the same workbook to overplot on the plot; that's what I would think would be the easiest thing to do.

(6) Identifying objects in different sectors of that plot. This is not particularly trivial in Excel, at least as I can think of off the top of my head. The same basic approach I used in my own code would probably be the easiest here -- add another excel column that is 1 ('yes') if the object passes the test and 0 ('no') if not. (And repeat for each of the series of tests.) The code that would automatically populate that column would probably be relatively complex, compared to the other items here. THIS IS THE LEAST IMPORTANT OF THE SET OF 6.

NOW, keep in mind that the main reason I want to do this is to give you a better understanding of the Koenig color cuts, because I sense that this is important to you. in terms of the skills you're going to need to do the rest of the things we need to do, #6 here is the least important, in part because for our BRC 38 catalog, Xavier has already run his color cuts, so we already have the set of yes/no categories for each object, so we need not reproduce all of that work. I would like to have you do at least a few cuts so that you get the idea. #6 really is the least critical, because for illustrative purposes, we could do step 6 by hand.

Sample source lists

Walking (simplest)

File:Simplest.txt -- set of 20 objects, all known young stars, columns are name, j, h, k, w1, w2, w3, w4, all in magnitudes. File has only actual detections (as opposed to limits or 'no data' flags) at JHK, all four WISE bands. Can you read this in, and say, plot, W1 vs. W1-W4? Remember that brighter is smaller numbers in magnitudes, so for this plot, you will need to reverse the axes so that down is a larger number, and up (brighter) is a smaller number. How about J-H vs. J-K? This involves skills 1, 3, and 4 from the above. Plot whatever else you want; this was just a suggestion.

Jogging (harder)

File:Jogging.txt -- set of 40 objects, all known young stars. Same columns as before. Now has some -9 values, which indicate no data. Same task as 'walking' (make those plots with those issues), but now cope with -9's. Note that for plotting just one band against another, fixing the axes as displayed will remove the -9s. But we essentially never plot just one band against another; it's usually one band against a color, or a color against another color. If you have no data in just one of the two channels, you can fix the axis limits. But if you have no data in BOTH of the two channels, -9 - (-9) is exactly 0, so just setting limits on your plot will not remove those objects. (Note also that there might be some values < 0 that are legitimate magnitudes, but there are no legitimate magnitudes as small as -9.) This highlights skill 2 above. Plot whatever else you want; this was just a suggestion.

Running (hardest)

File:Running.txt -- set of 552 objects, with mostly the same columns, but one new one at the end: status. If "status" = 'knownyso', then it is a known YSO; if "status" = 'rejcand', then it is a rejected candidate YSO object, e.g., not likely a YSO. There are all just real detections in here (at least, I think that is the case...). Plot whatever you want, but make the 'knownyso' objects one color or symbol and the 'rejcand' a different color/symbol. Try (just to pick a plot out of the air) H vs. H-K. What is the average H for the known ysos? The average H for the rejected candidates? Can you identify the specific reddest objects in this plot (e.g., largest H-K)? What is the brightest rejected candidate? (This is working towards skill 6.) Draw a line in this plot at H=8 and H-K=1. How many known YSOs are in each quadrant of that plot (eg., above and below H=8 and left and right of H-K=1? (This is skill 5 and working towards 6.) The Excel functions 'average' and 'countif' may be of some help in doing this.

That oughta be enough to keep you busy... :)

The actual color selection task (the real deal)

File:Realwisetable.tbl.txt -- real wise data table. Look at the long header to find all the columns. Most important for now anyway are catalog number, w*mpro, and *_m_2mass. Plot one of Xavier's first plots (fig 8 in the paper), [3.4] vs. [3.4]-[12] (which is w1 vs. w1-w3). Draw a line at [3.4]=13 from [3.4]-[12]=-1 to 5, and [3.4] = 1.9 * ([3.4]-[12]+2.55).

If you get this far thinking, pff, easypeasy, plunge onward and do the rest of Xavier's plots.