Hi all,
I have a problem in a data analysis case using the SQL Server 2005 data mining algorithms: in the dependency network, the attributes that most strongly affect the predicted value differ between mining structures.
For example, I have a, b, and c as input attributes to predict attribute d. In mining structure 1 (with all attributes discretized), a is the attribute with the strongest relation to the predictable attribute d. However, in mining structure 2 (with all attribute values continuous), b is the attribute with the strongest relation to d.
So what is the problem? In this case, how can I tell which attribute actually has the strongest relation to the predictable attribute?
Thanks a lot in advance for any help.
This is expected. You are modeling two different problems here. In the discretized space you are asking "are ranges of data predictive of my target?"; in the continuous space you are using the raw values.
For example, say you are predicting creditworthiness based on age and salary. If you discretize, you are asking "which age and salary ranges indicate creditworthiness?" Your ranges for age could be 0-10, 11-20, 21-40, etc., and your ranges for salary could be 0-30k, 30-60k, etc.
In this model you may find that people in the 0-10 bucket have very little creditworthiness, and therefore Age comes out as the strongest predictor.
If you choose continuous, you may find that, overall, a very low salary (e.g. < 13K) is a stronger indicator and that age plays only a minor role. This could be because the bucketing hid detail from the algorithm, or even (if you are predicting a continuous attribute) because salary plays a stronger role in a regression formula.
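To make the point concrete, here is a minimal Python sketch of the same effect. It is not how the SQL Server algorithms score the dependency network: information gain over buckets and absolute Pearson correlation on raw values are stand-ins I picked for illustration, and the rows, bucket boundaries, and function names are all invented.

```python
import math
from collections import Counter

# Invented toy rows: (age, salary in thousands, creditworthy 0/1).
ROWS = [
    (18, 31, 0), (19, 33, 0), (41, 35, 0), (42, 37, 0),
    (25, 52, 1), (28, 54, 1), (32, 56, 1), (35, 58, 1),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(buckets, target):
    """Reduction in target entropy after splitting the rows by bucket."""
    groups = {}
    for b, t in zip(buckets, target):
        groups.setdefault(b, []).append(t)
    remainder = sum(len(g) / len(target) * entropy(g) for g in groups.values())
    return entropy(target) - remainder

def abs_pearson(xs, ys):
    """Absolute Pearson correlation between two numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return abs(cov / (sx * sy))

def age_bucket(age):
    # Hypothetical age ranges, chosen only for this example.
    return "0-20" if age <= 20 else "21-40" if age <= 40 else "41+"

def salary_bucket(salary_k):
    # Hypothetical salary ranges; every toy salary lands in "30-60k".
    return "0-30k" if salary_k < 30 else "30-60k" if salary_k < 60 else "60k+"

ages = [r[0] for r in ROWS]
salaries = [r[1] for r in ROWS]
target = [r[2] for r in ROWS]

# Discretized view: how much do the buckets tell us about the target?
discretized = {
    "age": info_gain([age_bucket(a) for a in ages], target),
    "salary": info_gain([salary_bucket(s) for s in salaries], target),
}
# Continuous view: how strongly do the raw values vary with the target?
continuous = {
    "age": abs_pearson(ages, target),
    "salary": abs_pearson(salaries, target),
}

print("strongest (discretized):", max(discretized, key=discretized.get))
print("strongest (continuous): ", max(continuous, key=continuous.get))
```

On this toy data the age buckets separate the two classes perfectly, while every salary falls into the same 30-60k bucket, so the discretized view ranks age first. On the raw values, salary rises with the target and age has no linear relationship to it, so the continuous view ranks salary first. Neither ranking is "the real one"; each answers a different question.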
Hi Jamie, thanks a lot.