Sunday, June 22, 2008
INQLE version 0.1 is born!
My open source project INQLE (Intelligent Network of Querying and Learning Engines) has reached the ripe old age of 0.1! I had intended this version to be very bare bones, such that it would barely work. But I found that a few features were needed to make the bloody thing usable. Most notably I added 3 big features:
- a wizard for loading spreadsheet data as RDF. This is a pretty powerful feature.
- a setup wizard which runs upon starting INQLE for the first time
- an embedded database. This dramatically simplifies the process of installing INQLE. To my delight, I discovered that Jena has recently begun supporting the fantastic H2 database, which performs very well as an embedded database (in fact it outperforms non-embedded databases significantly). I find INQLE runs much faster using an embedded H2 database than an external PostgreSQL database.
Labels:
inqle
Wednesday, June 18, 2008
INQLE Scores 8.5 out of 10 on killer app scale
I stumbled across this post on scoring a semantic web application on a 10 point killer scale. [OK, well actually I wrote it.]
So let me score my project INQLE on this scale.
At this writing, INQLE is very "early doors" (version 0.0.9). Currently, INQLE scores 6 on the 10 point scale. However, our vision/roadmap puts INQLE on a path to score about 8.5:
Immediate Value to User
+1: The tool adds immediate value to the human user. INQLE permits automated machine learning experiments. Users must merely load data and they can then immediately start running experiments.
+1: We are aware of no product that does this.
+1: The tool is free.
Generation of Semantic (RDF) Data
+0.5: INQLE allows users to generate data in spreadsheets (as they are wont to do). Users must then use the INQLE interface to import that data.
+1: The new semantic data that INQLE generates are assertions about the correlations that exist between different things in the universe.
+1: Those things which INQLE correlates are real world objects. Some of INQLE's future sampling algorithms will combine local data with remote, pre-existing RDF entities.
Consumption of Semantic (RDF) Data
+1: In future versions of INQLE, users will be able to annotate how valid or trivial or novel or spurious a correlation is.
+0: Such human annotation will require use of INQLE's interface.
+1: Future INQLE algorithms will be able to discover the results of past experiments.
+1: INQLE can then use the power of linked data and semantic reasoning, to perform repeated or related experiments. INQLE servers can therefore accrue an expanding body of knowledge.
So 8.5 out of 10 is pretty decent. But we have to remember that we (and by "we", we mean "I") wrote the damn thing, thru our own myopic specs.
So is INQLE the killer app for the semantic web? Um, if your standard for a killer app is Google, then probably not. But if you could live with lower expectations, and if INQLE could really deliver on its ambition to effect true artificial intelligence and/or revolutionize the way research & discovery is done, then it could deliver some degree of killerness.
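For anyone who wants to check my arithmetic, here is a minimal Python sketch that tallies the points listed above, grouped by the three categories of the scale. The names `inqle_scores`, `subtotals` and `total_score` are made up for this illustration; none of this is actual INQLE code.

```python
# Points copied from the scoring list above, grouped by category.
# All names here are hypothetical and exist only for this illustration.
inqle_scores = {
    "Immediate Value to User": [1, 1, 1],
    "Generation of Semantic (RDF) Data": [0.5, 1, 1],
    "Consumption of Semantic (RDF) Data": [1, 0, 1, 1],
}

MAX_SCORE = 10  # the scale awards at most one point per criterion


def subtotals(scores):
    """Return the points earned in each category."""
    return {category: sum(points) for category, points in scores.items()}


def total_score(scores):
    """Return the overall score across all categories."""
    return sum(sum(points) for points in scores.values())


if __name__ == "__main__":
    for category, points in subtotals(inqle_scores).items():
        print(f"{category}: {points}")
    print(f"Total: {total_score(inqle_scores)} out of {MAX_SCORE}")
```

The half point lost under generation and the zero under consumption account for the whole gap between 8.5 and a perfect 10.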
Semantic Web Killer App Scale
Many smart people have asked this question: "What is the killer app for the semantic web?" Well, I do not have the answer to that question. But I can tell you some of the attributes that characterize a killer semantic web application.
I came up with a scoring system you can use for evaluating semantic web technologies. The maximum score is 10.
Immediate Value to User
1 point: The tool adds immediate value to the human user.
1 point: That immediate value to the user is novel functionality that is not available for free elsewhere.
1 point: The tool is free.
Generation of Semantic (RDF) Data
1 point: Use existing human workflows to generate new semantic data.
1 point: Automated computer process generates new semantic data, without direct human involvement.
1 point: Generated semantic data links extensively to pre-existing semantic data, hosted remotely.
Consumption of Semantic (RDF) Data
1 point: Humans may annotate the semantic data through a simple procedure, increasing the value thereof.
1 point: Such human annotation occurs automatically, using existing human workflows.
1 point: An automated computer process can consume the generated semantic data in some useful way. That is, humans are not the sole consumers of the generated semantic data.
1 point: Such automated processing increases the value of the body of semantic data, thereby facilitating cumulative accrual of value by the computer.
Not sure how accurate the above model is for capturing the key features of a semantic web application. For example, maybe it puts too much emphasis on machine processing of data. But that's what the semantic web is all about, right? Most agree that it's not just another paradigm for presentation.
So assuming that the above scoring system is good enough, let's try to answer: "What is the killer app for the semantic web?"
Well it will be a tool for generating semantic data, of immediate value, using simple, human + automated methods. Such semantic data is processable by automated agents, in such a way that its value grows with time.
Labels:
inqle,
rdf,
semantic web
Thursday, April 10, 2008
What's the best creative medium?
Great question, Dave. Well let's break it down. What attributes might distinguish the creative media?
Most features of creative media don't seem to differentiate one versus another. For example most creative media give you a significant buzz (that is, they are fun). True, some are more fun than others. E.g. stamp collecting I would wager is not lighting up much of the elation centers of the brain. Learning Klingon may be down on the list as well. Just above learning Klingon I would rank my first profession: a lab researcher in a molecular biology lab. This was creative work, albeit slow. It seemed that one made a creative decision about once per year. But for the most part, they are a wash here.
All creative endeavors have some form of environmental toll as well. I suppose ones that generate a lot of byproduct would lose in this attribute. Like nickel smelting-4-fun.
What about the value of the end product to society? Now here is a point of differentiation. Clearly some creative exploits are not worth a hill of beans save for the transient PET pattern they create in the beholder for about 5 seconds. I would put most modern art in this category. What about good art? Or musical performance? Well the benefits are subtle, and I would argue small.
What about its potential to make you money? This is a good point of differentiation because it indicates how much the person is completely wasting his time. Measured by this standard, most creative exercises are a fool's errand. But some media clearly outperform others. I suspect that here again, art fares poorly relative to more technology-related exploits.
Where the hell are you going with this Dave? I hear you ask. Well my thesis is that the finest creative medium ever is [envelope please]. Computer programming!
Eh? A hush fell over the Readership like a choking cloud of chlorine gas. Good thing nobody would ever read this. Well here is why software development is such a fabulous exploit for the few who are lucky/squashy enough to do it.
- High buzz per unit time invested. Imagine scientific advancement sped up 1000-fold. That's what computer programming feels like to me. It is pretty easy to get yourself humming on a project in which you are tinkering with code and running a new experiment every half minute or so. No pesky gels or radioactive phosphorus or carcasses either.
- High reward to society [I think]. This is hard to figure. Well what is the internet worth? Now throw in the value of non-internet devices. Costly. And all this wealth was created in just the last few decades. I bet well over half of the value of the internet was programmed in just the last few years. Yes I hear your point that the actual content on the internet gets some credit. If this post is at all representative, then I think it's clear that content is overrated.
- High potential for financial reward. Yes, the days of everybody-who-knows-html-gets-rich are over. But stack the median joe programmer against the median seth actor, and I think the former has a ranch house and a kid and 2 cars, while the latter is sleeping on his friend's couch, still chasing a forlorn dream.
So in sum, I retract all above statements.
Dave
Labels:
computer programming
Friday, October 19, 2007
Evidence-Based Everything
Subtitle: Everything is about prediction is about data mining is about everything.
Here's what I mean:
1. "Everything is about prediction"
All information (books, articles, lectures, even conversations) is basically intended to provide you with a model from which to make future predictions. In some cases, the predictive models are explicitly spelled out, as in "the moral of the story". Other predictive models are more subtle. When your coworker says she thinks your employer is short-sighted, she is predicting that in the near future, that employer will do short-sighted things. When a self-help book tells you that people ate half the number of candies when the candy jar was moved 6 feet away from their desk, that book is providing you with a predictive model that says that mindless snacking goes down as snack food is moved away.
2. "prediction is about data mining"
All predictions are measurable using data mining. What is data mining? It is basically the scientific method, with particular focus on finding statistical correlations from tables of data. Data mining and associated scientific methods can be used to make just about any prediction. The best route to developing predictions about the real world is to use scientific methods, armed with -- you guessed it -- data mining.
3. "data mining is about everything."
Whether the subject matter is interpersonal relationships, dietary habits, or social hierarchies among crack dealers, it can be measured with data. And it should be! Because hiding within data everywhere are shocking findings that often argue against the conventional wisdom. Thus, data mining can dispel our misconceptions and enable us to better predict and manage our future, in all walks of life.
My broader point is that we are very subjective creatures, with an amazing capacity to see the world through tinted glasses. In my field of internal medicine, we have only in the last few decades appreciated the critical need to practice evidence-based medicine. This basically means that we rely on scientific proof that an intervention is justified, and that in the absence of that proof we should proceed with caution.
Before the evidence-based medicine movement, doctors relied exclusively on intuition and basic research into the pathophysiology (the nuts and bolts) behind diseases. The trouble with this approach is that intuition and even the science behind studying pathophysiology of diseases both can and do lie, frequently. Medicine has done many famous about-faces when it finally got around to studying whether an intervention is helpful. Examples include medicine's historic practices such as useless low-protein diets for kidney health, recommending toxic vitamin E to prevent cancer, widespread use of toxic anti-arrhythmic drugs and estrogen replacement therapy.
Bottom line: everything, not just medicine, would benefit from increased use of scientific data mining, to provide us with new and improved predictive models about... everything.
Labels:
data mining,
medicine,
prediction
Tuesday, October 9, 2007
The Global Environment: Let Me Go On Record
Some things are important enough to restate the obvious. I write this for the benefit of some distant future reader combing thru ancient posts, trying to understand why we ruined the planet.
For the record, I understand and firmly believe the following environmental facts:
- We are increasing CO2 concentrations to levels never before seen in at least 400,000 years.
- This plus other human factors are contributing to major environmental damage in the form of global warming and other forms of global climate change.
- Sea levels will almost certainly rise to their maximum levels within the next century or 2 (100 feet above current level) with catastrophic consequences (imagine all of Florida, trash and all, being washed out to sea).
- Through climate change and through habitat destruction, we are visiting upon the earth one of the greatest extinctions ever. The pace of this mass extinction surely matches those from prior meteor impacts on a geologic scale. We will soon live on a planet without cheetahs, pandas, large primates, many species of whale, most smaller primates, etc., etc.
- There is a vast overpopulation of humans contributing to this. I recognize it is fashionable to say that world populations will level off around 11 billion +/- several billion. However: (a) there is no proof of this, (b) even if populations flatten, the environmental impact per person continues to rise, (c) 40,000 children starve to death every day, (d) habitat destruction and other consequences of human activity have wiped out most wild habitats, with the remainder to be consumed within the next few decades.
As for solutions, let me go on record as endorsing massive diversion of resources to protect environments, incentivising rainforest countries to preserve what remains, massive effort to invent new energy technologies, etc. For my small part I contribute to the World Wildlife Fund and others.
Labels:
environment,
wildlife
Saturday, October 6, 2007
link on vitamin d
http://ods.od.nih.gov/factsheets/vitamind.asp
great stuff from the NIH on vitamin D.
How much vitamin D can children safely take? According to this
document and the NIH, the answer is a whopping 1,000 I.U. for infants
0-12 months, and 2,000 I.U. for all people over 12 months!
I expect we will be seeing recommendations from AAP that kids should
get more vitamin D, to prevent later Multiple Sclerosis plus cancer.