[link] 2008-11-02T10:08:32 Graphs from sparql results: This is a response to "Download SPARQL results directly into a spreadsheet". So far you've motivated seeing the results of a query in a table and making a graph from them. I'd like to have both of those capabilities in a webapp. E.g. I should be able to embed a live graph in my own page like this:
<img src="http://sparqlgrapher.com/svg/example.com/query=SELECT+?date+?price+{...}">
Visiting my hypothetical sparqlgrapher.com directly would give you a UI to lay out and customize the graph. When you're done, you'd take that URL and embed it elsewhere (or just take a copy of the image, if you want a one-off).
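As a rough sketch of how a page generator might build that embed tag: the URL layout below just mirrors the made-up example above, and `graph_img_tag`, its parameters, and the sample query are all assumptions for illustration, since sparqlgrapher.com is hypothetical.

```python
from html import escape
from urllib.parse import quote_plus


def graph_img_tag(endpoint_host: str, query: str,
                  base: str = "http://sparqlgrapher.com/svg") -> str:
    """Return an <img> tag pointing at the hypothetical graphing service.

    The query is URL-encoded (spaces become '+') so it can ride in the path,
    then the whole URL is HTML-escaped for use inside an attribute.
    """
    url = f"{base}/{endpoint_host}/query={quote_plus(query)}"
    return f'<img src="{escape(url, quote=True)}">'


# Placeholder query; the real one would select whatever you want to plot.
tag = graph_img_tag("example.com", "SELECT ?date ?price { ?s ?p ?o }")
print(tag)
```

Encoding the query matters here because characters like `?` and `{` would otherwise be misread as URL syntax.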
[link] 2008-10-13T22:38:18 rdflib vs jena graph creation APIs: I actually looked at the jena RDF API today, and I was interested to see how graph creation compares to rdflib's style, which is the one I normally use. From the Jena introduction (minus the model setup and some comments):
String personURI = "http://somewhere/JohnSmith";
String givenName = "John";
String familyName = "Smith";
String fullName = givenName + " " + familyName;
Resource johnSmith
  = model.createResource(personURI)
         .addProperty(VCARD.FN, fullName)
         .addProperty(VCARD.N,
                      model.createResource()
                           .addProperty(VCARD.Given, givenName)
                           .addProperty(VCARD.Family, familyName));
An rdflib python port of that:
johnSmith = URIRef("http://somewhere/JohnSmith")
givenName = "John"
familyName = "Smith"
fullName = givenName + " " + familyName
graph.add((johnSmith, VCARD['FN'], Literal(fullName)))
name = BNode()
graph.add((johnSmith, VCARD['N'], name))
graph.add((name, VCARD['Given'], Literal(givenName)))
graph.add((name, VCARD['Family'], Literal(familyName)))
If I were making a new version of the rdflib API, here's what I'd consider:
I also obviously prefer 'edge' to 'property', since edges sound more like the free-form graph that we're making. What system would have a property whose value is another property? That's a perfectly natural RDF construct, but pred1.addProperty(pred2, pred3) doesn't look so natural. It's also less surprising that edges can be traversed both ways. Other systems with "properties" don't always support that, causing users to make redundant inverse properties where they think they might need to traverse backwards.
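To illustrate the both-ways point, here's a toy sketch in plain Python. It is not rdflib's API and not the API proposal from this post; `TinyGraph`, `add_edge`, `objects`, and `subjects` are names invented here, and the strings stand in for real URIs.

```python
from collections import defaultdict


class TinyGraph:
    """Toy edge store that indexes every edge in both directions."""

    def __init__(self):
        self._out = defaultdict(set)  # subject -> {(edge, object)}
        self._in = defaultdict(set)   # object  -> {(edge, subject)}

    def add_edge(self, subj, edge, obj):
        self._out[subj].add((edge, obj))
        self._in[obj].add((edge, subj))
        return self  # returning self allows jena-style chaining

    def objects(self, subj, edge):
        """Follow `edge` forward from `subj`."""
        return {o for e, o in self._out.get(subj, ()) if e == edge}

    def subjects(self, edge, obj):
        """Follow `edge` backward from `obj`; no inverse edge needed."""
        return {s for e, s in self._in.get(obj, ()) if e == edge}


g = TinyGraph()
(g.add_edge("johnSmith", "vcard:N", "_:name")
  .add_edge("_:name", "vcard:Given", "John")
  .add_edge("_:name", "vcard:Family", "Smith"))

forward = g.objects("johnSmith", "vcard:N")
backward = g.subjects("vcard:Given", "John")
print(forward, backward)
```

Because each edge is stored under both its subject and its object, nobody has to add a redundant inverse edge just to walk backwards.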
[link] 2008-08-03T22:25:34 New home page: I played with a bunch of New Fangled Web Technologies and redid my home page. Almost everything is dynamically derived from data sources that I presumably keep up to date for other reasons. The foaf part and projects list part aren't done yet. I also haven't removed all the zope pages yet, unfortunately. (Zope turned out not to be a good system for making a low-maintenance site that lasts for 10+ years.) I hope to have a DOAP document for each project, which will make them easy to list on my home page as well as other project-list systems.
[link] 2008-04-27T19:51:52 Using freebase to help with dbpedia searches: I wrote this response to a thread on a mailing list, but I can't find anyplace where sourceforge has my reply online (I did receive it in an email). I would have expected it on this archived thread. So here it is again, at a place I can link to.
On 21 Apr 2008, at 14:40, robl wrote:
SELECT * FROM pages WHERE page_title LIKE "Queen%Elizabeth"
This would perform a case insensitive match on Queen(anything) Elizabeth
(at least in mySQL)....
Is there quick way to do what I want ? Are there any indexes I could
apply to improve things (I have already created the indexes
specified at
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com's%20BLOG%20%5B127%5D/1298) ?
Or do I need to create a conventional SQL table of resource names and
then do a SQL LIKE query on those ?
You might also want to check out freebase. Here's the approach I'm about
to attempt, myself. Start with a reconciliation query:
http://sandbox.freebase.com/dataserver/reconciliation/?name=Queen+Elizabeth&types=%2Fpeople%2Fperson&responseType=html
- the reconciliation service handles misspellings and other variations
- s/html/json/ for the machine readable version
Then look at the freebase page or perform a query:
http://www.freebase.com/view/en/elizabeth_ii_of_the_united_kingdom
That page has this link:
http://en.wikipedia.org/wiki/index.html?curid=12153654
On that page, we have
<a href="http://en.wikipedia.org/wiki/Elizabeth_II_of_the_United_Kingdom">article</a>
Maybe freebase can just hand us that link instead of the curid one. I
haven't gotten to that part of my code yet. I don't know how often the
last word of the freebase URI is in sync with the WP one, but that seems
like it would be the least reliable. Following freebase's designated WP
link is probably more robust.
Finally, take the wiki name, and make a dbpedia URI:
http://dbpedia.org/page/Elizabeth_II_of_the_United_Kingdom
You probably noticed that elizabeth_ii_of_the_united_kingdom wasn't the
first result for 'Queen Elizabeth' of type /people/person. I'm not sure
if freebase considers that a bad result page or not. The reconciliation
service is new, so now's probably a great time to tell them how
important good results are to you :)
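The last step (wiki name to dbpedia URI) is just string handling, so here's a minimal sketch of it. The function name is made up, the `http://dbpedia.org/page/` prefix is copied from the example above, and it deliberately refuses curid-style links since those carry no article name.

```python
from urllib.parse import parse_qs, urlparse

DBPEDIA_PAGE = "http://dbpedia.org/page/"


def dbpedia_uri_from_wikipedia(article_url: str) -> str:
    """Map a Wikipedia /wiki/<name> article URL to a dbpedia page URI."""
    parsed = urlparse(article_url)
    if "curid" in parse_qs(parsed.query):
        # e.g. .../wiki/index.html?curid=12153654 has an id, not a name
        raise ValueError("curid links carry no article name: " + article_url)
    prefix = "/wiki/"
    name = parsed.path[len(prefix):] if parsed.path.startswith(prefix) else ""
    if not name:
        raise ValueError("not a /wiki/<name> article URL: " + article_url)
    return DBPEDIA_PAGE + name


uri = dbpedia_uri_from_wikipedia(
    "http://en.wikipedia.org/wiki/Elizabeth_II_of_the_United_Kingdom")
print(uri)
```

Fetching the reconciliation results and following freebase's Wikipedia link would come before this step; that part is left out because it depends on the live services.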
[link] 2008-03-05T23:53:13 Notes from the talks at Semantic Web: Are Scalable Graph Data Applications Possible?: Notes from the talks at Semantic Web: Are Scalable Graph Data Applications Possible? I was looking forward to more Oracle demos and roadmap-type discussion, but instead the highlight was allegro.
jeff from oracle says:
fraud detection is using graphs
we're generating data faster than we can process it
business value comes from: reduce cost of operations; aid decision-making; improve the transparency of business operations (e.g. for businesses that need to meet regs)
nice slide on DB approaches, broken into disk/ram, native/layered, etc
siderean is a company doing in-mem, multiple machine storage
david from mulgara
key to web scaling is the late binding of address to resource. Allows the information mgmt technique of the web to scale well
the next gen mulgara version, which the team was meeting about this week in SF, will use lots of disk, perhaps 40G ram, and store 100B (?) triples
vertica:
SQL DBMS, focus on analytics
50+ customers: verizon, comcast, level3
came from the cstore project, MIT
MIT library catalog is rdf, 50M triples: Barton Dataset
uniprot protein dataset is 262M triples. vertica serves that dataset for public querying
jans aasman, franz inc:
23-year-old company, 2 years with a triple store
customers do 'event handling and activity recognition'
50 customers, plus free download
monterey aquarium doing Marine Metadata Interoperability Project
los alamos is studying who reads what publications, graph structures in readership
sun doing baetle, the bug tracking one
japan telecom KDDI is doing spam and fraud detection with allegro. they need to determine what is spam across their busy network. they create new spamassassin rules over time.
OFFIS using rdf for info about power grid usage
allegro loads 1e9 quads in 8 hours
has sesame interface
supports xml schema datatypes, e.g. range queries on dates. Literals can be stored as their own numbers
'social network analytics library' for degrees, cliques, group stats
quick loads from oracle for temporary dbs used for analytics (coming soon)
RDFS++ reasoner for the usual inferences
temporal reasoning (allen's temporal logic, for intervals)
their time/space handling helps with event search. one query involves a place and radius, person connections, other event details
police, e.g., need temporal reasoning
"homeland security is interested in every type of imaginable event"
"find all meetings that happened in december within 5 miles of berkeley that was attended by the most important person in Jans' friends and friends of friends"
they have a custom query language for their various datatypes and their capabilities, like (geo-box-around !geoname:Berkeley ?event 5 miles)
even 3 months of american phone call records is already petabytes
jans' thesis was about car driving behavior
GPS (maybe plus phone) leads to rich data about people- work, purchasing, etc
My question, which I didn't get to ask: How do the approaches compare in terms of latency for very small queries? Many of my queries are not batched together well, or my app needs to make a lot of decisions during the graph traversal.
[link] 2008-02-22T00:38:14 Goals for a wiki system: Some goals for a better wiki system: A common case seems to be "add a new page and list it in some existing TOC section". Another one is "add a new section (paragraph or more) to this page". Editing words within an existing section that you didn't write, that might be rare. I still like tinymce, although nelix_ isn't a fan. Wikis that I use (that I'm trying to be better than) are: twiki, zwiki, confluence. Related: rdf blog engine ideas
[link] 2007-12-14T09:15:52 Notes from Intelligence at the Interface: Event: http://sdforum.org/index.cfm?fuseaction=Calendar.eventDetail&eventId=13012&nodeID=1
tom gruber, tomgruber.org
Progress in the user experience on the web, if we look at what the user has to do and what the rest of the system has to do:
examples:
'sandy' is an email reminder assistant
'farecast' for airfare. suggests alternate cheaper flights, trends. Looked cool from the screenshot and description.
Tom remarked at the end that finally, intelligence and computation will be able to be what we compete with, instead of just having "brand bullies" :)
And, "each time AI does a job well, it always disappears"
twine, nova
check Nova's blog for slideshow
semweb says, put metadata in the data so new software can reuse the past work (naturally!)
seems very close to that friendlist thing from that other blogger i read, i forget the exact name
builds a 'semantic interest profile' about you. picks people/places/organizations/topics you're interested in
create a 'twine' (like squidoo lens, page about a topic). The twines had surprising urls: like http://twine.com/twine/my-house, right at the global level. Are the urls different depending on who's logged in? Or does Nova's own stuff just go to the top? :)
A bookmarklet opens a transparent frame right on top of an external page you want to tag. From there, it's like delicious, but gathers a bit more data automatically.
When he used the bookmarklet on an amazon page, twine pulled some more fields from the page about the book
on the marked pages, twine finds words and topics and makes the links
edit-in-place UI to fix the fields of the data it found; add more fields. like freebase
they do some auto-summary of text from a wikipedia page
query is like newegg power search (or most semweb stuff for that matter), pick a type, add your filters
email in your own items to your 'recent items' list, just like a ticketing system would accept new tickets. URLs in the mail get crawled and those sites show up in your items too. (calo had a more turbocharged version of this, where they'd go hunting for info about everything and build big profiles about users and stuff)
goal of twine is organization. is this automating my tasks? the users will reveal what is valuable to automate.
Remember when you started using delicious? it took 5 mins to learn most of the functionality, but then several days to notice that this is really worthwhile and it's going to help a lot. I expect a similar, but stronger, effect from twine. You learn the mechanics of checking information in, then after doing it for a while you notice which of your former laborious tasks have melted away. I also have high hopes for systems connected to twine. It's like a more polished version of piggybank. And they're going to add in recommendations, which may bring the 'smarts' closer to what magitti or calo is doing.
PARC magitti
the name = magic + (something) + digital graffiti
19-25 year olds have 2x as much free time as other youth (japan, at least)
important for them to know what everyone else is thinking
predicts what to do, e.g. 'eat' (when it thinks you're hungry based on time, place, your emails, your explicit queries). Nice.
it reads emails only to guess what kind of activity you're currently doing. 11% of the test email dataset had information related to leisure activities (which is all magitti cares about). That seems low to me. Maybe that's all the ones they were able to correctly process (or maybe there's something I'm not estimating right about the emails of 20-somethings in Japan)
look at your past behavior to learn your patterns of eat/see/shop/... They can make plots based on day-of-week and time. This is what I want for my home automation.
ppl want to use the phone UI with one hand. 6 big buttons surrounding the content
pie menu on the phone. 4 quadrants only, sometimes more narrow ones for the border buttons. They looked really usable.
see yelp-style ratings on businesses, takes your star rating as you look at the page. collaborative recommendation stuff
hit the lower-left one to change your activity from 'any' to something else. Even if you don't say anything, they still list good ideas from their best matches of your activity, place, reviews, etc
you can force the activity ('shopping for clothes') and it refilters.
Finally, some novel UI work on a phone-based UI. It looked really nice-- low on sparkles and icons, high on usability. The app itself (recommendations and guides for your leisure time) seems good, and it was amazing to see a Japanese paper-printing company looking for ways to get into new media. Feels like the only stories I hear in the USA are about old companies putting their effort in keeping their old businesses going (e.g. big oil). Anyway, there was some cool personal activity prediction stuff like where they look at your messages and your past trends to guess what you want to do -right now-. I hope to get into exactly that kind of thing on my home automation project.
yahoo
flickr photo locations plus tags shows popular tags on the map. 'tagmaps' from yahoo research berkeley. pretty cool to zoom in and out.
using 4M photos, last year's data
upcoming version has 30M photos. Sometimes, these tags annotate world maps better than the pros do.
autotag your vacation photos by using the place of the photo
see the 'fireeagle' project for how web apps can know your location
i don't have live notes about the best demos, since I had to change seats to see the screen. The phone app that shows various feeds of pics included "wallet" (the photos you often show people), "my wife" (the photos she's taking now), "any flickr photos tagged with 'happy' near this location".
when reviewing all the tags on flickr, they consider the time too so as to figure out which things are actually events ('bluegrass festival') and not places ('the mission'). This is like a topic I got into at a semweb meetup once: with just the tags on delicious, could you produce the names of all the states and their capitals? (I think yes)
The phone-photo-tag part of this demo gave the most feeling of "you are looking into the future of technology" of all the presentations tonight. The UI was not elaborate. Mainly, it's that your phone camera is helping you tag your photos in real time (like delicious, except it knows your position and millions of past flickr tags too) and it's readily presenting you with other photos of interest. Everyone using this would essentially be running their own little version of justin.tv (photos, not video). The heavily-assisted tagging helps you organize your photos, and therefore organize your memories. Valuable! The speaker mentioned an example of looking up where you last had dinner with that friend. Since it was so easy at the time, you would have taken a photo and tagged it with the friend and the restaurant. Problem solved.
CALO
This is a big research project that covers CPOF (recently in a Wired article) and has some kind of cross funding and sharing with many other projects, including twine.
cognitive assistant that learns and organizes
SRI, darpa
includes Command Post of the Future
builds 'relational model' of user's world. not sure if it's rdf
guesses what emails are about, what tasks they go with. you give feedback
'meeting understanding'. remote people are in everyone's headsets. CALO writes transcript, action items, Q/A pairs.
when he comes to a mtg, calo knows what all the people have been doing
has some kind of chat bot for scheduling a meeting (and other tasks, apparently). you use limited natural language
AI uses 'probable beliefs', revises them as new facts come in. 'probabilistic consistency engine' can update knowledge with new facts.
each year, they test the system (like an SAT test) and it has to improve. questions like "what to do when tom can't make a meeting: A. reschedule; B. tell tom; ...". They compare the baseline untrained CALO to an instance with 16 users for 2 weeks, and note whether calo does better at the test after that learning.
they have a full self-contained office environment, and a lite version (used by DARPA). lite one has almost no interface
the lite version does: google desktop search PLUS nlp (!). calo found someone's home page, pulled number and address and job title. Noted the person's publications and web pages to see what the person does.
followup query: "people with expertise in learning" then ".. that work at SRI" to narrow it down
A query for "slides about iris" finds individual slides in past presentations. then you search for similar slides to a near-match. Apparently the normal desktop searches look for keywords and stuff in a whole .ppt, which is obviously not as useful.
make a new presentation just based on title. digs up all relevant slides
'preppak' for a meeting. finds all documents that are required or recommended for the meeting
in the meeting, you can watch the transcript, which knows the person since everyone wears a mic.
Testing within the government
calo is a personal assistant, doesn't share much with groups. some things (e.g. meeting schedule) are shared. you don't reveal all your meeting time prefs, but the calos negotiate it
The calo express part of the demo was pretty nice. It's a much smarter desktop search that would easily beat whatever you're using now. Especially what I'm using now, which is nothing (and I've tried a few OSS projects a little). Things took a turn for the industrial-strength-awesome when it got into the meeting planning and recording features, mainly for the amount of tech they're throwing at the problem. The AI testing stuff was also amazing, and it helped connect the project back to real life: if they don't make a certain amount of progress in their AI evaluations, they don't get funded for the next year.
[link] 2007-12-02T03:19:09 Watching the X screen power state: http://cvs.bigasterisk.com/viewcvs/room/sys.dpms?rev=1.1&view=auto New program to watch whether my screens are powered on or DPMS-sleeping. I also track the idle time and the currently-focused window, since I happened to find code for those while I was working out the DPMS. The result is a little RDF graph:
@prefix _4: <http://dash/>.
@prefix _5: <http://bigasterisk.com/computerIdleState/power/>.
@prefix idle: <http://bigasterisk.com/computerIdleState/>.
_4:console idle:focusClass "rxvt";
    idle:focusName "XTerm";
    idle:focusWindowName "drewp@dash:/my/proj/room <>";
    idle:lastNonIdle "1196593439.27"^^<http://www.w3.org/2001/XMLSchema#float>;
    idle:power _5:On.
(I know the URLs and date formats are poor right now) The program should be easy to run if you're on X and you have rdflib, py2.5, and python-xlib.
This results format is part of my new plan to have each program regenerate entire graphs of whatever they measure. I'm thinking of sending the graphs around with jabber, using pubsub to send them only when they change. That would be unlike https://stpeter.im/?p=1328 which uses SPARQL as you'd expect. For example, if the user (or DPMS timer) turned off the screen, the last triple in my example would change to the _5:Suspend node. Other listeners who have subscribed to the computerIdleState graph will get an updated version of it.
The reason I started tracking screen state was simply to measure how many hours my 300W monitors are on per month. Either this program, or some listener one, will have to log that data somewhere. Of course, there are obvious uses for logging idle time patterns too, and that measurement probably wants a bit more compression. (Example app: tell my friends on jabber that I'm out, but my average return time for Tuesdays is 9:45pm.)
I really have to move this old home automation project off CVS and onto darcs. I don't mind the conversion, but I want to keep at least some of the cvsweb urls working since I think I've pasted them into a lot of postings all over the web.
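The "regenerate the whole graph, but only send it when it changes" plan from this post could be sketched like this. It's an in-process stand-in, not jabber pubsub: `ChangePublisher` and its methods are invented for illustration, and graphs are modeled as plain sets of (subject, predicate, object) string tuples rather than rdflib graphs.

```python
class ChangePublisher:
    """Deliver a regenerated graph to listeners only when it differs
    from the last graph that was sent."""

    def __init__(self):
        self._last = None
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def publish(self, triples):
        snapshot = frozenset(triples)
        if snapshot == self._last:
            return False  # nothing changed, so stay quiet
        self._last = snapshot
        for callback in self._listeners:
            callback(snapshot)
        return True


received = []
pub = ChangePublisher()
pub.subscribe(received.append)

on = [("http://dash/console", "idle:power", "power:On")]
suspended = [("http://dash/console", "idle:power", "power:Suspend")]

pub.publish(on)         # first graph: delivered
pub.publish(on)         # regenerated but identical: not delivered
pub.publish(suspended)  # screen turned off: delivered
print(len(received))
```

Comparing whole snapshots keeps the measuring program simple: it never has to work out which triple changed, it just rebuilds everything and lets the publisher decide whether anyone needs to hear about it.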
[link] 2007-09-09T04:14:40 RDF reasoning for home automation: Updated: fixed FuXi link. I'm trying to do my home automation with RDF and reasoning. RDF is the unified way to write all the configurations, and I'm hoping to use a logic engine (maybe FuXi or Euler) to write the control systems. Hopefully those will make it easy for humans or computers to edit the setup. I look forward to being able to ask an N3 proof system "why is the porch light on?" and having it tell me "the web said the sun has set by now, you tripped a motion sensor within the last 15 minutes, and there was no other light shining in this area, therefore I turned on the porch light".
Tonight I cobbled together the first working version of some home automation components talking RDF. A bluetooth dongle constantly searches for devices, and if it finds one, it states that [the bluetooth sensor] [senses] [the URI for the device]. Here's that program. (BTW, avoid bluetooth chips by Integrated System Solution Corp and prefer ones by Cambridge Silicon Radio. The ISSC one I got has the lousy address 11:11:11:11:11:11 that's hard to change since I'm not using Windows. Also, this bluetooth intro is really good.)
Next in my home automation system, a reasoning program hears about new statements and executes the right logic to produce more statements about what should happen. This program is a stub for now: it just turns the presence of my phone into a statement to power the door lock. But devices.n3 suggests what some of the logic might eventually look like.
Finally, an output program has been watching for statements about pins on the parallel port it controls. The reasoning program said to put power on bit2, so this program sets the output accordingly. On that pin is a circuit with an optoisolator, a triac, a transformer, and the electric strike that releases the door.
When the real logic is in place, the proof system should be able to say "I unlocked the door because someone friendly was nearby, because Drew is friendly and Drew carries a phone with the bluetooth address I saw".
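For a feel of what the stub reasoning step amounts to, here's a hand-written rule in plain Python that derives the door-lock statement and keeps a human-readable reason alongside it. This is only a sketch: every `ex:` term is a made-up placeholder (the real vocabulary lives in devices.n3, which isn't reproduced here), and a real N3 engine such as FuXi or Euler would produce an actual proof instead of a string.

```python
SENSES = "ex:senses"
CARRIES = "ex:carries"
IS_A = "rdf:type"
FRIENDLY = "ex:Friendly"
# Hypothetical output statement: put power on parallel port bit 2.
UNLOCK = ("ex:parportBit2", "ex:power", "ex:on")


def reason(statements):
    """Return (derived_statement, explanation) pairs.

    Rule: if a sensor senses a device, and a friendly person carries
    that device, then power the door strike.
    """
    facts = set(statements)
    derived = []
    for sensor, pred, device in sorted(facts):
        if pred != SENSES:
            continue
        for person, pred2, carried in sorted(facts):
            if (pred2 == CARRIES and carried == device
                    and (person, IS_A, FRIENDLY) in facts):
                why = (f"{person} is friendly and carries {device}, "
                       f"which {sensor} sensed")
                derived.append((UNLOCK, why))
    return derived


facts = [
    ("ex:bluetoothSensor", SENSES, "ex:drewPhone"),
    ("ex:drew", CARRIES, "ex:drewPhone"),
    ("ex:drew", IS_A, FRIENDLY),
]
conclusions = reason(facts)
for statement, why in conclusions:
    print(statement, "because", why)
```

The output program would then only need to watch for the derived statement; it doesn't care which rule produced it.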
[link] 2007-09-01T17:34:55 Data table with tabulator: Here's how to use tabulator to render a simple data table. My test data might be a bit confusing since the terms overlap with tabulator terms. I'm trying to compare query runtimes of various queries on different databases. The result I'm trying to produce is a table showing how long each database took on each query. Here's my mockup data in n3:
@prefix : <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:result rdfs:label "test result" .
:db rdfs:label "database" .
:time rdfs:label "elapsed time" .
<> :result
  [a :Result; :query :q1; :db :rdflibBdb; :time ".5"],
  [a :Result; :query :q2; :db :rdflibBdb; :time ".6"],
  [a :Result; :query :q3; :db :rdflibBdb; :time ".7"],
  [a :Result; :query :q1; :db :db2; :time ".8"],
  [a :Result; :query :q2; :db :db2; :time ".9"],
  [a :Result; :query :q3; :db :db2; :time ".11"]
  .
Note the line which associates the 6 results with this document. Without that link, tabulator won't put the results in its outline. I used cwm to create an XML version of that data, which you can view in tabulator with the following link. [Update: there was no need to convert; tabulator can read n3 thanks to a version of the cwm parser translated to js with pyjs!]
Tabulator has a query-building interface where you click on predicates and other nodes to constrain your result rows, but I couldn't figure out how to make the table I wanted. Instead, I used the SPARQL tab at the bottom and wrote my own query:
SELECT ?query ?bdb ?db2
WHERE
{
  ?v1 <http://example.org/db> <http://example.org/db2> .
  ?v0 <http://example.org/db> <http://example.org/rdflibBdb> .
  <http://bigasterisk.com/post-rdf/timing-results6.rdf> <http://example.org/result> ?v0 .
  ?v1 <http://example.org/query> ?query .
  ?v0 <http://example.org/time> ?bdb .
  ?v0 <http://example.org/query> ?query .
  ?v1 <http://example.org/time> ?db2 .
}
In English, that says "find queries with results for the two databases, and report their times in columns named after the databases". You can load tabulator with my datafile and that query together: tabulator with mockup data and query. You have to click the radio button next to 'Query' to see the results.
Now I'll actually write my database benchmark, and I'll have it output result sets for each db. I should be able to combine the result sets together and display them in a table with the method described above. The biggest issue with abusing tabulator in this way is that I have to grow my query for each new database I test. Also, that query won't display a row unless it has results from all databases. It would be nice to have all cells optional, so I can still see a row if it only has a result from one database.
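For comparison, here's what the "all cells optional" table looks like when built outside tabulator, as a small pivot in plain Python. This is only a sketch of the desired result, not something tabulator does: the tuples restate the mockup data, the `q4` row is an extra made-up result that exists for only one database, and `pivot` is a name invented here.

```python
def pivot(results):
    """Turn (query, db, time) tuples into a header plus one row per query.

    Columns are discovered from the data, so adding a database needs no
    code change, and a missing result becomes an empty cell instead of
    dropping the whole row.
    """
    dbs = sorted({db for _, db, _ in results})
    by_query = {}
    for query, db, time in results:
        by_query.setdefault(query, {})[db] = time
    rows = [[q] + [by_query[q].get(db, "") for db in dbs]
            for q in sorted(by_query)]
    return ["query"] + dbs, rows


results = [
    ("q1", "rdflibBdb", ".5"), ("q2", "rdflibBdb", ".6"), ("q3", "rdflibBdb", ".7"),
    ("q1", "db2", ".8"), ("q2", "db2", ".9"), ("q3", "db2", ".11"),
    ("q4", "rdflibBdb", ".4"),  # hypothetical: only one database ran q4
]

header, rows = pivot(results)
print("\t".join(header))
for row in rows:
    print("\t".join(row))
```

Both complaints about the query approach go away here: the column list grows with the data, and `q4` still shows up with a blank `db2` cell.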
[link] 2007-04-06T16:46:43 unicode rdf symbol: I discovered today that Unicode comes with an rdf symbol (almost): ༜ That's U+0F1C, "TIBETAN SIGN RDEL DKAR GSUM". Use this if your font doesn't show the character.
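If your font shows a blank box, Python's standard `unicodedata` module can at least confirm which character it is. A minimal check, assuming a Python 3 interpreter:

```python
import unicodedata

ch = "\u0f1c"  # the almost-RDF symbol
name = unicodedata.name(ch)
print(f"U+{ord(ch):04X} {name}")
```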
[link] 2006-11-29T11:36:30 RDF literals as subjects: Any proposal about allowing RDF literals as subjects, especially one that's for language purposes (in this case it's direction support), needs to address why RDF's current design has the exceptional 'language' attribute on literals. If your proposal is so good, why didn't RDF allow arcs from literals in the first place and avoid the language/datatype special cases altogether? I really know nothing about the direction support issue, but if it's one of the last few language-specific issues and it really ought to be separate from the 'language' attribute, I am inclined to prefer one more special case on literals over a total redo of the constraints on rdf graphs. My main concern with literals as subjects is that people will treat them like "casual" URIs that aren't universally unique.
Unless otherwise noted, all content licensed by Drew Perttula under a Creative Commons License.