2008-03-05T23:53:13 Notes from the talks at Semantic Web: Are Scalable Graph Data Applications Possible?:

I was looking forward to more Oracle demos and roadmap-type discussion, but instead the highlight was allegro.



jeff from oracle says:
fraud detection is using graphs
we're generating data faster than we can process it
business value comes from: reduce cost of operations; aid decision-making; improve the transparency of business operations (e.g. for businesses that need to meet regs)
nice slide on DB approaches, broken into disk/ram, native/layered, etc
siderean is a company doing in-mem, multiple machine storage



david from mulgara
key to web scaling is the late binding of address to resource. Allows the information mgmt technique of the web to scale well

the next-gen mulgara version, which the team was meeting about this week in SF, will use lots of disk, perhaps 40 GB of RAM, and store 100B (?) triples



vertica:
SQL DBMS, focus on analytics
50+ customers: verizon, comcast, level3
came out of the C-Store project at MIT
MIT library catalog is rdf, 50M triples: Barton Dataset
uniprot protein dataset is 262M triples. vertica serves that dataset for public querying



jans aasman, franz inc:
23-year-old company, 2 years with a triple store
customers do 'event handling and activity recognition'
50 customers, plus free download
monterey aquarium doing Marine Metadata Interoperability Project
los alamos is studying who reads what publications, graph structures in readership
sun doing baetle, the bug tracking one
japan telecom KDDI is doing spam and fraud detection with allegro. they need to determine what is spam across their busy network. they create new spamassassin rules over time.
OFFIS using rdf for info about power grid usage
allegro loads 1e9 quads in 8 hours
has sesame interface
supports xml schema datatypes, e.g. range queries on dates. Numeric literals can be stored as actual numbers rather than strings
'social network analytics library' for degrees, cliques, group stats
quick loads from oracle for temporary dbs used for analytics (coming soon)
RDFS++ reasoner for the usual inferences
temporal reasoning (allen's temporal logic, for intervals)
their time/space handling helps with event search. one query involves a place and radius, person connections, other event details
police, e.g., need temporal reasoning
"homeland security is interested in every type of imaginable event"
"find all meetings that happened in december within 5 miles of berkeley that was attended by the most important person in Jans' friends and friends of friends"
they have a custom query language for their various datatypes and their capabilities, like (geo-box-around !geoname:Berkeley ?event 5 miles)
even 3 months of american phone call records is already petabytes
jans' thesis was about car driving behavior
GPS (maybe plus phone) leads to rich data about people: work, purchasing, etc
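
To make the event-search idea concrete for myself, here is a rough Python sketch of the three ingredients in that "meetings in december near berkeley" query: an Allen-style interval relation, a radius check, and a friends-of-friends lookup. This is not allegro's API or query language. Every name here (Interval, allen_relation, miles_between, within_two_hops, find_events) and all of the sample data are made up for illustration, the "most important person" ranking is left out, and intervals are assumed to have start < end.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Interval:
    start: datetime
    end: datetime  # assumed to be strictly after start


def allen_relation(a, b):
    """Name of the Allen interval relation that holds from a to b (13 cases)."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "met-by"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula, spherical earth)."""
    earth_radius_miles = 3958.8
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_miles * asin(sqrt(h))


def within_two_hops(graph, person):
    """Friends plus friends-of-friends, excluding the person themselves."""
    direct = graph.get(person, set())
    second = set().union(*(graph.get(f, set()) for f in direct))
    return (direct | second) - {person}


@dataclass(frozen=True)
class Event:
    name: str
    lat: float
    lon: float
    when: Interval
    attendees: frozenset


INSIDE = {"during", "starts", "finishes", "equals"}  # relations meaning "falls within"


def find_events(events, window, center, radius_miles, people):
    """Events inside the time window, inside the radius, attended by any of people."""
    return [
        e.name
        for e in events
        if allen_relation(e.when, window) in INSIDE
        and miles_between(center[0], center[1], e.lat, e.lon) <= radius_miles
        and e.attendees & people
    ]


# Hypothetical sample data.
friends = {"jans": {"ann"}, "ann": {"jans", "bob"}, "bob": {"ann"}, "cat": set()}
berkeley = (37.8716, -122.2727)
december = Interval(datetime(2007, 12, 1), datetime(2008, 1, 1))
events = [
    Event("oakland meetup", 37.8044, -122.2712,
          Interval(datetime(2007, 12, 10, 18), datetime(2007, 12, 10, 20)), frozenset({"bob"})),
    Event("san jose meetup", 37.3382, -121.8863,
          Interval(datetime(2007, 12, 11, 18), datetime(2007, 12, 11, 20)), frozenset({"ann"})),
    Event("berkeley lunch", 37.8716, -122.2727,
          Interval(datetime(2007, 11, 30, 12), datetime(2007, 11, 30, 13)), frozenset({"ann"})),
    Event("berkeley talk", 37.8716, -122.2727,
          Interval(datetime(2007, 12, 5, 9), datetime(2007, 12, 5, 10)), frozenset({"cat"})),
]

matches = find_events(events, december, berkeley, 5, within_two_hops(friends, "jans"))
print(matches)
```

A real store would presumably answer this from geospatial and temporal indexes instead of scanning every event the way this loop does, which is the whole scaling question.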



My question, which I didn't get to ask: How do the approaches compare in terms of latency for very small queries? Many of my queries are not batched together well, or my app needs to make a lot of decisions during the graph traversal.

Unless otherwise noted, all content licensed by Drew Perttula
under a Creative Commons License.