Paul Buchheit: The power of links and the value of global knowledge

  • Brandon Wirtz · 1 year ago
    You could argue LinkedIn has done this. They have profile completeness as a "value" when you search for people, and number of connections matters. What they haven't done is look into having "Trusted" people, or weighting certain people as anchors the way Google does. The W3C site, for example, is a Trusted site, so links from it have more weight. Shouldn't someone whom Bill Gates or Steve Jobs graces with a "Link" carry more weight than someone CarrotTop links to when you search for a Technology Professional?
    Perhaps that is broken, since Bill might link to Melinda's cousin because he knows her, not because she is the greatest Microsoft Certified Engineer on the planet.
    I always thought dating sites should weight people. Put a few "ringers" in the mix and have them go on first dates and say this is a swell guy or gal, and people who choose to try him/her out probably have good taste, so we'll favor people they like in other people's searches.
    I know that when I interview for a job, or go on a date, having worked at Microsoft for a time boosts my rank with a great many people. So perhaps assign some rank to employers...
    But generally People don't like when they are "sized up" so doing so publicly would likely piss people off.
  • Daniel Tunkelang · 1 year ago
    Even LinkedIn seems hesitant to acknowledge that some links are more equal than others, and that there's more to evaluating a link chain than its length. Check out the Measuring and Extracting Proximity in Networks work at AT&T Labs, or work by Christos Faloutsos on graph mining. And even these are opaque measures. I think the real coup will be making these measures context-dependent and transparent. We are exploring this approach at Endeca.
  • Brandon Wirtz · 1 year ago
    I think the big problem all rating systems face is that you have to evaluate not only link strength but link category...

    Sticking with my Bill Gates analogy from before: I would use Bill's favorite Code Jockey in a heartbeat. I don't know that I would use Bill's favorite florist. So you have to assign authority based on some rules and then links based on those rules.

    A website example: I think RxList.com is the best site for free information on prescription medications. So when I detect that you are searching for a prescription medication at http://www.isayhello.com/ I favor RxList.com, and some other rules kick in so that pages that are not too far from RxList, or from other drug reference sites, score higher in my results. Because I have picked an anchor for a category, I assign "rank" based not just on the quality of your site but on your quality as a function of distance from a site in the category I believe the search to originate from.
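
    The anchor idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the breadth-first hop count as the "distance", the exponential `decay` factor, and the example domains other than RxList.com are all hypothetical and are not taken from any real search engine.

```python
from collections import deque


def distances_from_anchor(links, anchor):
    """Hop counts from the anchor site, treating links as undirected (an assumption)."""
    neighbors = {}
    for src, dst in links:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def category_score(quality, dist, site, decay=0.5):
    """Site quality discounted by distance from the category anchor.

    Sites with no path to the anchor score 0.0 (an assumption; a small
    floor value would be another reasonable choice).
    """
    hops = dist.get(site)
    if hops is None:
        return 0.0
    return quality.get(site, 0.0) * decay ** hops


# Hypothetical link graph: RxList.com is the hand-picked anchor for the category.
links = [
    ("rxlist.com", "drugref.example"),
    ("drugref.example", "healthblog.example"),
    ("cats.example", "dogs.example"),
]
quality = {"rxlist.com": 1.0, "drugref.example": 0.9,
           "healthblog.example": 0.8, "cats.example": 0.95}

dist = distances_from_anchor(links, "rxlist.com")
ranked = sorted(quality, key=lambda s: category_score(quality, dist, s), reverse=True)
```

    With these made-up numbers, a high-quality site that is unrelated to the anchor (cats.example) ends up last for a prescription-drug query, which is the effect the comment is after.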
  • Dax · 1 year ago
    Web & World Integration. I like where this conversation is heading. Paul, any insight into whether or not 'ringers' as stated above have been used in any social networking sites?
  • Maarten · 1 year ago
    This is a great example of data mining with social sites:
    http://jheer.org/vizster/
  • chandan · 1 year ago
    great article.
  • Varun Mahajan · 1 year ago
    Good food for thought. And I chewed it quite thoroughly :). A very obvious (if PageRank can be termed obvious now) value can be extracted from all this data (some people call it the social map). Any social site or any search engine can make an algorithm (call it PeoplePageRank) similar to PageRank for people. That way, it will be easier to determine which friend of a particular person is proficient/expert/knowledgeable/right in which field. And if that particular person makes a search, the search can be customized (or at least tilted) according to his/her friends' opinions.
    Not sure if I made perfect sense here. But the idea is well built in my mind
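
    A minimal sketch of what such a "PeoplePageRank" might look like, assuming the usual PageRank power iteration applied to a directed graph of endorsements between people. The function name, the edge format, and the sample names are hypothetical; the damping value 0.85 is the one commonly quoted for PageRank. Running it once per field (with only that field's endorsements) would be one way to get the per-topic expertise mentioned above.

```python
def people_rank(endorsements, damping=0.85, iterations=50):
    """PageRank-style score over (endorser, endorsed) pairs.

    People who endorse nobody spread their score evenly over everyone,
    the usual treatment of dangling nodes.
    """
    people = sorted({p for edge in endorsements for p in edge})
    n = len(people)
    outgoing = {p: [] for p in people}
    for src, dst in endorsements:
        outgoing[src].append(dst)

    rank = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in people if not outgoing[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in people}
        for p in people:
            if outgoing[p]:
                share = damping * rank[p] / len(outgoing[p])
                for q in outgoing[p]:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical endorsements within a single field.
edges = [("ann", "carl"), ("bob", "carl"), ("carl", "ann")]
scores = people_rank(edges)
```

    Here carl, endorsed by two people, scores highest, and bob, endorsed by nobody, scores lowest.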
  • Mustafa K. Isik · 1 year ago
    In my view the interpersonal link data is already "out there" - what seems to be lacking is the utilization.

    What kind of distinct valuable data does Facebook have access to that a webmail host could not already tap into (privacy implications set aside)?

    To me it seems as though it does not require a proprietary service to datamine personal connections. Proper utilization of existing data, implicitly gathered through existing ubiquitous (email) communication, instead of explicit user interaction ("adding people to friends") within the confines of a particular social network, is the way to go.

    In the end, wouldn't something like GMail with a tightly integrated and more detailed Google Accounts profile page yield much more data to analyze?

    Paul, you mention webmail hosts in your post as well. I am adding the assumption that as soon as utilization of the data derived from email communication matures, there is not going to be room for a "proprietary" network which relies on users connecting explicitly.
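
    As a rough illustration of "implicitly gathered" link data, tie strength could be estimated by counting messages exchanged between each pair of addresses. This is a sketch only: the `(sender, recipients)` input format, the function name, and the addresses are hypothetical, and a real system would have to deal with the privacy implications set aside above.

```python
from collections import Counter


def implicit_ties(messages):
    """Count messages exchanged between each unordered pair of addresses."""
    ties = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:  # ignore notes to self
                ties[frozenset((sender, recipient))] += 1
    return ties


# Hypothetical mail log: (sender, list of recipients).
messages = [
    ("a@example.com", ["b@example.com"]),
    ("b@example.com", ["a@example.com", "c@example.com"]),
    ("a@example.com", ["a@example.com"]),
]
ties = implicit_ties(messages)
strongest = ties.most_common(1)[0][0]
```

    Dividing each message's contribution by its number of recipients would be one way to keep mass mailings from looking like close relationships.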
  • jfno · 1 year ago
    Great followup. That was exactly the kind of info I thought was missing from the first post. Not really missing, but it had to be more prevalent. Maybe FriendFeed will be this SOMEONE.
  • Arun · 1 year ago
    Paul, great article... When I read the initial article, my mind went along the lines of how Mahalo is nailing the search niche that you were hinting at.

    One interesting possibility of this relationship data could be for marketing firms to better identify potential contacts. Also, maybe could be used to improve collaborative filtering results (like Amazon does) based on the groups you belong to...
  • adk · 1 year ago
    "I'm confident that SOMEONE will begin mining this data, and that it could ultimately be more valuable than the link data from the web.[...] Google, in particular, is much better at data mining and also has quite a bit of human link data (from Gmail and Orkut)."

    Not to mention Google's efforts at indexing the microformats linkage with the Social Graph API, which may do for the social graph what their search engine did for the web.
  • Michael Cayley · 1 year ago
    Not just someone.

    Because corporate valuation is increasingly linked to the social capital embedded in these links, every corporation is going to begin investing into further understanding of these links.

    Interesting that this post is so on topic. I landed here searching for you, since I have referred to some of your work in an e-book that I am going to release. It makes the argument that social capital is linked to corporate valuation, and I want you to take a peek before it goes out. If you get in touch, I will send you a link to the paper.

    Cheers,
    mc
  • nraynaud · 1 year ago
    http://www.nzherald.co.nz/section/story.cfm?c_i... I know of people leveraging this data :)
  • Adam Smith · 1 year ago
    The biggest problem here is that the data is not open. Whereas anyone could crawl the web in 1999, the social graphs out there are closed (to innovation).
  • Ranjit Mathoda · 1 year ago
    If you added a "hide topic" feature on Friendfeed you'd probably get a good idea of what kinds of keywords people don't want to learn about from some people. That would be a potentially interesting way of defining what expertise people look for from certain friends. Although alternatively it could just show information overload on a topic someone is actually an expert about. That's part of the problem with people links; sometimes we unlink in our activity with certain people because of dislike, sometimes because other things are grabbing our attention, sometimes because of overload.
  • Zoltan · 1 year ago
    Great post. I am researching this topic for my upcoming post about the growing importance of ranking content by the social graph. PageRank assumed that there are two groups of users out there: Authors and Readers, i.e. if content relevant to a keyword is linked to other content (the Author has the authority and the control over this), then the linked content must be somewhat relevant as well. With user generated content, the division between authors and readers is not that obvious. PageRank values the explicit, intentional connection between content, with weight on the authority, while SocialRank will put more and more value on the social relationship between contents, that is, contents which are "linked" through the social graph. Furthermore, while the Facebook and LinkedIn etc. connections are mostly explicit (I declare that you are my friend and vice versa), there is a mass of implicit social relationships out there as well.

    Does the fact that I left this comment on your post establish a "relationship" between you and me? If so, will my related blog post be considered more relevant? Not due to backlinking, which is probably "nofollow" anyway, but due to our "social relationship"? What is the weight of this relationship? How do you rate qualitative and quantitative metrics on implicit relationships like this? Does it strengthen the connection between you and me if you reply to this comment, i.e. you accept my "relationship"? What about ad-hoc and temporary relationships? What kind of standards can be used to expose implicit relationships? The FB API, OpenSocial, FOAF etc. seem to deal with explicit connections only (I am your friend and you are mine and we agreed on this), at least at the implementation level.

    I wrote about this a year ago in a much smaller context at http://realbird.typepad.com/news/2007/04/search...

    Since then many new standards have been created (FB API, OpenSocial, Google's API for searching FOAF etc.) which will enable the next generation of search engines to evolve.

    The biggest challenge here, as you said, is that the noise-to-signal ratio is several orders of magnitude higher when dealing with ad-hoc and implicit connections.
  • clovekchodiaci · 1 year ago
    Yes, LinkedIn is a great concept of social networking.
  • Scott Wheeler · 1 year ago
    Hey Paul - a friend just pointed me to this article, as usually happens, after it'd been sitting for a while. Not sure if you noticed our beta-launch on news.YC last week (Directed Edge), but this is pretty much the problem that we're trying to solve. We started off with Wikipedia just since it was a nice big graph to practice on (see, e.g. http://pedia.directededge.com/article/Gmail), but with the idea of using it on social sites.

    We've kind of been backed into calling it a recommender engine, since tool-to-pull-neat-things-out-of-info-graphs didn't have quite the same ring to it, but that normally brings up associations with collaborative filtering, and kind of the raison d'être of what we're doing is a similar belief that you can pull all sorts of interesting information out of the social graph.

    I'd be super interested if you have any comments.
  • adebuche · 1 year ago
    Hello,

    Great post! This is a key issue of web search, though still very much a Holy Grail.
    I was surprised the other day to notice under the Google search box the "Personalized" option, as I did not remember personalizing anything. I had a dreaded thought it might be a personalization using the subject of my Gmail emails, or Google Marker's referenced websites ... Privacy issues can be tricky ;-)
    I am a firm believer that knowledge doesn't reside on the web (only data and information do), but in human beings. Plus, if you add simple volume / complexity issues such as the depth of a web graph (someone once measured it to be 11 links) vs the depth of a human graph (the 6 degrees theory), you can get 2 great uses for mining those links:
    1) identifying the experts that have the solutions to your problems in an open innovation scheme (think of the early attempts of Yahoo! Answers or LinkedIn Answers), not to mention recruiters trying to identify the best candidate outside the usual suspects (LinkedIn has developed a specific data-mining / query tool for recruiters that is not often talked about), and competitive intelligence experts trying to assess the quality of the data by judging the network / reputation of a given information source,
    2) using them as a social filter to find what's relevant (the initial rough attempt behind Facebook's Feed algorithms, as well as the collaborative filtering Amazon uses that Arun mentioned above). Using some of my friends as anchors in specific fields, I could narrow down a search much more efficiently.