A blog recording the life of a Web Scientist
Over the last 3 days, I’ve been working on collecting and analysing the Twitter communications feed for the Digital Futures 2012 Conference, which used the #DE2012 hashtag.
The following analysis is a summarised overview of the work that I’ve been involved in at Southampton University, in a joint project between myself and Edelman Ltd. The project aims to investigate novel ways of identifying different user roles within online social media platforms such as Twitter. It has been in development for the past year and has so far produced a number of academic publications, workshops and demos. As part of the project’s outcomes, a tool has been developed to explore the dynamics of Twitter communications, helping to identify how specific individuals are critical to the spread of information. However, one of the biggest problems with visualising and exploring online communications within Twitter is the sheer volume of information, even for a conference such as Digital Futures 2012. As Figure 1 shows, an unclassified retweet network becomes very cluttered, and identifying different user roles becomes increasingly difficult as the volume of tweets increases. The classification model (described here: http://eprints.soton.ac.uk/272986/) aims to overcome this, making it much easier to draw out potentially valuable users. The application of the model can be seen in Figure 2.
What follows is the dynamic growth of the networks over the 3 days of the Digital Futures 2012 conference.
Day 1. Dynamic view of unclassified Retweet Network #DE2012
Day 2. Dynamic view of classified Retweet Network #DE2012
Day 3. Dynamic view of classified Retweet Network #DE2012
As Day 1 shows, when the growth of communications is viewed as an unclassified network, identifying specific users becomes difficult. There is simply too much information to comprehend, and the multiple hubs of communication seem to indicate a network of disconnected conversations. However, as the videos of Day 2 and Day 3 show, once the classification model is applied, the communications (and individual users) become far easier to comprehend. The red users are those that have been retweeted at least a set number of times (a minimum of 10 on Day 2 and 20 on Day 3), and they are potentially valuable sources of information. Even more interesting (and potentially useful), however, are the yellow users, who are curators of these highly retweeted users. Not only do they draw together multiple sources of (potentially) valuable information, but they are also the first in the network to do so. In effect, the yellow users are to some extent responsible for widening the network of communication.
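The colouring described above can be approximated in a few lines. This is a minimal sketch that assumes a simplified reading of the roles rather than the published model linked earlier. A user is "red" if retweeted at least `threshold` times (10 on Day 2, 20 on Day 3). A user is "yellow" if they retweeted at least `min_sources` distinct red users and were the earliest retweeter of at least one of them. The input format of (timestamp, retweeter, original author) tuples and the `min_sources` rule are assumptions for illustration.

```python
from collections import Counter

def classify_users(retweets, threshold=10, min_sources=2):
    """Colour users in a retweet network (simplified, assumed rules).

    `retweets` is a list of (timestamp, retweeter, original_author) tuples.
    - red: users retweeted at least `threshold` times;
    - yellow: non-red users who retweeted at least `min_sources` distinct
      red users and were the earliest retweeter of at least one of them.
    """
    counts = Counter(author for _, _, author in retweets)
    red = {user for user, n in counts.items() if n >= threshold}

    sources = {}  # retweeter -> set of red users they have retweeted
    first = {}    # red user -> earliest retweeter of that user
    for _, who, author in sorted(retweets):  # chronological order
        if author in red:
            sources.setdefault(who, set()).add(author)
            first.setdefault(author, who)  # only the first retweet is kept
    first_retweeters = set(first.values())

    yellow = {user for user, srcs in sources.items()
              if len(srcs) >= min_sources and user in first_retweeters
              and user not in red}
    return red, yellow
```

Requiring both conditions (multiple sources, and being first) mirrors the two properties the post attributes to curators. Either condition could be relaxed to explore other definitions.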
I’ve also included an overview of the timeline of tweets during the conference and a measure of the top tweeting and most retweeted users. What’s interesting is that, when comparing the highly retweeted users on Day 2 and Day 3 (see videos), nicoleebeale and cyberdoyle change roles, yet the number of curators (yellow users) remains constant.
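Rankings of top tweeting and most retweeted users like these reduce to two frequency counts. This is a minimal sketch that assumes each tweet has already been reduced to an (author, retweeted_from) pair, with `retweeted_from` set to None for original tweets. That input shape is a hypothetical choice for illustration.

```python
from collections import Counter

def top_users(tweets, n=10):
    """Rank the most active tweeters and the most retweeted users.

    `tweets` is assumed to be a list of (author, retweeted_from) pairs,
    where retweeted_from is None for an original tweet.
    """
    tweeting = Counter(author for author, _ in tweets)
    retweeted = Counter(src for _, src in tweets if src is not None)
    return tweeting.most_common(n), retweeted.most_common(n)
```

Running this separately for each day's tweets is one way to spot users changing roles between days.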
Hopefully in the next few weeks I’ll be blogging about some new methods that I’ve been working on for performing analysis on Twitter communications, so check back soon!
For me, the theme of day 3 was about sustainability, and how Open Data can help facilitate the process of creating a sustainable future for a growing world. James Cameron gave a brilliant presentation during the opening plenary session, looking at the current data, tools, technologies, enterprises and social processes that are helping the world become more sustainable – and he painted a very worrying picture. We currently don’t have the right things in place to support the planet in an effective way, but potentially Open Data may be able to offer a helping hand to such a major task.
As James argued, we need not only data but also ways to visualise and manage the ecosystem of systems that exist. He alluded to something resembling a master control centre, where people can monitor, analyse and act on the complex streams of information available to them.
James also made some important points about the social processes of this task. Most crucially, it is not a task for a single person; it is the responsibility of groups, teams, organisations and societies to produce this data, maintain it, monitor it and make decisions based on it. There seems to be a lack of clear processes for examining how climate change occurs and how it can be worked on, and measurable performance indicators are needed to make this happen. As James explained, we need to create public goods enterprises, which focus on the use and re-use of open knowledge and open data.
It does seem that the buzzword for this conference is ‘ecosystem’. So far it has taken on a broad meaning, ranging from an ecosystem of development to a global ecosystem of communities and technologies working towards common goals.
Tiago Peixoto followed James’s presentation with the important topic of participatory budgeting and the involvement of citizens. Presenting a case study from Brazil, he showed that such an approach improves not only economic stability but also welfare and standards of living (a reduction in poverty rates and child mortality).
However, the cost of participation is high, both socially and technically. Getting people to take part is not easy, and it requires careful integration into their social practices. Tiago gave a great example: in Brazil it is very difficult to get citizens to come to an Internet café or a government building, whereas putting the voting system in the local church, youth centre or community centre makes it much easier to get citizens to vote.
Another highlight of Day 3 was the “Challenges of Working with Crowd-Sourced Data” session, part of the Open Research and Education track. A diverse panel consisting of Philip Thigo, Anahi Ayala Lacucci, Linda Raftree, Victor Miclovich, Stephan Davenport, Soren Giler and Daniel Gonzalaz each responded to a number of good, thought-provoking questions, posed both by the session host and by the panel itself. Key points from the discussions included identifying and nurturing the crowd to complete tasks, while remembering that its members are humans with their own goals and objectives. Finding the right incentives is a must. As the examples showed, these are not necessarily financial rewards; they can be something as simple as a printable certificate or sticker that lets the crowd know they are part of something important and potentially beneficial to society. This also raised the important question of exploiting the crowd. The tasks people do should bring them some benefit; otherwise, this risks taking advantage of the helpful or uninformed.
It is also important to create an ecosystem (the buzzword of the day) around crowdsourcing. The worst thing that can be done is to create an application, get people to participate, and then do nothing with the data. Feedback is required for sustainability, and there needs to be some measurable impact to demonstrate the capabilities and benefits that the crowd’s time has provided.
Two further important points were raised during this session: the first regarding the design of crowdsourcing services, and the second regarding the trustworthiness and scaling of this kind of data. It was nice to hear Victor offer a new definition of crowdsourcing. He argued that the development of these technologies should be thought of as “Crowd Crafting”, a process that includes the crowd rather than presuming their actions. Doing so helps development avoid the risk of offending cultures or mishandling country-specific issues.
Trust is also an issue as these systems develop and grow. Currently two methods are in use: assessing the data by hand, or using AI. Although the latter is not always accurate, analysis by hand becomes a difficult task indeed when dealing with a large dataset.
The afternoon highlight for me was the chance to host a session for academic researchers interested in open data, which included 5 speakers from a diverse set of backgrounds and locations! For a full list of presentations visit: http://okfestival.org/topic-stream-open-research-and-education/. There was a really good level of engagement from the audience for each presenter, with plenty of questions and feedback on their work. One of the things I’m really looking forward to in future OKFN events, and hopefully OKFest 2013, is an even bigger focus on the Open Data research coming out of academia, especially given Rufus’s talks earlier in the week about the need to collaborate more with the academic community!
Finally, a real highlight of the evening session was the great talk by Hans Rosling. His award-winning work has been influential not only in global development but also in data visualisation, using graphics and interfaces to represent highly complex datasets and theories in simple and comprehensible ways. Rather than describe the brilliance of the talk, it is far better to spend a spare 45 mins watching it (it starts around 30 mins in): http://bambuser.com/v/2996396
Recapping Day 3, without a doubt, the theme today – which built upon the previous two days – was how to create and maintain an Open Data ecosystem, which is both a social and technical network of actors. I’ll stop there before I start to go off into ANT territory – I’ll save that for another day…
As before, here is Day 3 of the #OKFest Twitter conversations! I’ve also included a Wordle of the most used words within the stream; these are the top 100 words, with common words (and, this, etc.) removed.
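Wordle does the counting itself, but the underlying step of finding the top 100 words with common words removed is a simple frequency count. This is a minimal sketch, assuming tweets are available as plain strings. The tiny `STOP_WORDS` set is an illustrative assumption; a real analysis would use a much fuller list.

```python
import re
from collections import Counter

# Illustrative stop-word list only; "rt" is dropped as retweet noise.
STOP_WORDS = {"the", "and", "this", "a", "an", "of", "to", "in", "is", "for", "rt"}

def top_words(texts, n=100, stop_words=STOP_WORDS):
    """Return the n most frequent words across tweets, ignoring stop words."""
    counts = Counter()
    for text in texts:
        # Lower-case, then blank out URLs and @mentions before tokenising.
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        # Keep letter runs (plus apostrophes), with an optional leading # for hashtags.
        counts.update(w for w in re.findall(r"#?[a-z']+", text) if w not in stop_words)
    return counts.most_common(n)
```

Removing URLs and mentions first matters for Twitter data. Otherwise fragments such as "http" and "co" from shortened links would crowd the top of the list.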
As on the previous day, Day 2 got off to a timely start, with the session hosted by Jarmo Eskelinen. The session featured 3 invited speakers: Philip Thigo, Carlos Rossel and Ville Peltola. It offered a fresh set of ideas for a new day at OKFest, with an increased focus on the changing world and the impact that Open Data is having, and will continue to have, on society.
Philip’s talk, which focused on the transformation of Kenya, was inspiring yet grounded in a number of serious issues. Kenya itself is a country of mixed wealth; as Philip explained, within a distance of less than 20km there are two extremes of living conditions, education and basic human standards. Yet it appears that these issues are superseded by things that are perhaps less important (to those who need help the most). As Philip rightly said, the current challenges in Kenya are not being addressed but swept under the carpet, and this needs to change. A key point raised was that Kenya (and other countries) need to address real needs rather than create demand that, undoubtedly, is not there. Open Data is a key component of this environment of change, empowering citizens but also giving the state the capacity to respond when required. When asked what the biggest opportunity Open Data will provide for developing countries is, Philip replied with a very important message: it will give a voice to the people.
Carlos Rossel from the World Bank was up next, providing an update on the World Bank’s commitment to openness. This involves not only opening up datasets that individuals once paid for, but also opening up the organisation’s own records: how it is run, its finances, etc. It was great to see the World Bank’s data portal and the increase in web traffic (>2 million) since the portal was developed. If one key message stood out from this presentation, it was three words: Results, Accountability and Openness. As Carlos suggested, these need to be conceptualised in the order Openness, Accountability and Results. This workflow (or iterative process) is something that needs to be embraced by all, because openness is truly the first step that enables transparency, accountability, adoption, change and growth. However, as day one of OKFest demonstrated, and I’m sure the other days will follow a similar path, the drive towards openness is a complex, long-term journey.
The final presentation was given by Ville Peltola from IBM, discussing the interesting topic of Smart Cities and the Three I’s: Instrumented, Interconnected and Intelligent. Ville provided some really insightful ideas, especially in conceptualising cities as complex networks of systems that operate together, adapting to, influencing and competing with each other. This is a little off topic, but this kind of conceptualisation of systems is something I’ve been researching. I have been looking at how we can understand complex networks of technologies, humans and artefacts as networks of Actor-Networks, all interconnected and co-evolving. Ville’s comment about the problems of taking a reductionist approach also played well to these concepts; understanding the overall function of a network becomes increasingly difficult if the macro view is obscured. Ville provided much food for thought, and he closed with an important statement that brought the discussion back to the need for open data: “People are the most important part of systems” (although slightly socially deterministic). With open data, smart cities are one step closer to achieving their goals.
After the short coffee break and a difficult decision about which session to attend, I chose the Transparency and Accountability session on Corporate Transparency, Corruption and Open Data. It looked exciting, especially with two presentations by Chris Taggart and Rosie Sharpe.
Chris was first to present, with an update on OpenCorporates, which has come a long way since I last saw it at OGDC in Warsaw last year. Before this, however, Chris provided a useful insight into the complexities of company structures. They are complex networks of networks, interconnected and (potentially) self-looping, and they are growing in complexity, scale and opacity. As Chris suggested, this leads to many problems, perhaps the most important being the obscurity it offers the bad guys, those committing illegal or even unethical activity. As Chris said, it is easy for the bad guys to get lost in the crowd, but open data helps reduce this crowd to something manageable, increasing the chance of at least identifying who and where they are. OpenCorporates is helping achieve this with the aim of providing an entry for every corporate entity in the world. It is underpinned by 5 key elements: an open identifier system (using URIs), simple search capabilities, sources for additional information, reconciliation, and, importantly, a platform that provides access to all this information via simple yet powerful APIs. It was great to see that, over the past 20 months or so, OpenCorporates has grown significantly, with over 45 million companies now listed and increasing support from those providing the data. If this was achieved in only 20 months, let’s see what 2013 brings!
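The "self-looping" company structures Chris described are cycles in an ownership graph, which a depth-first search can detect. This is a minimal sketch, assuming ownership is given as a plain dictionary mapping each company to the companies it holds shares in. It does not use OpenCorporates data or its API, and the company names in the usage example are hypothetical.

```python
def find_ownership_cycle(owns):
    """Return one ownership cycle as a list of companies, or None.

    `owns` maps a company to the companies it holds shares in. Uses a
    depth-first search with white/grey/black colouring: meeting a grey
    (in-progress) company again means we have looped back on ourselves.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    path = []

    def visit(company):
        colour[company] = GREY
        path.append(company)
        for held in owns.get(company, ()):
            state = colour.get(held, WHITE)
            if state == GREY:
                # Back edge: the cycle runs from `held` along the current path.
                return path[path.index(held):] + [held]
            if state == WHITE:
                cycle = visit(held)
                if cycle:
                    return cycle
        path.pop()
        colour[company] = BLACK  # fully explored, no cycle through here
        return None

    for company in list(owns):
        if colour.get(company, WHITE) == WHITE:
            cycle = visit(company)
            if cycle:
                return cycle
    return None
```

For example, `find_ownership_cycle({"HoldCo": ["SubA"], "SubA": ["SubB"], "SubB": ["HoldCo"]})` returns `["HoldCo", "SubA", "SubB", "HoldCo"]`. The recursive form is kept for readability. Very deep ownership chains would need an iterative version to stay within Python's recursion limit.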
Rosie Sharpe from Global Witness followed Chris’s presentation with an important discussion on why companies need to publish who they are. She presented a number of case studies on the dealings of large Western banks with corrupt governments and their ministers. These demonstrated the problems caused by companies being able to hide who they are, which facilitates hiding illegal activities such as money laundering. Rosie’s concise overview of how money laundering works showed how simple the process is and exposed the loopholes that make it possible. Rosie’s closing point was an important one. If society considers fake IDs a crime (and prosecutes them with long sentences), then why can companies hide behind fake IDs without being prosecuted or risking the loss of their personal assets? The final quote that Rosie provided is something that we should all consider:
“Anyone registering a limited company should have to declare the names of the real people who ultimately own it, wherever they are, and report any changes, lying about this should be a crime” – The Economist, January 2012
That was the end of the morning sessions. Lunch arrived, and unfortunately I had to attend an external call, missing some of the afternoon sessions. However, as before, I’ve been tracking the Twitter stream (I will provide a dynamic video of its growth at the end), and, as with yesterday, here are some interesting visualisations of both Day 1 and Day 2 of the #OKFest Twitter stream.
OKFest Recap – Day 1
At the end of Open Government Data Camp (OGDC) 2011, I promised myself that in the following year I would get more involved with the Open Data community. Fast forward a year (well, just about), and OKFest 2012 has just begun. This year I’m helping run a session for academic researchers interested in the impacts of Open Data (not so much a shameless plug, as it’s on my blog).
OKFest got off to a brilliant (and timely) start. Having attended only OGDC, I have only that to compare it to. This year, instead of the Warsaw warehouse (which was brilliant in itself), OKFest is being held at the Arabia Campus of Aalto University, Helsinki, Finland. It is a clever, complex building, with galleries, cinemas, auditoriums and hangout spaces (and also some elevators that don’t arrive at the expected floor). The venue this year definitely lives up to last year’s location, and this time I’m not having to wear a coat inside (for those who were not there last year, heaters were required constantly).
As is tradition, Rufus opened the plenary, offering some important advice on the current Open Data and information climate. The original challenge of obtaining the information is now underway and, to some extent (though it still requires work), is being achieved. The challenge now is how to use the information, and how to use it effectively. This was a key message that appeared to lay the foundations for the rest of the day.
Following Rufus was Martin Tisne, who discussed his experience of working with government to become more open. He also described his current work at Omidyar, pushing transparency and driving change at the government level. Martin’s presentation drove home some really important messages, especially regarding data engagement and use: open government (data) is great, but only when people use it. One of the most important questions is why people use it, and what can be done to increase engagement. On the other side, Martin also raised questions about how data should be opened and suggested two paths. The first is hacking out the data; the second is working with government, setting up standards (which were discussed in the following sessions) and working with the right stakeholders. As Martin wrapped up, he left us with some really good thinking points. Open data needs to be recognised as an ecosystem, and getting everyone (technologists, developers, civil servants, businesses, citizens) involved will improve its chances of growth and success.
Next up was Farida Vis, with a lighter start to her presentation (yet with a serious message), discussing the publication of UK allotment data (or the lack of it). As Farida explained, allotment data may not be sexy, but it’s important. It is perhaps not important for government, although I’m sure they could use the data to explore the relationship between areas with higher concentrations of vegetable allotments and local greengrocers. It is important, however, for the individuals who own, run or request an allotment. Farida also emphasised that this data is the product of individuals; the data holders are not central or local government, and this needs to be recognised. The second part of the presentation examined Farida’s work analysing the London riots and the associated social media streams, demonstrating the power of (big) data. This raised the question of the relationship between open data and social media data. That question definitely needs addressing, especially given the changing tides of terms and conditions and the restrictions on data collection and archiving.
After a short coffee break, OKFest attendees had a choice between 12 topic streams, including Open Government, Open Cities, Sustainability, Cultural Heritage and Data Journalism, to name just a few. Having all these tracks is amazing and has really brought diversity to the Festival. The only downside (a human constraint, unfortunately) is that one can attend only a single session at a time. I suppose it is possible to watch all streams simultaneously via webcast, but my battery life was bad enough as it is!
I attended the Transparency and Accountability session, where the first talk on Open Government Data movements and related initiatives was given by Teemu and Salla. They provided an overview of the activities of the Open Government Partnership (OGP) in Finland. It was great to see that more and more countries are getting involved with the OGP, and to see the speed at which this is happening. The draft roadmap for Finland’s commitment to the OGP is set to be completed by early 2013. As Teemu and Salla said, the current plans are still in draft, so suggestions for improvements are more than welcome. A number of take-away points from this presentation strongly resembled those of last year’s OGDC. Open Government is not just about Open Data but about the process that surrounds it, and governments need to adapt to the way citizens and businesses participate with Open Data. These are important messages, and they tie in nicely with the points Martin raised about Open Data as an ecosystem in the opening session.
The second talk of the session was given by Marta Nagy-Rothengass from the European Commission on the vision and strategy for Europe’s data. Marta drove home a strong message about the EU’s drivers for Open Data, arguing the case for better business and economic opportunities as well as improved and more appropriate governance and policies. It was great to see that the revisions to the EU Commission’s re-use decision push towards machine-readable formats and towards licences that enable a genuine right to re-use. Marta also discussed the new EU data portal, which aims to be publicly available by 2013 as a new hub for Europe-wide data!
To close the session, a number of quick-fire presentations (and when I say quick, I mean 2 minutes or less!) provided updates on the progress of Open Government in a wide variety of countries. These included (wait for it): Uruguay, Italy, Nigeria, UK, Slovakia, Argentina, Brazil, Kenya, France, South Africa, Australia, Czech Republic, Holland, Canada, Ireland, Spain, US, Israel, Germany, Belgium and Estonia, as well as a talk by Chris Taggart from OpenCorporates. The updates ranged from positive to not so great, yet they often shared a common problem: how to gain and sustain the commitment of the stakeholders needed for OGD success (typically the government and citizens). However, there was an overall sense of growth in OGD around the world, and it was great to hear such a wide mix of countries reporting on their efforts so far.
That wrapped the morning up, and lunch very quickly became a hotbed for discussions about the morning’s sessions, catching up with colleagues, and frantically finding a socket to charge laptops.
Attending the second Transparency and Accountability session, on Open Government Standards, provided a great way to kick off the afternoon. The panel, consisting of Martin, Jose Alonso, John Wonderlich and Rufus, gave a diverse set of talks about the development of standards for Open Government. Martin began with thoughts about how standards can offer a practical way for OGD to grow. He noted that technical standards are important but can at the same time become barriers, or silos, between organisations or governments. Questions were also raised about the development of metadata standards: how should this move forward, and is a global standard required?
Jose followed this with a talk about the Web Foundation Open Data Index, a recently launched project assessing 61 countries against 14 Open Data indicators. It was great to see the start of a cross-comparison and ranking system for countries involved in OGD, which may offer an incentive to improve Open Data standards and increase the publication of data. More information about this project can be found here: http://www.webfoundation.org/projects/the-web-index/
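The talk did not detail how the index turns its indicators into a ranking, so the following is only a generic sketch of one common approach to composite indices, not the Web Foundation's method. It min-max normalises each indicator across countries, averages the normalised values with equal weights, and sorts the results. Equal weighting and the input layout (country mapped to a list of raw indicator values in a fixed order) are assumptions for illustration.

```python
def composite_ranking(scores):
    """Rank countries by the unweighted mean of min-max normalised indicators.

    `scores` maps country -> list of raw indicator values (same order for
    every country). Equal weighting is an illustrative assumption.
    """
    countries = list(scores)
    n_indicators = len(next(iter(scores.values())))
    normalised = {c: [] for c in countries}
    for i in range(n_indicators):
        column = [scores[c][i] for c in countries]
        lo, hi = min(column), max(column)
        for c in countries:
            # A constant indicator cannot separate countries, so score it 0.
            normalised[c].append(0.0 if hi == lo else (scores[c][i] - lo) / (hi - lo))
    means = {c: sum(vals) / n_indicators for c, vals in normalised.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

Min-max scaling puts indicators measured in different units on a common 0 to 1 scale before averaging. Real indices often use weights or z-scores instead, which can change the resulting order.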
Reflecting the morning sessions, John’s talk drove home the message that Open Data is much more than releasing data. It is about allowing citizens to connect with their government and to know how their country, city, region or town functions. Open Data should be about empowering citizens using data, technologies and applications. More than that, it needs to reflect the needs of society and, just as Martin said previously, be part of an ecosystem that is constantly evolving and adapting.
Finally, Rufus rounded up the panel’s presentations with a strong message that opening data is not a process that has an end; it needs to continue to happen. Yes, we currently have a good amount of data, but this must continue to grow, and commitment from stakeholders is required if OGD initiatives are to continue to develop. In line with the idea of the ecosystem, Open Data is a two-part process: opening data, and using the data. The focus now needs to be on using the data efficiently, which is partly achieved by putting the data in the hands of citizens. As Rufus pointed out, the Open Data community has matured, and the pitfalls and issues that have become apparent over the years need to be reflected on and learnt from. Hard work is still required for Open Data to continue to grow, and measurements, clear principles and standards will help with this.
The floor was then opened up for questions, which prompted a question (also asked in the morning session) about opening up 100% of government data. It was great to see the panel agree that not all data should be opened up. For obvious reasons, some data, such as health data, is too sensitive for public consumption (although some would argue against this). An important question was also raised about the barriers that standards may introduce. The panel agreed that some level of standards (both technical and social) is needed, but that the most important thing is to get the data out there (as long as it isn’t in PDF…)! Rufus also made an important point about governments worrying about the return on investment from publishing data. In the past, money was spent in the technology sector without any guarantee of economic gain or ROI. Open Data needs to be approached with the same mindset, since growth takes time. As Andrew Stott suggested, try to make the case for Closed Data and see what benefits and ROI it provides.
The session came to a close on a positive note, and was a reflection of the morning’s attitude towards the current state of development of Open Government and Open Data. There is definitely a positive buzz and energy within the community this year, and hopefully this continues to grow with day 2 of OKFest.
Just for your amusement, I’ve done a quick network analysis of the #OKFest Tweets from Day 1, the graph below shows the retweet network, with the red nodes being users that have been retweeted 75 times or more. For more information about the colour coding, have a read of this: http://dx.doi.org/10.1145/2187980.2188256
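A retweet network like this one can be built directly from tweet text. This is a minimal sketch, assuming tweets arrive as (screen_name, text) pairs and that retweets use the old-style "RT @user" prefix. Native retweets would instead be identified from the API's retweet metadata, which is not shown here. The 75-retweet cut-off matches the red nodes described above.

```python
import re
from collections import Counter

# Old-style manual retweets start with "RT @user".
RT_PREFIX = re.compile(r"^RT @(\w+)", re.IGNORECASE)

def retweet_edges(tweets):
    """Build weighted directed edges (retweeter, original_author) -> count
    from (screen_name, text) pairs."""
    edges = Counter()
    for user, text in tweets:
        match = RT_PREFIX.match(text)
        if match:
            edges[(user, match.group(1))] += 1
    return edges

def highly_retweeted(edges, threshold=75):
    """Users whose total incoming retweets meet the threshold (the red nodes)."""
    received = Counter()
    for (_, author), weight in edges.items():
        received[author] += weight
    return {author for author, total in received.items() if total >= threshold}
```

The edge counts can be exported to a tool such as Gephi for layout and colouring. Keeping weights on the edges preserves how often one user retweeted another, not just whether they did.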
The other statistics are for your amusement as well, enjoy!
The Network Science and Web Science 2012 conferences were held back-to-back on 18–24 June 2012 at Northwestern University, Chicago, USA. This was the first time these conferences were run together at the same time and location. Given the crossover in their research topics, they promised a great week of presentations, discussions, learning and networking. Both disciplines these conferences represent are relatively new fields of research, and thus a great breeding ground for novel and exciting work. The conference participants, presenters and keynote speakers came from a variety of disciplines and research interests, ensuring a truly interdisciplinary environment, which was reflected in the accepted papers for both Web Science and Network Science.
The following discussion will describe some of the highlights of both conferences, including a selection of the best papers, keynotes, and general outcomes of Network Science and Web Science 2012.
Network Science 2012
Network Science, a recently formed discipline, draws upon a multidisciplinary field of researchers. These include mathematicians, physicists and computer scientists, but also those interested in psychology and the social sciences.
The first two days of the conference were occupied by a number of workshops and a Network Science School, which offered a jam-packed 2-day introduction to the field of Network Science. Although I didn’t attend, I was told that it was an extremely useful and informative programme. It offered some great learning outcomes as well as connections with those already researching in the field.
Attending the ‘Languages and Network Science’ workshop on the first day was a great way to jump into the conference – focusing on the use of network science to help model a number of problems and research topics associated with language learning, processing and semantics.
The workshop, a full-day programme, kicked off with presentations on modelling phonological network structures. These first examined how children learn and grow their vocabulary, then moved on to the connectivity of words, how they are learnt and the semantic similarities they share. Network Science approaches to modelling these networks provided a test-bed for experiments with language networks, from modelling how children learn certain words at different ages to mapping the semantics between words. Although this is not my current area of research, the morning session was very interesting and useful to attend, and it demonstrated the broad range of research that can be classed as network science. However, I was expecting to see much more real-world empirical data in the experiments. As I will note again later in this report, theoretical modelling tended to be the typical method used in the research presented.
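Phonological networks of this kind are often built by linking words that differ by a single sound, through one substitution, insertion or deletion. This is a minimal sketch of that construction. As a simplifying assumption, it uses letters as stand-ins for phonemes; a real study would work from phonemic transcriptions.

```python
from itertools import combinations

def one_edit_apart(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion.

    Letters stand in for phonemes here (a simplifying assumption).
    """
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one position of the longer word must yield the shorter one.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def neighbour_network(words):
    """Adjacency sets linking every pair of words that are one edit apart."""
    graph = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if one_edit_apart(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

With `["cat", "bat", "at", "cast", "dog"]`, "cat" gains three neighbours and "dog" stays isolated. Degree in such a graph corresponds to a word's neighbourhood density, a quantity these models often relate to how easily words are learnt.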
The afternoon of the Languages and Network Science workshop soon turned to research on social networking and collaborative sites such as Twitter and Wikipedia. Paolo Masucci gave an interesting paper on the semantic flow of language between different Wikipedia pages. He used the Italian mafia as a great example of how semantic links between Wikipedia pages can be used to build up networks of connections. In addition, Xiaoju Zheng’s presentation on Twitter hashtag dispersion was extremely relevant. It provided figures such as the fact that 98% of hashtags are words, and it showed that well-dispersed hashtags (those spread amongst multiple user communities) are more likely to stay popular and not die out. Words2Play.com was also demoed: a social machine that makes a game out of splitting up blended hashtags (e.g. twestival, from twitter and festival).
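One way to quantify how "well dispersed" a hashtag is across user communities is the Shannon entropy of its uses over those communities. This is an illustrative measure, not necessarily the one used in the talk. The sketch assumes community labels for users have already been obtained from some community-detection step, which is not shown.

```python
import math
from collections import Counter

def dispersion(uses, community_of):
    """Shannon entropy (in bits) of a hashtag's uses across user communities.

    `uses` lists the users who tweeted the hashtag (one entry per use);
    `community_of` maps each user to a community label. A value of 0 means
    every use falls in one community; higher values mean wider spread.
    """
    counts = Counter(community_of[u] for u in uses)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Entropy rewards hashtags whose uses are spread evenly over communities. A hashtag used 99 times in one community and once in another therefore scores close to 0, even though it technically reaches two communities.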
Overall, the workshop offered a good start to NetSci12, setting a high standard for the next few days of presentations and discussions. Although the afternoon’s presentations were more geared towards empirical research, what I noticed was the heavy focus on theoretical modelling of single-variable problems, something I had not expected or researched before this workshop. Another take-away lesson from day one was the extent to which Network Science is being used in other disciplines. It appears not only in the computational sciences but in the physical and social sciences as well, with each discipline using the theory and methods of network science to tackle its research problems.
The second day of NetSci12 consisted of a tutorial on the Sci2 (Science of Science) tool, which, like Network Workbench, offers analysis and visualisation of large network datasets. The half-day workshop provided a hands-on walkthrough of the tool, demonstrating its various features and capabilities, and was definitely a great way to quickly become proficient with the software.
The official opening of NetSci2012 featured two keynote speakers: Luis Amaral and Iain Couzin. Luis gave an insightful personal perspective on the field of Network Science, and Iain gave an exciting presentation on his work on social networks and collective behaviour in animals. It was fascinating work, showing that animal social networks are usually based on (and tracked by) proximity, and that the strength of a tie depends on the number of interactions an animal has with others; the individual decisions and ‘opinions’ of an animal are therefore closely linked to the local majority.
The afternoon was split into five parallel sessions, ranging from social networks to fundamental network properties. The social network session I attended offered a range of interesting and relevant presentations. Highlights included Alex Rutherford’s study on mobilizing people fast, which modelled the DARPA challenge experiment and reported that distant or long-range friends tend to be more active than those in close proximity. Looking at this session with my Web Science hat on, Sameet Sreenivasan’s research on tipping points of views and opinions within social networks was also very relevant: it demonstrated the push needed to change the political positions of social network users, and showed that during the transition there tends to be an intermediary state, which represents the indecisiveness of users.
Across the two social network sessions I attended in the afternoon, the recurring theme, as mentioned before, was the use of theoretical models to examine different kinds of social phenomena. I felt some of them could have been made stronger with more empirical data, although I put this down to disciplinary differences; despite this, the talks were insightful and demonstrated the application of Network Science.
The second day of NetSci12 began again with a selection of keynotes and invited speakers. One that was particularly interesting was Neil Johnson’s research on ultrafast financial transactions, examining why the speed at which transactions are produced matters for “global control”.
In the Network Algorithms and Network Measures tracks in the afternoon of day 2 (although some presentations were beyond the scope of my understanding!), a number of novel approaches were presented for the dynamic analysis of networks, an area I have been interested in for quite some time. The argument for dynamic analysis as a way to understand changes within a network was clear, and it was also applied to help identify different communities and groups within networks, something well worth looking into!
The conference dinner was also worthwhile, as it featured an after-dinner keynote from Barabasi, discussing the evolution of Network Science and his current work in the medical sciences. Some really interesting ideas (and graphics) were shown, especially the application of new network science approaches to help map out the pathways of human diseases.
The third and final day of NetSci was a morning filled with excellent keynotes by James Fowler and Lada Adamic. James’s talk looked at large-scale networks, examining social influence and political mobilisation, with a really good take-away message: you should ask why things happen, but you also need to ask why things don’t happen! Lada’s talk on social memes was timely and highly relevant to some of my work on influence and the diffusion of messages within social networks. It gave a good overview of how memes diffuse within social networks, and how they change, adapt and become something completely different from the original along the diffusion path, a growing area of research. The morning session finished with a light-hearted talk by Michael Macy on “why do liberals drink lattes”, which examined political preferences in social networks.
The end of the morning session signalled the end of NetSci2012 and the beginning of Web Science 2012, with a joint keynote given by Jon Kleinberg on status and evolution in online social networks.
Web Science 2012
The opening keynote of Web Science 2012 was given by Kleinberg, who talked about social status and feedback effects in online social networks. He discussed two interesting theories, balance and status, and used them to examine the status and power that individuals have within a network. As Kleinberg explained, those with high status tend to be close to others with high status, yet the rate at which you are respected is lowest amongst those of the same status as you. Furthermore, those with low power tend to act more as the coordinators within a network.
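To make the balance half of that concrete: in a signed network (positive and negative ties), structural balance says a triangle is balanced when the product of its three edge signs is positive, as in “the friend of my friend is my friend” or “the enemy of my enemy is my friend”. Below is a minimal sketch that counts balanced and unbalanced triangles, assuming an undirected edge list with +1/-1 signs; the toy network is hypothetical, and status theory (which depends on edge direction) is not covered here.

```python
from itertools import combinations


def triangle_balance(edges):
    """Count balanced and unbalanced triangles in an undirected signed graph.

    edges: iterable of (u, v, sign) with sign +1 (positive tie) or -1 (negative tie).
    A triangle is balanced when the product of its three edge signs is positive.
    """
    signs = {frozenset((u, v)): s for u, v, s in edges}
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(p in signs for p in pairs):  # only closed triangles count
            product = signs[pairs[0]] * signs[pairs[1]] * signs[pairs[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced


# Hypothetical toy network: A and B are friends and both dislike C; C and B like D; A dislikes D.
edges = [("A", "B", 1), ("A", "C", -1), ("B", "C", -1),
         ("C", "D", 1), ("A", "D", -1), ("B", "D", 1)]
print(triangle_balance(edges))  # (2, 2)
```

Checking every node triple is fine for a toy graph but grows cubically, so it is only meant to illustrate the rule, not to be run on a real social network.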
After the keynote, the first session was on social networks and friendships. In particular, Daniele Quercia et al.’s paper, Loosing “Friends” on Facebook, was fascinating, demonstrating that a lack of common ‘metadata’ between users was strongly correlated with unfriending, with age playing a major role.
The panel session that followed, on social computing and collective intelligence, featured some really diverse and interesting research, from examining recommendations on (food) ingredient websites to the use of social media platforms to enhance daily-deal Web services. Listening to the panel discuss their work, I felt the underlying theme of these presentations fitted the concept of the Social Machine: how social machines can be designed (part of a social and technical process), and how they become accepted and used by the masses. An interesting thought is whether existing social machines such as Twitter and Wikipedia can be used as a platform or springboard for new machines: can they grow on top of, or side by side with, each other?
The second day of Web Science opened with a keynote from Sonia Livingstone, who talked about the use and affordances of digital technologies in and around the classroom. This was a very welcome talk for the Web Science community, as it stepped outside the usual boundaries of tech-heavy talks towards real-world experiences and in-the-field research. Following this, Clare Hooper gave a great opening presentation on the crossover between HCI and Web Science, an excellent piece of research that asked (and offered answers to) some of the fundamental questions within the Web Science discipline; these discussions are long overdue! Similar to the research I have been working on, Clare pushed forward the use of qualitative and quantitative methods at the intersection of HCI and Web Science, arguing for their individual strengths and suitability. Terhi Nurmikko, a fellow Web Science student from Southampton, also gave a brilliant presentation on the use of semantic Web technologies in Cuneiform Studies, offering the niche field a great way to make documenting and working with artefacts much easier. It was great to see archaeology represented at Web Science.
The afternoon keynote was given by Sinan Aral, who discussed measuring influence in social (media) networks, an area I’ve been interested in for quite some time, especially given my current work with Edelman, who are interested in developing tools to examine influence within social networks such as Twitter. The presentation gave a detailed overview of the difficulties of detecting and classifying influence within a social network and the problems with causal estimates of influence, drawing on concepts such as Constructed Observational evaluation to discuss alternative ways of detecting influence. As part of the talk, Sinan discussed a recent Yahoo study of 27 million users, which found that you are 16x more likely to adopt a technology if a friend of a friend has done so too. Statistics like this make me wonder how quickly users could migrate to a new social media service (as in the Facebook vs. Google+ debate), and they add to the argument that the Web is a temporally stabilised set of networks, held together only by their continuous support and activities.
Following the keynote, the second presentation panel of the day kicked off with Chris Phethean, another fellow Southampton Web Scientist. His talk on the use of social media by charities looks to be a promising area of research, examining how social media can help further and support campaigns and an audience base. After a short break, the next panel session began, with a great series of papers and discussions on ‘democracy, policy and the Web’. Two papers stood out: Kieron O’Hara’s paper on transparency, open data and trust in government, and Sabrine Saad et al.’s research (presented by Stéphane Bazan) on the infowar in Syria. This was some really great research in action, highlighting the real dangers of the Web and the consequences of its use within political uprisings (the Syrian Electronic Army).
Day 2 closed with a keynote from Danah Boyd, who discussed the ever-growing threats to privacy on the Web, drawing on her extensive ethnographic research to show how data privacy is changing, how individuals are currently dealing with it (such as teenagers hiding meaning rather than content), and how in future we may need to implement some formal system or policy to protect society. A very important point was made during the keynote: data is like DNA; you can’t share your own without sharing someone else’s. The talk also sparked debate about what is ethical in the world of ‘Big Data’ research: just because the data is out there, does it mean that it should be used? Will the data currently available be (ethically) usable in the future? It is these kinds of (Web Science) questions that really need to be explored further.
The final day of Web Science 2012 began with an industry panel session, followed by a diverse (yet technical) set of presentations on social network analysis. Lars Backstrom et al.’s paper “Four Degrees of Separation” was particularly interesting; it analysed the strength of connections and the distances within the Facebook social networking site, revealing that the average distance was 5.82 in 2008 but had fallen to 4.74 by 2011 (4.32 within the US)! These are great findings, but they need to be put into context: the study considers only Facebook as its source of evidence, and Web Science needs to be careful not to overclaim since, as Lars himself said, the media tend to report such results wrongly or misinterpret them entirely.
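For readers unfamiliar with the measure, “average distance” is the mean number of hops on the shortest path between pairs of users who can reach each other. On a small graph it can be computed exactly with a breadth-first search from every node, as in the minimal sketch below; the four-person friendship chain is hypothetical, and at Facebook’s scale the authors relied on approximate algorithms rather than anything this naive.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct, mutually reachable nodes."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical friendship chain a - b - c - d (undirected, so each tie is listed both ways).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_distance(adj))
```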
After lunch, the final presentation session, on “Methods and Applications”, began, with mine being the last presentation. The research presented was nicely diverse, ranging from Hans Akkermans’s presentation on how power laws occur to Jérome Kunegis discussing fairness on the Web and introducing the Gini coefficient as an alternative to the power law for understanding the long-tail effect of the Web. Following this, I presented my work on Mixing Methods and Theory to Explore Web Activity, which focused on using both qualitative and quantitative data to show that Web activity is a product of complex socio-technical networks that need to be understood not only from the online and offline perspectives but also at the micro and macro levels, while maintaining a perspective that isn’t deterministic. The presentation got some really great responses (and challenges), and I put forward the argument against the growing trend of Twitterology (a term coined by Professor Catherine Pope). This definitely created a (much-needed) stir in the conference, sparking lots of good debate!
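As a quick illustration of the measure Kunegis proposed, the Gini coefficient summarises how unequally something (links, edits, retweets) is spread across a population in a single number. It is 0 when everyone has the same amount and approaches 1 when one member holds almost everything. Here is a minimal sketch using the standard sorted-values formula; the input values are hypothetical in-degrees, and this is not code from the paper.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal, near 1 = highly unequal).

    Uses the sorted-values formula
        G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    with x sorted in ascending order and i running from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))   # 0.0: every node receives the same number of links
print(gini([0, 0, 0, 12]))  # 0.75: one node receives every link, the maximum (n - 1) / n for n = 4
```

Unlike fitting a power-law exponent, this needs no assumption about the shape of the distribution, which is part of why it works as a simple fairness measure.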
To end Web Science 2012, a well-known figure in the Computer Science (and Web Science) discipline closed the session: Luis von Ahn, who developed the highly popular (and somewhat frustrating) reCAPTCHA service. His keynote was both light-hearted (reCAPTCHA art and the church of Inglip) and thought-provoking, not only revealing the new language translation/learning social machine Duolingo, but also posing the question “How do we coordinate 10 million to solve crime?”, directed at the use of social computation to tackle the high crime rates in South American countries. Although this raised a number of important political, social and philosophical issues (which was undoubtedly a good thing), asking these kinds of questions is exactly what is needed to push Web Science in new directions; it is only through such questions that we can start to unravel the issues associated with it.
Both Network Science and Web Science 2012 were great – the speakers, the location, the organisers and the weather! The week provided many thought provoking debates, ideas, and new research areas/interests. The keynotes chosen were excellent, providing a well-balanced perspective of the current direction and challenges that these fields face. With Network Science 2013 located in Denmark and Web Science 2013 in Paris, it’s time to get writing again!
ICWSM2012 was held in Dublin, Ireland, on 4–7 June 2012. It was a multi-part conference, with the first day dedicated to half- and full-day workshops, ranging from the impact of social media on journalism (http://www.arcomem.eu/icwsm-2012-workshop/) and large-scale data mining of social media (http://www.ramss.ws/) to new and exciting ways of visualising the ever-growing network of social media data (http://socmedvis.ucd.ie/). The remaining days (5–7 June) were for the main conference, with a single-track schedule that ensured all papers and posters could be attended. Before discussing the main conference, let’s spend a moment on the Social Media Visualisation workshop.
SocMedVis (as it likes to be called) was opened by Ben Shneiderman, who provided a great start to the workshop. His keynote explored the challenges of visualising the ever-growing pool of data that social networking sites generate, and gracefully reflected on his highly cited phrase: “Overview, Zoom and Filter, Details on Demand”. A really important message Ben gave was about the practicality of network visualisations: they are there to allow users to think, not to paint and view a picture. The network needs to be functional and allow the user to perform tasks, and, as he stated, it should do this in three ways: providing an overview of the entire network (the macro), zooming in on and filtering specific parts of the network (the micro), and then providing details on these parts when required. Having a pretty network is a bonus, not essential. Interestingly, this ties in strongly with the research I’ve been doing on a methodology for understanding Web activity, and how one needs to be able to examine both the micro and the macro, with detail when required. I digress, but this has definitely left me thinking about the content and context of networks. Towards the end of his talk, Ben introduced a new way (which his PhD student is working on) to visualise networks that reduces clutter and complexity using glyphs. In essence, it replaces fans (the nodes attached to a high in-degree node) with arches, sized according to the number of attached nodes within the fan. There are other concepts too, such as bridges for multi-connected nodes, but the general idea is to provide a cleaner way to understand a network diagram. Something to look out for indeed!
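To get a feel for the fan idea, the first step is simply finding the fans: groups of nodes whose only connection is the same single hub. Each group could then be drawn as one glyph sized by how many nodes it contains. The sketch below is my own rough reading of the idea, assuming an undirected edge list and a hypothetical `min_size` cut-off; it is not the algorithm from Ben’s group.

```python
from collections import Counter, defaultdict


def find_fans(edges, min_size=3):
    """Group degree-one 'leaf' nodes by the single node they attach to.

    Returns {head: sorted leaves} for heads with at least min_size leaves; each
    group is a candidate to be drawn as one glyph sized by len(leaves).
    """
    degree = Counter()
    neighbours = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbours[u].add(v)
        neighbours[v].add(u)

    fans = defaultdict(list)
    for node, d in degree.items():
        if d == 1:  # a leaf has exactly one edge, hence exactly one neighbour
            (head,) = neighbours[node]
            fans[head].append(node)
    return {head: sorted(leaves) for head, leaves in fans.items() if len(leaves) >= min_size}


# Hypothetical network: hub H with four leaves, plus a short chain H - X - Y - Z.
edges = [("H", "l1"), ("H", "l2"), ("H", "l3"), ("H", "l4"),
         ("H", "X"), ("X", "Y"), ("Y", "Z")]
fans = find_fans(edges)
glyph_sizes = {head: len(leaves) for head, leaves in fans.items()}
print(glyph_sizes)  # {'H': 4}
```

Z hangs off Y on its own, so with the default cut-off it stays a normal node rather than becoming a one-member glyph.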
After Ben’s keynote came a coffee break, during which the morning poster session took place. Despite it being only a workshop day, attendance was great, and the poster I presented, on visualising Twitter networks using a classification model, was well received, with questions, advice and plenty of discussion providing ideas to take away and develop further. One particular discussion left me thinking not only about cascades of retweets within a network (and using these to identify influential individuals), but also about using the entities within tweets (URLs, extra hashtags, photos) to create cascades of tweets.
After the morning coffee break, three papers were given (four were listed, but one of the presenters couldn’t make it), offering a range of exciting research focusing on images, blogs and streams. The three papers explored different ways of understanding cultural differences within Instagram (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4782/), the posting behaviour of bloggers and how they use social media (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4741/), and finally a new, large-scale research project that aims to develop a new method and set of tools to analyse information in blogs (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4740/). There were some really interesting take-aways from these three presentations. Firstly, cultural differences can be spotted from the colours of Instagram photos, and Instagram’s API is open and usable, allowing mass collection of data. Secondly, there is a relationship between blogging and Twitter use: 82% of bloggers use and advertise on Twitter, and people write blogs at any time of the day, whereas Twitter use is predominantly during the day. Finally, it was great to see research pushing the multidisciplinary angle, aware that visualising networks can only provide so much information; context matters as well.
After lunch, the workshop resumed (with even more attendees than before) with an Applications panel discussing the visualisation of data from a multidisciplinary perspective, a great follow-on from the final presentation of the morning. This raised some interesting topics regarding the use of social media visualisation in the social sciences, and how it can be used effectively and efficiently. This draws on some of the research areas I’ve been investigating, specifically the use of Big Data (and its visualisation) within the social sciences: how can this data be used efficiently to gain a better understanding of social processes, structures, etc.? These types of questions really probe the fundamental capabilities of the disciplines in question, but they are worth asking.
Another coffee break followed this panel, providing another round of questions, networking and discussions, fuelled by the thought-provoking debates regarding the use of visualisation across disciplines. The second and final paper session focused on microblogs, from how large amounts of data can be distilled and visualised simply (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4785/) to how political opinions and stance can be identified through social media (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4774/). Again, this session showcased some great working applications and research ideas for interpreting and visualising large streams of microblogging data (Twitter specifically), and it also opened up a number of directions in which my work on visualising Twitter conversations could be taken, specifically Web-based visualisation. Overall, it was a great workshop, and the community that attended, the papers presented and the discussions held are promising for next year’s workshop.
The main conference track started on 5 June, opening with a keynote from Google+ engineering director Andrew Tomkins. His talk discussed the fundamentals of social networks (why they exist) and how we need to engineer online platforms more efficiently to get the most out of them. He went on to discuss how social networks form so that humans can perform and complete tasks efficiently (this raised some debate on the Twitter stream), and how social networking platforms need to harness this: the next step in social networking is social task completion, especially as Web users spend a third of their time using social networking sites or communicating with each other. Interestingly, his talk is very close to the concept of social computing/social machines, but instead of harnessing the power of humans to complete computationally difficult tasks, the next step is harnessing what is already on the Web (and, by that definition, what users already do) to turn it into a more efficient, task-orientated machine: taking the content already out there and the activities currently performed, such as retail and banking, and making them social. Andrew’s talk offered a great vision of the future of social networking, suggesting that such platforms now need to embrace much more than communication and networking between friends; they need to make everything (on the Web) social.
The first paper session focused on privacy and security, starting with a paper (in the running for the best paper award) on the motivations and truth behind the use of TripAdvisor (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4675/). This not only discussed the J-shaped distribution of product reviews, but also the current processes used to detect fake reviews: by hand, only 62% of the fake reviews can be found. An interesting finding was that one-time reviewers tended to be more extreme and opinionated than repeat reviewers. Is this because the latter are more concerned about their presence in the reviewing community? Shifting focus towards privacy, Stutzman’s paper on privacy on social networking sites and its relationship to social capital (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4666/) was also interesting, finding that social capital doesn’t play a prominent role in determining one’s privacy concerns or settings. This tied nicely into Page’s presentation on boundary preservation and the use of Google+’s circles feature (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4679/): users were conscious of their use of, and control over, their circles, which was closely related to their desire to preserve the social boundaries of their offline circles. This is interesting, as Stutzman’s research suggests that social capital (which implicitly reflects the network users are in) doesn’t affect privacy, yet individuals aim to preserve their circles based on pre-existing offline boundaries. The final presentation of the session fitted well with the ongoing discussion of privacy, examining the fine line between disclosure and concealment for social media users (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4613/).
Using Item Response Theory (IRT) as an analytical framework, the study examined the disclosure of Facebook users’ metadata, such as political position, gender and age. Two interesting points emerged: men were more willing to share information (beyond their inner circle), and geographic and work-related information was highly valued and less likely to be shared in a public setting.
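For anyone new to IRT, its core idea fits in a few lines. Each user has a latent trait (here, a general willingness to disclose), and each item (gender, employer, hometown…) has a “difficulty” that, in this setting, reads naturally as how sensitive it is. The sketch below uses the standard two-parameter logistic (2PL) item response function; the paper’s exact model and fitted parameters are not given here, so the item values are purely hypothetical.

```python
import math


def p_disclose(theta, a, b):
    """Two-parameter logistic (2PL) item response function.

    theta: a user's latent willingness to disclose
    a: item discrimination (how sharply the item separates more and less open users)
    b: item difficulty, read here as sensitivity (higher b means fewer users share it)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items: a low-sensitivity field and a more sensitive, work-related one.
items = {"gender": (1.5, -1.0), "employer": (1.5, 1.0)}
for name, (a, b) in items.items():
    print(name, round(p_disclose(0.0, a, b), 3))
```

An average user (theta = 0) is far more likely to disclose the low-difficulty item than the high-difficulty one, which is the kind of pattern the study reports for work-related information.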
Following this session, the lightning presentations were given: nine one-minute presentations, each supported by a single slide (or a copy of the presenter’s poster), covering diffusion and propagation, topics, and sentiment analysis. I hadn’t seen this format before, and it offers a really good way for the audience to engage with the presenters, as after presenting they stand by their posters and take questions.
The afternoon presentations shifted focus towards user profiling and grouping, again starting with a best paper candidate, which examined how social media can be used to capture different characteristics of a city (www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4682/). A highlight of this session, and a paper close to my research area, was presented by Sharad Goel of Yahoo! Research, examining users’ activities and browsing behaviour on the Web (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4660/). It provided some verification and validation of the work I’ve been doing on the structure of the Web (albeit from a quantitative perspective). The research also showed that a user’s browsing behaviour can be used to infer specific details such as ethnicity and financial status. Interestingly, a user’s educational background also determined the type of browsing and activities they performed, in effect providing evidence to support the digital divide argument.
Another paper that stood out (also a best paper candidate) was Adam Sadilek et al.’s research on modelling the spread of disease using social networks and the social interactions between users (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4660/). Using a large dataset, their framework for tracking disease demonstrated that users who interacted more with infected users were more likely to catch the disease, and that co-located interactions (not necessarily with the infected user) increased the risk of infection as well. With a growing trend towards tackling real-world issues, the use of information diffusion in studies such as this shows the real benefit and impact of social media (and of the researchers studying it)!
The second day of the conference was opened by Lada Adamic with a keynote titled “the information life of social”. This was a real contrast to the previous day’s keynote by Andrew Tomkins, as it focused on social networks as a platform for sharing information. Some really interesting figures were presented, including the incentive to share based on your friends: you are 7.3 times more likely to share something if your friend shares it. Another interesting figure was the likelihood of sharing something at all, a probability of just 0.26%. This seems low, but scaled up across a network, if it were even 1%, the resulting information overload would cause an epidemic of shared content; social processes actually act as an effective sharing filter. Lada’s keynote then discussed the diffusion of memes, showing that the rate of diffusion was similar to Yule’s process of evolution (how organisms mutate). Lada also showed that certain memes spread faster and more efficiently than others depending on their subject or content; a qualitative study of why this was the case would have been a nice addition. Overall, the keynote offered a great insight into the diffusion of shared content and discussed timely topics such as the spread of memes across social networks.
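The “epidemic” point can be made concrete with a simple branching-process reading, which is my own simplification rather than Lada’s analysis. If each post is seen by `k` friends and each of them reshares with probability `p`, one post triggers on average R = p × k reshares. While R stays below 1, cascades fizzle out, with an expected total size of 1 / (1 - R) posts including the original; once R reaches 1, the expected size becomes unbounded. The audience size of 150 below is a hypothetical figure chosen only for illustration.

```python
def reproduction_number(p_share, audience):
    """Expected reshares triggered by one post: each of `audience` exposed friends
    independently reshares with probability p_share."""
    return p_share * audience


def expected_cascade_size(r):
    """Expected total posts (original included) in a Galton-Watson branching process
    with mean offspring r; the expectation is finite only when r < 1."""
    return 1.0 / (1.0 - r) if r < 1 else float("inf")


AUDIENCE = 150  # hypothetical number of friends who see each post
for p in (0.0026, 0.01):
    r = reproduction_number(p, AUDIENCE)
    print(f"p={p}: R={r:.2f}, expected cascade size={expected_cascade_size(r):.2f}")
```

With these assumed numbers, the 0.26% rate keeps R well below 1 and cascades stay small, while 1% pushes R above 1, the regime where sharing would snowball.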
The rest of the morning followed the same format as before: first a paper session on sentiment and emotion, then a number of quick-fire presentations on geographical research topics. A paper of real interest to me, and possibly to research within Southampton’s Web and Internet Science research lab, was given by Yelena Mejova, who looked at sentiment analysis across different social media streams (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4580/). The research showed that a model trained on a diverse dataset offers a generalisable way to examine sentiment across different social media platforms. Examining three different types of social media (reviews, Twitter and blogs), it turns out that reviews, followed by Twitter and then blogs, provide the best data for training a sentiment classifier. However, the implications reach beyond training sentiment classifiers: the work suggests there are similarities between social media platforms, even though they are diverse and used for different purposes. For the Web Observatory project (http://www.w3.org/community/webobservatory/), knowing that cross-platform sentiment analysis is possible provides a useful reference for the research currently being conducted, which examines the diffusion of information across a variety of social media platforms such as microblogs and wikis.
The afternoon of the second day was filled with industry-led sessions, opening with a keynote by Igor Perisic, a senior director of engineering at LinkedIn, who discussed the dynamics of social networking in terms of the job ecosystem. This was followed by two industry panels, on news and business. These focused on the application and benefits of social media, but also supplied some interesting facts and figures, including: over 370 million tweets are produced per day, and 12 of the top 25 social news providers didn’t exist 10 years ago; clearly an example of a fast-paced, rapidly changing community. As with any research at the cutting edge, keeping up with the latest methods to collect, analyse and present findings will no doubt be a challenge, yet it is these challenges that make it so rewarding. The evening of the 6th was reserved for the conference welcome event, which took place at Dublin’s famous Guinness factory, a great place to network, discuss future research opportunities and relax with the community of social media enthusiasts.
The final day of the conference began with Fabrizio Sestini’s keynote on the new EU-funded project on Collective Awareness Platforms, designed to help foster a sustainable environment and improve social innovation. The topics he discussed (Internet Science, collective action, technology design, social policy and legal frameworks) were very similar to the goals of Web Science, especially the call for multidisciplinary projects (which was actually a requirement for funding). This will be an interesting project to watch, especially in terms of its overlap with Web Science research; hopefully, collaboration between the Web Science research labs and this project will be possible in the future.
One paper that stood out was that of Duc Minh Luu et al. (http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4625/), who discussed the diffusion of items across social networks. Their work, which they discussed with me during the poster session earlier in the week, examined the spread of items within social networks, introducing and explaining their use of the Bass model to examine the diffusion of entities within a network. Interestingly, their findings on the temporal diffusion of content showed similarities with my research on the dynamic changes in the network structure of communications between users. Both Duc Minh Luu’s findings and mine show that the average degree of a network increases over time; it is interesting that this holds for a communications (mentions) network as well as for the diffusion of items (i.e. URLs, images, video).
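For context, the Bass model splits adoption into two forces: “innovators”, who adopt because of outside influence (coefficient p), and “imitators”, who adopt because others already have (coefficient q). Below is a minimal sketch of the textbook discrete-time form, which is not necessarily the exact formulation Luu et al. used. At each step the new adopters are (p + q·N/m)·(m − N), where N is the number adopted so far and m is the size of the potential audience. The parameter values are hypothetical.

```python
def bass_adoption(p, q, m, steps):
    """Cumulative adopters after each time step under a discrete-time Bass diffusion model.

    p: coefficient of innovation (adoption driven by outside influence)
    q: coefficient of imitation (adoption driven by people who have already adopted)
    m: total number of potential adopters
    """
    cumulative = 0.0
    history = []
    for _ in range(steps):
        new = (p + q * cumulative / m) * (m - cumulative)
        cumulative += new
        history.append(cumulative)
    return history


# Hypothetical parameters, e.g. users in a network who might share a given URL.
curve = bass_adoption(p=0.03, q=0.4, m=1000, steps=15)
print([round(x) for x in curve])
```

With q much larger than p, new adoptions per step first rise and then fall, giving the familiar S-shaped cumulative curve.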
Overall, the conference was a great success, offering a diverse set of research papers, keynotes and panel sessions. The workshops were a great way to begin the week, providing the opportunity to interact and network with others in the same research community. The single-track format ensured that all presentations could be attended, and the lightning sessions offered a much more fluid and engaging way to discuss presenters’ research. Here’s hoping ICWSM2013 is as much of a success as 2012.
Well, ICTD2012 is over, and what a great conference it has been! There is loads to take away from it; there was a really great energy there, which has been truly reflected in the Twitter conversations that developed over the past 4 days.
Following up on the previous post, I have examined the dynamic networks that formed from the communications of Twitter users (tweetees). These communications are either retweets (like the graphs shown in Part 1 of this blog post) or users mentioning each other within their tweets.
Before we look at the different communication networks that emerged, let’s examine the frequency of tweets during the conference. Over the 4 days, 2,299 tweets were posted: 1,437 of them were retweets, and of the remaining 862 tweets, 646 contained mentions. That’s a good deal of communication between people! This can now be examined in a little more detail.
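To put those counts into proportions, here is a small sketch of the arithmetic. The function name `tweet_breakdown` is hypothetical, and the only inputs are the three figures quoted above.

```python
def tweet_breakdown(total: int, retweets: int, with_mentions: int) -> dict:
    """Split a hashtag's tweet count into retweets and mention-bearing tweets."""
    non_retweets = total - retweets
    return {
        "non_retweets": non_retweets,
        # Fraction of all tweets that were retweets
        "retweet_share": retweets / total,
        # Fraction of the non-retweets that mentioned another user
        "mention_share_of_non_retweets": with_mentions / non_retweets,
        # Fraction of all tweets that were a retweet or contained a mention
        "communicative_share": (retweets + with_mentions) / total,
    }

stats = tweet_breakdown(total=2299, retweets=1437, with_mentions=646)
for name, value in stats.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

This prints 862 non-retweets, a retweet share of 0.625, a mention share of 0.749 among non-retweets and a communicative share of 0.906. In other words, roughly nine in ten #ICTD2012 tweets were either a retweet or directly mentioned someone.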
Slideshow 1 shows the mention communications captured before and during the ICTD2012 conference:
Let’s first look at the evolution of the ‘mention’ network over the past 4 days. As the slideshow shows, there was a good amount of communication between users, resulting in an average degree of 1.7 and a maximum in-degree of 33. In other words, each user was on average in communication with 1.7 other users (call it 2 people). A little more analysis of the data also revealed the top 10 most-mentioned users at the conference.
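For anyone wanting to reproduce figures like these, below is a minimal sketch of how an average degree and a maximum in-degree can be computed from a directed mention network. It assumes the network is stored as (mentioner, mentioned) pairs. The handles are invented. Collapsing repeated mentions into one edge and ignoring self-mentions are my own assumptions, not necessarily how the numbers above were produced.

```python
from collections import defaultdict

def degree_stats(edges):
    """Average degree and maximum in-degree of a directed mention network.

    `edges` is an iterable of (source, target) pairs, one per mention.
    Assumption: repeated mentions between the same pair count as one edge,
    and self-mentions are ignored.
    """
    unique_edges = {(s, t) for s, t in edges if s != t}
    nodes = {user for edge in unique_edges for user in edge}
    in_degree = defaultdict(int)
    for _, target in unique_edges:
        in_degree[target] += 1
    # Every directed edge adds one to some in-degree and one to some out-degree,
    # so mean in-degree == mean out-degree == edges / nodes.
    avg_degree = len(unique_edges) / len(nodes) if nodes else 0.0
    max_in = max(in_degree.values(), default=0)
    return avg_degree, max_in

# Hypothetical mentions: bob is mentioned by three different users
mentions = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
            ("bob", "alice"), ("alice", "bob")]
print(degree_stats(mentions))  # (1.0, 3)
```

Note that users who tweeted with the hashtag but never mentioned anyone (and were never mentioned) do not appear in the edge list. This sketch therefore only averages over users who took part in at least one mention.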
Slideshow 2 shows the retweet communications captured before and during the ICTD2012 conference:
Slideshow 3 shows the comparison between the unclassified and classified retweet networks on Day 4 of ICTD2012:
Examining the retweet network graph exposes the same kind of activity. Over the 4 days, the number of retweets between users was phenomenal, with an average in-degree of 3, a maximum in-degree of 49, and a maximum out-degree (the number of unique users that a single user retweeted) of 47. Putting this into context, there was a lot of sharing of information (but was it valuable…) between users, and some users were showing signs of being hubs (users retweeting many other users) and authorities (users being highly retweeted by other users). As with the previous post, I applied the classification model on top of the data set, setting the minimum number of retweets needed to be a red node (an authority) to 40 (the same as in Part 1). At the end of Day 4, we now see the addition of one red node (@anahi_ayala) and many more yellow nodes connecting these users together. Ranking these yellow nodes by the number of times they were the first to retweet a tweet that went on to be retweeted multiple times (which determines the size of the yellow nodes), @melodyrclark and @shikohtwit came out on top.
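To make the red/yellow distinction a little more concrete, here is a minimal sketch under my own simplifying assumptions. It is not the actual classification model, which this post does not describe in detail. The `Retweet` record, the idea that "retweeted 40 times" means 40 retweets in total, and the `min_cascade` cut-off for "retweeted multiple times" are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

class Retweet(NamedTuple):
    time: int        # when the retweet happened, e.g. a Unix timestamp
    retweeter: str   # user who retweeted
    author: str      # author of the original tweet
    tweet_id: str    # id of the original tweet

def classify(retweets, red_threshold=40, min_cascade=2):
    """Hypothetical red/yellow labelling of a retweet network (a simplification)."""
    # Red nodes (authorities): authors retweeted at least `red_threshold` times in total.
    times_retweeted = Counter(rt.author for rt in retweets)
    red = {a for a, n in times_retweeted.items() if n >= red_threshold}

    # Group the retweets of red authors' tweets by original tweet.
    cascades = defaultdict(list)
    for rt in retweets:
        if rt.author in red:
            cascades[rt.tweet_id].append(rt)

    # Yellow nodes: credit the earliest retweeter of each red tweet that was
    # retweeted at least `min_cascade` times; the count would drive node size.
    first_counts = Counter()
    for rts in cascades.values():
        if len(rts) >= min_cascade:
            first = min(rts, key=lambda rt: rt.time)
            first_counts[first.retweeter] += 1
    yellow = [(u, n) for u, n in first_counts.most_common() if u not in red]
    return red, yellow

# Toy data with invented handles and a low threshold so the example stays small
data = [
    Retweet(1, "u1", "star", "t1"), Retweet(2, "u2", "star", "t1"),
    Retweet(3, "u3", "star", "t1"), Retweet(4, "u1", "star", "t2"),
    Retweet(5, "u2", "star", "t2"), Retweet(6, "u3", "other", "t3"),
]
print(classify(data, red_threshold=3))  # ({'star'}, [('u1', 2)])
```

Here "star" crosses the (toy) red threshold. "u1" was the first to retweet both of star’s tweets, so it ranks as the top yellow node, in the same way that @melodyrclark and @shikohtwit came out on top above.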
Twitter has become an integral part/tool/distraction of academic conferences, supplying an endless stream of communication between participants, feedback and praise (sometimes criticism), and even serving as a ‘news ticker’ for those unable to attend the conference.
As with any large stream of information (and conferences are no exception), it is often hard to keep track of what is being said, who is producing the important and valuable content, and what is just background chatter (e.g. what’s for lunch or where the ‘afterparty’ is). Finding the important information is always a challenge, especially when trying to obtain a concise but comprehensive overview of a conference.
As part of my research, which I have been demoing (Identifying Communicator Roles within Twitter), I’ve been working on ways to identify different user roles within topical Twitter conversations, helping interested parties to find out who may be the users to follow (or target!). The model (which I won’t go into in detail here; come and talk to me for more information) is based on filtering users according to the dynamic retweet network structure that emerges as conversations take place between users (bound by a specific hashtag). By examining the timeline of these tweets, the model can be applied and different user roles start to become identifiable.
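The "dynamic" part simply means the retweet network is looked at as it grows over time rather than as one final snapshot. Below is a minimal sketch of building cumulative daily snapshots for one hashtag. The tweet dictionary keys (`day`, `user`, `text`, `retweet_of`) and the sample tweets and dates are hypothetical, and this is not the code behind the model.

```python
import re
from collections import defaultdict
from datetime import date

def daily_snapshots(tweets, hashtag):
    """Cumulative retweet-network snapshots, one per day, for a single hashtag.

    Each tweet is a dict with hypothetical keys: 'day' (datetime.date), 'user',
    'text' and 'retweet_of' (author of the original tweet, or None).
    Returns {day: set of (retweeter, author) edges seen up to and including day}.
    """
    tag = hashtag.lower()
    edges_by_day = defaultdict(set)
    for tw in tweets:
        hashtags = re.findall(r"#\w+", tw["text"].lower())
        if tw["retweet_of"] and tag in hashtags:
            edges_by_day[tw["day"]].add((tw["user"], tw["retweet_of"]))
    snapshots, seen = {}, set()
    for day in sorted(edges_by_day):
        seen |= edges_by_day[day]
        snapshots[day] = set(seen)  # copy, so later days don't change earlier snapshots
    return snapshots

# Invented example: the third tweet lacks the hashtag and is filtered out
tweets = [
    {"day": date(2012, 3, 12), "user": "a", "text": "RT great talk #ICTD2012", "retweet_of": "b"},
    {"day": date(2012, 3, 13), "user": "c", "text": "RT #ictd2012 slides", "retweet_of": "b"},
    {"day": date(2012, 3, 13), "user": "d", "text": "RT off topic", "retweet_of": "b"},
]
for day, edges in daily_snapshots(tweets, "#ICTD2012").items():
    print(day, len(edges))
```

A classification step (such as the red/yellow sketch above) can then be run on each snapshot to see how user roles change from day to day.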
After receiving a good amount of interest today, I thought a good way of demonstrating my work would be to show the #ICTD2012 Twitter retweet conversation network graph, first unclassified and then classified. The unclassified retweet network, shown in Figure 1 (ignore the colours and sizes; all nodes are the same), contains all the retweets captured during Days 1 and 2 of the ICTD2012 conference. As can be seen, it is very messy, and identifying potentially important users becomes a difficult task. In comparison, Figure 2 shows the same dataset with the model applied to it: it is immediately much clearer, and certain users begin to become identifiable. This time, the red nodes (@meowtree and @RitseOnline) are the users being highly retweeted (in this graph, a user needs at least 40 retweets to be a red node). More importantly, the orange nodes (@Anandstweets, @ekisesta, @Katrinskaya, @katypearce, @virbrussa) are the ones actually acting as connectors between these highly retweeted users, making them potentially good sources to follow for an aggregated news feed! What is really interesting, though, is how this will change over the next few days: will their roles stay the same, and will more red and orange nodes start to appear? Only time will tell!
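One simple way to operationalise "connector" is to flag users who retweeted at least two different red nodes. This is my own simplification for illustration, not necessarily how the model colours orange nodes. The function name, the `min_red` parameter and the handles are hypothetical.

```python
from collections import defaultdict

def find_connectors(retweet_edges, red_nodes, min_red=2):
    """Users (other than red nodes) who retweeted at least `min_red` distinct red users."""
    red_retweeted = defaultdict(set)
    for retweeter, author in retweet_edges:
        if author in red_nodes and retweeter not in red_nodes:
            red_retweeted[retweeter].add(author)
    return sorted(u for u, reds in red_retweeted.items() if len(reds) >= min_red)

# Invented edges: x bridges both red users; y retweets only red1 (twice)
edges = [("x", "red1"), ("x", "red2"), ("y", "red1"), ("y", "red1"), ("z", "plain")]
print(find_connectors(edges, {"red1", "red2"}))  # ['x']
```

Counting distinct red users rather than raw retweets stops a user who repeatedly retweets a single authority from being mistaken for a bridge between authorities.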
This is obviously a very brief overview of the concepts that underpin the classification model, which is still in the very early stages of development, but its applications for the ICTD community could be beneficial in the future.
Stay tuned for another look at the #ICTD2012 Twitter conversation towards the end of the conference; let’s see how the network changes over the next two days!
The Digital Engagement Conference 2011 took place at St James’ Park, Newcastle (or, less formally, SportsDirect.com) on 15-17 November. The crowd had grown since the 2010 conference, with a mix of familiar and new faces ready to talk about the present and future impacts of digital technologies on modern society. It got off to a quick start, including the necessary safety briefing (note to self: if there is a fire alarm, do not move for 6 minutes, and then leave the building in a calm fashion). John Baird gave a great opening speech, starting the morning on a high by announcing the winner of the best student paper award: Laura Carletti, for “A grassroots initiative for digital preservation of ephemeral artefacts: the Ghostsigns project”. Following this, Professor James Hollan gave an engaging keynote on his work bringing HCI into the digital world, which included various approaches to improving everyday human experiences such as driving. He closed with a statement that seemed to encapsulate the theme for the rest of the conference: we need to use rich data to improve people’s lives.
Following this, a selection of speakers invited from a variety of industries gave their take on the current and future growth and development of the digital economy. Gary Moulton from Microsoft kicked off, discussing the need for technologies that cater for all sectors of society and highlighting Microsoft’s experience of developing technologies and products suitable for all ages and levels of ability. Ian Marshall then gave a sobering yet needed presentation on the fusion of digital technology within the finance sector. Ian discussed the sector’s lagging use of current technologies and raised four points of concern: keeping up with the pace of change; the structures of organisations and operations; keeping data or information reliable, relevant and integral; and, finally, the risks and problems that data security presents. Dave Sharp then discussed the video games industry, highlighting that over the years it has become a much tougher industry to survive in: it’s a 90/10 industry, where 90% of the profits are made from only 10% of the games. Dave also highlighted the gaps between academia and industry, suggesting that academia needs to keep up with the pace of change and prepare graduates more effectively, making their transition into the industry smoother. He also raised a key point: the stereotype of the ‘games developer’ has changed. No longer are they individuals ‘all dressed in black wearing Metallica’ (as Dave put it); the field has widened, requiring people from all different academic and vocational backgrounds. But as Dave made us aware, finding these people, and moreover letting them know that the industry needs them, is not an easy task. The audience was then given a presentation on the decline and possible failure of the pharmaceutical industry, which has been in decline since 2001, with over $1 trillion wiped off its stock value even though $600 million has been pumped into the industry.
These are alarming figures, with repercussions not only for the drug companies but also for the patients who use their drugs. Where did the industry go wrong? Oversimplification. What’s the solution? Network Pharmacology, the combination of network science and chemical biology, something which not many would have heard of. The presentation ended with a number of challenges for the digital economy to help the industry survive, including educating people with the right skills, and developing and improving computational systems for analysing new drugs. Finally, Aart van Helteren from Philips gave an excellent keynote on Philips’ drive towards digital health technologies, including DirectLife (an active lifestyle technology) and Lifeline (a monitoring technology for the elderly). Although this is excellent work, and potentially beneficial to society, especially the old and frail, Aart agreed that getting these technologies into the home is not so simple, with barriers coming not only from people but also from governments themselves.
A hungry crowd then proceeded to discuss the issues over lunch, which, given the choice of dishes, was possibly as good as the presentations. This was followed by the workshop sessions, which ranged from Gaming in the Digital Economy to examining Intellectual Property issues. Wednesday afternoon was then filled with three slots of parallel sessions, which focused on a variety of themes including music and sound, assistive technologies, cultural heritage, managing user-sourced information, independent living and new directions for the digital economy.
This led quite nicely into the evening event, which included a short but chilly trek to the Great North Museum, where the poster presentations and conference banquet were held. It was an excellent, if bizarre, choice of location: discussing current research ideas in a room filled with ancient artefacts and full-size replicas of creatures from long ago. The banquet was executed well, with the presentation of the ‘Telling Tails of Engagement’ competition winner (and the announcement of DE2012 in Aberdeen) happening between the starter and the main course. The evening ended well, with some brilliant discussions; there was a real buzz in the air as people departed.
Thursday morning started promptly, with the opening keynote by Don Marinelli discussing the Entertainment Technology Center (ETC) at Carnegie Mellon. It was a truly engaging presentation, highlighting ETC’s approach to getting academics, students and industry to work together. This directly addressed the concerns that Dave Sharp raised the previous day about closing the gap and improving relations between academia and commercial industry.
Following Don’s presentation, there was a short coffee break, and then the remaining parallel sessions began, covering technologies for sensitive spaces, crowdsourcing, Open Data and security, engaging users online, connecting communities, and supportive services.
Unfortunately my time at the conference finished after the second round of parallel sessions, but I did get to catch some great crowdsourcing and open data demonstrations, areas which are of great interest to me.
Since last year, the digital economy agenda has grown, and so has the community that supports it. As with any maturing subject, continuous effort is required to make it successful, and DE2011 has shown that this support is at full strength. I’m looking forward to what DE2012 brings (apart from the deep-fried Mars bar).