GEOSPEX | geospatial analysis | urban planning | environmental issues

Getting a Handle on GeoData: Part III

Following Part II of ‘Getting a Handle on GeoData’, this third and final post primarily summarizes the fourth chapter of Kitchin’s The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences, in which BIG DATA is detailed.

Here’s a nice video overview with Kitchin that captures some of the key points highlighted thus far, as well as those in this post regarding BIG DATA:

The thrust of Kitchin’s Chapter 4 is the theoretical positioning of BIG DATA, with special emphasis on its unique characteristics. These characteristics are often summarized as either the ‘3 Vs’ or the ‘4 Vs’; Kitchin expounds on and adds to these, and discusses some of their ramifications. In later chapters, Kitchin expands on the roles of users, corporations and producers of BIG DATA.

[Figure: 4Vs]

KEY PT. 16

BIG DATA is slippery; there is not a thoroughly agreed upon ‘industry standard’ definition…

Like many terms used to refer to the rapidly evolving use of technologies and practices, there is no agreed academic or industry definition of big data. The most common makes reference to the 3Vs: volume, velocity and variety (Laney 2001; Zikopoulos et al. 2012).

KEY PT. 17

BIG DATA is BIG and it’s only getting a lot BIGGER; and… finding places to store all this data is becoming a real challenge.

…the number of servers (virtual and physical) worldwide will grow by a factor of 10, the amount of information managed by enterprise datacenters will grow by a factor of 50, and the number of files the datacenter will have to deal with will grow by a factor of 75, at least. (Gantz and Reinsel, 2011)

…the data generated are so vast that they neither get analysed nor stored, consisting instead of transient data. Indeed, the capacity to store all these data does not exist because, although storage is expanding rapidly, it is not keeping pace with data generation (Gantz et al. 2007; Manyika et al. 2001).


KEY PT. 18

BIG DATA usually operates on the mantra ‘More is Better’; sheer volume of data becomes a valued hallmark of BIG DATA projects. Kitchin terms this capacity of BIG DATA as Exhaustivity.

…big data projects strive towards capturing entire populations (n=all), or at least much larger sample sizes than would traditionally be employed in small data studies (Mayer-Schonberger and Cukier 2013).

Like other forms of data, spatial data has grown enormously in recent years, from real-time remote sensing and radar imagery, to large crowdsourced projects such as [OSM], to digital spatial trails created by GPS receivers being embedded in devices.

…it’s little wonder we’re drowning in data. If we can track and record something, we typically do. (Zikopoulos et al. 2012)

KEY PT. 19

In addition to its sheer volume, BIG DATA is also becoming much finer in resolution (large scale in map terminology) and more indexical (each instance uniquely identifiable).

In addition to data exhaustivity, big data are becoming much more fine-grained in their resolution, together with a move towards strong indexicality (unique labelling and identification) (Dodge and Kitchin 2005).

This increase in the resolution of data has been accompanied by the identification of people, products, transactions and territories becoming more indexical in nature [i.e., a bottle of shampoo can now be tagged with an RFID chip with unique ID code].

KEY PT. 20

Along with the indexicality of BIG DATA, there is a trend to amplify the relationality of different datasets; that is, interconnections between BIG DATASETS are increasingly valuable, especially to data brokers and profiling companies.

Relationality concerns the extent to which different sets of data can be conjoined and how those conjoins can be used to answer new questions.

…it is the ability to create data that are highly relational that drives the vast data marketplace and the profits of data brokers and profiling companies.
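To make relationality concrete, here is a minimal Python sketch of two datasets being conjoined through a shared identifier so that a new question can be answered. It is not taken from Kitchin; the datasets, the field names (customer_id, item, home_zone) and the helper functions are all hypothetical and exist only for illustration.

```python
# Two hypothetical datasets that happen to share an identifier (customer_id).
purchases = [
    {"customer_id": "c1", "item": "shampoo"},
    {"customer_id": "c2", "item": "bus pass"},
    {"customer_id": "c1", "item": "umbrella"},
]
locations = [
    {"customer_id": "c1", "home_zone": "north"},
    {"customer_id": "c2", "home_zone": "harbour"},
]


def join_on(left, right, key):
    """Inner join: pair each left record with every right record sharing its key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


def items_by_zone(rows):
    """Group purchased items by home zone, a question neither dataset answers alone."""
    zones = {}
    for row in rows:
        zones.setdefault(row["home_zone"], []).append(row["item"])
    return zones


linked = join_on(purchases, locations, "customer_id")
print(items_by_zone(linked))
```

Neither table on its own says what is bought in which zone; only the conjoined records do, which is the kind of added value the quote attributes to relational data.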

KEY PT. 21

Like so many contemporary human activities, BIG DATA is marked by a default ‘Always On’ mode wherein data is continuously streaming; in the 3/4Vs terminology, this is known as Velocity.

A fundamental difference between small and big data is the dynamic nature of data generation. Small data usually consist of studies that are freeze-framed at a particular time and space… For example, censuses are generally conducted every five or ten years. In contrast, big data are generated on a much more continuous basis, in many cases in real-time or near to real-time.

Analysing such streaming data is also a challenge because at no point does the system rest, and in cases such as the financial markets micro-second analysis of trades can be extremely valuable.
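One common way of coping with a stream that never rests is to analyse it incrementally, keeping only a small window of recent values rather than storing everything. The Python sketch below is an illustration under that assumption, not a method described by Kitchin; the function name rolling_mean and the sample readings are hypothetical.

```python
from collections import deque


def rolling_mean(stream, window):
    """Yield the mean of the most recent `window` values as each new value arrives."""
    recent = deque(maxlen=window)  # the oldest value is dropped automatically when full
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # remove the value that is about to fall out of the window
        recent.append(value)
        total += value
        yield total / len(recent)


# Hypothetical readings arriving one at a time; any iterable or generator works.
readings = [10, 12, 11, 15]
for mean in rolling_mean(readings, window=2):
    print(mean)
```

Because the function is a generator, each result is available as soon as its reading arrives, and memory use stays fixed no matter how long the stream runs.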

KEY PT. 22

BIG DATA further differs from Small Data in that its actual structure can be, and often is, much more Varied; that is, a single BIG DATASET may combine structured, unstructured and semi-structured data.

Both small and big data can be varied in their nature, being structured, unstructured or semi-structured, consisting of numbers, text, images, video, audio and other kinds of data. In big data these different kinds of data are more likely to be combined and linked, conjoining structured and unstructured data… Small data, in contrast, are more discrete and linked, if at all, through key identifiers and common fields.

As the Open Data Center Alliance (2012: 7) notes, [p]reviously, unstructured data was either ignored or, at best, used inefficiently. However, advances in distributed computing and database design using NoSQL structures, and in data mining and knowledge discovery techniques, have hugely increased the capacity to manage, process and extract information from unstructured data.

KEY PT. 23

Small Data has traditionally been relatively rigid in terms of research design and data management. In BIG DATA, the opposite is true: the sheer volume and velocity of BIG DATA create both the necessity and, through new database designs and data techniques, the capacity for increased Flexibility.

The use of NoSQL databases means that changeable data can be managed at high velocity, adapting to new fields. This means that it is possible to adapt data generation on a rolling basis and to perform adaptive testing… Because the volumes of people using these sites [Facebook, Google, etc.] are vast, their sample sizes are enormous, meaning they can make changes without fear of losing representativeness.
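The idea of ‘adapting to new fields’ can be sketched in a few lines of Python. The example below uses a plain list of dictionaries as a stand-in for a schema-less document collection; it is an assumption-laden illustration rather than the API of any real NoSQL product, and the names insert, fields_in_use, query and the sample fields are hypothetical.

```python
# A plain list of dicts stands in for a schema-less document collection.
records = []


def insert(doc):
    """Store a document as-is; no fixed schema is enforced."""
    records.append(dict(doc))


def fields_in_use():
    """Return every field name that appears in at least one stored document."""
    names = set()
    for doc in records:
        names.update(doc)
    return sorted(names)


def query(field, default=None):
    """Read one field from every document, tolerating documents that lack it."""
    return [doc.get(field, default) for doc in records]


insert({"user": "a", "clicks": 3})
# A new field appears part-way through collection; earlier records need no migration.
insert({"user": "b", "clicks": 5, "device": "mobile"})

print(fields_in_use())
print(query("device", default="unknown"))
```

A fixed-schema table would need its structure altered before the second record could be stored; here the new field is simply accepted, and older records are read with a default.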

As outlined above, Kitchin’s seven characteristics of BIG DATA, beyond the typical 3/4Vs, raise serious questions about our ‘deluge of data’. As Kitchin asks: ‘What does it mean for society, government and business to gain access to very large, exhaustive, dynamic, fine-grained, indexical, varied, relational, flexible and scalable data? To what extent can such data provide penetrating insights into the human condition or help address some of the most pressing social, political, economic and environmental issues facing the planet?’ Indeed, these are the ‘BIG’ questions that we should, and must, ask of BIG DATA.