
GEOSPEX | geospatial analysis | urban planning | environmental issues

Exploring ArcGIS Interface with Natural Earth Data

The following video tutorial utilizes the Natural Earth Data 'starter kit' to explore some of the essential features of the ArcGIS interface for entry-level cartography. I use this tutorial in some of my classes during the early stages of installing ArcGIS and situating students within it:

Introduction to ArcMap Interface | Natural Earth Data from stephen metts on Vimeo.

This video features basic ArcMap interface navigation and file manipulation/organization. Natural Earth data is used exclusively in this tutorial.

Along with the video, I have a PDF ‘checklist’ that can be printed and used in conjunction with the video.

Best Wishes on getting situated with ArcGIS!


Prepping Geographic Attribute Data for Excel

For a variety of reasons, we often need to extract the tabular data component of a geographic feature into Excel format. There are many ways to extract tabular data from GIS platforms, but often we want the resulting extract in explicit Excel table format: a ubiquitous, common file format that is easy to work with, not only for ourselves but for others who may not be working in a GIS platform on a particular project.

Both ArcGIS and QGIS handle this task in similar ways. The essential steps are as follows:

1. Import the geographic feature into the GIS table of contents.
2. Utilize a tool to create the export in Excel format.
3. Point the export to the desired location; then simply open the extract in Excel.
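
For those who prefer a scripted route, the same three steps can be sketched in Python with geopandas (a rough, minimal sketch; the file names are hypothetical and the openpyxl package is assumed for the Excel write):

import geopandas as gpd

# Step 1: read the geographic feature (the scripted equivalent of adding it
# to the table of contents)
gdf = gpd.read_file("counties.shp")

# Step 2: keep only the attribute table by dropping the geometry column
attributes = gdf.drop(columns="geometry")

# Step 3: write the table to an Excel workbook at the desired location
attributes.to_excel("counties_attributes.xlsx", index=False)

Most users will reach for the built-in export tools shown in the video, but a script like this is handy when the extract needs to be repeated.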

A GIS Stack Exchange post detailing the process further can be found HERE.

It's straightforward and simple, and it's an often-used workflow step, especially on projects where value is being added within the GIS AND the data is also being worked on, by ourselves or others, outside the project GIS.

Prepping Geographic Attribute Data for Excel from stephen metts on Vimeo.

Often for a variety of reasons we want/need to export attribute data into Microsoft Excel table format.

This video tutorial covers this process using both ArcGIS and QGIS as the export platforms to derive the same/similar Excel formatted table.

The above video is a tutorial guide for this task across both ArcGIS and QGIS; the results are essentially the same: a coherent Excel table that mirrors the attribute table of the original geographic feature. From here, the Excel table can be further processed in Excel and then potentially joined back to the geographic feature, or simply worked on within Excel, resulting in a revised/updated tabular data product.
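
That join-back step can likewise be sketched in Python (hypothetical file names; a shared key field, assumed here to be called GEOID, is required, and openpyxl is assumed for reading the .xlsx):

import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("counties.shp")                        # original geographic feature
edited = pd.read_excel("counties_attributes_edited.xlsx")  # table revised in Excel

# Join the revised table back on the common identifier; the suffix keeps any
# duplicated column names distinct
joined = gdf.merge(edited, on="GEOID", how="left", suffixes=("", "_xls"))
joined.to_file("counties_joined.shp")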

SOTM 2015 in NYC: One Week Out

OSM1 [ STATE OF THE MAP CONFERENCE, 2015, NYC-United Nations ]

In anticipation of SOTM 2015 at the United Nations this upcoming weekend, a great overview of both OpenStreetMap (OSM) and the Humanitarian OpenStreetMap Team (HOT) can be found via this GitHub Gist:

OSM2 [ OSM GitHubGist ]

Throughout the conference program, there are more than a few sessions devoted to OSM for humanitarian work. It's clear that trends in digital humanitarianism and pressing climate change issues are becoming increasingly central to both the development and consumption of OSM as a platform and as geodata.

Recently at The New School in NYC, the Milano program hosted a mapping workshop to support the Missing Maps effort associated with OSM and HOT. A blog post giving an overview of this workshop can be found here.

Getting a handle on GeoData- Part III

Following Part II 'Getting a Handle on GeoData', this third and final post primarily summarizes the 4th chapter of Kitchin's The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences, wherein BIG DATA is detailed.

Here’s a nice video overview with Kitchin that captures some of the key points highlighted thus far, as well as those in this post regarding BIG DATA:

The thrust of Kitchin's Chapter 4 is the theoretical positioning of BIG DATA, with special emphasis on its unique characteristics. BIG DATA characteristics are often summarized as either the '3 Vs' or the '4 Vs'; Kitchin expounds on and adds to these, and discusses some of their ramifications. In later chapters, Kitchin expands on the roles of users, corporations and producers of BIG DATA.

4Vs_1[ 4Vs ]

KEY PT. 16

BIG DATA is slippery; there is not a thoroughly agreed upon ‘industry standard’ definition…

Like many terms used to refer to the rapidly evolving use of technologies and practices, there is no agreed academic or industry definition of big data. The most common makes reference to the 3Vs: volume, velocity and variety (Laney 2001; Zikopoulos et al. 2012).

KEY PT. 17

BIG DATA is BIG and it's only getting a lot BIGGER; and finding places to store all this data is becoming a real challenge.

…the number of servers (virtual and physical) worldwide will grow by a factor of 10, the amount of information managed by enterprise datacenters will grow by a factor of 50, and the number of files the datacenter will have to deal with will grow by a factor of 75, at least. (Gantz and Reinsel, 2011)

…the data generated are so vast that they neither get analysed nor stored, consisting instead of transient data. Indeed, the capacity to store all these data does not exist because, although storage is expanding rapidly, it is not keeping pace with data generation (Gantz et al. 2007; Manyika et al. 2011).


KEY PT. 18

BIG DATA usually operates on the mantra ‘More is Better’; sheer volume of data becomes a valued hallmark of BIG DATA projects. Kitchin terms this capacity of BIG DATA as Exhaustivity.

…big data projects strive towards capturing entire populations (n=all), or at least much larger sample sizes than would traditionally be employed in small data studies (Mayer-Schonberger and Cukier 2013).

Like other forms of data, spatial data has grown enormously in recent years, from real-time remote sensing and radar imagery, to large crowdsourced projects such as [OSM], to digital spatial trails created by GPS receivers being embedded in devices.

…its little wonder we’re drowning in data. If we can track and record something, we typically do. (Zikopoulos et al. 2012)

KEY PT. 19

In addition to its sheer volume, BIG DATA is also becoming much more resolute (large scale in map terminology) and indexed (identifiable as unique instances).

In addition to data exhaustivity, big data are becoming much more fine-grained in their resolution, together with a move towards strong indexicality (unique labelling and identification) (Dodge and Kitchin 2005).

This increase in the resolution of data has been accompanied by the identification of people, products, transactions and territories becoming more indexical in nature [i.e., a bottle of shampoo can now be tagged with an RFID chip with unique ID code].

KEY PT. 20

Along with the indexicality of BIG DATA, there is a trend to amplify the relationality of different datasets; that is, interconnections between BIG DATASETS are increasingly valuable, especially to data brokers and profiling companies.

Relationality concerns the extent to which different sets of data can be conjoined and how those conjoins can be used to answer new questions.

…it is the ability to create data that are highly relational that drives the vast data marketplace and the profits of data brokers and profiling companies.

KEY PT. 21

Like so many contemporary human activities, BIG DATA is marked by a default, ‘Always On’ mode wherein data is continuously streaming- in the 3/4Vs terminology, this is known as Velocity.

A fundamental difference between small and big data is the dynamic nature of data generation. Small data usually consist of studies that are freeze-framed at a particular time and space… For example, censuses are generally conducted every five or ten years. In contrast, big data are generated on a much more continuous basis, in many cases in real-time or near to real-time.

Analysing such streaming data is also a challenge because at no point does the system rest, and in cases such as the financial markets micro-second analysis of trades can be extremely valuable.

KEY PT. 22

BIG DATA further differs from Small Data in that its actual structure can be, and often is, much more Varied; that is, full of structured, unstructured or semi-structured data within one BIG DATASET.

Both small and big data can be varied in their nature, being structured, unstructured or semi-structured, consisting of numbers, text, images, video, audio and other kinds of data. In big data these different kinds of data are more likely to be combined and linked, conjoining structured and unstructured data… Small data, in contrast, are more discrete and linked, if at all, through key identifiers and common fields.

As the Open Data Center Alliance (2012: 7) notes, [p]reviously, unstructured data was either ignored or, at best, used inefficiently. However, advances in distributed computing and database design using NoSQL structures, and in data mining and knowledge discovery techniques, have hugely increased the capacity to manage, process and extract information from unstructured data.

KEY PT. 23

Small Data has traditionally been relatively rigid in terms of research design and data management. In BIG DATA, the opposite is true: the sheer volume and velocity of BIG DATA ensure both the necessity of, and the capacity for, increased Flexibility through new database designs and data techniques.

The use of NoSQL databases means that changeable data can be managed at high velocity, adapting to new fields. This means that it is possible to adapt data generation on a rolling basis and to perform adaptive testing… Because the volumes of people using these sites [Facebook, Google, etc.] are vast, their sample sizes are enormous, meaning they can make changes without fear of losing representativeness.

As outlined above, Kitchin's seven characteristics of BIG DATA, beyond the typical 3/4Vs, raise serious questions about our 'deluge of data'. As Kitchin states, 'What does it mean for society, government and business to gain access to very large, exhaustive, dynamic, fine-grained, indexical, varied, relational, flexible and scalable data? To what extent can such data provide penetrating insights into the human condition or help address some of the most pressing social, political, economic and environmental issues facing the planet?' Indeed, these are the 'BIG' questions that we should and must ask of BIG DATA.

Getting a handle on GeoData- Part II

Following Part I 'Getting a Handle on GeoData', this post has been derived primarily from Rob Kitchin's new publication The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. If these posts are of interest, I strongly recommend getting the book, as my culled points are designed specifically to address issues that I foresee as valuable for an upcoming, introductory lecture on GeoData- and they admittedly miss an awful lot of the valuable material found in the book.

In Kitchin's second chapter, Small Data, Data Infrastructures & Data Brokers, the stage is set by calling out 'Small Data' against the hyped, emerging backdrop of 'Big Data'. Doing a quick image search for 'Big Data', the visual analogies fall generally into 3 types: Big Data as cities; Big Data as tubes; and Big Data as the world at large. Visually, there is a real lack of precision as to what Big Data actually is, except for zeros and ones spanning across BIG THINGS. Kitchin takes on the vagueness of Big Data by situating it against Data we already know well….

BDall [ Big Data doing Big Things! ]

KEY PT. 12

We can know Big Data by what it is NOT (small data).

“Until [recently] the term ‘small data’ was rarely, if ever, used. Its deployment has arisen purely as the de facto oppositional term to so-called ‘big data’.”

“Small data may be limited in volume and velocity, but they have a long history of development, with established methodologies and modes of analysis, and a record of producing answers to scientific questions.”

“Big data generally capture what is easy to ensnare- data that are openly expressed (what is typed, swiped, scanned, sensed, etc.; people’s actions and behaviors; the movement of things)….”

Small DATA are those map portals and ‘tailored’ spatial repositories we’ve known and used day-in-and-out for the last two decades.

Following Kitchin's line of thought of understanding Big by way of Small, private actors like Facebook, Google, Wal-Mart, Amazon and Netflix do not particularly attend to 'tailored', small data. They are in it for data derived from people's (consumers') actions and behaviors, and they see the value, especially, of a LOT of this Big Data. As a tag-along, Data Brokers, Aggregators, Consolidators and Resellers repackage Big Data, creating not a 'Data Commons' of small data, but rather proprietary, closed products. We know them, again vaguely, as Epsilon, Acxiom, Datalogix, Alliance Data Systems, eBureau, ChoicePoint, CoreLogic, Equifax, Experian, ID Analytics, Infogroup, Innovis, Intelius, Recorded Future, Seisint, TransUnion, etc.

Yeah… not the first stop for a GIS analyst or cartographer looking for open, well-managed spatial data!

KEY PT. 13

The Big Data hype has effectively masked criticism of the emerging Big Data industry.

“By gathering together large stores of small data held by public institutions and private corporations and mashing them together with big data flows, data brokers can produce various kinds of detailed individual and aggregated profiles that can be used to micro-target, assess and sort markets, providing high-value intelligence for clients.”

“Interestingly, given the volumes and diversity of personal data that data brokers and analysis companies possess, and how their products are used to socially sort and target individuals and households, there has been remarkably little critical attention paid to their operations.”

“At present, data brokers are generally largely unregulated and are not required by law to provide individuals access to the data held about them, nor are they obliged to correct errors relating to those individuals (Singer 2012b).”

KEY PT. 14

As a corollary to Big Data, the Open Data Movement has emerged as an increasingly effective approach to 'opening' closed data- often in the public sector, and often targeting Small Data.

The Open Data Movement seeks to radically transform this situation [closed and/or proprietary data], both opening up data for wider reuse and providing easy-to-use research tools that negate the need for specialist analytic skills. The movement is built on three principles: openness, participation and collaboration (White House 2009).

In particular, attention has been focused on opening data that has been produced by state agencies (often termed public sector information – PSI) or publicly funded research.

principles [ OpenGovData ]

For Berners-Lee (2009), open and linked data should ideally be synonymous and he sets out five levels of such data, each with progressively more utility and value. His aspiration is for what he terms five-star (level five) data- a fully operational semantic Web.

5star [ 5 Star Open Data ]

KEY PT. 15

Even as the Open Data Movement has shown some successes, politics, sustainability, utility and usability are all significant, unresolved issues. In short, Open Data has its own hype issues which can mask important and controversial ramifications of the movement.

At one level, the case for open and linked data is commonsensical…However, the case…is more complex, and their economic underpinnings are not at all straightforward.

Much more critical attention then needs to be paid to how open data projects are developing as complex sociotechnical systems with diverse stakeholders and agendas.

Getting a handle on GeoData- Part I

In the summer of 2014, ESRI launched their ‘open data’ portal, a bit of a jump the shark moment. Everything BIG DATA and OPEN DATA is all the rage for many converging reasons- and has been so for a while now.

ArcGIS Open Data Portal [ ArcGIS Open Data Portal ]

From an academic perspective, this is all very exciting but challenging, in that it's difficult to simply teach students new to GIS all the whats and hows of a system AND cover DATA sufficiently. Often students walk away with the notion that GeoData = shapefiles and rasters. How do we get beyond all the file structures and formats, not to mention the system itself, AND convey the important dimensions and ramifications of DATA itself?

Solution #1: Rob Kitchin's new publication The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences

data revolution [ Data Revolution ]

Published in 2014, this is an ideal guide to the essentials of what DATA is; what we are currently doing with it that is fundamentally different from the past; and, finally, speculation on the ramifications of both BIG and OPEN DATA for information systems.

Broken into the chapters listed below, the book strikes me as the perfect outline for a complete overhaul of a DATA lecture I've tried to sandwich between Intro to Vector and Raster Model lectures! Introductory GIS courses often don't really consider GeoData in depth, much less DATA itself as a stand-alone lecture topic; so this is a bit of an unorthodox approach, but one whose time I think has come.

Kitchin’s TOC:

  • 01 Conceptualising Data
  • 02 Small Data, Data Infrastructures & Data Brokers
  • 03 Open and Linked Data
  • 04 Big Data
  • 05 Enablers and Sources of Big Data
  • 06 Data Analytics
  • 07 The Governmental and Business Rationale for Big Data
  • 08 The Reframing of Science, Social Science and Humanities Research
  • 09 Technical and Organisational Issues
  • 10 Ethical, Political, Social and Legal Concerns
  • 11 Making Sense of the Data Revolution
To get started, I'm planning to utilize some of Kitchin's points that strike a chord from each chapter to form a narrative designed to live as a full-fledged lecture. This is the Part 1 post of several as I develop this lecture over the next 2 weeks…

    KEY PT. 1

    The SCALE/SPEED of DATA today:

“The scale of the emerging data deluge is illustrated by the claim that ‘[b]etween the dawn of civilization and 2003, we only created 5 EXABYTES of information; now we’re creating that amount EVERY TWO DAYS’ (Hal Varian, chief economist with Google, cited in Smolan and Erwitt 2012).”

    3,000 Years = Every 2 Days

    Here’s Hal:
[ Hal Varian ]

    KEY PT. 2

    Buzz vs. Reality

    “These new opportunities have sparked a veritable boom in what might be termed ‘data boosterism’; rallying calls as to the benefits and prospects of big, open and scaled small data, some of it justified, some pure hype and buzz.”

    KEY PT. 3

    DATA is NOT Neutral

“While many analysts may accept data at face value, and treat them as if they are neutral, objective, and pre-analytic in nature, data are in fact framed technically, economically, ethically, temporally, spatially and philosophically. Data do not exist independently of the ideas, instruments, practices, contexts and knowledges used to generate, process and analyse them (Bowker 2005; Gitelman and Jackson 2013).”

    KEY PT. 4

DATA is NOT random- it's completely SELECTED

    “Data harvested through measurement are always a selection from the total sum of all possible data available- what we have chosen to take from all that could potentially be given. As such, data are inherently partial, selective and representative, and the distinguishing criteria used in their capture has consequence.”

    KEY PT. 5

    DATA is NOT facts or information per se, and in relation to computers (and GIS) needs to be PROCESSED

“…from a computational position data are collections of binary elements that can be processed and transmitted electronically….data constitute the inputs and outputs of computation but have to be processed to be turned into facts and information (for example, a DVD contains gigabytes of data but no facts or information per se) (Floridi 2005).”

    KEY PT. 6

    Data Vary across 5 main attributes

  • Form- Qualitative vs. Quantitative
  • Structure- Structured, semi-structured or unstructured
  • Source- Captured, derived, exhaust, transient
  • Producer- Primary, secondary, tertiary
  • Type- Indexical, attribute, metadata.
KEY PT. 7

Quantitative data is generally MORE common than qualitative data relative to GIS (my assertion). As such, it is measured across 4 general categories (a minimal pandas sketch follows the list):

  • Nominal > Categorical > unmarried vs. married
  • Ordinal > Rank Order > low, medium, high
  • Interval > Measured across a certain scale > temperature on the Celsius scale
  • Ratio > Scale possesses a true zero origin > exam on a scale of 0-100
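
A minimal pandas sketch of these four levels, using illustrative values only; ordered categoricals are one convenient way to make the nominal/ordinal distinction explicit:

import pandas as pd

df = pd.DataFrame({
    "marital_status": ["unmarried", "married", "married"],  # nominal: categories, no order
    "risk_level": ["low", "high", "medium"],                 # ordinal: rank order only
    "temp_celsius": [18.5, 21.0, 19.2],                      # interval: differences meaningful, no true zero
    "exam_score": [88, 0, 73],                               # ratio: true zero origin
})

# Nominal: an unordered categorical
df["marital_status"] = pd.Categorical(df["marital_status"])

# Ordinal: an ordered categorical, so sorting and comparisons respect the ranking
df["risk_level"] = pd.Categorical(
    df["risk_level"], categories=["low", "medium", "high"], ordered=True
)

print(df["risk_level"].min())  # -> 'low'
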
KEY PT. 8

    Primary vs. Secondary vs. Tertiary COLLECTION processes

  • Primary > Researcher
  • Secondary > One person’s primary data can be used ‘secondarily’ by another
  • Tertiary > Derived data- counts, categories and statistical results.
KEY PT. 9

    Indexical vs. Attribute vs. Metadata TYPES of data

  • Indexical > Indexical data are those that enable identification and linking (unique identifiers- important in GIS)
  • Attribute > Represent aspects of a phenomenon, but not indexical in nature.
  • Metadata > “Data about data”.
KEY PT. 10

Regardless of how you think about DATA, it's at the bottom of the 'Knowledge Pyramid'

    pyramid[ Knowledge Pyramid ]

    KEY PT. 11

    Databases and Data Infrastructures: where/how DATA is stored, organized and put to work- Not a benign process at all.

“Data infrastructures host and link databases into a more complex sociotechnical structure. As with databases, there is nothing inherent or given about how such archiving and sharing structures are composed.”

    GPS + ArcGIS vs. GPS + QGIS: Pluses and Minuses

The benefits of, and preferences for, one desktop GIS over the other – specifically ArcGIS vs. QGIS – are legion, and not the subject of this post. Both platforms do, however, tout the potential to interface with handheld GPS units for both upload and download of geospatial data: waypoints, routes and tracks.

Handheld, recreational GPS units (as opposed to commercial units like Trimbles) continue to advance in their locational precision, and they are increasingly used for field survey work that doesn't need the surveyor-grade precision of a commercial unit. Precision of around 5 meters is relatively common at this point, especially with the GPS + GLONASS found in many units such as the newer Garmins.

What doesn't seem to be keeping up with the advances in the handhelds themselves are the interface options in both ArcGIS and QGIS. Both platforms require extra plugins/dependencies to effectively sync directly with a handheld GPS. This is nothing less than frustrating. For QGIS, some leeway can be given for the open-source nature of the platform; plugins are developed by its open development community. But for ArcGIS, which relies on proprietary, profit-driven development, not to feature quick, easy and dependable handheld support as of the latest release (10.2.2) is a definite oversight.

For ArcGIS, STILL (2015!) the only port option available by default in the GPS interface tool is a serial port. ArcGIS will simply not recognize the standard USB port that is part and parcel of handhelds. Really, the only option for connecting a GPS to ArcGIS directly is to create a virtual port via third-party software. North River Geographic has a post on this process of creating a virtual port; and MxGPS has an extension– both approaches are certainly more hassle than simple 'plug and play'.

    ArcGIS serial port GPS interface [ ArcGIS serial port GPS interface ]

For QGIS, GPS capability resides in its plugin architecture, specifically its GPS Tools plugin. On the face of it, this is all well and good, but there are, again, dependencies in the form of GPSBabel that need to be downloaded and installed to create the bridge between the GPS unit and QGIS. This is not without its complications, as discussed in this GIS Stack Exchange post.

    QGIS GPS Tools Plugin [ QGIS GPS Tools Plugin ]


    QGIS GPS Tools Interface for USB [ QGIS GPS Tools Interface for USB ]

    Given the limitations outlined above of both ArcGIS and QGIS relative to GPS, I’ve been resorting to a workaround found in an application designed and developed by the Minnesota Department of Natural Resources- the DNRGPS.


Although it doesn't provide a direct connection with either ArcGIS or QGIS, it operates much better than either platform's interface for the unit itself in prepping both uploads and downloads between unit and desktop.

    DNRGPS- Initial Load [ DNRGPS- Initial Load ]

    Typical File Types [ DNRGPS- Typical File Types ]

DNRGPS is not designed to work specifically with QGIS, but it does handle .gpx and other geospatial data types common to QGIS. It's not designed to run on Mac and is not tested on Linux, but it is stable on Windows. It features good documentation; it's open source, accessible, free and built on dependable components, specifically GDAL, GPSBabel, PROJ4 and ESRI's File GDB API. I like it and now depend on it as my 'go-to' GPS input/output step prior to mapping in ArcGIS or QGIS.
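
For those comfortable scripting the hand-off, a .gpx file pulled off the unit (via DNRGPS or otherwise) can also be converted without either GUI. Here is a minimal sketch assuming geopandas with a GPX-capable GDAL/Fiona build and a hypothetical 'field_survey.gpx':

import geopandas as gpd

# GPX files expose several layers; "waypoints" and "tracks" are the usual ones
waypoints = gpd.read_file("field_survey.gpx", layer="waypoints")
tracks = gpd.read_file("field_survey.gpx", layer="tracks")

# Write shapefiles that either ArcGIS or QGIS will open directly
waypoints.to_file("field_survey_waypoints.shp")
tracks.to_file("field_survey_tracks.shp")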

    Syrian Displacement Camps- rapid formation as urban ‘growth’

While the Syrian Civil War rages on, millions of Syrians have been displaced both within and outside the country. It's a nearly unprecedented displacement, the spatial result resembling full-fledged, fully formed cities in their own right. Among these DP (displaced persons) camps, Zaatari is arguably the best known due to its massive scale and organization. An excellent overview report of 'The Instant City' is available from the Affordable Housing Institute, which details the camp's social and built infrastructure. From the NY Times, the camp/city's 'grassroots economy' is considered.

Beyond the social and civil ramifications of this displacement, the camp/city's spatial 'reorganization' viewed from afar is nothing less than thoroughly alarming. The DP camps that are emerging from the crisis are unprecedented and atypical- they have become a new form of urban growth. As beneficial as research efforts such as NYU Stern's Planet of Cities are at cataloging urban growth through innovative methodologies, the speed and scale of the emerging era of DP camps may prove an ultimate challenge to both academic research and the urban planning field.

Considering three of the Syrian DP camps- Zaatari, Kilis & Mrajeeb Al Fhood- each camp's unique formation is easily captured through Google Earth imagery. The Bellingcat journalism site has a post describing the process. Further, in the case of Zaatari, the 'city' is now featured part and parcel in select map engines; certainly in OpenStreetMap.

    [OSM Coverage- Zaatari DP Camp]

Mapping the lat/long of each of the three camps reveals another shared characteristic: they all maintain some proximity to more 'normal' urban centers and typical urban growth patterns. In this way, they are one with global urban sprawl patterns, but with one critical caveat: this is a growth pattern not of dispersion but of intense concentration- suburbia's alter ego, so to speak.

To follow: each camp's lat/long mapped in proximity to urban centers, and a Google Earth imagery composite of each camp's rapid emergence:

zataari_overview [ Zaatari Displacement Camp ]

[ Zaatari Displacement Camp ]

[ Zaatari Camp Expansion- Circa 2013 via Google Earth Imagery ]

killis_overview [ Kilis Displacement Camp ]

[ Kilis Displacement Camp ]

[ Kilis Oncupinar Accommodation Facility Expansion- Circa 2013 via Google Earth Imagery ]

azraq_overview [ Azraq DP Camp ]

[ Azraq displacement camp ]

Al_AZRAQ_sm [ Azraq displacement camp ]

    Data + Mapping for Ebola Epidemic, West Africa

    In response to the Ebola epidemic, mapping and open data communities have come together in various ad-hoc forums to collectively advance knowledge, data & innovation- often from afar, often in hopes of being useful to individuals and organizations situated in West Africa.

    I attended the Ebola Open Data Jam with Meetup on 10/18/2014 in NYC; the following session is designed in part to continue the work that was done during the first meetup in NYC:

    Open Data for Africa Planning 10/7/2014 | Washington DC

These sessions are supported and promoted by individuals and organizations, specifically Steven Adler with IBM, Jeanne Holm with NASA, and Rich Robbins with Upper West Strategies.

    During the NYC session, there was a lot of ‘noise’ to start, and some ‘signal’ at the end. Specifically, three major themes emerged. First, the need to develop, decipher and distribute data sets in open repository formats; second, the need to get a handle on the locational capacity of cell phone usage and SMS, as well as ‘big data’ originating in West Africa; and third, contributing VGI (volunteered geographic information) to assist logistical efficiency of humanitarian efforts and organizations.

The first, understandable task for dealing with the epidemic from a technical standpoint is simply to get data accessible in an open format. This is not an easy task at all, as West Africa is typical of many developing regions in that data organization is not necessarily a priority issue. To address this challenge, the meetup is utilizing a DKAN open data platform. Steven Adler has been adamant that data supported within the platform conform to a standard metadata model (DCAT > DCAT+).

    The repository resulting from the meetup sessions:

    [ ]

    The second task involves harnessing the power of locational capacity of cell phones, SMS (short message service) and other ICT (information and communications technology) to track and analyze the potential spread of the epidemic. In the humanitarian community there’s certainly been marked progress over the years in strategies for working with ICT; the Ebola epidemic is yet another test of these advances. The meetup struggled with conceptualizing an approach to deal with ‘big data’, esp. Twitter, utilizing its time stamp and locational capacity. There was a fair amount of discussion on the prospect of gaining access to ICT data in West Africa. One very recent advance that may prove helpful in this ongoing task is the establishment of Social Media Hashtag Standards for Disaster Response. This standardization would greatly assist in the effort to pull out the ‘signal’ from social media ‘noise’.

The third task incorporated the VGI efforts of a breakout group of HOT mappers for West Africa. Generally using the iD editor platform for editing OSM through HOT tiles, the group was able to make OSM edits for high-priority tiles. The following visualization captures the general 'before' OSM coverage vs. the 'after'; it's striking, and typical of the amazing advancement that is HOT- allowing VGI mappers to focus effectively on specific tasks.

    OSM Coverage 'Before' [ OSM Coverage ‘Before’ ]

    OSM Coverage 'After' [ OSM Coverage ‘After’ ]

    OSM HOT Mappers - Ebola Open Data [ OSM HOT Mappers – Ebola Open Data ]

    Going forward, hopefully these voluntary efforts will indeed be useful to fighting the Ebola epidemic in West Africa. Certainly there will be ‘lessons learned’ from these efforts that can then feed into further advancements in Open Data & Mapping for Humanitarian Efforts.

    Additional Resources for Open Data & Mapping for Ebola in West Africa:

    HDX repository for Ebola

    Good, insightful overview of tech efforts for Ebola intervention from IBM

    Gabriele Almon’s overview post for Ebola mapping resources

    Caitlin Rivers github repository

NYC Census Tracts – Idiosyncrasies & Vagaries

Thematic mapping with U.S. Census data is of course a very common, valid approach to socioeconomic analysis across time and geography. The census tract areal unit is often an appropriate geography for dense urban cities as well as for less dense suburban and rural locations. But there are situations where complications arise, several in particular that hamper thematic mapping across greater NYC, composed of its 5 boroughs- Kings County, Queens County, Bronx County, New York County and Richmond County.

The first issue arises from the contiguous nature of census tracts germane to the TIGER (Topologically Integrated Geographic Encoding and Referencing) format developed by and for U.S. Census data. The geographic component of census tracts works well in instances where the topography is uniform; for example, rural areas in the Midwest where land is relatively uninterrupted. This is not the case in dense urban areas that abut interrupting water bodies. In greater NYC, the Hudson, the East River and various harbor waterways pose a real complication to the contiguous nature of TIGER. In effect, the TIGER geography is too inclusive in these situations, as it does not factor in the difference between land (where people live) and water (where people usually don't live).

To address this issue, City Planning utilizes an amended file essentially clipped to the shoreline. However, this clipping process introduces another complication; on balance, for general thematic mapping, it's usually the preferred problem. The issue is an 'overhang' of census tracts that really belong in one borough but end up stranded along the shoreline of another borough. In the following case, a Manhattan tract is stranded across the East River in Brooklyn.

[ Census Tract 'overhang' across East River ]
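
A rough sketch of that clipping step with geopandas follows; the file names are hypothetical, and the land polygon stands in for whatever shoreline boundary is used to amend the TIGER tracts:

import geopandas as gpd

tracts = gpd.read_file("tiger_tracts_nyc.shp")                     # water-inclusive TIGER tracts
land = gpd.read_file("nyc_land_shoreline.shp").to_crs(tracts.crs)  # shoreline/land polygon

# Clip the tracts to the land polygon so the water area is removed
tracts_clipped = gpd.clip(tracts, land)
tracts_clipped.to_file("nyc_tracts_shoreline_clipped.shp")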

The second issue – certainly not unique to changing cities – is tract change over time, in this case from 2000 to 2010. This presents a significant complication to accurate longitudinal mapping. If the areal geometry and/or attribute identification between the earlier and later tract states is in any way different, an 'apples to apples' analysis cannot proceed in bulk. Many tracts may remain stable, but those that change do so in predictable but variable ways- enough to demand alternative mapping approaches. In the following, the blue highlights are locations where a change has taken place in the 2010 tract; the orange represents change that has taken place in the 2000 tract:

[ 2000 – 2010 census changes ]

The U.S. Census does a good job typifying these changes through online documentation. The first resource is an overview of the 2000-2010 changes; the second is the actual relationship files themselves. With the relationship files in hand (.csv), the changes notated by the U.S. Census can be tagged to the GEOID of each 2010 tract (a scripted sketch of this tagging step follows the change-type examples below). The following shows tracts that have changed in some way (light blue) vs. those that have remained stable across 2000 – 2010 in both their geometry and attributes (dark blue):


Tract changes essentially fall into several predictable categories. Interestingly, there are approximately a dozen geometry changes that the U.S. Census relationship file does not tag but that do in fact exist as 2000-2010 changes in greater NYC. These particular changes can be typified generally as REVISIONS- usually relatively small changes along the edges of a tract. The following example occurs at Holy Cross Cemetery in Brooklyn, wherein a half block of buildings is brought into the 2010 expression of tract 085200:

[ Tract 085200 Revision ]

    The second change involves the consolidation of 2000 tracts into 2010 tracts, what the U.S. Census terms a MERGE:

[ Census Tract Consolidation ]

    The third change is simply a SPLIT of a 2010 tract, often due to a significant increase in population density within the 2000 tract geometry from 2000 to 2010:

[ Census Tract Split at 027400 2000 Tract ]
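
Picking back up on the relationship files, here is a minimal pandas/geopandas sketch of tagging change status onto the 2010 tract geometry. The paths are hypothetical, the GEOID00/GEOID10 column names follow the published relationship-file layout and should be verified against the actual .csv, and it is only a rough first pass (revisions whose GEOID did not change will slip through, exactly as noted above):

import geopandas as gpd
import pandas as pd

tracts_2010 = gpd.read_file("nyc_tracts_2010.shp")
rel = pd.read_csv("tract_relationships_2000_2010.csv", dtype=str)

# Flag a 2010 tract as changed if it relates to a 2000 tract with a different
# GEOID (splits, renumbering) or to more than one 2000 tract (merges)
rel["differs"] = rel["GEOID00"] != rel["GEOID10"]
status = rel.groupby("GEOID10").agg(
    n_2000=("GEOID00", "nunique"),
    any_differs=("differs", "any"),
)
status["changed"] = status["any_differs"] | (status["n_2000"] > 1)

# Attach the flag to the 2010 tract geometry for mapping
tracts_2010 = tracts_2010.merge(
    status[["changed"]], left_on="GEOID10", right_index=True, how="left"
)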

In greater NYC, these tract changes can be further typified via GIS analysis utilizing a tabulated intersection (an overlay-based sketch follows below). The results can be joined with the 2010 tract geometry to quantify the percentage of change. This works well for REVISIONS and MERGES, but it does not capture SPLITS, as the input zone feature – the 2010 tracts – does not have a lesser class feature by which to quantify percentage of change. Of the 2,166 2010 tracts, approximately 7% are REVISIONS and 10% MERGES. If the U.S. Census relationship files are taken into account for unaltered tracts, approximately 65% of tracts are UNALTERED, leaving approximately 18% as SPLITS. These are very approximate numbers resulting from loose SQL selections and reliance on the U.S. Census relationship file categorizations.
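
As an open-source stand-in for the tabulated-intersection step, a geopandas overlay can measure how much of each 2010 tract is covered by its 2000 counterparts (hypothetical paths and field names; both layers should be in a projected CRS such as EPSG:2263 so the areas are meaningful):

import geopandas as gpd

t2000 = gpd.read_file("nyc_tracts_2000.shp").to_crs(epsg=2263)
t2010 = gpd.read_file("nyc_tracts_2010.shp").to_crs(epsg=2263)

# Intersect the two vintages and express each piece as a share of its 2010 tract
t2010["area_2010"] = t2010.geometry.area
pieces = gpd.overlay(t2010, t2000, how="intersection")
pieces["pct_of_2010"] = pieces.geometry.area / pieces["area_2010"] * 100

# The largest single 2000 contributor per 2010 tract: values well below 100
# suggest a merge or revision rather than a stable tract
max_share = pieces.groupby("GEOID10")["pct_of_2010"].max()
print(max_share.sort_values().head())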

Regardless of the exact breakdown of no change, revisions, merges and splits, it's clear that there's a lot of discrepancy between the 2000 and 2010 census tracts for greater NYC. If one wants to map change over time, what's the best way to proceed? There are several workarounds, all with their own pluses and minuses.

The first option is to conduct analysis and mapping using the smaller, often considered more stable, U.S. Census block. This gives a finer-grained unit of analysis useful both for picking up generalized patterns at smaller scales and for capturing important 'street level' differences at larger scales. This is indeed the approach taken by CUNY to such great effect:

Important to note, however, is that changes between 2000 and 2010 at the census block level have indeed occurred (the absolute difference in named blocks alone is 2,076 – approximately a 5.5% gain in 2010 over 2000); it's just that the overall effect for mapping – especially at smaller scales – is less of a burden than with census tracts. Regardless, one is still presented with the same normalization challenges if the goal is to map longitudinally within census blocks and forgo the 'side-by-side' approach adopted by CUNY.

Given this lingering normalization issue regardless of the particular U.S. Census areal unit, a second option is to conduct areal interpolation from one census areal unit into another, and proceed by comparing interpolated 2000 data against 2010 census tracts or blocks. This approach has its own baked-in accuracy issues, but it allows 2000 data to be normalized into 2010 geometry, sidestepping the main 'apples to apples' analysis challenge discussed thus far.
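
A minimal sketch of that area-weighted interpolation, under its usual simplifying assumption that the variable is spread evenly within each 2000 tract; file and field names such as POP2000 are hypothetical, and dedicated tools (the tobler package, for example) offer more refined methods:

import geopandas as gpd

t2000 = gpd.read_file("nyc_tracts_2000.shp").to_crs(epsg=2263)
t2010 = gpd.read_file("nyc_tracts_2010.shp").to_crs(epsg=2263)

# Each intersection piece carries a share of the 2000 count proportional to
# the fraction of its source tract's area that the piece represents
t2000["src_area"] = t2000.geometry.area
pieces = gpd.overlay(t2000, t2010[["GEOID10", "geometry"]], how="intersection")
pieces["pop_share"] = pieces["POP2000"] * (pieces.geometry.area / pieces["src_area"])

# Sum the shares within each 2010 tract and attach them to the 2010 geometry
pop_on_2010 = pieces.groupby("GEOID10")["pop_share"].sum().rename("POP2000_interp")
t2010 = t2010.merge(pop_on_2010, left_on="GEOID10", right_index=True, how="left")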

A third, and likely the easiest/best, approach may be to rely on the good work of Spatial Structures in the Social Sciences at Brown University and utilize their open data for longitudinal mapping analysis. This is a great resource that provides data normalized to 2010 census geometries for mapping as far back as the 1970s. Further, the tools provided by the program allow a user's own data to supplement the essential socioeconomic variables currently available.

Finally, a great resource for 'snapshots' of particular census geographies (having little to do with longitudinal analysis per se) is Census Reporter, using predominantly American Community Survey data from 2012.