NASA Earth Meteorite Landings

25 May 2020

This post shows how to combine a geographic map (loaded from a GeoJSON file) with data points (in this case, meteorite landings) into a single map and visualize where the meteorites fell.

We will use NASA’s Open Data Portal API, which serves a comprehensive dataset from The Meteoritical Society containing information on all known meteorite landings. For this tutorial we will request the data and import it as JSON. Our goal is to extract the following features, build a dataframe from them, and then map the points to visualize the landing locations. The columns we are interested in are listed below:

- ID
- Year
- Fall
- Name
- Name Type
- Mass
- Latitude
- Longitude

As usual, let us start by importing all the libraries we are going to use.

import pandas as pd
import geopandas as gpd #used for transforming geolocation data
import matplotlib.pyplot as plt

from datetime import datetime  #to handle dates that fall outside the
#timestamp range supported by pandas.to_datetime
from shapely.geometry import Point  #transform latitude/longitude to geo-coordinate data
from geopandas.tools import geocode #get the latitude/longitude for a given address
from geopandas.tools import reverse_geocode  #get the address for a location using latitude/longitude

import requests  #similar to urllib, this library allows a computer to ping a website
import json      #library to handle JSON formatted data
%matplotlib inline

The URL we are accessing needs no authentication. When accessing APIs, it is good practice to check whether authentication keys are required; it is also worth checking each website's sharing and usage policy. Since this endpoint is open, we can request the URL directly with the requests library; for reference, the hedged sketch below shows how a key would typically be passed if one were required.
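
A minimal sketch, assuming a Socrata-style application token (data.nasa.gov is a Socrata portal; the token value here is a placeholder and is not needed for this tutorial):

```python
import requests

# Hypothetical example: Socrata portals accept an optional application token
# in the X-App-Token header (or the $$app_token query parameter).
# "MY_APP_TOKEN" is a placeholder, not a real token.
headers = {"X-App-Token": "MY_APP_TOKEN"}
authed_resp = requests.get("https://data.nasa.gov/resource/y77d-th95.json", headers=headers)
```
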


#URL to NASA meteorite API
url = r"https://data.nasa.gov/resource/y77d-th95.json"

#the get function checks to make sure that the website/server is responding back
#200 means that we're good
#https://www.restapitutorial.com/httpstatuscodes.html
resp = requests.get(url)
resp
<Response [200]>

We can check the response status code with the line above; the link in the code comment explains what each code means. Once we get <Response [200]>, we can verify that we really are receiving data by printing the response out. Since the output can be large, it is advisable to limit how much of it we display.
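
As a quick aside, in a script (where we would not just display resp interactively) the same check can be made explicit; a minimal sketch using the requests status_code attribute and raise_for_status method:

```python
# Explicit status handling: 200 means OK; raise_for_status() raises an
# HTTPError for 4xx/5xx responses instead of failing silently later.
print(resp.status_code)
resp.raise_for_status()
```
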

#send a request to the website to return back text data from the API
#returns data as JSON string
str_data = resp.text
str_data[:1500]
'[{"name":"Aachen","id":"1","nametype":"Valid","recclass":"L5","mass":"21","fall":"Fell","year":"1880-01-01T00:00:00.000","reclat":"50.775000","reclong":"6.083330","geolocation":{"type":"Point","coordinates":[6.08333,50.775]}}\n,{"name":"Aarhus","id":"2","nametype":"Valid","recclass":"H6","mass":"720","fall":"Fell","year":"1951-01-01T00:00:00.000","reclat":"56.183330","reclong":"10.233330","geolocation":{"type":"Point","coordinates":[10.23333,56.18333]}}\n,{"name":"Abee","id":"6","nametype":"Valid","recclass":"EH4","mass":"107000","fall":"Fell","year":"1952-01-01T00:00:00.000","reclat":"54.216670","reclong":"-113.000000","geolocation":{"type":"Point","coordinates":[-113,54.21667]}}\n,{"name":"Acapulco","id":"10","nametype":"Valid","recclass":"Acapulcoite","mass":"1914","fall":"Fell","year":"1976-01-01T00:00:00.000","reclat":"16.883330","reclong":"-99.900000","geolocation":{"type":"Point","coordinates":[-99.9,16.88333]}}\n,{"name":"Achiras","id":"370","nametype":"Valid","recclass":"L6","mass":"780","fall":"Fell","year":"1902-01-01T00:00:00.000","reclat":"-33.166670","reclong":"-64.950000","geolocation":{"type":"Point","coordinates":[-64.95,-33.16667]}}\n,{"name":"Adhi Kot","id":"379","nametype":"Valid","recclass":"EH4","mass":"4239","fall":"Fell","year":"1919-01-01T00:00:00.000","reclat":"32.100000","reclong":"71.800000","geolocation":{"type":"Point","coordinates":[71.8,32.1]}}\n,{"name":"Adzhi-Bogdo (stone)","id":"390","nametype":"Valid","recclass":"LL3-6","mass":"910","fall":"Fell"'

We need to decode the JSON string, which can be done as follows.

#parse the JSON string into Python objects (here, a list of dictionaries)
NASAdata = json.loads(str_data)
#NASAdata

It is also good practice to verify that the JSON object is a list and to check its length:

type(NASAdata)
len(NASAdata)

To navigate the nested structure, we keep checking the type at each level until we reach the bottom of the hierarchy.


#look at the first record to see the first-level keys in the JSON object
NASAdata[0]
{'name': 'Aachen',
 'id': '1',
 'nametype': 'Valid',
 'recclass': 'L5',
 'mass': '21',
 'fall': 'Fell',
 'year': '1880-01-01T00:00:00.000',
 'reclat': '50.775000',
 'reclong': '6.083330',
 'geolocation': {'type': 'Point', 'coordinates': [6.08333, 50.775]}}

#dict['key']['key'][index]
#dictionary name, dictionary key, dictionary key, then list index
NASAdata[0]['geolocation']

{'type': 'Point', 'coordinates': [6.08333, 50.775]}
type(NASAdata[0]['geolocation'])
dict
NASAdata[0]['geolocation'].keys()
dict_keys(['type', 'coordinates'])

#dict['key']['key'][index]
#dictionary name, dictionary key, dictionary key, then list index
NASAdata[0]['geolocation']['type']

'Point'

type(NASAdata[0]['geolocation']['type'])

str

#dict['key']['key'][index]
#dictionary name, dictionary key, dictionary key, then list index
NASAdata[0]['geolocation']['coordinates']

[6.08333, 50.775]
type(NASAdata[0]['geolocation']['coordinates'])
list
len(NASAdata[0]['geolocation']['coordinates'])
2
NASAdata[0]['geolocation']['coordinates'][0]
6.08333
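
The manual drilling above can also be automated. Below is a small helper (not part of the original notebook) that walks a nested object and prints the type at each level, which can be handy for unfamiliar JSON:

```python
# Sketch: recursively report the type at each level of a nested JSON object.
def describe(obj, indent=0):
    pad = " " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            print(f"{pad}{key}: {type(value).__name__}")
            describe(value, indent + 2)
    elif isinstance(obj, list) and obj:
        print(f"{pad}list of {len(obj)} items; first item: {type(obj[0]).__name__}")
        describe(obj[0], indent + 2)

describe(NASAdata[0])
```
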

Let us now create empty lists to hold the retrieved information. We use the naming convention namels, where name is the column name and ls indicates that it is a list.

idmls = []
yearls = []
fallls = []
namels = [] 
nametypels = []
massls = []
latitudels = []
longitudels = []
#extract the values for the meteorite dataset

for meteor in NASAdata:
      
    #get the value of each key and if the key doesn't exist, set a variable to be None
    try: idm = meteor['id']
    except: idm = None
            
    try: year = meteor['year']
    except: year = None
            
    try: fall = meteor['fall']
    except: fall = None
        
    try: name = meteor['name']
    except: name = None
            
    try: nametype = meteor['nametype']
    except: nametype = None
            
    try: mass = meteor['mass']
    except: mass = None
            
    try: latitude = meteor['geolocation']['coordinates'][1]
    except: latitude = None
            
    try: longitude = meteor['geolocation']['coordinates'][0]
    except: longitude = None
            
    
    idmls.append(idm)
    yearls.append(year)
    fallls.append(fall)
    namels.append(name)  
    nametypels.append(nametype)
    massls.append(mass) 
    latitudels.append(latitude)
    longitudels.append(longitude) 
#check to see first 5 items of a few lists
print(idmls[:5])
print(yearls[:5])
print(fallls[:5])
print(namels[:5])
print(nametypels[:5])
print(massls[:5])
['1', '2', '6', '10', '370']
['1880-01-01T00:00:00.000', '1951-01-01T00:00:00.000', '1952-01-01T00:00:00.000', '1976-01-01T00:00:00.000', '1902-01-01T00:00:00.000']
['Fell', 'Fell', 'Fell', 'Fell', 'Fell']
['Aachen', 'Aarhus', 'Abee', 'Acapulco', 'Achiras']
['Valid', 'Valid', 'Valid', 'Valid', 'Valid']
['21', '720', '107000', '1914', '780']
#check number of items in each list
print(len(idmls))
print(len(yearls))
print(len(fallls))
print(len(namels))
print(len(nametypels))
print(len(massls))
print(len(latitudels))
print(len(longitudels))

1000
1000
1000
1000
1000
1000
1000
1000
#match indices of each list and zip into one list
meteorList = list(zip(idmls, yearls, fallls, namels, 
                    nametypels, massls, latitudels, longitudels))

#names for columns in data frame
colnames = ['id', 'year', 'fall','name', 'nametype', 'mass', 'latitude', 'longitude']
#create data frame with column names in it
df = pd.DataFrame(meteorList, columns=colnames)

df.head(10)

| | id | year | fall | name | nametype | mass | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1880-01-01T00:00:00.000 | Fell | Aachen | Valid | 21 | 50.77500 | 6.08333 |
| 1 | 2 | 1951-01-01T00:00:00.000 | Fell | Aarhus | Valid | 720 | 56.18333 | 10.23333 |
| 2 | 6 | 1952-01-01T00:00:00.000 | Fell | Abee | Valid | 107000 | 54.21667 | -113.00000 |
| 3 | 10 | 1976-01-01T00:00:00.000 | Fell | Acapulco | Valid | 1914 | 16.88333 | -99.90000 |
| 4 | 370 | 1902-01-01T00:00:00.000 | Fell | Achiras | Valid | 780 | -33.16667 | -64.95000 |
| 5 | 379 | 1919-01-01T00:00:00.000 | Fell | Adhi Kot | Valid | 4239 | 32.10000 | 71.80000 |
| 6 | 390 | 1949-01-01T00:00:00.000 | Fell | Adzhi-Bogdo (stone) | Valid | 910 | 44.83333 | 95.16667 |
| 7 | 392 | 1814-01-01T00:00:00.000 | Fell | Agen | Valid | 30000 | 44.21667 | 0.61667 |
| 8 | 398 | 1930-01-01T00:00:00.000 | Fell | Aguada | Valid | 1620 | -31.60000 | -65.23333 |
| 9 | 417 | 1920-01-01T00:00:00.000 | Fell | Aguila Blanca | Valid | 1440 | -30.86667 | -64.55000 |

df.tail(10)

| | id | year | fall | name | nametype | mass | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 990 | 23984 | 1986-01-01T00:00:00.000 | Fell | Tianzhang | Valid | 2232 | 32.94667 | 118.99000 |
| 991 | 23989 | 1878-01-01T00:00:00.000 | Fell | Tieschitz | Valid | 28000 | 49.60000 | 17.11667 |
| 992 | 23998 | 1927-01-01T00:00:00.000 | Fell | Tilden | Valid | 74800 | 38.20000 | -89.68333 |
| 993 | 23999 | 1970-01-01T00:00:00.000 | Fell | Tillaberi | Valid | 3000 | 14.25000 | 1.53333 |
| 994 | 24004 | 1807-01-01T00:00:00.000 | Fell | Timochin | Valid | 65500 | 54.50000 | 35.20000 |
| 995 | 24009 | 1934-01-01T00:00:00.000 | Fell | Tirupati | Valid | 230 | 13.63333 | 79.41667 |
| 996 | 54823 | 2011-01-01T00:00:00.000 | Fell | Tissint | Valid | 7000 | 29.48195 | -7.61123 |
| 997 | 24011 | 1869-01-01T00:00:00.000 | Fell | Tjabe | Valid | 20000 | -7.08333 | 111.53333 |
| 998 | 24012 | 1922-01-01T00:00:00.000 | Fell | Tjerebon | Valid | 16500 | -6.66667 | 106.58333 |
| 999 | 24019 | 1905-01-01T00:00:00.000 | Fell | Tomakovka | Valid | 600 | 47.85000 | 34.76667 |

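As an aside, pandas can also flatten the JSON records directly with pandas.json_normalize (pd.json_normalize in pandas 1.0+), which avoids building the lists by hand; a minimal sketch that reuses the reclat/reclong fields visible in the raw JSON earlier:

```python
# Sketch: build a comparable dataframe in a few lines with json_normalize.
# reclat/reclong hold the latitude/longitude strings in the raw records.
alt_df = pd.json_normalize(NASAdata)
alt_df = alt_df[['id', 'year', 'fall', 'name', 'nametype', 'mass', 'reclat', 'reclong']]
alt_df = alt_df.rename(columns={'reclat': 'latitude', 'reclong': 'longitude'})
alt_df['latitude'] = pd.to_numeric(alt_df['latitude'], errors='coerce')
alt_df['longitude'] = pd.to_numeric(alt_df['longitude'], errors='coerce')
```
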

Geolocation Data

Geospatial information is data that is referenced by spatial or geographic coordinates. The data we will be working with in this notebook is vector data - features represented by points, lines, and polygons. Points are defined by a pair of (x, y) coordinates; they usually represent locations, place names, and other objects on the ground. Lines are connections between points; they can have properties such as length, direction, and flow. Polygons are a series of lines connected together to form a closed shape; they can have properties such as area, perimeter, and centroid. For this notebook you will need to install the geopandas and geopy libraries, and download the continents.json file used in the example code below.
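
To make the three geometry types concrete, here is a small sketch (not from the original post) using shapely, the library geopandas builds on, with made-up coordinates:

```python
from shapely.geometry import Point, LineString, Polygon

pt = Point(6.08333, 50.775)                       # a single (x, y) location
line = LineString([(0, 0), (1, 1), (2, 0)])       # connected points; has a length
poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])  # closed ring; has area and centroid

print(pt.x, pt.y)                 # 6.08333 50.775
print(round(line.length, 3))      # 2.828
print(poly.area, poly.centroid)   # 4.0 POINT (1 1)
```
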

Geocoding and Reverse Geocoding

Geocoding is taking an address for a location and returning its latitudinal and longitudinal coordinates. Reverse geocoding would then be the opposite - taking the latitudinal and longitudinal coordinates for a location and returning the physical address.

#take an address and return coordinates
#returned variable is a geo-dataframe with 2 columns, geometry (the geographical shape) and the full physical address
ex1_geo = geocode("1830 Metzerott rd, Adelphi, MD", provider='nominatim')
ex1_geo

| | geometry | address |
|---|---|---|
| 0 | POINT (-76.96662 39.00261) | Metzerott Road, Avenel, Buck Lodge, Adelphi, P... |

#structure of full address
#each API structures addresses differently
ex1_geo['address'].iloc[0]
"Metzerott Road, Avenel, Buck Lodge, Adelphi, Prince George's County, Maryland, 20740, United States of America"
#use latitude and longitude to get physical address
#pass through using Point geometry
#also returns geo-dataframe with geometry and full physical address
ex2_geo = reverse_geocode([Point(-77.15879730243169, 39.0985195)], provider='nominatim')
ex2_geo

| | geometry | address |
|---|---|---|
| 0 | POINT (-77.15880 39.09852) | Montgomery College, 51, Mannakee Street, West ... |

Geocode a Dataframe column

We need to download the continents.json GeoJSON file for the world map that will be charted.

meteor_df = df
meteor_df.head()

| | id | year | fall | name | nametype | mass | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1880-01-01T00:00:00.000 | Fell | Aachen | Valid | 21 | 50.77500 | 6.08333 |
| 1 | 2 | 1951-01-01T00:00:00.000 | Fell | Aarhus | Valid | 720 | 56.18333 | 10.23333 |
| 2 | 6 | 1952-01-01T00:00:00.000 | Fell | Abee | Valid | 107000 | 54.21667 | -113.00000 |
| 3 | 10 | 1976-01-01T00:00:00.000 | Fell | Acapulco | Valid | 1914 | 16.88333 | -99.90000 |
| 4 | 370 | 1902-01-01T00:00:00.000 | Fell | Achiras | Valid | 780 | -33.16667 | -64.95000 |

#data type of each column
meteor_df.dtypes
id            object
year          object
fall          object
name          object
nametype      object
mass          object
latitude     float64
longitude    float64
dtype: object
#only dataframe with non-null year column values
meteor_df = meteor_df.loc[meteor_df['year'].notnull()]

#change year column into a string
#need to use string type for getYear function below
meteor_df['year'] = meteor_df['year'].astype(str)
#function to split apart the date from the timestamp
def getYear(col):
    #get YYYY-MM-DD value
    date = col.split("T")[0]
    
    #extract year from date
    dt = datetime.strptime(date, '%Y-%m-%d')
    return dt.year
#replace the year timestamp data with only the year (using the getYear function)
meteor_df['year'] = meteor_df['year'].apply(getYear)
meteor_df.head()

| | id | year | fall | name | nametype | mass | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1880 | Fell | Aachen | Valid | 21 | 50.77500 | 6.08333 |
| 1 | 2 | 1951 | Fell | Aarhus | Valid | 720 | 56.18333 | 10.23333 |
| 2 | 6 | 1952 | Fell | Abee | Valid | 107000 | 54.21667 | -113.00000 |
| 3 | 10 | 1976 | Fell | Acapulco | Valid | 1914 | 16.88333 | -99.90000 |
| 4 | 370 | 1902 | Fell | Achiras | Valid | 780 | -33.16667 | -64.95000 |

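As a side note, the same year extraction can be done with vectorized string slicing instead of a Python-level apply; a minimal sketch on the raw ISO strings collected earlier (the yearls list), since the column in meteor_df has already been converted:

```python
# Sketch: vectorized alternative to getYear - slice 'YYYY' off the ISO string.
raw_years = pd.Series(yearls, dtype="object").dropna()
print(raw_years.str[:4].astype(int).head())
```
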
#count non-null values per column to see which columns have missing values
meteor_df.count()
id           999
year         999
fall         999
name         999
nametype     999
mass         971
latitude     987
longitude    987
dtype: int64
#only include rows with non-null latitudes (which means longitude is also not null) and non-null mass
meteor_df = meteor_df.loc[(meteor_df['latitude'].notnull()) & meteor_df['mass'].notnull()]
meteor_df.count()
id           959
year         959
fall         959
name         959
nametype     959
mass         959
latitude     959
longitude    959
dtype: int64
#make a new column to hold the longitude & latitude as a list
meteor_df['coordinates'] = list(meteor_df[['longitude', 'latitude']].values)
#see new coordinates column
meteor_df.head()

| | id | year | fall | name | nametype | mass | latitude | longitude | coordinates |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1880 | Fell | Aachen | Valid | 21 | 50.77500 | 6.08333 | [6.08333, 50.775] |
| 1 | 2 | 1951 | Fell | Aarhus | Valid | 720 | 56.18333 | 10.23333 | [10.23333, 56.18333] |
| 2 | 6 | 1952 | Fell | Abee | Valid | 107000 | 54.21667 | -113.00000 | [-113.0, 54.21667] |
| 3 | 10 | 1976 | Fell | Acapulco | Valid | 1914 | 16.88333 | -99.90000 | [-99.9, 16.88333] |
| 4 | 370 | 1902 | Fell | Achiras | Valid | 780 | -33.16667 | -64.95000 | [-64.95, -33.16667] |

#list values in coordinates column is classified as object type
meteor_df['coordinates'].dtypes
dtype('O')
#convert the coordinates to a geolocation type
meteor_df['coordinates'] = meteor_df['coordinates'].apply(Point)
#coordinates column now has POINT next to each coordinate pair value
meteor_df.head()

| | id | year | fall | name | nametype | mass | latitude | longitude | coordinates |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1880 | Fell | Aachen | Valid | 21 | 50.77500 | 6.08333 | POINT (6.08333 50.775) |
| 1 | 2 | 1951 | Fell | Aarhus | Valid | 720 | 56.18333 | 10.23333 | POINT (10.23333 56.18333) |
| 2 | 6 | 1952 | Fell | Abee | Valid | 107000 | 54.21667 | -113.00000 | POINT (-113 54.21667) |
| 3 | 10 | 1976 | Fell | Acapulco | Valid | 1914 | 16.88333 | -99.90000 | POINT (-99.90000000000001 16.88333) |
| 4 | 370 | 1902 | Fell | Achiras | Valid | 780 | -33.16667 | -64.95000 | POINT (-64.95 -33.16667) |

#coordinates column with geolocation data is just a regular pandas Series type
type(meteor_df['coordinates'])
pandas.core.series.Series
#create a geolocation dataframe type using the coordinates column as the geolocation data
geo_meteor = gpd.GeoDataFrame(meteor_df, geometry='coordinates')
#geo-dataframe looks the same as regular dataframe
geo_meteor.head()

| | id | year | fall | name | nametype | mass | latitude | longitude | coordinates |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1880 | Fell | Aachen | Valid | 21 | 50.77500 | 6.08333 | POINT (6.08333 50.77500) |
| 1 | 2 | 1951 | Fell | Aarhus | Valid | 720 | 56.18333 | 10.23333 | POINT (10.23333 56.18333) |
| 2 | 6 | 1952 | Fell | Abee | Valid | 107000 | 54.21667 | -113.00000 | POINT (-113.00000 54.21667) |
| 3 | 10 | 1976 | Fell | Acapulco | Valid | 1914 | 16.88333 | -99.90000 | POINT (-99.90000 16.88333) |
| 4 | 370 | 1902 | Fell | Achiras | Valid | 780 | -33.16667 | -64.95000 | POINT (-64.95000 -33.16667) |

#verify coordinates column is geolocation data type
type(geo_meteor['coordinates'])
geopandas.geoseries.GeoSeries
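
As a side note, newer geopandas versions (0.5+) can build the geometry directly from the longitude/latitude columns with geopandas.points_from_xy, skipping the intermediate list column and the Point apply; a minimal sketch:

```python
# Sketch: construct the GeoDataFrame in one step from the numeric columns.
geo_meteor_alt = gpd.GeoDataFrame(
    meteor_df.drop(columns='coordinates'),
    geometry=gpd.points_from_xy(meteor_df['longitude'], meteor_df['latitude'])
)
```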

I already have the JSON file with the continent coordinates saved locally, but it should not be difficult to find the file online.

#import file that contains a world map shape polygons
#will use to plot the coordinates of meteorite landings
filepath = "datasets/continents.json"

#data contains polygon shape coordinates for different map body types (continents, etc.)
map_df = gpd.read_file(filepath)
map_df.head()

| | CONTINENT | geometry |
|---|---|---|
| 0 | Asia | MULTIPOLYGON (((93.27554 80.26361, 93.14804 80... |
| 1 | North America | MULTIPOLYGON (((-25.28167 71.39166, -25.62389 ... |
| 2 | Europe | MULTIPOLYGON (((58.06138 81.68776, 57.88986 81... |
| 3 | Africa | MULTIPOLYGON (((0.69465 5.77337, 0.63583 5.944... |
| 4 | South America | MULTIPOLYGON (((-81.71306 12.49028, -81.72015 ... |


The following code plots the continent outlines and the meteorite landings together on a single map.

#plot coordinates on top of map graph

#set the figure size
fig, ax = plt.subplots(1, figsize=(15,10))

#this is the map 
basemap = map_df.plot(ax=ax,cmap='Greens')

#plot coordinates on top of map graph
geo_meteor.plot(ax=basemap, color='blue', marker="o", markersize=10)

#take off axis numbers
ax.axis('off')

#put title on map
ax.set_title("NASA Meteorite Landings", fontsize=25, fontweight=3)
Text(0.5, 1, 'NASA Meteorite Landings')

(Figure: world map of the continents with the meteorite landing locations plotted as blue points.)
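
If you want to keep the rendered map, the figure can be written to disk with matplotlib's savefig (the filename here is just an example):

```python
# Save the map to a PNG file.
fig.savefig("meteorite_landings.png", dpi=150, bbox_inches="tight")
```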


