Scrape data from rsoe-edis.org.

# sigh, a program to cure optimism


sigh is a simple program made to scrape, save and print information about disaster events around the world listed on rsoe-edis.org. Epidemics, floods, heatwaves, earthquakes, leakage of toxic chemicals, cyclones, plane crashes, zoonoses, you name it. Enough to definitely cure your optimism for the day, if not more.

sigh is meant to be used either through fzf submenus, when run with no flags or arguments, or directly from the CLI with queries.

## Installation

### Dependencies

- bs4
- requests
- jtbl

These are the third-party Python dependencies of the autogenerated .py scripts; they can be installed with `pip3 install bs4 requests jtbl`. The scripts also use `json` and `argparse`, but those ship with Python's standard library and need no installation.
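As a rough illustration of how these libraries fit together, here is a minimal, self-contained sketch: it parses an HTML fragment with bs4 the way a scrape script might, then serializes the result with the standard-library json module. The markup, class names, and field layout below are invented for the example and do not reflect the actual rsoe-edis.org page structure.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical event-list markup; the real rsoe-edis.org structure differs.
html = """
<table>
  <tr class="event"><td class="date">2021-08-25 20:28:42</td>
      <td class="title">Benin confirms H5N1 avian flu outbreak</td></tr>
  <tr class="event"><td class="date">2021-08-25 19:49:22</td>
      <td class="title">Nigeria's southern state reports bird flu outbreak</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
events = {"Date": [], "Title": []}
for row in soup.find_all("tr", class_="event"):
    events["Date"].append(row.find("td", class_="date").get_text(strip=True))
    events["Title"].append(row.find("td", class_="title").get_text(strip=True))

print(json.dumps(events, indent=2))
```

In the real scripts, `requests` would fetch the page first (e.g. `requests.get(url).text`) and the result would be appended to a file under data/.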

## Usage

```
Scrape worldwide events from rsoe-edis.org

Usage: we [OPTION]

'we' depends on its directory structure; do not manually move the executable
to your PATH. './we --setup' will create a symbolic link in ~/.local/bin.

  Options:
    -g, --get [CODE]          Scrape data for an event type to data/CODE.json.
    -p, --print [CODE]        Print saved data as JSON for an event type.
    -t, --table [CODE]        Print saved data as a table for an event type.
    -l, --list-types          List types currently associated with scripts.
    -s, --setup               Create event scrape scripts and optionally link 'we' to PATH.
    -u, --update-types        Fetch current event types and save them.
    -R, --rm-scripts          Remove existing script(s) and optionally remove 'we' from PATH.
    -C, --clear-data ([CODE]) Remove queried data or all data files (clean the data/ directory).
    -v, --version             Print version.
    -h, --help                Print this help.

  Improve me:
    https://git.teknik.io/matf/worldevents
```
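According to the help text, `./we --setup` links the executable into ~/.local/bin rather than moving it. If you ever need to recreate that link by hand, the equivalent commands would look roughly like this (the repository path is an assumption; adjust it to wherever you cloned worldevents):

```shell
# Recreate the symlink that './we --setup' is described as making.
# ~/Projects/worldevents is a placeholder for your actual clone location.
mkdir -p ~/.local/bin
ln -sf ~/Projects/worldevents/we ~/.local/bin/we
```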
#### Example for animal epidemic (type EPA):

```bash
$ we --list-types
AAT	EPA	EPD	EPH	INH	HEC	LSC	PSI	SDM	TER	
EVP	PPP	IND	SUE	IBE	OUD	ERQ	LSL	VOE	FLD	
CBE	MIA	OHI	OTE	TRI	AIR	PRA	WTR	CYC	DRT	
EXR	HAI	HEW	LIT	PTF	SEW	STO

$ we --get epa
Appended 3 event(s) to /home/user/Projects/worldevents/data/EPA.json. ✔

$ we --print epa
{
  "Date": [
    "2021-08-25 20:28:42",
    "2021-08-25 19:49:22",
    "2021-08-25 16:39:46"
  ],
  "Location": [
    "Benin, Africa",
    "Nigeria, Africa",
    "South Africa, Africa"
  ],
  "Title": [
    "Benin - Benin confirms H5N1 avian flu outbreak",
    "Nigeria - Nigeria's southern state reports bird flu outbreak",
    "South Africa - Khayelitsha animal clinic records two rabies cases after more than 20 years"
  ],
  "Details": [
    "https://rsoe-edis.org/eventList/details/111380/0",
    "https://rsoe-edis.org/eventList/details/111370/0",
    "https://rsoe-edis.org/eventList/details/111325/0"
  ]
}
# Not implemented yet:
$ we --table epa
Date                 Title                                                                        Details
-------------------  ---------------------------------------------------------------------------  ------------------------------------------------
2021-08-25 20:28:42  Benin - Benin confirms H5N1 avian flu outbreak                               https://rsoe-edis.org/eventList/details/111380/0
2021-08-25 19:49:22  Nigeria - Nigeria's southern state reports bird flu outbreak                 https://rsoe-edis.org/eventList/details/111370/0
2021-08-25 16:39:46  South Africa - Khayelitsha animal clinic records two rabies cases after mor  https://rsoe-edis.org/eventList/details/111325/0
```

### Clear existing data

```bash
$ we --clear-data air
Wrn: permanently delete AIR.json? This cannot be undone. [y/N] y
/home/user/Projects/worldevents/data/AIR.json removed. ✔

$ we --clear-data
Wrn: you are about to permanently delete 8 previously scraped data file(s). Type YES to confirm. YES
/home/user/Projects/worldevents/data/ directory cleaned. ✔

$ we --rm-scripts
Wrn: remove 37 scrap scripts? [y/N] y
/home/user/Projects/worldevents/scripts/ directory cleaned. ✔
Run 'we -s' to regenerate scrap scripts.

Also remove 'we' symbolic link from your PATH? [y/N] n
```

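The JSON that `we --print` shows is column-oriented (one list per field), while a table printer such as jtbl reads one object per row from stdin. An eventual `-t` implementation would therefore need a transposition step along these lines; this is a sketch with sample data, not the project's actual code:

```python
import json

# Column-oriented data in the shape 'we --print' shows (sample values).
columns = {
    "Date": ["2021-08-25 20:28:42", "2021-08-25 19:49:22"],
    "Title": ["Benin confirms H5N1 avian flu outbreak",
              "Nigeria's southern state reports bird flu outbreak"],
}

# Transpose into a list of row objects, the shape jtbl reads from stdin.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

print(json.dumps(rows))
```

Piping that output into jtbl (`python3 transpose.py | jtbl` style) would render a table similar to the one shown above.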
## To do

- Automate fetching the list of event types into setup/types.txt
- Add an option to scrape all categories at once instead of straining the website with one request per event category; done, but it still makes one request per event type, plus one for the details of every event.
- Implement -t
- Fix a conflict in fzf for custom commands when fzf feeds an array (print, table); see fzf issue 2604
- Check that fzf headers are correct once the above bullet is done
- Implement multiple arguments for -R
- Interactive mode
- Human-readable categories, not only type codes
- Better fzf interactions (prompt for another option or a sequence of options when relevant, get back to the main menu, etc.)
- Better appending (avoid duplicates, add the request date, merge into the same JSON objects instead of creating new ones); duplicates are not handled yet, and it may be better to deduplicate externally or switch to a database
- Make the functions into a master script that shows data, deletes data, and fetches events (not done yet)
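For the duplicate problem mentioned above, one lightweight approach that avoids an external database would be to key events on their Details URL when appending, since those URLs are unique per event. A sketch under that assumption (the merge function and its name are hypothetical, not part of the project):

```python
# Merge newly scraped column-oriented events into previously saved data,
# skipping any event whose Details URL was already stored.
def merge_events(saved, new):
    seen = set(saved.get("Details", []))
    for i, url in enumerate(new["Details"]):
        if url in seen:
            continue  # duplicate: this event was scraped earlier
        for field in new:
            saved.setdefault(field, []).append(new[field][i])
        seen.add(url)
    return saved

saved = {"Title": ["A"],
         "Details": ["https://rsoe-edis.org/eventList/details/1/0"]}
new = {"Title": ["A", "B"],
       "Details": ["https://rsoe-edis.org/eventList/details/1/0",
                   "https://rsoe-edis.org/eventList/details/2/0"]}

merged = merge_events(saved, new)
print(merged["Title"])  # only the new event "B" is appended
```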

## Disclaimer

I was just reading about the Gemini protocol and testing it with the cool amfora client when I stumbled upon gemini://aetin.art/earth.gmi. I found the concept pretty cool, so I started playing with it. I am not a programmer, let alone a Python programmer, so I do not know whether this will ever become feature-complete. Cassandra helped (a lot) with getevents.py.

This is merely a way for me to play with web scraping and Python for something I find useful, but I am not responsible for what you may use this for. Please just don't abuse `sigh --get`, so that the folks at rsoe-edis.org do not feel the need to add reCAPTCHAs to their website.