Query Yahoo finance historical data via Python requests

Overview

This term, one of my friends is serving as the TA for a course on algorithmic trading, and she needs historical data for the course project. However, querying data from the Yahoo API at ichart.finance.yahoo.com always failed. Reportedly, Yahoo closed that API on May 18th, 2017, but left a download button on its web page. For example, on https://finance.yahoo.com/quote/BIDU/history?p=BIDU you can find a download link like https://query1.finance.yahoo.com/v7/finance/download/BIDU?period1=1506780002&period2=1509372002&interval=1d&events=history&crumb=NbjLKgotcao for “BIDU”. We can build a similar URL to download the historical data as a CSV file.

I found a post, NEW YAHOO FINANCE QUOTE DOWNLOAD URL, which clearly explains how to build the download URL. Here I present just a brief review.

https://query1.finance.yahoo.com/v7/finance/download/BIDU?period1=1506780002&period2=1509372002&interval=1d&events=history&crumb=NbjLKgotcao
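To see what this URL is made of, its query string can be split into individual parameters. The sketch below uses only the standard library. Reading period1 and period2 as Unix timestamps in seconds is an assumption based on their magnitude, and the helper name epoch_to_utc is made up for this illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

url = ("https://query1.finance.yahoo.com/v7/finance/download/BIDU"
       "?period1=1506780002&period2=1509372002"
       "&interval=1d&events=history&crumb=NbjLKgotcao")

parts = urlsplit(url)

# parse_qs maps each parameter name to a list of values; keep the first one.
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

# The symbol is the last segment of the path.
symbol = parts.path.rsplit("/", 1)[-1]


def epoch_to_utc(seconds):
    """Interpret a value as Unix seconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


start = epoch_to_utc(params["period1"])
end = epoch_to_utc(params["period2"])

print(symbol, params["interval"], params["events"], params["crumb"])
print(start.date(), "to", end.date())
```

Under that reading, the two timestamps are exactly 30 days apart, which matches a one-month window of daily quotes.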

The URL above for the historical data has a crumb parameter that may be unfamiliar. You can actually find its value in the HTML of the quote page:

curl -s --cookie-jar cookie.txt https://finance.yahoo.com/quote/BIDU?p=BIDU > baidu.html

Curl the page and save both the HTML and the cookie. Several crumbs show up in the page, but only one appears as CrumbStore":{"crumb":"SCYl9KtqqXZ"}, so we can search for that and save it. That's not enough, though: if you query the download URL directly, you will probably get an error. We still need one value from the cookie file:

# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

.yahoo.com	TRUE	/	FALSE	1540910372	B	0mjeuqhcveed4&b=3&s=62

The value of the B cookie is what we need; we store it for later use.
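As a minimal sketch, both values can be pulled out of the saved files with a few lines of Python. The sample strings below are copied from the snippets above (the surrounding "some markup" text is made-up padding), and the function names find_crumb and parse_cookie_line are hypothetical helpers, not part of any library. In practice the inputs would be read from baidu.html and cookie.txt.

```python
import re

# Sample inputs; only the CrumbStore fragment and the cookie line come from
# the post, the padding around the fragment is invented for illustration.
html = 'some markup ,"CrumbStore":{"crumb":"SCYl9KtqqXZ"}, more markup'
cookie_line = ".yahoo.com\tTRUE\t/\tFALSE\t1540910372\tB\t0mjeuqhcveed4&b=3&s=62"


def find_crumb(page_text):
    """Return the crumb stored under CrumbStore, or None if it is absent."""
    match = re.search(r'"CrumbStore":\{"crumb":"([^"]+)"\}', page_text)
    return match.group(1) if match else None


def parse_cookie_line(line):
    """Split one line of a Netscape cookie file into (name, value).

    Each cookie line has seven tab-separated fields:
    domain, subdomain flag, path, secure flag, expiry, name, value.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError("not a cookie line: %r" % line)
    return fields[5], fields[6]


print(find_crumb(html))
print(parse_cookie_line(cookie_line))
```

Anchoring the pattern on CrumbStore is what keeps it from matching the other crumbs that appear in the page.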

Query using Python requests

OK, now let's do it with Python requests. Things we need to do:

  1. build the URL for the Python request
  2. save the cookie B value
  3. find the crumb parameter in the page
  4. set the start and end dates; for the whole history, that is from the very first date up to now
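Steps 1 and 4 can be sketched on their own, without touching the network. The helper names to_epoch and build_download_url are invented for this example, and treating a date as midnight UTC is an assumption. urlencode is used so that a crumb containing characters such as "/" is escaped properly.

```python
import calendar
from datetime import date
from urllib.parse import urlencode

BASE = "https://query1.finance.yahoo.com/v7/finance/download/"


def to_epoch(day):
    """Return midnight UTC of the given date as Unix seconds."""
    return calendar.timegm(day.timetuple())


def build_download_url(symbol, start, end, crumb):
    """Assemble the download URL from its parts, escaping the query values."""
    query = urlencode({
        "period1": start,
        "period2": end,
        "interval": "1d",
        "events": "history",
        "crumb": crumb,
    })
    return BASE + symbol + "?" + query


# Example: the month of October 2017 for BIDU, with the crumb from the URL above.
print(build_download_url("BIDU",
                         to_epoch(date(2017, 10, 1)),
                         to_epoch(date(2017, 10, 31)),
                         "NbjLKgotcao"))
```

Passing 0 as the start value corresponds to January 1st, 1970, which is earlier than any quote and therefore covers the whole history.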

Here is the script, adapted from YAHOO FINANCE QUOTE DOWNLOAD PYTHON; it is largely self-explanatory:

import re
import time
import requests

def get_cookie_value(r):
    return {'B': r.cookies['B']}

def get_page_data(symbol):
    url = "https://finance.yahoo.com/quote/%s/?p=%s" % (symbol, symbol)
    r = requests.get(url)
    cookie = get_cookie_value(r)
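    # Decode backslash escapes (for example \u002F), then break the page at every '}' so the crumb entry ends its own line.
    lines = r.content.decode('unicode-escape').strip().replace('}', '\n')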
    return cookie, lines.split('\n')

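def find_crumb_store(lines):
    # Looking for
    # ,"CrumbStore":{"crumb":"9q.A4D1c.b9
    for line in lines:
        if re.search(r'CrumbStore', line):
            return line
    # Fail loudly here; returning None would only crash later in split_crumb_store().
    raise ValueError("Did not find CrumbStore")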

def split_crumb_store(v):
    return v.split(':')[2].strip('"')

def get_cookie_crumb(symbol):
    cookie, lines = get_page_data(symbol)
    crumb = split_crumb_store(find_crumb_store(lines))
    return cookie, crumb

def get_data(symbol, start_date, end_date, cookie, crumb):
    filename = '%s.csv' % (symbol)
    url = "https://query1.finance.yahoo.com/v7/finance/download/%s?period1=%s&period2=%s&interval=1d&events=history&crumb=%s" % (symbol, start_date, end_date, crumb)
    response = requests.get(url, cookies=cookie)
    with open(filename, 'wb') as handle:
        for block in response.iter_content(1024):
            handle.write(block)

def get_now_epoch():
    # @see https://www.linuxquestions.org/questions/programming-9/python-datetime-to-epoch-4175520007/#post5244109
    return int(time.time())

def download_quotes(symbol):
    start_date = 0
    end_date = get_now_epoch()
    cookie, crumb = get_cookie_crumb(symbol)
    get_data(symbol, start_date, end_date, cookie, crumb)

symbol = input('Enter the symbol: ')
print("--------------------------------------------------")
print("Downloading %s to %s.csv" % (symbol, symbol))
download_quotes(symbol)
print("--------------------------------------------------")

Reading from bottom to top: we enter the symbol whose data we want to download, then call the download_quotes() function. It sets the date range, gets the cookie B value, looks for the crumb parameter, and then downloads the data through requests. Finally, the historical data ends up in a CSV file named after the symbol (for example, BIDU.csv). If Yahoo updates its API later, we could probably query it in a similar fashion.
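One caveat: get_data() writes whatever the server sends back, so a rejected request (for example, one with a stale crumb) would still leave a CSV file on disk. A minimal sketch of a guard follows. The names DownloadError and save_response are made up for this example, and the only things assumed about the response object are the status_code and content attributes that requests.Response provides.

```python
class DownloadError(Exception):
    """Raised when the server did not answer with HTTP 200."""


def save_response(response, filename):
    """Write the response body to filename, but only for a successful request.

    `response` is expected to behave like a requests.Response; only its
    status_code and content attributes are used. Returns the bytes written.
    """
    if response.status_code != 200:
        raise DownloadError("HTTP %s while fetching %s"
                            % (response.status_code, filename))
    with open(filename, "wb") as handle:
        handle.write(response.content)
    return len(response.content)
```

Inside get_data(), the with block could then be replaced by a call such as save_response(response, filename).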

林宏

Frank Lin

Hey, there! This is Frank Lin (@flinhong), one of the 1.41 billion . This 'inDev. Journal' site holds the exploration of my quirky thoughts and random adventures through life. Hope you enjoy reading and perusing my posts.
