Query Yahoo finance historical data via Python requests

evolving your manner

2017.10.30

Overview

This term, a friend of mine is the TA for a course on algorithmic trading, and she needs historical data for the course project. However, queries against the old Yahoo API at ichart.finance.yahoo.com always fail. Reportedly, Yahoo shut that API down on May 18th, 2017, but left a download button on its web pages. For example, on https://finance.yahoo.com/quote/BIDU/history?p=BIDU you can find a download link such as https://query1.finance.yahoo.com/v7/finance/download/BIDU?period1=1506780002&period2=1509372002&interval=1d&events=history&crumb=NbjLKgotcao for “BIDU”. We can build a similar URL ourselves to download the historical data as a CSV file.
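As a rough illustration of what "building a similar URL" means, here is a minimal sketch that assembles the link from its parts. The helper name build_download_url and the BASE_URL constant are my own (hypothetical) choices; the query parameters are simply the ones visible in the example link.

```python
from urllib.parse import urlencode

BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download/"

def build_download_url(symbol, period1, period2, crumb, interval="1d"):
    """Assemble a download URL with the same query parameters as the example link."""
    query = urlencode({
        "period1": period1,    # start of the range, Unix epoch seconds
        "period2": period2,    # end of the range, Unix epoch seconds
        "interval": interval,  # "1d" is the value used in the example link
        "events": "history",
        "crumb": crumb,        # token taken from the quote page
    })
    return BASE_URL + symbol + "?" + query

# Rebuild the example link for BIDU from its parts
print(build_download_url("BIDU", 1506780002, 1509372002, "NbjLKgotcao"))
```

Using urlencode rather than plain string formatting also takes care of escaping, in case a crumb ever contains characters that are not URL-safe.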

Data from the link

I found a post, NEW YAHOO FINANCE QUOTE DOWNLOAD URL, which clearly explains how to build the download URL. Here I only give a brief review.

https://query1.finance.yahoo.com/v7/finance/download/BIDU?period1=1506780002&period2=1509372002&interval=1d&events=history&crumb=NbjLKgotcao

Looking at the URL above, there is a crumb parameter that may be unfamiliar. You can actually find it in the page's HTML:

curl -s --cookie-jar cookie.txt https://finance.yahoo.com/quote/BIDU?p=BIDU > baidu.html

This fetches the page and saves both the HTML and the cookies. Several crumbs show up in the page, but only one appears in the form CrumbStore":{"crumb":"SCYl9KtqqXZ"}, so we can search for that and save it. That is not enough, though: if you query the download URL with only the crumb, you will probably get an error. We also need a value from the cookie file:
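To make the "search and save it" step concrete, here is a minimal sketch that pulls the crumb out of saved HTML with a regular expression. The function name extract_crumb is hypothetical, and the pattern assumes the page still contains the CrumbStore fragment in exactly the form quoted above.

```python
import json
import re

# Matches the fragment "CrumbStore":{"crumb":"..."} and captures the crumb itself
CRUMB_PATTERN = re.compile(r'"CrumbStore":\{"crumb":"([^"]+)"\}')

def extract_crumb(html):
    """Return the crumb found after "CrumbStore", or None if it is absent."""
    match = CRUMB_PATTERN.search(html)
    if match is None:
        return None
    # The crumb sits inside JSON, so decode escapes such as \u002F if present
    return json.loads('"' + match.group(1) + '"')

sample = 'junk,"CrumbStore":{"crumb":"SCYl9KtqqXZ"},"more":1'
print(extract_crumb(sample))  # prints SCYl9KtqqXZ
```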

# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

.yahoo.com	TRUE	/	FALSE	1540910372	B	0mjeuqhcveed4&b=3&s=62

The value of the B cookie is what we need; store it for later use.
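If you want to pick the B value out of cookie.txt programmatically, something like the following sketch should work. The function name read_cookie_value is hypothetical; the parsing assumes the tab-separated, seven-field Netscape layout shown above, plus curl's habit of prefixing HttpOnly cookies with #HttpOnly_.

```python
def read_cookie_value(cookie_text, name="B"):
    """Return the value of the named cookie from Netscape-format cookie text."""
    for line in cookie_text.splitlines():
        # curl marks HttpOnly cookies with this prefix; strip it before parsing
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        # Skip blank lines and ordinary comments
        if not line.strip() or line.startswith("#"):
            continue
        # Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        fields = line.split("\t")
        if len(fields) == 7 and fields[5] == name:
            return fields[6]
    return None

sample = (
    "# Netscape HTTP Cookie File\n"
    "\n"
    ".yahoo.com\tTRUE\t/\tFALSE\t1540910372\tB\t0mjeuqhcveed4&b=3&s=62\n"
)
print(read_cookie_value(sample))  # prints 0mjeuqhcveed4&b=3&s=62
```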

Query using python requests

OK, now let's do it with Python requests. Things we need to do:

build the URL for the Python request
save the cookie B value
find the crumb parameter in the page
set the start and end dates; for the whole history, the range runs from epoch 0 (the very beginning) up to now
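The period1 and period2 parameters are Unix epoch seconds. If you want a specific date range instead of the whole history, a small conversion helper can be sketched like this; the name to_epoch and the choice of midnight UTC are my own assumptions, not something the download link requires.

```python
from datetime import date, datetime, timezone

def to_epoch(day):
    """Convert a calendar date to Unix epoch seconds at midnight UTC."""
    moment = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(moment.timestamp())

# A 30-day window ending on the day this post was written
period1 = to_epoch(date(2017, 9, 30))
period2 = to_epoch(date(2017, 10, 30))
print(period1, period2)
```

Attaching an explicit timezone keeps the result independent of the machine's local time settings.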

Here is the script, adapted from YAHOO FINANCE QUOTE DOWNLOAD PYTHON; it should largely explain itself:


import re
import time
import requests

def get_cookie_value(r):
    return {'B': r.cookies['B']}

def get_page_data(symbol):
    url = "https://finance.yahoo.com/quote/%s/?p=%s" % (symbol, symbol)
    r = requests.get(url)
    cookie = get_cookie_value(r)
    lines = r.content.decode('unicode-escape').strip().replace('}', '\n')
    return cookie, lines.split('\n')

def find_crumb_store(lines):
    # Looking for
    # ,"CrumbStore":{"crumb":"9q.A4D1c.b9
    for l in lines:
        if re.search(r'CrumbStore', l):
            return l
    # Fail loudly; returning None would only crash later in split_crumb_store()
    raise ValueError("Did not find CrumbStore")

def split_crumb_store(v):
    return v.split(':')[2].strip('"')

def get_cookie_crumb(symbol):
    cookie, lines = get_page_data(symbol)
    crumb = split_crumb_store(find_crumb_store(lines))
    return cookie, crumb

def get_data(symbol, start_date, end_date, cookie, crumb):
    filename = '%s.csv' % (symbol)
    url = "https://query1.finance.yahoo.com/v7/finance/download/%s?period1=%s&period2=%s&interval=1d&events=history&crumb=%s" % (symbol, start_date, end_date, crumb)
    response = requests.get(url, cookies=cookie)
    with open(filename, 'wb') as handle:
        for block in response.iter_content(1024):
            handle.write(block)

def get_now_epoch():
    # @see https://www.linuxquestions.org/questions/programming-9/python-datetime-to-epoch-4175520007/#post5244109
    return int(time.time())

def download_quotes(symbol):
    start_date = 0
    end_date = get_now_epoch()
    cookie, crumb = get_cookie_crumb(symbol)
    get_data(symbol, start_date, end_date, cookie, crumb)

symbol = input('Enter the symbol: ')
print("--------------------------------------------------")
print("Downloading %s to %s.csv" % (symbol, symbol))
download_quotes(symbol)
print("--------------------------------------------------")

Reading from bottom to top: we enter the symbol whose data we want, then call the download_quotes() function. It sets the date range, gets the cookie B value, looks for the crumb parameter, and then downloads the data through requests. Finally, the historical data is saved as CSV in a file named after the symbol (for example, BIDU.csv). If Yahoo updates its API later, we can probably query it in a similar fashion.
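Once the CSV file is on disk, it can be read back with the standard csv module. The sketch below is only an illustration: the function name load_closes is hypothetical, the column names (Date, Close, and so on) are what I assume the downloaded file uses, and the sample rows are made-up numbers, not real quotes.

```python
import csv
import io

def load_closes(handle):
    """Read (date, close) pairs from a CSV handle, skipping unparsable rows."""
    rows = []
    for row in csv.DictReader(handle):
        try:
            rows.append((row["Date"], float(row["Close"])))
        except ValueError:
            # Assumption: missing values may appear as text such as "null"
            continue
    return rows

# Hypothetical sample in the assumed column layout (values are invented)
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2017-10-26,100.0,102.0,99.0,101.5,101.5,1000\n"
    "2017-10-27,null,null,null,null,null,null\n"
    "2017-10-30,101.5,103.0,101.0,102.25,102.25,1200\n"
)
print(load_closes(sample))
```

For a real file, pass an open file object instead, for example open('BIDU.csv', newline='').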

THE END

林宏

Frank Lin

Hey, there! This is Frank Lin (@flinhong), one of the 1.41 billion . This 'inDev. Journal' site holds the exploration of my quirky thoughts and random adventures through life. Hope you enjoy reading and perusing my posts.
