How to Get All Stocks from the S&P500 in Python

Jachowski
InsiderFinance Wire
4 min read · Apr 24, 2022

Get all stocks from the S&P500 in less than a minute

Picture a huge crowd of people. Just imagine how much time you would spend identifying each of them and storing their names in a dataset. I don’t know what we could do with those names, but for stocks, there’s a very concise solution.

Basically, we’re going to:

  • Read the tickers of all stocks belonging to the S&P500 from Wikipedia,
  • Store them in a list,
  • Download each stock from Yahoo Finance in less than a minute and
  • Save all data in a Pandas dataframe.

Let’s get into the code!

Import the Libraries

We need four libraries for this operation:

  • BeautifulSoup to pull data out of HTML files,
  • Requests to grab the source code from Wikipedia’s page,
  • yfinance to get stock data from Yahoo Finance and
  • datetime, which is part of Python’s standard library, to deal with datetime objects.

If you have never installed the third-party ones, run the following commands (datetime ships with Python, so it needs no installation):

pip install beautifulsoup4
pip install requests
pip install yfinance

After that, we can import them:

import bs4 as bs
import requests
import yfinance as yf
import datetime

Get S&P500 Tickers from Wikipedia

Here is what the scraping code does, step by step.

In the first line, we visit the Wikipedia page we’re interested in.

In the second line, we create a BeautifulSoup object that parses HTML the way a web browser does.

In the third line, we find the table we’re looking for, i.e. the Wikipedia table containing S&P500 stocks data.

Note: On Wikipedia, sortable tables such as this one carry the class ‘wikitable sortable’, so we specify that class to locate the table.

At this point, we create an empty list and populate it with a for loop that iterates over the table rows.

For each row after the header row (tr stands for table row, and slicing with [1:] skips the header), we grab the ticker and append it to the list.
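The steps described above can be sketched as follows. This is a reconstruction from the description, not necessarily the exact original code: the function names extract_tickers and fetch_sp500_tickers, the page URL, and the User-Agent string are assumptions made for illustration. The sketch also uses a CSS selector instead of an exact class match, so that the table is still found if it carries extra classes.

```python
import bs4 as bs
import requests

# Assumed URL of the Wikipedia page listing the S&P 500 constituents.
WIKI_URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"


def extract_tickers(html):
    """Return the raw text of the first cell of every data row in the
    first table that has both the 'wikitable' and 'sortable' classes."""
    soup = bs.BeautifulSoup(html, "html.parser")
    table = soup.select_one("table.wikitable.sortable")
    tickers = []
    for row in table.find_all("tr")[1:]:  # [1:] skips the header row
        cells = row.find_all("td")
        if cells:  # ignore rows without data cells
            tickers.append(cells[0].text)
    return tickers


def fetch_sp500_tickers(url=WIKI_URL):
    """Download the page and extract the tickers (needs network access)."""
    # Hypothetical User-Agent value; Wikimedia asks clients to send a descriptive one.
    headers = {"User-Agent": "sp500-tutorial-example/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return extract_tickers(response.text)
```

Calling tickers = fetch_sp500_tickers() would then give the raw list of symbols. Keeping the parsing in its own function makes it easy to try on a small HTML snippet without touching the network.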

We stored all symbols in the list tickers. Let’s have a look.

print(tickers)

Every symbol was stored with a trailing newline character (\n), which we need to remove. That’s easy with a list comprehension:

tickers = [s.replace('\n', '') for s in tickers]

Let’s check again.

print(tickers)

Much better, huh?

Import S&P500 Stocks

After setting the time period, i.e. the start date and the end date, we can easily download all stocks with yfinance.

start = datetime.datetime(2020, 1, 1)
end = datetime.datetime(2022, 1, 1)
data = yf.download(tickers, start=start, end=end)

Note: The output may include a message listing some tickers that could not be downloaded.

Nothing serious. It simply means that those stocks are not included in the dataframe.
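One common cause of such failures is a difference in symbol format: Wikipedia writes share classes with a dot (BRK.B), while Yahoo Finance expects a hyphen (BRK-B). Assuming that is what happened here, a minimal sketch of a fix follows. The helper name to_yahoo_symbol and the sample list are made up for illustration.

```python
def to_yahoo_symbol(ticker):
    """Strip surrounding whitespace and replace the share-class dot
    with the hyphen used by Yahoo Finance (e.g. BRK.B -> BRK-B)."""
    return ticker.strip().replace(".", "-")


# Small illustrative sample, not the full S&P500 list.
raw_tickers = ["MMM\n", "BRK.B\n", "BF.B\n"]
yahoo_tickers = [to_yahoo_symbol(t) for t in raw_tickers]
print(yahoo_tickers)
```

Passing the converted list to yf.download should then include those stocks, provided the symbol format was the only problem.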

Let’s move forward. data is a pandas DataFrame indexed by date, with two-level column headers: the price field (such as Close or Volume) on top and the ticker below.

It’s not as beautiful as the soup we made before, but just two lines of code make it much better.

The key operation is .stack(), which moves the ticker level of the column headers into the row index, producing a multiindex of date and ticker. This multiindex is then converted to ordinary columns (reset_index()), which then get proper names. After that, the data are sorted by Symbol and Date. Finally, we set Date as the index.
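The reshaping described above can be sketched like this. It is a reconstruction under assumptions, not necessarily the original two lines: the helper name to_long_format is made up, and a tiny hand-built frame with fictional tickers AAA and BBB stands in for the yf.download output, assuming the tickers sit on the second column level.

```python
import pandas as pd


def to_long_format(wide):
    """Turn (field, ticker) column headers into one row per date and ticker."""
    # Move the ticker level (level 1) from the columns into the row index.
    stacked = wide.stack(level=1)
    # Name both index levels, then turn them into ordinary columns.
    flat = stacked.rename_axis(["Date", "Symbol"]).reset_index()
    # Group each stock's rows together and use Date as the index.
    return flat.sort_values(["Symbol", "Date"]).set_index("Date")


# Stand-in for the downloaded data: two dates, two fields, two tickers.
dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
wide = pd.DataFrame(
    [[10.0, 20.0, 100.0, 200.0], [11.0, 21.0, 110.0, 210.0]],
    index=dates,
    columns=columns,
)

df = to_long_format(wide)
print(df)
```

With the real data, df = to_long_format(data) would play the same role. Naming the index levels explicitly avoids depending on whatever default names pandas or yfinance assign to them.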

Here is how the dataframe looks now:

Way better. Now each row holds the daily values of one symbol on one date. If you scroll down past the last date, i.e. 2021-12-31, you find the second stock, followed by all the others.

If you’re going to reuse the same dataframe again and again, be sure to save it to a CSV file:

df.to_csv('sp500_stocks.csv')
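To reuse the saved file later, it can be read back with Date restored as a datetime index. The sketch below is a minimal round trip on a tiny made-up frame written to a temporary directory; with the real file, only the pd.read_csv call would be needed.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the real dataframe (fictional ticker and prices).
df = pd.DataFrame(
    {"Symbol": ["AAA", "AAA"], "Close": [10.0, 11.0]},
    index=pd.DatetimeIndex(["2020-01-02", "2020-01-03"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sp500_stocks.csv")
    df.to_csv(path)
    # index_col restores Date as the index; parse_dates converts it to datetimes.
    restored = pd.read_csv(path, index_col="Date", parse_dates=True)

print(restored)
```

Without parse_dates=True the dates would come back as plain strings, which makes date-based slicing less convenient.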

That’s all for this article. I hope it will be as useful for many of you as it was for me. If you need clarification or have advice, feel free to contact me.

Cheers 🍻
