===============================

Lecture notes

File / Exception / Log Handling

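Exceptions fall into two kinds: those that can be anticipated and those that cannot.

When an exception occurs, the program needs a follow-up action to deal with it.

1) A missing file is opened -> report that the file does not exist

2) A game terminates abnormally -> save the game state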

Exception Handling

try ~ except syntax

try:
    code that may raise an exception
except <Exception Type>:
    code that handles the exception

for i in range(10):
    try:
        print(i, 10/i)
    except ZeroDivisionError:
        print("error")
        print("Not divided by 0")

error
Not divided by 0
1 10.0
2 5.0
3 3.3333333333333335
4 2.5
5 2.0
6 1.6666666666666667
7 1.4285714285714286
8 1.25
9 1.1111111111111112

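Built-in exceptions: exceptions that Python provides by default

Exception name	Description
IndexError	An index is outside the range of a list
NameError	A variable that does not exist is referenced
ZeroDivisionError	A number is divided by 0
ValueError	A string or number that cannot be converted is converted
FileNotFoundError	A file that does not exist is opened

And so on. You can also define your own exceptions when needed.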
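The remark about defining your own exceptions can be sketched as follows. The class name `InsufficientBalanceError` and the function `withdraw` are hypothetical names invented for this illustration; the only real requirement is that the class inherits from `Exception`.

```python
# Hypothetical example: a user-defined exception built on the built-in Exception class.
class InsufficientBalanceError(Exception):
    """Raised when a withdrawal exceeds the available balance."""

    def __init__(self, balance, amount):
        super().__init__(f"balance {balance} is less than requested {amount}")
        self.balance = balance
        self.amount = amount


def withdraw(balance, amount):
    # Signal the problem with the custom type instead of a generic Exception.
    if amount > balance:
        raise InsufficientBalanceError(balance, amount)
    return balance - amount


try:
    remaining = withdraw(100, 250)
except InsufficientBalanceError as e:
    remaining = None
    print("error:", e)
```

Catching the specific type keeps unrelated errors visible, which is the same reason a bare `except Exception` is discouraged.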

a = [1,2,3,4,5]
for i in range(10):
    try:
        print(i, 10/i)
        print(a[i])
        print(v)
    except ZeroDivisionError:
        print("error")
        print("Not divided by 0")
    except IndexError as e:
        print(e)
    except Exception as e: # not recommended: it hides where the error actually occurred
        print(e)

try ~ except ~ else

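try:
    code that may raise an exception
except <Exception Type>:
    code that handles the exception
else:
    code that runs when no exception occurred

try ~ except ~ finally

try:
    code that may raise an exception
except <Exception Type>:
    code that handles the exception
finally:
    code that runs in either case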

for i in range(5):
    try:
        result = 10 / i
    except ZeroDivisionError:
        print("Not divided by 0")
    else:
        print(10/i)
    finally:
        # print(i, '------', result) # raises NameError when i == 0, because result was never assigned
        print(i,'==========')

Not divided by 0
0 ==========
10.0
1 ==========
5.0
2 ==========
3.3333333333333335
3 ==========
2.5
4 ==========

raise

- Raises an exception deliberately when needed

raise <Exception Type>(exception message)

while True:
    value = input("Enter an integer to convert: ")
    for digit in value:
        if digit not in "0123456789":
            # raise triggers the error on purpose
            raise ValueError("The input is not a number.")
    print("Converted integer -", int(value))
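The loop above stops with a traceback on the first bad input. A minimal sketch of pairing raise with try ~ except so the caller decides what happens next; the helper `to_int` and the fixed list of sample inputs (standing in for `input()`) are assumptions made for this illustration.

```python
def to_int(value):
    # Hypothetical helper: raise deliberately when a non-digit character appears.
    for digit in value:
        if digit not in "0123456789":
            raise ValueError("Input is not a number: " + repr(value))
    return int(value)


converted = []
for sample in ["42", "7a", "100"]:  # sample inputs standing in for input()
    try:
        converted.append(to_int(sample))
    except ValueError as e:
        # The caller decides what to do; here the bad value is reported and skipped.
        print("skipped:", e)

print(converted)
```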

assert

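- Raises an exception (AssertionError) when a given condition is not satisfied

assert <condition> (True or False)

def get_binary_number(decimal_number : int):
    assert isinstance(decimal_number, int)
    return bin(decimal_number)

print(get_binary_number(10.0))  # AssertionError: 10.0 is a float, not an int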

File Handling

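Files are divided into two basic types: text files and binary files.

Binary file	Text file
- Stored in a binary format that only the computer can interpret	- Stored as character strings that humans can also read
- Usually looks garbled when opened in Notepad (Notepad cannot interpret it)	- Contents are readable when opened in Notepad
- Excel files, Word files, etc.	- Files saved from Notepad, HTML files, Python source files, etc.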

f = open("i_have_a_dream.txt", "r")  # only binds the file handle to f; nothing has been read yet
contents = f.read()
print(contents)
f.close()

with open("i_have_a_dream.txt", "r") as my_file:
    contents = my_file.read()
    print(type(contents), contents)

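read() reads the entire file as one string.

readlines() reads the entire file and returns a list with one element per line.

readline() reads one line per call, so only that line is loaded into memory each time.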
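The difference between the three read methods can be sketched on a small throwaway file. The file name `sample.txt` and its contents are made up for this illustration, and the file lives in a temporary directory so nothing existing is touched.

```python
import os
import tempfile

# Create a small throwaway text file so the example does not depend on existing files.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf8") as f:
    f.write("first\nsecond\nthird\n")

with open(path, "r", encoding="utf8") as f:
    whole = f.read()          # one string holding the entire file

with open(path, "r", encoding="utf8") as f:
    lines = f.readlines()     # list of lines, each keeping its trailing "\n"

with open(path, "r", encoding="utf8") as f:
    first = f.readline()      # only the first line
    second = f.readline()     # the next call continues where the last one stopped

print(repr(whole))
print(lines)
print(repr(first), repr(second))
```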

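The write modes are "w" and "a": "w" overwrites the file, and "a" appends to the end of the existing content.

On Windows the default encoding for reading files may be cp949, so pass encoding explicitly.

f = open("count_log.txt", "w", encoding="utf8")
for i in range(1, 11):
    data = "This is line %d.\n" % i
    f.write(data)
f.close()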

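Directories

import os

try:
    os.mkdir("abc")
except FileExistsError as e:
    print("Already created")

The case where the directory already exists can be handled with try ~ except as above, or checked beforehand with

os.path.exists("abc") # True or False

inside an if statement.

import shutil  # look up the details on your own

source = "i_have_a_dream.txt"  # assumed source file; any existing file works
dest = os.path.join("abc", "memo.txt")
shutil.copy(source, dest)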

import pathlib

cwd = pathlib.Path.cwd()

cwd.parent

cwd.parent.parents

... Look these up and use them whenever you need them.
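As a rough sketch of how pathlib covers the same directory tasks as os.path and os.mkdir: the directory and file names below mirror the "abc" and "memo.txt" used earlier, and the temporary base directory is an assumption added so the example leaves no files behind in the working directory.

```python
import pathlib
import tempfile

base = pathlib.Path(tempfile.mkdtemp())   # throwaway directory for the example

log_dir = base / "abc"                    # "/" joins path components
log_dir.mkdir(exist_ok=True)              # no error if the directory already exists
log_dir.mkdir(exist_ok=True)              # calling it again is therefore safe

memo = log_dir / "memo.txt"
memo.write_text("hello", encoding="utf8")

print(memo.exists())
print(memo.parent == log_dir)
print(memo.name, memo.suffix)
```

With `exist_ok=True` neither the try ~ except nor the `os.path.exists` check is needed.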

import datetime
import os
import random

if not os.path.exists("log"):
    os.mkdir("log")

TARGET_FILE_PATH = os.path.join("log", "count_log.txt")
with open(TARGET_FILE_PATH, 'a', encoding="utf8") as f:
    for i in range(1, 11):
        stamp = str(datetime.datetime.now())
        value = random.random() * 100000
        # end each entry with a newline so the entries do not run together
        log_line = stamp + "\t" + str(value) + " was generated.\n"
        f.write(log_line)

Pickle

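- A built-in module that persists Python objects

- Saves run-time information such as data and objects so it can be loaded and used later

- Widely used for information that must be kept and for computed results (models)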

import pickle

class Multiply(object):
    def __init__(self, multiplier):
        self.multiplier = multiplier
        
    def multiply(self, number):
        return number * self.multiplier
    
multiply = Multiply(5)
multiply.multiply(10)

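f = open("multiply_object.pickle", "wb") # write binary
pickle.dump(multiply, f)
f.close()

f = open("multiply_object.pickle", "rb") # read binary
multiply_pickle = pickle.load(f)
f.close()
print(multiply_pickle.multiply(5))  # 25

Lists, class instances, and most other Python objects can be saved this way. A few, such as open file handles and lambda functions, cannot be pickled.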
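For built-in containers the same round trip works in memory with `pickle.dumps` and `pickle.loads`. A minimal sketch; the contents of `record` are made-up sample data.

```python
import pickle

# Built-in containers round-trip through bytes without touching the disk.
record = {"name": "model_v1", "weights": [0.1, 0.2, 0.3], "epoch": 5}

blob = pickle.dumps(record)      # serialize to a bytes object
restored = pickle.loads(blob)    # rebuild an equal but separate object

print(type(blob))
print(restored == record, restored is record)
```

Loading a pickle executes whatever the data describes, so it should only be done with files from a trusted source.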

Logging Handling

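- Print to the console, write to a file, store in a DB, and so on. Keep records!

- Analyzing logs can produce meaningful results.

- Some records must be kept at run time, others at development time.

Level	Overview	Examples
debug	Records processing details needed during development	- Function A is called next - Variable A was changed to some value
info	Reports information while processing is in progress	- The server started - The server shut down - User A connected to the program
warning	Reports wrong user input, or input that can be processed but was not intended at development time	- Expected str input but got int -> handled by casting to str - Expected a two-dimensional list as the function argument but got a one-dimensional list -> converted to two dimensions and processed
error	Reports that an error occurred due to faulty processing, but the program can keep running	- A file that must be written to does not exist -> handle the exception and notify the user - Cannot connect to an external service
critical	Reports data loss caused by faulty processing, or that the program can no longer run	- A file was deleted through improper access - Forced termination by the user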

import logging

if __name__ == '__main__':
    logger = logging.getLogger("main")
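    logging.basicConfig(level=logging.DEBUG) # the default level is WARNING

    logger.debug("Wrong!")
    logger.info("Check this!")
    logger.warning("Be careful!")
    logger.error("An error occurred!")
    logger.critical("We are doomed!")

This is the basic way to use logging.

There is a lot to configure: the log file, the level, and so on. Settings are supplied in two main ways.

1) configparser - in a file

2) argparse - at run time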

configparser

- Stores the program's run settings in a file

- Uses a settings file organized as Section, Key, Value

- The settings file is loaded and used like a dict

'example.cfg'

[SectionOne]
Status: Single
Name: Derek
Value: Yes
Age: 30
Single : True

[SectionTwo]
FavoriteColor = Green

[SectionThree]
FamilyName: Johnson

import configparser

config = configparser.ConfigParser()

config.read("example.cfg")
print(config.sections())

print(config["SectionThree"])
for key in config["SectionOne"]:
    value = config["SectionOne"][key]
    print("{0} : {1}".format(key, value))
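Values read this way are always strings. A short sketch of the typed getters and the fallback argument, using an inline string shaped like example.cfg (passed to `read_string`) so it runs without the file; the `Nickname` key is a hypothetical missing key used to show the fallback.

```python
import configparser

# Same shape as example.cfg, given inline so the sketch runs on its own.
cfg_text = """
[SectionOne]
Name: Derek
Age: 30
Single: True

[SectionTwo]
FavoriteColor = Green
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)

section = config["SectionOne"]
age = section.getint("Age")             # "30" -> 30
single = section.getboolean("Single")   # "True" -> True
name = section["name"]                  # keys are case-insensitive by default
nickname = section.get("Nickname", "unknown")  # fallback for a missing key

print(age, single, name, nickname)
```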

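argparse

- Passes settings to the program when it is launched from the console

- Provided by almost every console-based Python program

- Many specialized modules exist (e.g. in TF), but argparse is the usual choice

- The settings are called command-line options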

C:\: python3 test_argparser.py -a

import argparse

parser = argparse.ArgumentParser(description="Sum two integers.")

parser.add_argument(
    "-a", "--a_value", dest="a", help="A integers", type=int, required=True
)
parser.add_argument(
    "-b", "--b_value", dest="b", help="B integers", type=int, required=True
)

args = parser.parse_args()
print(args)
print(args.a)
print(args.b)
print(args.a + args.b)

(venv) E:\Coding_test>python temp.py
usage: temp.py [-h] -a A -b B
temp.py: error: the following arguments are required: -a/--a_value, -b/--b_value

(venv) E:\Coding_test>python temp.py -a 10 -b 10

Namespace(a=10, b=10)
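The parser can also be exercised without a console by handing `parse_args` a list of strings. A minimal sketch; making `-b` optional with a default of 0 is a variation introduced here for illustration, not part of the script above.

```python
import argparse

parser = argparse.ArgumentParser(description="Sum two integers.")
parser.add_argument("-a", "--a_value", dest="a", type=int, required=True)
# An optional argument falls back to its default when omitted.
parser.add_argument("-b", "--b_value", dest="b", type=int, default=0)

# Passing a list mimics the command line: python temp.py -a 10 -b 32
args = parser.parse_args(["-a", "10", "-b", "32"])
total = args.a + args.b

# Omitting -b uses the default value.
args_default = parser.parse_args(["-a", "10"])

print(total, args_default.b)
```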

Applying logging

'logging.conf'

[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=%m/%d/%Y %I:%M:%S %p
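A configuration file in this format can be loaded with `logging.config.fileConfig`. A minimal sketch, assuming the file is written to a temporary directory first so the example is self-contained; the datefmt line is left out for brevity, and section lines must start at column 0 without leading whitespace.

```python
import logging
import logging.config
import os
import tempfile

# Minimal logging.conf with the same sections as above, written to a temp file.
conf_text = """[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
"""

conf_path = os.path.join(tempfile.mkdtemp(), "logging.conf")
with open(conf_path, "w", encoding="utf8") as f:
    f.write(conf_text)

logging.config.fileConfig(conf_path, disable_existing_loggers=False)

logger = logging.getLogger()          # the root logger configured by the file
logger.debug("debug messages are now shown")
logger.info("configured from %s", os.path.basename(conf_path))
```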

log formatter

- Specifies the format of the log output

import logging

formatter = logging.Formatter('%(asctime)s %(levelname)s %(process)d %(message)s')

2018-01-18 22:47:04,385 ERROR 4100 ERROR ocured

2018-01-18 22:47:22,385 ERROR 4439 ERROR ocured

...
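A Formatter only takes effect once it is attached to a handler. A minimal sketch of that wiring; the logger name `formatter_demo`, the shortened format string, and the in-memory stream are assumptions chosen so the output is predictable.

```python
import io
import logging

stream = io.StringIO()                     # in-memory target so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("formatter_demo")   # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False                   # keep the record away from root handlers
logger.addHandler(handler)

logger.error("ERROR occurred")
output = stream.getvalue()
print(output, end="")
```

Replacing the StringIO stream with `logging.FileHandler("some.log")` would send the same formatted lines to a file instead.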

Python data handling

CSV, web (HTML), XML, JSON (JSON is used the most)

Comma-Separated Values (CSV)

- When the data is processed as a plain text file, characters such as "," inside a field need preprocessing

import csv

seoung_nam_data = []
header = []
rownum = 0

with open("korea_foot_traffic_data.csv", "r", encoding="cp949") as p_file:
    csv_data = csv.reader(p_file)  # read the file through a csv reader object
    for row in csv_data:  # process the data one row at a time
        if rownum == 0:
            header = row  # keep the first row separately as the field names
        location = row[7]
        # extract the "administrative district" field; the cp949 file is already decoded to str
        if location.find(u"성남시") != -1:
            seoung_nam_data.append(row)
        # if the district contains 성남시 (Seongnam-si), add the row to seoung_nam_data
        rownum += 1

# newline="" lets csv.writer control line endings itself
with open("seoung_nam_foot_traffic_data.csv", "w", encoding="utf8", newline="") as s_p_file:
    writer = csv.writer(s_p_file, delimiter="\t", quotechar="'", quoting=csv.QUOTE_ALL)
    # delimiter is the field separator
    # quotechar is the character that wraps each field, quoting sets which fields are wrapped
    writer.writerow(header)  # write the header row
    for row in seoung_nam_data:
        writer.writerow(row)  # write each row stored in seoung_nam_data
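The reason for using the csv module instead of splitting on "," by hand shows up as soon as a field itself contains a comma. A small sketch with made-up rows held in memory through `io.StringIO`:

```python
import csv
import io

# One field ("Seongnam-si, Bundang-gu") contains a comma, so it is quoted in the file.
raw = 'id,location,count\n1,"Seongnam-si, Bundang-gu",120\n2,Seoul,300\n'

naive = raw.splitlines()[1].split(",")         # plain split breaks the quoted field apart
parsed = list(csv.reader(io.StringIO(raw)))    # csv.reader respects the quotes

print(naive)
print(parsed[1])

# csv.writer adds the quotes back only where they are needed (QUOTE_MINIMAL is the default).
buffer = io.StringIO()
csv.writer(buffer).writerow(parsed[1])
print(repr(buffer.getvalue()))
```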

Web (WWW)

<!doctype html>
<html>
<head>
<title>Hello HTML</title>
</head>
<body>
<p>Hello World!</p>
</body>
</html>

Ways to parse HTML:

str methods, regex, BeautifulSoup

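Regular expressions (regex)

- Also called regexp or regex

- A formula of characters that defines a complex string pattern

- Extracts the set of strings that follow a particular rule

010-0000-0000 ^\d{3}\-\d{4}\-\d{4}$

203.252.101.40 ^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$

- Extracts formatted strings such as resident registration numbers, phone numbers, and book ISBNs from the source text

- HTML also follows a regular tag-based format, so values can often be extracted with regex

- Reference: http://www.nextree.co.kr/p4327/

- There is a great deal to cover, so look things up and study on your own

1) Go to the regex playground: http://www.regexr.com

2) Paste the document you want to test into the Text box

3) Try finding things with a regex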

https://wikidocs.net/1642

https://docs.python.org/3.7/howto/regex.html

import re

search finds only the first match; findall finds every match.
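The difference between search and findall, and the role of ^ and $ in the phone and IP patterns shown earlier, can be sketched on a made-up sample string:

```python
import re

text = "Call 010-1234-5678 or 010-0000-0000, server at 203.252.101.40."  # made-up sample

phone = r"\d{3}-\d{4}-\d{4}"

first = re.search(phone, text)        # match object for the first hit, or None
every = re.findall(phone, text)       # list of all matching substrings

print(first.group())
print(every)

# With ^ and $ the pattern must cover the whole string, which suits validation.
is_phone = re.search(r"^\d{3}-\d{4}-\d{4}$", "010-0000-0000") is not None
is_ip = re.search(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", "203.252.101.40") is not None
print(is_phone, is_ip)
```

The anchored patterns are for checking one value; for pulling values out of a longer text, drop the ^ and $ as in `phone`.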

import re
import urllib.request

url = "http://goo.gl/U7mSQl"
html = urllib.request.urlopen(url)
html_contents = html.read().decode("utf8", errors="replace")  # decode the bytes; utf8 is assumed here
id_results = re.findall(r"([A-Za-z0-9]+\*\*\*)", html_contents)
# findall returns every substring that matches the pattern

for result in id_results:
    print(result)

import urllib.request  # load the urllib module
import re

url = "http://www.google.com/googlebooks/uspto-patents-grants-text.html"  # target url
html = urllib.request.urlopen(url)  # open the url
html_contents = str(html.read().decode("utf8"))  # read the html and convert it to a string

url_list = re.findall(r"(http)(.+)(zip)", html_contents)
for url in url_list:
    print("".join(url))  # each match is a tuple of groups; join it back into one str

import urllib.request
import re

url = "http://finance.naver.com/item/main.nhn?code=005930"
html = urllib.request.urlopen(url)
html_contents = str(html.read().decode("ms949"))
# raw strings keep the backslashes intact for the regex engine
stock_results = re.findall(r'(\<dl class="blind"\>)([\s\S]+?)(\<\/dl\>)', html_contents)

samsung_stock = stock_results[0]  # first match among the returned tuples
samsung_index = samsung_stock[1]  # second of the three values in the tuple
# each pair of parentheses becomes one tuple index
index_list = re.findall(r"(\<dd\>)([\s\S]+?)(\<\/dd\>)", samsung_index)
print(*index_list,sep='\n')
print()

for index in index_list:
    print(index[1])  # second of the three values in the tuple

('<dd>', '2021년 01월 22일 16시 10분 기준 장마감', '</dd>')

('<dd>', '종목명 삼성전자', '</dd>')

('<dd>', '종목코드 005930 코스피', '</dd>')

('<dd>', '현재가 86,800 전일대비 하락 1,300 마이너스 1.48 퍼센트', '</dd>')

('<dd>', '전일가 88,100', '</dd>')

('<dd>', '시가 89,000', '</dd>')

('<dd>', '고가 89,700', '</dd>')

('<dd>', '상한가 114,500', '</dd>')

('<dd>', '저가 86,800', '</dd>')

('<dd>', '하한가 61,700', '</dd>')

('<dd>', '거래량 30,430,330', '</dd>')

('<dd>', '거래대금 2,679,425백만', '</dd>')

2021년 01월 22일 16시 10분 기준 장마감

종목명 삼성전자

종목코드 005930 코스피

현재가 86,800 전일대비 하락 1,300 마이너스 1.48 퍼센트

전일가 88,100

시가 89,000

고가 89,700

상한가 114,500

저가 86,800

하한가 61,700

거래량 30,430,330

거래대금 2,679,425백만

eXtensible Markup Language

Like HTML, it separates data with tags.

It is usually parsed with BeautifulSoup.

from bs4 import BeautifulSoup

with open("books.xml", "r", encoding="utf8") as books_file:
    books_xml = books_file.read()  # read the file into a string

# parse the data with the lxml parser
soup = BeautifulSoup(books_xml, "lxml")
# extract every author element
for book_info in soup.find_all("author"):
    print(book_info)
    print(book_info.get_text())
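BeautifulSoup and lxml are third-party packages. Where they are not installed, a similar extraction can be sketched with the standard library's `xml.etree.ElementTree`, which is a swapped-in alternative here and not the approach used in the lecture. The inline XML is made-up data standing in for books.xml.

```python
import xml.etree.ElementTree as ET

# Made-up XML standing in for books.xml.
books_xml = """<books>
  <book id="1"><title>Python Basics</title><author>Kim</author></book>
  <book id="2"><title>Data Handling</title><author>Lee</author></book>
</books>"""

root = ET.fromstring(books_xml)

# iter("author") walks every <author> element at any depth, like find_all("author").
authors = [author.text for author in root.iter("author")]
print(authors)

# Attributes and child elements are reachable from each <book> element.
for book in root.findall("book"):
    print(book.get("id"), book.find("title").text)
```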

JavaScript Object Notation

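- Originally the data object notation of the JavaScript language

- Concise, so both machines and humans can read it easily

It resembles Python's dict type, so the two convert into each other.

- The json module makes parsing and saving easy

- Look up each site's Developer API documentation to see how to use it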

json_example.json

{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}

import json

with open("json_example.json", "r", encoding="utf8") as f:
    contents = f.read()
    json_data = json.loads(contents)
print(json_data)
print(type(json_data))
for employee in json_data["employees"]:
    print(employee["lastName"])

{'employees': [{'firstName': 'John', 'lastName': 'Doe'}, {'firstName': 'Anna', 'lastName': 'Smith'}, {'firstName': 'Peter', 'lastName': 'Jones'}]}
<class 'dict'>
Doe
Smith
Jones

import json

dict_data = {"Name": "Zara", "Age": 7, "Class": "First"}

with open("data.json", "w") as f:
    json.dump(dict_data, f)
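The string-based counterparts `json.dumps` and `json.loads` do the same conversion without a file. A short sketch; the extra `Tags` and `Active` keys and the `city` value are made-up additions to show how lists, booleans, and non-ASCII text are handled.

```python
import json

dict_data = {"Name": "Zara", "Age": 7, "Class": "First", "Tags": ["a", "b"], "Active": True}

text = json.dumps(dict_data)              # dict -> JSON string
restored = json.loads(text)               # JSON string -> dict

print(text)
print(restored == dict_data)

# Python True/None become JSON true/null; indent pretty-prints the result.
print(json.dumps({"ok": True, "value": None}, indent=2))

# ensure_ascii=False keeps non-ASCII text readable instead of \u escapes.
print(json.dumps({"city": "성남시"}, ensure_ascii=False))
```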

================================

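Peer session

crontab (available on Linux)

When the data grows too large, pandas runs out of memory. Hadoop, Spark, or flask for terabyte-scale data.

The sites that teach regex are good.

Shared information on how companies hire.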

=========================

Professor's open session

Confidential.

==========================

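Meeting with the teaching assistant

Competitions such as Kaggle are good for reading baseline code and for spotting the latest trends. Starting there is the best approach.

Kaggle. Kakao Arena.

Math helps with reading papers. The current course material is hard to digest in six months. This boostcamp is good for studying the overall picture. If you go through all of it and settle on the field that suits you, you will have gained a lot. Focus on that.

MLOps is pure engineering: Docker, how to ship things safely, DevOps.

The gap to Kaggle level feels too large; what to do? Learn the big picture through this course. Read a lot of other people's code, especially the common patterns. Skill builds up and at some point things start to flow. Still, entering a competition requires knowing the basics.

Android? Weigh reasonable industry demand against what you want to do.

Difference between a master's and a bachelor's degree? Not much. What counts is what you have actually done so far.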

==========================

Reflections

I did not expect to learn regex and JSON as well.

Python is finally finished. Will there be a little more breathing room once the math starts?


AITech study notes - [DAY 5] Handling data with Python
