【OpenCV Practice】Extracting Text from a Qingguo Course Schedule (Part 3) - 油猴中文网 (Tampermonkey Chinese Forum)

李恒道 posted on 2024-12-10 06:32:36


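Finally, a little data cleanup based on x and y.
For convenience, I simply round x down to the nearest multiple of 10, so cells in the same column end up in the same group.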
```python
data = {}

def insert_lesson_data(x, y, text):
    # Round x down to a multiple of 10 so cells in the same column share a key
    pos = int(int(x / 10) * 10)
    if pos not in data:
        data[pos] = []
    data[pos].append((y, text))
```
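For reference, here is the same bucketing as a small standalone sketch, assuming each recognized cell arrives as an `(x, y, text)` tuple with non-negative pixel coordinates. `bucket_by_column` and the sample boxes are hypothetical, made up for illustration; `defaultdict` replaces the manual `if pos not in data` check, and `x // step * step` is the integer form of `int(int(x / 10) * 10)` for non-negative x. One caveat of fixed buckets: a column whose x values straddle a multiple of 10 (say 99 and 101) gets split into two keys.

```python
from collections import defaultdict


def bucket_by_column(boxes, step=10):
    """Group (x, y, text) boxes into columns keyed by x rounded down to a multiple of step."""
    columns = defaultdict(list)
    for x, y, text in boxes:
        columns[x // step * step].append((y, text))
    return dict(columns)


# Hypothetical sample boxes, for illustration only
boxes = [(103, 40, "Mon"), (107, 90, "Math"), (215, 42, "Tue"), (219, 95, "English")]
print(bucket_by_column(boxes))
# {100: [(40, 'Mon'), (90, 'Math')], 210: [(42, 'Tue'), (95, 'English')]}
```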
Remember to sort before printing:
```python
for x in sorted(data.keys()):
    print(f"{x}:")
    # Sort the cells in this column from top to bottom by y
    data[x].sort(key=lambda item: item[0])
    for y, text in data[x]:
        print(f"x: {x} y: {y}, text: {text}")
```
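The same sort-then-print step can also be written as a function that returns the lines instead of printing them, which makes the ordering easy to check. This is a sketch assuming the `{x: [(y, text), ...]}` shape that `insert_lesson_data` builds; `format_columns` and the sample data are hypothetical. It uses `sorted()` instead of the in-place `.sort()`, so the input dict is left untouched.

```python
def format_columns(columns):
    """Return printable lines: columns left to right, cells top to bottom."""
    lines = []
    for x in sorted(columns):
        lines.append(f"{x}:")
        # sorted() builds a new list, so columns[x] keeps its original order
        for y, text in sorted(columns[x], key=lambda item: item[0]):
            lines.append(f"x: {x} y: {y}, text: {text}")
    return lines


# Hypothetical data, deliberately out of order
columns = {210: [(95, "English"), (42, "Tue")], 100: [(90, "Math"), (40, "Mon")]}
for line in format_columns(columns):
    print(line)
```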
Result:
![Result](data/attachment/forum/202412/10/063216fgplrhlqufqppp8c.png)
Full code:
```python
import logging

import cv2
import numpy as np
from paddleocr import PaddleOCR

logging.disable(logging.DEBUG)

# Column x (rounded down to a multiple of 10) -> list of (y, text)
data = {}


def insert_lesson_data(x, y, text):
    pos = int(int(x / 10) * 10)
    if pos not in data:
        data[pos] = []
    data[pos].append((y, text))


ocr = PaddleOCR(use_angle_cls=True, lang="ch")
originImage = cv2.imread("test.png")
image = cv2.cvtColor(originImage, cv2.COLOR_BGR2GRAY)
denoised_image = cv2.fastNlMeansDenoising(image, None, 30, 7, 21)
ret, thresh = cv2.threshold(denoised_image, 220, 255, cv2.THRESH_TRUNC)
_, binary = cv2.threshold(thresh, 128, 255, cv2.THRESH_BINARY_INV)
contours, hierarchy = cv2.findContours(binary, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
# cv2.drawContours(originImage, contours, -1, (0, 0, 255), 2)  # draw all contours
for i in range(len(contours)):
    area = cv2.contourArea(contours[i])
    x, y, w, h = cv2.boundingRect(contours[i])
    # Keep only contours roughly the size of a timetable cell
    if area > 500 and w >= 100 and w <= 888:
        top_left = (x, y)
        bottom_right = (x + w, y + h)
        # Crop the cell out of the truncated grayscale image
        roi = thresh[y:y + h, x:x + w]
        result = ocr.ocr(roi, cls=False)
        text = ""
        for line in result:
            if line is not None:
                for word_info in line:
                    # PaddleOCR 2.x: word_info is [box, (text, confidence)]
                    text += word_info[1][0]
        cv2.rectangle(originImage, top_left, bottom_right, (0, 0, 255), 2)
        insert_lesson_data(x, y, text)

for x in sorted(data.keys()):
    print(f"{x}:")
    data[x].sort(key=lambda item: item[0])
    for y, text in data[x]:
        print(f"x: {x} y: {y}, text: {text}")
cv2.imshow("originImage", originImage)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
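To see what the two `cv2.threshold` calls do, here is a NumPy-only sketch that re-implements the documented per-pixel rules of `THRESH_TRUNC` (pixels above the threshold are clipped to it) and `THRESH_BINARY_INV` (0 above the threshold, `maxval` elsewhere). It is an emulation for illustration, not OpenCV itself, and the pixel row is made up. With these parameters the truncation at 220 does not change the binary mask fed to `findContours`, since any pixel above 220 is also above 128; it only affects `thresh`, the image the cell crops are taken from.

```python
import numpy as np


def thresh_trunc(img, t):
    """Emulates THRESH_TRUNC: pixels above t become t, the rest are kept."""
    return np.minimum(img, t)


def thresh_binary_inv(img, t, maxval=255):
    """Emulates THRESH_BINARY_INV: 0 where pixel > t, maxval elsewhere."""
    return np.where(img > t, 0, maxval).astype(np.uint8)


# A made-up row of grayscale pixels: dark strokes, mid-gray, light background
row = np.array([10, 100, 128, 129, 200, 230, 255], dtype=np.uint8)
truncated = thresh_trunc(row, 220)
binary = thresh_binary_inv(truncated, 128)
print(truncated)  # [ 10 100 128 129 200 220 220]
print(binary)     # [255 255 255   0   0   0   0]

# Because 220 > 128, truncating first leaves the binary mask unchanged
print(np.array_equal(binary, thresh_binary_inv(row, 128)))  # True
```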

steven026 posted on 2024-12-11 06:58:03

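I took a look: is there no Node version of PaddleOCR?
On Node I tried Tesseract and dddd before, and neither was great; the recognition rate wasn't very high.

Also, bro, you know so much, even Python. Love it!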

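李恒道 posted on 2024-12-11 16:04:33

steven026 posted on 2024-12-11 06:58
I took a look: is there no Node version of PaddleOCR?
On Node I tried Tesseract and dddd before, and neither was great; the recognition rate wasn't very high.

No idea either. Worst case, I'll just wrap it in another Express layer QAQ

Shouldn't you be envious that I learned CV instead!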

Ailurus posted on 2025-6-11 16:40:08

ggnb (bro, you're awesome)! Moving into AI will make you go bald (sob)
